# History and Analysis of Search Intent Classification Experiments

This *notebook* provides a visual record of the experiments performed and a way to restore a given experiment and analyze its results in more detail.

## Libraries and Functions

In [7]:
# General
import sys
import funcy as fp
from pathlib import Path

# Visualization / Presentation
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import HTML, display

import mlflow
from mlflow.tracking import MlflowClient
import pandas as pd

# Load custom code from ../src and reload it automatically whenever it changes
%load_ext autoreload 
%autoreload 2
sys.path.append(str(Path.cwd().parent))
from src import settings
from src.utils.notebooks import display_side_by_side
from src.pipeline.inference_pipeline import load_model_resources

# Display settings for Pandas and the plotting libraries
%matplotlib inline 
sns.set(rc={'figure.figsize':(25,10)})
pd.set_option('display.max_rows', None)
pd.set_option("display.max_columns", None)
pd.set_option('max_colwidth', 150)



## Retrieving the Best Results

Based on the experiments run in the [Intent Classification](04.3_Classificacao_de_Intencoes.ipynb) notebook, the best run of each algorithm is retrieved, with F1 as the primary metric.

In [8]:
EXPERIMENT_ID = '2'

mlflow_client = MlflowClient()

best_experiments_result = [
    mlflow.search_runs(experiment_ids=[experiment_id], 
                       max_results=100, 
                       order_by=['metrics.f1 DESC'], 
                       filter_string='attributes.status="FINISHED"')
    for experiment_id in [EXPERIMENT_ID]
]

best_results = pd.concat(best_experiments_result, axis=0)

The best results retrieved:

In [9]:
columns_to_show = ['experiment_name', 'tags.mlflow.runName', 'run_id', 'experiment_id', 'params.model_name',
                   'metrics.f1', 'metrics.precision', 'metrics.recall', 'metrics.auc', 'metrics.mc', 'metrics.training_time']

(best_results
 .assign(experiment_name=lambda f: f['experiment_id'].apply(lambda exp_id: mlflow_client.get_experiment(exp_id).name))
 .sort_values(by='metrics.f1', ascending=False)
 .loc[lambda f: ~f['params.model_name'].isna()]
 .drop_duplicates('params.model_name')
 [columns_to_show]  
 .head()
)

| index | experiment_name | tags.mlflow.runName | run_id | experiment_id | params.model_name | metrics.f1 | metrics.precision | metrics.recall | metrics.auc | metrics.mc | metrics.training_time |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 62 | 04_SupervisedQueryIntentClassification | 01_2_More Restrictive Qualified Queries_CB | 726952678b9f4cb2949b9707fa66a273 | 2 | CB | 0.666667 | 0.674419 | 0.659091 | 0.730489 | 0.463042 | 3.278462 |
| 45 | 04_SupervisedQueryIntentClassification | 01_4_More Restrictive Qualified Queries_SVC-Linear | 77c9f07b95f84557b0d5899f9fd78568 | 2 | SVC-Linear | 0.645914 | 0.664 | 0.628788 | 0.714868 | 0.434457 | 2.549131 |
| 48 | 04_SupervisedQueryIntentClassification | 01_3_More Restrictive Qualified Queries_LGB | c9a698923c6d40719b1353fb99a027a6 | 2 | LGB | 0.644628 | 0.709091 | 0.590909 | 0.719625 | 0.457895 | 2.290449 |
| 67 | 04_SupervisedQueryIntentClassification | 01_1_More Restrictive Qualified Queries_GaussianNB | c342ae09559c4205b024da1f3fb63730 | 2 | GaussianNB | 0.631579 | 0.534031 | 0.772727 | 0.676458 | 0.345353 | 0.003971 |
| 15 | 04_SupervisedQueryIntentClassification | 01_8_More Restrictive Qualified Queries_MLP | 19d377dfc5e94afaaa95a169d4e6db71 | 2 | MLP | 0.626415 | 0.62406 | 0.628788 | 0.695911 | 0.391274 | 1.485022 |
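
Because F1 is the ranking metric, it can be useful to confirm that the logged values agree with the logged precision and recall. The sketch below is a minimal check, assuming the `f1` metric is the standard harmonic mean of precision and recall for the positive class (the notebook does not state how it was computed). `f1_from`, `LOGGED`, and `check_logged_f1` are illustrative names, and the numbers are copied from the results table above.

```python
import math


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 0.0 if total == 0 else 2 * precision * recall / total


# (precision, recall, logged F1) copied from the results table above
LOGGED = {
    "CB": (0.674419, 0.659091, 0.666667),
    "SVC-Linear": (0.664, 0.628788, 0.645914),
    "LGB": (0.709091, 0.590909, 0.644628),
    "GaussianNB": (0.534031, 0.772727, 0.631579),
    "MLP": (0.62406, 0.628788, 0.626415),
}


def check_logged_f1(rows=LOGGED, tol=1e-5):
    """Recompute F1 per model and raise if it disagrees with the logged value."""
    recomputed = {}
    for model, (precision, recall, logged) in rows.items():
        value = f1_from(precision, recall)
        if not math.isclose(value, logged, abs_tol=tol):
            raise ValueError(f"{model}: recomputed {value:.6f}, logged {logged:.6f}")
        recomputed[model] = value
    return recomputed


if __name__ == "__main__":
    for model, value in check_logged_f1().items():
        print(f"{model}: {value:.6f}")
```

For the five rows shown, the recomputed values agree with the logged F1 to within rounding, which is consistent with the assumption above. The table also shows why ranking by F1 alone hides trade-offs: GaussianNB has the highest recall of the five but the lowest precision.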


## Restoring Experiments

Once an individual run has been chosen, the artifacts used in that experiment can be restored and applied to the data.

In [4]:
RUN_ID = '11bed0968e80465f8775b94e95accffc'

preprocessing_model, model, label_encoder = load_model_resources(RUN_ID)

To validate that the model was restored correctly, part of the training data is loaded for an evaluation.

In [5]:
columns_to_read = ['query', 'intent_class', 'intent_description']

frame = pd.read_csv(Path(settings.DATA_PATH).joinpath('interim', 'query_intent_training.csv'), usecols=columns_to_read)

display_side_by_side([frame.head(10)], ['Search Intent Dataset'])

| index | query | intent_class | intent_description |
|---|---|---|---|
| 0 | super mario | 0 | Busca Exploratória |
| 1 | estojo personalizado | 0 | Busca Exploratória |
| 2 | girassol | 0 | Busca Exploratória |
| 3 | difusor lembrancinha | 0 | Busca Exploratória |
| 4 | festa borboleta | 0 | Busca Exploratória |
| 5 | lapis personalizados | 0 | Busca Exploratória |
| 6 | tag para lembrancinha de maternidade | 0 | Busca Exploratória |
| 7 | kit jardinagem | 1 | Busca Focada |
| 8 | newborn | 0 | Busca Exploratória |
| 9 | kit bebe | 0 | Busca Exploratória |


With the data loaded, the preprocessing can be reapplied and the search intent of each query predicted.

In [6]:
frame_slice = frame.sample(50)

# Prepare the data for inference
features = preprocessing_model.predict(frame_slice)

# Run inference
frame_slice['pred'] = model.predict(features)

display_side_by_side([frame_slice, 
                      pd.DataFrame(features).describe().T.head(10)], 
                     ['Retrieved Data and Prediction', 
                      f'Features (10 of {features.shape[1]})'])

del frame_slice



| index | query | intent_class | intent_description | pred |
|---|---|---|---|---|
| 650 | kit quadros | 0 | Busca Exploratória | 0 |
| 840 | kit higiene porcelana | 1 | Busca Focada | 1 |
| 813 | caneca de acrilico personalizada | 1 | Busca Focada | 1 |
| 391 | tapete quarto | 0 | Busca Exploratória | 0 |
| 397 | lembrancas de aniversario de 60 anos | 1 | Busca Focada | 1 |
| 1443 | papel de parede de amor | 0 | Busca Exploratória | 0 |
| 1923 | ladybug | 0 | Busca Exploratória | 0 |
| 2065 | lembrancinhas para aniversario de 80 anos | 1 | Busca Focada | 1 |
| 1790 | pintura de paisagem | 0 | Busca Exploratória | 0 |
| 969 | decoracao de esmalteria | 0 | Busca Exploratória | 0 |

| feature | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| 0 | 50.0 | -0.017062 | 0.040629 | -0.131692 | -0.032555 | -0.016095 | -0.001193 | 0.136547 |
| 1 | 50.0 | -0.02126 | 0.037135 | -0.072454 | -0.044021 | -0.02771 | -0.003383 | 0.120648 |
| 2 | 50.0 | 0.029344 | 0.026048 | -0.04606 | 0.017184 | 0.025248 | 0.045847 | 0.099698 |
| 3 | 50.0 | 0.00069 | 0.042143 | -0.100926 | -0.017787 | 0.009025 | 0.021925 | 0.118467 |
| 4 | 50.0 | -0.023839 | 0.03524 | -0.108511 | -0.048022 | -0.029488 | -0.006299 | 0.058604 |
| 5 | 50.0 | -0.0587 | 0.041893 | -0.146247 | -0.087343 | -0.061588 | -0.025104 | 0.029871 |
| 6 | 50.0 | -0.006123 | 0.034422 | -0.072469 | -0.024733 | -0.007392 | 0.011428 | 0.13224 |
| 7 | 50.0 | -0.009429 | 0.03231 | -0.078838 | -0.030512 | -0.01244 | 0.008542 | 0.074884 |
| 8 | 50.0 | 0.013989 | 0.039703 | -0.066028 | -0.008578 | 0.01508 | 0.034531 | 0.174635 |
| 9 | 50.0 | -0.022759 | 0.045315 | -0.096996 | -0.050459 | -0.033435 | -0.000796 | 0.108667 |
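
To make the validation step concrete, the sketch below scores the predictions against `intent_class` using only the ten rows displayed in the prediction output above. It assumes class `1` (Busca Focada) is the positive class, which the notebook does not state. `binary_scores` and `ROWS` are illustrative names, not part of the project code.

```python
def binary_scores(y_true, y_pred, positive=1):
    """Precision, recall and F1 for one positive class, in pure Python."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# (intent_class, pred) pairs copied from the ten displayed rows
ROWS = [(0, 0), (1, 1), (1, 1), (0, 0), (1, 1),
        (0, 0), (0, 0), (1, 1), (0, 0), (0, 0)]

y_true = [label for label, _ in ROWS]
y_pred = [pred for _, pred in ROWS]
scores = binary_scores(y_true, y_pred)

if __name__ == "__main__":
    print(scores)
```

On these ten rows every prediction matches its label. Since the rows are drawn from the training data, this confirms that the restored preprocessing model and classifier work together; it is not an estimate of performance on unseen queries.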
