In this phase I start the experimentation stage, testing a series of
classification models to find the one that best fits the problem.
Only some of the columns are used to build the models, because some are only
available once the SISU process has finished, as is the case for "NOTA_CORTE"
and "CLASSIFICACAO". Other columns, such as the IES and course codes, are also
left out because they are not relevant to the problem.
At the end, besides the model, a dataset in .db format will be generated to be
consumed by the final app. The columns used are listed below:

- Model: IES, UF_CAMPUS, MUNICIPIO_CAMPUS, NOME_CURSO, GRAU, TURNO, 
TIPO_MOD_CONCORRENCIA, QT_VAGAS_CONCORRENCIA, PERCENTUAL_BONUS, PESO_L, PESO_CH,
PESO_CN, PESO_M, PESO_R, NOTA_MINIMA_L, NOTA_MINIMA_CH, NOTA_MINIMA_CN, 
NOTA_MINIMA_M, NOTA_MINIMA_R, MEDIA_MINIMA, OPCAO, NOTA_L, NOTA_CH, NOTA_CN, 
NOTA_M, NOTA_R, NOTA_L_COM_PESO, NOTA_CH_COM_PESO, NOTA_CN_COM_PESO, 
NOTA_M_COM_PESO, NOTA_R_COM_PESO, NOTA_CANDIDATO and APROVADO.

Part of the information used in the Web App is looked up in the SISU data, as
is the case for QT_VAGAS_CONCORRENCIA, a value that the university defines for
each course rather than the user. Other values are computed manually, such as
the weighted scores.
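
As a rough illustration of how the weighted scores could be derived from the raw scores and the course weights, here is a minimal sketch. It assumes that each `NOTA_<area>_COM_PESO` equals the raw score times the weight and that the final score is the weighted mean; the notebook does not state the exact formula behind `NOTA_CANDIDATO`, and the helper names `weighted_scores` and `weighted_mean` are hypothetical.

```python
# Hypothetical helpers: the formulas below are assumptions, not taken from the data.
AREAS = ("L", "CH", "CN", "M", "R")


def weighted_scores(scores, weights):
    """Return NOTA_<area>_COM_PESO values, assuming score * weight per area."""
    return {
        f"NOTA_{area}_COM_PESO": scores[f"NOTA_{area}"] * weights[f"PESO_{area}"]
        for area in AREAS
    }


def weighted_mean(scores, weights):
    """Return the weighted mean of the five scores (assumed final score)."""
    total_weight = sum(weights[f"PESO_{area}"] for area in AREAS)
    return sum(weighted_scores(scores, weights).values()) / total_weight


# Example candidate and course weights (made-up values for illustration only)
example_scores = {f"NOTA_{area}": 600.0 for area in AREAS}
example_weights = {"PESO_L": 1, "PESO_CH": 2, "PESO_CN": 1, "PESO_M": 3, "PESO_R": 1}
example_final = weighted_mean(example_scores, example_weights)
```

In the app, the weights would come from the SISU data for the chosen course, while the raw scores would be supplied by the user.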


In [1]:
import mlflow
import optuna
import pandas as pd
import category_encoders as ce

# Preprocessing & Models
from sklearn.model_selection import train_test_split, StratifiedKFold, cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder, PowerTransformer
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier


# Metrics
from sklearn.metrics import log_loss

In [2]:
# Reading the data
dados_sisu = pd.read_parquet('../data/processed/dados_transformados.parquet/')

In [3]:
# Defining the columns used to train the model
colunas_para_buscar = ['IES', 'UF_CAMPUS', 'MUNICIPIO_CAMPUS', 'NOME_CURSO', 
                       'GRAU', 'TURNO', 'TIPO_MOD_CONCORRENCIA', 
                       'QT_VAGAS_CONCORRENCIA', 'PERCENTUAL_BONUS', 'PESO_L', 
                       'PESO_CH', 'PESO_CN', 'PESO_M', 'PESO_R', 
                       'NOTA_MINIMA_L', 'NOTA_MINIMA_CH', 'NOTA_MINIMA_CN', 
                       'NOTA_MINIMA_M', 'NOTA_MINIMA_R', 'MEDIA_MINIMA', 
                       'OPCAO', 'NOTA_L', 'NOTA_CH', 'NOTA_CN', 'NOTA_M', 
                       'NOTA_R', 'NOTA_L_COM_PESO', 'NOTA_CH_COM_PESO', 
                       'NOTA_CN_COM_PESO', 'NOTA_M_COM_PESO', 'NOTA_R_COM_PESO',
                       'NOTA_CANDIDATO', 'APROVADO']

In [4]:
# Keeping only the selected columns
dados_sisu = dados_sisu[colunas_para_buscar]

In [5]:
# Checking the balance of the target variable
dados_sisu['APROVADO'].value_counts(normalize=True)

N    0.898245
S    0.101755
Name: APROVADO, dtype: float64
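
Because log loss is the metric used throughout the experiments, it helps to know what a model with no information would score. The sketch below computes the log loss of a constant predictor that always outputs the positive rate shown above (0.101755); the function name is hypothetical and the rate is taken from the full dataset rather than from a specific split.

```python
import math


def constant_predictor_log_loss(positive_rate):
    """Log loss obtained by always predicting `positive_rate` for class 1."""
    p = positive_rate
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))


# Positive rate reported by value_counts(normalize=True) above
baseline = constant_predictor_log_loss(0.101755)
print(f"baseline log loss: {baseline:.4f}")
```

Any model whose log loss is not clearly below this baseline is adding little beyond the class prior.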

In [6]:
# Setting where the experiments are stored
mlflow.set_tracking_uri('../mlruns')

# Creating/accessing the experiment
mlflow.set_experiment('Comparando modelos')

<Experiment: artifact_location='/workspaces/sisu_analysis/notebooks/../mlruns/552820451131722872', creation_time=1701886234717, experiment_id='552820451131722872', last_update_time=1701886234717, lifecycle_stage='active', name='Comparando modelos', tags={}>

In [7]:
# Splitting the data into explanatory variables and target variable
x = dados_sisu.drop(columns=['APROVADO'])
y = dados_sisu['APROVADO'].map({'S': 1, 'N': 0})

# Splitting the data into train (55%) and a hold-out set (45%)
x_treino, x_teste, y_treino, y_teste = train_test_split(x, y, test_size=0.45, random_state=42, stratify=y)

# Splitting the hold-out set into test and dev
x_teste, x_dev, y_teste, y_dev = train_test_split(x_teste, y_teste, test_size=0.5, random_state=42, stratify=y_teste)

# Splitting dev into dev and calibration
x_dev, x_calib, y_dev, y_calib = train_test_split(x_dev, y_dev, test_size=0.5, random_state=42, stratify=y_dev)
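
The three chained `train_test_split` calls above determine what share of the full dataset ends up in each subset. The following sketch only does that arithmetic; `split_fractions` is a hypothetical helper, and the actual row counts can differ by a row or two because of rounding.

```python
def split_fractions(holdout=0.45, dev_share=0.5, calib_share=0.5):
    """Return the share of the full dataset assigned to each subset."""
    train = 1 - holdout                     # first split
    test = holdout * (1 - dev_share)        # second split keeps this as test
    dev_total = holdout * dev_share         # remainder goes to dev before the third split
    dev = dev_total * (1 - calib_share)     # third split
    calib = dev_total * calib_share
    return {"train": train, "test": test, "dev": dev, "calib": calib}


fractions = split_fractions()
for name, share in fractions.items():
    print(f"{name}: {share:.2%}")
```

With the values used in the cell, this gives 55% for training, 22.5% for test, and 11.25% each for dev and calibration.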


In [8]:
# Ratio of negative to positive examples, used as scale_pos_weight in XGBoost
scale = y_treino.value_counts()[0] / y_treino.value_counts()[1]

# Creating the dictionaries with the models
dict_models_scale_sensitive_cw = {"LR": LogisticRegression(random_state=200, 
                                                           class_weight='balanced')}

dict_models_scale_sensitive_no_cw = {"LR": LogisticRegression(random_state=200)}

dict_models_tree_based_cw = {"LGBM": LGBMClassifier(is_unbalance=True,
                                                 random_state=200),
                          "XGB": XGBClassifier(scale_pos_weight=scale,
                                               random_state=200),
                          "CTBC": CatBoostClassifier(auto_class_weights='Balanced',
                                                     random_state=200)}

dict_models_tree_based_no_cw = {"LGBM": LGBMClassifier(random_state=200),
                          "XGB": XGBClassifier(random_state=200),
                          "CTBC": CatBoostClassifier(random_state=200)}

# Creating the dictionary with the encoders
dict_encoders = {"OHE": OneHotEncoder(drop='first'),
                 "TE": ce.TargetEncoder(),
                 "BE": ce.BinaryEncoder(),
                 "ME": ce.MEstimateEncoder(),
                 "WOE": ce.WOEEncoder(),
                 "CE": ce.CatBoostEncoder(),
                 "GE": ce.GrayEncoder()}

# Creating the dictionary with the scalers
dict_scalers = {"SS": StandardScaler()}

# Creating the dictionary with the transformers
dict_transformers = {"PT": PowerTransformer()}

In [9]:
# Defining the folds
kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=200)

# Categorical columns with more than 25 unique values (high cardinality)
cat_cols = x_treino.select_dtypes(include='object').columns
high_dim_cols = cat_cols[x_treino[cat_cols].nunique() > 25]

# Categorical columns with 25 or fewer unique values
cat_cols = [col for col in cat_cols if col not in high_dim_cols]

# Numeric columns
num_cols = x_treino.select_dtypes(include=['int', 'float']).columns
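
In the experiments that follow, the high-cardinality columns are always passed through `ce.CountEncoder`. The sketch below shows the general idea of count encoding in plain Python. It is not the library's implementation: the helper names are hypothetical, and mapping unseen categories to 0 is an assumption of this sketch.

```python
from collections import Counter


def fit_count_encoder(values):
    """Learn how many times each category appears in the training column."""
    return Counter(values)


def transform_count(counts, values, unseen=0):
    """Replace each category by its training count (unseen categories get `unseen`)."""
    return [counts.get(value, unseen) for value in values]


# Made-up course names for illustration only
train_column = ["DIREITO", "MEDICINA", "DIREITO", "FISICA", "DIREITO"]
counts = fit_count_encoder(train_column)
encoded = transform_count(counts, ["DIREITO", "FISICA", "HISTORIA"])
```

A column with hundreds of distinct courses or municipalities becomes a single numeric feature this way, which avoids the very wide matrices that one-hot encoding would produce.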

In [10]:
## Experiments without transformers and with class_weight
#for tag, model in dict_models_scale_sensitive_cw.items():
#    for tag_encoder, encoder in dict_encoders.items():
#        for tag_scaler, scaler in dict_scalers.items():
#            
#            # Building the run name that identifies the model
#            nome_modelo = f'{tag}_CW_{tag_encoder}_{tag_scaler}'
#            
#            with mlflow.start_run(run_name=nome_modelo):
#                 
#                 # Creating the pipelines for each group of columns
#                 pipe_cat = Pipeline([('encoder', encoder)])
#                 pipe_high_dim = Pipeline([('encoder', ce.CountEncoder())])
#                 pipe_num = Pipeline([('scaler', scaler)])
#                 
#                 # Creating the column transformer
#                 transformer = ColumnTransformer([('cat', pipe_cat, cat_cols),
#                                                 ('num', pipe_num, num_cols),
#                                                 ('high_dim', pipe_high_dim, high_dim_cols)])
#                 
#                 # Creating the final pipeline
#                 pipe = Pipeline([('transformer', transformer),
#                                 ('model', model)])
#                 
#                 # Running the cross-validation (scores are negated log loss)
#                 cross_val_scores = cross_val_score(pipe, x_treino, y_treino, cv=kf, scoring='neg_log_loss')
#                 
#                 # Averaging the scores
#                 mean_score = cross_val_scores.mean()
#                 
#                 # Logging the score of each fold
#                 for i, score in enumerate(cross_val_scores, start=1):
#                     mlflow.log_metric(f'log_loss_fold_{i}', score)
#                 
#                 # Logging the mean score
#                 mlflow.log_metric('log_loss_mean', mean_score)

In [11]:
## Experiments without transformers and without class_weight
#for tag, model in dict_models_scale_sensitive_no_cw.items():
#    for tag_encoder, encoder in dict_encoders.items():
#        for tag_scaler, scaler in dict_scalers.items():
#            
#            # Building the run name that identifies the model
#            nome_modelo = f'{tag}_NO_CW_{tag_encoder}_{tag_scaler}'
#            
#            with mlflow.start_run(run_name=nome_modelo):
#                 
#                 # Creating the pipelines for each group of columns
#                 pipe_cat = Pipeline([('encoder', encoder)])
#                 pipe_high_dim = Pipeline([('encoder', ce.CountEncoder())])
#                 pipe_num = Pipeline([('scaler', scaler)])
#                 
#                 # Creating the column transformer
#                 transformer = ColumnTransformer([('cat', pipe_cat, cat_cols),
#                                                 ('num', pipe_num, num_cols),
#                                                 ('high_dim', pipe_high_dim, high_dim_cols)])
#                 
#                 # Creating the final pipeline
#                 pipe = Pipeline([('transformer', transformer),
#                                 ('model', model)])
#                 
#                 # Running the cross-validation (scores are negated log loss)
#                 cross_val_scores = cross_val_score(pipe, x_treino, y_treino, cv=kf, scoring='neg_log_loss')
#                 
#                 # Averaging the scores
#                 mean_score = cross_val_scores.mean()
#                 
#                 # Logging the score of each fold
#                 for i, score in enumerate(cross_val_scores, start=1):
#                     mlflow.log_metric(f'log_loss_fold_{i}', score)
#                 
#                 # Logging the mean score
#                 mlflow.log_metric('log_loss_mean', mean_score)

In [12]:
## Experiments with transformers and without class_weight
#for tag, model in dict_models_scale_sensitive_no_cw.items():
#    for tag_encoder, encoder in dict_encoders.items():
#        for tag_scaler, scaler in dict_scalers.items():
#            for tag_transformer, transformer in dict_transformers.items():
#            
#                # Building the run name that identifies the model
#                nome_modelo = f'{tag}_NO_CW_{tag_encoder}_{tag_scaler}_{tag_transformer}'
#
#                with mlflow.start_run(run_name=nome_modelo):
#
#                     # Creating the pipelines for each group of columns
#                     pipe_cat = Pipeline([('encoder', encoder)])
#                     pipe_high_dim = Pipeline([('encoder', ce.CountEncoder())])
#                     pipe_num = Pipeline([('scaler', scaler),
#                                          ('transformer', transformer)])
#
#                     # Creating the column transformer (named so that it does
#                     # not shadow the loop variable `transformer`)
#                     col_transformer = ColumnTransformer([('cat', pipe_cat, cat_cols),
#                                                         ('num', pipe_num, num_cols),
#                                                         ('high_dim', pipe_high_dim, high_dim_cols)])
#
#                     # Creating the final pipeline
#                     pipe = Pipeline([('transformer', col_transformer),
#                                     ('model', model)])
#
#                     # Running the cross-validation (scores are negated log loss)
#                     cross_val_scores = cross_val_score(pipe, x_treino, y_treino, cv=kf, scoring='neg_log_loss')
#
#                     # Averaging the scores
#                     mean_score = cross_val_scores.mean()
#
#                     # Logging the score of each fold
#                     for i, score in enumerate(cross_val_scores, start=1):
#                         mlflow.log_metric(f'log_loss_fold_{i}', score)
#
#                     # Logging the mean score
#                     mlflow.log_metric('log_loss_mean', mean_score)

In [13]:
## Experiments with transformers and with class_weight
#for tag, model in dict_models_scale_sensitive_cw.items():
#    for tag_encoder, encoder in dict_encoders.items():
#        for tag_scaler, scaler in dict_scalers.items():
#            for tag_transformer, transformer in dict_transformers.items():
#            
#                # Building the run name that identifies the model
#                nome_modelo = f'{tag}_CW_{tag_encoder}_{tag_scaler}_{tag_transformer}'
#
#                with mlflow.start_run(run_name=nome_modelo):
#
#                     # Creating the pipelines for each group of columns
#                     pipe_cat = Pipeline([('encoder', encoder)])
#                     pipe_high_dim = Pipeline([('encoder', ce.CountEncoder())])
#                     pipe_num = Pipeline([('scaler', scaler),
#                                          ('transformer', transformer)])
#
#                     # Creating the column transformer (named so that it does
#                     # not shadow the loop variable `transformer`)
#                     col_transformer = ColumnTransformer([('cat', pipe_cat, cat_cols),
#                                                         ('num', pipe_num, num_cols),
#                                                         ('high_dim', pipe_high_dim, high_dim_cols)])
#
#                     # Creating the final pipeline
#                     pipe = Pipeline([('transformer', col_transformer),
#                                     ('model', model)])
#
#                     # Running the cross-validation (scores are negated log loss)
#                     cross_val_scores = cross_val_score(pipe, x_treino, y_treino, cv=kf, scoring='neg_log_loss')
#
#                     # Averaging the scores
#                     mean_score = cross_val_scores.mean()
#
#                     # Logging the score of each fold
#                     for i, score in enumerate(cross_val_scores, start=1):
#                         mlflow.log_metric(f'log_loss_fold_{i}', score)
#
#                     # Logging the mean score
#                     mlflow.log_metric('log_loss_mean', mean_score)

In [14]:
## Experiments with the tree-based models, with class_weight
#for tag, model in dict_models_tree_based_cw.items():
#    for tag_encoder, encoder in dict_encoders.items():
#            
#            # Building the run name that identifies the model
#            nome_modelo = f'{tag}_CW_{tag_encoder}'
#            
#            with mlflow.start_run(run_name=nome_modelo):
#                 
#                 # Creating the pipelines for each group of columns
#                 pipe_cat = Pipeline([('encoder', encoder)])
#                 pipe_high_dim = Pipeline([('encoder', ce.CountEncoder())])
#                 
#                 # Creating the column transformer
#                 # NOTE: remainder defaults to 'drop', so the numeric columns,
#                 # which are not listed here, are not passed to the model
#                 transformer = ColumnTransformer([('cat', pipe_cat, cat_cols),
#                                                 ('high_dim', pipe_high_dim, high_dim_cols)])
#                 
#                 # Creating the final pipeline
#                 pipe = Pipeline([('transformer', transformer),
#                                 ('model', model)])
#                 
#                 # Running the cross-validation (scores are negated log loss)
#                 cross_val_scores = cross_val_score(pipe, x_treino, y_treino, cv=kf, scoring='neg_log_loss')
#                 
#                 # Averaging the scores
#                 mean_score = cross_val_scores.mean()
#                 
#                 # Logging the score of each fold
#                 for i, score in enumerate(cross_val_scores, start=1):
#                     mlflow.log_metric(f'log_loss_fold_{i}', score)
#                 
#                 # Logging the mean score
#                 mlflow.log_metric('log_loss_mean', mean_score)
#

In [15]:
## Experiments with the tree-based models, without class_weight
#for tag, model in dict_models_tree_based_no_cw.items():
#    for tag_encoder, encoder in dict_encoders.items():
#            
#            # Building the run name that identifies the model
#            nome_modelo = f'{tag}_NO_CW_{tag_encoder}'
#            
#            with mlflow.start_run(run_name=nome_modelo):
#                 
#                 # Creating the pipelines for each group of columns
#                 pipe_cat = Pipeline([('encoder', encoder)])
#                 pipe_high_dim = Pipeline([('encoder', ce.CountEncoder())])
#                 
#                 # Creating the column transformer
#                 # NOTE: remainder defaults to 'drop', so the numeric columns,
#                 # which are not listed here, are not passed to the model
#                 transformer = ColumnTransformer([('cat', pipe_cat, cat_cols),
#                                                 ('high_dim', pipe_high_dim, high_dim_cols)])
#                 
#                 # Creating the final pipeline
#                 pipe = Pipeline([('transformer', transformer),
#                                 ('model', model)])
#                 
#                 # Running the cross-validation (scores are negated log loss)
#                 cross_val_scores = cross_val_score(pipe, x_treino, y_treino, cv=kf, scoring='neg_log_loss')
#                 
#                 # Averaging the scores
#                 mean_score = cross_val_scores.mean()
#                 
#                 # Logging the score of each fold
#                 for i, score in enumerate(cross_val_scores, start=1):
#                     mlflow.log_metric(f'log_loss_fold_{i}', score)
#                 
#                 # Logging the mean score
#                 mlflow.log_metric('log_loss_mean', mean_score)
#

In [16]:
# Specifying the columns to return
colunas = ['tags.mlflow.runName', 'metrics.log_loss_mean']

# Fetching the best runs (the metric is negated log loss, so higher is better)
mlflow.search_runs(order_by=['metrics.log_loss_mean DESC'], max_results=15)[colunas]

   tags.mlflow.runName  metrics.log_loss_mean
0       LR_NO_CW_BE_SS              -0.263045
1    LR_NO_CW_BE_SS_PT              -0.263931
2       LR_NO_CW_GE_SS              -0.265065
3    LR_NO_CW_GE_SS_PT              -0.265762
4   LR_NO_CW_OHE_SS_PT              -0.265789
5      LR_NO_CW_OHE_SS              -0.266268
6      LR_NO_CW_WOE_SS               -0.27055
7       LR_NO_CW_TE_SS               -0.27077
8       LR_NO_CW_ME_SS               -0.27077
9       LR_NO_CW_CE_SS               -0.27077
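
The values in this table are negative because scikit-learn's `neg_log_loss` scorer returns the log loss with its sign flipped, so that a larger score is always better. The sketch below, using two rows copied from the table, shows how to pick the best run and recover the usual positive log loss; the variable names are hypothetical.

```python
# Two runs copied from the table above (run name -> mean negated log loss)
runs = {
    "LR_NO_CW_BE_SS": -0.263045,
    "LR_NO_CW_CE_SS": -0.27077,
}

# With negated scores, the best run is the one with the largest value
best_run = max(runs, key=runs.get)

# Flipping the sign gives the conventional log loss, where lower is better
best_log_loss = -runs[best_run]
print(best_run, best_log_loss)
```

This is also why the query sorts by `metrics.log_loss_mean DESC` rather than ascending.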


The models without class weights were the best classifiers. Among the top
runs, the differences between models are small. For that reason, the two best
algorithms, Logistic Regression and CatBoost, are selected for hyperparameter
tuning. Both will also be tested together in an ensemble.

Because the solution needs to respond quickly, Logistic Regression takes
precedence over CatBoost and over the ensemble of both.
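
One simple way to combine the two models is to average their predicted probabilities (soft voting) and score the result with log loss. The sketch below assumes that approach; the notebook does not say which ensembling strategy is used, and the function names and example probabilities are hypothetical.

```python
import math


def average_probabilities(p_a, p_b, weight_a=0.5):
    """Blend two lists of positive-class probabilities with a fixed weight."""
    return [weight_a * a + (1 - weight_a) * b for a, b in zip(p_a, p_b)]


def binary_log_loss(y_true, p_pos, eps=1e-15):
    """Mean binary log loss, clipping probabilities away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, p_pos):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Made-up probabilities from two models for two candidates
p_model_a = [0.2, 0.8]
p_model_b = [0.4, 0.6]
blended = average_probabilities(p_model_a, p_model_b)
ensemble_loss = binary_log_loss([0, 1], blended)
```

An ensemble of this kind needs both models to run for every request, which is the latency cost that motivates preferring Logistic Regression alone.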

### Logistic Regression

In [17]:
## Objective function used to tune the model
#def objective(trial):
#
#    params = {
#        'C': trial.suggest_float('C', 1e-4, 1e+4, log=True),
#        'penalty': trial.suggest_categorical('penalty', [None, 'l2']),
#        'solver': trial.suggest_categorical('solver', ['lbfgs', 'saga', 'newton-cholesky']),
#        'max_iter': trial.suggest_int('max_iter', 50, 1000),
#        'fit_intercept': trial.suggest_categorical('fit_intercept', [True, False]),
#        'class_weight': trial.suggest_categorical('class_weight', [None, 'balanced']),
#        'random_state': 200
#    }
#    
#    # Creating the pipelines for each group of columns
#    pipe_cat = Pipeline([('encoder', ce.BinaryEncoder())])
#    pipe_high_dim = Pipeline([('encoder', ce.CountEncoder())])
#    pipe_num = Pipeline([('scaler', StandardScaler())])
#    
#    # Creating the column transformer
#    transformer = ColumnTransformer([('cat', pipe_cat, cat_cols),
#                                    ('num', pipe_num, num_cols),
#                                    ('high_dim', pipe_high_dim, high_dim_cols)])
#    
#    # Creating the final pipeline
#    pipe = Pipeline([('transformer', transformer),
#                    ('model', LogisticRegression(**params))])
#
#    # Fitting the model on the training data
#    pipe.fit(x_treino, y_treino)
#   
#    # Computing the log loss on the dev set
#    logloss = log_loss(y_dev, pipe.predict_proba(x_dev))
#    
#    return logloss
#
## Creating the optimization study
#study = optuna.create_study(direction='minimize')
#study.optimize(objective, n_trials=15)

[I 2023-12-07 18:52:18,642] A new study created in memory with name: no-name-efe560b1-4cc6-4caa-a2e2-b359ce6e04fc
[I 2023-12-07 18:56:07,592] Trial 0 finished with value: 0.6076452777922804 and parameters: {'C': 0.3491528335036483, 'penalty': 'l2', 'solver': 'saga', 'max_iter': 232, 'fit_intercept': False, 'class_weight': 'balanced'}. Best is trial 0 with value: 0.6076452777922804.
[I 2023-12-07 18:56:20,501] Trial 1 finished with value: 0.267158642020306 and parameters: {'C': 270.48268823332904, 'penalty': None, 'solver': 'lbfgs', 'max_iter': 145, 'fit_intercept': False, 'class_weight': None}. Best is trial 1 with value: 0.267158642020306.
[I 2023-12-07 18:56:33,348] Trial 2 finished with value: 0.26659118663750986 and parameters: {'C': 0.18125674548421533, 'penalty': 'l2', 'solver': 'lbfgs', 'max_iter': 187, 'fit_intercept': True, 'class_weight': None}. Best is trial 2 with value: 0.26659118663750986.
[I 2023-12-07 19:02:40,901] Trial 3 finished with value: 0.3263684820545403 and par
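
The search space for `C` spans eight orders of magnitude and is declared with `log=True`, which means values are drawn on a logarithmic scale. The sketch below illustrates plain log-uniform sampling with the standard library. It is not Optuna's sampler, which adapts its proposals to earlier trials; `sample_log_uniform` is a hypothetical helper.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
samples = [sample_log_uniform(1e-4, 1e4, rng) for _ in range(10_000)]

# On a log scale, about half of the draws fall below the geometric midpoint (1.0)
share_below_one = sum(s < 1.0 for s in samples) / len(samples)
```

Sampling `C` uniformly on a linear scale instead would put almost every draw above 1, leaving the strongly regularized region nearly unexplored.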

In [31]:
# The optimized hyperparameters may differ on each run.
# Result achieved: 0.23

# Checking the best parameters
#study.best_params

In [28]:
# Creating the pipelines for each group of columns
pipe_cat = Pipeline([('encoder', ce.BinaryEncoder())])
pipe_high_dim = Pipeline([('encoder', ce.CountEncoder())])
pipe_num = Pipeline([('scaler', StandardScaler())])

# Creating the column transformer
transformer = ColumnTransformer([('cat', pipe_cat, cat_cols),
                                ('num', pipe_num, num_cols),
                                ('high_dim', pipe_high_dim, high_dim_cols)])

# Creating the final pipeline with the tuned hyperparameters
best_lr = Pipeline([('transformer', transformer),
                ('model', LogisticRegression(C=0.08250109742237544,
                                             penalty='l2',
                                             solver='newton-cholesky',
                                             max_iter=244,
                                             fit_intercept=False,
                                             class_weight=None,
                                             random_state=200))])

best_lr.fit(x_treino, y_treino)

### CatBoost

In [30]:
## Objective function used to tune the model
#def objective(trial):
#
#    params = {
#        'objective': 'Logloss',
#        'eval_metric': 'Logloss',
#        'scale_pos_weight': trial.suggest_float('scale_pos_weight', 1, 10),
#        'depth': trial.suggest_int('depth', 3, 10),
#        'min_child_samples': trial.suggest_int('min_child_samples', 1, 20),
#        'subsample': trial.suggest_float('subsample', 0.5, 1),
#        'colsample_bylevel': trial.suggest_float('colsample_bylevel', 0.5, 1),
#        'reg_lambda': trial.suggest_float('reg_lambda', 1e-3, 10),
#        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.1),
#        'random_state': 200
#    }
#    
#    # Creating the pipelines for each group of columns
#    pipe_cat = Pipeline([('encoder', ce.GrayEncoder())])
#    pipe_high_dim = Pipeline([('encoder', ce.CountEncoder())])
#
#    # Creating the column transformer
#    # NOTE: remainder defaults to 'drop', so the numeric columns,
#    # which are not listed here, are not passed to the model
#    transformer = ColumnTransformer([('cat', pipe_cat, cat_cols),
#                                    ('high_dim', pipe_high_dim, high_dim_cols)])
#
#    # Creating the final pipeline
#    pipe = Pipeline([('transformer', transformer),
#                    ('model', CatBoostClassifier(**params))])
#
#    # Fitting the model on the training data
#    pipe.fit(x_treino, y_treino)
#   
#    # Computing the log loss on the dev set
#    logloss = log_loss(y_dev, pipe.predict_proba(x_dev))
#    
#    return logloss
#
## Creating the optimization study
#study = optuna.create_study(direction='minimize')
#study.optimize(objective, n_trials=15)

[I 2023-12-07 19:16:40,494] A new study created in memory with name: no-name-e83bb914-65d4-4d46-a894-3d522794d4a1


[I 2023-12-07 19:18:21,032] Trial 0 finished with value: 0.2994706230366258 and parameters: {'scale_pos_weight': 1.946917166950045, 'depth': 7, 'min_child_samples': 3, 'subsample': 0.963250533239097, 'colsample_bylevel': 0.5575870684991423, 'reg_lambda': 8.693374908645778, 'learning_rate': 0.07820772969101444}. Best is trial 0 with value: 0.2994706230366258.


[I 2023-12-07 19:19:29,260] Trial 1 finished with value: 0.5249203389222242 and parameters: {'scale_pos_weight': 7.144454796693722, 'depth': 4, 'min_child_samples': 7, 'subsample': 0.5387070162286863, 'colsample_bylevel': 0.7172778419785718, 'reg_lambda': 8.903789627731983, 'learning_rate': 0.02480673985781702}. Best is trial 0 with value: 0.2994706230366258.


[I 2023-12-07 19:21:09,546] Trial 2 finished with value: 0.2830682937035806 and parameters: {'scale_pos_weight': 1.0575481709607752, 'depth': 7, 'min_child_samples': 14, 'subsample': 0.943172155483075, 'colsample_bylevel': 0.6216384267427848, 'reg_lambda': 7.321073299565056, 'learning_rate': 0.02436625989762644}. Best is trial 2 with value: 0.2830682937035806.


[I 2023-12-07 19:22:19,373] Trial 3 finished with value: 0.28981859293601503 and parameters: {'scale_pos_weight': 1.3311179651696619, 'depth': 3, 'min_child_samples': 18, 'subsample': 0.9265063643856981, 'colsample_bylevel': 0.8337730314029514, 'reg_lambda': 6.0902282885992705, 'learning_rate': 0.05467516907002009}. Best is trial 2 with value: 0.2830682937035806.


0:	learn: 0.6912875	total: 87.4ms	remaining: 1m 27s
1:	learn: 0.6893262	total: 167ms	remaining: 1m 23s
2:	learn: 0.6876093	total: 248ms	remaining: 1m 22s
3:	learn: 0.6858353	total: 347ms	remaining: 1m 26s
4:	learn: 0.6840967	total: 457ms	remaining: 1m 30s
5:	learn: 0.6824443	total: 556ms	remaining: 1m 32s
6:	learn: 0.6807957	total: 648ms	remaining: 1m 31s
7:	learn: 0.6791881	total: 731ms	remaining: 1m 30s
8:	learn: 0.6775563	total: 816ms	remaining: 1m 29s
9:	learn: 0.6760444	total: 903ms	remaining: 1m 29s
10:	learn: 0.6745082	total: 985ms	remaining: 1m 28s
11:	learn: 0.6731462	total: 1.07s	remaining: 1m 28s
12:	learn: 0.6717546	total: 1.15s	remaining: 1m 27s
13:	learn: 0.6703324	total: 1.24s	remaining: 1m 27s
14:	learn: 0.6690434	total: 1.32s	remaining: 1m 27s
15:	learn: 0.6678752	total: 1.41s	remaining: 1m 26s
16:	learn: 0.6666146	total: 1.53s	remaining: 1m 28s
17:	learn: 0.6653194	total: 1.61s	remaining: 1m 27s
18:	learn: 0.6640663	total: 1.7s	remaining: 1m 27s
19:	learn: 0.6628528	t

[I 2023-12-07 19:23:55,465] Trial 4 finished with value: 0.5530377768633967 and parameters: {'scale_pos_weight': 8.033525139618748, 'depth': 8, 'min_child_samples': 7, 'subsample': 0.6289631959230746, 'colsample_bylevel': 0.5461346551890172, 'reg_lambda': 3.1480877037703436, 'learning_rate': 0.011704951353581475}. Best is trial 2 with value: 0.2830682937035806.


0:	learn: 0.6594931	total: 72.8ms	remaining: 1m 12s
1:	learn: 0.6336029	total: 157ms	remaining: 1m 18s
2:	learn: 0.6128429	total: 242ms	remaining: 1m 20s
3:	learn: 0.5964724	total: 309ms	remaining: 1m 17s
4:	learn: 0.5833187	total: 377ms	remaining: 1m 14s
5:	learn: 0.5724554	total: 446ms	remaining: 1m 13s
6:	learn: 0.5634581	total: 513ms	remaining: 1m 12s
7:	learn: 0.5565088	total: 578ms	remaining: 1m 11s
8:	learn: 0.5507863	total: 643ms	remaining: 1m 10s
9:	learn: 0.5457521	total: 708ms	remaining: 1m 10s
10:	learn: 0.5415056	total: 775ms	remaining: 1m 9s
11:	learn: 0.5381371	total: 844ms	remaining: 1m 9s
12:	learn: 0.5353908	total: 912ms	remaining: 1m 9s
13:	learn: 0.5332259	total: 978ms	remaining: 1m 8s
14:	learn: 0.5311567	total: 1.04s	remaining: 1m 8s
15:	learn: 0.5290377	total: 1.11s	remaining: 1m 8s
16:	learn: 0.5276465	total: 1.17s	remaining: 1m 7s
17:	learn: 0.5264889	total: 1.24s	remaining: 1m 7s
18:	learn: 0.5255178	total: 1.3s	remaining: 1m 7s
19:	learn: 0.5241041	total: 1.3

[I 2023-12-07 19:25:13,068] Trial 5 finished with value: 0.3512179599234252 and parameters: {'scale_pos_weight': 3.2396470134801487, 'depth': 5, 'min_child_samples': 17, 'subsample': 0.7040756461347045, 'colsample_bylevel': 0.8097253101312645, 'reg_lambda': 4.036047616874699, 'learning_rate': 0.09316810940644456}. Best is trial 2 with value: 0.2830682937035806.


0:	learn: 0.6832655	total: 97.1ms	remaining: 1m 37s
1:	learn: 0.6744066	total: 200ms	remaining: 1m 39s
2:	learn: 0.6666279	total: 314ms	remaining: 1m 44s
3:	learn: 0.6599205	total: 428ms	remaining: 1m 46s
4:	learn: 0.6538449	total: 515ms	remaining: 1m 42s
5:	learn: 0.6483910	total: 595ms	remaining: 1m 38s
6:	learn: 0.6431885	total: 692ms	remaining: 1m 38s
7:	learn: 0.6386315	total: 792ms	remaining: 1m 38s
8:	learn: 0.6347723	total: 911ms	remaining: 1m 40s
9:	learn: 0.6311332	total: 1.01s	remaining: 1m 40s
10:	learn: 0.6279441	total: 1.13s	remaining: 1m 41s
11:	learn: 0.6251857	total: 1.26s	remaining: 1m 44s
12:	learn: 0.6223386	total: 1.37s	remaining: 1m 44s
13:	learn: 0.6198907	total: 1.46s	remaining: 1m 42s
14:	learn: 0.6176914	total: 1.55s	remaining: 1m 41s
15:	learn: 0.6156838	total: 1.64s	remaining: 1m 41s
16:	learn: 0.6136155	total: 1.74s	remaining: 1m 40s
17:	learn: 0.6119731	total: 1.83s	remaining: 1m 39s
18:	learn: 0.6104108	total: 1.92s	remaining: 1m 39s
19:	learn: 0.6089740	

[I 2023-12-07 19:26:54,733] Trial 6 finished with value: 0.4808542436323911 and parameters: {'scale_pos_weight': 6.588108142138871, 'depth': 9, 'min_child_samples': 5, 'subsample': 0.5382493436284026, 'colsample_bylevel': 0.8884218686562573, 'reg_lambda': 0.06676605234552623, 'learning_rate': 0.054055822478811556}. Best is trial 2 with value: 0.2830682937035806.


0:	learn: 0.6720511	total: 84.6ms	remaining: 1m 24s
1:	learn: 0.6553207	total: 164ms	remaining: 1m 21s
2:	learn: 0.6410429	total: 251ms	remaining: 1m 23s
3:	learn: 0.6295748	total: 340ms	remaining: 1m 24s
4:	learn: 0.6205811	total: 447ms	remaining: 1m 28s
5:	learn: 0.6125279	total: 540ms	remaining: 1m 29s
6:	learn: 0.6059493	total: 629ms	remaining: 1m 29s
7:	learn: 0.6003648	total: 718ms	remaining: 1m 28s
8:	learn: 0.5955242	total: 810ms	remaining: 1m 29s
9:	learn: 0.5917566	total: 890ms	remaining: 1m 28s
10:	learn: 0.5877879	total: 977ms	remaining: 1m 27s
11:	learn: 0.5845962	total: 1.06s	remaining: 1m 27s
12:	learn: 0.5821533	total: 1.17s	remaining: 1m 29s
13:	learn: 0.5798785	total: 1.26s	remaining: 1m 29s
14:	learn: 0.5778137	total: 1.35s	remaining: 1m 28s
15:	learn: 0.5760224	total: 1.43s	remaining: 1m 27s
16:	learn: 0.5744280	total: 1.52s	remaining: 1m 27s
17:	learn: 0.5731921	total: 1.61s	remaining: 1m 27s
18:	learn: 0.5718685	total: 1.7s	remaining: 1m 27s
19:	learn: 0.5704897	t

[I 2023-12-07 19:28:36,486] Trial 7 finished with value: 0.40531189999228856 and parameters: {'scale_pos_weight': 4.6491523969306385, 'depth': 8, 'min_child_samples': 20, 'subsample': 0.7104876956110637, 'colsample_bylevel': 0.767921960420836, 'reg_lambda': 6.681263909358281, 'learning_rate': 0.08350397673763918}. Best is trial 2 with value: 0.2830682937035806.


0:	learn: 0.6787388	total: 71.1ms	remaining: 1m 11s
1:	learn: 0.6670299	total: 140ms	remaining: 1m 9s
2:	learn: 0.6569244	total: 209ms	remaining: 1m 9s
3:	learn: 0.6481810	total: 277ms	remaining: 1m 8s
4:	learn: 0.6409051	total: 348ms	remaining: 1m 9s
5:	learn: 0.6342432	total: 418ms	remaining: 1m 9s
6:	learn: 0.6290203	total: 486ms	remaining: 1m 8s
7:	learn: 0.6238780	total: 558ms	remaining: 1m 9s
8:	learn: 0.6195497	total: 630ms	remaining: 1m 9s
9:	learn: 0.6157910	total: 709ms	remaining: 1m 10s
10:	learn: 0.6125580	total: 776ms	remaining: 1m 9s
11:	learn: 0.6097622	total: 858ms	remaining: 1m 10s
12:	learn: 0.6070378	total: 978ms	remaining: 1m 14s
13:	learn: 0.6047435	total: 1.05s	remaining: 1m 14s
14:	learn: 0.6029626	total: 1.12s	remaining: 1m 13s
15:	learn: 0.6011637	total: 1.19s	remaining: 1m 13s
16:	learn: 0.5994865	total: 1.26s	remaining: 1m 12s
17:	learn: 0.5981282	total: 1.33s	remaining: 1m 12s
18:	learn: 0.5968739	total: 1.4s	remaining: 1m 12s
19:	learn: 0.5954381	total: 1.4

[I 2023-12-07 19:29:57,713] Trial 8 finished with value: 0.4439018552920213 and parameters: {'scale_pos_weight': 5.561429680742217, 'depth': 7, 'min_child_samples': 4, 'subsample': 0.5354117446491379, 'colsample_bylevel': 0.8276156058625637, 'reg_lambda': 0.054142304367127414, 'learning_rate': 0.06992961729713851}. Best is trial 2 with value: 0.2830682937035806.


0:	learn: 0.6803563	total: 85.7ms	remaining: 1m 25s
1:	learn: 0.6695358	total: 169ms	remaining: 1m 24s
2:	learn: 0.6607088	total: 250ms	remaining: 1m 22s
3:	learn: 0.6530857	total: 334ms	remaining: 1m 23s
4:	learn: 0.6469842	total: 421ms	remaining: 1m 23s
5:	learn: 0.6414940	total: 504ms	remaining: 1m 23s
6:	learn: 0.6369343	total: 596ms	remaining: 1m 24s
7:	learn: 0.6331264	total: 684ms	remaining: 1m 24s
8:	learn: 0.6297564	total: 776ms	remaining: 1m 25s
9:	learn: 0.6269844	total: 863ms	remaining: 1m 25s
10:	learn: 0.6244494	total: 950ms	remaining: 1m 25s
11:	learn: 0.6224601	total: 1.03s	remaining: 1m 25s
12:	learn: 0.6204255	total: 1.13s	remaining: 1m 25s
13:	learn: 0.6186430	total: 1.22s	remaining: 1m 25s
14:	learn: 0.6173067	total: 1.34s	remaining: 1m 28s
15:	learn: 0.6158899	total: 1.46s	remaining: 1m 29s
16:	learn: 0.6148484	total: 1.57s	remaining: 1m 31s
17:	learn: 0.6136400	total: 1.69s	remaining: 1m 32s
18:	learn: 0.6125481	total: 1.84s	remaining: 1m 34s
19:	learn: 0.6115062	

[I 2023-12-07 19:31:35,517] Trial 9 finished with value: 0.5816983707975028 and parameters: {'scale_pos_weight': 9.382967152571009, 'depth': 7, 'min_child_samples': 11, 'subsample': 0.9207157458052505, 'colsample_bylevel': 0.7363582753680502, 'reg_lambda': 3.273499696944755, 'learning_rate': 0.08637800568122628}. Best is trial 2 with value: 0.2830682937035806.


0:	learn: 0.6794673	total: 135ms	remaining: 2m 14s
1:	learn: 0.6672521	total: 267ms	remaining: 2m 13s
2:	learn: 0.6556750	total: 444ms	remaining: 2m 27s
3:	learn: 0.6451942	total: 594ms	remaining: 2m 27s
4:	learn: 0.6352652	total: 755ms	remaining: 2m 30s
5:	learn: 0.6262592	total: 881ms	remaining: 2m 25s
6:	learn: 0.6181070	total: 1.02s	remaining: 2m 24s
7:	learn: 0.6102965	total: 1.16s	remaining: 2m 23s
8:	learn: 0.6032783	total: 1.3s	remaining: 2m 23s
9:	learn: 0.5966585	total: 1.44s	remaining: 2m 22s
10:	learn: 0.5904434	total: 1.57s	remaining: 2m 21s
11:	learn: 0.5849014	total: 1.7s	remaining: 2m 20s
12:	learn: 0.5798140	total: 1.84s	remaining: 2m 19s
13:	learn: 0.5747700	total: 1.97s	remaining: 2m 18s
14:	learn: 0.5704709	total: 2.13s	remaining: 2m 19s
15:	learn: 0.5663370	total: 2.27s	remaining: 2m 19s
16:	learn: 0.5623369	total: 2.41s	remaining: 2m 19s
17:	learn: 0.5587133	total: 2.54s	remaining: 2m 18s
18:	learn: 0.5553180	total: 2.68s	remaining: 2m 18s
19:	learn: 0.5523051	tot

[I 2023-12-07 19:34:00,347] Trial 10 finished with value: 0.34763114543376705 and parameters: {'scale_pos_weight': 3.207924010390244, 'depth': 10, 'min_child_samples': 14, 'subsample': 0.8291946996844023, 'colsample_bylevel': 0.6609991421058322, 'reg_lambda': 7.625407067325823, 'learning_rate': 0.03401647879224588}. Best is trial 2 with value: 0.2830682937035806.


0:	learn: 0.6501427	total: 76.9ms	remaining: 1m 16s
1:	learn: 0.6123916	total: 165ms	remaining: 1m 22s
2:	learn: 0.5793315	total: 241ms	remaining: 1m 20s
3:	learn: 0.5506431	total: 314ms	remaining: 1m 18s
4:	learn: 0.5246419	total: 391ms	remaining: 1m 17s
5:	learn: 0.5013292	total: 455ms	remaining: 1m 15s
6:	learn: 0.4800698	total: 522ms	remaining: 1m 14s
7:	learn: 0.4617813	total: 587ms	remaining: 1m 12s
8:	learn: 0.4466860	total: 645ms	remaining: 1m 11s
9:	learn: 0.4314820	total: 709ms	remaining: 1m 10s
10:	learn: 0.4189438	total: 771ms	remaining: 1m 9s
11:	learn: 0.4067647	total: 836ms	remaining: 1m 8s
12:	learn: 0.3964864	total: 896ms	remaining: 1m 8s
13:	learn: 0.3881239	total: 959ms	remaining: 1m 7s
14:	learn: 0.3796794	total: 1.02s	remaining: 1m 7s
15:	learn: 0.3721279	total: 1.08s	remaining: 1m 6s
16:	learn: 0.3659264	total: 1.15s	remaining: 1m 6s
17:	learn: 0.3600033	total: 1.21s	remaining: 1m 5s
18:	learn: 0.3550848	total: 1.27s	remaining: 1m 5s
19:	learn: 0.3501547	total: 1.

[I 2023-12-07 19:35:08,580] Trial 11 finished with value: 0.28695964829195697 and parameters: {'scale_pos_weight': 1.0124590428331608, 'depth': 3, 'min_child_samples': 15, 'subsample': 0.9965527279976341, 'colsample_bylevel': 0.9844047623241325, 'reg_lambda': 6.07647840830704, 'learning_rate': 0.04607313885698084}. Best is trial 2 with value: 0.2830682937035806.


0:	learn: 0.6654462	total: 69.5ms	remaining: 1m 9s
1:	learn: 0.6406244	total: 141ms	remaining: 1m 10s
2:	learn: 0.6174620	total: 213ms	remaining: 1m 10s
3:	learn: 0.5967892	total: 288ms	remaining: 1m 11s
4:	learn: 0.5784543	total: 364ms	remaining: 1m 12s
5:	learn: 0.5617759	total: 442ms	remaining: 1m 13s
6:	learn: 0.5465653	total: 517ms	remaining: 1m 13s
7:	learn: 0.5326430	total: 596ms	remaining: 1m 13s
8:	learn: 0.5195109	total: 675ms	remaining: 1m 14s
9:	learn: 0.5080241	total: 752ms	remaining: 1m 14s
10:	learn: 0.4976206	total: 827ms	remaining: 1m 14s
11:	learn: 0.4882768	total: 901ms	remaining: 1m 14s
12:	learn: 0.4795494	total: 979ms	remaining: 1m 14s
13:	learn: 0.4716118	total: 1.1s	remaining: 1m 17s
14:	learn: 0.4647489	total: 1.21s	remaining: 1m 19s
15:	learn: 0.4586642	total: 1.28s	remaining: 1m 19s
16:	learn: 0.4526332	total: 1.36s	remaining: 1m 18s
17:	learn: 0.4478294	total: 1.43s	remaining: 1m 18s
18:	learn: 0.4427684	total: 1.5s	remaining: 1m 17s
19:	learn: 0.4384677	tot

[I 2023-12-07 19:36:28,938] Trial 12 finished with value: 0.2929045759095017 and parameters: {'scale_pos_weight': 1.576944884480049, 'depth': 5, 'min_child_samples': 13, 'subsample': 0.9820969726200324, 'colsample_bylevel': 0.9846208220325542, 'reg_lambda': 6.0614631042340905, 'learning_rate': 0.0390497672673001}. Best is trial 2 with value: 0.2830682937035806.


0:	learn: 0.6565755	total: 67.2ms	remaining: 1m 7s
1:	learn: 0.6240340	total: 136ms	remaining: 1m 8s
2:	learn: 0.5946047	total: 207ms	remaining: 1m 8s
3:	learn: 0.5683271	total: 304ms	remaining: 1m 15s
4:	learn: 0.5448624	total: 418ms	remaining: 1m 23s
5:	learn: 0.5241701	total: 518ms	remaining: 1m 25s
6:	learn: 0.5054651	total: 605ms	remaining: 1m 25s
7:	learn: 0.4896747	total: 682ms	remaining: 1m 24s
8:	learn: 0.4738612	total: 759ms	remaining: 1m 23s
9:	learn: 0.4605696	total: 841ms	remaining: 1m 23s
10:	learn: 0.4481678	total: 912ms	remaining: 1m 22s
11:	learn: 0.4365240	total: 988ms	remaining: 1m 21s
12:	learn: 0.4257426	total: 1.06s	remaining: 1m 20s
13:	learn: 0.4161763	total: 1.13s	remaining: 1m 19s
14:	learn: 0.4074654	total: 1.2s	remaining: 1m 18s
15:	learn: 0.3995004	total: 1.27s	remaining: 1m 18s
16:	learn: 0.3923802	total: 1.34s	remaining: 1m 17s
17:	learn: 0.3863855	total: 1.42s	remaining: 1m 17s
18:	learn: 0.3807993	total: 1.5s	remaining: 1m 17s
19:	learn: 0.3754478	total

[I 2023-12-07 19:37:52,558] Trial 13 finished with value: 0.28412938779799246 and parameters: {'scale_pos_weight': 1.1053092681742818, 'depth': 5, 'min_child_samples': 15, 'subsample': 0.8655341168057921, 'colsample_bylevel': 0.6337703602552688, 'reg_lambda': 9.599245044680334, 'learning_rate': 0.04015203111073107}. Best is trial 2 with value: 0.2830682937035806.


0:	learn: 0.6834177	total: 69.7ms	remaining: 1m 9s
1:	learn: 0.6746220	total: 140ms	remaining: 1m 10s
2:	learn: 0.6656862	total: 258ms	remaining: 1m 25s
3:	learn: 0.6567852	total: 332ms	remaining: 1m 22s
4:	learn: 0.6492813	total: 400ms	remaining: 1m 19s
5:	learn: 0.6417207	total: 479ms	remaining: 1m 19s
6:	learn: 0.6345323	total: 549ms	remaining: 1m 17s
7:	learn: 0.6279609	total: 620ms	remaining: 1m 16s
8:	learn: 0.6215722	total: 693ms	remaining: 1m 16s
9:	learn: 0.6156941	total: 766ms	remaining: 1m 15s
10:	learn: 0.6101135	total: 835ms	remaining: 1m 15s
11:	learn: 0.6049031	total: 905ms	remaining: 1m 14s
12:	learn: 0.5998851	total: 973ms	remaining: 1m 13s
13:	learn: 0.5951622	total: 1.04s	remaining: 1m 13s
14:	learn: 0.5904824	total: 1.12s	remaining: 1m 13s
15:	learn: 0.5860408	total: 1.19s	remaining: 1m 13s
16:	learn: 0.5819130	total: 1.26s	remaining: 1m 12s
17:	learn: 0.5780564	total: 1.33s	remaining: 1m 12s
18:	learn: 0.5741763	total: 1.41s	remaining: 1m 12s
19:	learn: 0.5707590	t

[I 2023-12-07 19:39:14,670] Trial 14 finished with value: 0.34483423450560774 and parameters: {'scale_pos_weight': 2.9037919555069944, 'depth': 5, 'min_child_samples': 10, 'subsample': 0.8481145066472896, 'colsample_bylevel': 0.6306372349949931, 'reg_lambda': 9.969297091677698, 'learning_rate': 0.023893554449801166}. Best is trial 2 with value: 0.2830682937035806.
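
The trial summaries logged above are easier to compare once collected into a ranked list. The following is a minimal sketch using only the standard library. It assumes the Optuna log line format exactly as it appears in this output, and `parse_trials` is a hypothetical helper, not part of Optuna. Since the log keeps the lowest value as "Best", the ranking is ascending.

```python
import re

# Assumption: log lines look like the Optuna output shown above, i.e.
# "Trial <n> finished with value: <float> and parameters: {...}".
TRIAL_RE = re.compile(r"Trial (\d+) finished with value: ([0-9.eE+-]+) and parameters")


def parse_trials(log_text):
    """Return (trial_number, value) pairs found in the log, lowest value first."""
    pairs = [(int(number), float(value)) for number, value in TRIAL_RE.findall(log_text)]
    return sorted(pairs, key=lambda pair: pair[1])


# Three lines copied from the log above, with the parameter dict shortened.
sample_log = """
[I 2023-12-07 19:21:09,546] Trial 2 finished with value: 0.2830682937035806 and parameters: {...}.
[I 2023-12-07 19:22:19,373] Trial 3 finished with value: 0.28981859293601503 and parameters: {...}.
[I 2023-12-07 19:37:52,558] Trial 13 finished with value: 0.28412938779799246 and parameters: {...}.
"""

ranked = parse_trials(sample_log)
for trial_number, value in ranked:
    print(f"trial {trial_number}: {value:.4f}")
```

When the study object is still in memory, `study.trials_dataframe()` gives the same information without any parsing.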


Tuning did not yield a significant improvement over the default configuration,
so the default model will be used.
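
One way to make "no significant improvement" explicit is a relative-gain threshold. The sketch below is an illustration only: `keep_default`, the 1% threshold, and the example scores are all assumptions, not values taken from this notebook.

```python
def keep_default(default_score, tuned_score, min_rel_gain=0.01, lower_is_better=True):
    """Return True when tuning fails to beat the default by at least min_rel_gain."""
    if default_score == 0:
        raise ValueError("default_score must be non-zero to compute a relative gain")
    if lower_is_better:
        gain = default_score - tuned_score
    else:
        gain = tuned_score - default_score
    return gain / abs(default_score) < min_rel_gain


# Hypothetical scores, for illustration only (not results from this notebook).
print(keep_default(0.50, 0.499))  # relative gain of 0.2% is below 1%: keep the default
print(keep_default(0.50, 0.40))   # relative gain of 20% clears 1%: prefer the tuned model
```

A fixed threshold is a simple choice. Comparing the gain against the spread of cross-validation scores would be a more careful alternative.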