# Tune

A boosting algorithm exposes several hyperparameters that affect model performance. In this step we use the `optuna` library to tune them.

In [1]:
import pandas as pd
import lightgbm as lgbm
import optuna
import pickle

import sys
sys.path.append('../')

from src.utils.modeling import objective


# Data

In [2]:
df_train = pd.read_parquet('../data/encoded/fraud_dataset_v2_train.parquet.gzip')
df_valid = pd.read_parquet('../data/encoded/fraud_dataset_v2_valid.parquet.gzip')

In [3]:
# Load the fitted feature selector; the context manager closes the file handle
with open('../model/encoders/selector.pkl', 'rb') as f:
    selector = pickle.load(f)

# Tune

In [4]:
sampler = optuna.samplers.TPESampler(seed=777)
study = optuna.create_study(direction='maximize', sampler=sampler)

# Transform once, outside the objective, so every trial reuses the same matrices
X_train = selector.transform(df_train)
X_valid = selector.transform(df_valid)

# Keep n_jobs=1: running trials in parallel would break reproducibility of the seeded sampler
study.optimize(
    lambda trial: objective(trial, X_train, df_train['fraude'],
                            [(X_valid, df_valid['fraude'])]),
    n_trials=15,
    n_jobs=1,
)

print("Best params:", study.best_params)

[I 2025-02-14 01:53:35,981] A new study created in memory with name: no-name-6163a031-e161-4eb5-af1f-240455d639a3
[I 2025-02-14 01:53:38,690] Trial 0 finished with value: 0.5933796751703597 and parameters: {'n_estimators': 160, 'learning_rate': 0.027965164266499885, 'num_leaves': 18, 'max_depth': 6, 'min_child_samples': 85}. Best is trial 0 with value: 0.5933796751703597.
[I 2025-02-14 01:53:44,068] Trial 1 finished with value: 0.5865513577219524 and parameters: {'n_estimators': 480, 'learning_rate': 0.11853616726044315, 'num_leaves': 118, 'max_depth': 5, 'min_child_samples': 66}. Best is trial 0 with value: 0.5933796751703597.
[I 2025-02-14 01:53:48,126] Trial 2 finished with value: 0.5868132663777975 and parameters: {'n_estimators': 130, 'learning_rate': 0.013113110332485146, 'num_leaves': 93, 'max_depth': 5, 'min_child_samples': 99}. Best is trial 0 with value: 0.5933796751703597.
[I 2025-02-14 01:53:50,031] Trial 3 finished with value: 0.5904036838109162 and parameters: {'n_estimators': 350, 'learning_rate': 0.10164140628090507, 'num_leaves': 87, 'max_depth': 5, 'min_child_samples': 40}. Best is trial 0 with value: 0.5933796751703597.
[I 2025-02-14 01:53:52,709] Trial 4 finished with value: 0.583958470854301 and parameters: {'n_estimators': 190, 'learning_rate': 0.01885377534053743, 'num_leaves': 65, 'max_depth': 4, 'min_child_samples': 63}. Best is trial 0 with value: 0.5933796751703597.
[I 2025-02-14 01:53:54,988] Trial 5 finished with value: 0.5885856918321463 and parameters: {'n_estimators': 460, 'learning_rate': 0.08303666629012729, 'num_leaves': 45, 'max_depth': 4, 'min_child_samples': 83}. Best is trial 0 with value: 0.5933796751703597.
[I 2025-02-14 01:53:58,566] Trial 6 finished with value: 0.5864209382927908 and parameters: {'n_estimators': 190, 'learning_rate': 0.058004363720065344, 'num_leaves': 83, 'max_depth': 7, 'min_child_samples': 56}. Best is trial 0 with value: 0.5933796751703597.
[I 2025-02-14 01:54:03,722] Trial 7 finished with value: 0.5471464117523308 and parameters: {'n_estimators': 100, 'learning_rate': 0.05945045816068611, 'num_leaves': 136, 'max_depth': 9, 'min_child_samples': 16}. Best is trial 0 with value: 0.5933796751703597.
[I 2025-02-14 01:54:05,940] Trial 8 finished with value: 0.593849787149406 and parameters: {'n_estimators': 220, 'learning_rate': 0.08017390503781506, 'num_leaves': 112, 'max_depth': 6, 'min_child_samples': 78}. Best is trial 8 with value: 0.593849787149406.
[I 2025-02-14 01:54:09,055] Trial 9 finished with value: 0.5908503624628177 and parameters: {'n_estimators': 170, 'learning_rate': 0.06668632616374084, 'num_leaves': 87, 'max_depth': 6, 'min_child_samples': 81}. Best is trial 8 with value: 0.593849787149406.
[I 2025-02-14 01:54:10,046] Trial 10 finished with value: 0.5351183447894208 and parameters: {'n_estimators': 280, 'learning_rate': 0.2995116389699143, 'num_leaves': 117, 'max_depth': 10, 'min_child_samples': 35}. Best is trial 8 with value: 0.593849787149406.
[I 2025-02-14 01:54:13,885] Trial 11 finished with value: 0.5940852055975325 and parameters: {'n_estimators': 270, 'learning_rate': 0.026774200057591912, 'num_leaves': 20, 'max_depth': 7, 'min_child_samples': 100}. Best is trial 11 with value: 0.5940852055975325.
[I 2025-02-14 01:54:16,990] Trial 12 finished with value: 0.5969760419436243 and parameters: {'n_estimators': 280, 'learning_rate': 0.031197196014482127, 'num_leaves': 14, 'max_depth': 8, 'min_child_samples': 100}. Best is trial 12 with value: 0.5969760419436243.
[I 2025-02-14 01:54:20,824] Trial 13 finished with value: 0.5955675836674486 and parameters: {'n_estimators': 330, 'learning_rate': 0.030038712401366924, 'num_leaves': 11, 'max_depth': 8, 'min_child_samples': 97}. Best is trial 12 with value: 0.5969760419436243.
[I 2025-02-14 01:54:24,556] Trial 14 finished with value: 0.5940996747818175 and parameters: {'n_estimators': 360, 'learning_rate': 0.03257597201704711, 'num_leaves': 40, 'max_depth': 8, 'min_child_samples': 95}. Best is trial 12 with value: 0.5969760419436243.

Best params: {'n_estimators': 280, 'learning_rate': 0.031197196014482127, 'num_leaves': 14, 'max_depth': 8, 'min_child_samples': 100}


In [6]:
study.best_params

{'n_estimators': 280,
 'learning_rate': 0.031197196014482127,
 'num_leaves': 14,
 'max_depth': 8,
 'min_child_samples': 100}