# Step 04 – Hyperparameter Optimization with Optuna

After training the models with class balancing and dynamic thresholds, we move on to **hyperparameter optimization** to get the best performance out of each algorithm.

---

## 🎯 Objective

- Identify the best hyperparameters for the following models:
  - Logistic Regression
  - Random Forest
  - LightGBM

- Use the **F1-score of the positive class (employee attrition)** as the evaluation criterion, prioritizing the balance between precision and recall.

---

## ⚙️ Methodology

- We use the **Optuna** framework to search automatically for the best hyperparameters.
- Each model has its own `objective` function.
- The dataset is prepared by the `preparar_dados_rh()` function.
- We run **30 trials** per model to cover the search space reasonably well.

---

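## 🔍 Optimized hyperparameters

Note that the run log in this notebook samples a different, wider space than the one listed here: for example `penalty` for Logistic Regression; `min_samples_split`, `min_samples_leaf`, and `max_features` for Random Forest; and `max_depth`, `min_child_samples`, and the `lambda_l1`/`lambda_l2` regularization terms for LightGBM.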

### 🔹 Logistic Regression
- `C`: inverse regularization strength (log-uniform between 1e-4 and 1e2)
- `solver`: choice between `'liblinear'` and `'lbfgs'`
- `class_weight='balanced'` fixed
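
To make the log-uniform range for `C` concrete, the sketch below draws samples uniformly in log space using only the standard library. The helper `sample_loguniform` is a hypothetical name for this illustration and is not part of Optuna; in Optuna the equivalent request is `trial.suggest_float(..., log=True)`.

```python
import math
import random


def sample_loguniform(low, high, rng):
    """Draw a value whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(42)
samples = [sample_loguniform(1e-4, 1e2, rng) for _ in range(10_000)]

# Count draws per decade: 1e-4..1e-3, 1e-3..1e-2, ..., 1e1..1e2.
decades = [0] * 6
for value in samples:
    index = int(math.floor(math.log10(value))) + 4
    index = max(0, min(index, 5))  # guard against floating-point edge cases
    decades[index] += 1

# Each decade should receive roughly one sixth of the draws.
print([round(count / len(samples), 2) for count in decades])
```

A plain uniform draw over the same interval would place almost every sample above 1, so small values of `C` would rarely be tried.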

### 🔹 Random Forest
- `n_estimators`: number of trees (between 50 and 200)
- `max_depth`: maximum depth of each tree (between 2 and 20)
- `class_weight='balanced'` fixed
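
For intuition on `class_weight='balanced'`: scikit-learn documents the weights as `n_samples / (n_classes * count_of_class)`. The sketch below applies that formula to a hypothetical 80/20 label split; the counts are illustrative and not taken from the project's data, and `balanced_class_weights` is a name invented for this example.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weights as n_samples / (n_classes * class_count), the 'balanced' heuristic."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical labels: 800 employees who stayed (0) and 200 who left (1).
labels = [0] * 800 + [1] * 200
weights = balanced_class_weights(labels)
print(weights)  # {0: 0.625, 1: 2.5}
```

With this split, an error on the minority class weighs four times as much as an error on the majority class.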

### 🔹 LightGBM
- `num_leaves`: number of leaves per tree (between 20 and 150)
- `learning_rate`: learning rate (log-uniform between 1e-3 and 0.2)
- `n_estimators`: number of boosting iterations (between 50 and 300)
- `is_unbalance=True` fixed

---

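## 📈 Evaluation Metric
The metric used during optimization is the **F1-score of class 1**. Because `tunar_modelo_optuna()` receives only `X_train` and `y_train`, each trial is scored on the training data (with cross-validation, per the conclusion), and the test set is reserved for the final report. Note that the run log labels the best value as AUC rather than F1. F1 was chosen because:

- The dataset is imbalanced (only ~20% of employees left).
- We need to capture both **how many attrition cases we manage to predict** (`recall`) and **how accurately we predict them** (`precision`).
- The F1-score is the harmonic mean of the two, so it is high only when both are.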
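
A small worked example of why F1 is more informative than accuracy in this setting. The confusion-matrix counts are hypothetical, chosen to mimic a dataset with about 20% positives, and `precision_recall_f1` is a helper written for this illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 for the positive class from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical test set: 1,000 employees, 200 of whom left (class 1).
tp, fn = 80, 120    # leavers caught vs. missed
fp, tn = 100, 700   # false alarms vs. correctly predicted stayers

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# accuracy=0.78 precision=0.44 recall=0.40 f1=0.42
```

A model that always predicts "stays" would reach 0.80 accuracy on the same data while scoring an F1 of 0 for class 1, which is why accuracy alone is a poor guide here.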

---

## 📌 Results (example)

The values below are illustrative; the figures from the actual run appear in the cell output further down.

```text
✅ Logistic Regression
Best F1-Score: 0.38
Best hyperparameters: {'C': 0.47, 'solver': 'liblinear'}

✅ Random Forest
Best F1-Score: 0.41
Best hyperparameters: {'n_estimators': 180, 'max_depth': 11}

✅ LightGBM
Best F1-Score: 0.44
Best hyperparameters: {'num_leaves': 42, 'learning_rate': 0.07, 'n_estimators': 220}
```
---

## 🧠 Conclusion

Introducing the generic `tunar_modelo_optuna()` function made the optimization process cleaner, more modular, and reusable. The same tuning framework can now be applied to multiple algorithms with very few lines of code, while keeping the cross-validation and evaluation criteria consistent.

Optuna enabled an efficient search guided by the F1-score of the attrition class, producing models better suited to the imbalanced nature of the problem.

The next step is to consolidate the best models, evaluate them with complementary metrics (AUC, precision-recall curve), and apply interpretability techniques to support HR decision-making.

In [1]:
from sklearn.metrics import classification_report
import sys
import os
sys.path.append(os.path.abspath("../"))
from scripts.func import preparar_dados_rh, tunar_modelo_optuna

# Prepare the data
df_encoded, X_train, X_test, y_train, y_test = preparar_dados_rh(qtd_amostras=100_000, semente=42)

# Tune the models
model_lgbm, study_lgbm = tunar_modelo_optuna(modelo_nome="lightgbm", X_train=X_train, y_train=y_train, n_trials=30)
model_rf, study_rf = tunar_modelo_optuna(modelo_nome="random_forest", X_train=X_train, y_train=y_train, n_trials=30)
model_lr, study_lr = tunar_modelo_optuna(modelo_nome="logistic_regression", X_train=X_train, y_train=y_train, n_trials=30)

print("\n📌 Evaluation on the test set:")
for nome, modelo in zip(["LightGBM", "Random Forest", "Logistic Regression"],
                        [model_lgbm, model_rf, model_lr]):
    print(f"\n📍 {nome}")
    y_pred = modelo.predict(X_test)
    print(classification_report(y_test, y_pred, zero_division=0))

[I 2025-07-02 03:28:11,831] A new study created in memory with name: Tuning_lightgbm
Best trial: 0. Best value: 0.624425:   3%|▎         | 1/30 [01:20<38:57, 80.61s/it]

[I 2025-07-02 03:29:32,453] Trial 0 finished with value: 0.6244254275568708 and parameters: {'learning_rate': 0.008468008575248327, 'num_leaves': 286, 'max_depth': 12, 'min_child_samples': 64, 'feature_fraction': 0.6624074561769746, 'bagging_fraction': 0.662397808134481, 'bagging_freq': 1, 'lambda_l1': 4.330880728874676, 'lambda_l2': 3.005575058716044}. Best is trial 0 with value: 0.6244254275568708.


Best trial: 0. Best value: 0.624425:   7%|▋         | 2/30 [01:31<18:27, 39.54s/it]

[I 2025-07-02 03:29:43,239] Trial 1 finished with value: 0.6190044641435256 and parameters: {'learning_rate': 0.05675206026988745, 'num_leaves': 15, 'max_depth': 15, 'min_child_samples': 85, 'feature_fraction': 0.6849356442713105, 'bagging_fraction': 0.6727299868828402, 'bagging_freq': 2, 'lambda_l1': 1.5212112147976886, 'lambda_l2': 2.6237821581611893}. Best is trial 0 with value: 0.6244254275568708.


Best trial: 0. Best value: 0.624425:  10%|█         | 3/30 [02:21<19:57, 44.35s/it]

[I 2025-07-02 03:30:33,320] Trial 2 finished with value: 0.6239762352475035 and parameters: {'learning_rate': 0.01174843954800703, 'num_leaves': 94, 'max_depth': 10, 'min_child_samples': 22, 'feature_fraction': 0.7168578594140872, 'bagging_fraction': 0.7465447373174766, 'bagging_freq': 5, 'lambda_l1': 3.925879806965068, 'lambda_l2': 0.9983689107917987}. Best is trial 0 with value: 0.6244254275568708.


Best trial: 3. Best value: 0.637271:  13%|█▎        | 4/30 [02:27<12:35, 29.05s/it]

[I 2025-07-02 03:30:38,923] Trial 3 finished with value: 0.6372711164756679 and parameters: {'learning_rate': 0.018785426399210624, 'num_leaves': 182, 'max_depth': 3, 'min_child_samples': 65, 'feature_fraction': 0.6682096494749166, 'bagging_fraction': 0.6260206371941118, 'bagging_freq': 10, 'lambda_l1': 4.828160165372797, 'lambda_l2': 4.041986740582305}. Best is trial 3 with value: 0.6372711164756679.


Best trial: 4. Best value: 0.637419:  17%|█▋        | 5/30 [02:48<11:01, 26.46s/it]

[I 2025-07-02 03:31:00,793] Trial 4 finished with value: 0.6374185563387708 and parameters: {'learning_rate': 0.0056828375585122656, 'num_leaves': 38, 'max_depth': 11, 'min_child_samples': 50, 'feature_fraction': 0.6488152939379115, 'bagging_fraction': 0.798070764044508, 'bagging_freq': 1, 'lambda_l1': 4.546602010393911, 'lambda_l2': 1.2938999080000846}. Best is trial 4 with value: 0.6374185563387708.


Best trial: 4. Best value: 0.637419:  20%|██        | 6/30 [03:32<12:54, 32.26s/it]

[I 2025-07-02 03:31:44,301] Trial 5 finished with value: 0.597352063022227 and parameters: {'learning_rate': 0.043767126303409544, 'num_leaves': 100, 'max_depth': 9, 'min_child_samples': 59, 'feature_fraction': 0.6739417822102108, 'bagging_fraction': 0.9878338511058234, 'bagging_freq': 8, 'lambda_l1': 4.697494707820946, 'lambda_l2': 4.474136752138244}. Best is trial 4 with value: 0.6374185563387708.


Best trial: 4. Best value: 0.637419:  23%|██▎       | 7/30 [03:41<09:30, 24.80s/it]

[I 2025-07-02 03:31:53,751] Trial 6 finished with value: 0.629896068624351 and parameters: {'learning_rate': 0.03027406546455652, 'num_leaves': 278, 'max_depth': 4, 'min_child_samples': 27, 'feature_fraction': 0.6180909155642152, 'bagging_fraction': 0.7301321323053057, 'bagging_freq': 4, 'lambda_l1': 1.3567451588694794, 'lambda_l2': 4.143687545759647}. Best is trial 4 with value: 0.6374185563387708.


Best trial: 4. Best value: 0.637419:  27%|██▋       | 8/30 [04:26<11:22, 31.04s/it]

[I 2025-07-02 03:32:38,141] Trial 7 finished with value: 0.6270182614008155 and parameters: {'learning_rate': 0.0076510536667541975, 'num_leaves': 91, 'max_depth': 10, 'min_child_samples': 22, 'feature_fraction': 0.9208787923016158, 'bagging_fraction': 0.6298202574719083, 'bagging_freq': 10, 'lambda_l1': 3.861223846483287, 'lambda_l2': 0.993578407670862}. Best is trial 4 with value: 0.6374185563387708.


Best trial: 4. Best value: 0.637419:  30%|███       | 9/30 [05:29<14:21, 41.05s/it]

[I 2025-07-02 03:33:41,196] Trial 8 finished with value: 0.6350294641093938 and parameters: {'learning_rate': 0.0010319982330247674, 'num_leaves': 247, 'max_depth': 12, 'min_child_samples': 76, 'feature_fraction': 0.9085081386743783, 'bagging_fraction': 0.6296178606936361, 'bagging_freq': 4, 'lambda_l1': 0.5793452976256486, 'lambda_l2': 4.315517129377968}. Best is trial 4 with value: 0.6374185563387708.


Best trial: 4. Best value: 0.637419:  33%|███▎      | 10/30 [05:35<10:05, 30.26s/it]

[I 2025-07-02 03:33:47,314] Trial 9 finished with value: 0.6345610962870348 and parameters: {'learning_rate': 0.03499331111708852, 'num_leaves': 106, 'max_depth': 3, 'min_child_samples': 38, 'feature_fraction': 0.7300733288106988, 'bagging_fraction': 0.8918424713352257, 'bagging_freq': 7, 'lambda_l1': 4.436063712881633, 'lambda_l2': 2.3610746258097466}. Best is trial 4 with value: 0.6374185563387708.


Best trial: 4. Best value: 0.637419:  37%|███▋      | 11/30 [05:44<07:28, 23.62s/it]

[I 2025-07-02 03:33:55,861] Trial 10 finished with value: 0.591895151284174 and parameters: {'learning_rate': 0.2193905022254065, 'num_leaves': 11, 'max_depth': 7, 'min_child_samples': 44, 'feature_fraction': 0.8144754354791038, 'bagging_fraction': 0.8429521169792499, 'bagging_freq': 2, 'lambda_l1': 3.067811097520366, 'lambda_l2': 0.23696217751263182}. Best is trial 4 with value: 0.6374185563387708.


Best trial: 11. Best value: 0.638355:  40%|████      | 12/30 [06:02<06:38, 22.12s/it]

[I 2025-07-02 03:34:14,546] Trial 11 finished with value: 0.6383553507723262 and parameters: {'learning_rate': 0.0023770372217125028, 'num_leaves': 198, 'max_depth': 6, 'min_child_samples': 70, 'feature_fraction': 0.7878496840385453, 'bagging_fraction': 0.7927585318653319, 'bagging_freq': 10, 'lambda_l1': 3.035921370166128, 'lambda_l2': 1.7067080470293317}. Best is trial 11 with value: 0.6383553507723262.


Best trial: 12. Best value: 0.638478:  43%|████▎     | 13/30 [06:21<05:56, 20.98s/it]

[I 2025-07-02 03:34:32,903] Trial 12 finished with value: 0.638478483822221 and parameters: {'learning_rate': 0.002254590549186906, 'num_leaves': 197, 'max_depth': 6, 'min_child_samples': 99, 'feature_fraction': 0.8244367020014848, 'bagging_fraction': 0.8141268371993765, 'bagging_freq': 7, 'lambda_l1': 2.864265403136679, 'lambda_l2': 1.6921629101016422}. Best is trial 12 with value: 0.638478483822221.


Best trial: 12. Best value: 0.638478:  47%|████▋     | 14/30 [06:39<05:22, 20.16s/it]

[I 2025-07-02 03:34:51,183] Trial 13 finished with value: 0.6384519115464026 and parameters: {'learning_rate': 0.0017345515600983236, 'num_leaves': 199, 'max_depth': 6, 'min_child_samples': 100, 'feature_fraction': 0.825610090928053, 'bagging_fraction': 0.9126738533684695, 'bagging_freq': 8, 'lambda_l1': 2.8008803524924515, 'lambda_l2': 1.8622066402666226}. Best is trial 12 with value: 0.638478483822221.


Best trial: 12. Best value: 0.638478:  50%|█████     | 15/30 [07:03<05:19, 21.31s/it]

[I 2025-07-02 03:35:15,141] Trial 14 finished with value: 0.6380747155452836 and parameters: {'learning_rate': 0.0010778297218068933, 'num_leaves': 219, 'max_depth': 7, 'min_child_samples': 100, 'feature_fraction': 0.852433737415586, 'bagging_fraction': 0.9111341683349905, 'bagging_freq': 7, 'lambda_l1': 2.2458480892846575, 'lambda_l2': 2.0789298723577474}. Best is trial 12 with value: 0.638478483822221.


Best trial: 15. Best value: 0.639343:  53%|█████▎    | 16/30 [07:17<04:26, 19.07s/it]

[I 2025-07-02 03:35:29,000] Trial 15 finished with value: 0.6393428909839372 and parameters: {'learning_rate': 0.0025772946144344043, 'num_leaves': 142, 'max_depth': 5, 'min_child_samples': 96, 'feature_fraction': 0.8670586836589573, 'bagging_fraction': 0.9705384491631063, 'bagging_freq': 8, 'lambda_l1': 2.32819966143344, 'lambda_l2': 3.2579587851645835}. Best is trial 15 with value: 0.6393428909839372.


Best trial: 16. Best value: 0.639373:  57%|█████▋    | 17/30 [07:32<03:52, 17.85s/it]

[I 2025-07-02 03:35:44,023] Trial 16 finished with value: 0.6393732826732426 and parameters: {'learning_rate': 0.0035674081664957107, 'num_leaves': 147, 'max_depth': 5, 'min_child_samples': 86, 'feature_fraction': 0.87498687476921, 'bagging_fraction': 0.9787219705654253, 'bagging_freq': 6, 'lambda_l1': 1.9910527639025013, 'lambda_l2': 3.3425770772126717}. Best is trial 16 with value: 0.6393732826732426.


Best trial: 16. Best value: 0.639373:  60%|██████    | 18/30 [07:48<03:26, 17.25s/it]

[I 2025-07-02 03:35:59,868] Trial 17 finished with value: 0.6384040750148164 and parameters: {'learning_rate': 0.004370267382550643, 'num_leaves': 137, 'max_depth': 5, 'min_child_samples': 84, 'feature_fraction': 0.9562561635055674, 'bagging_fraction': 0.9995781511268159, 'bagging_freq': 6, 'lambda_l1': 1.8089740851242242, 'lambda_l2': 3.3561848367527602}. Best is trial 16 with value: 0.6393732826732426.


Best trial: 16. Best value: 0.639373:  63%|██████▎   | 19/30 [08:29<04:29, 24.47s/it]

[I 2025-07-02 03:36:41,167] Trial 18 finished with value: 0.6332412974717977 and parameters: {'learning_rate': 0.0034908446951837343, 'num_leaves': 141, 'max_depth': 8, 'min_child_samples': 88, 'feature_fraction': 0.8768205286194898, 'bagging_fraction': 0.9566218932844948, 'bagging_freq': 9, 'lambda_l1': 0.7819000372264544, 'lambda_l2': 3.5268490753633954}. Best is trial 16 with value: 0.6393732826732426.


Best trial: 16. Best value: 0.639373:  67%|██████▋   | 20/30 [08:38<03:19, 19.99s/it]

[I 2025-07-02 03:36:50,710] Trial 19 finished with value: 0.6124740889333004 and parameters: {'learning_rate': 0.09451459461562424, 'num_leaves': 161, 'max_depth': 4, 'min_child_samples': 76, 'feature_fraction': 0.9845798676688794, 'bagging_fraction': 0.9414689916318519, 'bagging_freq': 5, 'lambda_l1': 2.268302678751437, 'lambda_l2': 4.819497065799107}. Best is trial 16 with value: 0.6393732826732426.


Best trial: 16. Best value: 0.639373:  70%|███████   | 21/30 [09:10<03:31, 23.45s/it]

[I 2025-07-02 03:37:22,228] Trial 20 finished with value: 0.6267083604941197 and parameters: {'learning_rate': 0.01440673090433955, 'num_leaves': 55, 'max_depth': 15, 'min_child_samples': 90, 'feature_fraction': 0.766983112121334, 'bagging_fraction': 0.8592568806861586, 'bagging_freq': 6, 'lambda_l1': 0.05810480813776442, 'lambda_l2': 3.5835953820045923}. Best is trial 16 with value: 0.6393732826732426.


Best trial: 16. Best value: 0.639373:  73%|███████▎  | 22/30 [09:23<02:43, 20.48s/it]

[I 2025-07-02 03:37:35,778] Trial 21 finished with value: 0.6393269782001808 and parameters: {'learning_rate': 0.0022887593349744076, 'num_leaves': 235, 'max_depth': 5, 'min_child_samples': 95, 'feature_fraction': 0.864366223935975, 'bagging_fraction': 0.9578721599964157, 'bagging_freq': 8, 'lambda_l1': 3.525885802381468, 'lambda_l2': 2.848313073435282}. Best is trial 16 with value: 0.6393732826732426.


Best trial: 22. Best value: 0.639406:  77%|███████▋  | 23/30 [09:38<02:09, 18.56s/it]

[I 2025-07-02 03:37:49,876] Trial 22 finished with value: 0.6394061143120777 and parameters: {'learning_rate': 0.003255892721730623, 'num_leaves': 157, 'max_depth': 5, 'min_child_samples': 92, 'feature_fraction': 0.8648969096225119, 'bagging_fraction': 0.961687083582021, 'bagging_freq': 8, 'lambda_l1': 3.572146420534935, 'lambda_l2': 3.091860286683792}. Best is trial 22 with value: 0.6394061143120777.


Best trial: 23. Best value: 0.6404:  80%|████████  | 24/30 [09:47<01:34, 15.80s/it]  

[I 2025-07-02 03:37:59,221] Trial 23 finished with value: 0.6403998207666525 and parameters: {'learning_rate': 0.0041826257775779, 'num_leaves': 126, 'max_depth': 4, 'min_child_samples': 79, 'feature_fraction': 0.9074420621691284, 'bagging_fraction': 0.9796284662391809, 'bagging_freq': 9, 'lambda_l1': 1.9825679994186263, 'lambda_l2': 3.170064962868606}. Best is trial 23 with value: 0.6403998207666525.


Best trial: 23. Best value: 0.6404:  83%|████████▎ | 25/30 [09:56<01:09, 13.84s/it]

[I 2025-07-02 03:38:08,485] Trial 24 finished with value: 0.6403026381930884 and parameters: {'learning_rate': 0.005077313720642067, 'num_leaves': 123, 'max_depth': 4, 'min_child_samples': 76, 'feature_fraction': 0.9193848182959573, 'bagging_fraction': 0.932895657334162, 'bagging_freq': 9, 'lambda_l1': 1.842231609610077, 'lambda_l2': 3.81574611631762}. Best is trial 23 with value: 0.6403998207666525.


Best trial: 25. Best value: 0.641008:  87%|████████▋ | 26/30 [10:02<00:46, 11.53s/it]

[I 2025-07-02 03:38:14,618] Trial 25 finished with value: 0.6410081191509109 and parameters: {'learning_rate': 0.00599475639768284, 'num_leaves': 118, 'max_depth': 3, 'min_child_samples': 77, 'feature_fraction': 0.924543232663553, 'bagging_fraction': 0.9316754076154603, 'bagging_freq': 9, 'lambda_l1': 1.0695487310764085, 'lambda_l2': 3.925802362228854}. Best is trial 25 with value: 0.6410081191509109.


Best trial: 25. Best value: 0.641008:  90%|█████████ | 27/30 [10:08<00:29,  9.90s/it]

[I 2025-07-02 03:38:20,726] Trial 26 finished with value: 0.640967765367825 and parameters: {'learning_rate': 0.006320296204082007, 'num_leaves': 69, 'max_depth': 3, 'min_child_samples': 77, 'feature_fraction': 0.9352954828810333, 'bagging_fraction': 0.879753881292549, 'bagging_freq': 9, 'lambda_l1': 1.115121165910072, 'lambda_l2': 3.7385887639189836}. Best is trial 25 with value: 0.6410081191509109.


Best trial: 25. Best value: 0.641008:  93%|█████████▎| 28/30 [10:15<00:17,  8.78s/it]

[I 2025-07-02 03:38:26,879] Trial 27 finished with value: 0.6402163500091286 and parameters: {'learning_rate': 0.009432034955026808, 'num_leaves': 67, 'max_depth': 3, 'min_child_samples': 55, 'feature_fraction': 0.983676311508369, 'bagging_fraction': 0.8756315596686006, 'bagging_freq': 9, 'lambda_l1': 1.0858473930913755, 'lambda_l2': 3.763034599512568}. Best is trial 25 with value: 0.6410081191509109.


Best trial: 28. Best value: 0.641066:  97%|█████████▋| 29/30 [10:21<00:08,  8.03s/it]

[I 2025-07-02 03:38:33,156] Trial 28 finished with value: 0.6410656170156999 and parameters: {'learning_rate': 0.0057895274890742195, 'num_leaves': 73, 'max_depth': 3, 'min_child_samples': 70, 'feature_fraction': 0.9551883483386383, 'bagging_fraction': 0.9128563041504967, 'bagging_freq': 9, 'lambda_l1': 0.39829367118956727, 'lambda_l2': 4.781042957486154}. Best is trial 28 with value: 0.6410656170156999.


Best trial: 28. Best value: 0.641066: 100%|██████████| 30/30 [10:27<00:00, 20.91s/it]


[I 2025-07-02 03:38:39,237] Trial 29 finished with value: 0.636315280520333 and parameters: {'learning_rate': 0.022477924168153415, 'num_leaves': 77, 'max_depth': 3, 'min_child_samples': 67, 'feature_fraction': 0.9443130856149027, 'bagging_fraction': 0.8485281092713779, 'bagging_freq': 10, 'lambda_l1': 0.05174219484548148, 'lambda_l2': 4.9939983310912455}. Best is trial 28 with value: 0.6410656170156999.

✅ Best AUC: 0.6411
🏆 Best set of hyperparameters:
   - learning_rate: 0.0057895274890742195
   - num_leaves: 73
   - max_depth: 3
   - min_child_samples: 70
   - feature_fraction: 0.9551883483386383
   - bagging_fraction: 0.9128563041504967
   - bagging_freq: 9
   - lambda_l1: 0.39829367118956727
   - lambda_l2: 4.781042957486154


[I 2025-07-02 03:38:39,545] A new study created in memory with name: Tuning_random_forest
Best trial: 0. Best value: 0.624194:   3%|▎         | 1/30 [00:07<03:42,  7.68s/it]

[I 2025-07-02 03:38:47,223] Trial 0 finished with value: 0.6241940649488521 and parameters: {'n_estimators': 250, 'max_depth': 20, 'min_samples_split': 15, 'min_samples_leaf': 6, 'max_features': 'sqrt'}. Best is trial 0 with value: 0.6241940649488521.


Best trial: 0. Best value: 0.624194:   7%|▋         | 2/30 [00:18<04:30,  9.65s/it]

[I 2025-07-02 03:38:58,254] Trial 1 finished with value: 0.619982764121531 and parameters: {'n_estimators': 447, 'max_depth': 13, 'min_samples_split': 15, 'min_samples_leaf': 1, 'max_features': 'sqrt'}. Best is trial 0 with value: 0.6241940649488521.


Best trial: 2. Best value: 0.634725:  10%|█         | 3/30 [00:29<04:33, 10.13s/it]

[I 2025-07-02 03:39:08,946] Trial 2 finished with value: 0.6347251969674369 and parameters: {'n_estimators': 172, 'max_depth': 6, 'min_samples_split': 7, 'min_samples_leaf': 6, 'max_features': None}. Best is trial 2 with value: 0.6347251969674369.


Best trial: 2. Best value: 0.634725:  13%|█▎        | 4/30 [00:32<03:08,  7.23s/it]

[I 2025-07-02 03:39:11,740] Trial 3 finished with value: 0.6321757544006475 and parameters: {'n_estimators': 155, 'max_depth': 8, 'min_samples_split': 8, 'min_samples_leaf': 5, 'max_features': 'sqrt'}. Best is trial 2 with value: 0.6347251969674369.


Best trial: 2. Best value: 0.634725:  17%|█▋        | 5/30 [00:42<03:32,  8.51s/it]

[I 2025-07-02 03:39:22,515] Trial 4 finished with value: 0.6176606255373246 and parameters: {'n_estimators': 337, 'max_depth': 3, 'min_samples_split': 13, 'min_samples_leaf': 2, 'max_features': None}. Best is trial 2 with value: 0.6347251969674369.


Best trial: 2. Best value: 0.634725:  20%|██        | 6/30 [01:17<06:59, 17.49s/it]

[I 2025-07-02 03:39:57,446] Trial 5 finished with value: 0.6309694400201282 and parameters: {'n_estimators': 424, 'max_depth': 8, 'min_samples_split': 3, 'min_samples_leaf': 7, 'max_features': None}. Best is trial 2 with value: 0.6347251969674369.


Best trial: 2. Best value: 0.634725:  23%|██▎       | 7/30 [01:35<06:45, 17.64s/it]

[I 2025-07-02 03:40:15,398] Trial 6 finished with value: 0.6112363524350912 and parameters: {'n_estimators': 113, 'max_depth': 19, 'min_samples_split': 6, 'min_samples_leaf': 7, 'max_features': None}. Best is trial 2 with value: 0.6347251969674369.


Best trial: 2. Best value: 0.634725:  27%|██▋       | 8/30 [02:02<07:32, 20.59s/it]

[I 2025-07-02 03:40:42,295] Trial 7 finished with value: 0.6175279236604316 and parameters: {'n_estimators': 174, 'max_depth': 20, 'min_samples_split': 16, 'min_samples_leaf': 10, 'max_features': None}. Best is trial 2 with value: 0.6347251969674369.


Best trial: 8. Best value: 0.634819:  30%|███       | 9/30 [02:11<05:52, 16.77s/it]

[I 2025-07-02 03:40:50,669] Trial 8 finished with value: 0.634818532234518 and parameters: {'n_estimators': 135, 'max_depth': 6, 'min_samples_split': 2, 'min_samples_leaf': 4, 'max_features': None}. Best is trial 8 with value: 0.634818532234518.


Best trial: 8. Best value: 0.634819:  33%|███▎      | 10/30 [02:30<05:53, 17.66s/it]

[I 2025-07-02 03:41:10,334] Trial 9 finished with value: 0.6306056660929528 and parameters: {'n_estimators': 243, 'max_depth': 8, 'min_samples_split': 12, 'min_samples_leaf': 2, 'max_features': None}. Best is trial 8 with value: 0.634818532234518.


Best trial: 8. Best value: 0.634819:  37%|███▋      | 11/30 [02:38<04:36, 14.54s/it]

[I 2025-07-02 03:41:17,794] Trial 10 finished with value: 0.6235011170941477 and parameters: {'n_estimators': 305, 'max_depth': 13, 'min_samples_split': 20, 'min_samples_leaf': 4, 'max_features': 'log2'}. Best is trial 8 with value: 0.634818532234518.


Best trial: 8. Best value: 0.634819:  40%|████      | 12/30 [02:40<03:12, 10.70s/it]

[I 2025-07-02 03:41:19,700] Trial 11 finished with value: 0.6297922569584208 and parameters: {'n_estimators': 197, 'max_depth': 3, 'min_samples_split': 2, 'min_samples_leaf': 4, 'max_features': 'log2'}. Best is trial 8 with value: 0.634818532234518.


Best trial: 12. Best value: 0.634823:  43%|████▎     | 13/30 [02:46<02:40,  9.42s/it]

[I 2025-07-02 03:41:26,188] Trial 12 finished with value: 0.6348231493193552 and parameters: {'n_estimators': 104, 'max_depth': 6, 'min_samples_split': 7, 'min_samples_leaf': 9, 'max_features': None}. Best is trial 12 with value: 0.6348231493193552.


Best trial: 12. Best value: 0.634823:  47%|████▋     | 14/30 [02:58<02:40, 10.03s/it]

[I 2025-07-02 03:41:37,634] Trial 13 finished with value: 0.6215217231332483 and parameters: {'n_estimators': 104, 'max_depth': 11, 'min_samples_split': 5, 'min_samples_leaf': 10, 'max_features': None}. Best is trial 12 with value: 0.6348231493193552.


Best trial: 12. Best value: 0.634823:  50%|█████     | 15/30 [03:16<03:08, 12.54s/it]

[I 2025-07-02 03:41:55,979] Trial 14 finished with value: 0.6314409716397232 and parameters: {'n_estimators': 361, 'max_depth': 5, 'min_samples_split': 9, 'min_samples_leaf': 8, 'max_features': None}. Best is trial 12 with value: 0.6348231493193552.


Best trial: 12. Best value: 0.634823:  53%|█████▎    | 16/30 [03:21<02:22, 10.21s/it]

[I 2025-07-02 03:42:00,787] Trial 15 finished with value: 0.6287394145215309 and parameters: {'n_estimators': 222, 'max_depth': 11, 'min_samples_split': 4, 'min_samples_leaf': 9, 'max_features': 'log2'}. Best is trial 12 with value: 0.6348231493193552.


Best trial: 12. Best value: 0.634823:  57%|█████▋    | 17/30 [04:33<06:15, 28.89s/it]

[I 2025-07-02 03:43:13,110] Trial 16 finished with value: 0.6046286015702473 and parameters: {'n_estimators': 500, 'max_depth': 16, 'min_samples_split': 10, 'min_samples_leaf': 4, 'max_features': None}. Best is trial 12 with value: 0.6348231493193552.


Best trial: 17. Best value: 0.634895:  60%|██████    | 18/30 [04:41<04:31, 22.67s/it]

[I 2025-07-02 03:43:21,292] Trial 17 finished with value: 0.6348947062496294 and parameters: {'n_estimators': 133, 'max_depth': 6, 'min_samples_split': 2, 'min_samples_leaf': 3, 'max_features': None}. Best is trial 17 with value: 0.6348947062496294.


Best trial: 17. Best value: 0.634895:  63%|██████▎   | 19/30 [04:46<03:10, 17.32s/it]

[I 2025-07-02 03:43:26,154] Trial 18 finished with value: 0.6290375557450055 and parameters: {'n_estimators': 265, 'max_depth': 9, 'min_samples_split': 5, 'min_samples_leaf': 2, 'max_features': 'sqrt'}. Best is trial 17 with value: 0.6348947062496294.


Best trial: 17. Best value: 0.634895:  67%|██████▋   | 20/30 [04:48<02:06, 12.66s/it]

[I 2025-07-02 03:43:27,959] Trial 19 finished with value: 0.6327257477308255 and parameters: {'n_estimators': 137, 'max_depth': 5, 'min_samples_split': 7, 'min_samples_leaf': 8, 'max_features': 'log2'}. Best is trial 17 with value: 0.6348947062496294.


Best trial: 17. Best value: 0.634895:  70%|███████   | 21/30 [05:09<02:15, 15.10s/it]

[I 2025-07-02 03:43:48,742] Trial 20 finished with value: 0.6206317539836002 and parameters: {'n_estimators': 207, 'max_depth': 10, 'min_samples_split': 4, 'min_samples_leaf': 3, 'max_features': None}. Best is trial 17 with value: 0.6348947062496294.


Best trial: 17. Best value: 0.634895:  73%|███████▎  | 22/30 [05:15<01:40, 12.50s/it]

[I 2025-07-02 03:43:55,195] Trial 21 finished with value: 0.6345860506778441 and parameters: {'n_estimators': 103, 'max_depth': 6, 'min_samples_split': 2, 'min_samples_leaf': 5, 'max_features': None}. Best is trial 17 with value: 0.6348947062496294.


Best trial: 17. Best value: 0.634895:  77%|███████▋  | 23/30 [05:24<01:19, 11.40s/it]

[I 2025-07-02 03:44:04,010] Trial 22 finished with value: 0.6347432168492222 and parameters: {'n_estimators': 143, 'max_depth': 6, 'min_samples_split': 2, 'min_samples_leaf': 3, 'max_features': None}. Best is trial 17 with value: 0.6348947062496294.


Best trial: 17. Best value: 0.634895:  80%|████████  | 24/30 [05:30<00:58,  9.76s/it]

[I 2025-07-02 03:44:09,945] Trial 23 finished with value: 0.6277490807164938 and parameters: {'n_estimators': 139, 'max_depth': 4, 'min_samples_split': 4, 'min_samples_leaf': 3, 'max_features': None}. Best is trial 17 with value: 0.6348947062496294.


Best trial: 17. Best value: 0.634895:  83%|████████▎ | 25/30 [05:44<00:55, 11.00s/it]

[I 2025-07-02 03:44:23,850] Trial 24 finished with value: 0.6337397239176389 and parameters: {'n_estimators': 194, 'max_depth': 7, 'min_samples_split': 6, 'min_samples_leaf': 1, 'max_features': None}. Best is trial 17 with value: 0.6348947062496294.


Best trial: 17. Best value: 0.634895:  87%|████████▋ | 26/30 [06:01<00:50, 12.74s/it]

[I 2025-07-02 03:44:40,646] Trial 25 finished with value: 0.6082466871054578 and parameters: {'n_estimators': 131, 'max_depth': 13, 'min_samples_split': 9, 'min_samples_leaf': 4, 'max_features': None}. Best is trial 17 with value: 0.6348947062496294.


Best trial: 17. Best value: 0.634895:  90%|█████████ | 27/30 [06:08<00:33, 11.14s/it]

[I 2025-07-02 03:44:48,051] Trial 26 finished with value: 0.6276495646705637 and parameters: {'n_estimators': 174, 'max_depth': 4, 'min_samples_split': 3, 'min_samples_leaf': 5, 'max_features': None}. Best is trial 17 with value: 0.6348947062496294.


Best trial: 17. Best value: 0.634895:  93%|█████████▎| 28/30 [06:34<00:31, 15.71s/it]

[I 2025-07-02 03:45:14,437] Trial 27 finished with value: 0.6276084134145311 and parameters: {'n_estimators': 293, 'max_depth': 9, 'min_samples_split': 6, 'min_samples_leaf': 7, 'max_features': None}. Best is trial 17 with value: 0.6348947062496294.


Best trial: 17. Best value: 0.634895:  97%|█████████▋| 29/30 [06:41<00:12, 12.86s/it]

[I 2025-07-02 03:45:20,637] Trial 28 finished with value: 0.6148276909920766 and parameters: {'n_estimators': 222, 'max_depth': 15, 'min_samples_split': 11, 'min_samples_leaf': 3, 'max_features': 'sqrt'}. Best is trial 17 with value: 0.6348947062496294.


Best trial: 17. Best value: 0.634895: 100%|██████████| 30/30 [06:44<00:00, 13.48s/it]


[I 2025-07-02 03:45:23,897] Trial 29 finished with value: 0.6327360799057239 and parameters: {'n_estimators': 259, 'max_depth': 5, 'min_samples_split': 3, 'min_samples_leaf': 6, 'max_features': 'log2'}. Best is trial 17 with value: 0.6348947062496294.

✅ Best AUC: 0.6349
🏆 Best set of hyperparameters:
   - n_estimators: 133
   - max_depth: 6
   - min_samples_split: 2
   - min_samples_leaf: 3
   - max_features: None


[I 2025-07-02 03:45:44,716] A new study created in memory with name: Tuning_logistic_regression
Best trial: 0. Best value: 0.613031:   3%|▎         | 1/30 [00:08<04:07,  8.53s/it]

[I 2025-07-02 03:45:53,247] Trial 0 finished with value: 0.6130312218287319 and parameters: {'C': 0.03148911647956861, 'penalty': 'l1'}. Best is trial 0 with value: 0.6130312218287319.


Best trial: 1. Best value: 0.613532:   7%|▋         | 2/30 [00:52<13:48, 29.57s/it]

[I 2025-07-02 03:46:37,552] Trial 1 finished with value: 0.6135315293042184 and parameters: {'C': 0.24810409748678114, 'penalty': 'l1'}. Best is trial 1 with value: 0.6135315293042184.


Best trial: 1. Best value: 0.613532:  10%|█         | 3/30 [00:53<07:20, 16.30s/it]

[I 2025-07-02 03:46:38,062] Trial 2 finished with value: 0.6046400503016972 and parameters: {'C': 0.0017073967431528124, 'penalty': 'l1'}. Best is trial 1 with value: 0.6135315293042184.


Best trial: 1. Best value: 0.613532:  13%|█▎        | 4/30 [00:54<04:27, 10.27s/it]

[I 2025-07-02 03:46:39,083] Trial 3 finished with value: 0.6134738367778446 and parameters: {'C': 0.6796578090758157, 'penalty': 'l2'}. Best is trial 1 with value: 0.6135315293042184.


Best trial: 1. Best value: 0.613532:  17%|█▋        | 5/30 [05:26<43:38, 104.75s/it]

[I 2025-07-02 03:51:11,345] Trial 4 finished with value: 0.6134723036733328 and parameters: {'C': 2.136832907235876, 'penalty': 'l1'}. Best is trial 1 with value: 0.6135315293042184.


Best trial: 1. Best value: 0.613532:  20%|██        | 6/30 [05:27<27:44, 69.36s/it] 

[I 2025-07-02 03:51:12,020] Trial 5 finished with value: 0.6127966820143388 and parameters: {'C': 0.00541524411940254, 'penalty': 'l2'}. Best is trial 1 with value: 0.6135315293042184.


Best trial: 1. Best value: 0.613532:  23%|██▎       | 7/30 [05:28<18:00, 46.96s/it]

[I 2025-07-02 03:51:12,871] Trial 6 finished with value: 0.6135210206801065 and parameters: {'C': 0.05342937261279776, 'penalty': 'l2'}. Best is trial 1 with value: 0.6135315293042184.


Best trial: 1. Best value: 0.613532:  27%|██▋       | 8/30 [05:28<11:48, 32.21s/it]

[I 2025-07-02 03:51:13,492] Trial 7 finished with value: 0.612549734216561 and parameters: {'C': 0.003613894271216527, 'penalty': 'l2'}. Best is trial 1 with value: 0.6135315293042184.


Best trial: 1. Best value: 0.613532:  30%|███       | 9/30 [05:51<10:17, 29.39s/it]

[I 2025-07-02 03:51:36,675] Trial 8 finished with value: 0.6132383098356043 and parameters: {'C': 0.06672367170464209, 'penalty': 'l1'}. Best is trial 1 with value: 0.6135315293042184.


Best trial: 1. Best value: 0.613532:  33%|███▎      | 10/30 [06:27<10:25, 31.27s/it]

[I 2025-07-02 03:52:12,145] Trial 9 finished with value: 0.6134690374632968 and parameters: {'C': 0.11400863701127326, 'penalty': 'l1'}. Best is trial 1 with value: 0.6135315293042184.


Best trial: 1. Best value: 0.613532:  37%|███▋      | 11/30 [13:53<50:07, 158.27s/it]

[I 2025-07-02 03:59:38,366] Trial 10 finished with value: 0.6134601578980878 and parameters: {'C': 6.521702977644381, 'penalty': 'l1'}. Best is trial 1 with value: 0.6135315293042184.


Best trial: 1. Best value: 0.613532:  40%|████      | 12/30 [13:54<33:07, 110.42s/it]

[I 2025-07-02 03:59:39,364] Trial 11 finished with value: 0.613484423633579 and parameters: {'C': 0.39067299376807896, 'penalty': 'l2'}. Best is trial 1 with value: 0.6135315293042184.


Best trial: 1. Best value: 0.613532:  43%|████▎     | 13/30 [13:55<21:52, 77.21s/it] 

[I 2025-07-02 03:59:40,164] Trial 12 finished with value: 0.6133117923587004 and parameters: {'C': 0.016809079170583906, 'penalty': 'l2'}. Best is trial 1 with value: 0.6135315293042184.


Best trial: 1. Best value: 0.613532:  47%|████▋     | 14/30 [13:56<14:27, 54.19s/it]

[I 2025-07-02 03:59:41,170] Trial 13 finished with value: 0.6134995207735617 and parameters: {'C': 0.23569313040910927, 'penalty': 'l2'}. Best is trial 1 with value: 0.6135315293042184.


Best trial: 1. Best value: 0.613532:  50%|█████     | 15/30 [16:46<22:14, 88.98s/it]

[I 2025-07-02 04:02:30,748] Trial 14 finished with value: 0.6134807659695721 and parameters: {'C': 1.3630710204863543, 'penalty': 'l1'}. Best is trial 1 with value: 0.6135315293042184.


Best trial: 1. Best value: 0.613532:  53%|█████▎    | 16/30 [16:46<14:34, 62.44s/it]

[I 2025-07-02 04:02:31,559] Trial 15 finished with value: 0.6132277577389537 and parameters: {'C': 0.013275283883341525, 'penalty': 'l2'}. Best is trial 1 with value: 0.6135315293042184.


Best trial: 1. Best value: 0.613532:  57%|█████▋    | 17/30 [16:47<09:31, 43.97s/it]

[I 2025-07-02 04:02:32,571] Trial 16 finished with value: 0.6135255756343047 and parameters: {'C': 0.10204852842064464, 'penalty': 'l2'}. Best is trial 1 with value: 0.6135315293042184.


Best trial: 1. Best value: 0.613532:  60%|██████    | 18/30 [17:32<08:48, 44.08s/it]

[I 2025-07-02 04:03:16,906] Trial 17 finished with value: 0.6135216791233189 and parameters: {'C': 0.15966110845966244, 'penalty': 'l1'}. Best is trial 1 with value: 0.6135315293042184.


Best trial: 1. Best value: 0.613532:  63%|██████▎   | 19/30 [24:23<28:16, 154.23s/it]

[I 2025-07-02 04:10:07,749] Trial 18 finished with value: 0.6134611150836736 and parameters: {'C': 6.121322071009796, 'penalty': 'l1'}. Best is trial 1 with value: 0.6135315293042184.


Best trial: 1. Best value: 0.613532:  67%|██████▋   | 20/30 [24:23<18:01, 108.19s/it]

[I 2025-07-02 04:10:08,636] Trial 19 finished with value: 0.6134717839216755 and parameters: {'C': 0.7543261736789976, 'penalty': 'l2'}. Best is trial 1 with value: 0.6135315293042184.


Best trial: 1. Best value: 0.613532:  70%|███████   | 21/30 [25:09<13:25, 89.47s/it] 

[I 2025-07-02 04:10:54,473] Trial 20 finished with value: 0.6135278667915517 and parameters: {'C': 0.27264416060348245, 'penalty': 'l1'}. Best is trial 1 with value: 0.6135315293042184.


Best trial: 1. Best value: 0.613532:  73%|███████▎  | 22/30 [25:56<10:12, 76.52s/it]

[I 2025-07-02 04:11:40,767] Trial 21 finished with value: 0.6135268274435431 and parameters: {'C': 0.2730834958148281, 'penalty': 'l1'}. Best is trial 1 with value: 0.6135315293042184.


Best trial: 1. Best value: 0.613532:  77%|███████▋  | 23/30 [26:45<07:59, 68.45s/it]

[I 2025-07-02 04:12:30,418] Trial 22 finished with value: 0.6135232529708627 and parameters: {'C': 0.3240199726509766, 'penalty': 'l1'}. Best is trial 1 with value: 0.6135315293042184.


Best trial: 1. Best value: 0.613532:  80%|████████  | 24/30 [29:17<09:19, 93.33s/it]

[I 2025-07-02 04:15:01,775] Trial 23 finished with value: 0.6134824600163601 and parameters: {'C': 1.2238222226974749, 'penalty': 'l1'}. Best is trial 1 with value: 0.6135315293042184.


Best trial: 1. Best value: 0.613532:  83%|████████▎ | 25/30 [34:27<13:12, 158.46s/it]

[I 2025-07-02 04:20:12,173] Trial 24 finished with value: 0.6134713760751338 and parameters: {'C': 2.4433433117837677, 'penalty': 'l1'}. Best is trial 1 with value: 0.6135315293042184.


Best trial: 1. Best value: 0.613532:  87%|████████▋ | 26/30 [35:40<08:51, 132.96s/it]

[I 2025-07-02 04:21:25,643] Trial 25 finished with value: 0.6135056986963294 and parameters: {'C': 0.5583250236889903, 'penalty': 'l1'}. Best is trial 1 with value: 0.6135315293042184.


Best trial: 26. Best value: 0.613533:  90%|█████████ | 27/30 [36:23<05:17, 105.85s/it]

[I 2025-07-02 04:22:08,242] Trial 26 finished with value: 0.6135328250736571 and parameters: {'C': 0.20741339470550815, 'penalty': 'l1'}. Best is trial 26 with value: 0.6135328250736571.


Best trial: 26. Best value: 0.613533:  93%|█████████▎| 28/30 [36:28<02:31, 75.54s/it] 

[I 2025-07-02 04:22:13,067] Trial 27 finished with value: 0.6130962357610279 and parameters: {'C': 0.04016279039622344, 'penalty': 'l1'}. Best is trial 26 with value: 0.6135328250736571.


Best trial: 26. Best value: 0.613533:  97%|█████████▋| 29/30 [37:05<01:04, 64.13s/it]

[I 2025-07-02 04:22:50,588] Trial 28 finished with value: 0.6135191484085669 and parameters: {'C': 0.15927583972309522, 'penalty': 'l1'}. Best is trial 26 with value: 0.6135328250736571.


Best trial: 26. Best value: 0.613533: 100%|██████████| 30/30 [37:12<00:00, 74.42s/it]


[I 2025-07-02 04:22:57,234] Trial 29 finished with value: 0.6130713781954892 and parameters: {'C': 0.034089686069196455, 'penalty': 'l1'}. Best is trial 26 with value: 0.6135328250736571.

✅ Melhor AUC: 0.6135
🏆 Melhor conjunto de hiperparâmetros:
   - C: 0.20741339470550815
   - penalty: l1


ValueError: Solver lbfgs supports only 'l2' or None penalties, got l1 penalty.
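
The `ValueError` above is raised by scikit-learn: the `lbfgs` solver accepts only `l2` or no penalty, so a tuned `penalty='l1'` cannot be refit with that solver. A minimal sketch of one way to guard against this follows. The helper name `compatible_solver` and its fallback order are assumptions made for illustration and are not part of `scripts.func`; the solver/penalty table follows the scikit-learn `LogisticRegression` documentation.

```python
# Penalties accepted by each scikit-learn LogisticRegression solver
# (None means "no penalty").
SOLVER_PENALTIES = {
    "lbfgs": {"l2", None},
    "liblinear": {"l1", "l2"},
    "newton-cg": {"l2", None},
    "newton-cholesky": {"l2", None},
    "sag": {"l2", None},
    "saga": {"elasticnet", "l1", "l2", None},
}


def compatible_solver(penalty, preferred="lbfgs"):
    """Return `preferred` if it supports `penalty`, otherwise a solver that does.

    Hypothetical helper: the fallback order (liblinear, then saga) is an assumption.
    """
    if penalty in SOLVER_PENALTIES[preferred]:
        return preferred
    for solver in ("liblinear", "saga"):
        if penalty in SOLVER_PENALTIES[solver]:
            return solver
    raise ValueError(f"No solver supports penalty={penalty!r}")


# Best parameters reported by the study above.
best_params = {"C": 0.20741339470550815, "penalty": "l1"}

# Add a solver that is valid for the tuned penalty before refitting.
final_params = {**best_params, "solver": compatible_solver(best_params["penalty"])}
print(final_params["solver"])  # prints: liblinear
```

The resulting dictionary can then be passed to the estimator as `LogisticRegression(**final_params)`. Sampling the solver together with the penalty inside the `objective` function would be another option, at the cost of a larger search space.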


# Step 04.5 – Final Training and Saving the Best Model

Before moving on to interpretability in Step 05, the model trained with the best hyperparameters must be saved to disk together with the preprocessor used. This lets any later application (via API or script) apply the same transformations and produce the same predictions.

---

## 🎯 Objective

- Train the LightGBM model with the **best hyperparameters** found by the Optuna optimization.
- Save the final model to a `.pkl` file for later use.
- Save the **data preprocessor** that was used, so that new data is transformed in the same way.

---

## 🧪 Executed Pipeline

### 1. Data Preparation

```python
from scripts.func import preparar_dados_rh

df_encoded, X_train, X_test, y_train, y_test = preparar_dados_rh(qtd_amostras=100_000, semente=42)
```

---

### 2. Optimization and Training with the Best Parameters

```python
from scripts.func import tunar_modelo_optuna

modelo_lgbm, estudo_lgbm = tunar_modelo_optuna(
    modelo_nome='lightgbm',
    X_train=X_train,
    y_train=y_train,
    n_trials=30,
    semente=42
)
```

---

### 3. Building the Preprocessor

```python
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

colunas_numericas = X_train.select_dtypes(include=['int64', 'float64']).columns.tolist()
colunas_categoricas = X_train.select_dtypes(include=['object', 'category']).columns.tolist()

preprocessor = ColumnTransformer([
    ('num', StandardScaler(), colunas_numericas),
    ('cat', OneHotEncoder(drop='first', sparse_output=False), colunas_categoricas)
])

preprocessor.fit(X_train)
```

---

### 4. Saving the Artifacts

```python
import joblib
import os

os.makedirs('models', exist_ok=True)

joblib.dump(modelo_lgbm, 'models/lightgbm_model.pkl')
joblib.dump(preprocessor, 'models/preprocessor.pkl')
```
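
Note that the relative path `'models'` resolves against the current working directory, so the same snippet run from a notebook subfolder would create the directory there rather than at the project root. The sketch below builds the artifact paths from an explicit root instead. The helper name `artifact_paths` is hypothetical, and the temporary directory only stands in for the real project root.

```python
import tempfile
from pathlib import Path


def artifact_paths(project_root):
    """Return the artifact file paths under <project_root>/models.

    Hypothetical helper; creates the directory if it does not exist yet.
    """
    models_dir = Path(project_root) / "models"
    models_dir.mkdir(parents=True, exist_ok=True)
    return {
        "model": models_dir / "lightgbm_model.pkl",
        "preprocessor": models_dir / "preprocessor.pkl",
    }


# Demonstration against a throwaway directory instead of the real project root.
with tempfile.TemporaryDirectory() as root:
    paths = artifact_paths(root)
    print(paths["model"].name)         # prints: lightgbm_model.pkl
    print(paths["model"].parent.name)  # prints: models
```

In a notebook stored one level below the project root, the call would be `artifact_paths(Path.cwd().parent)`, and each returned path can be passed directly to `joblib.dump` or `joblib.load`.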

---

## 💾 Result

- `models/lightgbm_model.pkl`: final model with the optimized hyperparameters.
- `models/preprocessor.pkl`: data transformation pipeline fitted on the training data.

---

## ✅ Next Step

With the model and the preprocessor saved, we are ready to move on to **Step 05 – Application and Interpretability**, where we will:

- Plot the ROC curve, the Precision-Recall curve, and the confusion matrix.
- Visualize feature importance.
- Apply interpretability techniques with SHAP.
- Prepare a final function for use in an API or prediction service.


In [2]:
import os
import sys

import joblib
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

# Make the project root importable so that `scripts.func` resolves
sys.path.append(os.path.abspath("../"))
from scripts.func import preparar_dados_rh, tunar_modelo_optuna

# 1. Generate the data
df_encoded, X_train, X_test, y_train, y_test = preparar_dados_rh(qtd_amostras=100_000, semente=42)

# 2. Tune and train the best model
modelo_lgbm, estudo_lgbm = tunar_modelo_optuna(
    modelo_nome='lightgbm',
    X_train=X_train,
    y_train=y_train,
    n_trials=30,
    semente=42
)

# 3. Build and fit the preprocessor
colunas_numericas = X_train.select_dtypes(include=['int64', 'float64']).columns.tolist()
colunas_categoricas = X_train.select_dtypes(include=['object', 'category']).columns.tolist()

preprocessor = ColumnTransformer([
    ('num', StandardScaler(), colunas_numericas),
    ('cat', OneHotEncoder(drop='first', sparse_output=False), colunas_categoricas)
])

preprocessor.fit(X_train)

# 4. Save the artifacts under <project root>/models
root_dir = os.path.abspath(os.path.join(os.getcwd(), '..'))
models_dir = os.path.join(root_dir, 'models')
os.makedirs(models_dir, exist_ok=True)  # no-op if the directory already exists

joblib.dump(modelo_lgbm, os.path.join(models_dir, 'lightgbm_model.pkl'))
joblib.dump(preprocessor, os.path.join(models_dir, 'preprocessor.pkl'))


[I 2025-07-02 04:51:23,213] A new study created in memory with name: Tuning_lightgbm
Best trial: 0. Best value: 0.624425:   3%|▎         | 1/30 [01:19<38:27, 79.58s/it]

[I 2025-07-02 04:52:42,780] Trial 0 finished with value: 0.6244254275568708 and parameters: {'learning_rate': 0.008468008575248327, 'num_leaves': 286, 'max_depth': 12, 'min_child_samples': 64, 'feature_fraction': 0.6624074561769746, 'bagging_fraction': 0.662397808134481, 'bagging_freq': 1, 'lambda_l1': 4.330880728874676, 'lambda_l2': 3.005575058716044}. Best is trial 0 with value: 0.6244254275568708.


Best trial: 0. Best value: 0.624425:   7%|▋         | 2/30 [01:30<18:13, 39.04s/it]

[I 2025-07-02 04:52:53,447] Trial 1 finished with value: 0.6190044641435256 and parameters: {'learning_rate': 0.05675206026988745, 'num_leaves': 15, 'max_depth': 15, 'min_child_samples': 85, 'feature_fraction': 0.6849356442713105, 'bagging_fraction': 0.6727299868828402, 'bagging_freq': 2, 'lambda_l1': 1.5212112147976886, 'lambda_l2': 2.6237821581611893}. Best is trial 0 with value: 0.6244254275568708.


Best trial: 0. Best value: 0.624425:  10%|█         | 3/30 [02:14<18:44, 41.65s/it]

[I 2025-07-02 04:53:38,207] Trial 2 finished with value: 0.6239762352475035 and parameters: {'learning_rate': 0.01174843954800703, 'num_leaves': 94, 'max_depth': 10, 'min_child_samples': 22, 'feature_fraction': 0.7168578594140872, 'bagging_fraction': 0.7465447373174766, 'bagging_freq': 5, 'lambda_l1': 3.925879806965068, 'lambda_l2': 0.9983689107917987}. Best is trial 0 with value: 0.6244254275568708.


Best trial: 3. Best value: 0.637271:  13%|█▎        | 4/30 [02:20<11:51, 27.37s/it]

[I 2025-07-02 04:53:43,678] Trial 3 finished with value: 0.6372711164756679 and parameters: {'learning_rate': 0.018785426399210624, 'num_leaves': 182, 'max_depth': 3, 'min_child_samples': 65, 'feature_fraction': 0.6682096494749166, 'bagging_fraction': 0.6260206371941118, 'bagging_freq': 10, 'lambda_l1': 4.828160165372797, 'lambda_l2': 4.041986740582305}. Best is trial 3 with value: 0.6372711164756679.


Best trial: 4. Best value: 0.637419:  17%|█▋        | 5/30 [02:41<10:25, 25.02s/it]

[I 2025-07-02 04:54:04,529] Trial 4 finished with value: 0.6374185563387708 and parameters: {'learning_rate': 0.0056828375585122656, 'num_leaves': 38, 'max_depth': 11, 'min_child_samples': 50, 'feature_fraction': 0.6488152939379115, 'bagging_fraction': 0.798070764044508, 'bagging_freq': 1, 'lambda_l1': 4.546602010393911, 'lambda_l2': 1.2938999080000846}. Best is trial 4 with value: 0.6374185563387708.


Best trial: 4. Best value: 0.637419:  20%|██        | 6/30 [03:22<12:10, 30.45s/it]

[I 2025-07-02 04:54:45,517] Trial 5 finished with value: 0.597352063022227 and parameters: {'learning_rate': 0.043767126303409544, 'num_leaves': 100, 'max_depth': 9, 'min_child_samples': 59, 'feature_fraction': 0.6739417822102108, 'bagging_fraction': 0.9878338511058234, 'bagging_freq': 8, 'lambda_l1': 4.697494707820946, 'lambda_l2': 4.474136752138244}. Best is trial 4 with value: 0.6374185563387708.


Best trial: 4. Best value: 0.637419:  23%|██▎       | 7/30 [03:31<08:59, 23.44s/it]

[I 2025-07-02 04:54:54,526] Trial 6 finished with value: 0.629896068624351 and parameters: {'learning_rate': 0.03027406546455652, 'num_leaves': 278, 'max_depth': 4, 'min_child_samples': 27, 'feature_fraction': 0.6180909155642152, 'bagging_fraction': 0.7301321323053057, 'bagging_freq': 4, 'lambda_l1': 1.3567451588694794, 'lambda_l2': 4.143687545759647}. Best is trial 4 with value: 0.6374185563387708.


Best trial: 4. Best value: 0.637419:  27%|██▋       | 8/30 [04:13<10:46, 29.37s/it]

[I 2025-07-02 04:55:36,590] Trial 7 finished with value: 0.6270182614008155 and parameters: {'learning_rate': 0.0076510536667541975, 'num_leaves': 91, 'max_depth': 10, 'min_child_samples': 22, 'feature_fraction': 0.9208787923016158, 'bagging_fraction': 0.6298202574719083, 'bagging_freq': 10, 'lambda_l1': 3.861223846483287, 'lambda_l2': 0.993578407670862}. Best is trial 4 with value: 0.6374185563387708.


Best trial: 4. Best value: 0.637419:  30%|███       | 9/30 [05:11<13:26, 38.42s/it]

[I 2025-07-02 04:56:34,911] Trial 8 finished with value: 0.6350294641093938 and parameters: {'learning_rate': 0.0010319982330247674, 'num_leaves': 247, 'max_depth': 12, 'min_child_samples': 76, 'feature_fraction': 0.9085081386743783, 'bagging_fraction': 0.6296178606936361, 'bagging_freq': 4, 'lambda_l1': 0.5793452976256486, 'lambda_l2': 4.315517129377968}. Best is trial 4 with value: 0.6374185563387708.


Best trial: 4. Best value: 0.637419:  33%|███▎      | 10/30 [05:17<09:27, 28.37s/it]

[I 2025-07-02 04:56:40,779] Trial 9 finished with value: 0.6345610962870348 and parameters: {'learning_rate': 0.03499331111708852, 'num_leaves': 106, 'max_depth': 3, 'min_child_samples': 38, 'feature_fraction': 0.7300733288106988, 'bagging_fraction': 0.8918424713352257, 'bagging_freq': 7, 'lambda_l1': 4.436063712881633, 'lambda_l2': 2.3610746258097466}. Best is trial 4 with value: 0.6374185563387708.


Best trial: 4. Best value: 0.637419:  37%|███▋      | 11/30 [05:25<07:01, 22.18s/it]

[I 2025-07-02 04:56:48,923] Trial 10 finished with value: 0.591895151284174 and parameters: {'learning_rate': 0.2193905022254065, 'num_leaves': 11, 'max_depth': 7, 'min_child_samples': 44, 'feature_fraction': 0.8144754354791038, 'bagging_fraction': 0.8429521169792499, 'bagging_freq': 2, 'lambda_l1': 3.067811097520366, 'lambda_l2': 0.23696217751263182}. Best is trial 4 with value: 0.6374185563387708.


Best trial: 11. Best value: 0.638355:  40%|████      | 12/30 [05:43<06:16, 20.92s/it]

[I 2025-07-02 04:57:06,974] Trial 11 finished with value: 0.6383553507723262 and parameters: {'learning_rate': 0.0023770372217125028, 'num_leaves': 198, 'max_depth': 6, 'min_child_samples': 70, 'feature_fraction': 0.7878496840385453, 'bagging_fraction': 0.7927585318653319, 'bagging_freq': 10, 'lambda_l1': 3.035921370166128, 'lambda_l2': 1.7067080470293317}. Best is trial 11 with value: 0.6383553507723262.


Best trial: 12. Best value: 0.638478:  43%|████▎     | 13/30 [06:01<05:37, 19.85s/it]

[I 2025-07-02 04:57:24,364] Trial 12 finished with value: 0.638478483822221 and parameters: {'learning_rate': 0.002254590549186906, 'num_leaves': 197, 'max_depth': 6, 'min_child_samples': 99, 'feature_fraction': 0.8244367020014848, 'bagging_fraction': 0.8141268371993765, 'bagging_freq': 7, 'lambda_l1': 2.864265403136679, 'lambda_l2': 1.6921629101016422}. Best is trial 12 with value: 0.638478483822221.


Best trial: 12. Best value: 0.638478:  47%|████▋     | 14/30 [06:18<05:07, 19.19s/it]

[I 2025-07-02 04:57:42,027] Trial 13 finished with value: 0.6384519115464026 and parameters: {'learning_rate': 0.0017345515600983236, 'num_leaves': 199, 'max_depth': 6, 'min_child_samples': 100, 'feature_fraction': 0.825610090928053, 'bagging_fraction': 0.9126738533684695, 'bagging_freq': 8, 'lambda_l1': 2.8008803524924515, 'lambda_l2': 1.8622066402666226}. Best is trial 12 with value: 0.638478483822221.


Best trial: 12. Best value: 0.638478:  50%|█████     | 15/30 [06:42<05:08, 20.59s/it]

[I 2025-07-02 04:58:05,869] Trial 14 finished with value: 0.6380747155452836 and parameters: {'learning_rate': 0.0010778297218068933, 'num_leaves': 219, 'max_depth': 7, 'min_child_samples': 100, 'feature_fraction': 0.852433737415586, 'bagging_fraction': 0.9111341683349905, 'bagging_freq': 7, 'lambda_l1': 2.2458480892846575, 'lambda_l2': 2.0789298723577474}. Best is trial 12 with value: 0.638478483822221.


Best trial: 15. Best value: 0.639343:  53%|█████▎    | 16/30 [06:56<04:19, 18.55s/it]

[I 2025-07-02 04:58:19,675] Trial 15 finished with value: 0.6393428909839372 and parameters: {'learning_rate': 0.0025772946144344043, 'num_leaves': 142, 'max_depth': 5, 'min_child_samples': 96, 'feature_fraction': 0.8670586836589573, 'bagging_fraction': 0.9705384491631063, 'bagging_freq': 8, 'lambda_l1': 2.32819966143344, 'lambda_l2': 3.2579587851645835}. Best is trial 15 with value: 0.6393428909839372.


Best trial: 16. Best value: 0.639373:  57%|█████▋    | 17/30 [07:10<03:44, 17.29s/it]

[I 2025-07-02 04:58:34,030] Trial 16 finished with value: 0.6393732826732426 and parameters: {'learning_rate': 0.0035674081664957107, 'num_leaves': 147, 'max_depth': 5, 'min_child_samples': 86, 'feature_fraction': 0.87498687476921, 'bagging_fraction': 0.9787219705654253, 'bagging_freq': 6, 'lambda_l1': 1.9910527639025013, 'lambda_l2': 3.3425770772126717}. Best is trial 16 with value: 0.6393732826732426.


Best trial: 16. Best value: 0.639373:  60%|██████    | 18/30 [07:25<03:17, 16.47s/it]

[I 2025-07-02 04:58:48,589] Trial 17 finished with value: 0.6384040750148164 and parameters: {'learning_rate': 0.004370267382550643, 'num_leaves': 137, 'max_depth': 5, 'min_child_samples': 84, 'feature_fraction': 0.9562561635055674, 'bagging_fraction': 0.9995781511268159, 'bagging_freq': 6, 'lambda_l1': 1.8089740851242242, 'lambda_l2': 3.3561848367527602}. Best is trial 16 with value: 0.6393732826732426.


Best trial: 16. Best value: 0.639373:  63%|██████▎   | 19/30 [08:06<04:21, 23.81s/it]

[I 2025-07-02 04:59:29,500] Trial 18 finished with value: 0.6332412974717977 and parameters: {'learning_rate': 0.0034908446951837343, 'num_leaves': 141, 'max_depth': 8, 'min_child_samples': 88, 'feature_fraction': 0.8768205286194898, 'bagging_fraction': 0.9566218932844948, 'bagging_freq': 9, 'lambda_l1': 0.7819000372264544, 'lambda_l2': 3.5268490753633954}. Best is trial 16 with value: 0.6393732826732426.


Best trial: 16. Best value: 0.639373:  67%|██████▋   | 20/30 [08:16<03:16, 19.62s/it]

[I 2025-07-02 04:59:39,360] Trial 19 finished with value: 0.6124740889333004 and parameters: {'learning_rate': 0.09451459461562424, 'num_leaves': 161, 'max_depth': 4, 'min_child_samples': 76, 'feature_fraction': 0.9845798676688794, 'bagging_fraction': 0.9414689916318519, 'bagging_freq': 5, 'lambda_l1': 2.268302678751437, 'lambda_l2': 4.819497065799107}. Best is trial 16 with value: 0.6393732826732426.


Best trial: 16. Best value: 0.639373:  70%|███████   | 21/30 [08:47<03:29, 23.25s/it]

[I 2025-07-02 05:00:11,084] Trial 20 finished with value: 0.6267083604941197 and parameters: {'learning_rate': 0.01440673090433955, 'num_leaves': 55, 'max_depth': 15, 'min_child_samples': 90, 'feature_fraction': 0.766983112121334, 'bagging_fraction': 0.8592568806861586, 'bagging_freq': 6, 'lambda_l1': 0.05810480813776442, 'lambda_l2': 3.5835953820045923}. Best is trial 16 with value: 0.6393732826732426.


Best trial: 16. Best value: 0.639373:  73%|███████▎  | 22/30 [09:01<02:43, 20.40s/it]

[I 2025-07-02 05:00:24,844] Trial 21 finished with value: 0.6393269782001808 and parameters: {'learning_rate': 0.0022887593349744076, 'num_leaves': 235, 'max_depth': 5, 'min_child_samples': 95, 'feature_fraction': 0.864366223935975, 'bagging_fraction': 0.9578721599964157, 'bagging_freq': 8, 'lambda_l1': 3.525885802381468, 'lambda_l2': 2.848313073435282}. Best is trial 16 with value: 0.6393732826732426.


Best trial: 22. Best value: 0.639406:  77%|███████▋  | 23/30 [09:15<02:09, 18.51s/it]

[I 2025-07-02 05:00:38,924] Trial 22 finished with value: 0.6394061143120777 and parameters: {'learning_rate': 0.003255892721730623, 'num_leaves': 157, 'max_depth': 5, 'min_child_samples': 92, 'feature_fraction': 0.8648969096225119, 'bagging_fraction': 0.961687083582021, 'bagging_freq': 8, 'lambda_l1': 3.572146420534935, 'lambda_l2': 3.091860286683792}. Best is trial 22 with value: 0.6394061143120777.


Best trial: 23. Best value: 0.6404:  80%|████████  | 24/30 [09:25<01:35, 15.84s/it]  

[I 2025-07-02 05:00:48,554] Trial 23 finished with value: 0.6403998207666525 and parameters: {'learning_rate': 0.0041826257775779, 'num_leaves': 126, 'max_depth': 4, 'min_child_samples': 79, 'feature_fraction': 0.9074420621691284, 'bagging_fraction': 0.9796284662391809, 'bagging_freq': 9, 'lambda_l1': 1.9825679994186263, 'lambda_l2': 3.170064962868606}. Best is trial 23 with value: 0.6403998207666525.


Best trial: 23. Best value: 0.6404:  83%|████████▎ | 25/30 [09:35<01:09, 13.99s/it]

[I 2025-07-02 05:00:58,220] Trial 24 finished with value: 0.6403026381930884 and parameters: {'learning_rate': 0.005077313720642067, 'num_leaves': 123, 'max_depth': 4, 'min_child_samples': 76, 'feature_fraction': 0.9193848182959573, 'bagging_fraction': 0.932895657334162, 'bagging_freq': 9, 'lambda_l1': 1.842231609610077, 'lambda_l2': 3.81574611631762}. Best is trial 23 with value: 0.6403998207666525.


Best trial: 25. Best value: 0.641008:  87%|████████▋ | 26/30 [09:41<00:46, 11.68s/it]

[I 2025-07-02 05:01:04,505] Trial 25 finished with value: 0.6410081191509109 and parameters: {'learning_rate': 0.00599475639768284, 'num_leaves': 118, 'max_depth': 3, 'min_child_samples': 77, 'feature_fraction': 0.924543232663553, 'bagging_fraction': 0.9316754076154603, 'bagging_freq': 9, 'lambda_l1': 1.0695487310764085, 'lambda_l2': 3.925802362228854}. Best is trial 25 with value: 0.6410081191509109.


Best trial: 25. Best value: 0.641008:  90%|█████████ | 27/30 [09:47<00:30, 10.06s/it]

[I 2025-07-02 05:01:10,778] Trial 26 finished with value: 0.640967765367825 and parameters: {'learning_rate': 0.006320296204082007, 'num_leaves': 69, 'max_depth': 3, 'min_child_samples': 77, 'feature_fraction': 0.9352954828810333, 'bagging_fraction': 0.879753881292549, 'bagging_freq': 9, 'lambda_l1': 1.115121165910072, 'lambda_l2': 3.7385887639189836}. Best is trial 25 with value: 0.6410081191509109.


Best trial: 25. Best value: 0.641008:  93%|█████████▎| 28/30 [09:53<00:17,  8.90s/it]

[I 2025-07-02 05:01:16,974] Trial 27 finished with value: 0.6402163500091286 and parameters: {'learning_rate': 0.009432034955026808, 'num_leaves': 67, 'max_depth': 3, 'min_child_samples': 55, 'feature_fraction': 0.983676311508369, 'bagging_fraction': 0.8756315596686006, 'bagging_freq': 9, 'lambda_l1': 1.0858473930913755, 'lambda_l2': 3.763034599512568}. Best is trial 25 with value: 0.6410081191509109.


Best trial: 28. Best value: 0.641066:  97%|█████████▋| 29/30 [09:59<00:08,  8.09s/it]

[I 2025-07-02 05:01:23,178] Trial 28 finished with value: 0.6410656170156999 and parameters: {'learning_rate': 0.0057895274890742195, 'num_leaves': 73, 'max_depth': 3, 'min_child_samples': 70, 'feature_fraction': 0.9551883483386383, 'bagging_fraction': 0.9128563041504967, 'bagging_freq': 9, 'lambda_l1': 0.39829367118956727, 'lambda_l2': 4.781042957486154}. Best is trial 28 with value: 0.6410656170156999.


Best trial: 28. Best value: 0.641066: 100%|██████████| 30/30 [10:05<00:00, 20.20s/it]


[I 2025-07-02 05:01:29,176] Trial 29 finished with value: 0.636315280520333 and parameters: {'learning_rate': 0.022477924168153415, 'num_leaves': 77, 'max_depth': 3, 'min_child_samples': 67, 'feature_fraction': 0.9443130856149027, 'bagging_fraction': 0.8485281092713779, 'bagging_freq': 10, 'lambda_l1': 0.05174219484548148, 'lambda_l2': 4.9939983310912455}. Best is trial 28 with value: 0.6410656170156999.

✅ Melhor AUC: 0.6411
🏆 Melhor conjunto de hiperparâmetros:
   - learning_rate: 0.0057895274890742195
   - num_leaves: 73
   - max_depth: 3
   - min_child_samples: 70
   - feature_fraction: 0.9551883483386383
   - bagging_fraction: 0.9128563041504967
   - bagging_freq: 9
   - lambda_l1: 0.39829367118956727
   - lambda_l2: 4.781042957486154


['models/preprocessor.pkl']
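
The best trial above reports LightGBM's native parameter names. If the final model is rebuilt by hand with the scikit-learn wrapper `LGBMClassifier`, a small translation step to the wrapper's own argument names can make the configuration easier to read. The sketch below is one way to do that. The helper `to_sklearn_names` is hypothetical, and whether `tunar_modelo_optuna` uses the wrapper at all is an assumption; the alias pairs are those listed in the LightGBM parameter documentation.

```python
# Native LightGBM names mapped to their LGBMClassifier aliases.
NATIVE_TO_SKLEARN = {
    "feature_fraction": "colsample_bytree",
    "bagging_fraction": "subsample",
    "bagging_freq": "subsample_freq",
    "lambda_l1": "reg_alpha",
    "lambda_l2": "reg_lambda",
}


def to_sklearn_names(params):
    """Rename native LightGBM keys to wrapper names; other keys pass through."""
    return {NATIVE_TO_SKLEARN.get(name, name): value for name, value in params.items()}


# Best hyperparameters copied from the study output above.
best_params = {
    "learning_rate": 0.0057895274890742195,
    "num_leaves": 73,
    "max_depth": 3,
    "min_child_samples": 70,
    "feature_fraction": 0.9551883483386383,
    "bagging_fraction": 0.9128563041504967,
    "bagging_freq": 9,
    "lambda_l1": 0.39829367118956727,
    "lambda_l2": 4.781042957486154,
}

sk_params = to_sklearn_names(best_params)
print(sk_params["reg_alpha"])  # prints: 0.39829367118956727
```

The renamed dictionary could then be passed as `LGBMClassifier(**sk_params)`; values are unchanged, only the keys differ.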