# ET-287 - Signal Processing Using Neural Networks

_Student: Denys Derlian Carvalho Brito_

---

## Exam Project - Performance Classification in Quantum Cryptography with Deep Neural Networks

**Objective:** Develop a classifier based on deep neural networks that predicts, from heterogeneous variables of a quantum cryptography environment, the performance class (“Optimal” vs. “Suboptimal”) with high accuracy and good generalization.


In the context of quantum computing, quantum cryptography offers new security paradigms based on the principles of quantum mechanics. From this perspective, machine learning techniques have been explored to optimize and improve the performance of quantum cryptography systems. However, the inherent complexity of these systems, together with the heterogeneity of the data they generate, poses significant challenges for building effective predictive models. In this scenario, this project proposes an approach focused on classifying **network performance** in an integrated way, considering not only quantum and security parameters but also network performance metrics, _big data_ characteristics, and computational resource usage, which together determine the practical viability of these systems in real operational scenarios.

---

## 5. Classical ML algorithms for comparison: _Random Forest, SVM, and XGBoost_

In [1]:
# Import libraries from the config module
from modules.config import *



Num GPUs Available:  1


In [2]:
# Load the preprocessed data
df_preprocessed = pd.read_csv("./data/preprocessed/df_preprocessed.csv")
X_numerical_df = pd.read_csv("./data/preprocessed/X_numerical_preprocessed.csv")
X_categorical_df = pd.read_csv("./data/preprocessed/X_categorical_preprocessed.csv")
y = pd.read_csv("./data/preprocessed/y_preprocessed.csv")


TARGET_VARIABLE = 'Performance_Target'
CATEGORICAL_COLUMNS = df_preprocessed.select_dtypes(include=['object', 'category']).columns.tolist()
NUMERICAL_COLUMNS = df_preprocessed.select_dtypes(include=['number']).columns.difference([TARGET_VARIABLE]).tolist()

In [3]:
X_dev_idx_ml, X_test_idx_ml = train_test_split(
        np.arange(len(y)), 
        test_size=0.2, 
        stratify=y.values.ravel(), 
        random_state=RANDOM_SEED
    )

X_num_dev_ml = X_numerical_df.iloc[X_dev_idx_ml].values
X_cat_dev_ml = X_categorical_df.iloc[X_dev_idx_ml].values
X_num_test_ml = X_numerical_df.iloc[X_test_idx_ml].values
X_cat_test_ml = X_categorical_df.iloc[X_test_idx_ml].values
y_dev_ml = y.iloc[X_dev_idx_ml].values.ravel().astype(int)
y_test_ml = y.iloc[X_test_idx_ml].values.ravel().astype(int)
print(f"✓ Split created: Dev={len(y_dev_ml)}, Test={len(y_test_ml)}")

# Fit the scaler on the dev set only, then apply it to the test set
scaler_ml = MinMaxScaler()
X_num_dev_scaled_ml = scaler_ml.fit_transform(X_num_dev_ml)
X_num_test_scaled_ml = scaler_ml.transform(X_num_test_ml)

# Concatenate numerical (scaled) + categorical (label-encoded) features
X_dev_ml = np.hstack([X_num_dev_scaled_ml, X_cat_dev_ml])
X_test_ml = np.hstack([X_num_test_scaled_ml, X_cat_test_ml])

print(f"\n📊 Data prepared for classical models:")
print(f"  Dev set: {X_dev_ml.shape} | Test set: {X_test_ml.shape}")
print(f"  Features: {X_num_dev_scaled_ml.shape[1]} numerical + {X_cat_dev_ml.shape[1]} categorical")
print(f"  Classes dev: {np.bincount(y_dev_ml)}")
print(f"  Classes test: {np.bincount(y_test_ml)}")

✓ Split created: Dev=800, Test=200

📊 Data prepared for classical models:
  Dev set: (800, 28) | Test set: (200, 28)
  Features: 18 numerical + 10 categorical
  Classes dev: [734  66]
  Classes test: [184  16]
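
The class counts above follow directly from stratification: each class contributes about 20% of its samples to the test set. The sketch below reproduces that arithmetic in plain Python. It is not scikit-learn's `train_test_split`; the helper name `stratified_split` and the seed value `42` are assumptions made only for illustration (the notebook's `RANDOM_SEED` is defined elsewhere).

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_size=0.2, seed=42):
    """Return (dev_idx, test_idx), sampling test_size of each class separately."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    dev_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)  # per-class test quota
        test_idx.extend(indices[:n_test])
        dev_idx.extend(indices[n_test:])
    return sorted(dev_idx), sorted(test_idx)


# Class totals implied by the output above: 734 + 184 negatives, 66 + 16 positives.
labels = [0] * 918 + [1] * 82
dev_idx, test_idx = stratified_split(labels)
dev_counts = Counter(labels[i] for i in dev_idx)
test_counts = Counter(labels[i] for i in test_idx)
print("dev :", dev_counts[0], dev_counts[1])
print("test:", test_counts[0], test_counts[1])
```

With only 82 positives in total, the test set holds 16 of them, so every thresholded test metric moves in steps of 1/16 in recall.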


In [4]:
def evaluate_classic_model(model, model_name, X_train, y_train, X_test, y_test):
    """
    Train and evaluate a classical ML model
    """
    print(f"\n{'─'*70}")
    print(f"🔄 Training {model_name}...")
    print(f"{'─'*70}")
    
    # Train
    start_time = time.time()
    model.fit(X_train, y_train)
    train_time = time.time() - start_time
    
    # Predicted probability of the positive class
    y_pred_prob = model.predict_proba(X_test)[:, 1]
    
    # Find the F1-optimal threshold
    # NOTE: the threshold is selected on the same data used for evaluation,
    # so the thresholded metrics below (F1, precision, recall, accuracy) are optimistic.
    prec, rec, thr = precision_recall_curve(y_test, y_pred_prob)
    f1_scores = 2 * (prec[:-1] * rec[:-1]) / (prec[:-1] + rec[:-1] + 1e-8)
    best_thr = thr[np.argmax(f1_scores)]
    
    y_pred = (y_pred_prob >= best_thr).astype(int)
    
    # Metrics
    roc_auc = roc_auc_score(y_test, y_pred_prob)
    pr_auc = average_precision_score(y_test, y_pred_prob)
    f1 = f1_score(y_test, y_pred, zero_division=0)
    precision = precision_score(y_test, y_pred, zero_division=0)
    accuracy = accuracy_score(y_test, y_pred) 
    recall = recall_score(y_test, y_pred, zero_division=0)
    cm = confusion_matrix(y_test, y_pred)
    
    # Results
    print(f"✓ Training finished in {train_time:.2f}s")
    print(f"  Optimized threshold: {best_thr:.4f}")
    print(f"\n📊 Metrics on the test set:")
    print(f"  ROC-AUC:   {roc_auc:.4f}")
    print(f"  PR-AUC:    {pr_auc:.4f}")
    print(f"  F1-Score:  {f1:.4f}")
    print(f"  Precision: {precision:.4f}")
    print(f"  Recall:    {recall:.4f}")
    print(f"  Accuracy:  {accuracy:.4f}")
    print(f"\nConfusion Matrix:")
    print(cm)
    
    return {
        'model_name': model_name,
        'train_time': train_time,
        'threshold': best_thr,
        'roc_auc': roc_auc,
        'pr_auc': pr_auc,
        'f1_score': f1,
        'precision': precision,
        'recall': recall,
        'accuracy': accuracy,
        'confusion_matrix': cm
    }

print("✓ Evaluation function defined")

✓ Evaluation function defined
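
The threshold search inside `evaluate_classic_model` can be illustrated on a toy example. The sketch below is a plain-Python stand-in for the `precision_recall_curve` scan, not the notebook's code: the helper `best_f1_threshold` and the toy labels and scores are assumptions made for illustration. It tries every distinct score as a cut-off and keeps the one with the highest F1.

```python
def best_f1_threshold(y_true, scores):
    """Try each distinct score as a threshold; return (threshold, f1) with the highest F1."""
    best_thr, best_f1 = None, -1.0
    for thr in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 0)
        fn = sum(1 for y, s in zip(y_true, scores) if s < thr and y == 1)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:  # strict ">" keeps the lowest threshold among ties
            best_thr, best_f1 = thr, f1
    return best_thr, best_f1


# Toy data (hypothetical): three positives, one of them scored below 0.5.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.70, 0.90]

thr, f1 = best_f1_threshold(y_true, scores)
print(f"best threshold = {thr}, F1 = {f1:.3f}")
```

On this toy data the default cut-off of 0.5 misses the positive scored 0.35 and yields F1 = 0.8, while lowering the threshold to 0.35 recovers it at the cost of one false positive. Because the search uses the true labels, picking the threshold on a validation split rather than on the evaluation set gives a less optimistic estimate.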


In [5]:
# Optuna optimization settings for the classical models
OPTUNA_CLASSIC_N_TRIALS = 50  # Number of trials per model
OPTUNA_CLASSIC_CV_SPLITS = 5  # Stratified K-Fold splits for validation

print(f"⚙️ Optuna settings for classical models:")
print(f"  Trials per model: {OPTUNA_CLASSIC_N_TRIALS}")
print(f"  CV Splits: {OPTUNA_CLASSIC_CV_SPLITS}-fold stratified")
print(f"  Optimization metric: PR-AUC (well suited to imbalanced classes)")

⚙️ Optuna settings for classical models:
  Trials per model: 50
  CV Splits: 5-fold stratified
  Optimization metric: PR-AUC (well suited to imbalanced classes)
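
One way to see why PR-AUC suits this dataset: for a classifier with no skill, average precision equals the positive-class prevalence (about 0.08 here), whereas ROC-AUC would sit at 0.5 regardless of imbalance. The sketch below is a from-scratch, step-wise average precision written for illustration only; it is not the scikit-learn implementation, and the function name `average_precision` is an assumption.

```python
def average_precision(y_true, scores):
    """Step-wise average precision: sum of (recall gain) * precision over distinct thresholds."""
    n_pos = sum(y_true)
    tp = fp = 0
    prev_recall = 0.0
    ap = 0.0
    # Walk thresholds from the highest score down, handling tied scores together.
    for thr in sorted(set(scores), reverse=True):
        tp += sum(1 for y, s in zip(y_true, scores) if s == thr and y == 1)
        fp += sum(1 for y, s in zip(y_true, scores) if s == thr and y == 0)
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Class totals from the split cell: 918 negatives and 82 positives.
y_true = [0] * 918 + [1] * 82

# A "no-skill" model that gives every sample the same score.
baseline = average_precision(y_true, [0.5] * len(y_true))
# A perfect ranking that scores every positive above every negative.
perfect = average_precision(y_true, [float(y) for y in y_true])

print(f"no-skill PR-AUC = {baseline:.3f}")
print(f"perfect  PR-AUC = {perfect:.3f}")
```

The gap between 0.082 and 1.0 is the range the optimization actually has to cover, which makes PR-AUC more sensitive to mistakes on the rare class than accuracy.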


### 5.1 _Random Forest_

In [6]:
def compute_class_weight(class_weight, *, classes, y):
    """Estimate class weights for unbalanced datasets.

    Parameters
    ----------
    class_weight : dict, "balanced" or None
        If "balanced", class weights will be given by
        `n_samples / (n_classes * np.bincount(y))`.
        If a dictionary is given, keys are classes and values are corresponding class
        weights.
        If `None` is given, the class weights will be uniform.

    classes : ndarray
        Array of the classes occurring in the data, as given by
        `np.unique(y_org)` with `y_org` the original class labels.

    y : array-like of shape (n_samples,)
        Array of original class labels per sample.

    Returns
    -------
    class_weight_vect : ndarray of shape (n_classes,)
        Array with `class_weight_vect[i]` the weight for i-th class.

    References
    ----------
    The "balanced" heuristic is inspired by
    Logistic Regression in Rare Events Data, King, Zeng, 2001.

    Examples
    --------
    >>> import numpy as np
    >>> from sklearn.utils.class_weight import compute_class_weight
    >>> y = [1, 1, 1, 1, 0, 0]
    >>> compute_class_weight(class_weight="balanced", classes=np.unique(y), y=y)
    array([1.5 , 0.75])
    """

    if set(y) - set(classes):
        raise ValueError("classes should include all valid labels that can be in y")
    if class_weight is None or len(class_weight) == 0:
        # uniform class weights
        weight = np.ones(classes.shape[0], dtype=np.float64, order="C")
    elif class_weight == "balanced":
        # Find the weight of each class as present in y.
        le = LabelEncoder()
        y_ind = le.fit_transform(y)
        if not all(np.isin(classes, le.classes_)):
            raise ValueError("classes should have valid labels that are in y")

        recip_freq = len(y) / (len(le.classes_) * np.bincount(y_ind).astype(np.float64))
        weight = recip_freq[le.transform(classes)]
    else:
        # user-defined dictionary
        weight = np.ones(classes.shape[0], dtype=np.float64, order="C")
        unweighted_classes = []
        for i, c in enumerate(classes):
            if c in class_weight:
                weight[i] = class_weight[c]
            else:
                unweighted_classes.append(c)

        n_weighted_classes = len(classes) - len(unweighted_classes)
        if unweighted_classes and n_weighted_classes != len(class_weight):
            unweighted_classes_user_friendly_str = np.array(unweighted_classes).tolist()
            raise ValueError(
                f"The classes, {unweighted_classes_user_friendly_str}, are not in"
                " class_weight"
            )

    return weight

cw_rf = compute_class_weight('balanced', classes=np.unique(y_dev_ml), y=y_dev_ml)
class_weight_rf = dict(enumerate(cw_rf))
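
As a worked example of the `"balanced"` heuristic, `n_samples / (n_classes * count)`, the sketch below applies the formula to the dev-set class counts printed earlier (734 negatives, 66 positives). The helper `balanced_class_weights` is a hypothetical name used only here; it takes counts directly instead of a label array.

```python
def balanced_class_weights(class_counts):
    """Apply n_samples / (n_classes * count) to a {label: count} mapping."""
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {label: n_samples / (n_classes * count) for label, count in class_counts.items()}


dev_counts = {0: 734, 1: 66}  # from "Classes dev: [734  66]"
weights = balanced_class_weights(dev_counts)

for label, weight in weights.items():
    total = weight * dev_counts[label]
    print(f"class {label}: weight = {weight:.4f}, total weighted mass = {total:.1f}")
```

Each positive sample ends up weighing roughly 11 times as much as a negative one, so both classes contribute the same total mass (400 each) to the training criterion.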

In [7]:
# 1️⃣ Random Forest optimization with Optuna
def objective_random_forest(trial):
    """
    Objective function for tuning Random Forest, using PR-AUC as the metric
    """
    # Suggest hyperparameters
    n_estimators = trial.suggest_int('n_estimators', 100, 500, step=50)
    max_depth = trial.suggest_int('max_depth', 10, 50, step=5)
    min_samples_split = trial.suggest_int('min_samples_split', 2, 20)
    min_samples_leaf = trial.suggest_int('min_samples_leaf', 1, 10)
    max_features = trial.suggest_categorical('max_features', ['sqrt', 'log2', 0.5, 0.7])
    
    # Compute class weights
    cw = compute_class_weight('balanced', classes=np.unique(y_dev_ml), y=y_dev_ml)
    class_weight = dict(enumerate(cw))
    
    # Model
    model = RandomForestClassifier(
        n_estimators=n_estimators,
        max_depth=max_depth,
        min_samples_split=min_samples_split,
        min_samples_leaf=min_samples_leaf,
        max_features=max_features,
        class_weight=class_weight,
        random_state=RANDOM_SEED,
        n_jobs=-1,
        verbose=0
    )
    
    # Stratified cross-validation
    skf = StratifiedKFold(n_splits=OPTUNA_CLASSIC_CV_SPLITS, shuffle=True, random_state=RANDOM_SEED)
    pr_auc_scores = []
    
    for train_idx, val_idx in skf.split(X_dev_ml, y_dev_ml):
        X_train, X_val = X_dev_ml[train_idx], X_dev_ml[val_idx]
        y_train, y_val = y_dev_ml[train_idx], y_dev_ml[val_idx]
        
        model.fit(X_train, y_train)
        y_pred_prob = model.predict_proba(X_val)[:, 1]
        
        pr_auc = average_precision_score(y_val, y_pred_prob)
        pr_auc_scores.append(pr_auc)
    
    return np.mean(pr_auc_scores)

print(f"\n{'='*70}")
print(f"🔄 OPTIMIZING RANDOM FOREST WITH OPTUNA")
print(f"{'='*70}")

study_rf = optuna.create_study(
    direction='maximize',
    sampler=TPESampler(seed=RANDOM_SEED, multivariate=True),
    study_name='random_forest_optimization'
)

study_rf.optimize(objective_random_forest, n_trials=OPTUNA_CLASSIC_N_TRIALS, show_progress_bar=True)

print(f"\n✓ Optimization finished!")
print(f"  Best PR-AUC (CV): {study_rf.best_value:.4f}")
print(f"\nBest Random Forest hyperparameters:")
for key, value in study_rf.best_params.items():
    print(f"  {key}: {value}")

[I 2025-11-19 01:05:53,620] A new study created in memory with name: random_forest_optimization



🔄 OPTIMIZING RANDOM FOREST WITH OPTUNA


[I 2025-11-19 01:05:56,254] Trial 0 finished with value: 0.9843426344896932 and parameters: {'n_estimators': 250, 'max_depth': 50, 'min_samples_split': 15, 'min_samples_leaf': 6, 'max_features': 0.7}. Best is trial 0 with value: 0.9843426344896932.


[I 2025-11-19 01:06:00,234] Trial 1 finished with value: 0.9707227917010524 and parameters: {'n_estimators': 350, 'max_depth': 40, 'min_samples_split': 2, 'min_samples_leaf': 10, 'max_features': 'sqrt'}. Best is trial 0 with value: 0.9843426344896932.


[I 2025-11-19 01:06:02,579] Trial 2 finished with value: 0.9807855302592146 and parameters: {'n_estimators': 200, 'max_depth': 30, 'min_samples_split': 10, 'min_samples_leaf': 3, 'max_features': 'sqrt'}. Best is trial 0 with value: 0.9843426344896932.


[I 2025-11-19 01:06:06,208] Trial 3 finished with value: 0.9843426344896932 and parameters: {'n_estimators': 300, 'max_depth': 45, 'min_samples_split': 5, 'min_samples_leaf': 6, 'max_features': 0.5}. Best is trial 0 with value: 0.9843426344896932.


[I 2025-11-19 01:06:07,443] Trial 4 finished with value: 0.9792357968828556 and parameters: {'n_estimators': 100, 'max_depth': 50, 'min_samples_split': 20, 'min_samples_leaf': 9, 'max_features': 0.5}. Best is trial 0 with value: 0.9843426344896932.


[I 2025-11-19 01:06:09,346] Trial 5 finished with value: 0.9644226499449792 and parameters: {'n_estimators': 150, 'max_depth': 30, 'min_samples_split': 2, 'min_samples_leaf': 10, 'max_features': 'log2'}. Best is trial 0 with value: 0.9843426344896932.


[I 2025-11-19 01:06:12,602] Trial 6 finished with value: 0.9724043995668709 and parameters: {'n_estimators': 300, 'max_depth': 15, 'min_samples_split': 20, 'min_samples_leaf': 8, 'max_features': 'sqrt'}. Best is trial 0 with value: 0.9843426344896932.


[I 2025-11-19 01:06:13,715] Trial 7 finished with value: 0.9860001436472021 and parameters: {'n_estimators': 100, 'max_depth': 15, 'min_samples_split': 2, 'min_samples_leaf': 4, 'max_features': 0.5}. Best is trial 7 with value: 0.9860001436472021.


[I 2025-11-19 01:06:15,647] Trial 8 finished with value: 0.9717134238310707 and parameters: {'n_estimators': 200, 'max_depth': 30, 'min_samples_split': 4, 'min_samples_leaf': 9, 'max_features': 'log2'}. Best is trial 7 with value: 0.9860001436472021.


[I 2025-11-19 01:06:16,776] Trial 9 finished with value: 0.9720420807262912 and parameters: {'n_estimators': 100, 'max_depth': 45, 'min_samples_split': 15, 'min_samples_leaf': 8, 'max_features': 'sqrt'}. Best is trial 7 with value: 0.9860001436472021.


[I 2025-11-19 01:06:18,315] Trial 10 finished with value: 0.9874847374847375 and parameters: {'n_estimators': 150, 'max_depth': 15, 'min_samples_split': 6, 'min_samples_leaf': 4, 'max_features': 0.5}. Best is trial 10 with value: 0.9874847374847375.


[I 2025-11-19 01:06:19,795] Trial 11 finished with value: 0.9874847374847372 and parameters: {'n_estimators': 150, 'max_depth': 10, 'min_samples_split': 9, 'min_samples_leaf': 4, 'max_features': 0.5}. Best is trial 10 with value: 0.9874847374847375.


[I 2025-11-19 01:06:20,984] Trial 12 finished with value: 0.9953367145674837 and parameters: {'n_estimators': 100, 'max_depth': 15, 'min_samples_split': 12, 'min_samples_leaf': 1, 'max_features': 0.5}. Best is trial 12 with value: 0.9953367145674837.


[I 2025-11-19 01:06:22,197] Trial 13 finished with value: 0.9758770260240848 and parameters: {'n_estimators': 100, 'max_depth': 20, 'min_samples_split': 11, 'min_samples_leaf': 1, 'max_features': 'log2'}. Best is trial 12 with value: 0.9953367145674837.


[I 2025-11-19 01:06:23,356] Trial 14 finished with value: 0.989642857142857 and parameters: {'n_estimators': 100, 'max_depth': 20, 'min_samples_split': 16, 'min_samples_leaf': 2, 'max_features': 0.5}. Best is trial 12 with value: 0.9953367145674837.


[I 2025-11-19 01:06:25,539] Trial 15 finished with value: 0.9978021978021976 and parameters: {'n_estimators': 200, 'max_depth': 25, 'min_samples_split': 18, 'min_samples_leaf': 1, 'max_features': 0.5}. Best is trial 15 with value: 0.9978021978021976.


[I 2025-11-19 01:06:28,305] Trial 16 finished with value: 0.989642857142857 and parameters: {'n_estimators': 300, 'max_depth': 35, 'min_samples_split': 18, 'min_samples_leaf': 3, 'max_features': 0.5}. Best is trial 15 with value: 0.9978021978021976.


[I 2025-11-19 01:06:30,619] Trial 17 finished with value: 0.9956663848971541 and parameters: {'n_estimators': 250, 'max_depth': 15, 'min_samples_split': 14, 'min_samples_leaf': 1, 'max_features': 0.7}. Best is trial 15 with value: 0.9978021978021976.


[I 2025-11-19 01:06:33,867] Trial 18 finished with value: 0.9968498168498167 and parameters: {'n_estimators': 350, 'max_depth': 15, 'min_samples_split': 16, 'min_samples_leaf': 1, 'max_features': 0.7}. Best is trial 15 with value: 0.9978021978021976.


[I 2025-11-19 01:06:37,829] Trial 19 finished with value: 0.9886904761904761 and parameters: {'n_estimators': 450, 'max_depth': 25, 'min_samples_split': 11, 'min_samples_leaf': 4, 'max_features': 0.7}. Best is trial 15 with value: 0.9978021978021976.


[I 2025-11-19 01:06:40,957] Trial 20 finished with value: 0.9935510002817693 and parameters: {'n_estimators': 350, 'max_depth': 15, 'min_samples_split': 18, 'min_samples_leaf': 2, 'max_features': 0.5}. Best is trial 15 with value: 0.9978021978021976.


[I 2025-11-19 01:06:43,657] Trial 21 finished with value: 0.9934319526627219 and parameters: {'n_estimators': 300, 'max_depth': 15, 'min_samples_split': 10, 'min_samples_leaf': 2, 'max_features': 0.7}. Best is trial 15 with value: 0.9978021978021976.


[I 2025-11-19 01:06:48,023] Trial 22 finished with value: 0.9979487179487178 and parameters: {'n_estimators': 400, 'max_depth': 25, 'min_samples_split': 20, 'min_samples_leaf': 1, 'max_features': 0.7}. Best is trial 22 with value: 0.9979487179487178.


[I 2025-11-19 01:06:52,444] Trial 23 finished with value: 0.9979487179487178 and parameters: {'n_estimators': 400, 'max_depth': 30, 'min_samples_split': 19, 'min_samples_leaf': 1, 'max_features': 0.7}. Best is trial 22 with value: 0.9979487179487178.


[I 2025-11-19 01:06:56,409] Trial 24 finished with value: 0.997948717948718 and parameters: {'n_estimators': 450, 'max_depth': 30, 'min_samples_split': 20, 'min_samples_leaf': 1, 'max_features': 0.7}. Best is trial 24 with value: 0.997948717948718.


[I 2025-11-19 01:07:00,825] Trial 25 finished with value: 0.9989010989010989 and parameters: {'n_estimators': 500, 'max_depth': 35, 'min_samples_split': 20, 'min_samples_leaf': 1, 'max_features': 0.5}. Best is trial 25 with value: 0.9989010989010989.


[I 2025-11-19 01:07:05,152] Trial 26 finished with value: 0.9935510002817696 and parameters: {'n_estimators': 500, 'max_depth': 40, 'min_samples_split': 17, 'min_samples_leaf': 2, 'max_features': 0.5}. Best is trial 25 with value: 0.9989010989010989.


[I 2025-11-19 01:07:09,488] Trial 27 finished with value: 0.9725369075369075 and parameters: {'n_estimators': 450, 'max_depth': 30, 'min_samples_split': 20, 'min_samples_leaf': 4, 'max_features': 'log2'}. Best is trial 25 with value: 0.9989010989010989.


[I 2025-11-19 01:07:13,607] Trial 28 finished with value: 0.9852813852813853 and parameters: {'n_estimators': 400, 'max_depth': 45, 'min_samples_split': 19, 'min_samples_leaf': 1, 'max_features': 'sqrt'}. Best is trial 25 with value: 0.9989010989010989.


[I 2025-11-19 01:07:18,172] Trial 29 finished with value: 0.9989010989010987 and parameters: {'n_estimators': 500, 'max_depth': 25, 'min_samples_split': 16, 'min_samples_leaf': 1, 'max_features': 0.5}. Best is trial 25 with value: 0.9989010989010989.


[I 2025-11-19 01:07:22,143] Trial 30 finished with value: 0.9935510002817693 and parameters: {'n_estimators': 450, 'max_depth': 25, 'min_samples_split': 17, 'min_samples_leaf': 2, 'max_features': 0.5}. Best is trial 25 with value: 0.9989010989010989.


[I 2025-11-19 01:07:26,685] Trial 31 finished with value: 0.9989010989010989 and parameters: {'n_estimators': 500, 'max_depth': 30, 'min_samples_split': 20, 'min_samples_leaf': 1, 'max_features': 0.5}. Best is trial 25 with value: 0.9989010989010989.


[I 2025-11-19 01:07:31,657] Trial 32 finished with value: 0.9844755244755244 and parameters: {'n_estimators': 500, 'max_depth': 20, 'min_samples_split': 17, 'min_samples_leaf': 2, 'max_features': 'sqrt'}. Best is trial 25 with value: 0.9989010989010989.


[I 2025-11-19 01:07:36,241] Trial 33 finished with value: 0.9935510002817696 and parameters: {'n_estimators': 500, 'max_depth': 30, 'min_samples_split': 10, 'min_samples_leaf': 2, 'max_features': 0.5}. Best is trial 25 with value: 0.9989010989010989.


[I 2025-11-19 01:07:40,587] Trial 34 finished with value: 0.9978021978021978 and parameters: {'n_estimators': 450, 'max_depth': 30, 'min_samples_split': 20, 'min_samples_leaf': 1, 'max_features': 0.5}. Best is trial 25 with value: 0.9989010989010989.


[I 2025-11-19 01:07:45,486] Trial 35 finished with value: 0.983627533853778 and parameters: {'n_estimators': 500, 'max_depth': 35, 'min_samples_split': 14, 'min_samples_leaf': 1, 'max_features': 'log2'}. Best is trial 25 with value: 0.9989010989010989.


[I 2025-11-19 01:07:50,107] Trial 36 finished with value: 0.997948717948718 and parameters: {'n_estimators': 500, 'max_depth': 35, 'min_samples_split': 19, 'min_samples_leaf': 1, 'max_features': 0.7}. Best is trial 25 with value: 0.9989010989010989.


[I 2025-11-19 01:07:54,647] Trial 37 finished with value: 0.9989010989010987 and parameters: {'n_estimators': 500, 'max_depth': 20, 'min_samples_split': 15, 'min_samples_leaf': 1, 'max_features': 0.5}. Best is trial 25 with value: 0.9989010989010989.


[I 2025-11-19 01:07:59,125] Trial 38 finished with value: 0.989642857142857 and parameters: {'n_estimators': 500, 'max_depth': 10, 'min_samples_split': 14, 'min_samples_leaf': 4, 'max_features': 0.5}. Best is trial 25 with value: 0.9989010989010989.


[I 2025-11-19 01:08:03,647] Trial 39 finished with value: 0.9989010989010987 and parameters: {'n_estimators': 500, 'max_depth': 10, 'min_samples_split': 15, 'min_samples_leaf': 1, 'max_features': 0.5}. Best is trial 25 with value: 0.9989010989010989.


[I 2025-11-19 01:08:08,260] Trial 40 finished with value: 0.9989010989010989 and parameters: {'n_estimators': 500, 'max_depth': 25, 'min_samples_split': 20, 'min_samples_leaf': 1, 'max_features': 0.5}. Best is trial 25 with value: 0.9989010989010989.


[I 2025-11-19 01:08:12,842] Trial 41 finished with value: 0.9935510002817693 and parameters: {'n_estimators': 500, 'max_depth': 20, 'min_samples_split': 19, 'min_samples_leaf': 2, 'max_features': 0.5}. Best is trial 25 with value: 0.9989010989010989.


[I 2025-11-19 01:08:17,433] Trial 42 finished with value: 0.9896428571428568 and parameters: {'n_estimators': 500, 'max_depth': 35, 'min_samples_split': 20, 'min_samples_leaf': 3, 'max_features': 0.5}. Best is trial 25 with value: 0.9989010989010989.


[I 2025-11-19 01:08:21,987] Trial 43 finished with value: 0.9816614757791227 and parameters: {'n_estimators': 500, 'max_depth': 20, 'min_samples_split': 16, 'min_samples_leaf': 1, 'max_features': 'log2'}. Best is trial 25 with value: 0.9989010989010989.


[I 2025-11-19 01:08:26,611] Trial 44 finished with value: 0.9989010989010989 and parameters: {'n_estimators': 500, 'max_depth': 35, 'min_samples_split': 20, 'min_samples_leaf': 1, 'max_features': 0.5}. Best is trial 25 with value: 0.9989010989010989.


[I 2025-11-19 01:08:31,444] Trial 45 finished with value: 0.9999999999999998 and parameters: {'n_estimators': 400, 'max_depth': 40, 'min_samples_split': 18, 'min_samples_leaf': 1, 'max_features': 0.5}. Best is trial 45 with value: 0.9999999999999998.


[I 2025-11-19 01:08:35,491] Trial 46 finished with value: 0.9989010989010989 and parameters: {'n_estimators': 400, 'max_depth': 45, 'min_samples_split': 20, 'min_samples_leaf': 1, 'max_features': 0.5}. Best is trial 45 with value: 0.9999999999999998.


[I 2025-11-19 01:08:37,525] Trial 47 finished with value: 0.9956663848971541 and parameters: {'n_estimators': 200, 'max_depth': 45, 'min_samples_split': 17, 'min_samples_leaf': 1, 'max_features': 0.7}. Best is trial 45 with value: 0.9999999999999998.


[I 2025-11-19 01:08:40,032] Trial 48 finished with value: 0.9978021978021976 and parameters: {'n_estimators': 250, 'max_depth': 45, 'min_samples_split': 12, 'min_samples_leaf': 1, 'max_features': 0.5}. Best is trial 45 with value: 0.9999999999999998.


[I 2025-11-19 01:08:44,862] Trial 49 finished with value: 0.9989010989010989 and parameters: {'n_estimators': 500, 'max_depth': 30, 'min_samples_split': 20, 'min_samples_leaf': 1, 'max_features': 0.5}. Best is trial 45 with value: 0.9999999999999998.

✓ Optimization finished!
  Best PR-AUC (CV): 1.0000

Best Random Forest hyperparameters:
  n_estimators: 400
  max_depth: 40
  min_samples_split: 18
  min_samples_leaf: 1
  max_features: 0.5
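
Many trials in the log share the exact score 0.9989010989. A plausible reading, sketched below, is that the cross-validated PR-AUC is quantized: with 66 positives spread over 5 stratified folds, each validation fold holds only 13 or 14 positives, so a single ranking mistake shifts the mean by a fixed step. The helper `stratified_fold_sizes` and the choice of which folds receive the extra samples are assumptions for illustration; scikit-learn's `StratifiedKFold` may assign them differently.

```python
def stratified_fold_sizes(class_counts, n_splits):
    """Per-fold validation counts when each class is spread as evenly as possible."""
    folds = [dict() for _ in range(n_splits)]
    for label, count in class_counts.items():
        base, extra = divmod(count, n_splits)
        for k in range(n_splits):
            folds[k][label] = base + (1 if k < extra else 0)
    return folds


folds = stratified_fold_sizes({0: 734, 1: 66}, n_splits=5)
positives_per_fold = [fold[1] for fold in folds]
print("positives per validation fold:", positives_per_fold)

# One hypothetical scenario: in a fold with 13 positives, a single negative is
# ranked just above the lowest-scored positive. Twelve positives are retrieved
# at precision 1.0 and the last one at precision 13/14.
ap_one_mistake = (12 * 1.0 + 13 / 14) / 13
mean_cv_pr_auc = (4 * 1.0 + ap_one_mistake) / 5
print(f"fold AP with one mistake   = {ap_one_mistake:.10f}")
print(f"mean PR-AUC over five folds = {mean_cv_pr_auc:.10f}")
```

Under this scenario the difference between the best trial (1.0000) and the many trials at 0.9989 amounts to one mis-ranked sample, so the ranking among the top configurations should be read with caution.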





In [8]:
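# Combine the whole dataset (Dev + Test)
X_all_ml = np.vstack([X_dev_ml, X_test_ml])
y_all_ml = np.hstack([y_dev_ml, y_test_ml])

# Evaluate Random Forest with the best hyperparameters using stratified K-Fold.
# NOTE: the dev portion was also used for the hyperparameter search and for fitting
# the scaler, so these CV scores are likely optimistic.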
best_rf_params = study_rf.best_params
best_rf_model = RandomForestClassifier(
    n_estimators=best_rf_params['n_estimators'],
    max_depth=best_rf_params['max_depth'],
    min_samples_split=best_rf_params['min_samples_split'],
    min_samples_leaf=best_rf_params['min_samples_leaf'],
    max_features=best_rf_params['max_features'],
    class_weight=class_weight_rf,
    random_state=RANDOM_SEED,
    n_jobs=-1,
    verbose=0
)

skf = StratifiedKFold(n_splits=OPTUNA_CLASSIC_CV_SPLITS, shuffle=True, random_state=RANDOM_SEED)
pr_auc_scores_final_RF, roc_auc_scores_final_RF, f1_scores_final_RF, precision_scores_final_RF, \
recall_scores_final_RF, accuracy_scores_final_RF, cm_final_RF = [], [], [], [], [], [], []
for train_idx, val_idx in skf.split(X_all_ml, y_all_ml):
    X_train, X_val = X_all_ml[train_idx], X_all_ml[val_idx]
    y_train, y_val = y_all_ml[train_idx], y_all_ml[val_idx]
    
    best_rf_model.fit(X_train, y_train)
    y_pred_prob = best_rf_model.predict_proba(X_val)[:, 1]
    y_pred_label = (y_pred_prob >= 0.5).astype(int)  # fixed 0.5 decision threshold
    
    pr_auc = average_precision_score(y_val, y_pred_prob)
    roc_auc = roc_auc_score(y_val, y_pred_prob)
    f1 = f1_score(y_val, y_pred_label, zero_division=0)
    precision = precision_score(y_val, y_pred_label, zero_division=0)
    recall = recall_score(y_val, y_pred_label, zero_division=0)
    accuracy = accuracy_score(y_val, y_pred_label)
    cm = confusion_matrix(y_val, y_pred_label)

    pr_auc_scores_final_RF.append(pr_auc)
    roc_auc_scores_final_RF.append(roc_auc)
    f1_scores_final_RF.append(f1)
    precision_scores_final_RF.append(precision)
    recall_scores_final_RF.append(recall)
    accuracy_scores_final_RF.append(accuracy)
    cm_final_RF.append(cm)

mean_pr_auc_final_RF = np.mean(pr_auc_scores_final_RF)
mean_roc_auc_final_RF = np.mean(roc_auc_scores_final_RF)
mean_f1_final_RF = np.mean(f1_scores_final_RF)
mean_precision_final_RF = np.mean(precision_scores_final_RF)
mean_recall_final_RF = np.mean(recall_scores_final_RF)
mean_accuracy_final_RF = np.mean(accuracy_scores_final_RF)
total_cm_RF = np.sum(cm_final_RF, axis=0)

print(f"\n✓ Final Random Forest evaluation (CV {OPTUNA_CLASSIC_CV_SPLITS}-fold): PR-AUC = {mean_pr_auc_final_RF:.4f}, ROC-AUC = {mean_roc_auc_final_RF:.4f}, F1 = {mean_f1_final_RF:.4f}, Precision = {mean_precision_final_RF:.4f}, Recall = {mean_recall_final_RF:.4f}, Accuracy = {mean_accuracy_final_RF:.4f}")
print(f"Accumulated confusion matrix:\n{total_cm_RF}")


✓ Final Random Forest evaluation (CV 5-fold): PR-AUC = 0.9993, ROC-AUC = 0.9999, F1 = 0.9681, Precision = 1.0000, Recall = 0.9397, Accuracy = 0.9950
Accumulated confusion matrix:
[[918   0]
 [  5  77]]
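
The scalar metrics printed above are means of per-fold values, whereas the confusion matrix is summed over folds. As a cross-check, the sketch below derives pooled metrics directly from the accumulated matrix, assuming scikit-learn's layout `[[TN, FP], [FN, TP]]` for labels 0 and 1. The helper name `pooled_metrics` is hypothetical and not part of the notebook.

```python
def pooled_metrics(cm):
    """Metrics from a 2x2 confusion matrix laid out as [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = cm
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Accumulated Random Forest matrix from the output above
# (rows: true class, columns: predicted class)
rf_cm = [[918, 0], [5, 77]]
rf_pooled = pooled_metrics(rf_cm)
for name, value in rf_pooled.items():
    print(f"{name}: {value:.4f}")
```

The pooled recall (77/82 ≈ 0.9390) and F1 (154/159 ≈ 0.9686) differ slightly from the fold-averaged 0.9397 and 0.9681, because a mean of per-fold ratios is not the same as a ratio of pooled counts. Precision and accuracy agree with the reported 1.0000 and 0.9950.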


### 5.2 _SVM_

In [9]:
# 2️⃣ SVM optimization with Optuna
def objective_svm(trial):
    """
    Objective function for tuning the SVM, using PR-AUC as the metric.
    """
    # Suggest hyperparameters
    C = trial.suggest_float('C', 0.01, 100, log=True)
    kernel = trial.suggest_categorical('kernel', ['rbf', 'poly', 'sigmoid'])
    
    if kernel == 'poly':
        degree = trial.suggest_int('degree', 2, 5)
    else:
        degree = 3  # default value; only the 'poly' kernel uses it
    
    gamma = trial.suggest_categorical('gamma', ['scale', 'auto'])
    
    # Compute class weights
    cw = compute_class_weight('balanced', classes=np.unique(y_dev_ml), y=y_dev_ml)
    class_weight = dict(enumerate(cw))
    
    # Model
    model = SVC(
        C=C,
        kernel=kernel,
        degree=degree,
        gamma=gamma,
        class_weight=class_weight,
        probability=True,
        random_state=RANDOM_SEED,
        max_iter=2000
    )
    
    # Stratified cross-validation
    skf = StratifiedKFold(n_splits=OPTUNA_CLASSIC_CV_SPLITS, shuffle=True, random_state=RANDOM_SEED)
    pr_auc_scores = []
    
    for train_idx, val_idx in skf.split(X_dev_ml, y_dev_ml):
        X_train, X_val = X_dev_ml[train_idx], X_dev_ml[val_idx]
        y_train, y_val = y_dev_ml[train_idx], y_dev_ml[val_idx]
        
        model.fit(X_train, y_train)
        y_pred_prob = model.predict_proba(X_val)[:, 1]
        
        pr_auc = average_precision_score(y_val, y_pred_prob)
        pr_auc_scores.append(pr_auc)
    
    return np.mean(pr_auc_scores)

print(f"\n{'='*70}")
print("🔄 OPTIMIZING SVM WITH OPTUNA")
print(f"{'='*70}")

study_svm = optuna.create_study(
    direction='maximize',
    sampler=TPESampler(seed=RANDOM_SEED, multivariate=True),
    study_name='svm_optimization'
)

study_svm.optimize(objective_svm, n_trials=OPTUNA_CLASSIC_N_TRIALS, show_progress_bar=True)

print("\n✓ Optimization complete!")
print(f"  Best PR-AUC (CV): {study_svm.best_value:.4f}")
print("\nBest SVM hyperparameters:")
for key, value in study_svm.best_params.items():
    print(f"  {key}: {value}")

[I 2025-11-19 01:08:48,944] A new study created in memory with name: svm_optimization



🔄 OPTIMIZING SVM WITH OPTUNA


[I 2025-11-19 01:08:49,176] Trial 0 finished with value: 0.6999203568474335 and parameters: {'C': 0.31489116479568624, 'kernel': 'rbf', 'gamma': 'scale'}. Best is trial 0 with value: 0.6999203568474335.


[I 2025-11-19 01:08:49,435] Trial 1 finished with value: 0.05366761922915301 and parameters: {'C': 0.017073967431528128, 'kernel': 'rbf', 'gamma': 'auto'}. Best is trial 0 with value: 0.6999203568474335.
[I 2025-11-19 01:08:49,540] Trial 2 finished with value: 0.7378298571144055 and parameters: {'C': 21.368329072358772, 'kernel': 'rbf', 'gamma': 'auto'}. Best is trial 2 with value: 0.7378298571144055.


[I 2025-11-19 01:08:49,680] Trial 3 finished with value: 0.6571136928361231 and parameters: {'C': 0.5342937261279778, 'kernel': 'poly', 'degree': 3, 'gamma': 'auto'}. Best is trial 2 with value: 0.7378298571144055.
[I 2025-11-19 01:08:49,845] Trial 4 finished with value: 0.4065922460101108 and parameters: {'C': 13.826232179369875, 'kernel': 'sigmoid', 'gamma': 'auto'}. Best is trial 2 with value: 0.7378298571144055.


[I 2025-11-19 01:08:50,192] Trial 5 finished with value: 0.17147597186002542 and parameters: {'C': 0.04809461967501574, 'kernel': 'sigmoid', 'gamma': 'scale'}. Best is trial 2 with value: 0.7378298571144055.


[I 2025-11-19 01:08:50,452] Trial 6 finished with value: 0.05355049418430511 and parameters: {'C': 0.024586032763280065, 'kernel': 'rbf', 'gamma': 'scale'}. Best is trial 2 with value: 0.7378298571144055.
[I 2025-11-19 01:08:50,555] Trial 7 finished with value: 0.5704894103937516 and parameters: {'C': 43.37920697490943, 'kernel': 'poly', 'degree': 4, 'gamma': 'scale'}. Best is trial 2 with value: 0.7378298571144055.


[I 2025-11-19 01:08:50,657] Trial 8 finished with value: 0.5704894103937516 and parameters: {'C': 75.56810141274431, 'kernel': 'poly', 'degree': 4, 'gamma': 'scale'}. Best is trial 2 with value: 0.7378298571144055.


[I 2025-11-19 01:08:51,008] Trial 9 finished with value: 0.3600391715918882 and parameters: {'C': 0.06080390190296603, 'kernel': 'sigmoid', 'gamma': 'auto'}. Best is trial 2 with value: 0.7378298571144055.
[I 2025-11-19 01:08:51,124] Trial 10 finished with value: 0.7407590126817438 and parameters: {'C': 16.99454838747885, 'kernel': 'rbf', 'gamma': 'auto'}. Best is trial 10 with value: 0.7407590126817438.


[I 2025-11-19 01:08:51,238] Trial 11 finished with value: 0.7473766314435785 and parameters: {'C': 67.94364327569589, 'kernel': 'rbf', 'gamma': 'auto'}. Best is trial 11 with value: 0.7473766314435785.
[I 2025-11-19 01:08:51,427] Trial 12 finished with value: 0.754829894961593 and parameters: {'C': 0.7377223791015205, 'kernel': 'rbf', 'gamma': 'auto'}. Best is trial 12 with value: 0.754829894961593.


[I 2025-11-19 01:08:51,595] Trial 13 finished with value: 0.7518904817049991 and parameters: {'C': 1.1858195623063486, 'kernel': 'rbf', 'gamma': 'auto'}. Best is trial 12 with value: 0.754829894961593.
[I 2025-11-19 01:08:51,779] Trial 14 finished with value: 0.7656968826127857 and parameters: {'C': 0.8732669631317976, 'kernel': 'rbf', 'gamma': 'auto'}. Best is trial 14 with value: 0.7656968826127857.


[I 2025-11-19 01:08:52,019] Trial 15 finished with value: 0.6701216648004431 and parameters: {'C': 0.3284615846446105, 'kernel': 'rbf', 'gamma': 'auto'}. Best is trial 14 with value: 0.7656968826127857.
[I 2025-11-19 01:08:52,139] Trial 16 finished with value: 0.7050252624116122 and parameters: {'C': 4.457701631692534, 'kernel': 'rbf', 'gamma': 'scale'}. Best is trial 14 with value: 0.7656968826127857.


[I 2025-11-19 01:08:52,334] Trial 17 finished with value: 0.2882230786523798 and parameters: {'C': 1.7830056431907009, 'kernel': 'sigmoid', 'gamma': 'scale'}. Best is trial 14 with value: 0.7656968826127857.
[I 2025-11-19 01:08:52,531] Trial 18 finished with value: 0.19169733823622875 and parameters: {'C': 0.019358455698582574, 'kernel': 'poly', 'degree': 2, 'gamma': 'auto'}. Best is trial 14 with value: 0.7656968826127857.


[I 2025-11-19 01:08:52,647] Trial 19 finished with value: 0.5474414931465998 and parameters: {'C': 0.2914934441601457, 'kernel': 'poly', 'degree': 5, 'gamma': 'scale'}. Best is trial 14 with value: 0.7656968826127857.


[I 2025-11-19 01:08:52,812] Trial 20 finished with value: 0.7469922930796061 and parameters: {'C': 1.2849783346650554, 'kernel': 'rbf', 'gamma': 'auto'}. Best is trial 14 with value: 0.7656968826127857.


[I 2025-11-19 01:08:53,102] Trial 21 finished with value: 0.5788028506277372 and parameters: {'C': 0.7890976148310083, 'kernel': 'sigmoid', 'gamma': 'auto'}. Best is trial 14 with value: 0.7656968826127857.
[I 2025-11-19 01:08:53,252] Trial 22 finished with value: 0.7528134455680512 and parameters: {'C': 1.5732309784936742, 'kernel': 'rbf', 'gamma': 'auto'}. Best is trial 14 with value: 0.7656968826127857.


[I 2025-11-19 01:08:53,483] Trial 23 finished with value: 0.6027258843132057 and parameters: {'C': 0.21604100162060555, 'kernel': 'rbf', 'gamma': 'auto'}. Best is trial 14 with value: 0.7656968826127857.
[I 2025-11-19 01:08:53,618] Trial 24 finished with value: 0.7474846894332858 and parameters: {'C': 2.482596793707827, 'kernel': 'rbf', 'gamma': 'auto'}. Best is trial 14 with value: 0.7656968826127857.


[I 2025-11-19 01:08:53,803] Trial 25 finished with value: 0.7671926091085121 and parameters: {'C': 0.8806857709923557, 'kernel': 'rbf', 'gamma': 'auto'}. Best is trial 25 with value: 0.7671926091085121.


[I 2025-11-19 01:08:54,056] Trial 26 finished with value: 0.4307462042411405 and parameters: {'C': 0.1080763221185819, 'kernel': 'rbf', 'gamma': 'auto'}. Best is trial 25 with value: 0.7671926091085121.
[I 2025-11-19 01:08:54,160] Trial 27 finished with value: 0.6832281567188921 and parameters: {'C': 20.50045492379971, 'kernel': 'poly', 'degree': 2, 'gamma': 'auto'}. Best is trial 25 with value: 0.7671926091085121.


[I 2025-11-19 01:08:54,276] Trial 28 finished with value: 0.5255245013965698 and parameters: {'C': 3.1513047134449645, 'kernel': 'poly', 'degree': 5, 'gamma': 'auto'}. Best is trial 25 with value: 0.7671926091085121.


[I 2025-11-19 01:08:54,503] Trial 29 finished with value: 0.6461923171993582 and parameters: {'C': 0.2561604412553385, 'kernel': 'rbf', 'gamma': 'scale'}. Best is trial 25 with value: 0.7671926091085121.
[I 2025-11-19 01:08:54,687] Trial 30 finished with value: 0.7552786925760502 and parameters: {'C': 0.6896669260054548, 'kernel': 'rbf', 'gamma': 'auto'}. Best is trial 25 with value: 0.7671926091085121.


[I 2025-11-19 01:08:54,882] Trial 31 finished with value: 0.7509030174364799 and parameters: {'C': 0.6233569256995345, 'kernel': 'rbf', 'gamma': 'auto'}. Best is trial 25 with value: 0.7671926091085121.
[I 2025-11-19 01:08:55,073] Trial 32 finished with value: 0.7491740299256179 and parameters: {'C': 0.6024697500028565, 'kernel': 'rbf', 'gamma': 'auto'}. Best is trial 25 with value: 0.7671926091085121.


[I 2025-11-19 01:08:55,234] Trial 33 finished with value: 0.7198550680493663 and parameters: {'C': 1.0026345850666039, 'kernel': 'rbf', 'gamma': 'scale'}. Best is trial 25 with value: 0.7671926091085121.
[I 2025-11-19 01:08:55,416] Trial 34 finished with value: 0.7558101567161697 and parameters: {'C': 0.7491606138790815, 'kernel': 'rbf', 'gamma': 'auto'}. Best is trial 25 with value: 0.7671926091085121.


[I 2025-11-19 01:08:55,610] Trial 35 finished with value: 0.5742438791566612 and parameters: {'C': 0.16404389696438096, 'kernel': 'poly', 'degree': 3, 'gamma': 'auto'}. Best is trial 25 with value: 0.7671926091085121.


[I 2025-11-19 01:08:55,865] Trial 36 finished with value: 0.5027002595552273 and parameters: {'C': 1.4919964709704883, 'kernel': 'sigmoid', 'gamma': 'auto'}. Best is trial 25 with value: 0.7671926091085121.
[I 2025-11-19 01:08:56,031] Trial 37 finished with value: 0.25375525719000064 and parameters: {'C': 38.33762010859697, 'kernel': 'sigmoid', 'gamma': 'scale'}. Best is trial 25 with value: 0.7671926091085121.


[I 2025-11-19 01:08:56,158] Trial 38 finished with value: 0.7470431171330469 and parameters: {'C': 3.140214041140249, 'kernel': 'rbf', 'gamma': 'auto'}. Best is trial 25 with value: 0.7671926091085121.
[I 2025-11-19 01:08:56,329] Trial 39 finished with value: 0.47480765634676186 and parameters: {'C': 0.010867150705486833, 'kernel': 'poly', 'degree': 5, 'gamma': 'scale'}. Best is trial 25 with value: 0.7671926091085121.


[I 2025-11-19 01:08:56,569] Trial 40 finished with value: 0.5916841711168885 and parameters: {'C': 0.20555189744955607, 'kernel': 'rbf', 'gamma': 'auto'}. Best is trial 25 with value: 0.7671926091085121.
[I 2025-11-19 01:08:56,745] Trial 41 finished with value: 0.7659914988958217 and parameters: {'C': 0.8970146689493657, 'kernel': 'rbf', 'gamma': 'auto'}. Best is trial 25 with value: 0.7671926091085121.


[I 2025-11-19 01:08:56,917] Trial 42 finished with value: 0.7565211662697034 and parameters: {'C': 1.0742311435006069, 'kernel': 'rbf', 'gamma': 'auto'}. Best is trial 25 with value: 0.7671926091085121.
[I 2025-11-19 01:08:57,046] Trial 43 finished with value: 0.7483123112099441 and parameters: {'C': 3.0427403787900436, 'kernel': 'rbf', 'gamma': 'auto'}. Best is trial 25 with value: 0.7671926091085121.


[I 2025-11-19 01:08:57,185] Trial 44 finished with value: 0.6953667648971893 and parameters: {'C': 0.7841010981927086, 'kernel': 'poly', 'degree': 2, 'gamma': 'auto'}. Best is trial 25 with value: 0.7671926091085121.
[I 2025-11-19 01:08:57,370] Trial 45 finished with value: 0.7599124957238746 and parameters: {'C': 0.7724087164117297, 'kernel': 'rbf', 'gamma': 'auto'}. Best is trial 25 with value: 0.7671926091085121.


[I 2025-11-19 01:08:57,523] Trial 46 finished with value: 0.7452208170274434 and parameters: {'C': 2.0904154295218462, 'kernel': 'rbf', 'gamma': 'auto'}. Best is trial 25 with value: 0.7671926091085121.
[I 2025-11-19 01:08:57,720] Trial 47 finished with value: 0.7490782111102868 and parameters: {'C': 0.6035116226619337, 'kernel': 'rbf', 'gamma': 'auto'}. Best is trial 25 with value: 0.7671926091085121.


[I 2025-11-19 01:08:57,887] Trial 48 finished with value: 0.7539402542176717 and parameters: {'C': 1.1340037347896497, 'kernel': 'rbf', 'gamma': 'auto'}. Best is trial 25 with value: 0.7671926091085121.


[I 2025-11-19 01:08:58,225] Trial 49 finished with value: 0.6028439927115569 and parameters: {'C': 0.4323397492584797, 'kernel': 'sigmoid', 'gamma': 'auto'}. Best is trial 25 with value: 0.7671926091085121.

✓ Optimization complete!
  Best PR-AUC (CV): 0.7672

Best SVM hyperparameters:
  C: 0.8806857709923557
  kernel: rbf
  gamma: auto
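
The search space lets `gamma` take the string values `'scale'` and `'auto'`. In scikit-learn's `SVC` these resolve to `1 / (n_features * X.var())` and `1 / n_features`, respectively. The sketch below evaluates both formulas on a hypothetical toy matrix in plain Python; the helper name `svc_gamma` and the values in `X_toy` are assumptions made for illustration, not the notebook's data.

```python
from statistics import pvariance


def svc_gamma(X, mode):
    """Kernel coefficient that SVC derives for gamma='scale' or gamma='auto'."""
    n_features = len(X[0])
    if mode == "auto":
        return 1.0 / n_features
    if mode == "scale":
        # Population variance over every entry of X, matching X.var()
        flat = [value for row in X for value in row]
        return 1.0 / (n_features * pvariance(flat))
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical 3-sample, 2-feature matrix with very different feature ranges
X_toy = [[0.0, 10.0], [2.0, 30.0], [4.0, 50.0]]
gamma_auto = svc_gamma(X_toy, "auto")
gamma_scale = svc_gamma(X_toy, "scale")
print(f"auto = {gamma_auto:.4f}, scale = {gamma_scale:.6f}")
```

When every feature is standardized to zero mean and unit variance, `X.var()` equals 1 and the two settings coincide. The trials above score differently under the two settings, which may indicate that the overall variance of `X_dev_ml` is not 1, although this section alone does not show how the features were scaled.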





In [10]:
cw_svm = compute_class_weight('balanced', classes=np.unique(y_dev_ml), y=y_dev_ml)
class_weight_svm = dict(enumerate(cw_svm))

# Train the final SVM with the best hyperparameters and evaluate it with stratified K-fold CV
best_svm_params = study_svm.best_params
best_svm_model = SVC(
    C=best_svm_params['C'],
    kernel=best_svm_params['kernel'],
    degree=best_svm_params.get('degree', 3),
    gamma=best_svm_params['gamma'],
    class_weight=class_weight_svm,
    probability=True,
    random_state=RANDOM_SEED,
    max_iter=2000
)

skf = StratifiedKFold(n_splits=OPTUNA_CLASSIC_CV_SPLITS, shuffle=True, random_state=RANDOM_SEED)
pr_auc_scores_final_SVM, roc_auc_scores_final_SVM, \
f1_scores_final_SVM, precision_scores_final_SVM, recall_scores_final_SVM, \
accuracy_scores_final_SVM, cm_final_SVM = [], [], [], [], [], [], []
for train_idx, val_idx in skf.split(X_all_ml, y_all_ml):
    X_train, X_val = X_all_ml[train_idx], X_all_ml[val_idx]
    y_train, y_val = y_all_ml[train_idx], y_all_ml[val_idx]

    best_svm_model.fit(X_train, y_train)
    y_pred_prob = best_svm_model.predict_proba(X_val)[:, 1]
    # Hard labels at the 0.5 threshold, computed once and reused below
    y_pred = (y_pred_prob >= 0.5).astype(int)

    # Ranking metrics use the probabilities; the others use the hard labels
    pr_auc = average_precision_score(y_val, y_pred_prob)
    roc_auc = roc_auc_score(y_val, y_pred_prob)
    f1 = f1_score(y_val, y_pred, zero_division=0)
    precision = precision_score(y_val, y_pred, zero_division=0)
    recall = recall_score(y_val, y_pred, zero_division=0)
    accuracy = accuracy_score(y_val, y_pred)
    cm = confusion_matrix(y_val, y_pred)

    pr_auc_scores_final_SVM.append(pr_auc)
    roc_auc_scores_final_SVM.append(roc_auc)
    f1_scores_final_SVM.append(f1)
    precision_scores_final_SVM.append(precision)
    recall_scores_final_SVM.append(recall)
    accuracy_scores_final_SVM.append(accuracy)
    cm_final_SVM.append(cm)

mean_pr_auc_final_SVM = np.mean(pr_auc_scores_final_SVM)
mean_roc_auc_final_SVM = np.mean(roc_auc_scores_final_SVM)
mean_f1_final_SVM = np.mean(f1_scores_final_SVM)
mean_precision_final_SVM = np.mean(precision_scores_final_SVM)
mean_recall_final_SVM = np.mean(recall_scores_final_SVM)
mean_accuracy_final_SVM = np.mean(accuracy_scores_final_SVM)
total_cm_SVM = np.sum(cm_final_SVM, axis=0)

print(f"\n✓ Final SVM evaluation (CV {OPTUNA_CLASSIC_CV_SPLITS}-fold): PR-AUC = {mean_pr_auc_final_SVM:.4f}, ROC-AUC = {mean_roc_auc_final_SVM:.4f}, F1 = {mean_f1_final_SVM:.4f}, Precision = {mean_precision_final_SVM:.4f}, Recall = {mean_recall_final_SVM:.4f}, Accuracy = {mean_accuracy_final_SVM:.4f}")
print(f"Accumulated confusion matrix:\n{total_cm_SVM}")


✓ Final SVM evaluation (CV 5-fold): PR-AUC = 0.7422, ROC-AUC = 0.9580, F1 = 0.6127, Precision = 0.7236, Recall = 0.5515, Accuracy = 0.9440
Accumulated confusion matrix:
[[899  19]
 [ 37  45]]
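
The Random Forest and SVM cells handle class imbalance through `compute_class_weight('balanced', ...)`, and the XGBoost objective in section 5.3 uses `scale_pos_weight`. The sketch below reproduces both heuristics in plain Python to show how they relate. The helper names are hypothetical, and the labels are built from the pooled class counts in the confusion matrices (918 negatives, 82 positives) purely for illustration; the notebook computes its weights on `y_dev_ml` only, whose counts are not shown in this section.

```python
from collections import Counter


def balanced_class_weights(y):
    """The 'balanced' heuristic: n_samples / (n_classes * count_c) per class."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in sorted(counts.items())}


def scale_pos_weight(y):
    """Negative-to-positive ratio, as computed in the XGBoost objective."""
    counts = Counter(y)
    return counts[0] / counts[1]


# Illustrative labels only: 918 negatives and 82 positives
y_example = [0] * 918 + [1] * 82
weights = balanced_class_weights(y_example)
ratio = scale_pos_weight(y_example)
print(f"class weights: {weights}")
print(f"scale_pos_weight: {ratio:.4f}")
print(f"weight ratio (positive / negative): {weights[1] / weights[0]:.4f}")
```

Both heuristics give the positive class the same relative emphasis: the ratio of the two balanced weights equals the negative-to-positive ratio used for `scale_pos_weight`.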


### 5.3 _XGBoost_

In [11]:
# 3️⃣ XGBoost optimization with Optuna
def objective_xgboost(trial):
    """
    Objective function for tuning XGBoost, using PR-AUC as the metric.
    """
    # Compute scale_pos_weight to offset the class imbalance
    scale_pos_weight = (y_dev_ml == 0).sum() / (y_dev_ml == 1).sum()
    
    # Suggest hyperparameters
    n_estimators = trial.suggest_int('n_estimators', 100, 500, step=50)
    max_depth = trial.suggest_int('max_depth', 3, 10)
    learning_rate = trial.suggest_float('learning_rate', 0.01, 0.3, log=True)
    subsample = trial.suggest_float('subsample', 0.6, 1.0)
    colsample_bytree = trial.suggest_float('colsample_bytree', 0.6, 1.0)
    min_child_weight = trial.suggest_int('min_child_weight', 1, 10)
    gamma = trial.suggest_float('gamma', 0.0, 5.0)
    reg_alpha = trial.suggest_float('reg_alpha', 0.0, 10.0)
    reg_lambda = trial.suggest_float('reg_lambda', 0.0, 10.0)
    
    # Model
    model = XGBClassifier(
        n_estimators=n_estimators,
        max_depth=max_depth,
        learning_rate=learning_rate,
        subsample=subsample,
        colsample_bytree=colsample_bytree,
        min_child_weight=min_child_weight,
        gamma=gamma,
        reg_alpha=reg_alpha,
        reg_lambda=reg_lambda,
        scale_pos_weight=scale_pos_weight,
        random_state=RANDOM_SEED,
        n_jobs=-1,
        verbosity=0,
        eval_metric='logloss'
    )
    
    # Stratified cross-validation
    skf = StratifiedKFold(n_splits=OPTUNA_CLASSIC_CV_SPLITS, shuffle=True, random_state=RANDOM_SEED)
    pr_auc_scores = []
    
    for train_idx, val_idx in skf.split(X_dev_ml, y_dev_ml):
        X_train, X_val = X_dev_ml[train_idx], X_dev_ml[val_idx]
        y_train, y_val = y_dev_ml[train_idx], y_dev_ml[val_idx]
        
        model.fit(X_train, y_train)
        y_pred_prob = model.predict_proba(X_val)[:, 1]
        
        pr_auc = average_precision_score(y_val, y_pred_prob)
        pr_auc_scores.append(pr_auc)
    
    return np.mean(pr_auc_scores)

print(f"\n{'='*70}")
print("🔄 OPTIMIZING XGBOOST WITH OPTUNA")
print(f"{'='*70}")

study_xgb = optuna.create_study(
    direction='maximize',
    sampler=TPESampler(seed=RANDOM_SEED, multivariate=True),
    study_name='xgboost_optimization'
)

study_xgb.optimize(objective_xgboost, n_trials=OPTUNA_CLASSIC_N_TRIALS, show_progress_bar=True)

print("\n✓ Optimization complete!")
print(f"  Best PR-AUC (CV): {study_xgb.best_value:.4f}")
print("\nBest XGBoost hyperparameters:")
for key, value in study_xgb.best_params.items():
    print(f"  {key}: {value}")

[I 2025-11-19 01:08:58,574] A new study created in memory with name: xgboost_optimization



🔄 OPTIMIZING XGBOOST WITH OPTUNA


[I 2025-11-19 01:10:20,812] Trial 0 finished with value: 0.9635851105416322 and parameters: {'n_estimators': 250, 'max_depth': 10, 'learning_rate': 0.1205712628744377, 'subsample': 0.8394633936788146, 'colsample_bytree': 0.6624074561769746, 'min_child_weight': 2, 'gamma': 0.2904180608409973, 'reg_alpha': 8.661761457749352, 'reg_lambda': 6.011150117432088}. Best is trial 0 with value: 0.9635851105416322.


[I 2025-11-19 01:10:51,392] Trial 1 finished with value: 0.9696470486035704 and parameters: {'n_estimators': 400, 'max_depth': 3, 'learning_rate': 0.2708160864249968, 'subsample': 0.9329770563201687, 'colsample_bytree': 0.6849356442713105, 'min_child_weight': 2, 'gamma': 0.9170225492671691, 'reg_alpha': 3.0424224295953772, 'reg_lambda': 5.247564316322379}. Best is trial 1 with value: 0.9696470486035704.


[I 2025-11-19 01:10:51,661] Trial 2 finished with value: 0.9628673848574424 and parameters: {'n_estimators': 250, 'max_depth': 5, 'learning_rate': 0.08012737503998542, 'subsample': 0.6557975442608167, 'colsample_bytree': 0.7168578594140873, 'min_child_weight': 4, 'gamma': 2.28034992108518, 'reg_alpha': 7.851759613930136, 'reg_lambda': 1.9967378215835974}. Best is trial 1 with value: 0.9696470486035704.


[I 2025-11-19 01:11:06,788] Trial 3 finished with value: 0.9660820158102765 and parameters: {'n_estimators': 300, 'max_depth': 7, 'learning_rate': 0.011711509955524094, 'subsample': 0.8430179407605753, 'colsample_bytree': 0.6682096494749166, 'min_child_weight': 1, 'gamma': 4.7444276862666666, 'reg_alpha': 9.656320330745594, 'reg_lambda': 8.08397348116461}. Best is trial 1 with value: 0.9696470486035704.


[I 2025-11-19 01:11:07,276] Trial 4 finished with value: 0.9638273244680565 and parameters: {'n_estimators': 200, 'max_depth': 3, 'learning_rate': 0.1024932221692416, 'subsample': 0.7760609974958406, 'colsample_bytree': 0.6488152939379115, 'min_child_weight': 5, 'gamma': 0.17194260557609198, 'reg_alpha': 9.093204020787821, 'reg_lambda': 2.587799816000169}. Best is trial 1 with value: 0.9696470486035704.


[I 2025-11-19 01:11:07,605] Trial 5 finished with value: 0.9626167310949919 and parameters: {'n_estimators': 350, 'max_depth': 5, 'learning_rate': 0.05864129169696527, 'subsample': 0.8186841117373118, 'colsample_bytree': 0.6739417822102108, 'min_child_weight': 10, 'gamma': 3.8756641168055728, 'reg_alpha': 9.394989415641891, 'reg_lambda': 8.948273504276488}. Best is trial 1 with value: 0.9696470486035704.


[I 2025-11-19 01:11:08,524] Trial 6 finished with value: 0.9658926218708824 and parameters: {'n_estimators': 350, 'max_depth': 10, 'learning_rate': 0.01351182947645082, 'subsample': 0.6783931449676581, 'colsample_bytree': 0.6180909155642152, 'min_child_weight': 4, 'gamma': 1.9433864484474102, 'reg_alpha': 2.713490317738959, 'reg_lambda': 8.287375091519294}. Best is trial 1 with value: 0.9696470486035704.


[I 2025-11-19 01:11:08,819] Trial 7 finished with value: 0.9647420322644107 and parameters: {'n_estimators': 250, 'max_depth': 5, 'learning_rate': 0.06333268775321842, 'subsample': 0.6563696899899051, 'colsample_bytree': 0.9208787923016158, 'min_child_weight': 1, 'gamma': 4.9344346830025865, 'reg_alpha': 7.722447692966574, 'reg_lambda': 1.987156815341724}. Best is trial 1 with value: 0.9696470486035704.


[I 2025-11-19 01:11:09,147] Trial 8 finished with value: 0.9830775203057811 and parameters: {'n_estimators': 100, 'max_depth': 9, 'learning_rate': 0.11069143219393454, 'subsample': 0.8916028672163949, 'colsample_bytree': 0.9085081386743783, 'min_child_weight': 1, 'gamma': 1.7923286427213632, 'reg_alpha': 1.1586905952512971, 'reg_lambda': 8.631034258755935}. Best is trial 8 with value: 0.9830775203057811.


[I 2025-11-19 01:11:09,696] Trial 9 finished with value: 0.9660820158102765 and parameters: {'n_estimators': 350, 'max_depth': 5, 'learning_rate': 0.012413189635294229, 'subsample': 0.7243929286862649, 'colsample_bytree': 0.7300733288106989, 'min_child_weight': 8, 'gamma': 3.1877873567760657, 'reg_alpha': 8.872127425763265, 'reg_lambda': 4.722149251619493}. Best is trial 8 with value: 0.9830775203057811.


[I 2025-11-19 01:11:10,088] Trial 10 finished with value: 0.9681483434107122 and parameters: {'n_estimators': 150, 'max_depth': 10, 'learning_rate': 0.16062222457346795, 'subsample': 0.8970301918497886, 'colsample_bytree': 0.8152133074567247, 'min_child_weight': 3, 'gamma': 3.098667986279486, 'reg_alpha': 0.5104459902767716, 'reg_lambda': 8.388172179801954}. Best is trial 8 with value: 0.9830775203057811.


[I 2025-11-19 01:11:10,390] Trial 11 finished with value: 0.9650585270485846 and parameters: {'n_estimators': 350, 'max_depth': 5, 'learning_rate': 0.20045619762565445, 'subsample': 0.8537862441522944, 'colsample_bytree': 0.7122432515455728, 'min_child_weight': 1, 'gamma': 1.6083383012806267, 'reg_alpha': 4.406372246642526, 'reg_lambda': 4.44933109722488}. Best is trial 8 with value: 0.9830775203057811.


[I 2025-11-19 01:11:10,674] Trial 12 finished with value: 0.9719337606837607 and parameters: {'n_estimators': 100, 'max_depth': 7, 'learning_rate': 0.07911113859751223, 'subsample': 0.9314443289154776, 'colsample_bytree': 0.9961003032759808, 'min_child_weight': 5, 'gamma': 1.0857240367682373, 'reg_alpha': 2.7089097101680286, 'reg_lambda': 8.028857395069258}. Best is trial 8 with value: 0.9830775203057811.


[I 2025-11-19 01:11:11,040] Trial 13 finished with value: 0.9716224155248547 and parameters: {'n_estimators': 100, 'max_depth': 8, 'learning_rate': 0.07779032076465192, 'subsample': 0.9455864257527243, 'colsample_bytree': 0.9840454394395587, 'min_child_weight': 5, 'gamma': 1.6197360701327814, 'reg_alpha': 1.8761804830249518, 'reg_lambda': 6.0488815071975806}. Best is trial 8 with value: 0.9830775203057811.


[I 2025-11-19 01:11:11,327] Trial 14 finished with value: 0.9753382510280503 and parameters: {'n_estimators': 100, 'max_depth': 8, 'learning_rate': 0.05521524442070543, 'subsample': 0.735813611277198, 'colsample_bytree': 0.9854352857133014, 'min_child_weight': 1, 'gamma': 3.5950539428176116, 'reg_alpha': 3.1010178026503477, 'reg_lambda': 9.194802119639782}. Best is trial 8 with value: 0.9830775203057811.


[I 2025-11-19 01:11:11,603] Trial 15 finished with value: 0.9665258485624422 and parameters: {'n_estimators': 200, 'max_depth': 8, 'learning_rate': 0.0446227782720138, 'subsample': 0.6565883720691214, 'colsample_bytree': 0.9678260282087661, 'min_child_weight': 1, 'gamma': 3.087958455450406, 'reg_alpha': 5.165742767442894, 'reg_lambda': 6.367123001156491}. Best is trial 8 with value: 0.9830775203057811.


[I 2025-11-19 01:11:13,165] Trial 16 finished with value: 0.9708531433755219 and parameters: {'n_estimators': 200, 'max_depth': 8, 'learning_rate': 0.020248755872466547, 'subsample': 0.8855199285339285, 'colsample_bytree': 0.9529033396905625, 'min_child_weight': 1, 'gamma': 4.441828993227856, 'reg_alpha': 4.846468165976164, 'reg_lambda': 8.500401721385225}. Best is trial 8 with value: 0.9830775203057811.
[I 2025-11-19 01:11:13,268] Trial 17 finished with value: 0.973660321160321 and parameters: {'n_estimators': 100, 'max_depth': 7, 'learning_rate': 0.23075396331577488, 'subsample': 0.71309423714191, 'colsample_bytree': 0.9489305037809493, 'min_child_weight': 2, 'gamma': 2.8202463471659796, 'reg_alpha': 2.2167684060166235, 'reg_lambda': 8.937341179341551}. Best is trial 8 with value: 0.9830775203057811.


[I 2025-11-19 01:11:13,630] Trial 18 finished with value: 0.9710590063531239 and parameters: {'n_estimators': 100, 'max_depth': 8, 'learning_rate': 0.02944307721728956, 'subsample': 0.9204273039814319, 'colsample_bytree': 0.7791580202511157, 'min_child_weight': 2, 'gamma': 0.7335033851780299, 'reg_alpha': 3.8546574818691415, 'reg_lambda': 9.423995778845592}. Best is trial 8 with value: 0.9830775203057811.


[I 2025-11-19 01:11:14,829] Trial 19 finished with value: 0.9768604873387481 and parameters: {'n_estimators': 100, 'max_depth': 9, 'learning_rate': 0.2950112753893747, 'subsample': 0.8961460723850782, 'colsample_bytree': 0.8745526216745758, 'min_child_weight': 2, 'gamma': 0.6446545179827752, 'reg_alpha': 1.8288692461528, 'reg_lambda': 7.628716989035871}. Best is trial 8 with value: 0.9830775203057811.



[I 2025-11-19 01:11:15,142] Trial 20 finished with value: 0.9672098438645753 and parameters: {'n_estimators': 100, 'max_depth': 9, 'learning_rate': 0.271959262490005, 'subsample': 0.8458098873414628, 'colsample_bytree': 0.8105472363490581, 'min_child_weight': 4, 'gamma': 0.13993376833446014, 'reg_alpha': 4.405883286077703, 'reg_lambda': 7.366878887126411}. Best is trial 8 with value: 0.9830775203057811.



[I 2025-11-19 01:11:15,383] Trial 21 finished with value: 0.9768522503137888 and parameters: {'n_estimators': 300, 'max_depth': 8, 'learning_rate': 0.18308958622842947, 'subsample': 0.8820632322610379, 'colsample_bytree': 0.8452612977850306, 'min_child_weight': 1, 'gamma': 0.5346876046695554, 'reg_alpha': 0.532583454728303, 'reg_lambda': 6.030488440119339}. Best is trial 8 with value: 0.9830775203057811.



[I 2025-11-19 01:11:15,787] Trial 22 finished with value: 0.9801083740674041 and parameters: {'n_estimators': 300, 'max_depth': 8, 'learning_rate': 0.18765794202260766, 'subsample': 0.785129627498889, 'colsample_bytree': 0.9099820031259398, 'min_child_weight': 2, 'gamma': 0.21982421774211525, 'reg_alpha': 1.3192896464213653, 'reg_lambda': 6.106635900097363}. Best is trial 8 with value: 0.9830775203057811.



[I 2025-11-19 01:11:15,988] Trial 23 finished with value: 0.9774930714446842 and parameters: {'n_estimators': 100, 'max_depth': 6, 'learning_rate': 0.08750253756752528, 'subsample': 0.8061997275994548, 'colsample_bytree': 0.8459284447088317, 'min_child_weight': 2, 'gamma': 1.0040918330819557, 'reg_alpha': 0.27348184475600656, 'reg_lambda': 9.04534025912788}. Best is trial 8 with value: 0.9830775203057811.



[I 2025-11-19 01:11:16,284] Trial 24 finished with value: 0.9749210933993542 and parameters: {'n_estimators': 200, 'max_depth': 7, 'learning_rate': 0.08893483720141586, 'subsample': 0.7707958296594007, 'colsample_bytree': 0.8078852132275316, 'min_child_weight': 2, 'gamma': 1.554032136140409, 'reg_alpha': 1.0043191113084449, 'reg_lambda': 9.357656358194314}. Best is trial 8 with value: 0.9830775203057811.



[I 2025-11-19 01:11:16,712] Trial 25 finished with value: 0.9746958188202051 and parameters: {'n_estimators': 300, 'max_depth': 7, 'learning_rate': 0.10897388556428503, 'subsample': 0.7813227496959372, 'colsample_bytree': 0.9657868620627084, 'min_child_weight': 1, 'gamma': 0.5958449143853057, 'reg_alpha': 4.2307444721869665, 'reg_lambda': 6.834886975427331}. Best is trial 8 with value: 0.9830775203057811.



[I 2025-11-19 01:11:17,249] Trial 26 finished with value: 0.9779048157724628 and parameters: {'n_estimators': 300, 'max_depth': 5, 'learning_rate': 0.16144494558754577, 'subsample': 0.7269262816193481, 'colsample_bytree': 0.9205754282615106, 'min_child_weight': 2, 'gamma': 0.4029896104769902, 'reg_alpha': 1.1004705446688734, 'reg_lambda': 1.8615432787644162}. Best is trial 8 with value: 0.9830775203057811.



[I 2025-11-19 01:11:18,674] Trial 27 finished with value: 0.9671227895526633 and parameters: {'n_estimators': 300, 'max_depth': 4, 'learning_rate': 0.1516262372417542, 'subsample': 0.7492114490377975, 'colsample_bytree': 0.9044338192079292, 'min_child_weight': 3, 'gamma': 0.028828097235857608, 'reg_alpha': 2.497796462862588, 'reg_lambda': 1.8554167401199637}. Best is trial 8 with value: 0.9830775203057811.



[I 2025-11-19 01:11:19,110] Trial 28 finished with value: 0.9658926218708828 and parameters: {'n_estimators': 200, 'max_depth': 5, 'learning_rate': 0.11587138913715744, 'subsample': 0.6599900503570315, 'colsample_bytree': 0.7890321849185349, 'min_child_weight': 3, 'gamma': 1.8075530204573051, 'reg_alpha': 1.8327590224808183, 'reg_lambda': 2.6182167740967186}. Best is trial 8 with value: 0.9830775203057811.



[I 2025-11-19 01:11:19,660] Trial 29 finished with value: 0.9814000490824313 and parameters: {'n_estimators': 250, 'max_depth': 7, 'learning_rate': 0.0695458814251136, 'subsample': 0.8343547453464374, 'colsample_bytree': 0.9289599117523027, 'min_child_weight': 2, 'gamma': 1.7268805036649044, 'reg_alpha': 2.335489653410554, 'reg_lambda': 1.2549606404573281}. Best is trial 8 with value: 0.9830775203057811.



[I 2025-11-19 01:11:20,236] Trial 30 finished with value: 0.9753163503163502 and parameters: {'n_estimators': 350, 'max_depth': 7, 'learning_rate': 0.09664801371742755, 'subsample': 0.8198363593069941, 'colsample_bytree': 0.910569692925973, 'min_child_weight': 2, 'gamma': 1.3952449142694197, 'reg_alpha': 3.639408999658805, 'reg_lambda': 1.2854335401701669}. Best is trial 8 with value: 0.9830775203057811.



[I 2025-11-19 01:11:20,621] Trial 31 finished with value: 0.9854645354645355 and parameters: {'n_estimators': 200, 'max_depth': 9, 'learning_rate': 0.08460717673292172, 'subsample': 0.9148411603186509, 'colsample_bytree': 0.8874705513948635, 'min_child_weight': 2, 'gamma': 0.4556961022635173, 'reg_alpha': 1.3642495812843543, 'reg_lambda': 0.03182598407445281}. Best is trial 31 with value: 0.9854645354645355.



[I 2025-11-19 01:11:20,867] Trial 32 finished with value: 0.9730111663480244 and parameters: {'n_estimators': 100, 'max_depth': 9, 'learning_rate': 0.053101504916542745, 'subsample': 0.8434180541119527, 'colsample_bytree': 0.9819199558762307, 'min_child_weight': 3, 'gamma': 0.9327750561635286, 'reg_alpha': 1.4874889266675553, 'reg_lambda': 0.6857116331843413}. Best is trial 31 with value: 0.9854645354645355.



[I 2025-11-19 01:11:22,568] Trial 33 finished with value: 0.9808201942984551 and parameters: {'n_estimators': 200, 'max_depth': 9, 'learning_rate': 0.028855892637458785, 'subsample': 0.9955494150612852, 'colsample_bytree': 0.7976901930175428, 'min_child_weight': 1, 'gamma': 0.06970508840003081, 'reg_alpha': 1.15083347778104, 'reg_lambda': 0.29874961220793045}. Best is trial 31 with value: 0.9854645354645355.



[I 2025-11-19 01:11:23,227] Trial 34 finished with value: 0.971727745353065 and parameters: {'n_estimators': 200, 'max_depth': 9, 'learning_rate': 0.01454220343108931, 'subsample': 0.9573055188345206, 'colsample_bytree': 0.7982540200836477, 'min_child_weight': 3, 'gamma': 1.6665714781127823, 'reg_alpha': 0.868688122902324, 'reg_lambda': 1.0602079872492818}. Best is trial 31 with value: 0.9854645354645355.



[I 2025-11-19 01:11:23,703] Trial 35 finished with value: 0.9775352514868644 and parameters: {'n_estimators': 150, 'max_depth': 10, 'learning_rate': 0.12657281814311566, 'subsample': 0.9166812745093947, 'colsample_bytree': 0.8027538774909536, 'min_child_weight': 1, 'gamma': 0.7958412510913321, 'reg_alpha': 0.49439267852419666, 'reg_lambda': 0.6001059810895614}. Best is trial 31 with value: 0.9854645354645355.



[I 2025-11-19 01:11:25,124] Trial 36 finished with value: 0.981687062937063 and parameters: {'n_estimators': 100, 'max_depth': 10, 'learning_rate': 0.07054691483239175, 'subsample': 0.8904149153923863, 'colsample_bytree': 0.9639238278158525, 'min_child_weight': 1, 'gamma': 1.5093081608514545, 'reg_alpha': 2.6320005215604967, 'reg_lambda': 9.59610534760743}. Best is trial 31 with value: 0.9854645354645355.



[I 2025-11-19 01:11:25,350] Trial 37 finished with value: 0.9830775203057811 and parameters: {'n_estimators': 150, 'max_depth': 9, 'learning_rate': 0.05310065866115399, 'subsample': 0.942558036375204, 'colsample_bytree': 0.9735075446625835, 'min_child_weight': 1, 'gamma': 2.62835080057512, 'reg_alpha': 2.0316058841916145, 'reg_lambda': 7.632226814656162}. Best is trial 31 with value: 0.9854645354645355.
[I 2025-11-19 01:11:25,526] Trial 38 finished with value: 0.981282967032967 and parameters: {'n_estimators': 100, 'max_depth': 10, 'learning_rate': 0.08041585931834004, 'subsample': 0.9326304426331652, 'colsample_bytree': 0.9980560996267769, 'min_child_weight': 1, 'gamma': 1.4306352940623068, 'reg_alpha': 1.7896510487398372, 'reg_lambda': 6.896426568240985}. Best is trial 31 with value: 0.9854645354645355.



[I 2025-11-19 01:11:26,261] Trial 39 finished with value: 0.9721497721497722 and parameters: {'n_estimators': 250, 'max_depth': 9, 'learning_rate': 0.02839342358765508, 'subsample': 0.9047058969459978, 'colsample_bytree': 0.9502510427666174, 'min_child_weight': 4, 'gamma': 3.2204777081193536, 'reg_alpha': 0.7571358275630526, 'reg_lambda': 8.246208109317516}. Best is trial 31 with value: 0.9854645354645355.



[I 2025-11-19 01:11:26,714] Trial 40 finished with value: 0.974472782119841 and parameters: {'n_estimators': 400, 'max_depth': 10, 'learning_rate': 0.051016493510423026, 'subsample': 0.9303379654171416, 'colsample_bytree': 0.9014601732497607, 'min_child_weight': 5, 'gamma': 0.41948061895220756, 'reg_alpha': 1.628448471744908, 'reg_lambda': 0.85911655880044}. Best is trial 31 with value: 0.9854645354645355.



[I 2025-11-19 01:11:27,153] Trial 41 finished with value: 0.9741443278943279 and parameters: {'n_estimators': 100, 'max_depth': 10, 'learning_rate': 0.07679158982468737, 'subsample': 0.909037183851878, 'colsample_bytree': 0.9220580362743752, 'min_child_weight': 2, 'gamma': 0.9058785062842112, 'reg_alpha': 6.041689199060313, 'reg_lambda': 8.452533028498557}. Best is trial 31 with value: 0.9854645354645355.



[I 2025-11-19 01:11:27,494] Trial 42 finished with value: 0.9733801247771836 and parameters: {'n_estimators': 250, 'max_depth': 4, 'learning_rate': 0.0415275180243528, 'subsample': 0.8927282149596488, 'colsample_bytree': 0.8937128801719718, 'min_child_weight': 4, 'gamma': 2.050822966033963, 'reg_alpha': 2.224833950875645, 'reg_lambda': 0.1873774347972461}. Best is trial 31 with value: 0.9854645354645355.



[I 2025-11-19 01:11:28,173] Trial 43 finished with value: 0.971059006353124 and parameters: {'n_estimators': 250, 'max_depth': 7, 'learning_rate': 0.03395717092255177, 'subsample': 0.8403937603719731, 'colsample_bytree': 0.9336680277718761, 'min_child_weight': 1, 'gamma': 3.5998190927229716, 'reg_alpha': 4.37223339366934, 'reg_lambda': 1.4750019846855598}. Best is trial 31 with value: 0.9854645354645355.



[I 2025-11-19 01:11:28,495] Trial 44 finished with value: 0.9828881263663872 and parameters: {'n_estimators': 150, 'max_depth': 6, 'learning_rate': 0.02882341551430925, 'subsample': 0.9746384851965857, 'colsample_bytree': 0.9269823715774226, 'min_child_weight': 1, 'gamma': 2.640381987426918, 'reg_alpha': 2.1633831361752436, 'reg_lambda': 5.756608843859562}. Best is trial 31 with value: 0.9854645354645355.



[I 2025-11-19 01:11:28,856] Trial 45 finished with value: 0.9741443278943279 and parameters: {'n_estimators': 200, 'max_depth': 5, 'learning_rate': 0.04970248022099943, 'subsample': 0.9113037924464353, 'colsample_bytree': 0.9368293931316036, 'min_child_weight': 2, 'gamma': 2.497830856158082, 'reg_alpha': 4.3077830870783504, 'reg_lambda': 5.37740349175307}. Best is trial 31 with value: 0.9854645354645355.



[I 2025-11-19 01:11:30,296] Trial 46 finished with value: 0.9841764214046822 and parameters: {'n_estimators': 200, 'max_depth': 6, 'learning_rate': 0.03800167965115237, 'subsample': 0.8973690744652999, 'colsample_bytree': 0.9217689342166555, 'min_child_weight': 1, 'gamma': 1.9676042916566454, 'reg_alpha': 0.30488363483275815, 'reg_lambda': 6.758797422702445}. Best is trial 31 with value: 0.9854645354645355.



[I 2025-11-19 01:11:30,604] Trial 47 finished with value: 0.9639181108746326 and parameters: {'n_estimators': 150, 'max_depth': 6, 'learning_rate': 0.013488971777329106, 'subsample': 0.936575875018079, 'colsample_bytree': 0.7964967008336085, 'min_child_weight': 3, 'gamma': 3.598307240496079, 'reg_alpha': 2.2674600348108878, 'reg_lambda': 6.238123587198323}. Best is trial 31 with value: 0.9854645354645355.



[I 2025-11-19 01:11:30,963] Trial 48 finished with value: 0.9819939958231814 and parameters: {'n_estimators': 150, 'max_depth': 8, 'learning_rate': 0.04780032705694896, 'subsample': 0.9854926857104741, 'colsample_bytree': 0.9796522495209655, 'min_child_weight': 1, 'gamma': 3.1033546697435215, 'reg_alpha': 3.5033292462591077, 'reg_lambda': 5.005132318044005}. Best is trial 31 with value: 0.9854645354645355.



[I 2025-11-19 01:11:31,670] Trial 49 finished with value: 0.9671671465789113 and parameters: {'n_estimators': 150, 'max_depth': 7, 'learning_rate': 0.010061485273576346, 'subsample': 0.8809487866559865, 'colsample_bytree': 0.9879972258007352, 'min_child_weight': 1, 'gamma': 2.566718027378215, 'reg_alpha': 2.086324068021754, 'reg_lambda': 6.64068967878981}. Best is trial 31 with value: 0.9854645354645355.

✓ Optimization complete!
  Best PR-AUC (CV): 0.9855

Best XGBoost hyperparameters:
  n_estimators: 200
  max_depth: 9
  learning_rate: 0.08460717673292172
  subsample: 0.9148411603186509
  colsample_bytree: 0.8874705513948635
  min_child_weight: 2
  gamma: 0.4556961022635173
  reg_alpha: 1.3642495812843543
  reg_lambda: 0.03182598407445281





In [12]:
# Ratio of negative to positive samples, used to up-weight the minority (positive) class
scale_pos_weight = (y_dev_ml == 0).sum() / (y_dev_ml == 1).sum()

# Train the final XGBoost model with the best hyperparameters and evaluate it with stratified K-fold
best_xgb_params = study_xgb.best_params
best_xgb_model = XGBClassifier(
    n_estimators=best_xgb_params['n_estimators'],
    max_depth=best_xgb_params['max_depth'],
    learning_rate=best_xgb_params['learning_rate'],
    subsample=best_xgb_params['subsample'],
    colsample_bytree=best_xgb_params['colsample_bytree'],
    min_child_weight=best_xgb_params['min_child_weight'],
    gamma=best_xgb_params['gamma'],
    reg_alpha=best_xgb_params['reg_alpha'],
    reg_lambda=best_xgb_params['reg_lambda'],
    scale_pos_weight=scale_pos_weight,
    random_state=RANDOM_SEED,
    n_jobs=-1,
    verbosity=0,
    eval_metric='logloss'
)

skf = StratifiedKFold(n_splits=OPTUNA_CLASSIC_CV_SPLITS, shuffle=True, random_state=RANDOM_SEED)
pr_auc_scores_final_XGB, roc_auc_scores_final_XGB, f1_scores_final_XGB, precision_scores_final_XGB, \
recall_scores_final_XGB, accuracy_scores_final_XGB, cm_final_XGB = [], [], [], [], [], [], []
for train_idx, val_idx in skf.split(X_all_ml, y_all_ml):
    X_train, X_val = X_all_ml[train_idx], X_all_ml[val_idx]
    y_train, y_val = y_all_ml[train_idx], y_all_ml[val_idx]
    
    best_xgb_model.fit(X_train, y_train)
    y_pred_prob = best_xgb_model.predict_proba(X_val)[:, 1]

    # Threshold the probabilities once and reuse the hard predictions below
    y_pred = (y_pred_prob >= 0.5).astype(int)

    pr_auc = average_precision_score(y_val, y_pred_prob)
    roc_auc = roc_auc_score(y_val, y_pred_prob)
    f1 = f1_score(y_val, y_pred, zero_division=0)
    precision = precision_score(y_val, y_pred, zero_division=0)
    recall = recall_score(y_val, y_pred, zero_division=0)
    accuracy = accuracy_score(y_val, y_pred)
    cm = confusion_matrix(y_val, y_pred)

    pr_auc_scores_final_XGB.append(pr_auc)
    roc_auc_scores_final_XGB.append(roc_auc)
    f1_scores_final_XGB.append(f1)
    precision_scores_final_XGB.append(precision)
    recall_scores_final_XGB.append(recall)
    accuracy_scores_final_XGB.append(accuracy)
    cm_final_XGB.append(cm)

mean_pr_auc_final_XGB = np.mean(pr_auc_scores_final_XGB)
mean_roc_auc_final_XGB = np.mean(roc_auc_scores_final_XGB)
mean_f1_final_XGB = np.mean(f1_scores_final_XGB)
mean_precision_final_XGB = np.mean(precision_scores_final_XGB)
mean_recall_final_XGB = np.mean(recall_scores_final_XGB)
mean_accuracy_final_XGB = np.mean(accuracy_scores_final_XGB)
total_cm_XGB = np.sum(cm_final_XGB, axis=0)

print(f"\n✓ Final XGBoost evaluation (CV {OPTUNA_CLASSIC_CV_SPLITS}-fold): PR-AUC = {mean_pr_auc_final_XGB:.4f}, ROC-AUC = {mean_roc_auc_final_XGB:.4f}, F1 = {mean_f1_final_XGB:.4f}, Precision = {mean_precision_final_XGB:.4f}, Recall = {mean_recall_final_XGB:.4f}, Accuracy = {mean_accuracy_final_XGB:.4f}")
print(f"Accumulated confusion matrix:\n{total_cm_XGB}")


✓ Final XGBoost evaluation (CV 5-fold): PR-AUC = 0.9911, ROC-AUC = 0.9986, F1 = 0.9806, Precision = 1.0000, Recall = 0.9632, Accuracy = 0.9970
Accumulated confusion matrix:
[[918   0]
 [  3  79]]
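
The recall reported above (0.9632) is the mean of the per-fold recalls, while the accumulated confusion matrix pools all folds, so the two figures need not match exactly. The sketch below illustrates the difference. The pooled values come straight from the matrix printed above; the per-fold counts in `folds` are hypothetical (the notebook does not print them) and were chosen only so that they sum to the same totals.

```python
import numpy as np

# Accumulated XGBoost confusion matrix from the output above
# (scikit-learn convention: rows = true class, columns = predicted class).
total_cm = np.array([[918, 0],
                     [3, 79]])
tn, fp, fn, tp = total_cm.ravel()

# Metrics computed once on the pooled counts
pooled_recall = tp / (tp + fn)               # 79 / 82
pooled_precision = tp / (tp + fp)            # 79 / 79
pooled_accuracy = (tp + tn) / total_cm.sum() # 997 / 1000

# HYPOTHETICAL per-fold (true positives, positives) pairs. Assumption: the 82
# positives are split 17/17/16/16/16 and the 3 misses fall in three different folds.
folds = [(16, 17), (17, 17), (15, 16), (15, 16), (16, 16)]
mean_fold_recall = np.mean([tp_k / pos_k for tp_k, pos_k in folds])

print(f"pooled recall    = {pooled_recall:.4f}")
print(f"mean fold recall = {mean_fold_recall:.4f}")
```

Under these assumed fold counts the pooled recall is slightly higher than the fold average, because folds with fewer positives weigh each miss more heavily. Either figure is a valid summary, as long as the same convention is used for every model being compared.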


### 5.4 Comparison of the classical methods


In [13]:
print(f"\n{'='*70}")
print("COMPARATIVE SUMMARY OF THE CLASSICAL MODELS")
print(f"{'='*70}")

models_summary = pd.DataFrame({
    'Model': ['Random Forest', 'SVM', 'XGBoost'],
    'PR-AUC (CV)': [mean_pr_auc_final_RF, mean_pr_auc_final_SVM, mean_pr_auc_final_XGB],
    'ROC-AUC (CV)': [mean_roc_auc_final_RF, mean_roc_auc_final_SVM, mean_roc_auc_final_XGB],
    'F1-Score (CV)': [mean_f1_final_RF, mean_f1_final_SVM, mean_f1_final_XGB],
    'Precision (CV)': [mean_precision_final_RF, mean_precision_final_SVM, mean_precision_final_XGB],
    'Recall (CV)': [mean_recall_final_RF, mean_recall_final_SVM, mean_recall_final_XGB],
    'Accuracy (CV)': [mean_accuracy_final_RF, mean_accuracy_final_SVM, mean_accuracy_final_XGB]
})

print(models_summary.to_string(index=False))

# Print the final accumulated confusion matrices
print("\nFinal confusion matrices:")
print(f"\nRandom Forest:\n{total_cm_RF}")
print(f"\nSVM:\n{total_cm_SVM}")
print(f"\nXGBoost:\n{total_cm_XGB}")



COMPARATIVE SUMMARY OF THE CLASSICAL MODELS
        Model  PR-AUC (CV)  ROC-AUC (CV)  F1-Score (CV)  Precision (CV)  Recall (CV)  Accuracy (CV)
Random Forest     0.999346      0.999936       0.968106        1.000000     0.939706          0.995
          SVM     0.742226      0.957998       0.612738        0.723571     0.551471          0.944
      XGBoost     0.991076      0.998592       0.980606        1.000000     0.963235          0.997

Final confusion matrices:

Random Forest:
[[918   0]
 [  5  77]]

SVM:
[[899  19]
 [ 37  45]]

XGBoost:
[[918   0]
 [  3  79]]
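
As a cross-check on the summary table, the three accumulated confusion matrices can be turned into error counts and pooled metrics directly. The following is a minimal sketch, not part of the original notebook: the helper `pooled_metrics` and the dictionary `confusion` are illustrative names, and the matrices are copied by hand from the output above, assuming label 1 is the positive class.

```python
import numpy as np

# Accumulated confusion matrices copied from the output above
# (rows = true class, columns = predicted class; label 1 assumed positive).
confusion = {
    "Random Forest": np.array([[918, 0], [5, 77]]),
    "SVM": np.array([[899, 19], [37, 45]]),
    "XGBoost": np.array([[918, 0], [3, 79]]),
}

def pooled_metrics(cm):
    """Derive error counts and pooled metrics from a 2x2 confusion matrix."""
    tn, fp, fn, tp = (int(v) for v in cm.ravel())
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    return {"fp": fp, "fn": fn, "precision": precision,
            "recall": recall, "f1": f1, "accuracy": accuracy}

pooled = {name: pooled_metrics(cm) for name, cm in confusion.items()}

# List the models from fewest to most missed positives (false negatives)
for name, m in sorted(pooled.items(), key=lambda kv: kv[1]["fn"]):
    print(f"{name:<14} FN={m['fn']:>2} FP={m['fp']:>2} "
          f"precision={m['precision']:.4f} recall={m['recall']:.4f} "
          f"F1={m['f1']:.4f} accuracy={m['accuracy']:.3f}")
```

Read this way, the two tree-based models produce no false positives and differ only in the number of missed positives (3 for XGBoost, 5 for Random Forest), whereas the SVM misses 37 of the 82 positives. The pooled F1 and recall values may differ in the third or fourth decimal from the table, which reports the mean over folds.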
