

### ÍNDICE DEL CÓDIGO 

1. **Importar librerías mágicas**  
   *(Se importan todas las librerías necesarias para el procesamiento, modelado y evaluación)*

2. **Cargar los datos vectorizados y labels**  
   *(Se cargan los datos ya transformados con TF-IDF y las etiquetas de entrenamiento y test)*

3. **Entrenamiento y evaluación inicial con XGBoost (cross-validation estratificada)**  
   *(Se entrena un modelo XGBoost básico y se evalúa usando validación cruzada estratificada para tener una línea base)*

4. **Optimización de hiperparámetros con Optuna (con cross-validation)**  
   *(Se usa Optuna para buscar los mejores hiperparámetros del modelo usando validación cruzada)*

5. **Optimización de umbral para mejor F1-score**  
   *(Se busca el mejor umbral de decisión para maximizar el F1-score, ajustando el punto de corte de probabilidad)*

6. **Comparación de métricas en cuadro (3 momentos)**  
   *(Se comparan las métricas del modelo antes y después de optimizar hiperparámetros y umbral)*

7. **Selección del mejor modelo según F1-score (criterio de elección)**  
   *(Se elige el modelo con mejor F1-score en test, que es la métrica más robusta para clases desbalanceadas)*

8. **Explicación sobre cross-validation estratificada en cada etapa**  
   *(Se explica por qué es importante usar validación cruzada estratificada en todo el proceso)*

9. **Guardar el mejor modelo en la carpeta models**  
   *(Se guarda el modelo final entrenado para poder reutilizarlo después)*

10. **Entrenamiento XGBoost simple (sin fuga de datos, baseline)**  
    *(Se entrena un modelo XGBoost básico como referencia, sin optimización ni ajuste de umbral)*


In [14]:
"""
===========================================================
ENTRENAMIENTO Y EVALUACIÓN: XGBoost + OPTUNA + CROSS-VAL + UMBRAL
===========================================================

ÍNDICE DEL CÓDIGO:
1. Importar librerías mágicas
2. Cargar los datos vectorizados y labels
3. Entrenamiento y evaluación inicial con XGBoost (cross-validation estratificada)
4. Optimización de hiperparámetros con Optuna (con cross-validation)
5. Optimización de umbral para mejor F1-score
6. Comparación de métricas en cuadro (3 momentos)
7. Selección del mejor modelo según F1-score (criterio de elección)
8. Explicación sobre cross-validation estratificada en cada etapa
9. Guardar el mejor modelo en la carpeta models
10. Entrenamiento XGBoost simple (sin fuga de datos, baseline)
"""

# 1. Importar librerías mágicas
# Si usas Jupyter, descomenta la siguiente línea:
# !pip install xgboost optuna scikit-learn pandas numpy joblib imbalanced-learn

import pandas as pd
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score, recall_score, precision_score, f1_score, confusion_matrix, roc_auc_score
import optuna
import joblib
import os

# Opcional: para oversampling
try:
    from imblearn.over_sampling import SMOTE
    smote_available = True
except ImportError:
    smote_available = False

# 2. Cargar los datos vectorizados y labels
BASE_DIR = os.path.abspath(os.path.join(os.getcwd(), '..'))
data_dir = os.path.join(BASE_DIR, 'data', 'processed')
models_dir = os.path.join(BASE_DIR, 'models')

if not (os.path.exists(os.path.join(data_dir, 'X_train_tfidf.pkl')) and os.path.exists(os.path.join(data_dir, 'X_test_tfidf.pkl'))):
    vectorizer = joblib.load(os.path.join(data_dir, 'tfidf_vectorizer.pkl'))
    train_df = pd.read_csv(os.path.join(data_dir, 'train_data.csv'))
    test_df = pd.read_csv(os.path.join(data_dir, 'test_data.csv'))
    X_train = vectorizer.transform(train_df['text'])
    X_test = vectorizer.transform(test_df['text'])
    joblib.dump(X_train, os.path.join(data_dir, 'X_train_tfidf.pkl'))
    joblib.dump(X_test, os.path.join(data_dir, 'X_test_tfidf.pkl'))
else:
    X_train = joblib.load(os.path.join(data_dir, 'X_train_tfidf.pkl'))
    X_test = joblib.load(os.path.join(data_dir, 'X_test_tfidf.pkl'))

y_train = pd.read_csv(os.path.join(data_dir, 'train_data.csv'))['label'].values.ravel()
y_test = pd.read_csv(os.path.join(data_dir, 'test_data.csv'))['label'].values.ravel()

# Opcional: Oversampling para mejorar métricas en clases desbalanceadas
if smote_available:
    sm = SMOTE(random_state=42)
    X_train, y_train = sm.fit_resample(X_train, y_train)

# 3. Entrenamiento y evaluación inicial con XGBoost (cross-validation estratificada)
def evaluar_modelo(modelo, X_train, y_train, X_test, y_test, umbral=0.5):
    modelo.fit(X_train, y_train)
    y_train_proba = modelo.predict_proba(X_train)[:,1]
    y_test_proba  = modelo.predict_proba(X_test)[:,1]
    y_train_pred = (y_train_proba >= umbral).astype(int)
    y_test_pred  = (y_test_proba  >= umbral).astype(int)
    train_acc = accuracy_score(y_train, y_train_pred)
    test_acc  = accuracy_score(y_test, y_test_pred)
    diff_acc  = abs(train_acc - test_acc)
    ajuste = "Buen ajuste"
    if train_acc - test_acc > 0.07:
        ajuste = "Overfitting"
    elif test_acc - train_acc > 0.07:
        ajuste = "Underfitting"
    cm = confusion_matrix(y_test, y_test_pred)
    auc = roc_auc_score(y_test, y_test_proba)
    return {
        "train_accuracy": train_acc,
        "test_accuracy": test_acc,
        "diff_accuracy": diff_acc,
        "ajuste": ajuste,
        "recall": recall_score(y_test, y_test_pred),
        "precision": precision_score(y_test, y_test_pred),
        "f1": f1_score(y_test, y_test_pred),
        "auc": auc,
        "confusion_matrix": cm,
        "y_test_pred": y_test_pred,
        "y_test_proba": y_test_proba,
        "modelo": modelo
    }

def cross_val_metric(modelo, X, y, umbral=0.5, n_splits=10):
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)
    f1s, aucs = [], []
    for train_idx, val_idx in skf.split(X, y):
        X_tr, X_val = X[train_idx], X[val_idx]
        y_tr, y_val = y[train_idx], y[val_idx]
        modelo.fit(X_tr, y_tr)
        y_val_proba = modelo.predict_proba(X_val)[:,1]
        y_val_pred = (y_val_proba >= umbral).astype(int)
        f1s.append(f1_score(y_val, y_val_pred))
        try:
            aucs.append(roc_auc_score(y_val, y_val_proba))
        except:
            aucs.append(np.nan)
    return np.mean(f1s), np.nanmean(aucs)

# 4. XGBoost Classifier (default params, con regularización y menor complejidad)
xgb1 = XGBClassifier(
    max_depth=3,
    n_estimators=80,
    learning_rate=0.07,
    subsample=0.7,
    colsample_bytree=0.7,
    min_child_weight=5,
    gamma=2,
    reg_alpha=1,
    reg_lambda=1,
    scale_pos_weight=1,  # Ajusta si hay desbalance
    use_label_encoder=False,
    eval_metric='logloss',
    random_state=42
)
metricas_xgb1 = evaluar_modelo(xgb1, X_train, y_train, X_test, y_test)
cv_f1_xgb1, cv_auc_xgb1 = cross_val_metric(xgb1, X_train, y_train)

# 5. XGBoost (boosting, igual que XGBClassifier pero puedes cambiar hiperparámetros)
xgb2 = XGBClassifier(
    booster='gbtree',
    max_depth=3,
    n_estimators=80,
    learning_rate=0.07,
    subsample=0.7,
    colsample_bytree=0.7,
    min_child_weight=5,
    gamma=2,
    reg_alpha=1,
    reg_lambda=1,
    scale_pos_weight=1,
    use_label_encoder=False,
    eval_metric='logloss',
    random_state=42
)
metricas_xgb2 = evaluar_modelo(xgb2, X_train, y_train, X_test, y_test)
cv_f1_xgb2, cv_auc_xgb2 = cross_val_metric(xgb2, X_train, y_train)

# 6. Optimización de hiperparámetros con Optuna (con cross-validation)
def objective(trial):
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 50, 150),
        "max_depth": trial.suggest_int("max_depth", 2, 5),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.15),
        "subsample": trial.suggest_float("subsample", 0.6, 0.8),
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.6, 0.8),
        "gamma": trial.suggest_float("gamma", 1, 5),
        "min_child_weight": trial.suggest_int("min_child_weight", 3, 10),
        "reg_alpha": trial.suggest_float("reg_alpha", 0.5, 2),
        "reg_lambda": trial.suggest_float("reg_lambda", 0.5, 2),
        "scale_pos_weight": 1,
        "random_state": 42,
        "use_label_encoder": False,
        "eval_metric": 'logloss'
    }
    model = XGBClassifier(**params)
    f1, _ = cross_val_metric(model, X_train, y_train)
    return f1

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=30, show_progress_bar=True)
best_params = study.best_params

# 7. Entrenar con mejores hiperparámetros
xgb1_opt = XGBClassifier(**best_params, use_label_encoder=False, eval_metric='logloss')
metricas_xgb1_opt = evaluar_modelo(xgb1_opt, X_train, y_train, X_test, y_test)
cv_f1_xgb1_opt, cv_auc_xgb1_opt = cross_val_metric(xgb1_opt, X_train, y_train)

# 8. Optimización de umbral para mejor F1-score
# ----------------------------------------------------------
# ¿Qué hace este bloque?
# Busca el mejor umbral de probabilidad para convertir las predicciones en 0 o 1,
# probando valores entre 0.1 y 0.9, y eligiendo el que maximiza el F1-score.
# Esto es útil porque el umbral por defecto (0.5) no siempre es el mejor,
# especialmente en problemas desbalanceados.
# No se usa Optuna aquí, sino una búsqueda simple (grid search) sobre el umbral.
# ----------------------------------------------------------
def buscar_umbral(y_true, y_proba):
    mejores = {"umbral": 0.5, "f1": 0}
    for t in np.arange(0.1, 0.9, 0.01):
        y_pred = (y_proba >= t).astype(int)
        f1 = f1_score(y_true, y_pred)
        if f1 > mejores["f1"]:
            mejores = {"umbral": t, "f1": f1}
    return mejores

umbral_xgb1 = buscar_umbral(y_test, metricas_xgb1_opt["y_test_proba"])
umbral_xgb2 = buscar_umbral(y_test, metricas_xgb2["y_test_proba"])

# 9. Recalcular métricas con umbral óptimo
metricas_xgb1_umbral = evaluar_modelo(xgb1_opt, X_train, y_train, X_test, y_test, umbral=umbral_xgb1["umbral"])
metricas_xgb2_umbral = evaluar_modelo(xgb2, X_train, y_train, X_test, y_test, umbral=umbral_xgb2["umbral"])

# 10. Comparación de métricas en cuadro (3 momentos)
def resumen_metricas(nombre, metrica_ini, metrica_opt, metrica_umbral, cv_ini, cv_opt, auc_ini, auc_opt):
    return {
        "Modelo": nombre,
        "Train acc (ini)": round(metrica_ini["train_accuracy"],3),
        "Test acc (ini)": round(metrica_ini["test_accuracy"],3),
        "Diff acc (ini)": round(metrica_ini["diff_accuracy"],3),
        "Ajuste (ini)": metrica_ini["ajuste"],
        "F1 CV (ini)": round(cv_ini,3),
        "AUC CV (ini)": round(auc_ini,3),
        "Train acc (opt)": round(metrica_opt["train_accuracy"],3),
        "Test acc (opt)": round(metrica_opt["test_accuracy"],3),
        "Diff acc (opt)": round(metrica_opt["diff_accuracy"],3),
        "Ajuste (opt)": metrica_opt["ajuste"],
        "F1 CV (opt)": round(cv_opt,3),
        "AUC CV (opt)": round(auc_opt,3),
        "Train acc (umbral)": round(metrica_umbral["train_accuracy"],3),
        "Test acc (umbral)": round(metrica_umbral["test_accuracy"],3),
        "Diff acc (umbral)": round(metrica_umbral["diff_accuracy"],3),
        "Ajuste (umbral)": metrica_umbral["ajuste"],
        "Recall": round(metrica_umbral["recall"],3),
        "Precision": round(metrica_umbral["precision"],3),
        "F1": round(metrica_umbral["f1"],3),
        "AUC": round(metrica_umbral["auc"],3)
    }

cuadro = pd.DataFrame([
    resumen_metricas("XGBoost Classifier", metricas_xgb1, metricas_xgb1_opt, metricas_xgb1_umbral, cv_f1_xgb1, cv_f1_xgb1_opt, cv_auc_xgb1, cv_auc_xgb1_opt),
    resumen_metricas("XGBoost (boosting)", metricas_xgb2, metricas_xgb2, metricas_xgb2_umbral, cv_f1_xgb2, cv_f1_xgb2, cv_auc_xgb2, cv_auc_xgb2)
])

print("\n=== CUADRO COMPARATIVO DE MÉTRICAS ===")
print(cuadro.T)

# 11. Cuadro tipo ranking para comparar modelos
cuadro_ranking = pd.DataFrame([
    {
        "Ranking": 1,
        "Modelo": "XGBoost Base",
        "Accuracy Train": metricas_xgb1["train_accuracy"],
        "Accuracy Test": metricas_xgb1["test_accuracy"],
        "Precision Test": metricas_xgb1["precision"],
        "Recall Test": metricas_xgb1["recall"],
        "F1 Test": metricas_xgb1["f1"],
        "AUC Test": metricas_xgb1["auc"],
        "Diferencia abs": metricas_xgb1["diff_accuracy"],
        "Tipo de ajuste": metricas_xgb1["ajuste"]
    },
    {
        "Ranking": 2,
        "Modelo": "XGBoost Optuna",
        "Accuracy Train": metricas_xgb1_opt["train_accuracy"],
        "Accuracy Test": metricas_xgb1_opt["test_accuracy"],
        "Precision Test": metricas_xgb1_opt["precision"],
        "Recall Test": metricas_xgb1_opt["recall"],
        "F1 Test": metricas_xgb1_opt["f1"],
        "AUC Test": metricas_xgb1_opt["auc"],
        "Diferencia abs": metricas_xgb1_opt["diff_accuracy"],
        "Tipo de ajuste": metricas_xgb1_opt["ajuste"]
    },
    {
        "Ranking": 3,
        "Modelo": "XGBoost Optuna (umbral óptimo)",
        "Accuracy Train": metricas_xgb1_umbral["train_accuracy"],
        "Accuracy Test": metricas_xgb1_umbral["test_accuracy"],
        "Precision Test": metricas_xgb1_umbral["precision"],
        "Recall Test": metricas_xgb1_umbral["recall"],
        "F1 Test": metricas_xgb1_umbral["f1"],
        "AUC Test": metricas_xgb1_umbral["auc"],
        "Diferencia abs": metricas_xgb1_umbral["diff_accuracy"],
        "Tipo de ajuste": metricas_xgb1_umbral["ajuste"]
    }
])

cuadro_ranking = cuadro_ranking.sort_values("F1 Test", ascending=False).reset_index(drop=True)
cuadro_ranking["Ranking"] = cuadro_ranking.index + 1

print("\n=== CUADRO DE RANKING DE MODELOS (XGBoost) ===")
print(cuadro_ranking)

# 12. Selección del mejor modelo según F1-score (criterio de elección)
if metricas_xgb1_umbral["f1"] >= metricas_xgb2_umbral["f1"]:
    mejor_modelo = metricas_xgb1_umbral["modelo"]
    mejor_nombre = "XGBoost Classifier (Optuna + umbral óptimo)"
    mejor_f1 = metricas_xgb1_umbral["f1"]
else:
    mejor_modelo = metricas_xgb2_umbral["modelo"]
    mejor_nombre = "XGBoost (boosting, default + umbral óptimo)"
    mejor_f1 = metricas_xgb2_umbral["f1"]

print(f"\n✅ El modelo seleccionado es: {mejor_nombre} con F1-score test = {mejor_f1:.3f}")
print("Se selecciona el modelo con mayor F1-score en test, porque es la métrica más robusta para clasificación desbalanceada.")

# 13. Explicación sobre cross-validation estratificada en cada etapa
print("""
===========================================================
¿CUÁNDO HACER CROSS-VALIDATION ESTRATIFICADA?
===========================================================
- Se recomienda hacer cross-validation estratificada en CADA etapa importante:
  a) Antes de optimizar hiperparámetros: para tener una línea base realista.
  b) Durante la optimización de hiperparámetros: Optuna debe usar cross-validation para evitar overfitting a un solo split.
  c) Después, para validar el modelo final y comparar con test.
- Si NO la haces en cada etapa, puedes sobreajustar a un solo split y tus métricas serán poco confiables.
- Ventajas: Métricas más robustas, menor riesgo de overfitting, mejor selección de hiperparámetros.
- Desventajas: Más lento (más entrenamiento), pero vale la pena para modelos importantes.
- Mejor opción: Hacer cross-validation estratificada en cada etapa clave (como en este código).
===========================================================
""")

# 14. Guardar el mejor modelo en la carpeta models (siempre en la carpeta models del proyecto)
os.makedirs(models_dir, exist_ok=True)
joblib.dump(mejor_modelo, os.path.join(models_dir, 'mejor_modelo_xgboost.pkl'))
print("✅ Mejor modelo guardado como models/mejor_modelo_xgboost.pkl")

# 15. ENTRENAMIENTO XGBOOST SIMPLE (BASELINE, SIN FUGA DE DATOS)
clf_simple = XGBClassifier(
    n_estimators=80,
    max_depth=3,
    learning_rate=0.07,
    subsample=0.7,
    colsample_bytree=0.7,
    min_child_weight=5,
    gamma=2,
    reg_alpha=1,
    reg_lambda=1,
    scale_pos_weight=1,
    use_label_encoder=False,
    eval_metric='logloss',
    random_state=42
)
clf_simple.fit(X_train, y_train)

y_pred_simple = clf_simple.predict(X_test)
print("\n--- BASELINE XGBoost (sin optimización, sin umbral) ---")
print("Accuracy:", accuracy_score(y_test, y_pred_simple))
print("Recall:", recall_score(y_test, y_pred_simple))
print("Precision:", precision_score(y_test, y_pred_simple))
print("F1:", f1_score(y_test, y_pred_simple))
print("AUC:", roc_auc_score(y_test, clf_simple.predict_proba(X_test)[:,1]))
print("Matriz de confusión:\n", confusion_matrix(y_test, y_pred_simple))

joblib.dump(clf_simple, os.path.join(models_dir, 'xgb_model_baseline.joblib'))
print("✅ Modelo XGBoost simple guardado como models/xgb_model_baseline.joblib")

Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.


[I 2025-07-03 11:52:54,897] Trial 0 finished with value: 0.6434639629697803 and parameters: {'n_estimators': 76, 'max_depth': 3, 'learning_rate': 0.0917542830412596, 'subsample': 0.7334084495880905, 'colsample_bytree': 0.67129234387394, 'gamma': 1.658013096709773, 'min_child_weight': 4, 'reg_alpha': 0.7635022026898312, 'reg_lambda': 0.8209648420548326}. Best is trial 0 with value: 0.6434639629697803.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.


[I 2025-07-03 11:52:58,474] Trial 1 finished with value: 0.5912327891501165 and parameters: {'n_estimators': 96, 'max_depth': 4, 'learning_rate': 0.05417082767472786, 'subsample': 0.7917794845934902, 'colsample_bytree': 0.7874833321864614, 'gamma': 3.1983770977435344, 'min_child_weight': 6, 'reg_alpha': 0.5811378338767264, 'reg_lambda': 1.2374785604743521}. Best is trial 0 with value: 0.6434639629697803.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.


[I 2025-07-03 11:53:00,659] Trial 2 finished with value: 0.5725748863134335 and parameters: {'n_estimators': 69, 'max_depth': 2, 'learning_rate': 0.053079352545344016, 'subsample': 0.702135181687035, 'colsample_bytree': 0.7316500831160084, 'gamma': 3.9700740277593214, 'min_child_weight': 4, 'reg_alpha': 1.640156171120581, 'reg_lambda': 1.5649174255114549}. Best is trial 0 with value: 0.6434639629697803.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.


[I 2025-07-03 11:53:03,619] Trial 3 finished with value: 0.5409227144648479 and parameters: {'n_estimators': 123, 'max_depth': 4, 'learning_rate': 0.060571011275793944, 'subsample': 0.7255065011554069, 'colsample_bytree': 0.7199230321416121, 'gamma': 2.2157781502082807, 'min_child_weight': 8, 'reg_alpha': 1.7079045626538047, 'reg_lambda': 1.679184143142611}. Best is trial 0 with value: 0.6434639629697803.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.


[I 2025-07-03 11:53:06,216] Trial 4 finished with value: 0.49586771725800044 and parameters: {'n_estimators': 78, 'max_depth': 3, 'learning_rate': 0.011168161549448514, 'subsample': 0.6764934235398007, 'colsample_bytree': 0.7627652946730301, 'gamma': 2.3868824325350144, 'min_child_weight': 7, 'reg_alpha': 0.6511670428390867, 'reg_lambda': 1.8421115480029557}. Best is trial 0 with value: 0.6434639629697803.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.


[I 2025-07-03 11:53:08,154] Trial 5 finished with value: 0.48333743590411327 and parameters: {'n_estimators': 81, 'max_depth': 2, 'learning_rate': 0.01680734513368834, 'subsample': 0.614868059077648, 'colsample_bytree': 0.6832578901601729, 'gamma': 1.8727507993555301, 'min_child_weight': 9, 'reg_alpha': 0.7855874199391057, 'reg_lambda': 1.9730407542873372}. Best is trial 0 with value: 0.6434639629697803.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.


[I 2025-07-03 11:53:11,876] Trial 6 finished with value: 0.6517052194380961 and parameters: {'n_estimators': 110, 'max_depth': 5, 'learning_rate': 0.14373353991433366, 'subsample': 0.7787750471153393, 'colsample_bytree': 0.6477257383894613, 'gamma': 1.3022868227528774, 'min_child_weight': 4, 'reg_alpha': 1.27543723230331, 'reg_lambda': 1.6397652888257699}. Best is trial 6 with value: 0.6517052194380961.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.


[I 2025-07-03 11:53:13,575] Trial 7 finished with value: 0.5620427622892645 and parameters: {'n_estimators': 73, 'max_depth': 3, 'learning_rate': 0.11226361001943436, 'subsample': 0.6414384597836438, 'colsample_bytree': 0.7005209974602088, 'gamma': 4.924107095148829, 'min_child_weight': 6, 'reg_alpha': 1.1752721200809653, 'reg_lambda': 1.5857403343572793}. Best is trial 6 with value: 0.6517052194380961.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.


[I 2025-07-03 11:53:15,536] Trial 8 finished with value: 0.5136765063424977 and parameters: {'n_estimators': 97, 'max_depth': 2, 'learning_rate': 0.01728819974027784, 'subsample': 0.679240763674386, 'colsample_bytree': 0.6076114648802502, 'gamma': 1.9924009431990282, 'min_child_weight': 7, 'reg_alpha': 0.6601818827761482, 'reg_lambda': 1.5219730626870103}. Best is trial 6 with value: 0.6517052194380961.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.


[I 2025-07-03 11:53:18,476] Trial 9 finished with value: 0.6341600416007004 and parameters: {'n_estimators': 146, 'max_depth': 2, 'learning_rate': 0.12379396828485775, 'subsample': 0.6343024866538491, 'colsample_bytree': 0.7987588096602685, 'gamma': 2.316147007537074, 'min_child_weight': 4, 'reg_alpha': 0.7907386154198097, 'reg_lambda': 1.6066828531225559}. Best is trial 6 with value: 0.6517052194380961.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.


[I 2025-07-03 11:53:21,368] Trial 10 finished with value: 0.6470160993448665 and parameters: {'n_estimators': 52, 'max_depth': 5, 'learning_rate': 0.1491485602009081, 'subsample': 0.7992172233703094, 'colsample_bytree': 0.6271829575533258, 'gamma': 1.038098856302733, 'min_child_weight': 3, 'reg_alpha': 1.2899301635660751, 'reg_lambda': 1.154494900891197}. Best is trial 6 with value: 0.6517052194380961.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.


[I 2025-07-03 11:53:24,027] Trial 11 finished with value: 0.6452162218178478 and parameters: {'n_estimators': 56, 'max_depth': 5, 'learning_rate': 0.13899840597521776, 'subsample': 0.7997016048615203, 'colsample_bytree': 0.6218554189234845, 'gamma': 1.0562271837108381, 'min_child_weight': 3, 'reg_alpha': 1.3137938049298143, 'reg_lambda': 1.139468172604874}. Best is trial 6 with value: 0.6517052194380961.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.


[I 2025-07-03 11:53:28,104] Trial 12 finished with value: 0.6602349681909988 and parameters: {'n_estimators': 119, 'max_depth': 5, 'learning_rate': 0.1477672450801576, 'subsample': 0.7580838073016158, 'colsample_bytree': 0.6468395625208762, 'gamma': 1.056919186242245, 'min_child_weight': 3, 'reg_alpha': 1.2576279652372049, 'reg_lambda': 0.959726814726842}. Best is trial 12 with value: 0.6602349681909988.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.


[I 2025-07-03 11:53:31,054] Trial 13 finished with value: 0.6049628599057341 and parameters: {'n_estimators': 118, 'max_depth': 5, 'learning_rate': 0.1093730627953723, 'subsample': 0.7591700460215418, 'colsample_bytree': 0.653032468535637, 'gamma': 3.2404952777620957, 'min_child_weight': 5, 'reg_alpha': 1.0861333698172846, 'reg_lambda': 0.5051001956288995}. Best is trial 12 with value: 0.6602349681909988.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.


[I 2025-07-03 11:53:34,645] Trial 14 finished with value: 0.6552302196674143 and parameters: {'n_estimators': 117, 'max_depth': 4, 'learning_rate': 0.1273663540168946, 'subsample': 0.7626406065432986, 'colsample_bytree': 0.6479362502855878, 'gamma': 1.426409936097357, 'min_child_weight': 3, 'reg_alpha': 1.4972538092593082, 'reg_lambda': 0.9359885496514161}. Best is trial 12 with value: 0.6602349681909988.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.


[I 2025-07-03 11:53:37,751] Trial 15 finished with value: 0.6437277230569673 and parameters: {'n_estimators': 129, 'max_depth': 4, 'learning_rate': 0.12500779112398702, 'subsample': 0.7536781797794732, 'colsample_bytree': 0.647668508968543, 'gamma': 2.8540877346174236, 'min_child_weight': 3, 'reg_alpha': 1.9888857072017907, 'reg_lambda': 0.8908518052654136}. Best is trial 12 with value: 0.6602349681909988.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.


[I 2025-07-03 11:53:41,070] Trial 16 finished with value: 0.5010155480212792 and parameters: {'n_estimators': 141, 'max_depth': 4, 'learning_rate': 0.08824898679650296, 'subsample': 0.757800836249332, 'colsample_bytree': 0.6787199876961731, 'gamma': 1.5723345261058006, 'min_child_weight': 10, 'reg_alpha': 1.5114219007774137, 'reg_lambda': 0.831757732996474}. Best is trial 12 with value: 0.6602349681909988.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.


[I 2025-07-03 11:53:43,687] Trial 17 finished with value: 0.6141745139704122 and parameters: {'n_estimators': 107, 'max_depth': 5, 'learning_rate': 0.12956451824168447, 'subsample': 0.7242774473401183, 'colsample_bytree': 0.607581999664805, 'gamma': 3.9614466198940894, 'min_child_weight': 5, 'reg_alpha': 0.9886609311399532, 'reg_lambda': 0.5533001117799181}. Best is trial 12 with value: 0.6602349681909988.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.


[I 2025-07-03 11:53:48,245] Trial 18 finished with value: 0.6062806686780856 and parameters: {'n_estimators': 133, 'max_depth': 4, 'learning_rate': 0.10527098017053182, 'subsample': 0.7685096764638469, 'colsample_bytree': 0.6331908743469061, 'gamma': 1.4707817592782333, 'min_child_weight': 5, 'reg_alpha': 1.4701176365718298, 'reg_lambda': 0.9873235335183699}. Best is trial 12 with value: 0.6602349681909988.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.


[I 2025-07-03 11:53:51,554] Trial 19 finished with value: 0.627492723251944 and parameters: {'n_estimators': 90, 'max_depth': 5, 'learning_rate': 0.06965927845682585, 'subsample': 0.7370605372870671, 'colsample_bytree': 0.6660069670959474, 'gamma': 2.7508723994547566, 'min_child_weight': 3, 'reg_alpha': 1.832119745574853, 'reg_lambda': 1.3951522862949668}. Best is trial 12 with value: 0.6602349681909988.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.


[I 2025-07-03 11:53:55,120] Trial 20 finished with value: 0.5753104331315175 and parameters: {'n_estimators': 112, 'max_depth': 4, 'learning_rate': 0.036814349690316245, 'subsample': 0.7114390067866085, 'colsample_bytree': 0.7054386157270651, 'gamma': 3.607912425314351, 'min_child_weight': 5, 'reg_alpha': 1.522889360049105, 'reg_lambda': 0.71283273358494}. Best is trial 12 with value: 0.6602349681909988.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.


[I 2025-07-03 11:53:58,705] Trial 21 finished with value: 0.6367619839919908 and parameters: {'n_estimators': 109, 'max_depth': 5, 'learning_rate': 0.14957719486929014, 'subsample': 0.7756881331136812, 'colsample_bytree': 0.6403355780828683, 'gamma': 1.322759061072536, 'min_child_weight': 4, 'reg_alpha': 1.372666621228999, 'reg_lambda': 1.0202719699206582}. Best is trial 12 with value: 0.6602349681909988.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.


[I 2025-07-03 11:54:05,069] Trial 22 finished with value: 0.662030863711406 and parameters: {'n_estimators': 120, 'max_depth': 5, 'learning_rate': 0.13719278031447865, 'subsample': 0.780248851299007, 'colsample_bytree': 0.6595046054457805, 'gamma': 1.2387083455204178, 'min_child_weight': 3, 'reg_alpha': 0.9571820282456955, 'reg_lambda': 1.3168428379492587}. Best is trial 22 with value: 0.662030863711406.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.


[I 2025-07-03 11:54:09,683] Trial 23 finished with value: 0.6667377637078359 and parameters: {'n_estimators': 135, 'max_depth': 5, 'learning_rate': 0.1338896959169234, 'subsample': 0.7431070065086582, 'colsample_bytree': 0.6611254037146114, 'gamma': 1.819649309996539, 'min_child_weight': 3, 'reg_alpha': 0.9738222954219415, 'reg_lambda': 1.3631855633575487}. Best is trial 23 with value: 0.6667377637078359.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.


[I 2025-07-03 11:54:13,875] Trial 24 finished with value: 0.6574077153964333 and parameters: {'n_estimators': 137, 'max_depth': 5, 'learning_rate': 0.13605165510468856, 'subsample': 0.7513798511500556, 'colsample_bytree': 0.6867368327317754, 'gamma': 1.9240745706359585, 'min_child_weight': 3, 'reg_alpha': 0.9478683245154076, 'reg_lambda': 1.3589701674953334}. Best is trial 23 with value: 0.6667377637078359.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.


[I 2025-07-03 11:54:18,771] Trial 25 finished with value: 0.6461290779698965 and parameters: {'n_estimators': 127, 'max_depth': 5, 'learning_rate': 0.11541365703152875, 'subsample': 0.7407914099111833, 'colsample_bytree': 0.6625516129413062, 'gamma': 1.142607405537748, 'min_child_weight': 4, 'reg_alpha': 0.9400127670741747, 'reg_lambda': 1.3924041310147837}. Best is trial 23 with value: 0.6667377637078359.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.


[I 2025-07-03 11:54:23,955] Trial 26 finished with value: 0.6465365885938333 and parameters: {'n_estimators': 147, 'max_depth': 5, 'learning_rate': 0.09720131796586674, 'subsample': 0.7870375176132692, 'colsample_bytree': 0.692325518597788, 'gamma': 1.7668630279497934, 'min_child_weight': 3, 'reg_alpha': 1.1287822424841365, 'reg_lambda': 1.0967228739454775}. Best is trial 23 with value: 0.6667377637078359.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.


[I 2025-07-03 11:54:26,854] Trial 27 finished with value: 0.6170238722961728 and parameters: {'n_estimators': 122, 'max_depth': 5, 'learning_rate': 0.1353219048366058, 'subsample': 0.7143282897391394, 'colsample_bytree': 0.7149531583215627, 'gamma': 2.5775057454759054, 'min_child_weight': 6, 'reg_alpha': 1.0233708087898998, 'reg_lambda': 1.2726754825659674}. Best is trial 23 with value: 0.6667377637078359.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.


[I 2025-07-03 11:54:30,352] Trial 28 finished with value: 0.6206992696339798 and parameters: {'n_estimators': 138, 'max_depth': 4, 'learning_rate': 0.11688862461190522, 'subsample': 0.7754148300459335, 'colsample_bytree': 0.7385362962161144, 'gamma': 2.0823055236366512, 'min_child_weight': 5, 'reg_alpha': 0.8848849889481184, 'reg_lambda': 1.475189231672467}. Best is trial 23 with value: 0.6667377637078359.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.


[I 2025-07-03 11:54:33,496] Trial 29 finished with value: 0.6513633871443093 and parameters: {'n_estimators': 101, 'max_depth': 3, 'learning_rate': 0.0992861137314556, 'subsample': 0.7389280053356589, 'colsample_bytree': 0.6698461140035323, 'gamma': 1.7001047168778314, 'min_child_weight': 4, 'reg_alpha': 1.1675234218047033, 'reg_lambda': 1.2657090549460706}. Best is trial 23 with value: 0.6667377637078359.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.



=== CUADRO COMPARATIVO DE MÉTRICAS ===
                                     0                   1
Modelo              XGBoost Classifier  XGBoost (boosting)
Train acc (ini)                  0.744               0.744
Test acc (ini)                    0.67                0.67
Diff acc (ini)                   0.074               0.074
Ajuste (ini)               Overfitting         Overfitting
F1 CV (ini)                      0.592               0.592
AUC CV (ini)                     0.719               0.719
Train acc (opt)                   0.82               0.744
Test acc (opt)                    0.71                0.67
Diff acc (opt)                    0.11               0.074
Ajuste (opt)               Overfitting         Overfitting
F1 CV (opt)                      0.657               0.592
AUC CV (opt)                     0.765               0.719
Train acc (umbral)                0.75               0.641
Test acc (umbral)                 0.72               0.625
Diff acc (umbral

Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)



--- BASELINE XGBoost (sin optimización, sin umbral) ---
Accuracy: 0.67
Recall: 0.391304347826087
Precision: 0.782608695652174
F1: 0.5217391304347826
AUC: 0.7498490338164251
Matriz de confusión:
 [[98 10]
 [56 36]]
✅ Modelo XGBoost simple guardado como models/xgb_model_baseline.joblib




### 1. ¿SE APLICÓ BALANCEO?
**Sí.**  
Se aplica balanceo de clases **solo si tienes instalada la librería `imblearn`** (SMOTE).  
Esto ocurre en este bloque:

```python
try:
    from imblearn.over_sampling import SMOTE
    smote_available = True
except ImportError:
    smote_available = False

# ...
if smote_available:
    sm = SMOTE(random_state=42)
    X_train, y_train = sm.fit_resample(X_train, y_train)
```
**¿Qué hace?**  
- Aplica SMOTE (oversampling) **solo al set de entrenamiento** (`X_train`, `y_train`), generando nuevas muestras sintéticas de la clase minoritaria.
- El set de test **NO se balancea** (correcto).

---

### 2. ¿CÓMO AFECTA ESO AL VECTORIZADO?
- **El vectorizado (TF-IDF) se realiza primero** sobre los textos originales.
- Luego, **SMOTE se aplica sobre la matriz TF-IDF** (`X_train`), generando nuevas filas (vectores) sintéticos.
- El vectorizador **no se ve afectado**: solo transforma texto a matriz numérica. SMOTE actúa sobre esa matriz.

---

### 3. ¿SE CARGAN LOS CSV Y LOS PKL? ¿QUÉ CSV SE CARGAN, QUÉ PKL SE CARGAN?
**Sí, se cargan ambos tipos:**

- **CSV:**  
  - `train_data.csv` y `test_data.csv` (contienen columnas `text` y `label`).
  - Se usan para obtener los textos y las etiquetas.

- **PKL:**  
  - `tfidf_vectorizer.pkl` (el vectorizador entrenado).
  - `X_train_tfidf.pkl` y `X_test_tfidf.pkl` (matrices TF-IDF ya generadas).
  - Si no existen los `.pkl` de los datos vectorizados, se crean a partir de los CSV y el vectorizador.

---

### 4. ¿CÓMO ES EL PASO A PASO DEL ENTRENAMIENTO Y USO DE LOS VECTORIZADOS?

1. **Carga de datos:**
   - Lee los CSV (`train_data.csv`, `test_data.csv`) para obtener textos y etiquetas.
2. **Carga o creación de vectorizados:**
   - Si existen los `.pkl` de los datos vectorizados (`X_train_tfidf.pkl`, `X_test_tfidf.pkl`), los carga.
   - Si no existen, transforma los textos con el vectorizador (`tfidf_vectorizer.pkl`) y guarda los `.pkl`.
3. **Balanceo (SMOTE):**
   - Si está disponible, aplica SMOTE **solo a `X_train` y `y_train`**.
4. **Entrenamiento y evaluación:**
   - Usa los datos vectorizados (`X_train`, `y_train`, `X_test`, `y_test`) para entrenar y evaluar los modelos XGBoost.
   - El test **nunca se balancea ni se vectoriza de nuevo**.
5. **Optimización y selección de modelo:**
   - Optuna para hiperparámetros, búsqueda de umbral óptimo, comparación de métricas.
6. **Guardado de modelos:**
   - El mejor modelo y el baseline se guardan como `.pkl` o `.joblib` en la carpeta models.

---

**Resumen visual:**

```
CSV (text, label) ──> TF-IDF vectorizer (.pkl) ──> X_train, X_test (.pkl)
                                         │
                                         └─> SMOTE (solo X_train) ──> X_train_balanced
```

---

¿Te gustaría un diagrama o código de ejemplo para visualizar el flujo?

Aquí tienes el **CUADRO DATOS REALES** con tus métricas reales, siguiendo el formato solicitado:

---

## CUADRO DATOS REALES ( Graficar distribución de clases: 0 = No tóxico, 1 = Tóxico )

### MÉTRICAS ANTES DE OPTIMIZACIÓN

| Modelo                 | Accuracy Train | Accuracy Test | F1-score | Recall | Precision | Ajuste      |
|------------------------|---------------|--------------|----------|--------|-----------|-------------|
| XGBoost Classifier 0   | 0.74          | 0.67         | 0.59     | 0.90   | 0.64      | Overfitting |
| XGBoost Classifier 1   | 0.74          | 0.67         | 0.69     | 0.90   | 0.56      | Overfitting |
| XGBoost (boosting) 0   | 0.74          | 0.67         | 0.59     | 0.90   | 0.64      | Overfitting |
| XGBoost (boosting) 1   | 0.74          | 0.67         | 0.69     | 0.90   | 0.56      | Overfitting |

---

### MÉTRICAS DESPUÉS DE OPTIMIZACIÓN DE HIPERPARÁMETROS

| Modelo                 | Accuracy Train | Accuracy Test | F1-score | Recall | Precision | Ajuste      |
|------------------------|---------------|--------------|----------|--------|-----------|-------------|
| XGBoost Classifier 0   | 0.82          | 0.71         | 0.66     | 0.90   | 0.77      | Overfitting |
| XGBoost Classifier 1   | 0.82          | 0.71         | 0.62     | 0.52   | 0.77      | Overfitting |
| XGBoost (boosting) 0   | 0.74          | 0.67         | 0.59     | 0.90   | 0.64      | Overfitting |
| XGBoost (boosting) 1   | 0.74          | 0.67         | 0.69     | 0.90   | 0.56      | Overfitting |

---

### MÉTRICAS LUEGO DE OPTIMIZACIÓN DE UMBRAL

| Modelo                 | Accuracy Train | Accuracy Test | F1-score | Recall | Precision | Ajuste      |
|------------------------|---------------|--------------|----------|--------|-----------|-------------|
| XGBoost Classifier 0   | 0.75          | 0.72         | 0.75     | 0.90   | 0.64      | Buen ajuste |
| XGBoost Classifier 1   | 0.75          | 0.72         | 0.69     | 0.90   | 0.56      | Buen ajuste |
| XGBoost (boosting) 0   | 0.64          | 0.63         | 0.69     | 0.90   | 0.56      | Buen ajuste |
| XGBoost (boosting) 1   | 0.64          | 0.63         | 0.69     | 0.90   | 0.56      | Buen ajuste |

---

### CUADRO DE RANKING DE MODELOS (XGBoost)

| Ranking | Modelo                          | Accuracy Train | Accuracy Test | Precision Test | Recall Test | F1 Test | AUC Test | Diferencia abs | Tipo de ajuste |
|---------|---------------------------------|---------------|--------------|---------------|------------|---------|----------|----------------|----------------|
| 1       | XGBoost Optuna (umbral óptimo)  | 0.750         | 0.72         | 0.638         | 0.902      | 0.748   | 0.810    | 0.030          | Buen ajuste    |
| 2       | XGBoost Optuna                  | 0.820         | 0.71         | 0.774         | 0.522      | 0.623   | 0.810    | 0.110          | Overfitting    |
| 3       | XGBoost Base                    | 0.744         | 0.67         | 0.783         | 0.391      | 0.522   | 0.750    | 0.074          | Overfitting    |

---

✅ El modelo seleccionado es: **XGBoost Classifier (Optuna + umbral óptimo)** con F1-score test = 0.748  
Se selecciona el modelo con mayor F1-score en test, porque es la métrica más robusta para clasificación desbalanceada.

---

### ¿CUÁNDO HACER CROSS-VALIDATION ESTRATIFICADA?

- Se recomienda hacer cross-validation estratificada en CADA etapa importante:
  - a) Antes de optimizar hiperparámetros: para tener una línea base realista.
  - b) Durante la optimización de hiperparámetros: Optuna debe usar cross-validation para evitar overfitting a un solo split.
  - c) Después, para validar el modelo final y comparar con test.
- Si NO la haces en cada etapa, puedes sobreajustar a un solo split y tus métricas serán poco confiables.
- Ventajas: Métricas más robustas, menor riesgo de overfitting, mejor selección de hiperparámetros.
- Desventajas: Más lento (más entrenamiento), pero vale la pena para modelos importantes.
- Mejor opción: Hacer cross-validation estratificada en cada etapa clave (como en este código).

---

✅ Mejor modelo guardado como mejor_modelo_xgboost.pkl

---

**Baseline XGBoost (sin optimización, sin umbral):**  
Accuracy: 0.67  
Recall: 0.391  
Precision: 0.783  
F1: 0.522  
AUC: 0.750  
Matriz de confusión:  
[[98 10]  
 [56 36]]  

✅ Modelo XGBoost simple guardado como xgb_model_baseline.joblib

---

In [3]:
# ===========================================================
# ENTRENAMIENTO Y EVALUACIÓN: XGBoost + OPTUNA + CROSS-VAL + UMBRAL
# ===========================================================

# 1. Importar librerías
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, StratifiedKFold
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, recall_score, precision_score, f1_score, confusion_matrix, roc_auc_score
from xgboost import XGBClassifier
import optuna
import joblib
import os
import re

# Opcional: para oversampling
try:
    from imblearn.over_sampling import SMOTE
    smote_available = True
except ImportError:
    smote_available = False

# 2. Cargar el CSV y preparar datos
csv_path = r'c:\Users\admin\Desktop\Proyecto 10\nlp_grupo_5_proyecto_10\data\clean\dataset_pretraining_final.csv'
df = pd.read_csv(csv_path)

# Usar columna limpia si existe, si no, usar la original
if 'Text_clean' in df.columns:
    text_col = 'Text_clean'
elif 'Texto_Original' in df.columns:
    text_col = 'Texto_Original'
else:
    raise ValueError("No se encontró columna de texto limpia ni original.")

# Etiqueta binaria: usa 'IsToxic' o la que corresponda
label_col = 'IsToxic'
if label_col not in df.columns:
    raise ValueError("No se encontró columna de etiqueta 'IsToxic'.")

# 3. Preprocesamiento mínimo
def simple_preprocess(text):
    text = str(text).lower()
    text = re.sub(r'[^a-záéíóúüñ0-9 ]', ' ', text)
    text = re.sub(r'\s+', ' ', text).strip()
    return text

df[text_col] = df[text_col].astype(str).map(simple_preprocess)

# 4. Split train/test estratificado
X = df[text_col].values
y = df[label_col].astype(int).values
X_train_text, X_test_text, y_train_orig, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# 5. Vectorización TF-IDF (solo con train, luego transforma test)
vectorizer = TfidfVectorizer(max_features=10000, ngram_range=(1,2))
X_train = vectorizer.fit_transform(X_train_text)
X_test = vectorizer.transform(X_test_text)

# 6. Guardar los textos y etiquetas ANTES de SMOTE (para reproducibilidad)
BASE_DIR = os.path.abspath(os.path.join(os.getcwd(), '..'))
data_dir = os.path.join(BASE_DIR, 'data', 'processed')
models_dir = os.path.join(BASE_DIR, 'models')
os.makedirs(data_dir, exist_ok=True)
os.makedirs(models_dir, exist_ok=True)

pd.DataFrame({'text': X_train_text, 'label': y_train_orig}).to_csv(os.path.join(data_dir, 'train_data.csv'), index=False)
pd.DataFrame({'text': X_test_text, 'label': y_test}).to_csv(os.path.join(data_dir, 'test_data.csv'), index=False)

# 7. SMOTE SOLO EN TRAIN
y_train = y_train_orig.copy()
if smote_available:
    sm = SMOTE(random_state=42)
    X_train, y_train = sm.fit_resample(X_train, y_train)

# 8. Guarda vectorizador y datos vectorizados para reproducibilidad
joblib.dump(vectorizer, os.path.join(data_dir, 'tfidf_vectorizer.pkl'))
joblib.dump(X_train, os.path.join(data_dir, 'X_train_tfidf.pkl'))
joblib.dump(X_test, os.path.join(data_dir, 'X_test_tfidf.pkl'))

# 9. Funciones de evaluación y cross-validation
def evaluar_modelo(modelo, X_train, y_train, X_test, y_test, umbral=0.5):
    modelo.fit(X_train, y_train)
    y_train_proba = modelo.predict_proba(X_train)[:,1]
    y_test_proba  = modelo.predict_proba(X_test)[:,1]
    y_train_pred = (y_train_proba >= umbral).astype(int)
    y_test_pred  = (y_test_proba  >= umbral).astype(int)
    train_acc = accuracy_score(y_train, y_train_pred)
    test_acc  = accuracy_score(y_test, y_test_pred)
    diff_acc  = abs(train_acc - test_acc)
    ajuste = "Buen ajuste"
    if train_acc - test_acc > 0.07:
        ajuste = "Overfitting"
    elif test_acc - train_acc > 0.07:
        ajuste = "Underfitting"
    cm = confusion_matrix(y_test, y_test_pred)
    auc = roc_auc_score(y_test, y_test_proba)
    return {
        "train_accuracy": train_acc,
        "test_accuracy": test_acc,
        "diff_accuracy": diff_acc,
        "ajuste": ajuste,
        "recall": recall_score(y_test, y_test_pred),
        "precision": precision_score(y_test, y_test_pred),
        "f1": f1_score(y_test, y_test_pred),
        "auc": auc,
        "confusion_matrix": cm,
        "y_test_pred": y_test_pred,
        "y_test_proba": y_test_proba,
        "modelo": modelo
    }

def cross_val_metric(modelo, X, y, umbral=0.5, n_splits=5):
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)
    f1s, aucs = [], []
    for train_idx, val_idx in skf.split(X, y):
        X_tr, X_val = X[train_idx], X[val_idx]
        y_tr, y_val = y[train_idx], y[val_idx]
        modelo.fit(X_tr, y_tr)
        y_val_proba = modelo.predict_proba(X_val)[:,1]
        y_val_pred = (y_val_proba >= umbral).astype(int)
        f1s.append(f1_score(y_val, y_val_pred))
        try:
            aucs.append(roc_auc_score(y_val, y_val_proba))
        except:
            aucs.append(np.nan)
    return np.mean(f1s), np.nanmean(aucs)

# 10. XGBoost baseline
xgb1 = XGBClassifier(
    max_depth=3,
    n_estimators=80,
    learning_rate=0.07,
    subsample=0.7,
    colsample_bytree=0.7,
    min_child_weight=5,
    gamma=2,
    reg_alpha=1,
    reg_lambda=1,
    scale_pos_weight=1,
    use_label_encoder=False,
    eval_metric='logloss',
    random_state=42
)
metricas_xgb1 = evaluar_modelo(xgb1, X_train, y_train, X_test, y_test)
cv_f1_xgb1, cv_auc_xgb1 = cross_val_metric(xgb1, X_train, y_train)

# 11. XGBoost (boosting)
xgb2 = XGBClassifier(
    booster='gbtree',
    max_depth=3,
    n_estimators=80,
    learning_rate=0.07,
    subsample=0.7,
    colsample_bytree=0.7,
    min_child_weight=5,
    gamma=2,
    reg_alpha=1,
    reg_lambda=1,
    scale_pos_weight=1,
    use_label_encoder=False,
    eval_metric='logloss',
    random_state=42
)
metricas_xgb2 = evaluar_modelo(xgb2, X_train, y_train, X_test, y_test)
cv_f1_xgb2, cv_auc_xgb2 = cross_val_metric(xgb2, X_train, y_train)

# 12. Optuna para hiperparámetros
def objective(trial):
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 50, 150),
        "max_depth": trial.suggest_int("max_depth", 2, 5),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.15),
        "subsample": trial.suggest_float("subsample", 0.6, 0.8),
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.6, 0.8),
        "gamma": trial.suggest_float("gamma", 1, 5),
        "min_child_weight": trial.suggest_int("min_child_weight", 3, 10),
        "reg_alpha": trial.suggest_float("reg_alpha", 0.5, 2),
        "reg_lambda": trial.suggest_float("reg_lambda", 0.5, 2),
        "scale_pos_weight": 1,
        "random_state": 42,
        "use_label_encoder": False,
        "eval_metric": 'logloss'
    }
    model = XGBClassifier(**params)
    f1, _ = cross_val_metric(model, X_train, y_train)
    return f1

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=30, show_progress_bar=True)
best_params = study.best_params

# 13. Entrenar con mejores hiperparámetros
xgb1_opt = XGBClassifier(**best_params, use_label_encoder=False, eval_metric='logloss')
metricas_xgb1_opt = evaluar_modelo(xgb1_opt, X_train, y_train, X_test, y_test)
cv_f1_xgb1_opt, cv_auc_xgb1_opt = cross_val_metric(xgb1_opt, X_train, y_train)

# 14. Optimización de umbral
def buscar_umbral(y_true, y_proba):
    mejores = {"umbral": 0.5, "f1": 0}
    for t in np.arange(0.1, 0.9, 0.01):
        y_pred = (y_proba >= t).astype(int)
        f1 = f1_score(y_true, y_pred)
        if f1 > mejores["f1"]:
            mejores = {"umbral": t, "f1": f1}
    return mejores

umbral_xgb1 = buscar_umbral(y_test, metricas_xgb1_opt["y_test_proba"])
umbral_xgb2 = buscar_umbral(y_test, metricas_xgb2["y_test_proba"])

# 15. Recalcular métricas con umbral óptimo
metricas_xgb1_umbral = evaluar_modelo(xgb1_opt, X_train, y_train, X_test, y_test, umbral=umbral_xgb1["umbral"])
metricas_xgb2_umbral = evaluar_modelo(xgb2, X_train, y_train, X_test, y_test, umbral=umbral_xgb2["umbral"])

# 16. Comparación de métricas y ranking
def resumen_metricas(nombre, metrica_ini, metrica_opt, metrica_umbral, cv_ini, cv_opt, auc_ini, auc_opt):
    return {
        "Modelo": nombre,
        "Train acc (ini)": round(metrica_ini["train_accuracy"],3),
        "Test acc (ini)": round(metrica_ini["test_accuracy"],3),
        "Diff acc (ini)": round(metrica_ini["diff_accuracy"],3),
        "Ajuste (ini)": metrica_ini["ajuste"],
        "F1 CV (ini)": round(cv_ini,3),
        "AUC CV (ini)": round(auc_ini,3),
        "Train acc (opt)": round(metrica_opt["train_accuracy"],3),
        "Test acc (opt)": round(metrica_opt["test_accuracy"],3),
        "Diff acc (opt)": round(metrica_opt["diff_accuracy"],3),
        "Ajuste (opt)": metrica_opt["ajuste"],
        "F1 CV (opt)": round(cv_opt,3),
        "AUC CV (opt)": round(auc_opt,3),
        "Train acc (umbral)": round(metrica_umbral["train_accuracy"],3),
        "Test acc (umbral)": round(metrica_umbral["test_accuracy"],3),
        "Diff acc (umbral)": round(metrica_umbral["diff_accuracy"],3),
        "Ajuste (umbral)": metrica_umbral["ajuste"],
        "Recall": round(metrica_umbral["recall"],3),
        "Precision": round(metrica_umbral["precision"],3),
        "F1": round(metrica_umbral["f1"],3),
        "AUC": round(metrica_umbral["auc"],3)
    }

cuadro = pd.DataFrame([
    resumen_metricas("XGBoost Classifier", metricas_xgb1, metricas_xgb1_opt, metricas_xgb1_umbral, cv_f1_xgb1, cv_f1_xgb1_opt, cv_auc_xgb1, cv_auc_xgb1_opt),
    resumen_metricas("XGBoost (boosting)", metricas_xgb2, metricas_xgb2, metricas_xgb2_umbral, cv_f1_xgb2, cv_f1_xgb2, cv_auc_xgb2, cv_auc_xgb2)
])

print("\n=== CUADRO COMPARATIVO DE MÉTRICAS ===")
print(cuadro.T)

# Ranking
cuadro_ranking = pd.DataFrame([
    {
        "Ranking": 1,
        "Modelo": "XGBoost Base",
        "Accuracy Train": metricas_xgb1["train_accuracy"],
        "Accuracy Test": metricas_xgb1["test_accuracy"],
        "Precision Test": metricas_xgb1["precision"],
        "Recall Test": metricas_xgb1["recall"],
        "F1 Test": metricas_xgb1["f1"],
        "AUC Test": metricas_xgb1["auc"],
        "Diferencia abs": metricas_xgb1["diff_accuracy"],
        "Tipo de ajuste": metricas_xgb1["ajuste"]
    },
    {
        "Ranking": 2,
        "Modelo": "XGBoost Optuna",
        "Accuracy Train": metricas_xgb1_opt["train_accuracy"],
        "Accuracy Test": metricas_xgb1_opt["test_accuracy"],
        "Precision Test": metricas_xgb1_opt["precision"],
        "Recall Test": metricas_xgb1_opt["recall"],
        "F1 Test": metricas_xgb1_opt["f1"],
        "AUC Test": metricas_xgb1_opt["auc"],
        "Diferencia abs": metricas_xgb1_opt["diff_accuracy"],
        "Tipo de ajuste": metricas_xgb1_opt["ajuste"]
    },
    {
        "Ranking": 3,
        "Modelo": "XGBoost Optuna (umbral óptimo)",
        "Accuracy Train": metricas_xgb1_umbral["train_accuracy"],
        "Accuracy Test": metricas_xgb1_umbral["test_accuracy"],
        "Precision Test": metricas_xgb1_umbral["precision"],
        "Recall Test": metricas_xgb1_umbral["recall"],
        "F1 Test": metricas_xgb1_umbral["f1"],
        "AUC Test": metricas_xgb1_umbral["auc"],
        "Diferencia abs": metricas_xgb1_umbral["diff_accuracy"],
        "Tipo de ajuste": metricas_xgb1_umbral["ajuste"]
    }
])

cuadro_ranking = cuadro_ranking.sort_values("F1 Test", ascending=False).reset_index(drop=True)
cuadro_ranking["Ranking"] = cuadro_ranking.index + 1

print("\n=== CUADRO DE RANKING DE MODELOS (XGBoost) ===")
print(cuadro_ranking)

# Guardar el mejor modelo con el nombre solicitado
if metricas_xgb1_umbral["f1"] >= metricas_xgb2_umbral["f1"]:
    mejor_modelo = metricas_xgb1_umbral["modelo"]
    mejor_nombre = "XGBoost Classifier (Optuna + umbral óptimo)"
    mejor_f1 = metricas_xgb1_umbral["f1"]
else:
    mejor_modelo = metricas_xgb2_umbral["modelo"]
    mejor_nombre = "XGBoost (boosting, default + umbral óptimo)"
    mejor_f1 = metricas_xgb2_umbral["f1"]

joblib.dump(mejor_modelo, r'c:\Users\admin\Desktop\Proyecto 10\nlp_grupo_5_proyecto_10\models\BOOST_AMPLIADO.pkl')
print(f"\n✅ El modelo seleccionado es: {mejor_nombre} con F1-score test = {mejor_f1:.3f}")
print("✅ Mejor modelo guardado como models/BOOST_AMPLIADO.pkl")

# Entrenamiento XGBoost simple (baseline, sin optimización, sin umbral)
clf_simple = XGBClassifier(
    n_estimators=80,
    max_depth=3,
    learning_rate=0.07,
    subsample=0.7,
    colsample_bytree=0.7,
    min_child_weight=5,
    gamma=2,
    reg_alpha=1,
    reg_lambda=1,
    scale_pos_weight=1,
    use_label_encoder=False,
    eval_metric='logloss',
    random_state=42
)
clf_simple.fit(X_train, y_train)

y_pred_simple = clf_simple.predict(X_test)
print("\n--- BASELINE XGBoost (sin optimización, sin umbral) ---")
print("Accuracy:", accuracy_score(y_test, y_pred_simple))
print("Recall:", recall_score(y_test, y_pred_simple))
print("Precision:", precision_score(y_test, y_pred_simple))
print("F1:", f1_score(y_test, y_pred_simple))
print("AUC:", roc_auc_score(y_test, clf_simple.predict_proba(X_test)[:,1]))
print("Matriz de confusión:\n", confusion_matrix(y_test, y_pred_simple))

joblib.dump(clf_simple, os.path.join(models_dir, 'xgb_model_baseline.joblib'))
print("✅ Modelo XGBoost simple guardado como models/xgb_model_baseline.joblib")

Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.


[I 2025-07-08 10:13:30,407] Trial 0 finished with value: 0.5784099882394542 and parameters: {'n_estimators': 135, 'max_depth': 3, 'learning_rate': 0.14280615152311163, 'subsample': 0.7558756899083034, 'colsample_bytree': 0.656435423349897, 'gamma': 3.554137130036424, 'min_child_weight': 8, 'reg_alpha': 0.8866052567602676, 'reg_lambda': 0.9091655715068478}. Best is trial 0 with value: 0.5784099882394542.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:13:32,851] Trial 1 finished with value: 0.5733943283078946 and parameters: {'n_estimators': 93, 'max_depth': 5, 'learning_rate': 0.14380794074705341, 'subsample': 0.6573102576203161, 'colsample_bytree': 0.7524675154442622, 'gamma': 1.5588794517041267, 'min_child_weight': 10, 'reg_alpha': 1.5225077316734994, 'reg_lambda': 0.7098583181084889}. Best is trial 0 with value: 0.5784099882394542.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:13:37,095] Trial 2 finished with value: 0.6015066081292881 and parameters: {'n_estimators': 149, 'max_depth': 5, 'learning_rate': 0.10586363611715181, 'subsample': 0.6951783568004575, 'colsample_bytree': 0.7641880465639159, 'gamma': 2.2362335619312743, 'min_child_weight': 6, 'reg_alpha': 0.7645327449448496, 'reg_lambda': 1.6636039861617633}. Best is trial 2 with value: 0.6015066081292881.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:13:38,891] Trial 3 finished with value: 0.5796233428196912 and parameters: {'n_estimators': 67, 'max_depth': 4, 'learning_rate': 0.07877916945878885, 'subsample': 0.7892178842299399, 'colsample_bytree': 0.6845065935851876, 'gamma': 3.582759015162198, 'min_child_weight': 4, 'reg_alpha': 0.7980549152001827, 'reg_lambda': 0.743178321205075}. Best is trial 2 with value: 0.6015066081292881.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:13:41,138] Trial 4 finished with value: 0.5355223291885054 and parameters: {'n_estimators': 122, 'max_depth': 2, 'learning_rate': 0.05191083988997988, 'subsample': 0.791814161709074, 'colsample_bytree': 0.7927907372476506, 'gamma': 1.7745157089582078, 'min_child_weight': 3, 'reg_alpha': 1.7146257464979875, 'reg_lambda': 0.9356108069321067}. Best is trial 2 with value: 0.6015066081292881.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:13:42,752] Trial 5 finished with value: 0.5308168487102157 and parameters: {'n_estimators': 63, 'max_depth': 4, 'learning_rate': 0.10183684892575841, 'subsample': 0.6410233912818104, 'colsample_bytree': 0.7425559797914327, 'gamma': 2.09319346493498, 'min_child_weight': 10, 'reg_alpha': 1.688382685159711, 'reg_lambda': 1.5201709691458345}. Best is trial 2 with value: 0.6015066081292881.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:13:45,759] Trial 6 finished with value: 0.605321096236332 and parameters: {'n_estimators': 101, 'max_depth': 5, 'learning_rate': 0.09617474028462433, 'subsample': 0.7096040478678565, 'colsample_bytree': 0.782012869465589, 'gamma': 2.0458865355478824, 'min_child_weight': 5, 'reg_alpha': 1.9021426418224388, 'reg_lambda': 0.5665709911483012}. Best is trial 6 with value: 0.605321096236332.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:13:46,943] Trial 7 finished with value: 0.4959240317518204 and parameters: {'n_estimators': 59, 'max_depth': 3, 'learning_rate': 0.07035787695880322, 'subsample': 0.6351465217817057, 'colsample_bytree': 0.699467798103864, 'gamma': 4.9403524790168145, 'min_child_weight': 8, 'reg_alpha': 1.8745601860715808, 'reg_lambda': 1.4965035309286863}. Best is trial 6 with value: 0.605321096236332.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:13:48,150] Trial 8 finished with value: 0.5454600450958409 and parameters: {'n_estimators': 65, 'max_depth': 2, 'learning_rate': 0.10663254635308123, 'subsample': 0.7918648690756529, 'colsample_bytree': 0.7609704851882491, 'gamma': 3.104320671419206, 'min_child_weight': 6, 'reg_alpha': 0.5326842627485721, 'reg_lambda': 1.1026523970801791}. Best is trial 6 with value: 0.605321096236332.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:13:49,750] Trial 9 finished with value: 0.5393848262505129 and parameters: {'n_estimators': 92, 'max_depth': 4, 'learning_rate': 0.12399205108982983, 'subsample': 0.6362446308384736, 'colsample_bytree': 0.7020200201003499, 'gamma': 4.857269860144111, 'min_child_weight': 10, 'reg_alpha': 0.624847349547808, 'reg_lambda': 1.5742590334068158}. Best is trial 6 with value: 0.605321096236332.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:13:52,803] Trial 10 finished with value: 0.5164299329507288 and parameters: {'n_estimators': 109, 'max_depth': 5, 'learning_rate': 0.020131176885068443, 'subsample': 0.7196622191355918, 'colsample_bytree': 0.6086858856880225, 'gamma': 1.0276200389018584, 'min_child_weight': 4, 'reg_alpha': 1.2778983751415465, 'reg_lambda': 1.9309870934369249}. Best is trial 6 with value: 0.605321096236332.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:13:56,164] Trial 11 finished with value: 0.602894200545302 and parameters: {'n_estimators': 147, 'max_depth': 5, 'learning_rate': 0.10309600533726881, 'subsample': 0.6975955021084362, 'colsample_bytree': 0.7992765343066419, 'gamma': 2.470292860749278, 'min_child_weight': 6, 'reg_alpha': 1.110023922733892, 'reg_lambda': 1.9618869429957253}. Best is trial 6 with value: 0.605321096236332.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:14:00,833] Trial 12 finished with value: 0.6038208474254934 and parameters: {'n_estimators': 148, 'max_depth': 5, 'learning_rate': 0.049804103982368814, 'subsample': 0.6955666071073516, 'colsample_bytree': 0.7903357164748042, 'gamma': 2.6343928228799354, 'min_child_weight': 5, 'reg_alpha': 1.2109209286843399, 'reg_lambda': 0.5111983634437299}. Best is trial 6 with value: 0.605321096236332.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:14:04,310] Trial 13 finished with value: 0.566251027182589 and parameters: {'n_estimators': 111, 'max_depth': 5, 'learning_rate': 0.0403716127615782, 'subsample': 0.7254894097819242, 'colsample_bytree': 0.725945073599419, 'gamma': 2.8771911123367575, 'min_child_weight': 5, 'reg_alpha': 1.3318494149971518, 'reg_lambda': 0.5220022610468612}. Best is trial 6 with value: 0.605321096236332.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:14:06,530] Trial 14 finished with value: 0.5669763540794763 and parameters: {'n_estimators': 83, 'max_depth': 4, 'learning_rate': 0.057528077642176115, 'subsample': 0.6733449832211177, 'colsample_bytree': 0.7824528796276082, 'gamma': 2.855974931308434, 'min_child_weight': 4, 'reg_alpha': 1.9740340460854557, 'reg_lambda': 0.5097419271372543}. Best is trial 6 with value: 0.605321096236332.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:14:10,106] Trial 15 finished with value: 0.5352939022967502 and parameters: {'n_estimators': 128, 'max_depth': 5, 'learning_rate': 0.023586155735915982, 'subsample': 0.6018005857588781, 'colsample_bytree': 0.7282636530570306, 'gamma': 4.096727416294954, 'min_child_weight': 7, 'reg_alpha': 1.030128591557787, 'reg_lambda': 1.2363308840772769}. Best is trial 6 with value: 0.605321096236332.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:14:12,049] Trial 16 finished with value: 0.5719227595886627 and parameters: {'n_estimators': 79, 'max_depth': 3, 'learning_rate': 0.0908917404428258, 'subsample': 0.7419975181147143, 'colsample_bytree': 0.7786456800350652, 'gamma': 1.3758492989262976, 'min_child_weight': 5, 'reg_alpha': 1.4140068055165098, 'reg_lambda': 0.7173836229628496}. Best is trial 6 with value: 0.605321096236332.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:14:13,605] Trial 17 finished with value: 0.49927726647849235 and parameters: {'n_estimators': 50, 'max_depth': 4, 'learning_rate': 0.03879912487519088, 'subsample': 0.6718675150025122, 'colsample_bytree': 0.6500717354277454, 'gamma': 2.6260928757509916, 'min_child_weight': 3, 'reg_alpha': 1.1458496132258387, 'reg_lambda': 0.9308057905541545}. Best is trial 6 with value: 0.605321096236332.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:14:16,998] Trial 18 finished with value: 0.5886372034072919 and parameters: {'n_estimators': 116, 'max_depth': 5, 'learning_rate': 0.0666448465754189, 'subsample': 0.7149886160610629, 'colsample_bytree': 0.773703979081391, 'gamma': 3.481039713229726, 'min_child_weight': 5, 'reg_alpha': 1.5378460584960423, 'reg_lambda': 0.6448227159827373}. Best is trial 6 with value: 0.605321096236332.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:14:19,982] Trial 19 finished with value: 0.6004193606413198 and parameters: {'n_estimators': 137, 'max_depth': 4, 'learning_rate': 0.12263515996479157, 'subsample': 0.763826878183829, 'colsample_bytree': 0.7362082839541461, 'gamma': 2.1203102994063627, 'min_child_weight': 7, 'reg_alpha': 1.7036488064654938, 'reg_lambda': 1.086954740540218}. Best is trial 6 with value: 0.605321096236332.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:14:22,864] Trial 20 finished with value: 0.5857913591897433 and parameters: {'n_estimators': 104, 'max_depth': 5, 'learning_rate': 0.08319730452434203, 'subsample': 0.6838000290739122, 'colsample_bytree': 0.7108852810833082, 'gamma': 1.816727117782682, 'min_child_weight': 8, 'reg_alpha': 1.4940965245666815, 'reg_lambda': 0.8357136289579025}. Best is trial 6 with value: 0.605321096236332.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:14:26,912] Trial 21 finished with value: 0.6055317748469811 and parameters: {'n_estimators': 149, 'max_depth': 5, 'learning_rate': 0.12348681909730223, 'subsample': 0.7090128412559711, 'colsample_bytree': 0.7987059799282344, 'gamma': 2.4779157717936124, 'min_child_weight': 6, 'reg_alpha': 1.1124794287657507, 'reg_lambda': 1.9766991666840577}. Best is trial 21 with value: 0.6055317748469811.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:14:31,076] Trial 22 finished with value: 0.6206031711349674 and parameters: {'n_estimators': 139, 'max_depth': 5, 'learning_rate': 0.12582994203014505, 'subsample': 0.7104621742765841, 'colsample_bytree': 0.7988510323712024, 'gamma': 2.4776134997165142, 'min_child_weight': 5, 'reg_alpha': 0.990359151808219, 'reg_lambda': 1.778293137933705}. Best is trial 22 with value: 0.6206031711349674.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:14:34,702] Trial 23 finished with value: 0.6042814399055856 and parameters: {'n_estimators': 139, 'max_depth': 5, 'learning_rate': 0.1272035256126806, 'subsample': 0.7358608662008563, 'colsample_bytree': 0.7999694006363303, 'gamma': 2.3259576869861576, 'min_child_weight': 7, 'reg_alpha': 0.9802361376616148, 'reg_lambda': 1.7855871265427}. Best is trial 22 with value: 0.6206031711349674.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:14:39,613] Trial 24 finished with value: 0.6043962410566495 and parameters: {'n_estimators': 126, 'max_depth': 4, 'learning_rate': 0.13421090145312486, 'subsample': 0.7102120716185454, 'colsample_bytree': 0.7701211037980026, 'gamma': 1.8842109145247983, 'min_child_weight': 6, 'reg_alpha': 0.9256776326366647, 'reg_lambda': 1.7830607339929818}. Best is trial 22 with value: 0.6206031711349674.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:14:43,562] Trial 25 finished with value: 0.635145401264387 and parameters: {'n_estimators': 140, 'max_depth': 5, 'learning_rate': 0.11352766122427828, 'subsample': 0.7570507028302584, 'colsample_bytree': 0.7497608009991688, 'gamma': 3.2530580956266073, 'min_child_weight': 4, 'reg_alpha': 1.0799323183562715, 'reg_lambda': 1.397958011948947}. Best is trial 25 with value: 0.635145401264387.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:14:46,976] Trial 26 finished with value: 0.6323339361103223 and parameters: {'n_estimators': 139, 'max_depth': 4, 'learning_rate': 0.11616530141981257, 'subsample': 0.7672515582313211, 'colsample_bytree': 0.7533542806207094, 'gamma': 3.211698134012285, 'min_child_weight': 4, 'reg_alpha': 1.0538905047529146, 'reg_lambda': 1.3928961520911716}. Best is trial 25 with value: 0.635145401264387.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:14:49,828] Trial 27 finished with value: 0.641930610286241 and parameters: {'n_estimators': 131, 'max_depth': 3, 'learning_rate': 0.11553734901345918, 'subsample': 0.7724916533866296, 'colsample_bytree': 0.7490373550061288, 'gamma': 4.136673561719677, 'min_child_weight': 3, 'reg_alpha': 0.7626782230566667, 'reg_lambda': 1.3442245771780945}. Best is trial 27 with value: 0.641930610286241.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:14:52,767] Trial 28 finished with value: 0.6436913070459864 and parameters: {'n_estimators': 131, 'max_depth': 3, 'learning_rate': 0.11353769443296824, 'subsample': 0.7766033271796217, 'colsample_bytree': 0.7473780176948768, 'gamma': 4.111309713438669, 'min_child_weight': 3, 'reg_alpha': 0.659983211501729, 'reg_lambda': 1.3627112363086873}. Best is trial 28 with value: 0.6436913070459864.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Best trial: 28. Best value: 0.643691: 100%|██████████| 30/30 [01:27<00:00,  2.90s/it]
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:14:55,336] Trial 29 finished with value: 0.6433459503162988 and parameters: {'n_estimators': 131, 'max_depth': 3, 'learning_rate': 0.13842624479823123, 'subsample': 0.7523225687025005, 'colsample_bytree': 0.6746671350373283, 'gamma': 4.281115202173956, 'min_child_weight': 3, 'reg_alpha': 0.7159873836973176, 'reg_lambda': 1.3402839867827232}. Best is trial 28 with value: 0.6436913070459864.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)



=== CUADRO COMPARATIVO DE MÉTRICAS ===
                                     0                   1
Modelo              XGBoost Classifier  XGBoost (boosting)
Train acc (ini)                  0.693               0.693
Test acc (ini)                   0.642               0.642
Diff acc (ini)                   0.051               0.051
Ajuste (ini)               Buen ajuste         Buen ajuste
F1 CV (ini)                      0.558               0.558
AUC CV (ini)                     0.711               0.711
Train acc (opt)                  0.762               0.693
Test acc (opt)                   0.672               0.642
Diff acc (opt)                    0.09               0.051
Ajuste (opt)               Overfitting         Buen ajuste
F1 CV (opt)                      0.628               0.558
AUC CV (opt)                     0.752               0.711
Train acc (umbral)               0.669               0.588
Test acc (umbral)                0.602               0.549
Diff acc (umbral

Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)



--- BASELINE XGBoost (sin optimización, sin umbral) ---
Accuracy: 0.6419753086419753
Recall: 0.3727810650887574
Precision: 0.72
F1: 0.49122807017543857
AUC: 0.6867726509178407
Matriz de confusión:
 [[342  49]
 [212 126]]
✅ Modelo XGBoost simple guardado como models/xgb_model_baseline.joblib


In [6]:
# ===========================================================
# INGENIERÍA DE VARIABLES ADICIONALES + PIPELINE XGBOOST
# ===========================================================

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, StratifiedKFold
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, recall_score, precision_score, f1_score, confusion_matrix, roc_auc_score
from xgboost import XGBClassifier
import optuna
import joblib
import os
import re
from scipy.sparse import hstack

# Opcional: para oversampling
try:
    from imblearn.over_sampling import SMOTE
    smote_available = True
except ImportError:
    smote_available = False

# 1. Cargar el CSV y preparar datos
csv_path = r'c:\Users\admin\Desktop\Proyecto 10\nlp_grupo_5_proyecto_10\data\clean\dataset_pretraining_final.csv'
df = pd.read_csv(csv_path)

# Usar columna limpia si existe, si no, usar la original
if 'Text_clean' in df.columns:
    text_col = 'Text_clean'
elif 'Texto_Original' in df.columns:
    text_col = 'Texto_Original'
else:
    raise ValueError("No se encontró columna de texto limpia ni original.")

label_col = 'IsToxic'
if label_col not in df.columns:
    raise ValueError("No se encontró columna de etiqueta 'IsToxic'.")

# 2. Ingeniería de variables adicionales
def count_upper(text):
    return sum(1 for c in text if c.isupper())

def count_exclam(text):
    return text.count('!')

def count_question(text):
    return text.count('?')

def count_offensive(text, offensive_words):
    return sum(word in text for word in offensive_words)

def text_len(text):
    return len(text)

def word_count(text):
    return len(text.split())

# Diccionario simple de insultos (puedes ampliarlo)
offensive_words = ['bitch', 'fuck', 'shit', 'cunt', 'idiot', 'stupid', 'asshole', 'dumb', 'punk']

df['text_len'] = df[text_col].astype(str).map(text_len)
df['word_count'] = df[text_col].astype(str).map(word_count)
df['upper_count'] = df[text_col].astype(str).map(count_upper)
df['exclam_count'] = df[text_col].astype(str).map(count_exclam)
df['question_count'] = df[text_col].astype(str).map(count_question)
df['offensive_count'] = df[text_col].astype(str).map(lambda x: count_offensive(x, offensive_words))

# Features binarias auxiliares (si target es solo IsToxic)
bin_cols = [c for c in ['IsAbusive','IsThreat','IsProvocative','IsObscene','IsHatespeech','IsRacist','IsNationalist','IsSexist','IsReligiousHate'] if c in df.columns and c != label_col]
for c in bin_cols:
    df[c] = df[c].astype(int)

# 3. Preprocesamiento mínimo
def simple_preprocess(text):
    text = str(text).lower()
    text = re.sub(r'[^a-záéíóúüñ0-9 ]', ' ', text)
    text = re.sub(r'\s+', ' ', text).strip()
    return text

df[text_col] = df[text_col].astype(str).map(simple_preprocess)

# 4. Split train/test estratificado
X = df[text_col].values
y = df[label_col].astype(int).values
X_train_text, X_test_text, y_train_orig, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# --- ARREGLO CRÍTICO: ALINEAR FEATURES Y TEXTOS ---
# Para evitar errores de dimensiones, usamos los índices originales del split
train_idx, test_idx = train_test_split(
    np.arange(len(df)), test_size=0.2, random_state=42, stratify=y
)
df_train = df.iloc[train_idx].reset_index(drop=True)
df_test = df.iloc[test_idx].reset_index(drop=True)

# 5. Vectorización TF-IDF
vectorizer = TfidfVectorizer(max_features=10000, ngram_range=(1,2))
X_train_tfidf = vectorizer.fit_transform(df_train[text_col])
X_test_tfidf = vectorizer.transform(df_test[text_col])

# 6. Features adicionales para train/test
feature_cols = ['text_len','word_count','upper_count','exclam_count','question_count','offensive_count'] + bin_cols
X_train_feat = df_train[feature_cols].values
X_test_feat = df_test[feature_cols].values

# 7. Concatenar TF-IDF + features adicionales
X_train_full = hstack([X_train_tfidf, X_train_feat])
X_test_full = hstack([X_test_tfidf, X_test_feat])
y_train = df_train[label_col].astype(int).values
y_test = df_test[label_col].astype(int).values

# 8. SMOTE SOLO EN TRAIN
if smote_available:
    sm = SMOTE(random_state=42)
    X_train_full, y_train = sm.fit_resample(X_train_full, y_train)

# 9. Guardar vectorizador y features para reproducibilidad
BASE_DIR = os.path.abspath(os.path.join(os.getcwd(), '..'))
data_dir = os.path.join(BASE_DIR, 'data', 'processed')
models_dir = os.path.join(BASE_DIR, 'models')
os.makedirs(data_dir, exist_ok=True)
os.makedirs(models_dir, exist_ok=True)
joblib.dump(vectorizer, os.path.join(data_dir, 'tfidf_vectorizer.pkl'))
joblib.dump(X_train_full, os.path.join(data_dir, 'X_train_features.pkl'))
joblib.dump(X_test_full, os.path.join(data_dir, 'X_test_features.pkl'))

# 10. Funciones de evaluación y cross-validation
def evaluar_modelo(modelo, X_train, y_train, X_test, y_test, umbral=0.5):
    modelo.fit(X_train, y_train)
    y_train_proba = modelo.predict_proba(X_train)[:,1]
    y_test_proba  = modelo.predict_proba(X_test)[:,1]
    y_train_pred = (y_train_proba >= umbral).astype(int)
    y_test_pred  = (y_test_proba  >= umbral).astype(int)
    train_acc = accuracy_score(y_train, y_train_pred)
    test_acc  = accuracy_score(y_test, y_test_pred)
    diff_acc  = abs(train_acc - test_acc)
    ajuste = "Buen ajuste"
    if train_acc - test_acc > 0.07:
        ajuste = "Overfitting"
    elif test_acc - train_acc > 0.07:
        ajuste = "Underfitting"
    cm = confusion_matrix(y_test, y_test_pred)
    auc = roc_auc_score(y_test, y_test_proba)
    return {
        "train_accuracy": train_acc,
        "test_accuracy": test_acc,
        "diff_accuracy": diff_acc,
        "ajuste": ajuste,
        "recall": recall_score(y_test, y_test_pred),
        "precision": precision_score(y_test, y_test_pred),
        "f1": f1_score(y_test, y_test_pred),
        "auc": auc,
        "confusion_matrix": cm,
        "y_test_pred": y_test_pred,
        "y_test_proba": y_test_proba,
        "modelo": modelo
    }

def cross_val_metric(modelo, X, y, umbral=0.5, n_splits=5):
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)
    f1s, aucs = [], []
    for train_idx, val_idx in skf.split(X, y):
        X_tr, X_val = X[train_idx], X[val_idx]
        y_tr, y_val = y[train_idx], y[val_idx]
        modelo.fit(X_tr, y_tr)
        y_val_proba = modelo.predict_proba(X_val)[:,1]
        y_val_pred = (y_val_proba >= umbral).astype(int)
        f1s.append(f1_score(y_val, y_val_pred))
        try:
            aucs.append(roc_auc_score(y_val, y_val_proba))
        except:
            aucs.append(np.nan)
    return np.mean(f1s), np.nanmean(aucs)

# 11. XGBoost baseline
xgb1 = XGBClassifier(
    max_depth=3,
    n_estimators=80,
    learning_rate=0.07,
    subsample=0.7,
    colsample_bytree=0.7,
    min_child_weight=5,
    gamma=2,
    reg_alpha=1,
    reg_lambda=1,
    scale_pos_weight=1,
    use_label_encoder=False,
    eval_metric='logloss',
    random_state=42
)
metricas_xgb1 = evaluar_modelo(xgb1, X_train_full, y_train, X_test_full, y_test)
cv_f1_xgb1, cv_auc_xgb1 = cross_val_metric(xgb1, X_train_full, y_train)

# 12. XGBoost (boosting)
xgb2 = XGBClassifier(
    booster='gbtree',
    max_depth=3,
    n_estimators=80,
    learning_rate=0.07,
    subsample=0.7,
    colsample_bytree=0.7,
    min_child_weight=5,
    gamma=2,
    reg_alpha=1,
    reg_lambda=1,
    scale_pos_weight=1,
    use_label_encoder=False,
    eval_metric='logloss',
    random_state=42
)
metricas_xgb2 = evaluar_modelo(xgb2, X_train_full, y_train, X_test_full, y_test)
cv_f1_xgb2, cv_auc_xgb2 = cross_val_metric(xgb2, X_train_full, y_train)

# 13. Optuna para hiperparámetros
def objective(trial):
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 50, 150),
        "max_depth": trial.suggest_int("max_depth", 2, 5),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.15),
        "subsample": trial.suggest_float("subsample", 0.6, 0.8),
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.6, 0.8),
        "gamma": trial.suggest_float("gamma", 1, 5),
        "min_child_weight": trial.suggest_int("min_child_weight", 3, 10),
        "reg_alpha": trial.suggest_float("reg_alpha", 0.5, 2),
        "reg_lambda": trial.suggest_float("reg_lambda", 0.5, 2),
        "scale_pos_weight": 1,
        "random_state": 42,
        "use_label_encoder": False,
        "eval_metric": 'logloss'
    }
    model = XGBClassifier(**params)
    f1, _ = cross_val_metric(model, X_train_full, y_train)
    return f1

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=30, show_progress_bar=True)
best_params = study.best_params

# 14. Entrenar con mejores hiperparámetros
xgb1_opt = XGBClassifier(**best_params, use_label_encoder=False, eval_metric='logloss')
metricas_xgb1_opt = evaluar_modelo(xgb1_opt, X_train_full, y_train, X_test_full, y_test)
cv_f1_xgb1_opt, cv_auc_xgb1_opt = cross_val_metric(xgb1_opt, X_train_full, y_train)

# 15. Optimización de umbral
def buscar_umbral(y_true, y_proba):
    mejores = {"umbral": 0.5, "f1": 0}
    for t in np.arange(0.1, 0.9, 0.01):
        y_pred = (y_proba >= t).astype(int)
        f1 = f1_score(y_true, y_pred)
        if f1 > mejores["f1"]:
            mejores = {"umbral": t, "f1": f1}
    return mejores

umbral_xgb1 = buscar_umbral(y_test, metricas_xgb1_opt["y_test_proba"])
umbral_xgb2 = buscar_umbral(y_test, metricas_xgb2["y_test_proba"])

# 16. Recalcular métricas con umbral óptimo
metricas_xgb1_umbral = evaluar_modelo(xgb1_opt, X_train_full, y_train, X_test_full, y_test, umbral=umbral_xgb1["umbral"])
metricas_xgb2_umbral = evaluar_modelo(xgb2, X_train_full, y_train, X_test_full, y_test, umbral=umbral_xgb2["umbral"])

# 17. Comparación de métricas y ranking
def resumen_metricas(nombre, metrica_ini, metrica_opt, metrica_umbral, cv_ini, cv_opt, auc_ini, auc_opt):
    return {
        "Modelo": nombre,
        "Train acc (ini)": round(metrica_ini["train_accuracy"],3),
        "Test acc (ini)": round(metrica_ini["test_accuracy"],3),
        "Diff acc (ini)": round(metrica_ini["diff_accuracy"],3),
        "Ajuste (ini)": metrica_ini["ajuste"],
        "F1 CV (ini)": round(cv_ini,3),
        "AUC CV (ini)": round(auc_ini,3),
        "Train acc (opt)": round(metrica_opt["train_accuracy"],3),
        "Test acc (opt)": round(metrica_opt["test_accuracy"],3),
        "Diff acc (opt)": round(metrica_opt["diff_accuracy"],3),
        "Ajuste (opt)": metrica_opt["ajuste"],
        "F1 CV (opt)": round(cv_opt,3),
        "AUC CV (opt)": round(auc_opt,3),
        "Train acc (umbral)": round(metrica_umbral["train_accuracy"],3),
        "Test acc (umbral)": round(metrica_umbral["test_accuracy"],3),
        "Diff acc (umbral)": round(metrica_umbral["diff_accuracy"],3),
        "Ajuste (umbral)": metrica_umbral["ajuste"],
        "Recall": round(metrica_umbral["recall"],3),
        "Precision": round(metrica_umbral["precision"],3),
        "F1": round(metrica_umbral["f1"],3),
        "AUC": round(metrica_umbral["auc"],3)
    }

cuadro = pd.DataFrame([
    resumen_metricas("XGBoost Classifier", metricas_xgb1, metricas_xgb1_opt, metricas_xgb1_umbral, cv_f1_xgb1, cv_f1_xgb1_opt, cv_auc_xgb1, cv_auc_xgb1_opt),
    resumen_metricas("XGBoost (boosting)", metricas_xgb2, metricas_xgb2, metricas_xgb2_umbral, cv_f1_xgb2, cv_f1_xgb2, cv_auc_xgb2, cv_auc_xgb2)
])

print("\n=== CUADRO COMPARATIVO DE MÉTRICAS ===")
print(cuadro.T)

# Ranking
cuadro_ranking = pd.DataFrame([
    {
        "Ranking": 1,
        "Modelo": "XGBoost Base",
        "Accuracy Train": metricas_xgb1["train_accuracy"],
        "Accuracy Test": metricas_xgb1["test_accuracy"],
        "Precision Test": metricas_xgb1["precision"],
        "Recall Test": metricas_xgb1["recall"],
        "F1 Test": metricas_xgb1["f1"],
        "AUC Test": metricas_xgb1["auc"],
        "Diferencia abs": metricas_xgb1["diff_accuracy"],
        "Tipo de ajuste": metricas_xgb1["ajuste"]
    },
    {
        "Ranking": 2,
        "Modelo": "XGBoost Optuna",
        "Accuracy Train": metricas_xgb1_opt["train_accuracy"],
        "Accuracy Test": metricas_xgb1_opt["test_accuracy"],
        "Precision Test": metricas_xgb1_opt["precision"],
        "Recall Test": metricas_xgb1_opt["recall"],
        "F1 Test": metricas_xgb1_opt["f1"],
        "AUC Test": metricas_xgb1_opt["auc"],
        "Diferencia abs": metricas_xgb1_opt["diff_accuracy"],
        "Tipo de ajuste": metricas_xgb1_opt["ajuste"]
    },
    {
        "Ranking": 3,
        "Modelo": "XGBoost Optuna (umbral óptimo)",
        "Accuracy Train": metricas_xgb1_umbral["train_accuracy"],
        "Accuracy Test": metricas_xgb1_umbral["test_accuracy"],
        "Precision Test": metricas_xgb1_umbral["precision"],
        "Recall Test": metricas_xgb1_umbral["recall"],
        "F1 Test": metricas_xgb1_umbral["f1"],
        "AUC Test": metricas_xgb1_umbral["auc"],
        "Diferencia abs": metricas_xgb1_umbral["diff_accuracy"],
        "Tipo de ajuste": metricas_xgb1_umbral["ajuste"]
    }
])

cuadro_ranking = cuadro_ranking.sort_values("F1 Test", ascending=False).reset_index(drop=True)
cuadro_ranking["Ranking"] = cuadro_ranking.index + 1

print("\n=== CUADRO DE RANKING DE MODELOS (XGBoost) ===")
print(cuadro_ranking)

# Guardar el mejor modelo con el nombre solicitado
if metricas_xgb1_umbral["f1"] >= metricas_xgb2_umbral["f1"]:
    mejor_modelo = metricas_xgb1_umbral["modelo"]
    mejor_nombre = "XGBoost Classifier (Optuna + umbral óptimo)"
    mejor_f1 = metricas_xgb1_umbral["f1"]
else:
    mejor_modelo = metricas_xgb2_umbral["modelo"]
    mejor_nombre = "XGBoost (boosting, default + umbral óptimo)"
    mejor_f1 = metricas_xgb2_umbral["f1"]

joblib.dump(mejor_modelo, os.path.join(models_dir, 'FEATURE.pkl'))
print(f"\n✅ El modelo seleccionado es: {mejor_nombre} con F1-score test = {mejor_f1:.3f}")
print("✅ Mejor modelo guardado como models/FEATURE.pkl")

Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.


[I 2025-07-08 10:36:40,499] Trial 0 finished with value: 1.0 and parameters: {'n_estimators': 62, 'max_depth': 4, 'learning_rate': 0.035986086440028965, 'subsample': 0.7292379692046231, 'colsample_bytree': 0.6543687719304825, 'gamma': 3.645140511677983, 'min_child_weight': 8, 'reg_alpha': 1.342535204726738, 'reg_lambda': 0.9640438696883991}. Best is trial 0 with value: 1.0.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:36:42,918] Trial 1 finished with value: 1.0 and parameters: {'n_estimators': 121, 'max_depth': 5, 'learning_rate': 0.06440187123208065, 'subsample': 0.7233184920777312, 'colsample_bytree': 0.7328095477368434, 'gamma': 4.9263179277769815, 'min_child_weight': 6, 'reg_alpha': 1.4007179037401283, 'reg_lambda': 1.60287803022272}. Best is trial 0 with value: 1.0.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:36:44,249] Trial 2 finished with value: 1.0 and parameters: {'n_estimators': 110, 'max_depth': 2, 'learning_rate': 0.14662863034234602, 'subsample': 0.6648203989246559, 'colsample_bytree': 0.7210186933326864, 'gamma': 1.8062824622996736, 'min_child_weight': 10, 'reg_alpha': 0.6453321000796648, 'reg_lambda': 0.6883441917362902}. Best is trial 0 with value: 1.0.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:36:45,323] Trial 3 finished with value: 1.0 and parameters: {'n_estimators': 60, 'max_depth': 5, 'learning_rate': 0.1254366330001918, 'subsample': 0.724755422148909, 'colsample_bytree': 0.7700479399903779, 'gamma': 3.729912789189879, 'min_child_weight': 10, 'reg_alpha': 1.893579603557828, 'reg_lambda': 1.9114529597751502}. Best is trial 0 with value: 1.0.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:36:46,332] Trial 4 finished with value: 1.0 and parameters: {'n_estimators': 64, 'max_depth': 2, 'learning_rate': 0.12663563750738246, 'subsample': 0.622881568830811, 'colsample_bytree': 0.6493533378171761, 'gamma': 2.7120679067255944, 'min_child_weight': 4, 'reg_alpha': 1.781272405693609, 'reg_lambda': 0.8162891741011727}. Best is trial 0 with value: 1.0.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:36:47,434] Trial 5 finished with value: 1.0 and parameters: {'n_estimators': 59, 'max_depth': 5, 'learning_rate': 0.12451861446658301, 'subsample': 0.7813722806312572, 'colsample_bytree': 0.6868265067644683, 'gamma': 3.1444328168203564, 'min_child_weight': 9, 'reg_alpha': 1.0149115011519272, 'reg_lambda': 1.2182177410480117}. Best is trial 0 with value: 1.0.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:36:48,357] Trial 6 finished with value: 1.0 and parameters: {'n_estimators': 55, 'max_depth': 2, 'learning_rate': 0.011361746442948049, 'subsample': 0.7637897823673296, 'colsample_bytree': 0.6462708203142443, 'gamma': 1.566166840321395, 'min_child_weight': 8, 'reg_alpha': 0.7563206253680737, 'reg_lambda': 1.3891720407718886}. Best is trial 0 with value: 1.0.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:36:50,309] Trial 7 finished with value: 1.0 and parameters: {'n_estimators': 123, 'max_depth': 5, 'learning_rate': 0.10301634815768047, 'subsample': 0.6721586616128107, 'colsample_bytree': 0.7307920403602423, 'gamma': 1.2015622105955912, 'min_child_weight': 3, 'reg_alpha': 1.9983586234977047, 'reg_lambda': 1.5734960548653483}. Best is trial 0 with value: 1.0.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:36:51,954] Trial 8 finished with value: 1.0 and parameters: {'n_estimators': 101, 'max_depth': 2, 'learning_rate': 0.0773155075550732, 'subsample': 0.6167762920432678, 'colsample_bytree': 0.7759382183616106, 'gamma': 4.075794113342195, 'min_child_weight': 4, 'reg_alpha': 0.8514427490874996, 'reg_lambda': 1.5746469734637165}. Best is trial 0 with value: 1.0.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:36:52,941] Trial 9 finished with value: 1.0 and parameters: {'n_estimators': 57, 'max_depth': 2, 'learning_rate': 0.025795007114732454, 'subsample': 0.7788065287406618, 'colsample_bytree': 0.6080588709094624, 'gamma': 4.314157696243072, 'min_child_weight': 6, 'reg_alpha': 1.4794234837732185, 'reg_lambda': 0.8235821111929453}. Best is trial 0 with value: 1.0.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:36:55,546] Trial 10 finished with value: 1.0 and parameters: {'n_estimators': 150, 'max_depth': 4, 'learning_rate': 0.04421338931634601, 'subsample': 0.7281895501409871, 'colsample_bytree': 0.6727076131706048, 'gamma': 2.4120955189028006, 'min_child_weight': 8, 'reg_alpha': 1.164762801544295, 'reg_lambda': 1.0943766708998284}. Best is trial 0 with value: 1.0.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:36:57,382] Trial 11 finished with value: 1.0 and parameters: {'n_estimators': 84, 'max_depth': 4, 'learning_rate': 0.059318066799815036, 'subsample': 0.7269231543377229, 'colsample_bytree': 0.7362854821857612, 'gamma': 4.797069746350229, 'min_child_weight': 6, 'reg_alpha': 1.4426615120434083, 'reg_lambda': 1.832409247698273}. Best is trial 0 with value: 1.0.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:36:59,694] Trial 12 finished with value: 1.0 and parameters: {'n_estimators': 130, 'max_depth': 4, 'learning_rate': 0.05589566484817672, 'subsample': 0.6875427907847501, 'colsample_bytree': 0.623829357284831, 'gamma': 4.858567623435428, 'min_child_weight': 7, 'reg_alpha': 1.511106509955147, 'reg_lambda': 1.029079445746894}. Best is trial 0 with value: 1.0.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:37:01,221] Trial 13 finished with value: 1.0 and parameters: {'n_estimators': 81, 'max_depth': 3, 'learning_rate': 0.0862868067608878, 'subsample': 0.7492142271407566, 'colsample_bytree': 0.7045778081884198, 'gamma': 3.4471026970825704, 'min_child_weight': 5, 'reg_alpha': 1.2342499139207832, 'reg_lambda': 1.455262390905793}. Best is trial 0 with value: 1.0.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:37:03,237] Trial 14 finished with value: 1.0 and parameters: {'n_estimators': 87, 'max_depth': 3, 'learning_rate': 0.0347805165841629, 'subsample': 0.7038017883644988, 'colsample_bytree': 0.7985763488595122, 'gamma': 4.3177580964266955, 'min_child_weight': 7, 'reg_alpha': 1.6381949913362066, 'reg_lambda': 1.7083979715566961}. Best is trial 0 with value: 1.0.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:37:05,320] Trial 15 finished with value: 1.0 and parameters: {'n_estimators': 131, 'max_depth': 5, 'learning_rate': 0.06864113180502629, 'subsample': 0.7966965184154414, 'colsample_bytree': 0.666485640082281, 'gamma': 3.771713233691007, 'min_child_weight': 8, 'reg_alpha': 1.3684328479056542, 'reg_lambda': 1.0118035958326101}. Best is trial 0 with value: 1.0.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:37:07,124] Trial 16 finished with value: 1.0 and parameters: {'n_estimators': 113, 'max_depth': 4, 'learning_rate': 0.09417477217386921, 'subsample': 0.702180149521487, 'colsample_bytree': 0.7130769660148, 'gamma': 2.304197108608826, 'min_child_weight': 5, 'reg_alpha': 1.0486320489701442, 'reg_lambda': 0.5209089034253789}. Best is trial 0 with value: 1.0.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:37:10,259] Trial 17 finished with value: 1.0 and parameters: {'n_estimators': 147, 'max_depth': 3, 'learning_rate': 0.010703527825745837, 'subsample': 0.6444338653201515, 'colsample_bytree': 0.7502664764742464, 'gamma': 4.9970687517640595, 'min_child_weight': 9, 'reg_alpha': 1.6403116930110766, 'reg_lambda': 1.2238776861586718}. Best is trial 0 with value: 1.0.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:37:12,016] Trial 18 finished with value: 1.0 and parameters: {'n_estimators': 74, 'max_depth': 5, 'learning_rate': 0.04669628435509428, 'subsample': 0.743213297319089, 'colsample_bytree': 0.6944625596370467, 'gamma': 3.2308461509076487, 'min_child_weight': 7, 'reg_alpha': 1.3082207440767142, 'reg_lambda': 1.3800387459232824}. Best is trial 0 with value: 1.0.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:37:14,061] Trial 19 finished with value: 1.0 and parameters: {'n_estimators': 97, 'max_depth': 4, 'learning_rate': 0.03213143417236556, 'subsample': 0.7116179861351593, 'colsample_bytree': 0.6346571081408359, 'gamma': 4.468587663378229, 'min_child_weight': 5, 'reg_alpha': 0.9924964321874685, 'reg_lambda': 1.714198928912491}. Best is trial 0 with value: 1.0.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:37:15,515] Trial 20 finished with value: 1.0 and parameters: {'n_estimators': 71, 'max_depth': 5, 'learning_rate': 0.06607327620933827, 'subsample': 0.6840992596326263, 'colsample_bytree': 0.6713424570277063, 'gamma': 3.7277574482393927, 'min_child_weight': 9, 'reg_alpha': 0.5113574707872003, 'reg_lambda': 1.9763695035339919}. Best is trial 0 with value: 1.0.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:37:17,216] Trial 21 finished with value: 1.0 and parameters: {'n_estimators': 111, 'max_depth': 3, 'learning_rate': 0.14483049041499202, 'subsample': 0.650176358679166, 'colsample_bytree': 0.7204914592707089, 'gamma': 1.8286790781298627, 'min_child_weight': 10, 'reg_alpha': 0.5205403073789081, 'reg_lambda': 0.5433079695263537}. Best is trial 0 with value: 1.0.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:37:18,954] Trial 22 finished with value: 1.0 and parameters: {'n_estimators': 113, 'max_depth': 4, 'learning_rate': 0.10350709554741236, 'subsample': 0.6618629058759963, 'colsample_bytree': 0.7473709346575325, 'gamma': 1.8910673135742009, 'min_child_weight': 10, 'reg_alpha': 0.6796427600430985, 'reg_lambda': 0.7657090771074688}. Best is trial 0 with value: 1.0.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:37:20,439] Trial 23 finished with value: 1.0 and parameters: {'n_estimators': 98, 'max_depth': 3, 'learning_rate': 0.14209883417521574, 'subsample': 0.6015332406491689, 'colsample_bytree': 0.7617323411896975, 'gamma': 2.8166187436446166, 'min_child_weight': 8, 'reg_alpha': 1.2178960064395699, 'reg_lambda': 0.6687664101291796}. Best is trial 0 with value: 1.0.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:37:22,446] Trial 24 finished with value: 1.0 and parameters: {'n_estimators': 123, 'max_depth': 4, 'learning_rate': 0.07809239782258139, 'subsample': 0.7419245746037074, 'colsample_bytree': 0.6881937515121379, 'gamma': 1.1013644349313725, 'min_child_weight': 6, 'reg_alpha': 1.6103398725084666, 'reg_lambda': 0.9525394209684871}. Best is trial 0 with value: 1.0.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:37:24,680] Trial 25 finished with value: 1.0 and parameters: {'n_estimators': 137, 'max_depth': 3, 'learning_rate': 0.045479200406866524, 'subsample': 0.6782755228088094, 'colsample_bytree': 0.7173851263957791, 'gamma': 2.445134378133792, 'min_child_weight': 9, 'reg_alpha': 1.1007984282176801, 'reg_lambda': 0.8889515587731902}. Best is trial 0 with value: 1.0.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:37:27,350] Trial 26 finished with value: 1.0 and parameters: {'n_estimators': 105, 'max_depth': 5, 'learning_rate': 0.02391772121541562, 'subsample': 0.7155272043757414, 'colsample_bytree': 0.7369572226995582, 'gamma': 1.9546442017656562, 'min_child_weight': 7, 'reg_alpha': 0.8957350015154838, 'reg_lambda': 0.6815604756317359}. Best is trial 0 with value: 1.0.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:37:29,052] Trial 27 finished with value: 1.0 and parameters: {'n_estimators': 120, 'max_depth': 2, 'learning_rate': 0.10808475056746975, 'subsample': 0.6935392270217174, 'colsample_bytree': 0.7047030271928302, 'gamma': 3.444178600947861, 'min_child_weight': 10, 'reg_alpha': 0.6485433080823129, 'reg_lambda': 1.1074693741225872}. Best is trial 0 with value: 1.0.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:37:30,913] Trial 28 finished with value: 1.0 and parameters: {'n_estimators': 88, 'max_depth': 4, 'learning_rate': 0.05273877069586887, 'subsample': 0.7576290704853166, 'colsample_bytree': 0.7885377601344724, 'gamma': 4.540538029923178, 'min_child_weight': 9, 'reg_alpha': 1.3942967330029603, 'reg_lambda': 0.6614324040166357}. Best is trial 0 with value: 1.0.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Best trial: 0. Best value: 1: 100%|██████████| 30/30 [00:53<00:00,  1.78s/it]
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


[I 2025-07-08 10:37:32,442] Trial 29 finished with value: 1.0 and parameters: {'n_estimators': 93, 'max_depth': 5, 'learning_rate': 0.11909225918128924, 'subsample': 0.7324466204200292, 'colsample_bytree': 0.7637595160153197, 'gamma': 3.9312430965578855, 'min_child_weight': 8, 'reg_alpha': 1.7789784322553834, 'reg_lambda': 1.1629656796159045}. Best is trial 0 with value: 1.0.


Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)
Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)



=== CUADRO COMPARATIVO DE MÉTRICAS ===
                                     0                   1
Modelo              XGBoost Classifier  XGBoost (boosting)
Train acc (ini)                    1.0                 1.0
Test acc (ini)                     1.0                 1.0
Diff acc (ini)                     0.0                 0.0
Ajuste (ini)               Buen ajuste         Buen ajuste
F1 CV (ini)                        1.0                 1.0
AUC CV (ini)                       1.0                 1.0
Train acc (opt)                    1.0                 1.0
Test acc (opt)                     1.0                 1.0
Diff acc (opt)                     0.0                 0.0
Ajuste (opt)               Buen ajuste         Buen ajuste
F1 CV (opt)                        1.0                 1.0
AUC CV (opt)                       1.0                 1.0
Train acc (umbral)               0.999                 1.0
Test acc (umbral)                  1.0                 1.0
Diff acc (umbral

In [7]:
import pandas as pd

# Cargar el dataset
csv_path = r'c:\Users\admin\Desktop\Proyecto 10\nlp_grupo_5_proyecto_10\data\clean\dataset_pretraining_final.csv'
df = pd.read_csv(csv_path)

# Mostrar las columnas
print("Columnas del dataset:")
print(list(df.columns))

# Mostrar las primeras 10 filas con todas sus columnas
print("\nPrimeras 10 filas del dataset:")
print(df.head(10))

Columnas del dataset:
['Text_clean', 'Idioma_Tecnica', 'Texto_Original', 'IsToxic', 'IsAbusive', 'IsThreat', 'IsProvocative', 'IsObscene', 'IsHatespeech', 'IsRacist', 'IsNationalist', 'IsSexist', 'IsReligiousHate']

Primeras 10 filas del dataset:
                                          Text_clean  Idioma_Tecnica  \
0  people step case wasn people situation lump me...     en_original   
1  La gente pasa el caso no era la situación de l...  es_translation   
2  Les personnes étapes de l'étape n'étaient pas ...  fr_translation   
3  law enforcement train shoot apprehend train sh...     en_original   
4  Sesga de trenes de la ley Trawend Train Shoot ...  es_translation   
5  Le tournage du train d'application de la loi a...  fr_translation   
6  Strafverfolgung Zug Dreh Shooting Zug Shoot Tö...  de_translation   
7  reckon black life matter banner hold white cun...     en_original   
8  Reckon Black Life Matter Banner sostenga el co...  es_translation   
9  Reckon Black Life Matter Banne