# Modelo de Clasificación Jerárquica con Aumento de Datos v4.0

Este notebook implementa un pipeline avanzado de clasificación jerárquica con características combinadas:

1.  **Carga y Preprocesamiento**: Usa `hate_speech_twitter` y realiza limpieza de texto (tokenización, stemming, etc.).
2.  **Generación de Características Dual**: Crea embeddings de BERT y vectores TF-IDF.
3.  **Aumento de Datos Sintéticos**: Utiliza **CTGAN** de la librería `sdv` para generar datos sintéticos y **balancear las sub-categorías** de discurso de odio en el conjunto de entrenamiento, mejorando la robustez del modelo.
4.  **Entrenamiento de Clasificador Principal (Nivel 1)**: Entrena y optimiza un **ensemble extendido de seis modelos**. Incluye modelos que usan solo embeddings, solo TF-IDF, y una **combinación de ambos** para una detección más robusta de `odio` vs. `no-odio`.
5.  **Entrenamiento de Clasificador de Sub-categorías (Nivel 2)**: Entrena un ensemble de tres modelos XGBoost, MLP y Regresión Logística para clasificar el **tipo de odio** (ej. sexismo, racismo), utilizando también **características combinadas de embeddings y TF-IDF**.
6.  **Evaluación Jerárquica**: Evalúa el rendimiento del pipeline completo en dos niveles, reportando la precisión tanto en la detección de odio como en la clasificación de su tipo.

## 1. Instalación y Configuración

In [1]:
#!pip install transformers torch datasets scikit-learn xgboost pandas seaborn matplotlib tqdm optuna nltk scipy sdv nltk

In [2]:
import pandas as pd
import numpy as np
import os
import pickle
import seaborn as sns
import matplotlib.pyplot as plt
import time
import torch
import xgboost as xgb
from tqdm.auto import tqdm
import re

from datasets import load_dataset
from transformers import AutoTokenizer, AutoModel
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score, log_loss, f1_score
from sklearn.feature_extraction.text import TfidfVectorizer
from scipy.sparse import hstack, csr_matrix, vstack

import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize
try:
    nltk.data.find('tokenizers/punkt')
    nltk.data.find('corpora/stopwords')
except LookupError:
    nltk.download('punkt')
    nltk.download('stopwords')

BERT_MODEL_NAME = 'bert-base-uncased'
MAX_SAMPLES = 10000 # Aumentar para un mejor entrenamiento de sub-categorías
MAX_TOKEN_LENGTH = 128

# --- Configuración de Dispositivo (GPU o CPU) ---
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Usando dispositivo: {device}")

# --- Definición de Rutas Locales ---
job_id = f"hierarchical-job-{int(time.time())}"
BASE_DIR = "datos_locales"
PROCESSED_DIR = os.path.join(BASE_DIR, "processed", job_id)
MODEL_OUTPUT_DIR = os.path.join(BASE_DIR, "model_output", job_id)
os.makedirs(PROCESSED_DIR, exist_ok=True)
os.makedirs(MODEL_OUTPUT_DIR, exist_ok=True)
PROCESSED_DATA_PATH = os.path.join(PROCESSED_DIR, "processed_data_with_embeddings.csv")
print(f"\nID de trabajo: {job_id}")

  from .autonotebook import tqdm as notebook_tqdm


Usando dispositivo: cuda

ID de trabajo: hierarchical-job-1750890141


## 2. Carga, Análisis y Preprocesamiento de Datos

In [3]:
print("Cargando dataset 'thefrankhsu/hate_speech_twitter'...")
dataset = load_dataset("thefrankhsu/hate_speech_twitter")
df = pd.DataFrame(dataset['train'])

# Renombrar columnas y manejar nulos en 'categories'
df = df.rename(columns={'tweet': 'text_raw', 'label': 'main_label', 'categories': 'sub_label_str'})
df['sub_label_str'] = df['sub_label_str'].fillna('not-hate')

if MAX_SAMPLES is not None:
    # Asegurarse de que el tamaño de la muestra no sea mayor que la población
    sample_size = min(MAX_SAMPLES, len(df))
    print(f"Tomando una muestra aleatoria de {sample_size} registros (de un total de {len(df)}).")
    df = df.sample(n=sample_size, random_state=42, replace=False).reset_index(drop=True)

print("Distribución de etiquetas principales:")
print(df['main_label'].value_counts())

print("\nDistribución de sub-etiquetas (solo para 'odio'):")
print(df[df['main_label'] == 1]['sub_label_str'].value_counts())

# Codificar sub-etiquetas
from sklearn.preprocessing import LabelEncoder
sub_label_encoder = LabelEncoder()
df['sub_label_encoded'] = sub_label_encoder.fit_transform(df['sub_label_str'])
sub_label_mapping = dict(zip(sub_label_encoder.classes_, sub_label_encoder.transform(sub_label_encoder.classes_)))
print("\nMapeo de Sub-etiquetas:", sub_label_mapping)

# Preprocesamiento de texto
stop_words = set(stopwords.words('english'))
stemmer = PorterStemmer()
def clean_text(text, apply_stemming=False):
    if pd.isna(text): return ""
    text = re.sub(r'http\S+|www\S+|https\S+', '', text, flags=re.MULTILINE)
    text = re.sub(r'\@\w+|#','', text)
    tokens = word_tokenize(text)
    words = [word.lower() for word in tokens if word.isalpha() and word.lower() not in stop_words]
    if apply_stemming: words = [stemmer.stem(word) for word in words]
    return " ".join(words)

tqdm.pandas(desc="Limpiando Texto para Embeddings")
df['text_cleaned'] = df['text_raw'].progress_apply(lambda x: clean_text(x, apply_stemming=False))
tqdm.pandas(desc="Aplicando Stemming para TF-IDF")
df['text_stemmed'] = df['text_cleaned'].progress_apply(lambda x: " ".join([stemmer.stem(word) for word in x.split()]))

Cargando dataset 'thefrankhsu/hate_speech_twitter'...
Tomando una muestra aleatoria de 5679 registros (de un total de 5679).
Distribución de etiquetas principales:
main_label
0    4163
1    1516
Name: count, dtype: int64

Distribución de sub-etiquetas (solo para 'odio'):
sub_label_str
Race                   523
Sexual Orientation     429
Gender                 279
Physical Appearance     73
Religion                52
Behavior                40
Class                   40
Ethnicity               40
Disability              40
Name: count, dtype: int64

Mapeo de Sub-etiquetas: {'Behavior': 0, 'Class': 1, 'Disability': 2, 'Ethnicity': 3, 'Gender': 4, 'Physical Appearance': 5, 'Race': 6, 'Religion': 7, 'Sexual Orientation': 8, 'not-hate': 9}


Limpiando Texto para Embeddings: 100%|██████████| 5679/5679 [00:00<00:00, 17038.80it/s]
Aplicando Stemming para TF-IDF: 100%|██████████| 5679/5679 [00:00<00:00, 11101.37it/s]


## 3. Generación de Embeddings y División de Datos

In [4]:
print(f"Cargando modelo y tokenizador BERT: {BERT_MODEL_NAME}")
tokenizer_bert = AutoTokenizer.from_pretrained(BERT_MODEL_NAME)
model_bert = AutoModel.from_pretrained(BERT_MODEL_NAME).to(device)
model_bert.eval()

def get_bert_embeddings(batch_text):
    inputs = tokenizer_bert(batch_text, padding=True, truncation=True, max_length=MAX_TOKEN_LENGTH, return_tensors='pt')
    inputs = {k: v.to(device) for k, v in inputs.items()}
    with torch.no_grad():
        outputs = model_bert(**inputs)
    return outputs.last_hidden_state[:, 0, :].cpu().numpy()

print("Generando embeddings...")
batch_size = 32
all_embeddings = np.vstack([get_bert_embeddings(df.iloc[i:i+batch_size]['text_cleaned'].tolist()) for i in tqdm(range(0, len(df), batch_size))])

embedding_cols = [f'dim_{i}' for i in range(all_embeddings.shape[1])]
df_embeddings = pd.DataFrame(all_embeddings, columns=embedding_cols, index=df.index)
df_processed = pd.concat([df, df_embeddings], axis=1)

print("\n--- Dividiendo Datos ---")
y_main = df_processed['main_label'].values
df_trainval, df_test = train_test_split(df_processed, test_size=0.2, random_state=42, stratify=y_main)
y_trainval_main = df_trainval['main_label'].values
df_train, df_val = train_test_split(df_trainval, test_size=0.25, random_state=42, stratify=y_trainval_main)

print(f"Tamaño Train: {len(df_train)}, Val: {len(df_val)}, Test: {len(df_test)}")

Cargando modelo y tokenizador BERT: bert-base-uncased
Generando embeddings...


100%|██████████| 178/178 [00:05<00:00, 34.57it/s]



--- Dividiendo Datos ---
Tamaño Train: 3407, Val: 1136, Test: 1136


## 4. Aumento de Datos Sintéticos para Sub-categorías (CTGAN)
Nos enfocamos en el desbalance de las sub-categorías de 'odio'. Usaremos CTGAN para generar nuevos datos de entrenamiento para las clases minoritarias, basándonos en sus embeddings. **Importante**: CTGAN solo genera embeddings sintéticos; no puede generar texto. La parte de TF-IDF para estos datos se manejará más adelante.

In [5]:
from sdv.single_table import CTGANSynthesizer
from sdv.metadata import SingleTableMetadata
from sklearn.preprocessing import LabelEncoder

print("--- Preparando datos para aumento ---")
# 1. Aislar los datos de entrenamiento que son 'odio'
df_train_hate = df_train[df_train['main_label'] == 1].copy()
features_to_augment = ['sub_label_str'] + embedding_cols
df_to_augment = df_train_hate[features_to_augment]

print("Distribución de sub-categorías ANTES del aumento:")
hate_counts = df_to_augment['sub_label_str'].value_counts()
print(hate_counts)

# Crear un nuevo encoder dedicado SOLO para las sub-categorías de odio.
sub_hate_only_encoder = LabelEncoder()
df_synthetic = pd.DataFrame() # Inicializar como dataframe vacío

if len(hate_counts) > 1 and not df_to_augment.empty:
    # Ajustar el nuevo encoder solo con las etiquetas de odio
    sub_hate_only_encoder.fit(df_to_augment['sub_label_str'])
    print("\nNuevo encoder para Nivel 2 creado. Clases:", sub_hate_only_encoder.classes_)
    
    # 2. Configurar metadatos
    metadata = SingleTableMetadata()
    metadata.detect_from_dataframe(data=df_to_augment)
    
    print("\nActualizando metadatos para tratar embeddings como numéricos continuos...")
    for col in embedding_cols:
        metadata.update_column(column_name=col, sdtype='numerical')
    metadata.update_column(column_name='sub_label_str', sdtype='categorical')

    # 3. Configurar y entrenar el sintetizador CTGAN
    use_gpu = torch.cuda.is_available()
    print(f"Usando GPU: {use_gpu}")
    synthesizer = CTGANSynthesizer(
        metadata, 
        epochs=150, 
        embedding_dim=64, 
        verbose=False,
        cuda=use_gpu  
    )
    
    print(f"\nEntrenando CTGAN para generar datos sintéticos... (Usando GPU: {use_gpu})")
    synthesizer.fit(df_to_augment)

    # 4. Determinar cuántas muestras generar
    max_class_size = hate_counts.max()
    num_to_generate = max_class_size * len(hate_counts) - hate_counts.sum()

    # 5. Generar y combinar datos
    if num_to_generate > 0:
        print(f"\nGenerando {num_to_generate} muestras sintéticas...")
        df_synthetic = synthesizer.sample(num_rows=num_to_generate)
        df_train_hate_balanced = pd.concat([df_to_augment, df_synthetic], ignore_index=True)
    else:
        df_train_hate_balanced = df_to_augment
else:
    print("\nSolo una sub-categoría presente o no hay datos de odio, no se requiere aumento.")
    df_train_hate_balanced = df_to_augment
    if not df_to_augment.empty:
        # Ajustar el encoder si solo hay una clase
        sub_hate_only_encoder.fit(df_to_augment['sub_label_str'])

print("\nDistribución de sub-categorías DESPUÉS del aumento:")
all_sub_labels = df_to_augment['sub_label_str'].unique()
print(df_train_hate_balanced['sub_label_str'].value_counts().reindex(all_sub_labels, fill_value=0))

# Preparar datos de entrenamiento para el clasificador de sub-categorías
if not df_train_hate_balanced.empty:
    X_train_sub_emb = df_train_hate_balanced[embedding_cols].values
    # Usar el NUEVO encoder para transformar las etiquetas
    y_train_sub = sub_hate_only_encoder.transform(df_train_hate_balanced['sub_label_str'])
else:
    # Crear arrays vacíos si no hay datos para evitar errores posteriores
    X_train_sub_emb = np.array([]).reshape(0, len(embedding_cols))
    y_train_sub = np.array([])

--- Preparando datos para aumento ---
Distribución de sub-categorías ANTES del aumento:
sub_label_str
Race                   321
Sexual Orientation     271
Gender                 158
Physical Appearance     43
Religion                27
Class                   25
Ethnicity               23
Disability              22
Behavior                20
Name: count, dtype: int64

Nuevo encoder para Nivel 2 creado. Clases: ['Behavior' 'Class' 'Disability' 'Ethnicity' 'Gender'
 'Physical Appearance' 'Race' 'Religion' 'Sexual Orientation']

Actualizando metadatos para tratar embeddings como numéricos continuos...
Usando GPU: True

Entrenando CTGAN para generar datos sintéticos... (Usando GPU: True)




PerformanceAlert: Using the CTGANSynthesizer on this data is not recommended. To model this data, CTGAN will generate a large number of columns.

Original Column Name   Est # of Columns (CTGAN)
sub_label_str          9
dim_0                  11
dim_1                  11
dim_2                  11
dim_3                  11
dim_4                  11
dim_5                  11
dim_6                  11
dim_7                  11
dim_8                  11
dim_9                  11
dim_10                 11
dim_11                 11
dim_12                 11
dim_13                 11
dim_14                 11
dim_15                 11
dim_16                 11
dim_17                 11
dim_18                 11
dim_19                 11
dim_20                 11
dim_21                 11
dim_22                 11
dim_23                 11
dim_24                 11
dim_25                 11
dim_26                 11
dim_27                 11
dim_28                 11
dim_29                 11
d

## 5. Entrenamiento del Clasificador Principal (Nivel 1) con Optuna y Ensemble Extendido

Aquí es donde integramos el pipeline de entrenamiento robusto. Entrenaremos y optimizaremos un **ensemble de seis modelos** (XGBoost y MLP usando solo embeddings, los mismos dos usando embeddings + TF-IDF, y dos Regresiones Logísticas usando cada tipo de característica por separado). Este proceso no utiliza los datos aumentados, solo el conjunto de entrenamiento original.

In [6]:
import optuna
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
import warnings

warnings.filterwarnings('ignore', category=UserWarning)
warnings.filterwarnings('ignore', category=FutureWarning)

print("--- Preparando datos para el entrenamiento del Clasificador Principal ---")

# Usar las variables correctas del split jerárquico
y_train = df_train['main_label'].values
y_val = df_val['main_label'].values
num_classes = len(np.unique(y_train)) # Será 2 en este caso

# 1. Escalar características de embeddings
scaler_L1_emb = StandardScaler()
X_train_emb = df_train[embedding_cols].values
X_train_emb_scaled = scaler_L1_emb.fit_transform(X_train_emb)
X_val_emb = df_val[embedding_cols].values
X_val_emb_scaled = scaler_L1_emb.transform(X_val_emb)

# 2. Vectorizar características de texto con TF-IDF
tfidf_vectorizer = TfidfVectorizer(max_features=10000, ngram_range=(1, 2))
X_train_text = df_train['text_stemmed'].values
X_train_tfidf = tfidf_vectorizer.fit_transform(X_train_text)
X_val_text = df_val['text_stemmed'].values
X_val_tfidf = tfidf_vectorizer.transform(X_val_text)
print(f"TF-IDF: {X_train_tfidf.shape[1]} características generadas.")

# 3. Crear características combinadas (Embeddings + TF-IDF)
# Para XGBoost y modelos de Scikit-learn, usamos la matriz sparse combinada
X_train_combined = hstack([X_train_emb, X_train_tfidf]).tocsr()
X_val_combined = hstack([X_val_emb, X_val_tfidf]).tocsr()

# Para MLP, necesitamos una matriz densa. Combinamos embeddings escalados y TF-IDF denso.
X_train_combined_dense = np.hstack([X_train_emb_scaled, X_train_tfidf.toarray()])
X_val_combined_dense = np.hstack([X_val_emb_scaled, X_val_tfidf.toarray()])

# 4. Convertir datos a tensores para PyTorch
X_val_torch_emb = torch.tensor(X_val_emb_scaled, dtype=torch.float32).to(device)
X_val_torch_combined = torch.tensor(X_val_combined_dense, dtype=torch.float32).to(device)
y_val_torch = torch.tensor(y_val, dtype=torch.long).to(device)

print("\n✓ Datos escalados, vectorizados y tensores de PyTorch listos para el Nivel 1.")

--- Preparando datos para el entrenamiento del Clasificador Principal ---
TF-IDF: 10000 características generadas.

✓ Datos escalados, vectorizados y tensores de PyTorch listos para el Nivel 1.


In [7]:
# --- Clase genérica para MLP ---
class MLP(nn.Module):
    def __init__(self, input_size, hidden_layers, output_size, activation_fn, dropout_rate):
        super(MLP, self).__init__()
        layers = []
        current_size = input_size
        for hidden_size in hidden_layers:
            layers.append(nn.Linear(current_size, hidden_size))
            layers.append(activation_fn())
            layers.append(nn.Dropout(dropout_rate))
            current_size = hidden_size
        layers.append(nn.Linear(current_size, output_size))
        self.model = nn.Sequential(*layers)

    def forward(self, x):
        return self.model(x)

# --- Función genérica para entrenar y evaluar MLP en Optuna ---
def train_eval_mlp_objective(trial, X_train_data, y_train_data, X_val_tensor, y_val_tensor, input_dim):
    n_layers = trial.suggest_int('n_layers', 1, 3)
    hidden_layers = [trial.suggest_int(f'n_units_l{i}', 32, 256) for i in range(n_layers)]
    dropout_rate = trial.suggest_float('dropout_rate', 0.1, 0.5)
    optimizer_name = trial.suggest_categorical('optimizer', ['Adam', 'RMSprop'])
    lr = trial.suggest_float('lr', 1e-5, 1e-2, log=True)
    activation_fn = getattr(nn, trial.suggest_categorical('activation', ['ReLU', 'Tanh']))
    
    model = MLP(input_dim, hidden_layers, num_classes, activation_fn, dropout_rate).to(device)
    optimizer = getattr(optim, optimizer_name)(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    train_dataset = TensorDataset(torch.tensor(X_train_data, dtype=torch.float32), torch.tensor(y_train_data, dtype=torch.long))
    train_loader = DataLoader(train_dataset, batch_size=128, shuffle=True)

    for epoch in range(25):
        model.train()
        for data, target in train_loader:
            data, target = data.to(device), target.to(device)
            optimizer.zero_grad()
            loss = criterion(model(data), target)
            loss.backward()
            optimizer.step()
    
    model.eval()
    with torch.no_grad():
        val_loss = criterion(model(X_val_tensor), y_val_tensor).item()
    
    trial.report(val_loss, epoch)
    if trial.should_prune(): raise optuna.exceptions.TrialPruned()
    return val_loss

# --- 1. Objetivo para MLP (Usa solo Embeddings) ---
def objective_mlp_embeddings(trial):
    return train_eval_mlp_objective(trial, X_train_emb_scaled, y_train, X_val_torch_emb, y_val_torch, X_train_emb_scaled.shape[1])

# --- 2. Objetivo para MLP (Usa Embeddings + TF-IDF) ---
def objective_mlp_combined(trial):
    return train_eval_mlp_objective(trial, X_train_combined_dense, y_train, X_val_torch_combined, y_val_torch, X_train_combined_dense.shape[1])

# --- 3. Objetivo para XGBoost (Usa solo Embeddings) ---
def objective_xgboost_embeddings(trial):
    params = {
        'objective': 'binary:logistic', 'eval_metric': 'logloss',
        'device': 'cuda' if device.type == 'cuda' else 'cpu',
        'n_estimators': trial.suggest_int('n_estimators', 100, 1000),
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3, log=True),
        'max_depth': trial.suggest_int('max_depth', 3, 9),
    }
    model = xgb.XGBClassifier(**params, early_stopping_rounds=10)
    model.fit(X_train_emb, y_train, eval_set=[(X_val_emb, y_val)], verbose=False)
    return log_loss(y_val, model.predict_proba(X_val_emb))

# --- 4. Objetivo para XGBoost (Usa Embeddings + TF-IDF) ---
def objective_xgboost_combined(trial):
    params = {
        'objective': 'binary:logistic', 'eval_metric': 'logloss',
        'device': 'cuda' if device.type == 'cuda' else 'cpu',
        'n_estimators': trial.suggest_int('n_estimators', 100, 1000),
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3, log=True),
        'max_depth': trial.suggest_int('max_depth', 3, 9),
    }
    model = xgb.XGBClassifier(**params, early_stopping_rounds=10)
    model.fit(X_train_combined, y_train, eval_set=[(X_val_combined, y_val)], verbose=False)
    return log_loss(y_val, model.predict_proba(X_val_combined))

# --- 5. Objetivo para Regresión Logística (Usa solo Embeddings) ---
def objective_logistic_embeddings(trial):
    params = {'C': trial.suggest_float('C', 1e-4, 1e2, log=True), 'solver': 'liblinear', 'max_iter': 1000}
    model = LogisticRegression(**params, random_state=42)
    model.fit(X_train_emb_scaled, y_train)
    return log_loss(y_val, model.predict_proba(X_val_emb_scaled))

# --- 6. Objetivo para Regresión Logística (Usa solo TF-IDF) ---
def objective_logistic_tfidf(trial):
    params = {'C': trial.suggest_float('C', 1e-2, 1e2, log=True), 'solver': 'liblinear', 'max_iter': 1000}
    model = LogisticRegression(**params, random_state=42)
    model.fit(X_train_tfidf, y_train)
    return log_loss(y_val, model.predict_proba(X_val_tfidf))

print(f"Funciones objetivo de Optuna para el Nivel 1 definidas.")

Funciones objetivo de Optuna para el Nivel 1 definidas.


In [8]:
models_config = {
    'XGBoost_Embeddings': {'objective_func': objective_xgboost_embeddings, 'n_trials': 25},
    'MLP_PyTorch_Embeddings': {'objective_func': objective_mlp_embeddings, 'n_trials': 30},
    'LogisticRegression_Embeddings': {'objective_func': objective_logistic_embeddings, 'n_trials': 20},
    'LogisticRegression_TFIDF': {'objective_func': objective_logistic_tfidf, 'n_trials': 20},
    'XGBoost_Combined': {'objective_func': objective_xgboost_combined, 'n_trials': 25},
    'MLP_PyTorch_Combined': {'objective_func': objective_mlp_combined, 'n_trials': 30}
}

model_results = {}

for model_name, config in models_config.items():
    print(f"\n--- Optimizando {model_name} (Nivel 1) ---")
    study = optuna.create_study(direction='minimize', sampler=optuna.samplers.TPESampler(seed=42))
    study.optimize(config['objective_func'], n_trials=config['n_trials'], show_progress_bar=True)
    
    model_results[model_name] = {
        'best_params': study.best_params,
        'best_score': study.best_value
    }
    print(f"✓ {model_name} completado. Mejor LogLoss: {study.best_value:.4f}")

[I 2025-06-25 16:34:11,285] A new study created in memory with name: no-name-b521c075-0adc-455c-b6a4-36877f2c1f5e



--- Optimizando XGBoost_Embeddings (Nivel 1) ---


Best trial: 0. Best value: 0.388348:   4%|▍         | 1/25 [00:01<00:47,  1.98s/it]

[I 2025-06-25 16:34:13,265] Trial 0 finished with value: 0.38834796796875687 and parameters: {'n_estimators': 437, 'learning_rate': 0.2536999076681772, 'max_depth': 8}. Best is trial 0 with value: 0.38834796796875687.


Best trial: 1. Best value: 0.330828:   8%|▊         | 2/25 [00:06<01:15,  3.26s/it]

[I 2025-06-25 16:34:17,431] Trial 1 finished with value: 0.3308276993342611 and parameters: {'n_estimators': 639, 'learning_rate': 0.01700037298921102, 'max_depth': 4}. Best is trial 1 with value: 0.3308276993342611.


Best trial: 1. Best value: 0.330828:  12%|█▏        | 3/25 [00:07<00:53,  2.44s/it]

[I 2025-06-25 16:34:18,897] Trial 2 finished with value: 0.35457881385286627 and parameters: {'n_estimators': 152, 'learning_rate': 0.19030368381735815, 'max_depth': 7}. Best is trial 1 with value: 0.3308276993342611.


Best trial: 1. Best value: 0.330828:  16%|█▌        | 4/25 [00:35<04:22, 12.50s/it]

[I 2025-06-25 16:34:46,806] Trial 3 finished with value: 0.3635742502413637 and parameters: {'n_estimators': 737, 'learning_rate': 0.010725209743171996, 'max_depth': 9}. Best is trial 1 with value: 0.3308276993342611.


Best trial: 4. Best value: 0.328352:  20%|██        | 5/25 [00:39<03:09,  9.46s/it]

[I 2025-06-25 16:34:50,884] Trial 4 finished with value: 0.3283516402816501 and parameters: {'n_estimators': 850, 'learning_rate': 0.020589728197687916, 'max_depth': 4}. Best is trial 4 with value: 0.3283516402816501.


Best trial: 4. Best value: 0.328352:  24%|██▍       | 6/25 [00:44<02:31,  7.96s/it]

[I 2025-06-25 16:34:55,942] Trial 5 finished with value: 0.3424010185363584 and parameters: {'n_estimators': 265, 'learning_rate': 0.028145092716060652, 'max_depth': 6}. Best is trial 4 with value: 0.3283516402816501.


Best trial: 4. Best value: 0.328352:  28%|██▊       | 7/25 [00:51<02:19,  7.74s/it]

[I 2025-06-25 16:35:03,216] Trial 6 finished with value: 0.3547687649305162 and parameters: {'n_estimators': 489, 'learning_rate': 0.02692655251486473, 'max_depth': 7}. Best is trial 4 with value: 0.3283516402816501.


Best trial: 4. Best value: 0.328352:  32%|███▏      | 8/25 [00:54<01:44,  6.13s/it]

[I 2025-06-25 16:35:05,898] Trial 7 finished with value: 0.3421056697542824 and parameters: {'n_estimators': 225, 'learning_rate': 0.027010527749605478, 'max_depth': 5}. Best is trial 4 with value: 0.3283516402816501.


Best trial: 4. Best value: 0.328352:  36%|███▌      | 9/25 [00:55<01:12,  4.50s/it]

[I 2025-06-25 16:35:06,831] Trial 8 finished with value: 0.3347771775892681 and parameters: {'n_estimators': 510, 'learning_rate': 0.14447746112718687, 'max_depth': 4}. Best is trial 4 with value: 0.3283516402816501.


Best trial: 4. Best value: 0.328352:  40%|████      | 10/25 [00:56<00:52,  3.47s/it]

[I 2025-06-25 16:35:07,991] Trial 9 finished with value: 0.3349609370112823 and parameters: {'n_estimators': 563, 'learning_rate': 0.07500118950416987, 'max_depth': 3}. Best is trial 4 with value: 0.3283516402816501.


Best trial: 4. Best value: 0.328352:  44%|████▍     | 11/25 [00:57<00:37,  2.70s/it]

[I 2025-06-25 16:35:08,947] Trial 10 finished with value: 0.3396220915620309 and parameters: {'n_estimators': 981, 'learning_rate': 0.06690992453172917, 'max_depth': 3}. Best is trial 4 with value: 0.3283516402816501.


Best trial: 4. Best value: 0.328352:  48%|████▊     | 12/25 [01:05<00:57,  4.39s/it]

[I 2025-06-25 16:35:17,207] Trial 11 finished with value: 0.33582109176999275 and parameters: {'n_estimators': 847, 'learning_rate': 0.010464817979692459, 'max_depth': 5}. Best is trial 4 with value: 0.3283516402816501.


Best trial: 4. Best value: 0.328352:  52%|█████▏    | 13/25 [01:10<00:53,  4.47s/it]

[I 2025-06-25 16:35:21,866] Trial 12 finished with value: 0.3310297457642165 and parameters: {'n_estimators': 710, 'learning_rate': 0.018042210081692857, 'max_depth': 4}. Best is trial 4 with value: 0.3283516402816501.


Best trial: 13. Best value: 0.327103:  56%|█████▌    | 14/25 [01:12<00:41,  3.80s/it]

[I 2025-06-25 16:35:24,099] Trial 13 finished with value: 0.327103316371309 and parameters: {'n_estimators': 997, 'learning_rate': 0.04271369442805087, 'max_depth': 4}. Best is trial 13 with value: 0.327103316371309.


Best trial: 13. Best value: 0.327103:  60%|██████    | 15/25 [01:15<00:34,  3.47s/it]

[I 2025-06-25 16:35:26,801] Trial 14 finished with value: 0.3292267280214686 and parameters: {'n_estimators': 978, 'learning_rate': 0.04410047566419462, 'max_depth': 5}. Best is trial 13 with value: 0.327103316371309.


Best trial: 13. Best value: 0.327103:  64%|██████▍   | 16/25 [01:16<00:25,  2.85s/it]

[I 2025-06-25 16:35:28,220] Trial 15 finished with value: 0.3360596257760045 and parameters: {'n_estimators': 859, 'learning_rate': 0.044767771726731534, 'max_depth': 3}. Best is trial 13 with value: 0.327103316371309.


Best trial: 13. Best value: 0.327103:  68%|██████▊   | 17/25 [01:18<00:20,  2.52s/it]

[I 2025-06-25 16:35:29,957] Trial 16 finished with value: 0.341524444466647 and parameters: {'n_estimators': 851, 'learning_rate': 0.11124442632859347, 'max_depth': 6}. Best is trial 13 with value: 0.327103316371309.


Best trial: 13. Best value: 0.327103:  72%|███████▏  | 18/25 [01:21<00:17,  2.47s/it]

[I 2025-06-25 16:35:32,335] Trial 17 finished with value: 0.32880382190584445 and parameters: {'n_estimators': 999, 'learning_rate': 0.037451197538989976, 'max_depth': 4}. Best is trial 13 with value: 0.327103316371309.


Best trial: 13. Best value: 0.327103:  76%|███████▌  | 19/25 [01:28<00:24,  4.06s/it]

[I 2025-06-25 16:35:40,103] Trial 18 finished with value: 0.3435131842583088 and parameters: {'n_estimators': 795, 'learning_rate': 0.017250820550283107, 'max_depth': 6}. Best is trial 13 with value: 0.327103316371309.


Best trial: 13. Best value: 0.327103:  80%|████████  | 20/25 [01:30<00:16,  3.23s/it]

[I 2025-06-25 16:35:41,399] Trial 19 finished with value: 0.332489871213085 and parameters: {'n_estimators': 927, 'learning_rate': 0.09154387249271213, 'max_depth': 5}. Best is trial 13 with value: 0.327103316371309.


Best trial: 13. Best value: 0.327103:  84%|████████▍ | 21/25 [01:31<00:10,  2.60s/it]

[I 2025-06-25 16:35:42,522] Trial 20 finished with value: 0.34187274331030476 and parameters: {'n_estimators': 377, 'learning_rate': 0.05547535231708596, 'max_depth': 3}. Best is trial 13 with value: 0.327103316371309.


Best trial: 13. Best value: 0.327103:  88%|████████▊ | 22/25 [01:33<00:07,  2.45s/it]

[I 2025-06-25 16:35:44,623] Trial 21 finished with value: 0.3295852792436287 and parameters: {'n_estimators': 997, 'learning_rate': 0.036712507316242, 'max_depth': 4}. Best is trial 13 with value: 0.327103316371309.


Best trial: 13. Best value: 0.327103:  92%|█████████▏| 23/25 [01:35<00:04,  2.43s/it]

[I 2025-06-25 16:35:46,989] Trial 22 finished with value: 0.3294519604453533 and parameters: {'n_estimators': 898, 'learning_rate': 0.03366151211963755, 'max_depth': 4}. Best is trial 13 with value: 0.327103316371309.


Best trial: 13. Best value: 0.327103:  96%|█████████▌| 24/25 [01:39<00:02,  2.81s/it]

[I 2025-06-25 16:35:50,686] Trial 23 finished with value: 0.32791301563696784 and parameters: {'n_estimators': 784, 'learning_rate': 0.02114521057583463, 'max_depth': 4}. Best is trial 13 with value: 0.327103316371309.


Best trial: 13. Best value: 0.327103: 100%|██████████| 25/25 [01:46<00:00,  4.24s/it]
[I 2025-06-25 16:35:57,349] A new study created in memory with name: no-name-faf13f8f-8c19-4f89-871a-8a304fbcbd51


[I 2025-06-25 16:35:57,344] Trial 24 finished with value: 0.3344977433278849 and parameters: {'n_estimators': 749, 'learning_rate': 0.01493805762572708, 'max_depth': 5}. Best is trial 13 with value: 0.327103316371309.
✓ XGBoost_Embeddings completado. Mejor LogLoss: 0.3271

--- Optimizando MLP_PyTorch_Embeddings (Nivel 1) ---


Best trial: 0. Best value: 0.374582:   3%|▎         | 1/30 [00:01<00:50,  1.74s/it]

[I 2025-06-25 16:35:59,086] Trial 0 finished with value: 0.37458160519599915 and parameters: {'n_layers': 2, 'n_units_l0': 245, 'n_units_l1': 196, 'dropout_rate': 0.3394633936788146, 'optimizer': 'Adam', 'lr': 1.493656855461762e-05, 'activation': 'ReLU'}. Best is trial 0 with value: 0.37458160519599915.


Best trial: 1. Best value: 0.348226:   7%|▋         | 2/30 [00:03<00:50,  1.82s/it]

[I 2025-06-25 16:36:00,959] Trial 1 finished with value: 0.3482261896133423 and parameters: {'n_layers': 3, 'n_units_l0': 36, 'n_units_l1': 250, 'n_units_l2': 219, 'dropout_rate': 0.18493564427131048, 'optimizer': 'RMSprop', 'lr': 8.17949947521167e-05, 'activation': 'ReLU'}. Best is trial 1 with value: 0.3482261896133423.


Best trial: 1. Best value: 0.348226:  10%|█         | 3/30 [00:05<00:44,  1.66s/it]

[I 2025-06-25 16:36:02,432] Trial 2 finished with value: 0.3925720155239105 and parameters: {'n_layers': 1, 'n_units_l0': 169, 'dropout_rate': 0.15579754426081674, 'optimizer': 'RMSprop', 'lr': 0.00023345864076016249, 'activation': 'ReLU'}. Best is trial 1 with value: 0.3482261896133423.


Best trial: 1. Best value: 0.348226:  13%|█▎        | 4/30 [00:06<00:43,  1.68s/it]

[I 2025-06-25 16:36:04,154] Trial 3 finished with value: 0.7316364645957947 and parameters: {'n_layers': 2, 'n_units_l0': 165, 'n_units_l1': 42, 'dropout_rate': 0.34301794076057535, 'optimizer': 'Adam', 'lr': 0.007025166339242158, 'activation': 'ReLU'}. Best is trial 1 with value: 0.3482261896133423.


Best trial: 4. Best value: 0.314528:  17%|█▋        | 5/30 [00:08<00:41,  1.64s/it]

[I 2025-06-25 16:36:05,731] Trial 4 finished with value: 0.3145275115966797 and parameters: {'n_layers': 1, 'n_units_l0': 53, 'dropout_rate': 0.3736932106048628, 'optimizer': 'Adam', 'lr': 0.0003058656666978527, 'activation': 'Tanh'}. Best is trial 4 with value: 0.3145275115966797.


Best trial: 4. Best value: 0.314528:  20%|██        | 6/30 [00:09<00:38,  1.62s/it]

[I 2025-06-25 16:36:07,293] Trial 5 finished with value: 0.31544774770736694 and parameters: {'n_layers': 1, 'n_units_l0': 181, 'dropout_rate': 0.2246844304357644, 'optimizer': 'RMSprop', 'lr': 3.585612610345396e-05, 'activation': 'ReLU'}. Best is trial 4 with value: 0.3145275115966797.


Best trial: 4. Best value: 0.314528:  23%|██▎       | 7/30 [00:11<00:39,  1.71s/it]

[I 2025-06-25 16:36:09,196] Trial 6 pruned. 


Best trial: 4. Best value: 0.314528:  27%|██▋       | 8/30 [00:13<00:39,  1.78s/it]

[I 2025-06-25 16:36:11,125] Trial 7 pruned. 


Best trial: 4. Best value: 0.314528:  30%|███       | 9/30 [00:15<00:35,  1.70s/it]

[I 2025-06-25 16:36:12,641] Trial 8 finished with value: 0.3505118489265442 and parameters: {'n_layers': 1, 'n_units_l0': 215, 'dropout_rate': 0.38274293753904687, 'optimizer': 'RMSprop', 'lr': 1.667761543019792e-05, 'activation': 'ReLU'}. Best is trial 4 with value: 0.3145275115966797.


Best trial: 4. Best value: 0.314528:  33%|███▎      | 10/30 [00:17<00:34,  1.74s/it]

[I 2025-06-25 16:36:14,471] Trial 9 pruned. 


Best trial: 4. Best value: 0.314528:  37%|███▋      | 11/30 [00:18<00:31,  1.67s/it]

[I 2025-06-25 16:36:16,000] Trial 10 finished with value: 0.3487488925457001 and parameters: {'n_layers': 1, 'n_units_l0': 33, 'dropout_rate': 0.48781484431514643, 'optimizer': 'Adam', 'lr': 0.0013980053602502692, 'activation': 'Tanh'}. Best is trial 4 with value: 0.3145275115966797.


Best trial: 4. Best value: 0.314528:  40%|████      | 12/30 [00:20<00:29,  1.64s/it]

[I 2025-06-25 16:36:17,561] Trial 11 finished with value: 0.3382868468761444 and parameters: {'n_layers': 1, 'n_units_l0': 97, 'dropout_rate': 0.2786611502863705, 'optimizer': 'RMSprop', 'lr': 6.641310150471283e-05, 'activation': 'Tanh'}. Best is trial 4 with value: 0.3145275115966797.


Best trial: 4. Best value: 0.314528:  43%|████▎     | 13/30 [00:22<00:28,  1.69s/it]

[I 2025-06-25 16:36:19,361] Trial 12 pruned. 


Best trial: 4. Best value: 0.314528:  47%|████▋     | 14/30 [00:23<00:26,  1.67s/it]

[I 2025-06-25 16:36:20,991] Trial 13 pruned. 


Best trial: 14. Best value: 0.314105:  50%|█████     | 15/30 [00:25<00:24,  1.66s/it]

[I 2025-06-25 16:36:22,625] Trial 14 finished with value: 0.31410539150238037 and parameters: {'n_layers': 1, 'n_units_l0': 65, 'dropout_rate': 0.4505764863334542, 'optimizer': 'Adam', 'lr': 0.00023422328621630747, 'activation': 'Tanh'}. Best is trial 14 with value: 0.31410539150238037.


Best trial: 14. Best value: 0.314105:  53%|█████▎    | 16/30 [00:27<00:23,  1.69s/it]

[I 2025-06-25 16:36:24,382] Trial 15 finished with value: 0.328482449054718 and parameters: {'n_layers': 2, 'n_units_l0': 68, 'n_units_l1': 46, 'dropout_rate': 0.48684929702150676, 'optimizer': 'Adam', 'lr': 0.0002597866463064149, 'activation': 'Tanh'}. Best is trial 14 with value: 0.31410539150238037.


Best trial: 14. Best value: 0.314105:  57%|█████▋    | 17/30 [00:28<00:21,  1.66s/it]

[I 2025-06-25 16:36:25,973] Trial 16 finished with value: 0.3175424039363861 and parameters: {'n_layers': 1, 'n_units_l0': 62, 'dropout_rate': 0.4284712651840725, 'optimizer': 'Adam', 'lr': 0.0001760434411056533, 'activation': 'Tanh'}. Best is trial 14 with value: 0.31410539150238037.


Best trial: 14. Best value: 0.314105:  60%|██████    | 18/30 [00:30<00:20,  1.68s/it]

[I 2025-06-25 16:36:27,695] Trial 17 pruned. 


Best trial: 14. Best value: 0.314105:  63%|██████▎   | 19/30 [00:31<00:18,  1.64s/it]

[I 2025-06-25 16:36:29,255] Trial 18 pruned. 


Best trial: 14. Best value: 0.314105:  67%|██████▋   | 20/30 [00:33<00:16,  1.66s/it]

[I 2025-06-25 16:36:30,972] Trial 19 pruned. 


Best trial: 14. Best value: 0.314105:  70%|███████   | 21/30 [00:35<00:14,  1.64s/it]

[I 2025-06-25 16:36:32,573] Trial 20 finished with value: 0.315868079662323 and parameters: {'n_layers': 1, 'n_units_l0': 46, 'dropout_rate': 0.45302187107736813, 'optimizer': 'Adam', 'lr': 0.0004171430962194586, 'activation': 'Tanh'}. Best is trial 14 with value: 0.31410539150238037.


Best trial: 14. Best value: 0.314105:  73%|███████▎  | 22/30 [00:36<00:12,  1.60s/it]

[I 2025-06-25 16:36:34,071] Trial 21 pruned. 


Best trial: 14. Best value: 0.314105:  77%|███████▋  | 23/30 [00:38<00:11,  1.58s/it]

[I 2025-06-25 16:36:35,608] Trial 22 pruned. 


Best trial: 14. Best value: 0.314105:  80%|████████  | 24/30 [00:39<00:09,  1.57s/it]

[I 2025-06-25 16:36:37,157] Trial 23 pruned. 


Best trial: 24. Best value: 0.313469:  83%|████████▎ | 25/30 [00:41<00:07,  1.58s/it]

[I 2025-06-25 16:36:38,745] Trial 24 finished with value: 0.313469260931015 and parameters: {'n_layers': 1, 'n_units_l0': 89, 'dropout_rate': 0.31374978928855246, 'optimizer': 'Adam', 'lr': 0.00014130518731795622, 'activation': 'Tanh'}. Best is trial 24 with value: 0.313469260931015.


Best trial: 24. Best value: 0.313469:  87%|████████▋ | 26/30 [00:42<00:06,  1.57s/it]

[I 2025-06-25 16:36:40,307] Trial 25 finished with value: 0.31921541690826416 and parameters: {'n_layers': 1, 'n_units_l0': 87, 'dropout_rate': 0.30981728216219306, 'optimizer': 'Adam', 'lr': 0.00011565513838081996, 'activation': 'Tanh'}. Best is trial 24 with value: 0.313469260931015.


Best trial: 24. Best value: 0.313469:  90%|█████████ | 27/30 [00:44<00:04,  1.61s/it]

[I 2025-06-25 16:36:41,996] Trial 26 pruned. 


Best trial: 24. Best value: 0.313469:  93%|█████████▎| 28/30 [00:46<00:03,  1.57s/it]

[I 2025-06-25 16:36:43,473] Trial 27 pruned. 


Best trial: 24. Best value: 0.313469:  97%|█████████▋| 29/30 [00:47<00:01,  1.57s/it]

[I 2025-06-25 16:36:45,059] Trial 28 finished with value: 0.31444671750068665 and parameters: {'n_layers': 1, 'n_units_l0': 80, 'dropout_rate': 0.33665023612082956, 'optimizer': 'Adam', 'lr': 0.0001808953807899914, 'activation': 'Tanh'}. Best is trial 24 with value: 0.313469260931015.


Best trial: 24. Best value: 0.313469: 100%|██████████| 30/30 [00:49<00:00,  1.65s/it]
[I 2025-06-25 16:36:46,786] A new study created in memory with name: no-name-fb6deccb-5957-44cd-b6bd-f6701a36258d


[I 2025-06-25 16:36:46,783] Trial 29 pruned. 
✓ MLP_PyTorch_Embeddings completado. Mejor LogLoss: 0.3135

--- Optimizando LogisticRegression_Embeddings (Nivel 1) ---


Best trial: 0. Best value: 0.319224:   5%|▌         | 1/20 [00:00<00:09,  1.93it/s]

[I 2025-06-25 16:36:47,306] Trial 0 finished with value: 0.31922374641950113 and parameters: {'C': 0.017670169402947963}. Best is trial 0 with value: 0.31922374641950113.


Best trial: 0. Best value: 0.319224:  10%|█         | 2/20 [00:03<00:39,  2.22s/it]

[I 2025-06-25 16:36:50,715] Trial 1 finished with value: 1.5955142482896418 and parameters: {'C': 50.61576888752309}. Best is trial 0 with value: 0.31922374641950113.


Best trial: 0. Best value: 0.319224:  15%|█▌        | 3/20 [00:05<00:33,  2.00s/it]

[I 2025-06-25 16:36:52,449] Trial 2 finished with value: 0.8098998458689198 and parameters: {'C': 2.465832945854912}. Best is trial 0 with value: 0.31922374641950113.


Best trial: 0. Best value: 0.319224:  20%|██        | 4/20 [00:06<00:27,  1.70s/it]

[I 2025-06-25 16:36:53,681] Trial 3 finished with value: 0.50440823117129 and parameters: {'C': 0.39079671568228835}. Best is trial 0 with value: 0.31922374641950113.


Best trial: 0. Best value: 0.319224:  25%|██▌       | 5/20 [00:07<00:17,  1.16s/it]

[I 2025-06-25 16:36:53,900] Trial 4 finished with value: 0.4062108845720676 and parameters: {'C': 0.0008632008168602544}. Best is trial 0 with value: 0.31922374641950113.


Best trial: 0. Best value: 0.319224:  35%|███▌      | 7/20 [00:07<00:08,  1.60it/s]

[I 2025-06-25 16:36:54,131] Trial 5 finished with value: 0.4062311119461451 and parameters: {'C': 0.0008629132190071859}. Best is trial 0 with value: 0.31922374641950113.
[I 2025-06-25 16:36:54,300] Trial 6 finished with value: 0.49179866551459905 and parameters: {'C': 0.00022310108018679258}. Best is trial 0 with value: 0.31922374641950113.


Best trial: 0. Best value: 0.319224:  40%|████      | 8/20 [00:09<00:13,  1.14s/it]

[I 2025-06-25 16:36:56,548] Trial 7 finished with value: 1.2866922738134148 and parameters: {'C': 15.741890047456648}. Best is trial 0 with value: 0.31922374641950113.


Best trial: 0. Best value: 0.319224:  45%|████▌     | 9/20 [00:10<00:12,  1.13s/it]

[I 2025-06-25 16:36:57,655] Trial 8 finished with value: 0.5085943495265486 and parameters: {'C': 0.4042872735027334}. Best is trial 0 with value: 0.31922374641950113.


Best trial: 0. Best value: 0.319224:  50%|█████     | 10/20 [00:12<00:12,  1.27s/it]

[I 2025-06-25 16:36:59,226] Trial 9 finished with value: 0.7405196393655641 and parameters: {'C': 1.7718847354806828}. Best is trial 0 with value: 0.31922374641950113.


Best trial: 10. Best value: 0.319215:  55%|█████▌    | 11/20 [00:12<00:09,  1.02s/it]

[I 2025-06-25 16:36:59,694] Trial 10 finished with value: 0.31921511298162764 and parameters: {'C': 0.017654677164766052}. Best is trial 10 with value: 0.31921511298162764.


Best trial: 10. Best value: 0.319215:  60%|██████    | 12/20 [00:13<00:06,  1.17it/s]

[I 2025-06-25 16:37:00,178] Trial 11 finished with value: 0.31963801799006214 and parameters: {'C': 0.018388382022356858}. Best is trial 10 with value: 0.31921511298162764.


Best trial: 12. Best value: 0.31745:  65%|██████▌   | 13/20 [00:13<00:05,  1.35it/s] 

[I 2025-06-25 16:37:00,655] Trial 12 finished with value: 0.3174502549962926 and parameters: {'C': 0.012798666702408415}. Best is trial 12 with value: 0.3174502549962926.


Best trial: 13. Best value: 0.317428:  70%|███████   | 14/20 [00:14<00:03,  1.51it/s]

[I 2025-06-25 16:37:01,125] Trial 13 finished with value: 0.31742845333408026 and parameters: {'C': 0.012053282859112415}. Best is trial 13 with value: 0.31742845333408026.


Best trial: 13. Best value: 0.317428:  75%|███████▌  | 15/20 [00:14<00:02,  1.77it/s]

[I 2025-06-25 16:37:01,468] Trial 14 finished with value: 0.345137252213233 and parameters: {'C': 0.0028718250465162133}. Best is trial 13 with value: 0.31742845333408026.


Best trial: 13. Best value: 0.317428:  80%|████████  | 16/20 [00:15<00:02,  1.60it/s]

[I 2025-06-25 16:37:02,231] Trial 15 finished with value: 0.3535279019185097 and parameters: {'C': 0.061296196372412286}. Best is trial 13 with value: 0.31742845333408026.


Best trial: 13. Best value: 0.317428:  85%|████████▌ | 17/20 [00:15<00:01,  1.88it/s]

[I 2025-06-25 16:37:02,551] Trial 16 finished with value: 0.338999567018194 and parameters: {'C': 0.003409264942037956}. Best is trial 13 with value: 0.31742845333408026.


Best trial: 13. Best value: 0.317428:  90%|█████████ | 18/20 [00:16<00:01,  1.62it/s]

[I 2025-06-25 16:37:03,366] Trial 17 finished with value: 0.3731334741878992 and parameters: {'C': 0.08901581332229032}. Best is trial 13 with value: 0.31742845333408026.


Best trial: 13. Best value: 0.317428: 100%|██████████| 20/20 [00:17<00:00,  1.17it/s]
[I 2025-06-25 16:37:03,847] A new study created in memory with name: no-name-43286d0c-46e6-4a68-964d-0fbe30bbe8e8


[I 2025-06-25 16:37:03,714] Trial 18 finished with value: 0.33286536853181986 and parameters: {'C': 0.00415403421342495}. Best is trial 13 with value: 0.31742845333408026.
[I 2025-06-25 16:37:03,843] Trial 19 finished with value: 0.5296032502671237 and parameters: {'C': 0.00011996661220636725}. Best is trial 13 with value: 0.31742845333408026.
✓ LogisticRegression_Embeddings completado. Mejor LogLoss: 0.3174

--- Optimizando LogisticRegression_TFIDF (Nivel 1) ---


Best trial: 0. Best value: 0.355276:   0%|          | 0/20 [00:00<?, ?it/s]

[I 2025-06-25 16:37:03,866] Trial 0 finished with value: 0.35527615580839433 and parameters: {'C': 0.31489116479568624}. Best is trial 0 with value: 0.35527615580839433.


Best trial: 7. Best value: 0.183261:  35%|███▌      | 7/20 [00:00<00:00, 39.02it/s]

[I 2025-06-25 16:37:03,901] Trial 1 finished with value: 0.1924042412264549 and parameters: {'C': 63.512210106407046}. Best is trial 1 with value: 0.1924042412264549.
[I 2025-06-25 16:37:03,930] Trial 2 finished with value: 0.18620053322128527 and parameters: {'C': 8.471801418819979}. Best is trial 2 with value: 0.18620053322128527.
[I 2025-06-25 16:37:03,949] Trial 3 finished with value: 0.21830131127974442 and parameters: {'C': 2.481040974867813}. Best is trial 2 with value: 0.18620053322128527.
[I 2025-06-25 16:37:03,964] Trial 4 finished with value: 0.5133252839930367 and parameters: {'C': 0.04207988669606638}. Best is trial 2 with value: 0.18620053322128527.
[I 2025-06-25 16:37:03,980] Trial 5 finished with value: 0.5133373460531051 and parameters: {'C': 0.042070539502879395}. Best is trial 2 with value: 0.18620053322128527.
[I 2025-06-25 16:37:03,994] Trial 6 finished with value: 0.5502256967712097 and parameters: {'C': 0.017073967431528128}. Best is trial 2 with value: 0.1862005

Best trial: 7. Best value: 0.183261:  45%|████▌     | 9/20 [00:00<00:00, 45.26it/s]

[I 2025-06-25 16:37:04,050] Trial 8 finished with value: 0.21739142624844643 and parameters: {'C': 2.5378155082656657}. Best is trial 7 with value: 0.18326064109811352.
[I 2025-06-25 16:37:04,066] Trial 9 finished with value: 0.18956593831523136 and parameters: {'C': 6.79657809075816}. Best is trial 7 with value: 0.18326064109811352.


Best trial: 13. Best value: 0.181403:  70%|███████   | 14/20 [00:00<00:00, 36.75it/s]

[I 2025-06-25 16:37:04,102] Trial 10 finished with value: 0.19640185578666278 and parameters: {'C': 82.29631658321766}. Best is trial 7 with value: 0.18326064109811352.
[I 2025-06-25 16:37:04,133] Trial 11 finished with value: 0.18159010722748264 and parameters: {'C': 20.996451336399733}. Best is trial 11 with value: 0.18159010722748264.
[I 2025-06-25 16:37:04,166] Trial 12 finished with value: 0.18459080131066244 and parameters: {'C': 34.043053954671954}. Best is trial 11 with value: 0.18159010722748264.
[I 2025-06-25 16:37:04,214] Trial 13 finished with value: 0.18140298496485857 and parameters: {'C': 17.14446414712941}. Best is trial 13 with value: 0.18140298496485857.
[I 2025-06-25 16:37:04,230] Trial 14 finished with value: 0.33732918864876865 and parameters: {'C': 0.3880040916855655}. Best is trial 13 with value: 0.18140298496485857.


Best trial: 13. Best value: 0.181403:  75%|███████▌  | 15/20 [00:00<00:00, 36.75it/s]

[I 2025-06-25 16:37:04,249] Trial 15 finished with value: 0.1836636745703452 and parameters: {'C': 10.66261287572915}. Best is trial 13 with value: 0.18140298496485857.


Best trial: 17. Best value: 0.181396: 100%|██████████| 20/20 [00:00<00:00, 37.46it/s]
[I 2025-06-25 16:37:04,381] A new study created in memory with name: no-name-d5979086-7a6c-42d3-8727-8792f14a1ab5


[I 2025-06-25 16:37:04,285] Trial 16 finished with value: 0.27098679035802226 and parameters: {'C': 0.9253254956089687}. Best is trial 13 with value: 0.18140298496485857.
[I 2025-06-25 16:37:04,316] Trial 17 finished with value: 0.18139642209754364 and parameters: {'C': 18.63934843098346}. Best is trial 17 with value: 0.18139642209754364.
[I 2025-06-25 16:37:04,333] Trial 18 finished with value: 0.20862394501160347 and parameters: {'C': 3.217618717445632}. Best is trial 17 with value: 0.18139642209754364.
[I 2025-06-25 16:37:04,378] Trial 19 finished with value: 0.19892671588852887 and parameters: {'C': 96.04396902719638}. Best is trial 17 with value: 0.18139642209754364.
✓ LogisticRegression_TFIDF completado. Mejor LogLoss: 0.1814

--- Optimizando XGBoost_Combined (Nivel 1) ---


Best trial: 0. Best value: 0.238061:   4%|▍         | 1/25 [00:02<00:58,  2.45s/it]

[I 2025-06-25 16:37:06,829] Trial 0 finished with value: 0.23806062977392978 and parameters: {'n_estimators': 437, 'learning_rate': 0.2536999076681772, 'max_depth': 8}. Best is trial 0 with value: 0.23806062977392978.


Best trial: 1. Best value: 0.215217:   8%|▊         | 2/25 [00:08<01:45,  4.59s/it]

[I 2025-06-25 16:37:12,920] Trial 1 finished with value: 0.2152167853848329 and parameters: {'n_estimators': 639, 'learning_rate': 0.01700037298921102, 'max_depth': 4}. Best is trial 1 with value: 0.2152167853848329.


Best trial: 1. Best value: 0.215217:  12%|█▏        | 3/25 [00:11<01:20,  3.67s/it]

[I 2025-06-25 16:37:15,490] Trial 2 finished with value: 0.22039458418520458 and parameters: {'n_estimators': 152, 'learning_rate': 0.19030368381735815, 'max_depth': 7}. Best is trial 1 with value: 0.2152167853848329.


Best trial: 1. Best value: 0.215217:  16%|█▌        | 4/25 [00:47<05:50, 16.67s/it]

[I 2025-06-25 16:37:52,096] Trial 3 finished with value: 0.2219276950840297 and parameters: {'n_estimators': 737, 'learning_rate': 0.010725209743171996, 'max_depth': 9}. Best is trial 1 with value: 0.2152167853848329.


Best trial: 4. Best value: 0.200172:  20%|██        | 5/25 [00:57<04:40, 14.01s/it]

[I 2025-06-25 16:38:01,379] Trial 4 finished with value: 0.20017217889850955 and parameters: {'n_estimators': 850, 'learning_rate': 0.020589728197687916, 'max_depth': 4}. Best is trial 4 with value: 0.20017217889850955.


Best trial: 4. Best value: 0.200172:  24%|██▍       | 6/25 [01:03<03:38, 11.48s/it]

[I 2025-06-25 16:38:07,948] Trial 5 finished with value: 0.2176488521371926 and parameters: {'n_estimators': 265, 'learning_rate': 0.028145092716060652, 'max_depth': 6}. Best is trial 4 with value: 0.20017217889850955.


Best trial: 4. Best value: 0.200172:  28%|██▊       | 7/25 [01:14<03:25, 11.42s/it]

[I 2025-06-25 16:38:19,263] Trial 6 finished with value: 0.2109122392871085 and parameters: {'n_estimators': 489, 'learning_rate': 0.02692655251486473, 'max_depth': 7}. Best is trial 4 with value: 0.20017217889850955.


Best trial: 4. Best value: 0.200172:  32%|███▏      | 8/25 [01:18<02:31,  8.92s/it]

[I 2025-06-25 16:38:22,824] Trial 7 finished with value: 0.23264551260046573 and parameters: {'n_estimators': 225, 'learning_rate': 0.027010527749605478, 'max_depth': 5}. Best is trial 4 with value: 0.20017217889850955.


Best trial: 8. Best value: 0.198779:  36%|███▌      | 9/25 [01:20<01:46,  6.64s/it]

[I 2025-06-25 16:38:24,457] Trial 8 finished with value: 0.19877930515768735 and parameters: {'n_estimators': 510, 'learning_rate': 0.14447746112718687, 'max_depth': 4}. Best is trial 8 with value: 0.19877930515768735.


Best trial: 9. Best value: 0.192414:  40%|████      | 10/25 [01:22<01:20,  5.39s/it]

[I 2025-06-25 16:38:27,058] Trial 9 finished with value: 0.1924141040159233 and parameters: {'n_estimators': 563, 'learning_rate': 0.07500118950416987, 'max_depth': 3}. Best is trial 9 with value: 0.1924141040159233.


Best trial: 9. Best value: 0.192414:  44%|████▍     | 11/25 [01:24<01:01,  4.36s/it]

[I 2025-06-25 16:38:29,062] Trial 10 finished with value: 0.19575935587529228 and parameters: {'n_estimators': 951, 'learning_rate': 0.09121222976475842, 'max_depth': 3}. Best is trial 9 with value: 0.1924141040159233.


Best trial: 11. Best value: 0.188783:  48%|████▊     | 12/25 [01:27<00:49,  3.81s/it]

[I 2025-06-25 16:38:31,618] Trial 11 finished with value: 0.1887826456714964 and parameters: {'n_estimators': 939, 'learning_rate': 0.08362963273390475, 'max_depth': 3}. Best is trial 11 with value: 0.1887826456714964.


Best trial: 11. Best value: 0.188783:  52%|█████▏    | 13/25 [01:29<00:41,  3.44s/it]

[I 2025-06-25 16:38:34,210] Trial 12 finished with value: 0.1996727889664943 and parameters: {'n_estimators': 981, 'learning_rate': 0.062256311832958315, 'max_depth': 3}. Best is trial 11 with value: 0.1887826456714964.


Best trial: 11. Best value: 0.188783:  56%|█████▌    | 14/25 [01:32<00:34,  3.16s/it]

[I 2025-06-25 16:38:36,724] Trial 13 finished with value: 0.20248851128563558 and parameters: {'n_estimators': 744, 'learning_rate': 0.05615239504587521, 'max_depth': 3}. Best is trial 11 with value: 0.1887826456714964.


Best trial: 11. Best value: 0.188783:  60%|██████    | 15/25 [01:34<00:29,  2.91s/it]

[I 2025-06-25 16:38:39,067] Trial 14 finished with value: 0.20818913037045678 and parameters: {'n_estimators': 357, 'learning_rate': 0.09633960914309062, 'max_depth': 5}. Best is trial 11 with value: 0.1887826456714964.


Best trial: 11. Best value: 0.188783:  64%|██████▍   | 16/25 [01:37<00:26,  2.91s/it]

[I 2025-06-25 16:38:41,955] Trial 15 finished with value: 0.20274673577591695 and parameters: {'n_estimators': 650, 'learning_rate': 0.09691282898930703, 'max_depth': 5}. Best is trial 11 with value: 0.1887826456714964.


Best trial: 11. Best value: 0.188783:  68%|██████▊   | 17/25 [01:40<00:23,  2.92s/it]

[I 2025-06-25 16:38:44,903] Trial 16 finished with value: 0.20170497683898606 and parameters: {'n_estimators': 841, 'learning_rate': 0.044130015771797246, 'max_depth': 3}. Best is trial 11 with value: 0.1887826456714964.


Best trial: 11. Best value: 0.188783:  72%|███████▏  | 18/25 [01:42<00:18,  2.60s/it]

[I 2025-06-25 16:38:46,770] Trial 17 finished with value: 0.20262266862271097 and parameters: {'n_estimators': 631, 'learning_rate': 0.13798805957047916, 'max_depth': 4}. Best is trial 11 with value: 0.1887826456714964.


Best trial: 11. Best value: 0.188783:  76%|███████▌  | 19/25 [01:48<00:22,  3.70s/it]

[I 2025-06-25 16:38:53,020] Trial 18 finished with value: 0.2110039713943525 and parameters: {'n_estimators': 870, 'learning_rate': 0.042134314181247204, 'max_depth': 6}. Best is trial 11 with value: 0.1887826456714964.


Best trial: 11. Best value: 0.188783:  80%|████████  | 20/25 [01:51<00:17,  3.42s/it]

[I 2025-06-25 16:38:55,780] Trial 19 finished with value: 0.2076008756702082 and parameters: {'n_estimators': 365, 'learning_rate': 0.07422036921265081, 'max_depth': 5}. Best is trial 11 with value: 0.1887826456714964.


Best trial: 11. Best value: 0.188783:  84%|████████▍ | 21/25 [01:52<00:11,  2.80s/it]

[I 2025-06-25 16:38:57,154] Trial 20 finished with value: 0.19890435176532065 and parameters: {'n_estimators': 758, 'learning_rate': 0.14873707144352227, 'max_depth': 3}. Best is trial 11 with value: 0.1887826456714964.


Best trial: 11. Best value: 0.188783:  88%|████████▊ | 22/25 [01:54<00:07,  2.59s/it]

[I 2025-06-25 16:38:59,233] Trial 21 finished with value: 0.1965137758164625 and parameters: {'n_estimators': 992, 'learning_rate': 0.08915665880056885, 'max_depth': 3}. Best is trial 11 with value: 0.1887826456714964.


Best trial: 11. Best value: 0.188783:  92%|█████████▏| 23/25 [01:57<00:04,  2.47s/it]

[I 2025-06-25 16:39:01,436] Trial 22 finished with value: 0.2025482545298662 and parameters: {'n_estimators': 941, 'learning_rate': 0.11536652129866494, 'max_depth': 4}. Best is trial 11 with value: 0.1887826456714964.


Best trial: 11. Best value: 0.188783:  96%|█████████▌| 24/25 [02:00<00:02,  2.73s/it]

[I 2025-06-25 16:39:04,773] Trial 23 finished with value: 0.198388138823426 and parameters: {'n_estimators': 892, 'learning_rate': 0.044014594850777786, 'max_depth': 3}. Best is trial 11 with value: 0.1887826456714964.


Best trial: 11. Best value: 0.188783: 100%|██████████| 25/25 [02:01<00:00,  4.88s/it]
[I 2025-06-25 16:39:06,288] A new study created in memory with name: no-name-545ae309-071e-4852-a58b-f31c81c51e89


[I 2025-06-25 16:39:06,283] Trial 24 finished with value: 0.2033889675348963 and parameters: {'n_estimators': 792, 'learning_rate': 0.21370406661903554, 'max_depth': 4}. Best is trial 11 with value: 0.1887826456714964.
✓ XGBoost_Combined completado. Mejor LogLoss: 0.1888

--- Optimizando MLP_PyTorch_Combined (Nivel 1) ---


Best trial: 0. Best value: 0.351336:   3%|▎         | 1/30 [00:01<00:54,  1.88s/it]

[I 2025-06-25 16:39:08,157] Trial 0 finished with value: 0.35133615136146545 and parameters: {'n_layers': 2, 'n_units_l0': 245, 'n_units_l1': 196, 'dropout_rate': 0.3394633936788146, 'optimizer': 'Adam', 'lr': 1.493656855461762e-05, 'activation': 'ReLU'}. Best is trial 0 with value: 0.35133615136146545.


Best trial: 0. Best value: 0.351336:   7%|▋         | 2/30 [00:03<00:54,  1.93s/it]

[I 2025-06-25 16:39:10,140] Trial 1 finished with value: 0.38641178607940674 and parameters: {'n_layers': 3, 'n_units_l0': 36, 'n_units_l1': 250, 'n_units_l2': 219, 'dropout_rate': 0.18493564427131048, 'optimizer': 'RMSprop', 'lr': 8.17949947521167e-05, 'activation': 'ReLU'}. Best is trial 0 with value: 0.35133615136146545.


Best trial: 0. Best value: 0.351336:  10%|█         | 3/30 [00:05<00:48,  1.81s/it]

[I 2025-06-25 16:39:11,810] Trial 2 finished with value: 0.35773414373397827 and parameters: {'n_layers': 1, 'n_units_l0': 169, 'dropout_rate': 0.15579754426081674, 'optimizer': 'RMSprop', 'lr': 0.00023345864076016249, 'activation': 'ReLU'}. Best is trial 0 with value: 0.35133615136146545.


Best trial: 0. Best value: 0.351336:  13%|█▎        | 4/30 [00:07<00:47,  1.82s/it]

[I 2025-06-25 16:39:13,648] Trial 3 finished with value: 0.9014737606048584 and parameters: {'n_layers': 2, 'n_units_l0': 165, 'n_units_l1': 42, 'dropout_rate': 0.34301794076057535, 'optimizer': 'Adam', 'lr': 0.007025166339242158, 'activation': 'ReLU'}. Best is trial 0 with value: 0.35133615136146545.


Best trial: 4. Best value: 0.308533:  17%|█▋        | 5/30 [00:09<00:44,  1.77s/it]

[I 2025-06-25 16:39:15,336] Trial 4 finished with value: 0.3085332214832306 and parameters: {'n_layers': 1, 'n_units_l0': 53, 'dropout_rate': 0.3736932106048628, 'optimizer': 'Adam', 'lr': 0.0003058656666978527, 'activation': 'Tanh'}. Best is trial 4 with value: 0.3085332214832306.


Best trial: 5. Best value: 0.299605:  20%|██        | 6/30 [00:10<00:41,  1.74s/it]

[I 2025-06-25 16:39:16,993] Trial 5 finished with value: 0.2996053695678711 and parameters: {'n_layers': 1, 'n_units_l0': 181, 'dropout_rate': 0.2246844304357644, 'optimizer': 'RMSprop', 'lr': 3.585612610345396e-05, 'activation': 'ReLU'}. Best is trial 5 with value: 0.2996053695678711.


Best trial: 5. Best value: 0.299605:  23%|██▎       | 7/30 [00:12<00:42,  1.84s/it]

[I 2025-06-25 16:39:19,042] Trial 6 pruned. 


Best trial: 5. Best value: 0.299605:  27%|██▋       | 8/30 [00:14<00:42,  1.91s/it]

[I 2025-06-25 16:39:21,116] Trial 7 pruned. 


Best trial: 5. Best value: 0.299605:  30%|███       | 9/30 [00:16<00:38,  1.84s/it]

[I 2025-06-25 16:39:22,790] Trial 8 finished with value: 0.34474679827690125 and parameters: {'n_layers': 1, 'n_units_l0': 215, 'dropout_rate': 0.38274293753904687, 'optimizer': 'RMSprop', 'lr': 1.667761543019792e-05, 'activation': 'ReLU'}. Best is trial 5 with value: 0.2996053695678711.


Best trial: 5. Best value: 0.299605:  33%|███▎      | 10/30 [00:18<00:38,  1.93s/it]

[I 2025-06-25 16:39:24,922] Trial 9 pruned. 


Best trial: 5. Best value: 0.299605:  37%|███▋      | 11/30 [00:20<00:35,  1.88s/it]

[I 2025-06-25 16:39:26,703] Trial 10 pruned. 


Best trial: 5. Best value: 0.299605:  40%|████      | 12/30 [00:22<00:33,  1.86s/it]

[I 2025-06-25 16:39:28,518] Trial 11 pruned. 


Best trial: 5. Best value: 0.299605:  43%|████▎     | 13/30 [00:24<00:32,  1.90s/it]

[I 2025-06-25 16:39:30,496] Trial 12 pruned. 


Best trial: 5. Best value: 0.299605:  47%|████▋     | 14/30 [00:26<00:30,  1.89s/it]

[I 2025-06-25 16:39:32,366] Trial 13 pruned. 


Best trial: 5. Best value: 0.299605:  50%|█████     | 15/30 [00:27<00:28,  1.87s/it]

[I 2025-06-25 16:39:34,193] Trial 14 finished with value: 0.3318484425544739 and parameters: {'n_layers': 1, 'n_units_l0': 201, 'dropout_rate': 0.4505764863334542, 'optimizer': 'Adam', 'lr': 0.00023422328621630747, 'activation': 'Tanh'}. Best is trial 5 with value: 0.2996053695678711.


Best trial: 5. Best value: 0.299605:  53%|█████▎    | 16/30 [00:29<00:26,  1.90s/it]

[I 2025-06-25 16:39:36,166] Trial 15 finished with value: 0.3092837929725647 and parameters: {'n_layers': 2, 'n_units_l0': 136, 'n_units_l1': 46, 'dropout_rate': 0.29883415744434105, 'optimizer': 'RMSprop', 'lr': 3.416246922046104e-05, 'activation': 'Tanh'}. Best is trial 5 with value: 0.2996053695678711.


Best trial: 5. Best value: 0.299605:  57%|█████▋    | 17/30 [00:31<00:24,  1.86s/it]

[I 2025-06-25 16:39:37,921] Trial 16 finished with value: 0.3017366826534271 and parameters: {'n_layers': 1, 'n_units_l0': 76, 'dropout_rate': 0.24695155488800646, 'optimizer': 'RMSprop', 'lr': 0.00016006854792293955, 'activation': 'Tanh'}. Best is trial 5 with value: 0.2996053695678711.


Best trial: 5. Best value: 0.299605:  60%|██████    | 18/30 [00:33<00:22,  1.88s/it]

[I 2025-06-25 16:39:39,840] Trial 17 finished with value: 0.30398839712142944 and parameters: {'n_layers': 2, 'n_units_l0': 144, 'n_units_l1': 208, 'dropout_rate': 0.10263141366114917, 'optimizer': 'RMSprop', 'lr': 3.834304470038068e-05, 'activation': 'Tanh'}. Best is trial 5 with value: 0.2996053695678711.


Best trial: 18. Best value: 0.291567:  63%|██████▎   | 19/30 [00:35<00:20,  1.85s/it]

[I 2025-06-25 16:39:41,628] Trial 18 finished with value: 0.2915670871734619 and parameters: {'n_layers': 1, 'n_units_l0': 80, 'dropout_rate': 0.24339194901610642, 'optimizer': 'RMSprop', 'lr': 0.00015175432368088785, 'activation': 'Tanh'}. Best is trial 18 with value: 0.2915670871734619.


Best trial: 18. Best value: 0.291567:  67%|██████▋   | 20/30 [00:37<00:18,  1.89s/it]

[I 2025-06-25 16:39:43,613] Trial 19 finished with value: 0.29930850863456726 and parameters: {'n_layers': 2, 'n_units_l0': 196, 'n_units_l1': 127, 'dropout_rate': 0.20874256619525425, 'optimizer': 'RMSprop', 'lr': 2.883008884726061e-05, 'activation': 'ReLU'}. Best is trial 18 with value: 0.2915670871734619.


Best trial: 18. Best value: 0.291567:  70%|███████   | 21/30 [00:39<00:17,  1.92s/it]

[I 2025-06-25 16:39:45,594] Trial 20 pruned. 


Best trial: 18. Best value: 0.291567:  73%|███████▎  | 22/30 [00:41<00:15,  1.88s/it]

[I 2025-06-25 16:39:47,382] Trial 21 pruned. 


Best trial: 22. Best value: 0.291555:  77%|███████▋  | 23/30 [00:43<00:13,  1.90s/it]

[I 2025-06-25 16:39:49,344] Trial 22 finished with value: 0.2915545701980591 and parameters: {'n_layers': 2, 'n_units_l0': 229, 'n_units_l1': 140, 'dropout_rate': 0.26390501487745355, 'optimizer': 'RMSprop', 'lr': 3.7302960386200595e-05, 'activation': 'ReLU'}. Best is trial 22 with value: 0.2915545701980591.


Best trial: 22. Best value: 0.291555:  80%|████████  | 24/30 [00:45<00:11,  1.92s/it]

[I 2025-06-25 16:39:51,308] Trial 23 pruned. 


Best trial: 24. Best value: 0.2852:  83%|████████▎ | 25/30 [00:46<00:09,  1.93s/it]  

[I 2025-06-25 16:39:53,247] Trial 24 finished with value: 0.2851995527744293 and parameters: {'n_layers': 2, 'n_units_l0': 234, 'n_units_l1': 82, 'dropout_rate': 0.26166750169446507, 'optimizer': 'RMSprop', 'lr': 4.722872272988321e-05, 'activation': 'ReLU'}. Best is trial 24 with value: 0.2851995527744293.


Best trial: 24. Best value: 0.2852:  87%|████████▋ | 26/30 [00:49<00:08,  2.00s/it]

[I 2025-06-25 16:39:55,422] Trial 25 finished with value: 0.2942093312740326 and parameters: {'n_layers': 3, 'n_units_l0': 227, 'n_units_l1': 68, 'n_units_l2': 38, 'dropout_rate': 0.31588279963806015, 'optimizer': 'RMSprop', 'lr': 5.4483382921098096e-05, 'activation': 'ReLU'}. Best is trial 24 with value: 0.2851995527744293.


Best trial: 24. Best value: 0.2852:  90%|█████████ | 27/30 [00:51<00:05,  1.99s/it]

[I 2025-06-25 16:39:57,389] Trial 26 pruned. 


Best trial: 24. Best value: 0.2852:  93%|█████████▎| 28/30 [00:53<00:03,  2.00s/it]

[I 2025-06-25 16:39:59,386] Trial 27 finished with value: 0.29413074254989624 and parameters: {'n_layers': 2, 'n_units_l0': 86, 'n_units_l1': 160, 'dropout_rate': 0.30097273911499556, 'optimizer': 'RMSprop', 'lr': 5.062232169527993e-05, 'activation': 'ReLU'}. Best is trial 24 with value: 0.2851995527744293.


Best trial: 24. Best value: 0.2852:  97%|█████████▋| 29/30 [00:55<00:02,  2.07s/it]

[I 2025-06-25 16:40:01,621] Trial 28 pruned. 


Best trial: 24. Best value: 0.2852: 100%|██████████| 30/30 [00:57<00:00,  1.91s/it]

[I 2025-06-25 16:40:03,640] Trial 29 pruned. 
✓ MLP_PyTorch_Combined completado. Mejor LogLoss: 0.2852





In [9]:
main_classifier_models = {} 
print("--- Entrenando modelos finales con los mejores hiperparámetros ---\n")

# 1. XGBoost (Embeddings)
params = model_results['XGBoost_Embeddings']['best_params']
final_xgb_emb = xgb.XGBClassifier(objective='binary:logistic', eval_metric='logloss',
                                  device='cuda' if device.type == 'cuda' else 'cpu', **params)
final_xgb_emb.fit(X_train_emb, y_train)
main_classifier_models['XGBoost_Embeddings'] = final_xgb_emb
print("✓ Modelo XGBoost (Embeddings) final entrenado.")

# 2. XGBoost (Combined)
params = model_results['XGBoost_Combined']['best_params']
final_xgb_comb = xgb.XGBClassifier(objective='binary:logistic', eval_metric='logloss',
                                   device='cuda' if device.type == 'cuda' else 'cpu', **params)
final_xgb_comb.fit(X_train_combined, y_train)
main_classifier_models['XGBoost_Combined'] = final_xgb_comb
print("✓ Modelo XGBoost (Combined) final entrenado.")

# 3. Regresión Logística (Embeddings)
params = model_results['LogisticRegression_Embeddings']['best_params']
final_log_emb = LogisticRegression(solver='liblinear', random_state=42, max_iter=1000, **params)
final_log_emb.fit(X_train_emb_scaled, y_train)
main_classifier_models['LogisticRegression_Embeddings'] = final_log_emb
print("✓ Modelo Regresión Logística (Embeddings) final entrenado.")

# 4. Regresión Logística (TF-IDF)
params = model_results['LogisticRegression_TFIDF']['best_params']
final_log_tfidf = LogisticRegression(solver='liblinear', random_state=42, max_iter=1000, **params)
final_log_tfidf.fit(X_train_tfidf, y_train)
main_classifier_models['LogisticRegression_TFIDF'] = final_log_tfidf
print("✓ Modelo Regresión Logística (TF-IDF) final entrenado.")

# 5. MLP (Embeddings)
params = model_results['MLP_PyTorch_Embeddings']['best_params']
hidden_layers = [params[f'n_units_l{i}'] for i in range(params['n_layers'])]
final_mlp_emb = MLP(X_train_emb_scaled.shape[1], hidden_layers, num_classes, 
                    getattr(nn, params['activation']), params['dropout_rate']).to(device)
optimizer = getattr(optim, params['optimizer'])(final_mlp_emb.parameters(), lr=params['lr'])
train_loader = DataLoader(TensorDataset(torch.tensor(X_train_emb_scaled, dtype=torch.float32), torch.tensor(y_train, dtype=torch.long)), batch_size=128, shuffle=True)
for epoch in tqdm(range(30), desc="Epochs MLP (Embeddings) final"):
    final_mlp_emb.train()
    for data, target in train_loader: 
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        loss = nn.CrossEntropyLoss()(final_mlp_emb(data), target)
        loss.backward()
        optimizer.step()
main_classifier_models['MLP_PyTorch_Embeddings'] = final_mlp_emb.eval()
print("✓ Modelo MLP (Embeddings) final entrenado.")

# 6. MLP (Combined)
params = model_results['MLP_PyTorch_Combined']['best_params']
hidden_layers = [params[f'n_units_l{i}'] for i in range(params['n_layers'])]
final_mlp_comb = MLP(X_train_combined_dense.shape[1], hidden_layers, num_classes, 
                     getattr(nn, params['activation']), params['dropout_rate']).to(device)
optimizer = getattr(optim, params['optimizer'])(final_mlp_comb.parameters(), lr=params['lr'])
train_loader = DataLoader(TensorDataset(torch.tensor(X_train_combined_dense, dtype=torch.float32), torch.tensor(y_train, dtype=torch.long)), batch_size=128, shuffle=True)
for epoch in tqdm(range(30), desc="Epochs MLP (Combined) final"):
    final_mlp_comb.train()
    for data, target in train_loader: 
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        loss = nn.CrossEntropyLoss()(final_mlp_comb(data), target)
        loss.backward()
        optimizer.step()
main_classifier_models['MLP_PyTorch_Combined'] = final_mlp_comb.eval()
print("✓ Modelo MLP (Combined) final entrenado.")

print("\n✓ Todos los modelos finales han sido entrenados.")

--- Entrenando modelos finales con los mejores hiperparámetros ---

✓ Modelo XGBoost (Embeddings) final entrenado.
✓ Modelo XGBoost (Combined) final entrenado.
✓ Modelo Regresión Logística (Embeddings) final entrenado.
✓ Modelo Regresión Logística (TF-IDF) final entrenado.


Epochs MLP (Embeddings) final: 100%|██████████| 30/30 [00:01<00:00, 16.17it/s]


✓ Modelo MLP (Embeddings) final entrenado.


Epochs MLP (Combined) final: 100%|██████████| 30/30 [00:02<00:00, 11.42it/s]

✓ Modelo MLP (Combined) final entrenado.

✓ Todos los modelos finales han sido entrenados.





In [10]:
print("--- Calculando pesos para el Ensemble del Clasificador Principal (Nivel 1) ---")
val_probas = {}

# Obtener predicciones de cada modelo en el set de validación
val_probas['XGBoost_Embeddings'] = main_classifier_models['XGBoost_Embeddings'].predict_proba(X_val_emb)
val_probas['XGBoost_Combined'] = main_classifier_models['XGBoost_Combined'].predict_proba(X_val_combined)
val_probas['LogisticRegression_Embeddings'] = main_classifier_models['LogisticRegression_Embeddings'].predict_proba(X_val_emb_scaled)
val_probas['LogisticRegression_TFIDF'] = main_classifier_models['LogisticRegression_TFIDF'].predict_proba(X_val_tfidf)
with torch.no_grad():
    # MLP Embeddings
    mlp_outputs_emb = main_classifier_models['MLP_PyTorch_Embeddings'](X_val_torch_emb)
    val_probas['MLP_PyTorch_Embeddings'] = torch.softmax(mlp_outputs_emb, dim=1).cpu().numpy()
    # MLP Combined
    mlp_outputs_comb = main_classifier_models['MLP_PyTorch_Combined'](X_val_torch_combined)
    val_probas['MLP_PyTorch_Combined'] = torch.softmax(mlp_outputs_comb, dim=1).cpu().numpy()

# Calcular métricas y pesos del ensemble (mayor peso a menor log_loss)
losses = {name: log_loss(y_val, proba) for name, proba in val_probas.items()}
scores = {name: 1.0 / (loss + 1e-9) for name, loss in losses.items()}
total_score = sum(scores.values())
ensemble_weights = {name: score / total_score for name, score in scores.items()}

print("\n--- Pesos del Ensemble de Nivel 1 Calculados ---")
for name, w in sorted(ensemble_weights.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<30} | Peso: {w:.3f} | LogLoss (Val): {losses[name]:.4f}")

# Evaluar el rendimiento del ensemble en el set de validación
ensemble_proba_val = np.zeros_like(val_probas['XGBoost_Embeddings'])
for name, proba in val_probas.items():
    ensemble_proba_val += proba * ensemble_weights[name]

ensemble_log_loss_val = log_loss(y_val, ensemble_proba_val)
print(f"\nLogLoss del Ensemble L1 en Validación: {ensemble_log_loss_val:.4f}")

--- Calculando pesos para el Ensemble del Clasificador Principal (Nivel 1) ---

--- Pesos del Ensemble de Nivel 1 Calculados ---
LogisticRegression_TFIDF       | Peso: 0.239 | LogLoss (Val): 0.1814
XGBoost_Combined               | Peso: 0.215 | LogLoss (Val): 0.2015
MLP_PyTorch_Combined           | Peso: 0.151 | LogLoss (Val): 0.2871
MLP_PyTorch_Embeddings         | Peso: 0.140 | LogLoss (Val): 0.3088
LogisticRegression_Embeddings  | Peso: 0.137 | LogLoss (Val): 0.3174
XGBoost_Embeddings             | Peso: 0.118 | LogLoss (Val): 0.3662

LogLoss del Ensemble L1 en Validación: 0.2067


## 6. Entrenamiento del Clasificador de Sub-categorías (Nivel 2) con Optuna y Ensemble

Ahora, aplicamos la misma metodología robusta al clasificador de Nivel 2. Este se entrenará **únicamente con los datos de 'odio' balanceados sintéticamente**. Crearemos un ensemble de tres modelos (XGBoost, MLP, Regresión Logística) usando **características combinadas de embeddings y TF-IDF**.

In [11]:
print("--- Preparando datos y definiendo objetivos para el Clasificador de Sub-categorías (Nivel 2) ---")

if X_train_sub_emb.shape[0] > 0:
    # 1. Preparar características TF-IDF para datos de Nivel 2
    # Datos reales de 'odio'
    real_hate_texts_train = df_train_hate['text_stemmed']
    X_train_sub_tfidf_real = tfidf_vectorizer.transform(real_hate_texts_train)
    # Datos sintéticos (vector de ceros, ya que no tienen texto)
    num_synthetic = len(df_synthetic)
    X_train_sub_tfidf_synthetic = csr_matrix((num_synthetic, X_train_sub_tfidf_real.shape[1]), dtype=np.float64)
    # Combinar TF-IDF de datos reales y sintéticos
    X_train_sub_tfidf = vstack([X_train_sub_tfidf_real, X_train_sub_tfidf_synthetic])
    
    # 2. Combinar Embeddings y TF-IDF para Nivel 2
    X_train_sub_combined = hstack([X_train_sub_emb, X_train_sub_tfidf]).tocsr()
    num_sub_classes = len(np.unique(y_train_sub))
    print(f"Datos combinados para Nivel 2 listos. Shape: {X_train_sub_combined.shape}, {num_sub_classes} sub-clases detectadas.")

    # 3. Dividir los datos COMBINADOS para HPO
    X_sub_train_comb, X_sub_val_comb, y_sub_train, y_sub_val = train_test_split(
        X_train_sub_combined, y_train_sub, test_size=0.25, random_state=42, stratify=y_train_sub
    )

    # 4. Escalar la parte de embeddings de los datos combinados para MLP y LogReg
    scaler_L2_emb = StandardScaler()
    # Extraer y escalar la parte de embeddings
    X_sub_train_emb_part = X_sub_train_comb[:, :X_train_sub_emb.shape[1]].toarray()
    X_sub_train_emb_part_scaled = scaler_L2_emb.fit_transform(X_sub_train_emb_part)
    X_sub_val_emb_part = X_sub_val_comb[:, :X_train_sub_emb.shape[1]].toarray()
    X_sub_val_emb_part_scaled = scaler_L2_emb.transform(X_sub_val_emb_part)
    # Re-combinar con la parte de TF-IDF (que no se escala)
    X_sub_train_scaled_comb_dense = np.hstack([X_sub_train_emb_part_scaled, X_sub_train_comb[:, X_train_sub_emb.shape[1]:].toarray()])
    X_sub_val_scaled_comb_dense = np.hstack([X_sub_val_emb_part_scaled, X_sub_val_comb[:, X_train_sub_emb.shape[1]:].toarray()])

    # 5. Convertir a tensores de PyTorch
    X_sub_val_torch = torch.tensor(X_sub_val_scaled_comb_dense, dtype=torch.float32).to(device)
    y_sub_val_torch = torch.tensor(y_sub_val, dtype=torch.long).to(device)

    # --- Funciones Objetivo para Optuna (Nivel 2, con datos combinados) ---
    def objective_xgboost_L2(trial):
        params = {'objective': 'multi:softprob', 'num_class': num_sub_classes, 'eval_metric': 'mlogloss', 'device': 'cuda', 
                  'n_estimators': trial.suggest_int('n_estimators', 100, 800), 'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.2, log=True), 'max_depth': trial.suggest_int('max_depth', 3, 8)}
        model = xgb.XGBClassifier(**params, early_stopping_rounds=10)
        model.fit(X_sub_train_comb, y_sub_train, eval_set=[(X_sub_val_comb, y_sub_val)], verbose=False)
        return log_loss(y_sub_val, model.predict_proba(X_sub_val_comb))

    def objective_logistic_L2(trial):
        params = {'C': trial.suggest_float('C', 1e-3, 1e2, log=True), 'solver': 'liblinear', 'max_iter': 1000, 'multi_class': 'ovr'}
        model = LogisticRegression(**params, random_state=42)
        # Se entrena con los datos densos (escalados en parte)
        model.fit(X_sub_train_scaled_comb_dense, y_sub_train)
        return log_loss(y_sub_val, model.predict_proba(X_sub_val_scaled_comb_dense))
        
    def objective_mlp_L2(trial):
        n_layers = trial.suggest_int('n_layers', 1, 2)
        hidden_layers = [trial.suggest_int(f'n_units_l{i}', 32, 128) for i in range(n_layers)]
        lr = trial.suggest_float('lr', 1e-4, 1e-2, log=True)
        
        model = MLP(X_sub_train_scaled_comb_dense.shape[1], hidden_layers, num_sub_classes, nn.ReLU, 0.3).to(device)
        optimizer = optim.Adam(model.parameters(), lr=lr)
        criterion = nn.CrossEntropyLoss()
        train_dataset = TensorDataset(torch.tensor(X_sub_train_scaled_comb_dense, dtype=torch.float32), torch.tensor(y_sub_train, dtype=torch.long))
        train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)

        for epoch in range(20):
            for data, target in train_loader:
                data, target = data.to(device), target.to(device)
                optimizer.zero_grad()
                loss = criterion(model(data), target)
                loss.backward()
                optimizer.step()

        model.eval()
        with torch.no_grad():
            val_loss = criterion(model(X_sub_val_torch), y_sub_val_torch).item()
        return val_loss
    print("Funciones objetivo para Nivel 2 definidas.")
else:
    print("No hay datos para preparar el Nivel 2.")

--- Preparando datos y definiendo objetivos para el Clasificador de Sub-categorías (Nivel 2) ---
Datos combinados para Nivel 2 listos. Shape: (2889, 10768), 9 sub-clases detectadas.
Funciones objetivo para Nivel 2 definidas.


In [12]:
if X_train_sub_emb.shape[0] > 0:
    # --- 1. Búsqueda de Hiperparámetros (HPO) para Nivel 2 ---
    models_config_L2 = {
        'XGBoost_L2': {'objective_func': objective_xgboost_L2, 'n_trials': 20},
        'MLP_PyTorch_L2': {'objective_func': objective_mlp_L2, 'n_trials': 25},
        'LogisticRegression_L2': {'objective_func': objective_logistic_L2, 'n_trials': 15}
    }
    model_results_L2 = {}
    for model_name, config in models_config_L2.items():
        print(f"\n--- Optimizando {model_name} (Nivel 2) ---")
        study = optuna.create_study(direction='minimize', sampler=optuna.samplers.TPESampler(seed=42))
        study.optimize(config['objective_func'], n_trials=config['n_trials'], show_progress_bar=True)
        model_results_L2[model_name] = {'best_params': study.best_params}

    # --- 2. Entrenamiento de los modelos finales del ensemble de Nivel 2 ---
    print("\n--- Entrenando modelos finales del Ensemble (Nivel 2) ---")
    sub_classifier_models = {}
    
    # Preparar datos completos de entrenamiento L2 (escalados y densos para MLP/LogReg)
    X_train_sub_emb_part_full = X_train_sub_combined[:, :X_train_sub_emb.shape[1]].toarray()
    X_train_sub_emb_part_full_scaled = scaler_L2_emb.transform(X_train_sub_emb_part_full)
    X_train_sub_full_scaled_dense = np.hstack([X_train_sub_emb_part_full_scaled, X_train_sub_combined[:, X_train_sub_emb.shape[1]:].toarray()])
    
    # XGBoost L2
    params = model_results_L2['XGBoost_L2']['best_params']
    final_xgb_L2 = xgb.XGBClassifier(objective='multi:softprob', num_class=num_sub_classes, eval_metric='mlogloss',
                                     device='cuda' if device.type == 'cuda' else 'cpu', **params)
    final_xgb_L2.fit(X_train_sub_combined, y_train_sub) # Entrenar con sparse
    sub_classifier_models['XGBoost_L2'] = final_xgb_L2
    
    # Logistic Regression L2
    params = model_results_L2['LogisticRegression_L2']['best_params']
    final_log_L2 = LogisticRegression(solver='liblinear', random_state=42, max_iter=1000, **params)
    final_log_L2.fit(X_train_sub_full_scaled_dense, y_train_sub) # Entrenar con denso escalado
    sub_classifier_models['LogisticRegression_L2'] = final_log_L2
    
    # MLP L2
    params = model_results_L2['MLP_PyTorch_L2']['best_params']
    hidden_layers = [params[f'n_units_l{i}'] for i in range(params['n_layers'])]
    final_mlp_L2 = MLP(X_train_sub_full_scaled_dense.shape[1], hidden_layers, num_sub_classes, nn.ReLU, 0.3).to(device)
    optimizer = optim.Adam(final_mlp_L2.parameters(), lr=params['lr'])
    train_dataset_L2 = TensorDataset(torch.tensor(X_train_sub_full_scaled_dense, dtype=torch.float32), torch.tensor(y_train_sub, dtype=torch.long))
    train_loader_L2 = DataLoader(train_dataset_L2, batch_size=64, shuffle=True)
    for epoch in tqdm(range(30), desc="Epochs MLP L2 final"):
        for data, target in train_loader_L2:
            data, target = data.to(device), target.to(device)
            optimizer.zero_grad()
            loss = nn.CrossEntropyLoss()(final_mlp_L2(data), target)
            loss.backward()
            optimizer.step()
    sub_classifier_models['MLP_PyTorch_L2'] = final_mlp_L2.eval()
    print("✓ Todos los modelos del ensemble de Nivel 2 han sido entrenados.")

    # --- 3. Cálculo de pesos para el ensemble de Nivel 2 ---
    print("\n--- Calculando pesos para el Ensemble de Nivel 2 ---")
    val_probas_L2 = {}
    val_probas_L2['XGBoost_L2'] = sub_classifier_models['XGBoost_L2'].predict_proba(X_sub_val_comb)
    val_probas_L2['LogisticRegression_L2'] = sub_classifier_models['LogisticRegression_L2'].predict_proba(X_sub_val_scaled_comb_dense)
    with torch.no_grad():
        mlp_outputs = sub_classifier_models['MLP_PyTorch_L2'](X_sub_val_torch)
        val_probas_L2['MLP_PyTorch_L2'] = torch.softmax(mlp_outputs, dim=1).cpu().numpy()

    losses_L2 = {name: log_loss(y_sub_val, proba, labels=np.unique(y_train_sub)) for name, proba in val_probas_L2.items()}
    scores_L2 = {name: 1.0 / (loss + 1e-9) for name, loss in losses_L2.items()}
    total_score_L2 = sum(scores_L2.values())
    ensemble_weights_L2 = {name: score / total_score_L2 for name, score in scores_L2.items()}

    print("\n--- Pesos del Ensemble de Nivel 2 Calculados ---")
    for name, w in sorted(ensemble_weights_L2.items(), key=lambda item: item[1], reverse=True):
        print(f"{name:<25} | Peso: {w:.3f} | LogLoss (Val): {losses_L2[name]:.4f}")
else:
    print("No hay datos para entrenar el clasificador de Nivel 2.")
    sub_classifier_models = None
    ensemble_weights_L2 = None

[I 2025-06-25 16:40:20,193] A new study created in memory with name: no-name-1f3db140-28de-4d51-9a39-c56c76bf70dc



--- Optimizando XGBoost_L2 (Nivel 2) ---


Best trial: 0. Best value: 1.7529:   5%|▌         | 1/20 [00:11<03:35, 11.32s/it]

[I 2025-06-25 16:40:31,509] Trial 0 finished with value: 1.7529011310597284 and parameters: {'n_estimators': 362, 'learning_rate': 0.17254716573280354, 'max_depth': 7}. Best is trial 0 with value: 1.7529011310597284.


Best trial: 1. Best value: 1.72887:  10%|█         | 2/20 [00:36<05:50, 19.46s/it]

[I 2025-06-25 16:40:56,662] Trial 1 finished with value: 1.7288703563114218 and parameters: {'n_estimators': 519, 'learning_rate': 0.015958237752949748, 'max_depth': 3}. Best is trial 1 with value: 1.7288703563114218.


Best trial: 1. Best value: 1.72887:  15%|█▌        | 3/20 [00:48<04:29, 15.86s/it]

[I 2025-06-25 16:41:08,249] Trial 2 finished with value: 1.741666548840014 and parameters: {'n_estimators': 140, 'learning_rate': 0.13394334706750485, 'max_depth': 6}. Best is trial 1 with value: 1.7288703563114218.


Best trial: 3. Best value: 1.72381:  20%|██        | 4/20 [04:48<27:55, 104.71s/it]

[I 2025-06-25 16:45:09,174] Trial 3 finished with value: 1.7238118693301696 and parameters: {'n_estimators': 596, 'learning_rate': 0.010636066512540286, 'max_depth': 8}. Best is trial 3 with value: 1.7238118693301696.


Best trial: 3. Best value: 1.72381:  25%|██▌       | 5/20 [05:25<19:59, 79.94s/it] 

[I 2025-06-25 16:45:45,198] Trial 4 finished with value: 1.7311920040959285 and parameters: {'n_estimators': 683, 'learning_rate': 0.018891200276189388, 'max_depth': 4}. Best is trial 3 with value: 1.7238118693301696.


Best trial: 3. Best value: 1.72381:  30%|███       | 6/20 [06:13<16:09, 69.26s/it]

[I 2025-06-25 16:46:33,729] Trial 5 finished with value: 1.7429196955231827 and parameters: {'n_estimators': 228, 'learning_rate': 0.024878734419814436, 'max_depth': 6}. Best is trial 3 with value: 1.7238118693301696.


Best trial: 3. Best value: 1.72381:  35%|███▌      | 7/20 [07:09<14:03, 64.90s/it]

[I 2025-06-25 16:47:29,655] Trial 6 finished with value: 1.7356701624974293 and parameters: {'n_estimators': 402, 'learning_rate': 0.023927528765580644, 'max_depth': 6}. Best is trial 3 with value: 1.7238118693301696.


Best trial: 3. Best value: 1.72381:  40%|████      | 8/20 [07:39<10:46, 53.89s/it]

[I 2025-06-25 16:47:59,965] Trial 7 finished with value: 1.7407388459102966 and parameters: {'n_estimators': 197, 'learning_rate': 0.023993242906812727, 'max_depth': 5}. Best is trial 3 with value: 1.7238118693301696.


Best trial: 3. Best value: 1.72381:  45%|████▌     | 9/20 [07:46<07:09, 39.08s/it]

[I 2025-06-25 16:48:06,472] Trial 8 finished with value: 1.73772514869139 and parameters: {'n_estimators': 419, 'learning_rate': 0.10508421338691762, 'max_depth': 4}. Best is trial 3 with value: 1.7238118693301696.


Best trial: 9. Best value: 1.70681:  50%|█████     | 10/20 [07:58<05:06, 30.66s/it]

[I 2025-06-25 16:48:18,280] Trial 9 finished with value: 1.7068101117162489 and parameters: {'n_estimators': 460, 'learning_rate': 0.05898602410432694, 'max_depth': 3}. Best is trial 9 with value: 1.7068101117162489.


Best trial: 9. Best value: 1.70681:  55%|█████▌    | 11/20 [08:07<03:37, 24.21s/it]

[I 2025-06-25 16:48:27,873] Trial 10 finished with value: 1.7081009835920338 and parameters: {'n_estimators': 762, 'learning_rate': 0.07008140236396194, 'max_depth': 3}. Best is trial 9 with value: 1.7068101117162489.


Best trial: 9. Best value: 1.70681:  60%|██████    | 12/20 [08:16<02:36, 19.53s/it]

[I 2025-06-25 16:48:36,708] Trial 11 finished with value: 1.717354075626965 and parameters: {'n_estimators': 753, 'learning_rate': 0.06492371490532975, 'max_depth': 3}. Best is trial 9 with value: 1.7068101117162489.


Best trial: 9. Best value: 1.70681:  65%|██████▌   | 13/20 [08:28<02:01, 17.35s/it]

[I 2025-06-25 16:48:49,033] Trial 12 finished with value: 1.729864773185282 and parameters: {'n_estimators': 785, 'learning_rate': 0.05334507682418605, 'max_depth': 4}. Best is trial 9 with value: 1.7068101117162489.


Best trial: 9. Best value: 1.70681:  70%|███████   | 14/20 [08:37<01:28, 14.75s/it]

[I 2025-06-25 16:48:57,779] Trial 13 finished with value: 1.718281725697804 and parameters: {'n_estimators': 599, 'learning_rate': 0.07671546368325825, 'max_depth': 3}. Best is trial 9 with value: 1.7068101117162489.


Best trial: 9. Best value: 1.70681:  75%|███████▌  | 15/20 [09:00<01:25, 17.13s/it]

[I 2025-06-25 16:49:20,434] Trial 14 finished with value: 1.739563930730314 and parameters: {'n_estimators': 300, 'learning_rate': 0.039466832647880305, 'max_depth': 5}. Best is trial 9 with value: 1.7068101117162489.


Best trial: 9. Best value: 1.70681:  80%|████████  | 16/20 [09:16<01:08, 17.00s/it]

[I 2025-06-25 16:49:37,129] Trial 15 finished with value: 1.714104551458531 and parameters: {'n_estimators': 528, 'learning_rate': 0.03594892609049163, 'max_depth': 3}. Best is trial 9 with value: 1.7068101117162489.


Best trial: 9. Best value: 1.70681:  85%|████████▌ | 17/20 [09:24<00:42, 14.22s/it]

[I 2025-06-25 16:49:44,871] Trial 16 finished with value: 1.7402938719470036 and parameters: {'n_estimators': 676, 'learning_rate': 0.08780373974611752, 'max_depth': 4}. Best is trial 9 with value: 1.7068101117162489.


Best trial: 9. Best value: 1.70681:  90%|█████████ | 18/20 [09:45<00:32, 16.19s/it]

[I 2025-06-25 16:50:05,652] Trial 17 finished with value: 1.7334118948448136 and parameters: {'n_estimators': 510, 'learning_rate': 0.04987876350633008, 'max_depth': 5}. Best is trial 9 with value: 1.7068101117162489.


Best trial: 9. Best value: 1.70681:  95%|█████████▌| 19/20 [09:49<00:12, 12.59s/it]

[I 2025-06-25 16:50:09,849] Trial 18 finished with value: 1.7262249141268433 and parameters: {'n_estimators': 338, 'learning_rate': 0.1174270882879059, 'max_depth': 3}. Best is trial 9 with value: 1.7068101117162489.


Best trial: 9. Best value: 1.70681: 100%|██████████| 20/20 [09:54<00:00, 29.73s/it]
[I 2025-06-25 16:50:14,863] A new study created in memory with name: no-name-874334bf-5a1b-4c04-ae80-23ed3083513a


[I 2025-06-25 16:50:14,857] Trial 19 finished with value: 1.7441533813542445 and parameters: {'n_estimators': 684, 'learning_rate': 0.1814111113522079, 'max_depth': 4}. Best is trial 9 with value: 1.7068101117162489.

--- Optimizando MLP_PyTorch_L2 (Nivel 2) ---


Best trial: 0. Best value: 3.69783:   4%|▍         | 1/25 [00:01<00:30,  1.26s/it]

[I 2025-06-25 16:50:16,118] Trial 0 finished with value: 3.697831869125366 and parameters: {'n_layers': 1, 'n_units_l0': 124, 'lr': 0.0029106359131330704}. Best is trial 0 with value: 3.697831869125366.


Best trial: 1. Best value: 1.71773:   8%|▊         | 2/25 [00:02<00:30,  1.33s/it]

[I 2025-06-25 16:50:17,506] Trial 1 finished with value: 1.717734456062317 and parameters: {'n_layers': 2, 'n_units_l0': 47, 'n_units_l1': 47, 'lr': 0.00013066739238053285}. Best is trial 1 with value: 1.717734456062317.


Best trial: 2. Best value: 1.6643:  12%|█▏        | 3/25 [00:04<00:29,  1.35s/it] 

[I 2025-06-25 16:50:18,877] Trial 2 finished with value: 1.6643002033233643 and parameters: {'n_layers': 2, 'n_units_l0': 90, 'n_units_l1': 100, 'lr': 0.00010994335574766199}. Best is trial 2 with value: 1.6643002033233643.


Best trial: 2. Best value: 1.6643:  16%|█▌        | 4/25 [00:05<00:28,  1.36s/it]

[I 2025-06-25 16:50:20,242] Trial 3 finished with value: 1.7571276426315308 and parameters: {'n_layers': 2, 'n_units_l0': 112, 'n_units_l1': 52, 'lr': 0.0002310201887845295}. Best is trial 2 with value: 1.6643002033233643.


Best trial: 2. Best value: 1.6643:  20%|██        | 5/25 [00:06<00:25,  1.30s/it]

[I 2025-06-25 16:50:21,435] Trial 4 finished with value: 2.418408155441284 and parameters: {'n_layers': 1, 'n_units_l0': 61, 'lr': 0.0011207606211860567}. Best is trial 2 with value: 1.6643002033233643.


Best trial: 2. Best value: 1.6643:  24%|██▍       | 6/25 [00:07<00:24,  1.27s/it]

[I 2025-06-25 16:50:22,649] Trial 5 finished with value: 2.695966958999634 and parameters: {'n_layers': 1, 'n_units_l0': 60, 'lr': 0.0016738085788752138}. Best is trial 2 with value: 1.6643002033233643.


Best trial: 2. Best value: 1.6643:  28%|██▊       | 7/25 [00:09<00:22,  1.26s/it]

[I 2025-06-25 16:50:23,899] Trial 6 finished with value: 1.891309142112732 and parameters: {'n_layers': 1, 'n_units_l0': 60, 'lr': 0.0005404103854647331}. Best is trial 2 with value: 1.6643002033233643.


Best trial: 2. Best value: 1.6643:  32%|███▏      | 8/25 [00:10<00:21,  1.24s/it]

[I 2025-06-25 16:50:25,105] Trial 7 finished with value: 1.688671588897705 and parameters: {'n_layers': 1, 'n_units_l0': 108, 'lr': 0.00025081156860452336}. Best is trial 2 with value: 1.6643002033233643.


Best trial: 2. Best value: 1.6643:  36%|███▌      | 9/25 [00:11<00:20,  1.28s/it]

[I 2025-06-25 16:50:26,444] Trial 8 finished with value: 3.2442564964294434 and parameters: {'n_layers': 2, 'n_units_l0': 89, 'n_units_l1': 36, 'lr': 0.0016409286730647919}. Best is trial 2 with value: 1.6643002033233643.


Best trial: 2. Best value: 1.6643:  40%|████      | 10/25 [00:12<00:18,  1.25s/it]

[I 2025-06-25 16:50:27,652] Trial 9 finished with value: 3.9542670249938965 and parameters: {'n_layers': 1, 'n_units_l0': 38, 'lr': 0.007902619549708232}. Best is trial 2 with value: 1.6643002033233643.


Best trial: 2. Best value: 1.6643:  44%|████▍     | 11/25 [00:14<00:17,  1.28s/it]

[I 2025-06-25 16:50:29,000] Trial 10 finished with value: 1.6659131050109863 and parameters: {'n_layers': 2, 'n_units_l0': 86, 'n_units_l1': 120, 'lr': 0.00010862348973937149}. Best is trial 2 with value: 1.6643002033233643.


Best trial: 11. Best value: 1.66148:  48%|████▊     | 12/25 [00:15<00:17,  1.32s/it]

[I 2025-06-25 16:50:30,398] Trial 11 finished with value: 1.661484718322754 and parameters: {'n_layers': 2, 'n_units_l0': 87, 'n_units_l1': 124, 'lr': 0.0001051379852350776}. Best is trial 11 with value: 1.661484718322754.


Best trial: 11. Best value: 1.66148:  52%|█████▏    | 13/25 [00:16<00:15,  1.32s/it]

[I 2025-06-25 16:50:31,732] Trial 12 finished with value: 2.406712055206299 and parameters: {'n_layers': 2, 'n_units_l0': 94, 'n_units_l1': 117, 'lr': 0.0004492445130341701}. Best is trial 11 with value: 1.661484718322754.


Best trial: 11. Best value: 1.66148:  56%|█████▌    | 14/25 [00:18<00:14,  1.34s/it]

[I 2025-06-25 16:50:33,108] Trial 13 finished with value: 1.7459715604782104 and parameters: {'n_layers': 2, 'n_units_l0': 74, 'n_units_l1': 92, 'lr': 0.00023587584846791236}. Best is trial 11 with value: 1.661484718322754.


Best trial: 14. Best value: 1.66096:  60%|██████    | 15/25 [00:19<00:13,  1.35s/it]

[I 2025-06-25 16:50:34,469] Trial 14 finished with value: 1.6609570980072021 and parameters: {'n_layers': 2, 'n_units_l0': 101, 'n_units_l1': 98, 'lr': 0.00010165624431986141}. Best is trial 14 with value: 1.6609570980072021.


Best trial: 14. Best value: 1.66096:  64%|██████▍   | 16/25 [00:20<00:12,  1.36s/it]

[I 2025-06-25 16:50:35,856] Trial 15 finished with value: 2.5887627601623535 and parameters: {'n_layers': 2, 'n_units_l0': 108, 'n_units_l1': 128, 'lr': 0.0005029515102335909}. Best is trial 14 with value: 1.6609570980072021.


Best trial: 14. Best value: 1.66096:  68%|██████▊   | 17/25 [00:22<00:10,  1.36s/it]

[I 2025-06-25 16:50:37,225] Trial 16 finished with value: 1.7284175157546997 and parameters: {'n_layers': 2, 'n_units_l0': 73, 'n_units_l1': 70, 'lr': 0.00019903453904446326}. Best is trial 14 with value: 1.6609570980072021.


Best trial: 14. Best value: 1.66096:  72%|███████▏  | 18/25 [00:23<00:09,  1.36s/it]

[I 2025-06-25 16:50:38,590] Trial 17 finished with value: 2.163639545440674 and parameters: {'n_layers': 2, 'n_units_l0': 98, 'n_units_l1': 102, 'lr': 0.00036306296540862033}. Best is trial 14 with value: 1.6609570980072021.


Best trial: 14. Best value: 1.66096:  76%|███████▌  | 19/25 [00:25<00:08,  1.36s/it]

[I 2025-06-25 16:50:39,941] Trial 18 finished with value: 2.914799690246582 and parameters: {'n_layers': 2, 'n_units_l0': 127, 'n_units_l1': 80, 'lr': 0.0008371623317801159}. Best is trial 14 with value: 1.6609570980072021.


Best trial: 14. Best value: 1.66096:  80%|████████  | 20/25 [00:26<00:06,  1.36s/it]

[I 2025-06-25 16:50:41,306] Trial 19 finished with value: 2.3352584838867188 and parameters: {'n_layers': 2, 'n_units_l0': 77, 'n_units_l1': 109, 'lr': 0.009818247569037463}. Best is trial 14 with value: 1.6609570980072021.


Best trial: 20. Best value: 1.65977:  84%|████████▍ | 21/25 [00:27<00:05,  1.36s/it]

[I 2025-06-25 16:50:42,649] Trial 20 finished with value: 1.6597704887390137 and parameters: {'n_layers': 2, 'n_units_l0': 101, 'n_units_l1': 86, 'lr': 0.0001509391812068579}. Best is trial 20 with value: 1.6597704887390137.


Best trial: 20. Best value: 1.65977:  88%|████████▊ | 22/25 [00:29<00:04,  1.36s/it]

[I 2025-06-25 16:50:44,016] Trial 21 finished with value: 1.6763966083526611 and parameters: {'n_layers': 2, 'n_units_l0': 102, 'n_units_l1': 87, 'lr': 0.00015494615433005285}. Best is trial 20 with value: 1.6597704887390137.


Best trial: 20. Best value: 1.65977:  92%|█████████▏| 23/25 [00:30<00:02,  1.36s/it]

[I 2025-06-25 16:50:45,373] Trial 22 finished with value: 1.692805528640747 and parameters: {'n_layers': 2, 'n_units_l0': 119, 'n_units_l1': 74, 'lr': 0.000159324044920702}. Best is trial 20 with value: 1.6597704887390137.


Best trial: 20. Best value: 1.65977:  96%|█████████▌| 24/25 [00:31<00:01,  1.36s/it]

[I 2025-06-25 16:50:46,716] Trial 23 finished with value: 1.6768012046813965 and parameters: {'n_layers': 2, 'n_units_l0': 82, 'n_units_l1': 63, 'lr': 0.00010178729223738477}. Best is trial 20 with value: 1.6597704887390137.


Best trial: 20. Best value: 1.65977: 100%|██████████| 25/25 [00:33<00:00,  1.33s/it]
[I 2025-06-25 16:50:48,090] A new study created in memory with name: no-name-1d4eabce-f9a1-428b-adf2-1ee6c1b92054


[I 2025-06-25 16:50:48,090] Trial 24 finished with value: 1.9035534858703613 and parameters: {'n_layers': 2, 'n_units_l0': 101, 'n_units_l1': 93, 'lr': 0.0002662322888315539}. Best is trial 20 with value: 1.6597704887390137.

--- Optimizando LogisticRegression_L2 (Nivel 2) ---


Best trial: 0. Best value: 2.14975:   7%|▋         | 1/15 [00:03<00:53,  3.81s/it]

[I 2025-06-25 16:50:51,897] Trial 0 finished with value: 2.1497471810042383 and parameters: {'C': 0.0745934328572655}. Best is trial 0 with value: 2.1497471810042383.


Best trial: 0. Best value: 2.14975:  13%|█▎        | 2/15 [00:18<02:12, 10.18s/it]

[I 2025-06-25 16:51:06,539] Trial 1 finished with value: 9.850396174268472 and parameters: {'C': 56.69849511478853}. Best is trial 0 with value: 2.1497471810042383.


Best trial: 0. Best value: 2.14975:  20%|██        | 3/15 [00:29<02:06, 10.53s/it]

[I 2025-06-25 16:51:17,487] Trial 2 finished with value: 6.485183359388385 and parameters: {'C': 4.5705630998014515}. Best is trial 0 with value: 2.1497471810042383.


Best trial: 0. Best value: 2.14975:  27%|██▋       | 4/15 [00:37<01:45,  9.61s/it]

[I 2025-06-25 16:51:25,680] Trial 3 finished with value: 4.350783186381087 and parameters: {'C': 0.9846738873614566}. Best is trial 0 with value: 2.1497471810042383.


Best trial: 4. Best value: 1.7872:  33%|███▎      | 5/15 [00:39<01:07,  6.78s/it] 

[I 2025-06-25 16:51:27,452] Trial 4 finished with value: 1.7871995271604306 and parameters: {'C': 0.006026889128682512}. Best is trial 4 with value: 1.7871995271604306.


Best trial: 4. Best value: 1.7872:  40%|████      | 6/15 [00:41<00:45,  5.09s/it]

[I 2025-06-25 16:51:29,273] Trial 5 finished with value: 1.7872169699572147 and parameters: {'C': 0.0060252157362038605}. Best is trial 4 with value: 1.7871995271604306.


Best trial: 4. Best value: 1.7872:  47%|████▋     | 7/15 [00:42<00:30,  3.85s/it]

[I 2025-06-25 16:51:30,547] Trial 6 finished with value: 1.8929499773720395 and parameters: {'C': 0.0019517224641449498}. Best is trial 4 with value: 1.7871995271604306.


Best trial: 4. Best value: 1.7872:  53%|█████▎    | 8/15 [00:54<00:45,  6.51s/it]

[I 2025-06-25 16:51:42,770] Trial 7 finished with value: 8.61949598600459 and parameters: {'C': 21.42302175774105}. Best is trial 4 with value: 1.7871995271604306.


Best trial: 4. Best value: 1.7872:  60%|██████    | 9/15 [01:03<00:42,  7.15s/it]

[I 2025-06-25 16:51:51,323] Trial 8 finished with value: 4.386545218548194 and parameters: {'C': 1.0129197956845732}. Best is trial 4 with value: 1.7871995271604306.


Best trial: 4. Best value: 1.7872:  67%|██████▋   | 10/15 [01:13<00:40,  8.04s/it]

[I 2025-06-25 16:52:01,346] Trial 9 finished with value: 6.0953037369186776 and parameters: {'C': 3.4702669886504163}. Best is trial 4 with value: 1.7871995271604306.


Best trial: 4. Best value: 1.7872:  73%|███████▎  | 11/15 [01:16<00:26,  6.51s/it]

[I 2025-06-25 16:52:04,379] Trial 10 finished with value: 1.9038456708984546 and parameters: {'C': 0.03603517820107174}. Best is trial 4 with value: 1.7871995271604306.


Best trial: 4. Best value: 1.7872:  80%|████████  | 12/15 [01:17<00:14,  4.86s/it]

[I 2025-06-25 16:52:05,461] Trial 11 finished with value: 1.9606937485076676 and parameters: {'C': 0.0010359916440554257}. Best is trial 4 with value: 1.7871995271604306.


Best trial: 12. Best value: 1.76849:  87%|████████▋ | 13/15 [01:19<00:08,  4.03s/it]

[I 2025-06-25 16:52:07,582] Trial 12 finished with value: 1.7684858172974263 and parameters: {'C': 0.010735378215036786}. Best is trial 12 with value: 1.7684858172974263.


Best trial: 12. Best value: 1.76849:  93%|█████████▎| 14/15 [01:21<00:03,  3.53s/it]

[I 2025-06-25 16:52:09,948] Trial 13 finished with value: 1.787232151726306 and parameters: {'C': 0.01715076259923028}. Best is trial 12 with value: 1.7684858172974263.


Best trial: 12. Best value: 1.76849: 100%|██████████| 15/15 [01:26<00:00,  5.74s/it]


[I 2025-06-25 16:52:14,128] Trial 14 finished with value: 2.3825237153468874 and parameters: {'C': 0.11779637328014542}. Best is trial 12 with value: 1.7684858172974263.

--- Entrenando modelos finales del Ensemble (Nivel 2) ---


Epochs MLP L2 final: 100%|██████████| 30/30 [00:03<00:00,  9.32it/s]


✓ Todos los modelos del ensemble de Nivel 2 han sido entrenados.

--- Calculando pesos para el Ensemble de Nivel 2 ---

--- Pesos del Ensemble de Nivel 2 Calculados ---
XGBoost_L2                | Peso: 0.546 | LogLoss (Val): 0.2253
MLP_PyTorch_L2            | Peso: 0.347 | LogLoss (Val): 0.3543
LogisticRegression_L2     | Peso: 0.107 | LogLoss (Val): 1.1489


## 7. Evaluación Final del Pipeline Jerárquico Robusto

Evaluamos el pipeline completo. Primero, usamos el **ensemble ponderado de Nivel 1** para la predicción de "Odio vs. No-Odio". Luego, para las predicciones de "odio", usamos el **ensemble ponderado de Nivel 2** para predecir la sub-categoría, ambos usando las configuraciones de características correspondientes.

In [13]:
print("--- Evaluación del pipeline jerárquico en el conjunto de prueba ---")

# 1. Preparar todas las características de prueba
X_test_emb_eval = df_test[embedding_cols].values
X_test_emb_scaled_eval = scaler_L1_emb.transform(X_test_emb_eval)
X_test_text_eval = df_test['text_stemmed'].values
X_test_tfidf_eval = tfidf_vectorizer.transform(X_test_text_eval)
X_test_combined_eval = hstack([X_test_emb_eval, X_test_tfidf_eval]).tocsr()
X_test_combined_dense_eval = np.hstack([X_test_emb_scaled_eval, X_test_tfidf_eval.toarray()])
X_test_torch_emb_eval = torch.tensor(X_test_emb_scaled_eval, dtype=torch.float32).to(device)
X_test_torch_combined_eval = torch.tensor(X_test_combined_dense_eval, dtype=torch.float32).to(device)
y_main_true = df_test['main_label'].values

# 2. Obtener predicciones del ENSEMBLE de Nivel 1
test_probas_L1 = {}
test_probas_L1['XGBoost_Embeddings'] = main_classifier_models['XGBoost_Embeddings'].predict_proba(X_test_emb_eval)
test_probas_L1['XGBoost_Combined'] = main_classifier_models['XGBoost_Combined'].predict_proba(X_test_combined_eval)
test_probas_L1['LogisticRegression_Embeddings'] = main_classifier_models['LogisticRegression_Embeddings'].predict_proba(X_test_emb_scaled_eval)
test_probas_L1['LogisticRegression_TFIDF'] = main_classifier_models['LogisticRegression_TFIDF'].predict_proba(X_test_tfidf_eval)
with torch.no_grad():
    test_probas_L1['MLP_PyTorch_Embeddings'] = torch.softmax(main_classifier_models['MLP_PyTorch_Embeddings'](X_test_torch_emb_eval), dim=1).cpu().numpy()
    test_probas_L1['MLP_PyTorch_Combined'] = torch.softmax(main_classifier_models['MLP_PyTorch_Combined'](X_test_torch_combined_eval), dim=1).cpu().numpy()

final_ensemble_proba_L1 = np.zeros_like(test_probas_L1['XGBoost_Embeddings'])
for name, proba in test_probas_L1.items():
    final_ensemble_proba_L1 += proba * ensemble_weights[name]
y_main_pred = np.argmax(final_ensemble_proba_L1, axis=1)

# 3. Evaluar Nivel 1
print("\n--- [Nivel 1] Rendimiento del Ensemble Principal (Prueba) ---")
print(classification_report(y_main_true, y_main_pred, target_names=['not-hate', 'hate']))

# 4. Obtener y evaluar predicciones del ENSEMBLE de Nivel 2
if sub_classifier_models is not None:
    df_test_true_hate = df_test[df_test['main_label'] == 1].copy()
    if not df_test_true_hate.empty:
        y_sub_true = sub_hate_only_encoder.transform(df_test_true_hate['sub_label_str'])
        
        # Preparar datos combinados para L2 en el conjunto de prueba
        X_test_hate_emb = df_test_true_hate[embedding_cols].values
        X_test_hate_tfidf = tfidf_vectorizer.transform(df_test_true_hate['text_stemmed'])
        X_test_hate_combined = hstack([X_test_hate_emb, X_test_hate_tfidf]).tocsr()
        
        # Preparar versión densa y escalada para MLP/LogReg
        X_test_hate_emb_scaled = scaler_L2_emb.transform(X_test_hate_emb)
        X_test_hate_combined_dense = np.hstack([X_test_hate_emb_scaled, X_test_hate_tfidf.toarray()])
        X_test_hate_torch = torch.tensor(X_test_hate_combined_dense, dtype=torch.float32).to(device)
        
        # Obtener y combinar probabilidades L2
        true_hate_probas_L2 = {}
        true_hate_probas_L2['XGBoost_L2'] = sub_classifier_models['XGBoost_L2'].predict_proba(X_test_hate_combined)
        true_hate_probas_L2['LogisticRegression_L2'] = sub_classifier_models['LogisticRegression_L2'].predict_proba(X_test_hate_combined_dense)
        with torch.no_grad():
            mlp_outputs_L2 = sub_classifier_models['MLP_PyTorch_L2'](X_test_hate_torch)
            true_hate_probas_L2['MLP_PyTorch_L2'] = torch.softmax(mlp_outputs_L2, dim=1).cpu().numpy()
        
        final_true_hate_proba_L2 = np.zeros_like(true_hate_probas_L2['XGBoost_L2'])
        for name, proba in true_hate_probas_L2.items():
            final_true_hate_proba_L2 += proba * ensemble_weights_L2[name]
        y_sub_pred_for_eval = np.argmax(final_true_hate_proba_L2, axis=1)
        
        print("\n--- [Nivel 2] Rendimiento del Ensemble de Sub-categorías (Prueba) ---")
        print(classification_report(y_sub_true, y_sub_pred_for_eval, target_names=sub_hate_only_encoder.classes_, zero_division=0))
else:
    print("\nEl clasificador de sub-categorías no fue entrenado.")

--- Evaluación del pipeline jerárquico en el conjunto de prueba ---

--- [Nivel 1] Rendimiento del Ensemble Principal (Prueba) ---
              precision    recall  f1-score   support

    not-hate       0.91      0.98      0.95       833
        hate       0.94      0.74      0.83       303

    accuracy                           0.92      1136
   macro avg       0.93      0.86      0.89      1136
weighted avg       0.92      0.92      0.91      1136


--- [Nivel 2] Rendimiento del Ensemble de Sub-categorías (Prueba) ---
                     precision    recall  f1-score   support

           Behavior       0.00      0.00      0.00         8
              Class       0.67      0.40      0.50        10
         Disability       1.00      0.08      0.15        12
          Ethnicity       0.00      0.00      0.00         9
             Gender       0.50      0.48      0.49        62
Physical Appearance       0.80      0.22      0.35        18
               Race       0.58      0.86   

## 8. Guardado de Artefactos

Guardamos todos los componentes del pipeline jerárquico: los modelos de ambos ensembles, sus respectivos pesos, transformadores y codificadores.

In [14]:
print(f"--- Guardando artefactos en {MODEL_OUTPUT_DIR} ---")

# 1. Guardar modelos y pesos del ensemble de Nivel 1
with open(os.path.join(MODEL_OUTPUT_DIR, "main_classifier_models_L1.pkl"), 'wb') as f: pickle.dump(main_classifier_models, f)
with open(os.path.join(MODEL_OUTPUT_DIR, "ensemble_weights_L1.pkl"), 'wb') as f: pickle.dump(ensemble_weights, f)
print("✓ Modelos y pesos de Nivel 1 guardados.")

# 2. Guardar modelos y pesos del ensemble de Nivel 2
if sub_classifier_models is not None:
    with open(os.path.join(MODEL_OUTPUT_DIR, "sub_classifier_models_L2.pkl"), 'wb') as f: pickle.dump(sub_classifier_models, f)
    with open(os.path.join(MODEL_OUTPUT_DIR, "ensemble_weights_L2.pkl"), 'wb') as f: pickle.dump(ensemble_weights_L2, f)
    print("✓ Modelos y pesos de Nivel 2 guardados.")

# 3. Guardar transformadores y codificadores
with open(os.path.join(MODEL_OUTPUT_DIR, "scaler_L1_emb.pkl"), 'wb') as f: pickle.dump(scaler_L1_emb, f)
if 'scaler_L2_emb' in locals():
    with open(os.path.join(MODEL_OUTPUT_DIR, "scaler_L2_emb.pkl"), 'wb') as f: pickle.dump(scaler_L2_emb, f)
with open(os.path.join(MODEL_OUTPUT_DIR, "tfidf_vectorizer.pkl"), 'wb') as f: pickle.dump(tfidf_vectorizer, f)
if 'sub_hate_only_encoder' in locals():
    with open(os.path.join(MODEL_OUTPUT_DIR, "sub_hate_only_encoder.pkl"), 'wb') as f: pickle.dump(sub_hate_only_encoder, f)
print("✓ Scalers, TF-IDF Vectorizer y codificador de sub-etiquetas guardados.")

# 4. Guardar resultados de Optuna
with open(os.path.join(MODEL_OUTPUT_DIR, "optuna_results.pkl"), 'wb') as f:
    pickle.dump({'L1': model_results, 'L2': model_results_L2 if 'model_results_L2' in locals() else {}}, f)
print("✓ Resultados de Optuna guardados.")

print("\n🎉 Pipeline jerárquico robusto completado y todos los artefactos han sido guardados.")

--- Guardando artefactos en datos_locales\model_output\hierarchical-job-1750890141 ---
✓ Modelos y pesos de Nivel 1 guardados.
✓ Modelos y pesos de Nivel 2 guardados.
✓ Scalers, TF-IDF Vectorizer y codificador de sub-etiquetas guardados.
✓ Resultados de Optuna guardados.

🎉 Pipeline jerárquico robusto completado y todos los artefactos han sido guardados.
