# Hierarchical Classification Model with Data Augmentation v3.0

This notebook implements a hierarchical classification pipeline:

1.  **Loading and Preprocessing**: Loads the `hate_speech_twitter` dataset and cleans the text (tokenization, stop-word removal, stemming).
2.  **Dual Feature Generation**: Builds BERT embeddings and TF-IDF vectors.
3.  **Synthetic Data Augmentation**: Uses **CTGAN** from the `sdv` library to generate synthetic samples and **balance the hate-speech sub-categories** in the training set, with the aim of making the sub-category classifier more robust.
4.  **Main Classifier Training (Level 1)**: Trains and tunes an ensemble of four models to separate `hate` from `non-hate`.
5.  **Sub-category Classifier Training (Level 2)**: Trains an XGBoost model to classify the **type of hate** (e.g. sexism, racism) in texts already identified as hate.
6.  **Hierarchical Evaluation**: Evaluates the full pipeline at both levels, reporting how well it detects hate and how well it classifies its type.
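
The two-level flow in steps 4 to 6 can be sketched as follows. This is a minimal illustration, not the notebook's implementation: `predict_hierarchical`, `toy_level1`, and `toy_level2` are hypothetical names, and the stand-in classifiers are toy keyword rules rather than trained models.

```python
from typing import Callable, List, Optional, Tuple


def predict_hierarchical(
    texts: List[str],
    level1: Callable[[str], int],
    level2: Callable[[str], str],
) -> List[Tuple[int, Optional[str]]]:
    """Run Level 1 on every text; run Level 2 only on texts flagged as hate (label 1)."""
    results = []
    for text in texts:
        main_label = level1(text)
        # Level 2 is skipped for non-hate texts, so their sub-label stays None.
        sub_label = level2(text) if main_label == 1 else None
        results.append((main_label, sub_label))
    return results


# Toy stand-ins (assumptions for illustration only).
def toy_level1(text: str) -> int:
    return 1 if "hate" in text else 0


def toy_level2(text: str) -> str:
    return "Race" if "race" in text else "Gender"


predictions = predict_hierarchical(
    ["i hate that race", "nice weather today"], toy_level1, toy_level2
)
print(predictions)
```

Because Level 2 only sees what Level 1 lets through, a Level 1 false negative can never receive a correct sub-category, which is why the evaluation reports both levels.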

## 1. Installation and Setup

In [None]:
#!pip install transformers torch datasets scikit-learn xgboost pandas seaborn matplotlib tqdm optuna nltk scipy sdv

In [None]:
import pandas as pd
import numpy as np
import os
import pickle
import seaborn as sns
import matplotlib.pyplot as plt
import time
import torch
import xgboost as xgb
from tqdm.auto import tqdm
import re

from datasets import load_dataset
from transformers import AutoTokenizer, AutoModel
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score, log_loss, f1_score
from sklearn.feature_extraction.text import TfidfVectorizer

import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize
try:
    nltk.data.find('tokenizers/punkt')
    nltk.data.find('corpora/stopwords')
except LookupError:
    nltk.download('punkt')
    nltk.download('stopwords')

BERT_MODEL_NAME = 'bert-base-uncased'
MAX_SAMPLES = 10000 # Increase for better sub-category training
MAX_TOKEN_LENGTH = 128

# --- Device configuration (GPU or CPU) ---
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Usando dispositivo: {device}")

# --- Local path definitions ---
job_id = f"hierarchical-job-{int(time.time())}"
BASE_DIR = "datos_locales"
PROCESSED_DIR = os.path.join(BASE_DIR, "processed", job_id)
MODEL_OUTPUT_DIR = os.path.join(BASE_DIR, "model_output", job_id)
os.makedirs(PROCESSED_DIR, exist_ok=True)
os.makedirs(MODEL_OUTPUT_DIR, exist_ok=True)
PROCESSED_DATA_PATH = os.path.join(PROCESSED_DIR, "processed_data_with_embeddings.csv")
print(f"\nID de trabajo: {job_id}")



Usando dispositivo: cuda

ID de trabajo: hierarchical-job-1750836320


## 2. Data Loading, Analysis, and Preprocessing

In [None]:
print("Cargando dataset 'thefrankhsu/hate_speech_twitter'...")
dataset = load_dataset("thefrankhsu/hate_speech_twitter")
df = pd.DataFrame(dataset['train'])

# Rename columns and fill missing values in 'categories'
df = df.rename(columns={'tweet': 'text_raw', 'label': 'main_label', 'categories': 'sub_label_str'})
df['sub_label_str'] = df['sub_label_str'].fillna('not-hate')

if MAX_SAMPLES is not None:
    # Make sure the sample size does not exceed the population size
    sample_size = min(MAX_SAMPLES, len(df))
    print(f"Tomando una muestra aleatoria de {sample_size} registros (de un total de {len(df)}).")
    df = df.sample(n=sample_size, random_state=42, replace=False).reset_index(drop=True)

print("Distribución de etiquetas principales:")
print(df['main_label'].value_counts())

print("\nDistribución de sub-etiquetas (solo para 'odio'):")
print(df[df['main_label'] == 1]['sub_label_str'].value_counts())

# Encode sub-labels (LabelEncoder is already imported in the setup cell)
sub_label_encoder = LabelEncoder()
df['sub_label_encoded'] = sub_label_encoder.fit_transform(df['sub_label_str'])
sub_label_mapping = dict(zip(sub_label_encoder.classes_, sub_label_encoder.transform(sub_label_encoder.classes_)))
print("\nMapeo de Sub-etiquetas:", sub_label_mapping)

# Text preprocessing
stop_words = set(stopwords.words('english'))
stemmer = PorterStemmer()
def clean_text(text, apply_stemming=False):
    """Strip URLs, @mentions and '#', then keep lowercase alphabetic tokens that are not stop words."""
    if pd.isna(text):
        return ""
    text = re.sub(r'http\S+|www\S+', '', text)
    text = re.sub(r'@\w+|#', '', text)
    tokens = word_tokenize(text)
    words = [word.lower() for word in tokens if word.isalpha() and word.lower() not in stop_words]
    if apply_stemming:
        words = [stemmer.stem(word) for word in words]
    return " ".join(words)

tqdm.pandas(desc="Limpiando Texto para Embeddings")
df['text_cleaned'] = df['text_raw'].progress_apply(lambda x: clean_text(x, apply_stemming=False))
tqdm.pandas(desc="Aplicando Stemming para TF-IDF")
df['text_stemmed'] = df['text_cleaned'].progress_apply(lambda x: " ".join([stemmer.stem(word) for word in x.split()]))

Cargando dataset 'thefrankhsu/hate_speech_twitter'...
Tomando una muestra aleatoria de 5679 registros (de un total de 5679).
Distribución de etiquetas principales:
main_label
0    4163
1    1516
Name: count, dtype: int64

Distribución de sub-etiquetas (solo para 'odio'):
sub_label_str
Race                   523
Sexual Orientation     429
Gender                 279
Physical Appearance     73
Religion                52
Behavior                40
Class                   40
Ethnicity               40
Disability              40
Name: count, dtype: int64

Mapeo de Sub-etiquetas: {'Behavior': 0, 'Class': 1, 'Disability': 2, 'Ethnicity': 3, 'Gender': 4, 'Physical Appearance': 5, 'Race': 6, 'Religion': 7, 'Sexual Orientation': 8, 'not-hate': 9}


Limpiando Texto para Embeddings: 100%|██████████| 5679/5679 [00:00<00:00, 18824.90it/s]
Aplicando Stemming para TF-IDF: 100%|██████████| 5679/5679 [00:00<00:00, 10888.97it/s]


## 3. Embedding Generation and Data Split

In [4]:
print(f"Cargando modelo y tokenizador BERT: {BERT_MODEL_NAME}")
tokenizer_bert = AutoTokenizer.from_pretrained(BERT_MODEL_NAME)
model_bert = AutoModel.from_pretrained(BERT_MODEL_NAME).to(device)
model_bert.eval()

def get_bert_embeddings(batch_text):
    """Return the [CLS] embedding (last hidden state at position 0) for each text in the batch."""
    inputs = tokenizer_bert(batch_text, padding=True, truncation=True, max_length=MAX_TOKEN_LENGTH, return_tensors='pt')
    inputs = {k: v.to(device) for k, v in inputs.items()}
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model_bert(**inputs)
    return outputs.last_hidden_state[:, 0, :].cpu().numpy()

print("Generando embeddings...")
batch_size = 32
all_embeddings = np.vstack([get_bert_embeddings(df.iloc[i:i+batch_size]['text_cleaned'].tolist()) for i in tqdm(range(0, len(df), batch_size))])

embedding_cols = [f'dim_{i}' for i in range(all_embeddings.shape[1])]
df_embeddings = pd.DataFrame(all_embeddings, columns=embedding_cols, index=df.index)
df_processed = pd.concat([df, df_embeddings], axis=1)

print("\n--- Dividiendo Datos ---")
y_main = df_processed['main_label'].values
df_trainval, df_test = train_test_split(df_processed, test_size=0.2, random_state=42, stratify=y_main)
y_trainval_main = df_trainval['main_label'].values
df_train, df_val = train_test_split(df_trainval, test_size=0.25, random_state=42, stratify=y_trainval_main)

print(f"Tamaño Train: {len(df_train)}, Val: {len(df_val)}, Test: {len(df_test)}")

Cargando modelo y tokenizador BERT: bert-base-uncased
Generando embeddings...


100%|██████████| 178/178 [00:05<00:00, 35.17it/s]


--- Dividiendo Datos ---
Tamaño Train: 3407, Val: 1136, Test: 1136
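
The two `train_test_split` calls produce a 60/20/20 split: 20% is held out for test, and 25% of the remaining 80% (again 20% of the total) becomes validation. The sketch below reproduces the sizes printed above. `two_stage_split_sizes` is a hypothetical helper, and it assumes the held-out portion is rounded up at each stage, which matches the printed sizes.

```python
import math


def two_stage_split_sizes(n_samples, test_frac=0.2, val_frac_of_rest=0.25):
    """Return (n_train, n_val, n_test) for a test split followed by a validation split."""
    n_test = math.ceil(test_frac * n_samples)         # first split: hold out the test set
    n_trainval = n_samples - n_test
    n_val = math.ceil(val_frac_of_rest * n_trainval)  # second split: hold out validation
    n_train = n_trainval - n_val
    return n_train, n_val, n_test


sizes = two_stage_split_sizes(5679)
print(sizes)
```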





## 4. Synthetic Data Augmentation for Sub-categories (CTGAN)
This section addresses the imbalance among the hate sub-categories. CTGAN is fitted on the BERT embeddings of the hate training rows and used to generate new training samples for the minority classes.
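
Balancing here means topping up every minority sub-category to the size of the largest one. A minimal sketch of that bookkeeping on toy labels follows; `balancing_deficits` is a hypothetical helper, not part of the notebook or of `sdv`.

```python
from collections import Counter


def balancing_deficits(labels):
    """Map each minority class to the number of extra rows it needs to match the largest class."""
    counts = Counter(labels)
    target = max(counts.values())
    return {label: target - n for label, n in counts.items() if n < target}


toy_labels = ["Race"] * 5 + ["Gender"] * 3 + ["Religion"] * 1
deficits = balancing_deficits(toy_labels)
print(deficits)                # rows still needed per minority class
print(sum(deficits.values()))  # total synthetic rows needed
```

The total equals `max_class_size * n_classes - n_rows`, which is the quantity the cell below stores in `num_to_generate`.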

In [None]:
from sdv.single_table import CTGANSynthesizer
from sdv.metadata import SingleTableMetadata

print("--- Preparando datos para aumento ---")
# 1. Isolate the training rows labelled as hate
df_train_hate = df_train[df_train['main_label'] == 1].copy()
features_to_augment = ['sub_label_str'] + embedding_cols
df_to_augment = df_train_hate[features_to_augment]

print("Distribución de sub-categorías ANTES del aumento:")
hate_counts = df_to_augment['sub_label_str'].value_counts()
print(hate_counts)

# Create a new encoder dedicated ONLY to the hate sub-categories.
sub_hate_only_encoder = LabelEncoder()

if len(hate_counts) > 1 and not df_to_augment.empty:
    # Fit the new encoder on the hate labels only
    sub_hate_only_encoder.fit(df_to_augment['sub_label_str'])
    print("\nNuevo encoder para Nivel 2 creado. Clases:", sub_hate_only_encoder.classes_)

    # 2. Configure metadata
    metadata = SingleTableMetadata()
    metadata.detect_from_dataframe(data=df_to_augment)

    print("\nActualizando metadatos para tratar embeddings como numéricos continuos...")
    for col in embedding_cols:
        metadata.update_column(column_name=col, sdtype='numerical')
    metadata.update_column(column_name='sub_label_str', sdtype='categorical')

    # 3. Configure and train the CTGAN synthesizer
    use_gpu = torch.cuda.is_available()
    print(f"Usando GPU: {use_gpu}")
    synthesizer = CTGANSynthesizer(
        metadata, 
        epochs=150, 
        embedding_dim=64, 
        verbose=False,
        cuda=use_gpu  
    )
    
    print(f"\nEntrenando CTGAN para generar datos sintéticos... (Usando GPU: {use_gpu})")
    synthesizer.fit(df_to_augment)

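    # 4. Determine how many samples each sub-category needs to match the largest one
    max_class_size = hate_counts.max()
    deficits = max_class_size - hate_counts
    deficits = deficits[deficits > 0]
    num_to_generate = int(deficits.sum())

    # 5. Generate and combine data
    if num_to_generate > 0:
        print(f"\nGenerando {num_to_generate} muestras sintéticas...")
        # Plain sample() draws from the learned joint distribution and tends to reproduce
        # the original imbalance, so each minority class is requested explicitly.
        # Conditional sampling may be slow or fail for very rare classes.
        from sdv.sampling import Condition
        conditions = [Condition(num_rows=int(n), column_values={'sub_label_str': label})
                      for label, n in deficits.items()]
        df_synthetic = synthesizer.sample_from_conditions(conditions=conditions)
        df_train_hate_balanced = pd.concat([df_to_augment, df_synthetic], ignore_index=True)
    else:
        df_train_hate_balanced = df_to_augment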
else:
    print("\nSolo una sub-categoría presente o no hay datos de odio, no se requiere aumento.")
    df_train_hate_balanced = df_to_augment
    if not df_to_augment.empty:
        # Fit the encoder when only one class is present
        sub_hate_only_encoder.fit(df_to_augment['sub_label_str'])

print("\nDistribución de sub-categorías DESPUÉS del aumento:")
all_sub_labels = df_to_augment['sub_label_str'].unique()
print(df_train_hate_balanced['sub_label_str'].value_counts().reindex(all_sub_labels, fill_value=0))

# Prepare training data for the sub-category classifier
if not df_train_hate_balanced.empty:
    X_train_sub = df_train_hate_balanced[embedding_cols].values
    # Use the NEW encoder to transform the labels
    y_train_sub = sub_hate_only_encoder.transform(df_train_hate_balanced['sub_label_str'])
else:
    # Create empty arrays when there is no data, to avoid downstream errors
    X_train_sub = np.array([]).reshape(0, len(embedding_cols))
    y_train_sub = np.array([])

--- Preparando datos para aumento ---
Distribución de sub-categorías ANTES del aumento:
sub_label_str
Race                   321
Sexual Orientation     271
Gender                 158
Physical Appearance     43
Religion                27
Class                   25
Ethnicity               23
Disability              22
Behavior                20
Name: count, dtype: int64

Nuevo encoder para Nivel 2 creado. Clases: ['Behavior' 'Class' 'Disability' 'Ethnicity' 'Gender'
 'Physical Appearance' 'Race' 'Religion' 'Sexual Orientation']

Actualizando metadatos para tratar embeddings como numéricos continuos...
Usando GPU: True

Entrenando CTGAN para generar datos sintéticos... (Usando GPU: True)




PerformanceAlert: Using the CTGANSynthesizer on this data is not recommended. To model this data, CTGAN will generate a large number of columns.

Original Column Name   Est # of Columns (CTGAN)
sub_label_str          9
dim_0                  11
dim_1                  11
...
dim_29                 11
[output truncated]

## 5. Training the Main Classifier (Level 1) with Optuna and an Ensemble

This section integrates the training pipeline for Level 1. Four models (XGBoost, an MLP, and two logistic regressions) are trained for the binary task (hate vs. non-hate), and each model's hyperparameters are tuned with Optuna by minimizing validation log loss. This stage does not use the augmented data, only the original training set.
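
How the four tuned models are combined is not shown in this part of the notebook. One common choice is soft voting, sketched below under the assumption that each model outputs a probability of the hate class and that all models are weighted equally. `soft_vote` and the toy probabilities are hypothetical.

```python
def soft_vote(prob_by_model, threshold=0.5):
    """Average per-model P(hate) for each sample and threshold the mean."""
    n_models = len(prob_by_model)
    n_samples = len(next(iter(prob_by_model.values())))
    mean_probs = [
        sum(probs[i] for probs in prob_by_model.values()) / n_models
        for i in range(n_samples)
    ]
    labels = [1 if p >= threshold else 0 for p in mean_probs]
    return mean_probs, labels


# Toy probabilities for three samples (made-up numbers, not notebook results).
toy_probs = {
    "XGBoost": [0.9, 0.2, 0.6],
    "MLP_PyTorch": [0.8, 0.1, 0.4],
    "LogisticRegression_Embeddings": [0.7, 0.3, 0.5],
    "LogisticRegression_TFIDF": [0.6, 0.2, 0.3],
}
mean_probs, labels = soft_vote(toy_probs)
print(labels)
```

Averaging probabilities rather than hard votes lets a confident model outweigh several uncertain ones, which pairs naturally with models tuned on log loss.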

In [None]:
import optuna
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
import warnings

warnings.filterwarnings('ignore', category=UserWarning)

print("--- Preparando datos para el entrenamiento del Clasificador Principal ---")

# Use the variables from the hierarchical split
y_train = df_train['main_label'].values
y_val = df_val['main_label'].values
num_classes = len(np.unique(y_train)) # 2 in this case

# 1. Scale the embedding features
scaler = StandardScaler()
X_train_emb = df_train[embedding_cols].values
X_train_emb_scaled = scaler.fit_transform(X_train_emb)
X_val_emb = df_val[embedding_cols].values
X_val_emb_scaled = scaler.transform(X_val_emb)

# 2. Vectorize the stemmed text with TF-IDF
tfidf_vectorizer = TfidfVectorizer(max_features=10000, ngram_range=(1, 2))
X_train_text = df_train['text_stemmed'].values
X_train_tfidf = tfidf_vectorizer.fit_transform(X_train_text)
X_val_text = df_val['text_stemmed'].values
X_val_tfidf = tfidf_vectorizer.transform(X_val_text)
print(f"TF-IDF: {X_train_tfidf.shape[1]} características generadas.")

# 3. Convert validation data to PyTorch tensors
X_val_torch = torch.tensor(X_val_emb_scaled, dtype=torch.float32).to(device)
y_val_torch = torch.tensor(y_val, dtype=torch.long).to(device)

print("\n✓ Datos escalados, vectorizados y tensores de PyTorch listos para el Nivel 1.")

--- Preparando datos para el entrenamiento del Clasificador Principal ---
TF-IDF: 10000 características generadas.

✓ Datos escalados, vectorizados y tensores de PyTorch listos para el Nivel 1.


In [7]:
# --- 1. Objective for the PyTorch MLP (uses embeddings) ---
class MLP(nn.Module):
    def __init__(self, input_size, hidden_layers, output_size, activation_fn, dropout_rate):
        super(MLP, self).__init__()
        layers = []
        current_size = input_size
        for hidden_size in hidden_layers:
            layers.append(nn.Linear(current_size, hidden_size))
            layers.append(activation_fn())
            layers.append(nn.Dropout(dropout_rate))
            current_size = hidden_size
        layers.append(nn.Linear(current_size, output_size))
        self.model = nn.Sequential(*layers)

    def forward(self, x):
        return self.model(x)

def objective_mlp(trial):
    n_layers = trial.suggest_int('n_layers', 1, 3)
    hidden_layers = [trial.suggest_int(f'n_units_l{i}', 32, 256) for i in range(n_layers)]
    dropout_rate = trial.suggest_float('dropout_rate', 0.1, 0.5)
    optimizer_name = trial.suggest_categorical('optimizer', ['Adam', 'RMSprop'])
    lr = trial.suggest_float('lr', 1e-5, 1e-2, log=True)
    activation_fn = getattr(nn, trial.suggest_categorical('activation', ['ReLU', 'Tanh']))
    
    model = MLP(X_train_emb_scaled.shape[1], hidden_layers, num_classes, activation_fn, dropout_rate).to(device)
    optimizer = getattr(optim, optimizer_name)(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    train_dataset = TensorDataset(torch.tensor(X_train_emb_scaled, dtype=torch.float32), torch.tensor(y_train, dtype=torch.long))
    train_loader = DataLoader(train_dataset, batch_size=128, shuffle=True)

    for epoch in range(25): # Fewer epochs for faster HPO
        model.train()
        for data, target in train_loader:
            data, target = data.to(device), target.to(device)
            optimizer.zero_grad()
            loss = criterion(model(data), target)
            loss.backward()
            optimizer.step()

        # Evaluate after every epoch so the pruner can stop unpromising trials early;
        # reporting only once after training would prune trials that already finished.
        model.eval()
        with torch.no_grad():
            val_loss = criterion(model(X_val_torch), y_val_torch).item()

        trial.report(val_loss, epoch)
        if trial.should_prune():
            raise optuna.exceptions.TrialPruned()
    return val_loss

# --- 2. Objective for XGBoost (uses embeddings) ---
def objective_xgboost(trial):
    params = {
        'objective': 'binary:logistic' if num_classes == 2 else 'multi:softprob',
        'eval_metric': 'logloss' if num_classes == 2 else 'mlogloss',
        'device': 'cuda' if device.type == 'cuda' else 'cpu',
        'n_estimators': trial.suggest_int('n_estimators', 100, 1000),
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3, log=True),
        'max_depth': trial.suggest_int('max_depth', 3, 9),
    }
    model = xgb.XGBClassifier(**params, early_stopping_rounds=10)
    model.fit(X_train_emb, y_train, eval_set=[(X_val_emb, y_val)], verbose=False)
    return log_loss(y_val, model.predict_proba(X_val_emb))

# --- 3. Objective for logistic regression (uses embeddings) ---
def objective_logistic_embeddings(trial):
    params = {'C': trial.suggest_float('C', 1e-4, 1e2, log=True), 'solver': 'liblinear', 'max_iter': 1000}
    model = LogisticRegression(**params, random_state=42)
    model.fit(X_train_emb_scaled, y_train)
    return log_loss(y_val, model.predict_proba(X_val_emb_scaled))

# --- 4. Objective for logistic regression (uses TF-IDF) ---
def objective_logistic_tfidf(trial):
    params = {'C': trial.suggest_float('C', 1e-2, 1e2, log=True), 'solver': 'liblinear', 'max_iter': 1000}
    model = LogisticRegression(**params, random_state=42)
    model.fit(X_train_tfidf, y_train)
    return log_loss(y_val, model.predict_proba(X_val_tfidf))

print("Funciones objetivo de Optuna para el Nivel 1 definidas.")

Funciones objetivo de Optuna para el Nivel 1 definidas.
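
All four objectives return the validation log loss, so lower values are better. The sketch below shows the metric for the binary case. `binary_log_loss` is a hypothetical helper written for illustration (the notebook itself uses `sklearn.metrics.log_loss` and `nn.CrossEntropyLoss`), and the clipping constant `eps` is an assumption.

```python
import math


def binary_log_loss(y_true, p_hate, eps=1e-15):
    """Mean negative log-likelihood of the true labels under the predicted P(hate)."""
    total = 0.0
    for y, p in zip(y_true, p_hate):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


confident = binary_log_loss([1, 0], [0.9, 0.1])   # confident and correct
uninformed = binary_log_loss([1, 0], [0.5, 0.5])  # always predicts 0.5
print(confident, uninformed)
```

A model that always predicts 0.5 scores ln 2 (about 0.693), which gives a reference point for reading the values in the optimization logs.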


In [8]:
models_config = {
    'XGBoost': {'objective_func': objective_xgboost, 'n_trials': 25},
    'MLP_PyTorch': {'objective_func': objective_mlp, 'n_trials': 30},
    'LogisticRegression_Embeddings': {'objective_func': objective_logistic_embeddings, 'n_trials': 20},
    'LogisticRegression_TFIDF': {'objective_func': objective_logistic_tfidf, 'n_trials': 20}
}

model_results = {}

for model_name, config in models_config.items():
    print(f"\n--- Optimizando {model_name} (Nivel 1) ---")
    study = optuna.create_study(direction='minimize', sampler=optuna.samplers.TPESampler(seed=42))
    study.optimize(config['objective_func'], n_trials=config['n_trials'], show_progress_bar=True)
    
    model_results[model_name] = {
        'best_params': study.best_params,
        'best_score': study.best_value
    }
    print(f"✓ {model_name} completado. Mejor LogLoss: {study.best_value:.4f}")

[I 2025-06-25 01:37:58,209] A new study created in memory with name: no-name-29223502-920d-428b-93bd-b7bf90b6ed07



--- Optimizando XGBoost (Nivel 1) ---


Best trial: 0. Best value: 0.388348:   4%|▍         | 1/25 [00:01<00:45,  1.91s/it]

[I 2025-06-25 01:38:00,114] Trial 0 finished with value: 0.38834796796875687 and parameters: {'n_estimators': 437, 'learning_rate': 0.2536999076681772, 'max_depth': 8}. Best is trial 0 with value: 0.38834796796875687.


Best trial: 1. Best value: 0.330828:   8%|▊         | 2/25 [00:06<01:14,  3.26s/it]

[I 2025-06-25 01:38:04,318] Trial 1 finished with value: 0.3308276993342611 and parameters: {'n_estimators': 639, 'learning_rate': 0.01700037298921102, 'max_depth': 4}. Best is trial 1 with value: 0.3308276993342611.


Best trial: 1. Best value: 0.330828:  12%|█▏        | 3/25 [00:07<00:53,  2.43s/it]

[I 2025-06-25 01:38:05,770] Trial 2 finished with value: 0.35457881385286627 and parameters: {'n_estimators': 152, 'learning_rate': 0.19030368381735815, 'max_depth': 7}. Best is trial 1 with value: 0.3308276993342611.


Best trial: 1. Best value: 0.330828:  16%|█▌        | 4/25 [00:35<04:24, 12.58s/it]

[I 2025-06-25 01:38:33,896] Trial 3 finished with value: 0.3635742502413637 and parameters: {'n_estimators': 737, 'learning_rate': 0.010725209743171996, 'max_depth': 9}. Best is trial 1 with value: 0.3308276993342611.


Best trial: 4. Best value: 0.328352:  20%|██        | 5/25 [00:39<03:10,  9.54s/it]

[I 2025-06-25 01:38:38,048] Trial 4 finished with value: 0.3283516402816501 and parameters: {'n_estimators': 850, 'learning_rate': 0.020589728197687916, 'max_depth': 4}. Best is trial 4 with value: 0.3283516402816501.


Best trial: 4. Best value: 0.328352:  24%|██▍       | 6/25 [00:44<02:32,  8.04s/it]

[I 2025-06-25 01:38:43,176] Trial 5 finished with value: 0.3424010185363584 and parameters: {'n_estimators': 265, 'learning_rate': 0.028145092716060652, 'max_depth': 6}. Best is trial 4 with value: 0.3283516402816501.


Best trial: 4. Best value: 0.328352:  28%|██▊       | 7/25 [00:52<02:20,  7.82s/it]

[I 2025-06-25 01:38:50,540] Trial 6 finished with value: 0.3547687649305162 and parameters: {'n_estimators': 489, 'learning_rate': 0.02692655251486473, 'max_depth': 7}. Best is trial 4 with value: 0.3283516402816501.


Best trial: 4. Best value: 0.328352:  32%|███▏      | 8/25 [00:55<01:45,  6.18s/it]

[I 2025-06-25 01:38:53,216] Trial 7 finished with value: 0.3421056697542824 and parameters: {'n_estimators': 225, 'learning_rate': 0.027010527749605478, 'max_depth': 5}. Best is trial 4 with value: 0.3283516402816501.


Best trial: 4. Best value: 0.328352:  36%|███▌      | 9/25 [00:55<01:12,  4.54s/it]

[I 2025-06-25 01:38:54,145] Trial 8 finished with value: 0.3347771775892681 and parameters: {'n_estimators': 510, 'learning_rate': 0.14447746112718687, 'max_depth': 4}. Best is trial 4 with value: 0.3283516402816501.


Best trial: 4. Best value: 0.328352:  40%|████      | 10/25 [00:57<00:52,  3.51s/it]

[I 2025-06-25 01:38:55,350] Trial 9 finished with value: 0.3349609370112823 and parameters: {'n_estimators': 563, 'learning_rate': 0.07500118950416987, 'max_depth': 3}. Best is trial 4 with value: 0.3283516402816501.


Best trial: 4. Best value: 0.328352:  44%|████▍     | 11/25 [00:58<00:38,  2.74s/it]

[I 2025-06-25 01:38:56,344] Trial 10 finished with value: 0.3396220915620309 and parameters: {'n_estimators': 981, 'learning_rate': 0.06690992453172917, 'max_depth': 3}. Best is trial 4 with value: 0.3283516402816501.


Best trial: 4. Best value: 0.328352:  48%|████▊     | 12/25 [01:06<00:58,  4.47s/it]

[I 2025-06-25 01:39:04,781] Trial 11 finished with value: 0.33582109176999275 and parameters: {'n_estimators': 847, 'learning_rate': 0.010464817979692459, 'max_depth': 5}. Best is trial 4 with value: 0.3283516402816501.


Best trial: 4. Best value: 0.328352:  52%|█████▏    | 13/25 [01:11<00:54,  4.53s/it]

[I 2025-06-25 01:39:09,455] Trial 12 finished with value: 0.3310297457642165 and parameters: {'n_estimators': 710, 'learning_rate': 0.018042210081692857, 'max_depth': 4}. Best is trial 4 with value: 0.3283516402816501.


Best trial: 13. Best value: 0.327103:  56%|█████▌    | 14/25 [01:13<00:42,  3.86s/it]

[I 2025-06-25 01:39:11,743] Trial 13 finished with value: 0.327103316371309 and parameters: {'n_estimators': 997, 'learning_rate': 0.04271369442805087, 'max_depth': 4}. Best is trial 13 with value: 0.327103316371309.


Best trial: 13. Best value: 0.327103:  60%|██████    | 15/25 [01:16<00:35,  3.50s/it]

[I 2025-06-25 01:39:14,426] Trial 14 finished with value: 0.3292267280214686 and parameters: {'n_estimators': 978, 'learning_rate': 0.04410047566419462, 'max_depth': 5}. Best is trial 13 with value: 0.327103316371309.


Best trial: 13. Best value: 0.327103:  64%|██████▍   | 16/25 [01:17<00:25,  2.88s/it]

[I 2025-06-25 01:39:15,846] Trial 15 finished with value: 0.3360596257760045 and parameters: {'n_estimators': 859, 'learning_rate': 0.044767771726731534, 'max_depth': 3}. Best is trial 13 with value: 0.327103316371309.


Best trial: 13. Best value: 0.327103:  68%|██████▊   | 17/25 [01:19<00:20,  2.53s/it]

[I 2025-06-25 01:39:17,567] Trial 16 finished with value: 0.341524444466647 and parameters: {'n_estimators': 851, 'learning_rate': 0.11124442632859347, 'max_depth': 6}. Best is trial 13 with value: 0.327103316371309.


Best trial: 13. Best value: 0.327103:  72%|███████▏  | 18/25 [01:21<00:17,  2.50s/it]

[I 2025-06-25 01:39:19,987] Trial 17 finished with value: 0.32880382190584445 and parameters: {'n_estimators': 999, 'learning_rate': 0.037451197538989976, 'max_depth': 4}. Best is trial 13 with value: 0.327103316371309.


Best trial: 13. Best value: 0.327103:  76%|███████▌  | 19/25 [01:29<00:24,  4.06s/it]

[I 2025-06-25 01:39:27,703] Trial 18 finished with value: 0.3435131842583088 and parameters: {'n_estimators': 795, 'learning_rate': 0.017250820550283107, 'max_depth': 6}. Best is trial 13 with value: 0.327103316371309.


Best trial: 13. Best value: 0.327103:  80%|████████  | 20/25 [01:30<00:16,  3.24s/it]

[I 2025-06-25 01:39:29,015] Trial 19 finished with value: 0.332489871213085 and parameters: {'n_estimators': 927, 'learning_rate': 0.09154387249271213, 'max_depth': 5}. Best is trial 13 with value: 0.327103316371309.


Best trial: 13. Best value: 0.327103:  84%|████████▍ | 21/25 [01:31<00:10,  2.62s/it]

[I 2025-06-25 01:39:30,187] Trial 20 finished with value: 0.34187274331030476 and parameters: {'n_estimators': 377, 'learning_rate': 0.05547535231708596, 'max_depth': 3}. Best is trial 13 with value: 0.327103316371309.


Best trial: 13. Best value: 0.327103:  88%|████████▊ | 22/25 [01:34<00:07,  2.46s/it]

[I 2025-06-25 01:39:32,267] Trial 21 finished with value: 0.3295852792436287 and parameters: {'n_estimators': 997, 'learning_rate': 0.036712507316242, 'max_depth': 4}. Best is trial 13 with value: 0.327103316371309.


Best trial: 13. Best value: 0.327103:  92%|█████████▏| 23/25 [01:36<00:04,  2.42s/it]

[I 2025-06-25 01:39:34,606] Trial 22 finished with value: 0.3294519604453533 and parameters: {'n_estimators': 898, 'learning_rate': 0.03366151211963755, 'max_depth': 4}. Best is trial 13 with value: 0.327103316371309.


Best trial: 13. Best value: 0.327103:  96%|█████████▌| 24/25 [01:40<00:02,  2.79s/it]

[I 2025-06-25 01:39:38,252] Trial 23 finished with value: 0.32791301563696784 and parameters: {'n_estimators': 784, 'learning_rate': 0.02114521057583463, 'max_depth': 4}. Best is trial 13 with value: 0.327103316371309.


Best trial: 13. Best value: 0.327103: 100%|██████████| 25/25 [01:46<00:00,  4.26s/it]
[I 2025-06-25 01:39:44,786] A new study created in memory with name: no-name-fbcb44b0-bfd5-4192-84f4-879ddca085c7


[I 2025-06-25 01:39:44,777] Trial 24 finished with value: 0.3344977433278849 and parameters: {'n_estimators': 749, 'learning_rate': 0.01493805762572708, 'max_depth': 5}. Best is trial 13 with value: 0.327103316371309.
✓ XGBoost completado. Mejor LogLoss: 0.3271

--- Optimizando MLP_PyTorch (Nivel 1) ---


Best trial: 0. Best value: 0.37968:   3%|▎         | 1/30 [00:01<00:49,  1.69s/it]

[I 2025-06-25 01:39:46,482] Trial 0 finished with value: 0.3796803057193756 and parameters: {'n_layers': 2, 'n_units_l0': 245, 'n_units_l1': 196, 'dropout_rate': 0.3394633936788146, 'optimizer': 'Adam', 'lr': 1.493656855461762e-05, 'activation': 'ReLU'}. Best is trial 0 with value: 0.3796803057193756.


Best trial: 1. Best value: 0.342347:   7%|▋         | 2/30 [00:03<00:47,  1.68s/it]

[I 2025-06-25 01:39:48,151] Trial 1 finished with value: 0.34234651923179626 and parameters: {'n_layers': 3, 'n_units_l0': 36, 'n_units_l1': 250, 'n_units_l2': 219, 'dropout_rate': 0.18493564427131048, 'optimizer': 'RMSprop', 'lr': 8.17949947521167e-05, 'activation': 'ReLU'}. Best is trial 1 with value: 0.34234651923179626.


Best trial: 1. Best value: 0.342347:  10%|█         | 3/30 [00:04<00:42,  1.57s/it]

[I 2025-06-25 01:39:49,581] Trial 2 finished with value: 0.37580549716949463 and parameters: {'n_layers': 1, 'n_units_l0': 169, 'dropout_rate': 0.15579754426081674, 'optimizer': 'RMSprop', 'lr': 0.00023345864076016249, 'activation': 'ReLU'}. Best is trial 1 with value: 0.34234651923179626.


Best trial: 1. Best value: 0.342347:  13%|█▎        | 4/30 [00:06<00:44,  1.70s/it]

[I 2025-06-25 01:39:51,497] Trial 3 finished with value: 0.9165170192718506 and parameters: {'n_layers': 2, 'n_units_l0': 165, 'n_units_l1': 42, 'dropout_rate': 0.34301794076057535, 'optimizer': 'Adam', 'lr': 0.007025166339242158, 'activation': 'ReLU'}. Best is trial 1 with value: 0.34234651923179626.


Best trial: 4. Best value: 0.318321:  17%|█▋        | 5/30 [00:08<00:40,  1.63s/it]

[I 2025-06-25 01:39:52,997] Trial 4 finished with value: 0.3183211088180542 and parameters: {'n_layers': 1, 'n_units_l0': 53, 'dropout_rate': 0.3736932106048628, 'optimizer': 'Adam', 'lr': 0.0003058656666978527, 'activation': 'Tanh'}. Best is trial 4 with value: 0.3183211088180542.


Best trial: 5. Best value: 0.311277:  20%|██        | 6/30 [00:09<00:38,  1.59s/it]

[I 2025-06-25 01:39:54,503] Trial 5 finished with value: 0.3112765848636627 and parameters: {'n_layers': 1, 'n_units_l0': 181, 'dropout_rate': 0.2246844304357644, 'optimizer': 'RMSprop', 'lr': 3.585612610345396e-05, 'activation': 'ReLU'}. Best is trial 5 with value: 0.3112765848636627.


Best trial: 5. Best value: 0.311277:  23%|██▎       | 7/30 [00:11<00:38,  1.65s/it]

[I 2025-06-25 01:39:56,296] Trial 6 pruned. 


Best trial: 5. Best value: 0.311277:  27%|██▋       | 8/30 [00:13<00:38,  1.73s/it]

[I 2025-06-25 01:39:58,178] Trial 7 pruned. 


Best trial: 5. Best value: 0.311277:  30%|███       | 9/30 [00:14<00:33,  1.62s/it]

[I 2025-06-25 01:39:59,551] Trial 8 finished with value: 0.3501242399215698 and parameters: {'n_layers': 1, 'n_units_l0': 215, 'dropout_rate': 0.38274293753904687, 'optimizer': 'RMSprop', 'lr': 1.667761543019792e-05, 'activation': 'ReLU'}. Best is trial 5 with value: 0.3112765848636627.


Best trial: 5. Best value: 0.311277:  33%|███▎      | 10/30 [00:16<00:33,  1.67s/it]

[I 2025-06-25 01:40:01,351] Trial 9 pruned. 


Best trial: 5. Best value: 0.311277:  37%|███▋      | 11/30 [00:18<00:31,  1.66s/it]

[I 2025-06-25 01:40:03,000] Trial 10 pruned. 


Best trial: 5. Best value: 0.311277:  40%|████      | 12/30 [00:19<00:29,  1.66s/it]

[I 2025-06-25 01:40:04,633] Trial 11 pruned. 


Best trial: 5. Best value: 0.311277:  43%|████▎     | 13/30 [00:21<00:28,  1.66s/it]

[I 2025-06-25 01:40:06,299] Trial 12 pruned. 


Best trial: 5. Best value: 0.311277:  47%|████▋     | 14/30 [00:23<00:26,  1.63s/it]

[I 2025-06-25 01:40:07,847] Trial 13 pruned. 


Best trial: 5. Best value: 0.311277:  50%|█████     | 15/30 [00:24<00:23,  1.57s/it]

[I 2025-06-25 01:40:09,302] Trial 14 finished with value: 0.33943459391593933 and parameters: {'n_layers': 1, 'n_units_l0': 201, 'dropout_rate': 0.4505764863334542, 'optimizer': 'Adam', 'lr': 0.00023422328621630747, 'activation': 'Tanh'}. Best is trial 5 with value: 0.3112765848636627.


Best trial: 5. Best value: 0.311277:  53%|█████▎    | 16/30 [00:26<00:22,  1.63s/it]

[I 2025-06-25 01:40:11,064] Trial 15 pruned. 


Best trial: 16. Best value: 0.308976:  57%|█████▋    | 17/30 [00:28<00:21,  1.67s/it]

[I 2025-06-25 01:40:12,804] Trial 16 finished with value: 0.30897584557533264 and parameters: {'n_layers': 1, 'n_units_l0': 76, 'dropout_rate': 0.24695155488800646, 'optimizer': 'RMSprop', 'lr': 0.00016006854792293955, 'activation': 'Tanh'}. Best is trial 16 with value: 0.30897584557533264.


Best trial: 16. Best value: 0.308976:  60%|██████    | 18/30 [00:29<00:19,  1.65s/it]

[I 2025-06-25 01:40:14,440] Trial 17 finished with value: 0.31004074215888977 and parameters: {'n_layers': 2, 'n_units_l0': 144, 'n_units_l1': 208, 'dropout_rate': 0.10263141366114917, 'optimizer': 'RMSprop', 'lr': 3.834304470038068e-05, 'activation': 'Tanh'}. Best is trial 16 with value: 0.30897584557533264.


Best trial: 16. Best value: 0.308976:  63%|██████▎   | 19/30 [00:31<00:17,  1.63s/it]

[I 2025-06-25 01:40:16,029] Trial 18 pruned. 


Best trial: 16. Best value: 0.308976:  67%|██████▋   | 20/30 [00:32<00:16,  1.66s/it]

[I 2025-06-25 01:40:17,729] Trial 19 finished with value: 0.33975961804389954 and parameters: {'n_layers': 2, 'n_units_l0': 134, 'n_units_l1': 138, 'dropout_rate': 0.2666349936722775, 'optimizer': 'RMSprop', 'lr': 0.000137501310114947, 'activation': 'Tanh'}. Best is trial 16 with value: 0.30897584557533264.


Best trial: 16. Best value: 0.308976:  70%|███████   | 21/30 [00:34<00:15,  1.69s/it]

[I 2025-06-25 01:40:19,505] Trial 20 finished with value: 0.3356916904449463 and parameters: {'n_layers': 2, 'n_units_l0': 84, 'n_units_l1': 211, 'dropout_rate': 0.11222606671483279, 'optimizer': 'RMSprop', 'lr': 2.53867069347109e-05, 'activation': 'Tanh'}. Best is trial 16 with value: 0.30897584557533264.


Best trial: 16. Best value: 0.308976:  73%|███████▎  | 22/30 [00:36<00:13,  1.65s/it]

[I 2025-06-25 01:40:21,045] Trial 21 pruned. 


Best trial: 16. Best value: 0.308976:  77%|███████▋  | 23/30 [00:37<00:11,  1.59s/it]

[I 2025-06-25 01:40:22,520] Trial 22 finished with value: 0.3257765471935272 and parameters: {'n_layers': 1, 'n_units_l0': 148, 'dropout_rate': 0.1913464841688231, 'optimizer': 'RMSprop', 'lr': 5.667717009082661e-05, 'activation': 'Tanh'}. Best is trial 16 with value: 0.30897584557533264.


Best trial: 16. Best value: 0.308976:  80%|████████  | 24/30 [00:39<00:09,  1.62s/it]

[I 2025-06-25 01:40:24,194] Trial 23 finished with value: 0.3190734088420868 and parameters: {'n_layers': 1, 'n_units_l0': 196, 'dropout_rate': 0.2637641985408664, 'optimizer': 'RMSprop', 'lr': 0.0001365793264765297, 'activation': 'ReLU'}. Best is trial 16 with value: 0.30897584557533264.


Best trial: 16. Best value: 0.308976:  83%|████████▎ | 25/30 [00:41<00:08,  1.71s/it]

[I 2025-06-25 01:40:26,131] Trial 24 pruned. 


Best trial: 16. Best value: 0.308976:  87%|████████▋ | 26/30 [00:43<00:07,  1.77s/it]

[I 2025-06-25 01:40:28,025] Trial 25 pruned. 


Best trial: 16. Best value: 0.308976:  90%|█████████ | 27/30 [00:44<00:05,  1.73s/it]

[I 2025-06-25 01:40:29,659] Trial 26 pruned. 


Best trial: 16. Best value: 0.308976:  93%|█████████▎| 28/30 [00:46<00:03,  1.82s/it]

[I 2025-06-25 01:40:31,686] Trial 27 finished with value: 0.31517887115478516 and parameters: {'n_layers': 3, 'n_units_l0': 256, 'n_units_l1': 222, 'n_units_l2': 43, 'dropout_rate': 0.1335174425080499, 'optimizer': 'RMSprop', 'lr': 2.0270997272903768e-05, 'activation': 'Tanh'}. Best is trial 16 with value: 0.30897584557533264.


Best trial: 28. Best value: 0.306764:  97%|█████████▋| 29/30 [00:48<00:01,  1.73s/it]

[I 2025-06-25 01:40:33,217] Trial 28 finished with value: 0.30676358938217163 and parameters: {'n_layers': 2, 'n_units_l0': 195, 'n_units_l1': 123, 'dropout_rate': 0.21081833311803883, 'optimizer': 'RMSprop', 'lr': 5.025034733267555e-05, 'activation': 'ReLU'}. Best is trial 28 with value: 0.30676358938217163.


Best trial: 29. Best value: 0.302891: 100%|██████████| 30/30 [00:50<00:00,  1.67s/it]
[I 2025-06-25 01:40:34,800] A new study created in memory with name: no-name-294506ed-6677-43dd-bfce-3debc93b9f35


[I 2025-06-25 01:40:34,790] Trial 29 finished with value: 0.30289140343666077 and parameters: {'n_layers': 2, 'n_units_l0': 202, 'n_units_l1': 117, 'dropout_rate': 0.31366284922342336, 'optimizer': 'RMSprop', 'lr': 6.0160078144777785e-05, 'activation': 'ReLU'}. Best is trial 29 with value: 0.30289140343666077.
✓ MLP_PyTorch completado. Mejor LogLoss: 0.3029

--- Optimizando LogisticRegression_Embeddings (Nivel 1) ---


Best trial: 0. Best value: 0.319224:   5%|▌         | 1/20 [00:00<00:09,  2.09it/s]

[I 2025-06-25 01:40:35,276] Trial 0 finished with value: 0.31922374641950113 and parameters: {'C': 0.017670169402947963}. Best is trial 0 with value: 0.31922374641950113.


Best trial: 0. Best value: 0.319224:  10%|█         | 2/20 [00:03<00:36,  2.03s/it]

[I 2025-06-25 01:40:38,388] Trial 1 finished with value: 1.5955142482896418 and parameters: {'C': 50.61576888752309}. Best is trial 0 with value: 0.31922374641950113.


Best trial: 0. Best value: 0.319224:  15%|█▌        | 3/20 [00:05<00:31,  1.85s/it]

[I 2025-06-25 01:40:40,022] Trial 2 finished with value: 0.8098998458689198 and parameters: {'C': 2.465832945854912}. Best is trial 0 with value: 0.31922374641950113.


Best trial: 0. Best value: 0.319224:  20%|██        | 4/20 [00:06<00:25,  1.57s/it]

[I 2025-06-25 01:40:41,155] Trial 3 finished with value: 0.50440823117129 and parameters: {'C': 0.39079671568228835}. Best is trial 0 with value: 0.31922374641950113.


Best trial: 0. Best value: 0.319224:  25%|██▌       | 5/20 [00:06<00:16,  1.08s/it]

[I 2025-06-25 01:40:41,372] Trial 4 finished with value: 0.4062108845720676 and parameters: {'C': 0.0008632008168602544}. Best is trial 0 with value: 0.31922374641950113.


Best trial: 0. Best value: 0.319224:  35%|███▌      | 7/20 [00:06<00:07,  1.75it/s]

[I 2025-06-25 01:40:41,587] Trial 5 finished with value: 0.4062311119461451 and parameters: {'C': 0.0008629132190071859}. Best is trial 0 with value: 0.31922374641950113.
[I 2025-06-25 01:40:41,724] Trial 6 finished with value: 0.49179866551459905 and parameters: {'C': 0.00022310108018679258}. Best is trial 0 with value: 0.31922374641950113.


Best trial: 0. Best value: 0.319224:  40%|████      | 8/20 [00:09<00:13,  1.13s/it]

[I 2025-06-25 01:40:44,031] Trial 7 finished with value: 1.2866922738134148 and parameters: {'C': 15.741890047456648}. Best is trial 0 with value: 0.31922374641950113.


Best trial: 0. Best value: 0.319224:  45%|████▌     | 9/20 [00:10<00:12,  1.14s/it]

[I 2025-06-25 01:40:45,202] Trial 8 finished with value: 0.5085943495265486 and parameters: {'C': 0.4042872735027334}. Best is trial 0 with value: 0.31922374641950113.


Best trial: 0. Best value: 0.319224:  50%|█████     | 10/20 [00:12<00:13,  1.36s/it]

[I 2025-06-25 01:40:47,067] Trial 9 finished with value: 0.7405196393655641 and parameters: {'C': 1.7718847354806828}. Best is trial 0 with value: 0.31922374641950113.


Best trial: 10. Best value: 0.319215:  55%|█████▌    | 11/20 [00:12<00:09,  1.10s/it]

[I 2025-06-25 01:40:47,587] Trial 10 finished with value: 0.31921511298162764 and parameters: {'C': 0.017654677164766052}. Best is trial 10 with value: 0.31921511298162764.


Best trial: 10. Best value: 0.319215:  60%|██████    | 12/20 [00:13<00:07,  1.09it/s]

[I 2025-06-25 01:40:48,070] Trial 11 finished with value: 0.31963801799006214 and parameters: {'C': 0.018388382022356858}. Best is trial 10 with value: 0.31921511298162764.


Best trial: 12. Best value: 0.31745:  65%|██████▌   | 13/20 [00:13<00:05,  1.28it/s] 

[I 2025-06-25 01:40:48,542] Trial 12 finished with value: 0.3174502549962926 and parameters: {'C': 0.012798666702408415}. Best is trial 12 with value: 0.3174502549962926.


Best trial: 13. Best value: 0.317428:  70%|███████   | 14/20 [00:14<00:04,  1.48it/s]

[I 2025-06-25 01:40:48,971] Trial 13 finished with value: 0.31742845333408026 and parameters: {'C': 0.012053282859112415}. Best is trial 13 with value: 0.31742845333408026.


Best trial: 13. Best value: 0.317428:  75%|███████▌  | 15/20 [00:14<00:02,  1.74it/s]

[I 2025-06-25 01:40:49,318] Trial 14 finished with value: 0.345137252213233 and parameters: {'C': 0.0028718250465162133}. Best is trial 13 with value: 0.31742845333408026.


Best trial: 13. Best value: 0.317428:  80%|████████  | 16/20 [00:15<00:02,  1.57it/s]

[I 2025-06-25 01:40:50,085] Trial 15 finished with value: 0.3535279019185097 and parameters: {'C': 0.061296196372412286}. Best is trial 13 with value: 0.31742845333408026.


Best trial: 13. Best value: 0.317428:  85%|████████▌ | 17/20 [00:15<00:01,  1.85it/s]

[I 2025-06-25 01:40:50,418] Trial 16 finished with value: 0.338999567018194 and parameters: {'C': 0.003409264942037956}. Best is trial 13 with value: 0.31742845333408026.


Best trial: 13. Best value: 0.317428:  90%|█████████ | 18/20 [00:16<00:01,  1.53it/s]

[I 2025-06-25 01:40:51,314] Trial 17 finished with value: 0.3731334741878992 and parameters: {'C': 0.08901581332229032}. Best is trial 13 with value: 0.31742845333408026.


Best trial: 13. Best value: 0.317428: 100%|██████████| 20/20 [00:17<00:00,  1.17it/s]
[I 2025-06-25 01:40:51,866] A new study created in memory with name: no-name-a40234a1-c907-4a16-a5cf-642ad3066311


[I 2025-06-25 01:40:51,702] Trial 18 finished with value: 0.33286536853181986 and parameters: {'C': 0.00415403421342495}. Best is trial 13 with value: 0.31742845333408026.
[I 2025-06-25 01:40:51,862] Trial 19 finished with value: 0.5296032502671237 and parameters: {'C': 0.00011996661220636725}. Best is trial 13 with value: 0.31742845333408026.
✓ LogisticRegression_Embeddings completado. Mejor LogLoss: 0.3174

--- Optimizando LogisticRegression_TFIDF (Nivel 1) ---


Best trial: 7. Best value: 0.183261:  35%|███▌      | 7/20 [00:00<00:00, 35.84it/s]

[I 2025-06-25 01:40:51,885] Trial 0 finished with value: 0.35527615580839433 and parameters: {'C': 0.31489116479568624}. Best is trial 0 with value: 0.35527615580839433.
[I 2025-06-25 01:40:51,922] Trial 1 finished with value: 0.1924042412264549 and parameters: {'C': 63.512210106407046}. Best is trial 1 with value: 0.1924042412264549.
[I 2025-06-25 01:40:51,953] Trial 2 finished with value: 0.18620053322128527 and parameters: {'C': 8.471801418819979}. Best is trial 2 with value: 0.18620053322128527.
[I 2025-06-25 01:40:51,978] Trial 3 finished with value: 0.21830131127974442 and parameters: {'C': 2.481040974867813}. Best is trial 2 with value: 0.18620053322128527.
[I 2025-06-25 01:40:51,992] Trial 4 finished with value: 0.5133252839930367 and parameters: {'C': 0.04207988669606638}. Best is trial 2 with value: 0.18620053322128527.
[I 2025-06-25 01:40:52,006] Trial 5 finished with value: 0.5133373460531051 and parameters: {'C': 0.042070539502879395}. Best is trial 2 with value: 0.1862005


[I 2025-06-25 01:40:52,073] Trial 8 finished with value: 0.21739142624844643 and parameters: {'C': 2.5378155082656657}. Best is trial 7 with value: 0.18326064109811352.
[I 2025-06-25 01:40:52,099] Trial 9 finished with value: 0.18956593831523136 and parameters: {'C': 6.79657809075816}. Best is trial 7 with value: 0.18326064109811352.
[I 2025-06-25 01:40:52,136] Trial 10 finished with value: 0.19640185578666278 and parameters: {'C': 82.29631658321766}. Best is trial 7 with value: 0.18326064109811352.
[I 2025-06-25 01:40:52,169] Trial 11 finished with value: 0.18159010722748264 and parameters: {'C': 20.996451336399733}. Best is trial 11 with value: 0.18159010722748264.
[I 2025-06-25 01:40:52,201] Trial 12 finished with value: 0.18459080131066244 and parameters: {'C': 34.043053954671954}. Best is trial 11 with value: 0.18159010722748264.
[I 2025-06-25 01:40:52,233] Trial 13 finished with value: 0.18140298496485857 and parameters: {'C': 17.14446414712941}. Best is trial 13 with value: 0.18

Best trial: 17. Best value: 0.181396: 100%|██████████| 20/20 [00:00<00:00, 36.50it/s]

[I 2025-06-25 01:40:52,252] Trial 14 finished with value: 0.33732918864876865 and parameters: {'C': 0.3880040916855655}. Best is trial 13 with value: 0.18140298496485857.
[I 2025-06-25 01:40:52,281] Trial 15 finished with value: 0.1836636745703452 and parameters: {'C': 10.66261287572915}. Best is trial 13 with value: 0.18140298496485857.
[I 2025-06-25 01:40:52,304] Trial 16 finished with value: 0.27098679035802226 and parameters: {'C': 0.9253254956089687}. Best is trial 13 with value: 0.18140298496485857.
[I 2025-06-25 01:40:52,334] Trial 17 finished with value: 0.18139642209754364 and parameters: {'C': 18.63934843098346}. Best is trial 17 with value: 0.18139642209754364.
[I 2025-06-25 01:40:52,378] Trial 18 finished with value: 0.20862394501160347 and parameters: {'C': 3.217618717445632}. Best is trial 17 with value: 0.18139642209754364.
[I 2025-06-25 01:40:52,414] Trial 19 finished with value: 0.19892671588852887 and parameters: {'C': 96.04396902719638}. Best is trial 17 with value: 




In [None]:
main_classifier_models = {} 
print("--- Entrenando modelos finales con los mejores hiperparámetros ---\n")

# 1. XGBoost
params = model_results['XGBoost']['best_params']
final_xgb = xgb.XGBClassifier(objective='binary:logistic', eval_metric='logloss',
                              device='cuda' if device.type == 'cuda' else 'cpu',
                              **params)
final_xgb.fit(X_train_emb, y_train)
main_classifier_models['XGBoost'] = final_xgb 
print("✓ Modelo XGBoost final entrenado.")

# 2. Logistic regression (embeddings)
params = model_results['LogisticRegression_Embeddings']['best_params']
final_log_emb = LogisticRegression(solver='liblinear', random_state=42, max_iter=1000, **params)
final_log_emb.fit(X_train_emb_scaled, y_train)
main_classifier_models['LogisticRegression_Embeddings'] = final_log_emb
print("✓ Modelo Regresión Logística (Embeddings) final entrenado.")

# 3. Logistic regression (TF-IDF)
params = model_results['LogisticRegression_TFIDF']['best_params']
final_log_tfidf = LogisticRegression(solver='liblinear', random_state=42, max_iter=1000, **params)
final_log_tfidf.fit(X_train_tfidf, y_train)
main_classifier_models['LogisticRegression_TFIDF'] = final_log_tfidf
print("✓ Modelo Regresión Logística (TF-IDF) final entrenado.")

# 4. MLP (PyTorch)
params = model_results['MLP_PyTorch']['best_params']
hidden_layers = [params[f'n_units_l{i}'] for i in range(params['n_layers'])]
final_mlp = MLP(input_size=X_train_emb_scaled.shape[1], hidden_layers=hidden_layers, output_size=num_classes, 
                activation_fn=getattr(nn, params['activation']), dropout_rate=params['dropout_rate']).to(device)
optimizer = getattr(optim, params['optimizer'])(final_mlp.parameters(), lr=params['lr'])
criterion = nn.CrossEntropyLoss()
train_dataset = TensorDataset(torch.tensor(X_train_emb_scaled, dtype=torch.float32), torch.tensor(y_train, dtype=torch.long))
train_loader = DataLoader(train_dataset, batch_size=128, shuffle=True)

for epoch in tqdm(range(30), desc="Epochs MLP final"):
    final_mlp.train()
    for data, target in train_loader:
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = final_mlp(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
main_classifier_models['MLP_PyTorch'] = final_mlp.eval()  # eval() disables dropout for inference
print("✓ Modelo MLP (PyTorch) final entrenado.")

print("\n✓ Todos los modelos finales han sido entrenados.")

--- Entrenando modelos finales con los mejores hiperparámetros ---

✓ Modelo XGBoost final entrenado.
✓ Modelo Regresión Logística (Embeddings) final entrenado.
✓ Modelo Regresión Logística (TF-IDF) final entrenado.


Epochs MLP final: 100%|██████████| 30/30 [00:01<00:00, 15.77it/s]

✓ Modelo MLP (PyTorch) final entrenado.

✓ Todos los modelos finales han sido entrenados.





In [10]:
print("--- Calculando pesos para el Ensemble del Clasificador Principal (Nivel 1) ---")
val_probas = {}

# Get each model's predicted probabilities on the validation set
val_probas['XGBoost'] = main_classifier_models['XGBoost'].predict_proba(X_val_emb)
val_probas['LogisticRegression_Embeddings'] = main_classifier_models['LogisticRegression_Embeddings'].predict_proba(X_val_emb_scaled)
val_probas['LogisticRegression_TFIDF'] = main_classifier_models['LogisticRegression_TFIDF'].predict_proba(X_val_tfidf)
with torch.no_grad():
    mlp_outputs = main_classifier_models['MLP_PyTorch'](X_val_torch)
    val_probas['MLP_PyTorch'] = torch.softmax(mlp_outputs, dim=1).cpu().numpy()

# Compute ensemble weights: each weight is proportional to 1 / log_loss, so a lower loss gives a higher weight
losses = {name: log_loss(y_val, proba) for name, proba in val_probas.items()}
scores = {name: 1.0 / (loss + 1e-9) for name, loss in losses.items()}
total_score = sum(scores.values())
ensemble_weights = {name: score / total_score for name, score in scores.items()}

print("\n--- Pesos del Ensemble de Nivel 1 Calculados ---")
for name, w in ensemble_weights.items(): 
    print(f"{name:<30} | Peso: {w:.3f} | LogLoss (Val): {losses[name]:.4f}")

# For reference, also evaluate the weighted ensemble on the validation set
ensemble_proba_val = np.zeros_like(val_probas['XGBoost'])
for name, proba in val_probas.items():
    ensemble_proba_val += proba * ensemble_weights[name]

ensemble_log_loss_val = log_loss(y_val, ensemble_proba_val)
print(f"\nLogLoss del Ensemble L1 en Validación: {ensemble_log_loss_val:.4f}")

--- Calculando pesos para el Ensemble del Clasificador Principal (Nivel 1) ---

--- Pesos del Ensemble de Nivel 1 Calculados ---
XGBoost                        | Peso: 0.187 | LogLoss (Val): 0.3662
LogisticRegression_Embeddings  | Peso: 0.216 | LogLoss (Val): 0.3174
LogisticRegression_TFIDF       | Peso: 0.377 | LogLoss (Val): 0.1814
MLP_PyTorch                    | Peso: 0.220 | LogLoss (Val): 0.3110

LogLoss del Ensemble L1 en Validación: 0.2174
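
The weighting rule in the cell above gives each model a weight proportional to the inverse of its validation log loss, normalized so the weights sum to 1. As a quick check, the minimal sketch below recomputes the weights from the rounded losses printed above. The helper name `inverse_loss_weights` is illustrative and not part of the notebook.

```python
def inverse_loss_weights(losses, eps=1e-9):
    """Return weights proportional to 1 / loss, normalized to sum to 1."""
    # eps guards against division by zero, mirroring the 1e-9 used in the notebook
    scores = {name: 1.0 / (loss + eps) for name, loss in losses.items()}
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}


# Validation log losses copied from the printed output (rounded to 4 decimals)
val_losses = {
    "XGBoost": 0.3662,
    "LogisticRegression_Embeddings": 0.3174,
    "LogisticRegression_TFIDF": 0.1814,
    "MLP_PyTorch": 0.3110,
}

weights = inverse_loss_weights(val_losses)
for name, w in weights.items():
    print(f"{name:<30} | weight: {w:.3f}")
```

Because the weights scale with 1 / loss rather than with rank, the TF-IDF model, whose loss is about half that of XGBoost, receives about twice the weight, and no model is ever dropped entirely.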


## 6. Training the Sub-category Classifier (Level 2) with Optuna and an Ensemble

We now apply the same methodology to the Level 2 classifier: a per-model hyperparameter search with Optuna, followed by an ensemble weighted by inverse validation log loss. This classifier is trained **only on the synthetically balanced 'hate' examples**. The ensemble combines three models (XGBoost, MLP, logistic regression), all of which use the embeddings as features.
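
To make the two-level structure concrete, here is a minimal sketch of how a single text could be routed at prediction time. It assumes a 0.5 decision threshold at Level 1 and a weighted soft vote at Level 2. The function names, the threshold, the three-label set, and the toy probabilities are all hypothetical, and the notebook's own evaluation code may differ.

```python
def weighted_soft_vote(probas, weights):
    """Combine per-model class-probability lists into one weighted average."""
    n_classes = len(next(iter(probas.values())))
    combined = [0.0] * n_classes
    for name, p in probas.items():
        for k in range(n_classes):
            combined[k] += weights[name] * p[k]
    return combined


def hierarchical_predict(p_hate, sub_probas, sub_weights, sub_labels, threshold=0.5):
    """Return 'no-hate', or the sub-category with the highest ensemble probability."""
    # Level 1: only texts flagged as hate are passed on to Level 2
    if p_hate < threshold:
        return "no-hate"
    # Level 2: weighted soft vote over the sub-category models
    combined = weighted_soft_vote(sub_probas, sub_weights)
    best = max(range(len(combined)), key=combined.__getitem__)
    return sub_labels[best]


# Toy inputs (hypothetical): three sub-categories and three Level 2 models
sub_labels = ["sexism", "racism", "other"]
sub_weights = {"XGBoost_L2": 0.3, "MLP_PyTorch_L2": 0.4, "LogisticRegression_L2": 0.3}
sub_probas = {
    "XGBoost_L2": [0.5, 0.3, 0.2],
    "MLP_PyTorch_L2": [0.2, 0.6, 0.2],
    "LogisticRegression_L2": [0.3, 0.5, 0.2],
}

print(hierarchical_predict(0.9, sub_probas, sub_weights, sub_labels))
print(hierarchical_predict(0.1, sub_probas, sub_weights, sub_labels))
```

One consequence of this design is that Level 2 never sees non-hate text during training, so any Level 1 false positive is still forced into one of the hate sub-categories.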

In [11]:
print("--- Preparando datos y definiendo objetivos para el Clasificador de Sub-categorías (Nivel 2) ---")

# 1. Split the augmented 'hate' data into its own training and validation sets for HPO
if X_train_sub.shape[0] > 0:
    X_sub_train_emb, X_sub_val_emb, y_sub_train, y_sub_val = train_test_split(
        X_train_sub, y_train_sub, test_size=0.25, random_state=42, stratify=y_train_sub
    )

    # 2. Scale the embeddings for Level 2 (the scaler is fit on the training split only)
    scaler_sub = StandardScaler()
    X_sub_train_emb_scaled = scaler_sub.fit_transform(X_sub_train_emb)
    X_sub_val_emb_scaled = scaler_sub.transform(X_sub_val_emb)

    # 3. Convert the validation data to PyTorch tensors for Level 2
    X_sub_val_torch = torch.tensor(X_sub_val_emb_scaled, dtype=torch.float32).to(device)
    y_sub_val_torch = torch.tensor(y_sub_val, dtype=torch.long).to(device)
    
    num_sub_classes = len(np.unique(y_train_sub))
    print(f"Datos de Nivel 2 listos. {num_sub_classes} sub-clases detectadas.")

# --- Optuna objective functions (Level 2) ---

def objective_xgboost_L2(trial):
    params = {
        'objective': 'multi:softprob', 'num_class': num_sub_classes, 'eval_metric': 'mlogloss',
        'device': 'cuda' if device.type == 'cuda' else 'cpu',
        'n_estimators': trial.suggest_int('n_estimators', 100, 800),
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.2, log=True),
        'max_depth': trial.suggest_int('max_depth', 3, 8),
    }
    model = xgb.XGBClassifier(**params, early_stopping_rounds=10)
    model.fit(X_sub_train_emb, y_sub_train, eval_set=[(X_sub_val_emb, y_sub_val)], verbose=False)
    return log_loss(y_sub_val, model.predict_proba(X_sub_val_emb))

def objective_logistic_L2(trial):
    # liblinear always fits multiclass problems one-vs-rest, so multi_class is not set explicitly
    params = {'C': trial.suggest_float('C', 1e-3, 1e2, log=True), 'solver': 'liblinear', 'max_iter': 1000}
    model = LogisticRegression(**params, random_state=42)
    model.fit(X_sub_train_emb_scaled, y_sub_train)
    return log_loss(y_sub_val, model.predict_proba(X_sub_val_emb_scaled))
    
def objective_mlp_L2(trial):
    n_layers = trial.suggest_int('n_layers', 1, 2)
    hidden_layers = [trial.suggest_int(f'n_units_l{i}', 32, 128) for i in range(n_layers)]
    lr = trial.suggest_float('lr', 1e-4, 1e-2, log=True)
    
    model = MLP(X_train_sub.shape[1], hidden_layers, num_sub_classes, nn.ReLU, 0.3).to(device)
    optimizer = optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    train_dataset = TensorDataset(torch.tensor(X_sub_train_emb_scaled, dtype=torch.float32), torch.tensor(y_sub_train, dtype=torch.long))
    train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)

    for epoch in range(20):
        for data, target in train_loader:
            data, target = data.to(device), target.to(device)
            optimizer.zero_grad()
            loss = criterion(model(data), target)
            loss.backward()
            optimizer.step()

    model.eval()
    with torch.no_grad():
        val_loss = criterion(model(X_sub_val_torch), y_sub_val_torch).item()
    return val_loss

print("Funciones objetivo para Nivel 2 definidas.")

--- Preparando datos y definiendo objetivos para el Clasificador de Sub-categorías (Nivel 2) ---
Datos de Nivel 2 listos. 9 sub-clases detectadas.
Funciones objetivo para Nivel 2 definidas.
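
The three objectives are directly comparable because `nn.CrossEntropyLoss`, with its default mean reduction applied to logits, computes the same quantity as the multiclass log loss on softmax probabilities (both use the natural logarithm). The pure-Python sketch below illustrates that equivalence on made-up logits. It also computes the loss of a uniform guess over the 9 sub-classes, ln 9 ≈ 2.197, which is a useful reference point when reading the optimization logs. All values and helper names here are illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax for one list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def log_loss_from_probs(y_true, probs):
    """Mean negative log-probability assigned to the true class."""
    return -sum(math.log(p[y]) for y, p in zip(y_true, probs)) / len(y_true)


def cross_entropy_from_logits(y_true, logits):
    """Mean of logsumexp(z) - z[y], the form used for cross-entropy on logits."""
    losses = []
    for y, z in zip(y_true, logits):
        m = max(z)
        lse = m + math.log(sum(math.exp(v - m) for v in z))
        losses.append(lse - z[y])
    return sum(losses) / len(losses)


# Made-up logits for two samples and three classes
logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
y_true = [0, 2]

loss_from_probs = log_loss_from_probs(y_true, [softmax(z) for z in logits])
loss_from_logits = cross_entropy_from_logits(y_true, logits)
uniform_baseline = math.log(9)  # loss of predicting 1/9 for every sub-class

print(f"log loss on softmax probabilities: {loss_from_probs:.6f}")
print(f"cross-entropy on logits:           {loss_from_logits:.6f}")
print(f"uniform 9-class baseline:          {uniform_baseline:.4f}")
```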


In [12]:
if X_train_sub.shape[0] > 0:
    # --- 1. Hyperparameter optimization (HPO) for Level 2 ---
    models_config_L2 = {
        'XGBoost_L2': {'objective_func': objective_xgboost_L2, 'n_trials': 20},
        'MLP_PyTorch_L2': {'objective_func': objective_mlp_L2, 'n_trials': 25},
        'LogisticRegression_L2': {'objective_func': objective_logistic_L2, 'n_trials': 15}
    }
    model_results_L2 = {}
    for model_name, config in models_config_L2.items():
        print(f"\n--- Optimizando {model_name} (Nivel 2) ---")
        study = optuna.create_study(direction='minimize', sampler=optuna.samplers.TPESampler(seed=42))
        study.optimize(config['objective_func'], n_trials=config['n_trials'], show_progress_bar=True)
        model_results_L2[model_name] = {'best_params': study.best_params}

    # --- 2. Train the final models of the Level 2 ensemble ---
    print("\n--- Entrenando modelos finales del Ensemble (Nivel 2) ---")
    sub_classifier_models = {}
    
    # Train on the full (augmented) sub-category data, reusing the scaler fit on the HPO training split
    X_sub_train_full_scaled = scaler_sub.transform(X_train_sub)
    
    # XGBoost L2
    params = model_results_L2['XGBoost_L2']['best_params']
    final_xgb_L2 = xgb.XGBClassifier(objective='multi:softprob', num_class=num_sub_classes, eval_metric='mlogloss',
                                     device='cuda' if device.type == 'cuda' else 'cpu', **params)
    final_xgb_L2.fit(X_train_sub, y_train_sub)
    sub_classifier_models['XGBoost_L2'] = final_xgb_L2
    
    # Logistic Regression L2
    params = model_results_L2['LogisticRegression_L2']['best_params']
    final_log_L2 = LogisticRegression(solver='liblinear', random_state=42, max_iter=1000, **params)
    final_log_L2.fit(X_sub_train_full_scaled, y_train_sub)
    sub_classifier_models['LogisticRegression_L2'] = final_log_L2
    
    # MLP L2
    params = model_results_L2['MLP_PyTorch_L2']['best_params']
    hidden_layers = [params[f'n_units_l{i}'] for i in range(params['n_layers'])]
    final_mlp_L2 = MLP(X_train_sub.shape[1], hidden_layers, num_sub_classes, nn.ReLU, 0.3).to(device)
    optimizer = optim.Adam(final_mlp_L2.parameters(), lr=params['lr'])
    train_dataset_L2 = TensorDataset(torch.tensor(X_sub_train_full_scaled, dtype=torch.float32), torch.tensor(y_train_sub, dtype=torch.long))
    train_loader_L2 = DataLoader(train_dataset_L2, batch_size=64, shuffle=True)
    criterion_L2 = nn.CrossEntropyLoss()  # build the loss once instead of on every batch
    final_mlp_L2.train()
    for epoch in tqdm(range(30), desc="Epochs MLP L2 final"):
        for data, target in train_loader_L2:
            data, target = data.to(device), target.to(device)
            optimizer.zero_grad()
            loss = criterion_L2(final_mlp_L2(data), target)
            loss.backward()
            optimizer.step()
    sub_classifier_models['MLP_PyTorch_L2'] = final_mlp_L2.eval()
    print("✓ Todos los modelos del ensemble de Nivel 2 han sido entrenados.")

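    # --- 3. Compute weights for the Level 2 ensemble ---
    # Caveat: the final models above were fit on all of X_train_sub, which contains this
    # validation split, so the validation losses below are optimistic estimates.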
    print("\n--- Calculando pesos para el Ensemble de Nivel 2 ---")
    val_probas_L2 = {}
    val_probas_L2['XGBoost_L2'] = sub_classifier_models['XGBoost_L2'].predict_proba(X_sub_val_emb)
    val_probas_L2['LogisticRegression_L2'] = sub_classifier_models['LogisticRegression_L2'].predict_proba(X_sub_val_emb_scaled)
    with torch.no_grad():
        mlp_outputs = sub_classifier_models['MLP_PyTorch_L2'](X_sub_val_torch)
        val_probas_L2['MLP_PyTorch_L2'] = torch.softmax(mlp_outputs, dim=1).cpu().numpy()

    losses_L2 = {name: log_loss(y_sub_val, proba, labels=np.unique(y_train_sub)) for name, proba in val_probas_L2.items()}
    scores_L2 = {name: 1.0 / (loss + 1e-9) for name, loss in losses_L2.items()}
    total_score_L2 = sum(scores_L2.values())
    ensemble_weights_L2 = {name: score / total_score_L2 for name, score in scores_L2.items()}

    print("\n--- Pesos del Ensemble de Nivel 2 Calculados ---")
    for name, w in ensemble_weights_L2.items():
        print(f"{name:<25} | Peso: {w:.3f} | LogLoss (Val): {losses_L2[name]:.4f}")
else:
    print("No hay datos para entrenar el clasificador de Nivel 2.")
    sub_classifier_models = None
    ensemble_weights_L2 = None

[I 2025-06-25 01:41:01,204] A new study created in memory with name: no-name-8f0f151a-d40e-40fb-acea-3645c80c84df



--- Optimizando XGBoost_L2 (Nivel 2) ---


Best trial: 0. Best value: 1.8716:   5%|▌         | 1/20 [00:08<02:47,  8.84s/it]

[I 2025-06-25 01:41:10,038] Trial 0 finished with value: 1.871595459867491 and parameters: {'n_estimators': 362, 'learning_rate': 0.17254716573280354, 'max_depth': 7}. Best is trial 0 with value: 1.871595459867491.


Best trial: 1. Best value: 1.86129:  10%|█         | 2/20 [00:23<03:43, 12.43s/it]

[I 2025-06-25 01:41:24,988] Trial 1 finished with value: 1.8612874601066922 and parameters: {'n_estimators': 519, 'learning_rate': 0.015958237752949748, 'max_depth': 3}. Best is trial 1 with value: 1.8612874601066922.


Best trial: 2. Best value: 1.86103:  15%|█▌        | 3/20 [00:31<02:52, 10.18s/it]

[I 2025-06-25 01:41:32,477] Trial 2 finished with value: 1.8610292711706993 and parameters: {'n_estimators': 140, 'learning_rate': 0.13394334706750485, 'max_depth': 6}. Best is trial 2 with value: 1.8610292711706993.


Best trial: 2. Best value: 1.86103:  20%|██        | 4/20 [02:51<16:23, 61.48s/it]

[I 2025-06-25 01:43:52,598] Trial 3 finished with value: 1.874791693331618 and parameters: {'n_estimators': 596, 'learning_rate': 0.010636066512540286, 'max_depth': 8}. Best is trial 2 with value: 1.8610292711706993.


Best trial: 2. Best value: 1.86103:  25%|██▌       | 5/20 [03:07<11:16, 45.10s/it]

[I 2025-06-25 01:44:08,658] Trial 4 finished with value: 1.8715940756877123 and parameters: {'n_estimators': 683, 'learning_rate': 0.018891200276189388, 'max_depth': 4}. Best is trial 2 with value: 1.8610292711706993.


Best trial: 2. Best value: 1.86103:  30%|███       | 6/20 [03:42<09:45, 41.84s/it]

[I 2025-06-25 01:44:44,172] Trial 5 finished with value: 1.8694218882719285 and parameters: {'n_estimators': 228, 'learning_rate': 0.024878734419814436, 'max_depth': 6}. Best is trial 2 with value: 1.8610292711706993.


Best trial: 2. Best value: 1.86103:  35%|███▌      | 7/20 [04:24<09:02, 41.71s/it]

[I 2025-06-25 01:45:25,618] Trial 6 finished with value: 1.8724744668697775 and parameters: {'n_estimators': 402, 'learning_rate': 0.023927528765580644, 'max_depth': 6}. Best is trial 2 with value: 1.8610292711706993.


Best trial: 7. Best value: 1.8578:  40%|████      | 8/20 [04:43<06:54, 34.57s/it] 

[I 2025-06-25 01:45:44,911] Trial 7 finished with value: 1.8578027175501988 and parameters: {'n_estimators': 197, 'learning_rate': 0.023993242906812727, 'max_depth': 5}. Best is trial 7 with value: 1.8578027175501988.


Best trial: 7. Best value: 1.8578:  45%|████▌     | 9/20 [04:47<04:35, 25.00s/it]

[I 2025-06-25 01:45:48,865] Trial 8 finished with value: 1.867544689935029 and parameters: {'n_estimators': 419, 'learning_rate': 0.10508421338691762, 'max_depth': 4}. Best is trial 7 with value: 1.8578027175501988.


Best trial: 9. Best value: 1.8563:  50%|█████     | 10/20 [04:52<03:08, 18.85s/it]

[I 2025-06-25 01:45:53,948] Trial 9 finished with value: 1.8563040273049156 and parameters: {'n_estimators': 460, 'learning_rate': 0.05898602410432694, 'max_depth': 3}. Best is trial 9 with value: 1.8563040273049156.


Best trial: 9. Best value: 1.8563:  55%|█████▌    | 11/20 [04:56<02:07, 14.19s/it]

[I 2025-06-25 01:45:57,566] Trial 10 finished with value: 1.8685762915699486 and parameters: {'n_estimators': 762, 'learning_rate': 0.07008140236396194, 'max_depth': 3}. Best is trial 9 with value: 1.8563040273049156.


Best trial: 9. Best value: 1.8563:  60%|██████    | 12/20 [05:03<01:37, 12.18s/it]

[I 2025-06-25 01:46:05,138] Trial 11 finished with value: 1.8695312115856657 and parameters: {'n_estimators': 274, 'learning_rate': 0.047333757398100015, 'max_depth': 4}. Best is trial 9 with value: 1.8563040273049156.


Best trial: 9. Best value: 1.8563:  65%|██████▌   | 13/20 [05:14<01:22, 11.78s/it]

[I 2025-06-25 01:46:16,005] Trial 12 finished with value: 1.8768729256920107 and parameters: {'n_estimators': 110, 'learning_rate': 0.03759084248991579, 'max_depth': 5}. Best is trial 9 with value: 1.8563040273049156.


Best trial: 9. Best value: 1.8563:  70%|███████   | 14/20 [05:27<01:12, 12.16s/it]

[I 2025-06-25 01:46:29,050] Trial 13 finished with value: 1.8677808131398315 and parameters: {'n_estimators': 551, 'learning_rate': 0.049268475811625474, 'max_depth': 5}. Best is trial 9 with value: 1.8563040273049156.


Best trial: 14. Best value: 1.85471:  75%|███████▌  | 15/20 [05:36<00:55, 11.09s/it]

[I 2025-06-25 01:46:37,663] Trial 14 finished with value: 1.8547085865588189 and parameters: {'n_estimators': 300, 'learning_rate': 0.034068566764668934, 'max_depth': 3}. Best is trial 14 with value: 1.8547085865588189.


Best trial: 14. Best value: 1.85471:  80%|████████  | 16/20 [05:40<00:35,  8.85s/it]

[I 2025-06-25 01:46:41,310] Trial 15 finished with value: 1.8700549570494414 and parameters: {'n_estimators': 323, 'learning_rate': 0.07898927623865827, 'max_depth': 3}. Best is trial 14 with value: 1.8547085865588189.


Best trial: 16. Best value: 1.85446:  85%|████████▌ | 17/20 [05:47<00:25,  8.49s/it]

[I 2025-06-25 01:46:48,953] Trial 16 finished with value: 1.8544610805343593 and parameters: {'n_estimators': 494, 'learning_rate': 0.03697206827624653, 'max_depth': 3}. Best is trial 16 with value: 1.8544610805343593.


Best trial: 16. Best value: 1.85446:  90%|█████████ | 18/20 [05:58<00:18,  9.29s/it]

[I 2025-06-25 01:47:00,120] Trial 17 finished with value: 1.8641441361134103 and parameters: {'n_estimators': 619, 'learning_rate': 0.03623633854174203, 'max_depth': 4}. Best is trial 16 with value: 1.8544610805343593.


Best trial: 16. Best value: 1.85446:  95%|█████████▌| 19/20 [06:06<00:08,  8.76s/it]

[I 2025-06-25 01:47:07,633] Trial 18 finished with value: 1.8552868984266362 and parameters: {'n_estimators': 478, 'learning_rate': 0.03444677227502705, 'max_depth': 3}. Best is trial 16 with value: 1.8544610805343593.


Best trial: 16. Best value: 1.85446: 100%|██████████| 20/20 [06:24<00:00, 19.22s/it]
[I 2025-06-25 01:47:25,517] A new study created in memory with name: no-name-9c4c901c-641a-42c2-a3b7-140e83fcf5b4


[I 2025-06-25 01:47:25,511] Trial 19 finished with value: 1.880659099069656 and parameters: {'n_estimators': 340, 'learning_rate': 0.012649749238063291, 'max_depth': 4}. Best is trial 16 with value: 1.8544610805343593.

--- Optimizando MLP_PyTorch_L2 (Nivel 2) ---


Best trial: 0. Best value: 4.43901:   4%|▍         | 1/25 [00:00<00:22,  1.07it/s]

[I 2025-06-25 01:47:26,449] Trial 0 finished with value: 4.4390106201171875 and parameters: {'n_layers': 1, 'n_units_l0': 124, 'lr': 0.0029106359131330704}. Best is trial 0 with value: 4.4390106201171875.


Best trial: 1. Best value: 1.80107:   8%|▊         | 2/25 [00:02<00:24,  1.05s/it]

[I 2025-06-25 01:47:27,576] Trial 1 finished with value: 1.8010735511779785 and parameters: {'n_layers': 2, 'n_units_l0': 47, 'n_units_l1': 47, 'lr': 0.00013066739238053285}. Best is trial 1 with value: 1.8010735511779785.


Best trial: 2. Best value: 1.77907:  12%|█▏        | 3/25 [00:03<00:23,  1.09s/it]

[I 2025-06-25 01:47:28,710] Trial 2 finished with value: 1.779065728187561 and parameters: {'n_layers': 2, 'n_units_l0': 90, 'n_units_l1': 100, 'lr': 0.00010994335574766199}. Best is trial 2 with value: 1.779065728187561.


Best trial: 2. Best value: 1.77907:  16%|█▌        | 4/25 [00:04<00:23,  1.12s/it]

[I 2025-06-25 01:47:29,876] Trial 3 finished with value: 1.8567672967910767 and parameters: {'n_layers': 2, 'n_units_l0': 112, 'n_units_l1': 52, 'lr': 0.0002310201887845295}. Best is trial 2 with value: 1.779065728187561.


Best trial: 2. Best value: 1.77907:  20%|██        | 5/25 [00:05<00:21,  1.05s/it]

[I 2025-06-25 01:47:30,819] Trial 4 finished with value: 2.6646475791931152 and parameters: {'n_layers': 1, 'n_units_l0': 61, 'lr': 0.0011207606211860567}. Best is trial 2 with value: 1.779065728187561.


Best trial: 2. Best value: 1.77907:  24%|██▍       | 6/25 [00:06<00:19,  1.02s/it]

[I 2025-06-25 01:47:31,776] Trial 5 finished with value: 3.113705635070801 and parameters: {'n_layers': 1, 'n_units_l0': 60, 'lr': 0.0016738085788752138}. Best is trial 2 with value: 1.779065728187561.


Best trial: 2. Best value: 1.77907:  28%|██▊       | 7/25 [00:07<00:18,  1.03s/it]

[I 2025-06-25 01:47:32,818] Trial 6 finished with value: 2.082794427871704 and parameters: {'n_layers': 1, 'n_units_l0': 60, 'lr': 0.0005404103854647331}. Best is trial 2 with value: 1.779065728187561.


Best trial: 2. Best value: 1.77907:  32%|███▏      | 8/25 [00:08<00:17,  1.01s/it]

[I 2025-06-25 01:47:33,792] Trial 7 finished with value: 1.8509299755096436 and parameters: {'n_layers': 1, 'n_units_l0': 108, 'lr': 0.00025081156860452336}. Best is trial 2 with value: 1.779065728187561.


Best trial: 2. Best value: 1.77907:  36%|███▌      | 9/25 [00:09<00:16,  1.06s/it]

[I 2025-06-25 01:47:34,968] Trial 8 finished with value: 3.2149441242218018 and parameters: {'n_layers': 2, 'n_units_l0': 89, 'n_units_l1': 36, 'lr': 0.0016409286730647919}. Best is trial 2 with value: 1.779065728187561.


Best trial: 2. Best value: 1.77907:  40%|████      | 10/25 [00:10<00:15,  1.04s/it]

[I 2025-06-25 01:47:35,951] Trial 9 finished with value: 4.390132427215576 and parameters: {'n_layers': 1, 'n_units_l0': 38, 'lr': 0.007902619549708232}. Best is trial 2 with value: 1.779065728187561.


Best trial: 2. Best value: 1.77907:  44%|████▍     | 11/25 [00:11<00:14,  1.07s/it]

[I 2025-06-25 01:47:37,084] Trial 10 finished with value: 1.7906246185302734 and parameters: {'n_layers': 2, 'n_units_l0': 86, 'n_units_l1': 120, 'lr': 0.00010862348973937149}. Best is trial 2 with value: 1.779065728187561.


Best trial: 11. Best value: 1.77796:  48%|████▊     | 12/25 [00:12<00:14,  1.08s/it]

[I 2025-06-25 01:47:38,191] Trial 11 finished with value: 1.7779641151428223 and parameters: {'n_layers': 2, 'n_units_l0': 87, 'n_units_l1': 124, 'lr': 0.0001051379852350776}. Best is trial 11 with value: 1.7779641151428223.


Best trial: 11. Best value: 1.77796:  52%|█████▏    | 13/25 [00:13<00:13,  1.12s/it]

[I 2025-06-25 01:47:39,399] Trial 12 finished with value: 2.5736336708068848 and parameters: {'n_layers': 2, 'n_units_l0': 94, 'n_units_l1': 117, 'lr': 0.0004492445130341701}. Best is trial 11 with value: 1.7779641151428223.


Best trial: 11. Best value: 1.77796:  56%|█████▌    | 14/25 [00:15<00:12,  1.14s/it]

[I 2025-06-25 01:47:40,599] Trial 13 finished with value: 1.8754067420959473 and parameters: {'n_layers': 2, 'n_units_l0': 74, 'n_units_l1': 92, 'lr': 0.00023587584846791236}. Best is trial 11 with value: 1.7779641151428223.


Best trial: 14. Best value: 1.76321:  60%|██████    | 15/25 [00:16<00:11,  1.15s/it]

[I 2025-06-25 01:47:41,782] Trial 14 finished with value: 1.7632132768630981 and parameters: {'n_layers': 2, 'n_units_l0': 101, 'n_units_l1': 98, 'lr': 0.00010165624431986141}. Best is trial 14 with value: 1.7632132768630981.


Best trial: 14. Best value: 1.76321:  64%|██████▍   | 16/25 [00:17<00:10,  1.16s/it]

[I 2025-06-25 01:47:42,952] Trial 15 finished with value: 2.860482692718506 and parameters: {'n_layers': 2, 'n_units_l0': 108, 'n_units_l1': 128, 'lr': 0.0005029515102335909}. Best is trial 14 with value: 1.7632132768630981.


Best trial: 14. Best value: 1.76321:  68%|██████▊   | 17/25 [00:18<00:09,  1.16s/it]

[I 2025-06-25 01:47:44,105] Trial 16 finished with value: 1.7973941564559937 and parameters: {'n_layers': 2, 'n_units_l0': 73, 'n_units_l1': 70, 'lr': 0.00019903453904446326}. Best is trial 14 with value: 1.7632132768630981.


Best trial: 14. Best value: 1.76321:  72%|███████▏  | 18/25 [00:19<00:08,  1.15s/it]

[I 2025-06-25 01:47:45,254] Trial 17 finished with value: 2.3447344303131104 and parameters: {'n_layers': 2, 'n_units_l0': 98, 'n_units_l1': 102, 'lr': 0.00036306296540862033}. Best is trial 14 with value: 1.7632132768630981.


Best trial: 14. Best value: 1.76321:  76%|███████▌  | 19/25 [00:20<00:06,  1.15s/it]

[I 2025-06-25 01:47:46,398] Trial 18 finished with value: 3.280683755874634 and parameters: {'n_layers': 2, 'n_units_l0': 127, 'n_units_l1': 80, 'lr': 0.0008371623317801159}. Best is trial 14 with value: 1.7632132768630981.


Best trial: 14. Best value: 1.76321:  80%|████████  | 20/25 [00:22<00:05,  1.15s/it]

[I 2025-06-25 01:47:47,547] Trial 19 finished with value: 2.5437557697296143 and parameters: {'n_layers': 2, 'n_units_l0': 77, 'n_units_l1': 109, 'lr': 0.009818247569037463}. Best is trial 14 with value: 1.7632132768630981.


Best trial: 14. Best value: 1.76321:  84%|████████▍ | 21/25 [00:23<00:04,  1.15s/it]

[I 2025-06-25 01:47:48,687] Trial 20 finished with value: 1.7949382066726685 and parameters: {'n_layers': 2, 'n_units_l0': 101, 'n_units_l1': 86, 'lr': 0.0001509391812068579}. Best is trial 14 with value: 1.7632132768630981.


Best trial: 14. Best value: 1.76321:  88%|████████▊ | 22/25 [00:24<00:03,  1.13s/it]

[I 2025-06-25 01:47:49,763] Trial 21 finished with value: 1.7887300252914429 and parameters: {'n_layers': 2, 'n_units_l0': 87, 'n_units_l1': 100, 'lr': 0.00010092848759937053}. Best is trial 14 with value: 1.7632132768630981.


Best trial: 14. Best value: 1.76321:  92%|█████████▏| 23/25 [00:25<00:02,  1.14s/it]

[I 2025-06-25 01:47:50,936] Trial 22 finished with value: 1.789739727973938 and parameters: {'n_layers': 2, 'n_units_l0': 119, 'n_units_l1': 111, 'lr': 0.00016004207537686886}. Best is trial 14 with value: 1.7632132768630981.


Best trial: 14. Best value: 1.76321:  96%|█████████▌| 24/25 [00:26<00:01,  1.16s/it]

[I 2025-06-25 01:47:52,129] Trial 23 finished with value: 1.7726421356201172 and parameters: {'n_layers': 2, 'n_units_l0': 103, 'n_units_l1': 71, 'lr': 0.00010412314673201341}. Best is trial 14 with value: 1.7632132768630981.


Best trial: 14. Best value: 1.76321: 100%|██████████| 25/25 [00:27<00:00,  1.11s/it]
[I 2025-06-25 01:47:53,312] A new study created in memory with name: no-name-337d2ae3-e82b-47e2-90f6-e0a8c5315415


[I 2025-06-25 01:47:53,302] Trial 24 finished with value: 2.0805864334106445 and parameters: {'n_layers': 2, 'n_units_l0': 102, 'n_units_l1': 68, 'lr': 0.00031656550337424684}. Best is trial 14 with value: 1.7632132768630981.

--- Optimizando LogisticRegression_L2 (Nivel 2) ---




[I 2025-06-25 01:47:56,544] Trial 0 finished with value: 2.4420234137546917 and parameters: {'C': 0.0745934328572655}. Best is trial 0 with value: 2.4420234137546917.




[I 2025-06-25 01:48:08,763] Trial 1 finished with value: 11.713641844902337 and parameters: {'C': 56.69849511478853}. Best is trial 0 with value: 2.4420234137546917.




[I 2025-06-25 01:48:17,260] Trial 2 finished with value: 7.775417121187121 and parameters: {'C': 4.5705630998014515}. Best is trial 0 with value: 2.4420234137546917.




[I 2025-06-25 01:48:23,509] Trial 3 finished with value: 5.057685363398628 and parameters: {'C': 0.9846738873614566}. Best is trial 0 with value: 2.4420234137546917.




[I 2025-06-25 01:48:24,859] Trial 4 finished with value: 1.886957353270857 and parameters: {'C': 0.006026889128682512}. Best is trial 4 with value: 1.886957353270857.




[I 2025-06-25 01:48:26,158] Trial 5 finished with value: 1.8869608312675032 and parameters: {'C': 0.0060252157362038605}. Best is trial 4 with value: 1.886957353270857.




[I 2025-06-25 01:48:27,059] Trial 6 finished with value: 1.9479914460996808 and parameters: {'C': 0.0019517224641449498}. Best is trial 4 with value: 1.886957353270857.




[I 2025-06-25 01:48:37,189] Trial 7 finished with value: 10.340652516021565 and parameters: {'C': 21.42302175774105}. Best is trial 4 with value: 1.886957353270857.




[I 2025-06-25 01:48:43,636] Trial 8 finished with value: 5.10131980703163 and parameters: {'C': 1.0129197956845732}. Best is trial 4 with value: 1.886957353270857.




[I 2025-06-25 01:48:51,490] Trial 9 finished with value: 7.270012121318602 and parameters: {'C': 3.4702669886504163}. Best is trial 4 with value: 1.886957353270857.




[I 2025-06-25 01:48:53,745] Trial 10 finished with value: 2.1273002751272605 and parameters: {'C': 0.03603517820107174}. Best is trial 4 with value: 1.886957353270857.




[I 2025-06-25 01:48:54,637] Trial 11 finished with value: 1.9992424353051588 and parameters: {'C': 0.0010359916440554257}. Best is trial 4 with value: 1.886957353270857.




[I 2025-06-25 01:48:56,353] Trial 12 finished with value: 1.9008056710068957 and parameters: {'C': 0.010735378215036786}. Best is trial 4 with value: 1.886957353270857.




[I 2025-06-25 01:48:58,050] Trial 13 finished with value: 1.8902156589824692 and parameters: {'C': 0.00864552861056482}. Best is trial 4 with value: 1.886957353270857.


Best trial: 4. Best value: 1.88696: 100%|██████████| 15/15 [01:07<00:00,  4.53s/it]


[I 2025-06-25 01:49:01,232] Trial 14 finished with value: 2.5945114216694423 and parameters: {'C': 0.09683790422339339}. Best is trial 4 with value: 1.886957353270857.

--- Entrenando modelos finales del Ensemble (Nivel 2) ---


Epochs MLP L2 final: 100%|██████████| 30/30 [00:02<00:00, 11.48it/s]


✓ Todos los modelos del ensemble de Nivel 2 han sido entrenados.

--- Calculando pesos para el Ensemble de Nivel 2 ---

--- Pesos del Ensemble de Nivel 2 Calculados ---
XGBoost_L2                | Peso: 0.575 | LogLoss (Val): 0.4335
LogisticRegression_L2     | Peso: 0.178 | LogLoss (Val): 1.4022
MLP_PyTorch_L2            | Peso: 0.247 | LogLoss (Val): 1.0108
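
The printed weights are consistent with normalizing the inverse of each model's validation log loss: for example, 1/0.4335 divided by the sum of the three inverses gives roughly 0.575. The weighting code itself is not shown in this section, so the following is only a minimal sketch under that assumption; `inverse_logloss_weights` is a hypothetical helper name, not a function taken from the notebook.

```python
def inverse_logloss_weights(val_losses):
    """Map {model_name: validation log loss} to weights that sum to 1.

    Assumption: each weight is proportional to 1 / log_loss,
    so a lower validation loss earns a larger share of the vote.
    """
    inverse = {name: 1.0 / loss for name, loss in val_losses.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


# Validation log losses copied from the output above.
val_losses_L2 = {
    "XGBoost_L2": 0.4335,
    "LogisticRegression_L2": 1.4022,
    "MLP_PyTorch_L2": 1.0108,
}

weights_L2 = inverse_logloss_weights(val_losses_L2)
for name, weight in weights_L2.items():
    print(f"{name:<25} | weight: {weight:.3f}")
```

Under this scheme a model with a high log loss still contributes to the vote, just with less influence, which may be preferable to dropping it outright when the ensemble is small.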


## 7. Final Evaluation of the Robust Hierarchical Pipeline

We evaluate the complete pipeline on the test set. First, the **weighted Level 1 ensemble** predicts "Hate vs. Not-Hate". Then the **weighted Level 2 ensemble** predicts the sub-category; to keep the Level 2 metric independent of Level 1 errors, it is scored only on the test rows whose true label is hate.
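
The two-level routing can be sketched in isolation as follows. This is an illustrative sketch rather than the notebook's code: it uses plain Python lists instead of NumPy arrays to stay dependency-free, the helper names (`weighted_soft_vote`, `hierarchical_predict`) are hypothetical, and the probabilities and weights are made-up toy values. It shows inference-time routing, where Level 2 is consulted only for rows that Level 1 predicts as hate.

```python
def weighted_soft_vote(probas_by_model, weights):
    """Combine per-model class probabilities into one weighted sum per row."""
    names = list(probas_by_model)
    n_rows = len(probas_by_model[names[0]])
    n_classes = len(probas_by_model[names[0]][0])
    combined = [[0.0] * n_classes for _ in range(n_rows)]
    for name in names:
        for i, row in enumerate(probas_by_model[name]):
            for j, p in enumerate(row):
                combined[i][j] += weights[name] * p
    return combined


def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=row.__getitem__)


def hierarchical_predict(l1_probas, l1_weights, l2_probas, l2_weights, sub_labels):
    """Return 'not-hate' or the predicted sub-category for each row."""
    l1 = weighted_soft_vote(l1_probas, l1_weights)
    l2 = weighted_soft_vote(l2_probas, l2_weights)
    preds = []
    for p1, p2 in zip(l1, l2):
        if argmax(p1) == 1:  # class index 1 is assumed to mean "hate"
            preds.append(sub_labels[argmax(p2)])
        else:
            preds.append("not-hate")
    return preds


# Toy inputs: two rows, two hypothetical models ("A" and "B") per level.
l1_probas = {"A": [[0.9, 0.1], [0.2, 0.8]], "B": [[0.6, 0.4], [0.4, 0.6]]}
l1_weights = {"A": 0.7, "B": 0.3}
l2_probas = {"A": [[0.5, 0.5], [0.1, 0.9]], "B": [[0.5, 0.5], [0.7, 0.3]]}
l2_weights = {"A": 0.6, "B": 0.4}

predictions = hierarchical_predict(
    l1_probas, l1_weights, l2_probas, l2_weights, ["Gender", "Race"]
)
print(predictions)
```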

In [None]:
print("--- Evaluación del pipeline jerárquico en el conjunto de prueba ---")

# 1. Prepare test features for all models
X_test_emb_eval = df_test[embedding_cols].values
X_test_emb_scaled_eval = scaler.transform(X_test_emb_eval)
X_test_text_eval = df_test['text_stemmed'].values
X_test_tfidf_eval = tfidf_vectorizer.transform(X_test_text_eval)
X_test_torch_eval = torch.tensor(X_test_emb_scaled_eval, dtype=torch.float32).to(device)
y_main_true = df_test['main_label'].values

# 2. Get predictions from the Level 1 ENSEMBLE
test_probas_L1 = {}
test_probas_L1['XGBoost'] = main_classifier_models['XGBoost'].predict_proba(X_test_emb_eval)
test_probas_L1['LogisticRegression_Embeddings'] = main_classifier_models['LogisticRegression_Embeddings'].predict_proba(X_test_emb_scaled_eval)
test_probas_L1['LogisticRegression_TFIDF'] = main_classifier_models['LogisticRegression_TFIDF'].predict_proba(X_test_tfidf_eval)
# Switch the MLP to inference mode so layers such as dropout behave deterministically
main_classifier_models['MLP_PyTorch'].eval()
with torch.no_grad():
    mlp_outputs = main_classifier_models['MLP_PyTorch'](X_test_torch_eval)
    test_probas_L1['MLP_PyTorch'] = torch.softmax(mlp_outputs, dim=1).cpu().numpy()

final_ensemble_proba_L1 = np.zeros_like(test_probas_L1['XGBoost'])
for name, proba in test_probas_L1.items():
    final_ensemble_proba_L1 += proba * ensemble_weights[name]
y_main_pred = np.argmax(final_ensemble_proba_L1, axis=1)

# 3. Evaluate Level 1
print("\n--- [Nivel 1] Rendimiento del Ensemble Principal (Prueba) ---")
print(classification_report(y_main_true, y_main_pred, target_names=['not-hate', 'hate']))

# 4. Get and evaluate predictions from the Level 2 ENSEMBLE
if sub_classifier_models is not None:
    # Evaluate only on rows that are TRULY hate, so Level 1 errors do not distort the metric
    df_test_true_hate = df_test[df_test['main_label'] == 1].copy()
    if not df_test_true_hate.empty:
        # Encode the true sub-labels of this subset with the dedicated hate-only encoder
        y_sub_true = sub_hate_only_encoder.transform(df_test_true_hate['sub_label_str'])
        
        # Prepare data for the L2 ensemble
        X_test_true_hate_emb = df_test_true_hate[embedding_cols].values
        X_test_true_hate_emb_scaled = scaler_sub.transform(X_test_true_hate_emb)
        X_test_true_hate_torch = torch.tensor(X_test_true_hate_emb_scaled, dtype=torch.float32).to(device)
        
        # Get and combine the L2 probabilities
        true_hate_probas_L2 = {}
        true_hate_probas_L2['XGBoost_L2'] = sub_classifier_models['XGBoost_L2'].predict_proba(X_test_true_hate_emb)
        true_hate_probas_L2['LogisticRegression_L2'] = sub_classifier_models['LogisticRegression_L2'].predict_proba(X_test_true_hate_emb_scaled)
        # Switch the MLP to inference mode so layers such as dropout behave deterministically
        sub_classifier_models['MLP_PyTorch_L2'].eval()
        with torch.no_grad():
            mlp_outputs_L2 = sub_classifier_models['MLP_PyTorch_L2'](X_test_true_hate_torch)
            true_hate_probas_L2['MLP_PyTorch_L2'] = torch.softmax(mlp_outputs_L2, dim=1).cpu().numpy()
        
        final_true_hate_proba_L2 = np.zeros_like(true_hate_probas_L2['XGBoost_L2'])
        for name, proba in true_hate_probas_L2.items():
            final_true_hate_proba_L2 += proba * ensemble_weights_L2[name]
        y_sub_pred_for_eval = np.argmax(final_true_hate_proba_L2, axis=1)
        
        print("\n--- [Nivel 2] Rendimiento del Ensemble de Sub-categorías (Prueba) ---")
        # Use the classes of the hate-only encoder so target_names matches the Level 2 label set.
        print(classification_report(y_sub_true, y_sub_pred_for_eval, target_names=sub_hate_only_encoder.classes_, zero_division=0))
else:
    print("\nEl clasificador de sub-categorías no fue entrenado.")

--- Evaluación del pipeline jerárquico en el conjunto de prueba ---

--- [Nivel 1] Rendimiento del Ensemble Principal (Prueba) ---
              precision    recall  f1-score   support

    not-hate       0.90      0.98      0.94       833
        hate       0.92      0.72      0.81       303

    accuracy                           0.91      1136
   macro avg       0.91      0.85      0.87      1136
weighted avg       0.91      0.91      0.90      1136


--- [Nivel 2] Rendimiento del Ensemble de Sub-categorías (Prueba) ---
                     precision    recall  f1-score   support

           Behavior       0.00      0.00      0.00         8
              Class       0.67      0.20      0.31        10
         Disability       1.00      0.08      0.15        12
          Ethnicity       0.00      0.00      0.00         9
             Gender       0.51      0.34      0.41        62
Physical Appearance       0.00      0.00      0.00        18
               Race       0.51      0.83   

## 8. Saving Artifacts

We save every component of the hierarchical pipeline: the models of both ensembles, their respective weights, the transformers, and the encoders.
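
Before relying on the saved files, it can help to confirm that an artifact survives a save/load round trip. The sketch below is a hedged illustration using only the standard library: `save_pickle` and `load_pickle` are hypothetical helpers, the file name `ensemble_weights_L2.pkl` is an assumed example rather than a name used by the notebook, and a temporary directory stands in for `MODEL_OUTPUT_DIR`.

```python
import os
import pickle
import tempfile


def save_pickle(obj, directory, filename):
    """Serialize obj into directory/filename and return the full path."""
    path = os.path.join(directory, filename)
    with open(path, "wb") as f:
        pickle.dump(obj, f)
    return path


def load_pickle(path):
    """Deserialize a pickled object; only use on files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Weights rounded as printed in the Level 2 ensemble output.
weights = {
    "XGBoost_L2": 0.575,
    "LogisticRegression_L2": 0.178,
    "MLP_PyTorch_L2": 0.247,
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = save_pickle(weights, tmp_dir, "ensemble_weights_L2.pkl")
    restored = load_pickle(path)

print(restored == weights)
```

The same pattern applies to the scalers, the TF-IDF vectorizer, and the encoder, provided the loading environment has compatible library versions installed.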

In [14]:
print(f"--- Guardando artefactos en {MODEL_OUTPUT_DIR} ---")

# ... (code to save the L1 models and weights) ...

# 2. Save the Level 2 ensemble models and weights
if sub_classifier_models is not None:
    # ... (code to save the L2 models and weights) ...
    pass  # Elided here; assumed to be already correct

# 3. Save transformers and encoders
with open(os.path.join(MODEL_OUTPUT_DIR, "scaler_L1.pkl"), 'wb') as f:
    pickle.dump(scaler, f)
if 'scaler_sub' in locals():
    with open(os.path.join(MODEL_OUTPUT_DIR, "scaler_L2.pkl"), 'wb') as f:
        pickle.dump(scaler_sub, f)
with open(os.path.join(MODEL_OUTPUT_DIR, "tfidf_vectorizer.pkl"), 'wb') as f:
    pickle.dump(tfidf_vectorizer, f)

# Save the dedicated encoder for the hate sub-categories.
if 'sub_hate_only_encoder' in locals():
    with open(os.path.join(MODEL_OUTPUT_DIR, "sub_hate_only_encoder.pkl"), 'wb') as f:
        pickle.dump(sub_hate_only_encoder, f)
print("\n✓ Scalers, TF-IDF Vectorizer y codificador de sub-etiquetas guardados.")


# 4. Save the Optuna results
# ... (code to save the Optuna results) ...

print("\n🎉 Pipeline jerárquico robusto completado y todos los artefactos han sido guardados.")

--- Guardando artefactos en datos_locales\model_output\hierarchical-job-1750836320 ---

✓ Scalers, TF-IDF Vectorizer y codificador de sub-etiquetas guardados.

🎉 Pipeline jerárquico robusto completado y todos los artefactos han sido guardados.
