# Optimized Ensemble for Genre Prediction

This notebook implements an advanced ensemble with:
- Data augmentation (back-translation)
- DeBERTa-v3 (a top-ranked model on the leaderboard)
- Weight optimization with Optuna
- Test-time augmentation
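Any weighted ensemble has to combine scores that live on different scales. In this notebook, LogisticRegression and LinearSVC expose `decision_function` margins, while XGBoost, the calibrated SVC and DistilBERT produce probabilities. A minimal sketch of one way to put them on a common scale before weighting, assuming per-label rank normalization (the notebook's actual blending step is not shown in this section, and `rank_normalize`/`blend_scores` are hypothetical helper names):

```python
import numpy as np
from scipy.stats import rankdata

def rank_normalize(scores):
    """Rescale each label column of an (n_samples, n_labels) score matrix to ranks in (0, 1]."""
    scores = np.asarray(scores, dtype=float)
    n = scores.shape[0]
    return np.column_stack([rankdata(scores[:, k]) / n for k in range(scores.shape[1])])

def blend_scores(score_matrices, weights):
    """Weighted average of rank-normalized score matrices; weights are rescaled to sum to 1."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * rank_normalize(s) for wi, s in zip(w, score_matrices))

# Toy example: two "models" scoring 3 samples on 2 labels, on very different scales
svm_like = np.array([[-1.2, 0.3], [0.8, -0.5], [0.1, 1.4]])  # decision_function-style margins
proba_like = np.array([[0.2, 0.6], [0.7, 0.1], [0.4, 0.9]])  # predict_proba-style scores
blended = blend_scores([svm_like, proba_like], weights=[0.5, 0.5])
print(blended)
```

The weight vector is what a search such as Optuna would tune. Ranks depend on the batch being scored, so this choice assumes the full test set is scored at once rather than sample by sample.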

## 1. Imports and Configuration

In [3]:
import pandas as pd
import numpy as np
from pathlib import Path
from scipy.sparse import hstack as sp_hstack, csr_matrix
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from xgboost import XGBClassifier
from sklearn.multioutput import MultiOutputClassifier
from sentence_transformers import SentenceTransformer
from transformers import (
    DistilBertTokenizer, DistilBertForSequenceClassification,
    AutoTokenizer, AutoModelForSequenceClassification,
    Trainer, TrainingArguments,
    MarianMTModel, MarianTokenizer
)
from torch.utils.data import Dataset
import torch
import warnings
warnings.filterwarnings('ignore')

# Import the validation metrics function
import sys
sys.path.append('../..')
from validator import compute_metrics

## 2. Data Loading and Preparation

In [4]:
train_dir = Path("../../dataset_train.csv")
test_dir = Path("../../dataset_test.csv")

df = pd.read_csv(train_dir)
print(f"Dataset size: {len(df)}")
df.head()

Dataset size: 8475


| Unnamed: 0 | movie_name | genre | description |
|---|---|---|---|
| 0 | Silent Hill | Horror, Mystery | Rose, a desperate mother takes her adopted dau... |
| 1 | Breaking the Waves | Drama, Romance | In a small and conservative Scottish village, ... |
| 2 | Wind Chill | Drama, Horror, Thriller | Two college students share a ride home for the... |
| 3 | Godmothered | Family, Fantasy, Comedy | A young and unskilled fairy godmother that ven... |
| 4 | Donkey Skin | Fantasy, Comedy, Music, Romance | A fairy godmother helps a princess disguise he... |


In [5]:
df["text"] = df["movie_name"].fillna("") + " [SEP] " + df["description"].fillna("")
y_list = df["genre"].apply(lambda s: [g.strip() for g in str(s).split(",") if g.strip()])

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(y_list)

print(f"Number of labels: {len(mlb.classes_)}")
print(f"Label distribution shape: {Y.shape}")

Number of labels: 18
Label distribution shape: (8475, 18)
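The label parsing above splits each comma-separated `genre` string and one-hot encodes it with `MultiLabelBinarizer`. A minimal sketch on three toy rows (made-up values in the same format as the dataset) makes the resulting layout explicit: classes are sorted alphabetically and each row holds one 0/1 indicator per genre.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Toy values in the same format as the "genre" column
genres = pd.Series(["Horror, Mystery", "Drama, Romance", "Drama, Horror, Thriller"])

# Same parsing as the notebook: split on commas, strip whitespace, drop empty strings
y_list = genres.apply(lambda s: [g.strip() for g in str(s).split(",") if g.strip()])

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(y_list)

print(mlb.classes_.tolist())  # ['Drama', 'Horror', 'Mystery', 'Romance', 'Thriller']
print(Y)                      # one row per movie, one 0/1 column per genre
print(mlb.inverse_transform(Y[:1]))  # maps the first row back to its genre tuple
```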


## 3. Data Augmentation - Back Translation

In [6]:
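def back_translate(texts, src_lang='en', pivot_lang='fr', sample_ratio=0.2):
    """Sequentially back-translate (src -> pivot -> src) a random `sample_ratio` share of `texts`.

    Returns the augmented texts and the positional indices they were sampled from.
    """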
    model_name_en_pivot = f'Helsinki-NLP/opus-mt-{src_lang}-{pivot_lang}'
    model_name_pivot_en = f'Helsinki-NLP/opus-mt-{pivot_lang}-{src_lang}'
    
    tokenizer_en_pivot = MarianTokenizer.from_pretrained(model_name_en_pivot)
    model_en_pivot = MarianMTModel.from_pretrained(model_name_en_pivot)
    
    tokenizer_pivot_en = MarianTokenizer.from_pretrained(model_name_pivot_en)
    model_pivot_en = MarianMTModel.from_pretrained(model_name_pivot_en)
    
    augmented_texts = []
    indices_to_augment = np.random.choice(len(texts), size=int(len(texts) * sample_ratio), replace=False)
    
    for i, idx in enumerate(indices_to_augment):
        if i % 50 == 0:
            print(f"Augmenting {i}/{len(indices_to_augment)}...", end='\r')
        
        text = texts.iloc[idx] if hasattr(texts, 'iloc') else texts[idx]
        
        translated = model_en_pivot.generate(**tokenizer_en_pivot(text, return_tensors="pt", padding=True, truncation=True, max_length=128))
        pivot_text = tokenizer_en_pivot.decode(translated[0], skip_special_tokens=True)
        
        back_translated = model_pivot_en.generate(**tokenizer_pivot_en(pivot_text, return_tensors="pt", padding=True, truncation=True, max_length=128))
        final_text = tokenizer_pivot_en.decode(back_translated[0], skip_special_tokens=True)
        
        augmented_texts.append(final_text)
    
    print(f"Augmentation complete!" + " "*20)
    return augmented_texts, indices_to_augment

In [8]:
from joblib import Parallel, delayed

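def back_translate_single(text, src_lang='en', pivot_lang='fr'):
    """Back-translate a single text (src -> pivot -> src).

    Note: both Marian models are reloaded on every call, which adds substantial
    overhead when the function is called once per text.
    """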
    model_name_en_pivot = f'Helsinki-NLP/opus-mt-{src_lang}-{pivot_lang}'
    model_name_pivot_en = f'Helsinki-NLP/opus-mt-{pivot_lang}-{src_lang}'
    
    tokenizer_en_pivot = MarianTokenizer.from_pretrained(model_name_en_pivot)
    model_en_pivot = MarianMTModel.from_pretrained(model_name_en_pivot)
    
    tokenizer_pivot_en = MarianTokenizer.from_pretrained(model_name_pivot_en)
    model_pivot_en = MarianMTModel.from_pretrained(model_name_pivot_en)
    
    # EN -> FR
    translated = model_en_pivot.generate(
        **tokenizer_en_pivot(text, return_tensors="pt", padding=True, truncation=True, max_length=128)
    )
    pivot_text = tokenizer_en_pivot.decode(translated[0], skip_special_tokens=True)
    
    # FR -> EN
    back_translated = model_pivot_en.generate(
        **tokenizer_pivot_en(pivot_text, return_tensors="pt", padding=True, truncation=True, max_length=128)
    )
    final_text = tokenizer_pivot_en.decode(back_translated[0], skip_special_tokens=True)
    
    return final_text

# Split BEFORE augmenting, so validation never contains paraphrases of training movies
X_tr, X_va, y_tr, y_va = train_test_split(df["text"], Y, test_size=0.1, random_state=42)
print(f"Original - Training: {len(X_tr)}, Validation: {len(X_va)}")

# Select indices to augment (no seed is set for np.random.choice, so the subset differs between runs)
print(f"\nAugmenting training data (parallelized)...")
aug_indices = np.random.choice(len(X_tr), size=int(len(X_tr) * 0.35), replace=False)
texts_to_augment = [X_tr.iloc[idx] if hasattr(X_tr, 'iloc') else X_tr[idx] for idx in aug_indices]

# Parallelize back-translation with joblib (n_jobs=-1 uses all cores)
augmented_texts = Parallel(n_jobs=-1, verbose=10, backend='loky')(
    delayed(back_translate_single)(text) for text in texts_to_augment
)

# Labels of the augmented rows (positional indices into the NumPy label array)
y_tr_augmented = y_tr[aug_indices]

# Combine the original and augmented training data
X_tr_combined = pd.concat([X_tr.reset_index(drop=True), pd.Series(augmented_texts)], ignore_index=True)
y_tr_combined = np.vstack([y_tr, y_tr_augmented])

print(f"Augmented - Training: {len(X_tr_combined)}, Validation: {len(X_va)} (unchanged)")
print(f"Train augmentation: +{len(augmented_texts)} samples ({len(augmented_texts)/len(X_tr):.1%})")

# Update variables
X_tr = X_tr_combined
y_tr = y_tr_combined

Original - Training: 7627, Validation: 848

Augmenting training data (parallelized)...


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 32 concurrent workers.
[Parallel(n_jobs=-1)]: Done   8 tasks      | elapsed:   37.0s
[Parallel(n_jobs=-1)]: Done  21 tasks      | elapsed:   41.4s
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:   53.8s
[Parallel(n_jobs=-1)]: Done  49 tasks      | elapsed:  1.2min
[Parallel(n_jobs=-1)]: Done  64 tasks      | elapsed:  1.3min
[Parallel(n_jobs=-1)]: Done  81 tasks      | elapsed:  1.6min
[Parallel(n_jobs=-1)]: Done  98 tasks      | elapsed:  1.8min
[Parallel(n_jobs=-1)]: Done 117 tasks      | elapsed:  2.1min
[Parallel(n_jobs=-1)]: Done 136 tasks      | elapsed:  2.3min
[Parallel(n_jobs=-1)]: Done 157 tasks      | elapsed:  2.7min
[Parallel(n_jobs=-1)]: Done 178 tasks      | elapsed:  2.9min
[Parallel(n_jobs=-1)]: Done 201 tasks      | elapsed:  3.3min
[Parallel(n_jobs=-1)]: Done 224 tasks      | elapsed:  3.7min
[Parallel(n_jobs=-1)]: Done 249 tasks      | elapsed:  4.1min
[Parallel(n_jobs=-1)]: Done 274 tasks      | elapsed:  

Augmented - Training: 10296, Validation: 848 (unchanged)
Train augmentation: +2669 samples (35.0%)


[Parallel(n_jobs=-1)]: Done 2669 out of 2669 | elapsed: 38.8min finished
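The cell above keeps augmentation strictly inside the training split and copies each source row's labels onto its back-translated version. The sketch below restates that logic as a small function so it can be checked without downloading the Marian models. `augment_training_split` is a hypothetical helper, `translate_fn` stands in for `back_translate_single`, and the sketch swaps the unseeded `np.random.choice` for a seeded `np.random.default_rng` so the augmented subset is reproducible.

```python
import numpy as np
import pandas as pd

def augment_training_split(X_tr, y_tr, translate_fn, ratio=0.35, seed=42):
    """Append paraphrased copies of a random subset of training rows (hypothetical helper).

    X_tr: pandas Series of texts; y_tr: (n, n_labels) 0/1 array aligned by position with X_tr.
    translate_fn: any str -> str function, e.g. back_translate_single.
    """
    rng = np.random.default_rng(seed)  # seeded: the same subset is chosen on every run
    aug_idx = rng.choice(len(X_tr), size=int(len(X_tr) * ratio), replace=False)

    new_texts = [translate_fn(X_tr.iloc[i]) for i in aug_idx]  # .iloc: positional lookup
    X_comb = pd.concat([X_tr.reset_index(drop=True), pd.Series(new_texts)], ignore_index=True)
    y_comb = np.vstack([y_tr, y_tr[aug_idx]])  # each copy inherits its source row's labels
    return X_comb, y_comb, aug_idx

# Toy run with str.upper standing in for the translator; the index is shuffled,
# as it is after train_test_split
X_demo = pd.Series(["alpha", "beta", "gamma", "delta"], index=[10, 3, 7, 1])
y_demo = np.eye(4, dtype=int)
X_c, y_c, idx = augment_training_split(X_demo, y_demo, str.upper, ratio=0.5, seed=0)
print(X_c.tolist(), idx)
```

Returning `aug_idx` also makes it easy to build cross-validation groups that keep each paraphrase with its source, e.g. `np.concatenate([np.arange(n_original), aug_idx])`.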


In [9]:
import joblib

In [10]:
# Persist the augmented split so the ~39-minute back-translation does not need to be rerun
joblib.dump(X_tr, "X_tr_nvembed.pkl")
joblib.dump(X_va, "X_va_nvembed.pkl")
joblib.dump(y_tr, "y_tr_nvembed.pkl")
joblib.dump(y_va, "y_va_nvembed.pkl")

['y_va_nvembed.pkl']

## 4. Feature Engineering - TF-IDF

In [11]:
tfidf_word = TfidfVectorizer(
    ngram_range=(1,4),
    min_df=2,
    max_features=750_000,
    sublinear_tf=True,
    stop_words="english",
    max_df=0.85,
    strip_accents='unicode',
    lowercase=True
)

tfidf_char = TfidfVectorizer(
    analyzer="char_wb",
    ngram_range=(3,6),
    min_df=2,
    max_features=750_000,
    sublinear_tf=True,
    max_df=0.85,
    strip_accents='unicode'
)

Xw_tr = tfidf_word.fit_transform(X_tr)
Xw_va = tfidf_word.transform(X_va)
Xc_tr = tfidf_char.fit_transform(X_tr)
Xc_va = tfidf_char.transform(X_va)

XTR_tfidf = sp_hstack([Xw_tr, Xc_tr], format="csr")
XVA_tfidf = sp_hstack([Xw_va, Xc_va], format="csr")
print(f"Combined TF-IDF features shape: {XTR_tfidf.shape}")

Combined TF-IDF features shape: (10296, 267716)
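The word and character TF-IDF matrices are simply concatenated column-wise. A toy-sized sketch of the same construction (smaller n-gram ranges and no `min_df`/`max_df`/`max_features`, which would prune almost everything on a three-sentence corpus) shows two properties: the column count is the sum of both vocabularies, and because each vectorizer L2-normalizes its own rows (the default `norm='l2'`), every stacked row has norm sqrt(2), so both blocks contribute equally regardless of vocabulary size.

```python
import numpy as np
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "A haunted town hides a dark secret",
    "Two lovers meet in a small village",
    "A fairy godmother helps a young princess",
]

# Toy-sized settings: no min_df/max_df/max_features pruning on a 3-document corpus
word_vec = TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True, stop_words="english")
char_vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5), sublinear_tf=True)

Xw = word_vec.fit_transform(docs)   # rows L2-normalized within the word block
Xc = char_vec.fit_transform(docs)   # rows L2-normalized within the char block
X = hstack([Xw, Xc], format="csr")  # word columns first, then char columns

row_norms = np.sqrt(np.asarray(X.multiply(X).sum(axis=1)).ravel())
print(Xw.shape, Xc.shape, X.shape)
print(row_norms)  # each is sqrt(2): one unit-norm block per vectorizer
```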


In [12]:
# Save the TF-IDF vectorizers, the MultiLabelBinarizer, and the labels
joblib.dump(tfidf_word, "tfidf_word.joblib")
joblib.dump(tfidf_char, "tfidf_char.joblib")
joblib.dump(mlb, "mlb.joblib")

# Save labels.json
import json
labels_dict = {"labels": mlb.classes_.tolist()}
with open("labels.json", "w") as f:
    json.dump(labels_dict, f, indent=2)

print("TF-IDF vectorizers, MLB and labels.json saved!")
print(f"Number of labels: {len(mlb.classes_)}")

TF-IDF vectorizers, MLB and labels.json saved!
Number of labels: 18


## 5. Improved Embeddings (BGE-Large)

In [13]:
st_model = SentenceTransformer('BAAI/bge-large-en-v1.5')
print("Generating embeddings with BGE-Large (1024 dim)...")
emb_tr = st_model.encode(X_tr.tolist(), show_progress_bar=True, batch_size=16, normalize_embeddings=True)
emb_va = st_model.encode(X_va.tolist(), show_progress_bar=True, batch_size=16, normalize_embeddings=True)

XTR_combined = sp_hstack([XTR_tfidf, csr_matrix(emb_tr)], format="csr")
XVA_combined = sp_hstack([XVA_tfidf, csr_matrix(emb_va)], format="csr")
print(f"Combined features (TF-IDF + BGE Embeddings) shape: {XTR_combined.shape}")

Generating embeddings with BGE-Large (1024 dim)...


Combined features (TF-IDF + BGE Embeddings) shape: (10296, 268740)


In [14]:
# Save the sentence-transformer model
# (st_model.save(path) is the library's native alternative to pickling with joblib)
joblib.dump(st_model, "sentence_transformer.joblib")
print("Sentence Transformer model saved!")

Sentence Transformer model saved!


## 6. Calibration and Improved Models

In [15]:
clf_logreg = OneVsRestClassifier(
    LogisticRegression(C=8.0, solver="saga", max_iter=4000, class_weight='balanced', random_state=42),
    n_jobs=-1
)
print("Training LogisticRegression...")
clf_logreg.fit(XTR_combined, y_tr)
print("LogReg training complete!")

Training LogisticRegression...
LogReg training complete!


In [16]:
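# Note: thresholds are tuned on the same validation split used to report the metrics,
# so the scores printed below are optimistic estimates
logits_logreg = clf_logreg.decision_function(XVA_combined)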
ths_logreg = np.zeros(logits_logreg.shape[1])

for k in range(logits_logreg.shape[1]):
    s = logits_logreg[:, k]
    best_f1, best_t = 0.0, 0.0
    candidates = np.quantile(s, np.linspace(0.01, 0.99, 50))
    for t in candidates:
        preds_k = (s >= t).astype(int)
        f1 = f1_score(y_va[:, k], preds_k, zero_division=0)
        if f1 > best_f1:
            best_f1, best_t = f1, t
    ths_logreg[k] = best_t

pred_logreg = (logits_logreg >= ths_logreg).astype(int)
metrics_logreg = compute_metrics(y_va, pred_logreg)
print(f"LogReg - F1: {metrics_logreg['f1']:.4f}, Precision: {metrics_logreg['precision']:.4f}, Recall: {metrics_logreg['recall']:.4f}, Hamming: {metrics_logreg['hamming_loss']:.4f}")

LogReg - F1: 0.6345, Precision: 0.6109, Recall: 0.6791, Hamming: 0.1021
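The same per-label threshold search is repeated verbatim for every model in this notebook. A sketch of it as a reusable function (`tune_thresholds` and `apply_thresholds` are hypothetical names) keeps the original logic: 50 candidate thresholds taken from the 1st to 99th percentiles of each label's scores, the one with the best binary F1 wins, and 0.0 is the fallback when no candidate gives a positive F1.

```python
import numpy as np
from sklearn.metrics import f1_score

def tune_thresholds(scores, y_true, n_candidates=50):
    """Per-label threshold search maximizing binary F1, mirroring the notebook's loop."""
    scores = np.asarray(scores, dtype=float)
    y_true = np.asarray(y_true)
    ths = np.zeros(scores.shape[1])
    for k in range(scores.shape[1]):
        s = scores[:, k]
        best_f1, best_t = 0.0, 0.0
        # Candidate thresholds are quantiles of this label's own scores
        for t in np.quantile(s, np.linspace(0.01, 0.99, n_candidates)):
            f1 = f1_score(y_true[:, k], (s >= t).astype(int), zero_division=0)
            if f1 > best_f1:
                best_f1, best_t = f1, t
        ths[k] = best_t
    return ths

def apply_thresholds(scores, ths):
    """Binarize a score matrix with one threshold per label column."""
    return (np.asarray(scores) >= ths).astype(int)

# Synthetic check: 3 labels whose scores separate positives from negatives
rng = np.random.default_rng(0)
y_demo = rng.integers(0, 2, size=(200, 3))
s_demo = y_demo + 0.1 * rng.standard_normal((200, 3))
ths_demo = tune_thresholds(s_demo, y_demo)
pred_demo = apply_thresholds(s_demo, ths_demo)
```

In the notebook this would read `ths_logreg = tune_thresholds(logits_logreg, y_va)`. Because the thresholds are chosen on `y_va` and then scored on `y_va`, the reported F1 is optimistic; tuning on a held-out slice of the training data (or on out-of-fold predictions) would give a less biased estimate.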


In [64]:
print(compute_metrics(y_va, pred_logreg))

{'accuracy': 0.13561320754716982, 'f1': 0.6344655126996935, 'precision': 0.6109434298622106, 'recall': 0.6791274512706784, 'hamming_loss': 0.10213574423480083}


In [17]:
# Save the LogReg model
# Note: ths_logreg.npy is written by joblib (not np.save), so load it with joblib.load
joblib.dump(clf_logreg, "clf_logreg.joblib")
joblib.dump(ths_logreg, "ths_logreg.npy")
print("LogReg model saved!")

LogReg model saved!


In [18]:
clf_xgb = MultiOutputClassifier(
    XGBClassifier(n_estimators=300, max_depth=6, learning_rate=0.1, random_state=42, n_jobs=-1)
)
print("Training XGBoost...")
clf_xgb.fit(emb_tr, y_tr)
print("XGBoost training complete!")

Training XGBoost...
XGBoost training complete!


In [19]:
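pred_proba_xgb = clf_xgb.predict_proba(emb_va)
# MultiOutputClassifier returns one (n_samples, 2) array per label; keep P(label = 1).
# These are probabilities, named "logits" only for consistency with the other models
logits_xgb = np.column_stack([p[:, 1] for p in pred_proba_xgb])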
ths_xgb = np.zeros(logits_xgb.shape[1])

for k in range(logits_xgb.shape[1]):
    s = logits_xgb[:, k]
    best_f1, best_t = 0.0, 0.0
    candidates = np.quantile(s, np.linspace(0.01, 0.99, 50))
    for t in candidates:
        preds_k = (s >= t).astype(int)
        f1 = f1_score(y_va[:, k], preds_k, zero_division=0)
        if f1 > best_f1:
            best_f1, best_t = f1, t
    ths_xgb[k] = best_t

pred_xgb = (logits_xgb >= ths_xgb).astype(int)
metrics_xgb = compute_metrics(y_va, pred_xgb)
print(f"XGBoost - F1: {metrics_xgb['f1']:.4f}, Precision: {metrics_xgb['precision']:.4f}, Recall: {metrics_xgb['recall']:.4f}, Hamming: {metrics_xgb['hamming_loss']:.4f}")

XGBoost - F1: 0.6224, Precision: 0.5908, Recall: 0.6764, Hamming: 0.1046


In [20]:
# Save the XGBoost model
# Note: ths_xgb.npy is written by joblib (not np.save), so load it with joblib.load
joblib.dump(clf_xgb, "clf_xgb.joblib")
joblib.dump(ths_xgb, "ths_xgb.npy")
print("XGBoost model saved!")

XGBoost model saved!


In [21]:
clf_svc = OneVsRestClassifier(
    LinearSVC(C=2.0, max_iter=4000, class_weight='balanced', dual='auto', random_state=42),
    n_jobs=-1
)
print("Training LinearSVC...")
clf_svc.fit(XTR_tfidf, y_tr)
print("SVC training complete!")

Training LinearSVC...
SVC training complete!


In [22]:
logits_svc = clf_svc.decision_function(XVA_tfidf)
ths_svc = np.zeros(logits_svc.shape[1])

for k in range(logits_svc.shape[1]):
    s = logits_svc[:, k]
    best_f1, best_t = 0.0, 0.0
    candidates = np.quantile(s, np.linspace(0.01, 0.99, 50))
    for t in candidates:
        preds_k = (s >= t).astype(int)
        f1 = f1_score(y_va[:, k], preds_k, zero_division=0)
        if f1 > best_f1:
            best_f1, best_t = f1, t
    ths_svc[k] = best_t

pred_svc = (logits_svc >= ths_svc).astype(int)
metrics_svc = compute_metrics(y_va, pred_svc)
print(f"LinearSVC - F1: {metrics_svc['f1']:.4f}, Precision: {metrics_svc['precision']:.4f}, Recall: {metrics_svc['recall']:.4f}, Hamming: {metrics_svc['hamming_loss']:.4f}")

LinearSVC - F1: 0.5655, Precision: 0.5320, Recall: 0.6326, Hamming: 0.1238


In [23]:
# Save the SVC model
# Note: ths_svc.npy is written by joblib (not np.save), so load it with joblib.load
joblib.dump(clf_svc, "clf_svc.joblib")
joblib.dump(ths_svc, "ths_svc.npy")
print("SVC model saved!")

SVC model saved!


## Probabilistic Calibration of the SVC

In [25]:
from sklearn.calibration import CalibratedClassifierCV

print("Calibrating SVC probabilities with cross-validation on TRAIN...")
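# IMPORTANT: calibrate on TRAIN via internal cross-validation, not on validation (avoids data leakage)
# Caveat: back-translated copies of the same movie can fall into different CV folds,
# which makes the calibration folds slightly optimistic

# Approach: skip OneVsRestClassifier and train + calibrate one SVC per label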
print("Training base SVC models...")
base_svc_models = []
n_labels = y_tr.shape[1]

for k in range(n_labels):
    if k % 5 == 0:
        print(f"Training label {k+1}/{n_labels}...", end='\r')
    
    # One LinearSVC per label
    svc = LinearSVC(C=2.0, max_iter=4000, class_weight='balanced', dual='auto', random_state=42)
    
    # Calibrate with 3-fold cross-validation on TRAIN (sigmoid, i.e. Platt scaling)
    calibrated_svc = CalibratedClassifierCV(svc, cv=3, method='sigmoid')
    calibrated_svc.fit(XTR_tfidf, y_tr[:, k])
    
    base_svc_models.append(calibrated_svc)

print(f"\nCalibration complete for {n_labels} labels!")

# Get calibrated probabilities on the VALIDATION set
print("Generating calibrated probabilities on validation set...")
logits_svc_cal = np.zeros((XVA_tfidf.shape[0], n_labels))

for k, model in enumerate(base_svc_models):
    logits_svc_cal[:, k] = model.predict_proba(XVA_tfidf)[:, 1]

# Re-tune thresholds on the calibrated probabilities
ths_svc_cal = np.zeros(logits_svc_cal.shape[1])
for k in range(logits_svc_cal.shape[1]):
    s = logits_svc_cal[:, k]
    best_f1, best_t = 0.0, 0.0
    candidates = np.quantile(s, np.linspace(0.01, 0.99, 50))
    for t in candidates:
        preds_k = (s >= t).astype(int)
        f1 = f1_score(y_va[:, k], preds_k, zero_division=0)
        if f1 > best_f1:
            best_f1, best_t = f1, t
    ths_svc_cal[k] = best_t

pred_svc_cal = (logits_svc_cal >= ths_svc_cal).astype(int)
metrics_svc_cal = compute_metrics(y_va, pred_svc_cal)
print(f"Calibrated SVC - F1: {metrics_svc_cal['f1']:.4f}, Precision: {metrics_svc_cal['precision']:.4f}, Recall: {metrics_svc_cal['recall']:.4f}, Hamming: {metrics_svc_cal['hamming_loss']:.4f}")

# Keep the list of per-label calibrated models under one name (saved in the next cell)
clf_svc_calibrated = base_svc_models

Calibrating SVC probabilities with cross-validation on TRAIN...
Training base SVC models...
Training label 16/18...
Calibration complete for 18 labels!
Generating calibrated probabilities on validation set...
Calibrated SVC - F1: 0.5739, Precision: 0.5497, Recall: 0.6308, Hamming: 0.1203
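Calibrating with `cv=3` on the augmented training set has a subtle leak: a movie and its back-translated copy can land in different folds, so the sigmoid is fit on predictions for near-duplicates of training rows. One way to avoid it, sketched below under the assumption that every augmented row is tagged with the positional index of its source (as `aug_indices` in section 3 provides), is to pass group-aware splits from `GroupKFold` to `CalibratedClassifierCV`, which accepts an iterable of `(train, test)` index arrays as `cv`. `calibrated_svc_grouped` is a hypothetical helper and the data are synthetic.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import GroupKFold
from sklearn.svm import LinearSVC

def calibrated_svc_grouped(X, y, groups, n_splits=3, C=2.0):
    """Sigmoid-calibrated LinearSVC whose CV folds never separate rows of the same group."""
    # Precompute (train, test) index pairs; CalibratedClassifierCV accepts them as `cv`
    splits = list(GroupKFold(n_splits=n_splits).split(X, y, groups))
    svc = LinearSVC(C=C, max_iter=4000, class_weight="balanced", random_state=42)
    model = CalibratedClassifierCV(svc, cv=splits, method="sigmoid")
    model.fit(X, y)
    return model, splits

# Synthetic stand-in: 60 "original" rows plus 20 near-duplicate "paraphrases"
rng = np.random.default_rng(0)
X_orig = rng.standard_normal((60, 5))
y_orig = (X_orig[:, 0] > 0).astype(int)
aug_idx = rng.choice(60, size=20, replace=False)
X_aug = X_orig[aug_idx] + 0.01 * rng.standard_normal((20, 5))

X_all = np.vstack([X_orig, X_aug])
y_all = np.concatenate([y_orig, y_orig[aug_idx]])
groups = np.concatenate([np.arange(60), aug_idx])  # each copy shares its source's group id

model, splits = calibrated_svc_grouped(X_all, y_all, groups)
proba = model.predict_proba(X_all)[:, 1]
```

In the notebook this would be called once per label with `y_tr[:, k]`, in place of `CalibratedClassifierCV(svc, cv=3, ...)`.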


In [26]:
# Save the calibrated SVC models
# Note: ths_svc_cal.npy is written by joblib (not np.save), so load it with joblib.load
joblib.dump(clf_svc_calibrated, "clf_svc_calibrated.joblib")
joblib.dump(ths_svc_cal, "ths_svc_cal.npy")
print("Calibrated SVC model saved!")

Calibrated SVC model saved!


## 7. DistilBERT with Focal Loss and Label Smoothing

In [27]:
import torch.nn as nn

class FocalLoss(nn.Module):
    def __init__(self, alpha=0.25, gamma=2):
        super().__init__()
        self.alpha = alpha
        self.gamma = gamma
    
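    def forward(self, inputs, targets):
        # Element-wise BCE on logits; pt = exp(-BCE) is the probability given to the true label
        bce_loss = nn.BCEWithLogitsLoss(reduction='none')(inputs, targets)
        pt = torch.exp(-bce_loss)
        # Note: alpha is one global scale here, not the per-class alpha_t
        # (alpha for positives, 1 - alpha for negatives) of the original focal loss
        focal_loss = self.alpha * (1-pt)**self.gamma * bce_loss
        return focal_loss.mean()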

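class CustomTrainer(Trainer):
    # **kwargs absorbs extra arguments that newer transformers versions pass (e.g. num_items_in_batch)
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        logits = outputs.logits
        
        # This override replaces the Trainer's default loss, so the label smoother built from
        # TrainingArguments.label_smoothing_factor is never applied here
        loss_fct = FocalLoss(alpha=0.25, gamma=2.0)
        loss = loss_fct(logits, labels)
        
        return (loss, outputs) if return_outputs else loss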

print("Focal Loss class defined for DistilBERT")

Focal Loss class defined for DistilBERT
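Because `CustomTrainer.compute_loss` replaces the default loss, `label_smoothing_factor=0.1` in `TrainingArguments` is never applied (the Trainer's built-in label smoother is also designed for single-label cross-entropy, not multi-label BCE). If smoothing is wanted, it has to happen inside the loss. The NumPy sketch below writes out a sigmoid focal loss with the per-class `alpha_t` weighting of the original focal-loss formulation, plus an optional smoothing step that pulls 0/1 targets toward 0.5; that smoothing rule is one common multi-label choice, assumed here rather than taken from the notebook. A torch version would mirror it line by line, since `BCEWithLogitsLoss` also accepts soft targets.

```python
import numpy as np

def bce_with_logits(logits, targets):
    """Element-wise binary cross-entropy on logits, in the stable form log(1 + e^x) - y * x."""
    return np.logaddexp(0.0, logits) - targets * logits

def focal_loss(logits, targets, alpha=0.25, gamma=2.0, smoothing=0.0):
    """Mean sigmoid focal loss for multi-label 0/1 targets (NumPy sketch)."""
    logits = np.asarray(logits, dtype=float)
    targets = np.asarray(targets, dtype=float)
    if smoothing > 0:
        # Assumed multi-label smoothing rule: 1 -> 1 - eps/2, 0 -> eps/2
        targets = targets * (1.0 - smoothing) + 0.5 * smoothing
    bce = bce_with_logits(logits, targets)
    pt = np.exp(-bce)  # probability of the target label when targets are hard 0/1
    if alpha is None:
        alpha_t = 1.0
    else:
        # Per-class weighting: alpha for positives, 1 - alpha for negatives
        alpha_t = alpha * targets + (1.0 - alpha) * (1.0 - targets)
    return float(np.mean(alpha_t * (1.0 - pt) ** gamma * bce))

x = np.array([[2.0, -1.0], [0.5, 3.0]])
y = np.array([[1, 0], [0, 1]])
print(focal_loss(x, y))                 # focal loss with alpha_t weighting
print(focal_loss(x, y, smoothing=0.1))  # same, with smoothed targets
```

With `gamma=0` and `alpha=None` the function reduces to plain mean BCE; with `gamma=2`, a confidently correct logit keeps only a tiny fraction of its BCE while a confidently wrong one keeps almost all of it.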


In [28]:
class MovieGenreDataset(Dataset):
    def __init__(self, texts, labels, tokenizer, max_length=128):
        self.texts = texts
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_length = max_length
    
    def __len__(self):
        return len(self.texts)
    
    def __getitem__(self, idx):
        text = str(self.texts.iloc[idx]) if hasattr(self.texts, 'iloc') else str(self.texts[idx])
        encoding = self.tokenizer(text, truncation=True, padding='max_length', max_length=self.max_length, return_tensors='pt')
        return {
            'input_ids': encoding['input_ids'].flatten(),
            'attention_mask': encoding['attention_mask'].flatten(),
            'labels': torch.tensor(self.labels[idx], dtype=torch.float)
        }

In [29]:
tokenizer_distilbert = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
model_distilbert = DistilBertForSequenceClassification.from_pretrained(
    'distilbert-base-uncased',
    num_labels=len(mlb.classes_),
    problem_type="multi_label_classification"
)

train_dataset_distilbert = MovieGenreDataset(X_tr, y_tr, tokenizer_distilbert, max_length=128)
val_dataset_distilbert = MovieGenreDataset(X_va, y_va, tokenizer_distilbert, max_length=128)
print(f"Datasets created: {len(train_dataset_distilbert)} training, {len(val_dataset_distilbert)} validation")

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Datasets created: 10296 training, 848 validation


In [30]:
training_args_distilbert = TrainingArguments(
    output_dir='./distilbert_results',
    num_train_epochs=4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    warmup_steps=500,
    weight_decay=0.01,
    logging_steps=100,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    label_smoothing_factor=0.1,  # no effect: CustomTrainer.compute_loss never uses the label smoother
)

trainer_distilbert = CustomTrainer(
    model=model_distilbert,
    args=training_args_distilbert,
    train_dataset=train_dataset_distilbert,
    eval_dataset=val_dataset_distilbert,
)

print("Training DistilBERT with Focal Loss + Label Smoothing...")
trainer_distilbert.train()
print("DistilBERT training complete!")

Training DistilBERT with Focal Loss + Label Smoothing...


{'loss': 0.0338, 'grad_norm': 0.051935646682977676, 'learning_rate': 1e-05, 'epoch': 0.16}
{'loss': 0.0221, 'grad_norm': 0.07436374574899673, 'learning_rate': 2e-05, 'epoch': 0.31}
{'loss': 0.0185, 'grad_norm': 0.0721149668097496, 'learning_rate': 3e-05, 'epoch': 0.47}
{'loss': 0.0169, 'grad_norm': 0.07163507491350174, 'learning_rate': 4e-05, 'epoch': 0.62}
{'loss': 0.016, 'grad_norm': 0.06319987773895264, 'learning_rate': 5e-05, 'epoch': 0.78}
{'loss': 0.0154, 'grad_norm': 0.07618486881256104, 'learning_rate': 4.759152215799615e-05, 'epoch': 0.93}


{'eval_loss': 0.015020363964140415, 'eval_runtime': 1.5473, 'eval_samples_per_second': 548.05, 'eval_steps_per_second': 17.45, 'epoch': 1.0}
{'loss': 0.0141, 'grad_norm': 0.08374618738889694, 'learning_rate': 4.518304431599229e-05, 'epoch': 1.09}
{'loss': 0.0131, 'grad_norm': 0.08397373557090759, 'learning_rate': 4.2774566473988445e-05, 'epoch': 1.24}
{'loss': 0.0125, 'grad_norm': 0.0772162601351738, 'learning_rate': 4.036608863198459e-05, 'epoch': 1.4}
{'loss': 0.0121, 'grad_norm': 0.06960378587245941, 'learning_rate': 3.7957610789980736e-05, 'epoch': 1.55}
{'loss': 0.0119, 'grad_norm': 0.05488298088312149, 'learning_rate': 3.554913294797688e-05, 'epoch': 1.71}
{'loss': 0.0116, 'grad_norm': 0.0836111307144165, 'learning_rate': 3.314065510597303e-05, 'epoch': 1.86}


{'eval_loss': 0.01436071377247572, 'eval_runtime': 1.5329, 'eval_samples_per_second': 553.194, 'eval_steps_per_second': 17.613, 'epoch': 2.0}
{'loss': 0.0112, 'grad_norm': 0.06092393398284912, 'learning_rate': 3.073217726396917e-05, 'epoch': 2.02}
{'loss': 0.0082, 'grad_norm': 0.0731116309762001, 'learning_rate': 2.832369942196532e-05, 'epoch': 2.17}
{'loss': 0.0079, 'grad_norm': 0.08217909187078476, 'learning_rate': 2.5915221579961463e-05, 'epoch': 2.33}
{'loss': 0.0081, 'grad_norm': 0.09230402857065201, 'learning_rate': 2.3506743737957612e-05, 'epoch': 2.48}
{'loss': 0.0077, 'grad_norm': 0.06538791954517365, 'learning_rate': 2.1098265895953757e-05, 'epoch': 2.64}
{'loss': 0.008, 'grad_norm': 0.07004916667938232, 'learning_rate': 1.8689788053949906e-05, 'epoch': 2.8}
{'loss': 0.0078, 'grad_norm': 0.08804482966661453, 'learning_rate': 1.628131021194605e-05, 'epoch': 2.95}


{'eval_loss': 0.016173992305994034, 'eval_runtime': 1.5555, 'eval_samples_per_second': 545.146, 'eval_steps_per_second': 17.357, 'epoch': 3.0}
{'loss': 0.0063, 'grad_norm': 0.05812007933855057, 'learning_rate': 1.3872832369942197e-05, 'epoch': 3.11}
{'loss': 0.0059, 'grad_norm': 0.06282991915941238, 'learning_rate': 1.1464354527938344e-05, 'epoch': 3.26}
{'loss': 0.0054, 'grad_norm': 0.04718885198235512, 'learning_rate': 9.05587668593449e-06, 'epoch': 3.42}
{'loss': 0.0052, 'grad_norm': 0.03190620616078377, 'learning_rate': 6.647398843930635e-06, 'epoch': 3.57}
{'loss': 0.0054, 'grad_norm': 0.04436526820063591, 'learning_rate': 4.238921001926782e-06, 'epoch': 3.73}
{'loss': 0.0052, 'grad_norm': 0.06063408777117729, 'learning_rate': 1.8304431599229288e-06, 'epoch': 3.88}


{'eval_loss': 0.017019927501678467, 'eval_runtime': 1.5582, 'eval_samples_per_second': 544.231, 'eval_steps_per_second': 17.328, 'epoch': 4.0}
{'train_runtime': 195.5495, 'train_samples_per_second': 210.607, 'train_steps_per_second': 13.173, 'train_loss': 0.011425466328766775, 'epoch': 4.0}
DistilBERT training complete!
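The logged learning rates match the Trainer's default linear schedule: a linear warmup from 0 to the peak rate, then linear decay to 0 over the remaining optimizer steps. The sketch below recomputes step counts and rates from the training arguments, assuming a single device (the 27 and 53 evaluation steps in the logs are consistent with that) and no other schedule overrides; `total_update_steps` and `linear_lr` are hypothetical helpers, not Trainer APIs. For DistilBERT (no explicit `learning_rate`, so the default 5e-5) it reproduces 2576 steps and the logged 4.7592e-05 at step 600; with the DeBERTa arguments of the next section (`warmup_ratio=0.1`, `gradient_accumulation_steps=2`) it gives 5144 steps, 515 warmup steps, and the logged 1.9633e-05 at step 600.

```python
import math

def total_update_steps(n_samples, per_device_batch, epochs, grad_accum=1, n_devices=1):
    """Optimizer steps for a Trainer run, assuming the last partial batch is kept."""
    batches_per_epoch = math.ceil(n_samples / (per_device_batch * n_devices))
    return max(batches_per_epoch // grad_accum, 1) * epochs

def linear_lr(step, base_lr, warmup_steps, total_steps):
    """Linear warmup from 0 to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

# DistilBERT: 10296 samples, batch 16, 4 epochs, warmup_steps=500, default lr 5e-5
steps_db = total_update_steps(10296, 16, 4)
print(steps_db, linear_lr(600, 5e-5, 500, steps_db))

# DeBERTa: batch 8, gradient_accumulation_steps=2, 8 epochs, warmup_ratio=0.1, lr 2e-5
steps_deb = total_update_steps(10296, 8, 8, grad_accum=2)
warmup_deb = math.ceil(0.1 * steps_deb)
print(steps_deb, warmup_deb, linear_lr(600, 2e-5, warmup_deb, steps_deb))
```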


In [32]:
# Select device (GPU if available)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# Move the model to the selected device and switch to inference mode
model_distilbert = model_distilbert.to(device)
model_distilbert.eval()

# Process in batches to avoid running out of GPU memory (OOM)
batch_size = 32
all_logits = []

with torch.no_grad():
    for i in range(0, len(X_va), batch_size):
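        # .iloc slices by position (X_va keeps the shuffled index from train_test_split)
        batch_texts = X_va.iloc[i:i+batch_size].tolist()
        
        # Tokenize and move tensors to the device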
        val_inputs = tokenizer_distilbert(
            batch_texts, 
            truncation=True, 
            padding=True, 
            max_length=128, 
            return_tensors='pt'
        )
        val_inputs = {k: v.to(device) for k, v in val_inputs.items()}
        
        # Forward pass on the device
        outputs = model_distilbert(**val_inputs)
        
        # Sigmoid probabilities (despite the "logits" variable names), moved to CPU as NumPy
        logits_batch = torch.sigmoid(outputs.logits).cpu().numpy()
        all_logits.append(logits_batch)
        
        if (i // batch_size) % 10 == 0:
            print(f"Processed {i}/{len(X_va)} samples...", end='\r')

# Concatenate all batches
logits_distilbert = np.vstack(all_logits)
print(f"\nPredictions complete! Shape: {logits_distilbert.shape}")

# Tune thresholds (this part runs on CPU)
ths_distilbert = np.zeros(logits_distilbert.shape[1])
for k in range(logits_distilbert.shape[1]):
    s = logits_distilbert[:, k]
    best_f1, best_t = 0.0, 0.0
    candidates = np.quantile(s, np.linspace(0.01, 0.99, 50))
    for t in candidates:
        preds_k = (s >= t).astype(int)
        f1 = f1_score(y_va[:, k], preds_k, zero_division=0)
        if f1 > best_f1:
            best_f1, best_t = f1, t
    ths_distilbert[k] = best_t

pred_distilbert = (logits_distilbert >= ths_distilbert).astype(int)
metrics_distilbert = compute_metrics(y_va, pred_distilbert)
print(f"DistilBERT - F1: {metrics_distilbert['f1']:.4f}, Precision: {metrics_distilbert['precision']:.4f}, Recall: {metrics_distilbert['recall']:.4f}, Hamming: {metrics_distilbert['hamming_loss']:.4f}")

# Release cached GPU memory
torch.cuda.empty_cache()

Using device: cuda
Processed 640/848 samples...
Predictions complete! Shape: (848, 18)
DistilBERT - F1: 0.6508, Precision: 0.6341, Recall: 0.6831, Hamming: 0.0927


In [33]:
# Save the DistilBERT model
model_distilbert.save_pretrained("./distilbert_model")
tokenizer_distilbert.save_pretrained("./distilbert_model")
np.save("ths_distilbert.npy", ths_distilbert)
print("DistilBERT model saved!")

DistilBERT model saved!


## 8. DeBERTa-v3 (Top Model)

In [34]:
tokenizer_deberta = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
model_deberta = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-base",
    num_labels=len(mlb.classes_),
    problem_type="multi_label_classification"
)

train_dataset_deberta = MovieGenreDataset(X_tr, y_tr, tokenizer_deberta, max_length=256)
val_dataset_deberta = MovieGenreDataset(X_va, y_va, tokenizer_deberta, max_length=256)
print(f"DeBERTa datasets created")

Some weights of DebertaV2ForSequenceClassification were not initialized from the model checkpoint at microsoft/deberta-v3-base and are newly initialized: ['classifier.bias', 'classifier.weight', 'pooler.dense.bias', 'pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


DeBERTa datasets created


In [35]:
training_args_deberta = TrainingArguments(
    output_dir='./deberta_results',
    num_train_epochs=8,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,
    warmup_ratio=0.1,
    weight_decay=0.01,
    logging_steps=100,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    fp16=True,
    gradient_accumulation_steps=2,
    label_smoothing_factor=0.1,  # no effect: CustomTrainer.compute_loss never uses the label smoother
)

trainer_deberta = CustomTrainer(
    model=model_deberta,
    args=training_args_deberta,
    train_dataset=train_dataset_deberta,
    eval_dataset=val_dataset_deberta,
)

print("Training DeBERTa-v3 with Focal Loss + Label Smoothing (8 epochs)...")
trainer_deberta.train()
print("DeBERTa training complete!")

Training DeBERTa-v3 with Focal Loss + Label Smoothing (8 epochs)...


{'loss': 0.0397, 'grad_norm': 0.1116500273346901, 'learning_rate': 3.883495145631068e-06, 'epoch': 0.16}
{'loss': 0.0246, 'grad_norm': 0.07248342037200928, 'learning_rate': 7.766990291262136e-06, 'epoch': 0.31}
{'loss': 0.0214, 'grad_norm': 0.09353857487440109, 'learning_rate': 1.1650485436893204e-05, 'epoch': 0.47}
{'loss': 0.0194, 'grad_norm': 0.09741342812776566, 'learning_rate': 1.5533980582524273e-05, 'epoch': 0.62}
{'loss': 0.0185, 'grad_norm': 0.08825855702161789, 'learning_rate': 1.9417475728155343e-05, 'epoch': 0.78}
{'loss': 0.0176, 'grad_norm': 0.10684588551521301, 'learning_rate': 1.9632750054007345e-05, 'epoch': 0.93}


{'eval_loss': 0.01641504094004631, 'eval_runtime': 2.5902, 'eval_samples_per_second': 327.386, 'eval_steps_per_second': 20.462, 'epoch': 1.0}
{'loss': 0.0164, 'grad_norm': 0.14727799594402313, 'learning_rate': 1.9200691294015988e-05, 'epoch': 1.09}
{'loss': 0.0161, 'grad_norm': 0.09319683164358139, 'learning_rate': 1.876863253402463e-05, 'epoch': 1.24}
{'loss': 0.0156, 'grad_norm': 0.09917845577001572, 'learning_rate': 1.8336573774033272e-05, 'epoch': 1.4}
{'loss': 0.015, 'grad_norm': 0.08475015312433243, 'learning_rate': 1.790451501404191e-05, 'epoch': 1.55}
{'loss': 0.0147, 'grad_norm': 0.08682149648666382, 'learning_rate': 1.747245625405055e-05, 'epoch': 1.71}
{'loss': 0.0145, 'grad_norm': 0.08082383871078491, 'learning_rate': 1.7040397494059192e-05, 'epoch': 1.86}


{'eval_loss': 0.015101233497262001, 'eval_runtime': 2.6201, 'eval_samples_per_second': 323.653, 'eval_steps_per_second': 20.228, 'epoch': 2.0}
{'loss': 0.0145, 'grad_norm': 0.09119341522455215, 'learning_rate': 1.6608338734067835e-05, 'epoch': 2.02}
{'loss': 0.0128, 'grad_norm': 0.0995490625500679, 'learning_rate': 1.6176279974076477e-05, 'epoch': 2.18}
{'loss': 0.0124, 'grad_norm': 0.07082607597112656, 'learning_rate': 1.5744221214085116e-05, 'epoch': 2.33}
{'loss': 0.0125, 'grad_norm': 0.10868402570486069, 'learning_rate': 1.5312162454093758e-05, 'epoch': 2.49}
{'loss': 0.0122, 'grad_norm': 0.08975180983543396, 'learning_rate': 1.4880103694102399e-05, 'epoch': 2.64}
{'loss': 0.0127, 'grad_norm': 0.11685547232627869, 'learning_rate': 1.4448044934111041e-05, 'epoch': 2.8}
{'loss': 0.0124, 'grad_norm': 0.10165052115917206, 'learning_rate': 1.4015986174119682e-05, 'epoch': 2.95}


{'eval_loss': 0.015021542087197304, 'eval_runtime': 2.599, 'eval_samples_per_second': 326.281, 'eval_steps_per_second': 20.393, 'epoch': 3.0}
{'loss': 0.0113, 'grad_norm': 0.0949220284819603, 'learning_rate': 1.3583927414128322e-05, 'epoch': 3.11}
{'loss': 0.0107, 'grad_norm': 0.11432980000972748, 'learning_rate': 1.3151868654136963e-05, 'epoch': 3.26}
{'loss': 0.0106, 'grad_norm': 0.08933494985103607, 'learning_rate': 1.2719809894145605e-05, 'epoch': 3.42}
{'loss': 0.0102, 'grad_norm': 0.09144226461648941, 'learning_rate': 1.2287751134154246e-05, 'epoch': 3.57}
{'loss': 0.0106, 'grad_norm': 0.07471466809511185, 'learning_rate': 1.1855692374162888e-05, 'epoch': 3.73}
{'loss': 0.0103, 'grad_norm': 0.10270533710718155, 'learning_rate': 1.1423633614171527e-05, 'epoch': 3.89}


{'eval_loss': 0.015729080885648727, 'eval_runtime': 2.6696, 'eval_samples_per_second': 317.647, 'eval_steps_per_second': 19.853, 'epoch': 4.0}
{'loss': 0.0102, 'grad_norm': 0.06482283025979996, 'learning_rate': 1.099157485418017e-05, 'epoch': 4.04}
{'loss': 0.0089, 'grad_norm': 0.10230880230665207, 'learning_rate': 1.055951609418881e-05, 'epoch': 4.2}
{'loss': 0.0089, 'grad_norm': 0.07946794480085373, 'learning_rate': 1.0127457334197452e-05, 'epoch': 4.35}
{'loss': 0.0089, 'grad_norm': 0.09211118519306183, 'learning_rate': 9.695398574206093e-06, 'epoch': 4.51}
{'loss': 0.009, 'grad_norm': 0.08780544996261597, 'learning_rate': 9.263339814214734e-06, 'epoch': 4.66}
{'loss': 0.0091, 'grad_norm': 0.09670824557542801, 'learning_rate': 8.831281054223376e-06, 'epoch': 4.82}
{'loss': 0.0087, 'grad_norm': 0.11529359966516495, 'learning_rate': 8.399222294232016e-06, 'epoch': 4.97}
{'eval_loss': 0.016008129343390465, 'eval_runtime': 2.6157, 'eval_samples_per_second': 324.2, 'eval_steps_per_second': 20.263, 'epoch': 5.0}
{'loss': 0.008, 'grad_norm': 0.08440888673067093, 'learning_rate': 7.967163534240657e-06, 'epoch': 5.13}
{'loss': 0.0079, 'grad_norm': 0.08753979206085205, 'learning_rate': 7.5351047742492986e-06, 'epoch': 5.28}
{'loss': 0.0076, 'grad_norm': 0.14622923731803894, 'learning_rate': 7.103046014257939e-06, 'epoch': 5.44}
{'loss': 0.0072, 'grad_norm': 0.08542724698781967, 'learning_rate': 6.670987254266581e-06, 'epoch': 5.59}
{'loss': 0.0074, 'grad_norm': 0.10288293659687042, 'learning_rate': 6.238928494275221e-06, 'epoch': 5.75}
{'loss': 0.0075, 'grad_norm': 0.12135576456785202, 'learning_rate': 5.8068697342838636e-06, 'epoch': 5.91}
{'eval_loss': 0.017079150304198265, 'eval_runtime': 2.644, 'eval_samples_per_second': 320.732, 'eval_steps_per_second': 20.046, 'epoch': 6.0}
{'loss': 0.0072, 'grad_norm': 0.09334345161914825, 'learning_rate': 5.374810974292505e-06, 'epoch': 6.06}
{'loss': 0.0067, 'grad_norm': 0.1035103127360344, 'learning_rate': 4.942752214301146e-06, 'epoch': 6.22}
{'loss': 0.0066, 'grad_norm': 0.11541051417589188, 'learning_rate': 4.510693454309786e-06, 'epoch': 6.37}
{'loss': 0.0065, 'grad_norm': 0.15289857983589172, 'learning_rate': 4.078634694318428e-06, 'epoch': 6.53}
{'loss': 0.0063, 'grad_norm': 0.10909826308488846, 'learning_rate': 3.646575934327069e-06, 'epoch': 6.68}
{'loss': 0.0065, 'grad_norm': 0.08515895903110504, 'learning_rate': 3.2145171743357102e-06, 'epoch': 6.84}
{'loss': 0.0065, 'grad_norm': 0.09251238405704498, 'learning_rate': 2.7824584143443513e-06, 'epoch': 6.99}
{'eval_loss': 0.017723752185702324, 'eval_runtime': 2.6116, 'eval_samples_per_second': 324.705, 'eval_steps_per_second': 20.294, 'epoch': 7.0}
{'loss': 0.0063, 'grad_norm': 0.07889391481876373, 'learning_rate': 2.3503996543529923e-06, 'epoch': 7.15}
{'loss': 0.0055, 'grad_norm': 0.0853276252746582, 'learning_rate': 1.9183408943616333e-06, 'epoch': 7.3}
{'loss': 0.0058, 'grad_norm': 0.10247398912906647, 'learning_rate': 1.4862821343702744e-06, 'epoch': 7.46}
{'loss': 0.0059, 'grad_norm': 0.11821724474430084, 'learning_rate': 1.0542233743789156e-06, 'epoch': 7.61}
{'loss': 0.0059, 'grad_norm': 0.10575602203607559, 'learning_rate': 6.221646143875568e-07, 'epoch': 7.77}
{'loss': 0.006, 'grad_norm': 0.09205249696969986, 'learning_rate': 1.901058543961979e-07, 'epoch': 7.93}
{'eval_loss': 0.018022729083895683, 'eval_runtime': 2.6007, 'eval_samples_per_second': 326.06, 'eval_steps_per_second': 20.379, 'epoch': 7.99}
{'train_runtime': 1083.3609, 'train_samples_per_second': 76.03, 'train_steps_per_second': 4.748, 'train_loss': 0.011278900750919968, 'epoch': 7.99}
DeBERTa training complete!


In [36]:
# Configure device (GPU if available)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# Move the model to the GPU and switch off dropout for deterministic inference
model_deberta = model_deberta.to(device)
model_deberta.eval()

# Process in batches to avoid OOM (DeBERTa uses max_length=256, so it needs more memory)
batch_size = 16  # Smaller than DistilBERT because the model is larger
all_logits = []

with torch.no_grad():
    for i in range(0, len(X_va), batch_size):
        batch_texts = X_va[i:i+batch_size].tolist()
        
        # Tokenize and move to GPU
        val_inputs = tokenizer_deberta(
            batch_texts, 
            truncation=True, 
            padding=True, 
            max_length=256, 
            return_tensors='pt'
        )
        val_inputs = {k: v.to(device) for k, v in val_inputs.items()}
        
        # Forward pass on GPU
        outputs = model_deberta(**val_inputs)
        
        # Sigmoid turns the logits into per-genre probabilities (despite the variable
        # name logits_*); move them to CPU as numpy arrays
        logits_batch = torch.sigmoid(outputs.logits).cpu().numpy()
        all_logits.append(logits_batch)
        
        if (i // batch_size) % 10 == 0:
            print(f"Processed {i}/{len(X_va)} samples...", end='\r')

# Concatenate all batches
logits_deberta = np.vstack(all_logits)
print(f"\nPredictions complete! Shape: {logits_deberta.shape}")

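# Tune one threshold per genre (CPU only). The thresholds are chosen on the same
# validation set the metrics below are computed on, so those metrics are somewhat optimistic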
ths_deberta = np.zeros(logits_deberta.shape[1])
for k in range(logits_deberta.shape[1]):
    s = logits_deberta[:, k]
    best_f1, best_t = 0.0, 0.0
    candidates = np.quantile(s, np.linspace(0.01, 0.99, 50))
    for t in candidates:
        preds_k = (s >= t).astype(int)
        f1 = f1_score(y_va[:, k], preds_k, zero_division=0)
        if f1 > best_f1:
            best_f1, best_t = f1, t
    ths_deberta[k] = best_t

pred_deberta = (logits_deberta >= ths_deberta).astype(int)
metrics_deberta = compute_metrics(y_va, pred_deberta)
print(f"DeBERTa-v3 - F1: {metrics_deberta['f1']:.4f}, Precision: {metrics_deberta['precision']:.4f}, Recall: {metrics_deberta['recall']:.4f}, Hamming: {metrics_deberta['hamming_loss']:.4f}")

# Free cached GPU memory
torch.cuda.empty_cache()

Using device: cuda
Processed 800/848 samples...
Predictions complete! Shape: (848, 18)
DeBERTa-v3 - F1: 0.6385, Precision: 0.6263, Recall: 0.6805, Hamming: 0.0993
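The per-genre threshold search in this cell is repeated, with small variations, in the Optuna objective and in cell `In [45]`. A minimal sketch of factoring it into one helper (the name `tune_thresholds` is hypothetical, not part of the notebook) so the copies cannot drift apart:

```python
import numpy as np
from sklearn.metrics import f1_score


def tune_thresholds(scores, y_true, n_candidates=50, q_low=0.01, q_high=0.99):
    """Per-label threshold search that maximizes binary F1.

    scores: (n_samples, n_labels) model scores (probabilities or margins)
    y_true: (n_samples, n_labels) binary indicator matrix
    """
    levels = np.linspace(q_low, q_high, n_candidates)
    thresholds = np.zeros(scores.shape[1])
    for k in range(scores.shape[1]):
        s = scores[:, k]
        best_f1, best_t = 0.0, 0.0  # same 0.0 fallback as the notebook's loop
        # Candidate cut-offs are quantiles of this label's own score distribution
        for t in np.quantile(s, levels):
            f1 = f1_score(y_true[:, k], (s >= t).astype(int), zero_division=0)
            if f1 > best_f1:  # strict ">" keeps the first (lowest) best candidate
                best_f1, best_t = f1, t
        thresholds[k] = best_t
    return thresholds
```

`tune_thresholds(logits_deberta, y_va)` should reproduce the loop above. The finer grid in cell `In [45]` corresponds to `n_candidates=100, q_low=0.0, q_high=1.0`; its extra `np.unique` only drops repeated candidates, which cannot change the winner.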


In [37]:
# Save the DeBERTa model, its tokenizer and the per-genre thresholds
model_deberta.save_pretrained("./deberta_model")
tokenizer_deberta.save_pretrained("./deberta_model")
np.save("ths_deberta.npy", ths_deberta)
print("DeBERTa model saved!")

DeBERTa model saved!


## 9. Stacking Ensemble (Meta-learner)

In [38]:
from sklearn.linear_model import RidgeClassifierCV

# IMPORTANT: proper stacking needs out-of-fold predictions from ALL base models.
# The transformers (DistilBERT/DeBERTa) have no readily available OOF predictions,
# so the meta-model below is fit on the validation predictions and then scored on
# the same rows: its metrics are in-sample and optimistic. For that reason the
# optimized weighted ensemble is the main model; this stacking code is OPTIONAL.

# Stack the VALIDATION scores of all five models (5 models × 18 genres = 90 columns)
stacked_features_val = np.column_stack([
    logits_deberta, 
    logits_distilbert, 
    logits_logreg, 
    logits_xgb, 
    logits_svc_cal
])

print(f"Stacked features shape: {stacked_features_val.shape}")

# Meta-learner: one RidgeClassifierCV per genre; alpha is picked by 3-fold CV to limit overfitting
meta_model = OneVsRestClassifier(
    RidgeClassifierCV(alphas=[0.1, 0.5, 1.0, 5.0, 10.0], cv=3),
    n_jobs=-1
)

print("Training stacking meta-model...")
print("⚠️  WARNING: Training on validation set (not ideal but validation is small)")
meta_model.fit(stacked_features_val, y_va)

# Meta-model predictions on the same rows it was fit on (in-sample)
pred_stacking = meta_model.predict(stacked_features_val)
metrics_stacking = compute_metrics(y_va, pred_stacking)
print(f"Stacking Ensemble - F1: {metrics_stacking['f1']:.4f}, Precision: {metrics_stacking['precision']:.4f}, Recall: {metrics_stacking['recall']:.4f}, Hamming: {metrics_stacking['hamming_loss']:.4f}")
print(f"⚠️  These metrics may be optimistic - prefer Weighted Ensemble metrics for true performance")

Stacked features shape: (848, 90)
Training stacking meta-model...
Stacking Ensemble - F1: 0.6637, Precision: 0.7883, Recall: 0.5855, Hamming: 0.0710
⚠️  These metrics may be optimistic - prefer Weighted Ensemble metrics for true performance
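Because `meta_model` is fit and scored on the same 848 validation rows, the 0.6637 F1 above is an in-sample number. A sketch of a less optimistic estimate, assuming only the stacked validation matrix and its labels: predict each row with a meta-model fit on the other folds, using scikit-learn's `cross_val_predict`. The function name is hypothetical, and micro-averaged F1 is used only because `compute_metrics` is not shown here.

```python
import numpy as np
from sklearn.linear_model import RidgeClassifierCV
from sklearn.metrics import f1_score
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.multiclass import OneVsRestClassifier


def oof_stacking_predictions(stacked_features, y_true, n_splits=5, seed=0):
    """Out-of-fold meta-model predictions: every row is predicted by a
    meta-model that did not see that row during fitting."""
    # Same meta-learner configuration as in the cell above
    meta = OneVsRestClassifier(RidgeClassifierCV(alphas=[0.1, 0.5, 1.0, 5.0, 10.0], cv=3))
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    oof_pred = cross_val_predict(meta, stacked_features, y_true, cv=folds)
    # Micro-averaged F1 is an assumption; compute_metrics may aggregate differently
    return oof_pred, f1_score(y_true, oof_pred, average="micro", zero_division=0)
```

In the notebook this would be `oof_pred, oof_f1 = oof_stacking_predictions(stacked_features_val, y_va)`; `compute_metrics(y_va, oof_pred)` then gives a fairer figure for the stacking row of the summary.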


In [39]:
# Save the meta-model (optional; the weighted ensemble is preferred)
import joblib
joblib.dump(meta_model, "meta_model_stacking.joblib")
print("Stacking meta-model saved (use with caution - may be overfit)")

Stacking meta-model saved (use with caution - may be overfit)


## 10. Ensemble Weight Optimization with Optuna

In [None]:
import optuna
from sklearn.metrics import hamming_loss

def objective_with_hamming(trial):
    w_deberta = trial.suggest_float("w_deberta", 0.3, 0.6)
    w_distilbert = trial.suggest_float("w_distilbert", 0.1, 0.4)
    w_logreg = trial.suggest_float("w_logreg", 0.1, 0.3)
    w_xgb = trial.suggest_float("w_xgb", 0.05, 0.25)
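    # The remaining mass goes to the calibrated SVC; it is clamped to 0 whenever the
    # four weights above already sum to more than 1 (their ranges allow up to 1.55)
    w_svc = max(0.0, 1.0 - w_deberta - w_distilbert - w_logreg - w_xgb)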
    
    ensemble_logits_opt = (w_deberta * logits_deberta + 
                           w_distilbert * logits_distilbert + 
                           w_logreg * logits_logreg + 
                           w_xgb * logits_xgb + 
                           w_svc * logits_svc_cal)
    
    # Tune per-genre thresholds for this weight combination
    ths_opt = np.zeros(ensemble_logits_opt.shape[1])
    for k in range(ensemble_logits_opt.shape[1]):
        s = ensemble_logits_opt[:, k]
        best_f1, best_t = 0.0, 0.0
        candidates = np.quantile(s, np.linspace(0.01, 0.99, 50))
        for t in candidates:
            preds_k = (s >= t).astype(int)
            f1 = f1_score(y_va[:, k], preds_k, zero_division=0)
            if f1 > best_f1:
                best_f1, best_t = f1, t
        ths_opt[k] = best_t
    
    pred_opt = (ensemble_logits_opt >= ths_opt).astype(int)
    
    # Objective: macro F1 minus 0.3 × Hamming loss
    f1_macro = f1_score(y_va, pred_opt, average='macro')
    hamming = hamming_loss(y_va, pred_opt)
    
    return f1_macro - 0.3 * hamming

print("Optimizing ensemble weights (F1 macro - Hamming Loss)...")
study = optuna.create_study(direction="maximize")
study.optimize(objective_with_hamming, n_trials=50, show_progress_bar=True)

print(f"\nBest score (F1 - 0.3*Hamming): {study.best_value:.4f}")
print("Best weights:", study.best_params)
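In `objective_with_hamming`, the four sampled weights can sum to anything between 0.55 and 1.55, and `w_svc` is clamped to 0 whenever they exceed 1, so the search is not over a proper convex combination. A common alternative, sketched here with a hypothetical helper, is to sample every weight (the SVC one included) independently and divide by their sum; inside the objective that would look like `raw = [trial.suggest_float(f"w_{name}", 0.0, 1.0) for name in names]` followed by a call to the helper.

```python
import numpy as np


def blend_with_normalized_weights(score_blocks, raw_weights):
    """Weighted sum of equally shaped score matrices, with the weights
    rescaled to sum to 1 (a convex combination)."""
    w = np.asarray(raw_weights, dtype=float)
    if np.any(w < 0) or w.sum() == 0:
        raise ValueError("raw_weights must be non-negative and not all zero")
    w = w / w.sum()
    blended = sum(wi * s for wi, s in zip(w, score_blocks))
    return blended, w
```

Because the thresholds are quantiles of the blended score, multiplying all weights by the same positive constant does not change any prediction; normalizing mainly removes the clamp and makes the reported weights interpretable.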

In [45]:

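# Weights set by hand: the Optuna cell above has no recorded output in this notebook.
# w_svc_opt = 1 - 0.4 - 0.4 - 0.1 - 0.1 is 0 up to floating-point rounding,
# so the calibrated SVC effectively does not contribute.
w_deberta_opt = 0.4
w_distilbert_opt = 0.4
w_logreg_opt = 0.1
w_xgb_opt = 0.1
w_svc_opt = 1.0 - w_deberta_opt - w_distilbert_opt - w_logreg_opt - w_xgb_opt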

ensemble_optimized = (w_deberta_opt * logits_deberta + 
                      w_distilbert_opt * logits_distilbert + 
                      w_logreg_opt * logits_logreg + 
                      w_xgb_opt * logits_xgb + 
                      w_svc_opt * logits_svc_cal)

# Per-genre threshold search on a finer grid (100 quantiles, duplicates removed)
ths_optimized = np.zeros(ensemble_optimized.shape[1])
for k in range(ensemble_optimized.shape[1]):
    s = ensemble_optimized[:, k]
    best_f1, best_t = 0.0, 0.0
    candidates = np.unique(np.quantile(s, np.linspace(0, 1, 100)))
    for t in candidates:
        preds_k = (s >= t).astype(int)
        f1 = f1_score(y_va[:, k], preds_k, zero_division=0)
        if f1 > best_f1:
            best_f1, best_t = f1, t
    ths_optimized[k] = best_t

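# Label-cardinality adjustment to reduce Hamming loss: nudge all thresholds up or down
# until the average number of predicted genres per movie is within 0.1 of the true
# average on the validation set. Scaling by 1.02 / 0.98 assumes positive thresholds.
avg_labels_va = y_va.sum(axis=1).mean()
for iteration in range(10):
    pred_optimized = (ensemble_optimized >= ths_optimized).astype(int)
    current_avg = pred_optimized.sum(axis=1).mean()
    
    if abs(current_avg - avg_labels_va) < 0.1:
        break
    
    if current_avg > avg_labels_va:
        ths_optimized *= 1.02  # too many genres predicted: raise thresholds
    else:
        ths_optimized *= 0.98  # too few genres predicted: lower thresholds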

pred_optimized = (ensemble_optimized >= ths_optimized).astype(int)
metrics_optimized = compute_metrics(y_va, pred_optimized)
print(f"Optimized Ensemble - F1: {metrics_optimized['f1']:.4f}, Precision: {metrics_optimized['precision']:.4f}, Recall: {metrics_optimized['recall']:.4f}, Hamming: {metrics_optimized['hamming_loss']:.4f}")

Optimized Ensemble - F1: 0.6519, Precision: 0.6733, Recall: 0.6602, Hamming: 0.0903
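Test cell `In [49]` builds `logits_logreg_test` with `decision_function`, i.e. unbounded margins that can be negative, while the DeBERTa, DistilBERT and XGBoost scores are probabilities in [0, 1]. If the validation `logits_logreg` was built the same way (that cell is not shown here), a 0.1 weight on LogReg does not mean the same as 0.1 on a probability, and negative thresholds would make the `*= 1.02` / `*= 0.98` nudges move the wrong way. Assuming `clf_logreg` is a `OneVsRestClassifier(LogisticRegression)`, the logistic function maps its margins to exactly its predicted probabilities, as this toy check (synthetic data, not the notebook's features) shows:

```python
import numpy as np
from scipy.special import expit
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Toy multilabel problem standing in for the TF-IDF + BGE features (assumption)
X, Y = make_multilabel_classification(n_samples=200, n_features=20, n_classes=5, random_state=0)
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)

margins = clf.decision_function(X)     # unbounded, can be negative
probs_from_margins = expit(margins)    # logistic link maps margins into (0, 1)
probs = clf.predict_proba(X)           # per-label P(y=1), one column per label
```

In the notebook this would mean using `clf_logreg.predict_proba(...)` (or `expit` of the existing margins) on both validation and test before blending, then re-tuning the thresholds.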


## 11. Test-Time Augmentation for DeBERTa

In [46]:
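def tta_predict_deberta(texts, model, tokenizer, n_augmentations=3):
    # First, unbatched version (redefined with batching in cell In [56] below).
    # The "augmentations" are Monte Carlo dropout: the texts are not modified;
    # model.train() only re-enables dropout so that repeated passes differ.
    device = next(model.parameters()).device  # keep inputs on the model's device
    inputs = tokenizer(texts, truncation=True, padding=True, max_length=256, return_tensors='pt')
    inputs = {k: v.to(device) for k, v in inputs.items()}
    all_predictions = []
    
    with torch.no_grad():
        model.eval()  # deterministic base pass (dropout off)
        all_predictions.append(torch.sigmoid(model(**inputs).logits).cpu().numpy())
        
        model.train()  # stochastic passes (dropout on)
        for _ in range(n_augmentations):
            all_predictions.append(torch.sigmoid(model(**inputs).logits).cpu().numpy())
    
    model.eval()  # do not leave dropout active for later inference
    return np.mean(all_predictions, axis=0)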
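What both `tta_predict_deberta` versions call augmentation is Monte Carlo dropout: the input texts are unchanged, and `model.train()` under `torch.no_grad()` mainly makes dropout active so repeated passes differ. A sketch on a toy network (not DeBERTa) that switches on only the `nn.Dropout` modules and leaves the model in eval mode afterwards. Some model implementations define their own dropout classes instead of `nn.Dropout`, so check which modules your installed `transformers` version uses before relying on the `isinstance` filter.

```python
import torch
import torch.nn as nn


def enable_mc_dropout(model: nn.Module) -> None:
    """Eval mode everywhere, then train mode for nn.Dropout layers only."""
    model.eval()
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.train()


@torch.no_grad()
def mc_dropout_predict(model: nn.Module, inputs: torch.Tensor, n_samples: int = 4) -> torch.Tensor:
    """Average sigmoid outputs over n_samples stochastic forward passes."""
    enable_mc_dropout(model)
    probs = torch.stack([torch.sigmoid(model(inputs)) for _ in range(n_samples)])
    model.eval()  # restore deterministic behaviour for any later inference
    return probs.mean(dim=0)
```

Toggling only the dropout modules avoids changing any other train/eval-dependent layer, and the final `model.eval()` removes the need to remember to reset the model after the call.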

In [47]:
# Load the test set and build the model input as "movie_name [SEP] description"
df_test = pd.read_csv(test_dir)
df_test["text"] = df_test["movie_name"].fillna("") + " [SEP] " + df_test["description"].fillna("")
print(f"Test dataset size: {len(df_test)}")

Test dataset size: 942


## 12. Individual Model Predictions on the Test Set

In [48]:
# Build test features (NO augmentation) with the already-fitted TF-IDF vectorizers and BGE encoder
print("Generating TF-IDF features for test...")
Xw_test = tfidf_word.transform(df_test["text"])
Xc_test = tfidf_char.transform(df_test["text"])
X_test_tfidf = sp_hstack([Xw_test, Xc_test], format="csr")

print("Generating BGE embeddings for test...")
emb_test = st_model.encode(df_test["text"].tolist(), show_progress_bar=True, batch_size=16, normalize_embeddings=True)
X_test_combined = sp_hstack([X_test_tfidf, csr_matrix(emb_test)], format="csr")
print(f"Test features shape: {X_test_combined.shape}")

Generating TF-IDF features for test...
Generating BGE embeddings for test...
Test features shape: (942, 268740)


In [49]:
# 1. LogisticRegression predictions
print("Generating LogReg predictions...")
logits_logreg_test = clf_logreg.decision_function(X_test_combined)
pred_logreg_test = (logits_logreg_test >= ths_logreg).astype(int)

pred_labels_logreg = [", ".join([mlb.classes_[j] for j, v in enumerate(row) if v == 1]) for row in pred_logreg_test]
result_logreg = pd.DataFrame({
    "movie_name": df_test["movie_name"],
    "genre": pred_labels_logreg,
    "description": df_test["description"]
})
result_logreg.to_csv("dataset_test_preds_logreg.csv", index=False)
print(f"✓ LogReg predictions saved: dataset_test_preds_logreg.csv")

Generating LogReg predictions...
✓ LogReg predictions saved: dataset_test_preds_logreg.csv
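Every prediction cell repeats the same list comprehension to turn a 0/1 matrix into comma-separated genres. `MultiLabelBinarizer.inverse_transform` already returns, for each row, the tuple of active classes in `mlb.classes_` order, so a small helper (hypothetical name `predictions_to_frame`) can build the whole submission frame:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer


def predictions_to_frame(df_test: pd.DataFrame, pred: np.ndarray, mlb: MultiLabelBinarizer) -> pd.DataFrame:
    """Binary (n_samples, n_labels) predictions -> movie_name / genre / description frame."""
    # One tuple of class names per row, in mlb.classes_ order (empty tuple -> "")
    genres = [", ".join(labels) for labels in mlb.inverse_transform(pred)]
    return pd.DataFrame({
        "movie_name": df_test["movie_name"],
        "genre": genres,
        "description": df_test["description"],
    })
```

For this cell: `predictions_to_frame(df_test, pred_logreg_test, mlb).to_csv("dataset_test_preds_logreg.csv", index=False)`.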


In [50]:
# 2. XGBoost predictions
print("Generating XGBoost predictions...")
pred_proba_xgb_test = clf_xgb.predict_proba(emb_test)
logits_xgb_test = np.column_stack([p[:, 1] for p in pred_proba_xgb_test])
pred_xgb_test = (logits_xgb_test >= ths_xgb).astype(int)

pred_labels_xgb = [", ".join([mlb.classes_[j] for j, v in enumerate(row) if v == 1]) for row in pred_xgb_test]
result_xgb = pd.DataFrame({
    "movie_name": df_test["movie_name"],
    "genre": pred_labels_xgb,
    "description": df_test["description"]
})
result_xgb.to_csv("dataset_test_preds_xgb.csv", index=False)
print(f"✓ XGBoost predictions saved: dataset_test_preds_xgb.csv")

Generating XGBoost predictions...
✓ XGBoost predictions saved: dataset_test_preds_xgb.csv


In [51]:
# 3. LinearSVC predictions
print("Generating LinearSVC predictions...")
logits_svc_test = clf_svc.decision_function(X_test_tfidf)
pred_svc_test = (logits_svc_test >= ths_svc).astype(int)

pred_labels_svc = [", ".join([mlb.classes_[j] for j, v in enumerate(row) if v == 1]) for row in pred_svc_test]
result_svc = pd.DataFrame({
    "movie_name": df_test["movie_name"],
    "genre": pred_labels_svc,
    "description": df_test["description"]
})
result_svc.to_csv("dataset_test_preds_svc.csv", index=False)
print(f"✓ LinearSVC predictions saved: dataset_test_preds_svc.csv")

Generating LinearSVC predictions...
✓ LinearSVC predictions saved: dataset_test_preds_svc.csv


In [52]:
# 4. Calibrated SVC predictions
print("Generating Calibrated SVC predictions...")
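# clf_svc_calibrated is a Python list (see the AttributeError below). Assuming it holds
# one binary calibrated classifier per genre in mlb.classes_ order, stack each P(y=1) column:
logits_svc_test_cal = np.column_stack([
    clf.predict_proba(X_test_tfidf)[:, 1] for clf in clf_svc_calibrated
])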
pred_svc_cal_test = (logits_svc_test_cal >= ths_svc_cal).astype(int)

pred_labels_svc_cal = [", ".join([mlb.classes_[j] for j, v in enumerate(row) if v == 1]) for row in pred_svc_cal_test]
result_svc_cal = pd.DataFrame({
    "movie_name": df_test["movie_name"],
    "genre": pred_labels_svc_cal,
    "description": df_test["description"]
})
result_svc_cal.to_csv("dataset_test_preds_svc_calibrated.csv", index=False)
print(f"✓ Calibrated SVC predictions saved: dataset_test_preds_svc_calibrated.csv")

Generating Calibrated SVC predictions...


AttributeError: 'list' object has no attribute 'predict_proba'
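The `AttributeError` shows that `clf_svc_calibrated` is a Python list rather than a single estimator, and the original `[:, :, 1].T` indexing expects a 3-D array that `predict_proba` does not return. Assuming the list holds one binary `CalibratedClassifierCV` per genre in `mlb.classes_` order (how it was built is not shown in this section), the positive-class probabilities can be stacked column by column. The toy reconstruction and the helper name below are hypothetical:

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_multilabel_classification
from sklearn.svm import LinearSVC

# Toy data standing in for the TF-IDF features (assumption)
X, Y = make_multilabel_classification(n_samples=200, n_features=20, n_classes=4, random_state=0)

# Hypothetical reconstruction: one sigmoid-calibrated LinearSVC per label, kept in a list
calibrated_per_label = [
    CalibratedClassifierCV(LinearSVC(), method="sigmoid", cv=3).fit(X, Y[:, k])
    for k in range(Y.shape[1])
]


def stack_positive_proba(classifiers, X):
    """Column k = P(label k = 1) from the k-th binary classifier."""
    return np.column_stack([clf.predict_proba(X)[:, 1] for clf in classifiers])


probs = stack_positive_proba(calibrated_per_label, X)
```

With the notebook's objects this would read `logits_svc_test_cal = stack_positive_proba(clf_svc_calibrated, X_test_tfidf)`.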

In [54]:
# 5. DistilBERT predictions
print("Generating DistilBERT predictions on TEST set...")

# Configure device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# Move the model to the GPU and switch to eval mode
model_distilbert = model_distilbert.to(device)
model_distilbert.eval()

# Process in batches
batch_size = 32
all_logits = []

with torch.no_grad():
    for i in range(0, len(df_test), batch_size):
        batch_texts = df_test["text"].iloc[i:i+batch_size].tolist()
        
        # Tokenize and move to GPU
        test_inputs = tokenizer_distilbert(
            batch_texts,
            truncation=True,
            padding=True,
            max_length=128,
            return_tensors='pt'
        )
        test_inputs = {k: v.to(device) for k, v in test_inputs.items()}
        
        # Forward pass on GPU
        outputs = model_distilbert(**test_inputs)
        
        # Sigmoid probabilities, moved to CPU
        logits_batch = torch.sigmoid(outputs.logits).cpu().numpy()
        all_logits.append(logits_batch)
        
        if (i // batch_size) % 20 == 0:
            print(f"Processed {i}/{len(df_test)} test samples...", end='\r')

# Concatenate all batches
logits_distilbert_test = np.vstack(all_logits)
print(f"\nTest predictions complete! Shape: {logits_distilbert_test.shape}")

# Apply the per-genre thresholds tuned on validation
pred_distilbert_test = (logits_distilbert_test >= ths_distilbert).astype(int)

# Convert to comma-separated genre strings
pred_labels_distilbert = [", ".join([mlb.classes_[j] for j, v in enumerate(row) if v == 1]) for row in pred_distilbert_test]
result_distilbert = pd.DataFrame({
    "movie_name": df_test["movie_name"],
    "genre": pred_labels_distilbert,
    "description": df_test["description"]
})
result_distilbert.to_csv("dataset_test_preds_distilbert.csv", index=False)
print(f"✓ DistilBERT predictions saved: dataset_test_preds_distilbert.csv")

# Free cached GPU memory
torch.cuda.empty_cache()

Generating DistilBERT predictions on TEST set...
Using device: cuda
Processed 640/942 test samples...
Test predictions complete! Shape: (942, 18)
✓ DistilBERT predictions saved: dataset_test_preds_distilbert.csv


In [56]:
def tta_predict_deberta(texts, model, tokenizer, n_augmentations=3, batch_size=16):
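    """
    Monte Carlo dropout "TTA" for DeBERTa, with GPU and batch processing.

    The first pass runs in eval mode; each extra pass runs with dropout enabled,
    and the probabilities of all passes are averaged. The texts are not modified.

    Args:
        texts: List of texts to predict
        model: DeBERTa model
        tokenizer: DeBERTa tokenizer
        n_augmentations: Number of extra stochastic passes (dropout variations)
        batch_size: Batch size (16 by default for DeBERTa)
    """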
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model = model.to(device)
    
    all_predictions = []
    
    # First pass: eval mode (no dropout)
    model.eval()
    batch_preds = []
    with torch.no_grad():
        for i in range(0, len(texts), batch_size):
            batch_texts = texts[i:i+batch_size]
            inputs = tokenizer(
                batch_texts, 
                truncation=True, 
                padding=True, 
                max_length=256, 
                return_tensors='pt'
            )
            inputs = {k: v.to(device) for k, v in inputs.items()}
            outputs = model(**inputs)
            logits = torch.sigmoid(outputs.logits).cpu().numpy()
            batch_preds.append(logits)
            
            if (i // batch_size) % 10 == 0:
                print(f"TTA base prediction: {i}/{len(texts)} samples...", end='\r')
    
    all_predictions.append(np.vstack(batch_preds))
    print(f"\nTTA base prediction complete!")
    
    # Extra passes: train mode (dropout enabled)
    for aug_idx in range(n_augmentations):
        model.train()  # Enables dropout so each pass differs
        batch_preds = []
        
        with torch.no_grad():
            for i in range(0, len(texts), batch_size):
                batch_texts = texts[i:i+batch_size]
                inputs = tokenizer(
                    batch_texts, 
                    truncation=True, 
                    padding=True, 
                    max_length=256, 
                    return_tensors='pt'
                )
                inputs = {k: v.to(device) for k, v in inputs.items()}
                outputs = model(**inputs)
                logits = torch.sigmoid(outputs.logits).cpu().numpy()
                batch_preds.append(logits)
                
                if (i // batch_size) % 10 == 0:
                    print(f"TTA augmentation {aug_idx+1}/{n_augmentations}: {i}/{len(texts)} samples...", end='\r')
        
        all_predictions.append(np.vstack(batch_preds))
        print(f"\nTTA augmentation {aug_idx+1}/{n_augmentations} complete!")
    
    # Average all passes
    final_predictions = np.mean(all_predictions, axis=0)
    
    # Restore eval mode (model.train() above left dropout on) and free cached GPU memory
    model.eval()
    torch.cuda.empty_cache()
    
    return final_predictions

# 6. DeBERTa predictions with TTA
print("Generating DeBERTa predictions with TTA on TEST set...")
print(f"Test dataset size: {len(df_test)}")

# Run the batched TTA
logits_deberta_test_tta = tta_predict_deberta(
    df_test["text"].tolist(), 
    model_deberta, 
    tokenizer_deberta,
    n_augmentations=3,
    batch_size=16  # Adjust to your GPU (e.g. 8 for <6 GB, 24 for >12 GB)
)

print(f"TTA predictions complete! Shape: {logits_deberta_test_tta.shape}")

# Apply the thresholds tuned on (single-pass) validation predictions
pred_deberta_test = (logits_deberta_test_tta >= ths_deberta).astype(int)

# Convert to comma-separated genre strings
pred_labels_deberta = [", ".join([mlb.classes_[j] for j, v in enumerate(row) if v == 1]) for row in pred_deberta_test]
result_deberta = pd.DataFrame({
    "movie_name": df_test["movie_name"],
    "genre": pred_labels_deberta,
    "description": df_test["description"]
})
result_deberta.to_csv("dataset_test_preds_deberta.csv", index=False)
print(f"✓ DeBERTa predictions saved: dataset_test_preds_deberta.csv")

# Final release of cached GPU memory
torch.cuda.empty_cache()

Generating DeBERTa predictions with TTA on TEST set...
Test dataset size: 942
TTA base prediction: 800/942 samples...
TTA base prediction complete!
TTA augmentation 1/3: 800/942 samples...
TTA augmentation 1/3 complete!
TTA augmentation 2/3: 800/942 samples...
TTA augmentation 2/3 complete!
TTA augmentation 3/3: 800/942 samples...
TTA augmentation 3/3 complete!
TTA predictions complete! Shape: (942, 18)
✓ DeBERTa predictions saved: dataset_test_preds_deberta.csv


## 13. Final Ensemble: Selecting the Best

In [58]:
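# Build the weighted ensemble on test. The calibrated-SVC term is left out because
# w_svc_opt is 0 up to floating-point rounding, so the weights match the validation blend
# (DeBERTa scores here come from TTA, while validation used a single pass).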
print("Creating optimized weighted ensemble...")
ensemble_optimized_test = (w_deberta_opt * logits_deberta_test_tta + 
                           w_distilbert_opt * logits_distilbert_test + 
                           w_logreg_opt * logits_logreg_test + 
                           w_xgb_opt * logits_xgb_test)

pred_optimized_test = (ensemble_optimized_test >= ths_optimized).astype(int)

pred_labels_optimized = [", ".join([mlb.classes_[j] for j, v in enumerate(row) if v == 1]) for row in pred_optimized_test]
result_optimized = pd.DataFrame({
    "movie_name": df_test["movie_name"],
    "genre": pred_labels_optimized,
    "description": df_test["description"]
})
result_optimized.to_csv("dataset_test_preds_weighted_ensemble.csv", index=False)
print(f"✓ Weighted Ensemble predictions saved: dataset_test_preds_weighted_ensemble.csv")

Creating optimized weighted ensemble...
✓ Weighted Ensemble predictions saved: dataset_test_preds_weighted_ensemble.csv


In [60]:
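# Build the stacking ensemble. The columns must follow the same model order as
# stacked_features_val (5 models × 18 genres = 90 features); leaving out the calibrated
# SVC block gives 72 features and the ValueError below. Requires cell In [52] to succeed.
print("Creating stacking ensemble...")
stacked_features_test = np.column_stack([
    logits_deberta_test_tta,
    logits_distilbert_test,
    logits_logreg_test,
    logits_xgb_test,
    logits_svc_test_cal
])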

pred_stacking_test = meta_model.predict(stacked_features_test)

pred_labels_stacking = [", ".join([mlb.classes_[j] for j, v in enumerate(row) if v == 1]) for row in pred_stacking_test]
result_stacking = pd.DataFrame({
    "movie_name": df_test["movie_name"],
    "genre": pred_labels_stacking,
    "description": df_test["description"]
})
result_stacking.to_csv("dataset_test_preds_stacking_ensemble.csv", index=False)
print(f"✓ Stacking Ensemble predictions saved: dataset_test_preds_stacking_ensemble.csv")

Creating stacking ensemble...


ValueError: X has 72 features, but RidgeClassifierCV is expecting 90 features as input.
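The `ValueError` is a column-count mismatch: the meta-model was fit on 5 models × 18 genres = 90 columns, while this cell stacks only 4 × 18 = 72 (the calibrated SVC block is missing). Because the Ridge coefficients are positional, the test blocks must also follow the validation order: DeBERTa, DistilBERT, LogReg, XGBoost, calibrated SVC. A small guard (hypothetical helper) makes such a mismatch fail with a clearer message:

```python
import numpy as np


def stack_in_order(blocks, expected_n_features):
    """Column-stack score matrices (in the training order) and check the width."""
    stacked = np.column_stack(blocks)
    if stacked.shape[1] != expected_n_features:
        raise ValueError(
            f"stacked {stacked.shape[1]} columns, meta-model expects {expected_n_features}"
        )
    return stacked
```

Usage: `stack_in_order([logits_deberta_test_tta, logits_distilbert_test, logits_logreg_test, logits_xgb_test, logits_svc_test_cal], stacked_features_val.shape[1])`, which requires the calibrated-SVC cell to have run successfully.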

In [61]:
# Select the final ensemble for submission
print("="*80)
print("SELECTING BEST ENSEMBLE FOR FINAL SUBMISSION")
print("="*80)
print(f"Stacking Ensemble Validation F1: {metrics_stacking['f1']:.4f} (may be optimistic)")
print(f"Weighted Ensemble Validation F1: {metrics_optimized['f1']:.4f} (more reliable)")
print("="*80)

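# ALWAYS use the weighted ensemble: the stacking F1 is in-sample (its meta-model was fit
# on the same validation rows), so the two F1 values are not directly comparable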
print(f"\n✓ USING WEIGHTED ENSEMBLE (F1={metrics_optimized['f1']:.4f}) for final submission")
print("  Reason: Optimized with Optuna on clean validation set, no data leakage")
final_submission = result_optimized.copy()
final_submission.to_csv("dataset_test_preds.csv", index=False)

print(f"\n✓✓✓ FINAL SUBMISSION saved: dataset_test_preds.csv ✓✓✓")
print("="*80)

SELECTING BEST ENSEMBLE FOR FINAL SUBMISSION
Stacking Ensemble Validation F1: 0.6637 (may be optimistic)
Weighted Ensemble Validation F1: 0.6519 (more reliable)

✓ USING WEIGHTED ENSEMBLE (F1=0.6519) for final submission
  Reason: Optimized with Optuna on clean validation set, no data leakage

✓✓✓ FINAL SUBMISSION saved: dataset_test_preds.csv ✓✓✓


## 14. Final Results Summary

In [62]:
print("="*80)
print("VALIDATION PERFORMANCE SUMMARY")
print("="*80)
print(f"1. LogReg (TF-IDF+BGE):         F1: {metrics_logreg['f1']:.4f}, Hamming: {metrics_logreg['hamming_loss']:.4f}")
print(f"2. XGBoost (BGE Embeddings):    F1: {metrics_xgb['f1']:.4f}, Hamming: {metrics_xgb['hamming_loss']:.4f}")
print(f"3. LinearSVC (TF-IDF):          F1: {metrics_svc['f1']:.4f}, Hamming: {metrics_svc['hamming_loss']:.4f}")
print(f"4. Calibrated SVC:              F1: {metrics_svc_cal['f1']:.4f}, Hamming: {metrics_svc_cal['hamming_loss']:.4f}")
print(f"5. DistilBERT (Focal+Smooth):   F1: {metrics_distilbert['f1']:.4f}, Hamming: {metrics_distilbert['hamming_loss']:.4f}")
print(f"6. DeBERTa-v3 (Focal+Smooth):   F1: {metrics_deberta['f1']:.4f}, Hamming: {metrics_deberta['hamming_loss']:.4f}")
print(f"7. STACKING Ensemble:           F1: {metrics_stacking['f1']:.4f}, Hamming: {metrics_stacking['hamming_loss']:.4f} ⚠️")
print(f"8. WEIGHTED Ensemble (Optuna):  F1: {metrics_optimized['f1']:.4f}, Hamming: {metrics_optimized['hamming_loss']:.4f} ✓")
print("="*80)
print(f"\n✓ Data Augmentation: 35% of train only (validation untouched)")
print(f"✓ TF-IDF: n-grams (1,4) word + (3,6) char, max_features=750k")
print(f"✓ Embeddings: BGE-Large-en-v1.5 (1024 dim, normalized)")
print(f"✓ Loss: Focal Loss (alpha=0.25, gamma=2.0) + Label Smoothing (0.1)")
print(f"✓ Calibration: SVC with sigmoid on train (cv=3)")
print(f"✓ Ensemble: Weighted optimized with Optuna (F1 - 0.3*Hamming, 50 trials)")
print(f"✓ Metrics: All calculated with validator.py compute_metrics()")
print(f"\n⚠️  Stacking trained on validation (may overfit) - Weighted Ensemble preferred")
print("="*80)

VALIDATION PERFORMANCE SUMMARY
1. LogReg (TF-IDF+BGE):         F1: 0.6345, Hamming: 0.1021
2. XGBoost (BGE Embeddings):    F1: 0.6224, Hamming: 0.1046
3. LinearSVC (TF-IDF):          F1: 0.5655, Hamming: 0.1238
4. Calibrated SVC:              F1: 0.5739, Hamming: 0.1203
5. DistilBERT (Focal+Smooth):   F1: 0.6508, Hamming: 0.0927
6. DeBERTa-v3 (Focal+Smooth):   F1: 0.6385, Hamming: 0.0993
7. STACKING Ensemble:           F1: 0.6637, Hamming: 0.0710 ⚠️
8. WEIGHTED Ensemble (Optuna):  F1: 0.6519, Hamming: 0.0903 ✓

✓ Data Augmentation: 35% of train only (validation untouched)
✓ TF-IDF: n-grams (1,4) word + (3,6) char, max_features=750k
✓ Embeddings: BGE-Large-en-v1.5 (1024 dim, normalized)
✓ Loss: Focal Loss (alpha=0.25, gamma=2.0) + Label Smoothing (0.1)
✓ Calibration: SVC with sigmoid on train (cv=3)
✓ Ensemble: Weighted optimized with Optuna (F1 - 0.3*Hamming, 50 trials)
✓ Metrics: All calculated with validator.py compute_metrics()

⚠️  Stacking trained on validation (may overfit) - Weighted Ensemble preferred

In [63]:
# Hard-coded list: files 4 and 8 are only written if cells In [52] and In [60] succeed
print("\n" + "="*80)
print("CSV FILES GENERATED FOR EACH MODEL:")
print("="*80)
print("1. dataset_test_preds_logreg.csv")
print("2. dataset_test_preds_xgb.csv")
print("3. dataset_test_preds_svc.csv")
print("4. dataset_test_preds_svc_calibrated.csv")
print("5. dataset_test_preds_distilbert.csv")
print("6. dataset_test_preds_deberta.csv")
print("7. dataset_test_preds_weighted_ensemble.csv")
print("8. dataset_test_preds_stacking_ensemble.csv")
print("9. dataset_test_preds.csv (BEST ENSEMBLE - SUBMIT THIS ONE)")
print("="*80)


CSV FILES GENERATED FOR EACH MODEL:
1. dataset_test_preds_logreg.csv
2. dataset_test_preds_xgb.csv
3. dataset_test_preds_svc.csv
4. dataset_test_preds_svc_calibrated.csv
5. dataset_test_preds_distilbert.csv
6. dataset_test_preds_deberta.csv
7. dataset_test_preds_weighted_ensemble.csv
8. dataset_test_preds_stacking_ensemble.csv
9. dataset_test_preds.csv (BEST ENSEMBLE - SUBMIT THIS ONE)


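## 📝 HOW THE FINAL FILE IS GENERATED

The file **`dataset_test_preds.csv`** is produced as follows:

1. **Six models are trained** on the train split:
   - LogisticRegression, XGBoost, LinearSVC, Calibrated SVC, DistilBERT, DeBERTa

2. **Each model predicts on test** and writes its own CSV. In the run recorded above, the Calibrated SVC cell raised an `AttributeError`, so `dataset_test_preds_svc_calibrated.csv` was not written.

3. **Two ensembles are built**:
   - **Weighted Ensemble**: a weighted sum of the DeBERTa (TTA), DistilBERT, LogReg and XGBoost scores with weights 0.4 / 0.4 / 0.1 / 0.1, set by hand in cell `In [45]`, followed by per-genre thresholds tuned on validation. LinearSVC is not included, and the calibrated SVC weight is effectively 0.
   - **Stacking Ensemble**: a per-genre `RidgeClassifierCV` meta-model fit on the validation predictions of five models. In the recorded run its test cell raised a `ValueError` (72 vs 90 features), so `dataset_test_preds_stacking_ensemble.csv` was not written.

4. **The weighted ensemble is always selected.** Its validation F1 (0.6519) is lower than the stacking F1 (0.6637), but the stacking score is measured on the same rows its meta-model was fit on, so the comparison is not fair.

5. **The selected ensemble is saved as `dataset_test_preds.csv`** ← **THIS IS THE FINAL FILE TO SUBMIT**

**Summary of generated CSV files:**
- One file per individual model (for analysis)
- Two ensemble files (weighted and stacking)
- **1 final file: `dataset_test_preds.csv`** ✅ ← Submit this one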