# Model Evaluation

This notebook loads the training and test datasets and then evaluates the following models:
1. **Baseline models**: Random Forest, Naive Bayes, Logistic Regression, Linear SVM.
2. **Transformer model**: roberta-base.

Any model that has not been saved previously is trained automatically.

In [9]:
import pandas as pd
import os
import torch
import shutil
from typing import List
from transformers import AutoModelForSequenceClassification, AutoTokenizer

from src.genre_classification.F_Basic_Models import Basic_Models
from src.genre_classification.F_Dataset_Downloader import Dataset_Downloader
from src.genre_classification.F_Pretrained_models import Pretrained
from src.genre_classification.F_Compute_Metrics import Compute_Metrics

## 1. Data Loading
We load the training and test data. The training set is only used when a model has to be trained.

In [10]:
dataset_downloader = Dataset_Downloader()
train_path, test_path = dataset_downloader(overwrite=False)

print(f"Cargando Train: {train_path}")
print(f"Cargando Test: {test_path}")

train_data = pd.read_csv(train_path)
test_data = pd.read_csv(test_path)

x_train, y_train = train_data.drop(columns=["genre"]), train_data["genre"]
x_test, y_test = test_data.drop(columns=["genre"]), test_data["genre"]

unique_labels = sorted(set(y_train))
print(f"Etiquetas: {unique_labels}")

Cargando Train: C:\Users\alber\Desktop\CUARTO CURSO\PRIMER CUATRIMESTRE\Procesamiento del lenguaje natural II\Practica 1 NLP II\NLP_II_Practica1\datasets\dataset_train.csv
Cargando Test: C:\Users\alber\Desktop\CUARTO CURSO\PRIMER CUATRIMESTRE\Procesamiento del lenguaje natural II\Practica 1 NLP II\NLP_II_Practica1\datasets\dataset_test.csv
Etiquetas: ['action_adventure', 'comedy_family', 'documentary_factual', 'drama_romance', 'scifi_horror_fantasy', 'suspense_crime']
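
The label list above is derived from the training set only, yet it is later used to score predictions on the test set. A minimal sketch of a sanity check for that step follows. The helper `label_coverage` and the toy label lists are hypothetical and are not part of the project code.

```python
def label_coverage(train_labels, test_labels):
    """Report labels that appear in only one of the two splits."""
    train_set, test_set = set(train_labels), set(test_labels)
    return {
        "only_in_train": sorted(train_set - test_set),
        "only_in_test": sorted(test_set - train_set),
    }


# Toy data for illustration only; in the notebook this would be y_train and y_test.
y_train_toy = ["comedy_family", "drama_romance", "suspense_crime"]
y_test_toy = ["drama_romance", "comedy_family", "documentary_factual"]

coverage = label_coverage(y_train_toy, y_test_toy)
print(coverage)
```

If both lists come back empty, the two splits share the same label set and `unique_labels` is safe to reuse for evaluation.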


## 2. Baseline Models
We iterate over each baseline model type. If a model has already been saved, we load it; otherwise we train it and save it.

In [11]:
basic_models_names = ['Naive_Bayes', 'LogReg', 'Linear_SVM', 'Random_Forest']
results = {}

for model_name in basic_models_names:
    print(f"\n{'='*20} {model_name} {'='*20}")
    model = Basic_Models(model_type=model_name)
    
    model_file = f"./Models/Modelos_Basicos/{model_name}.joblib"
    
    if os.path.exists(model_file):
        print(f"Cargando modelo guardado desde {model_file}...")
        model.load_model(name=f"{model_name}.joblib")
    else:
        print(f"Modelo no encontrado. Entrenando {model_name}...")
        model.fit(x_train, y_train)
        model.save_model(name=model_name)
    

    print(f"Realizando predicciones con {model_name}...")
    y_hat = model.predict(x_test)
    results[model_name] = y_hat
    
    print(f"Evaluación de {model_name}:")
    metrics = model.evaluate(y_true=y_test, y_hat=y_hat, labels=sorted(set(y_test)), evaluate_type="sk_learn_metrics")
    print(metrics)  # Detailed per-class report; comment out to reduce output


NLTK configurado exitosamente.
Cargando modelo guardado desde ./Models/Modelos_Basicos/Naive_Bayes.joblib...
Modelo cargado correctamente.
Realizando predicciones con Naive_Bayes...
Iniciando predicción...
-> Predicción terminada.
Evaluación de Naive_Bayes:
{'action_adventure': {'precision': 0.511419068736142, 'recall': 0.43083963762024846, 'f1-score': 0.4676838850306686, 'support': 10707.0}, 'comedy_family': {'precision': 0.43112513144058884, 'recall': 0.2551867219917012, 'f1-score': 0.32060471784178285, 'support': 4820.0}, 'documentary_factual': {'precision': 0.4559240126645559, 'recall': 0.7578947368421053, 'f1-score': 0.5693476225158672, 'support': 3610.0}, 'drama_romance': {'precision': 0.5498112111530642, 'recall': 0.6263545371825626, 'f1-score': 0.585592204477785, 'support': 12089.0}, 'scifi_horror_fantasy': {'precision': 0.5063350983358548, 'recall': 0.5561902783547985, 'f1-score': 0.5300930508810137, 'support': 9628.0}, 'suspense_crime': {'precision': 0.5910608590074367, 'rec
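
The raw dictionary printed above is hard to read. As a sketch, assuming the report keeps the per-class structure visible in that output (`precision`, `recall`, `f1-score`, `support`), it can be rendered as an aligned text table. The helper `format_report` is hypothetical, and only the first two classes are reproduced here because the recorded output is truncated.

```python
# Per-class metrics for Naive_Bayes, copied from the recorded output above.
report = {
    "action_adventure": {
        "precision": 0.511419068736142,
        "recall": 0.43083963762024846,
        "f1-score": 0.4676838850306686,
        "support": 10707.0,
    },
    "comedy_family": {
        "precision": 0.43112513144058884,
        "recall": 0.2551867219917012,
        "f1-score": 0.32060471784178285,
        "support": 4820.0,
    },
}


def format_report(report, digits=3):
    """Render a per-class metrics dictionary as an aligned text table."""
    header = f"{'class':<22}{'precision':>10}{'recall':>10}{'f1-score':>10}{'support':>10}"
    lines = [header]
    for label, m in report.items():
        lines.append(
            f"{label:<22}"
            f"{m['precision']:>10.{digits}f}"
            f"{m['recall']:>10.{digits}f}"
            f"{m['f1-score']:>10.{digits}f}"
            f"{int(m['support']):>10d}"
        )
    return "\n".join(lines)


table = format_report(report)
print(table)
```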

## 3. Transformer Model (roberta-base)
We check whether a fine-tuned model has already been saved. If not, we run the fine-tuning.

In [None]:
transformer_path = "./Models/Modelos_Transformer/roberta-base"
model_name = "roberta-base"
tokenizer_name = 'roberta-base'

train_texts = train_data["text"].tolist()
test_texts = test_data["text"].tolist()

transformer_wrapper = Pretrained(model_type=model_name, labels=unique_labels)

if os.path.exists(transformer_path) and len(os.listdir(transformer_path)) > 0:
    print(f"\nCargando Transformer guardado desde {transformer_path}...")
    transformer_wrapper.model = AutoModelForSequenceClassification.from_pretrained(transformer_path)
    transformer_wrapper.tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
    transformer_wrapper.model.to(transformer_wrapper.device)
else:
    print("\nModelo Transformer no encontrado. Iniciando Fine-Tuning...")
    transformer_wrapper.fit(
        train_texts=train_texts,
        train_labels=train_data['genre'],
        batch_size=4,
        epochs=3,
        learning_rate=1e-5
    )
    
    # Save the fine-tuned model
    transformer_wrapper.save_model(path=transformer_path)

print("Realizando inferencia con Transformer...")
predictions_ids = transformer_wrapper.transform(test_texts, batch_size=8)
predictions_labels = [transformer_wrapper.id2label[pid] for pid in predictions_ids]

results['Transformer'] = predictions_labels

Using device: cuda
Loading model: roberta-base...


Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



Cargando Transformer guardado desde ./Models/Modelos_Transformer/roberta-base...
Realizando inferencia con Transformer...


Map: 100%|██████████| 55400/55400 [00:01<00:00, 32375.77 examples/s]
Inference: 100%|██████████| 6925/6925 [01:14<00:00, 92.42it/s] 
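
The inference step returns integer class ids that are mapped back to genre names, and the progress bar above counts batches rather than examples. The sketch below illustrates both points. It assumes that `Pretrained` builds `id2label` by enumerating the sorted label list, which the source does not confirm, and the predicted ids are made up for the example.

```python
import math

# Labels in the sorted order printed by the data-loading cell.
unique_labels = [
    "action_adventure",
    "comedy_family",
    "documentary_factual",
    "drama_romance",
    "scifi_horror_fantasy",
    "suspense_crime",
]

# Assumption: ids are assigned by position in the sorted label list.
id2label = dict(enumerate(unique_labels))
label2id = {label: idx for idx, label in id2label.items()}


def count_batches(n_examples, batch_size):
    """Number of batches needed to cover n_examples."""
    return math.ceil(n_examples / batch_size)


# Hypothetical model output for three test texts.
predicted_ids = [3, 0, 5]
predicted_labels = [id2label[i] for i in predicted_ids]
print(predicted_labels)

# 55400 test examples at batch_size=8, as in the recorded progress bars.
n_batches = count_batches(55400, 8)
print(n_batches)
```

With 55400 test examples and `batch_size=8`, this gives 6925 batches, which matches the total shown in the `Inference` progress bar.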


## 4. Final Comparison
We show a summary of the metrics for all models, sorted by macro F1, followed by the confusion matrix of the best one.

In [None]:
final_metrics = []

for model_name, preds in results.items():
    cm = Compute_Metrics(y_pred=preds, y_true=y_test, labels=unique_labels)
    metrics_dict = cm.compute_all()
    
    final_metrics.append({
        "Model": model_name,
        "Accuracy": metrics_dict["accuracy"],
        "Macro F1": metrics_dict["macro_f1"]
    })

df_metrics = pd.DataFrame(final_metrics)
df_metrics = df_metrics.sort_values(by="Macro F1", ascending=False)

print("\n--- Tabla Comparativa de Resultados ---")
display(df_metrics)

best_model = df_metrics.iloc[0]['Model']
print(f"\nMatriz de Confusión del mejor modelo ({best_model}):")
cm_best = Compute_Metrics(y_pred=results[best_model], y_true=y_test, labels=unique_labels)
display(cm_best.confusion_matrix())

0.5287906137184115
0.5292418772563177
0.5220216606498195
0.48247292418772564
0.6059025270758123

--- Tabla Comparativa de Resultados ---


| | Model | Accuracy | Macro F1 |
|---|---|---|---|
| 4 | Transformer | 0.605903 | 0.621847 |
| 1 | LogReg | 0.529242 | 0.534708 |
| 2 | Linear_SVM | 0.522022 | 0.522958 |
| 0 | Naive_Bayes | 0.528791 | 0.50587 |
| 3 | Random_Forest | 0.482473 | 0.480092 |



Matriz de Confusión del mejor modelo (Transformer):


| Actual \ Predicted | action_adventure | comedy_family | documentary_factual | drama_romance | scifi_horror_fantasy | suspense_crime |
|---|---|---|---|---|---|---|
| action_adventure | 5169 | 1053 | 183 | 866 | 1633 | 1803 |
| comedy_family | 307 | 3158 | 215 | 583 | 418 | 139 |
| documentary_factual | 39 | 100 | 3244 | 185 | 19 | 23 |
| drama_romance | 1202 | 893 | 712 | 7690 | 603 | 989 |
| scifi_horror_fantasy | 750 | 827 | 38 | 364 | 6471 | 1178 |
| suspense_crime | 2009 | 343 | 98 | 1005 | 3256 | 7835 |
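
Accuracy and macro F1 can be recomputed directly from the confusion matrix above, which is a useful cross-check on the comparison table. This is a minimal sketch in plain Python, not the implementation inside `Compute_Metrics`; the function names are illustrative.

```python
# Confusion matrix of the Transformer, copied from the table above
# (rows: actual class, columns: predicted class).
labels = [
    "action_adventure",
    "comedy_family",
    "documentary_factual",
    "drama_romance",
    "scifi_horror_fantasy",
    "suspense_crime",
]
matrix = [
    [5169, 1053, 183, 866, 1633, 1803],
    [307, 3158, 215, 583, 418, 139],
    [39, 100, 3244, 185, 19, 23],
    [1202, 893, 712, 7690, 603, 989],
    [750, 827, 38, 364, 6471, 1178],
    [2009, 343, 98, 1005, 3256, 7835],
]


def accuracy(cm):
    """Fraction of examples on the diagonal."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def per_class_f1(cm):
    """F1 per class, using F1 = 2*TP / (actual count + predicted count)."""
    n = len(cm)
    scores = []
    for i in range(n):
        tp = cm[i][i]
        actual = sum(cm[i])                        # row sum: TP + FN
        predicted = sum(cm[r][i] for r in range(n))  # column sum: TP + FP
        denom = actual + predicted
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(cm):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(cm)
    return sum(scores) / len(scores)


f1_scores = per_class_f1(matrix)
print(f"accuracy = {accuracy(matrix):.6f}")
print(f"macro F1 = {macro_f1(matrix):.6f}")
for label, score in zip(labels, f1_scores):
    print(f"{label:<22}{score:.3f}")
```

The two summary values should agree with the Transformer row of the comparison table (roughly 0.606 accuracy and 0.622 macro F1). The per-class scores also show where the model is weakest: `action_adventure` has the lowest F1, largely because it is confused with `suspense_crime` and `scifi_horror_fantasy`.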
