# üöÄ Entrenamiento de Modelos Transformer para Clasificaci√≥n de Ocupaciones ENAHO

---

## üìã Descripci√≥n
Script robusto y universal para entrenar modelos de clasificaci√≥n de texto.

### üéØ Modelos Soportados:
- **BETO**: `dccuchile/bert-base-spanish-wwm-cased` (Espa√±ol)
- **XLM-RoBERTa**: `FacebookAI/xlm-roberta-base` (Multiling√ºe)

### ‚ú® Caracter√≠sticas:
- ‚úÖ Cambio de modelo con una sola variable
- ‚úÖ M√©tricas detalladas (macro, micro, weighted)
- ‚úÖ Manejo robusto de errores
- ‚úÖ Logging detallado para debugging
- ‚úÖ Validaci√≥n de datos en cada paso
- ‚úÖ Guardado completo de modelo y artefactos
- ‚úÖ Funciones de predicci√≥n e inferencia

---

**Autor**: Sistema de Clasificaci√≥n ENAHO  
**Fecha**: 2025  
**Entorno**: VSCode con Jupyter Notebook  


In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
# ============================================================================
# INSTALACI√ìN DE DEPENDENCIAS (Ejecutar solo una vez)
# ============================================================================

# Descomenta si necesitas instalar las librer√≠as
# !pip install transformers==4.36.0 datasets==2.15.0 scikit-learn==1.3.2
# !pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# !pip install accelerate sentencepiece

print("‚úÖ Si las librer√≠as ya est√°n instaladas, puedes continuar")


‚úÖ Si las librer√≠as ya est√°n instaladas, puedes continuar


In [3]:
# ============================================================================
# IMPORTACIONES Y VERIFICACI√ìN DEL ENTORNO
# ============================================================================

import sys
import os
import warnings
import logging
from datetime import datetime
from pathlib import Path

# Data & ML
import pandas as pd
import numpy as np
import torch
from sklearn.model_selection import train_test_split
from sklearn.metrics import (
    accuracy_score,
    precision_recall_fscore_support,
    classification_report,
    confusion_matrix
)

# Transformers
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    TrainingArguments,
    Trainer,
    EarlyStoppingCallback
)
from torch.utils.data import Dataset

# Utilities
from tqdm.auto import tqdm
import pickle
import json

warnings.filterwarnings('ignore')

# ============================================================================
# CONFIGURACI√ìN DE LOGGING
# ============================================================================

def setup_logging(output_dir):
    """Configura el sistema de logging para debugging"""
    os.makedirs(output_dir, exist_ok=True)
    log_file = os.path.join(output_dir, f'training_{datetime.now().strftime("%Y%m%d_%H%M%S")}.log')

    logging.basicConfig(
        level=logging.INFO,
        format='%(asctime)s - %(levelname)s - %(message)s',
        handlers=[
            logging.FileHandler(log_file, encoding='utf-8'),
            logging.StreamHandler(sys.stdout)
        ]
    )
    return logging.getLogger(__name__)

# ============================================================================
# VERIFICACI√ìN DE GPU
# ============================================================================

print("\n" + "="*80)
print("üîç VERIFICACI√ìN DEL ENTORNO")
print("="*80)

print(f"\nüì¶ Versiones:")
print(f"   Python: {sys.version.split()[0]}")
print(f"   PyTorch: {torch.__version__}")
print(f"   CUDA disponible: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"\nüéÆ GPU Detectada:")
    print(f"   Dispositivo: {torch.cuda.get_device_name(0)}")
    print(f"   Memoria total: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
    print(f"   Memoria libre: {(torch.cuda.get_device_properties(0).total_memory - torch.cuda.memory_allocated(0)) / 1e9:.2f} GB")
else:
    print("\n‚ö†Ô∏è  GPU no detectada - El entrenamiento ser√° lento")
    print("   Considera usar Google Colab o configurar CUDA")

print("\n" + "="*80)
print("‚úÖ Importaciones completadas correctamente")
print("="*80 + "\n")



üîç VERIFICACI√ìN DEL ENTORNO

üì¶ Versiones:
   Python: 3.12.12
   PyTorch: 2.8.0+cu126
   CUDA disponible: True

üéÆ GPU Detectada:
   Dispositivo: Tesla T4
   Memoria total: 15.83 GB
   Memoria libre: 15.83 GB

‚úÖ Importaciones completadas correctamente



In [4]:
# ============================================================================
# ‚öôÔ∏è  CONFIGURACI√ìN PRINCIPAL - MODIFICA AQU√ç
# ============================================================================

class ModelConfig:
    """
    Configuraci√≥n centralizada del modelo y entrenamiento

    IMPORTANTE: Solo necesitas cambiar MODEL_NAME para entrenar un modelo diferente
    """

    # ========================================================================
    # üéØ SELECCI√ìN DEL MODELO - CAMBIA SOLO ESTA L√çNEA
    # ========================================================================

    MODEL_NAME = "FacebookAI/xlm-roberta-base"  # Opci√≥n 1: XLM-RoBERTa (multiling√ºe)
    # MODEL_NAME = "dccuchile/bert-base-spanish-wwm-cased"  # Opci√≥n 2: BETO (espa√±ol)

    # ========================================================================
    # üìÇ RUTAS DE DATOS
    # ========================================================================

    # Ruta al archivo de datos (ajusta seg√∫n tu ubicaci√≥n)
    DATA_PATH = "/content/drive/MyDrive/PI_PEU/BASE_LIMPIA_VF.parquet"  # Cambia esta ruta

    # Directorio base para outputs
    BASE_OUTPUT_DIR = "/content/drive/MyDrive/PI_PEU/xlmRoberta"

    # ========================================================================
    # üìä COLUMNAS DEL DATASET
    # ========================================================================

    TEXT_COLUMN = "texto_final"  # Columna con el texto
    TARGET_COLUMN = "p505r4"     # Columna objetivo (clase)

    # ========================================================================
    # üéõÔ∏è  HIPERPAR√ÅMETROS DE ENTRENAMIENTO
    # ========================================================================

    # Tokenizaci√≥n
    MAX_LENGTH = 128  # Longitud m√°xima de tokens

    # Entrenamiento
    BATCH_SIZE = 16          # Ajusta seg√∫n tu GPU (16, 32, 64)
    LEARNING_RATE = 2e-5     # Tasa de aprendizaje
    NUM_EPOCHS = 3           # N√∫mero de √©pocas
    WARMUP_STEPS = 500       # Pasos de warmup
    WEIGHT_DECAY = 0.01      # Regularizaci√≥n

    # Divisi√≥n de datos
    TEST_SIZE = 0.15         # 15% para test
    VAL_SIZE = 0.15          # 15% para validaci√≥n
    RANDOM_STATE = 2025      # Semilla para reproducibilidad

    # Filtrado de clases raras
    MIN_SAMPLES_PER_CLASS = 10  # M√≠nimo de muestras por clase

    # Early stopping
    EARLY_STOPPING_PATIENCE = 3

    # ========================================================================
    # üîß CONFIGURACI√ìN AUTOM√ÅTICA (NO MODIFICAR)
    # ========================================================================

    def __init__(self):
        """Inicializa configuraci√≥n y crea directorios"""
        # Detectar tipo de modelo del nombre
        if "roberta" in self.MODEL_NAME.lower():
            self.model_type = "xlm-roberta"
        elif "bert" in self.MODEL_NAME.lower():
            self.model_type = "bert"
        else:
            self.model_type = "transformer"

        # Crear nombre descriptivo para el experimento
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        model_short_name = self.MODEL_NAME.split('/')[-1]
        self.experiment_name = f"{model_short_name}_{timestamp}"

        # Configurar directorios
        self.OUTPUT_DIR = os.path.join(self.BASE_OUTPUT_DIR, self.experiment_name)
        self.MODEL_SAVE_DIR = os.path.join(self.OUTPUT_DIR, "final_model")
        self.CHECKPOINT_DIR = os.path.join(self.OUTPUT_DIR, "checkpoints")

        # Crear directorios
        for dir_path in [self.OUTPUT_DIR, self.MODEL_SAVE_DIR, self.CHECKPOINT_DIR]:
            os.makedirs(dir_path, exist_ok=True)

        # Device
        self.DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

        # Configurar logging
        self.logger = setup_logging(self.OUTPUT_DIR)
        self.logger.info(f"Experimento iniciado: {self.experiment_name}")
        self.logger.info(f"Modelo seleccionado: {self.MODEL_NAME}")
        self.logger.info(f"Dispositivo: {self.DEVICE}")

    def display_config(self):
        """Muestra la configuraci√≥n actual"""
        print("\n" + "="*80)
        print("‚öôÔ∏è  CONFIGURACI√ìN DEL MODELO")
        print("="*80)
        print(f"\nü§ñ Modelo: {self.MODEL_NAME}")
        print(f"   Tipo: {self.model_type}")
        print(f"   Experimento: {self.experiment_name}")
        print(f"\nüìÇ Rutas:")
        print(f"   Datos: {self.DATA_PATH}")
        print(f"   Output: {self.OUTPUT_DIR}")
        print(f"   Modelo final: {self.MODEL_SAVE_DIR}")
        print(f"\nüìä Datos:")
        print(f"   Columna texto: {self.TEXT_COLUMN}")
        print(f"   Columna target: {self.TARGET_COLUMN}")
        print(f"   Max length: {self.MAX_LENGTH}")
        print(f"\nüéõÔ∏è  Entrenamiento:")
        print(f"   Batch size: {self.BATCH_SIZE}")
        print(f"   Learning rate: {self.LEARNING_RATE}")
        print(f"   Epochs: {self.NUM_EPOCHS}")
        print(f"   Early stopping: {self.EARLY_STOPPING_PATIENCE} epochs")
        print(f"\nüíæ Divisi√≥n de datos:")
        print(f"   Test: {self.TEST_SIZE*100:.0f}%")
        print(f"   Validaci√≥n: {self.VAL_SIZE*100:.0f}%")
        print(f"   Train: {(1-self.TEST_SIZE-self.VAL_SIZE)*100:.0f}%")
        print("\n" + "="*80 + "\n")

    def save_config(self):
        """Guarda la configuraci√≥n en JSON"""
        config_dict = {
            'model_name': self.MODEL_NAME,
            'model_type': self.model_type,
            'experiment_name': self.experiment_name,
            'data_path': self.DATA_PATH,
            'text_column': self.TEXT_COLUMN,
            'target_column': self.TARGET_COLUMN,
            'max_length': self.MAX_LENGTH,
            'batch_size': self.BATCH_SIZE,
            'learning_rate': self.LEARNING_RATE,
            'num_epochs': self.NUM_EPOCHS,
            'test_size': self.TEST_SIZE,
            'val_size': self.VAL_SIZE,
            'random_state': self.RANDOM_STATE,
            'min_samples_per_class': self.MIN_SAMPLES_PER_CLASS,
            'device': self.DEVICE,
            'timestamp': datetime.now().isoformat()
        }

        config_path = os.path.join(self.OUTPUT_DIR, 'config.json')
        with open(config_path, 'w', encoding='utf-8') as f:
            json.dump(config_dict, f, indent=2, ensure_ascii=False)

        self.logger.info(f"Configuraci√≥n guardada en: {config_path}")
        return config_path


# Inicializar configuraci√≥n
config = ModelConfig()
config.display_config()
config.save_config()



‚öôÔ∏è  CONFIGURACI√ìN DEL MODELO

ü§ñ Modelo: FacebookAI/xlm-roberta-base
   Tipo: xlm-roberta
   Experimento: xlm-roberta-base_20251112_182822

üìÇ Rutas:
   Datos: /content/drive/MyDrive/PI_PEU/BASE_LIMPIA_VF.parquet
   Output: /content/drive/MyDrive/PI_PEU/xlmRoberta/xlm-roberta-base_20251112_182822
   Modelo final: /content/drive/MyDrive/PI_PEU/xlmRoberta/xlm-roberta-base_20251112_182822/final_model

üìä Datos:
   Columna texto: texto_final
   Columna target: p505r4
   Max length: 128

üéõÔ∏è  Entrenamiento:
   Batch size: 16
   Learning rate: 2e-05
   Epochs: 3
   Early stopping: 3 epochs

üíæ Divisi√≥n de datos:
   Test: 15%
   Validaci√≥n: 15%
   Train: 70%




'/content/drive/MyDrive/PI_PEU/xlmRoberta/xlm-roberta-base_20251112_182822/config.json'

In [5]:
# ============================================================================
# üìÇ CARGA Y VALIDACI√ìN DE DATOS
# ============================================================================

class DataLoader:
    """Cargador y validador de datos con manejo robusto de errores"""

    def __init__(self, config):
        self.config = config
        self.logger = config.logger

    def load_data(self):
        """
        Carga datos desde archivo con validaci√≥n

        Returns:
            pd.DataFrame: Datos cargados
        """
        try:
            self.logger.info(f"Cargando datos desde: {self.config.DATA_PATH}")

            # Verificar que el archivo existe
            if not os.path.exists(self.config.DATA_PATH):
                raise FileNotFoundError(
                    f"‚ùå El archivo no existe: {self.config.DATA_PATH}\n"
                    f"   Por favor, verifica la ruta en ModelConfig.DATA_PATH"
                )

            # Cargar seg√∫n extensi√≥n
            file_ext = os.path.splitext(self.config.DATA_PATH)[1].lower()

            if file_ext == '.parquet':
                df = pd.read_parquet(self.config.DATA_PATH)
            elif file_ext == '.csv':
                df = pd.read_csv(self.config.DATA_PATH)
            elif file_ext in ['.xlsx', '.xls']:
                df = pd.read_excel(self.config.DATA_PATH)
            else:
                raise ValueError(
                    f"‚ùå Formato no soportado: {file_ext}\n"
                    f"   Formatos v√°lidos: .parquet, .csv, .xlsx, .xls"
                )

            self.logger.info(f"‚úÖ Datos cargados: {df.shape[0]:,} filas x {df.shape[1]} columnas")

            return df

        except Exception as e:
            self.logger.error(f"‚ùå Error al cargar datos: {str(e)}")
            raise

    def validate_data(self, df):
        """
        Valida que los datos tengan las columnas necesarias

        Args:
            df: DataFrame a validar

        Raises:
            ValueError: Si faltan columnas requeridas
        """
        self.logger.info("Validando estructura de datos...")

        # Verificar columnas requeridas
        required_cols = [self.config.TEXT_COLUMN, self.config.TARGET_COLUMN]
        missing_cols = [col for col in required_cols if col not in df.columns]

        if missing_cols:
            available_cols = list(df.columns)
            raise ValueError(
                f"‚ùå Columnas faltantes: {missing_cols}\n"
                f"   Columnas disponibles: {available_cols}\n"
                f"   Verifica TEXT_COLUMN y TARGET_COLUMN en ModelConfig"
            )

        # Validar datos no nulos
        null_text = df[self.config.TEXT_COLUMN].isna().sum()
        null_target = df[self.config.TARGET_COLUMN].isna().sum()

        self.logger.info(
            f"   Valores nulos - Texto: {null_text:,}, Target: {null_target:,}"
        )

        # Validar textos vac√≠os
        empty_text = (df[self.config.TEXT_COLUMN].str.strip() == '').sum()
        if empty_text > 0:
            self.logger.warning(f"   ‚ö†Ô∏è  Textos vac√≠os: {empty_text:,}")

        self.logger.info("‚úÖ Validaci√≥n completada")

    def filter_valid_records(self, df):
        """
        Filtra registros v√°lidos (no nulos, no vac√≠os)

        Args:
            df: DataFrame original

        Returns:
            pd.DataFrame: DataFrame filtrado
        """
        self.logger.info("Filtrando registros v√°lidos...")

        initial_count = len(df)

        # Filtrar nulos y vac√≠os
        df_clean = df[
            df[self.config.TEXT_COLUMN].notna() &
            df[self.config.TARGET_COLUMN].notna() &
            (df[self.config.TEXT_COLUMN].str.strip() != '')
        ].copy()

        final_count = len(df_clean)
        removed = initial_count - final_count

        self.logger.info(
            f"   Registros iniciales: {initial_count:,}\n"
            f"   Registros v√°lidos: {final_count:,}\n"
            f"   Removidos: {removed:,} ({removed/initial_count*100:.2f}%)"
        )

        if final_count == 0:
            raise ValueError(
                "‚ùå No quedan registros v√°lidos despu√©s del filtrado\n"
                "   Verifica la calidad de tus datos"
            )

        return df_clean

    def filter_rare_classes(self, df):
        """
        Filtra clases con pocas muestras

        Args:
            df: DataFrame

        Returns:
            pd.DataFrame: DataFrame filtrado
        """
        self.logger.info(
            f"Filtrando clases con < {self.config.MIN_SAMPLES_PER_CLASS} muestras..."
        )

        # Contar muestras por clase
        class_counts = df[self.config.TARGET_COLUMN].value_counts()

        # Identificar clases v√°lidas
        valid_classes = class_counts[class_counts >= self.config.MIN_SAMPLES_PER_CLASS].index
        rare_classes = class_counts[class_counts < self.config.MIN_SAMPLES_PER_CLASS]

        # Filtrar
        df_filtered = df[df[self.config.TARGET_COLUMN].isin(valid_classes)].copy()

        self.logger.info(
            f"   Clases originales: {len(class_counts):,}\n"
            f"   Clases mantenidas: {len(valid_classes):,}\n"
            f"   Clases removidas: {len(rare_classes):,}\n"
            f"   Registros antes: {len(df):,}\n"
            f"   Registros despu√©s: {len(df_filtered):,}"
        )

        if len(df_filtered) == 0:
            raise ValueError(
                f"‚ùå No quedan registros despu√©s de filtrar clases raras\n"
                f"   Considera reducir MIN_SAMPLES_PER_CLASS"
            )

        return df_filtered

    def create_label_mapping(self, df):
        """
        Crea mapeo de etiquetas a √≠ndices

        Args:
            df: DataFrame

        Returns:
            tuple: (df_with_labels, label2id, id2label)
        """
        self.logger.info("Creando mapeo de etiquetas...")

        # Obtener clases √∫nicas ordenadas
        unique_labels = sorted(df[self.config.TARGET_COLUMN].unique())

        # Crear mapeos
        label2id = {label: idx for idx, label in enumerate(unique_labels)}
        id2label = {idx: label for label, idx in label2id.items()}

        # Agregar columna de √≠ndices num√©ricos
        df['label_id'] = df[self.config.TARGET_COLUMN].map(label2id)

        # Verificar que no hay nulos (no deber√≠a pasar)
        if df['label_id'].isna().any():
            raise ValueError("‚ùå Error en el mapeo de etiquetas")

        self.logger.info(
            f"‚úÖ Mapeo creado: {len(label2id)} clases (√≠ndices 0-{len(label2id)-1})"
        )

        # Mostrar distribuci√≥n de clases
        class_dist = df[self.config.TARGET_COLUMN].value_counts()
        self.logger.info(
            f"   Clase m√°s frecuente: {class_dist.index[0]} ({class_dist.iloc[0]:,} muestras)\n"
            f"   Clase menos frecuente: {class_dist.index[-1]} ({class_dist.iloc[-1]:,} muestras)\n"
            f"   Promedio por clase: {class_dist.mean():.1f}"
        )

        return df, label2id, id2label

    def split_data(self, df):
        """
        Divide datos en train, validation y test con estratificaci√≥n

        Args:
            df: DataFrame

        Returns:
            tuple: (train_df, val_df, test_df)
        """
        self.logger.info("Dividiendo datos...")

        try:
            # Primero separar test
            train_val, test = train_test_split(
                df,
                test_size=self.config.TEST_SIZE,
                random_state=self.config.RANDOM_STATE,
                stratify=df['label_id']
            )

            # Luego separar train y validation
            val_size_adjusted = self.config.VAL_SIZE / (1 - self.config.TEST_SIZE)
            train, val = train_test_split(
                train_val,
                test_size=val_size_adjusted,
                random_state=self.config.RANDOM_STATE,
                stratify=train_val['label_id']
            )

            self.logger.info(
                f"‚úÖ Divisi√≥n completada:\n"
                f"   Train: {len(train):,} ({len(train)/len(df)*100:.1f}%)\n"
                f"   Validation: {len(val):,} ({len(val)/len(df)*100:.1f}%)\n"
                f"   Test: {len(test):,} ({len(test)/len(df)*100:.1f}%)"
            )

            return train, val, test

        except ValueError as e:
            self.logger.error(
                f"‚ùå Error al dividir datos: {str(e)}\n"
                f"   Puede ser que algunas clases tengan muy pocas muestras\n"
                f"   Considera aumentar MIN_SAMPLES_PER_CLASS"
            )
            raise


# ============================================================================
# EJECUTAR CARGA DE DATOS
# ============================================================================

print("\n" + "="*80)
print("üìÇ CARGANDO Y PREPARANDO DATOS")
print("="*80 + "\n")

try:
    # Inicializar cargador
    data_loader = DataLoader(config)

    # Cargar datos
    df_raw = data_loader.load_data()

    # Validar estructura
    data_loader.validate_data(df_raw)

    # Filtrar registros v√°lidos
    df_valid = data_loader.filter_valid_records(df_raw)

    # Filtrar clases raras
    df_filtered = data_loader.filter_rare_classes(df_valid)

    # Crear mapeo de etiquetas
    df_final, label2id, id2label = data_loader.create_label_mapping(df_filtered)

    # Dividir datos
    train_df, val_df, test_df = data_loader.split_data(df_final)

    print("\n" + "="*80)
    print("‚úÖ DATOS PREPARADOS EXITOSAMENTE")
    print("="*80 + "\n")

    # Guardar informaci√≥n de las clases
    class_info = {
        'num_classes': len(label2id),
        'label2id': label2id,
        'id2label': id2label,
        'class_distribution': df_final[config.TARGET_COLUMN].value_counts().to_dict()
    }

    with open(os.path.join(config.OUTPUT_DIR, 'class_info.json'), 'w', encoding='utf-8') as f:
        json.dump(class_info, f, indent=2, ensure_ascii=False)

except Exception as e:
    print("\n" + "="*80)
    print("‚ùå ERROR EN LA CARGA DE DATOS")
    print("="*80)
    print(f"\n{str(e)}\n")
    print("Por favor, revisa:")
    print("1. La ruta del archivo en ModelConfig.DATA_PATH")
    print("2. Los nombres de columnas en TEXT_COLUMN y TARGET_COLUMN")
    print("3. La calidad de tus datos (nulos, vac√≠os, etc.)")
    print("\n" + "="*80 + "\n")
    raise



üìÇ CARGANDO Y PREPARANDO DATOS


‚úÖ DATOS PREPARADOS EXITOSAMENTE



In [6]:
# ============================================================================
# üî§ DATASET Y TOKENIZACI√ìN
# ============================================================================

class TextClassificationDataset(Dataset):
    """
    Dataset personalizado para clasificaci√≥n de texto
    Compatible con cualquier modelo de Hugging Face
    """

    def __init__(self, texts, labels, tokenizer, max_length):
        """
        Args:
            texts: Lista de textos
            labels: Lista de etiquetas (√≠ndices num√©ricos)
            tokenizer: Tokenizer de Hugging Face
            max_length: Longitud m√°xima de tokens
        """
        self.texts = texts
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        """
        Obtiene un ejemplo tokenizado

        Returns:
            dict: Diccionario con input_ids, attention_mask y labels
        """
        text = str(self.texts[idx])
        label = int(self.labels[idx])

        # Tokenizar
        encoding = self.tokenizer(
            text,
            max_length=self.max_length,
            padding='max_length',
            truncation=True,
            return_tensors='pt'
        )

        return {
            'input_ids': encoding['input_ids'].flatten(),
            'attention_mask': encoding['attention_mask'].flatten(),
            'labels': torch.tensor(label, dtype=torch.long)
        }


# ============================================================================
# INICIALIZAR TOKENIZER Y DATASETS
# ============================================================================

print("\n" + "="*80)
print("üî§ INICIALIZANDO TOKENIZER Y DATASETS")
print("="*80 + "\n")

try:
    # Cargar tokenizer
    config.logger.info(f"Cargando tokenizer: {config.MODEL_NAME}")
    tokenizer = AutoTokenizer.from_pretrained(config.MODEL_NAME)

    print(f"‚úÖ Tokenizer cargado: {config.MODEL_NAME}")
    print(f"   Vocabulario: {len(tokenizer):,} tokens")
    print(f"   Tipo: {tokenizer.__class__.__name__}")

    # Crear datasets
    config.logger.info("Creando datasets...")

    train_dataset = TextClassificationDataset(
        texts=train_df[config.TEXT_COLUMN].tolist(),
        labels=train_df['label_id'].tolist(),
        tokenizer=tokenizer,
        max_length=config.MAX_LENGTH
    )

    val_dataset = TextClassificationDataset(
        texts=val_df[config.TEXT_COLUMN].tolist(),
        labels=val_df['label_id'].tolist(),
        tokenizer=tokenizer,
        max_length=config.MAX_LENGTH
    )

    test_dataset = TextClassificationDataset(
        texts=test_df[config.TEXT_COLUMN].tolist(),
        labels=test_df['label_id'].tolist(),
        tokenizer=tokenizer,
        max_length=config.MAX_LENGTH
    )

    print(f"\n‚úÖ Datasets creados:")
    print(f"   Train: {len(train_dataset):,} ejemplos")
    print(f"   Validation: {len(val_dataset):,} ejemplos")
    print(f"   Test: {len(test_dataset):,} ejemplos")

    # Verificar un ejemplo
    sample = train_dataset[0]
    print(f"\nüìù Ejemplo de muestra tokenizada:")
    print(f"   Input IDs shape: {sample['input_ids'].shape}")
    print(f"   Attention mask shape: {sample['attention_mask'].shape}")
    print(f"   Label: {sample['labels'].item()}")

    print("\n" + "="*80)
    print("‚úÖ TOKENIZACI√ìN COMPLETADA")
    print("="*80 + "\n")

except Exception as e:
    print("\n" + "="*80)
    print("‚ùå ERROR EN LA TOKENIZACI√ìN")
    print("="*80)
    print(f"\n{str(e)}\n")
    print("Posibles causas:")
    print("1. El modelo no est√° disponible (verifica MODEL_NAME)")
    print("2. No hay conexi√≥n a internet para descargar el tokenizer")
    print("3. Problema con los datos de entrada")
    print("\n" + "="*80 + "\n")
    raise



üî§ INICIALIZANDO TOKENIZER Y DATASETS



tokenizer_config.json:   0%|          | 0.00/25.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/615 [00:00<?, ?B/s]

sentencepiece.bpe.model:   0%|          | 0.00/5.07M [00:00<?, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

‚úÖ Tokenizer cargado: FacebookAI/xlm-roberta-base
   Vocabulario: 250,002 tokens
   Tipo: XLMRobertaTokenizerFast

‚úÖ Datasets creados:
   Train: 220,937 ejemplos
   Validation: 47,344 ejemplos
   Test: 47,344 ejemplos

üìù Ejemplo de muestra tokenizada:
   Input IDs shape: torch.Size([128])
   Attention mask shape: torch.Size([128])
   Label: 356

‚úÖ TOKENIZACI√ìN COMPLETADA



In [7]:
# ============================================================================
# üìä M√âTRICAS DETALLADAS
# ============================================================================

def compute_detailed_metrics(eval_pred):
    """
    Calcula m√©tricas detalladas: Accuracy, Precision, Recall, F1
    Con variantes: macro, micro y weighted

    Args:
        eval_pred: Predicciones del modelo (predictions, label_ids)

    Returns:
        dict: Diccionario con todas las m√©tricas
    """
    predictions, labels = eval_pred

    # Obtener predicciones (argmax si son logits)
    if predictions.ndim > 1:
        preds = np.argmax(predictions, axis=1)
    else:
        preds = predictions

    # Accuracy
    accuracy = accuracy_score(labels, preds)

    # Precision, Recall, F1 con diferentes promedios
    # Macro: promedio sin ponderar (todas las clases tienen el mismo peso)
    precision_macro, recall_macro, f1_macro, _ = precision_recall_fscore_support(
        labels, preds, average='macro', zero_division=0
    )

    # Micro: agregado global (considera todas las muestras por igual)
    precision_micro, recall_micro, f1_micro, _ = precision_recall_fscore_support(
        labels, preds, average='micro', zero_division=0
    )

    # Weighted: promedio ponderado por soporte de cada clase
    precision_weighted, recall_weighted, f1_weighted, _ = precision_recall_fscore_support(
        labels, preds, average='weighted', zero_division=0
    )

    return {
        # Accuracy (solo una versi√≥n)
        'accuracy': accuracy,

        # F1 Score
        'f1_macro': f1_macro,
        'f1_micro': f1_micro,
        'f1_weighted': f1_weighted,

        # Precision
        'precision_macro': precision_macro,
        'precision_micro': precision_micro,
        'precision_weighted': precision_weighted,

        # Recall
        'recall_macro': recall_macro,
        'recall_micro': recall_micro,
        'recall_weighted': recall_weighted,
    }



def display_metrics(metrics, title="M√©tricas"):
    """Muestra las m√©tricas de forma organizada"""
    print("\n" + "="*80)
    print(f"üìä {title.upper()}")
    print("="*80 + "\n")

    # Mostrar Loss si existe
    loss = (
        metrics.get('test_loss') or
        metrics.get('eval_loss') or
        metrics.get('loss', None)
    )
    if loss is not None:
        print(f"üí• LOSS: {loss:.4f}")

    # Accuracy
    acc = (
        metrics.get('test_accuracy') or
        metrics.get('eval_accuracy') or
        metrics.get('accuracy', 0)
    )
    print(f"üéØ ACCURACY: {acc:.4f}")
    print("\n" + "-"*80)

    # Tabla
    print(f"\n{'M√©trica':<20} {'Macro':>12} {'Micro':>12} {'Weighted':>12}")
    print("-"*60)

    def get_m(name):
        return (
            metrics.get(f'test_{name}') or
            metrics.get(f'eval_{name}') or
            metrics.get(name, 0)
        )

    print(f"{'F1 Score':<20} {get_m('f1_macro'):>12.4f} {get_m('f1_micro'):>12.4f} {get_m('f1_weighted'):>12.4f}")
    print(f"{'Precision':<20} {get_m('precision_macro'):>12.4f} {get_m('precision_micro'):>12.4f} {get_m('precision_weighted'):>12.4f}")
    print(f"{'Recall':<20} {get_m('recall_macro'):>12.4f} {get_m('recall_micro'):>12.4f} {get_m('recall_weighted'):>12.4f}")

    print("\n" + "="*80 + "\n")


print("‚úÖ Funciones de m√©tricas cargadas (con loss incluido)")


‚úÖ Funciones de m√©tricas cargadas (con loss incluido)


In [8]:
# ============================================================================
# ü§ñ INICIALIZACI√ìN DEL MODELO
# ============================================================================

print("\n" + "="*80)
print("ü§ñ INICIALIZANDO MODELO")
print("="*80 + "\n")

try:
    config.logger.info(f"Cargando modelo: {config.MODEL_NAME}")

    # Cargar modelo
    model = AutoModelForSequenceClassification.from_pretrained(
        config.MODEL_NAME,
        num_labels=len(label2id),
        id2label=id2label,
        label2id=label2id,
        problem_type="single_label_classification"
    )

    # Mover a GPU si est√° disponible
    model.to(config.DEVICE)

    # Informaci√≥n del modelo
    num_params = sum(p.numel() for p in model.parameters())
    num_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)

    print(f"‚úÖ Modelo cargado: {config.MODEL_NAME}")
    print(f"   Tipo: {model.__class__.__name__}")
    print(f"   N√∫mero de clases: {len(label2id)}")
    print(f"   Par√°metros totales: {num_params:,}")
    print(f"   Par√°metros entrenables: {num_trainable:,}")
    print(f"   Dispositivo: {config.DEVICE}")

    if torch.cuda.is_available():
        print(f"   Memoria GPU asignada: {torch.cuda.memory_allocated(0) / 1e9:.2f} GB")

    config.logger.info(f"Modelo inicializado con {num_params:,} par√°metros")

    print("\n" + "="*80)
    print("‚úÖ MODELO LISTO PARA ENTRENAMIENTO")
    print("="*80 + "\n")

except Exception as e:
    print("\n" + "="*80)
    print("‚ùå ERROR AL CARGAR EL MODELO")
    print("="*80)
    print(f"\n{str(e)}\n")
    print("Posibles causas:")
    print("1. El nombre del modelo es incorrecto")
    print("2. No hay conexi√≥n a internet para descargar el modelo")
    print("3. No hay suficiente memoria GPU/RAM")
    print("4. Incompatibilidad de versiones de transformers")
    print("\nModelos v√°lidos:")
    print("- FacebookAI/xlm-roberta-base")
    print("- dccuchile/bert-base-spanish-wwm-cased")
    print("\n" + "="*80 + "\n")
    raise



ü§ñ INICIALIZANDO MODELO



model.safetensors:   0%|          | 0.00/1.12G [00:00<?, ?B/s]

Some weights of XLMRobertaForSequenceClassification were not initialized from the model checkpoint at FacebookAI/xlm-roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


‚úÖ Modelo cargado: FacebookAI/xlm-roberta-base
   Tipo: XLMRobertaForSequenceClassification
   N√∫mero de clases: 357
   Par√°metros totales: 278,318,181
   Par√°metros entrenables: 278,318,181
   Dispositivo: cuda
   Memoria GPU asignada: 1.11 GB

‚úÖ MODELO LISTO PARA ENTRENAMIENTO



In [9]:
# ============================================================================
# ‚öôÔ∏è  CONFIGURACI√ìN DEL ENTRENAMIENTO
# ============================================================================

print("\n" + "="*80)
print("‚öôÔ∏è  CONFIGURANDO ENTRENAMIENTO")
print("="*80 + "\n")

# Configuraci√≥n de argumentos de entrenamiento
training_args = TrainingArguments(
    # Directorios
    output_dir=config.CHECKPOINT_DIR,
    logging_dir=os.path.join(config.OUTPUT_DIR, 'logs'),

    # Hiperpar√°metros
    learning_rate=config.LEARNING_RATE,
    per_device_train_batch_size=config.BATCH_SIZE,
    per_device_eval_batch_size=config.BATCH_SIZE,
    num_train_epochs=config.NUM_EPOCHS,
    warmup_steps=config.WARMUP_STEPS,
    weight_decay=config.WEIGHT_DECAY,

    # Evaluaci√≥n y guardado
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="f1_weighted",  # Usar F1 weighted como m√©trica principal
    greater_is_better=True,

    # Logging
    logging_steps=100,
    logging_strategy="steps",

    # Optimizaci√≥n
    fp16=torch.cuda.is_available(),  # Precisi√≥n mixta si hay GPU
    gradient_accumulation_steps=1,

    # Otros
    seed=config.RANDOM_STATE,
    report_to="none",  # Desactivar reportes externos
    disable_tqdm=False,  # Mantener barra de progreso
)

# Inicializar Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    compute_metrics=compute_detailed_metrics,
    callbacks=[
        EarlyStoppingCallback(
            early_stopping_patience=config.EARLY_STOPPING_PATIENCE
        )
    ]
)

print("‚úÖ Configuraci√≥n de entrenamiento:")
print(f"   Learning rate: {config.LEARNING_RATE}")
print(f"   Batch size: {config.BATCH_SIZE}")
print(f"   Epochs: {config.NUM_EPOCHS}")
print(f"   Warmup steps: {config.WARMUP_STEPS}")
print(f"   Early stopping: {config.EARLY_STOPPING_PATIENCE} epochs")
print(f"   FP16 (mixed precision): {training_args.fp16}")
print(f"   M√©trica principal: f1_weighted")

print("\n" + "="*80)
print("‚úÖ TRAINER CONFIGURADO Y LISTO")
print("="*80 + "\n")

config.logger.info("Trainer configurado correctamente")



‚öôÔ∏è  CONFIGURANDO ENTRENAMIENTO

‚úÖ Configuraci√≥n de entrenamiento:
   Learning rate: 2e-05
   Batch size: 16
   Epochs: 3
   Warmup steps: 500
   Early stopping: 3 epochs
   FP16 (mixed precision): True
   M√©trica principal: f1_weighted

‚úÖ TRAINER CONFIGURADO Y LISTO



In [10]:
# ============================================================================
# üöÄ ENTRENAMIENTO DEL MODELO
# ============================================================================

print("\n" + "="*80)
print("üöÄ INICIANDO ENTRENAMIENTO")
print("="*80)
print(f"\nModelo: {config.MODEL_NAME}")
print(f"Datos de entrenamiento: {len(train_dataset):,} ejemplos")
print(f"Datos de validaci√≥n: {len(val_dataset):,} ejemplos")
print(f"\nEsto puede tomar varios minutos/horas dependiendo de:")
print("  - Tama√±o del dataset")
print("  - N√∫mero de √©pocas")
print("  - Hardware disponible (GPU/CPU)")
print("\n" + "="*80 + "\n")

try:
    # Registrar inicio
    start_time = datetime.now()
    config.logger.info("Iniciando entrenamiento...")

    # ENTRENAR
    train_result = trainer.train()

    # Registrar finalizaci√≥n
    end_time = datetime.now()
    training_time = end_time - start_time

    config.logger.info(f"Entrenamiento completado en {training_time}")

    print("\n" + "="*80)
    print("‚úÖ ENTRENAMIENTO COMPLETADO")
    print("="*80)
    print(f"\nTiempo total: {training_time}")
    print(f"Mejor modelo guardado en: {config.CHECKPOINT_DIR}")

    # Mostrar m√©tricas finales de entrenamiento
    print(f"\nüìä M√©tricas finales de entrenamiento:")
    print(f"   Training loss: {train_result.training_loss:.4f}")

    # Evaluar en validation set
    print("\n" + "-"*80)
    print("üìä Evaluando en conjunto de validaci√≥n...")
    val_metrics = trainer.evaluate()
    display_metrics(val_metrics, "M√©tricas de Validaci√≥n")

    print("="*80 + "\n")

except KeyboardInterrupt:
    print("\n" + "="*80)
    print("‚ö†Ô∏è  ENTRENAMIENTO INTERRUMPIDO POR EL USUARIO")
    print("="*80)
    print("\nEl modelo puede haber sido parcialmente entrenado.")
    print("Los checkpoints guardados est√°n disponibles en:")
    print(f"  {config.CHECKPOINT_DIR}")
    print("\n" + "="*80 + "\n")
    raise

except Exception as e:
    print("\n" + "="*80)
    print("‚ùå ERROR DURANTE EL ENTRENAMIENTO")
    print("="*80)
    print(f"\n{str(e)}\n")
    print("Posibles causas:")
    print("1. Memoria insuficiente (GPU/RAM)")
    print("   Soluci√≥n: Reduce BATCH_SIZE en ModelConfig")
    print("2. Datos corruptos o formato incorrecto")
    print("3. Incompatibilidad de versiones")
    print("\nRevisa el archivo de log para m√°s detalles:")
    print(f"  {os.path.join(config.OUTPUT_DIR, 'training_*.log')}")
    print("\n" + "="*80 + "\n")
    config.logger.error(f"Error en entrenamiento: {str(e)}", exc_info=True)
    raise



üöÄ INICIANDO ENTRENAMIENTO

Modelo: FacebookAI/xlm-roberta-base
Datos de entrenamiento: 220,937 ejemplos
Datos de validaci√≥n: 47,344 ejemplos

Esto puede tomar varios minutos/horas dependiendo de:
  - Tama√±o del dataset
  - N√∫mero de √©pocas
  - Hardware disponible (GPU/CPU)




Epoch,Training Loss,Validation Loss,Accuracy,F1 Macro,F1 Micro,F1 Weighted,Precision Macro,Precision Micro,Precision Weighted,Recall Macro,Recall Micro,Recall Weighted
1,0.3548,0.373263,0.919229,0.414479,0.919229,0.908641,0.429819,0.919229,0.905562,0.427376,0.919229,0.919229
2,0.315,0.30843,0.932663,0.514334,0.932663,0.927288,0.532307,0.932663,0.926565,0.529108,0.932663,0.932663
3,0.2963,0.292726,0.93731,0.548037,0.93731,0.933028,0.561578,0.93731,0.931313,0.557026,0.93731,0.93731



‚úÖ ENTRENAMIENTO COMPLETADO

Tiempo total: 2:03:02.044670
Mejor modelo guardado en: /content/drive/MyDrive/PI_PEU/xlmRoberta/xlm-roberta-base_20251112_182822/checkpoints

üìä M√©tricas finales de entrenamiento:
   Training loss: 0.4604

--------------------------------------------------------------------------------
üìä Evaluando en conjunto de validaci√≥n...



üìä M√âTRICAS DE VALIDACI√ìN

üí• LOSS: 0.2927
üéØ ACCURACY: 0.9373

--------------------------------------------------------------------------------

M√©trica                     Macro        Micro     Weighted
------------------------------------------------------------
F1 Score                   0.5480       0.9373       0.9330
Precision                  0.5616       0.9373       0.9313
Recall                     0.5570       0.9373       0.9373





In [11]:
# ============================================================================
# üß™ EVALUACI√ìN EN TEST SET
# ============================================================================

print("\n" + "="*80)
print("üß™ EVALUACI√ìN EN TEST SET")
print("="*80 + "\n")

try:
    config.logger.info("Evaluando en test set...")

    # Obtener predicciones en test set
    test_predictions = trainer.predict(test_dataset)

    # Extraer m√©tricas
    test_metrics = test_predictions.metrics

    # Mostrar m√©tricas
    display_metrics(test_metrics, "M√©tricas de Test (Evaluaci√≥n Final)")

    # Guardar m√©tricas en archivo
    metrics_file = os.path.join(config.OUTPUT_DIR, 'test_metrics.json')
    with open(metrics_file, 'w', encoding='utf-8') as f:
        json.dump(test_metrics, f, indent=2)

    config.logger.info(f"M√©tricas de test guardadas en: {metrics_file}")

    # ========================================================================
    # AN√ÅLISIS DETALLADO
    # ========================================================================

    print("\n" + "="*80)
    print("üìà AN√ÅLISIS DETALLADO")
    print("="*80 + "\n")

    # Obtener predicciones y etiquetas verdaderas
    y_pred = np.argmax(test_predictions.predictions, axis=1)
    y_true = test_predictions.label_ids

    # Reporte de clasificaci√≥n por clase
    print("üìä Reporte de Clasificaci√≥n por Clase:\n")

    # Crear reporte con nombres de clases
    target_names = [id2label[i] for i in range(len(id2label))]
    class_report = classification_report(
        y_true,
        y_pred,
        target_names=target_names,
        zero_division=0,
        digits=4
    )
    print(class_report)

    # Guardar reporte completo
    report_file = os.path.join(config.OUTPUT_DIR, 'classification_report.txt')
    with open(report_file, 'w', encoding='utf-8') as f:
        f.write("REPORTE DE CLASIFICACI√ìN - TEST SET\n")
        f.write("="*80 + "\n\n")
        f.write(f"Modelo: {config.MODEL_NAME}\n")
        f.write(f"Fecha: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")
        f.write(f"Dataset: {config.DATA_PATH}\n")
        f.write("\n" + "="*80 + "\n\n")
        f.write(class_report)

    print(f"\n‚úÖ Reporte completo guardado en: {report_file}")

    # ========================================================================
    # AN√ÅLISIS DE ERRORES
    # ========================================================================

    print("\n" + "="*80)
    print("üîç AN√ÅLISIS DE ERRORES")
    print("="*80 + "\n")

    # Identificar predicciones incorrectas
    incorrect_mask = y_pred != y_true
    num_incorrect = incorrect_mask.sum()
    error_rate = num_incorrect / len(y_true) * 100

    print(f"Total de predicciones: {len(y_true):,}")
    print(f"Predicciones correctas: {(~incorrect_mask).sum():,}")
    print(f"Predicciones incorrectas: {num_incorrect:,}")
    print(f"Tasa de error: {error_rate:.2f}%")

    # Crear DataFrame con errores
    errors_df = test_df[incorrect_mask].copy()
    errors_df['predicted_label'] = [id2label[pred] for pred in y_pred[incorrect_mask]]
    errors_df['true_label'] = [id2label[true] for true in y_true[incorrect_mask]]
    errors_df['predicted_id'] = y_pred[incorrect_mask]
    errors_df['true_id'] = y_true[incorrect_mask]

    # Agregar probabilidades
    probs = torch.nn.functional.softmax(torch.tensor(test_predictions.predictions), dim=-1)
    max_probs = probs.max(dim=-1).values.numpy()
    errors_df['confidence'] = max_probs[incorrect_mask]

    # Guardar an√°lisis de errores
    errors_file = os.path.join(config.OUTPUT_DIR, 'error_analysis.csv')
    errors_df.to_csv(errors_file, index=False, encoding='utf-8')

    print(f"\n‚úÖ An√°lisis de errores guardado en: {errors_file}")

    # Mostrar clases con m√°s errores
    if len(errors_df) > 0:
        print("\nüìä Top 10 clases con m√°s errores de predicci√≥n:")
        error_by_class = errors_df['true_label'].value_counts().head(10)
        for idx, (clase, count) in enumerate(error_by_class.items(), 1):
            print(f"   {idx}. {clase}: {count} errores")

    print("\n" + "="*80 + "\n")

    config.logger.info("Evaluaci√≥n en test completada")

except Exception as e:
    print("\n" + "="*80)
    print("‚ùå ERROR EN LA EVALUACI√ìN")
    print("="*80)
    print(f"\n{str(e)}\n")
    config.logger.error(f"Error en evaluaci√≥n: {str(e)}", exc_info=True)
    raise



üß™ EVALUACI√ìN EN TEST SET




üìä M√âTRICAS DE TEST (EVALUACI√ìN FINAL)

üí• LOSS: 0.2929
üéØ ACCURACY: 0.9363

--------------------------------------------------------------------------------

M√©trica                     Macro        Micro     Weighted
------------------------------------------------------------
F1 Score                   0.5391       0.9363       0.9318
Precision                  0.5564       0.9363       0.9303
Recall                     0.5488       0.9363       0.9363



üìà AN√ÅLISIS DETALLADO

üìä Reporte de Clasificaci√≥n por Clase:

              precision    recall  f1-score   support

        0111     0.0000    0.0000    0.0000         3
        0112     0.0000    0.0000    0.0000         2
        0120     1.0000    0.7778    0.8750         9
        0211     0.5294    1.0000    0.6923         9
        0212     0.4000    0.4000    0.4000         5
        0213     0.0000    0.0000    0.0000         2
        0220     0.9864    1.0000    0.9932       145
        0311     1.0000  

In [12]:
# ============================================================================
# üíæ GUARDADO COMPLETO DEL MODELO Y ARTEFACTOS
# ============================================================================

print("\n" + "="*80)
print("üíæ GUARDANDO MODELO Y ARTEFACTOS")
print("="*80 + "\n")

try:
    config.logger.info("Guardando modelo final...")

    # 1. Guardar modelo y tokenizer
    trainer.save_model(config.MODEL_SAVE_DIR)
    tokenizer.save_pretrained(config.MODEL_SAVE_DIR)

    print(f"‚úÖ Modelo y tokenizer guardados en: {config.MODEL_SAVE_DIR}")
    config.logger.info(f"Modelo guardado en: {config.MODEL_SAVE_DIR}")

    # 2. Guardar artefactos (mapeos, configuraci√≥n, m√©tricas)
    artifacts = {
        'label2id': label2id,
        'id2label': id2label,
        'num_labels': len(label2id),
        'target_column': config.TARGET_COLUMN,
        'text_column': config.TEXT_COLUMN,
        'max_length': config.MAX_LENGTH,
        'model_name': config.MODEL_NAME,
        'model_type': config.model_type,
        'experiment_name': config.experiment_name,
        'training_date': datetime.now().isoformat(),
        'test_metrics': test_metrics,
        'val_metrics': val_metrics if 'val_metrics' in locals() else None,
        'training_time': str(training_time) if 'training_time' in locals() else None,
        'device': config.DEVICE,
    }

    artifacts_file = os.path.join(config.OUTPUT_DIR, 'artifacts.pkl')
    with open(artifacts_file, 'wb') as f:
        pickle.dump(artifacts, f)

    print(f"‚úÖ Artefactos guardados en: {artifacts_file}")
    config.logger.info(f"Artefactos guardados en: {artifacts_file}")

    # 3. Guardar resumen en JSON (legible)
    summary = {
        'experiment_name': config.experiment_name,
        'model_name': config.MODEL_NAME,
        'model_type': config.model_type,
        'num_classes': len(label2id),
        'training_date': datetime.now().isoformat(),
        'data_path': config.DATA_PATH,
        'max_length': config.MAX_LENGTH,
        'batch_size': config.BATCH_SIZE,
        'learning_rate': config.LEARNING_RATE,
        'num_epochs': config.NUM_EPOCHS,
        'test_accuracy': test_metrics.get('test_accuracy', test_metrics.get('eval_accuracy', 0)),
        'test_f1_macro': test_metrics.get('test_f1_macro', test_metrics.get('eval_f1_macro', 0)),
        'test_f1_weighted': test_metrics.get('test_f1_weighted', test_metrics.get('eval_f1_weighted', 0)),
        'device_used': config.DEVICE,
    }

    summary_file = os.path.join(config.OUTPUT_DIR, 'experiment_summary.json')
    with open(summary_file, 'w', encoding='utf-8') as f:
        json.dump(summary, f, indent=2, ensure_ascii=False)

    print(f"‚úÖ Resumen guardado en: {summary_file}")

    # 4. Crear archivo README
    readme_content = f"""# Experimento: {config.experiment_name}

## Informaci√≥n del Modelo
- **Modelo Base**: {config.MODEL_NAME}
- **Tipo**: {config.model_type}
- **N√∫mero de Clases**: {len(label2id)}
- **Fecha de Entrenamiento**: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}

## Resultados (Test Set)
- **Accuracy**: {test_metrics.get('test_accuracy', test_metrics.get('eval_accuracy', 0)):.4f}
- **F1 Macro**: {test_metrics.get('test_f1_macro', test_metrics.get('eval_f1_macro', 0)):.4f}
- **F1 Weighted**: {test_metrics.get('test_f1_weighted', test_metrics.get('eval_f1_weighted', 0)):.4f}
- **Precision Weighted**: {test_metrics.get('test_precision_weighted', test_metrics.get('eval_precision_weighted', 0)):.4f}
- **Recall Weighted**: {test_metrics.get('test_recall_weighted', test_metrics.get('eval_recall_weighted', 0)):.4f}

## Archivos Generados
- `final_model/`: Modelo entrenado y tokenizer
- `artifacts.pkl`: Mapeos y metadata
- `config.json`: Configuraci√≥n del entrenamiento
- `test_metrics.json`: M√©tricas completas de test
- `classification_report.txt`: Reporte detallado por clase
- `error_analysis.csv`: An√°lisis de errores
- `experiment_summary.json`: Resumen del experimento

## C√≥mo Usar el Modelo

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import pickle

# Cargar modelo y tokenizer
model_path = "{config.MODEL_SAVE_DIR}"
model = AutoModelForSequenceClassification.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Cargar artefactos
with open('{artifacts_file}', 'rb') as f:
    artifacts = pickle.load(f)

id2label = artifacts['id2label']

# Hacer predicci√≥n
text = "Tu texto aqu√≠"
inputs = tokenizer(text, return_tensors='pt', max_length={config.MAX_LENGTH},
                   padding='max_length', truncation=True)
outputs = model(**inputs)
predicted_class = outputs.logits.argmax().item()
predicted_label = id2label[predicted_class]
print(f"Predicci√≥n: {{predicted_label}}")
```
"""

    readme_file = os.path.join(config.OUTPUT_DIR, 'README.md')
    with open(readme_file, 'w', encoding='utf-8') as f:
        f.write(readme_content)

    print(f"‚úÖ README creado en: {readme_file}")

    # 5. Listar todos los archivos generados
    print("\n" + "="*80)
    print("üìÅ ARCHIVOS GENERADOS")
    print("="*80 + "\n")

    all_files = [
        ('Modelo final', config.MODEL_SAVE_DIR),
        ('Artefactos', artifacts_file),
        ('Configuraci√≥n', os.path.join(config.OUTPUT_DIR, 'config.json')),
        ('M√©tricas de test', os.path.join(config.OUTPUT_DIR, 'test_metrics.json')),
        ('Reporte de clasificaci√≥n', report_file),
        ('An√°lisis de errores', errors_file),
        ('Resumen del experimento', summary_file),
        ('README', readme_file),
        ('Info de clases', os.path.join(config.OUTPUT_DIR, 'class_info.json')),
    ]

    for name, path in all_files:
        if os.path.exists(path):
            if os.path.isdir(path):
                print(f"‚úÖ {name:.<40} {path}")
            else:
                size = os.path.getsize(path) / 1024
                print(f"‚úÖ {name:.<40} {path} ({size:.1f} KB)")

    print("\n" + "="*80 + "\n")

    config.logger.info("Todos los archivos guardados exitosamente")

except Exception as e:
    print("\n" + "="*80)
    print("‚ùå ERROR AL GUARDAR ARCHIVOS")
    print("="*80)
    print(f"\n{str(e)}\n")
    config.logger.error(f"Error al guardar: {str(e)}", exc_info=True)
    raise



üíæ GUARDANDO MODELO Y ARTEFACTOS

‚úÖ Modelo y tokenizer guardados en: /content/drive/MyDrive/PI_PEU/xlmRoberta/xlm-roberta-base_20251112_182822/final_model
‚úÖ Artefactos guardados en: /content/drive/MyDrive/PI_PEU/xlmRoberta/xlm-roberta-base_20251112_182822/artifacts.pkl
‚úÖ Resumen guardado en: /content/drive/MyDrive/PI_PEU/xlmRoberta/xlm-roberta-base_20251112_182822/experiment_summary.json
‚úÖ README creado en: /content/drive/MyDrive/PI_PEU/xlmRoberta/xlm-roberta-base_20251112_182822/README.md

üìÅ ARCHIVOS GENERADOS

‚úÖ Modelo final............................ /content/drive/MyDrive/PI_PEU/xlmRoberta/xlm-roberta-base_20251112_182822/final_model
‚úÖ Artefactos.............................. /content/drive/MyDrive/PI_PEU/xlmRoberta/xlm-roberta-base_20251112_182822/artifacts.pkl (6.2 KB)
‚úÖ Configuraci√≥n........................... /content/drive/MyDrive/PI_PEU/xlmRoberta/xlm-roberta-base_20251112_182822/config.json (0.5 KB)
‚úÖ M√©tricas de test........................ /content

In [13]:
# ============================================================================
# üîÆ FUNCIONES DE PREDICCI√ìN E INFERENCIA
# ============================================================================

def load_trained_model_for_inference(model_dir, artifacts_path, device=None):
    """
    Carga el modelo entrenado para hacer predicciones

    Args:
        model_dir: Directorio del modelo guardado
        artifacts_path: Ruta al archivo de artefactos
        device: Dispositivo ('cuda' o 'cpu'), None para auto-detectar

    Returns:
        tuple: (model, tokenizer, artifacts)
    """
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification
    import pickle

    if device is None:
        device = 'cuda' if torch.cuda.is_available() else 'cpu'

    print("\n" + "="*80)
    print("üìÇ CARGANDO MODELO PARA INFERENCIA")
    print("="*80 + "\n")

    # Cargar tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    print(f"‚úÖ Tokenizer cargado")

    # Cargar modelo
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    model.to(device)
    model.eval()  # Modo evaluaci√≥n
    print(f"‚úÖ Modelo cargado en: {device}")

    # Cargar artefactos
    with open(artifacts_path, 'rb') as f:
        artifacts = pickle.load(f)
    print(f"‚úÖ Artefactos cargados")
    print(f"   N√∫mero de clases: {artifacts['num_labels']}")

    print("\n" + "="*80 + "\n")

    return model, tokenizer, artifacts


def predict_single(text, model, tokenizer, artifacts, device=None, top_k=5):
    """
    Predice la clase de un texto individual

    Args:
        text: Texto a clasificar
        model: Modelo entrenado
        tokenizer: Tokenizer
        artifacts: Diccionario de artefactos
        device: Dispositivo
        top_k: N√∫mero de predicciones principales a retornar

    Returns:
        dict: Resultados de la predicci√≥n
    """
    import torch
    import torch.nn.functional as F

    if device is None:
        device = next(model.parameters()).device

    # Tokenizar
    inputs = tokenizer(
        text,
        max_length=artifacts['max_length'],
        padding='max_length',
        truncation=True,
        return_tensors='pt'
    )
    inputs = {k: v.to(device) for k, v in inputs.items()}

    # Predicci√≥n
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = F.softmax(logits, dim=-1)

    # Predicci√≥n principal
    predicted_idx = torch.argmax(probs, dim=-1).item()
    predicted_prob = probs[0][predicted_idx].item()
    predicted_label = artifacts['id2label'][str(predicted_idx)]

    # Top-K predicciones
    top_probs, top_indices = torch.topk(probs[0], k=min(top_k, len(artifacts['id2label'])))
    top_predictions = [
        {
            'label': artifacts['id2label'][str(idx.item())],
            'probability': prob.item()
        }
        for prob, idx in zip(top_probs, top_indices)
    ]

    return {
        'text': text,
        'predicted_label': predicted_label,
        'predicted_probability': predicted_prob,
        'top_predictions': top_predictions
    }


def predict_batch(texts, model, tokenizer, artifacts, device=None, batch_size=32):
    """
    Predice las clases de m√∫ltiples textos

    Args:
        texts: Lista de textos
        model: Modelo entrenado
        tokenizer: Tokenizer
        artifacts: Diccionario de artefactos
        device: Dispositivo
        batch_size: Tama√±o del lote

    Returns:
        list: Lista de predicciones
    """
    import torch
    import torch.nn.functional as F
    from tqdm.auto import tqdm

    if device is None:
        device = next(model.parameters()).device

    model.eval()
    predictions = []

    # Procesar en lotes
    for i in tqdm(range(0, len(texts), batch_size), desc="Prediciendo"):
        batch_texts = texts[i:i+batch_size]

        # Tokenizar
        inputs = tokenizer(
            batch_texts,
            max_length=artifacts['max_length'],
            padding='max_length',
            truncation=True,
            return_tensors='pt'
        )
        inputs = {k: v.to(device) for k, v in inputs.items()}

        # Predicci√≥n
        with torch.no_grad():
            outputs = model(**inputs)
            logits = outputs.logits
            probs = F.softmax(logits, dim=-1)

        # Extraer predicciones
        predicted_indices = torch.argmax(probs, dim=-1).cpu().numpy()
        predicted_probs = torch.max(probs, dim=-1)[0].cpu().numpy()

        for idx, prob in zip(predicted_indices, predicted_probs):
            predictions.append({
                'predicted_label': artifacts['id2label'][str(idx)],
                'probability': float(prob)
            })

    return predictions


def interactive_prediction_mode(model, tokenizer, artifacts, device=None):
    """
    Modo interactivo para probar el modelo

    Args:
        model: Modelo entrenado
        tokenizer: Tokenizer
        artifacts: Diccionario de artefactos
        device: Dispositivo
    """
    print("\n" + "="*80)
    print("üß™ MODO DE PRUEBA INTERACTIVO")
    print("="*80)
    print("\nEscribe un texto para clasificar (o 'salir' para terminar)")
    print("Comandos especiales: 'salir', 'exit', 'quit', 'q'\n")

    while True:
        print("-" * 80)
        text = input("\nüìù Texto: ").strip()

        if text.lower() in ['salir', 'exit', 'quit', 'q']:
            print("\nüëã ¬°Hasta luego!\n")
            break

        if not text:
            print("‚ö†Ô∏è  Por favor, ingresa un texto v√°lido")
            continue

        # Realizar predicci√≥n
        result = predict_single(text, model, tokenizer, artifacts, device)

        # Mostrar resultados
        print("\nüìä RESULTADO:")
        print(f"   üéØ Clase predicha: {result['predicted_label']}")
        print(f"   üìà Confianza: {result['predicted_probability']:.2%}")
        print(f"\n   üîù Top 5 predicciones:")
        for i, pred in enumerate(result['top_predictions'], 1):
            bar = "‚ñà" * int(pred['probability'] * 20)
            print(f"      {i}. {pred['label']:<15} {pred['probability']:>6.2%} {bar}")
        print()


print("‚úÖ Funciones de predicci√≥n cargadas")


‚úÖ Funciones de predicci√≥n cargadas


In [14]:
# ============================================================================
# üí° EJEMPLO DE USO - PREDICCI√ìN CON EL MODELO ENTRENADO
# ============================================================================

# Este c√≥digo muestra c√≥mo usar el modelo despu√©s del entrenamiento
# Puedes ejecutarlo en este notebook o en uno nuevo

print("\n" + "="*80)
print("üí° EJEMPLOS DE USO")
print("="*80 + "\n")

# ============================================================================
# OPCI√ìN 1: Usar el modelo reci√©n entrenado (en este notebook)
# ============================================================================

print("üìù OPCI√ìN 1: Predicci√≥n individual con modelo actual\n")

# Ejemplo de textos para probar
ejemplos_texto = [
    "vendedor de abarrotes en bodega",
    "profesor de matem√°ticas en colegio secundario",
    "conductor de taxi"
]

print("Ejemplos de predicci√≥n:\n")
for texto in ejemplos_texto:
    result = predict_single(texto, model, tokenizer, artifacts, config.DEVICE)
    print(f"Texto: {texto}")
    print(f"Predicci√≥n: {result['predicted_label']} (confianza: {result['predicted_probability']:.2%})\n")

print("-" * 80 + "\n")

# ============================================================================
# OPCI√ìN 2: Cargar modelo guardado (en una nueva sesi√≥n)
# ============================================================================

print("üìù OPCI√ìN 2: C√≥digo para cargar el modelo en una nueva sesi√≥n\n")

codigo_ejemplo = f'''
# C√≥digo para usar en un notebook/script nuevo:

# 1. Cargar el modelo
model_inference, tokenizer_inference, artifacts_inference = load_trained_model_for_inference(
    model_dir="{config.MODEL_SAVE_DIR}",
    artifacts_path="{os.path.join(config.OUTPUT_DIR, 'artifacts.pkl')}"
)

# 2. Hacer predicci√≥n individual
texto = "alba√±il de construcci√≥n"
result = predict_single(
    text=texto,
    model=model_inference,
    tokenizer=tokenizer_inference,
    artifacts=artifacts_inference
)
print(f"Predicci√≥n: {{result['predicted_label']}}")
print(f"Confianza: {{result['predicted_probability']:.2%}}")

# 3. Hacer predicci√≥n por lotes
textos = ["texto 1", "texto 2", "texto 3"]
predictions = predict_batch(
    texts=textos,
    model=model_inference,
    tokenizer=tokenizer_inference,
    artifacts=artifacts_inference,
    batch_size=32
)

# 4. Modo interactivo
interactive_prediction_mode(
    model=model_inference,
    tokenizer=tokenizer_inference,
    artifacts=artifacts_inference
)
'''

print(codigo_ejemplo)

print("=" * 80 + "\n")



üí° EJEMPLOS DE USO

üìù OPCI√ìN 1: Predicci√≥n individual con modelo actual

Ejemplos de predicci√≥n:



KeyError: '193'

In [None]:
# ============================================================================
# üéÆ MODO INTERACTIVO - PROBAR EL MODELO
# ============================================================================

# Descomenta y ejecuta esta celda para probar el modelo interactivamente

# interactive_prediction_mode(
#     model=model,
#     tokenizer=tokenizer,
#     artifacts=artifacts,
#     device=config.DEVICE
# )

print("\nüí° Descomenta el c√≥digo arriba para activar el modo interactivo\n")


---

# üéâ ¬°ENTRENAMIENTO COMPLETADO!

## üìä Resumen de Resultados

Tu modelo ha sido entrenado exitosamente. Revisa:

1. **M√©tricas de Test**: Accuracy, F1, Precision, Recall (macro, micro, weighted)
2. **Archivos Generados**: Todos los archivos est√°n en el directorio de salida
3. **README**: Instrucciones detalladas de uso

## üöÄ Pr√≥ximos Pasos

### Para entrenar otro modelo:
1. Cambia `MODEL_NAME` en `ModelConfig`
2. Ejecuta todas las celdas nuevamente

### Para usar el modelo:
1. Usa las funciones de predicci√≥n en este notebook
2. O carga el modelo en un script nuevo (ver ejemplos arriba)

### Para mejorar los resultados:
- Ajusta los hiperpar√°metros en `ModelConfig`
- Aumenta `NUM_EPOCHS`
- Experimenta con diferentes `LEARNING_RATE`
- Aumenta `MAX_LENGTH` si tus textos son largos

---

## üìö Modelos Soportados

Este script funciona con:
- ‚úÖ `FacebookAI/xlm-roberta-base` (Multiling√ºe)
- ‚úÖ `dccuchile/bert-base-spanish-wwm-cased` (Espa√±ol)
- ‚úÖ Cualquier modelo de Hugging Face compatible con `AutoModelForSequenceClassification`

---

**¬øPreguntas o errores?** Revisa:
- El archivo de log en el directorio de salida
- Los mensajes de error detallados en cada celda
- La documentaci√≥n de transformers: https://huggingface.co/docs/transformers
