# THE REAL SLIM ENSEMBLE (yes, after Eminem)
## The Definitive TruthSeeker System - The Mega Ensemble That Breaks Everything

**This notebook implements a complete disinformation-detection system built from:**
- **Champion traditional models** (Random Forest, XGBoost, Ridge, SVM)
- **Deep neural networks** built from scratch with tuned architectures
- **BERT/Transformers** (RoBERTa, DistilBERT, BERT-base), fine-tuned
- **Hybrid RAG system** with semantic retrieval
- **Supreme meta-ensemble** that combines the best of everything

**Goal:** reach an F1-score above 0.90 through the collective intelligence of the models.

---

## Setup and Environment Configuration

In [14]:
# =============================================================================
# CONFIGURACIÓN DEL ENTORNO Y LIBRERÍAS
# =============================================================================

import os
import numpy as np
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

# Configuración para reproducibilidad
RANDOM_STATE = 42
np.random.seed(RANDOM_STATE)

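# Detect the execution environment
try:
    import google.colab  # only importable inside Colab
    ENVIRONMENT = 'colab'
    print("Entorno: Google Colab - Listos para romper todo")
except ImportError:
    # Catch only the missing-module case instead of a bare `except`
    ENVIRONMENT = 'local'
    print("Entorno: Local - Vamos con todo")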

print(f"Random State: {RANDOM_STATE}")
print("THE REAL SLIM ENSEMBLE - Iniciando...")
print("=" * 60)

# =============================================================================
# IMPORTACIONES CORE - Machine Learning
# =============================================================================

# Scikit-learn essentials
from sklearn.model_selection import train_test_split, cross_val_score, StratifiedKFold
from sklearn.preprocessing import RobustScaler, StandardScaler, LabelEncoder
from sklearn.metrics import (
    accuracy_score, f1_score, precision_score, recall_score,
    roc_auc_score, roc_curve, precision_recall_curve, average_precision_score,
    confusion_matrix, classification_report, matthews_corrcoef,
    cohen_kappa_score, log_loss, brier_score_loss
)

# Modelos tradicionales campeones
from sklearn.ensemble import (
    RandomForestClassifier, ExtraTreesClassifier,
    VotingClassifier, StackingClassifier, BaggingClassifier
)
from sklearn.linear_model import RidgeCV, LogisticRegression
from sklearn.svm import SVC, LinearSVC
from sklearn.neural_network import MLPClassifier
from sklearn.naive_bayes import GaussianNB

# XGBoost y LightGBM - Los reyes del boosting
try:
    import xgboost as xgb
    import lightgbm as lgb
    import catboost as cb
    BOOSTING_AVAILABLE = True
    print("boosting models disponibles")
except ImportError as e:
    print(f"Instalando boosting libraries: {e}")
    if ENVIRONMENT == 'colab':
        !pip install -q xgboost lightgbm catboost
        import xgboost as xgb
        import lightgbm as lgb
        import catboost as cb
        BOOSTING_AVAILABLE = True
    else:
        BOOSTING_AVAILABLE = False

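# Reusing double quotes inside an f-string is a syntax error before Python 3.12,
# so the status text is built first.
boosting_status = "listo" if BOOSTING_AVAILABLE else "liston't"
print(f"Boosting models: {boosting_status}")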

# =============================================================================
# IMPORTACIONES DEEP LEARNING - TensorFlow y Transformers
# =============================================================================

# TensorFlow para DNNs
try:
    import tensorflow as tf
    from tensorflow.keras.models import Sequential, Model
    from tensorflow.keras.layers import (
        Dense, Dropout, BatchNormalization, Input,
        Conv1D, MaxPooling1D, GlobalMaxPooling1D,
        LSTM, GRU, Bidirectional, Embedding
    )
    from tensorflow.keras.optimizers import Adam, RMSprop
    from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
    from tensorflow.keras.regularizers import l1, l2, l1_l2

    # Configurar GPU si está disponible
    gpus = tf.config.experimental.list_physical_devices('GPU')
    if gpus:
        print(f"GPU disponible: {len(gpus)} device(s)")
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
    else:
        print("Ejecutando en CPU")

    TENSORFLOW_AVAILABLE = True
except ImportError:
    print("TensorFlow no disponible - DNNs deshabilitadas")
    TENSORFLOW_AVAILABLE = False

# Transformers para BERT
try:
    import torch
    from transformers import (
        AutoTokenizer, AutoModel, AutoModelForSequenceClassification,
        TrainingArguments, Trainer, DataCollatorWithPadding
    )
    from datasets import Dataset

    # Configurar device
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    print(f"PyTorch device: {device}")

    TRANSFORMERS_AVAILABLE = True
except ImportError:
    if ENVIRONMENT == 'colab':
        print("Instalando transformers...")
        !pip install -q transformers datasets torch
        import torch
        from transformers import (
            AutoTokenizer, AutoModel, AutoModelForSequenceClassification,
            TrainingArguments, Trainer, DataCollatorWithPadding
        )
        from datasets import Dataset
        device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        TRANSFORMERS_AVAILABLE = True
    else:
        print("Transformers no disponible - BERT deshabilitado")
        TRANSFORMERS_AVAILABLE = False

print(f"Deep Learning Status:")
print(f"   TensorFlow: {'si' if TENSORFLOW_AVAILABLE else 'no'}")
print(f"   Transformers: {'si' if TRANSFORMERS_AVAILABLE else 'no'}")

# =============================================================================
# IMPORTACIONES NLP y RAG
# =============================================================================

# NLP tradicional
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.pipeline import Pipeline

# Para RAG y embeddings
try:
    from sentence_transformers import SentenceTransformer
    import faiss
    RAG_AVAILABLE = True
    print("RAG components disponibles")
except ImportError:
    if ENVIRONMENT == 'colab':
        print("Instalando sentence-transformers y faiss...")
        !pip install -q sentence-transformers faiss-cpu
        from sentence_transformers import SentenceTransformer
        import faiss
        RAG_AVAILABLE = True
    else:
        print("RAG components no disponibles")
        RAG_AVAILABLE = False

# Utilidades
import re
import string
import time
import pickle
import json
from collections import defaultdict, Counter
from datetime import datetime

print(f"RAG System: {'listos profe' if RAG_AVAILABLE else 'nada mi fafa'}")
print("Todas las importaciones completadas - Andamo Ready!")

# =============================================================================
# IMPORTACIONES PARA VISUALIZACIÓN - Preparadas pero no ejecutadas
# =============================================================================

import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.offline as pyo

# Configuración de visualización
plt.style.use('default')
sns.set_palette("husl")
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)

print("Librerías de visualización cargadas y todo listo para la accion")

Entorno: Google Colab - Listos para romper todo
Random State: 42
THE REAL SLIM ENSEMBLE - Iniciando...
boosting models disponibles
Boosting models: listo
GPU disponible: 1 device(s)
PyTorch device: cuda
Deep Learning Status:
   TensorFlow: si
   Transformers: si
RAG components disponibles
RAG System: listos profe
Todas las importaciones completadas - Andamo Ready!
Librerías de visualización cargadas y todo listo para la accion
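
The setup cell repeats the same `try`/`except ImportError` pattern for each optional stack (boosting, TensorFlow, Transformers, RAG). As a minimal sketch, the availability flags could also be computed without importing anything, using the standard library's `importlib.util.find_spec`. The helper names `is_available` and `stack_status` and the `OPTIONAL_STACKS` mapping are illustrative assumptions, not part of the notebook.

```python
from importlib.util import find_spec

# Hypothetical grouping of the optional packages the setup cell probes for.
OPTIONAL_STACKS = {
    "boosting": ["xgboost", "lightgbm", "catboost"],
    "tensorflow": ["tensorflow"],
    "transformers": ["torch", "transformers", "datasets"],
    "rag": ["sentence_transformers", "faiss"],
}

def is_available(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    try:
        return find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False

def stack_status(stacks=OPTIONAL_STACKS) -> dict:
    """A stack counts as available only if every one of its modules is found."""
    return {
        name: all(is_available(module) for module in modules)
        for name, modules in stacks.items()
    }

if __name__ == "__main__":
    for name, ok in stack_status().items():
        print(f"{name}: {'available' if ok else 'missing'}")
```

A check like this only reports whether a package is installed; the actual imports (and the Colab `pip install` fallback) would still be needed before using the libraries.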


## Data Loading and Preparation

In [15]:
# =============================================================================
# CARGA DE DATOS - Con corrección de data leakage
# =============================================================================

print("Cargando datos del TruthSeeker 2023...")
print("IMPORTANTE: Subir los siguientes archivos a /content/ en Colab:")
print("   1. text_data_for_nlp.csv (para BERT)")
print("   2. dataset_features_processed_winsorized.csv (features numéricas)")
print("   3. X_test.pkl y y_test.pkl (si están disponibles)")
print("   4. robust_scaler.pkl (scaler pre-entrenado)")
print()

# Rutas según el entorno
if ENVIRONMENT == 'colab':
    text_path = '/content/text_data_for_nlp.csv'
    features_path = '/content/dataset_features_processed_winsorized.csv'
    test_x_path = '/content/X_test.pkl'
    test_y_path = '/content/y_test.pkl'
    scaler_path = '/content/robust_scaler.pkl'
else:
    text_path = '../processed_data/text_data_for_nlp.csv'
    features_path = '../processed_data/dataset_features_processed_winsorized.csv'
    test_x_path = '../processed_data/X_test.pkl'
    test_y_path = '../processed_data/y_test.pkl'
    scaler_path = '../processed_data/robust_scaler.pkl'

try:
    # Cargar datos de texto para BERT
    print(f"Cargando texto desde: {text_path}")
    text_df = pd.read_csv(text_path)
    print(f"Datos de texto: {text_df.shape}")

    # Cargar features numéricas
    print(f"Cargando features desde: {features_path}")
    features_df = pd.read_csv(features_path)
    print(f"Features numéricas: {features_df.shape}")

    # Verificar data leakage
    unique_statements = text_df['statement'].nunique()
    total_rows = len(text_df)

    print(f"\nANALISIS DE DATA LEAKAGE:")
    print(f"   Total filas: {total_rows:,}")
    print(f"   Statements únicos: {unique_statements:,}")
    print(f"   Ratio duplicación: {total_rows/unique_statements:.1f}x")

    if unique_statements < total_rows:
        print(f"Data leakage detectado - Corrigiendo...")

        # Eliminar duplicados manteniendo correspondencia con features
        # Usar índices para mantener alineación
        unique_indices = text_df.drop_duplicates(subset=['statement'], keep='first').index
        text_df_clean = text_df.loc[unique_indices].copy()

        # Alinear features usando los mismos índices
        if len(features_df) >= len(text_df):
            features_df_clean = features_df.loc[unique_indices].copy()
        else:
            # Si features es más pequeño, usar solo los primeros índices disponibles
            available_indices = unique_indices[unique_indices < len(features_df)]
            text_df_clean = text_df.loc[available_indices].copy()
            features_df_clean = features_df.loc[available_indices].copy()

        print(f"Datos limpios: {len(text_df_clean)} muestras únicas")
        print(f"Features alineadas: {len(features_df_clean)} muestras")

        text_df = text_df_clean
        features_df = features_df_clean

    # Usar BinaryNumTarget directamente como label
    if 'BinaryNumTarget' in features_df.columns:
        print("Columna 'BinaryNumTarget' encontrada. Creando dataset integrado...")

        # Crear dataset integrado con statement y label
        text_df['label'] = features_df['BinaryNumTarget'].values

        print(f"Labels asignadas correctamente")
    else:
        raise KeyError("No se encontró la columna 'BinaryNumTarget' en features_df")

    # Verificar que no hay NaN en labels
    if text_df['label'].isna().sum() > 0:
        print(f"Eliminando {text_df['label'].isna().sum()} filas con labels NaN")
        valid_mask = ~text_df['label'].isna()
        text_df = text_df[valid_mask].copy()
        features_df = features_df[valid_mask].copy()

    # Verificar balance de clases
    class_distribution = text_df['label'].value_counts(normalize=True)
    print(f"\nDistribución de clases:")
    print(f"   Verdadero (1): {class_distribution.get(1, 0):.1%}")
    print(f"   Falso (0): {class_distribution.get(0, 0):.1%}")

    # Verificar alineación final
    print(f"\nVERIFICACION FINAL:")
    print(f"   Text DF: {text_df.shape}")
    print(f"   Features DF: {features_df.shape}")
    print(f"   Índices alineados: {len(text_df) == len(features_df)}")

    DATA_LOADED = True
    print(f"\nDatos cargados exitosamente para {len(text_df)} muestras")

except Exception as e:
    print(f"Error cargando datos: {e}")
    print(f"Detalles del error:")
    print(f"   Text DF shape: {text_df.shape if 'text_df' in locals() else 'No cargado'}")
    print(f"   Features DF shape: {features_df.shape if 'features_df' in locals() else 'No cargado'}")
    print("Asegúrate de subir los archivos necesarios")
    DATA_LOADED = False

Cargando datos del TruthSeeker 2023...
IMPORTANTE: Subir los siguientes archivos a /content/ en Colab:
   1. text_data_for_nlp.csv (para BERT)
   2. dataset_features_processed_winsorized.csv (features numéricas)
   3. X_test.pkl y y_test.pkl (si están disponibles)
   4. robust_scaler.pkl (scaler pre-entrenado)

Cargando texto desde: /content/text_data_for_nlp.csv
Datos de texto: (134198, 2)
Cargando features desde: /content/dataset_features_processed_winsorized.csv
Features numéricas: (134198, 58)

ANALISIS DE DATA LEAKAGE:
   Total filas: 134,198
   Statements únicos: 1,058
   Ratio duplicación: 126.8x
Data leakage detectado - Corrigiendo...
Datos limpios: 1058 muestras únicas
Features alineadas: 1058 muestras
Columna 'BinaryNumTarget' encontrada. Creando dataset integrado...
Labels asignadas correctamente

Distribución de clases:
   Verdadero (1): 54.7%
   Falso (0): 45.3%

VERIFICACION FINAL:
   Text DF: (1058, 3)
   Features DF: (1058, 58)
   Índices alineados: True

Datos cargados exitosamente para 1058 muestras
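
The de-duplication step keeps the first row for each statement, which implicitly assumes that every row sharing a statement carries the same `BinaryNumTarget`. A minimal sketch of how that assumption could be checked before dropping duplicates follows; the function `conflicting_statements` and the toy rows are hypothetical and not taken from TruthSeeker.

```python
from collections import defaultdict

def conflicting_statements(statements, labels):
    """Map each statement that appears with more than one label to its label set."""
    seen = defaultdict(set)
    for statement, label in zip(statements, labels):
        seen[statement].add(label)
    return {s: found for s, found in seen.items() if len(found) > 1}

# Toy data for illustration only
statements = ["claim A", "claim A", "claim B", "claim B", "claim C"]
labels = [1, 1, 0, 1, 0]

conflicts = conflicting_statements(statements, labels)
print(conflicts)  # {'claim B': {0, 1}}
```

With the notebook's frames, the same check could be run on `text_df['statement']` and `features_df['BinaryNumTarget']` before the duplicates are removed; an empty result would mean that keeping the first row loses no label information.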

In [16]:
# =============================================================================
# PREPROCESAMIENTO Y SPLITS DE DATOS
# =============================================================================

if DATA_LOADED:
    print("Preparando datos para entrenamiento...")

    # Verificar que tenemos las columnas necesarias
    required_cols = ['statement', 'label']
    missing_cols = [col for col in required_cols if col not in text_df.columns]

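    if missing_cols:
        print(f"Columnas faltantes: {missing_cols}")
        # Without this flag, later cells fail with NameError on PREPROCESSING_COMPLETE
        PREPROCESSING_COMPLETE = False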
    else:
        # Preparar arrays para ML
        X_text = text_df['statement'].values
        y = text_df['label'].astype(int).values

        # Features numéricas (excluir BinaryNumTarget que ya es label)
        numeric_cols = features_df.select_dtypes(include=[np.number]).columns
        if 'BinaryNumTarget' in numeric_cols:
            numeric_cols = numeric_cols.drop('BinaryNumTarget')

        X_numeric = features_df[numeric_cols].values

        print(f"Datos preparados:")
        print(f"   Texto: {X_text.shape}")
        print(f"   Features numéricas: {X_numeric.shape}")
        print(f"   Labels: {y.shape}")
        print(f"   Balance: {np.mean(y):.3f} (proporción positiva)")

        # Train/Test Split estratificado
        print(f"\nCreando splits train/test (80/20)...")
        (
            X_text_train, X_text_test,
            X_num_train, X_num_test,
            y_train, y_test
        ) = train_test_split(
            X_text, X_numeric, y,
            test_size=0.2,
            random_state=RANDOM_STATE,
            stratify=y
        )

        # Escalado de features numéricas
        print("Aplicando escalado robusto...")
        scaler = RobustScaler()
        X_num_train_scaled = scaler.fit_transform(X_num_train)
        X_num_test_scaled = scaler.transform(X_num_test)

        print(f"Train set: {len(X_text_train)} muestras")
        print(f"Test set: {len(X_text_test)} muestras")
        print(f"Features escaladas: {X_num_train_scaled.shape[1]} características")

        PREPROCESSING_COMPLETE = True
else:
    print("No se pueden procesar datos - carga fallida")
    PREPROCESSING_COMPLETE = False

Preparando datos para entrenamiento...
Datos preparados:
   Texto: (1058,)
   Features numéricas: (1058, 57)
   Labels: (1058,)
   Balance: 0.547 (proporción positiva)

Creando splits train/test (80/20)...
Aplicando escalado robusto...
Train set: 846 muestras
Test set: 212 muestras
Features escaladas: 57 características
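
`RobustScaler` centers each feature on its median and divides by its interquartile range, and the cell above fits both statistics on the training split only before applying them to the test split. A pure-Python sketch of that computation for a single feature is shown here, under the assumption that quartiles are obtained by linear interpolation (the `inclusive` method of `statistics.quantiles`). The names `fit_robust` and `transform_robust` and the toy values are illustrative.

```python
from statistics import median, quantiles

def fit_robust(train_values):
    """Return (center, scale) = (median, IQR) computed from training data only."""
    q1, _, q3 = quantiles(train_values, n=4, method="inclusive")
    iqr = q3 - q1
    # A constant feature has IQR 0; fall back to 1.0 to avoid dividing by zero.
    return median(train_values), (iqr if iqr != 0 else 1.0)

def transform_robust(values, center, scale):
    """Apply statistics learned on the training split to any split."""
    return [(v - center) / scale for v in values]

train = [1.0, 2.0, 3.0, 4.0, 100.0]   # toy feature with one outlier
test = [3.0, 5.0]

center, scale = fit_robust(train)
train_scaled = transform_robust(train, center, scale)
test_scaled = transform_robust(test, center, scale)
print(center, scale)
print(train_scaled)
print(test_scaled)
```

The outlier at 100.0 barely moves the median or the quartiles, which is why this scaling suits winsorized, heavy-tailed features better than mean and standard deviation would.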


## Champion Traditional Models

In [17]:
# =============================================================================
# MODELOS TRADICIONALES
# =============================================================================

if PREPROCESSING_COMPLETE:
    print("ENTRENANDO MODELOS TRADICIONALES")
    print("=" * 60)

    traditional_results = {}
    traditional_models = {}

    def train_traditional_models(X_train, X_test, y_train, y_test):
        """Entrena modelos tradicionales"""

        models_config = {
            'ridge_cv': RidgeCV(alphas=np.logspace(-3, 3, 50), cv=5),
            'svm_linear_l1': LinearSVC(C=1.0, penalty='l1', dual=False, random_state=RANDOM_STATE, max_iter=2000),
            'catboost_optimized': cb.CatBoostClassifier(
                iterations=200, depth=6, learning_rate=0.1, l2_leaf_reg=3,
                random_seed=RANDOM_STATE, verbose=False
            ),
            'extra_trees': ExtraTreesClassifier(
                n_estimators=200, max_depth=15, min_samples_split=5,
                random_state=RANDOM_STATE, n_jobs=-1
            ),
            'lightgbm': lgb.LGBMClassifier(
                random_state=RANDOM_STATE, verbose=-1
            ),
            'random_forest': RandomForestClassifier(
                n_estimators=100, max_depth=20, min_samples_split=5,
                random_state=RANDOM_STATE, n_jobs=-1
            ),
            'logistic_l2': LogisticRegression(
                penalty='l2', C=1.0, random_state=RANDOM_STATE, max_iter=1000
            ),
            'xgboost': xgb.XGBClassifier(
                random_state=RANDOM_STATE, eval_metric='logloss'
            )
        }

        results = {}
        models = {}

        for name, model in models_config.items():
            print(f"\nEntrenando {name}...")
            start_time = time.time()

            try:
                model.fit(X_train, y_train)

                # Predicciones
                if hasattr(model, 'predict_proba'):
                    pred_proba = model.predict_proba(X_test)[:, 1]
                elif hasattr(model, 'decision_function'):
                    scores = model.decision_function(X_test)
                    pred_proba = 1 / (1 + np.exp(-scores))
                else:
                    pred_proba = model.predict(X_test).astype(float)

                predictions = (pred_proba > 0.5).astype(int)

                # Métricas
                f1 = f1_score(y_test, predictions)
                accuracy = accuracy_score(y_test, predictions)
                roc_auc = roc_auc_score(y_test, pred_proba)
                precision = precision_score(y_test, predictions, zero_division=0)
                recall = recall_score(y_test, predictions, zero_division=0)
                training_time = time.time() - start_time

                results[name] = {
                    'model': model,
                    'f1_score': f1,
                    'accuracy': accuracy,
                    'roc_auc': roc_auc,
                    'precision': precision,
                    'recall': recall,
                    'training_time': training_time,
                    'predictions': predictions,
                    'probabilities': pred_proba
                }
                models[name] = model

                print(f"   F1={f1:.4f}, ACC={accuracy:.4f}, AUC={roc_auc:.4f}")
                print(f"   Tiempo: {training_time:.2f}s")

            except Exception as e:
                print(f"   Error: {e}")
                continue

        return results, models

    # Entrenar modelos tradicionales
    traditional_results, traditional_models = train_traditional_models(
        X_num_train_scaled, X_num_test_scaled, y_train, y_test
    )

    if traditional_results:
        print(f"\nRESULTADOS MODELOS TRADICIONALES:")
        sorted_traditional = sorted(traditional_results.items(),
                                   key=lambda x: x[1]['f1_score'], reverse=True)

        for i, (name, results) in enumerate(sorted_traditional, 1):
            print(f"   {i}. {name}: F1={results['f1_score']:.4f}, "
                  f"ACC={results['accuracy']:.4f}, AUC={results['roc_auc']:.4f}")

        TRADITIONAL_COMPLETE = True
        best_f1 = max([r['f1_score'] for r in traditional_results.values()])
        print(f"\nMEJOR F1-SCORE: {best_f1:.4f}")
    else:
        print("No se entrenaron modelos tradicionales")
        TRADITIONAL_COMPLETE = False
else:
    print("Datos no preparados para modelos tradicionales")
    TRADITIONAL_COMPLETE = False

ENTRENANDO MODELOS TRADICIONALES

Entrenando ridge_cv...
   F1=0.6939, ACC=0.6462, AUC=0.7072
   Tiempo: 0.48s

Entrenando svm_linear_l1...
   F1=0.6996, ACC=0.6557, AUC=0.7153
   Tiempo: 0.02s

Entrenando catboost_optimized...
   F1=0.6857, ACC=0.6368, AUC=0.7180
   Tiempo: 1.00s

Entrenando extra_trees...
   F1=0.7040, ACC=0.6509, AUC=0.7320
   Tiempo: 0.59s

Entrenando lightgbm...
   F1=0.7309, ACC=0.6840, AUC=0.7613
   Tiempo: 0.17s

Entrenando random_forest...
   F1=0.7266, ACC=0.6698, AUC=0.7429
   Tiempo: 0.47s

Entrenando logistic_l2...
   F1=0.6917, ACC=0.6509, AUC=0.7201
   Tiempo: 0.27s

Entrenando xgboost...
   F1=0.7386, ACC=0.7028, AUC=0.7566
   Tiempo: 0.53s

RESULTADOS MODELOS TRADICIONALES:
   1. xgboost: F1=0.7386, ACC=0.7028, AUC=0.7566
   2. lightgbm: F1=0.7309, ACC=0.6840, AUC=0.7613
   3. random_forest: F1=0.7266, ACC=0.6698, AUC=0.7429
   4. extra_trees: F1=0.7040, ACC=0.6509, AUC=0.7320
   5. svm_linear_l1: F1=0.6996, ACC=0.6557, AUC=0.7153
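   6. ridge_cv: F1=0.6939, ACC=0.6462, AUC=0.7072
   7. logistic_l2: F1=0.6917, ACC=0.6509, AUC=0.7201
   8. catboost_optimized: F1=0.6857, ACC=0.6368, AUC=0.7180

MEJOR F1-SCORE: 0.7386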
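
For models without `predict_proba`, the training loop passes `decision_function` scores through a logistic function. That mapping is monotone, so ROC AUC is unchanged and thresholding at 0.5 is equivalent to thresholding the raw score at 0, but the resulting values are not calibrated probabilities. A minimal sketch with made-up scores:

```python
import math

def sigmoid(score: float) -> float:
    """Numerically stable logistic function."""
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    z = math.exp(score)
    return z / (1.0 + z)

# Hypothetical decision_function outputs, sorted for readability
scores = [-2.5, -0.1, 0.0, 0.3, 4.0]
pseudo_proba = [sigmoid(s) for s in scores]

# Both thresholding rules give the same class predictions
by_proba = [int(p > 0.5) for p in pseudo_proba]
by_score = [int(s > 0) for s in scores]
print(by_proba, by_score)
```

`RidgeCV` is a regressor and exposes neither method, so in the loop above it takes the final `predict` branch and its raw regression output is what gets thresholded at 0.5.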

In [18]:
# =============================================================================
# BOOSTING MODELS - Los Reyes del Gradient Boosting
# =============================================================================

if PREPROCESSING_COMPLETE and BOOSTING_AVAILABLE:
    print("\n BOOSTING MODELS - Los Reyes")
    print("=" * 50)

    # 4. CATBOOST OPTIMIZED - "the smart one" (F1=0.7947 in an earlier analysis; this run does not reproduce it)
    print("\n 4. CATBOOST - El Inteligente segun mis analisis pasados")
    start_time = time.time()

    catboost_model = cb.CatBoostClassifier(
        iterations=1000,
        depth=6,
        learning_rate=0.1,
        l2_leaf_reg=3,
        random_seed=RANDOM_STATE,
        verbose=False,
        eval_metric='F1'
    )

    catboost_model.fit(X_num_train_scaled, y_train)
    catboost_pred = catboost_model.predict(X_num_test_scaled)
    catboost_proba = catboost_model.predict_proba(X_num_test_scaled)[:, 1]

    catboost_f1 = f1_score(y_test, catboost_pred)
    catboost_acc = accuracy_score(y_test, catboost_pred)
    catboost_auc = roc_auc_score(y_test, catboost_proba)
    catboost_time = time.time() - start_time

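    traditional_results['CatBoost_Optimized'] = {
        'model': catboost_model,
        'f1_score': catboost_f1,
        'accuracy': catboost_acc,
        'roc_auc': catboost_auc,
        # Same keys as the entries built in train_traditional_models
        'precision': precision_score(y_test, catboost_pred, zero_division=0),
        'recall': recall_score(y_test, catboost_pred, zero_division=0),
        'training_time': catboost_time,
        'predictions': catboost_pred,
        'probabilities': catboost_proba
    }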
    traditional_models['CatBoost_Optimized'] = catboost_model

    print(f"   CatBoost: F1={catboost_f1:.4f}, ACC={catboost_acc:.4f}, AUC={catboost_auc:.4f}")
    print(f"   Tiempo: {catboost_time:.2f}s")

    print("\nBOOSTING MODELS COMPLETADOS")

elif not BOOSTING_AVAILABLE:
    print("Boosting models no disponibles")


 BOOSTING MODELS - Los Reyes

 4. CATBOOST - El Inteligente segun mis analisis pasados
   CatBoost: F1=0.7016, ACC=0.6509, AUC=0.7388
   Tiempo: 3.17s

BOOSTING MODELS COMPLETADOS


In [19]:
# =============================================================================
# ENSEMBLE TRADICIONAL - Stacking de los Mejores
# =============================================================================

if PREPROCESSING_COMPLETE and len(traditional_results) >= 2:
    print("\nENSEMBLE TRADICIONAL - Parchao")
    print("=" * 45)

    # Preparar base estimators
    base_estimators = []
    # Create unique names for estimators
    estimator_names = {}
    for name, model in traditional_models.items():
        # Ensure unique names
        unique_name = name.lower()
        count = 1
        while unique_name in estimator_names:
            unique_name = f"{name.lower()}_{count}"
            count += 1
        estimator_names[unique_name] = model
        base_estimators.append((unique_name, model))


    print(f"Base estimators: {len(base_estimators)}")
    for name, _ in base_estimators:
        print(f"   - {name}")

    # Stacking Classifier
    start_time = time.time()

    stacking_model = StackingClassifier(
        estimators=base_estimators,
        final_estimator=LogisticRegression(random_state=RANDOM_STATE),
        cv=3,
        n_jobs=-1
    )

    stacking_model.fit(X_num_train_scaled, y_train)
    stacking_pred = stacking_model.predict(X_num_test_scaled)
    stacking_proba = stacking_model.predict_proba(X_num_test_scaled)[:, 1]

    stacking_f1 = f1_score(y_test, stacking_pred)
    stacking_acc = accuracy_score(y_test, stacking_pred)
    stacking_auc = roc_auc_score(y_test, stacking_proba)
    stacking_time = time.time() - start_time

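    traditional_results['Stacking_Ensemble'] = {
        'model': stacking_model,
        'f1_score': stacking_f1,
        'accuracy': stacking_acc,
        'roc_auc': stacking_auc,
        # Same keys as the entries built in train_traditional_models
        'precision': precision_score(y_test, stacking_pred, zero_division=0),
        'recall': recall_score(y_test, stacking_pred, zero_division=0),
        'training_time': stacking_time,
        'predictions': stacking_pred,
        'probabilities': stacking_proba
    }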
    traditional_models['Stacking_Ensemble'] = stacking_model


    print(f"\nStacking Ensemble: F1={stacking_f1:.4f}, ACC={stacking_acc:.4f}, AUC={stacking_auc:.4f}")
    print(f" Tiempo: {stacking_time:.2f}s")
    print(f" Base models combinados exitosamente")

    # Ranking de modelos tradicionales
    print(f"\nTOP DEFINITIVO MODELOS TRADICIONALES (por F1-Score) 1 LINK MEGAUPLOAD!:")
    sorted_traditional = sorted(traditional_results.items(),
                               key=lambda x: x[1]['f1_score'], reverse=True)

    for i, (name, results) in enumerate(sorted_traditional, 1):
        print(f"   {i}. {name}: F1={results['f1_score']:.4f}, "
              f"ACC={results['accuracy']:.4f}, AUC={results['roc_auc']:.4f}")

    TRADITIONAL_COMPLETE = True
else:
    print("No hay suficientes modelos tradicionales para construir el ensemble")
    TRADITIONAL_COMPLETE = False


ENSEMBLE TRADICIONAL - Parchao
Base estimators: 9
   - ridge_cv
   - svm_linear_l1
   - catboost_optimized
   - extra_trees
   - lightgbm
   - random_forest
   - logistic_l2
   - xgboost
   - catboost_optimized_1

Stacking Ensemble: F1=0.7251, ACC=0.6745, AUC=0.7540
 Tiempo: 27.35s
 Base models combinados exitosamente

TOP DEFINITIVO MODELOS TRADICIONALES (por F1-Score) 1 LINK MEGAUPLOAD!:
   1. xgboost: F1=0.7386, ACC=0.7028, AUC=0.7566
   2. lightgbm: F1=0.7309, ACC=0.6840, AUC=0.7613
   3. random_forest: F1=0.7266, ACC=0.6698, AUC=0.7429
   4. Stacking_Ensemble: F1=0.7251, ACC=0.6745, AUC=0.7540
   5. extra_trees: F1=0.7040, ACC=0.6509, AUC=0.7320
   6. CatBoost_Optimized: F1=0.7016, ACC=0.6509, AUC=0.7388
   7. svm_linear_l1: F1=0.6996, ACC=0.6557, AUC=0.7153
   8. ridge_cv: F1=0.6939, ACC=0.6462, AUC=0.7072
   9. logistic_l2: F1=0.6917, ACC=0.6509, AUC=0.7201
   10. catboost_optimized: F1=0.6857, ACC=0.6368, AUC=0.7180
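
With `cv=3`, `StackingClassifier` trains its final `LogisticRegression` on cross-validated predictions of the base estimators, so the meta-learner never sees a prediction made on a sample the base model was fitted on. The sketch below illustrates that out-of-fold idea in pure Python. It is a simplification: it uses contiguous, unshuffled folds (scikit-learn stratifies the folds for classifiers), and `kfold_indices`, `out_of_fold_predictions` and the toy `mean_label_model` are hypothetical names.

```python
from statistics import mean

def kfold_indices(n_samples, n_splits):
    """Yield (train_idx, test_idx) for contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        stop = start + base + (1 if fold < extra else 0)
        test_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, test_idx
        start = stop

def out_of_fold_predictions(fit_predict, X, y, n_splits=3):
    """Predict every sample with a model that never saw it during fitting."""
    oof = [None] * len(X)
    for train_idx, test_idx in kfold_indices(len(X), n_splits):
        preds = fit_predict(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
        )
        for i, p in zip(test_idx, preds):
            oof[i] = p
    return oof

def mean_label_model(X_train, y_train, X_test):
    """Toy base learner: always predicts the training-set positive rate."""
    rate = mean(y_train)
    return [rate for _ in X_test]

X = list(range(6))
y = [0, 0, 1, 1, 1, 1]
meta_feature = out_of_fold_predictions(mean_label_model, X, y, n_splits=3)
print(meta_feature)
```

Each base estimator contributes one such column to the meta-learner's training matrix. Two of the nine columns here come from near-identical CatBoost configurations (`catboost_optimized` and `catboost_optimized_1`), so they are likely to be highly correlated.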


## Deep Neural Networks

In [20]:
# =============================================================================
# DEEP NEURAL NETWORKS
# =============================================================================

if PREPROCESSING_COMPLETE and TENSORFLOW_AVAILABLE:
    print("DEEP NEURAL NETWORKS")
    print("=" * 40)

    dnn_results = {}

    def create_dnn_model(input_dim, architecture_name="optimized"):
        """Crea arquitecturas DNN optimizadas"""

        if architecture_name == "optimized":
            model = Sequential([
                Input(shape=(input_dim,)),
                Dense(512, activation='relu'),
                BatchNormalization(),
                Dropout(0.3),

                Dense(256, activation='relu'),
                BatchNormalization(),
                Dropout(0.3),

                Dense(128, activation='relu'),
                BatchNormalization(),
                Dropout(0.2),

                Dense(64, activation='relu'),
                Dropout(0.2),

                Dense(1, activation='sigmoid')
            ])

        elif architecture_name == "deep":
            model = Sequential([
                Input(shape=(input_dim,)),
                Dense(1024, activation='relu'),
                BatchNormalization(),
                Dropout(0.4),

                Dense(512, activation='relu'),
                BatchNormalization(),
                Dropout(0.4),

                Dense(256, activation='relu'),
                BatchNormalization(),
                Dropout(0.3),

                Dense(128, activation='relu'),
                BatchNormalization(),
                Dropout(0.3),

                Dense(64, activation='relu'),
                Dropout(0.2),

                Dense(32, activation='relu'),
                Dropout(0.2),

                Dense(1, activation='sigmoid')
            ])

        elif architecture_name == "wide":
            model = Sequential([
                Input(shape=(input_dim,)),
                Dense(1024, activation='relu'),
                BatchNormalization(),
                Dropout(0.3),

                Dense(1024, activation='relu'),
                BatchNormalization(),
                Dropout(0.3),

                Dense(512, activation='relu'),
                BatchNormalization(),
                Dropout(0.2),

                Dense(1, activation='sigmoid')
            ])

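        else:
            # Fail with a clear message instead of an unbound `model` variable
            raise ValueError(f"Unknown architecture: {architecture_name!r}")

        return model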

    def train_dnn_model(model, model_name, X_train, X_test, y_train, y_test):
        """Entrena un modelo DNN"""

        print(f"\nEntrenando DNN {model_name}...")
        start_time = time.time()

        # Compilar modelo
        model.compile(
            optimizer=Adam(learning_rate=0.001),
            loss='binary_crossentropy',
            metrics=['accuracy']
        )

        # Callbacks
        early_stopping = EarlyStopping(
            monitor='val_loss', patience=10, restore_best_weights=True
        )
        reduce_lr = ReduceLROnPlateau(
            monitor='val_loss', factor=0.5, patience=5, min_lr=1e-6
        )

        # Entrenar
        history = model.fit(
            X_train, y_train,
            batch_size=32,
            epochs=100,
            validation_split=0.2,
            callbacks=[early_stopping, reduce_lr],
            verbose=0
        )

        # Predicciones
        pred_proba = model.predict(X_test, verbose=0).flatten()
        predictions = (pred_proba > 0.5).astype(int)

        # Métricas
        f1 = f1_score(y_test, predictions)
        accuracy = accuracy_score(y_test, predictions)
        roc_auc = roc_auc_score(y_test, pred_proba)
        precision = precision_score(y_test, predictions, zero_division=0)
        recall = recall_score(y_test, predictions, zero_division=0)
        training_time = time.time() - start_time

        epochs_trained = len(history.history['loss'])

        print(f"   {model_name}: F1={f1:.4f}, ACC={accuracy:.4f}, AUC={roc_auc:.4f}")
        print(f"   Epochs: {epochs_trained}, Tiempo: {training_time:.2f}s")

        return {
            'model': model,
            'f1_score': f1,
            'accuracy': accuracy,
            'roc_auc': roc_auc,
            'precision': precision,
            'recall': recall,
            'training_time': training_time,
            'predictions': predictions,
            'probabilities': pred_proba,
            'history': history
        }

    # Entrenar múltiples arquitecturas DNN
    input_dim = X_num_train_scaled.shape[1]

    architectures = ['optimized', 'deep', 'wide']

    for arch in architectures:
        try:
            model = create_dnn_model(input_dim, arch)
            result = train_dnn_model(
                model, f"DNN_{arch}",
                X_num_train_scaled, X_num_test_scaled,
                y_train, y_test
            )
            dnn_results[f"DNN_{arch}"] = result

        except Exception as e:
            print(f"Error entrenando DNN {arch}: {e}")
            continue

    # Resultados DNN
    if dnn_results:
        print(f"\nRESULTADOS DEEP NEURAL NETWORKS:")
        sorted_dnn = sorted(dnn_results.items(),
                           key=lambda x: x[1]['f1_score'], reverse=True)

        for i, (name, results) in enumerate(sorted_dnn, 1):
            print(f"   {i}. {name}: F1={results['f1_score']:.4f}, "
                  f"ACC={results['accuracy']:.4f}, AUC={results['roc_auc']:.4f}")

        DNN_COMPLETE = True
        best_dnn_f1 = max([r['f1_score'] for r in dnn_results.values()])
        print(f"\nMEJOR DNN F1-SCORE: {best_dnn_f1:.4f}")
    else:
        print("No se entrenaron modelos DNN")
        DNN_COMPLETE = False

elif not TENSORFLOW_AVAILABLE:
    print("TensorFlow no disponible")
    dnn_results = {}
    DNN_COMPLETE = False
else:
    print("Datos no preparados para DNN")
    dnn_results = {}
    DNN_COMPLETE = False

DEEP NEURAL NETWORKS

Entrenando DNN DNN_optimized...
   DNN_optimized: F1=0.7007, ACC=0.6132, AUC=0.6745
   Epochs: 15, Tiempo: 16.30s

Entrenando DNN DNN_deep...
   DNN_deep: F1=0.7200, ACC=0.6698, AUC=0.6923
   Epochs: 24, Tiempo: 18.69s

Entrenando DNN DNN_wide...
   DNN_wide: F1=0.6943, ACC=0.6179, AUC=0.6893
   Epochs: 17, Tiempo: 12.26s

RESULTADOS DEEP NEURAL NETWORKS:
   1. DNN_deep: F1=0.7200, ACC=0.6698, AUC=0.6923
   2. DNN_optimized: F1=0.7007, ACC=0.6132, AUC=0.6745
   3. DNN_wide: F1=0.6943, ACC=0.6179, AUC=0.6893

MEJOR DNN F1-SCORE: 0.7200
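
The three DNNs above are all evaluated at a fixed 0.5 cutoff (`pred_proba > 0.5`). One variation that could be tried is to pick the cutoff on a held-out validation split rather than fixing it in advance. The sketch below only illustrates that idea: the helpers `f1_at_threshold` and `best_threshold` and the toy arrays are hypothetical and not part of this notebook, and the code is plain Python so it can run on its own.

```python
def f1_at_threshold(y_true, proba, threshold):
    """F1 of the positive class when predicting 1 if proba > threshold."""
    tp = fp = fn = 0
    for y, p in zip(y_true, proba):
        pred = 1 if p > threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(y_val, proba_val, candidates=None):
    """Return the candidate cutoff with the highest validation F1."""
    if candidates is None:
        candidates = [i / 100 for i in range(5, 96, 5)]  # 0.05, 0.10, ..., 0.95
    return max(candidates, key=lambda t: f1_at_threshold(y_val, proba_val, t))


# Toy validation labels and probabilities (made up for illustration)
y_val = [0, 0, 1, 1, 1, 0, 1, 0]
proba_val = [0.30, 0.45, 0.48, 0.62, 0.55, 0.52, 0.40, 0.20]

t = best_threshold(y_val, proba_val)
print(f"cutoff={t:.2f}  F1={f1_at_threshold(y_val, proba_val, t):.4f}")
print(f"cutoff=0.50  F1={f1_at_threshold(y_val, proba_val, 0.5):.4f}")
```

The chosen cutoff would then be applied once to the test probabilities. Tuning it directly on `y_test` would make the reported test F1 optimistic.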


## BERT/Transformers - The NLP Champions (I hate it)

In [21]:
# =============================================================================
# BERT/TRANSFORMERS
# =============================================================================

if PREPROCESSING_COMPLETE and TRANSFORMERS_AVAILABLE:
      print("BERT/TRANSFORMERS")
      print("=" * 50)

      bert_results = {}

      def train_bert_model(model_name, model_checkpoint, X_text_train, X_text_test, y_train, y_test):
          """Entrena un modelo BERT"""

          print(f"\nEntrenando {model_name}...")
          start_time = time.time()

          max_length = 128
          batch_size = 16
          num_epochs = 3

          try:
              # Tokenizer y modelo
              tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
              model = AutoModelForSequenceClassification.from_pretrained(
                  model_checkpoint, num_labels=2
              ).to(device)

              # Tokenización
              def tokenize_texts(texts):
                  return tokenizer(
                      texts.tolist(),
                      padding=True,
                      truncation=True,
                      max_length=max_length,
                      return_tensors='pt'
                  )

              train_encodings = tokenize_texts(X_text_train)
              test_encodings = tokenize_texts(X_text_test)

              # Dataset
              class NewsDataset(torch.utils.data.Dataset):
                  def __init__(self, encodings, labels):
                      self.encodings = encodings
                      self.labels = labels

                  def __getitem__(self, idx):
                      item = {key: val[idx] for key, val in self.encodings.items()}
                      item['labels'] = torch.tensor(self.labels[idx], dtype=torch.long)
                      return item

                  def __len__(self):
                      return len(self.labels)

              train_dataset = NewsDataset(train_encodings, y_train)
              test_dataset = NewsDataset(test_encodings, y_test)

              # Training args
              training_args = TrainingArguments(
                  output_dir=f'./results_{model_name.lower()}',
                  num_train_epochs=num_epochs,
                  per_device_train_batch_size=batch_size,
                  per_device_eval_batch_size=batch_size,
                  warmup_steps=100,
                  weight_decay=0.01,
                  learning_rate=2e-5,
                  logging_steps=50,
                  eval_strategy="no",  # evaluation during training disabled (it kept raising errors)
                  save_strategy="no",  # checkpointing disabled for the same reason
                  load_best_model_at_end=False,
                  seed=RANDOM_STATE,
                  report_to=[],
                  logging_dir=None,  # no logging directory
                  disable_tqdm=False  # keep the progress bars
              )

              # Trainer
              trainer = Trainer(
                  model=model,
                  args=training_args,
                  train_dataset=train_dataset,
                  eval_dataset=test_dataset,
                  tokenizer=tokenizer
              )

              # Entrenar
              trainer.train()

              # Evaluar
              predictions = trainer.predict(test_dataset)
              logits = predictions.predictions

              probs = torch.nn.functional.softmax(torch.tensor(logits), dim=-1)
              y_pred_proba = probs[:, 1].numpy()
              y_pred = (y_pred_proba > 0.5).astype(int)

              # Métricas
              f1 = f1_score(y_test, y_pred)
              accuracy = accuracy_score(y_test, y_pred)
              roc_auc = roc_auc_score(y_test, y_pred_proba)
              precision = precision_score(y_test, y_pred, zero_division=0)
              recall = recall_score(y_test, y_pred, zero_division=0)
              training_time = time.time() - start_time

              print(f"   {model_name}: F1={f1:.4f}, ACC={accuracy:.4f}, AUC={roc_auc:.4f}")
              print(f"   Tiempo: {training_time:.2f}s")

              # Free GPU memory; the Trainer also holds a reference to the model
              del trainer, model
              torch.cuda.empty_cache()

              return {
                  'f1_score': f1,
                  'accuracy': accuracy,
                  'roc_auc': roc_auc,
                  'precision': precision,
                  'recall': recall,
                  'training_time': training_time,
                  'predictions': y_pred,
                  'probabilities': y_pred_proba
              }

          except Exception as e:
              print(f"   Error entrenando {model_name}: {e}")
              return None

      # Modelos BERT a entrenar
      bert_configs = [
          ('RoBERTa', 'roberta-base'),
          ('DistilBERT', 'distilbert-base-uncased'),
          ('BERT_base', 'bert-base-uncased')
      ]

      # Entrenar modelos BERT
      for model_name, model_checkpoint in bert_configs:
          result = train_bert_model(
              model_name, model_checkpoint,
              X_text_train, X_text_test, y_train, y_test
          )

          if result:
              bert_results[model_name] = result

      # Resultados BERT
      if bert_results:
          print(f"\nRESULTADOS BERT:")
          sorted_bert = sorted(bert_results.items(),
                              key=lambda x: x[1]['f1_score'], reverse=True)

          for i, (name, results) in enumerate(sorted_bert, 1):
              print(f"   {i}. {name}: F1={results['f1_score']:.4f}, "
                    f"ACC={results['accuracy']:.4f}, AUC={results['roc_auc']:.4f}")

          BERT_COMPLETE = True
          best_bert_f1 = max([r['f1_score'] for r in bert_results.values()])
          print(f"\nMEJOR BERT F1-SCORE: {best_bert_f1:.4f}")
      else:
          print("No se entrenaron modelos BERT")
          BERT_COMPLETE = False

elif not TRANSFORMERS_AVAILABLE:
      print("Transformers no disponible")
      bert_results = {}
      BERT_COMPLETE = False

else:
      print("Datos no preparados para BERT")
      bert_results = {}
      BERT_COMPLETE = False

BERT/TRANSFORMERS

Entrenando RoBERTa...


Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Step,Training Loss
50,0.6777
100,0.4573
150,0.3289


   RoBERTa: F1=0.8889, ACC=0.8774, AUC=0.9553
   Tiempo: 45.77s

Entrenando DistilBERT...


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Step,Training Loss
50,0.6698
100,0.409
150,0.2502


   DistilBERT: F1=0.8734, ACC=0.8632, AUC=0.9407
   Tiempo: 22.55s

Entrenando BERT_base...


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Step,Training Loss
50,0.5818
100,0.3705
150,0.2432


   BERT_base: F1=0.8870, ACC=0.8774, AUC=0.9458
   Tiempo: 41.44s

RESULTADOS BERT:
   1. RoBERTa: F1=0.8889, ACC=0.8774, AUC=0.9553
   2. BERT_base: F1=0.8870, ACC=0.8774, AUC=0.9458
   3. DistilBERT: F1=0.8734, ACC=0.8632, AUC=0.9407

MEJOR BERT F1-SCORE: 0.8889
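
The evaluation step in the cell above turns the two-column logits returned by `trainer.predict` into class probabilities with a softmax, keeps the positive-class column, and thresholds it at 0.5. A minimal sketch of that conversion in plain Python follows; `softmax_row` and the toy logits are hypothetical names and values used only for illustration.

```python
import math


def softmax_row(logits):
    """Numerically stable softmax over one row of logits."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# One row per sample, columns = [logit_class_0, logit_class_1] (made-up values)
toy_logits = [[2.0, -1.0], [-0.5, 0.5], [0.0, 0.0]]

proba_pos = [softmax_row(row)[1] for row in toy_logits]  # P(class 1)
preds = [int(p > 0.5) for p in proba_pos]                # same rule as the notebook

for row, p, y_hat in zip(toy_logits, proba_pos, preds):
    print(row, f"P(1)={p:.3f}", "pred =", y_hat)
```

With two classes, the positive-class softmax equals a sigmoid of the logit difference, so a sample with equal logits lands exactly on 0.5 and is assigned class 0 by the strict `>` comparison.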


## Hybrid RAG System

In [22]:
# =============================================================================
# HYBRID RAG SYSTEM - Retrieval + Classification
# =============================================================================

if PREPROCESSING_COMPLETE and RAG_AVAILABLE:
    print("SISTEMA RAG HÍBRIDO (RICOCHE)")
    print("=" * 35)

    rag_results = {}

    try:
        print("Inicializando...")

        # 1. Sentence Transformer para embeddings
        embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
        print("Sentence Transformer cargado")

        # 2. TF-IDF para features tradicionales
        tfidf = TfidfVectorizer(max_features=5000, stop_words='english')
        X_tfidf_train = tfidf.fit_transform(X_text_train)
        X_tfidf_test = tfidf.transform(X_text_test)
        print(f"TF-IDF: {X_tfidf_train.shape[1]} features (mejor que bert y eso, gas bert)")

        # 3. Crear embeddings para todo el corpus de entrenamiento
        print("Generando embeddings del corpus...")
        train_embeddings = embedding_model.encode(X_text_train.tolist(), show_progress_bar=False)

        # 4. FAISS index para búsqueda rápida
        dimension = train_embeddings.shape[1]
        index = faiss.IndexFlatL2(dimension)
        index.add(train_embeddings.astype('float32'))
        print(f"FAISS index creado: {index.ntotal} vectores, dim={dimension}")

        # 5. Función RAG para generar features aumentadas
        def generate_rag_features(query_texts, k=5):
            """Genera features RAG para textos de consulta"""
            query_embeddings = embedding_model.encode(query_texts.tolist(), show_progress_bar=False)

            # Búsqueda de k vecinos más cercanos
            distances, indices = index.search(query_embeddings.astype('float32'), k)

            rag_features = []

            for i, (query_distances, query_indices) in enumerate(zip(distances, indices)):
                # Features basadas en distancias
                avg_distance = np.mean(query_distances)
                min_distance = np.min(query_distances)
                std_distance = np.std(query_distances)

                # Features basadas en labels de vecinos
                neighbor_labels = y_train[query_indices]
                label_consistency = np.mean(neighbor_labels)
                label_variance = np.var(neighbor_labels)

                # Features de similaridad semántica
                similarities = 1 / (1 + query_distances)  # Convertir distancias a similaridades
                weighted_label = np.average(neighbor_labels, weights=similarities)

                rag_features.append([
                    avg_distance, min_distance, std_distance,
                    label_consistency, label_variance, weighted_label
                ])

            return np.array(rag_features)

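        # 6. Generate RAG features for train and test
        # NOTE: the FAISS index holds the training embeddings themselves, so each
        # training text typically retrieves itself at distance 0 and its own label
        # enters label_consistency / weighted_label. The train features are therefore
        # more optimistic than the test features; dropping the self-match for
        # training queries would avoid this.
        print("Generando features RAG...")
        X_rag_train = generate_rag_features(X_text_train, k=5)
        X_rag_test = generate_rag_features(X_text_test, k=5)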

        print(f"Features RAG: {X_rag_train.shape[1]} características")

        # 7. MODELO HÍBRIDO: TF-IDF + RAG + Random Forest
        print("\nMODELO HÍBRIDO RAG + RF")
        start_time = time.time()

        # Combinar features: TF-IDF + RAG + Numéricas
        from scipy.sparse import hstack, csr_matrix

        # Convertir features numéricas y RAG a sparse para combinar
        X_combined_train = hstack([
            X_tfidf_train,
            csr_matrix(X_num_train_scaled),
            csr_matrix(X_rag_train)
        ])

        X_combined_test = hstack([
            X_tfidf_test,
            csr_matrix(X_num_test_scaled),
            csr_matrix(X_rag_test)
        ])

        print(f" Features combinadas: {X_combined_train.shape[1]}")
        print(f"   - TF-IDF: {X_tfidf_train.shape[1]}")
        print(f"   - Numéricas: {X_num_train_scaled.shape[1]}")
        print(f"   - RAG: {X_rag_train.shape[1]}")

        # Random Forest para clasificación final
        rag_rf_model = RandomForestClassifier(
            n_estimators=200,
            max_depth=20,
            min_samples_split=5,
            min_samples_leaf=2,
            random_state=RANDOM_STATE,
            n_jobs=-1
        )

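        # Convert to dense arrays. RandomForestClassifier also accepts scipy sparse
        # input, so this step is optional; it is kept because the matrix is small.
        print("Convirtiendo a arrays densos...")
        X_combined_train_dense = X_combined_train.toarray()
        X_combined_test_dense = X_combined_test.toarray()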

        # Entrenar modelo
        rag_rf_model.fit(X_combined_train_dense, y_train)

        # Predicciones
        rag_pred = rag_rf_model.predict(X_combined_test_dense)
        rag_proba = rag_rf_model.predict_proba(X_combined_test_dense)[:, 1]

        # Métricas
        rag_f1 = f1_score(y_test, rag_pred)
        rag_acc = accuracy_score(y_test, rag_pred)
        rag_auc = roc_auc_score(y_test, rag_proba)
        rag_time = time.time() - start_time

        rag_results['RAG_Hybrid_RF'] = {
            'model': rag_rf_model,
            'tfidf': tfidf,
            'embedding_model': embedding_model,
            'faiss_index': index,
            'f1_score': rag_f1,
            'accuracy': rag_acc,
            'roc_auc': rag_auc,
            'training_time': rag_time,
            'predictions': rag_pred,
            'probabilities': rag_proba,
            'feature_importance': rag_rf_model.feature_importances_
        }

        print(f"\nRAG Híbrido: F1={rag_f1:.4f}, ACC={rag_acc:.4f}, AUC={rag_auc:.4f}")
        print(f" Tiempo total: {rag_time:.2f}s")
        print(f" Sistema RAG implementado exitosamente")

        RAG_COMPLETE = True

    except Exception as e:
        print(f"Error en sistema RAG: {e}")
        RAG_COMPLETE = False
        rag_results = {}

elif not RAG_AVAILABLE:
    print("RAG components no disponibles")
    RAG_COMPLETE = False
    rag_results = {}
else:
    print("Datos no listos para RAG")
    RAG_COMPLETE = False
    rag_results = {}

SISTEMA RAG HÍBRIDO (RICOCHE)
Inicializando...


Sentence Transformer cargado
TF-IDF: 3119 features (mejor que bert y eso, gas bert)
Generando embeddings del corpus...
FAISS index creado: 846 vectores, dim=384
Generando features RAG...
Features RAG: 6 características

MODELO HÍBRIDO RAG + RF
 Features combinadas: 3182
   - TF-IDF: 3119
   - Numéricas: 57
   - RAG: 6
Convirtiendo a arrays densos...

RAG Híbrido: F1=0.8621, ACC=0.8491, AUC=0.9247
 Tiempo total: 0.78s
 Sistema RAG implementado exitosamente
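
The retrieval features in this cell come from the k nearest training neighbours of each text: their distances and their labels. Because the index contains the training set itself, a training query finds itself first. The sketch below shows, on toy 2-D vectors, how the label features change when the self-match is dropped. It is a brute-force stand-in written in plain Python, not the FAISS code used above; `squared_l2`, `knn_label_features`, the `exclude_index` parameter and the toy data are all hypothetical.

```python
def squared_l2(a, b):
    """Squared Euclidean distance (the metric an exact L2 index ranks by)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_label_features(query, corpus, labels, k, exclude_index=None):
    """Distance and label features from the k nearest corpus vectors."""
    scored = sorted(
        (squared_l2(query, vec), i)
        for i, vec in enumerate(corpus)
        if i != exclude_index  # skip the query's own row when requested
    )
    nearest = scored[:k]
    dists = [d for d, _ in nearest]
    neigh_labels = [labels[i] for _, i in nearest]
    sims = [1.0 / (1.0 + d) for d in dists]  # same distance-to-similarity rule
    weighted = sum(s * y for s, y in zip(sims, neigh_labels)) / sum(sims)
    return {
        "min_distance": dists[0],
        "label_mean": sum(neigh_labels) / len(neigh_labels),
        "weighted_label": weighted,
    }


# Toy corpus: row 0 has label 1 but sits next to two label-0 rows
corpus = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [1.1, 1.0]]
labels = [1, 0, 0, 1, 1]

leaky = knn_label_features(corpus[0], corpus, labels, k=2)
clean = knn_label_features(corpus[0], corpus, labels, k=2, exclude_index=0)
print("with self-match:   ", leaky)
print("without self-match:", clean)
```

With the self-match included, the row's own label is part of the neighbour vote; without it, the features reflect only the other training texts, which is the situation every test query is in.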


## 🚀 THE REAL SLIM ENSEMBLE - The Definitive System
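
The meta-ensemble in this section combines per-level probabilities in two main ways: an average weighted by each level's F1 score, and a per-sample "dynamic selection" that picks the level maximising `f1 * (1 + |p - 0.5|)`. The sketch below reproduces both rules on made-up numbers in plain Python. The function names, level names and scores are hypothetical, and the scores are assumed to come from a validation split, since weights taken from test labels make the final test metrics optimistic.

```python
def weighted_average_proba(level_probs, level_scores):
    """Average the levels' probabilities, weighting each level by its score."""
    total = sum(level_scores[name] for name in level_probs)
    if total == 0:
        raise ValueError("all level scores are zero; weights are undefined")
    n_samples = len(next(iter(level_probs.values())))
    combined = [0.0] * n_samples
    for name, probs in level_probs.items():
        weight = level_scores[name] / total
        for i, p in enumerate(probs):
            combined[i] += weight * p
    return combined


def dynamic_selection_proba(level_probs, level_scores):
    """For each sample, keep the probability of the highest-scoring level,
    where score = level quality * (1 + distance of the probability from 0.5)."""
    names = list(level_probs)
    n_samples = len(level_probs[names[0]])
    selected = []
    for i in range(n_samples):
        best = max(
            names,
            key=lambda nm: level_scores[nm] * (1 + abs(level_probs[nm][i] - 0.5)),
        )
        selected.append(level_probs[best][i])
    return selected


# Two samples, three levels (all values invented for illustration)
level_probs = {
    "traditional": [0.60, 0.40],
    "bert": [0.90, 0.45],
    "dnn": [0.55, 0.05],
}
level_scores = {"traditional": 0.70, "bert": 0.90, "dnn": 0.70}

weighted = weighted_average_proba(level_probs, level_scores)
dynamic = dynamic_selection_proba(level_probs, level_scores)
print("weighted:", [round(p, 4) for p in weighted])
print("dynamic: ", dynamic)
```

In the second sample the dynamic rule follows the weaker but very confident level, which illustrates that this strategy rewards confidence and not only level quality.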

In [23]:
# =============================================================================
# MEGA ENSEMBLE DE ENSEMBLES - El Sistema Supremo
# =============================================================================

print("MEGA ENSEMBLE DE ENSEMBLES - Sistema Supremo")
print("=" * 70)

def create_level_1_traditional_ensemble():
    """Nivel 1: Ensemble de mejores modelos tradicionales"""

    if 'traditional_results' not in globals() or not traditional_results:
        return None

    print("NIVEL 1: Ensemble de Modelos Tradicionales")

    # Seleccionar mejores modelos tradicionales
    sorted_traditional = sorted(traditional_results.items(),
                               key=lambda x: x[1]['f1_score'], reverse=True)

    top_traditional = dict(sorted_traditional[:5])  # Top 5

    # Stacking ensemble: (name, estimator) pairs for the top models
    estimators = []
    for name, data in top_traditional.items():
        estimators.append((name.lower(), data['model']))

    if not estimators:
        print("   No hay suficientes modelos tradicionales para el Ensemble de Nivel 1")
        return None

    stacking_l1 = StackingClassifier(
        estimators=estimators,
        final_estimator=LogisticRegression(random_state=RANDOM_STATE),
        cv=3,
        n_jobs=-1
    )

    stacking_l1.fit(X_num_train_scaled, y_train)
    pred_l1 = stacking_l1.predict(X_num_test_scaled)
    proba_l1 = stacking_l1.predict_proba(X_num_test_scaled)[:, 1]

    f1_l1 = f1_score(y_test, pred_l1)

    print(f"   Nivel 1 - Traditional Ensemble: F1={f1_l1:.4f}")

    return {
        'model': stacking_l1,
        'predictions': pred_l1,
        'probabilities': proba_l1,
        'f1_score': f1_l1,
        'accuracy': accuracy_score(y_test, pred_l1),
        'roc_auc': roc_auc_score(y_test, proba_l1),
        'type': 'level_1_traditional',
        'model_obj': stacking_l1 # Add model object for CV analysis
    }

def create_level_2_bert_ensemble():
    """Nivel 2: Ensemble de modelos BERT"""

    if 'bert_results' not in globals() or not bert_results:
        print("NIVEL 2: Modelos BERT no disponibles")
        return None

    print("NIVEL 2: Ensemble de Modelos BERT")

    # Consenso ponderado de BERT
    total_weight = sum(data['f1_score'] for data in bert_results.values())
    final_probs_bert = np.zeros(len(y_test))

    if total_weight == 0:
        print("   No hay resultados BERT válidos para crear el ensemble")
        return None

    for name, data in bert_results.items():
        weight = data['f1_score'] / total_weight
        contribution = weight * data['probabilities']
        final_probs_bert += contribution
        print(f"   {name}: peso={weight:.3f}, F1={data['f1_score']:.4f}")

    pred_l2 = (final_probs_bert > 0.5).astype(int)
    f1_l2 = f1_score(y_test, pred_l2)

    print(f"   Nivel 2 - BERT Ensemble: F1={f1_l2:.4f}")

    return {
        'predictions': pred_l2,
        'probabilities': final_probs_bert,
        'f1_score': f1_l2,
        'accuracy': accuracy_score(y_test, pred_l2),
        'roc_auc': roc_auc_score(y_test, final_probs_bert),
        'type': 'level_2_bert',
        'model_obj': None # No single model object for consensus
    }

def create_level_3_dnn_ensemble():
    """Nivel 3: Ensemble de modelos DNN"""

    if 'dnn_results' not in globals() or not dnn_results:
        print("NIVEL 3: Modelos DNN no disponibles")
        return None

    print("NIVEL 3: Ensemble de Modelos DNN")

    # Promedio ponderado de DNNs
    total_weight = sum(data['f1_score'] for data in dnn_results.values())
    final_probs_dnn = np.zeros(len(y_test))

    if total_weight == 0:
        print("   No hay resultados DNN válidos para crear el ensemble")
        return None


    for name, data in dnn_results.items():
        weight = data['f1_score'] / total_weight
        contribution = weight * data['probabilities']
        final_probs_dnn += contribution
        print(f"   {name}: peso={weight:.3f}, F1={data['f1_score']:.4f}")

    pred_l3 = (final_probs_dnn > 0.5).astype(int)
    f1_l3 = f1_score(y_test, pred_l3)

    print(f"   Nivel 3 - DNN Ensemble: F1={f1_l3:.4f}")

    return {
        'predictions': pred_l3,
        'probabilities': final_probs_dnn,
        'f1_score': f1_l3,
        'accuracy': accuracy_score(y_test, pred_l3),
        'roc_auc': roc_auc_score(y_test, final_probs_dnn),
        'type': 'level_3_dnn',
        'model_obj': None # No single model object for consensus
    }

def create_level_4_meta_ensemble(level_1, level_2, level_3):
    """Nivel 4: Meta-ensemble que combina los 3 niveles anteriores"""

    available_levels = []
    level_data = {}

    if level_1:
        available_levels.append('Level_1_Traditional')
        level_data['Level_1_Traditional'] = level_1

    if level_2:
        available_levels.append('Level_2_BERT')
        level_data['Level_2_BERT'] = level_2

    if level_3:
        available_levels.append('Level_3_DNN')
        level_data['Level_3_DNN'] = level_3

    if len(available_levels) < 1: # Changed from 2 to 1 to allow meta-ensemble with only one level
        print("No hay suficientes niveles para meta-ensemble")
        return None

    print(f"NIVEL 4: Meta-Ensemble Supremo ({len(available_levels)} niveles)")

    strategies = {}

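    # 1. Performance-weighted consensus
    # NOTE: the weights are F1 scores measured on y_test, so the F1 reported
    # for this strategy on the same test set is optimistic.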
    print("   Estrategia 1: Consenso Ponderado")
    total_weight = sum(data['f1_score'] for data in level_data.values())
    final_probs_weighted = np.zeros(len(y_test))

    if total_weight > 0:
        for name, data in level_data.items():
            weight = data['f1_score'] / total_weight
            contribution = weight * data['probabilities']
            final_probs_weighted += contribution
            print(f"      {name}: peso={weight:.3f}, F1={data['f1_score']:.4f}")

        pred_weighted = (final_probs_weighted > 0.5).astype(int)
        f1_weighted = f1_score(y_test, pred_weighted)

        strategies['Weighted_Meta'] = {
            'predictions': pred_weighted,
            'probabilities': final_probs_weighted,
            'f1_score': f1_weighted,
            'accuracy': accuracy_score(y_test, pred_weighted),
            'roc_auc': roc_auc_score(y_test, final_probs_weighted),
            'strategy': 'weighted_meta',
            'model_obj': None
        }

        print(f"      Resultado: F1={f1_weighted:.4f}")
    else:
        print("      No hay pesos válidos para Consenso Ponderado")


    # 2. Selección dinámica del mejor nivel por muestra (Requires at least 2 levels)
    if len(available_levels) >= 2:
        print("   Estrategia 2: Selección Dinámica")

        final_probs_dynamic = np.zeros(len(y_test))
        level_selections = []

        for i in range(len(y_test)):
            level_scores = {}

            for name, data in level_data.items():
                # Score = F1 del nivel * confianza en esta predicción
                confidence = abs(data['probabilities'][i] - 0.5)
                level_quality = data['f1_score']
                combined_score = level_quality * (1 + confidence)
                level_scores[name] = combined_score

            # Seleccionar mejor nivel para esta muestra
            best_level = max(level_scores.keys(), key=lambda x: level_scores[x])
            final_probs_dynamic[i] = level_data[best_level]['probabilities'][i]
            level_selections.append(best_level)

        pred_dynamic = (final_probs_dynamic > 0.5).astype(int)
        f1_dynamic = f1_score(y_test, pred_dynamic)

        # Estadísticas de selección
        from collections import Counter
        selection_stats = Counter(level_selections)
        for level, count in selection_stats.items():
            percentage = (count / len(y_test)) * 100
            print(f"      {level}: {count} ({percentage:.1f}%)")

        strategies['Dynamic_Meta'] = {
            'predictions': pred_dynamic,
            'probabilities': final_probs_dynamic,
            'f1_score': f1_dynamic,
            'accuracy': accuracy_score(y_test, pred_dynamic),
            'roc_auc': roc_auc_score(y_test, final_probs_dynamic),
            'strategy': 'dynamic_meta',
            'selections': level_selections,
            'model_obj': None
        }

        print(f"      Resultado: F1={f1_dynamic:.4f}")
    else:
         print("      Estrategia 2 (Selección Dinámica) requiere al menos 2 niveles")


    # 3. Voting ensemble de niveles
    print("   Estrategia 3: Voting Ensemble")

    # Promedio simple de probabilidades
    final_probs_voting = np.mean([data['probabilities'] for data in level_data.values()], axis=0)
    pred_voting = (final_probs_voting > 0.5).astype(int)
    f1_voting = f1_score(y_test, pred_voting)

    strategies['Voting_Meta'] = {
        'predictions': pred_voting,
        'probabilities': final_probs_voting,
        'f1_score': f1_voting,
        'accuracy': accuracy_score(y_test, pred_voting),
        'roc_auc': roc_auc_score(y_test, final_probs_voting),
        'strategy': 'voting_meta',
        'model_obj': None
    }

    print(f"      Resultado: F1={f1_voting:.4f}")

    return strategies

def create_ultimate_ensemble():
    """Crea el ensemble supremo de todos los niveles"""

    print("\nCREANDO MEGA ENSEMBLE DE ENSEMBLES...")
    print("=" * 50)

    # Crear ensembles por niveles
    level_1 = create_level_1_traditional_ensemble()
    level_2 = create_level_2_bert_ensemble() # This will return None if BERT failed
    level_3 = create_level_3_dnn_ensemble()

    # Meta-ensemble de niveles
    meta_strategies = create_level_4_meta_ensemble(level_1, level_2, level_3)

    # Recopilar todos los resultados
    all_ensemble_results = {}

    if level_1:
        all_ensemble_results['Level_1_Traditional'] = level_1
    if level_2:
        all_ensemble_results['Level_2_BERT'] = level_2
    if level_3:
        all_ensemble_results['Level_3_DNN'] = level_3

    if meta_strategies:
        all_ensemble_results.update(meta_strategies)

    return all_ensemble_results

# EJECUTAR MEGA ENSEMBLE SUPREMO
# Check if any of the required results dictionaries exist and are not empty
if (('traditional_results' in globals() and traditional_results) or
    ('bert_results' in globals() and bert_results) or
    ('dnn_results' in globals() and dnn_results)):

    mega_ensemble_results = create_ultimate_ensemble()

    if mega_ensemble_results:
        print(f"\n" + "="*80)
        print(f"MEGA ENSEMBLE DE ENSEMBLES - RESULTADOS FINALES")
        print(f"=" * 80)

        # Encontrar el campeón supremo
        # Filter out levels that returned None
        valid_ensemble_results = {k: v for k, v in mega_ensemble_results.items() if v is not None}

        if not valid_ensemble_results:
            print("No se pudieron crear ensembles válidos.")
            MEGA_ENSEMBLE_COMPLETE = False
        else:
            best_ensemble = max(valid_ensemble_results.keys(),
                               key=lambda x: valid_ensemble_results[x]['f1_score'])
            best_f1 = valid_ensemble_results[best_ensemble]['f1_score']

            print(f"\nCAMPEON SUPREMO: {best_ensemble}")
            print(f"F1-Score: {best_f1:.4f}")
            print(f"Accuracy: {valid_ensemble_results[best_ensemble]['accuracy']:.4f}")
            print(f"ROC-AUC: {valid_ensemble_results[best_ensemble]['roc_auc']:.4f}")

            # Ranking completo de ensembles
            print(f"\nRANKING COMPLETO DE ENSEMBLES:")
            sorted_ensembles = sorted(valid_ensemble_results.items(),
                                     key=lambda x: x[1]['f1_score'], reverse=True)

            for i, (name, results) in enumerate(sorted_ensembles, 1):
                ensemble_type = results.get('type', results.get('strategy', 'meta'))
                print(f"   {i:2d}. {name:25s}: F1={results['f1_score']:.4f} "
                      f"ACC={results['accuracy']:.4f} AUC={results['roc_auc']:.4f} ({ensemble_type})")

            # Improvement over the best individual model (only if traditional models were trained)
            if 'traditional_results' in globals() and traditional_results:
                # Collect every individual model that was trained: traditional, DNN and BERT
                all_individual_results = dict(traditional_results)
                for results_name in ('dnn_results', 'bert_results'):
                    for name, data in globals().get(results_name, {}).items():
                        # Keep only the metrics needed for the comparison
                        all_individual_results[name] = {
                            'f1_score': data['f1_score'],
                            'accuracy': data['accuracy'],
                            'roc_auc': data['roc_auc']
                        }


                # Exclude ensemble results from individual comparison
                individual_models_only = {k: v for k, v in all_individual_results.items() if 'ensemble' not in k.lower() and 'meta' not in k.lower() and 'hybrid' not in k.lower()}


                if individual_models_only:
                    best_individual_name = max(individual_models_only.keys(),
                                            key=lambda x: individual_models_only[x]['f1_score'])
                    best_individual_f1 = individual_models_only[best_individual_name]['f1_score']

                    improvement = ((best_f1 - best_individual_f1) / best_individual_f1) * 100

                    print(f"\nANALISIS DE MEJORA:")
                    print(f"   Mejor individual: {best_individual_name} (F1={best_individual_f1:.4f})")
                    print(f"   Mejor ensemble: {best_ensemble} (F1={best_f1:.4f})")
                    print(f"   Mejora: {improvement:+.2f}%")
                else:
                    print("\nNo hay modelos individuales para comparar mejora.")


            print(f"\n" + "="*80)
            print(f"MEGA ENSEMBLE COMPLETADO EXITOSAMENTE")
            print(f"Total ensembles: {len(valid_ensemble_results)}")
            print(f"F1-Score máximo: {best_f1:.4f}")
            print(f"=" * 80)

            MEGA_ENSEMBLE_COMPLETE = True

            # Store results for dashboard and further analysis
            FINAL_ALL_MODELS = valid_ensemble_results # Store all valid ensemble results
            BEST_ENSEMBLE_DATA = valid_ensemble_results[best_ensemble]
            BEST_ENSEMBLE_NAME = best_ensemble
            # Count trained models per family; a family that is missing or empty contributes 0
            total_models = sum(len(globals().get(results_name) or {})
                               for results_name in ('traditional_results', 'dnn_results', 'bert_results'))

            # Guardar resultados finales
            final_results = {
                'best_strategy': best_ensemble,
                'best_f1_score': best_f1,
                'total_models': total_models,
                'improvement_over_best': improvement if 'improvement' in locals() else None,
                'execution_timestamp': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
                'all_strategies': {k: v['f1_score'] for k, v in valid_ensemble_results.items()},
                'top_individual_models': {} # This will be populated in final_results cell
            }

            # You might want to save final_results dictionary to a file here
            # with open('final_ensemble_results.json', 'w') as f:
            #     json.dump(final_results, f, indent=4)


    else:
        print("No se pudieron crear ensembles")
        MEGA_ENSEMBLE_COMPLETE = False

else:
    print("No hay modelos entrenados disponibles (tradicionales, bert, o dnn) para mega ensemble")
    MEGA_ENSEMBLE_COMPLETE = False

    """
    Yes, the English comments come from Gemini. Getting this cell to run raised so many issues that I spent 5 hours on it with the model.
    AI is not as good as people think, but in the end it worked.
    """

MEGA ENSEMBLE DE ENSEMBLES - Sistema Supremo

CREANDO MEGA ENSEMBLE DE ENSEMBLES...
NIVEL 1: Ensemble de Modelos Tradicionales
   Nivel 1 - Traditional Ensemble: F1=0.7251
NIVEL 2: Ensemble de Modelos BERT
   RoBERTa: peso=0.336, F1=0.8889
   DistilBERT: peso=0.330, F1=0.8734
   BERT_base: peso=0.335, F1=0.8870
   Nivel 2 - BERT Ensemble: F1=0.8860
NIVEL 3: Ensemble de Modelos DNN
   DNN_optimized: peso=0.331, F1=0.7007
   DNN_deep: peso=0.340, F1=0.7200
   DNN_wide: peso=0.328, F1=0.6943
   Nivel 3 - DNN Ensemble: F1=0.7191
NIVEL 4: Meta-Ensemble Supremo (3 niveles)
   Estrategia 1: Consenso Ponderado
      Level_1_Traditional: peso=0.311, F1=0.7251
      Level_2_BERT: peso=0.380, F1=0.8860
      Level_3_DNN: peso=0.309, F1=0.7191
      Resultado: F1=0.8797
   Estrategia 2: Selección Dinámica
      Level_2_BERT: 210 (99.1%)
      Level_1_Traditional: 2 (0.9%)
      Resultado: F1=0.8957
   Estrategia 3: Voting Ensemble
      Resultado: F1=0.8653

MEGA ENSEMBLE DE ENSEMBLES - RESULTADOS
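
The level weights printed under "Estrategia 1: Consenso Ponderado" are consistent with a simple F1-proportional normalization. The sketch below reproduces them from the three level F1 scores in the log. The helper name `normalize_weights` is hypothetical and is not taken from the notebook; the cell that actually computes the weights is not shown here, so treat this as a check of the printed numbers, not as the author's implementation.

```python
# F1 scores per level, copied from the log above.
level_f1 = {
    "Level_1_Traditional": 0.7251,
    "Level_2_BERT": 0.8860,
    "Level_3_DNN": 0.7191,
}


def normalize_weights(scores):
    """Return weights proportional to each score, summing to 1 (hypothetical helper)."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("scores must sum to a positive value")
    return {name: value / total for name, value in scores.items()}


level_weights = normalize_weights(level_f1)
for name, weight in level_weights.items():
    print(f"{name}: peso={weight:.3f}, F1={level_f1[name]:.4f}")
```

Because the three F1 scores are close to each other, the weights stay close to 1/3 each, so the weaker levels still carry most of the combined vote.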

In [24]:
MEGA_ENSEMBLE_READY = MEGA_ENSEMBLE_COMPLETE if 'MEGA_ENSEMBLE_COMPLETE' in globals() else False

# Build mega_ensemble_data if it does not exist yet
if 'mega_ensemble_data' not in globals():
      print("Creando mega_ensemble_data...")
      mega_ensemble_data = {}

      # Add every model family that exists. The prefixes are unchanged, so the
      # keys still match the names printed in the log (e.g. "DNN_DNN_deep")
      for prefix, results_name in [('', 'traditional_results'),
                                   ('DNN_', 'dnn_results'),
                                   ('BERT_', 'bert_results')]:
          family_results = globals().get(results_name) or {}
          for model_name, results in family_results.items():
              if results and 'f1_score' in results:
                  mega_ensemble_data[f"{prefix}{model_name}"] = {
                      # Prefer probabilities; fall back to hard predictions
                      'probabilities': results.get('probabilities', results.get('predictions', [])),
                      'f1_score': results['f1_score'],
                      'weight': results['f1_score']  # Use F1 as the weight
                  }

      print(f" Creado con {len(mega_ensemble_data)} modelos")

if MEGA_ENSEMBLE_READY and len(mega_ensemble_data) > 0:
      print("BOOOOoooOfffFFFFFF  THE REAL SLIM ENSEMBLE - Estrategias Avanzadas de otro nivel")
      print("=" * 65)

      ensemble_results = {}

      def weighted_ensemble(ensemble_data, y_true):
          """Weighted ensemble: each model's weight is proportional to its F1-score."""

          # Normalize weights so they sum to 1
          total_weight = sum(data['weight'] for data in ensemble_data.values())
          normalized_weights = {name: data['weight'] / total_weight
                                for name, data in ensemble_data.items()}

          # Accumulate the weighted probabilities
          final_probs = np.zeros(len(y_true))

          print("   Pesos por modelo:")
          for model_name, data in ensemble_data.items():
              weight = normalized_weights[model_name]
              # np.asarray lets plain Python lists be scaled element-wise
              contribution = weight * np.asarray(data['probabilities'], dtype=float)
              final_probs += contribution
              print(f"      {model_name}: peso={weight:.3f}, F1={data['f1_score']:.4f}")

          final_predictions = (final_probs > 0.5).astype(int)

          return final_predictions, final_probs

      def confidence_weighted_ensemble(ensemble_data, y_true):
          """Ensemble that weights each model by its confidence on each individual prediction."""

          n_samples = len(y_true)
          final_probs = np.zeros(n_samples)

          for i in range(n_samples):
              sample_weights = []
              sample_probs = []

              for model_name, data in ensemble_data.items():
                  # Confidence = distance from 0.5 (the point of maximum indecision)
                  confidence = abs(data['probabilities'][i] - 0.5)
                  # Scale the model's F1 weight by a factor between 1 and 2
                  adjusted_weight = data['weight'] * (1 + 2*confidence)

                  sample_weights.append(adjusted_weight)
                  sample_probs.append(data['probabilities'][i])

              # Normalize and combine; fall back to a plain mean if all weights are zero
              total_weight = sum(sample_weights)
              if total_weight > 0:
                  normalized_weights = [w/total_weight for w in sample_weights]
                  final_probs[i] = sum(w*p for w, p in zip(normalized_weights, sample_probs))
              else:
                  final_probs[i] = np.mean(sample_probs)

          final_predictions = (final_probs > 0.5).astype(int)
          return final_predictions, final_probs

      def dynamic_model_selection(ensemble_data, y_true):
          """Per-sample selection: use the model with the highest f1_score * (1 + confidence)."""

          final_probs = np.zeros(len(y_true))
          model_selections = []

          for i in range(len(y_true)):
              # For each sample, score how reliable every model is
              model_scores = {}

              for model_name, data in ensemble_data.items():
                  confidence = abs(data['probabilities'][i] - 0.5)
                  model_quality = data['f1_score']

                  # Combined score: model quality times confidence in this prediction
                  combined_score = model_quality * (1 + confidence)
                  model_scores[model_name] = combined_score

              # Use the probability of the best-scoring model for this sample
              best_model = max(model_scores.keys(), key=lambda x: model_scores[x])
              final_probs[i] = ensemble_data[best_model]['probabilities'][i]
              model_selections.append(best_model)

          final_predictions = (final_probs > 0.5).astype(int)

          # Selection statistics: how often each model was chosen
          selection_stats = Counter(model_selections)
          print("   Selecciones por modelo:")
          for model, count in selection_stats.most_common():
              percentage = (count / len(y_true)) * 100
              print(f"      {model}: {count} ({percentage:.1f}%)")

          return final_predictions, final_probs

      # EJECUTAR TODAS LAS ESTRATEGIAS
      strategies = {
          'Weighted_Ensemble': weighted_ensemble,
          'Confidence_Weighted': confidence_weighted_ensemble,
          'Dynamic_Selection': dynamic_model_selection
      }

      print("\n EJECUTANDO ESTRATEGIAS DE ENSEMBLE...\n")

      for strategy_name, strategy_func in strategies.items():
          print(f" {strategy_name}:")
          start_time = time.time()

          try:
              predictions, probabilities = strategy_func(mega_ensemble_data, y_test)

              # Calcular métricas
              f1 = f1_score(y_test, predictions)
              accuracy = accuracy_score(y_test, predictions)
              precision = precision_score(y_test, predictions, zero_division=0)
              recall = recall_score(y_test, predictions, zero_division=0)
              roc_auc = roc_auc_score(y_test, probabilities)

              # Métricas adicionales
              conf_matrix = confusion_matrix(y_test, predictions)
              mcc = matthews_corrcoef(y_test, predictions)
              kappa = cohen_kappa_score(y_test, predictions)

              ensemble_time = time.time() - start_time

              ensemble_results[strategy_name] = {
                  'predictions': predictions,
                  'probabilities': probabilities,
                  'f1_score': f1,
                  'accuracy': accuracy,
                  'precision': precision,
                  'recall': recall,
                  'roc_auc': roc_auc,
                  'matthews_corrcoef': mcc,
                  'cohen_kappa': kappa,
                  'confusion_matrix': conf_matrix,
                  'execution_time': ensemble_time
              }

              print(f"   - F1-Score: {f1:.4f}")
              print(f"   - Accuracy: {accuracy:.4f}")
              print(f"   - Precision: {precision:.4f}")
              print(f"   - Recall: {recall:.4f}")
              print(f"   - ROC-AUC: {roc_auc:.4f}")
              print(f"   - Matthews Corr: {mcc:.4f}")
              print(f"   -  Tiempo: {ensemble_time:.3f}s")
              print()

          except Exception as e:
              print(f" ☠️☠️☠️Error: {e}")
              print()
              continue

      ENSEMBLE_STRATEGIES_COMPLETE = True
else:
      print("Mega ensemble no está listo o no hay modelos")
      ENSEMBLE_STRATEGIES_COMPLETE = False

Creando mega_ensemble_data...
 Creado con 16 modelos
BOOOOoooOfffFFFFFF  THE REAL SLIM ENSEMBLE - Estrategias Avanzadas de otro nivel

 EJECUTANDO ESTRATEGIAS DE ENSEMBLE...

 Weighted_Ensemble:
   Pesos por modelo:
      ridge_cv: peso=0.058, F1=0.6939
      svm_linear_l1: peso=0.059, F1=0.6996
      catboost_optimized: peso=0.058, F1=0.6857
      extra_trees: peso=0.059, F1=0.7040
      lightgbm: peso=0.062, F1=0.7309
      random_forest: peso=0.061, F1=0.7266
      logistic_l2: peso=0.058, F1=0.6917
      xgboost: peso=0.062, F1=0.7386
      CatBoost_Optimized: peso=0.059, F1=0.7016
      Stacking_Ensemble: peso=0.061, F1=0.7251
      DNN_DNN_optimized: peso=0.059, F1=0.7007
      DNN_DNN_deep: peso=0.061, F1=0.7200
      DNN_DNN_wide: peso=0.059, F1=0.6943
      BERT_RoBERTa: peso=0.075, F1=0.8889
      BERT_DistilBERT: peso=0.074, F1=0.8734
      BERT_BERT_base: peso=0.075, F1=0.8870
   - F1-Score: 0.8000
   - Accuracy: 0.7642
   - Precision: 0.7463
   - Recall: 0.8621
   - ROC-AU
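
The per-model weights in the `Weighted_Ensemble` log above are nearly flat (0.058 to 0.075), so it helps to see how much of the total vote each model family holds. The sketch below sums the F1-proportional weights by family, using the F1 values copied from the log. The `family_of` helper and the prefix-based grouping are assumptions made for illustration; "Traditional" here includes `Stacking_Ensemble` and both CatBoost entries, as they appear in the log.

```python
from collections import defaultdict

# F1 scores copied from the Weighted_Ensemble log above.
model_f1 = {
    "ridge_cv": 0.6939, "svm_linear_l1": 0.6996, "catboost_optimized": 0.6857,
    "extra_trees": 0.7040, "lightgbm": 0.7309, "random_forest": 0.7266,
    "logistic_l2": 0.6917, "xgboost": 0.7386, "CatBoost_Optimized": 0.7016,
    "Stacking_Ensemble": 0.7251,
    "DNN_DNN_optimized": 0.7007, "DNN_DNN_deep": 0.7200, "DNN_DNN_wide": 0.6943,
    "BERT_RoBERTa": 0.8889, "BERT_DistilBERT": 0.8734, "BERT_BERT_base": 0.8870,
}


def family_of(name):
    """Group a model by its key prefix (hypothetical helper, not in the notebook)."""
    if name.startswith("BERT_"):
        return "BERT"
    if name.startswith("DNN_"):
        return "DNN"
    return "Traditional"


total_f1 = sum(model_f1.values())
family_share = defaultdict(float)
for name, f1 in model_f1.items():
    # Each model's normalized weight is f1 / total_f1, as in weighted_ensemble
    family_share[family_of(name)] += f1 / total_f1

for family, share in sorted(family_share.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{family}: {share:.1%}")
```

Under this weighting the three BERT models hold roughly 22% of the total weight even though they have the highest F1 scores, which is one plausible reason the weighted average scores below the BERT models on their own.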

In [34]:
# =============================================================================
# RESULTADOS FINALES - THE REAL SLIM ENSEMBLE CHAMPION
# =============================================================================

if ENSEMBLE_STRATEGIES_COMPLETE:
    print(" THE REAL SLIM ENSEMBLE - RESULTADOS DEFINITIVOS")
    print("=" * 70)

    # Encontrar la mejor estrategia
    best_strategy = max(ensemble_results.keys(),
                       key=lambda x: ensemble_results[x]['f1_score'])
    best_result = ensemble_results[best_strategy]

    print(f" CAMPEÓN ABSOLUTO: {best_strategy}")
    print(f"   F1-Score: {best_result['f1_score']:.4f}")
    print(f"   Accuracy: {best_result['accuracy']:.4f}")
    print(f"   Precision: {best_result['precision']:.4f}")
    print(f"   Recall: {best_result['recall']:.4f}")
    print(f"   ROC-AUC: {best_result['roc_auc']:.4f}")
    print(f"   Matthews Corr: {best_result['matthews_corrcoef']:.4f}")
    print(f"   Cohen's Kappa: {best_result['cohen_kappa']:.4f}")

    # Análisis de la matriz de confusión
    tn, fp, fn, tp = best_result['confusion_matrix'].ravel()
    specificity = tn / (tn + fp)
    sensitivity = tp / (tp + fn)

    print(f"\n ANÁLISIS DETALLADO:")
    print(f"   Verdaderos Positivos: {tp}")
    print(f"   Verdaderos Negativos: {tn}")
    print(f"   Falsos Positivos: {fp}")
    print(f"   Falsos Negativos: {fn}")
    print(f"   Especificidad: {specificity:.4f}")
    print(f"   Sensibilidad: {sensitivity:.4f}")

    # Ranking completo
    print(f"\n RANKING COMPLETO DE ESTRATEGIAS:")
    sorted_strategies = sorted(ensemble_results.items(),
                              key=lambda x: x[1]['f1_score'], reverse=True)

    for i, (strategy, results) in enumerate(sorted_strategies, 1):
        print(f"   {i}. {strategy}:")
        print(f"      F1={results['f1_score']:.4f}, "
              f"ACC={results['accuracy']:.4f}, "
              f"AUC={results['roc_auc']:.4f}")

    # Comparación con mejores modelos individuales
    print(f"\n MEGA ENSEMBLE vs MEJORES INDIVIDUALES:")

    individual_scores = {}
    for name, data in mega_ensemble_data.items():
        individual_scores[name] = data['f1_score']

    best_individual_name = max(individual_scores.keys(),
                              key=lambda x: individual_scores[x])
    best_individual_score = individual_scores[best_individual_name]

    improvement = ((best_result['f1_score'] - best_individual_score) /
                   best_individual_score) * 100

    print(f"   MEGA ENSEMBLE ({best_strategy}): {best_result['f1_score']:.4f}")
    print(f"   Mejor individual ({best_individual_name}): {best_individual_score:.4f}")
    print(f"   Mejora: {improvement:+.2f}%")

    # Top 5 modelos individuales
    print(f"\n TOP 5 MODELOS INDIVIDUALES:")
    top_individual = sorted(individual_scores.items(),
                           key=lambda x: x[1], reverse=True)[:5]

    for i, (name, score) in enumerate(top_individual, 1):
        vs_ensemble = ((best_result['f1_score'] - score) / score) * 100
        # The signed format avoids printing "+-0.2%" when the ensemble is worse
        print(f"   {i}. {name}: F1={score:.4f} "
              f"(ensemble {vs_ensemble:+.1f}%)")

    # Análisis de confianza
    confidence_analysis = np.abs(best_result['probabilities'] - 0.5)
    avg_confidence = np.mean(confidence_analysis)
    high_confidence_pct = np.mean(confidence_analysis > 0.3) * 100

    print(f"\n ANÁLISIS DE CONFIANZA:")
    print(f"   Confianza promedio: {avg_confidence:.3f}")
    print(f"   Predicciones alta confianza: {high_confidence_pct:.1f}%")
    print(f"   Brier Score: {brier_score_loss(y_test, best_result['probabilities']):.3f}")

    print(f"\n" + "=" * 70)
    print(f" THE REAL SLIM ENSEMBLE COMPLETADO EXITOSAMENTE!")
    print(f" Sistema definitivo alcanza F1-Score de {best_result['f1_score']:.4f}")
    print(f" {total_models} modelos unidos")
    print(f"=" * 70)

    # Keep the final results in memory (nothing is written to disk here)
    final_results = {
        'best_strategy': best_strategy,
        'best_f1_score': best_result['f1_score'],
        'total_models': total_models,
        'improvement_over_best': improvement,
        'execution_timestamp': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
        'all_strategies': {k: v['f1_score'] for k, v in ensemble_results.items()},
        'top_individual_models': dict(top_individual)
    }

    print(f"\n Resultados guardados para análisis posterior")

else:
    print(" No se pudieron ejecutar las estrategias de ensemble")

 THE REAL SLIM ENSEMBLE - RESULTADOS DEFINITIVOS
 CAMPEÓN ABSOLUTO: Dynamic_Selection
   F1-Score: 0.8870
   Accuracy: 0.8774
   Precision: 0.8947
   Recall: 0.8793
   ROC-AUC: 0.9530
   Matthews Corr: 0.7531
   Cohen's Kappa: 0.7530

 ANÁLISIS DETALLADO:
   Verdaderos Positivos: 102
   Verdaderos Negativos: 84
   Falsos Positivos: 12
   Falsos Negativos: 14
   Especificidad: 0.8750
   Sensibilidad: 0.8793

 RANKING COMPLETO DE ESTRATEGIAS:
   1. Dynamic_Selection:
      F1=0.8870, ACC=0.8774, AUC=0.9530
   2. Confidence_Weighted:
      F1=0.8097, ACC=0.7783, AUC=0.9129
   3. Weighted_Ensemble:
      F1=0.8000, ACC=0.7642, AUC=0.8944

 MEGA ENSEMBLE vs MEJORES INDIVIDUALES:
   MEGA ENSEMBLE (Dynamic_Selection): 0.8870
   Mejor individual (BERT_RoBERTa): 0.8889
   Mejora: -0.22%

 TOP 5 MODELOS INDIVIDUALES:
   1. BERT_RoBERTa: F1=0.8889 (ensemble +-0.2%)
   2. BERT_BERT_base: F1=0.8870 (ensemble +0.0%)
   3. BERT_DistilBERT: F1=0.8734 (ensemble +1.6%)
   4. xgboost: F1=0.7386 (ensemble
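
`Dynamic_Selection`, the top strategy in the ranking above, picks one model per sample by maximizing `f1_score * (1 + |p - 0.5|)`. The same rule can be written without Python loops. The sketch below is a vectorized NumPy version under the assumption that every model stores a 1-D array of positive-class probabilities of equal length; the function name `dynamic_selection_vectorized` and the toy data are hypothetical.

```python
import numpy as np


def dynamic_selection_vectorized(ensemble_data):
    """Per-sample model selection by f1_score * (1 + |p - 0.5|), without loops."""
    names = list(ensemble_data)
    # Shape (n_models, n_samples)
    probs = np.vstack([np.asarray(ensemble_data[n]["probabilities"], dtype=float)
                       for n in names])
    f1 = np.array([ensemble_data[n]["f1_score"] for n in names])[:, None]

    scores = f1 * (1.0 + np.abs(probs - 0.5))
    # argmax returns the first maximum, matching max() over dict keys on ties
    chosen = scores.argmax(axis=0)

    final_probs = probs[chosen, np.arange(probs.shape[1])]
    final_preds = (final_probs > 0.5).astype(int)
    return final_preds, final_probs, [names[i] for i in chosen]


# Toy data for illustration only; these are not results from the notebook.
toy_models = {
    "strong": {"probabilities": [0.55, 0.90, 0.48], "f1_score": 0.89},
    "weak": {"probabilities": [0.99, 0.20, 0.02], "f1_score": 0.70},
}

preds, probs, chosen = dynamic_selection_vectorized(toy_models)
print(chosen)
print(preds)
```

Since the confidence term lies in [0, 0.5], a model's score can grow by at most a factor of 1.5. A lower-F1 model therefore wins only on samples where the stronger models sit close to 0.5, as in the first and third toy samples.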

In [28]:
# =============================================================================
# MATRICES DE CONFUSION Y ANALISIS DE ERRORES INTERACTIVO
# =============================================================================

if MEGA_ENSEMBLE_COMPLETE and 'BEST_ENSEMBLE_DATA' in globals():
    print("MATRICES DE CONFUSION Y ANALISIS DE ERRORES")
    print("=" * 60)

    def create_confusion_matrix_plots():
        """Crea matrices de confusión interactivas"""

        # 1. Matriz de confusión del mejor ensemble
        best_predictions = BEST_ENSEMBLE_DATA['predictions']
        cm = confusion_matrix(y_test, best_predictions)

        # Interactive heatmap. confusion_matrix orders rows and columns by sorted
        # class value (0, then 1), so `labels` must follow that same order
        labels = ['Verdadero', 'Falso']

        fig_cm = px.imshow(
            cm,
            text_auto=False,
            x=labels,
            y=labels,
            title=f'Matriz de Confusión - {BEST_ENSEMBLE_NAME}',
            labels={'x': 'Predicción', 'y': 'Realidad', 'color': 'Cantidad'},
            color_continuous_scale='Blues'
        )

        # Agregar anotaciones personalizadas
        annotations_list = []
        for i in range(2):
            for j in range(2):
                percentage = (cm[i, j] / cm.sum()) * 100
                annotations_list.append(
                    dict(
                        x=j, y=i,
                        text=f'{cm[i, j]}<br>({percentage:.1f}%)',
                        showarrow=False,
                        font=dict(color='white' if cm[i, j] > cm.max()/2 else 'black', size=14)
                    )
                )

        fig_cm.update_layout(annotations=annotations_list, height=500)

        try:
            fig_cm.show()
        except Exception:
            print("   Matriz de confusión generada")

        # Calcular métricas detalladas
        tn, fp, fn, tp = cm.ravel()

        sensitivity = tp / (tp + fn)  # Recall
        specificity = tn / (tn + fp)
        precision = tp / (tp + fp)
        npv = tn / (tn + fn)  # Negative Predictive Value

        print(f"\nMETRICAS DETALLADAS DEL MEJOR ENSEMBLE:")
        print(f"   Verdaderos Positivos: {tp}")
        print(f"   Verdaderos Negativos: {tn}")
        print(f"   Falsos Positivos: {fp}")
        print(f"   Falsos Negativos: {fn}")
        print(f"   Sensibilidad (Recall): {sensitivity:.4f}")
        print(f"   Especificidad: {specificity:.4f}")
        print(f"   Precisión: {precision:.4f}")
        print(f"   Valor Predictivo Negativo: {npv:.4f}")

    def create_roc_pr_curves():
        """Crea curvas ROC y PR interactivas"""

        # Recopilar datos de todos los modelos principales
        model_data = []

        # Modelos tradicionales (top 3)
        if 'traditional_results' in globals() and traditional_results:
            sorted_traditional = sorted(traditional_results.items(),
                                       key=lambda x: x[1]['f1_score'], reverse=True)
            for name, data in sorted_traditional[:3]:
                model_data.append({
                    'name': name,
                    'probabilities': data['probabilities'],
                    'type': 'Traditional'
                })

        # Modelos BERT
        if 'bert_results' in globals() and bert_results:
            for name, data in bert_results.items():
                model_data.append({
                    'name': name,
                    'probabilities': data['probabilities'],
                    'type': 'BERT'
                })

        # Mejor ensemble
        model_data.append({
            'name': BEST_ENSEMBLE_NAME,
            'probabilities': BEST_ENSEMBLE_DATA['probabilities'],
            'type': 'Ensemble'
        })

        # 1. Curvas ROC
        fig_roc = go.Figure()

        for model in model_data:
            fpr, tpr, _ = roc_curve(y_test, model['probabilities'])
            auc_score = roc_auc_score(y_test, model['probabilities'])

            fig_roc.add_trace(go.Scatter(
                x=fpr, y=tpr,
                mode='lines',
                name=f"{model['name']} (AUC={auc_score:.3f})",
                line=dict(width=2)
            ))

        # Línea diagonal
        fig_roc.add_trace(go.Scatter(
            x=[0, 1], y=[0, 1],
            mode='lines',
            name='Random',
            line=dict(dash='dash', color='gray')
        ))

        fig_roc.update_layout(
            title='Curvas ROC - Comparación de Modelos',
            xaxis_title='Tasa de Falsos Positivos',
            yaxis_title='Tasa de Verdaderos Positivos',
            height=600
        )

        try:
            fig_roc.show()
        except Exception:
            print("   Curvas ROC generadas")

        # 2. Curvas Precision-Recall
        fig_pr = go.Figure()

        for model in model_data:
            precision_vals, recall_vals, _ = precision_recall_curve(y_test, model['probabilities'])
            ap_score = average_precision_score(y_test, model['probabilities'])

            fig_pr.add_trace(go.Scatter(
                x=recall_vals, y=precision_vals,
                mode='lines',
                name=f"{model['name']} (AP={ap_score:.3f})",
                line=dict(width=2)
            ))

        # Línea baseline
        baseline = np.mean(y_test)
        fig_pr.add_shape(
            type="line",
            x0=0, y0=baseline,
            x1=1, y1=baseline,
            line=dict(dash="dash", color="gray")
        )

        fig_pr.update_layout(
            title='Curvas Precision-Recall - Comparación de Modelos',
            xaxis_title='Recall',
            yaxis_title='Precision',
            height=600
        )

        try:
            fig_pr.show()
        except Exception:
            print("   Curvas Precision-Recall generadas")

    def analyze_prediction_errors():
        """Analiza errores de predicción del mejor modelo"""

        print("\nANALISIS DE ERRORES DE PREDICCION...")

        predictions = BEST_ENSEMBLE_DATA['predictions']
        probabilities = BEST_ENSEMBLE_DATA['probabilities']

        # Identificar tipos de errores
        false_positives = (predictions == 1) & (y_test == 0)
        false_negatives = (predictions == 0) & (y_test == 1)
        true_positives = (predictions == 1) & (y_test == 1)
        true_negatives = (predictions == 0) & (y_test == 0)

        # Confidence = probability the model assigned to the class it predicted
        fp_confidences = probabilities[false_positives]
        fn_confidences = 1 - probabilities[false_negatives]  # predicted class is 0, so use 1 - p
        tp_confidences = probabilities[true_positives]
        tn_confidences = 1 - probabilities[true_negatives]

        print(f"ANALISIS DE CONFIANZA EN PREDICCIONES:")
        print(f"   Falsos Positivos: {np.sum(false_positives)} casos")
        print(f"      Confianza promedio: {np.mean(fp_confidences):.3f}")
        print(f"   Falsos Negativos: {np.sum(false_negatives)} casos")
        print(f"      Confianza promedio: {np.mean(fn_confidences):.3f}")
        print(f"   Verdaderos Positivos: {np.sum(true_positives)} casos")
        print(f"      Confianza promedio: {np.mean(tp_confidences):.3f}")
        print(f"   Verdaderos Negativos: {np.sum(true_negatives)} casos")
        print(f"      Confianza promedio: {np.mean(tn_confidences):.3f}")

        # Histograma de confianza por tipo de predicción
        fig_conf = go.Figure()

        categories = ['Falsos Positivos', 'Falsos Negativos', 'Verdaderos Positivos', 'Verdaderos Negativos']
        confidences = [fp_confidences, fn_confidences, tp_confidences, tn_confidences]
        colors = ['red', 'orange', 'green', 'blue']

        for cat, conf, color in zip(categories, confidences, colors):
            if len(conf) > 0:
                fig_conf.add_trace(go.Histogram(
                    x=conf,
                    name=cat,
                    nbinsx=20,
                    opacity=0.7,
                    marker_color=color
                ))

        fig_conf.update_layout(
            title='Distribución de Confianza por Tipo de Predicción',
            xaxis_title='Confianza',
            yaxis_title='Frecuencia',
            barmode='overlay',
            height=500
        )

        try:
            fig_conf.show()
        except Exception:
            print("   Histograma de confianza generado")

        # Análisis de casos difíciles (baja confianza)
        uncertainty_threshold = 0.6
        uncertain_cases = np.abs(probabilities - 0.5) < (uncertainty_threshold - 0.5)

        print(f"\nCASOS DIFICILES (confianza < {uncertainty_threshold}):")
        print(f"   Total casos difíciles: {np.sum(uncertain_cases)} ({np.mean(uncertain_cases)*100:.1f}%)")

        if np.sum(uncertain_cases) > 0:
            uncertain_accuracy = accuracy_score(
                y_test[uncertain_cases],
                predictions[uncertain_cases]
            )
            print(f"   Accuracy en casos difíciles: {uncertain_accuracy:.3f}")

    # Ejecutar todos los análisis
    try:
        create_confusion_matrix_plots()
        create_roc_pr_curves()
        analyze_prediction_errors()

        CONFUSION_ANALYSIS_COMPLETE = True
        print(f"\nANALISIS DE MATRICES Y ERRORES COMPLETADO")

    except Exception as e:
        print(f"Error en análisis de matrices: {e}")
        CONFUSION_ANALYSIS_COMPLETE = False

else:
    print("Mega ensemble no completado - saltando análisis de matrices")
    CONFUSION_ANALYSIS_COMPLETE = False

MATRICES DE CONFUSION Y ANALISIS DE ERRORES



METRICAS DETALLADAS DEL MEJOR ENSEMBLE:
   Verdaderos Positivos: 103
   Verdaderos Negativos: 85
   Falsos Positivos: 11
   Falsos Negativos: 13
   Sensibilidad (Recall): 0.8879
   Especificidad: 0.8854
   Precisión: 0.9035
   Valor Predictivo Negativo: 0.8673



ANALISIS DE ERRORES DE PREDICCION...
ANALISIS DE CONFIANZA EN PREDICCIONES:
   Falsos Positivos: 11 casos
      Confianza promedio: 0.836
   Falsos Negativos: 13 casos
      Confianza promedio: 0.820
   Verdaderos Positivos: 103 casos
      Confianza promedio: 0.951
   Verdaderos Negativos: 85 casos
      Confianza promedio: 0.908



CASOS DIFICILES (confianza < 0.6):
   Total casos difíciles: 4 (1.9%)
   Accuracy en casos difíciles: 0.750

ANALISIS DE MATRICES Y ERRORES COMPLETADO
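
The metrics printed above follow directly from the four confusion-matrix counts. The sketch below recomputes them as a sanity check, and also illustrates the "hard case" mask used in the cell. The probability array is hypothetical, and the sketch assumes that confidence means `max(p, 1 - p)`, which the cell above does not state explicitly.

```python
import numpy as np

# Confusion-matrix counts copied from the output above
tp, tn, fp, fn = 103, 85, 11, 13

recall = tp / (tp + fn)                      # sensitivity
specificity = tn / (tn + fp)
precision = tp / (tp + fp)
npv = tn / (tn + fn)                         # negative predictive value
f1 = 2 * tp / (2 * tp + fp + fn)
accuracy = (tp + tn) / (tp + tn + fp + fn)

for label, value in [
    ("Recall", recall), ("Specificity", specificity), ("Precision", precision),
    ("NPV", npv), ("F1", f1), ("Accuracy", accuracy),
]:
    print(f"{label:<12s}{value:.4f}")

# Hypothetical probabilities, used only to illustrate the "hard case" mask
probabilities = np.array([0.05, 0.45, 0.58, 0.97])
threshold = 0.6

# Assumption: confidence is the probability of the predicted class
confidence = np.maximum(probabilities, 1 - probabilities)

# The cell's formulation (distance from 0.5) and the confidence formulation agree
hard_by_distance = np.abs(probabilities - 0.5) < (threshold - 0.5)
hard_by_confidence = confidence < threshold
print(hard_by_distance, hard_by_confidence)
```

The F1 obtained from these counts (0.8957) and the accuracy (0.8868) can be cross-checked against the per-model figures reported in the dashboard cell.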


In [29]:
# =============================================================================
# CURVAS DE APRENDIZAJE Y ANALISIS DE RENDIMIENTO
# =============================================================================

if PREPROCESSING_COMPLETE:
    print("CURVAS DE APRENDIZAJE Y ANALISIS DE RENDIMIENTO")
    print("=" * 60)

    def create_learning_curves():
        """Crea curvas de aprendizaje para detectar overfitting/underfitting"""

        from sklearn.model_selection import learning_curve

        print("Generando curvas de aprendizaje...")

        # Modelos para analizar
        models_to_analyze = {}

        if 'traditional_results' in globals() and traditional_results:
            # Seleccionar los 3 mejores modelos tradicionales
            sorted_traditional = sorted(traditional_results.items(),
                                       key=lambda x: x[1]['f1_score'], reverse=True)

            for name, data in sorted_traditional[:3]:
                models_to_analyze[name] = data['model']

        if not models_to_analyze:
            print("No hay modelos disponibles para curvas de aprendizaje")
            return

        # Generar curvas para cada modelo
        for model_name, model in models_to_analyze.items():
            try:
                print(f"\nGenerando curva para {model_name}...")

                # Calcular curva de aprendizaje
                train_sizes, train_scores, val_scores = learning_curve(
                    model, X_num_train_scaled, y_train,
                    train_sizes=np.linspace(0.1, 1.0, 10),
                    cv=5,
                    scoring='f1',
                    n_jobs=-1,
                    random_state=RANDOM_STATE  # only takes effect when shuffle=True
                )

                # Calcular estadísticas
                train_mean = np.mean(train_scores, axis=1)
                train_std = np.std(train_scores, axis=1)
                val_mean = np.mean(val_scores, axis=1)
                val_std = np.std(val_scores, axis=1)

                # Crear DataFrame para plotly
                df_learning = pd.DataFrame({
                    'Train_Size': train_sizes,
                    'Train_Mean': train_mean,
                    'Train_Std': train_std,
                    'Val_Mean': val_mean,
                    'Val_Std': val_std
                })

                # Plot interactivo
                fig = go.Figure()

                # Curva de entrenamiento
                fig.add_trace(go.Scatter(
                    x=df_learning['Train_Size'],
                    y=df_learning['Train_Mean'],
                    mode='lines+markers',
                    name='Training Score',
                    line=dict(color='blue'),
                    error_y=dict(type='data', array=df_learning['Train_Std'], visible=True)
                ))

                # Curva de validación
                fig.add_trace(go.Scatter(
                    x=df_learning['Train_Size'],
                    y=df_learning['Val_Mean'],
                    mode='lines+markers',
                    name='Validation Score',
                    line=dict(color='red'),
                    error_y=dict(type='data', array=df_learning['Val_Std'], visible=True)
                ))

                # Configuración del plot
                fig.update_layout(
                    title=f'Curva de Aprendizaje - {model_name}',
                    xaxis_title='Tamaño del Dataset de Entrenamiento',
                    yaxis_title='F1-Score',
                    height=500,
                    showlegend=True
                )

                try:
                    fig.show()
                except Exception:
                    print(f"   Curva generada para {model_name}")

                # Análisis de overfitting/underfitting
                final_train_score = train_mean[-1]
                final_val_score = val_mean[-1]
                gap = final_train_score - final_val_score

                print(f"   Score final entrenamiento: {final_train_score:.4f}")
                print(f"   Score final validación: {final_val_score:.4f}")
                print(f"   Gap (overfitting): {gap:.4f}")

                if gap > 0.05:
                    print("   ⚠️ Posible overfitting detectado")
                elif final_val_score < 0.7:
                    print("   ⚠️ Posible underfitting detectado")
                else:
                    print("   Modelo bien balanceado")

            except Exception as e:
                print(f"   Error generando curva para {model_name}: {e}")
                continue

    def create_dnn_learning_curves():
        """Crea curvas de aprendizaje para modelos DNN"""

        if 'dnn_results' not in globals() or not dnn_results:
            print("No hay modelos DNN para analizar")
            return

        print("\nGenerando curvas de entrenamiento DNN...")

        for model_name, data in dnn_results.items():
            if 'history' not in data:
                continue

            try:
                history = data['history'].history

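                # Build the DataFrame; pad missing metrics with NaN so that
                # every column has one entry per epoch
                n_epochs = len(history['loss'])
                epochs = list(range(1, n_epochs + 1))
                missing = [np.nan] * n_epochs
                df_dnn = pd.DataFrame({
                    'Epoch': epochs,
                    'Train_Loss': history['loss'],
                    'Val_Loss': history.get('val_loss', missing),
                    'Train_Acc': history.get('accuracy', missing),
                    'Val_Acc': history.get('val_accuracy', missing)
                })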

                # Subplots para loss y accuracy
                fig = make_subplots(
                    rows=1, cols=2,
                    subplot_titles=['Loss', 'Accuracy'],
                    specs=[[{"secondary_y": False}, {"secondary_y": False}]]
                )

                # Loss subplot
                fig.add_trace(
                    go.Scatter(x=df_dnn['Epoch'], y=df_dnn['Train_Loss'],
                              mode='lines', name='Train Loss', line=dict(color='blue')),
                    row=1, col=1
                )

                if 'val_loss' in history:
                    fig.add_trace(
                        go.Scatter(x=df_dnn['Epoch'], y=df_dnn['Val_Loss'],
                                  mode='lines', name='Val Loss', line=dict(color='red')),
                        row=1, col=1
                    )

                # Accuracy subplot
                if 'accuracy' in history:
                    fig.add_trace(
                        go.Scatter(x=df_dnn['Epoch'], y=df_dnn['Train_Acc'],
                                  mode='lines', name='Train Acc', line=dict(color='green')),
                        row=1, col=2
                    )

                if 'val_accuracy' in history:
                    fig.add_trace(
                        go.Scatter(x=df_dnn['Epoch'], y=df_dnn['Val_Acc'],
                                  mode='lines', name='Val Acc', line=dict(color='orange')),
                        row=1, col=2
                    )

                # Configuración
                fig.update_layout(
                    title=f'Entrenamiento DNN - {model_name}',
                    height=400
                )
                fig.update_xaxes(title_text="Época", row=1, col=1)
                fig.update_xaxes(title_text="Época", row=1, col=2)
                fig.update_yaxes(title_text="Loss", row=1, col=1)
                fig.update_yaxes(title_text="Accuracy", row=1, col=2)

                try:
                    fig.show()
                except Exception:
                    print(f"   Curva DNN generada para {model_name}")

            except Exception as e:
                print(f"   Error en curva DNN para {model_name}: {e}")
                continue

    def create_performance_comparison():
        """Crea comparación visual de rendimiento entre todos los modelos"""

        print("\nCreando comparación de rendimiento...")

        all_model_data = []

        # Recopilar datos de todos los modelos
        if 'traditional_results' in globals() and traditional_results:
            for name, data in traditional_results.items():
                all_model_data.append({
                    'Model': name,
                    'F1_Score': data['f1_score'],
                    'Accuracy': data['accuracy'],
                    'ROC_AUC': data['roc_auc'],
                    'Training_Time': data['training_time'],
                    'Type': 'Traditional'
                })

        if 'bert_results' in globals() and bert_results:
            for name, data in bert_results.items():
                all_model_data.append({
                    'Model': name,
                    'F1_Score': data['f1_score'],
                    'Accuracy': data['accuracy'],
                    'ROC_AUC': data['roc_auc'],
                    'Training_Time': data['training_time'],
                    'Type': 'BERT'
                })

        if 'dnn_results' in globals() and dnn_results:
            for name, data in dnn_results.items():
                all_model_data.append({
                    'Model': name,
                    'F1_Score': data['f1_score'],
                    'Accuracy': data['accuracy'],
                    'ROC_AUC': data['roc_auc'],
                    'Training_Time': data['training_time'],
                    'Type': 'DNN'
                })

        if not all_model_data:
            print("No hay datos de modelos para comparar")
            return

        df_comparison = pd.DataFrame(all_model_data)

        # 1. Scatter plot F1-Score vs Training Time
        fig1 = px.scatter(
            df_comparison,
            x='Training_Time',
            y='F1_Score',
            color='Type',
            size='ROC_AUC',
            hover_data=['Model', 'Accuracy'],
            title='F1-Score vs Tiempo de Entrenamiento',
            labels={'Training_Time': 'Tiempo de Entrenamiento (s)', 'F1_Score': 'F1-Score'}
        )

        try:
            fig1.show()
        except Exception:
            print("   Plot de comparación F1 vs Tiempo generado")

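        # 2. Radar chart for the top 5 models (copy so new columns do not touch df_comparison)
        top_5_models = df_comparison.nlargest(5, 'F1_Score').copy()

        # Min-max normalize each metric for the radar chart
        metrics = ['F1_Score', 'Accuracy', 'ROC_AUC']
        for metric in metrics:
            metric_min = top_5_models[metric].min()
            metric_range = top_5_models[metric].max() - metric_min
            # Guard against a zero range (a single model, or identical scores)
            if metric_range > 0:
                top_5_models[f'{metric}_norm'] = (top_5_models[metric] - metric_min) / metric_range
            else:
                top_5_models[f'{metric}_norm'] = 1.0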

        fig2 = go.Figure()

        for _, row in top_5_models.iterrows():
            fig2.add_trace(go.Scatterpolar(
                r=[row['F1_Score_norm'], row['Accuracy_norm'], row['ROC_AUC_norm']],
                theta=['F1-Score', 'Accuracy', 'ROC-AUC'],
                fill='toself',
                name=row['Model']
            ))

        fig2.update_layout(
            polar=dict(
                radialaxis=dict(
                    visible=True,
                    range=[0, 1]
                )),
            title="Comparación Multimétrica - Top 5 Modelos",
            showlegend=True
        )

        try:
            fig2.show()
        except Exception:
            print("   Radar chart de top 5 modelos generado")

    # Ejecutar análisis de curvas de aprendizaje
    try:
        create_learning_curves()
        create_dnn_learning_curves()
        create_performance_comparison()

        LEARNING_CURVES_COMPLETE = True
        print(f"\nCURVAS DE APRENDIZAJE COMPLETADAS")

    except Exception as e:
        print(f"Error en curvas de aprendizaje: {e}")
        LEARNING_CURVES_COMPLETE = False

else:
    print("Datos no preparados para curvas de aprendizaje")
    LEARNING_CURVES_COMPLETE = False

CURVAS DE APRENDIZAJE Y ANALISIS DE RENDIMIENTO
Generando curvas de aprendizaje...

Generando curva para xgboost...


   Score final entrenamiento: 1.0000
   Score final validación: 0.6956
   Gap (overfitting): 0.3044
   ⚠️ Posible overfitting detectado

Generando curva para lightgbm...


   Score final entrenamiento: 1.0000
   Score final validación: 0.6799
   Gap (overfitting): 0.3201
   ⚠️ Posible overfitting detectado

Generando curva para random_forest...


   Score final entrenamiento: 1.0000
   Score final validación: 0.7035
   Gap (overfitting): 0.2965
   ⚠️ Posible overfitting detectado

Generando curvas de entrenamiento DNN...



Creando comparación de rendimiento...



CURVAS DE APRENDIZAJE COMPLETADAS
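
All three tree ensembles above reach a training F1 of 1.0000 with a validation F1 near 0.70, so the gap rule flags each one. The sketch below reproduces the same diagnostic on synthetic data and compares an unconstrained forest with a capacity-limited one. The dataset, the `diagnose` helper, and the hyperparameters are illustrative assumptions, not the notebook's data or settings. Note that `shuffle=True` is passed so that `random_state` actually has an effect.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import learning_curve


def diagnose(train_score, val_score, gap_tol=0.05, min_val=0.70):
    """Apply the same thresholds as the cell above: gap first, then validation score."""
    if train_score - val_score > gap_tol:
        return "overfitting"
    if val_score < min_val:
        return "underfitting"
    return "balanced"


# Synthetic stand-in data (assumption: not the notebook's features)
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=5, random_state=42
)

# Hypothetical candidates: default depth versus a capacity-limited forest
candidates = {
    "unconstrained": RandomForestClassifier(n_estimators=50, random_state=42),
    "constrained": RandomForestClassifier(
        n_estimators=50, max_depth=3, min_samples_leaf=10, random_state=42
    ),
}

gaps = {}
for name, model in candidates.items():
    sizes, train_scores, val_scores = learning_curve(
        model, X, y,
        train_sizes=np.linspace(0.2, 1.0, 5),
        cv=5,
        scoring="f1",
        shuffle=True,          # required for random_state to be used
        random_state=42,
    )
    train_final = train_scores.mean(axis=1)[-1]
    val_final = val_scores.mean(axis=1)[-1]
    gaps[name] = train_final - val_final
    print(f"{name}: train={train_final:.4f} val={val_final:.4f} "
          f"gap={gaps[name]:.4f} -> {diagnose(train_final, val_final)}")
```

Whether limiting depth narrows the gap on the real features would need to be checked on the notebook's own data; the sketch only shows how to measure it.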


In [30]:
# =============================================================================
# ANALISIS DE FEATURE IMPORTANCE CON NOMBRES REALES
# =============================================================================

if PREPROCESSING_COMPLETE:
    print("ANALISIS DE FEATURE IMPORTANCE")
    print("=" * 50)

    def analyze_feature_importance():
        """Analiza feature importance con nombres reales de columnas"""

        # Obtener nombres reales de features
        numeric_cols = features_df.select_dtypes(include=[np.number]).columns
        if 'BinaryNumTarget' in numeric_cols:
            feature_names = numeric_cols.drop('BinaryNumTarget').tolist()
        else:
            feature_names = numeric_cols.tolist()

        print(f"Analizando {len(feature_names)} features con nombres reales...")

        importance_data = {}

        # Analizar modelos tradicionales que tienen feature importance
        if 'traditional_results' in globals() and traditional_results:

            for model_name, model_data in traditional_results.items():
                model = model_data['model']

                if hasattr(model, 'feature_importances_'):
                    # Modelos como Random Forest, XGBoost, etc.
                    importances = model.feature_importances_

                elif hasattr(model, 'coef_'):
                    # Modelos lineales
                    importances = np.abs(model.coef_).flatten()

                else:
                    continue

                # Asegurar que las dimensiones coincidan
                if len(importances) == len(feature_names):
                    importance_data[model_name] = {
                        'importances': importances,
                        'feature_names': feature_names,
                        'f1_score': model_data['f1_score']
                    }

        return importance_data, feature_names

    def create_feature_importance_plots(importance_data, feature_names):
        """Crea plots interactivos de feature importance"""

        if not importance_data:
            print("No hay datos de feature importance disponibles")
            return

        print(f"Creando plots de feature importance para {len(importance_data)} modelos...")

        # 1. FEATURE IMPORTANCE POR MODELO
        for model_name, data in importance_data.items():
            importances = data['importances']
            names = data['feature_names']
            model_f1 = data['f1_score']  # avoid shadowing sklearn's f1_score

            # Build the DataFrame for this model
            df_importance = pd.DataFrame({
                'Feature': names,
                'Importance': importances
            }).sort_values('Importance', ascending=True).tail(20)  # Top 20

            # Interactive plot
            fig = px.bar(
                df_importance,
                x='Importance',
                y='Feature',
                title=f'Top 20 Features - {model_name} (F1={model_f1:.4f})',
                labels={'Importance': 'Importancia', 'Feature': 'Característica'},
                orientation='h',
                height=800
            )

            fig.update_layout(
                yaxis={'categoryorder': 'total ascending'},
                font=dict(size=12)
            )

            try:
                fig.show()
            except Exception:
                print(f"   Plot generado para {model_name}")

        # 2. FEATURE IMPORTANCE PROMEDIO
        if len(importance_data) > 1:
            print("Creando análisis de importancia promedio...")

            # F1-weighted mean importance. Importances are used as stored, with no
            # per-model rescaling, so models with larger raw scales weigh more.
            avg_importance = np.zeros(len(feature_names))
            total_weight = 0

            for model_name, data in importance_data.items():
                weight = data['f1_score']  # Ponderar por F1-score
                avg_importance += data['importances'] * weight
                total_weight += weight

            avg_importance = avg_importance / total_weight

            # DataFrame para importancia promedio
            df_avg = pd.DataFrame({
                'Feature': feature_names,
                'Avg_Importance': avg_importance
            }).sort_values('Avg_Importance', ascending=True).tail(25)  # Top 25

            # Plot interactivo
            fig_avg = px.bar(
                df_avg,
                x='Avg_Importance',
                y='Feature',
                title='Top 25 Features - Importancia Promedio Ponderada',
                labels={'Avg_Importance': 'Importancia Promedio', 'Feature': 'Característica'},
                orientation='h',
                height=900,
                color='Avg_Importance',
                color_continuous_scale='viridis'
            )

            fig_avg.update_layout(
                yaxis={'categoryorder': 'total ascending'},
                font=dict(size=12)
            )

            try:
                fig_avg.show()
            except Exception:
                print("   Plot de importancia promedio generado")

            # Análisis textual de top features
            print(f"\nTOP 10 FEATURES MAS IMPORTANTES:")
            top_features = df_avg.tail(10)
            for i, (_, row) in enumerate(reversed(list(top_features.iterrows())), 1):
                print(f"   {i:2d}. {row['Feature']:35s}: {row['Avg_Importance']:.4f}")

        # 3. HEATMAP DE IMPORTANCIA POR MODELO
        if len(importance_data) > 1:
            print("Creando heatmap comparativo...")

            # Preparar datos para heatmap
            heatmap_data = []
            for model_name, data in importance_data.items():
                model_importance = dict(zip(data['feature_names'], data['importances']))
                heatmap_data.append({
                    'Model': model_name,
                    **model_importance
                })

            df_heatmap = pd.DataFrame(heatmap_data).set_index('Model')

            # Seleccionar solo las features más importantes
            feature_importance_sum = df_heatmap.sum().sort_values(ascending=False).head(20)
            df_heatmap_top = df_heatmap[feature_importance_sum.index]

            # Heatmap interactivo
            fig_heatmap = px.imshow(
                df_heatmap_top,
                title='Importancia de Features por Modelo (Top 20)',
                labels={'x': 'Feature', 'y': 'Modelo', 'color': 'Importancia'},
                aspect='auto',
                color_continuous_scale='viridis'
            )

            fig_heatmap.update_layout(
                height=400,
                xaxis={'tickangle': 45},
                font=dict(size=10)
            )

            try:
                fig_heatmap.show()
            except Exception:
                print("   Heatmap comparativo generado")

    # Ejecutar análisis de feature importance
    try:
        importance_data, feature_names = analyze_feature_importance()

        if importance_data:
            create_feature_importance_plots(importance_data, feature_names)

            FEATURE_IMPORTANCE_COMPLETE = True
            print(f"\nANALISIS DE FEATURE IMPORTANCE COMPLETADO")
            print(f"Modelos analizados: {len(importance_data)}")
        else:
            print("No se encontraron modelos con feature importance")
            FEATURE_IMPORTANCE_COMPLETE = False

    except Exception as e:
        print(f"Error en análisis de feature importance: {e}")
        FEATURE_IMPORTANCE_COMPLETE = False

else:
    print("Datos no preparados para análisis de feature importance")
    FEATURE_IMPORTANCE_COMPLETE = False

ANALISIS DE FEATURE IMPORTANCE
Analizando 57 features con nombres reales...
Creando plots de feature importance para 9 modelos...


Creando análisis de importancia promedio...



TOP 10 FEATURES MAS IMPORTANTES:
    1. Average word length                : 27.0038
    2. cred                               : 23.0828
    3. favourites_count                   : 22.3671
    4. statuses_count                     : 22.2171
    5. friends_count                      : 20.4969
    6. capitals                           : 20.3382
    7. followers_count                    : 15.8776
    8. Word count                         : 14.8154
    9. PERSON_percentage                  : 12.7065
   10. listed_count                       : 12.3741
Creando heatmap comparativo...



ANALISIS DE FEATURE IMPORTANCE COMPLETADO
Modelos analizados: 9
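
The averaged importances above reach values such as 27.0, which suggests that the per-model vectors are on different scales (impurity-based importances sum to 1, whereas split counts and absolute coefficients do not). One possible remedy, sketched below, is to rescale each model's vector to sum to 1 before taking the F1-weighted mean. The three importance vectors and the F1 weights are hypothetical, and `weighted_mean_importance` is an illustrative helper, not a function from the notebook.

```python
import numpy as np


def weighted_mean_importance(importances_by_model, weights, normalize=True):
    """F1-weighted mean of importance vectors, optionally rescaled to sum to 1 per model."""
    total = None
    total_weight = 0.0
    for name, values in importances_by_model.items():
        values = np.asarray(values, dtype=float)
        if normalize:
            values = values / values.sum()   # put every model on the same scale
        contribution = values * weights[name]
        total = contribution if total is None else total + contribution
        total_weight += weights[name]
    return total / total_weight


# Hypothetical importance vectors on three different raw scales
importances_by_model = {
    "impurity_based": [0.5, 0.3, 0.2],     # already sums to 1
    "split_counts":   [10, 30, 60],        # raw counts
    "abs_coef":       [2.0, 1.0, 1.0],     # absolute linear coefficients
}
# Hypothetical F1 scores used as weights
f1_weights = {"impurity_based": 0.80, "split_counts": 0.70, "abs_coef": 0.75}

raw = weighted_mean_importance(importances_by_model, f1_weights, normalize=False)
scaled = weighted_mean_importance(importances_by_model, f1_weights, normalize=True)

print("raw:   ", np.round(raw, 4), "top feature:", int(np.argmax(raw)))
print("scaled:", np.round(scaled, 4), "top feature:", int(np.argmax(scaled)))
```

In this toy case the unscaled average is dominated by the split-count model and ranks the third feature first, while the rescaled average ranks the first feature first. The rescaled values also sum to 1, which makes them easier to read as shares.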


In [31]:
# =============================================================================
# DASHBOARD INTERACTIVO COMPLETO - ANALISIS VISUAL AVANZADO
# =============================================================================

if MEGA_ENSEMBLE_COMPLETE and 'FINAL_ALL_MODELS' in globals():
    print("DASHBOARD INTERACTIVO COMPLETO")
    print("=" * 50)

    def create_comprehensive_dashboard(all_models):
        """Crea dashboard interactivo completo con 5 paneles"""

        # Preparar datos para visualización
        model_names = []
        f1_scores = []
        accuracies = []
        roc_aucs = []
        model_types = []

        for name, data in all_models.items():
            model_names.append(name)
            f1_scores.append(data['f1_score'])
            accuracies.append(data['accuracy'])
            roc_aucs.append(data['roc_auc'])

            # Determinar tipo de modelo
            if 'model_type' in data:
                model_type = data['model_type']
            elif 'strategy' in data:
                model_type = 'consensus'
            else:
                model_type = 'unknown'
            model_types.append(model_type)

        # Crear DataFrame para análisis
        df_results = pd.DataFrame({
            'Model': model_names,
            'F1_Score': f1_scores,
            'Accuracy': accuracies,
            'ROC_AUC': roc_aucs,
            'Type': model_types
        })

        print(f"Creando dashboard con {len(df_results)} modelos...")

        # PANEL 1: F1-Score por Categoría
        print("Panel 1: F1-Score por categorías")
        fig1 = px.bar(
            df_results.sort_values('F1_Score', ascending=True),
            y='Model',
            x='F1_Score',
            color='Type',
            title='F1-Score por Modelo y Categoría',
            labels={'F1_Score': 'F1-Score', 'Model': 'Modelo'},
            height=600
        )
        fig1.update_layout(yaxis={'categoryorder':'total ascending'})

        try:
            fig1.show()
        except Exception:
            print("   (Plot generado - mostrando en entorno interactivo)")

        # PANEL 2: ROC-AUC por Categoría
        print("Panel 2: ROC-AUC por categorías")
        fig2 = px.bar(
            df_results.sort_values('ROC_AUC', ascending=True),
            y='Model',
            x='ROC_AUC',
            color='Type',
            title='ROC-AUC por Modelo y Categoría',
            labels={'ROC_AUC': 'ROC-AUC', 'Model': 'Modelo'},
            height=600
        )
        fig2.update_layout(yaxis={'categoryorder':'total ascending'})

        try:
            fig2.show()
        except Exception:
            print("   (Plot generado - mostrando en entorno interactivo)")

        # PANEL 3: Scatter Accuracy vs F1-Score
        print("Panel 3: Scatter Accuracy vs F1-Score")
        fig3 = px.scatter(
            df_results,
            x='Accuracy',
            y='F1_Score',
            color='Type',
            size=[1]*len(df_results),
            hover_data=['Model'],
            title='Accuracy vs F1-Score por Tipo de Modelo',
            labels={'Accuracy': 'Accuracy', 'F1_Score': 'F1-Score'}
        )

        # Línea de igualdad
        fig3.add_shape(
            type="line",
            x0=df_results['Accuracy'].min(),
            y0=df_results['Accuracy'].min(),
            x1=df_results['Accuracy'].max(),
            y1=df_results['Accuracy'].max(),
            line=dict(dash="dash", color="gray")
        )

        try:
            fig3.show()
        except Exception:
            print("   (Plot generado - mostrando en entorno interactivo)")

        # PANEL 4: Distribución de Métricas (Violin Plot)
        print("Panel 4: Distribución de métricas")

        # Preparar datos para violin plot
        metrics_data = []
        for _, row in df_results.iterrows():
            metrics_data.extend([
                {'Model_Type': row['Type'], 'Metric': 'F1-Score', 'Value': row['F1_Score']},
                {'Model_Type': row['Type'], 'Metric': 'Accuracy', 'Value': row['Accuracy']},
                {'Model_Type': row['Type'], 'Metric': 'ROC-AUC', 'Value': row['ROC_AUC']}
            ])

        df_metrics = pd.DataFrame(metrics_data)

        fig4 = px.violin(
            df_metrics,
            x='Model_Type',
            y='Value',
            color='Metric',
            title='Distribución de Métricas por Tipo de Modelo',
            labels={'Value': 'Valor de Métrica', 'Model_Type': 'Tipo de Modelo'}
        )

        try:
            fig4.show()
        except Exception:
            print("   (Plot generado - mostrando en entorno interactivo)")

        # PANEL 5: Ranking General
        print("Panel 5: Ranking general")

        # Calcular score combinado
        df_results['Combined_Score'] = (
            df_results['F1_Score'] * 0.5 +
            df_results['Accuracy'] * 0.3 +
            df_results['ROC_AUC'] * 0.2
        )

        top_10 = df_results.nlargest(10, 'Combined_Score')

        fig5 = px.bar(
            top_10,
            x='Combined_Score',
            y='Model',
            color='Type',
            title='TOP 10 - Ranking General (Score Combinado)',
            labels={'Combined_Score': 'Score Combinado', 'Model': 'Modelo'},
            orientation='h'
        )
        fig5.update_layout(yaxis={'categoryorder':'total ascending'})

        try:
            fig5.show()
        except Exception:
            print("   (Plot generado - mostrando en entorno interactivo)")

        # Resumen estadístico por tipo
        print(f"\nRESUMEN ESTADISTICO POR TIPO:")
        summary_stats = df_results.groupby('Type').agg({
            'F1_Score': ['mean', 'std', 'max'],
            'Accuracy': ['mean', 'std', 'max'],
            'ROC_AUC': ['mean', 'std', 'max']
        }).round(4)

        print(summary_stats)

        # Mejores por categoría
        print(f"\nMEJORES MODELOS POR CATEGORIA:")
        for model_type in df_results['Type'].unique():
            subset = df_results[df_results['Type'] == model_type]
            if len(subset) > 0:
                best_model = subset.loc[subset['F1_Score'].idxmax()]
                print(f"   {model_type.upper()}: {best_model['Model']} "
                      f"(F1={best_model['F1_Score']:.4f})")

        return df_results, {
            'panel1_f1_by_category': fig1,
            'panel2_auc_by_category': fig2,
            'panel3_accuracy_vs_f1': fig3,
            'panel4_metrics_distribution': fig4,
            'panel5_general_ranking': fig5
        }

    # Crear dashboard completo
    try:
        dashboard_data, dashboard_plots = create_comprehensive_dashboard(FINAL_ALL_MODELS)

        print(f"\nDASHBOARD INTERACTIVO COMPLETADO")
        print(f"Total modelos analizados: {len(dashboard_data)}")
        print(f"Paneles generados: {len(dashboard_plots)}")

        DASHBOARD_COMPLETE = True

    except Exception as e:
        print(f"Error creando dashboard: {e}")
        DASHBOARD_COMPLETE = False

else:
    print("Dashboard requiere mega ensemble completo")
    DASHBOARD_COMPLETE = False

DASHBOARD INTERACTIVO COMPLETO
Creando dashboard con 6 modelos...
Panel 1: F1-Score por categorías


Panel 2: ROC-AUC por categorías


Panel 3: Scatter Accuracy vs F1-Score


Panel 4: Distribución de métricas


Panel 5: Ranking general



RESUMEN ESTADISTICO POR TIPO:
          F1_Score                 Accuracy                 ROC_AUC
              mean     std     max     mean     std     max    mean     std     max
Type
consensus   0.8802  0.0152  0.8957   0.8648  0.0213  0.8868  0.9457  0.0092  0.9561
unknown     0.7767  0.0947  0.8860   0.7327  0.1261  0.8774  0.8044  0.1330  0.9550

MEJORES MODELOS POR CATEGORIA:
   UNKNOWN: Level_2_BERT (F1=0.8860)
   CONSENSUS: Dynamic_Meta (F1=0.8957)

DASHBOARD INTERACTIVO COMPLETADO
Total modelos analizados: 6
Paneles generados: 5
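
Panel 5 ranks models by a weighted sum of three metrics (0.5 for F1, 0.3 for accuracy, 0.2 for ROC-AUC). The sketch below works through that arithmetic on a small table. The model names and metric values are hypothetical and chosen only to show how the weights interact.

```python
import pandas as pd

# Hypothetical models and scores, for illustration only
df = pd.DataFrame({
    "Model":    ["model_a", "model_b", "model_c"],
    "F1_Score": [0.90, 0.88, 0.78],
    "Accuracy": [0.88, 0.89, 0.73],
    "ROC_AUC":  [0.95, 0.96, 0.80],
})

# Same weights as the dashboard cell; they sum to 1
weights = {"F1_Score": 0.5, "Accuracy": 0.3, "ROC_AUC": 0.2}

# Weighted sum, column by column
df["Combined_Score"] = sum(df[col] * w for col, w in weights.items())

ranking = df.nlargest(3, "Combined_Score")
print(ranking[["Model", "Combined_Score"]].round(4))
```

Here `model_b` has the higher accuracy and ROC-AUC, yet `model_a` ranks first because F1 carries half of the total weight.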


In [32]:
# =============================================================================
# CROSS-VALIDATION EXHAUSTIVO Y ANALISIS DE ESTABILIDAD
# =============================================================================

if MEGA_ENSEMBLE_COMPLETE:
    print("CROSS-VALIDATION EXHAUSTIVO Y ANALISIS DE ESTABILIDAD")
    print("=" * 65)

    def perform_cross_validation_analysis(models_dict, X_data, y_data, cv_folds=5):
        """Realiza cross-validation exhaustivo para todos los modelos"""

        cv_results = {}
        cv = StratifiedKFold(n_splits=cv_folds, shuffle=True, random_state=RANDOM_STATE)

        print(f"Realizando {cv_folds}-fold cross-validation...")

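        for model_name, model_data in models_dict.items():
            # traditional_results stores the estimator under 'model';
            # 'model_obj' is accepted as well
            model = model_data.get('model_obj', model_data.get('model'))
            if model is None:
                continue

            print(f"\nEvaluando {model_name}...")

            try: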

                # Cross-validation para múltiples métricas
                f1_scores = cross_val_score(model, X_data, y_data, cv=cv, scoring='f1', n_jobs=-1)
                accuracy_scores = cross_val_score(model, X_data, y_data, cv=cv, scoring='accuracy', n_jobs=-1)
                roc_auc_scores = cross_val_score(model, X_data, y_data, cv=cv, scoring='roc_auc', n_jobs=-1)

                cv_results[model_name] = {
                    'f1_scores': f1_scores,
                    'f1_mean': np.mean(f1_scores),
                    'f1_std': np.std(f1_scores),
                    'accuracy_scores': accuracy_scores,
                    'accuracy_mean': np.mean(accuracy_scores),
                    'accuracy_std': np.std(accuracy_scores),
                    'roc_auc_scores': roc_auc_scores,
                    'roc_auc_mean': np.mean(roc_auc_scores),
                    'roc_auc_std': np.std(roc_auc_scores),
                    'stability_score': 1 - np.std(f1_scores) / (np.mean(f1_scores) + 1e-8)
                }

                print(f"   F1: {np.mean(f1_scores):.4f} ± {np.std(f1_scores):.4f}")
                print(f"   ACC: {np.mean(accuracy_scores):.4f} ± {np.std(accuracy_scores):.4f}")
                print(f"   AUC: {np.mean(roc_auc_scores):.4f} ± {np.std(roc_auc_scores):.4f}")
                print(f"   Estabilidad: {cv_results[model_name]['stability_score']:.4f}")

            except Exception as e:
                print(f"   Error en CV para {model_name}: {e}")
                continue

        return cv_results

    # Realizar cross-validation solo en modelos con objetos disponibles
    available_models = {}

    # Filtrar modelos tradicionales
    if 'traditional_results' in globals() and traditional_results:
        for name, data in traditional_results.items():
            available_models[name] = data

    if available_models:
        cv_analysis = perform_cross_validation_analysis(
            available_models,
            X_num_train_scaled,
            y_train
        )

        if cv_analysis:
            print(f"\nRANKING POR ESTABILIDAD (F1 + Consistencia):")
            sorted_stability = sorted(cv_analysis.items(),
                                    key=lambda x: x[1]['stability_score'], reverse=True)

            for i, (name, results) in enumerate(sorted_stability, 1):
                print(f"   {i}. {name}: Estabilidad={results['stability_score']:.4f}, "
                      f"F1={results['f1_mean']:.4f}±{results['f1_std']:.4f}")

            print(f"\nANALISIS DE ESTABILIDAD COMPLETADO")
            CV_ANALYSIS_COMPLETE = True
        else:
            print("No se pudo completar el análisis de CV")
            CV_ANALYSIS_COMPLETE = False
    else:
        print("No hay modelos disponibles para CV")
        CV_ANALYSIS_COMPLETE = False
else:
    print("Mega ensemble no completado - saltando CV")
    CV_ANALYSIS_COMPLETE = False

CROSS-VALIDATION EXHAUSTIVO Y ANALISIS DE ESTABILIDAD
Realizando 5-fold cross-validation...
No se pudo completar el análisis de CV
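
The run above prints no per-model lines, which means every entry passed in `models_dict` was skipped by the `'model_obj'` check and `cv_results` came back empty. A minimal sketch of a diagnostic that makes this visible before running CV is shown here. The helper name `partition_by_model_obj` and the example dictionary are hypothetical, not part of the notebook; the real `traditional_results` may be shaped differently.

```python
def partition_by_model_obj(results):
    """Split a results dict into entries that carry an estimator and those that do not."""
    usable, skipped = {}, []
    for name, data in results.items():
        if isinstance(data, dict) and data.get('model_obj') is not None:
            usable[name] = data
        else:
            skipped.append(name)
    return usable, skipped

# Hypothetical stand-in for traditional_results (placeholder values only)
example_results = {
    'RandomForest': {'f1': 0.70},                   # metrics only, no estimator stored
    'Ridge': {'f1': 0.68, 'model_obj': object()},   # estimator stored
}

usable, skipped = partition_by_model_obj(example_results)
print(f"Usable for CV: {sorted(usable)}")
print(f"Skipped (no 'model_obj'): {skipped}")
```

Reporting the skipped names would distinguish "no estimator stored" from a genuine CV failure, which the single fallback message cannot do.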

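
The `stability_score` used in these cells is one minus the coefficient of variation of the per-fold F1 scores: `1 - std / (mean + 1e-8)`. The sketch below reproduces that arithmetic with the standard library only. The fold scores are hypothetical values chosen to illustrate the formula, and `pstdev` is used because `np.std` defaults to the population standard deviation.

```python
from statistics import mean, pstdev

def stability_score(fold_scores, eps=1e-8):
    """Return 1 - std/mean of the fold scores (population std, like np.std)."""
    return 1 - pstdev(fold_scores) / (mean(fold_scores) + eps)

# Hypothetical per-fold F1 scores with the same mean but different spread
steady = [0.70, 0.71, 0.69, 0.70, 0.70]
erratic = [0.60, 0.80, 0.65, 0.75, 0.70]

print(f"steady:  mean={mean(steady):.4f}, stability={stability_score(steady):.4f}")
print(f"erratic: mean={mean(erratic):.4f}, stability={stability_score(erratic):.4f}")
```

Because the score is relative to the mean, two models with the same spread but different F1 means get different stability values, so the stability ranking and the F1 ranking need not agree.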

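
The cross-validation cells in this section call `cross_val_score` once per metric, so a 5-fold run fits each model 15 times. One possible alternative, sketched here on synthetic data rather than on `X_num_train_scaled` and `y_train`, is scikit-learn's `cross_validate`, which accepts a list of scorers and fits each fold once. The dataset and the variable names `X_demo`, `y_demo`, and `scores` are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic binary classification data standing in for the notebook's features
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=42)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_validate(
    LogisticRegression(max_iter=1000, random_state=42),
    X_demo, y_demo,
    cv=cv,
    scoring=['f1', 'accuracy', 'roc_auc'],  # one fit per fold, three metrics
)

# Results are keyed as 'test_<metric>', one value per fold
for metric in ('f1', 'accuracy', 'roc_auc'):
    fold_scores = scores[f'test_{metric}']
    print(f"{metric}: {fold_scores.mean():.4f} ± {fold_scores.std():.4f}")
```

The per-fold arrays can feed the same mean, standard deviation, and stability calculations as before.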
In [33]:
if 'MEGA_ENSEMBLE_COMPLETE' in globals() and MEGA_ENSEMBLE_COMPLETE:
    print("EXHAUSTIVE CROSS-VALIDATION AND STABILITY ANALYSIS")
    print("=" * 65)

    # Imports required for the models
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, AdaBoostClassifier, ExtraTreesClassifier
    from sklearn.svm import SVC
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    def perform_cross_validation_analysis(X_data, y_data, cv_folds=5):
        """Run stratified k-fold cross-validation for a fixed set of freshly instantiated models."""

        cv_results = {}
        cv = StratifiedKFold(n_splits=cv_folds, shuffle=True, random_state=RANDOM_STATE)

        print(f"Running {cv_folds}-fold cross-validation...")

        # Models to evaluate with cross-validation
        models_to_test = [
            ('LogisticRegression', LogisticRegression(random_state=RANDOM_STATE, max_iter=1000)),
            ('RandomForest', RandomForestClassifier(n_estimators=100, random_state=RANDOM_STATE)),
            ('SVM', SVC(kernel='rbf', probability=True, random_state=RANDOM_STATE)),
            ('GradientBoosting', GradientBoostingClassifier(random_state=RANDOM_STATE)),
            ('AdaBoost', AdaBoostClassifier(random_state=RANDOM_STATE)),
            ('ExtraTrees', ExtraTreesClassifier(n_estimators=100, random_state=RANDOM_STATE))
        ]

        # Add XGBoost if it is installed
        try:
            from xgboost import XGBClassifier
            models_to_test.append(('XGBoost', XGBClassifier(random_state=RANDOM_STATE, eval_metric='logloss')))
        except ImportError:
            print("   XGBoost not available, skipping...")

        for model_name, model in models_to_test:
            print(f"\nEvaluating {model_name}...")

            try:
                # Cross-validate on several metrics (the fixed seed gives each call the same folds)
                f1_scores = cross_val_score(model, X_data, y_data, cv=cv, scoring='f1', n_jobs=-1)
                accuracy_scores = cross_val_score(model, X_data, y_data, cv=cv, scoring='accuracy', n_jobs=-1)
                roc_auc_scores = cross_val_score(model, X_data, y_data, cv=cv, scoring='roc_auc', n_jobs=-1)

                cv_results[model_name] = {
                    'f1_scores': f1_scores,
                    'f1_mean': np.mean(f1_scores),
                    'f1_std': np.std(f1_scores),
                    'accuracy_scores': accuracy_scores,
                    'accuracy_mean': np.mean(accuracy_scores),
                    'accuracy_std': np.std(accuracy_scores),
                    'roc_auc_scores': roc_auc_scores,
                    'roc_auc_mean': np.mean(roc_auc_scores),
                    'roc_auc_std': np.std(roc_auc_scores),
                    # 1 - coefficient of variation of F1 (the epsilon avoids division by zero)
                    'stability_score': 1 - np.std(f1_scores) / (np.mean(f1_scores) + 1e-8)
                }

                print(f"   F1: {np.mean(f1_scores):.4f} ± {np.std(f1_scores):.4f}")
                print(f"   ACC: {np.mean(accuracy_scores):.4f} ± {np.std(accuracy_scores):.4f}")
                print(f"   AUC: {np.mean(roc_auc_scores):.4f} ± {np.std(roc_auc_scores):.4f}")
                print(f"   Stability: {cv_results[model_name]['stability_score']:.4f}")

            except Exception as e:
                print(f"   CV error for {model_name}: {e}")
                continue

        return cv_results

    # Check that the required data is available
    if 'X_num_train_scaled' in globals() and 'y_train' in globals():
        cv_analysis = perform_cross_validation_analysis(
            X_num_train_scaled,
            y_train
        )

        if cv_analysis:
            print("\nSTABILITY RANKING (F1 + Consistency):")
            sorted_stability = sorted(cv_analysis.items(),
                                    key=lambda x: x[1]['stability_score'], reverse=True)

            for i, (name, results) in enumerate(sorted_stability, 1):
                print(f"   {i}. {name}: Stability={results['stability_score']:.4f}, "
                      f"F1={results['f1_mean']:.4f}±{results['f1_std']:.4f}")

            print("\nRANKING BY MEAN F1-SCORE:")
            sorted_f1 = sorted(cv_analysis.items(),
                              key=lambda x: x[1]['f1_mean'], reverse=True)

            for i, (name, results) in enumerate(sorted_f1, 1):
                print(f"   {i}. {name}: F1={results['f1_mean']:.4f}±{results['f1_std']:.4f}, "
                      f"Stability={results['stability_score']:.4f}")

            print("\nSTABILITY ANALYSIS COMPLETED")
            CV_ANALYSIS_COMPLETE = True
        else:
            print("Could not complete the CV analysis")
            CV_ANALYSIS_COMPLETE = False
    else:
        print("No data available for CV")
        CV_ANALYSIS_COMPLETE = False
else:
    print("Mega ensemble not completed - skipping CV")
    CV_ANALYSIS_COMPLETE = False

EXHAUSTIVE CROSS-VALIDATION AND STABILITY ANALYSIS
Running 5-fold cross-validation...

Evaluating LogisticRegression...
   F1: 0.6872 ± 0.0319
   ACC: 0.6513 ± 0.0313
   AUC: 0.7174 ± 0.0236
   Stability: 0.9536

Evaluating RandomForest...
   F1: 0.7083 ± 0.0160
   ACC: 0.6691 ± 0.0190
   AUC: 0.7276 ± 0.0170
   Stability: 0.9774

Evaluating SVM...
   F1: 0.6930 ± 0.0391
   ACC: 0.6064 ± 0.0191
   AUC: 0.6994 ± 0.0341
   Stability: 0.9436

Evaluating GradientBoosting...
   F1: 0.6959 ± 0.0290
   ACC: 0.6631 ± 0.0255
   AUC: 0.7206 ± 0.0198
   Stability: 0.9583

Evaluating AdaBoost...
   F1: 0.6978 ± 0.0130
   ACC: 0.6655 ± 0.0167
   AUC: 0.7231 ± 0.0199
   Stability: 0.9814

Evaluating ExtraTrees...
   F1: 0.7211 ± 0.0095
   ACC: 0.6808 ± 0.0120
   AUC: 0.7338 ± 0.0231
   Stability: 0.9868

Evaluating XGBoost...
   F1: 0.6995 ± 0.0252
   ACC: 0.6620 ± 0.0283
   AUC: 0.7044 ± 0.0168
   Stability: 0.9640

STABILITY RANKING (F1 + Consistency):
   1. ExtraTrees: Estabil