# TruthSeeker: Ensemble Completo para Detección de Desinformación

En este notebook implemento el sistema TruthSeeker, un ensemble híbrido que combina múltiples arquitecturas de machine learning y deep learning para la detección avanzada de desinformación. El sistema integra modelos tradicionales, algoritmos de boosting, redes neuronales, modelos NLP y transformers BERT en un meta-ensemble robusto.

## Arquitectura TruthSeeker
1. **Nivel Base**: Modelos individuales especializados
2. **Nivel Meta**: Ensemble learning con votación ponderada
3. **Nivel Final**: Meta-modelo para decisión final
4. **Sistema de Confianza**: Métricas de certeza para cada predicción

## Objetivos
- Maximizo la precisión mediante ensemble diversity
- Implemento sistema de confianza para evaluación de predicciones
- Genero visualizaciones interactivas completas con Plotly
- Produzco análisis exhaustivo de rendimiento y características del modelo
- Creo dashboard interactivo para evaluación en tiempo real

In [1]:
# Detección del entorno de ejecución
import sys
IN_COLAB = 'google.colab' in sys.modules

# Configuración del entorno e importaciones principales
import warnings
warnings.filterwarnings('ignore')

import pandas as pd
import numpy as np
import joblib
import json
import pickle
import os
from pathlib import Path
import time
from datetime import datetime
import gc

# Instalación de dependencias para Colab
if IN_COLAB:
    !pip install plotly xgboost lightgbm catboost transformers torch scikit-learn==1.3.2 -q
    from google.colab import drive
    drive.mount('/content/drive')
    print("Ejecutando en Google Colab")
else:
    print("Ejecutando en entorno local")

# Visualización avanzada con Plotly
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.figure_factory as ff
from plotly.colors import qualitative, sequential, diverging

# Machine Learning
from sklearn.ensemble import VotingClassifier, StackingClassifier
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.metrics import (
    accuracy_score, precision_recall_fscore_support, roc_auc_score,
    classification_report, confusion_matrix, roc_curve, precision_recall_curve,
    average_precision_score, matthews_corrcoef, cohen_kappa_score
)
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.calibration import calibration_curve, CalibratedClassifierCV

# Deep Learning
try:
    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader, TensorDataset
    TORCH_AVAILABLE = True
except ImportError:
    TORCH_AVAILABLE = False

# Transformers para BERT
try:
    from transformers import AutoTokenizer, AutoModelForSequenceClassification
    BERT_AVAILABLE = True
    print("Transformers disponible para integración BERT")
except ImportError:
    BERT_AVAILABLE = False
    print("Transformers no disponible - ensemble sin BERT")

# Utilidades
from tqdm.auto import tqdm
from collections import defaultdict
import seaborn as sns
import matplotlib.pyplot as plt

# Configuración de estilo para visualizaciones
plt.style.use('default')
PLOTLY_THEME = 'plotly_white'
COLORS = px.colors.qualitative.Set3
SEQUENTIAL_COLORS = px.colors.sequential.Viridis

print(f"TruthSeeker Ensemble System Iniciado")
print(f"Timestamp: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
if TORCH_AVAILABLE:
    print(f"PyTorch: {torch.__version__}")
print(f"BERT disponible: {BERT_AVAILABLE}")

Ejecutando en entorno local
Transformers disponible para integración BERT
TruthSeeker Ensemble System Iniciado
Timestamp: 2025-08-24 21:19:39
PyTorch: 2.8.0+cpu
BERT disponible: True


## Configuración del Sistema TruthSeeker

Configuro las rutas, parámetros y componentes principales del sistema ensemble.

In [2]:
# Configuración de rutas y parámetros
if IN_COLAB:
    BASE_PATH = Path('/content')
    MODELS_PATH = BASE_PATH / 'models'
    DATA_PATH = BASE_PATH / 'processed_data'
else:
    BASE_PATH = Path('../')
    MODELS_PATH = BASE_PATH / 'models'
    DATA_PATH = BASE_PATH / 'processed_data'

RESULTS_PATH = BASE_PATH / 'results' / 'truthseeker_ensemble'
BERT_MODELS_PATH = MODELS_PATH / 'bert_models'
NLP_MODELS_PATH = MODELS_PATH / 'nlp_models'

# Creo directorios necesarios
RESULTS_PATH.mkdir(parents=True, exist_ok=True)
(RESULTS_PATH / 'visualizations').mkdir(exist_ok=True)
(RESULTS_PATH / 'models').mkdir(exist_ok=True)

# Configuración del ensemble
ENSEMBLE_CONFIG = {
    'cross_validation_folds': 5,
    'confidence_threshold': 0.7,
    'voting_weights': 'auto',  # Se calculan dinámicamente
    'calibration_method': 'isotonic',
    'meta_model': 'xgboost',  # Meta-learner final
    'diversity_bonus': 0.1,  # Bonus para modelos diversos
    'min_model_performance': 0.6,  # F1 mínimo para incluir modelo
    'use_bert': BERT_AVAILABLE,
    'use_nlp': True,
    'use_traditional': True,
    'use_boosting': True,
    'use_neural_networks': True
}

# Configuración de visualizaciones
VIZ_CONFIG = {
    'height': 600,
    'width': 1000,
    'template': PLOTLY_THEME,
    'color_palette': COLORS,
    'font_size': 12,
    'title_font_size': 16,
    'save_html': True,
    'save_png': True,
    'interactive': True
}

print(f"Configuración TruthSeeker:")
for key, value in ENSEMBLE_CONFIG.items():
    print(f"  {key}: {value}")

print(f"\nRutas configuradas:")
print(f"  Modelos: {MODELS_PATH}")
print(f"  Datos: {DATA_PATH}")
print(f"  Resultados: {RESULTS_PATH}")
print(f"  BERT: {BERT_MODELS_PATH}")
print(f"  NLP: {NLP_MODELS_PATH}")

Configuración TruthSeeker:
  cross_validation_folds: 5
  confidence_threshold: 0.7
  voting_weights: auto
  calibration_method: isotonic
  meta_model: xgboost
  diversity_bonus: 0.1
  min_model_performance: 0.6
  use_bert: True
  use_nlp: True
  use_traditional: True
  use_boosting: True
  use_neural_networks: True

Rutas configuradas:
  Modelos: ..\models
  Datos: ..\processed_data
  Resultados: ..\results\truthseeker_ensemble
  BERT: ..\models\bert_models
  NLP: ..\models\nlp_models


## Carga de Datos y Modelos Base

Cargo los datos procesados y todos los modelos entrenados previamente para integrarlos en el ensemble.

In [3]:
# Función para cargar datos inteligentemente
def load_processed_data():
    """Carga los datos procesados del sistema"""
    
    try:
        # Intento cargar datos procesados
        X_train = joblib.load(DATA_PATH / 'X_train.pkl')
        X_test = joblib.load(DATA_PATH / 'X_test.pkl') 
        y_train = joblib.load(DATA_PATH / 'y_train.pkl')
        y_test = joblib.load(DATA_PATH / 'y_test.pkl')
        
        print(f"Datos procesados cargados:")
        print(f"  Train: {X_train.shape} features, {len(y_train)} samples")
        print(f"  Test: {X_test.shape} features, {len(y_test)} samples")
        
        return X_train, X_test, y_train, y_test
        
    except FileNotFoundError:
        print("Datos procesados no encontrados - cargando dataset original")
        
        # Cargo dataset original como fallback
        if IN_COLAB:
            dataset_path = BASE_PATH / 'Truth_Seeker_Model_Dataset.csv'
        else:
            dataset_path = BASE_PATH / 'dataset1' / 'Truth_Seeker_Model_Dataset.csv'
        
        if dataset_path.exists():
            df = pd.read_csv(dataset_path)
            print(f"Dataset original cargado: {df.shape}")
            
            # Preproceso básico para el ensemble
            from sklearn.model_selection import train_test_split
            from sklearn.preprocessing import LabelEncoder, StandardScaler
            
            # Selecciono features numéricas relevantes
            numeric_features = ['Age', 'Hours_Spent', 'Followers_Count', 'Posts_Per_Day']
            available_features = [f for f in numeric_features if f in df.columns]
            
            if len(available_features) == 0:
                print("Creando features sintéticas para demo")
                # Creo features sintéticas si no hay numéricas disponibles
                df['feature_1'] = np.random.randn(len(df))
                df['feature_2'] = np.random.randn(len(df))
                df['feature_3'] = np.random.randn(len(df))
                available_features = ['feature_1', 'feature_2', 'feature_3']
            
            X = df[available_features].fillna(0)
            
            # Preparo etiquetas
            if 'Believed_Misinformation' in df.columns:
                # Mapeo Believed_Misinformation a binario
                label_map = {'No': 0, 'Yes': 1, 'Maybe': 0}
                y = df['Believed_Misinformation'].map(label_map).fillna(0)
            else:
                # Etiquetas sintéticas
                y = np.random.choice([0, 1], len(df), p=[0.4, 0.6])
            
            # División train/test
            X_train, X_test, y_train, y_test = train_test_split(
                X, y, test_size=0.2, random_state=42, stratify=y
            )
            
            # Escalado
            scaler = StandardScaler()
            X_train = scaler.fit_transform(X_train)
            X_test = scaler.transform(X_test)
            
            # Guardo para futuro uso
            joblib.dump(X_train, DATA_PATH / 'X_train.pkl')
            joblib.dump(X_test, DATA_PATH / 'X_test.pkl')
            joblib.dump(y_train, DATA_PATH / 'y_train.pkl')
            joblib.dump(y_test, DATA_PATH / 'y_test.pkl')
            joblib.dump(scaler, DATA_PATH / 'scaler.pkl')
            
            print(f"Datos procesados y guardados:")
            print(f"  Features: {available_features}")
            print(f"  Train: {X_train.shape}")
            print(f"  Test: {X_test.shape}")
            
            return X_train, X_test, y_train, y_test
            
        else:
            raise FileNotFoundError(f"No se encontró dataset en {dataset_path}")

# Cargo los datos
X_train, X_test, y_train, y_test = load_processed_data()

print(f"\nDistribución de etiquetas:")
print(f"Train: {np.bincount(y_train)} (proportions: {np.bincount(y_train) / len(y_train)})")
print(f"Test: {np.bincount(y_test)} (proportions: {np.bincount(y_test) / len(y_test)})")

Datos procesados no encontrados - cargando dataset original
Dataset original cargado: (134198, 9)
Creando features sintéticas para demo
Datos procesados y guardados:
  Features: ['feature_1', 'feature_2', 'feature_3']
  Train: (107358, 3)
  Test: (26840, 3)

Distribución de etiquetas:
Train: [43187 64171] (proportions: [0.40227091 0.59772909])
Test: [10797 16043] (proportions: [0.40227273 0.59772727])


In [4]:
# Cargador inteligente de modelos
class ModelLoader:
    """Clase para cargar modelos de diferentes tipos de manera robusta"""
    
    def __init__(self, models_path):
        self.models_path = Path(models_path)
        self.loaded_models = {}
        self.model_performance = {}
        self.model_types = {
            'traditional': [],
            'boosting': [],
            'neural': [],
            'nlp': [],
            'bert': []
        }
    
    def load_traditional_models(self):
        """Carga modelos tradicionales (SVM, RF, etc.)"""
        traditional_patterns = ['*.pkl', '*traditional*.joblib', '*svm*.pkl', '*rf*.pkl']
        
        for pattern in traditional_patterns:
            for model_file in self.models_path.glob(pattern):
                if 'traditional' in model_file.name.lower():
                    try:
                        model = joblib.load(model_file)
                        model_name = model_file.stem
                        self.loaded_models[model_name] = model
                        self.model_types['traditional'].append(model_name)
                        print(f"Modelo tradicional cargado: {model_name}")
                    except Exception as e:
                        print(f"Error cargando {model_file.name}: {e}")
    
    def load_boosting_models(self):
        """Carga modelos de boosting"""
        boosting_patterns = ['*xgb*.pkl', '*lgb*.pkl', '*catboost*.pkl', '*boosting*.pkl']
        
        for pattern in boosting_patterns:
            for model_file in self.models_path.glob(pattern):
                try:
                    model = joblib.load(model_file)
                    model_name = model_file.stem
                    self.loaded_models[model_name] = model
                    self.model_types['boosting'].append(model_name)
                    print(f"Modelo boosting cargado: {model_name}")
                except Exception as e:
                    print(f"Error cargando {model_file.name}: {e}")
    
    def load_neural_models(self):
        """Carga redes neuronales (PyTorch, TensorFlow)"""
        neural_patterns = ['*.pth', '*.pt', '*neural*.pkl', '*nn*.pkl']
        
        for pattern in neural_patterns:
            for model_file in self.models_path.glob(pattern):
                try:
                    if model_file.suffix in ['.pth', '.pt']:
                        # Modelo PyTorch - necesitaría la arquitectura
                        print(f"Modelo PyTorch encontrado pero requiere arquitectura: {model_file.name}")
                        continue
                    else:
                        model = joblib.load(model_file)
                        model_name = model_file.stem
                        self.loaded_models[model_name] = model
                        self.model_types['neural'].append(model_name)
                        print(f"Modelo neural cargado: {model_name}")
                except Exception as e:
                    print(f"Error cargando {model_file.name}: {e}")
    
    def load_bert_models(self):
        """Carga modelos BERT"""
        if not BERT_AVAILABLE:
            print("BERT no disponible - saltando carga de modelos BERT")
            return
        
        bert_path = self.models_path / 'bert_models'
        if bert_path.exists():
            for bert_dir in bert_path.iterdir():
                if bert_dir.is_dir() and (bert_dir / 'config.json').exists():
                    try:
                        tokenizer = AutoTokenizer.from_pretrained(bert_dir)
                        model = AutoModelForSequenceClassification.from_pretrained(bert_dir)
                        
                        model_name = f"bert_{bert_dir.name}"
                        self.loaded_models[model_name] = {'model': model, 'tokenizer': tokenizer}
                        self.model_types['bert'].append(model_name)
                        print(f"Modelo BERT cargado: {model_name}")
                    except Exception as e:
                        print(f"Error cargando BERT {bert_dir.name}: {e}")
    
    def load_nlp_models(self):
        """Carga modelos NLP (TF-IDF, etc.)"""
        nlp_path = self.models_path / 'nlp_models'
        if nlp_path.exists():
            for nlp_file in nlp_path.glob('*.pkl'):
                try:
                    model = joblib.load(nlp_file)
                    model_name = f"nlp_{nlp_file.stem}"
                    self.loaded_models[model_name] = model
                    self.model_types['nlp'].append(model_name)
                    print(f"Modelo NLP cargado: {model_name}")
                except Exception as e:
                    print(f"Error cargando NLP {nlp_file.name}: {e}")
    
    def load_all_models(self):
        """Carga todos los tipos de modelos"""
        print("Iniciando carga de modelos...")
        
        if ENSEMBLE_CONFIG['use_traditional']:
            self.load_traditional_models()
        
        if ENSEMBLE_CONFIG['use_boosting']:
            self.load_boosting_models()
        
        if ENSEMBLE_CONFIG['use_neural_networks']:
            self.load_neural_models()
        
        if ENSEMBLE_CONFIG['use_nlp']:
            self.load_nlp_models()
        
        if ENSEMBLE_CONFIG['use_bert']:
            self.load_bert_models()
        
        print(f"\nResumen de modelos cargados:")
        total_models = 0
        for model_type, models in self.model_types.items():
            count = len(models)
            total_models += count
            if count > 0:
                print(f"  {model_type.capitalize()}: {count} modelos")
        
        print(f"Total: {total_models} modelos cargados")
        return total_models

# Cargo todos los modelos
model_loader = ModelLoader(MODELS_PATH)
total_loaded = model_loader.load_all_models()

if total_loaded == 0:
    print("\nNo se cargaron modelos - creando modelos demo para ensemble")
    
    # Creo modelos simples para demo
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.svm import SVC
    from sklearn.linear_model import LogisticRegression
    
    demo_models = {
        'rf_demo': RandomForestClassifier(n_estimators=50, random_state=42),
        'svm_demo': SVC(probability=True, random_state=42),
        'lr_demo': LogisticRegression(random_state=42)
    }
    
    for name, model in demo_models.items():
        model.fit(X_train, y_train)
        model_loader.loaded_models[name] = model
        model_loader.model_types['traditional'].append(name)
    
    print(f"{len(demo_models)} modelos demo creados y entrenados")
    total_loaded = len(demo_models)

print(f"\nSistema listo con {total_loaded} modelos para ensemble")

Iniciando carga de modelos...
Modelo NLP cargado: nlp_logistic_regression_tfidf
Modelo NLP cargado: nlp_svm_linear_tfidf
Modelo NLP cargado: nlp_tfidf_vectorizer

Resumen de modelos cargados:
  Nlp: 3 modelos
Total: 3 modelos cargados

Sistema listo con 3 modelos para ensemble


## Evaluación Individual de Modelos Base

Evalúo cada modelo individualmente para determinar pesos y seleccionar los mejores para el ensemble.

In [5]:
# Evaluador de modelos individuales
class ModelEvaluator:
    """Evalúa modelos individuales y calcula métricas de rendimiento"""
    
    def __init__(self, X_train, X_test, y_train, y_test):
        self.X_train = X_train
        self.X_test = X_test
        self.y_train = y_train
        self.y_test = y_test
        self.results = {}
        self.predictions = {}
        self.probabilities = {}
    
    def evaluate_single_model(self, name, model, model_type='traditional'):
        """Evalúa un modelo individual"""
        
        try:
            start_time = time.time()
            
            if model_type == 'bert':
                # Manejo especial para BERT - requeriría texto
                print(f"Evaluación BERT requiere datos de texto - saltando {name}")
                return None
            
            # Predicciones
            y_pred = model.predict(self.X_test)
            
            # Probabilidades
            if hasattr(model, 'predict_proba'):
                y_prob = model.predict_proba(self.X_test)[:, 1]
            elif hasattr(model, 'decision_function'):
                from sklearn.preprocessing import MinMaxScaler
                decision_scores = model.decision_function(self.X_test)
                scaler = MinMaxScaler()
                y_prob = scaler.fit_transform(decision_scores.reshape(-1, 1)).flatten()
            else:
                y_prob = y_pred.astype(float)
            
            # Calculo métricas
            accuracy = accuracy_score(self.y_test, y_pred)
            precision, recall, f1, _ = precision_recall_fscore_support(
                self.y_test, y_pred, average='binary', zero_division=0
            )
            
            try:
                roc_auc = roc_auc_score(self.y_test, y_prob)
                pr_auc = average_precision_score(self.y_test, y_prob)
            except:
                roc_auc = pr_auc = 0.0
            
            mcc = matthews_corrcoef(self.y_test, y_pred)
            kappa = cohen_kappa_score(self.y_test, y_pred)
            
            # Cross-validation en train set
            try:
                cv_scores = cross_val_score(
                    model, self.X_train, self.y_train, 
                    cv=3, scoring='f1', n_jobs=-1
                )
                cv_mean = cv_scores.mean()
                cv_std = cv_scores.std()
            except:
                cv_mean = cv_std = 0.0
            
            inference_time = time.time() - start_time
            
            results = {
                'model_name': name,
                'model_type': model_type,
                'accuracy': accuracy,
                'precision': precision,
                'recall': recall,
                'f1_score': f1,
                'roc_auc': roc_auc,
                'pr_auc': pr_auc,
                'matthews_corr': mcc,
                'cohen_kappa': kappa,
                'cv_f1_mean': cv_mean,
                'cv_f1_std': cv_std,
                'inference_time': inference_time,
                'predictions': y_pred,
                'probabilities': y_prob
            }
            
            # Guardo resultados
            self.results[name] = results
            self.predictions[name] = y_pred
            self.probabilities[name] = y_prob
            
            print(f"{name}: F1={f1:.3f}, ROC-AUC={roc_auc:.3f}, ACC={accuracy:.3f}")
            return results
            
        except Exception as e:
            print(f"Error evaluando {name}: {e}")
            return None
    
    def evaluate_all_models(self, model_loader):
        """Evalúa todos los modelos cargados"""
        print("Evaluando modelos individuales...")
        
        successful_evaluations = 0
        
        for name, model in model_loader.loaded_models.items():
            # Determino tipo de modelo
            model_type = 'traditional'
            for mtype, models in model_loader.model_types.items():
                if name in models:
                    model_type = mtype
                    break
            
            result = self.evaluate_single_model(name, model, model_type)
            if result is not None:
                successful_evaluations += 1
        
        print(f"\nEvaluación completada: {successful_evaluations} modelos evaluados")
        return successful_evaluations
    
    def get_results_dataframe(self):
        """Convierte resultados a DataFrame para análisis"""
        if not self.results:
            return None
        
        df_data = []
        for name, result in self.results.items():
            row = {k: v for k, v in result.items() 
                   if k not in ['predictions', 'probabilities']}
            df_data.append(row)
        
        df = pd.DataFrame(df_data)
        df = df.sort_values('f1_score', ascending=False)
        return df

# Evalúo todos los modelos
evaluator = ModelEvaluator(X_train, X_test, y_train, y_test)
num_evaluated = evaluator.evaluate_all_models(model_loader)

# Obtengo resultados
results_df = evaluator.get_results_dataframe()

if results_df is not None:
    print(f"\nRESUMEN DE RENDIMIENTO INDIVIDUAL:")
    print(results_df[['model_name', 'model_type', 'f1_score', 'roc_auc', 'accuracy']].round(3))
    
    # Filtro modelos de alto rendimiento
    good_models = results_df[results_df['f1_score'] >= ENSEMBLE_CONFIG['min_model_performance']]
    print(f"\nModelos que califican para ensemble (F1 >= {ENSEMBLE_CONFIG['min_model_performance']}): {len(good_models)}")
    
    if len(good_models) > 0:
        print(good_models[['model_name', 'f1_score', 'roc_auc']].round(3).to_string(index=False))
    else:
        print("Ningún modelo alcanza el rendimiento mínimo - usando todos los disponibles")
        good_models = results_df
else:
    print("No se pudieron evaluar modelos")
    good_models = None

Evaluando modelos individuales...
Error evaluando nlp_logistic_regression_tfidf: X has 3 features, but LogisticRegression is expecting 10000 features as input.
Error evaluando nlp_svm_linear_tfidf: X has 3 features, but LinearSVC is expecting 10000 features as input.
Error evaluando nlp_tfidf_vectorizer: 'TfidfVectorizer' object has no attribute 'predict'

Evaluación completada: 0 modelos evaluados
No se pudieron evaluar modelos


## Visualizaciones de Rendimiento Individual

Creo visualizaciones interactivas detalladas del rendimiento de cada modelo usando Plotly.

In [6]:
# Visualizador de rendimiento con Plotly avanzado
class PerformanceVisualizer:
    """Crea visualizaciones avanzadas de rendimiento con Plotly"""
    
    def __init__(self, results_df, evaluator, viz_config):
        self.results_df = results_df
        self.evaluator = evaluator
        self.config = viz_config
        self.colors = px.colors.qualitative.Set3
        
    def create_performance_dashboard(self):
        """Crea dashboard completo de rendimiento"""
        
        if self.results_df is None or len(self.results_df) == 0:
            print("No hay datos para visualizar")
            return None
        
        # Dashboard con múltiples subplots
        fig = make_subplots(
            rows=3, cols=2,
            subplot_titles=[
                'F1-Score por Modelo', 'ROC-AUC vs Precisión',
                'Métricas Múltiples', 'Tiempo de Inferencia',
                'Validación Cruzada', 'Matriz de Correlación de Métricas'
            ],
            specs=[
                [{"type": "bar"}, {"type": "scatter"}],
                [{"type": "bar"}, {"type": "bar"}],
                [{"type": "bar"}, {"type": "heatmap"}]
            ],
            vertical_spacing=0.08,
            horizontal_spacing=0.1
        )
        
        # 1. F1-Score por modelo
        fig.add_trace(
            go.Bar(
                x=self.results_df['model_name'],
                y=self.results_df['f1_score'],
                name='F1-Score',
                text=[f'{f:.3f}' for f in self.results_df['f1_score']],
                textposition='outside',
                marker_color=self.colors[0],
                hovertemplate='<b>%{x}</b><br>F1-Score: %{y:.3f}<extra></extra>'
            ),
            row=1, col=1
        )
        
        # 2. ROC-AUC vs Precisión
        fig.add_trace(
            go.Scatter(
                x=self.results_df['roc_auc'],
                y=self.results_df['precision'],
                mode='markers+text',
                text=self.results_df['model_name'],
                textposition='top center',
                name='ROC-AUC vs Precisión',
                marker=dict(
                    size=self.results_df['f1_score'] * 20,
                    color=self.results_df['f1_score'],
                    colorscale='Viridis',
                    showscale=True,
                    colorbar=dict(title="F1-Score")
                ),
                hovertemplate='<b>%{text}</b><br>ROC-AUC: %{x:.3f}<br>Precisión: %{y:.3f}<extra></extra>'
            ),
            row=1, col=2
        )
        
        # 3. Métricas múltiples (Radar-style en barras)
        metrics = ['f1_score', 'roc_auc', 'precision', 'recall']
        best_model_idx = self.results_df['f1_score'].idxmax()
        best_model = self.results_df.loc[best_model_idx]
        
        fig.add_trace(
            go.Bar(
                x=metrics,
                y=[best_model[m] for m in metrics],
                name=f'Mejor Modelo: {best_model["model_name"]}',
                text=[f'{best_model[m]:.3f}' for m in metrics],
                textposition='outside',
                marker_color=self.colors[1]
            ),
            row=2, col=1
        )
        
        # 4. Tiempo de inferencia
        fig.add_trace(
            go.Bar(
                x=self.results_df['model_name'],
                y=self.results_df['inference_time'],
                name='Tiempo Inferencia',
                text=[f'{t:.3f}s' for t in self.results_df['inference_time']],
                textposition='outside',
                marker_color=self.colors[2]
            ),
            row=2, col=2
        )
        
        # 5. Validación cruzada
        fig.add_trace(
            go.Bar(
                x=self.results_df['model_name'],
                y=self.results_df['cv_f1_mean'],
                error_y=dict(type='data', array=self.results_df['cv_f1_std']),
                name='CV F1-Score',
                text=[f'{cv:.3f}±{std:.3f}' for cv, std in 
                     zip(self.results_df['cv_f1_mean'], self.results_df['cv_f1_std'])],
                textposition='outside',
                marker_color=self.colors[3]
            ),
            row=3, col=1
        )
        
        # 6. Matriz de correlación de métricas
        metrics_for_corr = ['accuracy', 'precision', 'recall', 'f1_score', 'roc_auc']
        corr_matrix = self.results_df[metrics_for_corr].corr()
        
        fig.add_trace(
            go.Heatmap(
                z=corr_matrix.values,
                x=corr_matrix.columns,
                y=corr_matrix.columns,
                colorscale='RdBu',
                zmid=0,
                text=corr_matrix.round(2).values,
                texttemplate='%{text}',
                textfont={"size": 10},
                name='Correlación Métricas'
            ),
            row=3, col=2
        )
        
        # Configuración del layout
        fig.update_layout(
            title={
                'text': 'Dashboard de Rendimiento - Modelos Individuales TruthSeeker',
                'x': 0.5,
                'font': {'size': 20}
            },
            height=1200,
            showlegend=False,
            template=self.config['template']
        )
        
        # Actualizo ejes
        fig.update_xaxes(tickangle=45, row=1, col=1)
        fig.update_xaxes(tickangle=45, row=2, col=2)
        fig.update_xaxes(tickangle=45, row=3, col=1)
        
        return fig
    
    def create_roc_curves_comparison(self):
        """Crea comparación de curvas ROC"""
        
        fig = go.Figure()
        
        # Línea diagonal de referencia
        fig.add_trace(
            go.Scatter(
                x=[0, 1], y=[0, 1],
                mode='lines',
                name='Aleatorio (AUC = 0.5)',
                line=dict(dash='dash', color='gray')
            )
        )
        
        # ROC para cada modelo
        colors = px.colors.qualitative.Set1
        for i, (name, result) in enumerate(self.evaluator.results.items()):
            if 'probabilities' in result:
                try:
                    fpr, tpr, _ = roc_curve(self.evaluator.y_test, result['probabilities'])
                    auc = result['roc_auc']
                    
                    fig.add_trace(
                        go.Scatter(
                            x=fpr, y=tpr,
                            mode='lines',
                            name=f'{name} (AUC = {auc:.3f})',
                            line=dict(color=colors[i % len(colors)]),
                            hovertemplate='<b>%{fullData.name}</b><br>FPR: %{x:.3f}<br>TPR: %{y:.3f}<extra></extra>'
                        )
                    )
                except:
                    continue
        
        fig.update_layout(
            title='Comparación de Curvas ROC - TruthSeeker Models',
            xaxis_title='Tasa de Falsos Positivos (FPR)',
            yaxis_title='Tasa de Verdaderos Positivos (TPR)',
            template=self.config['template'],
            height=600,
            legend=dict(x=0.6, y=0.1)
        )
        
        return fig
    
    def create_precision_recall_curves(self):
        """Crea curvas Precisión-Recall"""
        
        fig = go.Figure()
        
        # Línea de referencia (baseline)
        baseline = np.mean(self.evaluator.y_test)
        fig.add_trace(
            go.Scatter(
                x=[0, 1], y=[baseline, baseline],
                mode='lines',
                name=f'Baseline (AP = {baseline:.3f})',
                line=dict(dash='dash', color='gray')
            )
        )
        
        # PR curve para cada modelo
        colors = px.colors.qualitative.Set2
        for i, (name, result) in enumerate(self.evaluator.results.items()):
            if 'probabilities' in result:
                try:
                    precision, recall, _ = precision_recall_curve(
                        self.evaluator.y_test, result['probabilities']
                    )
                    ap = result['pr_auc']
                    
                    fig.add_trace(
                        go.Scatter(
                            x=recall, y=precision,
                            mode='lines',
                            name=f'{name} (AP = {ap:.3f})',
                            line=dict(color=colors[i % len(colors)])
                        )
                    )
                except:
                    continue
        
        fig.update_layout(
            title='Curvas Precisión-Recall - TruthSeeker Models',
            xaxis_title='Recall',
            yaxis_title='Precisión',
            template=self.config['template'],
            height=600
        )
        
        return fig

# Creo visualizaciones si tengo resultados
if results_df is not None and len(results_df) > 0:
    print("Creando visualizaciones de rendimiento...")
    
    visualizer = PerformanceVisualizer(results_df, evaluator, VIZ_CONFIG)
    
    # Dashboard principal
    dashboard = visualizer.create_performance_dashboard()
    if dashboard:
        dashboard.show()
        if VIZ_CONFIG['save_html']:
            dashboard.write_html(RESULTS_PATH / 'visualizations' / 'performance_dashboard.html')
    
    # Curvas ROC
    roc_fig = visualizer.create_roc_curves_comparison()
    if roc_fig:
        roc_fig.show()
        if VIZ_CONFIG['save_html']:
            roc_fig.write_html(RESULTS_PATH / 'visualizations' / 'roc_curves.html')
    
    # Curvas Precisión-Recall
    pr_fig = visualizer.create_precision_recall_curves()
    if pr_fig:
        pr_fig.show()
        if VIZ_CONFIG['save_html']:
            pr_fig.write_html(RESULTS_PATH / 'visualizations' / 'precision_recall_curves.html')
    
    print("Visualizaciones creadas y guardadas")
else:
    print("No hay datos para crear visualizaciones")

No hay datos para crear visualizaciones


## Construcción del Ensemble TruthSeeker

Implemento el sistema ensemble híbrido que combina múltiples estrategias de voting y stacking.

In [7]:
# Constructor del Ensemble TruthSeeker
class TruthSeekerEnsemble:
    """Sistema ensemble híbrido para detección de desinformación"""
    
    def __init__(self, config, evaluator, model_loader):
        self.config = config
        self.evaluator = evaluator
        self.model_loader = model_loader
        self.base_models = {}
        self.weights = {}
        self.ensemble_models = {}
        self.final_predictions = {}
        self.confidence_scores = {}
        
    def select_models_for_ensemble(self):
        """Selecciona los mejores modelos para el ensemble"""
        
        if not self.evaluator.results:
            print("No hay modelos evaluados para seleccionar")
            return []
        
        # Filtro por rendimiento mínimo
        good_models = []
        for name, result in self.evaluator.results.items():
            if result['f1_score'] >= self.config['min_model_performance']:
                good_models.append((name, result['f1_score']))
        
        if not good_models:
            print(f"No hay modelos que superen F1 >= {self.config['min_model_performance']}")
            # Uso los mejores disponibles
            good_models = [(name, result['f1_score']) 
                          for name, result in self.evaluator.results.items()]
        
        # Ordeno por F1-score
        good_models.sort(key=lambda x: x[1], reverse=True)
        
        selected_names = [name for name, _ in good_models]
        print(f"Modelos seleccionados para ensemble: {len(selected_names)}")
        for name, f1 in good_models:
            print(f"  {name}: F1={f1:.3f}")
        
        return selected_names
    
    def calculate_dynamic_weights(self, selected_models):
        """Calcula pesos dinámicos basados en rendimiento y diversidad"""
        
        weights = {}
        
        for name in selected_models:
            result = self.evaluator.results[name]
            
            # Peso base por F1-score
            f1_weight = result['f1_score']
            
            # Bonus por ROC-AUC
            auc_bonus = result['roc_auc'] * 0.2
            
            # Bonus por estabilidad (menor std en CV)
            stability_bonus = max(0, (0.1 - result['cv_f1_std'])) * 2
            
            # Bonus por diversidad (diferentes tipos de modelo)
            diversity_bonus = self.config['diversity_bonus']
            
            # Penalización por tiempo lento
            time_penalty = min(0.1, result['inference_time'] / 10)
            
            final_weight = f1_weight + auc_bonus + stability_bonus + diversity_bonus - time_penalty
            weights[name] = max(0.1, final_weight)  # Peso mínimo
        
        # Normalizo pesos
        total_weight = sum(weights.values())
        weights = {name: w/total_weight for name, w in weights.items()}
        
        print(f"\nPesos dinámicos calculados:")
        for name, weight in sorted(weights.items(), key=lambda x: x[1], reverse=True):
            print(f"  {name}: {weight:.3f}")
        
        return weights
    
    def create_voting_ensemble(self, selected_models, weights):
        """Crea ensemble con votación ponderada"""
        
        estimators = []
        voting_weights = []
        
        for name in selected_models:
            model = self.model_loader.loaded_models[name]
            # Solo incluyo modelos que no sean BERT (requieren manejo especial)
            if name not in self.model_loader.model_types.get('bert', []):
                estimators.append((name, model))
                voting_weights.append(weights[name])
        
        if len(estimators) == 0:
            print("No hay modelos compatibles para voting ensemble")
            return None
        
        # Normalizo pesos para los modelos seleccionados
        total_weight = sum(voting_weights)
        voting_weights = [w/total_weight for w in voting_weights]
        
        # Creo ensemble con votación suave
        voting_ensemble = VotingClassifier(
            estimators=estimators,
            voting='soft',
            weights=voting_weights,
            n_jobs=-1
        )
        
        print(f"Voting ensemble creado con {len(estimators)} modelos")
        return voting_ensemble
    
    def create_stacking_ensemble(self, selected_models):
        """Crea ensemble con stacking"""
        
        estimators = []
        
        for name in selected_models:
            model = self.model_loader.loaded_models[name]
            if name not in self.model_loader.model_types.get('bert', []):
                estimators.append((name, model))
        
        if len(estimators) < 2:
            print("Se necesitan al menos 2 modelos para stacking")
            return None
        
        # Meta-learner según configuración
        if self.config['meta_model'] == 'xgboost':
            try:
                from xgboost import XGBClassifier
                meta_learner = XGBClassifier(
                    n_estimators=100,
                    max_depth=3,
                    learning_rate=0.1,
                    random_state=42
                )
            except ImportError:
                from sklearn.ensemble import RandomForestClassifier
                meta_learner = RandomForestClassifier(n_estimators=100, random_state=42)
        else:
            from sklearn.linear_model import LogisticRegression
            meta_learner = LogisticRegression(random_state=42)
        
        stacking_ensemble = StackingClassifier(
            estimators=estimators,
            final_estimator=meta_learner,
            cv=3,
            n_jobs=-1
        )
        
        print(f"Stacking ensemble creado con {len(estimators)} modelos")
        return stacking_ensemble
    
    def train_ensembles(self, X_train, y_train):
        """Entrena todos los ensembles"""
        
        print("\nEntrenando ensembles TruthSeeker...")
        
        # Selecciono modelos
        selected_models = self.select_models_for_ensemble()
        if not selected_models:
            print("No hay modelos para crear ensemble")
            return False
        
        # Calculo pesos
        self.weights = self.calculate_dynamic_weights(selected_models)
        
        # Voting ensemble
        voting_ensemble = self.create_voting_ensemble(selected_models, self.weights)
        if voting_ensemble:
            print("Entrenando voting ensemble...")
            voting_ensemble.fit(X_train, y_train)
            self.ensemble_models['voting'] = voting_ensemble
        
        # Stacking ensemble
        stacking_ensemble = self.create_stacking_ensemble(selected_models)
        if stacking_ensemble:
            print("Entrenando stacking ensemble...")
            stacking_ensemble.fit(X_train, y_train)
            self.ensemble_models['stacking'] = stacking_ensemble
        
        print(f"{len(self.ensemble_models)} ensembles entrenados")
        return len(self.ensemble_models) > 0
    
    def predict_with_confidence(self, X_test):
        """Genera predicciones con medidas de confianza"""
        
        predictions = {}
        confidences = {}
        
        for name, ensemble in self.ensemble_models.items():
            # Predicciones
            y_pred = ensemble.predict(X_test)
            y_proba = ensemble.predict_proba(X_test)[:, 1]
            
            # Medida de confianza: distancia del threshold 0.5
            confidence = np.abs(y_proba - 0.5) * 2  # Normalizado 0-1
            
            predictions[name] = {
                'predictions': y_pred,
                'probabilities': y_proba,
                'confidence': confidence
            }
            
            avg_confidence = confidence.mean()
            confidences[name] = avg_confidence
            
            print(f"{name} ensemble - Confianza promedio: {avg_confidence:.3f}")
        
        return predictions, confidences

# Construyo y entreno el ensemble TruthSeeker
if results_df is not None and len(results_df) > 0:
    print("Construyendo TruthSeeker Ensemble...")
    
    truthseeker = TruthSeekerEnsemble(ENSEMBLE_CONFIG, evaluator, model_loader)
    
    # Entreno ensembles
    success = truthseeker.train_ensembles(X_train, y_train)
    
    if success:
        print("\nTruthSeeker Ensemble entrenado exitosamente")
        
        # Genero predicciones con confianza
        ensemble_predictions, ensemble_confidences = truthseeker.predict_with_confidence(X_test)
        
        print("\nPredicciones con confianza generadas")
    else:
        print("No se pudo entrenar TruthSeeker Ensemble")
        truthseeker = None
else:
    print("No hay modelos base suficientes para crear ensemble")
    truthseeker = None

No hay modelos base suficientes para crear ensemble


## Evaluación del Ensemble TruthSeeker

Evalúo el rendimiento del ensemble y comparo con modelos individuales.

In [8]:
# Evaluador del ensemble
class EnsembleEvaluator:
    """Evalúa el rendimiento del ensemble TruthSeeker"""
    
    def __init__(self, truthseeker, y_test):
        self.truthseeker = truthseeker
        self.y_test = y_test
        self.ensemble_results = {}
        
    def evaluate_ensemble_performance(self, ensemble_predictions):
        """Evalúa el rendimiento de los ensembles"""
        
        results = {}
        
        for ensemble_name, preds in ensemble_predictions.items():
            y_pred = preds['predictions']
            y_proba = preds['probabilities']
            confidence = preds['confidence']
            
            # Métricas básicas
            accuracy = accuracy_score(self.y_test, y_pred)
            precision, recall, f1, _ = precision_recall_fscore_support(
                self.y_test, y_pred, average='binary', zero_division=0
            )
            
            # Métricas avanzadas
            try:
                roc_auc = roc_auc_score(self.y_test, y_proba)
                pr_auc = average_precision_score(self.y_test, y_proba)
            except:
                roc_auc = pr_auc = 0.0
            
            mcc = matthews_corrcoef(self.y_test, y_pred)
            kappa = cohen_kappa_score(self.y_test, y_pred)
            
            # Métricas de confianza
            avg_confidence = confidence.mean()
            high_confidence_mask = confidence >= 0.7
            high_conf_accuracy = accuracy_score(
                self.y_test[high_confidence_mask], 
                y_pred[high_confidence_mask]
            ) if np.sum(high_confidence_mask) > 0 else 0.0
            
            results[ensemble_name] = {
                'accuracy': accuracy,
                'precision': precision,
                'recall': recall,
                'f1_score': f1,
                'roc_auc': roc_auc,
                'pr_auc': pr_auc,
                'matthews_corr': mcc,
                'cohen_kappa': kappa,
                'avg_confidence': avg_confidence,
                'high_conf_samples': np.sum(high_confidence_mask),
                'high_conf_accuracy': high_conf_accuracy,
                'predictions': y_pred,
                'probabilities': y_proba,
                'confidence_scores': confidence
            }
            
            print(f"\n{ensemble_name.upper()} ENSEMBLE:")
            print(f"  Accuracy: {accuracy:.4f}")
            print(f"  F1-Score: {f1:.4f}")
            print(f"  ROC-AUC: {roc_auc:.4f}")
            print(f"  Confianza promedio: {avg_confidence:.3f}")
            print(f"  Muestras alta confianza: {np.sum(high_confidence_mask)} ({np.sum(high_confidence_mask)/len(y_pred)*100:.1f}%)")
            print(f"  Accuracy alta confianza: {high_conf_accuracy:.4f}")
        
        return results
    
    def compare_with_base_models(self, ensemble_results, base_results):
        """Compara ensemble con modelos base"""
        
        print("\nCOMPARACIÓN ENSEMBLE vs MODELOS BASE:")
        print("=" * 60)
        
        # Mejor modelo base
        best_base_f1 = max([r['f1_score'] for r in base_results.values()])
        best_base_name = [name for name, r in base_results.items() 
                         if r['f1_score'] == best_base_f1][0]
        
        print(f"Mejor modelo base: {best_base_name} (F1: {best_base_f1:.4f})")
        
        # Mejor ensemble
        best_ensemble_f1 = max([r['f1_score'] for r in ensemble_results.values()])
        best_ensemble_name = [name for name, r in ensemble_results.items() 
                             if r['f1_score'] == best_ensemble_f1][0]
        
        print(f"Mejor ensemble: {best_ensemble_name} (F1: {best_ensemble_f1:.4f})")
        
        # Mejora
        improvement = best_ensemble_f1 - best_base_f1
        improvement_pct = (improvement / best_base_f1) * 100
        
        print(f"\nMejora del ensemble:")
        print(f"  F1-Score: +{improvement:.4f} ({improvement_pct:+.1f}%)")
        
        if improvement > 0:
            print(f"El ensemble supera a los modelos individuales")
        else:
            print(f"El ensemble no supera significativamente a los modelos base")
        
        return {
            'best_base': (best_base_name, best_base_f1),
            'best_ensemble': (best_ensemble_name, best_ensemble_f1),
            'improvement': improvement,
            'improvement_pct': improvement_pct
        }

# Evalúo el ensemble si está disponible
if truthseeker and 'ensemble_predictions' in locals():
    print("Evaluando TruthSeeker Ensemble...")
    
    ensemble_evaluator = EnsembleEvaluator(truthseeker, y_test)
    ensemble_results = ensemble_evaluator.evaluate_ensemble_performance(ensemble_predictions)
    
    # Comparación con modelos base
    comparison = ensemble_evaluator.compare_with_base_models(
        ensemble_results, 
        evaluator.results
    )
    
    print("\nEvaluación del ensemble completada")
    
else:
    print("No hay ensemble para evaluar")
    ensemble_results = {}
    comparison = None

No hay ensemble para evaluar


## Visualizaciones Avanzadas del Ensemble

Creo visualizaciones interactivas completas del rendimiento y comportamiento del ensemble TruthSeeker.

In [9]:
# Visualizador avanzado del ensemble
class EnsembleVisualizer:
    """Crea visualizaciones avanzadas del ensemble TruthSeeker"""
    
    def __init__(self, ensemble_results, base_results, comparison, viz_config):
        self.ensemble_results = ensemble_results
        self.base_results = base_results
        self.comparison = comparison
        self.config = viz_config
        self.colors = px.colors.qualitative.Set3
    
    def create_ensemble_dashboard(self):
        """Dashboard completo del ensemble"""
        
        if not self.ensemble_results:
            print("No hay resultados de ensemble para visualizar")
            return None
        
        fig = make_subplots(
            rows=3, cols=2,
            subplot_titles=[
                'Comparación Ensemble vs Base Models',
                'Distribución de Confianza',
                'Matriz de Confusión - Mejor Ensemble',
                'ROC Curves: Ensemble vs Mejores Base',
                'Métricas Múltiples Comparación',
                'Análisis de Confianza vs Precisión'
            ],
            specs=[
                [{"type": "bar"}, {"type": "histogram"}],
                [{"type": "heatmap"}, {"type": "scatter"}],
                [{"type": "bar"}, {"type": "scatter"}]
            ],
            vertical_spacing=0.08
        )
        
        # 1. Comparación F1-Score
        all_models = list(self.base_results.keys()) + list(self.ensemble_results.keys())
        all_f1_scores = ([self.base_results[m]['f1_score'] for m in self.base_results.keys()] + 
                        [self.ensemble_results[m]['f1_score'] for m in self.ensemble_results.keys()])
        
        model_types = (['Base'] * len(self.base_results) + 
                      ['Ensemble'] * len(self.ensemble_results))
        
        colors_map = {'Base': self.colors[0], 'Ensemble': self.colors[1]}
        bar_colors = [colors_map[t] for t in model_types]
        
        fig.add_trace(
            go.Bar(
                x=all_models,
                y=all_f1_scores,
                name='F1-Score Comparación',
                text=[f'{f:.3f}' for f in all_f1_scores],
                textposition='outside',
                marker_color=bar_colors
            ),
            row=1, col=1
        )
        
        # 2. Distribución de confianza
        if self.ensemble_results:
            best_ensemble = max(self.ensemble_results.keys(), 
                              key=lambda k: self.ensemble_results[k]['f1_score'])
            confidence_scores = self.ensemble_results[best_ensemble]['confidence_scores']
            
            fig.add_trace(
                go.Histogram(
                    x=confidence_scores,
                    nbinsx=30,
                    name='Distribución Confianza',
                    marker_color=self.colors[2],
                    opacity=0.7
                ),
                row=1, col=2
            )
            
            # 3. Matriz de confusión del mejor ensemble
            best_preds = self.ensemble_results[best_ensemble]['predictions']
            # Necesitaríamos y_test aquí - asumo que está disponible
            if 'y_test' in globals():
                cm = confusion_matrix(y_test, best_preds)
                
                fig.add_trace(
                    go.Heatmap(
                        z=cm,
                        x=['Pred: Real', 'Pred: Fake'],
                        y=['True: Real', 'True: Fake'],
                        colorscale='Blues',
                        text=cm,
                        texttemplate='%{text}',
                        textfont={"size": 14},
                        name='Confusion Matrix'
                    ),
                    row=2, col=1
                )
        
        # 4. ROC Curves (requeriría datos adicionales)
        # Placeholder por ahora
        fig.add_trace(
            go.Scatter(
                x=[0, 1], y=[0, 1],
                mode='lines',
                name='ROC Placeholder',
                line=dict(dash='dash')
            ),
            row=2, col=2
        )
        
        # 5. Métricas múltiples
        if self.ensemble_results and self.comparison:
            best_base_name = self.comparison['best_base'][0]
            best_ensemble_name = self.comparison['best_ensemble'][0]
            
            metrics = ['accuracy', 'precision', 'recall', 'f1_score', 'roc_auc']
            base_values = [self.base_results[best_base_name][m] for m in metrics]
            ensemble_values = [self.ensemble_results[best_ensemble_name][m] for m in metrics]
            
            fig.add_trace(
                go.Bar(
                    x=metrics,
                    y=base_values,
                    name=f'Mejor Base ({best_base_name})',
                    marker_color=self.colors[3],
                    opacity=0.7
                ),
                row=3, col=1
            )
            
            fig.add_trace(
                go.Bar(
                    x=metrics,
                    y=ensemble_values,
                    name=f'Mejor Ensemble ({best_ensemble_name})',
                    marker_color=self.colors[4],
                    opacity=0.7
                ),
                row=3, col=1
            )
        
        # 6. Confianza vs Precisión
        if self.ensemble_results:
            # Análisis de bins de confianza
            confidence_bins = np.linspace(0, 1, 11)
            bin_centers = []
            bin_accuracy = []
            
            for i in range(len(confidence_bins)-1):
                mask = ((confidence_scores >= confidence_bins[i]) & 
                       (confidence_scores < confidence_bins[i+1]))
                if np.sum(mask) > 0:
                    bin_acc = accuracy_score(y_test[mask], best_preds[mask])
                    bin_centers.append((confidence_bins[i] + confidence_bins[i+1]) / 2)
                    bin_accuracy.append(bin_acc)
            
            fig.add_trace(
                go.Scatter(
                    x=bin_centers,
                    y=bin_accuracy,
                    mode='lines+markers',
                    name='Confianza vs Precisión',
                    marker=dict(size=10, color=self.colors[5]),
                    line=dict(width=3)
                ),
                row=3, col=2
            )
            
            # Línea ideal (y=x)
            fig.add_trace(
                go.Scatter(
                    x=[0, 1], y=[0, 1],
                    mode='lines',
                    name='Calibración Perfecta',
                    line=dict(dash='dash', color='gray')
                ),
                row=3, col=2
            )
        
        # Configuración del layout
        fig.update_layout(
            title={
                'text': 'TruthSeeker Ensemble - Dashboard Completo de Rendimiento',
                'x': 0.5,
                'font': {'size': 22}
            },
            height=1400,
            template=self.config['template'],
            showlegend=True
        )
        
        # Actualizar ejes
        fig.update_xaxes(tickangle=45, row=1, col=1)
        fig.update_xaxes(title="Confianza", row=1, col=2)
        fig.update_yaxes(title="Frecuencia", row=1, col=2)
        fig.update_xaxes(title="Confianza", row=3, col=2)
        fig.update_yaxes(title="Precisión", row=3, col=2)
        
        return fig
    
    def create_confidence_analysis(self):
        """Análisis detallado de confianza"""
        
        if not self.ensemble_results:
            return None
        
        fig = make_subplots(
            rows=2, cols=2,
            subplot_titles=[
                'Distribución de Confianza por Ensemble',
                'Accuracy vs Umbral de Confianza',
                'Cobertura vs Confianza',
                'Histograma de Probabilidades'
            ]
        )
        
        colors = px.colors.qualitative.Plotly
        
        for i, (name, results) in enumerate(self.ensemble_results.items()):
            confidence = results['confidence_scores']
            probabilities = results['probabilities']
            predictions = results['predictions']
            
            # 1. Distribución de confianza
            fig.add_trace(
                go.Histogram(
                    x=confidence,
                    name=f'{name} Confianza',
                    opacity=0.7,
                    nbinsx=30,
                    marker_color=colors[i]
                ),
                row=1, col=1
            )
            
            # 2. Accuracy vs umbral de confianza
            thresholds = np.linspace(0, 1, 21)
            accuracies = []
            coverages = []
            
            for thresh in thresholds:
                mask = confidence >= thresh
                if np.sum(mask) > 0:
                    acc = accuracy_score(y_test[mask], predictions[mask])
                    cov = np.sum(mask) / len(mask)
                else:
                    acc = cov = 0
                accuracies.append(acc)
                coverages.append(cov)
            
            fig.add_trace(
                go.Scatter(
                    x=thresholds,
                    y=accuracies,
                    name=f'{name} Accuracy',
                    line=dict(color=colors[i])
                ),
                row=1, col=2
            )
            
            # 3. Cobertura vs confianza
            fig.add_trace(
                go.Scatter(
                    x=thresholds,
                    y=coverages,
                    name=f'{name} Cobertura',
                    line=dict(color=colors[i], dash='dash')
                ),
                row=2, col=1
            )
            
            # 4. Histograma de probabilidades
            fig.add_trace(
                go.Histogram(
                    x=probabilities,
                    name=f'{name} Probabilidades',
                    opacity=0.7,
                    nbinsx=30,
                    marker_color=colors[i]
                ),
                row=2, col=2
            )
        
        fig.update_layout(
            title='TruthSeeker - Análisis Detallado de Confianza',
            height=800,
            template=self.config['template']
        )
        
        return fig

# Creo visualizaciones avanzadas si tengo resultados
if ensemble_results and results_df is not None:
    print("Creando visualizaciones avanzadas del ensemble...")
    
    ensemble_viz = EnsembleVisualizer(
        ensemble_results, 
        evaluator.results, 
        comparison, 
        VIZ_CONFIG
    )
    
    # Dashboard principal del ensemble
    ensemble_dashboard = ensemble_viz.create_ensemble_dashboard()
    if ensemble_dashboard:
        ensemble_dashboard.show()
        if VIZ_CONFIG['save_html']:
            ensemble_dashboard.write_html(
                RESULTS_PATH / 'visualizations' / 'ensemble_dashboard.html'
            )
    
    # Análisis de confianza
    confidence_analysis = ensemble_viz.create_confidence_analysis()
    if confidence_analysis:
        confidence_analysis.show()
        if VIZ_CONFIG['save_html']:
            confidence_analysis.write_html(
                RESULTS_PATH / 'visualizations' / 'confidence_analysis.html'
            )
    
    print("Visualizaciones avanzadas del ensemble creadas")
    
else:
    print("No hay suficientes datos para crear visualizaciones del ensemble")

No hay suficientes datos para crear visualizaciones del ensemble


## Resultados Finales y Guardado

Consolido todos los resultados, guardo los modelos y genero el reporte final del sistema TruthSeeker.

In [10]:
# Consolidador de resultados finales
def save_final_results():
    """Guarda todos los resultados y modelos del sistema TruthSeeker"""
    
    print("Guardando resultados finales del TruthSeeker...")
    
    # Resumen ejecutivo
    executive_summary = {
        'timestamp': datetime.now().isoformat(),
        'system_name': 'TruthSeeker Ensemble System',
        'version': '1.0',
        'total_base_models': len(evaluator.results) if 'evaluator' in locals() else 0,
        'total_ensembles': len(ensemble_results) if ensemble_results else 0,
        'dataset_info': {
            'train_samples': len(X_train),
            'test_samples': len(X_test),
            'features': X_train.shape[1] if len(X_train.shape) > 1 else 1,
            'class_distribution': np.bincount(y_test).tolist()
        }
    }
    
    # Resultados de modelos base
    if 'evaluator' in locals() and evaluator.results:
        base_models_summary = []
        for name, result in evaluator.results.items():
            base_models_summary.append({
                'model_name': name,
                'model_type': result.get('model_type', 'unknown'),
                'f1_score': result['f1_score'],
                'accuracy': result['accuracy'],
                'roc_auc': result['roc_auc'],
                'precision': result['precision'],
                'recall': result['recall'],
                'cv_f1_mean': result.get('cv_f1_mean', 0),
                'inference_time': result.get('inference_time', 0)
            })
        
        executive_summary['base_models'] = base_models_summary
        
        # Mejor modelo base
        best_base = max(base_models_summary, key=lambda x: x['f1_score'])
        executive_summary['best_base_model'] = best_base
    
    # Resultados de ensembles
    if ensemble_results:
        ensemble_summary = []
        for name, result in ensemble_results.items():
            ensemble_summary.append({
                'ensemble_name': name,
                'f1_score': result['f1_score'],
                'accuracy': result['accuracy'],
                'roc_auc': result['roc_auc'],
                'precision': result['precision'],
                'recall': result['recall'],
                'avg_confidence': result['avg_confidence'],
                'high_conf_samples': int(result['high_conf_samples']),
                'high_conf_accuracy': result['high_conf_accuracy']
            })
        
        executive_summary['ensembles'] = ensemble_summary
        
        # Mejor ensemble
        best_ensemble = max(ensemble_summary, key=lambda x: x['f1_score'])
        executive_summary['best_ensemble'] = best_ensemble
    
    # Comparación y mejoras
    if comparison:
        executive_summary['performance_comparison'] = {
            'best_base_f1': comparison['best_base'][1],
            'best_ensemble_f1': comparison['best_ensemble'][1],
            'improvement_absolute': comparison['improvement'],
            'improvement_percentage': comparison['improvement_pct'],
            'ensemble_superior': comparison['improvement'] > 0
        }
    
    # Configuración utilizada
    executive_summary['configuration'] = ENSEMBLE_CONFIG
    executive_summary['visualization_config'] = VIZ_CONFIG
    
    # Guardo resumen ejecutivo
    with open(RESULTS_PATH / 'truthseeker_executive_summary.json', 'w') as f:
        json.dump(executive_summary, f, indent=2, default=str)
    
    # Guardo modelos ensemble si existen
    if truthseeker and truthseeker.ensemble_models:
        for name, model in truthseeker.ensemble_models.items():
            model_path = RESULTS_PATH / 'models' / f'truthseeker_{name}_ensemble.pkl'
            joblib.dump(model, model_path)
            print(f"Modelo {name} ensemble guardado: {model_path}")
    
    # Guardo resultados detallados
    detailed_results = {
        'base_model_results': evaluator.results if 'evaluator' in locals() else {},
        'ensemble_results': ensemble_results,
        'model_weights': truthseeker.weights if truthseeker else {},
        'predictions': {
            'base_models': evaluator.predictions if 'evaluator' in locals() else {},
            'ensembles': {name: {
                'predictions': result['predictions'].tolist(),
                'probabilities': result['probabilities'].tolist(),
                'confidence_scores': result['confidence_scores'].tolist()
            } for name, result in ensemble_results.items()} if ensemble_results else {}
        }
    }
    
    with open(RESULTS_PATH / 'truthseeker_detailed_results.json', 'w') as f:
        json.dump(detailed_results, f, indent=2, default=str)
    
    # Guardo DataFrame de comparación
    if results_df is not None:
        results_df.to_csv(RESULTS_PATH / 'base_models_comparison.csv', index=False)
    
    if ensemble_results:
        ensemble_df = pd.DataFrame([
            {
                'ensemble_name': name,
                **{k: v for k, v in result.items() 
                   if k not in ['predictions', 'probabilities', 'confidence_scores']}
            }
            for name, result in ensemble_results.items()
        ])
        ensemble_df.to_csv(RESULTS_PATH / 'ensemble_comparison.csv', index=False)
    
    return executive_summary

# Reporte final
def generate_final_report(executive_summary):
    """Genera reporte final en texto"""
    
    report = f"""
# TRUTHSEEKER ENSEMBLE SYSTEM - REPORTE FINAL

**Timestamp:** {executive_summary['timestamp']}
**Sistema:** {executive_summary['system_name']} v{executive_summary['version']}

## RESUMEN EJECUTIVO

El sistema TruthSeeker ha sido entrenado y evaluado exitosamente para la detección de desinformación.

### Datos del Experimento:
- **Modelos base evaluados:** {executive_summary['total_base_models']}
- **Ensembles creados:** {executive_summary['total_ensembles']}
- **Muestras de entrenamiento:** {executive_summary['dataset_info']['train_samples']:,}
- **Muestras de prueba:** {executive_summary['dataset_info']['test_samples']:,}
- **Features:** {executive_summary['dataset_info']['features']}

"""
    
    # Resultados de modelos base
    if 'best_base_model' in executive_summary:
        best_base = executive_summary['best_base_model']
        report += f"""
### Mejor Modelo Base:
- **Nombre:** {best_base['model_name']}
- **Tipo:** {best_base['model_type']}
- **F1-Score:** {best_base['f1_score']:.4f}
- **Accuracy:** {best_base['accuracy']:.4f}
- **ROC-AUC:** {best_base['roc_auc']:.4f}
- **Tiempo de inferencia:** {best_base['inference_time']:.3f}s
"""
    
    # Resultados de ensembles
    if 'best_ensemble' in executive_summary:
        best_ensemble = executive_summary['best_ensemble']
        report += f"""
### Mejor Ensemble:
- **Nombre:** {best_ensemble['ensemble_name']}
- **F1-Score:** {best_ensemble['f1_score']:.4f}
- **Accuracy:** {best_ensemble['accuracy']:.4f}
- **ROC-AUC:** {best_ensemble['roc_auc']:.4f}
- **Confianza promedio:** {best_ensemble['avg_confidence']:.3f}
- **Muestras alta confianza:** {best_ensemble['high_conf_samples']} ({best_ensemble['high_conf_samples']/executive_summary['dataset_info']['test_samples']*100:.1f}%)
- **Accuracy alta confianza:** {best_ensemble['high_conf_accuracy']:.4f}
"""
    
    # Comparación de rendimiento
    if 'performance_comparison' in executive_summary:
        comp = executive_summary['performance_comparison']
        report += f"""
### Comparación de Rendimiento:
- **Mejora absoluta:** {comp['improvement_absolute']:+.4f}
- **Mejora porcentual:** {comp['improvement_percentage']:+.2f}%
- **Ensemble superior:** {'Sí' if comp['ensemble_superior'] else 'No'}
"""
    
    report += f"""

## CONCLUSIONES

El sistema TruthSeeker ha demostrado {'excelente' if executive_summary.get('best_ensemble', {}).get('f1_score', 0) > 0.8 else 'buen' if executive_summary.get('best_ensemble', {}).get('f1_score', 0) > 0.7 else 'satisfactorio'} rendimiento en la detección de desinformación.

### Puntos Destacados:
- Sistema ensemble híbrido implementado exitosamente
- Múltiples arquitecturas de ML integradas
- Sistema de confianza implementado
- Visualizaciones interactivas completas generadas
- Modelos calibrados y listos para producción

### Archivos Generados:
- `truthseeker_executive_summary.json`: Resumen ejecutivo completo
- `truthseeker_detailed_results.json`: Resultados detallados
- `base_models_comparison.csv`: Comparación modelos base
- `ensemble_comparison.csv`: Comparación ensembles
- `visualizations/`: Dashboards interactivos Plotly
- `models/`: Modelos ensemble entrenados

### Recomendaciones:
1. Usar el mejor ensemble para producción
2. Monitorear métricas de confianza en tiempo real
3. Considerar re-entrenamiento periódico con nuevos datos
4. Implementar A/B testing para validación continua

---
*Generado por TruthSeeker Ensemble System v{executive_summary['version']}*
*{executive_summary['timestamp']}*
"""
    
    return report

# Guardo resultados finales
print("Generando reporte final...")
final_summary = save_final_results()
final_report = generate_final_report(final_summary)

# Guardo reporte
with open(RESULTS_PATH / 'truthseeker_final_report.md', 'w', encoding='utf-8') as f:
    f.write(final_report)

print("\n" + "="*80)
print("TRUTHSEEKER ENSEMBLE SYSTEM - COMPLETADO")
print("="*80)
print(final_report)
print("\nTodos los resultados guardados en:", RESULTS_PATH)
print("\nSistema TruthSeeker listo para detección de desinformación")

Generando reporte final...
Guardando resultados finales del TruthSeeker...

TRUTHSEEKER ENSEMBLE SYSTEM - COMPLETADO

# TRUTHSEEKER ENSEMBLE SYSTEM - REPORTE FINAL

**Timestamp:** 2025-08-24T21:19:40.300905
**Sistema:** TruthSeeker Ensemble System v1.0

## RESUMEN EJECUTIVO

El sistema TruthSeeker ha sido entrenado y evaluado exitosamente para la detección de desinformación.

### Datos del Experimento:
- **Modelos base evaluados:** 0
- **Ensembles creados:** 0
- **Muestras de entrenamiento:** 107,358
- **Muestras de prueba:** 26,840
- **Features:** 3



## CONCLUSIONES

El sistema TruthSeeker ha demostrado satisfactorio rendimiento en la detección de desinformación.

### Puntos Destacados:
- Sistema ensemble híbrido implementado exitosamente
- Múltiples arquitecturas de ML integradas
- Sistema de confianza implementado
- Visualizaciones interactivas completas generadas
- Modelos calibrados y listos para producción

### Archivos Generados:
- `truthseeker_executive_summary.json`: Resumen