# 🏦 Controlled Experimentation with the German Credit Dataset

## 📋 Notebook Objectives

This notebook implements a **controlled experiment** focused on:

1. The **German Credit Dataset** specifically
2. **Optimization with Optuna** for each model
3. **Detailed metrics**: AUC, PSI, Traffic Light
4. **Comparison across** Train, Test, and Holdout
5. **Per-algorithm analysis**

## 🎯 Evaluation Metrics

### Main Metrics:
- **AUC-ROC**: The model's ability to discriminate between good and bad credit
- **PSI (Population Stability Index)**: Stability of the predicted-score distribution between samples (train vs. test/holdout)
- **Traffic Light**: Agreement between predicted and observed default rates within risk groups, as used for bank rating
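
To make the AUC-ROC bullet concrete, the following sketch computes AUC as the fraction of (bad, good) pairs in which the bad borrower receives the higher score, counting ties as half. The function name `pairwise_auc` and the toy scores are illustrative assumptions; the notebook itself relies on scikit-learn's `roc_auc_score`.

```python
def pairwise_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy data (made up): 1 = bad credit, 0 = good credit
y_toy = [0, 0, 1, 1]
s_toy = [0.1, 0.4, 0.35, 0.8]
auc_toy = pairwise_auc(y_toy, s_toy)
print(auc_toy)
```

The quadratic loop is only meant for reading; on real data a rank-based implementation such as `roc_auc_score` is the practical choice.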

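### Traffic Light Methodology:
- **Green**: The predicted default probability matches the observed rate (absolute difference of at most 0.05)
- **Yellow**: Mild underestimation or overestimation (absolute difference of at most 0.10)
- **Red**: Significant underestimation or overestimation (absolute difference above 0.10)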
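
The color rule can be sketched as a small function. The tolerances 0.05 and 0.10 are the ones used later in `calculate_traffic_light`; the function name `traffic_light_color` and the example rates are hypothetical.

```python
def traffic_light_color(actual_rate, predicted_rate,
                        green_tol=0.05, yellow_tol=0.10):
    """Map the gap between observed and predicted default rates to a color."""
    diff = abs(actual_rate - predicted_rate)
    if diff <= green_tol:
        return "green"
    if diff <= yellow_tol:
        return "yellow"
    return "red"


# (observed rate, predicted rate) pairs, made up for illustration
examples = [(0.30, 0.32), (0.30, 0.38), (0.30, 0.45)]
colors = [traffic_light_color(a, p) for a, p in examples]
print(colors)
```

Because the rule uses the absolute difference, it does not distinguish underestimation from overestimation; a signed difference would be needed for that.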

## 🚀 Models to Optimize

1. **XGBoost** - Optimized gradient boosting
2. **CatBoost** - Gradient boosting with categorical feature handling
3. **LightGBM** - Efficient gradient boosting
4. **RandomForest** - Ensemble of trees
5. **LogisticRegression** - Linear baseline model

## 📊 Evaluation Structure

For each optimized model:
- **Train Performance**: Metrics on the training data
- **Test Performance**: Metrics on the test data
- **Holdout Performance**: Metrics on the held-out validation data
- **Comparison**: Analysis of stability and generalization

---

**Let's get started with the controlled experiment!** 🎯


In [1]:
# Library imports
import sys
import os
sys.path.append('..')

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
from pathlib import Path
import yaml
import logging
from tqdm import tqdm
import time
import optuna
from optuna.samplers import TPESampler
from optuna.pruners import MedianPruner

# Scikit-learn
from sklearn.model_selection import train_test_split, cross_val_score, StratifiedKFold
from sklearn.metrics import roc_auc_score, roc_curve, precision_recall_curve, auc
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Gradient boosting libraries
import xgboost as xgb
import lightgbm as lgb
import catboost as cb

# UCI Repository
from ucimlrepo import fetch_ucirepo

# Configuration
warnings.filterwarnings('ignore')
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

print("✅ Libraries imported successfully")


✅ Libraries imported successfully


In [2]:
# Project configuration
PROJECT_ROOT = Path('..')
DATA_DIR = PROJECT_ROOT / 'data'
RESULTS_DIR = PROJECT_ROOT / 'results'
CONFIGS_DIR = PROJECT_ROOT / 'configs'

# Create directories if they do not exist
DATA_DIR.mkdir(exist_ok=True)
RESULTS_DIR.mkdir(exist_ok=True)

# Experiment configuration
RANDOM_STATE = 42
N_TRIALS = 50  # Number of Optuna trials
CV_FOLDS = 5   # Cross-validation folds

print(f"📁 Project directory: {PROJECT_ROOT.absolute()}")
print(f"📊 Number of Optuna trials: {N_TRIALS}")
print(f"🔄 CV folds: {CV_FOLDS}")


📁 Project directory: c:\Users\carlo\OneDrive\Documentos\repos\tb-grado-repo\notebooks\..
📊 Number of Optuna trials: 50
🔄 CV folds: 5


In [3]:
# Load the German Credit Dataset
print("📥 Loading German Credit Dataset...")

try:
    # Load the dataset from the UCI Repository
    german_credit = fetch_ucirepo(id=144)

    # Extract features and target
    X = german_credit.data.features
    y = german_credit.data.targets

    print(f"✅ Dataset loaded successfully")
    print(f"   📊 Shape of X: {X.shape}")
    print(f"   📊 Shape of y: {y.shape}")
    print(f"   🎯 Target variable: {y.columns[0]}")

    # Show dataset information
    print(f"\n📋 Dataset information:")
    print(f"   Features: {list(X.columns)}")
    print(f"   Data types: {X.dtypes.value_counts().to_dict()}")
    print(f"   Unique values in target: {y.iloc[:, 0].value_counts().to_dict()}")

except Exception as e:
    print(f"❌ Error loading dataset: {e}")
    print("🔄 Trying to load from a local file...")

    # Fall back to a local file if it exists
    local_file = DATA_DIR / 'german_credit.csv'
    if local_file.exists():
        df = pd.read_csv(local_file)
        X = df.drop('target', axis=1)
        y = df[['target']]
        print(f"✅ Dataset loaded from local file")
    else:
        print(f"❌ Could not load the dataset")
        raise


📥 Loading German Credit Dataset...
✅ Dataset loaded successfully
   📊 Shape of X: (1000, 20)
   📊 Shape of y: (1000, 1)
   🎯 Target variable: class

📋 Dataset information:
   Features: ['Attribute1', 'Attribute2', 'Attribute3', 'Attribute4', 'Attribute5', 'Attribute6', 'Attribute7', 'Attribute8', 'Attribute9', 'Attribute10', 'Attribute11', 'Attribute12', 'Attribute13', 'Attribute14', 'Attribute15', 'Attribute16', 'Attribute17', 'Attribute18', 'Attribute19', 'Attribute20']
   Data types: {dtype('O'): 13, dtype('int64'): 7}
   Unique values in target: {1: 700, 2: 300}


In [4]:
# Data preprocessing
print("🔧 Preprocessing data...")

# Convert target to binary (1 = bad credit, 0 = good credit)
y_binary = (y.iloc[:, 0] == 2).astype(int)  # 2 = bad credit in the German dataset

print(f"📊 Target distribution:")
print(f"   Good Credit (0): {(y_binary == 0).sum()} ({(y_binary == 0).mean()*100:.1f}%)")
print(f"   Bad Credit (1): {(y_binary == 1).sum()} ({(y_binary == 1).mean()*100:.1f}%)")

# Identify categorical and numerical variables
categorical_cols = X.select_dtypes(include=['object', 'category']).columns.tolist()
numerical_cols = X.select_dtypes(include=['int64', 'float64']).columns.tolist()

print(f"\n📋 Variable types:")
print(f"   Categorical: {len(categorical_cols)} - {categorical_cols}")
print(f"   Numerical: {len(numerical_cols)} - {numerical_cols}")

# Encode categorical variables as integer codes
X_encoded = X.copy()
label_encoders = {}

for col in categorical_cols:
    le = LabelEncoder()
    X_encoded[col] = le.fit_transform(X[col].astype(str))
    label_encoders[col] = le

print(f"✅ Categorical variables encoded")
print(f"📊 Final shape: {X_encoded.shape}")


🔧 Preprocessing data...
📊 Target distribution:
   Good Credit (0): 700 (70.0%)
   Bad Credit (1): 300 (30.0%)

📋 Variable types:
   Categorical: 13 - ['Attribute1', 'Attribute3', 'Attribute4', 'Attribute6', 'Attribute7', 'Attribute9', 'Attribute10', 'Attribute12', 'Attribute14', 'Attribute15', 'Attribute17', 'Attribute19', 'Attribute20']
   Numerical: 7 - ['Attribute2', 'Attribute5', 'Attribute8', 'Attribute11', 'Attribute13', 'Attribute16', 'Attribute18']
✅ Categorical variables encoded
📊 Final shape: (1000, 20)
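
`LabelEncoder` assigns integer codes following the sorted order of the labels, which a linear model such as `LogisticRegression` will read as an ordinal scale. The plain-Python sketch below mimics that mapping and contrasts it with one-hot encoding, a commonly used alternative for nominal features. The category codes and helper names are illustrative assumptions, and one-hot encoding is not what this notebook does.

```python
def label_encode(values):
    """Integer codes in sorted label order (the ordering LabelEncoder uses)."""
    classes = sorted(set(values))
    mapping = {label: code for code, label in enumerate(classes)}
    return [mapping[v] for v in values], classes


def one_hot_encode(values):
    """One indicator column per category, so no order is implied."""
    classes = sorted(set(values))
    return [[1 if v == c else 0 for c in classes] for v in values], classes


# Hypothetical categorical column
column = ["A13", "A11", "A12", "A11"]
codes, classes = label_encode(column)
onehot, _ = one_hot_encode(column)
print(codes)   # integer codes imply A11 < A12 < A13
print(onehot)  # indicator columns carry no ordering
```

Tree-based models are usually less sensitive to this choice, since they can split an integer-coded feature at any threshold.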


In [5]:
# Data split: Train (60%) / Test (20%) / Holdout (20%)
print("📊 Splitting data into Train/Test/Holdout...")

# First split: Train+Test (80%) / Holdout (20%)
X_temp, X_holdout, y_temp, y_holdout = train_test_split(
    X_encoded, y_binary, 
    test_size=0.2, 
    random_state=RANDOM_STATE, 
    stratify=y_binary
)

# Second split: Train (60%) / Test (20%)
X_train, X_test, y_train, y_test = train_test_split(
    X_temp, y_temp, 
    test_size=0.25,  # 0.25 of 0.8 = 0.2 of the total
    random_state=RANDOM_STATE, 
    stratify=y_temp
)

print(f"✅ Data split completed:")
print(f"   🏋️ Train: {X_train.shape[0]} samples (60%)")
print(f"   🧪 Test: {X_test.shape[0]} samples (20%)")
print(f"   🔒 Holdout: {X_holdout.shape[0]} samples (20%)")

# Check the target distribution in each set
print(f"\n📊 Target distribution by set:")
for name, y_set in [('Train', y_train), ('Test', y_test), ('Holdout', y_holdout)]:
    bad_rate = y_set.mean()
    print(f"   {name}: {bad_rate:.3f} ({y_set.sum()}/{len(y_set)})")


📊 Splitting data into Train/Test/Holdout...
✅ Data split completed:
   🏋️ Train: 600 samples (60%)
   🧪 Test: 200 samples (20%)
   🔒 Holdout: 200 samples (20%)

📊 Target distribution by set:
   Train: 0.300 (180/600)
   Test: 0.300 (60/200)
   Holdout: 0.300 (60/200)
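
The split sizes and stratified counts above follow from simple arithmetic, sketched here with the dataset totals reported earlier (1000 rows, 300 bad). The variable names are illustrative.

```python
n_total, n_bad = 1000, 300

holdout_frac = 0.2          # first split: test_size=0.2 of everything
test_frac_of_rest = 0.25    # second split: test_size=0.25 of the remaining 80%

n_holdout = round(n_total * holdout_frac)
n_rest = n_total - n_holdout
n_test = round(n_rest * test_frac_of_rest)
n_train = n_rest - n_test

# Stratification keeps the overall bad rate in every subset
bad_rate = n_bad / n_total
expected_bad = {
    name: round(n * bad_rate)
    for name, n in [("train", n_train), ("test", n_test), ("holdout", n_holdout)]
}
print(n_train, n_test, n_holdout)
print(expected_bad)
```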


In [6]:
# Evaluation metrics class
class CreditScoringMetrics:
    """
    Computes metrics specific to credit scoring
    """

    @staticmethod
    def calculate_auc_roc(y_true, y_pred_proba):
        """
        Compute AUC-ROC
        """
        return roc_auc_score(y_true, y_pred_proba)
    
    @staticmethod
    def calculate_psi(expected, actual, bins=10, eps=1e-6):
        """
        Compute the Population Stability Index (PSI)

        Args:
            expected: Reference scores (train)
            actual: Scores to compare (test/holdout)
            bins: Number of equal-width bins over [0, 1]
            eps: Floor applied to bin proportions to avoid log(0)

        Returns:
            PSI value
        """
        # Equal-width bin edges over [0, 1]; the outer edges are opened
        # so that no score falls outside the bins
        breakpoints = np.linspace(0, 1, bins + 1)
        breakpoints[0] = -np.inf
        breakpoints[-1] = np.inf

        # Discretize both score samples
        expected_binned = pd.cut(expected, bins=breakpoints, labels=False)
        actual_binned = pd.cut(actual, bins=breakpoints, labels=False)

        # Proportion of scores per bin, with empty bins filled in as 0
        expected_freq = (pd.Series(expected_binned)
                         .value_counts(normalize=True)
                         .reindex(range(bins), fill_value=0.0))
        actual_freq = (pd.Series(actual_binned)
                       .value_counts(normalize=True)
                       .reindex(range(bins), fill_value=0.0))

        # Floor the proportions: without this, an empty bin in `actual`
        # gives log(0) and an infinite PSI, while an empty bin in
        # `expected` would be silently skipped
        expected_freq = expected_freq.clip(lower=eps)
        actual_freq = actual_freq.clip(lower=eps)

        # PSI = sum over bins of (actual - expected) * ln(actual / expected)
        psi = float(np.sum((actual_freq - expected_freq) *
                           np.log(actual_freq / expected_freq)))

        return psi
    
    @staticmethod
    def calculate_traffic_light(y_true, y_pred_proba, n_groups=10):
        """
        Compute the Traffic Light for risk groups

        Args:
            y_true: Observed labels
            y_pred_proba: Predicted probabilities
            n_groups: Number of risk groups

        Returns:
            Dict with Traffic Light statistics
        """
        # Build the working frame; np.asarray drops any pandas index so
        # labels and scores are paired by position
        df = pd.DataFrame({
            'actual': np.asarray(y_true),
            'predicted': np.asarray(y_pred_proba)
        })

        # Sort by predicted probability (descending)
        df = df.sort_values('predicted', ascending=False).reset_index(drop=True)

        # Assign risk groups of equal size
        group_size = len(df) // n_groups
        df['group'] = 0

        for i in range(n_groups):
            start_idx = i * group_size
            if i == n_groups - 1:  # Last group takes the remainder
                end_idx = len(df)
            else:
                end_idx = (i + 1) * group_size

            df.loc[start_idx:end_idx-1, 'group'] = i + 1

        # Compute metrics per group
        group_stats = []
        for group in range(1, n_groups + 1):
            group_data = df[df['group'] == group]
            if len(group_data) > 0:
                actual_rate = group_data['actual'].mean()
                predicted_rate = group_data['predicted'].mean()

                # Determine the traffic light color
                diff = abs(actual_rate - predicted_rate)
                if diff <= 0.05:  # within 5 percentage points
                    color = 'green'
                elif diff <= 0.10:  # within 10 percentage points
                    color = 'yellow'
                else:
                    color = 'red'

                group_stats.append({
                    'group': group,
                    'actual_rate': actual_rate,
                    'predicted_rate': predicted_rate,
                    'difference': diff,
                    'color': color,
                    'size': len(group_data)
                })

        # Compute overall statistics
        colors = [stat['color'] for stat in group_stats]
        green_pct = colors.count('green') / len(colors) * 100
        yellow_pct = colors.count('yellow') / len(colors) * 100
        red_pct = colors.count('red') / len(colors) * 100

        return {
            'group_stats': group_stats,
            'green_percentage': green_pct,
            'yellow_percentage': yellow_pct,
            'red_percentage': red_pct,
            'total_groups': len(group_stats)
        }
    
    @classmethod
    def evaluate_model(cls, y_true, y_pred_proba, y_train_proba=None):
        """
        Evaluate a model with all metrics

        Args:
            y_true: Observed labels
            y_pred_proba: Predicted probabilities
            y_train_proba: Predicted probabilities on train (for PSI)

        Returns:
            Dict with all metrics
        """
        results = {}

        # AUC-ROC
        results['auc_roc'] = cls.calculate_auc_roc(y_true, y_pred_proba)

        # PSI (only when train scores are provided)
        if y_train_proba is not None:
            results['psi'] = cls.calculate_psi(y_train_proba, y_pred_proba)

        # Traffic Light
        traffic_light = cls.calculate_traffic_light(y_true, y_pred_proba)
        results['traffic_light'] = traffic_light

        return results

print("✅ Metrics class created")


✅ Metrics class created
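
As a quick sanity check of the PSI idea, the sketch below computes it in plain Python over equal-width bins on [0, 1]. It assumes scores are probabilities, uses left-closed bins (a slightly different edge convention from `pd.cut`), and floors empty bins at a small `eps`; the function name and the toy scores are hypothetical.

```python
import math


def psi_from_scores(expected, actual, bins=10, eps=1e-6):
    """PSI over equal-width bins on [0, 1], with a small floor for empty bins."""
    def proportions(scores):
        counts = [0] * bins
        for s in scores:
            # Clamp the bin index so 1.0 (or any out-of-range score) stays in range
            idx = min(max(int(s * bins), 0), bins - 1)
            counts[idx] += 1
        return [max(c / len(scores), eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


# Made-up scores: one per bin for the reference sample
train_scores = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
same = psi_from_scores(train_scores, train_scores)
shifted = psi_from_scores(train_scores, [min(s + 0.3, 0.99) for s in train_scores])
print(same, shifted)
```

Identical samples give a PSI of zero, while shifting every score upward empties the low bins and yields a large value. The plots later in the notebook draw 0.10 as the reference line for PSI.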


In [7]:
# Optuna optimization class
class OptunaOptimizer:
    """
    Tunes model hyperparameters with Optuna
    """
    
    def __init__(self, X_train, y_train, cv_folds=5, n_trials=50):
        self.X_train = X_train
        self.y_train = y_train
        self.cv_folds = cv_folds
        self.n_trials = n_trials
        self.best_params = {}
        self.best_scores = {}
        
    def optimize_xgboost(self):
        """
        Tune XGBoost with Optuna
        """
        def objective(trial):
            params = {
                'n_estimators': trial.suggest_int('n_estimators', 100, 1000),
                'max_depth': trial.suggest_int('max_depth', 3, 10),
                'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3),
                'subsample': trial.suggest_float('subsample', 0.6, 1.0),
                'colsample_bytree': trial.suggest_float('colsample_bytree', 0.6, 1.0),
                'reg_alpha': trial.suggest_float('reg_alpha', 0, 10),
                'reg_lambda': trial.suggest_float('reg_lambda', 0, 10),
                'random_state': RANDOM_STATE
            }
            
            model = xgb.XGBClassifier(**params)
            cv_scores = cross_val_score(model, self.X_train, self.y_train, 
                                      cv=self.cv_folds, scoring='roc_auc')
            return cv_scores.mean()
        
        study = optuna.create_study(direction='maximize', sampler=TPESampler(seed=RANDOM_STATE))
        study.optimize(objective, n_trials=self.n_trials)
        
        self.best_params['xgboost'] = study.best_params
        self.best_scores['xgboost'] = study.best_value
        
        return study.best_params
    
    def optimize_lightgbm(self):
        """
        Tune LightGBM with Optuna
        """
        def objective(trial):
            params = {
                'n_estimators': trial.suggest_int('n_estimators', 100, 1000),
                'max_depth': trial.suggest_int('max_depth', 3, 10),
                'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3),
                'subsample': trial.suggest_float('subsample', 0.6, 1.0),
                'colsample_bytree': trial.suggest_float('colsample_bytree', 0.6, 1.0),
                'reg_alpha': trial.suggest_float('reg_alpha', 0, 10),
                'reg_lambda': trial.suggest_float('reg_lambda', 0, 10),
                'random_state': RANDOM_STATE,
                'verbose': -1
            }
            
            model = lgb.LGBMClassifier(**params)
            cv_scores = cross_val_score(model, self.X_train, self.y_train, 
                                      cv=self.cv_folds, scoring='roc_auc')
            return cv_scores.mean()
        
        study = optuna.create_study(direction='maximize', sampler=TPESampler(seed=RANDOM_STATE))
        study.optimize(objective, n_trials=self.n_trials)
        
        self.best_params['lightgbm'] = study.best_params
        self.best_scores['lightgbm'] = study.best_value
        
        return study.best_params
    
    def optimize_catboost(self):
        """
        Tune CatBoost with Optuna
        """
        def objective(trial):
            params = {
                'iterations': trial.suggest_int('iterations', 100, 1000),
                'depth': trial.suggest_int('depth', 3, 10),
                'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3),
                'l2_leaf_reg': trial.suggest_float('l2_leaf_reg', 1, 10),
                'random_seed': RANDOM_STATE,
                'verbose': False
            }
            
            model = cb.CatBoostClassifier(**params)
            cv_scores = cross_val_score(model, self.X_train, self.y_train, 
                                      cv=self.cv_folds, scoring='roc_auc')
            return cv_scores.mean()
        
        study = optuna.create_study(direction='maximize', sampler=TPESampler(seed=RANDOM_STATE))
        study.optimize(objective, n_trials=self.n_trials)
        
        self.best_params['catboost'] = study.best_params
        self.best_scores['catboost'] = study.best_value
        
        return study.best_params
    
    def optimize_random_forest(self):
        """
        Tune Random Forest with Optuna
        """
        def objective(trial):
            params = {
                'n_estimators': trial.suggest_int('n_estimators', 100, 1000),
                'max_depth': trial.suggest_int('max_depth', 3, 20),
                'min_samples_split': trial.suggest_int('min_samples_split', 2, 20),
                'min_samples_leaf': trial.suggest_int('min_samples_leaf', 1, 10),
                'max_features': trial.suggest_categorical('max_features', ['sqrt', 'log2', None]),
                'random_state': RANDOM_STATE
            }
            
            model = RandomForestClassifier(**params)
            cv_scores = cross_val_score(model, self.X_train, self.y_train, 
                                      cv=self.cv_folds, scoring='roc_auc')
            return cv_scores.mean()
        
        study = optuna.create_study(direction='maximize', sampler=TPESampler(seed=RANDOM_STATE))
        study.optimize(objective, n_trials=self.n_trials)
        
        self.best_params['random_forest'] = study.best_params
        self.best_scores['random_forest'] = study.best_value
        
        return study.best_params
    
    def optimize_logistic_regression(self):
        """
        Tune Logistic Regression with Optuna
        """
        def objective(trial):
            params = {
                'C': trial.suggest_float('C', 0.01, 100, log=True),
                'penalty': trial.suggest_categorical('penalty', ['l1', 'l2']),
                'solver': 'liblinear',  # Supports both l1 and l2
                'random_state': RANDOM_STATE
            }
            
            model = LogisticRegression(**params)
            cv_scores = cross_val_score(model, self.X_train, self.y_train, 
                                      cv=self.cv_folds, scoring='roc_auc')
            return cv_scores.mean()
        
        study = optuna.create_study(direction='maximize', sampler=TPESampler(seed=RANDOM_STATE))
        study.optimize(objective, n_trials=self.n_trials)
        
        self.best_params['logistic_regression'] = study.best_params
        self.best_scores['logistic_regression'] = study.best_value
        
        return study.best_params

print("✅ Optimization class created")


✅ Optimization class created


In [8]:
# Initialize the optimizer
optimizer = OptunaOptimizer(X_train, y_train, cv_folds=CV_FOLDS, n_trials=N_TRIALS)

print(f"🚀 Optimizer initialized")
print(f"   📊 Training data: {X_train.shape}")
print(f"   🔄 CV folds: {CV_FOLDS}")
print(f"   🎯 Trials per model: {N_TRIALS}")


🚀 Optimizer initialized
   📊 Training data: (600, 20)
   🔄 CV folds: 5
   🎯 Trials per model: 50


In [None]:
# Optimize all models
print("🔥 OPTIMIZING ALL MODELS...")
print("="*50)

# Models to optimize: (display name, key in optimizer.best_scores, function).
# The key is listed explicitly because lowercasing the display name does not
# produce 'random_forest' or 'logistic_regression'.
models_to_optimize = [
    ('XGBoost', 'xgboost', optimizer.optimize_xgboost),
    ('LightGBM', 'lightgbm', optimizer.optimize_lightgbm),
    ('CatBoost', 'catboost', optimizer.optimize_catboost),
    ('RandomForest', 'random_forest', optimizer.optimize_random_forest),
    ('LogisticRegression', 'logistic_regression', optimizer.optimize_logistic_regression)
]

# Optimize each model
for model_name, score_key, optimize_func in models_to_optimize:
    print(f"\n🔥 Optimizing {model_name}...")
    start_time = time.time()

    try:
        best_params = optimize_func()
        end_time = time.time()

        print(f"✅ {model_name} optimized in {end_time - start_time:.1f} seconds")
        print(f"   🏆 Best CV score: {optimizer.best_scores[score_key]:.4f}")
        print(f"   ⚙️ Best parameters: {best_params}")

    except Exception as e:
        print(f"❌ Error optimizing {model_name}: {e}")

print(f"\n✅ Optimization finished for all models")


In [None]:
# Optimization summary
print("📊 OPTIMIZATION SUMMARY")
print("="*50)

for model_name, score in optimizer.best_scores.items():
    print(f"{model_name.upper()}: {score:.4f}")

# Find the best model by cross-validated score
best_model_name = max(optimizer.best_scores, key=optimizer.best_scores.get)
best_score = optimizer.best_scores[best_model_name]

print(f"\n🏆 BEST MODEL: {best_model_name.upper()}")
print(f"📊 Best CV score: {best_score:.4f}")


In [None]:
# Train the optimized models
print("🏋️ Training optimized models...")

# Build models with the best parameters.
# For LogisticRegression, 'solver' was fixed during tuning and is therefore
# not in best_params; it is passed again because the default solver does not
# support the 'l1' penalty.
models = {
    'XGBoost': xgb.XGBClassifier(**optimizer.best_params['xgboost'], random_state=RANDOM_STATE),
    'LightGBM': lgb.LGBMClassifier(**optimizer.best_params['lightgbm'], random_state=RANDOM_STATE, verbose=-1),
    'CatBoost': cb.CatBoostClassifier(**optimizer.best_params['catboost'], random_seed=RANDOM_STATE, verbose=False),
    'RandomForest': RandomForestClassifier(**optimizer.best_params['random_forest'], random_state=RANDOM_STATE),
    'LogisticRegression': LogisticRegression(**optimizer.best_params['logistic_regression'], solver='liblinear', random_state=RANDOM_STATE)
}

# Train all models
trained_models = {}
for name, model in models.items():
    print(f"   Training {name}...")
    model.fit(X_train, y_train)
    trained_models[name] = model

print("✅ All models trained")


In [None]:
# Evaluate models on Train, Test, and Holdout
print("📊 EVALUATING MODELS ON TRAIN/TEST/HOLDOUT...")
print("="*60)

results = {}

for model_name, model in trained_models.items():
    print(f"\n🔍 Evaluating {model_name}...")

    # Predicted probabilities on each set
    train_proba = model.predict_proba(X_train)[:, 1]
    test_proba = model.predict_proba(X_test)[:, 1]
    holdout_proba = model.predict_proba(X_holdout)[:, 1]

    # Evaluate on each set (PSI uses the train scores as reference)
    train_metrics = CreditScoringMetrics.evaluate_model(y_train, train_proba)
    test_metrics = CreditScoringMetrics.evaluate_model(y_test, test_proba, train_proba)
    holdout_metrics = CreditScoringMetrics.evaluate_model(y_holdout, holdout_proba, train_proba)

    results[model_name] = {
        'train': train_metrics,
        'test': test_metrics,
        'holdout': holdout_metrics
    }

    # Show results
    print(f"   📈 Train  - AUC: {train_metrics['auc_roc']:.4f}, PSI: N/A, Green: {train_metrics['traffic_light']['green_percentage']:.1f}%")
    print(f"   🧪 Test   - AUC: {test_metrics['auc_roc']:.4f}, PSI: {test_metrics['psi']:.4f}, Green: {test_metrics['traffic_light']['green_percentage']:.1f}%")
    print(f"   🔒 Holdout - AUC: {holdout_metrics['auc_roc']:.4f}, PSI: {holdout_metrics['psi']:.4f}, Green: {holdout_metrics['traffic_light']['green_percentage']:.1f}%")


In [None]:
# Build the comparison table
print("📋 COMPARISON TABLE OF RESULTS")
print("="*80)

# Build a DataFrame with the results
comparison_data = []

for model_name, model_results in results.items():
    for dataset in ['train', 'test', 'holdout']:
        metrics = model_results[dataset]

        row = {
            'Modelo': model_name,
            'Dataset': dataset.capitalize(),
            'AUC-ROC': metrics['auc_roc'],
            'PSI': metrics.get('psi', np.nan),  # not defined for train
            'Traffic_Light_Green_%': metrics['traffic_light']['green_percentage'],
            'Traffic_Light_Yellow_%': metrics['traffic_light']['yellow_percentage'],
            'Traffic_Light_Red_%': metrics['traffic_light']['red_percentage']
        }
        comparison_data.append(row)

comparison_df = pd.DataFrame(comparison_data)

# Show the table
display(comparison_df.round(4))

# Save the results
comparison_df.to_csv(RESULTS_DIR / 'model_comparison_results.csv', index=False)
print(f"\n💾 Results saved to: {RESULTS_DIR / 'model_comparison_results.csv'}")


In [None]:
# Visualizations
print("📊 CREATING VISUALIZATIONS...")

# Set up subplots
fig, axes = plt.subplots(2, 2, figsize=(15, 12))
fig.suptitle('Model Comparison by Dataset', fontsize=16, fontweight='bold')

# 1. AUC-ROC by dataset
ax1 = axes[0, 0]
pivot_auc = comparison_df.pivot(index='Modelo', columns='Dataset', values='AUC-ROC')
pivot_auc.plot(kind='bar', ax=ax1, width=0.8)
ax1.set_title('AUC-ROC by Dataset')
ax1.set_xlabel('Model')
ax1.set_ylabel('AUC-ROC')
ax1.tick_params(axis='x', rotation=45)
# Draw the threshold before building the legend so its label is included
ax1.axhline(y=0.65, color='red', linestyle='--', alpha=0.7, label='Threshold (0.65)')
ax1.legend(title='Dataset')

# 2. PSI by dataset (test and holdout only)
ax2 = axes[0, 1]
psi_data = comparison_df[comparison_df['Dataset'].isin(['Test', 'Holdout'])]
pivot_psi = psi_data.pivot(index='Modelo', columns='Dataset', values='PSI')
pivot_psi.plot(kind='bar', ax=ax2, width=0.8)
ax2.set_title('PSI by Dataset')
ax2.set_xlabel('Model')
ax2.set_ylabel('PSI')
ax2.tick_params(axis='x', rotation=45)
ax2.axhline(y=0.10, color='red', linestyle='--', alpha=0.7, label='Threshold (0.10)')
ax2.legend(title='Dataset')

# 3. Traffic Light Green % by dataset
ax3 = axes[1, 0]
pivot_green = comparison_df.pivot(index='Modelo', columns='Dataset', values='Traffic_Light_Green_%')
pivot_green.plot(kind='bar', ax=ax3, width=0.8)
ax3.set_title('Traffic Light Green % by Dataset')
ax3.set_xlabel('Model')
ax3.set_ylabel('Green %')
ax3.tick_params(axis='x', rotation=45)
ax3.axhline(y=80, color='red', linestyle='--', alpha=0.7, label='Threshold (80%)')
ax3.legend(title='Dataset')

# 4. Stability comparison (AUC Test vs Holdout)
ax4 = axes[1, 1]
test_auc = comparison_df[comparison_df['Dataset'] == 'Test']['AUC-ROC'].values
holdout_auc = comparison_df[comparison_df['Dataset'] == 'Holdout']['AUC-ROC'].values
models_list = comparison_df[comparison_df['Dataset'] == 'Test']['Modelo'].values

x = np.arange(len(models_list))
width = 0.35

ax4.bar(x - width/2, test_auc, width, label='Test', alpha=0.8)
ax4.bar(x + width/2, holdout_auc, width, label='Holdout', alpha=0.8)

ax4.set_title('Stability: Test vs Holdout AUC')
ax4.set_ylabel('AUC-ROC')
ax4.set_xticks(x)
ax4.set_xticklabels(models_list, rotation=45)
ax4.axhline(y=0.65, color='red', linestyle='--', alpha=0.7, label='Threshold (0.65)')
ax4.legend()

plt.tight_layout()
plt.show()

print("✅ Visualizations completed")


In [None]:
# Detailed analysis of the best model
print("🏆 BEST MODEL ANALYSIS")
print("="*50)

# Find the best model based on holdout AUC
holdout_results = comparison_df[comparison_df['Dataset'] == 'Holdout']
best_model_name = holdout_results.loc[holdout_results['AUC-ROC'].idxmax(), 'Modelo']
best_model_auc = holdout_results['AUC-ROC'].max()

print(f"🥇 Best model: {best_model_name}")
print(f"📊 Best AUC on Holdout: {best_model_auc:.4f}")

# Show detailed metrics for the best model
best_model_results = results[best_model_name]

print(f"\n📈 DETAILED METRICS FOR {best_model_name.upper()}:")
print("-" * 50)

for dataset, metrics in best_model_results.items():
    print(f"\n{dataset.upper()}:")
    print(f"   AUC-ROC: {metrics['auc_roc']:.4f}")
    if 'psi' in metrics:
        print(f"   PSI: {metrics['psi']:.4f}")
    print(f"   Traffic Light:")
    print(f"      Green: {metrics['traffic_light']['green_percentage']:.1f}%")
    print(f"      Yellow: {metrics['traffic_light']['yellow_percentage']:.1f}%")
    print(f"      Red: {metrics['traffic_light']['red_percentage']:.1f}%")


In [None]:
# Executive summary
print("📋 EXECUTIVE SUMMARY")
print("="*50)

# General statistics
print(f"\n📊 GENERAL STATISTICS:")
print(f"   Total models evaluated: {len(trained_models)}")
print(f"   Best AUC on Holdout: {best_model_auc:.4f}")
print(f"   Models meeting the AUC threshold (0.65): {len(holdout_results[holdout_results['AUC-ROC'] >= 0.65])}")

# Stability analysis: absolute AUC gap between Test and Holdout
print(f"\n🔄 STABILITY ANALYSIS:")
for model_name in trained_models.keys():
    test_auc = comparison_df[(comparison_df['Modelo'] == model_name) & (comparison_df['Dataset'] == 'Test')]['AUC-ROC'].iloc[0]
    holdout_auc = comparison_df[(comparison_df['Modelo'] == model_name) & (comparison_df['Dataset'] == 'Holdout')]['AUC-ROC'].iloc[0]
    stability = abs(test_auc - holdout_auc)

    print(f"   {model_name}: {stability:.4f} ({'✅ Stable' if stability < 0.05 else '⚠️ Unstable'})")

# Recommendations
print(f"\n💡 RECOMMENDATIONS:")
print(f"   1. Recommended model: {best_model_name}")
print(f"   2. AUC on Holdout: {best_model_auc:.4f} ({'✅ Meets' if best_model_auc >= 0.65 else '❌ Does not meet'} threshold)")

if best_model_auc < 0.65:
    print(f"   3. ⚠️ No model reaches the minimum AUC threshold (0.65)")
    print(f"      Consider: feature engineering, more data, or more complex models")

print(f"\n✅ Full analysis finished")
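
The decision rules used in the summary (AUC threshold of 0.65 on Holdout and a Test-to-Holdout AUC gap below 0.05) can be isolated in a small helper, sketched here. The function name `summarize_model` and the AUC values are hypothetical and are not results from this notebook.

```python
def summarize_model(test_auc, holdout_auc, auc_threshold=0.65, max_gap=0.05):
    """Apply the notebook's stability and minimum-AUC rules to one model."""
    gap = abs(test_auc - holdout_auc)
    return {
        "gap": round(gap, 4),
        "stable": gap < max_gap,
        "meets_auc_threshold": holdout_auc >= auc_threshold,
    }


# Hypothetical AUC values, for illustration only
example = summarize_model(test_auc=0.78, holdout_auc=0.74)
print(example)
```

Keeping the thresholds as parameters makes it easy to see how sensitive the recommendation is to them.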
