# ✅ Validation and Evaluation of ML Models
## Rigorous Techniques to Ensure Generalization

<a href="https://colab.research.google.com/github/yourusername/ml-course/blob/main/01_Sistemas_aprendizaje_automatico/03_Validacion_evaluacion/validacion_evaluacion.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

---

## 📋 Notebook Summary

This notebook covers **rigorous validation and evaluation methodologies** for checking that ML models generalize correctly to unseen data.

### 🎯 Learning Objectives

1. **Generalization Problems**:
   - Overfitting vs Underfitting
   - Bias-Variance Trade-off
   - Detection and mitigation

2. **Validation Strategies**:
   - Train-Test Split
   - K-Fold Cross-Validation
   - Stratified K-Fold
   - Leave-One-Out (LOO)
   - Time Series Split

3. **Evaluation Metrics**:
   - **Classification**: Accuracy, Precision, Recall, F1, ROC-AUC, PR-AUC
   - **Regression**: MSE, RMSE, MAE, R², MAPE
   - Confusion matrices and ROC curves

4. **Hyperparameter Optimization**:
   - Grid Search
   - Random Search
   - Bayesian Optimization

### 📊 Contents

- Theory with mathematical foundations
- Practical implementations
- Learning-curve visualizations
- Comparisons of validation strategies
- Complete evaluation pipeline

---

In [None]:
# Environment setup
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import (train_test_split, cross_val_score, KFold, 
                                      StratifiedKFold, LeaveOneOut, TimeSeriesSplit,
                                      GridSearchCV, RandomizedSearchCV, learning_curve,
                                      validation_curve)
from sklearn.metrics import (accuracy_score, precision_score, recall_score, f1_score,
                            confusion_matrix, classification_report, roc_curve, roc_auc_score,
                            precision_recall_curve, average_precision_score,
                            mean_squared_error, mean_absolute_error, r2_score,
                            mean_absolute_percentage_error)
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor, GradientBoostingClassifier
from sklearn.svm import SVC, SVR
from sklearn.linear_model import LogisticRegression, Ridge, Lasso
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification, make_regression, load_breast_cancer, load_diabetes
from sklearn.preprocessing import StandardScaler
import warnings

warnings.filterwarnings('ignore')
sns.set_style('darkgrid')
plt.rcParams['figure.dpi'] = 100

print("✅ Environment configured successfully")
print(f"NumPy: {np.__version__} | Pandas: {pd.__version__}")
print("Scikit-learn: validation, metrics, and optimization modules loaded")

## 1. Overfitting vs Underfitting

### 🎯 Generalization Problems

#### Overfitting
The model learns the **noise and quirks** of the training set instead of general patterns.

**Symptoms**:
- Training error << validation error
- Model too complex for the data
- Performance degrades on new data

#### Underfitting
The model is **too simple** to capture the underlying patterns.

**Symptoms**:
- High training error
- High validation error
- Model fails to capture relationships in the data
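
The symptoms listed above can be checked numerically. The following is a minimal sketch, assuming synthetic noisy sine data and a `DecisionTreeRegressor` (both chosen here purely for illustration): `validation_curve` reports training and validation R² for several values of `max_depth`. A low score on both sides suggests underfitting, while a large gap between training and validation suggests overfitting.

```python
import numpy as np
from sklearn.model_selection import KFold, validation_curve
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (an assumption for this sketch): noisy sine curve
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)

depths = [1, 3, 5, 10, 20]
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Scores have shape (n_depths, n_folds)
train_scores, val_scores = validation_curve(
    DecisionTreeRegressor(random_state=0), X, y,
    param_name="max_depth", param_range=depths, cv=cv, scoring="r2",
)

train_mean = train_scores.mean(axis=1)
val_mean = val_scores.mean(axis=1)
gap = train_mean - val_mean  # a large gap is the overfitting symptom

for d, tr, va, g in zip(depths, train_mean, val_mean, gap):
    print(f"max_depth={d:>2}  train R2={tr:.3f}  val R2={va:.3f}  gap={g:.3f}")
```

The exact numbers depend on the data and the seed; what matters is the pattern across depths, not any single value.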

### Bias-Variance Trade-off

For squared-error loss, the expected prediction error decomposes as:

$$\text{Total Error} = \text{Bias}^2 + \text{Variance} + \text{Irreducible Noise}$$

- **High bias** → Underfitting (model too simple)
- **High variance** → Overfitting (model too complex)
- **Goal**: the best balance between the two
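
The decomposition above can be estimated empirically. The sketch below is an illustration under stated assumptions, not a definitive procedure: the true function (`sin`), the noise level `noise_sd`, and the helper `bias_variance` are all hypothetical choices made for this example. It retrains a decision tree on many independently drawn training sets and measures, at fixed test points, how far the average prediction is from the truth (bias²) and how much predictions scatter across training sets (variance).

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
noise_sd = 0.3                                  # assumed noise level
x_test = np.linspace(0, 10, 50).reshape(-1, 1)  # fixed evaluation points
f_test = np.sin(x_test).ravel()                 # noise-free target (assumed known)

def bias_variance(max_depth, n_repeats=200, n_train=100):
    """Estimate bias^2 and variance of a tree by resampling training sets."""
    preds = np.empty((n_repeats, len(x_test)))
    for i in range(n_repeats):
        X = rng.uniform(0, 10, size=(n_train, 1))
        y = np.sin(X).ravel() + rng.normal(scale=noise_sd, size=n_train)
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        preds[i] = tree.fit(X, y).predict(x_test)
    mean_pred = preds.mean(axis=0)
    bias_sq = np.mean((mean_pred - f_test) ** 2)  # squared bias, averaged over x
    variance = np.mean(preds.var(axis=0))         # variance, averaged over x
    return bias_sq, variance

results = {d: bias_variance(d) for d in (1, 5, 20)}
for d, (b, v) in results.items():
    total = b + v + noise_sd ** 2  # bias^2 + variance + irreducible noise
    print(f"max_depth={d:>2}  bias^2={b:.3f}  variance={v:.3f}  expected MSE={total:.3f}")
```

This only works on synthetic data because the noise-free function must be known; on real data the three terms cannot be separated this directly.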

In [None]:
# Practical demonstration: overfitting vs underfitting with trees of increasing depth

print("🎯 Visualizing overfitting, underfitting, and the optimal model\n")

# Generate noisy data from a sine curve
np.random.seed(42)
X_demo = np.sort(np.random.rand(100, 1) * 10, axis=0)
y_demo = np.sin(X_demo).ravel() + np.random.randn(100) * 0.3

# Create three models of different complexity
from sklearn.tree import DecisionTreeRegressor

models_complexity = {
    'Underfitting\n(max_depth=1)': DecisionTreeRegressor(max_depth=1, random_state=42),
    'Optimal\n(max_depth=5)': DecisionTreeRegressor(max_depth=5, random_state=42),
    'Overfitting\n(max_depth=20)': DecisionTreeRegressor(max_depth=20, random_state=42)
}

# Split the data
X_train_demo, X_test_demo, y_train_demo, y_test_demo = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)

# Visualization
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

X_plot = np.linspace(0, 10, 500).reshape(-1, 1)

for idx, (name, model) in enumerate(models_complexity.items()):
    # Train
    model.fit(X_train_demo, y_train_demo)
    y_train_pred = model.predict(X_train_demo)
    y_test_pred = model.predict(X_test_demo)

    # Compute scores and errors
    train_score = r2_score(y_train_demo, y_train_pred)
    test_score = r2_score(y_test_demo, y_test_pred)
    train_mse = mean_squared_error(y_train_demo, y_train_pred)
    test_mse = mean_squared_error(y_test_demo, y_test_pred)

    # Plot data points and the fitted model
    axes[idx].scatter(X_train_demo, y_train_demo, color='blue', s=40, alpha=0.6, label='Train')
    axes[idx].scatter(X_test_demo, y_test_demo, color='red', s=40, alpha=0.6, label='Test')
    axes[idx].plot(X_plot, model.predict(X_plot), 'g-', linewidth=2, label='Model')

    axes[idx].set_title(f'{name}\nTrain R²: {train_score:.3f} | Test R²: {test_score:.3f}\n'
                       f'Train MSE: {train_mse:.3f} | Test MSE: {test_mse:.3f}',
                       fontweight='bold', fontsize=10)
    axes[idx].set_xlabel('X')
    axes[idx].set_ylabel('y')
    axes[idx].legend()
    axes[idx].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n📊 Analysis:")
print("="*70)
print("⚠️  UNDERFITTING:")
print("   • Train score LOW, test score LOW")
print("   • Model too simple, fails to capture patterns")
print("\n✅ OPTIMAL MODEL:")
print("   • Train score HIGH, test score HIGH and similar")
print("   • Generalizes well to new data")
print("\n❌ OVERFITTING:")
print("   • Train score VERY HIGH, test score LOW")
print("   • Memorizes noise in the training set")

## 2. Validation Strategies

### 📊 Validation Methods

#### Simple Train-Test Split
Random split, typically 70-30% or 80-20%
- ✅ Fast and simple
- ⚠️ High variance: the estimate depends on the particular split

#### K-Fold Cross-Validation
Splits the data into K folds and trains K times, holding out a different fold each time
- ✅ More robust estimate
- ✅ Uses all the data
- ⚠️ K times more expensive

#### Stratified K-Fold
K-Fold that preserves the class distribution in each fold
- ✅ Essential for imbalanced data
- ✅ Reduces variance in multiclass problems

#### Leave-One-Out (LOO)
K-Fold with K=n (each sample is its own fold)
- ✅ Nearly unbiased
- ⚠️ Very expensive, high variance

#### Time Series Split
Chronological (non-random) validation
- ✅ Respects temporal order
- ✅ Prevents future→past data leakage
- 🎯 Required for time series
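
To make the differences between these strategies concrete, the sketch below prints the train and test indices each splitter produces. The 12-sample array and its imbalanced labels are toy values assumed only for this illustration.

```python
import numpy as np
from sklearn.model_selection import (KFold, LeaveOneOut, StratifiedKFold,
                                     TimeSeriesSplit)

# Toy data (assumed): 12 ordered samples, 9 negatives followed by 3 positives
X = np.arange(12).reshape(-1, 1)
y = np.array([0] * 9 + [1] * 3)

splitters = {
    "KFold(3)": KFold(n_splits=3),
    "StratifiedKFold(3)": StratifiedKFold(n_splits=3),
    "TimeSeriesSplit(3)": TimeSeriesSplit(n_splits=3),
    "LeaveOneOut": LeaveOneOut(),
}

# Materialize the (train_idx, test_idx) pairs for each strategy
folds_by_name = {name: list(cv.split(X, y)) for name, cv in splitters.items()}

for name, folds in folds_by_name.items():
    print(f"{name}: {len(folds)} folds")
    for train_idx, test_idx in folds[:3]:  # show at most three folds
        positives = int(y[test_idx].sum())
        print(f"  train={train_idx.tolist()} test={test_idx.tolist()} "
              f"positives in test={positives}")
```

Without shuffling, plain K-Fold puts all positives into the last test fold, whereas Stratified K-Fold places one positive in each. Time Series Split always trains on indices that precede the test indices.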

In [None]:
# Comprehensive practical example: full validation and evaluation pipeline

print("🔬 Full pipeline: cross-validation + metrics + optimization\n")

# Load a binary classification dataset
cancer = load_breast_cancer()
X_cancer = cancer.data
y_cancer = cancer.target

print("Dataset: Breast Cancer Wisconsin")
print(f"  • Samples: {len(y_cancer)}")
print(f"  • Features: {X_cancer.shape[1]}")
print(f"  • Classes: {len(np.unique(y_cancer))} (Malignant: {sum(y_cancer==0)}, Benign: {sum(y_cancer==1)})")

# Standardize features.
# Note: fitting the scaler on the full dataset before splitting lets test-set
# statistics leak into training; a Pipeline fitted inside each fold avoids this.
scaler_cancer = StandardScaler()
X_cancer_scaled = scaler_cancer.fit_transform(X_cancer)

# 1. COMPARISON OF VALIDATION STRATEGIES
print("\n📊 1. COMPARISON OF VALIDATION STRATEGIES")
print("="*70)

models_val = {
    'RandomForest': RandomForestClassifier(n_estimators=100, random_state=42),
    'SVM (RBF)': SVC(kernel='rbf', random_state=42, probability=True),
    'Logistic Reg': LogisticRegression(max_iter=1000, random_state=42)
}

validation_strategies = {
    'Train-Test (80-20)': None,  # Handled separately
    'K-Fold (5)': KFold(n_splits=5, shuffle=True, random_state=42),
    'K-Fold (10)': KFold(n_splits=10, shuffle=True, random_state=42),
    'Stratified K-Fold (5)': StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
}

results_validation = {model_name: {} for model_name in models_val.keys()}

# clone() returns an unfitted copy of an estimator; import it before first use
from sklearn.base import clone

# Single stratified hold-out split, shared by all models
X_train, X_test, y_train, y_test = train_test_split(
    X_cancer_scaled, y_cancer, test_size=0.2, random_state=42, stratify=y_cancer
)

for model_name, model in models_val.items():
    # Train-test split: fit a fresh copy and score it on the held-out 20%
    model_clone = clone(model)
    model_clone.fit(X_train, y_train)
    results_validation[model_name]['Train-Test (80-20)'] = model_clone.score(X_test, y_test)

    # Cross-validation strategies (cross_val_score clones the model internally)
    for strategy_name, cv in validation_strategies.items():
        if cv is not None:
            scores = cross_val_score(model, X_cancer_scaled, y_cancer, cv=cv, scoring='accuracy')
            results_validation[model_name][strategy_name] = scores.mean()

# Visualize results
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 5))

# Results table
df_results = pd.DataFrame(results_validation).T
print(f"\n{df_results}\n")

# Plot 1: comparison of strategies
strategies = list(validation_strategies.keys())
x_pos = np.arange(len(strategies))
width = 0.25

for idx, model_name in enumerate(models_val.keys()):
    values = [results_validation[model_name][s] for s in strategies]
    ax1.bar(x_pos + idx*width, values, width, label=model_name, alpha=0.8)

ax1.set_ylabel('Accuracy')
ax1.set_title('Comparison of Validation Strategies', fontweight='bold', fontsize=12)
ax1.set_xticks(x_pos + width)
ax1.set_xticklabels(strategies, rotation=15, ha='right')
ax1.legend()
ax1.grid(True, alpha=0.3, axis='y')
ax1.set_ylim([0.85, 1.0])

# 2. DETAILED METRICS (Random Forest on a stratified 80-20 hold-out split)
print("\n📈 2. DETAILED METRIC ANALYSIS (Random Forest)")
print("="*70)

import sklearn.base
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X_cancer_scaled, y_cancer, test_size=0.2, random_state=42, stratify=y_cancer
)
rf_model.fit(X_train, y_train)
y_pred = rf_model.predict(X_test)
y_pred_proba = rf_model.predict_proba(X_test)[:, 1]

# Metrics (by default scikit-learn treats label 1, here benign, as the positive class)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
roc_auc = roc_auc_score(y_test, y_pred_proba)

print(f"Accuracy:  {accuracy:.4f}")
print(f"Precision: {precision:.4f} (of the samples predicted positive, fraction correct)")
print(f"Recall:    {recall:.4f} (of all actual positives, fraction detected)")
print(f"F1-Score:  {f1:.4f} (harmonic mean of precision and recall)")
print(f"ROC-AUC:   {roc_auc:.4f} (area under the ROC curve)")

# Confusion matrix
cm = confusion_matrix(y_test, y_pred)

# Plot 2: confusion matrix
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', ax=ax2,
           xticklabels=['Malignant', 'Benign'], 
           yticklabels=['Malignant', 'Benign'])
ax2.set_title(f'Confusion Matrix\nAccuracy: {accuracy:.4f}', fontweight='bold', fontsize=12)
ax2.set_ylabel('True Label')
ax2.set_xlabel('Predicted Label')

plt.tight_layout()
plt.show()

# 3. ROC CURVE
print("\n📉 3. ROC CURVE")

fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba)

fig, (ax3, ax4) = plt.subplots(1, 2, figsize=(14, 5))

ax3.plot(fpr, tpr, 'b-', linewidth=2, label=f'ROC Curve (AUC = {roc_auc:.4f})')
ax3.plot([0, 1], [0, 1], 'r--', linewidth=2, label='Random Classifier')
ax3.fill_between(fpr, tpr, alpha=0.2)
ax3.set_xlabel('False Positive Rate')
ax3.set_ylabel('True Positive Rate (Recall)')
ax3.set_title('Receiver Operating Characteristic (ROC)', fontweight='bold', fontsize=12)
ax3.legend()
ax3.grid(True, alpha=0.3)

# 4. HYPERPARAMETER OPTIMIZATION (Grid Search)
print("\n🔧 4. HYPERPARAMETER OPTIMIZATION")
print("="*70)

param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [5, 10, 20, None],
    'min_samples_split': [2, 5, 10]
}

grid_search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
    scoring='roc_auc',
    n_jobs=-1,
    verbose=0
)

print("Running Grid Search...")
grid_search.fit(X_cancer_scaled, y_cancer)

print(f"\nBest hyperparameters: {grid_search.best_params_}")
print(f"Best ROC-AUC (CV): {grid_search.best_score_:.4f}")

# Visualize Grid Search results (subset)
results_df = pd.DataFrame(grid_search.cv_results_)
top_10 = results_df.nsmallest(10, 'rank_test_score')[['params', 'mean_test_score', 'std_test_score']]

# Plot the top 10 configurations
top_10_sorted = top_10.sort_values('mean_test_score', ascending=True)
y_pos = np.arange(len(top_10_sorted))

ax4.barh(y_pos, top_10_sorted['mean_test_score'], xerr=top_10_sorted['std_test_score'], 
        alpha=0.8, color='green', edgecolor='black')
ax4.set_yticks(y_pos)
ax4.set_yticklabels([f"Config {i+1}" for i in range(len(top_10_sorted))], fontsize=9)
ax4.set_xlabel('ROC-AUC Score')
ax4.set_title('Top 10 Configurations (Grid Search)', fontweight='bold', fontsize=12)
ax4.axvline(x=grid_search.best_score_, color='red', linestyle='--', linewidth=2, label='Best')
ax4.legend()
ax4.grid(True, alpha=0.3, axis='x')

plt.tight_layout()
plt.show()

print("\n" + "="*70)
print("✅ PIPELINE COMPLETE")
print("="*70)
print("✔️  Comparison of validation strategies")
print("✔️  Evaluation with multiple metrics")
print("✔️  Confusion matrix and ROC curve")
print("✔️  Hyperparameter optimization with Grid Search")
print("\n💡 Best practices applied:")
print("   • Data normalization")
print("   • Stratified K-Fold to preserve the class distribution")
print("   • Multiple metrics for a complete evaluation")
print("   • Cross-validation during hyperparameter optimization")

## 3. Conclusions

### 📚 Key Concepts
1. **Overfitting/Underfitting**: Balance between complexity and generalization
2. **Cross-Validation**: K-Fold, Stratified, Time Series Split
3. **Metrics**: Accuracy, Precision, Recall, F1, ROC-AUC, MSE, R²
4. **Optimization**: Grid Search, Random Search, Nested CV
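
As a quick recap of how the classification metrics listed above relate to each other, the sketch below derives them by hand from the four confusion-matrix counts and compares the results with scikit-learn. The label vectors are made-up toy values for illustration only.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Toy labels (assumed): 4 actual positives, 6 actual negatives
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 1, 1, 0, 0, 0, 0])

# For binary labels {0, 1}, ravel() yields the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # of predicted positives, fraction correct
recall = tp / (tp + fn)      # of actual positives, fraction detected
f1 = 2 * precision * recall / (precision + recall)

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")

# The hand-computed values should agree with scikit-learn's implementations
print(np.isclose(precision, precision_score(y_true, y_pred)),
      np.isclose(recall, recall_score(y_true, y_pred)),
      np.isclose(f1, f1_score(y_true, y_pred)))
```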

### ✅ Best Practices
- Use Stratified K-Fold for imbalanced data
- Evaluate with multiple metrics (not just accuracy)
- Use Nested CV for an unbiased evaluation when hyperparameters are tuned
- Scale features for distance-based models
- Use Time Series Split for temporal data
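
Two of these practices can be combined in a few lines. The following is a minimal sketch, not a definitive recipe: the choice of `LogisticRegression`, the grid of `C` values, and the fold counts are assumptions made for this example. Placing `StandardScaler` inside a `Pipeline` means it is refit on the training portion of every fold, and wrapping `GridSearchCV` in an outer `cross_val_score` gives a nested CV estimate in which tuning never sees the outer test fold.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Scaling lives inside the pipeline, so it is fit only on training folds
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}  # illustrative grid

inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)  # tuning
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)  # evaluation

# Inner loop: hyperparameter search. Outer loop: performance estimate.
search = GridSearchCV(pipe, param_grid, cv=inner_cv, scoring="roc_auc")
nested_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="roc_auc")

print(f"Nested CV ROC-AUC: {nested_scores.mean():.4f} +/- {nested_scores.std():.4f}")
```

The nested score is typically a little lower than `best_score_` from a single grid search, because the latter is computed on the same folds that were used to pick the hyperparameters.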

### 🎯 Exercises
1. Implement Nested CV and compare it with simple CV
2. Compare Grid Search and Random Search on a complex problem
3. Analyze the Precision-Recall trade-off in imbalanced classification
4. Plot learning curves to detect overfitting/underfitting
5. Implement temporal validation for time series

### 📖 Resources
- Scikit-learn: Model Selection https://scikit-learn.org/stable/model_selection.html
- "The Elements of Statistical Learning" - Hastie et al.
- "Hands-On Machine Learning" - Aurélien Géron

---

**🎓 Next**: [Deep Learning](../04_Deep_learning/deep_learning.ipynb)