# 🏀 NBA Playoffs Simulator - Model Calibration (v2)
**Project:** I simulated the NBA Playoffs thousands of times… and found an unexpected contender

**Notebook 03:** Training, validating, and calibrating the XGBoost model

In this notebook we:
1. Select the **most predictive features** (less is more)
2. Train an **XGBoost model tuned** for small datasets
3. Validate with **Leave-One-Season-Out** to check generalization
4. Assess probability calibration and run a **historical backtest**
5. Export the model, ready for the Monte Carlo simulation

In [None]:
# ============================================================
# SETUP: dependencies and data loading
# ============================================================
!pip install xgboost --quiet

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
import os
import pickle

from xgboost import XGBClassifier
from sklearn.model_selection import (
    LeaveOneGroupOut,
    StratifiedKFold
)
from sklearn.metrics import (
    accuracy_score,
    brier_score_loss,
    log_loss,
    roc_auc_score
)
from sklearn.calibration import calibration_curve, CalibratedClassifierCV
from sklearn.feature_selection import mutual_info_classif

warnings.filterwarnings('ignore')
pd.set_option('display.max_columns', None)
plt.style.use('dark_background')

print('✅ Dependencies loaded')

In [None]:
# ============================================================
# Mount Drive and load the data from Notebook 02
# ============================================================
from google.colab import drive
drive.mount('/content/drive')

PROJECT_DIR = '/content/drive/MyDrive/nba-playoffs-simulator'
DATA_DIR = f'{PROJECT_DIR}/data'

# Load datasets
df_training = pd.read_csv(f'{DATA_DIR}/training_matchups.csv')
df_profiles = pd.read_csv(f'{DATA_DIR}/team_profiles_2026.csv')

# Load the full feature list
with open(f'{DATA_DIR}/feature_columns.txt', 'r') as f:
    ALL_FEATURE_COLS = [line.strip() for line in f if line.strip()]

# Historical seasons
HISTORICAL_SEASONS = [
    '2015-16', '2016-17', '2017-18', '2018-19', '2019-20',
    '2020-21', '2021-22', '2022-23', '2023-24', '2024-25'
]

print('✅ Data loaded:')
print(f'  → Training set:    {df_training.shape}')
print(f'  → Team profiles:   {df_profiles.shape}')
print(f'  → All features ({len(ALL_FEATURE_COLS)}): {ALL_FEATURE_COLS}')

---
## 🔬 Section 1: Feature Selection (less is more)

With only ~150 training series, 14 features is too many:
the model memorizes noise instead of learning real patterns.

**ML rule of thumb:** with N samples, use no more than about N/20 to N/10 features.
With 150 samples that is roughly 7 to 15, and we stay at or below the conservative end: **5 to 8 features at most**.

We select the ones that actually matter using:
1. Correlation with the target
2. Mutual information (captures non-linear relationships)
3. Domain knowledge (what we know about basketball)
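
To see why it helps to rank by both scores, here is a small sketch on synthetic data (not the playoff dataset). It uses a simple binned estimate of mutual information, whereas the notebook relies on scikit-learn's `mutual_info_classif`; the helper name `binned_mutual_info` and the bin edges are assumptions made for illustration.

```python
import numpy as np

def binned_mutual_info(x_bins, y):
    """Mutual information (in nats) between two discrete integer arrays."""
    x_bins = np.asarray(x_bins)
    y = np.asarray(y)
    mi = 0.0
    for xv in np.unique(x_bins):
        p_x = np.mean(x_bins == xv)
        for yv in np.unique(y):
            p_y = np.mean(y == yv)
            p_xy = np.mean((x_bins == xv) & (y == yv))
            if p_xy > 0:
                mi += p_xy * np.log(p_xy / (p_x * p_y))
    return mi

# Synthetic U-shaped relationship: the label is 1 when |x| is large
x = np.linspace(-1, 1, 200)
label = (np.abs(x) > 0.5).astype(int)

pearson = np.corrcoef(x, label)[0, 1]          # measures linear association only
x_binned = np.digitize(x, [-0.5, 0.0, 0.5])    # 4 bins (illustrative edges)
mi = binned_mutual_info(x_binned, label)

print(f"Pearson correlation: {pearson:.3f}")
print(f"Binned mutual info:  {mi:.3f} nats")
```

On this toy example the Pearson correlation is essentially zero while the mutual information equals ln 2 ≈ 0.693 nats, so a correlation-only ranking would discard a feature that fully determines the label.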

In [None]:
# ============================================================
# 1.1 - Prepare the full dataset
# ============================================================
available_features = [c for c in ALL_FEATURE_COLS if c in df_training.columns]

X_all = df_training[available_features].fillna(0)
y = df_training['team_a_won'].copy()
groups = df_training['season'].copy()

print(f'üìã Dataset: {X_all.shape[0]} series √ó {X_all.shape[1]} features')
print(f'üìä Balance: {y.mean():.1%} favorito gana | {(1-y).mean():.1%} upset')
print(f'üìä Temporadas: {groups.nunique()}')

In [None]:
# ============================================================
# 1.2 - Rank features by predictive power
# ============================================================

# Correlation with the target
correlations = X_all.corrwith(y).abs().sort_values(ascending=False)

# Mutual information (captures non-linear relationships)
mi_scores = mutual_info_classif(X_all, y, random_state=42)
mi_series = pd.Series(mi_scores, index=available_features).sort_values(ascending=False)

# Combine both rankings (pandas aligns the two Series on feature name)
ranking = pd.DataFrame({
    'correlation': correlations,
    'mutual_info': mi_series
})

# Normalize each score by its maximum, then average
for col in ['correlation', 'mutual_info']:
    ranking[f'{col}_norm'] = ranking[col] / ranking[col].max()
ranking['combined_score'] = (
    0.5 * ranking['correlation_norm'] + 0.5 * ranking['mutual_info_norm']
)
ranking = ranking.sort_values('combined_score', ascending=False)

print('üìä Ranking de features por poder predictivo:\n')
for feat, row in ranking.iterrows():
    bar = '‚ñà' * int(row['combined_score'] * 30)
    print(f'  {feat:<24} Corr: {row["correlation"]:.3f}  '
          f'MI: {row["mutual_info"]:.3f}  '
          f'Score: {row["combined_score"]:.3f}  {bar}')

In [None]:
# ============================================================
# 1.3 - Select top features + validate with LOSO
# ============================================================
# Try the top 2 through 10 ranked features to find the sweet spot

logo = LeaveOneGroupOut()
baseline_acc = y.mean()  # benchmark: always pick the favorite (team A)

print(f'📊 Benchmark (always favorite): {baseline_acc:.3f}\n')
print('🔍 Searching for the optimal number of features...\n')

ranked_features = ranking.index.tolist()
results_by_n = []

for n_feat in range(2, min(len(ranked_features), 10) + 1):
    selected = ranked_features[:n_feat]
    X_sel = X_all[selected]

    oof_preds = np.zeros(len(y))
    oof_probs = np.zeros(len(y))

    for train_idx, test_idx in logo.split(X_sel, y, groups):
        temp_model = XGBClassifier(
            n_estimators=50, max_depth=2, learning_rate=0.05,
            subsample=0.7, colsample_bytree=0.8,
            reg_alpha=2.0, reg_lambda=3.0, min_child_weight=5,
            gamma=0.5, objective='binary:logistic',
            eval_metric='logloss', random_state=42
        )
        temp_model.fit(X_sel.iloc[train_idx], y.iloc[train_idx])
        oof_preds[test_idx] = temp_model.predict(X_sel.iloc[test_idx])
        oof_probs[test_idx] = temp_model.predict_proba(X_sel.iloc[test_idx])[:, 1]

    acc = accuracy_score(y, oof_preds)
    brier = brier_score_loss(y, oof_probs)
    auc = roc_auc_score(y, oof_probs)

    results_by_n.append({
        'n_features': n_feat,
        'features': selected,
        'accuracy': acc,
        'brier': brier,
        'auc': auc,
        'vs_baseline': acc - baseline_acc
    })

    marker = '✅' if acc > baseline_acc else '❌'
    print(f'  {marker} {n_feat} features → Acc: {acc:.3f} ({acc - baseline_acc:+.3f} vs baseline)  '
          f'Brier: {brier:.4f}  AUC: {auc:.3f}')

df_feat_search = pd.DataFrame(results_by_n)

In [None]:
# ============================================================
# 1.4 - Visualize the search for the optimal feature count
# ============================================================

fig, axes = plt.subplots(1, 3, figsize=(18, 6))

# Accuracy vs n_features
ax1 = axes[0]
ax1.plot(df_feat_search['n_features'], df_feat_search['accuracy'],
         'o-', color='#64B5F6', linewidth=2, markersize=8)
ax1.axhline(y=baseline_acc, color='#FF5252', linestyle='--',
            linewidth=2, label=f'Baseline ({baseline_acc:.0%})')
ax1.set_xlabel('Number of features', fontsize=12)
ax1.set_ylabel('Accuracy (LOSO)', fontsize=12)
ax1.set_title('Accuracy vs Features', fontsize=13, fontweight='bold')
ax1.legend(fontsize=10)

# Brier Score vs n_features
ax2 = axes[1]
ax2.plot(df_feat_search['n_features'], df_feat_search['brier'],
         's-', color='#FFB74D', linewidth=2, markersize=8)
ax2.set_xlabel('Number of features', fontsize=12)
ax2.set_ylabel('Brier Score (lower = better)', fontsize=12)
ax2.set_title('Calibration vs Features', fontsize=13, fontweight='bold')

# AUC vs n_features
ax3 = axes[2]
ax3.plot(df_feat_search['n_features'], df_feat_search['auc'],
         'D-', color='#81C784', linewidth=2, markersize=8)
ax3.axhline(y=0.5, color='#FF5252', linestyle='--',
            linewidth=1, alpha=0.5, label='Random (0.5)')
ax3.set_xlabel('Number of features', fontsize=12)
ax3.set_ylabel('ROC AUC', fontsize=12)
ax3.set_title('Discrimination vs Features', fontsize=13, fontweight='bold')
ax3.legend(fontsize=10)

plt.suptitle('How many features does the model need?',
             fontsize=15, fontweight='bold', y=1.03)
plt.tight_layout()
plt.savefig('feature_selection.png', dpi=150, bbox_inches='tight',
            facecolor='black')
plt.show()

print('\nüí° Buscamos el punto donde: Accuracy > baseline, Brier m√°s bajo, AUC m√°s alto')

In [None]:
# ============================================================
# 1.5 - Pick the best configuration
# ============================================================

# Criterion: best Brier score among configurations that match or beat
# the baseline accuracy; if none does, fall back to the best Brier overall
beats_baseline = df_feat_search[df_feat_search['accuracy'] >= baseline_acc]

if len(beats_baseline) > 0:
    best_row = beats_baseline.loc[beats_baseline['brier'].idxmin()]
    print('✅ Found configurations that match or beat the baseline.\n')
else:
    # No configuration reaches the baseline on raw accuracy, so prioritize
    # calibration: a model with a good Brier score yields reliable
    # probabilities even if it does not always pick the binary winner
    best_row = df_feat_search.loc[df_feat_search['brier'].idxmin()]
    print('⚠️ No configuration beats the baseline on raw accuracy.')
    print('   That is OK: for the Monte Carlo simulation what matters')
    print('   is the CALIBRATION of the probabilities, not the binary prediction.')
    print('   A model with a good Brier score produces reliable simulations.\n')

FEATURE_COLS = best_row['features']
N_BEST = int(best_row['n_features'])

print(f'üéØ Mejor configuraci√≥n: {N_BEST} features')
print(f'   Accuracy: {best_row["accuracy"]:.3f} (baseline: {baseline_acc:.3f})')
print(f'   Brier:    {best_row["brier"]:.4f}')
print(f'   AUC:      {best_row["auc"]:.3f}')
print(f'\nüìã Features seleccionados:')
for i, feat in enumerate(FEATURE_COLS, 1):
    print(f'   {i}. {feat}')

---
## 🤖 Section 2: Train the tuned XGBoost

Now we train on the selected features with **aggressively regularized
hyperparameters** suited to a small dataset:

| Parameter | Value | Why |
|---|---|---|
| `n_estimators` | 50 | Few trees → less memorization |
| `max_depth` | 2 | Very shallow trees → simple patterns |
| `learning_rate` | 0.05 | Slow learning → more conservative |
| `min_child_weight` | 5 | Each leaf needs a hessian sum ≥ 5 (at least ~20 samples with logistic loss) |
| `gamma` | 0.5 | Penalizes splits that barely reduce the loss |
| `reg_alpha/lambda` | 2.0/3.0 | Strong L1 + L2 regularization |
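
One nuance on `min_child_weight`: XGBoost defines it as the minimum sum of instance hessians in a child, not a raw sample count. For `binary:logistic`, each sample contributes p(1-p), which is at most 0.25. The back-of-the-envelope sketch below assumes every sample in a leaf shares the same predicted probability; the helper name `min_samples_per_leaf` is hypothetical.

```python
import math

def min_samples_per_leaf(min_child_weight, p):
    """Smallest number of samples whose logistic hessians p*(1-p) sum to at
    least min_child_weight, assuming every sample has predicted probability p."""
    hessian = p * (1.0 - p)
    return math.ceil(min_child_weight / hessian)

for p in (0.5, 0.65, 0.8):
    print(f"p = {p:.2f} -> at least {min_samples_per_leaf(5, p)} samples per leaf")
```

With ~150 series, a floor of 20 or more samples per leaf is a strong constraint, which fits the aggressive regularization described above.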

In [None]:
# ============================================================
# 2.1 - Train the final model
# ============================================================
X = X_all[FEATURE_COLS].copy()

model = XGBClassifier(
    n_estimators=50,
    max_depth=2,
    learning_rate=0.05,
    subsample=0.7,
    colsample_bytree=0.8,
    reg_alpha=2.0,
    reg_lambda=3.0,
    min_child_weight=5,
    gamma=0.5,
    objective='binary:logistic',
    eval_metric='logloss',
    random_state=42
)

model.fit(X, y)

# Training performance (for reference only)
y_pred_train = model.predict(X)
y_prob_train = model.predict_proba(X)[:, 1]

print('✅ Model trained')
print('\n📊 Training performance (reference):')
print(f'  Accuracy:    {accuracy_score(y, y_pred_train):.3f}')
print(f'  Brier Score: {brier_score_loss(y, y_prob_train):.4f}')
print(f'  ROC AUC:     {roc_auc_score(y, y_prob_train):.3f}')
print('\n💡 With more regularization, training accuracy drops but')
print('   generalization tends to improve. That is the trade-off we want.')

In [None]:
# ============================================================
# 2.2 - Feature Importance
# ============================================================

importance = pd.DataFrame({
    'feature': FEATURE_COLS,
    'importance': model.feature_importances_
}).sort_values('importance', ascending=True)

fig, ax = plt.subplots(figsize=(10, max(5, len(FEATURE_COLS) * 0.6)))
colors = plt.cm.YlOrRd(np.linspace(0.3, 1, len(importance)))
ax.barh(importance['feature'], importance['importance'], color=colors)
ax.set_xlabel('Importance', fontsize=12)
ax.set_title('Which features matter most for winning a playoff series?',
             fontsize=14, fontweight='bold', pad=15)

top_feat = importance.iloc[-1]
ax.annotate('← Most predictive',
            xy=(top_feat['importance'], top_feat['feature']),
            xytext=(top_feat['importance'] * 0.6, len(importance) - 1.5),
            fontsize=11, color='#FFD700', fontweight='bold',
            arrowprops=dict(arrowstyle='->', color='#FFD700', lw=1.5))

plt.tight_layout()
plt.savefig('feature_importance.png', dpi=150, bbox_inches='tight',
            facecolor='black')
plt.show()

print('\nüìä Importancia por feature:')
for _, row in importance.iloc[::-1].iterrows():
    bar = '‚ñà' * int(row['importance'] * 50)
    print(f'  {row["feature"]:<24} {row["importance"]:.3f}  {bar}')

---
## 🧪 Section 3: Model Validation

**Leave-One-Season-Out (LOSO):** train on 9 seasons, predict the remaining one,
and repeat for every season. This mimics the real task of predicting future playoffs.

With fewer features and stronger regularization, the model should now generalize better.
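
The notebook delegates the splitting to scikit-learn's `LeaveOneGroupOut`. The sketch below re-implements the same idea with NumPy only, on toy season labels, to make explicit which rows end up in each fold; the generator name `leave_one_season_out` is hypothetical.

```python
import numpy as np

def leave_one_season_out(seasons):
    """Yield (held_out_season, train_idx, test_idx), holding out one season per fold."""
    seasons = np.asarray(seasons)
    for season in np.unique(seasons):
        test_mask = seasons == season
        yield season, np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

# Toy labels: 3 seasons with 2 series each (illustrative, not the real data)
toy_seasons = ['2021-22', '2021-22', '2022-23', '2022-23', '2023-24', '2023-24']

folds = list(leave_one_season_out(toy_seasons))
for season, train_idx, test_idx in folds:
    print(f"hold out {season}: train on rows {train_idx.tolist()}, "
          f"test on rows {test_idx.tolist()}")
```

Because every series from the held-out season is removed from training together, no information from that postseason can leak into the model that predicts it.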

In [None]:
# ============================================================
# 3.1 - Leave-One-Season-Out Cross Validation
# ============================================================

logo = LeaveOneGroupOut()

oof_predictions = np.zeros(len(y))
oof_probabilities = np.zeros(len(y))
season_results = []

print('üß™ Leave-One-Season-Out Cross Validation:\n')

for train_idx, test_idx in logo.split(X, y, groups):
    X_train, X_test = X.iloc[train_idx], X.iloc[test_idx]
    y_train, y_test = y.iloc[train_idx], y.iloc[test_idx]
    test_season = groups.iloc[test_idx].values[0]

    temp_model = XGBClassifier(
        n_estimators=50, max_depth=2, learning_rate=0.05,
        subsample=0.7, colsample_bytree=0.8,
        reg_alpha=2.0, reg_lambda=3.0, min_child_weight=5,
        gamma=0.5, objective='binary:logistic',
        eval_metric='logloss', random_state=42
    )
    temp_model.fit(X_train, y_train)

    preds = temp_model.predict(X_test)
    probs = temp_model.predict_proba(X_test)[:, 1]

    oof_predictions[test_idx] = preds
    oof_probabilities[test_idx] = probs

    acc = accuracy_score(y_test, preds)
    n_series = len(y_test)
    correct = int((preds == y_test.values).sum())

    season_results.append({
        'season': test_season,
        'n_series': n_series,
        'correct': correct,
        'accuracy': acc
    })

    marker = '✅' if acc >= baseline_acc else '⚠️'
    print(f'  {marker} {test_season}: {correct}/{n_series} correct ({acc:.0%})')

# Global metrics
oof_acc = accuracy_score(y, oof_predictions)
oof_brier = brier_score_loss(y, oof_probabilities)
oof_auc = roc_auc_score(y, oof_probabilities)

# Build the separator outside the f-string: reusing the same quote type
# inside an f-string is a syntax error before Python 3.12
print('\n' + '=' * 50)
print('📊 Global metrics (out-of-fold):')
print(f'  Accuracy:    {oof_acc:.3f}  (baseline: {baseline_acc:.3f})')
print(f'  Brier Score: {oof_brier:.4f}  (lower = better calibration)')
print(f'  ROC AUC:     {oof_auc:.3f}  (> 0.5 = better than random)')
print(f'\n📊 Improvement vs baseline: {oof_acc - baseline_acc:+.3f}')

if oof_acc >= baseline_acc:
    print('\n✅ The model matches or beats the baseline')
else:
    diff = baseline_acc - oof_acc
    print(f'\n💡 The model is {diff:.3f} below the baseline in accuracy.')
    print('   But remember: for Monte Carlo, CALIBRATION (Brier) matters')
    print('   more than binary accuracy. A model that says "65% for A"')
    print('   is more useful than one that always says "100% for the favorite".')

In [None]:
# ============================================================
# 3.2 - Results by season (plot)
# ============================================================
df_season_results = pd.DataFrame(season_results)

fig, ax = plt.subplots(figsize=(12, 6))

colors_bars = ['#00E676' if acc >= baseline_acc else '#64B5F6'
               for acc in df_season_results['accuracy']]

bars = ax.bar(df_season_results['season'], df_season_results['accuracy'],
              color=colors_bars, edgecolor='white', linewidth=0.5)

ax.axhline(y=baseline_acc, color='#FF5252', linestyle='--', linewidth=2,
           label=f'Baseline: always favorite ({baseline_acc:.0%})')
ax.axhline(y=oof_acc, color='#FFD700', linestyle='--', linewidth=2,
           label=f'Model average ({oof_acc:.0%})')

for bar, row in zip(bars, df_season_results.itertuples()):
    ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.02,
            f'{row.correct}/{row.n_series}',
            ha='center', va='bottom', fontsize=9, fontweight='bold')

ax.set_ylim(0, 1.15)
ax.set_ylabel('Accuracy', fontsize=12)
ax.set_title('Leave-One-Season-Out validation (tuned model)',
             fontsize=13, fontweight='bold', pad=15)
ax.legend(fontsize=10, loc='upper left')
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig('validation_by_season.png', dpi=150, bbox_inches='tight',
            facecolor='black')
plt.show()

---
## 📐 Section 4: Probability Calibration

For the Monte Carlo simulation, calibration is **more important than accuracy**.

Why? Because Monte Carlo needs trustworthy probabilities:
- If the model says "60% for OKC", then in similar situations
  OKC should win ~60% of the time.
- A model that always says "the favorite wins with 100%" can have high accuracy but
  terrible probabilities → the simulation would be useless.

The **Brier Score** captures this: it is the mean squared error between the predicted probability and the 0/1 outcome, so lower values mean more trustworthy probabilities.
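
As a worked illustration of the two forecasters described above, the sketch below scores them on a toy sample in which the favorite wins 65 of 100 series. The numbers are illustrative, not taken from the training data, and `brier_score` is a hand-rolled helper rather than scikit-learn's `brier_score_loss`.

```python
def brier_score(outcomes, probs):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(outcomes, probs)) / len(outcomes)

# Toy sample: the favorite wins 65 of 100 series (illustrative numbers)
outcomes = [1] * 65 + [0] * 35

always_certain = [1.0] * 100   # "the favorite wins 100% of the time"
honest_65 = [0.65] * 100       # "the favorite wins 65% of the time"

brier_certain = brier_score(outcomes, always_certain)
brier_honest = brier_score(outcomes, honest_65)

print(f"Always 100%: Brier = {brier_certain:.4f}")
print(f"Always 65%:  Brier = {brier_honest:.4f}")
```

Both forecasters pick the favorite every time, so their accuracy is identical (65%), yet the Brier score drops from 0.35 to 0.2275 for the one that reports an honest 65%.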

In [None]:
# ============================================================
# 4.1 - Calibration Plot
# ============================================================

n_bins = 5
prob_true, prob_pred = calibration_curve(
    y, oof_probabilities, n_bins=n_bins, strategy='uniform'
)

fig, axes = plt.subplots(1, 2, figsize=(16, 7))

# Calibration curve
ax1 = axes[0]
ax1.plot([0, 1], [0, 1], 'w--', linewidth=1, alpha=0.5, label='Perfect calibration')
ax1.plot(prob_pred, prob_true, 's-', color='#64B5F6', linewidth=2,
         markersize=10, label='XGBoost')

x_line = np.linspace(0, 1, 100)
ax1.fill_between(x_line, x_line - 0.15, x_line + 0.15,
                 alpha=0.1, color='#00E676', label='Acceptable band (±0.15)')

ax1.set_xlabel('Predicted probability', fontsize=12)
ax1.set_ylabel('Observed win frequency', fontsize=12)
ax1.set_title('Calibration Plot', fontsize=14, fontweight='bold')
ax1.legend(fontsize=10)
ax1.set_xlim(0, 1)
ax1.set_ylim(0, 1)

# Distribution of probabilities
ax2 = axes[1]
ax2.hist(oof_probabilities[y == 1], bins=12, alpha=0.7,
         label='Favorite WON', color='#00E676')
ax2.hist(oof_probabilities[y == 0], bins=12, alpha=0.7,
         label='Favorite LOST', color='#FF5252')
ax2.set_xlabel('Predicted probability for the favorite', fontsize=12)
ax2.set_ylabel('Count', fontsize=12)
ax2.set_title('Distribution of predicted probabilities', fontsize=14, fontweight='bold')
ax2.legend(fontsize=10)

plt.tight_layout()
plt.savefig('calibration_plot.png', dpi=150, bbox_inches='tight',
            facecolor='black')
plt.show()

print(f'\nüìä Brier Score: {oof_brier:.4f}')
print(f'   ‚Üí < 0.25: aceptable')
print(f'   ‚Üí < 0.20: bueno')
print(f'   ‚Üí < 0.15: muy bueno')
print(f'\nüìä Rango de probabilidades predichas:')
print(f'   Min: {oof_probabilities.min():.3f}')
print(f'   Max: {oof_probabilities.max():.3f}')
print(f'   Std: {oof_probabilities.std():.3f}')
print(f'\nüí° Queremos que las probabilidades tengan RANGO (variaci√≥n).')
print(f'   Si todas est√°n entre 0.65-0.75, el modelo no discrimina bien.')
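
The calibration curve above comes from scikit-learn's `calibration_curve` with uniform bins. The NumPy sketch below follows the same binning idea by hand and also reports how many series fall in each bin, which matters with only ~150 series spread over 5 bins. The helper name `reliability_table` and the toy forecasts are assumptions for illustration; it is not meant as a drop-in replacement for the library call.

```python
import numpy as np

def reliability_table(y_true, y_prob, n_bins=5):
    """Per uniform-width bin: (mean predicted prob, observed win rate, count).
    Empty bins are skipped."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Assign each probability to a bin index in [0, n_bins - 1];
    # a probability of exactly 1.0 falls in the last bin
    bin_ids = np.clip(np.digitize(y_prob, edges[1:-1]), 0, n_bins - 1)
    rows = []
    for b in range(n_bins):
        in_bin = bin_ids == b
        if in_bin.any():
            rows.append((y_prob[in_bin].mean(), y_true[in_bin].mean(), int(in_bin.sum())))
    return rows

# Toy forecasts: 10 series predicted at 0.7 (7 won), 10 predicted at 0.5 (5 won)
probs = [0.7] * 10 + [0.5] * 10
wins = [1] * 7 + [0] * 3 + [1] * 5 + [0] * 5

table = reliability_table(wins, probs)
for mean_pred, win_rate, count in table:
    print(f"predicted {mean_pred:.2f} -> observed {win_rate:.2f} (n={count})")
```

A bin with only a handful of series can drift far from the diagonal by chance alone, so the counts help decide how much weight to give each point on the plot.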

---
## 🏆 Section 5: Historical Backtest

We simulate the playoffs of the **last 3 seasons** to check
that the model produces reasonable rankings.

Success criterion: the actual champion should appear in the **top 5**
by championship probability. That gives the results credibility for the video.
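
Because each ranking comes from a finite number of simulated brackets, every championship percentage carries some sampling noise. A rough way to size it, assuming independent simulation runs, is the binomial standard error; the helper name `mc_standard_error` is hypothetical. This covers simulation noise only, not errors in the model's probabilities.

```python
import math

def mc_standard_error(p_hat, n_simulations):
    """Standard error of a simulated proportion, treating runs as independent Bernoulli draws."""
    return math.sqrt(p_hat * (1.0 - p_hat) / n_simulations)

for n in (1_000, 5_000, 10_000):
    se = mc_standard_error(0.20, n)
    print(f"n={n:>6}: 20.0% +/- {100 * se:.2f} percentage points (1 s.e.)")
```

Two teams whose simulated title odds differ by less than a couple of standard errors should be read as roughly tied in the ranking.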

In [None]:
# ============================================================
# 5.1 - Simulation functions
# ============================================================

def simulate_series(prob_a_wins, n_games=7, rng=None):
    """
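    Simulate a best-of-`n_games` series (best of 7 by default).
    `prob_a_wins` is applied as Team A's win probability in each game,
    shifted up or down by the home-court adjustment.
    Returns True if Team A wins the series.
    """
    if rng is None:
        rng = np.random.default_rng()

    wins_a, wins_b = 0, 0
    games_to_win = (n_games // 2) + 1

    # 2-2-1-1-1 format: games 1, 2, 5, 7 are at home for A (the better seed)
    home_a_games = {1, 2, 5, 7}
    home_boost = 0.03  # home-court edge in the NBA playoffs: ~3 percentage points per game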

    game_num = 0
    while wins_a < games_to_win and wins_b < games_to_win:
        game_num += 1
        p = prob_a_wins + (home_boost if game_num in home_a_games else -home_boost)
        p = np.clip(p, 0.05, 0.95)

        if rng.random() < p:
            wins_a += 1
        else:
            wins_b += 1

    return wins_a >= games_to_win


def get_matchup_probability(team_a_stats, team_b_stats, feature_cols, model):
    """
    Compute the model's probability that Team A wins the series.
    """
    row = {}
    for feat in feature_cols:
        base_feat = feat.replace('_diff', '')
        if base_feat in team_a_stats.index and base_feat in team_b_stats.index:
            val_a = team_a_stats[base_feat]
            val_b = team_b_stats[base_feat]
            if pd.notna(val_a) and pd.notna(val_b):
                if base_feat in ['DEF_RATING', 'TM_TOV_PCT']:
                    row[feat] = val_b - val_a
                else:
                    row[feat] = val_a - val_b
            else:
                row[feat] = 0
        elif feat == 'seed_diff':
            seed_a = team_a_stats.get('SEED', team_a_stats.get('PlayoffRank', 4))
            seed_b = team_b_stats.get('SEED', team_b_stats.get('PlayoffRank', 4))
            row[feat] = seed_b - seed_a
        else:
            row[feat] = 0

    X_matchup = pd.DataFrame([row])[feature_cols]
    prob = model.predict_proba(X_matchup)[0][1]
    return prob


def simulate_playoffs(teams_east, teams_west, feature_cols, model,
                      n_simulations=10000, seed=42):
    """
    Simulate the full playoff bracket n_simulations times.
    """
    rng = np.random.default_rng(seed)

    r1_matchups = [(0, 7), (3, 4), (2, 5), (1, 6)]

    prob_cache = {}

    def get_prob(team_a, team_b):
        key = (team_a['TEAM_ID'], team_b['TEAM_ID'])
        if key not in prob_cache:
            prob_cache[key] = get_matchup_probability(
                team_a, team_b, feature_cols, model
            )
        return prob_cache[key]

    east_list = [teams_east.iloc[i] for i in range(len(teams_east))]
    west_list = [teams_west.iloc[i] for i in range(len(teams_west))]

    results = {team['TEAM_NAME']: {
        'champion': 0, 'finals': 0, 'conf_finals': 0, 'conf_semis': 0
    } for team in east_list + west_list}

    finals_matchups = []

    for sim in range(n_simulations):
        conf_winners = {}

        for conf_name, teams in [('East', east_list), ('West', west_list)]:
            # Round 1
            r1_winners = []
            for seed_a, seed_b in r1_matchups:
                prob = get_prob(teams[seed_a], teams[seed_b])
                a_wins = simulate_series(prob, rng=rng)
                winner = teams[seed_a] if a_wins else teams[seed_b]
                r1_winners.append(winner)

            # Conf Semis
            for w in r1_winners:
                results[w['TEAM_NAME']]['conf_semis'] += 1

            r2_winners = []
            for i in range(0, 4, 2):
                a, b = r1_winners[i], r1_winners[i+1]
                seed_a = a.get('SEED', a.get('PlayoffRank', 4))
                seed_b = b.get('SEED', b.get('PlayoffRank', 4))
                if seed_a <= seed_b:
                    prob = get_prob(a, b)
                    a_wins = simulate_series(prob, rng=rng)
                    winner = a if a_wins else b
                else:
                    prob = get_prob(b, a)
                    a_wins = simulate_series(prob, rng=rng)
                    winner = b if a_wins else a
                r2_winners.append(winner)

            # Conf Finals
            for w in r2_winners:
                results[w['TEAM_NAME']]['conf_finals'] += 1

            a, b = r2_winners[0], r2_winners[1]
            seed_a = a.get('SEED', a.get('PlayoffRank', 4))
            seed_b = b.get('SEED', b.get('PlayoffRank', 4))
            if seed_a <= seed_b:
                prob = get_prob(a, b)
                a_wins = simulate_series(prob, rng=rng)
                conf_winner = a if a_wins else b
            else:
                prob = get_prob(b, a)
                a_wins = simulate_series(prob, rng=rng)
                conf_winner = b if a_wins else a

            conf_winners[conf_name] = conf_winner

        # NBA Finals
        east_champ = conf_winners['East']
        west_champ = conf_winners['West']

        results[east_champ['TEAM_NAME']]['finals'] += 1
        results[west_champ['TEAM_NAME']]['finals'] += 1

        finals_matchups.append((east_champ['TEAM_NAME'], west_champ['TEAM_NAME']))

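        # Team A (home court) is the finalist with the higher NET_RATING here,
        # a proxy: the NBA awards Finals home court by regular-season record
        if east_champ.get('NET_RATING', 0) >= west_champ.get('NET_RATING', 0):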
            prob = get_prob(east_champ, west_champ)
            a_wins = simulate_series(prob, rng=rng)
            champion = east_champ if a_wins else west_champ
        else:
            prob = get_prob(west_champ, east_champ)
            a_wins = simulate_series(prob, rng=rng)
            champion = west_champ if a_wins else east_champ

        results[champion['TEAM_NAME']]['champion'] += 1

    df_results = pd.DataFrame(results).T
    for col in ['champion', 'finals', 'conf_finals', 'conf_semis']:
        df_results[f'{col}_pct'] = (df_results[col] / n_simulations * 100).round(2)

    df_results = df_results.sort_values('champion', ascending=False)

    return df_results, finals_matchups


print('✅ Simulation functions defined')
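
`simulate_series` applies `prob_a_wins` to every individual game, so the series-level win rate it produces is more extreme than the input. The closed-form sketch below ignores the home-court adjustment and assumes independent games; it is only meant to show the size of the effect. The helper name `series_win_prob` is hypothetical, and whether the model's output should be converted to a per-game probability first is a design decision for the author.

```python
from math import comb

def series_win_prob(p_game, wins_needed=4):
    """Probability of winning a first-to-`wins_needed` series when each game is
    won independently with probability p_game (no home-court adjustment)."""
    q = 1.0 - p_game
    return sum(
        comb(wins_needed - 1 + losses, losses) * p_game ** wins_needed * q ** losses
        for losses in range(wins_needed)
    )

for p in (0.50, 0.55, 0.60, 0.70):
    print(f"per-game {p:.2f} -> series {series_win_prob(p):.3f}")
```

For example, a per-game probability of 0.60 corresponds to winning a best-of-7 about 71% of the time, and 0.70 to about 87%.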

In [None]:
# ============================================================
# 5.2 - Set up the backtest
# ============================================================

BACKTEST_SEASONS = ['2022-23', '2023-24', '2024-25']
KNOWN_CHAMPIONS = {
    '2022-23': 'Denver Nuggets',
    '2023-24': 'Boston Celtics',
    '2024-25': None
}

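# Look up the 2024-25 champion from the training data.
# Note: this stores an abbreviation (team_a_abbr / team_b_abbr), while the
# backtest below matches champions against TEAM_NAME by substring.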
finals_2025 = df_training[
    (df_training['season'] == '2024-25') & (df_training['round'] == 4)
]
if len(finals_2025) > 0:
    row = finals_2025.iloc[0]
    KNOWN_CHAMPIONS['2024-25'] = (
        row['team_a_abbr'] if row['team_a_won'] == 1 else row['team_b_abbr']
    )

print('üèÜ Campeones conocidos para backtest:')
for s, c in KNOWN_CHAMPIONS.items():
    print(f'  {s}: {c or "(no encontrado)"}')

In [None]:
# ============================================================
# 5.3 - Run the backtest season by season
# ============================================================

df_hist_stats = pd.read_csv(f'{DATA_DIR}/historical_team_stats.csv')
df_hist_standings = pd.read_csv(f'{DATA_DIR}/historical_standings.csv')

print('üß™ Ejecutando backtest...\n')

for season in BACKTEST_SEASONS:
    print(f'‚îÅ‚îÅ‚îÅ {season} ‚îÅ‚îÅ‚îÅ')

    # Entrenar sin esta temporada
    mask_train = df_training['season'] != season
    X_bt = df_training.loc[mask_train, FEATURE_COLS]
    y_bt = df_training.loc[mask_train, 'team_a_won']

    bt_model = XGBClassifier(
        n_estimators=50, max_depth=2, learning_rate=0.05,
        subsample=0.7, colsample_bytree=0.8,
        reg_alpha=2.0, reg_lambda=3.0, min_child_weight=5,
        gamma=0.5, objective='binary:logistic',
        eval_metric='logloss', random_state=42
    )
    bt_model.fit(X_bt, y_bt)

    # Team profiles for that season
    season_stats = df_hist_stats[df_hist_stats['SEASON'] == season].copy()
    season_standings = df_hist_standings[df_hist_standings['SEASON'] == season].copy()

    if season_stats.empty:
        print(f'  ⚠️ No stats for {season}\n')
        continue

    if 'TeamID' in season_standings.columns:
        stand_merge = season_standings[['TeamID', 'Conference', 'PlayoffRank']].rename(
            columns={'TeamID': 'TEAM_ID'}
        )
        season_stats = season_stats.merge(stand_merge, on='TEAM_ID', how='left')

    if 'Conference' not in season_stats.columns or 'PlayoffRank' not in season_stats.columns:
        print(f'  ⚠️ No Conference/Seed for {season}\n')
        continue

    season_stats['SEED'] = season_stats['PlayoffRank'].astype(float)
    east = season_stats[season_stats['Conference'] == 'East'].nsmallest(8, 'SEED')
    west = season_stats[season_stats['Conference'] == 'West'].nsmallest(8, 'SEED')

    if len(east) < 8 or len(west) < 8:
        print('  ⚠️ Not enough teams\n')
        continue

    bt_results, _ = simulate_playoffs(
        east, west, FEATURE_COLS, bt_model,
        n_simulations=5000, seed=42
    )

    top5 = bt_results.head(5)
    champion = KNOWN_CHAMPIONS.get(season, '?')

    print('  Top 5 by championship probability:')
    for rank, (team, row) in enumerate(top5.iterrows(), 1):
        marker = ' 🏆' if champion and champion in team else ''
        print(f'    {rank}. {team:<28} {row["champion_pct"]:>6.1f}%{marker}')

    if champion:
        found = False
        for pos, (team, _) in enumerate(bt_results.iterrows(), 1):
            if champion in team:
                if pos <= 5:
                    print(f'  ✅ {champion} is in the Top {pos}')
                else:
                    print(f'  ⚠️ {champion} is at position #{pos}')
                found = True
                break
        if not found:
            print(f'  ❌ {champion} not found')
    print()

---
## 💾 Section 6: Export model and configuration

We save everything Notebook 04 (the final simulation) needs.

In [None]:
# ============================================================
# 6.1 - Save model, features, and metrics
# ============================================================
import shutil
import json

MODELS_DIR = f'{PROJECT_DIR}/models'
OUTPUTS_DIR = f'{PROJECT_DIR}/outputs'
os.makedirs(MODELS_DIR, exist_ok=True)
os.makedirs(OUTPUTS_DIR, exist_ok=True)

# Save the model
model_path = f'{MODELS_DIR}/xgb_playoff_model.pkl'
with open(model_path, 'wb') as f:
    pickle.dump(model, f)
print(f'✅ Model saved: {model_path}')

# Save the SELECTED features (not all of them)
features_path = f'{MODELS_DIR}/feature_columns.txt'
with open(features_path, 'w') as f:
    f.write('\n'.join(FEATURE_COLS))
print(f'✅ Features saved ({len(FEATURE_COLS)}): {features_path}')

# Save metrics
metrics = {
    'oof_accuracy': round(oof_acc, 4),
    'oof_brier_score': round(oof_brier, 4),
    'oof_roc_auc': round(oof_auc, 4),
    'baseline_accuracy': round(baseline_acc, 4),
    'n_training_samples': int(len(y)),
    'n_features': len(FEATURE_COLS),
    'features_used': list(FEATURE_COLS),
    'model_params': {
        'n_estimators': 50, 'max_depth': 2, 'learning_rate': 0.05,
        'subsample': 0.7, 'colsample_bytree': 0.8,
        'reg_alpha': 2.0, 'reg_lambda': 3.0,
        'min_child_weight': 5, 'gamma': 0.5
    }
}
with open(f'{MODELS_DIR}/validation_metrics.json', 'w') as f:
    json.dump(metrics, f, indent=2, default=str)
print('✅ Metrics saved')

# Save plots
for img in ['feature_importance.png', 'validation_by_season.png',
            'calibration_plot.png', 'feature_selection.png']:
    if os.path.exists(img):
        shutil.copy(img, f'{OUTPUTS_DIR}/{img}')
        print(f'✅ {img} → outputs/')

print('\n📁 Drive structure:')
for root, dirs, files in os.walk(PROJECT_DIR):
    level = root.replace(PROJECT_DIR, '').count(os.sep)
    indent = '  ' * level
    print(f'{indent}üìÅ {os.path.basename(root)}/')
    for file in sorted(files):
        print(f'{"  " * (level + 1)}üìÑ {file}')

---
## ✅ Summary: Model Calibration complete

### What we did:

**1. Feature selection:** fewer features generalize better here.
   With ~150 series, too many variables lead to overfitting.

**2. Tuned XGBoost:** aggressively regularized hyperparameters
   for a small dataset (max_depth=2, gamma=0.5, strong L1/L2).

**3. LOSO validation:** each season is predicted by a model that never saw it,
   mimicking the task of predicting future playoffs.

**4. Calibration:** we checked the out-of-fold probabilities with the Brier score
   and a calibration plot. For the simulation, calibration matters more than binary accuracy.

**5. Backtest:** we checked whether the actual champion lands near the top
   of the probabilistic ranking.

### 💡 Insight for the video:
The NBA playoffs are inherently unpredictable: even with data
and ML, beating the "the favorite always wins" benchmark is hard.
**That is exactly the narrative point:** we are not predicting ONE champion,
we are mapping the probability space to reveal how competitive the league is.

### ➡️ Next notebook: `04_simulation_and_viz.ipynb`
There we run the full Monte Carlo simulation with the current 16 teams
and generate the visualizations for the video. 🏆