# Complete Pipeline: AI Model to Optimize the Sharpe Ratio

This notebook runs the entire process, from data download to final backtesting.

## Steps:
1. Download ETF data (20 years)
2. Generate machine learning features
3. Train prediction models
4. Optimize the portfolio
5. Backtest the complete system
6. Visualize the results
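
The steps above correspond to the scripts that this notebook invokes later through `run_script_as_module`. As a small pre-flight sketch, one could check that those scripts exist before starting a long run. The `missing_scripts` helper is hypothetical and not part of the project, and it assumes the scripts sit in the current working directory.

```python
from pathlib import Path

# Script names as invoked later in this notebook, in execution order.
PIPELINE_SCRIPTS = [
    "download_etf_data.py",
    "generate_ml_features.py",
    "train_sharpe_predictor.py",
    "optimize_portfolio.py",
    "backtest_strategy.py",
]

def missing_scripts(base_dir="."):
    """Return the pipeline scripts that are not present in base_dir."""
    # Hypothetical helper, for illustration only.
    base = Path(base_dir)
    return [name for name in PIPELINE_SCRIPTS if not (base / name).is_file()]

missing = missing_scripts()
if missing:
    print("Missing scripts:", ", ".join(missing))
else:
    print("All pipeline scripts found")
```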

## Step 1: Install Dependencies

In [1]:
# Install dependencies if needed
# Uncomment the following line if you need to install packages
# !pip install -r requirements.txt

## Step 2: Import Libraries

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pickle
import os
import warnings
import sys
warnings.filterwarnings('ignore')

# Configure plot style (matplotlib raises OSError for an unknown style name)
try:
    plt.style.use('seaborn-v0_8-darkgrid')
except OSError:
    try:
        plt.style.use('seaborn-darkgrid')
    except OSError:
        plt.style.use('ggplot')
sns.set_palette("husl")

print("✅ Libraries imported successfully")

✅ Libraries imported successfully


## Step 3: Download ETF Data

In [3]:
# Helper used to run each pipeline script
import importlib.util

def run_script_as_module(script_path):
    """Run a Python script as a module and call its main() if it defines one."""
    spec = importlib.util.spec_from_file_location("module", script_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    if hasattr(module, 'main'):
        return module.main()
    return None

print("=" * 80)
print("STEP 3: DOWNLOADING ETF DATA")
print("=" * 80)

# Run the download
run_script_as_module('download_etf_data.py')

STEP 3: DOWNLOADING ETF DATA
ETF DATA DOWNLOAD - YAHOO FINANCE

Downloading data for 17 ETFs from 2006-02-01 to 2026-01-27...
--------------------------------------------------------------------------------
Downloading SPY - S&P 500 (US)...
  [OK] SPY: 5,026 days of returns computed
    Period: 2006-02-02 to 2026-01-26
    Years available: 20.0 years (requested: 20 years)
    Average daily return: 0.0484%
    Daily volatility: 1.2217%

Downloading QQQ - Nasdaq 100 (US Technology)...
  [OK] QQQ: 5,026 days of returns computed
    Period: 2006-02-02 to 2026-01-26
    Years available: 20.0 years (requested: 20 years)
    Average daily return: 0.0664%
    Daily volatility: 1.3841%

Downloading IWM - Russell 2000 (US Small Caps)...
  [OK] IWM: 5,026 days of returns computed
    Period: 2006-02-02 to 2026-01-26
    Years available: 20.0 years (requested: 20 years)
    Average daily return: 0.0428%
    Daily volatility:

({'SPY': Date
  2006-02-02 00:00:00-05:00   -1.160508
  2006-02-03 00:00:00-05:00   -0.496452
  2006-02-06 00:00:00-05:00    0.261311
  2006-02-07 00:00:00-05:00   -0.884667
  2006-02-08 00:00:00-05:00    0.908496
                                 ...   
  2026-01-20 00:00:00-05:00   -2.035676
  2026-01-21 00:00:00-05:00    1.154108
  2026-01-22 00:00:00-05:00    0.522316
  2026-01-23 00:00:00-05:00    0.036286
  2026-01-26 00:00:00-05:00    0.507813
  Name: Returns, Length: 5026, dtype: float64,
  'QQQ': Date
  2006-02-02 00:00:00-05:00   -1.708196
  2006-02-03 00:00:00-05:00   -1.231050
  2006-02-06 00:00:00-05:00   -0.268747
  2006-02-07 00:00:00-05:00   -0.441033
  2006-02-08 00:00:00-05:00    1.156781
                                 ...   
  2026-01-20 00:00:00-05:00   -2.124716
  2026-01-21 00:00:00-05:00    1.351846
  2026-01-22 00:00:00-05:00    0.726939
  2026-01-23 00:00:00-05:00    0.315736
  2026-01-26 00:00:00-05:00    0.440013
  Name: Returns, Length: 5026, dtype: float64

In [4]:
# Verify that the data was downloaded correctly
data_dir = 'data'
returns_file = os.path.join(data_dir, 'etf_returns_dict.pkl')

if os.path.exists(returns_file):
    with open(returns_file, 'rb') as f:
        returns_dict = pickle.load(f)
    
    print(f"\n✅ Data downloaded: {len(returns_dict)} ETFs")
    print("\nAvailable ETFs:")
    for symbol, returns in returns_dict.items():
        years = (returns.index.max() - returns.index.min()).days / 365.25
        print(f"  - {symbol}: {len(returns):,} days ({years:.1f} years)")
        print(f"    Period: {returns.index.min().date()} to {returns.index.max().date()}")
else:
    print("❌ Error: no downloaded data found")


✅ Data downloaded: 17 ETFs

Available ETFs:
  - SPY: 5,026 days (20.0 years)
    Period: 2006-02-02 to 2026-01-26
  - QQQ: 5,026 days (20.0 years)
    Period: 2006-02-02 to 2026-01-26
  - IWM: 5,026 days (20.0 years)
    Period: 2006-02-02 to 2026-01-26
  - EFA: 5,026 days (20.0 years)
    Period: 2006-02-02 to 2026-01-26
  - EEM: 5,026 days (20.0 years)
    Period: 2006-02-02 to 2026-01-26
  - VGK: 5,026 days (20.0 years)
    Period: 2006-02-02 to 2026-01-26
  - VPL: 5,026 days (20.0 years)
    Period: 2006-02-02 to 2026-01-26
  - LQD: 5,026 days (20.0 years)
    Period: 2006-02-02 to 2026-01-26
  - HYG: 4,728 days (18.8 years)
    Period: 2007-04-12 to 2026-01-26
  - TLT: 5,026 days (20.0 years)
    Period: 2006-02-02 to 2026-01-26
  - AGG: 5,026 days (20.0 years)
    Period: 2006-02-02 to 2026-01-26
  - SHY: 5,026 days (20.0 years)
    Period: 2006-02-02 to 2026-01-26
  - GLD: 5,026 days (20.0 years)
    Period: 2006-02-02 to 2026-01-26
  - SLV: 4,96
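
The listing above shows that HYG starts in April 2007 while most of the other ETFs start in February 2006, so the series differ in length. A minimal sketch of aligning such series on their common dates follows, assuming each entry is a pandas Series indexed by date, as in the output above. The `align_returns` helper and the toy data are hypothetical, and this is not necessarily how the project scripts handle the gap.

```python
import pandas as pd

def align_returns(returns_by_symbol):
    """Combine per-symbol return Series into one DataFrame on shared dates."""
    # Hypothetical helper: concatenate column-wise (dict keys become column
    # names), then keep only the rows where every symbol has a value.
    frame = pd.concat(returns_by_symbol, axis=1)
    return frame.dropna(how="any")

# Toy data for illustration only (not real ETF returns).
dates = pd.date_range("2024-01-01", periods=5, freq="D")
toy = {
    "AAA": pd.Series([0.1, -0.2, 0.3, 0.0, 0.5], index=dates),
    "BBB": pd.Series([0.4, 0.1, -0.1], index=dates[2:]),  # starts later
}
aligned = align_returns(toy)
print(aligned)
```

Dropping incomplete rows shortens every series to the youngest ETF's history, which is the simplest choice but not the only one.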

## Step 4: Generate Machine Learning Features

In [5]:
print("=" * 80)
print("STEP 4: GENERATING MACHINE LEARNING FEATURES")
print("=" * 80)

# Run feature generation
run_script_as_module('generate_ml_features.py')

STEP 4: GENERATING MACHINE LEARNING FEATURES
ML FEATURE GENERATION - GEOGRAPHIC DECISION MODEL

1. Loading return data...
   [OK] 17 ETFs loaded

2. Downloading additional data (VIX, DXY, EUR/USD, Yield Curve, Credit Spreads, FRED)...

DOWNLOADING ADDITIONAL DATA

1. Downloading VIX (Volatility)...
   [OK] VIX: 5027 observations

2. Downloading DXY (Dollar Index)...
   [OK] DXY: 5030 observations

3. Downloading EUR/USD...
  [OK] EURUSD=X: 5182 days of returns

4. Downloading Yield Curve (10Y Treasury)...
   [OK] TNX (10Y): 5024 observations
   [OK] Yield Spread: 5024 observations

5. Computing Credit Spread (HYG - LQD)...
   [OK] Credit Spread (HYG-LQD): 4728 observations

6. Attempting to download FRED indicators...
   [INFO] FRED_API_KEY is not set. To use FRED indicators:
         1. Get a free API key at: https://fred.stlouisfed.org/docs/api/api_key.html
         2. Set the environment variable: $env:FRED_API_KEY='tu

KeyboardInterrupt: 
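
The log above reports that `FRED_API_KEY` is not set. A minimal sketch of checking for the key from Python before launching the feature script follows. The `get_fred_api_key` helper is hypothetical, and how `generate_ml_features.py` itself reads the variable is not shown in this notebook.

```python
import os

def get_fred_api_key(env=None):
    """Return the FRED API key from the environment, or None if unset or blank."""
    # Hypothetical helper; `env` defaults to the real process environment.
    env = os.environ if env is None else env
    key = env.get("FRED_API_KEY", "").strip()
    return key or None

if get_fred_api_key() is None:
    print("[INFO] FRED_API_KEY is not set; FRED indicators would be unavailable")
else:
    print("[OK] FRED_API_KEY found")
```

Inside a notebook session, the variable can also be set for the current process by assigning to `os.environ["FRED_API_KEY"]` before running the script.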

In [None]:
# Verify that the ML dataset was generated correctly
ml_dataset_file = os.path.join(data_dir, 'ml_dataset.pkl')

if os.path.exists(ml_dataset_file):
    with open(ml_dataset_file, 'rb') as f:
        ml_dataset = pickle.load(f)
    
    print(f"\n✅ ML dataset generated: {ml_dataset.shape[0]:,} rows, {ml_dataset.shape[1]} columns")
    print(f"\nDate range: {ml_dataset.index.min()} to {ml_dataset.index.max()}")
    
    # Separate features and targets
    feature_cols = [col for col in ml_dataset.columns if not col.startswith('target_')]
    target_cols = [col for col in ml_dataset.columns if col.startswith('target_')]
    
    print(f"\nFeatures: {len(feature_cols)}")
    print(f"Targets: {len(target_cols)}")
    
    # Show a few features
    print("\nFirst 10 features:")
    for col in feature_cols[:10]:
        print(f"  - {col}")
    
    print("\nTargets:")
    for col in target_cols:
        print(f"  - {col}")
        
    # Show basic statistics
    n_missing = ml_dataset.isna().sum().sum()
    pct_missing = n_missing / ml_dataset.size * 100
    print("\nDataset statistics:")
    print(f"  Missing values: {n_missing:,} ({pct_missing:.2f}%)")
else:
    print("❌ Error: ML dataset not found")

## Step 5: Train Prediction Models

In [None]:
print("=" * 80)
print("STEP 5: TRAINING PREDICTION MODELS")
print("=" * 80)
print("\n[NOTE] This step may take several minutes...")

# Run training
run_script_as_module('train_sharpe_predictor.py')

In [None]:
# Verify that the models were trained correctly
models_dir = 'models'

if os.path.exists(models_dir):
    model_files = [f for f in os.listdir(models_dir) if f.startswith('sharpe_predictor_') and f.endswith('.pkl')]
    ensemble_files = [f for f in os.listdir(models_dir) if f.startswith('ensemble_') and f.endswith('.pkl')]
    
    print(f"\n✅ Trained models:")
    print(f"  Individual models: {len(model_files)}")
    print(f"  Ensembles: {len(ensemble_files)}")
    
    print("\nModels by geography:")
    geos_models = {}
    for model_file in model_files:
        parts = model_file.replace('sharpe_predictor_', '').replace('.pkl', '').split('_')
        geo = parts[0]
        model_type = '_'.join(parts[1:]) if len(parts) > 1 else 'unknown'
        if geo not in geos_models:
            geos_models[geo] = []
        geos_models[geo].append(model_type)
    
    for geo, model_types in geos_models.items():
        print(f"  - {geo}: {', '.join(model_types)}")
    
    # Load metrics if available
    import json
    metrics_files = [f for f in os.listdir(models_dir) if f.startswith('metrics_') and f.endswith('.json')]
    
    if len(metrics_files) > 0:
        print("\nModel metrics (validation R²):")
        for metrics_file in sorted(metrics_files):
            geo = metrics_file.replace('metrics_', '').replace('.json', '')
            with open(os.path.join(models_dir, metrics_file), 'r') as f:
                metrics = json.load(f)
            r2 = metrics.get('best_r2_val', 0)
            best_model = metrics.get('best_model', 'unknown')
            print(f"  - {geo}: R² = {r2:.4f} (best model: {best_model})")
else:
    print("❌ Error: no trained models found")

## Step 6: Optimize Portfolio (Current Prediction)

In [None]:
print("=" * 80)
print("STEP 6: OPTIMIZING PORTFOLIO")
print("=" * 80)

# Run optimization
run_script_as_module('optimize_portfolio.py')

In [None]:
# Visualize the optimized weights
weights_file = os.path.join(data_dir, 'portfolio_weights_latest.pkl')

if os.path.exists(weights_file):
    with open(weights_file, 'rb') as f:
        results = pickle.load(f)
    
    weights = results['weights']
    predictions = results['predictions']
    
    print("\n✅ Optimized portfolio weights:")
    
    # Create the bar charts
    fig, axes = plt.subplots(1, 2, figsize=(16, 6))
    
    # Chart 1: Weights
    ax1 = axes[0]
    geos = list(weights.keys())
    weights_values = [weights[geo] * 100 for geo in geos]
    colors = plt.cm.viridis(np.linspace(0, 1, len(geos)))
    
    bars = ax1.bar(geos, weights_values, color=colors)
    ax1.set_title('Optimized Portfolio Allocations', fontsize=14, fontweight='bold')
    ax1.set_ylabel('Weight (%)', fontsize=12)
    ax1.set_xlabel('Geography', fontsize=12)
    ax1.grid(True, alpha=0.3, axis='y')
    
    # Add value labels on the bars
    for bar, val in zip(bars, weights_values):
        height = bar.get_height()
        ax1.text(bar.get_x() + bar.get_width()/2., height,
                f'{val:.1f}%', ha='center', va='bottom', fontweight='bold')
    
    plt.setp(ax1.xaxis.get_majorticklabels(), rotation=45, ha='right')
    
    # Chart 2: Sharpe predictions
    ax2 = axes[1]
    pred_geos = list(predictions.keys())
    pred_values = [predictions[geo] for geo in pred_geos]
    
    bars2 = ax2.bar(pred_geos, pred_values, color=colors[:len(pred_geos)])
    ax2.axhline(y=0, color='red', linestyle='--', alpha=0.5, label='Neutral')
    ax2.set_title('Predicted Future Sharpe Ratio (20 days)', fontsize=14, fontweight='bold')
    ax2.set_ylabel('Predicted Sharpe Ratio', fontsize=12)
    ax2.set_xlabel('Geography', fontsize=12)
    ax2.grid(True, alpha=0.3, axis='y')
    ax2.legend()
    
    # Add value labels
    for bar, val in zip(bars2, pred_values):
        height = bar.get_height()
        ax2.text(bar.get_x() + bar.get_width()/2., height,
                f'{val:.3f}', ha='center', va='bottom' if val > 0 else 'top', fontweight='bold')
    
    plt.setp(ax2.xaxis.get_majorticklabels(), rotation=45, ha='right')
    
    plt.tight_layout()
    os.makedirs(os.path.join(data_dir, 'visualizations'), exist_ok=True)
    plt.savefig(os.path.join(data_dir, 'visualizations', 'portfolio_optimization.png'), dpi=300, bbox_inches='tight')
    plt.show()
    
    print(f"\n[OK] Chart saved to {data_dir}/visualizations/portfolio_optimization.png")
    
    # Show the weights table
    print("\nAllocation table:")
    print(f"{'Geography':<20} {'Weight (%)':<12} {'Predicted Sharpe':<16}")
    print("-" * 50)
    for geo in sorted(weights.keys()):
        weight_pct = weights[geo] * 100
        sharpe = predictions.get(geo, np.nan)
        print(f"{geo:<20} {weight_pct:>10.2f}% {sharpe:>14.4f}")
    print("-" * 50)
    print(f"{'TOTAL':<20} {sum(weights.values())*100:>10.2f}%")
else:
    print("❌ Error: no optimized weights found")

## Step 7: Backtest the Complete System

In [None]:
print("=" * 80)
print("STEP 7: BACKTESTING THE COMPLETE SYSTEM")
print("=" * 80)
print("\n[NOTE] This step may take several minutes...")

# Run backtesting
run_script_as_module('backtest_strategy.py')

In [None]:
# Load and display the backtesting results
backtest_results_file = os.path.join(data_dir, 'backtest_results.pkl')

if os.path.exists(backtest_results_file):
    with open(backtest_results_file, 'rb') as f:
        backtest_results = pickle.load(f)
    
    portfolio_returns = backtest_results['portfolio_returns']
    portfolio_metrics = backtest_results['metrics']
    comparison = backtest_results.get('comparison', {})
    
    print("\n✅ Backtesting results loaded")
    
    # Show metrics
    print("\n" + "=" * 80)
    print("OPTIMIZED PORTFOLIO METRICS")
    print("=" * 80)
    print(f"\nPeriod: {portfolio_returns.index.min()} to {portfolio_returns.index.max()}")
    print(f"Total Return:            {portfolio_metrics['total_return']*100:>8.2f}%")
    print(f"Annualized Return:       {portfolio_metrics['annualized_return']*100:>8.2f}%")
    print(f"Volatility:              {portfolio_metrics['volatility']:>8.2f}%")
    print(f"Sharpe Ratio:            {portfolio_metrics['sharpe_ratio']:>8.4f}")
    print(f"Max Drawdown:            {portfolio_metrics['max_drawdown']*100:>8.2f}%")
    print(f"Win Rate:                {portfolio_metrics['win_rate']*100:>8.2f}%")
    
    # Comparison with benchmarks
    if len(comparison) > 0:
        print("\n" + "=" * 80)
        print("COMPARISON WITH BENCHMARKS")
        print("=" * 80)
        print(f"\n{'Strategy':<25} {'Ann. Return':<12} {'Sharpe':<10} {'Drawdown':<10}")
        print("-" * 60)
        
        for strategy, metrics in comparison.items():
            ret = metrics['annualized_return'] * 100
            sharpe = metrics['sharpe_ratio']
            dd = metrics['max_drawdown'] * 100
            print(f"{strategy:<25} {ret:>10.2f}% {sharpe:>9.4f} {dd:>9.2f}%")
    
else:
    print("❌ Error: no backtesting results found")
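
The metrics printed by the cell above are read from `backtest_results.pkl`; how `backtest_strategy.py` computes them is not shown in this notebook. As a rough sketch, comparable figures could be derived as below, assuming daily returns in percent, 252 trading days per year, and a zero risk-free rate. All three are assumptions, and `summarize_returns` is a hypothetical name.

```python
import numpy as np
import pandas as pd

TRADING_DAYS = 252  # assumed annualization factor

def summarize_returns(returns_pct):
    """Summary metrics from daily returns given in percent.

    Hypothetical helper: zero risk-free rate, results returned as fractions.
    """
    r = returns_pct / 100.0
    equity = (1 + r).cumprod()
    total_return = equity.iloc[-1] - 1
    years = len(r) / TRADING_DAYS
    annualized = (1 + total_return) ** (1 / years) - 1
    volatility = r.std() * np.sqrt(TRADING_DAYS)
    sharpe = annualized / volatility if volatility > 0 else 0.0
    # Drawdown: relative drop of the equity curve from its running peak.
    max_drawdown = (equity / equity.cummax() - 1).min()
    win_rate = (r > 0).mean()
    return {
        "total_return": total_return,
        "annualized_return": annualized,
        "volatility": volatility,
        "sharpe_ratio": sharpe,
        "max_drawdown": max_drawdown,
        "win_rate": win_rate,
    }

toy = pd.Series([1.0, -0.5, 0.2, -1.0, 0.8])  # toy data, not real results
metrics_example = summarize_returns(toy)
for name, value in metrics_example.items():
    print(f"{name}: {value:.4f}")
```

Annualizing a five-day toy series exaggerates the numbers; the point is only the order of the calculations.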

## Step 8: Detailed Visualizations

In [None]:
# Generate the full set of visualizations
print("=" * 80)
print("STEP 8: GENERATING VISUALIZATIONS")
print("=" * 80)

# Load results if they are not already in memory
if 'backtest_results' not in locals():
    backtest_results_file = os.path.join(data_dir, 'backtest_results.pkl')
    with open(backtest_results_file, 'rb') as f:
        backtest_results = pickle.load(f)
portfolio_returns = backtest_results['portfolio_returns']
portfolio_metrics = backtest_results['metrics']  # needed by the SPY comparison cell

# Create the output directory
os.makedirs(os.path.join(data_dir, 'visualizations'), exist_ok=True)

# 1. Full portfolio analysis
cumulative = (1 + portfolio_returns / 100).cumprod()

fig, axes = plt.subplots(2, 2, figsize=(16, 12))

# Equity curve
ax1 = axes[0, 0]
ax1.plot(cumulative.index, cumulative.values, linewidth=2, label='Optimized Portfolio', color='blue')
ax1.axhline(y=1.0, color='gray', linestyle='--', alpha=0.5, label='Break-even')
ax1.set_title('Equity Curve - Optimized Portfolio', fontsize=14, fontweight='bold')
ax1.set_xlabel('Date')
ax1.set_ylabel('Cumulative Value (Base = 1.0)')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Daily returns
ax2 = axes[0, 1]
ax2.plot(portfolio_returns.index, portfolio_returns.values, alpha=0.6, linewidth=0.5, color='blue')
ax2.axhline(y=0, color='red', linestyle='--', alpha=0.5)
ax2.set_title('Daily Portfolio Returns', fontsize=14, fontweight='bold')
ax2.set_xlabel('Date')
ax2.set_ylabel('Daily Return (%)')
ax2.grid(True, alpha=0.3)

# Return distribution
ax3 = axes[1, 0]
ax3.hist(portfolio_returns.values, bins=50, alpha=0.7, edgecolor='black', color='blue')
ax3.axvline(x=0, color='red', linestyle='--', alpha=0.5)
ax3.axvline(x=portfolio_returns.mean(), color='green', linestyle='--', 
            label=f'Mean: {portfolio_returns.mean():.3f}%')
ax3.set_title('Distribution of Daily Returns', fontsize=14, fontweight='bold')
ax3.set_xlabel('Daily Return (%)')
ax3.set_ylabel('Frequency')
ax3.legend()
ax3.grid(True, alpha=0.3)

# Drawdown
ax4 = axes[1, 1]
running_max = cumulative.expanding().max()
drawdown = (cumulative - running_max) / running_max * 100
ax4.fill_between(drawdown.index, drawdown.values, 0, alpha=0.3, color='red')
ax4.plot(drawdown.index, drawdown.values, color='red', linewidth=1)
ax4.set_title('Portfolio Drawdown', fontsize=14, fontweight='bold')
ax4.set_xlabel('Date')
ax4.set_ylabel('Drawdown (%)')
ax4.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig(os.path.join(data_dir, 'visualizations', 'backtest_analysis_complete.png'), dpi=300, bbox_inches='tight')
plt.show()

print(f"\n[OK] Chart saved to {data_dir}/visualizations/backtest_analysis_complete.png")

In [None]:
# Comparison with SPY
try:
    with open(os.path.join(data_dir, 'etf_returns_dict.pkl'), 'rb') as f:
        returns_dict = pickle.load(f)
    
    if 'SPY' in returns_dict:
        spy_returns = returns_dict['SPY']
        
        # Normalize the index (drop the timezone, keep dates only)
        if isinstance(spy_returns.index, pd.DatetimeIndex):
            if spy_returns.index.tz is not None:
                spy_returns.index = spy_returns.index.tz_localize(None)
            spy_returns.index = spy_returns.index.normalize()
        
        # Align dates
        common_dates = portfolio_returns.index.intersection(spy_returns.index)
        if len(common_dates) > 0:
            portfolio_aligned = portfolio_returns.loc[common_dates]
            spy_aligned = spy_returns.loc[common_dates]
            
            # Compute cumulative values
            portfolio_cum = (1 + portfolio_aligned / 100).cumprod()
            spy_cum = (1 + spy_aligned / 100).cumprod()
            
            fig, ax = plt.subplots(figsize=(14, 8))
            ax.plot(portfolio_cum.index, portfolio_cum.values, linewidth=2, 
                   label='Optimized Portfolio', color='blue')
            ax.plot(spy_cum.index, spy_cum.values, linewidth=2, 
                   label='SPY (Benchmark)', color='orange')
            ax.axhline(y=1.0, color='gray', linestyle='--', alpha=0.5)
            ax.set_title('Comparison: Optimized Portfolio vs SPY', 
                       fontsize=16, fontweight='bold')
            ax.set_xlabel('Date')
            ax.set_ylabel('Cumulative Value (Base = 1.0)')
            ax.legend(fontsize=12)
            ax.grid(True, alpha=0.3)
            
            plt.tight_layout()
            plt.savefig(os.path.join(data_dir, 'visualizations', 'comparison_vs_spy.png'), 
                      dpi=300, bbox_inches='tight')
            plt.show()
            
            # Compute comparative metrics
            portfolio_total = portfolio_cum.iloc[-1] - 1
            spy_total = spy_cum.iloc[-1] - 1
            
            print(f"\nTotal return comparison:")
            print(f"  Optimized Portfolio: {portfolio_total*100:.2f}%")
            print(f"  SPY:                 {spy_total*100:.2f}%")
            print(f"  Difference:          {(portfolio_total - spy_total)*100:.2f}%")
            
            # Compute the Sharpe ratio of both.
            # Returns are in percent, so the 2% risk-free rate is 2.0 here, not 0.02.
            portfolio_sharpe = portfolio_metrics['sharpe_ratio']
            spy_annualized = spy_aligned.mean() * 252
            spy_vol = spy_aligned.std() * np.sqrt(252)
            spy_sharpe = (spy_annualized - 2.0) / spy_vol if spy_vol > 0 else 0
            
            print(f"\nSharpe ratio comparison:")
            print(f"  Optimized Portfolio: {portfolio_sharpe:.4f}")
            print(f"  SPY:                 {spy_sharpe:.4f}")
            print(f"  Difference:          {portfolio_sharpe - spy_sharpe:.4f}")
except Exception as e:
    print(f"[WARNING] Could not generate the SPY comparison: {e}")
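
Because the returns in this notebook are stored in percent, a risk-free rate subtracted from the annualized mean has to be in percent as well. The sketch below illustrates this, assuming a 2% annual risk-free rate and 252 trading days per year. `sharpe_from_percent_returns` is a hypothetical helper, and the toy series is not real data.

```python
import numpy as np
import pandas as pd

def sharpe_from_percent_returns(returns, risk_free=2.0, periods=252):
    """Annualized Sharpe ratio; returns and risk_free must share one unit."""
    # Hypothetical helper: with percent returns, risk_free=2.0 means 2% a year.
    ann_return = returns.mean() * periods
    ann_vol = returns.std() * np.sqrt(periods)
    if ann_vol == 0 or np.isnan(ann_vol):
        return 0.0
    return (ann_return - risk_free) / ann_vol

toy_pct = pd.Series([0.5, -0.3, 0.2, 0.1, -0.1])  # daily returns in percent

# Same data in percent and as fractions: the ratio matches only when the
# risk-free rate is converted along with the returns.
sharpe_percent = sharpe_from_percent_returns(toy_pct, risk_free=2.0)
sharpe_fraction = sharpe_from_percent_returns(toy_pct / 100, risk_free=0.02)
sharpe_mixed = sharpe_from_percent_returns(toy_pct, risk_free=0.02)

print(f"percent units:  {sharpe_percent:.4f}")
print(f"fraction units: {sharpe_fraction:.4f}")
print(f"mixed units:    {sharpe_mixed:.4f}")
```

Mixing percent returns with a fractional risk-free rate effectively ignores the risk-free rate and overstates the ratio.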

In [None]:
# Train/test split
split_idx = int(len(portfolio_returns) * 0.7)
train_returns = portfolio_returns.iloc[:split_idx]
test_returns = portfolio_returns.iloc[split_idx:]

train_cum = (1 + train_returns / 100).cumprod()
test_cum = (1 + test_returns / 100).cumprod()

fig, axes = plt.subplots(2, 1, figsize=(14, 10))

# Train
ax1 = axes[0]
ax1.plot(train_cum.index, train_cum.values, linewidth=2, color='blue', label='Train')
ax1.axhline(y=1.0, color='gray', linestyle='--', alpha=0.5)
ax1.set_title(f'Training Period (Train) - {len(train_returns)} days', 
             fontsize=14, fontweight='bold')
ax1.set_ylabel('Cumulative Value')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Test
ax2 = axes[1]
ax2.plot(test_cum.index, test_cum.values, linewidth=2, color='red', label='Test')
ax2.axhline(y=1.0, color='gray', linestyle='--', alpha=0.5)
ax2.set_title(f'Test Period (Test) - {len(test_returns)} days', 
             fontsize=14, fontweight='bold')
ax2.set_xlabel('Date')
ax2.set_ylabel('Cumulative Value')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig(os.path.join(data_dir, 'visualizations', 'train_test_split.png'), 
           dpi=300, bbox_inches='tight')
plt.show()

# Compute metrics for each period
def calculate_simple_metrics(returns):
    """Simple metrics from daily returns in percent (risk-free rate taken as zero)."""
    cumulative = (1 + returns / 100).cumprod()
    total_return = cumulative.iloc[-1] - 1
    years = len(returns) / 252
    annualized = (1 + total_return) ** (1 / years) - 1 if years > 0 else 0
    volatility = returns.std() * np.sqrt(252)  # annualized, in percent
    sharpe = (annualized * 100) / volatility if volatility > 0 else 0
    return {'annualized_return': annualized, 'volatility': volatility, 'sharpe_ratio': sharpe}

train_metrics = calculate_simple_metrics(train_returns)
test_metrics = calculate_simple_metrics(test_returns)

print(f"\nTrain metrics:")
print(f"  Annualized Return: {train_metrics['annualized_return']*100:.2f}%")
print(f"  Sharpe Ratio: {train_metrics['sharpe_ratio']:.4f}")
print(f"  Volatility: {train_metrics['volatility']:.2f}%")

print(f"\nTest metrics:")
print(f"  Annualized Return: {test_metrics['annualized_return']*100:.2f}%")
print(f"  Sharpe Ratio: {test_metrics['sharpe_ratio']:.4f}")
print(f"  Volatility: {test_metrics['volatility']:.2f}%")

print(f"\n[OK] Train/test chart saved")

## Step 9: Final Summary

In [None]:
# Final summary of the pipeline
print("=" * 80)
print("FINAL PIPELINE SUMMARY")
print("=" * 80)

print("\n✅ Steps completed:")
print("  1. ✅ ETF data download")
print("  2. ✅ ML feature generation")
print("  3. ✅ Model training")
print("  4. ✅ Portfolio optimization")
print("  5. ✅ Full backtesting")
print("  6. ✅ Visualizations")

print("\n📊 Generated files:")
print(f"  - Data: {data_dir}/etf_returns_dict.pkl")
print(f"  - ML dataset: {data_dir}/ml_dataset.pkl")
print(f"  - Models: {models_dir}/sharpe_predictor_*.pkl")
print(f"  - Ensembles: {models_dir}/ensemble_*.pkl")
print(f"  - Backtest results: {data_dir}/backtest_results.pkl")
print(f"  - Visualizations: {data_dir}/visualizations/")

print("\n🎯 Strategies implemented:")
print("  - Feature selection (50 features)")
print("  - Improved regularization")
print("  - Model ensemble")
print("  - Risk constraints")
print("  - Turnover constraint")
print("  - Stop-loss")
print("  - Conservative rebalancing")
print("  - Dynamic risk management (VIX)")

print("\n📈 Next steps:")
print("  - Retrain the models periodically (every 6-12 months)")
print("  - Monitor performance in real time")
print("  - Adjust parameters based on results")
print("  - Deploy to production with paper trading first")

print("\n" + "=" * 80)
print("PIPELINE COMPLETED SUCCESSFULLY")
print("=" * 80)