# 04 - Modeling and Forecasting the USD/CLP Exchange Rate

This notebook implements and compares three models for predicting the USD/CLP exchange rate, two machine-learning models and one statistical baseline:

1. **Random Forest**: robust ML baseline built on decision trees
2. **XGBoost**: main model, based on optimized gradient boosting
3. **AutoARIMA**: statistical time-series baseline

## Notebook Structure

1. Configuration and data loading
2. Data preparation with temporal validation
3. Model training and evaluation
4. Comparison of results
5. Feature importance analysis
6. Persistence of the best model

## 1. Library Imports and Configuration

In [None]:
# Standard-library imports
import os
import sys
import json
import warnings
from pathlib import Path

# Data manipulation
import numpy as np
import pandas as pd

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# ML Models
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import xgboost as xgb
from pmdarima import auto_arima

# Persistence
import pickle

# Configuration
warnings.filterwarnings('ignore')
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette('husl')

# Add the project root to the path so local modules can be imported
project_root = Path.cwd().parent
sys.path.append(str(project_root))

# Import project utilities
from src.utils.metrics import calculate_metrics, format_metrics, save_metrics

print("Libraries imported successfully")
print(f"Project directory: {project_root}")

## 2. Data Loading and Preparation

In [None]:
# File paths
DATA_PATH = project_root / "data" / "processed" / "dolar_features.csv"
MODELS_DIR = project_root / "models"
MODELS_DIR.mkdir(exist_ok=True)

# Load data
print("Loading processed data...")
df = pd.read_csv(DATA_PATH, parse_dates=['Fecha'])

# Make sure rows are in chronological order; the temporal split depends on it
df = df.sort_values('Fecha').reset_index(drop=True)

print(f"\nData loaded: {df.shape[0]} rows, {df.shape[1]} columns")
print(f"Date range: {df['Fecha'].min()} to {df['Fecha'].max()}")
print(f"\nAvailable columns:\n{df.columns.tolist()}")

In [None]:
# Check for missing values
print("Missing values per column:")
print(df.isnull().sum())

# Drop rows with missing values (introduced by lags and moving averages)
df_clean = df.dropna().copy()
print(f"\nData after dropping missing values: {df_clean.shape[0]} rows")

In [None]:
# Separate features and target
# Excluded: Fecha (time index), Valor (target), statusCode (metadata)
feature_cols = [col for col in df_clean.columns
                if col not in ['Fecha', 'Valor', 'statusCode']]

X = df_clean[feature_cols].values
y = df_clean['Valor'].values
dates = df_clean['Fecha'].values

print(f"Selected features ({len(feature_cols)}):")
for i, col in enumerate(feature_cols, 1):
    print(f"  {i:2d}. {col}")

print(f"\nShape of X: {X.shape}")
print(f"Shape of y: {y.shape}")

## 3. Temporal Data Split

For time series, preserving chronological order is crucial: a random shuffle would let the model train on observations that come after the ones it is tested on.
We use an 80/20 split that respects the time sequence.

In [None]:
# 80/20 temporal split
split_index = int(len(X) * 0.8)

X_train, X_test = X[:split_index], X[split_index:]
y_train, y_test = y[:split_index], y[split_index:]
dates_train, dates_test = dates[:split_index], dates[split_index:]

print("Temporal data split:")
print("=" * 60)
print(f"Train set: {X_train.shape[0]} rows ({X_train.shape[0]/len(X)*100:.1f}%)")
print(f"  Range: {dates_train[0]} to {dates_train[-1]}")
print(f"\nTest set:  {X_test.shape[0]} rows ({X_test.shape[0]/len(X)*100:.1f}%)")
print(f"  Range: {dates_test[0]} to {dates_test[-1]}")
print("=" * 60)

In [None]:
# Configure TimeSeriesSplit for cross-validation
n_splits = 5
tscv = TimeSeriesSplit(n_splits=n_splits)

print(f"\nTimeSeriesSplit configured with {n_splits} folds")
print("\nFold structure:")
for i, (train_idx, val_idx) in enumerate(tscv.split(X_train), 1):
    print(f"  Fold {i}: Train={len(train_idx):4d} | Validation={len(val_idx):4d}")
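
The fold sizes printed by the cell above follow an expanding-window scheme: each fold trains on everything before its validation block, so validation data always lies in the future of the training data. The sketch below reproduces that index layout in plain Python. It illustrates the default behavior of `TimeSeriesSplit` (no `gap`, no `max_train_size`, default `test_size`) rather than its implementation, and `expanding_window_folds` is a hypothetical helper name.

```python
def expanding_window_folds(n_samples, n_splits):
    """Yield (train_indices, val_indices) pairs in chronological order.

    Hypothetical helper mirroring the default index layout of
    sklearn's TimeSeriesSplit (no gap, no max_train_size).
    """
    if n_splits + 1 > n_samples:
        raise ValueError("n_splits must be smaller than n_samples")
    val_size = n_samples // (n_splits + 1)
    first_val_start = n_samples - n_splits * val_size
    for val_start in range(first_val_start, n_samples, val_size):
        train_idx = list(range(val_start))
        val_idx = list(range(val_start, val_start + val_size))
        yield train_idx, val_idx


folds = list(expanding_window_folds(n_samples=10, n_splits=3))
for i, (train_idx, val_idx) in enumerate(folds, 1):
    # Every training index precedes every validation index
    assert max(train_idx) < min(val_idx)
    print(f"Fold {i}: train={train_idx} | validation={val_idx}")
```

Because the training window only grows forward in time, no fold ever validates on data older than what it trained on.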

## 4. Model 1: Random Forest Regressor

Robust baseline built on an ensemble of decision trees.

In [None]:
print("=" * 60)
print("MODEL 1: RANDOM FOREST")
print("=" * 60)

# Hyperparameters
rf_params = {
    'n_estimators': 100,
    'max_depth': 20,
    'min_samples_split': 5,
    'min_samples_leaf': 2,
    'random_state': 42,
    'n_jobs': -1
}

print("\nHyperparameters:")
for param, value in rf_params.items():
    print(f"  {param}: {value}")

# Fit the model
print("\nTraining Random Forest...")
rf_model = RandomForestRegressor(**rf_params)
rf_model.fit(X_train, y_train)

print("Model trained successfully")

In [None]:
# Cross-validation (cross_val_score clones the estimator and refits it per fold)
print("\nEvaluating with cross-validation...")
cv_scores_rf = -cross_val_score(
    rf_model, X_train, y_train,
    cv=tscv,
    scoring='neg_mean_absolute_error',
    n_jobs=-1
)

print("\nMAE per fold:")
for i, score in enumerate(cv_scores_rf, 1):
    print(f"  Fold {i}: {score:.2f}")
print(f"\nMean MAE (CV): {cv_scores_rf.mean():.2f} (+/- {cv_scores_rf.std():.2f})")

In [None]:
# Test-set predictions
print("\nGenerating test-set predictions...")
y_pred_rf = rf_model.predict(X_test)

# Compute metrics
metrics_rf = calculate_metrics(y_test, y_pred_rf)

print("\n" + format_metrics(metrics_rf))

## 5. Model 2: XGBoost

Main model, based on optimized gradient boosting.

In [None]:
print("=" * 60)
print("MODEL 2: XGBOOST")
print("=" * 60)

# Hyperparameters
xgb_params = {
    'n_estimators': 100,
    'learning_rate': 0.1,
    'max_depth': 5,
    'min_child_weight': 1,
    'subsample': 0.8,
    'colsample_bytree': 0.8,
    'gamma': 0,
    'random_state': 42,
    'n_jobs': -1,
    'tree_method': 'hist'
}

print("\nHyperparameters:")
for param, value in xgb_params.items():
    print(f"  {param}: {value}")

# Fit the model
print("\nTraining XGBoost...")
xgb_model = xgb.XGBRegressor(**xgb_params)
xgb_model.fit(X_train, y_train)

print("Model trained successfully")

In [None]:
# Cross-validation (cross_val_score clones the estimator and refits it per fold)
print("\nEvaluating with cross-validation...")
cv_scores_xgb = -cross_val_score(
    xgb_model, X_train, y_train,
    cv=tscv,
    scoring='neg_mean_absolute_error',
    n_jobs=-1
)

print("\nMAE per fold:")
for i, score in enumerate(cv_scores_xgb, 1):
    print(f"  Fold {i}: {score:.2f}")
print(f"\nMean MAE (CV): {cv_scores_xgb.mean():.2f} (+/- {cv_scores_xgb.std():.2f})")

In [None]:
# Test-set predictions
print("\nGenerating test-set predictions...")
y_pred_xgb = xgb_model.predict(X_test)

# Compute metrics
metrics_xgb = calculate_metrics(y_test, y_pred_xgb)

print("\n" + format_metrics(metrics_xgb))

## 6. Model 3: AutoARIMA

Statistical baseline for univariate time series.

In [None]:
print("=" * 60)
print("MODEL 3: AUTO ARIMA")
print("=" * 60)

# Hyperparameters
# Note: n_jobs is omitted because pmdarima only parallelizes the
# non-stepwise grid search (stepwise=False)
arima_params = {
    'seasonal': False,
    'suppress_warnings': True,
    'stepwise': True,
    'trace': False,
    'error_action': 'ignore'
}

print("\nHyperparameters:")
for param, value in arima_params.items():
    print(f"  {param}: {value}")

# Fit the model (uses only the univariate target series)
print("\nTraining AutoARIMA...")
print("Note: ARIMA uses only the univariate time series (no features)")
arima_model = auto_arima(y_train, **arima_params)

print(f"\nFitted model: {arima_model}")

In [None]:
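# Test-set predictions
# Note: this is a multi-step forecast over the whole test horizon, made
# from the end of the training set with no access to test-period values
print("\nGenerating test-set predictions...")
y_pred_arima = np.asarray(arima_model.predict(n_periods=len(y_test)))

# Compute metrics
metrics_arima = calculate_metrics(y_test, y_pred_arima)

print("\n" + format_metrics(metrics_arima))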
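
One caveat when reading these numbers: `predict(n_periods=len(y_test))` forecasts the whole test horizon from the end of the training set, while the tree models receive lag and moving-average features at every test date (presumably built from observed values), which makes their task closer to one-step-ahead prediction. A walk-forward evaluation would put the univariate baseline on a similar footing. The sketch below shows the loop in plain Python with a stand-in forecaster; `walk_forward_forecast` and `last_value` are hypothetical names, and the toy numbers are made up for illustration.

```python
def last_value(history):
    """Stand-in forecaster: predict that the next value equals the last one."""
    return history[-1]


def walk_forward_forecast(train, test, forecaster):
    """One-step-ahead forecasts over `test`.

    After each step the true observation is appended to the history,
    so every forecast only uses data available at that point in time.
    """
    history = list(train)
    predictions = []
    for actual in test:
        predictions.append(forecaster(history))
        history.append(actual)  # reveal the true value only after predicting
    return predictions


# Toy series (illustrative values, not real exchange-rate data)
toy_train = [900.0, 905.0, 910.0]
toy_test = [908.0, 915.0, 912.0]
toy_preds = walk_forward_forecast(toy_train, toy_test, last_value)
print(toy_preds)  # [910.0, 908.0, 915.0]
```

With a fitted pmdarima model, the same loop would request a one-period forecast and then feed the new observation back in, for example through the model's `update` method; that variant is not shown here.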

## 7. Model Comparison
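
The comparison relies on `calculate_metrics` from `src.utils.metrics`, whose source is not shown in this notebook. As a reference for reading the table, here is a minimal sketch of the standard definitions, assuming the helper returns the keys used in this notebook (`mae`, `rmse`, `mape`, `r2`, `median_ae`) with MAPE expressed in percent. `sketch_metrics` is a hypothetical stand-in, not the project's implementation.

```python
import math
import statistics


def sketch_metrics(y_true, y_pred):
    """Standard regression metrics (hypothetical stand-in for calculate_metrics)."""
    y_true = [float(v) for v in y_true]
    y_pred = [float(v) for v in y_pred]
    errors = [t - p for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]
    n = len(errors)

    mean_true = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares

    return {
        'mae': sum(abs_errors) / n,
        'rmse': math.sqrt(ss_res / n),
        # MAPE in percent; assumes no true value is zero
        'mape': 100 * sum(a / abs(t) for a, t in zip(abs_errors, y_true)) / n,
        # R² assumes y_true is not constant (ss_tot > 0)
        'r2': 1 - ss_res / ss_tot,
        'median_ae': statistics.median(abs_errors),
    }


example = sketch_metrics([100.0, 200.0, 300.0], [110.0, 190.0, 300.0])
print({k: round(v, 4) for k, v in example.items()})
```

MAE and RMSE are in the units of the target (CLP per USD), MAPE is scale-free, and R² compares the model against always predicting the test-set mean.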

In [None]:
# Build the comparison table
comparison_df = pd.DataFrame({
    'Modelo': ['Random Forest', 'XGBoost', 'AutoARIMA'],
    'MAE': [metrics_rf['mae'], metrics_xgb['mae'], metrics_arima['mae']],
    'RMSE': [metrics_rf['rmse'], metrics_xgb['rmse'], metrics_arima['rmse']],
    'MAPE (%)': [metrics_rf['mape'], metrics_xgb['mape'], metrics_arima['mape']],
    'R2': [metrics_rf['r2'], metrics_xgb['r2'], metrics_arima['r2']]
})

# Print the comparison table
print("\n" + "="*80)
print("MODEL COMPARISON")
print("="*80)
print(comparison_df.to_string(index=False))
print("="*80)

# Identify the best model for each metric
print("\nBest model per metric:")
print(f"  MAE:  {comparison_df.loc[comparison_df['MAE'].idxmin(), 'Modelo']} ({comparison_df['MAE'].min():.2f})")
print(f"  RMSE: {comparison_df.loc[comparison_df['RMSE'].idxmin(), 'Modelo']} ({comparison_df['RMSE'].min():.2f})")
print(f"  MAPE: {comparison_df.loc[comparison_df['MAPE (%)'].idxmin(), 'Modelo']} ({comparison_df['MAPE (%)'].min():.2f}%)")
print(f"  R2:   {comparison_df.loc[comparison_df['R2'].idxmax(), 'Modelo']} ({comparison_df['R2'].max():.4f})")

In [None]:
# Comparative visualization of metrics
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
fig.suptitle('Metric Comparison Across Models', fontsize=16, fontweight='bold')

bar_colors = ['#3498db', '#e74c3c', '#2ecc71']
metric_panels = [
    (axes[0, 0], 'MAE', 'Mean Absolute Error (MAE)', 'MAE'),
    (axes[0, 1], 'RMSE', 'Root Mean Squared Error (RMSE)', 'RMSE'),
    (axes[1, 0], 'MAPE (%)', 'Mean Absolute Percentage Error (MAPE)', 'MAPE (%)'),
    (axes[1, 1], 'R2', 'R² Score', 'R²'),
]
for ax, column, title, ylabel in metric_panels:
    ax.bar(comparison_df['Modelo'], comparison_df[column], color=bar_colors)
    ax.set_title(title)
    ax.set_ylabel(ylabel)
    ax.tick_params(axis='x', rotation=45)

# Reference lines: MAPE quality thresholds and the R² target
axes[1, 0].axhline(y=2, color='red', linestyle='--', label='Excellent (2%)')
axes[1, 0].axhline(y=5, color='orange', linestyle='--', label='Good (5%)')
axes[1, 0].legend()
axes[1, 1].axhline(y=0.95, color='green', linestyle='--', label='Target (0.95)')
axes[1, 1].legend()

plt.tight_layout()
plt.show()

## 8. Prediction Visualization

In [None]:
# Predictions vs actual values
fig, axes = plt.subplots(3, 1, figsize=(15, 12))
fig.suptitle('Predictions vs Actual Values on the Test Set', fontsize=16, fontweight='bold')

prediction_panels = [
    ('Random Forest', 'RF prediction', y_pred_rf, metrics_rf),
    ('XGBoost', 'XGB prediction', y_pred_xgb, metrics_xgb),
    ('AutoARIMA', 'ARIMA prediction', y_pred_arima, metrics_arima),
]
for ax, (panel_name, pred_label, panel_pred, panel_metrics) in zip(axes, prediction_panels):
    ax.plot(dates_test, y_test, label='Actual value', linewidth=2, alpha=0.7)
    ax.plot(dates_test, panel_pred, label=pred_label, linewidth=2, alpha=0.7)
    ax.set_title(f'{panel_name} (MAE: {panel_metrics["mae"]:.2f}, MAPE: {panel_metrics["mape"]:.2f}%)')
    ax.set_ylabel('USD/CLP value')
    ax.legend()
    ax.grid(True, alpha=0.3)

# Only the bottom panel carries the x-axis label
axes[-1].set_xlabel('Date')

plt.tight_layout()
plt.show()

In [None]:
# Error distribution by model
fig, axes = plt.subplots(1, 3, figsize=(18, 5))
fig.suptitle('Error Distribution by Model', fontsize=16, fontweight='bold')

# Compute errors
errors_rf = y_test - y_pred_rf
errors_xgb = y_test - y_pred_xgb
errors_arima = y_test - y_pred_arima

error_panels = [
    ('Random Forest', errors_rf),
    ('XGBoost', errors_xgb),
    ('AutoARIMA', errors_arima),
]
for ax, (panel_name, panel_errors) in zip(axes, error_panels):
    ax.hist(panel_errors, bins=30, edgecolor='black', alpha=0.7)
    ax.axvline(x=0, color='red', linestyle='--', linewidth=2)
    ax.set_title(panel_name)
    ax.set_xlabel('Error (actual - predicted)')
    ax.set_ylabel('Frequency')
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 9. Feature Importance Analysis

In [None]:
# Feature importance for Random Forest
rf_importance = pd.DataFrame({
    'Feature': feature_cols,
    'Importance': rf_model.feature_importances_
}).sort_values('Importance', ascending=False)

print("\nFeature Importance - Random Forest:")
print("=" * 60)
print(rf_importance.to_string(index=False))
print("=" * 60)

In [None]:
# Feature importance for XGBoost
xgb_importance = pd.DataFrame({
    'Feature': feature_cols,
    'Importance': xgb_model.feature_importances_
}).sort_values('Importance', ascending=False)

print("\nFeature Importance - XGBoost:")
print("=" * 60)
print(xgb_importance.to_string(index=False))
print("=" * 60)

In [None]:
# Feature importance visualization
fig, axes = plt.subplots(1, 2, figsize=(18, 8))
fig.suptitle('Feature Importance por Modelo', fontsize=16, fontweight='bold')

# Top 15 features Random Forest
top_rf = rf_importance.head(15)
axes[0].barh(range(len(top_rf)), top_rf['Importance'])
axes[0].set_yticks(range(len(top_rf)))
axes[0].set_yticklabels(top_rf['Feature'])
axes[0].set_xlabel('Importance')
axes[0].set_title('Random Forest - Top 15 Features')
axes[0].invert_yaxis()
axes[0].grid(True, alpha=0.3)

# Top 15 features XGBoost
top_xgb = xgb_importance.head(15)
axes[1].barh(range(len(top_xgb)), top_xgb['Importance'])
axes[1].set_yticks(range(len(top_xgb)))
axes[1].set_yticklabels(top_xgb['Feature'])
axes[1].set_xlabel('Importance')
axes[1].set_title('XGBoost - Top 15 Features')
axes[1].invert_yaxis()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 10. Residual Analysis of the Best Model

In [None]:
# Select the best model by MAPE
best_model_idx = comparison_df['MAPE (%)'].idxmin()
best_model_name = comparison_df.loc[best_model_idx, 'Modelo']

# Get the best model's predictions and metrics
if best_model_name == 'Random Forest':
    best_model = rf_model
    y_pred_best = y_pred_rf
    metrics_best = metrics_rf
elif best_model_name == 'XGBoost':
    best_model = xgb_model
    y_pred_best = y_pred_xgb
    metrics_best = metrics_xgb
else:
    best_model = arima_model
    y_pred_best = y_pred_arima
    metrics_best = metrics_arima

print(f"\nSelected model: {best_model_name}")
print(f"MAPE: {metrics_best['mape']:.2f}%")
print(f"R²: {metrics_best['r2']:.4f}")

In [None]:
# Residual analysis
from scipy import stats

residuals = y_test - y_pred_best

fig, axes = plt.subplots(2, 2, figsize=(15, 10))
fig.suptitle(f'Residual Analysis - {best_model_name}', fontsize=16, fontweight='bold')

# 1. Residuals vs predictions
axes[0, 0].scatter(y_pred_best, residuals, alpha=0.6)
axes[0, 0].axhline(y=0, color='red', linestyle='--', linewidth=2)
axes[0, 0].set_xlabel('Predicted values')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].set_title('Residuals vs Predictions')
axes[0, 0].grid(True, alpha=0.3)

# 2. Residual distribution
axes[0, 1].hist(residuals, bins=30, edgecolor='black', alpha=0.7)
axes[0, 1].axvline(x=0, color='red', linestyle='--', linewidth=2)
axes[0, 1].set_xlabel('Residuals')
axes[0, 1].set_ylabel('Frequency')
axes[0, 1].set_title('Residual Distribution')
axes[0, 1].grid(True, alpha=0.3)

# 3. Q-Q plot against a normal distribution
stats.probplot(residuals, dist="norm", plot=axes[1, 0])
axes[1, 0].set_title('Q-Q Plot')
axes[1, 0].grid(True, alpha=0.3)

# 4. Residuals over time
axes[1, 1].plot(dates_test, residuals, alpha=0.7)
axes[1, 1].axhline(y=0, color='red', linestyle='--', linewidth=2)
axes[1, 1].set_xlabel('Date')
axes[1, 1].set_ylabel('Residuals')
axes[1, 1].set_title('Residuals Over Time')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

In [None]:
# Residual statistics
print("\nResidual statistics:")
print("=" * 60)
print(f"Mean:               {residuals.mean():.4f}")
print(f"Median:             {np.median(residuals):.4f}")
print(f"Standard deviation: {residuals.std():.4f}")
print(f"Minimum:            {residuals.min():.4f}")
print(f"Maximum:            {residuals.max():.4f}")
print(f"25th percentile:    {np.percentile(residuals, 25):.4f}")
print(f"75th percentile:    {np.percentile(residuals, 75):.4f}")
print("=" * 60)
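
The residuals-over-time panel is a visual check for leftover structure; the lag-1 autocorrelation of the residuals puts a number on it. Values near zero suggest little short-term dependence remains, while values near +1 or -1 indicate that consecutive errors are still related. This is a sketch only: `lag1_autocorrelation` is a hypothetical helper, and the two sample sequences are made up to show the extremes.

```python
def lag1_autocorrelation(values):
    """Sample lag-1 autocorrelation of a sequence (assumes it is not constant)."""
    values = [float(v) for v in values]
    n = len(values)
    mean = sum(values) / n
    denominator = sum((v - mean) ** 2 for v in values)
    numerator = sum(
        (values[t] - mean) * (values[t - 1] - mean) for t in range(1, n)
    )
    return numerator / denominator


# Made-up residual patterns for illustration
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # errors flip sign every step
trending = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]       # errors drift steadily upward

rho_alternating = lag1_autocorrelation(alternating)
rho_trending = lag1_autocorrelation(trending)
print(round(rho_alternating, 3))
print(round(rho_trending, 3))
```

Applied to this notebook, the call would be `lag1_autocorrelation(residuals)`.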

## 11. Persisting the Best Model

In [None]:
# Save the best model
model_filename = f"best_model_{best_model_name.lower().replace(' ', '_')}.pkl"
model_path = MODELS_DIR / model_filename

print(f"\nSaving model: {best_model_name}")
with open(model_path, 'wb') as f:
    pickle.dump(best_model, f)
print(f"Model saved to: {model_path}")
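
A quick way to gain confidence in a saved artifact is to load it back and check that it matches the original. The sketch below shows the round-trip pattern with a plain dictionary standing in for the fitted model, so it runs without any ML dependency; the file name and the stand-in object are illustrative only.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for a fitted model (illustrative only)
stand_in_model = {'name': 'demo', 'coefficients': [0.5, -1.2, 3.0]}

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "demo_model.pkl"

    # Save
    with open(demo_path, 'wb') as f:
        pickle.dump(stand_in_model, f)

    # Load back
    with open(demo_path, 'rb') as f:
        restored_model = pickle.load(f)

# The restored object is an equal but separate copy of the original
print(restored_model == stand_in_model)
```

For a real model, comparing the restored model's predictions on `X_test` with the original predictions serves the same purpose. Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.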

In [None]:
# Save feature names (ML models only, not ARIMA)
if best_model_name != 'AutoARIMA':
    feature_names_path = MODELS_DIR / "feature_names.txt"
    with open(feature_names_path, 'w') as f:
        for feature in feature_cols:
            f.write(f"{feature}\n")
    print(f"Feature names saved to: {feature_names_path}")
else:
    print("\nNote: ARIMA does not use features, only the time series")

In [None]:
# Save metrics
metrics_output = {
    "model_name": best_model_name,
    "training_date": pd.Timestamp.now().isoformat(),
    "train_size": int(len(X_train)),
    "test_size": int(len(X_test)),
    "n_features": len(feature_cols) if best_model_name != 'AutoARIMA' else 0,
    "metrics": {
        "mae": float(metrics_best['mae']),
        "rmse": float(metrics_best['rmse']),
        "mape": float(metrics_best['mape']),
        "r2": float(metrics_best['r2']),
        "median_ae": float(metrics_best['median_ae'])
    },
    "comparison": {
        "random_forest": {
            "mae": float(metrics_rf['mae']),
            "mape": float(metrics_rf['mape']),
            "r2": float(metrics_rf['r2'])
        },
        "xgboost": {
            "mae": float(metrics_xgb['mae']),
            "mape": float(metrics_xgb['mape']),
            "r2": float(metrics_xgb['r2'])
        },
        "autoarima": {
            "mae": float(metrics_arima['mae']),
            "mape": float(metrics_arima['mape']),
            "r2": float(metrics_arima['r2'])
        }
    }
}

metrics_path = MODELS_DIR / "metrics.json"
save_metrics(metrics_output, str(metrics_path))
print(f"Metrics saved to: {metrics_path}")

## 12. Conclusions and Recommendations

In [None]:
print("\n" + "=" * 80)
print("CONCLUSIONS AND RECOMMENDATIONS")
print("=" * 80)

print(f"\n1. SELECTED MODEL: {best_model_name}")
print(f"   - MAPE: {metrics_best['mape']:.2f}%")
print(f"   - R²: {metrics_best['r2']:.4f}")
print(f"   - MAE: {metrics_best['mae']:.2f} CLP")

# Performance rating based on MAPE thresholds
if metrics_best['mape'] < 2:
    performance = "EXCELLENT"
elif metrics_best['mape'] < 5:
    performance = "GOOD"
else:
    performance = "ACCEPTABLE"

print(f"\n2. PERFORMANCE RATING: {performance}")

print("\n3. MOST IMPORTANT FEATURES:")
if best_model_name == 'Random Forest':
    top_features = rf_importance.head(5)
elif best_model_name == 'XGBoost':
    top_features = xgb_importance.head(5)
else:
    print("   N/A (ARIMA does not use features)")
    top_features = None

if top_features is not None:
    for _, row in top_features.iterrows():
        print(f"   - {row['Feature']}: {row['Importance']:.4f}")

print("\n4. RECOMMENDATIONS FOR PRODUCTION:")
if best_model_name in ['Random Forest', 'XGBoost']:
    print("   - Implement a prediction pipeline with preprocessing")
    print("   - Monitor drift in key features")
    print("   - Retrain the model monthly or when MAPE > 3%")
    print("   - Validate inputs before prediction")
    print("   - Set up alerts for anomalous predictions")
else:
    print("   - ARIMA requires only the time series")
    print("   - Update the model regularly with recent data")
    print("   - Consider SARIMAX if seasonality appears")

print("\n5. NEXT STEPS:")
print("   - Implement a prediction API (FastAPI)")
print("   - Build an automated retraining pipeline")
print("   - Develop a monitoring dashboard")
print("   - Implement model validation tests")
print("   - Explore model ensembles for greater robustness")

print("\n" + "=" * 80)
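
One of the production recommendations in this section is to retrain when MAPE exceeds 3%. That rule can be turned into a simple check over a recent window of predictions. This is a sketch under assumptions: `should_retrain`, the window contents, and the example values are hypothetical, and only the 3% threshold comes from the recommendation.

```python
def should_retrain(recent_actuals, recent_predictions, mape_threshold=3.0):
    """Return (flag, mape): flag is True when recent MAPE (in %) exceeds the threshold."""
    if len(recent_actuals) == 0 or len(recent_actuals) != len(recent_predictions):
        raise ValueError("inputs must be non-empty and of equal length")
    pct_errors = [
        abs(actual - pred) / abs(actual) * 100
        for actual, pred in zip(recent_actuals, recent_predictions)
    ]
    mape = sum(pct_errors) / len(pct_errors)
    return mape > mape_threshold, mape


# Illustrative values only, not real exchange-rate data
flag_ok, mape_ok = should_retrain([900.0, 910.0], [905.0, 905.0])
flag_bad, mape_bad = should_retrain([900.0, 910.0], [950.0, 960.0])
print(f"stable window:   retrain={flag_ok}, MAPE={mape_ok:.2f}%")
print(f"drifting window: retrain={flag_bad}, MAPE={mape_bad:.2f}%")
```

Returning the MAPE alongside the flag lets a monitoring job log the value even when no retraining is triggered.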

In [None]:
# Example prediction with the best model
print("\nEXAMPLE PREDICTION:")
print("=" * 60)

# Take the last observation in the test set
sample_idx = -1
sample_date = dates_test[sample_idx]
sample_real = y_test[sample_idx]

if best_model_name != 'AutoARIMA':
    sample_features = X_test[sample_idx].reshape(1, -1)
    sample_pred = best_model.predict(sample_features)[0]
else:
    # For ARIMA, reuse the last forecast already computed
    sample_pred = y_pred_best[sample_idx]

error_abs = abs(sample_real - sample_pred)
error_pct = (error_abs / sample_real) * 100

print(f"Date:              {sample_date}")
print(f"Actual value:      {sample_real:.2f} CLP")
print(f"Predicted value:   {sample_pred:.2f} CLP")
print(f"Absolute error:    {error_abs:.2f} CLP")
print(f"Percentage error:  {error_pct:.2f}%")
print("=" * 60)