# 08 - Validation with CHIRPS Satellite Data

**Objective**: Validate the AE+DMD model's predictions against independent satellite data (CHIRPS) to assess prediction quality under real-world conditions.

## What is CHIRPS?

**CHIRPS** (Climate Hazards Group InfraRed Precipitation with Station data) is a high-resolution satellite precipitation dataset that combines:
- Infrared (IR) satellite imagery
- In-situ weather station data
- Climate models

**Characteristics:**
- Spatial resolution: 0.05° (~5 km)
- Temporal resolution: daily
- Coverage: 1981-present
- Region: 50°S - 50°N (covers Chile only down to 50°S; the far south, roughly 50°S to 56°S, lies outside the CHIRPS domain)

## Why validate with CHIRPS?

1. **Independent source**: CHIRPS was not used to train the model
2. **Observational data**: It reflects measured/estimated real precipitation
3. **Higher resolution**: 0.05° vs 0.25° for ERA5
4. **Cross-validation**: It checks both ERA5 (the training data) and the model predictions against real observations

## Validation Pipeline

```
ERA5 (reanalysis)  ←→  CHIRPS (satellite)
       ↓                      ↓
  AE+DMD Model        Ground Truth (real)
       ↓                      ↓
  Predictions       →    Validation
                         (this notebook)
```

## Validation Metrics

1. **ERA5 vs CHIRPS**: Does ERA5 represent reality well?
2. **Predictions vs CHIRPS**: Does the model predict well against observations?
3. **Regional analysis**: Where does the model perform best/worst?
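
A minimal sketch of how the comparisons above can be scored, assuming CHIRPS is the reference field and that cells without a CHIRPS estimate are stored as NaN. The helper name `masked_metrics` and the toy values are hypothetical and are not part of the project code.

```python
import numpy as np

def masked_metrics(reference, estimate):
    """Return MAE, RMSE, bias and R² over cells where both arrays are finite."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    valid = np.isfinite(reference) & np.isfinite(estimate)
    ref, est = reference[valid], estimate[valid]
    err = est - ref                                   # positive = overestimate
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((ref - ref.mean()) ** 2)
    return {
        "mae": float(np.mean(np.abs(err))),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "bias": float(np.mean(err)),
        "r2": float(1.0 - ss_res / ss_tot) if ss_tot > 0 else float("nan"),
        "n_valid": int(valid.sum()),
    }

# Toy fields in mm/day (hypothetical); NaN marks a cell with no CHIRPS estimate
chirps_toy = np.array([[0.0, 2.0], [4.0, np.nan]])
era5_toy = np.array([[1.0, 2.0], [3.0, 5.0]])
toy_metrics = masked_metrics(chirps_toy, era5_toy)
print(toy_metrics)
```

Masking first keeps a single missing cell from turning every aggregate into NaN.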

**Author**: Capstone Project - Pronóstico Híbrido Precipitaciones Chile  
**Date**: 23 November 2025

In [None]:
# ====================================================================================
# 0. GLOBAL CONFIGURATION AND REPRODUCIBILITY
# ====================================================================================

import os
import random
import numpy as np
import tensorflow as tf

# Set SEED for reproducibility
SEED = 42

def set_global_seed(seed=42):
    """
    Set the global seeds used for reproducibility.

    Note: PYTHONHASHSEED only changes hash randomization when it is set before
    the interpreter starts; setting it here mainly affects subprocesses.
    """
    os.environ['PYTHONHASHSEED'] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    tf.random.set_seed(seed)
    
    # Ask TensorFlow for deterministic ops
    os.environ['TF_DETERMINISTIC_OPS'] = '1'
    os.environ['TF_CUDNN_DETERMINISTIC'] = '1'
    
    print(f"[CONFIG] Global seed set: {seed}")
    print(f"  Python random seed: {seed}")
    print(f"  NumPy seed: {seed}")
    print(f"  TensorFlow seed: {seed}")
    print("  TF determinism: ENABLED")

set_global_seed(SEED)
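
A quick way to check that seeding behaves as intended is to reseed and compare draws. The sketch below covers only Python's `random` and NumPy so that it runs without TensorFlow; `draw_sample` is a hypothetical helper, not part of the notebook.

```python
import random
import numpy as np

def draw_sample(seed):
    """Reseed Python and NumPy, then draw a few values from each generator."""
    random.seed(seed)
    np.random.seed(seed)
    return [random.random() for _ in range(3)], np.random.rand(3)

py_a, np_a = draw_sample(42)
py_b, np_b = draw_sample(42)   # same seed: expected to repeat the draws
py_c, np_c = draw_sample(7)    # different seed: expected to differ

same_seed_matches = (py_a == py_b) and np.array_equal(np_a, np_b)
different_seed_differs = (py_a != py_c) and not np.array_equal(np_a, np_c)
print(same_seed_matches, different_seed_differs)
```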

In [None]:
# ====================================================================================
# 1. IMPORT LIBRARIES
# ====================================================================================

import numpy as np
import pandas as pd
import xarray as xr
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import pickle
import sys
sys.path.append('..')

from scipy import stats
from scipy.interpolate import griddata
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import warnings
warnings.filterwarnings('ignore')

# Import the unified data_loader
from src.utils.data_loader import (
    load_era5_full,
    load_forecast_results,
    get_data_info,
    PROJECT_ROOT,
    DATA_DIR
)

# Plot settings
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (14, 8)
plt.rcParams['font.size'] = 11

# Paths
CHIRPS_PATH = DATA_DIR / 'external' / 'chirps' / 'chirps_chile_2019-01-01_2020-02-29.nc'
FIG_DIR = Path('../reports/figures')
FIG_DIR.mkdir(parents=True, exist_ok=True)

print("[INFO] Libraries imported successfully")
print("\n[INFO] Checking data files...")
get_data_info()
print(f"\n CHIRPS: {'✅' if CHIRPS_PATH.exists() else '❌'} {CHIRPS_PATH.relative_to(PROJECT_ROOT) if CHIRPS_PATH.exists() else 'Not found'}")

[INFO] Libraries imported successfully
 ERA5: True
 CHIRPS: False
 Forecast: True


In [None]:
# ====================================================================================
# 2. LOAD DATA
# ====================================================================================

# Load ERA5 through the unified data_loader
print("[INFO] Loading ERA5 data with data_loader...")
precip_era5, ds_era5, scaler_era5 = load_era5_full(year_filter='2020', normalize=False)
print(f"  ERA5 dataset shape: {precip_era5.shape}")
print(f"  Available variables: {list(ds_era5.data_vars)}")

# Load CHIRPS (satellite data used for validation)
if CHIRPS_PATH.exists():
    print("\n[INFO] Loading CHIRPS data...")
    ds_chirps = xr.open_dataset(CHIRPS_PATH)
    print(f"  Variables: {list(ds_chirps.data_vars)}")
    print(f"  Coordinates: {list(ds_chirps.coords)}")
    print(f"  Shape: {ds_chirps['precip'].shape}")
    print(f"  Period: {pd.Timestamp(ds_chirps.time.values[0]).strftime('%Y-%m-%d')} to {pd.Timestamp(ds_chirps.time.values[-1]).strftime('%Y-%m-%d')}")
    
    # Check the grid spacing
    lat_res = abs(float(ds_chirps.latitude.values[1] - ds_chirps.latitude.values[0]))
    lon_res = abs(float(ds_chirps.longitude.values[1] - ds_chirps.longitude.values[0]))
    print(f"  Spatial resolution: {lat_res:.3f}° lat x {lon_res:.3f}° lon")
else:
    print("\n⚠️  [WARNING] CHIRPS data not available")
    print("📥 To download CHIRPS:")
    print("   1. cd src/utils")
    print("   2. python download_chirps.py")
    print("   3. Wait for the download (~2-4 GB)")
    print("\n   Alternative sources:")
    print("   - FTP: ftp://ftp.chc.ucsb.edu/pub/org/chg/products/CHIRPS-2.0/")
    print("   - GEE: https://developers.google.com/earth-engine/datasets/catalog/UCSB-CHG_CHIRPS_DAILY")
    ds_chirps = None

# Load the model predictions through data_loader
print("\n[INFO] Loading AE+DMD predictions with data_loader...")
forecast_results = load_forecast_results()

y_test_real = forecast_results['y_test_real']
h1_preds = forecast_results['forecast_results'][1]['predictions']
print(f"  Ground truth (ERA5 test): {y_test_real.shape}")
print(f"  Predictions h=1: {h1_preds.shape}")
print(f"  Prediction range: [{h1_preds.min():.4f}, {h1_preds.max():.4f}] mm/day")

[INFO] Loading ERA5 data...
 Variables: ['tp']
 Coordinates: ['number', 'valid_time', 'latitude', 'longitude', 'expver']
 Shape: (8784, 157, 41)

[WARNING] CHIRPS data not available
 Run: cd src/utils && python download_chirps.py

[INFO] Loading AE+DMD predictions...
 Ground truth (ERA5 test): (55, 157, 41, 1)
 Predictions h=1: (55, 157, 41, 1)
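
The ERA5 shape printed above has 8784 time steps, which equals 366 x 24, while the test arrays are daily. The sketch below shows one way hourly totals could be summed into daily totals. It assumes hourly ERA5 `tp` in metres and UTC day boundaries; the real aggregation lives in the project's data loader and may differ, and `hourly_to_daily_mm` is a hypothetical name.

```python
import numpy as np

HOURS_PER_DAY = 24

def hourly_to_daily_mm(tp_hourly_m):
    """Sum hourly totals (metres) into daily totals (mm) along the time axis."""
    n_hours = tp_hourly_m.shape[0]
    if n_hours % HOURS_PER_DAY != 0:
        raise ValueError("time axis must hold whole days")
    n_days = n_hours // HOURS_PER_DAY
    # (days, 24, lat, lon) -> sum over the 24 hours of each day
    daily_m = tp_hourly_m.reshape(n_days, HOURS_PER_DAY, *tp_hourly_m.shape[1:]).sum(axis=1)
    return daily_m * 1000.0  # metres -> millimetres

# Synthetic input: 2 days on a 3x2 grid with 0.0001 m every hour
tp_toy = np.full((48, 3, 2), 1e-4)
daily_toy = hourly_to_daily_mm(tp_toy)
print(daily_toy.shape, daily_toy[0, 0, 0])
```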


In [None]:
# ====================================================================================
# 3. ALIGN ERA5 AND CHIRPS IN TIME
# ====================================================================================

if ds_chirps is not None:
    print("[INFO] Aligning ERA5 and CHIRPS in time...")
    
    # Identify the common period (2020 test set, 55 days).
    # The indices must match the split in train_ae_dmd.py. Assumed here:
    # train=70%, val=15%, test=15% of the 366 days of 2020, i.e.
    # Train: 0-255, Val: 256-310, Test: 311-365 (55 days).
    # NOTE: indices 311-365 map to 2020-11-07 through 2020-12-31, while the
    # CHIRPS file name above ends on 2020-02-29. If the test set is really the
    # first 55 days of 2020, test_start_idx must be 0; confirm the real split.
    n_test_days = len(y_test_real)
    
    test_start_idx = 311
    test_end_idx = test_start_idx + n_test_days - 1
    
    # Build the date range of the test set
    all_dates_2020 = pd.date_range('2020-01-01', '2020-12-31', freq='D')
    test_dates = all_dates_2020[test_start_idx:test_end_idx+1]
    
    test_start_str = test_dates[0].strftime('%Y-%m-%d')
    test_end_str = test_dates[-1].strftime('%Y-%m-%d')
    
    print(f"  Test set period: {test_start_str} to {test_end_str} ({n_test_days} days)")
    
    # Extract the test period from CHIRPS
    try:
        chirps_test = ds_chirps.sel(time=slice(test_start_str, test_end_str))
        print(f"  CHIRPS test shape: {chirps_test['precip'].shape}")
        
        # Check that CHIRPS covers the whole period
        n_chirps_days = len(chirps_test.time)
        common_days = min(n_chirps_days, n_test_days)
        if common_days == 0:
            raise ValueError("CHIRPS has no days inside the test period")
        
        if n_chirps_days != n_test_days:
            print(f"  ⚠️  WARNING: CHIRPS has {n_chirps_days} days vs {n_test_days} days in the ERA5 test set")
            print(f"      Using the first {common_days} days")
            # Truncating by position assumes both series start on the same day without gaps
            chirps_test = chirps_test.isel(time=slice(0, common_days))
        else:
            print(f"  ✅ Temporal alignment successful: {n_test_days} days")
        
        y_test_real_aligned = y_test_real[:common_days]
        h1_preds_aligned = h1_preds[:common_days]
        
    except Exception as e:
        print(f"  ❌ Temporal alignment error: {e}")
        print(f"     CHIRPS available: {ds_chirps.time.values[0]} to {ds_chirps.time.values[-1]}")
        chirps_test = None
        y_test_real_aligned = None
        h1_preds_aligned = None
    
    # Report the spatial resolutions
    if chirps_test is not None:
        print("\n[INFO] Spatial resolutions:")
        print(f"  CHIRPS: {len(chirps_test.latitude)} lat x {len(chirps_test.longitude)} lon (~0.05°)")
        print(f"  ERA5:   {y_test_real.shape[1]} lat x {y_test_real.shape[2]} lon (0.25°)")
        print("  Ratio:  ~5:1 per axis (CHIRPS grid spacing is 5x finer)")
    
else:
    print("⚠️  [WARNING] Skipping alignment (CHIRPS not available)")
    chirps_test = None
    y_test_real_aligned = None
    h1_preds_aligned = None

[WARNING] Skipping alignment (CHIRPS not available)
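
Matching days by calendar date rather than by position makes an empty or partial overlap visible. The sketch below uses only the standard library. The start index 311 and the 55-day length come from the cell above, the CHIRPS coverage is taken from the file name, and the helper names are hypothetical.

```python
from datetime import date, timedelta

def date_range(start, end):
    """Inclusive list of consecutive days from start to end."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]

def common_day_indices(model_dates, reference_dates):
    """Pair the positions that share a calendar date in both series."""
    ref_pos = {d: j for j, d in enumerate(reference_dates)}
    return [(i, ref_pos[d]) for i, d in enumerate(model_dates) if d in ref_pos]

all_2020 = date_range(date(2020, 1, 1), date(2020, 12, 31))
split_dates = all_2020[311:311 + 55]            # trailing block assumed in the cell above

# Coverage implied by the CHIRPS file name (2019-01-01 to 2020-02-29)
chirps_dates = date_range(date(2019, 1, 1), date(2020, 2, 29))

pairs_tail = common_day_indices(split_dates, chirps_dates)    # trailing test block
pairs_head = common_day_indices(all_2020[:55], chirps_dates)  # first 55 days of 2020
print(split_dates[0], split_dates[-1], len(pairs_tail), len(pairs_head))
```

Under these assumptions the trailing block shares no days with the CHIRPS file, whereas the first 55 days of 2020 are fully covered, so the split indices are worth confirming before any metric is computed.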


In [None]:
# ====================================================================================
# 4. INTERPOLATE CHIRPS TO THE ERA5 RESOLUTION
# ====================================================================================

if chirps_test is not None:
    print("[INFO] Interpolating CHIRPS (0.05°) → ERA5 (0.25°)...")
    print("  Method: bilinear interpolation (xarray interp, method='linear')")
    
    # Get the ERA5 coordinates from the dataset
    era5_lats = ds_era5.latitude.values
    era5_lons = ds_era5.longitude.values
    
    print(f"  ERA5 grid: lat=[{era5_lats.min():.2f}, {era5_lats.max():.2f}], lon=[{era5_lons.min():.2f}, {era5_lons.max():.2f}]")
    
    # xarray interp samples CHIRPS at the ERA5 grid points (bilinear).
    # Note: for precipitation, averaging the CHIRPS cells inside each ERA5 cell
    # is more appropriate than point interpolation, which can smooth or miss peaks.
    # Both datasets are assumed to use the same longitude convention.
    
    try:
        # Bilinear interpolation (fast, but it can smooth peaks)
        chirps_regridded = chirps_test.interp(
            latitude=era5_lats,
            longitude=era5_lons,
            method='linear'
        )
        
        chirps_precip_regrid = chirps_regridded['precip'].values
        
        # Units: CHIRPS is in mm/day and the ERA5 test data is assumed to be
        # in mm/day already, so no conversion is applied
        
        print("  ✅ Interpolation successful")
        print(f"     Original CHIRPS: {chirps_test['precip'].shape}")
        print(f"     Regridded:       {chirps_precip_regrid.shape}")
        print(f"     ERA5 test:       {y_test_real_aligned.shape}")
        
        # Check that the spatial dimensions match; slicing cannot repair a mismatch
        if chirps_precip_regrid.shape[1:3] != y_test_real_aligned.shape[1:3]:
            raise ValueError(
                f"spatial dimensions differ: CHIRPS {chirps_precip_regrid.shape[1:3]} "
                f"vs ERA5 {y_test_real_aligned.shape[1:3]}"
            )
        print("  ✅ Spatial dimensions match")
            
        # Ensure the same number of timesteps
        min_time = min(chirps_precip_regrid.shape[0], y_test_real_aligned.shape[0])
        chirps_precip_regrid = chirps_precip_regrid[:min_time]
        y_test_real_aligned = y_test_real_aligned[:min_time]
        h1_preds_aligned = h1_preds_aligned[:min_time]
        
        print(f"  ✅ Data aligned: {min_time} timesteps")
        
    except Exception as e:
        print(f"  ❌ Interpolation error: {e}")
        chirps_precip_regrid = None
    
else:
    print("⚠️  [WARNING] Skipping interpolation (CHIRPS not available)")
    chirps_precip_regrid = None

[WARNING] Skipping interpolation (CHIRPS not available)
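
The cell's own comment notes that mean aggregation suits precipitation better than interpolation. A minimal NumPy sketch of that alternative follows, assuming each coarse cell contains exactly `factor` x `factor` fine cells. Real ERA5 and CHIRPS grids are generally offset from one another, which a proper conservative regridder would have to handle; `block_nanmean` is a hypothetical helper.

```python
import numpy as np

def block_nanmean(field, factor):
    """Average non-overlapping factor x factor blocks, ignoring NaN cells."""
    n_lat, n_lon = field.shape
    if n_lat % factor or n_lon % factor:
        raise ValueError("grid size must be a multiple of factor")
    blocks = field.reshape(n_lat // factor, factor, n_lon // factor, factor)
    valid = np.isfinite(blocks)
    counts = valid.sum(axis=(1, 3))
    sums = np.where(valid, blocks, 0.0).sum(axis=(1, 3))
    # Blocks without any valid cell stay NaN
    return np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)

# Synthetic 10x10 fine grid aggregated 5:1 into a 2x2 coarse grid
fine = np.arange(100, dtype=float).reshape(10, 10)
coarse = block_nanmean(fine, 5)

# Same grid with one missing cell and one fully missing block
fine_gap = fine.copy()
fine_gap[0, 0] = np.nan
fine_gap[5:, 5:] = np.nan
coarse_gap = block_nanmean(fine_gap, 5)
print(coarse)
print(coarse_gap)
```

Unlike point interpolation, the block mean preserves the area-average rainfall of each coarse cell.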


In [None]:
# ====================================================================================
# 5. ERA5 vs CHIRPS COMPARISON (Ground Truth)
# ====================================================================================

if chirps_precip_regrid is not None:
    print("[INFO] Comparing ERA5 vs CHIRPS (validation of the training data)...")
    print("  Question: does ERA5 represent the satellite-observed precipitation well?")
    
    # Bring both fields to the same shape
    era5_test = y_test_real_aligned[:, :, :, 0]  # (T, lat, lon)
    chirps_test_grid = chirps_precip_regrid  # (T, lat, lon)
    
    # Use only cells where both fields are finite (CHIRPS is a land-only product
    # and ends at 50°S, so the regridded field can contain NaN)
    valid_era5 = np.isfinite(era5_test) & np.isfinite(chirps_test_grid)
    era5_valid = era5_test[valid_era5]
    chirps_valid = chirps_test_grid[valid_era5]
    
    # Global metrics; CHIRPS is the reference (y_true) and ERA5 the estimate (y_pred)
    mae_era5_chirps = mean_absolute_error(chirps_valid, era5_valid)
    rmse_era5_chirps = np.sqrt(mean_squared_error(chirps_valid, era5_valid))
    r2_era5_chirps = r2_score(chirps_valid, era5_valid)
    
    # Mean spatial correlation (one Pearson coefficient per day)
    correlations_spatial = []
    for t in range(len(era5_test)):
        m = valid_era5[t]
        if m.sum() > 1:
            corr = np.corrcoef(era5_test[t][m], chirps_test_grid[t][m])[0, 1]
            correlations_spatial.append(corr)
    corr_spatial_mean = np.nanmean(correlations_spatial)
    corr_spatial_std = np.nanstd(correlations_spatial)
    
    # Bias (mean difference)
    bias_era5_chirps = np.mean(era5_valid - chirps_valid)
    bias_abs = np.mean(np.abs(era5_valid - chirps_valid))  # equals the MAE by definition
    
    print("\n📊 ERA5 vs CHIRPS METRICS:")
    print(f"  MAE:                {mae_era5_chirps:.4f} mm/day")
    print(f"  RMSE:               {rmse_era5_chirps:.4f} mm/day")
    print(f"  R² Score:           {r2_era5_chirps:.4f}")
    print(f"  Spatial correlation: {corr_spatial_mean:.4f} ± {corr_spatial_std:.4f}")
    print(f"  Bias (ERA5-CHIRPS): {bias_era5_chirps:+.4f} mm/day")
    print(f"  Absolute bias:      {bias_abs:.4f} mm/day")
    
    # Interpretation
    print("\n💡 INTERPRETATION:")
    if r2_era5_chirps > 0.7:
        print("  ✅ EXCELLENT: ERA5 captures the variability of CHIRPS very well")
    elif r2_era5_chirps > 0.5:
        print("  ✅ GOOD: ERA5 represents CHIRPS adequately")
    elif r2_era5_chirps > 0.3:
        print("  ⚠️  ACCEPTABLE: ERA5 agrees moderately with CHIRPS")
    else:
        print("  ❌ LOW: ERA5 does not capture the variability of CHIRPS well")
    
    if abs(bias_era5_chirps) < 0.5:
        print("  ✅ Low bias: ERA5 shows no significant bias")
    elif abs(bias_era5_chirps) < 1.0:
        print(f"  ⚠️  Moderate bias: ERA5 {'overestimates' if bias_era5_chirps > 0 else 'underestimates'} precipitation")
    else:
        print(f"  ❌ High bias: ERA5 significantly {'overestimates' if bias_era5_chirps > 0 else 'underestimates'} precipitation")
    
    # Store the metrics
    metrics_era5_chirps = {
        'mae': mae_era5_chirps,
        'rmse': rmse_era5_chirps,
        'r2': r2_era5_chirps,
        'correlation_spatial_mean': corr_spatial_mean,
        'correlation_spatial_std': corr_spatial_std,
        'bias': bias_era5_chirps,
        'bias_abs': bias_abs
    }
    
else:
    print("⚠️  [WARNING] Skipping ERA5 vs CHIRPS comparison (data not available)")
    metrics_era5_chirps = None

[WARNING] Skipping comparison (CHIRPS not available)

[NOTE] Once the CHIRPS data has been downloaded, this notebook will make it possible to:
 1. Check that ERA5 represents real precipitation well (vs CHIRPS)
 2. Evaluate the AE+DMD predictions against two independent sources
 3. Identify regions with higher/lower reliability
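
The per-day spatial correlation can return NaN when a field is constant (for example a completely dry day) or when cells are missing. The sketch below guards against both cases; `daily_pattern_correlation` and the toy stacks are hypothetical.

```python
import numpy as np

def daily_pattern_correlation(a, b, min_cells=3):
    """Pearson correlation between two (T, lat, lon) stacks, one value per day."""
    out = np.full(a.shape[0], np.nan)
    for t in range(a.shape[0]):
        valid = np.isfinite(a[t]) & np.isfinite(b[t])
        x, y = a[t][valid], b[t][valid]
        # Skip days with too few cells or with a constant field
        if x.size >= min_cells and x.std() > 0 and y.std() > 0:
            out[t] = np.corrcoef(x, y)[0, 1]
    return out

stack_a = np.array([[[1.0, 2.0], [3.0, 4.0]],
                    [[0.0, 0.0], [0.0, 0.0]],      # dry day: constant field
                    [[1.0, 2.0], [3.0, 4.0]]])
stack_b = np.array([[[3.0, 5.0], [7.0, np.nan]],   # linear in stack_a, one gap
                    [[0.0, 1.0], [2.0, 3.0]],
                    [[-1.0, -2.0], [-3.0, -4.0]]])  # mirrored pattern
daily_corr = daily_pattern_correlation(stack_a, stack_b)
print(daily_corr, np.nanmean(daily_corr))
```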


In [None]:
# ====================================================================================
# 6. AE+DMD PREDICTIONS vs CHIRPS COMPARISON
# ====================================================================================

if chirps_precip_regrid is not None:
    print("[INFO] Evaluating AE+DMD predictions against CHIRPS (model validation)...")
    print("  Question: does the model predict well against independent satellite observations?")
    
    # Bring the predictions to the same shape
    preds_test = h1_preds_aligned[:, :, :, 0]  # (T, lat, lon)
    chirps_test_grid = chirps_precip_regrid  # (T, lat, lon)
    
    # Use only cells where both fields are finite
    valid_pred = np.isfinite(preds_test) & np.isfinite(chirps_test_grid)
    preds_valid = preds_test[valid_pred]
    chirps_valid_pred = chirps_test_grid[valid_pred]
    
    # Global metrics; CHIRPS is the reference (y_true) and the model the estimate (y_pred)
    mae_pred_chirps = mean_absolute_error(chirps_valid_pred, preds_valid)
    rmse_pred_chirps = np.sqrt(mean_squared_error(chirps_valid_pred, preds_valid))
    r2_pred_chirps = r2_score(chirps_valid_pred, preds_valid)
    
    # Mean spatial correlation (one Pearson coefficient per day)
    correlations_pred = []
    for t in range(len(preds_test)):
        m = valid_pred[t]
        if m.sum() > 1:
            corr = np.corrcoef(preds_test[t][m], chirps_test_grid[t][m])[0, 1]
            correlations_pred.append(corr)
    corr_pred_mean = np.nanmean(correlations_pred)
    corr_pred_std = np.nanstd(correlations_pred)
    
    # Bias
    bias_pred_chirps = np.mean(preds_valid - chirps_valid_pred)
    bias_pred_abs = np.mean(np.abs(preds_valid - chirps_valid_pred))  # equals the MAE by definition
    
    print("\n📊 PREDICTIONS vs CHIRPS METRICS:")
    print(f"  MAE:                {mae_pred_chirps:.4f} mm/day")
    print(f"  RMSE:               {rmse_pred_chirps:.4f} mm/day")
    print(f"  R² Score:           {r2_pred_chirps:.4f}")
    print(f"  Spatial correlation: {corr_pred_mean:.4f} ± {corr_pred_std:.4f}")
    print(f"  Bias (Pred-CHIRPS): {bias_pred_chirps:+.4f} mm/day")
    print(f"  Absolute bias:      {bias_pred_abs:.4f} mm/day")
    
    # Comparison with ERA5
    if metrics_era5_chirps is not None:
        print("\n📈 COMPARISON: Predictions vs ERA5 (both against CHIRPS)")
        print(f"  {'Metric':<20} {'ERA5 vs CHIRPS':<18} {'Pred vs CHIRPS':<18} {'Difference':<15}")
        print(f"  {'-'*20} {'-'*18} {'-'*18} {'-'*15}")
        
        mae_diff = mae_pred_chirps - metrics_era5_chirps['mae']
        rmse_diff = rmse_pred_chirps - metrics_era5_chirps['rmse']
        r2_diff = r2_pred_chirps - metrics_era5_chirps['r2']
        
        print(f"  {'MAE (mm/day)':<20} {metrics_era5_chirps['mae']:<18.4f} {mae_pred_chirps:<18.4f} {mae_diff:+.4f}")
        print(f"  {'RMSE (mm/day)':<20} {metrics_era5_chirps['rmse']:<18.4f} {rmse_pred_chirps:<18.4f} {rmse_diff:+.4f}")
        print(f"  {'R²':<20} {metrics_era5_chirps['r2']:<18.4f} {r2_pred_chirps:<18.4f} {r2_diff:+.4f}")
        
        print("\n💡 INTERPRETATION:")
        if mae_pred_chirps < metrics_era5_chirps['mae'] * 1.2:
            print("  ✅ Predictions keep a quality similar to ERA5")
        elif mae_pred_chirps < metrics_era5_chirps['mae'] * 1.5:
            print("  ⚠️  Predictions degrade moderately vs ERA5")
        else:
            print("  ❌ Predictions degrade significantly vs ERA5")
            
        if r2_pred_chirps > 0.5:
            print("  ✅ Model captures real variability well (R² > 0.5)")
        elif r2_pred_chirps > 0.3:
            print("  ⚠️  Model captures moderate variability (R² > 0.3)")
        else:
            print("  ❌ Model does not capture real variability well (R² <= 0.3)")
    
    # Store the metrics
    metrics_pred_chirps = {
        'mae': mae_pred_chirps,
        'rmse': rmse_pred_chirps,
        'r2': r2_pred_chirps,
        'correlation_spatial_mean': corr_pred_mean,
        'correlation_spatial_std': corr_pred_std,
        'bias': bias_pred_chirps,
        'bias_abs': bias_pred_abs
    }
    
else:
    print("⚠️  [WARNING] Skipping predictions vs CHIRPS evaluation (data not available)")
    metrics_pred_chirps = None

[WARNING] Skipping evaluation (CHIRPS not available)
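
R² is not symmetric in its arguments: the reference field defines the variance that the score is normalized by, so it matters which array is passed as the observation. A small NumPy sketch with hypothetical values:

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination of y_pred against the reference y_true."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

obs = np.array([0.0, 0.0, 10.0, 10.0])    # hypothetical reference (mm/day)
smooth = np.array([4.0, 4.0, 6.0, 6.0])   # hypothetical over-smoothed forecast

r2_obs_as_truth = r2(obs, smooth)   # reference = observations
r2_swapped = r2(smooth, obs)        # arguments reversed
print(r2_obs_as_truth, r2_swapped)
```

MAE and RMSE are unaffected by the argument order, so a swap only shows up in R².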


In [None]:
# ====================================================================================
# 7. VISUALIZATIONS
# ====================================================================================

if chirps_precip_regrid is not None:
    print("[INFO] Generating comparison plots...")
    
    # ============================================================================
    # 7.1 COMPARISON MAPS: ERA5 vs CHIRPS vs Predictions
    # ============================================================================
    
    # Pick a representative day (highest mean CHIRPS precipitation, ignoring NaN cells)
    day_idx = np.nanargmax(np.nanmean(chirps_test_grid, axis=(1, 2)))
    day_date = test_dates[day_idx].strftime('%Y-%m-%d')
    
    fig, axes = plt.subplots(2, 2, figsize=(16, 12))
    axes = axes.flatten()
    
    # Shared colour scale for the three precipitation maps
    vmax = max(np.nanmax(era5_test[day_idx]), np.nanmax(chirps_test_grid[day_idx]), np.nanmax(preds_test[day_idx]))
    
    # ERA5 ground truth
    im0 = axes[0].imshow(era5_test[day_idx], cmap='Blues', vmin=0, vmax=vmax)
    axes[0].set_title(f'ERA5 (Training Data)\n{day_date}', fontsize=13, fontweight='bold')
    axes[0].axis('off')
    plt.colorbar(im0, ax=axes[0], fraction=0.046, label='mm/day')
    
    # CHIRPS (real observation)
    im1 = axes[1].imshow(chirps_test_grid[day_idx], cmap='Blues', vmin=0, vmax=vmax)
    axes[1].set_title(f'CHIRPS Satellite (Ground Truth)\n{day_date}', fontsize=13, fontweight='bold')
    axes[1].axis('off')
    plt.colorbar(im1, ax=axes[1], fraction=0.046, label='mm/day')
    
    # Model prediction
    im2 = axes[2].imshow(preds_test[day_idx], cmap='Blues', vmin=0, vmax=vmax)
    axes[2].set_title(f'AE+DMD Prediction\n{day_date}', fontsize=13, fontweight='bold')
    axes[2].axis('off')
    plt.colorbar(im2, ax=axes[2], fraction=0.046, label='mm/day')
    
    # Bias map (Prediction - CHIRPS)
    bias_map = preds_test[day_idx] - chirps_test_grid[day_idx]
    bias_max = np.nanmax(np.abs(bias_map))
    im3 = axes[3].imshow(bias_map, cmap='RdBu_r', vmin=-bias_max, vmax=bias_max)
    axes[3].set_title(f'Bias (Pred - CHIRPS)\n{day_date}', fontsize=13, fontweight='bold')
    axes[3].axis('off')
    plt.colorbar(im3, ax=axes[3], fraction=0.046, label='mm/day')
    
    plt.suptitle('CHIRPS Validation: Spatial Comparison', fontsize=15, fontweight='bold', y=0.98)
    plt.tight_layout()
    plt.savefig(FIG_DIR / 'chirps_spatial_comparison.png', dpi=300, bbox_inches='tight')
    print(f"  ✅ Saved: {FIG_DIR / 'chirps_spatial_comparison.png'}")
    plt.show()
    
    
    # ============================================================================
    # 7.2 SCATTER PLOTS: ERA5 vs CHIRPS, Pred vs CHIRPS
    # ============================================================================
    
    fig, axes = plt.subplots(1, 2, figsize=(16, 6))
    
    # ERA5 vs CHIRPS
    axes[0].scatter(era5_test.flatten(), chirps_test_grid.flatten(), 
                    alpha=0.3, s=1, c='steelblue', edgecolors='none')
    axes[0].plot([0, era5_test.max()], [0, era5_test.max()], 'r--', linewidth=2, label='1:1 line')
    axes[0].set_xlabel('ERA5 (mm/day)', fontsize=12, fontweight='bold')
    axes[0].set_ylabel('CHIRPS Satellite (mm/day)', fontsize=12, fontweight='bold')
    axes[0].set_title(f'ERA5 vs CHIRPS\nR²={r2_era5_chirps:.3f}, MAE={mae_era5_chirps:.3f}', 
                     fontsize=13, fontweight='bold')
    axes[0].legend(fontsize=10)
    axes[0].grid(True, alpha=0.3)
    axes[0].set_xlim(left=0)
    axes[0].set_ylim(bottom=0)
    
    # Predictions vs CHIRPS
    axes[1].scatter(preds_test.flatten(), chirps_test_grid.flatten(), 
                    alpha=0.3, s=1, c='darkorange', edgecolors='none')
    axes[1].plot([0, preds_test.max()], [0, preds_test.max()], 'r--', linewidth=2, label='1:1 line')
    axes[1].set_xlabel('AE+DMD Predictions (mm/day)', fontsize=12, fontweight='bold')
    axes[1].set_ylabel('CHIRPS Satellite (mm/day)', fontsize=12, fontweight='bold')
    axes[1].set_title(f'Predictions vs CHIRPS\nR²={r2_pred_chirps:.3f}, MAE={mae_pred_chirps:.3f}', 
                     fontsize=13, fontweight='bold')
    axes[1].legend(fontsize=10)
    axes[1].grid(True, alpha=0.3)
    axes[1].set_xlim(left=0)
    axes[1].set_ylim(bottom=0)
    
    plt.suptitle('CHIRPS Validation: Scatter Plots', fontsize=15, fontweight='bold')
    plt.tight_layout()
    plt.savefig(FIG_DIR / 'chirps_scatter_plots.png', dpi=300, bbox_inches='tight')
    print(f"  ✅ Saved: {FIG_DIR / 'chirps_scatter_plots.png'}")
    plt.show()
    
    # ============================================================================
    # 7.3 TIME SERIES BY MACROZONE
    # ============================================================================
    
    # Define macrozones (North, Centre, South) as thirds of the latitude axis.
    # This assumes the latitude axis runs from north to south.
    lat_shape = era5_test.shape[1]
    norte_slice = slice(0, lat_shape//3)
    centro_slice = slice(lat_shape//3, 2*lat_shape//3)
    sur_slice = slice(2*lat_shape//3, lat_shape)
    
    # Spatial means are computed per macrozone
    regions = {
        'North': norte_slice,
        'Centre': centro_slice,
        'South': sur_slice
    }
    
    fig, axes = plt.subplots(3, 1, figsize=(16, 12))
    
    for idx, (region_name, lat_slice) in enumerate(regions.items()):
        # nanmean skips cells without a CHIRPS estimate
        era5_regional = np.nanmean(era5_test[:, lat_slice, :], axis=(1, 2))
        chirps_regional = np.nanmean(chirps_test_grid[:, lat_slice, :], axis=(1, 2))
        preds_regional = np.nanmean(preds_test[:, lat_slice, :], axis=(1, 2))
        
        axes[idx].plot(test_dates[:len(era5_regional)], era5_regional, 
                      label='ERA5', linewidth=2, color='steelblue', marker='o', markersize=4)
        axes[idx].plot(test_dates[:len(chirps_regional)], chirps_regional, 
                      label='CHIRPS (Ground Truth)', linewidth=2, color='green', marker='s', markersize=4)
        axes[idx].plot(test_dates[:len(preds_regional)], preds_regional, 
                      label='AE+DMD Prediction', linewidth=2, color='darkorange', marker='^', markersize=4)
        
        axes[idx].set_ylabel('Precipitation (mm/day)', fontsize=11, fontweight='bold')
        axes[idx].set_title(f'{region_name} region', fontsize=12, fontweight='bold')
        axes[idx].legend(fontsize=10, loc='upper right')
        axes[idx].grid(True, alpha=0.3)
        axes[idx].tick_params(axis='x', rotation=45)
        
    axes[-1].set_xlabel('Date', fontsize=11, fontweight='bold')
    plt.suptitle('CHIRPS Validation: Time Series by Macrozone', fontsize=15, fontweight='bold')
    plt.tight_layout()
    plt.savefig(FIG_DIR / 'chirps_timeseries_regions.png', dpi=300, bbox_inches='tight')
    print(f"  ✅ Saved: {FIG_DIR / 'chirps_timeseries_regions.png'}")
    plt.show()
    
    print("\n✅ Visualizations complete (3 figures generated)")
    
else:
    print("⚠️  [WARNING] Skipping visualizations (CHIRPS not available)")
    print("\n📊 Planned visualizations (available once CHIRPS data is present):")
    print("  1. Comparison maps: ERA5 vs CHIRPS vs Predictions")
    print("  2. Scatter plots: ERA5 vs CHIRPS, Pred vs CHIRPS")
    print("  3. Time series by macrozone (3 sources)")
    print("  4. Bias maps: (Pred - CHIRPS)")
    print("\n💾 To run: download CHIRPS with src/utils/download_chirps.py")

[INFO] Planned visualizations:
 1. Comparison maps: ERA5 vs CHIRPS vs Predictions
 2. Time series by macrozone (3 sources)
 3. Scatter plots: ERA5 vs CHIRPS, Pred vs CHIRPS
 4. Bias maps: (ERA5 - CHIRPS), (Pred - CHIRPS)
 5. Cumulative distributions by region

[NOTE] Run first: src/utils/download_chirps.py
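
Splitting the latitude axis into index thirds does not necessarily follow geographic macrozones. A sketch of an alternative that selects rows by latitude value is shown below. The zone boundaries and the latitude axis are hypothetical placeholders rather than the project's definitions, and `macrozone_masks` is not an existing helper.

```python
import numpy as np

def macrozone_masks(lats, bounds):
    """Boolean latitude masks per zone; bounds maps name -> (south, north)."""
    lats = np.asarray(lats, dtype=float)
    return {name: (lats >= south) & (lats < north)
            for name, (south, north) in bounds.items()}

# Hypothetical ERA5-like latitude axis: 157 points, north to south, 0.25° spacing
lats = np.arange(-17.5, -56.75, -0.25)

# Hypothetical zone limits in degrees latitude
zone_bounds = {"North": (-30.0, -17.0), "Centre": (-40.0, -30.0), "South": (-57.0, -40.0)}
masks = macrozone_masks(lats, zone_bounds)

# Regional daily mean of a toy (T, lat, lon) field, ignoring missing cells
field = np.ones((2, lats.size, 3))
north_mean = np.nanmean(field[:, masks["North"], :], axis=(1, 2))
zone_sizes = {name: int(m.sum()) for name, m in masks.items()}
print(zone_sizes, north_mean)
```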


In [None]:
# ====================================================================================
# 8. SUMMARY AND CONCLUSIONS
# ====================================================================================

print("\n" + "="*80)
print("SUMMARY - CHIRPS Validation")
print("="*80)

if chirps_precip_regrid is not None:
    print("\n✅ VALIDATION COMPLETED WITH CHIRPS DATA")
    
    print("\n📊 MAIN RESULTS:")
    print("\n1. ERA5 vs CHIRPS (Validation of the Training Data)")
    print(f"   R² = {metrics_era5_chirps['r2']:.4f}")
    print(f"   MAE = {metrics_era5_chirps['mae']:.4f} mm/day")
    print(f"   RMSE = {metrics_era5_chirps['rmse']:.4f} mm/day")
    print(f"   Bias = {metrics_era5_chirps['bias']:+.4f} mm/day")
    
    print("\n2. Predictions vs CHIRPS (Model Validation)")
    print(f"   R² = {metrics_pred_chirps['r2']:.4f}")
    print(f"   MAE = {metrics_pred_chirps['mae']:.4f} mm/day")
    print(f"   RMSE = {metrics_pred_chirps['rmse']:.4f} mm/day")
    print(f"   Bias = {metrics_pred_chirps['bias']:+.4f} mm/day")
    
    print("\n3. Model Degradation vs ERA5")
    mae_degradation = ((metrics_pred_chirps['mae'] / metrics_era5_chirps['mae']) - 1) * 100
    # A relative R² change is only meaningful when the ERA5 reference R² is positive
    r2_ref = metrics_era5_chirps['r2']
    r2_degradation = ((r2_ref - metrics_pred_chirps['r2']) / r2_ref) * 100 if r2_ref > 0 else float('nan')
    print(f"   MAE degradation: {mae_degradation:+.1f}%")
    print(f"   R² degradation: {r2_degradation:+.1f}%")
    
    print("\n💡 CONCLUSIONS:")
    
    # ERA5 validation
    if metrics_era5_chirps['r2'] > 0.7:
        print("\n✅ ERA5 is an excellent source of training data")
        print("   - High agreement with satellite observations (R² > 0.7)")
        print("   - The training data is representative of reality")
    elif metrics_era5_chirps['r2'] > 0.5:
        print("\n✅ ERA5 is a good source of training data")
        print("   - Adequate agreement with satellite observations (R² > 0.5)")
    else:
        print("\n⚠️  ERA5 agrees only moderately with CHIRPS")
        print("   - Consider incorporating CHIRPS data in future training")
    
    # Model validation
    if metrics_pred_chirps['r2'] > 0.5:
        print("\n✅ The AE+DMD model generalizes well to real observations")
        print("   - Captures >50% of real variability (R² > 0.5)")
        print("   - Predictions are reliable for operational use")
    elif metrics_pred_chirps['r2'] > 0.3:
        print("\n⚠️  The AE+DMD model has moderate predictive skill")
        print("   - Captures 30-50% of real variability")
        print("   - Useful for trends, needs calibration for exact values")
    else:
        print("\n❌ The AE+DMD model does not capture real variability well")
        print("   - Needs improvements in architecture or training data")
    
    # Degradation (signed: a negative value means the model beats ERA5 against CHIRPS)
    if mae_degradation < 20:
        print("\n✅ Minimal degradation of the model vs ERA5 (<20%)")
        print("   - The model keeps the quality of the training data")
    elif mae_degradation < 50:
        print("\n⚠️  Moderate degradation of the model vs ERA5 (20-50%)")
        print("   - There is room to improve generalization")
    else:
        print("\n❌ Significant degradation of the model vs ERA5 (>50%)")
        print("   - Review architecture, hyperparameters or amount of data")
    
    print("\n📁 GENERATED FILES:")
    print("   - chirps_spatial_comparison.png")
    print("   - chirps_scatter_plots.png")
    print("   - chirps_timeseries_regions.png")
    
    print("\n🔬 RECOMMENDED NEXT STEPS:")
    print("   1. Analyse specific regions with higher/lower error")
    print("   2. Validate other forecast horizons (h=2, h=3, ...)")
    print("   3. Incorporate CHIRPS data into the training pipeline")
    print("   4. Evaluate ensemble models (ERA5 + CHIRPS)")
    
    # Save the metrics to a file
    metrics_summary = {
        'era5_vs_chirps': metrics_era5_chirps,
        'predictions_vs_chirps': metrics_pred_chirps,
        'test_period': {
            'start': test_start_str,
            'end': test_end_str,
            'n_days': min_time
        }
    }
    
    metrics_path = DATA_DIR / 'processed' / 'chirps_validation_metrics.pkl'
    metrics_path.parent.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    with open(metrics_path, 'wb') as f:
        pickle.dump(metrics_summary, f)
    print(f"\n💾 Metrics saved: {metrics_path.relative_to(PROJECT_ROOT)}")
    
else:
    print("\n⚠️  CHIRPS DATA NOT AVAILABLE")
    print("\n📥 To complete this validation:")
    print("   1. cd src/utils")
    print("   2. python download_chirps.py")
    print("   3. Wait for the download (~2-4 GB, it may take several minutes)")
    print("   4. Re-run this notebook")
    
    print("\n🌐 Alternatives if the download fails:")
    print("   - Manual FTP: ftp://ftp.chc.ucsb.edu/pub/org/chg/products/CHIRPS-2.0/")
    print("   - Google Earth Engine: https://developers.google.com/earth-engine/datasets/catalog/UCSB-CHG_CHIRPS_DAILY")
    print("   - Climate Engine: https://climateengine.org/")
    
    print("\n💡 BENEFITS OF CHIRPS VALIDATION:")
    print("   ✅ Check that ERA5 represents real precipitation well")
    print("   ✅ Evaluate AE+DMD predictions against independent observations")
    print("   ✅ Identify regions with higher/lower reliability")
    print("   ✅ Quantify model degradation on real data")
    print("   ✅ Build confidence for operating the forecasting system")

print("\n" + "="*80)


SUMMARY - CHIRPS Validation

[WARNING] CHIRPS DATA NOT AVAILABLE

To complete this validation:
 1. Run: cd src/utils && python download_chirps.py
 2. Wait for the download (~2-4 GB, it may take several minutes)
 3. Re-run this notebook

Alternatives if the download fails:
 - Manual FTP: ftp://ftp.chc.ucsb.edu/pub/org/chg/products/CHIRPS-2.0/
 - Google Earth Engine: https://developers.google.com/earth-engine/datasets/catalog/UCSB-CHG_CHIRPS_DAILY
 - Climate Engine: https://climateengine.org/
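
The degradation figures in the summary cell are relative changes with ERA5 vs CHIRPS as the baseline. A small worked sketch with hypothetical metric values follows, including a guard for a non-positive baseline R²; `relative_degradation` is a hypothetical helper.

```python
import math

def relative_degradation(baseline, model):
    """Percent change of MAE and R² relative to the baseline metrics."""
    mae_pct = (model["mae"] / baseline["mae"] - 1.0) * 100.0
    if baseline["r2"] > 0:
        r2_pct = (baseline["r2"] - model["r2"]) / baseline["r2"] * 100.0
    else:
        r2_pct = float("nan")  # ratio is not meaningful for a non-positive baseline
    return mae_pct, r2_pct

# Hypothetical values, not results from this notebook
era5_metrics = {"mae": 2.0, "r2": 0.60}
pred_metrics = {"mae": 2.5, "r2": 0.45}
mae_pct, r2_pct = relative_degradation(era5_metrics, pred_metrics)
_, r2_pct_bad = relative_degradation({"mae": 2.0, "r2": -0.1}, pred_metrics)
print(f"MAE {mae_pct:+.1f}%, R2 {r2_pct:+.1f}%")
```

A positive MAE percentage means the predictions are further from CHIRPS than ERA5 is, and a negative one means they are closer.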
