# Final Exam: Forecasting pandemic spread in the U.S. with time series models

## Objective

This notebook aims to predict the **spread of the COVID-19 pandemic** across U.S. states from wastewater viral load data. It uses time series models such as **LSTM** and **Transformer** to forecast the future course of the pandemic, with spread between states to be modeled through an **adjacency matrix**.

### Phase 1: Problem definition
1. **Per-state prediction:** Train a general model on data from one state (initially New York) and apply it to all states.
2. **Risk threshold:** Define a threshold that flags which states are at risk of an outbreak.
3. **Spread to adjacent states:** Use an adjacency matrix to simulate how neighboring states influence the spread of the virus.
4. **Adjusted spread model:** Retrain the model so that it accounts for geographic proximity between states.

### Phase 2: Technical implementation
1. Download the **CDC Wastewater** data through the **National Wastewater Surveillance System (NWSS)** API.
2. Clean and explore the data, converting it into a time series suitable for training.
3. Train the **LSTM** and **Transformer** models to predict viral load.
4. Evaluate model performance with **MAE, RMSE, and MAPE**.
5. Forecast the next **30 days** and propagate between adjacent states.

Template notebook with a structure ready to:
- Build the wastewater viral load series (CDC NWSS).
- Make a temporal train/val/test split without leakage.
- Fit the scaler on train only and apply it to val/test.
- Create sliding windows per split.
- Train an LSTM, compare it against a baseline, and forecast the future.
- Extend to spread between states with an adjacency matrix (TODO).
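
The adjacency-matrix extension in the last item is still a TODO in this notebook. One possible way to express it is sketched here. The three states, their adjacency matrix, the mixing weight `alpha`, the per-state forecasts, and the risk threshold are all hypothetical illustration values, not taken from the notebook or from NWSS data.

```python
import numpy as np

# Hypothetical example: three states and a symmetric 0/1 adjacency matrix
# (1 = the two states share a border). Not derived from real geography data.
states = ["ny", "nj", "ca"]
A = np.array([
    [0, 1, 0],   # ny borders nj
    [1, 0, 0],   # nj borders ny
    [0, 0, 0],   # ca has no neighbor in this toy set
], dtype=float)


def row_normalize(adj: np.ndarray) -> np.ndarray:
    """Divide each row by its sum so neighbor weights add up to 1 (all-zero rows stay zero)."""
    row_sums = adj.sum(axis=1, keepdims=True)
    return np.divide(adj, row_sums, out=np.zeros_like(adj), where=row_sums > 0)


def propagate(own_forecast: np.ndarray, adj: np.ndarray, alpha: float = 0.2) -> np.ndarray:
    """Blend each state's own forecast with the mean forecast of its neighbors.

    alpha is an assumed mixing weight: 0 ignores neighbors entirely.
    States without neighbors keep their own forecast unchanged.
    """
    W = row_normalize(adj)
    has_neighbors = adj.sum(axis=1) > 0
    blended = (1 - alpha) * own_forecast + alpha * (W @ own_forecast)
    return np.where(has_neighbors, blended, own_forecast)


# Assumed per-state forecasts on a common (e.g. scaled) unit.
forecast = np.array([0.9, 0.1, 0.5])
adjusted = propagate(forecast, A, alpha=0.2)

RISK_THRESHOLD = 0.25  # hypothetical cut-off for flagging a state as at risk
at_risk = [s for s, v in zip(states, adjusted) if v >= RISK_THRESHOLD]
print(dict(zip(states, adjusted.round(3))), at_risk)
```

In this toy setup the low forecast for `nj` is pulled upward by its high-load neighbor `ny`, which is the kind of effect the adjusted spread model in Phase 1 is meant to capture. A real version would need an adjacency matrix for all jurisdictions and a tuned or learned `alpha`.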


## 1. Imports, path configuration, and seeds


In [None]:
import os
import random
from pathlib import Path
from typing import Tuple, Dict, List

import numpy as np
import pandas as pd
import plotly.graph_objects as go
import requests

from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.preprocessing import MinMaxScaler, StandardScaler

from tensorflow.keras import Model
from tensorflow.keras.layers import (
    LSTM,
    Dense,
    Dropout,
    GlobalAveragePooling1D,
    Input,
    LayerNormalization,
    MultiHeadAttention,
)
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam


# Base paths
PROJECT_ROOT = Path('.')
DATA_DIR = PROJECT_ROOT / 'data'
LANDING_DIR = DATA_DIR / 'landing'
PROCESSED_DIR = DATA_DIR / 'processed'
FIGURES_DIR = PROJECT_ROOT / 'figures'
MODELS_DIR = PROJECT_ROOT / 'models'

for d in (DATA_DIR, LANDING_DIR, PROCESSED_DIR, FIGURES_DIR, MODELS_DIR):
    d.mkdir(parents=True, exist_ok=True)

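# Global seed
GLOBAL_SEED = 42

def set_global_seed(seed: int = 42):
    """Seed Python, NumPy, and TensorFlow so training runs are repeatable."""
    global GLOBAL_SEED
    import tensorflow as tf  # local import: only the seeding call needs the top-level module

    GLOBAL_SEED = seed
    random.seed(seed)
    np.random.seed(seed)
    tf.random.set_seed(seed)  # Keras weight init and dropout draw from TensorFlow's RNG

set_global_seed(GLOBAL_SEED)
print(f'Global seed set to {GLOBAL_SEED}')
print('PROJECT_ROOT:', PROJECT_ROOT.resolve())


Global seed set to 42
PROJECT_ROOT: C:\Users\esteb\apps\Wastewater-SARS-CoV-2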


## 2. Helpers: windows, metrics, plots, and forecasting


In [22]:
def create_sliding_windows(
    series: np.ndarray,
    window_size: int,
    horizon: int = 1,
    stride: int = 1,
) -> Tuple[np.ndarray, np.ndarray]:
    """Build sliding windows over a series (scaled or raw).
    Intended to be applied to one split (train / val / test) at a time.
    """
    series = np.asarray(series).astype(float)
    T = len(series)
    if T < window_size + horizon:
        raise ValueError('Series too short for the given window_size and horizon.')

    X, y = [], []
    last_start = T - window_size - horizon + 1
    for start in range(0, last_start, stride):
        end = start + window_size
        target_end = end + horizon
        X.append(series[start:end])
        y.append(series[end:target_end])

    X = np.stack(X)
    y = np.stack(y)
    return X, y


def regression_metrics(y_true: np.ndarray, y_pred: np.ndarray) -> Dict[str, float]:
    y_true = np.asarray(y_true).reshape(-1)
    y_pred = np.asarray(y_pred).reshape(-1)
    mae = mean_absolute_error(y_true, y_pred)
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    eps = 1e-8
    mape = np.mean(np.abs((y_true - y_pred) / np.maximum(np.abs(y_true), eps))) * 100
    return {'MAE': mae, 'RMSE': rmse, 'MAPE': mape}


def plot_time_series(dates, values, title: str = '', ylabel: str = ''):
    dates = pd.to_datetime(dates)
    fig = go.Figure()
    fig.add_trace(go.Scatter(x=dates, y=values, mode='lines+markers', name='series'))
    fig.update_layout(title=title, xaxis_title='Date', yaxis_title=ylabel, template='plotly_white')
    return fig


def plot_history_vs_pred(dates, y_true, y_pred, title: str = 'History vs prediction'):
    dates = pd.to_datetime(dates)
    y_true = np.asarray(y_true).reshape(-1)
    y_pred = np.asarray(y_pred).reshape(-1)
    fig = go.Figure()
    fig.add_trace(go.Scatter(x=dates, y=y_true, mode='lines', name='Actual'))
    fig.add_trace(go.Scatter(x=dates, y=y_pred, mode='lines', name='Predicted', line=dict(dash='dash')))
    fig.update_layout(title=title, xaxis_title='Date', yaxis_title='Value', template='plotly_white')
    return fig


def plot_future_forecast(history_dates, history_values, future_dates, future_preds, title: str = 'Future forecast'):
    history_dates = pd.to_datetime(history_dates)
    future_dates = pd.to_datetime(future_dates)
    fig = go.Figure()
    fig.add_trace(go.Scatter(x=history_dates, y=history_values, mode='lines', name='History'))
    fig.add_trace(go.Scatter(x=future_dates, y=future_preds, mode='lines+markers', name='Forecast', line=dict(dash='dash')))
    fig.update_layout(title=title, xaxis_title='Date', yaxis_title='Value', template='plotly_white')
    return fig


def default_predict_fn(model, X):
    if hasattr(model, 'predict'):
        return model.predict(X)
    return model(X)


def recursive_forecast(model, last_window: np.ndarray, n_future: int, predict_fn=default_predict_fn) -> np.ndarray:
    """Recursive n_future-step forecast with a one-step-ahead model (in scaled space)."""
    window = np.asarray(last_window, dtype=float).reshape(1, -1)
    preds = []
    for _ in range(n_future):
        y_hat = predict_fn(model, window)
        y_hat = np.asarray(y_hat).reshape(-1)
        y_next = float(y_hat[0])
        preds.append(y_next)
        # Slide the window: drop the oldest value and append the new prediction.
        window = np.roll(window, -1, axis=1)
        window[0, -1] = y_next
    return np.array(preds)

def lstm_predict_fn(model, X_2d: np.ndarray):
    """
    X_2d: shape (batch, window_size)
    Reshaped to (batch, window_size, 1) for the LSTM.
    """
    X_2d = np.asarray(X_2d)
    X_3d = X_2d.reshape(X_2d.shape[0], X_2d.shape[1], 1)
    return model.predict(X_3d)

def transformer_predict_fn(model, X_2d: np.ndarray):
    """
    X_2d: shape (batch, window_size)
    Expanded to (batch, window_size, 1) for the Transformer.
    """
    X_2d = np.asarray(X_2d)
    X_3d = X_2d.reshape(X_2d.shape[0], X_2d.shape[1], 1)
    return model.predict(X_3d)

print('Helpers loaded.')

Helpers loaded.
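
To make the windowing concrete, here is a small toy example. It does not call the `create_sliding_windows` helper. Instead it uses NumPy's `sliding_window_view`, which for `stride=1` should give the same input/target pairs. The toy series and sizes are chosen only for readability.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Toy series 0..9 so every window is easy to read by eye.
toy = np.arange(10, dtype=float)
window_size, horizon = 3, 1

# Each row holds window_size inputs followed by horizon targets.
blocks = sliding_window_view(toy, window_size + horizon)
X_toy, y_toy = blocks[:, :window_size], blocks[:, window_size:]

print(X_toy.shape, y_toy.shape)
print(X_toy[0], y_toy[0])     # first window and its target
print(X_toy[-1], y_toy[-1])   # last window and its target
```

A series of length `T` yields `T - window_size - horizon + 1` windows, which is why each split loses its first `window_size` points as targets.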


## 3. Downloading NWSS data (CDC)


In [4]:
CDC_WASTEWATER_CSV_URL = 'https://data.cdc.gov/api/views/j9g8-acpt/rows.csv?accessType=DOWNLOAD'
CDC_WASTEWATER_FULL_CSV_PATH = LANDING_DIR / 'cdc_wastewater_sarscov2_full.csv'

def download_cdc_wastewater_full(url: str = CDC_WASTEWATER_CSV_URL,
                                 out_path: Path = CDC_WASTEWATER_FULL_CSV_PATH,
                                 use_cache: bool = True) -> Path:
    """Download the full NWSS CSV export, reusing the local copy when it exists."""
    out_path = Path(out_path)
    if use_cache and out_path.exists():
        print(f'[INFO] Using cached file: {out_path}')
        return out_path
    print(f'[INFO] Downloading NWSS data from {url}')
    r = requests.get(url, timeout=60)  # fail instead of hanging if the server stops responding
    r.raise_for_status()
    out_path.write_bytes(r.content)
    print(f'[OK] Saved to {out_path}')
    return out_path

# Uncomment on the first run to fetch the CSV (later runs reuse the cached file).
# download_cdc_wastewater_full()


## 4. Loading and quick exploration


In [5]:
df_raw = pd.read_csv(CDC_WASTEWATER_FULL_CSV_PATH, low_memory=False)  # single-pass read for consistent dtype inference
print('Raw shape:', df_raw.shape)
display(df_raw.head())
print(df_raw.columns.tolist())


Raw shape: (504369, 35)


| | sewershed_id | wwtp_jurisdiction | county_fips | counties_served | population_served | sample_id | sample_collect_date | sample_type | sample_matrix | sample_location | ... | pcr_target_flowpop_lin | pcr_target_mic_lin | hum_frac_target_mic | hum_frac_mic_conc | hum_frac_mic_unit | rec_eff_percent | rec_eff_target_name | rec_eff_spike_matrix | rec_eff_spike_conc | date_updated |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 711 | me | 23019 | Penobscot | 2500 | 000516f8c0f05102d4a010b987f62273 | 2023-11-28 | 24-hr flow-weighted composite | raw wastewater | wwtp | ... | 363459900.0 | 0.02864 | pepper mild mottle virus | 10025970.0 | copies/l wastewater | 68.74511 | brsv vaccine | raw sample post pasteurization | 5.34357 | 09/26/2025 10:40:00 AM |
| 1 | 1809 | ri | 44007 | Providence | 10000 | 040ce1a855db659d046911c5d5758314 | 2023-07-05 | 24-hr time-weighted composite | raw wastewater | wwtp | ... | 42671050.0 | 0.00092 | pepper mild mottle virus | 177280900.0 | copies/l wastewater | 80.15343 | brsv vaccine | raw sample post pasteurization | 5.0 | 09/26/2025 10:40:00 AM |
| 2 | 322 | fl | 12115 | Sarasota | 100000 | 052760ee8f2bec3e7e4ac25f5bff23b4 | 2023-08-14 | 24-hr flow-weighted composite | post grit removal | wwtp | ... | 126357300.0 | 0.00455 | pepper mild mottle virus | 116378400.0 | copies/l wastewater | 20.63618 | brsv vaccine | raw sample post pasteurization | 5.34357 | 09/26/2025 10:40:00 AM |
| 3 | 524 | in | 18113 | Noble | 10000 | 0dd046c819c8214f02eb79056e57978a | 2023-12-18 | 24-hr time-weighted composite | raw wastewater | wwtp | ... | 51707960.0 | 0.00177 | pmmov (gt-digital) | 47520000.0 | copies/l wastewater | 33.02887 | bcov vaccine | raw sample | 5.45 | 09/26/2025 10:40:00 AM |
| 4 | 694 | me | 23001 | Androscoggin | 60000 | 0e08cd627f3702430558aaf38aefa6e4 | 2023-09-13 | 24-hr flow-weighted composite | raw wastewater | wwtp | ... | 106669700.0 | 0.01578 | pepper mild mottle virus | 14101230.0 | copies/l wastewater | 6.26729 | brsv vaccine | raw sample post pasteurization | 5.34357 | 09/26/2025 10:40:00 AM |


['sewershed_id', 'wwtp_jurisdiction', 'county_fips', 'counties_served', 'population_served', 'sample_id', 'sample_collect_date', 'sample_type', 'sample_matrix', 'sample_location', 'flow_rate', 'concentration_method', 'pasteurized', 'pcr_type', 'extraction_method', 'major_lab_method', 'inhibition_detect', 'inhibition_adjust', 'ntc_amplify', 'pcr_target', 'pcr_gene_target_agg', 'pcr_target_avg_conc', 'pcr_target_units', 'lod_sewage', 'pcr_target_avg_conc_lin', 'pcr_target_flowpop_lin', 'pcr_target_mic_lin', 'hum_frac_target_mic', 'hum_frac_mic_conc', 'hum_frac_mic_unit', 'rec_eff_percent', 'rec_eff_target_name', 'rec_eff_spike_matrix', 'rec_eff_spike_conc', 'date_updated']


## 5. Base preprocessing and building one series (e.g. NY state)

A base state is chosen (e.g. NY) and its viral load is aggregated per date using the median across samples.


In [6]:
DATE_COL = 'sample_collect_date'
TARGET_COL = 'pcr_target_flowpop_lin'
STATE_COL = 'wwtp_jurisdiction'

df_raw[DATE_COL] = pd.to_datetime(df_raw[DATE_COL], errors='coerce')
df = df_raw.dropna(subset=[DATE_COL, TARGET_COL]).copy()

STATE = 'ny'  # base state; change to any jurisdiction code

df_state = df.loc[df[STATE_COL] == STATE, [DATE_COL, TARGET_COL]].copy()
df_state = (
    df_state
    .groupby(DATE_COL, as_index=False)[TARGET_COL]
    .median()
    .sort_values(DATE_COL)
)
df_state = df_state.rename(columns={DATE_COL: 'date', TARGET_COL: 'target'})
df_state = df_state.dropna(subset=['date', 'target'])

print('Base state:', STATE)
print('Date range:', df_state['date'].min(), '→', df_state['date'].max())
print('N observations:', len(df_state))
display(df_state.head())

fig = plot_time_series(df_state['date'], df_state['target'],
                       title=f'Series – state {STATE} (flow/pop)', ylabel='target')
fig.show()

Base state: ny
Date range: 2020-08-31 00:00:00 → 2025-09-17 00:00:00
N observations: 1366


| | date | target |
|---|---|---|
| 0 | 2020-08-31 | 2832473.0 |
| 1 | 2020-09-02 | 596.5088 |
| 2 | 2020-09-04 | 23559.29 |
| 3 | 2020-09-06 | 622.6106 |
| 4 | 2020-09-08 | 5880741.0 |
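
The first rows of the aggregated series are two days apart, and the series has 1366 observations between 2020-08-31 and 2025-09-17, so it is not sampled daily. If later steps treat each model step as one day, one option is to put the series on a regular daily grid first. The sketch below uses made-up toy values. Time-weighted linear interpolation is an assumption for illustration, not something the notebook does.

```python
import pandas as pd

# Toy irregular series (values are made up; the dates mimic the gaps seen above).
toy = pd.Series(
    [10.0, 14.0, 20.0],
    index=pd.to_datetime(["2020-08-31", "2020-09-02", "2020-09-05"]),
)

# Put the series on a daily grid: days without samples become NaN...
daily = toy.resample("D").median()
# ...and are then filled by linear interpolation weighted by elapsed time.
daily_filled = daily.interpolate(method="time")

print(daily_filled)
```

Interpolated points are not real measurements, so a stricter alternative would be to keep the gaps and model at a coarser (for example weekly) frequency.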


## 6. Temporal split + scaling (no information leakage)

The temporal split is made first, on the raw series. The scaler is then fitted on train only and applied to val/test.


In [None]:
TRAIN_FRAC = 0.7
VAL_FRAC = 0.15
TEST_FRAC = 1.0 - TRAIN_FRAC - VAL_FRAC

SCALER_TYPE = 'minmax'  # 'minmax' or 'standard'

series = df_state.set_index('date')['target'].astype(float).sort_index()
n = len(series)
n_train = int(n * TRAIN_FRAC)
n_val = int(n * VAL_FRAC)
n_test = n - n_train - n_val

train_series = series.iloc[:n_train]
val_series   = series.iloc[n_train:n_train + n_val]
test_series  = series.iloc[n_train + n_val:]

print('Sizes → train:', len(train_series), 'val:', len(val_series), 'test:', len(test_series))

if SCALER_TYPE == 'minmax':
    scaler = MinMaxScaler()
elif SCALER_TYPE == 'standard':
    scaler = StandardScaler()
else:
    raise ValueError("SCALER_TYPE must be 'minmax' or 'standard'")

train_scaled = scaler.fit_transform(train_series.values.reshape(-1, 1)).reshape(-1)
val_scaled   = scaler.transform(val_series.values.reshape(-1, 1)).reshape(-1)
test_scaled  = scaler.transform(test_series.values.reshape(-1, 1)).reshape(-1)

print('Scaled train range → min:', float(train_scaled.min()), 'max:', float(train_scaled.max()))


Sizes → train: 956 val: 204 test: 206
Scaled train range → min: 0.0 max: 0.9999999999999998
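
One consequence of fitting the scaler on train only is that val/test values are not guaranteed to stay inside the training range. The sketch below writes out the min-max formula by hand, `(x - min) / (max - min)`, which is what `MinMaxScaler` applies with its default range. The numbers are toy values, not notebook data.

```python
import numpy as np

# Toy series ordered in time; the last values exceed anything seen in training.
toy = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 20.0])
train, later = toy[:5], toy[5:]

# Fit the min-max parameters on the training part only...
lo, hi = train.min(), train.max()


def scale(x: np.ndarray) -> np.ndarray:
    return (x - lo) / (hi - lo)


# ...and reuse them unchanged for the later part.
train_scaled_toy = scale(train)
later_scaled_toy = scale(later)

print(train_scaled_toy)   # stays within [0, 1]
print(later_scaled_toy)   # can exceed 1: these values were not seen when fitting
```

Scaled val/test values outside [0, 1] are expected with this setup. Refitting the scaler on the full series would remove them, but it would also leak future information into training.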


## 7. Sliding windows per split (already scaled)


In [8]:
WINDOW_SIZE = 30
HORIZON = 1

X_train, y_train = create_sliding_windows(train_scaled, WINDOW_SIZE, HORIZON)
X_val,   y_val   = create_sliding_windows(val_scaled,   WINDOW_SIZE, HORIZON)
X_test,  y_test  = create_sliding_windows(test_scaled,  WINDOW_SIZE, HORIZON)

print('X_train:', X_train.shape, 'y_train:', y_train.shape)
print('X_val  :', X_val.shape,   'y_val  :', y_val.shape)
print('X_test :', X_test.shape,  'y_test :', y_test.shape)

train_dates = train_series.index[WINDOW_SIZE:WINDOW_SIZE + len(y_train)]
val_dates   = val_series.index[WINDOW_SIZE:WINDOW_SIZE + len(y_val)]
test_dates  = test_series.index[WINDOW_SIZE:WINDOW_SIZE + len(y_test)]

print('Window date range, train:', train_dates[0], '→', train_dates[-1])
print('Window date range, test :', test_dates[0],  '→', test_dates[-1])


X_train: (926, 30) y_train: (926, 1)
X_val  : (174, 30) y_val  : (174, 1)
X_test : (176, 30) y_test : (176, 1)
Window date range, train: 2020-10-27 00:00:00 → 2024-05-15 00:00:00
Window date range, test : 2025-02-11 00:00:00 → 2025-09-17 00:00:00


## 8. Naive baseline model (last value)


In [10]:
class NaiveLastValueModel:
    def fit(self, X, y=None):
        return self
    def predict(self, X):
        X = np.asarray(X)
        return X[:, -1:].copy()

baseline = NaiveLastValueModel().fit(X_train, y_train)

y_test_pred_baseline_scaled = baseline.predict(X_test).reshape(-1)
y_test_true_scaled = y_test.reshape(-1)

y_test_true = scaler.inverse_transform(y_test_true_scaled.reshape(-1, 1)).reshape(-1)
y_test_pred_baseline = scaler.inverse_transform(y_test_pred_baseline_scaled.reshape(-1, 1)).reshape(-1)

metrics_base = regression_metrics(y_test_true, y_test_pred_baseline)
print('Baseline – test metrics:', metrics_base)

fig = plot_history_vs_pred(test_dates, y_test_true, y_test_pred_baseline,
                           title='Baseline - Test vs prediction')
fig.show()

Baseline – test metrics: {'MAE': 9242329.820107073, 'RMSE': 21335652.728351966, 'MAPE': 291.5344013946131}
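
A MAPE near 291% next to an MAE in the millions can look contradictory. MAPE divides each error by the true value, so when the target ranges from the hundreds to the millions (as the first rows of this series do), a few small true values can dominate the average. A toy illustration with made-up numbers:

```python
import numpy as np

# Made-up values spanning several orders of magnitude, like the target series.
y_true = np.array([600.0, 2_000_000.0, 5_000_000.0])
y_pred = np.array([60_000.0, 2_100_000.0, 4_900_000.0])

abs_err = np.abs(y_true - y_pred)
pct_err = abs_err / np.abs(y_true) * 100

print("MAE :", abs_err.mean())
print("MAPE:", pct_err.mean())
print("per-point % error:", pct_err.round(1))
```

The smallest true value has the smallest absolute error yet contributes almost all of the MAPE. For series like this one, MAPE should be read together with MAE and RMSE rather than alone.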


## 9. LSTM model (on scaled data)


In [11]:
X_train_lstm = X_train.reshape(-1, WINDOW_SIZE, 1)
X_val_lstm   = X_val.reshape(-1, WINDOW_SIZE, 1)
X_test_lstm  = X_test.reshape(-1, WINDOW_SIZE, 1)

model_lstm = Sequential([
    LSTM(64, input_shape=(WINDOW_SIZE, 1), return_sequences=False),
    Dropout(0.2),
    Dense(32, activation='relu'),
    Dense(HORIZON),
])

model_lstm.compile(optimizer=Adam(learning_rate=1e-3), loss='mse', metrics=['mae'])
model_lstm.summary()


Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 lstm (LSTM)                 (None, 64)                16896     
                                                                 
 dropout (Dropout)           (None, 64)                0         
                                                                 
 dense (Dense)               (None, 32)                2080      
                                                                 
 dense_1 (Dense)             (None, 1)                 33        
                                                                 
Total params: 19,009
Trainable params: 19,009
Non-trainable params: 0
_________________________________________________________________


In [12]:
EPOCHS = 50
BATCH_SIZE = 32

history_lstm = model_lstm.fit(
    X_train_lstm,
    y_train,
    validation_data=(X_val_lstm, y_val),
    epochs=EPOCHS,
    batch_size=BATCH_SIZE,
    verbose=1,
)


Epoch 1/50
...
Epoch 50/50


## 10. LSTM evaluation (metrics on the original scale)


In [13]:
y_test_pred_lstm_scaled = model_lstm.predict(X_test_lstm).reshape(-1)
y_test_true_scaled = y_test.reshape(-1)

y_test_true_lstm = scaler.inverse_transform(y_test_true_scaled.reshape(-1, 1)).reshape(-1)
y_test_pred_lstm = scaler.inverse_transform(y_test_pred_lstm_scaled.reshape(-1, 1)).reshape(-1)

metrics_lstm = regression_metrics(y_test_true_lstm, y_test_pred_lstm)
print('LSTM – test metrics:', metrics_lstm)

fig = plot_history_vs_pred(test_dates, y_test_true_lstm, y_test_pred_lstm,
                           title='LSTM - Test vs prediction')
fig.show()

LSTM – test metrics: {'MAE': 9857281.52857145, 'RMSE': 16313087.38433361, 'MAPE': 555.2318711789375}


## 11. Future forecast with the LSTM


In [18]:
N_FUTURE = 30  # forecast steps; plotted as days, which assumes one observation per day

# Last window of the test split (the end of the full series), already scaled
last_window_scaled = test_scaled[-WINDOW_SIZE:]

# Use the LSTM-specific predict_fn
future_scaled = recursive_forecast(
    model_lstm,
    last_window_scaled,
    N_FUTURE,
    predict_fn=lstm_predict_fn,  # reshapes each 2D window to (batch, window_size, 1)
)

# Back to the original scale
future_preds = scaler.inverse_transform(future_scaled.reshape(-1, 1)).reshape(-1)

last_date = series.index[-1]
future_dates = pd.date_range(start=last_date + pd.Timedelta(days=1),
                             periods=N_FUTURE, freq='D')

fig = plot_future_forecast(
    series.index,
    series.values,
    future_dates,
    future_preds,
    title=f'LSTM forecast, {N_FUTURE} days – state {STATE}',
)
fig.show()
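
The recursive scheme used here feeds each prediction back into the input window. The sketch below replays that loop with a stand-in "model" that returns the window mean. This is an assumption made so the arithmetic can be followed by hand, and it has nothing to do with the trained LSTM.

```python
import numpy as np


def mean_model(window_2d: np.ndarray) -> np.ndarray:
    """Stand-in one-step model (illustration only): predicts the mean of the window."""
    return window_2d.mean(axis=1, keepdims=True)


window = np.array([[1.0, 2.0, 3.0]])  # shape (1, window_size)
preds = []
for _ in range(3):
    y_next = float(mean_model(window)[0, 0])
    preds.append(y_next)
    # Drop the oldest value and append the prediction as if it had been observed.
    window = np.roll(window, -1, axis=1)
    window[0, -1] = y_next

print(preds)
```

From the second step onward part of the input is itself a prediction, and after `window_size` steps the whole window is. This is why errors in recursive forecasts tend to compound over a 30-step horizon.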



## 12. Transformer model

In [20]:
# Data in (batch, timesteps, features) format for the Transformer
X_train_tf = X_train.reshape(-1, WINDOW_SIZE, 1)
X_val_tf   = X_val.reshape(-1, WINDOW_SIZE, 1)
X_test_tf  = X_test.reshape(-1, WINDOW_SIZE, 1)

# Direct definition of the Transformer model
inputs_tf = Input(shape=(WINDOW_SIZE, 1), name="transformer_input")

# Initial projection to d_model = 64
x = Dense(64, name="proj_input")(inputs_tf)

# === TRANSFORMER BLOCK 1 (self-attention + feed-forward) ===

# Self-attention (4 heads, key_dim=64 per head)
attn1 = MultiHeadAttention(
    num_heads=4,
    key_dim=64,
    name="mha_1"
)(x, x)
attn1 = Dropout(0.1, name="drop_attn_1")(attn1)

# Residual + norm
out1 = LayerNormalization(epsilon=1e-6, name="ln_attn_1")(x + attn1)

# Position-wise feed-forward
ff1 = Dense(128, activation="relu", name="ffn1_1")(out1)
ff1 = Dropout(0.1, name="drop_ffn_1")(ff1)
ff1 = Dense(64, name="ffn1_2")(ff1)

# Residual + norm
x = LayerNormalization(epsilon=1e-6, name="ln_ffn_1")(out1 + ff1)

# === END OF TRANSFORMER BLOCK ===
# For a second block, copy this block and give its layers new names.

# Temporal pooling
x = GlobalAveragePooling1D(name="gap")(x)

# Dense output layers
x = Dense(32, activation="relu", name="dense_out")(x)
outputs_tf = Dense(HORIZON, name="output")(x)

model_tf = Model(inputs=inputs_tf, outputs=outputs_tf, name="ts_transformer_simple")

model_tf.compile(
    optimizer=Adam(learning_rate=1e-3),
    loss="mse",
    metrics=["mae"],
)

model_tf.summary()

Model: "ts_transformer_simple"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 transformer_input (InputLayer)  [(None, 30, 1)]     0           []                               
                                                                                                  
 proj_input (Dense)             (None, 30, 64)       128         ['transformer_input[0][0]']      
                                                                                                  
 mha_1 (MultiHeadAttention)     (None, 30, 64)       66368       ['proj_input[0][0]',             
                                                                  'proj_input[0][0]']             
                                                                                                  
 drop_attn_1 (Dropout)          (None, 30, 64)       0           ['mha_1[0][0]
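
The model defined in this section passes the projected values straight into self-attention and then averages over time, with no positional information. Self-attention does not encode order by itself, and global average pooling is order-independent too. As built, the network should therefore give the same output for a window and for any reordering of its time steps. A common remedy is to add a positional encoding to the output of `proj_input`. The sketch below computes the sinusoidal encoding in plain NumPy. How it would be wired into `model_tf` is left open and is an assumption, not part of the notebook.

```python
import numpy as np


def sinusoidal_positional_encoding(n_steps: int, d_model: int) -> np.ndarray:
    """Return an (n_steps, d_model) array: sine on even columns, cosine on odd columns."""
    positions = np.arange(n_steps)[:, None]            # (n_steps, 1)
    dims = np.arange(d_model)[None, :]                 # (1, d_model)
    # Each pair of columns (2i, 2i+1) shares the frequency 1 / 10000^(2i / d_model).
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    pe = np.zeros((n_steps, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])
    pe[:, 1::2] = np.cos(angles[:, 1::2])
    return pe


# Same sizes as the notebook: 30 time steps (WINDOW_SIZE) and d_model = 64.
pe = sinusoidal_positional_encoding(30, 64)
print(pe.shape)
```

Because every row of `pe` is different, adding it to the projected inputs gives each time step a distinct signature that attention can use to recover order.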

In [21]:
history_tf = model_tf.fit(
    X_train_tf,
    y_train,
    validation_data=(X_val_tf, y_val),
    epochs=40,
    batch_size=32,
    verbose=1,
)

Epoch 1/40
...
Epoch 40/40


## 13. Transformer evaluation

In [23]:
# Predictions in scaled space
y_test_pred_tf_scaled = model_tf.predict(X_test_tf).reshape(-1)
y_test_true_scaled    = y_test.reshape(-1)

# Back to the original scale
y_test_true_tf = scaler.inverse_transform(
    y_test_true_scaled.reshape(-1, 1)
).reshape(-1)

y_test_pred_tf = scaler.inverse_transform(
    y_test_pred_tf_scaled.reshape(-1, 1)
).reshape(-1)

metrics_tf = regression_metrics(y_test_true_tf, y_test_pred_tf)
print("Transformer – test metrics:", metrics_tf)

fig = plot_history_vs_pred(
    test_dates,
    y_test_true_tf,
    y_test_pred_tf,
    title="Transformer – Test vs prediction",
)
fig.show()

Transformer – test metrics: {'MAE': 40809449.86294326, 'RMSE': 42060802.41088082, 'MAPE': 2055.0887561729764}
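
With three models evaluated on the same test windows, the results are easier to read side by side. The sketch below uses the test metrics printed in this notebook for the baseline, the LSTM, and the Transformer, rounded to two decimals. In a live session the dictionaries `metrics_base`, `metrics_lstm`, and `metrics_tf` could be passed to `pd.DataFrame` instead of the hard-coded numbers.

```python
import pandas as pd

# Test metrics copied from the cell outputs in this notebook (rounded).
results = pd.DataFrame(
    {
        "MAE": [9_242_329.82, 9_857_281.53, 40_809_449.86],
        "RMSE": [21_335_652.73, 16_313_087.38, 42_060_802.41],
        "MAPE": [291.53, 555.23, 2055.09],
    },
    index=["baseline", "lstm", "transformer"],
)

# Ratio to the naive baseline: values below 1 beat it on that metric.
relative = results / results.loc["baseline"]
print(relative.round(2))

# Best (lowest) model per metric.
print(results.idxmin())
```

On these numbers the LSTM improves RMSE over the naive baseline but not MAE or MAPE, and the Transformer trails both on all three metrics.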


## 14. Future forecast with the Transformer

In [24]:
N_FUTURE_TF = 30  # Transformer forecast steps; plotted as days, which assumes one observation per day

last_window_scaled_tf = test_scaled[-WINDOW_SIZE:]

future_scaled_tf = recursive_forecast(
    model_tf,
    last_window_scaled_tf,
    N_FUTURE_TF,
    predict_fn=transformer_predict_fn,
)

future_preds_tf = scaler.inverse_transform(
    future_scaled_tf.reshape(-1, 1)
).reshape(-1)

last_date = series.index[-1]
future_dates_tf = pd.date_range(
    start=last_date + pd.Timedelta(days=1),
    periods=N_FUTURE_TF,
    freq="D",
)

fig = plot_future_forecast(
    series.index,
    series.values,
    future_dates_tf,
    future_preds_tf,
    title=f"Transformer forecast, {N_FUTURE_TF} days – state {STATE}",
)
fig.show()




## 15. Conclusions and findings

Leave this block to be written at the end of the exam:
- Behavior of the base state's series.
- Comparison of baseline vs LSTM vs Transformer.
- Limitations of the approach.
- How it could be extended robustly to a multi-state model.
