# Multi-Horizon Traffic Forecasting on PeMS (Graph Models)

## Goal (Paper Claim)
Build a leakage-safe, reproducible pipeline on PeMS traffic data and evaluate multi-horizon forecasting models fairly.

Primary goal:
- Demonstrate that the proposed **GraphWaveNet+GRU+LSTM** performs best on PeMS when every model is run under the same train/val/test protocol.

Key principles:
- No temporal leakage: all statistics (e.g. normalization parameters) are computed from the training split only.
- One shared dataset representation for all deep models: **X ∈ R^{T×N×F}, Y ∈ R^{T×N}** (T time steps, N stations, F input features).
- One fixed evaluation harness (same horizons, same metrics, same seeds).
- Strong baselines + ablations:
  - HA / Persistence
  - GRU / LSTM (non-graph)
  - GraphWaveNet
  - GraphWaveNet+GRU
  - GraphWaveNet+LSTM
  - **GraphWaveNet+GRU+LSTM (proposed)**
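The "train only" rule for statistics can be illustrated with a minimal sketch. It assumes z-score normalization per station and feature, a toy array in the shared `X ∈ R^{T×N×F}` layout, and hypothetical split indices `train_end` and `val_end`; the helper names `fit_scaler` and `apply_scaler` are illustrative and not part of this notebook.

```python
import numpy as np

def fit_scaler(x_train: np.ndarray, eps: float = 1e-8):
    """Compute mean/std over the time axis of the TRAIN slice only."""
    mean = x_train.mean(axis=0, keepdims=True)  # shape (1, N, F)
    std = x_train.std(axis=0, keepdims=True)    # shape (1, N, F)
    return mean, np.maximum(std, eps)           # guard against zero variance

def apply_scaler(x: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Apply train-derived statistics to any split."""
    return (x - mean) / std

# Toy data: T=100 time steps, N=4 stations, F=2 features (all assumed).
rng = np.random.default_rng(0)
X = rng.normal(loc=60.0, scale=10.0, size=(100, 4, 2))

# Hypothetical time-based split boundaries.
train_end, val_end = 70, 85

mean, std = fit_scaler(X[:train_end])
X_train = apply_scaler(X[:train_end], mean, std)
X_val = apply_scaler(X[train_end:val_end], mean, std)   # reuses train stats
X_test = apply_scaler(X[val_end:], mean, std)           # reuses train stats
```

Because validation and test data never enter `fit_scaler`, their distribution cannot influence the inputs seen during training.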


In [1]:
!pip -q install -r requirements.txt



In [2]:
!pip -q install numpy pandas openpyxl scikit-learn torch tqdm



In [3]:
import os
import random
from pathlib import Path

import numpy as np
import pandas as pd

import torch
from tqdm.auto import tqdm

def set_seed(seed: int = 42, deterministic: bool = True):
    """
    Sets seeds for reproducibility.
    deterministic=True makes results more reproducible but can reduce speed.
    """
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

    # cuDNN: deterministic kernels favor run-to-run reproducibility, while
    # benchmark mode autotunes kernels for speed, so the two flags are opposites.
    torch.backends.cudnn.deterministic = deterministic
    torch.backends.cudnn.benchmark = not deterministic

SEED = 42
set_seed(SEED, deterministic=True)

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
print("Torch:", torch.__version__)
print("Device:", DEVICE)
if DEVICE == "cuda":
    print("GPU:", torch.cuda.get_device_name(0))


Torch: 2.1.1+cu121
Device: cuda
GPU: Quadro P5000


## Configuration

We fix:
- Input window length (`IN_LEN`) and forecast horizon length (`OUT_LEN`)
- Train/val/test boundaries (time-based split)
- Station inclusion rule (coverage threshold)
- Output dataset artifact path (so every model uses the same processed dataset)
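One way to pin these settings in a single place is a frozen dataclass, sketched below. Every value here (window lengths of 12, a 70/10/20 split, a 0.95 coverage threshold, the artifact path) is a placeholder assumption rather than the setting used in this notebook, and the `Config` class with its lowercase field names is hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass(frozen=True)  # frozen: settings cannot be mutated mid-experiment
class Config:
    in_len: int = 12            # IN_LEN: input window length (assumed value)
    out_len: int = 12           # OUT_LEN: forecast horizon length (assumed value)
    train_frac: float = 0.7     # assumed share of the time axis used for training
    val_frac: float = 0.1       # assumed share used for validation; rest is test
    min_coverage: float = 0.95  # assumed station inclusion threshold
    dataset_path: Path = Path("data/processed/pems_dataset.npz")  # hypothetical path

    def split_indices(self, num_steps: int) -> tuple[int, int]:
        """Return (train_end, val_end) for a chronological split."""
        train_end = int(round(num_steps * self.train_frac))
        val_end = train_end + int(round(num_steps * self.val_frac))
        return train_end, val_end

cfg = Config()
train_end, val_end = cfg.split_indices(1000)
print(train_end, val_end)
```

Splitting by index along the time axis, rather than by random sampling, keeps every validation and test step strictly after the training data.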

Important:
GraphWaveNet expects a fixed node set and a continuous time axis without gaps,
so we build a dense matrix (timestamp × station) in which every retained station has a row for every time step.
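A minimal sketch of building such a matrix follows. It assumes the raw data is a long table with hypothetical columns `timestamp`, `station`, and `flow`, a 5-minute sampling interval, and a coverage threshold passed in by the caller; `build_station_matrix` is an illustrative helper, not this notebook's implementation. For simplicity, coverage here is measured over the whole time range.

```python
import pandas as pd

def build_station_matrix(df: pd.DataFrame, freq: str = "5min",
                         min_coverage: float = 0.9) -> pd.DataFrame:
    """Pivot long records to (timestamp x station) and drop sparse stations."""
    wide = df.pivot_table(index="timestamp", columns="station",
                          values="flow", aggfunc="mean")
    # Reindex onto a complete, regular time axis so gaps become explicit NaNs.
    full_index = pd.date_range(wide.index.min(), wide.index.max(), freq=freq)
    wide = wide.reindex(full_index)
    # Coverage = fraction of time steps with an observation, per station.
    coverage = wide.notna().mean(axis=0)
    keep = coverage[coverage >= min_coverage].index
    return wide[keep]

# Toy example: station A misses one of six steps, station B has only two.
times = pd.date_range("2024-01-01 00:00", periods=6, freq="5min")
rows = [(t, "A", 10.0 + i) for i, t in enumerate(times) if i != 3]
rows += [(times[0], "B", 5.0), (times[5], "B", 6.0)]
df = pd.DataFrame(rows, columns=["timestamp", "station", "flow"])

matrix = build_station_matrix(df, freq="5min", min_coverage=0.8)
print(matrix)
```

In this toy case station B falls below the threshold and is dropped, while the missing step for station A stays visible as a NaN that a later imputation step would need to handle.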
