# Irish Weather — Graph Neural Networks (GCN & GAT) on ERA5 (Dublin, Galway, Cork)

This notebook follows the **Graph Neural Networks** lecture and uses the same CSV as the previous notebooks:
`era5_ireland3_t2m_wind_2024.csv`

We build a small **3-node graph** for **Dublin (0), Galway (1), Cork (2)** and link neighbours:
- Edges: Dublin ↔ Galway, Galway ↔ Cork (plus self-loops).  
- Tasks (in increasing complexity):
  1. **Manual GCN step (NumPy)**: reproduce the slide's propagation rule with tiny numbers.
  2. **GCN (PyTorch Geometric)**: predict next-hour **temperature** for each city from summary features (last value, mean, std) of that city's **last 24 hours**.
  3. **GCN + simple temporal encoder**: add a tiny MLP encoder over the last 24 h sequence before the GCN.
  4. **GAT (Graph Attention)**: final example using **all variables** (t2m + wind for each city) with multi-head attention.

We keep code short, CPU-friendly, and heavily commented.

## 0) Setup

In [2]:
# If needed, install packages (uncomment appropriate lines).
# %pip install pandas numpy matplotlib scikit-learn torch
# PyTorch Geometric install depends on your torch + CUDA/CPU version.
# For CPU-only (example wheel index for torch 2.4.0; change the URL to match your installed torch version):
# %pip install --no-index torch-scatter torch-sparse torch-cluster torch-spline-conv -f https://data.pyg.org/whl/torch-2.4.0+cpu.html
# %pip install torch-geometric

import warnings, pathlib
warnings.filterwarnings("ignore")

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader
from sklearn.preprocessing import StandardScaler

try:
    from torch_geometric.data import Data
    from torch_geometric.nn import GCNConv, GATConv
    PYG_OK = True
except Exception as e:
    PYG_OK = False
    print("PyTorch Geometric not found. GCN/GAT sections will error if you run them. See install instructions above.")

CSV_PATH = "../../data/era5_ireland3_t2m_wind_2024.csv"
assert pathlib.Path(CSV_PATH).exists(), f"CSV not found at {CSV_PATH}; adjust CSV_PATH to where the file lives."
device = "cpu"
print("Device:", device)

Device: cpu


## 1) Load ERA5 and define graph

In [3]:
# Load
df = pd.read_csv(CSV_PATH, parse_dates=['time']).sort_values('time').reset_index(drop=True)
print(df.head(3))
print("Coverage:", df['time'].min(), '→', df['time'].max(), f"({len(df)} hourly rows)")

# Node order: Dublin(0), Galway(1), Cork(2)
CITY_COLS_T2M = ['Dublin_t2m_degC','Galway_t2m_degC','Cork_t2m_degC']
CITY_COLS_WS  = ['Dublin_wind_speed10m_ms','Galway_wind_speed10m_ms','Cork_wind_speed10m_ms']

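# Graph edges (undirected chain with self-loops): 0-1-2
# Note: read column by column, the two rows below list each of the 7 distinct edges
# (0→1, 1→0, 1→2, 2→1 and three self-loops) twice, which is why E = 14 rather than 7.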
edge_index = torch.tensor([
    [0,1, 1,0, 1,2, 2,1, 0,0, 1,1, 2,2],  # src
    [1,0, 2,1, 0,1, 1,2, 0,0, 1,1, 2,2],  # dst
], dtype=torch.long)  # shape (2, E)
print("edge_index shape:", edge_index.shape, "→ E=", edge_index.shape[1])

                 time  Dublin_t2m_degC  Galway_t2m_degC  Cork_t2m_degC  \
0 2024-01-01 00:00:00         5.763580         6.396393       5.617096   
1 2024-01-01 01:00:00         5.445953         5.922516       5.094391   
2 2024-01-01 02:00:00         5.434723         5.518707       4.604645   

   Dublin_wind_speed10m_ms  Galway_wind_speed10m_ms  Cork_wind_speed10m_ms  
0                 8.426119                 6.120335               4.927529  
1                 8.082199                 5.639297               4.454760  
2                 7.943061                 4.928046               3.922362  
Coverage: 2024-01-01 00:00:00 → 2024-12-31 23:00:00 (8784 hourly rows)
edge_index shape: torch.Size([2, 14]) → E= 14
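
As a quick sanity check on the edge list, the sketch below copies the two rows of `edge_index` into NumPy (so it runs without PyTorch), counts the distinct `(src, dst)` pairs, and rebuilds a dense adjacency matrix from them. The names `ei`, `pairs`, `counts`, and `A_dense` are illustrative and not part of the notebook.

```python
import numpy as np

# Same two rows as the edge_index tensor defined above (row 0 = src, row 1 = dst).
ei = np.array([
    [0, 1, 1, 0, 1, 2, 2, 1, 0, 0, 1, 1, 2, 2],
    [1, 0, 2, 1, 0, 1, 1, 2, 0, 0, 1, 1, 2, 2],
])

# Treat each column as one (src, dst) pair and count how often each pair occurs.
pairs, counts = np.unique(ei, axis=1, return_counts=True)
print("distinct edges:", pairs.shape[1])
print("occurrences per edge:", counts)

# Dense adjacency implied by the distinct pairs (1 where an edge exists).
A_dense = np.zeros((3, 3), dtype=int)
A_dense[pairs[0], pairs[1]] = 1
print(A_dense)
```

With these rows, every distinct edge is listed twice, and the distinct pairs form the chain adjacency plus self-loops, i.e. the `A + I` matrix used in the manual step of the next section.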


## 2) Manual GCN step (NumPy) — matches the slides

In [4]:
# Tiny 3-node example like on the slides.
A = np.array([[0,1,0],
              [1,0,1],
              [0,1,0]], dtype=float)
I = np.eye(3)
A_hat = A + I
D = np.diag(A_hat.sum(axis=1))
D_inv_sqrt = np.linalg.inv(np.sqrt(D))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # normalized adjacency

# One-dimensional features H0 and a scalar weight W for simplicity
H0 = np.array([[1.0],
               [2.0],
               [3.0]])
W = np.array([[1.5]])
H1 = A_norm @ H0 @ W
print("A_norm=\n", np.round(A_norm,3))
print("H1=\n", np.round(H1,3))

A_norm=
 [[0.5   0.408 0.   ]
 [0.408 0.333 0.408]
 [0.    0.408 0.5  ]]
H1=
 [[1.975]
 [3.449]
 [3.475]]
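
The matrix product above can also be read as message passing along edges, which is closer to how an `edge_index`-based layer works. The sketch below is a minimal NumPy version under that reading: each edge `j → i` carries `H0[j] @ W` scaled by `1 / sqrt(deg(i) * deg(j))`. The arrays `src`, `dst`, `coef`, and `H1_edges` are illustrative names, and the edge list here contains each directed edge exactly once.

```python
import numpy as np

# Chain 0-1-2 with self-loops, each directed edge listed once (src -> dst).
src = np.array([0, 1, 1, 2, 0, 1, 2])
dst = np.array([1, 0, 2, 1, 0, 1, 2])

H0 = np.array([[1.0], [2.0], [3.0]])
W = np.array([[1.5]])

# Degree of each node, counting its self-loop (same as the row sums of A + I).
deg = np.bincount(dst, minlength=3).astype(float)

# Symmetric normalisation coefficient for every edge.
coef = 1.0 / np.sqrt(deg[src] * deg[dst])

# Sum the scaled messages arriving at each destination node.
H1_edges = np.zeros((3, 1))
np.add.at(H1_edges, dst, coef[:, None] * (H0[src] @ W))
print(np.round(H1_edges, 3))

# Matrix form for comparison.
A_hat = np.array([[1, 1, 0], [1, 1, 1], [0, 1, 1]], dtype=float)
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
H1_matrix = D_inv_sqrt @ A_hat @ D_inv_sqrt @ H0 @ W
```

Both forms give the same `H1` as the cell above; the edge-wise version simply never materialises the dense matrix.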


## 3) Build time windows → per-window graph samples

We create **24 h → next-hour** windows. Each training example is a graph with:
- **x**: node features for the 3 cities (the simple baseline uses the *last value, mean, and standard deviation* of the last 24 h).
- **y**: next-hour **temperature** per node (3 values).

Later we’ll add a tiny **temporal encoder** and more variables (wind) and switch to **GAT**.
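
To make the shapes concrete, here is a small sketch of one window on toy data. It assumes a made-up 10-hour, 3-city array and a short window `T_demo = 4`; `window_features` is a hypothetical helper that mirrors the `[last, mean, std]` layout described above and is not the notebook's own function.

```python
import numpy as np

T_demo = 4  # short window so the numbers are easy to follow (the notebook uses 24)

# Toy "temperatures": 10 hours x 3 cities, values 0..29.
temps = np.arange(30, dtype=float).reshape(10, 3)

def window_features(values, t, T):
    """Return (x, y) for the window starting at row t.

    x has one row per city with columns [last, mean, std] over the T history rows;
    y is the value in the row right after the window (the next hour).
    """
    hist = values[t:t + T]                                   # (T, 3)
    x = np.stack([hist[-1], hist.mean(axis=0), hist.std(axis=0)], axis=1)  # (3, 3)
    y = values[t + T]                                        # (3,)
    return x, y

x0, y0 = window_features(temps, 0, T_demo)
print("x shape:", x0.shape, "y shape:", y0.shape)
print(x0)
print(y0)
```

Each row of `x0` belongs to one city (one graph node), and `y0` holds that node's next-hour target.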

In [5]:
T = 24  # hours of history
# Scale temps per city using train-only stats
n = len(df)
i_tr, i_va = int(0.70*n), int(0.85*n)
sc_t2m = StandardScaler().fit(df[CITY_COLS_T2M].iloc[:i_tr])

def make_graph_windows(frame: pd.DataFrame, with_wind=False):
    X_graphs, Y_graphs, times = [], [], []
    values_t2m = sc_t2m.transform(frame[CITY_COLS_T2M].values)  # (N,3) scaled
    values_ws  = frame[CITY_COLS_WS].values if with_wind else None

    for t in range(len(frame) - T - 1):
        # Basic summary features per node for simplicity (keep it very simple for students)
        hist_t2m = values_t2m[t:t+T]           # (T,3) scaled
        last_t2m = values_t2m[t+T-1]           # (3,)
        mean_t2m = hist_t2m.mean(axis=0)       # (3,)
        std_t2m  = hist_t2m.std(axis=0)        # (3,)

        # Start with [last, mean, std] as 3 features per node → (3 nodes, 3 feats)
        x_feats = np.stack([last_t2m, mean_t2m, std_t2m], axis=1)  # (3,3)

        # Optional: include wind (mean over window) as an extra feature column
        if with_wind and values_ws is not None:
            hist_ws = values_ws[t:t+T]         # (T,3) raw m/s
            mean_ws = hist_ws.mean(axis=0)     # (3,)
            x_feats = np.concatenate([x_feats, mean_ws[:,None]], axis=1)  # (3,4)

        # Target: next-hour temps (scaled) for each city
        y_next = values_t2m[t+T]               # (3,)
        X_graphs.append(torch.tensor(x_feats, dtype=torch.float32))    # x
        Y_graphs.append(torch.tensor(y_next,  dtype=torch.float32))    # y
        times.append(frame['time'].iloc[t+T])
    return X_graphs, Y_graphs, times

# Build splits
train_df = df.iloc[:i_tr].copy()
val_df   = df.iloc[i_tr:i_va].copy()
test_df  = df.iloc[i_va:].copy()

Xtr, ytr, _ = make_graph_windows(train_df, with_wind=False)
Xva, yva, _ = make_graph_windows(val_df,   with_wind=False)
Xte, yte, _ = make_graph_windows(test_df,  with_wind=False)

print(len(Xtr), len(Xva), len(Xte), "graph windows (train/val/test)")

6123 1293 1293 graph windows (train/val/test)


## 4) PyG dataset wrappers (3-node graph per window)

In [6]:
class GraphWindowDataset(Dataset):
    def __init__(self, X_list, y_list, edge_index):
        self.X = X_list
        self.y = y_list
        self.edge_index = edge_index
    def __len__(self): return len(self.X)
    def __getitem__(self, i):
        x = self.X[i]                 # (3, F)
        y = self.y[i].unsqueeze(1)    # (3, 1) — predict each node's next-hour temp
        data = Data(x=x, edge_index=self.edge_index)  # use the stored edges, not the global variable
        data.y = y
        return data

if PYG_OK:
    tr_ds = GraphWindowDataset(Xtr, ytr, edge_index)
    va_ds = GraphWindowDataset(Xva, yva, edge_index)
    te_ds = GraphWindowDataset(Xte, yte, edge_index)

    def pyg_collate(batch):
        # PyG can batch small graphs by stacking nodes and offsetting edge indices (its Batch class).
        # To keep this ultra-simple and transparent, we skip that and return the list of Data objects unchanged.
        return batch

    tr_dl = DataLoader(tr_ds, batch_size=1, shuffle=True, collate_fn=pyg_collate)
    va_dl = DataLoader(va_ds, batch_size=1, shuffle=False, collate_fn=pyg_collate)
    te_dl = DataLoader(te_ds, batch_size=1, shuffle=False, collate_fn=pyg_collate)

    print("PyG datasets ready (batch_size=1 for clarity)")

PyG datasets ready (batch_size=1 for clarity)
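
The notebook processes one graph at a time, but it helps to see what batching small graphs amounts to. The NumPy sketch below illustrates the idea of stacking node features and shifting each graph's edge indices by the number of nodes that precede it, so the batch becomes one large graph with disconnected components. `batch_graphs`, `EDGE_INDEX`, and the toy features are hypothetical; PyG's own loader handles this internally and may differ in detail.

```python
import numpy as np

# One 3-node chain with self-loops, each directed edge listed once.
EDGE_INDEX = np.array([
    [0, 1, 1, 2, 0, 1, 2],   # src
    [1, 0, 2, 1, 0, 1, 2],   # dst
])

def batch_graphs(x_list, edge_index, num_nodes):
    """Merge identical-topology graphs into one disconnected graph."""
    x = np.concatenate(x_list, axis=0)                        # (B*num_nodes, F)
    offsets = np.arange(len(x_list)) * num_nodes              # 0, 3, 6, ...
    ei = np.concatenate([edge_index + off for off in offsets], axis=1)
    batch = np.repeat(np.arange(len(x_list)), num_nodes)      # graph id of each node
    return x, ei, batch

# Two toy graphs with 2 features per node (graph 0 filled with 0s, graph 1 with 1s).
x_list = [np.full((3, 2), g, dtype=float) for g in range(2)]
x_b, ei_b, batch_b = batch_graphs(x_list, EDGE_INDEX, num_nodes=3)

print("x:", x_b.shape, "edge_index:", ei_b.shape)
print("batch vector:", batch_b)
```

Because no edge connects nodes from different graphs, a graph convolution over the merged graph gives the same per-node result as running each window separately.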


## 5) Example 1 — **GCN**: last/mean/std → next-hour temps (per node)

In [7]:
if not PYG_OK:
    raise SystemExit("Install PyTorch Geometric to run this section.")

in_feats = Xtr[0].shape[1]   # 3
hidden = 16

class SimpleGCN(nn.Module):
    def __init__(self, in_feats, hidden, out_dim=1):
        super().__init__()
        self.conv1 = GCNConv(in_feats, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.head  = nn.Linear(hidden, out_dim)  # per-node regressor
    def forward(self, data: Data):
        x, ei = data.x, data.edge_index
        x = torch.relu(self.conv1(x, ei))
        x = torch.relu(self.conv2(x, ei))
        out = self.head(x)  # (3,1)
        return out

torch.manual_seed(0)
model_gcn = SimpleGCN(in_feats, hidden).to(device)
opt = torch.optim.Adam(model_gcn.parameters(), lr=3e-3)
loss_fn = nn.MSELoss()

def loop_epoch(model, loader, train=True):
    model.train(train)
    tot, n = 0.0, 0
    for batch in loader:
        # We used batch_size=1; batch is [Data]
        data = batch[0]
        if train:
            opt.zero_grad()
        pred = model(data)
        loss = loss_fn(pred, data.y)
        if train:
            loss.backward()
            opt.step()
        tot += loss.item()
        n += 1
    return tot / max(n,1)

for ep in range(10):
    tr = loop_epoch(model_gcn, tr_dl, train=True)
    va = loop_epoch(model_gcn, va_dl, train=False)
    print(f"Epoch {ep+1:02d} | train MSE={tr:.4f} | val MSE={va:.4f}")

# Report RMSE (°C) on test
model_gcn.eval()
preds, trues = [], []
with torch.no_grad():
    for batch in te_dl:
        data = batch[0]
        p = model_gcn(data).squeeze(1).cpu().numpy()  # (3,)
        t = data.y.squeeze(1).cpu().numpy()           # (3,)
        preds.append(p); trues.append(t)
preds = np.vstack(preds); trues = np.vstack(trues)      # (#samples, 3)

# Inverse-transform back to °C.
# sc_t2m stores one mean/std per column (city), so a (#samples, 3) array is un-scaled city by city.
inv_pred = sc_t2m.inverse_transform(preds)
inv_true = sc_t2m.inverse_transform(trues)
rmse_each = np.sqrt(((inv_pred - inv_true)**2).mean(axis=0))
print({k: float(v) for k,v in zip(['Dublin','Galway','Cork'], rmse_each)})

Epoch 01 | train MSE=0.0545 | val MSE=0.0372
Epoch 02 | train MSE=0.0428 | val MSE=0.0333
Epoch 03 | train MSE=0.0415 | val MSE=0.0371
Epoch 04 | train MSE=0.0411 | val MSE=0.0336
Epoch 05 | train MSE=0.0412 | val MSE=0.0329
Epoch 06 | train MSE=0.0406 | val MSE=0.0346
Epoch 07 | train MSE=0.0405 | val MSE=0.0354
Epoch 08 | train MSE=0.0398 | val MSE=0.0336
Epoch 09 | train MSE=0.0396 | val MSE=0.0336
Epoch 10 | train MSE=0.0397 | val MSE=0.0338
{'Dublin': 0.8208643198013306, 'Galway': 0.9079791307449341, 'Cork': 0.9440993070602417}
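
The conversion back to °C relies on standardisation being a per-column affine map, so an error in scaled units is simply stretched by that city's standard deviation. The sketch below checks this with NumPy alone. The means, standard deviations, and synthetic predictions are made-up placeholders, not values from the ERA5 data, and `inverse` is a stand-in for what `StandardScaler.inverse_transform` does (`z * scale + mean`).

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder per-city statistics (°C); the real ones live in the fitted scaler.
mean = np.array([10.0, 10.5, 10.8])
std = np.array([4.0, 3.5, 3.8])

# Synthetic scaled targets and slightly noisy "predictions", shape (#samples, 3).
true_scaled = rng.normal(size=(100, 3))
pred_scaled = true_scaled + 0.2 * rng.normal(size=(100, 3))

def inverse(z):
    """Undo standardisation column by column."""
    return z * std + mean

rmse_scaled = np.sqrt(((pred_scaled - true_scaled) ** 2).mean(axis=0))
rmse_degC = np.sqrt(((inverse(pred_scaled) - inverse(true_scaled)) ** 2).mean(axis=0))

print("RMSE (scaled):", np.round(rmse_scaled, 3))
print("RMSE (°C):    ", np.round(rmse_degC, 3))
```

The mean cancels in the difference, so `rmse_degC` equals `rmse_scaled * std` for each city. This also means the MSE values printed during training are in scaled units and are not directly comparable to the °C figures.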


## 6) Example 2 — **Temporal encoder + GCN**

Instead of the hand-crafted [last/mean/std] features, a tiny shared **MLP** encodes each city's **last 24 h temperature sequence** into a feature vector per node, and a **GCN** is then applied on top. (A 1D Conv encoder would slot into the same place; the MLP is used here because it is easier to read.) This mirrors the slides’ idea of combining temporal and spatial learning in a simple way.
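
The encoder in the cell below applies the same small network to each city's 24-value sequence, one city at a time. As a sketch of why that loop is equivalent to a single batched call, the NumPy example here uses random stand-in weights (`W1`, `b1`, `W2`, `b2` are illustrative, not trained parameters) and compares the per-city loop with applying the network to the transposed `(3, T)` matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
T, hidden, out_feats = 24, 32, 8

# Random stand-ins for the two linear layers of a small MLP.
W1 = rng.normal(scale=0.1, size=(T, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(scale=0.1, size=(hidden, out_feats))
b2 = np.zeros(out_feats)

def mlp(v):
    """Linear -> ReLU -> Linear, applied to the last axis of v."""
    return np.maximum(v @ W1 + b1, 0.0) @ W2 + b2

seq = rng.normal(size=(T, 3))  # one window: T hours x 3 cities

# Per-city loop: each column is one city's sequence.
x_loop = np.stack([mlp(seq[:, c]) for c in range(3)], axis=0)   # (3, out_feats)

# Batched: transpose so each row is a city, then apply the MLP once.
x_vec = mlp(seq.T)                                              # (3, out_feats)

print(x_loop.shape, np.allclose(x_loop, x_vec))
```

Sharing the weights across cities keeps the parameter count independent of the number of nodes, which matters once more stations are added.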

In [8]:
# Build raw (T,3) sequence windows; the temporal encoder inside the model turns them into node features.
# For teaching clarity, we keep it short and CPU-friendly.
T = 24

def make_seq_tensors(frame):
    base = sc_t2m.transform(frame[CITY_COLS_T2M].values)  # (N,3)
    Xseq, Y, times = [], [], []
    for t in range(len(frame) - T - 1):
        # sequence per city: (T, 3) → we will encode per node
        seq = base[t:t+T]          # (T,3)
        target = base[t+T]         # (3,)
        Xseq.append(torch.tensor(seq, dtype=torch.float32))  # store raw (T,3)
        Y.append(torch.tensor(target, dtype=torch.float32))
        times.append(frame['time'].iloc[t+T])
    return Xseq, Y, times

Xtr_seq, ytr_seq, _ = make_seq_tensors(train_df)
Xva_seq, yva_seq, _ = make_seq_tensors(val_df)
Xte_seq, yte_seq, _ = make_seq_tensors(test_df)

class SeqEncGCNDataset(Dataset):
    def __init__(self, Xseq, y, edge_index):
        self.Xseq = Xseq
        self.y = y
        self.edge_index = edge_index
    def __len__(self): return len(self.Xseq)
    def __getitem__(self, i):
        # Encode sequences per node inside the model (clear & didactic)
        data = Data(edge_index=self.edge_index)  # use the stored edges, not the global variable
        data.seq = self.Xseq[i]           # (T,3)
        data.y   = self.y[i].unsqueeze(1) # (3,1)
        return data

if PYG_OK:
    tr2_ds = SeqEncGCNDataset(Xtr_seq, ytr_seq, edge_index)
    va2_ds = SeqEncGCNDataset(Xva_seq, yva_seq, edge_index)
    te2_ds = SeqEncGCNDataset(Xte_seq, yte_seq, edge_index)
    tr2_dl = DataLoader(tr2_ds, batch_size=1, shuffle=True, collate_fn=lambda b: b)
    va2_dl = DataLoader(va2_ds, batch_size=1, shuffle=False, collate_fn=lambda b: b)
    te2_dl = DataLoader(te2_ds, batch_size=1, shuffle=False, collate_fn=lambda b: b)

class TinySeqEncoder(nn.Module):
    """Encode a (T,3) temperature sequence into (3,F) node features with one shared MLP.
    Each city's length-T sequence goes through the same MLP; we loop over cities for clarity.
    """
    def __init__(self, T=24, out_feats=8):
        super().__init__()
        # A simple MLP over the T-length vector per city (clearer than conv for slides)
        self.mlp = nn.Sequential(
            nn.Linear(T, 32), nn.ReLU(),
            nn.Linear(32, out_feats)
        )
    def forward(self, seq):  # seq: (T,3) scaled
        # Split by city: shape -> list of (T,)
        outs = []
        for c in range(3):
            v = seq[:, c]                 # (T,)
            z = self.mlp(v)               # (out_feats,)
            outs.append(z)
        x = torch.stack(outs, dim=0)      # (3, out_feats)
        return x

class SeqEncGCN(nn.Module):
    def __init__(self, enc_feats=8, hidden=16, out_dim=1):
        super().__init__()
        self.enc = TinySeqEncoder(T=24, out_feats=enc_feats)
        self.gcn1 = GCNConv(enc_feats, hidden)
        self.gcn2 = GCNConv(hidden, hidden)
        self.head = nn.Linear(hidden, out_dim)
    def forward(self, data: Data):
        # encode per-node features from sequences
        x = self.enc(data.seq)                 # (3, F)
        x = torch.relu(self.gcn1(x, data.edge_index))
        x = torch.relu(self.gcn2(x, data.edge_index))
        return self.head(x)                    # (3,1)

if PYG_OK:
    torch.manual_seed(0)
    model_seqgcn = SeqEncGCN().to(device)
    opt2 = torch.optim.Adam(model_seqgcn.parameters(), lr=3e-3)
    loss_fn = nn.MSELoss()

    def run_epoch(model, loader, train=True):
        model.train(train)
        tot = 0.0; n=0
        for batch in loader:
            data = batch[0]
            if train: opt2.zero_grad()
            pred = model(data)
            loss = loss_fn(pred, data.y)
            if train: loss.backward(); opt2.step()
            tot += loss.item(); n += 1
        return tot/max(n,1)

    for ep in range(10):
        tr = run_epoch(model_seqgcn, tr2_dl, train=True)
        va = run_epoch(model_seqgcn, va2_dl, train=False)
        print(f"Epoch {ep+1:02d} | train MSE={tr:.4f} | val MSE={va:.4f}")

    # Test (°C)
    model_seqgcn.eval()
    P, Tt = [], []
    with torch.no_grad():
        for batch in te2_dl:
            data = batch[0]
            p = model_seqgcn(data).squeeze(1).cpu().numpy()
            t = data.y.squeeze(1).cpu().numpy()
            P.append(p); Tt.append(t)
    P = np.vstack(P); Tt = np.vstack(Tt)
    invP = sc_t2m.inverse_transform(P); invT = sc_t2m.inverse_transform(Tt)
    rmse_each = np.sqrt(((invP - invT)**2).mean(axis=0))
    print("SeqEnc+GCN RMSE (°C):", {k: float(v) for k,v in zip(['Dublin','Galway','Cork'], rmse_each)})

Epoch 01 | train MSE=0.0701 | val MSE=0.0274
Epoch 02 | train MSE=0.0430 | val MSE=0.0286
Epoch 03 | train MSE=0.0391 | val MSE=0.0283
Epoch 04 | train MSE=0.0398 | val MSE=0.0251
Epoch 05 | train MSE=0.0361 | val MSE=0.0255
Epoch 06 | train MSE=0.0344 | val MSE=0.0331
Epoch 07 | train MSE=0.0337 | val MSE=0.0258
Epoch 08 | train MSE=0.0321 | val MSE=0.0269
Epoch 09 | train MSE=0.0324 | val MSE=0.0308
Epoch 10 | train MSE=0.0322 | val MSE=0.0343
SeqEnc+GCN RMSE (°C): {'Dublin': 1.055409550666809, 'Galway': 1.0983647108078003, 'Cork': 1.3047443628311157}


## 7) Final example — **GAT (Graph Attention)** with **all variables**

Features per node include **temperature and wind** over the last 24 h (simple MLP encoder).  
We use a **GATConv** with multiple heads to let the model learn which neighbours matter more at each step.
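
As a rough picture of what one attention head computes, the NumPy sketch below evaluates attention coefficients on the 3-node chain using the standard GAT formulation: a score for each edge from a LeakyReLU over a learned vector applied to the concatenated transformed features of a node and its neighbour, followed by a softmax over each node's neighbours. Random weights stand in for learned ones; `W`, `a`, and the feature sizes are illustrative and unrelated to the model trained below.

```python
import numpy as np

rng = np.random.default_rng(0)

# Neighbourhood mask for the chain 0-1-2 with self-loops (row i = neighbours of node i).
A_hat = np.array([[1, 1, 0],
                  [1, 1, 1],
                  [0, 1, 1]], dtype=bool)

F_in, F_out = 4, 5
h = rng.normal(size=(3, F_in))        # input node features
W = rng.normal(size=(F_in, F_out))    # shared linear transform (stand-in for learned weights)
a = rng.normal(size=(2 * F_out,))     # attention vector (stand-in for learned weights)

z = h @ W                             # (3, F_out)

# Raw score e[i, j] = LeakyReLU(a . [z_i || z_j]), split into the two halves of a.
e = (z @ a[:F_out])[:, None] + (z @ a[F_out:])[None, :]
e = np.where(e > 0, e, 0.2 * e)       # LeakyReLU with negative slope 0.2

# Mask non-neighbours, then softmax over each row.
e = np.where(A_hat, e, -np.inf)
alpha = np.exp(e - e.max(axis=1, keepdims=True))
alpha /= alpha.sum(axis=1, keepdims=True)

h_new = alpha @ z                     # attention-weighted neighbour average
print(np.round(alpha, 3))
```

Unlike the fixed `1 / sqrt(deg_i * deg_j)` weights of a GCN, each row of `alpha` depends on the node features, so the weighting between (say) Galway's two neighbours can change from window to window. With several heads, this computation is repeated with separate `W` and `a` per head.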

In [9]:
# Build sequence windows that include both t2m (scaled) and wind (unscaled) per city.
# We'll encode each city's 2-variable sequence with a tiny MLP (flatten over time for simplicity).
T = 24
sc_all = StandardScaler().fit(df[CITY_COLS_T2M].iloc[:i_tr])  # temperature scaler only (same fit as sc_t2m); winds left raw to keep it simple

def make_multi_seq(frame):
    t2m = sc_all.transform(frame[CITY_COLS_T2M].values)   # (N,3) scaled
    ws  = frame[CITY_COLS_WS].values                      # (N,3) raw m/s
    Xseq, Y = [], []
    for t in range(len(frame)-T-1):
        # build per-city sequence with 2 vars: (T, 2) per city
        seq_city = []
        for c in range(3):
            seq = np.stack([t2m[t:t+T, c], ws[t:t+T, c]], axis=1)  # (T,2)
            seq_city.append(seq.reshape(-1))                       # flatten to length 2T
        x = np.stack(seq_city, axis=0)   # (3, 2T)
        y = t2m[t+T]                     # next-hour target temps (scaled)
        Xseq.append(torch.tensor(x, dtype=torch.float32))
        Y.append(torch.tensor(y, dtype=torch.float32))
    return Xseq, Y

Xtr_m, ytr_m = make_multi_seq(train_df)
Xva_m, yva_m = make_multi_seq(val_df)
Xte_m, yte_m = make_multi_seq(test_df)

class MultiEncGATDataset(Dataset):
    def __init__(self, X, y, edge_index):
        self.X = X; self.y = y; self.edge_index = edge_index
    def __len__(self): return len(self.X)
    def __getitem__(self, i):
        d = Data(edge_index=self.edge_index)  # use the stored edges, not the global variable
        d.x = self.X[i]                 # (3, 2T)
        d.y = self.y[i].unsqueeze(1)    # (3,1)
        return d

if PYG_OK:
    tr3_ds = MultiEncGATDataset(Xtr_m, ytr_m, edge_index)
    va3_ds = MultiEncGATDataset(Xva_m, yva_m, edge_index)
    te3_ds = MultiEncGATDataset(Xte_m, yte_m, edge_index)
    tr3_dl = DataLoader(tr3_ds, batch_size=1, shuffle=True,  collate_fn=lambda b: b)
    va3_dl = DataLoader(va3_ds, batch_size=1, shuffle=False, collate_fn=lambda b: b)
    te3_dl = DataLoader(te3_ds, batch_size=1, shuffle=False, collate_fn=lambda b: b)

class GATAllVars(nn.Module):
    def __init__(self, in_feats, hidden=16, heads=2, out_dim=1):
        super().__init__()
        self.enc = nn.Sequential(             # simple MLP per node to compress 2T → hidden
            nn.Linear(in_feats, 64), nn.ReLU(),
            nn.Linear(64, hidden)
        )
        self.gat1 = GATConv(hidden, hidden, heads=heads, concat=True, dropout=0.0)
        self.gat2 = GATConv(hidden*heads, hidden, heads=1, concat=False, dropout=0.0)
        self.head = nn.Linear(hidden, out_dim)
    def forward(self, data: Data):
        x = self.enc(data.x)                          # (3, hidden)
        x = torch.relu(self.gat1(x, data.edge_index))
        x = torch.relu(self.gat2(x, data.edge_index))
        return self.head(x)                           # (3,1)

if PYG_OK:
    torch.manual_seed(0)
    model_gat = GATAllVars(in_feats=2*T).to(device)
    opt3 = torch.optim.Adam(model_gat.parameters(), lr=3e-3)
    loss_fn = nn.MSELoss()

    def run_gat_epoch(model, loader, train=True):
        model.train(train); tot=0.0; n=0
        for batch in loader:
            data = batch[0]
            if train: opt3.zero_grad()
            pred = model(data)
            loss = loss_fn(pred, data.y)
            if train: loss.backward(); opt3.step()
            tot += loss.item(); n += 1
        return tot/max(n,1)

    for ep in range(12):
        tr = run_gat_epoch(model_gat, tr3_dl, train=True)
        va = run_gat_epoch(model_gat, va3_dl, train=False)
        print(f"Epoch {ep+1:02d} | train MSE={tr:.4f} | val MSE={va:.4f}")

    # Test (°C)
    model_gat.eval()
    P, Tt = [], []
    with torch.no_grad():
        for batch in te3_dl:
            data = batch[0]
            p = model_gat(data).squeeze(1).cpu().numpy()
            t = data.y.squeeze(1).cpu().numpy()
            P.append(p); Tt.append(t)
    P = np.vstack(P); Tt = np.vstack(Tt)
    invP = sc_all.inverse_transform(P); invT = sc_all.inverse_transform(Tt)
    rmse_each = np.sqrt(((invP - invT)**2).mean(axis=0))
    print("GAT (all vars) RMSE (°C):", {k: float(v) for k,v in zip(['Dublin','Galway','Cork'], rmse_each)})

Epoch 01 | train MSE=0.0910 | val MSE=0.0408
Epoch 02 | train MSE=0.0478 | val MSE=0.0295
Epoch 03 | train MSE=0.0412 | val MSE=0.0266
Epoch 04 | train MSE=0.0382 | val MSE=0.0292
Epoch 05 | train MSE=0.0355 | val MSE=0.0264
Epoch 06 | train MSE=0.0341 | val MSE=0.0259
Epoch 07 | train MSE=0.0325 | val MSE=0.0232
Epoch 08 | train MSE=0.0323 | val MSE=0.0242
Epoch 09 | train MSE=0.0298 | val MSE=0.0230
Epoch 10 | train MSE=0.0300 | val MSE=0.0207
Epoch 11 | train MSE=0.0296 | val MSE=0.0224
Epoch 12 | train MSE=0.0287 | val MSE=0.0284
GAT (all vars) RMSE (°C): {'Dublin': 0.7425894737243652, 'Galway': 0.9258968234062195, 'Cork': 0.8738669157028198}


## 8) Notes & extensions

- Try **fully connected** edges (add Dublin↔Cork) and compare.
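- Change the window length `T` (e.g. 12 or 48 hours) and check whether GAT benefits more from longer context; note that `SeqEncGCN` hardcodes `T=24` in its encoder, so update that too.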
- Swap the simple MLP temporal encoder for a **1D CNN** or **small LSTM** per node.
- Add more stations as new nodes; update `edge_index` via distance-based k-NN.
- Predict **multi-step** outputs (e.g., next 6 hours) by changing `y` and the head.
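
For the edge-related extensions, one possible starting point is a distance-based k-nearest-neighbour edge list. The sketch below is a NumPy-only illustration: the coordinates are approximate city locations typed in by hand (an assumption, not read from the CSV), and `haversine_km` and `knn_edge_index` are hypothetical helpers. Edges point from each neighbour (src) to the node it was selected for (dst).

```python
import numpy as np

# Approximate (lat, lon) in degrees; illustrative values, not taken from the CSV.
COORDS = np.array([
    [53.35, -6.26],  # Dublin (node 0)
    [53.27, -9.05],  # Galway (node 1)
    [51.90, -8.47],  # Cork   (node 2)
])

def haversine_km(coords):
    """Pairwise great-circle distances (km) for an (N, 2) array of lat/lon degrees."""
    lat = np.radians(coords[:, 0])[:, None]
    lon = np.radians(coords[:, 1])[:, None]
    dlat = lat - lat.T
    dlon = lon - lon.T
    h = np.sin(dlat / 2) ** 2 + np.cos(lat) * np.cos(lat.T) * np.sin(dlon / 2) ** 2
    return 2 * 6371.0 * np.arcsin(np.sqrt(h))

def knn_edge_index(coords, k, self_loops=True):
    """Return a (2, E) array with an edge neighbour -> node for each node's k nearest nodes."""
    dist = haversine_km(coords)
    n = len(coords)
    np.fill_diagonal(dist, np.inf)                 # a node is not its own neighbour
    nbrs = np.argsort(dist, axis=1)[:, :k]         # (N, k) indices of nearest nodes
    dst = np.repeat(np.arange(n), k)
    src = nbrs.reshape(-1)
    if self_loops:
        src = np.concatenate([src, np.arange(n)])
        dst = np.concatenate([dst, np.arange(n)])
    return np.stack([src, dst])

ei_k1 = knn_edge_index(COORDS, k=1)
ei_k2 = knn_edge_index(COORDS, k=2)
print("k=1:", ei_k1.shape, " k=2:", ei_k2.shape)
```

On three nodes, `k=2` links every pair, which is the fully connected variant from the first bullet. The result can be turned into a tensor with `torch.as_tensor(ei_k2, dtype=torch.long)` and used in place of `edge_index`; a k-NN list is generally not symmetric, so add the reversed edges if an undirected graph is wanted.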
