# METHOD 2: GNN + LSTM (Recommended)

## OBJECTIVE: Build a Graph Neural Network + LSTM hybrid model for spatiotemporal flood forecasting

### TASK DESCRIPTION:
Implement a state-of-the-art deep learning model that combines:
1. Graph Neural Networks (GNN) for spatial structure
2. LSTM/GRU for temporal dynamics
3. Separate encoders for 1D and 2D nodes
4. 1D-2D coupling mechanism
5. Multi-scale rainfall feature engineering (from research paper)

### ARCHITECTURE SPECIFICATION:

1. FEATURE ENGINEERING (Based on Shenzhen Paper):
   Multi-Temporal Rainfall Aggregation:
   - Rolling sum: 15min, 30min, 1h, 3h, 6h
   - Rolling max: 1h, 3h
   - Cumulative rainfall from event start
   - Rainfall rate (first derivative)
   
   Static features:
   - Node embeddings (learned)
   - Elevation, area, roughness (normalized)
   - Storage capacity (engineered)
   
   Temporal features:
   - Water level lags: t-1, t-2, t-3
   - Velocity (if available)
   - Time since event start
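
The multi-temporal rainfall aggregation listed above can be sketched with NumPy as follows. This is a minimal illustration, not the paper's implementation: the function names and the `dt_minutes` time step are assumptions, and the "rainfall rate" is approximated by a first difference. It yields nine rainfall columns per time step.

```python
import numpy as np


def rolling_sum(x, window):
    """Trailing sum over the last `window` samples (shorter at the start)."""
    csum = np.concatenate([[0.0], np.cumsum(x)])
    idx = np.arange(1, len(x) + 1)
    return csum[idx] - csum[np.maximum(idx - window, 0)]


def rolling_max(x, window):
    """Trailing max over the last `window` samples (shorter at the start)."""
    return np.array([x[max(0, i - window + 1): i + 1].max() for i in range(len(x))])


def rainfall_features(rain, dt_minutes=5):
    """Stack the multi-temporal rainfall features into an [n_steps, 9] array.

    rain: rainfall depth per time step for one event, shape [n_steps].
    dt_minutes: assumed time-step length; set it to the dataset's real step.
    """
    rain = np.asarray(rain, dtype=float)

    def steps(minutes):
        # Convert a window length in minutes to a number of samples.
        return max(1, minutes // dt_minutes)

    cols = [rolling_sum(rain, steps(m)) for m in (15, 30, 60, 180, 360)]
    cols += [rolling_max(rain, steps(m)) for m in (60, 180)]
    cols.append(np.cumsum(rain))                 # cumulative since event start
    cols.append(np.diff(rain, prepend=rain[0]))  # first difference as rate
    return np.stack(cols, axis=1)
```

Only trailing windows are used, so no feature at step t depends on rainfall after t.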

2. MODEL ARCHITECTURE:

   a) Separate 1D and 2D Encoders:
   
   1D Encoder (High Capacity):
   - Input: [17 nodes × features]
   - Graph Attention Network (GAT): 4 layers, 8 heads, 512 hidden
   - Node features → Spatial embedding
   
   2D Encoder (Efficient):
   - Input: [3716 nodes × features]
   - Graph Convolutional Network (GCN): 4 layers, 256 hidden
   - OR ChebConv for large graphs
   - Node features → Spatial embedding
   
   b) Coupling Layer:
   - Use 1d2d_connections.csv to create bipartite edges
   - Message passing from 1D to 2D: aggregation of 1D embeddings to connected 2D nodes
   - Message passing from 2D to 1D: aggregation of 2D embeddings to connected 1D nodes
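
One simple way to realize the two message-passing directions is mean aggregation over the bipartite links, sketched below in plain PyTorch. The helper `bipartite_mean` is a hypothetical name, and mean aggregation is an assumed choice (sum or attention would fit the description equally well). The column names of 1d2d_connections.csv are not given here, so the toy link pairs are written out by hand.

```python
import torch


def bipartite_mean(src_feats, edges, n_dst):
    """Average source features over each destination node's incoming links.

    src_feats: [n_src, d]; edges: [2, n_links] with row 0 = source index and
    row 1 = destination index. Destinations without links receive zeros.
    """
    src, dst = edges
    summed = src_feats.new_zeros(n_dst, src_feats.size(1))
    summed.index_add_(0, dst, src_feats[src])
    counts = src_feats.new_zeros(n_dst)
    counts.index_add_(0, dst, src_feats.new_ones(dst.numel()))
    return summed / counts.clamp(min=1).unsqueeze(1)


# Toy example: 2 one-dimensional nodes, 3 two-dimensional cells.
links = torch.tensor([[0, 0, 1],    # 1D node index
                      [0, 1, 2]])   # 2D node index
h_1d = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
h_2d = torch.tensor([[10.0], [20.0], [30.0]])

msg_to_2d = bipartite_mean(h_1d, links, n_dst=3)          # 1D -> 2D
msg_to_1d = bipartite_mean(h_2d, links.flip(0), n_dst=2)  # 2D -> 1D
```

Flipping the two rows of the link tensor reverses the direction, so one edge list serves both passes.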
   
   c) Temporal Module:
   - Input: Concatenate [spatial_embedding, rainfall_features, time_features]
   - LSTM: 3 layers, 512 hidden (for 1D), 256 hidden (for 2D)
   - Bidirectional: False (causal prediction only)
   - Dropout: 0.3
   
   d) Decoder:
   - 1D Decoder: MLP [512 → 256 → 128 → 1]
   - 2D Decoder: MLP [256 → 128 → 1]
   - Output: Predicted water level at t+1

3. TRAINING CONFIGURATION:

   Loss Function:
   ```python
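   import torch

   def standardized_rmse_loss(pred, target, model_id, node_types):
       # node_types: boolean tensor over nodes, True = 1D node, False = 2D node
       # (assumed encoding); pred and target have the node axis first.
       std_1d = 16.88 if model_id == 1 else 3.19
       std_2d = 14.38 if model_id == 1 else 2.73

       is_1d = node_types.bool()
       rmse_1d = torch.sqrt(torch.mean((pred[is_1d] - target[is_1d]) ** 2)) / std_1d
       rmse_2d = torch.sqrt(torch.mean((pred[~is_1d] - target[~is_1d]) ** 2)) / std_2d

       return (rmse_1d + rmse_2d) / 2  # Equal weighting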
   ```
   
   Multi-Step Loss (for long sequences):
   ```python
   loss_total = 0
   for k in [1, 2, 3, 5, 10]:  # Predict 1, 2, 3, 5, 10 steps ahead
       pred_k = model.predict_k_ahead(x, k)
       loss_k = standardized_rmse_loss(pred_k, y[k], model_id, node_types)
       loss_total += (1/k) * loss_k  # Weight closer predictions more
   ```
   
   Optimizer:
   - AdamW with weight decay 1e-4
   - Learning rate: 1e-3 for 1D encoder, 5e-4 for 2D encoder
   - Cosine annealing schedule with warm restarts
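
The optimizer settings above map onto PyTorch parameter groups roughly as sketched here. The two `nn.Linear` modules are stand-ins for the real encoders, and `T_0`, `T_mult`, and `eta_min` are assumed values, since the restart period is not specified. Learning rates for the LSTM and decoder parameters are also not specified, so they are left out of this sketch.

```python
import torch
import torch.nn as nn

# Stand-ins for the two encoders; the real modules would be the GAT and GCN stacks.
encoder_1d = nn.Linear(8, 8)
encoder_2d = nn.Linear(8, 8)

# One parameter group per encoder, each with its own learning rate.
optimizer = torch.optim.AdamW(
    [
        {"params": encoder_1d.parameters(), "lr": 1e-3},
        {"params": encoder_2d.parameters(), "lr": 5e-4},
    ],
    weight_decay=1e-4,
)

# T_0 is the number of epochs until the first restart (assumed value);
# T_mult=2 doubles the cycle length after every restart.
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer, T_0=10, T_mult=2, eta_min=1e-6
)

for epoch in range(3):
    # ... forward pass and loss.backward() would go here ...
    optimizer.step()
    scheduler.step()  # one scheduler step per epoch
```

The scheduler scales each group from its own base rate, so the 1D encoder keeps a higher learning rate than the 2D encoder throughout every cycle.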
   
   Regularization:
   - Dropout: 0.3-0.4
   - Gradient clipping: max_norm=1.0
   - Early stopping: patience=15 epochs
   - Weight decay: 1e-4 (decoupled, applied through the AdamW setting above)
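
Gradient clipping and early stopping from the list above could look like the following sketch. `EarlyStopping` is a hypothetical helper written for illustration (PyTorch has no built-in equivalent), and `min_delta` is an assumed extra tolerance. The demo uses `patience=2` only to keep the example short.

```python
import torch
import torch.nn as nn


class EarlyStopping:
    """Stop when the validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=15, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta  # assumed tolerance; not part of the spec
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


model = nn.Linear(4, 1)  # stand-in model
loss = model(torch.ones(2, 4)).pow(2).mean() * 1e3
loss.backward()
# Rescale all gradients so that their combined L2 norm is at most 1.0.
total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

stopper = EarlyStopping(patience=2)
history = [stopper.step(v) for v in (1.0, 0.9, 0.95, 0.92)]
```

Clipping is applied between `loss.backward()` and `optimizer.step()`, and the stopper is queried once per epoch with the validation loss.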

4. AUTOREGRESSIVE FORECASTING:

   Teacher Forcing Schedule:
   - Epochs 1-20: 90% teacher forcing (use ground truth)
   - Epochs 21-40: 70% teacher forcing
   - Epochs 41-60: 50% teacher forcing
   - Epochs 61+: 30% teacher forcing
   
   Scheduled Sampling:
   ```python
   # epoch is counted from 0 here, so epoch < 20 covers epochs 1-20 above
   if epoch < 20:
       teacher_forcing_ratio = 0.9
   elif epoch < 40:
       teacher_forcing_ratio = 0.7
   elif epoch < 60:
       teacher_forcing_ratio = 0.5
   else:
       teacher_forcing_ratio = 0.3
   
   if random.random() < teacher_forcing_ratio:
       input_t = ground_truth[t]
   else:
       input_t = prediction[t-1]
   ```
   
   Noise Injection (reduce error accumulation):
   ```python
   if training:
       prediction = prediction + torch.randn_like(prediction) * 0.01 * std
   ```

5. DATA PIPELINE:

   Dataset Structure:
   - Train: Events 1-54 (80%)
   - Validation: Events 55-68 (20%)
   - Stratify by event clusters from Notebook 05
   
   Batching:
   - Batch size: 16 sequences (or full events)
   - Sequence length: Variable (94-205), use padding + masking
   - Data augmentation: Add Gaussian noise to inputs (σ=0.01)
   
   Normalization:
   - Water levels: Standardize per model (use computed μ, σ)
   - Rainfall: MinMax to [0, 1]
   - Static features: StandardScaler
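
Padding with masking, and the two scalings named above, can be sketched as follows. All function names are hypothetical. The statistics passed to `standardize` and `minmax` are assumed to be computed on the training events only, so that nothing leaks from the validation set.

```python
import torch
from torch.nn.utils.rnn import pad_sequence


def collate_events(sequences):
    """Pad variable-length sequences to a common length and build a validity mask.

    sequences: list of tensors shaped [seq_len_i, n_features].
    Returns (padded [batch, max_len, n_features], mask [batch, max_len]).
    """
    lengths = torch.tensor([s.size(0) for s in sequences])
    padded = pad_sequence(sequences, batch_first=True)  # pads with zeros
    mask = torch.arange(padded.size(1))[None, :] < lengths[:, None]
    return padded, mask


def masked_mse(pred, target, mask):
    """Mean squared error over valid (unpadded) time steps only."""
    sq_err = (pred - target).pow(2).mean(dim=-1)
    return (sq_err * mask).sum() / mask.sum()


def standardize(x, mu, sigma):
    """Water levels: subtract the per-model mean, divide by the per-model std."""
    return (x - mu) / sigma


def minmax(x, lo, hi):
    """Rainfall: rescale to [0, 1] using bounds taken from the training events."""
    return (x - lo) / (hi - lo)


# Two events at the extremes of the stated length range (94 and 205 steps).
events = [torch.ones(94, 3), torch.ones(205, 3)]
batch, mask = collate_events(events)
```

The mask keeps padded steps out of the loss, so short events are not rewarded for predicting the padding value.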

6. CROSS-VALIDATION:

   5-Fold Event-Based CV:
   - Ensure each fold has events from all 4 clusters
   - Don't split individual events
   - Validate on full sequences (not truncated)
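
A minimal sketch of event-based, cluster-stratified folds is shown below. The function name and the `event_clusters` mapping are hypothetical, and the toy labels merely stand in for the real cluster assignments. Whole events are assigned to folds, so no event is ever split.

```python
import random
from collections import defaultdict


def stratified_event_folds(event_clusters, n_folds=5, seed=0):
    """Assign whole events to folds so each cluster is spread across all folds.

    event_clusters: dict mapping event id -> cluster label (hypothetical
    input; the labels would come from the clustering step).
    Returns a list of n_folds lists of event ids.
    """
    by_cluster = defaultdict(list)
    for event_id, cluster in sorted(event_clusters.items()):
        by_cluster[cluster].append(event_id)

    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    offset = 0
    for cluster in sorted(by_cluster):
        events = by_cluster[cluster]
        rng.shuffle(events)
        # Deal events round-robin, continuing where the previous cluster
        # stopped so that fold sizes stay balanced.
        for i, event_id in enumerate(events):
            folds[(offset + i) % n_folds].append(event_id)
        offset += len(events)
    return folds


# Toy labels: 68 events spread evenly over 4 clusters.
clusters = {event_id: event_id % 4 for event_id in range(1, 69)}
folds = stratified_event_folds(clusters)
for k, val_events in enumerate(folds):
    train_events = [e for j, f in enumerate(folds) if j != k for e in f]
    print(f"fold {k}: {len(train_events)} train / {len(val_events)} val events")
```

Every fold is guaranteed to contain all clusters only when each cluster has at least `n_folds` events; smaller clusters would need to be merged or handled separately.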

### IMPLEMENTATION STEPS:

Week 1 (Days 1-7):
- Day 1: Implement multi-scale rainfall feature engineering
- Day 2: Build data loader with PyTorch Geometric
- Day 3: Implement GNN encoders (GAT for 1D, GCN for 2D)
- Day 4: Add coupling layer (1D ↔ 2D message passing)
- Day 5: Integrate LSTM temporal module
- Day 6: Implement standardized RMSE loss
- Day 7: Train first baseline, validate, debug

Week 2 (Days 8-14):
- Day 8: Add multi-step loss
- Day 9: Implement teacher forcing schedule
- Day 10: Add noise injection for robustness
- Day 11: Hyperparameter tuning (grid search)
- Day 12: Cross-validation setup
- Day 13: Train on all folds
- Day 14: Ensemble fold models

Week 3 (Days 15-21):
- Day 15: Train on Model 2 data
- Day 16: Final hyperparameter optimization
- Day 17: Error analysis (where does model fail?)
- Day 18: Model refinement based on analysis
- Day 19: Generate test predictions
- Day 20: Validate submission format
- Day 21: Final submission + documentation

### CODE SKELETON:


In [None]:
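import random

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import GATConv, GCNConv, SAGEConv
from torch.utils.data import DataLoader


class HybridGNN_LSTM(nn.Module):
    def __init__(self, n_1d_nodes=17, n_2d_nodes=3716,
                 static_dim_1d=7, static_dim_2d=10,
                 hidden_1d=512, hidden_2d=256):
        super().__init__()

        # Node embeddings
        self.node_embed_1d = nn.Embedding(n_1d_nodes, 64)
        self.node_embed_2d = nn.Embedding(n_2d_nodes, 32)

        # 1D Encoder (GAT). This skeleton uses three layers; the specification
        # above lists four. The last layer averages its heads (concat=False)
        # so the embedding width is hidden_1d, as the LSTM and decoder expect.
        self.gat1d_1 = GATConv(static_dim_1d+64, hidden_1d, heads=8, dropout=0.3)
        self.gat1d_2 = GATConv(hidden_1d*8, hidden_1d, heads=8, dropout=0.3)
        self.gat1d_3 = GATConv(hidden_1d*8, hidden_1d, heads=4, concat=False, dropout=0.3)

        # 2D Encoder (GCN), also three layers in this skeleton
        self.gcn2d_1 = GCNConv(static_dim_2d+32, hidden_2d)
        self.gcn2d_2 = GCNConv(hidden_2d, hidden_2d)
        self.gcn2d_3 = GCNConv(hidden_2d, hidden_2d)

        # Coupling layer (bipartite graph). SAGEConv accepts a
        # (source_dim, target_dim) pair; GCNConv assumes a single node set.
        self.coupling_1d_to_2d = SAGEConv((hidden_1d, hidden_2d), hidden_2d)
        self.coupling_2d_to_1d = SAGEConv((hidden_2d, hidden_1d), hidden_1d)

        # Temporal module. Input per step: spatial embedding + 10 rainfall
        # features + 1 previous water level (the autoregressive input).
        self.lstm_1d = nn.LSTM(hidden_1d + 10 + 1, hidden_1d,
                               num_layers=3, dropout=0.3, batch_first=True)
        self.lstm_2d = nn.LSTM(hidden_2d + 10 + 1, hidden_2d,
                               num_layers=3, dropout=0.3, batch_first=True)

        # Decoders
        self.decoder_1d = nn.Sequential(
            nn.Linear(hidden_1d, 256),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Linear(128, 1)
        )

        self.decoder_2d = nn.Sequential(
            nn.Linear(hidden_2d, 128),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(128, 1)
        )

    def encode_1d(self, x, edge_index):
        # x: [n_1d_nodes, static_dim_1d]; rows are assumed to be ordered by node id
        ids = torch.arange(x.size(0), device=x.device)
        h = torch.cat([x, self.node_embed_1d(ids)], dim=-1)
        h = F.elu(self.gat1d_1(h, edge_index))
        h = F.elu(self.gat1d_2(h, edge_index))
        return self.gat1d_3(h, edge_index)  # [n_1d_nodes, hidden_1d]

    def encode_2d(self, x, edge_index):
        # x: [n_2d_nodes, static_dim_2d]; rows are assumed to be ordered by node id
        ids = torch.arange(x.size(0), device=x.device)
        h = torch.cat([x, self.node_embed_2d(ids)], dim=-1)
        h = F.relu(self.gcn2d_1(h, edge_index))
        h = F.relu(self.gcn2d_2(h, edge_index))
        return self.gcn2d_3(h, edge_index)  # [n_2d_nodes, hidden_2d]

    def couple(self, x_1d, x_2d, coupling_edges):
        # coupling_edges: [2, n_links]; row 0 = 1D node index, row 1 = 2D node index
        new_2d = self.coupling_1d_to_2d((x_1d, x_2d), coupling_edges)
        new_1d = self.coupling_2d_to_1d((x_2d, x_1d), coupling_edges.flip(0))
        return new_1d, new_2d

    def forward(self, data, rainfall_features, teacher_forcing_ratio=0.5):
        # Assumed inputs beyond the spatial graph: data['level_1d'] and
        # data['level_2d'] hold observed, normalized water levels shaped
        # [n_nodes, seq_len]; rainfall_features is [1, seq_len, >=10] and is
        # shared by all nodes. Step t reads the level at t and predicts t+1.

        # Encode spatial structure
        x_1d = self.encode_1d(data['x_1d'], data['edge_index_1d'])
        x_2d = self.encode_2d(data['x_2d'], data['edge_index_2d'])

        # Coupling
        x_1d, x_2d = self.couple(x_1d, x_2d, data['coupling_edges'])

        # Temporal processing (autoregressive)
        level_1d, level_2d = data['level_1d'], data['level_2d']
        seq_len = rainfall_features.shape[1]
        predictions_1d = []
        predictions_2d = []

        # nn.LSTM starts from zero states when the state argument is None
        state_1d, state_2d = None, None

        for t in range(seq_len):
            # Teacher forcing: during training, feed the observed level with
            # probability teacher_forcing_ratio; otherwise feed back the
            # previous prediction. Step 0 always uses the observed level.
            use_truth = t == 0 or (self.training and random.random() < teacher_forcing_ratio)
            if use_truth:
                prev_1d = level_1d[:, t:t+1]
                prev_2d = level_2d[:, t:t+1]
            else:
                prev_1d = predictions_1d[-1]
                prev_2d = predictions_2d[-1]

            # Concatenate embedding, rainfall features, and previous level;
            # nodes act as the LSTM batch dimension.
            rain_t = rainfall_features[0, t, :10]
            lstm_input_1d = torch.cat([x_1d, rain_t.expand(x_1d.size(0), -1), prev_1d], dim=-1)
            lstm_input_2d = torch.cat([x_2d, rain_t.expand(x_2d.size(0), -1), prev_2d], dim=-1)

            # LSTM step
            out_1d, state_1d = self.lstm_1d(lstm_input_1d.unsqueeze(1), state_1d)
            out_2d, state_2d = self.lstm_2d(lstm_input_2d.unsqueeze(1), state_2d)

            # Decode
            pred_1d = self.decoder_1d(out_1d.squeeze(1))
            pred_2d = self.decoder_2d(out_2d.squeeze(1))

            predictions_1d.append(pred_1d)
            predictions_2d.append(pred_2d)

        # Each output is shaped [n_nodes, seq_len, 1]
        return torch.stack(predictions_1d, dim=1), torch.stack(predictions_2d, dim=1)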