# Install and setup

This notebook implements spatio-temporal GNN models for CPC (cost-per-click) forecasting, together with centralized configuration, data pipelines, and training utilities.

**Version 5 Updates:**
- Added RMSE and SMAPE metrics alongside MSE and MAE (four metrics in total)
- CSV export with an added `config` column (columns: horizon, model_id, exog_mode, config, mse, mae, rmse, smape)
- Tensor saving for error analysis: predictions, targets, inputs, errors (.pt format)
- Dual save locations (Drive + local runtime) with error-robust handling
- All v4 fixes retained (DCRNN self-loops, no detach, GPU memory cleanup)
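The four metrics can be sketched in plain Python as below. This is an illustration, not the notebook's own implementation, and `forecast_metrics` is a hypothetical name. SMAPE has several variants; the one assumed here is 2|y - y_hat| / (|y| + |y_hat|), averaged and scaled to percent, with 0/0 terms counted as 0. RMSE is taken as the square root of the pooled MSE over all values, which differs from averaging per-node RMSEs.

```python
import math
from typing import Dict, Sequence


def forecast_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Return MSE, MAE, RMSE and SMAPE (percent) for two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # Assumed SMAPE variant: 2|y - y_hat| / (|y| + |y_hat|), with 0/0 treated as 0.
    smape_sum = 0.0
    for t, p in zip(y_true, y_pred):
        denom = abs(t) + abs(p)
        smape_sum += 0.0 if denom == 0 else 2.0 * abs(p - t) / denom
    return {
        "mse": mse,
        "mae": mae,
        "rmse": math.sqrt(mse),  # pooled RMSE, not a mean of per-node RMSEs
        "smape": 100.0 * smape_sum / n,
    }
```

For flattened tensors, `forecast_metrics(targets.flatten().tolist(), preds.flatten().tolist())` would apply it directly, although a vectorized version is preferable for large test sets.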

In [None]:
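# Install dependencies for Google Colab with CUDA 12.1.
# torch is deliberately not imported here: a module loaded before the reinstall stays in memory,
# so restart the runtime after this cell if torch had already been imported.

# Pin PyTorch 2.4.0 built for CUDA 12.1 so that the PyG wheels below match it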
!pip install torch==2.4.0+cu121 torchvision==0.19.0+cu121 torchaudio==2.4.0+cu121 --index-url https://download.pytorch.org/whl/cu121

# Hardcode the versions to match the line above
TORCH = "2.4.0"
CUDA = "cu121"

# Install the PyG dependencies using the specific wheels
!pip install torch-scatter -f https://data.pyg.org/whl/torch-{TORCH}+{CUDA}.html
!pip install torch-sparse -f https://data.pyg.org/whl/torch-{TORCH}+{CUDA}.html
!pip install torch-cluster -f https://data.pyg.org/whl/torch-{TORCH}+{CUDA}.html
!pip install torch-spline-conv -f https://data.pyg.org/whl/torch-{TORCH}+{CUDA}.html
!pip install torch-geometric
!pip install torch-geometric-temporal
!pip install torch-spatiotemporal

Looking in indexes: https://download.pytorch.org/whl/cu121
Looking in links: https://data.pyg.org/whl/torch-2.4.0+cu121.html
Looking in links: https://data.pyg.org/whl/torch-2.4.0+cu121.html
Looking in links: https://data.pyg.org/whl/torch-2.4.0+cu121.html
Looking in links: https://data.pyg.org/whl/torch-2.4.0+cu121.html
Collecting torch-geometric-temporal
  Using cached torch_geometric_temporal-0.56.2-py3-none-any.whl.metadata (1.9 kB)
Using cached torch_geometric_temporal-0.56.2-py3-none-any.whl (102 kB)
Installing collected packages: torch-geometric-temporal
Successfully installed torch-geometric-temporal-0.56.2
Collecting torch-spatiotemporal
  Downloading torch_spatiotemporal-0.9.5-py3-none-any.whl.metadata (12 kB)
Collecting pytorch-lightning>=1.8 (from torch-spatiotemporal)
  Downloading pytorch_lightning-2.6.0-py3-none-any.whl.metadata (21 kB)
Collecting torchmetrics>=0.7 (from torch-spatiotemporal)
  Downloading torchmetrics-1.8.2-py3-none-any.whl.metadata (22 kB)
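The install cell hardcodes `TORCH` and `CUDA`, so the PyG wheel index can drift from the torch build that is actually installed. A small sketch (the helper name `pyg_wheel_index` is hypothetical) derives the find-links URL from the version strings instead. It assumes the index pages follow the `torch-{version}+{cuXYZ}.html` pattern used above and that CPU-only builds use a `+cpu` suffix.

```python
from typing import Optional


def pyg_wheel_index(torch_version: str, cuda_version: Optional[str]) -> str:
    """Build the data.pyg.org find-links URL for a torch/CUDA pair.

    torch_version: e.g. torch.__version__ ("2.4.0+cu121"); the local "+..." suffix is dropped.
    cuda_version:  e.g. torch.version.cuda ("12.1"), or None for CPU-only builds.
    """
    base = torch_version.split("+")[0]
    tag = "cpu" if cuda_version is None else "cu" + cuda_version.replace(".", "")
    return f"https://data.pyg.org/whl/torch-{base}+{tag}.html"


# Usage in the notebook, after restarting the runtime:
#   import torch
#   url = pyg_wheel_index(torch.__version__, torch.version.cuda)
#   !pip install torch-scatter torch-sparse -f {url}
```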

In [None]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


# Utils

In [None]:
import os
import random
import numpy as np
import torch

SEED = 42

# Python RNG
random.seed(SEED)

# NumPy RNG
np.random.seed(SEED)

# PyTorch RNG (CPU + GPU)
torch.manual_seed(SEED)
torch.cuda.manual_seed(SEED)
torch.cuda.manual_seed_all(SEED)

# Make CuDNN deterministic (slower but reproducible)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

# Only inherited by subprocesses: the running interpreter's hash seed is fixed at startup
os.environ["PYTHONHASHSEED"] = str(SEED)
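The seeding above covers the global RNGs but not the shuffling order of a `DataLoader` or the RNGs inside its worker processes. Below is a sketch of reusable helpers that follows the worker-seeding pattern from PyTorch's reproducibility notes. The names `seed_everything`, `seed_worker` and `make_loader` are hypothetical. Even with these settings, scatter-based message passing on the GPU may remain non-deterministic, so repeated runs can still differ slightly.

```python
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset, TensorDataset


def seed_everything(seed: int) -> None:
    """Seed Python, NumPy and PyTorch in one call."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # seeds the CPU and every CUDA device
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


def seed_worker(worker_id: int) -> None:
    """Give each DataLoader worker NumPy/Python seeds derived from its torch seed."""
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)


def make_loader(dataset: Dataset, batch_size: int, seed: int,
                num_workers: int = 0) -> DataLoader:
    """Shuffled DataLoader whose sample order depends only on `seed`."""
    generator = torch.Generator()
    generator.manual_seed(seed)
    return DataLoader(dataset, batch_size=batch_size, shuffle=True,
                      num_workers=num_workers, worker_init_fn=seed_worker,
                      generator=generator)


# Example: the same seed yields the same shuffled order on every call.
demo = TensorDataset(torch.arange(10))
order = [batch[0].tolist() for batch in make_loader(demo, batch_size=4, seed=42)]
```

Because each loader owns its generator, the batch order does not depend on how many random numbers model initialisation consumed beforehand.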

## Model Definitions

This cell contains the spatio-temporal GNN model architectures:
- DCRNNSpatioTemporalModel: Diffusion Convolutional Recurrent Neural Network (processes one timestep per call)
- STGCNModel: Spatio-Temporal Graph Convolutional Network
- STConvWrapper: two stacked STConv blocks (gated temporal convolutions combined with a Chebyshev graph convolution)
- A3TGCNWrapper: Attention Temporal Graph Convolutional Network
- GConvLSTMWrapper: Graph Convolutional LSTM
- MTGNNWrapper: Multivariate Time Series GNN (with a reduced configuration for short sequences)

It also registers the TSL models AGCRNModel and GraphWaveNetModel in MODEL_CLASS_REGISTRY and defines the factory functions.
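Every wrapper except DCRNN documents the same contract: input `[batch, time, nodes, features]`, output `[batch, 1, nodes, 1]` (MTGNN: `[batch, horizon, nodes, 1]`). A cheap runtime check can catch swapped axes before training. Without such a check, elementwise arithmetic between `[B, 1, N, 1]` and `[B, N, 1, 1]` broadcasts to `[B, N, N, 1]` instead of failing.

The sketch below makes the following assumptions:
- It uses a hypothetical last-value baseline that needs no PyG.
- A ring graph stands in for the real topology.
- `target_index=1` assumes `cpc_week` stays the second entry of `FEATURE_COLS`.

```python
from typing import Optional

import torch
import torch.nn as nn


class LastValueBaseline(nn.Module):
    """Hypothetical naive baseline: predict each node's last observed target value."""

    def __init__(self, target_index: int):
        super().__init__()
        self.target_index = target_index

    def forward(self, x: torch.Tensor, edge_index: Optional[torch.Tensor] = None,
                edge_weight: Optional[torch.Tensor] = None) -> torch.Tensor:
        # [batch, time, nodes, features] -> [batch, 1, nodes, 1]
        return x[:, -1:, :, self.target_index:self.target_index + 1]


def check_output_shape(model: nn.Module, batch: int, time: int, nodes: int,
                       features: int, horizon: int = 1) -> torch.Size:
    """Run one random batch through `model` and assert the [B, H, N, 1] contract."""
    x = torch.randn(batch, time, nodes, features)
    # Ring graph as a stand-in topology; real runs use the keyword graph
    src = torch.arange(nodes)
    edge_index = torch.stack([src, (src + 1) % nodes])
    was_training = model.training
    model.eval()
    with torch.no_grad():
        out = model(x, edge_index)
    model.train(was_training)
    expected = (batch, horizon, nodes, 1)
    assert tuple(out.shape) == expected, f"got {tuple(out.shape)}, expected {expected}"
    return out.shape
```

The same call works for any registered wrapper, e.g. `check_output_shape(model, 2, 12, 1811, NUM_FEATURES)`, once the model has been built.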

In [None]:
# models.py style cell
import torch
import torch.nn as nn
from typing import Optional, Dict, Any, List, Tuple

# PyTorch Geometric imports
from torch_geometric.nn import ChebConv
from torch_geometric.utils import add_self_loops

# PyTorch Geometric Temporal imports
from torch_geometric_temporal.nn.recurrent import DCRNN as DCRNNCell
from torch_geometric_temporal.nn.recurrent import A3TGCN
from torch_geometric_temporal.nn.recurrent import GConvLSTM
from torch_geometric_temporal.nn.attention import STConv
from torch_geometric_temporal.nn.attention import MTGNN

# TSL (Torch Spatiotemporal) imports
from tsl.nn.models.stgn import AGCRNModel, GraphWaveNetModel


# =============================================================================
# Model 1: DCRNN (Diffusion Convolutional Recurrent Neural Network)
# =============================================================================

class DCRNNSpatioTemporalModel(nn.Module):
    """
    Diffusion Convolutional Recurrent Neural Network for spatio-temporal forecasting.

    Captures diffusion patterns across the keyword graph over time using recurrent cells.
    Uses two stacked DCRNN layers with dropout regularization.

    NOTE: This model processes one timestep at a time. Use forward_dcrnn_optionA()
    helper in the training loop to properly carry hidden state across time steps.

    Args:
        in_channels (int): Number of input features per node
        hidden_channels (int): Number of hidden units in recurrent layers
        out_channels (int): Number of output features (typically 1 for CPC prediction)
        k_hops (int): Number of diffusion steps for graph convolution (default: 2)
    """
    def __init__(self, in_channels: int, hidden_channels: int, out_channels: int, k_hops: int = 2):
        super(DCRNNSpatioTemporalModel, self).__init__()
        self.hidden_channels = hidden_channels
        self.recurrent1 = DCRNNCell(in_channels, hidden_channels, K=k_hops)
        self.recurrent2 = DCRNNCell(hidden_channels, hidden_channels, K=k_hops)
        self.linear = nn.Linear(hidden_channels, out_channels)
        self.dropout_layer = nn.Dropout(p=0.3)

    def forward(self, x: torch.Tensor, edge_index: torch.Tensor,
                edge_weight: Optional[torch.Tensor] = None,
                H1: Optional[torch.Tensor] = None,
                H2: Optional[torch.Tensor] = None) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
        """
        Forward pass for a single timestep with hidden state management.

        Args:
            x: Node features [num_nodes, in_channels]
            edge_index: Graph connectivity [2, num_edges]
            edge_weight: Edge weights [num_edges] (optional)
            H1: Hidden state for first recurrent layer (optional)
            H2: Hidden state for second recurrent layer (optional)

        Returns:
            Tuple of (output [num_nodes, out_channels], H1_new, H2_new)
        """
        # First recurrent layer with hidden state
        H1_new = self.recurrent1(x, edge_index, edge_weight, H1)
        h1 = self.dropout_layer(H1_new)

        # Second recurrent layer with hidden state
        H2_new = self.recurrent2(h1, edge_index, edge_weight, H2)
        h2 = self.dropout_layer(H2_new)

        # Output projection
        out = self.linear(h2)
        return out, H1_new, H2_new


# =============================================================================
# Model 2: STGCN (Spatio-Temporal Graph Convolutional Network)
# =============================================================================

class STGCNModel(nn.Module):
    """
    Spatio-Temporal Graph Convolutional Network.

    Applies two Chebyshev graph convolutions to each timestep independently, then two
    1D temporal convolutions per node over the resulting sequence, and predicts from
    the last timestep.

    Args:
        num_features (int): Number of input features per node
        hidden_channels (int): Number of hidden channels (default: 64)
        K (int): Chebyshev polynomial order for graph convolution (default: 3)
    """
    def __init__(self, num_features: int, hidden_channels: int = 64, K: int = 3):
        super(STGCNModel, self).__init__()

        # Spatial: Chebyshev Graph Convolution
        self.spatial_conv1 = ChebConv(num_features, hidden_channels, K=K)
        self.spatial_conv2 = ChebConv(hidden_channels, hidden_channels, K=K)

        # Temporal: 1D Convolution over time
        self.temporal_conv1 = nn.Conv1d(hidden_channels, hidden_channels, kernel_size=3, padding=1)
        self.temporal_conv2 = nn.Conv1d(hidden_channels, hidden_channels, kernel_size=3, padding=1)

        # Output layer
        self.output_conv = nn.Linear(hidden_channels, 1)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(0.3)

    def forward(self, x: torch.Tensor, edge_index: torch.Tensor,
                edge_weight: Optional[torch.Tensor] = None) -> torch.Tensor:
        """
        Forward pass processing batched spatio-temporal data.

        Args:
            x: Node features [batch, time, nodes, features]
            edge_index: Graph connectivity [2, num_edges]
            edge_weight: Edge weights [num_edges] (optional)

        Returns:
            Predictions [batch, 1, nodes, 1]
        """
        batch_size, time_steps, num_nodes, num_features = x.shape

        # Batch the graph once as B disjoint copies (block-diagonal adjacency):
        # copy b shifts every node id by b * num_nodes
        offsets = torch.arange(batch_size, device=edge_index.device) * num_nodes
        batch_edge_index = (edge_index.unsqueeze(1) + offsets.view(1, -1, 1)).reshape(2, -1)
        batch_edge_weight = edge_weight.repeat(batch_size) if edge_weight is not None else None

        # Process each timestep with spatial convolution
        temporal_outputs = []
        for t in range(time_steps):
            x_t_flat = x[:, t, :, :].reshape(-1, num_features)  # [batch*nodes, features]

            # Apply spatial convolution
            h = self.spatial_conv1(x_t_flat, batch_edge_index, batch_edge_weight)
            h = self.relu(h)
            h = self.dropout(h)
            h = self.spatial_conv2(h, batch_edge_index, batch_edge_weight)
            h = self.relu(h)

            # Reshape back to [batch, nodes, hidden]
            h = h.reshape(batch_size, num_nodes, -1)
            temporal_outputs.append(h)

        # Stack temporal outputs: [batch, time, nodes, hidden]
        h_temporal = torch.stack(temporal_outputs, dim=1)

        # Apply temporal convolution
        h_temporal = h_temporal.permute(0, 2, 3, 1)  # [batch, nodes, hidden, time]
        batch_size, num_nodes, hidden_dim, time_steps = h_temporal.shape

        # Flatten batch and nodes for temporal conv
        h_temporal = h_temporal.reshape(batch_size * num_nodes, hidden_dim, time_steps)

        # Apply temporal convolutions
        h_temporal = self.temporal_conv1(h_temporal)
        h_temporal = self.relu(h_temporal)
        h_temporal = self.temporal_conv2(h_temporal)
        h_temporal = self.relu(h_temporal)

        # Take last timestep
        h_final = h_temporal[:, :, -1]  # [batch*nodes, hidden]

        # Apply output layer
        out = self.output_conv(h_final)  # [batch*nodes, 1]

        # Reshape to [batch, 1, nodes, 1]
        out = out.reshape(batch_size, num_nodes, 1)
        out = out.unsqueeze(1)

        return out


# =============================================================================
# Model 3: STConv (Spatio-Temporal Convolution with Attention)
# =============================================================================

class STConvWrapper(nn.Module):
    """
    Spatio-Temporal Convolution wrapper.

    Stacks two STConv blocks from torch_geometric_temporal, each combining gated
    temporal convolutions with a Chebyshev graph convolution, then predicts from
    the last remaining timestep.

    Args:
        num_nodes (int): Number of nodes in the graph
        num_features (int): Number of input features per node
        hidden_channels (int): Number of hidden channels (default: 64)
        kernel_size (int): Temporal kernel size (default: 3)
        K (int): Chebyshev polynomial order (default: 3)
    """
    def __init__(self, num_nodes: int, num_features: int, hidden_channels: int = 64,
                 kernel_size: int = 3, K: int = 3):
        super(STConvWrapper, self).__init__()
        self.num_nodes = num_nodes
        self.hidden_channels = hidden_channels

        self.stconv1 = STConv(
            num_nodes=num_nodes,
            in_channels=num_features,
            hidden_channels=hidden_channels,
            out_channels=hidden_channels,
            kernel_size=kernel_size,
            K=K,
            normalization='sym'
        )

        self.stconv2 = STConv(
            num_nodes=num_nodes,
            in_channels=hidden_channels,
            hidden_channels=hidden_channels,
            out_channels=hidden_channels,
            kernel_size=kernel_size,
            K=K,
            normalization='sym'
        )

        self.fc = nn.Linear(hidden_channels, 1)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(0.3)

    def forward(self, x: torch.Tensor, edge_index: torch.Tensor,
                edge_weight: Optional[torch.Tensor] = None) -> torch.Tensor:
        """
        Forward pass with STConv blocks.

        Args:
            x: Node features [batch, time, nodes, features]
            edge_index: Graph connectivity [2, num_edges]
            edge_weight: Edge weights [num_edges] (optional)

        Returns:
            Predictions [batch, 1, nodes, 1]
        """
        # Apply STConv blocks
        h = self.stconv1(x, edge_index, edge_weight)
        h = self.relu(h)
        h = self.dropout(h)

        h = self.stconv2(h, edge_index, edge_weight)
        h = self.relu(h)
        h = self.dropout(h)

        # STConv returns [batch, time', nodes, channels]; each block shortens the
        # time axis by 2 * (kernel_size - 1). Keep only the last remaining timestep.
        h = h[:, -1, :, :]  # [batch, nodes, channels]

        # Apply final linear layer
        out = self.fc(h)  # [batch, nodes, 1]

        # Reshape to [batch, 1, nodes, 1]
        out = out.unsqueeze(1)

        return out


# =============================================================================
# Model 4: A3TGCN (Attention Temporal Graph Convolutional Network)
# =============================================================================

class A3TGCNWrapper(nn.Module):
    """
    Attention Temporal Graph Convolutional Network wrapper.

    Applies attention mechanisms to temporal sequences on graph-structured data,
    allowing the model to focus on the most relevant time steps.

    Args:
        num_features (int): Number of input features per node
        hidden_size (int): Size of hidden representations (default: 64)
        periods (int): Number of input time steps attended over; should equal the sequence length (default: 12)
    """
    def __init__(self, num_features: int, hidden_size: int = 64, periods: int = 12):
        super(A3TGCNWrapper, self).__init__()
        self.hidden_size = hidden_size

        self.a3tgcn = A3TGCN(
            in_channels=num_features,
            out_channels=hidden_size,
            periods=periods
        )

        self.linear = nn.Linear(hidden_size, 1)

    def forward(self, x: torch.Tensor, edge_index: torch.Tensor,
                edge_weight: Optional[torch.Tensor] = None) -> torch.Tensor:
        """
        Forward pass with temporal attention.

        Args:
            x: Node features [batch, time, nodes, features]
            edge_index: Graph connectivity [2, num_edges]
            edge_weight: Edge weights [num_edges] (optional)

        Returns:
            Predictions [batch, 1, nodes, 1]
        """
        batch_size = x.shape[0]
        outputs = []

        # Process each sample in the batch separately
        for i in range(batch_size):
            # Get single sample: [time, nodes, features]
            sample = x[i]

            # Permute to [nodes, features, time] (required by A3TGCN)
            sample = sample.permute(1, 2, 0)

            # Pass through A3TGCN (time is aggregated by attention)
            h = self.a3tgcn(sample, edge_index, edge_weight)  # [nodes, hidden]

            outputs.append(h)

        # Stack batch: [batch, nodes, hidden]
        h = torch.stack(outputs, dim=0)

        # Output layer
        out = self.linear(h)  # [batch, nodes, 1]

        # Reshape to [batch, 1, nodes, 1]
        out = out.unsqueeze(1)

        return out


# =============================================================================
# Model 5: GConvLSTM (Graph Convolutional LSTM)
# =============================================================================

class GConvLSTMWrapper(nn.Module):
    """
    Graph Convolutional LSTM wrapper.

    Combines Chebyshev graph convolutions with LSTM cells to model temporal
    dependencies on graph-structured data.

    Args:
        num_features (int): Number of input features per node
        hidden_size (int): Size of LSTM hidden state (default: 64)
        K (int): Chebyshev filter size, covering up to K-1 hops (default: 2)
    """
    def __init__(self, num_features: int, hidden_size: int = 64, K: int = 2):
        super(GConvLSTMWrapper, self).__init__()
        self.hidden_size = hidden_size

        self.gconv_lstm = GConvLSTM(
            in_channels=num_features,
            out_channels=hidden_size,
            K=K
        )

        self.linear = nn.Linear(hidden_size, 1)
        self.dropout = nn.Dropout(0.3)

    def forward(self, x: torch.Tensor, edge_index: torch.Tensor,
                edge_weight: Optional[torch.Tensor] = None) -> torch.Tensor:
        """
        Forward pass with graph convolutional LSTM.

        Args:
            x: Node features [batch, time, nodes, features]
            edge_index: Graph connectivity [2, num_edges]
            edge_weight: Edge weights [num_edges] (optional)

        Returns:
            Predictions [batch, 1, nodes, 1]
        """
        batch_size, time_steps, num_nodes, num_features = x.shape
        batch_outputs = []

        # Process each sample in the batch
        for b in range(batch_size):
            # Initialize hidden (H) and cell (C) states
            H, C = None, None

            # Process sequence step-by-step
            for t in range(time_steps):
                x_t = x[b, t, :, :]  # [nodes, features]
                H, C = self.gconv_lstm(x_t, edge_index, edge_weight, H, C)

            # H is the hidden state of the last timestep
            batch_outputs.append(H)

        # Stack batch: [batch, nodes, hidden]
        output = torch.stack(batch_outputs, dim=0)

        out = self.dropout(output)
        out = self.linear(out)  # [batch, nodes, 1]

        # Reshape to [batch, 1, nodes, 1]
        out = out.unsqueeze(1)

        return out


# =============================================================================
# Model 6: MTGNN (Multivariate Time Series Graph Neural Network)
# Fixed configuration for short sequences (layers=1, kernel_set=[2])
# =============================================================================

class MTGNNWrapper(nn.Module):
    """
    Multivariate Time Series Graph Neural Network wrapper.

    Advanced model that learns graph structure adaptively while handling multivariate
    time series with various temporal scales using dilated convolutions and mix-hop propagation.

    NOTE: Uses simplified config (layers=1, kernel_set=[2]) to avoid kernel size errors
    with short sequences (seq_length=12).

    Args:
        num_nodes (int): Number of nodes in the graph
        num_features (int): Number of input features per node
        seq_length (int): Length of input sequences
        horizon (int): Forecast horizon
        hidden_size (int): Base hidden size (default: 32)
        dropout (float): Dropout rate (default: 0.3)
    """
    def __init__(self, num_nodes: int, num_features: int, seq_length: int, horizon: int,
                 hidden_size: int = 32, dropout: float = 0.3):
        super(MTGNNWrapper, self).__init__()
        self.horizon = horizon

        # Fixed simpler config to avoid kernel size > input size error
        self.mtgnn = MTGNN(
            gcn_true=True,
            build_adj=True,
            gcn_depth=2,
            num_nodes=num_nodes,
            kernel_set=[2],          # Simplified: single small kernel
            kernel_size=2,           # Small kernel size
            layers=1,                # Single layer to reduce receptive field
            dropout=dropout,
            subgraph_size=min(20, num_nodes),
            node_dim=40,
            dilation_exponential=1,  # No dilation expansion
            conv_channels=hidden_size,
            residual_channels=hidden_size,
            skip_channels=hidden_size * 2,
            end_channels=hidden_size * 4,
            seq_length=seq_length,
            in_dim=num_features,
            out_dim=horizon,
            propalpha=0.05,
            tanhalpha=3,
            layer_norm_affline=True
        )

    def forward(self, x: torch.Tensor, edge_index: Optional[torch.Tensor] = None,
                edge_weight: Optional[torch.Tensor] = None) -> torch.Tensor:
        """
        Forward pass with adaptive graph learning.

        NOTE: This model does NOT use edge_index/edge_weight - it learns its own graph.

        Args:
            x: Node features [batch, time, nodes, features]
            edge_index: Graph connectivity (ignored, model learns its own)
            edge_weight: Edge weights (ignored)

        Returns:
            Predictions [batch, horizon, nodes, 1]
        """
        # MTGNN expects [batch, features, nodes, time]
        x = x.permute(0, 3, 2, 1)

        # Forward pass (no edge_index needed)
        out = self.mtgnn(x)

        # MTGNN's final 1x1 convolution returns [batch, out_dim, nodes, 1] with
        # out_dim = horizon, which already matches [batch, horizon, nodes, 1].
        if out.dim() == 3:
            # [batch, horizon, nodes] -> [batch, horizon, nodes, 1]
            out = out.unsqueeze(-1)

        return out


# =============================================================================
# Model Class Registry - Maps string names to actual classes
# =============================================================================

MODEL_CLASS_REGISTRY: Dict[str, type] = {
    'DCRNNSpatioTemporalModel': DCRNNSpatioTemporalModel,
    'STGCNModel': STGCNModel,
    'STConvWrapper': STConvWrapper,
    'A3TGCNWrapper': A3TGCNWrapper,
    'GConvLSTMWrapper': GConvLSTMWrapper,
    'MTGNNWrapper': MTGNNWrapper,
    'AGCRNModel': AGCRNModel,
    'GraphWaveNetModel': GraphWaveNetModel,
}


# =============================================================================
# Factory Functions
# =============================================================================

def get_model(model_name: str,
              model_configs: Dict[str, Dict[str, Any]],
              device: str = 'cpu',
              **override_params) -> nn.Module:
    """
    Factory function to instantiate models by name using external config.

    Args:
        model_name (str): Name of the model (must be in model_configs)
        model_configs (Dict): Configuration dictionary for all models
        device (str): Device to place the model on ('cpu' or 'cuda')
        **override_params: Parameters to override from the default configuration

    Returns:
        nn.Module: Instantiated model moved to the specified device

    Example:
        >>> model = get_model('DCRNN', BASE_MODEL_CONFIGS, device='cuda', hidden_channels=256)
    """
    if model_name not in model_configs:
        available = ', '.join(model_configs.keys())
        raise ValueError(f"Model '{model_name}' not found. Available models: {available}")

    cfg = model_configs[model_name]
    class_name = cfg['class']
    model_class = MODEL_CLASS_REGISTRY[class_name]

    # Merge default params with overrides
    params = {**cfg.get('params', {}), **override_params}

    # Instantiate and move to device
    model = model_class(**params)
    model = model.to(device)

    return model


def get_model_config(model_name: str,
                     model_configs: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """
    Get the full configuration dictionary for a model.

    Args:
        model_name (str): Name of the model
        model_configs (Dict): Configuration dictionary for all models

    Returns:
        Dict[str, Any]: Configuration dictionary containing params, training config, and description
    """
    if model_name not in model_configs:
        available = ', '.join(model_configs.keys())
        raise ValueError(f"Model '{model_name}' not found. Available models: {available}")

    return model_configs[model_name]


def list_models(model_configs: Dict[str, Dict[str, Any]]) -> List[str]:
    """
    Get a list of all available model names.

    Args:
        model_configs (Dict): Configuration dictionary for all models

    Returns:
        List[str]: List of model names
    """
    return list(model_configs.keys())


print("Models loaded successfully!")
print(f"Available model classes: {list(MODEL_CLASS_REGISTRY.keys())}")

Models loaded successfully!
Available model classes: ['DCRNNSpatioTemporalModel', 'STGCNModel', 'STConvWrapper', 'A3TGCNWrapper', 'GConvLSTMWrapper', 'MTGNNWrapper', 'AGCRNModel', 'GraphWaveNetModel']
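The DCRNN docstring refers to a `forward_dcrnn_optionA()` helper that is not part of this cell. Below is a minimal sketch of the same idea under the hypothetical name `unroll_step_model`; it is not the notebook's helper. It loops over the window, carries both hidden states across timesteps without detaching them (matching the "no detach" note), and returns the last step's output in the `[batch, 1, nodes, 1]` layout used by the other wrappers. `ToyStepModel` is a stand-in with the same step signature as `DCRNNSpatioTemporalModel`. It ignores the graph so that the sketch runs without PyG.

```python
from typing import Optional

import torch
import torch.nn as nn


def unroll_step_model(step_model: nn.Module, x: torch.Tensor, edge_index: torch.Tensor,
                      edge_weight: Optional[torch.Tensor] = None) -> torch.Tensor:
    """Run a single-timestep recurrent model over [batch, time, nodes, features].

    step_model(x_t, edge_index, edge_weight, H1, H2) must return (out, H1, H2).
    Returns [batch, 1, nodes, out_channels] taken from the last timestep.
    """
    outputs = []
    for b in range(x.shape[0]):
        H1: Optional[torch.Tensor] = None
        H2: Optional[torch.Tensor] = None
        out = None
        for t in range(x.shape[1]):
            # Hidden states are carried (not detached) so gradients flow through time
            out, H1, H2 = step_model(x[b, t], edge_index, edge_weight, H1, H2)
        outputs.append(out)
    return torch.stack(outputs, dim=0).unsqueeze(1)


class ToyStepModel(nn.Module):
    """Stand-in with DCRNN's step signature; ignores the graph (shape testing only)."""

    def __init__(self, in_channels: int, hidden: int):
        super().__init__()
        self.cell1 = nn.GRUCell(in_channels, hidden)
        self.cell2 = nn.GRUCell(hidden, hidden)
        self.linear = nn.Linear(hidden, 1)

    def forward(self, x, edge_index, edge_weight=None, H1=None, H2=None):
        H1 = self.cell1(x, H1)  # nodes act as the GRU batch dimension
        H2 = self.cell2(H1, H2)
        return self.linear(H2), H1, H2
```

Looping over the batch keeps each sample's hidden state separate. A faster variant could batch the graph block-diagonally, as `STGCNModel` does, at the cost of more memory.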


## Configuration

This cell contains:
- BASE_MODEL_CONFIGS: Model architecture parameters and training hyperparameters
- EXPERIMENT_CONFIG: Experiment settings (horizons, splits, scaling)

**v3 Change:** DCRNN batch_size reduced from 32 to 16 for memory efficiency
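Each entry in `BASE_MODEL_CONFIGS` follows the same schema (`class`, `params`, `training`, `description`). The node count 1811 is repeated under two spellings: `num_nodes` for the local wrappers and `n_nodes` for the TSL models. A hypothetical validator, sketched below, reports schema gaps, unknown class names and disagreeing node counts before any model is built.

```python
from typing import Any, Dict, List

REQUIRED_KEYS = ("class", "params", "training", "description")
REQUIRED_TRAINING = ("learning_rate", "epochs", "batch_size")


def validate_model_configs(configs: Dict[str, Dict[str, Any]],
                           registry: Dict[str, type]) -> List[str]:
    """Return a list of problems in a BASE_MODEL_CONFIGS-style dict (empty if none)."""
    problems: List[str] = []
    node_counts = set()
    for name, cfg in configs.items():
        missing = [k for k in REQUIRED_KEYS if k not in cfg]
        if missing:
            problems.append(f"{name}: missing keys {missing}")
            continue
        if cfg["class"] not in registry:
            problems.append(f"{name}: class '{cfg['class']}' not in registry")
        missing_train = [k for k in REQUIRED_TRAINING if k not in cfg["training"]]
        if missing_train:
            problems.append(f"{name}: missing training keys {missing_train}")
        # Node count appears as 'num_nodes' (local wrappers) or 'n_nodes' (TSL models)
        for key in ("num_nodes", "n_nodes"):
            if key in cfg["params"]:
                node_counts.add(cfg["params"][key])
    if len(node_counts) > 1:
        problems.append(f"inconsistent node counts: {sorted(node_counts)}")
    return problems
```

After the configuration cell, it could be called as `assert not validate_model_configs(BASE_MODEL_CONFIGS, MODEL_CLASS_REGISTRY)`.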

In [None]:
# configs.py style cell
from typing import Dict, Any, List

# =============================================================================
# Feature Columns Definition
# =============================================================================

FEATURE_COLS: List[str] = [
    'impressions_sum', 'cpc_week',
    # 'avg_sim_top25_this_week', 'avg_sim_top25_last_week',
    # 'n_sim_this_week', 'n_sim_last_week',
    'adclicks_sum', 'adcost_sum',
    'n_dev_desktop', 'n_dev_mobile', 'n_dev_tablet',
    'n_st_branded_search', 'n_st_generic_search',
]

TARGET_COL: str = 'cpc_week'

# =============================================================================
# File Paths for Google Colab
# =============================================================================

GRAPH_FOLDER_PATH: str = "/content/drive/MyDrive/Colab Notebooks/master_thesis_gdrive/sebs_keyword_graph_knn"
TIME_SERIES_CSV_PATH: str = "/content/drive/MyDrive/Colab Notebooks/master_thesis_gdrive/sebs_weekly_aggregated_by_week_keyword.parquet"

# =============================================================================
# Base Model Configurations
# Class names are strings - resolved via MODEL_CLASS_REGISTRY
# =============================================================================

# Input size is derived from the active feature list, so commenting features in or out stays consistent
NUM_FEATURES = len(FEATURE_COLS)  # currently 9

BASE_MODEL_CONFIGS: Dict[str, Dict[str, Any]] = {
    'DCRNN': {
        'class': 'DCRNNSpatioTemporalModel',
        'params': {
            'in_channels': NUM_FEATURES,  # Changed from 13
            'hidden_channels': 128,
            'out_channels': 1,
            'k_hops': 2,
        },
        'training': {
            'learning_rate': 1e-3,
            'epochs': 100,
            'batch_size': 16,
        },
        'description': 'Diffusion Convolutional Recurrent Neural Network',
    },

    'STGCN': {
        'class': 'STGCNModel',
        'params': {
            'num_features': NUM_FEATURES,  # Changed from 13
            'hidden_channels': 64,
            'K': 3,
        },
        'training': {
            'learning_rate': 1e-3,
            'epochs': 100,
            'batch_size': 32,
        },
        'description': 'Spatio-Temporal Graph Convolutional Network',
    },

    'STConv': {
        'class': 'STConvWrapper',
        'params': {
            'num_nodes': 1811,
            'num_features': NUM_FEATURES,  # Changed from 13
            'hidden_channels': 64,
            'kernel_size': 3,
            'K': 3,
        },
        'training': {
            'learning_rate': 1e-3,
            'epochs': 100,
            'batch_size': 32,
        },
        'description': 'Spatio-Temporal Convolution (stacked STConv blocks)',
    },

    'A3TGCN': {
        'class': 'A3TGCNWrapper',
        'params': {
            'num_features': NUM_FEATURES,  # Changed from 13
            'hidden_size': 64,
            'periods': 12,
        },
        'training': {
            'learning_rate': 1e-3,
            'epochs': 100,
            'batch_size': 32,
        },
        'description': 'Attention Temporal Graph Convolutional Network',
    },

    'GConvLSTM': {
        'class': 'GConvLSTMWrapper',
        'params': {
            'num_features': NUM_FEATURES,  # Changed from 13
            'hidden_size': 64,
            'K': 2,
        },
        'training': {
            'learning_rate': 1e-3,
            'epochs': 100,
            'batch_size': 32,
        },
        'description': 'Graph Convolutional LSTM',
    },

    'MTGNN': {
        'class': 'MTGNNWrapper',
        'params': {
            'num_nodes': 1811,
            'num_features': NUM_FEATURES,  # Changed from 13
            'seq_length': 12,
            'horizon': 1,
            'hidden_size': 32,
            'dropout': 0.3,
        },
        'training': {
            'learning_rate': 1e-3,
            'epochs': 100,
            'batch_size': 16,
        },
        'description': 'Multivariate Time Series GNN',
    },

    'AGCRN': {
        'class': 'AGCRNModel',
        'params': {
            'input_size': NUM_FEATURES,  # Changed from 13
            'output_size': 1,
            'n_nodes': 1811,
            'horizon': 1,
            'hidden_size': 64,
            'n_layers': 2,
        },
        'training': {
            'learning_rate': 1e-3,
            'epochs': 100,
            'batch_size': 16,
        },
        'description': 'Adaptive Graph Convolutional Recurrent Network',
    },

    'GraphWaveNet': {
        'class': 'GraphWaveNetModel',
        'params': {
            'input_size': NUM_FEATURES,  # Changed from 13
            'output_size': 1,
            'n_nodes': 1811,
            'horizon': 1,
            'hidden_size': 64,
            'dropout': 0.3,
        },
        'training': {
            'learning_rate': 1e-3,
            'epochs': 100,
            'batch_size': 16,
        },
        'description': 'Graph WaveNet with Dilated Convolutions',
    },
}


# =============================================================================
# Experiment Configuration
# =============================================================================

EXPERIMENT_CONFIG: Dict[str, Any] = {
    'sequence_length': 12,              # L: number of past weeks to use
    'horizons': [1, 6, 12],             # H: forecast horizons to evaluate
    'test_weeks_last': 12,              # Number of weeks to hold out for testing
    'val_split_ratio': 0.25,            # INCREASED from 0.2 to 0.25 to fix H=12 skipping
    'metrics': ['MSE', 'MAE', 'RMSE', 'SMAPE'],
    'scaling': {
        'type': 'standard',
        'per_feature': True,
        'per_node': False,
    },
    'training_defaults': {
        'early_stopping_patience': 10,
        'gradient_clip_norm': 5.0,
    },
    'csv_export': {
        'enabled': True,
        'experiment_name': 'v6',
        'model_prefix': 'gnn',
        'exog_mode': 'graph',
        'drive_path': '/content/drive/MyDrive/Colab Notebooks/master_thesis_gdrive/benchmarks',
        'local_path': '/content',
    },
    'tensor_export': {
        'enabled': True,
        'experiment_name': 'v6',
        'save_predictions': True,
        'save_targets': True,
        'save_inputs': True,
        'save_errors': True,
        'drive_path': '/content/drive/MyDrive/Colab Notebooks/master_thesis_gdrive/benchmarks/tensors_v6',
        'local_path': '/content/tensors_v6',
    },
}


print("Configuration loaded successfully!")
print(f"\nFeature columns ({NUM_FEATURES}): {FEATURE_COLS}")
print(f"Target column: {TARGET_COL}")
print("\nExperiment settings:")
print(f"  - Sequence length: {EXPERIMENT_CONFIG['sequence_length']}")
print(f"  - Horizons: {EXPERIMENT_CONFIG['horizons']}")
print(f"  - Test weeks: {EXPERIMENT_CONFIG['test_weeks_last']}")
print(f"\nModels configured: {list(BASE_MODEL_CONFIGS.keys())}")

Configuration loaded successfully!

Feature columns (9): ['impressions_sum', 'cpc_week', 'adclicks_sum', 'adcost_sum', 'n_dev_desktop', 'n_dev_mobile', 'n_dev_tablet', 'n_st_branded_search', 'n_st_generic_search']
Target column: cpc_week

Experiment settings:
  - Sequence length: 12
  - Horizons: [1, 6, 12]
  - Test weeks: 12

Models configured: ['DCRNN', 'STGCN', 'STConv', 'A3TGCN', 'GConvLSTM', 'MTGNN', 'AGCRN', 'GraphWaveNet']
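The `val_split_ratio` comment refers to horizons being skipped when a split is too short. The loop bounds in `build_optionA_samples()` give a split of T weeks exactly max(0, T - L - H + 1) windows, so at least L + H weeks are needed for a single sample. The sketch below assumes, hypothetically, that validation windows are built only from the last `val_split_ratio` fraction of the train/val weeks; `val_weeks()` and the 96-week total are illustrative and not taken from the notebook's split code or data.

```python
def num_windows(num_weeks: int, seq_len: int, horizon: int) -> int:
    """Samples build_optionA_samples() yields for a split of num_weeks weeks."""
    # Its loop runs t over range(seq_len - 1, num_weeks - horizon)
    return max(0, num_weeks - seq_len - horizon + 1)


def val_weeks(trainval_weeks: int, ratio: float) -> int:
    """HYPOTHETICAL split rule: last `ratio` fraction of train/val weeks, truncated."""
    return int(trainval_weeks * ratio)


SEQ_LEN = 12
TRAINVAL_WEEKS = 96  # illustrative value, not the notebook's actual week count

for ratio in (0.20, 0.25):
    n_val = val_weeks(TRAINVAL_WEEKS, ratio)
    counts = {h: num_windows(n_val, SEQ_LEN, h) for h in (1, 6, 12)}
    print(f"ratio={ratio:.2f}: {n_val} val weeks -> samples per horizon {counts}")
```

Under these assumptions a 0.2 ratio leaves 19 validation weeks, short of the 24 that L = H = 12 requires, while 0.25 leaves exactly 24 and therefore one H=12 sample.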


## Data Utilities

This cell contains data loading and preprocessing functions:
- load_graph_and_data(): Load graph structure and time series data
- build_week_splits(): Split weeks into train/val weeks and test weeks
- build_feature_tensor(): Create [T, N, F] feature tensors
- scale_features(): Apply log1p to the target, then StandardScaler (fit on train only)
- build_optionA_samples(): Create sliding-window samples for Option A (direct H-step prediction)
- to_torch_dataset(): Convert numpy arrays to torch tensors
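`load_graph_and_data()` turns 'WW-YYYY' labels into a `YYYY * 100 + WW` integer and builds the sin/cos seasonality features from the week number alone. The standard-library sketch below illustrates both choices on three labels; the helper names are hypothetical, and the 52-week period mirrors the notebook.

```python
import math


def parse_week_label(label: str) -> tuple[int, int]:
    """Split a 'WW-YYYY' label into (week_number, year)."""
    week_str, year_str = label.split("-")
    return int(week_str), int(year_str)


def sortable_week(label: str) -> int:
    """Encode 'WW-YYYY' as YYYY * 100 + WW so integer order equals time order."""
    week, year = parse_week_label(label)
    return year * 100 + week


def week_cyclical(week: int, period: float = 52.0) -> tuple[float, float]:
    """Map a week number onto the unit circle (sin, cos) for seasonality."""
    angle = 2 * math.pi * week / period
    return math.sin(angle), math.cos(angle)


labels = ["02-2021", "52-2021", "01-2022"]
print(sorted(labels))                     # lexicographic: not chronological
print(sorted(labels, key=sortable_week))  # chronological
print(sortable_week("02-2021"))           # 202102
```

Plain string sorting puts '01-2022' first because it compares the week digits before the year. Computing seasonality from the YYYYWW value instead of the week number would also go wrong: the value jumps by 49 between week 52 of one year and week 1 of the next, so the phase would shift at every year boundary.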

In [None]:
# data_utils.py style cell
import pandas as pd
import numpy as np
import torch
import json
import os
from typing import Dict, List, Tuple, Any, Optional
from sklearn.preprocessing import StandardScaler


def load_graph_and_data(graph_folder: str,
                        time_series_path: str) -> Tuple[np.ndarray, np.ndarray, Dict[str, int], pd.DataFrame]:
    """
    Load graph structure and time series data from files.
    Handles 'WW-YYYY' date formats (e.g., '02-2021').

    Args:
        graph_folder: Path to folder containing edge_index.npy, edge_weight.npy, keyword_map.json
        time_series_path: Path to parquet file with time series data

    Returns:
        Tuple of (edge_index, edge_weight, keyword_map, dataframe)
    """
    # Load graph structure
    edge_index = np.load(os.path.join(graph_folder, 'edge_index.npy'))
    edge_weight = np.load(os.path.join(graph_folder, 'edge_weight.npy'))

    with open(os.path.join(graph_folder, 'keyword_map.json'), 'r') as f:
        keyword_map = json.load(f)

    # Load time series data
    df = pd.read_parquet(time_series_path)

    print(f"Graph loaded: {edge_index.shape[1]} edges, {len(keyword_map)} nodes")

    # --- FIX START: Handle "02-2021" (Week-Year) format ---
    # We check if the column is string and contains a dash
    if df['week'].dtype == object and df['week'].astype(str).str.contains('-').any():
        print("  Detected 'WW-YYYY' string format (e.g., '02-2021'). Parsing...")
        try:
            # Split "02-2021" into "02" and "2021"
            # Assuming format is "Week-Year" based on user input
            parts = df['week'].astype(str).str.split('-', expand=True)

            # Convert to numbers
            week_nums = pd.to_numeric(parts[0], errors='coerce')
            years = pd.to_numeric(parts[1], errors='coerce')

            # 1. Create a sortable integer for the 'week' column: Year * 100 + Week
            #    Example: "02-2021" -> 202102
            #    Example: "01-2022" -> 202201
            #    This ensures correct chronological sorting.
            df['week'] = years * 100 + week_nums

            # 2. Use the isolated week number (1-52) for seasonality features
            #    We use 'week_nums' here, NOT df['week']
            df['week_sin'] = np.sin(2 * np.pi * week_nums / 52.0)
            df['week_cos'] = np.cos(2 * np.pi * week_nums / 52.0)

            print("  Successfully parsed 'WW-YYYY' format.")

        except Exception as e:
            print(f"  Error parsing 'WW-YYYY' format: {e}")
            # Fallback to coercion if parsing fails
            df['week'] = pd.to_numeric(df['week'], errors='coerce')
            df['week_sin'] = np.sin(2 * np.pi * df['week'] / 52.0)
            df['week_cos'] = np.cos(2 * np.pi * df['week'] / 52.0)

    else:
        # Fallback for standard numeric columns
        print("  Using standard numeric parsing...")
        df['week'] = pd.to_numeric(df['week'], errors='coerce')
        df['week_sin'] = np.sin(2 * np.pi * df['week'] / 52.0)
        df['week_cos'] = np.cos(2 * np.pi * df['week'] / 52.0)

    # Drop any rows that failed conversion
    df = df.dropna(subset=['week'])
    df['week'] = df['week'].astype(int)
    # --- FIX END ---

    print(f"Time series loaded: {len(df)} rows, {df['week'].nunique()} weeks")

    return edge_index, edge_weight, keyword_map, df


def build_week_splits(df: pd.DataFrame,
                      test_weeks_last: int) -> Tuple[np.ndarray, np.ndarray]:
    """
    Split weeks into train/val weeks and test weeks.

    Args:
        df: DataFrame with 'week' column
        test_weeks_last: Number of weeks to reserve for testing (from the end)

    Returns:
        Tuple of (trainval_weeks, test_weeks) as numpy arrays
    """
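    weeks = np.array(sorted(df['week'].unique()))
    # weeks[-0:] would return every week and leave train/val empty, so validate first
    if not 0 < test_weeks_last < len(weeks):
        raise ValueError(f"test_weeks_last must be in [1, {len(weeks) - 1}], got {test_weeks_last}")
    test_weeks = weeks[-test_weeks_last:]
    trainval_weeks = weeks[:-test_weeks_last]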

    print(f"Week split: {len(trainval_weeks)} train/val weeks, {len(test_weeks)} test weeks")
    print(f"  Train/Val range: {trainval_weeks[0]} to {trainval_weeks[-1]}")
    print(f"  Test range: {test_weeks[0]} to {test_weeks[-1]}")

    return trainval_weeks, test_weeks


def build_feature_tensor(df: pd.DataFrame,
                         keyword_map: Dict[str, int],
                         weeks: np.ndarray,
                         feature_cols: List[str]) -> np.ndarray:
    """
    Build a 3D feature tensor from the dataframe.

    Args:
        df: DataFrame with columns 'week', 'keyword', and feature columns
        keyword_map: Mapping from keyword string to node index
        weeks: Array of weeks to include
        feature_cols: List of feature column names

    Returns:
        numpy array of shape [T, N, F] where T=num_weeks, N=num_nodes, F=num_features.
        Missing (week, keyword) pairs and NaN values are left as 0.
    """
    num_nodes = len(keyword_map)
    num_features = len(feature_cols)
    num_time = len(weeks)

    X = np.zeros((num_time, num_nodes, num_features), dtype=np.float32)

    # Create a mapping from (week, keyword) to row for faster lookup
    df_indexed = df.set_index(['week', 'keyword'])

    for t_idx, week in enumerate(weeks):
        for keyword, node_id in keyword_map.items():
            try:
                row = df_indexed.loc[(week, keyword)]
                for f_idx, col in enumerate(feature_cols):
                    if col in row.index:
                        val = row[col]
                        if pd.notna(val):
                            X[t_idx, node_id, f_idx] = float(val)
            except KeyError:
                # Keyword not present for this week - leave as zeros
                pass

    print(f"Feature tensor built: shape {X.shape}")
    return X


def scale_features(train_X: np.ndarray,
                   val_X: np.ndarray,
                   test_X: np.ndarray,
                   scaling_cfg: Dict[str, Any],
                   target_col_idx: int = -1) -> Tuple[np.ndarray, np.ndarray, np.ndarray, StandardScaler]:
    """
    Apply StandardScaler to features, fitting only on training data.

    Args:
        train_X: Training features [T_train, N, F]
        val_X: Validation features [T_val, N, F]
        test_X: Test features [T_test, N, F]
        scaling_cfg: Scaling configuration dict (not read here; the behavior
            below corresponds to per_feature=True, per_node=False)
        target_col_idx: Column to log1p-transform before scaling; a negative
            value (the default) skips the transform

    Returns:
        Tuple of (train_scaled, val_scaled, test_scaled, scaler)
    """
    # Work on copies so the caller's arrays are never log1p-transformed twice
    train_X, val_X, test_X = train_X.copy(), val_X.copy(), test_X.copy()
    T_tr, N, F = train_X.shape

    # Apply log1p transformation to target column if specified
    if target_col_idx >= 0:
        print(f"Applying log1p transformation to target column {target_col_idx}")
        train_X[..., target_col_idx] = np.log1p(train_X[..., target_col_idx])
        val_X[..., target_col_idx] = np.log1p(val_X[..., target_col_idx])
        test_X[..., target_col_idx] = np.log1p(test_X[..., target_col_idx])
    scaler = StandardScaler()

    # Flatten over time & nodes: (T*N, F) to fit scaler
    train_flat = train_X.reshape(-1, F)
    scaler.fit(train_flat)

    # Transform all splits
    train_scaled = scaler.transform(train_flat).reshape(train_X.shape)
    val_scaled = scaler.transform(val_X.reshape(-1, F)).reshape(val_X.shape)
    test_scaled = scaler.transform(test_X.reshape(-1, F)).reshape(test_X.shape)

    print(f"Features scaled with StandardScaler (fit on {T_tr * N} train samples)")

    return train_scaled, val_scaled, test_scaled, scaler


def build_optionA_samples(X: np.ndarray,
                          weeks: np.ndarray,
                          horizon: int,
                          seq_len: int,
                          target_col_idx: int) -> Tuple[np.ndarray, np.ndarray]:
    """
    Build sliding window samples for Option A (direct H-step prediction).

    For each valid starting position, creates:
    - Input: seq_len consecutive weeks of features
    - Target: CPC value at (last input week + horizon)

    Args:
        X: Feature tensor [T, N, F]
        weeks: Array of week identifiers (same length as T)
        horizon: Number of steps ahead to predict
        seq_len: Length of input sequence
        target_col_idx: Index of target column (CPC) in feature dimension

    Returns:
        Tuple of (X_samples [S, L, N, F], y_samples [S, 1, N, 1])
        where S is number of valid samples, L is seq_len
    """
    T, N, F = X.shape
    X_list, y_list = [], []

    # Valid range: need seq_len weeks for input, plus horizon weeks for target
    # Start from seq_len-1 (0-indexed position of last input week)
    # End at T - horizon - 1 (so t + horizon is still valid)
    for t in range(seq_len - 1, T - horizon):
        # Input: weeks [t - seq_len + 1, ..., t] inclusive
        X_win = X[t - seq_len + 1:t + 1]  # [L, N, F]

        # Target: CPC at week t + horizon
        y_win = X[t + horizon, :, target_col_idx]  # [N]

        X_list.append(X_win)
        y_list.append(y_win)

    if len(X_list) == 0:
        print(f"Warning: No valid samples for horizon={horizon}, seq_len={seq_len}, T={T}")
        return np.zeros((0, seq_len, N, F), dtype=np.float32), np.zeros((0, 1, N, 1), dtype=np.float32)

    X_arr = np.stack(X_list, axis=0)  # [S, L, N, F]
    y_arr = np.stack(y_list, axis=0)  # [S, N]

    # Reshape y to [S, 1, N, 1]
    y_arr = y_arr[:, np.newaxis, :, np.newaxis]  # [S, 1, N, 1]

    print(f"Built {len(X_list)} samples for horizon={horizon}: X={X_arr.shape}, y={y_arr.shape}")

    return X_arr, y_arr


def to_torch_dataset(X_np: np.ndarray,
                     y_np: np.ndarray,
                     device: str = 'cpu') -> Tuple[torch.Tensor, torch.Tensor]:
    """
    Convert numpy arrays to torch tensors on specified device.

    Args:
        X_np: Input features [S, L, N, F]
        y_np: Targets [S, 1, N, 1]
        device: Device to place tensors on

    Returns:
        Tuple of (X_tensor, y_tensor)
    """
    # Handle NaN values
    X_np = np.nan_to_num(X_np, nan=0.0)
    y_np = np.nan_to_num(y_np, nan=0.0)

    X = torch.from_numpy(X_np).float().to(device)
    y = torch.from_numpy(y_np).float().to(device)

    return X, y


print("Data utilities loaded successfully!")

Data utilities loaded successfully!
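`scale_features()` applies `log1p` to the target column and then standardizes every feature with statistics from the training split only. Because `StandardScaler` works column by column, the target can be inverted on its own as z * std + mean followed by `expm1`, which is what the dummy-matrix inverse in `evaluate_model_optionA()` computes. The NumPy sketch below re-implements the scaler by hand (population standard deviation, as `StandardScaler` uses) on made-up toy data to show the round trip.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, F, TARGET = 8, 5, 3, 1  # toy sizes; column 1 plays the role of cpc_week

train_X = rng.uniform(0.0, 5.0, size=(T, N, F))
test_X = rng.uniform(0.0, 5.0, size=(T, N, F))

# Forward: log1p on the target column (on copies), then per-feature standardization
train_t, test_t = train_X.copy(), test_X.copy()
train_t[..., TARGET] = np.log1p(train_t[..., TARGET])
test_t[..., TARGET] = np.log1p(test_t[..., TARGET])

flat = train_t.reshape(-1, F)  # pool time and nodes: one statistic per feature
mean = flat.mean(axis=0)
std = flat.std(axis=0)         # ddof=0, matching StandardScaler
test_z = (test_t - mean) / std

# Inverse for the target column only: undo standardization, then undo log1p
recovered = np.expm1(test_z[..., TARGET] * std[TARGET] + mean[TARGET])
print(np.allclose(recovered, test_X[..., TARGET]))  # True
```

Fitting only on `train_t` keeps test statistics out of the scaler; the test split is transformed with the training mean and standard deviation.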


## Training Utilities

This cell contains training and evaluation functions:
- forward_dcrnn_optionA(): Helper for DCRNN with proper hidden state management
- ensure_B1N1(): Reshape outputs to standard [B, 1, N, 1] format
- make_dataloader(): Create PyTorch DataLoader from tensors
- train_one_model_optionA(): Full training loop with early stopping
- evaluate_model_optionA(): Evaluate model on test data; returns a (metrics_dict, tensors_dict) tuple
- generate_config_string(): Generate compact config strings for CSV
- save_model_tensors(): Save predictions/targets/inputs/errors
- export_results_to_csv(): Export metrics to CSV benchmark format

**v4 Changes:**
1. DCRNN self-loops to prevent NaN
2. Removed .detach() for proper Backprop-Through-Time
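The self-loop fix targets the degree normalization in DCRNN's diffusion convolution, which row-normalizes the weighted adjacency into a random-walk matrix `D^-1 W`. A node without edges has degree 0, so its row becomes 0/0. The dense NumPy sketch below is a toy 3-node illustration of the idea, not the sparse PyG code path; adding the identity corresponds to `add_self_loops(..., fill_value=1.0)`.

```python
import numpy as np

# Toy weighted adjacency: node 2 has no edges at all (degree 0)
W = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])


def random_walk_matrix(adj: np.ndarray) -> np.ndarray:
    """Row-normalize adj by its out-degree: D^-1 W."""
    deg = adj.sum(axis=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        return adj / deg[:, np.newaxis]


P = random_walk_matrix(W)
print(np.isnan(P).any())        # True: row 2 is 0 / 0

W_loops = W + np.eye(3)         # self-loops with weight 1.0
P_loops = random_walk_matrix(W_loops)
print(np.isnan(P_loops).any())  # False
print(P_loops.sum(axis=1))      # every row sums to 1
```

A single NaN row is enough to poison the whole loss once it propagates through later diffusion steps, which is why the fix is applied to the graph rather than to individual outputs.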

**v5 Changes:**
1. evaluate_model_optionA() now returns (metrics_dict, tensors_dict) tuple
2. Added RMSE and SMAPE calculations
3. Added tensor saving and CSV export functions
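`evaluate_model_optionA()` reports four metrics on the original CPC scale, with SMAPE = 100 * mean(|P - A| / ((|P| + |A|) / 2 + eps)) for predictions P and actuals A. The NumPy sketch below mirrors those formulas on a hand-checkable example; the values are illustrative, and NumPy replaces the torch calls used in the notebook.

```python
import numpy as np


def forecast_metrics(pred: np.ndarray, actual: np.ndarray, eps: float = 1e-8) -> dict:
    """MSE, MAE, RMSE and SMAPE (in percent), following the notebook's formulas."""
    err = pred - actual
    mse = float(np.mean(err ** 2))
    denom = (np.abs(pred) + np.abs(actual)) / 2.0
    return {
        "MSE": mse,
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(mse)),
        "SMAPE": float(100.0 * np.mean(np.abs(err) / (denom + eps))),
    }


pred = np.array([1.0, 2.0, 0.0])
actual = np.array([1.0, 4.0, 0.0])
print(forecast_metrics(pred, actual))
```

Note the last pair: when both P and A are 0, eps keeps the term at 0 instead of NaN, so weeks with zero CPC pull SMAPE down rather than breaking it.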

In [None]:
# train_eval.py style cell

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torch_geometric.utils import add_self_loops
from typing import Dict, Any, Optional, Tuple
import numpy as np
from copy import deepcopy  # (import kept in case you use it elsewhere)
from tqdm.auto import tqdm


def forward_dcrnn_optionA(
    model: nn.Module,
    X_batch: torch.Tensor,
    edge_index: torch.Tensor,
    edge_weight: torch.Tensor,
) -> torch.Tensor:
    """
    Corrected forward pass for DCRNN.

    FIXES in v4:
    1. Adds self-loops to prevent Division-by-Zero (NaN losses)
    2. Removes internal .detach() so Backprop-Through-Time works properly

    Args:
        model: DCRNNSpatioTemporalModel instance
        X_batch: Input tensor [B, L, N, F]
        edge_index: Graph connectivity [2, E]
        edge_weight: Edge weights [E]

    Returns:
        Output tensor [B, 1, N, 1]
    """
    batch_size, seq_len, num_nodes, num_features = X_batch.shape

    # --- FIX 1: Prevent NaN by ensuring no node has Degree=0 ---
    # We add self-loops once before processing the batch.
    # This ensures the diffusion matrix normalization never divides by zero.
    edge_index_loop, edge_weight_loop = add_self_loops(
        edge_index,
        edge_weight,
        fill_value=1.0,
        num_nodes=num_nodes,
    )

    preds = []

    for b in range(batch_size):
        # Initialize hidden states for this specific sample
        # (This effectively "detaches" from the previous sample in the batch)
        H1, H2 = None, None

        # Process sequence
        for t in range(seq_len):
            x_t = X_batch[b, t]  # [N, F]

            # Feed through model
            out_t, H1, H2 = model(
                x_t,
                edge_index_loop,
                edge_weight_loop,
                H1,
                H2,
            )
            # --- FIX 2: REMOVED .detach() ---
            # Keep the connection to the previous hidden states so gradients
            # flow from the last step (t = seq_len - 1) back to t = 0. If you
            # run out of memory, reduce batch_size, not the gradient history.

        preds.append(out_t)

    # Stack batch: [B, N, 1]
    output = torch.stack(preds, dim=0)

    # Reshape to [B, 1, N, 1]
    output = output.unsqueeze(1)

    return output


def ensure_B1N1(output: torch.Tensor, num_nodes: Optional[int] = None) -> torch.Tensor:
    """
    Ensure output tensor has shape [B, 1, N, 1].

    Different models output different shapes. This helper normalizes them:
    - [B, N] -> [B, 1, N, 1]
    - [B, N, 1] -> [B, 1, N, 1]
    - [B, 1, N] -> [B, 1, N, 1]
    - [B, 1, N, 1] -> unchanged
    - any other 4D shape (e.g. [B, N, 1, 1]) -> flattened to [B, 1, -1, 1]

    Args:
        output: Model output tensor (various shapes)
        num_nodes: Expected number of nodes; if given, the result is validated
            (this catches e.g. a multi-horizon [B, H, N, 1] output that would
            otherwise be flattened into H*N "nodes")

    Returns:
        Tensor with shape [B, 1, N, 1]
    """
    if output.dim() == 2:
        # [B, N] -> [B, 1, N, 1]
        result = output.unsqueeze(1).unsqueeze(-1)
    elif output.dim() == 3:
        # [B, N, 1] -> [B, 1, N, 1]; otherwise treat as [B, 1, N] -> [B, 1, N, 1]
        result = output.unsqueeze(1) if output.shape[-1] == 1 else output.unsqueeze(-1)
    elif output.dim() == 4:
        if output.shape[1] == 1 and output.shape[-1] == 1:
            result = output
        else:
            # Fold the middle dimensions into the node axis
            result = output.reshape(output.shape[0], 1, -1, 1)
    else:
        raise ValueError(f"Unexpected output shape: {output.shape}")

    if num_nodes is not None and result.shape[2] != num_nodes:
        raise ValueError(
            f"Expected {num_nodes} nodes, got {tuple(result.shape)} from {tuple(output.shape)}"
        )
    return result


def make_dataloader(
    X: torch.Tensor,
    y: torch.Tensor,
    batch_size: int,
    shuffle: bool = True,
) -> DataLoader:
    """
    Create a PyTorch DataLoader from tensors.

    Args:
        X: Input tensor [S, L, N, F]
        y: Target tensor [S, 1, N, 1]
        batch_size: Batch size
        shuffle: Whether to shuffle data

    Returns:
        DataLoader instance
    """
    dataset = TensorDataset(X, y)
    return DataLoader(dataset, batch_size=batch_size, shuffle=shuffle)



def train_one_model_optionA(
    model_name: str,
    model_configs: Dict[str, Dict[str, Any]],
    experiment_cfg: Dict[str, Any],
    X_train: torch.Tensor,
    y_train: torch.Tensor,
    X_val: torch.Tensor,
    y_val: torch.Tensor,
    edge_index: torch.Tensor,
    edge_weight: torch.Tensor,
    device: str = "cpu",
    verbose: bool = True,
    pbar: Optional[tqdm] = None,  # can be a global progress bar
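) -> Dict[str, Any]:
    """
    Train a single model with early stopping, gradient clipping, a
    ReduceLROnPlateau scheduler, optional global progress-bar updates,
    and per-epoch logging (in-memory logs plus optional console output).

    Returns:
        Dict with the best-validation "model", "best_val_loss", "history"
        (train/val loss lists), "epoch_logs" and "epochs_trained".
    """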
    cfg = get_model_config(model_name, model_configs)
    model = get_model(model_name, model_configs, device=device)

    lr = cfg["training"]["learning_rate"]
    epochs = cfg["training"]["epochs"]
    batch_size = cfg["training"]["batch_size"]

    optimizer = torch.optim.Adam(model.parameters(), lr=lr)

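    # L1Loss (MAE): less sensitive to outliers than MSE and a reasonable
    # training proxy for the SMAPE evaluation metric
    criterion = nn.L1Loss()

    # Halve the LR when validation loss stops improving (patience=5 epochs);
    # `verbose` is deprecated for LR schedulers in recent PyTorch, so it is not passed
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode='min', factor=0.5, patience=5
    )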

    train_loader = make_dataloader(X_train, y_train, batch_size, shuffle=True)
    val_loader = make_dataloader(X_val, y_val, batch_size, shuffle=False)

    best_val_loss = float("inf")
    best_state = None

    # History as simple lists (for quick inspection)
    history = {"train_loss": [], "val_loss": []}

    # Detailed per-epoch logs (nice for CSV / plotting)
    epoch_logs = []  # each entry: {"epoch": int, "train_loss": float, "val_loss": float, "lr": float}

    patience = experiment_cfg["training_defaults"]["early_stopping_patience"]
    max_norm = experiment_cfg["training_defaults"]["gradient_clip_norm"]
    no_improve = 0
    epochs_trained = 0

    no_edge_models = ["AGCRN", "MTGNN"]

    # Print a console line every LOG_EVERY epochs (set to 1 for per-epoch output)
    LOG_EVERY = 5

    for epoch in range(epochs):
        epochs_trained += 1

        # -------------------------
        # Training
        # -------------------------
        model.train()
        train_losses = []

        for Xb, yb in train_loader:
            optimizer.zero_grad()
            if model_name == "DCRNN":
                output = forward_dcrnn_optionA(model, Xb, edge_index, edge_weight)
            elif model_name in no_edge_models:
                out = model(Xb)
                output = ensure_B1N1(out)
            else:
                # Normalize to [B, 1, N, 1] so L1Loss never silently broadcasts
                output = ensure_B1N1(model(Xb, edge_index, edge_weight))

            loss = criterion(output, yb)
            loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
            optimizer.step()
            train_losses.append(loss.item())

        avg_train = float(np.mean(train_losses)) if train_losses else float("inf")

        # -------------------------
        # Validation
        # -------------------------
        model.eval()
        val_losses = []
        with torch.no_grad():
            for Xb, yb in val_loader:
                if model_name == "DCRNN":
                    output = forward_dcrnn_optionA(model, Xb, edge_index, edge_weight)
                elif model_name in no_edge_models:
                    out = model(Xb)
                    output = ensure_B1N1(out)
                else:
                    output = ensure_B1N1(model(Xb, edge_index, edge_weight))

                val_loss = criterion(output, yb).item()
                val_losses.append(val_loss)

        avg_val = float(np.mean(val_losses)) if val_losses else float("inf")

        history["train_loss"].append(avg_train)
        history["val_loss"].append(avg_val)

        # current LR (useful when ReduceLROnPlateau kicks in)
        current_lr = optimizer.param_groups[0]["lr"]
        scheduler.step(avg_val)

        # store a structured log row
        epoch_logs.append({
            "epoch": epoch + 1,
            "train_loss": avg_train,
            "val_loss": avg_val,
            "lr": current_lr,
        })

        # -------------------------
        # Progress bar + console logging
        # -------------------------
        if pbar:
            # only update the global bar, keep per-epoch details via tqdm.write
            pbar.update(1)
            pbar.set_postfix(model=model_name, val_loss=f"{avg_val:.4f}")

        if verbose and ((epoch + 1) % LOG_EVERY == 0):
            msg = (
                f"[{model_name}] epoch {epoch+1:03d}/{epochs:03d} "
                f"| train={avg_train:.4f} | val={avg_val:.4f} | lr={current_lr:.5f}"
            )
            if pbar:
                tqdm.write(msg)   # does not break tqdm layout
            else:
                print(msg)

        # -------------------------
        # Early stopping
        # -------------------------
        if avg_val < best_val_loss:
            best_val_loss = avg_val
            best_state = {k: v.cpu().clone() for k, v in model.state_dict().items()}
            no_improve = 0
        else:
            no_improve += 1
            if no_improve >= patience:
                if verbose:
                    stop_msg = f"  Early stopping at epoch {epoch + 1} (best val={best_val_loss:.4f})"
                    if pbar:
                        tqdm.write(stop_msg)
                    else:
                        print(stop_msg)
                break

    # If early stopping happened, update global pbar for skipped epochs
    if pbar and epochs_trained < epochs:
        pbar.update(epochs - epochs_trained)

    # Restore best state
    if best_state is not None:
        model.load_state_dict(best_state)
        model = model.to(device)

    return {
        "model": model,
        "best_val_loss": best_val_loss,
        "history": history,        # simple lists
        "epoch_logs": epoch_logs,  # rich per-epoch records
        "epochs_trained": epochs_trained,
    }


def evaluate_model_optionA(
    model: nn.Module,
    model_name: str,
    X_test: torch.Tensor,
    y_test: torch.Tensor,
    edge_index: torch.Tensor,
    edge_weight: torch.Tensor,
    scaler: Optional[Any] = None,      # Fitted StandardScaler, required for unscaling
    target_col_idx: int = 1,           # Target column index (cpc_week is column 1)
    batch_size: int = 32,
    return_tensors: bool = True,
) -> Tuple[Dict[str, float], Optional[Dict[str, torch.Tensor]]]:
    """
    Evaluate a trained model on test data. When a scaler is given, metrics are
    computed on real (unscaled) values; otherwise on the model's scaled outputs.

    Args:
        model: Trained PyTorch model
        model_name: Name of the model (for forward pass logic)
        X_test: Test inputs [S, L, N, F]
        y_test: Test targets [S, 1, N, 1]
        edge_index: Graph connectivity
        edge_weight: Edge weights
        scaler: The fitted StandardScaler (used to inverse transform predictions)
        target_col_idx: Index of the target column in the feature set
        batch_size: Batch size for inference
        return_tensors: Whether to return raw tensors (predictions, errors, etc.)

    Returns:
        Tuple of (metrics_dict, tensors_dict); tensors_dict is None when
        return_tensors is False
    """
    model.eval()

    # Models that don't use edge_index in forward pass
    no_edge_models = ["AGCRN", "MTGNN"]

    test_loader = make_dataloader(X_test, y_test, batch_size, shuffle=False)

    all_preds = []
    all_targets = []

    with torch.no_grad():
        for Xb, yb in test_loader:
            if model_name == "DCRNN":
                output = forward_dcrnn_optionA(model, Xb, edge_index, edge_weight)
            elif model_name in no_edge_models:
                output = model(Xb)
                output = ensure_B1N1(output)
            else:
                output = ensure_B1N1(model(Xb, edge_index, edge_weight))

            all_preds.append(output)
            all_targets.append(yb)

    # Concatenate all batches -> [Total_Samples, 1, Nodes, 1]
    preds = torch.cat(all_preds, dim=0)
    targets = torch.cat(all_targets, dim=0)

    # -------------------------------------------------------------------------
    # METRIC CALCULATION ON REAL VALUES (INVERSE SCALING)
    # -------------------------------------------------------------------------

    # Flatten to [Total_Samples * Nodes, 1] for scaler
    preds_flat = preds.cpu().numpy().reshape(-1, 1)
    targets_flat = targets.cpu().numpy().reshape(-1, 1)

    real_preds = preds_flat
    real_targets = targets_flat

    # Only inverse scale if scaler is provided
    if scaler is not None:
        num_features = scaler.mean_.shape[0]

        # Create dummy matrices [N_samples, N_features] to satisfy scaler input shape
        dummy_preds = np.zeros((len(preds_flat), num_features))
        dummy_targets = np.zeros((len(targets_flat), num_features))

        # Place our predictions/targets into the correct feature column
        dummy_preds[:, target_col_idx] = preds_flat[:, 0]
        dummy_targets[:, target_col_idx] = targets_flat[:, 0]

        # Inverse transform (StandardScaler -> Original Scale)
        inv_preds = scaler.inverse_transform(dummy_preds)
        inv_targets = scaler.inverse_transform(dummy_targets)

        # Extract the target column back
        real_preds = inv_preds[:, target_col_idx]
        real_targets = inv_targets[:, target_col_idx]

        # Undo log1p. This assumes scale_features() was called with
        # target_col_idx >= 0 (the log line "Applying log1p transformation to
        # target column 1" confirms it); without that transform, remove these lines.
        real_preds = np.expm1(real_preds)
        real_targets = np.expm1(real_targets)

        # Safety: clip negative predictions to 0 (CPC cannot be negative)
        real_preds = np.maximum(real_preds, 0.0)

    # Convert back to tensor for easy metric calculation
    real_preds_t = torch.from_numpy(real_preds)
    real_targets_t = torch.from_numpy(real_targets)

    # Compute Metrics
    mse = torch.mean((real_preds_t - real_targets_t) ** 2).item()
    mae = torch.mean(torch.abs(real_preds_t - real_targets_t)).item()
    rmse = torch.sqrt(torch.mean((real_preds_t - real_targets_t) ** 2)).item()

    # SMAPE (Symmetric Mean Absolute Percentage Error)
    # Formula: 100 * mean( |P - A| / ((|P| + |A|) / 2) )
    numerator = torch.abs(real_preds_t - real_targets_t)
    denominator = (torch.abs(real_preds_t) + torch.abs(real_targets_t)) / 2.0
    epsilon = 1e-8  # Prevent division by zero
    smape = 100 * torch.mean(numerator / (denominator + epsilon)).item()

    metrics = {
        "MSE": mse,
        "MAE": mae,
        "RMSE": rmse,
        "SMAPE": smape,
    }

    # -------------------------------------------------------------------------
    # RETURN TENSORS (Keep original scaled tensors for debugging internals)
    # -------------------------------------------------------------------------
    tensors = None
    if return_tensors:
        errors = preds - targets
        tensors = {
            "predictions": preds,   # Scaled
            "targets": targets,     # Scaled
            "inputs": X_test,       # Scaled
            "errors": errors,       # Scaled
            "real_predictions": real_preds_t, # Unscaled (Optional helper)
            "real_targets": real_targets_t,   # Unscaled (Optional helper)
        }

    return metrics, tensors


def generate_config_string(model_config: Dict[str, Any]) -> str:
    """
    Generate compact configuration string from model config.

    Format: h{hidden}_lr{lr}_bs{batch_size}_e{epochs}
    Example: "h64_lr0.001_bs32_e100"

    Args:
        model_config: Model configuration dictionary

    Returns:
        Compact config string
    """
    params = model_config.get("params", {})
    training = model_config.get("training", {})

    # Extract key parameters (with fallbacks)
    hidden = (
        params.get("hidden_channels")
        or params.get("hidden_size")
        or params.get("conv_channels", 64)
    )
    lr = training.get("learning_rate", 0.001)
    batch_size = training.get("batch_size", 32)
    epochs = training.get("epochs", 100)

    return f"h{hidden}_lr{lr}_bs{batch_size}_e{epochs}"


def save_model_tensors(
    tensors: Dict[str, torch.Tensor],
    model_name: str,
    horizon: int,
    experiment_cfg: Dict[str, Any],
    verbose: bool = True,
) -> None:
    """
    Save model evaluation tensors to disk for error analysis.

    Saves up to 4 tensors: predictions, targets, inputs, errors
    Format: PyTorch .pt files
    Locations: local runtime (always attempted) and Google Drive (best-effort;
    a Drive failure only prints a warning)

    Args:
        tensors: Dict with 'predictions', 'targets', 'inputs', 'errors'
        model_name: Name of the model (e.g., 'STGCN')
        horizon: Forecast horizon (1, 6, or 12)
        experiment_cfg: Experiment configuration with tensor_export settings
        verbose: Print status messages
    """
    import os

    if not experiment_cfg.get("tensor_export", {}).get("enabled", False):
        return

    tensor_cfg = experiment_cfg["tensor_export"]

    # Define which tensors to save
    tensors_to_save = {
        "predictions": tensor_cfg.get("save_predictions", True),
        "targets": tensor_cfg.get("save_targets", True),
        "inputs": tensor_cfg.get("save_inputs", True),
        "errors": tensor_cfg.get("save_errors", True),
    }

    saved_count = 0
    failed_count = 0

    for tensor_type, should_save in tensors_to_save.items():
        if not should_save or tensor_type not in tensors:
            continue

        tensor = tensors[tensor_type]
        filename = f"{model_name}_h{horizon}_{tensor_type}.pt"

        # Save to local runtime (primary copy; failures are counted)
        local_path = os.path.join(tensor_cfg["local_path"], filename)
        try:
            os.makedirs(tensor_cfg["local_path"], exist_ok=True)
            torch.save(tensor.cpu(), local_path)
            saved_count += 1
        except Exception as e:
            if verbose:
                print(f"    ✗ Failed to save {tensor_type} to local: {e}")
            failed_count += 1

        # Save to Google Drive (best-effort)
        drive_path = os.path.join(tensor_cfg["drive_path"], filename)
        try:
            os.makedirs(tensor_cfg["drive_path"], exist_ok=True)
            torch.save(tensor.cpu(), drive_path)
        except Exception as e:
            if verbose:
                print(f"    ⚠ Could not save {tensor_type} to Drive: {e}")

    if verbose and saved_count > 0:
        print(f"    ✓ Saved {saved_count} tensors ({model_name}_h{horizon}_*.pt)")
    if verbose and failed_count > 0:
        print(f"    ✗ {failed_count} tensor(s) failed to save locally")


def export_results_to_csv(
    results: Dict[int, Dict[str, Dict[str, Any]]],
    model_configs: Dict[str, Dict[str, Any]],
    experiment_cfg: Dict[str, Any],
    verbose: bool = True,
) -> None:
    """
    Export experiment results to CSV in the benchmark format.

    Saves to two locations:
    1. Google Drive (primary): {drive_path}/metrics_gnn_models_{experiment_name}.csv
    2. Local runtime (backup): {local_path}/metrics_gnn_models_{experiment_name}.csv

    Args:
        results: Nested dict {horizon: {model_name: {metrics...}}}
        model_configs: BASE_MODEL_CONFIGS dict for generating config strings
        experiment_cfg: Experiment configuration with csv_export settings
        verbose: Print status messages
    """
    import pandas as pd
    import os

    if not experiment_cfg.get("csv_export", {}).get("enabled", False):
        if verbose:
            print("CSV export disabled in config")
        return

    csv_cfg = experiment_cfg["csv_export"]
    exp_name = csv_cfg["experiment_name"]
    model_prefix = csv_cfg["model_prefix"]
    exog_mode = csv_cfg["exog_mode"]

    # Build list of rows
    rows = []

    for horizon in sorted(results.keys()):
        for model_name in sorted(results[horizon].keys()):
            r = results[horizon][model_name]
            if r["status"] == "SUCCESS":
                # Generate config string from model config
                model_config = model_configs.get(model_name, {})
                config_str = generate_config_string(model_config)

                rows.append(
                    {
                        "horizon": horizon,
                        "model_id": f"{model_prefix} | {model_name}",
                        "exog_mode": exog_mode,
                        "config": config_str,
                        "mse": r["test_MSE"],
                        "mae": r["test_MAE"],
                        "rmse": r["test_RMSE"],
                        "smape": r["test_SMAPE"],
                    }
                )

    if not rows:
        if verbose:
            print("No successful results to export")
        return

    # Create DataFrame
    df = pd.DataFrame(rows)

    # Column order to match benchmark CSV (with config column added)
    df = df[
        [
            "horizon",
            "model_id",
            "exog_mode",
            "config",
            "mse",
            "mae",
            "rmse",
            "smape",
        ]
    ]

    filename = f"metrics_gnn_models_{exp_name}.csv"

    # Save to local runtime (create the folder first; this is the reliable copy)
    local_path = os.path.join(csv_cfg["local_path"], filename)
    try:
        os.makedirs(csv_cfg["local_path"], exist_ok=True)
        df.to_csv(local_path, index=False)
        if verbose:
            print(f"\n✓ Results saved to local runtime: {local_path}")
            print(f"  Exported {len(rows)} result rows")
    except Exception as e:
        if verbose:
            print(f"\n✗ Failed to save to local runtime: {e}")

    # Save to Google Drive (may fail if not mounted)
    drive_path = os.path.join(csv_cfg["drive_path"], filename)
    try:
        os.makedirs(csv_cfg["drive_path"], exist_ok=True)
        df.to_csv(drive_path, index=False)
        if verbose:
            print(f"✓ Results saved to Google Drive: {drive_path}")
    except Exception as e:
        if verbose:
            print(f"\n⚠ Could not save to Google Drive: {e}")
            print("  (Results are still available in local runtime)")

    if verbose:
        print("\nCSV Preview:")
        print(df.head(10).to_string(index=False))


print("Training utilities loaded successfully!")

Training utilities loaded successfully!
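The dual-save behaviour of `save_model_tensors` and `export_results_to_csv` (write to the local runtime, then mirror to Drive on a best-effort basis) can be sketched with the standard library alone. This is a minimal illustration, not the notebook's code: `save_dual` and `write_rows_csv` are hypothetical helpers, the folders are placeholders, and the `csv` module stands in for pandas. It keeps the benchmark column order and shows that a failure at one destination does not block the other.

```python
import csv
import os
from typing import Any, Dict, List

# Column order used by export_results_to_csv in this notebook.
CSV_COLUMNS = ["horizon", "model_id", "exog_mode", "config",
               "mse", "mae", "rmse", "smape"]


def write_rows_csv(path: str, rows: List[Dict[str, Any]]) -> None:
    """Write rows to `path`, creating the parent folder if needed (hypothetical helper)."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=CSV_COLUMNS)
        writer.writeheader()
        writer.writerows(rows)


def save_dual(filename: str, rows: List[Dict[str, Any]],
              local_dir: str, drive_dir: str) -> Dict[str, bool]:
    """Save locally first, then mirror to Drive; report which writes succeeded.

    Each destination has its own try/except, so an unmounted Drive
    (or any other OSError) never prevents the local copy, and vice versa.
    """
    status: Dict[str, bool] = {}
    for name, folder in (("local", local_dir), ("drive", drive_dir)):
        try:
            write_rows_csv(os.path.join(folder, filename), rows)
            status[name] = True
        except OSError:
            status[name] = False
    return status
```

Catching only `OSError` (rather than a bare `Exception`, as the notebook does) is a narrower choice made for this sketch; it covers missing mounts and permission errors while letting genuine bugs surface.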


# Run Experiment

Main experiment loop:
1. Load data and build week splits
2. Build feature tensors and apply scaling
3. For each horizon H in [1, 6, 12]:
   - Build sliding window samples
   - For each model:
     - Train with early stopping
     - Evaluate on test set (returns metrics + tensors)
     - Save tensors for error analysis
     - Store metrics
4. Export results to CSV
5. Print results table
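"Train with early stopping" in step 3 usually means a patience rule on the validation loss. The helper that implements it (`train_one_model_optionA`) is not shown in this section, so the function below only sketches a common rule, replayed over a list of per-epoch validation losses. `early_stopping_epoch` and `min_delta` are hypothetical names, and the notebook's exact rule (for example, whether it restores the best weights) may differ; the tuning code later in the notebook uses `patience = 5`.

```python
import math
from typing import List, Optional, Tuple


def early_stopping_epoch(val_losses: List[float], patience: int,
                         min_delta: float = 0.0) -> Tuple[int, float, Optional[int]]:
    """Replay a validation-loss curve under patience-based early stopping.

    Returns (best_epoch, best_loss, stop_epoch) with 1-indexed epochs;
    stop_epoch is None if training would have run to the last epoch.
    """
    best_loss = math.inf
    best_epoch = 0
    no_improve = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            # Improvement: remember it and reset the patience counter
            best_loss, best_epoch, no_improve = loss, epoch, 0
        else:
            no_improve += 1
            if no_improve >= patience:
                return best_epoch, best_loss, epoch
    return best_epoch, best_loss, None
```

For example, with losses `[1.0, 0.8, 0.7, 0.75, 0.72, 0.71]` and `patience=3`, the best epoch is 3 and training stops at epoch 6.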

In [None]:
# set up and load data

import warnings
import gc

warnings.filterwarnings("ignore")

print("=" * 80)
print("SPATIO-TEMPORAL GNN EXPERIMENT v6 - OPTION A (DIRECT H-STEP PREDICTION)")
print("=" * 80)

# Device selection
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"\nUsing device: {device}")
if device == "cuda":
    print(f"GPU: {torch.cuda.get_device_name(0)}")

# =============================================================================
# Step 1: Load Data
# =============================================================================
print("\n" + "=" * 80)
print("STEP 1: Loading Data")
print("=" * 80)

edge_index, edge_weight, keyword_map, df = load_graph_and_data(
    GRAPH_FOLDER_PATH,
    TIME_SERIES_CSV_PATH,
)

# Convert to tensors
edge_index_tensor = torch.from_numpy(edge_index).long().to(device)
edge_weight_tensor = torch.from_numpy(edge_weight).float().to(device)

num_nodes = len(keyword_map)
num_features = len(FEATURE_COLS)

print(f"\nGraph: {num_nodes} nodes, {edge_index.shape[1]} edges")
print(f"Features: {num_features}")

# =============================================================================
# Step 2: Build Week Splits
# =============================================================================
print("\n" + "=" * 80)
print("STEP 2: Building Week Splits")
print("=" * 80)

trainval_weeks, test_weeks = build_week_splits(
    df,
    EXPERIMENT_CONFIG["test_weeks_last"],
)

# Split trainval into train/val
val_ratio = EXPERIMENT_CONFIG["val_split_ratio"]
split_idx = int(len(trainval_weeks) * (1 - val_ratio))

train_weeks = trainval_weeks[:split_idx]
val_weeks = trainval_weeks[split_idx:]

print("\nFinal splits:")
print(f"  Train: {len(train_weeks)} weeks ({train_weeks[0]} to {train_weeks[-1]})")
print(f"  Val: {len(val_weeks)} weeks ({val_weeks[0]} to {val_weeks[-1]})")
print(f"  Test: {len(test_weeks)} weeks ({test_weeks[0]} to {test_weeks[-1]})")

# =============================================================================
# Step 3: Build Feature Tensors
# =============================================================================
print("\n" + "=" * 80)
print("STEP 3: Building Feature Tensors")
print("=" * 80)

# Filter to available features
available_features = [col for col in FEATURE_COLS if col in df.columns]
print(
    f"Using {len(available_features)}/{len(FEATURE_COLS)} features: "
    f"{available_features}"
)

X_train_raw = build_feature_tensor(df, keyword_map, train_weeks, available_features)
X_val_raw = build_feature_tensor(df, keyword_map, val_weeks, available_features)

# The test tensor needs seq_len weeks of history before the first test week,
# so prepend the last seq_len weeks of the train/val period to the test weeks.
seq_len = EXPERIMENT_CONFIG["sequence_length"]
max_horizon = max(EXPERIMENT_CONFIG["horizons"])

# Need seq_len weeks before test_weeks start for the sliding window
all_weeks = np.concatenate([train_weeks, val_weeks, test_weeks])
test_start_idx = len(train_weeks) + len(val_weeks)
test_window_start = max(0, test_start_idx - seq_len)
test_combined_weeks = all_weeks[test_window_start:]

X_test_raw = build_feature_tensor(
    df,
    keyword_map,
    test_combined_weeks,
    available_features,
)

# =============================================================================
# Step 4: Scale Features
# =============================================================================
print("\n" + "=" * 80)
print("STEP 4: Scaling Features")
print("=" * 80)

target_col_idx = available_features.index(TARGET_COL)
X_train_scaled, X_val_scaled, X_test_scaled, scaler = scale_features(
    X_train_raw,
    X_val_raw,
    X_test_raw,
    EXPERIMENT_CONFIG["scaling"],
    target_col_idx=target_col_idx,
)

print(f"Target column '{TARGET_COL}' at index {target_col_idx}")


SPATIO-TEMPORAL GNN EXPERIMENT v6 - OPTION A (DIRECT H-STEP PREDICTION)

Using device: cuda
GPU: NVIDIA L4

STEP 1: Loading Data
Graph loaded: 18110 edges, 1811 nodes
  Detected 'WW-YYYY' string format (e.g., '02-2021'). Parsing...
  Successfully parsed 'WW-YYYY' format.
Time series loaded: 218924 rows, 127 weeks

Graph: 1811 nodes, 18110 edges
Features: 9

STEP 2: Building Week Splits
Week split: 115 train/val weeks, 12 test weeks
  Train/Val range: 202053 to 202310
  Test range: 202311 to 202322

Final splits:
  Train: 86 weeks (202053 to 202233)
  Val: 29 weeks (202234 to 202310)
  Test: 12 weeks (202311 to 202322)

STEP 3: Building Feature Tensors
Using 9/9 features: ['impressions_sum', 'cpc_week', 'adclicks_sum', 'adcost_sum', 'n_dev_desktop', 'n_dev_mobile', 'n_dev_tablet', 'n_st_branded_search', 'n_st_generic_search']
Feature tensor built: shape (86, 1811, 9)
Feature tensor built: shape (29, 1811, 9)
Feature tensor built: shape (24, 1811, 9)

STEP 4: Scaling Features
Applying lo
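The split sizes printed above follow from a purely chronological split: of 127 weeks, the last 12 are test, the remaining 115 are divided into train and validation, and the test tensor holds 12 history weeks plus the 12 test weeks (24 in total). The sketch below reproduces this with plain week indices instead of the real `YYYYWW` labels. `split_weeks` is a hypothetical helper, and `val_ratio=0.25` is an assumption that happens to reproduce the 86/29 split (`int(115 * 0.75) == 86`); the actual value lives in `EXPERIMENT_CONFIG["val_split_ratio"]`.

```python
from typing import List, Sequence, Tuple


def split_weeks(weeks: Sequence[int], test_weeks_last: int, val_ratio: float,
                seq_len: int) -> Tuple[List[int], List[int], List[int], List[int]]:
    """Chronological train/val/test split plus the test input window.

    The last `test_weeks_last` weeks are test; the rest is split into
    train/val by `val_ratio`. The test window is prefixed with the
    `seq_len` weeks that immediately precede the test period.
    """
    weeks = list(weeks)
    trainval, test = weeks[:-test_weeks_last], weeks[-test_weeks_last:]
    split_idx = int(len(trainval) * (1 - val_ratio))
    train, val = trainval[:split_idx], trainval[split_idx:]
    test_start = len(trainval)
    test_window = weeks[max(0, test_start - seq_len):]
    return train, val, test, test_window
```

With `split_weeks(range(127), 12, 0.25, 12)` the lengths are 86, 29, 12 and 24, and the 12 history weeks in the test window are the last 12 validation weeks.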

In [None]:
# =============================================================================
# Step 5: Run Experiments (With Progress Bar)
# =============================================================================
from typing import Any, Dict

from tqdm.auto import tqdm
print("\n" + "=" * 80)
print("STEP 5: Running Experiments")
print("=" * 80)

results: Dict[int, Dict[str, Dict[str, Any]]] = {}

# 1. Calculate Total Epochs for Progress Bar
horizons = EXPERIMENT_CONFIG["horizons"]
models = list_models(BASE_MODEL_CONFIGS)
total_epochs = 0
for model_name in models:
    total_epochs += BASE_MODEL_CONFIGS[model_name]['training']['epochs']
total_epochs *= len(horizons)

print(f"Total scheduled epochs: {total_epochs}")

# 2. Run Experiment Loop
with tqdm(total=total_epochs, unit="epoch", desc="Overall Progress") as pbar:
    for H in horizons:
        tqdm.write(f"\n{'=' * 80}")
        tqdm.write(f"HORIZON H={H}")
        tqdm.write(f"{'=' * 80}")

        # Build sliding window samples
        tqdm.write(f"Building samples for horizon {H}...")
        Xtr_np, ytr_np = build_optionA_samples(X_train_scaled, train_weeks, H, seq_len, target_col_idx)
        Xval_np, yval_np = build_optionA_samples(X_val_scaled, val_weeks, H, seq_len, target_col_idx)
        Xte_np, yte_np = build_optionA_samples(X_test_scaled, test_combined_weeks, H, seq_len, target_col_idx)

        # Skip if no valid samples
        if Xtr_np.shape[0] == 0 or Xval_np.shape[0] == 0 or Xte_np.shape[0] == 0:
            tqdm.write(f"  Skipping H={H}: insufficient samples")
            # Update pbar for skipped models in this horizon
            skipped_epochs = sum(BASE_MODEL_CONFIGS[m]['training']['epochs'] for m in models)
            pbar.update(skipped_epochs)
            continue

        # Convert to torch
        Xtr, ytr = to_torch_dataset(Xtr_np, ytr_np, device)
        Xval, yval = to_torch_dataset(Xval_np, yval_np, device)
        Xte, yte = to_torch_dataset(Xte_np, yte_np, device)

        results[H] = {}

        for model_name in models:
            tqdm.write(f"\n{'-' * 60}")
            tqdm.write(f"Training {model_name}")
            tqdm.write(f"{'-' * 60}")

            # Record the bar position so it can be topped up to this model's
            # scheduled epochs after early stopping or a failure
            # (assumes train_one_model_optionA calls pbar.update(1) per epoch).
            model_epochs = BASE_MODEL_CONFIGS[model_name]["training"]["epochs"]
            pbar_start = pbar.n

            try:
                # Train model; it advances the shared progress bar
                train_result = train_one_model_optionA(
                    model_name,
                    BASE_MODEL_CONFIGS,
                    EXPERIMENT_CONFIG,
                    Xtr, ytr, Xval, yval,
                    edge_index_tensor, edge_weight_tensor,
                    device=device,
                    verbose=True,
                    pbar=pbar,
                )

                # Evaluate
                test_metrics, test_tensors = evaluate_model_optionA(
                    train_result["model"],
                    model_name,
                    Xte, yte,
                    edge_index_tensor, edge_weight_tensor,
                    scaler=scaler,
                    target_col_idx=target_col_idx,
                    return_tensors=True,
                )

                results[H][model_name] = {
                    "best_val_loss": train_result["best_val_loss"],
                    "epochs_trained": train_result["epochs_trained"],
                    "test_MSE": test_metrics["MSE"],
                    "test_MAE": test_metrics["MAE"],
                    "test_RMSE": test_metrics["RMSE"],
                    "test_SMAPE": test_metrics["SMAPE"],
                    "status": "SUCCESS",
                }

                tqdm.write(
                    f"  SUCCESS - Val Loss: {train_result['best_val_loss']:.4f}, "
                    f"Test SMAPE: {test_metrics['SMAPE']:.2f}%"
                )

                # Save tensors
                save_model_tensors(test_tensors, model_name, H, EXPERIMENT_CONFIG, verbose=False)

                # Cleanup
                del train_result["model"], test_tensors
                gc.collect()
                if device == "cuda":
                    torch.cuda.empty_cache()

            except Exception as e:
                tqdm.write(f"  FAILED - Error: {str(e)[:100]}")
                results[H][model_name] = {"status": "FAILED", "error": str(e)}

                gc.collect()
                if device == "cuda":
                    torch.cuda.empty_cache()

            finally:
                # Advance the bar by any epochs that were not run (early
                # stopping or a crash) so the overall total still adds up
                pbar.update(max(0, model_epochs - (pbar.n - pbar_start)))

# =============================================================================
# Print & Export Results
# =============================================================================
print("\n" + "=" * 80)
print("RESULTS SUMMARY")
print("=" * 80)
print("\n{:<15} {:>8} {:>10} {:>10} {:>8}".format("Model", "Horizon", "RMSE", "SMAPE%", "Status"))
print("-" * 60)

for H in sorted(results.keys()):
    for model_name in results[H]:
        r = results[H][model_name]
        if r["status"] == "SUCCESS":
            print("{:<15} {:>8} {:>10.4f} {:>10.2f} {:>8}".format(
                model_name, H, r["test_RMSE"], r["test_SMAPE"], "OK"
            ))
        else:
            print("{:<15} {:>8} {:>10} {:>10} {:>8}".format(model_name, H, "N/A", "N/A", "FAILED"))

export_results_to_csv(results, BASE_MODEL_CONFIGS, EXPERIMENT_CONFIG, verbose=True)
print("\nExperiment Complete!")


STEP 5: Running Experiments
Total scheduled epochs: 2400


Overall Progress:   0%|          | 0/2400 [00:00<?, ?epoch/s]


HORIZON H=1
Building samples for horizon 1...
Built 74 samples for horizon=1: X=(74, 12, 1811, 13), y=(74, 1, 1811, 1)
Built 17 samples for horizon=1: X=(17, 12, 1811, 13), y=(17, 1, 1811, 1)
Built 12 samples for horizon=1: X=(12, 12, 1811, 13), y=(12, 1, 1811, 1)

------------------------------------------------------------
Training DCRNN
------------------------------------------------------------
  Early stopping at epoch 52
  SUCCESS - Val Loss: 0.3611, Test SMAPE: 30.32%

------------------------------------------------------------
Training STGCN
------------------------------------------------------------
  Early stopping at epoch 53
  SUCCESS - Val Loss: 0.4170, Test SMAPE: 31.97%

------------------------------------------------------------
Training STConv
------------------------------------------------------------
  Early stopping at epoch 43
  SUCCESS - Val Loss: 0.4308, Test SMAPE: 33.51%

------------------------------------------------------------
Training A3TGCN
-------
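The results table and CSV report MSE, MAE, RMSE and SMAPE, but `evaluate_model_optionA` is not shown in this section. The sketch below uses the standard definitions. SMAPE has several variants, and the one used here (the mean of |y - ŷ| / ((|y| + |ŷ|) / 2), times 100, with a range of 0 to 200%) together with the `eps` guard for all-zero pairs are assumptions, not the notebook's confirmed formula. Because `evaluate_model_optionA` receives `scaler` and `target_col_idx`, the notebook presumably computes these metrics on inverse-transformed CPC values, which this sketch does not model. `regression_metrics` is a hypothetical name.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float],
                       eps: float = 1e-8) -> Dict[str, float]:
    """MSE, MAE, RMSE and SMAPE (in percent) for paired observations."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # Symmetric APE: the denominator is the mean magnitude of target and prediction;
    # eps keeps a (0, 0) pair from dividing by zero (its term is then 0).
    smape = 100.0 / n * sum(
        abs(p - t) / max((abs(t) + abs(p)) / 2.0, eps)
        for t, p in zip(y_true, y_pred)
    )
    return {"MSE": mse, "MAE": mae, "RMSE": math.sqrt(mse), "SMAPE": smape}
```

RMSE is reported alongside MSE because it is in the same units as CPC, which makes it easier to compare against MAE.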

Testing MTGNN with the given semantic graph

In [None]:
# =============================================================================
# EXTRA EXPERIMENT: MTGNN WITH SEMANTIC GRAPH (SAVE TENSORS ONLY)
# =============================================================================
import gc
import os

import torch
from tqdm.auto import tqdm

print("\n" + "=" * 80)
print("EXTRA RUN: MTGNN (semantic feature set) - SAVE TENSORS ONLY")
print("=" * 80)

mtgnn_results = {}
horizons = EXPERIMENT_CONFIG["horizons"]
seq_len = EXPERIMENT_CONFIG["sequence_length"]

# Create dedicated tensor folder
MTGNN_TENSOR_FOLDER = "/content/MTGNN_semantic_tensors"
os.makedirs(MTGNN_TENSOR_FOLDER, exist_ok=True)
print(f"Tensors will be saved into: {MTGNN_TENSOR_FOLDER}")

# Only MTGNN runs here
model_name = "MTGNN"
mtgnn_epochs = BASE_MODEL_CONFIGS[model_name]["training"]["epochs"]
total_epochs_mtgnn = mtgnn_epochs * len(horizons)

with tqdm(total=total_epochs_mtgnn, unit="epoch", desc="MTGNN-semantic") as pbar:
    for H in horizons:
        tqdm.write(f"\n{'=' * 80}")
        tqdm.write(f"MTGNN SEMANTIC - HORIZON H={H}")
        tqdm.write(f"{'=' * 80}")

        # Build samples
        Xtr_np, ytr_np = build_optionA_samples(
            X_train_scaled, train_weeks, H, seq_len, target_col_idx
        )
        Xval_np, yval_np = build_optionA_samples(
            X_val_scaled, val_weeks, H, seq_len, target_col_idx
        )
        Xte_np, yte_np = build_optionA_samples(
            X_test_scaled, test_combined_weeks, H, seq_len, target_col_idx
        )

        # Skip if any split has no valid samples (same check as the main loop)
        if Xtr_np.shape[0] == 0 or Xval_np.shape[0] == 0 or Xte_np.shape[0] == 0:
            tqdm.write(f"Skipping H={H}: insufficient samples")
            pbar.update(mtgnn_epochs)
            continue

        # Convert to torch
        Xtr, ytr = to_torch_dataset(Xtr_np, ytr_np, device)
        Xval, yval = to_torch_dataset(Xval_np, yval_np, device)
        Xte, yte = to_torch_dataset(Xte_np, yte_np, device)

        # Used to top up the bar after early stopping or a failure
        # (assumes train_one_model_optionA calls pbar.update(1) per epoch)
        pbar_start = pbar.n

        try:
            tqdm.write("Training MTGNN...")
            train_result = train_one_model_optionA(
                model_name,
                BASE_MODEL_CONFIGS,
                EXPERIMENT_CONFIG,
                Xtr, ytr, Xval, yval,
                edge_index_tensor, edge_weight_tensor,  # passed but ignored by MTGNN
                device=device,
                verbose=True,
                pbar=pbar,
            )

            # Evaluate
            test_metrics, test_tensors = evaluate_model_optionA(
                train_result["model"],
                model_name,
                Xte, yte,
                edge_index_tensor, edge_weight_tensor,
                scaler=scaler,
                target_col_idx=target_col_idx,
                return_tensors=True,
            )

            mtgnn_results[H] = {
                "RMSE": test_metrics["RMSE"],
                "SMAPE": test_metrics["SMAPE"],
                "VAL_LOSS": train_result["best_val_loss"],
            }

            tqdm.write(
                f"SUCCESS H={H} | RMSE={test_metrics['RMSE']:.4f} | SMAPE={test_metrics['SMAPE']:.2f}%"
            )

            # ============================================================
            # SAVE ONLY TENSORS INTO MTGNN_semantic_tensors/
            # Move to CPU first so the files can be loaded without a GPU.
            # ============================================================
            for tensor_type in ("predictions", "targets", "inputs", "errors"):
                torch.save(
                    test_tensors[tensor_type].cpu(),
                    os.path.join(MTGNN_TENSOR_FOLDER, f"MTGNN_h{H}_{tensor_type}.pt"),
                )

            tqdm.write(f"Tensors saved for H={H}.")

            # Cleanup GPU memory
            del train_result["model"], test_tensors
            gc.collect()
            if device == "cuda":
                torch.cuda.empty_cache()

        except Exception as e:
            tqdm.write(f"MTGNN FAILED for H={H}: {str(e)[:200]}")
            mtgnn_results[H] = {"ERROR": str(e)}

            gc.collect()
            if device == "cuda":
                torch.cuda.empty_cache()

        finally:
            # Advance the bar by any epochs not run (early stopping or a crash)
            pbar.update(max(0, mtgnn_epochs - (pbar.n - pbar_start)))

# Final summary printout
print("\n" + "=" * 80)
print("MTGNN SEMANTIC RESULTS SUMMARY")
print("=" * 80)
for H, r in mtgnn_results.items():
    print(f"H={H}: {r}")
print(f"\nTensors for successful horizons saved to {MTGNN_TENSOR_FOLDER}/")


EXTRA RUN: MTGNN (semantic feature set) - SAVE TENSORS ONLY
Tensors will be saved into: /content/MTGNN_semantic_tensors


MTGNN-semantic:   0%|          | 0/300 [00:00<?, ?epoch/s]


MTGNN SEMANTIC - HORIZON H=1
Built 74 samples for horizon=1: X=(74, 12, 1811, 13), y=(74, 1, 1811, 1)
Built 17 samples for horizon=1: X=(17, 12, 1811, 13), y=(17, 1, 1811, 1)
Built 12 samples for horizon=1: X=(12, 12, 1811, 13), y=(12, 1, 1811, 1)
Training MTGNN...
  Early stopping at epoch 18
SUCCESS H=1 | RMSE=1.4097 | SMAPE=31.81%
Tensors saved for H=1.

MTGNN SEMANTIC - HORIZON H=6
Built 69 samples for horizon=6: X=(69, 12, 1811, 13), y=(69, 1, 1811, 1)
Built 12 samples for horizon=6: X=(12, 12, 1811, 13), y=(12, 1, 1811, 1)
Built 7 samples for horizon=6: X=(7, 12, 1811, 13), y=(7, 1, 1811, 1)
Training MTGNN...
  Early stopping at epoch 21
SUCCESS H=6 | RMSE=1.6375 | SMAPE=36.45%
Tensors saved for H=6.

MTGNN SEMANTIC - HORIZON H=12
Built 63 samples for horizon=12: X=(63, 12, 1811, 13), y=(63, 1, 1811, 1)
Built 6 samples for horizon=12: X=(6, 12, 1811, 13), y=(6, 1, 1811, 1)
Built 1 samples for horizon=12: X=(1, 12, 1811, 13), y=(1, 1, 1811, 1)
Training MTGNN...
  Early stopping a
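The sample counts in the logs (74/17/12 for H=1, 69/12/7 for H=6, 63/6/1 for H=12, from 86 train, 29 validation and 24 test-window weeks with `seq_len = 12`) are consistent with direct H-step windows: N = T - seq_len - H + 1, where each sample uses `seq_len` consecutive weeks as input and the single week `H` steps after the window as target. `build_optionA_samples` is not shown here, so `optionA_windows` below is a hypothetical sketch that only enumerates index triples; it does not build the feature arrays.

```python
from typing import List, Tuple


def optionA_windows(num_weeks: int, seq_len: int, horizon: int) -> List[Tuple[int, int, int]]:
    """Enumerate (input_start, input_end_exclusive, target_idx) for direct H-step prediction.

    Inputs are weeks [s, s + seq_len); the target is week s + seq_len + horizon - 1.
    """
    last_start = num_weeks - seq_len - horizon
    return [(s, s + seq_len, s + seq_len + horizon - 1)
            for s in range(max(0, last_start + 1))]
```

Because the test tensor only prepends `seq_len` weeks of history, larger horizons leave fewer test targets: for H=6 the 7 test samples target the last 7 test weeks, and for H=12 the single sample, `(0, 12, 23)`, targets only the final test week.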

# Single Model/Horizon Testing

Use this cell to test individual models for specific horizons without running the full experiment.

**Instructions:**
1. Set `TEST_MODEL_NAME` to one of: 'DCRNN', 'STGCN', 'STConv', 'A3TGCN', 'GConvLSTM', 'MTGNN', 'AGCRN', 'GraphWaveNet'
2. Set `TEST_HORIZON` to one of: 1, 6, 12
3. Run this cell independently (it will reload data if needed)
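A typo in `TEST_MODEL_NAME` or `TEST_HORIZON` only surfaces after data loading and sample building. A small guard can fail fast instead. This is a sketch: `VALID_MODELS`, `VALID_HORIZONS` and `check_test_selection` are hypothetical names, with the values copied from the instructions above; inside the notebook the lists could instead come from `list_models(BASE_MODEL_CONFIGS)` and `EXPERIMENT_CONFIG["horizons"]`.

```python
from typing import Tuple

# Values copied from the instructions above (hypothetical constants).
VALID_MODELS: Tuple[str, ...] = (
    "DCRNN", "STGCN", "STConv", "A3TGCN",
    "GConvLSTM", "MTGNN", "AGCRN", "GraphWaveNet",
)
VALID_HORIZONS: Tuple[int, ...] = (1, 6, 12)


def check_test_selection(model_name: str, horizon: int) -> None:
    """Raise ValueError before any data is loaded if the selection is invalid."""
    if model_name not in VALID_MODELS:
        raise ValueError(
            f"Unknown model {model_name!r}; choose one of: {', '.join(VALID_MODELS)}"
        )
    if horizon not in VALID_HORIZONS:
        raise ValueError(f"Unsupported horizon {horizon}; choose one of {VALID_HORIZONS}")
```

Calling `check_test_selection(TEST_MODEL_NAME, TEST_HORIZON)` right after the two configuration variables would stop the cell immediately on, for example, `"stgcn"` (model names are case-sensitive).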

In [None]:
# ============================================================================
# CONFIGURATION - Edit these variables
# ============================================================================
TEST_MODEL_NAME = "STGCN"  # Choose: 'DCRNN', 'STGCN', 'STConv', 'A3TGCN',
                           # 'GConvLSTM', 'MTGNN', 'AGCRN', 'GraphWaveNet'
TEST_HORIZON = 1           # Choose: 1, 6, or 12

# ============================================================================
# Automatic data loading (runs if data not already loaded)
# ============================================================================
import warnings
import gc

warnings.filterwarnings("ignore")

print("=" * 80)
print(f"SINGLE MODEL TEST: {TEST_MODEL_NAME} at Horizon H={TEST_HORIZON}")
print("=" * 80)

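# Check whether the data and the derived splits/tensors used below are loaded
try:
    _ = edge_index_tensor
    _ = edge_weight_tensor
    _ = df
    _ = keyword_map
    _ = X_train_scaled, X_val_scaled, X_test_scaled, scaler
    _ = train_weeks, val_weeks, test_combined_weeks, target_col_idx
    print("\n✓ Data already loaded from previous cells")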
except NameError:
    print("\n⚠ Data not loaded. Loading data now...")

    # Device selection
    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"Using device: {device}")

    # Load data
    edge_index, edge_weight, keyword_map, df = load_graph_and_data(
        GRAPH_FOLDER_PATH,
        TIME_SERIES_CSV_PATH,
    )

    # Convert to tensors
    edge_index_tensor = torch.from_numpy(edge_index).long().to(device)
    edge_weight_tensor = torch.from_numpy(edge_weight).float().to(device)

    # Build week splits
    trainval_weeks, test_weeks = build_week_splits(
        df,
        EXPERIMENT_CONFIG["test_weeks_last"],
    )

    val_ratio = EXPERIMENT_CONFIG["val_split_ratio"]
    split_idx = int(len(trainval_weeks) * (1 - val_ratio))

    train_weeks = trainval_weeks[:split_idx]
    val_weeks = trainval_weeks[split_idx:]

    # Filter features
    available_features = [col for col in FEATURE_COLS if col in df.columns]

    # Build feature tensors
    X_train_raw = build_feature_tensor(df, keyword_map, train_weeks, available_features)
    X_val_raw = build_feature_tensor(df, keyword_map, val_weeks, available_features)

    # Build test tensor with sliding window
    seq_len = EXPERIMENT_CONFIG["sequence_length"]
    all_weeks = np.concatenate([train_weeks, val_weeks, test_weeks])
    test_start_idx = len(train_weeks) + len(val_weeks)
    test_window_start = max(0, test_start_idx - seq_len)
    test_combined_weeks = all_weeks[test_window_start:]

    X_test_raw = build_feature_tensor(
        df,
        keyword_map,
        test_combined_weeks,
        available_features,
    )

    # Scale features
    target_col_idx = available_features.index(TARGET_COL)
    X_train_scaled, X_val_scaled, X_test_scaled, scaler = scale_features(
        X_train_raw,
        X_val_raw,
        X_test_raw,
        EXPERIMENT_CONFIG["scaling"],
        target_col_idx=target_col_idx,
    )


    print("✓ Data loaded successfully")

# Ensure device is set
try:
    _ = device
except NameError:
    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"\nUsing device: {device}")

# ============================================================================
# Build samples for the specified horizon
# ============================================================================
print(f"\n{'-' * 80}")
print(f"Building samples for H={TEST_HORIZON}...")
print(f"{'-' * 80}")

H = TEST_HORIZON
seq_len = EXPERIMENT_CONFIG["sequence_length"]

Xtr_np, ytr_np = build_optionA_samples(
    X_train_scaled,
    train_weeks,
    H,
    seq_len,
    target_col_idx,
)
Xval_np, yval_np = build_optionA_samples(
    X_val_scaled,
    val_weeks,
    H,
    seq_len,
    target_col_idx,
)
Xte_np, yte_np = build_optionA_samples(
    X_test_scaled,
    test_combined_weeks,
    H,
    seq_len,
    target_col_idx,
)

# Convert to torch
Xtr, ytr = to_torch_dataset(Xtr_np, ytr_np, device)
Xval, yval = to_torch_dataset(Xval_np, yval_np, device)
Xte, yte = to_torch_dataset(Xte_np, yte_np, device)

print("\nData shapes:")
print(f"  Train: X={Xtr.shape}, y={ytr.shape}")
print(f"  Val:   X={Xval.shape}, y={yval.shape}")
print(f"  Test:  X={Xte.shape}, y={yte.shape}")

# ============================================================================
# Train and evaluate the specified model
# ============================================================================
print(f"\n{'-' * 80}")
print(f"Training {TEST_MODEL_NAME}...")
print(f"{'-' * 80}\n")

try:
    # Train model
    train_result = train_one_model_optionA(
        TEST_MODEL_NAME,
        BASE_MODEL_CONFIGS,
        EXPERIMENT_CONFIG,
        Xtr,
        ytr,
        Xval,
        yval,
        edge_index_tensor,
        edge_weight_tensor,
        device=device,
        verbose=True,
    )

    print(f"\n{'-' * 80}")
    print("Training complete! Evaluating on test set...")
    print(f"{'-' * 80}\n")

    # Evaluate on the test set (returns metrics and raw tensors)
    test_metrics, test_tensors = evaluate_model_optionA(
        train_result["model"],
        TEST_MODEL_NAME,
        Xte,
        yte,
        edge_index_tensor,
        edge_weight_tensor,
        scaler=scaler,
        target_col_idx=target_col_idx,
        return_tensors=True,
    )

    # Print results
    print("=" * 80)
    print("RESULTS")
    print("=" * 80)
    print(f"\nModel: {TEST_MODEL_NAME}")
    print(f"Horizon: {TEST_HORIZON}")

    print("\nTraining:")
    print(f"  - Best Validation Loss: {train_result['best_val_loss']:.6f}")
    print(f"  - Epochs Trained: {train_result['epochs_trained']}")

    print("\nTest Set Performance:")
    print(f"  - MSE:   {test_metrics['MSE']:.6f}")
    print(f"  - MAE:   {test_metrics['MAE']:.6f}")
    print(f"  - RMSE:  {test_metrics['RMSE']:.6f}")
    print(f"  - SMAPE: {test_metrics['SMAPE']:.2f}%")

    print("\n" + "=" * 80)
    print("✓ Test completed successfully!")
    print("=" * 80)

    # Save tensors for this single test
    print("\nSaving tensors...")
    save_model_tensors(
        test_tensors,
        TEST_MODEL_NAME,
        TEST_HORIZON,
        EXPERIMENT_CONFIG,
        verbose=True,
    )

    # Clean up GPU memory
    del train_result
    del test_tensors
    gc.collect()

    if device == "cuda":
        torch.cuda.empty_cache()
        print("\nGPU memory cleared")

except Exception as e:
    print("=" * 80)
    print("✗ FAILED")
    print("=" * 80)
    print(f"\nError: {str(e)}")
    print("\nFull traceback:")
    import traceback

    traceback.print_exc()
    print("=" * 80)

    # Clean up even on failure
    gc.collect()
    if device == "cuda":
        torch.cuda.empty_cache()

SINGLE MODEL TEST: STGCN at Horizon H=1

✓ Data already loaded from previous cells

--------------------------------------------------------------------------------
Building samples for H=1...
--------------------------------------------------------------------------------
Built 74 samples for horizon=1: X=(74, 12, 1811, 13), y=(74, 1, 1811, 1)
Built 17 samples for horizon=1: X=(17, 12, 1811, 13), y=(17, 1, 1811, 1)
Built 12 samples for horizon=1: X=(12, 12, 1811, 13), y=(12, 1, 1811, 1)

Data shapes:
  Train: X=torch.Size([74, 12, 1811, 13]), y=torch.Size([74, 1, 1811, 1])
  Val:   X=torch.Size([17, 12, 1811, 13]), y=torch.Size([17, 1, 1811, 1])
  Test:  X=torch.Size([12, 12, 1811, 13]), y=torch.Size([12, 1, 1811, 1])

--------------------------------------------------------------------------------
Training STGCN...
--------------------------------------------------------------------------------

  Epoch 10/100 - Train: 0.4543, Val: 0.4467


KeyboardInterrupt: 

# Hyperparameter tuning
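The `sample_config` helper in the tuning cell draws each hyperparameter independently with `random.choice`, so with `N_TRIALS = 8` the same configuration can be drawn more than once (the DCRNN grid has only 3 × 2 × 2 = 12 combinations, and the GraphWaveNet grid has exactly 8). A sketch of an alternative, assuming the same `PARAM_GRIDS` layout: enumerate the full grid with `itertools.product` and sample without replacement via `random.Random.sample`. `sample_unique_configs` is a hypothetical name, not part of the notebook.

```python
import random
from itertools import product
from typing import Any, Dict, List


def sample_unique_configs(grid: Dict[str, List[Any]], n_trials: int,
                          seed: int = 0) -> List[Dict[str, Any]]:
    """Return up to n_trials distinct configurations drawn from a parameter grid."""
    keys = list(grid)
    # Every combination of grid values, one dict per configuration
    all_configs = [dict(zip(keys, combo)) for combo in product(*(grid[k] for k in keys))]
    rng = random.Random(seed)  # local RNG so the draw is reproducible
    # Sampling without replacement guarantees no trial is wasted on a duplicate
    return rng.sample(all_configs, min(n_trials, len(all_configs)))
```

When `n_trials` is at least the grid size, this degenerates to a shuffled full grid search, which for the GraphWaveNet grid means all 8 configurations are tried exactly once.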

In [None]:
# =============================================================================
# HYPERPARAMETER TUNING SETUP (With Memory Cleanup, CPU datasets)
# =============================================================================

import time
import random
import copy
import gc
import torch
import torch.nn as nn
import numpy as np
from itertools import product
from tqdm.auto import tqdm

# 1. Define the specific models to tune per horizon.
# AGCRN is not tuned because of its GPU memory requirements.
TOP_MODELS_BY_HORIZON = {
    1:  ['DCRNN', 'GConvLSTM', 'GraphWaveNet'],
    6:  ['DCRNN', 'GConvLSTM', 'GraphWaveNet'],
    12: ['DCRNN', 'GConvLSTM', 'GraphWaveNet'],
}

# 2. Hyperparameter search space
PARAM_GRIDS = {
    # ============================================================
    # 1) DCRNN — best so far: hidden_channels=96, k_hops=2, lr=0.003
    # ============================================================
    "DCRNN": {
        # Centered around 96, but still lets you see if a bit smaller is enough
        "hidden_channels": [64, 96, 128],
        # Best was 2; 3 is a reasonable nearby alternative
        "k_hops": [2, 3],
        # 0.003 was best; 0.001 is a safer/slower alternative
        "learning_rate": [0.003, 0.001],
    },

    # ============================================================
    # 2) GConvLSTM — best so far: hidden_size=64, K=2, lr=0.01
    # ============================================================
    "GConvLSTM": {
        # Best at 64; 32 tests a lighter model, 96 a slightly larger one
        "hidden_size": [32, 64, 96],
        # Best at 2; 3 lets you test slightly larger spatial receptive field
        "K": [2, 3],
        # 0.01 was best; 0.003 is a classic “slower but safer” option
        "learning_rate": [0.01, 0.003],
    },

    # ============================================================
    # 3) GraphWaveNet — best so far: hidden_size=32, dropout=0.5, lr=0.01
    # ============================================================
    "GraphWaveNet": {
        # Best at 32; 48 gives a bit more capacity without going huge
        "hidden_size": [32, 48],
        # 0.5 worked well; 0.3 is a reasonable nearby alternative
        "dropout": [0.3, 0.5],
        # Same reasoning as GConvLSTM
        "learning_rate": [0.01, 0.003],
    },

    # ============================================================
    # 4) (OPTIONAL) AGCRN — lighter config to avoid OOM
    #    Only if you decide to include it later.
    # ============================================================
    "AGCRN": {
        # Keep this small: 16–32 is much safer with 1811 nodes
        "hidden_size": [16, 32],
        # 1–2 layers max — deeper gets expensive fast
        "n_layers": [1, 2],
        # Same lr logic
        "learning_rate": [0.01, 0.003],
    },
}

N_TRIALS = 8
TUNING_EPOCHS = 15

# =============================================================================
# SAMPLING FROM GRID
# =============================================================================

def sample_config(model_name):
    grid = PARAM_GRIDS[model_name]
    config = {}
    for key, values in grid.items():
        config[key] = random.choice(values)
    return config

# =============================================================================
# TUNING FUNCTION (Accepts global progress bar)
# =============================================================================

def train_tuning_model(model_name, params, X_tr, y_tr, X_val, y_val,
                       edge_idx, edge_wt, device, pbar=None):
    """
    Train model with L1 Loss and update global progress bar.
    Returns: model, best_val_loss, epochs_run
    NOTE: X_tr, y_tr, X_val, y_val are on CPU.
    """
    full_config = copy.deepcopy(BASE_MODEL_CONFIGS)

    # Inject hyperparameters into config
    for k, v in params.items():
        if k == 'learning_rate':
            full_config[model_name]['training'][k] = v
        elif k == 'weight_decay':
            # used only in optimizer, do NOT pass into model __init__
            continue
        else:
            full_config[model_name]['params'][k] = v

    # Instantiate model on device
    model = get_model(model_name, full_config, device=device)

    optimizer = torch.optim.Adam(
        model.parameters(),
        lr=params.get('learning_rate', 0.001),
        weight_decay=params.get('weight_decay', 0.0)
    )
    criterion = nn.L1Loss()  # Train on MAE

    # Per-model batch size (smaller batches for memory-heavy models)
    if model_name in ["AGCRN"]:
        batch_size = 4
    elif model_name in ["DCRNN", "GConvLSTM", "MTGNN"]:
        batch_size = 8
    elif model_name in ["GraphWaveNet"]:
        batch_size = 16
    else:
        batch_size = 8


    train_loader = make_dataloader(X_tr, y_tr, batch_size, shuffle=True)
    val_loader   = make_dataloader(X_val, y_val, batch_size, shuffle=False)

    no_edge_models = ["AGCRN"]
    best_val_loss = float("inf")
    patience = 5
    no_improve = 0
    epochs_run = 0

    for epoch in range(TUNING_EPOCHS):
        # -------------------------
        # Train
        # -------------------------
        model.train()
        for Xb, yb in train_loader:
            Xb = Xb.to(device, non_blocking=True)
            yb = yb.to(device, non_blocking=True)

            optimizer.zero_grad()
            if model_name == "DCRNN":
                output = forward_dcrnn_optionA(model, Xb, edge_idx, edge_wt)
            elif model_name in no_edge_models:
                out = model(Xb)
                output = ensure_B1N1(out)
            else:
                output = model(Xb, edge_idx, edge_wt)

            loss = criterion(output, yb)
            loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), 5.0)
            optimizer.step()

        # -------------------------
        # Validation
        # -------------------------
        model.eval()
        val_losses, val_counts = [], []
        with torch.no_grad():
            for Xb, yb in val_loader:
                Xb = Xb.to(device, non_blocking=True)
                yb = yb.to(device, non_blocking=True)

                if model_name == "DCRNN":
                    output = forward_dcrnn_optionA(model, Xb, edge_idx, edge_wt)
                elif model_name in no_edge_models:
                    out = model(Xb)
                    output = ensure_B1N1(out)
                else:
                    output = model(Xb, edge_idx, edge_wt)

                # Keep batch sizes so a short final batch does not skew the epoch MAE
                val_losses.append(criterion(output, yb).item())
                val_counts.append(yb.size(0))

        # Sample-weighted mean of batch MAEs = MAE over the whole validation set
        avg_val = np.average(val_losses, weights=val_counts)

        # Update global progress bar (real epochs only)
        if pbar:
            pbar.update(1)
            pbar.set_postfix(mae=f"{avg_val:.4f}")

        epochs_run += 1

        # Early stopping
        if avg_val < best_val_loss:
            best_val_loss = avg_val
            no_improve = 0
        else:
            no_improve += 1
            if no_improve >= patience:
                break

    return model, best_val_loss, epochs_run

# =============================================================================
# RUNNING THE SEARCH
# =============================================================================


tuning_results = []
horizons = [1]  # tune on the shortest horizon only (previously: EXPERIMENT_CONFIG['horizons'])

# Calculate total steps for the progress bar (upper bound)
total_models = sum(len(TOP_MODELS_BY_HORIZON[h]) for h in horizons)
total_steps = total_models * N_TRIALS * TUNING_EPOCHS

print("="*80)
print(f"STARTING TUNING (Max Epochs: {total_steps})")
print("="*80)

with tqdm(total=total_steps, unit="epoch") as pbar:
    for H in horizons:
        # Prepare Data for this horizon (CPU tensors)
        seq_len = EXPERIMENT_CONFIG["sequence_length"]
        Xtr_np, ytr_np = build_optionA_samples(
            X_train_scaled, train_weeks, H, seq_len, target_col_idx
        )
        Xtr, ytr = to_torch_dataset(Xtr_np, ytr_np, device="cpu")

        Xval_np, yval_np = build_optionA_samples(
            X_val_scaled, val_weeks, H, seq_len, target_col_idx
        )
        Xval, yval = to_torch_dataset(Xval_np, yval_np, device="cpu")

        for model_name in TOP_MODELS_BY_HORIZON[H]:
            best_val_mae_for_model = float('inf')
            best_params_for_model = None
            best_epochs_for_model = None
            best_time_for_model = None

            for trial in range(N_TRIALS):
                pbar.set_description(f"H={H} | {model_name} | Trial {trial+1}/{N_TRIALS}")
                params = sample_config(model_name)

                try:
                    t0 = time.time()
                    model, val_mae, epochs_run = train_tuning_model(
                        model_name, params,
                        Xtr, ytr, Xval, yval,
                        edge_index_tensor, edge_weight_tensor,
                        device,
                        pbar=pbar
                    )
                    t1 = time.time()
                    elapsed = t1 - t0
                    avg_sec_per_epoch = elapsed / max(epochs_run, 1)

                    tqdm.write(
                        f"H={H} | {model_name:<10} | Trial {trial+1}/{N_TRIALS} "
                        f"| stopped at epoch {epochs_run:2d} "
                        f"| VAL MAE={val_mae:.4f} "
                        f"| {elapsed:.1f}s total (~{avg_sec_per_epoch:.2f}s/epoch)"
                    )

                    # Select by best validation MAE
                    if val_mae < best_val_mae_for_model:
                        best_val_mae_for_model = val_mae
                        best_params_for_model = params
                        best_epochs_for_model = epochs_run
                        best_time_for_model = elapsed

                    # Cleanup after successful trial
                    del model
                    gc.collect()
                    if device == "cuda":
                        torch.cuda.empty_cache()

                except Exception as e:
                    tqdm.write(f"H={H} | {model_name} | Trial {trial+1}: FAILED ({e})")
                    # Best effort cleanup
                    if 'model' in locals():
                        del model
                    gc.collect()
                    if device == "cuda":
                        torch.cuda.empty_cache()

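            # best_time_for_model stays None if every trial failed
            time_str = "n/a" if best_time_for_model is None else f"{best_time_for_model:.1f}s"
            tqdm.write(
                f"[SUMMARY] H={H} | {model_name:<10} "
                f"| Best VAL MAE={best_val_mae_for_model:.4f} "
                f"| epochs={best_epochs_for_model} "
                f"| time={time_str}"
            )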

            tuning_results.append({
                'horizon': H,
                'model': model_name,
                'best_params': best_params_for_model,
                'best_val_mae': best_val_mae_for_model,
                'best_epochs': best_epochs_for_model,
                'best_time_sec': best_time_for_model,
            })

print("\n" + "="*80)
print("TUNING COMPLETE. BEST CONFIGURATIONS:")
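for res in tuning_results:
    # best_time_sec is None for a model whose trials all failed
    time_str = "n/a" if res['best_time_sec'] is None else f"{res['best_time_sec']:.1f}s"
    print(
        f"H={res['horizon']} | {res['model']:<10} "
        f"| VAL MAE={res['best_val_mae']:.4f} "
        f"| epochs={res['best_epochs']} "
        f"| time={time_str} "
        f"| {res['best_params']}"
    )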

STARTING TUNING (Max Epochs: 360)



Built 74 samples for horizon=1: X=(74, 12, 1811, 9), y=(74, 1, 1811, 1)
Built 17 samples for horizon=1: X=(17, 12, 1811, 9), y=(17, 1, 1811, 1)
H=1 | DCRNN      | Trial 1/8 | stopped at epoch 15 | VAL MAE=0.3789 | 247.8s total (~16.52s/epoch)
H=1 | DCRNN      | Trial 2/8 | stopped at epoch 15 | VAL MAE=0.3808 | 247.4s total (~16.49s/epoch)
H=1 | DCRNN      | Trial 3/8 | stopped at epoch 15 | VAL MAE=0.3831 | 247.1s total (~16.47s/epoch)
H=1 | DCRNN      | Trial 4/8 | stopped at epoch 15 | VAL MAE=0.3816 | 247.2s total (~16.48s/epoch)
H=1 | DCRNN      | Trial 5/8 | stopped at epoch 15 | VAL MAE=0.3796 | 247.8s total (~16.52s/epoch)
H=1 | DCRNN      | Trial 6/8 | stopped at epoch 15 | VAL MAE=0.3806 | 350.7s total (~23.38s/epoch)
H=1 | DCRNN      | Trial 7/8 | stopped at epoch 15 | VAL MAE=0.3787 | 247.5s total (~16.50s/epoch)
H=1 | DCRNN      | Trial 8/8 | stopped at epoch 15 | VAL MAE=0.3778 | 246.9s total (~16.46s/epoch)
[SUMMARY] H=1 | DCRNN      | Best VAL MAE=0.3778 | epochs=15 | t
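`sample_config` draws each hyperparameter independently with `random.choice`. Two trials of the same model can therefore land on the same configuration. With M grid combinations and N draws with replacement, the expected number of distinct configurations is M(1 - (1 - 1/M)^N):

- For the GraphWaveNet grid (2 × 2 × 2 = 8) with `N_TRIALS = 8`, that is about 5.3.
- For GConvLSTM (3 × 2 × 2 = 12), it is about 6.0.

Below is a minimal sketch of sampling without replacement instead. It assumes a grid dict shaped like `PARAM_GRIDS[model_name]`. The helper names `grid_size`, `expected_distinct` and `sample_configs_unique` are hypothetical.

```python
import itertools
import math
import random


def grid_size(grid):
    """Number of distinct configurations in a {param: [values]} grid."""
    return math.prod(len(values) for values in grid.values())


def expected_distinct(m, n):
    """Expected number of distinct configs after n draws with replacement from m."""
    return m * (1 - (1 - 1 / m) ** n)


def sample_configs_unique(grid, n_trials, seed=None):
    """Draw up to n_trials configurations from the grid without replacement."""
    keys = list(grid)
    combos = list(itertools.product(*(grid[k] for k in keys)))
    rng = random.Random(seed)
    chosen = rng.sample(combos, k=min(n_trials, len(combos)))
    return [dict(zip(keys, combo)) for combo in chosen]


# Grid copied from the GraphWaveNet entry of PARAM_GRIDS
gwn_grid = {
    "hidden_size": [32, 48],
    "dropout": [0.3, 0.5],
    "learning_rate": [0.01, 0.003],
}

print(grid_size(gwn_grid))                                   # 8
print(round(expected_distinct(grid_size(gwn_grid), 8), 2))  # 5.25
configs = sample_configs_unique(gwn_grid, n_trials=8, seed=0)
print(len({tuple(sorted(c.items())) for c in configs}))      # 8
```

When `n_trials` is at least the grid size, this degenerates into an exhaustive grid search in random order, which for these small grids costs no more trials than the current setting.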

In [None]:
import pandas as pd
import json

# Filter only horizon 1
tuning_h1 = [r for r in tuning_results if r['horizon'] == 1]

# Build DataFrame
df_h1 = pd.DataFrame(tuning_h1)

# Optional: sort for readability
df_h1 = df_h1.sort_values(['model'])

# Optional: store params as JSON strings so they can be parsed back from the CSV
df_h1['best_params_json'] = df_h1['best_params'].apply(json.dumps)

# Show in notebook
print(df_h1)

# Save to CSV (adjust path if you want it directly in Drive)
csv_path = "/content/tuning_results_h1.csv"
df_h1.to_csv(csv_path, index=False)

print(f"\nSaved H=1 tuning results to: {csv_path}")

   horizon         model                                        best_params  \
0        1         DCRNN  {'hidden_channels': 96, 'k_hops': 2, 'learning...   
1        1     GConvLSTM  {'hidden_size': 32, 'K': 3, 'learning_rate': 0...   
2        1  GraphWaveNet  {'hidden_size': 32, 'dropout': 0.3, 'learning_...   

   best_val_mae  best_epochs  best_time_sec  \
0      0.377829           15     246.853319   
1      0.377668           15     250.536676   
2      0.366334           15      54.234078   

                                    best_params_json  
0  {"hidden_channels": 96, "k_hops": 2, "learning...  
1  {"hidden_size": 32, "K": 3, "learning_rate": 0...  
2  {"hidden_size": 32, "dropout": 0.3, "learning_...  

Saved H=1 tuning results to: /content/tuning_results_h1.csv
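As originally written, `train_tuning_model` computes the validation MAE as `np.mean` over per-batch `L1Loss` values. With 17 validation samples, batch size 8 yields batches of 8, 8 and 1 samples (batch size 16 yields 16 and 1). The mean of batch means therefore gives the single trailing sample as much weight as a full batch.

Every sample contributes the same number of elements (1811 nodes × 1 target). Weighting each batch mean by its batch size therefore recovers the exact dataset MAE. The stdlib sketch below uses made-up per-sample errors; the numbers are illustrative, not from this experiment.

```python
def mean_of_batch_means(errors, batch_size):
    """Unweighted average of per-batch means (what np.mean(val_losses) computes)."""
    batches = [errors[i:i + batch_size] for i in range(0, len(errors), batch_size)]
    return sum(sum(b) / len(b) for b in batches) / len(batches)


def sample_weighted_mean(errors, batch_size):
    """Average of per-batch means weighted by batch size (equals the plain mean)."""
    total, count = 0.0, 0
    for i in range(0, len(errors), batch_size):
        batch = errors[i:i + batch_size]
        total += (sum(batch) / len(batch)) * len(batch)
        count += len(batch)
    return total / count


# Illustrative per-sample MAEs: 16 typical samples and one hard one
errors = [0.35] * 16 + [0.80]

print(round(mean_of_batch_means(errors, 8), 4))   # 0.5
print(round(sample_weighted_mean(errors, 8), 4))  # 0.3765
print(round(sum(errors) / len(errors), 4))        # 0.3765
```

In NumPy the same correction is `np.average(batch_losses, weights=batch_sizes)`.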


# Train the best models
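The final run below sets `early_stopping_patience` to 15 in `FINAL_EXP_CONFIG`. A patience rule stops training once the validation loss has failed to improve for that many consecutive epochs. That is consistent with the DCRNN log of this run: best val 0.3629 logged at epoch 40, stop at epoch 55.

The real logic lives in `train_one_model_optionA`, which is not shown here. The minimal stdlib sketch below uses a hypothetical `EarlyStopper` class and a synthetic loss curve.

```python
class EarlyStopper:
    """Stop when val loss has not improved for `patience` consecutive epochs."""

    def __init__(self, patience, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Synthetic curve: improves until epoch 40, then plateaus slightly above the best
losses = {e: 0.40 - 0.001 * e for e in range(1, 41)}
losses.update({e: 0.3605 for e in range(41, 81)})

stopper = EarlyStopper(patience=15)
for epoch in range(1, 81):
    if stopper.step(epoch, losses[epoch]):
        break

print(epoch, stopper.best_epoch)  # 55 40
```

The tuning loop applies the same rule with `patience = 5`. With `TUNING_EPOCHS = 15`, a trial can only stop early if it stalls in its first ten epochs.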

In [None]:
# =============================================================================
# FINAL STEP: TRAINING THE BEST MODELS
# =============================================================================

import copy
import os
import gc
import torch
import torch.nn as nn
from tqdm.auto import tqdm
import pandas as pd

# 1. Best configurations (tuned on H=1 only; reused unchanged for H=6 and H=12)
BEST_CONFIGS = {
    1: {
        'DCRNN':        {'hidden_channels': 96, 'k_hops': 2, 'learning_rate': 0.003},
        'GConvLSTM':    {'hidden_size': 32, 'K': 3, 'learning_rate': 0.01},
        'GraphWaveNet': {'hidden_size': 32, 'dropout': 0.3, 'learning_rate': 0.003},
    },
    6: {
        'DCRNN':        {'hidden_channels': 96, 'k_hops': 2, 'learning_rate': 0.003},
        'GConvLSTM':    {'hidden_size': 32, 'K': 3, 'learning_rate': 0.01},
        'GraphWaveNet': {'hidden_size': 32, 'dropout': 0.3, 'learning_rate': 0.003},
    },
    12: {
        'DCRNN':        {'hidden_channels': 96, 'k_hops': 2, 'learning_rate': 0.003},
        'GConvLSTM':    {'hidden_size': 32, 'K': 3, 'learning_rate': 0.01},
        'GraphWaveNet': {'hidden_size': 32, 'dropout': 0.3, 'learning_rate': 0.003},
    }
}

# 2. Update Experiment Config with Robust Paths
#    We will try to save to Drive first.
DRIVE_BASE_PATH = '/content/drive/MyDrive/Colab Notebooks/master_thesis_gdrive/benchmarks/best_models'
LOCAL_BASE_PATH = '/content/best_models'

FINAL_EXP_CONFIG = copy.deepcopy(EXPERIMENT_CONFIG)
FINAL_EXP_CONFIG.update({
    'csv_export': {
        'enabled': True,
        'experiment_name': 'best_models',
        'model_prefix': 'tuned',
        'exog_mode': 'graph',
        # Both paths are kept; Drive availability is checked at save time
        'drive_path': DRIVE_BASE_PATH,
        'local_path': LOCAL_BASE_PATH,
    },
    'tensor_export': {
        'enabled': True,
        'experiment_name': 'best_models',
        'save_predictions': True,
        'save_targets': True,
        'save_inputs': True,
        'save_errors': True,
        # Tensors usually go into a subfolder to avoid clutter
        'drive_path': os.path.join(DRIVE_BASE_PATH, 'tensors'),
        'local_path': os.path.join(LOCAL_BASE_PATH, 'tensors'),
    },
    'training_defaults': {
        'early_stopping_patience': 15,
        'gradient_clip_norm': 5.0,
    }
})

FINAL_EPOCHS = 80

# 3. Helpers to inject params
def apply_params_to_config(model_name, params, base_configs):
    new_config = copy.deepcopy(base_configs)
    if 'learning_rate' in params:
        new_config[model_name]['training']['learning_rate'] = params['learning_rate']
    for k, v in params.items():
        if k != 'learning_rate':
            new_config[model_name]['params'][k] = v
    new_config[model_name]['training']['epochs'] = FINAL_EPOCHS
    return new_config

# =============================================================================
# RUNNING THE FINAL TRAINING
# =============================================================================

final_results = {}
horizons = sorted(BEST_CONFIGS.keys())

total_steps = 0
for h in horizons:
    total_steps += len(BEST_CONFIGS[h]) * FINAL_EPOCHS

print("="*80)
print(f"STARTING FINAL TRAINING (Max Epochs: {total_steps})")
print("="*80)

# Check Drive Accessibility upfront
drive_accessible = os.path.isdir('/content/drive')
if drive_accessible:
    print(f"✓ Google Drive found. Saving to: {DRIVE_BASE_PATH}")
    os.makedirs(FINAL_EXP_CONFIG['tensor_export']['drive_path'], exist_ok=True)
else:
    print(f"⚠ Drive NOT found. Will save locally to: {LOCAL_BASE_PATH}")

with tqdm(total=total_steps, unit="epoch", desc="Total Progress") as pbar:

    for H in horizons:
        tqdm.write(f"\n{'='*60}")
        tqdm.write(f"HORIZON {H}")
        tqdm.write(f"{'='*60}")

        # 1. Data Prep
        seq_len = EXPERIMENT_CONFIG["sequence_length"]
        # target_col_idx is normally defined in an earlier cell; recompute it
        # from the feature list instead of silently guessing an index
        if 'target_col_idx' not in globals():
            target_col_idx = available_features.index(TARGET_COL)

        Xtr_np, ytr_np = build_optionA_samples(X_train_scaled, train_weeks, H, seq_len, target_col_idx)
        Xtr, ytr = to_torch_dataset(Xtr_np, ytr_np, device)
        Xval_np, yval_np = build_optionA_samples(X_val_scaled, val_weeks, H, seq_len, target_col_idx)
        Xval, yval = to_torch_dataset(Xval_np, yval_np, device)
        Xte_np, yte_np = build_optionA_samples(X_test_scaled, test_combined_weeks, H, seq_len, target_col_idx)
        Xte, yte = to_torch_dataset(Xte_np, yte_np, device)

        if H not in final_results: final_results[H] = {}

        # 2. Iterate through best models
        for model_name, params in BEST_CONFIGS[H].items():
            tqdm.write(f"\nTraining {model_name}...")

            current_model_config = apply_params_to_config(model_name, params, BASE_MODEL_CONFIGS)

            try:
                # Train
                train_result = train_one_model_optionA(
                    model_name=model_name,
                    model_configs=current_model_config,
                    experiment_cfg=FINAL_EXP_CONFIG,
                    X_train=Xtr, y_train=ytr,
                    X_val=Xval, y_val=yval,
                    edge_index=edge_index_tensor,
                    edge_weight=edge_weight_tensor,
                    device=device,
                    verbose=True,
                    pbar=pbar
                )

                # Evaluate
                metrics, tensors = evaluate_model_optionA(
                    model=train_result['model'],
                    model_name=model_name,
                    X_test=Xte, y_test=yte,
                    edge_index=edge_index_tensor,
                    edge_weight=edge_weight_tensor,
                    scaler=scaler,
                    target_col_idx=target_col_idx,
                    return_tensors=True
                )

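                final_results[H][model_name] = {
                    "best_val_loss": train_result["best_val_loss"],
                    "epochs_trained": train_result["epochs_trained"],
                    "test_MSE": metrics["MSE"],
                    "test_MAE": metrics["MAE"],
                    "test_RMSE": metrics["RMSE"],
                    "test_SMAPE": metrics["SMAPE"],
                    "config": str(params),
                    # Per-epoch history, if train_one_model_optionA returns one;
                    # the training-curve export at the end of this cell reads it
                    "epoch_logs": train_result.get("epoch_logs"),
                    "status": "SUCCESS",
                }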

                tqdm.write(f"  > DONE. SMAPE: {metrics['SMAPE']:.2f}% (Epochs: {train_result['epochs_trained']})")

                # Save tensors to the paths in FINAL_EXP_CONFIG
                # (save_model_tensors falls back to the local path if the Drive write fails)
                save_model_tensors(tensors, model_name, H, FINAL_EXP_CONFIG, verbose=False)

                # ---------------------------
                # SAVE MODEL WEIGHTS (local, then Drive if mounted)
                # ---------------------------
                weights_dir_local = os.path.join(LOCAL_BASE_PATH, "weights")
                os.makedirs(weights_dir_local, exist_ok=True)
                weights_fname = f"tuned_{model_name}_H{H}.pt"
                weights_path_local = os.path.join(weights_dir_local, weights_fname)
                torch.save(train_result["model"].state_dict(), weights_path_local)
                tqdm.write(f"  > Saved weights locally: {weights_path_local}")

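                if os.path.isdir('/content/drive'):
                    try:
                        weights_dir_drive = os.path.join(DRIVE_BASE_PATH, "weights")
                        os.makedirs(weights_dir_drive, exist_ok=True)
                        weights_path_drive = os.path.join(weights_dir_drive, weights_fname)
                        torch.save(train_result["model"].state_dict(), weights_path_drive)
                        tqdm.write(f"  > Saved weights to Drive: {weights_path_drive}")
                    except Exception as drive_err:
                        # A Drive hiccup should not mark an already-trained model as FAILED
                        tqdm.write(f"  > Drive weight save failed ({drive_err}); local copy kept")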

                # Cleanup
                del train_result, tensors
                gc.collect()
                if device == "cuda": torch.cuda.empty_cache()

            except Exception as e:
                tqdm.write(f"  > FAILED: {e}")
                final_results[H][model_name] = {"status": "FAILED", "error": str(e)}
                pbar.update(FINAL_EPOCHS)
                gc.collect()
                if device == "cuda": torch.cuda.empty_cache()

# =============================================================================
# EXPORT FINAL RESULTS WITH FALLBACK
# =============================================================================
print("\n" + "="*80)
print("FINAL BENCHMARK COMPLETE")
print("="*80)

rows = []
for H in final_results:
    for m_name in final_results[H]:
        r = final_results[H][m_name]
        if r['status'] == 'SUCCESS':
            rows.append({
                'horizon': H,
                'model_id': f"tuned | {m_name}",
                'exog_mode': 'graph',
                'config': r['config'],
                'mse': r['test_MSE'],
                'mae': r['test_MAE'],
                'rmse': r['test_RMSE'],
                'smape': r['test_SMAPE'],
            })

df_final = pd.DataFrame(rows)
csv_name = "best_models_results.csv"

# 1. Always save locally first (the local runtime disk is the reliable copy)
local_save_path = os.path.join(LOCAL_BASE_PATH, csv_name)
os.makedirs(LOCAL_BASE_PATH, exist_ok=True)
df_final.to_csv(local_save_path, index=False)
print(f"\n✓ Saved result CSV locally: {local_save_path}")

# 2. Try saving to Drive
drive_save_path = os.path.join(DRIVE_BASE_PATH, csv_name)
try:
    if os.path.isdir('/content/drive'):
        os.makedirs(DRIVE_BASE_PATH, exist_ok=True)
        df_final.to_csv(drive_save_path, index=False)
        print(f"✓ Saved result CSV to Drive: {drive_save_path}")
    else:
        print(f"⚠ Drive not mounted. Files are only at: {local_save_path}")
except Exception as e:
    print(f"⚠ Failed to save to Drive ({e}). Files are at: {local_save_path}")

print("\nTOP RESULTS:")
if not df_final.empty:
    print(
        df_final
        .sort_values(['horizon', 'smape'])
        [['horizon', 'model_id', 'smape', 'rmse']]
        .to_string(index=False)
    )
else:
    print("No successful results to display.")

# =============================================================================
# OPTIONAL: EXPORT PER-EPOCH TRAINING CURVES
# =============================================================================

log_dir_local = os.path.join(LOCAL_BASE_PATH, "training_curves")
os.makedirs(log_dir_local, exist_ok=True)

for H in final_results:
    for m_name, r in final_results[H].items():
        if r.get("status") != "SUCCESS":
            continue
        logs = r.get("epoch_logs")
        if not logs:
            continue

        df_logs = pd.DataFrame(logs)
        fname = f"training_curve_H{H}_{m_name}.csv"

        # Save locally
        save_path_local = os.path.join(log_dir_local, fname)
        df_logs.to_csv(save_path_local, index=False)
        print(f"✓ Saved training curve locally: {save_path_local}")

        # Save to Drive if available
        if os.path.isdir('/content/drive'):
            log_dir_drive = os.path.join(DRIVE_BASE_PATH, "training_curves")
            os.makedirs(log_dir_drive, exist_ok=True)
            save_path_drive = os.path.join(log_dir_drive, fname)
            df_logs.to_csv(save_path_drive, index=False)
            print(f"✓ Saved training curve to Drive: {save_path_drive}")

STARTING FINAL TRAINING (Max Epochs: 720)
✓ Google Drive found. Saving to: /content/drive/MyDrive/Colab Notebooks/master_thesis_gdrive/benchmarks/best_models




HORIZON 1
Built 74 samples for horizon=1: X=(74, 12, 1811, 9), y=(74, 1, 1811, 1)
Built 17 samples for horizon=1: X=(17, 12, 1811, 9), y=(17, 1, 1811, 1)
Built 12 samples for horizon=1: X=(12, 12, 1811, 9), y=(12, 1, 1811, 1)

Training DCRNN...
[DCRNN] epoch 005/080 | train=0.4122 | val=0.3850 | lr=0.00300
[DCRNN] epoch 010/080 | train=0.3897 | val=0.3740 | lr=0.00300
[DCRNN] epoch 015/080 | train=0.3851 | val=0.3673 | lr=0.00300
[DCRNN] epoch 020/080 | train=0.3811 | val=0.3665 | lr=0.00300
[DCRNN] epoch 025/080 | train=0.3797 | val=0.3704 | lr=0.00300
[DCRNN] epoch 030/080 | train=0.3791 | val=0.3657 | lr=0.00150
[DCRNN] epoch 035/080 | train=0.3780 | val=0.3642 | lr=0.00150
[DCRNN] epoch 040/080 | train=0.3776 | val=0.3629 | lr=0.00075
[DCRNN] epoch 045/080 | train=0.3771 | val=0.3653 | lr=0.00075
[DCRNN] epoch 050/080 | train=0.3764 | val=0.3647 | lr=0.00038
[DCRNN] epoch 055/080 | train=0.3764 | val=0.3637 | lr=0.00019
  Early stopping at epoch 55 (best val=0.3629)
  > DONE. SMAP
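The `lr` column in the DCRNN log halves several times (0.003, 0.0015, 0.00075, 0.00038, 0.00019) while the validation loss stalls. That looks like a reduce-on-plateau schedule with factor 0.5. The scheduler is configured inside `train_one_model_optionA`, which is not shown, so this is an inference from the log.

The pure-Python sketch below illustrates the rule. It mirrors the idea behind PyTorch's `torch.optim.lr_scheduler.ReduceLROnPlateau`, not its exact bookkeeping. The `PlateauHalver` name and the `patience=5` value are assumptions.

```python
class PlateauHalver:
    """Multiply the learning rate by `factor` after more than `patience` epochs without improvement."""

    def __init__(self, lr, factor=0.5, patience=5, min_lr=1e-6):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_lr = min_lr
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss and return the learning rate to use next."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_epochs = 0
        return self.lr


sched = PlateauHalver(lr=0.003, factor=0.5, patience=5)
# A flat validation curve: no improvement after the first epoch
lrs = [sched.step(0.37) for _ in range(20)]
changes = [i + 1 for i in range(1, len(lrs)) if lrs[i] != lrs[i - 1]]

print(lrs[0], lrs[-1])  # 0.003 0.000375
print(changes)          # [7, 13, 19]  (epochs at which the lr was halved)
```

Halving shrinks the step size when progress stalls, so the model can settle into a minimum. Early stopping ends the run once even the reduced rate no longer helps.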

# Graph structure experiment
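The comparison below ranks graph variants by test SMAPE. `evaluate_model_optionA` is not shown, and SMAPE has several variants in the literature. A common one is 100/n · Σ 2|y - ŷ| / (|y| + |ŷ|), reported in percent and bounded in [0, 200].

The stdlib sketch below implements that variant alongside RMSE. It treats terms where both y and ŷ are zero as zero error; that convention is an assumption.

```python
import math


def smape(y_true, y_pred):
    """Symmetric MAPE in percent: mean of 2|y - yhat| / (|y| + |yhat|), times 100."""
    terms = []
    for y, yhat in zip(y_true, y_pred):
        denom = abs(y) + abs(yhat)
        terms.append(0.0 if denom == 0 else 2 * abs(y - yhat) / denom)
    return 100 * sum(terms) / len(terms)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((y - yhat) ** 2 for y, yhat in zip(y_true, y_pred)) / len(y_true))


y_true = [1.0, 2.0, 4.0, 0.0]
y_pred = [1.0, 1.0, 5.0, 0.0]
print(round(smape(y_true, y_pred), 2))  # 22.22
print(round(rmse(y_true, y_pred), 4))   # 0.7071
```

The same absolute error of 1 costs twice as much SMAPE on the small target (2 vs 1) as on the larger one (4 vs 5), while RMSE weights both equally. This is worth remembering when SMAPE and RMSE rank models differently.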

In [None]:
# =============================================================================
# GRAPH STRUCTURE COMPARISON (k=5 vs k=10 vs k=20)
# =============================================================================

import os
import numpy as np
import torch
import copy
import gc
from tqdm.auto import tqdm

# 1. Define Paths to your Graph Versions
BASE_DRIVE_PATH = "/content/drive/MyDrive/Colab Notebooks/master_thesis_gdrive"
GRAPH_PATHS = {
    5:  os.path.join(BASE_DRIVE_PATH, "sebs_keyword_graph_knn_k5"),
    10: os.path.join(BASE_DRIVE_PATH, "sebs_keyword_graph_knn"),     # Default folder
    20: os.path.join(BASE_DRIVE_PATH, "sebs_keyword_graph_knn_k20")
}

# 2. Settings for the Test
TEST_MODEL = 'DCRNN'
TEST_HORIZON = 1
TEST_EPOCHS = 30  # Short run to keep the three-way comparison cheap

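# DCRNN parameters for this comparison. Note: these are NOT the tuned values in
# BEST_CONFIGS[1]['DCRNN'] (hidden_channels=96, k_hops=2, learning_rate=0.003).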
TEST_PARAMS = {
    'hidden_channels': 32,
    'k_hops': 1,
    'learning_rate': 0.005
}

print("="*80)
print(f"COMPARING GRAPH STRUCTURES (k=5, 10, 20) with {TEST_MODEL}")
print("="*80)

# 3. Prepare Data (One time setup for Horizon 1)
print(f"Preparing data for Horizon {TEST_HORIZON}...")
seq_len = EXPERIMENT_CONFIG["sequence_length"]
target_col_idx = available_features.index(TARGET_COL)

Xtr_np, ytr_np = build_optionA_samples(X_train_scaled, train_weeks, TEST_HORIZON, seq_len, target_col_idx)
Xtr, ytr = to_torch_dataset(Xtr_np, ytr_np, device)

Xval_np, yval_np = build_optionA_samples(X_val_scaled, val_weeks, TEST_HORIZON, seq_len, target_col_idx)
Xval, yval = to_torch_dataset(Xval_np, yval_np, device)

Xte_np, yte_np = build_optionA_samples(X_test_scaled, test_combined_weeks, TEST_HORIZON, seq_len, target_col_idx)
Xte, yte = to_torch_dataset(Xte_np, yte_np, device)

# 4. Comparison Loop with Global Progress Bar
graph_results = []
total_epochs = len(GRAPH_PATHS) * TEST_EPOCHS

with tqdm(total=total_epochs, unit="epoch", desc="Comparing Graphs") as pbar:

    for k, folder_path in GRAPH_PATHS.items():
        tqdm.write("\n" + "-" * 60)
        tqdm.write(f"Testing Graph k={k}")
        tqdm.write(f"Loading from: {folder_path}")
        tqdm.write("-" * 60)

        try:
            # A. Load the Graph Files
            edge_index_path = os.path.join(folder_path, 'edge_index.npy')
            edge_weight_path = os.path.join(folder_path, 'edge_weight.npy')

            missing = [p for p in (edge_index_path, edge_weight_path) if not os.path.exists(p)]
            if missing:
                tqdm.write(f"⚠ File(s) not found: {missing}")
                # Skip this graph but keep the progress bar total consistent
                pbar.update(TEST_EPOCHS)
                continue

            ei = np.load(edge_index_path)
            ew = np.load(edge_weight_path)

            # Convert to Tensor
            ei_t = torch.from_numpy(ei).long().to(device)
            ew_t = torch.from_numpy(ew).float().to(device)

            tqdm.write(f"  > Loaded Graph: {ei.shape[1]} edges")

            # B. Configure Model with Best Params
            temp_config = copy.deepcopy(BASE_MODEL_CONFIGS)
            temp_config[TEST_MODEL]['params']['hidden_channels'] = TEST_PARAMS['hidden_channels']
            temp_config[TEST_MODEL]['params']['k_hops'] = TEST_PARAMS['k_hops']
            temp_config[TEST_MODEL]['training']['learning_rate'] = TEST_PARAMS['learning_rate']
            temp_config[TEST_MODEL]['training']['epochs'] = TEST_EPOCHS

            # C. Train (Passing the PBAR)
            train_result = train_one_model_optionA(
                TEST_MODEL, temp_config, EXPERIMENT_CONFIG,
                Xtr, ytr, Xval, yval,
                ei_t, ew_t,
                device=device,
                verbose=True,
                pbar=pbar # <--- Connects to global bar
            )

            # D. Evaluate
            metrics, _ = evaluate_model_optionA(
                train_result['model'], TEST_MODEL, Xte, yte,
                ei_t, ew_t,
                scaler=scaler, target_col_idx=target_col_idx, return_tensors=False
            )

            tqdm.write(f"  > Result (k={k}): Val Loss={train_result['best_val_loss']:.4f} | Test SMAPE={metrics['SMAPE']:.2f}%")

            graph_results.append({
                'k': k,
                'edges': ei.shape[1],
                'smape': metrics['SMAPE'],
                'rmse': metrics['RMSE']
            })

            # Cleanup
            del train_result
            gc.collect()
            if device == "cuda": torch.cuda.empty_cache()

        except Exception as e:
            tqdm.write(f"  > Failed: {e}")
            # Advance bar if run crashed
            pbar.update(TEST_EPOCHS)
            gc.collect()
            if device == "cuda": torch.cuda.empty_cache()

# 5. Summary
print("\n" + "="*80)
print("GRAPH STRUCTURE COMPARISON RESULTS")
print("="*80)
print(f"{'k':<5} | {'Edges':<10} | {'SMAPE (%)':<12} | {'RMSE':<10}")
print("-" * 45)
for res in sorted(graph_results, key=lambda x: x['smape']):
    print(f"{res['k']:<5} | {res['edges']:<10} | {res['smape']:<12.2f} | {res['rmse']:<10.4f}")

COMPARING GRAPH STRUCTURES (k=5, 10, 20) with DCRNN
Preparing data for Horizon 1...
Built 74 samples for horizon=1: X=(74, 12, 1811, 13), y=(74, 1, 1811, 1)
Built 17 samples for horizon=1: X=(17, 12, 1811, 13), y=(17, 1, 1811, 1)
Built 12 samples for horizon=1: X=(12, 12, 1811, 13), y=(12, 1, 1811, 1)




------------------------------------------------------------
Testing Graph k=5
Loading from: /content/drive/MyDrive/Colab Notebooks/master_thesis_gdrive/sebs_keyword_graph_knn_k5
------------------------------------------------------------
  > Loaded Graph: 9055 edges
  > Result (k=5): Val Loss=0.3637 | Test SMAPE=30.47%

------------------------------------------------------------
Testing Graph k=10
Loading from: /content/drive/MyDrive/Colab Notebooks/master_thesis_gdrive/sebs_keyword_graph_knn
------------------------------------------------------------
  > Loaded Graph: 18110 edges
  > Result (k=10): Val Loss=0.3628 | Test SMAPE=30.26%

------------------------------------------------------------
Testing Graph k=20
Loading from: /content/drive/MyDrive/Colab Notebooks/master_thesis_gdrive/sebs_keyword_graph_knn_k20
------------------------------------------------------------
  > Loaded Graph: 36220 edges
  > Result (k=20): Val Loss=0.3629 | Test SMAPE=30.39%

GRAPH STRUCTURE COMPARI
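The loaded edge counts equal 1811 × k exactly (9055, 18110, 36220). That is consistent with a directed kNN graph in which each of the 1811 keyword nodes keeps exactly k outgoing edges, with no symmetrisation and no self-loops. How the graphs in the `sebs_keyword_graph_knn*` folders were built is not shown in this notebook, so this is an inference from the counts.

The minimal stdlib sketch below shows that construction. It produces an `edge_index`-style pair of source/target lists from toy 2-D points. The `knn_edge_index` helper and the Euclidean metric are assumptions.

```python
import math
import random


def knn_edge_index(points, k):
    """Directed kNN graph: each node i gets edges i -> j to its k nearest other nodes."""
    src, dst = [], []
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        for _, j in dists[:k]:
            src.append(i)
            dst.append(j)
    return [src, dst]  # shape [2, num_edges], like PyG's edge_index


rng = random.Random(42)
points = [(rng.random(), rng.random()) for _ in range(50)]

for k in (5, 10, 20):
    edge_index = knn_edge_index(points, k)
    print(k, len(edge_index[0]))  # prints 5 250, then 10 500, then 20 1000
```

Symmetrising the graph (adding j -> i for every i -> j, then deduplicating) would generally give between n·k and 2·n·k edges. The exact multiples in the log therefore suggest the graphs were left directed.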



In [None]:
import platform
print(platform.python_version())

3.12.12
