## Graph Attention Network (GAT)

### Importing Dependencies

We import the required libraries and add the project's `src` directory to `sys.path` so that the helper functions in `utils` can be imported.


In [1]:
import networkx as nx
from torch_geometric.utils import from_networkx
import os
import sys

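# Add the project's src directory to sys.path (notebook location: src/models/gat)
src_path = os.path.abspath(os.path.join(os.getcwd(), "..", ".."))
if src_path not in sys.path:
    sys.path.append(src_path)

import import_ipynb  # Enables importing .ipynb notebooks as modules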
from utils.wrapper.networkx_to_pyg import networkx_to_pyg
from utils.add_dummy_node_features import add_dummy_node_features


### Loading and Preparing Graph Data from GraphML Files

This code snippet loads a series of bicycle traffic network graphs stored in GraphML format and prepares them for training with PyTorch Geometric (PyG). The objective is to convert each monthly graph into a format compatible with Graph Neural Networks (GNNs), ensuring that edge features are retained.

Each NetworkX graph is converted into a PyG `Data` object using a custom helper function `networkx_to_pyg`. This function ensures that essential edge attributes such as:

- `tracks` (the number of bicycles traveling from the starting to the ending point),
- `month` and `year`,
- `speed_rel` (relative speed),

are preserved during the conversion process.

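PyG expects edge attributes as a single tensor `edge_attr` of shape `[num_edges, num_edge_features]`, with row `k` describing the edge in column `k` of `edge_index`. Layers such as `GATv2Conv` read the edge attributes from this tensor.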

`data_list` contains multiple `torch_geometric.data.Data` objects, each representing a graph.
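
Because later cells select the prediction target by its column position in `edge_attr`, it helps to pin the column order down explicitly. The following is a minimal sketch of how a converter along the lines of `networkx_to_pyg` might stack edge attributes into tensors. The function name `edges_to_tensors`, the column list `EDGE_ATTR_COLUMNS`, and the toy edges are illustrative assumptions, not the project's actual helper or column order.

```python
import torch

# Hypothetical column order; the real helper may order or extend these differently.
EDGE_ATTR_COLUMNS = ["tracks", "month", "year", "speed_rel"]

def edges_to_tensors(edges, node_ids):
    """Convert (source, target, attrs) triples into COO edge_index and edge_attr tensors."""
    index_of = {node: i for i, node in enumerate(node_ids)}
    sources = [index_of[u] for u, _, _ in edges]
    targets = [index_of[v] for _, v, _ in edges]
    edge_index = torch.tensor([sources, targets], dtype=torch.long)  # [2, num_edges]
    edge_attr = torch.tensor(
        [[float(attrs[name]) for name in EDGE_ATTR_COLUMNS] for _, _, attrs in edges],
        dtype=torch.float,
    )  # [num_edges, num_edge_features]
    return edge_index, edge_attr

# Toy data, made up for illustration only
toy_nodes = ["A", "B", "C"]
toy_edges = [
    ("A", "B", {"tracks": 12, "month": 1, "year": 2023, "speed_rel": 0.9}),
    ("B", "C", {"tracks": 5, "month": 1, "year": 2023, "speed_rel": 1.1}),
]
edge_index, edge_attr = edges_to_tensors(toy_edges, toy_nodes)

# Look the target column up by name instead of hard-coding its position
tracks_col = EDGE_ATTR_COLUMNS.index("tracks")
tracks = edge_attr[:, tracks_col]
```

Looking up the column by name keeps the target selection correct even if attributes are added or reordered later.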

**NOTE:** So far, only the 2023 data has been processed. Once the implementation is finished, I will iterate over all available data.


In [2]:

# Initialize an empty list to store the PyTorch Geometric Data objects
data_list = []

# Iterate over the 12 monthly graph files (indices 0 to 11)
for i in range(12):
    # Build the path to the graph file
    path = f"../../../graphs/2023/bike_network_2023_{i}.graphml"
    
    # Read the graph from the GraphML file
    G_nx = nx.read_graphml(path)
    
    # Ensure the graph is loaded as a directed graph (DiGraph)
    G_nx = nx.DiGraph(G_nx)
    
    # Use the custom function to convert the NetworkX graph to a PyTorch Geometric Data object
    # The edge attributes (such as 'tracks', 'month', 'year', 'speed_rel') will be preserved
    data = networkx_to_pyg(G_nx)
    
    # Append the Data object to the list
    data_list.append(data)

# Check the result - number of graphs (Data objects) loaded
print(f"Number of graphs: {len(data_list)}")


Number of graphs: 12


### Adding Dummy Node Features to the Graphs

We add **dummy node features** so that every node has a **feature dimension**, even though the graphs originally carry no node features. GNN layers such as `GATv2Conv` require a node feature matrix `x` as input, so this step is needed before training.

**NOTE:** At a later stage, once we have implemented feature engineering, we will replace the dummy features with the engineered ones.
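
As a rough sketch of what such a helper could do, assume each graph object exposes `num_nodes` and accepts an `x` attribute, as PyG `Data` objects do. The name `add_constant_node_features` and the `SimpleNamespace` stand-in for `Data` are assumptions made to keep the example self-contained; the project's `add_dummy_node_features` may work differently.

```python
from types import SimpleNamespace

import torch

def add_constant_node_features(graphs, feature_dim=1, value=1.0):
    """Give every graph a constant node feature matrix of shape [num_nodes, feature_dim]."""
    for graph in graphs:
        graph.x = torch.full((graph.num_nodes, feature_dim), value, dtype=torch.float)
    return graphs

# Stand-ins for PyG Data objects; only num_nodes is needed here
toy_graphs = [SimpleNamespace(num_nodes=3), SimpleNamespace(num_nodes=5)]
toy_graphs = add_constant_node_features(toy_graphs, feature_dim=1, value=1.0)
```

With identical features on every node, the model can only tell nodes apart through the graph structure and the edge attributes, which is why these placeholders are meant to be temporary.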


In [3]:
data_list = add_dummy_node_features(data_list, feature_dim=1, value=1.0)


### Train-Validation Split

For predicting edge attributes (e.g., `tracks`), an 80/20 train/validation split was applied to the **existing edges within each graph**.

In our application, the **nodes represent physically existing bike stations**, which typically do not change or only change very infrequently. The aim of the analysis is to model the **connections between stations**, i.e., to understand and predict how many bicycles move along certain routes (in other words: edges with weights).

A **node-level split** (i.e., an 80/20 split of the nodes themselves) would mean that some stations would be completely unseen during training. This would not be meaningful because:

- The **stations themselves are not the prediction target**;
- It is the **relationships or transitions between the stations (edges)** that should be modeled;
- In deployment, **all stations are known** (they are physically installed in the system).

**NOTE:** GCN and GCN-GRU could also use these functions, so we may move them to the `utils` folder.


In [4]:
from torch_geometric.transforms import RandomLinkSplit

# Define the 80/20 train/val split
transform = RandomLinkSplit(
    num_val=0.2,  # 20% for validation
    num_test=0.0,  # no test set (test set will be 2024 data)
    is_undirected=False,  # Set to False if your graphs are directed
    split_labels=False,
)

# Apply the transform to each graph in your list
train_val_data_list = [transform(data) for data in data_list]

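# Each entry is a (train_data, val_data, test_data) tuple; the test split is empty here
for i, (train_data, val_data, _) in enumerate(train_val_data_list):
    print(f"Graph {i}:")
    print(f"  Train edges: {train_data.edge_index.size(1)}")
    # Note: edge_label_index also contains sampled negative edges (default neg_sampling_ratio=1.0)
    print(f"  Val edges:   {val_data.edge_label_index.size(1)}")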



Graph 0:
  Train edges: 16671
  Val edges:   8334
Graph 1:
  Train edges: 17980
  Val edges:   8988
Graph 2:
  Train edges: 18066
  Val edges:   9032
Graph 3:
  Train edges: 22253
  Val edges:   11126
Graph 4:
  Train edges: 26821
  Val edges:   13410
Graph 5:
  Train edges: 38612
  Val edges:   19304
Graph 6:
  Train edges: 31458
  Val edges:   15728
Graph 7:
  Train edges: 30864
  Val edges:   15432
Graph 8:
  Train edges: 30796
  Val edges:   15396
Graph 9:
  Train edges: 24759
  Val edges:   12378
Graph 10:
  Train edges: 19477
  Val edges:   9738
Graph 11:
  Train edges: 14930
  Val edges:   7464
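
The validation counts above are about half of the training counts rather than a quarter. That is consistent with `RandomLinkSplit` appending sampled negative (non-existent) edges to `edge_label_index`, which it does by default with `neg_sampling_ratio=1.0`. For Graph 0, 16671 + 4167 = 20838 edges in total, 20% of which is 4167, and 2 × 4167 = 8334. Negative edges have no `tracks` value, so for regression on an edge attribute a plain permutation split of the existing edges may be a better fit. The sketch below assumes that column `k` of `edge_index` corresponds to row `k` of `edge_attr`; `split_edges` is a hypothetical helper, not part of PyG.

```python
import torch

def split_edges(edge_index, edge_attr, val_ratio=0.2, seed=0):
    """Randomly split existing edges into train and validation sets, keeping attributes aligned."""
    num_edges = edge_index.size(1)
    generator = torch.Generator().manual_seed(seed)
    perm = torch.randperm(num_edges, generator=generator)
    num_val = int(val_ratio * num_edges)
    val_idx, train_idx = perm[:num_val], perm[num_val:]
    train = (edge_index[:, train_idx], edge_attr[train_idx])
    val = (edge_index[:, val_idx], edge_attr[val_idx])
    return train, val

# Toy graph with 10 directed edges and 2 edge features (random values for illustration)
toy_edge_index = torch.randint(0, 5, (2, 10))
toy_edge_attr = torch.rand(10, 2)
(train_ei, train_ea), (val_ei, val_ea) = split_edges(toy_edge_index, toy_edge_attr)
```

In such a setup, the training edges would be used for message passing, and the held-out edges would only be scored by the edge-level prediction head.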


### Implementing a GAT Model

We implement a **GATv2** model, a variant of the Graph Attention Network (GAT). It uses an **attention mechanism** to weight each neighbor's contribution when aggregating node features, and it can take **edge attributes** into account when computing the attention scores.

The GATv2 model expects the following **input**:
- `x`: Node features → `data.x`
- `edge_index`: Edge list → `data.edge_index`
- `edge_attr`: Edge attributes → `data.edge_attr`

These values are taken from the `Data` object when calling the model.

**Output**: The model currently returns node representations (node embeddings) – a tensor with one row per node and one column per feature at the output.
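
For reference, the way edge attributes enter the attention computation can be written as follows. This is a sketch of the formulation used by `GATv2Conv` when `edge_dim` is set. The symbols are notation introduced here: $\mathbf{x}_i$ are node features, $\mathbf{e}_{ij}$ are the attributes of the edge from $j$ to $i$, $\Theta_s$, $\Theta_t$, $\Theta_e$ are learnable weight matrices, and $\mathbf{a}$ is a learnable vector. Details such as the handling of self-loops may differ between PyG versions.

$$
s_{ij} = \mathbf{a}^{\top}\,\mathrm{LeakyReLU}\!\left(\Theta_s \mathbf{x}_i + \Theta_t \mathbf{x}_j + \Theta_e \mathbf{e}_{ij}\right),
\qquad
\alpha_{ij} = \frac{\exp(s_{ij})}{\sum_{k \in \mathcal{N}(i) \cup \{i\}} \exp(s_{ik})},
\qquad
\mathbf{x}_i' = \sum_{j \in \mathcal{N}(i) \cup \{i\}} \alpha_{ij}\, \Theta_t \mathbf{x}_j
$$

The edge attributes therefore influence how strongly each neighbor is weighted, while the aggregated messages themselves are built from node features.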


In [5]:
import torch
import torch.nn.functional as F
from torch_geometric.nn import GATv2Conv

class GATv2(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels, edge_dim, heads=1):
        super(GATv2, self).__init__()

        # First GATv2 layer, with edge attributes
        self.gat1 = GATv2Conv(in_channels, hidden_channels, heads=heads, edge_dim=edge_dim)

        # Second GATv2 layer, output dimension = out_channels
        self.gat2 = GATv2Conv(hidden_channels * heads, out_channels, heads=1, edge_dim=edge_dim)

    def forward(self, x, edge_index, edge_attr):
        # Apply first GATv2 layer with edge attributes
        x = self.gat1(x, edge_index, edge_attr)
        x = F.elu(x)

        # Apply second GATv2 layer
        x = self.gat2(x, edge_index, edge_attr)
        return x


### Advancing to Encoder-Decoder Architecture

As we can see, the original GATv2 model is primarily designed to learn **node representations (node embeddings)**. These embeddings are well-suited for tasks like node classification, but they are **not directly applicable for predicting edge attributes** – such as our target edge weight `tracks`.

Since our goal is to **predict edge values**, a pure node embedding model is not sufficient. Therefore, we extend the GATv2 model by adding an **additional layer** that takes the computed node embeddings and produces a prediction for each edge.

Our final model follows the common **encoder-decoder architecture**:

- **Encoder:**  
  - We use the GATv2 model as an encoder to compute **informative node embeddings** from the input node features, graph structure, and edge attributes.

- **Decoder:**  
  - As a decoder, we use a **small multilayer perceptron (MLP)** that combines the embeddings of each edge's source and target node (via concatenation).  
  - This MLP then outputs a **single scalar per edge** – which serves as our prediction for the edge weight `tracks`.
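
The decoder step can be illustrated in isolation. This sketch uses random tensors in place of real GATv2 embeddings; the sizes (4 nodes, 8-dimensional embeddings, 3 edges) are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
num_nodes, emb_dim = 4, 8
node_emb = torch.randn(num_nodes, emb_dim)       # stand-in for the encoder output
toy_edge_index = torch.tensor([[0, 1, 2],        # source nodes
                               [1, 2, 3]])       # target nodes

decoder = nn.Sequential(
    nn.Linear(2 * emb_dim, emb_dim),
    nn.ReLU(),
    nn.Linear(emb_dim, 1),
)

src, dst = toy_edge_index
edge_repr = torch.cat([node_emb[src], node_emb[dst]], dim=1)  # [num_edges, 2 * emb_dim]
edge_pred = decoder(edge_repr)                                # [num_edges, 1]
```

Because concatenation is order-sensitive, the edge (u, v) and the reverse edge (v, u) receive different inputs, which suits a directed graph.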




In [6]:
import torch.nn as nn

class GATv2EdgePredictor(nn.Module):
    def __init__(self, 
                 in_channels, 
                 hidden_channels, 
                 out_channels, 
                 edge_dim, 
                 heads=1):
        super(GATv2EdgePredictor, self).__init__()

        # 1. GATv2 model for computing node embeddings
        self.gnn = GATv2(in_channels, hidden_channels, out_channels, edge_dim, heads)

        # 2. Edge MLP to predict edge attributes (e.g., "tracks")
        self.edge_mlp = nn.Sequential(
            nn.Linear(out_channels * 2, out_channels),
            nn.ReLU(),
            nn.Linear(out_channels, 1)  # Output: a single scalar per edge
        )

    def forward(self, data):
        """
        Args:
            data: PyTorch Geometric Data object with attributes:
                  - x: node features
                  - edge_index: edge connectivity (COO format)
                  - edge_attr: edge attributes

        Returns:
            pred: Tensor of shape [num_edges, 1] with predicted edge weights (e.g., "tracks")
        """
        # Compute node embeddings using the GATv2 model
        x = self.gnn(data.x, data.edge_index, data.edge_attr)  # [num_nodes, out_channels]

        # Construct edge representations by concatenating source and target node embeddings
        row, col = data.edge_index  # source & target node indices for each edge
        edge_inputs = torch.cat([x[row], x[col]], dim=1)  # [num_edges, out_channels * 2]

        # Predict edge weights
        pred = self.edge_mlp(edge_inputs)  # [num_edges, 1]
        return pred


### Model Configuration

- **in_channels=1**: In our setup, we used the `add_dummy_node_features(...)` function, which adds dummy features with one dimension to all nodes. **NOTE:** This will be adjusted later once we implement additional feature engineering.

- **heads=1**: Initially, we keep the model simple by using a single attention head. With multiple heads, the first layer concatenates the head outputs, so its output dimension becomes `hidden_channels * heads`; the `GATv2` class above already sizes the input of the second layer accordingly.

- **hidden_channels=32 / out_channels=32**: We chose 32 to strike a good balance between computational cost and model capacity.

- **Adam**: A widely used optimizer. The learning rate is a sensitive hyperparameter; we start with 0.01.

- **MSELoss**: We use Mean Squared Error (MSE) Loss as the loss function since the edge weight prediction is a regression problem. We might also try using L1 Loss.
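
To see how the two candidate losses weigh errors differently, consider a toy example with one large miss. The values are made up for illustration.

```python
import torch
from torch.nn import L1Loss, MSELoss

toy_pred = torch.tensor([[1.0], [2.0], [10.0]])
toy_target = torch.tensor([[1.0], [2.0], [2.0]])  # only the last prediction is off, by 8

mse = MSELoss()(toy_pred, toy_target)  # mean of squared errors: (0 + 0 + 64) / 3
mae = L1Loss()(toy_pred, toy_target)   # mean of absolute errors: (0 + 0 + 8) / 3
print(f"MSE: {mse.item():.3f} | L1: {mae.item():.3f}")
```

MSE penalizes the single large error much more heavily, so a few very busy routes could dominate training, whereas L1 is less sensitive to such outliers.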

**NOTE:** We may later perform hyperparameter optimization (HPO) on hyperparameters such as the learning rate using random search.
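
A random search over the learning rate could look roughly like this. `train_and_validate` is a hypothetical placeholder for a function that trains the model with a given learning rate and returns the validation loss; here it is replaced by a made-up toy objective so that the sketch runs on its own. The search range and the number of trials are likewise assumptions.

```python
import math
import random

def sample_log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniformly distributed between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

def train_and_validate(lr):
    """Hypothetical stand-in: returns a fake validation loss that is lowest near lr = 0.003."""
    return (math.log10(lr) - math.log10(0.003)) ** 2

rng = random.Random(42)
trials = []
for _ in range(20):
    lr = sample_log_uniform(1e-4, 1e-1, rng)
    trials.append((train_and_validate(lr), lr))

best_loss, best_lr = min(trials)
print(f"Best learning rate: {best_lr:.5f} (validation loss {best_loss:.4f})")
```

Sampling on a logarithmic scale spreads the trials evenly across orders of magnitude, which is usually what matters for learning rates.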



In [7]:
from torch.nn import MSELoss
from torch.optim import Adam

# Initialize the model (adjust input values as needed)
model = GATv2EdgePredictor(
    in_channels=1,                     # Number of input node features (e.g., data.num_node_features)
    hidden_channels=32,                # Size of hidden representations
    out_channels=32,                   # Output size of node embeddings
    edge_dim=data.edge_attr.shape[1],  # Dimensionality of edge attributes (`data` is the last graph loaded above)
    heads=1                            # Number of attention heads
)

# Define loss function and optimizer
criterion = MSELoss()                          # Mean Squared Error for regression
optimizer = Adam(model.parameters(), lr=0.01)  # Adam optimizer with learning rate 0.01

print("Model initialized, loss function and optimizer defined.")


Model initialized, loss function and optimizer defined.


### Training and Validation Loop

We implemented **early stopping** to prevent overfitting and unnecessary training time. The validation loss is evaluated every 50 epochs. If it improves, the model parameters are saved and the counter of checks without improvement is reset to `0`. If no improvement is observed for `early_stopping_patience` consecutive validation checks, training stops early; the best parameters remain available in the saved checkpoint.

- **Consider switching to minibatches**: In the future, we might switch to using minibatches to increase training efficiency.
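
The early-stopping bookkeeping described in this section can be factored into a small helper. This is a sketch only; the class name `EarlyStopping` and its interface are assumptions, not an existing PyTorch API. Patience here counts validation checks, which equals epochs only if validation runs every epoch.

```python
class EarlyStopping:
    """Track the best validation loss and signal when to stop."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best_loss = float("inf")
        self.checks_without_improvement = 0

    def step(self, val_loss):
        """Record one validation result. Returns (improved, should_stop)."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.checks_without_improvement = 0
            return True, False
        self.checks_without_improvement += 1
        return False, self.checks_without_improvement >= self.patience

# Toy validation curve (made-up numbers): improves twice, then plateaus
stopper = EarlyStopping(patience=3)
history = []
for check, val_loss in enumerate([5.0, 4.0, 4.0, 4.1, 4.0, 3.9], start=1):
    improved, should_stop = stopper.step(val_loss)
    history.append((check, improved, should_stop))
    if should_stop:
        break
```

In a training loop, `improved` would trigger saving a checkpoint and `should_stop` would break out of the loop.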


In [8]:
num_epochs = 1000
early_stopping_patience = 10  # Number of validation checks (not epochs) without improvement before stopping
best_val_loss = float('inf')  # Initialize with a very high value
epochs_without_improvement = 0  # Counter for epochs without improvement

for epoch in range(1, num_epochs + 1):
    model.train()
    optimizer.zero_grad()
    
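    # Forward pass: predict edge values
    # (train_data still refers to the last graph from the split loop above, i.e. Graph 11)
    pred = model(train_data)  # Shape: [num_edges, 1]

    # Ground-truth edge weights: 'tracks' is assumed to be stored in column 4 of edge_attr.
    # Caution: this column is also part of the edge_attr passed to the model as input.
    target = train_data.edge_attr[:, 4].unsqueeze(1)  # Shape: [num_edges, 1], matching pred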

    # Compute loss between predictions and ground truth
    loss = criterion(pred, target)

    # Backpropagation and parameter update
    loss.backward()
    optimizer.step()

    # Logging progress every 10 epochs (training loss)
    if epoch % 10 == 0 or epoch == 1:
        print(f"Epoch {epoch:03d} | Training Loss: {loss.item():.4f}")

    # Validation every 50 epochs
    if epoch % 50 == 0:
        model.eval()  # Set the model to evaluation mode
        with torch.no_grad():  # Disable gradient computation for validation
            # Caution: after RandomLinkSplit, val_data.edge_index and val_data.edge_attr hold the
            # training (message-passing) edges; the held-out edges are in val_data.edge_label_index
            val_pred = model(val_data)
            val_target = val_data.edge_attr[:, 4].unsqueeze(1)
            val_loss = criterion(val_pred, val_target).item()

        # Logging progress for validation loss every 50 epochs
        print(f"Epoch {epoch:03d} | Validation Loss: {val_loss:.4f}")

        # Early stopping check: if validation loss improves
        if val_loss < best_val_loss:
            best_val_loss = val_loss  # Update best validation loss
            epochs_without_improvement = 0  # Reset the counter
            # Save the model when improvement occurs (create the target folder if needed)
            os.makedirs('best_model', exist_ok=True)
            torch.save(model.state_dict(), 'best_model/best_model.pth')
        else:
            epochs_without_improvement += 1
        
        # If no improvement for 'early_stopping_patience' epochs, stop training
        if epochs_without_improvement >= early_stopping_patience:
            print(f"Early stopping triggered at epoch {epoch}.")
            break


Epoch 001 | Training Loss: 333.0140
Epoch 010 | Training Loss: 257.1412
Epoch 020 | Training Loss: 197.2691
Epoch 030 | Training Loss: 197.4322
Epoch 040 | Training Loss: 195.1697
Epoch 050 | Training Loss: 194.9637
Epoch 050 | Validation Loss: 194.9500
Epoch 060 | Training Loss: 194.9608
Epoch 070 | Training Loss: 194.9910
Epoch 080 | Training Loss: 194.9718
Epoch 090 | Training Loss: 194.9391
Epoch 100 | Training Loss: 194.9320
Epoch 100 | Validation Loss: 194.9332
Epoch 110 | Training Loss: 194.9335
Epoch 120 | Training Loss: 194.9322
Epoch 130 | Training Loss: 194.9319
Epoch 140 | Training Loss: 194.9319
Epoch 150 | Training Loss: 194.9319
Epoch 150 | Validation Loss: 194.9319
Epoch 160 | Training Loss: 194.9319
Epoch 170 | Training Loss: 194.9319
Epoch 180 | Training Loss: 194.9319
Epoch 190 | Training Loss: 194.9319
Epoch 200 | Training Loss: 194.9319
Epoch 200 | Validation Loss: 194.9319
Epoch 210 | Training Loss: 194.9319
Epoch 220 | Training Loss: 194.9319
Epoch 230 | Training