### Importing Necessary Libraries
In this cell, we are importing all the required libraries, such as:
- `numpy` for numerical computations
- `torch` and `torch_geometric` for deep learning and graph neural networks
- `matplotlib` and `seaborn` for data visualization
- `tqdm` for progress tracking

In [1]:
## Standard libraries
import os
import json
import math
import numpy as np
import time

## Imports for plotting
import matplotlib.pyplot as plt
%matplotlib inline
from IPython.display import set_matplotlib_formats
set_matplotlib_formats('svg', 'pdf') # For export
from matplotlib.colors import to_rgb
import matplotlib
matplotlib.rcParams['lines.linewidth'] = 2.0
import seaborn as sns
sns.reset_orig()
sns.set()

## Progress bar
from tqdm.notebook import tqdm

## PyTorch
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.utils.data as data
import torch.optim as optim
# Torchvision
import torchvision
from torchvision.datasets import CIFAR10
from torchvision import transforms
# PyTorch Lightning
try:
    import pytorch_lightning as pl
except ModuleNotFoundError: # Google Colab does not have PyTorch Lightning installed by default. Hence, we do it here if necessary
    !pip install --quiet pytorch-lightning>=1.4
    import pytorch_lightning as pl
from pytorch_lightning.callbacks import LearningRateMonitor, ModelCheckpoint

# Path to the folder where the datasets are/should be downloaded (e.g. CIFAR10)
DATASET_PATH = "../data"
# Path to the folder where the pretrained models are saved
CHECKPOINT_PATH = "../saved_models/tutorial7"

# Setting the seed
pl.seed_everything(42)

# Ensure that all operations are deterministic on GPU (if used) for reproducibility
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")
print(device)

  set_matplotlib_formats('svg', 'pdf') # For export
Seed set to 42


cpu


In [2]:
import urllib.request
from urllib.error import HTTPError
# Github URL where saved models are stored for this tutorial
base_url = "https://raw.githubusercontent.com/phlippe/saved_models/main/tutorial7/"
# Files to download
pretrained_files = ["NodeLevelMLP.ckpt", "NodeLevelGNN.ckpt", "GraphLevelGraphConv.ckpt"]

# Create checkpoint path if it doesn't exist yet
os.makedirs(CHECKPOINT_PATH, exist_ok=True)

# For each file, check whether it already exists. If not, try downloading it.
for file_name in pretrained_files:
    file_path = os.path.join(CHECKPOINT_PATH, file_name)
    if "/" in file_name:
        os.makedirs(file_path.rsplit("/",1)[0], exist_ok=True)
    if not os.path.isfile(file_path):
        file_url = base_url + file_name
        print(f"Downloading {file_url}...")
        try:
            urllib.request.urlretrieve(file_url, file_path)
        except HTTPError as e:
            print("Something went wrong. Please try to download the file from the GDrive folder, or contact the author with the full output including the following error:\n", e)

### GCN Layer Explanation

This code defines a Graph Convolutional Network (GCN) layer that updates node features by aggregating information from neighboring nodes in a graph.

- **Initialization (`__init__`)**:  
  A linear layer (`nn.Linear`) is defined to transform the input node features from `c_in` dimensions to `c_out` dimensions.

- **Forward Method**:  
  The forward pass takes `node_feats` (node features) and `adj_matrix` (graph adjacency matrix) as inputs.
  - **Neighbor Calculation**:  
    The sum of edges for each node is computed (`adj_matrix.sum(dim=-1)`), which gives the number of neighbors.
  - **Projection**:  
    Node features are transformed using a linear projection.
  - **Aggregation**:  
    Node features are aggregated from neighbors using matrix multiplication (`torch.bmm`).
  - **Normalization**:  
    The aggregated features are normalized by the number of neighbors.

- **Output**:  
  Returns the updated node features with shape `[batch_size, num_nodes, c_out]`.


In [3]:
class GCNLayer(nn.Module):

    def __init__(self, c_in, c_out):
        super().__init__()
        self.projection = nn.Linear(c_in, c_out)

    def forward(self, node_feats, adj_matrix):
        """
        Inputs:
            node_feats - Tensor with node features of shape [batch_size, num_nodes, c_in]
            adj_matrix - Batch of adjacency matrices of the graph. If there is an edge from i to j, adj_matrix[b,i,j]=1 else 0.
                         Supports directed edges by non-symmetric matrices. Assumes to already have added the identity connections.
                         Shape: [batch_size, num_nodes, num_nodes]
        """
        # Num neighbours = number of incoming edges
        num_neighbours = adj_matrix.sum(dim=-1, keepdims=True)
        node_feats = self.projection(node_feats)
        node_feats = torch.bmm(adj_matrix, node_feats)
        node_feats = node_feats / num_neighbours
        return node_feats

### Node Features and Adjacency Matrix Setup

This code creates:
1. **Node Features (`node_feats`)**: A tensor representing 4 nodes, each with 2 features, in a single batch. The tensor is created using `torch.arange(8)` and reshaped to `[1, 4, 2]`.
   
2. **Adjacency Matrix (`adj_matrix`)**: A 4x4 adjacency matrix representing the connections between nodes in a graph. A value of `1` indicates an edge between nodes, and `0` indicates no edge. The adjacency matrix is reshaped to include a batch dimension, with shape `[1, 4, 4]`.

This setup is used to represent a small graph with 4 nodes and the connectivity between them.


In [4]:
node_feats = torch.arange(8, dtype=torch.float32).view(1, 4, 2)
adj_matrix = torch.Tensor([[[1, 1, 0, 0],
                            [1, 1, 1, 1],
                            [0, 1, 1, 1],
                            [0, 1, 1, 1]]])

print("Node features:\n", node_feats)
print("\nAdjacency matrix:\n", adj_matrix)


Node features:
 tensor([[[0., 1.],
         [2., 3.],
         [4., 5.],
         [6., 7.]]])

Adjacency matrix:
 tensor([[[1., 1., 0., 0.],
         [1., 1., 1., 1.],
         [0., 1., 1., 1.],
         [0., 1., 1., 1.]]])


### GCN Layer Forward Pass

In this section, a GCN layer is initialized and applied to the `node_feats` and `adj_matrix`. 

1. **Layer Initialization**:  
   A `GCNLayer` is created with 2 input features (`c_in=2`) and 2 output features (`c_out=2`). 
   The layer's weight and bias for the linear projection are manually set:
   - Weight matrix is set to the identity matrix `[[1, 0], [0, 1]]`.
   - Bias vector is set to `[0, 0]`, meaning no bias is applied.

2. **Forward Pass**:  
   Using `torch.no_grad()`, the GCN layer processes the input features (`node_feats`) and adjacency matrix (`adj_matrix`). The output node features are calculated based on the graph structure and the feature transformation defined by the layer.

3. **Output**:  
   The adjacency matrix, input features, and output features are printed for inspection. The output features represent the updated node representations after the GCN layer.



In [5]:
layer = GCNLayer(c_in=2, c_out=2)
layer.projection.weight.data = torch.Tensor([[1., 0.], [0., 1.]])
layer.projection.bias.data = torch.Tensor([0., 0.])

with torch.no_grad():
    out_feats = layer(node_feats, adj_matrix)

print("Adjacency matrix", adj_matrix)
print("Input features", node_feats)
print("Output features", out_feats)

Adjacency matrix tensor([[[1., 1., 0., 0.],
         [1., 1., 1., 1.],
         [0., 1., 1., 1.],
         [0., 1., 1., 1.]]])
Input features tensor([[[0., 1.],
         [2., 3.],
         [4., 5.],
         [6., 7.]]])
Output features tensor([[[1., 2.],
         [3., 4.],
         [4., 5.],
         [4., 5.]]])


### GAT Layer Implementation

This code defines a **Graph Attention Network (GAT) layer**. GAT is an extension of GCN that introduces attention mechanisms to weight the importance of neighboring nodes for feature aggregation.

#### Key Elements:
1. **Initialization (`__init__`)**:
    - **Input Parameters**:
      - `c_in`: Dimensionality of input node features.
      - `c_out`: Dimensionality of output node features.
      - `num_heads`: Number of attention heads used in parallel.
      - `concat_heads`: Whether to concatenate or average the output of the attention heads.
      - `alpha`: Negative slope for the LeakyReLU activation function used in the attention mechanism.
    - **Projection Layer**: A linear projection is applied to node features to transform them into output space (`nn.Linear(c_in, c_out * num_heads)`).
    - **Attention Mechanism (`self.a`)**: Learnable parameters for computing attention scores between nodes, applied independently for each head.
    - **Initialization**: Weights and attention parameters are initialized using Xavier initialization (`nn.init.xavier_uniform_`).

2. **Forward Method**:
    The forward pass involves the following steps:
    - **Linear Projection**: Node features are transformed using the linear projection, and the output is divided between the attention heads.
    - **Attention Calculation**:
      - The adjacency matrix is used to retrieve the edges (non-zero values). For each edge, attention scores are computed between the source and target nodes.
      - Attention logits are calculated using the dot product between the transformed node features, followed by a LeakyReLU activation.
    - **Attention Weighting**: The attention logits are used to create an attention matrix, and a softmax is applied to normalize the attention scores. These scores determine the weight of each neighboring node during aggregation.
    - **Feature Aggregation**: Node features are aggregated using the attention scores (`torch.einsum`). Each node aggregates its neighbors' features weighted by the attention scores.
    - **Concatenation or Averaging**: If `concat_heads=True`, the features from all heads are concatenated. Otherwise, the features are averaged across heads.

3. **Return**:
    The layer returns the updated node features, which have been aggregated and transformed based on the attention mechanism.


In [6]:
class GATLayer(nn.Module):

    def __init__(self, c_in, c_out, num_heads=1, concat_heads=True, alpha=0.2):
        """
        Inputs:
            c_in - Dimensionality of input features
            c_out - Dimensionality of output features
            num_heads - Number of heads, i.e. attention mechanisms to apply in parallel. The
                        output features are equally split up over the heads if concat_heads=True.
            concat_heads - If True, the output of the different heads is concatenated instead of averaged.
            alpha - Negative slope of the LeakyReLU activation.
        """
        super().__init__()
        self.num_heads = num_heads
        self.concat_heads = concat_heads
        if self.concat_heads:
            assert c_out % num_heads == 0, "Number of output features must be a multiple of the count of heads."
            c_out = c_out // num_heads

        # Sub-modules and parameters needed in the layer
        self.projection = nn.Linear(c_in, c_out * num_heads)
        self.a = nn.Parameter(torch.Tensor(num_heads, 2 * c_out)) # One per head
        self.leakyrelu = nn.LeakyReLU(alpha)

        # Initialization from the original implementation
        nn.init.xavier_uniform_(self.projection.weight.data, gain=1.414)
        nn.init.xavier_uniform_(self.a.data, gain=1.414)

    def forward(self, node_feats, adj_matrix, print_attn_probs=False):
        """
        Inputs:
            node_feats - Input features of the node. Shape: [batch_size, c_in]
            adj_matrix - Adjacency matrix including self-connections. Shape: [batch_size, num_nodes, num_nodes]
            print_attn_probs - If True, the attention weights are printed during the forward pass (for debugging purposes)
        """
        batch_size, num_nodes = node_feats.size(0), node_feats.size(1)

        # Apply linear layer and sort nodes by head
        node_feats = self.projection(node_feats)
        node_feats = node_feats.view(batch_size, num_nodes, self.num_heads, -1)

        # We need to calculate the attention logits for every edge in the adjacency matrix
        # Doing this on all possible combinations of nodes is very expensive
        # => Create a tensor of [W*h_i||W*h_j] with i and j being the indices of all edges
        edges = adj_matrix.nonzero(as_tuple=False) # Returns indices where the adjacency matrix is not 0 => edges
        node_feats_flat = node_feats.view(batch_size * num_nodes, self.num_heads, -1)
        edge_indices_row = edges[:,0] * num_nodes + edges[:,1]
        edge_indices_col = edges[:,0] * num_nodes + edges[:,2]
        a_input = torch.cat([
            torch.index_select(input=node_feats_flat, index=edge_indices_row, dim=0),
            torch.index_select(input=node_feats_flat, index=edge_indices_col, dim=0)
        ], dim=-1) # Index select returns a tensor with node_feats_flat being indexed at the desired positions along dim=0

        # Calculate attention MLP output (independent for each head)
        attn_logits = torch.einsum('bhc,hc->bh', a_input, self.a)
        attn_logits = self.leakyrelu(attn_logits)

        # Map list of attention values back into a matrix
        attn_matrix = attn_logits.new_zeros(adj_matrix.shape+(self.num_heads,)).fill_(-9e15)
        attn_matrix[adj_matrix[...,None].repeat(1,1,1,self.num_heads) == 1] = attn_logits.reshape(-1)

        # Weighted average of attention
        attn_probs = F.softmax(attn_matrix, dim=2)
        if print_attn_probs:
            print("Attention probs\n", attn_probs.permute(0, 3, 1, 2))
        node_feats = torch.einsum('bijh,bjhc->bihc', attn_probs, node_feats)

        # If heads should be concatenated, we can do this by reshaping. Otherwise, take mean
        if self.concat_heads:
            node_feats = node_feats.reshape(batch_size, num_nodes, -1)
        else:
            node_feats = node_feats.mean(dim=2)

        return node_feats

### GAT Layer Forward Pass with Multiple Attention Heads

1. **Layer Initialization**:  
   A `GATLayer` is initialized with 2 input features, 2 output features, and 2 attention heads (`num_heads=2`). The layer's projection weights and biases are manually set:
   - Weight matrix is set to the identity matrix `[[1, 0], [0, 1]]`, so no feature transformation is applied.
   - Bias is set to `[0, 0]`, meaning no additional bias is applied.
   - The attention mechanism parameters (`self.a`) are manually set to specific values for both heads.

2. **Forward Pass**:  
   The forward pass is performed without gradients (`torch.no_grad()`), and the attention probabilities are printed (`print_attn_probs=True`). This operation calculates the attention scores between nodes and aggregates features accordingly:
   - The input node features are projected and transformed.
   - Attention scores are computed for the edges in the adjacency matrix.
   - Features from neighboring nodes are aggregated using the computed attention scores.

3. **Output**:
   - **Adjacency Matrix**: The connectivity of the graph is displayed, showing which nodes are connected.
   - **Input Features**: The node features before applying the GAT layer.
   - **Output Features**: The updated node features after applying the attention-based aggregation from neighboring nodes.


In [7]:
layer = GATLayer(2, 2, num_heads=2)
layer.projection.weight.data = torch.Tensor([[1., 0.], [0., 1.]])
layer.projection.bias.data = torch.Tensor([0., 0.])
layer.a.data = torch.Tensor([[-0.2, 0.3], [0.1, -0.1]])

with torch.no_grad():
    out_feats = layer(node_feats, adj_matrix, print_attn_probs=True)

print("Adjacency matrix", adj_matrix)
print("Input features", node_feats)
print("Output features", out_feats)

Attention probs
 tensor([[[[0.3543, 0.6457, 0.0000, 0.0000],
          [0.1096, 0.1450, 0.2642, 0.4813],
          [0.0000, 0.1858, 0.2885, 0.5257],
          [0.0000, 0.2391, 0.2696, 0.4913]],

         [[0.5100, 0.4900, 0.0000, 0.0000],
          [0.2975, 0.2436, 0.2340, 0.2249],
          [0.0000, 0.3838, 0.3142, 0.3019],
          [0.0000, 0.4018, 0.3289, 0.2693]]]])
Adjacency matrix tensor([[[1., 1., 0., 0.],
         [1., 1., 1., 1.],
         [0., 1., 1., 1.],
         [0., 1., 1., 1.]]])
Input features tensor([[[0., 1.],
         [2., 3.],
         [4., 5.],
         [6., 7.]]])
Output features tensor([[[1.2913, 1.9800],
         [4.2344, 3.7725],
         [4.6798, 4.8362],
         [4.5043, 4.7351]]])


### Importing Necessary Libraries

In [8]:
import torch_geometric
import torch_geometric.nn as geom_nn
import torch_geometric.data as geom_data


### GNN Layer Mapping


In [9]:
gnn_layer_by_name = {
    "GCN": geom_nn.GCNConv,
    "GAT": geom_nn.GATConv,
    "GraphConv": geom_nn.GraphConv
}

### Loading the Cora Dataset
Here, the Cora dataset is loaded using the `torch_geometric` library. The Cora dataset is often used in graph-based deep learning tasks. The dataset is a citation network where nodes represent documents and edges represent citations between them.

In [10]:
cora_dataset = torch_geometric.datasets.Planetoid(root=DATASET_PATH, name="Cora")

Processing...
Done!


In [11]:
cora_dataset[0]

Data(x=[2708, 1433], edge_index=[2, 10556], y=[2708], train_mask=[2708], val_mask=[2708], test_mask=[2708])

### GNNModel: General Graph Neural Network Model

This class defines a flexible Graph Neural Network (GNN) model that can utilize different GNN layers like GCN, GAT, or GraphConv. The architecture allows easy customization of the number of layers, layer types, and other hyperparameters.

#### Key Elements:
1. **Initialization (`__init__`)**:
    - **Input Parameters**:
      - `c_in`: Dimensionality of the input node features.
      - `c_hidden`: Dimensionality of the hidden features (intermediate layers).
      - `c_out`: Dimensionality of the output features (e.g., the number of classes for classification tasks).
      - `num_layers`: The number of layers in the GNN model.
      - `layer_name`: Specifies which GNN layer to use (e.g., "GCN", "GAT").
      - `dp_rate`: Dropout rate for regularization.
      - `kwargs`: Additional arguments specific to certain layers (e.g., the number of attention heads for GAT).
    - **Layer Construction**: 
      - The GNN layers are built dynamically based on the `layer_name` provided. The layers alternate between GNN layers, ReLU activations, and dropout for regularization.
      - The final layer is a GNN layer that outputs `c_out` features.

2. **Forward Pass (`forward`)**:
    - **Inputs**:
      - `x`: Input features for each node.
      - `edge_index`: The edge list in PyTorch Geometric format, which specifies the graph's connectivity.
    - **Layer Application**:
      - The forward pass iterates over the defined layers. For each layer:
        - If the layer is a GNN layer (inheriting from `MessagePassing`), the `edge_index` is provided as an additional argument.
        - Otherwise, it's treated as a regular layer (e.g., ReLU or Dropout) and applied to the node features.
    - **Output**: The final output contains the updated node features after all layers have been applied.


In [12]:
class GNNModel(nn.Module):

    def __init__(self, c_in, c_hidden, c_out, num_layers=2, layer_name="GCN", dp_rate=0.1, **kwargs):
        """
        Inputs:
            c_in - Dimension of input features
            c_hidden - Dimension of hidden features
            c_out - Dimension of the output features. Usually number of classes in classification
            num_layers - Number of "hidden" graph layers
            layer_name - String of the graph layer to use
            dp_rate - Dropout rate to apply throughout the network
            kwargs - Additional arguments for the graph layer (e.g. number of heads for GAT)
        """
        super().__init__()
        gnn_layer = gnn_layer_by_name[layer_name]

        layers = []
        in_channels, out_channels = c_in, c_hidden
        for l_idx in range(num_layers-1):
            layers += [
                gnn_layer(in_channels=in_channels,
                          out_channels=out_channels,
                          **kwargs),
                nn.ReLU(inplace=True),
                nn.Dropout(dp_rate)
            ]
            in_channels = c_hidden
        layers += [gnn_layer(in_channels=in_channels,
                             out_channels=c_out,
                             **kwargs)]
        self.layers = nn.ModuleList(layers)

    def forward(self, x, edge_index):
        """
        Inputs:
            x - Input features per node
            edge_index - List of vertex index pairs representing the edges in the graph (PyTorch geometric notation)
        """
        for l in self.layers:
            # For graph layers, we need to add the "edge_index" tensor as additional input
            # All PyTorch Geometric graph layer inherit the class "MessagePassing", hence
            # we can simply check the class type.
            if isinstance(l, geom_nn.MessagePassing):
                x = l(x, edge_index)
            else:
                x = l(x)
        return x

### MLPModel: Multi-Layer Perceptron Model

This class defines a basic **Multi-Layer Perceptron (MLP)** model, which is a fully connected feed-forward neural network. It consists of multiple hidden layers and supports regularization through dropout.

#### Key Elements:
1. **Initialization (`__init__`)**:
    - **Input Parameters**:
      - `c_in`: Dimensionality of the input features.
      - `c_hidden`: Dimensionality of the hidden features.
      - `c_out`: Dimensionality of the output features (usually representing the number of classes in classification tasks).
      - `num_layers`: Number of hidden layers in the MLP.
      - `dp_rate`: Dropout rate applied after each hidden layer for regularization.
    - **Layer Construction**:
      - The model is built by stacking linear layers, ReLU activations, and dropout layers for hidden layers.
      - The final layer is a linear layer that outputs `c_out` features, without any activation or dropout.
      - All the layers are combined into a single sequential module (`nn.Sequential`).

2. **Forward Pass (`forward`)**:
    - **Inputs**:
      - `x`: Input features for the MLP, typically node features in a graph.
    - **Output**: The input `x` is passed through the sequentially stacked layers, and the final result is returned after the last layer.


In [13]:
class MLPModel(nn.Module):

    def __init__(self, c_in, c_hidden, c_out, num_layers=2, dp_rate=0.1):
        """
        Inputs:
            c_in - Dimension of input features
            c_hidden - Dimension of hidden features
            c_out - Dimension of the output features. Usually number of classes in classification
            num_layers - Number of hidden layers
            dp_rate - Dropout rate to apply throughout the network
        """
        super().__init__()
        layers = []
        in_channels, out_channels = c_in, c_hidden
        for l_idx in range(num_layers-1):
            layers += [
                nn.Linear(in_channels, out_channels),
                nn.ReLU(inplace=True),
                nn.Dropout(dp_rate)
            ]
            in_channels = c_hidden
        layers += [nn.Linear(in_channels, c_out)]
        self.layers = nn.Sequential(*layers)

    def forward(self, x, *args, **kwargs):
        """
        Inputs:
            x - Input features per node
        """
        return self.layers(x)

### NodeLevelGNN: Node-Level Graph Neural Network using PyTorch Lightning

This class implements a **Node-Level Graph Neural Network (GNN)** for tasks like node classification using PyTorch Lightning. It allows for flexible model selection between GNN-based architectures and MLPs, and handles the training, validation, and testing of the model on graph-structured data.

#### Key Elements:
1. **Initialization (`__init__`)**:
    - **Input Parameters**:
      - `model_name`: The name of the model to be used, either `"MLP"` or a GNN model like `"GCN"`, `"GAT"`, etc.
      - `model_kwargs`: Additional keyword arguments for the model, such as the number of layers, hidden dimensions, and other hyperparameters.
    - **Model Selection**:
      - If `model_name` is `"MLP"`, the Multi-Layer Perceptron (`MLPModel`) is used.
      - Otherwise, the specified GNN model (`GNNModel`) is initialized.
    - A cross-entropy loss module (`nn.CrossEntropyLoss`) is set up to compute classification loss during training and evaluation.

2. **Forward Pass (`forward`)**:
    - **Inputs**:
      - `data`: The graph data, containing node features (`x`), edge index (`edge_index`), and masks for training, validation, and test splits.
      - `mode`: Specifies the mode of operation (`train`, `val`, or `test`) to compute the loss and accuracy only on the relevant nodes based on the provided masks.
    - **Model Execution**: The input features and graph structure are passed through the selected model.
    - **Masking and Loss Calculation**:
      - The forward pass only computes the loss and accuracy for nodes specified by the `train_mask`, `val_mask`, or `test_mask`.
      - **Accuracy Calculation**: The predicted class is compared with the ground truth using `argmax` and accuracy is computed as the percentage of correct predictions.
    - **Output**: Returns the computed loss and accuracy for the specified mask.

3. **Optimizer Configuration (`configure_optimizers`)**:
    - Defines the optimizer for training the model, in this case, Stochastic Gradient Descent (SGD) with momentum and weight decay. Adam can be used as an alternative optimizer.

4. **Training Step (`training_step`)**:
    - Executes a single training step by calling `forward` in `train` mode and logging the training loss and accuracy using PyTorch Lightning’s `self.log`.

5. **Validation Step (`validation_step`)**:
    - Executes a validation step by calling `forward` in `val` mode and logs the validation accuracy.

6. **Test Step (`test_step`)**:
    - Executes a test step by calling `forward` in `test` mode and logs the test accuracy.



In [14]:
class NodeLevelGNN(pl.LightningModule):

    def __init__(self, model_name, **model_kwargs):
        super().__init__()
        # Saving hyperparameters
        self.save_hyperparameters()

        if model_name == "MLP":
            self.model = MLPModel(**model_kwargs)
        else:
            self.model = GNNModel(**model_kwargs)
        self.loss_module = nn.CrossEntropyLoss()

    def forward(self, data, mode="train"):
        x, edge_index = data.x, data.edge_index
        x = self.model(x, edge_index)

        # Only calculate the loss on the nodes corresponding to the mask
        if mode == "train":
            mask = data.train_mask
        elif mode == "val":
            mask = data.val_mask
        elif mode == "test":
            mask = data.test_mask
        else:
            assert False, f"Unknown forward mode: {mode}"

        loss = self.loss_module(x[mask], data.y[mask])
        acc = (x[mask].argmax(dim=-1) == data.y[mask]).sum().float() / mask.sum()
        return loss, acc

    def configure_optimizers(self):
        # We use SGD here, but Adam works as well
        optimizer = optim.SGD(self.parameters(), lr=0.1, momentum=0.9, weight_decay=2e-3)
        return optimizer

    def training_step(self, batch, batch_idx):
        loss, acc = self.forward(batch, mode="train")
        self.log('train_loss', loss)
        self.log('train_acc', acc)
        return loss

    def validation_step(self, batch, batch_idx):
        _, acc = self.forward(batch, mode="val")
        self.log('val_acc', acc)

    def test_step(self, batch, batch_idx):
        _, acc = self.forward(batch, mode="test")
        self.log('test_acc', acc)

### train_node_classifier: Training Node-Level Classifier with PyTorch Lightning

This function trains a node-level graph neural network (GNN) classifier using PyTorch Lightning. It supports training from scratch or loading a pretrained model if available.

#### Key Steps:

1. **Seed Setting**:  
   The random seed is set using `pl.seed_everything(42)` to ensure reproducibility across runs.

2. **DataLoader Initialization**:  
   A PyTorch Geometric `DataLoader` is created for the graph dataset, which handles batching. Since this is node-level classification, the batch size is set to `1`.

3. **Trainer Setup**:  
   A PyTorch Lightning `Trainer` is initialized:
   - **Callbacks**: A model checkpoint callback is added to save only the model weights, monitoring the validation accuracy (`val_acc`) and saving the best model.
   - **Device Selection**: The accelerator is set to `"gpu"` if a CUDA device is available, otherwise it uses the CPU.
   - **Epochs**: The maximum number of epochs is set to 200.
   - **Progress Bar**: The progress bar is disabled since the epoch size is only 1 (due to the batch size being 1).

4. **Pretrained Model Check**:  
   The function checks if a pretrained model checkpoint exists in the `CHECKPOINT_PATH`. If found, the model is loaded and training is skipped. If no pretrained model is found, the function initializes a new model and trains it on the dataset.

5. **Model Training**:  
   If training from scratch:
   - A `NodeLevelGNN` model is instantiated with the input and output dimensions corresponding to the dataset's node features and classes.
   - The model is trained using `trainer.fit` with both the training and validation data being the same (since it's node-level classification).

6. **Testing**:  
   Once the best model is identified (via the validation set), it is tested on the dataset using `trainer.test`. The accuracy on the test set is returned.

7. **Accuracy Computation**:  
   The model's accuracy on the training, validation, and test sets is computed by applying the model in different modes (`train`, `val`, `test`).

8. **Return**:  
   The function returns the trained model and a dictionary of accuracy results for the training, validation, and test sets.


In [15]:
def train_node_classifier(model_name, dataset, **model_kwargs):
    pl.seed_everything(42)
    node_data_loader = geom_data.DataLoader(dataset, batch_size=1)

    # Create a PyTorch Lightning trainer with the generation callback
    root_dir = os.path.join(CHECKPOINT_PATH, "NodeLevel" + model_name)
    os.makedirs(root_dir, exist_ok=True)
    trainer = pl.Trainer(default_root_dir=root_dir,
                         callbacks=[ModelCheckpoint(save_weights_only=True, mode="max", monitor="val_acc")],
                         accelerator="gpu" if str(device).startswith("cuda") else "cpu",
                         devices=1,
                         max_epochs=200,
                         enable_progress_bar=False) # False because epoch size is 1
    trainer.logger._default_hp_metric = None # Optional logging argument that we don't need

    # Check whether pretrained model exists. If yes, load it and skip training
    pretrained_filename = os.path.join(CHECKPOINT_PATH, f"NodeLevel{model_name}.ckpt")
    if os.path.isfile(pretrained_filename):
        print("Found pretrained model, loading...")
        model = NodeLevelGNN.load_from_checkpoint(pretrained_filename)
    else:
        pl.seed_everything()
        model = NodeLevelGNN(model_name=model_name, c_in=dataset.num_node_features, c_out=dataset.num_classes, **model_kwargs)
        trainer.fit(model, node_data_loader, node_data_loader)
        model = NodeLevelGNN.load_from_checkpoint(trainer.checkpoint_callback.best_model_path)

    # Test best model on the test set
    test_result = trainer.test(model, node_data_loader, verbose=False)
    batch = next(iter(node_data_loader))
    batch = batch.to(model.device)
    _, train_acc = model.forward(batch, mode="train")
    _, val_acc = model.forward(batch, mode="val")
    result = {"train": train_acc,
              "val": val_acc,
              "test": test_result[0]['test_acc']}
    return model, result

In [16]:
# Small function for printing the test scores
def print_results(result_dict):
    if "train" in result_dict:
        print(f"Train accuracy: {(100.0*result_dict['train']):4.2f}%")
    if "val" in result_dict:
        print(f"Val accuracy:   {(100.0*result_dict['val']):4.2f}%")
    print(f"Test accuracy:  {(100.0*result_dict['test']):4.2f}%")

### Training and Evaluating MLP for Node-Level Classification

This section calls the `train_node_classifier` function to train an MLP model on the **Cora** dataset, which is commonly used for node-level classification tasks.

#### Key Arguments:
- **`model_name="MLP"`**: Specifies that a Multi-Layer Perceptron (MLP) model is being used instead of a GNN model.
- **`dataset=cora_dataset`**: The Cora dataset is passed as the input graph dataset.
- **`c_hidden=16`**: The hidden layer of the MLP has 16 units (or neurons).
- **`num_layers=2`**: The MLP will have 2 layers.
- **`dp_rate=0.1`**: A dropout rate of 10% is applied to prevent overfitting.

#### Function Execution:
- **Training**: The function trains the MLP model on the Cora dataset, performing node-level classification. It checks for any pretrained models, trains a new model if needed, and evaluates the model.
- **Return Values**: 
  - `node_mlp_model`: The trained MLP model.
  - `node_mlp_result`: A dictionary containing the training, validation, and test accuracies.

#### Results Printing:
- **`print_results(node_mlp_result)`**: This function prints out the results (accuracy scores) from the training process, showing how well the MLP model performed on the training, validation, and test sets.


In [17]:
node_mlp_model, node_mlp_result = train_node_classifier(model_name="MLP",
                                                        dataset=cora_dataset,
                                                        c_hidden=16,
                                                        num_layers=2,
                                                        dp_rate=0.1)

print_results(node_mlp_result)

Seed set to 42
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/opt/anaconda3/envs/nn_class/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:75: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `pytorch_lightning` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default
Lightning automatically upgraded your loaded checkpoint from v1.0.2 to v2.4.0. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint ../saved_models/tutorial7/NodeLevelMLP.ckpt`


Found pretrained model, loading...
Train accuracy: 97.14%
Val accuracy:   54.60%
Test accuracy:  60.60%


/opt/anaconda3/envs/nn_class/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:424: The 'test_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=7` in the `DataLoader` to improve performance.
/opt/anaconda3/envs/nn_class/lib/python3.10/site-packages/pytorch_lightning/utilities/data.py:78: Trying to infer the `batch_size` from an ambiguous collection. The batch size we found is 2708. To avoid any miscalculations, use `self.log(..., batch_size=batch_size)`.


### Training and Evaluating GNN (GCN) for Node-Level Classification

This section calls the `train_node_classifier` function to train a **Graph Convolutional Network (GCN)** on the **Cora** dataset, commonly used for node-level classification tasks.

#### Key Arguments:
- **`model_name="GNN"`**: Specifies that a Graph Neural Network (GNN) model is being used.
- **`layer_name="GCN"`**: Specifies that the GNN architecture to use is the **Graph Convolutional Network (GCN)**.
- **`dataset=cora_dataset`**: The Cora dataset is passed as the input graph dataset.
- **`c_hidden=16`**: The hidden layers in the GCN have 16 units (or neurons).
- **`num_layers=2`**: The GCN will have 2 layers, including the hidden layers and the output layer.
- **`dp_rate=0.1`**: A dropout rate of 10% is applied to prevent overfitting.

#### Function Execution:
- **Training**: The function trains a GCN model on the Cora dataset. It checks if a pretrained model exists and trains a new one if necessary. After training, the model is evaluated on the test set.
- **Return Values**:
  - `node_gnn_model`: The trained GCN model.
  - `node_gnn_result`: A dictionary containing the training, validation, and test accuracies.

#### Results Printing:
- **`print_results(node_gnn_result)`**: This function prints the performance of the trained GCN model on the training, validation, and test sets, displaying the accuracy scores.


In [18]:
node_gnn_model, node_gnn_result = train_node_classifier(model_name="GNN",
                                                        layer_name="GCN",
                                                        dataset=cora_dataset,
                                                        c_hidden=16,
                                                        num_layers=2,
                                                        dp_rate=0.1)
print_results(node_gnn_result)

Seed set to 42
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
Lightning automatically upgraded your loaded checkpoint from v1.0.2 to v2.4.0. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint ../saved_models/tutorial7/NodeLevelGNN.ckpt`


Found pretrained model, loading...
Train accuracy: 100.00%
Val accuracy:   78.60%
Test accuracy:  82.40%


### Loading the MUTAG Dataset from TUDataset

This line initializes the **MUTAG** dataset using the `torch_geometric.datasets.TUDataset` class. 


In [19]:
tu_dataset = torch_geometric.datasets.TUDataset(root=DATASET_PATH, name="MUTAG")

Downloading https://www.chrsmrrs.com/graphkerneldatasets/MUTAG.zip
Processing...
Done!


In [20]:
print("Data object:", tu_dataset.data)
print("Length:", len(tu_dataset))
print(f"Average label: {tu_dataset.data.y.float().mean().item():4.2f}")

Data object: Data(x=[3371, 7], edge_index=[2, 7442], edge_attr=[7442, 4], y=[188])
Length: 188
Average label: 0.66




### Dataset Preparation

- **`torch.manual_seed(42)`**: Sets the random seed for reproducibility.
- **`tu_dataset.shuffle()`**: Shuffles the MUTAG dataset to randomize the order of the graphs.
- **`train_dataset` and `test_dataset`**: Splits the dataset into training (first 150 graphs) and testing sets (remaining graphs).


In [21]:
torch.manual_seed(42)
tu_dataset.shuffle()
train_dataset = tu_dataset[:150]
test_dataset = tu_dataset[150:]

### DataLoader Setup

- **`graph_train_loader`**: Creates a DataLoader for the training dataset with a batch size of 64 and shuffling enabled.
- **`graph_val_loader`**: DataLoader for the test dataset (can be used for validation if needed) with a batch size of 64.
- **`graph_test_loader`**: DataLoader for the test dataset for final evaluation, also with a batch size of 64.


In [22]:
graph_train_loader = geom_data.DataLoader(train_dataset, batch_size=64, shuffle=True)
graph_val_loader = geom_data.DataLoader(test_dataset, batch_size=64) # Additional loader if you want to change to a larger dataset
graph_test_loader = geom_data.DataLoader(test_dataset, batch_size=64)

In [23]:
batch = next(iter(graph_test_loader))
print("Batch:", batch)
print("Labels:", batch.y[:10])
print("Batch indices:", batch.batch[:40])

Batch: DataBatch(edge_index=[2, 1512], x=[687, 7], edge_attr=[1512, 4], y=[38], batch=[687], ptr=[39])
Labels: tensor([1, 1, 1, 0, 0, 0, 1, 1, 1, 0])
Batch indices: tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2])


### GraphGNNModel: Graph Classification Model

- **Initialization (`__init__`)**:  
  Combines a GNN model (`GNNModel`) to extract features from the graph, followed by a linear classification head with dropout (`dp_rate_linear`).

- **Forward Pass (`forward`)**:  
  - **Inputs**:  
    - `x`: Node features.
    - `edge_index`: Graph edges.
    - `batch_idx`: Batch index for each node.
  - **Process**:
    - Features are passed through the GNN.
    - Global mean pooling aggregates node features into graph-level representations.
    - The linear head predicts the final output (classification).


In [24]:
class GraphGNNModel(nn.Module):

    def __init__(self, c_in, c_hidden, c_out, dp_rate_linear=0.5, **kwargs):
        """
        Inputs:
            c_in - Dimension of input features
            c_hidden - Dimension of hidden features
            c_out - Dimension of output features (usually number of classes)
            dp_rate_linear - Dropout rate before the linear layer (usually much higher than inside the GNN)
            kwargs - Additional arguments for the GNNModel object
        """
        super().__init__()
        self.GNN = GNNModel(c_in=c_in,
                            c_hidden=c_hidden,
                            c_out=c_hidden, # Not our prediction output yet!
                            **kwargs)
        self.head = nn.Sequential(
            nn.Dropout(dp_rate_linear),
            nn.Linear(c_hidden, c_out)
        )

    def forward(self, x, edge_index, batch_idx):
        """
        Inputs:
            x - Input features per node
            edge_index - List of vertex index pairs representing the edges in the graph (PyTorch geometric notation)
            batch_idx - Index of batch element for each node
        """
        x = self.GNN(x, edge_index)
        x = geom_nn.global_mean_pool(x, batch_idx) # Average pooling
        x = self.head(x)
        return x

### GraphLevelGNN: Graph-Level Classification with PyTorch Lightning

- **Initialization (`__init__`)**:  
  - A graph-level GNN model (`GraphGNNModel`) is instantiated.
  - The loss function is either binary cross-entropy (for binary classification) or cross-entropy (for multi-class classification).

- **Forward Pass (`forward`)**:  
  - **Inputs**: Graph node features (`x`), edge list (`edge_index`), and batch index (`batch_idx`).
  - **Process**:
    - Runs node features through the GNN, then applies mean pooling for graph-level representation.
    - Classifies the output based on the number of classes (`c_out`).
    - Computes loss and accuracy.

- **Optimizer (`configure_optimizers`)**:  
  - Uses `AdamW` optimizer with a higher learning rate due to the small dataset and model size.

- **Training/Validation/Test Steps**:  
  - For each step, the model computes the loss and accuracy, and logs the metrics using PyTorch Lightning's `self.log`.


In [25]:
class GraphLevelGNN(pl.LightningModule):

    def __init__(self, **model_kwargs):
        super().__init__()
        # Saving hyperparameters
        self.save_hyperparameters()

        self.model = GraphGNNModel(**model_kwargs)
        self.loss_module = nn.BCEWithLogitsLoss() if self.hparams.c_out == 1 else nn.CrossEntropyLoss()

    def forward(self, data, mode="train"):
        x, edge_index, batch_idx = data.x, data.edge_index, data.batch
        x = self.model(x, edge_index, batch_idx)
        x = x.squeeze(dim=-1)

        if self.hparams.c_out == 1:
            preds = (x > 0).float()
            data.y = data.y.float()
        else:
            preds = x.argmax(dim=-1)
        loss = self.loss_module(x, data.y)
        acc = (preds == data.y).sum().float() / preds.shape[0]
        return loss, acc

    def configure_optimizers(self):
        optimizer = optim.AdamW(self.parameters(), lr=1e-2, weight_decay=0.0) # High lr because of small dataset and small model
        return optimizer

    def training_step(self, batch, batch_idx):
        loss, acc = self.forward(batch, mode="train")
        self.log('train_loss', loss)
        self.log('train_acc', acc)
        return loss

    def validation_step(self, batch, batch_idx):
        _, acc = self.forward(batch, mode="val")
        self.log('val_acc', acc)

    def test_step(self, batch, batch_idx):
        _, acc = self.forward(batch, mode="test")
        self.log('test_acc', acc)

### train_graph_classifier: Graph-Level Classification Training

- **Seed Setting**:  
  Sets the random seed for reproducibility using `pl.seed_everything(42)`.

- **Trainer Setup**:  
  Initializes a PyTorch Lightning `Trainer` with:
  - A checkpoint callback to save the best model based on validation accuracy (`val_acc`).
  - GPU support if available, otherwise runs on CPU.
  - A maximum of 500 training epochs.

- **Pretrained Model Check**:  
  Checks if a pretrained model exists. If found, the model is loaded to skip training. If not, a new `GraphLevelGNN` model is initialized and trained.

- **Model Training**:  
  The `GraphLevelGNN` is trained on the training set (`graph_train_loader`) and validated on the validation set (`graph_val_loader`).

- **Testing**:  
  After training, the model is tested on both the training and test sets, and the accuracy for both sets is returned in the result dictionary.


In [26]:
def train_graph_classifier(model_name, **model_kwargs):
    pl.seed_everything(42)

    # Create a PyTorch Lightning trainer with the generation callback
    root_dir = os.path.join(CHECKPOINT_PATH, "GraphLevel" + model_name)
    os.makedirs(root_dir, exist_ok=True)
    trainer = pl.Trainer(default_root_dir=root_dir,
                         callbacks=[ModelCheckpoint(save_weights_only=True, mode="max", monitor="val_acc")],
                         accelerator="gpu" if str(device).startswith("cuda") else "cpu",
                         devices=1,
                         max_epochs=500,
                         enable_progress_bar=False)
    trainer.logger._default_hp_metric = None # Optional logging argument that we don't need

    # Check whether pretrained model exists. If yes, load it and skip training
    pretrained_filename = os.path.join(CHECKPOINT_PATH, f"GraphLevel{model_name}.ckpt")
    if os.path.isfile(pretrained_filename):
        print("Found pretrained model, loading...")
        model = GraphLevelGNN.load_from_checkpoint(pretrained_filename)
    else:
        pl.seed_everything(42)
        model = GraphLevelGNN(c_in=tu_dataset.num_node_features,
                              c_out=1 if tu_dataset.num_classes==2 else tu_dataset.num_classes,
                              **model_kwargs)
        trainer.fit(model, graph_train_loader, graph_val_loader)
        model = GraphLevelGNN.load_from_checkpoint(trainer.checkpoint_callback.best_model_path)
    # Test best model on validation and test set
    train_result = trainer.test(model, graph_train_loader, verbose=False)
    test_result = trainer.test(model, graph_test_loader, verbose=False)
    result = {"test": test_result[0]['test_acc'], "train": train_result[0]['test_acc']}
    return model, result

### Training and Evaluating GraphConv Model for Graph-Level Classification

This section calls the `train_graph_classifier` function to train a **GraphConv** model on a graph-level classification task.

#### Function Execution:
- The model is trained on the provided dataset and tested on both the training and test sets. The resulting accuracies for both sets are returned in the `result` dictionary.


In [27]:
model, result = train_graph_classifier(model_name="GraphConv",
                                       c_hidden=256,
                                       layer_name="GraphConv",
                                       num_layers=3,
                                       dp_rate_linear=0.5,
                                       dp_rate=0.0)


Seed set to 42
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
Lightning automatically upgraded your loaded checkpoint from v1.0.2 to v2.4.0. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint ../saved_models/tutorial7/GraphLevelGraphConv.ckpt`


Found pretrained model, loading...


/opt/anaconda3/envs/nn_class/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:475: Your `test_dataloader`'s sampler has shuffling enabled, it is strongly recommended that you turn shuffling off for val/test dataloaders.
/opt/anaconda3/envs/nn_class/lib/python3.10/site-packages/pytorch_lightning/utilities/data.py:78: Trying to infer the `batch_size` from an ambiguous collection. The batch size we found is 2. To avoid any miscalculations, use `self.log(..., batch_size=batch_size)`.


In [28]:
print(f"Train performance: {100.0*result['train']:4.2f}%")
print(f"Test performance:  {100.0*result['test']:4.2f}%")

Train performance: 93.28%
Test performance:  92.11%
