<a href="https://colab.research.google.com/github/JulianMeigen/ML-handson/blob/main/notebooks/7.0-SNJMMH-Day7.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Assignment Day 7

## Team members:
- Samuel Nebgen s6sanebg@uni-bonn.de
- Muhammad Humza Arain s27marai@uni-bonn.de
- Julian Meigen s82jmeig@uni-bonn.de

## 16.09.2025

Contributions were made by all team members in around the same amount, either based on discussions or coding.

In [1]:
!gdown --folder https://drive.google.com/drive/folders/1VESm-JaHEqPJmM23iLW1mEJsuI2mLBdx?usp=sharing

Retrieving folder contents
Processing file 1i6W9fI3sGEn6V9xBlt2MxlonZXOTLZjg load-subgraph_doc.ipynb
Processing file 1qZpQzFMRzuYQ0xoQJcUNe7CRBMS2mRmz subgraph_hop_1.pt
Processing file 1iz_FOBs9k7m9z3lDtRIXzRkd92tL_EK- subgraph.pt
Retrieving folder contents completed
Building directory structure
Building directory structure completed
Downloading...
From: https://drive.google.com/uc?id=1i6W9fI3sGEn6V9xBlt2MxlonZXOTLZjg
To: /content/ML-HandsOn/load-subgraph_doc.ipynb
100% 2.92k/2.92k [00:00<00:00, 8.80MB/s]
Downloading...
From: https://drive.google.com/uc?id=1qZpQzFMRzuYQ0xoQJcUNe7CRBMS2mRmz
To: /content/ML-HandsOn/subgraph_hop_1.pt
100% 10.5M/10.5M [00:00<00:00, 31.5MB/s]
Downloading...
From (original): https://drive.google.com/uc?id=1iz_FOBs9k7m9z3lDtRIXzRkd92tL_EK-
From (redirected): https://drive.google.com/uc?id=1iz_FOBs9k7m9z3lDtRIXzRkd92tL_EK-&confirm=t&uuid=afcba1af-c8f4-4589-a2ce-e8edcfa6b6fd
To: /content/ML-HandsOn/subgraph.pt
100% 1.64G/1.64G [00:21<00:00, 74.5MB/s]
Download c

In [2]:
!pip install torch_geometric



In [3]:
import torch
import torch_geometric
import numpy as np
import pandas as pd
import networkx as nx
import plotly
from torch_geometric.utils import to_networkx
from torch.nn import Embedding
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

# Task 1 Perform a node labeling task with a Graph ML model

## a) Load the graph dataset (ogbn-proteins) into pytorch-geometric

We are directly using a Subgraph

In [4]:
import torch
import torch_geometric

path_big = "/content/ML-HandsOn/subgraph.pt"
path_small = "/content/ML-HandsOn/subgraph_hop_1.pt"

dataset = torch.load(path_small, weights_only=False)

data = dataset["graph"]

print(data)

Data(num_nodes=942, edge_index=[2, 200414], edge_attr=[200414, 8], node_species=[942, 1], y=[942, 112])


In [5]:
G = to_networkx(data, to_undirected=True)
print(G)

Graph with 942 nodes and 100207 edges


## b) Create a train, val, test split on the nodes or load the masks via pytorch-geometric.

### i. Create a subgraph if the computation is too expensive.

In [6]:
num_nodes = data.num_nodes
perm = torch.randperm(num_nodes)

train_size = int(0.7 * num_nodes)
val_size = int(0.15 * num_nodes)

train_mask = torch.zeros(num_nodes, dtype=torch.bool)
val_mask = torch.zeros(num_nodes, dtype=torch.bool)
test_mask = torch.zeros(num_nodes, dtype=torch.bool)

train_mask[perm[:train_size]] = True
val_mask[perm[train_size:train_size + val_size]] = True
test_mask[perm[train_size + val_size:]] = True

data.train_mask = train_mask
data.val_mask = val_mask
data.test_mask = test_mask

print(data.y)

tensor([[1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0],
        ...,
        [0, 0, 0,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0]])


In [7]:
print(len(data.y[data.train_mask]))
print(len(data.y[data.val_mask]))
print(len(data.y[data.test_mask]))

659
141
142


## c) Initialize the graph with random node embeddings.

In [8]:
number_nodes = data.num_nodes
embedding_dim = 64
x = torch.empty((num_nodes, embedding_dim))  # empty tensor
torch.nn.init.xavier_uniform_(x)  # Xavier uniform initialization
data.x = x

In [9]:
node_idx = data.edge_index.flatten().unique()
train_idx = torch.tensor(node_idx[data.train_mask.numpy()])
test_idx = torch.tensor(node_idx[data.test_mask.numpy()])
val_idx = torch.tensor(node_idx[data.val_mask.numpy()])

train_subgraph = data.subgraph(train_idx)
test_subgraph = data.subgraph(test_idx)
val_subgraph = data.subgraph(val_idx)

  train_idx = torch.tensor(node_idx[data.train_mask.numpy()])
  test_idx = torch.tensor(node_idx[data.test_mask.numpy()])
  val_idx = torch.tensor(node_idx[data.val_mask.numpy()])


## d) Define a graph convolutional neural network class with two layers using pytorch-geometric..

In [10]:
class GCN(torch.nn.Module):
    def __init__(self, num_nodes, embedding_dim, hidden_dim, out_dim, drop_out=0.5):
        super().__init__()
        # Project raw input features -> embedding_dim so embedding_dim is a latent size
        self.input_proj = torch.nn.Linear(num_nodes, embedding_dim)
        self.conv1 = GCNConv(embedding_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, out_dim)
        self.dropout = torch.nn.Dropout(drop_out)

    def forward(self, x, edge_index):
        # apply GCN layers
        x = self.conv1(x, edge_index)
        x = F.relu(x)
        x = F.dropout(x)
        x = self.conv2(x, edge_index)

        return F.log_softmax(x, dim=1)

In [11]:
model_gcn = GCN(num_nodes=data.num_nodes, embedding_dim=64, hidden_dim=128, out_dim=112)

### i. Train your model on the train dataset using an optimizer and a loss function for a multilabel classification task for 100 epochs

In [12]:
# Optimizer and loss
optimizer = torch.optim.Adam(model_gcn.parameters(), lr=0.01, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()

# Training loop
epochs = 100
model_gcn.train()
for epoch in range(1, epochs + 1):
    optimizer.zero_grad()
    out = model_gcn(data.x, train_subgraph.edge_index)  # forward pass
    loss = criterion(out, data.y.float())        # multi-label BCE loss
    loss.backward()                       # backward pass
    optimizer.step()                      # update parameters

    if epoch % 10 == 0 or epoch == 1:
        print(f"Epoch {epoch:03d}, Loss: {loss.item():.4f}")

Epoch 001, Loss: 147.4101
Epoch 010, Loss: 142.0090
Epoch 020, Loss: 141.1376
Epoch 030, Loss: 140.7409
Epoch 040, Loss: 140.3083
Epoch 050, Loss: 139.9596
Epoch 060, Loss: 139.6259
Epoch 070, Loss: 139.3822
Epoch 080, Loss: 139.0911
Epoch 090, Loss: 138.9003
Epoch 100, Loss: 138.6884


### ii. Test your model on the test set and evaluate it with accuracy, AUROC, precision, recall and F1 score.

In [13]:
from sklearn.metrics import accuracy_score, roc_auc_score, precision_score, recall_score, f1_score

In [14]:
model_gcn.eval()
with torch.no_grad():
    logits = model_gcn(data.x, test_subgraph.edge_index)          # raw logits
    probs = torch.sigmoid(logits)                   # convert to probabilities
    preds = (probs > 0.5).int()                     # threshold at 0.5

    y_true = data.y[test_mask].numpy()
    y_pred = preds[test_mask].numpy()
    y_prob = probs[test_mask].numpy()

# Accuracy (exact match per node)
accuracy = accuracy_score(y_true, y_pred)

# AUROC (per class, average='macro')
auroc = roc_auc_score(y_true, y_prob, average='macro')

# Precision, Recall, F1 (micro-averaged)
precision = precision_score(y_true, y_pred, average='micro', zero_division=0)
recall = recall_score(y_true, y_pred, average='micro', zero_division=0)
f1 = f1_score(y_true, y_pred, average='micro', zero_division=0)

print(f"Test Accuracy:  {accuracy:.4f}")
print(f"Test AUROC:     {auroc:.4f}")
print(f"Test Precision: {precision:.4f}")
print(f"Test Recall:    {recall:.4f}")
print(f"Test F1 Score:  {f1:.4f}")

Test Accuracy:  0.0423
Test AUROC:     0.6284
Test Precision: 0.0000
Test Recall:    0.0000
Test F1 Score:  0.0000


### Outer Cross-valiudation with Stratifiedkfold

In [15]:
from sklearn.model_selection import StratifiedKFold

In [16]:
nodes_idx = data.edge_index.flatten().unique()
y = data.node_species.squeeze()

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

accuracys = []
aurocs = []
precisions = []
recalls = []
f1s = []
for train_idx, test_idx in skf.split(nodes_idx, y):
  train_subgraph = data.subgraph(torch.tensor(train_idx))
  test_subgraph = data.subgraph(torch.tensor(test_idx))

  # Train the model
  optimizer = torch.optim.Adam(model_gcn.parameters(), lr=0.01, weight_decay=5e-4)
  criterion = torch.nn.CrossEntropyLoss()

  # Training loop
  epochs = 100
  model_gcn.train()
  for epoch in range(1, epochs + 1):
      optimizer.zero_grad()
      out = model_gcn(data.x, train_subgraph.edge_index)  # forward pass
      loss = criterion(out, data.y.float())        # multi-label BCE loss
      loss.backward()                       # backward pass
      optimizer.step()                      # update parameters

  # Evaluate the model
  model_gcn.eval()
  with torch.no_grad():
      logits = model_gcn(data.x, test_subgraph.edge_index)          # raw logits
      probs = torch.sigmoid(logits)                   # convert to probabilities
      preds = (probs > 0.5).int()                     # threshold at 0.5

      y_true = data.y[test_mask].numpy()
      y_pred = preds[test_mask].numpy()
      y_prob = probs[test_mask].numpy()

      # Accuracy (exact match per node)
      accuracy = accuracy_score(y_true, y_pred)
      accuracys.append(accuracy)

      # AUROC (per class, average='macro')
      auroc = roc_auc_score(y_true, y_prob, average='macro')
      aurocs.append(auroc)

      # Precision, Recall, F1 (micro-averaged)
      precision = precision_score(y_true, y_pred, average='micro', zero_division=0)
      precisions.append(precision)
      recall = recall_score(y_true, y_pred, average='micro', zero_division=0)
      recalls.append(recall)
      f1 = f1_score(y_true, y_pred, average='micro', zero_division=0)
      f1s.append(f1)
      print("-----")



-----
-----
-----
-----
-----


In [17]:

print(f"Test Accuracy:  {np.mean(accuracys):.4f}")
print(f"Test AUROC:     {np.mean(aurocs):.4f}")
print(f"Test Precision: {np.mean(precisions):.4f}")
print(f"Test Recall:    {np.mean(recalls):.4f}")
print(f"Test F1 Score:  {np.mean(f1s):.4f}")

Test Accuracy:  0.0423
Test AUROC:     0.6455
Test Precision: 0.0000
Test Recall:    0.0000
Test F1 Score:  0.0000


## e) Set up a hyperparameter optimization pipeline with nested 5-fold cross-validation

### i. Familiarize yourself with the hyperparameter optimization package optuna (https://optuna.org/ )

In [18]:
!pip install optuna



In [19]:
import optuna

In [26]:
class GCNoptimization:
    def __init__(self, data, train_subdata, val_subdata, study_name="GCN_optimization"):
        self.data = data
        self.train_subdata = train_subdata
        self.val_subdata = val_subdata

        self.study_name = study_name
        self.storage_name = "sqlite:///{}.db".format(self.study_name)
        self.study = optuna.create_study(study_name=self.study_name, storage=self.storage_name, load_if_exists=True)

    def objective(self, trial):
        # Define the hyperparameters to optimize

        dropout = trial.suggest_float("dropout", 0.0, 0.7)
        hidden_dim = trial.suggest_int("hidden_dim", 16, 256)
        embedding_dim = trial.suggest_int("embedding_dim", 16, 256)

        # Create the GCN model with the suggested hyperparameters
        model = GCN(num_nodes=data.num_nodes, embedding_dim=embedding_dim, hidden_dim=hidden_dim, out_dim=112, drop_out=dropout)

        x = torch.empty((num_nodes, embedding_dim))  # empty tensor
        torch.nn.init.xavier_uniform_(x)  # Xavier uniform initialization

        # Train the model
        optimizer = torch.optim.Adam(model_gcn.parameters(), lr=0.01, weight_decay=5e-4)
        criterion = torch.nn.CrossEntropyLoss()

        # Training loop
        epochs = 100
        model.train()
        for epoch in range(1, epochs + 1):
            optimizer.zero_grad()

            out = model(x, self.train_subdata.edge_index)  # forward pass
            loss = criterion(out, data.y.float())        # multi-label BCE loss
            loss.backward()                       # backward pass
            optimizer.step()                      # update parameters


        # Validate the model
        model.eval()
        with torch.no_grad():
            logits = model(x, self.val_subdata.edge_index)          # raw logits
            probs = torch.sigmoid(logits)                   # convert to probabilities
            preds = (probs > 0.5).int()                     # threshold at 0.5

            y_true = data.y[test_mask].numpy()
            y_pred = preds[test_mask].numpy()
            y_prob = probs[test_mask].numpy()

            # Accuracy (exact match per node)
            accuracy = accuracy_score(y_true, y_pred)

            # AUROC (per class, average='macro')
            auroc = roc_auc_score(y_true, y_prob, average='macro')

            return auroc



### ii. Integrate the logging package mlflow (https://mlflow.org/) to log your metrics.

In [21]:
%pip install wandb -q

In [27]:
 #Ignore excessive warnings
import logging
logging.propagate = False
logging.getLogger().setLevel(logging.ERROR)

# WandB – Import the wandb library
import wandb

In [23]:
# WandB – Login to your wandb account so you can log all your metrics
#!wandb login

[34m[1mwandb[0m: Currently logged in as: [33mjulian-meigen[0m ([33mjulian-meigen-university-of-bonn[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


### iii. Train and test your models and report the evaluation metrics with mean and std for the nested CV.

In [33]:
nodes_idx = data.edge_index.flatten().unique()
y = data.node_species.squeeze()

n_splits_outer = 5
outer_skf = StratifiedKFold(n_splits=n_splits_outer, shuffle=True, random_state=42)

auroc_scores = []
for fold, (trainval_idx, test_idx) in enumerate(outer_skf.split(nodes_idx, y)):
            print(f"===== Outer Fold {fold+1}/{n_splits_outer} =====")
            outer_train_subgraph = data.subgraph(torch.tensor(train_idx))
            outer_test_subgraph = data.subgraph(torch.tensor(test_idx))


            # Inner CV for hyperparameter tuning
            nodes_idx_train = outer_train_subgraph.edge_index.flatten().unique()
            y_train = outer_train_subgraph.node_species.squeeze()

            inner_skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
            best_accuracy = 0
            best_params = None
            for train_idx, val_idx in inner_skf.split(nodes_idx_train, y_train):
                inner_train_subgraph = train_subgraph.subgraph(torch.tensor(train_idx))
                val_subgraph = train_subgraph.subgraph(torch.tensor(val_idx))

                # Optuna
                gcn_opt = GCNoptimization(data, inner_train_subgraph, val_subgraph)
                study = gcn_opt.study
                study.optimize(gcn_opt.objective, n_trials=10)
                best_params = study.best_params


                # Train the model with the best hyperparameters
                model = GCN(num_nodes=data.num_nodes,
                            embedding_dim=best_params["embedding_dim"],
                            hidden_dim=best_params["hidden_dim"],
                            out_dim=112,
                            drop_out=best_params["dropout"])

                x_fin = torch.empty((data.num_nodes, best_params["embedding_dim"]))  # empty tensor
                torch.nn.init.xavier_uniform_(x)  # Xavier uniform initialization

                optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
                criterion = torch.nn.BCEWithLogitsLoss()

                for epoch in range(100):
                  model.train()
                  optimizer.zero_grad()
                  out = model(x_fin, outer_train_subgraph.edge_index)
                  loss = criterion(out, data.y.float())
                  loss.backward()
                  optimizer.step()


                # Test Evaluation
                model.eval()
                with torch.no_grad():
                    logits = model(x_fin, outer_test_subgraph.edge_index)
                    probs = torch.sigmoid(logits)
                    preds = (probs > 0.5).int()

                    y_true = data.y[test_mask].numpy()
                    y_pred = preds[test_mask].numpy()
                    y_prob = probs[test_mask].numpy()

                    # Accuracy (exact match per node)
                    accuracy = accuracy_score(y_true, y_pred)

                    # AUROC (per class, average='macro')
                    auroc = roc_auc_score(y_true, y_prob, average='macro')
                    auroc_scores.append(auroc)






[I 2025-09-16 17:34:28,920] Using an existing study with name 'GCN_optimization' instead of creating a new one.


===== Outer Fold 1/5 =====


[I 2025-09-16 17:34:31,433] Trial 53 finished with value: 0.5049824703244515 and parameters: {'dropout': 0.2582042945495541, 'hidden_dim': 213, 'embedding_dim': 174}. Best is trial 38 with value: 0.508987628268384.
[I 2025-09-16 17:34:33,260] Trial 54 finished with value: 0.4988901186887929 and parameters: {'dropout': 0.44165976174164334, 'hidden_dim': 60, 'embedding_dim': 250}. Best is trial 38 with value: 0.508987628268384.
[I 2025-09-16 17:34:37,150] Trial 55 finished with value: 0.49935117207007984 and parameters: {'dropout': 0.4566947603435956, 'hidden_dim': 77, 'embedding_dim': 245}. Best is trial 38 with value: 0.508987628268384.
[I 2025-09-16 17:34:40,125] Trial 56 finished with value: 0.4926771733212158 and parameters: {'dropout': 0.3178045991910758, 'hidden_dim': 93, 'embedding_dim': 227}. Best is trial 38 with value: 0.508987628268384.
[I 2025-09-16 17:34:41,734] Trial 57 finished with value: 0.5011743770101306 and parameters: {'dropout': 0.356233179902026, 'hidden_dim': 68,

===== Outer Fold 2/5 =====


[I 2025-09-16 17:36:19,720] Trial 103 finished with value: 0.4985666337332931 and parameters: {'dropout': 0.4200858801579007, 'hidden_dim': 102, 'embedding_dim': 115}. Best is trial 67 with value: 0.5097313487821145.
[I 2025-09-16 17:36:21,446] Trial 104 finished with value: 0.5026659192995443 and parameters: {'dropout': 0.1368768856689699, 'hidden_dim': 89, 'embedding_dim': 158}. Best is trial 67 with value: 0.5097313487821145.
[I 2025-09-16 17:36:23,246] Trial 105 finished with value: 0.5039388626073279 and parameters: {'dropout': 0.057974434513208314, 'hidden_dim': 116, 'embedding_dim': 143}. Best is trial 67 with value: 0.5097313487821145.
[I 2025-09-16 17:36:24,503] Trial 106 finished with value: 0.5086933603128126 and parameters: {'dropout': 0.24887988964743274, 'hidden_dim': 92, 'embedding_dim': 151}. Best is trial 67 with value: 0.5097313487821145.
[I 2025-09-16 17:36:25,632] Trial 107 finished with value: 0.49801062618643777 and parameters: {'dropout': 0.2580755699076845, 'hid

===== Outer Fold 3/5 =====


[I 2025-09-16 17:37:51,188] Trial 153 finished with value: 0.4995947751424922 and parameters: {'dropout': 0.628950628808157, 'hidden_dim': 62, 'embedding_dim': 77}. Best is trial 67 with value: 0.5097313487821145.
[I 2025-09-16 17:37:52,197] Trial 154 finished with value: 0.50490651302594 and parameters: {'dropout': 0.5620025439052871, 'hidden_dim': 69, 'embedding_dim': 92}. Best is trial 67 with value: 0.5097313487821145.
[I 2025-09-16 17:37:53,155] Trial 155 finished with value: 0.4996147686918978 and parameters: {'dropout': 0.5438637320994978, 'hidden_dim': 57, 'embedding_dim': 102}. Best is trial 67 with value: 0.5097313487821145.
[I 2025-09-16 17:37:54,407] Trial 156 finished with value: 0.5042659627795466 and parameters: {'dropout': 0.6519631482556404, 'hidden_dim': 49, 'embedding_dim': 82}. Best is trial 67 with value: 0.5097313487821145.
[I 2025-09-16 17:37:55,905] Trial 157 finished with value: 0.5050330685645873 and parameters: {'dropout': 0.28652858528176584, 'hidden_dim': 8

===== Outer Fold 4/5 =====


[I 2025-09-16 17:39:23,030] Trial 203 finished with value: 0.5034790637473235 and parameters: {'dropout': 0.3963906308643895, 'hidden_dim': 72, 'embedding_dim': 246}. Best is trial 174 with value: 0.5155898492952439.
[I 2025-09-16 17:39:24,909] Trial 204 finished with value: 0.5017521439170015 and parameters: {'dropout': 0.3693544926040989, 'hidden_dim': 228, 'embedding_dim': 224}. Best is trial 174 with value: 0.5155898492952439.
[I 2025-09-16 17:39:27,588] Trial 205 finished with value: 0.4872452125451102 and parameters: {'dropout': 0.33119569913097213, 'hidden_dim': 235, 'embedding_dim': 243}. Best is trial 174 with value: 0.5155898492952439.
[I 2025-09-16 17:39:29,856] Trial 206 finished with value: 0.48745536473191375 and parameters: {'dropout': 0.35793996182446947, 'hidden_dim': 241, 'embedding_dim': 236}. Best is trial 174 with value: 0.5155898492952439.
[I 2025-09-16 17:39:31,938] Trial 207 finished with value: 0.4954464274304645 and parameters: {'dropout': 0.415136426490386, '

===== Outer Fold 5/5 =====


[I 2025-09-16 17:40:56,660] Trial 253 finished with value: 0.5005814622623123 and parameters: {'dropout': 0.34142579600782785, 'hidden_dim': 71, 'embedding_dim': 230}. Best is trial 174 with value: 0.5155898492952439.
[I 2025-09-16 17:40:58,109] Trial 254 finished with value: 0.5065550104198779 and parameters: {'dropout': 0.3518704970249195, 'hidden_dim': 156, 'embedding_dim': 149}. Best is trial 174 with value: 0.5155898492952439.
[I 2025-09-16 17:41:00,127] Trial 255 finished with value: 0.4978145812210299 and parameters: {'dropout': 0.3473940734816067, 'hidden_dim': 176, 'embedding_dim': 149}. Best is trial 174 with value: 0.5155898492952439.
[I 2025-09-16 17:41:01,718] Trial 256 finished with value: 0.49949596516499456 and parameters: {'dropout': 0.36841731544950773, 'hidden_dim': 156, 'embedding_dim': 153}. Best is trial 174 with value: 0.5155898492952439.
[I 2025-09-16 17:41:02,998] Trial 257 finished with value: 0.5022545791130709 and parameters: {'dropout': 0.3578153398456887, 

In [34]:
print(f"Mean AUROC: {np.mean(auroc_scores):.4f}")


Mean AUROC: 0.5176
