# An Introduction to Graph Neural Networks (GNNs) 🌐

**Contents**
1. Why graphs? When to use GNNs
2. Message passing intuition and the GCN formula
3. Installing / importing PyTorch Geometric (optional)
4. Loading the Cora dataset (Planetoid)
5. Inspecting graph data in PyG (`Data` object)
6. Building a simple 2-layer GCN (implementation + explanation)
7. Training loop (full-batch) and evaluation using train/val/test masks
8. Inspecting predictions on specific nodes


## Why graphs? (Motivation)

Many real-world datasets are naturally represented as graphs: social networks, molecules, knowledge graphs, citation networks, recommender systems, and more. Traditional ML models (MLPs, CNNs, RNNs) assume fixed-size vectors, grids, or sequences; they don't directly model relationships between entities. GNNs operate **directly on graph-structured data** and propagate information along edges, letting each node aggregate information from its neighbors.

**Use cases**
- Node classification (e.g., predict paper topic in a citation graph); *this notebook's task*
- Graph classification (e.g., molecule property prediction)
- Link prediction (e.g., recommend new friendships)
- Graph-level regression (e.g., molecular energy)

We will focus on **node classification** on the Cora citation graph: nodes = papers, edges = citations, features = bag-of-words of paper abstracts, labels = topic class.


## The Core Idea of GNNs: Message Passing

The fundamental mechanism behind GNNs is **message passing** (or neighborhood aggregation). A GNN layer updates each node's representation by performing two key steps:

1.  **Aggregate:** Each node collects feature vectors (or "messages") from its immediate neighbors. Common aggregation functions include taking the sum, mean, or max of the neighbor features.

2.  **Update:** Each node updates its own feature vector by combining its current vector with the aggregated message from its neighbors. This combination is typically done using a small neural network (e.g., a linear layer followed by an activation function).

By stacking multiple GNN layers, a node can gather information from nodes that are further and further away. A 2-layer GNN allows a node to receive information from its 2-hop neighborhood (its friends, and its friends' friends).
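
The aggregate and update steps can be sketched in a few lines of plain Python. This is a toy illustration rather than what PyG does internally: the graph, the scalar features, and the fixed 0.5/0.5 mixing rule in `message_passing_step` are all assumptions chosen to keep the arithmetic easy to follow. A real GNN layer would use learned weights and a nonlinearity in the update step.

```python
# Toy undirected graph as an adjacency list (hypothetical example data).
neighbors = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
# One scalar feature per node keeps the arithmetic easy to follow.
features = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}


def message_passing_step(features, neighbors):
    """One round of aggregate (mean over neighbors) and update (average with self)."""
    updated = {}
    for node, nbrs in neighbors.items():
        # Aggregate: mean of the neighbors' current features.
        aggregated = sum(features[n] for n in nbrs) / len(nbrs)
        # Update: combine the node's own feature with the aggregated message.
        # A real GNN layer would apply learned weights and a nonlinearity here.
        updated[node] = 0.5 * features[node] + 0.5 * aggregated
    return updated


h1 = message_passing_step(features, neighbors)  # information from 1-hop neighbors
h2 = message_passing_step(h1, neighbors)        # information from up to 2 hops away
print(h1)
print(h2)
```

Node 1 is directly connected only to node 0, so after one round its value depends on nodes 0 and 1 alone. After two rounds it also depends on node 2 (two hops away), but still not on node 3 (three hops away), which is the stacking effect described above.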

## Setup: PyTorch Geometric (PyG)

We will use PyTorch Geometric (PyG), which provides efficient message-passing primitives and dataset loaders. Installing PyG requires wheels that match your PyTorch and CUDA versions. The cell below shows how to install PyG; **run it only if PyG is not installed**.


In [2]:
# OPTIONAL: install PyG. Run this cell only if PyG is not already installed.
# The wheel index URL depends on your PyTorch and CUDA versions, so it is
# built from torch.__version__ (PyTorch itself must already be installed).
import torch  # required for the {torch.__version__} interpolation below
!pip install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-{torch.__version__}.html

print('If PyG is not installed, follow the instructions at https://pytorch-geometric.readthedocs.io/')

Looking in links: https://data.pyg.org/whl/torch-2.8.0+cu126.html
Collecting torch-scatter
  Downloading https://data.pyg.org/whl/torch-2.8.0%2Bcu126/torch_scatter-2.1.2%2Bpt28cu126-cp312-cp312-linux_x86_64.whl (10.9 MB)
Collecting torch-sparse
  Downloading https://data.pyg.org/whl/torch-2.8.0%2Bcu126/torch_sparse-0.6.18%2Bpt28cu126-cp312-cp312-linux_x86_64.whl (5.2 MB)
Collecting torch-cluster
  Downloading https://data.pyg.org/whl/torch-2.8.0%2Bcu126/torch_cluster-1.6.3%2Bpt28cu126-cp312-cp312-linux_x86_64.whl (3.3 MB)

## Loading the Cora dataset (Planetoid)

We'll use the `Planetoid` loader from PyG which provides Cora, CiteSeer, and PubMed.

The dataset contains:
- `data.x`: node feature matrix (num_nodes × num_node_features)
- `data.edge_index`: COO-format edge list (2 × num_edges)
- `data.y`: labels for nodes
- `data.train_mask`, `data.val_mask`, `data.test_mask`: boolean masks for splits

In [3]:
# Imports and dataset loading
import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import GCNConv
# Download/load Cora (Planetoid)
dataset = Planetoid(root='/tmp/Cora', name='Cora')
data = dataset[0]

print('Loaded dataset:', dataset.name)
print(data)

Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.x
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.tx
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.allx
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.y
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.ty
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.ally
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.graph
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.test.index
Processing...


Loaded dataset: Cora
Data(x=[2708, 1433], edge_index=[2, 10556], y=[2708], train_mask=[2708], val_mask=[2708], test_mask=[2708])


Done!


## Inspecting the graph data

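Let's print basic statistics and look at the fields of the `Data` object. `edge_index` stores the graph in COO format: a `[2, num_edges]` tensor in which each column is one directed edge (source node, target node). Because Cora is small, the whole graph fits in memory and can be processed in a single full-batch pass.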

In [5]:
print('Number of nodes:', data.num_nodes)
print('Number of edges:', data.num_edges)
print('Num node features:', dataset.num_node_features)
print('Num classes:', dataset.num_classes)
print('Data object keys:', sorted(data.keys())) # keys is a method in this PyG version, so it must be called

# Show example shapes
print('x shape:', data.x.shape)
print('edge_index shape:', data.edge_index.shape)
print('y shape:', data.y.shape)
print('train/val/test counts:', int(data.train_mask.sum()), int(data.val_mask.sum()), int(data.test_mask.sum()))

Number of nodes: 2708
Number of edges: 10556
Num node features: 1433
Num classes: 7
Data object keys: ['edge_index', 'test_mask', 'train_mask', 'val_mask', 'x', 'y']
x shape: torch.Size([2708, 1433])
edge_index shape: torch.Size([2, 10556])
y shape: torch.Size([2708])
train/val/test counts: 140 500 1000
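
To make the COO layout of `edge_index` concrete, here is a small pure-Python sketch. The toy graph and the helper names `to_coo` and `neighbors_of` are hypothetical and not part of PyG. PyG stores an undirected graph as two directed edges per link, so `data.num_edges` counts each undirected citation link twice.

```python
# Hypothetical undirected toy graph with links 0-1, 0-2 and 2-3.
undirected_edges = [(0, 1), (0, 2), (2, 3)]


def to_coo(undirected_edges):
    """Return [sources, targets], listing every undirected edge in both directions."""
    sources, targets = [], []
    for u, v in undirected_edges:
        sources += [u, v]
        targets += [v, u]
    return [sources, targets]


def neighbors_of(node, edge_index):
    """Nodes that send a message to `node`: sources of edges whose target is `node`."""
    sources, targets = edge_index
    return sorted(s for s, t in zip(sources, targets) if t == node)


edge_index = to_coo(undirected_edges)
print(edge_index)                   # two rows: sources on top, targets below
print(len(edge_index[0]))           # number of directed edges
print(neighbors_of(0, edge_index))  # direct neighbors of node 0
```

Wrapping the result as `torch.tensor(edge_index, dtype=torch.long)` would give a `[2, 6]` tensor with the same layout as `data.edge_index`.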


## Building a simple 2-layer GCN

We implement a compact model with two `GCNConv` layers:
- `conv1`: input features → hidden_dim (ReLU)
- dropout
- `conv2`: hidden_dim → num_classes (logits)

This model is sufficient to demonstrate message passing and achieve strong baseline performance on Cora.

In [6]:
from torch import nn

class GCN(nn.Module):
  def __init__(self, in_channels, hidden_channels, out_channels, dropout=0.5):
    super().__init__()
    self.conv1 = GCNConv(in_channels, hidden_channels) # First GCN layer: maps input features to a hidden dimension
    self.conv2 = GCNConv(hidden_channels, out_channels) # Second GCN layer: maps hidden features to the number of output classes
    self.dropout = dropout

  def forward(self, x, edge_index):
    x = self.conv1(x, edge_index) # x: node features, edge_index: graph connectivity
    x = F.relu(x)
    x = F.dropout(x, p=self.dropout, training=self.training)
    x = self.conv2(x, edge_index)
    return x

# Instantiate model and optimizer
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = GCN(dataset.num_node_features, 16, dataset.num_classes, dropout=0.5).to(device)
data = data.to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
print(model)

GCN(
  (conv1): GCNConv(1433, 16)
  (conv2): GCNConv(16, 7)
)


### How `GCNConv` Uses `edge_index`

The `edge_index` is crucial for the `GCNConv` layers. During the forward pass, each `GCNConv` layer performs **message passing** over its input features (`data.x` for the first layer, the hidden features for the second) using `data.edge_index`. For each node, it:

1.  **Gathers Neighbor Features:** Uses `edge_index` to identify direct neighbors and collects their current features.
2.  **Aggregates Features:** Combines the gathered features into a single vector. `GCNConv` uses a degree-normalized sum and, by default, adds a self-loop so that the node's own features are included.
3.  **Transforms and Updates:** Applies a learned linear transformation to produce the node's updated feature vector. The activation function (ReLU here) is not part of `GCNConv`; it is applied separately in the model's `forward` method.

Essentially, `edge_index` acts as a roadmap for `GCNConv`, guiding the aggregation of information from connected nodes, allowing the network to learn representations that incorporate local graph structure.
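
In matrix form, the GCN propagation rule is

$$H' = \hat{D}^{-1/2}\,(A + I)\,\hat{D}^{-1/2}\,H\,W$$

where $A$ is the adjacency matrix, $I$ adds self-loops, $\hat{D}$ is the diagonal degree matrix of $A + I$, $H$ holds the input node features, and $W$ is the learned weight matrix. The sketch below spells this out with dense Python lists on a toy graph. It is an illustration under simplifying assumptions, not PyG's implementation: `gcn_layer` is a hypothetical helper, the weight is fixed rather than learned, the bias term is omitted, and `GCNConv` itself works on the sparse `edge_index` instead of a dense matrix.

```python
import math


def gcn_layer(features, edges, weight):
    """Dense, pure-Python GCN propagation: D^-1/2 (A + I) D^-1/2 X W.

    features: per-node feature lists, shape [num_nodes][in_dim]
    edges:    undirected (u, v) pairs
    weight:   shape [in_dim][out_dim] (fixed here, learned in practice)
    """
    n = len(features)
    # Adjacency matrix with self-loops (A + I).
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1.0
    # Node degrees computed on A + I.
    deg = [sum(row) for row in adj]
    # Symmetric normalization: entry (i, j) is scaled by 1 / sqrt(deg_i * deg_j).
    norm = [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    in_dim, out_dim = len(weight), len(weight[0])
    # Aggregate: norm @ features.
    agg = [[sum(norm[i][k] * features[k][d] for k in range(n))
            for d in range(in_dim)] for i in range(n)]
    # Transform: agg @ weight.
    return [[sum(agg[i][d] * weight[d][o] for d in range(in_dim))
             for o in range(out_dim)] for i in range(n)]


edges = [(0, 1), (0, 2), (2, 3)]
features = [[1.0], [2.0], [3.0], [4.0]]
weight = [[1.0]]  # identity weight, so the output shows the aggregation alone
out = gcn_layer(features, edges, weight)
print(out)
```

With the identity weight, each output is the normalized sum of a node's own feature and its neighbors' features. For node 1 (degree 2 including its self-loop, connected to node 0 with degree 3) that is `2.0 / 2 + 1.0 / sqrt(2 * 3)`.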

## Training and Evaluation (full-batch)

For small citation graphs like Cora, we use **full-batch training**: every forward pass processes the entire graph (all nodes and edges), but the loss is computed only on the nodes selected by `train_mask`.

We implement `train()` and `evaluate()` helper functions. Note that during evaluation we compute accuracy on train/val/test splits using the masks.

In [7]:
def train():
  model.train()
  optimizer.zero_grad() # Clear old gradients
  out = model(data.x, data.edge_index) # Perform a single forward pass
  loss = F.cross_entropy(out[data.train_mask], data.y[data.train_mask]) # Calculate the loss only on the training nodes
  loss.backward() # Derive gradients
  optimizer.step() # Update parameters
  return loss.item()

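@torch.no_grad() # Evaluation needs no gradients; this saves memory and compute
def evaluate():
  model.eval()
  out = model(data.x, data.edge_index)
  pred = out.argmax(dim=1) # Use the class with the highest score
  accs = []

  # Accuracy on each split: correct predictions / number of nodes in the mask
  for mask in [data.train_mask, data.val_mask, data.test_mask]:
    correct = int((pred[mask] == data.y[mask]).sum())
    accs.append(correct / int(mask.sum()))
  return accs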

print('Training & evaluation functions defined')

Training & evaluation functions defined
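
To show what the masked loss and accuracy compute, here is a pure-Python sketch on hypothetical logits. The helpers `masked_cross_entropy` and `masked_accuracy` are illustrative names, not PyTorch or PyG functions. The loss mirrors the default behavior of `F.cross_entropy`: a log-softmax followed by the mean negative log-likelihood, taken here only over the masked nodes.

```python
import math


def masked_cross_entropy(logits, labels, mask):
    """Mean negative log-likelihood over the nodes where mask is True."""
    losses = []
    for row, label, keep in zip(logits, labels, mask):
        if not keep:
            continue  # nodes outside the split do not contribute to the loss
        # Numerically stable log-sum-exp: subtract the row maximum first.
        m = max(row)
        log_norm = m + math.log(sum(math.exp(v - m) for v in row))
        losses.append(log_norm - row[label])
    return sum(losses) / len(losses)


def masked_accuracy(logits, labels, mask):
    """Fraction of masked nodes whose arg-max logit matches the label."""
    hits = [row.index(max(row)) == label
            for row, label, keep in zip(logits, labels, mask) if keep]
    return sum(hits) / len(hits)


# Hypothetical logits for 4 nodes and 3 classes.
logits = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 3.0], [1.0, 0.0, 0.0]]
labels = [0, 1, 2, 2]
train_mask = [True, True, False, False]
test_mask = [False, False, True, True]

train_loss = masked_cross_entropy(logits, labels, train_mask)
test_acc = masked_accuracy(logits, labels, test_mask)
print(train_loss, test_acc)
```

All four nodes would still take part in the forward pass, because message passing needs the whole graph. The masks only decide which rows enter the loss or the accuracy.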


### Full training loop

Below is a standard training loop. We train for `num_epochs` epochs and print metrics every 10 epochs. For reproducible runs, set a seed (for example with `torch.manual_seed`) before creating the model. For production experiments, consider early stopping and a learning-rate schedule.


In [8]:
num_epochs = 200
print('Starting training...')

for epoch in range(1, num_epochs + 1):
  loss = train()
  if epoch % 10 == 0:
    train_acc, val_acc, test_acc = evaluate()
    print(f'Epoch: {epoch:03d}, Loss: {loss:.4f}, Train Acc: {train_acc:.4f}, Val Acc: {val_acc:.4f}, Test: {test_acc:.4f}')
print('Training finished')

Starting training...
Epoch: 010, Loss: 0.8534, Train Acc: 0.9571, Val Acc: 0.7280, Test: 0.7450
Epoch: 020, Loss: 0.2933, Train Acc: 0.9857, Val Acc: 0.7720, Test: 0.7840
Epoch: 030, Loss: 0.1297, Train Acc: 1.0000, Val Acc: 0.7680, Test: 0.7870
Epoch: 040, Loss: 0.0547, Train Acc: 1.0000, Val Acc: 0.7660, Test: 0.7800
Epoch: 050, Loss: 0.0643, Train Acc: 1.0000, Val Acc: 0.7700, Test: 0.7800
Epoch: 060, Loss: 0.0600, Train Acc: 1.0000, Val Acc: 0.7760, Test: 0.7890
Epoch: 070, Loss: 0.0609, Train Acc: 1.0000, Val Acc: 0.7820, Test: 0.7970
Epoch: 080, Loss: 0.0418, Train Acc: 1.0000, Val Acc: 0.7800, Test: 0.7980
Epoch: 090, Loss: 0.0458, Train Acc: 1.0000, Val Acc: 0.7840, Test: 0.7950
Epoch: 100, Loss: 0.0385, Train Acc: 1.0000, Val Acc: 0.7820, Test: 0.8060
Epoch: 110, Loss: 0.0422, Train Acc: 1.0000, Val Acc: 0.7800, Test: 0.8000
Epoch: 120, Loss: 0.0379, Train Acc: 1.0000, Val Acc: 0.7720, Test: 0.7930
Epoch: 130, Loss: 0.0346, Train Acc: 1.0000, Val Acc: 0.7680, Test: 0.7950
Epoc
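
The validation accuracy in the log above peaks and then drifts down, which is the situation early stopping is meant to handle. Below is a minimal sketch of a patience-based rule. The `EarlyStopping` class is a hypothetical helper, not part of PyTorch or PyG, and the choice of `patience=4` is an assumption. The validation accuracies are copied from the log above (epochs 10 to 130).

```python
class EarlyStopping:
    """Hypothetical helper: stop once validation accuracy has not improved
    for `patience` consecutive checks."""

    def __init__(self, patience=4):
        self.patience = patience
        self.best = float('-inf')
        self.best_step = None
        self.bad_checks = 0

    def step(self, val_acc, step):
        """Record one validation result; return True when training should stop."""
        if val_acc > self.best:
            self.best, self.best_step, self.bad_checks = val_acc, step, 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience


# Validation accuracies copied from the training log (epochs 10, 20, ..., 130).
val_history = [0.7280, 0.7720, 0.7680, 0.7660, 0.7700, 0.7760, 0.7820,
               0.7800, 0.7840, 0.7820, 0.7800, 0.7720, 0.7680]

stopper = EarlyStopping(patience=4)
stopped_at = None
for i, val_acc in enumerate(val_history, start=1):
    epoch = 10 * i  # metrics were logged every 10 epochs
    if stopper.step(val_acc, epoch):
        stopped_at = epoch
        break

print(stopped_at, stopper.best_step, stopper.best)
```

In a real run you would call `stopper.step(val_acc, epoch)` inside the training loop and keep a copy of `model.state_dict()` whenever a new best is recorded, so the best checkpoint can be restored after stopping.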

## Inspecting predictions on specific test nodes

After training, it's often useful to inspect individual predictions. We pick five random test nodes and print their true and predicted labels. Labels are printed as integer class ids (0 to 6 for Cora's 7 classes).

In [12]:
# Sample five test-node indices at random
test_nodes = torch.where(data.test_mask)[0]
sampled = test_nodes[torch.randperm(len(test_nodes))[:5]]
model.eval()

with torch.no_grad():
  out = model(data.x, data.edge_index)
  preds = out.argmax(dim=1) # Predicted class = index of the largest logit

  for n in sampled:
    print('Node', int(n), 'True:', int(data.y[n]), 'Pred:', int(preds[n]))

Node 2115 True: 3 Pred: 3
Node 2292 True: 4 Pred: 4
Node 1854 True: 1 Pred: 3
Node 2151 True: 6 Pred: 6
Node 1942 True: 3 Pred: 3
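
When inspecting more than a handful of nodes, it can help to tally the (true, predicted) pairs instead of reading them line by line. The sketch below does this for the five nodes printed above, using only the standard library. Five samples are far too few to draw conclusions, so treat it as a template, not a result.

```python
from collections import Counter

# (true, predicted) pairs copied from the five sampled nodes printed above.
pairs = [(3, 3), (4, 4), (1, 3), (6, 6), (3, 3)]

# Count how often each (true, predicted) combination occurs.
confusion = Counter(pairs)
# Keep only the misclassified combinations.
errors = {pair: count for pair, count in confusion.items() if pair[0] != pair[1]}
sample_accuracy = sum(t == p for t, p in pairs) / len(pairs)

print(errors)
print(sample_accuracy)
```

Applied to all test nodes, for example with pairs built from `data.y[data.test_mask]` and `preds[data.test_mask]`, the same tally shows which classes the model confuses most often.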
