# Introduction to Graph Neural Networks (GNNs)

In this notebook, we cover the basics of Graph Neural Networks (GNNs), focusing primarily on the **Graph Convolutional Network (GCN)**. We’ll see the general message-passing idea and walk through a small example.


## 0. Why do people use GNNs?

Many real-world datasets have rich relational structure (e.g., social networks, citation networks, molecular graphs, knowledge graphs). Standard fully connected networks treat each input as a flat vector, and convolutional networks assume a regular grid, so neither uses an arbitrary graph's adjacency information directly. **GNNs** fill this gap by using the graph's structure in every forward pass.

## 1. Introduction to Graph Neural Networks

**Graph Neural Networks (GNNs)** are neural networks designed for data represented as graphs (nodes + edges). This tutorial covers:
1. The high-level idea behind GNNs (*Message Passing*).
2. The **Graph Convolutional Network (GCN)** architecture.
3. A simple example of GCN-based node classification.


## 2. Setup

Below, we load common libraries needed for our experiments. Make sure you have installed:
- `torch` (PyTorch)
- `numpy`

You can install them via:
```bash
pip install torch numpy
```
Then in Python:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

print("PyTorch version:", torch.__version__)
```


## 3. Graph Notation & Message Passing

### 3.1 Graph Notation

A graph **G = (V, E)** has:
- **V**: a set of nodes (or vertices), |V| = N.
- **E**: a set of edges, each edge (i, j) connects two nodes.

We often store the edges in an **adjacency matrix** $A \in \{0,1\}^{N \times N}$, where $A_{ij} = 1$ if there is an edge between nodes $i$ and $j$, and $A_{ij} = 0$ otherwise. Each node $i$ can also have a feature vector $x_i \in \mathbb{R}^d$. These vectors are usually stacked as the rows of a matrix $X \in \mathbb{R}^{N \times d}$.
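To make the notation concrete, here is a minimal NumPy sketch that builds $A$ and $X$ for a small undirected graph. The edge list, the number of nodes, and the feature values are made up for illustration.

```python
import numpy as np

# Hypothetical toy graph: 4 nodes, undirected edges given as (i, j) pairs.
num_nodes = 4
edges = [(0, 1), (1, 2), (2, 3), (0, 2)]

# Binary adjacency matrix A; the graph is undirected, so set A[i, j] and A[j, i].
A = np.zeros((num_nodes, num_nodes), dtype=np.int64)
for i, j in edges:
    A[i, j] = 1
    A[j, i] = 1

# Feature matrix X: one row per node, d = 3 illustrative features per node.
X = np.arange(num_nodes * 3, dtype=np.float64).reshape(num_nodes, 3)

# Node degrees are the row sums of A.
degrees = A.sum(axis=1)

print(A)
print("degrees:", degrees)
print("X shape:", X.shape)
```

For an undirected graph, $A$ is symmetric, and row $i$ of $X$ is the feature vector $x_i$.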

### 3.2 Message Passing Paradigm

The core idea behind GNNs is **message passing**:

1. Each node begins with an initial representation (often just its features):  
   $h_i^{(0)} = x_i$

2. At each layer $k$, every node aggregates information from its neighbors to form a new representation $h_i^{(k)}$.

3. After multiple layers, each node’s representation captures information from a broader neighborhood in the graph.

Formally:

$$
h_i^{(k)} = \text{UPDATE}\left(h_i^{(k-1)}, \text{AGGREGATE}\left(\{ h_j^{(k-1)} : j \in \mathcal{N}(i) \}\right)\right)
$$

where $ \mathcal{N}(i) $ denotes the set of neighbors of node $i$.
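The update rule can be sketched in a few lines. This is only an illustration under simple assumptions: AGGREGATE is a mean over neighbors, UPDATE averages a node's own state with the aggregated message, and nothing is learned. The graph, the feature values, and the function names `aggregate`, `update`, and `message_passing_step` are hypothetical.

```python
import numpy as np

# Hypothetical path graph 0 - 1 - 2, stored as neighbor lists N(i).
neighbors = {0: [1], 1: [0, 2], 2: [1]}

# Initial representations h_i^{(0)} = x_i (one scalar feature per node here).
h = {0: np.array([1.0]), 1: np.array([2.0]), 2: np.array([6.0])}

def aggregate(messages):
    # AGGREGATE: element-wise mean over neighbor representations (one possible choice).
    return np.mean(messages, axis=0)

def update(h_self, h_agg):
    # UPDATE: average of own state and aggregated message (illustrative, no learned weights).
    return 0.5 * (h_self + h_agg)

def message_passing_step(h, neighbors):
    # Every node reads the previous layer's states, so we build a fresh dict.
    return {
        i: update(h[i], aggregate([h[j] for j in neighbors[i]]))
        for i in h
    }

h1 = message_passing_step(h, neighbors)   # layer k = 1
h2 = message_passing_step(h1, neighbors)  # layer k = 2
print(h1)
print(h2)
```

After one step, node 0 has only seen node 1. After two steps, its representation also depends on node 2's initial feature, even though nodes 0 and 2 are not directly connected.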

## 4. Graph Convolutional Network (GCN)

One of the earliest and most widely used GNN variants is the **Graph Convolutional Network (GCN)** by Kipf & Welling (ICLR 2017). The layer update is typically written in matrix form:

$$
H^{(k+1)} = \sigma\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(k)} W^{(k)}\right)
$$

where:
- $\tilde{A} = A + I$ (we add self-connections along the diagonal),
- $\tilde{D}$ is the diagonal degree matrix of $\tilde{A}$, with $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$,
- $H^{(k)} \in \mathbb{R}^{N \times d_k}$ is the matrix of node embeddings in the $k$-th layer,
- $W^{(k)}$ is a trainable weight matrix,
- $\sigma$ is a non-linear activation (e.g., ReLU).
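As a quick sanity check on the normalization term, here is a small NumPy sketch that computes $\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$ for a 3-node path graph. The graph is an illustrative choice. Entry $(i, j)$ of the result works out to $\tilde{A}_{ij} / \sqrt{\tilde{d}_i \tilde{d}_j}$, where $\tilde{d}_i$ is the degree of node $i$ including its self-loop.

```python
import numpy as np

# Path graph 0 - 1 - 2 (illustrative), without self-loops.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=np.float64)

# A_tilde = A + I adds a self-connection to every node.
A_tilde = A + np.eye(3)

# Degrees of A_tilde: [2, 3, 2].
d_tilde = A_tilde.sum(axis=1)

# Symmetric normalization D^{-1/2} A_tilde D^{-1/2}.
D_inv_sqrt = np.diag(d_tilde ** -0.5)
A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt

print(np.round(A_hat, 3))
```

The result is symmetric, and edges that touch high-degree nodes get smaller weights, which keeps the scale of the aggregated features from growing with node degree.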


### 4.1 Minimal GCN Implementation

Below, we define two classes:
- **`SimpleGCNLayer`**: a single GCN layer that computes $\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H W$ for its input $H$ (the non-linearity $\sigma$ is applied outside the layer, in `SimpleGCN`).
- **`SimpleGCN`**: a 2-layer GCN network (for illustration).

```python
class SimpleGCNLayer(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features, bias=False)

    def forward(self, x, adj):
        """x: (N, d_in), adj: (N, N) adjacency with self-loops added"""
        # Degree of each node (row sums), kept as a column vector of shape (N, 1)
        degree = torch.sum(adj, dim=1, keepdim=True)
        degree_inv_sqrt = torch.pow(degree, -0.5)
        degree_inv_sqrt[torch.isinf(degree_inv_sqrt)] = 0.0  # zero-degree nodes: prevent inf

        # Normalize adjacency: D^{-1/2} * A * D^{-1/2}, via broadcasting over rows and columns
        adj_norm = degree_inv_sqrt * adj * degree_inv_sqrt.transpose(0, 1)

        # GCN operation: aggregate over neighbors, then apply the linear map
        x = torch.mm(adj_norm, x)            # (N, d_in)
        x = self.linear(x)                   # (N, d_out)
        return x

class SimpleGCN(nn.Module):
    def __init__(self, in_features, hidden_features, out_features):
        super().__init__()
        self.gcn1 = SimpleGCNLayer(in_features, hidden_features)
        self.gcn2 = SimpleGCNLayer(hidden_features, out_features)

    def forward(self, x, adj):
        x = self.gcn1(x, adj)
        x = F.relu(x)                        # non-linearity between the two layers
        x = self.gcn2(x, adj)                # raw logits, no activation on the output
        return x
```
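The normalization inside `SimpleGCNLayer` relies on broadcasting rather than on explicit diagonal matrices. The following sketch checks, on an illustrative 4-node chain graph, that the broadcast expression matches the explicit product $\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$. The variable names are chosen for this check only.

```python
import torch

# Chain graph on 4 nodes with self-loops already added (illustrative values).
adj = torch.tensor([[1., 1., 0., 0.],
                    [1., 1., 1., 0.],
                    [0., 1., 1., 1.],
                    [0., 0., 1., 1.]])

degree = adj.sum(dim=1, keepdim=True)   # shape (N, 1)
degree_inv_sqrt = degree.pow(-0.5)      # shape (N, 1)

# Broadcast form: scales row i by d_i^{-1/2} and column j by d_j^{-1/2}.
adj_norm_broadcast = degree_inv_sqrt * adj * degree_inv_sqrt.transpose(0, 1)

# Explicit form from the formula, using a dense diagonal matrix.
d_mat = torch.diag(degree_inv_sqrt.squeeze(1))
adj_norm_explicit = d_mat @ adj @ d_mat

print(torch.allclose(adj_norm_broadcast, adj_norm_explicit))
```

The broadcast form avoids building an $N \times N$ diagonal matrix and two matrix products. It still uses a dense adjacency matrix, so it is only practical for small graphs.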

### 4.2 Example: Node Classification (Toy Data)

We’ll define a small chain graph of 5 nodes, build its adjacency matrix (with self-loops), and train a simple 2-class node classifier on random features.
In practice, you’d apply this to much bigger datasets, but the process is the same. :)

```python
# Toy data
torch.manual_seed(0)  # make the toy run reproducible
N = 5              # number of nodes
d_in = 4           # dimensionality of input features
x_toy = torch.rand(N, d_in)

# Toy adjacency matrix: connect nodes in a chain + self-loops
adj_toy = torch.zeros(N, N)
for i in range(N - 1):
    adj_toy[i, i+1] = 1
    adj_toy[i+1, i] = 1
for i in range(N):
    adj_toy[i, i] = 1

# Labels for each node (0 or 1)
labels = torch.tensor([0, 1, 1, 0, 1])

# Create model
model = SimpleGCN(in_features=d_in, hidden_features=8, out_features=2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

# Training loop
for epoch in range(50):
    optimizer.zero_grad()
    logits = model(x_toy, adj_toy)  # shape (N, 2)
    loss = criterion(logits, labels)
    loss.backward()
    optimizer.step()

    if (epoch + 1) % 10 == 0:
        print(f"Epoch {epoch+1}, Loss: {loss.item():.4f}")

# Final logits, recomputed after the last optimizer step
with torch.no_grad():
    logits = model(x_toy, adj_toy)
print("\nFinal logits:\n", logits)
```

With only 5 nodes and random features the result is not very interesting, but the pipeline is the same for bigger graphs! :D
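To turn logits into class predictions, we can take the arg-max over the class dimension and compare against the labels. The sketch below uses hard-coded, hypothetical logits as a stand-in for the model output, so it runs on its own. In the notebook you would use the `logits` tensor from the training cell instead.

```python
import torch

# Hypothetical logits for 5 nodes and 2 classes (stand-in for the model output).
logits = torch.tensor([[ 1.2, -0.3],
                       [-0.5,  0.8],
                       [ 0.1,  0.4],
                       [ 0.9,  0.2],
                       [ 0.3, -0.1]])
labels = torch.tensor([0, 1, 1, 0, 1])

# Softmax turns each row of logits into class probabilities that sum to 1.
probs = torch.softmax(logits, dim=1)

# Predicted class = index of the largest logit in each row.
preds = logits.argmax(dim=1)

# Fraction of nodes whose prediction matches the label.
accuracy = (preds == labels).float().mean().item()

print("Predicted classes:", preds.tolist())
print(f"Accuracy: {accuracy:.2f}")
```

This measures accuracy on the same nodes used for training. A real experiment would hold out some nodes for validation and testing.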

## 5. Conclusion & Further Reading

In this tutorial, we covered:
- **Graph Notation & Message Passing**
- **Graph Convolutional Network (GCN)** theory + example

**Further Reading**:
- Kipf & Welling (2017) "Semi-Supervised Classification with Graph Convolutional Networks"
- Veličković et al. (2018) "Graph Attention Networks"
- Hamilton et al. (2017) "Inductive Representation Learning on Large Graphs" (GraphSAGE)
- [PyTorch Geometric Documentation](https://pytorch-geometric.readthedocs.io/)
- [DGL Documentation](https://www.dgl.ai/)


## 6. Advanced Topics

Some popular extensions to GCN include:
1. **Graph Attention Networks (GAT)**: Use attention to weight neighbors differently.
2. **GraphSAGE**: Sample fixed-size neighborhoods for large-scale graphs.
3. **Heterogeneous Graphs**: Different node types and different edge types.
4. **Scaling GNNs**: Efficient training on huge graphs (mini-batching, sampling, etc.).
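To give a flavor of the sampling idea behind item 2, here is a simplified, pure-Python sketch of uniform neighbor sampling. It caps each node's neighborhood at `k` neighbors and is not the full GraphSAGE procedure. The function name `sample_neighbors`, the toy graph, and the seed are hypothetical.

```python
import random

def sample_neighbors(neighbors, k, seed=0):
    """Return at most k uniformly sampled neighbors per node (without replacement)."""
    rng = random.Random(seed)
    sampled = {}
    for node, nbrs in neighbors.items():
        if len(nbrs) <= k:
            # Small neighborhoods are kept as they are.
            sampled[node] = list(nbrs)
        else:
            # Large neighborhoods are subsampled down to k neighbors.
            sampled[node] = rng.sample(nbrs, k)
    return sampled

# Hypothetical graph where node 0 is a hub with four neighbors.
neighbors = {0: [1, 2, 3, 4], 1: [0], 2: [0, 3], 3: [0, 2], 4: [0]}

sampled = sample_neighbors(neighbors, k=2)
for node, nbrs in sampled.items():
    print(node, nbrs)
```

Capping the neighborhood size bounds the cost of aggregating at each node, which is what makes mini-batch training feasible on graphs with very high-degree nodes.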

_(R-GCN, which handles multiple relation types, is covered in a separate tutorial.)_