# Notebook 1: PyTorch and PyTorch Geometric

This notebook introduces **PyTorch** — the foundational deep learning framework — and **PyTorch Geometric (PyG)** — the library built on top of PyTorch for deep learning on graphs and other irregular structures.

**Contents**
1. [PyTorch Fundamentals](#1-pytorch-fundamentals)  
2. [Tensors and Autograd](#2-tensors-and-autograd)  
3. [Building Neural Networks with `nn.Module`](#3-building-neural-networks-with-nnmodule)  
4. [Introduction to Graph-Structured Data](#4-introduction-to-graph-structured-data)  
5. [PyTorch Geometric: Core Concepts](#5-pytorch-geometric-core-concepts)  
6. [Datasets and DataLoaders in PyG](#6-datasets-and-dataloaders-in-pyg)  
7. [Your First GNN Layer (`GCNConv`)](#7-your-first-gnn-layer-gcnconv)  
8. [Exercises](#8-exercises)

## Setup
Install the required packages (run once).

In [None]:
# Uncomment and run the lines below if you are in a fresh environment
# !pip install torch torchvision
# !pip install torch_geometric
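# Optional PyG extensions (they speed up sparse operations). The wheel index must match
# your installed PyTorch and CUDA build; the URL below is for torch 2.0.0 on CPU.
# !pip install torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.0.0+cpu.html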

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt

print(f'PyTorch version: {torch.__version__}')

---
## 1. PyTorch Fundamentals

### 1.1 What is PyTorch?

**PyTorch** is an open-source deep learning framework, originally developed by Meta AI and now governed by the PyTorch Foundation. Two of its defining features are:

| Feature | Description |
|---------|-------------|
| **Dynamic computation graphs** | Graphs are built on-the-fly during the forward pass, making debugging intuitive |
| **Automatic differentiation** | The `autograd` engine computes gradients automatically |
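
As a minimal sketch of what "built on-the-fly" means in practice, the hypothetical helper `double_until_large` below uses an ordinary Python `while` loop whose number of iterations depends on the tensor's values. Autograd records only the operations that actually ran, so the gradient reflects the number of doublings for this particular input.

```python
import torch

def double_until_large(x, threshold=10.0):
    # Hypothetical helper: keep doubling until the norm reaches the threshold.
    y = x
    steps = 0
    while y.norm() < threshold:
        y = y * 2
        steps += 1
    return y.sum(), steps

x = torch.tensor([1.0, 1.0], requires_grad=True)
out, steps = double_until_large(x)
out.backward()

print('doublings:', steps)
print('gradient:', x.grad)   # each entry equals 2 ** steps
```

A different input could take a different number of loop iterations, and the recorded graph would change with it.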

### 1.2 Core data structure: `torch.Tensor`

A **tensor** is a multi-dimensional array, similar to a NumPy `ndarray` but with GPU acceleration and automatic gradient tracking.
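
To make the NumPy comparison concrete, here is a small sketch of moving data between the two libraries. It assumes a CPU tensor: `torch.from_numpy` shares memory with the source array, while `torch.tensor` copies it.

```python
import numpy as np
import torch

arr = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)

shared = torch.from_numpy(arr)   # shares memory with arr (CPU only)
copied = torch.tensor(arr)       # always makes a copy

arr[0, 0] = 10.0                 # in-place change to the NumPy array
print('shared[0, 0]:', shared[0, 0].item())   # sees the change
print('copied[0, 0]:', copied[0, 0].item())   # unaffected

back = shared.numpy()            # back to NumPy, still the same buffer
print('same buffer:', np.shares_memory(arr, back))
```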

---
## 2. Tensors and Autograd

### 2.1 Creating Tensors

In [None]:
# Scalars, vectors, matrices, and higher-dimensional tensors
scalar = torch.tensor(3.14)
vector = torch.tensor([1.0, 2.0, 3.0])
matrix = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
tensor_3d = torch.randn(2, 3, 4)  # random normal

print('scalar:', scalar, '| shape:', scalar.shape)
print('vector:', vector, '| shape:', vector.shape)
print('matrix:\n', matrix, '| shape:', matrix.shape)
print('3D tensor shape:', tensor_3d.shape)

In [None]:
# Factory functions
zeros = torch.zeros(3, 3)
ones  = torch.ones(3, 3)
eye   = torch.eye(3)          # identity matrix
arange = torch.arange(0, 10, 2)  # [0, 2, 4, 6, 8]
linspace = torch.linspace(0, 1, 5)  # [0, 0.25, 0.5, 0.75, 1]

print('zeros:\n', zeros)
print('arange:', arange)
print('linspace:', linspace)

### 2.2 Tensor Operations

In [None]:
a = torch.tensor([[1., 2.], [3., 4.]])
b = torch.tensor([[5., 6.], [7., 8.]])

print('Element-wise addition:\n', a + b)
print('Matrix multiplication:\n', a @ b)       # or torch.matmul(a, b)
print('Element-wise multiplication:\n', a * b)
print('Transpose:\n', a.T)
print('Sum of all elements:', a.sum())
print('Mean along dim=0:', a.mean(dim=0))

### 2.3 Reshaping Tensors

In [None]:
x = torch.arange(12, dtype=torch.float32)
print('Original:', x.shape)         # (12,)

x_2d = x.view(3, 4)
print('After view(3,4):', x_2d.shape)   # (3, 4)

x_3d = x.reshape(2, 2, 3)
print('After reshape(2,2,3):', x_3d.shape)

# squeeze / unsqueeze: remove or add dimensions of size 1
y = torch.randn(1, 4, 1)
print('y shape:', y.shape)
print('squeeze:', y.squeeze().shape)         # (4,)
print('unsqueeze(0):', y.squeeze().unsqueeze(0).shape)  # (1, 4)
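
`view` and `reshape` give the same result in the cell above because `x` is contiguous. The sketch below shows where they differ: `view` never copies, so it raises an error on a non-contiguous tensor such as a transpose, while `reshape` falls back to copying.

```python
import torch

t = torch.arange(6).view(2, 3)
tt = t.t()                       # transpose: same storage, different strides
print('contiguous:', tt.is_contiguous())

try:
    tt.view(6)                   # view needs compatible strides
    view_failed = False
except RuntimeError:
    view_failed = True
print('view failed:', view_failed)

flat = tt.reshape(6)             # reshape copies when it has to
print('reshape:', flat)
```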

### 2.4 Automatic Differentiation (`autograd`)

PyTorch tracks operations on tensors with `requires_grad=True` and can automatically compute gradients via **backpropagation**.

In [None]:
# Simple example: y = 3x^2 + 2x + 1
# dy/dx at x=2 should be 6x + 2 = 14
x = torch.tensor(2.0, requires_grad=True)
y = 3 * x**2 + 2 * x + 1
y.backward()   # compute dy/dx
print('dy/dx at x=2:', x.grad)   # expected: 14.0

In [None]:
# Vector-valued output: backward() needs a vector v and computes the vector-Jacobian product v^T J
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x ** 2
y.backward(torch.ones_like(x))  # v = ones, equivalent to y.sum().backward()
print('Gradients:', x.grad)     # [2, 4, 6]
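
One detail worth seeing before writing a training loop: `backward()` adds into `.grad` rather than overwriting it. The sketch below calls `backward()` twice on the same leaf tensor, shows the accumulated value, and then resets it with `zero_()`.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

(x ** 2).sum().backward()        # first backward pass
first = x.grad.clone()

(x ** 2).sum().backward()        # second call accumulates into x.grad
second = x.grad.clone()

x.grad.zero_()                   # reset before the next backward pass
print('after 1st backward:', first)
print('after 2nd backward:', second)
print('after zero_():', x.grad)
```

This is why training loops zero the gradients once per iteration.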

### 2.5 Simple Training Loop

Let's fit a simple linear model $y = wx + b$ to synthetic data using plain gradient descent, updating `w` and `b` by hand.

In [None]:
torch.manual_seed(42)

# Generate synthetic data: y = 2x + 1 + noise
X = torch.randn(100, 1)
y_true = 2 * X + 1 + 0.1 * torch.randn(100, 1)

# Model parameters
w = torch.randn(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

lr = 0.01
losses = []

for epoch in range(300):
    y_pred = X * w + b
    loss = ((y_pred - y_true) ** 2).mean()
    loss.backward()
    with torch.no_grad():
        w -= lr * w.grad
        b -= lr * b.grad
    w.grad.zero_()
    b.grad.zero_()
    losses.append(loss.item())

print(f'Learned w={w.item():.4f}, b={b.item():.4f}  (true: w=2.0, b=1.0)')

plt.plot(losses)
plt.xlabel('Epoch'); plt.ylabel('MSE Loss')
plt.title('Training Loss'); plt.show()
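
For comparison, here is a sketch of the same fit written with `nn.Linear`, `nn.MSELoss`, and `optim.SGD`. It assumes the same synthetic data, learning rate, and epoch count as the manual loop; the optimizer calls stand in for the hand-written update and gradient reset.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(42)
X = torch.randn(100, 1)
y_true = 2 * X + 1 + 0.1 * torch.randn(100, 1)

model = nn.Linear(1, 1)                       # holds w and b as parameters
optimizer = optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

for epoch in range(300):
    optimizer.zero_grad()                     # replaces w.grad.zero_() / b.grad.zero_()
    loss = criterion(model(X), y_true)
    loss.backward()
    optimizer.step()                          # replaces the manual w -= lr * w.grad

w_hat = model.weight.item()
b_hat = model.bias.item()
print(f'Learned w={w_hat:.4f}, b={b_hat:.4f}  (true: w=2.0, b=1.0)')
```

The starting values differ from the manual loop because `nn.Linear` initialises its own parameters, so the learned numbers will be close to, but not identical with, the ones above.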

---
## 3. Building Neural Networks with `nn.Module`

PyTorch's `torch.nn.Module` is the base class for all neural network modules. A model is built by:
1. Subclassing `nn.Module`
2. Defining layers in `__init__`
3. Implementing the forward pass in `forward`

In [None]:
class MLP(nn.Module):
    """Simple Multi-Layer Perceptron."""

    def __init__(self, in_features, hidden_size, out_features):
        super().__init__()
        self.fc1 = nn.Linear(in_features, hidden_size)
        self.fc2 = nn.Linear(hidden_size, out_features)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        return self.fc2(x)


model = MLP(in_features=16, hidden_size=32, out_features=4)
print(model)

# Count trainable parameters
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f'Trainable parameters: {num_params}')
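
The parameter count can be checked by hand: each `nn.Linear` layer has `in * out` weights plus `out` biases. The sketch below uses an `nn.Sequential` stand-in with the same two linear layers as the MLP and compares the formula with the counted value.

```python
import torch.nn as nn

in_features, hidden_size, out_features = 16, 32, 4

# Stand-in with the same two Linear layers as the MLP
net = nn.Sequential(
    nn.Linear(in_features, hidden_size),
    nn.ReLU(),
    nn.Linear(hidden_size, out_features),
)

# Each Linear layer has in * out weights plus out biases
expected = (in_features * hidden_size + hidden_size) \
         + (hidden_size * out_features + out_features)
counted = sum(p.numel() for p in net.parameters() if p.requires_grad)

print('expected:', expected)   # 16*32 + 32 + 32*4 + 4 = 676
print('counted: ', counted)
```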

In [None]:
# Training with nn.Module, an optimizer, and a loss function
torch.manual_seed(0)
X_train = torch.randn(200, 16)
y_train = torch.randint(0, 4, (200,))

model = MLP(16, 32, 4)
optimizer = optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for epoch in range(50):
    optimizer.zero_grad()
    logits = model(X_train)
    loss = criterion(logits, y_train)
    loss.backward()
    optimizer.step()

print(f'Final loss: {loss.item():.4f}')
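
`nn.CrossEntropyLoss` expects raw logits, not probabilities. As a quick sketch on random inputs, the snippet below checks that it matches `log_softmax` followed by `nll_loss` (the form used by the GCN model in Section 7) and a manual computation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(8, 4)               # 8 samples, 4 classes
targets = torch.randint(0, 4, (8,))

ce = nn.CrossEntropyLoss()(logits, targets)
nll = F.nll_loss(F.log_softmax(logits, dim=1), targets)

# Manual version: mean negative log-probability of the true class
log_probs = logits - torch.logsumexp(logits, dim=1, keepdim=True)
manual = -log_probs[torch.arange(8), targets].mean()

print('CrossEntropyLoss:      ', ce.item())
print('log_softmax + nll_loss:', nll.item())
print('manual:                ', manual.item())
```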

---
## 4. Introduction to Graph-Structured Data

### 4.1 What is a Graph?

A **graph** $G = (V, E)$ consists of:
- $V$ — a set of **nodes** (vertices)
- $E \subseteq V \times V$ — a set of **edges** connecting pairs of nodes

Graphs can additionally carry:
- **Node features** $\mathbf{X} \in \mathbb{R}^{|V| \times F}$ — feature vector for each node
- **Edge features** — attributes on each edge
- **Graph-level labels** — a label for the entire graph

### 4.2 Graph Representations

| Representation | Description | Memory |
|---|---|---|
| **Adjacency matrix** | $A \in \{0,1\}^{N \times N}$, $A_{ij}=1$ if edge $(i,j)$ exists | $O(N^2)$ |
| **Edge list** | List of tuples $(i, j)$ | $O(\lvert E \rvert)$ |
| **COO sparse format** | Two arrays `row` and `col` | $O(\lvert E \rvert)$ |

PyG uses the **COO format** via `edge_index` of shape `[2, |E|]`.
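
As a sketch of how the representations relate, the snippet below builds a dense adjacency matrix for a small undirected 5-node graph, converts it to a `[2, |E|]` index tensor with plain PyTorch, and converts it back. Each undirected edge appears once per direction.

```python
import torch

# Adjacency matrix of the undirected graph 0--1--2--3, 1--4
A = torch.zeros(5, 5, dtype=torch.long)
for i, j in [(0, 1), (1, 2), (2, 3), (1, 4)]:
    A[i, j] = 1
    A[j, i] = 1          # undirected: store both directions

# Dense -> COO: row and column indices of the non-zero entries
edge_index = A.nonzero().t().contiguous()
print('edge_index shape:', edge_index.shape)

# COO -> dense
A_back = torch.zeros(5, 5, dtype=torch.long)
A_back[edge_index[0], edge_index[1]] = 1
print('round trip ok:', torch.equal(A, A_back))
```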

### 4.3 Why Do Standard NNs Fail on Graphs?

Standard neural networks assume fixed-size, ordered inputs. Graphs have:
- **Variable size** — different numbers of nodes and edges
- **No canonical ordering** — node order is arbitrary (**permutation invariance**)
- **Irregular connectivity** — each node can have a different number of neighbours
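
The ordering problem can be illustrated with a small sketch. A toy model that flattens the node features and applies a fixed weight vector (a hypothetical stand-in for a standard dense layer) generally changes its output when nodes are relabelled. Summing neighbour features does not: relabelling the input only relabels the output.

```python
import torch

torch.manual_seed(0)
A = torch.tensor([[0., 1., 0.],
                  [1., 0., 1.],
                  [0., 1., 0.]])        # path graph 0--1--2
X = torch.randn(3, 2)

perm = torch.tensor([2, 0, 1])           # an arbitrary relabelling
P = torch.eye(3)[perm]                   # permutation matrix

# Order-sensitive model: flatten features and apply a fixed weight vector
w = torch.randn(6)
flat_before = (X.reshape(-1) * w).sum()
flat_after = ((P @ X).reshape(-1) * w).sum()

# Neighbour-sum aggregation: relabelling the input just relabels the output
agg_before = A @ X
agg_after = (P @ A @ P.T) @ (P @ X)

print('flattened model changed:', not torch.allclose(flat_before, flat_after))
print('aggregation equivariant:', torch.allclose(agg_after, P @ agg_before))
```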

In [None]:
# Manual graph example
# 5-node graph: 0--1--2--3, 1--4
edge_index = torch.tensor([
    [0, 1, 1, 2, 2, 3, 1, 4],   # source nodes
    [1, 0, 2, 1, 3, 2, 4, 1],   # target nodes
], dtype=torch.long)

x = torch.randn(5, 4)  # 5 nodes, 4 features each
print('edge_index shape:', edge_index.shape)  # [2, 8]
print('node features shape:', x.shape)         # [5, 4]

In [None]:
# Visualise the graph with networkx
try:
    import networkx as nx
    G = nx.Graph()
    G.add_nodes_from(range(5))
    edges = edge_index.T.tolist()
    G.add_edges_from(edges)
    pos = nx.spring_layout(G, seed=42)
    nx.draw(G, pos, with_labels=True, node_color='lightblue',
            node_size=700, font_size=12)
    plt.title('Example graph'); plt.show()
except ImportError:
    print('networkx not installed — skipping visualisation')

---
## 5. PyTorch Geometric: Core Concepts

### 5.1 What is PyTorch Geometric?

[PyTorch Geometric (PyG)](https://pytorch-geometric.readthedocs.io/) provides:
- A **`Data`** object to represent a single graph
- A **`Dataset`** / **`DataLoader`** pipeline for batching graphs
- A rich library of **GNN layers** (GCNConv, SAGEConv, GATConv, …)
- **Mini-batch support** using sparse block-diagonal adjacency matrices

### 5.2 The `Data` Object

The central data structure in PyG is `torch_geometric.data.Data`:

```python
Data(
    x          = <node features>       # [num_nodes, num_node_features]
    edge_index = <COO edge index>      # [2, num_edges]
    edge_attr  = <edge features>       # [num_edges, num_edge_features]  (optional)
    y          = <labels>              # node-, edge-, or graph-level labels
    pos        = <node positions>      # [num_nodes, num_dimensions]    (optional)
)
```

In [None]:
try:
    from torch_geometric.data import Data

    # Build the same 5-node graph
    data = Data(
        x=torch.randn(5, 4),          # 5 nodes, 4 features
        edge_index=edge_index,
        y=torch.tensor([0, 1, 0, 1, 0])  # node labels
    )

    print(data)
    print('Num nodes:', data.num_nodes)
    print('Num edges:', data.num_edges)
    print('Num node features:', data.num_node_features)
    print('Has isolated nodes:', data.has_isolated_nodes())
    print('Has self-loops:', data.has_self_loops())
    print('Is directed:', data.is_directed())
except ImportError:
    print('torch_geometric not installed — install it to run this cell')

### 5.3 Message Passing Framework

Almost every GNN follows the **message passing** paradigm:

$$
\mathbf{h}_v^{(k)} = \text{UPDATE}^{(k)}\!\left(
    \mathbf{h}_v^{(k-1)},
    \text{AGGREGATE}^{(k)}\!\left(
        \left\{ \mathbf{m}_{(u,v)}^{(k)} : u \in \mathcal{N}(v) \right\}
    \right)
\right)
$$

where $\mathbf{m}_{(u,v)}^{(k)} = \text{MESSAGE}^{(k)}(\mathbf{h}_u^{(k-1)}, \mathbf{h}_v^{(k-1)}, \mathbf{e}_{uv})$.
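
Before turning to PyG's API, here is one message-passing step sketched in plain PyTorch. The specific choices are assumptions made for illustration: MESSAGE is the source node's features, AGGREGATE is a sum, and UPDATE adds the node's own state followed by a ReLU.

```python
import torch

torch.manual_seed(0)
edge_index = torch.tensor([
    [0, 1, 1, 2, 2, 3, 1, 4],   # source nodes u
    [1, 0, 2, 1, 3, 2, 4, 1],   # target nodes v
])
h = torch.randn(5, 4)           # h^(k-1): 5 nodes, 4 features

src, dst = edge_index

# MESSAGE: simply the source node's features (assumed, simplest choice)
messages = h[src]                                   # [num_edges, 4]

# AGGREGATE: sum the messages arriving at each target node
agg = torch.zeros_like(h).index_add_(0, dst, messages)

# UPDATE: add the node's own state, then apply a non-linearity (also assumed)
h_new = torch.relu(h + agg)

# Cross-check against the dense form A @ h, with A[v, u] = 1 for each edge u -> v
A = torch.zeros(5, 5)
A[dst, src] = 1.0
print('matches dense A @ h:', torch.allclose(agg, A @ h, atol=1e-6))
```

The gather (`h[src]`) and scatter (`index_add_`) pair is the core pattern; the cost scales with the number of edges rather than with $N^2$.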

PyG implements this via the `MessagePassing` base class:

In [None]:
try:
    from torch_geometric.nn import MessagePassing
    from torch_geometric.utils import add_self_loops, degree

    class SimpleGCNConv(MessagePassing):
        """A minimal GCN convolution to illustrate MessagePassing."""

        def __init__(self, in_channels, out_channels):
            super().__init__(aggr='add')  # 'add' aggregation
            self.lin = nn.Linear(in_channels, out_channels, bias=False)

        def forward(self, x, edge_index):
            # 1. Add self-loops to the adjacency matrix
            edge_index, _ = add_self_loops(edge_index, num_nodes=x.size(0))

            # 2. Linearly transform node feature matrix
            x = self.lin(x)

            # 3. Compute normalisation coefficients
            row, col = edge_index
            deg = degree(col, x.size(0), dtype=x.dtype)
            deg_inv_sqrt = deg.pow(-0.5)
            norm = deg_inv_sqrt[row] * deg_inv_sqrt[col]

            # 4. Start propagating messages
            return self.propagate(edge_index, x=x, norm=norm)

        def message(self, x_j, norm):
            # x_j: features of each edge's source node, shape [num_edges, out_channels]
            return norm.view(-1, 1) * x_j

    conv = SimpleGCNConv(4, 8)
    out = conv(data.x, data.edge_index)
    print('Output shape:', out.shape)  # [5, 8]
except ImportError:
    print('torch_geometric not installed')

---
## 6. Datasets and DataLoaders in PyG

### 6.1 Built-in Datasets

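PyG ships with many benchmark datasets. Some examples are listed below; edge counts are as reported by PyG, which stores each undirected edge once per direction, and the MUTAG figures are per-graph averages: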

| Dataset | Task | Nodes | Edges |
|---------|------|-------|-------|
| Cora | Node classification | 2708 | 10556 |
| Citeseer | Node classification | 3327 | 9104 |
| KarateClub | Community detection | 34 | 156 |
| TUDataset (MUTAG) | Graph classification | ~18 | ~40 |

### 6.2 The Karate Club Dataset

In [None]:
try:
    from torch_geometric.datasets import KarateClub

    dataset = KarateClub()
    data_kc = dataset[0]

    print('Dataset:', dataset)
    print('Number of graphs:', len(dataset))
    print('Number of node features:', dataset.num_features)
    print('Number of classes:', dataset.num_classes)
    print()
    print(data_kc)
    print('Number of nodes:', data_kc.num_nodes)
    print('Number of edges:', data_kc.num_edges)
except ImportError:
    print('torch_geometric not installed')

### 6.3 Mini-batching Graphs

PyG's `DataLoader` combines the graphs in a mini-batch into a **single disconnected graph** whose adjacency matrix is block-diagonal:

$$
\mathbf{A}_{\text{batch}} = \begin{pmatrix} A_1 & & \\ & A_2 & \\ & & \ddots \end{pmatrix}
$$

The `batch` vector maps each node to its graph index.
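
The batching idea can be sketched by hand in plain PyTorch. The two toy graphs below are hypothetical data: node features are stacked, the second graph's node indices are shifted by the size of the first, and the `batch` vector is then used for a per-graph mean, which is the idea behind pooling functions such as `global_mean_pool`.

```python
import torch

# Two toy graphs (hypothetical data): a 3-node path and a single 2-node edge
x1, ei1 = torch.ones(3, 2), torch.tensor([[0, 1, 1, 2], [1, 0, 2, 1]])
x2, ei2 = 2 * torch.ones(2, 2), torch.tensor([[0, 1], [1, 0]])

# Stack node features; shift the second graph's node indices by 3
x = torch.cat([x1, x2], dim=0)
edge_index = torch.cat([ei1, ei2 + x1.size(0)], dim=1)

# batch[i] = index of the graph that node i belongs to
batch = torch.cat([torch.zeros(3, dtype=torch.long),
                   torch.ones(2, dtype=torch.long)])

# Graph-level mean pooling: sum features per graph, divide by node counts
sums = torch.zeros(2, 2).index_add_(0, batch, x)
counts = torch.bincount(batch, minlength=2).unsqueeze(1)
graph_mean = sums / counts

print('edge_index:\n', edge_index)
print('batch:', batch)
print('per-graph mean:\n', graph_mean)
```

Because no edge connects nodes from different graphs, message passing on the combined graph never mixes information between them.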

In [None]:
try:
    from torch_geometric.loader import DataLoader
    from torch_geometric.datasets import TUDataset

    dataset_tu = TUDataset(root='/tmp/MUTAG', name='MUTAG')
    print('MUTAG:', dataset_tu)
    print('Number of graphs:', len(dataset_tu))
    print('Number of classes:', dataset_tu.num_classes)

    loader = DataLoader(dataset_tu, batch_size=32, shuffle=True)
    for batch in loader:
        print('Batch:', batch)
        print('Batch vector (unique values):', batch.batch.unique())
        break
except ImportError:
    print('torch_geometric not installed')

---
## 7. Your First GNN Layer (`GCNConv`)

### 7.1 Theory Recap

The **Graph Convolutional Network (GCN)** layer (Kipf & Welling, 2017) computes:

$$
\mathbf{H}^{(l+1)} = \sigma\!\left(
    \hat{\mathbf{D}}^{-1/2}\hat{\mathbf{A}}\hat{\mathbf{D}}^{-1/2}\mathbf{H}^{(l)}\mathbf{W}^{(l)}
\right)
$$

where $\hat{\mathbf{A}} = \mathbf{A} + \mathbf{I}$ (add self-loops) and $\hat{\mathbf{D}}$ is the diagonal degree matrix of $\hat{\mathbf{A}}$.
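
To connect the formula to code, here is a dense sketch of one GCN layer in plain PyTorch on a small 5-node graph. It takes $\sigma$ to be ReLU and uses a random matrix as a stand-in for the learned weights $\mathbf{W}^{(l)}$; real implementations use sparse operations instead of an $N \times N$ matrix.

```python
import torch

torch.manual_seed(0)
# Undirected 5-node graph 0--1--2--3, 1--4 as a dense adjacency matrix
A = torch.zeros(5, 5)
for i, j in [(0, 1), (1, 2), (2, 3), (1, 4)]:
    A[i, j] = A[j, i] = 1.0

A_hat = A + torch.eye(5)                    # add self-loops
deg = A_hat.sum(dim=1)                      # diagonal of D_hat
d_inv_sqrt = deg.pow(-0.5)
A_norm = d_inv_sqrt.unsqueeze(1) * A_hat * d_inv_sqrt.unsqueeze(0)

H = torch.randn(5, 4)                       # H^(l): 4 input features per node
W = torch.randn(4, 8)                       # W^(l): random stand-in for learned weights
H_next = torch.relu(A_norm @ H @ W)         # sigma = ReLU (assumed)

print('A_norm[0, 1]:', A_norm[0, 1].item()) # 1 / sqrt(2 * 4)
print('H_next shape:', H_next.shape)
```

Node 0 has degree 2 and node 1 has degree 4 once self-loops are counted, which gives the $1/\sqrt{2 \cdot 4}$ weight on the edge between them.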

### 7.2 Node Classification on Cora

In [None]:
try:
    from torch_geometric.datasets import Planetoid
    from torch_geometric.nn import GCNConv

    dataset_cora = Planetoid(root='/tmp/Cora', name='Cora')
    data_cora = dataset_cora[0]
    print(data_cora)
except ImportError:
    print('torch_geometric not installed')

In [None]:
try:
    class GCN(nn.Module):
        def __init__(self, in_ch, hidden_ch, out_ch):
            super().__init__()
            self.conv1 = GCNConv(in_ch, hidden_ch)
            self.conv2 = GCNConv(hidden_ch, out_ch)

        def forward(self, data):
            x, edge_index = data.x, data.edge_index
            x = F.relu(self.conv1(x, edge_index))
            x = F.dropout(x, p=0.5, training=self.training)
            x = self.conv2(x, edge_index)
            return F.log_softmax(x, dim=1)


    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model_gcn = GCN(dataset_cora.num_features, 16, dataset_cora.num_classes).to(device)
    data_cora = data_cora.to(device)
    optimizer_gcn = optim.Adam(model_gcn.parameters(), lr=0.01, weight_decay=5e-4)

    def train():
        model_gcn.train()
        optimizer_gcn.zero_grad()
        out = model_gcn(data_cora)
        loss = F.nll_loss(out[data_cora.train_mask], data_cora.y[data_cora.train_mask])
        loss.backward()
        optimizer_gcn.step()
        return loss.item()

    @torch.no_grad()
    def test():
        model_gcn.eval()
        out = model_gcn(data_cora)
        pred = out.argmax(dim=1)
        accs = []
        for mask in [data_cora.train_mask, data_cora.val_mask, data_cora.test_mask]:
            accs.append(int((pred[mask] == data_cora.y[mask]).sum()) / int(mask.sum()))
        return accs

    for epoch in range(1, 201):
        train()

    train_acc, val_acc, test_acc = test()
    print(f'Train: {train_acc:.4f} | Val: {val_acc:.4f} | Test: {test_acc:.4f}')
except NameError:  # GCNConv / dataset_cora are undefined if the import cell above failed
    print('torch_geometric not installed — skipping GCN training')

### 7.3 Visualising Node Embeddings with t-SNE

In [None]:
try:
    from sklearn.manifold import TSNE

    model_gcn.eval()
    with torch.no_grad():
        # Get embeddings from first layer
        x = data_cora.x
        emb = F.relu(model_gcn.conv1(x, data_cora.edge_index))

    emb_np = emb.cpu().numpy()
    labels_np = data_cora.y.cpu().numpy()

    tsne = TSNE(n_components=2, random_state=42)
    emb_2d = tsne.fit_transform(emb_np)

    plt.figure(figsize=(8, 6))
    scatter = plt.scatter(emb_2d[:, 0], emb_2d[:, 1], c=labels_np, cmap='tab10', s=10)
    plt.colorbar(scatter)
    plt.title('t-SNE of GCN Node Embeddings (Cora)')
    plt.show()
except Exception as e:
    print(f'Skipping t-SNE visualisation: {e}')

---
## 8. Exercises

### Exercise 1 — Tensor Operations
1. Create a $4 \times 4$ matrix of random integers between 0 and 9 using `torch.randint`.
2. Compute its trace (sum of diagonal elements). *Hint: `torch.diagonal`*.
3. Extract the upper-triangular part using `torch.triu`.
4. Compute the L2 norm of each row. *Hint: cast the matrix to float first; `torch.linalg.norm` does not accept integer tensors.*

In [None]:
# Exercise 1 — your solution here
...

### Exercise 2 — Custom `nn.Module`
Implement a **residual block**: a module that applies two linear layers with ReLU activations and adds the input to the output (skip connection). Test it on random input.

$$\text{out} = \text{ReLU}(W_2 \cdot \text{ReLU}(W_1 \cdot x)) + x$$

In [None]:
# Exercise 2 — your solution here
class ResidualBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        # TODO: define two linear layers
        ...

    def forward(self, x):
        # TODO: implement residual connection
        ...

### Exercise 3 — Custom `MessagePassing` Layer
Implement a **mean-aggregation** GNN layer using `MessagePassing` that:
1. Aggregates neighbour features by **mean** (`aggr='mean'`).
2. Concatenates the mean-aggregated neighbour features with the node's own features.
3. Applies a linear transformation to the concatenated vector.

Test it on the KarateClub dataset.

In [None]:
# Exercise 3 — your solution here
try:
    from torch_geometric.nn import MessagePassing

    class MeanAggConv(MessagePassing):
        def __init__(self, in_channels, out_channels):
            super().__init__(aggr='mean')
            # TODO: define linear layer(s)
            ...

        def forward(self, x, edge_index):
            # TODO
            ...

        def message(self, x_j):
            return x_j
except ImportError:
    print('torch_geometric not installed')

### Exercise 4 — Graph Classification Pipeline
Build a complete **graph classification** pipeline on the MUTAG dataset:
1. Load MUTAG using `TUDataset`.
2. Split into train / test sets (80/20).
3. Use `DataLoader` with `batch_size=32`.
4. Build a 2-layer GCN with global mean pooling (`global_mean_pool`).
5. Train for 100 epochs and report test accuracy.

*Hint:* `from torch_geometric.nn import global_mean_pool`

In [None]:
# Exercise 4 — your solution here
...

---
## Summary

| Topic | Key Takeaway |
|-------|--------------|
| **PyTorch Tensors** | n-dimensional arrays with GPU support and autograd |
| **Autograd** | Automatic gradient computation via backpropagation |
| **`nn.Module`** | Base class for building reusable neural network modules |
| **Graph data** | Node features `x` plus connectivity `edge_index` in COO format |
| **PyG `Data`** | Flexible container for a single graph |
| **`MessagePassing`** | Unified API for message-passing GNNs |
| **`DataLoader`** | Efficient mini-batching of graph datasets |

**Next notebook →** `02_gcn_graphsage_gat.ipynb` — GCN, GraphSAGE, and GAT in depth.