# 1. Theory: What is a Graph Convolution?

## Analogy
Just as CNNs apply convolutions over pixel grids, GNNs apply convolutions over graphs by aggregating information from each node’s neighbors. A GCN layer does this with the following propagation rule:


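$$H^{(l+1)} = \sigma\left(\tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\,H^{(l)}\,W^{(l)}\right)$$

- $\tilde{A} = A + I$: adjacency matrix with self-loops added

- $\tilde{D}$: diagonal degree matrix of $\tilde{A}$, with $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$

- $H^{(l)}$: node representations at layer $l$

- $W^{(l)}$: learnable weights of layer $l$

- $\sigma$: activation (e.g., ReLU)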

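> Each node updates its features by taking a degree-normalized sum (roughly a weighted average) of its neighbors’ features and its own, then applying the weights and the activation.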
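To make the propagation rule concrete, here is a minimal NumPy sketch on a made-up 3-node path graph. The graph, the feature matrix `H`, and the identity weight matrix `W` are illustrative assumptions, chosen so that the effect of the normalized aggregation is easy to read off.

```python
import numpy as np

# Hypothetical toy graph: an undirected 3-node path 0 - 1 - 2.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)

A_tilde = A + np.eye(3)            # add self-loops
deg = A_tilde.sum(axis=1)          # degrees including the self-loop
D_inv_sqrt = np.diag(deg ** -0.5)  # D~^{-1/2}
A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt  # symmetric normalization

# Illustrative node features (3 nodes, 2 features each).
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
W = np.eye(2)  # identity weights, so only the aggregation changes H

H_next = np.maximum(A_hat @ H @ W, 0.0)  # ReLU activation

print(A_hat.round(3))
print(H_next.round(3))
```

Each entry of `A_hat` equals 1 / sqrt(d_i * d_j) for connected nodes i and j (self-loops included), so high-degree neighbors contribute less than they would under a plain sum.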

## Practice

In [4]:
# Download the Cora dataset
# Build a 2-layer GCN

In [2]:
from torch_geometric.datasets import Planetoid

dataset = Planetoid(root="data/Planetoid", name="Cora") # Dataset: Cora
data = dataset[0]
print(data)


Data(x=[2708, 1433], edge_index=[2, 10556], y=[2708], train_mask=[2708], val_mask=[2708], test_mask=[2708])
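The `edge_index=[2, 10556]` field in the output above is an edge list in COO layout: column k holds the source and target node of edge k. For an undirected graph, each edge is normally stored once per direction. The sketch below uses a made-up 3-node graph (not Cora) and plain NumPy to show how such an edge list maps to a dense adjacency matrix.

```python
import numpy as np

# Hypothetical toy edge list for the undirected path 0 - 1 - 2.
# Row 0 holds source nodes, row 1 holds target nodes. Each undirected
# edge is listed once per direction, so 2 edges give 4 columns.
edge_index = np.array([[0, 1, 1, 2],
                       [1, 0, 2, 1]])
num_nodes = 3

# Scatter the edge list into a dense adjacency matrix.
adj = np.zeros((num_nodes, num_nodes), dtype=int)
adj[edge_index[0], edge_index[1]] = 1

# With both directions stored, half the columns are distinct undirected edges.
num_undirected = edge_index.shape[1] // 2

print(adj)
print(num_undirected)
```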


In [6]:

import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class GCN(torch.nn.Module):
    """Two-layer GCN for node classification."""

    def __init__(self, in_channels, hidden_channels, out_channels):
        super().__init__()
        self.conv1 = GCNConv(in_channels, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, out_channels)

    def forward(self, data):
        x, edge_index = data.x, data.edge_index
        x = self.conv1(x, edge_index)  # first graph convolution
        x = F.relu(x)
        # Dropout is applied only in training mode (after model.train())
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.conv2(x, edge_index)  # per-class scores
        return F.log_softmax(x, dim=1)  # log-probabilities, paired with F.nll_loss

In [7]:
# 16 hidden units; Adam with L2 regularization via weight_decay
model = GCN(dataset.num_features, 16, dataset.num_classes)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
data = dataset[0]

for epoch in range(200):
    model.train()
    optimizer.zero_grad()
    out = model(data)
    # The loss is computed on the training nodes only
    loss = F.nll_loss(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()

    if epoch % 20 == 0:
        print(f'Epoch {epoch:03d}, loss={loss:.4f}')

Epoch 000, loss=1.9361
Epoch 020, loss=0.1865
Epoch 040, loss=0.0567
Epoch 060, loss=0.0411
Epoch 080, loss=0.0577
Epoch 100, loss=0.0449
Epoch 120, loss=0.0372
Epoch 140, loss=0.0343
Epoch 160, loss=0.0280
Epoch 180, loss=0.0298


In [8]:
model.eval()
out = model(data)
pred = out.argmax(dim=1)

correct = (pred[data.test_mask] == data.y[data.test_mask]).sum()
acc = int(correct) / int(data.test_mask.sum())
print(f'Accuracy = {acc:.4f}')

Accuracy = 0.8030


In [9]:
## Visualize node embeddings

In [12]:
!pip install -U numpy scikit-learn


Collecting numpy
  Downloading numpy-2.3.4-cp313-cp313-win_amd64.whl.metadata (60 kB)
Collecting scikit-learn
  Downloading scikit_learn-1.7.2-cp313-cp313-win_amd64.whl.metadata (11 kB)
Downloading numpy-2.3.4-cp313-cp313-win_amd64.whl (12.8 MB)

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
numba 0.61.2 requires numpy<2.3,>=1.24, but you have numpy 2.3.4 which is incompatible.
sklearn-compat 0.1.3 requires scikit-learn<1.7,>=1.2, but you have scikit-learn 1.7.2 which is incompatible.


In [13]:
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

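model.eval()
# Output of the first convolution (before ReLU), converted to a NumPy array for scikit-learn
with torch.no_grad():
    z = model.conv1(data.x, data.edge_index).cpu().numpy()

# Project the 16-dimensional embeddings to 2D; color each node by its class label
z_2d = TSNE(n_components=2).fit_transform(z)
plt.figure(figsize=(8, 6))
plt.scatter(z_2d[:, 0], z_2d[:, 1], c=data.y.cpu().numpy(), cmap='jet', s=15)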
plt.title('2D Visualization of GCN Node Embeddings')
plt.show()


ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
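The `ValueError` above signals a binary mismatch: a compiled extension was built against a different NumPy than the one being imported. Given the pip warning earlier, a plausible cause (an assumption, not verified here) is that NumPy was upgraded while packages built against the older version were still installed or already loaded in the running kernel. A minimal sketch for checking what is installed, using only the standard library; the helper name `installed_versions` is hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package name: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Packages named in the pip dependency warning.
report = installed_versions(["numpy", "scikit-learn", "numba", "sklearn-compat"])
for name, ver in report.items():
    print(f"{name}: {ver or 'not installed'}")
```

Comparing the reported versions with the constraints in the pip message (`numpy<2.3` for numba 0.61.2, `scikit-learn<1.7` for sklearn-compat 0.1.3), installing versions that satisfy them, and then restarting the kernel is one way to try to resolve the conflict.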