<a href="https://colab.research.google.com/github/Whoseyashar/Machine-Learning-Advance/blob/main/Vanilla_graph_neural_networks.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

So far, the only type of information we’ve considered is the graph topology. However, graph datasets tend to be richer than a mere set of connections: nodes and edges can also have features to represent scores, colors, words, and so on. Including this additional information in our input data is essential to produce the best embeddings possible. In fact, this is something natural in machine learning: node and edge features have the same structure as a tabular (non-graph) dataset. This means that traditional techniques can be applied to this data, such as neural networks.

The graph datasets we’re going to use in this chapter are richer than Zachary’s Karate Club: they have more nodes, more edges, and include node features.

The Cora dataset is the most popular dataset for node classification in the scientific literature. It represents a network of 2,708 publications, where each connection is a reference. Each publication is described as a binary vector of 1,433 unique words, where 0 and 1 indicate the absence or presence of the corresponding word, respectively. This representation is also called a binary bag of words in natural language processing. Our goal is to classify each node into one of seven categories.
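To make the binary bag-of-words idea concrete, here is a minimal sketch in plain Python. The five-word vocabulary, the two example titles, and the helper `binary_bag_of_words` are all made up for illustration; they are not part of Cora or of PyTorch Geometric.

```python
# Toy vocabulary (hypothetical; Cora's real vocabulary has 1,433 words).
vocabulary = ["graph", "neural", "network", "kernel", "bayesian"]


def binary_bag_of_words(text, vocab):
    """Return a 0/1 vector marking which vocabulary words occur in `text`."""
    tokens = set(text.lower().split())
    return [1 if word in tokens else 0 for word in vocab]


paper_a = "Graph neural network for citation data"
paper_b = "A Bayesian kernel method"

vec_a = binary_bag_of_words(paper_a, vocabulary)
vec_b = binary_bag_of_words(paper_b, vocabulary)

print(vec_a)  # [1, 1, 1, 0, 0]
print(vec_b)  # [0, 0, 0, 1, 1]
```

Each row of Cora's feature matrix is a vector of this kind, with 1,433 entries instead of 5.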

Let’s import Cora and analyze its main characteristics with PyTorch Geometric. This library has a dedicated class to download the dataset and return a relevant data structure.

In [1]:
!pip install torch-geometric
from torch_geometric.datasets import Planetoid

cora_dataset = Planetoid(root="./datasets", name="Cora")

cora_data = cora_dataset[0]

print(f'Dataset: {cora_dataset}')
print('---------------')
print(f'Number of graphs: {len(cora_dataset)}')
print(f'Number of nodes: {cora_data.x.shape[0]}')
print(f'Number of features: {cora_dataset.num_features}')
print(f'Number of classes: {cora_dataset.num_classes}')

print(f'Graph:')
print('------')
print(f'Edges are directed: {cora_data.is_directed()}')
print(f'Graph has isolated nodes: {cora_data.has_isolated_nodes()}')
print(f'Graph has loops: {cora_data.has_self_loops()}')

from torch_geometric.datasets import FacebookPagePage

facebook_dataset = FacebookPagePage(root="./datasets")

facebook_data = facebook_dataset[0]

print(f'Dataset: {facebook_dataset}')
print('-----------------------')
print(f'Number of graphs: {len(facebook_dataset)}')
print(f'Number of nodes: {facebook_data.x.shape[0]}')
print(f'Number of features: {facebook_dataset.num_features}')
print(f'Number of classes: {facebook_dataset.num_classes}')

print(f'\nGraph:')
print('------')
print(f'Edges are directed: {facebook_data.is_directed()}')
print(f'Graph has isolated nodes: {facebook_data.has_isolated_nodes()}')
print(f'Graph has loops: {facebook_data.has_self_loops()}')


Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.x
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.tx
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.allx
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.y
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.ty
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.ally
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.graph
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.test.index
Processing...
Done!
Downloading https://graphmining.ai/datasets/ptg/facebook.npz


Dataset: Cora()
---------------
Number of graphs: 1
Number of nodes: 2708
Number of features: 1433
Number of classes: 7
Graph:
------
Edges are directed: False
Graph has isolated nodes: False
Graph has loops: False
Dataset: FacebookPagePage()
-----------------------
Number of graphs: 1
Number of nodes: 22470
Number of features: 128
Number of classes: 4

Graph:
------
Edges are directed: False
Graph has isolated nodes: False
Graph has loops: True


Processing...
Done!


Use https://www.yworks.com/yed-live/ to visualise both graphs.

The Facebook Page-Page dataset does not come with a predefined train/validation/test split. Let's create one, this time using the existing `RandomNodeSplit` transform from PyTorch Geometric.

In [2]:
from torch_geometric.transforms import RandomNodeSplit

randomSplit = RandomNodeSplit(split='train_rest', num_test=0.1, num_val=0.1)
facebook_data = randomSplit(facebook_data)

print(facebook_data)

Data(x=[22470, 128], edge_index=[2, 342004], y=[22470], train_mask=[22470], val_mask=[22470], test_mask=[22470])
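As a rough illustration of what a random node split amounts to, the following NumPy sketch builds three boolean masks by hand. `random_node_masks` is a hypothetical helper, and the way fractional sizes are rounded here is an assumption, not a description of how PyTorch Geometric implements its transform.

```python
import numpy as np


def random_node_masks(num_nodes, val_fraction, test_fraction, seed=0):
    """Assign every node to exactly one of train/val/test at random."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_nodes)  # shuffled node indices

    num_val = round(num_nodes * val_fraction)
    num_test = round(num_nodes * test_fraction)

    val_mask = np.zeros(num_nodes, dtype=bool)
    test_mask = np.zeros(num_nodes, dtype=bool)
    val_mask[perm[:num_val]] = True
    test_mask[perm[num_val:num_val + num_test]] = True

    # Everything that is neither validation nor test is used for training.
    train_mask = ~(val_mask | test_mask)
    return train_mask, val_mask, test_mask


train_mask, val_mask, test_mask = random_node_masks(22470, 0.1, 0.1)
print(train_mask.sum(), val_mask.sum(), test_mask.sum())  # 17976 2247 2247
```

The masks have one entry per node, which matches the `[22470]` shapes of `train_mask`, `val_mask`, and `test_mask` in the printed `Data` object.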


Compared to Zachary’s Karate Club, these two datasets include a new type of information: node features. They provide additional information about the nodes in a graph, such as a user’s age, gender, or interests in a social network. In a vanilla neural network (also called a multilayer perceptron, or MLP), these features are fed directly into the model to perform downstream tasks such as node classification.

We will consider node features as a regular tabular dataset. We will train a simple neural network on this dataset to classify our nodes. Note that this architecture does not take into account the topology of the network. We will try to fix this issue in the next section and compare our results.

In [3]:
import pandas as pd
import torch
from torch.nn import Linear
import torch.nn.functional as F

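# Put the node features in a DataFrame (one row per node, one column per word)
df_x = pd.DataFrame(cora_data.x.numpy())
# Add the class label of each node as an extra column
df_x['label'] = cora_data.y.numpy()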

In [4]:
df_x

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,1424,1425,1426,1427,1428,1429,1430,1431,1432,label
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0
4,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2703,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3
2704,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3
2705,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3
2706,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3


Before we can train our model, we must define the main metric. There are several metrics for multiclass classification problems: accuracy, F1 score, Area Under the Receiver Operating Characteristic Curve (ROC AUC) score, and so on. For this work, let’s implement a simple accuracy, which is defined as the fraction of correct predictions. It is not always the best metric for multiclass classification, because it can hide poor performance on rare classes, but it is simple to understand.

In [5]:
def accuracy(y_pred, y_true):
    return torch.sum(y_pred == y_true) / len(y_true)
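The following plain-Python sketch shows why accuracy can be misleading when classes are imbalanced. The labels are invented for illustration, and `plain_accuracy` and `macro_recall` are hypothetical helpers written for this example (macro recall is simply the per-class recall averaged over classes).

```python
def plain_accuracy(y_pred, y_true):
    """Fraction of predictions that match the true label."""
    correct = sum(p == t for p, t in zip(y_pred, y_true))
    return correct / len(y_true)


def macro_recall(y_pred, y_true):
    """Average, over classes, of the fraction of that class predicted correctly."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Imbalanced toy labels: class 0 dominates, classes 1 and 2 are rare.
y_true = [0] * 8 + [1, 2]
# A trivial "model" that always predicts the majority class.
y_pred = [0] * 10

acc = plain_accuracy(y_pred, y_true)
bal = macro_recall(y_pred, y_true)
print(f"accuracy = {acc:.2f}, macro recall = {bal:.2f}")  # accuracy = 0.80, macro recall = 0.33
```

The trivial predictor reaches 80% accuracy while never recognizing two of the three classes, which is the kind of failure that accuracy alone does not reveal.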

Now, we can start the actual implementation. We don’t need PyTorch Geometric to implement the MLP in this section. Everything can be done in regular PyTorch.

In [6]:
class MLP(torch.nn.Module):

    def __init__(self, dim_in, dim_h, dim_out):
        super().__init__()
        self.linear1 = Linear(dim_in, dim_h)
        self.linear2 = Linear(dim_h, dim_out)

    def forward(self, x):
        x = self.linear1(x)
        x = torch.relu(x)
        x = self.linear2(x)
        return F.log_softmax(x, dim=1)

    def fit(self, data, epochs):
        criterion = torch.nn.CrossEntropyLoss()
        optimizer = torch.optim.Adam(self.parameters(), lr=0.01, weight_decay=5e-4)
        self.train()
        for epoch in range(epochs + 1):
            optimizer.zero_grad()
            out = self(data.x)
            loss = criterion(out[data.train_mask], data.y[data.train_mask])
            acc = accuracy(out[data.train_mask].argmax(dim=1), data.y[data.train_mask])
            loss.backward()
            optimizer.step()

            if epoch % 20 == 0:
                val_loss = criterion(out[data.val_mask], data.y[data.val_mask])
                val_acc = accuracy(out[data.val_mask].argmax(dim=1), data.y[data.val_mask])
                print(f'Epoch {epoch:>3} | Train Loss: {loss:.3f} | Train Acc: {acc * 100:>5.2f}% | Val Loss: {val_loss:.2f} | Val Acc: {val_acc * 100:.2f}%')

    def test(self, data):
        self.eval()
        out = self(data.x)
        acc = accuracy(out.argmax(dim=1)[data.test_mask], data.y[data.test_mask])
        return acc
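One detail of the class above is worth checking: `forward` returns log-probabilities, and `fit` passes them to `CrossEntropyLoss`, which applies a log-softmax internally as well. The small sketch below, on random toy logits, suggests that this does not change the loss value, because applying log-softmax to values that are already log-probabilities leaves them unchanged. The tensor shapes and labels are arbitrary choices for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 7)              # 5 toy nodes, 7 classes
labels = torch.tensor([0, 3, 6, 2, 2])  # arbitrary toy labels

log_probs = F.log_softmax(logits, dim=1)

loss_on_logits = F.cross_entropy(logits, labels)        # usual usage
loss_on_log_probs = F.cross_entropy(log_probs, labels)  # pattern used in this section
loss_nll = F.nll_loss(log_probs, labels)                # explicit alternative

same_as_logits = torch.allclose(loss_on_logits, loss_on_log_probs, atol=1e-6)
same_as_nll = torch.allclose(loss_on_logits, loss_nll, atol=1e-6)
print(same_as_logits, same_as_nll)  # True True
```

So the extra `log_softmax` is redundant rather than harmful; returning raw logits from `forward`, or switching the criterion to `NLLLoss`, would be equivalent design choices.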

In [7]:
mlp = MLP(cora_dataset.num_features, 16, cora_dataset.num_classes)
print(mlp)

MLP(
  (linear1): Linear(in_features=1433, out_features=16, bias=True)
  (linear2): Linear(in_features=16, out_features=7, bias=True)
)


Now that our class is complete, we can create, train, and test an instance of MLP.

In [8]:
mlp.fit(cora_data, epochs=100)

acc = mlp.test(cora_data)
print(f'MLP test accuracy CORA: {acc * 100:.2f}%')

mlp_face = MLP(facebook_dataset.num_features, 16, facebook_dataset.num_classes)
print(mlp_face)

mlp_face.fit(facebook_data, epochs=100)

acc_face = mlp_face.test(facebook_data)
print(f'MLP test accuracy FACEBOOK: {acc_face * 100:.2f}%')

Epoch   0 | Train Loss: 1.949 | Train Acc: 14.29% | Val Loss: 1.99 | Val Acc: 6.80%
Epoch  20 | Train Loss: 0.087 | Train Acc: 100.00% | Val Loss: 1.36 | Val Acc: 53.40%
Epoch  40 | Train Loss: 0.011 | Train Acc: 100.00% | Val Loss: 1.39 | Val Acc: 53.60%
Epoch  60 | Train Loss: 0.007 | Train Acc: 100.00% | Val Loss: 1.36 | Val Acc: 56.00%
Epoch  80 | Train Loss: 0.008 | Train Acc: 100.00% | Val Loss: 1.32 | Val Acc: 56.60%
Epoch 100 | Train Loss: 0.009 | Train Acc: 100.00% | Val Loss: 1.31 | Val Acc: 57.00%
MLP test accuracy CORA: 55.00%
MLP(
  (linear1): Linear(in_features=128, out_features=16, bias=True)
  (linear2): Linear(in_features=16, out_features=4, bias=True)
)
Epoch   0 | Train Loss: 1.417 | Train Acc: 24.72% | Val Loss: 1.43 | Val Acc: 23.05%
Epoch  20 | Train Loss: 0.673 | Train Acc: 73.09% | Val Loss: 0.70 | Val Acc: 72.76%
Epoch  40 | Train Loss: 0.574 | Train Acc: 77.13% | Val Loss: 0.62 | Val Acc: 75.52%
Epoch  60 | Train Loss: 0.545 | Train Acc: 78.35% | Val Loss: 0.6

Instead of directly introducing well-known GNN architectures, let’s try to build our own model to understand the thought process behind GNNs.

A basic neural network layer corresponds to a linear transformation $h_A = x_A W^T$, where $x_A$ is the input vector of node $A$ and $W$ is the weight matrix. In PyTorch, this equation can be implemented with the `nn.Linear` class, which also adds other parameters such as biases.
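As a quick sanity check of this correspondence, the sketch below compares the output of a `Linear` layer with the manual computation $x_A W^T + b$. The layer sizes (4 inputs, 2 outputs) and the random input are arbitrary values chosen for illustration.

```python
import torch
from torch.nn import Linear

torch.manual_seed(0)
layer = Linear(in_features=4, out_features=2)  # W has shape [2, 4], bias has shape [2]
x_A = torch.randn(1, 4)                        # feature vector of one toy node

h_layer = layer(x_A)                              # output of the layer
h_manual = x_A @ layer.weight.T + layer.bias      # x_A W^T + b computed by hand

print(layer.weight.shape)
print(torch.allclose(h_layer, h_manual, atol=1e-6))  # True
```

Passing `bias=False` to `Linear` removes the bias term and leaves exactly $x_A W^T$.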

With the datasets loaded above, the input vectors are the node features, which means nodes are treated as completely separate from each other. This is not enough to capture a good understanding of the graph: like a pixel in an image, a node can only be understood in its context. If you look at a group of pixels instead of a single one, you can recognize edges, patterns, and so on. Likewise, to understand a node, you need to look at its neighborhood.

Let $N_A$ be the set of neighbors of node $A$. The graph linear layer can be written as:

$h_A = \sum_{i \in N_A} x_i W^T$

Many variations of this equation are possible. For example, a separate weight matrix $W_1$ can be applied to the central node and another matrix $W_2$ to its neighbors.
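The per-node sum and the two-matrix variation can be sketched in NumPy on a tiny graph. The four-node graph, the feature values, and the weight matrices are invented for illustration, and the exact form of the variant ($x_A W_1^T$ plus the neighbor sum with $W_2$) is one possible reading of the variation described above.

```python
import numpy as np

# Toy graph with 4 nodes and 2 features per node (all values are made up).
X = np.array([[1., 0.],
              [0., 1.],
              [1., 1.],
              [2., 0.]])
neighbors = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
W = np.array([[1., 2.],
              [0., 1.]])


def neighbor_sum(node, X, neighbors, W):
    """h_A = sum over neighbors i of x_i W^T (shared weight matrix)."""
    return sum(X[i] @ W.T for i in neighbors[node])


def neighbor_sum_two_weights(node, X, neighbors, W_1, W_2):
    """Variant: W_1 for the central node, W_2 for its neighbors."""
    return X[node] @ W_1.T + sum(X[i] @ W_2.T for i in neighbors[node])


h_0 = neighbor_sum(0, X, neighbors, W)
h_0_variant = neighbor_sum_two_weights(0, X, neighbors, np.eye(2), W)

print(h_0)          # [5. 2.]
print(h_0_variant)  # [6. 2.]
```

With $W_1$ set to the identity, the variant simply adds the central node's own features to the neighbor sum.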

Because we are working with neural networks, we do not want to apply the previous equation to each node one at a time. Instead, we perform matrix multiplications, which are much more efficient, and read the embedding of each node from the result. For example, the linear layer can be rewritten in matrix form as:

$H = XW^T$, where $X$ is the input matrix.

In the case of graphs, the adjacency matrix $A$ contains information about the connections between every pair of nodes in the graph. Multiplying the input matrix by this adjacency matrix directly sums up the neighboring node features. We can add self-loops to the adjacency matrix, $\tilde{A} = A + I$, where $I$ is the identity matrix, so that each node's own features are included in the sum.

In that case, the graph linear layer can be represented as:

$H = \tilde{A}^T X W^T$
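To check that the matrix form really reproduces the per-node sums, the NumPy sketch below computes $H$ both ways on a four-node toy graph. The adjacency matrix, features, and weights are invented for illustration.

```python
import numpy as np

# Toy undirected graph: edges 0-1, 0-2, 2-3 (values are made up).
A = np.array([[0., 1., 1., 0.],
              [1., 0., 0., 0.],
              [1., 0., 0., 1.],
              [0., 0., 1., 0.]])
X = np.array([[1., 0.],
              [0., 1.],
              [1., 1.],
              [2., 0.]])
W = np.array([[1., 2.],
              [0., 1.]])

A_tilde = A + np.eye(4)        # add self-loops
H = A_tilde.T @ X @ W.T        # matrix form of the graph linear layer

# Same computation written as an explicit loop over nodes and neighbors.
H_loop = np.zeros_like(H)
for a in range(4):
    for i in range(4):
        if A_tilde[i, a] != 0:
            H_loop[a] += X[i] @ W.T

print(H)
print(np.allclose(H, H_loop))  # True
```

Because matrix multiplication is associative, it does not matter whether the features are first transformed by $W^T$ and then summed over neighbors, or summed first and transformed afterwards.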

Let's implement the previously described layer. The layer itself only needs regular PyTorch; PyTorch Geometric is used afterwards to build the adjacency matrix.

In [9]:
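class VanillaGNNLayer(torch.nn.Module):

    def __init__(self, dim_in, dim_out):
        super().__init__()
        # Weight matrix W shared by all nodes (no bias term)
        self.linear = Linear(dim_in, dim_out, bias=False)

    def forward(self, x, adjacency):
        # Linear transformation of every node's features: X W^T
        x = self.linear(x)
        # Sum the transformed features over each node's neighborhood
        x = torch.sparse.mm(adjacency, x) # or torch.mm(adjacency, x)
        return x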

Before we can create our vanilla GNN, we need to convert the edge index from our dataset (`data.edge_index`), which is stored in coordinate format, to a dense adjacency matrix. We also need to include self-loops; otherwise, the central nodes won’t be taken into account in their own embeddings.
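To make the conversion concrete, here is a small NumPy sketch on a toy graph (the three-node `edge_index` is made up for illustration). It also shows a subtlety: if a graph already contains self-loops, as the Facebook graph does according to the earlier output, adding the identity matrix puts a 2 on those diagonal entries, whereas setting the diagonal to 1 guarantees exactly one self-loop per node. Whether that difference matters in practice is not evaluated here.

```python
import numpy as np

# Toy edge index in coordinate (COO) format: row 0 = sources, row 1 = targets.
# Undirected edges are stored in both directions; (2, 2) is an existing self-loop.
edge_index = np.array([[0, 1, 1, 2, 2],
                       [1, 0, 2, 1, 2]])
num_nodes = 3

# Dense adjacency matrix: entry [i, j] is 1 if there is an edge from i to j.
adjacency = np.zeros((num_nodes, num_nodes))
adjacency[edge_index[0], edge_index[1]] = 1.0

# Option 1: add the identity matrix (the A + I pattern).
naive = adjacency + np.eye(num_nodes)

# Option 2: force the diagonal to 1, so existing self-loops are not counted twice.
safe = adjacency.copy()
np.fill_diagonal(safe, 1.0)

print(naive[2, 2])  # 2.0
print(safe[2, 2])   # 1.0
```

Keep in mind that a dense adjacency matrix grows quadratically with the number of nodes: for the 22,470 Facebook nodes it has about 505 million entries, roughly 2 GB assuming 32-bit floats.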

In [10]:
from torch_geometric.utils import to_dense_adj

core_adjacency = to_dense_adj(cora_data.edge_index)[0]
core_adjacency += torch.eye(len(core_adjacency))

print(core_adjacency)
print(f'<< Core Adjacency + Identity matrix size: {core_adjacency.size()}')

face_adjacency = to_dense_adj(facebook_data.edge_index)[0]
face_adjacency += torch.eye(len(face_adjacency))

tensor([[1., 0., 0.,  ..., 0., 0., 0.],
        [0., 1., 1.,  ..., 0., 0., 0.],
        [0., 1., 1.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 1., 0., 0.],
        [0., 0., 0.,  ..., 0., 1., 1.],
        [0., 0., 0.,  ..., 0., 1., 1.]])
<< Core Adjacency + Identity matrix size: torch.Size([2708, 2708])


Now let's create our full GNN class with the previously implemented layer and all the functionality needed for training and testing.

In [11]:
class VanillaGNN(torch.nn.Module):
    def __init__(self, dim_in, dim_h, dim_out):
        super().__init__()
        self.gnn1 = VanillaGNNLayer(dim_in, dim_h)
        self.gnn2 = VanillaGNNLayer(dim_h, dim_out)

    def forward(self, x, adjacency):
        h = self.gnn1(x, adjacency)
        h = torch.relu(h)
        h = self.gnn2(h, adjacency)
        return F.log_softmax(h, dim=1)

    def fit(self, data, epochs, adjacency):
        criterion = torch.nn.CrossEntropyLoss()
        optimizer = torch.optim.Adam(self.parameters(), lr=0.01, weight_decay=5e-4)
        self.train()
        for epoch in range(epochs + 1):
            optimizer.zero_grad()
            out = self(data.x, adjacency)
            loss = criterion(out[data.train_mask], data.y[data.train_mask])
            acc = accuracy(out[data.train_mask].argmax(dim=1), data.y[data.train_mask])
            loss.backward()
            optimizer.step()
            if epoch % 20 == 0:
                val_loss = criterion(out[data.val_mask], data.y[data.val_mask])
                val_acc = accuracy(out[data.val_mask].argmax(dim=1), data.y[data.val_mask])
                print(f'Epoch {epoch:>3} | Train Loss: {loss:.3f} | Train Acc: {acc * 100:>5.2f}% | Val Loss: {val_loss:.2f} | Val Acc: {val_acc * 100:.2f}%')

    def test(self, data, adjacency):
        self.eval()
        out = self(data.x, adjacency)
        acc = accuracy(out.argmax(dim=1)[data.test_mask], data.y[data.test_mask])
        return acc


gnn = VanillaGNN(cora_dataset.num_features, 16, cora_dataset.num_classes)
print(gnn)
print(f'<< Core data: {cora_data.x.shape}')
gnn.fit(cora_data, epochs=100, adjacency=core_adjacency)
acc = gnn.test(cora_data, adjacency=core_adjacency)
print(f'\nGNN Core test accuracy: {acc * 100:.2f}%')

gnn_face = VanillaGNN(facebook_dataset.num_features, 16, facebook_dataset.num_classes)
print(gnn_face)
print(f'<< Facebook data: {facebook_data.x.shape}')
gnn_face.fit(facebook_data, epochs=100, adjacency=face_adjacency)
acc = gnn_face.test(facebook_data, adjacency=face_adjacency)
print(f'\nGNN Facebook test accuracy: {acc * 100:.2f}%')

VanillaGNN(
  (gnn1): VanillaGNNLayer(
    (linear): Linear(in_features=1433, out_features=16, bias=False)
  )
  (gnn2): VanillaGNNLayer(
    (linear): Linear(in_features=16, out_features=7, bias=False)
  )
)
<< Core data: torch.Size([2708, 1433])
Epoch   0 | Train Loss: 2.298 | Train Acc: 12.14% | Val Loss: 2.36 | Val Acc: 11.00%
Epoch  20 | Train Loss: 0.149 | Train Acc: 97.86% | Val Loss: 1.35 | Val Acc: 76.40%
Epoch  40 | Train Loss: 0.017 | Train Acc: 100.00% | Val Loss: 1.94 | Val Acc: 74.60%
Epoch  60 | Train Loss: 0.005 | Train Acc: 100.00% | Val Loss: 2.19 | Val Acc: 74.40%
Epoch  80 | Train Loss: 0.003 | Train Acc: 100.00% | Val Loss: 2.26 | Val Acc: 74.00%
Epoch 100 | Train Loss: 0.002 | Train Acc: 100.00% | Val Loss: 2.28 | Val Acc: 74.20%

GNN Core test accuracy: 73.10%
VanillaGNN(
  (gnn1): VanillaGNNLayer(
    (linear): Linear(in_features=128, out_features=16, bias=False)
  )
  (gnn2): VanillaGNNLayer(
    (linear): Linear(in_features=16, out_features=4, bias=False)
  )


Exercise:
Observe the difference in performance compared with the MLP architecture and suggest further improvements.
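As one possible starting point for the exercise (a suggestion, not a result obtained in this section), note that summing over neighbors makes the embeddings of high-degree nodes larger than those of low-degree nodes. A simple idea to experiment with is dividing each row of the adjacency matrix by the node's degree, so that the layer averages instead of sums. The NumPy sketch below illustrates this on a made-up four-node graph; whether it improves accuracy on Cora or Facebook would have to be tested.

```python
import numpy as np

# Toy undirected graph: edges 0-1, 0-2, 2-3 (values are made up).
A = np.array([[0., 1., 1., 0.],
              [1., 0., 0., 0.],
              [1., 0., 0., 1.],
              [0., 0., 1., 0.]])
X = np.array([[1., 0.],
              [0., 1.],
              [1., 1.],
              [2., 0.]])

A_tilde = A + np.eye(4)                       # adjacency with self-loops
degree = A_tilde.sum(axis=1, keepdims=True)   # neighbors per node (self included)
A_norm = A_tilde / degree                     # each row now sums to 1

summed = A_tilde @ X    # neighborhood sum, as in the vanilla layer
averaged = A_norm @ X   # neighborhood mean

print(summed[0], averaged[0])
```

The large gap between training and validation accuracy in the logs above also suggests trying regularization such as dropout, which is likewise left as an experiment.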