# Labonne - Hands-On Graph Neural Networks Using Python

## Chapter 5 - Including node features with vanilla neural networks

---

**NOTE: CHAPTER USES PyTorch SO RAN NOTEBOOK WITH ACCELERATOR (all previous ones can be CPU only)**

---

- The idea here is that we can add more than just the graph topology: nodes and edges might have features or information attached to them
- Including this information can improve the quality of the embeddings we obtain
- Node and edge features have the same structure as a tabular dataset, so standard ML can be applied

We will first build a vanilla NN on node features (**AFAICT before reading chapter : this means that we're not including the underlying graph topology**) to classify nodes

We will then include topological information of the graph - **this leads to building a GRAPH NEURAL NETWORK**

Then we compare the performance of the 2 approaches (without / with graph topology in other words)

---

## Datasets

- `Cora`
- `Facebook Page-Page`

## Cora dataset

- This is the most popular dataset for node classification in the literature.
- It is a network of 2708 publications (tbc whether this has been updated since the book).
- Each connection is a reference, i.e. one publication citing another (Cora is a citation network).
- Each publication (node) is described by a binary vector over 1433 unique words, where 0 or 1 indicates whether the corresponding word is absent from or present in that publication.
- (**this is known as a BINARY bag of words** in NLP)
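
To make the binary bag-of-words idea concrete, here is a minimal sketch in plain Python. The 5-word vocabulary and the example sentence are made up for illustration (Cora's real vocabulary has 1433 words), and `binary_bag_of_words` is a hypothetical helper, not something from the book or from PyTorch Geometric.

```python
def binary_bag_of_words(text, vocabulary):
    """Return a 0/1 vector: 1 if the vocabulary word occurs in the text, else 0."""
    words = set(text.lower().split())
    return [1 if word in words else 0 for word in vocabulary]


# Hypothetical toy vocabulary (assumption, for illustration only)
vocabulary = ["graph", "neural", "network", "protein", "bayesian"]
paper = "A neural network model for graph data and graph kernels"

vector = binary_bag_of_words(paper, vocabulary)
print(vector)  # [1, 1, 1, 0, 0]
```

Note that "graph" occurs twice in the text but still maps to 1: a binary bag of words records presence, not counts.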

**Our goal is to classify each node into one of 7 categories**

### Visualization 

Graph data viz : use yEd Live or Gephi (maybe others since book published)

In [2]:
!pip install torch_geometric

Collecting torch_geometric
  Downloading torch_geometric-2.5.3-py3-none-any.whl.metadata (64 kB)
Downloading torch_geometric-2.5.3-py3-none-any.whl (1.1 MB)
Installing collected packages: torch_geometric
Successfully installed torch_geometric-2.5.3


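**he says PyTorch Geometric has a "dedicated class" to download the Cora dataset. It is called `Planetoid` after the Planetoid paper (Yang et al., 2016), whose dataset files and splits it downloads (see the `kimiyoung/planetoid` URLs in the output below); the same class also loads CiteSeer and PubMed**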

In [3]:
from torch_geometric.datasets import Planetoid

In [4]:
dataset = Planetoid(root=".", name="Cora")

Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.x
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.tx
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.allx
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.y
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.ty
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.ally
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.graph
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.test.index
Processing...
Done!


In [5]:
# Cora has only one graph so can store it in dedicated variable
# NOTE: a PyG dataset object behaves like a list of graphs; here len(dataset) == 1, so dataset[0] is the whole Cora graph

data = dataset[0]

### Dataset then graph info

In [6]:
print(f'Dataset: {dataset}')
print('---------------')
print(f'Number of graphs: {len(dataset)}')
print(f'Number of nodes: {data.x.shape[0]}')
print(f'Number of features: {dataset.num_features}')
print(f'Number of classes: {dataset.num_classes}')

Dataset: Cora()
---------------
Number of graphs: 1
Number of nodes: 2708
Number of features: 1433
Number of classes: 7


In [7]:
print(f'Graph:')
print('------')
print(f'Edges are directed: {data.is_directed()}')
print(f'Graph has isolated nodes: {data.has_isolated_nodes()}')
print(f'Graph has loops: {data.has_self_loops()}')

Graph:
------
Edges are directed: False
Graph has isolated nodes: False
Graph has loops: False


## Facebook Page-Page dataset

- more representative of size of real world social networks
- from Facebook graph API
- 22470 nodes: each node represents a Facebook page
- pages connected when there are mutual likes between them
- node features are 128 dim vectors: created from textual descriptions written by the owners of the pages

**Goal is to classify each node into one of 4 categories : politicians, companies, tv shows, government organisations**

---

Task is similar to Cora but 3 main diffs:

- many more nodes
- dimensionality of node features is much smaller (128 vs 1433)
- 4 categories vs 7 (easier since fewer options)

In [8]:
from torch_geometric.datasets import FacebookPagePage

dataset = FacebookPagePage(root=".")

data = dataset[0]

Downloading https://graphmining.ai/datasets/ptg/facebook.npz
Processing...
Done!


In [9]:
print(f'Dataset: {dataset}')
print('-----------------------')
print(f'Number of graphs: {len(dataset)}')
print(f'Number of nodes: {data.x.shape[0]}')
print(f'Number of features: {dataset.num_features}')
print(f'Number of classes: {dataset.num_classes}')

Dataset: FacebookPagePage()
-----------------------
Number of graphs: 1
Number of nodes: 22470
Number of features: 128
Number of classes: 4


In [10]:
print(f'\nGraph:')
print('------')
print(f'Edges are directed: {data.is_directed()}')
print(f'Graph has isolated nodes: {data.has_isolated_nodes()}')
print(f'Graph has loops: {data.has_self_loops()}')


Graph:
------
Edges are directed: False
Graph has isolated nodes: False
Graph has loops: True


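The Facebook dataset doesn't come with a train/validation/test split, so we have to make one ourselves: create masks with `range()` as follows (note that the ranges as written leave nodes 18000 and 20000 out of every split, because `range` excludes its end value):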

In [11]:
data.train_mask = range(18000)
data.val_mask = range(18001, 20000)
data.test_mask = range(20001, 22470)
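
As a quick sanity check on these ranges, the sketch below counts how many nodes land in each split and which node indices end up in none of them. It uses only plain Python; the variable names (`train_idx`, `val_idx`, `test_idx`) are mine, not the book's.

```python
num_nodes = 22470

train_idx = range(18000)            # nodes 0 .. 17999
val_idx = range(18001, 20000)       # nodes 18001 .. 19999
test_idx = range(20001, num_nodes)  # nodes 20001 .. 22469

# Any node index that is in none of the three splits
covered = set(train_idx) | set(val_idx) | set(test_idx)
unused = sorted(set(range(num_nodes)) - covered)

print(len(train_idx), len(val_idx), len(test_idx))  # 18000 1999 2469
print(unused)                                       # [18000, 20000]
```

Two unused nodes out of 22470 will not change the results in any meaningful way, but it explains why the split sizes do not add up to the node count.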

Was wondering what `data` is that we can attach these 3 attributes to it (they are plain attributes, not methods) - it is a torch_geometric `Data` object:

In [12]:
print(type(data))

<class 'torch_geometric.data.data.Data'>


## Classifying nodes with vanilla neural networks

- Compared to Zachary's Karate Club, these graphs now have **node features**
- we can classify the nodes by treating the node features as a regular tabular dataset and training a standard NN on them (**NOTE TO BE 100% CLEAR: this does not take the topology of the graph into account**)

---

torch_geometric `data` has `.x` as node features and `.y` as class labels

In [14]:
# He mentions that PyTorch Geometric has transforms to calculate random masks
# (e.g. T.RandomNodeSplit), but then just includes this code block

# WE ARE GOING TO BE BACK TO Cora NOW SO I COPIED IT HERE - TODO: confirm later that we use T transform stuff later

import torch_geometric.transforms as T


dataset = Planetoid(root=".", name="Cora")
data = dataset[0]

In [18]:
data.x, data.y

(tensor([[0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         ...,
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.]]),
 tensor([3, 4, 4,  ..., 3, 3, 3]))

In [20]:
data.x[0], type(data.x[0])

(tensor([0., 0., 0.,  ..., 0., 0., 0.]), torch.Tensor)

In [22]:
# -- convert the x and y in torch_geometric data to pandas dataframe

import pandas as pd

df_x = pd.DataFrame(data.x.numpy()) # x contains torch.Tensors (see type call just above I did for myself to check)

# add label from the data.y of torch_geometric object
df_x["label"] = pd.DataFrame(data.y)

df_x

|  | 0 | 1 | 2 | 3 | 4 | 5 | ... | 1432 | label |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 3 |
| 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 4 |
| 2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 4 |
| 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0 |
| 4 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | ... | 0.0 | 3 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2703 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | ... | 0.0 | 3 |
| 2704 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 3 |
| 2705 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 3 |
| 2706 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 3 |


**NOTE TO MYSELF: I saw 0's everywhere so looked carefully and there are indeed 1s. This is a bag-of-words model, so it makes sense that the vectors are very sparse (a paper only uses a small part of the 1433-word vocabulary). Anyway, just wanted to check that there's no mistake - the matrix does indeed contain nonzero elements O_o**

---

We build a simple MLP, train it on `data.x` to predict the labels in `data.y`

We create our own MLP class with 4 methods

- init
- forward
- fit
- test

---

**Which metric to evaluate with?:** we'll use simple accuracy. It's not the best for multiclass classification, but it is simpler to understand (CAN REPLACE WITH METRIC OF YOUR CHOICE LATER - **ROC AUC, f1** etc)

In [23]:
def accuracy(y_pred, y_true):
    return torch.sum(y_pred == y_true) / len(y_true)
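
As a sketch of one of the alternative metrics mentioned above, here is macro-averaged F1 in plain Python. It works on lists of integer class labels (for tensors, something like `.tolist()` would be needed first). `macro_f1` is a hypothetical helper of mine, and scoring a class that never appears as 0.0 is an assumption, not the only possible convention.

```python
def macro_f1(y_pred, y_true, num_classes):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in range(num_classes):
        tp = sum(1 for p, t in zip(y_pred, y_true) if p == c and t == c)
        fp = sum(1 for p, t in zip(y_pred, y_true) if p == c and t != c)
        fn = sum(1 for p, t in zip(y_pred, y_true) if p != c and t == c)
        denom = 2 * tp + fp + fn
        # Assumption: a class with no predictions and no true labels scores 0.0
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / num_classes


# Toy labels: class 1 is never predicted
y_true = [0, 0, 0, 0, 1, 2]
y_pred = [0, 0, 0, 0, 0, 2]

acc = sum(p == t for p, t in zip(y_pred, y_true)) / len(y_true)
print(round(acc, 3))                          # 0.833
print(round(macro_f1(y_pred, y_true, 3), 3))  # 0.63
```

Accuracy looks decent here even though class 1 is missed completely; macro-F1 drops because every class counts equally.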

In [24]:
# -- Custom simple MLP for classification --

import torch
import torch.nn.functional as F
from torch.nn import Linear

In [28]:
class MLP(torch.nn.Module):
    
    def __init__(self, dim_in, dim_h, dim_out):
        super().__init__()
        
        self.linear1 = Linear(dim_in, dim_h) #  h is the hidden layer
        self.linear2 = Linear(dim_h, dim_out)
        
    def forward(self, x):
        x = self.linear1(x)
        x = torch.relu(x)
        x = self.linear2(x)
        
        return F.log_softmax(x, dim=1)
    
    def fit(self, data, epochs):
        # this is used for the training loop
        criterion = torch.nn.CrossEntropyLoss()
        optimizer = torch.optim.Adam(self.parameters(),
                                    lr=0.01,
                                    weight_decay=5e-4,
                                    )
        
        self.train()
        for epoch in range(epochs+1):
            optimizer.zero_grad()
            out = self(data.x) # forward pass on ALL nodes; only the train_mask rows go into the loss below, so val/test labels never affect the gradients
            loss = criterion(out[data.train_mask], data.y[data.train_mask])
            accuracy_score = accuracy(out[data.train_mask].argmax(dim=1), data.y[data.train_mask])
            loss.backward()
            optimizer.step()
            
            if epoch % 20 == 0:
                # EVALUATE ON VALIDATION SPLIT
                val_loss = criterion(out[data.val_mask], data.y[data.val_mask])
                val_acc = accuracy(out[data.val_mask].argmax(dim=1), data.y[data.val_mask])
                print(f'Epoch {epoch:>3} | Train Loss: {loss:.3f} | Train Acc: {accuracy_score*100:>5.2f}% | Val Loss: {val_loss:.2f} | Val Acc: {val_acc*100:.2f}%')
    
    def test(self, data):
        # EVALUATES MODEL ON THE TEST SPLIT AND RETURNS ACCURACY
        self.eval()
        out = self(data.x)
        accuracy_score = accuracy(out.argmax(dim=1)[data.test_mask], data.y[data.test_mask])
        return accuracy_score
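
One detail worth checking in the class above: `forward()` returns `F.log_softmax(...)`, while `fit()` uses `torch.nn.CrossEntropyLoss`, which applies a log-softmax to its input itself. So log-softmax is effectively applied twice. The plain-Python sketch below (my own `log_softmax` helper, not the PyTorch one) suggests this is redundant but harmless, because applying log-softmax to values that are already log-probabilities changes nothing.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax of a list of floats."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_sum for v in logits]


logits = [2.0, -1.0, 0.5]
once = log_softmax(logits)
twice = log_softmax(once)

# exp(once) sums to 1, so its log-sum-exp is 0 and the second pass is a no-op
same = all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(once, twice))
print(same)  # True
```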

Create an MLP and check that its layers look correct:

In [29]:
# FOR NOW THIS IS WITH Cora SINCE THAT IS MOST RECENTLY LOADED DATASET

# 16 hidden neurons it seems he uses in example:

mlp = MLP(dataset.num_features, 16, dataset.num_classes)

print(mlp)

MLP(
  (linear1): Linear(in_features=1433, out_features=16, bias=True)
  (linear2): Linear(in_features=16, out_features=7, bias=True)
)


In [30]:
mlp.fit(data, epochs=100)

Epoch   0 | Train Loss: 1.950 | Train Acc: 12.86% | Val Loss: 1.91 | Val Acc: 28.60%
Epoch  20 | Train Loss: 0.100 | Train Acc: 100.00% | Val Loss: 1.50 | Val Acc: 52.60%
Epoch  40 | Train Loss: 0.011 | Train Acc: 100.00% | Val Loss: 1.59 | Val Acc: 51.80%
Epoch  60 | Train Loss: 0.007 | Train Acc: 100.00% | Val Loss: 1.61 | Val Acc: 49.80%
Epoch  80 | Train Loss: 0.007 | Train Acc: 100.00% | Val Loss: 1.49 | Val Acc: 52.40%
Epoch 100 | Train Loss: 0.008 | Train Acc: 100.00% | Val Loss: 1.42 | Val Acc: 53.40%


In [31]:
# evaluate final model on test data split

accuracy_score = mlp.test(data)
print(f'MLP test accuracy: {accuracy_score*100:.2f}%')

MLP test accuracy: 50.60%


Now do same for the Facebook dataset - have to reload it:

In [32]:
fb_dataset = FacebookPagePage(root=".")

fb_data = fb_dataset[0]

In [34]:
# forgot to repeat this so got error
# AttributeError: 'GlobalStorage' object has no attribute 'train_mask'
# when tried training fb_mlp below

fb_data.train_mask = range(18000)
fb_data.val_mask = range(18001, 20000)
fb_data.test_mask = range(20001, 22470)

In [35]:
fb_mlp = MLP(fb_dataset.num_features, 16, fb_dataset.num_classes)

print(fb_mlp)

fb_mlp.fit(fb_data, epochs=100)

MLP(
  (linear1): Linear(in_features=128, out_features=16, bias=True)
  (linear2): Linear(in_features=16, out_features=4, bias=True)
)
Epoch   0 | Train Loss: 1.492 | Train Acc: 15.95% | Val Loss: 1.48 | Val Acc: 17.26%
Epoch  20 | Train Loss: 0.682 | Train Acc: 73.68% | Val Loss: 0.70 | Val Acc: 73.39%
Epoch  40 | Train Loss: 0.584 | Train Acc: 76.61% | Val Loss: 0.61 | Val Acc: 74.89%
Epoch  60 | Train Loss: 0.555 | Train Acc: 78.09% | Val Loss: 0.60 | Val Acc: 75.94%
Epoch  80 | Train Loss: 0.537 | Train Acc: 78.61% | Val Loss: 0.59 | Val Acc: 75.49%
Epoch 100 | Train Loss: 0.525 | Train Acc: 79.07% | Val Loss: 0.59 | Val Acc: 75.69%


In [62]:
fb_accuracy_score = fb_mlp.test(fb_data)
print(f'MLP test accuracy: {fb_accuracy_score*100:.2f}%')

MLP test accuracy: 74.85%


## Classifying the same nodes but with vanilla GRAPH neural networks

Build our own before using existing library ones

- currently our input vectors are just node features
- nodes are "separate from each other": not good enough to get a good understanding of the graph
- it's like with pixels in image/CV: to understand a node you need to look at its neighborhood

Let's call `N_A` the set of neighbors of the node `A`

Our **graph linear layer** will be taken as follows:

`h_A = sum over i belonging to N_A of : x_i . W-transpose`

where `x_i` are the input vectors for each node in `N_A`
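
A tiny worked example of this per-node sum, in plain Python. The 3-node graph, the feature values and the weight matrix are all made up, and `graph_linear_node` is a hypothetical helper name; `W` is stored with shape (dim_out, dim_in), so `x_i . W-transpose` is a dot product of `x_i` with each row of `W`.

```python
def graph_linear_node(node, neighbors, X, W):
    """h_node = sum over i in N_node of x_i . W^T, with W of shape (dim_out, dim_in)."""
    dim_out = len(W)
    h = [0.0] * dim_out
    for i in neighbors[node]:
        for o in range(dim_out):
            h[o] += sum(x * w for x, w in zip(X[i], W[o]))
    return h


# Toy graph (assumption): node 0 is linked to nodes 1 and 2
neighbors = {0: [1, 2], 1: [0], 2: [0]}
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # one 2-dim feature vector per node
W = [[1.0, 2.0]]                          # dim_out = 1, dim_in = 2

print(graph_linear_node(0, neighbors, X, W))  # [5.0]
```

For node 0 the neighbors contribute `x_1 . W-transpose = 2.0` and `x_2 . W-transpose = 3.0`, hence 5.0. Node 0's own features are not included at this stage.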

(aside, he says "you can imagine more complex variants of this approach - like for example a weight matrix W_1 dedicated to the central node, and another one W_2 for the neighbors etc." - **basically it seems you can extend this simple approach. TODO: later, read the literature to see what else is out there**)

---

Now instead of doing this node by node, we can rewrite the equation for a linear layer as follows:

`H = X . W-transpose` 

where now X is the "input matrix" (matrix of all the input vectors of all the nodes in the graph)

(**UPDATE: I think eqn above is just for the "basic" linear layer WITHOUT GRAPH INFO - it's unclear in book, because below we have another `H = ...` which is for the graph stuff this time**)

For the graph, the **adjacency matrix `A`** contains the connections between every node in the graph:

multiplying the input matrix, `X`, by this adjacency matrix `A` will directly sum up the "neighboring node features" as done earlier on the single vector example.

**We can add "self loops" to `A` so that the central node is also considered (i.e. include "current node" in set of "neighbors" to sum over) - can do this by usual trick of ADDING IDENTITY MATRIX TO ADJACENCY MATRIX**

`A_tilde = A + I`

So now our **graph** linear layer can be written as (I ADDED SUBSCRIPT `_g` TO BE CLEAR IT'S NOT THE SAME `H` AS ABOVE)

`H_g = A_tilde-transpose . X . W-transpose`
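
The matrix form can be checked by hand on a toy graph. This is a sketch in plain Python with made-up numbers and hypothetical helpers (`matmul`, `transpose`); it computes the plain linear layer `H` first and then the graph version `H_g`, so the effect of `A_tilde` is visible.

```python
def matmul(P, Q):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def transpose(M):
    return [list(col) for col in zip(*M)]


# Toy undirected graph (assumption): node 0 linked to nodes 1 and 2
A = [[0, 1, 1],
     [1, 0, 0],
     [1, 0, 0]]
identity = [[1 if r == c else 0 for c in range(3)] for r in range(3)]
A_tilde = [[a + i for a, i in zip(ra, ri)] for ra, ri in zip(A, identity)]

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # one feature row per node
W = [[1.0, 2.0]]                          # shape (dim_out, dim_in)

H = matmul(X, transpose(W))          # H = X . W^T (no graph information)
H_g = matmul(transpose(A_tilde), H)  # H_g = A_tilde^T . X . W^T

print(H)    # [[1.0], [2.0], [3.0]]
print(H_g)  # [[6.0], [3.0], [4.0]]
```

Row 0 of `H_g` is 1.0 + 2.0 + 3.0: node 0's own transformed features plus those of its two neighbors. Rows 1 and 2 each add node 0's value to their own.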

---

Let's implement this in PyTorch (the layer itself only needs `torch.nn`), so we can use it later as a layer when building GNNs:

**note this is the basic "component" layer, it's a basic linear transformation WITHOUT BIAS ALSO**




In [36]:
class VanillaGNNLayer(torch.nn.Module):
    
    def __init__(self, dim_in, dim_out):
        # NUMBER OF FEATURES OF THE INPUT , NUMBER OF FEATURES OF THE OUTPUT
        super().__init__()
        
        self.linear = Linear(dim_in, dim_out, bias=False)
        
    def forward(self, x, adjacency_matrix):
        # perform the linear transformation ...
        x = self.linear(x)
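        # ... then the multiplication with the adjacency matrix. torch.sparse.mm is meant for a sparse first operand
        # (adjacency matrices are mostly zeros); it also ran here with the dense matrix from to_dense_adj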
        x = torch.sparse.mm(adjacency_matrix, x)
        return x
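
On the "sparse" question: an adjacency matrix is mostly zeros, so the product `A . H` can be computed from the edge list alone without ever building the full N x N matrix. For scale, a dense float32 matrix for the 22470-node Facebook graph has about 505 million entries (roughly 2 GB). The sketch below is plain Python with a hypothetical helper `aggregate_coo`; it illustrates the idea only and is not how `torch.sparse.mm` is implemented.

```python
def aggregate_coo(edge_index, H):
    """Compute A . H from a COO edge list: out[src] += H[dst] for every edge."""
    out = [[0.0] * len(H[0]) for _ in H]
    for src, dst in zip(edge_index[0], edge_index[1]):
        for k, value in enumerate(H[dst]):
            out[src][k] += value
    return out


# Toy graph (assumption): node 0 linked to nodes 1 and 2, both directions stored
edge_index = [[0, 0, 1, 2],
              [1, 2, 0, 0]]
H = [[1.0], [2.0], [3.0]]  # already-transformed features, one row per node

print(aggregate_coo(edge_index, H))  # [[5.0], [1.0], [1.0]]
```

The loop touches 4 edges instead of 9 matrix entries; on a real graph the gap between the number of edges and N squared is far larger.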
        

Before we can create our vanilla GNN, we need to:

1. convert the edge_index from our dataset into dense adjacency matrix `A`
2. create `A_tilde` by adding identity matrix (to account for self loops / current node being taken into account by their own embeddings - see above for discussion)

---

## NOTE:

it's not clear in book since he uses `data` for everything but currently in notebook I have:

- `data` is Cora dataset
- `fb_data` is Facebook dataset

In [40]:
data.edge_index # I don't understand what this is 100% yet TODO: CHECK

tensor([[ 633, 1862, 2582,  ...,  598, 1473, 2706],
        [   0,    0,    0,  ..., 2707, 2707, 2707]])

[https://pytorch-geometric.readthedocs.io/en/1.4.3/modules/data.html](https://pytorch-geometric.readthedocs.io/en/1.4.3/modules/data.html)

says:

edge_index (LongTensor, optional) – Graph connectivity in COO format with shape [2, num_edges]. (default: None)

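**So does this mean it's a "list" of the edges with (node1,node2) format???** Yes: each column of `edge_index` is one edge (source node, target node). Since the graph is undirected, every edge is stored once per direction, so the 10556 columns shown in `data` below correspond to 5278 undirected edges.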
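
To see the COO layout on something small, here is a plain-Python sketch that turns a made-up `edge_index` into a dense adjacency matrix. The 4-node graph is hypothetical, and nested lists stand in for tensors.

```python
# Hypothetical 4-node graph with undirected edges 0-1, 1-2, 2-3.
# Row 0 holds source nodes, row 1 holds target nodes, and each
# undirected edge appears once per direction.
edge_index = [[0, 1, 1, 2, 2, 3],
              [1, 0, 2, 1, 3, 2]]
num_nodes = 4

# One (source, target) pair per column
edges = list(zip(edge_index[0], edge_index[1]))

adjacency = [[0] * num_nodes for _ in range(num_nodes)]
for src, dst in edges:
    adjacency[src][dst] = 1

is_symmetric = all(adjacency[r][c] == adjacency[c][r]
                   for r in range(num_nodes) for c in range(num_nodes))

print(edges[:2])                      # [(0, 1), (1, 0)]
print(len(edges) // 2, is_symmetric)  # 3 True
```

Six columns in `edge_index` give three undirected edges and a symmetric matrix, the same relationship as 10556 columns and 5278 edges for Cora.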

In [41]:
data

Data(x=[2708, 1433], edge_index=[2, 10556], y=[2708], train_mask=[2708], val_mask=[2708], test_mask=[2708])

In [48]:
from torch_geometric.utils import to_dense_adj

# build basic adj matrix, A, using inbuilt function
adjacency = to_dense_adj(data.edge_index)[0]

# build A_tilde, which is A + I identity matrix
adjacency += torch.eye(len(adjacency))

adjacency

tensor([[1., 0., 0.,  ..., 0., 0., 0.],
        [0., 1., 1.,  ..., 0., 0., 0.],
        [0., 1., 1.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 1., 0., 0.],
        [0., 0., 0.,  ..., 0., 1., 1.],
        [0., 0., 0.,  ..., 0., 1., 1.]])

Now we can implement the vanilla GNN, very similar to the MLP we built earlier:


**we build our first GNN with 2 linear layers**

In [51]:
class VanillaGNN(torch.nn.Module):
    
    def __init__(self, dim_in, dim_h, dim_out):
        super().__init__()
        
        self.gnn1 = VanillaGNNLayer(dim_in, dim_h)
        self.gnn2 = VanillaGNNLayer(dim_h, dim_out)
        
    def forward(self, x, adjacency_matrix):
        # include adj info now in forward()
        h = self.gnn1(x, adjacency_matrix)
        h = torch.relu(h)
        h = self.gnn2(h, adjacency_matrix)
        
        return F.log_softmax(h, dim=1)
    
    # --------------------------
    # fit and test are unchanged
    # JUST NEED TO ADD ADJACENCY_MATRIX THOUGH
    # --------------------------
    def fit(self, data, epochs):
        # this is used for the training loop
        criterion = torch.nn.CrossEntropyLoss()
        optimizer = torch.optim.Adam(self.parameters(),
                                    lr=0.01,
                                    weight_decay=5e-4,
                                    )

        self.train()
        for epoch in range(epochs+1):
            optimizer.zero_grad()
            # forward pass on ALL nodes: each node's output aggregates its neighbors' features,
            # so the whole graph is needed even though only the train_mask rows enter the loss
            
            out = self(data.x, adjacency) # NOTE: REALLY CONFUSING - THIS IS A "GLOBAL" VARIABLE IN HIS CODE - IT'S THE ACTUAL adjacency created in an earlier cell
            loss = criterion(out[data.train_mask], data.y[data.train_mask])
            accuracy_score = accuracy(out[data.train_mask].argmax(dim=1), data.y[data.train_mask])
            loss.backward()
            optimizer.step()

            if epoch % 20 == 0:
                # EVALUATE ON VALIDATION SPLIT
                val_loss = criterion(out[data.val_mask], data.y[data.val_mask])
                val_acc = accuracy(out[data.val_mask].argmax(dim=1), data.y[data.val_mask])
                print(f'Epoch {epoch:>3} | Train Loss: {loss:.3f} | Train Acc: {accuracy_score*100:>5.2f}% | Val Loss: {val_loss:.2f} | Val Acc: {val_acc*100:.2f}%')
    
    def test(self, data):
        # EVALUATES MODEL ON THE TEST SPLIT AND RETURNS ACCURACY
        self.eval()
        #out = self(data.x)
        out = self(data.x, adjacency) # NOTE: REALLY CONFUSING - THIS IS A "GLOBAL" VARIABLE IN HIS CODE - IT'S THE ACTUAL adjacency created in an earlier cell
        accuracy_score = accuracy(out.argmax(dim=1)[data.test_mask], data.y[data.test_mask])
        return accuracy_score
        

In [52]:
# create, train, evaluate our GNN
gnn = VanillaGNN(dataset.num_features, 16, dataset.num_classes)

print(gnn)

gnn.fit(data, epochs=100)

accuracy_score = gnn.test(data)
print(f'\nGNN test accuracy: {accuracy_score*100:.2f}%')

VanillaGNN(
  (gnn1): VanillaGNNLayer(
    (linear): Linear(in_features=1433, out_features=16, bias=False)
  )
  (gnn2): VanillaGNNLayer(
    (linear): Linear(in_features=16, out_features=7, bias=False)
  )
)
Epoch   0 | Train Loss: 2.299 | Train Acc: 14.29% | Val Loss: 2.18 | Val Acc: 15.80%
Epoch  20 | Train Loss: 0.266 | Train Acc: 94.29% | Val Loss: 1.69 | Val Acc: 68.00%
Epoch  40 | Train Loss: 0.011 | Train Acc: 100.00% | Val Loss: 2.61 | Val Acc: 70.00%
Epoch  60 | Train Loss: 0.003 | Train Acc: 100.00% | Val Loss: 2.95 | Val Acc: 68.80%
Epoch  80 | Train Loss: 0.002 | Train Acc: 100.00% | Val Loss: 2.95 | Val Acc: 68.40%
Epoch 100 | Train Loss: 0.002 | Train Acc: 100.00% | Val Loss: 2.85 | Val Acc: 69.40%

GNN test accuracy: 72.80%


## NOTE:

In book he says "do the same with Facebook dataset" but because the Vanilla GNN above has HARDCODED `adjacency` in the `fit` and `test` method you would have to copy everything out, and recreate the adjacency matrix for Facebook dataset

**very unclear that this is lurking in the code - so I make my own V2 below, which accepts the SPECIFIC ADJ MATRIX YOU WANT TO USE IN THE ARGS OF THE CLASS**:

In [59]:
class VanillaGNN_v2_WithAdjacencyMatrix(torch.nn.Module):
    
    def __init__(self, dim_in, dim_h, dim_out, adjacency_matrix):
        super().__init__()
        
        self.gnn1 = VanillaGNNLayer(dim_in, dim_h)
        self.gnn2 = VanillaGNNLayer(dim_h, dim_out)
        self.adjacency_matrix = adjacency_matrix
        
    def forward(self, x):
        # include adj info now in forward()
        h = self.gnn1(x, self.adjacency_matrix)
        h = torch.relu(h)
        h = self.gnn2(h, self.adjacency_matrix)
        
        return F.log_softmax(h, dim=1)
    
    # --------------------------
    # fit and test are the same as in the MLP class:
    # forward() reads self.adjacency_matrix, so nothing extra is passed here
    # --------------------------
    def fit(self, data, epochs):
        # this is used for the training loop
        criterion = torch.nn.CrossEntropyLoss()
        optimizer = torch.optim.Adam(self.parameters(),
                                    lr=0.01,
                                    weight_decay=5e-4,
                                    )

        self.train()
        for epoch in range(epochs+1):
            optimizer.zero_grad()
            out = self(data.x) # we now use self.adjacency_matrix in forward() so dont need to pass it locally here
            loss = criterion(out[data.train_mask], data.y[data.train_mask])
            accuracy_score = accuracy(out[data.train_mask].argmax(dim=1), data.y[data.train_mask])
            loss.backward()
            optimizer.step()

            if epoch % 20 == 0:
                # EVALUATE ON VALIDATION SPLIT
                val_loss = criterion(out[data.val_mask], data.y[data.val_mask])
                val_acc = accuracy(out[data.val_mask].argmax(dim=1), data.y[data.val_mask])
                print(f'Epoch {epoch:>3} | Train Loss: {loss:.3f} | Train Acc: {accuracy_score*100:>5.2f}% | Val Loss: {val_loss:.2f} | Val Acc: {val_acc*100:.2f}%')
    
    def test(self, data):
        # EVALUATES MODEL ON THE TEST SPLIT AND RETURNS ACCURACY
        self.eval()
        out = self(data.x) # we now use self.adjacency_matrix in forward() so dont need to pass it locally here
        accuracy_score = accuracy(out.argmax(dim=1)[data.test_mask], data.y[data.test_mask])
        return accuracy_score

Now create the adj matrix for the **Facebook dataset** following steps for the Cora dataset:

In [60]:
# build basic adj matrix, A, using inbuilt function
fb_adjacency_matrix = to_dense_adj(fb_data.edge_index)[0]

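# build A_tilde, which is A + I identity matrix
# NOTE: has_self_loops() was True for this graph, so nodes that already have a self-loop get 2 on the diagonal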
fb_adjacency_matrix += torch.eye(len(fb_adjacency_matrix))

fb_adjacency_matrix # dont worry if it looks like only 1's on diag, there are indeed 1s elsewhere just sparse

tensor([[1., 0., 0.,  ..., 0., 0., 0.],
        [0., 1., 0.,  ..., 0., 0., 0.],
        [0., 0., 1.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 1., 0., 0.],
        [0., 0., 0.,  ..., 0., 1., 0.],
        [0., 0., 0.,  ..., 0., 0., 1.]])
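
One caveat for this dataset: the Facebook graph already reported self-loops (`Graph has loops: True` earlier), so adding the identity matrix on top should give those nodes a diagonal entry of 2 rather than 1. The sketch below shows the effect on a toy 2-node matrix, together with one possible variant that forces the diagonal to exactly 1. `add_self_loops_dense` is a hypothetical helper of mine, not the book's method, and I have not tested whether it changes the accuracy.

```python
def add_self_loops_dense(adjacency):
    """Return a copy whose diagonal is exactly 1 (instead of adding I blindly)."""
    return [[1 if r == c else v for c, v in enumerate(row)]
            for r, row in enumerate(adjacency)]


# Toy matrix (assumption): node 0 already has a self-loop, node 1 does not
A = [[1, 1],
     [1, 0]]

plus_identity = [[v + (1 if r == c else 0) for c, v in enumerate(row)]
                 for r, row in enumerate(A)]

print(plus_identity)            # [[2, 1], [1, 1]]
print(add_self_loops_dense(A))  # [[1, 1], [1, 1]]
```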

Now pass this specific Facebook adj matrix to our v2 GNN class, and train / evaluate as before:

In [61]:
# create, train, evaluate our GNN
fb_gnn = VanillaGNN_v2_WithAdjacencyMatrix(fb_dataset.num_features,
                                           16,
                                           fb_dataset.num_classes,
                                           fb_adjacency_matrix,
                                          )

print(fb_gnn)

fb_gnn.fit(fb_data, epochs=100)

fb_accuracy_score = fb_gnn.test(fb_data)
print(f'\nGNN test accuracy: {fb_accuracy_score*100:.2f}%')

VanillaGNN_v2_WithAdjacencyMatrix(
  (gnn1): VanillaGNNLayer(
    (linear): Linear(in_features=128, out_features=16, bias=False)
  )
  (gnn2): VanillaGNNLayer(
    (linear): Linear(in_features=16, out_features=4, bias=False)
  )
)
Epoch   0 | Train Loss: 44.444 | Train Acc: 30.32% | Val Loss: 39.78 | Val Acc: 28.86%
Epoch  20 | Train Loss: 3.736 | Train Acc: 79.31% | Val Loss: 2.52 | Val Acc: 80.54%
Epoch  40 | Train Loss: 1.536 | Train Acc: 81.63% | Val Loss: 1.31 | Val Acc: 83.34%
Epoch  60 | Train Loss: 0.865 | Train Acc: 83.93% | Val Loss: 0.85 | Val Acc: 83.69%
Epoch  80 | Train Loss: 0.647 | Train Acc: 85.47% | Val Loss: 0.67 | Val Acc: 84.94%
Epoch 100 | Train Loss: 0.563 | Train Acc: 86.10% | Val Loss: 0.57 | Val Acc: 85.49%

GNN test accuracy: 84.89%


## Summary of results

At the start of the chapter, using a basic MLP without graph information, we got (test accuracy):

- 50.60% for Cora and
- 74.85% for Facebook

With our basic vanilla GNN we get:

- 72.80% for Cora
- 84.89% for Facebook

Even with this basic implementation, considering the neighborhood of each node gives a boost of roughly 10 points (Facebook) to 22 points (Cora) in test accuracy!

We will extend and build on this in later chapters; in the next chapter we normalize the inputs correctly and obtain a **graph convolutional network** model.