<a href="https://colab.research.google.com/github/Brennan-Richards/Deep-Learning-R-and-D/blob/main/Graph_NN_Protein_Prediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

PURPOSE: Learn to build graph neural networks (GNNs) with the Spektral library.

CONTENTS: A walkthrough of the tutorial in the [Spektral - Getting Started documentation](https://graphneural.network/getting-started/), which builds a GNN step by step for a graph-level prediction task.

In [None]:
!pip install spektral

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting spektral
  Downloading spektral-1.2.0-py3-none-any.whl (140 kB)
Installing collected packages: spektral
Successfully installed spektral-1.2.0


In [None]:
import numpy as np

Load dataset and explore possible operations

In [None]:
from spektral.datasets import TUDataset
dataset = TUDataset('PROTEINS')

dataset

Downloading PROTEINS dataset.


100%|████████████████████████████████████████| 447k/447k [00:00<00:00, 1.02MB/s]


Successfully loaded PROTEINS.


TUDataset(n_graphs=1113)

In [None]:
# Index data set
dataset[0]

Graph(n_nodes=98, n_node_features=4, n_edge_features=None, n_labels=2)

In [None]:
# Shuffle graphs in data set
np.random.shuffle(dataset)
dataset[0]

Graph(n_nodes=23, n_node_features=4, n_edge_features=None, n_labels=2)

In [None]:
# Slice data set
part = dataset[:10]
part

TUDataset(n_graphs=10)

Dataset transformations

In [None]:
# Filter the dataset in place, keeping only graphs with fewer than 500 nodes
dataset.filter(lambda g: g.n_nodes < 500)
dataset

TUDataset(n_graphs=1111)

In [None]:
# Apply more transforms

# Compute the maximum node degree over the whole data set.
# The one-hot degree encoding covers degrees 0..max_degree, so it has max_degree + 1 entries.
max_degree = int(dataset.map(lambda g: g.a.sum(-1).max(), reduce=max))

print(max_degree)

from spektral.transforms import Degree

dataset.apply(Degree(max_degree))

12


In [None]:
# NOTE: max_degree + 1 new node features were added.
dataset[0]

Graph(n_nodes=23, n_node_features=17, n_edge_features=None, n_labels=2)
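
The jump from 4 to 17 node features can be reproduced by hand. The following is a minimal NumPy sketch of the idea behind a one-hot degree transform, not Spektral's actual implementation; the helper name `one_hot_degree` and the toy graph are made up for illustration.

```python
import numpy as np

def one_hot_degree(a, x, max_degree):
    """Append a one-hot encoding of each node's degree to the node features.

    a: (n_nodes, n_nodes) binary adjacency matrix
    x: (n_nodes, n_features) node feature matrix
    """
    degree = a.sum(-1).astype(int)            # degree of every node
    one_hot = np.eye(max_degree + 1)[degree]  # degrees 0..max_degree -> max_degree + 1 columns
    return np.concatenate([x, one_hot], axis=-1)

# Hypothetical toy graph: a path 0 - 1 - 2 with 4 features per node
a = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
x = np.zeros((3, 4))

x_new = one_hot_degree(a, x, max_degree=12)
print(x_new.shape)  # (3, 17): 4 original features + 13 degree columns
```

With `max_degree = 12` the encoding has 13 columns, which matches the 17 features reported above.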

In [None]:
# "Since we will be using a GCNConv layer in our GNN, we also want to follow the original paper (https://arxiv.org/abs/1609.02907) that introduced this layer and do some extra pre-processing of the adjacency matrix."
from spektral.transforms import GCNFilter

dataset.apply(GCNFilter())

# NOTE: "If you don't want to go back to the literature every time, every convolutional layer in Spektral has a preprocess(a) method that you can use to transform the adjacency matrix as needed."
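
For intuition, the pre-processing from the GCN paper adds self-loops to the adjacency matrix and normalizes it symmetrically by node degree: D^-1/2 (A + I) D^-1/2. Below is a minimal dense NumPy sketch of that formula. The function name `gcn_filter` and the toy graph are illustrative assumptions; Spektral's own transform may differ in details such as sparse-matrix handling.

```python
import numpy as np

def gcn_filter(a):
    """Symmetrically normalized adjacency with self-loops: D^-1/2 (A + I) D^-1/2."""
    a_hat = a + np.eye(a.shape[0])   # add self-loops
    d = a_hat.sum(-1)                # degrees including the self-loop, so always >= 1
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Scale row i by d_i^-1/2 and column j by d_j^-1/2
    return d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]

# Hypothetical toy graph: a path 0 - 1 - 2
a = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
a_norm = gcn_filter(a)
print(np.round(a_norm, 3))
```

The result stays symmetric, and each entry is divided by the square root of the product of the two endpoint degrees.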

Create train and test data sets

In [None]:
train_dataset = dataset[:int(len(dataset)*0.8)] # The first 80%
print(len(train_dataset))

888


In [None]:
test_dataset = dataset[int(len(dataset)*0.8):] # The remaining 20%
print(len(test_dataset))

223
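
The same 80/20 split can also be expressed with shuffled index arrays, which keeps the split reproducible without shuffling the dataset object itself. This is a sketch under assumptions: the seed is arbitrary and the names `idx_train` and `idx_test` are made up here.

```python
import numpy as np

n_graphs = 1111                  # size of the filtered data set above
rng = np.random.default_rng(0)   # arbitrary seed, only for reproducibility
idx = rng.permutation(n_graphs)  # shuffled graph indices

split = int(0.8 * n_graphs)      # same arithmetic as int(len(dataset)*0.8)
idx_train, idx_test = idx[:split], idx[split:]
print(len(idx_train), len(idx_test))  # 888 223
```

Spektral's own examples index a dataset with arrays like these (for instance `dataset[idx_train]`), but check the documentation for the installed version before relying on it.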


Build and train a GNN

In [None]:
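# "Since Spektral is designed as an extension of Keras, you can plug any Spektral layer into a Keras Model without modifications".
# GNN layers take more than one input (node features and adjacency matrix), so a Sequential model will not work;
# use the functional API or, as in this notebook, subclass Model.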
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Dropout
from spektral.layers import GCNConv, GlobalSumPool

In [None]:
class GNN(Model):

    def __init__(self, n_hidden, n_labels):
        super().__init__()
        self.graph_conv = GCNConv(n_hidden)
        self.pool = GlobalSumPool()
        self.dropout = Dropout(0.5)
        self.dense = Dense(n_labels, 'softmax')

    def call(self, inputs):
        out = self.graph_conv(inputs)
        out = self.dropout(out)
        out = self.pool(out)
        out = self.dense(out)

        return out
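
To see what the model computes for a single graph, here is a rough NumPy sketch of the forward pass: a graph convolution (normalized adjacency times features times weights), sum pooling over nodes, and a softmax classifier. It is a simplification under stated assumptions: biases and dropout are left out, the weights are random, and an identity matrix stands in for a real normalized adjacency.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(-1, keepdims=True)  # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum(-1, keepdims=True)

def forward(x, a_norm, w_conv, w_dense):
    """x: (n_nodes, n_features); a_norm: (n_nodes, n_nodes) pre-normalized adjacency."""
    h = a_norm @ x @ w_conv      # graph convolution: mix neighbor features, then project
    g = h.sum(axis=0)            # global sum pooling: one vector for the whole graph
    return softmax(g @ w_dense)  # class probabilities

# Hypothetical sizes: 5 nodes, 17 features, 32 hidden units, 2 labels
n_nodes, n_features, n_hidden, n_labels = 5, 17, 32, 2
x = rng.normal(size=(n_nodes, n_features))
a_norm = np.eye(n_nodes)         # placeholder for a normalized adjacency matrix
w_conv = rng.normal(size=(n_features, n_hidden)) * 0.1
w_dense = rng.normal(size=(n_hidden, n_labels)) * 0.1

probs = forward(x, a_norm, w_conv, w_dense)
print(probs.shape)  # (2,)
```

Because pooling sums over the node axis, the output size does not depend on how many nodes the graph has.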

In [None]:
model = GNN(32, dataset.n_labels)
model.compile('adam', 'categorical_crossentropy')

In [None]:
"""
Unlike regular data, like images or sequences, graphs cannot be stretched, cut, or reshaped so that we can fit them into tensors of pre-defined shapes.
If a graph has 10 nodes and another one has 4, we have to keep them that way. This means that iterating over a dataset in mini-batches is not trivial
and we cannot simply use the model.fit() method of Keras as-is.

We have to use a data Loader (read: https://graphneural.network/data-modes/).
"""

In [None]:
from spektral.data import BatchLoader

train_loader = BatchLoader(train_dataset, batch_size=32)
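
Batch mode handles graphs of different sizes by zero-padding them to the largest node count in each batch. A minimal sketch of that idea, using the 10-node and 4-node example from the note above (the helper `pad_batch` is hypothetical and not part of Spektral):

```python
import numpy as np

def pad_batch(xs, adjs):
    """Zero-pad graphs to the largest node count in the batch and stack them."""
    n_max = max(x.shape[0] for x in xs)
    n_feat = xs[0].shape[1]
    x_batch = np.zeros((len(xs), n_max, n_feat))
    a_batch = np.zeros((len(xs), n_max, n_max))
    for i, (x, a) in enumerate(zip(xs, adjs)):
        n = x.shape[0]
        x_batch[i, :n] = x       # real nodes first, padding rows stay zero
        a_batch[i, :n, :n] = a   # padded nodes have no edges
    return x_batch, a_batch

# One graph with 10 nodes and one with 4 nodes, 4 features each
xs = [np.ones((10, 4)), np.ones((4, 4))]
adjs = [np.eye(10), np.eye(4)]

x_batch, a_batch = pad_batch(xs, adjs)
print(x_batch.shape, a_batch.shape)  # (2, 10, 4) (2, 10, 10)
```

Padding costs extra memory when graph sizes vary a lot within a batch, which is one reason the data modes documentation describes alternatives.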

In [None]:
model.fit(train_loader.load(), steps_per_epoch=train_loader.steps_per_epoch, epochs=10)

Epoch 1/10
 6/28 [=====>........................] - ETA: 0s - loss: 5.4338



Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f6cf1b7b8b0>

Evaluate

In [None]:
test_loader = BatchLoader(test_dataset, batch_size=32)


In [None]:
loss = model.evaluate(test_loader.load(), steps=test_loader.steps_per_epoch)

print('Test loss: {}'.format(loss))

Test loss: 1.6037871837615967
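
To help interpret this number, here is a small NumPy sketch of categorical cross-entropy, with accuracy alongside it. The predictions and labels are made-up toy values, and Keras's own computation may differ in numerical details such as clipping.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum_k y_true[k] * log(y_pred[k])."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(-(y_true * np.log(y_pred)).sum(-1).mean())

def accuracy(y_true, y_pred):
    """Fraction of samples whose most likely class matches the label."""
    return float((y_true.argmax(-1) == y_pred.argmax(-1)).mean())

# Hypothetical one-hot labels and softmax outputs for three graphs
y_true = np.array([[1, 0], [0, 1], [1, 0]])
y_pred = np.array([[0.9, 0.1], [0.2, 0.8], [0.4, 0.6]])

toy_loss = categorical_crossentropy(y_true, y_pred)
toy_acc = accuracy(y_true, y_pred)
print(toy_loss, toy_acc)
```

For reference, predicting 0.5 for both classes on every graph gives a loss of ln 2 ≈ 0.693 under this definition.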


Moving forward

*   [Data modes](https://graphneural.network/data-modes/)
*   [Examples](https://graphneural.network/examples/)

