# Introduction to Graph Neural Networks

In this practical, we will be focusing on understanding graph data structures, and how conventional machine learning and deep learning approaches work with graph data. The notebook consists of two parts. 

- Part 1: Graph Data and Handling Graph Data
- Part 2: Machine Learning with Graphs

## Part 1: Graph Data and Handling Graph Data

In this section, we explore graph data structures and how to load and handle graph data in Python. Graph data is very common in the world we live in, and graph data structures provide a powerful way to represent this information in a machine-readable form.

### Building a Graph in Python

There are off-the-shelf libraries for representing graph-like data with a graph data structure in Python. One of the most mature and popular of these is `NetworkX`. [NetworkX](https://networkx.org/) is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks. It is an open-source library, released under the 3-clause BSD license, that can be used to create graph structures and obtain useful information from them.

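#### Installation

If `NetworkX` is not already installed, it can be installed with `pip install networkx`. We then import it: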

In [None]:
import networkx as nx

#### Constructing Graphs

One can instantiate graphs in `NetworkX` using the `nx.Graph` class. `NetworkX` supports both *directed* and *undirected* graphs: `nx.Graph` represents undirected graphs, while `nx.DiGraph` represents directed ones.
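To make the directed/undirected distinction concrete, the following library-independent sketch builds adjacency matrices from an edge list with NumPy. The helper `adjacency_from_edges` and the toy graph are hypothetical and only illustrate the idea; `NetworkX` handles this bookkeeping internally.

```python
import numpy as np

def adjacency_from_edges(node_list, edge_list, directed=False):
    """Build a dense 0/1 adjacency matrix from an edge list.

    Undirected: each edge (u, v) is stored in both directions, so the matrix is symmetric.
    Directed: only the u -> v entry is stored.
    """
    index = {node: i for i, node in enumerate(node_list)}
    adj = np.zeros((len(node_list), len(node_list)), dtype=int)
    for u, v in edge_list:
        adj[index[u], index[v]] = 1
        if not directed:
            adj[index[v], index[u]] = 1
    return adj

toy_nodes = ["A", "B", "C"]
toy_edges = [("A", "B"), ("B", "C")]

undirected = adjacency_from_edges(toy_nodes, toy_edges)
directed = adjacency_from_edges(toy_nodes, toy_edges, directed=True)

print(undirected)  # symmetric: A-B and B-C appear in both directions
print(directed)    # not symmetric: only A -> B and B -> C are set
```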

<img src="img/graph.jpg" class="wide">

Consider the graph above. We can use `nx.Graph` to instantiate an undirected, unweighted graph, and then use the `add_node` and `add_edge` methods to express its nodes and edges.

In [None]:
# Instantiate the graph
G = nx.Graph()

In [None]:
# Add nodes A, B, C, and D
G.add_node("A")
G.add_node("B")
G.add_node("C")
G.add_node("D")

#### Exercise

Now let us add the *four* undirected, unweighted edges to the graph `G`. The `add_edge` method in the [NetworkX API](https://networkx.org/documentation/stable/reference/classes/generated/networkx.Graph.add_edge.html) allows us to do this.

***Hint***: the `weight` parameter can be used to explicitly state that every edge should get a weight of `1.0`.

In [None]:
# Add the edges.
# The edge weight is fixed at 1.0 because this is an unweighted graph

# Your code here...
G.add_edge("A", "B", weight=1.0)
G.add_edge("A", "C", weight=1.0)
G.add_edge("A", "D", weight=1.0)
G.add_edge("B", "D", weight=1.0)


Once the nodes and the edges have been expressed in the graph `G` using `NetworkX`, your graph is ready. You can view the expressed graph using the `nx.draw()` function.

Use the `nx.draw()` function to display graph `G`. You can refer to the documentation found [here](https://networkx.org/documentation/stable/reference/generated/networkx.drawing.nx_pylab.draw.html#networkx.drawing.nx_pylab.draw) to fully understand how the drawing functionality works. 

***Hint***: You can set the `with_labels` parameter to `True` to also display the node labels. 

In [None]:
# Your code here...
nx.draw(G, with_labels=True)


### Attributes of a Graph

Now that the graph is available in `NetworkX`, we can use all the functionality within `NetworkX` to get relevant information about the graph. 

One can investigate the number of nodes and number of edges in the graph. 

In [None]:
print(f"Number of nodes in graph G: {G.number_of_nodes()}")
print(f"Number of edges in graph G: {G.number_of_edges()}")

One can also easily obtain useful information about the graph that is required to generate an *Adjacency Matrix* $\mathcal{A}$ and a *Degree Matrix* $\mathcal{D}$.

#### Exercise

Use the attributes of graph `G` to find out the statistics required to create the:
1. Adjacency Matrix (assign to the variable `G_Adj`)
2. Degree Matrix (assign to the variable `G_Deg`)

***Hint***: You may refer to the `Graph` object documentation found [here](https://networkx.org/documentation/stable/reference/classes/graph.html) to identify the attributes for the relevant statistics. 

The matrix itself does not need to be obtained here. Gathering the statistics required to compute the respective matrices is sufficient. 
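As a sketch of how these statistics turn into the matrices, the cell below computes both by hand with NumPy for the four-edge graph `G` built above (rather than with `NetworkX`). Each undirected edge sets two symmetric entries of the adjacency matrix, and the degree matrix holds each node's degree on its diagonal, which equals that node's row sum in the adjacency matrix.

```python
import numpy as np

# Nodes and edges of the example graph G built above (undirected, unweighted)
node_order = ["A", "B", "C", "D"]
edge_list = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "D")]

idx = {node: i for i, node in enumerate(node_order)}
adj_matrix = np.zeros((len(node_order), len(node_order)))
for u, v in edge_list:
    adj_matrix[idx[u], idx[v]] = 1.0
    adj_matrix[idx[v], idx[u]] = 1.0  # undirected: mirror every edge

# A node's degree is its number of incident edges, i.e. its row sum in the adjacency matrix
deg_matrix = np.diag(adj_matrix.sum(axis=1))

print(adj_matrix)
print(deg_matrix)  # diagonal: A=3, B=2, C=1, D=2
```

`NetworkX` can also return the adjacency matrix directly, for example via `nx.to_numpy_array(G)`.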

In [None]:
# Find the adjacency matrix of graph G

# Your code here...
G_Adj = G.adj

print("The adjacency information of graph G:")
print(G_Adj)
print("\n")

# Find the degree statistics of graph G

# Your code here...
G_Deg = G.degree

print("The degree information of graph G:")
print(G_Deg)

## CORA Dataset

The CORA dataset is a citation network dataset that is available publicly from the [University of California, Santa Cruz](https://linqs-data.soe.ucsc.edu/public/lbc/cora.tgz). The dataset contains a collection of Computer Science research papers and provides the paper attributes, their labels, and the citation graph. The full dataset contains 2,708 research papers (nodes in the graph) and 5,429 citation relationships (edges).

The dataset is located in the `data/cora` directory and contains 3 main files:
1. `README.txt`: Provides an overall description of the CORA dataset.
2. `cora.content`: Contains attributes and the label specific to every paper in the dataset. 
3. `cora.cites`: Contains the data that is required to generate the citation relationships.

### `cora.content` File

This file contains the attributes of each paper, along with its target label. The attributes are derived from the words that occur in the papers: after removing stopwords and stemming the word tokens, tokens that occurred fewer than 10 times were removed. This results in a vocabulary of 1,433 unique word tokens. Each paper is described by a binary (0/1) vector over this vocabulary that indicates whether each word is present in the paper.

The `.content` file contains descriptions of the papers in the following format (where `\t` is a tab):
- `<paper_id>\t<word_attribute_1>\t...\t<word_attribute_1433>\t<class_label>`
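To see how a line in this format maps onto the columns defined below, here is a minimal parsing sketch for one shortened, made-up line (three word attributes instead of 1,433; the paper id is hypothetical). In the notebook, `pd.read_csv(..., sep="\t")` does this for every line at once.

```python
# A shortened, made-up line in the cora.content layout (3 word attributes instead of 1,433)
line = "12345\t0\t1\t0\tNeural_Networks"

fields = line.split("\t")
paper_id = int(fields[0])                          # first column: paper id
word_attributes = [int(v) for v in fields[1:-1]]   # middle columns: 0/1 word indicators
class_label = fields[-1]                           # last column: topic label

print(paper_id, word_attributes, class_label)
```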

We load this file first.

In [None]:
# Import useful libraries
import pandas as pd
from sklearn.utils import shuffle

In [None]:
PAPER_ID_COL = "paper_id"
WORD_TOKEN_COLS = ["word_{}".format(i) for i in range(1433)]
LABEL_COL = "label"

CONTENT_COLS = [PAPER_ID_COL] + WORD_TOKEN_COLS + [LABEL_COL]

def read_cora_content(filepath):
    """ 
    Reads cora.content file and puts the data into a pandas.DataFrame after 
    shuffling the observations
    """
    data = shuffle(pd.read_csv(filepath, sep="\t", names=CONTENT_COLS), random_state=42)
    data.reset_index(drop=True, inplace=True)
    
    return data

In [None]:
content_df = read_cora_content("data/cora/cora.content")
content_df

### `cora.cites` File

This file contains the citation graph of the corpus. The cited and citing papers are connected with an edge.

The `.cites` file contains lines where each line describes a citation link in the following format:
- `<paper_id of cited paper>\t<paper_id of citing paper>`

Second, we load the `cora.cites` file. 

In [None]:
CITES_COLS = ["cited_paper", "citing_paper"]

def read_cora_cites(filepath):   
    """ Reads cora.cites file and puts the data into a pandas.DataFrame
    """
    data = pd.read_csv(filepath, sep="\t", names=CITES_COLS)
    
    return data

In [None]:
cites_df = read_cora_cites("data/cora/cora.cites")
cites_df

### Allocating Training, Validation, and Test sets

In the original [GCN paper](https://arxiv.org/abs/1609.02907), only 20 labelled examples per class are used for training, with 500 nodes used for validation and 1,000 nodes for testing. In this tutorial, we instead use a random, stratified [hold-out validation](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) split into training, validation, and test datasets.

The `get_train_val_test_datasets` function below takes the dataset and splits it into 3 distinct sets of observations that can be used for training, validation, and testing.

In [None]:
from sklearn.model_selection import train_test_split

def get_train_val_test_datasets(df, test_frac=.3, val_frac=.3):
    """ 
    Splits the full dataset into 3 mutually exclusive sets of observations for training, validation
    and testing. The splitting is also stratified, meaning that the label distribution is
    preserved across the three datasets.
    
    Params: 
        df (pandas.DataFrame): DataFrame containing the attributes of research papers
        test_frac (float): The fraction of full dataset that should be allocated for testing
        val_frac (float): The fraction of the training dataset that should be allocated for validation
        
    Returns:
        train (pandas.DataFrame): DataFrame containing the training observations
        validation (pandas.DataFrame): DataFrame containing the validation observations
        test (pandas.DataFrame): DataFrame containing the test observations
    """
    # split the full dataset to train and test
    train, test = train_test_split(df, 
                                   test_size=test_frac, 
                                   random_state=42, 
                                   stratify=df[LABEL_COL])  # stratify on the df argument, not a global
    
    # further split the training dataset to train and validation
    train, validation = train_test_split(train, 
                                         test_size=val_frac, 
                                         random_state=42, 
                                         stratify=train[LABEL_COL])
    
    return train, validation, test

We use 70% of the full dataset for training and validation and the remaining 30% for testing. The 70% portion is further split into training and validation sets in a 70:30 ratio, so roughly 49% of the papers are used for training, 21% for validation, and 30% for testing.
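These proportions follow from multiplying the two fractions. The small helper below is hypothetical (not part of the notebook) and computes them; the exact counts returned by `train_test_split` can differ by one from the rounded values printed here because of how it rounds split sizes.

```python
def split_fractions(test_frac=0.3, val_frac=0.3):
    """Share of the full dataset that ends up in each split.

    test_frac is taken from the full dataset; val_frac is taken from the remainder.
    """
    train_val = 1.0 - test_frac
    return train_val * (1.0 - val_frac), train_val * val_frac, test_frac

train_f, val_f, test_f = split_fractions()
print(f"train={train_f:.2f}, validation={val_f:.2f}, test={test_f:.2f}")
# train=0.49, validation=0.21, test=0.30

# Approximate node counts for the 2,708 CORA papers
print([round(f * 2708) for f in (train_f, val_f, test_f)])
```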

In [None]:
train_data, validation_data, test_data = get_train_val_test_datasets(content_df, 
                                                                     test_frac=.3, 
                                                                     val_frac=.3) 

We can empirically evaluate the effect of stratification by looking at the label distribution across the three datasets. 

We can start by displaying the label distribution on training data.

In [None]:
print("The label distribution in the training data is as follows:")
display(train_data[LABEL_COL].value_counts() / len(train_data))

#### Exercise

Check the label distribution in the `validation_data` and `test_data` programmatically to make sure the label distributions are similar.

In [None]:
# Validation data
print("The label distribution in the validation data is as follows:")

# Your code here...
display(validation_data[LABEL_COL].value_counts()/len(validation_data))


In [None]:
# Test data
print("The label distribution in the test data is as follows:")

# Your code here...
display(test_data[LABEL_COL].value_counts()/len(test_data))


### Preparing Data for Machine Learning

Preparing data for training a machine learning algorithm often involves more than loading and cleaning the source data. The raw data may not be in a numerical format that the machine learning libraries we intend to use can accept.

In such scenarios, additional preparations need to be done. In the following section, we carry out a few such steps.

#### Encoding Labels

The task at hand is a supervised learning task in which the topic of each paper has to be predicted from the available features. The label in our task, the topic, is a categorical variable and needs to be converted into a numerical representation.

Different approaches exist to [vectorise a categorical variable](https://scikit-learn.org/stable/modules/preprocessing.html#encoding-categorical-features). Because the topics have no natural ordering, _one-hot encoding_ is a suitable approach here: each label becomes a vector with a 1 in the position of its class and 0 everywhere else.
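For intuition, here is a minimal NumPy sketch of what one-hot encoding produces, using a small made-up sample of CORA topic names. Like `OneHotEncoder` with its default `categories='auto'`, `np.unique` sorts the categories, so the column order should be alphabetical in both cases.

```python
import numpy as np

labels = ["Theory", "Neural_Networks", "Theory", "Rule_Learning"]

# np.unique returns the sorted distinct labels and, for every observation,
# the index of its label in that sorted array
categories, codes = np.unique(labels, return_inverse=True)

one_hot = np.zeros((len(labels), len(categories)), dtype=np.float32)
one_hot[np.arange(len(labels)), codes] = 1.0  # a single 1 per row, in the label's column

print(categories)  # ['Neural_Networks' 'Rule_Learning' 'Theory']
print(one_hot)
```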

#### Exercise

Use `sklearn.preprocessing.OneHotEncoder` found [here](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html#sklearn-preprocessing-onehotencoder) to convert the label into a one-hot encoding. 

Implement your logic in the `to_one_hot_encoding` function below.

***Hint***: The attributes available in the `OneHotEncoder` class allow you to obtain the label mapping.

In [None]:
# Import relevant libraries

# Your code here...
from sklearn.preprocessing import OneHotEncoder


In [None]:
# Implement the function below
def to_one_hot_encoding(label_col):
    """ 
    Returns a one hot encoding of the labels with the index-label mapping
    
    Params:
        label_col (pandas.Series): a pandas series with target label (str) for each observation
        
    Returns:
        label_mapping (np.ndarray): An array of class labels (str) where the index of each label
                                    corresponds to its position in the one-hot vector
        labels (scipy.sparse matrix): A 2-d sparse matrix containing the one-hot encoded labels
    """
    # Your code here...
    y = [[_y] for _y in label_col]
    label_encoder = OneHotEncoder()
    labels = label_encoder.fit_transform(y)
    label_mapping = label_encoder.categories_[0]
    return label_mapping, labels


We use the `to_one_hot_encoding` function to obtain the encoded labels and the label mapping.

In [None]:
label_index, one_y = to_one_hot_encoding(content_df[LABEL_COL])

In [None]:
print(label_index)

#### Vectorising Node Features

We must also extract each node's features from the data. Fortunately, the CORA dataset already provides numerical (binary) word features, so no further encoding is needed.

In [None]:
X = content_df[WORD_TOKEN_COLS]
X

We also calculate a few other statistics which might be useful. 

In [None]:
nodes = content_df["paper_id"]
n_obs = len(nodes)
n_feat = len(WORD_TOKEN_COLS)
n_labels = content_df[LABEL_COL].nunique()

print(f"There are {n_obs} nodes, each with {n_feat} features and {n_labels} total classes.")

## Part 2: Machine Learning with Graphs

In this section, we focus on developing machine learning models with the CORA dataset. We will build two kinds of neural networks to predict the topic label of the papers in the CORA dataset.

As a first attempt, we will build feed-forward neural networks (FNNs), conventional neural networks that only consider the node features when predicting the label.

Next, we will build a graph convolutional network (GCN), which also leverages the citation graph.

For the GCN, we use `spektral`, a Python library with a Keras-like interface for GNN development that uses `tensorflow` as its deep learning backend. To make sure the code runs successfully in `tensorflow`, we need to create a few data structures that allow us to use our dataset with `tensorflow`.

## Transforming Data for the `tensorflow.keras` Interface

Before we start building neural network models, further transformations are necessary for the data structures we use to be compliant with the Python ML libraries we are using.

### Masks

Masks allow us to use the entire dataset as one matrix, while allowing us to specifically select subsets of data during runtime to work with.

A common use of masks is to select the training, validation, and test observations from the full dataset without having to create multiple separate datasets up front.

In the following cell, we create three boolean masks that will be used for training, validating, and testing the models we build in `tensorflow`. As you will see in subsequent steps, these masks can be passed as the `sample_weight` parameter in `tensorflow`, so that only the selected nodes contribute to the loss.
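The effect of a boolean mask used as a sample weight can be sketched in NumPy with made-up per-node losses: nodes outside the mask get weight 0 and drop out of the loss. This is an illustration of the idea only; the exact normalisation Keras applies to the weighted loss may differ. Passing `weighted_metrics=['acc']` to `compile`, as done below, applies the same weights to the accuracy metric.

```python
import numpy as np

n_toy_nodes = 6
toy_train_idx = [0, 2, 3]

# Boolean mask: True for the nodes whose loss should count during training
toy_train_mask = np.zeros((n_toy_nodes,), dtype=bool)
toy_train_mask[toy_train_idx] = True

# Made-up per-node losses (e.g. the cross-entropy of each node's prediction)
per_node_loss = np.array([0.2, 1.5, 0.4, 0.3, 2.0, 0.9])

# Used as sample weights, the mask zeroes out every node outside the training set
weighted_loss = per_node_loss * toy_train_mask
print(weighted_loss)

# Average over the training nodes only (Keras may normalise the weighted sum differently)
print(weighted_loss.sum() / toy_train_mask.sum())
```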

In [None]:
import numpy as np

# Set masks 
train_mask = np.zeros((n_obs,), dtype=bool)
train_mask[train_data.index] = True

validation_mask = np.zeros((n_obs,), dtype=bool)
validation_mask[validation_data.index] = True

test_mask = np.zeros((n_obs,), dtype=bool)
test_mask[test_data.index] = True

### Compatible Data Types

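Consistent data types are also important in `tensorflow`, where mismatched types are a common source of runtime errors. We convert both `X` (node features) and `one_y` (labels) to the `float32` data type. Because `OneHotEncoder` returns a sparse matrix, `one_y` is first converted to a dense array with `.toarray()`.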

In [None]:
X = X.to_numpy().astype("float32")
one_y = one_y.toarray().astype("float32")

## Feed-forward Network to Predict Paper Topic

As the first step, we will build a feed-forward neural network, one of the most fundamental neural network architectures. This model does not take any graph-related data into consideration: it learns to predict the label solely from the features associated with each node (here, the 1,433 binary word attributes of each paper).

### Parts of a Neural Network

A neural network consists of many different building blocks that can be assembled together in order to create a model. These blocks can be various components such as *hidden layers*, *activations*, *regularisers*, and other types of blocks.

### Example: Single Layer Neural Network

As the first example, we build a *single layer neural network*, which consists of ***one*** `Dense` layer that maps the input features directly to the labels, with no hidden layers in between. The following figure illustrates the network architecture.

<img src="img/shallow_net.jpg" class="wide">

The network takes the features as input (`X` in the figure) and learns a single layer of weights followed by a `softmax` activation, which turns the outputs into a probability distribution over the topic classes.

The network therefore consists of one `Dense` layer with an activation function. Implementing it in `tensorflow` requires:
1. the [`Input` interface](https://www.tensorflow.org/api_docs/python/tf/keras/Input), that defines the input to the network (features)
2. a [Dense layer](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense), where the parameters are learned

This `Dense` layer should also be equipped with [`L2` regularisation](https://www.tensorflow.org/api_docs/python/tf/keras/regularizers/L2) to combat overfitting.
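The forward pass of this single layer is a matrix product plus bias, followed by a softmax. The NumPy sketch below uses random toy data (not the CORA features) to show it, together with the penalty that an L2 kernel regulariser adds to the loss: for `l2(l2_reg)`, Keras adds `l2_reg` times the sum of the squared weights.

```python
import numpy as np

rng = np.random.default_rng(0)

n_toy, n_toy_feat, n_toy_classes = 4, 5, 3
X_toy = rng.integers(0, 2, size=(n_toy, n_toy_feat)).astype(np.float32)  # binary toy features
W_toy = rng.normal(scale=0.1, size=(n_toy_feat, n_toy_classes))  # layer weights (the "kernel")
b_toy = np.zeros(n_toy_classes)                                   # layer bias

def softmax(z):
    # Subtracting the row maximum avoids overflow and does not change the result
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

probs = softmax(X_toy @ W_toy + b_toy)  # one probability distribution per node
print(probs.sum(axis=1))                # every row sums to 1

toy_l2_reg = 5e-4
l2_penalty = toy_l2_reg * np.sum(W_toy ** 2)  # term an L2 kernel regulariser adds to the loss
print(l2_penalty)
```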

We also use the [`EarlyStopping` approach](https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/EarlyStopping) to combat overfitting at the level of the whole training run, by stopping training once the validation loss stops improving.

As this is the final `Dense` layer that maps the values to the target labels (`y`), it is called the ***Output Layer***. 

We start by importing the relevant classes and setting up hyperparameters.

In [None]:
from tensorflow.keras.layers import Input, Dense, Dropout # model components

# Import regularisers
from tensorflow.keras.regularizers import l2 
from tensorflow.keras.callbacks import EarlyStopping 

from tensorflow.keras.models import Model # to store the entire model

In [None]:
l2_reg = 5e-4

#### Architecture Definition

As per the figure above, we define the model architecture.

In [None]:
# Input interface with 1433 features
X_in = Input(shape=(n_feat, ))

# Output layer with softmax activation and L2 regularisation on the weights
layer_1 = Dense(
    n_labels,
    activation='softmax',
    kernel_regularizer=l2(l2_reg)
)(X_in)

# Instantiate the model
model_1_layer = Model(inputs=X_in,
                      outputs=layer_1)

# Validate if the architecture is sound
model_1_layer.summary()

#### Model Training

Once the architecture is defined successfully, we need to define the optimisation algorithm we want to use in order to learn the parameters. 

We use [`Adam`](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Adam) for this, an extension of stochastic gradient descent that adapts the step size of each parameter using running averages of the gradients and of their squares. We use [categorical cross entropy](https://en.wikipedia.org/wiki/Cross_entropy) as the loss function.
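To illustrate what the optimiser does, here is a minimal NumPy sketch of the Adam update rule as described in the original Adam paper, applied to the toy function f(theta) = theta². The default `beta1`, `beta2`, and `eps` values mirror the Keras `Adam` defaults; the Keras implementation may differ in small numerical details, such as exactly where `eps` enters.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-2, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update of parameters theta given gradient grad at step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad        # running mean of the gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of the squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for the zero initialisation
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimise f(theta) = theta^2 (gradient 2 * theta), starting from theta = 1
theta = np.array([1.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 301):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=1e-1)

print(theta)  # close to the minimum at 0
```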

In [None]:
from tensorflow.keras.optimizers import Adam # import optimisation algorithm

In [None]:
# Setting up hyperparameters for the model
learning_rate = 1e-2 # learning rate for the optimiser
epochs = 200 # maximum no. of training epochs
es_patience = 10 # no. of epochs without validation-loss improvement before stopping

In [None]:
optimizer = Adam(learning_rate=learning_rate)

model_1_layer.compile(
    optimizer=optimizer,
    loss='categorical_crossentropy',
    weighted_metrics=['acc']
)

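Before training the model, we also prepare the validation data, which is used to monitor how well the model generalises to nodes it is not trained on. [Early stopping](https://en.wikipedia.org/wiki/Early_stopping) uses the validation loss to stop training once it stops improving, and `restore_best_weights=True` restores the weights from the best epoch. We train in full batches (`batch_size=n_obs`), so every step sees all nodes while the masks decide which of them count.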
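The patience logic can be sketched in plain Python. This is a simplified model of what `EarlyStopping` does with its default monitored quantity, `val_loss`, not its actual implementation, and the loss values are made up.

```python
def early_stopping_epoch(val_losses, patience=10, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a sequence of per-epoch validation losses.

    Training stops once `patience` consecutive epochs pass without the loss
    improving on the best value seen so far by more than `min_delta`.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1

# Validation loss improves for 4 epochs, then gets worse
toy_val_losses = [1.0, 0.8, 0.6, 0.5, 0.55, 0.6, 0.65, 0.7]
print(early_stopping_epoch(toy_val_losses, patience=3))  # (3, 6)
```

With `restore_best_weights=True`, the weights from the best epoch (epoch 3 here) would be kept.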

In [None]:
# Prepare validation data
validation_data_fnn = (X, one_y, validation_mask)

# Train model
model_1_layer.fit(
    X,
    one_y,
    sample_weight=train_mask,
    epochs=epochs,
    batch_size=n_obs,
    validation_data=validation_data_fnn,
    shuffle=False,
    callbacks=[
        EarlyStopping(patience=es_patience,  restore_best_weights=True)
    ]
)

The trained model can then be evaluated on the test data to measure its predictive performance.

In [None]:
from sklearn.metrics import classification_report

X_test = X[test_mask]
y_test = one_y[test_mask]

y_test_pred = model_1_layer.predict(X_test)
report = classification_report(
    np.argmax(y_test,axis=1),
    np.argmax(y_test_pred,axis=1), 
    target_names=label_index.tolist()
)

print(f'Simple Neural Network Classification Report: \n {report}')

### Deep Neural Network (Another Feed-forward Neural Network)

Now that we are familiar with a single layer neural network and its implementation in `tensorflow`, we build a deeper architecture that can learn a more complex model. The proposed architecture has multiple layers and looks like this.

<img src="img/deep_net.jpg" class="wide">

#### Exercise 

Implement the code to build the above model architecture using `tensorflow`. 

You will need to use the following classes:
1. [`Input` interface](https://www.tensorflow.org/api_docs/python/tf/keras/Input)
2. [`Dense` layer](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense)
3. [`Dropout` regularisation](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dropout)

Build a model (`Model` class type) named `model_fnn` based on these components. 
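A quick way to sanity-check an architecture is to count its trainable parameters and compare the total with the output of `model.summary()`. The sketch below assumes the hidden-layer widths 128 and 256 used in the solution cell that follows, and the 7 topic classes of CORA; `dense_params` is a hypothetical helper.

```python
def dense_params(n_in, n_out, use_bias=True):
    """Trainable parameters of a Dense layer: an n_in x n_out weight matrix plus an optional bias."""
    return n_in * n_out + (n_out if use_bias else 0)

cora_features, cora_classes = 1433, 7  # CORA: 1,433 word features, 7 topic classes

single_layer_params = dense_params(cora_features, cora_classes)
deep_params = (
    dense_params(cora_features, 128)   # first hidden layer
    + dense_params(128, 256)           # second hidden layer
    + dense_params(256, cora_classes)  # output layer (Dropout adds no parameters)
)
print(single_layer_params, deep_params)  # 10038 218375
```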

In [None]:
# Setting up hyperparameters for the model
es_patience = 10
l2_reg = 5e-4
learning_rate = 1e-2
epochs = 200
dropout = 0.5

In [None]:
# Implement the model architecture

# Your code here...
X_in = Input(shape=(n_feat, ))

layer_h1 = Dense(
    128,
    activation='relu',
    kernel_regularizer=l2(l2_reg)
)(X_in)

dropout_1 = Dropout(dropout)(layer_h1)
layer_h2 = Dense(
    256, 
    activation='relu'
)(dropout_1)

dropout_2 = Dropout(dropout)(layer_h2)

layer_output = Dense(
    n_labels,
    activation='softmax'
)(dropout_2)

model_fnn = Model(
    inputs=X_in,
    outputs=layer_output
)

model_fnn.summary()

In [None]:
optimizer = Adam(learning_rate=learning_rate)

model_fnn.compile(
    optimizer=optimizer,
    loss='categorical_crossentropy',
    weighted_metrics=['acc']
)

In [None]:
# Create validation dataset
validation_data_fnn = (X, one_y, validation_mask)

Use the `.fit` method with the relevant parameters to train the model, using early stopping to end training once the validation loss stops improving.

***Hint***: You may draw from the previous example, where the `.fit` method was used to train the single layer network.

In [None]:
# Train model

# Your code here...
model_fnn.fit(
    X,
    one_y,
    sample_weight=train_mask,
    epochs=epochs,
    batch_size=n_obs,
    validation_data=validation_data_fnn,
    shuffle=False,
    callbacks=[
        EarlyStopping(patience=es_patience,  restore_best_weights=True)
    ]
)


In [None]:
from sklearn.metrics import classification_report

In [None]:
# Evaluate model
X_test = X[test_mask]
y_test = one_y[test_mask]

y_test_pred = model_fnn.predict(X_test)
report = classification_report(
    np.argmax(y_test,axis=1),
    np.argmax(y_test_pred,axis=1), 
    target_names=label_index.tolist()
)

print(f'FNN Classification Report: \n {report}')

## Creating the Adjacency Matrix

To build a graph neural network, we need the adjacency matrix of the citation graph, which we can construct from the data loaded in Part 1. Although citations have a direction, we treat the graph as *undirected* to keep the implementation simple.
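Dropping directions has a side effect worth knowing: if two papers cite each other, the two citation lines collapse into a single undirected edge, so the undirected graph can have fewer edges than there are lines in `cora.cites`. `nx.Graph` behaves the same way, since adding an existing edge again only updates its attributes. A minimal sketch with hypothetical paper ids:

```python
# Hypothetical directed citation pairs (cited_paper, citing_paper); papers 1 and 2 cite each other
citations = [(1, 2), (2, 1), (1, 3), (4, 3)]

# An undirected edge ignores direction, so store each pair as an unordered frozenset
undirected_edges = {frozenset(pair) for pair in citations}

print(len(citations), len(undirected_edges))  # 4 3
```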

Let us now use `NetworkX` to create the graph representation of the CORA dataset. 

In [None]:
import networkx as nx

cora_graph = nx.Graph()

#### Exercise

Use the `content_df` and `cites_df` DataFrames that were created in part 1 to add nodes and edges to `cora_graph`. 

***Hint***: You may use the `add_node()` and `add_edge()` methods from part 1 to add nodes and edges.

In [None]:
# Add all the nodes to the dataset

# Your code here...
for node in nodes:
    cora_graph.add_node(node)


In [None]:
# Add all the edges

# Your code here...
for _, edge in cites_df.iterrows():
    cora_graph.add_edge(
        edge["cited_paper"],
        edge["citing_paper"],
        weight=1.0
    )


Now we can visualise the CORA graph (this may take around 30 seconds).

In [None]:
display_options = {
    'node_size': 30,
    'width': 0.2,
    'with_labels': False
}

nx.draw(cora_graph, **display_options)

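Use the [`adjacency_matrix()`](https://networkx.org/documentation/stable/reference/generated/networkx.linalg.graphmatrix.adjacency_matrix.html) utility function available in `NetworkX` to create the sparse matrix that represents the adjacency matrix of graph `cora_graph`. By default, its rows and columns follow the order of `cora_graph.nodes`, i.e. the order in which the nodes were added (the row order of `content_df`), so row *i* of the adjacency matrix corresponds to row *i* of `X`.

Assign this to a variable called `A`.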

In [None]:
# Retrieve the adjacency matrix A

# Your code here...
# nodelist fixes the row/column order to match the rows of X
A = nx.adjacency_matrix(cora_graph, nodelist=list(nodes))


In [None]:
print(A)

## Graph Convolutional Neural Network with `spektral`

As the final step, we build a graph neural network. In this exercise, we build a specific type of graph neural network made of graph convolutional layers, as proposed by Kipf and Welling in their [research paper](https://arxiv.org/abs/1609.02907) on Graph Convolutional Networks (GCNs).

We use the library [`spektral`](https://graphneural.network/) to implement this network due to its compatibility with `tensorflow` and user friendliness. 

### Architecture

The proposed architecture for the CORA dataset is illustrated in the figure below:

<img src="img/graph_net.jpg" class="wide">

There are a few differences here in comparison to the other networks we saw earlier. 
1. The presence of the adjacency matrix (orange box titled `Adj`)
2. The presence of `GCN Conv` layers instead of `Dense` hidden layers

As this is a graph neural network architecture, it exploits the graph structure of the data. The [`GCNConv` layer](https://graphneural.network/layers/convolution/#gcnconv) does this by taking the (preprocessed) adjacency matrix of the graph as an additional input alongside the node features.
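In the GCN paper, each layer computes $H^{(l+1)} = \sigma(\hat{A} H^{(l)} W^{(l)})$, where $\hat{A}$ is the normalised adjacency matrix, $H^{(l)}$ holds the node embeddings, and $W^{(l)}$ is the layer's weight matrix. The NumPy sketch below (a hypothetical helper, not spektral's implementation) applies this rule to a 3-node path graph. For readability it uses a simple row normalisation of $A + I$, i.e. an average over each node and its neighbours, instead of the symmetric normalisation that `GCNConv` expects.

```python
import numpy as np

def gcn_layer(A_hat, H, W, activation=None):
    """One graph-convolution step: H_next = activation(A_hat @ H @ W)."""
    Z = A_hat @ H @ W  # aggregate over each node's neighbourhood, then apply shared weights
    return Z if activation is None else activation(Z)

# Path graph 0 - 1 - 2 with self-loops added (A + I)
A_self = np.array([[1.0, 1.0, 0.0],
                   [1.0, 1.0, 1.0],
                   [0.0, 1.0, 1.0]])
# Simple row normalisation (mean over each node and its neighbours); GCNConv itself
# expects the symmetric normalisation produced by gcn_filter
A_mean = A_self / A_self.sum(axis=1, keepdims=True)

H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])  # 2 input features per node
W = np.eye(2)               # identity weights so the aggregation is easy to see

H_next = gcn_layer(A_mean, H, W)
print(H_next)  # each row mixes the features of the node and its neighbours
```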

#### Exercise

Implement the architecture outlined above using `spektral` and `tensorflow`. The architecture is inspired by this [web post](https://towardsdatascience.com/graph-convolutional-networks-on-node-classification-2b6bbec1d042), which also demonstrates how `spektral` and `tensorflow` can operate together. 

***Hints***: 
- You may use `Input`, `Dropout`, and `Model` components from `tensorflow` library
- The `GCNConv` layer can be imported from the `spektral` library where extensive [documentation](https://graphneural.network/layers/convolution/#gcnconv) relating to this layer is available.
- As per the [documentation](https://graphneural.network/layers/convolution/#gcnconv), the original adjacency matrix cannot be used as is. `GCNConv` expects what the documentation calls a modified Laplacian: the adjacency matrix with self-loops added and symmetrically normalised by node degree. Use the instructions in the documentation to carry out this procedure.
- Set the `use_bias` parameter to `False` in both `GCNConv` layers
- The `GCNConv` layer instance needs two inputs as per the figure above.
    - Embedding from the previous layer
    - Transformed adjacency matrix 
- You can pass the two inputs to the `GCNConv` layer as a list of the two objects.
    - e.g.:
    ```
    gcn_layer = GCNConv()
    gcn_embedding = gcn_layer([prev_layer_output, trans_adj_matrix])  
    ```

***Step 1***: Preprocess the adjacency matrix `A` as per the instructions in the [API documentation](https://graphneural.network/layers/convolution/#gcnconv) to create a new variable `_Adj`
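The preprocessing that `gcn_filter` performs is the renormalisation from the GCN paper, $\tilde{D}^{-1/2}(A + I)\tilde{D}^{-1/2}$, where $\tilde{D}$ is the degree matrix of $A + I$. The dense NumPy sketch below (a hypothetical helper meant for small graphs; `gcn_filter` works on the sparse matrix `A` directly) makes the computation explicit for a 3-node path graph.

```python
import numpy as np

def normalised_adjacency(A):
    """Symmetric GCN normalisation with self-loops: D^-1/2 (A + I) D^-1/2.

    D is the diagonal degree matrix of A + I.
    """
    A_tilde = A + np.eye(A.shape[0])              # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    # Scaling rows and columns by d^-1/2 equals multiplying by diag(d^-1/2) on both sides
    return d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]

A_path = np.array([[0.0, 1.0, 0.0],
                   [1.0, 0.0, 1.0],
                   [0.0, 1.0, 0.0]])
A_norm = normalised_adjacency(A_path)
print(A_norm)
```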

In [None]:
# Your code here...
from spektral.utils.convolution import gcn_filter

_Adj = gcn_filter(A)


We convert `_Adj` to 32-bit floating point values (`'f4'` is NumPy's type code for `float32`, matching `X`) and store the result as `Adj`.

In [None]:
Adj = _Adj.astype('f4')

In [None]:
print(Adj)

***Step 2***: Build architecture and train, calling your model `model_gnn`.

In [None]:
# GCNConv layer from spektral
from spektral.layers import GCNConv

In [None]:
# Set hyperparameters
channels = 16           # Number of channels in the first layer
dropout = 0.5           # Dropout rate for the features
l2_reg = 5e-4           # L2 regularisation rate
learning_rate = 1e-2    # Learning rate
epochs = 200            # Number of training epochs
es_patience = 10        # Patience for early stopping

In [None]:
# Define architecture
# Your code here...
X_in = Input(shape=(n_feat, )) # input features from the nodes
fltr_in = Input(shape=(n_obs, ), sparse=True) # filter defined by the graph structure

dropout_1 = Dropout(dropout)(X_in)
graph_conv_1 = GCNConv(
    channels,
    activation='relu',
    kernel_regularizer=l2(l2_reg),
    use_bias=False
)([dropout_1, fltr_in])

dropout_2 = Dropout(dropout)(graph_conv_1)
graph_conv_2 = GCNConv(
    n_labels,
    activation='softmax',
    use_bias=False
)([dropout_2, fltr_in])

model_gnn = Model(
    inputs=[X_in, fltr_in],
    outputs=graph_conv_2
)

model_gnn.summary()

In [None]:
optimizer = Adam(learning_rate=learning_rate)

model_gnn.compile(
    optimizer=optimizer,
    loss='categorical_crossentropy',
    weighted_metrics=['acc']
)

We prepare the validation data and train the model. 

In [None]:
# A new name keeps the validation_data DataFrame from Part 1 from being overwritten
validation_data_gnn = ([X, Adj], one_y, validation_mask)

In [None]:
# Train model
model_gnn.fit(
    [X, Adj],
    one_y,
    sample_weight=train_mask,
    epochs=epochs,
    batch_size=n_obs,
    validation_data=validation_data_gnn,
    shuffle=False,
    callbacks=[
        EarlyStopping(patience=es_patience,  restore_best_weights=True)
    ]
)

### Using Test Data

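Now that we have trained our graph neural network, we want to evaluate its performance on the held-out test set. This is a bit trickier with a GNN: each GCN layer aggregates information from a node's neighbours, so a test node's prediction depends on the features and connections of other nodes in the graph, including training nodes. We therefore cannot simply feed the test rows on their own, as we did for the FNNs.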

Therefore, we run a prediction on the whole dataset, including the entire adjacency matrix, before using the mask to obtain results just for the test set.

In [None]:
# Predict on all the nodes in the entire graph
y_test_pred_all = model_gnn.predict([X, Adj], batch_size=n_obs)

In [None]:
# Use the mask to filter out the true and predicted labels, and verify that they are the same dimension
y_test = one_y[test_mask]
y_test_pred = y_test_pred_all[test_mask,:]

In [None]:
y_test_pred.shape

In [None]:
y_test.shape

In [None]:
# Evaluate classification performance
report = classification_report(
    np.argmax(y_test,axis=1),
    np.argmax(y_test_pred,axis=1), 
    target_names=label_index.tolist()
)

print(f'GCN Classification Report: \n {report}')