# Evaluating Jumping Knowledge Networks on Citeseer and Cora
Here I try to replicate the evaluation of JK Networks described in [Xu et al.](https://arxiv.org/abs/1806.03536). Xu and colleagues first test GCNs and GATs on the Citeseer and Cora datasets. They also test adding Jumping Knowledge aggregation to the GCN with LSTM, max-pooling, and concatenation aggregation methods. I will also test a simple MLP as a baseline. Xu and colleagues vary the number of layers from 1 to 6 (using a hidden layer size of 16 or 32), choose the best-performing model on the validation set, and then compare the best models on the test set. When testing I will use 3 different splits and report the mean and standard deviation of test accuracy.

In [1]:
from jk_networks import utils, models
import torch
from itertools import product
from collections import defaultdict

In [2]:
# import the CiteSeer dataset
from torch_geometric.datasets import Planetoid
citeseer = Planetoid(root='/tmp/CiteSeer', name='CiteSeer')

## Multi-Layer Perceptron
I will now train a series of MLPs on the Citeseer dataset. The MLP should perform worse than the GCN model because it has no access to the graph structure, but it provides a good baseline for how well a model can do with just the bag-of-words features.
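To make the difference concrete, here is a minimal NumPy sketch contrasting one MLP layer with one GCN layer in the style of Kipf and Welling. The internals of `models.MLP` and `models.GCN` are not shown in this notebook, so the function names, the toy graph, and the identity weights below are illustrative assumptions, not the actual implementation.

```python
import numpy as np

def mlp_layer(x, w):
    """One linear layer with ReLU: each node only sees its own features."""
    return np.maximum(x @ w, 0.0)

def gcn_layer(x, w, adj):
    """One GCN-style layer: each node's features are mixed with its
    neighbours' via the symmetrically normalised adjacency with self-loops."""
    a_hat = adj + np.eye(adj.shape[0])               # add self-loops
    deg_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))  # D^{-1/2} as a vector
    a_norm = deg_inv_sqrt[:, None] * a_hat * deg_inv_sqrt[None, :]
    return np.maximum(a_norm @ x @ w, 0.0)

# Toy graph: 3 nodes in a path 0 - 1 - 2, 2 features per node (made-up values).
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
x = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
w = np.eye(2)  # identity weights so the effect of the graph is easy to see

mlp_out = mlp_layer(x, w)       # identical to x: no information from neighbours
gcn_out = gcn_layer(x, w, adj)  # node 0 now carries part of node 1's features
```

With identity weights the MLP output is just the input, while the GCN output for node 0 picks up a contribution from its neighbour, which is the extra signal the graph models have available.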

In [8]:
def get_accuracies(model_class, data):
  """Train one model per (num_layers, hidden_size) pair and record its validation accuracy."""
  layer_counts = range(1, 7)
  hidden_sizes = [16, 32]
  val_accuracies = defaultdict(dict)
  graph = data[0]
  for num_layers, hidden_size in product(layer_counts, hidden_sizes):
    model = model_class(data.num_features, [hidden_size] * num_layers, data.num_classes)
    # Same 60/20/20 split (fixed seed) for every configuration
    utils.split_data_node_classification(graph, train_ratio=0.6, val_ratio=0.2, manual_seed=42)
    utils.train(model, graph)
    val_accuracies[num_layers][hidden_size] = utils.test(model, graph, graph.val_mask)
  return val_accuracies

In [9]:
val_accuracies = get_accuracies(models.MLP, citeseer)

In [10]:
import pandas as pd
val_accuracies = pd.DataFrame(val_accuracies)
val_accuracies.index.name = 'Hidden Layer Size'
val_accuracies.columns.name = 'Number of Layers'
val_accuracies.round(3)

Validation accuracy by hidden layer size (rows) and number of layers (columns):

| Hidden Layer Size | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 16 | 0.71 | 0.683 | 0.686 | 0.638 | 0.528 | 0.561 |
| 32 | 0.716 | 0.72 | 0.701 | 0.684 | 0.531 | 0.617 |


### Results
The best result comes from the 2-layer model with 32 hidden features, with a validation accuracy of 72%. Let's retrain it 3 times on the combined train and validation sets to see what test accuracy the best model achieves.
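Picking the winner by eye works for a 2 x 6 grid, but the same selection can be done programmatically. The sketch below copies the MLP validation accuracies from the table above into a plain nested dict (the name `val_acc` and the helper `best_config` are hypothetical, not part of `jk_networks`) and returns the configuration with the highest accuracy.

```python
# Validation accuracies copied from the MLP table above:
# outer key = number of layers, inner key = hidden layer size.
val_acc = {
    1: {16: 0.710, 32: 0.716},
    2: {16: 0.683, 32: 0.720},
    3: {16: 0.686, 32: 0.701},
    4: {16: 0.638, 32: 0.684},
    5: {16: 0.528, 32: 0.531},
    6: {16: 0.561, 32: 0.617},
}

def best_config(accuracies):
    """Return (num_layers, hidden_size, accuracy) with the highest accuracy."""
    return max(
        ((layers, hidden, acc)
         for layers, by_hidden in accuracies.items()
         for hidden, acc in by_hidden.items()),
        key=lambda item: item[2],
    )

print(best_config(val_acc))  # (2, 32, 0.72)
```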

In [11]:
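import numpy as np
best_mlp_model = models.MLP(citeseer.num_features, [32, 32], citeseer.num_classes)

def train_best_model(model, data):
  """Train and test `model` on 3 random splits; return (mean, std) of test accuracy."""
  graph = data[0]

  accuracies = []
  for seed in range(3):
    # val_ratio=0 folds the validation nodes into the training set
    utils.split_data_node_classification(graph, train_ratio=0.8, val_ratio=0, manual_seed=seed)
    # Caveat: the same model instance is trained on every split, so unless
    # utils.train resets the parameters, later runs start from weights
    # already fitted on earlier splits rather than from scratch.
    utils.train(model, graph)
    accuracies.append(utils.test(model, graph, graph.test_mask))
  accuracies = np.array(accuracies)
  # np.std defaults to the population standard deviation (ddof=0)
  return accuracies.mean(), accuracies.std()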

best_mlp_model_acc, best_mlp_model_std = train_best_model(best_mlp_model, citeseer)

print(f'Best model accuracy: {best_mlp_model_acc:.3f} ± {best_mlp_model_std:.3f}')

Best model accuracy: 0.742 ± 0.034
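One detail worth keeping in mind when reading the ± value: `np.std` computes the population standard deviation by default (`ddof=0`), which with only 3 runs is noticeably smaller than the sample standard deviation (`ddof=1`). The sketch below shows the difference using the standard library; the three accuracies are hypothetical values chosen for illustration, not results from this notebook.

```python
import statistics

# Hypothetical test accuracies from three splits (illustrative values only).
accuracies = [0.70, 0.75, 0.78]

mean = statistics.fmean(accuracies)
population_std = statistics.pstdev(accuracies)  # divides by n, like np.std's default
sample_std = statistics.stdev(accuracies)       # divides by n - 1, like np.std(ddof=1)

print(f"mean={mean:.3f} population_std={population_std:.3f} sample_std={sample_std:.3f}")
```

For n = 3 the sample standard deviation is larger by a factor of sqrt(3/2), so which convention is used matters when comparing against numbers reported elsewhere.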


### Results Continued
The simple MLP model achieves about 74% accuracy on the test set. Let's see if we can improve this performance by using the graph structure.

## Testing GCN
Following [Xu et al.](https://arxiv.org/abs/1806.03536) I will train a series of GCNs without any Jumping Knowledge on the Citeseer dataset. I will test 12 different models, with the number of layers ranging from 1 to 6 and the number of hidden features in {16, 32}. Note that Xu and colleagues use a 60/20/20 train/validation/test split, which differs from the built-in split of Citeseer.
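For reference, a seeded random 60/20/20 node split can be sketched as follows. This is only an assumption about what a helper like `utils.split_data_node_classification` might do; its actual implementation is not shown here, and `random_split_masks` is a hypothetical name.

```python
import numpy as np

def random_split_masks(num_nodes, train_ratio=0.6, val_ratio=0.2, seed=42):
    """Return boolean train/val/test masks from a seeded random permutation."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_nodes)
    n_train = round(train_ratio * num_nodes)
    n_val = round(val_ratio * num_nodes)

    train_mask = np.zeros(num_nodes, dtype=bool)
    val_mask = np.zeros(num_nodes, dtype=bool)
    test_mask = np.zeros(num_nodes, dtype=bool)
    train_mask[perm[:n_train]] = True
    val_mask[perm[n_train:n_train + n_val]] = True
    test_mask[perm[n_train + n_val:]] = True  # whatever is left over
    return train_mask, val_mask, test_mask

train_mask, val_mask, test_mask = random_split_masks(num_nodes=1000)
```

Each node lands in exactly one mask, and fixing the seed makes the split reproducible across model configurations.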

In [12]:
val_accuracies = get_accuracies(models.GCN, citeseer)

In [13]:
val_accuracies = pd.DataFrame(val_accuracies)
val_accuracies.index.name = 'Hidden Layer Size'
val_accuracies.columns.name = 'Number of Layers'
val_accuracies.round(3)

Validation accuracy by hidden layer size (rows) and number of layers (columns):

| Hidden Layer Size | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 16 | 0.705 | 0.699 | 0.684 | 0.683 | 0.648 | 0.639 |
| 32 | 0.702 | 0.686 | 0.681 | 0.671 | 0.666 | 0.671 |


### Results
We can see that the 1-layer networks perform best: 0.705 with 16 hidden features and 0.702 with 32, which is nearly a tie. I will retrain the 1-layer network with 32 hidden features on the combined train and validation sets.

In [16]:
# Plain GCN (no Jumping Knowledge): 1 layer, 32 hidden features
best_gcn_model = models.GCN(citeseer.num_features, [32], citeseer.num_classes)
best_gcn_model_acc, best_gcn_model_std = train_best_model(best_gcn_model, citeseer)
print(f'Best model accuracy: {best_gcn_model_acc:.3f} ± {best_gcn_model_std:.3f}')

Best model accuracy: 0.743 ± 0.010


### Results Continued
As we can see, the plain GCN reaches only about 74% accuracy on the Citeseer dataset, which is no better than the MLP model. This differs from the paper, which reports a GCN accuracy of 77.3% on Citeseer. The gap could be due to some of the [preprocessing](https://github.com/pyg-team/pytorch_geometric/issues/2018) that torch_geometric applies to Citeseer, or to a different random seed.

## Jumping Knowledge GCNs
I will now test one of the models proposed in Xu et al.: the Jumping Knowledge network with GCN layers and the concat aggregation function.
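As a rough illustration of what the aggregation step does, the NumPy sketch below combines per-layer node representations with concatenation and with element-wise max pooling. The function names and the random layer outputs are assumptions for illustration; the real layers live in `models.GCN_JK_Concat`, whose internals are not shown here.

```python
import numpy as np

def jk_concat(layer_outputs):
    """Concatenate per-layer node representations along the feature axis."""
    return np.concatenate(layer_outputs, axis=1)

def jk_max(layer_outputs):
    """Element-wise max over layers; all layers must share a feature size."""
    return np.stack(layer_outputs, axis=0).max(axis=0)

# Stand-in for the outputs of 3 GCN layers on 5 nodes with 16 hidden features.
rng = np.random.default_rng(0)
num_nodes, hidden = 5, 16
layer_outputs = [rng.normal(size=(num_nodes, hidden)) for _ in range(3)]

concat_repr = jk_concat(layer_outputs)  # shape (5, 48)
max_repr = jk_max(layer_outputs)        # shape (5, 16)
```

Concatenation keeps every layer's features and grows the representation with depth, whereas max pooling keeps the feature size fixed and selects, per feature, the layer with the largest activation.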

In [17]:
from typing import List
class JK_GCN_Concat(models.GCN_JK_Concat):
  def __init__(self, starting_features: int, hidden_channels: List[int], output_features: int):
    super().__init__(starting_features, hidden_channels, output_features, agg_method='concat')

val_accuracies = get_accuracies(JK_GCN_Concat, citeseer)

In [18]:
val_accuracies = pd.DataFrame(val_accuracies)
val_accuracies.index.name = 'Hidden Layer Size'
val_accuracies.columns.name = 'Number of Layers'
val_accuracies.round(3)

Validation accuracy by hidden layer size (rows) and number of layers (columns):

| Hidden Layer Size | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 16 | 0.749 | 0.74 | 0.725 | 0.735 | 0.734 | 0.74 |
| 32 | 0.746 | 0.719 | 0.711 | 0.713 | 0.711 | 0.711 |


### Results
We can see that the 1-layer model with 16 hidden features is the best, so I will retrain it on the combined train and validation sets.

In [20]:
best_jk_gcn_concat_model = JK_GCN_Concat(citeseer.num_features, [16], citeseer.num_classes)
best_jk_gcn_model_acc, best_jk_gcn_model_std = train_best_model(best_jk_gcn_concat_model, citeseer)
print(f'Best model accuracy: {best_jk_gcn_model_acc:.3f} ± {best_jk_gcn_model_std:.3f}')

Best model accuracy: 0.771 ± 0.009


### Results Continued
As we can see, the JK-GCN network achieves about 77% accuracy on the Citeseer dataset, which is better than both the MLP model and the standard GCN model. This differs from the paper, which reports an accuracy of 78.3% for the Jumping Knowledge GCN on Citeseer. Again, this could be due to the different preprocessing.