# Evaluating Jumping Knowledge Networks on Citeseer and Cora
Here I try to replicate the evaluation of JK Networks described in [Xu et al.](https://arxiv.org/abs/1806.03536). Xu and colleagues first test GCNs and GATs on the Citeseer and Cora datasets. They then add Jumping Knowledge (JK) aggregation to the GCN, using LSTM, max-pooling, and concatenation as aggregation methods. I will also test a simple MLP as a baseline. Xu and colleagues vary the number of layers from 1 to 6 (with a hidden layer size of 16 or 32), choose the best-performing model on the validation set, and then compare the best models on the test set. When testing, I will use 3 different splits and report the mean and standard deviation of test accuracy.
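
To make the aggregation step concrete, here is a minimal pure-Python sketch of two of the JK aggregation methods, concatenation and max-pooling, applied to one node's representations from each layer. The function names `jk_concat` and `jk_max` and the example vectors are my own illustrations, not code from the paper or from `jk_networks`. The LSTM variant, which learns per-node attention weights over the layers, is omitted.

```python
from typing import List

Vector = List[float]


def jk_concat(layer_outputs: List[Vector]) -> Vector:
    """Concatenate one node's representations from every layer."""
    return [value for layer in layer_outputs for value in layer]


def jk_max(layer_outputs: List[Vector]) -> Vector:
    """Element-wise max over layers; every layer must have the same width."""
    return [max(values) for values in zip(*layer_outputs)]


# Illustrative (made-up) representations of a single node after 3 layers.
layers = [[0.1, 0.9], [0.5, 0.2], [0.3, 0.4]]

print(jk_concat(layers))  # width = num_layers * hidden_size
print(jk_max(layers))     # width = hidden_size
```

Concatenation grows the final representation with depth, while max-pooling keeps the hidden size fixed and adds no parameters.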

In [1]:
from jk_networks import utils, models

In [2]:
# import the CiteSeer dataset
from torch_geometric.datasets import Planetoid
citeseer = Planetoid(root='/tmp/CiteSeer', name='CiteSeer')

In [3]:
from itertools import product
from collections import defaultdict

def get_gcn_accuracies(data):
  """Train one GCN per (depth, width) pair and collect validation accuracies."""
  layer_counts = range(1, 7)  # 1 to 6 layers
  hidden_sizes = [16, 32]
  # val_accuracies[num_layers][hidden_size] -> validation accuracy
  val_accuracies = defaultdict(dict)
  # Loop variables use distinct names so they do not shadow the search grids.
  for num_layers, hidden_size in product(layer_counts, hidden_sizes):
    gcn_model = models.GCN(data.num_features, [hidden_size] * num_layers, data.num_classes)
    graph = data[0]
    # The split is redone for every configuration, as in the original run.
    utils.split_data_node_classification(graph, train_ratio=0.6, val_ratio=0.2)
    utils.train(gcn_model, graph)
    model_acc = utils.test(gcn_model, graph, graph.val_mask)
    val_accuracies[num_layers][hidden_size] = model_acc
  return val_accuracies

In [4]:
import pandas as pd

val_accuracies = get_gcn_accuracies(citeseer)
# Columns are the number of layers, rows are the hidden layer size.
pd.DataFrame(val_accuracies)

| Hidden size \ Layers | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 16 | 0.651128 | 0.681203 | 0.518797 | 0.386466 | 0.317293 | 0.154887 |
| 32 | 0.718797 | 0.645113 | 0.502256 | 0.324812 | 0.345865 | 0.196992 |
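
The validation accuracies above can be reduced to a single model choice, which is the selection step described at the top. The sketch below copies the numbers from the output into a plain dictionary with the same `val_accuracies[num_layers][hidden_size]` layout; the helper `best_configuration` is a hypothetical name of mine, not part of `jk_networks`.

```python
# Validation accuracies copied from the output above,
# keyed as val_accuracies[num_layers][hidden_size].
val_accuracies = {
    1: {16: 0.651128, 32: 0.718797},
    2: {16: 0.681203, 32: 0.645113},
    3: {16: 0.518797, 32: 0.502256},
    4: {16: 0.386466, 32: 0.324812},
    5: {16: 0.317293, 32: 0.345865},
    6: {16: 0.154887, 32: 0.196992},
}


def best_configuration(accuracies):
    """Return (num_layers, hidden_size, accuracy) with the highest accuracy."""
    return max(
        (
            (num_layers, hidden_size, acc)
            for num_layers, by_hidden in accuracies.items()
            for hidden_size, acc in by_hidden.items()
        ),
        key=lambda item: item[2],
    )


best_layers, best_hidden, best_acc = best_configuration(val_accuracies)
print(f"best GCN: {best_layers} layer(s), hidden size {best_hidden}, "
      f"validation accuracy {best_acc:.4f}")
```

On these numbers the selection picks the 1-layer GCN with hidden size 32 (validation accuracy 0.7188), and accuracy drops steadily as more layers are added.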
