In [1]:
import logging

from ppnp.pytorch import PPNP
from ppnp.pytorch.training import train_model
from ppnp.pytorch.earlystopping import stopping_args
from ppnp.pytorch.propagation import PPRExact, PPRPowerIteration
from ppnp.data.io import load_dataset

In [2]:
logging.basicConfig(
        format='%(asctime)s: %(message)s',
        datefmt='%Y-%m-%d %H:%M:%S',
        level=logging.INFO)

# Load dataset

First we load the dataset we want to train on. The datasets are stored in the `SparseGraph` format: a class that holds the adjacency, attribute and label matrices in dense (`np.ndarray`) or sparse (`scipy.sparse.csr_matrix`) form, plus some convenience functions that are not strictly necessary. To use an external dataset you can, for example, convert a NetworkX graph to the `SparseGraph` format with `networkx_to_sparsegraph` from `ppnp.data.io`.

The four datasets from the paper (Cora-ML, Citeseer, PubMed and MS Academic) can be found in the directory `data`.

For this example we choose the Citeseer graph.

In [3]:
graph_name = 'citeseer'
graph = load_dataset(graph_name)
graph.standardize(select_lcc=True)



<Undirected, unweighted and connected SparseGraph with 7336 edges (no self-loops). Data: adj_matrix (2110x2110), attr_matrix (2110x3703), labels (2110), node_names (2110), class_names (6)>
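
The output above describes an undirected, unweighted, connected graph without self-loops. As a rough illustration of what such a standardization involves, here is a minimal sketch on a toy graph. It uses only NumPy and SciPy; `standardize_sketch` is a hypothetical helper written for this example and is not the implementation behind `graph.standardize`.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import connected_components


def standardize_sketch(adj, attr, labels):
    """Hypothetical helper: symmetrize, drop self-loops, binarize, keep the LCC."""
    adj = sp.csr_matrix(adj)
    adj = sp.csr_matrix(adj + adj.T)                      # make the graph undirected
    adj = sp.csr_matrix(adj - sp.diags(adj.diagonal()))   # remove self-loops
    adj.eliminate_zeros()
    adj.data = np.ones_like(adj.data)                     # make the graph unweighted
    # Keep only the nodes of the largest connected component (LCC).
    _, component = connected_components(adj, directed=False)
    keep = np.flatnonzero(component == np.bincount(component).argmax())
    return adj[keep][:, keep], attr[keep], labels[keep]


# Toy graph: a directed chain 0 -> 1 -> 2 with a self-loop on node 2,
# plus a separate weighted edge 3 -> 4.
rows = np.array([0, 1, 2, 3])
cols = np.array([1, 2, 2, 4])
vals = np.array([1, 1, 1, 2])
toy_adj = sp.csr_matrix((vals, (rows, cols)), shape=(5, 5))
toy_attr = np.eye(5)                   # one-hot node attributes
toy_labels = np.array([0, 1, 0, 1, 1])

lcc_adj, lcc_attr, lcc_labels = standardize_sketch(toy_adj, toy_attr, toy_labels)
print(lcc_adj.shape, lcc_attr.shape, lcc_labels.shape)
```

The attribute matrix and label vector are indexed with the same node subset as the adjacency matrix, so all three stay aligned after the smaller components are dropped.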

# Set up propagation

Next we set up the propagation scheme. The paper introduces two: the exact PPR propagation used in PPNP and the PPR power iteration propagation used in APPNP.

Here we use the hyperparameters from the paper. Note that MS Academic should use a different teleport probability, `alpha = 0.2`.

In [4]:
prop_ppnp = PPRExact(graph.adj_matrix, alpha=0.1)
prop_appnp = PPRPowerIteration(graph.adj_matrix, alpha=0.1, niter=10)
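
To make the difference between the two schemes concrete, the following dense NumPy sketch implements the formulas from the paper on a toy graph: the exact propagation Z = alpha (I - (1 - alpha) Â)^(-1) H, and the power iteration Z^(k+1) = (1 - alpha) Â Z^(k) + alpha H with Z^(0) = H, where Â is the symmetrically normalized adjacency matrix with added self-loops. The function names are hypothetical and this is not the code inside `PPRExact` or `PPRPowerIteration`.

```python
import numpy as np


def normalized_adj(adj):
    """Symmetrically normalized adjacency matrix with added self-loops."""
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]


def ppr_exact_sketch(adj, H, alpha):
    """Exact PPR propagation: alpha * (I - (1 - alpha) * A_hat)^-1 @ H."""
    n = adj.shape[0]
    A_hat = normalized_adj(adj)
    return alpha * np.linalg.solve(np.eye(n) - (1 - alpha) * A_hat, H)


def ppr_power_iteration_sketch(adj, H, alpha, niter):
    """Approximate PPR propagation with niter power iteration steps."""
    A_hat = normalized_adj(adj)
    Z = H
    for _ in range(niter):
        Z = (1 - alpha) * A_hat @ Z + alpha * H
    return Z


# Toy example: a path graph on 4 nodes and random 2-dimensional predictions H.
toy_adj = np.array([[0, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [0, 0, 1, 0]], dtype=float)
H = np.random.default_rng(0).normal(size=(4, 2))

Z_exact = ppr_exact_sketch(toy_adj, H, alpha=0.1)
Z_10 = ppr_power_iteration_sketch(toy_adj, H, alpha=0.1, niter=10)
Z_200 = ppr_power_iteration_sketch(toy_adj, H, alpha=0.1, niter=200)

# The gap to the exact solution shrinks as the number of iterations grows.
print(np.abs(Z_10 - Z_exact).max(), np.abs(Z_200 - Z_exact).max())
```

The exact variant solves a dense n x n linear system, which becomes expensive for large graphs, whereas each power iteration step needs only a single (sparse, in practice) matrix product.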

# Choose model hyperparameters

Now we choose the model hyperparameters. These are the ones used in the paper for all datasets.

Note that we pass the APPNP propagation (`prop_appnp`) here. To train PPNP instead, pass `prop_ppnp`.

In [5]:
model_args = {
    'hiddenunits': [64],
    'drop_prob': 0.5,
    'propagation': prop_appnp}

# Train model

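Now we can train the model. We first set the data split and optimization hyperparameters, then train five times and print the accuracies of each run.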

In [6]:
idx_split_args = {'ntrain_per_class': 20, 'nstopping': 500, 'nknown': 1500, 'seed': 2413340114}
reg_lambda = 5e-3
learning_rate = 0.01

test = True
device = 'cuda'
print_interval = 20
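
The keys in `idx_split_args` suggest a split into a fixed number of training nodes per class, a stopping set for early stopping, and the remainder of a "known" node set, with all other nodes held out. The sketch below shows one way such a split could be generated with NumPy. It is an assumption about the general idea, not the splitting code used by `train_model`; the function name and the synthetic labels are made up for illustration.

```python
import numpy as np


def split_sketch(labels, ntrain_per_class, nstopping, nknown, seed):
    """Hypothetical split: train / stopping / validation inside the known set,
    with every node outside the known set held out for testing."""
    rng = np.random.default_rng(seed)
    all_idx = np.arange(len(labels))
    known_idx = rng.choice(all_idx, nknown, replace=False)
    test_idx = np.setdiff1d(all_idx, known_idx)
    # Sample the same number of training nodes from every class.
    train_idx = np.concatenate([
        rng.choice(known_idx[labels[known_idx] == c], ntrain_per_class, replace=False)
        for c in np.unique(labels)])
    rest = np.setdiff1d(known_idx, train_idx)
    stopping_idx = rng.choice(rest, nstopping, replace=False)
    val_idx = np.setdiff1d(rest, stopping_idx)
    return train_idx, stopping_idx, val_idx, test_idx


# Synthetic labels with the same node and class counts as the graph above
# (not the real Citeseer labels).
toy_labels = np.arange(2110) % 6
train_idx, stopping_idx, val_idx, test_idx = split_sketch(
    toy_labels, ntrain_per_class=20, nstopping=500, nknown=1500, seed=2413340114)
print(len(train_idx), len(stopping_idx), len(val_idx), len(test_idx))
```

With these settings on a 2110-node graph, 2110 - 1500 = 610 nodes fall outside the known set.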

In [7]:
for _ in range(5):
    print(train_model(
            PPNP, graph, model_args, learning_rate, reg_lambda,
            idx_split_args, stopping_args, test, None))

{'stopping_acc': 0.744, 'valtest_acc': 0.760655737704918, 'test': True}
{'stopping_acc': 0.738, 'valtest_acc': 0.7262295081967213, 'test': True}
{'stopping_acc': 0.75, 'valtest_acc': 0.740983606557377, 'test': True}
{'stopping_acc': 0.742, 'valtest_acc': 0.740983606557377, 'test': True}
{'stopping_acc': 0.738, 'valtest_acc': 0.7377049180327869, 'test': True}
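
Since single runs vary noticeably, it is usually more informative to report the mean and standard deviation over all runs. A minimal sketch using only the standard library, with the `valtest_acc` values copied from the output above:

```python
from statistics import mean, stdev

# 'valtest_acc' values copied from the five runs printed above.
valtest_acc = [
    0.760655737704918,
    0.7262295081967213,
    0.740983606557377,
    0.740983606557377,
    0.7377049180327869,
]

mean_acc = mean(valtest_acc)
std_acc = stdev(valtest_acc)  # sample standard deviation over the runs
print(f"valtest_acc: {100 * mean_acc:.2f} +- {100 * std_acc:.2f} % over {len(valtest_acc)} runs")
```

The split seed is fixed in `idx_split_args`, so the run-to-run differences presumably come from other sources of randomness, such as weight initialization and dropout.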
