Methods for unsupervised representation learning on graphs can be described in terms of modules:
- Graph encoders
- Representations
- Scoring functions
- Loss functions
- Sampling strategies
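How these modules fit together can be sketched with toy stand-ins (these are not the library's classes, and representation and scoring are merged for brevity; the real modules are neural networks):

```python
import math

def encoder(features, adjacency):
    """Toy graph encoder: ignores the graph and returns the features."""
    return features

def score(z_i, z_j):
    """Toy representation/scoring: inner product of two embeddings."""
    return sum(a * b for a, b in zip(z_i, z_j))

def bce_loss(s, label):
    """Binary cross-entropy on an edge score (1 = edge, 0 = no edge)."""
    p = 1.0 / (1.0 + math.exp(-s))
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

def sample_pairs(edges, non_edges):
    """Toy sampling strategy: one positive and one negative pair."""
    return [(edges[0], 1), (non_edges[0], 0)]

features = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0]}
z = encoder(features, adjacency=None)  # node representations
loss = sum(bce_loss(score(z[i], z[j]), y)
           for (i, j), y in sample_pairs([(0, 1)], [(0, 2)]))
```

Swapping any one stand-in for a different choice (a different encoder, scorer, loss, or sampler) yields a different method, which is the idea behind the combinations listed next.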
By identifying these modules, we can reproduce existing methods:
Variational Graph Autoencoders (Kipf and Welling, 2016):

```python
encoder = GCNEncoder(dataset.num_features, hidden_dims=[256, 128])
representation = GaussianVariational()
loss = bceloss
sampling = FirstNeighborSampling
```
Graph Autoencoders (Kipf and Welling, 2016):

```python
encoder = GCNEncoder(dataset.num_features, hidden_dims=[256, 128])
representation = EuclideanInnerProduct()
loss = bceloss
sampling = FirstNeighborSampling
```
Deep Graph Infomax (Veličković et al., 2018):

```python
encoder = GCNEncoder(dataset.num_features, hidden_dims=[256, 128])
representation = EuclideanBilinear()
loss = bceloss
sampling = GraphCorruptionSampling
```
Graph2Gauss (Bojchevski and Günnemann, 2017):

```python
encoder = MLPEncoder(dataset.num_features, hidden_dims=[256, 128])
representation = Gaussian()
loss = square_exponential
sampling = RankedSampling
```
We can also use this framework to create new methods. For example, we can simplify Graph2Gauss with a Euclidean distance:

```python
encoder = MLPEncoder(dataset.num_features, hidden_dims=[256, 128])
representation = EuclideanDistance()
loss = square_exponential
sampling = RankedSampling
```
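The square-exponential loss used above (named after its form in the Graph2Gauss paper) penalizes the squared energy of a linked pair and the negated exponential of the energy of an unlinked pair. A minimal sketch, using Euclidean distance as the energy (the library's implementation may differ in details):

```python
import math

def square_exponential_loss(pos_energy, neg_energy):
    """L = E_pos^2 + exp(-E_neg): pulls linked nodes together
    (small energy) and pushes unlinked nodes apart (large energy)."""
    return pos_energy ** 2 + math.exp(-neg_energy)

def euclidean_energy(z_i, z_j):
    """Energy of a pair as the Euclidean distance between embeddings."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(z_i, z_j)))

# A linked pair that is close and an unlinked pair that is far apart
# give a low loss; swapping their roles gives a high loss.
close = euclidean_energy([0.0, 0.0], [0.1, 0.0])
far = euclidean_energy([0.0, 0.0], [3.0, 4.0])
low = square_exponential_loss(close, far)
high = square_exponential_loss(far, close)
```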
Under this framework, all of these methods can be trained and evaluated with the same procedure:

```python
method = EmbeddingMethod(encoder, representation, loss, sampling)
embeddings, results = train(dataset, method)
```
Create a conda environment with all the requirements (edit `environment.yml` if you want to change the name of the environment):

```sh
conda env create -f environment.yml
```
Activate the environment:

```sh
source activate graphlearn
```
We use Sacred to run and log all the experiments. To list the configuration variables and their default values, run:

```sh
python train.py print_config
```
Two commands are available: `link_pred_experiments` and `node_class_experiments`.
The default settings train our best method (EB-GAE) on the link prediction task with the Cora dataset:

```sh
python train.py link_pred_experiments
```
Other methods can be evaluated as well:

GAE:

```sh
python train.py link_pred_experiments \
    with dataset_str='cora' \
    encoder_str='gcn' \
    repr_str='euclidean_inner' \
    loss_str='bce_loss' \
    sampling_str='first_neighbors'
```
DGI:

```sh
python train.py link_pred_experiments \
    with dataset_str='cora' \
    encoder_str='gcn' \
    repr_str='euclidean_infomax' \
    loss_str='bce_loss' \
    sampling_str='graph_corruption'
```
Graph2Gauss:

```sh
python train.py link_pred_experiments \
    with dataset_str='cora' \
    encoder_str='mlp' \
    repr_str='gaussian' \
    loss_str='square_exponential_loss' \
    sampling_str='ranked'
```
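The node classification task can be run analogously with the `node_class_experiments` command; for example, with the default configuration (the same `with key=value` overrides shown above should apply here as well):

```sh
python train.py node_class_experiments
```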