This repository contains the implementation of Knowledge Enhanced Graph Neural Networks (KeGNN) and the corresponding experiments.
We stack knowledge enhancement layers, as proposed in the paper Knowledge Enhanced Neural Networks for Relational Domains [Daniele, Serafini], on top of Graph Neural Networks. This is a work by the Tyrex Team and was accepted at the KBCG Workshop at IJCAI'23.
Graph data is omnipresent and has a large variety of applications, such as the natural sciences, social networks, or the semantic web. Though rich in information, graphs are often noisy and incomplete. Therefore, graph completion tasks such as node classification or link prediction have gained attention. On the one hand, neural methods such as graph neural networks have proven to be robust tools for learning rich representations of noisy graphs. On the other hand, symbolic methods enable exact reasoning on graphs. We propose KeGNN, a neuro-symbolic framework for learning on graph data that combines both paradigms and allows for the integration of prior knowledge into a graph neural network model. In essence, KeGNN consists of a graph neural network as a base, on which knowledge enhancement layers are stacked with the objective of refining predictions with respect to prior knowledge. We instantiate KeGNN in conjunction with two standard graph neural networks, Graph Convolutional Networks and Graph Attention Networks, and evaluate KeGNN on multiple benchmark datasets for node classification.
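To make the architecture concrete, here is a minimal, hypothetical sketch of the KeGNN idea in PyTorch Geometric. It is not the repository's actual implementation: the enhancement layer below replaces KENN's t-conorm boost function (e.g. GodelBoostConormApprox) with a simplified residual update that encodes the clause "linked nodes tend to share a class", and all class and argument names are illustrative.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv


class KnowledgeEnhancementLayer(torch.nn.Module):
    """Simplified stand-in for a knowledge enhancement layer: pushes each
    node's logits toward its neighbours' logits, scaled by a learnable,
    clipped clause weight (the real KENN layer uses a t-conorm boost
    function instead of this residual mean update)."""

    def __init__(self, clause_weight: float = 0.5, max_weight: float = 500.0):
        super().__init__()
        self.clause_weight = torch.nn.Parameter(torch.tensor(clause_weight))
        self.max_weight = max_weight

    def forward(self, logits, edge_index):
        src, dst = edge_index
        # Mean-aggregate neighbour logits over incoming edges.
        agg = torch.zeros_like(logits).index_add_(0, dst, logits[src])
        deg = torch.zeros(logits.size(0), device=logits.device)
        deg = deg.index_add_(0, dst, torch.ones_like(dst, dtype=logits.dtype))
        agg = agg / deg.clamp(min=1.0).unsqueeze(-1)
        # Residual update in pre-activation (logit) space.
        w = self.clause_weight.clamp(0.0, self.max_weight)
        return logits + w * agg


class KeGCNSketch(torch.nn.Module):
    """GCN base network with stacked knowledge enhancement layers on top."""

    def __init__(self, in_dim, hidden, num_classes, num_kenn_layers=3):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, num_classes)
        self.ke_layers = torch.nn.ModuleList(
            [KnowledgeEnhancementLayer() for _ in range(num_kenn_layers)])

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        z = self.conv2(x, edge_index)      # base GNN predictions (logits)
        for ke in self.ke_layers:
            z = ke(z, edge_index)          # refine w.r.t. prior knowledge
        return z
```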
We apply KeGNN to the following benchmark datasets for node classification. The datasets are publicly available in the dataset collection of PyTorch Geometric.
Name | Description | #Nodes | #Edges | #Features | #Classes | Task |
---|---|---|---|---|---|---|
CiteSeer | Citation network, from Planetoid | 3,327 | 9,104 | 3,703 | 6 | Node classification |
Cora | Citation network, from Planetoid | 2,708 | 10,556 | 1,433 | 7 | Node classification |
PubMed | Citation network, from Planetoid | 19,717 | 88,648 | 500 | 3 | Node classification |
Flickr | Image network, from GraphSAINT [1] | 89,250 | 899,756 | 500 | 7 | Node classification |
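All four datasets can be loaded directly from PyTorch Geometric. A minimal example (the root paths are illustrative; split="full" matches the default Planetoid split used in the experiments):

```python
from torch_geometric.datasets import Planetoid, Flickr

# Citation networks from the Planetoid collection.
citeseer = Planetoid(root="data/Planetoid", name="CiteSeer", split="full")
cora = Planetoid(root="data/Planetoid", name="Cora", split="full")
pubmed = Planetoid(root="data/Planetoid", name="PubMed", split="full")
# Image network from GraphSAINT.
flickr = Flickr(root="data/Flickr")

data = citeseer[0]  # a single graph with train/valid/test masks
print(data.num_nodes, data.num_edges, citeseer.num_classes)
```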
This work and the Knowledge Enhanced Neural Networks papers it builds on can be cited as follows:
# Knowledge Enhanced Graph Neural Networks for Graph Completion
@misc{werner2023knowledge,
title={Knowledge Enhanced Graph Neural Networks for Graph Completion},
author={Luisa Werner and Nabil Layaïda and Pierre Genevès and Sarah Chlyah},
year={2023},
eprint={2303.15487},
archivePrefix={arXiv},
primaryClass={cs.AI}
}
# Knowledge Enhanced Neural Networks for Relational Domains
@InProceedings{10.1007/978-3-031-27181-6_7,
author="Daniele, Alessandro
and Serafini, Luciano",
editor="Dovier, Agostino
and Montanari, Angelo
and Orlandini, Andrea",
title="Knowledge Enhanced Neural Networks for Relational Domains",
booktitle="AIxIA 2022 -- Advances in Artificial Intelligence",
year="2023",
publisher="Springer International Publishing",
address="Cham",
pages="91--109",
isbn="978-3-031-27181-6"
}
# Knowledge Enhanced Neural Networks
@InProceedings{10.1007/978-3-030-29908-8_43,
author="Daniele, Alessandro
and Serafini, Luciano",
editor="Nayak, Abhaya C.
and Sharma, Alok",
title="Knowledge Enhanced Neural Networks",
booktitle="PRICAI 2019: Trends in Artificial Intelligence",
year="2019",
publisher="Springer International Publishing",
address="Cham",
pages="542--554",
isbn="978-3-030-29908-8"
}
- To make sure that the right environment is used, the necessary Python packages and their versions are specified in requirements.txt. We use Python 3.9. Go to the project directory, create a conda environment, and install the packages.
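For example, assuming conda is available (the environment name kegnn is illustrative):

conda create -n kegnn python=3.9
conda activate kegnn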
pip install -r requirements.txt
We use Weights and Biases (WandB) as an experiment tracking tool. The experiments can be run either without or with WandB.
- To run the experiments without WandB, run the following commands:
cd Experiments
python train_and_evaluate.py conf.json
(By default, "wandb_use": false is set in Experiments/conf.json.)
The results can be viewed and visualized with a Jupyter notebook. The model and the dataset name have to be set manually in the first cells of the notebook. The notebook can be found at Experiments/notebooks/inspect_results_.ipynb.
- If you want to use Weights and Biases, specify the following parameters in Experiments/conf.json:
"wandb_use": true,
"wandb_label": "<your label>",
"wandb_project": "<your project>",
"wandb_entity": "<your entity>"
Then use the following commands to run the experiments:
cd Experiments
python run_experiments.py conf.json
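For reference, here is a minimal sketch of how these settings are presumably passed to WandB inside the training script; the actual call lives in the repository code and may differ:

```python
# Hypothetical sketch: forwarding conf.json settings to WandB.
import json
import wandb

with open("conf.json") as f:
    conf = json.load(f)

if conf.get("wandb_use"):
    wandb.init(project=conf["wandb_project"],       # "wandb_project" in conf.json
               entity=conf["wandb_entity"],         # "wandb_entity" in conf.json
               tags=[conf.get("wandb_label", "test")],
               config=conf)
```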
The results can be viewed and visualized with a Jupyter notebook. Note that the URL to the Weights and Biases database needs to be adapted to your wandb project and wandb entity, and the correct run id needs to be set. Currently, this information points to our wandb repository.
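A minimal sketch, assuming the public WandB API is used to fetch a run; entity, project and run id are placeholders to adapt:

```python
import wandb

api = wandb.Api()
# "<entity>/<project>/<run id>" must match your own WandB setup.
run = api.run("<your entity>/<your project>/<run id>")
history = run.history()   # per-step metrics as a pandas DataFrame
print(run.summary)        # final summary metrics of the run
```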
The notebook can be found at Experiments/notebooks/inspect_results_wandb.ipynb.
The settings and parameters of a run can be specified in the configuration file Experiments/conf.json as follows (a complete example assembled from the defaults is shown after the list).
- dataset: The dataset on which the experiments are conducted. Values: ['CiteSeer', 'Cora', 'PubMed', 'Flickr'] - Default: 'CiteSeer'
- device: GPU number in case of available GPUs. Values: positive integers - Default: 0
- model: Model for the experiments. Values: ['KeGCN', 'KeGAT', 'KeMLP', 'GCN', 'GAT', 'MLP'] - Default: 'KeGCN'
- eval_steps: How often to evaluate and calculate validation and test accuracy (every x-th epoch). Values: positive integers - Default: 1
- planetoid_split: Split of the indices into train/valid/test for the Planetoid datasets; see the PyTorch Geometric Planetoid documentation for details. Default: 'full'
- runs: Number of independent runs to conduct. Values: positive integers - Default: 50
- seed: Random seed. Values: positive integers - Default: 1234
- adam_beta1: Adam optimizer parameter beta1. Default: 0.9
- adam_beta2: Adam optimizer parameter beta2. Default: 0.99
- adam_eps: Adam optimizer parameter epsilon. Default: 1e-07
- attention_heads: Number of attention heads for multi-head attention. Values: positive integers - Default: 8
- batch_size: Batch size for mini-batch training. Small batches increase runtimes. Only used when full_batch is set to false. Values: positive integers - Default: 512
- dropout: Dropout rate to avoid overfitting. Real numbers in [0.0, 0.9] recommended. Default: 0.5
- edges_drop_rate: Random dropout of edges before training. Real numbers in [0.0, 0.9] recommended. Default: 0.0
- epochs: Number of training epochs. Values: positive integers - Default: 200
- es_enabled: Early stopping mechanism. Values: [true, false] - Default: true
- es_min_delta: Minimum delta between the validation accuracy of previous steps and the current validation accuracy for early stopping. Values: small positive real numbers - Default: 0.001
- es_patience: Number of epochs before early stopping can be activated. Values: positive integers - Default: 10
- full_batch: Activation of full-batch training. Values: [true, false] - Default: true
- hidden_channels: Number of neurons in the hidden layers of the base NN. Values: positive integers - Default: 128
- lr: Learning rate. Values: real numbers in [0.0001, 0.1] - Default: 0.01
- normalize_edges: Normalize edges with the degree matrix. Values: [true, false] - Default: false
- num_layers: Number of hidden layers in the base NN. Values: positive integers - Default: 3
- binary_preactivation: Initialization of the binary predicate groundings. Values: high positive real numbers - Default: 500.0
- boost_function: Boost function used in the knowledge enhancement layers. Value: 'GodelBoostConormApprox'
- clause_weight: Initialization of the clause weight. Values: positive real numbers - Default: 0.5
- min_weight: Minimum weight for clause weight clipping. Default: 0.0
- max_weight: Maximum weight for clause weight clipping. Default: 500.0
- num_kenn_layers: Number of stacked knowledge enhancement (KENN) layers. Values: positive integers - Default: 3
- wandb_use: Flag to use WandB or not. Values: [true, false] - Default: false
- wandb_label: Runs can be labelled; put a label value here if you want to label your runs. Values: string - Default: 'test'
- wandb_project: Your wandb project.
- wandb_entity: Your wandb entity.
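For orientation, here is a hypothetical Experiments/conf.json assembled purely from the defaults listed above; the key spellings follow the list and may differ slightly from the actual file:

```json
{
  "dataset": "CiteSeer",
  "device": 0,
  "model": "KeGCN",
  "eval_steps": 1,
  "planetoid_split": "full",
  "runs": 50,
  "seed": 1234,
  "adam_beta1": 0.9,
  "adam_beta2": 0.99,
  "adam_eps": 1e-07,
  "attention_heads": 8,
  "batch_size": 512,
  "dropout": 0.5,
  "edges_drop_rate": 0.0,
  "epochs": 200,
  "es_enabled": true,
  "es_min_delta": 0.001,
  "es_patience": 10,
  "full_batch": true,
  "hidden_channels": 128,
  "lr": 0.01,
  "normalize_edges": false,
  "num_layers": 3,
  "binary_preactivation": 500.0,
  "boost_function": "GodelBoostConormApprox",
  "clause_weight": 0.5,
  "min_weight": 0.0,
  "max_weight": 500.0,
  "num_kenn_layers": 3,
  "wandb_use": false,
  "wandb_label": "test",
  "wandb_project": "<your project>",
  "wandb_entity": "<your entity>"
}
```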