This repository contains the source code of Bass, the approach presented in the paper "Neural Knowledge Base Repairs" by Thomas Pellissier Tanon and Fabian Suchanek, which is currently under review.
Bass aims to automate knowledge base curation. It uses advances in neural network research to improve the automated correction of constraint violations. It is a deep learning refinement of "Learning how to correct a knowledge base from the edit history" and, like that work, uses the edits that solved violations in the past to infer how to solve similar violations in the present. Bass uses the graph content, literal embeddings, and features extracted from Web pages to improve its performance. An experimental evaluation on Wikidata shows significant improvements over baselines. The evaluation code is provided in this repository.
Only a sample of the evaluation dataset is provided in this git repository. The full dataset is available on FigShare.
The Bass implementation is provided as a Jupyter Notebook named corrections_learning.ipynb. It requires Python 3.6+, TensorFlow 2.0+ and the tokenizers Python library (pip3 install tokenizers).
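Before opening the notebook, it can be useful to check that the requirements listed above are available. The following is a minimal sketch and not part of the repository: the function name missing_requirements is hypothetical, and the check only tests that the modules can be found, not that TensorFlow is at version 2.0 or later.

```python
import importlib.util
import sys

# Modules named in the requirements above.
REQUIRED_MODULES = ("tensorflow", "tokenizers")


def missing_requirements(min_python=(3, 6), modules=REQUIRED_MODULES):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if sys.version_info < min_python:
        problems.append("Python %d.%d or later is required" % min_python)
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append("module %r is not installed" % name)
    return problems


if __name__ == "__main__":
    for problem in missing_requirements():
        print(problem)
```

An empty result only means the modules were found; the TensorFlow version still has to be checked separately.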
Because git space constraints allow only a sample of the dataset to be provided in this repository, the performance figures obtained here differ slightly from those of the paper evaluation.
We also increased the maximum number of epochs to 20 to allow more training on this smaller dataset.
To reproduce the evaluation on the complete dataset, please replace the provided sample dataset with the one shared on FigShare.
The dataset is composed of:
- The file constraints.tsv contains the definition of Wikidata constraints. Each constraint is a row of this tab-separated file.
- The directory constraint-corrections. Each file is compressed with gzip. The files' content is tab-separated and is using the following columns:
- constraint id
- revision that fixed the constraint violation
- first violation triple subject
- first violation triple predicate
- first violation triple object
- second violation triple subject (blank if no second violation triple)
- second violation triple predicate (blank if no second violation triple)
- second violation triple object (blank if no second violation triple)
- separator (not useful)
- subject of the first triple in the correction
- predicate of the first triple in the correction
- object of the first triple in the correction
- is the first triple in the correction an addition or a deletion (<http://wikiba.se/history/ontology#deletion> for a deletion and <http://wikiba.se/history/ontology#addition> for an addition)
- subject of the second triple in the correction (might not exist)
- predicate of the second triple in the correction (might not exist)
- object of the second triple in the correction (might not exist)
- is the second triple in the correction an addition or a deletion (<http://wikiba.se/history/ontology#deletion> for a deletion and <http://wikiba.se/history/ontology#addition> for an addition) (might not exist)
- Description of the subject of the first violation triple encoded in JSON
- Description of the object of the first violation triple encoded in JSON (might be empty for literals)
- Description of the term of the second triple that has not already been described by the two previous descriptions (might be empty for literals or if there is no second triple)
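
The column layout above can be read with a few lines of Python. The following is a minimal sketch and not the loader used in the notebook. It makes several assumptions: the column names are hypothetical labels chosen for this example, each row always has all 20 columns (missing values being empty strings), the files have no header row, and no field contains a raw tab character.

```python
import gzip
import json

# Hypothetical names for the 20 columns, in the order listed above.
COLUMNS = [
    "constraint_id", "revision",
    "violation1_subject", "violation1_predicate", "violation1_object",
    "violation2_subject", "violation2_predicate", "violation2_object",
    "separator",
    "correction1_subject", "correction1_predicate", "correction1_object",
    "correction1_type",
    "correction2_subject", "correction2_predicate", "correction2_object",
    "correction2_type",
    "violation1_subject_description", "violation1_object_description",
    "violation2_term_description",
]
JSON_COLUMNS = COLUMNS[-3:]

ADDITION = "<http://wikiba.se/history/ontology#addition>"
DELETION = "<http://wikiba.se/history/ontology#deletion>"


def parse_row(fields):
    """Turn one list of tab-separated fields into a dict keyed by COLUMNS."""
    if len(fields) != len(COLUMNS):
        raise ValueError("expected %d columns, got %d" % (len(COLUMNS), len(fields)))
    row = dict(zip(COLUMNS, fields))
    for key in JSON_COLUMNS:
        # Empty descriptions (literals, no second triple) become None.
        row[key] = json.loads(row[key]) if row[key] else None
    return row


def read_corrections(path):
    """Yield one parsed row per line of a gzip-compressed corrections file."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as lines:
        for line in lines:
            yield parse_row(line.rstrip("\r\n").split("\t"))


def correction_operations(row):
    """Return the correction as a list of ((s, p, o), is_addition) pairs."""
    operations = []
    for i in (1, 2):
        subject = row["correction%d_subject" % i]
        if not subject:
            continue  # no such triple in this correction
        kind = row["correction%d_type" % i]
        if kind not in (ADDITION, DELETION):
            raise ValueError("unexpected correction type: %r" % kind)
        triple = (subject, row["correction%d_predicate" % i], row["correction%d_object" % i])
        operations.append((triple, kind == ADDITION))
    return operations
```

Iterating over read_corrections(path) for a file in constraint-corrections yields one dictionary per past correction, and correction_operations turns it into the triples that were added or deleted. If the real files omit the trailing columns instead of leaving them blank, parse_row would need to pad the fields before zipping.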