Detection of Relation Assertion Errors in Knowledge Graphs

Implementation of the PaTyBRED error detection method from the paper "Detection of Relation Assertion Errors in Knowledge Graphs" in the proceedings of K-CAP 2017.

How to use

First, the dataset needs to be converted into the NPZ format supported by our system. This is done with a conversion script that takes NT (N-Triples) files as input. If the dataset is in another format, RDF2RDF can be used to convert it into the NT format.

python dataset.nt
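To give a feel for what such a conversion involves, here is a minimal sketch that reads N-Triples lines and maps entities and relations to integer ids. It is an illustration only: the function name `index_triples`, the regular expression, and the id layout are assumptions, and the actual NPZ layout written by the conversion script is not described in this README. Triples with literal objects or blank nodes are simply skipped here.

```python
import re

# Matches simple N-Triples lines whose subject, predicate and object are all IRIs.
# Literal objects and blank nodes are not handled in this simplified sketch.
TRIPLE_RE = re.compile(r'^\s*<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$')


def index_triples(lines):
    """Map IRIs to integer ids and return (entity_ids, relation_ids, triples).

    Hypothetical helper: the real converter may index and store data differently.
    """
    entity_ids, relation_ids, triples = {}, {}, []
    for line in lines:
        match = TRIPLE_RE.match(line)
        if match is None:
            continue  # comment, blank line, or a triple this sketch does not cover
        s, p, o = match.groups()
        # setdefault assigns the next free id only when the IRI is new
        s_id = entity_ids.setdefault(s, len(entity_ids))
        o_id = entity_ids.setdefault(o, len(entity_ids))
        p_id = relation_ids.setdefault(p, len(relation_ids))
        triples.append((s_id, p_id, o_id))
    return entity_ids, relation_ids, triples
```

Calling `index_triples(open("dataset.nt"))` would return the two id dictionaries and a list of `(subject, relation, object)` id tuples.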

Once the dataset has been converted into the NPZ format, you can compute the triple scores and rank all facts in the data with rank_facts. The KG model is selected with -m, the path of the ranked output is set with -o, and the learned model can be saved to the path given by -sp.

python dataset.npz -m patybred -o ranked_dataset.pkl -sp learned-model.pkl 

An evaluation can be performed by adding noise (wrong triples) to a dataset and then detecting it with the chosen method. The noise is added by a separate script. The parameter -pe sets the ratio of noise to generate (0.01 means 1% of the original number of triples). The parameter -ek selects the kind of noise, which is generated by corrupting correct triples through replacing the subject or the object: 1 substitutes the original entity with a random entity of any type, and 2 substitutes it with a random entity of the same type as the original. An NPZ file containing the original data plus the generated errors is created as output.
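The two kinds of noise described above can be sketched as follows. This is not the repository's code: the function `corrupt_triples`, its arguments, and the assumption that every entity has exactly one type are simplifications made for illustration (entities in real KGs often have several types).

```python
import random


def corrupt_triples(triples, entity_types, ratio, kind, seed=0):
    """Generate wrong triples by replacing the subject or object of correct ones.

    triples      : list of (subject, relation, object) tuples
    entity_types : dict mapping each entity to a single type (simplifying assumption)
    ratio        : number of errors as a fraction of len(triples), like -pe
    kind         : 1 = random entity of any type, 2 = random entity of the same type, like -ek
    """
    rng = random.Random(seed)
    existing = set(triples)
    entities = sorted(entity_types)
    by_type = {}
    for entity in entities:
        by_type.setdefault(entity_types[entity], []).append(entity)

    n_errors = int(round(ratio * len(triples)))
    errors = set()
    attempts, max_attempts = 0, 100 * max(n_errors, 1)  # guard against endless loops
    while len(errors) < n_errors and attempts < max_attempts:
        attempts += 1
        s, p, o = rng.choice(triples)
        replace_subject = rng.random() < 0.5
        original = s if replace_subject else o
        pool = entities if kind == 1 else by_type[entity_types[original]]
        new = rng.choice(pool)
        candidate = (new, p, o) if replace_subject else (s, p, new)
        # Skip candidates that are already facts (this also rejects new == original)
        if candidate not in existing:
            errors.add(candidate)
    return sorted(errors)
```

Kind 2 errors are harder to detect than kind 1 errors, because the corrupted triple still respects the types expected by the relation.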

The resulting NPZ file can then be passed to the evaluation script, which learns a KG model on the noisy dataset, ranks the facts, and evaluates how the erroneous facts are ranked. The evaluation results are reported with several performance measures.

python dataset.npz -pe 0.01 -ek 1
python dataset-ek1.npz -m patybred -o ranked_dataset.pkl -sp learned-model.pkl 
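To illustrate what "evaluating how the erroneous facts are ranked" can mean, the sketch below ranks triples by score and computes two common ranking measures for the injected errors. The function name `rank_errors`, the convention that a lower score means a less plausible triple, and the choice of mean rank and average precision are all assumptions; this README does not state which measures the evaluation script reports.

```python
def rank_errors(scores, is_error):
    """Rank triples by ascending score and measure where the known errors end up.

    scores   : list of floats, one per triple (assumed: lower = more likely wrong)
    is_error : list of bools marking the injected erroneous triples
    Returns (mean_rank, average_precision) of the erroneous triples.
    """
    # Position 1 is the triple the model considers most likely to be an error
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    error_ranks = [rank for rank, i in enumerate(order, start=1) if is_error[i]]
    if not error_ranks:
        raise ValueError("no erroneous triples to evaluate")

    mean_rank = sum(error_ranks) / len(error_ranks)
    # Precision at each rank where an error is found, averaged over all errors
    average_precision = sum(
        k / rank for k, rank in enumerate(error_ranks, start=1)
    ) / len(error_ranks)
    return mean_rank, average_precision
```

A good detector pushes the injected errors towards the top of the ranking, which gives a low mean rank and an average precision close to 1.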

Generating SHACL Constraints

Implementation of the generation of SHACL-SPARQL relation constraints from the paper "Automatic Detection of Relation Assertion Errors and Induction of Relation Constraints" submitted to the Semantic Web Journal.

To generate the SHACL constraints, a PaTyBRED model must first be learned with decision trees as local classifiers (-m patybred -clf dt). Generating the constraints takes two mandatory parameters: the first is the path to the learned model and the second is the path to the original KG dataset, which contains the relation and type names.

python learned-model.pkl dataset.npz -c 0.99 -ms 10

The parameter -c specifies the minimum confidence and -ms the minimum support. These parameters are used when pruning the learned decision tree. To validate your dataset against the set of learned constraints, you can use the TopBraid implementation of SHACL, which is based on Jena.
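The role of the two thresholds can be pictured with the following sketch, which keeps only those decision tree leaves that are both well supported and confident. The `Leaf` structure, the function `select_leaves`, and the definitions used here (support as the number of training examples reaching a leaf, confidence as the share of its majority class) are assumptions for illustration; the exact pruning procedure is described in the paper, not in this README.

```python
from dataclasses import dataclass


@dataclass
class Leaf:
    """Hypothetical summary of one decision tree leaf."""
    conditions: tuple   # path conditions from the root to this leaf
    n_correct: int      # training examples in the leaf labelled as correct
    n_wrong: int        # training examples in the leaf labelled as wrong


def select_leaves(leaves, min_confidence, min_support):
    """Keep leaves with enough examples (like -ms) and a clear majority class (like -c)."""
    kept = []
    for leaf in leaves:
        support = leaf.n_correct + leaf.n_wrong
        if support == 0 or support < min_support:
            continue  # too few examples to trust this leaf
        confidence = max(leaf.n_correct, leaf.n_wrong) / support
        if confidence >= min_confidence:
            kept.append(leaf)
    return kept
```

With `min_confidence=0.99` and `min_support=10`, a leaf covering 200 examples of which 199 share a label would be kept, while a leaf covering only 5 examples would be dropped regardless of how pure it is.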


The datasets used in the paper's automatic evaluation (containing generated erroneous triples) can be downloaded here:

The datasets used in the paper's manual evaluation can be downloaded here:

