Skip to content


Switch branches/tags

Latest commit


Git stats


Failed to load latest commit information.
Latest commit message
Commit time

KGist: Knowledge Graph Summarization for Anomaly Detection & Completion

Caleb Belth, Xinyi Zheng, Jilles Vreeken, and Danai Koutra. What is Normal, What is Strange, and What is Missing in a Knowledge Graph: Unified Characterization via Inductive Summarization. ACM The Web Conference (WWW), April 2020. [Link to the paper]

If used, please cite:

  title={What is Normal, What is Strange, and What is Missing in a Knowledge Graph: Unified Characterization via Inductive Summarization},
  author={Belth, Caleb and Zheng, Xinyi and Vreeken, Jilles and Koutra, Danai},
  booktitle={Proceedings of The Web Conference 2020},



  1. git clone
  2. cd data/
  3. unzip
  4. unzip
  5. cd ../src/
  6. cd test/
  7. python


  • Python 3
  • numpy
  • scipy
  • networkx


Nell and DBpedia are zipped in the data/ directory. Yago is too big to distribute via Github.

{KG_name}.txt format: space separated, one triple per line.

s1 p1 o1
s2 p2 o2

{KG_name}_labels.txt format: space separated, one entity per line followed by a variable number of labels, also space separated.

e1 l1 l2 ...
e2 l1 l2 l3 ...

Example usage (from src/ dir)

Command Line

python --graph nell


from graph import Graph
from searcher import Searcher
from model import Model

# load graph
graph = Graph('nell', idify=True)
# create a Searcher object to search for a model (set of rules)
searcher = Searcher(graph)
# build initial model
model = searcher.build_model()
# perform rule merging refinement
model = model.merge_rules()
# perform rule nesting refinement
model = model.nest_rules()

To compute anomaly scores for triples as in Section 4.3:

from anomaly_detector import AnomalyDetector

# construct an anomaly detector with the KGist model
anomaly_detector = AnomalyDetector(model)
# an edge/triple to score
edge = ('concept:company:limited_brands', 'concept:companyceo', 'concept:ceo:leslie_wexner')
>>> 26.5164

Larger numbers mean more anomalous. Note that in our experiments in Section 5.2, we used KGist+m, which would be the model without running model.nest_rules().


--graph {KG_name} Expects {KG_name}.txt and {KG_name}_labels.txt to be in data/ directory in format as described above for NELL and DBpedia.

--rule_merging / -Rm True/False (Optional; Default = False) Use rule merging refinement (Section 4.2.2)

--rule_nesting / -Rn True/False (Optional; Default = False) Use rule nesting refinement (Section 4.2.2)

--idify / -i True/False (Optional; Default = True) Convert entities and predicates to integer ids internally for faster processing

--verbosity / -v [0, infinity) (Optional; Default = 1,000,000) How frequently to log progress (use integers)

--output_path / -o (Optional; Default = 'output/') What directory to write the output to (log will still be printed to stdout)


  • output/{KG_name}_model.pickle saves a Model object.
  • output/{KG_name}_model.rules saves the rules, which are recursively defined, in parenthetical form.

Frequently Asked Questions (FAQ)

I want to run KGist on my own dataset. How did you construct the labels file?

We constructed the labels file by moving the rdf:type triples to the labels file. Thus, if, for example, there are triples (LaRose, rdf:type, book) and (LaRose, rdf:type, novel) in the KG, then LaRose book novel would be a row in the labels file.

Comments or Questions

Contact Caleb Belth with comments or questions: