KGist

Caleb Belth, Xinyi Zheng, Jilles Vreeken, and Danai Koutra. What is Normal, What is Strange, and What is Missing in a Knowledge Graph: Unified Characterization via Inductive Summarization. ACM The Web Conference (WWW), April 2020. [pdf]

Setup

  1. git clone git@github.com:GemsLab/KGist.git
  2. cd KGist/data/
  3. unzip nell.zip
  4. unzip dbpedia.zip
  5. cd ../test/
  6. python tester.py

Data

NELL and DBpedia are zipped in the data/ directory. YAGO is too large to distribute via GitHub.

{KG_name}.txt format: space-separated, one triple (subject, predicate, object) per line.

s1 p1 o1
s2 p2 o2
...
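
To check a custom {KG_name}.txt file before running KGist, the triple format above can be read with a few lines of Python. This is a minimal sketch, not the loader KGist itself uses; the function name `parse_triples` is hypothetical, and it assumes that no subject, predicate, or object contains whitespace.

```python
from typing import Iterable, Iterator, Tuple

Triple = Tuple[str, str, str]


def parse_triples(lines: Iterable[str]) -> Iterator[Triple]:
    """Yield (subject, predicate, object) from space-separated lines.

    Hypothetical helper for illustration; not part of the KGist code base.
    """
    for line_number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        if len(fields) != 3:
            raise ValueError(
                f"line {line_number}: expected 3 fields, got {len(fields)}"
            )
        subject, predicate, obj = fields
        yield (subject, predicate, obj)


if __name__ == "__main__":
    sample = ["s1 p1 o1\n", "s2 p2 o2\n"]
    for triple in parse_triples(sample):
        print(triple)
```

A line with more or fewer than three fields raises a ValueError with its line number, which helps locate entities whose names contain spaces.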

{KG_name}_labels.txt format: space-separated, one entity per line, followed by a variable number of labels.

e1 l1 l2 ...
e2 l1 l2 l3 ...
...
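
The labels format can be read in the same way. The sketch below maps each entity to its list of labels; `parse_labels` is a hypothetical name, not a KGist function, and it assumes that an entity with no labels is allowed and simply maps to an empty list.

```python
from typing import Dict, Iterable, List


def parse_labels(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each entity to its labels, read from space-separated lines.

    Hypothetical helper for illustration; not part of the KGist code base.
    """
    labels: Dict[str, List[str]] = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        # First field is the entity; everything after it is a label.
        entity, *entity_labels = fields
        labels[entity] = entity_labels
    return labels


if __name__ == "__main__":
    sample = ["e1 l1 l2\n", "e2 l1 l2 l3\n"]
    print(parse_labels(sample))
```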

Example usage (from src/ dir)

Command Line

python main.py --graph nell

Interface

# load graph
graph = Graph('nell', idify=True)
# create a Searcher object to search for a model (set of rules)
searcher = Searcher(graph)
# build initial model
model = searcher.build_model()
model.print_stats()
# perform rule merging refinement
model = model.merge_rules()
model.print_stats()
# perform rule nesting refinement
model = model.nest_rules()
model.print_stats()

Arguments

--graph {KG_name} Expects {KG_name}.txt and {KG_name}_labels.txt to be in the data/ directory, in the format described above for NELL and DBpedia.

--rule_merging / -Rm True/False (Optional; Default = False) Use rule merging refinement (Section 4.2.2 of the paper)

--rule_nesting / -Rn True/False (Optional; Default = False) Use rule nesting refinement (Section 4.2.2 of the paper)

--idify / -i True/False (Optional; Default = True) Convert entities and predicates to integer ids internally for faster processing

--verbosity / -v [0, infinity) (Optional; Default = 1,000,000) How frequently to log progress (use integers)

--output_path / -o (Optional; Default = 'output/') What directory to write the output to (log will still be printed to stdout)
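
Several of these flags take a literal True or False value rather than acting as on/off switches. The sketch below shows one way such flags could be declared with argparse. It is an illustration of the documented interface only, not the contents of main.py: `str_to_bool` and `build_parser` are hypothetical names, and the set of accepted spellings for true and false is an assumption.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Convert 'True'/'False' text to a bool.

    Needed because type=bool would treat any non-empty string,
    including 'False', as True.
    """
    lowered = value.strip().lower()
    if lowered in ("true", "t", "1", "yes"):
        return True
    if lowered in ("false", "f", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical mirror of the documented flags and defaults.
    parser = argparse.ArgumentParser(description="KGist-style arguments (sketch)")
    parser.add_argument("--graph", required=True)
    parser.add_argument("--rule_merging", "-Rm", type=str_to_bool, default=False)
    parser.add_argument("--rule_nesting", "-Rn", type=str_to_bool, default=False)
    parser.add_argument("--idify", "-i", type=str_to_bool, default=True)
    parser.add_argument("--verbosity", "-v", type=int, default=1_000_000)
    parser.add_argument("--output_path", "-o", default="output/")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--graph", "nell", "-Rm", "True"])
    print(args)
```

Under this sketch, omitting a flag leaves it at the default listed above, so `--graph nell` alone runs without either refinement.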

Output

  • output/{KG_name}_model.pickle saves a Model object.
  • output/{KG_name}_model.rules saves the rules, which are recursively defined, in parenthetical form.
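
Until model-loading documentation is available, the pickled model can presumably be read back with Python's standard pickle module. This is a minimal sketch under that assumption; `load_model` is a hypothetical helper, not part of KGist. Unpickling needs the class of the saved object to be importable, so it would likely have to run from the src/ directory, and pickle files should only be loaded from sources you trust.

```python
import pickle
from pathlib import Path
from typing import Any


def load_model(kg_name: str, output_dir: str = "output") -> Any:
    """Load output/{KG_name}_model.pickle and return the stored object.

    Hypothetical helper for illustration; not part of the KGist code base.
    """
    path = Path(output_dir) / f"{kg_name}_model.pickle"
    with path.open("rb") as handle:
        return pickle.load(handle)


if __name__ == "__main__":
    # Assumes `python main.py --graph nell` has already written the file,
    # and that output/ is reachable from the current working directory.
    model = load_model("nell")
    model.print_stats()
```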

Coming Soon

  • Documentation on loading models.
  • More extensive examples.

Comments or Questions

Contact Caleb Belth with comments or questions: cbelth@umich.edu
