Wikidata experiments

Download a Wikidata truthy N-Triples dump. Dumps are available at https://dumps.wikimedia.org/wikidatawiki/entities/ and have names like wikidata-YYYYMMDD-truthy-BETA.nt.gz.
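
The dump is large, so it is usually easier to stream it than to decompress it first. The following is a minimal sketch of how the gzipped file could be read line by line in Python. The helper names `iter_triples` and `read_dump` are hypothetical and do not come from the build scripts, and the parser assumes one statement per line with single spaces between subject, predicate and object.

```python
import gzip


def iter_triples(lines):
    """Yield (subject, predicate, object) strings from N-Triples lines."""
    for line in lines:
        line = line.strip()
        # Skip blank lines and comments
        if not line or line.startswith("#"):
            continue
        # Assumption: terms are separated by single spaces
        subject, predicate, rest = line.split(" ", 2)
        # Drop the terminating "." of the statement
        obj = rest.rstrip()
        if obj.endswith("."):
            obj = obj[:-1].rstrip()
        yield subject, predicate, obj


def read_dump(path):
    """Stream triples from a gzipped dump without unpacking it to disk."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        yield from iter_triples(handle)
```

A call such as `read_dump("wikidata-YYYYMMDD-truthy-BETA.nt.gz")` would then yield one tuple per statement, with URIs still wrapped in angle brackets.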

Rule mining with cardinalities

To generate the completed graph for people on Wikidata, together with its cardinalities, run:

python3 build_wikidata_people_complete.py wikidata-YYYYMMDD-truthy-BETA.nt.gz wikidata_people_triples_full.tsv wikidata_people_cardinalities.tsv

wikidata_people_triples_full.tsv stores the completed graph and wikidata_people_cardinalities.tsv stores its cardinalities.

This executes the following rules in this order to complete the graph:

  • hasFather(x, y) <- hasChild(y, x), hasGender(y, male)
  • hasMother(x, y) <- hasChild(y, x), hasGender(y, female)
  • hasChild(x, y) <- hasFather(y, x)
  • hasChild(x, y) <- hasMother(y, x)
  • hasSibling(x, y) <- hasChild(z, x), hasChild(z, y), x ≠ y
  • hasSibling(x, y) <- hasSibling(y, x)
  • hasSpouse(x, y) <- hasSpouse(y, x)
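
As a rough illustration of how the rules above can be applied in order, here is a minimal sketch on an in-memory set of triples. The function `complete`, the readable predicate names and the tuple representation are assumptions made for this example. They are not taken from build_wikidata_people_complete.py, which works on the Wikidata dump.

```python
from collections import defaultdict


def complete(triples):
    """Apply each completion rule once, in the order listed above."""
    triples = set(triples)
    gender = {s: o for s, p, o in triples if p == "hasGender"}

    # hasFather / hasMother from hasChild and the parent's gender
    for s, p, o in list(triples):
        if p == "hasChild":
            if gender.get(s) == "male":
                triples.add((o, "hasFather", s))
            elif gender.get(s) == "female":
                triples.add((o, "hasMother", s))

    # hasChild from hasFather / hasMother
    for s, p, o in list(triples):
        if p in ("hasFather", "hasMother"):
            triples.add((o, "hasChild", s))

    # hasSibling between distinct children of the same parent
    children = defaultdict(set)
    for s, p, o in triples:
        if p == "hasChild":
            children[s].add(o)
    for kids in children.values():
        for x in kids:
            for y in kids:
                if x != y:
                    triples.add((x, "hasSibling", y))

    # Symmetry of hasSibling and hasSpouse (both handled in one pass)
    for s, p, o in list(triples):
        if p in ("hasSibling", "hasSpouse"):
            triples.add((o, p, s))
    return triples
```

The order matters: the sibling rule only sees children that were added by the two preceding hasChild rules, so running it earlier would miss siblings that are linked only through hasFather or hasMother statements.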

To generate the cardinalities, we apply the following rules to the completed graph:

  • num(hasBirthPlace, x) = 1 if type(x, human)
  • num(hasFather, x) = 1 if type(x, human)
  • num(hasMother, x) = 1 if type(x, human)
  • num(hasDeathPlace, x) = 1 if exists y: hasDeathDate(x, y) or hasDeathPlace(x, y)
  • num(p, x) = #y: p(x, y) if exists y: p(x, y) and p is in {hasChild, hasSibling, hasSpouse, hasStepParent}
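
The cardinality rules above can be sketched in the same style. This is only an illustration under stated assumptions: the function `cardinalities`, the predicate names `type` and `human`, and the dictionary keyed by (predicate, subject) are hypothetical and do not reflect the output format of the build script.

```python
from collections import defaultdict

# Predicates whose cardinality is the number of values observed
COUNTED = {"hasChild", "hasSibling", "hasSpouse", "hasStepParent"}


def cardinalities(triples):
    """Return {(predicate, subject): expected number of values}."""
    triples = set(triples)  # duplicates must not inflate the counts
    card = {}
    counts = defaultdict(int)
    for s, p, o in triples:
        if p == "type" and o == "human":
            # Every human has one birth place, one father and one mother
            for q in ("hasBirthPlace", "hasFather", "hasMother"):
                card[(q, s)] = 1
        elif p in ("hasDeathDate", "hasDeathPlace"):
            # Someone known to be dead has exactly one death place
            card[("hasDeathPlace", s)] = 1
        elif p in COUNTED:
            counts[(p, s)] += 1
    card.update(counts)
    return card
```

Note that no entry is produced for a counted predicate with zero observed values, which matches the "if exists y" condition in the last rule.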

Then you can run the experiments with:

python3 patterns_using_cardinalities_people.py ../eval/wikidata/wikidata_people_triples_full.tsv ../eval/wikidata/wikidata_people_cardinalities.tsv output_file.tsv

This should be done in the carl directory.

The results generated from the March 6th, 2017 dump are provided in the results/rule_learning_wikidata_people_20170306.tsv file. The rules mined with a global ratio of 0.5 are listed in rule_learning_wikidata_people_20170306_rules_global_ratio_0.5.tsv.

Cardinality mining

To filter and simplify triples related to people, run:

python3 build_wikidata_people.py wikidata-YYYYMMDD-truthy-BETA.nt.gz wikidata_people_triples.tsv

To create a file with only the cardinalities related to people:

grep -P 'P(22|25|26|40|3373|3448|19|20|27)\t' cardinalities.tsv > cardinalities_people.tsv
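
The `-P` option is specific to GNU grep. Where it is not available, the same filter can be approximated in Python, as in the sketch below. The function names are hypothetical, and the sketch makes no assumption about the column layout of cardinalities.tsv beyond what the grep pattern itself requires: one of the listed property IDs immediately followed by a tab.

```python
import re

# Wikidata property IDs kept by the filter:
# P19 place of birth, P20 place of death, P22 father, P25 mother,
# P26 spouse, P27 country of citizenship, P40 child, P3373 sibling,
# P3448 stepparent
PEOPLE_PROPERTY = re.compile(r"P(22|25|26|40|3373|3448|19|20|27)\t")


def filter_people_lines(lines):
    """Keep the lines in which a listed property ID precedes a tab."""
    return [line for line in lines if PEOPLE_PROPERTY.search(line)]


def filter_file(source="cardinalities.tsv", target="cardinalities_people.tsv"):
    """Write the matching lines of source to target."""
    with open(source, encoding="utf-8") as src, \
            open(target, "w", encoding="utf-8") as dst:
        dst.writelines(filter_people_lines(src))
```

Because the pattern requires a tab right after the ID, a property such as P2200 is not matched by the P22 or P20 alternatives.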

Then you can run the experiments with:

python3 cardinalities_mining.py ../eval/wikidata/input_triples.tsv ../eval/wikidata/cardinalities_people.tsv output_file.tsv

This should be done in the carl directory.

The results generated from the March 6th, 2017 dump are provided in the results/cardinality_learning_wikidata_people_20170306_with_schema.tsv file.

The results generated from the same dump but without owl:FunctionalProperty are in the results/cardinality_learning_wikidata_people_20170306_with_schema.tsv file.