Benchmark for Temporal Fact Validation
# FactBench: A Benchmark for Temporal Fact Validation

FactBench is a multilingual benchmark for the evaluation of fact validation algorithms. Every fact in FactBench is scoped with the timespan in which it was true, enabling the evaluation of temporal relation extraction algorithms as well. FactBench currently supports English, German and French. You can get the current release here.

FactBench is a set of RDF models. Each model contains a single fact and the timespan in which it holds true. FactBench consists of a train set, a test set and a set of auxiliary files that were needed to create the benchmark.
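The per-fact models themselves are RDF, but conceptually each one boils down to a triple plus a year interval. A minimal sketch in Python (the class and field names are illustrative assumptions; the actual vocabulary is defined by the benchmark's RDF files):

```python
from dataclasses import dataclass

# Hypothetical in-memory view of one FactBench model:
# a single (subject, predicate, object) triple plus the
# year interval [start, end] in which it held true.
@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    obj: str
    start_year: int  # "from"
    end_year: int    # "to"

# A timespan fact (e.g. from the leader relation); resource names are made up.
fact = Fact("dbr:Some_President", "dbo:leader", "dbr:Some_Country", 2008, 2012)
print(fact.start_year, fact.end_year)  # 2008 2012
```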

## Which relations does FactBench contain?

FactBench provides data for 10 relations. The data was automatically extracted from Wikipedia (via DBpedia) and Freebase.

| # | property | description | type | KB |
|---|----------|-------------|------|----|
| 1 | award | persons who received a Nobel Prize | timepoint | Freebase |
| 2 | birth | birth place and date of a person | timepoint | DBpedia |
| 3 | death | death place and date of a person | timepoint | DBpedia |
| 4 | foundationPlace | place and date of a company's foundation | timepoint | Freebase |
| 5 | leader | presidents of countries | timespan | DBpedia |
| 6 | nbateam | team associations of NBA players | timespan | DBpedia |
| 7 | publicationDate | author of a book and its publication date | timepoint | Freebase |
| 8 | spouse | marriage of two persons | timespan | Freebase |
| 9 | starring | actors who starred in films | timepoint | DBpedia |
| 10 | subsidiary | companies and their subsidiaries | timepoint | Freebase |

The granularity of FactBench's time information is one year. This means that a timespan is an interval between two years, e.g. 2008 - 2012. A timepoint is treated as a timespan with the same start and end year, e.g. 2008 - 2008.
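This convention can be captured in a few lines (a sketch, not benchmark code): a timepoint is just a degenerate timespan.

```python
def as_interval(year_from, year_to=None):
    """FactBench time information has year granularity; a timepoint is
    represented as a timespan whose start and end year coincide."""
    if year_to is None:        # timepoint -> degenerate timespan
        year_to = year_from
    if year_from > year_to:
        raise ValueError("timespan must not end before it starts")
    return (year_from, year_to)

print(as_interval(2008))        # timepoint: (2008, 2008)
print(as_interval(2008, 2012))  # timespan:  (2008, 2012)
```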

## How is FactBench structured?

FactBench is divided into a train and a test set (of facts). Typically, you should use the train set to fit your algorithm to the given problem and evaluate your configuration on the test set. It is highly recommended not to debug or improve your algorithm on the test data.

### Positive Examples

In general, we use facts contained in DBpedia and Freebase as positive examples. For each of the properties we consider, we generated positive examples by issuing a SPARQL or MQL query and selecting the top 150 results. Note that the results in Freebase (MQL) are ordered by an internal relevance score. The results of the DBpedia SPARQL queries were ordered by the number of inbound links of a given resource's Wikipedia page. We collected a total of 1500 correct statements (750 each in the test and train set). Each relation has 150 correct facts, distributed equally between the test and train set.
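As an illustration, a positive-example query for the leader relation might look roughly like the query string below. This is a hypothetical sketch, not the benchmark's actual query; the queries actually used ship in the `files/queries` folder, and the inbound-link ranking was computed separately.

```python
# Hypothetical SPARQL sketch for harvesting positive facts of the
# leader relation from DBpedia; the real queries are in files/queries.
LEADER_QUERY = """
SELECT DISTINCT ?country ?president
WHERE {
  ?country <http://dbpedia.org/ontology/leader> ?president .
}
LIMIT 150
"""

# Only the top 150 results per relation were kept.
print("LIMIT 150" in LEADER_QUERY)  # True
```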

### Negative Examples

The generation of negative examples is more involved than the generation of positive examples. In order to effectively train any fact validation algorithm, we considered it essential that many of the negative examples are similar to true statements. In particular, most statements should be meaningful triples. For this reason, we derive negative examples from positive examples by modifying them while still following domain and range restrictions. Assume the input triple and the corresponding timespan (s, p, o)(from, to) in a knowledge base K is given, and let S be the set of all subjects, O the set of all objects of the given relation p, and P the set of all properties. We used the following methods to generate the negative example sets, dubbed subject, object, subject-object, property, random, date and 20%mix (in that order):

* A triple (s',p,o) is generated, where s' is a random element of S and the triple (s',p,o) is not contained in K.
* A triple (s,p,o') is generated analogously by selecting a random element o' of O.
* A triple (s',p,o') is generated analogously by selecting a random s' from S and a random o' from O.
* A triple (s,p',o) is generated, in which p' is randomly selected from the list of all properties P such that (s,p',o) is not contained in K; p = p' is allowed during sampling.
* A triple (s',p',o') is generated, where s' and o' are randomly selected resources, p' is a randomly selected property from the list of all properties, and (s',p',o') is not contained in K.
* A triple (s,p,o)(from',to') with a perturbed timespan is generated.
    * Timepoint: from' is a random year drawn from a Gaussian distribution (µ = from, σ² = 5), with from' = to', from' ≠ from and 0 < from' ≤ 2013.
    * Timespan: from' is a random year drawn from a Gaussian distribution (µ = from, σ² = 2); the duration d' is drawn from a Gaussian distribution (µ = to − from, σ² = 5); to' = from' + d', with 0 < d' ≤ 2013, from ≠ from', to ≠ to', from' ≤ 2013 and to' ≤ 2013.
* 1/6 of each of the above negative sets is randomly selected to create a heterogeneous mixed set. Note that this set contains 780 negative examples.
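The subject and date strategies above can be sketched as follows (illustrative Python, not the code used to build the benchmark; `kb` and the resource names are made up):

```python
import random

def subject_negative(fact, subjects, kb, rng):
    """Replace s with a random s' from S such that (s', p, o) is not in K."""
    s, p, o = fact
    while True:
        s2 = rng.choice(subjects)
        if (s2, p, o) not in kb:
            return (s2, p, o)

def date_negative_timepoint(year_from, rng, sigma2=5, max_year=2013):
    """Timepoint case: draw from' from a Gaussian with mu = from and
    sigma^2 = 5, requiring from' != from and 0 < from' <= 2013; the
    perturbed timepoint keeps from' == to'."""
    while True:
        y = round(rng.gauss(year_from, sigma2 ** 0.5))
        if y != year_from and 0 < y <= max_year:
            return (y, y)

rng = random.Random(42)
kb = {("dbr:A", "dbo:leader", "dbr:X")}          # toy knowledge base
neg = subject_negative(("dbr:A", "dbo:leader", "dbr:X"),
                       ["dbr:A", "dbr:B", "dbr:C"], kb, rng)
print(neg)                          # a triple not contained in kb
print(date_negative_timepoint(2008, rng))
```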

### Folder Structure

    test                  contains 7 test sets for testing
    |___correct           contains 750 (10*75) true facts
    |   |___award
    |   |___... other relations ...
    |___wrong             7 different sets with 750 wrong facts each
        |___domain        the domain of the fact was changed
        |   |___award     75 Nobel Peace Prize winners
        |   |___... other relations
        |___range         ...
        |   |___award
        |   |___...
        |___domainrange   ...
        |   |___award
        |   |___...
        |___property      ...
        |   |___award
        |   |___...
        |___random        ...
        |   |___award
        |   |___...
        |___date          ...
        |   |___award
        |   |___...
        |___mix           13 facts from each test set of a certain relation, 6 x 13 x 10 = 780 wrong facts
            |___domain
            |   |___award
            |   |___...
            |___range
            |___domainrange
            |___property
            |___random
            |___date
    train                 contains the same folder structure as test
    files                 files necessary to create FactBench
    |___queries           DBpedia and Freebase queries
    |___rdf               surface forms and in/outbound links for Wikipedia concepts
    |___freebase          results of Freebase queries (JSON)