claimskg_generator

The data lifting module for ClaimsKG that creates the RDF/LOD dataset instantiation from the ClaimsKG model (illustrated hereafter).

Installation & Requirements

This program requires Python 3.x to run.

To install the dependencies you may use: pip3 install -r requirements.txt

Command-line usage

For usage information, you may use:

    python3 export.py -h

The options are the following:
- --input [file] Indicates the location of the zip file generated by the fake news extractor (mandatory)
- --output [file] Specifies the output file for the model (default: out.ttl)
- --format [format] Specifies the format of the output serialization. You may use any of the formats supported by the rdflib package ('xml', 'n3', 'turtle', 'nt', 'pretty-xml', 'trix', 'trig' and 'nquads'; default: turtle)
- --model-uri The base URI of the model (by default http://data.gesis.org/claimskg/public/)
- --resolve Specifies whether to resolve the Wikipedia page identifiers for TagMe annotations to DBPedia URIs. If this option is activated, the resolution is performed through SPARQL queries to the official DBPedia endpoint, which requires an active Internet connection. You will also need a running instance of redis-server, as the query results are cached to avoid repeating queries unnecessarily. If --resolve is not supplied, all entities will have URIs of the form tagme://wikipediaPageID.
- --threshold [float_value] If --resolve is present, specifies the cutoff confidence threshold for including a TagMe annotation as a mention. The TagMe documentation recommends a value between 0.1 and 0.3 (default: 0.3)
- --include-body If --include-body is supplied, the body of the claim review is included in the schema:ClaimReview instances through the schema:reviewBody property.
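
The options and defaults listed above can be summarized with a small argparse sketch. This is only an illustration of the documented interface, not the actual contents of export.py: the function name build_parser, the help strings, the file name claims.zip, and the assumption that --resolve and --include-body are plain on/off switches are all hypothetical.

```python
import argparse

# Serialization formats listed above as supported through rdflib.
SUPPORTED_FORMATS = ["xml", "n3", "turtle", "nt", "pretty-xml", "trix", "trig", "nquads"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options (not the real export.py)."""
    parser = argparse.ArgumentParser(prog="export.py")
    parser.add_argument("--input", required=True, help="zip file produced by the extractor")
    parser.add_argument("--output", default="out.ttl", help="output file for the model")
    parser.add_argument("--format", default="turtle", choices=SUPPORTED_FORMATS)
    parser.add_argument("--model-uri", default="http://data.gesis.org/claimskg/public/")
    # Assumed to be simple switches: present means enabled.
    parser.add_argument("--resolve", action="store_true")
    parser.add_argument("--threshold", type=float, default=0.3)
    parser.add_argument("--include-body", action="store_true")
    return parser


# Only the mandatory option and two switches are given; the rest use the defaults.
# "claims.zip" is a placeholder file name.
args = build_parser().parse_args(["--input", "claims.zip", "--resolve", "--include-body"])
print(args.output, args.format, args.threshold)
```

With these arguments, the output file, format and threshold fall back to the documented defaults (out.ttl, turtle and 0.3).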

Claim Matching Evaluation

In the context of the claim matching approach, we have produced an annotated dataset for evaluation purposes. The dataset contains 318 matching claims categorized into several types of matches:
- E: Exact match
- E*: Same claim but different claimee
- ST: Same topic, meaning that two claims are about the same occurrence or event

The first column contains the type of match, the second column the URI of the first claim, and the third column the URI of the second claim.
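
As a rough illustration of this three-column layout, the following sketch reads such a file and counts the pairs per match type. The delimiter is not stated here, so tab-separated values are assumed. The sample rows, the example.org URIs, and the function name read_matches are hypothetical and do not come from the gold file.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; the URIs are placeholders, not entries from the gold file.
SAMPLE = (
    "E\thttp://example.org/claim/1\thttp://example.org/claim/2\n"
    "E*\thttp://example.org/claim/3\thttp://example.org/claim/4\n"
    "ST\thttp://example.org/claim/5\thttp://example.org/claim/6\n"
    "E\thttp://example.org/claim/7\thttp://example.org/claim/8\n"
)


def read_matches(handle, delimiter="\t"):
    """Return (match_type, first_claim_uri, second_claim_uri) tuples.

    Rows with fewer than three columns (for example blank lines) are skipped.
    The tab delimiter is an assumption about the file format.
    """
    rows = []
    for row in csv.reader(handle, delimiter=delimiter):
        if len(row) < 3:
            continue
        rows.append((row[0].strip(), row[1].strip(), row[2].strip()))
    return rows


matches = read_matches(io.StringIO(SAMPLE))
# Count how many claim pairs fall under each match type (E, E*, ST).
counts = Counter(match_type for match_type, _, _ in matches)
print(dict(counts))
```

To use it on the downloaded file, pass an open file handle instead of the io.StringIO sample, adjusting the delimiter if the file turns out to use a different separator.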

The gold file can be downloaded here: https://drive.google.com/open?id=1evf67t_p0LqF5ZvNiL-geDCrEBlppZVF

The URIs correspond to that of the following dataset: http://andon.tchechmedjiev.eu/files/claimskg_20_12_2018.ttl.gz

Latest releases of core data from en.wikipedia.org (ttl file):
- https://databus.dbpedia.org/dbpedia/collections/latest-core
- https://downloads.dbpedia.org/repo/dbpedia/generic/categories/2021.03.01/categories_lang=en_skos.ttl.bz2
