
WikiFactCheck-English

This repository contains the data to accompany 'Automated Fact-Checking of Claims from Wikipedia'.

The contents are as follows:

.
│
├── wikifactcheck-en_full0.jsonl
├── wikifactcheck-en_full1.jsonl
├── wikifactcheck-en_full2.jsonl
├── wikifactcheck-en_full3.jsonl
├── wikifactcheck-en_full4.jsonl
│
├── wikifactcheck-en_test.jsonl
└── wikifactcheck-en_train.jsonl

As explained in the paper, the annotated portion of the corpus is split into train and test sets. The full sets contain the entire corpus, both annotated and non-annotated, and are split into five files because of file-size constraints.
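Going by the `.jsonl` extension, each file holds one JSON object per line, so the files can also be read without the loading script. The following is a minimal sketch using only the Python standard library. The helper names `read_jsonl` and `full_shard_names` are hypothetical and not part of this repository, and the sketch makes no assumption about the fields inside each record.

```python
import json
from itertools import islice
from pathlib import Path


def read_jsonl(path, limit=None):
    """Yield one parsed JSON value per non-empty line of `path`.

    `limit` caps the number of records yielded; None means read everything.
    (Hypothetical helper, not part of the repository.)
    """
    def records():
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if line:  # skip blank lines
                    yield json.loads(line)

    return islice(records(), limit)


def full_shard_names(directory="."):
    """Return the paths of the five full-corpus files listed in the tree above."""
    return [Path(directory) / f"wikifactcheck-en_full{i}.jsonl" for i in range(5)]
```

For example, `list(read_jsonl("wikifactcheck-en_train.jsonl", limit=3))` would return the first three records of the training set, and chaining `read_jsonl` over `full_shard_names()` would stream the whole corpus one file at a time.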

You can use the provided loading script, loadwfc-en.py, to work with the dataset in your code. Tip: add the repository directory to your PATH so that you can run the script from your project folder.

usage: loadwfc-en.py [-h] [-d] [-f]
                     [-r [{train,test,full} [{train,test,full} ...]]]
                     [-n NUMLINES] [-t {json,python}]

optional arguments:
  -h, --help            show this help message and exit
  -d, --download        download dataset
  -f, --force           force re-download?
  -r [{train,test,full} [{train,test,full} ...]], --read [{train,test,full} [{train,test,full} ...]]
                        read from particular datasets (default: all)
  -n NUMLINES, --numlines NUMLINES
                        numlines to read from each one
  -t {json,python}, --fmt {json,python}
                        output format for --read option
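The options above can also be assembled programmatically when the script is called from another Python program. The following is a sketch only: `build_loader_command` is a hypothetical helper, not part of the repository, and it relies solely on the flags and choices shown in the usage message. It assumes the script is reachable at the given path.

```python
import sys


def build_loader_command(script="loadwfc-en.py", read=None, numlines=None,
                         fmt=None, download=False, force=False):
    """Assemble an argv list for the loading script from the documented flags.

    Hypothetical helper; the flag names and choices come from the usage message.
    """
    valid_sets = {"train", "test", "full"}
    valid_formats = {"json", "python"}

    command = [sys.executable, script]
    if download:
        command.append("--download")
    if force:
        command.append("--force")
    if read is not None:
        unknown = set(read) - valid_sets
        if unknown:
            raise ValueError(f"unknown dataset(s): {sorted(unknown)}")
        command.append("--read")
        command.extend(read)
    if numlines is not None:
        command.extend(["--numlines", str(numlines)])
    if fmt is not None:
        if fmt not in valid_formats:
            raise ValueError(f"unknown format: {fmt!r}")
        command.extend(["--fmt", fmt])
    return command
```

The returned list can be passed to `subprocess.run(command, check=True)`. For instance, `build_loader_command(read=["train"], numlines=5, fmt="json")` corresponds to `loadwfc-en.py --read train --numlines 5 --fmt json`. What the script prints is not documented here, so inspect the output before parsing it.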

Citation:

@InProceedings{wikifactchkeng:2020:LREC,
  author    = {Sathe, Aalok and Ather, Salar and Le, Tuan Manh and Perry, Nathan and Park, Joonsuk},
  title     = {Automated Fact-Checking of Claims from Wikipedia},
  booktitle = {Proceedings of The 12th Language Resources and Evaluation Conference},
  month     = {May},
  year      = {2020},
  address   = {Marseille, France},
  publisher = {European Language Resources Association},
  pages     = {6874--6882},
  url       = {https://www.aclweb.org/anthology/2020.lrec-1.849}
}

About

Data and download script to accompany LREC2020 paper "Automated Fact-Checking of Claims from Wikipedia"
