Code for the paper "Document-level Claim Extraction and Decontextualisation for Fact-Checking" (ACL 2024, main conference).
Download the raw data from AVeriTeC.
Everything we rely on is either a pre-trained model (e.g., BertSum) or an approach that requires no training (e.g., BM25).
- Step 1. extracts the URLs available for claim extraction (each URL links to the original web article of the claim) and the corresponding text data from AVeriTeC.
python 1_extract_texts_from_url.py
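As a rough illustration of what Step 1 involves, the sketch below collects claim URLs from AVeriTeC-style records. It is not the contents of `1_extract_texts_from_url.py`: the `original_claim_url` field name and the helper `extract_claim_urls` are assumptions made for this example, so check them against the data you downloaded.

```python
import json
from typing import Dict, List


def extract_claim_urls(records: List[Dict]) -> List[str]:
    """Return the unique, non-empty claim URLs in first-seen order.

    Assumes each record may carry an `original_claim_url` field
    (an assumption about the AVeriTeC JSON layout).
    """
    seen = set()
    urls = []
    for record in records:
        # Missing or null URLs are treated as empty strings.
        url = (record.get("original_claim_url") or "").strip()
        if url.startswith(("http://", "https://")) and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Tiny made-up sample standing in for a loaded AVeriTeC file.
sample = json.loads("""
[
  {"claim": "A", "original_claim_url": "https://example.com/a"},
  {"claim": "B", "original_claim_url": null},
  {"claim": "C", "original_claim_url": "https://example.com/a"},
  {"claim": "D"}
]
""")
print(extract_claim_urls(sample))
```

Records without a usable URL are skipped rather than raising an error, since not every claim is guaranteed to link to a source article.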
- Step 2. generates high-quality context for decontextualisation.
- Sentence Ranking: download from here.
- Text Entailment: download from here.
- Candidate Answer Extraction: download the spaCy model en_core_web_lg (python -m spacy download en_core_web_lg), which is used to extract ambiguous information units.
- Question Generation: download from here.
- Question Answering: download from here.
- QA-to-Context: follow this repository to download the QA2D model from here.
- High-quality Context Generation: download from here.
python 2_context_generation.py
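BM25 is named above as one of the training-free approaches the pipeline relies on. As a self-contained sketch of how BM25 could rank generated context sentences against a candidate sentence, here is a plain-Python Okapi BM25 scorer. It is not the code in `2_context_generation.py`: the function name `bm25_scores`, the whitespace tokenisation, the toy sentences, and the parameter values `k1=1.5` and `b=0.75` are all illustrative assumptions.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: List[str], docs: List[List[str]],
                k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score each tokenised document against the query with Okapi BM25."""
    n_docs = len(docs)
    if n_docs == 0:
        return []
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in query:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            df = doc_freq[term]
            # The +1 inside the log keeps the IDF non-negative.
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
            # Length normalisation penalises longer-than-average documents.
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy example: rank made-up context sentences for a candidate sentence.
sentence = "the minister announced a new tax".split()
contexts = [
    "the minister is the finance minister".split(),
    "the weather was sunny".split(),
    "the new tax applies to fuel".split(),
]
scores = bm25_scores(sentence, contexts)
ranked = sorted(range(len(contexts)), key=lambda i: scores[i], reverse=True)
```

Here `ranked` holds the context indices from most to least relevant; a real run would use a proper tokeniser rather than `str.split`.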
- Step 3. decontextualises candidate central sentences with the generated QA pairs (download the decontextualisation model from here).
python 3_decontextualisation.py
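Since each step appears to consume the output of the previous one, it may be convenient to run the three scripts in sequence and stop at the first failure. The driver below is a minimal sketch under that assumption; `STEPS`, `build_commands` and `run_pipeline` are hypothetical names, and it assumes the scripts take no command-line arguments, as in the commands above.

```python
import subprocess
import sys
from typing import Callable, List

# Script names are the ones listed in the steps above.
STEPS = [
    "1_extract_texts_from_url.py",
    "2_context_generation.py",
    "3_decontextualisation.py",
]


def build_commands(python: str = sys.executable) -> List[List[str]]:
    """Build one command per step, using the current interpreter by default."""
    return [[python, script] for script in STEPS]


def run_pipeline(runner: Callable = subprocess.run) -> None:
    """Run the steps in order; `runner` is injectable to allow dry runs."""
    for cmd in build_commands():
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps do not run on incomplete outputs.
        runner(cmd, check=True)
```

Calling `run_pipeline()` from the repository root, after the models listed above have been downloaded, would then execute all three steps.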
If you find this code useful, please star our repo or consider citing:
@misc{deng2024documentlevel,
title={Document-level Claim Extraction and Decontextualisation for Fact-Checking},
author={Zhenyun Deng and Michael Schlichtkrull and Andreas Vlachos},
year={2024},
eprint={2406.03239},
archivePrefix={arXiv},
primaryClass={cs.CL}
}