Document-level Claim Extraction and Decontextualisation for Fact-Checking

Code for the paper "Document-level Claim Extraction and Decontextualisation for Fact-Checking" (ACL 2024, main conference).

Data

Download the raw data from AVeriTeC.

Components

Every component relies on either a pre-trained model (e.g., BertSum) or an approach that does not require training (e.g., BM25), so nothing needs to be trained.
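
To illustrate what "does not require training" means for BM25, here is a minimal, self-contained sketch of Okapi BM25 scoring. It is not the code used in this repository: the function name `bm25_scores`, the whitespace-style tokenisation of the example, and the defaults `k1=1.5` and `b=0.75` are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score each tokenised document against a tokenised query (Okapi BM25).

    `query` is a list of tokens and `documents` a list of token lists.
    k1 and b are common defaults, not values taken from this repository.
    """
    n_docs = len(documents)
    if n_docs == 0:
        return []
    avg_len = sum(len(doc) for doc in documents) / n_docs

    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))

    scores = []
    for doc in documents:
        term_freq = Counter(doc)
        score = 0.0
        for term in set(query):
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            # Smoothed IDF; the "+ 1.0" keeps the value non-negative.
            idf = math.log(
                (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0
            )
            # Term-frequency saturation with document-length normalisation.
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy usage: rank three short "sentences" against a two-word query.
docs = [
    ["the", "minister", "announced", "a", "tax", "cut"],
    ["weather", "was", "sunny"],
    ["tax", "policy", "debate"],
]
ranking = sorted(
    enumerate(bm25_scores(["tax", "cut"], docs)),
    key=lambda pair: pair[1],
    reverse=True,
)
```

The scores depend only on term statistics computed from the given documents, which is why no model weights have to be downloaded for this kind of component.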

Step 1. Extract the URLs available for claim extraction (each URL links to the original web article of the claim) and the corresponding text data from AVeriTeC.
```
python 1_data_extraction.py
```
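
As a rough idea of what this step involves, the sketch below collects usable claim URLs from AVeriTeC-style records. It is not the repository's script: it assumes the data is a JSON list of claim objects and that the URL of the original article is stored under a field named `original_claim_url`. Both the field name and the helper names are assumptions and should be checked against the downloaded files. Fetching and cleaning the page text is left out.

```python
import json


def load_records(path):
    """Load a JSON file that is assumed to contain a list of claim records."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def collect_claim_urls(records, url_field="original_claim_url"):
    """Return (record_index, url) pairs for records with an http(s) URL.

    `url_field` is an assumed field name; adjust it to the actual data.
    """
    pairs = []
    for index, record in enumerate(records):
        url = record.get(url_field)
        if not isinstance(url, str):
            continue  # missing or null URL: nothing to extract from
        url = url.strip()
        if url.lower().startswith(("http://", "https://")):
            pairs.append((index, url))
    return pairs
```

Keeping the record index next to each URL makes it possible to join the extracted article text back to its claim later on.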
Step 2. Generate high-quality context for decontextualisation.
- Sentence Ranking: download from here.
- Text Entailment: download from here.
- Candidate Answer Extraction: download the spaCy model (python -m spacy download en_core_web_lg), which is used to extract ambiguous information units.
- Question Generation: download from here.
- Question Answering: download from here.
- QA-to-Context: follow this repository to download the QA2D model from here.
- High-quality Context Generation: download from here.
```
python 2_context_generation.py 
```
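
The components listed for this step run one after another. The sketch below shows one generic way to chain such stages, with each stage a function that takes and returns a dictionary of intermediate results. It does not describe how `2_context_generation.py` is written: `run_stages`, the placeholder stages, and the dictionary keys are all hypothetical, and the real components are model calls rather than the toy functions used here.

```python
from typing import Any, Callable, Dict, List, Tuple

Stage = Callable[[Dict[str, Any]], Dict[str, Any]]


def run_stages(stages: List[Tuple[str, Stage]], state: Dict[str, Any]) -> Dict[str, Any]:
    """Apply each named stage in order, passing the accumulated state along."""
    for name, stage in stages:
        # Shallow copy, so a stage cannot add or remove keys on the caller's dict.
        result = stage(dict(state))
        if not isinstance(result, dict):
            raise TypeError(f"stage {name!r} must return a dict")
        state = result
    return state


# Placeholder stages (hypothetical): real ones would call the downloaded models.
def rank_sentences(state):
    # Stand-in for sentence ranking: keep the two longest sentences.
    ranked = sorted(state["sentences"], key=len, reverse=True)
    state["candidates"] = ranked[:2]
    return state


def qa_to_context(state):
    # Stand-in for QA-to-Context: join each question with its answer.
    state["context"] = [f"{q} {a}" for q, a in state.get("qa_pairs", [])]
    return state


result = run_stages(
    [("sentence_ranking", rank_sentences), ("qa_to_context", qa_to_context)],
    {
        "sentences": ["He said so.", "The minister announced a tax cut on Monday."],
        "qa_pairs": [("Who is he?", "The minister.")],
    },
)
```

Naming each stage makes it easier to report which component failed when one of the downloaded models is missing.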
Step 3. Decontextualise candidate central sentences with the generated QA pairs. (Download the decontextualisation model from here.)
```
python 3_decontextualisation.py 
```

Citation

If you find this code useful, please star our repo or consider citing:

@misc{deng2024documentlevel,
      title={Document-level Claim Extraction and Decontextualisation for Fact-Checking}, 
      author={Zhenyun Deng and Michael Schlichtkrull and Andreas Vlachos},
      year={2024},
      eprint={2406.03239},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
