This repository contains the materials for reproducing the work introduced in the paper "EUvsDisinfo: A Dataset for Multilingual Detection of Pro-Kremlin Disinformation in News Articles". The EUvsDisinfo dataset contains around 18K articles, each labelled as either containing misinformation or not. The misinformation articles are sourced from pro-Kremlin outlets, while the non-misinformation articles are sourced from credible or less biased outlets. The dataset is built from the URLs cited in the debunks published by the EUvsDisinfo organisation on its website.
Use this repository to collect the EUvsDisinfo dataset described in our paper (TBA).
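After the dataset has been collected with the steps in this README, a quick sanity check is to count the articles per label. The following is a minimal sketch using only the Python standard library. The helper `label_distribution` and the column name `label` are hypothetical assumptions and not part of this repository, so check the header that `collect.py` actually writes to `data/euvsdisinfo.csv` before relying on it.

```python
import csv
from collections import Counter
from pathlib import Path


def label_distribution(csv_path: Path, label_column: str = "label") -> Counter:
    """Count how many articles carry each label value in the collected CSV.

    `label_column` is a hypothetical column name; adjust it to the real header.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # Fail early with a clear message if the assumed column is absent.
        if reader.fieldnames is None or label_column not in reader.fieldnames:
            raise KeyError(f"column {label_column!r} not found in {csv_path}")
        # Counter consumes the whole reader before the file is closed.
        return Counter(row[label_column] for row in reader)
```

For example, `label_distribution(Path("data/euvsdisinfo.csv"))` should show two label values whose counts add up to roughly the 18K articles mentioned above.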
conda create -n euvsdisinfo python=3.11.5
conda activate euvsdisinfo
pip install -r requirements.txt
- Download the base data file from Zenodo.
- Create a folder named `data` in the root directory.
- Place the base data file inside the `data` folder.
- Run `python3 scripts/collect/collect.py`.
- When finished, the script should save a file named `euvsdisinfo.csv` inside the `data` folder.
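The collection steps above can also be scripted. The following is a minimal sketch, not part of this repository: given the repository root as `root`, it creates the `data` folder, runs `scripts/collect/collect.py` with the current Python interpreter, and checks that `data/euvsdisinfo.csv` was written. The helper names `prepare_data_dir` and `run_collection` are hypothetical. The base data file from Zenodo still has to be placed in `data` by hand, since its file name is not given here.

```python
import subprocess
import sys
from pathlib import Path


def prepare_data_dir(root: Path) -> Path:
    """Create the `data` folder in the repository root if it does not exist yet."""
    data_dir = root / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    return data_dir


def run_collection(root: Path) -> Path:
    """Run the collection script from `root` and return the path of its output.

    Raises subprocess.CalledProcessError if the script exits with an error and
    FileNotFoundError if it finishes without writing data/euvsdisinfo.csv.
    """
    # Same command as in the README, but using the interpreter running this code.
    subprocess.run([sys.executable, "scripts/collect/collect.py"], cwd=root, check=True)
    output = root / "data" / "euvsdisinfo.csv"
    if not output.is_file():
        raise FileNotFoundError(f"expected output not found: {output}")
    return output
```

Running the script through `sys.executable` keeps it inside the activated `euvsdisinfo` conda environment, and `check=True` makes a failed collection surface as an exception instead of going unnoticed.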
- Data analysis: open and run the `eda.ipynb` Jupyter notebook.
- Classification:
  - Run the Python script for the desired scenario inside `baselines/`.
  - When finished, the script will save the results in a file named `results_{scenario}.csv` in the root folder.
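Each baseline run leaves one `results_{scenario}.csv` file in the root folder, so the results of several scenarios can be gathered by file name. The following is a minimal sketch that relies only on that naming pattern. The helpers `find_results` and `load_results` are hypothetical, and because the column layout of the results files is not documented here, rows are kept as plain string dictionaries rather than parsed into numbers.

```python
import csv
from pathlib import Path


def find_results(root: Path) -> dict[str, Path]:
    """Map each scenario name to its results_{scenario}.csv file in `root`."""
    prefix = "results_"
    # Strip the "results_" prefix from the file stem to recover the scenario name.
    return {
        path.stem[len(prefix):]: path
        for path in sorted(root.glob(f"{prefix}*.csv"))
    }


def load_results(root: Path) -> dict[str, list[dict[str, str]]]:
    """Read every results file into a list of rows (one dict per CSV row)."""
    loaded = {}
    for scenario, path in find_results(root).items():
        with open(path, newline="", encoding="utf-8") as f:
            loaded[scenario] = list(csv.DictReader(f))
    return loaded
```

Calling `load_results(Path("."))` from the repository root returns one entry per scenario that has been run so far.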
Please refer to this file.
The EUvsDisinfo dataset is licensed under a Creative Commons BY-SA 4.0 license. The code for reproducing the experiments is licensed under the Apache-2.0 license, which can be found in the file LICENSE.txt.
Dataset: https://zenodo.org/records/10514307
Software: https://zenodo.org/records/10492913
Paper: TBA