The codebase accompanying the paper "Estimating the Level of Dialectness Predicts Interannotator Agreement in Multi-dialectal Arabic Datasets", accepted to ACL 2024.
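Set up the environment, install the dependencies, and download the default CAMeL Tools data packages:

```bash
conda create -n "ALDI_IAA" python=3.10
conda activate ALDI_IAA
pip install -r requirements.txt
camel_data -i defaults
```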
The datasets used in the paper, and where each one can be obtained:

| # | Dataset | Link |
|---|---|---|
| 1 | MPOLD | GitHub |
| 2 | YouTube Cyberbullying | OneDrive |
| 3 | DCD | Personal Site |
| 4 | ArSAS | Personal Site |
| 5 | ArSarcasm-v1 | Provided by the authors |
| 6 | iSarcasm | GitHub |
| 7 | DART | Dropbox |
| 8 | Mawqif | Provided by the authors |
| 9 | ASAD | Provided by the authors |
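The dataset files have to be downloaded by hand into `data/raw_data/` before the preparation script runs. A small pre-flight check like the hypothetical one below can catch a missing or empty directory early. It only confirms that the directory exists and contains at least one non-hidden entry; the expected filename for each dataset is not listed here, so it cannot tell whether all nine datasets are present. The function name and messages are illustrative and are not part of the repository.

```python
# Hypothetical pre-flight check, not part of the repository: it only verifies
# that data/raw_data/ exists and is non-empty, not which datasets it holds.
from pathlib import Path

RAW_DATA_DIR = Path("data") / "raw_data"


def missing_raw_data(raw_dir=RAW_DATA_DIR):
    """Return a list of problems found; an empty list means the check passed."""
    if not raw_dir.is_dir():
        return [f"{raw_dir} does not exist; create it and download the datasets into it"]
    # Ignore hidden files such as .gitkeep or .DS_Store.
    entries = [p for p in raw_dir.iterdir() if not p.name.startswith(".")]
    if not entries:
        return [f"{raw_dir} is empty; no dataset files have been downloaded yet"]
    return []


if __name__ == "__main__":
    problems = missing_raw_data()
    for problem in problems:
        print("WARNING:", problem)
    if not problems:
        print(f"Found files in {RAW_DATA_DIR}")
```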
To reproduce the results:

```bash
conda activate ALDI_IAA
# 1) MANUALLY download the dataset files (see the table above) to `data/raw_data/`
# 2) Augment the dataset files with ALDi scores and dialect labels
python prepare_datasets.py
# 3) Generate the agreement plots
python compute_agreement_percentages.py
```
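The internals of `compute_agreement_percentages.py` are not shown here. As a rough, hypothetical sketch of the kind of computation the step names, the snippet below groups samples into equal-width ALDi bins and reports, per bin, the percentage of samples on which all annotators assigned the same label. The `(aldi_score, labels)` input format, the choice of five bins, the full-agreement criterion, and the toy labels are all assumptions, not the script's actual logic.

```python
# Hypothetical sketch: the input format, bin count, and agreement criterion
# are assumptions, not the logic of compute_agreement_percentages.py.


def full_agreement(labels):
    """Return True when every annotator assigned the same label."""
    return len(set(labels)) == 1


def agreement_by_aldi_bin(samples, n_bins=5):
    """Group samples into equal-width ALDi bins over [0, 1].

    samples: iterable of (aldi_score, labels) pairs, where labels holds
    one label per annotator. A score of exactly 1.0 goes into the last bin.
    Returns one (low, high, pct_full_agreement, n_samples) tuple per bin;
    pct is NaN for empty bins.
    """
    totals = [0] * n_bins
    agreed = [0] * n_bins
    for aldi, labels in samples:
        idx = min(int(aldi * n_bins), n_bins - 1)
        totals[idx] += 1
        agreed[idx] += full_agreement(labels)  # a bool adds 0 or 1
    rows = []
    for i in range(n_bins):
        pct = 100.0 * agreed[i] / totals[i] if totals[i] else float("nan")
        rows.append((i / n_bins, (i + 1) / n_bins, pct, totals[i]))
    return rows


# Toy data with made-up labels, for illustration only.
samples = [
    (0.05, ["OFF", "OFF", "OFF"]),
    (0.10, ["NOT", "NOT", "NOT"]),
    (0.85, ["OFF", "NOT", "OFF"]),
    (0.95, ["NOT", "NOT", "NOT"]),
]
for low, high, pct, n in agreement_by_aldi_bin(samples):
    if n:  # skip empty bins
        print(f"ALDi {low:.1f}-{high:.1f}: {pct:.1f}% full agreement over {n} samples")
```

Full agreement is only one possible criterion; a majority-vote or pairwise agreement rate could be swapped into `full_agreement` without changing the binning.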