ALDi-and-IAA

The codebase accompanying the paper "Estimating the Level of Dialectness Predicts Interannotator Agreement in Multi-dialectal Arabic Datasets", accepted to ACL 2024.

Environment and Dependencies

conda create -n "ALDI_IAA" python=3.10
conda activate ALDI_IAA
pip install -r requirements.txt

# Download the default CAMeL Tools data
camel_data -i defaults
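
After running the setup commands, it can help to confirm that the packages listed in `requirements.txt` are visible to the active interpreter. The following is a minimal sketch and not part of the repository: the helper names `requirement_names` and `missing_packages` are hypothetical, and the parsing only handles simple `name==version` style lines (URL or VCS requirements are not supported).

```python
import re
from importlib import metadata


def requirement_names(text: str) -> list[str]:
    """Extract bare distribution names from requirements-style text."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):
            continue  # skip blank lines and pip options such as "-r"
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names: list[str]) -> list[str]:
    """Return the names that are not installed in the current environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

From the repository root, with the `ALDI_IAA` environment active, `missing_packages(requirement_names(open("requirements.txt").read()))` should return an empty list if the installation succeeded.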

Datasets

#	Dataset	Link
1	MPOLD	GitHub
2	YouTube Cyberbullying	OneDrive
3	DCD	Personal Site
4	ArSAS	Personal Site
5	ArSarcasm-v1	Provided by the authors
6	iSarcasm	GitHub
7	DART	Dropbox
8	Mawqif	Provided by the authors
9	ASAD	Provided by the authors

Generating the ALDi-IAA Plots

conda activate ALDI_IAA

# 1) Manually download the dataset files to `data/raw_data/`

# 2) Augment the dataset files with ALDi scores and dialect labels
python prepare_datasets.py

# 3) Generate the Agreement plots
python compute_agreement_percentages.py
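
Because step 1 is manual, the later steps can fail if `data/raw_data/` is still empty. The steps above can be sketched as a small Python wrapper that checks for the raw data before running the two scripts in order. This is an illustrative sketch under stated assumptions, not a script shipped with the repository: `raw_data_ready` and `run_pipeline` are hypothetical names, and the check only verifies that the directory is non-empty, not that every dataset is present.

```python
import subprocess
import sys
from pathlib import Path

RAW_DATA_DIR = Path("data/raw_data")  # location named in step 1
STEPS = ["prepare_datasets.py", "compute_agreement_percentages.py"]  # steps 2 and 3


def raw_data_ready(raw_dir: Path) -> bool:
    """Return True if the directory exists and holds at least one entry."""
    return raw_dir.is_dir() and any(raw_dir.iterdir())


def run_pipeline(raw_dir: Path = RAW_DATA_DIR, steps=STEPS) -> None:
    """Run the steps in order, refusing to start without raw data."""
    if not raw_data_ready(raw_dir):
        raise FileNotFoundError(
            f"No dataset files found in {raw_dir}/; download them manually first."
        )
    for script in steps:
        # check=True stops the pipeline as soon as one step fails
        subprocess.run([sys.executable, script], check=True)
```

Calling `run_pipeline()` from the repository root, inside the activated environment, runs both scripts with the same interpreter that launched the wrapper.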
