NoraAlt/Mawqif-Arabic-Stance

Mawqif: A Multi-label Arabic Dataset for Target-specific Stance Detection

  • This repository contains the data and classifier used in the paper "Mawqif: A Multi-label Arabic Dataset for Target-specific Stance Detection", accepted at WANLP, EMNLP 2022. Link to the paper: https://aclanthology.org/2022.wanlp-1.16

  • Mawqif is the first Arabic dataset that can be used for target-specific stance detection.

  • This is a multi-label dataset: each data point is annotated for stance, sentiment, and sarcasm, so it can serve as a benchmark for all three tasks.

  • We benchmark the Mawqif dataset on the stance detection task and evaluate the performance of four BERT-based models. Our best model achieves a macro-F1 of 78.89%, which shows that there is ample room for improvement on this challenging task.

  • In addition to the annotated tweets, we also release the annotation guidelines and the code used to build a standard pipeline under the PyTorch Lightning framework to fine-tune BERT-based models for stance detection.

  • Dataset on HuggingFace 🤗: https://huggingface.co/datasets/NoraAlt/Mawqif_Stance-Detection
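The macro-F1 score reported above is the unweighted mean of per-class F1 scores. The following is a minimal pure-Python sketch of that metric, not the evaluation code from this repository. The label strings ("Favor", "Against", "None") and the toy predictions are assumptions made for illustration, and the paper may average over a different set of classes.

```python
from collections import Counter


def macro_f1(y_true, y_pred, labels=None):
    """Return the unweighted mean of per-class F1 scores."""
    if labels is None:
        # Assumption: average over every class seen in either list.
        labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels invented for illustration; they are not taken from Mawqif.
y_true = ["Favor", "Against", "None", "Favor", "Against", "Favor"]
y_pred = ["Favor", "Against", "Favor", "Favor", "None", "Favor"]
score = macro_f1(y_true, y_pred)
print(f"macro-F1 = {score:.4f}")
```

Because every class contributes equally to the mean, a rare class that the model handles poorly lowers the score as much as a frequent one would.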

Mawqif Statistics

  • This dataset consists of 4,121 tweets in multi-dialectal Arabic. Each tweet is annotated with a stance toward one of three targets: “COVID-19 vaccine,” “digital transformation,” and “women empowerment.” In addition, it is annotated with sentiment and sarcasm polarities.

  • The following figure illustrates the labels’ distribution across all targets, and the distribution per target.

[Figure: dataStat-2, label distribution across all targets and per target]

Interactive Visualization

To browse an interactive visualization of the Mawqif dataset, please click here

  • You can click on visualization components to filter the data by target and by class. For example, you can click on “women empowerment” and “against” to see the tweets that express a stance against women empowerment.
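The same target-and-class filter can be applied programmatically. The sketch below assumes each record carries a target, a stance, a sentiment, and a sarcasm flag, as described under Mawqif Statistics. The field names, label strings, and placeholder records are hypothetical and may not match the column names in the released files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    # Hypothetical field names; check the released files for the real columns.
    text: str
    target: str
    stance: str
    sentiment: str
    sarcasm: bool


def filter_tweets(tweets, target=None, stance=None):
    """Keep tweets matching the given target and/or stance (None = no filter)."""
    return [
        t for t in tweets
        if (target is None or t.target == target)
        and (stance is None or t.stance == stance)
    ]


# Placeholder records, invented for illustration only.
tweets = [
    Tweet("<tweet 1>", "women empowerment", "against", "negative", False),
    Tweet("<tweet 2>", "women empowerment", "favor", "positive", False),
    Tweet("<tweet 3>", "COVID-19 vaccine", "against", "negative", True),
    Tweet("<tweet 4>", "digital transformation", "favor", "positive", False),
]

selected = filter_tweets(tweets, target="women empowerment", stance="against")
print([t.text for t in selected])
```

Leaving either argument as `None` filters on the other one alone, which mirrors clicking a single component in the visualization.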

Citation

If you find our paper and resources useful, please consider citing our work!

@inproceedings{alturayeif-etal-2022-mawqif,
    title = "Mawqif: A Multi-label {A}rabic Dataset for Target-specific Stance Detection",
    author = "Alturayeif, Nora Saleh  and
      Luqman, Hamzah Abdullah  and
      Ahmed, Moataz Aly Kamaleldin",
    booktitle = "Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP)",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates (Hybrid)",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.wanlp-1.16",
    pages = "174--184",
}
