The COVID-19 misinformation dataset is part of a project that identifies and discovers real-time misinformation on Arabic Twitter using machine learning approaches.
The COVID-19 misinformation repository contains the tweet IDs of already annotated tweets posted between March 2020 and April 2020. In total, the dataset consists of 8,786 Arabic tweets. The annotated dataset covers significant, misleading, and inaccurate content that was widely shared among Arab Twitter users during the early months of COVID-19 (March and April 2020). Of the tweets containing misinformation, 602 were posted in March and 709 in April. The table below shows general statistics about the dataset.
Misinformation | Not Misinformation | Total |
---|---|---|
1,311 | 7,475 | 8,786 |
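As a quick sanity check on the figures above, the following minimal sketch recomputes the totals and the class balance from the reported counts. The variable names are illustrative only and are not part of the repository.

```python
# Counts as reported in the table above.
misinformation = 1311
not_misinformation = 7475
total = misinformation + not_misinformation

# Monthly misinformation counts as reported in the text.
by_month = {"March": 602, "April": 709}

# The reported numbers are internally consistent.
assert total == 8786
assert sum(by_month.values()) == misinformation

# Share of tweets labeled as misinformation (the positive class).
share = misinformation / total
print(f"Misinformation share: {share:.1%}")  # prints "Misinformation share: 14.9%"
```

The positive class is a clear minority, so anyone training a classifier on this data may want to account for class imbalance.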
Two volunteers, both native Arabic speakers, labeled the tweets. Before labeling, the annotators reviewed a list of misinformation collected from reports on the World Health Organization (WHO) website and the website of the Ministry of Health in Saudi Arabia. Tweets that contain misinformation were labeled "1" and all others were labeled "0".
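The sketch below shows one way to read such labels, assuming a two-column CSV layout with `tweet_id` and `label` headers. That layout, the `load_labels` helper, and the sample IDs are hypothetical, so check the actual files in the repository before relying on them. Tweet IDs are kept as strings because they exceed 2^53 and can be silently corrupted by tools that convert them to floating point.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in an assumed two-column layout; the IDs are made up.
sample = io.StringIO(
    "tweet_id,label\n"
    "1240000000000000001,1\n"
    "1240000000000000002,0\n"
    "1240000000000000003,0\n"
)


def load_labels(handle):
    """Map tweet ID (string) to integer label (1 = misinformation, 0 = other)."""
    labels = {}
    for row in csv.DictReader(handle):
        label = int(row["label"])
        if label not in (0, 1):
            raise ValueError(f"unexpected label {label!r} for tweet {row['tweet_id']}")
        labels[row["tweet_id"]] = label
    return labels


labels = load_labels(sample)
counts = Counter(labels.values())
print(counts[1], "misinformation,", counts[0], "other")
```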
To hydrate the tweet IDs from our COVID19-Misinformation-dataset, you can use our Hydrate_TweetIDs_Arabic_COVID19 notebook.
- The notebook runs on Google Colab
- You are required to have a Twitter developer account
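For readers who want to see roughly what hydration involves outside the notebook, here is a minimal sketch using the twarc (v1) client. It is not the notebook's code: the helper names are hypothetical, the credentials are placeholders you must supply from your own developer account, and it assumes your account still has access to the tweet lookup endpoint. twarc batches requests internally; the `batches` helper is included only to illustrate that the lookup endpoint accepts at most 100 IDs per request.

```python
from typing import Iterable, Iterator, List


def read_tweet_ids(lines: Iterable[str]) -> List[str]:
    """Return non-empty tweet IDs, stripped of whitespace and kept as strings."""
    return [line.strip() for line in lines if line.strip()]


def batches(ids: List[str], size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def hydrate(ids: List[str], consumer_key: str, consumer_secret: str,
            access_token: str, access_token_secret: str) -> list:
    """Hydrate IDs with twarc v1; needs `pip install twarc` and valid credentials."""
    # Imported lazily so the helpers above work without twarc installed.
    from twarc import Twarc

    client = Twarc(consumer_key, consumer_secret, access_token, access_token_secret)
    # Twarc.hydrate takes an iterable of IDs and yields one dict per tweet
    # that is still available; deleted or protected tweets are skipped.
    return list(client.hydrate(ids))


# Placeholder IDs standing in for the 8,786 IDs in the dataset.
placeholder_ids = [str(n) for n in range(8786)]
request_count = sum(1 for _ in batches(placeholder_ids))
print(request_count)  # prints 88
```

Because some tweets will have been deleted or made private since 2020, the number of hydrated tweets is likely to be lower than the number of IDs.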
For those who prefer a graphical user interface (GUI), we suggest using Hydrator.
Using Hydrator
To use Hydrator, follow the instructions in the Hydrator GitHub repository.
For an Arabic-language guide to both Hydrator and our Twarc notebook, see our دليل استعادة قاعدة بيانات التغريدات (guide to rehydrating the tweet dataset).
This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to the terms of the LICENSE and to Twitter's Terms of Service, and you agree to cite our paper: https://arxiv.org/abs/2101.05626
If you have any suggestions or questions, please reach out to saraa.alqurashi on Gmail, s43680523(AT)st(dot)uqu(dot)edu(dot)sa, or eaanazi(AT)uqu(dot)edu(dot)sa.