The COVID-19 misinformation repository contains a dataset of annotated tweet IDs.

SarahAlqurashi/COVID19-Misinformation-dataset-

COVID19-Misinformation-Dataset

The COVID-19 misinformation dataset is part of a project that identifies misinformation on Arabic Twitter in real time using machine learning approaches.

Dataset Overview

This repository contains the IDs of annotated tweets posted between March 2020 and April 2020. In total, the COVID-19 misinformation dataset consists of 8,786 Arabic tweets. It covers significant misleading and inaccurate content that was widely shared among Arab Twitter users during the early months of COVID-19 (March and April). Of the tweets containing misinformation, 602 were posted in March and 709 in April. The table below shows general statistics about the dataset.

| Misinformation | Not Misinformation | Total |
|----------------|--------------------|-------|
| 1,311          | 7,475              | 8,786 |
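The figures quoted in this section can be cross-checked with a few lines of Python. This is only a sanity check on the numbers stated in this README; the variable names are illustrative and not part of the dataset.

```python
# Counts as stated in this README.
misinfo_march = 602
misinfo_april = 709
not_misinfo = 7475
total = 8786

# The monthly misinformation counts should add up to the table's first column,
# and the two classes should add up to the total.
misinfo = misinfo_march + misinfo_april
assert misinfo == 1311
assert misinfo + not_misinfo == total

# Share of tweets labeled as misinformation.
misinfo_share = misinfo / total
print(f"{misinfo_share:.1%} of tweets are labeled as misinformation")  # prints 14.9% ...
```

Roughly one tweet in seven carries the misinformation label, so the two classes are noticeably unbalanced.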

Dataset Annotation

Two volunteers, both native Arabic speakers, labeled the tweets. Before labeling, the annotators reviewed a list of misinformation reported on the websites of the World Health Organization (WHO) and the Ministry of Health in Saudi Arabia. Tweets that contain misinformation were labeled "1"; all others were labeled "0".
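As a minimal sketch of how the "1"/"0" labels might be read, the snippet below parses a CSV of tweet IDs and labels. The file layout is an assumption: the column names `tweet_id` and `label` are hypothetical, and the sample IDs are made up for illustration, so adjust both to match the actual files in this repository.

```python
import csv
import io
from collections import Counter

def read_labels(fileobj, id_col="tweet_id", label_col="label"):
    """Return {tweet_id: label}, where label is 1 (misinformation) or 0 (otherwise).

    The column names are assumptions, not the repository's documented schema.
    """
    labels = {}
    for row in csv.DictReader(fileobj):
        label = int(row[label_col])
        if label not in (0, 1):
            raise ValueError(f"unexpected label {label!r} for tweet {row[id_col]}")
        # Keep IDs as strings so that no tool silently rounds these large integers.
        labels[row[id_col].strip()] = label
    return labels

# Made-up sample rows standing in for the real file.
sample = io.StringIO(
    "tweet_id,label\n"
    "1240000000000000001,1\n"
    "1240000000000000002,0\n"
    "1240000000000000003,0\n"
)
labels = read_labels(sample)
print(Counter(labels.values()))  # Counter({0: 2, 1: 1})
```

With a real file, pass an open file handle instead of the `io.StringIO` sample.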

Guideline to Hydrate

Using the Twarc Notebook

To hydrate the tweet IDs from our COVID19-Misinformation-dataset, you can use our Hydrate_TweetIDs_Arabic_COVID19 notebook.

  • The notebook runs on Google Colab
  • You are required to have a Twitter developer account
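For readers who want to see the idea outside the notebook, here is a minimal sketch of hydrating labeled IDs with twarc. It assumes the twarc version 1 interface (`Twarc(...)` and its `hydrate` method) and is not necessarily what the notebook itself does. The helper names `make_twarc_hydrator` and `hydrate_with_labels` are hypothetical, and `fake_hydrate` is a stand-in so the example runs without credentials.

```python
def make_twarc_hydrator(consumer_key, consumer_secret, access_token, access_token_secret):
    """Return a function that hydrates tweet IDs through twarc (v1 interface assumed)."""
    from twarc import Twarc  # imported lazily so the dry run below works without twarc
    client = Twarc(consumer_key, consumer_secret, access_token, access_token_secret)
    return client.hydrate  # takes an iterable of IDs, yields tweet dicts

def hydrate_with_labels(labels, hydrate):
    """Yield (tweet_id, text, label) for every tweet the hydrator returns."""
    for tweet in hydrate(list(labels)):
        tweet_id = tweet["id_str"]
        if tweet_id not in labels:
            continue
        # Extended-mode tweets carry "full_text"; fall back to "text" otherwise.
        text = tweet.get("full_text") or tweet.get("text", "")
        yield tweet_id, text, labels[tweet_id]

def fake_hydrate(ids):
    """Stand-in for a real hydrator: pretends tweet "2" is no longer available."""
    for tweet_id in ids:
        if tweet_id != "2":
            yield {"id_str": tweet_id, "full_text": f"text of {tweet_id}"}

labels = {"1": 1, "2": 0, "3": 0}  # made-up IDs and labels
rows = list(hydrate_with_labels(labels, fake_hydrate))
print(rows)  # [('1', 'text of 1', 1), ('3', 'text of 3', 0)]
```

To run against the real API, replace `fake_hydrate` with the function returned by `make_twarc_hydrator(...)`, passing the keys from your Twitter developer account. Tweets that have been deleted or made private are not returned, so the hydrated set may be smaller than the ID list.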

For those who prefer a graphical user interface (GUI), we suggest using Hydrator.

Using Hydrator

To use Hydrator, follow the instructions in the Hydrator GitHub repository.

For an Arabic guide to both Hydrator and our Twarc notebook, see our دليل استعادة قاعدة بيانات التغريدات (Guide to Restoring the Tweet Dataset).

Licensing

This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to the terms of the LICENSE and to Twitter's Terms of Service, and you agree to cite our paper: https://arxiv.org/abs/2101.05626

Contact

If you have any suggestions or questions, please reach out to saraa.alqurashi on Gmail, s43680523(AT)st(dot)uqu(dot)edu(dot)sa, or eaanazi(AT)uqu(dot)edu(dot)sa.
