This repository contains the dataset of 2,484 annotated tweets for the research paper Vaccine Discourse on Twitter During the COVID-19 Pandemic. The tweets relate to the COVID-19 vaccines and were created between March 1, 2020 and July 31, 2021. The first column contains the unique ID of each tweet, and the second column contains a label representing the tweet's stance towards the COVID-19 vaccines.
Tweets can have one of three labels:
- P: Expresses a positive stance towards the COVID-19 vaccines.
- N: Expresses a negative stance towards the COVID-19 vaccines.
- U: Expresses no stance, or the stance is unclear.
The full codebook as well as a more detailed explanation of the annotation process can be found in the paper Vaccine Discourse on Twitter During the COVID-19 Pandemic.
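As a quick sanity check after downloading, the two-column layout and the three labels described above can be read and tallied with a short script. This is a minimal sketch, assuming the data is a comma-separated file that may or may not have a header row; the `read_labels` helper and the sample rows are hypothetical and are not part of this repository. Tweet IDs are kept as strings so that no digits are lost to numeric rounding.

```python
import csv
import io
from collections import Counter

VALID_LABELS = {"P", "N", "U"}  # positive, negative, unclear / no stance


def read_labels(lines):
    """Return (tweet_id, label) pairs from two-column CSV rows.

    Assumption: column 1 is the tweet ID and column 2 is the label.
    Rows whose second column is not P, N, or U (for example a header
    row or a blank line) are skipped.
    """
    pairs = []
    for row in csv.reader(lines):
        if len(row) < 2:
            continue
        tweet_id, label = row[0].strip(), row[1].strip()
        if label in VALID_LABELS:
            pairs.append((tweet_id, label))
    return pairs


# Placeholder rows for illustration only; these are not real tweet IDs.
sample = io.StringIO(
    "id,label\n"
    "1000000000000000001,P\n"
    "1000000000000000002,N\n"
    "1000000000000000003,U\n"
    "1000000000000000004,N\n"
)
pairs = read_labels(sample)
counts = Counter(label for _, label in pairs)
print(counts)
```

To run this on the real file, pass an open file handle instead of the `io.StringIO` sample, for example `read_labels(open(path, newline="", encoding="utf-8"))`, where `path` points to the downloaded dataset.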
As per Twitter's rules, the tweets have been dehydrated and are referenced only by their unique ID numbers. To retrieve the original text content and other attributes, a tool such as Twarc2 can be used.
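Hydration tools generally expect a plain-text file with one tweet ID per line. The sketch below shows one way to produce such a file from the first column of the dataset. The `write_id_file` helper, the file names, and the sample rows are assumptions made for illustration; only the ID column described above is taken from this repository.

```python
import csv
import tempfile
from pathlib import Path


def write_id_file(csv_path, ids_path):
    """Copy the first CSV column into a text file, one tweet ID per line.

    Rows whose first column is not purely numeric (for example a header
    row) are skipped. Returns the number of IDs written.
    """
    count = 0
    with open(csv_path, newline="", encoding="utf-8") as src, \
            open(ids_path, "w", encoding="utf-8") as dst:
        for row in csv.reader(src):
            if row and row[0].strip().isdigit():
                dst.write(row[0].strip() + "\n")
                count += 1
    return count


# Demonstration on a temporary file with placeholder (not real) tweet IDs.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "labels.csv"  # hypothetical file name
    ids_path = Path(tmp) / "ids.txt"     # hypothetical file name
    csv_path.write_text(
        "id,label\n1000000000000000001,P\n1000000000000000002,N\n",
        encoding="utf-8",
    )
    n_written = write_id_file(csv_path, ids_path)
    written_ids = ids_path.read_text(encoding="utf-8").split()

print(n_written, written_ids)

# The resulting file can then be passed to the twarc2 command-line tool,
# which requires API credentials (set up with `twarc2 configure`):
#   twarc2 hydrate ids.txt tweets.jsonl
```

Tweets that have been deleted or made private since annotation cannot be retrieved, so the hydrated file may contain fewer tweets than the ID list.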
Feel free to use the dataset for your own research, but please remember to cite our work:
Lindelöf G, Aledavood T, Keller B. Dynamics of the Negative Discourse Toward COVID-19 Vaccines: Topic Modeling Study and an Annotated Data Set of Twitter Posts.
J Med Internet Res 2023;25:e41319
doi: 10.2196/41319
PMID: 36877804
PMCID: 10134018