
Madhour/CovaxAnalytica



Data and Sentiment Analysis in Vaccine Discourse on Twitter

Project Organization

├── data               
│   ├── external       <- external data
│   ├── interim        <- modified dataset
│   ├── processed      <- final dataset used for analysis
│   └── raw            <- original dataset
│
├── docs               <- presentation, documents used for reports etc.
│
├── models             <- Trained Doc2Vec model, TF-IDF Vectors
│
├── notebooks          <- Jupyter notebooks (creator's initials and enumerated)
│
├── reports            
│   └── figures        <- Interactive HTML figures from the analysis
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment
│
└── src                
    └── models         <- Scripts to train models
        └── train_model.py

Project structure is an adaptation of the Cookiecutter Data Science template.

Datasets:

Twitter's guidelines do not allow tweets to be uploaded, so only the tweet IDs can be provided. To build the dataset, follow the steps here to hydrate the IDs (that is, to retrieve the full tweet objects from their IDs).

How to analyze vaccine tweets:

  1. Download the datasets above and place them in /data/raw
  2. Hydrate the tweet IDs in /data/raw/tweet_ids.csv and store the resulting JSONL file as "vaccine_tweets_hydrated.jsonl" in /data/raw/
  3. Run Notebooks 2 - 6 in /notebooks/
  • Note: you may have to install the requirements first (pip3 install -r requirements.txt)
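
Before running the notebooks, it can help to confirm that the hydrated file parses. The following is a minimal sketch using only the Python standard library; the function name `load_hydrated_tweets` is hypothetical and not part of this repository, and the notebooks may load the data differently.

```python
import json
from pathlib import Path


def load_hydrated_tweets(path):
    """Read a JSON Lines file and return one dict per non-empty line."""
    tweets = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            tweets.append(json.loads(line))
    return tweets


if __name__ == "__main__":
    # Path taken from step 2; assumes the script runs from the repository root.
    raw_file = Path("data/raw/vaccine_tweets_hydrated.jsonl")
    if raw_file.exists():
        print(f"Loaded {len(load_hydrated_tweets(raw_file))} tweets")
    else:
        print(f"{raw_file} not found; hydrate the tweet IDs first")
```

A `json.JSONDecodeError` raised here usually points to a truncated or partially written hydration output.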

How to analyze overall COVID-19 tweets:

  1. Hydrate Corona_Combined_Nov2020-June2021.csv and store as "Hydrated_Tweets.jsonl" in /data/raw
  2. Run Notebooks 1 and 7 - 11 in /notebooks/
  • Note: you may have to install the requirements first (pip3 install -r requirements.txt)
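
Both workflows expect a specific hydrated file in /data/raw. A small check such as the sketch below reports which ones are still missing. The file names come from the steps above; the helper `missing_raw_files` and the labels "vaccine" and "covid" are illustrative assumptions, not part of this repository.

```python
from pathlib import Path

# File names as given in the how-to steps; the dictionary keys are hypothetical labels.
EXPECTED_RAW_FILES = {
    "vaccine": "vaccine_tweets_hydrated.jsonl",
    "covid": "Hydrated_Tweets.jsonl",
}


def missing_raw_files(raw_dir="data/raw", expected=EXPECTED_RAW_FILES):
    """Return the labels whose hydrated file is absent from raw_dir, sorted."""
    raw_dir = Path(raw_dir)
    return sorted(
        label
        for label, filename in expected.items()
        if not (raw_dir / filename).is_file()
    )


if __name__ == "__main__":
    # Assumes the script runs from the repository root.
    missing = missing_raw_files()
    print("All hydrated files present" if not missing else f"Missing: {missing}")
```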

Used Technologies

NLP Pipeline

  • NLP Pipeline: Word2Vec, Doc2Vec, TF-IDF, K-Means
  • Savitzky-Golay filter (value smoothing)
  • Plotly (interactive plots)
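
To illustrate what Savitzky-Golay smoothing does to a time series, here is a dependency-free sketch with a fixed 5-point window and a quadratic fit. The window size, the polynomial order, and the function name `savgol_smooth_5` are assumptions chosen for illustration; the settings used in the notebooks are not stated here, and libraries such as SciPy offer a general implementation (`scipy.signal.savgol_filter`).

```python
# 5-point quadratic Savitzky-Golay smoothing coefficients; they sum to 1.
SAVGOL_5 = (-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35)


def savgol_smooth_5(values):
    """Smooth a sequence with a 5-point quadratic Savitzky-Golay kernel.

    The first and last two points are returned unchanged, which is a
    simplification: full implementations also fit the edges.
    """
    values = list(values)
    half = len(SAVGOL_5) // 2
    smoothed = values[:]
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        # Weighted sum equals the value of the least-squares parabola at i.
        smoothed[i] = sum(c * v for c, v in zip(SAVGOL_5, window))
    return smoothed


if __name__ == "__main__":
    daily_counts = [10, 12, 30, 11, 13, 12, 14]  # made-up example values
    print(savgol_smooth_5(daily_counts))
```

Unlike a plain moving average, this kernel reproduces any quadratic trend exactly, so it damps isolated spikes while preserving the shape of broader peaks.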

Report

Read the report here. The interactive plots are stored in /reports/figures/.
