├── data
│ ├── external <- data from external sources
│ ├── interim <- modified dataset
│ ├── processed <- final dataset used for analysis
│ └── raw <- original dataset
│
├── docs <- presentations and documents used for reports
│
├── models <- Trained Doc2Vec model, TF-IDF Vectors
│
├── notebooks <- Jupyter notebooks (named with the creator's initials and a number)
│
├── reports
│ └── figures <- Interactive HTML figures from the analysis
│
├── requirements.txt <- The requirements file for reproducing the analysis environment
│
└── src
 └── models <- Scripts to train models
  └── train_model.py
The project structure is an adaptation of the Cookiecutter Data Science template.
- All COVID-19 Vaccines Tweets
- COVID-19 World Vaccination Progress
- Coronavirus (COVID-19) Geo-Tagged Tweets Dataset
Unfortunately, Twitter's guidelines do not allow uploading the tweets themselves; only tweet IDs can be provided. To build the dataset, follow the steps here to hydrate the IDs.
- Download the datasets above and place them in `/data/raw`
- Hydrate the tweet IDs in `/data/raw/tweet_ids.csv` and store the resulting JSONL file as "vaccine_tweets_hydrated.jsonl" in `/data/raw/`
- Run Notebooks 2-6 in `/notebooks/`
- Note: you may have to install the requirements (`pip3 install -r requirements.txt`)
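The ID-extraction step before hydration can be sketched as follows. This is a minimal sketch, not part of the repository: the column name `tweet_id` is an assumption about the CSV layout, and the one-ID-per-line output is the plain-text format that hydration tools such as twarc expect.

```python
import csv

def extract_ids(csv_path, out_path, id_column="tweet_id"):
    """Write one tweet ID per line, the format most hydrators expect."""
    with open(csv_path, newline="") as src, open(out_path, "w") as dst:
        reader = csv.DictReader(src)
        for row in reader:
            dst.write(row[id_column].strip() + "\n")

# extract_ids("data/raw/tweet_ids.csv", "data/raw/tweet_ids.txt")
# then hydrate, e.g. with twarc:
#   twarc hydrate data/raw/tweet_ids.txt > data/raw/vaccine_tweets_hydrated.jsonl
```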
- Hydrate `Corona_Combined_Nov2020-June2021.csv` and store the result as "Hydrated_Tweets.jsonl" in `/data/raw`
- Run Notebooks 1 and 7-11 in `/notebooks/`
- Note: you may have to install the requirements (`pip3 install -r requirements.txt`)
- NLP Pipeline: Word2Vec, Doc2Vec, TF-IDF, K-Means
- Savitzky-Golay (SavGol) filter (value smoothing)
- Plotly (interactive plots)
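The TF-IDF and K-Means stages of the pipeline above can be sketched with scikit-learn. This is an illustrative sketch, not the project's actual code: the example tweets and the choice of `n_clusters=2` are placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# placeholder corpus standing in for the preprocessed tweets
docs = [
    "vaccine rollout is accelerating",
    "side effects after the second dose",
    "vaccination progress in europe",
    "mild side effects reported",
]

# TF-IDF vectors over the corpus
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

# cluster the vectors; n_clusters is arbitrary here
km = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)
print(km.labels_)  # one cluster label per document
```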
Read the report here. The interactive plots are stored in `/reports/figures/`.
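The Savitzky-Golay smoothing mentioned above can be sketched with SciPy. The synthetic noisy series below is a stand-in for the project's time series; window length and polynomial order are illustrative choices (the window must be odd and larger than the polynomial order).

```python
import numpy as np
from scipy.signal import savgol_filter

# synthetic noisy series standing in for, e.g., a daily sentiment signal
rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 100)
noisy = np.sin(t) + rng.normal(scale=0.3, size=t.size)

# smooth with a local 3rd-order polynomial over an 11-point window
smooth = savgol_filter(noisy, window_length=11, polyorder=3)
```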