Topic Modelling on COVID-19 based tweets

This script accompanies the research article "COVID-19 Twitter Dataset with Latent Topics, Sentiments and Emotions Attributes" by Gupta et al. (2020). It extracts latent topics from a Twitter dataset.

The input is an .xlsx or .csv file containing a column of tweets. Because the algorithm is based on Latent Dirichlet Allocation (LDA), the number of topics must be specified when running the script. Words to ignore in the dataset can be listed in the stopwords.txt file. The output is an .xlsx data file containing the dataset with each row tagged with a topic, plus a sheet containing the word distribution within each topic.
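To make the description above concrete, here is a minimal, self-contained sketch of LDA topic tagging using a small collapsed Gibbs sampler in pure Python. It is not the code in main.py, which may well rely on a library. The function names (tokenize, fit_lda), the hyperparameters (alpha, beta, iterations), the stopword set, and the four sample tweets are all assumptions made for illustration. It produces the two kinds of output described above: a dominant topic per row and a word distribution per topic.

```python
import random
from collections import Counter


def tokenize(text, stopwords):
    """Lowercase, split on whitespace, keep alphabetic tokens not in stopwords."""
    return [w for w in text.lower().split() if w.isalpha() and w not in stopwords]


def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Illustrative collapsed Gibbs sampler for LDA; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]         # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(num_topics)]  # count of word w in topic k
    topic_total = [0] * num_topics                       # tokens assigned to topic k
    assignments = []

    # Give every token a random initial topic.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(zs)

    # Resample each token's topic conditioned on all other assignments.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Tag each document with its most frequent topic (topic 0 if the doc is empty).
    tags = [max(range(num_topics), key=row.__getitem__) for row in doc_topic]
    # Smoothed word distribution for each topic; each one sums to 1.
    word_dist = [
        {w: (topic_word[t][w] + beta) / (topic_total[t] + vocab_size * beta) for w in vocab}
        for t in range(num_topics)
    ]
    return tags, word_dist


# Made-up sample data, for illustration only.
tweets = [
    "masks and vaccines reduce virus spread",
    "vaccine trial shows strong results",
    "lockdown hurts the economy and jobs",
    "jobs lost as economy slows in lockdown",
]
stopwords = {"and", "the", "as", "in"}  # stands in for the contents of stopwords.txt

docs = [tokenize(t, stopwords) for t in tweets]
tags, word_dist = fit_lda(docs, num_topics=2)

for tweet, tag in zip(tweets, tags):
    print(tag, tweet)
for t, dist in enumerate(word_dist):
    top_words = sorted(dist, key=dist.get, reverse=True)[:3]
    print("topic", t, top_words)
```

With a corpus this small the topic assignments are not meaningful; the point is only the shape of the result, one topic label per row and one word-to-probability mapping per topic.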

Steps to install python packages and run script

Unzip the file topic_modelling_covid_twitter.zip
Install the latest version of Python (>=3.6) or create a conda virtual environment
Open Command Prompt or a terminal, depending on your operating system (Windows, Linux or macOS)
Navigate to ./topic_modelling_covid_twitter, wherever it was unzipped, using cd
Install the dependencies: pip install -r requirements.txt
Run the script: python main.py <datafile> <column_name> <sample> <encoding> <num topics>
    <datafile> can be an .xlsx or .csv file
    <column_name> should be a valid column in the data file
    <sample> is a number between 0 and 1, including 1
    <encoding> is either utf8 or latin1
    <num topics> is a number between 3 and 100

Example: python main.py tweets.xlsx Tweet 0.01 utf8 10
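The argument constraints listed above can be expressed as a small argparse parser. This is a hedged sketch, not the parsing code actually used in main.py: the helper names (data_file, sample_fraction, topic_count, build_parser) are hypothetical, and treating 0 as excluded for <sample> and both 3 and 100 as allowed for <num topics> is one reading of the wording above.

```python
import argparse
import os


def data_file(value):
    """Accept only file names ending in .xlsx or .csv."""
    if os.path.splitext(value)[1].lower() not in (".xlsx", ".csv"):
        raise argparse.ArgumentTypeError("datafile must be an .xlsx or .csv file")
    return value


def sample_fraction(value):
    """Accept a number greater than 0 and at most 1 (assumed reading of the range)."""
    fraction = float(value)
    if not 0 < fraction <= 1:
        raise argparse.ArgumentTypeError("sample must be greater than 0 and at most 1")
    return fraction


def topic_count(value):
    """Accept an integer from 3 to 100 (bounds assumed to be inclusive)."""
    count = int(value)
    if not 3 <= count <= 100:
        raise argparse.ArgumentTypeError("num topics must be between 3 and 100")
    return count


def build_parser():
    """Build a parser for the five positional arguments, in the documented order."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("datafile", type=data_file)
    parser.add_argument("column_name")
    parser.add_argument("sample", type=sample_fraction)
    parser.add_argument("encoding", choices=["utf8", "latin1"])
    parser.add_argument("num_topics", type=topic_count)
    return parser


# Parse the example command line shown above.
args = build_parser().parse_args(["tweets.xlsx", "Tweet", "0.01", "utf8", "10"])
print(args.datafile, args.column_name, args.sample, args.encoding, args.num_topics)
```

Validating in the type callbacks means a bad value is rejected with a usage message before any data is loaded.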

Visualization

The topics can be visualized using Highcharts. Example: https://jsfiddle.net/5xa4Ld2g/
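As one possible way to feed topic output into Highcharts, the sketch below turns a topic's word distribution into a bar-chart options object and prints it as JSON. It is an assumption about a reasonable layout, not a reproduction of the linked fiddle or of visualize_topics.py. The function name highcharts_bar_config, the top_n parameter, and the word weights are made up for illustration.

```python
import json


def highcharts_bar_config(topic_label, word_weights, top_n=10):
    """Build a Highcharts options dict for the top_n highest-weighted words of one topic."""
    top = sorted(word_weights.items(), key=lambda item: item[1], reverse=True)[:top_n]
    return {
        "chart": {"type": "bar"},
        "title": {"text": topic_label},
        "xAxis": {"categories": [word for word, _ in top]},
        "yAxis": {"title": {"text": "Weight within topic"}},
        "series": [{"name": topic_label, "data": [round(weight, 4) for _, weight in top]}],
    }


# Made-up weights, for illustration only.
weights = {"mask": 0.2, "vaccine": 0.5, "virus": 0.1}
config = highcharts_bar_config("Topic 1", weights, top_n=3)
print(json.dumps(config, indent=2))
```

The printed JSON can be pasted into a page or fiddle and passed as the second argument to Highcharts.chart('container', ...), where 'container' is the id of the target element.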

Citation:

If you use this script and find it useful for your research, please cite the source as: Gupta, R., Vishwanath, A., and Yang, Y. (2020), COVID-19 Twitter Dataset with Latent Topics, Sentiments and Emotions Attributes, Preprint at: https://arxiv.org/abs/2007.06954

For correspondence, please contact yangyp@ihpc.a-star.edu.sg

