TwitterTopicModeling

Topic modeling on tweets, using doc2vec document embeddings and k-means clustering to categorize tweets.

The article is available here. The goal of this code is to categorize tweets into main themes. The figure below shows the main steps of the process:

1- The dataset is a collection of tweets related to a specific subject; in my case, tweets related to the COVID-19 pandemic. The database should be extracted to the MySQL folder of XAMPP. load_data.py loads the tweets.
2- Preprocessing includes converting letters to lower case; removing URLs, mentions, stopwords, and emojis; correcting repeated characters; tokenizing; and replacing negations with NOT. preprocessing.py preprocesses the tweets.
3- Document embedding is done using the doc2vec algorithm in doc2vec.py.
4- Clustering is performed using the k-means algorithm in clustering.py.
5- Theme extraction is done manually, based on the most frequent words in each cluster, which are generated in evaluate.py.
All of the steps above are performed by running the main.py file.
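
As a rough illustration of steps 2 and 5, here is a minimal sketch using only the Python standard library. It is not the code in preprocessing.py or evaluate.py. The function names (`preprocess`, `top_words_per_cluster`), the tiny stopword and negation lists, and the regular expressions are all assumptions made for this example. The cluster labels are assumed to come from the k-means step, which is not shown here, and neither is the doc2vec step.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative lists (assumptions; the repository's own lists are not shown here).
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "for", "this", "it", "i"}
NEGATIONS = {"not", "no", "never", "cannot"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
# Collapse any character repeated three or more times down to two ("sooo" -> "soo").
REPEAT_RE = re.compile(r"(.)\1{2,}")
# Keep only lowercase ASCII words (optionally with an apostrophe part);
# emojis, digits, and punctuation are dropped as a side effect.
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def preprocess(tweet):
    """Return a list of cleaned tokens for one tweet."""
    text = tweet.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = REPEAT_RE.sub(r"\1\1", text)
    tokens = []
    for tok in TOKEN_RE.findall(text):
        if tok in NEGATIONS or tok.endswith("n't"):
            tokens.append("NOT")  # replace negations with a single marker
        elif tok not in STOPWORDS:
            tokens.append(tok)
    return tokens


def top_words_per_cluster(token_lists, labels, n=3):
    """Map each cluster label to its n most frequent tokens."""
    counts = defaultdict(Counter)
    for tokens, label in zip(token_lists, labels):
        counts[label].update(tokens)
    return {label: [word for word, _ in counter.most_common(n)]
            for label, counter in counts.items()}


if __name__ == "__main__":
    print(preprocess("@WHO This is sooooo bad!!! I don't like it https://t.co/abc"))
    docs = [["mask", "shortage"], ["mask", "mandate"],
            ["vaccine", "trial"], ["vaccine", "dose", "vaccine"]]
    print(top_words_per_cluster(docs, [0, 0, 1, 1], n=1))
```

The word lists returned per cluster are only a starting point: as described in step 5, naming the theme of each cluster is still a manual judgment.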

Hamoon1987/TwitterTopicModeling
