COVID19 Retweet Prediction

We worked as a team to predict the number of retweets a tweet will receive. We started by extracting relevant features, then applied the machine learning and deep learning techniques we learned throughout the course.

Path to original data

Put the original data in the Data folder of this repository before running any notebook.
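As a quick sanity check before running anything, the following minimal sketch verifies that the data folder exists and is not empty. The function name list_data_files is hypothetical, and the default path "Data" assumes you run it from the repository root; the README does not say which file names to expect, so none are checked.

```python
from pathlib import Path


def list_data_files(data_dir="Data"):
    """Return the sorted names of the files found directly in data_dir.

    Raises FileNotFoundError if the folder is missing or holds no files.
    The default "Data" assumes the current directory is the repository root.
    """
    folder = Path(data_dir)
    if not folder.is_dir():
        raise FileNotFoundError(f"Expected a data folder at {folder}")
    files = sorted(p.name for p in folder.iterdir() if p.is_file())
    if not files:
        raise FileNotFoundError(f"No files found in {folder}")
    return files
```

Calling list_data_files() from the repository root either returns the file names or fails early with a clear message, which is easier to diagnose than an error halfway through a notebook.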

Folder Hierarchy explained

Preprocessing

In the Preprocessing folder, you will find General_preprocessing.ipynb. Execute all cells in order to produce the preprocessed data. This data is used by the basic regression models, Simon, and text clustering.

Visualisation

In the Visualisation folder, you will find Data_visualisation.ipynb. Execute it to display the correlation matrix and other miscellaneous plots.
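For readers unfamiliar with the term, here is a minimal sketch of what a correlation matrix contains: the Pearson correlation between every pair of numeric columns. It is not the notebook's code (which is not shown in this README), and the helper names pearson and correlation_matrix are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Assumes neither sequence is constant (otherwise the variance is zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Map each pair of column names to the correlation of their values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}
```

Given a dictionary such as {"x": [1, 2, 3], "y": [2, 4, 6]} (toy values, not taken from the dataset), correlation_matrix returns a nested dictionary whose diagonal entries are 1 and whose off-diagonal entries measure how linearly related the two columns are.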

Text clustering

In the Text_Clustering folder, you will find Kmeans_Text.ipynb and Regression_using_clusters.ipynb. Execute Kmeans_Text.ipynb first to compute the clusters, which are saved in the Data folder. Once it has finished, run Regression_using_clusters.ipynb, which predicts retweets for each cluster.
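If you prefer the command line to opening each notebook by hand, the order above can be sketched with jupyter nbconvert. This is one possible approach rather than the project's own tooling: it assumes Jupyter is installed, that the folder is named Text_Clustering, and that the script runs from the repository root. The names nbconvert_command and run_in_order are hypothetical.

```python
import subprocess

# Order stated in this README: clustering first, then the per-cluster regression.
# The folder name Text_Clustering is an assumption about the repository layout.
NOTEBOOKS = [
    "Text_Clustering/Kmeans_Text.ipynb",
    "Text_Clustering/Regression_using_clusters.ipynb",
]


def nbconvert_command(notebook):
    """Build the command that executes one notebook and saves it in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        notebook,
    ]


def run_in_order(notebooks=NOTEBOOKS, runner=subprocess.run):
    """Execute each notebook in turn; check=True stops at the first failure."""
    for notebook in notebooks:
        runner(nbconvert_command(notebook), check=True)
```

Because check=True raises an error when a notebook fails, the regression notebook is never started on missing or stale clusters.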

Basic Regression

In the Basic Regression folder, you will find Basic_Regression_Models.ipynb. Execute all cells in order to train and evaluate all our basic regression models.

Simon neural network model

In the Simon-NN folder, you will find Simon-NN.ipynb. By executing this notebook, you will train your own version of Simon, our first neural network model.

DNN_BERT neural network model

In the DNN_BERT folder, you will find BERT_Tokenizer for Tweets.ipynb and DNN_BERT.ipynb. Execute BERT_Tokenizer for Tweets.ipynb first to compute the tokens, which are saved in the Tensors folder. Once it has finished, run DNN_BERT.ipynb to train your own version of DNN_BERT, our second neural network model.
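The ordering constraints described in the sections above can be summarised as a small dependency graph. The sketch below encodes only the dependencies this README states and derives a valid execution order with the standard library's graphlib (Python 3.9 or later). Attaching the text clustering dependency to Kmeans_Text.ipynb is an assumption, and Data_visualisation.ipynb is left out because its prerequisites are not stated.

```python
from graphlib import TopologicalSorter

# Each notebook maps to the notebooks that must finish before it.
# Only dependencies stated in this README are encoded; placing the
# preprocessing dependency on Kmeans_Text.ipynb is an assumption.
DEPENDENCIES = {
    "General_preprocessing.ipynb": set(),
    "Basic_Regression_Models.ipynb": {"General_preprocessing.ipynb"},
    "Simon-NN.ipynb": {"General_preprocessing.ipynb"},
    "Kmeans_Text.ipynb": {"General_preprocessing.ipynb"},
    "Regression_using_clusters.ipynb": {"Kmeans_Text.ipynb"},
    "BERT_Tokenizer for Tweets.ipynb": set(),
    "DNN_BERT.ipynb": {"BERT_Tokenizer for Tweets.ipynb"},
}


def execution_order(dependencies=DEPENDENCIES):
    """Return one order in which every notebook runs after its prerequisites."""
    return list(TopologicalSorter(dependencies).static_order())
```

Several orders satisfy these constraints, so the exact list returned may vary; what matters is that each notebook appears after everything it depends on.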
