Hashtag_Classification

For data collection we used Jefferson-Henrique/GetOldTweets-python.

Model.py contains the logistic regression model we used to predict the happy and FML hashtags. It reached a validation accuracy of 92% with Bag of Words features and 91% with TF-IDF and Bag of Words.
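To make the two feature schemes concrete, here is a minimal standard-library sketch of Bag of Words counts and a TF-IDF reweighting of them. It is not the code in Model.py: the function names, the whitespace tokenizer, the smoothed IDF formula, and the example tweets are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase and split on whitespace only.
    return text.lower().split()


def bag_of_words(docs):
    """Return (vocabulary, count vectors) for a list of strings."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for doc in tokenized:
        row = [0] * len(vocab)
        for tok, n in Counter(doc).items():
            row[index[tok]] = n
        vectors.append(row)
    return vocab, vectors


def tf_idf(count_vectors):
    """Reweight counts by a smoothed IDF: ln((1 + N) / (1 + df)) + 1."""
    n_docs = len(count_vectors)
    n_terms = len(count_vectors[0]) if count_vectors else 0
    # Document frequency: in how many documents each term appears.
    df = [sum(1 for row in count_vectors if row[j] > 0) for j in range(n_terms)]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    return [[c * w for c, w in zip(row, idf)] for row in count_vectors]


# Made-up example tweets, not taken from the dataset.
tweets = ["so happy today", "fml missed the bus", "happy happy day"]
vocab, counts = bag_of_words(tweets)
weighted = tf_idf(counts)
```

Either `counts` or `weighted` could then be passed to a logistic regression classifier. With TF-IDF, a word that appears in many tweets (here "happy") gets a lower weight per occurrence than a word that appears in only one.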

In Embedding.ipynb, we used pre-trained Twitter GloVe vectors as the word representation and fed them into a neural network whose first layer uses a ReLU activation and whose final layer uses a sigmoid, reaching an accuracy of 90%. For the final graphs, the network was then trained without the pre-trained Twitter GloVe embeddings, which gave an accuracy of 91% on the test data. It performs really well on the validation data, however, as the graphs show.
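The activation arrangement described above (ReLU in the first layer, sigmoid at the output) can be sketched as a plain forward pass. This is only an illustration in standard-library Python, not the notebook's code: the layer sizes, weights, and function names are hypothetical, and the input vector stands in for whatever embedded representation of a tweet the network receives.

```python
import math


def relu(values):
    # ReLU keeps positive activations and zeroes out the rest.
    return [max(0.0, v) for v in values]


def sigmoid(z):
    # Squashes a real-valued logit into a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases):
    # weights holds one row of input weights per output unit.
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def forward(features, hidden_w, hidden_b, out_w, out_b):
    """First layer with ReLU, then a single sigmoid output unit."""
    hidden = relu(dense(features, hidden_w, hidden_b))
    logit = dense(hidden, [out_w], [out_b])[0]
    return sigmoid(logit)


# Hypothetical weights chosen only to make the arithmetic easy to follow.
probability = forward(
    features=[1.0, -2.0],
    hidden_w=[[1.0, 0.0], [0.0, 1.0]],
    hidden_b=[0.0, 0.0],
    out_w=[2.0, 3.0],
    out_b=-2.0,
)
```

Here the hidden layer produces `[1.0, 0.0]` after ReLU, the output logit is `2.0 - 2.0 = 0.0`, and the sigmoid maps it to a probability of 0.5.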

Optimized_embedding.ipynb is the optimized version of Embedding.ipynb. Here we train without the pre-trained weights and use a dual-model approach: one model is trained on bitching tweets and one on bragging tweets. Both are then tested on bitching, bragging, and neither tweets. If only the bitching model outputs a score higher than 50%, we label the tweet as bitching. If only the bragging model outputs a score higher than 50%, the tweet is marked as bragging. If both models output a score higher than 50%, the tweet is marked as both (bitching and bragging). If neither model outputs a score higher than 50%, the tweet is marked as neither. Additionally, we changed the model so that it applies ReLU after the first layer and softmax instead of sigmoid in the final layer.
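The labeling rule for the two models can be written as a small function. This is a sketch of the rule as described in this README, not code taken from the notebook; the function name, the argument names, and the label strings are assumptions.

```python
def label_tweet(bitching_score, bragging_score, threshold=0.5):
    """Combine the two model scores into one of four labels.

    Each score is assumed to be a probability in [0, 1] produced by the
    corresponding model; the 0.5 threshold matches the 50% cutoff above.
    """
    is_bitching = bitching_score > threshold
    is_bragging = bragging_score > threshold
    if is_bitching and is_bragging:
        return "both"
    if is_bitching:
        return "bitching"
    if is_bragging:
        return "bragging"
    return "neither"


# Example scores are made up for illustration.
labels = [
    label_tweet(0.9, 0.1),
    label_tweet(0.2, 0.8),
    label_tweet(0.7, 0.6),
    label_tweet(0.3, 0.4),
]
```

Checking the "both" case first keeps the remaining branches simple, since each of them then only needs to look at a single score.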

SimpleRNN.ipynb uses one simple recurrent layer followed by a final layer with a sigmoid activation to obtain a classification accuracy of 92% on the test data. (It takes a while to train.)
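For readers unfamiliar with the architecture, the following standard-library sketch shows what one simple recurrent layer followed by a sigmoid output computes. It is not the notebook's implementation: the tanh recurrence, the weight names `W`, `U`, `b`, `v`, `c`, and the toy dimensions are assumptions made for illustration.

```python
import math


def rnn_step(x, h_prev, W, U, b):
    """One recurrent step: h_t = tanh(W x_t + U h_{t-1} + b)."""
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(W[k], x))
            + sum(u * hj for u, hj in zip(U[k], h_prev))
            + b[k]
        )
        for k in range(len(b))
    ]


def classify_sequence(sequence, W, U, b, v, c):
    """Run the recurrence over the sequence, then apply a sigmoid unit
    to the final hidden state to get a probability."""
    h = [0.0] * len(b)  # initial hidden state of zeros
    for x in sequence:
        h = rnn_step(x, h, W, U, b)
    logit = sum(vi * hi for vi, hi in zip(v, h)) + c
    return 1.0 / (1.0 + math.exp(-logit))


# Toy setup: one input feature per time step and one hidden unit.
W, U, b = [[1.0]], [[1.0]], [0.0]
v, c = [1.0], 0.0
p_zero = classify_sequence([[0.0], [0.0]], W, U, b, v, c)
p_positive = classify_sequence([[1.0], [1.0]], W, U, b, v, c)
```

Because each step depends on the previous hidden state, the time steps have to be processed one after another, which is one reason recurrent models tend to train slowly.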

We're still working on updating the GitHub repository and bringing it up to speed with our findings. Until then, you can view our report at: Report Link

Repository contents (37 commits):

.ipynb_checkpoints
Blessed_clean.txt
Embedding.ipynb
Embedding_with_rnn.ipynb
FML_clean.txt
Happy_clean.txt
Idiots_clean.txt
Logistic-Regression-Keras.ipynb
Model.py
Neither_clean.txt
Optimized_embedding.ipynb
Pos_Neg_NN.ipynb
Proud_clean.txt
README.md
Sad_clean.txt
Simple RNN.ipynb
Simple_Logistic_Regression.ipynb
Single_Duel_Layer_RNN.ipynb
Technology_clean.txt
Ugh_clean.txt
bidirectional_lstm_final.ipynb
bidirectional_lstm_final_last_out.ipynb
bidirectional_lstm_final_stem_stop.ipynb
mnist_gan.ipynb
negative.txt
positive.txt
pre-trained_glove_twitter_model.h5

hamk3010/Tweet-Classifier
