
ParisRohan/Twitter_sentiment_analysis_and_classification


Use case:

This is an entity-level sentiment analysis dataset of tweets. Given a message and an entity, the task is to judge the sentiment of the message towards that entity. The sentiment label takes four values: Positive, Negative, Neutral and Irrelevant.

Tech Stack:

  • Category: NLP, multiclass classification
  • Tech Stack: Python, Regular expressions, Word cloud, NLTK, TF-IDF, Bag of Words, Pandas, Matplotlib, Sklearn

Medium Blog:

https://parisrohan.medium.com/twitter-sentiment-analysis-and-classification-7060d4444a27

Files:

  • EDA_TextCleaning.ipynb - EDA and text cleaning code
  • Model_building.ipynb - Model building code

Workflow:

1. Data Collection:

2. EDA:

  • The unnamed dataset columns are renamed using {0:'Tweet_ID',1:'Topic',2:'Sentiment',3:'Tweet'} to make the data easier to interpret.
  • Rows containing null values (0.9% of the data) are dropped. [figure]
  • On average, each tweet contains 23 tokens, although a few tweets are extreme outliers in length. [figure]
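The EDA steps above can be sketched with pandas as follows. This is a minimal sketch, not the notebook's code: it assumes the raw CSV is read without a header row (hence the integer column labels 0 to 3) and that tokens are counted by splitting on whitespace, since the README specifies neither. The function name `basic_eda` is hypothetical.

```python
import pandas as pd


def basic_eda(df: pd.DataFrame) -> tuple[pd.DataFrame, float, float]:
    """Rename columns, drop null rows, and report the dropped share and mean token count."""
    # Give the integer-labelled columns readable names
    df = df.rename(columns={0: "Tweet_ID", 1: "Topic", 2: "Sentiment", 3: "Tweet"})

    # Drop every row that has a null in any column and record what share was lost
    n_before = len(df)
    df = df.dropna()
    dropped_pct = 100 * (n_before - len(df)) / n_before

    # Assumption: a "token" is a whitespace-separated word
    mean_tokens = df["Tweet"].str.split().str.len().mean()
    return df, dropped_pct, mean_tokens
```

With the raw file loaded via `pd.read_csv(path, header=None)` (the path is up to you), `basic_eda` returns the cleaned frame plus the two summary numbers quoted in the bullets above.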

3. Data preprocessing:

  • The following actions are performed on the 'Tweet' feature to extract the important information:
      • Remove user mentions
      • Remove hashtags
      • Remove contractions
      • Remove URLs
      • Remove special characters
      • Convert tweets to lowercase
      • Remove stopwords
      • Normalize the text by converting words to their lemmas
  • Generate a word cloud of the cleaned tweets for each sentiment. [figure]
  • Perform one-hot encoding on the 'Topic' feature
  • Drop features that are no longer required, such as 'Tweet_ID', 'Tweet' and 'Topic'
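The tweet-cleaning steps above can be sketched with Python's `re` module. This is an illustrative sketch under stated assumptions, not the project's implementation: the small `STOPWORDS` and `CONTRACTIONS` tables are hypothetical stand-ins for the NLTK resources the project uses, hashtags are removed as whole tokens, "Remove contractions" is read as expanding them, lowercasing is moved to the front so the lookups are case-insensitive, and the NLTK lemmatization step is omitted so the code runs without corpus downloads.

```python
import re

# Hypothetical stand-ins for NLTK resources, kept tiny so the sketch is self-contained
STOPWORDS = {"a", "an", "the", "i", "am", "is", "are", "was", "to", "of",
             "and", "it", "this", "for", "on", "in", "my"}
CONTRACTIONS = {"can't": "cannot", "won't": "will not", "don't": "do not",
                "it's": "it is", "i'm": "i am"}

MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")


def clean_tweet(text: str) -> str:
    """Apply the README's cleaning steps, except lemmatization, to one tweet."""
    text = text.lower()                       # lowercase first so lookups below are case-insensitive
    text = MENTION_RE.sub(" ", text)          # remove user mentions
    text = HASHTAG_RE.sub(" ", text)          # remove hashtags (whole token, an assumption)
    for short, full in CONTRACTIONS.items():  # expand a few common contractions
        text = re.sub(rf"\b{re.escape(short)}\b", full, text)
    text = URL_RE.sub(" ", text)              # remove URLs
    text = NON_ALPHA_RE.sub(" ", text)        # remove special characters and digits
    tokens = [t for t in text.split() if t not in STOPWORDS]  # remove stopwords
    return " ".join(tokens)
```

Applied column-wise, e.g. `df["Clean_Tweet"] = df["Tweet"].apply(clean_tweet)` (the `Clean_Tweet` name is hypothetical), this yields the text that feeds the word clouds and the vectorizer.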

4. Model building:

  • A TF-IDF vectorizer is used to turn the cleaned tweets into a weighted bag-of-words representation
  • Results of Multinomial Naive Bayes model: [figure]
  • Results of Logistic Regression model: [figure]
  • Results of Decision Tree Classifier model: [figure]
  • Results of Random Forest Classifier model: [figure]
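A minimal scikit-learn sketch of the model comparison, assuming a DataFrame with a cleaned-text column (hypothetically named `Clean_Tweet`), the `Topic` column and the `Sentiment` label. Rather than the manual one-hot-and-drop steps described above, it wraps TF-IDF on the text and one-hot encoding of `Topic` in a `ColumnTransformer`, and scores each model by test-set accuracy. The split ratio, random seeds and hyperparameters are assumptions, not the project's settings.

```python
import pandas as pd
from sklearn.base import clone
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# The four model families listed in the README; hyperparameters are assumptions
MODELS = {
    "Multinomial Naive Bayes": MultinomialNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}


def compare_models(df: pd.DataFrame, text_col: str = "Clean_Tweet",
                   topic_col: str = "Topic", label_col: str = "Sentiment",
                   test_size: float = 0.2) -> dict[str, float]:
    """Fit each model on TF-IDF text features plus one-hot Topic; return test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        df[[text_col, topic_col]], df[label_col],
        test_size=test_size, random_state=42, stratify=df[label_col])

    scores = {}
    for name, model in MODELS.items():
        features = ColumnTransformer([
            ("tfidf", TfidfVectorizer(), text_col),  # scalar column name -> 1-D text input
            ("topic", OneHotEncoder(handle_unknown="ignore"), [topic_col]),
        ])
        # clone() gives each run a fresh, unfitted copy of the estimator
        pipe = Pipeline([("features", features), ("clf", clone(model))])
        pipe.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, pipe.predict(X_test))
    return scores
```

All features produced here are non-negative, which `MultinomialNB` requires. Swapping `accuracy_score` for `classification_report` or `confusion_matrix` from `sklearn.metrics` gives per-class results instead.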

About

A multiclass classification model that performs text cleaning, text analysis and classification of tweets by sentiment.
