Scraped #wfh tweets and analyzed general public sentiment toward working from home.

Ajay-rai/BERT-NLP-TweetSentimentClassification

BERT-NLP-TweetSentimentClassification: Project Overview

  • Determined public sentiment toward working from home by analyzing tweets.
  • Scraped over 700,000 tweets using snscrape, then performed text preprocessing and EDA.
  • Used AWS SageMaker/Kaggle to obtain sentiment labels from a pre-trained model based on DistilBERT.
  • Imported the pipeline from the Hugging Face model hub.

Code and Resources Used

Web Scraping (snscrape_tweets.ipynb)

Scraped over 700,000 tweets using snscrape. Each tweet had 27 attributes, of which only the following were relevant:

  • date
  • tweets (content)
  • hashtags
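The scraping step can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: `keep_fields` and `scrape` are hypothetical helper names, the `#wfh` query and the `limit` value are assumptions, and the attribute names exposed by snscrape may differ between versions (newer releases use `rawContent` in place of `content`).

```python
from itertools import islice

# The three attributes the README lists as relevant.
KEEP = ("date", "content", "hashtags")


def keep_fields(tweet):
    """Reduce one scraped tweet object to the three attributes used later."""
    return {name: getattr(tweet, name, None) for name in KEEP}


def scrape(query="#wfh", limit=100):
    """Fetch up to `limit` tweets matching `query` (query and limit are assumptions)."""
    # Imported lazily so keep_fields can be used without snscrape installed.
    import snscrape.modules.twitter as sntwitter

    items = sntwitter.TwitterSearchScraper(query).get_items()
    return [keep_fields(tweet) for tweet in islice(items, limit)]
```

Calling `scrape("#wfh", limit=1000)` would return a list of dictionaries that can be loaded into a DataFrame or written to CSV.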

Data Cleaning (NLP sentimental analysis.ipynb)

Cleaned the raw text tweets.

  • Removed hashtags, @usernames, non-English words, and punctuation from the tweets.
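A cleaning pass along these lines could look like the sketch below. The regular expressions are assumptions rather than the notebook's own rules: URL removal is an extra step not listed above, and "non-English" is crudely approximated by dropping every character that is not an ASCII letter or whitespace.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_OR_HASHTAG_RE = re.compile(r"[@#]\w+")
NON_LETTER_RE = re.compile(r"[^A-Za-z\s]")  # digits, punctuation, non-ASCII characters
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Strip URLs, @mentions, hashtags, and non-letter characters from a tweet."""
    text = URL_RE.sub(" ", text)
    text = MENTION_OR_HASHTAG_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)
    # Collapse the gaps left behind by the removals above.
    return WHITESPACE_RE.sub(" ", text).strip()
```

Filtering whole non-English tweets would need a language detector or the tweet's language attribute, which this sketch does not attempt.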

EDA

Plotted the distribution of #WFH tweets before and after COVID, and observed the average number of words per tweet:

[Figures: distribution of #WFH tweets before and after COVID; average words per tweet]
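The numbers behind such plots can be computed with a few lines of standard-library Python. This is only a sketch: the cutoff date (11 March 2020, when the WHO declared COVID-19 a pandemic) is an assumption, since the README does not say where the before/after boundary was drawn, and `summarize` is a hypothetical helper name.

```python
from datetime import date
from statistics import mean

# Assumed boundary between "before" and "after" COVID.
COVID_CUTOFF = date(2020, 3, 11)


def avg_words(texts):
    """Average number of whitespace-separated words per tweet."""
    return mean(len(t.split()) for t in texts) if texts else 0.0


def summarize(tweets):
    """tweets: iterable of (date, text) pairs."""
    tweets = list(tweets)
    before = [text for day, text in tweets if day < COVID_CUTOFF]
    after = [text for day, text in tweets if day >= COVID_CUTOFF]
    return {
        "before": {"count": len(before), "avg_words": avg_words(before)},
        "after": {"count": len(after), "avg_words": avg_words(after)},
    }
```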

Model Building (model_building.ipnyb)

  • Used the transformers library to load a pre-trained model from the Hugging Face model hub.
  • The pipeline used is called sentiment-analysis and includes an AutoTokenizer.
  • Ran the model in AWS SageMaker/Kaggle.
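The steps above can be sketched as follows. `pipeline("sentiment-analysis")` is the real transformers entry point; with no model specified it downloads a default DistilBERT checkpoint fine-tuned on SST-2, which is assumed here to match the model used in the notebook. `load_classifier` and `label_shares` are hypothetical helper names.

```python
from collections import Counter


def load_classifier():
    """Load the Hugging Face sentiment-analysis pipeline (downloads the model)."""
    # Imported lazily so label_shares works without transformers installed.
    from transformers import pipeline

    return pipeline("sentiment-analysis")


def label_shares(results):
    """Fraction of predictions per label, e.g. POSITIVE vs NEGATIVE."""
    counts = Counter(r["label"] for r in results)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


# Example usage (requires transformers and a network connection):
#   classifier = load_classifier()
#   results = classifier(["Working from home is great", "I miss the office"])
#   print(label_shares(results))
```

Each pipeline result is a dictionary with a `label` and a `score`, so aggregating the labels gives the overall positive/negative split across the cleaned tweets.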

Conclusion and Future Recommendation

  • Public sentiment toward 'work from home' is almost equally divided between positive and negative.
  • A pipeline was used to load the pre-trained model from the Hugging Face model hub.
  • The model's accuracy could be improved by further training it on a labelled data set of tweets and by trying other models.
