Project in the university course TNM108 - Machine Learning for Social Media at Linköping University 2022. Co-author Amanda Bigelius.

aannajonssonn/TNM108-Twitter-Sentiment-Analysis

TNM108 Project - Twitter Sentiment Analysis

This project was made for the university course TNM108 - Machine Learning for Social Media at Linköping University in 2022.

The project was made by Anna Jonsson and Amanda Bigelius, and the goal was to build a Twitter sentiment analysis algorithm.

The project resulted in two solutions: one using TextBlob, a lexicon-based method, and one using Logistic Regression.

Twitter Sentiment Analysis using TextBlob

The algorithm is heavily based on Nikita Silaparasetty's code from this tutorial.

Her repository for the tutorial can be found here
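For orientation, lexicon-based classification with TextBlob can be sketched roughly as follows. The function names and the thresholds (above zero is positive, below zero is negative) are assumptions for illustration and are not necessarily what the tutorial or our code uses.

```python
def label_from_polarity(polarity):
    """Map a polarity score in [-1, 1] to a label (thresholds are an assumption)."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


def classify_tweet(text):
    """Score a tweet with TextBlob's built-in lexicon-based analyzer."""
    # Imported here so that label_from_polarity can be used without TextBlob installed.
    from textblob import TextBlob

    return label_from_polarity(TextBlob(text).sentiment.polarity)
```

With TextBlob installed, `classify_tweet("some tweet text")` returns one of the three labels.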

Our modifications and thoughts

Our first modification was to move all the API keys to a separate file so that the code could be uploaded to GitHub without exposing them.
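One way to keep credentials in a separate file is sketched below. The file name `api_keys.json`, the key names and the helper `load_api_keys` are hypothetical and only illustrate the idea; such a file would also need to be listed in `.gitignore`.

```python
import json
from pathlib import Path

# Hypothetical key names; the names used in the project may differ.
REQUIRED_KEYS = (
    "consumer_key",
    "consumer_secret",
    "access_token",
    "access_token_secret",
)


def load_api_keys(path="api_keys.json"):
    """Read Twitter API credentials from a JSON file kept out of version control."""
    keys = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [name for name in REQUIRED_KEYS if name not in keys]
    if missing:
        raise KeyError(f"Missing credentials in {path}: {', '.join(missing)}")
    return keys
```

The returned dictionary can then be passed on to the Twitter client instead of hard-coding the values in the script.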

We also added our own list of stopwords, since the NLTK stopword list removed some words we found important for the classification.
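A minimal sketch of filtering with a custom stopword list is shown below. The list contents and the function name `remove_stopwords` are illustrative assumptions, not our actual list. Negations such as "not" are used as an example of words that NLTK's English list contains but that can matter for sentiment.

```python
# Illustrative list only; the project's actual stopword list is not shown here.
CUSTOM_STOPWORDS = {"the", "a", "an", "and", "is", "to", "of", "in"}


def remove_stopwords(text, stopwords=CUSTOM_STOPWORDS):
    """Drop stopwords but keep words that are absent from the custom list, e.g. 'not'."""
    return " ".join(word for word in text.lower().split() if word not in stopwords)
```

For example, `remove_stopwords("The movie is not good")` keeps the negation and returns `"movie not good"`.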

We added a way to check the most frequent words in the tweets, excluding the query and only counting words longer than 2 characters. Later on we also filtered the NLTK stopwords out of the most common words: the analysis was already done at that point, and these stopwords weren't relevant when looking at word frequency. We then displayed the result as a bar plot.
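The word-frequency step can be sketched like this. The function name `most_common_words` and its parameters are assumptions made for the example, and tokenisation is simplified to splitting on whitespace.

```python
from collections import Counter


def most_common_words(tweets, query, stopwords=frozenset(), top_n=10):
    """Count words across tweets, skipping the query, short words and stopwords."""
    excluded = set(query.lower().split()) | {word.lower() for word in stopwords}
    counts = Counter(
        word
        for tweet in tweets
        for word in tweet.lower().split()
        if len(word) > 2 and word not in excluded
    )
    return counts.most_common(top_n)
```

The returned list of `(word, count)` pairs can be unpacked into two sequences and passed to Matplotlib's `plt.bar` to draw the bar plot.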

Lastly, we added a simple GUI so that it is clear to the user where to enter the query.

Our assignment was to make an algorithm using machine learning, and although TextBlob is a good tool, it is lexicon-based rather than trained on data, so it doesn't cover our needs for this assignment.

Graphical User Interface

The GUI was made with the PySimpleGUI library, and this Stack Overflow answer was very helpful.

Requirements 🛠️

For this algorithm to work you need to have Python installed on your computer, as well as the following libraries:

Install libraries using pip

To install the libraries using pip, run the following commands one by one:

  • Tweepy: pip install tweepy
  • Matplotlib: pip install matplotlib
  • Pandas: pip install pandas
  • TextBlob: pip install -U textblob, followed by python -m textblob.download_corpora to download the necessary NLTK corpora.
  • WordCloud: pip install wordcloud
  • Better Profanity: pip install better_profanity
  • PySimpleGUI: pip install pysimplegui
  • NLTK: pip install nltk
  • Collections: part of the Python standard library, so no pip install is needed

Twitter Sentiment Analysis using Logistic Regression

The algorithm is heavily based on Kate Arbuzova's code from this tutorial.

The dataset used for this method can be found on Kaggle.

Our modifications and thoughts

Our first modification to Kate's code was to only look at the Logistic Regression methods she used.

We also increased the number of features to 10,000, which in hindsight was probably a bad move.
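To show where a feature limit like this typically appears, here is a minimal scikit-learn sketch. The choice of `TfidfVectorizer`, the pipeline layout and the toy data are assumptions for illustration and may differ from the tutorial's code, which trains on the Kaggle dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data for illustration only (1 = positive, 0 = negative).
texts = [
    "i love this",
    "what a great day",
    "so happy today",
    "i hate this",
    "what a terrible day",
    "so sad today",
]
labels = [1, 1, 1, 0, 0, 0]

# max_features caps the vocabulary size, i.e. the number of input features.
model = make_pipeline(
    TfidfVectorizer(max_features=10_000),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)
predictions = model.predict(["great happy day", "terrible sad day"])
```

A lower `max_features` value gives the classifier fewer columns to fit, which is one of the simpler knobs to turn when training takes too long.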

Then we commented out a lot of code to reduce the amount of output the program prints.

The runtime was extremely long, so we recommend scaling everything down, for example by using fewer features or a smaller part of the dataset.

Requirements 🛠️

For this algorithm to work you need to have Python installed on your computer, as well as the following libraries:

Install libraries using pip

To install the libraries using pip, run the following commands one by one:

  • Scikit-learn: pip install scikit-learn
  • SciPy: pip install scipy
  • NLTK: pip install nltk
  • Statsmodels: pip install statsmodels
  • Emoji: pip install emoji
  • Regex: pip install regex
  • Spacy: pip install spacy
  • TQDM: pip install tqdm
  • Matplotlib: pip install matplotlib
  • Pandas: pip install pandas
  • Pickle: part of the Python standard library, so no pip install is needed
  • Seaborn: pip install seaborn
