Sentiment Analysis Benchmark

A scientific benchmark comparing the performance of NLP sentiment analysis models on small to medium datasets


Authors: Ihab Bendidi, Yousra Bourkiche, Clément Siegrist, Kaouter Berrahal

In general, documents with similar sentiments should be close to each other in the embedding feature space. This proximity can serve as another way to judge the performance of sentiment analysis models.
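
As a hedged illustration of that idea (not code from the notebook), one way to quantify how well embeddings separate by sentiment is a silhouette score computed over the sentiment labels; the embeddings and labels below are placeholders.

```python
# Illustrative sketch: measure how well document embeddings cluster by sentiment.
import numpy as np
from sklearn.metrics import silhouette_score

# Placeholder inputs: one embedding vector per document and its sentiment label.
embeddings = np.random.rand(100, 50)        # e.g. doc2vec vectors
labels = np.random.randint(0, 2, size=100)  # 0 = negative, 1 = positive

# Higher score -> documents sharing a sentiment sit closer together in the space.
score = silhouette_score(embeddings, labels, metric="cosine")
print(f"Silhouette score over sentiment labels: {score:.3f}")
```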

In this work, we benchmark recent sentiment analysis models, reproduce their results, and compare their performance against baseline methods.

Outline

The following work is done in a Jupyter notebook, which you can find here, or open in Colab here.

I - Processing & Exploratory Data Analysis

  • Understanding the data
  • Text Preprocessing
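
As a rough illustration of the preprocessing step (the notebook's exact pipeline may differ), tweets are typically lowercased and stripped of URLs, mentions and punctuation before being fed to the models:

```python
# Hypothetical cleaning helper for tweets; steps and regexes are illustrative.
import re

def clean_tweet(text: str) -> str:
    text = text.lower()
    text = re.sub(r"http\S+|www\.\S+", "", text)  # drop URLs
    text = re.sub(r"@\w+|#", "", text)            # drop mentions and '#' signs
    text = re.sub(r"[^a-z\s]", " ", text)         # keep letters only
    return re.sub(r"\s+", " ", text).strip()      # collapse whitespace

print(clean_tweet("Loving this! @friend check https://example.com #happy"))
# -> "loving this check happy"
```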

II - Sentiment classification models

  • BERT model
  • LSTM recurrent model
  • Baseline method: TextBlob
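
For the TextBlob baseline, a minimal sketch looks like the following; thresholding the polarity at 0 is an assumption here, and the notebook may use a different cut-off.

```python
# Minimal TextBlob baseline: polarity in [-1, 1], thresholded into two classes.
from textblob import TextBlob

def textblob_sentiment(text: str) -> str:
    polarity = TextBlob(text).sentiment.polarity
    return "positive" if polarity >= 0 else "negative"

print(textblob_sentiment("I really enjoyed this flight"))  # positive
print(textblob_sentiment("Worst customer service ever"))   # negative
```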

III - Document Embeddings

  • Training doc2vec
  • Doc2vec sentiment classifier
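
A hedged sketch of this part, using gensim's Doc2Vec and a logistic regression on top of the inferred vectors; the corpus and hyperparameters below are illustrative, not the notebook's exact settings.

```python
# Train doc2vec on tokenised documents, then fit a classifier on the vectors.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.linear_model import LogisticRegression

texts = [["great", "flight", "friendly", "crew"],
         ["delayed", "again", "terrible", "service"]]
labels = [1, 0]  # 1 = positive, 0 = negative

corpus = [TaggedDocument(words=tokens, tags=[i]) for i, tokens in enumerate(texts)]
model = Doc2Vec(vector_size=50, min_count=1, epochs=40)
model.build_vocab(corpus)
model.train(corpus, total_examples=model.corpus_count, epochs=model.epochs)

# One embedding per document, used as features for the sentiment classifier.
features = [model.infer_vector(tokens) for tokens in texts]
clf = LogisticRegression().fit(features, labels)
print(clf.predict([model.infer_vector(["awful", "delayed", "flight"])]))
```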

IV - Model performance visualisation

  • BERT model
  • LSTM model
  • Logistic regression model
  • TextBlob
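
One common way to produce such visualisations (not necessarily the notebook's exact method) is to project the document embeddings to 2D with t-SNE and colour the points by sentiment, so that well-separated clusters indicate a more discriminative model; the data below is a placeholder.

```python
# Illustrative sketch: 2D projection of document embeddings, coloured by label.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

embeddings = np.random.rand(200, 50)        # placeholder document embeddings
labels = np.random.randint(0, 2, size=200)  # placeholder sentiment labels

points = TSNE(n_components=2, perplexity=30).fit_transform(embeddings)
plt.scatter(points[:, 0], points[:, 1], c=labels, cmap="coolwarm", s=10)
plt.title("Document embeddings coloured by sentiment")
plt.show()
```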

You can also find a PDF report with the code here.

Installation

This was tested on Ubuntu 20.04 with Python 3.7, but should run on any platform with any Python 3 version.

Before running it, make sure to install the dependencies by running in a terminal:

pip install -r requirements.txt

On Google Colab, you would need to upload the requirements.txt file and the tweets.csv dataset to your Colab session, for example as shown below.
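
A minimal Colab cell for that, assuming you keep the file names requirements.txt and tweets.csv used in this repository:

```python
# Upload the files through the Colab file picker, then install the dependencies.
from google.colab import files

uploaded = files.upload()          # select requirements.txt and tweets.csv
!pip install -r requirements.txt   # notebook shell command, valid in a Colab cell
```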