
EdenBD/lyrics-sentiment


LyricsAudioBoost

Combining BERT and Spotify Valence Feature For Track Sentiment Analysis

In this notebook I combine a Spotify audio feature with BERT word embeddings to predict track sentiment. I use the Hugging Face pre-trained BERT transformer as an embedding layer and train an additional bidirectional GRU layer for the sentiment analysis regression task (a point prediction in the range [0, 1]). To train this fine-tuning layer I use the Spotify valence attribute, which I added to a lyrics dataset as the label.
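The architecture described above can be sketched roughly as follows. This is a minimal PyTorch sketch, not the notebook's actual code: the class name `GRUValenceHead`, the GRU hidden size of 128, the sigmoid output, and the MSE loss are all assumptions chosen for illustration. A random tensor stands in for the BERT output (768 is the hidden size of BERT-base), so the block runs without downloading any weights.

```python
import torch
from torch import nn


class GRUValenceHead(nn.Module):
    """Hypothetical head: bidirectional GRU + linear layer producing one score in [0, 1]."""

    def __init__(self, embedding_dim: int = 768, hidden_dim: int = 128):
        super().__init__()
        self.gru = nn.GRU(embedding_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, 1)

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # embeddings: (batch, seq_len, embedding_dim), e.g. BERT's last hidden state
        _, h_n = self.gru(embeddings)               # h_n: (2, batch, hidden_dim)
        final = torch.cat([h_n[0], h_n[1]], dim=1)  # forward + backward final states
        # Sigmoid keeps the regression output inside [0, 1], like Spotify valence
        return torch.sigmoid(self.out(final)).squeeze(-1)


torch.manual_seed(0)
head = GRUValenceHead()

# Stand-in for BERT embeddings of 4 lyrics, 16 tokens each (random, for illustration only)
fake_embeddings = torch.randn(4, 16, 768)
# Illustrative valence labels in [0, 1]
target_valence = torch.tensor([0.8, 0.9, 0.18, 0.5])

predictions = head(fake_embeddings)
loss = nn.functional.mse_loss(predictions, target_valence)
loss.backward()  # gradients flow only into the GRU head in this sketch
```

In a full pipeline, `fake_embeddings` would be replaced by the output of the pre-trained BERT model for the tokenized lyrics.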

Motivation:

The examples below use the NLTK demo and Spotify valence to measure a track's positiveness. They show that relying strictly on audio OR lyrics can be inaccurate.

  1. Positive Sentiment Example: Baz Luhrmann - Everybody's Free To Wear Sunscreen.
    • NLTK sentiment classification: Negative.
    • Spotify Valence: 0.8.
  2. Negative Sentiment Example: Otis Redding - Mr. Pitiful.
    • NLTK sentiment classification: Negative.
    • Spotify Valence: 0.9.

Steps to build model:

  1. Database: gathering song lyrics, adding the Spotify valence attribute, and pre-processing. I uploaded the final 150K Lyrics Labeled with Spotify Valence Dataset to Kaggle.
  2. Model Design: Iteratively improved model capacity.
  3. Evaluation: loss and accuracy metrics across 3 buckets - negative, neutral and positive sentiments.
  4. Interpretation: Understanding what the model is learning using word clouds.
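The bucketed evaluation in step 3 could look something like the sketch below. The bucket thresholds (1/3 and 2/3), the use of mean absolute error as the loss, and the function names `bucket` and `per_bucket_metrics` are assumptions for illustration; the notebook may define them differently. Accuracy here means the prediction falls in the same bucket as the label.

```python
from collections import defaultdict


def bucket(valence, low=1 / 3, high=2 / 3):
    """Map a score in [0, 1] to a sentiment bucket (thresholds are assumed)."""
    if valence < low:
        return "negative"
    if valence < high:
        return "neutral"
    return "positive"


def per_bucket_metrics(targets, predictions):
    """Return count, mean absolute error, and bucket accuracy per target bucket."""
    stats = defaultdict(lambda: {"count": 0, "abs_err": 0.0, "hits": 0})
    for target, pred in zip(targets, predictions):
        entry = stats[bucket(target)]
        entry["count"] += 1
        entry["abs_err"] += abs(target - pred)
        entry["hits"] += bucket(pred) == bucket(target)
    return {
        name: {
            "count": s["count"],
            "mae": s["abs_err"] / s["count"],
            "accuracy": s["hits"] / s["count"],
        }
        for name, s in stats.items()
    }


# Illustrative labels and predictions, not results from the notebook
targets = [0.8, 0.9, 0.18, 0.5]
predictions = [0.7, 0.4, 0.76, 0.55]
metrics = per_bucket_metrics(targets, predictions)
for name, values in metrics.items():
    print(name, values)
```

Reporting metrics per bucket makes it visible when a model does well on average but poorly on, say, negative tracks.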

Example:

Words in the word cloud are sized by how much each one changes the model's prediction, and colored by whether that influence is positive (green) or negative (red).

Positive Sentiment Example: Armin van Buuren - Blah Blah Blah.

  • NLTK sentiment classification: Negative.
  • Spotify Valence: 0.18.
  • LyricsAudioBoost Model: 0.76.

Model interpretation - Word cloud
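One way to obtain per-word sizes and colors for such a word cloud is leave-one-out occlusion: remove a word, re-run the model, and record how the prediction changes. The sketch below assumes that approach; the source does not say how the differences are computed. `toy_predict` is a hypothetical stand-in for the trained model, and its word weights are invented for illustration.

```python
def word_influences(words, predict):
    """Influence of each distinct word = full prediction minus prediction without it."""
    base = predict(words)
    influences = {}
    for word in sorted(set(words)):
        reduced = [w for w in words if w != word]
        influences[word] = base - predict(reduced)
    return influences


def toy_predict(words):
    """Hypothetical stand-in for the model: fixed weights for a few words."""
    weights = {"love": 0.3, "cry": -0.3}
    score = 0.5 + sum(weights.get(w, 0.0) for w in sorted(set(words)))
    return min(1.0, max(0.0, score))


lyrics = "i love you but i cry".split()
influences = word_influences(lyrics, toy_predict)

for word, delta in influences.items():
    # Size by magnitude of the change, color by its sign
    color = "green" if delta > 0 else "red" if delta < 0 else "grey"
    print(f"{word}: size={abs(delta):.2f}, color={color}")
```

With a real model, `toy_predict` would be replaced by a function that tokenizes the word list and returns the model's score.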

About

NLP Task: Lyrics Sentiment Analysis using Spotify & Transformers
