# NLP Task: Lyrics Sentiment Analysis using Spotify & Transformers

#### In this tutorial I implement a BERT transformer with a bi-directional GRU fine-tuning layer to estimate sentiment of lyrical data. The model outputs a real number estimate between 0-1 (extremely negative to positive). 

##### You can try its predictions on your favorite song's lyrics :D

By using the pre-trained BERT transformer from [hugginface transformers library](https://github.com/huggingface/transformers) as an embedding layer, we only have to train an additional GRU layer for the sentiment analysis, regression task (outputting a point prediction instead of a class). To train the fine-tunning layer of the model, I use Spotify valence attribute on a lyrics datset. 

#### References: 
1. Pre-processing and loading a custom dataset, gaining a better understanding of hugginface: [BERT Fine-Tuning Tutorial with PyTorch By Chris McCormick and Nick Ryan](https://colab.research.google.com/drive/1pTuQhug6Dhl9XalKB0zUGf4FIdYFlpcX). 
2. Training the fine-tunning layer: [PyTorch Sentiment Analysis by Ben Trevett](https://github.com/bentrevett/pytorch-sentiment-analysis/blob/master/6%20-%20Transformers%20for%20Sentiment%20Analysis.ipynb).
3. [6.864 Natural Language Processing Spring 2020](https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-864-advanced-natural-language-processing-fall-2005/) material. 

## Step 1: Import Libraries and Define Constants

I processed the dataset in [this notebook](https://colab.research.google.com/drive/17NWbYNiSXYfoCipbn9qXmkIL1SvCFOah):
1. Got songs lyrics from a Kaggle database (with columns: Band, Lyrics, Song).
2. Queried Spotify for each song [valence](https://developer.spotify.com/documentation/web-api/reference/tracks/get-audio-features/) (as a measure of positiveness).
3. Integrated to one dataframe, where each songs has a corresponding valence value, without nulls.


In [0]:
%%bash
# Check colab and silent output if there are no errors
!(stat -t /usr/local/lib/*/dist-packages/google/colab > /dev/null 2>&1) && exit 
# Clone Github repository 
rm -rf lyrics-sentiment
git clone https://github.com/EdenBD/lyrics-sentiment.git

In [0]:
# To open a function from a different notebook
!pip install ipynb
# To use pre-trained BERT models
!pip install transformers
# Get needed function from GitHub repository 
%cd lyrics-sentiment
from ipynb.fs.defs.Spotify_Dataset import get_spotify_valence
%cd ..
# To download my model from the drive
!pip install -U -q PyDrive

In [35]:
import pandas as pd
import numpy as np
import torch

import os
import random

# For deterministic results. 
SEED = 0

random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.backends.cudnn.deterministic = True

# Mount google Drive
from google.colab import drive
drive.mount('/content/gdrive/')



Drive already mounted at /content/gdrive/; to attempt to forcibly remount, call drive.mount("/content/gdrive/", force_remount=True).


#### Notebook Constants: 

If you would like to train and save your own model, approve mounting your drive and change `DRIVE_FOLDER` to your desired directory name.

The `DATASET_SIZE` is small for a shorter training cycle (took me around 3 hours without GPU), but will have less optimal results. 

In [0]:
STR_PRINT_BOUND = 600

# For supervised training of lyrics sentiments.
# below this bound, the song is considered as negative.
LOW_VALENCE_BOUND = 0.1
# above this bound, the song is considered as positive.
HIGH_VALENCE_BOUND = 0.9
# in between these bounds, the song is considered as neutral.
NEUTRAL_LOWER_BOUND = 0.3
NEUTRAL_UPPER_BOUND = 0.5

# Out of the big lyrics database, use only a few rows to decrease trainig & evaluation cycles.
DATASET_SIZE = 1000

# Split dataset to training, validation and test according to these values.
TRAIN_SIZE = 0.7
VALIDATION_SIZE = 0.15

# Model hyperparameters

HIDDEN_DIM = 256
OUTPUT_DIM = 1
N_LAYERS = 2
BIDIRECTIONAL = True
DROPOUT = 0.25
BATCH_SIZE = 32 

# Define used paths. 

# Update DRIVE_FOLDER to your gdrive folder.
DRIVE_FOLDER = "6864"
DRIVE_DIR = "/content/gdrive/My Drive"
ROOT_DIR = os.path.join(DRIVE_DIR,DRIVE_FOLDER)

# Save path for best epoch model. 
MODEL_NAME = "songs-model-mse.pt"
SAVE_PATH = os.path.join(ROOT_DIR,MODEL_NAME)
DATASET_FILENAME = "labeled_lyrics_cleaned.csv"
READY_MODEL_ID= "15iyOz7OR-0QWlCLq5PiW2agsBOgroHMU"

# How far off an absolute difference between label and predictions counts as correct.
ACCURACY_THRESHOLD = 0.35

## Step 2: Prepare Pytorch Dataloader

I processed the labeled lyrics dataset in [this notebook](https://colab.research.google.com/drive/17NWbYNiSXYfoCipbn9qXmkIL1SvCFOah):
1. Got songs lyrics from [250K+ lyrics Kaggle database](https://www.kaggle.com/detkov/lyrics-dataset/metadata).
2. Queried Spotify for each song [valence](https://developer.spotify.com/documentation/web-api/reference/tracks/get-audio-features/) (as a measure of positiveness).
3. Integrated to one dataframe, where each songs has a corresponding valence value, without nulls.


In [41]:

# Load lyrics with sentiment file.
dataset_path = os.path.join(ROOT_DIR,DATASET_FILENAME)

df = pd.read_csv(dataset_path, error_bad_lines=False)
df = df.loc[:, ~df.columns.str.contains('^Unnamed')]

# Dataset size.
print('Number of training sentences: {:,}\n'.format(df.shape[0]))

# Display 10 random rows from dataset.
df.sample(10)

Number of training sentences: 158,353



Unnamed: 0,artist,seq,song,label
54246,Bad Religion,Three thousand miles of wilderness overcome by...,Against the Grain,0.753
57531,Anahi,El sabor del viento y tu amanecer\r\nRompe la ...,Arena Y Sol,0.771
65329,XTC,"Do something for me, boys \r\nIf I should die ...",All You Pretty Girls [Home Demo],0.961
123663,Avril Lavigne,"Hey, hey,\r\nYou, you,\r\nI don't like your gi...",Girlfriend [The Submarines' Time Warp '66 Mix],0.839
110284,Tom Waits,"My time went so quickly, \r\nI went lickety-sp...",Ol' 55,0.336
61151,Elvis Presley,Oh yes I've got a lot o' living to do\r\nA who...,Got a Lot O' Livin to Do!,0.962
107220,Jethro Tull,"Salamander, \r\nBorn in the sun-kissed flame.\...",Salamander,0.592
8348,Brian McKnight,[Nelly] \r\nLook\r\nShit just ain't the same\r...,All Night Long,0.75
135843,Neil Finn,Totally wired and the game is up\r\nI'm under ...,Rest of the Day Off,0.634
12215,Billy Stritch,It starts with one thing\r\nI don't know why\r...,Breezin' Along with the Breeze/Live Alone and ...,0.475


You can check Spotify sentiment on your choice of artist and song. 
The Kaggle dataset contains +2K artists, uploaded in October 2019. 


In [42]:
artist_name = "guetta"
song_title = "sun"

df[(df.artist.str.contains(artist_name, case=False)) & (df.label > 0.5) & (df.song.str.contains(song_title, case=False))]

Unnamed: 0,artist,seq,song,label
96293,David Guetta,"Oh wooh\nOh gonna break it, break it, break it...",Sun Goes Down,0.695
96298,David Guetta,"Let's light it up, let's light it up\r\nUntil ...",Lovers on the Sun,0.568


Out of this dataset, I took a diversified sample of size `DATASET_SIZE`, to decrease the training cycle time. 

In [43]:
neutral = (df['label'] > NEUTRAL_LOWER_BOUND) & (df['label'] < NEUTRAL_UPPER_BOUND)
positive = (df['label'] > HIGH_VALENCE_BOUND)
negative = (df['label'] < LOW_VALENCE_BOUND)

size_each_part = int(DATASET_SIZE/3)
positive_df, negative_df, neutral_df = df[positive][:size_each_part], df[negative][:size_each_part], df[neutral][:size_each_part]

diversified_df = pd.concat([positive_df, negative_df,neutral_df], axis=0)

# Suffle dataframe rows and drop previous indices column
diversified_df = diversified_df.sample(frac=1).reset_index(drop=True)

lyrics = diversified_df.seq.values
labels = diversified_df.label.values

print("Diversified df example rows: \n")
print(diversified_df[:10])
print("Length of diversified dataset=", len(diversified_df))

Diversified df example rows: 

            artist  ...   label
0    Liza Minnelli  ...  0.0956
1      Dean Martin  ...  0.9760
2    Liza Minnelli  ...  0.9330
3  Ella Fitzgerald  ...  0.3660
4    Dead or Alive  ...  0.9650
5        Dead Moon  ...  0.4300
6          DJ Bobo  ...  0.9670
7     Julee Cruise  ...  0.0343
8             Ella  ...  0.4330
9     Tony Bennett  ...  0.0812

[10 rows x 4 columns]
Length of diversified dataset= 999


We must tokenize the lyrical dataset, since this is what the BERT model expects as input. 
You can tokenize any text by loading the tokenizer of the imported bert model. 

In [44]:
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True)

HBox(children=(IntProgress(value=0, description='Downloading', max=231508, style=ProgressStyle(description_wid…




Checking properties of Tokenizer on our dataset.

In [45]:
# Checking number of tokens in imported vocabulary. 
print("Voabulary size of Tokenizer: ",len(tokenizer.vocab),end='\n\n')

# Print the original lyrics.
print('Song lyrics: ')
print(lyrics[0][:STR_PRINT_BOUND],end='\n\n')

# Print the lyrics split into tokens.
print('Tokenized lyrics: ', tokenizer.tokenize(lyrics[0]),end='\n\n')

# Print the lyrics mapped to token ids.
token_ids = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(lyrics[0]))
print('Tokenized IDs: ', token_ids,end='\n\n')

# Check out of vocabulary words in lyrics
print('Precentage of Unknowns: ', token_ids.count(tokenizer.unk_token_id)/len(token_ids)*100, '%',end='\n\n')

# Lyrics Sentiment
print('Label: ', labels[0])

Voabulary size of Tokenizer:  30522

Song lyrics: 
It had to be you, it had to be you;
I wandered around, and finally found - the somebody who
Could make me be true, could make me be blue;
And even be glad, just to be sad, thinking of you.

Some others I've seen, might never be mean;
Might never be cross, or try to be boss,
But they wouldn't do.
For nobody else, gave me a thrill - with all your faults, I love you still.
It had to be you, wonderful you;
It had to be you.

Tokenized lyrics:  ['it', 'had', 'to', 'be', 'you', ',', 'it', 'had', 'to', 'be', 'you', ';', 'i', 'wandered', 'around', ',', 'and', 'finally', 'found', '-', 'the', 'somebody', 'who', 'could', 'make', 'me', 'be', 'true', ',', 'could', 'make', 'me', 'be', 'blue', ';', 'and', 'even', 'be', 'glad', ',', 'just', 'to', 'be', 'sad', ',', 'thinking', 'of', 'you', '.', 'some', 'others', 'i', "'", 've', 'seen', ',', 'might', 'never', 'be', 'mean', ';', 'might', 'never', 'be', 'cross', ',', 'or', 'try', 'to', 'be', 'bo

Perform tokenization on the entire dataset:
1. Convert words to voabulart indices.
2. Add special tokens to the start [CLS] and end of each song lyrics [SEP].
3. Pad & truncate all songs' lyrics to BERT max length (512 tokens).
4. Use attention masks to differentiate real tokens from padding tokens.

In [46]:
# Tokenize all of the sentences and map the tokens to their word IDs.
input_ids = []
attention_masks = []

# For every song:
for lyric in lyrics:

    #   Return a dictionary containing the encoded lyrics, after tokenization and mapping tokens to their vocabulary IDs.
    #   Add special tokens [CLS], [SEP], [PAD].
    #   Pad or truncate the sentence to max_length. 
    #   Create attention masks for [PAD] tokens.
    encoded_dict = tokenizer.encode_plus(
                        lyric,                          # One song to encode.
                        add_special_tokens = True,      # Add '[CLS]' and '[SEP]'
                        max_length = tokenizer.max_len, # Pad & truncate all sentences.
                        pad_to_max_length = True,
                        return_attention_mask = True,   # Construct attn. masks.
                        return_tensors = 'pt',          # Return pytorch tensors.
                   )
    
    # Add the encoded sentence to the list.    
    input_ids.append(encoded_dict['input_ids'])
    
    # And its 1, 0 attention mask (0 for padding indices).
    attention_masks.append(encoded_dict['attention_mask'])

# Convert the lists into tensors.
input_ids = torch.cat(input_ids, dim=0)
attention_masks = torch.cat(attention_masks, dim=0)
labels = torch.tensor(labels)
# Check Results
print("----------------------")
print('Original: ', lyrics[1])
print("----------------------")
print('Original Label: ', labels[1].item())
print("----------------------")
print('Token IDs:', input_ids[1])
print("----------------------")
print('Attention mask:', attention_masks[1])
print("----------------------")


----------------------
Original:  I'm Alabamy bound
They'll be no heebie-jeebies hanging 'round
Just gave the meanest ticket man on earth
All I'm worth to put my tootsies in an upper berth
Just hear the choo-choo sound
I know that soon we're gonna cover ground
And then I'll holler so the world will know
Here I go
I'm Alabamy bound
I'm Alabamy bound
They'll be no heebie-jeebies hanging 'round
Just gave the meanest ticket man on earth
All I'm worth to put my tootsies in an upper berth
Just hear the choo-choo sound
I know that soon we're gonna cover the ground
And then I'll holler so the world will know
Here I go
I'm Alabamy...
I'm Alabamy bound
I'm gone
----------------------
Original Label:  0.976
----------------------
Token IDs: tensor([  101,  1045,  1005,  1049, 21862,  3676,  8029,  5391,  2027,  1005,
         2222,  2022,  2053, 18235, 11283,  1011, 15333, 15878,  3111,  5689,
         1005,  2461,  2074,  2435,  1996,  2812,  4355,  7281,  2158,  2006,
       

Split data to Train, Validation, Test and create a Pytorch Iterator.

In [47]:
from torch.utils.data import TensorDataset, random_split, DataLoader, RandomSampler, SequentialSampler

# Combine the inputs into a TensorDataset.

dataset = TensorDataset(input_ids, attention_masks, labels)

# Split train-validation-test sizes.
train_size = int(TRAIN_SIZE * len(dataset))
val_size = int(VALIDATION_SIZE * len(dataset))
test_size = len(dataset) - train_size - val_size

# Divide the dataset randomly.
train_dataset, val_dataset, test_dataset = random_split(dataset, [train_size, val_size, test_size])

batch_size = BATCH_SIZE

# Create DataLoaders.
train_dataloader = DataLoader(
            train_dataset,  
            sampler = RandomSampler(train_dataset), # Select batches randomly
            batch_size = batch_size 
        )

validation_dataloader = DataLoader(
            val_dataset,
            sampler = SequentialSampler(val_dataset), # Pull batches sequentially.
            batch_size = batch_size 
        )

test_dataloader = DataLoader(
            test_dataset,
            sampler = SequentialSampler(test_dataset), # Pull batches sequentially.
            batch_size = batch_size 
        )

print('{} training samples'.format(train_size))
print('{} validation samples'.format(val_size))
print('{} test samples'.format(test_size))

699 training samples
149 validation samples
151 test samples


## Step 3: Define the Model

1. Load the same model as with the tokenizer. 
2. Build an extra layer for the sentiment task.  

To have more control on the fine-tuning final layer, I decided to define my own class. However, there are many ready-to-use, task dependent BERT such as [BertForSequenceClassification](https://huggingface.co/transformers/v2.2.0/model_doc/bert.html#bertforsequenceclassification), which adds a single linear layer on top of the original [BERT model](https://huggingface.co/transformers/v2.2.0/model_doc/bert.html#transformers.Bert). 

In [48]:
from transformers import BertModel

bert = BertModel.from_pretrained('bert-base-uncased') # consists of 12 Transformer layers

HBox(children=(IntProgress(value=0, description='Downloading', max=433, style=ProgressStyle(description_width=…




HBox(children=(IntProgress(value=0, description='Downloading', max=440473133, style=ProgressStyle(description_…




Instead of training an embedding layer, I will use the pre-trained bert model weigths and freeze them during fine-tuning. 

The fine-tuning layers consists of a bidirectional [GRU](https://pytorch.org/docs/stable/nn.html#gru) and linear prediction layer for sentiment output. As stated in the [paper](https://arxiv.org/pdf/1810.04805.pdf) I will only feed to the fine-tuning layers the hidden state of the final time-step.

In [0]:
import torch.nn as nn

class BERTLyricalSentimentGRU(nn.Module):
    def __init__(self, bert, hidden_dim, output_dim, n_layers, bidirectional, dropout):
        
      super().__init__()
      
      self.bert = bert
      
      embedding_dim = bert.config.to_dict()['hidden_size'] # The hidden embedding size that bert outputs
      
      self.gru = nn.GRU(embedding_dim,
                        hidden_dim,
                        num_layers = n_layers,
                        bidirectional = bidirectional,
                        batch_first = True,
                        dropout = 0 if n_layers < 2 else dropout)
      
      self.linear = nn.Linear(hidden_dim * 2 if bidirectional else hidden_dim, output_dim)
      
      self.dropout = nn.Dropout(dropout)
      
    def forward(self, text, attention_masks):
      """
        Args:
          text(tensor): list of tokens idx, of shape (batch_size, max_sentence_len)
          attention_masks(tensor): marks padding tokens.
      """

      # print("text.shape=",text.shape) # (batch_size, max_sentence_len)
      # Freezed embedding weights        
      with torch.no_grad():
          embedded = self.bert(input_ids=text, attention_mask=attention_masks)[0] 
              
      # print("embedded.shape=",embedded.shape) #(batch_size, max_sentence_len, embed_dim)
      
      _, hidden = self.gru(embedded) # Do not need the outputs of the last (depth-wise) layer 
      # print("hidden.shape before GRU=",hidden.shape) #(n_layers * n_directions, batch_size, embed_dim)
      
      # Use final time-step of the last GRU layer as linear layer input.
      if self.gru.bidirectional:
          hidden = self.dropout(torch.cat((hidden[-2,:,:], hidden[-1,:,:]), dim = 1)) 
      else:
          hidden = self.dropout(hidden[-1,:,:])

      # print("hidden.shape after GRU=",hidden.shape) #(batch_size, hidden_dim *2 OR hidden_dim)
      
      output = self.linear(hidden)        
      
      # print("output.shape=",output.shape) #(batch_size, linear_dim)     
      
      return output

In [0]:
model = BERTLyricalSentimentGRU(bert,
                         HIDDEN_DIM,
                         OUTPUT_DIM,
                         N_LAYERS,
                         BIDIRECTIONAL,
                         DROPOUT)

Freeze BERT parameters and check learnable parameters that will be updated dduring training.

In [0]:
# Freeze parameters of bert model
for name, param in model.named_parameters():                
    if name.startswith('bert'):
        param.requires_grad = False

In [52]:
for name, param in model.named_parameters():                
    if param.requires_grad:
        print(name)

gru.weight_ih_l0
gru.weight_hh_l0
gru.bias_ih_l0
gru.bias_hh_l0
gru.weight_ih_l0_reverse
gru.weight_hh_l0_reverse
gru.bias_ih_l0_reverse
gru.bias_hh_l0_reverse
gru.weight_ih_l1
gru.weight_hh_l1
gru.bias_ih_l1
gru.bias_hh_l1
gru.weight_ih_l1_reverse
gru.weight_hh_l1_reverse
gru.bias_ih_l1_reverse
gru.bias_hh_l1_reverse
linear.weight
linear.bias


## Step 4: Train the Model

Define an optimizer and [loss function](https://learn-pytorch.oneoffcoder.com/loss.html).

In [0]:
import torch.optim as optim

# The call to model.parameters() contains the learnable parameters.
optimizer = optim.Adam(model.parameters())

I chose MSE since the model outputs a point prediction - a particular sentiment, and not a probability distribution. 
For a more numerically stable result I apply Sigmoid after the forward pass, as part of the loss computation. 

In [0]:
# criterion = nn.BCEWithLogitsLoss() # Combines Softmax with Binary Cross Entropy (BCE) loss

sigmoid = nn.Sigmoid()

criterion = nn.MSELoss() # Since we output a single point prediction output and not a probability distribution over all possible classes.

Check if any GPUs are available, and if available, put the model and criterion onto the GPU.

In [55]:
# If there's a GPU available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('There are {} GPU(s) available'.format(torch.cuda.device_count()))

# Convert model parameters to double
model.double()

# Fit model and criterion to Cuda
model = model.to(device)
criterion = criterion.to(device)


There are 0 GPU(s) available


Compute the accuracy of a batch to track predictions for our three different buckets: negative, neutral and positive sentiment. I use the threshold as an absolute difference that the prediciton can have from the label. 

In [0]:
def batch_accuracy(preds, labels, threshold=ACCURACY_THRESHOLD):
    """
    Args: 
      preds and labels are both (batch_size,) with values between [0-1].

    Returns a float between 0-1, represents precentage of correct predictions in batch. 
    A correct a prediction for which abs(prediction-label) <=  threshold.
    """
    # Move to CPU to be able to use numpy.
    preds = preds.detach().cpu().numpy()
    labels = labels.to('cpu').numpy()
    # Filter all the correct predictions, corrrect are close enough to the labels.
    correct_mask = np.isclose(preds, labels, rtol=threshold, atol=threshold)
    # Get indices of correct predictions.
    correct_indices = correct_mask.nonzero()
    correct_labels = labels[correct_indices]
    # Precentage of correct sentiment out of all the labels with this sentiment range.  
    correct_negative = np.count_nonzero(correct_labels < threshold)/len(labels[labels < threshold])
    correct_positive = np.count_nonzero(correct_labels > 1-threshold)/len(labels[labels > 1-threshold])
    correct_neutral = np.count_nonzero((threshold <= correct_labels) & (correct_labels <= 1-threshold))/len(
        labels[(threshold <= labels) & (labels <= 1-threshold)])

    return np.sum(correct_mask) / len(labels), correct_negative, correct_positive, correct_neutral

Most of the training code is taken from the above mentioned [PyTorch Sentiment Analysis by Ben Trevett](https://github.com/bentrevett/pytorch-sentiment-analysis). 

In [0]:
def train(model, iterator, optimizer, criterion):
    
    epoch_loss = 0
    epoch_acc = 0
    epoch_negative, epoch_positive, epoch_neutral = 0, 0, 0
    
    model.train()
    
    for batch_id, batch in enumerate(iterator):
        optimizer.zero_grad()
        
        # batch contains three pytorch tensors:
        b_input_ids = batch[0].to(device)
        b_input_mask = batch[1].to(device)
        b_labels = batch[2].to(device)
        
        predictions = model(b_input_ids, attention_masks=b_input_mask).squeeze(1)
        
        loss = criterion(sigmoid(predictions), b_labels)
        
        acc, correct_negative, correct_positive,correct_neutral = batch_accuracy(predictions, b_labels)
        
        loss.backward()
        
        optimizer.step()

        print("Trained batch number: {}   | loss: {:.2f} | accuracy: {:.2f}".format(batch_id, loss.item(), acc.item()))
        print("Predicted correctly:    {:.2f}% negatives | {:.2f}% positives | {:.2f}% neutral".format(
            correct_negative*100, correct_positive*100, correct_neutral*100))

        epoch_loss += loss.item()
        epoch_acc += acc.item()
        epoch_negative += correct_negative
        epoch_positive += correct_positive
        epoch_neutral += correct_neutral

    return epoch_loss / len(iterator), epoch_acc / len(iterator), epoch_negative/ len(iterator), epoch_positive/ len(iterator), epoch_neutral/ len(iterator)

In [0]:
def evaluate(model, iterator, criterion):
    
    epoch_loss = 0
    epoch_acc = 0
    epoch_negative, epoch_positive,epoch_neutral = 0, 0, 0

    
    model.eval()
    
    with torch.no_grad():
    
        for batch_id, batch in enumerate(iterator):

            # batch contains three pytorch tensors:
            b_input_ids = batch[0].to(device)
            b_input_mask = batch[1].to(device)
            b_labels = batch[2].to(device)
            
            predictions = model(b_input_ids, attention_masks=b_input_mask).squeeze(1)            
            loss = criterion(sigmoid(predictions), b_labels)
        
            acc, correct_negative, correct_positive,correct_neutral = batch_accuracy(predictions, b_labels)

            print("Evaluated batch number: {}   | loss: {:.2f} | accuracy: {:.2f}".format(batch_id, loss.item(), acc.item()))
            print("Predicted correctly:    {:.2f}% negatives | {:.2f}% positives | {:.2f}% neutral".format(
            correct_negative*100, correct_positive*100, correct_neutral*100))

            epoch_loss += loss.item()
            epoch_acc += acc.item()
            epoch_negative += correct_negative
            epoch_positive += correct_positive
            epoch_neutral += correct_neutral

        
    return epoch_loss / len(iterator), epoch_acc / len(iterator), epoch_negative/ len(iterator), epoch_positive/ len(iterator), epoch_neutral/len(iterator)

In [0]:
import time

def epoch_time(start_time, end_time):
    elapsed_time = end_time - start_time
    elapsed_mins = int(elapsed_time / 60)
    elapsed_secs = int(elapsed_time - (elapsed_mins * 60))
    return elapsed_mins, elapsed_secs

In [0]:
# The BERT authors recommend between 2 to 4 epochs.

N_EPOCHS = 3

best_valid_loss = float('inf')

for epoch in range(N_EPOCHS):
    
    start_time = time.time()
    
    train_loss, train_acc, train_negative, train_positive, train_neutral = train(
        model, train_dataloader, optimizer, criterion)
    valid_loss, valid_acc, valid_negative, valid_positive, valid_neutral = evaluate(
        model, validation_dataloader, criterion)
        
    end_time = time.time()
        
    epoch_mins, epoch_secs = epoch_time(start_time, end_time)
        
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        print("Save Model with loss {:.2f} to {}".format(best_valid_loss, SAVE_PATH))
        save_output = open(SAVE_PATH, mode="wb")
        torch.save(model.state_dict(), save_output)
    
    print(f'Epoch: {epoch+1:02} | Epoch Time: {epoch_mins}m {epoch_secs}s')
    print(f'\t Train Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}% | Correct Neg: {train_negative*100:.2f}% Pos: {train_positive*100:.2f}% | Neutral: {train_neutral*100:.2f}%')
    print(f'\t  Val. Loss: {valid_loss:.3f} |  Val. Acc: {valid_acc*100:.2f}% | Correct Neg: {valid_negative*100:.2f}% Pos: {valid_positive*100:.2f}% | Neutral: {valid_neutral*100:.2f}%')

Trained batch number: 0 with loss: 0.08 and accuracy: 0.31
Predicted correctly 30.77% negatives, 37.50% positives and 31.73% neutral labels
Trained batch number: 1 with loss: 0.06 and accuracy: 0.31
Predicted correctly 18.18% negatives, 57.14% positives and 24.68% neutral labels
Trained batch number: 2 with loss: 0.09 and accuracy: 0.25
Predicted correctly 11.11% negatives, 50.00% positives and 38.89% neutral labels
Trained batch number: 3 with loss: 0.05 and accuracy: 0.22
Predicted correctly 0.00% negatives, 50.00% positives and 50.00% neutral labels
Trained batch number: 4 with loss: 0.05 and accuracy: 0.25
Predicted correctly 7.14% negatives, 50.00% positives and 42.86% neutral labels
Trained batch number: 5 with loss: 0.09 and accuracy: 0.19
Predicted correctly 0.00% negatives, 33.33% positives and 66.67% neutral labels
Trained batch number: 6 with loss: 0.08 and accuracy: 0.31
Predicted correctly 0.00% negatives, 50.00% positives and 50.00% neutral labels
Trained batch number: 7 

Load the best model across epochs, the one that has the lowest validation loss. 

The preset is to load a previously trained model. 
If you would like to use yours, please change `use_trained_model` to True.

Downloading the ready model will take a few minutes and will require your authentication when propmpted. 

In [0]:
# To authenticate to Google Cloud and download a ready to use model
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

def load_model(use_trained_model=False,shared_file_id=READY_MODEL_ID, output_name=MODEL_NAME, colab=True):
  # Use your trained model
  if use_trained_model:
    model.load_state_dict(torch.load(SAVE_PATH))
    print('Loaded the trained model Successfully')  

  # Use existing model
  else:
    # When run in colab need a small modification to connect to Drive.
    if colab:
      auth.authenticate_user()
      gauth = GoogleAuth()
      # Download json metadata
      gauth.credentials = GoogleCredentials.get_application_default() 
    # When run in console
    else:
      gauth = GoogleAuth()
      # Create local webserver which automatically handles authentication.
      gauth.LocalWebserverAuth() 
    # Create GoogleDrive instance with authenticated GoogleAuth instance.
    drive = GoogleDrive(gauth)
    # Initialize GoogleDriveFile instance with file id.
    file_object = drive.CreateFile({'id':shared_file_id}) 
    # Download file with name MODEL_NAME
    file_object.GetContentFile(output_name)
    print('Downloaded model Successfully')  
    model.load_state_dict(torch.load(output_name))
    print('Loaded the ready-to-use model Successfully')  

# Change here to use your model instead
load_model(use_trained_model=False)

Check model performance on test set. 

In [61]:

test_loss, test_acc, correct_negative, correct_positive, correct_neutral = evaluate(model, test_dataloader, criterion)
print("Test Loss: {:.2f} | Test Accuracy: {:.2f}%".format(test_loss, 100*test_acc))
print("--------------------------------------------------------")
print("Predicted correctly: {:.2f}% negatives | {:.2f}% positives | {:.2f}% neutral".format(correct_negative*100, correct_positive*100, correct_neutral*100))


Evaluated batch number: 0   | loss: 0.10 | accuracy: 0.53
Predicted correctly:    42.11% negatives | 87.50% positives | 40.00% neutral
Evaluated batch number: 1   | loss: 0.10 | accuracy: 0.44
Predicted correctly:    29.41% negatives | 60.00% positives | 60.00% neutral
Evaluated batch number: 2   | loss: 0.08 | accuracy: 0.59
Predicted correctly:    20.00% negatives | 90.91% positives | 63.64% neutral
Evaluated batch number: 3   | loss: 0.09 | accuracy: 0.50
Predicted correctly:    22.22% negatives | 76.92% positives | 40.00% neutral
Evaluated batch number: 4   | loss: 0.13 | accuracy: 0.39
Predicted correctly:    40.00% negatives | 44.44% positives | 25.00% neutral
Test Loss: 0.10 | Test Accuracy: 49.08%
--------------------------------------------------------
Predicted correctly: 30.75% negatives | 71.96% positives | 45.73% neutral


## Step 5: Predict unseen samples

Time to try out the model ! Test the sentiment of lyrics from the given dataset or your own. 

Before passing lyrical text through the model, we will convert the input text to vocabulary indices, add special tokens and convert it to a reshaped tensor.

In [0]:
def predict_lyrics_sentiment(model, tokenizer, lyrics):
    model.eval()
    # Turn lyrics to voacbulary indices.
    input_ids = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(lyrics))
    text = torch.LongTensor(input_ids).reshape(1,-1)

    # Create a mask for input's padding.
    mask = torch.LongTensor(np.where(input_ids == 0, 0, 1)).reshape(1,-1)

    prediction = model(text, attention_masks=mask).squeeze(1) # in place remove all size 1 in the given dimension (here 1).
    prediction = torch.sigmoid(prediction)
    return prediction.item()

#### Low valence Sample

In [63]:
negative_sample_row = diversified_df[diversified_df.label<LOW_VALENCE_BOUND].sample()
negative_sample_row_lyrics = negative_sample_row['seq'].item()
print("Low valence lyrical sentiment analysis:")
print("-----------------")
print("Name:", negative_sample_row.song.item())
print("Artist:", negative_sample_row.artist.item())
print("-----------------")
print("Lyrics extract:")
print(negative_sample_row_lyrics[:STR_PRINT_BOUND])

prediction = predict_lyrics_sentiment(model, tokenizer, negative_sample_row_lyrics)
print("-----------------")
print("Model Prediction: {:.2f}".format(prediction))
print("Spotify Label: {:.2f}".format(negative_sample_row.label.item()))


Low valence lyrical sentiment analysis:
-----------------
Name: The End of Words
Artist: Dead Can Dance
-----------------
Lyrics extract:
Murderer!
Man of fire.

Murderer!
I've seen the eyes of living dead.
It's the same game - survival.
The great mass play a waiting game.
Embalmed, crippled, dying in fear of pain.
All sense of freedom gone.

Black sun in a white world.
Like having a black sun in a white world.

I have a son,
His name is Eden.
It's his birthright,
Beyond estranged time.

Give me 69 years,
Another season in this hell.
It's all sex and death as far as I can tell.

Like Prometheus we are bound,
Chained to this rock of a brave new world,
Our godforsaken lot.
And I feel that's all we've ever 
-----------------
Model Prediction: 0.48
Spotify Label: 0.07


#### High valence Sample 

In [64]:
pos_sample_row = diversified_df[diversified_df.label>HIGH_VALENCE_BOUND].sample()
pos_sample_row_lyrics = pos_sample_row['seq'].item()

print("High valence lyrical sentiment analysis:")
print("-----------------")
print("Name:", pos_sample_row.song.item())
print("Artist:", pos_sample_row.artist.item())
print("-----------------")
print("Lyrics extract:")
print(pos_sample_row_lyrics[:STR_PRINT_BOUND])

prediction = predict_lyrics_sentiment(model, tokenizer, pos_sample_row_lyrics)
print("-----------------")
print("Model Prediction: {:.2f}".format(prediction))
print("Spotify Label: {:.2f}".format(pos_sample_row.label.item()))

High valence lyrical sentiment analysis:
-----------------
Name: Lay Back in the Arms of Someone
Artist: Juice Newton
-----------------
Lyrics extract:
If you want my sympathy,
Just open your heart to me,
And you'll get whatever you'll ever need.
You think that's too high for you,
Oh baby, I would die for you,
When there's nothin' left,
You know where I'll be.

Lay back in the arms of someone,
You give in to the charms of someone,
Lay back in the arms of someone you love.
Lay back in the arms of someone,
When you feel you're a part of someone,
Lay back in the arms of someone you love.

So baby just call on me,
When you want all of me,
And I'll be your lover I'll be your friend.
And there's nothing I won't do,
Cause baby I j
-----------------
Model Prediction: 0.77
Spotify Label: 0.91


#### Your own example

In [84]:
 
# Your favorite lyrics!
my_lyrics = """
If love is a lie, then why do we need it?
We swear we're alive, but we're falling to pieces
We fight like lions
We howl at the moon
We should be flying
Instead we bury the truth
But I know inside we're beautiful creatures (beautiful)
We're beautiful creatures
When your highs are low, keep the faith yeah
'Cause you know that a life's never wasted
Standing tall, shaking off the dust
Now we know, now we know what we're made of
We got monsters in our closets
Had a reason but we lost it
No direction, we've been calling through the night
Through the night
If love is a lie, then why do we need it?
We swear we're alive, but we're falling to pieces
We fight like lions
We howl at the moon
We should be flying
Instead we bury the truth
But I know inside we're beautiful creatures (beautiful)
"""

prediction = predict_lyrics_sentiment(model, tokenizer, my_lyrics)
print("-----------------------")
print("Model Prediction: {:.2f}".format(prediction))
print("-----------------------")


-----------------------
Model Prediction: 0.52
-----------------------


In [86]:
song_title = "beautiful creatures"
artist = "illenium"

# Get a token by pressing "Get token" here https://developer.spotify.com/console/get-audio-features-track
# Spotify_token = "YOUR_TOKEN"
Spotify_token = "BQDlH3pC0hjMnAaoO-9JfvIyHR9U67ayouKGoeJVc4difxjy4rLAzGbbduPrZFdcmXwQpniQo5d59r9l2SEjB-A1VvsBgQAL5vhi8TTw8VgMLf60LwTf-O4eCIyqvX723vEiAU9DfKgM5IlgzpT9l2IuiqUHGALKiSuG5Mk-429Q2HO4d1VFGN0itEBtPdUKIw"
print("---------------------------------------------------------------------")
spotify_label = get_spotify_valence(song_title,artist,Spotify_token)
print("---------------------------------------------------------------------")


---------------------------------------------------------------------
Found valence: 0.21 of the song: beautiful creatures - illenium
---------------------------------------------------------------------
