# Lyrics Emotions Project

## Our Data

Before working with the data, we import the libraries needed to read and manipulate it.

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
import pandas as pd

In [60]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
from sklearn.metrics import f1_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score
from sklearn import metrics
from sklearn.metrics import confusion_matrix
import numpy as np
import itertools
from nltk.tokenize import WordPunctTokenizer
import nltk

In [46]:
multi_label = pd.read_csv('/content/drive/MyDrive/MultiLabel.csv', encoding='utf-8')
multi_label.head()

| Unnamed: 0 | artist | genre | title | album | year | lyrics | labels |
|---|---|---|---|---|---|---|---|
| 0 | Nirvana | Rock | You Know You’re Right | Nirvana | 2002.0 | I will never bother you\nI will never promise ... | Calmness, Sadness |
| 1 | Damian Marley | Reggae | Here We Go | Stony Hill | 2017.0 | Here we go\nMy big ego is gonna get me in trou... | Power, Tension |
| 2 | The Mission UK | Rock | Jade | Another Fall from Grace | 2016.0 | She came as Lolita dressed as Venus\nAnd adorn... | Amazement, Calmness, Solemnity, Tenderness |
| 3 | UB40 | Reggae | Food For Thought | Signing Off | 1980.0 | Ivory Madonna, dying in the dust\nWaiting for ... | Joyful-activation, Sadness, Tension |
| 4 | Johnny Cash | Country | I’ve Been Everywhere | American II: Unchained | 1996.0 | I was totin' my pack along the dusty Winnemucc... | Amazement, Calmness, Joyful-activation |


This dataset contains the same information, but its final column, labels, holds multiple emotion labels for each example. It is used for the multi-label classification problem.

## Data Exploration and Preprocessing

### B: Multi-Label Classification

First we are going to take a look at the distribution of labels in our dataset.
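
One way to inspect the label distribution is to split the comma-separated labels column and count how often each label occurs. The following is only a minimal sketch, not the notebook's original cell: it assumes the column format shown in the preview above and uses a hypothetical five-row frame named sample in place of multi_label.

```python
import pandas as pd

# Hypothetical sample mirroring the 'labels' column of the preview above;
# in the notebook, multi_label['labels'] would take its place.
sample = pd.DataFrame({
    "labels": [
        "Calmness, Sadness",
        "Power, Tension",
        "Amazement, Calmness, Solemnity, Tenderness",
        "Joyful-activation, Sadness, Tension",
        "Amazement, Calmness, Joyful-activation",
    ]
})

# Lowercase, split on ', ', give each label its own row, then count occurrences.
label_counts = (
    sample["labels"]
    .str.lower()
    .str.split(", ")
    .explode()
    .value_counts()
)
print(label_counts)
```

On the full dataset, the same counts would show whether some emotions are much rarer than others, which matters when interpreting per-class results later.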

Having taken a closer look at the data, we proceed to the classification problem.

In [5]:
from sklearn.preprocessing import MultiLabelBinarizer

For this task we need a binarizer for the labels. Because each example is annotated with more than one class, we must transform this information into a machine-readable format. But first, let's create X and y.

In [6]:
X = multi_label['lyrics']
X.head()

0    I will never bother you\nI will never promise ...
1    Here we go\nMy big ego is gonna get me in trou...
2    She came as Lolita dressed as Venus\nAnd adorn...
3    Ivory Madonna, dying in the dust\nWaiting for ...
4    I was totin' my pack along the dusty Winnemucc...
Name: lyrics, dtype: object

In [56]:
label_dataframe = pd.DataFrame(multi_label['labels'])
# Lowercase each label string and split it on ', ' into a list of labels
label_dataframe['labels'] = label_dataframe['labels'].str.lower().str.split(', ')
label_dataframe['labels'].iloc[0]

['calmness', 'sadness']

In [50]:
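# Toy example of a column whose cells are lists of labels,
# which is the input format MultiLabelBinarizer works with
df_MLB = pd.DataFrame({"genre": [["action", "drama","fantasy"], ["fantasy","action"], ["drama"], ["sci-fi", "drama"]]})
df_MLB['genre']
df_MLB['genre'][0]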

['action', 'drama', 'fantasy']

In [57]:
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(label_dataframe['labels'])
y
mlb.classes_

array(['amazement', 'calmness', 'joyful-activation', 'nostalgia', 'power',
       'sadness', 'solemnity', 'tenderness', 'tension'], dtype=object)

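In the cell above, each list of labels is transformed into a binary vector with one position per class, where 1 indicates that the class is present. The classes_ attribute gives the class that each position stands for.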
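
To make the encoding concrete, the sketch below binarizes the small genre example from the df_MLB cell and then maps the vectors back to label tuples. The names toy, toy_mlb, toy_y, and recovered are illustrative and do not appear in the notebook.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Same toy data as the df_MLB cell: each row holds a list of labels.
toy = pd.DataFrame({
    "genre": [
        ["action", "drama", "fantasy"],
        ["fantasy", "action"],
        ["drama"],
        ["sci-fi", "drama"],
    ]
})

toy_mlb = MultiLabelBinarizer()
toy_y = toy_mlb.fit_transform(toy["genre"])

# Classes are sorted alphabetically; each column of toy_y corresponds to one class.
print(toy_mlb.classes_)  # ['action' 'drama' 'fantasy' 'sci-fi']
print(toy_y)
# [[1 1 1 0]
#  [1 0 1 0]
#  [0 1 0 0]
#  [0 1 0 1]]

# inverse_transform turns the indicator rows back into tuples of labels.
recovered = toy_mlb.inverse_transform(toy_y)
print(recovered)
```

The same inverse_transform call is what turns model predictions back into readable emotion labels further down.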

In [58]:
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

For this problem we train one Logistic Regression classifier per class with the help of OneVsRestClassifier. The data is again fed to the classifier through a pipeline, which first transforms the lyrics into vectors of TF-IDF features.

In [61]:
# TF-IDF features over single words of at least three word characters
tfv = TfidfVectorizer(min_df=3, max_features=3000, strip_accents='unicode', lowercase=True,
                      analyzer='word', token_pattern=r'\w{3,}', ngram_range=(1, 1),
                      use_idf=True, smooth_idf=True, sublinear_tf=True, stop_words='english')
lr = LogisticRegression()
clf = OneVsRestClassifier(lr)  # fits one binary classifier per label
pipeline = make_pipeline(tfv, clf)
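
To show what the one-vs-rest wrapper does, here is a small self-contained sketch on invented data. The toy lyrics, the two labels, and all toy_ names are assumptions made for illustration; default vectorizer settings are used because the toy corpus is too small for min_df=3.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Invented mini-corpus with one or two labels per example.
toy_lyrics = [
    "tears fall in the cold rain",
    "we fight and we rise with fire",
    "cold tears and a lonely night",
    "rise up strong with fire and thunder",
    "lonely rain and fire in the night",
    "thunder strong we fight tonight",
]
toy_labels = [
    ["sadness"], ["power"], ["sadness"],
    ["power"], ["power", "sadness"], ["power"],
]

toy_mlb = MultiLabelBinarizer()
toy_y = toy_mlb.fit_transform(toy_labels)  # columns: power, sadness

toy_pipeline = make_pipeline(
    TfidfVectorizer(),
    OneVsRestClassifier(LogisticRegression()),
)
toy_pipeline.fit(toy_lyrics, toy_y)

# One fitted binary LogisticRegression per label column.
ovr = toy_pipeline.named_steps["onevsrestclassifier"]
print(len(ovr.estimators_))

# A prediction is one row of 0/1 flags, one flag per label.
toy_pred = toy_pipeline.predict(["cold lonely tears"])
print(toy_pred.shape)
print(toy_mlb.inverse_transform(toy_pred))
```

Because every label is decided independently, a prediction can contain several labels or none at all, which is why empty tuples can appear when predictions are mapped back to label names.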

Now that we have our model, we once again split the data into training and test sets to measure how well the classifier assigns sets of emotion labels to lyrics.

In [62]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state = 42)

Before we fit our classifiers, we import the F1-score metric to evaluate how well our model has learned.

In [63]:
from sklearn.metrics import f1_score

This time the average parameter of f1_score is set to "micro", which pools the true positives, false positives, and false negatives of all classes before computing a single F1 score.
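
The effect of micro averaging can be checked by hand on a small example. The sketch below uses invented indicator matrices (y_true_toy and y_pred_toy are not from the notebook), counts the pooled true positives, false positives, and false negatives, and compares the result with f1_score.

```python
import numpy as np
from sklearn.metrics import f1_score

# Invented ground truth and predictions: 4 examples, 3 labels.
y_true_toy = np.array([[1, 0, 1],
                       [0, 1, 0],
                       [1, 1, 0],
                       [0, 0, 1]])
y_pred_toy = np.array([[1, 0, 0],
                       [0, 1, 0],
                       [1, 0, 0],
                       [0, 1, 1]])

# Pool the counts over every (example, label) cell.
tp = int(np.sum((y_true_toy == 1) & (y_pred_toy == 1)))  # 4
fp = int(np.sum((y_true_toy == 0) & (y_pred_toy == 1)))  # 1
fn = int(np.sum((y_true_toy == 1) & (y_pred_toy == 0)))  # 2

# Micro F1 = 2*TP / (2*TP + FP + FN) = 8 / 11
manual_micro = 2 * tp / (2 * tp + fp + fn)

micro = f1_score(y_true_toy, y_pred_toy, average="micro")
macro = f1_score(y_true_toy, y_pred_toy, average="macro")
print(manual_micro, micro)  # both equal 8/11
print(macro)                # unweighted mean of the per-label F1 scores
```

Micro averaging gives every individual label decision the same weight, so frequent labels dominate the score, whereas macro averaging weights each label equally regardless of how often it occurs.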

In [64]:
kendrick_lyrics = pd.read_csv('/content/drive/MyDrive/convert_sample.csv', encoding='utf-8')
kendrick_lyrics.head()

| Unnamed: 0 | lyrics |
|---|---|
| 0 | We outlawed, then I bogart, any pros that got ... |
| 1 | Uh-uh, fuck that\nEight doobies to the face, f... |
| 2 | Life is a traffic jam, life is a traffic jam\n... |
| 3 | She look better than Beyoncé, Alicia Keys, Hal... |
| 4 | All day hoe, my neck look sweeter than parfait... |


In [65]:
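# Use the unlabeled sample lyrics as the prediction input.
# Note: this overwrites the X_test produced by train_test_split, so y_test no longer corresponds to it.
X__test = pd.DataFrame(data = kendrick_lyrics)
X_test = X__test['lyrics']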

In [67]:
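pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)
y_pred
# The F1 score is left commented out: X_test now holds the unlabeled sample lyrics,
# so y_pred cannot be compared against y_test.
# print('F1-SCORE :',f1_score(y_test, y_pred, average="micro"))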

array([[0, 0, 0, ..., 0, 0, 1],
       [0, 0, 0, ..., 0, 0, 1],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 1],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 1]])

In [68]:
result_label = mlb.inverse_transform(y_pred)
result_label

# Amazement
# Calmness
# Joyful activation
# Nostalgia
# Power
# Sadness
# Solemnity
# Tenderness
# Tension

[('tension',),
 ('power', 'sadness', 'tension'),
 ('sadness',),
 ('sadness', 'tenderness'),
 ('power', 'tension'),
 ('tension',),
 ('power', 'tension'),
 ('power',),
 ('power', 'tension'),
 ('nostalgia', 'sadness', 'tension'),
 ('tension',),
 ('power', 'tension'),
 ('power', 'sadness'),
 ('power', 'tension'),
 ('sadness',),
 ('power', 'tension'),
 ('power', 'tension'),
 ('power', 'tension'),
 ('power', 'tension'),
 ('power', 'tension'),
 ('power', 'tension'),
 ('sadness', 'tension'),
 ('power', 'tension'),
 ('power', 'tension'),
 ('power', 'tension'),
 ('power', 'tension'),
 (),
 ('power', 'sadness', 'tension'),
 ('tension',),
 ('power', 'sadness', 'tension'),
 (),
 ('power', 'tension'),
 ('power', 'tension'),
 ('power', 'sadness', 'tension'),
 (),
 ('power', 'tension'),
 ('power', 'tension'),
 ('power', 'tension'),
 ('power', 'tension'),
 ('nostalgia', 'sadness', 'tension'),
 ('power', 'tension'),
 (),
 ('power', 'tension'),
 ('nostalgia', 'power', 'sadness', 'tension'),
 ('power', 't