<a href="https://colab.research.google.com/github/Meomeoowww/MeoLyrics/blob/master/Rev_1_of_Natural_Language_Processing_and_Hip_Hop_Songs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Theme Analysis in Rap Songs: A Practical Case of Natural Language Processing

After going through my course content on natural language processing, I decided to give it a try.

We aimed to use natural language processing with a non-negative matrix factorization (NMF) approach to summarize the main themes in hip hop songs.

We started by building a web scraper to collect lyrics through the Genius.com API. The scraper extracts a list of hip hop artists from Wikipedia (https://en.wikipedia.org/wiki/List_of_hip_hop_musicians) and then requests the lyrics of these artists. For more details on the scraper, please read my article about it.

## Goals

* Natural language processing with the collected data

    1- use TfidfVectorizer to fit and transform the lyrics into a sparse matrix (tfidf)

    2- build an NMF model on the sparse matrix to analyze themes

## Data Processing

We will start by importing all necessary modules:

In [0]:
import pandas as pd # Import Pandas
from sklearn.feature_extraction.text import TfidfVectorizer # Import TfidfVectorizer
from sklearn.decomposition import NMF # Import NMF
from sklearn.feature_extraction import text

import warnings # Import warnings to control warning output
warnings.filterwarnings("ignore") # silence warnings

from spacy.lang.fr.stop_words import STOP_WORDS as FR # Import French stop words
from spacy.lang.en.stop_words import STOP_WORDS as EN # Import English stop words
# Combine both stop-word lists and add song-related words such as 'chorus' and 'intro'
STOP = list(FR) + list(EN) + ['intro', 'chorus', 'verse', 'hook', 'outro', 'll', 'ain', 'em', 'got', 'don', 'gon']

from google.colab import drive # Import the Colab helper used to mount Google Drive

We ran this model on the Google Colab platform.

All the CSV files were stored in a Google Drive folder, so we needed to mount Google Drive to access them directly.

In [91]:
drive.mount('/content/drive') # mount drive
root_path = 'drive/My Drive/Colab Notebooks/NLP with GeniusLyrics/'  # set the root path of the project folder

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


Let's load our previously scraped files into DataFrames:

In [0]:
# convert files to Pandas Dataframe
adb = pd.read_csv(root_path+'adb.csv', names=['ID', 'Lyrics'])
asongtitle = pd.read_csv(root_path+'asongtitle.csv', names=['ID', 'Title'])
aart = pd.read_csv(root_path+'aart.csv', names=['ID', 'Artist'])

# Drop the ID columns and the rows without lyrics
adb = adb.drop('ID', axis=1)
adb = adb.drop([0], axis=0) # drop row 0, which doesn't have any lyrics
asongtitle = asongtitle.drop('ID', axis=1)
asongtitle = asongtitle.drop([11258], axis=0) # drop row 11258, which doesn't have any lyrics
asongtitle = asongtitle.drop([0], axis=0) # drop row 0, which doesn't have any lyrics
aart = aart.drop('ID', axis=1)

In [93]:
asongtitle.head(2)

Unnamed: 0,Title
1,Never Bend
2,Run for Yo Life


In [94]:
adb.head(2)

Unnamed: 0,Lyrics
1,\n\n[Intro]\nYeah\nYou lil bitch ass niggas st...
2,"\n\n[Intro]\n03, yeah, yeah\n\n[Chorus]\nI bee..."


The following files were loaded:

* adb: contains all the lyrics pulled from Genius.com
* asongtitle: contains the song titles
* aart: contains the artist names


In [95]:
print('adb holds {} lyrics for 1397 artists.'.format(len(adb)))

adb holds 13699 lyrics for 1397 artists.


We need to correctly flatten our DataFrames, since the vectorizer expects a list-like object of strings:

In [0]:
# Flatten the lyrics: one string per song
lyr = adb['Lyrics'].tolist()

# Flatten the song titles: one string per song
songs = asongtitle['Title'].tolist()

## Model

Before running the model, we need to prepare the text by vectorizing it, that is, turning each lyric into a vector of term weights. We pass a list of stop words as a parameter. Stop words are words that are common in any text, like 'a' and 'the', and carry little thematic meaning.

While processing, we noticed that our list includes some French rappers such as PNL, so we worked with a list containing stop words from both languages.
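
As a minimal sketch of how a bilingual stop-word list filters tokens, the snippet below merges two tiny hand-written lists (stand-ins for the spaCy lists, not the real ones) with a few song-structure markers, then drops the matching tokens from a lyric line. The names `EN_SAMPLE`, `FR_SAMPLE`, `SONG_MARKERS`, `STOP_SAMPLE` and `remove_stop_words` are illustrative assumptions and do not appear in the notebook.

```python
import re

# Tiny hand-written samples; the real spaCy lists are much longer.
EN_SAMPLE = {"the", "a", "and", "i", "in"}
FR_SAMPLE = {"le", "la", "et", "je", "dans"}
SONG_MARKERS = {"intro", "chorus", "verse", "hook", "outro"}

# A set union removes any duplicates shared between the lists.
STOP_SAMPLE = EN_SAMPLE | FR_SAMPLE | SONG_MARKERS


def remove_stop_words(line, stop_words):
    """Lowercase a line, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"\w+", line.lower())
    return [tok for tok in tokens if tok not in stop_words]


print(remove_stop_words("[Chorus] I got love in the city", STOP_SAMPLE))
print(remove_stop_words("[Intro] Je suis dans la ville", STOP_SAMPLE))
```

Both lines lose their section marker and their function words, whichever language they are written in.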

The vectorizer is set up as follows:

### 1- TfidfVectorizer

In [0]:
# Create a TfidfVectorizer: tfidf
tfidf = TfidfVectorizer(max_df=0.95, min_df=0.2, stop_words=STOP, use_idf=True) 

# Apply fit_transform to document: csr_mat
csr_mat = tfidf.fit_transform(lyr)

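# Get the words: words
# get_feature_names() was removed in scikit-learn 1.2; on versions older than 1.0, call get_feature_names() instead
words = tfidf.get_feature_names_out().tolist()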
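
To make the `max_df=0.95` and `min_df=0.2` thresholds more concrete, here is a rough pure-Python sketch of document-frequency filtering. It reflects my understanding of how the float thresholds behave (a term is kept when the share of documents containing it lies between the two proportions) and is not scikit-learn's actual implementation. The function `filter_by_document_frequency` and the toy documents are hypothetical.

```python
from collections import Counter


def filter_by_document_frequency(docs, min_df=0.2, max_df=0.95):
    """Keep terms whose document frequency lies between the two proportions."""
    n_docs = len(docs)
    # Count each term once per document it appears in.
    doc_freq = Counter(term for doc in docs for term in set(doc.split()))
    low, high = min_df * n_docs, max_df * n_docs
    return sorted(term for term, df in doc_freq.items() if low <= df <= high)


# Ten toy "lyrics": 'yeah' is everywhere, 'unicorn' appears only once.
toy_docs = [
    "yeah money love",
    "yeah money love",
    "yeah money",
    "yeah love",
    "yeah life",
    "yeah life",
    "yeah time",
    "yeah time",
    "yeah unicorn",
    "yeah money",
]

kept = filter_by_document_frequency(toy_docs)
print(kept)
```

In this toy corpus, 'yeah' is dropped for being too common and 'unicorn' for being too rare. With `min_df=0.2` on the real data, only words that occur in at least a fifth of all songs survive, which keeps the vocabulary small.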

Now let's run our model with the transformed matrix:

### 2- Non-negative Matrix Factorization

One of the most subjective parts of NMF is that we have to supply the number of components (themes) up front. How can we know how many themes there are?

Well, we tried a few values and settled on 3.

In [0]:
# Create an NMF instance: model
model = NMF(n_components=3)

# Fit the model to the lyrics
model.fit(csr_mat)

# Transform the lyrics: nmf_features
nmf_features = model.transform(csr_mat)
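
One way to compare candidate numbers of components is to look at how well each factorization reconstructs the original matrix. The sketch below is an assumption-laden illustration: it uses the classic multiplicative update rules on a tiny dense matrix in pure Python, which is not the solver scikit-learn uses by default, and all names (`nmf_multiplicative`, `frobenius_error`, `V`) are hypothetical. It factors V (documents x words) into W (documents x topics) and H (topics x words), both non-negative.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh)) ** 0.5


def nmf_multiplicative(v, k, n_iter=200, seed=0, eps=1e-9):
    """Factor V into non-negative W (docs x k) and H (k x words)."""
    rng = random.Random(seed)
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in v]
    h = [[rng.random() + 0.1 for _ in v[0]] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[hij * n / (d + eps) for hij, n, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(h, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[wij * n / (d + eps) for wij, n, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(w, num, den)]
    return w, h


# Toy 4 x 4 matrix with two obvious blocks of co-occurring words.
V = [
    [1.0, 1.0, 0.0, 0.0],
    [2.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 3.0, 3.0],
]

errors = {}
for k in (1, 2, 3):
    W, H = nmf_multiplicative(V, k)
    errors[k] = frobenius_error(V, W, H)
    print(f"k={k}: reconstruction error {errors[k]:.4f}")
```

On this toy matrix the error should fall sharply from one component to two and gain little from a third, which hints at two underlying themes. On real lyrics the drop is rarely that clean, so reading the top words of each topic remains part of the decision.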

### 3- Comments on Components

To get a clear idea of the main words describing each component, we print the top 10 words for each:

In [99]:
# Print the 10 highest-weighted words of each topic
for i, topic in enumerate(model.components_):
    print(f'Top 10 words for topic #{i}:')
    print([words[j] for j in topic.argsort()[-10:]])
    print('\n')

Top 10 words for topic #0:
['come', 'life', 'way', 'let', 'time', 'man', 'love', 'cause', 'know', 'like']


Top 10 words for topic #1:
['ya', 'money', 'bitches', 'ass', 'like', 'shit', 'fuck', 'bitch', 'niggas', 'nigga']


Top 10 words for topic #2:
['let', 'right', 'wanna', 'want', 'bitch', 'love', 'know', 'girl', 'baby', 'yeah']
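
Note that `argsort()` sorts in ascending order, so each list above runs from the weakest of the ten words to the strongest, and the last word carries the largest weight in its topic. The sketch below reproduces that ordering in pure Python on a made-up topic row; `top_words`, `vocabulary` and `topic_row` are hypothetical names and values, not taken from the fitted model.

```python
def top_words(weights, vocabulary, n=3):
    """Return the n highest-weighted words, weakest first, like argsort()[-n:]."""
    order = sorted(range(len(weights)), key=lambda j: weights[j])  # ascending
    return [vocabulary[j] for j in order[-n:]]


# Hypothetical topic row and vocabulary, for illustration only.
vocabulary = ["baby", "girl", "life", "love", "time"]
topic_row = [0.9, 0.7, 0.1, 0.8, 0.2]

ascending = top_words(topic_row, vocabulary)
print(ascending)        # weakest of the top 3 first
print(ascending[::-1])  # strongest first
```

Reversing the slice puts the most representative word first, which some readers find easier to scan.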




The main themes in rap songs are finally revealed here.

As we can see, the first topic is related to conscious hip hop. Words such as 'life', 'way', 'time' and 'love' point to more conscious and profound content in those lyrics (https://en.wikipedia.org/wiki/Political_hip_hop#Conscious_hip_hop).

The second topic is related to non-conscious themes such as self-love, ego and money, which are commonly encountered in current gangsta hip hop songs.

The last one is related to love songs, with 'love' and/or 'baby' (https://www.redbull.com/au-en/best-rap-love-songs).

This work could later be extended by comparing the themes of early-90s artists with those of current ones, and by highlighting artists who represent each of these streams.
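
As a possible starting point for that extension, the sketch below assigns each song to its dominant topic (the column with the largest NMF weight) and tallies the topics per artist. It assumes that an artist label is available for every song, which this notebook does not build, and it runs on made-up weights. `toy_features`, `toy_artists` and `dominant_topic` are hypothetical names.

```python
from collections import Counter, defaultdict

# Hypothetical NMF weights (one row per song) and one artist label per song.
toy_features = [
    [0.80, 0.05, 0.10],
    [0.10, 0.70, 0.05],
    [0.20, 0.60, 0.10],
    [0.05, 0.10, 0.90],
]
toy_artists = ["Artist A", "Artist A", "Artist B", "Artist B"]


def dominant_topic(row):
    """Index of the largest weight in one row of the feature matrix."""
    return max(range(len(row)), key=lambda j: row[j])


# Count how many songs of each artist fall under each topic.
topics_by_artist = defaultdict(Counter)
for artist, row in zip(toy_artists, toy_features):
    topics_by_artist[artist][dominant_topic(row)] += 1

for artist, counts in topics_by_artist.items():
    print(artist, dict(counts))
```

With real data, the rows of `nmf_features` would take the place of `toy_features`, and the artists whose songs concentrate in a single topic would be natural candidates to represent that stream.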

