#### This notebook contains all code for preprocessing the lyrics dataset and producing the backend of a hypothetical song recommendation app that recommends songs to users based on lyrics similarity.

# Package imports

In [6]:
# Imports
import os
import pandas as pd
import numpy as np
import sklearn as skl

# Data loading and merging
First, we inspect a single one of the artist datasets.

In [54]:
# Forward slashes avoid invalid escape sequences such as '\c' and work on every OS
CardiB = pd.read_csv('lyrics_dataset/csv/CardiB.csv')
CardiB

Unnamed: 0.1,Unnamed: 0,Artist,Title,Album,Year,Date,Lyric
0,0,Cardi B,WAP,,2020.0,2020-08-07,cardi b al t mclaran megan thee stallion whor...
1,1,Cardi B,Bodak Yellow,Invasion of Privacy,2017.0,2017-06-16,ksr it's cardi ayy said i'm the shit they can'...
2,2,Cardi B,Bartier Cardi,Invasion of Privacy,2017.0,2017-12-22,cardi b savage bardi in a 'rari diamonds all...
3,3,Cardi B,Be Careful,Invasion of Privacy,2018.0,2018-03-30,yeah care for me care for me care for me uh ye...
4,4,Cardi B,Money,,2018.0,2018-10-22,look my bitches all bad my niggas all real i r...
...,...,...,...,...,...,...,...
70,70,Cardi B,Up (Radio Edit),,2021.0,2021-02-05,up up up ayy up uh up look this is fire once...
71,71,Cardi B,Do Me Dat Remix (Region Liberty Records / Atla...,,2019.0,2019-08-24,artist cardi b song do me dat remix featuring ...
72,72,Cardi B,Cardi B’s Tattoos,,,,faceneckhand red star right armloyalty over r...
73,73,Cardi B,The Good (Unreleased),,,,whats this now how could this happen after eve...


We have several unnecessary features. For my lyric recommendation engine, I want only the lyrics data and a single ID feature, for which I can use a combination of Artist & Title. 
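
As a minimal sketch of that reduction, the snippet below applies it to a small toy frame that mimics the raw column layout. The rows are made up and the helper name `to_id_and_lyrics` is hypothetical; dropping rows without lyrics is included because such rows cannot contribute to a lyrics comparison.

```python
import pandas as pd

# Toy rows that mimic the raw column layout shown above; the values are made up.
raw = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "Artist": ["Artist A", "Artist A", "Artist A"],
    "Title": ["Song One", "Song Two", "Song Three"],
    "Album": [None, "Some Album", None],
    "Year": [2020.0, 2017.0, None],
    "Date": ["2020-08-07", "2017-06-16", None],
    "Lyric": ["first toy lyric", None, "third toy lyric"],
})


def to_id_and_lyrics(df):
    """Hypothetical helper: reduce a raw lyrics frame to an ID column plus lyrics."""
    # Rows without lyrics cannot be compared, so drop them first
    df = df.dropna(subset=["Lyric"]).copy()
    # Build the single ID feature from artist and title
    df["Song and Artist"] = df["Artist"] + " - " + df["Title"]
    # Selecting two columns discards every other column in one step
    return df[["Song and Artist", "Lyric"]]


reduced = to_id_and_lyrics(raw)
print(reduced)
```

Selecting the two wanted columns, rather than listing the unwanted ones in `drop`, also removes leftover index columns such as `Unnamed: 0` without naming them.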

The loop below loads each of the 21 lyrics data files and applies this preprocessing to it.

In [56]:
data_folder = "lyrics_dataset/csv"

# Loop through all files in the folder
for file in os.listdir(data_folder):
    # Process only .csv files, just in case
    if file.endswith(".csv"):
        # Use the file name (without extension) as the dataframe name
        dataframe_name = os.path.splitext(file)[0]
        # Build the full path
        file_path = os.path.join(data_folder, file)
        # Load the file
        df = pd.read_csv(file_path)
        # Drop rows where 'Lyric' is NaN
        df = df.dropna(subset=["Lyric"])
        # Merge artist and song title into a single ID column
        df["Song and Artist"] = df["Artist"] + " - " + df["Title"]
        # Keep only the ID column and the lyrics, with the ID first
        # (this discards Artist, Title, Date, Album, Year and any leftover index column)
        df = df[["Song and Artist", "Lyric"]]
        # Store each dataframe as a global variable named after its file
        globals()[dataframe_name] = df
        # DEBUG PRINT to inspect the outputs of this loop
        print("************************")
        print(dataframe_name, "dataset: ")
        print(df.head())
        print("************************")
        print("\n")
    

************************
ArianaGrande dataset: 
                          Song and Artist  \
0          Ariana Grande - ​thank u, next   
1                 Ariana Grande - 7 rings   
2         Ariana Grande - ​God is a woman   
3            Ariana Grande - Side To Side   
4  Ariana Grande - ​​no tears left to cry   

                                               Lyric  
0  thought i'd end up with sean but he wasn't a m...  
1  yeah breakfast at tiffany's and bottles of bub...  
2  you you love it how i move you you love it how...  
3  ariana grande  nicki minaj i've been here all ...  
4  right now i'm in a state of mind i wanna be in...  
************************


************************
Beyonce dataset: 
           Song and Artist                                              Lyric
0  Beyoncé - Drunk in Love  beyoncé i've been drinkin' i've been drinkin' ...
1      Beyoncé - Formation  messy mya what happened at the new wil'ins bit...
2      Beyoncé - Partition  part  yoncé   let m

In [57]:
TaylorSwift

Unnamed: 0,Song and Artist,Lyric
0,Taylor Swift - ​cardigan,vintage tee brand new phone high heels on cobb...
1,Taylor Swift - ​exile,justin vernon i can see you standing honey wit...
2,Taylor Swift - Lover,we could leave the christmas lights up 'til ja...
3,Taylor Swift - ​the 1,i'm doing good i'm on some new shit been sayin...
4,Taylor Swift - Look What You Made Me Do,i don't like your little games don't like your...
...,...,...
474,Taylor Swift - Teardrops on my Guitar (Live fr...,drew looks at me i fake a smile so he won't se...
475,Taylor Swift - Evermore [Forward],to put it plainly we just couldnt stop writing...
476,Taylor Swift - Welcome Back Grunwald,turn wycd on you're on your grunwald back from...
477,Taylor Swift - Tolerate it (Polskie Tłumaczenie),zwrotka siedzę i patrzę jak czytasz z głową p...
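
Some titles in the outputs above (for example the entries for "cardigan" and "thank u, next") appear to begin with an invisible character, most likely a zero-width space (U+200B). If the ID column is later used for lookups, such characters would make otherwise identical strings compare as different. The sketch below shows one way to strip them; the helper name `strip_invisible` and the chosen set of code points are assumptions, and the sample strings are illustrative.

```python
import pandas as pd

# Zero-width and byte-order-mark code points that can hide in scraped text
INVISIBLE_CHARS = "\u200b\u200c\u200d\u2060\ufeff"


def strip_invisible(series):
    """Hypothetical helper: remove invisible characters and surrounding whitespace."""
    pattern = "[" + INVISIBLE_CHARS + "]"
    return series.str.replace(pattern, "", regex=True).str.strip()


# Illustrative IDs; the first contains a zero-width space before the title
ids = pd.Series(["Taylor Swift - \u200bcardigan", "Taylor Swift - Lover"])
cleaned = strip_invisible(ids)
print(cleaned.tolist())
```

Applied to the real data, this would be called on the `Song and Artist` column, for example `TaylorSwift["Song and Artist"] = strip_invisible(TaylorSwift["Song and Artist"])`.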
