# Getting Lyrics for Each Track from Genius
The Capstone scope has changed direction somewhat. I still want to build a skip prediction model, but instead of looking at Spotify user behavior, it will be based on whether I would skip the song or not. The classification target is whether the track begins with the artist announcing their name within the first verse. I will use the lyrics for NLP modeling; they are pulled from Genius using the lyricsgenius library.

### Imports

In [1]:
import pandas as pd
import numpy as np

import time   
import lyricsgenius

%load_ext email_notify_magic
# sends me an email when a cell has finished running (for long running cells)
# https://pypi.org/project/email-notify-magic/

Setting up email notifications for when a cell finishes running, and testing them out the first time. Because the magic asks for the account password, which could be insecure, I decided to use a throwaway email address that I created when Pokemon Go came out.

In [5]:
%%email funnyshitthisis@gmail.com -k
# when this cell finishes running send an email. 
# (Testing it out before the cell I really need it for)

tracks_df = pd.read_csv('./data/tracks_cleaned.csv', index_col=0)

Type your password and press enter: ········


In [6]:
tracks_df.head()

| Unnamed: 0 | artists | title | track_id | popularity | explicit | release_year |
|---|---|---|---|---|---|---|
| 0 | Shawn Mendes | Wonder | 5KCbr5ndeby4y4ggthdiAb | 0 | False | 2020 |
| 1 | Justin Bieber | Holy (feat. Chance The Rapper) | 5u1n1kITHCxxp8twBcZxWy | 92 | False | 2020 |
| 2 | 24kGoldn | Mood (feat. Iann Dior) | 3tjFYV6RSFtuktYl3ZtYcq | 100 | True | 2020 |
| 3 | Internet Money | Lemonade | 02kDW379Yfd5PzW5A6vuGt | 93 | True | 2020 |
| 4 | BLACKPINK | Bet You Wanna (feat. Cardi B) | 1hPkiovjTqiJAJen4uyNRg | 0 | False | 2020 |


One thing I forgot to check in my cleanup notebook is whether there are any duplicates, which is likely because the tracks were pulled from playlists. The same song can appear in several playlists ('1990s', 'Best of the 90s', etc.). Duplicates are dropped based on the artist and title only, because the track_id and popularity change depending on which playlist the track came from.
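A minimal sketch of a duplicate check restricted to the artist and title columns. Note that `DataFrame.duplicated()` with no arguments compares every column, so two copies of a song that differ only in `track_id` or `popularity` are not flagged, which would explain an empty result from a plain `tracks_df.duplicated()` check. The helper names `repeated_tracks` and `drop_repeated_tracks` are hypothetical, chosen for illustration.

```python
import pandas as pd

def repeated_tracks(df, subset=('artists', 'title')):
    """Return every row whose artist/title pair appears more than once."""
    # keep=False flags all copies, not just the second and later ones,
    # so every playlist version of a repeated song is visible
    return df[df.duplicated(subset=list(subset), keep=False)]

def drop_repeated_tracks(df, subset=('artists', 'title')):
    """Keep only the first copy of each artist/title pair."""
    # track_id and popularity are ignored, since they vary by playlist
    return df.drop_duplicates(subset=list(subset), keep='first')
```

Running `repeated_tracks(tracks_df)` first makes it easy to eyeball which songs repeat before anything is dropped.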

In [7]:
tracks_df[tracks_df.duplicated()]

| Unnamed: 0 | artists | title | track_id | popularity | explicit | release_year |
|---|---|---|---|---|---|---|


In [8]:
print(f' There are now {tracks_df.shape[0]} rows remaining')

 There are now 76338 rows remaining


In [9]:
# Empty list to hold all the lyrics for the tracks
lyrics = []

### Get Lyrics function
The function takes no arguments and runs inside a while loop:
- While the length of the `lyrics` list (above) is not equal to the length of the dataframe:
  - create a lyricsgenius `Genius` client
  - for each track (row) in the dataframe's values:
    - search Genius for a song matching the row's 2nd column value ('title') and 1st column value ('artists')
    - try to append the song's lyrics to the `lyrics` list
    - if there is an issue (most likely no results), append NaN instead, which makes missing values easy to find later
  - sleep 30 seconds, then check the while condition again

In [10]:
# lyricsgenius is a Python library that makes pulling from the Genius API a lot easier. Instead of scraping
# the entire HTML page to get the lyrics, all I need to provide is the name of the artist and the song title.
# https://pypi.org/project/lyricsgenius/
import os

def get_lyrics():  # no arguments needed; fills the global `lyrics` list
    while len(lyrics) != len(tracks_df):  # while the length of the list is not equal to the length of the dataframe
        # read the API token from an environment variable instead of hard-coding it in the notebook
        # (GENIUS_ACCESS_TOKEN is my own choice of variable name)
        genius = lyricsgenius.Genius(os.environ['GENIUS_ACCESS_TOKEN'])  # call to lyricsgenius
        for track in tracks_df.values:  # for each track (row) in the dataframe
            try:
                # find the song on Genius with a matching title (column 1) and artist (column 0);
                # search_song returns None when there are no results
                song = genius.search_song(track[1], track[0])
                lyrics.append(song.lyrics)  # append the lyrics to the list
            except Exception:  # no result (None has no .lyrics) or a failed request
                lyrics.append(np.nan)  # mark as missing
        time.sleep(30)  # take a catnap
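If `get_lyrics()` is interrupted partway (a network error or stopping the kernel), calling it again starts appending from the first row of `tracks_df`, so `lyrics` no longer lines up with the dataframe. A minimal sketch of a resumable alternative, assuming the same `artists`/`title` columns: `fetch_lyrics`, `pause` and `max_failures` are my own hypothetical names, and `search` is any callable taking `(title, artist)`, such as the bound method `genius.search_song`. It always continues from the first row without an entry, retries the same row after a failed request instead of recording it as missing, and only records NaN when Genius returns no match.

```python
import time

import numpy as np
import pandas as pd

def fetch_lyrics(tracks, search, lyrics=None, pause=30, max_failures=3):
    """Hypothetical resumable fetch: lyrics[i] always belongs to tracks.iloc[i]."""
    lyrics = [] if lyrics is None else lyrics
    failures = 0  # consecutive failed requests for the current row
    while len(lyrics) < len(tracks):
        row = tracks.iloc[len(lyrics)]  # first track that has no entry yet
        try:
            song = search(row['title'], row['artists'])
        except Exception:
            failures += 1
            if failures > max_failures:
                raise  # give up; rerunning resumes from this same row
            time.sleep(pause)  # back off, then retry the same track
            continue
        failures = 0
        # search_song returns None when there is no match
        lyrics.append(song.lyrics if song is not None else np.nan)
    return lyrics
```

With the real client this would be called as `fetch_lyrics(tracks_df, genius.search_song, lyrics)`. Creating the client with `lyricsgenius.Genius(token, remove_section_headers=True)` also asks lyricsgenius to strip the `[Verse 1]`-style headers at download time.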

In [None]:
%%email funnyshitthisis@gmail.com --body 'Done'
# This is where the email notification comes in handy. Getting lyrics for all ~76k tracks will take a long time.

In [None]:
%%email funnyshitthisis@gmail.com --body 'Done'
get_lyrics() # calling the function

Type your password and press enter: ········
Searching for "Wonder" by Shawn Mendes...
Done.
Searching for "Holy (feat. Chance The Rapper)" by Justin Bieber...
Done.
Searching for "Mood (feat. Iann Dior)" by 24kGoldn...
Done.
Searching for "Lemonade" by Internet Money...
Done.
Searching for "Bet You Wanna (feat. Cardi B)" by BLACKPINK...
Done.
Searching for "Head & Heart (feat. MNEK)" by Joel Corry...
Done.
Searching for "you broke me first" by Tate McRae...
Done.
Searching for "Dynamite" by BTS...
Done.
Searching for "For The Night (feat. Lil Baby & DaBaby)" by Pop Smoke...
Done.
Searching for "Laugh Now Cry Later (feat. Lil Durk)" by Drake...
Done.
Searching for "WAP (feat. Megan Thee Stallion)" by Cardi B...
Done.
Searching for "Take You Dancing" by Jason Derulo...
Done.
Searching for "Breaking Me" by Topic...
Done.
Searching for "Savage Love (Laxed – Siren Beat) [BTS Remix]" by Jawsh 685...
Done.
Searching for "Watermelon Sugar" by Harry Styles...
Done.
Searching for "my ex's best 

Done.
Searching for "I Should Probably Go To Bed" by Dan + Shay...
Done.
Searching for "Lovin' On You" by Luke Combs...
Done.
Searching for "Everywhere But On" by Matt Stell...
Done.
Searching for "Happy Anywhere (feat. Gwen Stefani)" by Blake Shelton...
Done.
Searching for "Long Live" by Florida Georgia Line...
Done.
Searching for "No I in Beer" by Brad Paisley...
Done.
Searching for "More Than My Hometown" by Morgan Wallen...
Done.
Searching for "On Me (feat. Ava Max) - From 'SCOOB!' The Album" by Thomas Rhett...
No results found for: 'On Me (feat. Ava Max) - From 'SCOOB!' The Album Thomas Rhett'
Searching for "Things I Can't Say (feat. Julia Cole)" by Spencer Crandall...
Specified song does not contain lyrics. Rejecting.
Searching for "Got What I Got" by Jason Aldean...
Done.
Searching for "Broken Up" by Mitchell Tenpenny...
Done.
Searching for "One Too Many" by Keith Urban...
Done.
Searching for "One Beer (HARDY feat. Lauren Alaina, Devin Dawson)" by HIXTAPE...
No results found for

Done.
Searching for "Must Stop (feat. Sarah Barthel, Phantogram)" by ONR...
Done.
Searching for "tiny life" by EVAN GIIA...
Done.
Searching for "You Only Die Once" by Kelsy Karter...
Done.
Searching for "MEN" by deryk...
Done.
Searching for "Kissez" by Sevyn Streeter...
Done.
Searching for "Composure" by Lud Foe...
Done.
Searching for "CLUB E11EVEN (feat. Guapdad 4000)" by ALLBLACK...
Specified song does not contain lyrics. Rejecting.
Searching for "I'd Rather Die Than Let You In" by The Hunna...
Done.
Searching for "Halfway Down" by Corey Taylor...
Done.
Searching for "Beautiful Drug" by Bon Jovi...
Done.
Searching for "Baby" by Brandon...
Done.
Searching for "Wild Child (Feat. Lil Baby)" by Noodah05...
Done.
Searching for "Hey Baby (feat. Gia Koka)" by Imanbek...
Done.
Searching for "I Got You" by Disciples...
Done.
Searching for "The Business" by Tiësto...
Done.
Searching for "Like A Prayer" by Galwaro...
Done.
Searching for "Too Many Nights" by 220 KID...
Done.
Searching for "Tu Tu

In [None]:
len(lyrics)

In [None]:
df = tracks_df.iloc[0:18477].copy()  # the tracks lyrics were fetched for so far; .copy() avoids SettingWithCopyWarning

In [None]:
df

In [None]:
df['lyrics'] = lyrics

In [None]:
df = df.drop_duplicates(['artists', 'title'], keep='first')

In [None]:
# df.to_csv('./data/lyrics_cp.csv')

In [None]:
len(tracks_df)

### Some Additional Cleanup
Now that I have all my lyrics, I still need to:
- drop any duplicates that might have remained
- clean up the strings, as there are some obscure characters in them (`\n` and `\`)
- remove the section headers such as [Verse 1] and [Chorus], because the artists' names are listed in them and would throw my model off
- create a binary column for whether the artist self-announces at the beginning of the track. For this I will need to extract the featured artist's name from the title (e.g. "(feat. Chance The Rapper)") and add it to the artists column.
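A minimal sketch of the last two steps, assuming section headers have already been removed from the lyrics. Everything here is hypothetical: the helper names, the regex, and especially the 50-word window (`first_n_words`), which is only a rough proxy for "the first verse". The regex only catches titles where the parenthesis opens with "feat." or "ft.", so a title like "One Beer (HARDY feat. Lauren Alaina, Devin Dawson)" from the log above would not match, and splitting on "," and "&" would wrongly split artist names that contain them (e.g. "Tyler, The Creator").

```python
import re

import pandas as pd

# "(feat. X)", "(Feat. X)" or "(ft. X)" inside a Spotify-style title
FEAT_RE = re.compile(r'\((?:feat\.|ft\.)\s*([^)]+)\)', flags=re.IGNORECASE)

def featured_artists(title):
    """Return the featured artists named in a title, e.g. 'Lil Baby & DaBaby' -> 2 names."""
    match = FEAT_RE.search(title)
    if match is None:
        return []
    parts = re.split(r',|&', match.group(1))
    return [name.strip() for name in parts if name.strip()]

def announces_name(lyrics, names, first_n_words=50):
    """1 if any of the names appears in the first `first_n_words` words, else 0."""
    if not isinstance(lyrics, str):  # missing lyrics (NaN/None) count as 0
        return 0
    opening = ' '.join(lyrics.lower().split()[:first_n_words])
    return int(any(name.lower() in opening for name in names))

def add_announce_column(df):
    """Return a copy of df with feat_artists and a binary self_announce column."""
    df = df.copy()
    df['feat_artists'] = df['title'].apply(featured_artists)
    df['self_announce'] = [
        announces_name(lyr, [artist] + feats)
        for lyr, artist, feats in zip(df['lyrics'], df['artists'], df['feat_artists'])
    ]
    return df
```

A heuristic label like this is probably best treated as a starting point to check by hand, since a name can also appear early in a verse without being an announcement.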

In [None]:
lyrics[0] = lyrics[0].replace('\n', ' ') 

In [None]:
lyrics[0] = lyrics[0].replace("'", "")  # "\'" and "'" are the same string in Python
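Cleaning `lyrics[0]` by hand works for inspection, but the same cleanup can be applied to the whole column at once. In the notebook's display of a string, `\n` is a real newline character, and `\'` is how Python escapes an apostrophe when the string also contains double quotes; in source code `"\'"` and `"'"` are the same one-character string. A minimal sketch using pandas `.str.replace` with `regex=True`, assuming the lyrics live in a `lyrics` column and that every `[...]` block is a section header (`clean_lyrics` and `SECTION_HEADER` are hypothetical names):

```python
import pandas as pd

# Genius section headers look like "[Verse 1: Drake]" or "[Chorus]"
SECTION_HEADER = r'\[[^\]]*\]'

def clean_lyrics(lyrics):
    """Strip section headers and collapse newlines/whitespace in a Series of lyrics."""
    return (
        lyrics.str.replace(SECTION_HEADER, ' ', regex=True)  # drop [Verse 1]-style headers
        .str.replace(r'\s+', ' ', regex=True)  # newlines, tabs, repeated spaces -> one space
        .str.strip()  # no leading/trailing space left over from removed headers
    )
```

Used as `df['lyrics'] = clean_lyrics(df['lyrics'])`. The `.str` methods skip missing values, so the NaN placeholders from `get_lyrics()` stay missing. Removing the headers matters for the label, since they contain artist names that would otherwise look like a self-announcement.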

In [None]:
lyrics[10664]

In [None]:
del lyrics[6607]  # every later entry moves up one position, so check it still lines up with tracks_df

In [None]:
len(tracks_df)

In [None]:
tracks_df.iloc[6607]

# Drop track_id