## Song Lyrics Generator

#### Problem Statement
Overcoming writer's block and writing creative lyrics is a difficult task for most musicians, which leads to frustration and feeling "stuck". For this project, I hope to create a text generation model that predicts the next word, the next few words, or the next line of a new song, in order to help musicians create awesome new lyrics.

#### Proposed Method and Models
- GPT-2 to train my model. Depending on the output, it would be interesting to let the user select a single artist they want their lyrics to sound like (MVP). I could also pull a list of similar artists from Spotify and group them together to train another model, showcasing my ability to cluster. In addition, I could create a generator by genre, which would require getting a list of the top songs by genre and training a model for each.
- Flask App
- NLP
- AWS

#### Risks and Assumptions of My Data
- I will have a large variety of artists and the lyrics to each of their songs
- I could have duplicates of the same song
- Song lyrics can be abstract, and I am not yet sure how that might affect my model
- The score of the model can be subjective
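
The duplicate risk above could be handled with a normalization pass before training. The sketch below is one possible approach, not a settled design: `normalize_title` and `drop_duplicate_songs` are hypothetical helpers, the sample rows are illustrative, and the rule of stripping a trailing ` - Live ...` suffix is an assumption based on how live versions appear in the scraped titles.

```python
import re

def normalize_title(title):
    """Lowercase a title and strip a trailing ' - Live ...' suffix (assumed pattern)."""
    base = re.sub(r"\s+-\s+live\b.*$", "", title, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", base).strip().lower()

def drop_duplicate_songs(songs):
    """Keep the first song seen for each (artist, normalized title) pair."""
    seen = set()
    unique = []
    for song in songs:
        key = (song["artist_name"].lower(), normalize_title(song["title"]))
        if key not in seen:
            seen.add(key)
            unique.append(song)
    return unique

# Illustrative rows only; the real rows would come from the scraped data
songs = [
    {"artist_name": "Jack Johnson", "title": "Good People"},
    {"artist_name": "Jack Johnson", "title": "Good people - Live At Bonnaroo"},
    {"artist_name": "John Mayer", "title": "Gravity"},
]
unique_songs = drop_duplicate_songs(songs)
print([song["title"] for song in unique_songs])
```

Keeping the first occurrence is a simplification; comparing the lyrics themselves may be needed if a live version differs meaningfully from the studio one.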

#### Initial Goals and Success Criteria
- Predict the next word, words, or line of a song based on a text input from the user
- Consider using a BLEURT score as a metric of success: https://medium.com/@jrodthoughts/googles-bleurt-is-bert-for-evaluating-natural-language-generation-models-fa0ce898c38a
- Give the user multiple options for the next word(s), and score on the number of accepted recommendations
- Rhyme (stretch)
- Write an original song with > 50% of the lyrics computer generated
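
Two of the criteria above are simple ratios, so they can be sketched directly. This is a minimal illustration under stated assumptions: `acceptance_rate` and `generated_share` are hypothetical names, the `(text, is_generated)` pair structure is invented for the example, and measuring the 50% target by lines rather than by words is my own assumption.

```python
def acceptance_rate(shown, accepted):
    """Fraction of suggestions the user accepted (0.0 when nothing was shown)."""
    if shown == 0:
        return 0.0
    return accepted / shown

def generated_share(lines):
    """Fraction of song lines flagged as computer generated.

    `lines` is a list of (text, is_generated) pairs (hypothetical structure).
    """
    if not lines:
        return 0.0
    return sum(1 for _, is_generated in lines if is_generated) / len(lines)

# Placeholder song: one hand-written line and two model suggestions
song = [
    ("placeholder line written by hand", False),
    ("placeholder line suggested by the model", True),
    ("another model suggestion", True),
]
print(acceptance_rate(shown=12, accepted=3))  # 0.25
print(generated_share(song) > 0.5)            # True
```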

#### Initial Data Source and EDA
I plan to pull a handful more artists and add them as CSVs. With the function below, the biggest hurdle is the amount of time it takes to scrape large pulls of data, but it is a repeatable process that I could let run during off hours. I have not fully decided whether I am going to integrate the Spotify API data, which is why I've tried to pull the `spotify_url` for the songs below. I may be able to look up songs by the `id` or, worst case, use the `artist_name` with the `title`.
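
If the `id` mentioned here turns out to be the Spotify track id, it can likely be read straight from the `spotify_url`, since those URLs have the form `https://open.spotify.com/track/<id>`. A minimal sketch, assuming that URL shape; `spotify_track_id` is a hypothetical helper and `EXAMPLE_TRACK_ID` is a placeholder, not a real id.

```python
from urllib.parse import urlparse

def spotify_track_id(url):
    """Return the track id from an open.spotify.com track URL, or None."""
    if not url or url == "NA":
        return None
    parsed = urlparse(url)
    # Split the path into its non-empty segments, e.g. ['track', '<id>']
    parts = [part for part in parsed.path.split("/") if part]
    if parsed.netloc == "open.spotify.com" and len(parts) == 2 and parts[0] == "track":
        return parts[1]
    return None

print(spotify_track_id("https://open.spotify.com/track/EXAMPLE_TRACK_ID"))
print(spotify_track_id("NA"))
```

Songs without a usable URL return `None`, which is where the fallback lookup by `artist_name` and `title` would apply.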

In [19]:
#Keep my access token secret

with open('./token.txt', 'r') as f:
    token = f.read().replace('\n', '')

In [20]:
#Credit: https://lyricsgenius.readthedocs.io/en/master/index.html
from lyricsgenius import Genius
import pandas as pd
import numpy as np
import json
import os

genius = Genius(token, 
                sleep_time = 1, #don't overload the servers
                remove_section_headers=True) #removes [Chorus], [Bridge], etc. headers from lyrics.
                #verbose=False will stop the pull from printing

# Scrape Lyrics from Multiple Artists

**Artists**
- John Mayer
- Jack Johnson
- ...(will add more)

In [19]:
john_mayer = genius.search_artist('John Mayer')
jack_johnson = genius.search_artist('Jack Johnson')

Searching for songs by John Mayer...

Song 1: "New Light"
Song 2: "Gravity"
Song 3: "Slow Dancing in a Burning Room"
Song 4: "Free Fallin’"
Song 5: "In the Blood"
Song 6: "You’re Gonna Live Forever in Me"
Song 7: "Daughters"
Song 8: "Stop This Train"
Song 9: "Moving On and Getting Over"
Song 10: "Wildfire Interlude"
Song 11: "Still Feel Like Your Man"
Song 12: "Waiting on the World to Change"
Song 13: "Rosie"
Song 14: "Paper Doll"
Song 15: "In Your Atmosphere"
Song 16: "Emoji of a Wave"
Song 17: "Love on the Weekend"
Song 18: "Edge of Desire"
Song 19: "Your Body Is a Wonderland"
Song 20: "No Such Thing"
Song 21: "Carry Me Away"
Song 22: "Who Says"
Song 23: "Never on the Day You Leave"
Song 24: "Half of My Heart"
Song 25: "Who You Love"
Song 26: "XO"
Song 27: "Born and Raised"
Song 28: "I Don’t Trust Myself (With Loving You)"
Song 29: "Helpless"
Song 30: "Neon"
Song 31: "Walt Grace’s Submarine Test, January 1967"
Song 32: "Why Georgia"
Song 33: "Dreaming With a Broken Heart"
Song 34: "H

## Save output as a json file

In [21]:
genius.save_artists(artists=[john_mayer, jack_johnson], overwrite=True)


tmp_lyrics/tmp_0_JohnMayer
Wrote `Lyrics_JohnMayer.json`
tmp_lyrics/tmp_1_JackJohnson
Wrote `Lyrics_JackJohnson.json`
Time elapsed: 0.00010644555091857911 hours


# Function to Create Lyrics Dataframe for each artist

In [73]:
data['songs'][232]['media']

[]

In [80]:
len(data['songs'][221]['media'])

0

In [86]:
spotify_url = []
for song in data['songs']: #for each song
    #take the url of the first media item provided by spotify, or 'NA' if there is none
    #(covers songs with an empty media list and keeps exactly one value per song)
    spotify_url.append(next((item['url'] for item in song['media'] if item['provider'] == 'spotify'), 'NA'))

len(spotify_url) #one entry per song

234

In [87]:
def get_lyrics(filename):
    
    # Reading the json as a dict
    with open(filename) as json_data:
        data = json.load(json_data)

    #Source credit: https://stackoverflow.com/questions/28373282/how-to-read-a-json-dictionary-type-file-with-pandas
    
    #not every song has a spotify url, so fall back to 'NA'
    #next() takes only the first spotify item, so there is exactly one value per song
    #and the column length always matches the other columns
    spotify_url = [
        next((item['url'] for item in song['media'] if item['provider'] == 'spotify'), 'NA')
        for song in data['songs']
    ]
    
    #create a dataframe
    df =  pd.DataFrame({

        #multiply by length of songs in json response to create a new row for each
        'artist_name' : [data['name']] * len(data['songs']),                                              #artist name
        'image_url' : [data['image_url']] * len(data['songs']),                                           #thumbnail image
        'url' : [data['url']] * len(data['songs']),                                                       #genius url

        #song data
        'title' : [data['songs'][item]['title'] for item in range(0,len(data['songs']))],                 #title of each song
        'lyrics' : [data['songs'][item]['lyrics'] for item in range(0,len(data['songs']))],                #song lyrics           
        'spotify_url' :  spotify_url, #spotify url


    })
    return df    

### Collect data for each artist

In [88]:
john_mayer_df = get_lyrics('Lyrics_JohnMayer.json')
john_mayer_df.head(3)

|   | artist_name | image_url | url | title | lyrics | spotify_url |
|---|---|---|---|---|---|---|
| 0 | John Mayer | https://images.genius.com/4c443bf06bf6b696ad4f... | https://genius.com/artists/John-mayer | New Light | Ah, ah, ah\nAh...\n\nI'm the boy in your other... | https://open.spotify.com/track/3bH4HzoZZFq8UpZ... |
| 1 | John Mayer | https://images.genius.com/4c443bf06bf6b696ad4f... | https://genius.com/artists/John-mayer | Gravity | Gravity is working against me\nAnd gravity wan... | https://open.spotify.com/track/52K3qt1rCYf3Ciu... |
| 2 | John Mayer | https://images.genius.com/4c443bf06bf6b696ad4f... | https://genius.com/artists/John-mayer | Slow Dancing in a Burning Room | It's not a silly little moment\nIt's not the s... | https://open.spotify.com/track/3f8Uygfz3CIpUCo... |


In [89]:
jack_johnson_df = get_lyrics('Lyrics_JackJohnson.json')
jack_johnson_df

|   | artist_name | image_url | url | title | lyrics | spotify_url |
|---|---|---|---|---|---|---|
| 0 | Jack Johnson | https://s3.amazonaws.com/rapgenius/Jack+Johnso... | https://genius.com/artists/Jack-johnson | Banana Pancakes | Can't you see that it's just raining?\nThere a... | https://open.spotify.com/track/451GvHwY99NKV4z... |
| 1 | Jack Johnson | https://s3.amazonaws.com/rapgenius/Jack+Johnso... | https://genius.com/artists/Jack-johnson | Better Together | There is no combination of words\nI could put ... | https://open.spotify.com/track/4VywXu6umkIQ2OS... |
| 2 | Jack Johnson | https://s3.amazonaws.com/rapgenius/Jack+Johnso... | https://genius.com/artists/Jack-johnson | Upside Down | Who's to say what's impossible?\nWell they for... | https://open.spotify.com/track/2s1rRlb24PuFv80... |
| 3 | Jack Johnson | https://s3.amazonaws.com/rapgenius/Jack+Johnso... | https://genius.com/artists/Jack-johnson | Flake | I know she said it's alright\nBut you can make... | https://open.spotify.com/track/09dOpua4kGcHzH9... |
| 4 | Jack Johnson | https://s3.amazonaws.com/rapgenius/Jack+Johnso... | https://genius.com/artists/Jack-johnson | Sitting, Waiting, Wishing | Well I was sitting, waiting, wishing\nYou beli... | https://open.spotify.com/track/5eWOsyHHic4vJP3... |
| ... | ... | ... | ... | ... | ... | ... |
| 176 | Jack Johnson | https://s3.amazonaws.com/rapgenius/Jack+Johnso... | https://genius.com/artists/Jack-johnson | Good people - Live At Bonnaroo, Manchester, Te... | Well, you win, it's your show now\nSo what's i... |  |
| 177 | Jack Johnson | https://s3.amazonaws.com/rapgenius/Jack+Johnso... | https://genius.com/artists/Jack-johnson | Constellations - Live at Bonnaroo, Manchester,... | The light was leaving in the west it was blue\... |  |
| 178 | Jack Johnson | https://s3.amazonaws.com/rapgenius/Jack+Johnso... | https://genius.com/artists/Jack-johnson | Times Like These - Live In Santa Barbara, Cali... | In times like these\nIn times like those\nWhat... |  |
| 179 | Jack Johnson | https://s3.amazonaws.com/rapgenius/Jack+Johnso... | https://genius.com/artists/Jack-johnson | Secret Heart | Secret Heart, what are you made of?\nWhat are ... |  |


# Stack the Dataframes

In [90]:
all_songs_df = pd.concat([john_mayer_df,jack_johnson_df])
all_songs_df

|   | artist_name | image_url | url | title | lyrics | spotify_url |
|---|---|---|---|---|---|---|
| 0 | John Mayer | https://images.genius.com/4c443bf06bf6b696ad4f... | https://genius.com/artists/John-mayer | New Light | Ah, ah, ah\nAh...\n\nI'm the boy in your other... | https://open.spotify.com/track/3bH4HzoZZFq8UpZ... |
| 1 | John Mayer | https://images.genius.com/4c443bf06bf6b696ad4f... | https://genius.com/artists/John-mayer | Gravity | Gravity is working against me\nAnd gravity wan... | https://open.spotify.com/track/52K3qt1rCYf3Ciu... |
| 2 | John Mayer | https://images.genius.com/4c443bf06bf6b696ad4f... | https://genius.com/artists/John-mayer | Slow Dancing in a Burning Room | It's not a silly little moment\nIt's not the s... | https://open.spotify.com/track/3f8Uygfz3CIpUCo... |
| 3 | John Mayer | https://images.genius.com/4c443bf06bf6b696ad4f... | https://genius.com/artists/John-mayer | Free Fallin’ | She's a good girl, loves her mama\nLoves Jesus... | https://open.spotify.com/track/4LloVtxNZpeh7q7... |
| 4 | John Mayer | https://images.genius.com/4c443bf06bf6b696ad4f... | https://genius.com/artists/John-mayer | In the Blood | How much of my mother has my mother left in me... | https://open.spotify.com/track/77Y57qRJBvkGCUw... |
| ... | ... | ... | ... | ... | ... | ... |
| 176 | Jack Johnson | https://s3.amazonaws.com/rapgenius/Jack+Johnso... | https://genius.com/artists/Jack-johnson | Good people - Live At Bonnaroo, Manchester, Te... | Well, you win, it's your show now\nSo what's i... |  |
| 177 | Jack Johnson | https://s3.amazonaws.com/rapgenius/Jack+Johnso... | https://genius.com/artists/Jack-johnson | Constellations - Live at Bonnaroo, Manchester,... | The light was leaving in the west it was blue\... |  |
| 178 | Jack Johnson | https://s3.amazonaws.com/rapgenius/Jack+Johnso... | https://genius.com/artists/Jack-johnson | Times Like These - Live In Santa Barbara, Cali... | In times like these\nIn times like those\nWhat... |  |
| 179 | Jack Johnson | https://s3.amazonaws.com/rapgenius/Jack+Johnso... | https://genius.com/artists/Jack-johnson | Secret Heart | Secret Heart, what are you made of?\nWhat are ... |  |


In [95]:
all_songs_df[all_songs_df['spotify_url'] == 'NA'].shape

(313, 6)

In [94]:
#save as a csv
all_songs_df.to_csv('all_songs.csv', index=False)

# Create a Dataframe with all of the lyrics in one cell by artist

In [96]:
lyrics_df = pd.DataFrame(list(all_songs_df.groupby('artist_name')['lyrics']),columns=['artist_name','lyrics'])
lyrics_df

|   | artist_name | lyrics |
|---|---|---|
| 0 | Jack Johnson | 0 Can't you see that it's just raining?\n... |
| 1 | John Mayer | 0 Ah, ah, ah\nAh...\n\nI'm the boy in you... |


In [97]:
#save as a csv
lyrics_df.to_csv('lyrics_df.csv', index=False)
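
Iterating over a pandas `groupby` yields `(group name, Series)` pairs, so each `lyrics` cell built this way holds a whole Series (the leading `0` in the output is its index) rather than one string. If a single block of text per artist is the goal, joining the lyrics within each group is one option. A minimal sketch, with made-up placeholder rows standing in for `all_songs_df` and a hypothetical name `joined_df`:

```python
import pandas as pd

# Placeholder rows standing in for all_songs_df (not real lyrics)
all_songs_df = pd.DataFrame({
    "artist_name": ["Artist A", "Artist A", "Artist B"],
    "lyrics": ["first song text", "second song text", "third song text"],
})

# Join every song's lyrics into one string per artist, separated by a blank line
joined_df = (
    all_songs_df.groupby("artist_name")["lyrics"]
    .apply("\n\n".join)
    .reset_index()
)
print(joined_df)
```

This assumes every row has lyrics; rows with missing lyrics would need to be dropped or filled first, because `str.join` only accepts strings.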

### MISC

In [13]:
all_songs_df['title'].str.cat(sep=' ')

'New Light Gravity Slow Dancing in a Burning Room Banana Pancakes Better Together Upside Down'

In [14]:
' '.join(all_songs_df['title'])

'New Light Gravity Slow Dancing in a Burning Room Banana Pancakes Better Together Upside Down'

In [98]:
#when I add more artists, I can loop through the files like this in one function
for filename in os.listdir('./'):
    if filename.startswith('Lyrics_'):
        print(filename)

Lyrics_JohnMayer.json
Lyrics_JackJohnson.json
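
The file loop above could grow into a single loader once more artists are added. The sketch below is one way to do that with only the standard library; `load_all_songs` is a hypothetical name, and it assumes every `Lyrics_*.json` file has the same `name` and `songs` keys that `get_lyrics` reads.

```python
import json
from pathlib import Path

def load_all_songs(folder="."):
    """Collect one dict per song from every Lyrics_*.json file in `folder`."""
    rows = []
    for path in sorted(Path(folder).glob("Lyrics_*.json")):
        with open(path) as json_data:
            data = json.load(json_data)
        for song in data["songs"]:
            rows.append({
                "artist_name": data["name"],
                "title": song["title"],
                "lyrics": song["lyrics"],
            })
    return rows

rows = load_all_songs(".")
print(len(rows), "songs loaded")
```

The resulting list of dicts can be passed to `pd.DataFrame(rows)`; alternatively, the same loop could call `get_lyrics` on each file and combine the results with `pd.concat`.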
