# Lyric generation with LSTMs

**Author:** Aniruddha Mysore

The lyric data comes from Lyrics Wikia. The song list was scraped from the artist page with Beautiful Soup, and the lyrics for each song were fetched through the [lyric-api](https://github.com/rhnvrm/lyric-api) API.

**Credits:**

1. Videos on LSTMs and RNNs by Siraj Raval (YouTube)

2. Ivan Liljeqvist's [article](https://medium.com/@ivanliljeqvist/using-ai-to-generate-lyrics-5aba7950903) on using Keras for generating lyrics and his [code](https://github.com/ivan-liljeqvist/ailyrics/) 

![](https://data.whicdn.com/images/36141347/large.jpg)


## First: Data Collection 

In [35]:
from bs4 import BeautifulSoup
import pandas as pd
import requests
import json
from urllib.request import urlopen
import urllib.request
import re 

# Get html
url = 'http://lyrics.wikia.com/wiki/Eminem'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')

count = 0
data = list()

# Parse the data to get list of songs and urls
for album in soup.find_all(class_='album-art'):
    count += 1
    for song in album.find_next('ol').children:
        # raw string avoids the invalid "\:" escape; captures everything after the colon
        a = re.search(r':(.*)', song.b.a['href'])
        data.append({
            'url': song.b.a['href'],
            'name': a.group(1)
        #    'name': song.b.a.contents[0]
        })

df = pd.DataFrame(data, columns=["url","name"])
df.head()

| | url | name |
|---|---|---|
| 0 | /wiki/Eminem:Infinite | Infinite |
| 1 | /wiki/Eminem:W.E.G.O. | W.E.G.O. |
| 2 | /wiki/Eminem:It%27s_OK | It%27s_OK |
| 3 | /wiki/Eminem:313 | 313 |
| 4 | /wiki/Eminem:Tonite | Tonite |
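
The `name` column keeps the percent-encoding from the page URL (for example `It%27s_OK`). That is convenient for building API URLs but hard to read. A minimal sketch of decoding the names for display, using the standard-library `urllib.parse.unquote`; the helper `display_name` is hypothetical and not part of the original notebook:

```python
from urllib.parse import unquote

def display_name(encoded_name):
    """Turn a percent-encoded wiki title into a readable song title."""
    # unquote decodes %27 -> ' and %26 -> &; underscores stand in for spaces
    return unquote(encoded_name).replace("_", " ")

print(display_name("It%27s_OK"))
print(display_name("%2797_Bonnie_%26_Clyde"))
```

The encoded form should probably still be used when calling the API, since it is already URL-safe.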


In [36]:
# here's what the lyrics look like
data = json.load(urlopen('http://lyric-api.herokuapp.com/api/find/Eminem/'+df.iloc[0]['name']))
print(data['lyric'])

spoken:
Oh yeah, this is Eminem baby, back up in that motherfucking ass
One time for your mother fucking mind, we represent the 313
You know what I'm saying?, 'cause they don't know shit about this
For the 9-6

Ayo, my pen and paper cause a chain reaction
To get your brain relaxin, 'cause they be actin' maniac in action
A brainiac in fact son, you mainly lack attraction
You looking zany whack with just a fraction of my tracks spun
My rhyming skills got you climbing hills
I travel through your mind into your spine like siren drills
I'm sliming grills of roaches, with sprayed on disinfectants
Twistin necks of rappers till their spinal column disconnects
Put this in decks and check the monologue, turn your system up
Twist them up, and indulge in the marijuana smog
This is the season for noise pollution contamination
Examination of more cartoons than animation
My lamination of narration
Hit's a snare and bass in the track fucked up rapper interrogation
When I declare invasion, there ain't 

In [37]:
#Saving the corpus file for training the model

corpus = ""

for index, row in df.iterrows():
    try:
        data = json.load(urlopen('http://lyric-api.herokuapp.com/api/find/Eminem/'+row['name']))
        corpus += "\n" + data['lyric']
    except urllib.error.HTTPError:
        print("ERROR :",index, row['name'])
    else:
        print(index, row['name'])

# write as UTF-8 so the file matches the encoding used when reading it back
with open("corpus.txt", "w", encoding="utf8") as text_file:
    text_file.write(corpus)

0 Infinite
1 W.E.G.O.
2 It%27s_OK
3 313
4 Tonite
5 Maxine
6 Open_Mic
7 Never_2_Far
8 Searchin%27
9 Backstabber
10 Jealousy_Woes_II
11 Intro_(Slim_Shady)
12 Low_Down_Dirty
13 If_I_Had
14 Just_Don%27t_Give_A_Fuck
15 Mommy
16 Just_The_Two_Of_Us
17 No_One%27s_Iller_Than_Me
18 Murder_Murder
19 If_I_Had_(Radio_Edit)
20 Just_Don%27t_Give_A_Fuck_(Radio_Edit)
21 Public_Service_Announcement
22 My_Name_Is
23 Guilty_Conscience
24 Brain_Damage
25 Paul
26 If_I_Had
27 %2797_Bonnie_%26_Clyde
28 Bitch
29 Role_Model
30 Lounge
31 My_Fault
32 Ken_Kaniff#As_heard_on_The_Slim_Shady_LP
33 Cum_On_Everybody
34 Rock_Bottom
35 Just_Don%27t_Give_A_Fuck
36 Soap
37 As_The_World_Turns
38 I%27m_Shady
39 Bad_Meets_Evil
40 Still_Don%27t_Give_A_Fuck
41 Hazardous_Youth
42 Get_You_Mad
43 Greg
44 Public_Service_Announcement_2000
45 Kill_You
46 Stan
47 Paul
48 Who_Knew
49 Steve_Berman
50 The_Way_I_Am
51 The_Real_Slim_Shady
52 Remember_Me%3F
53 I%27m_Back
54 Marshall_Mathers
55 Ken_Kaniff#As_heard_on_The_Marshall_Mathers_LP


Now that the corpus file is saved, it's time to train the model.

## Second: Training the Model

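Before training, we fix the length of each input sequence (`sequence_length`, 40 characters) and how far the window advances between consecutive sequences (`step`, 3 characters).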

In [7]:
import io

PATH = "corpus.txt" 
sequence_length = 40
step = 3


text = []
chars = []


# get the lyrics corpus from the file
with io.open(PATH, 'r', encoding='utf8') as f:
    text = f.read().lower()
    chars = sorted(list(set(text)))

# sequences are the inputs to the neural network
# next_chars are the labels used during training
sequences = []
next_chars = []
for i in range(0, len(text) - sequence_length, step):
    sequences.append(text[i: i + sequence_length])
    next_chars.append(text[i + sequence_length])

    
char_to_index = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))
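
To make the windowing concrete, here is a toy sketch of the same slicing on a short string. The window of 5 and step of 2 are chosen purely for illustration (the notebook uses 40 and 3), and the helper name `make_windows` is hypothetical:

```python
def make_windows(text, sequence_length, step):
    """Return (sequences, next_chars) using the same slicing as the cell above."""
    sequences, next_chars = [], []
    for i in range(0, len(text) - sequence_length, step):
        sequences.append(text[i: i + sequence_length])   # input window
        next_chars.append(text[i + sequence_length])      # character to predict
    return sequences, next_chars

toy_sequences, toy_next = make_windows("hello world", sequence_length=5, step=2)
for seq, nxt in zip(toy_sequences, toy_next):
    print(repr(seq), "->", repr(nxt))
```

Each window is paired with the single character that follows it, which is what the network learns to predict.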

### Vectorization

We need to convert all our character strings into a format that can be used by the LSTM.

In [8]:
import numpy as np

# vectorize the data since we cannot use characters and strings 

# np.bool was removed in NumPy 1.24; the built-in bool is equivalent
X = np.zeros((len(sequences), sequence_length, len(chars)), dtype=bool)
y = np.zeros((len(sequences), len(chars)), dtype=bool)

for i, sentence in enumerate(sequences):
    for t, char in enumerate(sentence):
        X[i, t, char_to_index[char]] = 1
    # the label is set once per sequence, not once per character
    y[i, char_to_index[next_chars[i]]] = 1
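
A small sketch of what this one-hot encoding produces, using a three-character toy vocabulary (an assumption for illustration; the real vocabulary is every distinct character in the corpus). The names `toy_chars`, `toy_index` and `one_hot` are hypothetical:

```python
import numpy as np

toy_chars = sorted(set("abca"))                      # ['a', 'b', 'c']
toy_index = {c: i for i, c in enumerate(toy_chars)}  # character -> column

def one_hot(sequence, index):
    """Encode a string as a (len(sequence), vocabulary_size) boolean matrix."""
    encoded = np.zeros((len(sequence), len(index)), dtype=bool)
    for t, char in enumerate(sequence):
        encoded[t, index[char]] = True
    return encoded

toy_encoded = one_hot("cab", toy_index)
print(toy_encoded.astype(int))
```

Every row has exactly one active column, so a sequence of 40 characters becomes a 40 × `len(chars)` matrix.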

### Training

This may take some time to run. By default the model trains for 20 epochs.

On my NVIDIA 940MX laptop GPU, each epoch takes about 2 minutes 30 seconds.

You can skip training by using the pretrained model.

In [10]:
# MODEL TRAINING
# skip this if you want to use the pretrained model

from keras.models import Sequential, load_model
from keras.layers import Dense, Activation, Dropout
from keras.layers import LSTM
from keras.optimizers import RMSprop

EPOCHS = 20

# This is our Keras model: one LSTM layer with 128 units, then a softmax over the characters

model = Sequential()
model.add(LSTM(128, input_shape=(sequence_length, len(chars))))
model.add(Dense(len(chars)))
model.add(Activation('softmax'))

# recent Keras versions name this argument learning_rate (lr is the older spelling)
optimizer = RMSprop(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)

#UNCOMMENT TO TRAIN : 

# model.fit(X, y, batch_size=128, epochs=EPOCHS)
# model.save('eminem.h5')
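
For a sense of the model's size, the number of trainable parameters can be worked out by hand. This is a sketch assuming the standard LSTM formulation (four gates, each with input weights, recurrent weights and a bias). The vocabulary size of 60 is a hypothetical placeholder, since the real value is `len(chars)` for your corpus:

```python
def lstm_param_count(units, input_dim):
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * (units * (input_dim + units) + units)

def dense_param_count(input_dim, output_dim):
    """One weight per input/output pair, plus one bias per output."""
    return input_dim * output_dim + output_dim

vocab_size = 60  # hypothetical; use len(chars) for the real corpus
total = lstm_param_count(128, vocab_size) + dense_param_count(128, vocab_size)
print(total)
```

Calling `model.summary()` on the compiled model reports the actual figure for your vocabulary.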

In [11]:
from keras.models import load_model

# load pretrained

model = load_model("eminem.h5")  # you can skip training by loading the trained weights


## Third: Predictions

Now for the fun part :)

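The `diversity` parameter acts as a sampling temperature. Low values make the model favour its most likely next character, which gives safer but more repetitive lyrics. High values flatten the distribution, which gives more varied but more error-prone text. The loop below generates lyrics at several diversity values.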
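
The effect of diversity on the sampling distribution can be sketched in isolation. The snippet applies the same transformation as the generation loop (take logs, divide by diversity, re-normalise) to a made-up three-character distribution. The probabilities are illustrative, not model output, and `apply_diversity` is a hypothetical helper name:

```python
import numpy as np

def apply_diversity(probabilities, diversity):
    """Rescale a probability distribution by a temperature ("diversity")."""
    logits = np.log(np.asarray(probabilities, dtype="float64")) / diversity
    scaled = np.exp(logits)
    return scaled / np.sum(scaled)  # re-normalise so the values sum to 1

toy_probs = [0.6, 0.3, 0.1]  # made-up distribution over three characters
for d in [0.2, 0.5, 1.0, 1.2]:
    print(d, np.round(apply_diversity(toy_probs, d), 3))
```

A diversity of 1.0 leaves the distribution unchanged, values below 1.0 concentrate probability on the most likely character, and values above 1.0 spread it out.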

In [27]:
import sys
import numpy as np

INPUT = "i want to go back home tonight so I can "

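# check the seed itself (not a leftover variable) and compare by value, not identity
if len(INPUT) != sequence_length:
    print("Seed must be exactly", sequence_length, "characters long, but it is", len(INPUT))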

else:
    for diversity in [0.2, 0.5, 1.0, 1.2]:
        print()
        print('====================================================\nDIVERSITY:', diversity)

        generated = ''
        # the seed must be exactly 40 characters long
        sentence = INPUT
        sentence = sentence.lower()
        generated += sentence

        print('SEED: "' + sentence + '"\n====================================================')
        sys.stdout.write(generated)

        for i in range(400):
            x = np.zeros((1, sequence_length, len(chars)))

            for t, char in enumerate(sentence):
                x[0, t, char_to_index[char]] = 1.


            predictions = model.predict(x, verbose=0)[0]

            if diversity == 0:
                diversity = 1

            preds = np.asarray(predictions).astype('float64')
            preds = np.log(preds) / diversity
            exp_preds = np.exp(preds)
            preds = exp_preds / np.sum(exp_preds)
            probas = np.random.multinomial(1, preds, 1)
            next_index =  np.argmax(probas)


            next_char = indices_char[next_index]

            generated += next_char
            sentence = sentence[1:] + next_char

            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()


DIVERSITY: 0.2
SEED: "i want to go back home tonight so i can "
i want to go back home tonight so i can started in a song and i can say

i got the shit the way i was some shot of the michate off in the mind of the mind of the shit
so destreminant to say the shit i can say the mind
i say i got the morning of the bell and i can started to hear me
the street and the belong of the shit i think i was somebody and we as i was a motherfuckin' that i got the morning and started in the mind of the bottom

i 

DIVERSITY: 0.5
SEED: "i want to go back home tonight so i can "
i want to go back home tonight so i can say
i hear me in the bone the reason i'm the thing to the really on the day
i can be careed on shot it and you say the change to suck your show
but i litter fine 'em back and things that you see it
when i was surrous of who when i god i go to the one fuckin' when i don't got to pee some syult
the michates!)
i got up in the morning out and have it and it every, i wanna change to say
be l

![](https://vignette.wikia.nocookie.net/looneytunes/images/e/e1/All.jpg/revision/latest/scale-to-width-down/260?cb=20150313020828)