# AI Rapper

### _Generating rap lyrics with different AI and NLP techniques_

We pull rap lyrics from the Genius API, train a character-level recurrent neural network (RNN) on them, and then ask the network to generate new text from a seed. Finally, we generate lyrics with word-level n-gram models.

In [3]:
%matplotlib inline

import numpy as np
import random
import sys
import os
import datetime
import io
from bs4 import BeautifulSoup
import requests as rq
import pandas as pd
import json

import keras
from keras.callbacks import TensorBoard
from keras.callbacks import LambdaCallback
from keras.models import Sequential
from keras.layers import Dense, LSTM, Dropout
from keras.optimizers import RMSprop

Using TensorFlow backend.


### Download the Lyrics Data

In [81]:
import lyricsgenius as genius
# Read the Genius API token from an environment variable instead of hard-coding it
# (GENIUS_ACCESS_TOKEN is just an example name; use whichever variable you set)
geniusCreds = os.environ['GENIUS_ACCESS_TOKEN']
# For more data, add artists to this list or raise max_songs below
artist_names = ['DaBaby', 'Drake', 'J. Cole', 'Travis Scott', 'Kendrick Lamar']

api = genius.Genius(geniusCreds)

for artist_name in artist_names:
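    artist = api.search_artist(artist_name, max_songs=20)
    # Writes a Lyrics_*.json file to the working directory; move these into ./data/ before the next cell
    artist.save_lyrics()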

Searching for songs by DaBaby...

Song 1: "Suge"
Song 2: "INTRO"
Song 3: "BOP"
Song 4: "Next Song"
Song 5: "Goin Baby"
Song 6: "VIBEZ"
Song 7: "Baby Sitter"
Song 8: "TOES"
Song 9: "21"
Song 10: "Today (Intro)"
Song 11: "Walker Texas Ranger"
Song 12: "Blank Blank"
Song 13: "OFF THE RIP"
Song 14: "Pull Up Music"
Song 15: "XXL"
Song 16: "4x"
Song 17: "GOSPEL"
Song 18: "POP STAR"
Song 19: "RAW SHIT"
Song 20: "Pony"

Reached user-specified song limit (20).
Done. Found 20 songs.
Wrote `Lyrics_DaBaby.json`


### Load the Data and Build a Baseline Model

In [4]:
# Concatenate the lyrics of every song in every JSON file under ./data/
text = ""
for filename in os.listdir('./data/'):
    with open(os.path.join('./data/', filename), encoding='utf-8') as json_data:
        data = json.load(json_data)

        for song in data['songs']:
            text += song['lyrics']

In [6]:
from collections import Counter

# Count how often each character occurs
cnt = Counter(text)

# Baseline model: always predict the most common character (a space, in this corpus)
most_com_char, most_com_count = cnt.most_common(1)[0]

# Its accuracy is simply that character's share of the corpus
print(f'Baseline Model Accuracy: {most_com_count / len(text)}')

Baseline Model Accuracy: 0.1700664496363421
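The baseline above is a majority-class predictor: it always guesses the single most frequent character, so its accuracy is just that character's share of the corpus. A minimal sketch of the same idea as a reusable function (the name `majority_baseline` is ours, not part of the notebook):

```python
from collections import Counter


def majority_baseline(text):
    """Return the most common character and the accuracy of always predicting it."""
    if not text:
        raise ValueError("text must be non-empty")
    char, count = Counter(text).most_common(1)[0]
    return char, count / len(text)


# Toy example: 'a' makes up 5 of the 11 characters
print(majority_baseline("abracadabra"))
```

A character-level model is only worth its training time if its next-character accuracy clearly beats this number (about 17% on this corpus).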


### RNN Construction & Training

In [11]:
# Build a vocabulary of every character in the corpus
chars = sorted(list(set(text)))
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))

# Cut the corpus into windows of seqlen + 1 characters that start seqlen characters apart
seqlen = 40
step = seqlen
sentences = []
for i in range(0, len(text) - seqlen - 1, step):
    sentences.append(text[i: i + seqlen + 1])

# One-hot encode: x holds the first seqlen characters of each window,
# y holds the same window shifted one character ahead (the next-character targets)
x = np.zeros((len(sentences), seqlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), seqlen, len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, (char_in, char_out) in enumerate(zip(sentence[:-1], sentence[1:])):
        x[i, t, char_indices[char_in]] = 1
        y[i, t, char_indices[char_out]] = 1

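# A single LSTM layer emits an output at every time step (return_sequences=True);
# the Dense softmax layer then predicts the next character at each step
model = Sequential()
model.add(LSTM(128, input_shape=(seqlen, len(chars)), return_sequences=True))
model.add(Dense(len(chars), activation='softmax'))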

model.compile(
    loss='categorical_crossentropy',
    optimizer=keras.optimizers.RMSprop(learning_rate=0.01),
    metrics=['categorical_crossentropy', 'accuracy']
)

def sample(preds, temperature=1.0):
    """Helper function to sample an index from a probability array."""
    preds = np.asarray(preds).astype('float64')
    preds = np.exp(np.log(preds) / temperature)  # rescale by temperature
    preds = preds / np.sum(preds)                # renormalize so it sums to 1
    probas = np.random.multinomial(1, preds, 1)  # draw a single sample
    return np.argmax(probas)                     # index of the sampled character

def on_epoch_end(epoch, _):
    """Function invoked at end of each epoch. Prints generated text."""
    print()
    print('----- Generating text after Epoch: %d' % epoch)

    start_index = random.randint(0, len(text) - seqlen - 1)
    
    for diversity in [0.5]:
        print('----- diversity:', diversity)

        generated = ''
        sentence = text[start_index: start_index + seqlen]
        generated += sentence
        print('----- Generating with seed: "' + sentence + '"')
        sys.stdout.write(generated)

        for i in range(200):
            x_pred = np.zeros((1, seqlen, len(chars)))
            for t, char in enumerate(sentence):
                x_pred[0, t, char_indices[char]] = 1.
            preds = model.predict(x_pred, verbose=0)
            next_index = sample(preds[0, -1], diversity)
            next_char = indices_char[next_index]

            sentence = sentence[1:] + next_char

            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()

print_callback = LambdaCallback(on_epoch_end=on_epoch_end)

# Pass callbacks=[print_callback] to print sample text after every epoch
model.fit(x, y,
          batch_size=128,
          epochs=75,
          callbacks=[])

Epoch 1/75
...
Epoch 75/75


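The training cell slices the corpus into windows of `seqlen + 1` characters whose starts are `seqlen` characters apart (because `step = seqlen`), then encodes each window twice: `x` holds characters `0..seqlen-1` and `y` holds characters `1..seqlen`, so at every time step the target is the next character. A small sketch of the same indexing on a toy string, using integer indices instead of one-hot vectors to keep it readable (`make_windows` is a hypothetical helper name):

```python
import numpy as np


def make_windows(text, seqlen, step):
    """Return (x, y, chars) where each row of y is the matching row of x shifted one character ahead."""
    chars = sorted(set(text))
    char_indices = {c: i for i, c in enumerate(chars)}
    # Same slicing as the notebook: windows of seqlen + 1 characters
    windows = [text[i:i + seqlen + 1]
               for i in range(0, len(text) - seqlen - 1, step)]
    x = np.array([[char_indices[c] for c in w[:-1]] for w in windows])
    y = np.array([[char_indices[c] for c in w[1:]] for w in windows])
    return x, y, chars


x, y, chars = make_windows("count it up, count it", seqlen=4, step=4)

# Every target row is its input row shifted left by one position
assert (x[:, 1:] == y[:, :-1]).all()

# Indexing an identity matrix turns indices into one-hot boolean vectors,
# the same (windows, seqlen, vocab) layout the notebook fills with nested loops
x_onehot = np.eye(len(chars), dtype=bool)[x]
print(x.shape, x_onehot.shape)
```

Vectorizing the one-hot step this way avoids the Python-level double loop, which matters once the corpus grows to many artists.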

### RNN Rap Lyric Generation

In [13]:
print()
print('... Generating lyrics')
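diversity = .6  # sampling temperature passed to sample()
print('... With diversity value:', diversity)

generated = ''
# The model reads seqlen characters at a time, so trim the seed to that length
sentence = 'Like a shepherd having sex with a sheep, fuck what you heard'[:seqlen]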
generated += sentence
print('... Generating with seed: "' + sentence + '"')
print()
sys.stdout.write(generated)

for i in range(5000):
    x_pred = np.zeros((1, seqlen, len(chars)))
    for t, char in enumerate(sentence):
        x_pred[0, t, char_indices[char]] = 1.
    preds = model.predict(x_pred, verbose=0)
    next_index = sample(preds[0, -1], diversity)
    next_char = indices_char[next_index]

    sentence = sentence[1:] + next_char

    sys.stdout.write(next_char)
    sys.stdout.flush()
print()


... Generating lyrics
... With diversity value: 0.6
... Generating with seed: "Like a shepherd having sex with a sheep,"

Like a shepherd having sex with a sheep, pussy is love
Way too many fly with your mind, where the case I ain't nothin'
They like when I was too much for my shit
I was too bushed, you got a Pati, and the mall you with the star
When you could talk, you gon' be all day with the ones
I love it was pretendays my family shit
I want you what you know where I chelf
I'm just everybody believe it I stay toure on the club
I ain't take a polis that I was learn and these nigga one bread for sure
I'm talkin' about her doing mornin', this man with you
And I never liked the club up (Yeah)

[Verse 2: Travis Scott]
I ain't lookin' for me way down
For the schemin' shot as he askin' with you

[Verse 2]
I hope you know what's that she get it, I got to show me (Keep gon' go back)
I see my waves all in
All the way I want my team (row)
Get it, get it, count it up, count it, count it
I wan
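The `diversity` value passed to `sample` acts as a sampling temperature: `np.exp(np.log(p) / T)` is the same as `p ** (1 / T)`, so after renormalizing, values below 1 sharpen the distribution toward the likeliest character and values above 1 flatten it. A small sketch of that rescaling on a made-up three-character distribution (`apply_temperature` is an illustrative name, not from the notebook):

```python
import numpy as np


def apply_temperature(preds, temperature):
    """Rescale a probability vector by a sampling temperature and renormalize."""
    preds = np.asarray(preds, dtype="float64")
    scaled = np.exp(np.log(preds) / temperature)  # equivalent to preds ** (1 / temperature)
    return scaled / scaled.sum()


probs = [0.6, 0.3, 0.1]
for t in (0.2, 0.6, 1.0, 2.0):
    print(t, np.round(apply_temperature(probs, t), 3))
```

At a temperature of 1 the model's distribution is unchanged; as it approaches 0, sampling approaches always picking the most likely character, which tends to produce repetitive loops.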

### n-Gram Model

In [5]:
import re
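# Roughly how many tokens (words and newlines) each n-gram model generates
NUM_OF_ITS = 100

# A token is a run of word characters/apostrophes or a single newline; other punctuation is dropped
regex = r"[\w']+|[\n]"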

words = re.findall(regex, text)

print(words[:20])

['Produced', 'by', 'Phonix', 'Beats', 'and', 'J', 'Cole', '\n', '\n', 'Verse', '1', '\n', 'First', 'things', 'first', 'rest', 'in', 'peace', 'Uncle', 'Phil']
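The regex keeps runs of word characters and apostrophes (so contractions like `ain't` stay whole) and treats each newline as its own token, which lets the n-gram models learn line breaks. All other punctuation is discarded, which is why `J. Cole` becomes `'J', 'Cole'` above. A quick check of that behavior on a made-up line:

```python
import re

regex = r"[\w']+|[\n]"
tokens = re.findall(regex, "I ain't lookin', yeah!\nJ. Cole")
print(tokens)
```

Note that a trailing apostrophe (as in `lookin'`) stays attached to the word, while commas, periods, and exclamation marks vanish.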


### Bi-Gram Model

Each word is generated by looking only at the previous word: the next word is sampled from the distribution of words that followed it in the corpus.

In [6]:
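from collections import defaultdict

# Map each word to every word that followed it (duplicates kept, so common successors are sampled more often)
transitions = defaultdict(list)
for prev, current in zip(words, words[1:]):
    transitions[prev].append(current)

def generate_using_bigrams():
    current = '\n'  # start at a line break
    result = []
    count = 0
    while True:
        next_word_candidates = transitions[current]
        if not next_word_candidates:
            # The corpus's final word has no successor: restart from a line break
            next_word_candidates = transitions['\n']
        current = random.choice(next_word_candidates)
        result.append(current)
        if count == NUM_OF_ITS: return " ".join(result)
        count += 1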

res = generate_using_bigrams()
print(res)

Preach preach preach 
 All the woman that You treat it private room and didn't have you comfort me give a baby when I see from it was entering a price on the ride for you blame for you love you get thin 
 Your homegirl that assume 'cause you 
 Chorus Drake 
 Had the bottom now now 
 Said it a flower 
 Pull them kids is my pops he 
 Pastor reverend for your name hold on my Rockstar skinnies Yeah they down the set us 
 I ain't out the rumors 
 Bitches can't help it did
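Because `transitions[prev]` keeps every observed successor, duplicates included, `random.choice` already samples each next word in proportion to its count, i.e. from the maximum-likelihood estimate P(next | prev) = count(prev, next) / count(prev, ·). A sketch that makes that distribution explicit with `Counter` (`bigram_probabilities` is a hypothetical helper, not part of the notebook):

```python
from collections import Counter, defaultdict


def bigram_probabilities(words):
    """Map each word to a dict of next-word probabilities estimated from bigram counts."""
    counts = defaultdict(Counter)
    for prev, current in zip(words, words[1:]):
        counts[prev][current] += 1
    return {
        prev: {w: c / sum(nexts.values()) for w, c in nexts.items()}
        for prev, nexts in counts.items()
    }


words = ["count", "it", "up", "count", "it", "count", "it", "up"]
probs = bigram_probabilities(words)
print(probs["it"])
```

To sample from this table you could pass its keys and values to `random.choices(population, weights=...)`; that gives the same distribution as the list-based approach while storing each distinct successor only once.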


### Tri-Gram Model

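Each word is generated from the two preceding words, again sampled from the distribution of words that followed that pair in the corpus.
Because many word pairs occur only once in this small corpus, the model often has just one candidate, so long stretches of lyrics are copied over verbatim.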

In [8]:
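trigram_transitions = defaultdict(list)
starts = []  # words that begin a line, used as starting points

# Map each pair of consecutive words to every word that followed the pair
for prev, current, nxt in zip(words, words[1:], words[2:]):
    if prev == '\n':
        starts.append(current)
    trigram_transitions[(prev, current)].append(nxt)

def generate_using_trigrams():
    current = random.choice(starts)
    prev = '\n'
    result = [current]
    count = 0

    while True:
        next_word_candidates = trigram_transitions[(prev, current)]
        if not next_word_candidates:
            # The corpus's final word pair has no successor: start a fresh line instead
            prev, current = '\n', random.choice(starts)
            result.extend(['\n', current])
        else:
            next_word = random.choice(next_word_candidates)
            prev, current = current, next_word
            result.append(current)
        if count == NUM_OF_ITS: return ' '.join(result)
        count += 1

print(generate_using_trigrams())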
        

You know that you're over and she eat it 
 Blank Blank 
 Pop anything pop anything to buy a vest Ayy 
 I'm findin' the more I just checked Checked checked 
 Hate that I met you in the air run it 
 And I m tried in a dark room 
 
 Chorus 
 I'm on fire and them leavin' 
 In the parking lot Gonzales Park odor 
 We wrapping up plastic actually 
 I just shoot 
 
 Chorus Drake 
 Looking for things you gotta go 
 It's just the motion yeah 
 Redemption's on your mind
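The bigram and trigram generators differ only in how many previous words form the context. A sketch of a single generator for any order `n`, assuming the same kind of tokenized word list; the function names are ours, and it restarts from a random context when it hits a dead end (a context seen only at the very end of the corpus, which therefore has no recorded successor):

```python
import random
from collections import defaultdict


def build_ngram_model(words, n):
    """Map each (n - 1)-word context to every word that followed it."""
    if n < 2:
        raise ValueError("n must be at least 2")
    model = defaultdict(list)
    for i in range(len(words) - n + 1):
        context = tuple(words[i:i + n - 1])
        model[context].append(words[i + n - 1])
    return dict(model)


def generate(model, n, num_words, rng=random):
    """Sample num_words tokens, jumping to a random known context at dead ends."""
    contexts = list(model)
    context = rng.choice(contexts)
    result = list(context)
    while len(result) < num_words:
        candidates = model.get(context)
        if not candidates:
            # Dead end: restart from a random context and keep its words in the output
            context = rng.choice(contexts)
            result.extend(context)
            continue
        result.append(rng.choice(candidates))
        context = tuple(result[-(n - 1):])
    return result[:num_words]


# Usage on a toy corpus (the notebook's `words` list works the same way)
toy = "I got it I got it I want it all".split()
print(" ".join(generate(build_ngram_model(toy, 2), 2, 12, random.Random(1))))
```

Raising `n` makes each context rarer, so more contexts have a single recorded successor and the output drifts toward copying the training lyrics; lowering it gives more novel but less coherent text.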
