# pyraps

For our final project, we will be attempting to classify rap styles based on song lyrics. We will be using a subset of rap music released in the 1990s, when rap styles from different geographical regions were distinct; modern rap, by contrast, has become more of an amalgamation of regional styles. The two main geographical regions we will look at are the east coast (New York City) and the west coast (Los Angeles); we may extend this to further classify more modern movements such as southern (Atlanta) and midwest (Detroit) rap.

Our project consists of three parts.

1. Data Collection - We will build a database of lyrics from 1990s rap artists and label each song with the rapper's style, as determined by geographical region.
2. Creating features - We will create features to capture the rhythm and rhyme of a song, as well as the particular lyrical content and vocabulary.
3. Training a classifier - Using our features, we will train different types of classifiers and compare results.

## Creating Features

We will use two sets of information as features for our machine learning algorithms: lyrical content (the actual words being used and the order in which they occur), and rhyme patterns along with rhythm, which we will analyze using NLTK.

### Lyrical Content

The first and easiest way of creating features is a TF-IDF vectorizer. Given a vocabulary, the vectorizer maps each document (in this case, a song) to a vector of fixed dimension. However, since we need to preprocess our data first, we will come back to this later and move on to the next feature space.
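To make the idea concrete before the real data is preprocessed, here is a minimal, dependency-free sketch of what a TF-IDF vectorizer computes. It assumes scikit-learn's default weighting (raw term counts, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, L2-normalized rows); the function name `tfidf_vectors` and the toy documents are illustrative only.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each document to a fixed-length TF-IDF vector over a shared vocabulary.

    Assumed weighting (sklearn-style defaults): raw term counts,
    smoothed idf = ln((1 + n) / (1 + df)) + 1, then L2 normalization.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted(set(word for tokens in tokenized for word in tokens))
    n = len(tokenized)
    # document frequency: in how many documents each word appears
    df = Counter(word for tokens in tokenized for word in set(tokens))
    idf = {w: math.log((1.0 + n) / (1.0 + df[w])) + 1.0 for w in vocab}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vec = [counts[w] * idf[w] for w in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


songs = ["wild wild west", "state of mind", "west coast state"]
vocab, vectors = tfidf_vectors(songs)
```

Every song maps to a vector of the same length (the vocabulary size), which is what lets a classifier consume them; later in the notebook scikit-learn's `TfidfVectorizer` does this work for us.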

### Rhyme Patterns

Rhyme patterns are interesting in the way they manifest in east-coast versus west-coast rap. East-coast rap tends to build intricate, interlacing rhyme patterns, whereas west-coast rap focuses more on creating a vibe than on building intense rhyme structures. We can use this as another feature to train on, to provide better separation between the two styles.

Let's take a look at a couple of lines from Nas' "N.Y. State of Mind", a classic east-coast song:

1. Rappers I <font color='blue'>monkey</font> <font color='red'>flip em</font> with the <font color='blue'>funky</font> <font color='red'>rhythm</font> I be <font color='red'>kickin'</font>
2. <font color='red'>musician</font>, <font color='red'>inflictin</font> <font color='red'>composition</font>
3. <font color='green'>of pain</font> I'm like Scarface <font color='red'>sniffin</font> <font color='green'>cocaine</font>
4. Holdin a <font color='purple'>M-16</font>, see with the pen <font color='purple'>I'm extreme</font>, now

Now let's take a look at a couple of lines from 2Pac's "California Love", a west-coast song:

1. Now let me welcome everybody to the wild, wild <font color='red'>west</font>
2. A state that's untouchable like Elliot <font color='red'>Ness</font>
3. The track hits ya eardrum like a slug to ya <font color='red'>chest</font>
4. Pack a <font color='red'>vest</font> for your Jimmy in the city of <font color='red'>sex</font>

We can immediately see a difference between the rhyme styles of these two kinds of rap. East-coast rap tends to have more rhymes overall and a greater variety of rhyme patterns interspersed throughout the lines, whereas west-coast rap favors simpler last-word rhymes.

How can we do this computationally? We will use the CMU Pronouncing Dictionary (`cmudict`) that ships with NLTK.

In [None]:
import nltk
import pandas as pd
import scipy as sp
import numpy as np
from nltk.corpus import cmudict

In [None]:
class Pronunciation(object):
    
    CMUDICT = cmudict.dict()
    
    def __init__(self, word):
        self.word = word
        self.word_lower = word.lower()
        if self.word_lower in Pronunciation.CMUDICT:
            self.pron = Pronunciation.CMUDICT[self.word_lower][0]
            self.syllable_loc = [i for i in xrange(len(self.pron)) if self.pron[i][-1].isdigit()]
        else:
            self.pron = None
            self.syllable_loc = None
    def __repr__(self):
        if self.pron:
            pron_repr =  "/".join(self.pron)
        else:
            pron_repr = "?"
        return "%s(%s)" % (self.word,pron_repr)
    
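    def rhyme_group(self):
        # Words missing from CMUdict all share one placeholder group
        if self.pron is None:
            return "UNKNOWN_GROUP"
        # No vowel (stress digit) found: use the whole pronunciation
        elif not self.syllable_loc:
            return "/".join(self.pron)
        # Otherwise the rhyme group is everything from the last vowel onward
        else:
            return "/".join(self.pron[self.syllable_loc[-1]:])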
    
    def __eq__(self,other):
        return self.word_lower == other.word_lower
    
    def __hash__(self):
        return hash(self.word_lower)
    
        

def tokenize(s):
    tokenizer = nltk.tokenize.RegexpTokenizer(r"\w[\w-]*'?[\w-]*")
    tokenized_lines = [tokenizer.tokenize(line) for line in s.split("\n") if line]
    return tuple([tuple([Pronunciation(token) for token in token_line]) for token_line in tokenized_lines])
    

nas = '''Rappers I monkey flip em with the funky rhythm I be kicking\nmusician, inflicting composition\nof pain '''+\
      '''I'm like Scarface sniffing cocaine\nHolding a M-16, see with the pen I'm extreme, now\n\n'''
token_lines = tokenize(nas)
for line in token_lines:
    print line

Now we need to define some sort of metric for rhyming words.
We know that monkey (M/AH1/NG/K/IY0) rhymes with funky (F/AH1/NG/K/IY0); it is a perfect rhyme. Let's break this down. In CMUdict every vowel carries a stress digit (0 unstressed, 1 primary, 2 secondary), so monkey has two syllables and thus two stress-marked vowels. These vowels mark the separation of syllables: monkey can be broken down into (M/AH1/NG) and (K/IY0), and funky into (F/AH1/NG) and (K/IY0). Immediately we see that the last syllables of the two words rhyme because they are identical; the AH1/NG shared by 'mon' and 'fun' strengthens the rhyme further, but the relationship that makes this a strong rhyme is equality of the final syllable.
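The rule just described can be sketched on raw phoneme lists. `rhyme_key` below is a hypothetical helper: by default it keys a word on everything from its last vowel onward, which mirrors `Pronunciation.rhyme_group`; with `from_primary_stress=True` it starts at the last primary-stressed vowel instead, the usual definition of a perfect rhyme. The phonemes are the ones quoted above.

```python
def rhyme_key(phones, from_primary_stress=False):
    """Return the phonemes from the rhyming vowel to the end of the word, joined by '/'.

    phones: list of CMUdict (ARPAbet) phonemes; vowels end in a stress digit 0/1/2.
    Default: start at the last vowel of any stress (like Pronunciation.rhyme_group).
    from_primary_stress=True: start at the last primary-stressed vowel ('1').
    """
    vowel_idx = [i for i, p in enumerate(phones) if p[-1].isdigit()]
    if from_primary_stress:
        stressed = [i for i in vowel_idx if phones[i].endswith("1")]
        vowel_idx = stressed or vowel_idx
    if not vowel_idx:
        # no vowel at all: fall back to the whole pronunciation
        return "/".join(phones)
    return "/".join(phones[vowel_idx[-1]:])


monkey = ["M", "AH1", "NG", "K", "IY0"]
funky = ["F", "AH1", "NG", "K", "IY0"]
```

The default key is loose: every word ending in an unstressed IY, such as city (S/IH1/T/IY0), lands in the same group as monkey, while the primary-stress key only matches words that also share AH1/NG/K.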

Let's look at a harder example: the pair flip em (F/L/IH1/P, EH1/M) rhymes with rhythm (R/IH1/DH/AH0/M). To simplify things, let's just compare em (EH1/M) with rhythm (R/IH1/DH/AH0/M). This is a weak rhyme: the final vowels are different (EH vs. AH) but sound similar, and both words end in M. This is another complication we need to take into account.
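One possible way to separate strong rhymes from weak ones like em/rhythm is to compare the last vowel and the consonants after it (the coda) separately. The sketch below is a hypothetical heuristic, not part of the notebook: it calls a pair "strong" when both the last vowel (ignoring its stress digit) and the coda match, and "weak" when only a non-empty coda matches.

```python
def split_last_vowel(phones):
    """Split at the last vowel: (vowel without stress digit, tuple of trailing consonants)."""
    idx = [i for i, p in enumerate(phones) if p[-1].isdigit()]
    if not idx:
        return None, tuple(phones)
    last = idx[-1]
    return phones[last].rstrip("012"), tuple(phones[last + 1:])


def rhyme_strength(a, b):
    """Hypothetical three-way rhyme classifier: 'strong', 'weak' or 'none'."""
    vowel_a, coda_a = split_last_vowel(a)
    vowel_b, coda_b = split_last_vowel(b)
    if vowel_a is not None and vowel_a == vowel_b and coda_a == coda_b:
        return "strong"
    # same (non-empty) final consonants but a different vowel: slant rhyme
    if coda_a and coda_a == coda_b:
        return "weak"
    return "none"


em = ["EH1", "M"]
rhythm = ["R", "IH1", "DH", "AH0", "M"]
```

This crude rule would still miss pairs whose vowels match but whose final consonants differ; a fuller treatment would also score how similar two different vowels sound.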

Let's implement a quick, naive rhyme grouping to find all of our strong rhymes:

In [None]:
import collections
def rhyme_groups_naive(tokens):
    groups = collections.defaultdict(set)
    for line in tokens:
        for token in line:
            group = token.rhyme_group()
            groups[group].add(token)
    return dict(groups)

strong_rhyme_groups = rhyme_groups_naive(token_lines)

for (k,v) in strong_rhyme_groups.iteritems():
    if len(v) > 1:
        print k, v

Let's visualize this to see whether it matches our manual annotation above.

In [None]:
import IPython.display, random

def random_color():
    return "#%03x" % random.randint(0, 0xFFF)

# get rid of solo groups
groups = [[k,v] for (k,v) in strong_rhyme_groups.iteritems() if len(v) > 1 and k != "UNKNOWN_GROUP"]
print groups
# assign colors
for group in groups:
    group[0] = random_color()
# map each token to its group's color
color_dict = {token: color for [color, tokens] in groups for token in tokens}

html = ""
for token_line in token_lines:
    for token in token_line:
        if token in color_dict:
            html += "<b><font color=%s>%s</font></b> " % (color_dict[token], token.word)
        else:
            html += token.word + " "
    html += "<br>"
        

IPython.display.display_html(html, raw=True)

## Musixmatch API

In this section we will use the Musixmatch API to scrape songs and their lyrics. We will use the standard Python `requests` library and call the API with the API key we registered for.

The standard format for the requests will be:

`http://api.musixmatch.com/ws/1.1/<method>?track_id=<track_id>&apikey=<apikey>`

where `<method>` is one of the API methods such as `track.lyrics.get`, `track.search`, `chart.artists.get`, and many others.
We need to fill in the track_id for the song and our API key.
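As an illustration of that format, the sketch below builds such a URL with the standard library. In the code that follows, `requests.get(url, params=...)` builds the same query string for us; `musixmatch_url`, the track ID and the key here are placeholders.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

BASE_URL = "http://api.musixmatch.com/ws/1.1/"


def musixmatch_url(method, params):
    """Build BASE_URL + method + '?' + URL-encoded query (params: list of (name, value))."""
    return BASE_URL + method + "?" + urlencode(params)


url = musixmatch_url("track.lyrics.get", [("track_id", 12345), ("apikey", "YOUR_API_KEY")])
```

Passing the parameters as a list of pairs keeps their order fixed; `urlencode` also escapes values, so a search such as `q_artist=mobb deep` becomes `q_artist=mobb+deep`.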

## Search Function

The code below scrapes the Musixmatch database for us. Every song in Musixmatch has a track ID, and lyrics can only be fetched by track ID. So `search` takes an artist and a song title, lowercases them, calls `track.search` (restricted to tracks that have lyrics) to find the track ID of the first match, and then calls `track.lyrics.get` with that ID to return the lyrics. The remaining methods walk from an artist to their albums, and from each album to the lyrics of its tracks.
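The first step of that lookup, pulling the track ID out of a `track.search` response, can be tested offline. The helper below is a hypothetical sketch that follows the JSON envelope the `MusixApi` code reads (`message.header.status_code` and `message.body.track_list`); the response is hand-made, and treating an empty `track_list` as "no hit" is an assumption about how a search with no results looks.

```python
def first_track_id(search_json):
    """Return the track_id of the first hit in a track.search response, or None."""
    message = search_json["message"]
    status_code = message["header"]["status_code"]
    if status_code != 200:
        raise RuntimeError("Received status code %d" % status_code)
    track_list = message["body"]["track_list"]
    if not track_list:
        return None  # assumed shape of a search with no results
    return track_list[0]["track"]["track_id"]


# A hand-made response with a made-up track_id, for illustration only.
fake_response = {"message": {"header": {"status_code": 200},
                             "body": {"track_list": [{"track": {"track_id": 42}}]}}}
```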

In [None]:
import requests
from datetime import datetime

class MusixApi:
    def __init__(self, apikey):
        self.apikey = apikey
        self.search_url = "http://api.musixmatch.com/ws/1.1/track.search"
        self.lyrics_get_url = "http://api.musixmatch.com/ws/1.1/track.lyrics.get"
        self.artist_search_url = "http://api.musixmatch.com/ws/1.1/artist.search"
        self.album_get_url = "http://api.musixmatch.com/ws/1.1/artist.albums.get"
        self.album_tracks_get_url = "http://api.musixmatch.com/ws/1.1/album.tracks.get"
        self.track_lyrics_get = "http://api.musixmatch.com/ws/1.1/track.lyrics.get"
        
    def search(self, artist, title):
        '''
        Pass in artist/title and return song lyrics
        Basic search capability
        '''
        
        url = self.search_url
        params = {"q_track": title.lower(),
                  "q_artist": artist.lower(),
                  "f_has_lyrics": 1,
                  "apikey": self.apikey}
        song = requests.get(url, params=params).json()
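        status_code = song["message"]["header"]["status_code"]
        if status_code != 200:
            raise Exception("Received status code %d" % status_code)
        track_list = song['message']['body']['track_list']
        if not track_list:
            raise Exception("No track found for %s - %s" % (artist, title))
        track_id = track_list[0]['track']['track_id']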
        
        url = self.lyrics_get_url
        params = {"track_id": track_id,
                  "apikey": self.apikey}
        lyrics = requests.get(url, params=params).json()
        status_code = lyrics["message"]["header"]["status_code"]
        if status_code != 200:
            raise Exception("Received status code %d" % status_code)
        return lyrics['message']['body']['lyrics']['lyrics_body']
    
    def artist_id(self, artist):
        '''
        This function returns the artist ID for an artist
        
        Input: An artist name
        Output: The Musixmatch artist ID of the first search result
        
        '''
        params = {"q_artist": artist.lower(),
                  "page_size": 5,
                  "apikey": self.apikey}
        url = self.artist_search_url
        artist_json = requests.get(url, params=params).json()
        status_code = artist_json["message"]["header"]["status_code"]
        if status_code != 200:
            raise Exception("Received status code %d" % status_code)
        artist_list = artist_json['message']['body']['artist_list']
        if not artist_list:
            raise Exception("No artist found for %s" % artist)
        artist_id = artist_list[0]['artist']['artist_id']
        return artist_id
    
    
    def all_albums(self, artist_id):
        '''
        This function returns all the albums for a given artist ID
        
        Input: the ID of an artist
        Output: a list of album dicts, fetched 100 per page
        '''
        
        rez = []
        url = self.album_get_url
        page_num = 1
        album_length = 100
        while album_length == 100:
            params = {"artist_id": artist_id,
                      "s_release_date": "desc",
                      "page_size": 100,
                      "page": page_num,
                      "g_album_name": 1,
                      "apikey": self.apikey}
            album_json = requests.get(url, params=params).json()
            status_code = album_json["message"]["header"]["status_code"]
            if status_code != 200:
                raise Exception("Received status code %d" % status_code)
            album_list = album_json['message']['body']['album_list']
            rez += [album_result["album"] for album_result in album_list]
            album_length = len(album_list)
            page_num += 1
        return rez
    
    
    def all_lyris_in_album(self, album):
        '''
        Input: An album dict (as returned by all_albums)
        Output: (lyrics, number of lyrics found, number of tracks) for the album's
                tracks, or (None, None, None) if the track lookup fails
        
        '''
        
        
        album_id = album["album_id"]
        url = self.album_tracks_get_url
        song_url = self.track_lyrics_get
        params = {"album_id": album_id,
                  "page": 1,
                  "page_size": 100,
                  "apikey": self.apikey}
        tracks_json = requests.get(url, params=params).json()
        status_code = tracks_json["message"]["header"]["status_code"]
        if status_code != 200:
            print "Album track lookup for %d failed with status_code %d"\
                % (album_id, status_code)
            return (None, None, None)
        
        track_list = tracks_json['message']['body']['track_list']
        final_lyrics = []
        total = len(track_list)
        for track in track_list:
            song_id = track['track']['track_id']
            song_params = {"track_id": song_id,
                          "apikey": self.apikey}
            response = requests.get(song_url, params=song_params).json()
            status_code = response['message']['header']['status_code']
            if status_code == 200:
                final_lyrics.append(response['message']['body']['lyrics']['lyrics_body'])
        return (final_lyrics, len(final_lyrics), total)
        
    def get_all_lyrics_from_artist(self, artist, date_start, date_end):
        '''
        Input: artist name and the range of album dates we want
        Output: List of (album_name, lyrics) from that artist in said date range
        '''
        def in_date_range(date_string, start, end):
            # Release dates may be a full date, year-month, or just a year
            for fmt in ("%Y-%m-%d", "%Y-%m", "%Y"):
                try:
                    dt = datetime.strptime(date_string, fmt)
                    break
                except (ValueError, TypeError):
                    continue
            else:
                # no format matched (or the date was missing)
                return False
            return start <= dt <= end
        print "*******************************************************"
        print artist
        print "*******************************************************"
        artist_id = self.artist_id(artist)
        print " * artist_id: %d" % artist_id
        albums = self.all_albums(artist_id)
        print " * number albums: %d" % len(albums)
        albums_in_range = [album for album in albums if 
                         in_date_range(album["album_release_date"], date_start, date_end)]
        print " * number albums in date range: %d" % len(albums_in_range)        
        
        all_lyrics = []
        
        for album in albums_in_range:
            (lyrics, success, total) = self.all_lyris_in_album(album)
            if lyrics == None:
                continue
            all_lyrics.append((album["album_name"], lyrics))
            print " * found (%d/%d) lyrics in album %s" % (success, total, album["album_name"])
        return all_lyrics
        
        
        
        

The following code reads our API key from a file called `secrets.json`. For security reasons, personal API keys should never be published or committed to a public repository.
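A slightly more defensive variant, sketched below under the assumption that you may also want to run the notebook somewhere without the file, reads the key from `secrets.json` and otherwise falls back to an environment variable. The variable name `MUSIXMATCH_API_KEY` and the function `load_api_key` are hypothetical. Either way, `secrets.json` belongs in `.gitignore`.

```python
import json
import os


def load_api_key(path="secrets.json", field="musixApiKey", env_var="MUSIXMATCH_API_KEY"):
    """Read the API key from a git-ignored JSON file, else from an environment variable."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)[field]
    key = os.environ.get(env_var)
    if key is None:
        raise RuntimeError("No API key: create %s or set %s" % (path, env_var))
    return key
```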

In [None]:
import json
with open("secrets.json", "r") as f:
    music_parser = MusixApi(json.load(f)["musixApiKey"])

#music_parser.search("Taylor Swift", "Back To December")
#print music_parser.search("Mobb Deep", "Survival of the Fittest")
#ID = music_parser.artist_id("Jay-Z")
#albums = music_parser.all_albums(ID)
#music_parser.get_all_lyrics_from_artist("Coldplay", datetime(2008,1,1), datetime(2009,1,1))

Phenomenal! We now have all the functions we need to start scraping the Musixmatch library for rap lyrics. All that remains is real input: a list of rapper names. For each rapper, we can fetch the lyrics of every album released in a given date range, so if we pass in a list like ['Ice Cube', 'Nas', ...], we get back all of those artists' lyrics from that period.

## Building a hip-hop lyrics database

After meticulous research, we have compiled a list of hip-hop artists from the 1990s who are representative of either East Coast or West Coast hip-hop. In this section, we will scrape the actual data that we will be using for this project.

In [None]:
class Lyric(object):
    
    @staticmethod
    def _clean(text):
        # drop the footer
        text = "\n".join(text.split("\n\n")[:-1])
        return text
    
    def __init__(self, text, artist, album, label):
        self.artist = artist
        self.album = album
        self.label = label
        self.text = Lyric._clean(text)
        self.tokens = tokenize(self.text)
        
    def __repr__(self):
        return "%s/%s: \"%s...\"" % (self.artist, self.album, self.text[:10])
    
    def __hash__(self):
        return hash(self.artist) + hash(self.album) + hash(self.tokens)
    
    

In [None]:
east_coast_rappers = ["Notorious B.I.G.", "Nas", "Wu-Tang Clan", "Jay-Z", "DMX", "Rakim",
                      "Method Man", "Busta Rhymes", "Run-DMC", "Public Enemy", "Mobb Deep",
                      "KRS-One", "50 Cent", "Big L", "LL Cool J", "Ghostface Killah",
                      "Ol' Dirty Bastard", "Raekwon", "A Tribe Called Quest",
                      "Big Daddy Kane","Gang Starr", "GZA", "Redman", "Mos Def", "Q-Tip"]
west_coast_rappers = ["2Pac", "Ice Cube", "Dr. Dre", "Snoop Dogg", "N.W.A",
                      "Nate Dogg", "Warren G", "MC Ren", "Eazy-E", "Ice-T", "Too $hort", "Kurupt",
                      "The Pharcyde", "E-40"]

In [None]:
'''
date_start = datetime(1990,1,1)
date_end = datetime(1999,12,31)

lyrics = []

for artist in east_coast_rappers:
    for (album_name, album_lyrics) in music_parser.get_all_lyrics_from_artist(artist, date_start, date_end):
        for lyric in album_lyrics:
            lyrics.append(Lyric(lyric,artist,album_name,"east"))
for artist in west_coast_rappers:
    for (album_name, album_lyrics) in music_parser.get_all_lyrics_from_artist(artist, date_start, date_end):
        for lyric in album_lyrics:
            lyrics.append(Lyric(lyric,artist,album_name,"west"))
import pickle
with open("lyrics.pickle", "wb") as f:
    pickle.dump(lyrics, f)
'''

In [None]:
import pickle
with open('lyrics.pickle', 'rb') as f:
    all_lyrics = pickle.load(f)

### Filling in slang words
As you might imagine, rap contains many slang words that have no entry in CMUdict. We can fill in the gaps by approximating their pronunciations with [CMU Lextool](http://www.speech.cs.cmu.edu/tools/lextool.html). The following code finds all the unknown words and writes them to a file to be used as input to the lextool. We then need to parse the dictionary file the lextool returns and fill in the unknown pronunciations with these approximations.
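The notebook does not show the parsing step, so here is a sketch. It assumes the lextool returns one entry per line in the same layout as cmudict (`WORD  PH1 PH2 ...`); the helper names `parse_lextool_dict` and `refill_pronunciation` are hypothetical, and the sample entries are made up for illustration. `refill_pronunciation` works on any object with the `word_lower`, `pron` and `syllable_loc` attributes of `Pronunciation`.

```python
def parse_lextool_dict(lines):
    """Parse cmudict-style lines ('WORD  PH1 PH2 ...') into {word: [phonemes]}."""
    prons = {}
    for line in lines:
        if line.startswith(";;;"):  # cmudict-style comment line
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        word = parts[0].lower()
        # alternates are written WORD(2), WORD(3), ...; keep only the first entry
        if word.endswith(")") and "(" in word:
            word = word[:word.index("(")]
        prons.setdefault(word, parts[1:])
    return prons


def refill_pronunciation(token, extra):
    """Fill in a missing pronunciation from `extra`, mirroring Pronunciation.__init__."""
    if token.pron is None and token.word_lower in extra:
        token.pron = extra[token.word_lower]
        token.syllable_loc = [i for i, p in enumerate(token.pron) if p[-1].isdigit()]
    return token


sample = ["HOMIE  HH OW1 M IY0", "AINT  EY1 N T", "AINT(2)  EH1 N T"]
extra = parse_lextool_dict(sample)
```

If the lextool output turns out to lack stress digits, `syllable_loc` will be empty and `rhyme_group` falls back to the whole pronunciation, so it is worth checking a few entries by hand.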

In [None]:
slang = set()

for lyric in all_lyrics:
    if len(lyric.tokens) == 0:
        continue
    for token_line in lyric.tokens:
        for p in token_line:
            if p.pron is None:
                slang.add(p.word.lower())

slang = sorted(list(slang))
with open("slang.txt", "w") as f:
    for word in slang:
        f.write(word.encode('utf8') + "\n")

### Initial Classification

An initial approach to classifying rap lyrics by region is simply to use TF-IDF features. Using scikit-learn's `TfidfVectorizer`, as we did for homework, let's see if we can come up with anything meaningful.

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer
import sklearn.svm

In [None]:
from nltk.corpus import stopwords
sw = stopwords.words("english")
sw += ["new","york","california","cali","la","nyc","ny"]

To provide training data for classification, we write a function that outputs the region label for each lyric: `0` represents west coast rap lyrics, `1` represents east coast, and so on (`-1` means no known region).
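The core of that mapping is an inverted dictionary from artist to label. The sketch below (hypothetical helper `artist_label_lookup`, same label numbers as `create_labels`) builds it once, so each lyric's label becomes a single dictionary lookup instead of a scan over every region list.

```python
REGION_LABELS = {"no region": -1, "west_coast": 0, "east_coast": 1, "south": 2, "midwest": 3}


def artist_label_lookup(**regions):
    """Invert {region_name: [artists]} into {artist: integer label}."""
    lookup = {}
    for region, artists in regions.items():
        for artist in artists:
            # if an artist is listed twice, the first region seen wins
            lookup.setdefault(artist, REGION_LABELS[region])
    return lookup


lookup = artist_label_lookup(east_coast=["Nas", "Mobb Deep"], west_coast=["2Pac", "Ice Cube"])
labels = [lookup.get(artist, REGION_LABELS["no region"])
          for artist in ["Nas", "2Pac", "Unknown Artist"]]
```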

In [None]:
def create_labels(lyrics,**regions):
    """
    Args:
        List(Lyric) : a list of lyric objects
        regions (Lists of artists in particular regions)
    Output:
        dense int array representing the labels for each example lyric
    """
    region_labels = {
        'no region'  : -1,
        'west_coast' : 0,
        'east_coast' : 1,
        'south'      : 2,
        'midwest'    : 3
    }
    result = []
    for lyric in lyrics:
        if len(lyric.tokens) == 0:
            continue
        region_name = 'no region'
        for region in regions:
            if lyric.artist in regions[region]:
                region_name = region
                break
        result.append(region_labels[region_name])
    return np.array(result)

In [None]:
import random
random.seed(0)

# split the data by artist, so that no artist appears in both the training and validation sets
artists_lyrics = {}
all_lyrics = [lyric for lyric in all_lyrics if len(lyric.tokens) > 0]
for lyric in all_lyrics:
    artists_lyrics.setdefault(lyric.artist, []).append(lyric)

# split artists (sorted first so the seeded shuffle is reproducible)
num_artists = len(artists_lyrics)
train = int(round(0.75*num_artists))
artists = sorted(artists_lyrics.keys())
random.shuffle(artists)
artists_tr = artists[:train]
artists_val = artists[train:]
print "Found %d artists, taking %d for training, %d for val" % (len(artists), len(artists_tr), len(artists_val))
lyrics_tr = reduce(lambda x,y: x+y, [artists_lyrics[artist] for artist in artists_tr])
lyrics_val = reduce(lambda x,y: x+y, [artists_lyrics[artist] for artist in artists_val])
print "Training set: %d, Validation set: %d" % (len(lyrics_tr), len(lyrics_val))

# create features and labels for training and validation set
labels_tr = create_labels(lyrics_tr, east_coast=east_coast_rappers, west_coast=west_coast_rappers)
docs_tr = [" ".join(reduce(lambda x,y: x+y, 
           [[token.word_lower for token in token_line] for token_line in lyric.tokens],
           [])) for lyric in lyrics_tr]

labels_val = create_labels(lyrics_val, east_coast=east_coast_rappers, west_coast=west_coast_rappers)
docs_val = [" ".join(reduce(lambda x,y: x+y, 
           [[token.word_lower for token in token_line] for token_line in lyric.tokens],
           [])) for lyric in lyrics_val]
docs = docs_tr + docs_val

In [None]:
# pulled from datascience homework

# create a tfidf vectorizer with all docs
tfidf = TfidfVectorizer(analyzer='word', stop_words=sw).fit(docs)

# predict with an svm
kernel = 'linear'
classifier = sklearn.svm.SVC(kernel=kernel).fit(tfidf.transform(docs_tr),labels_tr)

# transform our validation documents and classify
X_val = tfidf.transform(docs_val)
classifier_score = classifier.score(X_val, labels_val)
labels_pred = classifier.predict(X_val)
classifier_score

### Classifying Based on Rhyme
Something interesting we can find out is whether there is an underlying structural difference between the two styles of rap. Abandoning the goal of building the best classifier, let's see if we can distinguish between the two styles based solely on rhyme. To do this, recall the previous section on rhyming. We will extract all rhyming words and create a vector representing the song's rhyme scheme: each element is the rhyme-group ID of a rhyming word, in the order in which the rhyming words occur.

For example, if we take "Yet I'm the mild, money-gettin' style, rollin' foul / The versatile, honey-sticking, wild golden child", we can represent it as the array `[0,1,0,1,0,0,1,0,1,0]`, where 0 represents the "mild"-sounding words and 1 represents the "money-gettin'"-sounding words. After extracting these features, we pad them to a common length (with -1) and stack them into our feature matrix.
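A list-based sketch of this encoding follows. It takes (word, rhyme key) pairs; the keys "A" and "B" are hand-assigned to follow the grouping in the example above rather than computed from CMUdict, and `rhyme_sequence` and `pad` are hypothetical names. One deliberate difference from `lyric_to_rhyme`: group IDs are numbered in order of first appearance, whereas the notebook numbers them in dictionary iteration order.

```python
from collections import defaultdict


def rhyme_sequence(word_keys):
    """Turn (word, rhyme_key) pairs into a list of rhyme-group IDs.

    A group needs at least two distinct words to count; unknown keys (None) and
    words in singleton groups are skipped. IDs follow order of first appearance.
    """
    words_per_key = defaultdict(set)
    for word, key in word_keys:
        if key is not None:
            words_per_key[key].add(word.lower())
    ids = {}
    sequence = []
    for word, key in word_keys:
        if key is None or len(words_per_key[key]) < 2:
            continue
        if key not in ids:
            ids[key] = len(ids)
        sequence.append(ids[key])
    return sequence


def pad(sequence, length, value=-1):
    """Right-pad a sequence with `value` up to `length` (like pad_vector)."""
    if len(sequence) > length:
        raise ValueError("sequence longer than pad size")
    return list(sequence) + [value] * (length - len(sequence))


line = [("Yet", None), ("mild", "A"), ("money-gettin'", "B"), ("style", "A"),
        ("rollin'", "B"), ("foul", "A"), ("The", None), ("versatile", "A"),
        ("honey-sticking", "B"), ("wild", "A"), ("golden", "B"), ("child", "A")]
features = pad(rhyme_sequence(line), 12)
# features == [0, 1, 0, 1, 0, 0, 1, 0, 1, 0, -1, -1]
```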

In [None]:
def lyric_to_rhyme(lyric):
    groups = []
    token_lines = lyric.tokens
    # get all rhyme groups with more than one rhyming word
    for k,v in rhyme_groups_naive(token_lines).iteritems():
        if k == "UNKNOWN_GROUP" or len(v) <= 1:
            continue
        groups.append(v)
    # give each rhyme group a number and flip the dictionary
    lookup = {}
    for i,words in enumerate(groups):
        for word in list(words):
            lookup[word] = i
    #TODO: implement new lines
    return np.array([lookup[token] for token in reduce(lambda x,y: x+y, token_lines) if token in lookup])
    
def pad_vector(vector, pad, n):
    num_padding = n-len(vector)
    if num_padding < 0:
        raise Exception("Size of vector (%d) greater than pad size (%d)" % (len(vector), n))
    return np.pad(vector, [0,num_padding], "constant", constant_values=pad)

features = [lyric_to_rhyme(lyric) for lyric in all_lyrics]
dim = max([len(feature) for feature in features])

X_tr = np.array([pad_vector(lyric_to_rhyme(lyric), -1, dim) for lyric in lyrics_tr])
X_val = np.array([pad_vector(lyric_to_rhyme(lyric), -1, dim) for lyric in lyrics_val])

### CNN Feature Extraction & Classification
The input can be considered a 1D image of sorts, in that neighboring cells are more closely related through some underlying structure than cells located further apart. We can use convolution and pooling layers to extract deeper underlying features, then use fully connected layers to classify.
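The shapes and parameter counts this architecture produces can be checked by hand. The sketch below (hypothetical `conv_stack_summary`) recomputes them, assuming 'same' convolutions keep the sequence length, each size-2 max-pool halves it (rounding down), and every layer has one bias per output unit. With dim = 483, the input length reported in the summary in the Results section, it reproduces that summary's total of 3,027,041 parameters.

```python
def conv_stack_summary(dim, conv_layers, pool=2, dense_units=(1024, 1024),
                       outputs=1, in_channels=1):
    """Recompute (layer, length, channels/units, params) rows for the CNN above.

    conv_layers: list of (nb_filter, filter_length) for each Convolution1D.
    """
    rows = []
    length, channels = dim, in_channels
    for nb_filter, filter_length in conv_layers:
        params = filter_length * channels * nb_filter + nb_filter  # weights + biases
        rows.append(("conv", length, nb_filter, params))  # 'same' padding keeps length
        channels = nb_filter
        length //= pool  # non-overlapping max-pool, floor division
        rows.append(("pool", length, channels, 0))
    width = length * channels  # size after Flatten
    for units in list(dense_units) + [outputs]:
        rows.append(("dense", None, units, width * units + units))
        width = units
    total = sum(row[3] for row in rows)
    return rows, total


rows, total = conv_stack_summary(483, [(32, 32), (32, 5), (32, 3)])
```

Almost all of the roughly 3 million parameters sit in the first dense layer (1920 × 1024 weights plus biases), which is worth keeping in mind when reading the training log.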

In [None]:
from keras.models import Sequential
from keras.layers import Dense, Activation, Convolution1D, MaxPooling1D, Flatten, Dropout

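# based loosely off of https://github.com/fchollet/keras/blob/master/examples/cifar10_cnn.py
# written against the Keras 1.x API (border_mode, pool_length and nb_epoch were renamed in Keras 2)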
cnn = Sequential()
cnn.add(Convolution1D(32, 32, border_mode='same', input_shape=(dim, 1)))
cnn.add(MaxPooling1D(pool_length=2, stride=None, border_mode='valid'))
cnn.add(Convolution1D(32, 5, border_mode='same'))
cnn.add(MaxPooling1D(pool_length=2, stride=None, border_mode='valid'))
cnn.add(Convolution1D(32, 3, border_mode='same'))
cnn.add(MaxPooling1D(pool_length=2, stride=None, border_mode='valid'))
cnn.add(Flatten())
cnn.add(Dense(1024, activation="relu"))
cnn.add(Dropout(0.5))
cnn.add(Dense(1024, activation="relu"))
cnn.add(Dropout(0.5))
cnn.add(Dense(1, activation="sigmoid"))

cnn.summary()

# https://keras.io/getting-started/sequential-model-guide/
cnn.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])
cnn.fit(X_tr[:,:,None], labels_tr, nb_epoch=10, batch_size=32, validation_data=(X_val[:,:,None], labels_val))

### Results
Here is the output from one of several runs of the code above:
```
____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
convolution1d_22 (Convolution1D) (None, 483, 32)       1056        convolution1d_input_6[0][0]      
____________________________________________________________________________________________________
maxpooling1d_22 (MaxPooling1D)   (None, 241, 32)       0           convolution1d_22[0][0]           
____________________________________________________________________________________________________
convolution1d_23 (Convolution1D) (None, 241, 32)       5152        maxpooling1d_22[0][0]            
____________________________________________________________________________________________________
maxpooling1d_23 (MaxPooling1D)   (None, 120, 32)       0           convolution1d_23[0][0]           
____________________________________________________________________________________________________
convolution1d_24 (Convolution1D) (None, 120, 32)       3104        maxpooling1d_23[0][0]            
____________________________________________________________________________________________________
maxpooling1d_24 (MaxPooling1D)   (None, 60, 32)        0           convolution1d_24[0][0]           
____________________________________________________________________________________________________
flatten_6 (Flatten)              (None, 1920)          0           maxpooling1d_24[0][0]            
____________________________________________________________________________________________________
dense_16 (Dense)                 (None, 1024)          1967104     flatten_6[0][0]                  
____________________________________________________________________________________________________
dropout_11 (Dropout)             (None, 1024)          0           dense_16[0][0]                   
____________________________________________________________________________________________________
dense_17 (Dense)                 (None, 1024)          1049600     dropout_11[0][0]                 
____________________________________________________________________________________________________
dropout_12 (Dropout)             (None, 1024)          0           dense_17[0][0]                   
____________________________________________________________________________________________________
dense_18 (Dense)                 (None, 1)             1025        dropout_12[0][0]                 
====================================================================================================
Total params: 3027041
____________________________________________________________________________________________________
Train on 2310 samples, validate on 1031 samples
Epoch 1/10
2310/2310 [==============================] - 9s - loss: 0.7002 - acc: 0.6506 - val_loss: 0.6520 - val_acc: 0.6566
Epoch 2/10
2310/2310 [==============================] - 9s - loss: 0.6265 - acc: 0.6701 - val_loss: 0.6904 - val_acc: 0.6256
Epoch 3/10
2310/2310 [==============================] - 9s - loss: 0.5202 - acc: 0.7420 - val_loss: 0.8487 - val_acc: 0.6576
Epoch 4/10
2310/2310 [==============================] - 9s - loss: 0.3756 - acc: 0.8247 - val_loss: 1.3008 - val_acc: 0.6547
Epoch 5/10
2310/2310 [==============================] - 9s - loss: 0.2713 - acc: 0.8857 - val_loss: 1.4730 - val_acc: 0.5606
Epoch 6/10
2310/2310 [==============================] - 9s - loss: 0.2390 - acc: 0.8996 - val_loss: 1.5849 - val_acc: 0.6285
Epoch 7/10
2310/2310 [==============================] - 9s - loss: 0.1804 - acc: 0.9195 - val_loss: 1.7476 - val_acc: 0.6178
Epoch 8/10
2310/2310 [==============================] - 9s - loss: 0.1548 - acc: 0.9338 - val_loss: 2.1233 - val_acc: 0.6227
Epoch 9/10
2310/2310 [==============================] - 9s - loss: 0.1501 - acc: 0.9381 - val_loss: 2.3608 - val_acc: 0.5849
Epoch 10/10
2310/2310 [==============================] - 9s - loss: 0.1414 - acc: 0.9411 - val_loss: 2.7952 - val_acc: 0.5810
```

### Conclusion
Looks like classifying raps based on rhyme scheme is harder than expected! We are only able to achieve around 60% validation accuracy (65.8% at best, in epoch 3, and 58.1% after the last epoch) with a fairly shallow neural network. The log also shows overfitting: training accuracy climbs to 94% while validation loss rises every epoch after the first. Our intuition is that we need a much more complicated neural network to capture the nuances of rhyme schemes. We also need a much larger database of songs to work with, which is hard to achieve because "east coast" vs. "west coast" rap is highly subjective, and many newer artists blend multiple regional styles and cannot be used in a classification such as this.

Additionally, the rhyming function is fairly rudimentary and needs much more work. Right now it only groups strongly rhyming words together, which doesn't work well in this scenario because rappers often use slant rhymes and bend the pronunciation of words to force a rhyme. We would need a much more in-depth rhyming algorithm to extract the real structure, which would require much more computing power.
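One way to put the ~60% figure in context is to compare it with a majority-class baseline: because the artist lists are unbalanced (25 east-coast vs. 14 west-coast artists), always guessing the more common region may already score above 50%, depending on which artists landed in the validation set. A minimal sketch follows (hypothetical `majority_baseline`); in the notebook it would be called as `majority_baseline(labels_val)`.

```python
from collections import Counter


def majority_baseline(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    if len(labels) == 0:  # len() also works for NumPy label arrays
        raise ValueError("labels must be non-empty")
    _, count = Counter(labels).most_common(1)[0]
    return float(count) / len(labels)
```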