# Phish Setlist Modeling

The purpose of this notebook is to explore applying neural language modeling techniques to the Phish setlist/song space, with the goal of predicting which song will be played next given a historical sequence of songs.

In [176]:
%load_ext autoreload
%autoreload 2

import os
import sys
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from keras.utils import to_categorical

module_path = os.path.abspath(os.path.join('../'))
if module_path not in sys.path:
    sys.path.append(module_path)
    
import pyphish

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


## Load in Raw Setlist Data

In [32]:
# load data
all_setlists = pyphish.util.load_pickle_object(r'../../data-extracts/extract-05032019/all_setlists.pickle')
all_shows = pyphish.util.load_pickle_object(r'../../data-extracts/extract-05032019/all_shows.pickle')

#### Verify only Phish setlists

In [33]:
all_setlists[all_setlists.artistid == 1].shape

(1726, 16)

In [34]:
all_setlists.shape

(1726, 16)

In [35]:
# reset index
all_setlists.reset_index(drop=True, inplace=True)

In [37]:
all_setlists.head()

Unnamed: 0,artist,artistid,gapchart,location,long_date,rating,relative_date,setlistdata,setlistnotes,short_date,showdate,showid,url,venue,venueid,setlistdata_clean
0,<a href='http://phish.net/setlists/phish'>Phis...,1,http://phish.net/setlists/gap-chart/phish-dece...,"Burlington, VT, USA",Friday 12/02/1983,3.9322,35 years ago,<p><span class='set-label'>Set 1</span>: <a hr...,"Trey, Mike, Fish, and Jeff Holdsworth recall b...",12/02/1983,1983-12-02,1251253100,http://phish.net/setlists/phish-december-02-19...,"<a href=""http://phish.net/venue/7/Harris-Milli...",7,"Set 1, Long Cool Woman in a Black Dress, Proud..."
1,<a href='http://phish.net/setlists/phish'>Phis...,1,http://phish.net/setlists/gap-chart/phish-octo...,"Burlington, VT, USA",Tuesday 10/23/1984,4.1667,35 years ago,<p><span class='set-label'>Set 1</span>: <a ti...,"This show, played in the garage of a house on ...",10/23/1984,1984-10-23,1250613219,http://phish.net/setlists/phish-october-23-198...,"<a href=""http://phish.net/venue/46/69_Grant_St...",46,"Set 1, Makisupa Policeman"
2,<a href='http://phish.net/setlists/phish'>Phis...,1,http://phish.net/setlists/gap-chart/phish-nove...,"Burlington, VT, USA",Saturday 11/03/1984,3.1143,34 years ago,<p><span class='set-label'>Set 1</span>: <a hr...,"The setlist for this show might be incomplete,...",11/03/1984,1984-11-03,1251262142,http://phish.net/setlists/phish-november-03-19...,"<a href=""http://phish.net/venue/246/Slade_Hall...",246,"Set 1, In the Midnight Hour, Wild Child, Jam, ..."
3,<a href='http://phish.net/setlists/phish'>Phis...,1,http://phish.net/setlists/gap-chart/phish-dece...,"Burlington, VT, USA",Saturday 12/01/1984,3.7021,34 years ago,<p><span class='set-label'>Set 1</span>: <a hr...,Skippy and Fluffhead featured The Dude of Life...,12/01/1984,1984-12-01,1251262498,http://phish.net/setlists/phish-december-01-19...,"<a href=""http://phish.net/venue/2/Nectar%27s"">...",2,"Set 1, Jam, Wild Child, Bertha, Can't You Hear..."
4,<a href='http://phish.net/setlists/phish'>Phis...,1,http://phish.net/setlists/gap-chart/phish-febr...,"Burlington, VT, USA",Friday 02/01/1985,4.3333,34 years ago,<p><span class='set-label'>Set 1</span>: <a hr...,It is unconfirmed if this setlist is correct f...,02/01/1985,1985-02-01,1251587227,http://phish.net/setlists/phish-february-01-19...,"<a href=""http://phish.net/venue/344/Doolin%27s...",344,"Set 1, Slave to the Traffic Light, Mike's Song..."


#### Inspect and clean problematic setlists

In [29]:
# inspect shows
for i, row in all_setlists.iterrows():
    print(row.setlistdata_clean)
    print()

Set 1, Long Cool Woman in a Black Dress, Proud Mary, In the Midnight Hour, Squeeze Box, Roadhouse Blues, Happy Birthday to You, Set 2, Scarlet Begonias, Fire on the Mountain

Set 1, Makisupa Policeman

Set 1, In the Midnight Hour, Wild Child, Jam, Bertha, Can't You Hear Me Knocking, St. Stephen Jam, Can't You Hear Me Knocking, Camel Walk, Eyes of the World, Whipping Post, Drums

Set 1, Jam, Wild Child, Bertha, Can't You Hear Me Knocking, Camel Walk, Jam, In the Midnight Hour, Scarlet Begonias, Fire, Fire on the Mountain, Makisupa Policeman, Slave to the Traffic Light, Spanish Flea, Don't Want You No More, Cities, Drums, Skippy the Wondermouse, Fluffhead, Encore, Eyes of the World

Set 1, Slave to the Traffic Light, Mike's Song, Dave's Energy Guide, You Enjoy Myself, Alumni Blues, Letter to Jimmy Page, Alumni Blues, Prep School Hippie, Run Like an Antelope

Set 1, Anarchy, Camel Walk, Fire Up the Ganja, Skippy the Wondermouse, In the Midnight Hour

Set 1, Sneakin' Sally Through the Alle

Set 1, NICU, Sample in a Jar, My Mind's Got a Mind of its Own, The Moma Dance, Down with Disease, Dog Faced Boy, Piper, Waste, Chalk Dust Torture, Set 2, Tweezer, Also Sprach Zarathustra, Loving Cup, My Soul, Sweet Adeline, Encore, Harry Hood

Set 1, Birds of a Feather, Cars Trucks Buses, Theme From the Bottom, Brian and Robert, Meat, Fikus, Shafty, Fluffhead, Ginseng Sullivan, Punch You in the Eye, Character Zero, Set 2, Ghost, Runaway Jim, Prince Caspian, You Enjoy Myself, Encore, Simple

Set 1, Stash, Beauty of My Dreams, Sample in a Jar, Guyute, Also Sprach Zarathustra, Down with Disease, Limb By Limb, Water in the Sky, My Soul, You Enjoy Myself, A Day in the Life

Set 1, Birds of a Feather, Taste, Cavern, Reba, Fee, Water in the Sky, Lawn Boy, Chalk Dust Torture, Set 2, Bathtub Gin, The Moma Dance, McGrupp and the Watchful Hosemasters, Jam, Axilla, Harry Hood, Rocky Top, Encore, Funky Bitch

Set 1, Buried Alive, AC/DC Bag, Ghost, Cities, Limb By Limb, Train Song, Roggae, Maze, Gol


Set 1, Buried Alive, Ghost, Crazy Sometimes, Free, More, Halley's Comet, Ocelot, Theme From the Bottom, First Tube, Set 2, Turtle in the Clouds, Stray Dog, Everything is Hollow, We Are Come to Outlive Our Brains, Say it to Me S.A.N.T.O.S., The Final Hurrah, Play by Play, Death Don't Hurt Very Long, Cool Amber and Mercury, Passing Through, Set 3, Set Your Soul Free, Tweezer, A Song I Heard the Ocean Sing, Backwards Down the Number Line, Meatstick, Bug, Run Like an Antelope, Encore, Loving Cup, Tweezer Reprise

Set 1, Everything's Right, AC/DC Bag, Wolfman's Brother, Nellie Kane, Funky Bitch, Chalk Dust Torture, I Been Around, Joy, Walls of the Cave, Set 2, Blaze On, No Men In No Man's Land, Fuego, Twist, Prince Caspian, Twist, Bouncing Around the Room, Harry Hood, Encore, Contact, Rise/Come Together

Set 1, Cavern, Beauty of My Dreams, If I Could, Weigh, Sand, Back on the Train, Martian Monster, Mercury, Suzy Greenberg, Set 2, Soul Planet, Down with Disease, Guyute, Sneakin' Sally Thro

In [60]:
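# create a new dataframe that has ONLY complete setlists (i.e. has Set 1, Set 2, and Encore)
# NOTE: the counts and setlists shown below come from a run that tested only 'Encore'
# (`'Set 1' and 'Set 2' and 'Encore' in setlist` evaluates that way), so they include
# shows without a 'Set 2' marker

required_markers = ('Set 1', 'Set 2', 'Encore')
complete_rows = []

for i, row in all_setlists.iterrows():

    # get setlist as list
    setlist = row.setlistdata_clean.split(', ')

    # keep the show only if every marker is present
    if all(marker in setlist for marker in required_markers):
        complete_rows.append(row)

# build the dataframe once (DataFrame.append was removed in pandas 2.0)
complete_setlists = pd.DataFrame(complete_rows)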

In [63]:
print(f'{complete_setlists.shape[0]} of the {all_setlists.shape[0]} have a Set 1, Set 2, and Encore section')

1430 of the 1726 have a Set 1, Set 2, and Encore section
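
A note on the membership test behind this filter. In Python, an expression of the form `'Set 1' and 'Set 2' and 'Encore' in setlist` is parsed as `'Set 1' and 'Set 2' and ('Encore' in setlist)`, and non-empty strings are always truthy, so only the last marker is actually tested. The sketch below uses a made-up toy setlist (not taken from the data) to contrast that chained form with an explicit check of every marker:

```python
# Toy setlist with an encore but no 'Set 2' marker (illustrative only).
setlist = ['Set 1', 'Jam', 'Wild Child', 'Encore', 'Eyes of the World']

# Chained form: 'Set 1' and 'Set 2' are truthy strings, so the result
# depends only on whether 'Encore' is in the list.
chained = bool('Set 1' and 'Set 2' and 'Encore' in setlist)

# Explicit form: every marker has to be present.
required = ('Set 1', 'Set 2', 'Encore')
explicit = all(marker in setlist for marker in required)

print(chained, explicit)  # True False
```

The chained form accepts this setlist even though it has no `Set 2`, while the explicit form rejects it.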


In [71]:
# reset index
complete_setlists.reset_index(drop=True, inplace=True)

In [76]:
complete_setlists.head()

Unnamed: 0,artist,artistid,gapchart,location,long_date,rating,relative_date,setlistdata,setlistdata_clean,setlistnotes,short_date,showdate,showid,url,venue,venueid
0,<a href='http://phish.net/setlists/phish'>Phis...,1.0,http://phish.net/setlists/gap-chart/phish-dece...,"Burlington, VT, USA",Saturday 12/01/1984,3.7021,34 years ago,<p><span class='set-label'>Set 1</span>: <a hr...,"Set 1, Jam, Wild Child, Bertha, Can't You Hear...",Skippy and Fluffhead featured The Dude of Life...,12/01/1984,1984-12-01,1251262000.0,http://phish.net/setlists/phish-december-01-19...,"<a href=""http://phish.net/venue/2/Nectar%27s"">...",2.0
1,<a href='http://phish.net/setlists/phish'>Phis...,1.0,http://phish.net/setlists/gap-chart/phish-may-...,"Burlington, VT, USA",Friday 05/03/1985,3.6,34 years ago,<p><span class='set-label'>Set 1</span>: <a hr...,"Set 1, Slave to the Traffic Light, Mike's Song...",This show was performed at the Last Day Party ...,05/03/1985,1985-05-03,1251596000.0,http://phish.net/setlists/phish-may-03-1985-un...,"<a href=""http://phish.net/venue/348/University...",348.0
2,<a href='http://phish.net/setlists/phish'>Phis...,1.0,http://phish.net/setlists/gap-chart/phish-apri...,"Burlington, VT, USA",Tuesday 04/01/1986,3.6429,33 years ago,<p><span class='set-label'>Set 1</span>: <a hr...,"Set 1, Quinn the Eskimo, Have Mercy, Harry Hoo...",This show was billed as Hunt&rsquo;s Festival ...,04/01/1986,1986-04-01,1252184000.0,http://phish.net/setlists/phish-april-01-1986-...,"<a href=""http://phish.net/venue/10/Hunt%27s"">H...",10.0
3,<a href='http://phish.net/setlists/phish'>Phis...,1.0,http://phish.net/setlists/gap-chart/phish-octo...,"Burlington, VT, USA",Wednesday 10/15/1986,4.3548,33 years ago,<p><span class='set-label'>Set 1</span>: <a hr...,"Set 1, Alumni Blues, Makisupa Policeman, Skin ...","Before YEM, Page played cocktail-style jazz, i...",10/15/1986,1986-10-15,1252186000.0,http://phish.net/setlists/phish-october-15-198...,"<a href=""http://phish.net/venue/10/Hunt%27s"">H...",10.0
4,<a href='http://phish.net/setlists/phish'>Phis...,1.0,http://phish.net/setlists/gap-chart/phish-marc...,"Plainfield, VT, USA",Friday 03/06/1987,3.4706,32 years ago,<p><span class='set-label'>Set 1</span>: <a hr...,"Set 1, Funky Bitch, Good Times Bad Times, Cori...",Free Bird was an actual attempt at the song an...,03/06/1987,1987-03-06,1252190000.0,http://phish.net/setlists/phish-march-06-1987-...,"<a href=""http://phish.net/venue/347/Goddard_Co...",347.0


In [66]:
# inspect shows
for i, row in complete_setlists.iterrows():
    print(row.setlistdata_clean)
    print()

Set 1, Jam, Wild Child, Bertha, Can't You Hear Me Knocking, Camel Walk, Jam, In the Midnight Hour, Scarlet Begonias, Fire, Fire on the Mountain, Makisupa Policeman, Slave to the Traffic Light, Spanish Flea, Don't Want You No More, Cities, Drums, Skippy the Wondermouse, Fluffhead, Encore, Eyes of the World

Set 1, Slave to the Traffic Light, Mike's Song, Dave's Energy Guide, Big Leg Emma, Set 2, Alumni Blues, Wild Child, Can't You Hear Me Knocking, Jam, Cities, Bring It On Home, Set 3, Scarlet Begonias, Eyes of the World, Whipping Post, McGrupp and the Watchful Hosemasters, Makisupa Policeman, Run Like an Antelope, The Other One, Encore, Anarchy

Set 1, Quinn the Eskimo, Have Mercy, Harry Hood, The Pendulum, Dave's Energy Guide, Icculus, You Enjoy Myself, Set 2, Help on the Way, Slipknot!, AC/DC Bag, McGrupp and the Watchful Hosemasters, Alumni Blues, Letter to Jimmy Page, Alumni Blues, Dear Mrs. Reagan, Encore, Not Fade Away

Set 1, Alumni Blues, Makisupa Policeman, Skin It Back, Citie

Set 1, David Bowie, Sample in a Jar, Tweezer, Bouncing Around the Room, AC/DC Bag, Frankie Says, Llama, Hold Your Head Up, Love You, Hold Your Head Up, Tweezer Reprise, Set 2, Gotta Jibboo, Suzy Greenberg, Jam, Theme From the Bottom, Water in the Sky, Friday, Harry Hood, Encore, Sleeping Monkey, Loving Cup

Set 1, Piper, Foam, Anything But Me, Limb By Limb, Wolfman's Brother, Poor Heart, Cavern, Set 2, Rock and Roll, Twist, Boogie On Reggae Woman, Ghost, Free, Divided Sky, Good Times Bad Times, Encore, Waste, Encore 2, The Squirming Coil

Set 1, Wilson, Sand, Shafty, NICU, Weigh, Cities, Strange Design, Scent of a Mule, Bathtub Gin, Also Sprach Zarathustra, Set 2, Tube, L.A. Woman, Birds of a Feather, L.A. Woman, Makisupa Policeman, P-Funk Medley, Makisupa Policeman, Down with Disease, Encore, Contact, While My Guitar Gently Weeps

Set 1, Wilson, Mike's Song, I Am Hydrogen, Weekapaug Groove, The Moma Dance, Guyute, You Enjoy Myself, First Tube, Tube, Set 2, Stash, Seven Below, Lawn Boy

#### Build full "corpus" 

In [96]:
# join every cleaned setlist into one long string, separated by ', '
setlist_string = ', '.join(complete_setlists.setlistdata_clean)

In [97]:
setlist_string

"Set 1, Jam, Wild Child, Bertha, Can't You Hear Me Knocking, Camel Walk, Jam, In the Midnight Hour, Scarlet Begonias, Fire, Fire on the Mountain, Makisupa Policeman, Slave to the Traffic Light, Spanish Flea, Don't Want You No More, Cities, Drums, Skippy the Wondermouse, Fluffhead, Encore, Eyes of the World, Set 1, Slave to the Traffic Light, Mike's Song, Dave's Energy Guide, Big Leg Emma, Set 2, Alumni Blues, Wild Child, Can't You Hear Me Knocking, Jam, Cities, Bring It On Home, Set 3, Scarlet Begonias, Eyes of the World, Whipping Post, McGrupp and the Watchful Hosemasters, Makisupa Policeman, Run Like an Antelope, The Other One, Encore, Anarchy, Set 1, Quinn the Eskimo, Have Mercy, Harry Hood, The Pendulum, Dave's Energy Guide, Icculus, You Enjoy Myself, Set 2, Help on the Way, Slipknot!, AC/DC Bag, McGrupp and the Watchful Hosemasters, Alumni Blues, Letter to Jimmy Page, Alumni Blues, Dear Mrs. Reagan, Encore, Not Fade Away, Set 1, Alumni Blues, Makisupa Policeman, Skin It Back, Citi

## Transform Data for Modeling

#### Create an encoding for songs to integers

In [113]:
setlist_string_list = setlist_string.split(', ')
print(f'Phish has {len(setlist_string_list)} songs/set identifiers in this corpus.')

setlist_string_list

Phish has 35432 songs/set identifiers in this corpus.


['Set 1',
 'Jam',
 'Wild Child',
 'Bertha',
 "Can't You Hear Me Knocking",
 'Camel Walk',
 'Jam',
 'In the Midnight Hour',
 'Scarlet Begonias',
 'Fire',
 'Fire on the Mountain',
 'Makisupa Policeman',
 'Slave to the Traffic Light',
 'Spanish Flea',
 "Don't Want You No More",
 'Cities',
 'Drums',
 'Skippy the Wondermouse',
 'Fluffhead',
 'Encore',
 'Eyes of the World',
 'Set 1',
 'Slave to the Traffic Light',
 "Mike's Song",
 "Dave's Energy Guide",
 'Big Leg Emma',
 'Set 2',
 'Alumni Blues',
 'Wild Child',
 "Can't You Hear Me Knocking",
 'Jam',
 'Cities',
 'Bring It On Home',
 'Set 3',
 'Scarlet Begonias',
 'Eyes of the World',
 'Whipping Post',
 'McGrupp and the Watchful Hosemasters',
 'Makisupa Policeman',
 'Run Like an Antelope',
 'The Other One',
 'Encore',
 'Anarchy',
 'Set 1',
 'Quinn the Eskimo',
 'Have Mercy',
 'Harry Hood',
 'The Pendulum',
 "Dave's Energy Guide",
 'Icculus',
 'You Enjoy Myself',
 'Set 2',
 'Help on the Way',
 'Slipknot!',
 'AC/DC Bag',
 'McGrupp and the Watchf

In [109]:
# get list of all unique songs sorted alphabetically
unique_songs = sorted(set(setlist_string_list))

print(f'Phish has {len(unique_songs)} unique songs.')

Phish has 869 unique songs.


In [116]:
# create a mapping for the encoded songs
mapping = {song:index for index, song in enumerate(unique_songs)}
mapping

{'1999': 0,
 '46 Days': 1,
 '50 Ways to Leave Your Lover': 2,
 '555': 3,
 '5:15': 4,
 '99 Problems': 5,
 'A Apolitical Blues': 6,
 'A Day in the Life': 7,
 'A Song I Heard the Ocean Sing': 8,
 'AC/DC Bag': 9,
 'Access Me': 10,
 'Acoustic Army': 11,
 'After Midnight': 12,
 "Ain't Love Funny": 13,
 'Alaska': 14,
 'Albuquerque': 15,
 'All Along the Watchtower': 16,
 'All Blues': 17,
 'All Down the Line': 18,
 'All That You Dream': 19,
 'All Things Reconsidered': 20,
 'All of These Dreams': 21,
 'All the Pain Through the Years': 22,
 'Also Sprach Zarathustra': 23,
 'Alumni Blues': 24,
 'Alumni Blues Jam': 25,
 'Amazing Grace': 26,
 'Amazing Grace Jam': 27,
 'Ambient Jam': 28,
 'American Woman': 29,
 'Amidst the Peals of Laughter': 30,
 'Amoreena': 31,
 'Anarchy': 32,
 'Any Colour You Like': 33,
 'Anything But Me': 34,
 'Architect': 35,
 'Army of One': 36,
 'Art Jam': 37,
 'Ass Handed': 38,
 'Auld Lang Syne': 39,
 'Avenu Malkenu': 40,
 'Axilla': 41,
 'Axilla (Part II)': 42,
 'Baby Elephant 

#### Apply the mapping to the full "corpus"

In [123]:
encoded_setlist_string_list = [mapping[song] for song in setlist_string_list]
len(encoded_setlist_string_list)

35432
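
Once the model predicts an integer, it has to be translated back into a song title. A minimal sketch of that round trip follows, using a toy vocabulary in place of `unique_songs`. The names `toy_songs`, `toy_mapping`, and `toy_inverse` are hypothetical and only for illustration.

```python
# Toy vocabulary standing in for `unique_songs` (the real one has 869 entries).
toy_songs = sorted({'Set 1', 'Jam', 'Wild Child', 'Encore', 'Fluffhead'})

# Forward mapping, built the same way as `mapping`: song -> integer.
toy_mapping = {song: index for index, song in enumerate(toy_songs)}

# Inverse mapping for decoding predictions: integer -> song.
toy_inverse = {index: song for song, index in toy_mapping.items()}

# Encode a short sequence, then decode it again.
encoded = [toy_mapping[song] for song in ['Set 1', 'Jam', 'Encore']]
decoded = [toy_inverse[index] for index in encoded]

print(encoded)
print(decoded)
```

Because the vocabulary is sorted before enumeration, the encoding is deterministic for a given corpus, but it will shift whenever a new song is added to the data.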

#### Create sequences to feed into the model

Now that we have a long list of encoded songs, we need to break it into fixed-length sequences to feed to our model.

**NOTE** - the sequence length is ultimately a hyperparameter that we need to decide on: how many songs are sufficient for predicting what will come next?

I am going to arbitrarily choose 100 to start. The rationale is that each show is ~20 songs (the corpus averages about 25 tokens per setlist once set markers are counted: 35,432 tokens across 1,430 setlists), so 100 tokens gives the model roughly 4 to 5 shows' worth of sequence to learn and guess from.

In [124]:
# need to create a list of integer(encoding) lists

encoded_setlist_string_list

[598,
 362,
 838,
 57,
 110,
 108,
 362,
 350,
 589,
 226,
 227,
 420,
 624,
 641,
 186,
 129,
 197,
 623,
 234,
 205,
 215,
 598,
 624,
 447,
 160,
 64,
 599,
 24,
 838,
 110,
 362,
 129,
 96,
 600,
 589,
 215,
 827,
 429,
 420,
 572,
 735,
 205,
 32,
 598,
 538,
 289,
 288,
 737,
 160,
 340,
 861,
 599,
 297,
 628,
 9,
 429,
 24,
 392,
 24,
 163,
 205,
 490,
 598,
 24,
 420,
 622,
 129,
 316,
 429,
 9,
 861,
 417,
 599,
 515,
 268,
 682,
 677,
 108,
 609,
 468,
 234,
 631,
 843,
 624,
 538,
 447,
 289,
 288,
 600,
 567,
 582,
 32,
 205,
 130,
 598,
 252,
 270,
 146,
 268,
 538,
 631,
 599,
 244,
 286,
 288,
 692,
 529,
 246,
 843,
 205,
 624,
 598,
 612,
 24,
 392,
 24,
 268,
 682,
 677,
 226,
 622,
 129,
 417,
 599,
 180,
 436,
 160,
 686,
 282,
 538,
 9,
 600,
 515,
 234,
 270,
 32,
 420,
 572,
 87,
 766,
 624,
 644,
 429,
 699,
 138,
 316,
 831,
 205,
 250,
 598,
 861,
 417,
 529,
 624,
 631,
 130,
 515,
 729,
 40,
 729,
 420,
 855,
 599,
 175,
 252,
 287,
 101,
 287,
 234,
 270,


In [139]:
length = 100

sequences = []

for i in range(length, len(encoded_setlist_string_list)):
    # select `length` input songs plus the 1 song that follows them (length + 1 ints)
    seq = encoded_setlist_string_list[i-length: i+1]
    # append to list
    sequences.append(seq)

print(f'We now have {len(sequences)} sequences.')

We now have 35332 sequences.
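
The sequence count follows directly from the windowing: a corpus of N tokens yields N - length windows of length + 1 tokens each (35432 - 100 = 35332 here). The sketch below checks this on toy data and shows a vectorized alternative using NumPy's `sliding_window_view`, which assumes NumPy 1.20 or newer. The variable names are illustrative, not from the notebook.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

tokens = list(range(10))  # toy stand-in for encoded_setlist_string_list
length = 3                # toy stand-in for the sequence length of 100

# Loop form, same slicing as the cell above: `length` inputs + 1 target.
loop_sequences = [tokens[i - length: i + 1] for i in range(length, len(tokens))]

# Vectorized form: every window of length + 1 consecutive tokens.
# The result is a read-only view, so copy it before modifying.
windows = sliding_window_view(np.array(tokens), length + 1)

print(len(loop_sequences), windows.shape)
```

Both forms produce the same windows. Note that windows built this way run across show boundaries, since the corpus is one continuous list.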


#### Split sequences into X, y pairs then train/test split

For each sequence, the first 100 items are X and the last 1 item is y.

In [148]:
sequences_array = np.array(sequences)

In [149]:
X_data, y_data = sequences_array[:,:-1], sequences_array[:,-1]

In [156]:
# input is 35332 lists of length 100
X_data.shape

(35332, 100)

In [157]:
# output is 35332 single encodings
y_data.shape

(35332,)

In [160]:
# split into test and train
# NOTE - unable to stratify because some songs occur only once and stratify needs at least 2 samples per class ... think through this...
X_train, X_test, y_train, y_test = train_test_split(X_data, y_data, test_size=0.1, random_state=2)

In [163]:
print(f'X_train is: {X_train.shape}')
print(f'X_test is: {X_test.shape}')
print(f'y_train is: {y_train.shape}')
print(f'y_test is: {y_test.shape}')

X_train is: (31798, 100)
X_test is: (3534, 100)
y_train is: (31798,)
y_test is: (3534,)
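
One possible way to think through the stratification problem is to find the target songs that occur only once, keep those rows in the training set, and stratify only the remaining rows. The sketch below shows just the first step on toy targets. It is an assumption about how this could be handled, not what the notebook does, and the variable names are hypothetical.

```python
import numpy as np

# Toy targets standing in for y_data: classes 7 and 2 occur only once.
y_toy = np.array([5, 5, 7, 9, 9, 9, 2])

# Count how often each class appears as a target.
classes, counts = np.unique(y_toy, return_counts=True)

# Classes that appear once cannot be split across train and test.
singleton_classes = classes[counts == 1]

# Boolean mask over the rows whose target is a singleton class.
is_singleton = np.isin(y_toy, singleton_classes)

print(singleton_classes, is_singleton)
```

Rows where `is_singleton` is `False` could then be passed to `train_test_split` with its `stratify` argument, with the singleton rows appended to the training set afterwards.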


In [165]:
# pickle data

pyphish.util.create_pickle_object(obj=X_train, pickle_name='X_train.pickle', file_path='../../modeling-data/mvp-modeling/')
pyphish.util.create_pickle_object(obj=X_test, pickle_name='X_test.pickle', file_path='../../modeling-data/mvp-modeling/')
pyphish.util.create_pickle_object(obj=y_train, pickle_name='y_train.pickle', file_path='../../modeling-data/mvp-modeling/')
pyphish.util.create_pickle_object(obj=y_test, pickle_name='y_test.pickle', file_path='../../modeling-data/mvp-modeling/')

Successfully pickled [[675 599 420 ...   9 861 120]
 [598 647 276 ... 861 582 241]
 [861 565 600 ... 120 746 477]
 ...
 [ 36 282 790 ... 123 598 120]
 [477 787 205 ... 779 237 447]
 [362 123 205 ... 123 205 529]] to C:\Users\anreed\Documents\My Stuff\TrAI\modeling-data\mvp-modeling\X_train.pickle
Successfully pickled [[599 102 573 ... 205 733 674]
 [698 746 477 ... 175 172 722]
 [792 578 727 ... 295 746 205]
 ...
 [599 190 727 ... 578 734 534]
 [550 657 342 ... 603 303 573]
 [146 600 380 ... 235 599  24]] to C:\Users\anreed\Documents\My Stuff\TrAI\modeling-data\mvp-modeling\X_test.pickle
Successfully pickled [643 205 657 ... 108 316 598] to C:\Users\anreed\Documents\My Stuff\TrAI\modeling-data\mvp-modeling\y_train.pickle
Successfully pickled [598 855 415 ... 811 205 392] to C:\Users\anreed\Documents\My Stuff\TrAI\modeling-data\mvp-modeling\y_test.pickle


#### One-hot encode each song in our datasets

Use keras to convert each encoded song into a one-hot vector of length 869 (the total number of unique tokens in our corpus, set markers included).

In [178]:
# load data from pickles
X_train = pyphish.util.load_pickle_object(file_path='../../modeling-data/mvp-modeling/X_train.pickle')
X_test = pyphish.util.load_pickle_object(file_path='../../modeling-data/mvp-modeling/X_test.pickle')
y_train = pyphish.util.load_pickle_object(file_path='../../modeling-data/mvp-modeling/y_train.pickle')
y_test = pyphish.util.load_pickle_object(file_path='../../modeling-data/mvp-modeling/y_test.pickle')


In [169]:
num_classes = len(unique_songs)
num_classes

869

In [181]:
X_train_hot = np.array([to_categorical(x, num_classes=num_classes) for x in X_train])
# X_test_hot = np.array([to_categorical(x, num_classes=num_classes) for x in X_test])
# y_train_hot = to_categorical(y_train, num_classes=num_classes)
# y_test_hot = to_categorical(y_test, num_classes=num_classes)

MemoryError: 

(100, 869)
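
The `MemoryError` is consistent with the size of the dense array being requested: 31,798 sequences × 100 songs × 869 classes is about 2.76 billion values, or roughly 11 GB at 4 bytes each (assuming float32 values, and before counting the extra copy `np.array` makes from the list). The sketch below reproduces that arithmetic and shows one possible workaround, which is to one-hot encode a batch at a time instead of the whole dataset. The generator `one_hot_batches` is a hypothetical helper, not part of `pyphish` or keras.

```python
import numpy as np

# Shapes taken from the cells above; 4 bytes per value assumes float32.
n_train, seq_len, num_classes = 31798, 100, 869
dense_bytes = n_train * seq_len * num_classes * 4
print(f'Dense one-hot X_train would need about {dense_bytes / 1e9:.2f} GB')


def one_hot_batches(X, y, num_classes, batch_size=32):
    """Yield (X_hot, y_hot) pairs one batch at a time.

    X is an integer array of shape (n, seq_len); y has shape (n,).
    Only one batch of one-hot data is held in memory at once.
    """
    eye = np.eye(num_classes, dtype=np.float32)  # row k is the one-hot vector for k
    for start in range(0, len(X), batch_size):
        X_batch = X[start:start + batch_size]
        y_batch = y[start:start + batch_size]
        # Indexing the identity matrix by integer codes performs the encoding.
        yield eye[X_batch], eye[y_batch]


# Toy usage: 3 sequences of length 2 over a 3-song vocabulary.
X_toy = np.array([[0, 1], [2, 0], [1, 1]])
y_toy = np.array([1, 2, 0])
batches = list(one_hot_batches(X_toy, y_toy, num_classes=3, batch_size=2))
```

Another option, also an assumption rather than something the notebook does, is to skip one-hot encoding altogether: feed the integer sequences to a Keras `Embedding` layer and train with the `sparse_categorical_crossentropy` loss, which accepts integer targets.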