# Wilson's Morning Wake Up Playlist Generator, Modeling and Learning

The following steps will be executed:

* Upload your data to S3.
* Define benchmark and candidate models, along with their training scripts.
* Train models and deploy.
* Evaluate deployed estimator.

## Load Data to S3

In [10]:
import pandas as pd
import boto3
import sagemaker

In [11]:
# session and role
sagemaker_session = sagemaker.Session()
role = sagemaker.get_execution_role()

# create an S3 bucket
bucket = sagemaker_session.default_bucket()

In [12]:
!ls -la data

total 516
drwxrwxr-x 2 ec2-user ec2-user   4096 Mar 18 23:10 .
drwxrwxr-x 8 ec2-user ec2-user   4096 Mar 18 23:44 ..
-rw-rw-r-- 1 ec2-user ec2-user  28467 Mar  4 23:01 test.csv
-rw-rw-r-- 1 ec2-user ec2-user 196152 Mar 18 23:43 train.csv
-rw-rw-r-- 1 ec2-user ec2-user 122882 Mar 18 23:43 wmw.csv
-rw-rw-r-- 1 ec2-user ec2-user 166951 Mar 18 23:35 wmw_tracks.csv


## Upload your training data to S3

In [13]:
# should be the name of directory you created to save your features data
data_dir = 'data'

# set prefix, a descriptive name for a directory  
prefix = 'sagemaker/wmw_estimator'

# upload all data to S3
input_data = sagemaker_session.upload_data(path=data_dir, bucket=bucket, key_prefix=prefix)
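
As a rough illustration of where the files end up: when `upload_data` is given a directory, it returns an S3 URI of the form `s3://<bucket>/<key_prefix>`, and each file is stored under that prefix. The helper below only mimics that naming and never contacts AWS. `expected_s3_uris` and the bucket name `example-bucket` are hypothetical, introduced purely for this sketch.

```python
import posixpath


def expected_s3_uris(bucket, key_prefix, filenames):
    """Return the S3 URIs we would expect for files uploaded under key_prefix.

    This mirrors the naming convention only; it does not call S3.
    """
    return [
        "s3://{}/{}".format(bucket, posixpath.join(key_prefix, name))
        for name in filenames
    ]


# 'example-bucket' is a placeholder, not the real default bucket name
uris = expected_s3_uris("example-bucket", "sagemaker/wmw_estimator",
                        ["train.csv", "test.csv"])
for uri in uris:
    print(uri)
```

Knowing the layout makes it easier to point a training channel at the right prefix later.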

---

# Modeling

It's time to define and train the models!

---

## Complete a training script 

To implement a custom estimator, I need to complete a `train.py` script. 

A typical training script:
* Loads training data from a specified directory
* Parses any training and model hyperparameters (e.g., the number of nodes in a neural network or the number of training epochs)
* Instantiates a model of your design, with any specified hyperparameters
* Trains that model
* Finally, saves the model so that it can be hosted/deployed later

### Defining and training a model

To complete a `train.py` file, you will:
1. Import any extra libraries you need
2. Define any additional model training hyperparameters using `parser.add_argument`
3. Define a model in the `if __name__ == '__main__':` section
4. Train the model in that same section
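
The argument-parsing part of such a script could look roughly like the sketch below. It assumes the hyperparameter names used later in this notebook (`input_features`, `hidden_dim`, `output_dim`, `epochs`). The default values and the local fallback directories are placeholders. `SM_MODEL_DIR` and `SM_CHANNEL_TRAIN` are the environment variables SageMaker sets inside the training container for the model output directory and the `train` channel.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse SageMaker-style locations and model hyperparameters."""
    parser = argparse.ArgumentParser()

    # Locations provided by the training container; the fallbacks are
    # placeholder paths so the script can also be run locally.
    parser.add_argument("--model-dir", type=str,
                        default=os.environ.get("SM_MODEL_DIR", "model_out"))
    parser.add_argument("--data-dir", type=str,
                        default=os.environ.get("SM_CHANNEL_TRAIN", "data"))

    # Model and training hyperparameters (defaults are assumptions)
    parser.add_argument("--input_features", type=int, default=11)
    parser.add_argument("--hidden_dim", type=int, default=12)
    parser.add_argument("--output_dim", type=int, default=8)
    parser.add_argument("--epochs", type=int, default=100)

    return parser.parse_args(argv)


# In a real train.py this call would sit under `if __name__ == '__main__':`
# and read sys.argv; an explicit list is passed here for demonstration.
args = parse_args(["--epochs", "5"])
print(args.epochs, args.hidden_dim)
```

The model would then be built from `args`, trained, and saved under `args.model_dir`.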


In [14]:
# Directory of train.py
!pygmentize model/train.py

Error: cannot read infile: [Errno 2] No such file or directory: 'model/train.py'


---
# Create an Estimator

When a custom model is constructed in SageMaker, an entry point must be specified. This is the Python file that is executed when the model is trained: the `train.py` script described above. To run a custom training script in SageMaker, construct an estimator and fill in the appropriate constructor arguments:

* **entry_point**: The path to the Python script SageMaker runs for training (in this notebook, `LSTM_Train.py`).
* **source_dir**: The path to the directory that contains the training script (in this notebook, `model`).
* **role**: The role ARN, which was specified above.
* **train_instance_count**: The number of training instances (should be left at 1).
* **train_instance_type**: The type of SageMaker instance for training. Note: Because Scikit-learn does not natively support GPU training, SageMaker Scikit-learn does not currently support training on GPU instance types.
* **sagemaker_session**: The session used to train on SageMaker.
* **hyperparameters** (optional): A dictionary `{'name':value, ..}` passed to the train function as hyperparameters.

Note: For a PyTorch model, there is another optional argument **framework_version**, which selects the PyTorch version of the training container (the estimator defined later in this notebook uses `0.4.0`).
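
In script mode, SageMaker hands the `hyperparameters` dictionary to the entry point as command-line arguments, which is why the training script reads them with `argparse`. The sketch below imitates that conversion for illustration only. `hyperparameters_to_argv` is a hypothetical helper, and the real container performs its own serialization.

```python
def hyperparameters_to_argv(hyperparameters):
    """Turn {'name': value} pairs into ['--name', 'value', ...]."""
    argv = []
    for name, value in hyperparameters.items():
        argv.extend(["--{}".format(name), str(value)])
    return argv


# Same hyperparameters as the estimator defined later in this notebook
argv = hyperparameters_to_argv({
    "input_features": 11,
    "hidden_dim": 12,
    "output_dim": 8,
    "epochs": 100,
})
print(argv)
```

Because every value arrives as a string, the script should declare a `type` for each argument it parses.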

## Define PyTorch estimators

In [45]:
# Build sequences and targets
def create_playlist_sequences(input_data):
    input_playlists = []
    
    for i in input_data['volume'].unique():
        temp_vol = input_data[input_data['volume'] == i]
        X = temp_vol.iloc[:, 2:10].values
        y = temp_vol.iloc[:, 10:].values
        input_playlists.append((X, y))
        
    return input_playlists
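
To make the slicing concrete, here is a dependency-free sketch of the same grouping on toy rows. It assumes a made-up layout: the volume in column 0, a position in column 1, two feature columns, and one target column. The real frame uses columns 2 to 9 as features and column 10 onward as targets. `group_by_volume` is a hypothetical stand-in for `create_playlist_sequences`.

```python
def group_by_volume(rows, feature_slice, target_slice):
    """Group rows by their volume (column 0) into (X, y) pairs.

    Volumes keep their order of first appearance, like `unique()` does.
    """
    playlists = {}
    for row in rows:
        X, y = playlists.setdefault(row[0], ([], []))
        X.append(list(row[feature_slice]))
        y.append(list(row[target_slice]))
    return list(playlists.values())


# Toy data: (volume, position, feature_1, feature_2, target_1)
rows = [
    (1, 1, 0.1, 0.2, 0.9),
    (1, 2, 0.3, 0.4, 0.8),
    (2, 1, 0.5, 0.6, 0.7),
]

playlists = group_by_volume(rows, slice(2, 4), slice(4, None))
for X, y in playlists:
    print(X, y)
```

Each playlist becomes one training example, so the number of tuples equals the number of distinct volumes.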

In [46]:
from unittest.mock import MagicMock, patch

def _print_success_message():
    print('Tests Passed!')

def test_playlist_sequences(input_playlists):
    
    track_features = [-2.39099487, -2.63509459, -0.27732204,  0.92969533, -0.48983686,-1.15691947,  1.08569029, -1.20454903,  2.09618458, -5.37044178, 0.23380331]
    
    track_features_len = 11
    target_features_len = 8
    
    # check shape and equality of first track
    assert len(input_playlists[0][0][0]) == len(track_features), \
        'Number of features in input_playlist features does not match expected number of ' + str(len(track_features))    
    
    # check shape of input and output arrays
    assert input_playlists[0][0].shape[1]==track_features_len, \
        'input_features should have as many columns as selected features, got: {}'.format(input_playlists[0][0].shape[1])
    assert input_playlists[0][1].shape[1]==target_features_len, \
        'target_features should have as many columns as selected features, got: {}'.format(input_playlists[0][1].shape[1])
    
    #TODO: Add more tests
    
    _print_success_message()

### Test run of benchmark and candidate models and train components
Before launching a SageMaker training job, I run the model and training loop locally to check that the configuration works without errors. Once the local run succeeds, I instantiate an estimator using the SageMaker API.

In [47]:
import os
import torch
import torch.utils.data

train_data = pd.read_csv(os.path.join(data_dir, "train.csv"))

# Gather sequences and targets
processed_data = create_playlist_sequences(train_data)

In [48]:
# Training function for LSTM
def train_lstm(model, train_loader, epochs, criterion, optimizer, device):
    """
    This is the training method that is called by the PyTorch training script of the LSTM model. The parameters
    passed are as follows:
    model        - The PyTorch model that we wish to train.
    train_loader - The PyTorch DataLoader that should be used during training.
    epochs       - The total number of epochs to train for.
    criterion    - The loss function used for training. 
    optimizer    - The optimizer to use during training.
    device       - Where the model and data should be loaded (gpu or cpu).
    """
    
    # training loop is provided
    for epoch in range(1, epochs + 1):
        
        model.train() # Make sure that the model is in training mode.

        total_loss = 0

        for batch in train_loader:
            
            # get data
            batch_x, batch_y = batch
            
            # convert the NumPy arrays to float tensors
            batch_x = torch.from_numpy(batch_x).float().squeeze()
            batch_y = torch.from_numpy(batch_y).float()

            batch_x = batch_x.to(device)
            batch_y = batch_y.to(device)

            optimizer.zero_grad()
            
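            # reset the LSTM hidden and cell state for each playlist, on the same device as the data
            model.hidden_cell = (torch.zeros(1, 1, model.hidden_layer_dim, device=device),
                torch.zeros(1, 1, model.hidden_layer_dim, device=device))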

            # get predictions from model
            y_pred = model(batch_x)
            
            # perform backprop
            loss = criterion(y_pred, batch_y)
            loss.backward()
            optimizer.step()
            
            total_loss += loss.item()
            
        if epoch%25 == 1:
            print("Epoch: {}, Loss: {}".format(epoch, total_loss / len(train_loader)))

In [50]:
import torch.optim as optim
from model.LSTM_Estimator import LSTMEstimator

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = LSTMEstimator(8, 30, 1, 8).to(device)
optimizer = optim.Adam(model.parameters(), lr=0.001)
loss_fn = torch.nn.L1Loss()

train_lstm(model, processed_data, 100, loss_fn, optimizer, device)

Epoch: 1, Loss: 0.7053991008449245
Epoch: 26, Loss: 0.5734052601698283
Epoch: 51, Loss: 0.5347163717488985
Epoch: 76, Loss: 0.49401476093240687
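
The reported number is the mean of the per-playlist L1 losses for that epoch. `torch.nn.L1Loss` with its default reduction is the mean absolute error over all elements. The following plain-Python sketch reproduces that arithmetic on made-up numbers; `l1_loss` is an illustrative re-implementation, not the PyTorch function.

```python
def l1_loss(pred, target):
    """Mean absolute error over all elements of two equally shaped 2-D lists."""
    diffs = [
        abs(p - t)
        for p_row, t_row in zip(pred, target)
        for p, t in zip(p_row, t_row)
    ]
    return sum(diffs) / len(diffs)


# Two toy "playlists" as (prediction, target) pairs
batches = [
    ([[0.0, 1.0]], [[0.5, 1.0]]),  # absolute errors 0.5 and 0.0
    ([[2.0, 2.0]], [[1.0, 4.0]]),  # absolute errors 1.0 and 2.0
]

batch_losses = [l1_loss(pred, target) for pred, target in batches]

# Same averaging as `total_loss / len(train_loader)` in the training loop
epoch_loss = sum(batch_losses) / len(batches)
print(batch_losses, epoch_loss)
```

Since the features are standardized, a loss of about 0.5 means the predictions are off by roughly half a standard deviation per feature on average.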


In [51]:
%env SPOTIFY_EMAIL=<your-spotify-email>

env: SPOTIFY_EMAIL=<your-spotify-email>


In [52]:
%env SPOTIFY_ID=<your-client-id>

env: SPOTIFY_ID=<your-client-id>


In [53]:
%env SPOTIFY_SECRET=<your-client-secret>

env: SPOTIFY_SECRET=<your-client-secret>


In [54]:
!pip install spotipy

fastai 1.0.60 requires nvidia-ml-py3, which is not installed.
You are using pip version 10.0.1, however version 20.0.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.


In [55]:
# Spotify API
import spotipy
import spotipy.util as util

# Defaults
import os
import sys

# Spotify for developers client auth variables
username = os.environ['SPOTIFY_EMAIL']
spotify_id = os.environ['SPOTIFY_ID']
spotify_secret = os.environ['SPOTIFY_SECRET']

# Set API scope
scope='playlist-read-private'

# Get auth token
token = util.prompt_for_user_token(username, 
                                   scope,
                                   client_id=spotify_id,
                                   client_secret=spotify_secret,
                                   redirect_uri='http://localhost/')

In [56]:
from spotipy.oauth2 import SpotifyClientCredentials

In [57]:

#Authenticate
sp = spotipy.Spotify(
    client_credentials_manager = SpotifyClientCredentials(
        client_id=spotify_id,
        client_secret=spotify_secret
    )
)
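
The cells above read the Spotify credentials from environment variables. A small guard like the following fails early with a readable message when a variable is missing. If the variables are set in the shell before Jupyter starts, the secrets also never appear in the notebook. This is a sketch: `load_spotify_credentials` is a hypothetical helper and the values in `fake_env` are dummies.

```python
import os

REQUIRED_VARS = ("SPOTIFY_EMAIL", "SPOTIFY_ID", "SPOTIFY_SECRET")


def load_spotify_credentials(environ=os.environ):
    """Return the required credentials or raise if any are missing or empty."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
        )
    return {name: environ[name] for name in REQUIRED_VARS}


# Demonstration with a fake environment; real values should come from
# the shell or a secrets store, never from the notebook itself.
fake_env = {
    "SPOTIFY_EMAIL": "user@example.com",
    "SPOTIFY_ID": "dummy-id",
    "SPOTIFY_SECRET": "dummy-secret",
}
creds = load_spotify_credentials(fake_env)
print(sorted(creds))
```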

In [58]:
# Read in WMW tracks to date for recommendations
track_data = pd.read_csv(os.path.join(data_dir, "wmw_tracks.csv"))

track_data.head()

Unnamed: 0,volume,position,track_name,artist_name,danceability,energy,key,loudness,mode,speechiness,...,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,38,1,Finding It There,Goldmund,0.187,0.00257,1,-37.134,1,0.0427,...,0.0915,0.0374,123.707,audio_features,6CnPCuUcM3A5PMP4gUy0vw,spotify:track:6CnPCuUcM3A5PMP4gUy0vw,https://api.spotify.com/v1/tracks/6CnPCuUcM3A5...,https://api.spotify.com/v1/audio-analysis/6CnP...,220120,5
1,38,2,Light Forms,Rohne,0.671,0.545,10,-12.848,0,0.0393,...,0.118,0.284,133.036,audio_features,6MkUPsz5hYeneo0a9H0VT8,spotify:track:6MkUPsz5hYeneo0a9H0VT8,https://api.spotify.com/v1/tracks/6MkUPsz5hYen...,https://api.spotify.com/v1/audio-analysis/6MkU...,265870,4
2,38,3,C-Side,Khruangbin,0.688,0.779,11,-10.129,0,0.0579,...,0.349,0.938,94.073,audio_features,6GvAM8oyVApQHGMgpBt8yl,spotify:track:6GvAM8oyVApQHGMgpBt8yl,https://api.spotify.com/v1/tracks/6GvAM8oyVApQ...,https://api.spotify.com/v1/audio-analysis/6GvA...,283407,4
3,38,4,Didn't I (Dave Allison Rework),Darondo,0.539,0.705,0,-6.729,1,0.0527,...,0.133,0.685,186.033,audio_features,1owjOeZt1BdYWW6T8fIAEe,spotify:track:1owjOeZt1BdYWW6T8fIAEe,https://api.spotify.com/v1/tracks/1owjOeZt1BdY...,https://api.spotify.com/v1/audio-analysis/1owj...,328000,4
4,38,5,Woman Of The Ghetto - Akshin Alizadeh Remix,Marlena Shaw,0.707,0.573,7,-8.403,0,0.0276,...,0.0858,0.189,100.006,audio_features,2h8cQH7zhUWrynZi2MKhhC,spotify:track:2h8cQH7zhUWrynZi2MKhhC,https://api.spotify.com/v1/tracks/2h8cQH7zhUWr...,https://api.spotify.com/v1/audio-analysis/2h8c...,302467,4


In [19]:
from tqdm.notebook import tqdm
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23

feature_list =  ['danceability','energy', 'loudness', 'speechiness', 'acousticness',
                 'instrumentalness', 'liveness', 'valence','mode','key','tempo']

std_scaler = joblib.load('standard_features.pkl')

class Playlist():
    def __init__(self):
        self.name = "Wilson's Morning Wake Up Vol. Test"
        self.intro_songs = []
        self.search_results = []
        self.recommended_track_ids = pd.DataFrame() #list of track ids straight from spotify
        self.trax = [] #all tracks as dict
        self.df = None #this is where the data goes
        self.playlist = None
        
       
        # DO EVERYTHING
        self.get_recommendations() # Grab recommendations based on full WMW catalog
        self.prep_features() # Prepare features using StandardScaler
#         self.get_predictions() # Generate features for each track position for new WMW
        
        
    def get_recommendations(self):
        print('Getting Recommendations...')
        
        # Iterate full catalog of WMW songs
        for _, row in tqdm(track_data[track_data['volume'] == 38].iterrows(), total=track_data[track_data['volume'] == 38].shape[0]):
            song_search = row['track_name'].partition('-')[0] + ' ' + row['artist_name']
            try:
        
                # Query Spotify to get track metadata
                song_res = sp.search(song_search, limit=1)['tracks']['items'][0]

                self.search_results.append({
                    'id': song_res['id'],
                    'artists': [i['name'] for i in song_res['artists']],
                    'name': song_res['name']
                })
                
                # Gather recommendations for each of the past WMW tracks
                results = sp.recommendations(seed_tracks = [song_res['id']], limit=10)

                for r in results['tracks']:
                    track={}
                    track['id'] = r['id']
                    track['artists'] = [i['name'] for i in r['artists']],
                    track['name'] = r['name']
                    track_features = sp.audio_features(r['id'])[0]
                    track.update(track_features)
                    final_track = pd.DataFrame(track, index=[0])
                    self.recommended_track_ids = self.recommended_track_ids.append(final_track, ignore_index=True)
                    
            except Exception as err:
                print("Song not searchable:", err)
        
        return self.recommended_track_ids
    
    
    def prep_features(self):
        self.recommended_track_ids[feature_list] = std_scaler.transform(self.recommended_track_ids[feature_list])
            
    
    def generate_playlist_features(model, intro_tracks, predict_len=15):
        hidden = model.init_hidden()
        
        # extracts features from intro tracks
        intro_input = text_to_tensor()
        
        # predicted playlist
        predicted = intro_tracks
        
        # build up hidden state
        for p in range(len(intro_tracks) - 1):
            _, hidden = model(intro_input[p], hidden)
        inp = intro_input[-1]
        
        for p in range(predict_len):
            output, hidden = model(inp, hidden)
            


In [35]:
pl = Playlist()

Getting Recommendations...



In [None]:
pl.recommended_track_ids

In [173]:
recommended = pd.DataFrame()

# Iterate full catalog of WMW songs
for _, row in tqdm(track_data[track_data['volume'] == 38].iterrows(), total=track_data[track_data['volume'] == 38].shape[0]):
    song_search = row['track_name'].partition('-')[0] + ' ' + row['artist_name']
    try:

        # Query Spotify to get track metadata
        song_res = sp.search(song_search, limit=1)['tracks']['items'][0]

        # Gather recommendations for each of the past WMW tracks
        results = sp.recommendations(seed_tracks = [song_res['id']], limit=10)

        for r in results['tracks']:
            track={}
            track['id'] = r['id']
            track['artists'] = [i['name'] for i in r['artists']],
            track['name'] = r['name']
            track_features = sp.audio_features(r['id'])[0]
            track.update(track_features)
            final_track = pd.DataFrame(track, index=[0])
            recommended = recommended.append(final_track, ignore_index=True)

    except Exception as err:
        print("Song not searchable:", err)
        
recommended[feature_list] = std_scaler.transform(recommended[feature_list])
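
One detail of the search string is worth a closer look: `partition('-')` cuts at the first hyphen anywhere in the title, so a track such as "C-Side" would be searched as "C". A possible alternative, sketched below, splits only on a hyphen surrounded by spaces, which is how the remix suffixes in this dataset are written. `build_search_query` is a hypothetical helper, not part of the notebook.

```python
def build_search_query(track_name, artist_name):
    """Drop a ' - <remix/edit>' suffix from the title and append the artist."""
    # Split on ' - ' (with spaces) so hyphenated titles stay intact
    title = track_name.split(" - ")[0].strip()
    return "{} {}".format(title, artist_name)


print(build_search_query("Woman Of The Ghetto - Akshin Alizadeh Remix", "Marlena Shaw"))
print(build_search_query("C-Side", "Khruangbin"))
```

Whether this actually improves the search results depends on how Spotify's search treats the shortened titles, which has not been checked here.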
        





In [167]:
recommended[feature_list].head()

| | danceability | energy | loudness | speechiness | acousticness | instrumentalness | liveness | valence |
|---|---|---|---|---|---|---|---|---|
| 0 | -2.843595 | -1.574993 | -1.041479 | -0.206284 | 1.211163 | 0.824983 | -0.79854 | -1.171175 |
| 1 | -1.79682 | -1.775794 | -1.62081 | -0.536784 | 1.830972 | 0.694344 | -0.392616 | -1.170329 |
| 2 | -2.5861 | -2.512685 | -3.840984 | -0.416196 | 2.107455 | 1.01949 | 1.618177 | -1.037055 |
| 3 | -1.332209 | -2.382398 | -1.895296 | -0.291142 | 2.143914 | 0.883045 | -0.362491 | -0.976553 |
| 4 | -0.811621 | -1.075327 | -0.75506 | -0.489889 | 1.958579 | 0.920785 | -0.347429 | -0.972745 |


In [80]:
song = [ 0.0609,  1.0491,  0.5765, -0.3122, -0.6526,  0.0296, -0.7938,  0.0199]

from scipy.spatial.distance import cdist

import numpy as np

np.argmin(cdist([song], recommended[feature_list]))

142
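
`cdist` uses the Euclidean metric by default, so the cell above finds the recommended track whose standardized features lie closest to `song`. The same lookup can be written without SciPy, as in this small sketch on made-up two-dimensional vectors. `nearest_index` is a hypothetical helper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally long sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_index(query, candidates):
    """Index of the candidate closest to query (first one wins on ties)."""
    distances = [euclidean(query, c) for c in candidates]
    return min(range(len(distances)), key=distances.__getitem__)


# Toy feature vectors; the real ones have eight standardized features
candidates = [[1.0, 1.0], [0.1, -0.1], [3.0, 0.0]]
idx = nearest_index([0.0, 0.0], candidates)
print(idx, candidates[idx])
```

Because all features are standardized, each one contributes on a comparable scale to the distance.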

In [154]:
recommended.iloc[142]

id                                             4Zwo8D1koSzLJCYd5NlX81
artists                                                       [Tycho]
name                                                     Outer Sunset
danceability                                                 0.526236
energy                                                       0.545087
key                                                                 8
loudness                                                   -0.0348222
mode                                                                1
speechiness                                                 -0.235315
acousticness                                                 0.691617
instrumentalness                                             0.824983
liveness                                                    -0.473198
valence                                                       1.59245
tempo                                                          96.994
type                

In [178]:
from tqdm.notebook import tqdm
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23

feature_list =  ['danceability','energy', 'loudness', 'speechiness', 'acousticness',
                 'instrumentalness', 'liveness', 'valence']

std_scaler = joblib.load('standard_features.pkl')

def predict_playlist(model, initial_songs=(), predict_len=15):
    # reads the module-level `recommended` DataFrame built in the earlier cell
    intro_tracks = pd.DataFrame() #list of track ids straight from spotify
    
    model.eval()
    
    # Iterate full catalog of WMW songs
    for song in tqdm(initial_songs, total=len(initial_songs)):
                
        song_search = song
                
        try:
            
            # Query Spotify to get track metadata
            song_res = sp.search(song_search, limit=1)['tracks']['items'][0]
            
            track = {
                'id': song_res['id'],
                'artists': [i['name'] for i in song_res['artists']],
                'name': song_res['name']
            }
            
            track_features = sp.audio_features(track['id'])[0]
            
            track.update(track_features)
            
            final_track = pd.DataFrame(track, index=[0])
            
            intro_tracks = intro_tracks.append(final_track, ignore_index=True)
                
        except Exception as err:
            print("Song not searchable:", err)

    intro_tracks[feature_list] = std_scaler.transform(intro_tracks[feature_list])
            
    predicted = intro_tracks

    inp = torch.FloatTensor(intro_tracks[feature_list].values)
    
    print("Intro", inp)
    
    sample = recommended.copy()

    for p in range(predict_len):
        output = model(inp).detach().numpy()
#         print(recommended.shape)
        next_song_id = np.argmin(cdist(output, sample[feature_list]))
#         print(next_song_id)
        next_song = sample.iloc[next_song_id].copy()
        inp = torch.FloatTensor([next_song[feature_list]])
        print("New song", next_song['artists'], next_song['name'])
        predicted = predicted.append(next_song, ignore_index=True)
        sample = sample.drop([next_song_id], axis=0).reset_index(drop=True)
        
    return predicted
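
Stripped of the Spotify and pandas details, the loop above is a greedy selection without replacement: predict the next feature vector, take the closest remaining candidate, remove it from the pool, and feed it back in. The sketch below shows only that idea. `greedy_playlist` and `toy_model` are hypothetical stand-ins; the toy model simply adds 1 to every feature and is not the trained LSTM.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def greedy_playlist(seed, candidates, predict_next, length):
    """Pick `length` candidates, each the nearest to the model's next prediction."""
    pool = list(candidates)  # copy so the caller's list is not modified
    current = seed
    chosen = []
    for _ in range(min(length, len(pool))):
        target = predict_next(current)
        best = min(range(len(pool)), key=lambda i: euclidean(target, pool[i]))
        current = pool.pop(best)  # remove so a track is never picked twice
        chosen.append(current)
    return chosen


def toy_model(features):
    """Placeholder for the trained model: shifts every feature by +1."""
    return [f + 1.0 for f in features]


candidates = [[5.0, 5.0], [1.0, 1.0], [2.1, 2.0], [3.0, 3.2]]
playlist = greedy_playlist([0.0, 0.0], candidates, toy_model, length=3)
print(playlist)
```

Removing each chosen track from the pool is what keeps the playlist free of duplicates.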

In [179]:
initial_songs = ['luke howard portrait gallery']

predict_playlist(model, initial_songs=initial_songs, predict_len=15)



Intro tensor([[-1.7968, -2.6588, -4.1506, -0.4095,  2.1287,  0.7989, -0.4152, -0.9601]])
New song ['Doon Kanda'] Nastasya
New song ['Spencer Brown', 'Ben Böhmer'] SF to Berlin
New song ['Goldroom', 'Love, Alexa'] I Can Feel It
New song ['Eli & Fur'] High West
New song ['Crooked Colours'] Flow
New song ['Andy Stott'] New Romantic
New song ['Nora En Pure'] Birthright
New song ['Amtrac', 'Totally Enormous Extinct Dinosaurs'] Radical - Edit
New song ['Township Rebellion'] Kristalle
New song ['Avoure'] Aura - Edit
New song ['ITO'] Window Drops
New song ['Yotto', 'Joseph Ray'] Nova - Joseph Ray Remix
New song ['Against All Logic'] Now U Got Me Hooked
New song ['Kaskade', 'Lipless'] My Light
New song ['Xinobi', 'James Grant', 'Jody Wisternoff'] Far Away Place - Jody Wisternoff & James Grant Remix


Unnamed: 0,id,artists,name,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,uri,track_href,analysis_url,duration_ms,time_signature
0,3oRuCzqdduzZ2CBhAtv8zO,Luke Howard,Portrait Gallery,-1.79682,-2.658849,1,-4.150644,1,-0.409497,2.128723,0.798855,-0.415209,-0.960053,127.536,audio_features,spotify:track:3oRuCzqdduzZ2CBhAtv8zO,https://api.spotify.com/v1/tracks/3oRuCzqdduzZ...,https://api.spotify.com/v1/audio-analysis/3oRu...,367667,4
1,1sjNonmNGniFOzOTJgfZp0,[Doon Kanda],Nastasya,0.431074,1.040084,0,0.357121,0,-0.355902,-0.329245,0.032438,-0.21187,0.424724,105.019,audio_features,spotify:track:1sjNonmNGniFOzOTJgfZp0,https://api.spotify.com/v1/tracks/1sjNonmNGniF...,https://api.spotify.com/v1/audio-analysis/1sjN...,265691,4
2,4AYZu6Uym0B7t3RqmAGocR,"[Spencer Brown, Ben Böhmer]",SF to Berlin,0.095211,0.372305,5,-0.297583,0,-0.237548,-0.368743,0.758212,-0.712686,-0.099909,124.024,audio_features,spotify:track:4AYZu6Uym0B7t3RqmAGocR,https://api.spotify.com/v1/tracks/4AYZu6Uym0B7...,https://api.spotify.com/v1/audio-analysis/4AYZ...,373133,4
3,2Pe8kPCiJSI363uO0Q4eB2,"[Goldroom, Love, Alexa]",I Can Feel It,0.369499,0.960698,7,0.174968,0,-0.277744,-0.656164,0.749502,-0.794774,0.22164,124.008,audio_features,spotify:track:2Pe8kPCiJSI363uO0Q4eB2,https://api.spotify.com/v1/tracks/2Pe8kPCiJSI3...,https://api.spotify.com/v1/audio-analysis/2Pe8...,326653,4
4,2aSKl24mMkjZyWCSEOZ9Fx,[Eli & Fur],High West,0.279936,0.914,1,0.631816,1,-0.387166,-0.867926,0.671119,0.096903,-0.738776,122.003,audio_features,spotify:track:2aSKl24mMkjZyWCSEOZ9Fx,https://api.spotify.com/v1/tracks/2aSKl24mMkjZ...,https://api.spotify.com/v1/audio-analysis/2aSK...,259734,4
5,0BQ0ZzRiojTMbWeMbNw6LF,[Crooked Colours],Flow,-0.526137,0.218203,6,0.612973,0,-0.157156,-0.37482,-0.286903,-0.287181,-0.315685,118.017,audio_features,spotify:track:0BQ0ZzRiojTMbWeMbNw6LF,https://api.spotify.com/v1/tracks/0BQ0ZzRiojTM...,https://api.spotify.com/v1/audio-analysis/0BQ0...,288651,4
6,4XGQUYPYfHpCMrO8BKOk5F,[Andy Stott],New Romantic,0.470258,0.218203,1,0.21747,1,-0.445227,-0.714499,0.160174,-0.561311,-1.070902,99.979,audio_features,spotify:track:4XGQUYPYfHpCMrO8BKOk5F,https://api.spotify.com/v1/tracks/4XGQUYPYfHpC...,https://api.spotify.com/v1/audio-analysis/4XGQ...,339460,4
7,24KBMWfxD8l3z9JwfgU9oL,[Nora En Pure],Birthright,0.106406,0.615134,11,0.232126,1,-0.434061,-0.816889,0.854014,-0.076311,-1.18556,122.021,audio_features,spotify:track:24KBMWfxD8l3z9JwfgU9oL,https://api.spotify.com/v1/tracks/24KBMWfxD8l3...,https://api.spotify.com/v1/audio-analysis/24KB...,183566,4
8,7Le48y7wfWRq63dRAzVmWF,"[Amtrac, Totally Enormous Extinct Dinosaurs]",Radical - Edit,0.167981,0.806595,6,0.555396,0,-0.413963,-0.529772,-0.11562,-0.407678,-1.170752,122.024,audio_features,spotify:track:7Le48y7wfWRq63dRAzVmWF,https://api.spotify.com/v1/tracks/7Le48y7wfWRq...,https://api.spotify.com/v1/audio-analysis/7Le4...,237978,4
9,6IgIPWjUrRBTIm07t7H64S,[Township Rebellion],Kristalle,1.030031,0.134147,10,0.21433,1,0.030425,-0.844416,0.586929,-0.204339,0.306258,119.986,audio_features,spotify:track:6IgIPWjUrRBTIm07t7H64S,https://api.spotify.com/v1/tracks/6IgIPWjUrRBT...,https://api.spotify.com/v1/audio-analysis/6IgI...,484005,3


In [11]:
# # Training function
# def train_rnn(model, train_loader, epochs, criterion, optimizer, device):
#     """
#     This is the training method that is called by the PyTorch training script. The parameters
#     passed are as follows:
#     model        - The PyTorch model that we wish to train.
#     train_loader - The PyTorch DataLoader that should be used during training.
#     epochs       - The total number of epochs to train for.
#     criterion    - The loss function used for training. 
#     optimizer    - The optimizer to use during training.
#     device       - Where the model and data should be loaded (gpu or cpu).
#     """
    
#     # training loop is provided
#     for epoch in range(1, epochs + 1):
#         model.train() # Make sure that the model is in training mode.

#         total_loss = 0
        
#         hidden = model.initHidden()

#         for batch in train_loader:
            
#             # get data
#             batch_x, batch_y = batch
            
#             # 
#             batch_x = torch.from_numpy(batch_x).float().squeeze()
#             batch_y = torch.from_numpy(batch_y).float()

#             batch_x = batch_x.to(device)
#             batch_y = batch_y.to(device)

#             optimizer.zero_grad()

#             y_pred = []
            
#             # get predictions
#             for x in batch_x:
#                 y, hidden = model(x, hidden)
#                 y_pred.append(y)
            
#             # perform backprop
#             loss = criterion(y_pred, batch_y)
#             loss.backward()
#             optimizer.step()
            
#             total_loss += loss.data.item()
            
#         if epoch%25 == 1:
#             print("Epoch: {}, Loss: {}".format(epoch, total_loss / len(train_loader)))

#TODO: Create working RNN Benchmark model

In [18]:
# import torch.optim as optim
# from model.RnnEstimator import RNNEstimator

# device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# model = RNNEstimator(11, 30, 8)
# optimizer = optim.Adam(model.parameters(), lr=0.001)
# loss_fn = torch.nn.L1Loss()

# train_rnn(model, processed_data, 100, loss_fn, optimizer, device)

### Build and Train the PyTorch Model with Hyperparameter Tuning

In [15]:
# Estimator code
from sagemaker.pytorch import PyTorch
output_path = 's3://{}/{}'.format(bucket, prefix)

estimator = PyTorch(entry_point="LSTM_Train.py",
                    source_dir="model",
                    role=role,
                    framework_version='0.4.0',
                    train_instance_count=1,
                    output_path = output_path,
                    train_instance_type='ml.m4.xlarge',
                    hyperparameters={
                        'input_features': 11,
                        'hidden_dim': 12,
                        'output_dim': 8,
                        'epochs': 100
                    })

In [16]:
# Fit estimator
estimator.fit({'train': input_data})

2020-03-05 03:44:03 Starting - Starting the training job...
2020-03-05 03:44:04 Starting - Launching requested ML instances.........
2020-03-05 03:45:34 Starting - Preparing the instances for training.........
2020-03-05 03:47:17 Downloading - Downloading input data
2020-03-05 03:47:17 Training - Downloading the training image..
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2020-03-05 03:47:37,157 sagemaker-containers INFO     Imported framework sagemaker_pytorch_container.training
2020-03-05 03:47:37,160 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
2020-03-05 03:47:37,172 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2020-03-05 03:47:37,176 sagemaker_pytorch_container.training INFO     Invoking user training script.
2020-03-05 03:47:37,390 sagemaker-containers INFO     Module LSTM_Train doe

In [17]:
%%time

# deploy your model to create a predictor
predictor = estimator.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')

-----------!CPU times: user 254 ms, sys: 15.1 ms, total: 270 ms
Wall time: 5min 32s


In [28]:
torch.Tensor(processed_data[0][0]).float()

tensor([[-2.3910e+00, -2.6351e+00, -2.7732e-01,  9.2970e-01, -4.8984e-01,
         -1.1569e+00,  1.0857e+00, -1.2045e+00,  2.0962e+00, -5.3704e+00,
          2.3380e-01],
        [ 2.9102e-01, -1.3109e-01, -3.5296e-01,  6.4590e-01, -2.8399e-01,
         -1.2541e-01, -9.2107e-01,  1.4323e+00,  6.8586e-01, -3.7562e-01,
          6.8282e-01],
        [ 3.8522e-01,  9.4912e-01,  6.0842e-02, -1.4610e+00,  1.5104e+00,
          2.6102e+00, -9.2107e-01,  1.7253e+00, -7.7038e-01,  1.8358e-01,
         -1.1925e+00],
        [-4.4044e-01,  6.0751e-01, -5.4846e-02, -1.4123e+00, -1.6747e-01,
          1.5519e+00,  1.0857e+00, -1.4975e+00, -8.6166e-01,  8.8285e-01,
          3.2336e+00],
        [ 4.9051e-01, -1.8348e-03, -6.1326e-01, -9.6511e-01, -5.3411e-01,
         -5.2279e-01, -9.2107e-01,  5.5335e-01, -8.1149e-01,  5.3856e-01,
         -9.0695e-01],
        [ 6.9554e-01,  1.2076e+00, -5.0396e-02,  8.1217e-01, -7.6404e-01,
         -7.5214e-02,  1.0857e+00, -3.2617e-02,  3.1377e-01,  1.0708e-0

In [31]:
preds = predictor.predict(torch.Tensor(processed_data[0][0]).float())

In [32]:
len(preds)

15
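
The call above sends a 2-D array (one row per track) and succeeds, whereas a later cell in this notebook sends a single 1-D track vector and receives a 500 error. The error message does not state the cause, and the CloudWatch log is the place to confirm it. If the inference code expects a batch dimension (an assumption, not something verified here), keeping single tracks as a batch of one may be worth trying. `as_batch` below is a hypothetical helper working on plain lists.

```python
def as_batch(features):
    """Wrap a single 1-D feature list as a batch of one; copy 2-D input as is."""
    if features and not isinstance(features[0], (list, tuple)):
        return [list(features)]
    return [list(row) for row in features]


single_track = [-2.391, -2.6351, -0.2773]  # shortened toy vector
print(as_batch(single_track))
print(as_batch([[1.0, 2.0], [3.0, 4.0]]))
```

The resulting nested list could then be converted to a tensor or array before calling `predictor.predict`.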

In [49]:
torch.Tensor(new_tracks[-1]).float()

tensor([-2.3910, -2.6351, -0.2773,  0.9297, -0.4898, -1.1569,  1.0857, -1.2045,
         2.0962, -5.3704,  0.2338])

In [57]:
fut_pred = processed_data[0][0][0]

[-2.3909948690196825,
 -2.635094590468726,
 -0.2773220412902482,
 0.9296953263811546,
 -0.4898368594156362,
 -1.1569194705342014,
 1.0856902892884872,
 -1.2045490262286025,
 2.0961845785776654,
 -5.370441776536932,
 0.23380331292868914]

In [62]:
# model.eval()

fut_pred = processed_data[0][0][0]

playlist_len = 15

new_tracks = [torch.Tensor(fut_pred).float()]

print(new_tracks[-1])


predictor.predict(new_tracks[-1])

# for i in range(playlist_len - len(fut_pred)):
#         print(i)
#         print(predictor.predict(new_tracks[-1].values))
#         break

tensor([-2.3910, -2.6351, -0.2773,  0.9297, -0.4898, -1.1569,  1.0857, -1.2045,
         2.0962, -5.3704,  0.2338])


ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from model with message "<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>500 Internal Server Error</title>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error and was unable to complete your request.  Either the server is overloaded or there is an error in the application.</p>
". See https://ap-southeast-2.console.aws.amazon.com/cloudwatch/home?region=ap-southeast-2#logEventViewer:group=/aws/sagemaker/Endpoints/sagemaker-pytorch-2020-03-05-03-44-02-776 in account 999752527953 for more information.