# Wilson's Morning Wake Up Playlist Generator, Modeling and Learning

The following steps will be executed:

* Upload your data to S3.
* Define benchmark and candidate models and their training scripts.
* Train models and deploy.
* Evaluate deployed estimator.

In [234]:
import pandas as pd
import numpy as np
from scipy.spatial.distance import cdist
from tqdm.notebook import tqdm
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23

## Load Data to S3

In [4]:
# import boto3
# import sagemaker

In [5]:
# # session and role
# sagemaker_session = sagemaker.Session()
# role = sagemaker.get_execution_role()

# # create an S3 bucket
# bucket = sagemaker_session.default_bucket()

In [6]:
!ls -la data

'ls' is not recognized as an internal or external command,
operable program or batch file.


## Upload your training data to S3

In [7]:
# should be the name of directory you created to save your features data
data_dir = 'data'

In [8]:
# # set prefix, a descriptive name for a directory  
# prefix = 'sagemaker/wmw_estimator'

# # upload all data to S3
# input_data = sagemaker_session.upload_data(path=data_dir, bucket=bucket, key_prefix=prefix)

---

# Modeling

It's time to define and train the models!

---

## Complete a training script 

To implement a custom estimator, I need to complete a `train.py` script. 

A typical training script:
* Loads training data from a specified directory
* Parses any training & model hyperparameters (ex. nodes in a neural network, training epochs, etc.)
* Instantiates a model of your design, with any specified hyperparams
* Trains that model 
* Finally, saves the model so that it can be hosted/deployed, later

### Defining and training a model

To complete a `train.py` file, you will:
1. Import any extra libraries you need
2. Define any additional model training hyperparameters using `parser.add_argument`
3. Define a model in the `if __name__ == '__main__':` section
4. Train the model in that same section
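
The steps above can be sketched as a small skeleton. This is only an outline, not the project's actual `train.py` (which is not shown in this notebook): the hyperparameter names and defaults are illustrative assumptions. `SM_MODEL_DIR` and `SM_CHANNEL_TRAIN` are the environment variables SageMaker sets inside the training container; the local fallbacks are assumptions added so the parser also runs outside SageMaker.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse SageMaker-provided paths and model hyperparameters."""
    parser = argparse.ArgumentParser()

    # Paths: SageMaker sets these env vars in the container; fall back to local dirs.
    parser.add_argument('--model-dir', type=str,
                        default=os.environ.get('SM_MODEL_DIR', 'model_out'))
    parser.add_argument('--data-dir', type=str,
                        default=os.environ.get('SM_CHANNEL_TRAIN', 'data'))

    # Hyperparameters (names and defaults are illustrative assumptions).
    parser.add_argument('--input_features', type=int, default=8)
    parser.add_argument('--hidden_dim', type=int, default=30)
    parser.add_argument('--output_dim', type=int, default=8)
    parser.add_argument('--epochs', type=int, default=100)

    return parser.parse_args(argv)


# Example: parse an explicit argument list instead of sys.argv.
args = parse_args(['--epochs', '5', '--hidden_dim', '12'])
print(args.epochs, args.hidden_dim, args.data_dir)

# In a real script, the `if __name__ == '__main__':` section would then:
#   1. load the training data from args.data_dir
#   2. build the model from the hyperparameters
#   3. train it
#   4. save the trained model under args.model_dir
```

Passing `argv` explicitly keeps the parser easy to test from a notebook cell; a real training script would call `parse_args()` with no arguments.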


In [9]:
# Directory of train.py
!pygmentize model/train.py

Error: cannot read infile: [Errno 2] No such file or directory: 'model/train.py'


---
# Create an Estimator

When a custom model is constructed in SageMaker, an entry point must be specified. This is the Python file that is executed when the model is trained: the `train.py` script described above. To run a custom training script in SageMaker, construct an estimator and fill in the appropriate constructor arguments:

* **entry_point**: The path to the Python script SageMaker runs for training.
* **source_dir**: The path to the directory that contains the training script (`model` in this notebook).
* **role**: The role ARN, which was specified above.
* **train_instance_count**: The number of training instances (should be left at 1).
* **train_instance_type**: The type of SageMaker instance for training. Note: Because Scikit-learn does not natively support GPU training, SageMaker Scikit-learn does not currently support training on GPU instance types.
* **sagemaker_session**: The session used to train on SageMaker.
* **hyperparameters** (optional): A dictionary `{'name': value, ..}` passed to the training script as hyperparameters.

Note: For a PyTorch model, there is another optional argument, **framework_version**, which selects the PyTorch version of the training container. The estimator in this notebook sets it to `0.4.0`.

## Define PyTorch estimators

In [10]:
# Build sequences and targets
def create_playlist_sequences(input_data):
    input_playlists = []
    
    for i in input_data['volume'].unique():
        temp_vol = input_data[input_data['volume'] == i]
        X = temp_vol.iloc[:, 2:10].values
        y = temp_vol.iloc[:, 10:].values
        input_playlists.append((X, y))
        
    return input_playlists
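
To make the slicing in `create_playlist_sequences` concrete, here is a small self-contained run on a toy frame. The column layout (an index column, `volume`, eight feature columns, eight target columns) is an assumption chosen to match the `iloc` slices; the real `train.csv` is not shown in this notebook.

```python
import numpy as np
import pandas as pd


def create_playlist_sequences(input_data):
    # Same logic as the function above, repeated so this example runs on its own.
    input_playlists = []
    for i in input_data['volume'].unique():
        temp_vol = input_data[input_data['volume'] == i]
        X = temp_vol.iloc[:, 2:10].values   # 8 input feature columns
        y = temp_vol.iloc[:, 10:].values    # remaining columns are the targets
        input_playlists.append((X, y))
    return input_playlists


# Hypothetical layout: index, volume, 8 feature columns, 8 target columns.
cols = ['idx', 'volume'] + ['f%d' % k for k in range(8)] + ['t%d' % k for k in range(8)]
rows = [[r, 1 if r < 3 else 2] + list(np.arange(16) + r) for r in range(5)]
toy = pd.DataFrame(rows, columns=cols)

sequences = create_playlist_sequences(toy)
print(len(sequences))            # one (X, y) pair per volume
print(sequences[0][0].shape)     # feature matrix of the first volume
print(sequences[1][1].shape)     # target matrix of the second volume
```

Each playlist volume becomes one `(X, y)` pair, so the training loop treats a whole volume as a single sequence.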

In [11]:
from unittest.mock import MagicMock, patch

def _print_success_message():
    print('Tests Passed!')

def test_playlist_sequences(input_playlists):
    
    track_features = [-2.39099487, -2.63509459, -0.27732204,  0.92969533, -0.48983686,-1.15691947,  1.08569029, -1.20454903,  2.09618458, -5.37044178, 0.23380331]
    
    track_features_len = 11
    target_features_len = 8
    
    # check shape and equality of first track
    assert len(input_playlists[0][0][0]) == len(track_features), \
        'Number of features in input_playlist features does not match expected number of ' + str(len(track_features))    
    
    # check shape of input and output arrays
    assert input_playlists[0][0].shape[1]==track_features_len, \
        'input_features should have as many columns as selected features, got: {}'.format(input_playlists[0][0].shape[1])
    assert input_playlists[0][1].shape[1]==target_features_len, \
        'target_features should have as many columns as selected features, got: {}'.format(input_playlists[0][1].shape[1])
    
    #TODO: Add more tests
    
    _print_success_message()

### Test run of benchmark and candidate models and train components
Here I check that the configuration I have set up runs locally without errors. Once it runs smoothly, I will instantiate an estimator using the SageMaker API.

In [12]:
import os
import torch
import torch.utils.data

train_data = pd.read_csv(os.path.join(data_dir, "train.csv"))

# Gather sequences and targets
processed_data = create_playlist_sequences(train_data)

In [13]:
# Training function for LSTM
def train_lstm(model, train_loader, epochs, criterion, optimizer, device):
    """
    This is the training method that is called by the PyTorch training script of the LSTM model. The parameters
    passed are as follows:
    model        - The PyTorch model that we wish to train.
    train_loader - The PyTorch DataLoader that should be used during training.
    epochs       - The total number of epochs to train for.
    criterion    - The loss function used for training. 
    optimizer    - The optimizer to use during training.
    device       - Where the model and data should be loaded (gpu or cpu).
    """
    
    # training loop is provided
    for epoch in range(1, epochs + 1):
        
        model.train() # Make sure that the model is in training mode.

        total_loss = 0

        for batch in train_loader:
            
            # get data
            batch_x, batch_y = batch
            
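            # convert NumPy arrays to float tensors
            batch_x = torch.from_numpy(batch_x).float().squeeze()
            batch_y = torch.from_numpy(batch_y).float()

            batch_x = batch_x.to(device)
            batch_y = batch_y.to(device)

            optimizer.zero_grad()
            
            # reset the LSTM hidden and cell state for each playlist, on the same device as the data
            model.hidden_cell = (torch.zeros(1, 1, model.hidden_layer_dim, device=device),
                torch.zeros(1, 1, model.hidden_layer_dim, device=device))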

            # get predictions from model
            y_pred = model(batch_x)
            
            # perform backprop
            loss = criterion(y_pred, batch_y)
            loss.backward()
            optimizer.step()
            
            total_loss += loss.item()
            
        if epoch%25 == 1:
            print("Epoch: {}, Loss: {}".format(epoch, total_loss / len(train_loader)))

In [14]:
import torch.optim as optim
from model.LSTM_Estimator import LSTMEstimator

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = LSTMEstimator(8, 30, 1, 8)
optimizer = optim.Adam(model.parameters(), lr=0.001)
loss_fn = torch.nn.L1Loss()

train_lstm(model, processed_data, 100, loss_fn, optimizer, device)

Epoch: 1, Loss: 0.6786920992103783
Epoch: 26, Loss: 0.5698046370132549
Epoch: 51, Loss: 0.5307731805620967
Epoch: 76, Loss: 0.5026495650007918


In [15]:
# %env SPOTIFY_EMAIL=<your-spotify-account-email>

In [12]:
# %env SPOTIFY_ID=<your-spotify-client-id>

env: SPOTIFY_ID=<your-spotify-client-id>


In [13]:
# %env SPOTIFY_SECRET=<your-spotify-client-secret>

env: SPOTIFY_SECRET=<your-spotify-client-secret>


In [14]:
# !pip install spotipy



In [16]:
# Spotify API
import spotipy
import spotipy.util as util

# Defaults
import os
import sys

# Spotify for developers client auth variables
username = os.environ['SPOTIFY_EMAIL']
spotify_id = os.environ['SPOTIFY_ID']
spotify_secret = os.environ['SPOTIFY_SECRET']

# Set API scope
scope='playlist-read-private'

# Get auth token
token = util.prompt_for_user_token(username, 
                                   scope,
                                   client_id=spotify_id,
                                   client_secret=spotify_secret,
                                   redirect_uri='http://localhost/')
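
The cell above raises a bare `KeyError` if any of the three environment variables is unset. A small check such as the following can give a clearer message first. The helper name `missing_env_vars` is hypothetical and not part of spotipy.

```python
import os

REQUIRED_VARS = ('SPOTIFY_EMAIL', 'SPOTIFY_ID', 'SPOTIFY_SECRET')


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment mapping."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


# Example with an explicit mapping, so the real environment is not touched.
example_env = {'SPOTIFY_EMAIL': 'me@example.com', 'SPOTIFY_ID': 'abc'}
print(missing_env_vars(REQUIRED_VARS, example_env))
```

Calling `missing_env_vars(REQUIRED_VARS)` before reading `os.environ[...]` lets the notebook report every missing credential at once.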

In [17]:
from spotipy.oauth2 import SpotifyClientCredentials

In [18]:
#Authenticate
sp = spotipy.Spotify(
    client_credentials_manager = SpotifyClientCredentials(
        client_id=spotify_id,
        client_secret=spotify_secret
    )
)

In [19]:
# Read in WMW tracks to date for recommendations
track_data = pd.read_csv(os.path.join(data_dir, "wmw_tracks.csv"))

track_data.head()

Unnamed: 0,volume,position,track_name,artist_name,danceability,energy,key,loudness,mode,speechiness,...,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,38,1,Finding It There,Goldmund,0.187,0.00257,1,-37.134,1,0.0427,...,0.0915,0.0374,123.707,audio_features,6CnPCuUcM3A5PMP4gUy0vw,spotify:track:6CnPCuUcM3A5PMP4gUy0vw,https://api.spotify.com/v1/tracks/6CnPCuUcM3A5...,https://api.spotify.com/v1/audio-analysis/6CnP...,220120,5
1,38,2,Light Forms,Rohne,0.671,0.545,10,-12.848,0,0.0393,...,0.118,0.284,133.036,audio_features,6MkUPsz5hYeneo0a9H0VT8,spotify:track:6MkUPsz5hYeneo0a9H0VT8,https://api.spotify.com/v1/tracks/6MkUPsz5hYen...,https://api.spotify.com/v1/audio-analysis/6MkU...,265870,4
2,38,3,C-Side,Khruangbin,0.688,0.779,11,-10.129,0,0.0579,...,0.349,0.938,94.073,audio_features,6GvAM8oyVApQHGMgpBt8yl,spotify:track:6GvAM8oyVApQHGMgpBt8yl,https://api.spotify.com/v1/tracks/6GvAM8oyVApQ...,https://api.spotify.com/v1/audio-analysis/6GvA...,283407,4
3,38,4,Didn't I (Dave Allison Rework),Darondo,0.539,0.705,0,-6.729,1,0.0527,...,0.133,0.685,186.033,audio_features,1owjOeZt1BdYWW6T8fIAEe,spotify:track:1owjOeZt1BdYWW6T8fIAEe,https://api.spotify.com/v1/tracks/1owjOeZt1BdY...,https://api.spotify.com/v1/audio-analysis/1owj...,328000,4
4,38,5,Woman Of The Ghetto - Akshin Alizadeh Remix,Marlena Shaw,0.707,0.573,7,-8.403,0,0.0276,...,0.0858,0.189,100.006,audio_features,2h8cQH7zhUWrynZi2MKhhC,spotify:track:2h8cQH7zhUWrynZi2MKhhC,https://api.spotify.com/v1/tracks/2h8cQH7zhUWr...,https://api.spotify.com/v1/audio-analysis/2h8c...,302467,4


In [20]:
from tqdm.notebook import tqdm
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23

feature_list =  ['danceability','energy', 'loudness', 'speechiness', 'acousticness',
                 'instrumentalness', 'liveness', 'valence']

#'mode','key','tempo'

std_scaler = joblib.load('standard_features.pkl')

class Playlist():
    def __init__(self):
        self.name = "Wilson's Morning Wake Up Vol. Test"
        self.intro_songs = []
        self.search_results = []
        self.recommended_track_ids = pd.DataFrame() #list of track ids straight from spotify
        self.trax = [] #all tracks as dict
        self.df = None #this is where the data goes
        self.playlist = None
        
       
        # DO EVERYTHING
        self.get_recommendations() # Grab recommendations based on full WMW catalog
        self.prep_features() # Prepare features using StandardScaler
#         self.get_predictions() # Generate features for each track position for new WMW
        
        
    def get_recommendations(self):
        print('Getting Recommendations...')
        
        # Iterate full catalog of WMW songs
        for _, row in tqdm(track_data[track_data['volume'] == 38].iterrows(), total=track_data[track_data['volume'] == 38].shape[0]):
            song_search = row['track_name'].partition('-')[0] + ' ' + row['artist_name']
            try:
        
                # Query Spotify to get track metadata
                song_res = sp.search(song_search, limit=1)['tracks']['items'][0]

                self.search_results.append({
                    'id': song_res['id'],
                    'artists': [i['name'] for i in song_res['artists']],
                    'name': song_res['name']
                })
                
                # Gather recommendations for each of the past WMW tracks
                results = sp.recommendations(seed_tracks = [song_res['id']], limit=10)

                for r in results['tracks']:
                    track={}
                    track['id'] = r['id']
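                    # the trailing comma wraps the list in a 1-tuple so the DataFrame stores the whole list in one cell
                    track['artists'] = [i['name'] for i in r['artists']],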
                    track['name'] = r['name']
                    track_features = sp.audio_features(r['id'])[0]
                    track.update(track_features)
                    final_track = pd.DataFrame(track, index=[0])
                    self.recommended_track_ids = self.recommended_track_ids.append(final_track, ignore_index=True)
                    
            except:
                print("Song not searchable")
        
        return self.recommended_track_ids
    
    
    def prep_features(self):
        self.recommended_track_ids[feature_list] = std_scaler.transform(self.recommended_track_ids[feature_list])
            
    
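    # NOTE: unfinished draft. `text_to_tensor` is not defined in this notebook and nothing is returned yet.
    def generate_playlist_features(self, model, intro_tracks, predict_len=15):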
        hidden = model.init_hidden()
        
        # extracts features from intro tracks
        intro_input = text_to_tensor()
        
        # predicted playlist
        predicted = intro_tracks
        
        # build up hidden state
        for p in range(len(intro_tracks) - 1):
            _, hidden = model(intro_input[p], hidden)
        inp = intro_input[-1]
        
        for p in range(predict_len):
            output, hidden = model(inp, hidden)
            



In [21]:
pl = Playlist()

Getting Recommendations...






In [22]:
pl.recommended_track_ids.head()

Unnamed: 0,id,artists,name,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,uri,track_href,analysis_url,duration_ms,time_signature
0,4TUXZLGOSPiDibFtUsdXXB,[Garth Stevenson],Horizon,-1.416175,-2.037301,6,-2.010031,0,-0.594845,1.867431,0.987556,-0.443074,-0.19722,100.335,audio_features,spotify:track:4TUXZLGOSPiDibFtUsdXXB,https://api.spotify.com/v1/tracks/4TUXZLGOSPiD...,https://api.spotify.com/v1/audio-analysis/4TUX...,239368,4
1,6lNUewdE3ZY4vUMxXpHtIC,[Deaf Center],White Lake,-2.076707,-2.348309,10,-1.917698,0,-0.51222,1.928197,0.717568,-0.458889,-0.785316,85.185,audio_features,spotify:track:6lNUewdE3ZY4vUMxXpHtIC,https://api.spotify.com/v1/tracks/6lNUewdE3ZY4...,https://api.spotify.com/v1/audio-analysis/6lNU...,395440,3
2,5q4HX5dFrepBt1T4Kjuw6p,[LUCHS],Red Gold Yesterday,-1.332209,-1.603012,10,-2.033062,1,-0.496588,1.818819,0.639185,-0.517631,-0.523,95.95,audio_features,spotify:track:5q4HX5dFrepBt1T4Kjuw6p,https://api.spotify.com/v1/tracks/5q4HX5dFrepB...,https://api.spotify.com/v1/audio-analysis/5q4H...,138575,3
3,7CBRVxkndOKeWUEfRclXNK,[Peter Sandberg],Borrowed Peace,-2.177466,-2.528095,8,-2.794336,1,-0.427362,2.137838,0.804661,-0.437802,-1.093326,70.862,audio_features,spotify:track:7CBRVxkndOKeWUEfRclXNK,https://api.spotify.com/v1/tracks/7CBRVxkndOKe...,https://api.spotify.com/v1/audio-analysis/7CBR...,105974,4
4,3hrr00VycUcp9S0R2ojBFq,[Sophie Hutchings],Grace,-1.455359,-2.600944,1,-2.489701,0,-0.431828,2.1348,0.970138,-0.113966,-0.328378,73.218,audio_features,spotify:track:3hrr00VycUcp9S0R2ojBFq,https://api.spotify.com/v1/tracks/3hrr00VycUcp...,https://api.spotify.com/v1/audio-analysis/3hrr...,131361,3


In [23]:
import random

recommended = pd.DataFrame()

# Iterate the tracks of one randomly chosen WMW volume
seed_tracks = track_data[track_data['volume'] == random.randint(1, 38)]
for _, row in tqdm(seed_tracks.iterrows(), total=seed_tracks.shape[0]):
    song_search = row['track_name'].partition('-')[0] + ' ' + row['artist_name']
    try:

        # Query Spotify to get track metadata
        song_res = sp.search(song_search, limit=1)['tracks']['items'][0]

        # Gather recommendations for each of the past WMW tracks
        results = sp.recommendations(seed_tracks = [song_res['id']], limit=10)

        for r in results['tracks']:
            track={}
            track['id'] = r['id']
            track['artists'] = [i['name'] for i in r['artists']],
            track['name'] = r['name']
            track_features = sp.audio_features(r['id'])[0]
            track.update(track_features)
            final_track = pd.DataFrame(track, index=[0])
            recommended = recommended.append(final_track, ignore_index=True)

    except:
        print("Song not searchable")
          





In [24]:
recommended[feature_list] = std_scaler.transform(recommended[feature_list])

#TODO: Make sure new playlist has unique songs compared to the all previous WMWs
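
One way to handle the TODO above is to filter the recommendations by track `id` against everything used in earlier volumes. The sketch below uses toy frames standing in for `track_data` and `recommended`; the ids and names are made up for illustration.

```python
import pandas as pd

# Toy stand-ins for track_data (past volumes) and recommended (new candidates).
past_tracks = pd.DataFrame({'id': ['a1', 'b2', 'c3']})
candidates = pd.DataFrame({
    'id':   ['b2', 'd4', 'd4', 'e5'],
    'name': ['Old song', 'New song', 'New song', 'Another new song'],
})

fresh = (
    candidates[~candidates['id'].isin(past_tracks['id'])]  # drop tracks used before
    .drop_duplicates(subset='id')                          # drop repeated recommendations
    .reset_index(drop=True)
)
print(fresh)
```

Resetting the index matters here because the prediction loop later selects rows by position.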

In [25]:
recommended[feature_list].head()

Unnamed: 0,danceability,energy,loudness,speechiness,acousticness,instrumentalness,liveness,valence
0,-1.488946,-0.827828,-0.731819,-0.195119,0.448554,0.45629,0.149621,-1.171598
1,0.02244,-0.449576,0.727708,-0.655139,1.302311,0.563704,-0.324836,-0.56954
2,0.279936,-0.505613,-1.057601,-0.344737,-0.405202,0.824983,3.568722,-1.069633
3,0.431074,-0.477595,-0.024144,-0.355902,-0.216829,-0.060461,0.465925,0.59396
4,0.190372,-0.328162,-0.382169,-0.407264,0.9681,0.78434,-0.583151,-0.903781
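
The values above are standardized features. Assuming `standard_features.pkl` holds a fitted scikit-learn `StandardScaler` (as the comment in `Playlist.__init__` suggests), `transform` subtracts the per-feature mean learned at fit time and divides by the per-feature standard deviation. A minimal NumPy sketch of that computation on made-up numbers:

```python
import numpy as np

# Toy training features: rows are tracks, columns are two audio features.
train = np.array([[0.2, -30.0],
                  [0.6, -10.0],
                  [0.7,  -5.0]])

mean = train.mean(axis=0)
std = train.std(axis=0)          # population std (ddof=0), as StandardScaler uses


def standardize(x):
    """Scale features with the statistics learned from the training data."""
    return (x - mean) / std


scaled = standardize(train)
print(scaled.mean(axis=0))       # approximately 0 for each column
print(scaled.std(axis=0))        # approximately 1 for each column
```

New tracks must be scaled with the same stored mean and standard deviation, which is why the notebook loads the saved scaler instead of fitting a new one on the recommendations.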


In [223]:
def harmonic_match(key, mode):
    
    # Harmonic Mixing Wheel: Pitch Class 
    # 1A 0 - A flat minor: 8
    # 1B 0 - B major: 11
    # 2A 1 - E flat minor: 3
    # 2B 1 - F-sharp major: 6
    # 3A 2 - B-flat minor: 10
    # 3B 2 - D-flat major: 1
    # 4A 3 - F minor: 5
    # 4B 3 - A-flat major: 8
    # 5A 4 - C minor: 0
    # 5B 4 - E-flat major: 3
    # 6A 5 - G minor: 7
    # 6B 5 - B-flat major: 10
    # 7A 6 - D minor: 2
    # 7B 6 - F major: 5
    # 8A 7 - A minor: 9
    # 8B 7 - C major: 0
    # 9A 8 - E minor: 4
    # 9B 8 - G major: 7
    # 10A 9 - B minor: 11
    # 10B 9 - D major: 2
    # 11A 10 - F sharp minor: 6
    # 11B 10 - A major: 9
    # 12A 11 - D flat minor: 1
    # 12B 11 - E major: 4
    
    # Harmonic keys mapped to corresponding pitch classes
    pitch_to_harmonic_keys = {0: [4, 7], 1: [11, 2], 2: [6, 9],
                              3: [1, 4], 4: [8, 11], 5: [3, 6],
                              6: [10, 1], 7: [5, 8], 8: [0, 3],  # A-flat minor is 1A (code 0)
                              9: [7, 10], 10: [2, 5], 11: [9, 0]}
    
    # Extract values and keys
    dv = np.array(list(pitch_to_harmonic_keys.values()))
    dk = np.array(list(pitch_to_harmonic_keys.keys()))

    # Harmonic key code corresponding song pitch class
    harm_key = dv[np.where(dk == key)][0][mode]
    
    print("Key", key,"Harmonic keycode: ", harm_key)
    
    # Harmonic key codes
    harmonic_keys = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
    
    # Get compatible key codes
    comp_keycodes = np.take(harmonic_keys, 
                            [harm_key - 1, harm_key, harm_key + 1],
                            mode='wrap')

    print("Compatible keycodes:", comp_keycodes)
    
    # Compatible keys
    comp_keys = [np.where(dv[:, mode] == i)[0][0].tolist() for i in comp_keycodes]
    
    # Compatible up/down key
    inner_outer_key = np.array([np.where(dv[:, int(not bool(mode))] == harm_key)[0][0]])
    
    comp_keys = np.concatenate([comp_keys, inner_outer_key])
    
    print("Compatible keys:", comp_keys)
    
    return comp_keys, inner_outer_key
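
The table in the comments of `harmonic_match` follows a regular pattern, so it can also be written in closed form. The sketch below is an alternative formulation, not the notebook's implementation: the function names are hypothetical, `mode` follows Spotify's convention (0 = minor, 1 = major), and the offsets 4 and 7 are read off the comment table (C minor is code 4, C major is code 7).

```python
def wheel_code(key, mode):
    """Wheel position 0-11 for a pitch class (0-11) and mode (0 = minor, 1 = major)."""
    offset = 7 if mode == 1 else 4
    return (7 * key + offset) % 12


def key_from_code(code, mode):
    """Inverse of wheel_code: 7 is its own inverse modulo 12 (7 * 7 = 49 = 4 * 12 + 1)."""
    offset = 7 if mode == 1 else 4
    return (7 * (code - offset)) % 12


def compatible_keys(key, mode):
    """Return (same-mode neighbours on the wheel, relative key in the other mode)."""
    code = wheel_code(key, mode)
    same_mode = [key_from_code((code + step) % 12, mode) for step in (-1, 0, 1)]
    relative = key_from_code(code, 1 - mode)
    return same_mode, relative


print(compatible_keys(0, 1))   # C major
```

Moving one step around the wheel corresponds to a perfect fifth (seven semitones), which is where the factor 7 comes from.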

In [216]:
from unittest.mock import MagicMock, patch

def _print_success_message():
    print('Tests Passed!')

def test_harmonic_mixing():
    
    # Expected result for C major (key 0, mode 1)
    truth_keys = [5, 0, 7, 9]
    truth_inner_outer = [9]
    
    next_keys, inner_outer_key = harmonic_match(0, 1)
    
    # check number and values of compatible keys
    assert len(truth_keys) == len(next_keys), \
        'Number of compatible keys incorrect, should get: ' + str(len(truth_keys))
    assert list(next_keys) == truth_keys, \
        'Compatible keys incorrect, got: {}'.format(list(next_keys))
    assert list(inner_outer_key) == truth_inner_outer, \
        'Inner/outer key incorrect, got: {}'.format(list(inner_outer_key))
    
    #TODO: Add more tests
    
    _print_success_message()

In [27]:
# Look at a track
recommended.iloc[142]

id                                             4uJXr3G8CR8dkjmUKj7iKS
artists                                                [Eelke Kleijn]
name                                               Moments Of Clarity
danceability                                               -0.0223414
energy                                                       0.666502
key                                                                10
loudness                                                     0.791776
mode                                                                0
speechiness                                                 -0.603777
acousticness                                                -0.862464
instrumentalness                                             0.633379
liveness                                                      2.26585
valence                                                     -0.201451
tempo                                                         121.999
type                

In [232]:

feature_list =  ['danceability','energy', 'loudness', 'speechiness', 'acousticness',
                 'instrumentalness', 'liveness', 'valence']

std_scaler = joblib.load('standard_features.pkl')

def predict_playlist(model, initial_songs=[], predict_len=15):
    global recommended
    
    intro_tracks = pd.DataFrame() #list of track ids straight from spotify
    
    model.eval()
    
    # Look up each initial song on Spotify
    for song in tqdm(initial_songs, total=len(initial_songs)):
                
        song_search = song
                
        try:
            
            # Query Spotify to get track metadata
            song_res = sp.search(song_search, limit=1)['tracks']['items'][0]
            
            track = {
                'id': song_res['id'],
                'artists': [i['name'] for i in song_res['artists']],
                'name': song_res['name']
            }
            
            track_features = sp.audio_features(track['id'])[0]
            
            track.update(track_features)
            
            final_track = pd.DataFrame(track, index=[0])
            
            intro_tracks = intro_tracks.append(final_track, ignore_index=True)
                
        except:
            print("Song not searchable")

    intro_tracks[feature_list] = std_scaler.transform(intro_tracks[feature_list])
            
    predicted = intro_tracks

    inp = torch.FloatTensor(intro_tracks[feature_list].values)
    
    print("Intro", inp)

    sample = recommended.copy()

    for p in range(predict_len):
        output = model(inp).detach().numpy()
        keys, outer_inner_key = harmonic_match(predicted.iloc[-1]['key'], predicted.iloc[-1]['mode'])
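        # NOTE: harmonic_next_songs is computed but not yet used; the nearest track is still picked from the full sample
        harmonic_next_songs = sample[sample['key'].isin(keys) & (sample['mode'] == predicted.iloc[-1]['mode'])]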
        next_song_id = np.argmin(cdist(output, sample[feature_list]))
        next_song = sample.iloc[next_song_id].copy()
        inp = torch.FloatTensor([next_song[feature_list]])
        print("New song", next_song['artists'], next_song['name'])
        predicted = predicted.append(next_song, ignore_index=True)
        sample = sample.drop([next_song_id], axis=0).reset_index(drop=True)
        
    return predicted
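
The selection step inside the loop is a nearest-neighbour lookup: the model predicts a feature vector, and the closest remaining candidate becomes the next track. The sketch below shows that step in isolation with NumPy, plus one possible way to restrict the search to harmonically compatible tracks. The function name, the fallback to all candidates, and the toy data are assumptions, not the notebook's code.

```python
import numpy as np


def pick_next_track(predicted, candidates, allowed=None):
    """Return the row index of the candidate closest to `predicted`.

    predicted  - 1-D array of model output features
    candidates - 2-D array, one row of scaled features per candidate track
    allowed    - optional boolean mask; rows marked False are never selected
    """
    dists = np.linalg.norm(candidates - predicted, axis=1)  # Euclidean, cdist's default metric
    if allowed is not None and allowed.any():
        dists = np.where(allowed, dists, np.inf)  # exclude filtered-out rows
    return int(np.argmin(dists))


candidates = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
predicted = np.array([0.9, 1.2])
keys = np.array([3, 5, 8])        # hypothetical pitch classes per candidate
mask = np.isin(keys, [3, 8])      # pretend only keys 3 and 8 are compatible

print(pick_next_track(predicted, candidates))        # nearest overall
print(pick_next_track(predicted, candidates, mask))  # nearest among allowed rows
```

If no candidate passes the mask, the sketch falls back to the unrestricted search so the playlist can still be completed.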

In [233]:
import numpy as np

initial_songs = ['luke howard portrait gallery']

predict_playlist(model, initial_songs=initial_songs, predict_len=14)



Intro tensor([[-1.7968, -2.6588, -4.1506, -0.4095,  2.1287,  0.7989, -0.4152, -0.9601]])
Key 1 Harmonic keycode:  2
Compatible keycodes: [1 2 3]
Compatible keys: [ 6  1  8 10]
New song ['Monolink', 'Tale Of Us'] Swallow - Tale Of Us Remix
Key 6 Harmonic keycode:  1
Compatible keycodes: [0 1 2]
Compatible keys: [11  6  1  3]
New song ['Christian Löffler'] Ry - Edit
Key 7 Harmonic keycode:  8
Compatible keycodes: [7 8 9]
Compatible keys: [0 7 2 4]
New song ['Rival Consoles'] Helios
Key 9 Harmonic keycode:  10
Compatible keycodes: [ 9 10 11]
Compatible keys: [2 9 4 6]
New song ['Joris Voorn'] District Seven
Key 1 Harmonic keycode:  2
Compatible keycodes: [1 2 3]
Compatible keys: [ 6  1  8 10]
New song ['Jesper Ryom'] Ada
Key 8 Harmonic keycode:  3
Compatible keycodes: [2 3 4]
Compatible keys: [1 8 3 5]
New song ['Apparat', 'Solomun'] OUTLIER - Solomun Remix
Key 1 Harmonic keycode:  2
Compatible keycodes: [1 2 3]
Compatible keys: [ 6  1  8 10]
New song ['Moderat'] Invaluable Waste from th

Unnamed: 0,id,artists,name,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,uri,track_href,analysis_url,duration_ms,time_signature
0,3oRuCzqdduzZ2CBhAtv8zO,Luke Howard,Portrait Gallery,-1.79682,-2.658849,1,-4.150644,1,-0.409497,2.128723,0.798855,-0.415209,-0.960053,127.536,audio_features,spotify:track:3oRuCzqdduzZ2CBhAtv8zO,https://api.spotify.com/v1/tracks/3oRuCzqdduzZ...,https://api.spotify.com/v1/audio-analysis/3oRu...,367667,4
1,476iNCuZzxV3AXQeRlF1qv,"[Monolink, Tale Of Us]",Swallow - Tale Of Us Remix,0.643788,0.143486,6,0.197999,1,0.066155,0.044463,0.360488,-0.483741,-0.594925,124.006,audio_features,spotify:track:476iNCuZzxV3AXQeRlF1qv,https://api.spotify.com/v1/tracks/476iNCuZzxV3...,https://api.spotify.com/v1/audio-analysis/476i...,445151,3
2,3Tc6x9waprUmAwaOjEIWEa,[Christian Löffler],Ry - Edit,0.526236,0.157496,7,-0.142858,1,-0.199585,-0.107451,0.877239,-0.385085,-0.209913,123.012,audio_features,spotify:track:3Tc6x9waprUmAwaOjEIWEa,https://api.spotify.com/v1/tracks/3Tc6x9waprUm...,https://api.spotify.com/v1/audio-analysis/3Tc6...,220011,3
3,7GDGodzWVL6nCOBAhZ39rf,[Rival Consoles],Helios,0.531833,0.250891,9,0.483581,1,-0.063365,-0.858757,0.610154,-0.600473,-0.506076,110.011,audio_features,spotify:track:7GDGodzWVL6nCOBAhZ39rf,https://api.spotify.com/v1/tracks/7GDGodzWVL6n...,https://api.spotify.com/v1/audio-analysis/7GDG...,301875,4
4,6GfE7mDQb4Bak9uQx92Qrk,[Joris Voorn],District Seven,0.93487,0.227542,1,0.470181,1,0.014793,-0.866717,0.798855,-0.533446,-0.074524,126.022,audio_features,spotify:track:6GfE7mDQb4Bak9uQx92Qrk,https://api.spotify.com/v1/tracks/6GfE7mDQb4Ba...,https://api.spotify.com/v1/audio-analysis/6GfE...,395702,4
5,6BaR3Eu4Xnytfru01TxC3p,[Jesper Ryom],Ada,0.492649,0.619804,8,0.368218,1,0.296165,-0.825396,0.764018,-0.083842,-0.374918,124.019,audio_features,spotify:track:6BaR3Eu4Xnytfru01TxC3p,https://api.spotify.com/v1/tracks/6BaR3Eu4Xnyt...,https://api.spotify.com/v1/audio-analysis/6BaR...,367590,4
6,3QlGFEiHMujrKmcnFpMLK8,"[Apparat, Solomun]",OUTLIER - Solomun Remix,0.705363,0.092119,1,-0.264503,1,0.04829,-0.716322,0.618863,-0.241994,-0.692236,122.002,audio_features,spotify:track:3QlGFEiHMujrKmcnFpMLK8,https://api.spotify.com/v1/tracks/3QlGFEiHMujr...,https://api.spotify.com/v1/audio-analysis/3QlG...,473509,4
7,0SpAYfKBDclNahOhi33FBN,[Moderat],"Invaluable Waste from the Outlying Districts, ...",-0.274239,0.70386,11,-0.237703,0,-0.235315,-0.511542,0.607251,-0.538718,-0.544154,122.997,audio_features,spotify:track:0SpAYfKBDclNahOhi33FBN,https://api.spotify.com/v1/tracks/0SpAYfKBDclN...,https://api.spotify.com/v1/audio-analysis/0SpA...,267368,4
8,0p81XQP5fAeyxzzVMQKYp5,"[Toro y Moi, Channel Tres]",Who I Am (Channel Tres Remix),0.682972,-0.440236,7,0.131628,1,-0.378234,-0.749743,0.302426,-0.362491,-0.923244,120.004,audio_features,spotify:track:0p81XQP5fAeyxzzVMQKYp5,https://api.spotify.com/v1/tracks/0p81XQP5fAey...,https://api.spotify.com/v1/audio-analysis/0p81...,247836,4
9,1eMY7YCOBHg2RfabIAmX5V,[Yotto],Shifter,0.26874,1.423006,6,0.494887,0,-0.141524,-0.831169,0.630476,-0.492026,-0.129525,122.993,audio_features,spotify:track:1eMY7YCOBHg2RfabIAmX5V,https://api.spotify.com/v1/tracks/1eMY7YCOBHg2...,https://api.spotify.com/v1/audio-analysis/1eMY...,226600,4


In [224]:
harmonic_match(0, 1)

Key 0 Harmonic keycode:  7
Compatible keycodes: [6 7 8]
Compatible keys: [5 0 7 9]


(array([5, 0, 7, 9], dtype=int64), array([9], dtype=int64))

array([12,  0,  1])

In [53]:
# row, column
x, y = 1, 1
print("Row:", x + 1, "Column:", y + 1)
print()
print("Neighbours:\n", harmonic_wheel[x-1:x+2, y-1:y+2])
print()
print("Uniques: \n", set(harmonic_wheel[x-1:x+2, y-1:y+2].ravel()))
print()

Row: 2 Column: 2

Neighbours:
 [[ 1  0  2]
 [ 0  1  2]
 [12 12  0]]

Uniques: 
 {0, 1, 2, 12}



In [144]:
a,b 

((0, 5), (1, 4))

In [11]:
# # Training function
# def train_rnn(model, train_loader, epochs, criterion, optimizer, device):
#     """
#     This is the training method that is called by the PyTorch training script. The parameters
#     passed are as follows:
#     model        - The PyTorch model that we wish to train.
#     train_loader - The PyTorch DataLoader that should be used during training.
#     epochs       - The total number of epochs to train for.
#     criterion    - The loss function used for training. 
#     optimizer    - The optimizer to use during training.
#     device       - Where the model and data should be loaded (gpu or cpu).
#     """
    
#     # training loop is provided
#     for epoch in range(1, epochs + 1):
#         model.train() # Make sure that the model is in training mode.

#         total_loss = 0
        
#         hidden = model.initHidden()

#         for batch in train_loader:
            
#             # get data
#             batch_x, batch_y = batch
            
#             # 
#             batch_x = torch.from_numpy(batch_x).float().squeeze()
#             batch_y = torch.from_numpy(batch_y).float()

#             batch_x = batch_x.to(device)
#             batch_y = batch_y.to(device)

#             optimizer.zero_grad()

#             y_pred = []
            
#             # get predictions
#             for x in batch_x:
#                 y, hidden = model(x, hidden)
#                 y_pred.append(y)
            
#             # perform backprop
#             loss = criterion(y_pred, batch_y)
#             loss.backward()
#             optimizer.step()
            
#             total_loss += loss.data.item()
            
#         if epoch%25 == 1:
#             print("Epoch: {}, Loss: {}".format(epoch, total_loss / len(train_loader)))

#TODO: Create working RNN Benchmark model

In [18]:
# import torch.optim as optim
# from model.RnnEstimator import RNNEstimator

# device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# model = RNNEstimator(11, 30, 8)
# optimizer = optim.Adam(model.parameters(), lr=0.001)
# loss_fn = torch.nn.L1Loss()

# train_rnn(model, processed_data, 100, loss_fn, optimizer, device)

### Build and Train the PyTorch Model with Hyperparameter Tuning

In [15]:
# Estimator code
from sagemaker.pytorch import PyTorch
output_path = 's3://{}/{}'.format(bucket, prefix)

estimator = PyTorch(entry_point="LSTM_Train.py",
                    source_dir="model",
                    role=role,
                    framework_version='0.4.0',
                    train_instance_count=1,
                    output_path = output_path,
                    train_instance_type='ml.m4.xlarge',
                    hyperparameters={
                        'input_features': 11,
                        'hidden_dim': 12,
                        'output_dim': 8,
                        'epochs': 100
                    })

In [16]:
# Fit estimator
estimator.fit({'train': input_data})

2020-03-05 03:44:03 Starting - Starting the training job...
2020-03-05 03:44:04 Starting - Launching requested ML instances.........
2020-03-05 03:45:34 Starting - Preparing the instances for training.........
2020-03-05 03:47:17 Downloading - Downloading input data
2020-03-05 03:47:17 Training - Downloading the training image..
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2020-03-05 03:47:37,157 sagemaker-containers INFO     Imported framework sagemaker_pytorch_container.training
2020-03-05 03:47:37,160 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
2020-03-05 03:47:37,172 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2020-03-05 03:47:37,176 sagemaker_pytorch_container.training INFO     Invoking user training script.
2020-03-05 03:47:37,390 sagemaker-containers INFO     Module LSTM_Train doe

In [17]:
%%time

# deploy your model to create a predictor
predictor = estimator.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')

-----------!
CPU times: user 254 ms, sys: 15.1 ms, total: 270 ms
Wall time: 5min 32s
