# Spotify Recommendation Model
This notebook includes code to:
1. Import required toolboxes
2. Connect to a Spotify dev account
3. Organize and pre-process data from provided playlists
4. Build and train a neural network regression model
5. Generate song recommendations and filter them through the model
6. Upload model-filtered song recommendations to a provided Spotify playlist

To run this notebook you will need a Spotify dev account.  To create one, go to this link and set up an account: https://developer.spotify.com/dashboard/

> Once you have a dev account, go to the dashboard tab and create an app.  Next, navigate to your app and locate your Client ID and Client Secret.  Copy these two values into the notebook code cell below labeled "Authorization".  Next, go to your Spotify dev project, click "Edit Settings", and under the "Redirect URL" section enter your desired redirect URL (e.g. https://www.google.com/) for authorization and select save.  Place your redirect URL into the "Authorization" notebook code cell below.

If you are running this code on Google Colab, use Chrome as the web browser.  You should also change the runtime type to improve model building performance:
1. Select Runtime at the top, then select "Change runtime type"
2. Under the "hardware accelerator" tab, select GPU in the drop-down
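
After switching the runtime, a quick check along these lines can confirm that TensorFlow sees the GPU.  This is a minimal sketch: `gpu_available` is a hypothetical helper name that is not part of this notebook, and it simply reports `False` if TensorFlow has not been installed yet.

```python
def gpu_available():
    """Return True if TensorFlow can see at least one GPU device."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is installed in the import cell; treat "missing" as no GPU
        return False
    return len(tf.config.list_physical_devices("GPU")) > 0


print("GPU detected:", gpu_available())
```

If this prints `False` on Colab, the runtime type was most likely not saved as GPU.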



# Imports

In [None]:
# General Toolboxes
import numpy as np
from numpy import argmax
import pandas as pd
from collections import Counter
from datetime import datetime
import os
import itertools
import math


# Plotting
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
!pip install ipyplot
!pip install jupyter-dash
from jupyter_dash import JupyterDash
from dash import dcc
from dash import html


# Neural Network
!pip install tensorflow
import tensorflow as tf
from tensorflow.keras.activations import relu, sigmoid, softmax, tanh, selu, elu
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization, Dropout
from tensorflow.keras.callbacks import EarlyStopping, TensorBoard
from tensorflow.keras.constraints import MaxNorm
from tensorflow.keras.regularizers import l2, l1
from tensorboard.plugins.hparams import api as hp
!pip install tensorflow_addons
import tensorflow_addons as tfa

# Spotify
!pip install spotipy
import spotipy
import spotipy.util as util

# Sklearn
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.utils.class_weight import compute_class_weight
from sklearn.utils import class_weight
!pip install scikit-optimize
import skopt
from skopt import BayesSearchCV
from skopt.space import Real, Categorical, Integer
!pip install scikeras[tensorflow]
from sklearn.metrics import make_scorer
from scikeras.wrappers import KerasClassifier, KerasRegressor
from sklearn.metrics import fbeta_score

# Custom metric imports
from tensorflow.keras import backend as K
from typeguard import typechecked
from tensorflow_addons.utils.types import AcceptableDTypes, FloatTensorLike
from typing import Optional

import warnings 
warnings.filterwarnings("ignore")



# Restart the runtime!
Click on "Runtime" in the menu bar, and select "Restart runtime" from the dropdown menu.<br>
Next, re-run the first import cell above.

* The spotipy toolbox requires the runtime to be restarted.

# Authorization
To authenticate your connection to Spotify, click on the link after running this code, and paste the redirected URL into the box.
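
Rather than pasting the Client ID and Client Secret directly into the cell, one option is to read them from environment variables so they are not saved with the notebook.  The following is a minimal sketch, assuming the variables are named `SPOTIPY_CLIENT_ID`, `SPOTIPY_CLIENT_SECRET`, and `SPOTIPY_REDIRECT_URI`; `load_spotify_credentials` is a hypothetical helper and the values shown are placeholders.

```python
import os


def load_spotify_credentials(env=os.environ):
    """Collect the Spotify app credentials from environment variables."""
    names = ["SPOTIPY_CLIENT_ID", "SPOTIPY_CLIENT_SECRET", "SPOTIPY_REDIRECT_URI"]
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in names}


# Example with placeholder values (replace with your own app's settings)
demo_env = {
    "SPOTIPY_CLIENT_ID": "your-client-id",
    "SPOTIPY_CLIENT_SECRET": "your-client-secret",
    "SPOTIPY_REDIRECT_URI": "https://www.google.com/",
}
creds = load_spotify_credentials(demo_env)
print(sorted(creds))
```

The returned values could then be passed to `spotipy.SpotifyOAuth` in place of hard-coded strings.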

In [None]:
# Authorization Token
SPOTIPY_CLIENT_ID = '6e4d9e1d2a1a4656af7cb208cd2a2476'
SPOTIPY_CLIENT_SECRET = '37a386970f4f48d4a69b5e56a852a488'
SPOTIPY_REDIRECT_URI = 'https://www.google.com/'
SCOPE = "playlist-modify-public"
sp = spotipy.Spotify(
    auth_manager=spotipy.SpotifyOAuth(
        client_id=SPOTIPY_CLIENT_ID,
        client_secret=SPOTIPY_CLIENT_SECRET,
        redirect_uri=SPOTIPY_REDIRECT_URI,
        scope=SCOPE, open_browser=False,))
# Make a first API call so that the authorization prompt is triggered
form_conn = sp.artist('spotify:artist:3jOstUTkEu2JkjvRdBA5Gu')

Go to the following URL: https://accounts.spotify.com/authorize?client_id=6e4d9e1d2a1a4656af7cb208cd2a2476&response_type=code&redirect_uri=https%3A%2F%2Fwww.google.com%2F&scope=playlist-modify-public
Enter the URL you were redirected to: https://www.google.com/?code=AQCV_d1pN6pxDoK6aBg5lakmZQ9dnhJ5K0yj4e_du4gr67xHj9Aey7ewQjCQvuQ7m7j3A7xzGLUpj_EvLXgxxNQM0TMpIQm3RLAx9d-aslbvx91q9EM7rd4LVHZxC4idI_ZO-8d9ZMcVTl5jNGmM6kJ1lfGLm3hXkhGvpp78jwCgia2--eIA4IJrcYPGEDbFTAfj


## Playlist links:
Your playlists will need to be made public to be accessed.<br>
To get the playlist URI:
1. Open Spotify and navigate to your playlist.
2. Select the three dots under the playlist and navigate to the share button.
3. Next to the share button, mouse over "Copy Link to Playlist" and press Control or Command on your keyboard.  "Copy Link to Playlist" will change to "Copy Spotify URI"; click on "Copy Spotify URI".  This URI will be persistent, while the default "Copy Link to Playlist" link will change periodically.

To make your playlist public:
1. Click on the three dots next to the playlist and select "Add to profile".


Copy your playlist links into the playlists dictionary below.  Put the name of the playlist before the colon and the playlist link after it.  You need to input at least 2 playlists, one for bad songs and one for good songs.  You can add as many additional playlists as you like, but order them from least liked to most liked and make sure the last playlist in the curly braces is the playlist with your good songs, since the model will use this playlist to generate song recommendations.  <br>
> playlists = {'playlist1 name': 'playlist1 link',   'playlist2 name': 'playlist2 link'}<br>
  PL_OUT = playlist to output the model's recommended songs<br>

Make sure the last item in the curly braces doesn't have a comma after it.
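
As an illustration of the two points above (URI format and playlist order), here is a minimal sketch.  `playlist_link_to_uri` is a hypothetical helper that is not part of this notebook, and the playlist IDs are made up; it assumes share links have the form `https://open.spotify.com/playlist/<id>` and URIs the form `spotify:playlist:<id>`.

```python
import re


def playlist_link_to_uri(link):
    """Convert an open.spotify.com playlist link (or a URI) to a Spotify URI."""
    match = re.search(r"playlist[/:]([A-Za-z0-9]+)", link)
    if match is None:
        raise ValueError(f"Not a playlist link or URI: {link!r}")
    return f"spotify:playlist:{match.group(1)}"


# Hypothetical playlist IDs, ordered from least to most liked
example_playlists = {
    "Bad Songs": playlist_link_to_uri("https://open.spotify.com/playlist/aaa111?si=xyz"),
    "Favorite Songs": playlist_link_to_uri("spotify:playlist:bbb222"),
}
# Each playlist's position becomes its rating: 0 for the first, 1 for the next, ...
example_scores = dict(zip(example_playlists, range(len(example_playlists))))
print(example_scores)  # {'Bad Songs': 0, 'Favorite Songs': 1}
```

This mirrors how the notebook assigns ratings from the dictionary order, which is why the good playlist has to come last.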

In [None]:
# List of input playlists
playlists = {
             'Bad Songs':       'spotify:playlist:0QJcJIRK1WuQ5morF8DgPg', # Bad playlist here
             'Liked Songs':     'spotify:playlist:0eecKnQXwZ1K8rqu6kuBPH', # ok playlist here
             'Favorite Songs':  'spotify:playlist:0tAdEQcNeMHng8oEjkRhN9' # good playlist here       
}
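# Output playlist
PL_OUT = 'spotify:playlist:6wkZ5BiIx6uR3RbHbH7gvc'
# Rating per playlist, in dictionary order: 0 for the first playlist up to
# len(playlists) - 1 for the last (most liked) playlist
playlist_scores = range(len(playlists))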

# Functions
Run this cell to define all of the functions used to load, process, and model the data and to predict songs.
* This cell only defines the functions; none of them are called here.

In [None]:
def chunks(lst, n):
    """Yield successive n-sized chunks from lst
    Inputs:
      lst (list): list of items to be split
      n (int): maximum number of items in each chunk
    Output:
      generator of lists: consecutive slices of lst, each of length n except
      possibly the last, which holds the remainder
    """
    for i in range(0, len(lst), n):
        yield lst[i:i + n]
    

def normalize_data(data, use_max=None, sav_max=False):
    ''' Normalizes a list of data to be values from 0-1.  Uses a max value if 
        provided
    Input:
      data (array-like): data points to normalize
      use_max (float): if specified, uses provided max value instead of the data max
      sav_max (bool): if True, also return the max value that was used
    Output:
      norm_data (array): data normalized from 0 - 1
      data_max (float): maximum used for normalization (only if sav_max is True)
      '''
    # Compare against None so that a provided max of 0 is still honored
    if use_max is not None:
      data_max = use_max
    else:
      data_max = np.max(data)
    norm_data = (data - np.min(data)) / (data_max - np.min(data))
    if sav_max:
      return norm_data, data_max
    else:
      return norm_data


def extract_audio_features(df):
  '''Takes in a dataframe generated from extract_artist_info() and adds additional
     columns for audio features from spotify's api for each song in the dataframe
  Input:
    df (DataFrame): Dataframe generated from extract_artist_info
  Output:
    df (DataFrame): merges audio feature columns into input dataframe'''

  song_ids = list(df.song_id.values)
  # Chunk up song ids to lengths of 50
  song_chunks = list(chunks(song_ids, 50))
  song_list=[]
  for sample in song_chunks:
    result = sp.audio_features(sample)
    song_list.extend(result)
  # Loop through songs and store audio features in a list of lists
  feature_keys = ['acousticness', 'danceability', 'energy', 'instrumentalness',
                  'key', 'liveness', 'loudness', 'mode', 'speechiness', 'tempo',
                  'time_signature', 'valence']
  array_list = []
  for song in song_list:
    try:
      array_list.append([song[key] for key in feature_keys])
    except (TypeError, KeyError):
      # If audio features can't be extracted from a song (e.g. the API
      # returned None for it), add NaN's so the row is dropped later
      array_list.append([np.nan] * len(feature_keys))
  # Create audio feature dataframe and merge with input dataframe
  col_nams = ['acoust', 'dance', 'energy', 'instrument', 'key', 'live', 'loud',
              'mode', 'speech', 'tempo', 'time_sig', 'valence']
  df_songs = pd.DataFrame(data=array_list, columns=col_nams)
  df = pd.merge(df, df_songs, left_index=True, right_index=True)
  return df


def extract_artist_info(df):
  '''Takes a dataframe generated from extract_song_contents() and looks up all
     of the artist ids in the dataframe.  A lookup dataframe indexed by artist id
     is built to find each artist's popularity and genres
  Input:
    df (DataFrame): Dataframe generated from extract_song_contents
  Output:
    df (DataFrame): input dataframe with artist popularity and genre columns added'''

  # First flatten list of artist id's in df
  nested_artists = list(df.artist_id.values)
  artists = list(itertools.chain(*nested_artists))
  # Partition artists list for artist lookup
  artists_chunks = list(chunks(artists, 50))
  artist_list = []
  for sample in artists_chunks:
    result = sp.artists(sample)
    artist_list.extend(result['artists'])

  # create a dataframe of all artists with their id, name, popularity, and genres
  df_artists=[]
  for artist in artist_list:
    info = []
    artist_id = artist['id']
    artist_name = artist['name']
    artist_genres = artist['genres']
    artist_pop = artist['popularity']
    info.extend([artist_id, artist_name, artist_genres, artist_pop])
    df_artists.append(info)
  df_artists = pd.DataFrame(data=df_artists,
                            columns=['artist_id', 'artist_name', 'artist_genres',
                                     'artist_pop']).drop_duplicates(subset='artist_id')
  df_artists.set_index('artist_id', inplace=True)
  
  # Loop through df and look up each track's artists in df_artists
  artist_genre, artist_pop = [], []
  for row in df.artist_id:
    # Popularity comes from the first artist listed on the track
    artist_pop.append(df_artists.loc[row[0], 'artist_pop'])
    # Genres are pooled across every artist on the track
    multiple_art_genres = []
    for art_id in row:
      multiple_art_genres.extend(df_artists.loc[art_id, 'artist_genres'])
    artist_genre.append(multiple_art_genres)
  # add artist genres and popularity to df
  df['artist_pop'] = artist_pop
  df['genres'] = artist_genre

  return df


def extract_song_contents(raw_song_data, target=None, recommend_data=False):
  ''' Extracts basic song info from the raw song data collected in create_song_df().
      Runs extract_artist_info() and extract_audio_features() functions and merges 
      into a dataframe.
  Input: 
    raw_song_data (list): list of song dictionaries generated from create_song_df()
    target (list): list of int ratings, one per song, added as the rating column.  No rating column is added if not specified
    recommend_data (bool): specifies if raw_song_data was generated through spotify recommendation api, if so, adjusts how the data is indexed
  Output:
    data (DataFrame): dataframe with song info, audio features, and artist info for all songs'''

  data_lst = []
  # Iterate through each item in the list and add song contents to an array
  if target is not None:
    col_nams = ['song_id', 'song_nam', 'dur', 'pop', 'album_nam', 'album_type',
                'release', 'artist_id', 'artist_nam', 'rating']
  else:
    col_nams = ['song_id', 'song_nam', 'dur', 'pop', 'album_nam', 'album_type',
                'release', 'artist_id', 'artist_nam']
  for i, song in enumerate(raw_song_data):
    try:
      song_info=[]
      # Adjust indexing based on input type.  Recommend data has slighlty different structure
      if recommend_data:
        track = song
      else:
        track = song['track']

      # Get song general info
      song_id = track['id']
      song_nam = track['name']
      dur = track["duration_ms"]
      pop = track['popularity']
      album_nam = track['album']['name']
      album_type = track['album']['album_type']
      release = track['album']['release_date']
      artist_id = [artist['id'] for artist in track['artists']]
      artist_nam = [artist['name'] for artist in track['artists']]
      if target is not None:
        rating = target[i]
        song_info.extend([song_id, song_nam, dur, pop, album_nam, album_type,
                          release, artist_id, artist_nam, rating])
      else:
        song_info.extend([song_id, song_nam, dur, pop, album_nam, album_type,
                          release, artist_id, artist_nam])
      data_lst.append(song_info)
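    except (TypeError, KeyError, IndexError):
      # Skip entries with missing track data (e.g. removed or unavailable tracks)
      print('song id is broken')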
  # Convert array to dataframe
  data = pd.DataFrame(data=data_lst, columns=col_nams)
  # Get artist info
  data = extract_artist_info(data)
  # Get Song Audio features
  data = extract_audio_features(data)
  # Drop rows with null values
  data.dropna(inplace=True)

  return data


def create_song_df(playlists, playlist_scores):
  '''Creates a dataframe by pulling from each provided playlist uri.
     Target values will be assigned to each playlist based on its position,
     e.g. for three playlists: bad - 0, ok - 1, good - 2
  Inputs:
    playlists (dict): maps playlist names to spotify playlist URIs
    playlist_scores (iterable of int): rating assigned to each playlist, in the
      same order as playlists
  Output:
    song_info (DataFrame): dataframe containing all song data from each playlist
    which includes a rating column that specifies which playlist the song came
    from.  Utilizes the extract_song_contents() function to get song data
     '''
  # Initialize empty lists to store looked up songs from each playlist
  raw_song_data, target_lst = [], []
  # for playlist, target_var in zip([good_pl_uri, ok_pl_uri, bad_pl_uri], [2, 1, 0]):
  for playlist, target_var in zip(list(playlists.values()), playlist_scores):
    count=0
    # Pull raw song info from playlist
    while True:
      results = sp.playlist_tracks(playlist, limit=50, offset=count)
      data_len = len(results['items'])
      # Stop adding new songs once we reach the end of the playlist
      if data_len==0:
        break
      # add raw song list into our total list
      raw_song_data.extend(results['items'])
      target_lst.extend([target_var] * data_len)
      count += 50
  
  # extract key song details from raw_song_data
  song_info = extract_song_contents(raw_song_data, target_lst)
  return song_info


def rand_sel(source_list, length=None):
  '''Creates a random list of values from a sample list.  List length can be
     specified or randomly chosen.  For spotify's recommend function, the max
     seed length is 5.  Putting length above 5 will break the code.
  Inputs:
    source_list (list): list of items to be randomly queried from
    length (int): number of items to randomly select from source_list.  If not
      specified, a random length between 1 and 5 is used
  Outputs:
    output (list): randomly selected items (sampled with replacement) from
    source_list based on specified length'''
  if length is None:
    # Keep the random length within Spotify's limit of 1 to 5 seeds
    length = np.random.randint(1, min(len(source_list), 5) + 1)
  output = list(np.random.choice(source_list, (1,length))[0])
  return output


def encode_genres(df, col_nam, percen_split=90, cat_list=None, out_ohe_keys=False, split_strings=False):
  '''This function will one-hot-encode a specified column in a dataframe.
     Unique values for the specified column will be identified and the values
     at or above the percen_split percentile of occurrence counts (within each
     rating) will be used to one-hot-encode the column's values.
  
    Function can either write or load encoded lists for one-hot-encoding (ohe).
    Values in each column can also be split into individual words for encoding.
    Function will add one-hot-encoded columns to the input dataframe and remove
    the col_nam column from the dataframe.
  Inputs:
    df (DataFrame): Dataframe generated from create_song_df()
    col_nam (string): name of column in dataframe to be encoded
    percen_split (int): percentile value of most common values used for ohe
    cat_list (list): list of pre-specified one-hot-encoder columns to be used.
      If specified, function will use list for encoding
    out_ohe_keys (bool): if True, then output encoded values, else don't output
    split_strings (bool): if True, then split each value in the specified column
      and encode these split values
  Outputs:
    df_merged (DataFrame): Dataframe with ohe columns added and col_nam column
      removed
    ohe_keys (list): generated ohe value names if out_ohe_keys set to True
    '''
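  # Check if the function was given a pre-processed list of features to one-hot-encode.
  # If so, then assign the list to mlb's classes
  if cat_list:
    # Keep ohe_keys defined so that out_ohe_keys=True also works with cat_list
    ohe_keys = cat_list
    mlb = MultiLabelBinarizer(classes=cat_list)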
  else:
    # Loop through each song rating category and find ohe
    result_dictionary = {}
    # for rating in [2, 1, 0]:
    for rating in playlist_scores:
      ohe_list=[]
      df_flt = df[df['rating']==rating]
      # generate onehotencode column list from data
      for item in df_flt[col_nam].values:
        if split_strings:
          # split each string by whitespace
          temp_list=[]
          for string in item:
            temp_list.extend(string.split(' '))
          # remove duplicates from temp_list
          temp_list = list(set(temp_list))
          ohe_list.extend(temp_list)
        else:
          ohe_list.extend(item)

      # Create a dictionary of thresholded genre strings
      output = Counter(ohe_list)
      values = list(output.values())
      top_percen = np.percentile(values, percen_split)
      for key, value in output.items():
        if value >= top_percen:
          result_dictionary[key] = value

    ohe_keys = list(result_dictionary.keys())
    mlb = MultiLabelBinarizer(classes=ohe_keys)

  # Pad rows that have fewer than two entries with 'X' placeholders, then
  # optionally split each string in the row into individual words
  edited_col_data=[]
  for item in df[col_nam]:
    if len(item) <2:
      item.extend(['X', 'X'])
    if split_strings:
      # Split each string in list by whitespace
      temp_list=[]
      for string in item:
        temp_list.extend(string.split(' '))
      # Add that row's string list
      edited_col_data.append(temp_list)
    else:
      edited_col_data.append(item)


  df_new = pd.DataFrame(mlb.fit_transform(edited_col_data), columns=mlb.classes_, index=df.index)
  df_merged = pd.merge(df, df_new, left_index=True, right_index=True).drop(columns=col_nam)
  # If outputting the ohe_keys
  if out_ohe_keys:
    return df_merged, ohe_keys
  else:
    return df_merged
  

def get_date_float(input_str):
  '''Extracts the release year and month from a Spotify release date string.
     If the string can't be parsed, falls back to a default year (2021) and month (1).
  Input:
    input_str (string): release date string, either 'YYYY-MM-DD' or 'YYYY'
  Output:
    year (int): year specified in input_str
    month (int): month specified in input_str (1 if only the year is given)
    '''
  if len(input_str) > 5:
    try:
      time = datetime.strptime(input_str, '%Y-%m-%d')
      year = time.timetuple()[0]
      month = time.timetuple()[1]
    except ValueError:
      year=2021
      month=1
  else:
    try:
      time = datetime.strptime(input_str, '%Y')
      year = time.timetuple()[0]
      month = 1
    except ValueError:
      year = 2021
      month = 1
  return year, month


def man_ord_encode(item):
  '''Manually encodes the album type column with values between 0 and 1
  Input:
    item (string): album string name
  Output:
    out (float): value based on album string name
    '''
  if item == 'single':
    out = 0.33
  elif item == 'album':
    out = 0.66
  elif item == 'compilation':
    out=0.99
  else:
    out=0
  return out


def wrangle(df, load_enc_lists=False,  genre_list=None, artist_list=None,
            g_split=80, a_split=99.9, max_norm_list=None):
  '''Performs dataset preprocessing by encoding artists and genres, translating
     album type to float, dropping irrelevant columns, and manually scaling
    columns.
  Inputs:
    df (DataFrame): dataframe generated from create_song_df()
    load_enc_lists (bool): if True, use ohe value lists to encode genres and
      artist names
    genre_list (list): list of ohe values for song genres
    artist_list (list): list of ohe values for artists
    g_split (int): percentile value to be used for song genre encoding in
      encode_genres()
    a_split (float): percentile value to be used for artist name encoding in
      encode_genres()
    max_norm_list (list): provided list of maximum values for columns that need
      to be normalized
  Outputs:
    df (DataFrame): input dataframe with preprocessing steps performed
    g_keys (list): list of encoded values for genres if load_enc_lists is False
    a_keys (list): list of encoded values for artists if load_enc_lists is False
    max_norms (list): list of maximum values from normalized columns if
      load_enc_lists is False
      '''
  # Format date column
  output = df.release.apply(get_date_float)
  year, month = list(map(list, zip(*output)))
  # Fractional year: the month contributes (month - 1) / 12 of a year
  df['release_year'] = [yr + (mth - 1) / 12 for yr, mth in zip(year, month)]

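  # Columns to normalize
  # Note: 'pop_x' is the track popularity column.  It only exists after the
  # genre encoding below adds its own 'pop' genre column, which makes the
  # merge rename the original 'pop' column to 'pop_x'
  cols_to_normalize = ['artist_pop', 'dur', 'pop_x', 'loud', 'time_sig',
                       'key', 'tempo', 'release_year']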

  # Encode string columns - specify if you are using a pre-run list of categories to one hot encode
  if load_enc_lists:
    df = encode_genres(df, col_nam='genres', cat_list=genre_list, split_strings=True)
    df = encode_genres(df, col_nam='artist_nam', cat_list=artist_list)
    # normalize columns using provided max norm values
    for col, norm in zip(cols_to_normalize, max_norm_list):
      df[col] = normalize_data(df[col].values, use_max=norm)
  else:
    # Artist genre
    df, g_keys = encode_genres(df, col_nam='genres', percen_split=g_split, split_strings=True, out_ohe_keys=True)
    # Artist name
    df, a_keys = encode_genres(df, col_nam='artist_nam', percen_split=a_split, out_ohe_keys=True)
    # Normalize columns and save calculated max value per column
    max_norms=[]
    for col in cols_to_normalize:
      df[col], max_val = normalize_data(df[col].values, sav_max=True)
      max_norms.append(max_val)
  
  # Drop unneeded string columns
  cols_to_drop = ['release', 'song_nam', 'album_nam', 'artist_id']
  df.set_index('song_id', inplace=True)
  df.drop(columns=cols_to_drop, inplace=True)
  df.album_type = df.album_type.apply(man_ord_encode)

  
  

  if load_enc_lists:
    return df
  else:
    return df, g_keys, a_keys, max_norms


def create_model():
  '''Creates a regression Sequential neural network model that implements 
     early stopping to select the best layer weights.  It also utilizes class
     weights to adjust the calculated accuracy metrics on the training data.
     Sample weights are also implemented to adjust accuracy metrics on
     validation data.
  Inputs:
    X_train, y_train, X_test, y_test, input_dim, custom_weight, and
    val_sample_weights variables should be defined globally before running
    this function.
  Outputs:
    model (tensorflow model): compiled and fitted neural network model
    tensorboard (TensorBoard callback): callback used to log the training run
  '''
  # Remove tensorboard logs
  !rm -rf ./logs/
  # Set Callbacks
  tensorboard = tf.keras.callbacks.TensorBoard(log_dir="./logs")
  es = EarlyStopping(
    monitor="val_weighted_mean_squared_error",
    patience=60,
    mode='min',
    restore_best_weights=True,
    min_delta=0.0001)

  kernel_regularizer = l2(0.01)
  # Build Model
  model = Sequential()
  model.add(Dense(400, input_dim=input_dim, activation='relu', kernel_regularizer=kernel_regularizer))
  model.add(Dense(400, activation='relu', kernel_regularizer=kernel_regularizer))
  model.add(Dense(400, activation='relu', kernel_regularizer=kernel_regularizer))
  model.add(Dense(400, activation='relu', kernel_regularizer=kernel_regularizer))
  # model.add(Dense(400, activation='relu', kernel_regularizer=kernel_regularizer))
  # model.add(Dense(400, activation='relu', kernel_regularizer=kernel_regularizer))
  model.add(Dense(1, activation='linear'))
  
  # Compile and fit model
  model.compile(loss='mean_squared_error', optimizer='adam',
                metrics=['mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error', 'mean_squared_logarithmic_error'], 
                weighted_metrics=['mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error', 'mean_squared_logarithmic_error'])
  # validation_data is passed as a tuple of (inputs, targets, sample weights)
  model.fit(X_train, y_train, validation_data=(X_test, y_test, val_sample_weights), epochs=60,
            batch_size=50, verbose=1, callbacks=[es, tensorboard], class_weight=custom_weight)
  return model, tensorboard


def get_genre_options(df):
  '''Generates genre seed options based on genres present in the good playlist.
     Spotify only accepts a fixed set of genres as seeds to generate
     recommendations.
  Inputs:
    df (DataFrame): dataframe generated from create_song_df().  Currently
      unused: the function reads the global song_traits dataframe instead
  Outputs:
    avail_genre_seeds (list): list of genre seeds
    '''
  good_genres=[]
  # for item in song_traits[song_traits['rating']==2]['genres']:
  for item in song_traits[song_traits['rating']==playlist_scores[-1]]['genres']:
  # for item in song_traits[song_traits['rating'] > 0]['genres']:
    good_genres.extend(item)
  genre_opts = sp.recommendation_genre_seeds()['genres']
  avail_genre_seeds = list(set(genre_opts) & set(good_genres))
  return avail_genre_seeds


def rec_batch_size(n_rec):
  '''Generates a list of API call lengths based on n_rec, which is the number
     of desired recommendations.  Spotify API has a limit of 100 songs per
     recommendation API call.
  Input:
    n_rec (int): number of desired recommendations
  Output:
    out (list): list of recommendation limit numbers whereby the max is 100 for
    each item
  Example:
    n_rec = 230, output = [100, 100, 30]
    '''
  if n_rec <= 100:
    out = [n_rec]
  else:
    num = math.ceil(n_rec / 100)
    out = ([100] * (num - 1))
    last = [n_rec - (num - 1) * 100]
    out.extend(last)
  return out


def recommend_songs(seed_songs=None, seed_artists=None, seed_genres=None, n_rec=100, rec_list=None):
  '''Takes in a song, artist, or genre seed and calls spotify's recommendation
     API call with a length of n_rec recommendations.  Only one seed type is
     used per call, checked in the order artists, genres, songs.
     Returns a list of song dictionaries
  Inputs:
    seed_songs (list): list of song_ids to be used for recommendation
    seed_artists (list): list of artist_ids to be used for recommendation
    seed_genres (list): list of genres to be used for recommendation
    n_rec (int): number of recommendations
    rec_list (list): list of dictionaries of generated song recommendations,
      if not provided, then it will be created.  If provided, then list will
      have new song recommendations added to it
  Outputs:
    song_list (list): list of dictionaries of generated song_recommendations
     '''
  # If a list of recommended songs is provided then set it equal to song_list 
  if rec_list:
    song_list=rec_list
  else:
    song_list=[]
  
  # Create api call size list and loop through it
  req_sizes = rec_batch_size(n_rec)
  for batch in req_sizes:
    if seed_artists:
      results = sp.recommendations(seed_artists=seed_artists, limit=batch)
    elif seed_genres:
      results = sp.recommendations(seed_genres=seed_genres, limit=batch)
    elif seed_songs:
      results = sp.recommendations(seed_tracks=seed_songs, limit=batch)
    else:
      print('No seed criteria provided.  Please include a seed song, artist, or genre')
      break

    song_list.extend(results['tracks'])

  return song_list


def create_rec_song_list(genre_options, genre_size=300, artist_size=100, song_size=100):
  '''Generates a list of recommended songs.  Uses songs and artists from the
     highest-rated playlist in the global song_traits dataframe and also uses
     genre seeds in genre_options to generate song recommendations.  _size
     variables specify number of recommendations to make.
     Implements random selection from song and artist lists to create even more
     recommendations using rand_sel().
  Inputs:
    genre_options (list): list of genre seeds options
    genre_size (int): number of genre recommendations for each seed in g_opts
    artist_size (int): number of artist recommendations for each seed in art_opts
    song_size (int): number of song recommendations for each seed in song_opts
  Outputs:
    rec_song_list (list): list of song dictionary recommendations
    '''

  # Get song and artist ids from favorites playlist
  target_index = playlist_scores[-1]
  s_ids = list(song_traits[song_traits.rating==target_index].song_id.values)
  a_ids = [row[0] for row in song_traits[song_traits.rating==target_index].artist_id]
  
  # Set song list to be empty to start
  rec_song_list=[]
  print("Recommending Songs")
  song_opts = []
  for i in range(40):
    song_opts.append(rand_sel(s_ids, 1))
    song_opts.append(rand_sel(s_ids, 2))
    song_opts.append(rand_sel(s_ids, 3))
    song_opts.append(rand_sel(s_ids, 4))
    song_opts.append(rand_sel(s_ids, 5))
  # song_opts = [rand_sel(s_ids, 1), rand_sel(s_ids, 1), rand_sel(s_ids, 1),
  #              rand_sel(s_ids, 1), rand_sel(s_ids, 1), rand_sel(s_ids, 1),
  #              rand_sel(s_ids, 1), rand_sel(s_ids, 2), rand_sel(s_ids, 2),
  #              rand_sel(s_ids, 2), rand_sel(s_ids, 3), rand_sel(s_ids, 3),
  #              rand_sel(s_ids, 3), rand_sel(s_ids, 3), rand_sel(s_ids, 4),
  #              rand_sel(s_ids, 4), rand_sel(s_ids, 4), rand_sel(s_ids, 4),
  #              rand_sel(s_ids, 5), rand_sel(s_ids, 5), rand_sel(s_ids, 5)]
  for s in song_opts:
    rec_song_list = recommend_songs(seed_songs=s,
                                    seed_artists=None,
                                    seed_genres=None,
                                    n_rec=song_size,
                                    rec_list=rec_song_list)

  print("Recommending Artists")
  # Pick random artist seeds from favorites
  art_opts = []
  for i in range(40):
    art_opts.append(rand_sel(a_ids, 1))
    art_opts.append(rand_sel(a_ids, 2))
    art_opts.append(rand_sel(a_ids, 3))
    art_opts.append(rand_sel(a_ids, 4))
    art_opts.append(rand_sel(a_ids, 5))
  # art_opts = [rand_sel(a_ids, 1), rand_sel(a_ids, 1), rand_sel(a_ids, 1),
  #             rand_sel(a_ids, 1), rand_sel(a_ids, 1), rand_sel(a_ids, 1),
  #             rand_sel(a_ids, 1), rand_sel(a_ids, 1), rand_sel(a_ids, 2),
  #             rand_sel(a_ids, 2), rand_sel(a_ids, 2), rand_sel(a_ids, 2),
  #             rand_sel(a_ids, 2), rand_sel(a_ids, 3), rand_sel(a_ids, 4),
  #             rand_sel(a_ids, 5), rand_sel(a_ids, 5), rand_sel(a_ids, 5)]
  for a in art_opts:
    rec_song_list = recommend_songs(seed_songs=None,
                                    seed_artists=a,
                                    seed_genres=None,
                                    n_rec=artist_size,
                                    rec_list=rec_song_list)

  print("Recommending Genres")
  # Pick random genre seeds: 9 seed groups holding 1 to 5 genres each
  g_opts = [rand_sel(genre_options, 1),  # random selection from list
            rand_sel(genre_options, 1), rand_sel(genre_options, 1),
            rand_sel(genre_options, 1), rand_sel(genre_options, 2),
            rand_sel(genre_options, 2), rand_sel(genre_options, 3),
            rand_sel(genre_options, 4), rand_sel(genre_options, 5)]
  for g in g_opts:
    # Create list of song recommendations
    rec_song_list = recommend_songs(seed_songs=None,
                                    seed_artists=None,
                                    seed_genres=g,
                                    n_rec=genre_size,
                                    rec_list=rec_song_list)

  return rec_song_list


def create_rec_df(song_list, df, rec_target_index=-1):
  '''Takes a list of song recommendations and removes duplicates, songs that
     are already in df, and songs already in the output playlist.
     Returns a dataframe.
  Inputs:
    song_list (list): list of recommended song dictionaries
    df (DataFrame): dataframe generated from wrangle()
    rec_target_index (int): rating label that marks songs from the output
      playlist in df
  Output:
    df_rec (DataFrame): dataframe of recommended songs to be passed into
      wrangle()
  '''
  df_rec = extract_song_contents(song_list, target=None, recommend_data=True)

  # Filter song_list for duplicates, inclusion in df, and in the output playlist
  # Remove duplicates
  df_rec.drop_duplicates(subset='song_id', inplace=True)
  # Remove songs that are in the liked songs dataframe
  df_rec = df_rec[~df_rec['song_id'].isin(list(df.index.values))]
 
  # Remove recommendations that are already in the output playlist
  p_ids = list(df[df.rating==rec_target_index].index)
  
  df_rec = df_rec[~df_rec['song_id'].isin(p_ids)]
  return df_rec


# Transform dataframe to be sent to model
def gen_predictions(model, df_test, threshold=None, softmax=False):
  '''Takes in a dataframe generated from wrangle() and runs data through model
     to generate predicted ratings for each song.  Songs at or below the
     threshold will be removed.
  Inputs:
    model (tensorflow model): neural network model
    df_test (DataFrame): dataframe of recommended songs to be run through model
    threshold (float): rating threshold to filter songs in df_test.  If None,
      the threshold is derived from playlist_scores
    softmax (bool): if True, model outputs are treated as class probabilities
      and converted to the index of the most likely class
  Output:
    df_song (list): ids of the songs in df_test whose predicted rating is
      above the threshold
  '''
  # Drop rating column if it exists in df_test or else model can't predict
  if 'rating' in df_test:
    df_test.drop(columns='rating', inplace=True)

  X_rec = df_test.to_numpy()
  rec_pred = model.predict(X_rec)
  if softmax:
    rec_pred = [argmax(row) for row in rec_pred]
  df_test['rating'] = rec_pred
  # Set recommendation threshold automatically if not provided
  if threshold is None:
    threshold = playlist_scores[-1] - 0.3
  # Filter dataframe for only positive predictions
  df_keep = df_test[df_test.rating > threshold]
  # create list of song ids
  df_song = list(df_keep.index.values)
  return df_song


def upload_songs(df_song, playlist_uri, replace_pl_songs=True):
  ''' Uploads songs to provided playlist.  Can optionally override songs in
      playlist with replace_pl_songs=True.
  Inputs:
    df_song (list): ids of the recommended songs to upload
    playlist_uri (string): URI or link of the playlist to place songs into
    replace_pl_songs (bool): if True, replaces all songs in playlist with new
      songs.  If False, adds new songs to playlist
  Output:
    Modifies songs in provided playlist.  No variables are returned
  '''
  playlist_out_id = sp.playlist(playlist_uri)['id']
  upload_samples = list(chunks(df_song, 100))
  for i, sample in enumerate(upload_samples):
    if replace_pl_songs:
      if i==0:
        # wipe playlist and add first sample
        sp.playlist_replace_items(playlist_out_id, sample)
      else:
        sp.playlist_add_items(playlist_out_id, sample)
    else:
      sp.playlist_add_items(playlist_out_id, sample)



## Custom metric

In [None]:
class FBetaScore2(tf.keras.metrics.Metric):
    def __init__(
        self,
        num_classes: FloatTensorLike,
        average: Optional[str] = None,
        beta: FloatTensorLike = 1.0,
        threshold: Optional[FloatTensorLike] = None,
        name: str = "fbeta_score2",
        dtype: AcceptableDTypes = None,
        **kwargs,
    ):
        super().__init__(name=name, dtype=dtype)

        if average not in (None, "micro", "macro", "weighted"):
            raise ValueError(
                "Unknown average type. Acceptable values "
                "are: [None, 'micro', 'macro', 'weighted']"
            )

        if not isinstance(beta, float):
            raise TypeError("The value of beta should be a python float")

        if beta <= 0.0:
            raise ValueError("beta value should be greater than zero")

        if threshold is not None:
            if not isinstance(threshold, float):
                raise TypeError("The value of threshold should be a python float")
            if threshold > 1.0 or threshold <= 0.0:
                raise ValueError("threshold should be between 0 and 1")

        self.num_classes = num_classes
        self.average = average
        self.beta = beta
        self.threshold = threshold
        self.axis = None
        self.init_shape = []

        if self.average != "micro":
            self.axis = 0
            self.init_shape = [self.num_classes]

        def _zero_wt_init(name):
            return self.add_weight(
                name, shape=self.init_shape, initializer="zeros", dtype=self.dtype
            )

        self.true_positives = _zero_wt_init("true_positives")
        self.false_positives = _zero_wt_init("false_positives")
        self.false_negatives = _zero_wt_init("false_negatives")
        self.weights_intermediate = _zero_wt_init("weights_intermediate")

    def update_state(self, y_true, y_pred, sample_weight=None):
        if self.threshold is None:
            threshold = tf.reduce_max(y_pred, axis=-1, keepdims=True)
            # make sure [0, 0, 0] doesn't become [1, 1, 1]
            # Use abs(x) > eps, instead of x != 0 to check for zero
            y_pred = tf.logical_and(y_pred >= threshold, tf.abs(y_pred) > 1e-12)
            
        else:
            y_pred = y_pred > self.threshold

        y_true = tf.cast(y_true, self.dtype)
        y_pred = tf.cast(y_pred, self.dtype)

        def _weighted_sum(val, sample_weight):
            if sample_weight is not None:
                val = tf.math.multiply(val, tf.expand_dims(sample_weight, 1))
            return tf.reduce_sum(val, axis=self.axis)

        self.true_positives.assign_add(_weighted_sum(y_pred * y_true, sample_weight))
        self.false_positives.assign_add(
            _weighted_sum(y_pred * (1 - y_true), sample_weight)
        )
        self.false_negatives.assign_add(
            _weighted_sum((1 - y_pred) * y_true, sample_weight)
        )
        self.weights_intermediate.assign_add(_weighted_sum(y_true, sample_weight))

    def result(self):
        precision = tf.math.divide_no_nan(
            self.true_positives, self.true_positives + self.false_positives
        )
        recall = tf.math.divide_no_nan(
            self.true_positives, self.true_positives + self.false_negatives
        )

        mul_value = precision * recall
        add_value = (tf.math.square(self.beta) * precision) + recall
        mean = tf.math.divide_no_nan(mul_value, add_value)
        f1_score = mean * (1 + tf.math.square(self.beta))

        if self.average == "weighted":
            weights = tf.math.divide_no_nan(
                self.weights_intermediate, tf.reduce_sum(self.weights_intermediate)
            )
            f1_score = tf.reduce_sum(f1_score * weights)

        elif self.average is not None:  # [micro, macro]
            f1_score = tf.reduce_mean(f1_score)

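        # Report only the score of the last playlist's class.  This indexing
        # needs a per-class vector, so it only works with average=None.
        f1_score_last = tf.gather(f1_score, len(playlists)-1)
        return f1_score_last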

    def get_config(self):
        """Returns the serializable config of the metric."""

        config = {
            "num_classes": self.num_classes,
            "average": self.average,
            "beta": self.beta,
            "threshold": self.threshold,
        }

        base_config = super().get_config()
        return {**base_config, **config}

    def reset_state(self):
        reset_value = tf.zeros(self.init_shape, dtype=self.dtype)
        K.batch_set_value([(v, reset_value) for v in self.variables])

    def reset_states(self):
        # Backwards compatibility alias of `reset_state`. New classes should
        # only implement `reset_state`.
        # Required in Tensorflow < 2.5.0
        return self.reset_state()
    
fbeta_score2 = FBetaScore2(num_classes=len(playlists), beta=0.5)
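
The `result()` method above combines precision and recall as F_beta = (1 + beta^2) * P * R / (beta^2 * P + R) and then reports the value for the last playlist's class only. The plain-Python sketch below works through that formula on made-up counts; the function name `f_beta` and the numbers are illustrative assumptions, not part of the notebook.

```python
def f_beta(true_pos, false_pos, false_neg, beta=0.5):
    """F-beta from raw counts; returns 0.0 whenever a denominator is zero."""
    predicted_pos = true_pos + false_pos
    actual_pos = true_pos + false_neg
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    denom = beta ** 2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denom


# Made-up counts: precision = 30/40 = 0.75, recall = 30/60 = 0.5
score_half = f_beta(true_pos=30, false_pos=10, false_neg=30, beta=0.5)
score_one = f_beta(true_pos=30, false_pos=10, false_neg=30, beta=1.0)
print(round(score_half, 4), round(score_one, 4))
```

With beta below 1 the score leans toward precision, so the same counts score higher at beta=0.5 than at beta=1.0 here because precision exceeds recall. That fits a recommender where a wrongly recommended song costs more than a missed one.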

## Tuner

In [None]:
from scikeras.wrappers import KerasClassifier

def create_complicated_model_class(
                                  # optimizer, learning_rate,
                                   n_layers, n_layer_1, n_layer_2, n_layer_3, n_layer_4, n_layer_5,
                                   reg_type, reg_mag, max_norm, act,
                                   p_drop, batch_norm):
  # Set Weight regularizer
  if reg_type == 'l1':
    kernel_regularizer = l1(reg_mag)
  elif reg_type == 'l2':
    kernel_regularizer = l2(reg_mag)
  else:
    kernel_regularizer=None
  
  # Set weight value constraint
  kernel_constraint = MaxNorm(max_value=max_norm)
  
  # Create model
  model = Sequential()
  model.add(Dense(n_layer_1, input_dim=input_dim, activation=act,
                  kernel_regularizer=kernel_regularizer,
                  kernel_constraint=kernel_constraint))
  if batch_norm==1:
    model.add(BatchNormalization())
  model.add(Dropout(p_drop))
  if n_layers > 1:
    model.add(Dense(n_layer_2,activation=act,
                  kernel_regularizer=kernel_regularizer,
                  kernel_constraint=kernel_constraint))
    if batch_norm==1:
      model.add(BatchNormalization())
    model.add(Dropout(p_drop))
  if n_layers > 2:
    model.add(Dense(n_layer_3, activation=act,
                  kernel_regularizer=kernel_regularizer,
                  kernel_constraint=kernel_constraint))
    if batch_norm==1:
      model.add(BatchNormalization())
    model.add(Dropout(p_drop))
  if n_layers > 3:
    model.add(Dense(n_layer_4, activation=act,
                  kernel_regularizer=kernel_regularizer,
                  kernel_constraint=kernel_constraint))
    if batch_norm==1:
      model.add(BatchNormalization())
    model.add(Dropout(p_drop))
  if n_layers > 4:
    model.add(Dense(n_layer_5, activation=act,
                  kernel_regularizer=kernel_regularizer,
                  kernel_constraint=kernel_constraint))
    if batch_norm==1:
      model.add(BatchNormalization())
    # Dropout belongs to the fifth layer only; it was previously applied
    # whenever n_layers > 2, stacking two dropouts for 3- and 4-layer models
    model.add(Dropout(p_drop))

  # Output layer
  num_cat = len(playlists)
  model.add(Dense(num_cat, activation='softmax'))
  return model

# Tuning params
param_grid = {
    'optimizer' : Categorical(['adam', 'nadam']),
    'optimizer__learning_rate': Categorical([1e-4, 1e-5, 1e-6, 1e-7, 1e-8]),
    'n_layers': Categorical([2, 3, 4, 5]),
    'n_layer_1': Integer(10, 1000, prior="uniform"),
    'n_layer_2': Integer(10, 1000, prior="uniform"),
    'n_layer_3': Integer(10, 1000, prior="uniform"),
    'n_layer_4': Integer(10, 1000, prior="uniform"),
    'n_layer_5': Integer(10, 1000, prior="uniform"),
    'reg_type': Categorical(['l1', 'l2', 'none']),
    'reg_mag': Real(0.001, 0.1),
    'max_norm': Integer(1, 500),
    'act': Categorical(['relu', 'selu', 'elu']),
    'p_drop': Real(0.0, 0.9),
    'batch_norm': Categorical([0, 1]),
    'batch_size': Integer(20, 100),
    'epochs': Integer(2, 250)
  }
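# Custom scorer: F-beta computed on the last class column only.
# make_scorer calls the function as score_func(y_true, y_pred), so the
# arguments must be declared in that order.
def custom_fbeta(y_true, y_pred, beta=0.5):
  y_true_t = [item[-1] for item in y_true]
  y_pred_t = [item[-1] for item in y_pred]
  score = fbeta_score(y_true_t, y_pred_t, beta=beta)
  return score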

my_scorer = make_scorer(custom_fbeta, greater_is_better=True)
model_cv = KerasClassifier(model=create_complicated_model_class,
                           loss='categorical_crossentropy',
                           verbose=1,
                           act='relu',
                           shuffle=True,
                           batch_norm=0,
                           max_norm=500,
                           n_layer_1=100, n_layer_2=100, n_layer_3=100,
                           n_layer_4=100, n_layer_5=100, n_layers=5,
                           p_drop=0, reg_mag=0, reg_type='l1',
                           random_state=42
                           )

grid = BayesSearchCV(
    estimator = model_cv,
    n_jobs=-1,
    verbose=10,
    cv=5,
    search_spaces=param_grid,
    n_iter=100, random_state=42,
    n_points=1,
    scoring=my_scorer
)
bayes_cv_model = grid.fit(X, y_t,
                          class_weight=class_weights,
                          # callbacks=[tensorboard],
                          verbose=1)

## Config for model creation and song recommendations
This code cell can be customized to adjust how your song data is processed, how many song recommendations are created, and how many songs are added to your output playlist. <br>

Model Parameters:
1. genre_split_percentile (Range 0-100) - Default 80 - Percentile at which to split off the most common genres, e.g. 80 = top 20% most common genres.
2. artist_split_percentile (Range 0-100) - Default 100 - Percentile at which to split off the most common artists, e.g. 80 = top 20% most common artists.

Recommendation Parameters:
1. n_genre_recs (Range 0-2000) - Default 600 - Number of recommendations generated per seed group based on genres
2. n_artist_recs (Range 0-2000) - Default 300 - Number of recommendations generated per seed group based on artists
3. n_song_recs (Range 0-2000) - Default 400 - Number of recommendations generated per seed group based on songs
4. song_threshold (Range 0-2) - Default None - Threshold for including a recommended song in the output playlist. Each recommended song gets a predicted value roughly ranging from 0 to the number of playlists included. With None, the threshold is set automatically. See the data analysis plots for guidance on where to set your threshold.
5. replace_out_playlist_songs (True or False) - Default True - Whether to replace the songs in the output playlist: True replaces them, False adds the new songs to the existing ones.
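
As a rough illustration of the percentile parameters, the sketch below keeps the most common labels above a given percentile. It is an assumption about the general idea only: how `wrangle()` actually computes the split, breaks ties, or treats a value of 100 is not shown in this section, and the helper name `top_common` is hypothetical.

```python
import math
from collections import Counter


def top_common(labels, split_percentile):
    """Keep the most frequent distinct labels above `split_percentile`."""
    counts = Counter(labels)
    ranked = [name for name, _ in counts.most_common()]
    # e.g. percentile 80 keeps the top 20% of the distinct labels
    n_keep = math.ceil(len(ranked) * (100 - split_percentile) / 100)
    return ranked[:n_keep]


# Made-up genre tags: 5 distinct genres with different frequencies
genres = ["rock"] * 5 + ["pop"] * 3 + ["jazz"] * 2 + ["folk", "metal"]
kept_80 = top_common(genres, 80)
kept_40 = top_common(genres, 40)
print(kept_80, kept_40)
```

In this sketch a lower percentile keeps more genres, and a percentile of 100 keeps none.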



In [None]:
# model parameters
genre_split_percentile = 80
artist_split_percentile = 100
tune_model = False
model_tuning_iterations = 20

# recommendation parameters
n_genre_recs = 600
n_artist_recs = 300
n_song_recs = 400
song_threshold = None   # Change from None to set a custom threshold
replace_out_playlist_songs = True

# Run Functions
This code segment will run all of the above functions and utilizes the parameter values specified in the config segment above.<br>
Running this segment will take around 4 minutes depending on the size of recommendations specified in the config.<br>
The output will be recommended songs added to the output playlist.
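
The cell below passes `'balanced'` to `compute_class_weight`, so smaller playlists count more during training. scikit-learn's balanced heuristic is `n_samples / (n_classes * count_of_class)`; the plain-Python sketch below reproduces that arithmetic on made-up ratings. The helper name `balanced_class_weights` is an assumption for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count)
            for cls, count in sorted(counts.items())}


# Made-up ratings: three playlists holding 60, 30 and 10 songs
ratings = [0] * 60 + [1] * 30 + [2] * 10
weights = balanced_class_weights(ratings)
print(weights)
```

The smallest playlist receives the largest weight, so each of its songs contributes more to the loss than a song from a large playlist.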

In [None]:
# Wrangle
print('Pulling Playlist Data and Processing')
song_traits = create_song_df(playlists)
df, g_keys, a_keys, max_col_vals = wrangle(song_traits, load_enc_lists=False, 
                                           genre_list=None, artist_list=None,
                                           g_split=genre_split_percentile,
                                           a_split=artist_split_percentile)

# Split dataframe
X = df.drop(columns='rating').to_numpy()
y = df['rating'].to_numpy()
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8,
                                                    random_state=42, stratify=y,
                                                    shuffle=True)
input_dim = X.shape[1]

# Create weights variable
class_weights = dict(enumerate(compute_class_weight('balanced', classes=np.unique(y), y=y)))
val_sample_weights = class_weight.compute_sample_weight(class_weights, y_test)

# Create model
print('Creating Model')
model, tensorboard = create_model()

# # Recommend
# genre_seed_options = get_genre_options(song_traits)
# song_list = create_rec_song_list(genre_seed_options,
#                                  genre_size=n_genre_recs,
#                                  artist_size=n_artist_recs,
#                                  song_size=n_song_recs)

# print('Processing Recommendations')
# # Create dataframe and filter predictions for repeats
# df_rec = create_rec_df(song_list, df, rec_target_index=-1)

# # Wrangle our dataframe
# df_test = wrangle(df_rec, load_enc_lists=True, genre_list=g_keys,
#                   artist_list=a_keys, max_norm_list=max_col_vals)
# # Use model to generate song predictions and output a list of songs past a threshold
# out_song_list = gen_predictions(model, df_test, threshold=song_threshold)
# # Add songs to playlist
# print(f"{len(out_song_list)} songs were added to the playlist")
# upload_songs(out_song_list, playlist_uri=PL_OUT, replace_pl_songs=replace_out_playlist_songs)
# print("Playlist Updated Successfully!")

Pulling Playlist Data and Processing
Creating Model


# Feature exploration
These code segments visualize and explore aspects of input song data. <br>

The code segment below generates histograms for various song traits and color codes the plots by playlist.  In these interactive plots, a playlist can be hidden by clicking on it in the legend of each plot.

In [None]:
categories = ['dur', 'pop_x', 'album_type', 'artist_pop', 'acoust', 'dance_x',
              'energy', 'instrument', 'key', 'live', 'loud', 'mode', 'speech',
              'tempo', 'time_sig', 'valence', 'release_year']
cat_names = ['song duration', 'song popularity', 'album type',
             'artist popularity', 'acousticness', 'danceability', 'energy',
             'instrumentalness', 'song key', 'liveness', 'loudness', 'mode',
             'speech', 'tempo', 'time signature', 'valence', 'release year']
playlist_names = list(playlists.keys())             

# Create Dashboard
def set_layout(fig, fig_title=None, l=0, r=0, t=45, b=0):
    fig.update_layout(
        showlegend=True,
        legend={
            'orientation': "h",
            'yanchor': "bottom",
            'y': -0.2,
            'xanchor': "center",
            'x': 0.5,
        },
        margin={'t': t, 'r': r, 'l': l, 'b': b},
        xaxis={'anchor': 'y', 'domain': [0.0, 1.0]},
        yaxis={'anchor': 'x', 'domain': [0.0, 1.0]},
        height=500,
        barmode='overlay',
        title= fig_title.title()
    )
    fig.update_xaxes(title_text = f"Normalized {fig_title}")
    fig.update_yaxes(title_text = "Count")
    fig.update_traces(opacity=0.75)


app = JupyterDash(__name__, external_stylesheets=['https://codepen.io/chriddyp/pen/bWLwgP.css'])

# Generate graph data for app and add elements to a list
app_graphs, count = [], 0
for cat, cat_nam in zip(categories, cat_names):
  # One overlaid histogram per feature, with one trace per playlist
  fig = go.Figure()
  for val in range(len(playlists)):
    fig.add_trace(go.Histogram(x=df[df.rating==val][cat].values,
                               name=f'{playlist_names[val]}'))
  set_layout(fig, fig_title=cat_nam)

  # Add graph to app_graphs
  app_graphs.append(
      html.Div(
          [dcc.Graph(id=cat_nam, figure=fig)],
          style={'width': '49%', 'display': 'inline-block', 'padding': '0px 0px', 'margin':'0% 0.5% 0px 0%'}
          ))
  # Move to next row after every second graph (two graphs per row)
  if count == 1:
    app_graphs.append(html.Br())
    count=0
  else:
    count += 1


app.layout = html.Div(app_graphs)

print('Dashboard loading ~ 2 minutes')
app.run_server(mode='inline', height=1000, host='127.0.0.1', port='8050')

Dashboard loading ~ 2 minutes



### Scatter Polar Plot of Song Features
This plot visualizes the average values of song features for each playlist.  All feature values are normalized from 0 to 1 based on the minimum and maximum values from the input data.

This plot helps to visualize all of the song features together and to compare/contrast their values between playlists.
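
The two steps behind the plot, min-max normalization followed by a per-playlist average, can be sketched in plain Python as follows. The tempo values, ratings, and helper names are made up for illustration; in the notebook the normalization happens in `wrangle()` and the averaging is done with pandas in the cell below.

```python
def min_max_normalize(values):
    """Scale values to the 0-1 range using their own minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def mean_by_playlist(values, labels):
    """Average the values separately for each playlist label."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    return {label: sum(vs) / len(vs) for label, vs in groups.items()}


# Made-up tempos for four songs, two in playlist 0 and two in playlist 1
tempos = [90.0, 120.0, 150.0, 180.0]
ratings = [0, 0, 1, 1]
normalized = min_max_normalize(tempos)
playlist_means = mean_by_playlist(normalized, ratings)
print(playlist_means)
```

Each playlist's mean becomes one radial value on the polar plot, so a feature where the playlists differ shows up as a visible gap between the traces.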

In [None]:
# Scatter Polar
df_polar = df[categories + ['rating']]
fig = go.Figure()
for i in range(len(playlists)):
  name = playlist_names[i]
  values = list(df_polar[df_polar.rating==i].mean(axis=0).drop('rating'))
  fig.add_trace(go.Scatterpolar(
          r = values,
          theta = cat_names,
          name = name,
          fill='toself',
          opacity=0.75
      ))
fig.update_layout(
    autosize=False,
    width=1000,
    height=1000,
    margin=dict(
        l=100,
        r=100,
        b=100,
        t=100,
        pad=4
    ),
    title='Song Features by Playlist')
fig.show()

In [None]:
corrs = df.corr()
test = corrs[['rating']]
fig = px.imshow(test.drop('rating'), text_auto=False, aspect="auto",
               color_continuous_scale='RdYlGn',
               color_continuous_midpoint=0)
fig.show()

# Model Analysis
These segments explore model evaluation metrics and predictions

### Model Predictions Histogram
This histogram plots the predicted song rating for every song in the input playlists.  Use it to find the predicted rating value that best separates the songs in your favorites playlist from the rest.  You can then use this value as the model prediction threshold to get more accurate song recommendations (set song_threshold in config).<br><br>

A number is assigned for each input playlist, whereby each song from the first playlist will be given a rating of 0.  The songs from the next playlist have a rating of 1.  And so on for the rest of the playlists.<br>

The bars on this histogram are colored by their true rating label, and their x-axis values are the model's predicted rating values.
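
Instead of reading the threshold off the histogram by eye, one option is to scan candidate cut points and score each one. The sketch below does this with an F-beta score on made-up predictions. The function name, the data, and the choice of F-beta as the selection criterion are all assumptions for illustration, not the notebook's method.

```python
def best_threshold(predicted, is_favorite, beta=0.5):
    """Return (cut, score) maximizing F-beta when keeping songs with p > cut."""
    best_cut, best_score = None, -1.0
    pairs = list(zip(predicted, is_favorite))
    for cut in sorted(set(predicted)):
        tp = sum(1 for p, fav in pairs if p > cut and fav)
        fp = sum(1 for p, fav in pairs if p > cut and not fav)
        fn = sum(1 for p, fav in pairs if p <= cut and fav)
        if tp == 0:
            continue  # no favorites kept at this cut, score would be 0
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        score = (1 + beta ** 2) * precision * recall / (beta ** 2 * precision + recall)
        if score > best_score:
            best_cut, best_score = cut, score
    return best_cut, best_score


# Made-up predicted ratings and whether each song is a true favorite
predicted = [0.1, 0.4, 0.9, 1.2, 1.6, 1.8, 1.9]
is_favorite = [False, False, False, True, False, True, True]
cut, score = best_threshold(predicted, is_favorite)
print(cut, round(score, 4))
```

The strict `p > cut` comparison mirrors the `df_test.rating > threshold` filter in `gen_predictions`, so a cut found this way could be passed in as `song_threshold`.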

In [None]:
# Plot histogram of model predictions with true labels
ac_test = df.copy()
predictions_ac = model.predict(ac_test.drop(columns='rating').to_numpy())
ac_test['predicted_rating'] = predictions_ac
ac_test[['rating', 'predicted_rating']].sort_values('rating', ascending=False)
ac_test['name'] = [playlist_names[val] for val in ac_test.rating]
fig = px.histogram(ac_test, x="predicted_rating", color="name", opacity=0.75, barmode='overlay')
fig.show()

### Recommended Song Predicted Ratings

In [None]:
out_pred = model.predict(df_test.drop(columns='rating').to_numpy())
px.histogram(out_pred)

### Model Training History
This interactive visualization tool shows how the model evaluation metrics changed across epochs.  Each plot separates the metrics for the training and validation portions of the dataset.  The early stopping callback halts training if the performance metric doesn't improve over a few epochs and restores the weights of the best model.
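
The early stopping rule described above can be sketched without TensorFlow. The patience value and the loss history below are made up, and the notebook's actual callback settings live in `create_model()`, which is not shown in this section.

```python
def early_stop_epoch(val_losses, patience=3):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best: remember it and reset the patience counter
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Made-up validation losses: the best value occurs at epoch 2
history = [0.90, 0.70, 0.65, 0.66, 0.68, 0.67, 0.60]
stop_epoch, best_epoch = early_stop_epoch(history, patience=3)
print(stop_epoch, best_epoch)
```

Training halts at `stop_epoch`, and restoring the best weights means the model from `best_epoch` is the one kept. In this example the later improvement at epoch 6 is never reached, which is the usual trade-off of a short patience.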

In [None]:
%load_ext tensorboard
%tensorboard --logdir logs