# Playlist Buddy

This notebook is the basis for the PlaylistBuddy.py script.

I experimented with filtering the recommended songs and then refined the workflow to produce the script.

What you see here is essentially a more readable, step-by-step version of the script.

I wrote about the basic authentication, recommendation retrieval and filtering in a Medium post: https://towardsdatascience.com/using-python-to-refine-your-spotify-recommendations-6dc08bcf408e


## Imports

The most important imports are spotipy, pandas, numpy and my own small library 'spotifuncs'. Here I use a wildcard import for experimentation; the script only imports the functions it needs. spotifuncs itself uses pandas, sklearn and spotipy.

In [1]:
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import spotipy.util as util
from pathlib import Path
import pandas as pd
import numpy as np
from spotifuncs import *

In [2]:
path = Path("C:/Users/ms101/OneDrive/DataScience_ML/projects/spotify_app")

## Setting the Credentials 

I stored the credentials and usernames in .txt files that were not uploaded to GitHub, to keep sensitive information out of the repository and out of my notebooks.

Here I simply read the lines of the .txt files to retrieve the necessary information.

In [3]:
with open(path / "client_s.txt") as f:
    content = f.readlines()
content = [x.strip() for x in content]

client_id = content[0]
client_secret = content[1]

In [4]:
with open(path / "usernames.txt") as f:
    usernames = f.readlines()
usernames = [x.strip() for x in usernames]

username1 = usernames[0]
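Reading the files by line index works as long as they contain no blank lines. A slightly more defensive sketch, using hypothetical helpers `read_lines` and `read_client_credentials` (not part of spotifuncs), skips empty lines and fails with a clear message if a value is missing. As an alternative to text files, spotipy can also read the credentials from the `SPOTIPY_CLIENT_ID`, `SPOTIPY_CLIENT_SECRET` and `SPOTIPY_REDIRECT_URI` environment variables.

```python
from pathlib import Path


def read_lines(file_path):
    """Return the stripped, non-empty lines of a text file (hypothetical helper).

    Skipping blank lines (e.g. a trailing newline) keeps index-based access
    such as lines[0] and lines[1] predictable.
    """
    text = Path(file_path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def read_client_credentials(file_path):
    """Return (client_id, client_secret) from the first two non-empty lines."""
    lines = read_lines(file_path)
    if len(lines) < 2:
        raise ValueError(f"{file_path} must contain a client id and a client secret")
    return lines[0], lines[1]
```

Usage would look like `client_id, client_secret = read_client_credentials(path / "client_s.txt")`.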

## App Scope

For the most basic functionality of the following code to work for **public** playlists, the scope needs to contain at least “playlist-modify-public”.

I wanted to explore the data and also be able to modify private playlists, so I added further scopes.

All available scopes are listed here: https://developer.spotify.com/documentation/general/guides/scopes/

In [5]:
scope = "user-library-read user-read-recently-played user-top-read playlist-modify-public playlist-modify-private playlist-read-private playlist-read-collaborative"

redirect_uri = "https://developer.spotify.com/dashboard/applications/4a4e029d299a4241873db8300038bf0a"

client_credentials_manager = SpotifyClientCredentials(client_id=client_id, 
                                                      client_secret=client_secret)
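Spotify expects the requested scopes as a single space-separated string. Keeping them in a list, as in this small sketch, makes the scope easier to review and to check against the minimum requirement; `SCOPES`, `REQUIRED_SCOPES` and `missing_scopes` are illustrative names, not part of spotipy or spotifuncs.

```python
# The scopes requested above, kept as a list so each one is easy to review.
SCOPES = [
    "user-library-read",
    "user-read-recently-played",
    "user-top-read",
    "playlist-modify-public",
    "playlist-modify-private",
    "playlist-read-private",
    "playlist-read-collaborative",
]

# Scopes are passed on as one space-separated string.
scope = " ".join(SCOPES)

# Minimal requirement for adding tracks to public playlists.
REQUIRED_SCOPES = {"playlist-modify-public"}


def missing_scopes(scope_string, required=REQUIRED_SCOPES):
    """Return the required scopes that are absent from a space-separated scope string."""
    return set(required) - set(scope_string.split())
```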


## Authenticate

I reduced the authentication process to a function that can be found within the spotifuncs library. I describe it in more detail in my Medium post (see above).

In [6]:
sp_m = authenticate(redirect_uri, client_credentials_manager, username1, scope, client_id, client_secret)

## Getting the playlist data

For the exploration and building of the script I used a personal playlist called "Redlightcandle", which contains a lot of slow, calm techno/electro music. The playlist is fairly homogeneous and has a specific 'feel', which was perfect for the first tests, as it was very easy to judge whether recommended songs fit the playlist or not.

In [7]:
redlight = sp_m.playlist("spotify:playlist:3zcSUFp0puoWXWXuFCF2e6")

I wrote a small function that creates a DataFrame from the Spotify API query results. I built assertions into most functions; the first call below omits `sp` on purpose to check that one of them fires.

In [8]:
redlight_df = create_df_playlist(redlight) #see if assert error works

AssertionError: sp needs to be specified for appending audio features

In [9]:
redlight_df = create_df_playlist(redlight, sp = sp_m)
redlight_df

| | track_name | album | track_id | artist | duration | popularity | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Desert Woman | RINGOS DESERT | 4O1nIKXTS87DKHIQejrn3n | ZHU | 374610 | 45 | 0.710 | 0.651 | 1 | -8.537 | 1 | 0.0707 | 0.02230 | 0.864 | 0.0657 | 0.1500 | 122.987 |
| 1 | York - Original Mix | York | 3eJBKLhpOauQJlgoOSPErD | Christian Löffler | 476675 | 49 | 0.696 | 0.634 | 11 | -12.063 | 0 | 0.0403 | 0.15700 | 0.887 | 0.1290 | 0.0359 | 118.035 |
| 2 | Personal Space | Personal Space / Mulholland 99 | 3TYNQdnAM85I6P9UegzYKo | Yotto | 301295 | 47 | 0.645 | 0.848 | 7 | -6.913 | 1 | 0.0351 | 0.00254 | 0.956 | 0.0868 | 0.0383 | 122.011 |
| 3 | October | October | 6FIQ8o2hqlDmHQFoBKmKgW | Icarus | 354102 | 60 | 0.365 | 0.711 | 8 | -9.591 | 1 | 0.0355 | 0.03060 | 0.913 | 0.0824 | 0.1170 | 116.003 |
| 4 | Azure | Berlin Calling (The Soundtrack by Paul Kalkbre... | 2VnlJCQMphFJUyYR5p7da2 | Paul Kalkbrenner | 364059 | 58 | 0.730 | 0.504 | 4 | -13.915 | 1 | 0.0477 | 0.12400 | 0.882 | 0.1070 | 0.0936 | 122.013 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 61 | Long Wait | Outer | 6Q33LCkK8IO3JYuRl1sl0k | Dusky | 338026 | 32 | 0.590 | 0.699 | 4 | -10.834 | 0 | 0.0447 | 0.00760 | 0.211 | 0.2080 | 0.0367 | 119.979 |
| 62 | Miss You | The Last Resort | 4WTmtPRtIpjzgwBbQsMYyo | trentemøller | 247817 | 54 | 0.542 | 0.160 | 7 | -21.672 | 0 | 0.0365 | 0.94600 | 0.928 | 0.2630 | 0.1980 | 126.921 |
| 63 | Exile 007 B2 | EXILE 007 | 2BWiEuSIWsagua4DewdXY5 | Johannes Heil | 419351 | 41 | 0.802 | 0.588 | 10 | -11.046 | 0 | 0.0595 | 0.02990 | 0.878 | 0.0822 | 0.0506 | 128.989 |
| 64 | Nices Wölkchen | Amygdala | 78WXbs2VMEGYXoGPyokKXZ | DJ Koze | 331093 | 43 | 0.700 | 0.387 | 1 | -12.855 | 1 | 0.0566 | 0.24000 | 0.891 | 0.2190 | 0.1650 | 120.008 |
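The spotifuncs implementation of `create_df_playlist` is not shown here. As a rough sketch of the first half of such a function, assuming the Web API playlist object layout (`playlist["tracks"]["items"][i]["track"]` with `name`, `album`, `id`, `artists`, `duration_ms` and `popularity`), the hypothetical `playlist_to_df` below builds the metadata columns; the audio features would still have to be fetched and appended separately. The playlist object returns its tracks in pages of up to 100 items, so longer playlists need the remaining pages, e.g. via spotipy's `sp.next()`.

```python
import pandas as pd


def playlist_to_df(playlist):
    """Build one row per track from a Spotify playlist object (hypothetical helper).

    Only the first page of playlist["tracks"] is read.
    """
    assert "tracks" in playlist, "expected a full playlist object"
    rows = []
    for item in playlist["tracks"]["items"]:
        track = item.get("track")
        if track is None:  # removed or unavailable tracks can appear as None
            continue
        rows.append({
            "track_name": track["name"],
            "album": track["album"]["name"],
            "track_id": track["id"],
            # assumption: only the first listed artist is kept
            "artist": track["artists"][0]["name"],
            "duration": track["duration_ms"],
            "popularity": track["popularity"],
        })
    return pd.DataFrame(rows, columns=["track_name", "album", "track_id",
                                       "artist", "duration", "popularity"])
```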


In order to create similarity scores I need the audio features.

In [10]:
cols = redlight_df.columns[6:].tolist()
cols

['danceability',
 'energy',
 'key',
 'loudness',
 'mode',
 'speechiness',
 'acousticness',
 'instrumentalness',
 'liveness',
 'valence',
 'tempo']

## Mean Song

I decided to create a "mean song" based on the playlist, i.e. the average value of each audio feature across all tracks. This "song" can be used to filter the recommendations further later in the process.

This is only a small refinement, however, and it does not work well for heterogeneous playlists. **The feature is always optional when using the PlaylistBuddy.py script.**

In [11]:
redlight_df[cols].mean()

danceability          0.695530
energy                0.553364
key                   6.015152
loudness            -11.642197
mode                  0.469697
speechiness           0.059317
acousticness          0.156192
instrumentalness      0.828045
liveness              0.114939
valence               0.221280
tempo               122.133924
dtype: float64

In [12]:
mean_song = pd.DataFrame(columns=redlight_df.columns)
mean_song.loc["mean"] = redlight_df.mean(numeric_only=True)  # skip text columns; pandas >= 2.0 raises a TypeError otherwise

In [13]:
mean_song

| | track_name | album | track_id | artist | duration | popularity | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| mean | | | | | 391892.30303 | 37.772727 | 0.69553 | 0.553364 | 6.015152 | -11.642197 | 0.469697 | 0.059317 | 0.156192 | 0.828045 | 0.114939 | 0.22128 | 122.133924 |
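`columns[6:]` relies on the column order of the DataFrame. A sketch that selects the audio features by name and builds the mean song directly from them; `FEATURES` and `make_mean_song` are hypothetical names. Unlike `mean_song` above, the result contains only the feature columns, so it is not a drop-in replacement for code that slices `mean_song.columns[6:]`.

```python
# Audio features as they appear in the tables above (key and mode included).
FEATURES = ["danceability", "energy", "key", "loudness", "mode", "speechiness",
            "acousticness", "instrumentalness", "liveness", "valence", "tempo"]


def make_mean_song(df, features=FEATURES):
    """Return a one-row DataFrame (index "mean") with the average of each feature."""
    # Selecting by name avoids depending on the column order.
    return df[features].mean().to_frame(name="mean").T
```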


## Using Spotify's own recommendations as a first step

I decided to make use of Spotify's own recommendations. They are a good starting point, but I feel they need to be refined. Here they serve as the base for the recommended songs that are added to a playlist.

The tracks in the current playlist are used as seed tracks.

In [14]:
seed_tracks = redlight_df["track_id"].tolist()

In [15]:
len(seed_tracks)

66

The Spotify API accepts at most five seed values per recommendations query, which is unfortunate as I wanted to use the whole playlist as a seed. I worked around this by sending "packages" of 5 seed tracks, each retrieving 25 recommendations (roughly 5 per seed track). Note that the loop below only forms complete packages of five: with 66 tracks that gives 13 queries, and the 66th track is never used as a seed.

Together, these results form the recommendations DataFrame, which is the basis for the further filtering steps.

In [16]:
recomm_dfs = []
for i in range(5,len(seed_tracks)+1,5):
    recomms = sp_m.recommendations(seed_tracks = seed_tracks[i-5:i],limit = 25)
    recomms_df = append_audio_features(create_df_recommendations(recomms),sp_m)
    recomm_dfs.append(recomms_df)
recomms_df = pd.concat(recomm_dfs)
recomms_df.reset_index(drop = True, inplace = True)
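A sketch of a batching helper (hypothetical `chunks`, not part of spotifuncs) that keeps the final, shorter package, so every playlist track is used as a seed exactly once:

```python
def chunks(items, size=5):
    """Split a list into consecutive chunks of at most `size` elements.

    Unlike range(5, len(items) + 1, 5), the last, shorter chunk is kept.
    """
    if size < 1:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]
```

It could replace the `range` loop as `for seeds in chunks(seed_tracks): recomms = sp_m.recommendations(seed_tracks=seeds, limit=25)`; with 66 tracks this yields 13 packages of five plus one package containing only the last track.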

In [17]:
recomms_df

| | track_name | album | track_id | artist | duration | popularity | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Domino | Domino Remixes | 6VGNzYErt08Rai78mLkIzI | Oxia | 435266 | 56 | 0.785 | 0.690 | 3 | -10.225 | 0 | 0.0698 | 0.002910 | 0.9070 | 0.2030 | 0.2150 | 128.879 |
| 1 | We Are Mirage | Every Day | 4tBYxCOPJXS2HefP749FWK | Eric Prydz | 379344 | 41 | 0.559 | 0.782 | 4 | -6.732 | 0 | 0.0339 | 0.000511 | 0.7190 | 0.4910 | 0.0727 | 127.983 |
| 2 | Strandfeest - Original | Strandfeest | 2SzHWK7RjorlTq66S1nJrJ | Bakermat | 275289 | 34 | 0.654 | 0.447 | 1 | -7.057 | 1 | 0.0463 | 0.006110 | 0.9380 | 0.0828 | 0.7370 | 126.016 |
| 3 | Branches | Don't Look Back | 5FJKNksf8aF976BIIj6s6Q | Nora En Pure | 190731 | 37 | 0.762 | 0.827 | 9 | -6.488 | 0 | 0.0365 | 0.025700 | 0.8970 | 0.3550 | 0.3400 | 122.978 |
| 4 | All I Know | All I Know | 6tr6I3YJnHUqKHfJXnm4jk | EDX | 148801 | 52 | 0.717 | 0.944 | 1 | -5.041 | 1 | 0.0386 | 0.005180 | 0.4770 | 0.1130 | 0.3540 | 124.993 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 320 | Araya | Araya | 4qUlcJdpzSXRj0tb7luniD | Fatima Yamaha | 358243 | 45 | 0.694 | 0.731 | 1 | -10.031 | 1 | 0.0546 | 0.046900 | 0.8900 | 0.0964 | 0.4110 | 126.016 |
| 321 | Ringo | Sun:Sets 2018 (Selected by Chicane) | 6ow8v22PKlDBg9NtrWxpmZ | Joris Voorn | 251803 | 31 | 0.742 | 0.641 | 5 | -8.725 | 0 | 0.0500 | 0.119000 | 0.8800 | 0.1180 | 0.5980 | 121.980 |
| 322 | Breathe | Breathe | 6TR0FGw4zhlGbQALN065AI | CamelPhat | 194232 | 68 | 0.595 | 0.877 | 9 | -7.414 | 0 | 0.0442 | 0.006460 | 0.0879 | 0.3980 | 0.0810 | 125.015 |
| 323 | Devil's Water - Reprise | Devil's Water | 5roFk0NrkH8tirMrgoYKAF | Rennie Foster | 310073 | 14 | 0.199 | 0.334 | 1 | -10.137 | 0 | 0.0380 | 0.691000 | 0.0156 | 0.2260 | 0.0841 | 85.527 |


## Creating Similarity Scores

### Scaler Testing
At the beginning I used StandardScaling before creating the similarity scores. I switched to MinMaxScaling because it only rescales each feature to the range 0 to 1 without centering it, and most features already lie between 0 and 1.

Below are the remains of `create_similarity_score`, modified to use the MinMaxScaler, from **before** MinMax scaling was introduced into the spotifuncs library.

The code below contains many commented-out lines from this test. I created recommendations with both techniques and compared the results. Due to the nature of the task, I ultimately had to judge manually whether there were any differences between the two methods.

In the end I opted for MinMaxScaling. It makes more sense from a theoretical standpoint and gave results at least as good as StandardScaling.

In [19]:
##testing whether MinMax Scaling works better than StandardScaling
## MinMax has been adopted for the spotifuncs function
# from sklearn.preprocessing import MinMaxScaler
# from sklearn.metrics.pairwise import cosine_similarity, linear_kernel
# def test_create_similarity_score(df1,df2,similarity_score = "cosine_sim"):
#     """ 
#     Creates a similarity matrix for the audio features (except key and mode) of two Dataframes.

#     Parameters
#     ----------
#     df1 : DataFrame containing track_name,track_id, artist,album,duration,popularity
#             and all audio features
#     df2 : DataFrame containing track_name,track_id, artist,album,duration,popularity
#             and all audio features
    
#     similarity_score: similarity measure (linear,cosine_sim)

#     Returns
#     -------
#     A matrix of similarity scores for the audio features of both DataFrames.
#     """
    
#     assert list(df1.columns[6:]) == list(df2.columns[6:]), "dataframes need to contain the same columns"
#     features = list(df1.columns[6:])
#     features.remove('key')
#     features.remove('mode')
#     df_features1,df_features2 = df1[features],df2[features]
#     scaler = MinMaxScaler()
#     # note: fit_transform is called once per DataFrame, so each one is scaled to its own min/max
#     df_features_scaled1,df_features_scaled2 = scaler.fit_transform(df_features1),scaler.fit_transform(df_features2)
#     if similarity_score == "linear":
#         linear_sim = linear_kernel(df_features_scaled1, df_features_scaled2)
#         return linear_sim
#     elif similarity_score == "cosine_sim":
#         cosine_sim = cosine_similarity(df_features_scaled1, df_features_scaled2)
#         return cosine_sim
#     #other measures may be implemented in the future
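In the test function above, `fit_transform` is called separately for each DataFrame, so each one is scaled to its own minimum and maximum, and the same raw value can end up at different scaled values in the two matrices. A NumPy-only sketch of the alternative fits one shared per-feature range on both frames before computing cosine similarities. This is a design choice, not necessarily what spotifuncs does; with scikit-learn the equivalent would be fitting `MinMaxScaler` once on the stacked data. `minmax_scale_jointly` and `cosine_similarity_matrix` are hypothetical names.

```python
import numpy as np


def minmax_scale_jointly(a, b):
    """Min-max scale two feature matrices with one shared per-feature range.

    Fitting a single min/max on the stacked data keeps both matrices on the
    same scale, so equal raw values map to equal scaled values.
    """
    stacked = np.vstack([a, b]).astype(float)
    lo, hi = stacked.min(axis=0), stacked.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero for constant features
    return (a - lo) / span, (b - lo) / span


def cosine_similarity_matrix(a, b):
    """Return a (len(a), len(b)) matrix of cosine similarities between rows."""
    a_norm = np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = np.linalg.norm(b, axis=1, keepdims=True)
    # Rows that are all zeros get similarity 0 instead of dividing by zero.
    a_unit = np.divide(a, a_norm, out=np.zeros_like(a, dtype=float), where=a_norm > 0)
    b_unit = np.divide(b, b_norm, out=np.zeros_like(b, dtype=float), where=b_norm > 0)
    return a_unit @ b_unit.T
```

With `features` being the audio feature names except `key` and `mode`, `cosine_similarity_matrix(*minmax_scale_jointly(redlight_df[features].to_numpy(), recomms_df[features].to_numpy()))` returns one row per playlist track and one column per recommendation, like `similarity_score`.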


In [18]:
similarity_score = create_similarity_score(redlight_df,recomms_df)


In [19]:
#test_similarity_score = test_create_similarity_score(redlight_df,recomms_df)

In [20]:
similarity_score.shape#, test_similarity_score.shape

(66, 325)

Here I retrieve the index of the most similar recommended song for each song of the original playlist, then keep these songs as a more refined list of recommendations.

In [21]:
[np.argmax(i) for i in similarity_score]#, [np.argmax(i) for i in test_similarity_score]

[109, 60, 14, 295, 307, 34, 185, 83, 131, 91, 4,
 267, 267, 68, 303, 221, 86, 22, 185, 184, 129, 267,
 207, 267, 307, 251, 251, 106, 21, 179, 165, 303, 218,
 251, 298, 185, 40, 113, 307, 190, 182, 271, 154, 15,
 192, 22, 198, 66, 122, 185, 210, 181, 251, 211, 49,
 247, 295, 295, 299, 189, 288, 322, 211, 254, 36, 185]

In [22]:
final_recomms = recomms_df.iloc[[np.argmax(i) for i in similarity_score]]
final_recomms = final_recomms.drop_duplicates().reset_index(drop = True)
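The same selection can be written with a vectorized `argmax` and by deduplicating the chosen positions before selecting rows. A sketch with a hypothetical helper `best_match_per_row`; note that `argmax` returns the first column on ties:

```python
import numpy as np
import pandas as pd


def best_match_per_row(similarity, candidates):
    """Keep, for each row of `similarity`, the single most similar candidate.

    similarity: array of shape (n_playlist_tracks, n_candidates)
    candidates: DataFrame whose rows are in the same order as the columns
    """
    best = np.asarray(similarity).argmax(axis=1)  # one column index per playlist track
    # pd.unique keeps the first occurrence of each position, in order of appearance.
    unique_best = pd.unique(best)
    return candidates.iloc[unique_best].reset_index(drop=True)
```

Usage would be `final_recomms = best_match_per_row(similarity_score, recomms_df)`.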

In [23]:
#test_final_recomms = recomms_df.iloc[[np.argmax(i) for i in test_similarity_score]]
#test_final_recomms = test_final_recomms.drop_duplicates().reset_index(drop = True)

In [24]:
final_recomms

| | track_name | album | track_id | artist | duration | popularity | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Flicker - Mixed | fabric presents Bonobo (DJ Mix) | 6jhm7KeidwkYSop3tWEpCR | Bonobo | 337653 | 41 | 0.68 | 0.719 | 6 | -10.719 | 1 | 0.043 | 0.0379 | 0.788 | 0.0839 | 0.0883 | 121.006 |
| 1 | Monogamy - Original Mix | Monogamy/Polygamy EP | 1Fjt2qzntJQN3yuuzF4iZK | Sascha Braemer | 370730 | 39 | 0.637 | 0.71 | 2 | -11.281 | 1 | 0.0338 | 0.000141 | 0.89 | 0.209 | 0.0948 | 120.986 |
| 2 | Opus - Four Tet Remix | Opus (Four Tet Remix) | 3Iw9Nr3rmMM7L4FjSV7DEB | Eric Prydz | 598928 | 40 | 0.602 | 0.764 | 6 | -8.928 | 0 | 0.0344 | 0.00202 | 0.904 | 0.12 | 0.141 | 125.978 |
| 3 | Battery Point | Beak> | 0lf0PVp2zmn7lL2DpPjEtt | Beak> | 430184 | 21 | 0.157 | 0.636 | 2 | -9.627 | 0 | 0.0377 | 0.0404 | 0.888 | 0.112 | 0.178 | 140.318 |
| 4 | Lion - Jamie XX Remix | Pink Remixes | 48MEgryHl7JrjZC4ZOQmui | Four Tet | 428500 | 41 | 0.802 | 0.497 | 10 | -15.477 | 0 | 0.0647 | 0.125 | 0.847 | 0.348 | 0.07 | 120.019 |
| 5 | House Arrest - Chris Lorenzo Remix | House Arrest (Chris Lorenzo Remix) | 5x58wIDdtxADXRZoRxKSSO | Sofi Tukker | 218709 | 50 | 0.752 | 0.811 | 8 | -7.429 | 1 | 0.0376 | 0.00536 | 0.701 | 0.0879 | 0.283 | 124.003 |
| 6 | Coffee & Dub - Original Mix | Techno For Breakfast EP | 6AQCOXUWAaB7rSq0Ms2l4U | Sonitus Eco | 424000 | 3 | 0.597 | 0.543 | 1 | -15.579 | 1 | 0.0405 | 0.17 | 0.908 | 0.262 | 0.113 | 125.991 |
| 7 | The End Of It All 2015 Mix | The End Of It All 2015 mix | 0mYY8g0ikVTIi8cZndMGtb | John Tejada | 397800 | 25 | 0.628 | 0.724 | 6 | -8.045 | 0 | 0.0374 | 0.00326 | 0.949 | 0.206 | 0.512 | 126.003 |
| 8 | Future Days - Hey! Douglas Remix | Future Days (Remixes) | 2gKrcGKOIhJNhd74kEcWtx | islandman | 309691 | 43 | 0.802 | 0.76 | 7 | -7.492 | 1 | 0.0586 | 0.012 | 0.692 | 0.0955 | 0.494 | 106.005 |
| 9 | Bitter Fater - Radio Edit | Bitter Fate (Radio Edit) | 0IBfTHJvcSVMaOEfPBi2Oj | Edu Imbernon | 330243 | 19 | 0.618 | 0.912 | 1 | -7.215 | 1 | 0.0406 | 0.00538 | 0.764 | 0.123 | 0.693 | 121.996 |


In [25]:
#test_final_recomms

In [26]:
len(final_recomms)

49

Because of the "packaged" recommendation queries, some recommended songs may already be in the playlist. These are excluded in the next steps.

In [27]:
final_recomms[final_recomms["track_name"].isin(redlight_df["track_name"])]

| | track_name | album | track_id | artist | duration | popularity | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 28 | Ordinary - Lake People's Circle Motive Remix | Ordinary | 5esTnUp3cLgKsU2Etci2Ao | Kollektiv Turmstrasse | 499282 | 45 | 0.73 | 0.488 | 11 | -9.117 | 0 | 0.0457 | 0.19 | 0.864 | 0.0527 | 0.097 | 121.003 |
| 40 | Miss You | The Last Resort | 4WTmtPRtIpjzgwBbQsMYyo | trentemøller | 247817 | 54 | 0.542 | 0.16 | 7 | -21.672 | 0 | 0.0365 | 0.946 | 0.928 | 0.263 | 0.198 | 126.921 |


In [29]:
#filter again so tracks are not already in playlist_df
final_recomms = final_recomms[~final_recomms["track_name"].isin(redlight_df["track_name"])]
final_recomms.reset_index(drop = True, inplace = True)
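Matching on `track_name` alone also drops an unrelated song that merely shares a title with a playlist track (e.g. two different songs called "Breathe"). A sketch, using a hypothetical `drop_known_tracks` helper, that compares (track_name, artist) pairs instead; it assumes the `artist` column is filled the same way in both DataFrames:

```python
import pandas as pd


def drop_known_tracks(recommendations, playlist):
    """Drop recommendations whose (track_name, artist) pair is already in the playlist."""
    known = set(zip(playlist["track_name"], playlist["artist"]))
    in_playlist = pd.Series(
        [pair in known for pair in zip(recommendations["track_name"],
                                       recommendations["artist"])],
        index=recommendations.index,
        dtype=bool,
    )
    return recommendations[~in_playlist].reset_index(drop=True)
```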

In [30]:
#filter again so tracks are not already in playlist_df
#test_final_recomms = test_final_recomms[~test_final_recomms["track_name"].isin(redlight_df["track_name"])]
#test_final_recomms.reset_index(drop = True, inplace = True)

In [31]:
final_recomms

| | track_name | album | track_id | artist | duration | popularity | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Flicker - Mixed | fabric presents Bonobo (DJ Mix) | 6jhm7KeidwkYSop3tWEpCR | Bonobo | 337653 | 41 | 0.68 | 0.719 | 6 | -10.719 | 1 | 0.043 | 0.0379 | 0.788 | 0.0839 | 0.0883 | 121.006 |
| 1 | Monogamy - Original Mix | Monogamy/Polygamy EP | 1Fjt2qzntJQN3yuuzF4iZK | Sascha Braemer | 370730 | 39 | 0.637 | 0.71 | 2 | -11.281 | 1 | 0.0338 | 0.000141 | 0.89 | 0.209 | 0.0948 | 120.986 |
| 2 | Opus - Four Tet Remix | Opus (Four Tet Remix) | 3Iw9Nr3rmMM7L4FjSV7DEB | Eric Prydz | 598928 | 40 | 0.602 | 0.764 | 6 | -8.928 | 0 | 0.0344 | 0.00202 | 0.904 | 0.12 | 0.141 | 125.978 |
| 3 | Battery Point | Beak> | 0lf0PVp2zmn7lL2DpPjEtt | Beak> | 430184 | 21 | 0.157 | 0.636 | 2 | -9.627 | 0 | 0.0377 | 0.0404 | 0.888 | 0.112 | 0.178 | 140.318 |
| 4 | Lion - Jamie XX Remix | Pink Remixes | 48MEgryHl7JrjZC4ZOQmui | Four Tet | 428500 | 41 | 0.802 | 0.497 | 10 | -15.477 | 0 | 0.0647 | 0.125 | 0.847 | 0.348 | 0.07 | 120.019 |
| 5 | House Arrest - Chris Lorenzo Remix | House Arrest (Chris Lorenzo Remix) | 5x58wIDdtxADXRZoRxKSSO | Sofi Tukker | 218709 | 50 | 0.752 | 0.811 | 8 | -7.429 | 1 | 0.0376 | 0.00536 | 0.701 | 0.0879 | 0.283 | 124.003 |
| 6 | Coffee & Dub - Original Mix | Techno For Breakfast EP | 6AQCOXUWAaB7rSq0Ms2l4U | Sonitus Eco | 424000 | 3 | 0.597 | 0.543 | 1 | -15.579 | 1 | 0.0405 | 0.17 | 0.908 | 0.262 | 0.113 | 125.991 |
| 7 | The End Of It All 2015 Mix | The End Of It All 2015 mix | 0mYY8g0ikVTIi8cZndMGtb | John Tejada | 397800 | 25 | 0.628 | 0.724 | 6 | -8.045 | 0 | 0.0374 | 0.00326 | 0.949 | 0.206 | 0.512 | 126.003 |
| 8 | Future Days - Hey! Douglas Remix | Future Days (Remixes) | 2gKrcGKOIhJNhd74kEcWtx | islandman | 309691 | 43 | 0.802 | 0.76 | 7 | -7.492 | 1 | 0.0586 | 0.012 | 0.692 | 0.0955 | 0.494 | 106.005 |
| 9 | Bitter Fater - Radio Edit | Bitter Fate (Radio Edit) | 0IBfTHJvcSVMaOEfPBi2Oj | Edu Imbernon | 330243 | 19 | 0.618 | 0.912 | 1 | -7.215 | 1 | 0.0406 | 0.00538 | 0.764 | 0.123 | 0.693 | 121.996 |


In [32]:
#test_final_recomms

In [31]:
# add both result sets to separate test playlists and compare them
# sp_m.user_playlist_add_tracks(usernames[0],
#                              playlist_id="spotify:playlist:36MtjIS6lPXT7Q97HieR9g",
#                              tracks = final_recomms["track_id"].tolist())

{'snapshot_id': 'NCw1NGJlZmJhNTdiZGRmNTc4NThkYjkxODgyMjE4OTU0MGZmNDUxYWE2'}

In [32]:
# sp_m.user_playlist_add_tracks(usernames[0],
#                              playlist_id="spotify:playlist:1ypSXCaY044CeRD98pteRc",
#                              tracks = test_final_recomms["track_id"].tolist())

{'snapshot_id': 'Miw2MzAwZTcwZjk0YTU2OTNhNGUxMmI1YjQ2ZWVhMmFmOTZmZTdhNzM3'}

## Filtering with mean song

For very homogeneous playlists it can make sense (at least that is what my experiments show) to filter the recommendations again based on the average song of the playlist (as mentioned earlier).

The process is the same as filtering the recommendations by similarity to the existing songs (see above), only now for a single "song": the mean song. I have written a function for this task, which can be found in spotifuncs.py.

In [41]:
##adopted into spotifuncs
# def test_filter_with_meansong(mean_song,recommendations_df, n_recommendations = 10):
#     features = list(mean_song.columns[6:])
#     features.remove("key")
#     features.remove("mode")
#     mean_song_feat = mean_song[features].values
#     mean_song_scaled = MinMaxScaler().fit_transform(mean_song_feat.reshape(-1,1))
#     recommendations_df_scaled = MinMaxScaler().fit_transform(recommendations_df[features])
#     mean_song_scaled = mean_song_scaled.reshape(1,-1)
#     sim_mean_finrecomms = cosine_similarity(mean_song_scaled,recommendations_df_scaled)[0][:]
#     #sim_mean_finrecomms = sim_mean_finrecomms[0][:]
#     indices = (-sim_mean_finrecomms).argsort()[:n_recommendations]
#     final_recommendations = recommendations_df.iloc[indices]
#     return final_recommendations
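In the commented version above, the mean song's feature vector is reshaped into a single column before `MinMaxScaler` is applied, so its features are scaled against each other rather than per feature, as the recommendations are. A NumPy sketch (hypothetical `filter_by_mean_song`, not the spotifuncs implementation) that instead scales the mean song with each feature's minimum and maximum taken from the recommendations, then keeps the `n_recommendations` most cosine-similar tracks:

```python
import numpy as np


def filter_by_mean_song(mean_song, recommendations, features, n_recommendations=10):
    """Return the n recommendations closest (cosine) to a one-row mean song.

    The mean song is scaled with the per-feature min/max of the
    recommendations, so both are on the same scale.
    """
    rec = recommendations[features].to_numpy(dtype=float)
    mean = mean_song[features].to_numpy(dtype=float).reshape(-1)
    lo, hi = rec.min(axis=0), rec.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # constant features would divide by zero
    rec_scaled = (rec - lo) / span
    # A mean outside the recommendations' range can land slightly outside 0-1.
    mean_scaled = (mean - lo) / span
    norms = np.linalg.norm(rec_scaled, axis=1) * np.linalg.norm(mean_scaled)
    sims = np.divide(rec_scaled @ mean_scaled, norms,
                     out=np.zeros(len(rec)), where=norms > 0)
    # Highest similarity first; a stable sort keeps the original order on ties.
    order = np.argsort(-sims, kind="stable")[:n_recommendations]
    return recommendations.iloc[order]
```

With `features` being the audio features except `key` and `mode`, a call would look like `filter_by_mean_song(mean_song, final_recomms, features)`.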

## After these filtering steps:

Some songs are "correct" and would also be part of a manual selection; others, however, are wrong, e.g. because they contain too many vocals.

**Manual filtering:** I had the idea that optional manual filters, based on user input and on attributes such as instrumentalness, may be helpful (possibly applied before the mean-song filter).

In [33]:
#manual filtering
final_recomms.columns

Index(['track_name', 'album', 'track_id', 'artist', 'duration', 'popularity',
       'danceability', 'energy', 'key', 'loudness', 'mode', 'speechiness',
       'acousticness', 'instrumentalness', 'liveness', 'valence', 'tempo'],
      dtype='object')

In [34]:
final_recomms.describe()

| | duration | popularity | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 47.0 | 47.0 | 47.0 | 47.0 | 47.0 | 47.0 | 47.0 | 47.0 | 47.0 | 47.0 | 47.0 | 47.0 | 47.0 |
| mean | 367591.914894 | 35.085106 | 0.684149 | 0.654 | 4.808511 | -10.440596 | 0.510638 | 0.059643 | 0.105349 | 0.801525 | 0.215564 | 0.236402 | 120.043149 |
| std | 96704.915672 | 15.474846 | 0.148763 | 0.172425 | 3.234584 | 3.18357 | 0.505291 | 0.037024 | 0.174817 | 0.188701 | 0.182506 | 0.180407 | 12.038956 |
| min | 148801.0 | 0.0 | 0.157 | 0.345 | 0.0 | -16.61 | 0.0 | 0.0286 | 5.1e-05 | 0.00576 | 0.0579 | 0.0379 | 90.044 |
| 25% | 315463.0 | 25.5 | 0.622 | 0.497 | 2.0 | -12.372 | 0.0 | 0.04035 | 0.00537 | 0.7975 | 0.1035 | 0.09155 | 115.011 |
| 50% | 382767.0 | 37.0 | 0.728 | 0.678 | 5.0 | -10.713 | 1.0 | 0.0492 | 0.0343 | 0.868 | 0.118 | 0.178 | 121.996 |
| 75% | 425509.5 | 45.0 | 0.7995 | 0.782 | 7.0 | -8.0365 | 1.0 | 0.06525 | 0.1115 | 0.891 | 0.305 | 0.3355 | 124.498 |
| max | 598928.0 | 68.0 | 0.875 | 0.95 | 11.0 | -2.216 | 1.0 | 0.229 | 0.71 | 0.966 | 0.858 | 0.693 | 161.191 |


I will limit the manual filtering based on attributes to:
- speechiness
- acousticness
- instrumentalness
- liveness

as those can be interpreted with relative ease, and a "high" or "low" value of these features in a song makes intuitive sense.

In [2]:
##function in spotifuncs
# def feature_filter(df,feature, high = True):
#     assert feature in ["speechiness","acousticness",
#                        "instrumentalness","liveness"], "feature must be one of the following: speechiness,acousticness,instrumentalness,liveness"
#     x = 0.9 if high == True else 0.1
#     df = df[df[feature] > x] if high == True else df[df[feature] < x]
#     return df
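Since the idea is to drive these filters from user input, a small sketch (hypothetical `apply_feature_filters`, not part of spotifuncs) applies several high/low choices in sequence, reusing the 0.9 and 0.1 thresholds of `feature_filter`:

```python
ALLOWED = {"speechiness", "acousticness", "instrumentalness", "liveness"}


def apply_feature_filters(df, choices, high=0.9, low=0.1):
    """Apply several high/low feature filters in sequence.

    choices maps a feature name to "high" or "low", e.g. collected from user input.
    """
    for feature, level in choices.items():
        if feature not in ALLOWED:
            raise ValueError(f"unsupported feature: {feature}")
        if level == "high":
            df = df[df[feature] > high]
        elif level == "low":
            df = df[df[feature] < low]
        else:
            raise ValueError(f"level must be 'high' or 'low', got {level!r}")
    return df
```

For example, `apply_feature_filters(final_recomms, {"instrumentalness": "high", "speechiness": "low"})` keeps only very instrumental tracks with little speech.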

In [35]:
feature_filter(final_recomms,feature = "speechiness")
        

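| | track_name | album | track_id | artist | duration | popularity | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|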


In [36]:
feature_filter(final_recomms,feature = "speechiness", high = False)

| | track_name | album | track_id | artist | duration | popularity | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Flicker - Mixed | fabric presents Bonobo (DJ Mix) | 6jhm7KeidwkYSop3tWEpCR | Bonobo | 337653 | 41 | 0.68 | 0.719 | 6 | -10.719 | 1 | 0.043 | 0.0379 | 0.788 | 0.0839 | 0.0883 | 121.006 |
| 1 | Monogamy - Original Mix | Monogamy/Polygamy EP | 1Fjt2qzntJQN3yuuzF4iZK | Sascha Braemer | 370730 | 39 | 0.637 | 0.71 | 2 | -11.281 | 1 | 0.0338 | 0.000141 | 0.89 | 0.209 | 0.0948 | 120.986 |
| 2 | Opus - Four Tet Remix | Opus (Four Tet Remix) | 3Iw9Nr3rmMM7L4FjSV7DEB | Eric Prydz | 598928 | 40 | 0.602 | 0.764 | 6 | -8.928 | 0 | 0.0344 | 0.00202 | 0.904 | 0.12 | 0.141 | 125.978 |
| 3 | Battery Point | Beak> | 0lf0PVp2zmn7lL2DpPjEtt | Beak> | 430184 | 21 | 0.157 | 0.636 | 2 | -9.627 | 0 | 0.0377 | 0.0404 | 0.888 | 0.112 | 0.178 | 140.318 |
| 4 | Lion - Jamie XX Remix | Pink Remixes | 48MEgryHl7JrjZC4ZOQmui | Four Tet | 428500 | 41 | 0.802 | 0.497 | 10 | -15.477 | 0 | 0.0647 | 0.125 | 0.847 | 0.348 | 0.07 | 120.019 |
| 5 | House Arrest - Chris Lorenzo Remix | House Arrest (Chris Lorenzo Remix) | 5x58wIDdtxADXRZoRxKSSO | Sofi Tukker | 218709 | 50 | 0.752 | 0.811 | 8 | -7.429 | 1 | 0.0376 | 0.00536 | 0.701 | 0.0879 | 0.283 | 124.003 |
| 6 | Coffee & Dub - Original Mix | Techno For Breakfast EP | 6AQCOXUWAaB7rSq0Ms2l4U | Sonitus Eco | 424000 | 3 | 0.597 | 0.543 | 1 | -15.579 | 1 | 0.0405 | 0.17 | 0.908 | 0.262 | 0.113 | 125.991 |
| 7 | The End Of It All 2015 Mix | The End Of It All 2015 mix | 0mYY8g0ikVTIi8cZndMGtb | John Tejada | 397800 | 25 | 0.628 | 0.724 | 6 | -8.045 | 0 | 0.0374 | 0.00326 | 0.949 | 0.206 | 0.512 | 126.003 |
| 8 | Future Days - Hey! Douglas Remix | Future Days (Remixes) | 2gKrcGKOIhJNhd74kEcWtx | islandman | 309691 | 43 | 0.802 | 0.76 | 7 | -7.492 | 1 | 0.0586 | 0.012 | 0.692 | 0.0955 | 0.494 | 106.005 |
| 9 | Bitter Fater - Radio Edit | Bitter Fate (Radio Edit) | 0IBfTHJvcSVMaOEfPBi2Oj | Edu Imbernon | 330243 | 19 | 0.618 | 0.912 | 1 | -7.215 | 1 | 0.0406 | 0.00538 | 0.764 | 0.123 | 0.693 | 121.996 |


In [37]:
feature_filter(final_recomms,feature = "acousticness")

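| | track_name | album | track_id | artist | duration | popularity | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|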


In [38]:
feature_filter(final_recomms,feature = "acousticness", high = False)

| | track_name | album | track_id | artist | duration | popularity | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Flicker - Mixed | fabric presents Bonobo (DJ Mix) | 6jhm7KeidwkYSop3tWEpCR | Bonobo | 337653 | 41 | 0.68 | 0.719 | 6 | -10.719 | 1 | 0.043 | 0.0379 | 0.788 | 0.0839 | 0.0883 | 121.006 |
| 1 | Monogamy - Original Mix | Monogamy/Polygamy EP | 1Fjt2qzntJQN3yuuzF4iZK | Sascha Braemer | 370730 | 39 | 0.637 | 0.71 | 2 | -11.281 | 1 | 0.0338 | 0.000141 | 0.89 | 0.209 | 0.0948 | 120.986 |
| 2 | Opus - Four Tet Remix | Opus (Four Tet Remix) | 3Iw9Nr3rmMM7L4FjSV7DEB | Eric Prydz | 598928 | 40 | 0.602 | 0.764 | 6 | -8.928 | 0 | 0.0344 | 0.00202 | 0.904 | 0.12 | 0.141 | 125.978 |
| 3 | Battery Point | Beak> | 0lf0PVp2zmn7lL2DpPjEtt | Beak> | 430184 | 21 | 0.157 | 0.636 | 2 | -9.627 | 0 | 0.0377 | 0.0404 | 0.888 | 0.112 | 0.178 | 140.318 |
| 5 | House Arrest - Chris Lorenzo Remix | House Arrest (Chris Lorenzo Remix) | 5x58wIDdtxADXRZoRxKSSO | Sofi Tukker | 218709 | 50 | 0.752 | 0.811 | 8 | -7.429 | 1 | 0.0376 | 0.00536 | 0.701 | 0.0879 | 0.283 | 124.003 |
| 7 | The End Of It All 2015 Mix | The End Of It All 2015 mix | 0mYY8g0ikVTIi8cZndMGtb | John Tejada | 397800 | 25 | 0.628 | 0.724 | 6 | -8.045 | 0 | 0.0374 | 0.00326 | 0.949 | 0.206 | 0.512 | 126.003 |
| 8 | Future Days - Hey! Douglas Remix | Future Days (Remixes) | 2gKrcGKOIhJNhd74kEcWtx | islandman | 309691 | 43 | 0.802 | 0.76 | 7 | -7.492 | 1 | 0.0586 | 0.012 | 0.692 | 0.0955 | 0.494 | 106.005 |
| 9 | Bitter Fater - Radio Edit | Bitter Fate (Radio Edit) | 0IBfTHJvcSVMaOEfPBi2Oj | Edu Imbernon | 330243 | 19 | 0.618 | 0.912 | 1 | -7.215 | 1 | 0.0406 | 0.00538 | 0.764 | 0.123 | 0.693 | 121.996 |
| 10 | All I Know | All I Know | 6tr6I3YJnHUqKHfJXnm4jk | EDX | 148801 | 52 | 0.717 | 0.944 | 1 | -5.041 | 1 | 0.0386 | 0.00518 | 0.477 | 0.113 | 0.354 | 124.993 |
| 11 | Kanun - Original Mix | Kanun | 3Pd1HEol9zpq1hXQTewJSP | Jacob Groening | 366888 | 54 | 0.858 | 0.507 | 0 | -12.317 | 1 | 0.0691 | 0.0474 | 0.894 | 0.118 | 0.0533 | 114.004 |


In [39]:
feature_filter(final_recomms,feature = "instrumentalness")

| | track_name | album | track_id | artist | duration | popularity | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | Opus - Four Tet Remix | Opus (Four Tet Remix) | 3Iw9Nr3rmMM7L4FjSV7DEB | Eric Prydz | 598928 | 40 | 0.602 | 0.764 | 6 | -8.928 | 0 | 0.0344 | 0.00202 | 0.904 | 0.12 | 0.141 | 125.978 |
| 6 | Coffee & Dub - Original Mix | Techno For Breakfast EP | 6AQCOXUWAaB7rSq0Ms2l4U | Sonitus Eco | 424000 | 3 | 0.597 | 0.543 | 1 | -15.579 | 1 | 0.0405 | 0.17 | 0.908 | 0.262 | 0.113 | 125.991 |
| 7 | The End Of It All 2015 Mix | The End Of It All 2015 mix | 0mYY8g0ikVTIi8cZndMGtb | John Tejada | 397800 | 25 | 0.628 | 0.724 | 6 | -8.045 | 0 | 0.0374 | 0.00326 | 0.949 | 0.206 | 0.512 | 126.003 |
| 16 | Premium Emo - Original Mix | Anjunadeep 06 | 5dTuEVETmQ15gP2M8E5I45 | 16BL | 465533 | 32 | 0.626 | 0.8 | 3 | -8.028 | 1 | 0.0286 | 0.437 | 0.966 | 0.0891 | 0.157 | 121.998 |
| 17 | Palm Tree Memories - n'to Remix | Palm Tree Memories | 7kbaX2gvfPayg8ZmQSmnst | Oliver Schories | 419514 | 35 | 0.764 | 0.35 | 4 | -13.974 | 0 | 0.0721 | 0.177 | 0.943 | 0.165 | 0.299 | 122.004 |
| 22 | Just | Brightest Lights | 6BHx1NCBASduR31WNTzm5S | Lane 8 | 290731 | 51 | 0.656 | 0.565 | 9 | -11.195 | 0 | 0.0323 | 0.481 | 0.951 | 0.313 | 0.21 | 122.934 |
| 27 | Leafs | Hinterland | 16jbjK5jP3DpGJqbbtdwrJ | Recondite | 312580 | 32 | 0.788 | 0.345 | 4 | -14.106 | 1 | 0.0572 | 0.115 | 0.928 | 0.116 | 0.189 | 109.998 |
| 34 | Davos | Unsurfaced | 47GeOpdqcvKuck5xL13JCo | Pablo Bolivar | 400236 | 11 | 0.792 | 0.627 | 5 | -13.091 | 0 | 0.0587 | 0.0783 | 0.92 | 0.106 | 0.317 | 123.012 |


In [40]:
feature_filter(final_recomms,feature = "instrumentalness", high = False)

| | track_name | album | track_id | artist | duration | popularity | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 41 | Joanna | Joanna | 28YPPPSlRUUF14Nhsp9BFP | Mord Fustang | 420000 | 23 | 0.554 | 0.913 | 9 | -6.764 | 0 | 0.0624 | 0.000895 | 0.00576 | 0.0982 | 0.4 | 127.99 |
| 44 | Breathe | Breathe | 6TR0FGw4zhlGbQALN065AI | CamelPhat | 194232 | 68 | 0.595 | 0.877 | 9 | -7.414 | 0 | 0.0442 | 0.00646 | 0.0879 | 0.398 | 0.081 | 125.015 |


In [41]:
feature_filter(final_recomms,feature = "liveness")

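| | track_name | album | track_id | artist | duration | popularity | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|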


In [42]:
feature_filter(final_recomms,feature = "liveness", high = False)

| | track_name | album | track_id | artist | duration | popularity | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Flicker - Mixed | fabric presents Bonobo (DJ Mix) | 6jhm7KeidwkYSop3tWEpCR | Bonobo | 337653 | 41 | 0.68 | 0.719 | 6 | -10.719 | 1 | 0.043 | 0.0379 | 0.788 | 0.0839 | 0.0883 | 121.006 |
| 5 | House Arrest - Chris Lorenzo Remix | House Arrest (Chris Lorenzo Remix) | 5x58wIDdtxADXRZoRxKSSO | Sofi Tukker | 218709 | 50 | 0.752 | 0.811 | 8 | -7.429 | 1 | 0.0376 | 0.00536 | 0.701 | 0.0879 | 0.283 | 124.003 |
| 8 | Future Days - Hey! Douglas Remix | Future Days (Remixes) | 2gKrcGKOIhJNhd74kEcWtx | islandman | 309691 | 43 | 0.802 | 0.76 | 7 | -7.492 | 1 | 0.0586 | 0.012 | 0.692 | 0.0955 | 0.494 | 106.005 |
| 13 | Pnom Gobal | Thora Vukk | 6sWsFzAQ8OHGxjgRlkIou2 | Robag Wruhme | 318346 | 30 | 0.825 | 0.45 | 7 | -13.316 | 0 | 0.0778 | 0.00421 | 0.879 | 0.0969 | 0.233 | 121.985 |
| 16 | Premium Emo - Original Mix | Anjunadeep 06 | 5dTuEVETmQ15gP2M8E5I45 | 16BL | 465533 | 32 | 0.626 | 0.8 | 3 | -8.028 | 1 | 0.0286 | 0.437 | 0.966 | 0.0891 | 0.157 | 121.998 |
| 21 | Customer Is King | Customer Is King EP | 1SWyGZhn3nyLUZRfWvQ0to | Solomun | 476237 | 44 | 0.797 | 0.497 | 2 | -9.977 | 1 | 0.0658 | 0.0099 | 0.784 | 0.0682 | 0.0625 | 123.043 |
| 28 | The Blind Navigator - Extended Mix | The Blind Navigator / Like Clockwork | 7cNTILvUTNXwueXOaoUBMK | Kasper Koman | 475485 | 31 | 0.808 | 0.866 | 7 | -10.713 | 1 | 0.0415 | 0.0588 | 0.81 | 0.0579 | 0.693 | 122.016 |
| 31 | Ataraxia | Ataraxia | 41WfYLOlpjo47X3bq6n8LI | Landikhan | 395632 | 26 | 0.79 | 0.678 | 0 | -11.821 | 1 | 0.0492 | 0.0619 | 0.819 | 0.0889 | 0.423 | 117.993 |
| 40 | Grey Veils | Grey Veils | 1lXbVK6pYYRxcDcBLaPME2 | Chainless | 190280 | 43 | 0.402 | 0.828 | 10 | -2.216 | 0 | 0.229 | 0.00687 | 0.87 | 0.082 | 0.164 | 161.191 |
| 41 | Joanna | Joanna | 28YPPPSlRUUF14Nhsp9BFP | Mord Fustang | 420000 | 23 | 0.554 | 0.913 | 9 | -6.764 | 0 | 0.0624 | 0.000895 | 0.00576 | 0.0982 | 0.4 | 127.99 |


## Feature Filter Results
The feature filter seems to work relatively well. The playlist used for testing here is very instrumental and electronic, so some DataFrames have no entries after a filter is applied; this was somewhat expected.

## Apply the filtering with mean song


In [43]:
final_recomms = filter_with_meansong(mean_song,final_recomms)

In [44]:
#testScaling_final_recomms = test_filter_with_meansong(mean_song,test_final_recomms)

In [45]:
#testScaling_final_recomms

In [46]:
#test_final_recomms[test_final_recomms["track_name"] == "Pandora"]

In [47]:
final_recomms

| | track_name | album | track_id | artist | duration | popularity | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | Battery Point | Beak> | 0lf0PVp2zmn7lL2DpPjEtt | Beak> | 430184 | 21 | 0.157 | 0.636 | 2 | -9.627 | 0 | 0.0377 | 0.0404 | 0.888 | 0.112 | 0.178 | 140.318 |
| 40 | Grey Veils | Grey Veils | 1lXbVK6pYYRxcDcBLaPME2 | Chainless | 190280 | 43 | 0.402 | 0.828 | 10 | -2.216 | 0 | 0.229 | 0.00687 | 0.87 | 0.082 | 0.164 | 161.191 |
| 18 | Between The Lines | Oddyssey | 0JWL4WXYrDdg2HgN5zz6bj | Amtrac | 277717 | 37 | 0.486 | 0.926 | 2 | -8.407 | 1 | 0.0435 | 0.0113 | 0.708 | 0.113 | 0.0379 | 139.999 |
| 6 | Coffee & Dub - Original Mix | Techno For Breakfast EP | 6AQCOXUWAaB7rSq0Ms2l4U | Sonitus Eco | 424000 | 3 | 0.597 | 0.543 | 1 | -15.579 | 1 | 0.0405 | 0.17 | 0.908 | 0.262 | 0.113 | 125.991 |
| 12 | Siren | Siren | 5umBsHgpB2WsRA9ccQZGdz | Tourist | 382767 | 52 | 0.34 | 0.7 | 1 | -7.998 | 1 | 0.0446 | 0.0741 | 0.892 | 0.103 | 0.0388 | 127.704 |
| 39 | Castles In The Sky | Castles In The Sky | 4xZax8srPgsXV2mpMerqEk | i_o | 223280 | 65 | 0.68 | 0.95 | 8 | -4.595 | 1 | 0.182 | 0.00203 | 0.538 | 0.552 | 0.14 | 134.982 |
| 41 | Joanna | Joanna | 28YPPPSlRUUF14Nhsp9BFP | Mord Fustang | 420000 | 23 | 0.554 | 0.913 | 9 | -6.764 | 0 | 0.0624 | 0.000895 | 0.00576 | 0.0982 | 0.4 | 127.99 |
| 29 | Exotope | Exo | 0xKD4wFzyUDwTGhLrCmW62 | Rjega | 440726 | 0 | 0.806 | 0.469 | 2 | -14.726 | 0 | 0.0501 | 0.153 | 0.873 | 0.108 | 0.379 | 122.995 |
| 46 | Haus - Rework | Anima Mundi | 4ZeT54Dc34PNya05wj2BKB | Vril | 404429 | 27 | 0.518 | 0.469 | 11 | -16.61 | 0 | 0.0459 | 0.106 | 0.809 | 0.37 | 0.257 | 115.012 |
| 44 | Breathe | Breathe | 6TR0FGw4zhlGbQALN065AI | CamelPhat | 194232 | 68 | 0.595 | 0.877 | 9 | -7.414 | 0 | 0.0442 | 0.00646 | 0.0879 | 0.398 | 0.081 | 125.015 |


## Adding to the existing playlist
After the process is finished the final tracks are added to the playlist.

In [49]:
sp_m.user_playlist_add_tracks(usernames[0],
                              playlist_id="spotify:playlist:4XP9wRGPImiYYiGtsB6Dd3",
                              tracks = final_recomms["track_id"].tolist())

{'snapshot_id': 'MjIsZWY1ZWY0MmMyY2FhN2U5OWE4ODJlNTE4ZjNjMjI2YTBiYWUwM2E0ZA=='}
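Spotify's endpoint for adding items to a playlist accepts at most 100 tracks per request. That limit is not reached here, but for larger result sets a sketch like the following (hypothetical `add_in_batches`, not part of spotipy or spotifuncs) splits the IDs into batches and leaves the actual API call to a callable:

```python
def add_in_batches(add_fn, track_ids, batch_size=100):
    """Call add_fn once per batch of at most batch_size track ids.

    add_fn is any callable taking a list of ids, e.g. a lambda wrapping the
    spotipy call used above; its return values are collected in order.
    """
    if not 1 <= batch_size <= 100:
        raise ValueError("batch_size must be between 1 and 100")
    return [add_fn(track_ids[i:i + batch_size])
            for i in range(0, len(track_ids), batch_size)]
```

With the call from above this could look like `add_in_batches(lambda batch: sp_m.user_playlist_add_tracks(usernames[0], playlist_id=playlist_id, tracks=batch), final_recomms["track_id"].tolist())`, where `playlist_id` is the target playlist URI.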