# DUO.py

This notebook is the bases for the script_for_duos_playlist.py. It covers all steps which lead to the creation of a new duos playlist for my girlfriend and me.

The notebook was used for experimentation and building the steps to arrive at a DUO.py playlist that was enjoyable to both users.

I wanted to create a playlist that was different from Spotify's 'Duo Mix', which didn't really seem to grasp the music tastes of my girlfriend and me and was somehow disappointing.

The playlist we had hoped for wasn't available so I decided to create one, that could be updated periodically and would incorporate some features:
- include some all time favorites as well as songs we recently both enjoyed
- include new songs based on our favorite songs
- don't include too many songs from the same artist or genre
- provide a listening experience that is based on mutually liked music not just a combination of our individual tastes

Right now the workflow accomplishes this although it is not yet perfect. I intend to add machine learning to further "filter" songs based on what we would like to have in our personal playlist. This howoever requires some data that is currently collected as we enjoy and discard songs from the weekly playlist.

## Imports

The most important imports are spotipy, pandas, numpy and my own small library 'spotifuncs'. Here I use the wildcard import for experimentation, the scripts only imports necessary functions. spotifuncs itself uses pandas, sklearn, spotipy.

In [1]:
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import spotipy.util as util
from pathlib import Path
import pandas as pd
import numpy as np
from spotifuncs import *

In [2]:
path = Path("C:/Users/ms101/OneDrive/DataScience_ML/projects/spotify_app")

## Setting the Credentials 

I stored the credentials and usernames in a .txt that was not uploaded to GitHub to ensure safety and avoid showing sensible information in my notebooks. 

Here I simply read the lines within the .txt files to retrieve the necessary information

In [3]:
with open(path / "client_s.txt") as f:
    content = f.readlines()
content = [x.strip() for x in content]

client_id = content[0]
client_secret = content[1]

In [4]:
with open(path / "usernames.txt") as f:
    usernames = f.readlines()
usernames = [x.strip() for x in usernames]

username1 = usernames[0]
username2 = usernames[1]

## App Scope

For the following code to work the scope needs to be quite extensive as I am retrieving a lot of user data and modifying a collaborative playlist (which the SpotifyAPI automatically views as private). As the only users are my girlfriend and me this was okay. 

**If others should choose to use my app a thorough explanation concerning the use of their data and the abilities of the app are paramount.**


To view all available scopes click here: https://developer.spotify.com/documentation/general/guides/scopes/

In [5]:
scope = "user-library-read user-read-recently-played user-top-read playlist-modify-public playlist-read-private playlist-read-collaborative playlist-modify-private"

redirect_uri = "https://developer.spotify.com/dashboard/applications/4a4e029d299a4241873db8300038bf0a"

client_credentials_manager = SpotifyClientCredentials(client_id=client_id, 
                                                      client_secret=client_secret)


## Authenticate

I reduced the authentication process to a function that can be found within the spotifuncs library. I describe it in more detail in my [Medium post](https://towardsdatascience.com/using-python-to-refine-your-spotify-recommendations-6dc08bcf408e).

In [6]:
sp_m = authenticate(redirect_uri, client_credentials_manager, username1, scope, client_id, client_secret)

In [8]:
sp_t = authenticate(redirect_uri, client_credentials_manager, username2, scope, client_id, client_secret)

## Retrieving the Data

I am retrieving quite a few dictonaries that end up in multiple dataframes. I packaged the process into a function because I have to repeat it. I decided not to integrate it into spotifuncs, as the function is very specific and subject to changes.

The first important piece of data are all the top tracks of the user, which I combine into a complete list of top tracks that cover everything from short to long term favorites. Notice that I am sampling only 15 songs from the long term top tracks and doing so **without** setting a random seed to **not** get the same results everytime the code is run. Always using the full list of long term favorites in the process would lead to too much repition over the course of multiple weeks and thus playlists.

The second important piece are the top artists, which like the tracks are retrieved for all time frames. The artists are important for the filtering process later on.

Lastly I am also retrieving the last 50 tracks a user saved. 50 is the upper limit here, which is unfortunate as this really limits the use of the data.

In [9]:
def get_dfs(sp):
    ##queries
    #user top tracks
    top_tracks_short = sp.current_user_top_tracks(limit = 50,offset=0,time_range='short_term')
    top_tracks_med = sp.current_user_top_tracks(limit = 50,offset=0,time_range='medium_term')
    top_tracks_long = sp.current_user_top_tracks(limit = 50,offset=0,time_range='long_term')
    
    #combine the top_tracks
    top_tracks_short_df = append_audio_features(create_df_top_songs(top_tracks_short),sp)
    top_tracks_med_df = append_audio_features(create_df_top_songs(top_tracks_med),sp)
    top_tracks_long_df = append_audio_features(create_df_top_songs(top_tracks_long),sp)
    #sample from long-term top tracks to introduce more randomness and avoid having the same artists
    top_tracks_long_df = top_tracks_long_df.sample(n = 15)
    top_tracks_df = pd.concat([top_tracks_short_df,top_tracks_med_df,top_tracks_long_df]).drop_duplicates().reset_index(drop = True)
        
    #user top artists
    top_artists_long = sp.current_user_top_artists(limit = 50, time_range = "long_term")
    top_artists_med = sp.current_user_top_artists(limit = 50, time_range = "medium_term")
    top_artists_short = sp.current_user_top_artists(limit = 50, time_range = "short_term")
    
    artists_short_df = top_artists_from_API(top_artists_short)
    artists_med_df = top_artists_from_API(top_artists_med)
    artists_long_df = top_artists_from_API(top_artists_long)
    artists_df = pd.concat([artists_short_df,artists_med_df,artists_long_df])
    artists_df["genres"] = artists_df["genres"].apply(lambda x: ",".join(x))
    artists_df.drop_duplicates().reset_index(drop = True)
    
    #user saved tracks
    user_saved_tracks = sp.current_user_saved_tracks(limit = 50)
    saved_tracks_df = create_df_saved_songs(user_saved_tracks)
    
        
    return top_tracks_df,artists_df,saved_tracks_df

In [10]:
top_tracks_m, artists_m, saved_tracks_m = get_dfs(sp_m)

In [11]:
top_tracks_t, artists_t, saved_tracks_t = get_dfs(sp_t)

In [12]:
artists_t

Unnamed: 0,name,id,genres,popularity,uri
0,Chelina,3XQZW9cuoDf7JhPbr99bXD,amharic pop,14,spotify:artist:3XQZW9cuoDf7JhPbr99bXD
1,Coldplay,4gzpq5DPGxSnKTe4SA8HAU,"permanent wave,pop",89,spotify:artist:4gzpq5DPGxSnKTe4SA8HAU
2,Andro,4J6A7DGmVEA4CXhTnCxxEd,"russian pop,russian trap",61,spotify:artist:4J6A7DGmVEA4CXhTnCxxEd
3,Ghetts,7zJL978NtANOysfGY21ty6,"grime,uk alternative hip hop,uk hip hop",59,spotify:artist:7zJL978NtANOysfGY21ty6
4,RIN,18ISxWwWjV6rPLoVCXf1dz,"german cloud rap,german hip hop",72,spotify:artist:18ISxWwWjV6rPLoVCXf1dz
...,...,...,...,...,...
45,Rakede,4soVkCNrRQccCv4Nohz273,hamburg hip hop,41,spotify:artist:4soVkCNrRQccCv4Nohz273
46,257ers,6ihLfpY3cmdGyWEnItn30w,"antideutsche,deep german hip hop,german hip ho...",62,spotify:artist:6ihLfpY3cmdGyWEnItn30w
47,MEUTE,1z5xbcOeFRQXBVDpvRPh8H,"german dance,hamburg electronic,livetronica",54,spotify:artist:1z5xbcOeFRQXBVDpvRPh8H
48,Max Herre,7IpWQKu80qQvyer3LO6SW3,"german alternative rap,german hip hop,german pop",53,spotify:artist:7IpWQKu80qQvyer3LO6SW3


In [13]:
artists_m

Unnamed: 0,name,id,genres,popularity,uri
0,badmómzjay,7oWrEQO1d3klp0Qrfh7a5h,"frauenrap,german drill,german hip hop",67,spotify:artist:7oWrEQO1d3klp0Qrfh7a5h
1,RIN,18ISxWwWjV6rPLoVCXf1dz,"german cloud rap,german hip hop",72,spotify:artist:18ISxWwWjV6rPLoVCXf1dz
2,SCH,2kXKa3aAFngGz2P4GjG5w2,"french hip hop,pop urbaine,rap francais,rap ma...",78,spotify:artist:2kXKa3aAFngGz2P4GjG5w2
3,SXTN,0tMFcqLXhtm1Gep20iuIR3,"deep german hip hop,frauenrap,german hip hop",61,spotify:artist:0tMFcqLXhtm1Gep20iuIR3
4,G-Eazy,02kJSzxNuaWGqwubyUba0Z,"hip hop,indie pop rap,oakland hip hop,pop rap,rap",83,spotify:artist:02kJSzxNuaWGqwubyUba0Z
...,...,...,...,...,...
45,Chris Rock,36eSjIksD6fehqxyDUHDA3,"black comedy,comedy",48,spotify:artist:36eSjIksD6fehqxyDUHDA3
46,Wardruna,0NJ6wlOAsAJ1PN4VRdTPKA,"nordic folk,rune folk,viking folk",63,spotify:artist:0NJ6wlOAsAJ1PN4VRdTPKA
47,Carnage,7CCjtD0hCK005Bvg2WG1a7,"edm,electro house,electronic trap,pop rap,trap...",64,spotify:artist:7CCjtD0hCK005Bvg2WG1a7
48,Motörhead,1DFr97A9HnbV3SKTJFu62M,"album rock,alternative metal,hard rock,metal,r...",69,spotify:artist:1DFr97A9HnbV3SKTJFu62M


### Finding common artists

I am finding common artists (of the 2 users) to later filter there top songs by artists. 
The logic behind this is the following:

A track might be only among one users "top tracks", it may however be by an artist both users enjoy. In that case the track is a good candidate for the duos playlist as both users will probably enjoy it, yet it might be a new discovery for one of them. If both already know and like it: still a good fit for the playlist!

In [14]:
common_artists = dataframe_difference(artists_m,artists_t, which = "both")
common_artists

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  return super().drop(


Unnamed: 0,name,id,genres,popularity,uri
0,RIN,18ISxWwWjV6rPLoVCXf1dz,"german cloud rap,german hip hop",72,spotify:artist:18ISxWwWjV6rPLoVCXf1dz
1,G-Eazy,02kJSzxNuaWGqwubyUba0Z,"hip hop,indie pop rap,oakland hip hop,pop rap,rap",83,spotify:artist:02kJSzxNuaWGqwubyUba0Z
2,The Weeknd,1Xyo4u8uXC1ZmMpatF05PJ,"canadian contemporary r&b,canadian pop,pop",95,spotify:artist:1Xyo4u8uXC1ZmMpatF05PJ
3,ELIF,65AzRSW0jKSs0WtttEXrOw,"frauenrap,german pop,german singer-songwriter",70,spotify:artist:65AzRSW0jKSs0WtttEXrOw
4,Joyner Lucas,6C1ohJrd5VydigQtaGy5Wa,"boston hip hop,hip hop,pop rap,rap",78,spotify:artist:6C1ohJrd5VydigQtaGy5Wa
5,KitschKrieg,5tHiL8SKSaZGMBUPIiSmX4,"german hip hop,hamburg hip hop",70,spotify:artist:5tHiL8SKSaZGMBUPIiSmX4
6,Arctic Monkeys,7Ln80lUS6He07XvHI8qqHH,"garage rock,modern rock,permanent wave,rock,sh...",87,spotify:artist:7Ln80lUS6He07XvHI8qqHH
7,Frank Sinatra,1Mxqyy3pSjf8kZZL4QVxS0,"adult standards,easy listening,lounge",85,spotify:artist:1Mxqyy3pSjf8kZZL4QVxS0
8,Seeed,5ISjkNS17JpCwiFtW80lpV,"german hip hop,german pop,german reggae",66,spotify:artist:5ISjkNS17JpCwiFtW80lpV
9,Juju,4sg4no0TXdsrM1s4SVUwNF,"frauenrap,german hip hop,german pop",68,spotify:artist:4sg4no0TXdsrM1s4SVUwNF


### Last weeks playlist

In order to avoid encountering the same songs two weeks in a row, which is very likely as short and medium term top tracks won't have changed much, last weeks playlist is read from 'Playlist.csv'. This file was just an empty csv file the first time the code was run. But at the end of the playlist creation process the created playlist is saved in that csv file, so it contains last weeks playlist.

The code that creates the playlist (script_for_duos_playlist.py) does the same every time it is run.

In [15]:
last_week_duo = pd.read_csv(path/"Playlist.csv", index_col = 0)
last_week_duo

Unnamed: 0,track_name,album,track_id,artist,duration,popularity,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo
0,Von Party zu Party,Leben am Limit,4h6UIDvGWTYZvu4BLf2GpO,SXTN,224186,67,0.694,0.895,1,-4.349,1,0.252,0.177,0.0,0.0749,0.621,120.002
1,We Love It,Future Vintage Soul,3QBTiVakPVVocXZF4H9MQ9,Outasight,189609,28,0.769,0.861,2,-2.627,0,0.0385,0.00464,0.0,0.255,0.738,128.968
2,Take Some Time - Emancipator Remix,Take Some Time (Emancipator Remix),28M2ugvRSIa4MIKmiiwNao,Wilderado,290428,40,0.545,0.596,5,-8.216,1,0.0289,0.0249,0.0252,0.219,0.182,95.002
3,Snub,Snub,6bc1eT78cvMHU0TuQQaYtU,FLOHIO,134307,23,0.753,0.683,10,-6.711,0,0.194,0.504,4e-06,0.113,0.437,129.987
4,KANN DAS BITTE SO BLEIBEN,NACHT,45HOck8XCgrSlVUQHHOHMz,ELIF,164511,51,0.705,0.656,6,-6.407,0,0.0468,0.18,0.00581,0.189,0.47,150.066
5,Testo E,Testo E,0bBhZuQEbfQR4JG9n6BDdP,SSIO,182733,39,0.741,0.778,1,-4.42,1,0.0636,0.00976,0.0,0.308,0.277,133.982
6,Bande organisée,Bande organisée,205HNJ73cgpC0LAOnuQiWT,Kofs,356346,80,0.901,0.939,6,-2.762,1,0.274,0.117,0.0,0.0643,0.805,142.948
7,Fragile,Fragile,3YYqctc3S1DH1i827bKpAh,Kora (CA),454008,30,0.87,0.55,11,-8.978,0,0.0641,0.0061,0.82,0.0945,0.138,120.011
8,Beifahrersitz,Beifahrersitz,01qOl2pM8emx1sxdBQc05g,LEA,199586,68,0.712,0.774,6,-3.967,0,0.133,0.346,0.0,0.176,0.471,159.977
9,Qa bone,Qa bone,6X0pg9AHaLYLq8Cy5j8Suz,Azet,190320,56,0.768,0.943,9,-2.875,1,0.127,0.219,1e-06,0.115,0.822,125.139


## Creating the Playlist

The creation of the playlist is the main goal of the task and of the project. It requires a couple of steps to 'assemble all building blocks' that make up the DUO.py playlist. THe building blocks are:

1. Common top tracks, that were not in last weeks playlists
2. A sample of each users top tracks that are most similar to the other users top tracks
3. A sample of each users top tracks from an artists both users like (one of their top artists)
4. A sample of the songs saved by users
5. A recommended track (through spotify and additional filtering) for every track that was added to the playlist in step 1.-4.

### Common top tracks

The playlist is initiate by common top tracks, that did not appear in last weeks playlist already. As these songs are both users' favorites they should enjoy them.

In [16]:
dataframe_difference(top_tracks_m,top_tracks_t,which = "both")

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  return super().drop(


Unnamed: 0,track_name,album,track_id,artist,duration,popularity,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo
0,RSVP,Nimmerland,38UlieEW0eto55PNH9Z0cK,RIN,169293,57,0.787,0.671,9,-4.447,1,0.0409,0.0642,0.0112,0.107,0.548,159.058
1,Von Party zu Party,Leben am Limit,4h6UIDvGWTYZvu4BLf2GpO,SXTN,224186,67,0.694,0.895,1,-4.349,1,0.252,0.177,0.0,0.0749,0.621,120.002
2,Napauken - Jpattersson Remix,Zehna,2vPP6FNAzYoo2pplHp7Vop,Shkoon,380616,50,0.793,0.294,9,-13.817,0,0.0556,0.261,0.9,0.0867,0.37,102.029
3,Bläulich,Treppenhaus,2WRTnY0slmFgWcrmEr8dPj,Apache 207,196213,71,0.79,0.704,10,-7.935,0,0.417,0.069,0.000658,0.113,0.212,154.007


In [17]:
common_songs = dataframe_difference(top_tracks_m,top_tracks_t,which = "both")
new_playlist_df = common_songs[~common_songs["track_id"].isin(last_week_duo["track_id"])]
new_playlist_df

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  return super().drop(


Unnamed: 0,track_name,album,track_id,artist,duration,popularity,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo
0,RSVP,Nimmerland,38UlieEW0eto55PNH9Z0cK,RIN,169293,57,0.787,0.671,9,-4.447,1,0.0409,0.0642,0.0112,0.107,0.548,159.058
2,Napauken - Jpattersson Remix,Zehna,2vPP6FNAzYoo2pplHp7Vop,Shkoon,380616,50,0.793,0.294,9,-13.817,0,0.0556,0.261,0.9,0.0867,0.37,102.029
3,Bläulich,Treppenhaus,2WRTnY0slmFgWcrmEr8dPj,Apache 207,196213,71,0.79,0.704,10,-7.935,0,0.417,0.069,0.000658,0.113,0.212,154.007


### Users top tracks that are similar to each other.

The next building block consist of songs from both users top tracks that are most similar to on another on the level of audio features.

For this task the top tracks unique to each user are extracted and a similarity matrix is computed. The similarity is based on the audio features (excluding 'key' and 'mode') and computed via cosine similarity (see spotifuncs).
From this matrix the 30 highest similarity scores and corresponding indeces are extracted. The songs corresponding to these indeces are put into a dataframe, any duplicates are dropped and a sample of 10 songs is drawn for the playlist.

Here I am creating a similarity matrix, deleting songs that both dataframes contain first. (common_songs)

In [18]:
unique_top_tracks_m = top_tracks_m[~top_tracks_m["track_id"].isin(common_songs["track_id"])]
unique_top_tracks_m.reset_index(drop = True,inplace = True)
unique_top_tracks_m

Unnamed: 0,track_name,album,track_id,artist,duration,popularity,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo
0,T.H.A.L,T.H.A.L,73GjtcmkMfHirKnxotxW1Y,badmómzjay,153975,58,0.919,0.560,9,-7.298,1,0.3030,0.07710,0.000000,0.0948,0.1940,139.978
1,Move,T.H.A.L,0Ey20aookshI9ZtXc23wxz,badmómzjay,164360,30,0.837,0.574,3,-5.180,0,0.2360,0.12800,0.000000,0.1660,0.3450,95.987
2,Rollercoaster,T.H.A.L,2uWnoJg6wgaSfBoaLPuwfl,badmómzjay,165833,31,0.815,0.493,2,-9.709,0,0.0937,0.66200,0.000000,0.1100,0.5440,142.112
3,Signal,T.H.A.L,4BG4FBXN3pPhuze6i6lwv8,badmómzjay,131055,31,0.834,0.603,3,-6.899,0,0.1010,0.22200,0.000103,0.0972,0.4100,104.087
4,You and Me,Ghetto Cowboy,53QA7j4pHWAtbE3D0Glh7Q,Yelawolf,242001,63,0.598,0.448,1,-8.525,0,0.0342,0.10900,0.001050,0.1230,0.0727,140.970
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
102,Until Dawn,Until Dawn,1R2rmmUufa1tdiD9j9Qs3x,Jaeger,287190,49,0.531,0.643,3,-9.347,0,0.0321,0.00221,0.826000,0.0547,0.1150,150.087
103,INTERNATIONAL GANGSTAS,INTERNATIONAL GANGSTAS,4IAVIBh9jbYN30DwpHFMD4,Farid Bang,286280,59,0.740,0.622,1,-6.297,1,0.2030,0.09650,0.000000,0.0990,0.3930,148.040
104,Points of Authority / 99 Problems / One Step C...,Collision Course (Deluxe Version),65eohvrL4ttjA7EfFkQOhX,JAY-Z,295826,60,0.547,0.951,1,-4.079,0,0.1510,0.00272,0.000011,0.0862,0.5100,95.031
105,X Gon' Give It To Ya,The Best Of DMX,2NeHnSFnwNp1Z5WYgcjJ8L,DMX,219253,0,0.678,0.857,10,-5.173,0,0.2160,0.03250,0.000000,0.0788,0.6220,94.950


In [19]:
unique_top_tracks_t = top_tracks_t[~top_tracks_t["track_id"].isin(common_songs["track_id"])]
unique_top_tracks_t.reset_index(drop = True,inplace = True)
unique_top_tracks_t

Unnamed: 0,track_name,album,track_id,artist,duration,popularity,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo
0,Bati,"Chelina, Vol. 1",03xV05Oll19y59x8GDkWVL,Chelina,260713,12,0.575,0.3610,2,-8.074,0,0.0291,0.5840,0.010300,0.1170,0.430,148.154
1,Intro (Megbia),"Chelina, Vol. 1",0aLjvBLCCpNNoBwGqNY6Gn,Chelina,53034,6,0.554,0.0872,9,-17.823,0,0.1510,0.9090,0.000000,0.1160,0.328,68.149
2,Sai Bai,"Chelina, Vol. 1",3vAJlGxRNNYYMBJdnhxynu,Chelina,201875,23,0.695,0.6770,1,-4.925,0,0.0656,0.0364,0.006430,0.1810,0.394,176.174
3,"Last Night in Sant Celoni - 12"" Mix",Last Night in Sant Celoni (feat. Jaz James),6zAqmi9YkgnMxbqLcf7hDv,Payfone,427150,48,0.823,0.4360,9,-10.439,0,0.1050,0.6820,0.051900,0.3430,0.452,105.096
4,Anemogn,"Chelina, Vol. 1",1BXDo9qPfakhiJfiLZYJ1w,Chelina,193714,7,0.614,0.3060,2,-7.428,1,0.0350,0.8570,0.000002,0.1220,0.169,69.939
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
100,The Shape - Acoustic,The Shape (Acoustic),0lReYwQUfeLD2O4ZWlj5lr,Nico de Andrea,166946,12,0.433,0.5090,8,-7.544,0,0.0320,0.5750,0.000000,0.1070,0.314,102.644
101,Painting Greys,Prologue,2Eh0J9SM2SzKnw5spcHJ1B,Emmit Fenn,227368,53,0.870,0.2220,4,-14.323,0,0.0766,0.4570,0.036400,0.0909,0.199,132.988
102,Comets (feat. Natalia Doco) - HUGEL Remix,Comets (feat. Natalia Doco) [The Remixes],1SadOe3BGZW9MlW2OgQJyd,Freddy Verano,266218,27,0.844,0.5620,11,-6.323,0,0.0932,0.2580,0.031600,0.6220,0.385,118.982
103,Worlds Apart,Worlds Apart,5L4YMktQZVQcVZnCHFNymD,Ash,263367,17,0.707,0.5570,5,-8.848,0,0.0356,0.1850,0.901000,0.0999,0.124,113.023


In [20]:
similarity_top_songs = create_similarity_score(unique_top_tracks_m,unique_top_tracks_t)

In [21]:
similarity_top_songs

array([[0.76758631, 0.50963858, 0.83814992, ..., 0.82878224, 0.7118232 ,
        0.85323572],
       [0.76986956, 0.5656348 , 0.80342259, ..., 0.86082282, 0.70061781,
        0.88419206],
       [0.9264914 , 0.8100283 , 0.76695033, ..., 0.80575341, 0.65282266,
        0.85061301],
       ...,
       [0.72853323, 0.36694645, 0.84462263, ..., 0.79169662, 0.69107247,
        0.92391296],
       [0.75786834, 0.4580677 , 0.84583769, ..., 0.81988046, 0.69204618,
        0.95789938],
       [0.75286728, 0.35877836, 0.85198439, ..., 0.78001165, 0.69013424,
        0.94992055]])

Creating a list of tuples containing the indeces for both songs as well as the similarity score.

In [22]:
max_n_scores = [(i,np.argmax(x),x[np.argmax(x)]) for i,x in enumerate(similarity_top_songs)]
max_n_scores

[(0, 5, 0.9419454823140679),
 (1, 75, 0.9278199095090397),
 (2, 56, 0.977899055967248),
 (3, 17, 0.9509538090620334),
 (4, 35, 0.9145349845034312),
 (5, 91, 0.9824336825533914),
 (6, 61, 0.8219181556472046),
 (7, 3, 0.950700535532872),
 (8, 45, 0.9755812907458108),
 (9, 60, 0.9361726243700328),
 (10, 17, 0.979407756162956),
 (11, 8, 0.9363536976479466),
 (12, 76, 0.9778383926517018),
 (13, 90, 0.977080665933461),
 (14, 45, 0.9420800032333909),
 (15, 17, 0.9967309242459156),
 (16, 66, 0.9610589046819443),
 (17, 6, 0.9844914310152725),
 (18, 17, 0.9757130646680877),
 (19, 75, 0.973887748190232),
 (20, 75, 0.9396202108425412),
 (21, 98, 0.9882028630651086),
 (22, 17, 0.9701554122027172),
 (23, 21, 0.9460241499463621),
 (24, 7, 0.963035438385104),
 (25, 20, 0.9770524509266321),
 (26, 58, 0.9801418832005209),
 (27, 75, 0.9869318137395139),
 (28, 103, 0.9958680223280652),
 (29, 75, 0.986822958720218),
 (30, 98, 0.9706459428460529),
 (31, 58, 0.9354849072064834),
 (32, 64, 0.9732298475604584)

In [23]:
from operator import itemgetter
from heapq import nlargest
nlargest(30,max_n_scores,key=itemgetter(2))

[(15, 17, 0.9967309242459156),
 (28, 103, 0.9958680223280652),
 (38, 75, 0.990010468350755),
 (57, 80, 0.9889265279367138),
 (101, 16, 0.9888603135251002),
 (21, 98, 0.9882028630651086),
 (105, 75, 0.9872399036503025),
 (50, 79, 0.9871812260261),
 (86, 104, 0.9871627741414891),
 (27, 75, 0.9869318137395139),
 (29, 75, 0.986822958720218),
 (72, 93, 0.9849107732495253),
 (17, 6, 0.9844914310152725),
 (74, 79, 0.9839898075777093),
 (5, 91, 0.9824336825533914),
 (62, 73, 0.9821532025817385),
 (49, 68, 0.9819266545196924),
 (71, 76, 0.9814452587822746),
 (93, 76, 0.9810067008687731),
 (47, 63, 0.9806737828245374),
 (26, 58, 0.9801418832005209),
 (10, 17, 0.979407756162956),
 (102, 103, 0.9793002550763088),
 (33, 80, 0.9786279448999823),
 (2, 56, 0.977899055967248),
 (55, 9, 0.9778466897640011),
 (12, 76, 0.9778383926517018),
 (60, 17, 0.9776281556554647),
 (46, 103, 0.9772706099902545),
 (13, 90, 0.977080665933461)]

Extracting the track pairs with the 30 highest similarity scores for each user.

In [24]:
idx_simtracks_m = [i[0] for i in  nlargest(30,max_n_scores,key=itemgetter(2))]
idx_simtracks_t = [i[1] for i in  nlargest(30,max_n_scores,key=itemgetter(2))]

In [25]:
idx_simtracks_m

[15,
 28,
 38,
 57,
 101,
 21,
 105,
 50,
 86,
 27,
 29,
 72,
 17,
 74,
 5,
 62,
 49,
 71,
 93,
 47,
 26,
 10,
 102,
 33,
 2,
 55,
 12,
 60,
 46,
 13]

In [26]:
idx_simtracks_t

[17,
 103,
 75,
 80,
 16,
 98,
 75,
 79,
 104,
 75,
 75,
 93,
 6,
 79,
 91,
 73,
 68,
 76,
 76,
 63,
 58,
 17,
 103,
 80,
 56,
 9,
 76,
 17,
 103,
 90]

In [27]:
sim_top_tracks_m = unique_top_tracks_m.loc[idx_simtracks_m]
sim_top_tracks_t = unique_top_tracks_t.loc[idx_simtracks_t]

Creating the dataframe with the most similar top tracks.

In [28]:
similar_top_tracks = pd.concat([sim_top_tracks_m,sim_top_tracks_t])
similar_top_tracks.drop_duplicates(inplace = True)
similar_top_tracks = similar_top_tracks[~similar_top_tracks["track_id"].isin(last_week_duo["track_id"])]
similar_top_tracks.reset_index(drop = True,inplace = True)
similar_top_tracks

Unnamed: 0,track_name,album,track_id,artist,duration,popularity,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo
0,Vagabundo,EROS (2018),4MRitCc3SHpBsqbNbHysm8,RIN,125483,43,0.878,0.714,6,-8.036,0,0.129,0.375,9e-06,0.225,0.796,115.034
1,Another Place,Another Place,1cyRXghdofqkqavGpQzHZt,AK,185294,36,0.554,0.709,7,-16.472,0,0.0444,0.15,0.83,0.111,0.147,136.047
2,Hurricane,TRAUMA,2GFwwTIVLjnOrtP7m9luHC,I Prevail,223173,72,0.389,0.885,6,-5.063,0,0.0551,0.000372,0.000311,0.202,0.187,164.989
3,Burn (feat. Big Sean),Dreams and Nightmares (Deluxe Edition),1tv6IvWoOilhj0XbBoVVMo,Meek Mill,216666,55,0.67,0.895,2,-1.438,1,0.349,0.327,0.0,0.269,0.622,111.389
4,Who - Single Version,Who,0CjBORMsmiQNe3vPDcNIvk,Modeselektor,207988,44,0.76,0.91,7,-8.472,1,0.0484,0.0442,0.00662,0.108,0.392,142.982
5,X Gon' Give It To Ya,The Best Of DMX,2NeHnSFnwNp1Z5WYgcjJ8L,DMX,219253,0,0.678,0.857,10,-5.173,0,0.216,0.0325,0.0,0.0788,0.622,94.95
6,FEIND (feat. Azad),NACHT,4Upt2Q8OcUd85uo5x9QqWK,ELIF,169299,53,0.781,0.77,1,-5.367,1,0.0482,0.0723,8e-06,0.103,0.595,148.005
7,Ausziehen,Leben am Limit,1U2fPOdYmSz44FLUaboG9M,SXTN,178720,51,0.812,0.929,11,-4.34,1,0.0382,0.0639,1.1e-05,0.0917,0.66,123.053
8,Scary,Scary,745Dazbwplj1SDZ8SPKHV5,Stormzy,224233,54,0.629,0.662,7,-12.353,0,0.68,0.43,0.0,0.141,0.535,137.718
9,AUGEN ZU (feat. Samra),NACHT,6ikiAn9th3TUfS3bYS7gDX,ELIF,174248,71,0.685,0.693,5,-6.741,0,0.217,0.125,2e-06,0.275,0.493,167.957


In [29]:
similar_top_tracks.sample(10)

Unnamed: 0,track_name,album,track_id,artist,duration,popularity,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo
33,Creep,Pablo Honey,6b2oQwSGFkzsMtQruIWm2p,Radiohead,238640,69,0.515,0.43,7,-9.935,1,0.0369,0.0102,0.000141,0.129,0.104,91.841
11,Lost In The Static,Dig Deep,5QLH7zAdcAJLgR25gtvtoK,After The Burial,273194,60,0.602,0.955,6,-4.888,1,0.154,0.000989,0.0133,0.207,0.193,126.009
22,ABER WO BIST DU,NACHT,2dGtLJysTSI9cbQ6TulL8V,ELIF,185274,56,0.78,0.592,9,-6.02,0,0.0631,0.23,0.0,0.184,0.705,148.057
10,For an Endless Night - Jel Ford Remix,For an Endless Night (Jel Ford Remix),39KN3wgqTrGFxvWRyMW5zL,Alan Fitzpatrick,449768,39,0.751,0.649,8,-6.885,0,0.0579,0.0151,0.761,0.0976,0.179,125.996
20,Berlin lebt wie nie zuvor,Berlin lebt 2,2pfio0uHpT4USTPBNeTIWo,Capital Bra,151294,59,0.654,0.782,1,-2.88,1,0.244,0.198,0.0,0.347,0.635,179.701
9,AUGEN ZU (feat. Samra),NACHT,6ikiAn9th3TUfS3bYS7gDX,ELIF,174248,71,0.685,0.693,5,-6.741,0,0.217,0.125,2e-06,0.275,0.493,167.957
3,Burn (feat. Big Sean),Dreams and Nightmares (Deluxe Edition),1tv6IvWoOilhj0XbBoVVMo,Meek Mill,216666,55,0.67,0.895,2,-1.438,1,0.349,0.327,0.0,0.269,0.622,111.389
32,Double in Love,Sticker on My Suitcase,0YfEgbU4mcFQJNHVP3sKsd,Alle Farben,159000,32,0.821,0.674,9,-6.728,0,0.0572,0.118,0.00145,0.0351,0.689,120.0
23,Southerly,Southerly,2GwpIphHopWzZgHI7m4rnZ,Tom Day,284709,33,0.503,0.778,0,-10.125,1,0.0443,0.25,0.855,0.129,0.3,138.001
34,IC3 (feat. Skepta),IC3 (feat. Skepta),0BMszcKwCrUDHZX3CEEj6L,Ghetts,231106,58,0.663,0.645,8,-7.977,1,0.398,0.443,0.0,0.166,0.394,128.023


In [30]:
new_playlist_df = new_playlist_df.append(similar_top_tracks.sample(10))
new_playlist_df

Unnamed: 0,track_name,album,track_id,artist,duration,popularity,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo
0,RSVP,Nimmerland,38UlieEW0eto55PNH9Z0cK,RIN,169293,57,0.787,0.671,9,-4.447,1,0.0409,0.0642,0.0112,0.107,0.548,159.058
2,Napauken - Jpattersson Remix,Zehna,2vPP6FNAzYoo2pplHp7Vop,Shkoon,380616,50,0.793,0.294,9,-13.817,0,0.0556,0.261,0.9,0.0867,0.37,102.029
3,Bläulich,Treppenhaus,2WRTnY0slmFgWcrmEr8dPj,Apache 207,196213,71,0.79,0.704,10,-7.935,0,0.417,0.069,0.000658,0.113,0.212,154.007
22,ABER WO BIST DU,NACHT,2dGtLJysTSI9cbQ6TulL8V,ELIF,185274,56,0.78,0.592,9,-6.02,0,0.0631,0.23,0.0,0.184,0.705,148.057
17,Until Dawn,Until Dawn,1R2rmmUufa1tdiD9j9Qs3x,Jaeger,287190,49,0.531,0.643,3,-9.347,0,0.0321,0.00221,0.826,0.0547,0.115,150.087
19,Rollercoaster,T.H.A.L,2uWnoJg6wgaSfBoaLPuwfl,badmómzjay,165833,31,0.815,0.493,2,-9.709,0,0.0937,0.662,0.0,0.11,0.544,142.112
18,Struggle Made Me Stronger,Headphones on World Off,3vMtLNMANvkPuDthcLuQzJ,Fearless Motivation,213000,48,0.358,0.886,9,-4.48,0,0.25,0.000645,0.0,0.113,0.238,173.965
25,Beyikrta,"Chelina, Vol. 1",6Kd77EjLuU4b8NZyOEkyTu,Chelina,216410,7,0.768,0.467,10,-8.708,0,0.0491,0.321,0.0,0.157,0.616,90.045
5,X Gon' Give It To Ya,The Best Of DMX,2NeHnSFnwNp1Z5WYgcjJ8L,DMX,219253,0,0.678,0.857,10,-5.173,0,0.216,0.0325,0.0,0.0788,0.622,94.95
20,Berlin lebt wie nie zuvor,Berlin lebt 2,2pfio0uHpT4USTPBNeTIWo,Capital Bra,151294,59,0.654,0.782,1,-2.88,1,0.244,0.198,0.0,0.347,0.635,179.701


### Sampling from each users top tracks which are from an artists both users like (one of their top artists)

The next step filters the top tracks based on common artists. Every track by an artist that is a top artist of **both** users is considered for this approach.



In [31]:
filtered_top_m = top_tracks_m[top_tracks_m["artist"].isin(common_artists["name"]) 
                              & ~top_tracks_m["track_id"].isin(last_week_duo["track_id"])]
filtered_top_m

Unnamed: 0,track_name,album,track_id,artist,duration,popularity,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo
5,RSVP,Nimmerland,38UlieEW0eto55PNH9Z0cK,RIN,169293,57,0.787,0.671,9,-4.447,1,0.0409,0.0642,0.0112,0.107,0.548,159.058
8,Aretha Franklin Freestyle,EROS (2018),5hKM3DaySvongkyGdvhacX,RIN,184586,55,0.79,0.592,3,-7.226,0,0.0811,0.498,0.325,0.31,0.7,159.96
16,Vagabundo,EROS (2018),4MRitCc3SHpBsqbNbHysm8,RIN,125483,43,0.878,0.714,6,-8.036,0,0.129,0.375,9e-06,0.225,0.796,115.034
22,Who - Single Version,Who,0CjBORMsmiQNe3vPDcNIvk,Modeselektor,207988,44,0.76,0.91,7,-8.472,1,0.0484,0.0442,0.00662,0.108,0.392,142.982
52,ALLES HELAL,NACHT,2KAbQ3PsETrr86R39pru7k,ELIF,175062,62,0.727,0.6,4,-6.186,0,0.0376,0.164,1.7e-05,0.0793,0.144,92.024
58,FEUER,NACHT,0Se4w42WIJiTgld4SYbv8S,ELIF,203386,47,0.769,0.604,8,-6.769,1,0.0455,0.039,0.000277,0.0738,0.227,97.996
62,SCHWARZ,NACHT,7vswtCdKBzC5XN9ojwh8u0,ELIF,135184,47,0.742,0.7,10,-6.937,0,0.177,0.0949,3.7e-05,0.104,0.465,75.985
63,ALASKA,NACHT,2YnYp5f38UP6fvf7q2FnPm,ELIF,217766,49,0.743,0.671,1,-5.594,0,0.0589,0.0581,0.0,0.195,0.484,76.998
64,ABER WO BIST DU,NACHT,2dGtLJysTSI9cbQ6TulL8V,ELIF,185274,56,0.78,0.592,9,-6.02,0,0.0631,0.23,0.0,0.184,0.705,148.057
65,ADHD,ADHD,4X4v3KtkUXwXvDBw5KS9cp,Joyner Lucas,205872,72,0.563,0.78,10,-6.663,1,0.0782,0.00525,8e-06,0.418,0.317,83.913


In [32]:
filtered_top_t = top_tracks_t[top_tracks_t["artist"].isin(common_artists["name"])
                             & ~top_tracks_t["track_id"].isin(last_week_duo["track_id"])]
filtered_top_t

Unnamed: 0,track_name,album,track_id,artist,duration,popularity,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo
13,RSVP,Nimmerland,38UlieEW0eto55PNH9Z0cK,RIN,169293,57,0.787,0.671,9,-4.447,1,0.0409,0.0642,0.0112,0.107,0.548,159.058
40,M.I.A.,Nimmerland,06pbKFx2Iut0MRs1XjV9Mc,RIN,190640,50,0.663,0.691,4,-3.903,1,0.0586,0.0556,5e-06,0.104,0.19,161.866
50,Wealth,Who Else,5aOlYhQsp75cgPov4yjWIe,Modeselektor,247218,43,0.765,0.452,1,-12.346,1,0.0856,0.0204,5.7e-05,0.0727,0.331,137.972
83,505,Favourite Worst Nightmare,0BxE4FqsDD1Ot4YuBXwAPp,Arctic Monkeys,253586,82,0.526,0.866,0,-5.822,1,0.0568,0.00287,7.8e-05,0.0945,0.248,140.266
98,Lass Sie Gehn,BAM BAM,4RYKr1R3tXrITqY1zWiTNi,Seeed,192853,55,0.711,0.706,7,-4.543,1,0.289,0.0776,0.0,0.127,0.695,138.81


#### Potential issues

This filtering approach often leads to lists of songs that only contain very few artists but a couple of songs by that artist (it's also due to spotify really noticing when you can't stop listening to an album..)
To not have too many songs by the same artist I will sample from the above dataframes.I am assigning weights to the rows depending on how often an artist occurs.

This approach worked reasonably well however still has some flaws (which might be partially driven by my girlfriends and my individual listening behavior)

In [33]:
from collections import Counter

Counter(filtered_top_m["artist"]) ,  Counter(filtered_top_t["artist"])

(Counter({'RIN': 3,
          'Modeselektor': 1,
          'ELIF': 11,
          'Joyner Lucas': 4,
          'KitschKrieg': 1,
          'G-Eazy': 1}),
 Counter({'RIN': 2, 'Modeselektor': 1, 'Arctic Monkeys': 1, 'Seeed': 1}))

In [34]:
weights_m = [1/len(filtered_top_m)/7 if Counter(filtered_top_m["artist"])[x] > 2 else 1/len(filtered_top_m) for x in filtered_top_m["artist"]] 

In [35]:
weights_t = [1/len(filtered_top_t)/7 if Counter(filtered_top_t["artist"])[x] > 2 else 1/len(filtered_top_m) for x in filtered_top_t["artist"]] 

I tried the sampling with and without weights 10 times. With weights artists that occur very often in the filtered dataframe are not too overrepresented just as planned. Without weights the sample sometimes contained just one or two artists which is not desired.

In [36]:
sample_n = (25-len(new_playlist_df))//2
if sample_n > 3: sample_n = 3
sample_n

3

In [37]:
filtered_top_m.sample(sample_n,weights = weights_m)

Unnamed: 0,track_name,album,track_id,artist,duration,popularity,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo
104,I Mean It (feat. Remo),These Things Happen,6jmTHeoWvBaSrwWttr8Xvu,G-Eazy,236480,75,0.712,0.562,10,-6.008,1,0.129,0.125,0.0,0.136,0.142,140.0
70,EIN LETZTES MAL,NACHT,04EJyZSlhPPfFOo1NRn2vl,ELIF,144644,52,0.713,0.549,1,-6.168,1,0.0645,0.0461,4.1e-05,0.108,0.393,89.005
71,Lambo Lambo,KitschKrieg,7oqvRZNv4dUV8CgQWtIAMe,KitschKrieg,214991,59,0.876,0.4,5,-9.748,0,0.129,0.494,9e-06,0.1,0.484,144.938


In [38]:
filtered_top_t.sample(sample_n, weights= weights_t)

Unnamed: 0,track_name,album,track_id,artist,duration,popularity,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo
13,RSVP,Nimmerland,38UlieEW0eto55PNH9Z0cK,RIN,169293,57,0.787,0.671,9,-4.447,1,0.0409,0.0642,0.0112,0.107,0.548,159.058
83,505,Favourite Worst Nightmare,0BxE4FqsDD1Ot4YuBXwAPp,Arctic Monkeys,253586,82,0.526,0.866,0,-5.822,1,0.0568,0.00287,7.8e-05,0.0945,0.248,140.266
40,M.I.A.,Nimmerland,06pbKFx2Iut0MRs1XjV9Mc,RIN,190640,50,0.663,0.691,4,-3.903,1,0.0586,0.0556,5e-06,0.104,0.19,161.866


In [39]:
new_playlist_df = new_playlist_df.append(filtered_top_m.sample(sample_n,weights = weights_m))
new_playlist_df = new_playlist_df.append(filtered_top_t.sample(sample_n,weights = weights_t))

In [40]:
new_playlist_df = new_playlist_df.drop_duplicates().reset_index(drop=True)
new_playlist_df

Unnamed: 0,track_name,album,track_id,artist,duration,popularity,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo
0,RSVP,Nimmerland,38UlieEW0eto55PNH9Z0cK,RIN,169293,57,0.787,0.671,9,-4.447,1,0.0409,0.0642,0.0112,0.107,0.548,159.058
1,Napauken - Jpattersson Remix,Zehna,2vPP6FNAzYoo2pplHp7Vop,Shkoon,380616,50,0.793,0.294,9,-13.817,0,0.0556,0.261,0.9,0.0867,0.37,102.029
2,Bläulich,Treppenhaus,2WRTnY0slmFgWcrmEr8dPj,Apache 207,196213,71,0.79,0.704,10,-7.935,0,0.417,0.069,0.000658,0.113,0.212,154.007
3,ABER WO BIST DU,NACHT,2dGtLJysTSI9cbQ6TulL8V,ELIF,185274,56,0.78,0.592,9,-6.02,0,0.0631,0.23,0.0,0.184,0.705,148.057
4,Until Dawn,Until Dawn,1R2rmmUufa1tdiD9j9Qs3x,Jaeger,287190,49,0.531,0.643,3,-9.347,0,0.0321,0.00221,0.826,0.0547,0.115,150.087
5,Rollercoaster,T.H.A.L,2uWnoJg6wgaSfBoaLPuwfl,badmómzjay,165833,31,0.815,0.493,2,-9.709,0,0.0937,0.662,0.0,0.11,0.544,142.112
6,Struggle Made Me Stronger,Headphones on World Off,3vMtLNMANvkPuDthcLuQzJ,Fearless Motivation,213000,48,0.358,0.886,9,-4.48,0,0.25,0.000645,0.0,0.113,0.238,173.965
7,Beyikrta,"Chelina, Vol. 1",6Kd77EjLuU4b8NZyOEkyTu,Chelina,216410,7,0.768,0.467,10,-8.708,0,0.0491,0.321,0.0,0.157,0.616,90.045
8,X Gon' Give It To Ya,The Best Of DMX,2NeHnSFnwNp1Z5WYgcjJ8L,DMX,219253,0,0.678,0.857,10,-5.173,0,0.216,0.0325,0.0,0.0788,0.622,94.95
9,Berlin lebt wie nie zuvor,Berlin lebt 2,2pfio0uHpT4USTPBNeTIWo,Capital Bra,151294,59,0.654,0.782,1,-2.88,1,0.244,0.198,0.0,0.347,0.635,179.701


### Sampling from saved tracks

I am aiming for around 25 known tracks (and 25 new ones through recommendations). To achieve this and to somehow account for the somewhat random nature of the previous steps I am filling the playlist with sampled saved tracks.

In [41]:
#sample the remaining 25-len(new_playlist_df) from saved_tracks
#first get audio_features
saved_tracks_m = append_audio_features(saved_tracks_m, sp_m)
saved_tracks_t = append_audio_features(saved_tracks_t,sp_t)

In [42]:
#filter again so artists are not already in new_playlist_df
filtered_saved_m = saved_tracks_m[~saved_tracks_m["artist"].isin(new_playlist_df["artist"])]
filtered_saved_m

Unnamed: 0,track_name,album,track_id,artist,duration,popularity,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo
0,Aufbruch,Aufbruch,4aH7Y1gpgLPlstw81dUpDE,Sublab,497777,45,0.808,0.709,10,-7.564,0,0.163,0.229,0.915,0.105,0.0943,108.011
1,Try Me,Try Me,72XJZqZHYjnUVGOXZJX8dA,foxwedding,308151,51,0.493,0.708,0,-6.874,1,0.0985,0.309,0.872,0.0577,0.333,151.056
2,From America,Play Me Again,5BMg8D9Wl4yvPqzTq7rWRC,Kid Francescoli,181500,43,0.891,0.428,2,-8.766,0,0.0598,0.0316,0.0373,0.0974,0.322,143.987
3,I Fell In Love,Autumn Bells,0zrHx4EhLePXUa8KhGpS3E,Gidge,385726,45,0.539,0.61,8,-10.999,0,0.134,0.121,0.809,0.0857,0.0395,123.218
4,The Beginning,The Beginning,4njhWDhTAjhReWtYkiMH9t,NR:TN,201904,47,0.783,0.714,7,-7.408,1,0.246,0.243,0.86,0.446,0.0341,125.955
5,Between Breaths,Stateless,4DJGTc1OsgqFsUGI6W8Mtx,Riyoon,479885,44,0.738,0.665,7,-10.441,0,0.0438,0.00748,0.877,0.582,0.0347,99.992
6,Berlin Nights,Berlin Nights,6gG1R1bFdJeNc2ERAwXxCb,Vnce Dolanbay,292115,41,0.901,0.457,10,-13.238,0,0.163,0.229,0.424,0.0977,0.531,127.999
7,Fall Slowly (feat. Ashanti) - Extended Version,Evolution,7wK4pOTZKVaAJ00rziu901,Joyner Lucas,292056,57,0.542,0.453,6,-11.208,0,0.24,0.204,7e-06,0.137,0.0375,88.31
8,Evolution,Evolution,2VopDw2GlF3uwD1kihHmTT,Joyner Lucas,153250,61,0.687,0.819,9,-6.67,0,0.431,0.218,0.0,0.392,0.568,81.185
9,Zim Zimma,Evolution,17nPeSliosCi427f0lUb75,Joyner Lucas,239702,70,0.883,0.621,11,-6.063,0,0.212,0.0871,0.0,0.499,0.676,149.052


In [43]:
filtered_saved_t = saved_tracks_t[~saved_tracks_t["artist"].isin(new_playlist_df["artist"])]
filtered_saved_t

Unnamed: 0,track_name,album,track_id,artist,duration,popularity,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo
0,Вою на луну,MOON FLAME,7Aoos39pDhB7H1egvGvImZ,Andro,114000,39,0.44,0.546,2,-5.904,0,0.119,0.767,0.000177,0.145,0.062,96.22
2,Light,Light,7oyfJcOR2i3whfEYMogQt1,Ash,328888,44,0.614,0.723,4,-9.692,0,0.0298,0.524,0.891,0.201,0.0381,107.996
3,Oboe,Sputnik II,3IRON3rRIf2WJwejIbaehd,Camel Power Club,325586,54,0.746,0.666,0,-8.342,1,0.0351,0.298,0.698,0.0983,0.393,114.992
4,Good Morning Vietnam - Original Mix,Good Morning Vietnam,1wOONZvB0H8xJhzo0vU0k7,Adrien Kepler,506666,40,0.472,0.444,7,-10.459,0,0.04,0.159,0.851,0.0995,0.103,126.014
5,Jazz Got Me,SUGAR LIKE SALT,4o5MsHEU7bPIzOo49mv92l,Louis VI,242480,46,0.495,0.461,10,-12.065,0,0.244,0.441,0.00546,0.146,0.527,64.334
6,Trauma,Trauma,4VOuvOhiGrsPxlDXF3l2Y6,Fhin,206853,46,0.569,0.418,4,-10.187,0,0.14,0.642,0.00351,0.11,0.142,134.754
7,By the Sea,Things Are Changing,3b4WEsIhYW8DAEImZzd98C,Gone Gone Beyond,242285,51,0.574,0.394,0,-11.494,1,0.159,0.655,0.0741,0.111,0.361,209.667
8,From the Beginning - Extended Version,From the Beginning,30tifs9tO3pIW3gc5HlsTI,Zazou,519642,43,0.857,0.35,7,-13.157,1,0.0844,0.129,0.735,0.107,0.296,98.012
9,Hypnotised - EP Mix,Kaleidoscope EP,7HBnZdg7fIQwqMhQhci0VV,Coldplay,391413,60,0.501,0.639,5,-6.591,1,0.0357,0.399,0.85,0.099,0.0789,120.04
10,Trouble In Town,Everyday Life,45PqOIkZ9PdCjsCJQYzx9G,Coldplay,278906,62,0.595,0.315,2,-11.456,0,0.0296,0.427,0.648,0.111,0.336,96.018


In [44]:
sample_n = (25-len(new_playlist_df))//2
sample_n

3

In [45]:
new_playlist_df = pd.concat([new_playlist_df,filtered_saved_m.sample(sample_n),filtered_saved_t.sample(sample_n)])
new_playlist_df.reset_index(drop = True, inplace= True)

In [46]:
new_playlist_df

Unnamed: 0,track_name,album,track_id,artist,duration,popularity,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo
0,RSVP,Nimmerland,38UlieEW0eto55PNH9Z0cK,RIN,169293,57,0.787,0.671,9,-4.447,1,0.0409,0.0642,0.0112,0.107,0.548,159.058
1,Napauken - Jpattersson Remix,Zehna,2vPP6FNAzYoo2pplHp7Vop,Shkoon,380616,50,0.793,0.294,9,-13.817,0,0.0556,0.261,0.9,0.0867,0.37,102.029
2,Bläulich,Treppenhaus,2WRTnY0slmFgWcrmEr8dPj,Apache 207,196213,71,0.79,0.704,10,-7.935,0,0.417,0.069,0.000658,0.113,0.212,154.007
3,ABER WO BIST DU,NACHT,2dGtLJysTSI9cbQ6TulL8V,ELIF,185274,56,0.78,0.592,9,-6.02,0,0.0631,0.23,0.0,0.184,0.705,148.057
4,Until Dawn,Until Dawn,1R2rmmUufa1tdiD9j9Qs3x,Jaeger,287190,49,0.531,0.643,3,-9.347,0,0.0321,0.00221,0.826,0.0547,0.115,150.087
5,Rollercoaster,T.H.A.L,2uWnoJg6wgaSfBoaLPuwfl,badmómzjay,165833,31,0.815,0.493,2,-9.709,0,0.0937,0.662,0.0,0.11,0.544,142.112
6,Struggle Made Me Stronger,Headphones on World Off,3vMtLNMANvkPuDthcLuQzJ,Fearless Motivation,213000,48,0.358,0.886,9,-4.48,0,0.25,0.000645,0.0,0.113,0.238,173.965
7,Beyikrta,"Chelina, Vol. 1",6Kd77EjLuU4b8NZyOEkyTu,Chelina,216410,7,0.768,0.467,10,-8.708,0,0.0491,0.321,0.0,0.157,0.616,90.045
8,X Gon' Give It To Ya,The Best Of DMX,2NeHnSFnwNp1Z5WYgcjJ8L,DMX,219253,0,0.678,0.857,10,-5.173,0,0.216,0.0325,0.0,0.0788,0.622,94.95
9,Berlin lebt wie nie zuvor,Berlin lebt 2,2pfio0uHpT4USTPBNeTIWo,Capital Bra,151294,59,0.654,0.782,1,-2.88,1,0.244,0.198,0.0,0.347,0.635,179.701


### Adding new tracks from Spotify recommendations

In this last step I add new tracks to fill up the other half of the playlist.

I **don't** want to simply add songs spotify recommends based on the songs, which are already in the playlist.

Therefore getting Spotify recommendations is only the first step. I am retrieving multiple songs recommendations per song, which are then filtered again based on similarity scoring.

In [47]:
seed_tracks = new_playlist_df["track_id"].tolist()
#seed_artists = artists_m["name"].tolist() + artists_t["name"].tolist()

In [48]:
len(seed_tracks)

25

Unfortunately **the Spotify API does not accept 25 seed tracks for a recommendation query**, I therefore am splitting up the process into "packages" of 5 seed tracks, retrieving 25 tracks per "package"

In [49]:
seed_tracks[:5], seed_tracks[5:10], seed_tracks[10:15]

(['38UlieEW0eto55PNH9Z0cK',
  '2vPP6FNAzYoo2pplHp7Vop',
  '2WRTnY0slmFgWcrmEr8dPj',
  '2dGtLJysTSI9cbQ6TulL8V',
  '1R2rmmUufa1tdiD9j9Qs3x'],
 ['2uWnoJg6wgaSfBoaLPuwfl',
  '3vMtLNMANvkPuDthcLuQzJ',
  '6Kd77EjLuU4b8NZyOEkyTu',
  '2NeHnSFnwNp1Z5WYgcjJ8L',
  '2pfio0uHpT4USTPBNeTIWo'],
 ['6b2oQwSGFkzsMtQruIWm2p',
  '0BMszcKwCrUDHZX3CEEj6L',
  '4Upt2Q8OcUd85uo5x9QqWK',
  '7oqvRZNv4dUV8CgQWtIAMe',
  '0CjBORMsmiQNe3vPDcNIvk'])

In [50]:
recomms = sp_m.recommendations(seed_tracks = seed_tracks[:5],limit = 25)

In [51]:
append_audio_features(create_df_recommendations(recomms),sp_m)

Unnamed: 0,track_name,album,track_id,artist,duration,popularity,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo
0,Mile Away,Mile Away,4Fl3rGS5f8s9QZ0otLX11N,Kelvyn Colt,193548,61,0.548,0.553,7,-8.534,0,0.165,0.17,0.0,0.0939,0.251,122.472
1,Shot Clock,Flyest Alive,0I4vdvCJcIoo0h3rEU39jA,Elias,150935,51,0.78,0.572,8,-3.821,1,0.457,0.0197,0.0,0.101,0.476,178.119
2,MASTERS VOM MARS,OUTTATHISWORLD - RADIO SHOW VOL. 1,0yBCdH5ZDUB55hMct48w4T,Genetikk,160100,41,0.468,0.873,9,-4.273,1,0.485,0.194,0.0,0.356,0.186,83.289
3,Mehr davon,Mehr davon,7rwAzZpANJEN0vn0pzYONW,LOTTE,173808,65,0.772,0.582,7,-6.768,0,0.0353,0.207,0.0,0.154,0.773,137.015
4,Chaos,Armageddon,4xUgmBAib7GP3U3Eg169Dg,Snavs,225000,32,0.625,0.933,0,-5.288,1,0.0421,0.0287,0.713,0.0581,0.163,90.018
5,Tarlabasi - Be Svendsen Remix,Indoor Voyager EP,4cIDajUp2dMjKiDmDvFDg8,Oceanvs Orientalis,545813,58,0.854,0.599,4,-12.259,0,0.0604,0.206,0.861,0.0899,0.216,114.001
6,Perkys,Geld Motivierte Muzik,2qBl6nPXhJMTN7A5R9KvqM,Money Boy,134586,58,0.912,0.592,9,-8.658,1,0.368,0.0279,1e-05,0.0969,0.213,136.049
7,Thinking Bout U,Thinking Bout U,1s3rLYo5NxTv7yQ0VZXxTl,Fabian Mazur,220800,39,0.516,0.4,5,-12.304,0,0.0517,0.0299,0.000778,0.118,0.242,150.057
8,Untitled,Halcyon EP,6Ni0ZItxBirgU2NCTmVaCK,AVEM,447693,36,0.778,0.448,6,-11.985,0,0.0692,0.0767,0.892,0.117,0.111,104.002
9,Viel leichter,Viel leichter,1Q7xCxhz6iwQNpnAUfYULZ,LUNA,208529,59,0.471,0.489,7,-8.225,0,0.0562,0.515,0.0,0.241,0.578,169.706


In [52]:
recomm_dfs = []
for i in range(5,26,5):
    recomms = sp_m.recommendations(seed_tracks = seed_tracks[i-5:i],limit = 25)
    recomms_df = append_audio_features(create_df_recommendations(recomms),sp_m)
    recomm_dfs.append(recomms_df)
recomms_df = pd.concat(recomm_dfs)

In [53]:
recomms_df.reset_index(drop = True, inplace= True)

In [54]:
recomms_df

Unnamed: 0,track_name,album,track_id,artist,duration,popularity,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo
0,Bleib mit mir wach,Bleib mit mir wach,1qTyTDGi8O9rgDk3mheeuX,Xavi,150072,52,0.587,0.486,0,-8.902,0,0.0807,0.60400,0.000810,0.0976,0.848,132.000
1,Never Gonna Catch Me,Never Gonna Catch Me,3folt4d0CndACKo02YNrin,El Speaker,200969,47,0.661,0.739,9,-5.221,0,0.0314,0.05290,0.064300,0.0559,0.550,98.003
2,TurnUp,All Trap Music,20Xwgtz25c8XBV46z1GYs4,Gent & Jawns,185753,39,0.603,0.972,11,-5.013,0,0.0415,0.00205,0.687000,0.1240,0.388,145.997
3,Aquafina,Flyest Alive,05fEXJyvQ7rji2J5Iri1Ix,Elias,140869,50,0.862,0.851,1,-1.473,1,0.0560,0.04550,0.000000,0.1450,0.541,91.945
4,CASINO,BOSS BITCH,5rywC2PH49hTpJLXxwGlpE,Katja Krasavice,201548,64,0.794,0.489,8,-8.341,0,0.0840,0.27900,0.000004,0.0966,0.542,155.945
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
120,Time (feat. Rhye),Time (feat. Rhye),0XQypgwTXf0LS7ZGx13XTA,SG Lewis,259173,72,0.750,0.819,10,-6.600,0,0.1510,0.06740,0.000979,0.3080,0.614,116.054
121,Easier (feat. LOWES) - Radio Edit,Easier (feat. LOWES) [Radio Edit],133YafRPaKLYCR28FSbN4M,CamelPhat,214687,66,0.471,0.895,11,-6.622,0,0.0458,0.04350,0.007470,0.1050,0.114,122.954
122,Мне всё Монро,58,01LoBeQWux0KMNfiaqUFWp,Egor Kreed,139742,54,0.820,0.617,2,-4.120,1,0.1510,0.24900,0.000000,0.0824,0.428,125.002
123,Pinned to the Cross (feat. Finn Matthews),Pinned to the Cross (feat. Finn Matthews),15hcBnrMPvMt8dBS54MDMS,Rick Ross,261245,52,0.537,0.826,6,-3.089,0,0.1940,0.10100,0.000000,0.2660,0.437,76.997


The 125 recommendations are further filtered by their similarity to the known tracks in the playlist.

In [55]:
similarity_score = create_similarity_score(new_playlist_df,recomms_df)

In [56]:
new_playlist_df.shape

(25, 17)

In [57]:
recomms_df

Unnamed: 0,track_name,album,track_id,artist,duration,popularity,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo
0,Bleib mit mir wach,Bleib mit mir wach,1qTyTDGi8O9rgDk3mheeuX,Xavi,150072,52,0.587,0.486,0,-8.902,0,0.0807,0.60400,0.000810,0.0976,0.848,132.000
1,Never Gonna Catch Me,Never Gonna Catch Me,3folt4d0CndACKo02YNrin,El Speaker,200969,47,0.661,0.739,9,-5.221,0,0.0314,0.05290,0.064300,0.0559,0.550,98.003
2,TurnUp,All Trap Music,20Xwgtz25c8XBV46z1GYs4,Gent & Jawns,185753,39,0.603,0.972,11,-5.013,0,0.0415,0.00205,0.687000,0.1240,0.388,145.997
3,Aquafina,Flyest Alive,05fEXJyvQ7rji2J5Iri1Ix,Elias,140869,50,0.862,0.851,1,-1.473,1,0.0560,0.04550,0.000000,0.1450,0.541,91.945
4,CASINO,BOSS BITCH,5rywC2PH49hTpJLXxwGlpE,Katja Krasavice,201548,64,0.794,0.489,8,-8.341,0,0.0840,0.27900,0.000004,0.0966,0.542,155.945
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
120,Time (feat. Rhye),Time (feat. Rhye),0XQypgwTXf0LS7ZGx13XTA,SG Lewis,259173,72,0.750,0.819,10,-6.600,0,0.1510,0.06740,0.000979,0.3080,0.614,116.054
121,Easier (feat. LOWES) - Radio Edit,Easier (feat. LOWES) [Radio Edit],133YafRPaKLYCR28FSbN4M,CamelPhat,214687,66,0.471,0.895,11,-6.622,0,0.0458,0.04350,0.007470,0.1050,0.114,122.954
122,Мне всё Монро,58,01LoBeQWux0KMNfiaqUFWp,Egor Kreed,139742,54,0.820,0.617,2,-4.120,1,0.1510,0.24900,0.000000,0.0824,0.428,125.002
123,Pinned to the Cross (feat. Finn Matthews),Pinned to the Cross (feat. Finn Matthews),15hcBnrMPvMt8dBS54MDMS,Rick Ross,261245,52,0.537,0.826,6,-3.089,0,0.1940,0.10100,0.000000,0.2660,0.437,76.997


In [58]:
[np.argmax(i) for i in similarity_score]

[45,
 7,
 105,
 45,
 111,
 0,
 102,
 47,
 42,
 71,
 40,
 26,
 45,
 13,
 104,
 71,
 121,
 47,
 73,
 30,
 108,
 98,
 20,
 116,
 78]

In [59]:
final_recomms=recomms_df.loc[[np.argmax(i) for i in similarity_score]]
final_recomms = final_recomms.drop_duplicates()

In [60]:
new_playlist_df = new_playlist_df.append(final_recomms)

In [61]:
new_playlist_df = new_playlist_df.drop_duplicates()
new_playlist_df.reset_index(drop = True, inplace = True)

# The playlist is finished!

Now the only thing left to do is to add the tracks to the playlist.

(Adding a nice picture and thanking your girlfriend for her patience in the playlist description are **not optional**)

In [62]:
new_playlist_df

Unnamed: 0,track_name,album,track_id,artist,duration,popularity,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo
0,RSVP,Nimmerland,38UlieEW0eto55PNH9Z0cK,RIN,169293,57,0.787,0.671,9,-4.447,1,0.0409,0.0642,0.0112,0.107,0.548,159.058
1,Napauken - Jpattersson Remix,Zehna,2vPP6FNAzYoo2pplHp7Vop,Shkoon,380616,50,0.793,0.294,9,-13.817,0,0.0556,0.261,0.9,0.0867,0.37,102.029
2,Bläulich,Treppenhaus,2WRTnY0slmFgWcrmEr8dPj,Apache 207,196213,71,0.79,0.704,10,-7.935,0,0.417,0.069,0.000658,0.113,0.212,154.007
3,ABER WO BIST DU,NACHT,2dGtLJysTSI9cbQ6TulL8V,ELIF,185274,56,0.78,0.592,9,-6.02,0,0.0631,0.23,0.0,0.184,0.705,148.057
4,Until Dawn,Until Dawn,1R2rmmUufa1tdiD9j9Qs3x,Jaeger,287190,49,0.531,0.643,3,-9.347,0,0.0321,0.00221,0.826,0.0547,0.115,150.087
5,Rollercoaster,T.H.A.L,2uWnoJg6wgaSfBoaLPuwfl,badmómzjay,165833,31,0.815,0.493,2,-9.709,0,0.0937,0.662,0.0,0.11,0.544,142.112
6,Struggle Made Me Stronger,Headphones on World Off,3vMtLNMANvkPuDthcLuQzJ,Fearless Motivation,213000,48,0.358,0.886,9,-4.48,0,0.25,0.000645,0.0,0.113,0.238,173.965
7,Beyikrta,"Chelina, Vol. 1",6Kd77EjLuU4b8NZyOEkyTu,Chelina,216410,7,0.768,0.467,10,-8.708,0,0.0491,0.321,0.0,0.157,0.616,90.045
8,X Gon' Give It To Ya,The Best Of DMX,2NeHnSFnwNp1Z5WYgcjJ8L,DMX,219253,0,0.678,0.857,10,-5.173,0,0.216,0.0325,0.0,0.0788,0.622,94.95
9,Berlin lebt wie nie zuvor,Berlin lebt 2,2pfio0uHpT4USTPBNeTIWo,Capital Bra,151294,59,0.654,0.782,1,-2.88,1,0.244,0.198,0.0,0.347,0.635,179.701


In [63]:
new_playlist_df["track_id"].tolist()

['38UlieEW0eto55PNH9Z0cK',
 '2vPP6FNAzYoo2pplHp7Vop',
 '2WRTnY0slmFgWcrmEr8dPj',
 '2dGtLJysTSI9cbQ6TulL8V',
 '1R2rmmUufa1tdiD9j9Qs3x',
 '2uWnoJg6wgaSfBoaLPuwfl',
 '3vMtLNMANvkPuDthcLuQzJ',
 '6Kd77EjLuU4b8NZyOEkyTu',
 '2NeHnSFnwNp1Z5WYgcjJ8L',
 '2pfio0uHpT4USTPBNeTIWo',
 '6b2oQwSGFkzsMtQruIWm2p',
 '0BMszcKwCrUDHZX3CEEj6L',
 '4Upt2Q8OcUd85uo5x9QqWK',
 '7oqvRZNv4dUV8CgQWtIAMe',
 '0CjBORMsmiQNe3vPDcNIvk',
 '6ikiAn9th3TUfS3bYS7gDX',
 '0BxE4FqsDD1Ot4YuBXwAPp',
 '5aOlYhQsp75cgPov4yjWIe',
 '06pbKFx2Iut0MRs1XjV9Mc',
 '3pJnvBbwIvuwebHHTYzoR3',
 '5R4hprpCcdgKz1DsPoh9p2',
 '01WOwxkxOw2FqNIHkraxcN',
 '0keCs8wD67bWkE4yrqZmir',
 '1wOONZvB0H8xJhzo0vU0k7',
 '30yDbkiLvSiYx3pox7odMR',
 '5CSSjkblq4KEome7dSztB9',
 '3IR7oFtwy29YHudz2kK9Co',
 '0te0mYcVMqRLc4vuDjZ0Yg',
 '02dPa4nXABwnFzjZosKxsk',
 '1qTyTDGi8O9rgDk3mheeuX',
 '2enPRFda84VE2wtI8c86Uf',
 '62ZlScwF8VZuGEFaAo3TNZ',
 '0sqLBIxpehB3UWkLHktEyo',
 '3huNXh7TCbzi9DlqCzhrUS',
 '4RvS94Li3lohRtTSz5X3xZ',
 '5G5EhGZzuBIU7kvfzmjguL',
 '0VRiX7eu12rWKnhuef6sa7',
 

**Note:** Here I am using `user_playlist_add_tracks()` to **add** to an existing playlist. It is possible to create one from scratch, however it wasn't necessary here. 

In the script I am using `playlist_replace_items()` as I don't just want new songs to be added but old ones to be deleted.

In [64]:
sp_m.user_playlist_add_tracks(usernames[0],
                              playlist_id="spotify:playlist:1Vcqtv3nE7QOJ4KFvK7bT8",
                              tracks = new_playlist_df["track_id"].tolist())

{'snapshot_id': 'MjAsMDkyYjQ3ZWNiMTc5NTM3ZGQwMTA2MTFlZTcyYzczYTljMzA0NzZlZg=='}

In [65]:
new_playlist_df.to_csv(path/"Playlist.csv")