# DUO.py

This notebook is the basis for script_for_duos_playlist.py. It covers all steps that lead to the creation of a new duos playlist for my girlfriend and me.

The notebook was used to experiment and to build the steps that produce a DUO.py playlist both users enjoy.

I wanted to create a playlist that was different from Spotify's 'Duo Mix', which didn't really seem to grasp my girlfriend's and my music tastes and was somewhat disappointing.

The playlist we had hoped for wasn't available, so I decided to create one that could be updated periodically and would incorporate these features:
- include some all time favorites as well as songs we recently both enjoyed
- include new songs based on our favorite songs
- don't include too many songs from the same artist or genre
- provide a listening experience that is based on mutually liked music, not just a combination of our individual tastes

Right now the workflow accomplishes this, although it is not yet perfect. I intend to add machine learning to further "filter" songs based on what we would like to have in our personal playlist. This, however, requires data that is currently being collected as we enjoy and discard songs from the weekly playlist.

## Imports

The most important imports are spotipy, pandas, numpy and my own small library 'spotifuncs'. Here I use a wildcard import for experimentation; the script only imports the necessary functions. spotifuncs itself uses pandas, sklearn and spotipy.

In [1]:
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import spotipy.util as util
from pathlib import Path
import pandas as pd
import numpy as np
from spotifuncs import *

In [2]:
path = Path("C:/Users/ms101/OneDrive/DataScience_ML/projects/spotify_app")

## Setting the Credentials 

I stored the credentials and usernames in .txt files that were not uploaded to GitHub, to keep them private and avoid exposing sensitive information in my notebooks.

Here I simply read the lines of the .txt files to retrieve the necessary information.

In [3]:
with open(path / "client_s.txt") as f:
    content = f.readlines()
content = [x.strip() for x in content]

client_id = content[0]
client_secret = content[1]

In [4]:
with open(path / "usernames.txt") as f:
    usernames = f.readlines()
usernames = [x.strip() for x in usernames]

username1 = usernames[0]
username2 = usernames[1]
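Both cells above repeat the same read-and-strip pattern, so it could be factored into a small helper. The sketch below is a hypothetical `read_lines` function (not part of spotifuncs) that additionally skips blank lines, so a stray empty line in the file cannot end up as a credential.

```python
def read_lines(file_path):
    """Return the stripped, non-empty lines of a text file (hypothetical helper)."""
    with open(file_path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


# Usage, assuming the same file layout as above:
# client_id, client_secret = read_lines(path / "client_s.txt")[:2]
# username1, username2 = read_lines(path / "usernames.txt")[:2]
```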

## App Scope

For the following code to work, the scope needs to be quite extensive, as I am retrieving a lot of user data and modifying a collaborative playlist (which the Spotify API always treats as private). Since the only users are my girlfriend and me, this was acceptable.

**If others choose to use my app, a thorough explanation of how their data is used and what the app can do is paramount.**

All available scopes are listed here: https://developer.spotify.com/documentation/general/guides/scopes/

In [5]:
scope = "user-library-read user-read-recently-played user-top-read playlist-modify-public playlist-modify-private playlist-read-private playlist-read-collaborative"

redirect_uri = "https://developer.spotify.com/dashboard/applications/4a4e029d299a4241873db8300038bf0a"

client_credentials_manager = SpotifyClientCredentials(client_id=client_id, 
                                                      client_secret=client_secret)


## Authenticate

I reduced the authentication process to a function that can be found within the spotifuncs library. I describe it in more detail in my [Medium post](https://towardsdatascience.com/using-python-to-refine-your-spotify-recommendations-6dc08bcf408e).

In [6]:
sp_m = authenticate(redirect_uri, client_credentials_manager, username1, scope, client_id, client_secret)
sp_t = authenticate(redirect_uri, client_credentials_manager, username2, scope, client_id, client_secret)

## Retrieving the Data

I am retrieving quite a few dictionaries that end up in multiple dataframes. I packaged the process into a function because it has to be repeated for each user. I decided not to integrate it into spotifuncs, as the function is very specific and subject to change.

The first important piece of data is the user's top tracks, which I combine into a single list that covers everything from short- to long-term favorites. Notice that I sample only 15 songs from the long-term top tracks, and do so **without** setting a random seed so that I do **not** get the same results every time the code is run. Always using the full list of long-term favorites would lead to too much repetition over the course of multiple weeks and thus playlists.
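The combining step can be illustrated in isolation. This is a minimal sketch, not the notebook's code: the toy dataframes stand in for the output of `create_df_top_songs`/`append_audio_features`, and deduplicating on `track_id` (instead of on all columns, as `get_dfs` does) is an assumption that keeps a track unique even if another column, such as popularity, differs slightly.

```python
import pandas as pd


def combine_top_tracks(short_df, med_df, long_df, n_long=15, random_state=None):
    """Merge top tracks of all time ranges, keeping a random subset of long-term favorites."""
    # Without a fixed random_state a different subset is drawn on every run
    n_long = min(n_long, len(long_df))  # avoid a ValueError if fewer tracks exist
    long_sample = long_df.sample(n=n_long, random_state=random_state)
    combined = pd.concat([short_df, med_df, long_sample])
    # A track can be a favorite in several time ranges; keep its first occurrence
    return combined.drop_duplicates(subset="track_id").reset_index(drop=True)


# Toy data standing in for the real top-track dataframes
short = pd.DataFrame({"track_id": ["a", "b"], "track_name": ["A", "B"]})
med = pd.DataFrame({"track_id": ["b", "c"], "track_name": ["B", "C"]})
long_term = pd.DataFrame({"track_id": list("cdefg"), "track_name": list("CDEFG")})
combined = combine_top_tracks(short, med, long_term, n_long=3)
```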

The second important piece of data is the top artists, which, like the tracks, are retrieved for all time ranges. The artists are important for the filtering process later on.

Lastly, I retrieve the last 50 tracks a user saved. 50 is the maximum a single request returns, which is unfortunate as it limits the use of the data; getting more would require paging through the results with the `offset` parameter.
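The Spotify Web API caps each request at 50 items, but its paging objects contain an `items` list and a `next` URL (which is `None` on the last page), and the endpoint accepts an `offset`. A hedged sketch of a generic pager; `fetch_all_items` and the `fetch_page(limit, offset)` callable are hypothetical names, e.g. `fetch_page` could wrap spotipy's `sp.current_user_saved_tracks(limit=..., offset=...)`.

```python
def fetch_all_items(fetch_page, page_size=50, max_items=None):
    """Collect items from a Spotify-style paging endpoint by advancing the offset.

    fetch_page(limit, offset) must return a dict shaped like a Spotify paging
    object: an "items" list and a "next" value that is None on the last page.
    """
    items, offset = [], 0
    while max_items is None or len(items) < max_items:
        page = fetch_page(page_size, offset)
        items.extend(page["items"])
        if page.get("next") is None or not page["items"]:
            break  # last page reached
        offset += len(page["items"])
    return items if max_items is None else items[:max_items]


# Hypothetical usage with a spotipy client `sp`:
# saved = fetch_all_items(
#     lambda limit, offset: sp.current_user_saved_tracks(limit=limit, offset=offset),
#     max_items=200,
# )
```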

In [7]:
def get_dfs(sp):
    ##queries
    #user top tracks
    top_tracks_short = sp.current_user_top_tracks(limit = 50,offset=0,time_range='short_term')
    top_tracks_med = sp.current_user_top_tracks(limit = 50,offset=0,time_range='medium_term')
    top_tracks_long = sp.current_user_top_tracks(limit = 50,offset=0,time_range='long_term')
    
    #combine the top_tracks
    top_tracks_short_df = append_audio_features(create_df_top_songs(top_tracks_short),sp)
    top_tracks_med_df = append_audio_features(create_df_top_songs(top_tracks_med),sp)
    top_tracks_long_df = append_audio_features(create_df_top_songs(top_tracks_long),sp)
    #sample from long-term top tracks to introduce more randomness and avoid having the same artists
    top_tracks_long_df = top_tracks_long_df.sample(n = 15)
    top_tracks_df = pd.concat([top_tracks_short_df,top_tracks_med_df,top_tracks_long_df]).drop_duplicates().reset_index(drop = True)
        
    #user top artists
    top_artists_long = sp.current_user_top_artists(limit = 50, time_range = "long_term")
    top_artists_med = sp.current_user_top_artists(limit = 50, time_range = "medium_term")
    top_artists_short = sp.current_user_top_artists(limit = 50, time_range = "short_term")
    
    artists_short_df = top_artists_from_API(top_artists_short)
    artists_med_df = top_artists_from_API(top_artists_med)
    artists_long_df = top_artists_from_API(top_artists_long)
    artists_df = pd.concat([artists_short_df,artists_med_df,artists_long_df])
    #join the genre lists into strings so the rows are hashable and can be deduplicated
    artists_df["genres"] = artists_df["genres"].apply(lambda x: ",".join(x))
    artists_df = artists_df.drop_duplicates().reset_index(drop = True)
    
    #user saved tracks
    user_saved_tracks = sp.current_user_saved_tracks(limit = 50)
    saved_tracks_df = create_df_saved_songs(user_saved_tracks)
    
        
    return top_tracks_df,artists_df,saved_tracks_df

In [8]:
top_tracks_m, artists_m, saved_tracks_m = get_dfs(sp_m)

In [9]:
top_tracks_t, artists_t, saved_tracks_t = get_dfs(sp_t)

In [10]:
artists_t

| | name | id | genres | popularity | uri |
|---|---|---|---|---|---|
| 0 | Emancipator | 6HCnsY0Rxi3cg53xreoAIm | downtempo,electronica,livetronica,trip hop | 62 | spotify:artist:6HCnsY0Rxi3cg53xreoAIm |
| 1 | Chelina | 3XQZW9cuoDf7JhPbr99bXD | amharic pop | 13 | spotify:artist:3XQZW9cuoDf7JhPbr99bXD |
| 2 | Element Of Crime | 3FweAJRBCbUOGR6jULfaRi | german indie,german pop,german rock,liedermacher | 51 | spotify:artist:3FweAJRBCbUOGR6jULfaRi |
| 3 | Mac Miller | 4LLpKhyESsyAXpc4laK94U | hip hop,pittsburgh rap,rap | 85 | spotify:artist:4LLpKhyESsyAXpc4laK94U |
| 4 | Kid Francescoli | 2G7QgTep5IsJHGHm1hXygD | french indie pop,french indietronica,new frenc... | 60 | spotify:artist:2G7QgTep5IsJHGHm1hXygD |
| ... | ... | ... | ... | ... | ... |
| 45 | Rakede | 4soVkCNrRQccCv4Nohz273 | hamburg hip hop | 40 | spotify:artist:4soVkCNrRQccCv4Nohz273 |
| 46 | 257ers | 6ihLfpY3cmdGyWEnItn30w | antideutsche,deep german hip hop,german hip ho... | 62 | spotify:artist:6ihLfpY3cmdGyWEnItn30w |
| 47 | MEUTE | 1z5xbcOeFRQXBVDpvRPh8H | german dance,hamburg electronic,livetronica | 53 | spotify:artist:1z5xbcOeFRQXBVDpvRPh8H |
| 48 | Max Herre | 7IpWQKu80qQvyer3LO6SW3 | german alternative rap,german hip hop,german pop | 53 | spotify:artist:7IpWQKu80qQvyer3LO6SW3 |


In [11]:
artists_m

| | name | id | genres | popularity | uri |
|---|---|---|---|---|---|
| 0 | Rammstein | 6wWVKhxIU2cEi0K81v7HvP | alternative metal,german metal,industrial,indu... | 79 | spotify:artist:6wWVKhxIU2cEi0K81v7HvP |
| 1 | Frank Sinatra | 1Mxqyy3pSjf8kZZL4QVxS0 | adult standards,easy listening,lounge | 89 | spotify:artist:1Mxqyy3pSjf8kZZL4QVxS0 |
| 2 | SCH | 2kXKa3aAFngGz2P4GjG5w2 | french hip hop,pop urbaine,rap francais,rap ma... | 77 | spotify:artist:2kXKa3aAFngGz2P4GjG5w2 |
| 3 | Geegun | 5W7N6u4EjCEMKj7bDyzPEC | russian dance,russian dance pop,russian hip ho... | 57 | spotify:artist:5W7N6u4EjCEMKj7bDyzPEC |
| 4 | Joyner Lucas | 6C1ohJrd5VydigQtaGy5Wa | boston hip hop,hip hop,pop rap,rap | 77 | spotify:artist:6C1ohJrd5VydigQtaGy5Wa |
| ... | ... | ... | ... | ... | ... |
| 45 | Chris Rock | 36eSjIksD6fehqxyDUHDA3 | black comedy,comedy | 48 | spotify:artist:36eSjIksD6fehqxyDUHDA3 |
| 46 | Wardruna | 0NJ6wlOAsAJ1PN4VRdTPKA | medieval folk,nordic folk,rune folk,viking folk | 62 | spotify:artist:0NJ6wlOAsAJ1PN4VRdTPKA |
| 47 | Carnage | 7CCjtD0hCK005Bvg2WG1a7 | edm,electro house,electronic trap,pop rap,rap,... | 64 | spotify:artist:7CCjtD0hCK005Bvg2WG1a7 |
| 48 | Motörhead | 1DFr97A9HnbV3SKTJFu62M | album rock,hard rock,metal,rock,speed metal | 69 | spotify:artist:1DFr97A9HnbV3SKTJFu62M |


### Finding common artists

I am finding the common artists of the two users in order to later filter their top songs by artist.
The logic behind this is the following:

A track might be among only one user's top tracks but still be by an artist both users enjoy. In that case the track is a good candidate for the duos playlist, as both users will probably enjoy it, yet it might be a new discovery for one of them. If both already know and like it, it is still a good fit for the playlist!
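`dataframe_difference` comes from spotifuncs and its code is not shown here. A common way to write such a helper is an outer merge with `indicator=True`; the sketch below assumes that approach and is a hypothetical reimplementation, not the library's code. Because it merges on all shared columns, a row only counts as common if it is identical in every column.

```python
import pandas as pd


def dataframe_difference(df1, df2, which=None):
    """Compare two dataframes row by row (hypothetical reimplementation).

    which="both" keeps rows present in both frames, "left_only"/"right_only"
    keep rows present in just one, and None keeps every row that is not shared.
    """
    comparison = df1.merge(df2, indicator=True, how="outer")
    if which is None:
        diff = comparison[comparison["_merge"] != "both"]
    else:
        diff = comparison[comparison["_merge"] == which]
    # drop() without inplace=True returns a new frame instead of modifying the filtered slice
    return diff.drop(columns="_merge").reset_index(drop=True)
```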

In [12]:
common_artists = dataframe_difference(artists_m,artists_t, which = "both")
common_artists

| | name | id | genres | popularity | uri |
|---|---|---|---|---|---|
| 0 | Frank Sinatra | 1Mxqyy3pSjf8kZZL4QVxS0 | adult standards,easy listening,lounge | 89 | spotify:artist:1Mxqyy3pSjf8kZZL4QVxS0 |
| 1 | Joyner Lucas | 6C1ohJrd5VydigQtaGy5Wa | boston hip hop,hip hop,pop rap,rap | 77 | spotify:artist:6C1ohJrd5VydigQtaGy5Wa |
| 2 | G-Eazy | 02kJSzxNuaWGqwubyUba0Z | hip hop,indie pop rap,oakland hip hop,pop rap,rap | 83 | spotify:artist:02kJSzxNuaWGqwubyUba0Z |
| 3 | Modeselektor | 2jYMYP2SVifgmzNRQJx3SJ | alternative dance,electronica,microhouse,minim... | 50 | spotify:artist:2jYMYP2SVifgmzNRQJx3SJ |
| 4 | Kid Francescoli | 2G7QgTep5IsJHGHm1hXygD | french indie pop,french indietronica,new frenc... | 60 | spotify:artist:2G7QgTep5IsJHGHm1hXygD |
| 5 | Seeed | 5ISjkNS17JpCwiFtW80lpV | german hip hop,german pop,german reggae | 65 | spotify:artist:5ISjkNS17JpCwiFtW80lpV |
| 6 | The Weeknd | 1Xyo4u8uXC1ZmMpatF05PJ | canadian contemporary r&b,canadian pop,pop | 94 | spotify:artist:1Xyo4u8uXC1ZmMpatF05PJ |
| 7 | Christian Löffler | 3tSvlEzeDnVbQJBTkIA6nO | electronica,hamburg electronic,microhouse,mini... | 62 | spotify:artist:3tSvlEzeDnVbQJBTkIA6nO |
| 8 | Monolink | 2I4hRNCYkPKJQlkoEZKjYx | organic house | 62 | spotify:artist:2I4hRNCYkPKJQlkoEZKjYx |
| 9 | Egor Kreed | 2KoLmBXwsgMkfAvoPBlPmb | russian hip hop,russian pop | 66 | spotify:artist:2KoLmBXwsgMkfAvoPBlPmb |


### Last week's playlist

To avoid encountering the same songs two weeks in a row, which is very likely as short- and medium-term top tracks won't have changed much, last week's playlist is read from 'Playlist.csv'. This file was just an empty CSV file the first time the code was run, but at the end of the playlist creation process the new playlist is saved to it, so it always contains last week's playlist.

The code that creates the playlist (script_for_duos_playlist.py) does the same every time it is run.
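`pd.read_csv` raises `pandas.errors.EmptyDataError` on a file with no content at all, so on a first run the file needs at least a header row. A hedged sketch of a loader that falls back to an empty frame with a `track_id` column when the file is missing or empty, plus the matching save step; both function names are hypothetical and not part of the original script.

```python
from pathlib import Path

import pandas as pd


def load_last_playlist(csv_path):
    """Read last week's playlist, or return an empty frame on the very first run."""
    try:
        return pd.read_csv(csv_path, index_col=0)
    except (FileNotFoundError, pd.errors.EmptyDataError):
        # Only track_id is needed to exclude last week's songs
        return pd.DataFrame(columns=["track_id"])


def save_playlist(playlist_df, csv_path):
    """Overwrite the file so that next week's run sees this week's playlist."""
    playlist_df.to_csv(Path(csv_path))
```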

In [13]:
last_week_duo = pd.read_csv(path/"Playlist.csv", index_col = 0)
last_week_duo

| | track_name | album | track_id | artist | duration | popularity | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Who - Single Version | Who | 0CjBORMsmiQNe3vPDcNIvk | Modeselektor | 207988 | 43 | 0.76 | 0.91 | 7 | -8.472 | 1 | 0.0484 | 0.0442 | 0.00662 | 0.108 | 0.392 | 142.982 |
| 1 | Coco L'Eau | Coco L'Eau | 5weiiB92gNV7QHFYQXqxZ8 | Egor Kreed | 130050 | 65 | 0.837 | 0.663 | 3 | -5.071 | 0 | 0.0497 | 0.0949 | 0.00954 | 0.175 | 0.466 | 100.006 |
| 2 | A Child's Tale | Phantasma | 2JNa5xzODo5tiHDIvLPpGt | Bukahara | 179693 | 48 | 0.668 | 0.398 | 0 | -10.539 | 0 | 0.046 | 0.865 | 0.0102 | 0.217 | 0.961 | 166.874 |
| 3 | Bläulich | Treppenhaus | 2WRTnY0slmFgWcrmEr8dPj | Apache 207 | 196213 | 70 | 0.79 | 0.704 | 10 | -7.935 | 0 | 0.417 | 0.069 | 0.000658 | 0.113 | 0.212 | 154.007 |
| 4 | Lambo Lambo | KitschKrieg | 7oqvRZNv4dUV8CgQWtIAMe | KitschKrieg | 214991 | 60 | 0.876 | 0.4 | 5 | -9.748 | 0 | 0.129 | 0.494 | 9e-06 | 0.1 | 0.484 | 144.938 |
| 5 | Come Waltz With Me | Reprise Rarities (Vol. 1) | 1SfTz9jOsbY9iS3YLQt0CK | Frank Sinatra | 175689 | 42 | 0.252 | 0.318 | 10 | -9.773 | 1 | 0.0298 | 0.843 | 1.1e-05 | 0.255 | 0.182 | 139.987 |
| 6 | The Beautiful & Damned | The Beautiful & Damned | 2WWruw7ul9N7eqoHELyMc2 | G-Eazy | 189306 | 61 | 0.656 | 0.804 | 8 | -5.191 | 0 | 0.363 | 0.173 | 0.0 | 0.837 | 0.314 | 125.882 |
| 7 | KANN DAS BITTE SO BLEIBEN | NACHT | 45HOck8XCgrSlVUQHHOHMz | ELIF | 164511 | 50 | 0.705 | 0.656 | 6 | -6.407 | 0 | 0.0468 | 0.18 | 0.00581 | 0.189 | 0.47 | 150.066 |
| 8 | I Mean It (feat. Remo) | These Things Happen | 6jmTHeoWvBaSrwWttr8Xvu | G-Eazy | 236480 | 73 | 0.712 | 0.562 | 10 | -6.008 | 1 | 0.129 | 0.125 | 0.0 | 0.136 | 0.142 | 140.0 |
| 9 | Neo | Mare | 1xm2dkD3qpHdw9h7YTDozm | Christian Löffler | 437548 | 39 | 0.625 | 0.3 | 0 | -20.749 | 1 | 0.0436 | 0.637 | 0.897 | 0.115 | 0.0688 | 121.993 |


## Creating the Playlist

The creation of the playlist is the main goal of the task and of the project. It requires a couple of steps to 'assemble all building blocks' that make up the DUO.py playlist. The building blocks are:

1. Common top tracks that were not in last week's playlist
2. A sample of each user's top tracks that are most similar to the other user's top tracks
3. A sample of each user's top tracks by an artist both users like (one of their top artists)
4. A sample of the songs saved by the users
5. A recommended track (through Spotify and additional filtering) for every track that was added to the playlist in steps 1 to 4

### Common top tracks

The playlist is initialized with the common top tracks that did not already appear in last week's playlist. As these songs are favorites of both users, both should enjoy them.

In [14]:
dataframe_difference(top_tracks_m,top_tracks_t,which = "both")

| | track_name | album | track_id | artist | duration | popularity | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Who - Single Version | Who | 0CjBORMsmiQNe3vPDcNIvk | Modeselektor | 207988 | 43 | 0.76 | 0.91 | 7 | -8.472 | 1 | 0.0484 | 0.0442 | 0.00662 | 0.108 | 0.392 | 142.982 |
| 1 | Coco L'Eau | Coco L'Eau | 5weiiB92gNV7QHFYQXqxZ8 | Egor Kreed | 130050 | 64 | 0.837 | 0.663 | 3 | -5.071 | 0 | 0.0497 | 0.0949 | 0.00954 | 0.175 | 0.466 | 100.006 |
| 2 | A Child's Tale | Phantasma | 2JNa5xzODo5tiHDIvLPpGt | Bukahara | 179693 | 47 | 0.668 | 0.398 | 0 | -10.539 | 0 | 0.046 | 0.865 | 0.0102 | 0.217 | 0.961 | 166.874 |
| 3 | Huldra - Other Version | Huldra | 569gNjph2g07MmjtMm6vKm | Gidge | 490000 | 38 | 0.772 | 0.514 | 10 | -11.961 | 0 | 0.0754 | 0.148 | 0.906 | 0.0934 | 0.103 | 118.015 |
| 4 | Bläulich | Treppenhaus | 2WRTnY0slmFgWcrmEr8dPj | Apache 207 | 196213 | 69 | 0.79 | 0.704 | 10 | -7.935 | 0 | 0.417 | 0.069 | 0.000658 | 0.113 | 0.212 | 154.007 |
| 5 | Wealth | Who Else | 5aOlYhQsp75cgPov4yjWIe | Modeselektor | 247218 | 41 | 0.765 | 0.452 | 1 | -12.346 | 1 | 0.0856 | 0.0204 | 5.7e-05 | 0.0727 | 0.331 | 137.972 |


In [16]:
common_songs = dataframe_difference(top_tracks_m,top_tracks_t,which = "both")
new_playlist_df = common_songs[~common_songs["track_id"].isin(last_week_duo["track_id"])]
new_playlist_df

| | track_name | album | track_id | artist | duration | popularity | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | Huldra - Other Version | Huldra | 569gNjph2g07MmjtMm6vKm | Gidge | 490000 | 38 | 0.772 | 0.514 | 10 | -11.961 | 0 | 0.0754 | 0.148 | 0.906 | 0.0934 | 0.103 | 118.015 |
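Matching on `track_id` alone keeps this step independent of columns that can change between API calls; for instance, 'Coco L'Eau' has popularity 65 in last week's file but 64 in today's data. A minimal sketch with toy data; the column names follow the notebook, while the helper name `fresh_common_tracks` is hypothetical.

```python
import pandas as pd


def fresh_common_tracks(tracks_a, tracks_b, last_week):
    """Tracks in both users' top lists that were not in last week's playlist."""
    # Inner merge on the ID only; all other columns are taken from tracks_a
    common = tracks_a.merge(tracks_b[["track_id"]].drop_duplicates(), on="track_id")
    return common[~common["track_id"].isin(last_week["track_id"])].reset_index(drop=True)
```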


### Users' top tracks that are similar to each other

The next building block consists of songs from both users' top tracks that are most similar to one another in terms of audio features.

For this task the top tracks unique to each user are extracted and a similarity matrix is computed. The similarity is based on the audio features (excluding 'key' and 'mode') and computed via cosine similarity (see spotifuncs).
For each of the first user's tracks, the most similar track of the second user and the corresponding score are determined; the 30 pairs with the highest scores are kept. The songs at these indices are put into a dataframe, any duplicates are dropped and a sample of 10 songs is drawn for the playlist.

Before computing the similarity matrix, I remove the songs that both dataframes contain (common_songs).

In [17]:
unique_top_tracks_m = top_tracks_m[~top_tracks_m["track_id"].isin(common_songs["track_id"])]
unique_top_tracks_m.reset_index(drop = True,inplace = True)
unique_top_tracks_m

| | track_name | album | track_id | artist | duration | popularity | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Lambo Lambo | KitschKrieg | 7oqvRZNv4dUV8CgQWtIAMe | KitschKrieg | 214991 | 59 | 0.876 | 0.400 | 5 | -9.748 | 0 | 0.1290 | 0.494000 | 0.000009 | 0.1000 | 0.484 | 144.938 |
| 1 | By My Side | Ancient Shadows | 0eSWvkrCADGx0409mebsP4 | Ecepta | 193804 | 38 | 0.587 | 0.529 | 7 | -13.713 | 0 | 0.0376 | 0.285000 | 0.729000 | 0.1470 | 0.175 | 138.047 |
| 2 | DEUTSCHLAND | RAMMSTEIN | 2bPGTMB5sFfFYQ2YvSmup0 | Rammstein | 322339 | 74 | 0.521 | 0.895 | 7 | -5.242 | 1 | 0.0442 | 0.000055 | 0.349000 | 0.0985 | 0.237 | 120.117 |
| 3 | Zunder | Bittersweet | 10Sp3ZHJUkSoYCN9NZO7QL | Marek Hemmann | 306976 | 48 | 0.811 | 0.639 | 9 | -9.956 | 1 | 0.0515 | 0.011800 | 0.893000 | 0.0850 | 0.277 | 126.030 |
| 4 | Bande organisée | Bande organisée | 205HNJ73cgpC0LAOnuQiWT | Kofs | 356346 | 79 | 0.901 | 0.939 | 6 | -2.762 | 1 | 0.2740 | 0.117000 | 0.000000 | 0.0643 | 0.805 | 142.948 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 126 | Hell Yeah | Light It Up | 0GRu2odJjampqZpyubbFU6 | Rev Theory | 247613 | 0 | 0.361 | 0.983 | 3 | -3.373 | 0 | 0.0931 | 0.000129 | 0.000004 | 0.0228 | 0.385 | 167.951 |
| 127 | Like A G6 | Like a G6 | 5S1yeuswEPofFOCkMFLNLc | Far East Movement | 218346 | 0 | 0.629 | 0.869 | 7 | -7.013 | 0 | 0.3140 | 0.006570 | 0.000000 | 0.1910 | 0.715 | 125.024 |
| 128 | Halftime | Wolke 7 | 1N9ZAbqVw5o0m7wccSgRIt | Gzuz | 181060 | 1 | 0.533 | 0.783 | 2 | -6.490 | 1 | 0.3250 | 0.181000 | 0.000000 | 0.1010 | 0.471 | 129.844 |
| 129 | Push It | Wisconsin Death Trip | 43WFwjiWFHc8ZryT1Tz1aY | Static-X | 154906 | 36 | 0.557 | 0.977 | 0 | -3.965 | 1 | 0.0493 | 0.000393 | 0.000004 | 0.2910 | 0.504 | 149.879 |


In [18]:
unique_top_tracks_t = top_tracks_t[~top_tracks_t["track_id"].isin(common_songs["track_id"])]
unique_top_tracks_t.reset_index(drop = True,inplace = True)
unique_top_tracks_t

| | track_name | album | track_id | artist | duration | popularity | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Time for Space | Baralku | 3dJXvBddoH1AGLpKvmbYDA | Emancipator | 433613 | 56 | 0.610 | 0.3130 | 5 | -13.138 | 0 | 0.0323 | 0.3900 | 0.840000 | 0.0899 | 0.0870 | 80.009 |
| 1 | Take Some Time - Emancipator Remix | Take Some Time (Emancipator Remix) | 28M2ugvRSIa4MIKmiiwNao | Wilderado | 290428 | 41 | 0.545 | 0.5960 | 5 | -8.216 | 1 | 0.0289 | 0.0249 | 0.025200 | 0.2190 | 0.1820 | 95.002 |
| 2 | Bati | Chelina, Vol. 1 | 03xV05Oll19y59x8GDkWVL | Chelina | 260713 | 10 | 0.575 | 0.3610 | 2 | -8.074 | 0 | 0.0291 | 0.5840 | 0.010300 | 0.1170 | 0.4300 | 148.154 |
| 3 | Intro (Megbia) | Chelina, Vol. 1 | 0aLjvBLCCpNNoBwGqNY6Gn | Chelina | 53034 | 5 | 0.554 | 0.0872 | 9 | -17.823 | 0 | 0.1510 | 0.9090 | 0.000000 | 0.1160 | 0.3280 | 68.149 |
| 4 | Greenland | Safe In the Steep Cliffs | 2SPTGg9SC5MT1FwNX4IYfx | Emancipator | 191066 | 57 | 0.593 | 0.3980 | 4 | -13.301 | 0 | 0.0295 | 0.0273 | 0.827000 | 0.2310 | 0.0989 | 98.728 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 129 | The Harder the Ground | Ø | 5hb7SiqKHDlMPzs9HEholc | Gløde | 243979 | 13 | 0.533 | 0.2400 | 5 | -14.995 | 1 | 0.0370 | 0.9230 | 0.000239 | 0.0877 | 0.0743 | 126.007 |
| 130 | Smek - Rey&Kjavik Remix | Smek (Rey&Kjavik Remix) | 1PyfWp1rg8omX2DOtg65pR | Ÿuma | 366006 | 53 | 0.892 | 0.2810 | 2 | -13.880 | 1 | 0.0435 | 0.8560 | 0.878000 | 0.1950 | 0.0433 | 99.984 |
| 131 | Baby Girl | Baby Girl | 6DNTJAvdyDwkmiPm2Wl8At | Bryce Vine | 145371 | 55 | 0.860 | 0.5920 | 2 | -8.606 | 1 | 0.0861 | 0.1660 | 0.000609 | 0.2760 | 0.6210 | 107.016 |
| 132 | Will We Ever Carry On | Chances | 2AMAh0IXu1Wwwdw3Jzojqt | Luke Marzec | 254549 | 17 | 0.638 | 0.2930 | 0 | -15.712 | 0 | 0.0337 | 0.5550 | 0.505000 | 0.1110 | 0.2260 | 81.802 |


In [19]:
similarity_top_songs = create_similarity_score(unique_top_tracks_m,unique_top_tracks_t)
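`create_similarity_score` is imported from spotifuncs and its code is not shown. The NumPy-only sketch below computes a comparable track-by-track cosine similarity matrix under stated assumptions: the feature list (the audio-feature columns minus `key` and `mode`) and standardizing over both users' tracks together are guesses, not the library's confirmed behavior. Standardization keeps large-valued columns such as tempo or loudness from dominating the cosine.

```python
import numpy as np
import pandas as pd

# Assumption: the audio-feature columns of the notebook's dataframes, minus key and mode
FEATURES = ["danceability", "energy", "loudness", "speechiness", "acousticness",
            "instrumentalness", "liveness", "valence", "tempo"]


def _normalize_rows(x):
    """Scale every row to unit L2 norm (all-zero rows stay zero)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.where(norms == 0, 1.0, norms)


def similarity_matrix(df_a, df_b, features=FEATURES):
    """Cosine similarity between every track of df_a (rows) and of df_b (columns)."""
    a = df_a[features].to_numpy(dtype=float)
    b = df_b[features].to_numpy(dtype=float)
    # Standardize with statistics computed over both users' tracks
    stacked = np.vstack([a, b])
    mean, std = stacked.mean(axis=0), stacked.std(axis=0)
    std[std == 0] = 1.0  # constant columns would otherwise cause a division by zero
    a, b = (a - mean) / std, (b - mean) / std
    # Cosine similarity is the dot product of L2-normalized vectors
    return _normalize_rows(a) @ _normalize_rows(b).T
```

Hypothetical usage: `similarity_matrix(unique_top_tracks_m, unique_top_tracks_t)` returns an array with one row per track of the first user and one column per track of the second.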

In [20]:
similarity_top_songs

array([[ 0.00246244, -0.73826128,  0.35275185, ..., -0.15783126,
         0.32730655, -0.46802349],
       [ 0.69407693, -0.31286898, -0.05266639, ..., -0.61516896,
         0.7241846 , -0.29211147],
       [ 0.42584099,  0.5070319 , -0.12468504, ..., -0.36538411,
         0.0779511 ,  0.47104614],
       ...,
       [-0.38903268, -0.26586009,  0.11660654, ..., -0.20057975,
        -0.30877886, -0.05457458],
       [-0.55930898,  0.44139615,  0.28426055, ...,  0.35761975,
        -0.68306416,  0.493662  ],
       [-0.57500067,  0.18209233,  0.28468891, ..., -0.0342616 ,
        -0.81988793,  0.19045593]])

Creating a list of tuples that contains, for each of the first user's tracks, its index, the index of the second user's most similar track and their similarity score.

In [21]:
max_n_scores = [(i,np.argmax(x),x[np.argmax(x)]) for i,x in enumerate(similarity_top_songs)]
max_n_scores

[(0, 42, 0.9124961743889184),
 (1, 124, 0.9094278322775948),
 (2, 119, 0.7791874952042619),
 (3, 66, 0.9054499946753933),
 (4, 113, 0.8561162515101026),
 (5, 55, 0.8512760488540422),
 (6, 44, 0.9314149567120572),
 (7, 105, 0.9013989196086852),
 (8, 38, 0.7391856681964181),
 (9, 3, 0.7053260881584944),
 (10, 118, 0.8136696515203443),
 (11, 10, 0.9333967774914899),
 (12, 131, 0.7826843319439812),
 (13, 32, 0.7728838619396077),
 (14, 69, 0.8812387238367473),
 (15, 81, 0.779141363403892),
 (16, 101, 0.822876897994827),
 (17, 91, 0.8971764220497673),
 (18, 66, 0.8916073063101455),
 (19, 124, 0.9018786092021321),
 (20, 107, 0.9023017741771021),
 (21, 100, 0.9640395864441592),
 (22, 1, 0.8889369017104372),
 (23, 38, 0.7101835241708665),
 (24, 30, 0.740927735803688),
 (25, 107, 0.8665741787095874),
 (26, 74, 0.7798559023302319),
 (27, 46, 0.7664194631547017),
 (28, 10, 0.8982786919690497),
 (29, 102, 0.8668800750402578),
 (30, 124, 0.957576231037294),
 (31, 81, 0.7505792289398938),
 (32, 61, 0
 ... (output truncated)

In [22]:
from operator import itemgetter
from heapq import nlargest
nlargest(30,max_n_scores,key=itemgetter(2))

[(21, 100, 0.9640395864441592),
 (30, 124, 0.957576231037294),
 (82, 23, 0.9536280513031131),
 (123, 18, 0.944921194677877),
 (87, 43, 0.9443746258098373),
 (38, 105, 0.9346037773901044),
 (11, 10, 0.9333967774914899),
 (96, 121, 0.9326737211369086),
 (6, 44, 0.9314149567120572),
 (104, 16, 0.9287322242348719),
 (111, 10, 0.9253705955960095),
 (67, 108, 0.9247256489537813),
 (93, 16, 0.921148619965111),
 (34, 61, 0.9181729266168114),
 (109, 107, 0.916349603737838),
 (0, 42, 0.9124961743889184),
 (1, 124, 0.9094278322775948),
 (126, 106, 0.9087628594217892),
 (3, 66, 0.9054499946753933),
 (20, 107, 0.9023017741771021),
 (19, 124, 0.9018786092021321),
 (7, 105, 0.9013989196086852),
 (42, 58, 0.9012143244956449),
 (28, 10, 0.8982786919690497),
 (41, 126, 0.8976544159880666),
 (17, 91, 0.8971764220497673),
 (84, 94, 0.8943283062283301),
 (56, 106, 0.8922739791868055),
 (18, 66, 0.8916073063101455),
 (22, 1, 0.8889369017104372)]

Extracting, for each user, the track indices from the 30 pairs with the highest similarity scores.

In [23]:
top_pairs = nlargest(30, max_n_scores, key=itemgetter(2))
idx_simtracks_m = [i[0] for i in top_pairs]
idx_simtracks_t = [i[1] for i in top_pairs]
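The list comprehension plus `nlargest` can also be expressed directly in NumPy. The sketch below, with the hypothetical helper name `top_k_pairs`, should yield the same pairs (ties may be ordered differently). Note that one track of the second user can be the best match for several tracks of the first user (indices 10 and 124 each appear three times in `idx_simtracks_t`), which is why duplicates are dropped afterwards.

```python
import numpy as np


def top_k_pairs(sim, k=30):
    """Return (row, best column, score) for the k rows whose best match scores highest."""
    best_cols = sim.argmax(axis=1)                          # most similar column for every row
    best_scores = sim[np.arange(sim.shape[0]), best_cols]   # the corresponding scores
    top_rows = np.argsort(best_scores)[::-1][:k]            # rows by descending best score
    return [(int(r), int(best_cols[r]), float(best_scores[r])) for r in top_rows]
```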

In [24]:
idx_simtracks_m

[21, 30, 82, 123, 87, 38, 11, 96, 6, 104, 111, 67, 93, 34, 109, 0, 1, 126, 3, 20, 19, 7, 42, 28, 41, 17, 84, 56, 18, 22]

In [25]:
idx_simtracks_t

[100, 124, 23, 18, 43, 105, 10, 121, 44, 16, 10, 108, 16, 61, 107, 42, 124, 106, 66, 107, 124, 105, 58, 10, 126, 91, 94, 106, 66, 1]

In [26]:
sim_top_tracks_m = unique_top_tracks_m.loc[idx_simtracks_m]
sim_top_tracks_t = unique_top_tracks_t.loc[idx_simtracks_t]

Creating the dataframe with the most similar top tracks.

In [27]:
similar_top_tracks = pd.concat([sim_top_tracks_m,sim_top_tracks_t])
similar_top_tracks.drop_duplicates(inplace = True)
similar_top_tracks = similar_top_tracks[~similar_top_tracks["track_id"].isin(last_week_duo["track_id"])]
similar_top_tracks.reset_index(drop = True,inplace = True)
similar_top_tracks

| | track_name | album | track_id | artist | duration | popularity | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Дни и ночи | Дни и ночи | 3A88HuZw5yiZOAcon4RkZk | Geegun | 186000 | 48 | 0.917 | 0.526 | 2 | -8.15 | 0 | 0.125 | 0.298 | 5.8e-05 | 0.108 | 0.514 | 119.985 |
| 1 | The Flying Octopus | Charlie | 6bv9gQNLWUUiuzRunuY4lJ | Kalipo | 316293 | 25 | 0.619 | 0.557 | 9 | -13.966 | 0 | 0.0423 | 0.31 | 0.853 | 0.236 | 0.0555 | 127.004 |
| 2 | Phoenix | Phoenix | 6zAiRKvAMlXHxEtyO4yxIO | League of Legends | 197633 | 75 | 0.42 | 0.723 | 10 | -6.86 | 0 | 0.175 | 0.0638 | 1.2e-05 | 0.123 | 0.288 | 167.806 |
| 3 | WAIDMANNS HEIL | LIEBE IST FÜR ALLE DA (SPECIAL EDITION) | 6ey8jr96hlSORKyeJk9d8d | Rammstein | 212800 | 0 | 0.578 | 0.969 | 9 | -3.894 | 1 | 0.0317 | 0.00113 | 0.0234 | 0.0579 | 0.684 | 104.999 |
| 4 | Only | The Pinkprint (International Deluxe Explicit) | 1UZ25gykR30Oewh3dBRtVZ | Nicki Minaj | 312026 | 66 | 0.573 | 0.495 | 8 | -7.245 | 0 | 0.592 | 0.405 | 0.0 | 0.0969 | 0.255 | 179.196 |
| 5 | Handsome | Heavy Is The Head | 766N0mjf1KzAAYsA1eAJaN | Stormzy | 152733 | 56 | 0.762 | 0.559 | 1 | -6.854 | 1 | 0.153 | 0.173 | 0.0 | 0.127 | 0.282 | 139.955 |
| 6 | Siren | Siren | 5umBsHgpB2WsRA9ccQZGdz | Tourist | 382767 | 51 | 0.34 | 0.7 | 1 | -7.998 | 1 | 0.0446 | 0.0741 | 0.892 | 0.103 | 0.0388 | 127.704 |
| 7 | Points of Authority / 99 Problems / One Step C... | Collision Course (Deluxe Version) | 65eohvrL4ttjA7EfFkQOhX | JAY-Z | 295826 | 58 | 0.547 | 0.951 | 1 | -4.079 | 0 | 0.151 | 0.00272 | 1.1e-05 | 0.0862 | 0.51 | 95.031 |
| 8 | Zuhause | Nie | 5vC4M4GjYLkgDaUwQcL7WA | Fynn Kliemann | 197887 | 65 | 0.656 | 0.448 | 9 | -8.676 | 0 | 0.0442 | 0.86 | 5e-06 | 0.103 | 0.403 | 95.626 |
| 9 | Movin' Bass - GTA Remix | Movin' Bass (GTA Remix) | 2BnVVl5NCV2o7XlZo1CLEm | Rick Ross | 253773 | 0 | 0.694 | 0.912 | 11 | -3.342 | 1 | 0.24 | 0.000709 | 6.1e-05 | 0.303 | 0.583 | 160.032 |


In [28]:
similar_top_tracks.sample(10)

| | track_name | album | track_id | artist | duration | popularity | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 30 | Phía Sau Đôi Mắt | Phía Sau Đôi Mắt | 3DJZM2sfbUleDaCLRXWSco | Anh bạn Thành | 174206 | 36 | 0.545 | 0.398 | 7 | -13.675 | 0 | 0.317 | 0.905 | 0.000242 | 0.112 | 0.156 | 145.513 |
| 8 | Zuhause | Nie | 5vC4M4GjYLkgDaUwQcL7WA | Fynn Kliemann | 197887 | 65 | 0.656 | 0.448 | 9 | -8.676 | 0 | 0.0442 | 0.86 | 5e-06 | 0.103 | 0.403 | 95.626 |
| 5 | Handsome | Heavy Is The Head | 766N0mjf1KzAAYsA1eAJaN | Stormzy | 152733 | 56 | 0.762 | 0.559 | 1 | -6.854 | 1 | 0.153 | 0.173 | 0.0 | 0.127 | 0.282 | 139.955 |
| 26 | Wonderful Life | Wonderful Life | 0cqLFzFYsQurUI3McUZkLg | Katie Melua | 247152 | 61 | 0.904 | 0.341 | 6 | -12.088 | 0 | 0.0417 | 0.792 | 0.021 | 0.0977 | 0.447 | 105.179 |
| 14 | NEW MAGIC WAND | IGOR | 0fv2KH6hac06J86hBUTcSf | Tyler, The Creator | 195320 | 70 | 0.621 | 0.73 | 5 | -5.414 | 0 | 0.107 | 0.0967 | 0.000131 | 0.673 | 0.464 | 139.566 |
| 33 | Cool Me Down | Cool Me Down | 0oDpBQNBCa3VCR3bmn7953 | Margaret | 179211 | 43 | 0.647 | 0.716 | 2 | -4.59 | 1 | 0.0483 | 0.00437 | 0.000983 | 0.0997 | 0.456 | 90.995 |
| 11 | Movin' Bass - GTA Remix | Movin' Bass (GTA Remix) | 7yzwszgJUDrllWnG9cQkxQ | Rick Ross | 253773 | 52 | 0.698 | 0.911 | 11 | -2.57 | 0 | 0.217 | 0.000757 | 0.000103 | 0.281 | 0.564 | 160.031 |
| 36 | Bella ciao | Bella ciao | 51frPF1JFad9MBlgI3dg1J | Mike Singer | 184000 | 42 | 0.764 | 0.645 | 11 | -4.652 | 1 | 0.0501 | 0.191 | 8.2e-05 | 0.159 | 0.683 | 144.019 |
| 27 | Echo of the Woods | Echo of the Woods | 6d8mjJCtnlFtgGv3erEs7c | Giolì | 198260 | 37 | 0.683 | 0.314 | 5 | -15.354 | 0 | 0.0385 | 0.768 | 0.892 | 0.155 | 0.0409 | 115.024 |
| 7 | Points of Authority / 99 Problems / One Step C... | Collision Course (Deluxe Version) | 65eohvrL4ttjA7EfFkQOhX | JAY-Z | 295826 | 58 | 0.547 | 0.951 | 1 | -4.079 | 0 | 0.151 | 0.00272 | 1.1e-05 | 0.0862 | 0.51 | 95.031 |


In [30]:
new_playlist_df = pd.concat([new_playlist_df, similar_top_tracks.sample(10)])  # DataFrame.append was removed in pandas 2.0
new_playlist_df

| | track_name | album | track_id | artist | duration | popularity | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | Huldra - Other Version | Huldra | 569gNjph2g07MmjtMm6vKm | Gidge | 490000 | 38 | 0.772 | 0.514 | 10 | -11.961 | 0 | 0.0754 | 0.148 | 0.906 | 0.0934 | 0.103 | 118.015 |
| 0 | Дни и ночи | Дни и ночи | 3A88HuZw5yiZOAcon4RkZk | Geegun | 186000 | 48 | 0.917 | 0.526 | 2 | -8.15 | 0 | 0.125 | 0.298 | 5.8e-05 | 0.108 | 0.514 | 119.985 |
| 2 | Phoenix | Phoenix | 6zAiRKvAMlXHxEtyO4yxIO | League of Legends | 197633 | 75 | 0.42 | 0.723 | 10 | -6.86 | 0 | 0.175 | 0.0638 | 1.2e-05 | 0.123 | 0.288 | 167.806 |
| 17 | Zunder | Bittersweet | 10Sp3ZHJUkSoYCN9NZO7QL | Marek Hemmann | 306976 | 48 | 0.811 | 0.639 | 9 | -9.956 | 1 | 0.0515 | 0.0118 | 0.893 | 0.085 | 0.277 | 126.03 |
| 15 | By My Side | Ancient Shadows | 0eSWvkrCADGx0409mebsP4 | Ecepta | 193804 | 38 | 0.587 | 0.529 | 7 | -13.713 | 0 | 0.0376 | 0.285 | 0.729 | 0.147 | 0.175 | 138.047 |
| 28 | Wenn der Winter kommt | Mittelpunkt der Welt | 6aj2zW1HmhZNQoeGR1avif | Element Of Crime | 274306 | 29 | 0.332 | 0.454 | 4 | -8.373 | 1 | 0.0443 | 0.278 | 0.0 | 0.11 | 0.244 | 193.468 |
| 19 | Rolls Royce | Rolls Royce | 22XFe65IH0P2RY0uowBqdI | Geegun | 143437 | 68 | 0.872 | 0.423 | 1 | -7.337 | 1 | 0.213 | 0.142 | 1e-05 | 0.104 | 0.373 | 95.98 |
| 7 | Points of Authority / 99 Problems / One Step C... | Collision Course (Deluxe Version) | 65eohvrL4ttjA7EfFkQOhX | JAY-Z | 295826 | 58 | 0.547 | 0.951 | 1 | -4.079 | 0 | 0.151 | 0.00272 | 1.1e-05 | 0.0862 | 0.51 | 95.031 |
| 3 | WAIDMANNS HEIL | LIEBE IST FÜR ALLE DA (SPECIAL EDITION) | 6ey8jr96hlSORKyeJk9d8d | Rammstein | 212800 | 0 | 0.578 | 0.969 | 9 | -3.894 | 1 | 0.0317 | 0.00113 | 0.0234 | 0.0579 | 0.684 | 104.999 |
| 5 | Handsome | Heavy Is The Head | 766N0mjf1KzAAYsA1eAJaN | Stormzy | 152733 | 56 | 0.762 | 0.559 | 1 | -6.854 | 1 | 0.153 | 0.173 | 0.0 | 0.127 | 0.282 | 139.955 |


### Sampling from each user's top tracks by artists both users like (one of their top artists)

The next step filters the top tracks based on common artists. Every track by an artist that is a top artist of **both** users is considered for this approach.



In [31]:
filtered_top_m = top_tracks_m[top_tracks_m["artist"].isin(common_artists["name"]) 
                              & ~top_tracks_m["track_id"].isin(last_week_duo["track_id"])]
filtered_top_m

| | track_name | album | track_id | artist | duration | popularity | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9 | Oh Junge | KitschKrieg | 0JqbpesudPE6j901fBEzo2 | KitschKrieg | 193946 | 57 | 0.878 | 0.515 | 1 | -7.715 | 1 | 0.188 | 0.0292 | 0.0 | 0.0835 | 0.269 | 156.0 |
| 15 | Zim Zimma | Evolution | 17nPeSliosCi427f0lUb75 | Joyner Lucas | 239702 | 69 | 0.883 | 0.621 | 11 | -6.063 | 0 | 0.212 | 0.0871 | 0.0 | 0.499 | 0.676 | 149.052 |
| 37 | ADHD | ADHD | 4X4v3KtkUXwXvDBw5KS9cp | Joyner Lucas | 205872 | 70 | 0.563 | 0.78 | 10 | -6.663 | 1 | 0.0782 | 0.00525 | 8e-06 | 0.418 | 0.317 | 83.913 |
| 46 | Fall Slowly | Fall Slowly | 01WOwxkxOw2FqNIHkraxcN | Joyner Lucas | 208095 | 63 | 0.576 | 0.534 | 6 | -9.637 | 0 | 0.294 | 0.075 | 0.0 | 0.199 | 0.041 | 83.56 |
| 47 | Evolution | Evolution | 2VopDw2GlF3uwD1kihHmTT | Joyner Lucas | 153250 | 60 | 0.687 | 0.819 | 9 | -6.67 | 0 | 0.431 | 0.218 | 0.0 | 0.392 | 0.568 | 81.185 |
| 51 | ALLES HELAL | NACHT | 2KAbQ3PsETrr86R39pru7k | ELIF | 175062 | 60 | 0.727 | 0.6 | 4 | -6.186 | 0 | 0.0376 | 0.164 | 1.7e-05 | 0.0793 | 0.144 | 92.024 |
| 57 | FEUER | NACHT | 0Se4w42WIJiTgld4SYbv8S | ELIF | 203386 | 45 | 0.769 | 0.604 | 8 | -6.769 | 1 | 0.0455 | 0.039 | 0.000277 | 0.0738 | 0.227 | 97.996 |
| 61 | ALL OF MY DREAMS | NACHT | 41R2FrKYRgLHntLBpU4NXE | ELIF | 154676 | 42 | 0.601 | 0.619 | 1 | -5.815 | 0 | 0.242 | 0.0508 | 1e-06 | 0.0555 | 0.146 | 172.346 |
| 63 | SCHWARZ | NACHT | 7vswtCdKBzC5XN9ojwh8u0 | ELIF | 135184 | 45 | 0.742 | 0.7 | 10 | -6.937 | 0 | 0.177 | 0.0949 | 3.7e-05 | 0.104 | 0.465 | 75.985 |
| 64 | ALASKA | NACHT | 2YnYp5f38UP6fvf7q2FnPm | ELIF | 217766 | 48 | 0.743 | 0.671 | 1 | -5.594 | 0 | 0.0589 | 0.0581 | 0.0 | 0.195 | 0.484 | 76.998 |


In [32]:
filtered_top_t = top_tracks_t[top_tracks_t["artist"].isin(common_artists["name"])
                             & ~top_tracks_t["track_id"].isin(last_week_duo["track_id"])]
filtered_top_t

| | track_name | album | track_id | artist | duration | popularity | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 105 | Lass Sie Gehn | BAM BAM | 4RYKr1R3tXrITqY1zWiTNi | Seeed | 192853 | 54 | 0.711 | 0.706 | 7 | -4.543 | 1 | 0.289 | 0.0776 | 0.0 | 0.127 | 0.695 | 138.81 |
| 107 | Lass Das Licht An (feat. Deichkind) | BAM BAM | 0BJ5N5yfEPBAimh9LprQvc | Seeed | 188561 | 50 | 0.904 | 0.664 | 2 | -4.637 | 1 | 0.288 | 0.0753 | 0.0 | 0.151 | 0.72 | 143.746 |
| 126 | Aufstehn! (feat. CeeLo Green) | Next! | 2KpvM181ZwtIWwIElZx4vI | Seeed | 230960 | 59 | 0.849 | 0.83 | 6 | -3.907 | 1 | 0.186 | 0.112 | 0.0 | 0.384 | 0.873 | 127.279 |
| 128 | Sie Is Geladen (feat. Nura) | BAM BAM | 4Qt7LInzG4ruqcIrowldhg | Seeed | 218702 | 46 | 0.685 | 0.885 | 9 | -4.248 | 0 | 0.211 | 0.0168 | 2.3e-05 | 0.179 | 0.629 | 144.05 |


#### Potential issues

This filtering approach often produces lists with only a few artists but several songs by each of them (partly because Spotify really notices when you can't stop listening to an album).
To avoid having too many songs by the same artist, I sample from the dataframes above and weight the rows depending on how often an artist occurs.

This approach works reasonably well but still has some flaws (which might partially be driven by my girlfriend's and my individual listening behavior).
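The weighting rule used for `weights_m` and `weights_t` can be wrapped in a small helper. This is a sketch: `artist_weights`, `max_count` and `penalty` are hypothetical names, with the threshold of 2 and the divisor of 7 taken from the weighting cells. Counting the artists once with a `Counter` avoids recounting the whole column for every row, and taking the base weight from the same list that is being weighted keeps the two users' dataframes from getting mixed up.

```python
from collections import Counter


def artist_weights(artists, max_count=2, penalty=7):
    """Return one sampling weight per row (hypothetical helper).

    Rows whose artist occurs more than `max_count` times get the base
    weight 1/len(artists) divided by `penalty`; all other rows keep it.
    """
    artists = list(artists)
    counts = Counter(artists)  # count every artist once, not once per row
    base = 1 / len(artists)
    return [base / penalty if counts[a] > max_count else base for a in artists]


# Artist counts as in filtered_top_m: KitschKrieg 1, Joyner Lucas 4, ELIF 12
artists = ["KitschKrieg"] + ["Joyner Lucas"] * 4 + ["ELIF"] * 12
weights = artist_weights(artists)
```

With pandas this plugs straight into sampling, e.g. `filtered_top_m.sample(sample_n, weights=artist_weights(filtered_top_m["artist"]))`; `DataFrame.sample` normalizes the weights itself, so only their ratios matter.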

In [33]:
from collections import Counter

Counter(filtered_top_m["artist"]) ,  Counter(filtered_top_t["artist"])

(Counter({'KitschKrieg': 1, 'Joyner Lucas': 4, 'ELIF': 12}),
 Counter({'Seeed': 4}))

In [34]:
weights_m = [1/len(filtered_top_m)/7 if Counter(filtered_top_m["artist"])[x] > 2 else 1/len(filtered_top_m) for x in filtered_top_m["artist"]] 

In [35]:
weights_t = [1/len(filtered_top_t)/7 if Counter(filtered_top_t["artist"])[x] > 2 else 1/len(filtered_top_t) for x in filtered_top_t["artist"]] 

I ran the sampling 10 times each with and without weights. With weights, artists that occur very often in the filtered dataframe are not overrepresented, just as planned. Without weights, the sample sometimes contained only one or two artists, which is not what I want.
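The comparison can be repeated on toy data. This is a sketch: `distinct_artists_per_run` is a hypothetical helper, and the toy dataframe only mimics the artist counts of `filtered_top_m` (ELIF 12, Joyner Lucas 4, KitschKrieg 1). Fixing `random_state` per run makes the experiment reproducible; the printed numbers depend on those seeds, so no particular outcome is claimed here.

```python
import pandas as pd


def distinct_artists_per_run(df, n, weights=None, runs=10):
    """Sample `n` rows `runs` times and count the different artists in each sample."""
    return [
        df.sample(n, weights=weights, random_state=seed)["artist"].nunique()
        for seed in range(runs)
    ]


# Toy data shaped like filtered_top_m: one artist dominates the rows
toy = pd.DataFrame({"artist": ["ELIF"] * 12 + ["Joyner Lucas"] * 4 + ["KitschKrieg"]})
artist_counts = toy["artist"].map(toy["artist"].value_counts())
weights = [1 / len(toy) / 7 if c > 2 else 1 / len(toy) for c in artist_counts]

print("without weights:", distinct_artists_per_run(toy, 3))
print("with weights:   ", distinct_artists_per_run(toy, 3, weights=weights))
```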

In [36]:
sample_n = (25-len(new_playlist_df))//2
if sample_n > 3: sample_n = 3
sample_n

3

In [37]:
filtered_top_m.sample(sample_n,weights = weights_m)

Unnamed: 0,track_name,album,track_id,artist,duration,popularity,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo
57,FEUER,NACHT,0Se4w42WIJiTgld4SYbv8S,ELIF,203386,45,0.769,0.604,8,-6.769,1,0.0455,0.039,0.000277,0.0738,0.227,97.996
9,Oh Junge,KitschKrieg,0JqbpesudPE6j901fBEzo2,KitschKrieg,193946,57,0.878,0.515,1,-7.715,1,0.188,0.0292,0.0,0.0835,0.269,156.0
47,Evolution,Evolution,2VopDw2GlF3uwD1kihHmTT,Joyner Lucas,153250,60,0.687,0.819,9,-6.67,0,0.431,0.218,0.0,0.392,0.568,81.185


In [38]:
filtered_top_t.sample(sample_n, weights= weights_t)

Unnamed: 0,track_name,album,track_id,artist,duration,popularity,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo
105,Lass Sie Gehn,BAM BAM,4RYKr1R3tXrITqY1zWiTNi,Seeed,192853,54,0.711,0.706,7,-4.543,1,0.289,0.0776,0.0,0.127,0.695,138.81
128,Sie Is Geladen (feat. Nura),BAM BAM,4Qt7LInzG4ruqcIrowldhg,Seeed,218702,46,0.685,0.885,9,-4.248,0,0.211,0.0168,2.3e-05,0.179,0.629,144.05
126,Aufstehn! (feat. CeeLo Green),Next!,2KpvM181ZwtIWwIElZx4vI,Seeed,230960,59,0.849,0.83,6,-3.907,1,0.186,0.112,0.0,0.384,0.873,127.279


In [39]:
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
new_playlist_df = pd.concat([new_playlist_df,
                             filtered_top_m.sample(sample_n, weights=weights_m),
                             filtered_top_t.sample(sample_n, weights=weights_t)])

In [40]:
new_playlist_df = new_playlist_df.drop_duplicates().reset_index(drop=True)
new_playlist_df

Unnamed: 0,track_name,album,track_id,artist,duration,popularity,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo
0,Huldra - Other Version,Huldra,569gNjph2g07MmjtMm6vKm,Gidge,490000,38,0.772,0.514,10,-11.961,0,0.0754,0.148,0.906,0.0934,0.103,118.015
1,Дни и ночи,Дни и ночи,3A88HuZw5yiZOAcon4RkZk,Geegun,186000,48,0.917,0.526,2,-8.15,0,0.125,0.298,5.8e-05,0.108,0.514,119.985
2,Phoenix,Phoenix,6zAiRKvAMlXHxEtyO4yxIO,League of Legends,197633,75,0.42,0.723,10,-6.86,0,0.175,0.0638,1.2e-05,0.123,0.288,167.806
3,Zunder,Bittersweet,10Sp3ZHJUkSoYCN9NZO7QL,Marek Hemmann,306976,48,0.811,0.639,9,-9.956,1,0.0515,0.0118,0.893,0.085,0.277,126.03
4,By My Side,Ancient Shadows,0eSWvkrCADGx0409mebsP4,Ecepta,193804,38,0.587,0.529,7,-13.713,0,0.0376,0.285,0.729,0.147,0.175,138.047
5,Wenn der Winter kommt,Mittelpunkt der Welt,6aj2zW1HmhZNQoeGR1avif,Element Of Crime,274306,29,0.332,0.454,4,-8.373,1,0.0443,0.278,0.0,0.11,0.244,193.468
6,Rolls Royce,Rolls Royce,22XFe65IH0P2RY0uowBqdI,Geegun,143437,68,0.872,0.423,1,-7.337,1,0.213,0.142,1e-05,0.104,0.373,95.98
7,Points of Authority / 99 Problems / One Step C...,Collision Course (Deluxe Version),65eohvrL4ttjA7EfFkQOhX,JAY-Z,295826,58,0.547,0.951,1,-4.079,0,0.151,0.00272,1.1e-05,0.0862,0.51,95.031
8,WAIDMANNS HEIL,LIEBE IST FÜR ALLE DA (SPECIAL EDITION),6ey8jr96hlSORKyeJk9d8d,Rammstein,212800,0,0.578,0.969,9,-3.894,1,0.0317,0.00113,0.0234,0.0579,0.684,104.999
9,Handsome,Heavy Is The Head,766N0mjf1KzAAYsA1eAJaN,Stormzy,152733,56,0.762,0.559,1,-6.854,1,0.153,0.173,0.0,0.127,0.282,139.955


### Sampling from saved tracks

I am aiming for around 25 known tracks (and around 25 new ones from recommendations). Because the previous steps yield a somewhat random number of tracks, I fill the remaining slots with tracks sampled from our saved tracks.
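`(25 - len(new_playlist_df)) // 2` splits the free slots evenly between the two of us, but when the number of free slots is odd, floor division leaves one slot empty (7 free slots give 3 + 3). A sketch that hands the leftover slot to one user instead; `fill_counts` is a hypothetical helper, not part of spotifuncs.

```python
def fill_counts(target, current):
    """Split the free playlist slots between two users (hypothetical helper).

    Returns (n_m, n_t). When the number of free slots is odd, the first
    user gets the extra track, so the playlist still reaches `target`.
    """
    missing = max(target - current, 0)  # never ask for a negative sample size
    n_t = missing // 2
    n_m = missing - n_t
    return n_m, n_t


n_m, n_t = fill_counts(25, 18)  # 7 free slots -> 4 + 3
```

The two counts would then go into `filtered_saved_m.sample(n_m)` and `filtered_saved_t.sample(n_t)`. Note that `DataFrame.sample` raises an error if asked for more rows than the dataframe has (without `replace=True`).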

In [41]:
#sample the remaining 25-len(new_playlist_df) from saved_tracks
#first get audio_features
saved_tracks_m = append_audio_features(saved_tracks_m, sp_m)
saved_tracks_t = append_audio_features(saved_tracks_t,sp_t)

In [42]:
#filter again so artists are not already in new_playlist_df
filtered_saved_m = saved_tracks_m[~saved_tracks_m["artist"].isin(new_playlist_df["artist"])]
filtered_saved_m

Unnamed: 0,track_name,album,track_id,artist,duration,popularity,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo
0,From America,Play Me Again,5BMg8D9Wl4yvPqzTq7rWRC,Kid Francescoli,181500,41,0.891,0.428,2,-8.766,0,0.0598,0.0316,0.0373,0.0974,0.322,143.987
2,The Beginning,The Beginning,4njhWDhTAjhReWtYkiMH9t,NR:TN,201904,45,0.783,0.714,7,-7.408,1,0.246,0.243,0.86,0.446,0.0341,125.955
3,Between Breaths,Stateless,4DJGTc1OsgqFsUGI6W8Mtx,Riyoon,479885,44,0.738,0.665,7,-10.441,0,0.0438,0.00748,0.877,0.582,0.0347,99.992
4,Berlin Nights,Berlin Nights,6gG1R1bFdJeNc2ERAwXxCb,Vnce Dolanbay,292115,39,0.901,0.457,10,-13.238,0,0.163,0.229,0.424,0.0977,0.531,127.999
8,Money In The Grave (Drake ft. Rick Ross),The Best In The World Pack,5ry2OE6R2zPQFDO85XkgRb,Drake,205426,82,0.831,0.502,10,-4.045,0,0.046,0.101,0.0,0.122,0.101,100.541
9,Beifahrersitz,Beifahrersitz,01qOl2pM8emx1sxdBQc05g,LEA,199586,69,0.712,0.774,6,-3.967,0,0.133,0.346,0.0,0.176,0.471,159.977
10,Twingo,POP,6vFEvkXOnOTUMgPHxogIRK,Fynn Kliemann,186984,56,0.619,0.638,2,-5.754,0,0.0906,0.297,1e-06,0.44,0.36,141.967
13,When I'm Small,Eyelid Movies,3498wF96LsgVgMkGmJzJOC,Phantogram,249066,0,0.646,0.758,10,-4.34,1,0.0314,0.191,0.097,0.103,0.424,91.998
14,Risky Business - Mathame Remix,Risky Business (Mathame Remix),5R4hprpCcdgKz1DsPoh9p2,ZHU,410078,44,0.635,0.75,1,-7.693,0,0.0498,0.0185,0.84,0.0954,0.0352,127.005
15,Aura,Aura (Exclusive Version),6sDb7wNlVXQGqnhEHiNt8B,Kool Savas,168093,53,0.615,0.84,2,-6.058,1,0.376,0.254,0.0,0.055,0.256,94.921


In [43]:
filtered_saved_t = saved_tracks_t[~saved_tracks_t["artist"].isin(new_playlist_df["artist"])]
filtered_saved_t

Unnamed: 0,track_name,album,track_id,artist,duration,popularity,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo
0,Крыльями,Источник,1wAkp6u1biHGKX2q5IwBhf,Erika Lundmoen,158358,55,0.426,0.566,1,-6.036,0,0.244,0.576,0.0,0.109,0.288,168.73
1,"Last Night in Sant Celoni - 12"" Mix",Last Night in Sant Celoni (feat. Jaz James),6zAqmi9YkgnMxbqLcf7hDv,Payfone,427150,46,0.823,0.436,9,-10.439,0,0.105,0.682,0.0519,0.343,0.452,105.096
2,Medellín,#EnAttendantMachakil,4Nmrb5DL6lN5kkb7LzX8yE,TripleGo,231619,49,0.781,0.3,3,-16.75,0,0.0888,0.56,0.692,0.102,0.18,103.02
3,Bati,"Chelina, Vol. 1",03xV05Oll19y59x8GDkWVL,Chelina,260713,10,0.575,0.361,2,-8.074,0,0.0291,0.584,0.0103,0.117,0.43,148.154
4,Sai Bai,"Chelina, Vol. 1",3vAJlGxRNNYYMBJdnhxynu,Chelina,201875,21,0.695,0.677,1,-4.925,0,0.0656,0.0364,0.00643,0.181,0.394,176.174
5,Пожар,Расстояние,772bNQ8WjMAXwiBywGcHb7,МЫ,237000,45,0.576,0.752,0,-9.11,1,0.0397,0.657,0.646,0.0828,0.119,120.001
6,Insan,Insan,01ZAEc3eyGGJLLZuAgdSMD,Yousef Kekhia,432923,38,0.584,0.509,1,-12.036,1,0.03,0.428,0.863,0.136,0.221,129.981
7,The World Retreats - Marino Canal Remix,The World Retreats (Marino Canal Remix),5dyVrDue2N9ArxW34JyeO4,David O'Dowda,426000,37,0.774,0.462,11,-11.724,0,0.0485,0.677,0.863,0.107,0.168,120.001
8,Opa Gäärd,A Long Way,1eb0mORiTlz0OLkH0NPb9Z,Melokind,423597,41,0.641,0.466,3,-13.505,0,0.0403,0.434,0.943,0.0972,0.352,99.98
9,sugar,next chapter,5IjIbGO7lih9CVDBFLCtTT,Zubi,205000,56,0.774,0.609,3,-10.13,0,0.213,0.397,0.0458,0.111,0.891,95.598


In [44]:
sample_n = (25-len(new_playlist_df))//2
sample_n

4

In [45]:
new_playlist_df = pd.concat([new_playlist_df,filtered_saved_m.sample(sample_n),filtered_saved_t.sample(sample_n)])
new_playlist_df.reset_index(drop = True, inplace= True)

In [46]:
new_playlist_df

Unnamed: 0,track_name,album,track_id,artist,duration,popularity,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo
0,Huldra - Other Version,Huldra,569gNjph2g07MmjtMm6vKm,Gidge,490000,38,0.772,0.514,10,-11.961,0,0.0754,0.148,0.906,0.0934,0.103,118.015
1,Дни и ночи,Дни и ночи,3A88HuZw5yiZOAcon4RkZk,Geegun,186000,48,0.917,0.526,2,-8.15,0,0.125,0.298,5.8e-05,0.108,0.514,119.985
2,Phoenix,Phoenix,6zAiRKvAMlXHxEtyO4yxIO,League of Legends,197633,75,0.42,0.723,10,-6.86,0,0.175,0.0638,1.2e-05,0.123,0.288,167.806
3,Zunder,Bittersweet,10Sp3ZHJUkSoYCN9NZO7QL,Marek Hemmann,306976,48,0.811,0.639,9,-9.956,1,0.0515,0.0118,0.893,0.085,0.277,126.03
4,By My Side,Ancient Shadows,0eSWvkrCADGx0409mebsP4,Ecepta,193804,38,0.587,0.529,7,-13.713,0,0.0376,0.285,0.729,0.147,0.175,138.047
5,Wenn der Winter kommt,Mittelpunkt der Welt,6aj2zW1HmhZNQoeGR1avif,Element Of Crime,274306,29,0.332,0.454,4,-8.373,1,0.0443,0.278,0.0,0.11,0.244,193.468
6,Rolls Royce,Rolls Royce,22XFe65IH0P2RY0uowBqdI,Geegun,143437,68,0.872,0.423,1,-7.337,1,0.213,0.142,1e-05,0.104,0.373,95.98
7,Points of Authority / 99 Problems / One Step C...,Collision Course (Deluxe Version),65eohvrL4ttjA7EfFkQOhX,JAY-Z,295826,58,0.547,0.951,1,-4.079,0,0.151,0.00272,1.1e-05,0.0862,0.51,95.031
8,WAIDMANNS HEIL,LIEBE IST FÜR ALLE DA (SPECIAL EDITION),6ey8jr96hlSORKyeJk9d8d,Rammstein,212800,0,0.578,0.969,9,-3.894,1,0.0317,0.00113,0.0234,0.0579,0.684,104.999
9,Handsome,Heavy Is The Head,766N0mjf1KzAAYsA1eAJaN,Stormzy,152733,56,0.762,0.559,1,-6.854,1,0.153,0.173,0.0,0.127,0.282,139.955


### Adding new tracks from Spotify recommendations

In this last step I add new tracks to fill up the other half of the playlist.

I **don't** want to simply add the songs Spotify recommends based on the tracks already in the playlist.

Therefore, getting Spotify recommendations is only the first step: I retrieve several recommendations per seed track and then filter them again based on a similarity score.

In [47]:
seed_tracks = new_playlist_df["track_id"].tolist()
#seed_artists = artists_m["name"].tolist() + artists_t["name"].tolist()

In [48]:
len(seed_tracks)

25

Unfortunately, **the Spotify API does not accept 25 seed tracks for a recommendation query** (at most 5 seeds are allowed per request). I therefore split the process into "packages" of 5 seed tracks and retrieve 25 tracks per package.
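The recommendations endpoint allows up to five seeds in total, in any combination of artists, genres and tracks. The loop over `range(5, 26, 5)` in this notebook assumes exactly 25 seed tracks; a small generator splits a list of any length into packages of five. `chunks` is a hypothetical helper name.

```python
def chunks(seq, size=5):
    """Yield consecutive slices of at most `size` items; the last one may be shorter."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


ids = [f"track{i}" for i in range(12)]
packages = list(chunks(ids))  # 5 + 5 + 2 seed tracks
```

Used as `for seeds in chunks(seed_tracks): sp_m.recommendations(seed_tracks=seeds, limit=25)`, the loop keeps working if the playlist ends up with more or fewer than 25 known tracks.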

In [49]:
seed_tracks[:5], seed_tracks[5:10], seed_tracks[10:15]

(['569gNjph2g07MmjtMm6vKm',
  '3A88HuZw5yiZOAcon4RkZk',
  '6zAiRKvAMlXHxEtyO4yxIO',
  '10Sp3ZHJUkSoYCN9NZO7QL',
  '0eSWvkrCADGx0409mebsP4'],
 ['6aj2zW1HmhZNQoeGR1avif',
  '22XFe65IH0P2RY0uowBqdI',
  '65eohvrL4ttjA7EfFkQOhX',
  '6ey8jr96hlSORKyeJk9d8d',
  '766N0mjf1KzAAYsA1eAJaN'],
 ['6d8mjJCtnlFtgGv3erEs7c',
  '04EJyZSlhPPfFOo1NRn2vl',
  '2dGtLJysTSI9cbQ6TulL8V',
  '17nPeSliosCi427f0lUb75',
  '0BJ5N5yfEPBAimh9LprQvc'])

In [50]:
recomms = sp_m.recommendations(seed_tracks = seed_tracks[:5],limit = 25)

In [51]:
append_audio_features(create_df_recommendations(recomms),sp_m)

Unnamed: 0,track_name,album,track_id,artist,duration,popularity,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo
0,Warschauer Strasse - Original Version,Großstadtmärchen,54Ouh41CWAFULtChxDc2P3,Oliver Koletzki,370333,34,0.836,0.495,9,-10.834,0,0.0476,0.00241,0.844,0.0783,0.508,124.994
1,Sine Ira,The Dawn: Chapter 1,4N4cETijYD18C7bSJl8Vxy,Kisnou,211200,41,0.467,0.655,4,-9.961,0,0.0644,0.544,0.918,0.104,0.106,74.082
2,Norrland,Autumn Bells,7kdaeyC4sTR8TEsskWIjbQ,Gidge,484682,49,0.592,0.413,6,-13.345,0,0.0396,0.526,0.887,0.271,0.0361,112.981
3,Принцесса,313,6d4YTLAXAMUC8vxsSyBkmt,Babek Mamedrzaev,177968,60,0.79,0.922,7,-3.0,0,0.0992,0.016,0.0,0.521,0.433,116.044
4,Spree Ahoi (feat. Steven Coulter),Spree Ahoi (feat. Steven Coulter),6wvp4Mh9QrPAcs6sWiTj4x,Thomas Lizzara,340505,35,0.891,0.574,0,-7.271,1,0.064,0.00114,0.364,0.0548,0.415,126.016
5,Hate Me,Hate Me,5FgdPkPCffPktT5qnWls8Y,Nico Collins,189910,65,0.644,0.928,6,-2.966,0,0.0939,0.00123,0.0,0.0793,0.571,95.059
6,Devil,Devil,4BVKDnYVUxbfBYJIip2RBp,Barren Gates,176000,62,0.506,0.865,6,-1.421,0,0.155,0.348,2.5e-05,0.0892,0.402,150.062
7,Nil,Mare,5G3ZKjCHie2Ikr3I4QCQGt,Christian Löffler,410949,46,0.829,0.176,6,-23.885,0,0.0545,0.216,0.722,0.0952,0.143,120.008
8,Fireflies - Original,Fireflies,60NRvqEVjGSbocmylxfhDf,Enzalla,309333,41,0.59,0.496,5,-10.024,0,0.0307,0.46,0.921,0.106,0.112,134.981
9,Миллион алых роз,Миллион алых роз,2khK4aqzvb1Dsc2HqaU5zJ,Egor Kreed,191444,52,0.412,0.758,8,-5.745,1,0.132,0.0108,0.0,0.103,0.37,91.537


In [52]:
recomm_dfs = []
for i in range(5,26,5):
    recomms = sp_m.recommendations(seed_tracks = seed_tracks[i-5:i],limit = 25)
    recomms_df = append_audio_features(create_df_recommendations(recomms),sp_m)
    recomm_dfs.append(recomms_df)
recomms_df = pd.concat(recomm_dfs)
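Different packages can return the same track. A sketch that collects the recommended track IDs for any number of seeds and keeps each ID only once. `collect_recommended_ids` is hypothetical; `fetch` stands in for a call such as `sp_m.recommendations(seed_tracks=seeds, limit=25)`, whose response contains a `tracks` list with an `id` for every track.

```python
def collect_recommended_ids(seed_ids, fetch, size=5):
    """Query recommendations per package of seeds and merge the results.

    `fetch` receives a list of at most `size` seed IDs and must return a dict
    shaped like the Web API response, i.e. {"tracks": [{"id": ...}, ...]}.
    Tracks returned by several packages are kept once, in first-seen order.
    """
    seen = {}  # dicts preserve insertion order
    for start in range(0, len(seed_ids), size):
        response = fetch(seed_ids[start:start + size])
        for track in response["tracks"]:
            seen.setdefault(track["id"], track)  # keep the first occurrence only
    return list(seen)
```

With a real client this becomes `collect_recommended_ids(seed_tracks, lambda s: sp_m.recommendations(seed_tracks=s, limit=25))`; the resulting IDs could then be passed to `append_audio_features` via a dataframe as before.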

In [53]:
recomms_df.reset_index(drop = True, inplace= True)

In [54]:
recomms_df

Unnamed: 0,track_name,album,track_id,artist,duration,popularity,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo
0,Middle Finger,Middle Finger - Single,0IK8y0vAO0sM630HsxwLht,Bohnes,229986,70,0.685,0.656,1,-6.013,0,0.0383,0.43700,0.000000,0.3470,0.392,128.003
1,Ненавижу,Ненавижу,2dxCRx56x3hNzuZfnGhzL7,Misha Marvin,186000,36,0.837,0.517,1,-6.316,1,0.0449,0.00915,0.000008,0.6460,0.222,119.986
2,Drifting,Drifting,3GWomAWy6PxtyzlfqKhTIl,TWO LANES,187000,46,0.592,0.719,7,-5.781,1,0.0757,0.02060,0.616000,0.2210,0.207,159.876
3,Do Not Do Me (Like Dis),Do Not Do Me (Like Dis),4DGrOp8YhMx5WocfOd0pQE,Moonbootica,194081,26,0.774,0.818,8,-6.445,1,0.0502,0.01820,0.012600,0.0884,0.434,119.009
4,Hypnotized,Großstadtmärchen,0V2GX0aukyZMt6nSMxfOJk,Oliver Koletzki,340560,58,0.764,0.381,7,-11.845,1,0.0535,0.04580,0.714000,0.1800,0.298,122.988
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
120,Oloro Nyager - Club Mix,Oloro Nyager,4cPqGklG4Ffb7AVlWhuWQt,Djuma Soundsystem,424918,25,0.797,0.694,4,-10.212,1,0.0473,0.03100,0.847000,0.6680,0.415,122.006
121,Stickin' (feat. Masego & VanJess),Stickin' (feat. Masego & VanJess),24KUvSg9QsX6FWsOmN0ZxP,Sinead Harnett,188805,64,0.668,0.450,8,-8.257,0,0.3140,0.07220,0.000000,0.1090,0.633,82.999
122,Eyes Closed,Moments Of Truth,5AyqWYv3gHsvQ0FJuasFs9,Tim Engelhardt,216784,23,0.835,0.373,6,-12.303,0,0.0602,0.22800,0.620000,0.1180,0.208,115.004
123,Baptize (with JID & EARTHGANG feat. Ant Clemons),Spilligion,5zWOqc9si4XnemdxZH4WGG,Spillage Village,293873,65,0.669,0.667,6,-7.028,0,0.3620,0.18700,0.000000,0.1720,0.512,77.985


The up to 125 recommendations (5 packages of 25 tracks) are then filtered by their similarity to the known tracks in the playlist.
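`create_similarity_score` comes from my spotifuncs library, and its implementation is not shown in this notebook. Purely as an illustration, one common way to score similarity is cosine similarity on standardized audio features. The feature subset and the standardization step below are assumptions, not necessarily what spotifuncs does.

```python
import numpy as np

# Assumed feature subset; key, mode, duration and popularity are left out here.
FEATURES = ["danceability", "energy", "speechiness", "acousticness",
            "instrumentalness", "liveness", "valence", "tempo"]


def _unit_rows(x):
    """Scale every row to length 1 (all-zero rows are left unchanged)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.where(norms == 0, 1.0, norms)


def cosine_similarity_matrix(known, candidates):
    """Return an array of shape (len(known), len(candidates)) with values in [-1, 1]."""
    known = np.asarray(known, dtype=float)
    candidates = np.asarray(candidates, dtype=float)
    # Standardize each feature over all rows so tempo (~100 BPM) does not
    # dominate features that lie between 0 and 1.
    stacked = np.vstack([known, candidates])
    mean, std = stacked.mean(axis=0), stacked.std(axis=0)
    std[std == 0] = 1.0  # constant columns: avoid division by zero
    return _unit_rows((known - mean) / std) @ _unit_rows((candidates - mean) / std).T
```

Called as `cosine_similarity_matrix(new_playlist_df[FEATURES], recomms_df[FEATURES]).argmax(axis=1)`, this yields one candidate index per playlist track, the same kind of result as the `np.argmax` cell in this notebook.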

In [56]:
similarity_score = create_similarity_score(new_playlist_df,recomms_df)

In [58]:
new_playlist_df.shape

(25, 17)

In [59]:
recomms_df

Unnamed: 0,track_name,album,track_id,artist,duration,popularity,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo
0,Middle Finger,Middle Finger - Single,0IK8y0vAO0sM630HsxwLht,Bohnes,229986,70,0.685,0.656,1,-6.013,0,0.0383,0.43700,0.000000,0.3470,0.392,128.003
1,Ненавижу,Ненавижу,2dxCRx56x3hNzuZfnGhzL7,Misha Marvin,186000,36,0.837,0.517,1,-6.316,1,0.0449,0.00915,0.000008,0.6460,0.222,119.986
2,Drifting,Drifting,3GWomAWy6PxtyzlfqKhTIl,TWO LANES,187000,46,0.592,0.719,7,-5.781,1,0.0757,0.02060,0.616000,0.2210,0.207,159.876
3,Do Not Do Me (Like Dis),Do Not Do Me (Like Dis),4DGrOp8YhMx5WocfOd0pQE,Moonbootica,194081,26,0.774,0.818,8,-6.445,1,0.0502,0.01820,0.012600,0.0884,0.434,119.009
4,Hypnotized,Großstadtmärchen,0V2GX0aukyZMt6nSMxfOJk,Oliver Koletzki,340560,58,0.764,0.381,7,-11.845,1,0.0535,0.04580,0.714000,0.1800,0.298,122.988
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
120,Oloro Nyager - Club Mix,Oloro Nyager,4cPqGklG4Ffb7AVlWhuWQt,Djuma Soundsystem,424918,25,0.797,0.694,4,-10.212,1,0.0473,0.03100,0.847000,0.6680,0.415,122.006
121,Stickin' (feat. Masego & VanJess),Stickin' (feat. Masego & VanJess),24KUvSg9QsX6FWsOmN0ZxP,Sinead Harnett,188805,64,0.668,0.450,8,-8.257,0,0.3140,0.07220,0.000000,0.1090,0.633,82.999
122,Eyes Closed,Moments Of Truth,5AyqWYv3gHsvQ0FJuasFs9,Tim Engelhardt,216784,23,0.835,0.373,6,-12.303,0,0.0602,0.22800,0.620000,0.1180,0.208,115.004
123,Baptize (with JID & EARTHGANG feat. Ant Clemons),Spilligion,5zWOqc9si4XnemdxZH4WGG,Spillage Village,293873,65,0.669,0.667,6,-7.028,0,0.3620,0.18700,0.000000,0.1720,0.512,77.985


In [60]:
[np.argmax(i) for i in similarity_score]

[124,
 23,
 44,
 111,
 114,
 98,
 121,
 32,
 42,
 68,
 12,
 101,
 84,
 117,
 54,
 54,
 76,
 89,
 109,
 96,
 99,
 103,
 93,
 100,
 63]

In [61]:
final_recomms=recomms_df.loc[[np.argmax(i) for i in similarity_score]]
final_recomms = final_recomms.drop_duplicates()
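Taking `np.argmax` per row lets several playlist tracks point to the same recommendation (index 54 appears twice in the list above), so after `drop_duplicates()` fewer than 25 new tracks remain. A possible variation, not what this notebook does, is a greedy pass that skips recommendations that were already chosen. It is greedy, not an optimal assignment; `pick_unique_best` is a hypothetical helper.

```python
import numpy as np


def pick_unique_best(scores):
    """For each known track (row), pick the most similar candidate (column)
    that has not been picked yet. Earlier rows win a shared favorite; rows
    left without an unused candidate get no pick.
    """
    scores = np.asarray(scores, dtype=float)
    taken = set()
    picks = []
    for row in scores:
        for col in np.argsort(row)[::-1]:  # best candidate first
            if int(col) not in taken:
                taken.add(int(col))
                picks.append(int(col))
                break
    return picks


scores = [[0.9, 0.8, 0.1],
          [0.95, 0.3, 0.2],
          [0.1, 0.2, 0.7]]
picks = pick_unique_best(scores)  # plain argmax per row would give [0, 0, 2]
```

The picks could replace the argmax list, e.g. `recomms_df.iloc[pick_unique_best(similarity_score)]`, so each known track contributes a distinct recommendation as long as enough candidates exist.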

In [62]:
new_playlist_df = pd.concat([new_playlist_df, final_recomms])

In [63]:
new_playlist_df = new_playlist_df.drop_duplicates()
new_playlist_df.reset_index(drop = True, inplace = True)

# The playlist is finished!

Now the only thing left to do is to add the tracks to the playlist.

(Adding a nice picture and thanking your girlfriend for her patience in the playlist description are **not optional**)

In [64]:
new_playlist_df

Unnamed: 0,track_name,album,track_id,artist,duration,popularity,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo
0,Huldra - Other Version,Huldra,569gNjph2g07MmjtMm6vKm,Gidge,490000,38,0.772,0.514,10,-11.961,0,0.0754,0.148,0.906,0.0934,0.103,118.015
1,Дни и ночи,Дни и ночи,3A88HuZw5yiZOAcon4RkZk,Geegun,186000,48,0.917,0.526,2,-8.15,0,0.125,0.298,5.8e-05,0.108,0.514,119.985
2,Phoenix,Phoenix,6zAiRKvAMlXHxEtyO4yxIO,League of Legends,197633,75,0.42,0.723,10,-6.86,0,0.175,0.0638,1.2e-05,0.123,0.288,167.806
3,Zunder,Bittersweet,10Sp3ZHJUkSoYCN9NZO7QL,Marek Hemmann,306976,48,0.811,0.639,9,-9.956,1,0.0515,0.0118,0.893,0.085,0.277,126.03
4,By My Side,Ancient Shadows,0eSWvkrCADGx0409mebsP4,Ecepta,193804,38,0.587,0.529,7,-13.713,0,0.0376,0.285,0.729,0.147,0.175,138.047
5,Wenn der Winter kommt,Mittelpunkt der Welt,6aj2zW1HmhZNQoeGR1avif,Element Of Crime,274306,29,0.332,0.454,4,-8.373,1,0.0443,0.278,0.0,0.11,0.244,193.468
6,Rolls Royce,Rolls Royce,22XFe65IH0P2RY0uowBqdI,Geegun,143437,68,0.872,0.423,1,-7.337,1,0.213,0.142,1e-05,0.104,0.373,95.98
7,Points of Authority / 99 Problems / One Step C...,Collision Course (Deluxe Version),65eohvrL4ttjA7EfFkQOhX,JAY-Z,295826,58,0.547,0.951,1,-4.079,0,0.151,0.00272,1.1e-05,0.0862,0.51,95.031
8,WAIDMANNS HEIL,LIEBE IST FÜR ALLE DA (SPECIAL EDITION),6ey8jr96hlSORKyeJk9d8d,Rammstein,212800,0,0.578,0.969,9,-3.894,1,0.0317,0.00113,0.0234,0.0579,0.684,104.999
9,Handsome,Heavy Is The Head,766N0mjf1KzAAYsA1eAJaN,Stormzy,152733,56,0.762,0.559,1,-6.854,1,0.153,0.173,0.0,0.127,0.282,139.955


In [65]:
new_playlist_df["track_id"].tolist()

['569gNjph2g07MmjtMm6vKm',
 '3A88HuZw5yiZOAcon4RkZk',
 '6zAiRKvAMlXHxEtyO4yxIO',
 '10Sp3ZHJUkSoYCN9NZO7QL',
 '0eSWvkrCADGx0409mebsP4',
 '6aj2zW1HmhZNQoeGR1avif',
 '22XFe65IH0P2RY0uowBqdI',
 '65eohvrL4ttjA7EfFkQOhX',
 '6ey8jr96hlSORKyeJk9d8d',
 '766N0mjf1KzAAYsA1eAJaN',
 '6d8mjJCtnlFtgGv3erEs7c',
 '04EJyZSlhPPfFOo1NRn2vl',
 '2dGtLJysTSI9cbQ6TulL8V',
 '17nPeSliosCi427f0lUb75',
 '0BJ5N5yfEPBAimh9LprQvc',
 '4RYKr1R3tXrITqY1zWiTNi',
 '2KpvM181ZwtIWwIElZx4vI',
 '5R4hprpCcdgKz1DsPoh9p2',
 '6gG1R1bFdJeNc2ERAwXxCb',
 '1HK6TWeHG9q5upt2WcI0sw',
 '4njhWDhTAjhReWtYkiMH9t',
 '5lWVqc9kPplvlDmLtFls02',
 '4ZVqqf3Eo4uXSeFUrYD5lw',
 '59g3Adj1Vdhja52OrhURdE',
 '2Ee9amLUslOwgoJWZEpSSD',
 '5xWzVhDpu6hpcoiO2euL8u',
 '5Wxg8ocgiYkLozRjJhJReM',
 '4Zc7TCHzuNwL0AFBlyLdyr',
 '7F3q8BR1VVJDrDjGRi5byr',
 '07poLV9zNLuakMGUYTXvyZ',
 '1U5vvKgK8pVT0dpwY2CWRh',
 '24KUvSg9QsX6FWsOmN0ZxP',
 '77VW8u6inET54YAKN1RSnd',
 '7ymr9kOckM8Uw6qNs7My1W',
 '6WI33r2QRxyxttn3KH0XkC',
 '0BKHgXSyG6kquLakA4yNVB',
 '25GlFJq5QNAXyVgJvCZ4Mf',
 

**Note:** Here I am using `user_playlist_add_tracks()` to **add** tracks to an existing playlist. It is also possible to create a playlist from scratch, but that wasn't necessary here.

In the script I use `playlist_replace_items()` instead, because I don't just want new songs to be added but also the old ones to be removed.
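The Spotify Web API accepts at most 100 items per request when replacing or adding playlist items. The weekly playlist stays well below that, but if it ever grew, a sketch like the following would cover it. `replace_playlist` is hypothetical; `playlist_replace_items` and `playlist_add_items` are methods of spotipy's `Spotify` client.

```python
def replace_playlist(sp, playlist_id, track_ids, batch_size=100):
    """Replace all items of a playlist while staying within the per-request limit.

    The first batch replaces the current contents (an empty list clears the
    playlist); any remaining batches are appended in order.
    `sp` is expected to be a spotipy.Spotify client.
    """
    track_ids = list(track_ids)
    first, rest = track_ids[:batch_size], track_ids[batch_size:]
    sp.playlist_replace_items(playlist_id, first)
    for start in range(0, len(rest), batch_size):
        sp.playlist_add_items(playlist_id, rest[start:start + batch_size])
```

In this notebook the call would be `replace_playlist(sp_m, "spotify:playlist:1Vcqtv3nE7QOJ4KFvK7bT8", new_playlist_df["track_id"].tolist())`.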

In [66]:
sp_m.user_playlist_add_tracks(usernames[0],
                              playlist_id="spotify:playlist:1Vcqtv3nE7QOJ4KFvK7bT8",
                              tracks = new_playlist_df["track_id"].tolist())

{'snapshot_id': 'MTMsODlmMGU0MGFjNGU1MGQ4OWUzZTc2M2ZiYmRlZTgxMWNkZmNmOTA2Mg=='}

In [67]:
new_playlist_df.to_csv(path/"Playlist.csv")