# Spotify Recommendation System

By: Bella Chang (Data Science, '24), Leah Hong (Data Science & Statistics, '23), Logan Liu (Business Administration & Data Science, '23), Tess U-Vongcharoen (Data Science, '24)

Spotify is one of the world's largest music streaming platforms, with over 430 million monthly users. As data scientists and Spotify users ourselves, we were interested in Spotify's "Discover Weekly" algorithm, which recommends 30 new songs to each user based on their personal playlists and activity. Using similarity scores and linear algebra principles, we built our own algorithm that mimics the "Discover Weekly" behavior, based on two user playlists and the Spotipy, NumPy, and Pandas libraries.

Our project is broken up into 5 steps, largely influenced by the Data Science Lifecycle: 

1. Setup

2. Add input (user personal playlists)

3. Combine the playlists given by the user, clean/extract relevant data

4. Create weight features calculating similarity scores for each song

5. Create a function that calculates similarity between two songs based on weights

### Step 1: Setup

In [1]:
#import all libraries
from datascience import *
from scipy import stats
from scipy import special
import numpy as np 
import pandas as pd
import math
import random

#import spotipy libraries
!pip install spotipy
import spotipy
import spotipy.oauth2 as oauth2
from spotipy.oauth2 import SpotifyClientCredentials
from spotipy.oauth2 import SpotifyOAuth 
import time 

#variables necessary for spotipy
cid = 'b8ef4ecc093c464191135d0ea204ea37'
secret = '9d03ba94da8a4fdeb9fc3d77a93d2552'
client_credentials_manager = SpotifyClientCredentials(client_id=cid, client_secret=secret)
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)

Collecting spotipy
  Downloading spotipy-2.20.0-py3-none-any.whl (27 kB)
Collecting redis>=3.5.3
  Downloading redis-4.3.4-py3-none-any.whl (246 kB)
Collecting deprecated>=1.2.3
  Downloading Deprecated-1.2.13-py2.py3-none-any.whl (9.6 kB)
Installing collected packages: deprecated, redis, spotipy
Successfully installed deprecated-1.2.13 redis-4.3.4 spotipy-2.20.0
You should consider upgrading via the '/root/venv/bin/python -m pip install --upgrade pip' command.

### Step 2: Add input (user personal playlists)

In [5]:
#user input to be added at this point; dummy playlists added for illustration

new_playlist_name = 'Spotify Recommended Playlist #1'

#A string of the actual URL to a playlist
playlist_1_link = 'https://open.spotify.com/playlist/0Dh79mnt4dty1DGqeZ6eJh?si=3k8xlPe7RdufMFce5ZzH7Q'
#A string of the actual URL to another playlist
playlist_2_link = 'https://open.spotify.com/playlist/6DamqaZt8Dl5nphPnUMtbm?si=e2e3bdf98a3e4e9e'
total_output = 0.8 #A value between 0 and 1: the fraction of the shorter playlist's length to output
percent_new = 0.5 #A value between 0 and 0.5: the fraction of the output that should be new songs not in either playlist

The following section outputs some basic data on the playlists and how many songs will be outputted/recommended by our algorithm.

In [6]:
playlist_1 = sp.playlist_items(playlist_1_link) #draws raw playlist data from Spotify
playlist_2 = sp.playlist_items(playlist_2_link) #draws raw playlist data from Spotify

In [7]:
playlist_1_length = len(playlist_1['items'])
playlist_2_length = len(playlist_2['items'])
combined_length = playlist_1_length + playlist_2_length #combined # of tracks in both playlists
#record which playlist is shorter and how long it is (ties count as Playlist 2)
if playlist_1_length >= playlist_2_length:
    shorter_playlist = 'Playlist 2'
    shortest_length = playlist_2_length
else:
    shorter_playlist = 'Playlist 1'
    shortest_length = playlist_1_length

In [8]:
percentage_output = total_output * 100 #for string printing purposes
num_songs_output = int(total_output * shortest_length)

percentage_new = percent_new * 100 #for string printing purposes
num_new_songs = int(num_songs_output * percent_new)
num_og_songs = num_songs_output - num_new_songs 

In [9]:
#summary of data found by above section
print(f'The number of tracks in both playlists combined is {combined_length}')
print(f'The length of the shortest playlist, {shorter_playlist}, is {shortest_length}')
print(f'The number of tracks to be outputted is {percentage_output}% of the shorter playlist, which is {num_songs_output} tracks')
print(f'Of the songs outputted, {percentage_new}%, or {num_new_songs} tracks, will be newly recommended songs, and {num_og_songs} will be from the inputted playlists')


The number of tracks in both playlists combined is 145
The length of the shortest playlist, Playlist 2, is 58
The number of tracks to be outputted is 80.0% of the shorter playlist, which is 46 tracks
Of the songs outputted, 50.0%, or 23 tracks, will be newly recommended songs, and 23 will be from the inputted playlists
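
As a compact restatement of the arithmetic in this step, the sketch below wraps the counting logic in a single function. The name `output_counts` and its signature are illustrative and not part of the original notebook; it assumes the same truncating `int()` behavior as the cells above.

```python
def output_counts(len_1, len_2, total_output, percent_new):
    """Return (num_songs_output, num_new_songs, num_og_songs).

    Illustrative helper (not in the original notebook) that mirrors the
    cells above: the output size is a fraction of the shorter playlist.
    """
    shortest_length = min(len_1, len_2)
    num_songs_output = int(total_output * shortest_length)  # truncates toward zero
    num_new_songs = int(num_songs_output * percent_new)
    num_og_songs = num_songs_output - num_new_songs
    return num_songs_output, num_new_songs, num_og_songs


# Playlist lengths from the run above: 145 combined tracks, the shorter one has 58
counts = output_counts(87, 58, total_output=0.8, percent_new=0.5)
print(counts)
```

With the lengths from the printed summary, this gives the same 46 output tracks, split into 23 new and 23 original songs.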


### Step 3: Combine the playlists given by the user, clean/extract relevant data

In [10]:
#collect the track IDs of each playlist into its own list, skipping any ID that has already been seen
playlist_1_ids = [] #Makes a list of IDs for Playlist 1
playlist_2_ids = []  #Makes a list of IDs for Playlist 2

for elem in np.arange(playlist_1_length):
    track_id = playlist_1['items'][elem]['track']['id']
    if track_id not in playlist_1_ids and track_id not in playlist_2_ids:
        playlist_1_ids.append(track_id)

for elem in np.arange(playlist_2_length):
    track_id = playlist_2['items'][elem]['track']['id']
    if track_id not in playlist_1_ids and track_id not in playlist_2_ids:
        playlist_2_ids.append(track_id)

playlist_1_length = len(playlist_1_ids)
playlist_2_length = len(playlist_2_ids)
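
The de-duplication above can also be written with a set, which avoids rescanning both lists for every track. This is a minimal sketch under stated assumptions: the helper name `unique_track_ids` is hypothetical, the input is assumed to have the same shape as `playlist_1['items']`, and entries with no track or no ID are simply skipped.

```python
def unique_track_ids(items, seen=None):
    """Collect track IDs in playlist order, skipping repeats.

    `items` is assumed to look like playlist['items'] above: a list of dicts
    with a 'track' dict holding an 'id'. Entries without a track or ID are
    skipped (an assumption of this sketch). `seen` is a set of IDs to exclude,
    for example those already taken from the other playlist.
    """
    seen = set() if seen is None else seen
    ids = []
    for item in items:
        track = item.get('track') or {}
        track_id = track.get('id')
        if track_id is not None and track_id not in seen:
            seen.add(track_id)
            ids.append(track_id)
    return ids


# Small made-up example (IDs are placeholders, not real Spotify IDs)
first = [{'track': {'id': 'a'}}, {'track': {'id': 'b'}}, {'track': {'id': 'a'}}]
second = [{'track': {'id': 'b'}}, {'track': None}, {'track': {'id': 'c'}}]
seen_ids = set()
ids_1 = unique_track_ids(first, seen_ids)
ids_2 = unique_track_ids(second, seen_ids)
print(ids_1, ids_2)
```

Sharing one `seen_ids` set across both calls keeps a track that appears in both playlists in the first list only, which matches the behavior of the loops above.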

In [11]:
#fetching labels for columns of each table
column_labels = list(sp.audio_features(playlist_1_ids)[0].keys()) #has most labels, missing 'name' and 'og playlist'

In [12]:
#make data dictionary for playlist 1
playlist_1_data = {} #make an empty dictionary for this playlist to append data to
for attr in column_labels:
    playlist_1_data[attr] = []

playlist_1_names_list = [] #start with scraping the names of all the tracks
for elem in playlist_1_ids:
    track_name = sp.tracks([elem])['tracks'][0]['name']
    playlist_1_names_list.append(track_name)
playlist_1_data['name'] = playlist_1_names_list #add the name values into the data dictionary
ones_list = [] #this is just going to make it easier to make the 'og playlist' column, which marks the original playlist a track comes from
for elem in np.arange(playlist_1_length):
    ones_list.append(1)
playlist_1_data['og playlist'] = ones_list #add the source data values into the dictionary
for song in playlist_1_ids:
    all_features = sp.audio_features(song)[0]
    for attr in column_labels: 
        value_of_interest = all_features[attr]
        (playlist_1_data[attr]).append(value_of_interest)

In [13]:
#repeat creation of data dictionary for playlist 2
playlist_2_data = {} #make an empty dictionary for this playlist to append data to
for attr in column_labels:
    playlist_2_data[attr] = []


playlist_2_names_list = [] #start with scraping the names of all the tracks
for elem in playlist_2_ids:
    track_name = sp.tracks([elem])['tracks'][0]['name']
    playlist_2_names_list.append(track_name)
playlist_2_data['name'] = playlist_2_names_list #add the name values into the data dictionary
twos_list = [] #this is just going to make it easier to make the 'og playlist' column, which marks the original playlist a track comes from
for elem in np.arange(playlist_2_length):
    twos_list.append(2)
playlist_2_data['og playlist'] = twos_list #add the source data values into the dictionary
for song in playlist_2_ids:
    all_features = sp.audio_features(song)[0]
    for attr in column_labels:
        value_of_interest = all_features[attr]
        (playlist_2_data[attr]).append(value_of_interest)

In [14]:
#create one table with all attributes for song URIs
playlist_1_table = pd.DataFrame(playlist_1_data)
playlist_2_table = pd.DataFrame(playlist_2_data)
frames = [playlist_1_table, playlist_2_table]
combined_table = (pd.concat(frames)).set_index('id') #this makes the track IDs the index of the combined table
combined_table 

id,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,uri,track_href,analysis_url,duration_ms,time_signature,name,og playlist
5A1w94uzc1kO1Zhe8WWxC3,0.788,0.427,5,-9.918,0,0.3080,0.0659,0.000036,0.1710,0.603,97.030,audio_features,spotify:track:5A1w94uzc1kO1Zhe8WWxC3,https://api.spotify.com/v1/tracks/5A1w94uzc1kO...,https://api.spotify.com/v1/audio-analysis/5A1w...,215227,4,Right Back,1
1AHf5FSofKcUw8tyKkccKF,0.819,0.519,10,-7.160,0,0.0768,0.3840,0.000000,0.0898,0.452,100.039,audio_features,spotify:track:1AHf5FSofKcUw8tyKkccKF,https://api.spotify.com/v1/tracks/1AHf5FSofKcU...,https://api.spotify.com/v1/audio-analysis/1AHf...,177627,4,do u even miss me at all?,1
5pJPgy2jGvvNUNfHPvG3Zp,0.342,0.609,8,-5.680,1,0.0943,0.1850,0.000520,0.0799,0.401,82.989,audio_features,spotify:track:5pJPgy2jGvvNUNfHPvG3Zp,https://api.spotify.com/v1/tracks/5pJPgy2jGvvN...,https://api.spotify.com/v1/audio-analysis/5pJP...,261595,4,Warm on a Cold Night,1
5dmPNuHmRRJuHmJTDa7NuJ,0.739,0.471,8,-8.456,1,0.0436,0.0661,0.000098,0.1160,0.310,90.002,audio_features,spotify:track:5dmPNuHmRRJuHmJTDa7NuJ,https://api.spotify.com/v1/tracks/5dmPNuHmRRJu...,https://api.spotify.com/v1/audio-analysis/5dmP...,194400,4,Bambi,1
2ulMreFZwpCzxoXpDfORCh,0.902,0.536,5,-3.188,0,0.1150,0.3390,0.000005,0.0981,0.786,96.015,audio_features,spotify:track:2ulMreFZwpCzxoXpDfORCh,https://api.spotify.com/v1/tracks/2ulMreFZwpCz...,https://api.spotify.com/v1/audio-analysis/2ulM...,131094,4,Beautiful,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
0iWFz0Q5Qha9bx325ocFWq,0.740,0.663,7,-5.393,1,0.2720,0.2680,0.000000,0.1210,0.519,82.004,audio_features,spotify:track:0iWFz0Q5Qha9bx325ocFWq,https://api.spotify.com/v1/tracks/0iWFz0Q5Qha9...,https://api.spotify.com/v1/audio-analysis/0iWF...,152911,4,Best Lover,2
4i7HqWsN7iJzsXpsW5h1nb,0.709,0.679,9,-4.828,1,0.0588,0.8080,0.000046,0.4280,0.852,102.653,audio_features,spotify:track:4i7HqWsN7iJzsXpsW5h1nb,https://api.spotify.com/v1/tracks/4i7HqWsN7iJz...,https://api.spotify.com/v1/audio-analysis/4i7H...,191336,4,Bad Habit - Sped Up,2
39oeKRgsLeynDIbWuXyA47,0.900,0.316,9,-8.797,1,0.3320,0.6840,0.077100,0.0853,0.844,129.914,audio_features,spotify:track:39oeKRgsLeynDIbWuXyA47,https://api.spotify.com/v1/tracks/39oeKRgsLeyn...,https://api.spotify.com/v1/audio-analysis/39oe...,107077,4,CAN'T GET OVER YOU (feat. Clams Casino),2
0zePbRMJ9sd7wsZRlbPQua,0.707,0.767,7,-7.072,1,0.0323,0.0279,0.000295,0.1690,0.832,139.985,audio_features,spotify:track:0zePbRMJ9sd7wsZRlbPQua,https://api.spotify.com/v1/tracks/0zePbRMJ9sd7...,https://api.spotify.com/v1/audio-analysis/0zeP...,204014,4,Hot Rod,2


In [15]:
#put all the combined tracks IDs and names into lists
combined_tracks_id_list = list(combined_table.index)
combined_tracks_name_list = combined_table['name'].tolist()
#make a dictionary that maps each track ID to its name
id_to_name = dict(zip(combined_tracks_id_list, combined_tracks_name_list))
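
The cells in this step request names and audio features one track at a time. Since `sp.audio_features` and `sp.tracks` both accept a list of IDs, the requests could be batched instead. The sketch below shows only the batching helper; the name `chunked` is hypothetical, and the batch size in the commented usage is an assumption that should be checked against the current Spotify API limits.

```python
def chunked(seq, size):
    """Yield consecutive slices of `seq` with at most `size` elements each."""
    if size <= 0:
        raise ValueError('size must be positive')
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


# Placeholder IDs for illustration only
example_ids = [f'id{n}' for n in range(7)]
batches = list(chunked(example_ids, 3))
print(batches)

# Possible use with the client from Step 1 (not run here; the batch size of
# 100 is an assumption to verify before relying on it):
# features = []
# for batch in chunked(playlist_1_ids, 100):
#     features.extend(sp.audio_features(batch))
```

Fewer, larger requests should shorten the runtime of this step, which is one of the improvement ideas listed at the end of the notebook.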

### Step 4: Create weight features calculating similarity scores for each song

In [16]:
#attributes to focus on
#DISCLAIMER: these weights are largely based on personal experimentation with general datasets, but can be altered if user desires

weights_dictionary = {
    'genre_weight':17,   #need to come up with a way to translate genre differences into numerical similarity
    'artist_weight':12,   #need to come up with a way to determine if two artists are similar, probably a binary value
    'mode_weight':8,   #also a binary value
    'valence_weight':15,
    'tempo_weight':6,   #simplify this into like 4 buckets or something
    'danceability_weight':13,
    'energy_weight':11,
    'acousticness_weight':5,
    'instrumentalness_weight':3,
    'loudness_weight':0,
    'liveness_weight':5,
    'speechiness_weight':5
                        }

In [19]:
#making sure that the weights are valid (they must add up to 100)
weights_total = sum(weights_dictionary.values())
assert weights_total == 100, f'Weights do not add up to 100, adds to {weights_total}'

focus_attributes = ['danceability', 'valence', 'energy', 'liveness', 'speechiness'] #attributes to focus on; loudness is left out because it has negative values
print(f'Current attributes considered are: {focus_attributes}')

Current attributes considered are: ['danceability', 'valence', 'energy', 'liveness', 'speechiness']
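
Only five of the twelve weighted attributes appear in `focus_attributes`, so the weights that actually enter the score add up to less than 100. The sketch below makes that share explicit. The helper name `focus_weight_share` is hypothetical; the weights are copied from the dictionary above.

```python
def focus_weight_share(weights, attributes):
    """Return the weights used for `attributes` and their combined total.

    Keys in `weights` are assumed to follow the '<attribute>_weight' naming
    used in weights_dictionary above.
    """
    used = {attr: weights[f'{attr}_weight'] for attr in attributes}
    return used, sum(used.values())


weights = {
    'genre_weight': 17, 'artist_weight': 12, 'mode_weight': 8,
    'valence_weight': 15, 'tempo_weight': 6, 'danceability_weight': 13,
    'energy_weight': 11, 'acousticness_weight': 5,
    'instrumentalness_weight': 3, 'loudness_weight': 0,
    'liveness_weight': 5, 'speechiness_weight': 5,
}
focus = ['danceability', 'valence', 'energy', 'liveness', 'speechiness']
used, total = focus_weight_share(weights, focus)
print(used, total)
```

Here the focus attributes carry 49 of the 100 weight points, so scores computed from them are on a scale set by those five weights alone.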


### Step 5: Create a function that calculates similarity between two songs based on weights

In [20]:
tbl = combined_table 

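def similarity(song_a, song_b, attributes):
    """
    Takes in two songs and calculates a score by taking the relative difference
    (absolute difference divided by the mean of the two values) in each focus attribute,
    multiplying it by the weight of that attribute, and summing the results.
    A score of 0 means the songs match on every focus attribute; larger scores mean larger differences.
    """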
    similarity_sum = 0
    for elem in attributes:
        a_value = tbl.loc[f'{song_a}'][f'{elem}']
        b_value = tbl.loc[f'{song_b}'][f'{elem}']
        att_weight = weights_dictionary[f'{elem}_weight']
        value_diff = abs(a_value - b_value)
        percent_similar = value_diff / ((a_value + b_value) / 2) #comparing difference value to the mean of the two numbers
        weighted_similarity = percent_similar * att_weight
        similarity_sum = similarity_sum + weighted_similarity
    return similarity_sum 
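
To make the formula concrete, here is a small worked example of the same weighted relative-difference calculation on plain dictionaries rather than on `tbl`. It is a sketch, not the notebook's function: the name `weighted_relative_difference` is hypothetical, the attribute values are made up, and skipping attributes whose mean is zero is an assumption added here to avoid dividing by zero.

```python
def weighted_relative_difference(song_a, song_b, weights):
    """Sum of weight * |a - b| / mean(a, b) over the attributes in `weights`.

    `song_a` and `song_b` are plain dicts of attribute values. An attribute
    whose two values average to zero contributes nothing (an assumption made
    here to avoid dividing by zero).
    """
    total = 0.0
    for attr, weight in weights.items():
        a_value, b_value = song_a[attr], song_b[attr]
        mean = (a_value + b_value) / 2
        if mean == 0:
            continue
        total += weight * abs(a_value - b_value) / mean
    return total


# Made-up attribute values and two of the weights from Step 4
song_a = {'danceability': 0.8, 'valence': 0.6}
song_b = {'danceability': 0.4, 'valence': 0.6}
weights = {'danceability': 13, 'valence': 15}
score = weighted_relative_difference(song_a, song_b, weights)
print(round(score, 3))  # danceability term: 13 * 0.4 / 0.6; valence term: 0
```

The two songs share the same valence, so only danceability contributes. Comparing a song with itself gives 0.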

In [21]:
#make the frame for the matrix that compares each song to every other song in the combined songs list
song_matrix = pd.DataFrame(0, index=combined_tracks_id_list , columns=combined_tracks_id_list, dtype=float)

#set each value that is a song to itself to 1
for elem in combined_tracks_id_list:
    song_matrix.at[f'{elem}', f'{elem}'] = 1

#iterate through each value in the song_matrix and input the similarity between those songs
for song1 in combined_tracks_id_list:
    focus_song = song1
    for song2 in combined_tracks_id_list:
        if song1 != song2:
            song_matrix.at[f'{song1}', f'{song2}'] = similarity(song1, song2, focus_attributes)
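
Because the score for (A, B) equals the score for (B, A), each pair only needs to be computed once. The sketch below illustrates that idea with a nested dictionary instead of a DataFrame. The names `pairwise_scores` and `toy_score` are hypothetical, and the sketch assumes the IDs are unique and the score function is symmetric.

```python
from itertools import combinations


def pairwise_scores(ids, score):
    """Build a nested dict of scores, calling `score(a, b)` once per pair.

    Assumes `score` is symmetric and `ids` has no repeats.
    The diagonal is set to 1 to match the cell above.
    """
    matrix = {a: {b: 0.0 for b in ids} for a in ids}
    for song in ids:
        matrix[song][song] = 1.0
    for a, b in combinations(ids, 2):
        value = score(a, b)
        matrix[a][b] = value
        matrix[b][a] = value
    return matrix


# Toy stand-in for similarity(): absolute difference of made-up values
values = {'x': 0.2, 'y': 0.5, 'z': 0.9}
calls = []


def toy_score(a, b):
    calls.append((a, b))
    return abs(values[a] - values[b])


matrix = pairwise_scores(list(values), toy_score)
print(len(calls))  # 3 pairs instead of 6 ordered pairs
```

For the 145-track run above, this kind of change would roughly halve the number of score calls.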

In [24]:
#add a column at the far right of the song matrix that holds the average score for each song
average_similarity_scores = []

for elem in song_matrix.index.values.tolist():
    score_list = list(song_matrix.get(f'{elem}').values)
    average_score = sum(score_list) / len(score_list)
    average_similarity_scores.append(average_score)
song_matrix['average'] = average_similarity_scores
song_matrix


id,5A1w94uzc1kO1Zhe8WWxC3,1AHf5FSofKcUw8tyKkccKF,5pJPgy2jGvvNUNfHPvG3Zp,5dmPNuHmRRJuHmJTDa7NuJ,2ulMreFZwpCzxoXpDfORCh,3JJYIoJ5FyY9E0DGjr7SXF,1mrPG8snf4maJMoM4Ec8Ag,3TTMUI5dFcbeNSDTTDY9M8,1ITJTMrS4cx8zdlI7DdSoo,6lY38FkInSA0QVHRb1PiEy,...,5PHaM8RbwwqZWlptITQmSg,20YNEkOUj8r3jHscotaGpe,5ziZpT9la4h3sjfvitLc1A,5expoVGQPvXuwBBFuNGqBd,0iWFz0Q5Qha9bx325ocFWq,4i7HqWsN7iJzsXpsW5h1nb,39oeKRgsLeynDIbWuXyA47,0zePbRMJ9sd7wsZRlbPQua,53ISLAcEqXWzcpKMvVzWYE,average
5A1w94uzc1kO1Zhe8WWxC3,1.000000,16.056742,29.105553,20.976153,15.468143,16.059049,23.372750,17.336406,13.642264,7.130171,...,31.570361,22.537440,23.934304,13.078485,10.159064,22.603134,13.727077,20.621307,14.589428,22.987169
1AHf5FSofKcUw8tyKkccKF,16.056742,1.000000,15.837335,12.022819,12.135506,13.749855,15.109436,10.599185,9.239782,15.827087,...,24.303815,12.619638,15.827383,10.270055,13.144151,21.871325,22.147365,22.168479,24.053967,18.513700
5pJPgy2jGvvNUNfHPvG3Zp,29.105553,15.837335,1.000000,21.718694,24.848696,21.996642,16.304599,16.449970,22.261998,27.439321,...,27.670396,18.466420,19.645734,17.621497,21.242568,30.245166,35.227269,30.536557,22.823244,22.719267
5dmPNuHmRRJuHmJTDa7NuJ,20.976153,12.022819,21.718694,1.000000,22.369778,14.855609,16.506761,17.650525,13.368337,19.643561,...,18.500212,13.900994,17.352909,15.737779,18.753756,25.730589,29.972523,22.896712,28.911065,18.607212
2ulMreFZwpCzxoXpDfORCh,15.468143,12.135506,24.848696,22.369778,1.000000,15.714825,26.118852,16.741937,11.360520,11.172818,...,32.783156,22.025239,26.180651,14.058992,16.135403,16.417200,12.329607,16.172989,17.296059,22.783414
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
0iWFz0Q5Qha9bx325ocFWq,10.159064,13.144151,21.242568,18.753756,16.135403,10.736154,22.814445,7.337681,13.597628,11.242925,...,30.638895,12.858754,18.567564,14.387579,1.000000,20.142160,20.211543,18.675625,14.502704,20.475776
4i7HqWsN7iJzsXpsW5h1nb,22.603134,21.871325,30.245166,25.730589,16.417200,12.650766,29.153411,20.765228,17.012985,20.478173,...,38.394320,19.252224,14.875133,20.552534,20.142160,1.000000,24.921225,8.979133,21.111119,23.381982
39oeKRgsLeynDIbWuXyA47,13.727077,22.147365,35.227269,29.972523,12.329607,26.018266,25.969779,25.946357,21.346900,12.609156,...,33.511223,31.939640,36.087412,21.887184,20.211543,24.921225,1.000000,24.017098,20.015230,29.338599
0zePbRMJ9sd7wsZRlbPQua,20.621307,22.168479,30.536557,22.896712,16.172989,9.303467,28.967338,18.515788,13.296713,19.257182,...,37.861230,12.774945,18.634190,20.402167,18.675625,8.979133,24.017098,1.000000,18.939094,21.988515


In [25]:
#pairs each song with its average score and sorts the pairs in descending order of score
def typeConverter(x): 
    """
    If an average score is an array rather than a single number (which may happen
    when a song is repeated in a playlist), keep only its first element.
    """
    y = []
    for val in x: 
        if isinstance(val, np.ndarray):
            y.append(val[0])
        else:
            y.append(val)
    return y

index = song_matrix.index
newAverage = typeConverter(average_similarity_scores)
zipValues = list(zip(index, newAverage))
zipValues.sort(key=lambda x: x[1], reverse=True)

In [26]:
#create generator of sorted songs
def make_generator(n):
    #yield song ids one at a time, stopping once the sorted list is used up
    while len(n) > 0:
        yield n[0][0]
        n = n[1:]

copy_zipValues = list(zipValues)
sorted_generator = make_generator(copy_zipValues)

In [28]:
og_similar_songs = []
#return the num_og_songs amount
x = num_og_songs + 0
playlist_1_counter = 0
playlist_2_counter = 0
target_playlist_1_max = int(0.7 * num_og_songs) #ensure some songs are returned from each playlist
target_playlist_2_max = int(0.7 * num_og_songs)
while x != 0 and (len(og_similar_songs) != num_og_songs):
    song_id = next(sorted_generator)
    song_title = id_to_name[song_id]
    if song_id in playlist_1_ids and playlist_1_counter <= target_playlist_1_max:
        playlist_1_counter += 1
        x -= 1
        og_similar_songs.append(song_title)
    if song_id in playlist_2_ids and playlist_2_counter <= target_playlist_2_max:
        playlist_2_counter += 1
        x -= 1
        og_similar_songs.append(song_title)

print(f'The similar {num_og_songs} songs from both playlists to be returned are: {og_similar_songs}')

The similar 23 songs from both playlists to be returned are: ['This Christmas', 'Natural', "You've Got a Friend In Me", 'OH!!', 'July on Film', 'All I Need (with Mahalia & Ty Dolla $ign)', 'p r i d e . i s . t h e . d e v i l (with Lil Baby)', 'venus fly trap', '夜に駆ける', 'icarus', 'CAN YOU HEAR THE MOON', 'Hypnotized', 'Leaves (Reboot)', 'looking for something', 'My Favourite Clothes', 'Still into You', 'Bad Habit - Sped Up', 'New Light', 'Right Back', 'Pretty Girl', 'Beautiful', "Don't Lose Sight", 'Warm on a Cold Night']
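
The selection loop above walks the ranked songs and limits how many may come from each playlist. A minimal sketch of that idea as a standalone function follows. The name `pick_with_caps` is hypothetical, the IDs are placeholders, and the sketch uses a strict cap (at most `cap` songs per playlist), which is a choice made here rather than a copy of the cell above.

```python
def pick_with_caps(ranked_ids, playlist_of, total, cap):
    """Take IDs in ranked order until `total` are chosen, at most `cap` per playlist.

    `playlist_of` maps each ID to its source playlist label (for example 1 or 2).
    Stops early if the ranked list runs out.
    """
    chosen = []
    counts = {}
    for song_id in ranked_ids:
        if len(chosen) == total:
            break
        source = playlist_of[song_id]
        if counts.get(source, 0) < cap:
            counts[source] = counts.get(source, 0) + 1
            chosen.append(song_id)
    return chosen


# Placeholder IDs: 'a*' come from playlist 1, 'b*' from playlist 2
ranked = ['a1', 'a2', 'a3', 'b1', 'a4', 'b2']
source = {'a1': 1, 'a2': 1, 'a3': 1, 'a4': 1, 'b1': 2, 'b2': 2}
picked = pick_with_caps(ranked, source, total=4, cap=2)
print(picked)
```

Iterating with a `for` loop also means the function ends cleanly when the ranked list is exhausted before `total` songs are found.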


In this final section, we will return the names of newly recommended tracks!

In [29]:
#focus on a few key features in our final recommendations based on earlier notable features
original_songs_dictionary = id_to_name 
og_song_ids = list(original_songs_dictionary.keys())
recommendation_attributes = ['valence', 'danceability', 'energy']

Next, we group the songs by each of the focus recommendation attributes.
Currently, our approach does not distinguish groups by the attribute that produced them. Instead, the groups from all attributes are collected into one large list and sorted by length, and the X largest groups are taken from the top, where X is the number of new songs to be returned.

In [30]:
#tuple sorting function by second value
def sort_tuple_list(tup):
    return(sorted(tup, key = lambda x: x[1]))

In [31]:
#VALENCE GROUP CODE
attribute = 'valence'

valence_list = list(combined_table.loc[:, f'{attribute}'])
valence_table = combined_table.loc[:, ['name', f'{attribute}']]
valence_tuple_list = []
for item in np.arange(len(list(valence_list))):
    track_id = (valence_table.index.values.tolist())[item]
    valence_value = list(valence_table[f'{attribute}'])[item]
    valence_tuple = (track_id, valence_value)
    valence_tuple_list.append(valence_tuple)
ordered_valence_tuples = sort_tuple_list(valence_tuple_list)

#group these elements

ordered_valences = []
for elem in ordered_valence_tuples:
    ordered_valences.append(elem[1])

diff = [y - x for x, y in zip(*[iter(ordered_valences)] * 2)]
avg = sum(diff) / len(diff) 

#actual grouping
valence_groups = [[ordered_valences[0]]]
for x in ordered_valences[1:]:
    if x - valence_groups[-1][0] < avg:
        valence_groups[-1].append(x)
    else:
        valence_groups.append([x])



for elem in valence_groups:
    for item in elem:
        if item in ordered_valences:
            origin = elem.index(item)
            position = ordered_valences.index(item)
            track_id = ordered_valence_tuples[position][0]
            elem.insert(origin, track_id)
            elem.remove(item)

print('Valence groups:')
print(valence_groups)

Valence groups:
[['6c1yUgFlhUHLAM9hSDkVBq', '34xTFwjPQ1dC6uJmleno7x'], ['5zsHmE2gO3RefVsPyw2e3T'], ['1GfwrTKOzOxK2cfPcrMyuP'], ['1F6IbA7di42uPc3cff8PXV'], ['4jSE5cAaa5rwTyhDSXfwQN', '5DrCWAaQ8zklweBa9abFIu'], ['7rKzXE9oHgFXDbAQ3PGgnu', '5xROgo35i9a9IbQgN56Clz', '0WeFvEY2QEjcKVd8ymli4i'], ['7pt64sepCy5QmDrgy9wOJS', '2G3l5uYzMqVAgEjcF8XtRj'], ['4sRoiXZBLpiRIklm2wy0WZ', '3XstzgzP0rp3bzElEnRVHv', '5BzpLTf8BIYKmSPZa33WXA'], ['30QNjcM3Q1GnLFIIJjWQL1', '30QNjcM3Q1GnLFIIJjWQL1', '06nKF46jG8p1zwyP4ziAyG', '7vCoNV0SP6UWQCKt3eFNKk'], ['6SEolIp22t0DzeBfCBo3hr'], ['3FpEXAupLwCHwzeUBxF99S'], ['2Hd7uGbl8PX0IxyX59VFOg'], ['1ZEFYW6nPEvIcsIvymgsLk', '5PHaM8RbwwqZWlptITQmSg', '6pPPm7T9JJOGdfjCTct54l', '6pPPm7T9JJOGdfjCTct54l', '0ACt3PP22HyKfpFIV6AQUW'], ['2tlJ22iQwiO1CWBQSma23n', '1BJIJ69DZNip7Erq6u69mu', '2DqhE7xzpGNsKYbptqblJg'], ['23c9gmiiv7RCu7twft0Mym'], ['40uMIn2zJLAQhNXghRjBed', '6XNANAB7sFvkfho6bMCp7o', '2ZwIO3ufWLFYxtEoam9ydu'], ['5dmPNuHmRRJuHmJTDa7NuJ'], ['5OUTFH5acycdnf8OVo21Gv', '3WBRfkOozHE

In [33]:
#DANCEABILITY GROUP CODE
attribute = 'danceability'

danceability_list = list(combined_table.loc[:, f'{attribute}'])
danceability_table = combined_table.loc[:, ['name', f'{attribute}']]
danceability_tuple_list = []
for item in np.arange(len(list(danceability_list))):
    track_id = (danceability_table.index.values.tolist())[item]
    danceability_value = list(danceability_table[f'{attribute}'])[item]
    danceability_tuple = (track_id, danceability_value)
    danceability_tuple_list.append(danceability_tuple)
ordered_danceability_tuples = sort_tuple_list(danceability_tuple_list)

#group these elements

ordered_danceabilitys = []
for elem in ordered_danceability_tuples:
    ordered_danceabilitys.append(elem[1])

diff = [y - x for x, y in zip(*[iter(ordered_danceabilitys)] * 2)]
avg = sum(diff) / len(diff) 

#actual grouping
danceability_groups = [[ordered_danceabilitys[0]]]
for x in ordered_danceabilitys[1:]:
    if x - danceability_groups[-1][0] < avg:
        danceability_groups[-1].append(x)
    else:
        danceability_groups.append([x])

for elem in danceability_groups:
    for item in elem:
        if item in ordered_danceabilitys:
            origin = elem.index(item)
            position = ordered_danceabilitys.index(item)
            track_id = ordered_danceability_tuples[position][0]
            elem.insert(origin, track_id)
            elem.remove(item)

print('Danceability groups:')
print(danceability_groups)

Danceability groups:
[['5BzpLTf8BIYKmSPZa33WXA', '5pJPgy2jGvvNUNfHPvG3Zp', '7pcANiSH8mEKLUIPAxiSDr'], ['4sRoiXZBLpiRIklm2wy0WZ'], ['1GfwrTKOzOxK2cfPcrMyuP'], ['30QNjcM3Q1GnLFIIJjWQL1', '3vWeZSOhhmOai0Go0zKm5j'], ['34xTFwjPQ1dC6uJmleno7x'], ['5OUTFH5acycdnf8OVo21Gv', '37IFFBgI7qnLKqGP15mmIu'], ['6SEolIp22t0DzeBfCBo3hr', '6pPPm7T9JJOGdfjCTct54l', '26ZX2JrAb8AFbr8FFfAsO7'], ['7zxLkZbUxITHabPzGN8Xgc', '7zxLkZbUxITHabPzGN8Xgc'], ['7rKzXE9oHgFXDbAQ3PGgnu', '7rKzXE9oHgFXDbAQ3PGgnu'], ['4jSE5cAaa5rwTyhDSXfwQN'], ['3FAJ6O0NOHQV8Mc5Ri6ENp'], ['491W4t6qtEvd1MupR9r3Zm'], ['3lpnEuEJeoYiRdmcM2yFwi', '0WeFvEY2QEjcKVd8ymli4i'], ['7vCoNV0SP6UWQCKt3eFNKk', '1mrPG8snf4maJMoM4Ec8Ag'], ['53ISLAcEqXWzcpKMvVzWYE', '164VgxTozx99XCinCB9ITR'], ['450Y968X52UJY9gLBUSk4s', '6ZzYETKetIfNUsZUb23jgG'], ['0VF7YLIxSQKyNiFL3X6MmN', '0ACt3PP22HyKfpFIV6AQUW'], ['3usbnvDFtOhY09cRNar8Zg', '3B8XuaY9goiMAVCiuEC771', '6Ww8GHdPCl8MqZBhPn4LKd', '6Ww8GHdPCl8MqZBhPn4LKd'], ['4sMmYKC0ot3GTbl2RzHw7T', '1BJIJ69DZNip7Erq6u69mu'], ['15

In [34]:
#ENERGY GROUP CODE
attribute = 'energy'

energy_list = list(combined_table.loc[:, f'{attribute}'])
energy_table = combined_table.loc[:, ['name', f'{attribute}']]
energy_tuple_list = []
for item in np.arange(len(list(energy_list))):
    track_id = (energy_table.index.values.tolist())[item]
    energy_value = list(energy_table[f'{attribute}'])[item]
    energy_tuple = (track_id, energy_value)
    energy_tuple_list.append(energy_tuple)
ordered_energy_tuples = sort_tuple_list(energy_tuple_list)

#group these elements

ordered_energys = []
for elem in ordered_energy_tuples:
    ordered_energys.append(elem[1])

diff = [y - x for x, y in zip(*[iter(ordered_energys)] * 2)]
avg = sum(diff) / len(diff) 

#actual grouping
energy_groups = [[ordered_energys[0]]]
for x in ordered_energys[1:]:
    if x - energy_groups[-1][0] < avg:
        energy_groups[-1].append(x)
    else:
        energy_groups.append([x])

for elem in energy_groups:
    for item in elem:
        if item in ordered_energys:
            origin = elem.index(item)
            position = ordered_energys.index(item)
            track_id = ordered_energy_tuples[position][0]
            elem.insert(origin, track_id)
            elem.remove(item)
print('Energy groups:')
print(energy_groups)

Energy groups:
[['30QNjcM3Q1GnLFIIJjWQL1'], ['34xTFwjPQ1dC6uJmleno7x'], ['1xcfviDw2U2bCqRdeiXiRg'], ['6SEolIp22t0DzeBfCBo3hr'], ['5PHaM8RbwwqZWlptITQmSg', '7FNsGGAGyMoSwjH3ivmcep'], ['1BJIJ69DZNip7Erq6u69mu', '7rKzXE9oHgFXDbAQ3PGgnu'], ['7pt64sepCy5QmDrgy9wOJS'], ['5DrCWAaQ8zklweBa9abFIu'], ['2UvMgTm9y3lStOyQE2yxKA', '3FpEXAupLwCHwzeUBxF99S', '45PxuJqJBnPXZKLxoo9Apj'], ['0WeFvEY2QEjcKVd8ymli4i'], ['1mrPG8snf4maJMoM4Ec8Ag', '4jSE5cAaa5rwTyhDSXfwQN', '39oeKRgsLeynDIbWuXyA47'], ['5BzpLTf8BIYKmSPZa33WXA', '5BzpLTf8BIYKmSPZa33WXA'], ['3Ug1BrdE7qBi7pCyt0KM4n', '3XstzgzP0rp3bzElEnRVHv', '2ZwIO3ufWLFYxtEoam9ydu'], ['4fi9IIcjYzxRTRwJUyFO6Q', '1ZEFYW6nPEvIcsIvymgsLk'], ['3B8XuaY9goiMAVCiuEC771', '0edtKj1oW6rJ9f4wRtLPPH'], ['5SMCxRA6hB2jEhroaYfw6N', '7r1MqPAD09w24mCUUbxiCI'], ['18vu0Yh3nio1TtVPI1ZFLc', '2gb07B9t0F0T1q5zJBy0ZU', '2EgfLUS0jNiujIWc3ZLEtn'], ['0UFthA0qo3JDLxqfG25kgP', '6lY38FkInSA0QVHRb1PiEy'], ['50S52c98UbsIIiG0F1uyat', '2G3l5uYzMqVAgEjcF8XtRj'], ['5A1w94uzc1kO1Zhe8WWxC3', '1iTiSQWP
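
The valence, danceability, and energy cells repeat the same steps, so they could share one helper. The sketch below is one possible shape for it, not the notebook's code: the name `group_by_gap` is hypothetical, the IDs and values are made up, and the average gap is taken over all consecutive sorted values, which is a simplification chosen for this sketch. Keeping each ID paired with its value also avoids looking IDs up by value afterwards.

```python
def group_by_gap(pairs):
    """Group (track_id, value) pairs whose values lie close together.

    Pairs are sorted by value; a new group starts when a value is at least
    one average gap above the first value of the current group. The average
    is taken over all consecutive gaps, a simplification chosen for this sketch.
    """
    ordered = sorted(pairs, key=lambda pair: pair[1])
    if not ordered:
        return []
    gaps = [b[1] - a[1] for a, b in zip(ordered, ordered[1:])]
    avg_gap = sum(gaps) / len(gaps) if gaps else 0.0
    groups = [[ordered[0]]]
    for pair in ordered[1:]:
        group_start = groups[-1][0][1]
        if pair[1] - group_start < avg_gap:
            groups[-1].append(pair)
        else:
            groups.append([pair])
    # Keep only the IDs, mirroring the printed groups above
    return [[track_id for track_id, _ in group] for group in groups]


# Placeholder IDs and made-up attribute values
pairs = [('t1', 0.50), ('t2', 0.10), ('t3', 0.12), ('t4', 0.90), ('t5', 0.52)]
grouped = group_by_gap(pairs)
print(grouped)
```

With a helper like this, each attribute would need only one call on its (ID, value) pairs, for example pairs built from `combined_table['valence']`.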

In [36]:
#aggregate all of these elements into one list and sort it by length 
#groups songs through valence, danceability, and energy

aggregated_groups = []
for elem in valence_groups:
    aggregated_groups.append(elem)
for elem in danceability_groups:
    aggregated_groups.append(elem)
for elem in energy_groups:
    aggregated_groups.append(elem)

ordered_aggregated_groups = sorted(aggregated_groups, key=lambda x: len(x), reverse=True)  #this sorts in descending order the length of groups

interest_groups = []
counter = num_new_songs
while counter != 0:
    interest_groups.append(ordered_aggregated_groups[num_new_songs - counter])
    counter -= 1


We now have a list of groups whose length equals the number of new songs we want to return. Next, we pass each group as seed tracks to Spotipy's recommendation function.

In [37]:
#retrieves ID's of songs to be recommended
final_song_ids = []
new_songs_ids = []

for elem in interest_groups:
    if len(elem) > 5:  #this code edits the groups, as you can only have 5 song ids passed into the recommendation function
        edited = []
        limit = 5
        while limit != 0:
            edited.append(elem[5 - limit])
            limit -= 1
        elem = edited
    song_added = False
    raw_rec = sp.recommendations(seed_tracks=elem, limit=10)
    possible_tracks = raw_rec['tracks']

    list_of_possible = []
    for item in possible_tracks:
        item_id = item['id']
        list_of_possible.append(item_id)
    while not song_added:
        random_track = random.choice(list_of_possible)
        #skip tracks that were already picked as new songs or that are in the input playlists
        if random_track not in new_songs_ids and random_track not in og_song_ids:
            song_added = True
            new_songs_ids.append(random_track)
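
The retry loop above keeps drawing random candidates until it finds an unused one, which cannot finish if every candidate has already been used. A hedged alternative is to filter the candidates first and then choose. The sketch below shows that idea on placeholder IDs; the helper name `pick_new_track` is hypothetical, and no Spotify calls are made.

```python
import random


def pick_new_track(candidates, excluded, rng=random):
    """Return a random candidate ID that is not in `excluded`, or None.

    Filtering first means the function cannot loop forever when every
    candidate has already been used.
    """
    fresh = [track_id for track_id in candidates if track_id not in excluded]
    if not fresh:
        return None
    return rng.choice(fresh)


# Placeholder IDs for illustration
seeds = ['s1', 's2', 's3', 's4', 's5', 's6', 's7']
seed_batch = seeds[:5]  # the cell above passes at most 5 seed tracks
already_used = {'c1', 'c2'}
choice = pick_new_track(['c1', 'c2', 'c3'], already_used)
print(seed_batch, choice)
print(pick_new_track(['c1', 'c2'], already_used))
```

In the notebook, `excluded` would be the union of the original track IDs and the new IDs chosen so far, and a `None` result could be handled by skipping that group.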

In [38]:
#turns ids into names of tracks
new_song_dict = {}
for elem in new_songs_ids:
    name = sp.track(elem)['name']
    new_song_dict[elem] = name

new_songs_names = []
for elem in new_songs_ids:
    name = new_song_dict[elem]
    new_songs_names.append(name)

new_songs_table = Table().with_column("Name", new_songs_names)
new_songs_table
# print(f'The new songs are: {new_songs_names}')

Name
minimal
RACECAR
EARFQUAKE
New Biome
A BOY IS A GUN*
Senior Skip Day
Until I See You Again
single (on the weekend)
Get Busy - Live in LA
Reanimator (feat. Yves Tumor)


In [39]:
final_songs = og_similar_songs + new_songs_names
og_keys = list(original_songs_dictionary.keys())
og_values = list(original_songs_dictionary.values())
new_keys = list(new_song_dict.keys())
new_values = list(new_song_dict.values())
concat_keys = og_keys + new_keys 
concat_values = og_values + new_values
reference_dictionary = {}
reference_length = len(concat_keys)
for elem in np.arange(reference_length):
    reference_dictionary[concat_keys[elem]] = concat_values[elem]

final_songs_dictionary = {}

def get_key(val):
    for key, value in reference_dictionary.items():
        if val == value:
            return key
for elem in final_songs:
    final_songs_dictionary[elem] = get_key(elem)


In [40]:
print(f"The final playlist, '{new_playlist_name}', of {num_songs_output} tracks is {final_songs}")

The final playlist, 'Spotify Recommended Playlist #1', of 46 tracks is ['This Christmas', 'Natural', "You've Got a Friend In Me", 'OH!!', 'July on Film', 'All I Need (with Mahalia & Ty Dolla $ign)', 'p r i d e . i s . t h e . d e v i l (with Lil Baby)', 'venus fly trap', '夜に駆ける', 'icarus', 'CAN YOU HEAR THE MOON', 'Hypnotized', 'Leaves (Reboot)', 'looking for something', 'My Favourite Clothes', 'Still into You', 'Bad Habit - Sped Up', 'New Light', 'Right Back', 'Pretty Girl', 'Beautiful', "Don't Lose Sight", 'Warm on a Cold Night', 'minimal', 'RACECAR', 'EARFQUAKE', 'New Biome', 'A BOY IS A GUN*', 'Senior Skip Day', 'Until I See You Again', 'single (on the weekend)', 'Get Busy - Live in LA', 'Reanimator (feat. Yves Tumor)', 'Keeping Tabs', 'Japanese Denim', 'Like A G6', 'Dizzy on the Comedown', 'Smells Like Me', 'I Do - Bonus Track', 'エンヴィーベイビー x KING', 'Flowers', 'One Right Now (with The Weeknd)', 'Rimbaud, Come and Sit For A While', 'No Return (with The Kid LAROI & Lil Durk)', 'wave'

Future edits and improvement ideas:

- More efficient code for each attribute group (improve runtime)

- Creating 'attribute groups' so that we are able to rely on more attributes efficiently and make those our features

- Combining more than 2 playlists
