# Spotify Mood Session

## 1. Algorithm

a. **First Selection**. The user selects a song $s_0$ from the set of all songs $S$ by searching (by name or artist) and plays that song.

b. **Played List**. Let $P$ be the list of played songs, initialised as $P \gets \{ s_0 \}$.

c. **Cold Start**. Create a bootstrap random selection $R$ of $n$ songs from $S$. Calculate the distance between each song in $R$ and the first selection $s_0$, then select the song closest to $s_0$. Formally:

$$
s^* = \arg\min_{s_i \in R} \text{dist}(s_i, s_0)
$$

*Potential parameter: add the closest $k$ songs, rather than only the single closest one, to the initial bootstrap played list.*

d. **Rating**. Every song the user plays for more than 30 seconds is rated $\text{pos}$; songs skipped within the first 30 seconds are rated $\text{neg}$. Songs that have not been played are rated $\text{non}$.

e. **Recommending**. Take the current set of song vectors and select the next song using an aggregated mean or a nearest-neighbour method.

Repeat (d) and (e) indefinitely.
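
Steps (d) and (e) can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the notebook's implementation: `rate`, `mean_vector` and `recommend_next` are hypothetical helper names, each song is assumed to have an equal-length list of numeric features, and Euclidean distance to the mean of the positively rated songs stands in for the unspecified aggregation method.

```python
import math

POS, NEG, NON = 1, -1, 0    # same encoding as the notebook's "class" column
SKIP_THRESHOLD_S = 30       # threshold from step (d)

def rate(seconds_played):
    """Rate a song from its play time; None means it was never played."""
    if seconds_played is None:
        return NON
    return POS if seconds_played > SKIP_THRESHOLD_S else NEG

def mean_vector(vectors):
    """Component-wise mean of equal-length feature vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def recommend_next(features, ratings):
    """Return the unplayed song id closest to the mean of the liked songs.

    features: dict of song id -> list of floats (assumed layout)
    ratings:  dict of song id -> POS / NEG for songs already played
    """
    liked = [features[sid] for sid, r in ratings.items() if r == POS]
    unplayed = [sid for sid in features if ratings.get(sid, NON) == NON]
    if not liked or not unplayed:
        return None  # nothing to aggregate, or nothing left to play
    centre = mean_vector(liked)
    return min(unplayed, key=lambda sid: math.dist(features[sid], centre))
```

Songs rated $\text{neg}$ are simply excluded here; a fuller version might also push recommendations away from them.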

Formally:

$$
\text{let: } s_0 \gets \text{user}(S) \\
\text{let: } P = \{ s_0 \} \\
\text{let: } R \subset S, \; |R| = n \\
\text{let: } D = \emptyset \\
\text{for } s_i \text{ in } R: \\
\quad D_i = \text{dist}(s_i, s_0) \\
\text{end for} \\
\text{select}(\arg\min_i D_i)
$$
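
The cold-start step can be written as a short function. The sketch below makes several assumptions: `cold_start` is a hypothetical name, each song is represented by a list of numeric features, and $\text{dist}$ is taken to be Euclidean distance, which the text does not specify. It returns the candidates with the smallest distance to $s_0$, following the "select the closest" wording of step (c); the `k` argument corresponds to the potential parameter mentioned there.

```python
import math
import random

def cold_start(features, first_uri, n, k=1, seed=None):
    """Sample n candidate songs R from S and return the k closest to s_0.

    features:  dict of song id -> list of floats (assumed layout)
    first_uri: id of the first selection s_0
    """
    rng = random.Random(seed)
    candidates = [uri for uri in features if uri != first_uri]
    sample = rng.sample(candidates, min(n, len(candidates)))            # R
    s0 = features[first_uri]
    distances = {uri: math.dist(features[uri], s0) for uri in sample}   # D
    return sorted(distances, key=distances.get)[:k]  # smallest distance first
```

Because the audio features used later in the notebook sit on different scales (for example `tempo` versus `danceability`), scaling them before computing distances is probably worthwhile.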

In [1]:
import json
import os
import time
import warnings

import numpy as np
import pandas as pd
import spotipy
from IPython.display import display
from spotipy.oauth2 import SpotifyClientCredentials as cc

# create list of playlists
playlists = []
path = 'data'
count = 5
for filename in os.listdir(path):
    count -= 1
    if count < 0:
        break
    f = os.path.join(path, filename)
    if os.path.isfile(f):
        # Use a context manager so the file handle is closed after parsing
        with open(f) as fh:
            d = json.load(fh)
        playlists.append(pd.DataFrame(d['playlists']))

playlists = pd.concat(playlists)
playlists = playlists.reset_index(drop=True)
print(playlists.shape)

(5000, 12)


In [2]:
# Add all songs from the playlists into a single list of songs
songs = [] 
for i in range(len(playlists)):
    tracks = playlists.iloc[i].loc['tracks']
    for track in tracks:
        songs.append(track)

songs_df = pd.DataFrame(songs)
print(songs_df.shape)
songs_df.head()

(335094, 8)


Unnamed: 0,pos,artist_name,track_uri,artist_uri,track_name,album_uri,duration_ms,album_name
0,0,Missy Elliott,spotify:track:0UaMYEvWZi0ZqiDOoHU3YI,spotify:artist:2wIVse2owClT7go1WT98tk,Lose Control (feat. Ciara & Fat Man Scoop),spotify:album:6vV5UrXcfyQD1wu4Qo2I9K,226863,The Cookbook
1,1,Britney Spears,spotify:track:6I9VzXrHxO9rA9A5euc8Ak,spotify:artist:26dSoYclwsYLMAKD3tpOr4,Toxic,spotify:album:0z7pVBGOD7HCIB7S8eLkLI,198800,In The Zone
2,2,Beyoncé,spotify:track:0WqIKmW4BTrj3eJFmnCKMv,spotify:artist:6vWDO969PvNqNYHIOW5v0m,Crazy In Love,spotify:album:25hVFAxTlDvXbx2X2QkUkE,235933,Dangerously In Love (Alben für die Ewigkeit)
3,3,Justin Timberlake,spotify:track:1AWQoqb9bSvzTjaLralEkT,spotify:artist:31TPClRtHm23RisEBtV3X7,Rock Your Body,spotify:album:6QPkyl04rXwTGlGlcYaRoW,267266,Justified
4,4,Shaggy,spotify:track:1lzr43nnXAijIGYnCT8M8H,spotify:artist:5EvFsr3kj42KNv97ZEnqij,It Wasn't Me,spotify:album:6NmFmPX56pcLBOFMhIiKvF,227600,Hot Shot


In [3]:
# Remove duplicate songs by URI
songs_df = songs_df.drop_duplicates('track_uri')

In [4]:
# Add a rating placeholder to every song (0 = not yet played)
classification = [0]*len(songs_df)
songs_df.insert(8, "class", classification) # Add "class" after the existing eight columns

In [5]:
# Print song data to ensure the score was added
print(songs_df.iloc[0])

pos                                                     0
artist_name                                 Missy Elliott
track_uri            spotify:track:0UaMYEvWZi0ZqiDOoHU3YI
artist_uri          spotify:artist:2wIVse2owClT7go1WT98tk
track_name     Lose Control (feat. Ciara & Fat Man Scoop)
album_uri            spotify:album:6vV5UrXcfyQD1wu4Qo2I9K
duration_ms                                        226863
album_name                                   The Cookbook
class                                                   0
Name: 0, dtype: object


In [6]:
#CREATE DATASET WITH SONG AUDIO FEATURES FROM THE URIS OF songs_df
# Read the Spotify API credentials from the environment instead of hard-coding them
client_id = os.environ['SPOTIPY_CLIENT_ID']
client_secret = os.environ['SPOTIPY_CLIENT_SECRET']
#https://developer.spotify.com/dashboard/applications/898dd71dc932407e85921f0ac79f0127
client_cc = cc(client_id=client_id,client_secret=client_secret)
sp = spotipy.Spotify(client_credentials_manager=client_cc)


sleep_min = 1
sleep_max = 2
start_time = time.time()
request_count = 0
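feature_batches = []  # one DataFrame per request; DataFrame.append was removed in pandas 2.0
for i in range(0,20):
    request_count+=1
    if request_count % 13 == 0:
        # NOTE: this iteration only sleeps, so no batch is fetched on every 13th pass
        time.sleep(np.random.uniform(sleep_min, sleep_max))
    else:
        # NOTE: the slice start advances by 1 per iteration, so successive 50-song batches overlap
        tracks_to_get = songs_df.iloc[request_count:request_count+50]["track_uri"] #get the uri of these songs to get audio features
        tracks = pd.DataFrame(sp.audio_features(tracks_to_get))
        feature_batches.append(tracks)
tracks_w_features = pd.concat(feature_batches)  # keeps each batch's 0..49 index, as append did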

#remove useless-categorical columns, can use index to get song information later
tracks_w_features = tracks_w_features.drop(columns=['mode','key','time_signature', 'duration_ms','analysis_url','track_href','id','type']).copy()
display(tracks_w_features.head(20))
warnings.filterwarnings("ignore")
warnings.filterwarnings(action='once')

Unnamed: 0,danceability,energy,loudness,speechiness,acousticness,instrumentalness,liveness,valence,tempo,uri
0,0.774,0.838,-3.914,0.114,0.0249,0.025,0.242,0.924,143.04,spotify:track:6I9VzXrHxO9rA9A5euc8Ak
1,0.664,0.758,-6.583,0.21,0.00238,0.0,0.0598,0.701,99.259,spotify:track:0WqIKmW4BTrj3eJFmnCKMv
2,0.892,0.714,-6.055,0.141,0.201,0.000234,0.0521,0.817,100.972,spotify:track:1AWQoqb9bSvzTjaLralEkT
3,0.853,0.606,-4.596,0.0713,0.0561,0.0,0.313,0.654,94.759,spotify:track:1lzr43nnXAijIGYnCT8M8H
4,0.881,0.788,-4.669,0.168,0.0212,0.0,0.0377,0.592,104.997,spotify:track:0XUfyU2QviPAs6bxSpXYG4
5,0.662,0.507,-8.238,0.118,0.257,0.0,0.0465,0.676,86.412,spotify:track:68vgtRHr7iZHpzGpon6Jlo
6,0.57,0.821,-4.38,0.267,0.178,0.0,0.289,0.408,210.857,spotify:track:3BxWKCI06eQ5Od8TY2JBeA
7,0.713,0.678,-3.525,0.102,0.273,0.0,0.149,0.734,138.009,spotify:track:7H6ev70Weq6DdpZyyTmUXk
8,0.727,0.974,-2.261,0.0664,0.103,0.000532,0.174,0.965,79.526,spotify:track:2PpruBYCo4H7WOBJ7Q2EwM
9,0.808,0.97,-6.098,0.0506,0.0569,6.1e-05,0.154,0.868,114.328,spotify:track:2gam98EZKrF9XuOkU13ApN
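
The loop in cell `In [6]` advances its slice by one row per request, so consecutive requests overlap, and the iteration that sleeps fetches nothing. A possible way to request disjoint batches is sketched below. The helper names `batches` and `fetch_all` are hypothetical, and `fetch` is assumed to be any callable that takes a list of URIs and returns a list of results (in the notebook that role would be played by `sp.audio_features`).

```python
import time

def batches(items, size):
    """Yield consecutive, non-overlapping slices of `items`."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def fetch_all(uris, fetch, size=50, pause_every=13, pause_s=1.0):
    """Call fetch(batch) for every batch, pausing after every `pause_every` requests.

    `fetch` is an assumed interface: list of URIs in, list of results out.
    """
    results = []
    for count, batch in enumerate(batches(list(uris), size), start=1):
        results.extend(fetch(batch))
        if count % pause_every == 0:
            time.sleep(pause_s)  # back off without dropping a batch
    return results
```

With the notebook's objects this might be called as `fetch_all(songs_df["track_uri"].iloc[:1000], sp.audio_features)`, with the result passed to `pd.DataFrame`.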


In [10]:
# Search for a particular song 
print("Search for a song name")
s = input()
# Keep the rows whose track name contains the user's text
# (literal substring match, so characters such as "(" or "." are not treated as regex syntax)
results = songs_df[songs_df['track_name'].str.contains(s, regex=False, na=False)]
results

Search for a song name


Unnamed: 0,pos,artist_name,track_uri,artist_uri,track_name,album_uri,duration_ms,album_name,class
2216,10,Sia,spotify:track:6bLopGnirdrilrpdVB6Um1,spotify:artist:5WUlDfRSoLAfcVSX1WnrxN,The Greatest,spotify:album:2eV6DIPDnGl1idcjww6xyX,210920,This Is Acting (Deluxe Version),0
3679,41,Miranda Lambert,spotify:track:6sXK2Fq7Ibi3SN6qYyQgIG,spotify:artist:66lH4jAE7pqPlOlzUKbwA0,Fastest Girl in Town,spotify:album:1Pv6Znx8kGygusP4fqBD7m,197600,Four The Record,0
6240,48,Cultura Profética,spotify:track:69q6JHLOPBZaypN9aExRsa,spotify:artist:65HuWBUC1d8ty1q6J42Nfi,Protesto,spotify:album:6klnfPtwc0fQ8dY0FMZOJP,300721,Canción de Alerta,0
6366,174,Plan B,spotify:track:4Oz2Nk8QmAHCpEY3LFocyg,spotify:artist:2jSGzJw0ebJLu7OLVSOcBP,Si No Le Contesto,spotify:album:5d6nffNXz4ywSsBXQWf0f3,221520,House Of Pleasure,0
8090,8,Busy Signal,spotify:track:2Bz7wNLTVOoeVdjqkMgz2b,spotify:artist:4RfTXjK9aiiIKDaKUHpL57,Protest,spotify:album:08W92azogN98ujQ7dtDvBC,187533,Tropical Connection Riddim,0
...,...,...,...,...,...,...,...,...,...
299933,53,Jeremy Zuckerman,spotify:track:4kuttaimghZdfXE8G3puvR,spotify:artist:1fbkVAnlEb34awQtWEDa6W,Greatest Change,spotify:album:2LOLFpqUHrlQPAiqzLVJBN,315853,The Legend of Korra: Original Music From Book One,0
312096,34,Teddy Pendergrass,spotify:track:24Uu2LvYtrQucgUL69WzMx,spotify:artist:68kACMx6A3D2BYiO056MeQ,"You're My Latest, Greatest Inspiration",spotify:album:6VrgfU55RZBu5EMsfnGDaF,322693,Love's Greatest Hits,0
322619,31,Tiësto,spotify:track:3pe2uKd3pdPRiA3JxgFoXE,spotify:artist:2o5jDhtHVPhrJdv3cEQ99Z,"Work Hard, Play Hard (Paris Fz & Simo T's Cont...",spotify:album:5DTPAaLMFMv9072Fizf9oP,307500,"Work Hard, Play Hard (Paris Fz & Simo T's Cont...",0
328879,1,Kidz Bop Kids,spotify:track:4W63PdwxlreSGoEZg6e7FJ,spotify:artist:1Vvvx45Apu6dQqwuZQxtgW,The Greatest,spotify:album:2tLmEW6OUwHh2cncZOVpQ7,171093,KIDZ BOP 34,0


In [54]:
# Ask the user to enter a track URI for the first song they want to play
print("Select your first song by copying the track_URI from above (i.e. spotify:track:2wSAWEYUHkt92X4SBAPqZE)")
#s = input()
s = "spotify:track:6bLopGnirdrilrpdVB6Um1"

# Look up the chosen song in the feature table; it is rated in the session loop
index = tracks_w_features.index[tracks_w_features['uri'] == s]     # Index labels of the feature rows whose uri equals 's'
# print(index.values[0])
# songs_df.iloc[index.values[0]]

Select your first song by copying the track_URI from above (i.e. spotify:track:2wSAWEYUHkt92X4SBAPqZE)


In [55]:
import max_marg_relevance

display(songs_df['track_uri'])
display(tracks_w_features['uri'])

def getRecommendation(songs_df, liked_song_features):

    liked_songs = pd.DataFrame(songs_df.loc[songs_df['class'] == int(1)])

    # uri_small = liked_song_features['uri']
    # uri_big = liked_songs['track_uri']
    # j = 0
    # songs = []
    # for i in range(0,uri_small.shape[0]):
    #     if uri_small[i] == uri_big.iloc(j):
    #         songs.append(liked_song_features.iloc(j))
    #     j += 1

    #liked_song_features = liked_songs.where(liked_songs['track_uri'] == liked_song_features['uri'])
    # Average only the numeric feature columns (the 'uri' column is a string)
    average_session_songs = liked_song_features.mean(axis=0, numeric_only=True)

    return max_marg_relevance.recommend(average_session_songs, liked_song_features)

continue_ = True
# drop_duplicates() kept the original row labels, so this is an index label, not a row position
current_song_index = songs_df.index[songs_df['track_uri'] == s].tolist()[0]
print(current_song_index)
while continue_:
    # Look up by label with .loc; .iloc would treat the label as a position
    current_song_artist = songs_df.loc[current_song_index, "artist_name"]
    current_song_name = songs_df.loc[current_song_index, "track_name"]
    s = input(current_song_artist + "-" + current_song_name + "\n" + "Skip (y) or listen (n) or quit (q)?")
    if s == "y":
        print("skipped")
        songs_df.at[current_song_index, "class"] = -1 # negative rating
    elif s == "n":
        print("not skipped")
        songs_df.at[current_song_index, "class"] = 1  # positive rating
    elif s == "q":
        continue_ = False
        break  # stop before requesting another recommendation
    else:
        print("Invalid input")
        continue  # ask again about the same song
    # TODO: Get next song from the recommender system
    # (assumes recommend() returns an index label of songs_df)
    current_song_index = getRecommendation(songs_df,tracks_w_features)

0         spotify:track:0UaMYEvWZi0ZqiDOoHU3YI
1         spotify:track:6I9VzXrHxO9rA9A5euc8Ak
2         spotify:track:0WqIKmW4BTrj3eJFmnCKMv
3         spotify:track:1AWQoqb9bSvzTjaLralEkT
4         spotify:track:1lzr43nnXAijIGYnCT8M8H
                          ...                 
335089    spotify:track:2xiEksxtxbaIRESGkZEgMD
335090    spotify:track:7w7eGqJVdNjxGWKw8QAFB6
335091    spotify:track:6G8ehNX2Lv3WSj6oPy5zm7
335092    spotify:track:7bvSgTraYLCccICldMJ4Hz
335093    spotify:track:365jVDUTjmZicBp26AzYMi
Name: track_uri, Length: 107781, dtype: object

0     spotify:track:6I9VzXrHxO9rA9A5euc8Ak
1     spotify:track:0WqIKmW4BTrj3eJFmnCKMv
2     spotify:track:1AWQoqb9bSvzTjaLralEkT
3     spotify:track:1lzr43nnXAijIGYnCT8M8H
4     spotify:track:0XUfyU2QviPAs6bxSpXYG4
                      ...                 
45    spotify:track:1HZ3cUZUw5htSFmah1V8Ko
46    spotify:track:43pmQcgxMMhJu1gPW2ZdNz
47    spotify:track:6Vz7vzOpCwKeSQlfViibuY
48    spotify:track:4sFbojhVXQv7dBC9PVCcRn
49    spotify:track:1dv3ePjze9tPq2pk8eWJdR
Name: uri, Length: 950, dtype: object

2216
not skipped


KeyError: 0