# Spotify Mood Session

## 1. Algorithm

a. **First Selection**. User selects a song $s_0$ from the set of all songs $S$ using some kind of search method (by name / artist) and plays that song.

b. **Played List**.  Let $P$ be a new list of played songs and let $P \gets s_0$   

c. **Cold Start**. Create a bootstrap random selection of $n$ songs from $S$ called $R$. Calculate the distance between each song and the first selection and select the closest to the chosen song . Formally:

$$
\forall s \in R \text{ dist}(s_i, s_0)
$$

*potential parameter: use a value to add the closest n songs to the initial bootstrap played list*

d. **Rating**. For every song that a user plays for more than 30 seconds, the song is rated $\text{pos}$, songs that are skipped within this time period are rated $\text{neg}$.  Songs that have not been played are rated $\text{non}$

e. **Recommending**.  Take the current set of vectors and select the next song based on some sort of agregated mean or nearest neighbour system.

repeat (d) and (e) ad infinitum.

Formally:

$$
\text{let: } s_0 \gets \text{ user}(S) \\
\text{let: } P = \{ s_0 \} \\
\text{let: } R \subset S \\
\text{let: } D = \emptyset \\ 
\text{for } s_i \text{ in } R:\\
D_i = \text{dist}(s_i, s_0) \\
\text{end for }\\
\text{select}(\text{max}(D))\\
$$

In [None]:
import json
import pandas as pd
import os
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials as cc
import numpy as np
import pandas as pd
from IPython.display import display
import warnings
import time

# create list of playlists
playlists = []
path = 'data'
count = 5
for filename in os.listdir(path):
    count -= 1
    if count < 0:
        break
    f = os.path.join(path, filename)
    if os.path.isfile(f):
        d = json.load(open(f))
        playlists.append(pd.DataFrame(d['playlists']))

playlists = pd.concat(playlists)
playlists = playlists.reset_index(drop=True)
print(playlists.shape)

In [None]:
# Add all songs from the playlists into a single list of songs
songs = [] 
for i in range(len(playlists)):
    tracks = playlists.iloc[i].loc['tracks']
    for track in tracks:
        songs.append(track)

songs_df = pd.DataFrame(songs)
print(songs_df.shape)
songs_df.head()

In [None]:
# Remove duplicate songs by URI
songs_df = songs_df.drop_duplicates('track_uri')

In [None]:
# Add score placeholder to the songs
classification = [0]*len(songs_df)
songs_df.insert(8, "class", classification) # Append data frame by one column

In [None]:
# Print song data to ensure the score was added
print(songs_df.iloc[0])

In [None]:
#CREATE DATASET WITH SONG AUDIO FEATURES FROM THE URIS OF songs_df
client_id = '898dd71dc932407e85921f0ac79f0127'
client_secret = '16d994f070064371beb8758f32d64180'
#https://developer.spotify.com/dashboard/applications/898dd71dc932407e85921f0ac79f0127
client_cc = cc(client_id=client_id,client_secret=client_secret)
sp = spotipy.Spotify(client_credentials_manager=client_cc)


sleep_min = 2
sleep_max = 3
start_time = time.time()
request_count = 0
tracks_w_features = pd.DataFrame()
for i in range(0,20):
    request_count+=1
    if request_count % 2 == 0:
        time.sleep(np.random.uniform(sleep_min, sleep_max))
    else:
        tracks_to_get = songs_df.iloc[request_count:request_count+50]["track_uri"] #get the uri of these songs to get audio features
        tracks = pd.DataFrame(sp.audio_features(tracks_to_get))
        tracks_w_features = tracks_w_features.append(tracks)

#remove useless-categorical columns, can use index to get song information later
input_ = tracks_w_features.drop(columns=['mode', 'key', 'time_signature', 'duration_ms', 'analysis_url', 'track_href', 'id', 'type']).copy()
display(input_.head(20))
warnings.filterwarnings("ignore")
warnings.filterwarnings(action='once')

  tracks_w_features = tracks_w_features.append(tracks)
  tracks_w_features = tracks_w_features.append(tracks)
  tracks_w_features = tracks_w_features.append(tracks)
  tracks_w_features = tracks_w_features.append(tracks)
  tracks_w_features = tracks_w_features.append(tracks)
  tracks_w_features = tracks_w_features.append(tracks)


Unnamed: 0,danceability,energy,loudness,speechiness,acousticness,instrumentalness,liveness,valence,tempo,uri
0,0.774,0.838,-3.914,0.114,0.0249,0.025,0.242,0.924,143.04,spotify:track:6I9VzXrHxO9rA9A5euc8Ak
1,0.664,0.758,-6.583,0.21,0.00238,0.0,0.0598,0.701,99.259,spotify:track:0WqIKmW4BTrj3eJFmnCKMv
2,0.892,0.714,-6.055,0.141,0.201,0.000234,0.0521,0.817,100.972,spotify:track:1AWQoqb9bSvzTjaLralEkT
3,0.853,0.606,-4.596,0.0713,0.0561,0.0,0.313,0.654,94.759,spotify:track:1lzr43nnXAijIGYnCT8M8H
4,0.881,0.788,-4.669,0.168,0.0212,0.0,0.0377,0.592,104.997,spotify:track:0XUfyU2QviPAs6bxSpXYG4
5,0.662,0.507,-8.238,0.118,0.257,0.0,0.0465,0.676,86.412,spotify:track:68vgtRHr7iZHpzGpon6Jlo
6,0.57,0.821,-4.38,0.267,0.178,0.0,0.289,0.408,210.857,spotify:track:3BxWKCI06eQ5Od8TY2JBeA
7,0.713,0.678,-3.525,0.102,0.273,0.0,0.149,0.734,138.009,spotify:track:7H6ev70Weq6DdpZyyTmUXk
8,0.727,0.974,-2.261,0.0664,0.103,0.000532,0.174,0.965,79.526,spotify:track:2PpruBYCo4H7WOBJ7Q2EwM
9,0.808,0.97,-6.098,0.0506,0.0569,6.1e-05,0.154,0.868,114.328,spotify:track:2gam98EZKrF9XuOkU13ApN


In [None]:
# Search for a particular song 
print("Search for a song name")
s = input()
pattern = [s]

# filter for rows that contain the partial string inputted by the user
results = songs_df.copy()
results[songs_df.track_name.str.contains('|'.join(pattern))]

Search for a song name


Unnamed: 0,pos,artist_name,track_uri,artist_uri,track_name,album_uri,duration_ms,album_name,class
540,26,Kendrick Lamar,spotify:track:439TlnnznSiBbQbgXiBqAd,spotify:artist:2YZyLoL8N0Wb9xBt1NhZWg,m.A.A.d city,spotify:album:6PBZN8cbwkqm1ERj2BGXJ1,350120,"good kid, m.A.A.d city",0
2784,72,System Of A Down,spotify:track:7MeXwzhSDGfqNfuFANgzVC,spotify:artist:5eAWCfyUhZtHHtBdNk56l1,Toxicity,spotify:album:4DR0GWo7w2GJyQnFVa4jAB,218933,Toxicity,0
32399,29,Maximo Park,spotify:track:1yG1OpMiV8WtDeOYEBvNW8,spotify:artist:048FBwXjFYBWxSggPDipic,Our Velocity,spotify:album:6mDHmXgvy79alh2CwEmMxJ,200386,Our Earthly Pleasures,0
39011,75,Simian Mobile Disco,spotify:track:7A4hUAE33LMKn7r8Lsuyeg,spotify:artist:31DXlldabwPHwu6dYevuzK,Audacity Of Huge,spotify:album:3Utmuuw9ni2zqiBuuB2ZAR,227973,Temporary Pleasure,0
42623,177,Orchestral Manoeuvres In The Dark,spotify:track:60TmG8cFPMBo7kTTFOLiAB,spotify:artist:7wJ9NwdRWtN92NunmXuwBk,Electricity - 2003 Digital Remaster,spotify:album:5AN8aqTrcuMZO7rpD9U2yF,219066,Orchestral Manoeuvres In The Dark,0
50210,3,Front Line Assembly,spotify:track:2tPvXrTDTSGxE9Yv8VOkDZ,spotify:artist:5QYAExYvxixCVwX70SRf7F,Plasticity,spotify:album:2L5LdDSC4z4liUhYI4b8mC,408986,Plasticity,0
51024,14,Kendrick Lamar,spotify:track:5v501hmUMvbHPKa7qm0MGD,spotify:artist:2YZyLoL8N0Wb9xBt1NhZWg,F*ck Your Ethnicity,spotify:album:13WjgUEEAQp0d9JqojlWp1,224671,Section.80,0
55020,15,Elton John,spotify:track:3HEdgvJdbR9ok8mWLBzWch,spotify:artist:3PhoLpVuITZKcymswpck5b,Electricity,spotify:album:2gGHpqMYZMd416OqT26Znh,353626,Billy Elliot: The Original Cast Recording,0
82082,54,Iration,spotify:track:5UOOrcCcs5vX8ZVYqAIxh0,spotify:artist:0q9lPhJHW5R9J7RXIJRbTk,Electricity,spotify:album:0usoUmIYyLaIk9ziGojF8a,241208,Sample This - EP,0
115186,37,Rising Appalachia,spotify:track:2aN3P9gtIWfvg03RicBqeJ,spotify:artist:3I6e2ZqqoxQhXc9z7Tp5ci,Synchronicity,spotify:album:31Zi0Y9LF7yN08gNWgLHrs,248573,Wider Circles,0


In [29]:
# Ask the user to enter a track URI for the first song they want to play
print("Select your first song by copying the track_URI from above (i.e. spotify:track:2wSAWEYUHkt92X4SBAPqZE)")
s = input()

# Mark this song as "Played" with apositive rating (as the user chose it)
index = songs_df.index[songs_df['track_uri'] == s]     # This returns the song with the uri of 's'
# print(index.values[0])
# songs_df.iloc[index.values[0]]


Select your first song by copying the track_URI from above (i.e. spotify:track:2wSAWEYUHkt92X4SBAPqZE)


In [16]:
print(tracks_w_features.head(20))


    danceability  energy  key  loudness  mode  speechiness  acousticness  \
0          0.774   0.838    5    -3.914     0       0.1140      0.024900   
1          0.664   0.758    2    -6.583     0       0.2100      0.002380   
2          0.892   0.714    4    -6.055     0       0.1410      0.201000   
3          0.853   0.606    0    -4.596     1       0.0713      0.056100   
4          0.881   0.788    2    -4.669     1       0.1680      0.021200   
5          0.662   0.507    5    -8.238     1       0.1180      0.257000   
6          0.570   0.821    2    -4.380     1       0.2670      0.178000   
7          0.713   0.678    5    -3.525     0       0.1020      0.273000   
8          0.727   0.974    4    -2.261     0       0.0664      0.103000   
9          0.808   0.970   10    -6.098     0       0.0506      0.056900   
10         0.710   0.553    4    -4.722     0       0.0292      0.002060   
11         0.660   0.666    9    -4.342     1       0.0472      0.075900   
12         0

In [31]:
continue_ = True
current_song_index = songs_df.index[songs_df['track_uri'] == s].tolist()[0]
prev_features = []

while continue_:
    current_song_artist = songs_df.iloc[current_song_index]["artist_name"]
    current_song_name = songs_df.iloc[current_song_index]["track_name"]

    # Compute explanation
    selected_feature = "ERROR" # Default value
    track_uri = songs_df.iloc[current_song_index]["track_uri"]
    print(tracks_w_features.where(tracks_w_features["uri"] == track_uri))
    print("URI" + track_uri)

    explanation_string = "This song was chosen because the %s fits your session best.\n" % selected_feature

    # Add features to previous features
    #prev_features.append(input_.iloc[current_song_index])

    # Compare to average

    s = input(explanation_string + current_song_artist + " - " + current_song_name + "\n" + "Skip (y) or listen (n) or quit (q)?")
    if s == "y":
        print("skipped")
        songs_df.at[current_song_index, "class"] = -1 # negative rating
    elif s == "n":
        print("not skipped")
        songs_df.at[current_song_index, "class"] = 1  # positive rating
    elif s == "q":
        continue_ = False
    else:
        print("Invalid input")
    current_song_index = 0  # TODO: Get next song from the recommender system


    danceability  energy  key  loudness  mode  speechiness  acousticness  \
0            NaN     NaN  NaN       NaN   NaN          NaN           NaN   
1            NaN     NaN  NaN       NaN   NaN          NaN           NaN   
2            NaN     NaN  NaN       NaN   NaN          NaN           NaN   
3            NaN     NaN  NaN       NaN   NaN          NaN           NaN   
4            NaN     NaN  NaN       NaN   NaN          NaN           NaN   
..           ...     ...  ...       ...   ...          ...           ...   
45           NaN     NaN  NaN       NaN   NaN          NaN           NaN   
46           NaN     NaN  NaN       NaN   NaN          NaN           NaN   
47           NaN     NaN  NaN       NaN   NaN          NaN           NaN   
48           NaN     NaN  NaN       NaN   NaN          NaN           NaN   
49           NaN     NaN  NaN       NaN   NaN          NaN           NaN   

    instrumentalness  liveness  valence  tempo type   id  uri track_href  \
0          