# Spotify Recommendation Systems

In this notebook I try a couple of different recommendation systems for music on Spotify.

1. Content-Based
    - Using Spotify's track audio features, I compute the Euclidean distance between songs to find songs that sound similar to each other, then recommend the most similar ones. This takes into account features provided by Spotify such as acousticness, loudness, energy, key, and tempo.
    
2. Collaborative Filtering
    - This version takes into account a user's history. It identifies songs that a user has liked, finds users who have liked the most similar songs, and then recommends songs those users liked that the original user has not liked (presuming the user has not yet heard the song, rather than already disliked it).
    - I wanted to attempt this, but I don't believe it's possible to get data on that many specific users, even anonymized.
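
As a minimal sketch of the collaborative filtering idea (not something run in this notebook, since the user data isn't available), the steps can be tried on a tiny made-up like matrix. The `likes` table, the user and song names, and the `recommend` helper are all hypothetical, and counting shared likes is just one simple choice of similarity.

```python
import pandas as pd

# Hypothetical toy data: rows are users, columns are songs, 1 = liked.
likes = pd.DataFrame(
    [[1, 1, 0, 0],
     [1, 1, 1, 0],
     [0, 0, 1, 1]],
    index=['ann', 'bob', 'cat'],
    columns=['song_a', 'song_b', 'song_c', 'song_d'],
)

def recommend(user, likes, n_neighbors=1):
    """Recommend songs liked by the most similar users but not yet liked by `user`."""
    target = likes.loc[user]
    others = likes.drop(index=user)
    # Similarity = number of liked songs shared with the target user.
    overlap = others.dot(target).sort_values(ascending=False)
    neighbors = overlap.index[:n_neighbors]
    # Count how many neighbors liked each song, then keep only songs
    # the target user has not liked yet.
    votes = likes.loc[neighbors].sum(axis=0)
    votes = votes[(target == 0) & (votes > 0)]
    return votes.sort_values(ascending=False).index.tolist()

print(recommend('ann', likes))
```

Here `bob` shares the most likes with `ann`, so the one song `bob` liked that `ann` has not is what gets recommended.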

In [1]:
import requests
import spotipy

import sys
sys.path.append('../modules/')
import lyrics_grab
import credentials
from spotipy.oauth2 import SpotifyClientCredentials
import pickle
import numpy as np
import pandas as pd

In [2]:
auth_manager = SpotifyClientCredentials(client_id=credentials.spotify_client_id,
                                        client_secret=credentials.spotify_client_secret)
sp = spotipy.Spotify(auth_manager=auth_manager)

In [3]:
with open('../data/metal_raw.pickle','rb') as rf:
    metal_raw = pickle.load(rf)

In [4]:
artists = lyrics_grab.extract_artist_info(metal_raw)

In [5]:
# songs = lyrics_grab.extract_song_info(list(artists.keys()))

In [6]:
# with open('../data/song_info.pickle','wb') as out:
#     pickle.dump(songs,out)

In [7]:
# song_ids = lyrics_grab.get_song_ids(songs)

In [8]:
# audio_features = lyrics_grab.get_audio_features(song_ids)

In [9]:
# with open('../data/audio_features.pickle','wb') as out:
#     pickle.dump(audio_features,out)

In [10]:
with open('../data/song_info.pickle','rb') as rf:
    songs = pickle.load(rf)
    
with open('../data/audio_features.pickle','rb') as rf:
    audio_features = pickle.load(rf)

In [11]:
audio_df = pd.DataFrame(audio_features)
songs_df = pd.DataFrame(songs)

In [12]:
df = songs_df.merge(audio_df, on='id')

In [13]:
from sklearn.metrics.pairwise import pairwise_distances
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

In [14]:
scaler = StandardScaler()

In [15]:
num_data = df.select_dtypes('number')
scaled_data = scaler.fit_transform(num_data)
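
The introduction describes ranking songs by Euclidean distance on the audio features. A minimal sketch of that idea on made-up data is below. The `toy` frame, its column values, and the `most_similar` helper are hypothetical; only `StandardScaler` and `pairwise_distances` are the scikit-learn pieces imported above.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import pairwise_distances
from sklearn.preprocessing import StandardScaler

# Hypothetical toy features; real columns would come from Spotify's audio features.
toy = pd.DataFrame({
    'song_name': ['a', 'b', 'c', 'd'],
    'energy':    [0.90, 0.85, 0.20, 0.30],
    'tempo':     [180.0, 175.0, 90.0, 100.0],
})

def most_similar(frame, song, k=2):
    """Return the k songs closest to `song` by Euclidean distance on scaled features."""
    features = StandardScaler().fit_transform(frame.select_dtypes('number'))
    pos = np.flatnonzero((frame['song_name'] == song).to_numpy())[0]
    # Distances from the query song to every song (one row, n_songs columns).
    dists = pairwise_distances(features[pos:pos + 1], features, metric='euclidean')[0]
    order = np.argsort(dists)
    order = order[order != pos][:k]  # drop the query song itself
    return frame['song_name'].iloc[order].tolist()

print(most_similar(toy, 'a', k=1))
```

Computing distances from one query row at a time avoids building the full song-by-song matrix, which for 48,554 songs would take roughly 19 GB in float64.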

In [38]:
# Row-wise product of the raw numeric features (num_data, not scaled_data)
scores = np.prod(num_data, axis=1)

In [39]:
scores.shape

(48554,)
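
One property of a row-wise product is worth keeping in mind when reading the scores: a single zero-valued feature makes the whole product zero, and large-magnitude or negative columns dominate its size and sign. The small sketch below illustrates this with made-up numbers. The feature names in the comment are assumptions for illustration, not the actual columns of `num_data`.

```python
import numpy as np

# Hypothetical feature rows: [energy, loudness_db, instrumentalness, duration_ms]
rows = np.array([
    [0.9, -5.0, 0.0, 240000.0],   # one feature is exactly zero
    [0.9, -5.0, 0.1, 240000.0],   # same row with a small non-zero value instead
])

products = np.prod(rows, axis=1)
print(products)
```

The first row collapses to zero while the second becomes a large negative number, which may be why a score table built this way contains many exact zeros alongside very large negative values.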

In [40]:
df['scores'] = scores

In [41]:
df = df.sort_values('scores',axis=0).reset_index(drop=True)

In [42]:
ind = df.index[(df.song_name=="Waking the Demon") & (df.artist_name=='Bullet For My Valentine')][0]

In [43]:
def make_clickable(val):
    return f'<a href="{val}">{val}</a>'

In [48]:
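# Take the 50 songs on either side of the target in score order,
# then show the 10 most popular of them.
neighbors = pd.concat([df.iloc[ind-50:ind], df.iloc[ind+1:ind+51]])
(neighbors[['artist_name', 'song_name', 'popularity', 'link']]
 .sort_values('popularity', ascending=False)
 .head(10)
 .style.format({'link': make_clickable}))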

Unnamed: 0,artist_name,song_name,popularity,link
41058,The Amity Affliction,Drag the Lake,61,https://open.spotify.com/track/4vVzmaBsxXuKrJeMSLRF4y
41068,Arch Enemy,War Eternal,58,https://open.spotify.com/track/0WZZENH0kt3O2cBE8q5IRq
41117,Shaman's Harvest,In Chains,55,https://open.spotify.com/track/4PX6YOc2ysHxiBqwNNca2F
41132,Bon Jovi,Limitless,54,https://open.spotify.com/track/71JuDGXgyY7MbmXtldZ4C3
41084,The Cult,Love Removal Machine,53,https://open.spotify.com/track/3w6URtAxoSs2JAw9PvYhbH
41077,Architects,Modern Misery,52,https://open.spotify.com/track/6jaJVPLZOME4TtgW1Es5NT
41045,Avenged Sevenfold,Brompton Cocktail,51,https://open.spotify.com/track/3H2VmsXSSxjIv1UUPEQ30d
41076,3 Doors Down,The Road I'm On,48,https://open.spotify.com/track/2C1m7Lw5cUpFWUkN6Tovik
41094,Silverchair,Straight Lines,46,https://open.spotify.com/track/3gHeBUfAPPUxAJoQ8lsjt2
41063,"Woe, Is Me",Vengeance,43,https://open.spotify.com/track/77o5lCGzfnfVzeEhCgn1BD
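
The windowed lookup used for this table can be wrapped in a small helper. This is a sketch under assumptions: `neighbors_by_rank` and the `toy` frame are hypothetical names, and the clamp on the start position is an addition for targets near the top of the sorted frame, where a negative `iloc` start would otherwise count from the end and return no rows.

```python
import pandas as pd

def neighbors_by_rank(frame, pos, window=50):
    """Rows within `window` positions of `pos` in an already-sorted frame, excluding `pos`."""
    start = max(pos - window, 0)  # a negative start would count from the end
    before = frame.iloc[start:pos]
    after = frame.iloc[pos + 1:pos + 1 + window]
    return pd.concat([before, after])

# Hypothetical toy frame, already sorted by score.
toy = pd.DataFrame({'song_name': list('abcdef'), 'scores': [1, 2, 3, 4, 5, 6]})
print(neighbors_by_rank(toy, pos=1, window=3)['song_name'].tolist())
```

With `pos=1` and `window=3` the helper returns the single row before the target and the three rows after it.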


In [50]:
artist_scores = (df.groupby('artist_name')[['scores', 'popularity']]
                 .mean()
                 .reset_index()
                 .sort_values('scores', ascending=False)
                 .reset_index(drop=True))
artist_scores

Unnamed: 0,artist_name,scores,popularity
0,Isaiah Rashad,3.145670e+02,68.00
1,The Alchemists (double CD),0.000000e+00,7.00
2,Stellar Exodus,0.000000e+00,31.00
3,Sylvia Trouble,0.000000e+00,33.00
4,Sylar 67,0.000000e+00,5.00
...,...,...,...
2564,Týr,-1.174523e+13,46.50
2565,Mudvayne,-1.264509e+13,39.52
2566,Picturesque,-1.272779e+13,28.54
2567,apocalyptical,-1.294045e+13,39.50


In [None]:
# Artists to exclude from the results
excluded_artists = [
    'Rauw Alejandro', 'Jowell & Randy', 'Rachel Platten', 'Corinne Baily Rae',
    "Rag'n'Bone Man", 'Lenin Ramírez', 'Au/Ra', 'Hot Chelle Rae', 'Don Omar',
    'Rae Sremmurd', 'girl in red', 'Corinne Bailey Rae', 'Ray Parker Jr.',
    'Elle King', 'Omarion', 'Leona Lewis', 'Chance the Rapper',
    'Lin-Manuel Miranda', 'White Noise Baby Sleep', 'Ray LaMontagne',
    'Rain Sounds', 'The Weeknd', 'Rain Sounds For Sleep', 'RaeLynn',
    'Carin Leon', 'Rascal Flatts', 'Isaiah Rashad', 'Ray J', 'Céline Dion',
    'Baby Sleep', 'Taylor Ray Holbrook', 'Vancouver Sleep Clinic',
]
artist_scores = artist_scores[~artist_scores.artist_name.isin(excluded_artists)]

In [72]:
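# Use the row position rather than the index label: once rows have been filtered
# out of artist_scores the two can differ, and .iloc below is positional.
ind = np.flatnonzero((artist_scores.artist_name == 'Avenged Sevenfold').to_numpy())[0]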

In [73]:
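# Take the 20 artists on either side of the target in score order and show the
# 10 most popular. artist_scores has no 'link' column, so no link formatting is needed.
neighbors = pd.concat([artist_scores.iloc[ind-20:ind], artist_scores.iloc[ind+1:ind+21]])
neighbors.sort_values('popularity', ascending=False).head(10)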

Unnamed: 0,artist_name,scores,popularity
1874,The Offspring,-25413413016.020702,55.673469
1879,Video Games Live,-26184593146.003838,51.0
1897,I Prevail,-28780295123.79045,50.458333
1878,Poison,-25715911696.92153,46.066667
1904,Killswitch Engage,-29818443549.32227,45.86
1896,Accept,-28740174133.42641,44.315789
1876,Bloodhound Gang,-25632123710.837147,41.4
1901,John Petrucci,-29314746847.200523,39.5
1906,Nile Rodgers,-30801056658.47974,39.25
1883,Blaze and the Monster Machines,-26638353531.304665,39.0


In [74]:
with open('../../lyric-nlp/artist_scores.pickle','wb') as out:
    pickle.dump(artist_scores,out)

In [49]:
with open('../../lyric-nlp/song_scores.pickle','wb') as out:
    pickle.dump(df,out)

In [67]:
pd.options.display.max_rows=None
artist_scores.sort_values('popularity',ascending=False)

Unnamed: 0,artist_name,scores,popularity
869,Peach Tree Rascals,-5634955.0,83.0
416,Radiohead,0.0,82.0
126,The Killers,0.0,74.0
297,New Radicals,0.0,74.0
412,Rain Recordings,0.0,74.0
599,The Supremes,0.0,73.0
892,Kings of Leon,-8191682.0,72.0
1102,Sleeping At Last,-120234600.0,70.6
576,The Proclaimers,0.0,70.0
1684,Coldplay,-10307190000.0,68.846154
