**Table of contents**<a id='toc0_'></a>    
- [🌐 Spotify Recommender System](#toc1_)    
  - [Getting the data](#toc1_1_)    
    - [Connect to the API](#toc1_1_1_)    
    - [Spotify search](#toc1_1_2_)    
    - [Other info:](#toc1_1_3_)    
    - [Embedded track player](#toc1_1_4_)    
    - [Get song information (audio features)](#toc1_1_5_)    
    - [Get album information (audio features of its songs)](#toc1_1_6_)    
    - [Get playlist information](#toc1_1_7_)    
    - [Playlist -> Album -> Songs -> Audio Features](#toc1_1_8_)    
  - [Unsupervised learning (clustering)](#toc1_2_)    
  - [Create the recommendation engine](#toc1_3_)    
- [Acknowledgments](#toc2_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

# <a id='toc1_'></a>[🌐 Spotify Recommender System](#toc0_)

In [None]:
# You know the drill: uncomment to install the spotipy API wrapper
# !pip install spotipy

In [None]:
import numpy as np
import pandas as pd
import random
import warnings
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans
import time
import getpass
from yellowbrick.cluster import KElbowVisualizer

warnings.filterwarnings('ignore')

## <a id='toc1_1_'></a>[Getting the data](#toc0_)

Spotify provides a Web API, and the Python library `spotipy` (ha, get it?) wraps it so we can retrieve song, album, and artist information from Python. Spotify also computes audio features for each track (liveness, instrumentalness, etc.), which are very useful in machine learning applications like the one we'll build today!

Firstly, we will connect to the Spotify API using our credentials:

In [None]:
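# config.py is a local file (keep it out of version control) that defines CLIENT_ID and CLIENT_SECRET
from config import CLIENT_SECRET, CLIENT_ID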

In [None]:
# Alternatively:
# CLIENT_SECRET = getpass.getpass()
# CLIENT_ID = getpass.getpass()
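Credentials can also live in environment variables instead of a config file. As far as we know, `SpotifyClientCredentials()` called without arguments already falls back to `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET`; the hypothetical `load_credentials` helper below reads those same variables and only prompts with `getpass` for whatever is missing:

```python
import os
from getpass import getpass


def load_credentials(id_var="SPOTIPY_CLIENT_ID", secret_var="SPOTIPY_CLIENT_SECRET"):
    """Return (client_id, client_secret).

    Values are read from the environment first; getpass only prompts for
    whichever one is missing, so secrets never have to be typed into the notebook.
    """
    client_id = os.environ.get(id_var) or getpass("Client ID: ")
    client_secret = os.environ.get(secret_var) or getpass("Client secret: ")
    return client_id, client_secret


# Usage (assumes both variables were exported before starting Jupyter):
# CLIENT_ID, CLIENT_SECRET = load_credentials()
```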

### <a id='toc1_1_1_'></a>[Connect to the API](#toc0_)

In [None]:
spotify = spotipy.Spotify(
    client_credentials_manager=SpotifyClientCredentials(
        client_id=CLIENT_ID,
        client_secret=CLIENT_SECRET))

### <a id='toc1_1_2_'></a>[Spotify search](#toc0_)

We can run a search similarly to how we would in the Spotify app:

In [None]:
song = spotify.search(q="Bohemian Rhapsody", limit=3)

In [None]:
song

In [None]:
from pprint import pprint
pprint(song)

Every `spotipy` call returns the API's JSON response already parsed into Python objects, so we can navigate it like nested dictionaries and lists:

In [None]:
song.keys()

In [None]:
song['tracks']

We notice that `song['tracks']` is also a dictionary, so we can repeat the process:

In [None]:
song["tracks"].keys()

This is a paging object with the following keys:
- `href` - a link to the Web API endpoint that returned this page of results
- `items` - the matching tracks themselves
- `limit` - the maximum number of items per page (here, 3)
- `next` - URL of the next page of items (`None` on the last page)
- `offset` - the index of the first item on this page
- `previous` - URL of the previous page of items (`None` on the first page)
- `total` - total number of results available
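These keys are what pagination is built on: a response holds at most `limit` items, and `next` points at the following page until it becomes `None`. A minimal sketch of walking every page, assuming a `get_next` callable that returns the page a `next` URL points to (`collect_all_items` is a hypothetical helper name):

```python
def collect_all_items(first_page, get_next):
    """Gather the items of every page of a paginated Spotify response.

    `first_page` is a paging dict with "items" and "next" keys; `get_next`
    takes a page and returns the following one.
    """
    items = list(first_page["items"])
    page = first_page
    while page.get("next"):  # "next" is None on the last page
        page = get_next(page)
        items.extend(page["items"])
    return items
```

With `spotipy`, `spotify.next(page)` fetches a page's `next` URL, so `collect_all_items(spotify.playlist_items(url), spotify.next)` should work for playlist and album-track pages. Search results wrap the next page one level deeper, under a `"tracks"` key, so they would need an adapter such as `lambda p: spotify.next(p)["tracks"]`.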

`song["tracks"]["items"]` holds the search hits themselves, so its length is the number of results returned:

In [None]:
len(song["tracks"]["items"]) # As we expected, this is equal to 3

We can select the first element and keep inspecting:

In [None]:
song["tracks"]["items"][0].keys()

Now we have many more details about this specific song, including some very relevant ones such as `album`, `artists`, `name`, and `uri`. URIs (Uniform Resource Identifiers) have the form `spotify:<type>:<id>` and uniquely identify tracks, albums, artists, and playlists on Spotify.
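Because a URI is just `spotify:<type>:<id>`, and share links look like `https://open.spotify.com/<type>/<id>?si=...`, the type and ID are easy to pull apart. `spotipy` methods already accept IDs, URIs, and URLs, so this hypothetical `parse_spotify_ref` helper is only meant to make the structure explicit; it assumes the three-part URI form:

```python
from urllib.parse import urlparse


def parse_spotify_ref(ref):
    """Split a Spotify URI or open.spotify.com URL into (type, id)."""
    if ref.startswith("spotify:"):
        # URI form: spotify:<type>:<id>
        _, kind, item_id = ref.split(":")
        return kind, item_id
    # URL form: https://open.spotify.com/<type>/<id>?si=...
    # urlparse keeps the ?si=... query string out of .path;
    # the last two path segments are the type and the ID
    segments = urlparse(ref).path.strip("/").split("/")
    return segments[-2], segments[-1]


print(parse_spotify_ref("spotify:track:4rQYDXfKFikLX4ad674jhg"))
# ('track', '4rQYDXfKFikLX4ad674jhg')
print(parse_spotify_ref("https://open.spotify.com/album/2WT1pbYjLJciAR26yMebkH?si=Iqlrze6XRM6FfZQWfmRq3A"))
# ('album', '2WT1pbYjLJciAR26yMebkH')
```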

In [None]:
song["tracks"]["items"][0]["artists"][0].keys()

Who is the (first) artist of the top search result for Bohemian Rhapsody?

In [None]:
song["tracks"]["items"][0]["artists"][0]["name"]

![](https://media3.giphy.com/media/dhgg2GTU8pv8vmkdiW/giphy.gif?cid=ecf05e47vh8cfhakzo9clp91r1cewyp82u0r9o80g319kfgj&ep=v1_gifs_search&rid=giphy.gif&ct=g)

### <a id='toc1_1_3_'></a>[Other info:](#toc0_)

In [None]:
pprint(song["tracks"]["items"][0]["artists"]) # Track artists
print("")
print("Track ID:", song["tracks"]["items"][0]["id"], "\n") # Track ID
print("Track name:", song["tracks"]["items"][0]["name"], "\n") # Track name
print("Popularity index:", song["tracks"]["items"][0]["popularity"], "\n") # Popularity index
print("Long-form track ID:", song["tracks"]["items"][0]["uri"], "\n") # Spotify URI: spotify:track:<track ID>
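The same fields can be gathered into one flat record per track, which is handy once there are many results. Here is a sketch with a hypothetical `summarize_track` helper. It joins all artist names, since a track can have several, and it uses `.get` for `popularity`, which the simplified track objects returned by the album endpoint do not include:

```python
def summarize_track(track):
    """Pull the fields printed above out of one track object into a flat dict."""
    return {
        "name": track["name"],
        # a track can have several artists; join all of them, not just the first
        "artists": ", ".join(artist["name"] for artist in track["artists"]),
        "id": track["id"],
        "uri": track["uri"],
        # simplified track objects (e.g. from album_tracks) have no popularity
        "popularity": track.get("popularity"),
    }


# Usage: pprint([summarize_track(t) for t in song["tracks"]["items"]])
```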

### <a id='toc1_1_4_'></a>[Embedded track player](#toc0_)

In [None]:
from IPython.display import IFrame

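track_id = '6l8GvAyoUZwWDgF1e4822w'
# The embed URL expects a bare track ID, not a full URI such as 'spotify:track:3hgl7EQwTutSm6PESsB7gZ'
#track_id = '3hgl7EQwTutSm6PESsB7gZ'
IFrame(src="https://open.spotify.com/embed/track/" + track_id,
       width="320",
       height="80",
       # IFrame turns extra keyword arguments into URL query parameters, so
       # iframe attributes go in `extras` (frameborder="0" is already in its template)
       extras=['allowtransparency="true"', 'allow="encrypted-media"'],
      )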

### <a id='toc1_1_5_'></a>[Get song information (audio features)](#toc0_)

Now that we've learnt how to access songs using Spotify's search function, we will extract audio features to build our subsequent clustering model. This time, instead of querying for a specific song, I'm using a link taken directly from Spotify:

In [None]:
song = spotify.track("https://open.spotify.com/track/6YMPu36VGIknb8Ey1ohW3j")

In [None]:
song.keys()

What song is it? :D

In [None]:
# Find out what the song is!
song["name"], song["artists"][0]["name"]

After retrieving the song, I can get its URI to further extract audio features:

In [None]:
# So... what is the URI?
song_uri = song["uri"]

In [None]:
spotify.audio_features(tracks=[song_uri])[0]

Nice! Now it's time to get even more songs :)

### <a id='toc1_1_6_'></a>[Get album information (audio features of its songs)](#toc0_)

We can also extract album information using a direct link:

In [52]:
album = spotify.album_tracks("https://open.spotify.com/album/2WT1pbYjLJciAR26yMebkH?si=Iqlrze6XRM6FfZQWfmRq3A")

and explore the JSON again:

In [53]:
album.keys()

dict_keys(['href', 'items', 'limit', 'next', 'offset', 'previous', 'total'])

In [54]:
# This is the number of songs in the album
len(album["items"])

10

We can explore details about the first song:

In [55]:
album["items"][0].keys()

dict_keys(['artists', 'available_markets', 'disc_number', 'duration_ms', 'explicit', 'external_urls', 'href', 'id', 'name', 'preview_url', 'track_number', 'type', 'uri', 'is_local'])

In [56]:
album["items"][0]["name"]

'Speak To Me - 2011 Remastered Version'

Now we can get the titles of all songs in the album:

In [57]:
for song in album["items"]:
    print(song["name"])

Speak To Me - 2011 Remastered Version
Breathe (In The Air) - 2011 Remastered Version
On The Run - 2011 Remastered Version
Time - 2011 Remastered Version
The Great Gig In The Sky - 2011 Remastered Version
Money - 2011 Remastered Version
Us And Them - 2011 Remastered Version
Any Colour You Like - 2011 Remastered Version
Brain Damage - 2011 Remastered Version
Eclipse - 2011 Remastered Version


We will get the URIs using a list comprehension so we can later extract the audio features:

In [58]:
album_uris = [song["uri"] for song in album["items"]]

In [59]:
album_track_feat = [spotify.audio_features(uri)[0] for uri in album_uris]
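The list comprehension above sends one request per track. The audio-features endpoint accepts up to 100 track IDs per request, and `spotify.audio_features` takes a list of URIs, so the calls can be batched. A sketch with a hypothetical `fetch_in_batches` helper that keeps the results in the same order as the input:

```python
def fetch_in_batches(uris, fetch, batch_size=100):
    """Call `fetch` on consecutive slices of `uris` and concatenate the results.

    The audio-features endpoint accepts up to 100 track IDs per request,
    so 100 is the default slice size.
    """
    results = []
    for start in range(0, len(uris), batch_size):
        results.extend(fetch(uris[start:start + batch_size]))
    return results


# Usage (one request instead of ten for this album):
# album_track_feat = fetch_in_batches(album_uris, spotify.audio_features)
```

Tracks without features should still come back as `None` placeholders in their positions, as with the per-track version.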

In [60]:
len(album_track_feat)

10

In [61]:
pd.DataFrame(album_track_feat)

| | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | type | id | uri | track_href | analysis_url | duration_ms | time_signature |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.382 | 0.0176 | 1 | -34.673 | 1 | 0.0563 | 0.251 | 0.823 | 0.0983 | 0.0315 | 179.327 | audio_features | 4rQYDXfKFikLX4ad674jhg | spotify:track:4rQYDXfKFikLX4ad674jhg | https://api.spotify.com/v1/tracks/4rQYDXfKFikL... | https://api.spotify.com/v1/audio-analysis/4rQY... | 64333 | 3 |
| 1 | 0.429 | 0.343 | 11 | -16.637 | 0 | 0.0339 | 0.265 | 0.184 | 0.157 | 0.313 | 128.455 | audio_features | 3zJRvtQkHQRTNEXSY8jQPR | spotify:track:3zJRvtQkHQRTNEXSY8jQPR | https://api.spotify.com/v1/tracks/3zJRvtQkHQRT... | https://api.spotify.com/v1/audio-analysis/3zJR... | 169560 | 4 |
| 2 | 0.38 | 0.523 | 9 | -22.437 | 1 | 0.0638 | 0.63 | 0.863 | 0.102 | 0.105 | 165.508 | audio_features | 51rylCDfKusBQcpo2iem6u | spotify:track:51rylCDfKusBQcpo2iem6u | https://api.spotify.com/v1/tracks/51rylCDfKusB... | https://api.spotify.com/v1/audio-analysis/51ry... | 216107 | 4 |
| 3 | 0.355 | 0.447 | 9 | -13.631 | 1 | 0.0894 | 0.453 | 0.00225 | 0.163 | 0.369 | 122.473 | audio_features | 4xHWH1jwV5j4mBYRhxPbwZ | spotify:track:4xHWH1jwV5j4mBYRhxPbwZ | https://api.spotify.com/v1/tracks/4xHWH1jwV5j4... | https://api.spotify.com/v1/audio-analysis/4xHW... | 422853 | 4 |
| 4 | 0.274 | 0.195 | 5 | -15.289 | 1 | 0.0339 | 0.753 | 0.739 | 0.0774 | 0.168 | 115.227 | audio_features | 25tZHMv3ctlzqDaHAeuU9c | spotify:track:25tZHMv3ctlzqDaHAeuU9c | https://api.spotify.com/v1/tracks/25tZHMv3ctlz... | https://api.spotify.com/v1/audio-analysis/25tZ... | 284413 | 4 |
| 5 | 0.47 | 0.505 | 11 | -11.966 | 0 | 0.171 | 0.0194 | 0.00104 | 0.206 | 0.767 | 124.065 | audio_features | 7Gx2q0ueNwvDp2BOZYGCMO | spotify:track:7Gx2q0ueNwvDp2BOZYGCMO | https://api.spotify.com/v1/tracks/7Gx2q0ueNwvD... | https://api.spotify.com/v1/audio-analysis/7Gx2... | 380080 | 1 |
| 6 | 0.361 | 0.259 | 2 | -16.279 | 1 | 0.0306 | 0.835 | 0.303 | 0.612 | 0.137 | 72.759 | audio_features | 626wlz3bovvpH06PYht5R0 | spotify:track:626wlz3bovvpH06PYht5R0 | https://api.spotify.com/v1/tracks/626wlz3bovvp... | https://api.spotify.com/v1/audio-analysis/626w... | 472627 | 4 |
| 7 | 0.289 | 0.66 | 0 | -13.985 | 1 | 0.0917 | 0.128 | 0.949 | 0.426 | 0.527 | 150.71 | audio_features | 1wGoqD0vrf7njGvxm8CEf5 | spotify:track:1wGoqD0vrf7njGvxm8CEf5 | https://api.spotify.com/v1/tracks/1wGoqD0vrf7n... | https://api.spotify.com/v1/audio-analysis/1wGo... | 205773 | 4 |
| 8 | 0.319 | 0.179 | 2 | -17.635 | 1 | 0.0315 | 0.0634 | 0.181 | 0.431 | 0.259 | 133.708 | audio_features | 7EUEl5wJb8VI777UAUvRnH | spotify:track:7EUEl5wJb8VI777UAUvRnH | https://api.spotify.com/v1/tracks/7EUEl5wJb8VI... | https://api.spotify.com/v1/audio-analysis/7EUE... | 225827 | 4 |
| 9 | 0.276 | 0.478 | 10 | -14.186 | 1 | 0.041 | 0.0338 | 0.841 | 0.0927 | 0.144 | 135.336 | audio_features | 3Z2RsIdWm4BNbT0LsFBuoN | spotify:track:3Z2RsIdWm4BNbT0LsFBuoN | https://api.spotify.com/v1/tracks/3Z2RsIdWm4BN... | https://api.spotify.com/v1/audio-analysis/3Z2R... | 114093 | 3 |


### <a id='toc1_1_7_'></a>[Get playlist information](#toc0_)

We can apply the same strategy to extract all the songs from a playlist:

In [62]:
list_items = spotify.playlist_items("https://open.spotify.com/playlist/37i9dQZEVXbMDoHDwVN2tF?si=15bc8d87f6bf4560")

In [63]:
list_items.keys()

dict_keys(['href', 'items', 'limit', 'next', 'offset', 'previous', 'total'])

In [64]:
len(list_items["items"])

50

In [65]:
list_items["items"][0].keys()

dict_keys(['added_at', 'added_by', 'is_local', 'primary_color', 'track', 'video_thumbnail'])

In [66]:
list_items["items"][0]["track"].keys()

dict_keys(['preview_url', 'available_markets', 'explicit', 'type', 'episode', 'track', 'album', 'artists', 'disc_number', 'track_number', 'duration_ms', 'external_ids', 'external_urls', 'href', 'id', 'name', 'popularity', 'uri', 'is_local'])
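The `episode` and `track` keys in this output reflect that playlist items can be podcast episodes as well as songs, and an item's `track` can also be `None` (for example, when the entry is no longer available). A defensive filter, as a sketch assuming those are the only two cases worth skipping (`playlist_tracks` is a hypothetical helper):

```python
def playlist_tracks(items):
    """Keep only real tracks from a list of playlist items.

    Items whose "track" is None and podcast episodes (type "episode") are skipped.
    """
    return [
        item["track"]
        for item in items
        if item.get("track") is not None and item["track"].get("type") == "track"
    ]


# Usage: tracks = playlist_tracks(list_items["items"])
```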

In [67]:
list_items["items"][0]["track"]["name"]

'Die With A Smile'

In [68]:
list_items["items"][0]["track"]["album"].keys()

dict_keys(['available_markets', 'type', 'album_type', 'href', 'id', 'images', 'name', 'release_date', 'release_date_precision', 'uri', 'artists', 'external_urls', 'total_tracks'])

In [69]:
list_items["items"][0]["track"]["album"]["uri"]

'spotify:album:10FLjwfpbxLmW8c25Xyc2N'

### <a id='toc1_1_8_'></a>[Playlist -> Album -> Songs -> Audio Features](#toc0_)

Now we will combine all the previous steps to build a music dataset: we take every track in a playlist, fetch all the songs on each of those tracks' albums, and collect the audio features of every song we gathered into a table we can use later on:

In [70]:
list_items = spotify.playlist_items("https://open.spotify.com/playlist/37i9dQZEVXbMDoHDwVN2tF?si=5f944fd835e14197")

In [71]:
list_items["items"][0].keys()

dict_keys(['added_at', 'added_by', 'is_local', 'primary_color', 'track', 'video_thumbnail'])

In [72]:
for item in list_items["items"]:
    print(item["track"]["album"]["name"])

Die With A Smile
APT.
HIT ME HARD AND SOFT
MUSE
Tu Boda
The Secret of Us (Deluxe)
Short n' Sweet
Short n' Sweet
Sailor Song
Good Luck, Babe!
Si Antes Te Hubiera Conocido
HIT ME HARD AND SOFT
Fireworks & Rollerblades
Timeless
Short n' Sweet
I've Tried Everything But Therapy (Part 1)
24K Magic
+57
INCÓMODO
CHROMAKOPIA
Unorthodox Jukebox
I Love You.
The Secret of Us
INCÓMODO
Where I've Been, Isn't Where I'm Going
Moonlit Floor (Kiss Me)
Parachutes
AM
Short n' Sweet
Barbie
São Paulo
PRIMERA MUSA
LA PANTERA NEGRA (DELUXE)
The Idol Episode 4 (Music from the HBO Original Series)
Doo-Wops & Hooligans
Unreal Unearth: Unaired
UTOPIA
Mantra
PERO NO TE ENAMORES
Presidente
The Emptiness Machine
Strange Trails
eternal sunshine
DECIDE
F-1 Trillion
EL COMIENZO
yustyna
Merry Christmas
Dizzy up the Girl
The Rise and Fall of a Midwest Princess


From the playlist items, we can get each track's album URI:

In [73]:
album_uris = [item["track"]["album"]["uri"] for item in list_items["items"]]
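Several tracks in this playlist come from the same album (*Short n' Sweet* appears four times in the album list above), so `album_uris` contains repeats and those albums' songs get fetched and counted more than once, as the repeated *Short n' Sweet* titles in the track list further down show. If that isn't intended, here is a sketch that removes duplicate URIs while keeping their original order (`unique_in_order` is a hypothetical helper):

```python
def unique_in_order(values):
    """Drop repeated values while keeping the order of first appearance."""
    # dict keys are unique and (since Python 3.7) preserve insertion order
    return list(dict.fromkeys(values))


# Usage: album_uris = unique_in_order(album_uris)
```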

Then, with the album URIs, we can get all the songs:

In [74]:
albums = [spotify.album_tracks(uri) for uri in album_uris]

In [75]:
# I can check all the songs my dataset will have and count them
count = 0
for album in albums:
    for song in album["items"]:
        count += 1
        print(song["name"])

Die With A Smile
APT.
SKINNY
LUNCH
CHIHIRO
BIRDS OF A FEATHER
WILDFLOWER
THE GREATEST
L’AMOUR DE MA VIE
THE DINER
BITTERSUITE
BLUE
Rebirth (Intro)
Interlude : Showtime
Smeraldo Garden Marching Band (feat. Loco)
Slow Dance (feat. Sofia Carson)
Be Mine
Who
Closer Than This
Tu Boda
Felt Good About You
Risk
Blowing Smoke
I Love You, I'm Sorry
us. (feat. Taylor Swift)
Let It Happen
Tough Love
I Knew It, I Know You
Gave You I Gave You I
Normal Thing
Good Luck Charlie
Free Now
Close To You
Cool
That’s So True
I Told You Things
Packing It Up
I Love You, I'm Sorry - Live From Vevo
I Knew It, I Know You - Live From Vevo
Free Now - Live From Vevo
Taste
Please Please Please
Good Graces
Sharpest Tool
Coincidence
Bed Chem
Espresso
Dumb & Poetic
Slim Pickins
Juno
Lie To Girls
Don’t Smile
Taste
Please Please Please
Good Graces
Sharpest Tool
Coincidence
Bed Chem
Espresso
Dumb & Poetic
Slim Pickins
Juno
Lie To Girls
Don’t Smile
Sailor Song
Good Luck, Babe!
Si Antes Te Hubiera Conocido
SKINNY
LUNCH
CHIHI

In [76]:
count # How many songs did we get?

502
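The count can also be taken without printing every title, and comparing it with the number of distinct URIs shows how many entries are duplicates coming from repeated albums. A sketch (`count_tracks` is a hypothetical helper):

```python
def count_tracks(albums):
    """Return (total, distinct) track counts over a list of album_tracks() pages."""
    uris = [song["uri"] for album in albums for song in album["items"]]
    return len(uris), len(set(uris))


# Usage: total, distinct = count_tracks(albums)
```

For the `albums` list above, `total` should match the 502 counted by the loop.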

Now we can get all the song URIs to later extract the audio features:

In [77]:
song_uris = [song["uri"] for album in albums for song in album["items"]]

In [78]:
len(song_uris)

502

In [79]:
songs_feat = [spotify.audio_features(uri)[0] for uri in song_uris]

In [80]:
len(songs_feat)

502

In [81]:
songs_feat[0]

{'danceability': 0.521,
 'energy': 0.592,
 'key': 6,
 'loudness': -7.777,
 'mode': 0,
 'speechiness': 0.0304,
 'acousticness': 0.308,
 'instrumentalness': 0,
 'liveness': 0.122,
 'valence': 0.535,
 'tempo': 157.969,
 'type': 'audio_features',
 'id': '2plbrEY59IikOBgBGLjaoe',
 'uri': 'spotify:track:2plbrEY59IikOBgBGLjaoe',
 'track_href': 'https://api.spotify.com/v1/tracks/2plbrEY59IikOBgBGLjaoe',
 'analysis_url': 'https://api.spotify.com/v1/audio-analysis/2plbrEY59IikOBgBGLjaoe',
 'duration_ms': 251668,
 'time_signature': 3}

`audio_features` returns `None` for songs it has no features for, so we remove those entries:

In [82]:
while None in songs_feat:
    songs_feat.remove(None)
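Removing `None` from `songs_feat` on its own shifts every later entry, so position *i* in `songs_feat` no longer matches position *i* in `song_uris` (or in any list of song names built from the same albums). When the lists need to stay paired, they can be filtered together. A sketch with a hypothetical `drop_missing` helper:

```python
def drop_missing(uris, features):
    """Remove songs with no audio features from both lists at once,
    so that uris[i] still belongs to features[i] afterwards."""
    kept = [(uri, feat) for uri, feat in zip(uris, features) if feat is not None]
    return [uri for uri, _ in kept], [feat for _, feat in kept]


# Usage: song_uris, songs_feat = drop_missing(song_uris, songs_feat)
```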

In [83]:
songs_feat_df = pd.DataFrame(songs_feat)

We can wrap all of the previous steps in a function to more easily extract audio features from a given playlist:

In [None]:
def get_features_from_playlist(url):
    """Return name, first artist and audio features for every song on the
    albums of the tracks in a playlist."""
    list_items = spotify.playlist_items(url)
    album_uris = [item["track"]["album"]["uri"] for item in list_items["items"]]
    albums = [spotify.album_tracks(uri) for uri in album_uris]
    songs = [song for album in albums for song in album["items"]]

    song_name = [song["name"] for song in songs]
    song_artist = [song["artists"][0]["name"] for song in songs]
    song_feat = [spotify.audio_features(song["uri"])[0] for song in songs]

    # Keep one row per song so names, artists and features stay aligned:
    # songs without features (None) become all-NaN rows, removed later with dropna()
    song_feat = [feat if feat is not None else {} for feat in song_feat]

    name_df = pd.DataFrame({"name": song_name})
    artist_df = pd.DataFrame({"artist": song_artist})
    feat_df = pd.DataFrame(song_feat)

    return pd.concat([name_df, artist_df, feat_df], axis=1)

Let's test it:

In [85]:
my_df = get_features_from_playlist("https://open.spotify.com/playlist/37i9dQZEVXbMDoHDwVN2tF?si=94f82b9354d2421b")

Review dataframe characteristics:

In [86]:
my_df.shape

(502, 20)

In [87]:
my_df.head()

| | name | artist | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | type | id | uri | track_href | analysis_url | duration_ms | time_signature |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Die With A Smile | Lady Gaga | 0.521 | 0.592 | 6 | -7.777 | 0 | 0.0304 | 0.308 | 0.0 | 0.122 | 0.535 | 157.969 | audio_features | 2plbrEY59IikOBgBGLjaoe | spotify:track:2plbrEY59IikOBgBGLjaoe | https://api.spotify.com/v1/tracks/2plbrEY59Iik... | https://api.spotify.com/v1/audio-analysis/2plb... | 251668 | 3 |
| 1 | APT. | ROSÉ | 0.777 | 0.783 | 0 | -4.477 | 0 | 0.26 | 0.0283 | 0.0 | 0.355 | 0.939 | 149.027 | audio_features | 5vNRhkKd0yEAg8suGBpjeY | spotify:track:5vNRhkKd0yEAg8suGBpjeY | https://api.spotify.com/v1/tracks/5vNRhkKd0yEA... | https://api.spotify.com/v1/audio-analysis/5vNR... | 169917 | 4 |
| 2 | SKINNY | Billie Eilish | 0.251 | 0.252 | 9 | -14.478 | 1 | 0.0375 | 0.693 | 0.00706 | 0.0968 | 0.0395 | 69.988 | audio_features | 1CsMKhwEmNnmvHUuO5nryA | spotify:track:1CsMKhwEmNnmvHUuO5nryA | https://api.spotify.com/v1/tracks/1CsMKhwEmNnm... | https://api.spotify.com/v1/audio-analysis/1CsM... | 219733 | 4 |
| 3 | LUNCH | Billie Eilish | 0.893 | 0.4 | 11 | -7.981 | 0 | 0.0643 | 0.0452 | 0.0823 | 0.0632 | 0.945 | 124.987 | audio_features | 629DixmZGHc7ILtEntuiWE | spotify:track:629DixmZGHc7ILtEntuiWE | https://api.spotify.com/v1/tracks/629DixmZGHc7... | https://api.spotify.com/v1/audio-analysis/629D... | 179587 | 4 |
| 4 | CHIHIRO | Billie Eilish | 0.7 | 0.425 | 7 | -12.531 | 1 | 0.0529 | 0.144 | 0.879 | 0.083 | 0.521 | 110.015 | audio_features | 7BRD7x5pt8Lqa1eGYC4dzj | spotify:track:7BRD7x5pt8Lqa1eGYC4dzj | https://api.spotify.com/v1/tracks/7BRD7x5pt8Lq... | https://api.spotify.com/v1/audio-analysis/7BRD... | 303440 | 4 |


In [88]:
my_df.isna().sum()

name                0
artist              0
danceability        0
energy              0
key                 0
loudness            0
mode                0
speechiness         0
acousticness        0
instrumentalness    0
liveness            0
valence             0
tempo               0
type                0
id                  0
uri                 0
track_href          0
analysis_url        0
duration_ms         0
time_signature      0
dtype: int64

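We drop any rows with missing values, i.e. songs without audio features (the counts above show there are none in this run, but the step is cheap insurance):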

In [89]:
my_df = my_df.dropna()

In [90]:
my_df.dtypes

name                 object
artist               object
danceability        float64
energy              float64
key                   int64
loudness            float64
mode                  int64
speechiness         float64
acousticness        float64
instrumentalness    float64
liveness            float64
valence             float64
tempo               float64
type                 object
id                   object
uri                  object
track_href           object
analysis_url         object
duration_ms           int64
time_signature        int64
dtype: object

For our model, we only require the audio features, which are numeric, so we can filter out the rest:

In [91]:
my_df_num = my_df.select_dtypes(include=np.number)

In [92]:
my_df_num.head()

| | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | duration_ms | time_signature |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.521 | 0.592 | 6 | -7.777 | 0 | 0.0304 | 0.308 | 0.0 | 0.122 | 0.535 | 157.969 | 251668 | 3 |
| 1 | 0.777 | 0.783 | 0 | -4.477 | 0 | 0.26 | 0.0283 | 0.0 | 0.355 | 0.939 | 149.027 | 169917 | 4 |
| 2 | 0.251 | 0.252 | 9 | -14.478 | 1 | 0.0375 | 0.693 | 0.00706 | 0.0968 | 0.0395 | 69.988 | 219733 | 4 |
| 3 | 0.893 | 0.4 | 11 | -7.981 | 0 | 0.0643 | 0.0452 | 0.0823 | 0.0632 | 0.945 | 124.987 | 179587 | 4 |
| 4 | 0.7 | 0.425 | 7 | -12.531 | 1 | 0.0529 | 0.144 | 0.879 | 0.083 | 0.521 | 110.015 | 303440 | 4 |


`duration_ms` and `time_signature` are not very informative for grouping songs by how they sound, so we drop them:

In [93]:
my_df_num = my_df_num.drop(columns = ["duration_ms", "time_signature"])
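The remaining columns still sit on very different scales: tempo runs from about 70 to 160 BPM in the rows above, loudness is negative dB, and most other features lie between 0 and 1. A distance-based method such as the `KMeans` imported at the top would therefore be dominated by tempo unless the columns are rescaled, which is presumably what the imported `MinMaxScaler` is for. It applies x' = (x - min) / (max - min) to each column; here is a plain-Python sketch of that formula on one column (`min_max_scale` is a hypothetical helper):

```python
def min_max_scale(values):
    """Rescale a list of numbers to [0, 1] with (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Tempo values of the five rows shown above:
# the fastest song maps to 1.0 and the slowest to 0.0
print(min_max_scale([157.969, 149.027, 69.988, 124.987, 110.015]))
```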

# <a id='toc2_'></a>[Acknowledgments](#toc0_)

Thank you, Miguel SM, for the contents of this lesson!