## Lab | API wrappers - Create your collection of songs & audio features

#### Instructions

To move forward with the project, you need to create a collection of songs with their audio features - as large as possible!

These are the songs that we will cluster. And, later, when the user inputs a song, we will find the cluster to which the song belongs and recommend a song from the same cluster. The more songs you have, the more accurate and diverse recommendations you'll be able to give. Although... you might want to make sure the collected songs are "curated" in a certain way. Try to find playlists of songs that are diverse, but also that meet certain standards.

The process of sending hundreds or thousands of requests can take some time - it's normal if you have to wait a few minutes (or, if you're ambitious, even hours) to get all the data you need.

An idea for collecting as many songs as possible is to start with all the songs of a big, diverse playlist and then go to every artist present in the playlist and grab every song of every album of that artist. The amount of songs you'll be collecting per playlist will grow exponentially!

In [39]:
!pip3 install spotipy
!pip3 

Defaulting to user installation because normal site-packages is not writeable
You should consider upgrading via the '/Library/Frameworks/Python.framework/Versions/3.9/bin/python3.9 -m pip install --upgrade pip' command.[0m
ERROR: unknown command "yellowbrick"


In [41]:
import spotipy
import pandas as pd
from spotipy.oauth2 import SpotifyClientCredentials
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import pairwise_distances_argmin_min
from sklearn.pipeline import Pipeline

In [21]:
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id="183e423da2a54151a0129f003513a648",
                                                           client_secret="1a581c20e6d54804bd4e58b89da9c289"))
results = sp.search(q='artist:Ters', limit=1, type='track')
results

{'tracks': {'href': 'https://api.spotify.com/v1/search?query=artist%3ATers&type=track&offset=0&limit=1',
  'items': [{'album': {'album_type': 'single',
     'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/5Mf1s6zvBdwT3ZmEfWVovB'},
       'href': 'https://api.spotify.com/v1/artists/5Mf1s6zvBdwT3ZmEfWVovB',
       'id': '5Mf1s6zvBdwT3ZmEfWVovB',
       'name': 'Ters',
       'type': 'artist',
       'uri': 'spotify:artist:5Mf1s6zvBdwT3ZmEfWVovB'}],
     'available_markets': ['AD',
      'AE',
      'AG',
      'AL',
      'AM',
      'AO',
      'AR',
      'AT',
      'AU',
      'AZ',
      'BA',
      'BB',
      'BD',
      'BE',
      'BF',
      'BG',
      'BH',
      'BI',
      'BJ',
      'BN',
      'BO',
      'BR',
      'BS',
      'BT',
      'BW',
      'BY',
      'BZ',
      'CA',
      'CD',
      'CG',
      'CH',
      'CI',
      'CL',
      'CM',
      'CO',
      'CR',
      'CV',
      'CW',
      'CY',
      'CZ',
      'DE',
      'DJ

In [22]:
playlist = sp.user_playlist_tracks("spotify", "37i9dQZF1DWZ5kgu17cbcC")
playlist

{'href': 'https://api.spotify.com/v1/playlists/37i9dQZF1DWZ5kgu17cbcC/tracks?offset=0&limit=100&additional_types=track',
 'items': [{'added_at': '2020-02-12T05:01:00Z',
   'added_by': {'external_urls': {'spotify': 'https://open.spotify.com/user/'},
    'href': 'https://api.spotify.com/v1/users/',
    'id': '',
    'type': 'user',
    'uri': 'spotify:user:'},
   'is_local': False,
   'primary_color': None,
   'track': {'album': {'album_type': 'album',
     'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/2IJJSEnYTWNaL8Ezk6yzAE'},
       'href': 'https://api.spotify.com/v1/artists/2IJJSEnYTWNaL8Ezk6yzAE',
       'id': '2IJJSEnYTWNaL8Ezk6yzAE',
       'name': "Satan's Rats",
       'type': 'artist',
       'uri': 'spotify:artist:2IJJSEnYTWNaL8Ezk6yzAE'}],
     'available_markets': ['AD',
      'AE',
      'AG',
      'AL',
      'AM',
      'AO',
      'AR',
      'AT',
      'AU',
      'AZ',
      'BA',
      'BB',
      'BD',
      'BE',
      'BF',
      'BG'

In [23]:
playlist["total"]

129

In [24]:
def get_playlist_tracks(username, playlist_id):
    
    results = sp.user_playlist_tracks(username, playlist_id)
    tracks = results['items']
    
    while results['next']:
        results = sp.next(results)
        tracks.extend(results['items'])
    
    return tracks

tracks = get_playlist_tracks("spotify", "37i9dQZF1DWZ5kgu17cbcC")
tracks

[{'added_at': '2020-02-12T05:01:00Z',
  'added_by': {'external_urls': {'spotify': 'https://open.spotify.com/user/'},
   'href': 'https://api.spotify.com/v1/users/',
   'id': '',
   'type': 'user',
   'uri': 'spotify:user:'},
  'is_local': False,
  'primary_color': None,
  'track': {'album': {'album_type': 'album',
    'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/2IJJSEnYTWNaL8Ezk6yzAE'},
      'href': 'https://api.spotify.com/v1/artists/2IJJSEnYTWNaL8Ezk6yzAE',
      'id': '2IJJSEnYTWNaL8Ezk6yzAE',
      'name': "Satan's Rats",
      'type': 'artist',
      'uri': 'spotify:artist:2IJJSEnYTWNaL8Ezk6yzAE'}],
    'available_markets': ['AD',
     'AE',
     'AG',
     'AL',
     'AM',
     'AO',
     'AR',
     'AT',
     'AU',
     'AZ',
     'BA',
     'BB',
     'BD',
     'BE',
     'BF',
     'BG',
     'BH',
     'BI',
     'BJ',
     'BN',
     'BO',
     'BR',
     'BS',
     'BT',
     'BW',
     'BY',
     'BZ',
     'CA',
     'CD',
     'CG',
     

In [25]:
len(tracks)

129

In [26]:
def get_artists_from_playlist(playlist_id):
    
    tracks_from_playlist = get_playlist_tracks("spotify", playlist_id)
    
    artists = []
    
    for track in tracks_from_playlist:
        artists_info = track['track']['artists']
        
        for artist_info in artists_info:
            artists.append(artist_info['name'])
    
    return list(set(artists))

In [27]:
def get_artists_ids_from_playlist(playlist_id):
    
    tracks_from_playlist = get_playlist_tracks("spotify", playlist_id)
    
    artists_ids = []
    
    for track in tracks_from_playlist:
        artists_info = track['track']['artists']
        
        for artist_info in artists_info:
            artists_ids.append(artist_info['id'])
            
    return list(set(artists_ids))

In [28]:
artists_high = get_artists_from_playlist("37i9dQZF1DWZ5kgu17cbcC")
artists_high

['Radio Stars',
 "Sinéad O'Connor",
 'The Golden Filter',
 'Omar Apollo',
 "Satan's Rats",
 'David Bowie',
 'Kaleta & Super Yamba Band',
 'Way Yes',
 'Holy Ghost!',
 'The Men',
 'Tim and Foty',
 'Raekwon',
 'Ann Peebles',
 'El Freaky',
 'John Moods',
 'Sun Ra',
 'Los Tones',
 'Rikki Ililonga',
 'Durand Jones & The Indications',
 'TEYMORI',
 'Otis Brown',
 'Piero Umiliani',
 'Grapetooth',
 'Mathis Hunter',
 'Sonny Rollins',
 'Yvonne Fair',
 'Silk Rhodes',
 'Dexter Lee Moore',
 'Nick Drake',
 'Lea Lea',
 'Bad Brains',
 'Empire ISIS',
 'WITCH',
 'Stiff Little Fingers',
 'Nina Simone',
 'Ptaf',
 'Dillinger',
 'Janet Jackson',
 'De Lux',
 'U-God',
 'Serguei',
 'Impala Syndrome',
 'Manzanita y su Conjunto',
 'Talking Heads',
 'The Roots',
 'Marvin Gaye',
 'Sylvester',
 'Blondie',
 'Mavis John',
 'Ghostface Killah',
 'MUNYA',
 'Heavy Trash',
 'Ras Jahonnan',
 'The Makers',
 'Denise Darlington',
 'Ebo Taylor',
 'Ike & Tina Turner',
 'Stevie Wonder',
 'Ronnie Taylor',
 'The Notorious B.I.G.',
 

In [29]:
artist_O = get_artists_from_playlist('2X02jK3CNuNmB8xg7V01te')
artist_O

['Wilson Pickett',
 "Sinéad O'Connor",
 'Journey',
 'Sixpence None The Richer',
 "Des'ree",
 'LeAnn Rimes',
 'Barry Manilow',
 'Katrina & The Waves',
 '4 Non Blondes',
 'Rick Astley',
 'Lou Reed',
 'Donna Summer',
 'The Temptations',
 'Eagles',
 'Depeche Mode',
 'Foreigner',
 'David Bowie',
 'Survivor',
 'Mc Miker "G" & Deejay Sven',
 'Roy Orbison',
 'Gabrielle',
 'Etta James',
 'Evanescence',
 'America',
 'Tammi Terrell',
 'blink-182',
 'Creedence Clearwater Revival',
 'UB40',
 'Dolly Parton',
 'Björn Skifs',
 'The Police',
 'Heart',
 'Bobby McFerrin',
 'Christopher Cross',
 'Otis Redding',
 'Neil Diamond',
 'The Spencer Davis Group',
 'Percy Sledge',
 'Daryl Hall & John Oates',
 'Cheryl Lynn',
 'Gwen Stefani',
 'The Rembrandts',
 'Backstreet Boys',
 'Elton John',
 'Tracy Chapman',
 'Donny Hathaway',
 'Jimmy Cliff',
 'Ronan Keating',
 'Commodores',
 'Barry White',
 'Texas',
 'Elvin Bishop',
 'Sam & Dave',
 'Chicago',
 'Jack Joseph Puig',
 'MC Hammer',
 'The Righteous Brothers',
 'Solo

In [30]:
for i in artists_high:
    artist_O.append(i)
    
artist_O

['Wilson Pickett',
 "Sinéad O'Connor",
 'Journey',
 'Sixpence None The Richer',
 "Des'ree",
 'LeAnn Rimes',
 'Barry Manilow',
 'Katrina & The Waves',
 '4 Non Blondes',
 'Rick Astley',
 'Lou Reed',
 'Donna Summer',
 'The Temptations',
 'Eagles',
 'Depeche Mode',
 'Foreigner',
 'David Bowie',
 'Survivor',
 'Mc Miker "G" & Deejay Sven',
 'Roy Orbison',
 'Gabrielle',
 'Etta James',
 'Evanescence',
 'America',
 'Tammi Terrell',
 'blink-182',
 'Creedence Clearwater Revival',
 'UB40',
 'Dolly Parton',
 'Björn Skifs',
 'The Police',
 'Heart',
 'Bobby McFerrin',
 'Christopher Cross',
 'Otis Redding',
 'Neil Diamond',
 'The Spencer Davis Group',
 'Percy Sledge',
 'Daryl Hall & John Oates',
 'Cheryl Lynn',
 'Gwen Stefani',
 'The Rembrandts',
 'Backstreet Boys',
 'Elton John',
 'Tracy Chapman',
 'Donny Hathaway',
 'Jimmy Cliff',
 'Ronan Keating',
 'Commodores',
 'Barry White',
 'Texas',
 'Elvin Bishop',
 'Sam & Dave',
 'Chicago',
 'Jack Joseph Puig',
 'MC Hammer',
 'The Righteous Brothers',
 'Solo

In [31]:
artists = artist_O

In [32]:
def get_audio_features(artist):
    # get tracks from artist
    results = sp.search(q=f'artist:{artist}', limit=50)
    # extract the track ids
    track_ids = [track['id'] for track in results['tracks']['items']]
    song_names = [track['name'] for track in results['tracks']['items']]
    # extract the audio features
    audio_features = sp.audio_features(track_ids)
    # store audio features in a dataframe
    df = pd.DataFrame(audio_features)
    df['artist'] = artist
    df['song name'] = song_names
    return df

In [33]:
df = pd.DataFrame()

for artist in artists:
    try:
        print (artist)
        df_artist = get_audio_features(artist)
        df = pd.concat([df, df_artist])
    except:
        continue
df = df.reset_index(drop=True)

df

Wilson Pickett
Sinéad O'Connor
Journey
Sixpence None The Richer
Des'ree
LeAnn Rimes
Barry Manilow
Katrina & The Waves
4 Non Blondes
Rick Astley
Lou Reed
Donna Summer
The Temptations
Eagles
Depeche Mode
Foreigner
David Bowie
Survivor
Mc Miker "G" & Deejay Sven
Roy Orbison
Gabrielle
Etta James
Evanescence
America
Tammi Terrell
blink-182
Creedence Clearwater Revival
UB40
Dolly Parton
Björn Skifs
The Police
Heart
Bobby McFerrin
Christopher Cross
Otis Redding
Neil Diamond
The Spencer Davis Group
Percy Sledge
Daryl Hall & John Oates
Cheryl Lynn
Gwen Stefani
The Rembrandts
Backstreet Boys
Elton John
Tracy Chapman
Donny Hathaway
Jimmy Cliff
Ronan Keating
Commodores
Barry White
Texas
Elvin Bishop
Sam & Dave
Chicago
Jack Joseph Puig
MC Hammer
The Righteous Brothers
Solomon Burke
Van Morrison
Harry Nilsson
Paul Simon
The Isley Brothers
Cher
Whitney Houston
New Radicals
Nina Simone
3 Doors Down
Inner Circle
Billy Paul
Curtis Mayfield
Eminem
Scorpions
Bap Kennedy
Billy Joel
Akon
Boney M.
Four Tops


Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature,artist,song name
0,0.725,0.327,0,-17.714,1,0.0387,0.1130,0.326000,0.214,0.688,110.438,audio_features,5Sz09kaSzvpTC8lgm5W8Mt,spotify:track:5Sz09kaSzvpTC8lgm5W8Mt,https://api.spotify.com/v1/tracks/5Sz09kaSzvpT...,https://api.spotify.com/v1/audio-analysis/5Sz0...,187760,4,Wilson Pickett,Mustang Sally
1,0.618,0.588,2,-11.624,1,0.0735,0.0128,0.026600,0.351,0.768,86.903,audio_features,76ICmoJ4PcoMWoooaTxnQs,spotify:track:76ICmoJ4PcoMWoooaTxnQs,https://api.spotify.com/v1/tracks/76ICmoJ4PcoM...,https://api.spotify.com/v1/audio-analysis/76IC...,146973,4,Wilson Pickett,Land of 1000 Dances
2,0.750,0.444,4,-8.630,1,0.0403,0.1200,0.000004,0.118,0.849,111.919,audio_features,4NRQwaks9r58tTDvr4iEyv,spotify:track:4NRQwaks9r58tTDvr4iEyv,https://api.spotify.com/v1/tracks/4NRQwaks9r58...,https://api.spotify.com/v1/audio-analysis/4NRQ...,157160,4,Wilson Pickett,In the Midnight Hour
3,0.561,0.385,6,-15.330,1,0.0323,0.1460,0.000496,0.171,0.664,81.599,audio_features,1MMp1H2Kib2BCDtdL5nL63,spotify:track:1MMp1H2Kib2BCDtdL5nL63,https://api.spotify.com/v1/tracks/1MMp1H2Kib2B...,https://api.spotify.com/v1/audio-analysis/1MMp...,247733,4,Wilson Pickett,Hey Jude
4,0.760,0.522,0,-8.088,1,0.0348,0.0906,0.000003,0.174,0.551,109.741,audio_features,7mRak6wBx9OGKXr3zStoHW,spotify:track:7mRak6wBx9OGKXr3zStoHW,https://api.spotify.com/v1/tracks/7mRak6wBx9OG...,https://api.spotify.com/v1/audio-analysis/7mRa...,187827,4,Wilson Pickett,Mustang Sally
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
15971,0.472,0.736,7,-10.208,1,0.1110,0.4080,0.000000,0.970,0.559,119.538,audio_features,5NQdAm6jRqZa3mBdtHEfXO,spotify:track:5NQdAm6jRqZa3mBdtHEfXO,https://api.spotify.com/v1/tracks/5NQdAm6jRqZa...,https://api.spotify.com/v1/audio-analysis/5NQd...,130000,4,Wings,I've Just Seen A Face - Live / Remastered
15972,0.514,0.363,2,-9.400,1,0.0270,0.5290,0.000848,0.586,0.494,78.268,audio_features,1hVl3NtSqi6LXZ7pXxHPrC,spotify:track:1hVl3NtSqi6LXZ7pXxHPrC,https://api.spotify.com/v1/tracks/1hVl3NtSqi6L...,https://api.spotify.com/v1/audio-analysis/1hVl...,78800,4,Wings,Venus And Mars - Remastered 2014
15973,0.481,0.645,7,-7.609,1,0.0335,0.0864,0.000005,0.336,0.518,113.024,audio_features,3hFYMKAolw99FZMXEGgGwv,spotify:track:3hFYMKAolw99FZMXEGgGwv,https://api.spotify.com/v1/tracks/3hFYMKAolw99...,https://api.spotify.com/v1/audio-analysis/3hFY...,235360,4,Wings,Listen To What The Man Said
15974,0.741,0.535,1,-15.884,0,0.0439,0.2070,0.002340,0.147,0.943,123.331,audio_features,6mDouoIwwH5Pjbx1KUhXIB,spotify:track:6mDouoIwwH5Pjbx1KUhXIB,https://api.spotify.com/v1/tracks/6mDouoIwwH5P...,https://api.spotify.com/v1/audio-analysis/6mDo...,259067,4,Wings,Goodnight Tonight


In [34]:
df.isna().sum()

danceability        0
energy              0
key                 0
loudness            0
mode                0
speechiness         0
acousticness        0
instrumentalness    0
liveness            0
valence             0
tempo               0
type                0
id                  0
uri                 0
track_href          0
analysis_url        0
duration_ms         0
time_signature      0
artist              0
song name           0
dtype: int64

In [58]:
x = df[['danceability', 'energy', 'key', 'loudness', 'mode', 'speechiness', 
     'acousticness', 'instrumentalness', 'liveness', 'valence', 'tempo']]

In [59]:
scaler = StandardScaler()

x_prep = scaler.fit_transform(x)

kmeans = KMeans(n_clusters=10, random_state=42)
kmeans.fit(x_prep)

clusters = kmeans.predict(x_prep)

scaled_df = pd.DataFrame(x_prep, columns=x.columns)
scaled_df['song name'] = df['song name']
scaled_df['artist'] = df['artist']
scaled_df['cluster'] = clusters
scaled_df

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,song name,artist,cluster
0,0.900424,-1.202512,-1.460367,-1.992559,0.668068,-0.373041,-0.636538,1.034559,0.127989,0.472658,-0.320438,Mustang Sally,Wilson Pickett,1
1,0.239124,-0.065561,-0.905195,-0.553799,0.668068,0.027010,-0.974456,-0.277706,0.949717,0.787803,-1.149011,Land of 1000 Dances,Wilson Pickett,1
2,1.054933,-0.692844,-0.350023,0.153533,0.668068,-0.354648,-0.612931,-0.394277,-0.447820,1.106887,-0.268298,In the Midnight Hour,Wilson Pickett,1
3,-0.113157,-0.949856,0.205149,-1.429340,0.668068,-0.446614,-0.525247,-0.392119,-0.129925,0.378114,-1.335743,Hey Jude,Wilson Pickett,6
4,1.116736,-0.353066,-1.460367,0.281580,0.668068,-0.417875,-0.712080,-0.394278,-0.111931,-0.067027,-0.344977,Mustang Sally,Wilson Pickett,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
15971,-0.663210,0.579146,0.482735,-0.219269,0.668068,0.458100,0.358332,-0.394293,4.662485,-0.035513,-0.000064,I've Just Seen A Face - Live / Remastered,Wings,3
15972,-0.403635,-1.045691,-0.905195,-0.028380,0.668068,-0.507541,0.766397,-0.390576,2.359249,-0.291568,-1.453014,Venus And Mars - Remastered 2014,Wings,3
15973,-0.607587,0.182738,0.482735,0.394744,0.668068,-0.432819,-0.726245,-0.394271,0.859746,-0.197025,-0.229396,Listen To What The Man Said,Wings,8
15974,0.999309,-0.296436,-1.182781,-1.560223,-1.496853,-0.313264,-0.319528,-0.384037,-0.273878,1.477182,0.133472,Goodnight Tonight,Wings,2


In [60]:
scaled_df.groupby(['cluster', 'artist'], as_index=False).count().sort_values(['cluster', 'key'], ascending=[True, False])[['artist', 'cluster', 'key']].reset_index(drop=True)

Unnamed: 0,artist,cluster,key
0,Dead Kennedys,0,28
1,Avril Lavigne,0,27
2,Hoobastank,0,26
3,Minor Threat,0,24
4,GBH,0,23
...,...,...,...
2358,Thomas Doherty,9,1
2359,William Onyeabor,9,1
2360,Wilson Pickett,9,1
2361,Y La Bamba,9,1


In [61]:
def recommend_song():
    # get song id
    song_name = input('Choose a song: ')
    results = sp.search(q=f'track:{song_name}', limit=1)
    track_id = results['tracks']['items'][0]['id']
    # get song features with the obtained id
    audio_features = sp.audio_features(track_id)
    # create dataframe
    df_ = pd.DataFrame(audio_features)
    new_features = df_[x.columns]
    # scale features
    scaled_x = scaler.transform(new_features)
    # predict cluster
    cluster = kmeans.predict(scaled_x)
    # filter dataset to predicted cluster
    filtered_df = scaled_df[scaled_df['cluster'] == cluster[0]][x.columns]
    # get closest song from filtered dataset
    closest, _ = pairwise_distances_argmin_min(scaled_x, filtered_df)
    # return it in a readable way
    print('\n [RECOMMENDED SONG]')
    return ' - '.join([scaled_df.loc[closest]['song name'].values[0], scaled_df.loc[closest]['artist'].values[0]])

In [None]:
recommend_song()