## Spotipy API

Create a Spotify account and follow these steps to register an app: https://developer.spotify.com/documentation/general/guides/app-settings/

After the app is created, you can see it on your dashboard:
https://developer.spotify.com/dashboard/applications

Click on it and you'll find the client id and client secret.

In [1]:
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

#some extra useful libraries
import pandas as pd
import json
import pprint
import getpass

In [2]:
client_id = getpass.getpass(prompt="Spotify client_id: ")
client_secret = getpass.getpass(prompt="Spotify client_secret: ")

#Initialize SpotiPy with user credentials
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(
    client_id=client_id,
    client_secret=client_secret)
                    )

Spotify client_id: ········
Spotify client_secret: ········


In [9]:
#Quick test - searching tracks with a query string via sp.search

results = sp.search(q='The Weeknd', limit=50)
results

#(reminder - the raw output is JSON, parsed into Python dicts and lists;
#this feels like a good time to look at the documentation!)

{'tracks': {'href': 'https://api.spotify.com/v1/search?query=The+Weeknd&type=track&offset=0&limit=50',
  'items': [{'album': {'album_type': 'album',
     'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/1Xyo4u8uXC1ZmMpatF05PJ'},
       'href': 'https://api.spotify.com/v1/artists/1Xyo4u8uXC1ZmMpatF05PJ',
       'id': '1Xyo4u8uXC1ZmMpatF05PJ',
       'name': 'The Weeknd',
       'type': 'artist',
       'uri': 'spotify:artist:1Xyo4u8uXC1ZmMpatF05PJ'}],
     'available_markets': ['AD',
      'AE',
      'AL',
      'AR',
      'AT',
      'AU',
      'BA',
      'BE',
      'BG',
      'BH',
      'BO',
      'BR',
      'BY',
      'CA',
      'CH',
      'CL',
      'CO',
      'CR',
      'CY',
      'CZ',
      'DE',
      'DK',
      'DO',
      'DZ',
      'EC',
      'EE',
      'EG',
      'ES',
      'FI',
      'FR',
      'GB',
      'GR',
      'GT',
      'HK',
      'HN',
      'HR',
      'HU',
      'ID',
      'IE',
      'IL',
      'IN',
      

## Documentation and useful links

Developer app settings:
https://developer.spotify.com/documentation/general/guides/app-settings/

Spotipy documentation:
https://spotipy.readthedocs.io/en/2.16.1/

Examples of spotipy usage, including functions:
https://github.com/plamere/spotipy/tree/master/examples

Spotify developer site with docs:
https://developer.spotify.com/
- The documentation is fully searchable and has a neat "try it" feature (with a temporary token). It explains the input parameters, shows what your query looks like underneath, and lists all the key/value pairs in the output with definitions.

You can also keep the Spotify web player open: https://open.spotify.com/

a) You can start with exploring the results of your query

In [10]:
#we know that JSON objects are written in key/value pairs.
# so what are the keys of this data set? 
results.keys()

dict_keys(['tracks'])

In [11]:
#what is inside the "tracks" key? i.e. what can I navigate with?
results["tracks"].keys()

dict_keys(['href', 'items', 'limit', 'next', 'offset', 'previous', 'total'])

In [12]:
#remember the href? That's your query.
#let's look at what we have searched using the wrapper
results["tracks"]["href"]

'https://api.spotify.com/v1/search?query=The+Weeknd&type=track&offset=0&limit=50'
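To see what the wrapper actually sent, the `href` above can be taken apart with Python's standard library. This is only a small sketch using `urllib.parse`; the variable names are my own and nothing here calls the API.

```python
from urllib.parse import urlparse, parse_qs, urlencode

href = "https://api.spotify.com/v1/search?query=The+Weeknd&type=track&offset=0&limit=50"

# Split the url and decode its query string into a dict of lists
parsed = urlparse(href)
params = parse_qs(parsed.query)
print(parsed.path)   # the endpoint that was called
print(params)        # the parameters that ended up in the request

# Rebuild the same query string from plain Python values
rebuilt = urlencode({"query": "The Weeknd", "type": "track", "offset": 0, "limit": 50})
print(rebuilt)
```

Note that `type` and `offset` appear in the url even though we only passed `q` and `limit` to `sp.search`.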

In [13]:
#in tracks - we can get the json for the tracks (it's still hard to read)
results["tracks"]["items"] 

[{'album': {'album_type': 'album',
   'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/1Xyo4u8uXC1ZmMpatF05PJ'},
     'href': 'https://api.spotify.com/v1/artists/1Xyo4u8uXC1ZmMpatF05PJ',
     'id': '1Xyo4u8uXC1ZmMpatF05PJ',
     'name': 'The Weeknd',
     'type': 'artist',
     'uri': 'spotify:artist:1Xyo4u8uXC1ZmMpatF05PJ'}],
   'available_markets': ['AD',
    'AE',
    'AL',
    'AR',
    'AT',
    'AU',
    'BA',
    'BE',
    'BG',
    'BH',
    'BO',
    'BR',
    'BY',
    'CA',
    'CH',
    'CL',
    'CO',
    'CR',
    'CY',
    'CZ',
    'DE',
    'DK',
    'DO',
    'DZ',
    'EC',
    'EE',
    'EG',
    'ES',
    'FI',
    'FR',
    'GB',
    'GR',
    'GT',
    'HK',
    'HN',
    'HR',
    'HU',
    'ID',
    'IE',
    'IL',
    'IN',
    'IS',
    'IT',
    'JO',
    'JP',
    'KW',
    'KZ',
    'LB',
    'LI',
    'LT',
    'LU',
    'LV',
    'MA',
    'MC',
    'MD',
    'ME',
    'MK',
    'MT',
    'MX',
    'MY',
    'NI',
    'NL',
    

In [None]:
#we set a limit of 50 in our query - this is the url for the next 50 results
results["tracks"]["next"]

In [None]:
results

In [None]:
#url of the previous 50 results (None when we are on the first page)
results["tracks"]["previous"] 

In [14]:
#total number of matches from the original query string
results["tracks"]["total"]

1540
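The `next`, `previous` and `total` keys are what make paging possible. Below is a minimal sketch of following `next` links until they run out. It uses hand-made stand-in pages so it runs without network access; `collect_items`, `fetch_next` and the page names are hypothetical, not part of spotipy.

```python
import math

def collect_items(first_page, fetch_next):
    """Gather the items of first_page and of every page reachable via 'next'."""
    items = list(first_page["items"])
    page = first_page
    while page["next"]:
        page = fetch_next(page)
        items.extend(page["items"])
    return items

# Stand-in data (not real API output): each page points at the next one
fake_pages = {
    "page2": {"items": ["c", "d"], "next": "page3"},
    "page3": {"items": ["e"], "next": None},
}
first = {"items": ["a", "b"], "next": "page2"}

all_items = collect_items(first, lambda page: fake_pages[page["next"]])
print(all_items)

# Number of requests needed to cover 1540 matches at 50 per page
pages_needed = math.ceil(1540 / 50)
print(pages_needed)
```

With spotipy, `fetch_next` would presumably be built on `sp.next`. For search results my understanding is that the response comes back wrapped in a `"tracks"` key again, so something like `lambda page: sp.next(page)["tracks"]` may be needed; treat that as an assumption to check against the documentation.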

b) Next, we can drill into one row of the results, i.e. one track

In [15]:
#in my example I am using the index value 5, ie the 6th member returned 
results["tracks"]["items"][5].keys() #items (actual tracks)

dict_keys(['album', 'artists', 'available_markets', 'disc_number', 'duration_ms', 'explicit', 'external_ids', 'external_urls', 'href', 'id', 'is_local', 'name', 'popularity', 'preview_url', 'track_number', 'type', 'uri'])

In [16]:
#inside the track dict we have information about the album for the selected track
results["tracks"]["items"][5]["album"]

{'album_type': 'album',
 'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/1Xyo4u8uXC1ZmMpatF05PJ'},
   'href': 'https://api.spotify.com/v1/artists/1Xyo4u8uXC1ZmMpatF05PJ',
   'id': '1Xyo4u8uXC1ZmMpatF05PJ',
   'name': 'The Weeknd',
   'type': 'artist',
   'uri': 'spotify:artist:1Xyo4u8uXC1ZmMpatF05PJ'}],
 'available_markets': ['AD',
  'AE',
  'AL',
  'AR',
  'AT',
  'AU',
  'BA',
  'BE',
  'BG',
  'BH',
  'BO',
  'BR',
  'CA',
  'CH',
  'CL',
  'CO',
  'CR',
  'CY',
  'CZ',
  'DE',
  'DK',
  'DO',
  'DZ',
  'EC',
  'EE',
  'EG',
  'ES',
  'FI',
  'FR',
  'GB',
  'GR',
  'GT',
  'HK',
  'HN',
  'HR',
  'HU',
  'ID',
  'IE',
  'IL',
  'IN',
  'IS',
  'IT',
  'JO',
  'JP',
  'KW',
  'KZ',
  'LB',
  'LI',
  'LT',
  'LU',
  'LV',
  'MA',
  'MC',
  'MD',
  'ME',
  'MK',
  'MT',
  'MX',
  'MY',
  'NI',
  'NL',
  'NO',
  'NZ',
  'OM',
  'PA',
  'PE',
  'PH',
  'PL',
  'PS',
  'PT',
  'PY',
  'QA',
  'RO',
  'RS',
  'RU',
  'SA',
  'SE',
  'SG',
  'SI',
  'SK',
  'SV',

In [17]:
#and the artist
results["tracks"]["items"][5]["artists"]

[{'external_urls': {'spotify': 'https://open.spotify.com/artist/1Xyo4u8uXC1ZmMpatF05PJ'},
  'href': 'https://api.spotify.com/v1/artists/1Xyo4u8uXC1ZmMpatF05PJ',
  'id': '1Xyo4u8uXC1ZmMpatF05PJ',
  'name': 'The Weeknd',
  'type': 'artist',
  'uri': 'spotify:artist:1Xyo4u8uXC1ZmMpatF05PJ'},
 {'external_urls': {'spotify': 'https://open.spotify.com/artist/4tZwfgrHOc3mvqYlEYSvVi'},
  'href': 'https://api.spotify.com/v1/artists/4tZwfgrHOc3mvqYlEYSvVi',
  'id': '4tZwfgrHOc3mvqYlEYSvVi',
  'name': 'Daft Punk',
  'type': 'artist',
  'uri': 'spotify:artist:4tZwfgrHOc3mvqYlEYSvVi'}]

In [18]:
#the name of the song 
results["tracks"]["items"][5]["name"]

'Starboy'

In [20]:
# how popular is a track on Spotify? (note: this looks at index 1, not the track at index 5)
results["tracks"]["items"][1]["popularity"]

90

What is a popularity score? 

The score is received from the Spotify API. The value will be between 0 and 100, with 100 being the most popular.

The popularity is calculated by an algorithm and is based, for the most part, on the total number of plays the track has had and how recent those plays are.

Generally speaking, songs that are being played a lot now will have a higher popularity than songs that were played a lot in the past. Duplicate tracks (e.g. the same track from a single and an album) are rated independently. Artist and album popularity is derived mathematically from track popularity. Note that the popularity value may lag actual popularity by a few days: the value is not updated in real time.

In [24]:
# to get the uri of a song
results["tracks"]["items"][5]["uri"]

'spotify:track:7MXVkk9YMctZqd1Srtv4MB'

Spotify songs are identified by a "url", a "uri" or an "id".

- The `id` is an alphanumeric code, and it's the core part of the identifier.

- The `uri` contains "spotify:track:" before the id. A uri is useful because it can be searched manually in the Spotify app.

- The `url` is a link to the song on the Spotify web player.

We'll use the `uri` in this code-along, but feel free to use whatever best fits your needs.
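Since the three identifiers share the same id, converting between them is plain string handling. The helpers below are a sketch with names of my own choosing (they are not spotipy functions), shown on the Starboy uri from above.

```python
def track_id_from(identifier):
    """Return the bare id from a track id, uri or web-player url."""
    if identifier.startswith("spotify:"):
        return identifier.split(":")[-1]
    if identifier.startswith("http"):
        # drop any query string before taking the last part of the path
        return identifier.split("?")[0].rstrip("/").split("/")[-1]
    return identifier

def track_uri(track_id):
    """Build the uri form from a bare id."""
    return f"spotify:track:{track_id}"

def track_url(track_id):
    """Build the web-player link from a bare id."""
    return f"https://open.spotify.com/track/{track_id}"

starboy_uri = "spotify:track:7MXVkk9YMctZqd1Srtv4MB"
starboy_id = track_id_from(starboy_uri)
print(starboy_id)
print(track_url(starboy_id))
```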

In [22]:
#what can we pull out of the api for this track ? 
results["tracks"]["items"][5].keys() 

dict_keys(['album', 'artists', 'available_markets', 'disc_number', 'duration_ms', 'explicit', 'external_ids', 'external_urls', 'href', 'id', 'is_local', 'name', 'popularity', 'preview_url', 'track_number', 'type', 'uri'])
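With the keys known, one way to make a track easier to read is to flatten the fields you care about into a single row. This is a sketch: `track_to_row` and the choice of fields are my own, and the sample track is made-up stand-in data, not real API output.

```python
def track_to_row(track):
    """Pick a few fields from one track dict and flatten them into one row."""
    return {
        "name": track["name"],
        "artists": ", ".join(artist["name"] for artist in track["artists"]),
        "album": track["album"]["name"],
        "popularity": track["popularity"],
        "duration_s": round(track["duration_ms"] / 1000),
        "uri": track["uri"],
    }

# Made-up track with the same shape as an item in results["tracks"]["items"]
sample_track = {
    "name": "Example Song",
    "artists": [{"name": "Artist A"}, {"name": "Artist B"}],
    "album": {"name": "Example Album"},
    "popularity": 50,
    "duration_ms": 200000,
    "uri": "spotify:track:EXAMPLE",
}

row = track_to_row(sample_track)
print(row)
```

On real results, `pd.DataFrame([track_to_row(t) for t in results["tracks"]["items"]])` would give one row per track.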

Artist analysis - this is more just for fun than for the project 

In [None]:
#get all albums for your favourite artist
#- using the Spotify web player to find the url of the chosen artist

artist = sp.artist("https://open.spotify.com/artist/1dfeR4HaWDbWqFHLkxsg1d")

#use a separate name so the earlier search `results` is not overwritten
albums = []
album_results = sp.artist_albums(artist['id'], album_type='album')
albums.extend(album_results['items'])
while album_results['next']:
    album_results = sp.next(album_results)
    albums.extend(album_results['items'])

albums.sort(key=lambda album: album['name'].lower())
for album in albums:
    print(' ' + album['name'])

print(len(albums))

In [25]:
#how to query more than one artist at once
artists = ["Katy Perry", "Duffy", "Adele"]

In [27]:
my_3_artists = [sp.search(q=artist, limit=50) for artist in artists]
my_3_artists

[{'tracks': {'href': 'https://api.spotify.com/v1/search?query=Katy+Perry&type=track&offset=0&limit=50',
   'items': [{'album': {'album_type': 'single',
      'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/6jJ0s89eD6GaHleKKya26X'},
        'href': 'https://api.spotify.com/v1/artists/6jJ0s89eD6GaHleKKya26X',
        'id': '6jJ0s89eD6GaHleKKya26X',
        'name': 'Katy Perry',
        'type': 'artist',
        'uri': 'spotify:artist:6jJ0s89eD6GaHleKKya26X'}],
      'available_markets': ['AD',
       'AE',
       'AL',
       'AR',
       'AT',
       'AU',
       'BA',
       'BE',
       'BG',
       'BH',
       'BO',
       'BR',
       'BY',
       'CA',
       'CH',
       'CL',
       'CO',
       'CR',
       'CY',
       'CZ',
       'DE',
       'DK',
       'DO',
       'DZ',
       'EC',
       'EE',
       'EG',
       'ES',
       'FI',
       'FR',
       'GB',
       'GR',
       'GT',
       'HK',
       'HN',
       'HR',
       'HU',
       '

In [33]:
#total number of matches across the three searches
sum(result["tracks"]["total"] for result in my_3_artists)

19322

In [34]:
my_3_artists[0]["tracks"]["items"][7]["uri"]

'spotify:track:5bcTCxgc7xVfSaMV3RuVke'

In [35]:
#Function to get the artists involved in a song:

def get_artists_from_track(track):
    return [artist["name"] for artist in track["artists"]]

In [36]:
#here we are returning to using the results set from our earlier query 
#but you could also build upon any other results from the API
#go back and rerun results query
my_track = results["tracks"]["items"][5]

In [37]:
get_artists_from_track(my_track)

['The Weeknd', 'Daft Punk']

In [38]:
#Function to get the ids of the artists from a track:

def get_artists_ids_from_track(track):
    return [artist["id"] for artist in track["artists"]]

In [39]:
#call the function on our track

get_artists_ids_from_track(my_track)

['1Xyo4u8uXC1ZmMpatF05PJ', '4tZwfgrHOc3mvqYlEYSvVi']
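The two functions above work on one track at a time. When you have many tracks, you may want each artist only once. Here is a sketch of that idea; `unique_artists` is a hypothetical helper, and the sample tracks are trimmed-down stand-ins that reuse the artist ids shown above.

```python
def unique_artists(tracks):
    """Map artist id -> artist name for every artist in the given tracks."""
    artists = {}
    for track in tracks:
        for artist in track["artists"]:
            # setdefault keeps the first name seen for each id
            artists.setdefault(artist["id"], artist["name"])
    return artists

# Stand-in tracks containing only the "artists" key
sample_tracks = [
    {"artists": [{"id": "1Xyo4u8uXC1ZmMpatF05PJ", "name": "The Weeknd"}]},
    {"artists": [{"id": "1Xyo4u8uXC1ZmMpatF05PJ", "name": "The Weeknd"},
                 {"id": "4tZwfgrHOc3mvqYlEYSvVi", "name": "Daft Punk"}]},
]

found = unique_artists(sample_tracks)
print(found)
```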

### Playlists

We will need to collect a "database" of songs. Playlists are a good way to access relatively large numbers of songs.

Do you already have a playlist of your own you can use?

Does one of your classmates have a great playlist for this?

Or can you find a playlist on Spotify which suits your needs?

Hint: this is a shortcut to doing this part of your MVP.

In [40]:
#I am using an example public playlist
#(the output below shows its name: "Longest Playlist on Spotify")

#read a playlist

playlist_id = 'spotify:playlist:6FKDzNYZ8IW1pvYVF4zUN2'
results = sp.playlist(playlist_id)
print(json.dumps(results, indent=4))

{
    "collaborative": false,
    "description": "updated with new songs randomly *** insta @alexmarty.00 *** dm to request songs&#x2F;remove songs",
    "external_urls": {
        "spotify": "https://open.spotify.com/playlist/6FKDzNYZ8IW1pvYVF4zUN2"
    },
    "followers": {
        "href": null,
        "total": 2860
    },
    "href": "https://api.spotify.com/v1/playlists/6FKDzNYZ8IW1pvYVF4zUN2?additional_types=track",
    "id": "6FKDzNYZ8IW1pvYVF4zUN2",
    "images": [
        {
            "height": null,
            "url": "https://i.scdn.co/image/ab67706c0000bebb87aeae5e0d895b0f0164e69e",
            "width": null
        }
    ],
    "name": "Longest Playlist on Spotify",
    "owner": {
        "display_name": "Alex Marty",
        "external_urls": {
            "spotify": "https://open.spotify.com/user/exiedous"
        },
        "href": "https://api.spotify.com/v1/users/exiedous",
        "id": "exiedous",
        "type": "user",
        "uri": "spotify:user:exiedous"
    },

In [41]:

#capture features of your own playlist into a df - enter your playlist id

playlist = sp.playlist("6FKDzNYZ8IW1pvYVF4zUN2")
#only the first page of tracks (up to 100) is included here
songs = playlist["tracks"]["items"]
ids = [song["track"]["id"] for song in songs]
#audio_features accepts up to 100 ids per call
features = sp.audio_features(ids)
df = pd.DataFrame(features)
df.head()


| | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | type | id | uri | track_href | analysis_url | duration_ms | time_signature |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.647 | 0.933 | 7 | -4.056 | 1 | 0.111 | 0.000351 | 0.00277 | 0.334 | 0.332 | 119.921 | audio_features | 6pGUGTIaZ1H4jKHIL4Fged | spotify:track:6pGUGTIaZ1H4jKHIL4Fged | https://api.spotify.com/v1/tracks/6pGUGTIaZ1H4... | https://api.spotify.com/v1/audio-analysis/6pGU... | 235107 | 4 |
| 1 | 0.733 | 0.818 | 10 | -7.222 | 0 | 0.0859 | 0.0241 | 0.0 | 0.0636 | 0.253 | 116.019 | audio_features | 09TcIuH1ZO7i4vicWKoaN2 | spotify:track:09TcIuH1ZO7i4vicWKoaN2 | https://api.spotify.com/v1/tracks/09TcIuH1ZO7i... | https://api.spotify.com/v1/audio-analysis/09Tc... | 232147 | 4 |
| 2 | 0.692 | 0.711 | 0 | -7.498 | 0 | 0.0317 | 0.225 | 0.0 | 0.12 | 0.875 | 125.135 | audio_features | 1TfqLAPs4K3s2rJMoCokcS | spotify:track:1TfqLAPs4K3s2rJMoCokcS | https://api.spotify.com/v1/tracks/1TfqLAPs4K3s... | https://api.spotify.com/v1/audio-analysis/1Tfq... | 216933 | 4 |
| 3 | 0.327 | 0.895 | 9 | -7.428 | 1 | 0.0367 | 0.000564 | 0.0159 | 0.104 | 0.898 | 169.39 | audio_features | 3w2GGz0HjIu9OcWXINRFJR | spotify:track:3w2GGz0HjIu9OcWXINRFJR | https://api.spotify.com/v1/tracks/3w2GGz0HjIu9... | https://api.spotify.com/v1/audio-analysis/3w2G... | 219800 | 4 |
| 4 | 0.76 | 0.652 | 6 | -7.321 | 1 | 0.232 | 0.0348 | 0.0 | 0.307 | 0.759 | 100.315 | audio_features | 4X4tgBEUiT6WqB2oTJ5ynH | spotify:track:4X4tgBEUiT6WqB2oTJ5ynH | https://api.spotify.com/v1/tracks/4X4tgBEUiT6W... | https://api.spotify.com/v1/audio-analysis/4X4t... | 177685 | 4 |


In [43]:
#or fetch just the tracks of a public playlist (one page, up to 100 items)
#playlist_items replaces the deprecated user_playlist_tracks

playlist = sp.playlist_items("6FKDzNYZ8IW1pvYVF4zUN2")

In [44]:
#check it's working - first track
playlist["items"][0]["track"]["uri"]

'spotify:track:6pGUGTIaZ1H4jKHIL4Fged'

In [45]:
for song in playlist["items"]:
    print(song.keys())

dict_keys(['added_at', 'added_by', 'is_local', 'primary_color', 'track', 'video_thumbnail'])
dict_keys(['added_at', 'added_by', 'is_local', 'primary_color', 'track', 'video_thumbnail'])
dict_keys(['added_at', 'added_by', 'is_local', 'primary_color', 'track', 'video_thumbnail'])
dict_keys(['added_at', 'added_by', 'is_local', 'primary_color', 'track', 'video_thumbnail'])
dict_keys(['added_at', 'added_by', 'is_local', 'primary_color', 'track', 'video_thumbnail'])
dict_keys(['added_at', 'added_by', 'is_local', 'primary_color', 'track', 'video_thumbnail'])
dict_keys(['added_at', 'added_by', 'is_local', 'primary_color', 'track', 'video_thumbnail'])
dict_keys(['added_at', 'added_by', 'is_local', 'primary_color', 'track', 'video_thumbnail'])
dict_keys(['added_at', 'added_by', 'is_local', 'primary_color', 'track', 'video_thumbnail'])
dict_keys(['added_at', 'added_by', 'is_local', 'primary_color', 'track', 'video_thumbnail'])
dict_keys(['added_at', 'added_by', 'is_local', 'primary_color', 'track

In [46]:
#how many songs on that playlist?
playlist["total"]

10000

In [47]:
#function to get all tracks from a playlist

def get_playlist_tracks(playlist_id):
    #playlist_items replaces the deprecated user_playlist_tracks
    results = sp.playlist_items(playlist_id)
    tracks = results['items']
    #keep following the 'next' link until there are no more pages
    while results['next']:
        results = sp.next(results)
        tracks.extend(results['items'])
    return tracks

In [50]:
tracks = get_playlist_tracks("6FKDzNYZ8IW1pvYVF4zUN2")

### Audio features

You can find an explanation of the audio features here: https://developer.spotify.com/documentation/web-api/reference/tracks/get-audio-features/

In [None]:
#get the first page of tracks of a playlist
playlist = sp.playlist_items("36UuuONPIdnKZykWOt2Poz")

In [None]:
# get the uri of a single song:
song_uri = playlist["items"][0]["track"]["uri"]

In [None]:
#by the way, what is that song ?
playlist["items"][0]["track"]["name"]

In [None]:
# get the audio features for that song
sp.audio_features(song_uri)

In [None]:
#function to extract all Uris from a playlist 
#- we already had the function to take the songs from the playlist 

def get_song_uris(playlist_id):
    tracks = get_playlist_tracks(playlist_id)
    uris = [track["track"]["uri"] for track in tracks]
    return uris

In [None]:
#call the function on our playlist

IH_uris = get_song_uris("36UuuONPIdnKZykWOt2Poz")

Above, we stored all the uris of a playlist in a list called IH_uris. Now we're going to get the audio features of all the songs in that playlist.

In [None]:
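aud_feat = []

#audio_features accepts up to 100 tracks per call, so request them in batches
#(extend keeps aud_feat a flat list of dicts, one per song)
for i in range(0, len(IH_uris), 100):
    aud_feat.extend(sp.audio_features(IH_uris[i:i + 100]))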

In [None]:
#take a look at the features of all the songs in the playlist 

aud_feat
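Before clustering, the raw feature list usually needs a little tidying. The sketch below flattens nested lists (which is what you get if you call `sp.audio_features` once per song), skips missing entries, and keeps only a few columns. `tidy_features` and the `FEATURE_COLUMNS` subset are my own choices for illustration, and it assumes that tracks without features come back as `None`. The sample values are taken from the first two rows of the table shown earlier.

```python
# Subset of columns chosen for illustration; pick whichever you need
FEATURE_COLUMNS = ["danceability", "energy", "valence", "tempo"]

def tidy_features(raw):
    """Flatten nested lists, drop missing entries, keep id plus chosen features."""
    flat = []
    for entry in raw:
        # per-song calls return a one-element list, batched calls return dicts
        flat.extend(entry if isinstance(entry, list) else [entry])
    return [
        {"id": f["id"], **{col: f[col] for col in FEATURE_COLUMNS}}
        for f in flat
        if f is not None
    ]

sample_raw = [
    [{"id": "6pGUGTIaZ1H4jKHIL4Fged", "danceability": 0.647, "energy": 0.933,
      "valence": 0.332, "tempo": 119.921, "type": "audio_features"}],
    [None],  # assumed shape for a track with no features available
    {"id": "09TcIuH1ZO7i4vicWKoaN2", "danceability": 0.733, "energy": 0.818,
     "valence": 0.253, "tempo": 116.019, "type": "audio_features"},
]

clean = tidy_features(sample_raw)
print(clean)
```

The result is a flat list of dicts, so `pd.DataFrame(clean)` gives one row per song.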

### Searching the audio features for a song

When the user inputs a song, you will want to retrieve the audio features of that song. How do you do it?

1. Search the user input using the `sp.search()` function. This function works similarly to the "search" bar on the spotify app - using Spotify's intelligent search engine. That means that it can handle names of any songs or artists - even certain typos.

2. Find the uri of the song that the API gives you back.

3. Use `sp.audio_features` to retrieve the audio features of the song.

In [None]:
#1 
usersearch = sp.search(q='bohemian', limit=1)

In [None]:
# 2
usersearch["tracks"]["items"][0]["uri"]

In [None]:
# 3
sp.audio_features(usersearch["tracks"]["items"][0]["uri"])
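A search can also come back empty, in which case indexing `["items"][0]` raises an error. Below is a small sketch of step 2 that guards against this. `first_track_uri` is a hypothetical helper, and the two sample results are trimmed-down stand-ins for what `sp.search` returns.

```python
def first_track_uri(search_result):
    """Return the uri of the first track in a search result, or None if empty."""
    items = search_result["tracks"]["items"]
    return items[0]["uri"] if items else None

# Stand-in search results so the sketch runs without calling the API
found = {"tracks": {"items": [{"uri": "spotify:track:7MXVkk9YMctZqd1Srtv4MB"}]}}
nothing = {"tracks": {"items": []}}

print(first_track_uri(found))
print(first_track_uri(nothing))
```

In the notebook this could be used as `uri = first_track_uri(sp.search(q=user_input, limit=1))`, calling `sp.audio_features(uri)` only when `uri` is not `None`; `user_input` is a placeholder name here.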

### Lab: Create your collection of songs & audio features

To move forward with the project, you need to create a collection of songs with their audio features - as large as possible!

These are the songs that we will cluster. And, later, when the user inputs a song, we will find the cluster to which the song belongs and recommend a song from the same cluster.
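To make the goal concrete, here is a rough sketch of the last step: picking a recommendation from the same cluster. It assumes each song in the collection already carries a cluster label from whatever clustering model is used later. The `"cluster"` key, the `recommend_from_cluster` name and the sample uris are all hypothetical.

```python
import random

def recommend_from_cluster(song_cluster, collection, rng=random):
    """Pick one song from collection whose cluster label matches song_cluster."""
    candidates = [song for song in collection if song["cluster"] == song_cluster]
    return rng.choice(candidates) if candidates else None

# Made-up collection: uri plus a cluster label assigned elsewhere
collection = [
    {"uri": "spotify:track:A", "cluster": 0},
    {"uri": "spotify:track:B", "cluster": 1},
    {"uri": "spotify:track:C", "cluster": 1},
]

# A seeded generator makes the pick repeatable while experimenting
pick = recommend_from_cluster(1, collection, rng=random.Random(0))
print(pick)
```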

The more songs you have, the more accurate your recommendations can be. That said, you might want to make sure the collected songs are "curated" in some way. Try to find playlists of songs that are diverse but that also meet certain standards.

The process of sending hundreds or thousands of requests can take some time - it's normal if you have to wait a few minutes (or, if you're ambitious, even hours) to get all the data you need.

An idea for collecting as many songs as possible is to start with all the songs of a big, diverse playlist, then go to every artist present in the playlist and grab every song of every album of that artist. The number of songs you collect per playlist will grow very quickly!
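When you combine several playlists or artist catalogues, the same track will often show up more than once. The sketch below merges lists of playlist items and keeps the first copy of each track id. `merge_unique` is a hypothetical helper, the sample items are made up, and it assumes that removed or local tracks can appear with a missing track or a missing id.

```python
def merge_unique(*track_lists):
    """Merge lists of playlist items, keeping the first copy of each track id."""
    seen = set()
    merged = []
    for tracks in track_lists:
        for item in tracks:
            track = item.get("track")
            if not track or track["id"] is None or track["id"] in seen:
                continue  # skip entries without a usable id, and repeats
            seen.add(track["id"])
            merged.append(item)
    return merged

# Made-up items shaped like the entries returned by get_playlist_tracks
playlist_a = [{"track": {"id": "a"}}, {"track": {"id": "b"}}]
playlist_b = [{"track": {"id": "b"}}, {"track": None}, {"track": {"id": "c"}}]

merged = merge_unique(playlist_a, playlist_b)
print([item["track"]["id"] for item in merged])
```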