### Intro

Spotify has a very robust REST API available for general use.  There are methods for working with artist and track info, as well as user account info and playlist manipulation.  We'll look at both here, starting with basic artist and track information.

### Getting artist and track data

Here we'll do some artist, album, and track retrieval.

The basics.  CLIENT_ID and CLIENT_SECRET are provided by Spotify after you register your app with them.  Don't share them!

In [2]:
import pandas as pd
import json
import requests

CLIENT_ID = '<redacted>'
CLIENT_SECRET = '<redacted>'
#REDIRECT_URI = 'http://127.0.0.1:9090'
REDIRECT_URI = 'https://open.spotify.com/collection/playlists'

Rerun this cell to generate the new access token after expiry:

In [3]:
TOKEN_URL = 'https://accounts.spotify.com/api/token'

# POST
auth_response = requests.post(TOKEN_URL, {
    'grant_type': 'client_credentials',
    'client_id': CLIENT_ID,
    'client_secret': CLIENT_SECRET,
})

# convert the response to JSON
auth_response_data = auth_response.json()

# save the access token
access_token = auth_response_data['access_token']

headers = {'Authorization': 'Bearer {token}'.format(token=access_token)}

In [4]:
# base URL of all Spotify API endpoints
BASE_URL = 'https://api.spotify.com/v1/'

# Get info for an individual track from its URI
# https://open.spotify.com/track/2f0P7iELCvAlV8j6Z3rGDE?si=36cefded2e1e4ae6
track_id = '2f0P7iELCvAlV8j6Z3rGDE'  # Stranglehold by Ted Nugent

# actual GET request with proper header
r = requests.get(BASE_URL + 'audio-features/' + track_id, headers=headers).json()

r

{'danceability': 0.484,
 'energy': 0.711,
 'key': 2,
 'loudness': -7.784,
 'mode': 1,
 'speechiness': 0.0331,
 'acousticness': 0.0183,
 'instrumentalness': 0.173,
 'liveness': 0.0941,
 'valence': 0.49,
 'tempo': 148.189,
 'type': 'audio_features',
 'id': '2f0P7iELCvAlV8j6Z3rGDE',
 'uri': 'spotify:track:2f0P7iELCvAlV8j6Z3rGDE',
 'track_href': 'https://api.spotify.com/v1/tracks/2f0P7iELCvAlV8j6Z3rGDE',
 'analysis_url': 'https://api.spotify.com/v1/audio-analysis/2f0P7iELCvAlV8j6Z3rGDE',
 'duration_ms': 503067,
 'time_signature': 4}

Get all albums by, e.g., AC/DC, again from the URI:

In [5]:
## https://open.spotify.com/artist/711MCceyCBcFnzjGY4Q7Un?si=OAicjoOQSjCf9wsq7v31Sw
artist_id = '711MCceyCBcFnzjGY4Q7Un'

# pull all artists albums
r = requests.get(BASE_URL + 'artists/' + artist_id + '/albums', 
                 headers=headers, 
                 params={'include_groups': 'album', 'limit': 50})
d = r.json()

In [6]:
for album in d['items']:
    print(album['name'], '\t\t', album['release_date'])

POWER UP 		 2020-11-13
Rock or Bust 		 2014-11-28
Live at River Plate 		 2012-11-19
Black Ice 		 2008-10-20
Stiff Upper Lip 		 2000-02-25
Ballbreaker 		 1995-09-22
Live 		 1992-10-27
Live (Collector's Edition) 		 1992-10-27
The Razors Edge 		 1990-09-24
Blow Up Your Video 		 1988-02-01
Who Made Who 		 1986-05-24
Fly On the Wall 		 1985-06-28
Flick of the Switch 		 1983-08-15
For Those About to Rock (We Salute You) 		 1981-11-23
Back In Black 		 1980-07-25
Highway to Hell 		 1979-07-27
If You Want Blood You've Got It (Live) 		 1978-10-13
Powerage 		 1978-05-05
Let There Be Rock 		 1977-03-21
Dirty Deeds Done Dirt Cheap 		 1976-09-20
High Voltage 		 1976-05-14


In [7]:
data = []   # will hold all track info
albums = [] # to keep track of duplicates

# loop over albums and get all tracks
for album in d['items']:
    album_name = album['name']

    ## we don't care about who made who b/c it's really a compilation, but isn't flagged as such
    if not 'live' in album_name.lower() and not 'who made who' in album_name.lower():
        
        # here's a hacky way to skip over albums we've already grabbed
        trim_name = album_name.split('(')[0].strip()
        if trim_name.upper() in albums: #or int(album['release_date'][:4]) > 1983: ## stop at '83
            continue
        albums.append(trim_name.upper()) # use upper() to standardize
    
        # this takes a few seconds so let's keep track of progress    
        print(album_name)
    
        # pull all tracks from this album
        r = requests.get(BASE_URL + 'albums/' + album['id'] + '/tracks', 
            headers=headers)
        tracks = r.json()['items']
    
        for track in tracks:
            # get audio features (key, liveness, danceability, ...)
            f = requests.get(BASE_URL + 'audio-features/' + track['id'], 
                headers=headers)
            f = f.json()
        
            # combine with album info
            f.update({
                'track_name': track['name'],
                'album_name': album_name,
                'short_album_name': trim_name,
                'release_date': album['release_date'],
                'album_id': album['id']
            })
        
        data.append(f)

POWER UP
Rock or Bust
Black Ice
Stiff Upper Lip
Ballbreaker
The Razors Edge
Blow Up Your Video
Fly On the Wall
Flick of the Switch
For Those About to Rock (We Salute You)
Back In Black
Highway to Hell
Powerage
Let There Be Rock
Dirty Deeds Done Dirt Cheap
High Voltage


In [8]:
## put all albums into a pandas dataframe
df = pd.DataFrame(data) #, index=df['track_name']) -- setting this breaks t-snes below
cols = ['short_album_name', 'release_date', 'danceability', 'energy', 'loudness',
        'speechiness', 'acousticness', 'instrumentalness', 'liveness', 'valence']

# convert release_date to an actual date, and sort by it
df['release_date'] = pd.to_datetime(df['release_date'])
df = df.sort_values(by='release_date') #, ascending=False)

df[cols].head()

Unnamed: 0,short_album_name,release_date,danceability,energy,loudness,speechiness,acousticness,instrumentalness,liveness,valence
15,High Voltage,1976-05-14,0.561,0.785,-4.973,0.0629,0.0282,0.451,0.0615,0.546
14,Dirty Deeds Done Dirt Cheap,1976-09-20,0.475,0.607,-6.528,0.0929,0.0755,0.0802,0.169,0.668
13,Let There Be Rock,1977-03-21,0.292,0.838,-4.459,0.078,0.0014,0.479,0.0953,0.423
12,Powerage,1978-05-05,0.266,0.817,-4.711,0.0779,0.000781,0.0795,0.0759,0.311
11,Highway to Hell,1979-07-27,0.251,0.73,-5.002,0.0615,0.024,0.0137,0.181,0.482


### Spotipy

The freely available Spotipy library is very powerful.  We'll look at a few uses here, including using OAuth to authenticate in order to access specific user info.  OAuth requires a separate endpoint to be configured in order to allow user access, which isn't shown here.

Get a list of all albums by the band Tool:

In [10]:
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

res = {}

tool_uri = 'spotify:artist:2yEwvVSSSUkcLeSTNyHKh8'
spotify = spotipy.Spotify(client_credentials_manager=SpotifyClientCredentials(client_id=CLIENT_ID,
                          client_secret=CLIENT_SECRET))

results = spotify.artist_albums(tool_uri) #, album_type='album')

albums = results['items']

while results['next']:
    print(results['next'])
    results = spotify.next(results)
    albums.extend(results['items'])

for album in albums:
    print(album['name'], '--', album['release_date'])
    res[album['name']] = album['release_date']

Fear Inoculum -- 2019-08-30
10,000 Days -- 2006-04-28
Lateralus -- 2001-05-15
Ænima -- 1996-09-17
Undertow -- 1993-04-06
Opiate² -- 2022-03-01
Opiate -- 1992-03-10


This is where the OAuth occurs, but, as mentioned above it takes some tweaking due to Spotify redirecting the user to a webpage to allow access:

In [12]:
from spotipy.oauth2 import SpotifyOAuth
from spotipy.oauth2 import CacheFileHandler

scope = 'user-library-read, user-read-recently-played, playlist-read-collaborative, \
         user-read-playback-state, user-top-read, user-library-read, playlist-read-private'

## currently writing .tokens in ~/jupyter/.cache-<username>.  setting cache_path causes
## unable to read/write errors
#handler = CacheFileHandler(cache_path='/tmp/', username='tattri')
handler = CacheFileHandler(username='tattri')

sp = spotipy.Spotify(auth_manager=SpotifyOAuth(scope=scope, client_id=CLIENT_ID, 
                     client_secret=CLIENT_SECRET, redirect_uri=REDIRECT_URI,
                     cache_handler=handler))

After we've authenticated, get the currently playing track, if applicable.  If there is no song currently playing, return the last song played:

In [14]:
artists = []
track, results = {}, {}
pop = 0

results = sp.current_user_playing_track()

if results != None:
    print('Now playing:  ', end='')
    
    ## in case there is more than one artist on a track
    for i in range(len(results['item']['artists'])):
        artists.append(dict(results['item']['artists'][i])['name'])

    trackName = results['item']['name']
    album = results['item']['album']['name']
    pop = results['item']['popularity']

    print(trackName, 'by', str.join(', ', artists), 'from the album', album, '-- popularity',pop)

else:
    print('No song currently playing, the last song played is:  ', end='')
    results = sp.current_user_recently_played(limit=1)
    
    track = dict(results['items'][0])
    for i in range(len(track['track']['artists'])):
        artists.append(dict(track['track']['artists'][i])['name'])

    trackName = track['track']['name']
    album = track['track']['album']['name']
    pop = track['track']['popularity']

    print(trackName, ' by ', str.join(', ', artists), ' from the album ', 
          album, ' --- popularity ', pop, ', played at ', track['played_at'], sep='')


No song currently playing, the last song played is:  Lucky You (feat. Joyner Lucas) by Eminem, Joyner Lucas from the album Curtain Call 2 --- popularity 62, played at 2022-09-11T14:20:21.007Z
