Reference:

- https://developer.spotify.com/documentation/web-api/reference/get-audio-features

### Notes

It probably makes sense to have:
- node tables: tracks, artists, albums
- edge tables: track<>artist, artist<>album, etc., keyed by ID

This amounts to a network table (A<>B) plus reference data for A and B.
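
A minimal sketch of that node/edge layout in pandas, assuming toy data: the IDs, names, and the variable names `tracks`, `artists`, `edges`, and `edges_full` are made up for illustration and do not come from the Spotify API.

```python
import pandas as pd

# Node tables: one row per entity, keyed by ID (illustrative values only).
tracks = pd.DataFrame({"track_id": ["t1", "t2"], "track_name": ["Song A", "Song B"]})
artists = pd.DataFrame({"artist_id": ["a1", "a2"], "artist_name": ["Artist X", "Artist Y"]})

# Edge table: one row per (track, artist) pair, storing IDs only.
edges = pd.DataFrame({"track_id": ["t1", "t1", "t2"], "artist_id": ["a1", "a2", "a1"]})

# Attach reference data for both endpoints of each edge.
edges_full = (
    edges.merge(tracks, on="track_id", how="left")
         .merge(artists, on="artist_id", how="left")
)
print(edges_full)
```

Keeping the edge table ID-only means node attributes (popularity, followers, etc.) can be re-scraped without touching the edges.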

Note: genres and release dates are defined at the artist and album level, whereas tracks carry very specific data (loudness, audio analysis, audio features, etc.). So let's take a music microdata approach, with the track as the unit of observation.

How do networks of artists microadjust musical styles in response to macro shocks?

In [1]:
from spotipy.oauth2 import SpotifyClientCredentials
import spotipy
import json
import pandas as pd
from datetime import datetime
YMD_FMT = "%Y-%m-%d %H:%M:%S"
CURR_TIME = datetime.now().strftime(YMD_FMT)
print(CURR_TIME)
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

with open("credentials.json", "r") as fp:
    creds = json.load(fp)
    client_id=creds['client_id']
    client_secret=creds['client_secret']

# Reference:
# https://spotipy.readthedocs.io/en/2.22.1/#module-spotipy.client
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id=client_id, client_secret=client_secret))

2023-04-09 11:16:04


In [2]:
tracks_data = {}
artists_data = {}
network_data = [] # edges

### We iterate over tracks as the fundamental unit of interest.
results = sp.search(q='Armin Van Buuren', type='track', limit=3) # type='track' is the default, made explicit
for idx, track in enumerate(results['tracks']['items']):
    ### For each track
    
    print(idx, track['id'], track['name'], '|||', ' ; '.join([t['name'] for t in track['artists']]))
    track_id = track['id']
    track_name = track['name']
    track_date = track['album']['release_date'] # always has associated album
    
    ### Let track be the primitive unit of interest.
    ### When we look up a track, populate:
    ### 1. data for track
    ### 2. data for its artist(s)
    ### 3. edges for track-artist.
    ### (ignore albums for now, noting a track-artist edge can have multiple release dates)
    
    ### ----------------------------------------------------------------------------------------
    ### 1. data for track
            
    if track_id not in tracks_data:
        rowdata = {}
        
        # get more album info
        album_info = sp.album(track['album']["external_urls"]["spotify"])
        # can add albums as a third entity later if needed.

        # construct dataframe
        rowdata['date'] = track_date
        rowdata['id'] = track_id
        rowdata['name'] = track_name
        rowdata['album'] = track['album']['name']
        rowdata['album_genres'] = ';'.join(sorted(album_info['genres'])) # NULLABLE
        rowdata['popularity'] = track['popularity']
        rowdata['duration_seconds'] = track['duration_ms']/1000.0
        rowdata['url'] = track['external_urls']['spotify']

        # audio analysis for track
        # https://developer.spotify.com/documentation/web-api/reference/get-audio-analysis
        audio_analysis = sp.audio_analysis(track_id)
        for field in ['loudness','tempo','tempo_confidence',
                      'time_signature','time_signature_confidence',
                      'key','key_confidence', 'mode','mode_confidence']:
            rowdata[field] = audio_analysis['track'][field]

        # audio features of a track
        audio_features = sp.audio_features(track_id)[0] # list of tracks --> list of results
        for field in ['danceability','energy','speechiness','acousticness','liveness','valence']:
            # the entry can be None when no features are available for a track
            rowdata[field] = audio_features[field] if audio_features else None
        tracks_data[track_id] = rowdata
    
    ### ----------------------------------------------------------------------------------------
    ### 2. data for its artist(s)
    
    for artist in track['artists']:
        artist_id = artist['id']
        artist_name = artist['name']

        # only query the API for artists we have not seen yet
        if artist_id not in artists_data:
            # full artist object (genres, popularity, followers, etc.)
            # https://stackoverflow.com/questions/61624487/extract-artist-genre-and-song-release-date-using-spotipy
            artist_info = sp.artist(artist_id)

            rowdata = {}
            rowdata['id'] = artist_id
            rowdata['name'] = artist_name
            rowdata['genres'] = ';'.join(sorted(artist_info['genres'])) # may be empty
            rowdata['popularity'] = artist_info['popularity']
            rowdata['followers'] = artist_info['followers']['total']
            rowdata['url'] = artist_info['external_urls']['spotify']
            artists_data[artist_id] = rowdata
        
    ### ----------------------------------------------------------------------------------------
    ### 3. edges for track-artist.
    for artist in track['artists']:
        artist_id = artist['id']
        network_data.append({
            'date': track_date, # release date
            'track_id': track_id,
            'artist_id': artist_id
        })
    
### Combine all results.
index_cols = ['id','date','name','url']
# each rowdata dict already carries its own 'id', so build one row per dict
df_tracks = pd.DataFrame(list(tracks_data.values()))
df_tracks = df_tracks[index_cols + [c for c in df_tracks if c not in index_cols]]
df_artists = pd.DataFrame(list(artists_data.values()))
df_network = pd.DataFrame(network_data)
for df in [df_tracks, df_artists, df_network]:
    df.insert(0, 'scrape_time', CURR_TIME)

0 19bua05ITWjFrk9iGxOA7r On & On ||| Armin van Buuren ; Punctual ; Alika
1 17IEdpDXGZdtrJxjTWoZIL Vulnerable ||| Armin van Buuren ; Vanessa Campagna
2 53QLyfbHqEtXK0ys7YW2Q4 Transmission - Armin van Buuren Remix ||| Eelke Kleijn ; Armin van Buuren
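
The loop above issues one `audio_features` request per track. spotipy's `audio_features` also accepts a list of track IDs (the Web API documents a limit of 100 IDs per request), so a larger scrape could batch the lookups. A rough sketch under those assumptions: `chunked` and `fetch_audio_features` are hypothetical helper names, `sp` is assumed to be an authenticated `spotipy.Spotify` client, and results are assumed to come back in request order.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_audio_features(sp, track_ids, batch_size=100):
    """Return {track_id: features_dict}, skipping tracks without features.

    `sp` is assumed to be an authenticated spotipy.Spotify client
    (hypothetical helper, not part of spotipy itself).
    """
    features = {}
    for batch in chunked(list(track_ids), batch_size):
        # one request per batch instead of one request per track
        for track_id, feat in zip(batch, sp.audio_features(batch)):
            if feat is not None:  # entries can be None for some tracks
                features[track_id] = feat
    return features
```

With this in place, the per-track loop could collect IDs first and fill in the audio-feature columns afterwards, for example via `fetch_audio_features(sp, list(tracks_data))`.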


In [3]:
df_tracks.T

,0,1,2
scrape_time,2023-04-09 11:16:04,2023-04-09 11:16:04,2023-04-09 11:16:04
id,19bua05ITWjFrk9iGxOA7r,17IEdpDXGZdtrJxjTWoZIL,53QLyfbHqEtXK0ys7YW2Q4
date,2023-03-29,2023-03-10,2023-03-17
name,On & On,Vulnerable,Transmission - Armin van Buuren Remix
url,https://open.spotify.com/track/19bua05ITWjFrk9...,https://open.spotify.com/track/17IEdpDXGZdtrJx...,https://open.spotify.com/track/53QLyfbHqEtXK0y...
album,On & On,Vulnerable,Transmission (Armin van Buuren Remix)
album_genres,,,
popularity,64,60,60
duration_seconds,170.578,199.687,152.012
loudness,-2.855,-3.342,-4.394


In [4]:
df_artists

,scrape_time,id,name,genres,popularity,followers,url
0,2023-04-09 11:16:04,0SfsnGyD8FpIN4U4WCkBZ5,Armin van Buuren,dutch trance;edm;pop dance;progressive house;t...,75,4238022,https://open.spotify.com/artist/0SfsnGyD8FpIN4...
1,2023-04-09 11:16:04,1ocnIbhFWM9bSPrd7Hu4zF,Punctual,dance pop;edm;electro house;house;pop dance;po...,63,31352,https://open.spotify.com/artist/1ocnIbhFWM9bSP...
2,2023-04-09 11:16:04,1U3n3wACHLjc8sS1obMEuf,Alika,,53,2405,https://open.spotify.com/artist/1U3n3wACHLjc8s...
3,2023-04-09 11:16:04,4a4KcYuu2DZrL8sTGyu5sM,Vanessa Campagna,,53,3348,https://open.spotify.com/artist/4a4KcYuu2DZrL8...
4,2023-04-09 11:16:04,1FY8kqUQKHwjibwLbp5cey,Eelke Kleijn,progressive house;tech house,58,99761,https://open.spotify.com/artist/1FY8kqUQKHwjib...
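
Because `genres` is stored as a single `;`-joined string (and is empty for some artists), per-genre analysis needs it split back out. One possible sketch, assuming a toy stand-in for `df_artists`; the IDs, genre values, and the name `genres_long` are illustrative, not scraped data.

```python
import pandas as pd

# Toy stand-in for df_artists (illustrative IDs and genres only).
df = pd.DataFrame({
    "id": ["a1", "a2", "a3"],
    "genres": ["edm;progressive house", "", "tech house"],
})

# Split the ';'-joined string into a list, then emit one row per genre.
genres_long = df.assign(genre=df["genres"].str.split(";")).explode("genre", ignore_index=True)

# Artists without genres produce an empty string after the split; drop those rows.
genres_long = genres_long[genres_long["genre"] != ""][["id", "genre"]].reset_index(drop=True)
print(genres_long)
```

The long format is effectively an artist<>genre edge table, which fits the node/edge layout described in the notes.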


In [5]:
df_network

,scrape_time,date,track_id,artist_id
0,2023-04-09 11:16:04,2023-03-29,19bua05ITWjFrk9iGxOA7r,0SfsnGyD8FpIN4U4WCkBZ5
1,2023-04-09 11:16:04,2023-03-29,19bua05ITWjFrk9iGxOA7r,1ocnIbhFWM9bSPrd7Hu4zF
2,2023-04-09 11:16:04,2023-03-29,19bua05ITWjFrk9iGxOA7r,1U3n3wACHLjc8sS1obMEuf
3,2023-04-09 11:16:04,2023-03-10,17IEdpDXGZdtrJxjTWoZIL,0SfsnGyD8FpIN4U4WCkBZ5
4,2023-04-09 11:16:04,2023-03-10,17IEdpDXGZdtrJxjTWoZIL,4a4KcYuu2DZrL8sTGyu5sM
5,2023-04-09 11:16:04,2023-03-17,53QLyfbHqEtXK0ys7YW2Q4,1FY8kqUQKHwjibwLbp5cey
6,2023-04-09 11:16:04,2023-03-17,53QLyfbHqEtXK0ys7YW2Q4,0SfsnGyD8FpIN4U4WCkBZ5
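
To get from these track<>artist edges to a network of artists, one option is to project the table onto artist<>artist pairs that share a track. This is only a sketch of one possible projection, not necessarily the approach taken later; the `edges` list copies the (track_id, artist_id) pairs from the output above, and the names `artists_by_track` and `coappearances` are made up here.

```python
from collections import Counter, defaultdict
from itertools import combinations

# (track_id, artist_id) pairs copied from the df_network output above.
edges = [
    ("19bua05ITWjFrk9iGxOA7r", "0SfsnGyD8FpIN4U4WCkBZ5"),
    ("19bua05ITWjFrk9iGxOA7r", "1ocnIbhFWM9bSPrd7Hu4zF"),
    ("19bua05ITWjFrk9iGxOA7r", "1U3n3wACHLjc8sS1obMEuf"),
    ("17IEdpDXGZdtrJxjTWoZIL", "0SfsnGyD8FpIN4U4WCkBZ5"),
    ("17IEdpDXGZdtrJxjTWoZIL", "4a4KcYuu2DZrL8sTGyu5sM"),
    ("53QLyfbHqEtXK0ys7YW2Q4", "1FY8kqUQKHwjibwLbp5cey"),
    ("53QLyfbHqEtXK0ys7YW2Q4", "0SfsnGyD8FpIN4U4WCkBZ5"),
]

# Group artists by the track they appear on.
artists_by_track = defaultdict(set)
for track_id, artist_id in edges:
    artists_by_track[track_id].add(artist_id)

# Count how many tracks each unordered pair of artists shares.
coappearances = Counter()
for artist_ids in artists_by_track.values():
    for pair in combinations(sorted(artist_ids), 2):
        coappearances[pair] += 1

for (a, b), n_tracks in sorted(coappearances.items()):
    print(a, b, n_tracks)
```

On this three-track sample the projection yields five distinct artist pairs, each sharing exactly one track.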
