# The National Spotify Stats and Visualization

This small project retrieves Spotify Audio Features for my favorite band, The National, from the Spotify API using the spotipy library, then analyzes and visualizes them.

Read more about the different Spotify Audio Features here:
<https://developer.spotify.com/documentation/web-api/reference/#object-audiofeaturesobject>

## Retrieving Audio Features from Spotify API

In [1]:
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import time
import pandas as pd
import numpy as np

### OAuth2 Authorization

Obtain a client_id and client_secret by creating a new app at: <https://developer.spotify.com/dashboard/applications>

In [2]:
import os

start_time = time.time()

# read the credentials from environment variables instead of hard-coding them
client_id = os.environ["SPOTIPY_CLIENT_ID"]
client_secret = os.environ["SPOTIPY_CLIENT_SECRET"]

In [3]:
# Client Credentials flow for server-to-server authentication
auth_manager = SpotifyClientCredentials(client_id = client_id,
                                        client_secret = client_secret)
# Spotify Client module
sp = spotipy.Spotify(auth_manager=auth_manager)

### Retrieve all albums of The National on Spotify

In [4]:
national_uri = "spotify:artist:2cCUtGK9sDU2EoElnk0GNB"
results = sp.artist_albums(national_uri, 
                           album_type='album', 
                           country='US')
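
`artist_albums` returns a single page of results (spotipy's default `limit` is 20), so an artist with a longer discography would need paging. The sketch below shows one way to follow the `next` links of a paged result. `collect_all_items` and `FakeClient` are hypothetical helpers written for illustration, not part of spotipy; the only spotipy behavior assumed is that the client's `next(page)` method returns the following page, or None when there is none.

```python
def collect_all_items(client, first_page):
    """Follow the 'next' links of a paged result and gather every item.

    `client` is assumed to expose a spotipy-style `next(page)` method that
    returns the following page, or None when there are no more pages.
    """
    items = list(first_page["items"])
    page = first_page
    while page.get("next"):
        page = client.next(page)
        if page is None:
            break
        items.extend(page["items"])
    return items


# Stand-in client so the sketch runs without network access (made-up data)
class FakeClient:
    def __init__(self, pages):
        self._pages = pages

    def next(self, page):
        return self._pages.get(page["next"])


page_two = {"items": [{"name": "Album C"}], "next": None}
page_one = {"items": [{"name": "Album A"}, {"name": "Album B"}], "next": "page-2"}
fake = FakeClient({"page-2": page_two})

all_items = collect_all_items(fake, page_one)
print([item["name"] for item in all_items])
```

With the real client, this would presumably be called as `collect_all_items(sp, results)`.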

In [5]:
# initialize some lists to store results
album_names = []
album_uris = []
album_release_dates = []
album_total_tracks = []

In [6]:
# store info for each album in lists
for album in results['items']:
    album_names.append(album['name'])
    album_uris.append(album['uri'])
    album_release_dates.append(album['release_date'])
    album_total_tracks.append(album['total_tracks'])

In [7]:
for album in album_names:
    print(album)

Sad Songs for Dirty Lovers (2021 Remaster)
Juicy Sonic Magic (Live in Berkeley September 24-25 2018)
I Am Easy to Find
Boxer (Live in Brussels)
Sleep Well Beast
Trouble Will Find Me
High Violet (Expanded Edition)
High Violet
High Violet
High Violet
Boxer
Boxer
Alligator
Sad Songs for Dirty Lovers
The National (2021 Remaster)
The National


Several albums, such as High Violet and Boxer, appear more than once in the Spotify results.
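
A quick way to see which names repeat is to count them. This is a small sketch using the album names printed above; `repeated` is just an illustrative variable name.

```python
from collections import Counter

# album names exactly as printed by the cell above
album_names = [
    "Sad Songs for Dirty Lovers (2021 Remaster)",
    "Juicy Sonic Magic (Live in Berkeley September 24-25 2018)",
    "I Am Easy to Find",
    "Boxer (Live in Brussels)",
    "Sleep Well Beast",
    "Trouble Will Find Me",
    "High Violet (Expanded Edition)",
    "High Violet",
    "High Violet",
    "High Violet",
    "Boxer",
    "Boxer",
    "Alligator",
    "Sad Songs for Dirty Lovers",
    "The National (2021 Remaster)",
    "The National",
]

counts = Counter(album_names)

# keep only the names that occur more than once
repeated = {name: n for name, n in counts.items() if n > 1}
print(repeated)  # {'High Violet': 3, 'Boxer': 2}
```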

### Retrieve list of tracks for each album

Create a nested dictionary to store the track information for each album.

In [8]:
def get_album_tracks(name, uri):
    '''
    Retrieves and stores track information for each album
    '''
    
    national_albums[uri] = {} # create nested dictionary for album
    
    # initialize empty lists inside nested dictionary for album
    national_albums[uri]['album'] = []
    national_albums[uri]['track_number'] = []
    national_albums[uri]['name'] = []
    national_albums[uri]['uri'] = []
    national_albums[uri]['duration_ms'] = []
    
    # get album tracks
    tracks = sp.album_tracks(uri, market='US')
    
    for track in tracks['items']:
        national_albums[uri]['album'].append(name)
        national_albums[uri]['track_number'].append(track['track_number'])
        national_albums[uri]['name'].append(track['name'])
        national_albums[uri]['uri'].append(track['uri'])
        national_albums[uri]['duration_ms'].append(track['duration_ms'])

In [9]:
national_albums = {}

for (name, uri) in zip(album_names, album_uris):
    get_album_tracks(name, uri)
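
After this loop, `national_albums` maps each album URI to a dictionary of parallel lists with one entry per track. The sketch below illustrates that shape with made-up placeholder values (the URIs, names, and durations are hypothetical, not real Spotify data) and checks that every list has the same length, which is what later allows the conversion to a dataframe.

```python
# Hypothetical placeholder data in the shape built by get_album_tracks
example_albums = {
    "spotify:album:EXAMPLE": {
        "album": ["Example Album", "Example Album"],
        "track_number": [1, 2],
        "name": ["First Track", "Second Track"],
        "uri": ["spotify:track:EXAMPLE1", "spotify:track:EXAMPLE2"],
        "duration_ms": [180000, 240000],
    }
}

for album_uri, columns in example_albums.items():
    lengths = {key: len(values) for key, values in columns.items()}
    # every list must hold exactly one entry per track
    assert len(set(lengths.values())) == 1
    print(album_uri, lengths)
```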

In [10]:
def get_track_features(album_uri):
    '''
    Retrieves and stores Spotify audio features and popularity for each track
    '''
    
    print("Retrieving audio features for album: {}".format(national_albums[album_uri]['album'][0]))
    
    feature_list = ['acousticness',
                   'danceability',
                   'energy',
                   'instrumentalness',
                   'liveness',
                   'loudness',
                   'speechiness',
                   'tempo',
                   'valence',
                   'key',
                   'time_signature']

    # initialize to store track features
    for feature in feature_list:
        national_albums[album_uri][feature] = []
    
    national_albums[album_uri]['popularity'] = []
    
    # iterate through each track
    for track in national_albums[album_uri]['uri']:
        try:
            # get track features and popularity
            track_features = sp.audio_features(track)[0]
            track_popularity = sp.track(track)
            
            # popularity is stored in track, separate from audio_features
            national_albums[album_uri]['popularity'].append(track_popularity['popularity'])

            # add audio features for each track
            for feature in feature_list:
                national_albums[album_uri][feature].append(track_features[feature])
                
            time.sleep(1)
            
        # audio_features returns None for tracks without audio features,
        # so indexing the result raises a TypeError
        except TypeError:
            print("Could not retrieve audio features for track: {}".format(track))
            for feature in feature_list:
                national_albums[album_uri][feature].append("NA")

In [11]:
for album in national_albums:
    get_track_features(album)
    time.sleep(5)

Retrieving audio features for album: Sad Songs for Dirty Lovers (2021 Remaster)
Retrieving audio features for album: Juicy Sonic Magic (Live in Berkeley September 24-25 2018)
Could not retrieve audio features for track: spotify:track:1vqZ6Lza85ENA860s8vtCs
Retrieving audio features for album: I Am Easy to Find
Retrieving audio features for album: Boxer (Live in Brussels)
Retrieving audio features for album: Sleep Well Beast
Retrieving audio features for album: Trouble Will Find Me
Retrieving audio features for album: High Violet (Expanded Edition)
Retrieving audio features for album: High Violet
Retrieving audio features for album: High Violet
Retrieving audio features for album: High Violet
Retrieving audio features for album: Boxer
Retrieving audio features for album: Boxer
Retrieving audio features for album: Alligator
Retrieving audio features for album: Sad Songs for Dirty Lovers
Retrieving audio features for album: The National (2021 Remaster)
Retrieving audio features for album: The National
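
The loop above issues separate requests for every track. spotipy's `audio_features` also accepts a list of tracks, so the requests could be batched. The sketch below is only an illustration of that idea: `chunked`, `fetch_features_in_batches`, and `FakeClient` are hypothetical helpers, the batch size of 100 is an assumption based on the Web API's documented per-request maximum, and the stand-in client returns made-up values so the code runs offline.

```python
def chunked(sequence, size):
    """Yield consecutive slices of `sequence` with at most `size` elements."""
    for start in range(0, len(sequence), size):
        yield sequence[start:start + size]


def fetch_features_in_batches(client, track_uris, batch_size=100):
    """Request audio features for many tracks using one call per batch.

    `client` is assumed to behave like a spotipy client, whose
    `audio_features` returns one dict (or None) per requested track.
    """
    features = []
    for batch in chunked(track_uris, batch_size):
        features.extend(client.audio_features(batch))
    return features


# Stand-in client so the sketch runs without network access (made-up values)
class FakeClient:
    def __init__(self):
        self.calls = 0

    def audio_features(self, tracks):
        self.calls += 1
        return [{"uri": uri, "energy": 0.5} for uri in tracks]


fake = FakeClient()
uris = ["spotify:track:EXAMPLE{}".format(i) for i in range(250)]
features = fetch_features_in_batches(fake, uris)
print(len(features), "tracks fetched in", fake.calls, "calls")
```

Tracks without audio features would still come back as None entries and need the same handling as in `get_track_features`.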

In [44]:
# unpack the nested dictionary and convert to a dataframe
national_dict = {}

all_features = list(national_albums[album_uris[0]].keys())

# initialize empty lists in dictionary to store results
for feature in all_features:
    national_dict[feature] = []

# add all results to lists in dictionary
for album in national_albums:
    for feature in national_albums[album]:
        national_dict[feature].extend(national_albums[album][feature])

# create a pandas dataframe
national_df = pd.DataFrame.from_dict(national_dict)
# set the artist explicitly instead of reusing the leftover loop variable `name`
national_df['artist'] = 'The National'
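
The unpacking step could also be written by building one dataframe per album and stacking them with `pd.concat`. This is a sketch of that alternative on made-up placeholder data; `example_albums` and `example_df` are hypothetical names, and the result should match the loop-based approach as long as every album holds the same keys.

```python
import pandas as pd

# Hypothetical placeholder data in the same shape as national_albums
example_albums = {
    "spotify:album:EXAMPLE_A": {
        "album": ["A", "A"],
        "track_number": [1, 2],
        "energy": [0.4, 0.6],
    },
    "spotify:album:EXAMPLE_B": {
        "album": ["B"],
        "track_number": [1],
        "energy": [0.8],
    },
}

# one dataframe per album, stacked into a single dataframe
example_df = pd.concat(
    [pd.DataFrame(columns) for columns in example_albums.values()],
    ignore_index=True,
)
print(example_df)
```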

## Dataframe cleaning

The results contain duplicate albums, remasters, and releases that are not major studio albums (for example, the live album Juicy Sonic Magic). The dataframe will be filtered down to the original major studio albums, excluding remasters.

The number of plays is not available from the Spotify API and would need to be added to the csv file manually.

Popularity provides a proxy for the number of plays.

"The popularity of a track is a value between 0 and 100, with 100 being the most popular. The popularity is calculated by algorithm and is based, in the most part, on the total number of plays the track has had and how recent those plays are." - from <https://developer.spotify.com/documentation/web-api/reference/#object-trackobject>

In [45]:
# drop rows that are exact duplicates in every column
national_df.drop_duplicates(inplace=True)
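
Called without arguments, `drop_duplicates` only removes rows that match in every column, so the same song listed under two different track URIs would be kept. The sketch below contrasts that default with the `subset` parameter on made-up rows; the column values are hypothetical.

```python
import pandas as pd

# Hypothetical rows: the same song listed under two different track URIs
tracks = pd.DataFrame({
    "album": ["Example Album", "Example Album", "Example Album"],
    "name": ["Song One", "Song One", "Song Two"],
    "uri": [
        "spotify:track:EXAMPLE1",
        "spotify:track:EXAMPLE2",
        "spotify:track:EXAMPLE3",
    ],
})

# default: rows must match in every column, so nothing is dropped here
all_columns = tracks.drop_duplicates()

# subset: rows that match on album and name are treated as duplicates
by_name = tracks.drop_duplicates(subset=["album", "name"])

print(len(all_columns), len(by_name))  # 3 2
```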

In [46]:
# Set NA values to be NaN
national_df.replace({'NA': np.nan}, inplace=True)

In [47]:
# show NaNs
national_df[national_df.isna().any(axis=1)]

,album,track_number,name,uri,duration_ms,acousticness,danceability,energy,instrumentalness,liveness,loudness,speechiness,tempo,valence,key,time_signature,popularity,artist
48,Juicy Sonic Magic (Live in Berkeley September ...,15,England - Live at Berkeley 9/25/18,spotify:track:1vqZ6Lza85ENA860s8vtCs,324191,,,,,,,,,,,,16,The National


In [48]:
# drop row with NaN
national_df.dropna(inplace = True)

Only one track, England - Live at Berkeley 9/25/18 (spotify:track:1vqZ6Lza85ENA860s8vtCs), had no Audio Features associated with it.
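
An alternative to replacing the "NA" placeholder by hand is to coerce each feature column with `pd.to_numeric`. The sketch below uses a made-up column (the values are illustrative) and shows that `errors="coerce"` turns the placeholder into NaN while producing a float column.

```python
import pandas as pd

# Hypothetical feature column containing the "NA" placeholder string
energy = pd.Series([0.667, "NA", 0.392])

# errors="coerce" turns anything non-numeric into NaN
energy_numeric = pd.to_numeric(energy, errors="coerce")

print(energy_numeric.dtype)
print(energy_numeric.isna().sum())
```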

In [49]:
# set album order
album_order = {
    'The National': 0,
    'Sad Songs for Dirty Lovers': 1,
    'Alligator': 2,
    'Boxer': 3,
    'High Violet': 4,
    'Trouble Will Find Me': 5,
    'Sleep Well Beast': 6,
    'I Am Easy to Find': 7}

national_df['album_order'] = national_df['album'].map(album_order)
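
The chronological order could also be encoded with an ordered categorical instead of a separate `album_order` column. This is a sketch of that alternative on made-up rows; `example_df` and `sorted_df` are hypothetical names. Album names missing from the category list would become NaN, much like unmapped names do with `map`.

```python
import pandas as pd

# studio albums in release order, as listed in album_order above
studio_albums = [
    "The National",
    "Sad Songs for Dirty Lovers",
    "Alligator",
    "Boxer",
    "High Violet",
    "Trouble Will Find Me",
    "Sleep Well Beast",
    "I Am Easy to Find",
]

# Hypothetical rows, deliberately out of order
example_df = pd.DataFrame({
    "album": ["Boxer", "The National", "Alligator"],
    "track_number": [1, 1, 1],
})

# an ordered categorical sorts by position in `categories`, not alphabetically
example_df["album"] = pd.Categorical(
    example_df["album"], categories=studio_albums, ordered=True
)

sorted_df = example_df.sort_values(["album", "track_number"]).reset_index(drop=True)
print(sorted_df["album"].tolist())  # ['The National', 'Alligator', 'Boxer']
```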

In [50]:
# create a new dataframe with only the main studio albums

studio_albums = ['The National',
                'Sad Songs for Dirty Lovers',
                'Alligator',
                'Boxer',
                'High Violet',
                'Trouble Will Find Me',
                'Sleep Well Beast',
                'I Am Easy to Find']

studio_albums_df = national_df[national_df['album']\
                    .isin(studio_albums)]\
                    .sort_values(by = ['album_order', 'track_number'])\
                    .reset_index(drop = True)

studio_albums_df

,album,track_number,name,uri,duration_ms,acousticness,danceability,energy,instrumentalness,liveness,loudness,speechiness,tempo,valence,key,time_signature,popularity,artist,album_order
0,The National,1,Beautiful Head,spotify:track:5qjxbESLSYtV20JKeiwsaZ,188400,0.5470,0.564,0.667,0.04300,0.0959,-8.774,0.0529,148.091,0.5460,4.0,4.0,27,The National,0.0
1,The National,2,Cold Girl Fever,spotify:track:77JYrFDjiqBSUQ6qMLfhaL,246000,0.7560,0.788,0.392,0.75900,0.1070,-11.739,0.0432,118.013,0.5730,9.0,4.0,40,The National,0.0
2,The National,3,The Perfect Song,spotify:track:1cZnwa4gquHoJ6jTIAYAkg,195200,0.0268,0.541,0.521,0.28100,0.0913,-9.377,0.0313,125.675,0.4800,7.0,4.0,28,The National,0.0
3,The National,4,American Mary,spotify:track:1IGBD3TlzViHUHb1FKeEHz,242893,0.5630,0.642,0.318,0.00643,0.0939,-10.876,0.0268,135.867,0.3360,7.0,4.0,26,The National,0.0
4,The National,5,Son,spotify:track:2qNifuc2Z75HavDfBpCuGf,319733,0.0532,0.762,0.480,0.69300,0.0800,-10.709,0.0303,125.025,0.3050,0.0,4.0,25,The National,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
98,I Am Easy to Find,12,Dust Swirls In Strange Light,spotify:track:7esYGv7h7qbaUO1q6MdqKh,198800,0.6570,0.603,0.563,0.11600,0.1280,-9.959,0.0344,112.059,0.1860,4.0,4.0,36,The National,7.0
99,I Am Easy to Find,13,Hairpin Turns,spotify:track:30SPSWQRfDagq4lJ2KoaR0,267520,0.5220,0.650,0.449,0.07020,0.0906,-9.200,0.0471,136.091,0.4160,8.0,4.0,40,The National,7.0
100,I Am Easy to Find,14,Rylan,spotify:track:6XxPXXqkE4lG7MVkpom6F8,223853,0.4030,0.609,0.789,0.07740,0.0446,-5.565,0.0466,144.050,0.7240,11.0,4.0,51,The National,7.0
101,I Am Easy to Find,15,Underwater,spotify:track:2jMqVZRotAEs6h6RHkWi0e,81480,0.9940,0.195,0.250,0.96700,0.1340,-14.277,0.0341,78.099,0.0376,5.0,4.0,36,The National,7.0


In [51]:
# set data types

data_types = {'duration_ms': 'int64',
             'acousticness': 'float64',
             'danceability': 'float64',
             'energy': 'float64',
             'instrumentalness': 'float64',
             'liveness': 'float64',
             'loudness': 'float64',
             'speechiness': 'float64',
             'tempo': 'float64',
             'valence': 'float64',
             'key': 'int64',
             'time_signature': 'int64',
             'popularity': 'int64',
             'album_order': 'int64'}

In [52]:
studio_albums_df = studio_albums_df.astype(data_types)
studio_albums_df.dtypes

album                object
track_number          int64
name                 object
uri                  object
duration_ms           int64
acousticness        float64
danceability        float64
energy              float64
instrumentalness    float64
liveness            float64
loudness            float64
speechiness         float64
tempo               float64
valence             float64
key                   int64
time_signature        int64
popularity            int64
artist               object
album_order           int64
dtype: object
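
The cast to int64 succeeds here because the row with missing values was dropped earlier. If missing values were still present, a plain int64 cast would fail; pandas' nullable `Int64` dtype is one possible alternative. The sketch below demonstrates both on a made-up column.

```python
import numpy as np
import pandas as pd

# Hypothetical 'key' column with one missing value
key = pd.Series([4.0, np.nan, 7.0])

try:
    key.astype("int64")
    cast_failed = False
except ValueError:
    # NumPy integers cannot represent NaN
    cast_failed = True

# the nullable extension dtype keeps the gap as <NA>
key_nullable = key.astype("Int64")

print(cast_failed, key_nullable.dtype)
```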

In [53]:
studio_albums_df.head()

,album,track_number,name,uri,duration_ms,acousticness,danceability,energy,instrumentalness,liveness,loudness,speechiness,tempo,valence,key,time_signature,popularity,artist,album_order
0,The National,1,Beautiful Head,spotify:track:5qjxbESLSYtV20JKeiwsaZ,188400,0.547,0.564,0.667,0.043,0.0959,-8.774,0.0529,148.091,0.546,4,4,27,The National,0
1,The National,2,Cold Girl Fever,spotify:track:77JYrFDjiqBSUQ6qMLfhaL,246000,0.756,0.788,0.392,0.759,0.107,-11.739,0.0432,118.013,0.573,9,4,40,The National,0
2,The National,3,The Perfect Song,spotify:track:1cZnwa4gquHoJ6jTIAYAkg,195200,0.0268,0.541,0.521,0.281,0.0913,-9.377,0.0313,125.675,0.48,7,4,28,The National,0
3,The National,4,American Mary,spotify:track:1IGBD3TlzViHUHb1FKeEHz,242893,0.563,0.642,0.318,0.00643,0.0939,-10.876,0.0268,135.867,0.336,7,4,26,The National,0
4,The National,5,Son,spotify:track:2qNifuc2Z75HavDfBpCuGf,319733,0.0532,0.762,0.48,0.693,0.08,-10.709,0.0303,125.025,0.305,0,4,25,The National,0


In [54]:
# export dataframe to .csv 
studio_albums_df.to_csv("National_studio_albums.csv",
                       index = False)

### Original vs Remastered Albums

In [55]:
self_title = national_df[(national_df['album'] == 'The National') |\
            (national_df['album'] == 'The National (2021 Remaster)')]\
            .sort_values(by = ['track_number', 'album'])\
            .reset_index(drop = True)

In [56]:
sad_songs = national_df[(national_df['album'] == 'Sad Songs for Dirty Lovers') |\
            (national_df['album'] == 'Sad Songs for Dirty Lovers (2021 Remaster)')]\
            .sort_values(by = ['track_number', 'album'])\
            .reset_index(drop = True)

In [57]:
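# numeric_only=True skips the text columns (name, uri, artist), which cannot be averaged
self_title.groupby('album').mean(numeric_only=True).round(2)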

album,track_number,duration_ms,acousticness,danceability,energy,instrumentalness,liveness,loudness,speechiness,tempo,valence,key,time_signature,popularity,album_order
The National,6.5,219713.08,0.34,0.64,0.46,0.31,0.15,-11.31,0.05,123.32,0.4,4.42,3.67,26.17,0.0
The National (2021 Remaster),6.5,219856.42,0.36,0.64,0.45,0.38,0.15,-11.82,0.05,123.56,0.41,4.58,3.67,18.83,


In [58]:
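# numeric_only=True skips the text columns (name, uri, artist), which cannot be averaged
sad_songs.groupby('album').mean(numeric_only=True).round(2)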

album,track_number,duration_ms,acousticness,danceability,energy,instrumentalness,liveness,loudness,speechiness,tempo,valence,key,time_signature,popularity,album_order
Sad Songs for Dirty Lovers,6.5,224970.75,0.34,0.51,0.58,0.52,0.16,-8.26,0.04,130.08,0.38,3.58,4.0,29.5,1.0
Sad Songs for Dirty Lovers (2021 Remaster),6.5,225001.92,0.31,0.53,0.59,0.5,0.17,-7.87,0.04,130.11,0.39,4.33,4.0,34.92,
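
To make the original and remaster easier to compare, the two rows of means can be subtracted. The sketch below does this for three features of the self-titled album, using the mean values copied from its table above; `means` and `difference` are illustrative names.

```python
import pandas as pd

# mean values copied from the self-titled album's comparison table above
means = pd.DataFrame(
    {
        "energy": [0.46, 0.45],
        "loudness": [-11.31, -11.82],
        "popularity": [26.17, 18.83],
    },
    index=["The National", "The National (2021 Remaster)"],
)

# remaster minus original, per feature
difference = (
    means.loc["The National (2021 Remaster)"] - means.loc["The National"]
).round(2)
print(difference)
```

A negative value means the remaster scores lower than the original on that feature.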
