# 0 - Accessing ML Audio Features from Spotify
---------------
This notebook demonstrates how to access the Spotify audio features that we will feed into a random forest classifier to automatically build a new playlist based on an existing one.

We will be using [`spotipy`](https://spotipy.readthedocs.io/en/2.22.1/), a third-party Python client for the Spotify Web API, to access our playlists and their metadata in Python.

In [1]:
## Relevant imports
from typing import Union, Iterable

from dotenv import dotenv_values
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import pandas as pd
import numpy as np

The first thing we need is a client ID and secret from [Spotify's developer API](https://developer.spotify.com/). Go through the steps of registering for a developer account and make an app in accordance with [these directions](https://developer.spotify.com/documentation/web-api/tutorials/getting-started).

Once you have these keys, don't paste them directly into a notebook, particularly if it's a public one! Each user should use their own client ID and secret. We use the [dotenv package](https://github.com/theskumar/python-dotenv) to keep them out of the notebook. Create a file named `.env` in the `spotify_recs/tutorial` directory and paste your client ID and secret there in the following format:

        SPOTIFY_CLIENT_ID=<your_client_id_string>
        SPOTIFY_CLIENT_SECRET=<your_client_secret_string>

Then run the next cell below.

In [2]:
# Set up Spotify API client credentials
config = dotenv_values('.env')
client_id = config["SPOTIFY_CLIENT_ID"]
client_secret = config["SPOTIFY_CLIENT_SECRET"]

client_credentials_manager = SpotifyClientCredentials(client_id, client_secret)
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)
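
If a key is missing from `.env`, the cell above fails with a bare `KeyError`. As a minimal sketch, the check below reports which key is missing. `read_credentials` and `REQUIRED_KEYS` are hypothetical names that are not part of the notebook, and a plain `dict` with placeholder values stands in for the result of `dotenv_values('.env')`.

```python
from typing import Mapping, Optional, Tuple

# The two keys the notebook expects to find in the .env file
REQUIRED_KEYS = ("SPOTIFY_CLIENT_ID", "SPOTIFY_CLIENT_SECRET")


def read_credentials(config: Mapping[str, Optional[str]]) -> Tuple[str, str]:
    """Return (client_id, client_secret), or raise if either is missing or empty."""
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise KeyError(f"Missing in .env: {', '.join(missing)}")
    return config["SPOTIFY_CLIENT_ID"], config["SPOTIFY_CLIENT_SECRET"]


# Stand-in for dotenv_values('.env'); the values are placeholders, not real keys
example_config = {"SPOTIFY_CLIENT_ID": "abc", "SPOTIFY_CLIENT_SECRET": "xyz"}
client_id, client_secret = read_credentials(example_config)
```

In the notebook itself, the same function could be called as `read_credentials(dotenv_values('.env'))`.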

The `Playlist` class below puts all of the information returned by `sp.playlist` into a more machine-learning-friendly format, namely the `ml_data` (features) and `ml_likes` (labels) attributes. All you have to do is create a `Playlist` instance, initializing it with the Spotify playlist ID.

In [3]:
class Playlist():
    '''Access all audio features of a Spotify playlist in a machine-learning
    friendly format.'''

    def __init__(self, playlist_id):
        ## Get track metadata
        self.playlist = sp.playlist(playlist_id=playlist_id)
        self.raw_tracks = self.playlist['tracks']['items']
        self.track_attribute_labels = list(self.raw_tracks[0]['track'].keys())
        self.length = len(self.raw_tracks)

        ## All useful information
        self.track_ids = self.get_info('id')
        self.track_names = self.get_info('name')
        self.track_artists = self.get_info('artists')

        audio_features = sp.audio_features(self.track_ids)
        self.audio_features = [{k.strip():v for k,v in d.items()} 
                               for d in audio_features]
        
        ## Machine Learning-friendly formatting
        self.data = pd.DataFrame(index=[self.track_ids, self.track_names], 
                                 data=self.audio_features)
        self.data = self.data.sort_index(axis='columns')
        self.data.columns = self.data.columns.str.strip()
        self.data.index.names = ['id', 'name']
        self.data['like'] = np.nan
        self.audio_feature_labels = self.data.columns

        ## ML specifically (e.g. random forest)
        self.ml_feature_labels = list(set(self.audio_feature_labels)-\
                {'type','id','uri','track_href','analysis_url','like'})
        # X
        self.ml_data = self.data[self.ml_feature_labels].sort_index(
                                                        axis='columns')
        # y, or "likes". Initialize to NaN
        self.ml_likes = pd.DataFrame(index=self.ml_data.index, 
                                     columns=['like'],
                                     data=[np.nan]*self.length)

    def get_info(self, info_tag:str):
        '''Unnest information from raw_tracks dict.'''

        if any([info_tag in track['track'].keys() 
                                            for track in self.raw_tracks]):
            attributes = [track['track'][info_tag] 
                            if info_tag in track['track'].keys()
                            else None
                            for track in self.raw_tracks]
            return attributes
        
    def set_like_status(self, like_labels:Union[Iterable, int]):
        '''Set the like status of songs'''

        if isinstance(like_labels, Iterable):
            if len(like_labels) == self.data.shape[0] and \
                not isinstance(like_labels, (dict, pd.Series)):
                # A list-like object of ordered like values
                likes = list(like_labels)
                self.data.loc[:,'like'] = likes
                self.ml_likes.loc[:,'like'] = likes
            elif isinstance(like_labels, (dict, pd.Series)):
                # Set the like value of specific songs, keyed by track ID
                if isinstance(like_labels, dict):
                    keys = list(like_labels.keys())
                    vals = list(like_labels.values())
                else:
                    # Plain lists avoid index alignment against the MultiIndex
                    keys = list(like_labels.index)
                    vals = list(like_labels)
                try:
                    self.data.loc[keys, 'like'] = vals
                    self.ml_likes.loc[keys, 'like'] = vals
                except KeyError:
                    print("Specified songs don't exist in playlist. "
                          "Like status not set.")

        elif isinstance(like_labels, int):
            # Set all like values of playlist to one number
            likes = [int(like_labels)]*self.data.shape[0]
            self.data.loc[:,'like'] = likes
            self.ml_likes.loc[:,'like'] = likes
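
One caveat for longer playlists: as far as I know, `sp.audio_features` accepts at most 100 track IDs per call, so the single call in `__init__` would need to be split into batches. The sketch below shows only the batching step. `batched_ids` and `MAX_IDS_PER_REQUEST` are hypothetical names that are not part of the class, and the limit of 100 is an assumption to check against the spotipy documentation.

```python
from typing import Iterator, List, Sequence

# Assumed per-request limit of the audio-features endpoint
MAX_IDS_PER_REQUEST = 100


def batched_ids(track_ids: Sequence[str],
                size: int = MAX_IDS_PER_REQUEST) -> Iterator[List[str]]:
    """Yield consecutive slices of track_ids with at most `size` items each."""
    for start in range(0, len(track_ids), size):
        yield list(track_ids[start:start + size])


# Hypothetical use inside Playlist.__init__, with the `sp` client defined earlier:
# audio_features = []
# for batch in batched_ids(self.track_ids):
#     audio_features.extend(sp.audio_features(batch))

# Quick check with made-up IDs
fake_ids = [f"track{i}" for i in range(250)]
batch_sizes = [len(batch) for batch in batched_ids(fake_ids)]
print(batch_sizes)  # [100, 100, 50]
```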

To access the metadata of your own playlist, all you need is the playlist's ID, which is the path segment that follows `/playlist/` in the playlist's share URL. An example is shown below:

In [4]:
# Specify the playlist ID for which you want to obtain song IDs and attributes
rb_playlist_id = '5Os91wBXvZCdwiOa2aJV5I' # https://open.spotify.com/playlist/5Os91wBXvZCdwiOa2aJV5I?si=22eadc4636634b6d

# Get the playlist information using the playlist ID
rb_playlist = Playlist(rb_playlist_id)
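
If you would rather paste the whole share URL than pick the ID out by hand, the ID can be extracted with the standard library. This is a minimal sketch; `playlist_id_from_url` is a hypothetical helper that is not part of the notebook, and it assumes the URL has the `/playlist/<id>` shape shown in the comment above.

```python
from urllib.parse import urlparse


def playlist_id_from_url(url: str) -> str:
    """Return the path segment that follows 'playlist' in a Spotify share URL."""
    # urlparse separates the path from the '?si=...' query string
    parts = [part for part in urlparse(url).path.split("/") if part]
    if "playlist" not in parts or parts.index("playlist") + 1 >= len(parts):
        raise ValueError(f"No playlist ID found in URL: {url}")
    return parts[parts.index("playlist") + 1]


share_url = "https://open.spotify.com/playlist/5Os91wBXvZCdwiOa2aJV5I?si=22eadc4636634b6d"
print(playlist_id_from_url(share_url))  # 5Os91wBXvZCdwiOa2aJV5I
```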

In [10]:
df = pd.DataFrame(index=[rb_playlist.track_ids, rb_playlist.track_names], 
                                 data=rb_playlist.audio_features)

df

,,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
5awNIWVrh2ISfvPd5IUZNh,PTT (Paint The Town),0.781,0.814,8,-2.437,1,0.1,0.0735,1.9e-05,0.0565,0.546,124.028,audio_features,5awNIWVrh2ISfvPd5IUZNh,spotify:track:5awNIWVrh2ISfvPd5IUZNh,https://api.spotify.com/v1/tracks/5awNIWVrh2IS...,https://api.spotify.com/v1/audio-analysis/5awN...,201120,4
65497v9jzFfu5sSA8ADap5,Sistema De Patio,0.914,0.948,6,-2.243,0,0.21,0.07,0.000782,0.328,0.694,109.999,audio_features,65497v9jzFfu5sSA8ADap5,spotify:track:65497v9jzFfu5sSA8ADap5,https://api.spotify.com/v1/tracks/65497v9jzFfu...,https://api.spotify.com/v1/audio-analysis/6549...,134223,4
7I0vl97TYZW86qYdgcD1qu,Back Door,0.82,0.842,7,-3.853,1,0.272,0.157,0.0,0.112,0.653,107.952,audio_features,7I0vl97TYZW86qYdgcD1qu,spotify:track:7I0vl97TYZW86qYdgcD1qu,https://api.spotify.com/v1/tracks/7I0vl97TYZW8...,https://api.spotify.com/v1/audio-analysis/7I0v...,189287,4
7dMTCS9BLzBqYTlAuHP8TM,XT4S1S,0.797,0.677,9,-4.163,0,0.0857,0.101,0.0,0.441,0.919,130.034,audio_features,7dMTCS9BLzBqYTlAuHP8TM,spotify:track:7dMTCS9BLzBqYTlAuHP8TM,https://api.spotify.com/v1/tracks/7dMTCS9BLzBq...,https://api.spotify.com/v1/audio-analysis/7dMT...,219320,4
0Z99Xe1lGBmq60RwJ5YU18,PING PONG,0.855,0.87,10,-2.059,0,0.0826,0.0122,0.000675,0.0202,0.863,103.028,audio_features,0Z99Xe1lGBmq60RwJ5YU18,spotify:track:0Z99Xe1lGBmq60RwJ5YU18,https://api.spotify.com/v1/tracks/0Z99Xe1lGBmq...,https://api.spotify.com/v1/audio-analysis/0Z99...,158253,4
1t8sqIScEIP0B4bQzBuI2P,MY BAG,0.838,0.823,8,-3.929,0,0.164,0.0864,7.1e-05,0.156,0.759,94.024,audio_features,1t8sqIScEIP0B4bQzBuI2P,spotify:track:1t8sqIScEIP0B4bQzBuI2P,https://api.spotify.com/v1/tracks/1t8sqIScEIP0...,https://api.spotify.com/v1/audio-analysis/1t8s...,160520,4
0bspC5fLlWgwCM6Rx1YUIh,BIG MAD,0.958,0.523,8,-6.1,1,0.0591,0.0293,0.0,0.106,0.309,102.039,audio_features,0bspC5fLlWgwCM6Rx1YUIh,spotify:track:0bspC5fLlWgwCM6Rx1YUIh,https://api.spotify.com/v1/tracks/0bspC5fLlWgw...,https://api.spotify.com/v1/audio-analysis/0bsp...,143111,4
7gRFDGEzF9UkBV233yv2dc,Shut Down,0.818,0.69,0,-5.119,1,0.0454,0.00327,0.0,0.186,0.688,110.071,audio_features,7gRFDGEzF9UkBV233yv2dc,spotify:track:7gRFDGEzF9UkBV233yv2dc,https://api.spotify.com/v1/tracks/7gRFDGEzF9Uk...,https://api.spotify.com/v1/audio-analysis/7gRF...,175889,3
4kXxEhuatrvwrTQycA7s9B,BIZCOCHITO,0.93,0.774,1,-4.203,1,0.0744,0.605,0.0238,0.12,0.781,116.972,audio_features,4kXxEhuatrvwrTQycA7s9B,spotify:track:4kXxEhuatrvwrTQycA7s9B,https://api.spotify.com/v1/tracks/4kXxEhuatrvw...,https://api.spotify.com/v1/audio-analysis/4kXx...,109120,4
0W6I02J9xcqK8MtSeosEXb,Get Into It (Yuh),0.912,0.655,8,-6.029,0,0.162,0.315,3.4e-05,0.0894,0.793,91.942,audio_features,0W6I02J9xcqK8MtSeosEXb,spotify:track:0W6I02J9xcqK8MtSeosEXb,https://api.spotify.com/v1/tracks/0W6I02J9xcqK...,https://api.spotify.com/v1/audio-analysis/0W6I...,138293,4


In [8]:
rb_playlist.data

id,name,acousticness,analysis_url,danceability,duration_ms,energy,id,instrumentalness,key,liveness,loudness,mode,speechiness,tempo,time_signature,track_href,type,uri,valence,like
5awNIWVrh2ISfvPd5IUZNh,PTT (Paint The Town),0.0735,https://api.spotify.com/v1/audio-analysis/5awN...,0.781,201120,0.814,5awNIWVrh2ISfvPd5IUZNh,1.9e-05,8,0.0565,-2.437,1,0.1,124.028,4,https://api.spotify.com/v1/tracks/5awNIWVrh2IS...,audio_features,spotify:track:5awNIWVrh2ISfvPd5IUZNh,0.546,1.0
65497v9jzFfu5sSA8ADap5,Sistema De Patio,0.07,https://api.spotify.com/v1/audio-analysis/6549...,0.914,134223,0.948,65497v9jzFfu5sSA8ADap5,0.000782,6,0.328,-2.243,0,0.21,109.999,4,https://api.spotify.com/v1/tracks/65497v9jzFfu...,audio_features,spotify:track:65497v9jzFfu5sSA8ADap5,0.694,1.0
7I0vl97TYZW86qYdgcD1qu,Back Door,0.157,https://api.spotify.com/v1/audio-analysis/7I0v...,0.82,189287,0.842,7I0vl97TYZW86qYdgcD1qu,0.0,7,0.112,-3.853,1,0.272,107.952,4,https://api.spotify.com/v1/tracks/7I0vl97TYZW8...,audio_features,spotify:track:7I0vl97TYZW86qYdgcD1qu,0.653,1.0
7dMTCS9BLzBqYTlAuHP8TM,XT4S1S,0.101,https://api.spotify.com/v1/audio-analysis/7dMT...,0.797,219320,0.677,7dMTCS9BLzBqYTlAuHP8TM,0.0,9,0.441,-4.163,0,0.0857,130.034,4,https://api.spotify.com/v1/tracks/7dMTCS9BLzBq...,audio_features,spotify:track:7dMTCS9BLzBqYTlAuHP8TM,0.919,1.0
0Z99Xe1lGBmq60RwJ5YU18,PING PONG,0.0122,https://api.spotify.com/v1/audio-analysis/0Z99...,0.855,158253,0.87,0Z99Xe1lGBmq60RwJ5YU18,0.000675,10,0.0202,-2.059,0,0.0826,103.028,4,https://api.spotify.com/v1/tracks/0Z99Xe1lGBmq...,audio_features,spotify:track:0Z99Xe1lGBmq60RwJ5YU18,0.863,1.0
1t8sqIScEIP0B4bQzBuI2P,MY BAG,0.0864,https://api.spotify.com/v1/audio-analysis/1t8s...,0.838,160520,0.823,1t8sqIScEIP0B4bQzBuI2P,7.1e-05,8,0.156,-3.929,0,0.164,94.024,4,https://api.spotify.com/v1/tracks/1t8sqIScEIP0...,audio_features,spotify:track:1t8sqIScEIP0B4bQzBuI2P,0.759,1.0
0bspC5fLlWgwCM6Rx1YUIh,BIG MAD,0.0293,https://api.spotify.com/v1/audio-analysis/0bsp...,0.958,143111,0.523,0bspC5fLlWgwCM6Rx1YUIh,0.0,8,0.106,-6.1,1,0.0591,102.039,4,https://api.spotify.com/v1/tracks/0bspC5fLlWgw...,audio_features,spotify:track:0bspC5fLlWgwCM6Rx1YUIh,0.309,1.0
7gRFDGEzF9UkBV233yv2dc,Shut Down,0.00327,https://api.spotify.com/v1/audio-analysis/7gRF...,0.818,175889,0.69,7gRFDGEzF9UkBV233yv2dc,0.0,0,0.186,-5.119,1,0.0454,110.071,3,https://api.spotify.com/v1/tracks/7gRFDGEzF9Uk...,audio_features,spotify:track:7gRFDGEzF9UkBV233yv2dc,0.688,1.0
4kXxEhuatrvwrTQycA7s9B,BIZCOCHITO,0.605,https://api.spotify.com/v1/audio-analysis/4kXx...,0.93,109120,0.774,4kXxEhuatrvwrTQycA7s9B,0.0238,1,0.12,-4.203,1,0.0744,116.972,4,https://api.spotify.com/v1/tracks/4kXxEhuatrvw...,audio_features,spotify:track:4kXxEhuatrvwrTQycA7s9B,0.781,1.0
0W6I02J9xcqK8MtSeosEXb,Get Into It (Yuh),0.315,https://api.spotify.com/v1/audio-analysis/0W6I...,0.912,138293,0.655,0W6I02J9xcqK8MtSeosEXb,3.4e-05,8,0.0894,-6.029,0,0.162,91.942,4,https://api.spotify.com/v1/tracks/0W6I02J9xcqK...,audio_features,spotify:track:0W6I02J9xcqK8MtSeosEXb,0.793,1.0


In [7]:
rb_playlist.data.columns

Index(['acousticness', 'analysis_url', 'danceability', 'duration_ms', 'energy',
       'id', 'instrumentalness', 'key', 'liveness', 'loudness', 'mode',
       'speechiness', 'tempo', 'time_signature', 'track_href', 'type', 'uri',
       'valence', 'like'],
      dtype='object')

In [5]:
rb_playlist.ml_data

id,name,acousticness,danceability,duration_ms,energy,instrumentalness,key,liveness,loudness,mode,speechiness,tempo,time_signature,valence
5awNIWVrh2ISfvPd5IUZNh,PTT (Paint The Town),0.0735,0.781,201120,0.814,1.9e-05,8,0.0565,-2.437,1,0.1,124.028,4,0.546
65497v9jzFfu5sSA8ADap5,Sistema De Patio,0.07,0.914,134223,0.948,0.000782,6,0.328,-2.243,0,0.21,109.999,4,0.694
7I0vl97TYZW86qYdgcD1qu,Back Door,0.157,0.82,189287,0.842,0.0,7,0.112,-3.853,1,0.272,107.952,4,0.653
7dMTCS9BLzBqYTlAuHP8TM,XT4S1S,0.101,0.797,219320,0.677,0.0,9,0.441,-4.163,0,0.0857,130.034,4,0.919
0Z99Xe1lGBmq60RwJ5YU18,PING PONG,0.0122,0.855,158253,0.87,0.000675,10,0.0202,-2.059,0,0.0826,103.028,4,0.863
1t8sqIScEIP0B4bQzBuI2P,MY BAG,0.0864,0.838,160520,0.823,7.1e-05,8,0.156,-3.929,0,0.164,94.024,4,0.759
0bspC5fLlWgwCM6Rx1YUIh,BIG MAD,0.0293,0.958,143111,0.523,0.0,8,0.106,-6.1,1,0.0591,102.039,4,0.309
7gRFDGEzF9UkBV233yv2dc,Shut Down,0.00327,0.818,175889,0.69,0.0,0,0.186,-5.119,1,0.0454,110.071,3,0.688
4kXxEhuatrvwrTQycA7s9B,BIZCOCHITO,0.605,0.93,109120,0.774,0.0238,1,0.12,-4.203,1,0.0744,116.972,4,0.781
0W6I02J9xcqK8MtSeosEXb,Get Into It (Yuh),0.315,0.912,138293,0.655,3.4e-05,8,0.0894,-6.029,0,0.162,91.942,4,0.793


Audio features are the metrics Spotify uses to describe "the vibe" of songs in a quantitative way. Explanations of these features are listed in the [Spotify Web API reference](https://developer.spotify.com/documentation/web-api/reference/get-several-audio-features). Because each feature is a single number per track, they work well as inputs to machine learning models, especially tree-based ones like [random forest](https://towardsdatascience.com/understanding-random-forest-58381e0602d2)!
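
To make the split between descriptive features and bookkeeping fields concrete, here is a minimal sketch that reduces one raw audio-features record to the columns kept in `ml_data`. The record is the first row of the table above (the truncated URLs are left exactly as displayed), and `to_feature_row` and `NON_FEATURE_KEYS` are hypothetical names that mirror the set subtracted in `Playlist.__init__`.

```python
# Fields that identify the track rather than describe how it sounds
NON_FEATURE_KEYS = {"type", "id", "uri", "track_href", "analysis_url"}


def to_feature_row(audio_features: dict) -> dict:
    """Keep only the descriptive audio features, in alphabetical order."""
    return {key: audio_features[key]
            for key in sorted(audio_features)
            if key not in NON_FEATURE_KEYS}


record = {
    "danceability": 0.781, "energy": 0.814, "key": 8, "loudness": -2.437,
    "mode": 1, "speechiness": 0.1, "acousticness": 0.0735,
    "instrumentalness": 1.9e-05, "liveness": 0.0565, "valence": 0.546,
    "tempo": 124.028, "type": "audio_features",
    "id": "5awNIWVrh2ISfvPd5IUZNh",
    "uri": "spotify:track:5awNIWVrh2ISfvPd5IUZNh",
    "track_href": "https://api.spotify.com/v1/tracks/5awNIWVrh2IS...",
    "analysis_url": "https://api.spotify.com/v1/audio-analysis/5awN...",
    "duration_ms": 201120, "time_signature": 4,
}

row = to_feature_row(record)
print(len(row))  # 13
```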

## Liking Songs

In order to use machine learning to recommend a playlist, we need to define how we decide whether to include a given song. The easiest way is to define a "like" attribute that is 1 if you like the song and 0 if you don't (or don't want it in your playlist). `set_like_status` accepts three kinds of input: an integer (0 or 1), which sets every song in the playlist to that value; a list of 1s and 0s in the order the songs appear in the playlist; or a `dict` or `pandas` `Series` that maps track IDs to like values.

In [6]:
rb_playlist.set_like_status(1)
rb_playlist.ml_likes

id,name,like
5awNIWVrh2ISfvPd5IUZNh,PTT (Paint The Town),1.0
65497v9jzFfu5sSA8ADap5,Sistema De Patio,1.0
7I0vl97TYZW86qYdgcD1qu,Back Door,1.0
7dMTCS9BLzBqYTlAuHP8TM,XT4S1S,1.0
0Z99Xe1lGBmq60RwJ5YU18,PING PONG,1.0
1t8sqIScEIP0B4bQzBuI2P,MY BAG,1.0
0bspC5fLlWgwCM6Rx1YUIh,BIG MAD,1.0
7gRFDGEzF9UkBV233yv2dc,Shut Down,1.0
4kXxEhuatrvwrTQycA7s9B,BIZCOCHITO,1.0
0W6I02J9xcqK8MtSeosEXb,Get Into It (Yuh),1.0
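
The three accepted input forms can also be sketched without pandas or the Spotify API. The function below is a hypothetical stand-in, not the class method: it expands each form into one label per track, in playlist order, and leaves songs that a `dict` does not mention as `None`. The IDs are the first three from the playlist above.

```python
from typing import Dict, List, Optional, Sequence, Union


def expand_likes(track_ids: Sequence[str],
                 like_labels: Union[int, Sequence[int], Dict[str, int]]
                 ) -> List[Optional[int]]:
    """Return one like label per track, in playlist order."""
    if isinstance(like_labels, int):
        # One value for the whole playlist
        return [like_labels] * len(track_ids)
    if isinstance(like_labels, dict):
        # Labels for specific songs, keyed by track ID
        unknown = set(like_labels) - set(track_ids)
        if unknown:
            raise KeyError(f"Not in playlist: {sorted(unknown)}")
        return [like_labels.get(track_id) for track_id in track_ids]
    # Ordered labels, one per song
    labels = list(like_labels)
    if len(labels) != len(track_ids):
        raise ValueError("Expected one label per track")
    return labels


ids = ["5awNIWVrh2ISfvPd5IUZNh", "65497v9jzFfu5sSA8ADap5", "7I0vl97TYZW86qYdgcD1qu"]
print(expand_likes(ids, 1))                               # [1, 1, 1]
print(expand_likes(ids, [1, 0, 1]))                       # [1, 0, 1]
print(expand_likes(ids, {"65497v9jzFfu5sSA8ADap5": 0}))   # [None, 0, None]
```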


## Appendix

Here are some helpful examples of how to get other attributes of your playlist.

In [7]:
## These are attributes available for songs that aren't necessarily going to be used for machine learning but are nice to know about
rb_playlist.track_attribute_labels

dict_keys(['album', 'artists', 'available_markets', 'disc_number', 'duration_ms', 'episode', 'explicit', 'external_ids', 'external_urls', 'href', 'id', 'is_local', 'name', 'popularity', 'preview_url', 'track', 'track_number', 'type', 'uri'])

In [8]:
# Example of how to get name from id
rb_playlist.data.loc['7dMTCS9BLzBqYTlAuHP8TM'].index.get_level_values('name')

Index(['XT4S1S'], dtype='object', name='name')