# Accessing ML Audio Features from Spotify
---------------
This notebook demonstrates how to retrieve the audio features of the songs in a Spotify playlist. These features can then be fed to a random forest classifier to automatically build a new playlist based on an old one!

We will be using [`spotipy`](https://spotipy.readthedocs.io/en/2.22.1/), a third-party Python client for the Spotify Web API, to access our playlists and their metadata in Python.

In [15]:
## Relevant imports
from typing import Union, Iterable

from dotenv import dotenv_values
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import pandas as pd
import numpy as np

The first thing we need is a client ID and secret from [Spotify's developer API](https://developer.spotify.com/). Go through the steps of registering for a developer account and make an app in accordance with [these directions](https://developer.spotify.com/documentation/web-api/tutorials/getting-started).

Once you have these keys, don't paste them directly into a notebook, particularly a public one! Each user should use their own client ID and secret. The [dotenv package](https://github.com/theskumar/python-dotenv) solves this problem. Create a file named `.env` in the `spotify_recs` directory and paste your client ID and secret there in the following format:

        SPOTIFY_CLIENT_ID=<your_client_id_string>
        SPOTIFY_CLIENT_SECRET=<your_client_secret_string>

Then run the next cell below.

In [16]:
# Set up Spotify API client credentials
config = dotenv_values('.env')
client_id = config["SPOTIFY_CLIENT_ID"]
client_secret = config["SPOTIFY_CLIENT_SECRET"]

client_credentials_manager = SpotifyClientCredentials(client_id, client_secret)
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)
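
If an entry is missing from `.env`, the cell above fails with a bare `KeyError`. The sketch below shows one way to check the loaded configuration first and report which entries are absent. The helper `missing_credentials` and the placeholder values are hypothetical, not part of the notebook; in practice you would pass it the result of `dotenv_values('.env')`.

```python
from typing import List, Mapping, Optional, Sequence

REQUIRED_KEYS = ("SPOTIFY_CLIENT_ID", "SPOTIFY_CLIENT_SECRET")


def missing_credentials(config: Mapping[str, Optional[str]],
                        required: Sequence[str] = REQUIRED_KEYS) -> List[str]:
    """Return the required keys that are absent or empty in `config`."""
    return [key for key in required if not config.get(key)]


# Stand-in for dotenv_values('.env'); these values are placeholders.
example_config = {"SPOTIFY_CLIENT_ID": "abc123", "SPOTIFY_CLIENT_SECRET": ""}

missing = missing_credentials(example_config)
if missing:
    print("Missing or empty entries in .env:", ", ".join(missing))
```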

This class puts all of the information returned by `sp.playlist` into a more machine learning-friendly format, namely the `ml_data` and `ml_feature_labels` attributes. All you have to do is create a `Playlist` instance, initializing it with the Spotify playlist ID.

In [67]:
class Playlist():

    def __init__(self, playlist_id):
        ## Get track metadata
        self.playlist = sp.playlist(playlist_id=playlist_id)
        self.raw_tracks = self.playlist['tracks']['items']
        self.track_attribute_labels = self.raw_tracks[0]['track'].keys()
        self.length = len(self.raw_tracks)

        ## All useful information
        self.track_ids = self.get_info('id')
        self.track_names = self.get_info('name')
        # The track key is 'artists' (a list of artist dicts), not 'artist'
        self.track_artists = self.get_info('artists')

        audio_features = sp.audio_features(self.track_ids)
        self.audio_features = [{k.strip():v for k,v in d.items()} 
                               for d in audio_features]
        
        ## Machine Learning-friendly formatting
        self.data = pd.DataFrame(index=[self.track_ids, self.track_names], 
                                 data=self.audio_features)
        self.data.columns = self.data.columns.str.strip()
        self.data.index.names = ['id', 'name']
        self.audio_feature_labels = self.data.columns

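        ## ML specifically (e.g. random forest)
        # initialize likes to NaN (no label yet)
        self.data['like'] = np.nan
        self.ml_feature_labels = list(set(self.audio_feature_labels)-\
                {'type', 'id', 'uri', 'track_href', 'analysis_url'})
        # .copy() makes ml_data an independent frame, so later assignments
        # to it don't trigger pandas' SettingWithCopyWarning
        self.ml_data = self.data[self.ml_feature_labels + ['like']].copy()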

    def get_info(self, info_tag:str):
        '''Unnest information from raw_tracks dict.'''

        if any([info_tag in track['track'].keys() 
                                            for track in self.raw_tracks]):
            attributes = [track['track'][info_tag] 
                            if info_tag in track['track'].keys()
                            else None
                            for track in self.raw_tracks]
            return attributes
        
    def set_like_status(self, like_labels:Union[Iterable, int]):
        '''Set the like status of songs'''

        if isinstance(like_labels, Iterable):
            if len(like_labels) == self.data.shape[0] and \
                not (isinstance(like_labels, dict) or 
                     isinstance(like_labels, pd.Series)):
                # A list-like object of ordered like values;
                # update both frames so data and ml_data stay in sync
                self.data.loc[:,'like'] = like_labels
                self.ml_data.loc[:,'like'] = like_labels
            elif type(like_labels) in {dict, pd.Series}:
                # Set the like value of specific songs
                if isinstance(like_labels, dict):
                    # Materialize the dict views as lists so pandas
                    # treats them as plain sequences
                    keys = list(like_labels.keys())
                    vals = list(like_labels.values())
                else:
                    keys = like_labels.index
                    vals = like_labels
                try:
                    self.data.loc[keys, 'like'] = vals
                    self.ml_data.loc[keys, 'like'] = vals
                except KeyError:
                    print("Specified songs don't exist in playlist. Like status not set.")

        elif isinstance(like_labels, int):
            # Set all like values of playlist to one number
            likes = [like_labels]*self.data.shape[0]
            self.data.loc[:,'like'] = likes
            self.ml_data.loc[:,'like'] = likes
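
Each Spotify track object stores its artists under the `artists` key as a list of dicts, so unnesting that key yields nested dicts rather than plain names. The sketch below shows one way to flatten them into lists of names. The helper `artist_names` and the toy items are hypothetical and only mimic the shape of `raw_tracks`; it also assumes a playlist item's `track` may be empty, for example when a track is no longer available.

```python
from typing import Any, Dict, List, Optional


def artist_names(raw_tracks: List[Dict[str, Any]]) -> List[Optional[List[str]]]:
    """Return one list of artist names per playlist item (None if no track)."""
    names: List[Optional[List[str]]] = []
    for item in raw_tracks:
        track = item.get("track")
        if not track:
            # Keep a placeholder so positions still line up with the playlist.
            names.append(None)
            continue
        names.append([artist["name"] for artist in track.get("artists", [])])
    return names


# Toy items shaped like playlist['tracks']['items']; all values are made up.
toy_items = [
    {"track": {"id": "id_a", "name": "Song A",
               "artists": [{"name": "Artist 1"}, {"name": "Artist 2"}]}},
    {"track": None},
]

print(artist_names(toy_items))
```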


To access the metadata from your own playlist, all you need is the Spotify playlist's ID, which appears in the Share Playlist URL right after `/playlist/`. An example is listed below:

In [68]:
# Specify the playlist ID for which you want to obtain song IDs and attributes
rb_playlist_id = '5Os91wBXvZCdwiOa2aJV5I' # https://open.spotify.com/playlist/5Os91wBXvZCdwiOa2aJV5I?si=22eadc4636634b6d

# Get the playlist information using the playlist ID
rb_playlist = Playlist(rb_playlist_id)
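
If you only have the share link, the ID is the path segment that follows `/playlist/`, and the `?si=...` query string is not part of it. A minimal sketch of pulling it out with the standard library follows; the helper name `playlist_id_from_url` is hypothetical.

```python
from urllib.parse import urlparse


def playlist_id_from_url(url: str) -> str:
    """Extract the playlist ID from an open.spotify.com share URL."""
    # urlparse separates the path from the query string (?si=...).
    parts = [p for p in urlparse(url).path.split("/") if p]
    if "playlist" not in parts or parts.index("playlist") + 1 >= len(parts):
        raise ValueError(f"Not a playlist URL: {url!r}")
    return parts[parts.index("playlist") + 1]


share_url = "https://open.spotify.com/playlist/5Os91wBXvZCdwiOa2aJV5I?si=22eadc4636634b6d"
print(playlist_id_from_url(share_url))  # 5Os91wBXvZCdwiOa2aJV5I
```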

In [69]:
rb_playlist.ml_feature_labels + ['like']

['speechiness',
 'tempo',
 'loudness',
 'mode',
 'liveness',
 'acousticness',
 'valence',
 'danceability',
 'key',
 'time_signature',
 'duration_ms',
 'energy',
 'instrumentalness',
 'like']

In [70]:
rb_playlist.ml_data['acousticness']

id                      name                                                  
5awNIWVrh2ISfvPd5IUZNh  PTT (Paint The Town)                                      0.07350
65497v9jzFfu5sSA8ADap5  Sistema De Patio                                          0.07000
7I0vl97TYZW86qYdgcD1qu  Back Door                                                 0.15700
7dMTCS9BLzBqYTlAuHP8TM  XT4S1S                                                    0.10100
0Z99Xe1lGBmq60RwJ5YU18  PING PONG                                                 0.01220
1t8sqIScEIP0B4bQzBuI2P  MY BAG                                                    0.08640
0bspC5fLlWgwCM6Rx1YUIh  BIG MAD                                                   0.02930
7gRFDGEzF9UkBV233yv2dc  Shut Down                                                 0.00327
4kXxEhuatrvwrTQycA7s9B  BIZCOCHITO                                                0.60500
0W6I02J9xcqK8MtSeosEXb  Get Into It (Yuh)                                         0.31500
78qd8dvwea0Gosb6Fe6j3

Audio features are the metrics Spotify uses to describe "the vibe" of songs in a quantitative way. Explanations of these features are listed on the Spotify website [here](https://developer.spotify.com/documentation/web-api/reference/get-several-audio-features). Because they are numeric, they work well as inputs to machine learning models, especially tree-based ones like [random forest](https://towardsdatascience.com/understanding-random-forest-58381e0602d2)!
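
Not every field in an audio-features record is a usable model input: `type`, `id`, `uri`, `track_href` and `analysis_url` are identifiers, which is why the `Playlist` class drops them. The sketch below applies the same filter to a single record and sorts the keys so the feature order is reproducible, which a plain `set` difference does not guarantee. The helper `ml_features` and the record values are made up for illustration.

```python
from typing import Any, Dict

# Identifier fields that carry no signal for a classifier.
NON_ML_KEYS = {"type", "id", "uri", "track_href", "analysis_url"}


def ml_features(record: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the model-input fields of one record, in sorted key order."""
    return {k: v for k, v in sorted(record.items()) if k not in NON_ML_KEYS}


# Toy record with placeholder values, shaped like one audio-features dict.
toy_record = {
    "tempo": 120.0,
    "energy": 0.7,
    "danceability": 0.5,
    "type": "audio_features",
    "id": "id_a",
    "uri": "spotify:track:id_a",
    "track_href": "placeholder",
    "analysis_url": "placeholder",
}

print(ml_features(toy_record))
```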

## Liking Songs

To use machine learning to recommend a playlist, we need a rule for deciding whether to include a given song. The easiest way is to define a "like" attribute that is 1 if you like the song, and 0 if you don't (or don't want to include it in your playlist). `set_like_status` accepts three kinds of input: a single integer (0 or 1), which sets the like status of every song in the playlist to that value; a list of 1s and 0s in the order the songs appear in the playlist; or a `dict` or `pandas` `Series` that maps track IDs to like values for specific songs.

In [71]:
rb_playlist.set_like_status(1)
rb_playlist.ml_data['like']

id                      name                                                  
5awNIWVrh2ISfvPd5IUZNh  PTT (Paint The Town)                                      1.0
65497v9jzFfu5sSA8ADap5  Sistema De Patio                                          1.0
7I0vl97TYZW86qYdgcD1qu  Back Door                                                 1.0
7dMTCS9BLzBqYTlAuHP8TM  XT4S1S                                                    1.0
0Z99Xe1lGBmq60RwJ5YU18  PING PONG                                                 1.0
1t8sqIScEIP0B4bQzBuI2P  MY BAG                                                    1.0
0bspC5fLlWgwCM6Rx1YUIh  BIG MAD                                                   1.0
7gRFDGEzF9UkBV233yv2dc  Shut Down                                                 1.0
4kXxEhuatrvwrTQycA7s9B  BIZCOCHITO                                                1.0
0W6I02J9xcqK8MtSeosEXb  Get Into It (Yuh)                                         1.0
78qd8dvwea0Gosb6Fe6j3k  Boss Bitch                           
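
The three input styles can be tried offline on a small stand-in frame. This is a sketch under assumptions: `toy` only mimics the `(id, name)` index and `like` column of `Playlist.data`, the IDs and names are invented, and the assignments are written out directly rather than going through `set_like_status`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for Playlist.data: an (id, name) MultiIndex and a 'like' column.
index = pd.MultiIndex.from_arrays(
    [["id_a", "id_b", "id_c"], ["Song A", "Song B", "Song C"]],
    names=["id", "name"],
)
toy = pd.DataFrame({"like": np.nan}, index=index)

# 1. One integer applied to every song.
toy["like"] = 1

# 2. An ordered list with one value per song, in playlist order.
toy["like"] = [1, 0, 1]

# 3. A dict keyed by track ID that touches only the listed songs.
updates = {"id_b": 1, "id_c": 0}
ids = toy.index.get_level_values("id")
for track_id, value in updates.items():
    # A boolean mask on the 'id' level avoids any MultiIndex key ambiguity.
    toy.loc[ids == track_id, "like"] = value

print(toy["like"].tolist())
```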

## Appendix

Here are some helpful examples of how to get other attributes of your playlist.

In [66]:
## These are attributes available for songs that aren't necessarily going to be used for machine learning but are nice to know about
rb_playlist.track_attribute_labels

dict_keys(['album', 'artists', 'available_markets', 'disc_number', 'duration_ms', 'episode', 'explicit', 'external_ids', 'external_urls', 'href', 'id', 'is_local', 'name', 'popularity', 'preview_url', 'track', 'track_number', 'type', 'uri'])

In [39]:
# Example of how to get name from id
rb_playlist.data.loc['7dMTCS9BLzBqYTlAuHP8TM'].index.get_level_values('name')

Index(['XT4S1S'], dtype='object', name='name')
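
The lookup above returns a pandas `Index` rather than a plain string. When many lookups are needed, one alternative is to convert the `(id, name)` index into an ordinary dictionary once. The sketch below does this on a toy frame with invented IDs and names; with the real object, `rb_playlist.data.index` would take the place of `toy.index`.

```python
import pandas as pd

# Toy frame with the same (id, name) index layout as Playlist.data.
index = pd.MultiIndex.from_arrays(
    [["id_a", "id_b"], ["Song A", "Song B"]], names=["id", "name"]
)
toy = pd.DataFrame({"tempo": [100.0, 120.0]}, index=index)

# Each index entry is an (id, name) tuple, so dict() builds the mapping directly.
id_to_name = dict(toy.index.to_list())
name_to_id = {name: track_id for track_id, name in id_to_name.items()}

print(id_to_name["id_b"])
print(name_to_id["Song A"])
```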