# Spotify Track Audio Analysis by Popular Genre: Machine Learning
We're going to fit a k-nearest neighbors classifier to predict the genre of any given Spotify song from its audio features. To do that, we first need to decide which audio features to include in the model.

We can start with some well-educated guesses and work our way from there.

## Audio Features We Care About:
Acousticness
* 0.0 to 1.0, measure of natural acoustic sounds

Danceability 
* 0.0 to 1.0, combination of tempo, rhythm stability, beat strength, and regularity

Energy
* 0.0 to 1.0, intensity and activity of the song. Energetic sounds tend to feel fast, loud, and noisy (think death metal)

Loudness (Overall loudness in decibels)
* Some genres' songs tend to be louder than others'. Think rock vs. classical.

Instrumentalness
* 0.0 to 1.0, the probability that there are no vocals. For example, rap is very vocal.

Speechiness (Presence of spoken words)
* Some genres are distinguished by their amount of speech, so this ought to help categorize tracks into those genres.

## Audio Features We (Probably) Don't Care About:

Key, Mode, Time Signature
* These features should not have any effect on genre, as any song in any genre can be written in any key, mode, or time signature.

Liveness (Probability of an audience present in the recording)
* Whether an audience is present does not affect the genre of the song.

Tempo (Overall beats per minute of a song)
* Not very useful by itself. We prefer the Energy feature, which already captures how fast a song feels.

Valence (Musical positiveness)
* Whether a song sounds happy or sad shouldn't affect its genre.
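
Most of the features above live on a 0.0 to 1.0 scale, but loudness is measured in decibels and tempo in beats per minute. A distance-based model like k-nearest neighbors would let those wider-ranged features dominate, which is presumably why the pipeline below rescales everything first. Here is a minimal sketch of min-max scaling in plain Python; the loudness values are made up for illustration and are not taken from audio.csv.

```python
def min_max_scale(values):
    """Rescale a list of numbers linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical loudness readings in decibels (illustrative only).
loudness_db = [-30.0, -12.0, -6.0, -3.0]
scaled_loudness = min_max_scale(loudness_db)
print(scaled_loudness)
```

After scaling, the quietest track maps to 0.0 and the loudest to 1.0, so loudness contributes to a distance on the same footing as acousticness or energy.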

## Fitting a model
We can use the model with our guesses for which features to include as our "base" model, try some models with different features, and see which one is estimated to be the best. We'll first have to grab the DataFrame with audio data, though.

In [1]:
from google.colab import drive
import pandas as pd

# Mount the google drive
drive.mount('/content/gdrive')

# Grab df_audio from the csv file from before
df_audio = pd.read_csv('gdrive/My Drive/audio.csv')

# Define initial X_train and y_train 
X_train = df_audio[['acousticness', 'danceability', 'energy', 
                    'loudness', 'instrumentalness', 'speechiness']]
y_train = df_audio['genre']

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


We'll try a basic KNeighborsClassifier with an arbitrary value of k, let's say 15, and try out a few combinations of features. We'll evaluate the models on their macro-averaged F1 scores, estimated with 10-fold cross-validation, to determine which one is the "best".

In [2]:
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Define pipeline
pipeline = make_pipeline(
    MinMaxScaler(),
    KNeighborsClassifier(n_neighbors=15)
)

# Fit the pipeline
pipeline.fit(X_train, y_train)

# Estimate the macro F1 score with 10-fold cross-validation
cross_val_score(pipeline, 
                X_train, 
                y_train, 
                cv=10, 
                scoring="f1_macro").mean()

0.5691154005120315
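
To make the `f1_macro` score above more concrete, here is a rough sketch of how a macro-averaged F1 can be computed by hand: take the F1 score of each genre separately, then average them with equal weight. The labels and predictions are toy values invented for this example, and `macro_f1` is a hypothetical helper, not part of scikit-learn.

```python
def macro_f1(y_true, y_pred):
    """Average the per-class F1 scores, giving every class equal weight."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)


# Toy genre labels (illustrative only).
toy_true = ["rock", "rock", "jazz", "jazz", "pop", "pop"]
toy_pred = ["rock", "jazz", "jazz", "jazz", "pop", "rock"]
toy_score = macro_f1(toy_true, toy_pred)
print(round(toy_score, 4))
```

Because every genre counts equally, a model that does well on a few genres but poorly on the rest gets pulled down, which suits a multi-genre problem like this one.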

Our "base" model has a macro F1 score of about 0.569. Let's see if we can add features to X_train and beat that. We can almost certainly rule out liveness, key, mode, and time signature, so let's try adding tempo and valence.

In [3]:
test_features = [df_audio[['acousticness', 'danceability', 'energy', 'loudness', 
                           'instrumentalness', 'speechiness', 'tempo']],
                 df_audio[['acousticness', 'danceability', 'energy', 'loudness', 
                           'instrumentalness', 'speechiness', 'valence']],
                 df_audio[['acousticness', 'danceability', 'energy', 'loudness', 
                           'instrumentalness', 'speechiness', 'tempo', 'valence']]]

# For each group of features, re-fit the pipeline and get the new estimated test error.
for features in test_features:
  pipeline.fit(features, y_train)
  print(cross_val_score(pipeline, 
                        features, 
                        y_train, 
                        cv=10, 
                        scoring="f1_macro").mean())

0.555218341970047
0.5865770648905764
0.5613017498071212
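
The three feature lists above were written out by hand. As a sketch of an alternative, `itertools.combinations` can generate every non-empty subset of the optional features, which scales better if more candidates are added later. The helper name `candidate_feature_sets` is an assumption made for this example.

```python
from itertools import combinations

base_features = ['acousticness', 'danceability', 'energy',
                 'loudness', 'instrumentalness', 'speechiness']
optional_features = ['tempo', 'valence']


def candidate_feature_sets(base, optional):
    """Return the base features plus every non-empty subset of the optional ones."""
    candidates = []
    for size in range(1, len(optional) + 1):
        for extra in combinations(optional, size):
            candidates.append(base + list(extra))
    return candidates


feature_sets = candidate_feature_sets(base_features, optional_features)
for columns in feature_sets:
    # Show only the features added on top of the base set.
    print(columns[len(base_features):])
```

Each list in `feature_sets` could then be used to select columns, for example `df_audio[columns]`, inside the same cross-validation loop.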


Adding valence alone gave the best score (about 0.587), while adding tempo lowered the score with or without valence. So let's add valence for our final model. Now we need to determine the best number of neighbors and the metric to measure distance with.

In [4]:
from sklearn.model_selection import GridSearchCV

# Set up a GridSearch
grid_search = GridSearchCV(
    pipeline,
    param_grid={"kneighborsclassifier__n_neighbors": range(1, 50),
                "kneighborsclassifier__metric": ['minkowski', 'euclidean', 'manhattan']},
    scoring="f1_macro",
    cv=10
)

# Set X_train to the set of features we found were optimal before
X_train = df_audio[['acousticness', 'danceability', 'energy', 'loudness', 
                    'instrumentalness', 'speechiness', 'valence']]
grid_search.fit(X_train, y_train)
grid_search.best_params_

{'kneighborsclassifier__metric': 'manhattan',
 'kneighborsclassifier__n_neighbors': 24}
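
For intuition about what the grid search is choosing between, here is a small sketch of the two distance metrics and the majority vote that k-nearest neighbors performs. It is a toy re-implementation in plain Python, not scikit-learn's actual code, and the points and labels are invented. Note that scikit-learn's `'minkowski'` metric with its default `p=2` is the same as `'euclidean'`, so the grid above effectively compares two metrics.

```python
from collections import Counter


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """Straight-line distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def knn_predict(train_points, train_labels, query, k, distance):
    """Predict the most common label among the k nearest training points."""
    ranked = sorted(zip(train_points, train_labels),
                    key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy two-feature data (illustrative only).
toy_points = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9), (0.7, 0.7)]
toy_labels = ["classical", "classical", "rock", "rock", "rock"]
toy_prediction = knn_predict(toy_points, toy_labels, (0.3, 0.3),
                             k=3, distance=manhattan)
print(toy_prediction)
```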

Now that we have our optimal distance metric and number of neighbors, we can redefine the pipeline with those parameters and check the macro precision and recall to see how well it does on our training set.

In [5]:
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score

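# Redefine pipeline with the best parameters from the grid search.
# Note: the grid search above tuned these parameters with MinMaxScaler,
# while this pipeline scales with StandardScaler.
params = grid_search.best_params_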

pipeline = make_pipeline(
    StandardScaler(),
    KNeighborsClassifier(n_neighbors=params['kneighborsclassifier__n_neighbors'], 
                         metric=params['kneighborsclassifier__metric'])
)

# Fit pipeline and get macro precision and recall
pipeline.fit(X_train, y_train)
print(precision_score(y_train, pipeline.predict(X_train), average='macro'))
print(recall_score(y_train, pipeline.predict(X_train), average='macro'))

0.6632744561641964
0.6457142857142858


In [6]:
cross_val_score(pipeline, 
                X_train, 
                y_train, 
                cv=10, 
                scoring="f1_macro").mean()

0.589976958533988
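
The `cv=10` argument used throughout means the data is split into ten folds, and each fold takes one turn as the validation set while the model trains on the other nine. Here is a minimal sketch of that splitting in plain Python. It uses simple contiguous folds, whereas scikit-learn additionally stratifies the folds by class for classifiers, so treat it as an illustration of the idea rather than a reproduction of `cross_val_score`.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, validation_indices) for contiguous folds."""
    fold_sizes = [n_samples // n_folds] * n_folds
    # Spread any remainder over the first folds.
    for i in range(n_samples % n_folds):
        fold_sizes[i] += 1
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size


# 25 samples is an arbitrary size chosen for the example.
folds = list(k_fold_indices(n_samples=25, n_folds=10))
for train, validation in folds[:2]:
    print(len(train), validation)
```

Every sample lands in exactly one validation fold, so the averaged score is always computed on data the model did not see during that fit.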

## Testing Out the Model
Precision and recall are close to each other, so that's a good sign! Let's test the model on some real songs from Spotify. We'll need to get access credentials again, though.

In [0]:
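import os
import requests
import base64

# Read the client id and secret from environment variables instead of hardcoding them.
# The variable names SPOTIFY_CLIENT_ID and SPOTIFY_CLIENT_SECRET are assumptions; use your own.
client_id = os.environ["SPOTIFY_CLIENT_ID"]
client_secret = os.environ["SPOTIFY_CLIENT_SECRET"]

# Create base64 encoding of "client_id:client_secret" for the POST request header
b64_auth_str = base64.b64encode((client_id + ":" + client_secret).encode()).decode()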

# Make the actual request and get an access token using client credentials
response = requests.post('https://accounts.spotify.com/api/token',
                         headers={'Authorization': 'Basic ' + b64_auth_str},
                         data={'grant_type': 'client_credentials'})

# Store the access token for future use
access_token = response.json()['access_token']

Let's build a list of songs we want to predict the genre for! For a handful of songs, we simply looked up the track IDs by hand. Spotify's Search endpoint can also find a track ID from a song name, but we don't use it here.

In [0]:
import numpy as np

# Search up some songs and their ids
track_names = ["Knights of Cydonia - Muse", 
               "One More Time - Daft Punk", 
               "Take Me Home, Country Roads - John Denver",
               "Kids - MGMT",
               "Moonlight Sonata 1st Movement - Ludwig van Beethoven",
               "Fly Me To The Moon (In Other Words) - Frank Sinatra",
               "Young, Wild, & Free - Snoop Dogg, Wiz Khalifa, Bruno Mars",
               "Take on Me - a-ha"]
track_ids = ["7ouMYWpwJ422jRcDASZB7P", 
             "0DiWol3AO6WpXZgp0goxAV", 
             "39q7xibBdRboeMKUbZEB6g",
             "1jJci4qxiYcOHhQR247rEU",
             "7xfSCgVOkQJhVxnqzepATH",
             "5b7OgznPJJr1vHNYGyvxau",
             "6YbhspuOar1D9WSSnfe7ds",
             "2WfaOiMkCvy7F5fcp2zZ8L"]

df_tracks = pd.DataFrame(data=track_ids, index=track_names, columns=["track_id"])
df_tracks.index.name = "track_name"

Now that we have our track names and IDs, we need to get the audio features for each one, as we did for the training data before.

In [0]:
audio_features_endpoint = 'https://api.spotify.com/v1/audio-features?ids=%s'

# Get the audio features of each song and put it into a json
response = requests.get(audio_features_endpoint % ','.join(df_tracks['track_id']),
                        headers={'Authorization': 'Bearer ' + access_token})
response_json = response.json()
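
Eight IDs fit comfortably in one request, but a longer list would need to be split up. The sketch below batches IDs and builds one URL per batch. It assumes the endpoint accepts at most 100 IDs per call, which is worth checking against the current Spotify documentation. The helper names and placeholder IDs are made up for the example.

```python
AUDIO_FEATURES_URL = 'https://api.spotify.com/v1/audio-features?ids=%s'
MAX_IDS_PER_REQUEST = 100  # assumed per-request limit; verify before relying on it


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_audio_feature_urls(track_ids, batch_size=MAX_IDS_PER_REQUEST):
    """Return one request URL per batch of track ids."""
    return [AUDIO_FEATURES_URL % ','.join(batch)
            for batch in chunked(track_ids, batch_size)]


# Placeholder ids, not real Spotify track ids.
example_ids = ['id%03d' % n for n in range(250)]
urls = build_audio_feature_urls(example_ids)
print(len(urls))
```

Each URL could then be fetched with the same `requests.get` call and Bearer header used above, and the `audio_features` lists from the responses concatenated.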

In [0]:
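df_test = pd.DataFrame()

# Flatten each track's audio features into a one-row DataFrame and attach its name.
# pd.json_normalize (pandas 1.0 and later) replaces the deprecated
# pandas.io.json.json_normalize import.
# Each new row is prepended, so df_test ends up in reverse order of df_tracks.
for i, track_features in enumerate(response_json['audio_features']):
  df = pd.json_normalize(track_features)
  df['track_name'] = df_tracks.index[i]
  df_test = pd.concat([df, df_test])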

Now that we have some songs to test the model on, let's see if our predictions make sense!

In [11]:
features = ['acousticness', 'danceability', 'energy', 'loudness', 
            'instrumentalness', 'speechiness', 'valence']

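# Predict the genre for each of our songs.
# .iloc[[i]] keeps the row as a one-row DataFrame, so the column names
# match the ones the pipeline was fitted on.
for i in range(len(df_test)):
  print("'%s' : %s" % (df_test['track_name'].iloc[i], 
                       pipeline.predict(df_test[features].iloc[[i]])))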

'Take on Me - a-ha' : ['rock']
'Young, Wild, & Free - Snoop Dogg, Wiz Khalifa, Bruno Mars' : ['hip-hop']
'Fly Me To The Moon (In Other Words) - Frank Sinatra' : ['country']
'Moonlight Sonata 1st Movement - Ludwig van Beethoven' : ['classical']
'Kids - MGMT' : ['electronic']
'Take Me Home, Country Roads - John Denver' : ['jazz']
'One More Time - Daft Punk' : ['pop']
'Knights of Cydonia - Muse' : ['rock']
