# Classifying Genre:<br>Using a Random Forest Model to predict genre

Blake Spencer<br>
February 2019

The goal of this project was to build a model that takes the features of a song provided by Spotify's API and predicts which genre the song belongs to. Six broad genres were chosen, each defined by curated Spotify playlists, and 6000 songs were used to train the model. The main steps were:

1. **Obtain the features of a song in a playlist and create dataframes** (this file)
2. [Train a model to predict genre and create the data the Flask app uses](https://github.com/blakespencer/classifying-genre/blob/master/analysis.ipynb)
3. [Link Spotify's API to try out the model on any song in their library](https://blake-spencer-projects.herokuapp.com/classification#model)

Each of the links above is a Jupyter Notebook file with Python code to complete each step.

The front-end and back-end code for the Flask app:

- [Flask app code in Python](https://github.com/blakespencer/personal-site-backend)
- [React app code in JavaScript](https://github.com/blakespencer/personal-site-frontend)


In [1]:
import os
import sys
import spotipy
import webbrowser
import spotipy.util as util
import pandas as pd
import pickle
pd.options.display.max_columns = None
from time import sleep

### Setting up Spotify API credentials

In [2]:
from spotipy.oauth2 import SpotifyClientCredentials
client_credentials_manager = SpotifyClientCredentials(client_id=os.environ['SPOTIPY_CLIENT_ID'], client_secret=os.environ['SPOTIPY_CLIENT_SECRET'])
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)
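
The cell above reads the credentials from the `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET` environment variables, so a missing variable surfaces only as a bare `KeyError`. A minimal sketch of a check that could run first; the helper name `missing_env_vars` is hypothetical and not part of spotipy.

```python
import os

# Hypothetical helper (not part of spotipy): list which of the
# required environment variables are unset or empty.
def missing_env_vars(names, environ=None):
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]

REQUIRED = ['SPOTIPY_CLIENT_ID', 'SPOTIPY_CLIENT_SECRET']

missing = missing_env_vars(REQUIRED)
if missing:
    print('Set these variables before creating the client:', ', '.join(missing))
```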

### Create a function that, given a track URI, builds a row with the relevant features of that track

In [130]:
def get_spotify_row(uri):
    """Return a one-row dataframe of metadata and audio features for a track."""
    track = sp.track(uri)
    # audio_features takes a list of tracks and returns a list; keep the only entry
    features = sp.audio_features(tracks=[uri])[0]
    # Only the track-level summary of the audio analysis is used
    analysis = sp.audio_analysis(uri)['track']

    features_data = [{
        'track': track['name'],
        # First artist credited on the album (can be "Various Artists" for compilations)
        'name': track['album']['artists'][0]['name'],
        'acousticness': features['acousticness'],
        'danceability': features['danceability'],
        'energy': features['energy'],
        'speechiness': features['speechiness'],
        'valence': features['valence'],
        'instrumentalness': features['instrumentalness'],
        'liveness': features['liveness'],
        'end_of_fade_in': analysis['end_of_fade_in'],
        'start_of_fade_out': analysis['start_of_fade_out'],
        'loudness': analysis['loudness'],
        'tempo': analysis['tempo'],
        'tempo_confidence': analysis['tempo_confidence'],
        'time_signature_confidence': analysis['time_signature_confidence'],
        'time_signature': analysis['time_signature']
    }]
    return pd.DataFrame(features_data)

In [129]:
get_spotify_row('1wxF8huN8OO1HkiDCEFLR2')

Unnamed: 0,num_samples,duration,offset_seconds,window_seconds,analysis_sample_rate,analysis_channels,end_of_fade_in,start_of_fade_out,loudness,tempo,tempo_confidence,time_signature,time_signature_confidence,key,key_confidence,mode,mode_confidence,code_version,echoprint_version,synch_version,rhythm_version,acousticness,danceability,duration_ms,energy,instrumentalness,liveness,speechiness,valence,name,track
0,5544840,251.46667,0,0,22050,1,0.12649,245.44073,-8.489,166.222,0.195,4,0.907,3,0.015,0,0.167,3.15,4.12,1.0,1.0,0.746,0.665,251467,0.637,0.000131,0.0948,0.0404,0.966,Aaron Neville,Hercules


### A function that, given a playlist URI, returns all of its track URIs

In [137]:
def get_playlist_tracks(playlist_id):
    playlist = sp.user_playlist('Spotify', playlist_id)
    # Reads the page of tracks returned with the playlist object
    track_uris = [i['track']['uri'] for i in playlist['tracks']['items']]
    return track_uris

In [142]:
list_of_uris = get_playlist_tracks('spotify:user:spotify:playlist:37i9dQZF1DX4GMJS146m00')

In [144]:
list_of_uris[0: 2]

['spotify:track:1E4sRb8cHi8cmK14NLFfa9',
 'spotify:track:000T3UePdtzQlaOq99B32r']
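
`get_playlist_tracks` reads only `playlist['tracks']['items']`, which is the first page of a paged result, so a long playlist may come back incomplete. The following is a sketch of one way to walk every page, assuming the paging object carries a `next` link and using spotipy's `Spotify.next()` to follow it. The function name `get_all_playlist_tracks` and the `client` parameter are illustrative, not part of the original notebook.

```python
def get_all_playlist_tracks(client, playlist_id):
    """Collect track URIs from every page of a playlist (sketch)."""
    playlist = client.user_playlist('Spotify', playlist_id)
    page = playlist['tracks']
    uris = []
    while page:
        # Skip entries whose track object is missing.
        uris.extend(item['track']['uri'] for item in page['items'] if item.get('track'))
        # Follow the 'next' link if there is one; otherwise stop.
        page = client.next(page) if page.get('next') else None
    return uris
```

With the client created earlier, this would be called as `get_all_playlist_tracks(sp, 'spotify:user:spotify:playlist:37i9dQZF1DX4GMJS146m00')`. Passing the client in as an argument also makes the function easy to try against a stub without network access.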

### Function that, given a playlist ID, creates a dataframe with the features of every track in that playlist and labels it with the given genre

In [138]:
def create_genre_df(playlist_id, genre):
    tracks = get_playlist_tracks(playlist_id)
    # Build one single-row dataframe per track, then stack them in one call.
    # pd.concat is used because DataFrame.append was removed in pandas 2.0.
    rows = [get_spotify_row(uri) for uri in tracks]
    df = pd.concat(rows, ignore_index=True)
    df['genre'] = genre
    return df

In [139]:
df_soul = create_genre_df('spotify:user:spotify:playlist:37i9dQZF1DX4GMJS146m00', 'soul')

In [146]:
df_soul.head()

Unnamed: 0,acousticness,danceability,end_of_fade_in,energy,instrumentalness,liveness,loudness,name,speechiness,start_of_fade_out,tempo,tempo_confidence,time_signature,time_signature_confidence,track,valence,genre
0,0.288,0.605,0.17701,0.315,0.0,0.599,-8.958,Various Artists,0.0264,155.94522,111.252,0.267,3,0.816,(You Make Me Feel Like) A Natural Woman,0.441,soul
1,0.45,0.764,0.26118,0.69,0.0,0.0815,-5.961,The Temptations,0.0473,145.3395,119.481,0.901,4,0.914,Ain't Too Proud To Beg,0.939,soul
2,0.318,0.634,0.25782,0.677,0.0,0.0902,-7.19,Jackie Wilson,0.0448,168.55946,95.792,0.197,4,1.0,(Your Love Keeps Lifting Me) Higher & Higher,0.939,soul
3,0.651,0.386,0.26635,0.578,0.0,0.308,-5.584,Sam Cooke,0.0365,154.79583,106.505,0.031,3,1.0,Bring It On Home To Me,0.583,soul
4,0.628,0.697,0.37156,0.656,2.4e-05,0.422,-5.38,Various Artists,0.0424,160.61243,125.624,0.155,4,1.0,My Guy,0.89,soul


### Create a function that, given a list of playlist URIs, returns a single dataframe of their tracks

In [149]:
def create_track_df_from_playlists(playlist_uris, genre):
    df = None
    first = True
    for i in playlist_uris:
        if first:
            print(i)
            df = create_genre_df(i, genre)
            first = False
        else:
            sleep(4)  # pause between playlists
            # Reassign: concatenation returns a new dataframe and leaves df unchanged
            df = pd.concat([df, create_genre_df(i, genre)], ignore_index=True)

    df.drop_duplicates(inplace=True)
    return df
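
The final `drop_duplicates` call matters because the same song can appear in more than one playlist of a genre. A toy illustration of combining per-playlist frames and removing repeated rows; the track names and values below are invented for the example and are not Spotify data.

```python
import pandas as pd

# Two made-up per-playlist frames that share one track.
playlist_a = pd.DataFrame({'track': ['Song A', 'Song B'], 'tempo': [100.0, 120.0]})
playlist_b = pd.DataFrame({'track': ['Song B', 'Song C'], 'tempo': [120.0, 90.0]})

# Stack the frames, then drop rows that are identical in every column.
combined = pd.concat([playlist_a, playlist_b], ignore_index=True)
deduped = combined.drop_duplicates().reset_index(drop=True)

print(len(combined), len(deduped))
```

By default `drop_duplicates` compares every column, so two rows count as duplicates only if all their values match. Passing `subset=['track', 'name']` would instead match on track and artist name alone.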
            

In [150]:
playlist_uris = ([
    'spotify:user:spotify:playlist:37i9dQZF1DX6PKX5dyBKeq',
    'spotify:user:spotify:playlist:37i9dQZF1DWSOkubnsDCSS',
    'spotify:user:2018hits:playlist:2wHOIKXfBwd2roCCyJGwSC',
    'spotify:user:spotify:playlist:37i9dQZF1DX0XUsuxWHRQd'
])

In [151]:
df_rap = create_track_df_from_playlists(playlist_uris, 'rap')

spotify:user:spotify:playlist:37i9dQZF1DX6PKX5dyBKeq


In [206]:
with open('aws_data/spotify_rap_df.pkl', 'wb') as picklefile:
    pickle.dump(df_rap, picklefile)
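
The pickled dataframe is presumably what the analysis notebook loads in the next step. A minimal round-trip sketch using pandas' own `to_pickle` and `read_pickle`, with a temporary directory and a made-up frame standing in for `aws_data/spotify_rap_df.pkl` and `df_rap`:

```python
import os
import tempfile
import pandas as pd

# Made-up stand-in for df_rap.
df_example = pd.DataFrame({'track': ['Song A'], 'tempo': [100.0], 'genre': ['rap']})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'spotify_rap_df.pkl')
    df_example.to_pickle(path)        # pandas' wrapper around pickle
    df_loaded = pd.read_pickle(path)  # how a later notebook could read it back

print(df_loaded.equals(df_example))
```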