<H1> Creating an EDM Subgenre Classifier That Can Classify Songs into Two or More EDM Subgenres Using the Spotify API </H1>

<h2> 1. Overview </h2>
For many people, identifying a song's genre just by listening is easy. However, for those who want to understand the characteristics of different genres more deeply, determining a song's genre can be challenging. To address this, we will create a genre classifier that can categorize songs into two or more genres using the Spotify API.

The Spotify API provides information about a song, such as its name, artist, and duration, as well as its "audio features". Audio features are metrics computed by Spotify. Some relate to music theory, such as tempo, while others describe how people perceive the song, such as its danceability, energy, and valence.
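To make the split between metadata and audio features concrete, here is a minimal sketch that keeps only the descriptive fields of one audio-features dict. The field names are those returned by Spotify's audio-features endpoint. Choosing these eleven fields as classifier inputs is an assumption, and `FEATURE_COLUMNS`, `model_features`, and the example values are hypothetical.

```python
# Hypothetical helper: keep only the audio-feature fields that describe how a
# track sounds, dropping identifiers/URLs that Spotify returns in the same dict.
FEATURE_COLUMNS = [
    "danceability", "energy", "key", "loudness", "mode", "speechiness",
    "acousticness", "instrumentalness", "liveness", "valence", "tempo",
]


def model_features(audio_features):
    """Return the numeric features of one audio-features dict as a list of floats."""
    return [float(audio_features[name]) for name in FEATURE_COLUMNS]


# Usage with a made-up feature dict (not real Spotify data):
example = {
    "id": "placeholder-id", "uri": "spotify:track:placeholder-id",
    "danceability": 0.8, "energy": 0.9, "key": 5, "loudness": -6.0,
    "mode": 0, "speechiness": 0.05, "acousticness": 0.01,
    "instrumentalness": 0.85, "liveness": 0.1, "valence": 0.3, "tempo": 126.0,
}
print(model_features(example))
```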

The objective of this project is therefore to build a classifier that assigns EDM songs to subgenres such as House, Techno, and Progressive House (more subgenres can be added later) based on their audio features.
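As a rough illustration of what classifying songs into two or more subgenres from audio features could look like, the sketch below implements a nearest-centroid classifier in plain Python: each subgenre is summarised by the average of its standardised feature vectors, and a new song is assigned to the closest average. This is a simple baseline swapped in for illustration, not necessarily the model this project uses; the class name and the toy numbers are hypothetical.

```python
from collections import defaultdict
from math import dist
from statistics import mean, pstdev


class NearestCentroid:
    """Minimal nearest-centroid classifier (illustrative baseline only)."""

    def fit(self, X, y):
        columns = list(zip(*X))
        # z-score parameters per feature; a constant column gets std 1.0
        self.means = [mean(col) for col in columns]
        self.stds = [pstdev(col) or 1.0 for col in columns]
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(self._scale(row))
        # centroid of each genre = per-feature mean of its scaled rows
        self.centroids = {
            label: [mean(col) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def _scale(self, row):
        return [(v - m) / s for v, m, s in zip(row, self.means, self.stds)]

    def predict(self, X):
        predictions = []
        for row in X:
            scaled = self._scale(row)
            # pick the genre whose centroid is closest in Euclidean distance
            label = min(self.centroids, key=lambda g: dist(scaled, self.centroids[g]))
            predictions.append(label)
        return predictions


# Toy numbers (tempo, energy), invented for illustration only:
X = [[124, 0.70], [126, 0.75], [132, 0.90], [135, 0.95]]
y = ["House", "House", "Techno", "Techno"]
model = NearestCentroid().fit(X, y)
print(model.predict([[125, 0.72], [134, 0.92]]))
```

Standardising first matters because tempo is measured in BPM while features like energy range from 0 to 1; without scaling, tempo would dominate the distance.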

<h2> 2. ETL Process</h2>
We use the Spotify API to get data from Spotify. This is easier with the Spotipy library, which handles authentication and gives access to the API endpoints without having to write every request by hand.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os

import spotipy
from spotipy.oauth2 import SpotifyClientCredentials


In [10]:
client_id = os.environ.get("CLID")
client_secret = os.environ.get("SECRET")

# Check that the credentials are set without printing them: printing the
# secret would expose it in the saved notebook output.
if not client_id or not client_secret:
    raise RuntimeError("Set the CLID and SECRET environment variables first.")


In [12]:
# Client credentials flow: app-only access, no user login needed
credential_manager = SpotifyClientCredentials(client_id=client_id, client_secret=client_secret)
sp = spotipy.Spotify(client_credentials_manager=credential_manager)

Data extraction process:

1. A list of artists for each EDM subgenre is provided, along with the genre label.
2. The artist ID is found with the search endpoint (sp.search()) so that the artist's albums and tracks can be accessed.
3. The artist's albums are retrieved (sp.artist_albums()).
4. For each album, its tracks are retrieved (sp.album_tracks()).
5. For each track, the audio features are retrieved (sp.audio_features()) and stored together with the artist name (sp.artist()), the track name, the album name and release date, and the genre from step 1.
6. The user controls how many albums are extracted and can pass terms (such as "Live") that cause an album to be skipped if they appear in its name.
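The album filter from step 6, together with the track-count check used in the code, can be pulled out into a pure function so the rule is easy to test without calling the API. This is a sketch mirroring that condition; `is_valid_album` and its parameter names are hypothetical, and the album dicts only need the `name` and `total_tracks` keys found in Spotipy's album objects.

```python
import math


def is_valid_album(album, exclude=(), min_tracks=8, max_tracks=math.inf):
    """Hypothetical filter: True if the album has more than min_tracks tracks,
    fewer than max_tracks, and none of the exclude terms in its name."""
    n = album["total_tracks"]
    name = album["name"]
    # substring match is case-sensitive, as in the original condition
    return min_tracks < n < max_tracks and not any(term in name for term in exclude)


# Usage with made-up album dicts:
albums = [
    {"name": "Studio Album", "total_tracks": 12},
    {"name": "Live at Somewhere", "total_tracks": 15},
    {"name": "Short EP", "total_tracks": 4},
]
print([a["name"] for a in albums if is_valid_album(a, exclude=["Live"])])
```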

The following function is used to get the data from Spotify:

In [16]:
def artist_tracks(sp, df, artist_name, genre=np.nan, exclude=None, quantity=np.inf, max_no_album=np.inf):
    # exclude: terms that cause an album to be skipped if found in its name (e.g. "Live")
    # quantity: maximum number of albums to extract
    # max_no_album: an album is extracted only if it has fewer tracks than this
    exclude = exclude or []  # avoid a shared mutable default argument
    if quantity == 0:
        return df

    # find the id of the first artist that appears in the search results
    artist_id = "spotify:artist:" + sp.search(q=artist_name, type="artist")["artists"]["items"][0]["id"]

    track = []
    artist = sp.artist(artist_id)  # get the artist data
    artist_albums = sp.artist_albums(artist_id)["items"]  # get the artist albums
    c = 0  # extracted albums counter
    for album in artist_albums:
        # condition to consider an album as valid to extract
        if (album["total_tracks"] > 8
                and album["total_tracks"] < max_no_album
                and not any(term in album["name"] for term in exclude)):
            c += 1
            album_tracks = sp.album_tracks(album["id"])["items"]  # get tracks from album
            for tracks in album_tracks:  # iterate over the album tracks
                tracks_data = sp.audio_features(tracks["id"])[0]  # get audio features of the track
                if tracks_data is None:  # skip tracks that have no audio features
                    continue
                tracks_data["artist"] = artist["name"]  # store data from the artist, track and album
                tracks_data["name"] = tracks["name"]
                tracks_data["album"] = album["name"]
                tracks_data["release"] = album["release_date"]
                tracks_data["genre"] = genre
                track.append(tracks_data)  # add track data to the list of dicts
        if c >= quantity:  # stop once the requested number of albums is reached
            break

    # build the DataFrame only after every album has been processed
    if not track:  # nothing extracted: return the input unchanged
        return df
    tracks_df = pd.DataFrame(track).set_index("id")  # use the track id as the index
    if len(df) == 0:  # if the existing DataFrame is empty, return the new one
        return tracks_df
    # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
    return pd.concat([df, tracks_df])
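The function above makes one sp.audio_features() call per track. Spotify's audio-features endpoint accepts up to 100 track IDs per request and Spotipy's `audio_features()` accepts a list of IDs, so one possible optimisation is to request features in batches. The hypothetical `chunked` helper below sketches only the batching step; the commented lines show how it might be used, assuming an authenticated `sp`.

```python
def chunked(items, size=100):
    """Hypothetical helper: yield consecutive slices of items, each with at
    most size elements (100 is the per-request ID limit assumed here)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Possible use inside artist_tracks (needs an authenticated `sp`, not run here):
# ids = [t["id"] for t in album_tracks]
# features = [f for batch in chunked(ids) for f in sp.audio_features(batch)]
```

The returned features line up one-to-one with the input IDs, with None for tracks that have no features, so a None check is still needed when pairing them with track metadata.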



Here is the list of artists that will be extracted:

House = 

Techno = 

Progressive House =
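Once the lists are filled in, the extraction could be driven by a loop over genres. The sketch below assumes a dict mapping each genre to its artists; `collect_all` and `ARTISTS_BY_GENRE` are hypothetical names and the artist entries are placeholders, since the lists above are still empty. Passing the fetch function in as an argument keeps the loop testable without network access.

```python
def collect_all(fetch, df, artists_by_genre, **options):
    """Hypothetical driver: call fetch(df, artist_name, genre=..., **options)
    for every artist and return the accumulated result."""
    for genre, artists in artists_by_genre.items():
        for artist_name in artists:
            df = fetch(df, artist_name, genre=genre, **options)
    return df


# Placeholder entries; the real artist lists are not given above.
ARTISTS_BY_GENRE = {
    "House": ["<house artist>"],
    "Techno": ["<techno artist>"],
    "Progressive House": ["<progressive house artist>"],
}

# Possible use with artist_tracks (needs an authenticated `sp`, not run here):
# data = collect_all(lambda df, name, **kw: artist_tracks(sp, df, name, **kw),
#                    pd.DataFrame(), ARTISTS_BY_GENRE, exclude=["Live"], quantity=3)
```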