## **Genre Classification**

I've chosen to build a genre classifier for my final project! This project has the most readily available data (scraping Spotify using the `spotipy` library) and should be doable within my timeline.

## **Problem Statement**

Using features derived from an audio source itself, can a classification model predict the genre of a 30-second audio clip with high enough accuracy to organize new songs into their respective genres?

Classifying genre is important for music distribution and streaming platforms. It also helps listeners find new bands they might like, which in turn helps musicians connect with new audiences.

My goal is to predict genre using **features derived from the audio signal**. This could help build playlists algorithmically without humans manually entering data, and could help artists tag their music with every genre it fits.

**The metric for success here is accuracy**--any incorrect response is a bad one.
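To pin down what the metric computes, here is a minimal sketch. The labels are made up for illustration and are not results from this project, and the function names `accuracy` and `per_genre_accuracy` are hypothetical. The per-genre breakdown is included because a single overall number can hide genres that are much easier or harder than the rest.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of clips whose predicted genre matches the true genre."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot score an empty set of predictions")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_genre_accuracy(y_true, y_pred):
    """Accuracy computed separately for each true genre (per-class recall)."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {genre: hits[genre] / n for genre, n in totals.items()}


# Made-up labels for illustration only.
y_true = ["classical", "classical", "rock", "rock", "rap", "rap"]
y_pred = ["classical", "classical", "rock", "rap", "rap", "rock"]

print(accuracy(y_true, y_pred))
print(per_genre_accuracy(y_true, y_pred))
```

In this toy example 4 of 6 clips are correct overall, while classical is classified perfectly and rock and rap are each right half the time.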

## **Methods and models**

### Data Collection

I will scrape 30-second song samples using `spotipy`, starting with 5 general genres and then digging into the subgenres of each:

* Original 5 Genres:
  * Classical
  * Progressive Bluegrass
  * Rock
  * Rap
  * R & B
 
* Extra 5 Genres:
  * Tropical House
  * Pop
  * Baroque
  * Serialism
  * Hip Hop
 
Note that after my first round of modeling, I was getting about 84% accuracy with the original 5 genres. With classical, **I was getting 95% accuracy.** I included Baroque, Serialism, and Ambient in the additional set of 5 genres to test the suspicion that Classical was easier to predict **because it was quieter** than the other 4 original genres. Baroque and Serialism are subsets of classical music. The **Baroque period runs from approximately 1600 to 1750** (there's debate about the blurry start and end, but it's not particularly important here). **Serialism is a subset of classical music that emerged in the 20th century**. If my model can still distinguish between classical, baroque, and serialism even though these genres are so closely related, I'd be very impressed!

Similarly, I included Tropical House and Folk to try to add genres closely related to the Rap and Progressive Bluegrass genres, respectively. Unfortunately, Folk didn't have enough unique songs to meet my threshold of at least 500 unique songs before sampling.
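One way to enforce the 500-unique-song threshold is sketched below. It assumes each scraped track is a dict with an `id` key, and the helper names are hypothetical rather than taken from `funcs`. The example uses tiny made-up data and a lowered threshold so the behavior is easy to see.

```python
MIN_UNIQUE_SONGS = 500  # threshold stated in the text above


def unique_track_ids(tracks):
    """Return track IDs in first-seen order, dropping repeats across playlists."""
    seen = set()
    unique = []
    for track in tracks:
        track_id = track["id"]
        if track_id not in seen:
            seen.add(track_id)
            unique.append(track_id)
    return unique


def genres_meeting_threshold(tracks_by_genre, minimum=MIN_UNIQUE_SONGS):
    """Map each genre to (unique track count, whether it meets the minimum)."""
    report = {}
    for genre, tracks in tracks_by_genre.items():
        count = len(unique_track_ids(tracks))
        report[genre] = (count, count >= minimum)
    return report


# Made-up track IDs; the same song often appears in several playlists.
tracks_by_genre = {
    "folk": [{"id": "a"}, {"id": "b"}, {"id": "a"}],
    "baroque": [{"id": "c"}, {"id": "d"}, {"id": "e"}, {"id": "c"}],
}
report = genres_meeting_threshold(tracks_by_genre, minimum=3)
print(report)
```

Counting unique IDs before sampling matters because popular songs tend to show up in many playlists of the same genre, so the raw track count overstates how much data a genre really has.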

### Methods for cleaning and preprocessing data

Using existing techniques gleaned from the [Music information retrieval](https://musicinformationretrieval.com/index.html) community, I'll **normalize** 30-second song samples to an equal volume, then use various methods for extracting features from the music itself. I still have more research to do to solidify exactly what methods I'll be using, but some include:
* `Energy and Root Mean Squared Energy (RMSE)` - energy is the sum of squared sample values in a window, and RMSE is the square root of their mean. Both track loudness and can be measured in windows across the 30-second clip.
* `Fast Fourier Transform (FFT)` - converts a time window of an audio source into a snapshot of its frequency spectrum.
* `Mel Frequency Cepstral Coefficients (MFCC)` - groups the spectrum into overlapping bins spaced along the mel (roughly logarithmic) frequency scale, takes the log power in each bin, and applies a discrete cosine transform to get a compact set of coefficients for each window of time.
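The first two items in the list above, plus the normalization step, can be sketched with the standard library alone. This is an illustration under stated assumptions, not the pipeline used in this project: "equal volume" is interpreted here as peak normalization (RMS normalization would be another reasonable reading), the function names are hypothetical, and the naive O(n²) DFT stands in for a real FFT routine purely for readability. The input is a synthetic sine wave rather than a real clip.

```python
import cmath
import math


def peak_normalize(samples, peak=1.0):
    """Scale a clip so its largest absolute sample equals `peak`."""
    largest = max(abs(s) for s in samples)
    if largest == 0:
        return list(samples)  # silent clip: nothing to scale
    return [s * peak / largest for s in samples]


def frames(samples, frame_length, hop_length):
    """Yield successive windows of `frame_length` samples, `hop_length` apart."""
    for start in range(0, len(samples) - frame_length + 1, hop_length):
        yield samples[start:start + frame_length]


def frame_energy(frame):
    """Sum of squared sample values in one window."""
    return sum(s * s for s in frame)


def frame_rms(frame):
    """Square root of the mean squared sample value in one window."""
    return math.sqrt(frame_energy(frame) / len(frame))


def dft_magnitudes(frame):
    """Naive discrete Fourier transform; magnitudes for bins 0..n//2."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


# Synthetic one-second "clip": an 8 Hz sine sampled at 64 Hz, amplitude 0.5.
sample_rate = 64
clip = [0.5 * math.sin(2 * math.pi * 8 * t / sample_rate)
        for t in range(sample_rate)]

normalized = peak_normalize(clip)
rms_per_frame = [frame_rms(f) for f in frames(normalized, 32, 16)]
magnitudes = dft_magnitudes(normalized)
peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)

print(rms_per_frame)  # one loudness value per window
print(peak_bin)       # index of the strongest frequency bin
```

With a one-second window at this sample rate, bin index equals frequency in Hz, so the strongest bin lands on the 8 Hz tone. On real audio a library FFT would be far faster than this naive transform.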

### Models

**Support Vector Machines** and **Convolutional Neural Networks** seem to get good results on audio classification problems. An audio signal can be converted into an image (for example, a spectrogram) using the preprocessing steps above, which is what makes image-oriented models like CNNs applicable.[1]

## **Importing data**

Using the Spotipy API, I'm going to pull song data from many different Spotify playlists across the platform and store it in a dataframe for future use. Spotify also shares its own pre-packaged and human-interpretable audio features derived from the source audio, including things like `loudness`, `danceability`, and `acousticness` [among many others](https://developer.spotify.com/documentation/web-api/reference/tracks/get-several-audio-features/). The way these features are calculated is likely similar to the process I will use, so I'm going to use these features to establish a baseline model.
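A minimal sketch of turning Spotify's audio features into rows for a baseline model follows. It assumes the features arrive as a list of dicts, one per track, which is the shape `spotipy`'s `audio_features` call returns, and that an entry may be `None` when no features are available for a track. The helper name `build_baseline_rows`, the chosen subset of feature keys, and all values below are made up for illustration.

```python
# Subset of Spotify audio-feature keys used for the baseline (an assumption).
FEATURE_KEYS = ["danceability", "energy", "loudness", "acousticness", "tempo"]


def build_baseline_rows(feature_dicts, genre, keys=FEATURE_KEYS):
    """Convert per-track feature dicts into flat rows labeled with a genre."""
    rows = []
    for features in feature_dicts:
        if features is None:
            continue  # skip tracks with no audio features
        row = {"id": features["id"], "genre": genre}
        row.update({key: features[key] for key in keys})
        rows.append(row)
    return rows


# Made-up feature values standing in for an API response.
fake_features = [
    {"id": "track_a", "danceability": 0.21, "energy": 0.10,
     "loudness": -24.5, "acousticness": 0.98, "tempo": 72.0},
    None,
    {"id": "track_b", "danceability": 0.30, "energy": 0.15,
     "loudness": -21.0, "acousticness": 0.95, "tempo": 88.0},
]

rows = build_baseline_rows(fake_features, genre="classical")
print(rows)
```

A list of flat dicts like this can be passed straight to `pd.DataFrame(rows)` to get one row per track with a `genre` label column.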

In [15]:
import os
import spotipy
from auth import generate_token
import numpy as np
from funcs import *
import pandas as pd
from tqdm import tqdm
import regex
from string import punctuation
import time



In [16]:
# metal_playlists = [
#                    ('spotify:user:22dpf7epvioqx3sieesnk7uvq' , 'spotify:playlist:2zDq0w1An95BO6MdOPg7sR'),
#                    ('spotify:user:nateher0' , 'spotify:playlist:12dwJPPLTUBABbABa01Qsc'),
#                    ('spotify:user:spotify' , 'spotify:playlist:37i9dQZF1DWTcqUzwhNmKv'),
#                    ('spotify:user:spotify' , 'spotify:playlist:37i9dQZF1DX5J7FIl4q56G'),
#                    ('spotify:user:celsum76' , 'spotify:playlist:27gN69ebwiJRtXEboL12Ih'),
#                    ('spotify:user:spotify' , 'spotify:playlist:37i9dQZF1DWXNFSTtym834'),
#                    ('spotify:user:i0ejynyffy65v7f568eh8y3k6' , 'spotify:playlist:3SqMCb7nx6wSlFkXBE6wD8'),
#                    ('spotify:user:evanstuder', 'spotify:playlist:6JpQsEf9FrpDAmhKNWIV3B')
# ]

In [17]:
# rock_playlists = [
#     ('spotify:user:spotify' , 'spotify:playlist:37i9dQZF1DWXRqgorJj26U'),
#     ('spotify:user:sonymusicfinland' , 'spotify:playlist:5BygwTQ3OrbiwVsQhXFHMz'),
#     ('spotify:user:spotify' , 'spotify:playlist:37i9dQZF1DX7Ku6cgJPhh5'),
#     ('spotify:user:spotify' , 'spotify:playlist:37i9dQZF1DWWwzidNQX6jx'),
#     ('spotify:user:spotify' , 'spotify:playlist:37i9dQZF1DWWzBc3TOlaAV'),
#     ('spotify:user:luccyyy' , 'spotify:playlist:5e1bpazQUEHijFhcJobkAp'),
#     ('spotify:user:12165290026' , 'spotify:playlist:5RkXZzyPCKrovrl1XF92vo')
# ]

In [18]:
# classical_playlists = [
#     ('spotify:user:steelo__407' , 'spotify:playlist:5tXCRZAUKp2uqtmJZNkQxY'),
#     ('spotify:user:topsify' , 'spotify:playlist:62n7TtrAWY1BeNg54yigFe'),
#     ('spotify:user:smittysez' , 'spotify:playlist:2BqPf9szRgMit0n0vRJdZ3'),
#     ('spotify:user:spotify' , 'spotify:playlist:37i9dQZF1DXcN1fAVSf7CR'),
#     ('spotify:user:1258025883' , 'spotify:playlist:7qvZykTVPjvEX2LCcXoHog')
# ]

In [19]:
# rap_playlists = [
#                 ('spotify:user:spotify' , 'spotify:playlist:37i9dQZF1DX0XUsuxWHRQd'),
#                 ('spotify:user:katerinavanderen' , 'spotify:playlist:4gdyJJFph3i2oMdpRnCONw'),
#                 ('spotify:user:42wu5pff089byrz1gagsgddbk' , 'spotify:playlist:6kqwmyEVgvABMR6mbjIVX2'),
#                 ('spotify:user:q4cz8cjd8gckx1u52rf3r11lf' , 'spotify:playlist:2cnUVlszyv9NoeFfmQglOb'),
#                 ('spotify:user:spotify' , 'spotify:playlist:37i9dQZF1DX6PKX5dyBKeq')
#                 ]

In [20]:
bluegrass_playlists = [
#                      ('spotify:user:spotify' , 'spotify:playlist:37i9dQZF1DX0RwLEp3x6W4'),
#                      ('spotify:user:122904536' , 'spotify:playlist:0xsQEqfdhaKWedD265YSkS'),
#                      ('spotify:user:121507465' , 'spotify:playlist:52IXyhbeQJuIEGAGCiyIfK'),
#                      ('spotify:user:joel.chamberlain3' , 'spotify:playlist:4e1W79GoylSVIROhGPdZES'),
#                      ('spotify:user:6y24urmtnizh9q2osg5j7d73u' , 'spotify:playlist:3HCYaqvPSC1NYldqNd0E79'),
#                       ('spotify:user:1262499258' , 'spotify:playlist:7fAm3STfjNSuUpvDxzb9eJ'),
#                       ('spotify:user:carter.santos' , 'spotify:playlist:5xNCTxGS5uC7qPpN4rmijp'),
#                       ('spotify:user:w1d20uonp4mkidfxrxe6gip1y' , 'spotify:playlist:41306y74XVJ0KD4g9sNCwZ'),
#                       ('spotify:user:sambo235' , 'spotify:playlist:3CGIlYRnPERhzhJ8LiA5iL'),
#                       ('spotify:user:tyduscaladbolg' , 'spotify:playlist:6ELFR9iVl3Bj3baHg6Fvh3'),
#                       ('spotify:user:121210099' , 'spotify:playlist:12R3pyyYc13V0xgmy3o6jb'),
#                      ('spotify:user:oldmtnman', 'spotify:playlist:2iZ1Yi5SZTWanQvsM14kHp'),
#                      ('spotify:user:dorian.liao' , 'spotify:playlist:4p8MzdlPzePZ7G7VVLo40T')
]

In [21]:
rnb_playlists = [
#     ('spotify:user:spotify', 'spotify:playlist:37i9dQZF1DWYmmr74INQlb'),
#                ('spotify:user:paulapooh' , 'spotify:playlist:56MTyW6qYrCrVLidmREddO'),
#                ('spotify:user:h4cv0w4529u3y9ylmpq4nc65c' , 'spotify:playlist:72NjyM9mYLTnbNGCMw3tL5'),
#                ('spotify:user:42wu5pff089byrz1gagsgddbk' , 'spotify:playlist:2E8Wt4GejkH47w3curReBP'),
#                ('spotify:user:uin5isqodxt8b078auhck708k' , 'spotify:playlist:4fvRHwyk2SW9bfxYMupBE7'),
#                ('spotify:user:2us2lww0gyq6nl2bfzxdzl2i4' , 'spotify:playlist:1kMyrcNKPws587cSAOjyDP'),
#                ('spotify:user:spotify' , 'spotify:playlist:37i9dQZF1DX4CB6zI8FWXS'),
#     ('spotify:user:spotify' , 'spotify:playlist:37i9dQZF1DWWzBc3TOlaAV'),
#     ('spotify:user:spotify', 'spotify:playlist:37i9dQZF1DX0QKpU3cGsyb'),
#     ('spotify:user:n0q0uqysqdf0jeljsx8qnn0r9' , 'spotify:playlist:4WdOOas9UoL3XjHEs57BTX'),
#       ('spotify:user:princessqueroda' , 'spotify:playlist:5s9HaeFd5O2coKQ5YppY1d'),
#       ('spotify:user:karolrodis' , 'spotify:playlist:7rIUjHHI7hlbpXG7VSqfg5'),
#     ('spotify:user:fkn.jairo' , 'spotify:playlist:4BnpLG8fcq10UO4M179uLF'),
#     ('spotify:user:jr8cdrlna5kjb6s7v36gna8sx' , 'spotify:playlist:1fb2kUcysJ6BvfFYbUPJ3h'),
#     ('spotify:user:uin5isqodxt8b078auhck708k' , 'spotify:playlist:4fvRHwyk2SW9bfxYMupBE7'),
#     ('spotify:user:alaskiantemple100' , 'spotify:playlist:1cqX4l5c5qiRbaYoyk0hOJ'),
#     ('spotify:user:spotify' ,'spotify:playlist:37i9dQZF1DWXnexX7CktaI'),
#     ('spotify:user:spotify' ,'spotify:playlist:37i9dQZF1DWVEvzGeX3eRs'),
#     ('spotify:user:spotify' ,'spotify:playlist:37i9dQZF1DX2UgsUIg75Vg'),
#     ('spotify:user:spotify' ,'spotify:playlist:37i9dQZF1DXaXDsfv6nvZ5'),
#     ('spotify:user:spotify' ,'spotify:playlist:37i9dQZF1DX9zR5aXbFFRA'),
    ('spotify:user:spotify', 'spotify:playlist:37i9dQZF1DWZKEBMCmjsXt'),
    ('spotify:user:spotify', 'spotify:playlist:37i9dQZF1DWUbo613Z2iWO'),
    ('spotify:user:biu2dkmr1v2ojx2z2fwtr24ll', 'spotify:playlist:6y37llELgriD8YWdBlqA0m'),
    ('spotify:user:m2ytuqr5lwp6n0s4hdd47bty1', 'spotify:playlist:1xOXYTNpTiknAl9Rq5Y2aO')
    
]

In [22]:
baroque_playlists = [
    ('spotify:user:sonyclassicalandjazzsweden', 'spotify:playlist:4DvteColbVCrs7iIgc4r6x'),
    ('spotify:user:vsqhzd4nqeprs9vtl7pvtsa58', 'spotify:playlist:2MsgVhkocgCM5L5a6yS70n'),
    ('spotify:user:halidon', 'spotify:playlist:2xwP2mUA0QRT5TwMEkBvtH'),
    ('spotify:user:redzeno52', 'spotify:playlist:7slyBmxiW9t0Fl9rm2E00c')    
]

In [34]:
house_playlists = [
#     ('spotify:user:topsify', 'spotify:playlist:2otQLmbi8QWHjDfq3eL0DC'),
#     ('spotify:user:selectedbase', 'spotify:playlist:6vDGVr652ztNWKZuHvsFvx'),
#     ('spotify:user:spotify', 'spotify:playlist:37i9dQZF1DX2TRYkJECvfC'),
#     ('spotify:user:chillyourmind', 'spotify:playlist:7wDZ5nB0Wb1tcoloILplN8'),
#     ('spotify:user:11887295', 'spotify:playlist:210GNuboojT87jL85tgMzT'),
#     ('spotify:user:86i98g8zh2722dm6sllzia2ue', 'spotify:playlist:6XYYZvwTr4Fl5MdDCfH64g'),
#     ('spotify:user:spotify', 'spotify:playlist:37i9dQZF1DXbXD9pMSZomS'),
#     ('spotify:user:futureofhouse', 'spotify:playlist:7DyH8C8HXh5RzYKKEy2BQI'),
#     ('spotify:user:sonymusicentertainment', 'spotify:playlist:3oRNodhtGLVnZl0Q32FJHB'),
#     ('spotify:user:1119307854', 'spotify:playlist:4k9yqrIc5UyUOSYWFrAkur'),
    ('spotify:user:spotify', 'spotify:playlist:37i9dQZF1DX0AMssoUKCz7'),
    ('spotify:user:12158374076', 'spotify:playlist:1FUdwVcOAkqzWYeHecDhSE'),
    ('spotify:user:bohlinmarcus', 'spotify:playlist:707bUrcQ1qPN0yCVTb4m1J'),
]

In [24]:
folk_playlists = [
    ('spotify:user:spotify', 'spotify:playlist:37i9dQZF1DX6z20IXmBjWI'),
    ('spotify:user:spotify', 'spotify:playlist:37i9dQZF1DWVmps5U8gHNv'),
    ('spotify:user:spotify', 'spotify:playlist:37i9dQZF1DX2taNm7KfjOX'),
    ('spotify:user:spotify', 'spotify:playlist:37i9dQZF1DXaiAJKcabR16'),
    ('spotify:user:spotify', 'spotify:playlist:37i9dQZF1DWSIcimvN18p3'),
    ('spotify:user:spotify', 'spotify:playlist:37i9dQZF1DWWv6MSZULLBi'),
    ('spotify:user:spotify', 'spotify:playlist:37i9dQZF1DWTyjRnMgESue')  
]

In [25]:
ambient_playlists = [
    ('spotify:user:p64dq7tnb8e2yzc45hke20les', 'spotify:playlist:1kqBP6eE24L0agNpnTIKtc'),
    ('spotify:user:spotify', 'spotify:playlist:37i9dQZF1DX3Ogo9pFvBkY'),
    ('spotify:user:nikolaid82', 'spotify:playlist:5NbleROaHyKOZDwJEPm7f5'),
    ('spotify:user:cotter', 'spotify:playlist:0I41QKgHkF8TPUSiUtnL6n'),
    ('spotify:user:spotify', 'spotify:playlist:37i9dQZF1DX0x36cwEyOTG')    
]

In [26]:
serialism_playlists = [
    ('spotify:user:mwt_sqr', 'spotify:playlist:7yFENYAoc1xyUsHKDpD1IS'),
    ('spotify:user:112953089', 'spotify:playlist:5ATv6ZUXYByS57mMXD70h7'),
    ('spotify:user:thesoundsofspotify', 'spotify:playlist:6L5r0Dapop0UDxN5ple8pT'),
    ('spotify:user:mbd16mhwfe5ukzk2gbbbq8e4w', 'spotify:playlist:7mghhD4B90EtSE4Y2vtnZG'),
    ('spotify:user:musicdepartment', 'spotify:playlist:1bITJl6earxOQEwYzMziek')    
]

In [35]:
playlists = [
#     metal_playlists,
#     rock_playlists, 
#     classical_playlists, 
#     rap_playlists, 
#     bluegrass_playlists, 
#     rnb_playlists,
#     baroque_playlists,
    house_playlists,
#     folk_playlists,
#     ambient_playlists,
#     serialism_playlists
]

In [28]:
# token = generate_token()
# spotify = spotipy.Spotify(auth=token) # Authorization token
# results = spotify.user_playlist(user='spotify:user:spotify', 
#                                 playlist_id='spotify:playlist:37i9dQZF1DX0QKpU3cGsyb')

In [29]:
# results = results['tracks']
# tracks = results['items']
# tracks[0]['track']['id']

In [36]:
for playlist in playlists:
    for user, playlist_id in playlist:
        playlist_to_genres(user, playlist_id)

100%|██████████| 100/100 [00:15<00:00,  6.36it/s]
100%|██████████| 100/100 [00:29<00:00,  3.36it/s]
100%|██████████| 92/92 [00:37<00:00,  2.48it/s]
100%|██████████| 100/100 [00:22<00:00,  4.50it/s]
100%|██████████| 100/100 [00:28<00:00,  3.46it/s]
100%|██████████| 100/100 [00:29<00:00,  3.40it/s]
100%|██████████| 100/100 [00:29<00:00,  3.38it/s]
100%|██████████| 100/100 [00:29<00:00,  3.38it/s]
100%|██████████| 100/100 [00:30<00:00,  3.29it/s]
100%|██████████| 100/100 [00:29<00:00,  3.35it/s]
100%|██████████| 100/100 [00:29<00:00,  3.36it/s]
100%|██████████| 100/100 [00:29<00:00,  3.38it/s]
100%|██████████| 92/92 [00:22<00:00,  4.17it/s]
