## **Genre Classification**

I've chosen to build a genre classifier for my final project! This project has the most readily available data (scraped from Spotify using the `spotipy` library) and should be doable within my timeline.

## **Problem Statement**

Using features derived from an audio source itself, can a classification model predict the genre of a 30-second audio clip with high enough accuracy to organize new songs into their respective genres?

Classifying genre is important for music distribution and streaming platforms. It also helps listeners find new bands they might like, which in turn helps musicians connect with new audiences.

My goal is to predict genre using **features derived from the audio signal**. This could help in building playlists that cut across genres but maintain a similar sound world for a consistent listening experience.

## **Methods and models**

### Data Collection

I'll scrape 30-second song samples using `spotipy`, starting with 5 general genres and digging down into the subgenres of each:

* Classical
 * Baroque
 * Contemporary
* Country
 * Bluegrass
 * Folk
* Rock
 * Classic Rock
 * Metal
* Electronic
 * Trance
 * House
* R & B
 * Soul
 * Funk
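
The hierarchy above could be held in a small lookup table so that each clip can be labeled at either level. This is only a sketch: the names `GENRES`, `subgenre_to_genre`, and `SUBGENRE_PARENT` are made up for illustration and are not part of the project code.

```python
# Hypothetical label map: top-level genre -> subgenres from the list above.
GENRES = {
    "Classical": ["Baroque", "Contemporary"],
    "Country": ["Bluegrass", "Folk"],
    "Rock": ["Classic Rock", "Metal"],
    "Electronic": ["Trance", "House"],
    "R & B": ["Soul", "Funk"],
}


def subgenre_to_genre(genres):
    """Invert the mapping so each subgenre points to its parent genre."""
    return {sub: parent for parent, subs in genres.items() for sub in subs}


SUBGENRE_PARENT = subgenre_to_genre(GENRES)
```

With this in place, a clip tagged with a subgenre such as `"Metal"` can be rolled up to its parent genre with `SUBGENRE_PARENT["Metal"]`.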

### Methods for cleaning and preprocessing data

Using existing techniques gleaned from the [Music information retrieval](https://musicinformationretrieval.com/index.html) community, I'll **normalize** 30-second song samples to an equal volume, then use various methods for extracting features from the music itself. I still have more research to do to solidify exactly what methods I'll be using, but some include:
* `Fast Fourier Transform (fft)` - converting a time-window of an audio source into a snapshot of the frequency spectrum
* `Mel Frequency Cepstral Coefficients (mfcc)` - Still not 100% sure what this does, but it creates bins within a frequency range (from 
* `Noise reduction` - I shouldn't have too much noise since these are songs that have been posted to Spotify
* `Frequency Band Selection/isolation` - Used to try to isolate instruments/sounds
* `Spectral analysis/isolation` - detailed approach for instrument/sound extraction
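
As a rough illustration of the first two steps (normalizing, then taking an FFT of one time window), here is a minimal NumPy sketch. It assumes peak normalization, which is only one of several ways to bring clips to "an equal volume", and it uses a synthetic 440 Hz tone in place of a real song clip. The function names and the 22050 Hz sample rate are assumptions for the example.

```python
import numpy as np


def peak_normalize(signal):
    """Scale a signal so its largest absolute sample is 1.0."""
    peak = np.max(np.abs(signal))
    return signal if peak == 0 else signal / peak


def magnitude_spectrum(window, sample_rate):
    """Return (frequencies in Hz, magnitudes) for one time window."""
    # A Hann taper reduces spectral leakage at the window edges.
    tapered = window * np.hanning(len(window))
    magnitudes = np.abs(np.fft.rfft(tapered))
    freqs = np.fft.rfftfreq(len(window), d=1.0 / sample_rate)
    return freqs, magnitudes


# Synthetic stand-in for a song clip: a quiet 440 Hz tone, 1 second long.
sample_rate = 22050  # assumed sample rate
t = np.arange(sample_rate) / sample_rate
clip = 0.2 * np.sin(2 * np.pi * 440 * t)

normalized = peak_normalize(clip)
freqs, mags = magnitude_spectrum(normalized[:2048], sample_rate)
peak_freq = freqs[np.argmax(mags)]  # strongest frequency in this window
```

For this test tone, `peak_freq` lands within one frequency bin (about 10.8 Hz at this window size) of 440 Hz. Real clips would first need to be decoded from the downloaded audio files into arrays.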

### Models

Audio signals can be converted into images using the preprocessing steps above. Neural networks are increasingly popular for classifying audio and have proven very effective at classifying bird calls and instruments, so I will use one to classify genre as well.
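
One common way to turn audio into an "image" is a spectrogram: an FFT taken over many short, overlapping windows and stacked into a 2-D array. The sketch below uses plain NumPy with an assumed frame size, hop length, and sample rate, and a synthetic tone standing in for a song. It is meant to show the shape of the idea, not the final preprocessing pipeline.

```python
import numpy as np


def spectrogram(signal, frame_size=1024, hop=512):
    """Short-time Fourier transform magnitudes, in decibels.

    Returns a 2-D array of shape (frequency bins, time frames).
    The signal must be at least frame_size samples long.
    """
    window = np.hanning(frame_size)
    n_frames = 1 + (len(signal) - frame_size) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_size] * window
        for i in range(n_frames)
    ])
    magnitudes = np.abs(np.fft.rfft(frames, axis=1))
    # Log scale compresses dynamic range; the small constant avoids log(0).
    return 20 * np.log10(magnitudes + 1e-10).T


# Synthetic stand-in for a clip: 2 seconds of a 440 Hz tone.
sample_rate = 22050  # assumed sample rate
t = np.arange(sample_rate * 2) / sample_rate
clip = np.sin(2 * np.pi * 440 * t)

image = spectrogram(clip)  # rows are frequency bins, columns are time frames
```

The resulting array can be treated like a single-channel image, which is what makes image-style neural networks applicable to audio.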

My computer is set up to be able to run neural networks, but I still have research to do before I create my first model.

There may already be existing models for genre classification. If I find any, I'll likely incorporate them into my project to strengthen my own model.

## **Importing data**

This is my first time using `spotipy`, so I'm just grabbing random songs to perform some basic EDA. These will not be the genres I eventually use for my model.

In [1]:
import os
import spotipy
from auth import generate_token
import numpy as np
from funcs import *
import pandas as pd
from tqdm import tqdm
import regex
from string import punctuation
import time

In [112]:
metal_playlists = [
                   ('spotify:user:22dpf7epvioqx3sieesnk7uvq' , 'spotify:playlist:2zDq0w1An95BO6MdOPg7sR'),
                   ('spotify:user:nateher0' , 'spotify:playlist:12dwJPPLTUBABbABa01Qsc'),
                   ('spotify:user:spotify' , 'spotify:playlist:37i9dQZF1DWTcqUzwhNmKv'),
                   ('spotify:user:spotify' , 'spotify:playlist:37i9dQZF1DX5J7FIl4q56G'),
                   ('spotify:user:celsum76' , 'spotify:playlist:27gN69ebwiJRtXEboL12Ih'),
                   ('spotify:user:spotify' , 'spotify:playlist:37i9dQZF1DWXNFSTtym834'),
                   ('spotify:user:i0ejynyffy65v7f568eh8y3k6' , 'spotify:playlist:3SqMCb7nx6wSlFkXBE6wD8'),
                   ('spotify:user:evanstuder', 'spotify:playlist:6JpQsEf9FrpDAmhKNWIV3B')
]

In [113]:
# A list of (user, playlist_id) tuples, matching the other genre lists
rock_playlists = [
    ('spotify:user:spotify' , 'spotify:playlist:37i9dQZF1DWXRqgorJj26U'),
    ('spotify:user:sonymusicfinland' , 'spotify:playlist:5BygwTQ3OrbiwVsQhXFHMz'),
    ('spotify:user:spotify' , 'spotify:playlist:37i9dQZF1DX7Ku6cgJPhh5'),
    ('spotify:user:spotify' , 'spotify:playlist:37i9dQZF1DWWwzidNQX6jx'),
    ('spotify:user:spotify' , 'spotify:playlist:37i9dQZF1DWWzBc3TOlaAV'),
    ('spotify:user:luccyyy' , 'spotify:playlist:5e1bpazQUEHijFhcJobkAp'),
    ('spotify:user:12165290026' , 'spotify:playlist:5RkXZzyPCKrovrl1XF92vo')
]

In [114]:
# Each entry must be a (user, playlist_id) tuple so it can be unpacked
# when looping over the playlists; a flat set of strings would fail there.
classical_playlists = [
    ('spotify:user:steelo__407' , 'spotify:playlist:5tXCRZAUKp2uqtmJZNkQxY'),
    ('spotify:user:topsify' , 'spotify:playlist:62n7TtrAWY1BeNg54yigFe'),
    ('spotify:user:smittysez' , 'spotify:playlist:2BqPf9szRgMit0n0vRJdZ3'),
    ('spotify:user:spotify' , 'spotify:playlist:37i9dQZF1DXcN1fAVSf7CR'),
    ('spotify:user:1258025883' , 'spotify:playlist:7qvZykTVPjvEX2LCcXoHog')
]

In [115]:
rap_playlists = [
                ('spotify:user:spotify' , 'spotify:playlist:37i9dQZF1DX0XUsuxWHRQd'),
                ('spotify:user:katerinavanderen' , 'spotify:playlist:4gdyJJFph3i2oMdpRnCONw'),
                ('spotify:user:42wu5pff089byrz1gagsgddbk' , 'spotify:playlist:6kqwmyEVgvABMR6mbjIVX2'),
                ('spotify:user:q4cz8cjd8gckx1u52rf3r11lf' , 'spotify:playlist:2cnUVlszyv9NoeFfmQglOb'),
                ('spotify:user:spotify' , 'spotify:playlist:37i9dQZF1DX6PKX5dyBKeq')
                ]

In [64]:
bluegrass_playlists = [
#                      ('spotify:user:spotify' , 'spotify:playlist:37i9dQZF1DX0RwLEp3x6W4'),
#                      ('spotify:user:122904536' , 'spotify:playlist:0xsQEqfdhaKWedD265YSkS'),
#                      ('spotify:user:121507465' , 'spotify:playlist:52IXyhbeQJuIEGAGCiyIfK'),
#                      ('spotify:user:joel.chamberlain3' , 'spotify:playlist:4e1W79GoylSVIROhGPdZES'),
#                      ('spotify:user:6y24urmtnizh9q2osg5j7d73u' , 'spotify:playlist:3HCYaqvPSC1NYldqNd0E79'),
#                       ('spotify:user:1262499258' , 'spotify:playlist:7fAm3STfjNSuUpvDxzb9eJ'),
#                       ('spotify:user:carter.santos' , 'spotify:playlist:5xNCTxGS5uC7qPpN4rmijp'),
#                       ('spotify:user:w1d20uonp4mkidfxrxe6gip1y' , 'spotify:playlist:41306y74XVJ0KD4g9sNCwZ'),
#                       ('spotify:user:sambo235' , 'spotify:playlist:3CGIlYRnPERhzhJ8LiA5iL'),
#                       ('spotify:user:tyduscaladbolg' , 'spotify:playlist:6ELFR9iVl3Bj3baHg6Fvh3'),
#                       ('spotify:user:121210099' , 'spotify:playlist:12R3pyyYc13V0xgmy3o6jb'),
#                      ('spotify:user:oldmtnman', 'spotify:playlist:2iZ1Yi5SZTWanQvsM14kHp'),
#                      ('spotify:user:dorian.liao' , 'spotify:playlist:4p8MzdlPzePZ7G7VVLo40T')
]

In [129]:
rnb_playlists = [
#     ('spotify:user:spotify', 'spotify:playlist:37i9dQZF1DWYmmr74INQlb'),
#                ('spotify:user:paulapooh' , 'spotify:playlist:56MTyW6qYrCrVLidmREddO'),
#                ('spotify:user:h4cv0w4529u3y9ylmpq4nc65c' , 'spotify:playlist:72NjyM9mYLTnbNGCMw3tL5'),
#                ('spotify:user:42wu5pff089byrz1gagsgddbk' , 'spotify:playlist:2E8Wt4GejkH47w3curReBP'),
#                ('spotify:user:uin5isqodxt8b078auhck708k' , 'spotify:playlist:4fvRHwyk2SW9bfxYMupBE7'),
#                ('spotify:user:2us2lww0gyq6nl2bfzxdzl2i4' , 'spotify:playlist:1kMyrcNKPws587cSAOjyDP'),
#                ('spotify:user:spotify' , 'spotify:playlist:37i9dQZF1DX4CB6zI8FWXS'),
#     ('spotify:user:spotify' , 'spotify:playlist:37i9dQZF1DWWzBc3TOlaAV'),
#     ('spotify:user:spotify', 'spotify:playlist:37i9dQZF1DX0QKpU3cGsyb'),
#     ('spotify:user:n0q0uqysqdf0jeljsx8qnn0r9' , 'spotify:playlist:4WdOOas9UoL3XjHEs57BTX'),
#       ('spotify:user:princessqueroda' , 'spotify:playlist:5s9HaeFd5O2coKQ5YppY1d'),
#       ('spotify:user:karolrodis' , 'spotify:playlist:7rIUjHHI7hlbpXG7VSqfg5'),
#     ('spotify:user:fkn.jairo' , 'spotify:playlist:4BnpLG8fcq10UO4M179uLF'),
#     ('spotify:user:jr8cdrlna5kjb6s7v36gna8sx' , 'spotify:playlist:1fb2kUcysJ6BvfFYbUPJ3h'),
#     ('spotify:user:uin5isqodxt8b078auhck708k' , 'spotify:playlist:4fvRHwyk2SW9bfxYMupBE7'),
#     ('spotify:user:alaskiantemple100' , 'spotify:playlist:1cqX4l5c5qiRbaYoyk0hOJ'),
#     ('spotify:user:spotify' ,'spotify:playlist:37i9dQZF1DWXnexX7CktaI'),
#     ('spotify:user:spotify' ,'spotify:playlist:37i9dQZF1DWVEvzGeX3eRs'),
#     ('spotify:user:spotify' ,'spotify:playlist:37i9dQZF1DX2UgsUIg75Vg'),
#     ('spotify:user:spotify' ,'spotify:playlist:37i9dQZF1DXaXDsfv6nvZ5'),
#     ('spotify:user:spotify' ,'spotify:playlist:37i9dQZF1DX9zR5aXbFFRA'),
    ('spotify:user:spotify', 'spotify:playlist:37i9dQZF1DWZKEBMCmjsXt'),
    ('spotify:user:spotify', 'spotify:playlist:37i9dQZF1DWUbo613Z2iWO'),
    ('spotify:user:biu2dkmr1v2ojx2z2fwtr24ll', 'spotify:playlist:6y37llELgriD8YWdBlqA0m'),
    ('spotify:user:m2ytuqr5lwp6n0s4hdd47bty1', 'spotify:playlist:1xOXYTNpTiknAl9Rq5Y2aO')
    
]

In [130]:
playlists = [
#     metal_playlists,
#              rock_playlists, 
#              classical_playlists, 
#              rap_playlists, 
#              bluegrass_playlists, 
             rnb_playlists]

In [131]:
# token = generate_token()
# spotify = spotipy.Spotify(auth=token) # Authorization token
# results = spotify.user_playlist(user='spotify:user:spotify', 
#                                 playlist_id='spotify:playlist:37i9dQZF1DX0QKpU3cGsyb')

In [132]:
# results = results['tracks']
# tracks = results['items']
# tracks[0]['track']['id']

In [133]:
for playlist in playlists:
    for user, playlist_id in playlist:
        playlist_to_genres(user, playlist_id)

100%|██████████| 31/31 [00:02<00:00, 12.50it/s]
100%|██████████| 21/21 [00:01<00:00, 11.73it/s]
100%|██████████| 25/25 [00:02<00:00, 12.03it/s]
100%|██████████| 100/100 [00:27<00:00,  3.67it/s]
100%|██████████| 13/13 [00:00<00:00, 13.35it/s]
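
The commented-out cells above index into the playlist response as `results['tracks']['items'][0]['track']['id']`. Building on that shape, the sketch below shows one way to pull out each track's ID together with its 30-second preview URL. It is not the body of `playlist_to_genres` (which lives in `funcs` and isn't shown here); `extract_previews` and `fake_response` are hypothetical names, and the response is a hand-built stand-in rather than real Spotify data. Spotify track objects carry a `preview_url` field that can be `None`, so those tracks are skipped.

```python
def extract_previews(playlist_response):
    """Collect (track_id, preview_url) pairs from a playlist response dict."""
    pairs = []
    for item in playlist_response["tracks"]["items"]:
        track = item.get("track")
        if not track:
            continue  # some playlist entries have no track object
        if track.get("preview_url"):
            pairs.append((track["id"], track["preview_url"]))
    return pairs


# Hand-built stand-in with the same nesting as the response indexed above.
fake_response = {
    "tracks": {
        "items": [
            {"track": {"id": "id1", "preview_url": "https://example.com/clip1.mp3"}},
            {"track": {"id": "id2", "preview_url": None}},  # no preview available
            {"track": None},
        ]
    }
}

previews = extract_previews(fake_response)
```

Here only `id1` survives, since the other two entries have no usable preview.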
