# Classify songs into different genres using Machine Learning
After an overview of acoustic signals, their features, and the feature extraction process, it is time to apply these newly developed skills to a machine learning problem.
## Objective
In this section, we will build a classifier that sorts songs into genres. Let us assume a scenario in which, for some reason, we find a bunch of randomly named MP3 files on our hard disk, which are assumed to contain music. Our task is to sort them by music genre into different folders such as jazz, classical, country, pop, rock, and metal.
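
Once a classifier is available, the sorting step itself is plain file handling. The following is a minimal sketch of that step, assuming a hypothetical mapping from file path to predicted genre; the function name `sort_into_genre_folders` and the placeholder filenames are illustrative and not part of the notebook.

```python
import shutil
import tempfile
from pathlib import Path


def sort_into_genre_folders(predicted, dest_root):
    """Move each file into dest_root/<genre>/ and return the new paths.

    `predicted` maps a file path to a genre label. In practice the labels
    would come from the trained classifier (hypothetical input here).
    """
    dest_root = Path(dest_root)
    moved = []
    for src, genre in predicted.items():
        target_dir = dest_root / genre
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / Path(src).name
        shutil.move(str(src), str(target))
        moved.append(target)
    return moved


# Demo with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    files = {
        tmp / "a1b2.mp3": "jazz",
        tmp / "zz9.mp3": "metal",
        tmp / "q7.mp3": "jazz",
    }
    for f in files:
        f.touch()
    moved = sort_into_genre_folders(files, tmp / "sorted")
    jazz_files = sorted(p.name for p in (tmp / "sorted" / "jazz").iterdir())
    metal_files = sorted(p.name for p in (tmp / "sorted" / "metal").iterdir())
    print(jazz_files, metal_files)
```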
### Dataset
We will be using the famous GTZAN dataset for our case study. This dataset was used in the well-known genre classification paper "Musical genre classification of audio signals" by G. Tzanetakis and P. Cook, published in IEEE Transactions on Speech and Audio Processing in 2002. The dataset consists of 1000 audio tracks, each 30 seconds long. It contains 10 genres, namely blues, classical, country, disco, hiphop, jazz, reggae, rock, metal, and pop. Each genre consists of 100 sound clips.

# Music genre classification notebook
## Importing Libraries
**Source:** https://medium.com/@ageitgey/python-3-quick-tip-the-easy-way-to-deal-with-file-paths-on-windows-mac-and-linux-11a072b58d5f

In [39]:
# feature extraction and data preprocessing
import librosa
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import os
import sys
from PIL import Image #PIL is the Python Imaging Library by Fredrik Lundh and Contributors.
import pathlib
# The pathlib module provides an object oriented approach to handling filesystem paths. 
# The module also provides functionality appropriate for various operating systems. Classes defined in this module are of 
# two types – pure path types and concrete path types. 
# While pure paths can only perform purely computational operations, concrete paths are capable of doing I/O operations too.
# https://medium.com/@ageitgey/python-3-quick-tip-the-easy-way-to-deal-with-file-paths-on-windows-mac-and-linux-11a072b58d5f
working_dir = pathlib.PureWindowsPath('C:\\Users\\alvar\\Downloads\\genres')
os.chdir(working_dir)

import csv

#Preprocessing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

#Keras
import keras

import warnings
warnings.filterwarnings('ignore')
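
The difference between pure and concrete paths mentioned in the comments above can be seen in a small sketch. The Windows path used here is a hypothetical placeholder, and only standard `pathlib` calls are used.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath

# Pure paths only manipulate the path string, so a Windows-style path can be
# parsed on any operating system.
win = PureWindowsPath(r"C:\Users\someone\Downloads\genres")  # hypothetical path
print(win.name)         # last component of the path
print(win.parent.name)  # name of the containing folder
print(win.as_posix())   # same path written with forward slashes

# Joining with "/" builds a path without manual string concatenation.
song = PurePosixPath("genres") / "blues" / "blues.00000.wav"
print(str(song))
print(song.suffix, song.stem)

# Concrete paths can also query the file system.
here = Path.cwd()
print(here.is_dir())
```

Building paths this way avoids hard-coding a separator, which is the point made in the article linked above.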

## Extracting music and features
### Dataset
We use the GTZAN genre collection dataset for classification.

The dataset consists of 10 genres:

* Blues
* Classical
* Country
* Disco
* Hiphop
* Jazz
* Metal
* Pop
* Reggae
* Rock

Each genre contains 100 songs, for a total of 1000 songs.

### Extracting the Spectrogram for every Audio

In [18]:
cmap = plt.get_cmap('inferno')

plt.figure(figsize=(10,10))
genres = 'blues classical country disco hiphop jazz metal pop reggae rock'.split()

for g in genres:
    pathlib.Path(f'img_data/{g}').mkdir(parents=True, exist_ok=True)
#     f-strings provide a way to embed expressions inside string literals, using a minimal syntax. 
#     It should be noted that an f-string is really an expression evaluated at run time, not a constant value. 
#     In Python source code, an f-string 
#     is a literal string, prefixed with 'f', 
#     which contains expressions inside braces. The expressions are replaced with their values.
    for filename in os.listdir(f'./genres/{g}'):
        songname = f'./genres/{g}/{filename}'
        y, sr = librosa.load(songname, mono=True, duration=5)
        plt.specgram(y, NFFT=2048, Fs=2, Fc=0, noverlap=128, cmap=cmap, sides='default', mode='default', scale='dB');
        plt.axis('off');
        plt.savefig(f'img_data/{g}/{filename[:-3].replace(".", "")}.png')
        plt.clf() #Clear the current figure


<Figure size 720x720 with 0 Axes>

All the audio files are converted into their respective spectrograms. We can now easily **extract features** from them.
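
To make clear what a spectrogram contains, here is a minimal NumPy sketch of a magnitude spectrogram computed with a short-time Fourier transform. It is an illustration under stated assumptions, not what `plt.specgram` does internally: the helper `magnitude_spectrogram`, the hop length of 512 samples, the Hann window, and the synthetic 440 Hz tone are all choices made for this example.

```python
import numpy as np


def magnitude_spectrogram(y, n_fft=2048, hop=512):
    """Return |STFT| with shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    # Cut the signal into overlapping frames and apply the window to each one.
    frames = np.stack(
        [y[i * hop:i * hop + n_fft] * window for i in range(n_frames)]
    )
    # One FFT per frame; transpose so that rows are frequency bins.
    return np.abs(np.fft.rfft(frames, axis=1)).T


sr = 22050                           # sampling rate assumed for this demo
t = np.arange(sr) / sr               # one second of sample times
y = np.sin(2 * np.pi * 440.0 * t)    # synthetic 440 Hz tone, not GTZAN audio

spec = magnitude_spectrogram(y)
freqs = np.fft.rfftfreq(2048, d=1 / sr)
peak_hz = freqs[spec.mean(axis=1).argmax()]
print(spec.shape, peak_hz)
```

Each column of `spec` is the spectrum of one short frame, so the strongest row sits close to the frequency of the tone.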
### Extracting features from Spectrogram
We will extract:

* Mel-frequency cepstral coefficients (MFCC), 20 in number
* Spectral centroid
* Spectral bandwidth
* Spectral roll-off
* Zero crossing rate
* Chroma frequencies
* Root-mean-square (RMS) energy
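
Three of these features are simple enough to compute by hand, which helps to see what the numbers mean. The sketch below works on a whole synthetic signal with NumPy, whereas librosa computes the features frame by frame, so it is an illustration rather than a replacement. The 1 kHz test tone and the 85% roll-off threshold are assumptions made for this example.

```python
import numpy as np

sr = 22050
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 1000.0 * t + 0.1)   # synthetic 1 kHz tone

# Zero crossing rate: fraction of consecutive sample pairs whose sign differs.
signs = np.signbit(y)
zcr = np.mean(signs[1:] != signs[:-1])

# Spectral centroid: magnitude-weighted mean frequency of the spectrum.
mag = np.abs(np.fft.rfft(y * np.hanning(len(y))))
freqs = np.fft.rfftfreq(len(y), d=1 / sr)
centroid = np.sum(freqs * mag) / np.sum(mag)

# Spectral roll-off: frequency below which 85% of the total magnitude lies.
cumulative = np.cumsum(mag)
rolloff = freqs[np.searchsorted(cumulative, 0.85 * cumulative[-1])]

print(zcr, centroid, rolloff)
```

For a pure tone, the centroid and roll-off both land near the tone frequency, and the zero crossing rate is close to twice the frequency divided by the sampling rate.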

In [40]:
header = 'filename chroma_stft rmse spectral_centroid spectral_bandwidth rolloff zero_crossing_rate'

for i in range(1,21):
    header += f' mfcc{i}'
header += ' label'
header = header.split()

### Writing data to csv file
We compute the mean of each feature over every track and write one row per track to a CSV file.

In [43]:
# Create the CSV file and write the header row
with open('data.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(header)
    
genres = 'blues classical country disco hiphop jazz metal pop reggae rock'.split()
#genres = ['rock']
for g in genres:
    for filename in os.listdir(f'./genres/{g}'):
        songname = f'./genres/{g}/{filename}'
        y, sr = librosa.load(songname, mono=True, duration=30)
        chroma_stft = librosa.feature.chroma_stft(y=y, sr=sr)
        rmse = librosa.feature.rms(y=y)  # called librosa.feature.rmse in librosa versions before 0.7
        spec_cent = librosa.feature.spectral_centroid(y=y, sr=sr)
        spec_bw = librosa.feature.spectral_bandwidth(y=y, sr=sr)
        rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)
        zcr = librosa.feature.zero_crossing_rate(y)
        mfcc = librosa.feature.mfcc(y=y, sr=sr)
        to_append = f'{filename} {np.mean(chroma_stft)} {np.mean(rmse)} {np.mean(spec_cent)} {np.mean(spec_bw)} {np.mean(rolloff)} {np.mean(zcr)}'
        
        for e in mfcc:
            to_append += f' {np.mean(e)}'
        to_append += f' {g}'
        # Append one row per track to the CSV file
        with open('data.csv', 'a', newline='') as file:
            writer = csv.writer(file)
            writer.writerow(to_append.split())
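
The cell above builds each row as a space-separated string and splits it again, which works as long as no filename contains a space. A possible alternative, sketched here with made-up values and a shortened header, is to pass the values to `csv.writer` as a list. `io.StringIO` stands in for the real file so that the example can run anywhere.

```python
import csv
import io

header = ["filename", "chroma_stft", "rmse", "label"]   # shortened for the demo
rows = [
    ("blues.00000.wav", 0.35, 0.13, "blues"),   # made-up feature values
    ("my song.wav", 0.2, 0.05, "pop"),          # filename with a space
    ("rock.00000.wav", 0.41, 0.1, "rock"),
]

# io.StringIO stands in for open('data.csv', 'w', newline='').
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(header)
for filename, chroma, rmse, label in rows:
    # Passing a list keeps each value in its own field, even with spaces.
    writer.writerow([filename, chroma, rmse, label])

lines = buffer.getvalue().splitlines()
print(lines)
```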

### Analysing the Data in Pandas

In [44]:
data = pd.read_csv('data.csv')
data.head()

| | filename | chroma_stft | rmse | spectral_centroid | spectral_bandwidth | rolloff | zero_crossing_rate | mfcc1 | mfcc2 | mfcc3 | ... | mfcc12 | mfcc13 | mfcc14 | mfcc15 | mfcc16 | mfcc17 | mfcc18 | mfcc19 | mfcc20 | label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | blues.00000.wav | 0.349943 | 0.130225 | 1784.420446 | 2002.650192 | 3806.485316 | 0.083066 | -113.596742 | 121.557302 | -19.158825 | ... | 8.810668 | -3.667367 | 5.75169 | -5.162761 | 0.750947 | -1.691937 | -0.409954 | -2.300208 | 1.219928 | blues |
| 1 | blues.00001.wav | 0.340983 | 0.095918 | 1529.835316 | 2038.617579 | 3548.820207 | 0.056044 | -207.556796 | 124.006717 | 8.930562 | ... | 5.376802 | -2.239119 | 4.216963 | -6.012273 | 0.936109 | -0.716537 | 0.293875 | -0.287431 | 0.531573 | blues |
| 2 | blues.00002.wav | 0.363603 | 0.175573 | 1552.481958 | 1747.165985 | 3040.514948 | 0.076301 | -90.754394 | 140.459907 | -29.109965 | ... | 5.789265 | -8.905224 | -1.08372 | -9.218359 | 2.455805 | -7.726901 | -1.815724 | -3.433434 | -2.226821 | blues |
| 3 | blues.00003.wav | 0.404779 | 0.141191 | 1070.119953 | 1596.333948 | 2185.028454 | 0.033309 | -199.431144 | 150.099218 | 5.647594 | ... | 6.087676 | -2.47642 | -1.07389 | -2.874777 | 0.780976 | -3.316932 | 0.637981 | -0.61969 | -3.408233 | blues |
| 4 | blues.00004.wav | 0.30859 | 0.091563 | 1835.494603 | 1748.362448 | 3580.945013 | 0.1015 | -160.266031 | 126.1988 | -35.605448 | ... | -2.806385 | -6.934122 | -7.558619 | -9.173552 | -4.512166 | -5.453538 | -0.924162 | -4.409333 | -11.703781 | blues |


In [45]:
data.shape

(1000, 28)

In [46]:
# Dropping unnecessary columns
data = data.drop(['filename'],axis=1)

In [49]:
genre_list = data.iloc[:, -1]
encoder = LabelEncoder()
y = encoder.fit_transform(genre_list)
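
Label encoding simply replaces each genre name with an integer. A rough NumPy equivalent, shown on a handful of toy labels, is `np.unique` with `return_inverse=True`; as far as we know, `LabelEncoder` also orders its classes alphabetically, so the resulting integers should agree.

```python
import numpy as np

labels = np.array(["rock", "blues", "pop", "blues", "metal"])  # toy labels

# np.unique returns the sorted class names and, with return_inverse=True,
# the index of each label within that sorted array.
classes, encoded = np.unique(labels, return_inverse=True)
print(classes)
print(encoded)

# Decoding is plain indexing into the class array.
decoded = classes[encoded]
print(decoded)
```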

### Scaling the Feature columns

In [50]:
scaler = StandardScaler()
X = scaler.fit_transform(np.array(data.iloc[:, :-1], dtype = float))
X

array([[-0.35174862, -0.01072298, -0.58330334, ..., -0.23719158,
         0.00761145,  0.60349813],
       [-0.46146578, -0.53326615, -0.93906628, ..., -0.05518978,
         0.5438236 ,  0.42403528],
       [-0.18448399,  0.68001209, -0.90741936, ..., -0.60070707,
        -0.29428464, -0.29511278],
       ...,
       [ 0.65431762, -0.75110651, -0.17418012, ...,  0.76028053,
        -2.73474414, -0.26387449],
       [-0.19983726, -0.71651358, -1.12235633, ...,  0.2717664 ,
        -0.72311185, -0.64936228],
       [-0.25070236, -1.16473892, -0.82782084, ..., -0.12506872,
         0.08171799,  0.58748963]])
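
Standard scaling subtracts the column mean and divides by the column standard deviation. The sketch below does this by hand on random toy data. Note that the cell above fits the scaler on all 1000 rows before the split; a common alternative, shown here as an assumption rather than as the notebook's method, is to compute the statistics on the training rows only and reuse them for the test rows.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy feature matrix: two columns with very different scales.
X = rng.normal(loc=[0.3, 1500.0], scale=[0.05, 400.0], size=(100, 2))
X_train, X_test = X[:80], X[80:]

# Statistics come from the training rows only.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)       # population standard deviation (ddof=0)

X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std   # same mean and std as for training

print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
```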

### Dividing data into training and testing sets

In [51]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

In [52]:
len(y_train)

800

In [53]:
len(y_test)

200
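
A plain random split does not guarantee that every genre is equally represented in the test set. The sketch below shows one way to split per class with NumPy; the helper `stratified_split` is hypothetical. In scikit-learn, the `stratify` and `random_state` parameters of `train_test_split` serve the same purpose.

```python
import numpy as np


def stratified_split(y, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with the same class ratio in both parts."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(y):
        # Shuffle the indices of this class, then cut off the test share.
        idx = rng.permutation(np.flatnonzero(y == cls))
        n_test = int(round(len(idx) * test_fraction))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)


labels = np.repeat(np.arange(10), 100)   # 10 genres x 100 clips, as in GTZAN
train_idx, test_idx = stratified_split(labels)
test_counts = np.bincount(labels[test_idx])
print(len(train_idx), len(test_idx), test_counts)
```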

In [66]:
X_train[10]

array([-0.53545232,  0.98658127,  0.45295673,  1.15213444,  0.62443893,
       -0.00227625,  0.71007663, -0.06666625,  1.18949389, -2.21302668,
        0.68231981, -0.82446666,  0.59165798, -1.49787199,  0.96551524,
       -0.35347937,  0.02811474, -1.39848204,  0.77350933, -0.96373133,
        0.76171841, -0.77148173,  1.0308406 ,  0.0454997 ,  1.14503488,
        0.49551365])

### Classification with Keras
#### Building our Network

In [67]:
from keras import models
from keras import layers

model = models.Sequential()
model.add(layers.Dense(256, activation='relu', input_shape=(X_train.shape[1],)))

model.add(layers.Dense(128, activation='relu'))

model.add(layers.Dense(64, activation='relu'))

model.add(layers.Dense(10, activation='softmax'))

In [68]:
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
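
The output layer and the loss chosen above fit together: softmax turns the 10 raw scores into probabilities, and sparse categorical cross-entropy is the negative log of the probability assigned to the true class, with labels given as integers. A minimal NumPy sketch on toy scores for three classes (the helper functions are written for this example and are not Keras internals):

```python
import numpy as np


def softmax(logits):
    # Subtract the row maximum for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def sparse_categorical_crossentropy(y_true, probs):
    # Pick the predicted probability of the true class in each row.
    picked = probs[np.arange(len(y_true)), y_true]
    return -np.mean(np.log(picked))


logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])    # toy scores for 3 classes
y_true = np.array([0, 2])               # integer labels, not one-hot

probs = softmax(logits)
loss = sparse_categorical_crossentropy(y_true, probs)
print(probs, loss)
```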

In [69]:
history = model.fit(X_train,
                    y_train,
                    epochs=20,
                    batch_size=128)

Epoch 1/20
...
Epoch 20/20


In [70]:
test_loss, test_acc = model.evaluate(X_test,y_test)



In [71]:
print('test_acc: ',test_acc)

test_acc:  0.665
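
A single accuracy number hides which genres are confused with each other. A confusion matrix makes this visible. The sketch below uses toy labels for three classes; in the notebook, the predicted labels could be obtained by taking the argmax of the model output for each test row.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(matrix, (y_true, y_pred), 1)
    return matrix


y_true = np.array([0, 0, 1, 1, 2, 2])   # toy labels
y_pred = np.array([0, 1, 1, 1, 2, 0])   # toy predictions

cm = confusion_matrix(y_true, y_pred, 3)
accuracy = np.trace(cm) / cm.sum()
per_class_recall = np.diag(cm) / cm.sum(axis=1)
print(cm)
print(accuracy, per_class_recall)
```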


### Validating our approach
Let's set apart 200 samples in our training data to use as a validation set:

In [72]:
x_val = X_train[:200]
partial_x_train = X_train[200:]

y_val = y_train[:200]
partial_y_train = y_train[200:]

Now let's train a larger network for 30 epochs, monitoring the validation set:

In [73]:
model = models.Sequential()
model.add(layers.Dense(512, activation='relu', input_shape=(X_train.shape[1],)))
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(partial_x_train,
          partial_y_train,
          epochs=30,
          batch_size=512,
          validation_data=(x_val, y_val))

results = model.evaluate(X_test, y_test)

Train on 600 samples, validate on 200 samples
Epoch 1/30
...
Epoch 30/30


In [74]:
results

[1.1210992336273193, 0.615]
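
The purpose of a validation set is to see when the network stops improving on data it has not been trained on. When `validation_data` is passed, the object returned by `model.fit` records a `val_loss` value per epoch in its `history` dictionary. The sketch below applies a simple patience rule to a list of made-up validation losses; the helper `early_stopping_epoch` is hypothetical, and Keras offers an `EarlyStopping` callback for the same idea.

```python
# Made-up validation losses, one per epoch, for illustration only.
val_loss = [2.10, 1.72, 1.45, 1.30, 1.22, 1.19, 1.21, 1.26, 1.34, 1.41]


def early_stopping_epoch(losses, patience=2):
    """Return the 1-based epoch at which training would stop."""
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            best, waited = loss, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(losses)


# Epoch with the lowest validation loss (1-based).
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1
stop_epoch = early_stopping_epoch(val_loss, patience=2)
print(best_epoch, stop_epoch)
```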

## Predictions on Test Data

In [190]:
predictions = model.predict(X_test)

In [191]:
predictions[0].shape

(10,)

In [192]:
np.sum(predictions[0])

0.9999998

In [193]:
np.argmax(predictions[0])

7
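
The predicted index can be mapped back to a genre name. Assuming the encoder numbered the genres alphabetically, index 7 corresponds to pop; in the notebook, `encoder.inverse_transform` would perform this lookup. The probability vector below is made up for illustration.

```python
import numpy as np

genres = sorted(
    "blues classical country disco hiphop jazz metal pop reggae rock".split()
)

# Made-up softmax output for one clip (sums to 1).
probs = np.array([0.01, 0.02, 0.05, 0.10, 0.04, 0.03, 0.05, 0.60, 0.06, 0.04])

top = int(np.argmax(probs))
# The three most probable genres, most likely first.
top3 = [genres[i] for i in np.argsort(probs)[::-1][:3]]
print(genres[top], top3)
```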

## Next Steps
Music genre classification is one of the many branches of Music Information Retrieval. From here you can move on to other tasks on musical data, such as beat tracking, music generation, recommender systems, track separation, and instrument recognition. Music analysis is a diverse and interesting field. A music session in some way represents a moment for the user, and finding and describing these moments is an interesting challenge in data science.