<a href="https://colab.research.google.com/gist/parulnith/7f8c174e6ac099e86f0495d3d9a4c01e/untitled9.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Music genre classification notebook

## Importing Libraries

In [1]:
# feature extraction and data preprocessing
import librosa
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import os
from PIL import Image
import pathlib
import csv

# Preprocessing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

#Keras
import keras

import warnings
warnings.filterwarnings('ignore')

Using TensorFlow backend.


## Extracting music and features

### Dataset

We use the [GTZAN genre collection](http://marsyasweb.appspot.com/download/data_sets/) dataset for classification.

The dataset consists of 10 genres:
 * Blues
 * Classical
 * Country
 * Disco
 * Hiphop
 * Jazz
 * Metal
 * Pop
 * Reggae
 * Rock
 
Each genre contains 100 songs, for a total of 1000 songs.
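
Since `data.shape` later in this notebook reports 960 rows rather than 1000, it can help to count the files in each genre folder before extracting anything. The following is a minimal sketch, not part of the original notebook: `count_tracks` is a hypothetical helper, and the demo runs on a throwaway directory rather than the real `./Music_GTZAN/genres` folder.

```python
import tempfile
from pathlib import Path

GENRES = "blues classical country disco hiphop jazz metal pop reggae rock".split()

def count_tracks(root):
    """Return {genre: number of files} for each expected genre folder under root."""
    root = Path(root)
    counts = {}
    for genre in GENRES:
        folder = root / genre
        if folder.is_dir():
            counts[genre] = sum(1 for p in folder.iterdir() if p.is_file())
        else:
            counts[genre] = 0  # a missing folder counts as zero tracks
    return counts

# Demo on a temporary directory with three empty placeholder files per genre
with tempfile.TemporaryDirectory() as tmp:
    for genre in GENRES:
        (Path(tmp) / genre).mkdir()
        for i in range(3):
            (Path(tmp) / genre / f"{genre}.{i:05d}.au").touch()
    demo_counts = count_tracks(tmp)

print(demo_counts)
```

On the real dataset, `count_tracks('./Music_GTZAN/genres')` would show which genre folders hold fewer than 100 files.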

## Extracting the Spectrogram for every Audio

In [2]:
cmap = plt.get_cmap('inferno')

plt.figure(figsize=(10,10))
genres = 'blues classical country disco hiphop jazz metal pop reggae rock'.split()
for g in genres:
    pathlib.Path(f'img_data/{g}').mkdir(parents=True, exist_ok=True)     
    for filename in os.listdir(f'./Music_GTZAN/genres/{g}'):
        songname = f'./Music_GTZAN/genres/{g}/{filename}'
        y, sr = librosa.load(songname, mono=True, duration=5)
        plt.specgram(y, NFFT=2048, Fs=2, Fc=0, noverlap=128, cmap=cmap, sides='default', mode='default', scale='dB');
        plt.axis('off');
        plt.savefig(f'img_data/{g}/{filename[:-3].replace(".", "")}.png')
        plt.clf()
 

<Figure size 720x720 with 0 Axes>

All the audio files are converted into their respective spectrograms. We can now move on to extracting features.
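
To make the `plt.specgram` call less of a black box, here is a rough NumPy sketch of what a dB spectrogram is: overlapping windowed frames, an FFT per frame, and power on a log scale. It reuses `NFFT=2048` and `noverlap=128` from the cell above; the Hann window, the `1e-10` floor and the synthetic 440 Hz test tone are assumptions for illustration, and matplotlib's own scaling differs in detail.

```python
import numpy as np

def spectrogram_db(signal, n_fft=2048, noverlap=128):
    """Return a (n_fft // 2 + 1, n_frames) array of power values in dB."""
    hop = n_fft - noverlap                      # step between frame starts
    window = np.hanning(n_fft)                  # assumed window function
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return 10 * np.log10(power + 1e-10).T       # small floor avoids log(0)

sr = 22050                                      # librosa's default sample rate
t = np.arange(sr) / sr                          # one second of samples
spec = spectrogram_db(np.sin(2 * np.pi * 440 * t))

peak_bin = spec.mean(axis=1).argmax()           # strongest frequency bin on average
peak_hz = peak_bin * sr / 2048
print(spec.shape, peak_hz)
```

The strongest bin sits within one bin width (about 10.8 Hz at these settings) of the 440 Hz tone, which is the structure the saved images encode as colour.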

## Extracting features from Spectrogram


We will extract the following features from each audio file with librosa:

* Mel-frequency cepstral coefficients (MFCC), 20 in number
* Spectral Centroid
* Spectral Bandwidth
* Zero Crossing Rate
* Chroma Frequencies
* Spectral Roll-off
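
As a rough intuition for two of these features, the sketch below computes a zero crossing rate and a spectral centroid with plain NumPy on two synthetic sine tones. It is a simplification under stated assumptions: librosa computes these per frame and the notebook averages the frames, whereas this version treats the whole signal at once, and the function names and test tones are illustrative only.

```python
import numpy as np

def zero_crossing_rate(signal):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.signbit(signal)
    return np.mean(signs[1:] != signs[:-1])

def spectral_centroid(signal, sr):
    """Magnitude-weighted mean frequency in Hz over the whole signal."""
    magnitudes = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    return np.sum(freqs * magnitudes) / np.sum(magnitudes)

sr = 22050
t = np.arange(sr) / sr                      # one second of samples
low = np.sin(2 * np.pi * 220 * t)           # low-pitched tone
high = np.sin(2 * np.pi * 3520 * t)         # high-pitched tone

zcr_low, zcr_high = zero_crossing_rate(low), zero_crossing_rate(high)
centroid_low, centroid_high = spectral_centroid(low, sr), spectral_centroid(high, sr)
print(zcr_low, zcr_high, centroid_low, centroid_high)
```

The higher tone crosses zero more often and has a higher centroid, which is why both features tend to separate bright, noisy material from darker, smoother material.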

In [3]:
header = 'filename chroma_stft rmse spectral_centroid spectral_bandwidth rolloff zero_crossing_rate'
for i in range(1, 21):
    header += f' mfcc{i}'
header += ' label'
header = header.split()

## Writing data to csv file

We write the extracted features to a CSV file, one row per track.

In [6]:
# Create the CSV and write the header row once
with open('data_MYMUSIC.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(header)
genres = 'blues classical country disco hiphop jazz metal pop reggae rock'.split()
for g in genres:
    for filename in os.listdir(f'./Music_GTZAN/genres/{g}'):
        songname = f'./Music_GTZAN/genres/{g}/{filename}'
        y, sr = librosa.load(songname, mono=True, duration=30)
        chroma_stft = librosa.feature.chroma_stft(y=y, sr=sr)
        spec_cent = librosa.feature.spectral_centroid(y=y, sr=sr)
        spec_bw = librosa.feature.spectral_bandwidth(y=y, sr=sr)
        rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)
        zcr = librosa.feature.zero_crossing_rate(y)
        mfcc = librosa.feature.mfcc(y=y, sr=sr)
        # NOTE: the header lists an 'rmse' column, but no rmse value is computed or written here,
        # so each row has one field fewer than the header and the values shift one column left.
        to_append = f'{filename} {np.mean(chroma_stft)} {np.mean(spec_cent)} {np.mean(spec_bw)} {np.mean(rolloff)} {np.mean(zcr)}'
        for e in mfcc:
            to_append += f' {np.mean(e)}'
        to_append += f' {g}'
        # Append this track's row to the CSV
        with open('data_MYMUSIC.csv', 'a', newline='') as file:
            writer = csv.writer(file)
            writer.writerow(to_append.split())

The data has been extracted into `data_MYMUSIC.csv`. The original project's version is available as [data.csv](https://github.com/parulnith/Music-Genre-Classification-with-Python/blob/master/data.csv).
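
The header built above has 28 names (including `rmse`), while each row written by the extraction cell has 27 values, which is why the columns in the table below look shifted. A small guard that compares lengths before writing would surface this immediately. The sketch below is an illustration, not the notebook's code: `write_row` is a hypothetical helper, the row holds placeholder zeros rather than real features, and it writes to an in-memory buffer.

```python
import csv
import io

# Same header as the notebook: 7 named columns, 20 MFCCs, 1 label
header = ["filename", "chroma_stft", "rmse", "spectral_centroid",
          "spectral_bandwidth", "rolloff", "zero_crossing_rate"]
header += [f"mfcc{i}" for i in range(1, 21)]
header.append("label")

def write_row(writer, header, row):
    """Write row only if it has exactly one value per header column."""
    if len(row) != len(header):
        raise ValueError(f"row has {len(row)} fields, header has {len(header)}")
    writer.writerow(row)

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)

# Shaped like the extraction cell's rows: filename, 5 means (no rmse), 20 MFCCs, genre
row_without_rmse = ["blues.00000.au"] + [0.0] * 5 + [0.0] * 20 + ["blues"]

try:
    write_row(writer, header, row_without_rmse)
    mismatch_detected = False
except ValueError as err:
    mismatch_detected = True
    print(err)
```

Either computing the missing value (recent librosa versions expose it as `librosa.feature.rms`) or dropping `rmse` from the header would make the lengths agree; the later `iloc[:, -2]` indexing would then need to change to match.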

# Analysing the Data in Pandas

In [7]:
data = pd.read_csv('data_MYMUSIC.csv')
data.head()

| | filename | chroma_stft | rmse | spectral_centroid | spectral_bandwidth | rolloff | zero_crossing_rate | mfcc1 | mfcc2 | mfcc3 | ... | mfcc12 | mfcc13 | mfcc14 | mfcc15 | mfcc16 | mfcc17 | mfcc18 | mfcc19 | mfcc20 | label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | blues.00000.au | 0.349943 | 1784.420446 | 2002.650192 | 3806.485316 | 0.083066 | -113.596748 | 121.557297 | -19.158825 | 42.351032 | ... | -3.667369 | 5.75169 | -5.162763 | 0.750948 | -1.691938 | -0.409953 | -2.300209 | 1.219929 | blues | |
| 1 | blues.00001.au | 0.340983 | 1529.835316 | 2038.617579 | 3548.820207 | 0.056044 | -207.556793 | 124.006721 | 8.93056 | 35.874687 | ... | -2.23912 | 4.216963 | -6.012273 | 0.93611 | -0.716537 | 0.293876 | -0.287431 | 0.531574 | blues | |
| 2 | blues.00002.au | 0.363603 | 1552.481958 | 1747.165985 | 3040.514948 | 0.076301 | -90.754387 | 140.459915 | -29.109968 | 31.689013 | ... | -8.905224 | -1.08372 | -9.21836 | 2.455806 | -7.726901 | -1.815723 | -3.433434 | -2.226821 | blues | |
| 3 | blues.00003.au | 0.404779 | 1070.119953 | 1596.333948 | 2185.028454 | 0.033309 | -199.431152 | 150.099213 | 5.647593 | 26.871927 | ... | -2.476421 | -1.07389 | -2.874778 | 0.780977 | -3.316932 | 0.637981 | -0.61969 | -3.408233 | blues | |
| 4 | blues.00004.au | 0.30859 | 1835.494603 | 1748.362448 | 3580.945013 | 0.1015 | -160.266037 | 126.198807 | -35.60545 | 22.153301 | ... | -6.934123 | -7.558618 | -9.173553 | -4.512165 | -5.453538 | -0.924161 | -4.409333 | -11.70378 | blues | |


In [8]:
data.shape

(960, 28)

In [9]:
# Dropping unnecessary columns
data = data.drop(['filename'],axis=1)

## Encoding the Labels

In [24]:
genre_list = data.iloc[:, -2]  # -2, not -1: rows are one field shorter than the header, so the genre sits in the mfcc20 column
encoder = LabelEncoder()
y = encoder.fit_transform(genre_list)

In [25]:
print(y)

[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4
 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 4 4 4 4 4 4 4 4 4 4 4 4 
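
`LabelEncoder` sorts the distinct labels and replaces each label with its position in that sorted list. The same mapping can be sketched with NumPy alone, which also shows how to get back from integers to names. The label array below is made up for illustration.

```python
import numpy as np

labels = np.array(["rock", "blues", "jazz", "blues", "rock", "classical"])

# classes holds the sorted distinct labels; encoded holds each label's index in classes
classes, encoded = np.unique(labels, return_inverse=True)
decoded = classes[encoded]          # map the integers back to genre names

print(classes)   # ['blues' 'classical' 'jazz' 'rock']
print(encoded)   # [3 0 2 0 3 1]
```

With the ten GTZAN genres the encoded values should therefore span 0 to 9; anything larger suggests the encoder was fitted on the wrong column.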

## Scaling the Feature columns

In [12]:
scaler = StandardScaler()
X = scaler.fit_transform(np.array(data.iloc[:, :-2], dtype=float))  # drop the last two columns (genre and the empty label column)

In [20]:
print(X)

[[-0.34809486 -0.57181952 -0.45041864 ... -0.22257026 -0.03193404
   0.59571367]
 [-0.45620808 -0.92220568 -0.38308858 ... -0.04126695  0.5142383
   0.41833093]
 [-0.18327533 -0.89103705 -0.92867878 ... -0.58469081 -0.3394376
  -0.29248134]
 ...
 [ 0.13631436  1.7742102   1.86913117 ... -0.41816317 -0.03417004
  -0.28903263]
 [-0.93754092  0.32687711  1.02314642 ...  1.19576383  0.62602022
   0.71284742]
 [-0.47455628  0.30923015  1.28053478 ... -1.40288114 -0.09057725
  -0.82636867]]
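
Standard scaling subtracts each column's mean and divides by its standard deviation. The notebook fits the scaler on the full table before splitting; a common alternative, sketched below with NumPy on synthetic numbers, is to compute the statistics on the training rows only and reuse them for the test rows. The feature values, the 80/20 cut and the variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two synthetic columns on very different scales
features = rng.normal(loc=[1500.0, 0.1], scale=[400.0, 0.03], size=(100, 2))

train, test = features[:80], features[80:]

mean = train.mean(axis=0)
std = train.std(axis=0)             # population std (ddof=0), as StandardScaler uses

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std   # reuse training statistics on unseen rows

print(train_scaled.mean(axis=0), train_scaled.std(axis=0))
```

After scaling, both training columns have mean close to 0 and standard deviation close to 1, so no single feature dominates the first dense layer just because of its units.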


## Dividing data into training and testing sets

In [13]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

In [14]:
len(y_train)

768

In [15]:
len(y_test)

192
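
The 768/192 sizes follow from holding out 20% of 960 rows. Because `train_test_split` is called without `random_state`, the rows that land in each set change on every run. A hand-rolled sketch with a fixed seed shows both points; `shuffled_split` is a hypothetical helper and the rounding rule is a simplification of what scikit-learn does.

```python
import numpy as np

def shuffled_split(n_samples, test_size=0.2, seed=42):
    """Return (train_indices, test_indices) from a seeded shuffle."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(test_size * n_samples))
    return order[n_test:], order[:n_test]

train_idx, test_idx = shuffled_split(960)
again_train, again_test = shuffled_split(960)   # same seed, same split

print(len(train_idx), len(test_idx))
```

With scikit-learn itself, passing `random_state=<some integer>` to `train_test_split` gives the same kind of repeatability.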

In [16]:
X_train[10]

array([ 0.44504307,  0.14984605,  0.06484586,  0.23676155,  0.3398798 ,
        0.86754038, -0.07830451, -0.43512188,  0.54167302, -0.50801288,
        0.80887264, -1.89731808,  1.5514967 , -0.46989026,  0.37006088,
       -1.39805024,  0.84963478, -0.75350357,  0.40249497, -1.17129768,
        0.51812087, -0.94829384,  0.3104734 , -1.03887614,  1.41969388])

In [21]:
X_train.shape[1]

25

In [23]:
y_test

array([706, 300, 631, 945,  71, 456,  74, 776, 559, 147, 873, 348, 818,
       890, 451, 656, 525, 591, 234, 318, 181, 442, 360, 240, 505, 281,
       206, 219, 697, 913,  69,  78, 132, 804, 297, 378, 294,  51, 951,
       313, 108,  19, 392, 624, 924, 413, 806, 669, 831, 638, 619, 382,
         3, 121, 119, 921, 874, 406,  40, 249, 695, 714, 396, 205, 833,
       342,  83, 716, 795, 380, 900, 418,  23, 194,  36, 271, 535, 644,
       727, 862,   0,  46, 845, 453, 828, 152, 954, 737, 233, 687,  26,
       467, 409, 118, 158, 293,  39, 917, 428, 298, 142, 640, 896, 742,
       710, 302, 295, 675, 812, 516, 791, 698, 586, 353, 252, 165, 512,
       127, 410, 654, 892, 627,  89, 168, 718, 247, 116, 701, 272, 894,
       612, 137, 668, 617, 635, 436, 957, 620, 326,  65, 198, 331, 228,
       800, 568, 630, 330, 914, 754,  67, 807, 633, 783, 202, 869, 550,
       577, 242, 478, 239, 802, 243, 390, 260, 328, 712, 393, 950, 779,
       253, 946, 493, 927, 958, 888,  47, 411, 940, 230, 518, 55

# Classification with Keras

## Building our Network

In [17]:
from keras import models
from keras import layers

model = models.Sequential()
model.add(layers.Dense(256, activation='relu', input_shape=(X_train.shape[1],)))

model.add(layers.Dense(128, activation='relu'))

model.add(layers.Dense(64, activation='relu'))

model.add(layers.Dense(10, activation='softmax'))

In [18]:
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

In [19]:
history = model.fit(X_train,
                    y_train,
                    epochs=20,
                    batch_size=128)
                   

Epoch 1/20


InvalidArgumentError:  Received a label value of 959 which is outside the valid range of [0, 10).  Label values: 899 825 549 887 431 758 56 772 696 600 719 886 479 598 761 846 212 190 261 648 365 173 856 176 22 762 21 417 385 72 299 596 841 241 759 880 590 681 73 186 296 369 450 461 210 449 642 138 15 948 653 438 543 207 257 28 161 66 401 7 768 920 746 359 18 573 844 556 208 143 734 774 551 751 172 429 229 430 906 92 220 934 236 829 407 292 372 153 871 513 662 376 839 492 255 809 304 959 858 8 863 685 815 678 554 925 43 773 816 567 876 792 38 861 117 311 730 129 944 676 97 458 546 344 358 911 826 262
	 [[node loss/dense_4_loss/sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits (defined at C:\Users\ana_2\Anaconda3\lib\site-packages\tensorflow_core\python\framework\ops.py:1751) ]] [Op:__inference_keras_scratch_graph_1136]

Function call stack:
keras_scratch_graph
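
The traceback above reports a label value of 959 for a 10-unit softmax output: `sparse_categorical_crossentropy` expects integer class indices in `[0, 10)`, so the `y_train` used in that run did not hold encoded genres. A cheap check before calling `fit` makes this kind of stale or mis-encoded target fail early with a readable message. This is a sketch only; `check_labels` is a hypothetical helper, and the bad example reuses a few label values quoted in the error.

```python
import numpy as np

def check_labels(y, n_classes):
    """Raise unless y is an integer array with every value in [0, n_classes)."""
    y = np.asarray(y)
    if not np.issubdtype(y.dtype, np.integer):
        raise TypeError(f"labels must be integers, got dtype {y.dtype}")
    out_of_range = (y < 0) | (y >= n_classes)
    if out_of_range.any():
        raise ValueError(
            f"{int(out_of_range.sum())} label(s) outside [0, {n_classes}); "
            f"largest label is {int(y.max())}"
        )
    return y

ok = check_labels([0, 3, 9, 1], n_classes=10)   # valid class indices pass through

try:
    check_labels([899, 825, 549, 959], n_classes=10)  # values quoted in the traceback
    message = ""
except ValueError as err:
    message = str(err)
print(message)
```

If the check fails, re-running the label encoding cell and then the train/test split, in that order, is the likely fix.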


In [20]:
test_loss, test_acc = model.evaluate(X_test,y_test)



In [21]:
print('test_acc: ',test_acc)

test_acc:  0.68


Test accuracy is lower than training accuracy, which hints at overfitting.

## Validating our approach
Let's set apart 200 samples in our training data to use as a validation set:

In [0]:
x_val = X_train[:200]
partial_x_train = X_train[200:]

y_val = y_train[:200]
partial_y_train = y_train[200:]

Now let's train our network for 30 epochs:

In [37]:
model = models.Sequential()
model.add(layers.Dense(512, activation='relu', input_shape=(X_train.shape[1],)))
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(partial_x_train,
          partial_y_train,
          epochs=30,
          batch_size=512,
          validation_data=(x_val, y_val))
results = model.evaluate(X_test, y_test)

Train on 600 samples, validate on 200 samples
Epoch 1/30
...
Epoch 30/30


In [38]:
results

[1.2261371064186095, 0.65]

## Predictions on Test Data

In [0]:
predictions = model.predict(X_test)

In [26]:
predictions[0].shape

(10,)

In [27]:
np.sum(predictions[0])

1.0

In [28]:
np.argmax(predictions[0])

8
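
Each row of `predictions` is a probability vector over the ten classes, and `np.argmax` picks the most likely class index. To turn indices back into genre names and score a batch, something like the sketch below can be used. It assumes the encoder was fitted on all ten genre names, so that index order matches the alphabetical genre list; the probability rows and true labels are made up for illustration.

```python
import numpy as np

# Alphabetical order, which is the order a fitted LabelEncoder would assign
genres = "blues classical country disco hiphop jazz metal pop reggae rock".split()

# Two made-up softmax outputs, each summing to 1
probs = np.array([
    [0.02, 0.01, 0.05, 0.02, 0.03, 0.02, 0.01, 0.04, 0.75, 0.05],
    [0.10, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.50],
])
y_true = np.array([8, 0])                      # hypothetical true class indices

pred_idx = np.argmax(probs, axis=1)            # most likely class per row
pred_names = [genres[i] for i in pred_idx]
accuracy = np.mean(pred_idx == y_true)         # fraction of correct predictions

print(pred_names, accuracy)
```

Under that assumption, the index 8 returned above for the first test sample would correspond to reggae.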