# Music Genre Classification

### Classifying the genre of a music track using deep neural networks.

Music genre classification is one of the branches of Music Information Retrieval (MIR). A robust recommendation system begins with the categorization of music genres. Sound processing is a large research area through which music therapy solutions to various medical or mental health issues can be explored. Music applications such as Spotify, Google Play, and Apple Music rely on it, and one of the most important implementation steps is classifying the genre of a track. This requires audio processing, a complex task that involves time-domain signal processing, time series, spectrograms, spectral coefficients, and audio feature extraction to feed a neural network.

## Import libraries and Dataset:

In [None]:
import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt
import seaborn as sns
import scipy

In [None]:
import os
import pickle
import librosa
import librosa.display
import IPython.display as ipd
from IPython.display import Audio
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential

In [None]:
#For warnings-

import warnings
warnings.filterwarnings('ignore')

In [None]:
#Reading the csv file-

df = pd.read_csv("C:\\Users\\DELL\\Downloads\\archive\\Data\\features_3_sec.csv")
df2 = pd.read_csv("C:\\Users\\DELL\\Downloads\\archive\\Data\\features_30_sec.csv")

In [None]:
df

In [None]:
df2

#### Terminology-

1. filename: The name or identifier of the audio file.
2. length: The duration or length of the audio file in seconds.
3. chroma_stft_mean: The mean value of the chroma feature computed using the short-time Fourier transform (STFT). It represents the distribution of pitches or musical notes in the audio.
4. chroma_stft_var: The variance of the chroma feature.
5. rms_mean: The mean of the frame-wise root mean square (RMS) value of the audio signal, which represents the overall amplitude or loudness.
6. rms_var: The variance of the RMS value.
7. spectral_centroid_mean: The mean frequency value weighted by the magnitude spectrum. It represents the center of mass of the spectrum and provides information about the brightness of the sound.
8. spectral_centroid_var: The variance of the spectral centroid.
9. spectral_bandwidth_mean: The mean bandwidth of the magnitude spectrum. It represents the range of frequencies in the signal.
10. spectral_bandwidth_var: The variance of the spectral bandwidth.
11. rolloff_mean: The mean frequency value below which a specified percentage of the total spectral energy lies.
12. rolloff_var: The variance of the spectral rolloff.
13. zero_crossing_rate_mean: The mean rate of sign changes of the audio signal. It reflects the frequency content or noisiness of the signal.
14. zero_crossing_rate_var: The variance of the zero-crossing rate.
15. harmony_mean: The mean value of the harmonic component of the audio signal.
16. harmony_var: The variance of the harmonic component.
17. perceptr_mean: The mean value of the percussive component of the audio signal (the counterpart of the harmonic component above).
18. perceptr_var: The variance of the percussive component.
19. tempo: The estimated tempo or beats per minute (BPM) of the audio.
20. mfcc1_mean to mfcc20_var: Mel-frequency cepstral coefficients (MFCCs) are commonly used features in audio signal processing. These terms represent the mean and variance values of the 20 MFCCs.
21. label: The class label or category assigned to the audio file.
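
The `_mean` and `_var` columns above summarize a feature that is computed frame by frame over each clip. The sketch below illustrates this for the RMS feature using only NumPy. The frame length of 2048, the hop of 512, and the helper names `frame_signal` and `rms_summary` are assumptions made for illustration; they are not confirmed settings of the dataset.

```python
import numpy as np

def frame_signal(y, frame_length=2048, hop_length=512):
    """Split a 1-D signal into overlapping frames (no padding)."""
    n_frames = 1 + (len(y) - frame_length) // hop_length
    idx = np.arange(frame_length)[None, :] + hop_length * np.arange(n_frames)[:, None]
    return y[idx]

def rms_summary(y, frame_length=2048, hop_length=512):
    """Return the mean and variance of the frame-wise RMS values."""
    frames = frame_signal(y, frame_length, hop_length)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    return rms.mean(), rms.var()

# One second of a 440 Hz sine with amplitude 0.5 (synthetic test signal)
sr = 22050
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)

rms_mean, rms_var = rms_summary(tone)
print(rms_mean, rms_var)
```

For a steady sine wave the frame-wise RMS stays close to the amplitude divided by the square root of 2, so the variance is close to zero. A track whose loudness changes a lot would show a larger `rms_var`.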

## EDA:

In [None]:
#Checking top 5 records-

df.head()

In [None]:
#Checking last 5 records-

df.tail()

In [None]:
df.shape

In [None]:
list(df.columns)

In [None]:
#Getting familiar with the structure of the dataset-

df.info()

In [None]:
df.dtypes

In [None]:
#Summary statistics of the dataset; .T transposes the table so that each feature is a row-

df.describe().T

In [None]:
df.label.describe()

In [None]:
#Skewness-

df.skew(numeric_only=True)  # filename and label are not numeric

#### Skewness measures the asymmetry of a distribution. On a bell curve, data is skewed when the points are not distributed symmetrically to the left and right of the median: a positive value indicates a longer right tail and a negative value a longer left tail.
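
As a small worked example, the sketch below computes the population skewness (third central moment divided by the variance raised to the power 1.5) for a symmetric and a right-tailed sample. The helper name `skewness` and the toy values are assumptions for illustration. pandas uses a bias-corrected estimator, so its numbers differ slightly on small samples.

```python
import numpy as np

def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    x = np.asarray(values, dtype=float)
    centered = x - x.mean()
    m2 = np.mean(centered ** 2)
    m3 = np.mean(centered ** 3)
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]        # mirror image around the mean
right_tailed = [1, 1, 1, 2, 10]    # one large value stretches the right tail

print(skewness(symmetric))      # zero for symmetric data
print(skewness(right_tailed))   # positive for a long right tail
```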

In [None]:
df.kurtosis(numeric_only=True)  # filename and label are not numeric

#### Kurtosis is a measure of the tailedness of a distribution, that is, how often outliers occur. Excess kurtosis is the tailedness of a distribution relative to a normal distribution. Distributions with medium kurtosis (medium tails) are mesokurtic, distributions with low kurtosis (thin tails) are platykurtic, and distributions with high kurtosis (fat tails) are leptokurtic.
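
To make the three categories concrete, the sketch below computes the excess kurtosis (fourth central moment over the squared variance, minus 3) for a thin-tailed and a heavy-tailed sample. The helper name `excess_kurtosis`, the seed, and the sample sizes are assumptions for illustration.

```python
import numpy as np

def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    x = np.asarray(values, dtype=float)
    centered = x - x.mean()
    m2 = np.mean(centered ** 2)
    m4 = np.mean(centered ** 4)
    return m4 / m2 ** 2 - 3.0

rng = np.random.default_rng(0)
uniform = np.linspace(0.0, 1.0, 10001)   # evenly spread values: thin tails (platykurtic)
heavy = rng.laplace(size=100_000)        # Laplace samples: fat tails (leptokurtic)

print(excess_kurtosis(uniform))   # negative, close to -1.2
print(excess_kurtosis(heavy))     # clearly positive
```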

In [None]:
#Checking Duplicates- 

df.duplicated().sum()

In [None]:
#Checking whether the dataset has null values-

df.isnull().head(10)

In [None]:
df.isnull().sum()

In [None]:
#Missing value heatmap-

sns.heatmap(df.isnull(),cbar=False,cmap='Pastel1')

In [None]:
df.label.value_counts()

In [None]:
#Bar Graph-

plt.figure(figsize=(8, 6))
df['label'] = df['label'].astype('category')
plt.title('Labels')
sns.countplot(data=df, x='label')
plt.show()

In [None]:
#Scatter and density plots-

def plotScatterMatrix(df, plotSize, textSize):
    df = df.select_dtypes(include =[np.number]) # keep only numerical columns
    # Remove rows and columns that would lead to df being singular
    df = df.dropna(axis='columns')
    df = df[[col for col in df if df[col].nunique() > 1]] # keep columns where there are more than 1 unique values
    columnNames = list(df)
    if len(columnNames) > 10: # reduce the number of columns for matrix inversion of kernel density plots
        columnNames = columnNames[:10]
    df = df[columnNames]
    ax = pd.plotting.scatter_matrix(df, alpha=0.75, figsize=[plotSize, plotSize], diagonal='kde')
    corrs = df.corr().values
    for i, j in zip(*np.triu_indices_from(ax, k = 1)):
        ax[i, j].annotate('Corr. coef = %.3f' % corrs[i, j], (0.8, 0.2), xycoords='axes fraction', 
                          ha='center', va='center', size=textSize)
    plt.suptitle('Scatter and Density Plot')
    plt.show()
    
plotScatterMatrix(df, 20, 10)    

In [None]:
#Correlation-

df.corr(numeric_only=True)  # correlation is only defined for the numeric columns

In [None]:
corrmat = df.corr(numeric_only=True)
f, ax = plt.subplots(figsize = (12, 9))
sns.heatmap(corrmat, vmax = 1, square = True)

In [None]:
#Correlation Detection-

correlation_mat = df.corr(numeric_only=True).abs()

mask = np.triu(np.ones_like(correlation_mat, dtype = np.bool_))
f, ax = plt.subplots(figsize = (40, 40))
cmap = sns.diverging_palette(255, 0, as_cmap = True)
sns.heatmap(correlation_mat, mask = mask, cmap = cmap,\
vmax = None,center = 0, square = True, annot = True, \
linewidths = .5, cbar_kws = {"shrink": 0.9})

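#Print out almost perfectly correlated features-

upper_triangle = correlation_mat.where(np.triu(np.ones\
(correlation_mat.shape),k = 1).astype(np.bool_))
#The 0.95 cutoff is an assumed threshold, not a value taken from the original analysis
highly_correlated = [column for column in upper_triangle.columns if any(upper_triangle[column] > 0.95)]
print(highly_correlated)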

## Audio files:

In [None]:
#Loading a sample audio from the dataset-

audio = "C:\\Users\\DELL\\Downloads\\archive\\Data\\genres_original\\rock\\rock.00091.wav"

In [None]:
df ,sr = librosa.load(audio)
print(type(df),type(sr))

In [None]:
#Passing sr=45600 resamples the audio to 45600 Hz and returns the signal value array with that rate-

librosa.load(audio,sr=45600)

In [None]:
#Taking Short-time Fourier transform of the signal-

y = librosa.stft(df)  
S_db = librosa.amplitude_to_db(np.abs(y), ref=np.max)

In [None]:
#Playing audio file-

import IPython
IPython.display.Audio(df,rate=sr)

In [None]:
#Wave form of the audio-

plt.figure(figsize=(10,4))
librosa.display.waveshow(df, color="#2B4F72", alpha = 0.5)
plt.show()

In [None]:
#Spectrogram of the audio-

stft=librosa.stft(df)
stft_db=librosa.amplitude_to_db(abs(stft))
plt.figure(figsize=(7,6))
librosa.display.specshow(stft_db,sr=sr,x_axis='time',y_axis='hz')
plt.colorbar()

#### A spectrogram is a visual representation of the loudness of a signal over time at the different frequencies present in a waveform. It shows how energy levels rise or fall over a period of time. Spectrograms are also known as sonographs, voiceprints, and voicegrams.
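
The sketch below shows, in plain NumPy, roughly what happens between the raw signal and the dB-scaled spectrogram: the signal is cut into windowed frames, each frame is Fourier-transformed, and the magnitudes are converted to decibels relative to the peak. It is a simplified stand-in for `librosa.stft` followed by `librosa.amplitude_to_db`, without the frame centering and padding that librosa applies by default. The function name `magnitude_spectrogram_db` and the synthetic tone are assumptions for illustration.

```python
import numpy as np

def magnitude_spectrogram_db(y, n_fft=2048, hop_length=512):
    """Hann-windowed STFT magnitude in dB relative to the peak (no padding)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop_length
    frames = np.stack(
        [y[i * hop_length:i * hop_length + n_fft] * window for i in range(n_frames)],
        axis=1,
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=0))   # shape: (n_fft // 2 + 1, n_frames)
    magnitude = np.maximum(magnitude, 1e-10)          # avoid log(0)
    return 20.0 * np.log10(magnitude / magnitude.max())

# One second of a 440 Hz tone (synthetic test signal)
sr = 22050
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)

spec_db = magnitude_spectrogram_db(tone)
freqs = np.fft.rfftfreq(2048, d=1.0 / sr)
peak_hz = freqs[spec_db.mean(axis=1).argmax()]
print(spec_db.shape, peak_hz)
```

Each column of `spec_db` is one time frame and each row one frequency bin, so the brightest row for this tone sits at the bin nearest to 440 Hz.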

### 2.1 Data Preprocessing:

#### Extracting Audio features-
The process of extracting features from data for analysis is known as feature extraction. Each audio signal contains many audio features; however, we must extract the ones that are relevant to the problem we are solving. The features used in our project are listed below.

#### Spectral roll off-
It computes the rolloff frequency for each frame in a given signal, that is, the frequency below which a given percentage (the cutoff) of the total energy of the spectrum lies. It can be used to differentiate between harmonic and noisy sounds.

In [None]:
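spectral_rolloff=librosa.feature.spectral_rolloff(y=df,sr=sr)[0]
plt.figure(figsize=(7,6))
librosa.display.waveshow(df,sr=sr,alpha=0.4,color="#FF5858")
#Overlaying the rolloff curve, scaled to [0, 1] so that it fits on the waveform axis-
frame_times=librosa.times_like(spectral_rolloff,sr=sr)
plt.plot(frame_times,spectral_rolloff/spectral_rolloff.max(),color="#2B4F72")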
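
The rolloff definition can be sketched for a single frame as follows. The function name `rolloff_frequency`, the toy spectra, and the cutoff of 0.85 (assumed here to mirror librosa's default `roll_percent`) are illustrative choices, and the cumulative sum is taken over magnitudes as a simplification.

```python
import numpy as np

def rolloff_frequency(magnitudes, freqs, roll_percent=0.85):
    """Lowest frequency at which the cumulative magnitude reaches roll_percent of the total."""
    cumulative = np.cumsum(magnitudes)
    threshold = roll_percent * cumulative[-1]
    index = np.searchsorted(cumulative, threshold)   # first bin with cumulative >= threshold
    return freqs[index]

freqs = np.array([0.0, 100.0, 200.0, 300.0, 400.0])
flat = np.ones(5)                                   # energy spread evenly (noise-like)
low_heavy = np.array([10.0, 1.0, 0.0, 0.0, 0.0])    # energy concentrated at low frequencies

print(rolloff_frequency(flat, freqs))        # 400.0
print(rolloff_frequency(low_heavy, freqs))   # 0.0
```

A spectrum with evenly spread energy reaches the cutoff only near the top of the frequency range, while a spectrum dominated by low frequencies reaches it almost immediately.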

#### Chroma feature-
Chroma features relate closely to the twelve pitch classes and are also called pitch class profiles. They capture the harmonic and melodic characteristics of music, which makes them a powerful tool for analyzing and categorizing it.

In [None]:
import librosa.display as lplt

In [None]:
chroma = librosa.feature.chroma_stft(y=df,sr=sr)
plt.figure(figsize=(7,4))
lplt.specshow(chroma,sr=sr,x_axis="time",y_axis="chroma",cmap="BuPu")
plt.colorbar()
plt.title("Chroma Features")
plt.show()
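
To illustrate what a pitch class is, the sketch below maps a frequency to one of the twelve classes by converting it to the nearest MIDI note number and taking it modulo 12. The helper name `pitch_class` and the A4 = 440 Hz tuning reference are assumptions for illustration; chroma features aggregate spectral energy into these same twelve bins.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pitch_class(frequency_hz, a4_hz=440.0):
    """Index (0 = C ... 11 = B) of the pitch class nearest to the given frequency."""
    midi_note = round(69 + 12 * math.log2(frequency_hz / a4_hz))
    return midi_note % 12

for f in (261.63, 440.0, 880.0):
    print(f, PITCH_CLASSES[pitch_class(f)])
```

Notes an octave apart, such as 440 Hz and 880 Hz, fall into the same pitch class, which is why chroma features are insensitive to the octave in which a note is played.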

#### Zero Crossing Rate-
It is the rate at which a signal changes sign, from positive to negative or from negative to positive. Put simply, the number of times the signal crosses the x-axis per unit of time is the zero-crossing rate (ZCR).

In [None]:
start=1000
end=1200
plt.figure(figsize=(12,4))
plt.plot(df[start:end],color="#2B4F72")

In [None]:
#Printing the number of times signal crosses the x-axis-

zero_cross_rate=librosa.zero_crossings(df[start:end],pad=False)
print("The number of zero_crossings are :", sum(zero_cross_rate))
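
Counting zero crossings can also be sketched directly in NumPy by comparing the sign of each sample with the sign of the next one. The helper name `count_zero_crossings` and the synthetic 5 Hz wave are assumptions for illustration; this is a simplified counterpart of `librosa.zero_crossings`, not its exact implementation.

```python
import numpy as np

def count_zero_crossings(y):
    """Number of sign changes between consecutive samples."""
    signs = np.signbit(np.asarray(y, dtype=float))
    return int(np.sum(signs[1:] != signs[:-1]))

# One second of a 5 Hz sine with a small phase offset (synthetic test signal)
sr = 1000
t = np.arange(sr) / sr
wave = np.sin(2 * np.pi * 5.0 * t + 0.3)

crossings = count_zero_crossings(wave)
zcr_per_second = crossings / (len(wave) / sr)
print(crossings, zcr_per_second)
```

A sine wave crosses zero twice per cycle, so a 5 Hz tone yields about 10 crossings per second. Noisy or percussive signals change sign far more often.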

### 2.2 EDA OF Audio-

Visualizing the audio files as wave plots and spectrograms for all 10 genre classes.

#### 1. BLUES

In [None]:
audio1= 'C:\\Users\\DELL\\Downloads\\archive\\Data\\genres_original\\blues\\blues.00001.wav'
df, sr = librosa.load(audio1)
plt.figure(figsize=(7, 3))
librosa.display.waveshow(df, sr=sr,alpha=0.4,)
plt.title('Waveplot - BLUES')

In [None]:
#Creating log mel spectrogram-

plt.figure(figsize=(7, 4))
spectrogram = librosa.feature.melspectrogram(y=df, sr=sr, n_mels=128,fmax=8000) 
spectrogram = librosa.power_to_db(spectrogram)
librosa.display.specshow(spectrogram, y_axis='mel', fmax=8000, x_axis='time');
plt.title('Mel Spectrogram - BLUES')
plt.colorbar(format='%+2.0f dB');
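
The two ideas behind a log mel spectrogram, warping frequency onto the mel scale and converting power to decibels, can be sketched as below. The formula used is the common HTK-style mel mapping; librosa defaults to a slightly different (Slaney) variant, so this is an illustration rather than a reproduction of `librosa.feature.melspectrogram`. The helper names are assumptions.

```python
import numpy as np

def hz_to_mel(f):
    """HTK-style mel scale: roughly linear below 1 kHz, logarithmic above."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def power_to_db(power, floor=1e-10):
    """Convert power values to decibels, flooring tiny values to avoid log(0)."""
    return 10.0 * np.log10(np.maximum(power, floor))

# Five bands that are equally wide on the mel scale between 0 Hz and 8000 Hz
band_edges_hz = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(8000.0), 6))
print(np.round(band_edges_hz, 1))
print(power_to_db(np.array([1.0, 100.0])))
```

Bands of equal width in mel become wider in Hz as frequency increases, which mirrors the coarser pitch resolution of human hearing at high frequencies.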

In [None]:
#playing audio-

ipd.Audio(audio1) 

#### 2. CLASSICAL 

In [None]:
audio2= 'C:\\Users\\DELL\\Downloads\\archive\\Data\\genres_original\\classical\\classical.00001.wav'
data, sr = librosa.load(audio2)
plt.figure(figsize=(7, 3))
librosa.display.waveshow(data, sr=sr,alpha=0.4)
plt.title('Waveplot - CLASSICAL') 

In [None]:
#Creating log mel spectrogram-

plt.figure(figsize=(7, 4))
spectrogram = librosa.feature.melspectrogram(y=data, sr=sr, n_mels=128,fmax=8000)
spectrogram = librosa.power_to_db(spectrogram)
librosa.display.specshow(spectrogram, y_axis='mel', fmax=8000, x_axis='time');
plt.title('Mel Spectrogram - CLASSICAL')
plt.colorbar(format='%+2.0f dB')

In [None]:
#Playing audio-

ipd.Audio(audio2) 

#### 3. COUNTRY-

In [None]:
audio3 = 'C:\\Users\\DELL\\Downloads\\archive\\Data\\genres_original\\country\\country.00001.wav'
data, sr = librosa.load(audio3)
plt.figure(figsize=(7, 3))
librosa.display.waveshow(data, sr=sr,alpha=0.4)
plt.title('Waveplot - COUNTRY') 

In [None]:
#Creating log mel spectrogram-

plt.figure(figsize=(7, 4))
spectrogram = librosa.feature.melspectrogram(y=data, sr=sr, n_mels=128,fmax=8000)
spectrogram = librosa.power_to_db(spectrogram)
librosa.display.specshow(spectrogram, y_axis='mel', fmax=8000, x_axis='time');
plt.title('Mel Spectrogram - COUNTRY')
plt.colorbar(format='%+2.0f dB')

In [None]:
#Playing audio-

ipd.Audio(audio3) 

#### 4. DISCO-

In [None]:
audio4= 'C:\\Users\\DELL\\Downloads\\archive\\Data\\genres_original\\disco\\disco.00036.wav'
data, sr = librosa.load(audio4)
plt.figure(figsize=(7, 3))
librosa.display.waveshow(data, sr=sr,alpha=0.4)
plt.title('Waveplot - DISCO') 

In [None]:
#Creating log mel spectrogram-

plt.figure(figsize=(7, 4))
spectrogram = librosa.feature.melspectrogram(y=data, sr=sr, n_mels=128,fmax=8000)
spectrogram = librosa.power_to_db(spectrogram)
librosa.display.specshow(spectrogram, y_axis='mel', fmax=8000, x_axis='time');
plt.title('Mel Spectrogram - DISCO')
plt.colorbar(format='%+2.0f dB')

In [None]:
#Playing audio-

ipd.Audio(audio4) 

#### 5. HIPHOP-

In [None]:
audio5 = 'C:\\Users\\DELL\\Downloads\\archive\\Data\\genres_original\\hiphop\\hiphop.00001.wav'
data, sr = librosa.load(audio5)
plt.figure(figsize=(7, 3))
librosa.display.waveshow(data, sr=sr,alpha=0.4)
plt.title('Waveplot - HIPHOP')

In [None]:
#Creating log mel spectrogram-

plt.figure(figsize=(7, 4))
spectrogram = librosa.feature.melspectrogram(y=data, sr=sr, n_mels=128,fmax=8000)
spectrogram = librosa.power_to_db(spectrogram)
librosa.display.specshow(spectrogram, y_axis='mel', fmax=8000, x_axis='time');
plt.title('Mel Spectrogram - HIPHOP')
plt.colorbar(format='%+2.0f dB')

In [None]:
#Playing audio-

ipd.Audio(audio5) 

#### 6. JAZZ-

In [None]:
audio6= 'C:\\Users\\DELL\\Downloads\\archive\\Data\\genres_original\\jazz\\jazz.00001.wav'
data, sr = librosa.load(audio6)
plt.figure(figsize=(7, 3))
librosa.display.waveshow(data, sr=sr,alpha=0.4)
plt.title('Waveplot - JAZZ')

In [None]:
#Creating log mel spectrogram-

plt.figure(figsize=(7, 4))
spectrogram = librosa.feature.melspectrogram(y=data, sr=sr, n_mels=128,fmax=8000)
spectrogram = librosa.power_to_db(spectrogram)
librosa.display.specshow(spectrogram, y_axis='mel', fmax=8000, x_axis='time');
plt.title('Mel Spectrogram - JAZZ')
plt.colorbar(format='%+2.0f dB')

In [None]:
#Playing audio-

ipd.Audio(audio6) 

#### 7. METAL-

In [None]:
audio7= 'C:\\Users\\DELL\\Downloads\\archive\\Data\\genres_original\\metal\\metal.00024.wav'
data, sr = librosa.load(audio7)
plt.figure(figsize=(7, 3))
librosa.display.waveshow(data, sr=sr,alpha=0.4)
plt.title('Waveplot - METAL')

In [None]:
#Creating log mel spectrogram-

plt.figure(figsize=(7, 4))
spectrogram = librosa.feature.melspectrogram(y=data, sr=sr, n_mels=128,fmax=8000)
spectrogram = librosa.power_to_db(spectrogram)
librosa.display.specshow(spectrogram, y_axis='mel', fmax=8000, x_axis='time');
plt.title('Mel Spectrogram - METAL')
plt.colorbar(format='%+2.0f dB')

In [None]:
#Playing audio-

ipd.Audio(audio7) 

#### 8. POP-

In [None]:
audio8= 'C:\\Users\\DELL\\Downloads\\archive\\Data\\genres_original\\pop\\pop.00028.wav'
data, sr = librosa.load(audio8)
plt.figure(figsize=(7, 3))
librosa.display.waveshow(data, sr=sr,alpha=0.4)
plt.title('Waveplot - POP')

In [None]:
#Creating log mel spectrogram-

plt.figure(figsize=(7, 4))
spectrogram = librosa.feature.melspectrogram(y=data, sr=sr, n_mels=128,fmax=8000)
spectrogram = librosa.power_to_db(spectrogram)
librosa.display.specshow(spectrogram, y_axis='mel', fmax=8000, x_axis='time');
plt.title('Mel Spectrogram - POP')
plt.colorbar(format='%+2.0f dB')

In [None]:
#Playing audio-

ipd.Audio(audio8) 

#### 9. REGGAE-

In [None]:
audio9 = 'C:\\Users\\DELL\\Downloads\\archive\\Data\\genres_original\\reggae\\reggae.00030.wav'
data, sr = librosa.load(audio9)
plt.figure(figsize=(7, 3))
librosa.display.waveshow(data, sr=sr,alpha=0.4)
plt.title('Waveplot - REGGAE')

In [None]:
#Creating log mel spectrogram-

plt.figure(figsize=(7, 4))
spectrogram = librosa.feature.melspectrogram(y=data, sr=sr, n_mels=128,fmax=8000)
spectrogram = librosa.power_to_db(spectrogram)
librosa.display.specshow(spectrogram, y_axis='mel', fmax=8000, x_axis='time');
plt.title('Mel Spectrogram - REGGAE')
plt.colorbar(format='%+2.0f dB')

In [None]:
#Playing audio-

ipd.Audio(audio9) 

#### 10. ROCK-

In [None]:
audio10 = 'C:\\Users\\DELL\\Downloads\\archive\\Data\\genres_original\\rock\\rock.00032.wav'
data, sr = librosa.load(audio10)
plt.figure(figsize=(7, 3))
librosa.display.waveshow(data, sr=sr,alpha=0.4)
plt.title('Waveplot - ROCK')

In [None]:
#Creating log mel spectrogram-

plt.figure(figsize=(7, 4))
spectrogram = librosa.feature.melspectrogram(y=data, sr=sr, n_mels=128,fmax=8000)
spectrogram = librosa.power_to_db(spectrogram)
librosa.display.specshow(spectrogram, y_axis='mel', fmax=8000, x_axis='time');
plt.title('Mel Spectrogram - ROCK')
plt.colorbar(format='%+2.0f dB')

In [None]:
#Playing audio-

ipd.Audio(audio10) 

## Label Encoding and Scaling:

#### Label encoding maps the categorical classes to integer values for training.

Blues - 0<br>
Classical - 1<br>
Country - 2<br>
Disco - 3<br>
Hip-hop - 4<br> 
Jazz - 5 <br> 
Metal - 6<br> 
Pop - 7<br>
Reggae - 8<br>
Rock - 9<br>
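
The mapping above follows from sorting the class names alphabetically and numbering them from zero. The sketch below reproduces that with `np.unique` on a toy list of labels; the sample labels and the variable names are assumptions for illustration.

```python
import numpy as np

# Toy label column containing every genre at least once
genres = np.array(["rock", "blues", "jazz", "pop", "metal", "disco",
                   "hiphop", "reggae", "country", "classical", "blues"])

# np.unique returns the sorted class names and, for each label, its index in that list
classes, encoded = np.unique(genres, return_inverse=True)
mapping = {str(name): index for index, name in enumerate(classes)}

print(mapping)
print(encoded)
```

Decoding a prediction back to a genre name is then a lookup in `classes`, for example `classes[9]` for rock.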

In [None]:
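#`df` was reassigned to audio samples in the sections above, so reload the feature table-
df = pd.read_csv("C:\\Users\\DELL\\Downloads\\archive\\Data\\features_3_sec.csv")
class_encod = df.iloc[:,-1]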
converter = LabelEncoder()
y = converter.fit_transform(class_encod)
y

In [None]:
#Drop the column filename as it is no longer required for training- 

df=df.drop(labels="filename",axis=1)

In [None]:
df.head()

In [None]:
#scaling-

from sklearn.preprocessing import StandardScaler
fit=StandardScaler()
X=fit.fit_transform(np.array(df.iloc[:,:-1],dtype=float))
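
Standard scaling subtracts each column's mean and divides by its standard deviation, so that every feature ends up with zero mean and unit variance. A minimal NumPy sketch of that transformation follows; the helper name `standardize` and the toy values are assumptions for illustration.

```python
import numpy as np

def standardize(X):
    """Column-wise (x - mean) / std using the population standard deviation."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)       # ddof=0
    std[std == 0.0] = 1.0     # leave constant columns unscaled instead of dividing by zero
    return (X - mean) / std

# Two features on very different scales (for example a tempo and a zero-crossing rate)
features = np.array([[120.0, 0.10],
                     [90.0, 0.30],
                     [150.0, 0.20]])
scaled = standardize(features)
print(scaled.mean(axis=0), scaled.std(axis=0))
```

Without scaling, distance-based models such as KNN would be dominated by whichever feature has the largest numeric range.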

## Building the Model:

In [None]:
#Splitting 70% data into training set and the remaining 30% to test set-

X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.3)

In [None]:
#Test data size-

len(y_test)

In [None]:
#Size of training data-

len(y_train)

## K-Nearest Neighbors (KNN):

KNN is a fundamental machine learning algorithm that is used across many kinds of problems. It classifies a data point by the labels of the points nearest to it, using the Euclidean distance $d = \sqrt{(x_2-x_1)^2 + (y_2-y_1)^2}$ as the metric.
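
The idea can be sketched in a few lines of NumPy: compute the Euclidean distance from a query point to every training point, take the k closest, and return the majority label. The function name `knn_predict` and the toy data are assumptions for illustration; the cells below use scikit-learn's `KNeighborsClassifier` for the actual model.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=3):
    """Predict a label for each query point by majority vote among its k nearest neighbors."""
    predictions = []
    for point in X_query:
        distances = np.sqrt(np.sum((X_train - point) ** 2, axis=1))  # Euclidean distance
        nearest = np.argsort(distances)[:k]                          # indices of the k closest points
        votes = np.bincount(y_train[nearest])                        # count labels among the neighbors
        predictions.append(int(votes.argmax()))
    return np.array(predictions)

# Two well separated toy clusters with labels 0 and 1
X_toy = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [5.0, 5.0], [5.1, 4.9], [4.8, 5.2]])
y_toy = np.array([0, 0, 0, 1, 1, 1])
queries = np.array([[0.05, 0.05], [5.0, 5.1]])

toy_pred = knn_predict(X_toy, y_toy, queries)
print(toy_pred)
```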

In [None]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix

In [None]:
clf1=KNeighborsClassifier(n_neighbors=3)

In [None]:
clf1.fit(X_train,y_train)
y_pred=clf1.predict(X_test)

In [None]:
print("Training set score: {:.3f}".format(clf1.score(X_train, y_train)))
print("Test set score: {:.3f}".format(clf1.score(X_test, y_test)))

In [None]:
cf_matrix = confusion_matrix(y_test, y_pred)
sns.set(rc = {'figure.figsize':(8,3)})
sns.heatmap(cf_matrix, annot=True)
print(classification_report(y_test,y_pred))
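
The numbers in the classification report can be read off the confusion matrix, whose rows are the true classes and whose columns are the predicted classes. The sketch below builds a small matrix by hand and derives per-class precision and recall from it; the function name `confusion_counts` and the toy labels are assumptions for illustration.

```python
import numpy as np

def confusion_counts(y_true, y_pred, n_classes):
    """Confusion matrix with true classes as rows and predicted classes as columns."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        matrix[t, p] += 1
    return matrix

y_true_toy = np.array([0, 0, 1, 1, 2, 2])
y_pred_toy = np.array([0, 1, 1, 1, 2, 0])

cm_toy = confusion_counts(y_true_toy, y_pred_toy, 3)
precision = np.diag(cm_toy) / cm_toy.sum(axis=0)   # correct predictions / all predictions of that class
recall = np.diag(cm_toy) / cm_toy.sum(axis=1)      # correct predictions / all true members of that class
accuracy = np.trace(cm_toy) / cm_toy.sum()

print(cm_toy)
print(precision, recall, accuracy)
```

Off-diagonal cells show which genres are confused with each other, which is often more informative than a single accuracy figure.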

## Support Vector Machine (SVM):

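SVM is a widely used machine learning model. Since the data is not linearly separable, we use a non-linear kernel function. The code below uses the radial basis function (RBF) kernel, $K(y_n, y_i) = \exp(-\gamma \lVert y_n - y_i \rVert^2)$. A sigmoid kernel, $K(y_n, y_i) = \tanh(\gamma \langle y_n, y_i \rangle + r)$, is an alternative.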
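
As a rough illustration of what a kernel function computes, the sketch below evaluates an RBF kernel and a sigmoid kernel on two small vectors. The function names, the value `gamma=0.5`, and the vectors are assumptions for illustration; scikit-learn chooses its own default `gamma`.

```python
import numpy as np

def rbf_kernel(x, z, gamma=0.5):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))

def sigmoid_kernel(x, z, gamma=0.5, r=0.0):
    """Sigmoid kernel: tanh(gamma * <x, z> + r)."""
    return float(np.tanh(gamma * np.dot(x, z) + r))

a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])

print(rbf_kernel(a, a), rbf_kernel(a, b))          # identical points score 1, distant points less
print(sigmoid_kernel(a, a), sigmoid_kernel(a, b))  # orthogonal vectors score 0 when r = 0
```

Both kernels act as similarity scores, which lets the SVM separate classes with a boundary that is non-linear in the original feature space.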

In [None]:
from sklearn.svm import SVC
svclassifier = SVC(kernel='rbf')  # degree only affects the 'poly' kernel, so it is omitted

In [None]:
svclassifier.fit(X_train, y_train)
print("Training set score: {:.3f}".format(svclassifier.score(X_train, y_train)))
print("Test set score: {:.3f}".format(svclassifier.score(X_test, y_test)))

In [None]:
y_pred = svclassifier.predict(X_test)
cf_matrix3 = confusion_matrix(y_test, y_pred)
sns.set(rc = {'figure.figsize':(9,4)})
sns.heatmap(cf_matrix3, annot=True)
print(classification_report(y_test, y_pred))

## Convolutional Neural Networks (CNN):

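Neural networks are well suited to classifying large datasets. Convolutional networks in particular are widely used for classifying image data. Note that the model built below takes the scaled tabular features as input and consists only of dense and dropout layers, so it is a fully connected network rather than a convolutional one.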

In [None]:
def train_model(model,epochs,optimizer):
    batch_size=256
    model.compile(optimizer=optimizer,loss='sparse_categorical_crossentropy',metrics=['accuracy'])
    return model.fit(X_train,y_train,validation_data=(X_test,y_test),epochs=epochs,batch_size=batch_size)

In [None]:
def Validation_plot(history):
    print("Validation Accuracy",max(history.history["val_accuracy"]))
    pd.DataFrame(history.history).plot(figsize=(12,6))
    plt.show()

Keras is the high-level API of TensorFlow 2: an approachable, highly-productive interface for solving machine learning problems, with a focus on modern deep learning. It provides essential abstractions and building blocks for developing and shipping machine learning solutions with high iteration velocity.

In [None]:
model=tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(X.shape[1],)),
    tf.keras.layers.Dropout(0.2),
    
    tf.keras.layers.Dense(512,activation='relu'),
    keras.layers.Dropout(0.2),
    
    tf.keras.layers.Dense(256,activation='relu'),
    tf.keras.layers.Dropout(0.2),
    
    tf.keras.layers.Dense(128,activation='relu'),
    tf.keras.layers.Dropout(0.2),
    
    tf.keras.layers.Dense(64,activation='relu'),
    tf.keras.layers.Dropout(0.2),
    
    tf.keras.layers.Dense(32,activation='relu'),
    tf.keras.layers.Dropout(0.2),
    
    tf.keras.layers.Dense(10,activation='softmax'),
])

optimizer = tf.keras.optimizers.Adam(learning_rate=0.000146)
model.compile(optimizer=optimizer,
             loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
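#Passing the configured optimizer; the string 'adam' would recompile the model with the default learning rate-
model_history=train_model(model=model,epochs=500,optimizer=optimizer)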
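
The final softmax layer and the `sparse_categorical_crossentropy` loss can be sketched in NumPy to show what the network optimizes: softmax turns the 10 output scores into class probabilities, and the loss is the average negative log-probability assigned to the true class index. The function names and toy inputs are assumptions for illustration, and Keras's own implementation may differ in numerical details.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def sparse_categorical_crossentropy(y_true, probabilities):
    """Mean negative log-probability of the true class, given integer labels."""
    picked = probabilities[np.arange(len(y_true)), y_true]
    return float(np.mean(-np.log(picked)))

# An untrained network that scores all 10 genres equally
logits = np.zeros((2, 10))
probs = softmax(logits)
loss = sparse_categorical_crossentropy(np.array([3, 7]), probs)
print(probs[0], loss)
```

With 10 equally likely classes the loss equals ln(10), about 2.30, which is a useful reference point: a training loss well below that value means the network has learned something beyond guessing.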

In [None]:
#The plot depicts how the model performed on the training and testing data-

Validation_plot(model_history)

In [None]:
#Sample testing: predicting the class index for every test sample-

prediction = model.predict(X_test)
predicted_index = np.argmax(prediction, axis = 1)
print("Expected Index: {}, Predicted Index: {}".format(y_test, predicted_index))

In [None]:
#Computing the confusion matrix for analyzing the true positives and negatives-

from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test,predicted_index)
cm

### Conclusion:
As expected, the neural network outperformed KNN and SVM. It produced the best results on both the testing and training data. As we increased the number of epochs, the loss decreased and the accuracy scores gradually increased. This can be seen in the validation plot above, in which the training and validation curves almost coincide.