# Music Classification
Ananth Madan

This project grew out of an idea I got from reading <a href=https://towardsdatascience.com/using-cnns-and-rnns-for-music-genre-recognition-2435fb2ed6af>this article</a>, and I thought it would be an interesting step up in my projects. Prior to this, I had only done tutorial projects and copied others' work in an attempt to understand the syntax and process of deep learning with Keras and scikit-learn. *This* project was my first experience of writing code on my own (aided by StackOverflow and the Keras API).

In [0]:
from google.colab import drive

In [0]:
drive.mount('/content/gdrive') # Mounting drive to colab to access data

Mounted at /content/gdrive


In [0]:
!pip install tensorflow==2.1.0 # Installing TensorFlow 2.1.0 (as per the TensorFlow API)

from __future__ import absolute_import, division, print_function, unicode_literals

# Let a failed import raise immediately instead of leaving `tf` undefined
import tensorflow.compat.v2 as tf

tf.enable_v2_behavior() # v2 behavior is already the default in TF 2.x; this matters on 1.x runtimes

In [0]:
!pip install scikit-optimize # Installing scikit-optimize

# Imports
Each of the following imports will help in one of four categories:
* Data Preprocessing: Importing and transforming data
* Model Construction: Building the Neural Network
* Model Optimization: Tuning the hyperparameters of the model
* Model Visualization: Displaying the results of the model and history

Each of these categories is vital, and all of them are made significantly easier by the following imports:

In [0]:
# Imports
import tensorflow as tf

import tensorflow.keras as keras
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, Dropout, concatenate
from tensorflow.keras.layers import Conv2D, BatchNormalization, Flatten, MaxPool2D
from tensorflow.keras.layers import GRU, Bidirectional, Embedding, Lambda, LayerNormalization

from tensorflow.keras.optimizers import Adam, RMSprop, Nadam
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping

from tensorflow.keras import backend as K
from tensorflow.keras.utils import to_categorical

from sklearn.metrics import classification_report, accuracy_score, confusion_matrix

from skopt import gbrt_minimize, gp_minimize
from skopt.utils import use_named_args
from skopt.space import Real, Categorical, Integer

import librosa
import librosa.display

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

from collections import deque

# Importing Data
After uploading the data to Google Drive and mounting the drive in Colab, we can import it. In this case, the data has been compressed into NumPy <a href=https://imageio.readthedocs.io/en/stable/format_npz.html>npz</a> files, where each file contains two arrays: the first holds the spectrogram of each song, and the second holds the genres, encoded as integers from 0 to 7 (eight genres in total). Using NumPy, we can easily load and extract the data from the npz files. We can also plot the spectrograms with librosa and matplotlib to refamiliarize ourselves with the data.
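A minimal sketch of the npz layout described above, using small random arrays in place of the real spectrograms (the 4 × 128 × 640 shape and the file name are illustrative assumptions): arrays passed positionally to `np.savez_compressed` are stored under the default keys `arr_0`, `arr_1`, which is why the loading cells below index the files that way.

```python
import os
import tempfile

import numpy as np

# Illustrative stand-ins: 4 "spectrograms" of 128 mel bins x 640 frames
# and one integer genre label (0-7) per spectrogram
rng = np.random.default_rng(0)
mels = rng.standard_normal((4, 128, 640)).astype(np.float32)
genres = np.array([0, 3, 7, 5])

path = os.path.join(tempfile.mkdtemp(), 'example.npz')

# Positional arrays are saved under the default keys 'arr_0', 'arr_1', ...
np.savez_compressed(path, mels, genres)

# np.load returns a lazy NpzFile; the context manager closes the file afterwards
with np.load(path) as npz:
    print(npz.files)      # ['arr_0', 'arr_1']
    X = npz['arr_0']      # spectrograms
    y = npz['arr_1']      # genre labels

print(X.shape, y.shape)   # (4, 128, 640) (4,)
```

Passing keyword arguments instead (e.g. `np.savez_compressed(path, mels=mels, genres=genres)`) would store the arrays under those names, so the loading code would need matching keys.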

In [0]:
root_path = 'gdrive/My Drive/WWHS AI/Data' # Getting the folder path to the data folder

In [0]:
# Using numpy to load the specified file from the root folder
# as 'train_npz'
with np.load(root_path + '/train_shuffled.npz') as train_npz:

  X_train = train_npz['arr_0'] # The first array becomes the training data
  y_train = train_npz['arr_1'] # The second array becomes the training labels

print(X_train.shape)
print(y_train.shape)

In [0]:
# Same process as the training data
with np.load(root_path + '/valid_shuffled.npz') as valid_npz:

  X_valid = valid_npz['arr_0']
  y_valid = valid_npz['arr_1']

print(X_valid.shape)
print(y_valid.shape)

In [0]:
# Same process as the training data
with np.load(root_path + '/test.npz') as test_npz:

  X_test = test_npz['arr_0']
  y_test = test_npz['arr_1']

print(X_test.shape)
print(y_test.shape)

In [0]:
# Dictionary mapping number to actual genre
reverse_map = {
    
    0: 'Electronic',
    1: 'Experimental',
    2: 'Folk',
    3: 'Hip-Hop',
    4: 'Instrumental',
    5: 'International',
    6: 'Pop',
    7: 'Rock'
}

In [0]:
# To check that the data loaded correctly, plot one random mel-spectrogram per genre
def plot_rand_mels(mels, genres, reverse_map=reverse_map):

  # Matplotlib figure customization
  plt.figure(figsize=(20, 8))
  plt.subplots_adjust(wspace=0.2, hspace=0.35)

  ind = np.random.randint(mels.shape[0]) # Start at a random part in the array
  seeking_genre = 0 # The current genre that is being sought

  # Iterating through the genres
  while seeking_genre < 8:

    if genres[ind] == seeking_genre:

      # Plot the mel-Spectrogram in a respective subplot
      plt.subplot(2, 4, seeking_genre + 1)

      librosa.display.specshow(mels[ind], x_axis='time')
      plt.colorbar(format='%+2.0f dB')
      plt.title(reverse_map[genres[ind]]) # Converting the numbers to their respective genres

      seeking_genre += 1 # Change the genre sought after

    ind += 1

    # Loop to the beginning if we reach the end
    if ind == mels.shape[0]:

      ind = 0

  plt.show()
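The search loop above scans forward until it has found genres 0 through 7 in order, so it never terminates if some genre is missing from `genres`. A minimal sketch of an alternative selection step (the helper name `pick_one_per_genre` is hypothetical, not part of the original code) draws one random index per genre directly with NumPy and skips absent genres:

```python
import numpy as np

def pick_one_per_genre(genres, num_genres=8, seed=None):
    """Return {genre: index} with one randomly chosen sample index per genre.

    Genres that do not occur in `genres` are left out instead of
    causing an endless search.
    """
    rng = np.random.default_rng(seed)
    genres = np.asarray(genres)
    picks = {}
    for g in range(num_genres):
        candidates = np.flatnonzero(genres == g)  # every index carrying this label
        if candidates.size:
            picks[g] = int(rng.choice(candidates))
    return picks

labels = np.array([0, 1, 1, 2, 7, 0, 2])
picks = pick_one_per_genre(labels, seed=0)
print(sorted(picks))  # [0, 1, 2, 7]
```

Each chosen index could then feed `plt.subplot(2, 4, g + 1)` and `librosa.display.specshow(mels[idx], x_axis='time')` exactly as in the original loop.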

In [0]:
plot_rand_mels(X_train, y_train)

# Model
The main focus of this project is the model itself, which breaks down into two parts: Construction and Optimization. The model's architecture is built and fit first, and its hyperparameters are then tuned by a Bayesian optimization algorithm.

## Construction
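This model implements the structure described in <a href=https://arxiv.org/abs/1712.08370>this paper</a>: a convolutional branch and a bidirectional GRU branch process the same mel-spectrogram in parallel, and their outputs are concatenated before the final classification layer.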

## Optimization
The model's hyperparameters are tuned with Bayesian optimization (via scikit-optimize), run separately with two surrogate models: a standard Gaussian process (`gp_minimize`) and gradient-boosted regression trees (`gbrt_minimize`).
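To make the idea of a surrogate concrete, here is a minimal NumPy sketch of Gaussian-process Bayesian optimization on a one-dimensional toy objective. It is not skopt's implementation: it assumes an RBF kernel with a fixed length scale, a grid of candidate points on [0, 1], and a lower-confidence-bound (LCB) acquisition `mu - kappa * sigma`, which is the quantity the `kappa` argument of `gp_minimize` weights when its LCB acquisition is used (skopt's own GP uses a Matérn kernel and, by default, a "gp_hedge" mix of acquisitions). All function names here are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential covariance between two 1-D arrays of points."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_query, noise=1e-6, length_scale=0.2):
    """Posterior mean and std of a zero-mean GP with unit prior variance."""
    K = rbf_kernel(x_obs, x_obs, length_scale) + noise * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_query, length_scale)
    mu = K_s.T @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mu, np.sqrt(np.clip(var, 0.0, None))

def minimize_lcb(func, n_calls=15, n_initial=3, kappa=5.0, seed=0):
    """Bayesian optimization of func on [0, 1] with an LCB acquisition."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 501)             # candidate points
    xs = list(rng.uniform(0.0, 1.0, n_initial))   # random initial design
    ys = [func(x) for x in xs]
    for _ in range(n_calls - n_initial):
        y = np.asarray(ys)
        y_norm = (y - y.mean()) / (y.std() + 1e-12)  # standardize targets
        mu, sigma = gp_posterior(np.asarray(xs), y_norm, grid)
        # Low predicted value OR high uncertainty both make a point attractive
        x_next = float(grid[np.argmin(mu - kappa * sigma)])
        xs.append(x_next)
        ys.append(func(x_next))
    best = int(np.argmin(ys))
    return xs[best], ys[best]

best_x, best_y = minimize_lcb(lambda x: (x - 0.3) ** 2)
print(f'best x = {best_x:.3f}, f(x) = {best_y:.5f}')
```

A larger `kappa` (such as the `kappa = 5` passed to `gp_minimize` later in this notebook) pushes the search toward unexplored regions before it settles on the best-looking area.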

In [0]:
# Fixed sizes taken from the data (these are not tuned)
num_classes = 8
mel_features = X_train.shape[1]
mel_time = X_train.shape[2]

In [0]:
dims = [
        
        Real(low=1e-4, high=1e1, prior='log-uniform', name='learning_rate'),
        Categorical([Adam, RMSprop, Nadam], name='optimizer'),

        Integer(low=1, high=128, name='batch_size'),
        Integer(10, 60, name='epochs'),

        Integer(5, 8, name='num_conv_layers'),
        Integer(16, 256, name='num_conv_filters_1'),
        Integer(16, 256, name='num_conv_filters_2'),
        Integer(16, 256, name='num_conv_filters_3'),
        Integer(16, 256, name='num_conv_filters_4'),
        Integer(16, 256, name='num_conv_filters_5'),
        Integer(16, 256, name='num_conv_filters_6'),
        Integer(16, 256, name='num_conv_filters_7'),
        Integer(16, 256, name='num_conv_filters_8'),

        Categorical(categories=['relu', 'elu', 'selu'], name='conv_activ_1'),
        Categorical(['relu', 'elu', 'selu'], name='conv_activ_2'),
        Categorical(['relu', 'elu', 'selu'], name='conv_activ_3'),
        Categorical(['relu', 'elu', 'selu'], name='conv_activ_4'),
        Categorical(['relu', 'elu', 'selu'], name='conv_activ_5'),
        Categorical(['relu', 'elu', 'selu'], name='conv_activ_6'),
        Categorical(['relu', 'elu', 'selu'], name='conv_activ_7'),
        Categorical(['relu', 'elu', 'selu'], name='conv_activ_8'),

        Integer(16, 256, name='num_gru_units'),
        Real(0.0, 0.5, name='gru_dropout_rate')
]

default_dims = [
                
                1e-3,
                Adam,

                32,
                15,

                5,
                16,
                32,
                64,
                64,
                64,
                128,
                128,
                256,

                'relu',
                'relu',
                'relu',
                'relu',
                'relu',
                'relu',
                'relu',
                'relu',

                64,
                0.0
]
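`default_dims` is matched to `dims` purely by position, and skopt expects every entry of an `x0` point to lie inside the corresponding dimension. The sketch below makes that alignment explicit with a hypothetical checker; plain tuples stand in for skopt's `Real`, `Integer` and `Categorical` objects, and only the first few dimensions are mirrored.

```python
# Hypothetical stand-ins for skopt dimensions: (name, ('real'|'int', low, high))
# or (name, ('cat', choices)). They mirror the first few entries of `dims`.
space = [
    ('learning_rate', ('real', 1e-4, 1e1)),
    ('batch_size', ('int', 1, 128)),
    ('num_conv_layers', ('int', 5, 8)),
    ('conv_activ_1', ('cat', ['relu', 'elu', 'selu'])),
]

def out_of_range_dims(point, space):
    """Names of dimensions whose value in `point` (matched by position) is invalid."""
    if len(point) != len(space):
        raise ValueError(f'{len(point)} values for {len(space)} dimensions')
    bad = []
    for value, (name, spec) in zip(point, space):
        if spec[0] == 'cat':
            ok = value in spec[1]
        else:
            kind, low, high = spec
            # Integer dimensions must also hold whole numbers
            ok = low <= value <= high and (kind == 'real' or float(value).is_integer())
        if not ok:
            bad.append(name)
    return bad

print(out_of_range_dims([1e-3, 32, 5, 'relu'], space))    # []
print(out_of_range_dims([1e-3, 256, 5, 'tanh'], space))   # ['batch_size', 'conv_activ_1']
```

Because the match is positional, inserting a new entry into `dims` without inserting its default at the same index in `default_dims` would shift every later value into the wrong dimension.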

In [0]:
# Actual Model
def parallel_model(model_input, **params):
  
  # CNN branch
  conv_input = model_input

  # Pool sizes for each number of conv layers (5-8); None means no pooling after that layer
  pool_sizes_dict = {
      
      5: [(2, 2), (2, 2), (2, 2), (4, 4), (4, 4)],
      6: [(2, 2), (2, 2), (2, 2), (2, 2), (2, 2), (4, 4)],
      7: [(2, 2), (2, 2), None, (2, 2), (2, 2), (2, 2), (4, 4)],
      8: [(2, 2), (2, 2), None, (2, 2), (2, 2), (2, 2), (2, 2), (2, 2)],
  }

  num_conv_layers = params['num_conv_layers']
  pool_sizes = pool_sizes_dict[num_conv_layers]
  for i in range(num_conv_layers):
    
    # Conv2D layer (1x3 kernel, 'valid' padding: convolves along the time axis only)
    conv_input = Conv2D(
        filters = params[f'num_conv_filters_{i + 1}'],
        kernel_size = (1, 3),
        activation = params[f'conv_activ_{i + 1}'],
        name = f'Conv{i + 1}'
        ) (conv_input)

    # Pooling layer
    pool_size = pool_sizes[i]
    if pool_size is not None:
      conv_input = MaxPool2D(pool_size, name=f'Pool{i + 1}') (conv_input)

    # Normalizing layer (skipped after the last conv block)
    if i != num_conv_layers - 1:
      conv_input = BatchNormalization(name=f'Norm{i + 1}') (conv_input)

  flat = Flatten(name='flat') (conv_input)

  # RNN branch
  pool_rnn = MaxPool2D((2, 4), name='PoolRNN') (model_input)
  # Drop the channel axis: (batch, mel_features // 2, mel_time // 4).
  # The GRU treats axis 1 (the pooled mel bins) as its sequence axis.
  squeeze = Lambda(lambda x: K.squeeze(x, axis=-1), name='Squeeze') (pool_rnn)
  bigru = Bidirectional(GRU(
      units = params['num_gru_units'],
      recurrent_dropout = params['gru_dropout_rate'],
      name = 'BiGRU'
      )) (squeeze)

  concat = concatenate([flat, bigru], name='Concat') # Merge both branches
  # Softmax: the genres are mutually exclusive, which matches categorical cross-entropy
  model_output = Dense(num_classes, activation='softmax', name='Output') (concat)

  model = Model(model_input, model_output, name='Parallel_Model')
  
  # Compile parameters
  model.compile(
      loss = 'categorical_crossentropy',
      optimizer = params['optimizer'] (learning_rate=params['learning_rate']),
      metrics = ['accuracy']
  )

  return model
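To see why each entry of `pool_sizes_dict` pairs with a particular depth, the sketch below traces the feature-map shape through both branches in plain Python. It assumes spectrograms of 128 mel bins by 640 frames (the real values come from `X_train.shape` and are not stated in this notebook), and the helper names `conv_branch_shape` and `rnn_branch_shape` are hypothetical.

```python
def conv_branch_shape(mel_features, mel_time, filters, pool_sizes):
    """Trace (height, width, channels) through the CNN branch of parallel_model.

    Conv2D with kernel_size=(1, 3) and Keras' default 'valid' padding trims the
    time axis by 2; MaxPool2D (strides default to the pool size, 'valid'
    padding) floor-divides both axes. BatchNormalization keeps the shape.
    """
    h, w, c = mel_features, mel_time, 1
    for i, (n_filters, pool) in enumerate(zip(filters, pool_sizes), start=1):
        w, c = w - 2, n_filters
        if pool is not None:
            h, w = h // pool[0], w // pool[1]
        if h < 1 or w < 1:
            raise ValueError(f'feature map vanishes after Conv{i}: {(h, w)}')
    return h, w, c

def rnn_branch_shape(mel_features, mel_time, gru_units):
    """PoolRNN (2, 4) then squeeze: returns (steps, features, BiGRU output size)."""
    steps, feats = mel_features // 2, mel_time // 4
    return steps, feats, 2 * gru_units  # bidirectional output concatenates both directions

pool_sizes_5 = [(2, 2), (2, 2), (2, 2), (4, 4), (4, 4)]
h, w, c = conv_branch_shape(128, 640, [16, 32, 64, 64, 64], pool_sizes_5)
print(h, w, c, h * w * c)                # 1 4 64 256
print(rnn_branch_shape(128, 640, 64))    # (64, 160, 128)
```

With these assumed sizes every depth from 5 to 8 pools the 128 mel bins down to a height of 1, and the `Concat` layer would receive 256 + 128 features for the default hyperparameters. An input that is too small for the chosen depth raises instead of silently producing an empty feature map.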

In [0]:
# Data generator that yields one batch at a time to avoid running out of memory
def data_generator(data, labels, batch_size):

  while True:

    for i in range(0, data.shape[0], batch_size):

      yield data[i: i + batch_size], labels[i: i + batch_size]
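Because `train_model` sets `steps_per_epoch = n // batch_size` while `data_generator` also yields the short final batch, epochs do not line up with passes over the data when `n` is not a multiple of `batch_size`. The hypothetical helper below simulates only the batch sizes that each epoch would draw from the same cycling pattern:

```python
def batch_sizes(n_samples, batch_size, steps_per_epoch, epochs):
    """Sizes of the batches each epoch draws from a cycling generator.

    Mirrors data_generator: slices [i, i + batch_size) restart at 0 after the
    final (possibly short) batch, and fit() pulls exactly steps_per_epoch
    batches per epoch regardless of where the generator is in its cycle.
    """
    def gen():
        while True:
            for i in range(0, n_samples, batch_size):
                yield min(batch_size, n_samples - i)
    it = gen()
    return [[next(it) for _ in range(steps_per_epoch)] for _ in range(epochs)]

print(batch_sizes(10, 4, 10 // 4, 3))  # [[4, 4], [2, 4], [4, 2]]
```

The short batch drifts through the epochs, so each epoch sees a slightly different subset of samples; with large datasets this effect is small.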

In [0]:
# Function to train model
def train_model(train_data, train_labels, valid_data, valid_labels, model=None, **params):

  batch_size = params['batch_size']

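  # Add a channel axis; keeping NumPy arrays means only one batch at a time is converted to a tensor
  train_data = np.expand_dims(train_data, axis=-1)
  valid_data = np.expand_dims(valid_data, axis=-1)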

  train_generator = data_generator(train_data, train_labels, batch_size=batch_size)
  valid_generator = data_generator(valid_data, valid_labels, batch_size)

  reduce_lr = ReduceLROnPlateau(
      
      monitor = 'val_accuracy',
      factor = 0.5,
      min_delta = 0.01,
      patience = 4,
      mode = 'auto',
      verbose = 1
  )
  
  early_stop = EarlyStopping(
      
      monitor = 'val_accuracy',
      min_delta = 0.001,
      patience = 5,
      mode = 'auto',
      verbose = 1
  )

  if model is None:

    model_input = Input(shape=(mel_features, mel_time, 1), name='Input')
    model = parallel_model(model_input, **params)
    model.summary() # summary() prints directly and returns None

  history = model.fit(
      
      x = train_generator,
      epochs = params['epochs'],
      steps_per_epoch = train_data.shape[0] // batch_size,
      validation_data = valid_generator,
      validation_steps = valid_data.shape[0] // batch_size,
      callbacks = [reduce_lr, early_stop],
      verbose = 1
  )

  return model, history
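The two callbacks interact through their `min_delta` and `patience` settings. The sketch below is a simplified re-implementation of those rules for a metric that should increase (such as `val_accuracy`), with cooldown and `min_lr` left at their defaults; it is not Keras's source, and the function name `simulate_callbacks` is hypothetical.

```python
def simulate_callbacks(val_acc, lr=1e-3, factor=0.5, lr_min_delta=0.01,
                       lr_patience=4, es_min_delta=0.001, es_patience=5):
    """Replay ReduceLROnPlateau and EarlyStopping (maximizing) on a list of
    val_accuracy values. Returns (index of the last epoch run, lr per epoch)."""
    best_lr = best_es = float('-inf')
    wait_lr = wait_es = 0
    lrs = []
    for epoch, acc in enumerate(val_acc):
        lrs.append(lr)  # learning rate used during this epoch

        # ReduceLROnPlateau: an improvement must beat the best by more than min_delta
        if acc > best_lr + lr_min_delta:
            best_lr, wait_lr = acc, 0
        else:
            wait_lr += 1
            if wait_lr >= lr_patience:
                lr *= factor  # takes effect from the next epoch
                wait_lr = 0

        # EarlyStopping: same idea with its own min_delta and patience
        if acc - es_min_delta > best_es:
            best_es, wait_es = acc, 0
        else:
            wait_es += 1
            if wait_es >= es_patience:
                return epoch, lrs  # training stops after this epoch
    return len(val_acc) - 1, lrs

stop_epoch, lrs = simulate_callbacks([0.5, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6])
print(stop_epoch, lrs)
```

With the settings used in `train_model`, a flat validation accuracy halves the learning rate after four stagnant epochs and stops training after five, so the reduced rate gets only one epoch before early stopping fires.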

In [0]:
# Function to train the model over several slices of data.
# A single tensor holding all of the data would exceed the hard-coded 2 GB limit
def train_in_slices(train_data, train_labels, valid_data, valid_labels, slices, **params):

  model, history = None, None
  
  train_slice = train_data.shape[0] // slices
  valid_slice = valid_data.shape[0] // slices

  train_labels = to_categorical(train_labels, num_classes=num_classes)
  valid_labels = to_categorical(valid_labels, num_classes=num_classes)

  for i in range(slices):
    
    train_ind = train_slice * i
    valid_ind = valid_slice * i

    model, history = train_model(
        
        train_data = train_data[train_ind: train_ind + train_slice],
        train_labels = train_labels[train_ind: train_ind + train_slice],
        valid_data = valid_data[valid_ind: valid_ind + valid_slice],
        valid_labels = valid_labels[valid_ind: valid_ind + valid_slice],
        model = model,
        **params
    )

  return model, history
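The slicing arithmetic and label encoding in `train_in_slices` can be checked in isolation. In the sketch below, `slice_bounds` and `one_hot` are hypothetical helpers: the first reproduces the `(start, stop)` pairs the loop uses, and the second is a NumPy equivalent of `to_categorical` for integer labels.

```python
import numpy as np

def slice_bounds(n_samples, slices):
    """(start, stop) pairs used by train_in_slices: equal-sized slices,
    so the last n_samples % slices samples are never visited."""
    size = n_samples // slices
    return [(size * i, size * i + size) for i in range(slices)]

def one_hot(labels, num_classes):
    """NumPy equivalent of keras.utils.to_categorical for integer labels."""
    return np.eye(num_classes, dtype=np.float32)[np.asarray(labels)]

print(slice_bounds(10, 3))       # [(0, 3), (3, 6), (6, 9)]: sample 9 is skipped
print(one_hot([0, 2, 7], 8)[1])  # [0. 0. 1. 0. 0. 0. 0. 0.]
```

Since the model is passed from one slice to the next, each slice is trained for the full `epochs` count in turn, and the returned `history` covers only the final slice.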

In [0]:
best_acc = None
best_model_hist = None
best_preds = None

In [0]:
# Transforming X_test for compatibility (adding a channel axis)
X_test_expanded = np.expand_dims(X_test, -1)

# Skopt Bayesian Optimization
@use_named_args(dims)
def fitness(**params):

  # Module-level variables that keep the best run so far
  global best_acc, best_model_hist, best_preds
  
  for key, val in params.items():

    print(f'{key}: {val}')

  blackbox, history = train_in_slices(X_train, y_train, X_valid, y_valid, 2, **params)

  y_pred = blackbox.predict(X_test_expanded)
  y_pred = np.argmax(y_pred, axis=1)

  acc = accuracy_score(y_test, y_pred)
  print(f'Model Accuracy: {acc:.02%}')

  # Updating variables for later show
  if best_acc is None or best_acc < acc:

    best_acc = acc
    best_model_hist = history
    best_preds = y_pred

  del blackbox
  del history

  K.clear_session()
  tf.compat.v1.reset_default_graph()

  return -acc # Returns negative as optimization tries to find minimum

In [0]:
# Using Gaussian Process as Surrogate
gp_result = gp_minimize(
    
    func = fitness,
    dimensions = dims,
    n_calls = 25,
    n_jobs = -1,
    x0 = default_dims,
    kappa = 5,
    verbose = True
)
K.clear_session()
tf.compat.v1.reset_default_graph()

In [0]:
print(gp_result)
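The optimizer calls `fitness` with a single list of values ordered like `dims`, and `@use_named_args(dims)` turns that list into keyword arguments. In the returned result, `gp_result.x` holds the best list of values and `gp_result.fun` the lowest objective, so the best accuracy is `-gp_result.fun`. The sketch below mimics that flow with a hypothetical `named_args` decorator (a stand-in, not skopt's code) and a toy fitness function that only pretends to compute an accuracy.

```python
from functools import wraps

def named_args(names):
    """Hypothetical stand-in for skopt.utils.use_named_args: lets a function
    that takes keyword arguments be called with one list of values ordered
    like the search space."""
    def decorator(func):
        @wraps(func)
        def wrapper(x):
            if len(x) != len(names):
                raise ValueError(f'expected {len(names)} values, got {len(x)}')
            return func(**dict(zip(names, x)))
        return wrapper
    return decorator

names = ['learning_rate', 'batch_size', 'num_conv_layers']

@named_args(names)
def toy_fitness(learning_rate, batch_size, num_conv_layers):
    # Pretend accuracy; negated because the optimizer minimizes
    accuracy = 0.5 + 0.01 * num_conv_layers - abs(learning_rate - 1e-3)
    return -accuracy

best_x = [1e-3, 32, 8]            # what a result's .x would hold
best_fun = toy_fitness(best_x)    # and its .fun
print(dict(zip(names, best_x)), f'accuracy = {-best_fun:.2f}')
```

Zipping the dimension names with `gp_result.x` in the same way gives a readable summary of the best hyperparameters without printing the whole result object.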

In [0]:
def plot_history(history):

  # Plot training & validation accuracy values
  plt.plot(history.history['accuracy'])
  plt.plot(history.history['val_accuracy'])
  plt.title('Model Accuracy')
  plt.ylabel('Accuracy')
  plt.xlabel('Epoch')
  plt.legend(['Train', 'Valid'], loc='upper left')
  plt.show()

  # Plot training & validation loss values
  plt.plot(history.history['loss'])
  plt.plot(history.history['val_loss'])
  plt.title('Model Loss')
  plt.ylabel('Loss')
  plt.xlabel('Epoch')
  plt.legend(['Train', 'Valid'], loc='upper left')
  plt.show()

In [0]:
plot_history(best_model_hist)

In [0]:
print(classification_report(y_test, best_preds, target_names=list(reverse_map.values())))

In [0]:
conf_mat = confusion_matrix(y_test, best_preds)

font_axis = {
    'family': 'sans-serif',
    'color':  'darkred',
    'weight': 'normal',
    'size': 12,
}

font_title = {
    'family': 'sans-serif',
    'color': 'black',
    'weight': 'bold',
    'size': 16
}

sns.heatmap(
    data = conf_mat,
    annot = True,
    fmt = 'd',
    cbar = False,
    square = True,
    xticklabels = list(reverse_map.values()),
    yticklabels = list(reverse_map.values())
)
plt.title('Confusion Matrix', fontdict=font_title, pad=10)
plt.xlabel('Predicted', fontdict=font_axis, labelpad=10)
plt.ylabel('Actual', fontdict=font_axis, labelpad=5)
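The per-class numbers in the classification report can be read straight off the confusion matrix: scikit-learn's `confusion_matrix` puts actual classes on the rows and predicted classes on the columns, matching the heatmap's axis labels. A minimal sketch, using a made-up 2 × 2 matrix and a hypothetical helper name:

```python
import numpy as np

def per_class_metrics(conf_mat):
    """Precision, recall and F1 per class from a confusion matrix whose rows
    are actual classes and whose columns are predicted classes."""
    conf_mat = np.asarray(conf_mat, dtype=float)
    tp = np.diag(conf_mat)
    predicted = conf_mat.sum(axis=0)  # column sums: how often each class was predicted
    actual = conf_mat.sum(axis=1)     # row sums: support of each class
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return precision, recall, f1

cm = [[8, 2],
      [1, 9]]
p, r, f = per_class_metrics(cm)
print(p.round(3), r.round(3), f.round(3))
print('accuracy:', np.trace(cm) / np.sum(cm))
```

Classes that are never predicted (or never occur) get a score of 0 here instead of a division-by-zero warning, which is also how scikit-learn reports them by default.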