<a href="https://colab.research.google.com/github/jameswebbtc/RAVDESS_Speech_Classification/blob/main/Hyperparameter_Tuning_Approach1_VGG16.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#**Hyperparameter Tuning of Fine-Tuned VGG16 + DNN Model**
In this notebook, we tune the hyperparameters of one of the models described in SER_models_on_Mel_Spectrograms.ipynb, i.e., the fine-tuned VGG16 + DNN network.

#**Importing Necessary Libraries**

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os
import glob
import time
import random

In [2]:
import tensorflow as tf

In [3]:
import cv2
import PIL
from PIL import Image

In [4]:
from sklearn.model_selection import train_test_split
from keras.preprocessing.image import ImageDataGenerator
from keras.optimizers import Adam
from keras import regularizers
from keras.layers import Dropout

In [5]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix
import seaborn as sns

In [6]:
from imblearn.over_sampling import RandomOverSampler

In [7]:
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

In [8]:
# Dropout and Adam are already imported in cell 4
from keras.layers import Flatten, Dense, Conv2D, MaxPooling2D, Input, AveragePooling2D
from keras.models import Model, Sequential

In [9]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


#**Training Inputs**
In this section, we create the inputs for training and testing the model. We also load the pre-computed arrays of Mel spectrograms used to train the model. These arrays were saved by 'SER_models_on_Mel_Spectrograms.ipynb' when the model was trained without hyperparameter tuning.

In [10]:
# There are 8 classes in the dataset representing 8 emotions
classes = {1, 2, 3, 4, 5, 6, 7, 8}

In [11]:
dataset_location = '/content/gdrive/MyDrive/Capstone_Project/Spectrogram_inputs'

In [12]:
# Get a list of all image files in the dataset
image_files = glob.glob(os.path.join(dataset_location, "*.png"))

In [13]:
# Extract label from filename
labels = [int(file.split("_")[7].split(".")[0]) for file in image_files]
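
The index `7` above counts underscores in the full path, including those in the directory names, so it stops working if the dataset folder is moved or renamed. A minimal sketch of a path-independent alternative, assuming (as the `.split(".")[0]` step suggests) that the label is the last underscore-separated token of the file name. `label_from_path` and the example file name are hypothetical and only illustrate the parsing:

```python
import os

def label_from_path(path):
    """Return the integer label stored as the last '_'-separated token of the file name."""
    stem = os.path.splitext(os.path.basename(path))[0]  # drop directories and extension
    return int(stem.split("_")[-1])

# Hypothetical file name, used only to illustrate the parsing
example = "/content/gdrive/MyDrive/Capstone_Project/Spectrogram_inputs/a_b_c_d_e_5.png"
print(label_from_path(example))                  # 5
print(int(example.split("_")[7].split(".")[0]))  # 5, the original indexing on the same path
```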

In [14]:
# Make a dataframe with the filepaths and zero-based labels (0-7 instead of 1-8),
# as required by the sparse categorical cross-entropy loss used later
df = pd.DataFrame(image_files, columns=['File_Path'])
df['pos_labels'] = np.array(labels) - 1.0
df.head(2)

Unnamed: 0,File_Path,pos_labels
0,/content/gdrive/MyDrive/Capstone_Project/Spect...,2.0
1,/content/gdrive/MyDrive/Capstone_Project/Spect...,4.0


In [15]:
# Shuffle the rows
df = df.sample(frac=1)
df.head(2)

Unnamed: 0,File_Path,pos_labels
538,/content/gdrive/MyDrive/Capstone_Project/Spect...,7.0
1161,/content/gdrive/MyDrive/Capstone_Project/Spect...,4.0


### Resampling to balance classes

EDA showed that the 'neutral' class has half as many samples as each of the other classes. Hence, this class is over-sampled to match the majority class.

In [16]:
# Count the occurrences of each class
class_counts = df['pos_labels'].value_counts()

# Identify the minority class
minority_class = class_counts.idxmin()

# Identify the majority class
majority_class = class_counts.idxmax()

# Define oversampling strategy
oversample_strategy = {minority_class: class_counts[majority_class]}

# Oversample using RandomOverSampler
oversampler = RandomOverSampler(sampling_strategy=oversample_strategy, random_state=42)
X_resampled, y_resampled = oversampler.fit_resample(df[['File_Path']], df['pos_labels'])

# Combine resampled data into a new DataFrame
df_resampled = pd.concat([X_resampled, y_resampled], axis=1)
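
To make the effect of this strategy concrete, here is a small standard-library sketch of random over-sampling on toy labels. It is not the `imblearn` implementation; `oversample_minority` and the toy counts are illustrative assumptions. Only the minority class is resampled (with replacement) until it matches the majority count, and in this sketch the duplicates are appended after the original items.

```python
import random
from collections import Counter

def oversample_minority(labels, seed=42):
    """Return indices that duplicate minority-class items up to the majority count."""
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    majority = max(counts, key=counts.get)
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    n_extra = counts[majority] - counts[minority]
    # Draw the extra items with replacement from the minority class
    extra = [rng.choice(minority_idx) for _ in range(n_extra)]
    return list(range(len(labels))) + extra

# Toy data: class 0 has half as many items as classes 1 and 2
toy_labels = [0] * 3 + [1] * 6 + [2] * 6
idx = oversample_minority(toy_labels)
balanced = Counter(toy_labels[i] for i in idx)
print(balanced)
```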

In [17]:
df_resampled

Unnamed: 0,File_Path,pos_labels
0,/content/gdrive/MyDrive/Capstone_Project/Spect...,7.0
1,/content/gdrive/MyDrive/Capstone_Project/Spect...,4.0
2,/content/gdrive/MyDrive/Capstone_Project/Spect...,7.0
3,/content/gdrive/MyDrive/Capstone_Project/Spect...,0.0
4,/content/gdrive/MyDrive/Capstone_Project/Spect...,1.0
...,...,...
1531,/content/gdrive/MyDrive/Capstone_Project/Spect...,0.0
1532,/content/gdrive/MyDrive/Capstone_Project/Spect...,0.0
1533,/content/gdrive/MyDrive/Capstone_Project/Spect...,0.0
1534,/content/gdrive/MyDrive/Capstone_Project/Spect...,0.0


In [18]:
# Shuffle the rows
df_resampled = df_resampled.sample(frac=1)
df_resampled.head(2)

Unnamed: 0,File_Path,pos_labels
386,/content/gdrive/MyDrive/Capstone_Project/Spect...,7.0
928,/content/gdrive/MyDrive/Capstone_Project/Spect...,7.0


In [19]:
# Split the dataset into training (80%), validation (10%) and test (10%) sets,
# keeping the class proportions the same in every subset
train_data, test_data = train_test_split(df_resampled, test_size=0.2, random_state=42, stratify=df_resampled['pos_labels'])
test_data, val_data   = train_test_split(test_data, test_size=0.5, random_state=42, stratify=test_data['pos_labels'])

In [20]:
train_data.shape

(1228, 2)

In [21]:
test_data.shape

(154, 2)

In [22]:
val_data.shape

(154, 2)

In [23]:
df_resampled.shape

(1536, 2)
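
The shapes above follow from the two split fractions. As a quick arithmetic check, assuming scikit-learn's rule of rounding the test fraction up and giving the remainder to the other subset (`split_sizes` is a hypothetical helper, not part of scikit-learn):

```python
import math

def split_sizes(n_samples, test_size):
    """Return (n_first, n_second) for a split where the second part is rounded up."""
    n_second = math.ceil(test_size * n_samples)
    return n_samples - n_second, n_second

# First split: 80% training, 20% held out
n_train, n_holdout = split_sizes(1536, 0.2)
# Second split: the held-out part is halved into test and validation sets
n_test, n_val = split_sizes(n_holdout, 0.5)
print(n_train, n_test, n_val)  # 1228 154 154
```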

In [24]:
# Convert to numpy arrays for use in the subsequent sections
train_files, train_labels = train_data.values[:,0], train_data.values[:,1].astype('int32')
val_files, val_labels     = val_data.values[:,0], val_data.values[:,1].astype('int32')
test_files, test_labels   = test_data.values[:,0], test_data.values[:,1].astype('int32')
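
Because the 'neutral' class is over-sampled before the split, copies of the same file can end up in more than one subset, which would make the validation and test scores optimistic. A minimal sketch of a check for this, using toy file lists in place of `train_files`, `val_files` and `test_files`; `shared_files` is a hypothetical helper:

```python
def shared_files(*splits):
    """Return the set of file paths that occur in more than one split."""
    seen, shared = set(), set()
    for split in splits:
        unique = set(split)       # duplicates inside one split do not count
        shared |= seen & unique   # paths already seen in an earlier split
        seen |= unique
    return shared

# Toy file lists: 'a.png' is duplicated and lands in both train and validation
toy_train = ["a.png", "b.png", "c.png", "a.png"]
toy_val = ["d.png", "a.png"]
toy_test = ["e.png"]
print(shared_files(toy_train, toy_val, toy_test))  # {'a.png'}
```

In the notebook, the corresponding call would be `shared_files(train_files, val_files, test_files)`; an empty set means that no file is shared between subsets.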

In [25]:
# Load the pre-processed arrays
input_shape = (300,50,3)

train_images = np.load('/content/gdrive/MyDrive/Capstone_Project/train_images_class_balanced_no_augm_color.npy')
val_images = np.load('/content/gdrive/MyDrive/Capstone_Project/val_images_class_balanced_no_augm_color.npy')
test_images = np.load('/content/gdrive/MyDrive/Capstone_Project/test_images_class_balanced_no_augm_color.npy')

In [26]:
# Use ImageDataGenerator to create batches of data for training and validation.
# rescale=1 leaves the pixel values unchanged.
batch_size           = 50
train_datagen        = ImageDataGenerator(rescale=1)
val_datagen          = ImageDataGenerator(rescale=1)
test_datagen         = ImageDataGenerator(rescale=1)

train_generator      = train_datagen.flow(train_images, train_labels, batch_size=batch_size)
validation_generator = val_datagen.flow(val_images, val_labels, batch_size=batch_size)
# The test set is served as a single unshuffled batch so predictions stay aligned with test_labels
test_generator       = test_datagen.flow(test_images, test_labels, batch_size=test_images.shape[0], shuffle=False)

#**Function for Compiling a Network as per the Given Hyperparameters**

In [27]:
def create_model(n_layers, n_neurons, n_layer_vgg_tune, dropout, lr):
    # Pre-trained VGG16 convolutional base; its input shape matches the Input layer below
    base_model = VGG16(weights='imagenet', include_top=False, input_shape=input_shape)

    # The spectrogram arrays already have three color channels,
    # so they are passed to the VGG16 base model directly
    inputs = layers.Input(shape=input_shape)
    x = base_model(inputs)

    # Add the custom dense layers
    x = Flatten()(x)
    for _ in range(n_layers):
        x = Dense(n_neurons, activation='relu', kernel_regularizer=regularizers.l2(0.01))(x)
        x = Dropout(dropout)(x)

    # One softmax output per emotion class
    output = Dense(len(classes), activation='softmax')(x)

    # Create the fine-tuned model
    model = Model(inputs=inputs, outputs=output)

    # Freeze all VGG16 layers except the last n_layer_vgg_tune
    for layer in base_model.layers[:-n_layer_vgg_tune]:
        layer.trainable = False

    adam_optimizer = Adam(learning_rate=lr)
    model.compile(optimizer=adam_optimizer, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model
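
With `include_top=False`, the last layer of VGG16 is the weight-free pooling layer `block5_pool`, so `n_layer_vgg_tune = 2` leaves one convolutional layer trainable and `3` leaves two. The sketch below reproduces the freezing slice on a list of layer names (written out by hand here rather than read from a loaded model) and also works out the size of the `Flatten()` output for a 300 x 50 input. `trainable_layers` and `flattened_size` are hypothetical helpers.

```python
# Layer names of Keras VGG16 with include_top=False, in order
# (the exact name of the input layer varies between Keras versions)
VGG16_LAYERS = [
    "input",
    "block1_conv1", "block1_conv2", "block1_pool",
    "block2_conv1", "block2_conv2", "block2_pool",
    "block3_conv1", "block3_conv2", "block3_conv3", "block3_pool",
    "block4_conv1", "block4_conv2", "block4_conv3", "block4_pool",
    "block5_conv1", "block5_conv2", "block5_conv3", "block5_pool",
]

def trainable_layers(n_layer_vgg_tune):
    """Names of the layers left trainable by the slice layers[:-n_layer_vgg_tune]."""
    frozen = VGG16_LAYERS[:-n_layer_vgg_tune]
    return VGG16_LAYERS[len(frozen):]

def flattened_size(height, width, n_pools=5, channels=512):
    """Size of the Flatten() output: each 2x2 max-pool halves height and width (floor)."""
    for _ in range(n_pools):
        height, width = height // 2, width // 2
    return height * width * channels

for n in (2, 3):
    convs = [name for name in trainable_layers(n) if "conv" in name]
    print(n, convs)

print(flattened_size(300, 50))  # 9 * 1 * 512 = 4608
```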

#**Grid Search**
We chose to tune the number of layers and the number of neurons per layer in the DNN classifier, since these determine the capacity of the network to learn from the dataset. We also treated the number of trainable layers in the VGG16 model as a hyperparameter: fine-tuning of the pre-trained model on our dataset happens only in these layers, and we expect this number to affect model performance. Since overfitting on the training data was a major challenge in this study, we also tuned the learning rate and the dropout rate of the DNN layers.

In this section, we perform grid search for these parameters.

In [46]:
# Grid search parameters
n_layer_val           = [2, 4]
n_neuron_val          = [100, 300]
n_layers_vgg_tune_val = [2, 3]
dropout_val           = [0.2, 0.4]
learn_rate_val        = [0.001, 0.005]
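
These five lists define 2^5 = 32 configurations. A compact way to enumerate them is sketched below with `itertools.product`. The dictionary keys are chosen here to match the parameter names of `create_model`, so each entry could be passed as `create_model(**combo)`; the `grid` and `combinations` names are assumptions of this sketch.

```python
from itertools import product

grid = {
    "n_layers": [2, 4],
    "n_neurons": [100, 300],
    "n_layer_vgg_tune": [2, 3],
    "dropout": [0.2, 0.4],
    "lr": [0.001, 0.005],
}

# One dict per configuration; the last key (lr) varies fastest,
# which is the same order as five nested for-loops
combinations = [dict(zip(grid, values)) for values in product(*grid.values())]

print(len(combinations))  # 32
print(combinations[0])
```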

In [47]:
# Grid search
n_layer_vec              = np.array([])
n_neuron_vec             = np.array([])
n_layers_vgg_tune_vec    = np.array([])
dropout_vec              = np.array([])
learn_rate_vec           = np.array([])

val_accuracy      = np.array([])
epochs            = 25

for n_layer in n_layer_val:
  for n_neuron in n_neuron_val:
    for n_layers_vgg_tune in n_layers_vgg_tune_val:
      for dropout in dropout_val:
        for lr in learn_rate_val:
            model_search  = create_model(n_layer, n_neuron, n_layers_vgg_tune, dropout, lr)
            model_search.fit(train_generator,
                              steps_per_epoch=len(train_generator),
                              validation_data=validation_generator,
                              validation_steps=len(validation_generator),
                              epochs=epochs, verbose=0
                              )
            # evaluate() returns [loss, accuracy]; store the validation accuracy
            acc = model_search.evaluate(validation_generator)
            val_accuracy          = np.append(val_accuracy, acc[1])
            n_layer_vec           = np.append(n_layer_vec, n_layer)
            n_neuron_vec          = np.append(n_neuron_vec, n_neuron)
            n_layers_vgg_tune_vec = np.append(n_layers_vgg_tune_vec, n_layers_vgg_tune)
            dropout_vec           = np.append(dropout_vec, dropout)
            learn_rate_vec        = np.append(learn_rate_vec, lr)
            # Release the model before building the next one
            del model_search
            tf.keras.backend.clear_session()

# Collect the grid search results in a single dataframe
results_grid_search = pd.DataFrame(n_layer_vec, columns=['Layers in DNN'])
results_grid_search['Neurons in DNN layers'] = n_neuron_vec
results_grid_search['Number of trainable layers in VGG16'] = n_layers_vgg_tune_vec
results_grid_search['Dropout'] = dropout_vec
results_grid_search['Learning Rate'] = learn_rate_vec
results_grid_search['Validation Accuracy'] = val_accuracy



In [49]:
results_grid_search.to_excel('/content/gdrive/MyDrive/Capstone_Project/results_grid_search.xlsx', index=False)
results_grid_search

Unnamed: 0,Layers in DNN,Neurons in DNN layers,Number of trainable layers in VGG16,Dropout,Learning Rate,Validation Accuracy
0,2.0,100.0,2.0,0.2,0.001,0.74026
1,2.0,100.0,2.0,0.2,0.005,0.720779
2,2.0,100.0,2.0,0.4,0.001,0.720779
3,2.0,100.0,2.0,0.4,0.005,0.688312
4,2.0,100.0,3.0,0.2,0.001,0.75974
5,2.0,100.0,3.0,0.2,0.005,0.123377
6,2.0,100.0,3.0,0.4,0.001,0.74026
7,2.0,100.0,3.0,0.4,0.005,0.123377
8,2.0,300.0,2.0,0.2,0.001,0.792208
9,2.0,300.0,2.0,0.2,0.005,0.681818
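
As a small worked example, the sketch below picks the best configuration and compares the two learning rates. It uses only the ten rows displayed above, copied by hand, so it says nothing about the rest of the grid; `rows` and `mean_accuracy` are names introduced for this illustration.

```python
# Columns: layers, neurons, trainable VGG16 layers, dropout, learning rate, validation accuracy
rows = [
    (2, 100, 2, 0.2, 0.001, 0.740260),
    (2, 100, 2, 0.2, 0.005, 0.720779),
    (2, 100, 2, 0.4, 0.001, 0.720779),
    (2, 100, 2, 0.4, 0.005, 0.688312),
    (2, 100, 3, 0.2, 0.001, 0.759740),
    (2, 100, 3, 0.2, 0.005, 0.123377),
    (2, 100, 3, 0.4, 0.001, 0.740260),
    (2, 100, 3, 0.4, 0.005, 0.123377),
    (2, 300, 2, 0.2, 0.001, 0.792208),
    (2, 300, 2, 0.2, 0.005, 0.681818),
]

# Configuration with the highest validation accuracy among the displayed rows
best = max(rows, key=lambda r: r[-1])
print(best)

def mean_accuracy(lr):
    """Mean validation accuracy of the displayed rows trained with learning rate lr."""
    accs = [r[-1] for r in rows if r[4] == lr]
    return sum(accs) / len(accs)

print(mean_accuracy(0.001), mean_accuracy(0.005))

# Accuracy expected from guessing uniformly among 8 classes, for comparison
print(1 / 8)
```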


#**Conclusion**
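The best model reached a validation accuracy of 79.2%, compared with 75.97% without hyperparameter tuning. A lower dropout and a lower learning rate led to better performance. In the rows shown above, for example, a learning rate of 0.005 with three trainable VGG16 layers dropped the validation accuracy to 12.3%, which is about chance level for eight classes.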
