# <center> Speech & Song Emotion Recognition Using Multilayer Perceptron and Support Vector Machine  </center>

## Behzad Javaheri

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
### This notebook contains all the processing steps. These are:

#### 1. Description of the dataset
#### 2. Importing required libraries
#### 3. Predictor extraction
#### 4. Data exploration and visualisation of extracted features
#### 5. Model construction using SVM and MLP algorithms
#### 6. Data augmentation to address class imbalance. The optimised SVM and MLP models are run on the augmented dataset.
#### 7. Data split by vocal channel into speech and song. The optimised SVM and MLP models are run on the split datasets.
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


### 1. Description of dataset
#### The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) 

The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) was used in this study. Its audio recordings, available in .wav format, cover two vocal channels: speech, recorded by 24 actors (12 male, 12 female), and song, recorded by 23 actors (12 male, 11 female). In addition to a neutral expression, which is present in both channels, the speech channel contains calm, happy, sad, angry, fearful, disgust and surprised emotions, whereas the song channel contains calm, happy, sad, angry and fearful emotions. The speech channel has 1440 files (60 trials per actor x 24 actors) and the song channel 1012 files (44 trials per actor x 23 actors), giving 2452 audio files in total. Each filename consists of seven two-digit identifiers separated by hyphens (e.g. 03-01-06-01-02-01-12.wav). As documented on the official RAVDESS website, the identifiers encode:

* Modality (01 = full-AV, 02 = video-only, 03 = audio-only).
* Vocal channel (01 = speech, 02 = song).
* Emotion (01 = neutral, 02 = calm, 03 = happy, 04 = sad, 05 = angry, 06 = fearful, 07 = disgust, 08 = surprised).
* Emotional intensity (01 = normal, 02 = strong). NOTE: There is no strong intensity for the 'neutral' emotion.
* Statement (01 = "Kids are talking by the door", 02 = "Dogs are sitting by the door").
* Repetition (01 = 1st repetition, 02 = 2nd repetition).
* Actor (01 to 24. Odd numbered actors are male, even numbered actors are female).
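The naming convention above can be decoded with a small helper. The sketch below is illustrative only: `parse_ravdess_filename` and the lookup tables are hypothetical names, not part of the notebook. It splits a filename into its seven identifiers and derives the actor's gender from the parity of the actor number.

```python
import os

# Code-to-label tables copied from the RAVDESS naming convention listed above
MODALITY = {'01': 'full-AV', '02': 'video-only', '03': 'audio-only'}
CHANNEL = {'01': 'speech', '02': 'song'}
EMOTION = {'01': 'neutral', '02': 'calm', '03': 'happy', '04': 'sad',
           '05': 'angry', '06': 'fearful', '07': 'disgust', '08': 'surprised'}
INTENSITY = {'01': 'normal', '02': 'strong'}
STATEMENT = {'01': 'Kids are talking by the door', '02': 'Dogs are sitting by the door'}


def parse_ravdess_filename(path):
    """Hypothetical helper: decode the seven identifiers of a RAVDESS filename."""
    stem = os.path.splitext(os.path.basename(path))[0]
    parts = stem.split('-')
    if len(parts) != 7:
        raise ValueError(f'expected 7 identifiers, got {len(parts)}: {stem}')
    modality, channel, emotion, intensity, statement, repetition, actor = parts
    actor_id = int(actor)
    return {
        'modality': MODALITY[modality],
        'channel': CHANNEL[channel],
        'emotion': EMOTION[emotion],
        'intensity': INTENSITY[intensity],
        'statement': STATEMENT[statement],
        'repetition': int(repetition),
        'actor': actor_id,
        # Odd-numbered actors are male, even-numbered actors are female
        'gender': 'male' if actor_id % 2 == 1 else 'female',
    }


print(parse_ravdess_filename('03-01-06-01-02-01-12.wav'))
```

For this example filename the helper reports an audio-only, speech-channel, fearful, normal-intensity recording of statement 02 by actor 12 (female).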


### 2. Import libraries 

In [None]:
from timeit import Timer

import librosa
import librosa.display
import soundfile

import os, glob, pickle
import numpy as np
import pandas as pd
import utils
import time
from random import shuffle
from scipy import stats

import IPython.display as ipd 
import plotly.io as pio
from matplotlib.pyplot import specgram
import matplotlib.pyplot as plt
import matplotlib as mpl
from matplotlib.colors import Normalize
import seaborn as sns
# to improve matplotlib graphs
from IPython.display import set_matplotlib_formats
set_matplotlib_formats('retina')
pio.renderers.default='notebook'
mpl.rcParams['figure.dpi'] = 300
from mpl_toolkits.mplot3d import Axes3D

from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split, StratifiedKFold, cross_val_predict, cross_val_score, GridSearchCV, RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, precision_score, recall_score, f1_score
from sklearn.metrics import average_precision_score, precision_recall_curve, plot_precision_recall_curve
from sklearn import preprocessing
from sklearn.preprocessing import MinMaxScaler, StandardScaler, OneHotEncoder, LabelEncoder
from sklearn.datasets import make_classification
from imblearn.datasets import make_imbalance
from imblearn.over_sampling import SMOTE
from joblib import dump, load

import yellowbrick as yb
from yellowbrick.classifier import ClassificationReport, PrecisionRecallCurve, ClassPredictionError, ROCAUC, ConfusionMatrix
from yellowbrick.regressor import PredictionError, ResidualsPlot
from yellowbrick.model_selection import LearningCurve, ValidationCurve, FeatureImportances, CVScores
from yellowbrick.style.rcmod import set_aesthetic, set_palette

from pydiogment.auga import fade_in_and_out
from pydiogment.augf import change_tone

import warnings
# Silence all warnings, including those issued by libraries through warnings.warn
warnings.filterwarnings('ignore')
def ignore_warn(*args, **kwargs):
    pass
warnings.warn = ignore_warn

### 3. Predictor extraction

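Here we define a function that extracts the five groups of predictors described below from each audio file.

* mfcc: 40 Mel-frequency cepstral coefficients, which represent the short-term power spectrum of a sound on the mel scale
* chroma: the energy in each of the 12 pitch classes
* mel: the Mel-scaled spectrogram (128 mel bands by default)
* contrast: spectral contrast, the difference between spectral peaks and valleys in each frequency sub-band
* tonnetz: tonal centroid features (6 dimensions) computed from the harmonic component of the signal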
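Each predictor is computed frame by frame and then averaged over time, so every file yields a fixed-length vector regardless of its duration. Assuming the librosa settings used in `extract_feature` (40 MFCCs, 12 chroma bins, 128 mel bands, 6 contrast sub-bands plus one extra row, and 6 tonnetz dimensions), the vector has 40 + 12 + 128 + 7 + 6 = 193 entries. The sketch below does not call librosa; it uses random arrays of shape `(n_rows, n_frames)` as stand-ins for the librosa outputs to show the averaging and concatenation. `FEATURE_ROWS` and `summarise_frames` are hypothetical names.

```python
import numpy as np

# Rows returned per frame by each librosa feature (assumed default settings)
FEATURE_ROWS = {'mfcc': 40, 'chroma': 12, 'mel': 128, 'contrast': 7, 'tonnetz': 6}


def summarise_frames(frame_matrices):
    """Average each (n_rows, n_frames) matrix over time and concatenate the results."""
    result = np.array([])
    for matrix in frame_matrices:
        # Transposing to (n_frames, n_rows) and averaging over axis 0 mirrors extract_feature
        result = np.hstack((result, np.mean(matrix.T, axis=0)))
    return result


rng = np.random.default_rng(0)
for n_frames in (120, 300):  # two clips of different length
    fake = [rng.random((rows, n_frames)) for rows in FEATURE_ROWS.values()]
    print(n_frames, summarise_frames(fake).shape)  # both give (193,)
```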

#### 3.1 The function for predictor extraction

In [None]:
# Source: https://github.com/MasazI/audio_classification/blob/master/features.py
def extract_feature(file_name, mfcc, chroma, mel, tonnetz, contrast):
    """Return a 1-D vector of time-averaged audio features for one file."""
    X, sample_rate = librosa.load(file_name, res_type='kaiser_fast')
    result = np.array([])
    # The magnitude STFT is shared by the chroma and spectral-contrast features
    if chroma or contrast:
        stft = np.abs(librosa.stft(X))
    if mfcc:
        mfccs = np.mean(librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=40).T, axis=0)
        result = np.hstack((result, mfccs))
    if chroma:
        chroma_feat = np.mean(librosa.feature.chroma_stft(S=stft, sr=sample_rate).T, axis=0)
        result = np.hstack((result, chroma_feat))
    if mel:
        # Pass the signal as y= (positional audio arguments are rejected by recent librosa)
        mel_feat = np.mean(librosa.feature.melspectrogram(y=X, sr=sample_rate).T, axis=0)
        result = np.hstack((result, mel_feat))
    if contrast:
        contrast_feat = np.mean(librosa.feature.spectral_contrast(S=stft, sr=sample_rate).T, axis=0)
        result = np.hstack((result, contrast_feat))
    if tonnetz:
        tonnetz_feat = np.mean(librosa.feature.tonnetz(y=librosa.effects.harmonic(X), sr=sample_rate).T, axis=0)
        result = np.hstack((result, tonnetz_feat))
    return result

#### 3.2 Mapping the numerical emotion codes to labels

In [None]:
# Emotions in the RAVDESS dataset
emotions={
  '01':'neutral',
  '02':'calm',
  '03':'happy',
  '04':'sad',
  '05':'angry',
  '06':'fearful',
  '07':'disgust',
  '08':'surprised'
}
# Emotions to observe
observed_emotions = ['neutral','calm','happy', 'sad','angry','fearful','disgust','surprised']


#### 3.3 Defining a function that iterates through all audio files for predictor extraction

In [None]:
## The audio files can be downloaded from:
## Song files: https://zenodo.org/record/1188976/files/Audio_Song_Actors_01-24.zip?download=1
## Speech files: https://zenodo.org/record/1188976/files/Audio_Speech_Actors_01-24.zip?download=1
## The two downloaded folders need to be merged (into ../data/RAV/Actor_*) before predictor extraction

def load_data():
    X = []
    gender = []
    channel = []
    y = []
    # os.path.join keeps the glob pattern portable across operating systems
    for file in glob.glob(os.path.join('..', 'data', 'RAV', 'Actor_*', '*.wav')):
        file_name = os.path.basename(file)
        # Identifiers: modality-channel-emotion-intensity-statement-repetition-actor
        part = file_name.split('.')[0].split('-')
        emotion = emotions[part[2]]

        # Keep only the emotions listed in observed_emotions. Filtering before any
        # append keeps X, y, gender and channel aligned row by row.
        if emotion not in observed_emotions:
            continue

        # Odd-numbered actors are male (1), even-numbered actors are female (0)
        gender.append(1 if int(part[6]) % 2 else 0)
        # Vocal channel code: 1 = speech, 2 = song
        channel.append(int(part[1]))

        # Extract all five groups of predictors
        feature = extract_feature(file,
                                  mfcc=True,
                                  chroma=True,
                                  mel=True,
                                  tonnetz=True,
                                  contrast=True)
        X.append(feature)
        y.append(emotion)

    return {"X": X, "channel": channel, "gender": gender, "y": y}


#### 3.4 Performing predictor extraction using above function

In [None]:
data = load_data()

# Defining different elements extracted
X = pd.DataFrame(data["X"])
channel = pd.DataFrame(data["channel"],columns=['channel'])
gender = pd.DataFrame(data["gender"],columns=['gender']) 
y = pd.DataFrame(data["y"],columns=['emotions'])

### Check that the features and the labels line up:

# number of feature vectors (rows of X)
print("[+] Number of feature vectors:", X.shape[0])
# number of emotion labels
print("[+] Number of emotion labels:", y.shape[0])
# number of vocal-channel entries
print("[+] Number of channel entries:", channel.shape[0])
# number of gender entries
print("[+] Number of gender entries:", gender.shape[0])

# Get the number of features extracted
print(f'Features extracted: {X.shape[1]}')

#### 3.5 Saving the extracted variables into a csv file

In [None]:
data = pd.concat([X, channel, gender, y], axis =1)
data.to_csv("RAV_extracted_features.csv")
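By default, DataFrame.to_csv writes the row index as an unnamed first column. With a plain pd.read_csv call that column comes back as 'Unnamed: 0' and, unless it is dropped or read with index_col=0, it is treated as an extra predictor. The toy frame below (not the RAVDESS features) illustrates both ways of reading the file back.

```python
import io

import pandas as pd

toy = pd.DataFrame({'f0': [0.1, 0.2], 'emotions': ['calm', 'sad']})

buffer = io.StringIO()
toy.to_csv(buffer)             # the index is written as the first, unnamed column
buffer.seek(0)
naive = pd.read_csv(buffer)    # the index comes back as an ordinary data column
print(list(naive.columns))     # ['Unnamed: 0', 'f0', 'emotions']

buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)  # treat the first column as the index
print(list(restored.columns))  # ['f0', 'emotions']
```

Writing the file with to_csv(..., index=False) is the other way to avoid the extra column; only one of the two fixes should be used, otherwise the first feature column would become the index.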

### 4. Data exploration and visualisation of extracted features

In this step, two visualisations are produced: the waveforms of selected audio files and their Mel spectrograms, the latter being one of the extracted predictors.

#### 4.1 Visualisation of waveform

In [None]:
################# Visualisation of waveform for six emotions #####################################
sns.set_context('paper', font_scale=2)
waveform_files = [
    ('Neutral', '03-01-01-01-01-01-01.wav'),
    ('Calm', '03-01-02-01-02-01-01.wav'),
    ('Happy', '03-01-03-01-01-01-01.wav'),
    ('Sad', '03-01-04-01-01-01-01.wav'),
    ('Angry', '03-01-05-01-01-01-01.wav'),
    ('Fearful', '03-01-06-01-01-01-01.wav'),
]
# Create the figure once so that all six panels are drawn on it
plt.figure(figsize=(30, 10))
for i, (title, fname) in enumerate(waveform_files, start=1):
    with soundfile.SoundFile(os.path.join('..', 'data', 'RAV', 'Actor_01', fname)) as audio:
        waveform = audio.read(dtype="float32")
        sample_rate = audio.samplerate
    plt.subplot(3, 2, i)
    # librosa.display.waveshow replaces waveplot, which was removed in librosa 0.10
    librosa.display.waveshow(waveform, sr=sample_rate)
    plt.title(title)
plt.tight_layout()

#### 4.2 Visualisation of Mel Spectrogram

In [None]:
## Adapted from: https://towardsdatascience.com/speech-emotion-recognition-using-ravdess-audio-dataset-ce19d162690

# One speech file per emotion from actor 01 (normal intensity, statement 01, 1st repetition)
mel_files = [
    ('Neutral', '03-01-01-01-01-01-01.wav'),
    ('Calm', '03-01-02-01-01-01-01.wav'),
    ('Happy', '03-01-03-01-01-01-01.wav'),
    ('Sad', '03-01-04-01-01-01-01.wav'),
    ('Angry', '03-01-05-01-01-01-01.wav'),
    ('Fearful', '03-01-06-01-01-01-01.wav'),
    ('Disgust', '03-01-07-01-01-01-01.wav'),
    ('Surprised', '03-01-08-01-01-01-01.wav'),
]

sns.set_context('paper', font_scale=3)
plt.figure(figsize=(30, 10))
for i, (title, fname) in enumerate(mel_files, start=1):
    with soundfile.SoundFile(os.path.join('..', 'data', 'RAV', 'Actor_01', fname)) as audio:
        waveform = audio.read(dtype="float32")
        sample_rate = audio.samplerate
    melspectrogram = librosa.feature.melspectrogram(y=waveform, sr=sample_rate, n_mels=128, fmax=8000)
    plt.subplot(2, 4, i)
    # Pass sr so that the time and mel axes use the file's native sample rate
    librosa.display.specshow(librosa.power_to_db(S=melspectrogram, ref=np.mean),
                             sr=sample_rate, y_axis='mel', fmax=8000, x_axis='time',
                             norm=Normalize(vmin=-20, vmax=20))
    # Add a colour bar only to the right-most panel of each row
    if i % 4 == 0:
        plt.colorbar(format='%+2.0f dB', label='Amplitude')
    plt.xlim(2, 6)
    plt.ylabel('Mels')
    plt.title(title)
plt.tight_layout()
plt.suptitle('Mel spectrogram of audio files in the RAVDESS emotion dataset', fontsize=36, weight="bold")
plt.subplots_adjust(top=0.88)

### 5. Model construction using SVM and MLP algorithms

#### 5.1 Loading dataset and getting description

In [None]:
# Opening the dataset. index_col=0 restores the row index written by to_csv,
# so that it is not read back as an extra predictor column.
data = pd.read_csv('RAV_extracted_features.csv', index_col=0)
## Disgust and surprised emotions are dropped because they are only present in the speech channel.
data = data[~data['emotions'].isin(['disgust', 'surprised'])]
# Description of dataset to check distribution
data.describe()

#### 5.2 Defining predictors and target variables

In [None]:
# Shuffle the rows (train_test_split below also shuffles by default); a fixed seed makes the run reproducible
data = data.sample(frac=1, random_state=42)

# Features and target columns: 'channel' is kept as a predictor, 'gender' is excluded
X = data.drop(['emotions', 'gender'], axis = 1).values
y = data['emotions'].values


#### 5.3 Predictor scaling using two alternative approaches

Examination of the extracted predictors showed that they vary widely in range and distribution, so the data were scaled. First, the predictors and the target variable (emotions) were defined. The train_test_split function from scikit-learn was then used to split them into training (80%) and test (20%) subsets. To identify the most appropriate scaling approach, two scikit-learn scalers were compared. StandardScaler transforms each predictor to zero mean and unit variance (i.e. a standard deviation of 1). MinMaxScaler rescales each predictor linearly to a fixed range, [0, 1] by default. Each scaler should be fitted on the training subset only and then applied to the test subset, so that no information from the test data leaks into training. Finally, the multiclass target (emotions) is categorical and was therefore encoded as integers using LabelEncoder, which assigns codes in alphabetical order of the class names (angry, calm, fearful, happy, neutral, sad).
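Both scalers reduce to per-column formulas computed from the training data: standardisation maps x to (x - mean) / std, and min-max scaling maps x to (x - min) / (max - min). The NumPy sketch below illustrates these formulas (it is not the scikit-learn implementation, and `fit_standard` / `fit_minmax` are hypothetical names). It shows that test values are transformed with the training statistics and can therefore fall outside [0, 1]. The last lines mirror LabelEncoder's behaviour: `np.unique` returns the sorted class names and, with `return_inverse=True`, each label's position in that sorted list.

```python
import numpy as np


def fit_standard(toy_train):
    """Per-column mean and population standard deviation (ddof=0)."""
    mean = toy_train.mean(axis=0)
    std = toy_train.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled instead of dividing by zero
    return mean, std


def fit_minmax(toy_train):
    """Per-column minimum and range for the default [0, 1] target interval."""
    col_min = toy_train.min(axis=0)
    col_range = toy_train.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0
    return col_min, col_range


# Toy arrays (named so they do not overwrite the notebook's X_train / X_test)
toy_train = np.array([[1.0, 10.0], [3.0, 30.0]])
toy_test = np.array([[5.0, 50.0]])

mean, std = fit_standard(toy_train)
print((toy_test - mean) / std)           # [[3. 3.]]

col_min, col_range = fit_minmax(toy_train)
print((toy_test - col_min) / col_range)  # [[2. 2.]], outside [0, 1] as expected

# LabelEncoder-style encoding: classes sorted alphabetically, codes are their positions
classes, codes = np.unique(['sad', 'angry', 'calm', 'sad'], return_inverse=True)
print(classes, codes)                    # ['angry' 'calm' 'sad'] [2 0 1 2]
```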

In [None]:
# Split the dataset into train and test
X_train, X_test, y_train, y_test= train_test_split(X, y, random_state=42, test_size=0.20)

In [None]:
## Encoding the multiclass categorical target. The encoder is fitted on the training
## labels only and reused for the test labels, so both share one mapping
## (LabelEncoder assigns codes in alphabetical order of the class names).
label_encoder = LabelEncoder()
y_train = label_encoder.fit_transform(y_train)
y_test = label_encoder.transform(y_test)

## Min-max scaling: fitted on the training set, then applied unchanged to the test set
scaler1 = MinMaxScaler()
X_train_minmax = scaler1.fit_transform(X_train)
X_test_minmax = scaler1.transform(X_test)

## Standard scaling: fitted on the training set, then applied unchanged to the test set
scaler2 = StandardScaler()
X_train_scaled = scaler2.fit_transform(X_train)
X_test_scaled = scaler2.transform(X_test)

# Wrap the scaled training sets in DataFrames for inspection
X_train_minmax_df = pd.DataFrame(X_train_minmax)
X_train_scaled_df = pd.DataFrame(X_train_scaled)


# print some details
# number of samples in training data
print("[+] Number of unscaled training samples:", X_train.shape[0])
print("[+] Number of scaled training samples:", X_train_scaled.shape[0])
print("[+] Number of minmax scaled training samples:", X_train_minmax.shape[0])

# number of samples in testing data
print("[+] Number of unscaled testing samples:", X_test.shape[0])
print("[+] Number of scaled testing samples:", X_test_scaled.shape[0])
print("[+] Number of minmax scaled testing samples:", X_test_minmax.shape[0])

# Get the number of features extracted
print(f'Features extracted for unscaled: {X_train.shape[1]}')
print(f'Features extracted for scaled: {X_train_scaled.shape[1]}')
print(f'Features extracted for minmax scaled: {X_train_minmax.shape[1]}')


#### 5.5 The SVM algorithm for emotion classification

#### 5.5.1 SVM model using unscaled data and default parameters

In [None]:
model1 = SVC()

model1.fit(X_train, y_train)
model1.score(X_test, y_test)

print(f'SVC Model\'s accuracy on training set is {100*model1.score(X_train, y_train):.2f}%')
print(f'SVC Model\'s accuracy on test set is {100*model1.score(X_test, y_test):.2f}%')

f, axes = plt.subplots(2, 2,figsize=(25, 12))
set_aesthetic(palette='paired', font='Arial', font_scale=2, color_codes=True, rc=None)
# LabelEncoder assigns codes alphabetically, so the class names must be listed in that order
classes = ['angry', 'calm', 'fearful', 'happy', 'neutral', 'sad']
visualgrid = [
    ClassificationReport(model1, support=True, cmap='GnBu', classes=classes, ax=axes[0][0]),
    ClassPredictionError(model1, classes=classes, ax=axes[0][1]),
    PrecisionRecallCurve(model1, per_class=True, classes=classes, ax=axes[1][0]),
    ROCAUC(model1, classes=classes, ax=axes[1][1])
]

for viz in visualgrid:
    viz.fit(X_train, y_train)
    viz.score(X_test, y_test)
    viz.finalize()
f.suptitle('Performance of SVC classifier with default parameters using non-scaled dataset', fontsize=32, weight="bold");
plt.subplots_adjust(top=0.91)
viz.show(outpath="model1.svg")

#### 5.5.2 SVM model using min-max scaled data and default parameters

In [None]:
model2 = SVC()

model2.fit(X_train_minmax, y_train)
model2.score(X_test_minmax, y_test)

print(f'SVC Model\'s accuracy on training set is {100*model2.score(X_train_minmax, y_train):.2f}%')
print(f'SVC Model\'s accuracy on test set is {100*model2.score(X_test_minmax, y_test):.2f}%')

f, axes = plt.subplots(2, 2,figsize=(25, 12))
set_aesthetic(palette='paired', font='Arial', font_scale=2, color_codes=True, rc=None)
# LabelEncoder assigns codes alphabetically, so the class names must be listed in that order
classes = ['angry', 'calm', 'fearful', 'happy', 'neutral', 'sad']
visualgrid = [
    ClassificationReport(model2, support=True, cmap='GnBu', classes=classes, ax=axes[0][0]),
    ClassPredictionError(model2, classes=classes, ax=axes[0][1]),
    PrecisionRecallCurve(model2, per_class=True, classes=classes, ax=axes[1][0]),
    ROCAUC(model2, classes=classes, ax=axes[1][1])
]

for viz in visualgrid:
    viz.fit(X_train_minmax, y_train)
    viz.score(X_test_minmax, y_test)
    viz.finalize()
f.suptitle('Performance of SVM classifier with default parameters on min-max scaled dataset', fontsize=32, weight="bold");
plt.subplots_adjust(top=0.91)
viz.show(outpath="model2.svg")

#### 5.5.3 SVM model using standard scaled data and default parameters

In [None]:
model3 = SVC()

model3.fit(X_train_scaled, y_train)
model3.score(X_test_scaled, y_test)

print(f'SVC Model\'s accuracy on training set is {100*model3.score(X_train_scaled, y_train):.2f}%')
print(f'SVC Model\'s accuracy on test set is {100*model3.score(X_test_scaled, y_test):.2f}%')

f, axes = plt.subplots(2, 2,figsize=(25, 12))
set_aesthetic(palette='paired', font='Arial', font_scale=2, color_codes=True, rc=None)
# LabelEncoder assigns codes alphabetically, so the class names must be listed in that order
classes = ['angry', 'calm', 'fearful', 'happy', 'neutral', 'sad']
visualgrid = [
    ClassificationReport(model3, support=True, cmap='GnBu', classes=classes, ax=axes[0][0]),
    ClassPredictionError(model3, classes=classes, ax=axes[0][1]),
    PrecisionRecallCurve(model3, per_class=True, classes=classes, ax=axes[1][0]),
    ROCAUC(model3, classes=classes, ax=axes[1][1])
]

for viz in visualgrid:
    viz.fit(X_train_scaled, y_train)
    viz.score(X_test_scaled, y_test)
    viz.finalize()
f.suptitle('Performance of SVM classifier with default parameters on standard scaled dataset', fontsize=32, weight="bold");
plt.subplots_adjust(top=0.91)
viz.show(outpath="model3.svg")

#### 5.5.4 Randomised search to find optimal SVM parameters

In [None]:
############################################################################# Do not run ############################################################################
# RANDOMISED SEARCH OVER 20 SAMPLED PARAMETER COMBINATIONS

tic = time.perf_counter()
k = StratifiedKFold(n_splits=3, shuffle=False)
# stats.uniform(loc, scale) samples from [loc, loc + scale]: C in [2, 52], gamma in [0.01, 1.01]
param_grid1 = {'kernel': ['rbf', 'poly', 'linear'],
            "C": stats.uniform(2, 50),
            "gamma": stats.uniform(0.01, 1)}
              
grid1 = RandomizedSearchCV(SVC(), param_distributions = param_grid1, n_iter = 20, n_jobs = -1, cv = k, scoring = "accuracy") 
grid1.fit(X_train_scaled, y_train)

#print(grid1.cv_results_)
print(grid1.best_params_)
print(grid1.score(X_test_scaled, y_test))
print()

toc = time.perf_counter()
print(f"Time to run the RandomizedSearchCV for SVC is {toc - tic:0.4f} seconds")
print()

dump(grid1.best_estimator_, 'SVM_best_estimator_grid1.joblib', compress = 1) # Saving the best estimators
dump(grid1.cv_results_, 'SVM_CV_result_grid1.pkl')                           # Saving the CV results
dump(grid1, 'SVM_whole_object_grid1.pkl')                                    # Saving the whole object

#### Visualisation of the search performance
df_grid1 = load('SVM_CV_result_grid1.pkl')
df_grid1 = pd.DataFrame(df_grid1)
# Create a 3d scatter plot of the data
fig = plt.figure(figsize=(8, 6))
ax = plt.axes(projection="3d")
x_points = df_grid1['param_gamma']
y_points = df_grid1['mean_score_time']
z_points = df_grid1['mean_test_score']
sctt = ax.scatter3D(x_points, y_points, z_points, c=z_points, cmap='jet')
ax.set_xlabel('Gamma')
ax.set_ylabel('Mean score time')
ax.set_zlabel('Score')
fig.colorbar(sctt, shrink=0.5)
ax.view_init(elev=15, azim=30)
ax.text2D(0, 1, "3D visualisation of SVC Grid1 randomized SearchCV performance", transform=ax.transAxes)
fig.savefig("grid1_3d.svg")
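The search above is randomised rather than exhaustive: RandomizedSearchCV draws n_iter candidate settings from the given lists and distributions, scores each one by cross-validation, and keeps the best. The pure-Python sketch below illustrates that idea only. `random_search` and `toy_score` are hypothetical names, and `toy_score` is a made-up objective standing in for cross-validated accuracy.

```python
import random


def random_search(space, score_fn, n_iter, seed=0):
    """Sample n_iter candidates from `space` and return the best (params, score).

    Each entry of `space` is either a list (one element is chosen uniformly) or a
    callable that takes a random.Random instance and draws a continuous value.
    """
    rnd = random.Random(seed)
    best_params, best_score = None, float('-inf')
    for _ in range(n_iter):
        params = {
            name: rnd.choice(values) if isinstance(values, list) else values(rnd)
            for name, values in space.items()
        }
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


space = {
    'kernel': ['rbf', 'poly', 'linear'],
    'C': lambda r: r.uniform(2, 52),           # same range as stats.uniform(2, 50)
    'gamma': lambda r: r.uniform(0.01, 1.01),  # same range as stats.uniform(0.01, 1)
}


def toy_score(params):
    """Made-up objective standing in for cross-validated accuracy."""
    bonus = 0.1 if params['kernel'] == 'rbf' else 0.0
    return bonus - abs(params['C'] - 40) / 100 - abs(params['gamma'] - 0.03)


best, best_value = random_search(space, toy_score, n_iter=20)
print(best, round(best_value, 3))
```

Only 20 of the infinitely many possible settings are evaluated, so a larger n_iter can only find an equal or better candidate for the same seed, at a proportionally higher cost.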

#### 5.5.5 SVM with the optimal hyperparameters suggested by the randomised search

In [None]:
model4 = SVC(
    C = 41.80717878639727,
    gamma = 0.03079471598867322,
    kernel = 'rbf',
    random_state = 69
)

model4.fit(X_train_scaled, y_train)

print(f'SVC Model\'s accuracy on training set is {100*model4.score(X_train_scaled, y_train):.2f}%')
print(f'SVC Model\'s accuracy on test set is {100*model4.score(X_test_scaled, y_test):.2f}%')

f, axes = plt.subplots(3, 2,figsize=(25, 12))
set_aesthetic(palette='paired', font='Arial', font_scale=2, color_codes=True, rc=None)
# LabelEncoder assigns codes alphabetically, so the class names must be listed in that order
classes = ['angry', 'calm', 'fearful', 'happy', 'neutral', 'sad']
visualgrid = [
    CVScores(model4, cv=3, scoring='f1_weighted', classes=classes, ax=axes[0][0]),
    LearningCurve(model4, cv=3, scoring='f1_weighted', train_sizes=np.linspace(.1, 1.0, 5), n_jobs=4, classes=classes, ax=axes[0][1]),
    ClassificationReport(model4, support=True, cmap='GnBu', classes=classes, ax=axes[1][0]),
    ClassPredictionError(model4, classes=classes, ax=axes[1][1]),
    PrecisionRecallCurve(model4, per_class=True, classes=classes, ax=axes[2][0]),
    ROCAUC(model4, classes=classes, ax=axes[2][1])
]

for viz in visualgrid:
    viz.fit(X_train_scaled, y_train)
    viz.score(X_test_scaled, y_test)
    viz.finalize()
f.suptitle('Performance of SVM classifier with optimal parameters on standard scaled dataset', fontsize=32, weight="bold");
plt.subplots_adjust(top=0.91)
viz.show(outpath="model4.svg")

#### 5.5.6 Confusion matrix of models

In [None]:
sns.set_context('paper', font_scale=2)
set_aesthetic(palette='paired', font='Arial', font_scale=2, color_codes=True, rc=None)
# LabelEncoder assigns codes alphabetically, so the class names must be listed in that order
classes = ['angry', 'calm', 'fearful', 'happy', 'neutral', 'sad']
f, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2,figsize=(25, 12))
viz = ConfusionMatrix(model1, classes=classes, support=True, cmap='GnBu', title="SVM classification on unscaled data", ax=ax1)
viz.fit(X_train, y_train)
viz.score(X_test, y_test)
viz.finalize()
viz = ConfusionMatrix(model2, classes=classes, support=True, cmap='GnBu', title="SVM classification on min-max data", ax=ax2)
viz.fit(X_train_minmax, y_train)
viz.score(X_test_minmax, y_test)
viz.finalize()
viz = ConfusionMatrix(model3, classes=classes, support=True, cmap='GnBu', title="SVM classification on standard scaled data", ax=ax3)
viz.fit(X_train_scaled, y_train)
viz.score(X_test_scaled, y_test)
viz.finalize()
viz = ConfusionMatrix(model4, classes=classes, support=True, cmap='GnBu', title="Optimised SVM classification on standard scaled data", ax=ax4)
viz.fit(X_train_scaled, y_train)
viz.score(X_test_scaled, y_test)
viz.finalize()
f.suptitle('Confusion matrices of four SVM models: unscaled, min-max scaled, standard scaled, and optimised on standard scaled data', fontsize=32, weight="bold");
plt.subplots_adjust(top=0.91)
plt.tight_layout(rect=[0, 0.03, 1, 0.95])
f.savefig("Confusion_matrix1_SVC.svg")
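Each confusion matrix counts, for every true class (row), how many test samples were predicted as each class (column), following the row/column convention of scikit-learn's confusion_matrix; the diagonal therefore holds the correct predictions. A minimal NumPy sketch of that tally for integer-encoded labels, using toy labels rather than the RAVDESS results (`confusion_counts` is a hypothetical helper):

```python
import numpy as np


def confusion_counts(y_true, y_pred, n_classes):
    """Rows index the true class, columns the predicted class."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly even when the same (true, pred) pair repeats
    np.add.at(matrix, (y_true, y_pred), 1)
    return matrix


y_true_toy = np.array([0, 0, 1, 2, 2, 2])
y_pred_toy = np.array([0, 1, 1, 2, 0, 2])
cm = confusion_counts(y_true_toy, y_pred_toy, n_classes=3)
print(cm)
# Accuracy is the sum of the diagonal divided by the total number of samples
print(np.trace(cm) / cm.sum())
```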

#### 5.6 The MLP model for emotion classification

#### 5.6.1 MLP model using non-scaled data and default parameters

In [None]:
#MLP classifier with default parameter
model5 = MLPClassifier()

model5.fit(X_train, y_train)
pred_y = model5.predict(X_test)

print(f'MLP Model\'s accuracy on training set is {100*model5.score(X_train, y_train):.2f}%')
print(f'MLP Model\'s accuracy on test set is {100*model5.score(X_test, y_test):.2f}%')

f, axes = plt.subplots(2, 2,figsize=(25, 12))
set_aesthetic(palette='paired', font='Arial', font_scale=2, color_codes=True, rc=None)
# LabelEncoder assigns codes alphabetically, so the class names must be listed in that order
classes = ['angry', 'calm', 'fearful', 'happy', 'neutral', 'sad']
visualgrid = [
    ClassificationReport(model5, support=True, cmap='GnBu', classes=classes, ax=axes[0][0]),
    ClassPredictionError(model5, classes=classes, ax=axes[0][1]),
    PrecisionRecallCurve(model5, per_class=True, classes=classes, ax=axes[1][0]),
    ROCAUC(model5, classes=classes, ax=axes[1][1])
]

for viz in visualgrid:
    viz.fit(X_train, y_train)
    viz.score(X_test, y_test)
    viz.finalize()
f.suptitle('Performance of MLP classifier with default parameters on non-scaled dataset', fontsize=32, weight="bold");
plt.subplots_adjust(top=0.91)
viz.show(outpath="model5.svg")

#### 5.6.2 MLP model using min-max scaled data and default parameters

In [None]:
#MLP classifier with default parameter
model6 = MLPClassifier()

model6.fit(X_train_minmax, y_train)
pred_y = model6.predict(X_test_minmax)

print(f'MLP Model\'s accuracy on training set is {100*model6.score(X_train_minmax, y_train):.2f}%')
print(f'MLP Model\'s accuracy on test set is {100*model6.score(X_test_minmax, y_test):.2f}%')

f, axes = plt.subplots(2, 2,figsize=(25, 12))
set_aesthetic(palette='paired', font='Arial', font_scale=2, color_codes=True, rc=None)
# LabelEncoder assigns codes alphabetically, so the class names must be listed in that order
classes = ['angry', 'calm', 'fearful', 'happy', 'neutral', 'sad']
visualgrid = [
    ClassificationReport(model6, support=True, cmap='GnBu', classes=classes, ax=axes[0][0]),
    ClassPredictionError(model6, classes=classes, ax=axes[0][1]),
    PrecisionRecallCurve(model6, per_class=True, classes=classes, ax=axes[1][0]),
    ROCAUC(model6, classes=classes, ax=axes[1][1])
]

for viz in visualgrid:
    viz.fit(X_train_minmax, y_train)
    viz.score(X_test_minmax, y_test)
    viz.finalize()
f.suptitle('Performance of MLP classifier with default parameters on min-max scaled dataset', fontsize=32, weight="bold");
plt.subplots_adjust(top=0.91)
viz.show(outpath="model6.svg")

#### 5.6.3 MLP model using standard scaled data and default parameters

In [None]:
#MLP classifier with default parameter
model7 = MLPClassifier()

model7.fit(X_train_scaled, y_train)
pred_y = model7.predict(X_test_scaled)

print(f'MLP Model\'s accuracy on training set is {100*model7.score(X_train_scaled, y_train):.2f}%')
print(f'MLP Model\'s accuracy on test set is {100*model7.score(X_test_scaled, y_test):.2f}%')

f, axes = plt.subplots(2, 2,figsize=(25, 12))
set_aesthetic(palette='paired', font='Arial', font_scale=2, color_codes=True, rc=None)
# LabelEncoder assigns codes alphabetically, so the class names must be listed in that order
classes = ['angry', 'calm', 'fearful', 'happy', 'neutral', 'sad']
visualgrid = [
    ClassificationReport(model7, support=True, cmap='GnBu', classes=classes, ax=axes[0][0]),
    ClassPredictionError(model7, classes=classes, ax=axes[0][1]),
    PrecisionRecallCurve(model7, per_class=True, classes=classes, ax=axes[1][0]),
    ROCAUC(model7, classes=classes, ax=axes[1][1])
]

for viz in visualgrid:
    viz.fit(X_train_scaled, y_train)
    viz.score(X_test_scaled, y_test)
    viz.finalize()
f.suptitle('Performance of MLP classifier with default parameters on standard scaled dataset', fontsize=32, weight="bold");
plt.subplots_adjust(top=0.91)
viz.show(outpath="model7.svg")

#### 5.6.4 Randomized search to find optimal parameters for MLP

In [None]:
################################################################################ Do not run ###################################################################
# Initialize the MLP Classifier and choose parameters we want to keep constant
tic = time.perf_counter()

k = StratifiedKFold(n_splits=3, shuffle=False)
mlp = MLPClassifier(
    # tune batch size later 
    batch_size=256,  
    # keep random state constant to accurately compare subsequent models
    random_state=42
)

# Define the hyperparameter space from which RandomizedSearchCV samples candidate models
param_grid2 = {
    'hidden_layer_sizes': [(8,), (180,), (300,),(100,50,),(10,10,10)], 
    'activation': ['tanh','relu', 'logistic'],
    'solver': ['sgd', 'adam'],
    'alpha': [0.0001, 0.001, 0.01],
    'epsilon': [1e-08, 0.1],
    'learning_rate': ['adaptive', 'constant'],
}
              
grid2 = RandomizedSearchCV(mlp, param_distributions = param_grid2, n_iter = 20, n_jobs = -1, cv = k, scoring = "accuracy") 
grid2.fit(X_train_scaled, y_train)

#print(grid2.cv_results_)
print(grid2.best_params_)
print(grid2.score(X_test_scaled, y_test))
print()

toc = time.perf_counter()
print(f"Time to run the RandomizedSearchCV for MLP is {toc - tic:0.4f} seconds")
print()

dump(grid2.best_estimator_, 'MLP_best_estimator_grid2.joblib', compress = 1) # Saving the best estimator
dump(grid2.cv_results_, 'MLP_CV_result_grid2.pkl')                           # Saving the cross-validation results
dump(grid2, 'MLP_whole_object_grid2.pkl')                                    # Saving the whole search object

### Visualisation of search performance
df_grid2 = load('MLP_CV_result_grid2.pkl')
df_grid2 = pd.DataFrame(df_grid2)
# Create a 3d scatter plot of the data
fig = plt.figure(figsize=(8, 6))
ax = plt.axes(projection="3d")
x_points = df_grid2['param_alpha']
y_points = df_grid2['mean_score_time']
z_points = df_grid2['mean_test_score']
colour = df_grid2['rank_test_score']
sctt = ax.scatter3D(x_points, y_points, z_points, c=z_points, cmap='jet');
ax.set_xlabel('Alpha')
ax.set_ylabel('Mean score time')
ax.set_zlabel('Score')
fig.colorbar(sctt, shrink=0.5)
ax.view_init(elev=15, azim=30)
ax.text2D(0, 1, "3D visualisation of MLP Grid2 randomized SearchCV performance", transform=ax.transAxes)
fig.savefig("grid2_3d.svg")
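The space in `param_grid2` contains every combination of the listed values. `RandomizedSearchCV` with `n_iter=20` evaluates only 20 sampled candidates, and each candidate is fitted once per fold of the 3-fold `StratifiedKFold`. When every entry is a list, scikit-learn samples the candidates without replacement. The stdlib sketch below works through that arithmetic and draws 20 distinct candidates in a similar way. It illustrates the idea only and is not scikit-learn's own sampler.

```python
import itertools
import math
import random

# Same discrete search space as param_grid2
param_grid2 = {
    'hidden_layer_sizes': [(8,), (180,), (300,), (100, 50), (10, 10, 10)],
    'activation': ['tanh', 'relu', 'logistic'],
    'solver': ['sgd', 'adam'],
    'alpha': [0.0001, 0.001, 0.01],
    'epsilon': [1e-08, 0.1],
    'learning_rate': ['adaptive', 'constant'],
}

n_combinations = math.prod(len(v) for v in param_grid2.values())
n_iter, n_splits = 20, 3
print(n_combinations)              # 360 settings in the full grid
print(n_iter * n_splits)           # 60 fits for the randomized search (plus one refit of the best candidate)
print(n_combinations * n_splits)   # 1080 fits for an exhaustive grid search

# Draw 20 distinct candidates (sampling without replacement from the full grid)
keys = list(param_grid2)
all_candidates = [dict(zip(keys, combo)) for combo in itertools.product(*param_grid2.values())]
rng = random.Random(42)
sampled = rng.sample(all_candidates, n_iter)
print(sampled[0])
```

The randomized search therefore covers about 5.6% of the grid (20 of 360 settings). Any "optimal" parameters it reports are the best of that sample, not necessarily the best in the whole grid.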

#### 5.6.5 MLP with suggested optimal hyperparameters from the randomized search

In [None]:
# MLP classifier with the best parameters from the randomized search, trained on the standard scaled data

model8 = MLPClassifier(
    activation='relu', 
    solver='adam', 
    alpha=0.001, 
    beta_1=0.9,
    beta_2=0.999,
    batch_size=256, 
    epsilon=1e-08, 
    hidden_layer_sizes=(100, 50), 
    learning_rate='constant',
    max_iter=1000,
    early_stopping=False, # without early stopping
    shuffle = False
)

model8.fit(X_train_scaled, y_train)

print(f'MLP Model\'s accuracy on training set is {100*model8.score(X_train_scaled, y_train):.2f}%')
print(f'MLP Model\'s accuracy on test set is {100*model8.score(X_test_scaled, y_test):.2f}%')

f, axes = plt.subplots(3, 2,figsize=(25, 12))
set_aesthetic(palette='paired', font='Arial', font_scale=2, color_codes=True, rc=None)
visualgrid = [
    CVScores(model8, cv=3, scoring='f1_weighted', classes=['neutral','calm','happy', 'sad','angry','fearful'], ax=axes[0][0]),
    LearningCurve(model8, cv=3, scoring='f1_weighted', train_sizes=np.linspace(.1, 1.0, 5), n_jobs=4, classes=['neutral','calm','happy', 'sad','angry','fearful'], ax=axes[0][1]),
    ClassificationReport(model8, support=True, cmap='GnBu', classes=['neutral','calm','happy', 'sad','angry','fearful'], ax=axes[1][0]),  
    ClassPredictionError(model8, classes=['neutral','calm','happy', 'sad','angry','fearful'], ax=axes[1][1]),
    PrecisionRecallCurve(model8, per_class=True, classes=['neutral','calm','happy', 'sad','angry','fearful'], ax=axes[2][0]),
    ROCAUC(model8, classes=['neutral','calm','happy', 'sad','angry','fearful'], ax=axes[2][1])
]

for viz in visualgrid:
    viz.fit(X_train_scaled, y_train)
    viz.score(X_test_scaled, y_test)
    viz.finalize()
f.suptitle('Performance of MLP classifier with optimal parameters on standard scaled dataset', fontsize=32, weight="bold");
plt.subplots_adjust(top=0.91)
viz.show(outpath="model8.svg")
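`hidden_layer_sizes=(100, 50)` gives a fully connected network with two hidden layers. Each layer contributes `fan_in * fan_out` weights plus `fan_out` biases. The model size therefore depends on the number of extracted features and on the six output units, one per emotion. The sketch below counts trainable parameters. The feature count of 180 is a hypothetical value; substitute `X_train_scaled.shape[1]` for the real one.

```python
def mlp_parameter_count(n_features, hidden_layer_sizes, n_classes):
    """Count the weights and biases of a fully connected MLP."""
    layer_sizes = [n_features, *hidden_layer_sizes, n_classes]
    return sum(fan_in * fan_out + fan_out
               for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]))

# 180 input features is a hypothetical value; 6 output units for the six emotions
print(mlp_parameter_count(180, (100, 50), 6))  # 23456
print(mlp_parameter_count(180, (300,), 6))     # 56106
```

For a fitted model, the same total can be read with `sum(w.size for w in model8.coefs_) + sum(b.size for b in model8.intercepts_)`.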

#### 5.6.6 MLP with suggested optimal hyperparameters and early stopping to prevent overfitting

In [None]:
# MLP classifier with the best parameters from the randomized search and early stopping, trained on the standard scaled data

model9 = MLPClassifier(
    activation='relu', 
    solver='adam', 
    alpha=0.001, 
    beta_1=0.9,
    beta_2=0.999,
    batch_size=256, 
    epsilon=1e-08, 
    hidden_layer_sizes=(100, 50), 
    learning_rate='constant',
    max_iter=1000,
    early_stopping=True, # hold out 10% of the training data as a validation set and stop when its score stops improving
    shuffle = False
)

model9.fit(X_train_scaled, y_train)

print(f'MLP Model\'s accuracy on training set is {100*model9.score(X_train_scaled, y_train):.2f}%')
print(f'MLP Model\'s accuracy on test set is {100*model9.score(X_test_scaled, y_test):.2f}%')


f, axes = plt.subplots(3, 2,figsize=(25, 12))
set_aesthetic(palette='paired', font='Arial', font_scale=2, color_codes=True, rc=None)
visualgrid = [
    CVScores(model9, cv=3, scoring='f1_weighted', classes=['neutral','calm','happy', 'sad','angry','fearful'], ax=axes[0][0]),
    LearningCurve(model9, cv=3, scoring='f1_weighted', train_sizes=np.linspace(.1, 1.0, 5), n_jobs=4, classes=['neutral','calm','happy', 'sad','angry','fearful'], ax=axes[0][1]),
    ClassificationReport(model9, support=True, cmap='GnBu', classes=['neutral','calm','happy', 'sad','angry','fearful'], ax=axes[1][0]),  
    ClassPredictionError(model9, classes=['neutral','calm','happy', 'sad','angry','fearful'], ax=axes[1][1]),
    PrecisionRecallCurve(model9, per_class=True, classes=['neutral','calm','happy', 'sad','angry','fearful'], ax=axes[2][0]),
    ROCAUC(model9, classes=['neutral','calm','happy', 'sad','angry','fearful'], ax=axes[2][1])
]

for viz in visualgrid:
    viz.fit(X_train_scaled, y_train)
    viz.score(X_test_scaled, y_test)
    viz.finalize()
f.suptitle('Performance of optimised MLP classifier with early stopping to avoid overfitting', fontsize=32, weight="bold");
plt.subplots_adjust(top=0.91)
viz.show(outpath="model9.svg")
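With `early_stopping=True`, `MLPClassifier` sets aside `validation_fraction` of the training data (10% by default). Training stops once the validation score has failed to improve by at least `tol` (1e-4 by default) for more than `n_iter_no_change` consecutive epochs (10 by default). The sketch below applies that rule to a hypothetical list of per-epoch validation scores. It is a simplified illustration, not scikit-learn's code.

```python
def early_stopping_epoch(val_scores, patience=10, tol=1e-4):
    """Return (stop_epoch, best_epoch) for a sequence of per-epoch validation scores.

    Training stops once the score has failed to beat the best score so far by
    at least `tol` for more than `patience` consecutive epochs (simplified rule).
    """
    best_score = float("-inf")
    best_epoch = -1
    no_improvement = 0
    for epoch, score in enumerate(val_scores):
        if score < best_score + tol:
            no_improvement += 1
        else:
            no_improvement = 0
        if score > best_score:
            best_score, best_epoch = score, epoch
        if no_improvement > patience:
            return epoch, best_epoch
    # Never triggered: training ran through every epoch
    return len(val_scores) - 1, best_epoch

# Hypothetical validation accuracies: improvement for three epochs, then a plateau
scores = [0.5, 0.6, 0.7] + [0.7] * 20
print(early_stopping_epoch(scores))  # (13, 2)
```

Because the validation split is taken from the training data, early stopping trades a little training data for protection against the overfitting seen in the model without early stopping.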

#### 5.6.7 Confusion matrix

In [None]:
from yellowbrick.style import set_palette
set_palette('paired')
f, ((ax1, ax2, ax3), (ax4, ax5, ax6)) = plt.subplots(2, 3, figsize=(25, 12))
ax6.axis('off')  # only five models are plotted, so hide the unused sixth panel
viz = ConfusionMatrix(model5, support=True, cmap='GnBu', title="MLP classification on unscaled data", ax=ax1)
viz.fit(X_train, y_train)
viz.score(X_test, y_test)
viz.finalize()
viz = ConfusionMatrix(model6, support=True, cmap='GnBu', title="MLP classification on min-max data", ax=ax2)
viz.fit(X_train_minmax, y_train)
viz.score(X_test_minmax, y_test)
viz.finalize()
viz = ConfusionMatrix(model7, support=True, cmap='GnBu', title="MLP classification on standard scaled data", ax=ax3)
viz.fit(X_train_scaled, y_train)
viz.score(X_test_scaled, y_test)
viz.finalize()
viz = ConfusionMatrix(model8, support=True, cmap='GnBu', title="Optimised MLP classification on standard scaled data", ax=ax4)
viz.fit(X_train_scaled, y_train)
viz.score(X_test_scaled, y_test)
viz.finalize()
viz = ConfusionMatrix(model9, support=True, cmap='GnBu', title="Optimised MLP classification on standard scaled data with early stopping", ax=ax5)
viz.fit(X_train_scaled, y_train)
viz.score(X_test_scaled, y_test)
viz.finalize()
f.suptitle('Confusion matrix of 5 MLP models on unscaled, min-max, standard scaled and optimised standard scaled with/without early stopping', fontsize=32, weight="bold");
plt.subplots_adjust(top=0.91)
plt.tight_layout(rect=[0, 0.03, 1, 0.95])
f.savefig("Confusion_matrix2_MLP.svg")
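Each yellowbrick `ConfusionMatrix` panel counts test samples by true class (row) and predicted class (column). The diagonal holds the correct predictions, and dividing each diagonal entry by its row total gives that class's recall. The stdlib sketch below builds the matrix from hypothetical encoded labels.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

def per_class_recall(matrix):
    """Diagonal count divided by the row total (0.0 for an empty row)."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]

# Hypothetical encoded labels for three classes
y_true = [0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 2, 0]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
print(cm)                    # [[1, 1, 0], [0, 2, 1], [1, 0, 2]]
print(per_class_recall(cm))  # [0.5, 0.666..., 0.666...]
```

Off-diagonal cells show which emotions are confused with each other. Accuracy alone does not reveal that.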

#### 5.6.8 Comparison of best (optimised) SVM and MLP models (models 4 and 8) on standard scaled dataset

In [None]:
sns.set_context('paper', font_scale=2)
f, ((ax1, ax2, ax3,), (ax4, ax5, ax6)) = plt.subplots(2, 3, figsize=(25, 12))
set_palette('paired')
viz = ClassificationReport(model4, support=True, cmap='GnBu', classes=['neutral','calm','happy', 'sad','angry','fearful'], title="SVM classification report using standard scaled data", ax=ax1)
viz.fit(X_train_scaled, y_train)
viz.score(X_test_scaled, y_test)
viz.finalize()
viz = ClassPredictionError(model4, classes=['neutral','calm','happy', 'sad','angry','fearful'], title="SVM prediction error using standard scaled data", ax=ax2)
viz.fit(X_train_scaled, y_train)
viz.score(X_test_scaled, y_test)
viz.finalize()
viz = ConfusionMatrix(model4, support=True, cmap='GnBu', classes=['neutral','calm','happy', 'sad','angry','fearful'], title="SVM confusion matrix using standard scaled data", ax=ax3)
viz.fit(X_train_scaled, y_train)
viz.score(X_test_scaled, y_test)
viz.finalize()
viz = ClassificationReport(model8, support=True, cmap='GnBu', classes=['neutral','calm','happy', 'sad','angry','fearful'], title="MLP classification report using standard scaled data", ax=ax4)
viz.fit(X_train_scaled, y_train)
viz.score(X_test_scaled, y_test)
viz.finalize()
viz = ClassPredictionError(model8, classes=['neutral','calm','happy', 'sad','angry','fearful'], title="MLP prediction error using standard scaled data", ax=ax5)
viz.fit(X_train_scaled, y_train)
viz.score(X_test_scaled, y_test)
viz.finalize()
viz = ConfusionMatrix(model8, support=True, cmap='GnBu', classes=['neutral','calm','happy', 'sad','angry','fearful'], title="MLP confusion matrix using standard scaled data", ax=ax6)
viz.fit(X_train_scaled, y_train)
viz.score(X_test_scaled, y_test)
viz.finalize()
f.suptitle('Comparative performance of optimised SVM and MLP classification using standard scaled data', fontsize=32, weight="bold");
plt.subplots_adjust(top=0.91)
plt.tight_layout(rect=[0, 0.03, 1, 0.95])
f.savefig("Comparative_performance1.svg")

### 6. Data augmentation and addressing data imbalance using SMOTE

#### 6.1 Audio file augmentation and extraction of predictors from these audio files

In [None]:
### Data augmentation using the pydiogment library: two additional sets of audio files are created from the original .wav files,
# one by applying a fade in and out and one by changing the tone
for file in glob.glob('..\\data\\RAV\\Actor_*\\*.wav'):
    fade_in_and_out(file)
    change_tone(file, 1.1)

### Iterating through the augmented dataset for predictor extraction
augmented_data = load_data()

X_augmented = pd.DataFrame(augmented_data["X"])
channel_augmented = pd.DataFrame(augmented_data["channel"],columns=['channel'])
gender_augmented = pd.DataFrame(augmented_data["gender"],columns=['gender']) 
y_augmented = pd.DataFrame(augmented_data["y"],columns=['emotions'])

# print some details
# number of samples (recordings) in the augmented dataset, before any train/test split
print("[+] Number of samples:", X_augmented.shape[0])
# number of emotion labels (should equal the number of samples)
print("[+] Number of labels:", y_augmented.shape[0])
# Get the number of features extracted
print(f'Features extracted: {X_augmented.shape[1]}')

### Saving the augmented dataset
data_augmented = pd.concat([X_augmented, channel_augmented, gender_augmented, y_augmented], axis =1)
data_augmented.to_csv("RAV_extracted_features_augmented.csv")
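`fade_in_and_out` and `change_tone` come from pydiogment and create additional audio files from the originals. The sketch below shows what a fade does to a signal: a gain that ramps from 0 to 1 at the start and from 1 to 0 at the end. The linear ramp shape and the `fade_len` parameter are assumptions for illustration. This is not pydiogment's implementation.

```python
def linear_fade(samples, fade_len):
    """Ramp the gain from 0 to 1 over the first fade_len samples and from 1 to 0 over the last."""
    n = len(samples)
    fade_len = min(fade_len, n // 2)  # the two ramps must not overlap
    out = list(samples)
    for i in range(fade_len):
        gain = i / fade_len
        out[i] = samples[i] * gain
        out[n - 1 - i] = samples[n - 1 - i] * gain
    return out

signal = [1.0] * 10  # hypothetical constant-amplitude signal
print(linear_fade(signal, fade_len=4))
# [0.0, 0.25, 0.5, 0.75, 1.0, 1.0, 0.75, 0.5, 0.25, 0.0]
```

A fade changes only the amplitude envelope, so most spectral features of the clip stay close to those of the original recording. An augmented copy is therefore highly correlated with its source file.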


#### 6.2 Loading augmented dataset, predictor and target definition, SMOTE and split

In [None]:
### Opening the saved augmented dataset (index_col=0 restores the index written by to_csv, so it is not used as a feature)
data_augmented = pd.read_csv('RAV_extracted_features_augmented.csv', index_col=0)
data_augmented.drop(data_augmented[data_augmented['emotions'] == "disgust"].index, inplace = True)
data_augmented.drop(data_augmented[data_augmented['emotions'] == "surprised"].index, inplace = True)

# shuffling the data to avoid over/under representation of one class in the training/test dataset
data_augmented = data_augmented.sample(frac=1)

# Features and target columns
X_augmented = data_augmented.drop(['emotions', 'gender'], axis = 1).values
y_augmented = data_augmented['emotions'].values
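# Split the unscaled dataset first (stratified by emotion), then oversample only the training split with SMOTE,
# so that no synthetic training sample is interpolated from a test sample
X_train_augmented, X_test_augmented, y_train_augmented, y_test_augmented = train_test_split(
    X_augmented, y_augmented, random_state=42, test_size=0.20, stratify=y_augmented)
oversample = SMOTE()
X_train_augmented, y_train_augmented = oversample.fit_resample(X_train_augmented, y_train_augmented)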

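### Encoding the target labels
## Fit one encoder on the training labels and reuse it for the test labels, so both share the same mapping
label_encoder = LabelEncoder().fit(y_train_augmented)
y_train_augmented = label_encoder.transform(y_train_augmented)
y_test_augmented = label_encoder.transform(y_test_augmented)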

# Standard scaling: learn the mean and variance on the training set only, then apply them to the test set
scaler2 = StandardScaler()
X_train_scaled_augmented = scaler2.fit_transform(X_train_augmented)
X_test_scaled_augmented = scaler2.transform(X_test_augmented)

# Wrap the scaled training set in a DataFrame so it can be inspected
X_train_scaled_augmented_df = pd.DataFrame(X_train_scaled_augmented)


# print some details
# number of samples in training data
print("[+] Number of unscaled training samples:", X_train_augmented.shape[0])
print("[+] Number of scaled training samples:", X_train_scaled_augmented.shape[0])

# number of samples in testing data
print("[+] Number of unscaled testing samples:", X_test_augmented.shape[0])
print("[+] Number of scaled testing samples:", X_test_scaled_augmented.shape[0])

# Get the number of features extracted
print(f'Features extracted for unscaled: {X_train_augmented.shape[1]}')
print(f'Features extracted for scaled: {X_train_scaled_augmented.shape[1]}')
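The label encoder and the scaler should both be fitted on the training split and then reused on the test split. Refitting on the test set gives the test data different statistics and, for labels, possibly a different mapping. Note also that scikit-learn's `LabelEncoder` orders classes alphabetically, so encoded label 0 is 'angry', not 'neutral'. The stdlib sketch below mimics both behaviours. The function names are hypothetical and are not part of the notebook.

```python
import statistics

def fit_label_mapping(labels):
    """Alphabetical label -> integer mapping, the same order as LabelEncoder.classes_."""
    return {label: i for i, label in enumerate(sorted(set(labels)))}

def fit_standardiser(column):
    """Return the mean and population standard deviation learned from one column."""
    return statistics.fmean(column), statistics.pstdev(column)

def apply_standardiser(column, mean, std):
    return [(v - mean) / std for v in column]

emotions = ['neutral', 'calm', 'happy', 'sad', 'angry', 'fearful']
print(fit_label_mapping(emotions))
# {'angry': 0, 'calm': 1, 'fearful': 2, 'happy': 3, 'neutral': 4, 'sad': 5}

# Hypothetical single-feature train/test columns
train, test = [1.0, 2.0, 3.0, 4.0], [10.0, 11.0]
mean, std = fit_standardiser(train)                        # statistics from the training split only
print(apply_standardiser(test, mean, std))                 # test values stay far from 0, exposing the shift
print(apply_standardiser(test, *fit_standardiser(test)))   # refitting on test hides it: [-1.0, 1.0]
```

The alphabetical mapping also determines the order in which class names must be passed to visualisers that label encoded classes by index.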

#### 6.3 RandomizedSearchCV to tune SVM parameters using the augmented data

In [None]:
################################################################################# Do not run ######################################################################
tic = time.perf_counter()
k = StratifiedKFold(n_splits=3, shuffle=False)

param_grid3 = {'kernel': ['rbf', 'poly', 'linear'],
            "C": stats.uniform(2, 50),
            "gamma": stats.uniform(0.01, 1)}
              
grid3 = RandomizedSearchCV(SVC(), param_distributions = param_grid3, n_iter = 20, n_jobs = -1, cv = k, scoring = "accuracy") 
grid3.fit(X_train_scaled_augmented, y_train_augmented)

#print(grid3.cv_results_)
print(grid3.best_params_)
print(grid3.score(X_test_scaled_augmented, y_test_augmented))
print()

toc = time.perf_counter()
print(f"Time to run the RandomizedSearchCV for SVC is {toc - tic:0.4f} seconds")
print()

dump(grid3.best_estimator_, 'SVM_best_estimator_augmented_grid3.joblib', compress = 1) # Saving the best estimator
dump(grid3.cv_results_, 'SVM_CV_result_grid3.pkl')                           # Saving the cross-validation results
dump(grid3, 'SVM_whole_object_augmented_grid3.pkl')                          # Saving the whole search object

### 3D visualisation of search performance
df_grid3 = load('SVM_CV_result_grid3.pkl')
df_grid3 = pd.DataFrame(df_grid3)
# Create a 3d scatter plot of the data
fig = plt.figure(figsize=(8, 6))
ax = plt.axes(projection="3d")
x_points = df_grid3['param_gamma']
y_points = df_grid3['mean_score_time']
z_points = df_grid3['mean_test_score']
colour = df_grid3['rank_test_score']
sctt = ax.scatter3D(x_points, y_points, z_points, c=z_points, cmap='jet');
ax.set_xlabel('Gamma')
ax.set_ylabel('Mean score time')
ax.set_zlabel('Score')
fig.colorbar(sctt, shrink=0.5)
ax.view_init(elev=15, azim=30)
ax.text2D(0, 1, "3D visualisation of SVM grid3 randomized SearchCV performance on the augmented data", transform=ax.transAxes)
fig.savefig("grid3_3d.svg")
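In SciPy, `stats.uniform(loc, scale)` is uniform on `[loc, loc + scale]`. So `stats.uniform(2, 50)` samples `C` between 2 and 52, and `stats.uniform(0.01, 1)` samples `gamma` between 0.01 and 1.01, not between 0.01 and 1. The stdlib sketch below draws candidates over the same ranges to make this concrete. It uses `random.uniform` as a stand-in for SciPy's sampling, so the draws will not match those of the actual search.

```python
import random

def sample_svc_candidates(n_iter, seed=0):
    """Draw kernel, C and gamma over the same ranges as param_grid3 (stand-in for scipy's rvs)."""
    rng = random.Random(seed)
    candidates = []
    for _ in range(n_iter):
        candidates.append({
            'kernel': rng.choice(['rbf', 'poly', 'linear']),
            'C': rng.uniform(2, 2 + 50),           # stats.uniform(loc=2, scale=50)
            'gamma': rng.uniform(0.01, 0.01 + 1),  # stats.uniform(loc=0.01, scale=1)
        })
    return candidates

candidates = sample_svc_candidates(20)
print(min(c['C'] for c in candidates), max(c['C'] for c in candidates))
```

`SVC` ignores `gamma` when `kernel='linear'`, so candidates that differ only in `gamma` under the linear kernel are effectively duplicates.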

#### 6.4 Performing SVM classification on augmented data using optimal parameters

In [None]:
model10 = SVC(
    C = 44.98800565318603,
    gamma = 0.1554129361088037,
    kernel = 'poly',
    random_state = 69
)

model10.fit(X_train_scaled_augmented, y_train_augmented)

print(f'SVC Model\'s accuracy on training set is {100*model10.score(X_train_scaled_augmented, y_train_augmented):.2f}%')
print(f'SVC Model\'s accuracy on test set is {100*model10.score(X_test_scaled_augmented, y_test_augmented):.2f}%')

f, axes = plt.subplots(3, 2,figsize=(25, 12))
set_aesthetic(palette='paired', font='Arial', font_scale=2, color_codes=True, rc=None)
visualgrid = [
    # class names in LabelEncoder (alphabetical) order: encoded label 0 = angry, ..., 5 = sad
    CVScores(model10, cv=3, scoring='f1_weighted', classes=['angry','calm','fearful','happy','neutral','sad'], ax=axes[0][0]),
    LearningCurve(model10, cv=3, scoring='f1_weighted', train_sizes=np.linspace(.1, 1.0, 5), n_jobs=4, classes=['angry','calm','fearful','happy','neutral','sad'], ax=axes[0][1]),
    ClassificationReport(model10, cmap='GnBu', support=True, classes=['angry','calm','fearful','happy','neutral','sad'], ax=axes[1][0]),
    ClassPredictionError(model10, classes=['angry','calm','fearful','happy','neutral','sad'], ax=axes[1][1]),
    PrecisionRecallCurve(model10, per_class=True, classes=['angry','calm','fearful','happy','neutral','sad'], ax=axes[2][0]),
    ROCAUC(model10, classes=['angry','calm','fearful','happy','neutral','sad'], ax=axes[2][1])
]

for viz in visualgrid:
    viz.fit(X_train_scaled_augmented, y_train_augmented)
    viz.score(X_test_scaled_augmented, y_test_augmented)
    viz.finalize()
f.suptitle('Performance of optimised SVM classifier on augmented dataset', fontsize=32, weight="bold");
plt.subplots_adjust(top=0.91)
viz.show(outpath="model10.svg")
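`model10` uses `kernel='poly'`. With scikit-learn's defaults `degree=3` and `coef0=0.0`, the kernel between two feature vectors is `K(x, z) = (gamma * <x, z> + coef0) ** degree`. The stdlib sketch below evaluates that formula directly.

```python
def poly_kernel(x, z, gamma, coef0=0.0, degree=3):
    """Polynomial kernel (gamma * <x, z> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, z))
    return (gamma * dot + coef0) ** degree

print(poly_kernel([1.0, 2.0], [3.0, 4.0], gamma=0.5))  # (0.5 * 11) ** 3 = 166.375
```

`param_grid3` left `degree` and `coef0` at their defaults, so only `C` and `gamma` were tuned for the polynomial kernel here.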

#### 6.5 RandomizedSearchCV to tune MLP parameters using the augmented data

In [None]:
################################################################################ Do not run ###################################################################
# Initialize the MLP Classifier and choose parameters we want to keep constant
tic = time.perf_counter()

k = StratifiedKFold(n_splits=3, shuffle=False)
mlp = MLPClassifier(
    # tune batch size later 
    batch_size=256,  
    # keep random state constant to accurately compare subsequent models
    random_state=42
)

# Define the hyperparameter space from which RandomizedSearchCV samples candidate models
param_grid4 = {
    'hidden_layer_sizes': [(8,), (180,), (300,),(100,50,),(10,10,10)], 
    'activation': ['tanh','relu', 'logistic'],
    'solver': ['sgd', 'adam'],
    'alpha': [0.0001, 0.001, 0.01],
    'epsilon': [1e-08, 0.1],
    'learning_rate': ['adaptive', 'constant'],
}
              
grid4 = RandomizedSearchCV(mlp, param_distributions = param_grid4, n_iter = 20, n_jobs = -1, cv = k, scoring = "accuracy") 
grid4.fit(X_train_scaled_augmented, y_train_augmented)

#print(grid4.cv_results_)
print(grid4.best_params_)
print(grid4.score(X_test_scaled_augmented, y_test_augmented))
print()

toc = time.perf_counter()
print(f"Time to run the RandomizedSearchCV for MLP is {toc - tic:0.4f} seconds")
print()

dump(grid4.best_estimator_, 'MLP_best_estimator_grid4.joblib', compress = 1) # Saving the best estimator
dump(grid4.cv_results_, 'MLP_CV_result_grid4.pkl')                           # Saving the cross-validation results
dump(grid4, 'MLP_whole_object_grid4.pkl')                                    # Saving the whole search object

### 3D visualisation of search performance
df_grid4 = load('MLP_CV_result_grid4.pkl')
df_grid4 = pd.DataFrame(df_grid4)
# Create a 3d scatter plot of the data
fig = plt.figure(figsize=(8, 6))
ax = plt.axes(projection="3d")
x_points = df_grid4['param_alpha']
y_points = df_grid4['mean_score_time']
z_points = df_grid4['mean_test_score']
colour = df_grid4['rank_test_score']
sctt = ax.scatter3D(x_points, y_points, z_points, c=z_points, cmap='jet');
ax.set_xlabel('Alpha')
ax.set_ylabel('Mean score time')
ax.set_zlabel('Score')
fig.colorbar(sctt, shrink=0.5)
ax.view_init(elev=15, azim=30)
ax.text2D(0, 1, "3D visualisation of MLP Grid4 randomized SearchCV performance on the augmented data", transform=ax.transAxes)
fig.savefig("grid4_3d.svg")

#### 6.6 Performing MLP classification on augmented data using optimal parameters

In [None]:
# MLP classifier with the best parameters from the randomized search, trained on the scaled augmented data

model11 = MLPClassifier(
    activation='relu', 
    solver='adam', 
    alpha=0.0001, 
    beta_1=0.9,
    beta_2=0.999,
    batch_size=256, 
    epsilon=1e-08, 
    hidden_layer_sizes=(300,), 
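    learning_rate='adaptive', # learning-rate schedules are only used when solver='sgd'; no effect with adam
    max_iter=1000,
    early_stopping=True, # hold out 10% of the training data as a validation set and stop when its score stops improving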
    shuffle = False
)

model11.fit(X_train_scaled_augmented, y_train_augmented)

print(f'MLP Model\'s accuracy on training set is {100*model11.score(X_train_scaled_augmented, y_train_augmented):.2f}%')
print(f'MLP Model\'s accuracy on test set is {100*model11.score(X_test_scaled_augmented, y_test_augmented):.2f}%')

f, axes = plt.subplots(3, 2,figsize=(25, 12))
set_aesthetic(palette='paired', font='Arial', font_scale=2, color_codes=True, rc=None)
visualgrid = [
    # class names in LabelEncoder (alphabetical) order: encoded label 0 = angry, ..., 5 = sad
    CVScores(model11, cv=3, scoring='f1_weighted', classes=['angry','calm','fearful','happy','neutral','sad'], ax=axes[0][0]),
    LearningCurve(model11, cv=3, scoring='f1_weighted', train_sizes=np.linspace(.1, 1.0, 5), n_jobs=4, classes=['angry','calm','fearful','happy','neutral','sad'], ax=axes[0][1]),
    ClassificationReport(model11, support=True, cmap='GnBu', classes=['angry','calm','fearful','happy','neutral','sad'], ax=axes[1][0]),
    ClassPredictionError(model11, classes=['angry','calm','fearful','happy','neutral','sad'], ax=axes[1][1]),
    PrecisionRecallCurve(model11, per_class=True, classes=['angry','calm','fearful','happy','neutral','sad'], ax=axes[2][0]),
    ROCAUC(model11, classes=['angry','calm','fearful','happy','neutral','sad'], ax=axes[2][1])
]

for viz in visualgrid:
    viz.fit(X_train_scaled_augmented, y_train_augmented)
    viz.score(X_test_scaled_augmented, y_test_augmented)
    viz.finalize()
f.suptitle('Performance of optimised MLP classifier on augmented dataset', fontsize=32, weight="bold");
plt.subplots_adjust(top=0.91)
viz.show(outpath="model11.svg")

#### 6.7 Confusion matrix

In [None]:
f, (ax1, ax2) = plt.subplots(1,2, figsize=(25, 12))
set_aesthetic(palette='dark', font='Arial', font_scale=2, color_codes=True, rc=None)
classes=['angry','calm','fearful','happy','neutral','sad']  # LabelEncoder (alphabetical) order of the encoded labels
visualgrid = [
    ConfusionMatrix(model10, classes=classes, cmap='GnBu', title="Optimised SVM classification on augmented data", ax=ax1),
    ConfusionMatrix(model11, classes=classes, cmap='GnBu', title="Optimised MLP classification on augmented data", ax=ax2),
]

for viz in visualgrid:
    viz.fit(X_train_scaled_augmented, y_train_augmented)
    viz.score(X_test_scaled_augmented, y_test_augmented)
    viz.finalize()
f.suptitle('Confusion matrix for optimised SVM and MLP models on augmented data', fontsize=32, weight="bold");
plt.subplots_adjust(top=0.91)
f.savefig("Confusion_matrix3_MLP_SVM_augmented.svg")

#### 6.8 Comparative performance of optimised SVM and MLP classification (model 10 and 11) on augmented data

In [None]:
sns.set_context('paper', font_scale=2)
set_palette('paired')
f, ((ax1, ax2, ax3,), (ax4, ax5, ax6)) = plt.subplots(2, 3, figsize=(25, 12))
viz = ClassificationReport(model10, support=True, classes=['angry','calm','fearful','happy','neutral','sad'], cmap='GnBu', title="SVM classification report using augmented data", ax=ax1)
viz.fit(X_train_scaled_augmented, y_train_augmented)
viz.score(X_test_scaled_augmented, y_test_augmented)
viz.finalize()
viz = ClassPredictionError(model10, classes=['angry','calm','fearful','happy','neutral','sad'], title="SVM prediction error using augmented data", ax=ax2)
viz.fit(X_train_scaled_augmented, y_train_augmented)
viz.score(X_test_scaled_augmented, y_test_augmented)
viz.finalize()
viz = ConfusionMatrix(model10, support=True, classes=['angry','calm','fearful','happy','neutral','sad'], cmap='GnBu', title="SVM confusion matrix using augmented data", ax=ax3)
viz.fit(X_train_scaled_augmented, y_train_augmented)
viz.score(X_test_scaled_augmented, y_test_augmented)
viz.finalize()
viz = ClassificationReport(model11, support=True, classes=['angry','calm','fearful','happy','neutral','sad'], cmap='GnBu', title="MLP classification report using augmented data", ax=ax4)
viz.fit(X_train_scaled_augmented, y_train_augmented)
viz.score(X_test_scaled_augmented, y_test_augmented)
viz.finalize()
viz = ClassPredictionError(model11, classes=['angry','calm','fearful','happy','neutral','sad'], title="MLP prediction error using augmented data", ax=ax5)
viz.fit(X_train_scaled_augmented, y_train_augmented)
viz.score(X_test_scaled_augmented, y_test_augmented)
viz.finalize()
viz = ConfusionMatrix(model11, support=True, classes=['angry','calm','fearful','happy','neutral','sad'], cmap='GnBu', title="MLP confusion matrix using augmented data", ax=ax6)
viz.fit(X_train_scaled_augmented, y_train_augmented)
viz.score(X_test_scaled_augmented, y_test_augmented)
viz.finalize()
f.suptitle('Comparative performance of optimised SVM and MLP classification using augmented data', fontsize=32, weight="bold");
plt.subplots_adjust(top=0.91)
plt.tight_layout(rect=[0, 0.03, 1, 0.95])
f.savefig("Comparative_performance2.svg")

### 7. Splitting the dataset based on vocal channel (speech and song)
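The `channel` column used below presumably comes from the second field of each RAVDESS filename (01 = speech, 02 = song), and the emotion from the third, following the naming scheme described in section 1. The stdlib sketch below shows that parsing. The dictionary and function names are illustrative and are not the notebook's own `load_data` code.

```python
from pathlib import PureWindowsPath

# Emotion codes from the RAVDESS naming scheme (third field of the filename)
EMOTIONS = {'01': 'neutral', '02': 'calm', '03': 'happy', '04': 'sad',
            '05': 'angry', '06': 'fearful', '07': 'disgust', '08': 'surprised'}

def parse_ravdess_name(path):
    """Return (vocal_channel, emotion) from a name like 03-01-06-01-02-01-12.wav."""
    # PureWindowsPath accepts both '\\' and '/' separators, matching the Windows-style glob pattern
    fields = PureWindowsPath(path).stem.split('-')
    channel_code, emotion_code = fields[1], fields[2]
    return int(channel_code), EMOTIONS[emotion_code]

print(parse_ravdess_name('Actor_12/03-01-06-01-02-01-12.wav'))  # (1, 'fearful')
print(parse_ravdess_name('Actor_01/03-02-03-01-01-01-01.wav'))  # (2, 'happy')
```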

In [None]:
# Load the augmented dataset and split it by vocal channel (1 = speech, 2 = song)
data_augmented = pd.read_csv('RAV_extracted_features_augmented.csv', index_col=0)
speech_data_augmented = data_augmented.loc[data_augmented['channel'] == 1].copy()
song_data_augmented = data_augmented.loc[data_augmented['channel'] == 2].copy()

speech_data_augmented.drop(speech_data_augmented[speech_data_augmented['emotions'] == "disgust"].index, inplace = True)
speech_data_augmented.drop(speech_data_augmented[speech_data_augmented['emotions'] == "surprised"].index, inplace = True)

song_data_augmented.drop(song_data_augmented[song_data_augmented['emotions'] == "disgust"].index, inplace = True)
song_data_augmented.drop(song_data_augmented[song_data_augmented['emotions'] == "surprised"].index, inplace = True)

### Speech
X_augmented_speech = speech_data_augmented.drop(['emotions', 'gender', 'channel'], axis = 1).values
y_augmented_speech = speech_data_augmented['emotions'].values
# Split first (stratified by emotion), then oversample only the training split with SMOTE
X_train_augmented_speech, X_test_augmented_speech, y_train_augmented_speech, y_test_augmented_speech = train_test_split(
    X_augmented_speech, y_augmented_speech, random_state=42, test_size=0.20, stratify=y_augmented_speech)
oversample = SMOTE()
X_train_augmented_speech, y_train_augmented_speech = oversample.fit_resample(X_train_augmented_speech, y_train_augmented_speech)
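## y label encoding: one encoder fitted on the training labels, reused for the test labels
speech_encoder = LabelEncoder().fit(y_train_augmented_speech)
y_train_augmented_speech = speech_encoder.transform(y_train_augmented_speech)
y_test_augmented_speech = speech_encoder.transform(y_test_augmented_speech)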
# Standard scaling: fit on the training split only, then apply the same transform to the test split
scaler2 = StandardScaler()
X_train_scaled_augmented_speech = scaler2.fit_transform(X_train_augmented_speech)
X_test_scaled_augmented_speech = scaler2.transform(X_test_augmented_speech)

### song
X_augmented_song = song_data_augmented.drop(['emotions', 'gender', 'channel'], axis = 1).values
y_augmented_song = song_data_augmented['emotions'].values
# Split first (stratified by emotion), then oversample only the training split with SMOTE
X_train_augmented_song, X_test_augmented_song, y_train_augmented_song, y_test_augmented_song = train_test_split(
    X_augmented_song, y_augmented_song, random_state=42, test_size=0.20, stratify=y_augmented_song)
X_train_augmented_song, y_train_augmented_song = SMOTE().fit_resample(X_train_augmented_song, y_train_augmented_song)
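## y label encoding: one encoder fitted on the training labels, reused for the test labels
song_encoder = LabelEncoder().fit(y_train_augmented_song)
y_train_augmented_song = song_encoder.transform(y_train_augmented_song)
y_test_augmented_song = song_encoder.transform(y_test_augmented_song)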
# Standard scaling: fit on the training split only, then apply the same transform to the test split
scaler2 = StandardScaler()
X_train_scaled_augmented_song = scaler2.fit_transform(X_train_augmented_song)
X_test_scaled_augmented_song = scaler2.transform(X_test_augmented_song)
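SMOTE creates each synthetic sample on the line segment between a minority-class sample and one of its nearest minority-class neighbours: `x_new = x + u * (x_nn - x)`, with `u` drawn uniformly from [0, 1]. Because synthetic points borrow from their neighbours, oversampling before the train/test split can put points derived from test samples into the training set. Applying SMOTE to the training split only avoids this. The stdlib sketch below implements the interpolation step for 2-D points using only the single nearest neighbour. It is a simplified illustration, not imbalanced-learn's implementation, which uses `k_neighbors=5` by default.

```python
import math
import random

def smote_sample(minority, n_new, seed=0):
    """Generate n_new synthetic points by interpolating toward each point's nearest neighbour (k=1)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # nearest other minority point by Euclidean distance
        neighbour = min((minority[j] for j in range(len(minority)) if j != i),
                        key=lambda p: math.dist(x, p))
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, neighbour)))
    return synthetic

minority = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
for point in smote_sample(minority, n_new=3):
    print(point)
```

Every synthetic point lies between two existing minority samples. SMOTE therefore balances class counts without generating feature values outside the region spanned by the real data.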


#### 7.1 RandomizedSearchCV to tune SVM parameters using the speech channel of the augmented data

In [None]:
################################################################################# Do not run ######################################################################
tic = time.perf_counter()
k = StratifiedKFold(n_splits=3, shuffle=False)

param_grid5 = {'kernel': ['rbf', 'poly', 'linear'],
            "C": stats.uniform(2, 50),
            "gamma": stats.uniform(0.01, 1)}
              
grid5 = RandomizedSearchCV(SVC(), param_distributions = param_grid5, n_iter = 20, n_jobs = -1, cv = k, scoring = "accuracy") 
grid5.fit(X_train_scaled_augmented_speech, y_train_augmented_speech)

#print(grid5.cv_results_)
print(grid5.best_params_)
print(grid5.score(X_test_scaled_augmented_speech, y_test_augmented_speech))
print()

toc = time.perf_counter()
print(f"Time to run the RandomizedSearchCV for SVC is {toc - tic:0.4f} seconds")
print()

dump(grid5.best_estimator_, 'SVM_best_estimator_augmented_speech_grid5.joblib', compress = 1) # Saving the best estimator
dump(grid5.cv_results_, 'SVM_CV_result_augmented_speech_grid5.pkl')                           # Saving the cross-validation results
dump(grid5, 'SVM_whole_object_augmented_speech_grid5.pkl')                                    # Saving the whole search object

## 3D visualisation of search performance
df_grid5 = load('SVM_CV_result_augmented_speech_grid5.pkl')
df_grid5 = pd.DataFrame(df_grid5)
# Create a 3d scatter plot of the data
fig = plt.figure(figsize=(8, 6))
ax = plt.axes(projection="3d")
x_points = df_grid5['param_gamma']
y_points = df_grid5['mean_score_time']
z_points = df_grid5['mean_test_score']
colour = df_grid5['rank_test_score']
sctt = ax.scatter3D(x_points, y_points, z_points, c=z_points, cmap='jet');
ax.set_xlabel('Gamma')
ax.set_ylabel('Mean score time')
ax.set_zlabel('Score')
fig.colorbar(sctt, shrink=0.5)
ax.view_init(elev=15, azim=30)
ax.text2D(0, 1, "3D visualisation of SVM Grid5 randomized SearchCV performance on the speech channel of augmented data", transform=ax.transAxes)
fig.savefig("grid5_3d.svg")

#### 7.2 SVM classification using optimal parameters on speech channel of the augmented data

In [None]:
model12 = SVC(
    C = 39.41309679301661,
    gamma = 0.060427225273544265,
    kernel = 'poly',
    random_state = 69
)

model12.fit(X_train_scaled_augmented_speech, y_train_augmented_speech)

print(f'SVC Model\'s accuracy on training set is {100*model12.score(X_train_scaled_augmented_speech, y_train_augmented_speech):.2f}%')
print(f'SVC Model\'s accuracy on test set is {100*model12.score(X_test_scaled_augmented_speech, y_test_augmented_speech):.2f}%')

f, axes = plt.subplots(3, 2,figsize=(25, 12))
set_aesthetic(palette='paired', font='Arial', font_scale=2, color_codes=True, rc=None)
visualgrid = [
    # class names in LabelEncoder (alphabetical) order: encoded label 0 = angry, ..., 5 = sad
    CVScores(model12, cv=3, scoring='f1_weighted', classes=['angry','calm','fearful','happy','neutral','sad'], ax=axes[0][0]),
    LearningCurve(model12, cv=3, scoring='f1_weighted', train_sizes=np.linspace(.1, 1.0, 5), n_jobs=4, classes=['angry','calm','fearful','happy','neutral','sad'], ax=axes[0][1]),
    ClassificationReport(model12, support=True, cmap='GnBu', classes=['angry','calm','fearful','happy','neutral','sad'], ax=axes[1][0]),
    ClassPredictionError(model12, classes=['angry','calm','fearful','happy','neutral','sad'], ax=axes[1][1]),
    PrecisionRecallCurve(model12, per_class=True, classes=['angry','calm','fearful','happy','neutral','sad'], ax=axes[2][0]),
    ROCAUC(model12, classes=['angry','calm','fearful','happy','neutral','sad'], ax=axes[2][1])
]

for viz in visualgrid:
    viz.fit(X_train_scaled_augmented_speech, y_train_augmented_speech)
    viz.score(X_test_scaled_augmented_speech, y_test_augmented_speech)
    viz.finalize()
f.suptitle('Performance of optimised SVM classifier on the speech channel of the augmented dataset', fontsize=32, weight="bold");
plt.subplots_adjust(top=0.91)
viz.show(outpath="model12.svg")

#### 7.3 RandomizedSearchCV to tune SVM parameters using the song channel of the augmented data

In [None]:
################################################################################# Do not run ######################################################################
tic = time.perf_counter()
k = StratifiedKFold(n_splits=3, shuffle=False)

param_grid6 = {'kernel': ['rbf', 'poly', 'linear'],
               'C': stats.uniform(2, 50),        # uniform on [2, 52]
               'gamma': stats.uniform(0.01, 1)}  # uniform on [0.01, 1.01]

grid6 = RandomizedSearchCV(SVC(), param_distributions = param_grid6, n_iter = 20, n_jobs = -1, cv = k, scoring = "accuracy") 
grid6.fit(X_train_scaled_augmented_song, y_train_augmented_song)

#print(grid6.cv_results_)
print(grid6.best_params_)
print(grid6.score(X_test_scaled_augmented_song, y_test_augmented_song))
print()

toc = time.perf_counter()
print(f"Time to run the RandomizedSearchCV for SVC is {toc - tic:0.4f} seconds")
print()

dump(grid6.best_estimator_, 'SVM_best_estimator_augmented_song_grid6.joblib', compress = 1) # Saving the best estimator
dump(grid6.cv_results_, 'SVM_CV_result_augmented_song_grid6.pkl')                           # Saving the CV results
dump(grid6, 'SVM_whole_object_augmented_song_grid6.pkl')                                    # Saving the whole search object

### 3D visualisation of search performance
df_grid6 = load('SVM_CV_result_augmented_song_grid6.pkl')
df_grid6 = pd.DataFrame(df_grid6)
# Create a 3d scatter plot of the data
fig = plt.figure(figsize=(8, 6))
ax = plt.axes(projection="3d")
x_points = df_grid6['param_gamma']
y_points = df_grid6['mean_score_time']
z_points = df_grid6['mean_test_score']
colour = df_grid6['rank_test_score']
sctt = ax.scatter3D(x_points, y_points, z_points, c=z_points, cmap='jet');
ax.set_xlabel('Gamma')
ax.set_ylabel('Mean score time')
ax.set_zlabel('Score')
fig.colorbar(sctt, shrink=0.5)
ax.view_init(elev=15, azim=30)
ax.text2D(0, 1, "3D visualisation of SVM Grid6 randomized SearchCV performance on the song data", transform=ax.transAxes)
fig.savefig("grid6_3d.svg")
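
The search above samples `C` from `stats.uniform(2, 50)` and `gamma` from `stats.uniform(0.01, 1)`. SciPy parameterises `uniform` by `loc` and `scale`, so these distributions cover [2, 52] and [0.01, 1.01], not [2, 50] and [0.01, 1]. A short check of the bounds:

```python
from scipy import stats

# uniform(loc, scale) is uniform on [loc, loc + scale]
C_dist = stats.uniform(2, 50)
gamma_dist = stats.uniform(0.01, 1)

print(C_dist.support())      # lower and upper bound for C: 2 and 52
print(gamma_dist.support())  # lower and upper bound for gamma: 0.01 and 1.01

# Every sampled C value falls inside [2, 52]
samples = C_dist.rvs(size=1000, random_state=0)
print(samples.min() >= 2, samples.max() <= 52)
```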

#### 7.4 Performing SVM classification using optimal parameters on song channel of augmented data

In [None]:
model13 = SVC(
    C = 49.69210418127462,
    gamma = 0.7632177027607225,
    kernel = 'poly',
    random_state = 69
)

model13.fit(X_train_scaled_augmented_song, y_train_augmented_song)

print(f'SVC Model\'s accuracy on training set is {100*model13.score(X_train_scaled_augmented_song, y_train_augmented_song):.2f}%')
print(f'SVC Model\'s accuracy on test set is {100*model13.score(X_test_scaled_augmented_song, y_test_augmented_song):.2f}%')

f, axes = plt.subplots(3, 2,figsize=(25, 12))
set_aesthetic(palette='paired', font='Arial', font_scale=2, color_codes=True, rc=None)
visualgrid = [
    CVScores(model13, cv=3, scoring='f1_weighted', classes=['neutral','calm','happy', 'sad','angry','fearful'], ax=axes[0][0]),
    LearningCurve(model13, cv=3, scoring='f1_weighted', train_sizes=np.linspace(.1, 1.0, 5), n_jobs=4, classes=['neutral','calm','happy', 'sad','angry','fearful'], ax=axes[0][1]),
    ClassificationReport(model13, support=True, cmap='GnBu', classes=['neutral','calm','happy', 'sad','angry','fearful'], ax=axes[1][0]),  
    ClassPredictionError(model13, classes=['neutral','calm','happy', 'sad','angry','fearful'], ax=axes[1][1]),
    PrecisionRecallCurve(model13, per_class=True, classes=['neutral','calm','happy', 'sad','angry','fearful'], ax=axes[2][0]),
    ROCAUC(model13, classes=['neutral','calm','happy', 'sad','angry','fearful'], ax=axes[2][1])
]

for viz in visualgrid:
    viz.fit(X_train_scaled_augmented_song, y_train_augmented_song)
    viz.score(X_test_scaled_augmented_song, y_test_augmented_song)
    viz.finalize()
f.suptitle('Performance of optimised SVM classifier with optimal parameters on song channel', fontsize=32, weight="bold");
plt.subplots_adjust(top=0.91)
viz.show(outpath="model13.svg")
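
Both optimised SVMs use `kernel='poly'` with scikit-learn's default `degree=3` and `coef0=0.0`. The kernel is therefore `K(x, z) = (gamma * <x, z>) ** 3`. The sketch below checks this formula against `sklearn.metrics.pairwise.polynomial_kernel` on random toy vectors, which are an assumption and not the audio features. Note that the pairwise helper defaults to `coef0=1`, so `coef0` must be set explicitly to match `SVC`.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

# SVC(kernel='poly') computes (gamma * <x, z> + coef0) ** degree,
# with defaults degree=3 and coef0=0.0
gamma = 0.7632177027607225  # value used for model13
degree, coef0 = 3, 0.0

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 5))  # toy feature vectors (assumption)

manual = (gamma * X @ X.T + coef0) ** degree
# polynomial_kernel defaults to coef0=1, so pass coef0 explicitly to match SVC
library = polynomial_kernel(X, X, degree=degree, gamma=gamma, coef0=coef0)
print(np.allclose(manual, library))
```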

#### 7.5 RandomizedSearchCV to tune MLP parameters using the speech channel of the augmented data
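
Every entry in the MLP search space is a list. As a result, `RandomizedSearchCV` samples its 20 candidates without replacement from the full grid. The sketch below counts that grid. It also counts how many combinations actually train different models, because scikit-learn documents `learning_rate` as used only with `solver='sgd'` and `epsilon` as used only with `solver='adam'`:

```python
from itertools import product

# Same candidate lists as param_grid7 / param_grid8
param_grid = {
    'hidden_layer_sizes': [(8,), (180,), (300,), (100, 50), (10, 10, 10)],
    'activation': ['tanh', 'relu', 'logistic'],
    'solver': ['sgd', 'adam'],
    'alpha': [0.0001, 0.001, 0.01],
    'epsilon': [1e-08, 0.1],
    'learning_rate': ['adaptive', 'constant'],
}

keys = list(param_grid)
combos = [dict(zip(keys, values)) for values in product(*param_grid.values())]

def effective(cfg):
    """Drop the setting MLPClassifier ignores for the chosen solver."""
    cfg = dict(cfg)
    if cfg['solver'] == 'sgd':
        cfg.pop('epsilon')        # epsilon is only used by adam
    else:
        cfg.pop('learning_rate')  # learning_rate is only used by sgd
    return tuple(sorted(cfg.items()))

distinct = {effective(c) for c in combos}
n_iter = 20
print(len(combos), len(distinct), round(n_iter / len(combos), 3))  # 360 180 0.056
```

The 20 draws therefore cover about 5.6% of the 360 listed combinations. With a fixed `random_state`, two candidates that differ only in an ignored parameter train identical networks.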

In [None]:
################################################################################ Do not run ###################################################################
# Initialize the MLP Classifier and choose parameters we want to keep constant
tic = time.perf_counter()

k = StratifiedKFold(n_splits=3, shuffle=False)
mlp = MLPClassifier(
    # tune batch size later 
    batch_size=256,  
    # keep random state constant to accurately compare subsequent models
    random_state=42
)

# Choose the hyperparameter values the randomized search will sample candidate models from
param_grid7 = {
    'hidden_layer_sizes': [(8,), (180,), (300,),(100,50,),(10,10,10)], 
    'activation': ['tanh','relu', 'logistic'],
    'solver': ['sgd', 'adam'],
    'alpha': [0.0001, 0.001, 0.01],
    'epsilon': [1e-08, 0.1],
    'learning_rate': ['adaptive', 'constant'],
}
              
grid7 = RandomizedSearchCV(mlp, param_distributions = param_grid7, n_iter = 20, n_jobs = -1, cv = k, scoring = "accuracy") 
grid7.fit(X_train_scaled_augmented_speech, y_train_augmented_speech)

#print(grid7.cv_results_)
print(grid7.best_params_)
print(grid7.score(X_test_scaled_augmented_speech, y_test_augmented_speech))
print()

toc = time.perf_counter()
print(f"Time to run the RandomizedSearchCV for MLP is {toc - tic:0.4f} seconds")
print()

dump(grid7.best_estimator_, 'MLP_best_estimator_augmented_speech_grid7.joblib', compress = 1) # Saving the best estimator
dump(grid7.cv_results_, 'MLP_CV_result_augmented_speech_grid7.pkl')                           # Saving the CV results
dump(grid7, 'MLP_whole_object_augmented_speech_grid7.pkl')                                    # Saving the whole search object


### 3D visualisation of search performance
df_grid7 = load('MLP_CV_result_augmented_speech_grid7.pkl')
df_grid7 = pd.DataFrame(df_grid7)
# Create a 3d scatter plot of the data
fig = plt.figure(figsize=(8, 6))
ax = plt.axes(projection="3d")
x_points = df_grid7['param_alpha']
y_points = df_grid7['mean_score_time']
z_points = df_grid7['mean_test_score']
colour = df_grid7['rank_test_score']
sctt = ax.scatter3D(x_points, y_points, z_points, c=z_points, cmap='jet');
ax.set_xlabel('Alpha')
ax.set_ylabel('Mean score time')
ax.set_zlabel('Score')
fig.colorbar(sctt, shrink=0.5)
ax.view_init(elev=15, azim=30)
ax.text2D(0, 1, "3D visualisation of MLP Grid7 randomized SearchCV performance on the speech data", transform=ax.transAxes)
fig.savefig("grid7_3d.svg")
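
The search cells persist three things with joblib: the best estimator, `cv_results_` and the whole search object. Below is a sketch of the round trip on a small synthetic problem (the file name and data are hypothetical). It confirms that the reloaded model predicts identically. scikit-learn's model persistence guidance advises loading such files only with the same library version and only from trusted sources.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Small synthetic problem standing in for the augmented features (assumption)
X, y = make_classification(n_samples=200, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=500, random_state=42).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'best_estimator.joblib')  # hypothetical file name
    dump(clf, path, compress=1)   # same call pattern as the search cells
    restored = load(path)

# The reloaded estimator should reproduce the original predictions exactly
same_predictions = np.array_equal(clf.predict(X), restored.predict(X))
print(same_predictions)
```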

#### 7.6 Performing MLP classification using optimal parameters on speech channel of augmented data

In [None]:
# MLP classifier with the best parameters from the randomized search, trained on the scaled data

model14 = MLPClassifier(
    activation='relu', 
    solver='adam', 
    alpha=0.001, 
    beta_1=0.9,
    beta_2=0.999,
    batch_size=256, 
    epsilon=1e-08, 
    hidden_layer_sizes=(180,), 
    learning_rate='adaptive',
    max_iter=1000,
    early_stopping=True, # stop when the held-out validation score stops improving
    shuffle = False
)

model14.fit(X_train_scaled_augmented_speech, y_train_augmented_speech)

print(f'MLP Model\'s accuracy on training set is {100*model14.score(X_train_scaled_augmented_speech, y_train_augmented_speech):.2f}%')
print(f'MLP Model\'s accuracy on test set is {100*model14.score(X_test_scaled_augmented_speech, y_test_augmented_speech):.2f}%')


f, axes = plt.subplots(3, 2,figsize=(25, 12))
set_aesthetic(palette='paired', font='Arial', font_scale=2, color_codes=True, rc=None)
visualgrid = [
    CVScores(model14, cv=3, scoring='f1_weighted', classes=['neutral','calm','happy', 'sad','angry','fearful'], ax=axes[0][0]),
    LearningCurve(model14, cv=3, scoring='f1_weighted', train_sizes=np.linspace(.1, 1.0, 5), n_jobs=4, classes=['neutral','calm','happy', 'sad','angry','fearful'], ax=axes[0][1]),
    ClassificationReport(model14, support=True, cmap='GnBu', classes=['neutral','calm','happy', 'sad','angry','fearful'], ax=axes[1][0]),  
    ClassPredictionError(model14, classes=['neutral','calm','happy', 'sad','angry','fearful'], ax=axes[1][1]),
    PrecisionRecallCurve(model14, per_class=True, classes=['neutral','calm','happy', 'sad','angry','fearful'], ax=axes[2][0]),
    ROCAUC(model14, classes=['neutral','calm','happy', 'sad','angry','fearful'], ax=axes[2][1])
]

for viz in visualgrid:
    viz.fit(X_train_scaled_augmented_speech, y_train_augmented_speech)
    viz.score(X_test_scaled_augmented_speech, y_test_augmented_speech)
    viz.finalize()
f.suptitle('Performance of optimised MLP classifier with optimal parameters on speech channel', fontsize=32, weight="bold");
plt.subplots_adjust(top=0.91)
viz.show(outpath="model14.svg")
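
Models 14 and 15 set `early_stopping=True`. With this setting, `MLPClassifier` holds out `validation_fraction` of the training data (10% by default) and scores it after each epoch. Training stops once the score has not improved by at least `tol` for `n_iter_no_change` epochs (10 by default), and the weights from the best epoch are kept. The sketch below uses synthetic data to show the attributes that record this. It assumes `random_state=42` for repeatability; models 14 and 15 set no seed, so their scores can vary between runs.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the scaled, augmented features (assumption)
X, y = make_classification(n_samples=500, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)

clf = MLPClassifier(
    hidden_layer_sizes=(180,),
    solver='adam',
    early_stopping=True,      # hold out part of the training set for validation
    validation_fraction=0.1,  # default: 10% held out
    n_iter_no_change=10,      # default patience, in epochs
    max_iter=1000,
    random_state=42,          # assumption: fixed seed for repeatable runs
)
clf.fit(X, y)

# One validation score is recorded per epoch; training stops well before max_iter
print(clf.n_iter_, len(clf.validation_scores_), clf.best_validation_score_)
```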

#### 7.7 RandomizedSearchCV to tune MLP parameters using the song channel of the augmented data

In [None]:
################################################################################ Do not run ###################################################################
# Initialize the MLP Classifier and choose parameters we want to keep constant
tic = time.perf_counter()

k = StratifiedKFold(n_splits=3, shuffle=False)
mlp = MLPClassifier(
    # tune batch size later 
    batch_size=256,  
    # keep random state constant to accurately compare subsequent models
    random_state=42
)

# Choose the hyperparameter values the randomized search will sample candidate models from
param_grid8 = {
    'hidden_layer_sizes': [(8,), (180,), (300,),(100,50,),(10,10,10)], 
    'activation': ['tanh','relu', 'logistic'],
    'solver': ['sgd', 'adam'],
    'alpha': [0.0001, 0.001, 0.01],
    'epsilon': [1e-08, 0.1],
    'learning_rate': ['adaptive', 'constant'],
}

grid8 = RandomizedSearchCV(mlp, param_distributions = param_grid8, n_iter = 20, n_jobs = -1, cv = k, scoring = "accuracy") 
grid8.fit(X_train_scaled_augmented_song, y_train_augmented_song)

#print(grid8.cv_results_)
print(grid8.best_params_)
print(grid8.score(X_test_scaled_augmented_song, y_test_augmented_song))
print()

toc = time.perf_counter()
print(f"Time to run the RandomizedSearchCV for MLP is {toc - tic:0.4f} seconds")
print()

dump(grid8.best_estimator_, 'MLP_best_estimator_augmented_song_grid8.joblib', compress = 1) # Saving the best estimator
dump(grid8.cv_results_, 'MLP_CV_result_augmented_song_grid8.pkl')                           # Saving the CV results
dump(grid8, 'MLP_whole_object_augmented_song_grid8.pkl')                                    # Saving the whole search object

### 3D visualisation of search performance
df_grid8 = load('MLP_CV_result_augmented_song_grid8.pkl')
df_grid8 = pd.DataFrame(df_grid8)
# Create a 3d scatter plot of the data
fig = plt.figure(figsize=(8, 6))
ax = plt.axes(projection="3d")
x_points = df_grid8['param_alpha']
y_points = df_grid8['mean_score_time']
z_points = df_grid8['mean_test_score']
colour = df_grid8['rank_test_score']
sctt = ax.scatter3D(x_points, y_points, z_points, c=z_points, cmap='jet');
ax.set_xlabel('Alpha')
ax.set_ylabel('Mean score time')
ax.set_zlabel('Score')
fig.colorbar(sctt, shrink=0.5)
ax.view_init(elev=15, azim=30)
ax.text2D(0, 1, "3D visualisation of MLP Grid8 randomized SearchCV performance on the song data", transform=ax.transAxes)
fig.savefig("grid8_3d.svg")
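
The 3D plots encode score as colour and leave the computed `colour` (rank) series unused. A ranked table of `cv_results_` is a complementary view. The sketch below builds one from a small synthetic search; the data and the reduced grid are assumptions.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from sklearn.neural_network import MLPClassifier

# Synthetic data and a reduced grid (assumptions, for speed)
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)

search = RandomizedSearchCV(
    MLPClassifier(max_iter=300, random_state=42),
    param_distributions={'hidden_layer_sizes': [(8,), (50,)],
                         'alpha': [0.0001, 0.001, 0.01],
                         'activation': ['tanh', 'relu']},
    n_iter=6,
    cv=StratifiedKFold(n_splits=3),
    scoring='accuracy',
    random_state=0,
)
search.fit(X, y)

# Sort candidates by rank and keep the columns that matter for model choice
results = pd.DataFrame(search.cv_results_)
top = (results.sort_values('rank_test_score')
              [['rank_test_score', 'mean_test_score', 'std_test_score', 'params']]
              .head(5))
print(top.to_string(index=False))
```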

#### 7.8 Performing MLP classification using optimal parameters on song channel of the augmented data

In [None]:
# MLP classifier with the best parameters from the randomized search, trained on the scaled data

model15 = MLPClassifier(
    activation='tanh', 
    solver='adam', 
    alpha=0.001, 
    beta_1=0.9,
    beta_2=0.999,
    batch_size=256, 
    epsilon=1e-08, 
    hidden_layer_sizes=(180,), 
    learning_rate='adaptive',
    max_iter=1000,
    early_stopping=True, # stop when the held-out validation score stops improving
    shuffle = False
)

model15.fit(X_train_scaled_augmented_song, y_train_augmented_song)

print(f'MLP Model\'s accuracy on training set is {100*model15.score(X_train_scaled_augmented_song, y_train_augmented_song):.2f}%')
print(f'MLP Model\'s accuracy on test set is {100*model15.score(X_test_scaled_augmented_song, y_test_augmented_song):.2f}%')


f, axes = plt.subplots(3, 2,figsize=(25, 12))
set_aesthetic(palette='paired', font='Arial', font_scale=2, color_codes=True, rc=None)
visualgrid = [
    CVScores(model15, cv=3, scoring='f1_weighted', classes=['neutral','calm','happy', 'sad','angry','fearful'], ax=axes[0][0]),
    LearningCurve(model15, cv=3, scoring='f1_weighted', train_sizes=np.linspace(.1, 1.0, 5), n_jobs=4, classes=['neutral','calm','happy', 'sad','angry','fearful'], ax=axes[0][1]),
    ClassificationReport(model15, support=True, cmap='GnBu', classes=['neutral','calm','happy', 'sad','angry','fearful'], ax=axes[1][0]),  
    ClassPredictionError(model15, classes=['neutral','calm','happy', 'sad','angry','fearful'], ax=axes[1][1]),
    PrecisionRecallCurve(model15, per_class=True, classes=['neutral','calm','happy', 'sad','angry','fearful'], ax=axes[2][0]),
    ROCAUC(model15, classes=['neutral','calm','happy', 'sad','angry','fearful'], ax=axes[2][1])
]

for viz in visualgrid:
    viz.fit(X_train_scaled_augmented_song, y_train_augmented_song)
    viz.score(X_test_scaled_augmented_song, y_test_augmented_song)
    viz.finalize()
f.suptitle('Performance of optimised MLP classifier with early stopping on standard scaled and augmented song dataset', fontsize=32, weight="bold");
plt.subplots_adjust(top=0.91)
viz.show(outpath="model15.svg")

#### 7.9 Confusion matrix of optimised SVM and MLP on speech channel of the augmented data

In [None]:
f, (ax1, ax2) = plt.subplots(1,2, figsize=(25, 12))
set_aesthetic(palette='dark', font='Arial', font_scale=2, color_codes=True, rc=None)
classes=['neutral','calm','happy', 'sad','angry','fearful']
visualgrid = [
    ConfusionMatrix(model12, cmap='GnBu', classes=classes, title="Optimised SVM classification on augmented speech data", ax=ax1),
    ConfusionMatrix(model14, cmap='GnBu', classes=classes, title="Optimised MLP classification on augmented speech data", ax=ax2),
]

for viz in visualgrid:
    viz.fit(X_train_scaled_augmented_speech, y_train_augmented_speech)
    viz.score(X_test_scaled_augmented_speech, y_test_augmented_speech)
    viz.finalize()
f.suptitle('Confusion matrix for optimised SVM and MLP models on augmented speech data', fontsize=32, weight="bold");
plt.subplots_adjust(top=0.91)
f.savefig("Confusion_matrix4_MLP_SVM_augmented_speech.svg")
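
Each panel counts test samples by true class (rows) and predicted class (columns). scikit-learn's `confusion_matrix` uses the same convention. Dividing the diagonal entry by the row total gives that class's recall. A from-scratch sketch on hypothetical labels:

```python
from collections import Counter

labels = ['neutral', 'calm', 'happy', 'sad', 'angry', 'fearful']
# Hypothetical true and predicted emotions for eight test clips
y_true = ['calm', 'sad', 'sad', 'happy', 'angry', 'fearful', 'neutral', 'calm']
y_pred = ['calm', 'sad', 'happy', 'happy', 'fearful', 'fearful', 'calm', 'calm']

def count_confusion(y_true, y_pred, labels):
    """Return a list of rows: rows are true classes, columns predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

cm = count_confusion(y_true, y_pred, labels)
for i, (label, row) in enumerate(zip(labels, cm)):
    total = sum(row)
    recall = row[i] / total if total else float('nan')
    print(f'{label:>8} {row} recall={recall:.2f}')
```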

#### 7.10 Confusion matrix of optimised SVM and MLP on song channel of the augmented data

In [None]:
f, (ax1, ax2) = plt.subplots(1,2, figsize=(25, 12))
set_aesthetic(palette='dark', font='Arial', font_scale=2, color_codes=True, rc=None)
classes=['neutral','calm','happy', 'sad','angry','fearful']
visualgrid = [
    ConfusionMatrix(model13, cmap='GnBu', classes=classes, title="Optimised SVM classification on augmented song data", ax=ax1),
    ConfusionMatrix(model15, cmap='GnBu', classes=classes, title="Optimised MLP classification on augmented song data", ax=ax2),
]

for viz in visualgrid:
    viz.fit(X_train_scaled_augmented_song, y_train_augmented_song)
    viz.score(X_test_scaled_augmented_song, y_test_augmented_song)
    viz.finalize()
f.suptitle('Confusion matrix for optimised SVM and MLP models on augmented song data', fontsize=32, weight="bold");
plt.subplots_adjust(top=0.91)
f.savefig("Confusion_matrix4_MLP_SVM_augmented_song.svg")
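
When the number of test samples per emotion differs, raw counts are hard to compare across panels. Row-normalising turns each row into proportions, and the diagonal then equals per-class recall. yellowbrick's `ConfusionMatrix` also offers a `percent=True` option for a percentage view. The sketch below does the normalisation with scikit-learn on hypothetical labels:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, recall_score

labels = ['neutral', 'calm', 'happy', 'sad', 'angry', 'fearful']
# Hypothetical labels; every class appears at least once in y_true
y_true = ['calm', 'sad', 'sad', 'happy', 'angry', 'fearful', 'neutral', 'calm']
y_pred = ['calm', 'sad', 'happy', 'happy', 'fearful', 'fearful', 'calm', 'calm']

counts = confusion_matrix(y_true, y_pred, labels=labels)
# normalize='true' divides each row by the number of true samples in that class
row_normalised = confusion_matrix(y_true, y_pred, labels=labels, normalize='true')

per_class_recall = recall_score(y_true, y_pred, labels=labels, average=None)
print(np.round(row_normalised, 2))
print(np.allclose(np.diag(row_normalised), per_class_recall))
```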

#### 7.11 Comparative performance of optimised SVM and MLP classification using speech channel of augmented data

In [None]:
sns.set_context('paper', font_scale=2)
set_palette('paired')
f, ((ax1, ax2, ax3,), (ax4, ax5, ax6)) = plt.subplots(2, 3, figsize=(25, 12))
viz = ClassificationReport(model12, support=True, classes=['neutral','calm','happy', 'sad','angry','fearful'], cmap='GnBu', title="SVM classification report for augmented speech data", ax=ax1)
viz.fit(X_train_scaled_augmented_speech, y_train_augmented_speech)
viz.score(X_test_scaled_augmented_speech, y_test_augmented_speech)
viz.finalize()
viz = ClassPredictionError(model12, classes=['neutral','calm','happy', 'sad','angry','fearful'], title="SVM prediction error for augmented speech data", ax=ax2)
viz.fit(X_train_scaled_augmented_speech, y_train_augmented_speech)
viz.score(X_test_scaled_augmented_speech, y_test_augmented_speech)
viz.finalize()
viz = ConfusionMatrix(model12, classes=['neutral','calm','happy', 'sad','angry','fearful'], cmap='GnBu', title="SVM confusion matrix for augmented speech data", ax=ax3)
viz.fit(X_train_scaled_augmented_speech, y_train_augmented_speech)
viz.score(X_test_scaled_augmented_speech, y_test_augmented_speech)
viz.finalize()
viz = ClassificationReport(model14, support=True, classes=['neutral','calm','happy', 'sad','angry','fearful'], cmap='GnBu', title="MLP classification report for augmented speech data", ax=ax4)
viz.fit(X_train_scaled_augmented_speech, y_train_augmented_speech)
viz.score(X_test_scaled_augmented_speech, y_test_augmented_speech)
viz.finalize()
viz = ClassPredictionError(model14, classes=['neutral','calm','happy', 'sad','angry','fearful'], title="MLP prediction error for augmented speech data", ax=ax5)
viz.fit(X_train_scaled_augmented_speech, y_train_augmented_speech)
viz.score(X_test_scaled_augmented_speech, y_test_augmented_speech)
viz.finalize()
viz = ConfusionMatrix(model14, classes=['neutral','calm','happy', 'sad','angry','fearful'], cmap='GnBu', title="MLP confusion matrix for augmented speech data", ax=ax6)
viz.fit(X_train_scaled_augmented_speech, y_train_augmented_speech)
viz.score(X_test_scaled_augmented_speech, y_test_augmented_speech)
viz.finalize()
f.suptitle('Comparative performance of optimised SVM and MLP classification using speech channel of augmented data', fontsize=32, weight="bold");
plt.subplots_adjust(top=0.91)
plt.tight_layout(rect=[0, 0.03, 1, 0.95])
f.savefig("Comparative_performance3.svg")
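
The randomized searches ranked candidates by `scoring='accuracy'`. The `CVScores` and `LearningCurve` panels, by contrast, report `f1_weighted`: the per-class F1 averaged with weights equal to each class's share of true samples. The two metrics can differ noticeably. Below is a worked sketch on hypothetical labels, checked against `f1_score(average='weighted')`:

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical labels: class 0 has support 4, class 1 has support 2
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1]

def weighted_f1(y_true, y_pred):
    """Sum over classes of (support / N) * F1 for that class."""
    n = len(y_true)
    total = 0.0
    for c in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        total += (tp + fn) / n * f1
    return total

print(round(accuracy_score(y_true, y_pred), 4))                # 0.8333
print(round(weighted_f1(y_true, y_pred), 4))                   # 0.8148
print(round(f1_score(y_true, y_pred, average='weighted'), 4))  # 0.8148
```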

#### 7.12 Comparative performance of optimised SVM and MLP classification using song channel of augmented data

In [None]:
sns.set_context('paper', font_scale=2)
set_palette('paired')
f, ((ax1, ax2, ax3,), (ax4, ax5, ax6)) = plt.subplots(2, 3, figsize=(25, 12))
viz = ClassificationReport(model13, support=True, classes=['neutral','calm','happy', 'sad','angry','fearful'], cmap='GnBu', title="SVM classification report for augmented song data", ax=ax1)
viz.fit(X_train_scaled_augmented_song, y_train_augmented_song)
viz.score(X_test_scaled_augmented_song, y_test_augmented_song)
viz.finalize()
viz = ClassPredictionError(model13, classes=['neutral','calm','happy', 'sad','angry','fearful'], title="SVM prediction error for augmented song data", ax=ax2)
viz.fit(X_train_scaled_augmented_song, y_train_augmented_song)
viz.score(X_test_scaled_augmented_song, y_test_augmented_song)
viz.finalize()
viz = ConfusionMatrix(model13, classes=['neutral','calm','happy', 'sad','angry','fearful'], cmap='GnBu', title="SVM confusion matrix for augmented song data", ax=ax3)
viz.fit(X_train_scaled_augmented_song, y_train_augmented_song)
viz.score(X_test_scaled_augmented_song, y_test_augmented_song)
viz.finalize()
viz = ClassificationReport(model15, support=True, classes=['neutral','calm','happy', 'sad','angry','fearful'], cmap='GnBu', title="MLP classification report for augmented song data", ax=ax4)
viz.fit(X_train_scaled_augmented_song, y_train_augmented_song)
viz.score(X_test_scaled_augmented_song, y_test_augmented_song)
viz.finalize()
viz = ClassPredictionError(model15, classes=['neutral','calm','happy', 'sad','angry','fearful'], title="MLP prediction error for augmented song data", ax=ax5)
viz.fit(X_train_scaled_augmented_song, y_train_augmented_song)
viz.score(X_test_scaled_augmented_song, y_test_augmented_song)
viz.finalize()
viz = ConfusionMatrix(model15, classes=['neutral','calm','happy', 'sad','angry','fearful'], cmap='GnBu', title="MLP confusion matrix for augmented song data", ax=ax6)
viz.fit(X_train_scaled_augmented_song, y_train_augmented_song)
viz.score(X_test_scaled_augmented_song, y_test_augmented_song)
viz.finalize()
f.suptitle('Comparative performance of optimised SVM and MLP classification using song channel of augmented data', fontsize=32, weight="bold");
plt.subplots_adjust(top=0.91)
plt.tight_layout(rect=[0, 0.03, 1, 0.95])
f.savefig("Comparative_performance4.svg")

#### 7.13 Plotting ROC/AUC of optimised SVM and MLP on the speech and song channels of augmented data

In [None]:
sns.set_context('paper', font_scale=2)
set_palette('paired')
f, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(25, 12))
viz = ROCAUC(model12, classes=['neutral','calm','happy', 'sad','angry','fearful'], cmap='GnBu', title="SVM ROC/AUC for augmented speech data", ax=ax1)
viz.fit(X_train_scaled_augmented_speech, y_train_augmented_speech)
viz.score(X_test_scaled_augmented_speech, y_test_augmented_speech)
viz.finalize()
viz = ROCAUC(model13, classes=['neutral','calm','happy', 'sad','angry','fearful'], title="SVM ROC/AUC for augmented song data", ax=ax2)
viz.fit(X_train_scaled_augmented_song, y_train_augmented_song)
viz.score(X_test_scaled_augmented_song, y_test_augmented_song)
viz.finalize()
viz = ROCAUC(model14, classes=['neutral','calm','happy', 'sad','angry','fearful'], cmap='GnBu', title="MLP ROC/AUC for augmented speech data", ax=ax3)
viz.fit(X_train_scaled_augmented_speech, y_train_augmented_speech)
viz.score(X_test_scaled_augmented_speech, y_test_augmented_speech)
viz.finalize()
viz = ROCAUC(model15, classes=['neutral','calm','happy', 'sad','angry','fearful'], cmap='GnBu', title="MLP ROC/AUC for augmented song data", ax=ax4)
viz.fit(X_train_scaled_augmented_song, y_train_augmented_song)
viz.score(X_test_scaled_augmented_song, y_test_augmented_song)
viz.finalize()
f.suptitle('Comparison of ROC/AUC of optimised SVM and MLP emotion classification using augmented speech and song data', fontsize=32, weight="bold");
plt.subplots_adjust(top=0.91)
plt.tight_layout(rect=[0, 0.03, 1, 0.95])
f.savefig("Comparative_performance5.png", dpi=600)
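
`ROCAUC` draws one one-vs-rest curve per emotion, plus micro and macro averages. The SVMs here are built without `probability=True`, so the curves come from `decision_function` scores rather than probabilities. This assumes yellowbrick falls back to `decision_function` when `predict_proba` is unavailable. The sketch below computes per-class one-vs-rest AUC from decision scores on synthetic data. The simple mean of these AUCs may differ slightly from the macro-average AUC that `ROCAUC` reports, because that figure is computed from an averaged curve.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize
from sklearn.svm import SVC

# Synthetic three-class problem standing in for the emotion features (assumption)
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

clf = SVC(kernel='rbf').fit(X_train, y_train)
scores = clf.decision_function(X_test)  # shape (n_samples, n_classes) with the default 'ovr' shape
Y_test = label_binarize(y_test, classes=clf.classes_)

# One-vs-rest AUC per class: column k of the scores against "is class k"
per_class_auc = [roc_auc_score(Y_test[:, k], scores[:, k]) for k in range(len(clf.classes_))]
print(np.round(per_class_auc, 3), round(float(np.mean(per_class_auc)), 3))
```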