# **Speech Emotion Recognition**

Training phase.
Out of all the classification algorithms trained, the *'MLP Classifier'* unexpectedly turned out to be the best-performing algorithm, with an accuracy of around *84%* when the features were scaled and transformed using the *'Robust Scaler'*.

### **Importing Necessary Modules**

In [None]:
import os
import math
import json
import joblib
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt  # needed by plot_history further down
from tensorflow import keras
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn.preprocessing import RobustScaler, PowerTransformer, QuantileTransformer, StandardScaler

import warnings
warnings.filterwarnings('ignore')

## **All Other Stuff**

Preparing the data and declaring helper functions to make the later steps easier.

### **Save the Model and Hyperparameters**

If the model is a neural network, it has to be saved in '*hdf5*' format, so the function below takes an optional argument '*NN*' (short for 'Neural Network').

In [None]:
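def save_model(model, model_path, hyperparameters=None, param_path=None, NN=False):
    # Keras models provide their own save method
    if NN:
        model.save(model_path)
        return
    # scikit-learn models are serialized with joblib
    joblib.dump(model, model_path)
    # hyperparameters are optional, so only write them when both pieces are given
    if hyperparameters is not None and param_path is not None:
        joblib.dump(hyperparameters, param_path)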
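
A model saved this way can be restored with `joblib.load`. The sketch below is a minimal round trip, assuming a throwaway `LogisticRegression` and temporary file names; both are illustrative and not part of the original notebook.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data and model, used only to illustrate the save/load round trip.
X_demo = np.array([[0.0], [1.0], [2.0], [3.0]])
y_demo = np.array([0, 0, 1, 1])
demo_model = LogisticRegression().fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_model_path = os.path.join(tmp_dir, "demo_model.joblib")
    demo_param_path = os.path.join(tmp_dir, "demo_params.pkl")

    # The same two joblib.dump calls that save_model makes for a non-NN model.
    joblib.dump(demo_model, demo_model_path)
    joblib.dump(demo_model.get_params(), demo_param_path)

    restored_model = joblib.load(demo_model_path)
    restored_params = joblib.load(demo_param_path)

# The restored model should reproduce the original predictions.
same_predictions = np.array_equal(demo_model.predict(X_demo), restored_model.predict(X_demo))
print(same_predictions)
```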

### **Load The Data**

The extracted features were stored in '*json*' format. The function below loads the feature vectors with the shape (rows, n_mels, columns).

In [None]:
def load_data(data_path, feature):
    with open(data_path, "r") as fp:
        data = json.load(fp)
    X = np.array(data[feature])
    y = np.array(data["labels"])
    print("Data successfully loaded!")

    return X, y

In [None]:
xMels_rs, yMels_rs = load_data(r"/content/drive/MyDrive/Data/mel_spec_data_40.json", 'mel_spec')
xMfcc_rs, yMfcc_rs = load_data("/content/drive/MyDrive/Data/mfcc_data_40.json", 'mfcc')
xMels_tess, yMels_tess = load_data("/content/drive/MyDrive/Data/tess_mel_spec_40.json", 'mel_spec')
xMfcc_tess, yMfcc_tess = load_data("/content/drive/MyDrive/Data/tess_mfcc_40.json", 'mfcc')
xStft, yStft = load_data("/content/drive/MyDrive/Data/Tess_chroma_stft_40.json", 'chroma_stft')
xCqt, yCqt = load_data('/content/drive/MyDrive/Data/Tess_chroma_cqt_40.json', 'chroma_cqt')

Data successfully loaded!
Data successfully loaded!
Data successfully loaded!
Data successfully loaded!
Data successfully loaded!
Data successfully loaded!
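
The loader expects each file to hold one key per feature plus a `labels` key. The sketch below writes and reads back a miniature file with that layout; the sizes, the values, and the file name `demo_features.json` are made up for illustration.

```python
import json
import os
import tempfile

import numpy as np

# Hypothetical miniature feature file: 2 samples, each a 3 x 4 feature matrix.
demo_payload = {
    "mel_spec": np.zeros((2, 3, 4)).tolist(),
    "labels": [0, 5],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo_features.json")
    with open(demo_path, "w") as fp:
        json.dump(demo_payload, fp)

    # Same steps as load_data: parse the JSON, then convert the lists to arrays.
    with open(demo_path, "r") as fp:
        demo_data = json.load(fp)

X_demo = np.array(demo_data["mel_spec"])
y_demo = np.array(demo_data["labels"])
print(X_demo.shape, y_demo.shape)  # (2, 3, 4) (2,)
```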


### **Transform the Data**

MFCC and Mel spectrogram features of the RAVDESS and SAVEE audio files were extracted with '*n_mels=26*', whereas all features of the TESS audio files were extracted with '*n_mels=40*'. This shape mismatch would raise an error when the arrays are combined. The function below removes the mismatch by appending zero-filled rows to each sample ('*no_of_columns*' of them, 18 by default).

In [None]:
def transform_data(data, no_of_columns=18):
    # block of zeros that is appended to every sample along its first axis
    empty_rows = np.zeros((no_of_columns, data.shape[2]))
    x = []
    for i in range(len(data)):
        arr = np.concatenate((data[i], empty_rows), axis=0)
        x.append(arr)
    return np.array(x)

xMels_rs = transform_data(xMels_rs)
xMfcc_rs = transform_data(xMfcc_rs)
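
The same zero-padding can be sketched without a loop using `np.pad`. This is an alternative formulation, not the notebook's code, and the batch size of 2 with 26 x 40 samples is an illustrative assumption.

```python
import numpy as np

# Stand-in batch: 2 samples with 26 rows and 40 columns each (illustrative sizes).
demo_batch = np.ones((2, 26, 40))

# Append 18 zero rows at the end of axis 1; leave the sample and column axes alone.
padded = np.pad(
    demo_batch,
    pad_width=((0, 0), (0, 18), (0, 0)),
    mode="constant",
    constant_values=0,
)

print(padded.shape)  # (2, 44, 40)
```

The original values stay in the first 26 rows of each sample, and the appended rows contain only zeros.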

When working with the MLP Classifier in the 'sklearn' module and other traditional ML models, the input features should be a 2D array with one feature vector per sample. The function below converts the given 3D array to a 2D array.

In [None]:
def convert_to_2d(X, Y):
    x = []
    y = []
    for i in range(len(X)):
        # flatten each 2D sample into a single feature vector
        x.append(X[i].flatten())
        y.append(Y[i])

    return np.array(x), np.array(y)

xMels_rs_2d, yMels_rs_2d = convert_to_2d(xMels_rs, yMels_rs)
xMfcc_rs_2d, yMfcc_rs_2d = convert_to_2d(xMfcc_rs, yMfcc_rs)
xMels_tess_2d, yMels_tess_2d = convert_to_2d(xMels_tess, yMels_tess)
xMfcc_tess_2d, yMfcc_tess_2d = convert_to_2d(xMfcc_tess, yMfcc_tess)
xStft_2d, yStft_2d = convert_to_2d(xStft, yStft)
xCqt_2d, yCqt_2d = convert_to_2d(xCqt, yCqt)
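
For reference, the per-sample flattening gives the same result as a single `reshape` call. The small array below is made up for illustration.

```python
import numpy as np

demo_3d = np.arange(2 * 3 * 4).reshape(2, 3, 4)

# Keep the sample axis and merge the remaining axes into one vector per sample.
demo_2d = demo_3d.reshape(len(demo_3d), -1)

# Per-sample flatten, as done in convert_to_2d.
looped = np.array([sample.flatten() for sample in demo_3d])

print(demo_2d.shape)  # (2, 12)
print(np.array_equal(demo_2d, looped))  # True
```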

### **Scaling and Transforming Data**

The function below scales the given 2D array with the given transformer.

In [None]:
def scale_2d_array(array, transformer):
    scaler = transformer
    scaled_input = scaler.fit_transform(array)
    return scaled_input

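The function below scales the given 3D array with the given transformer.

In [None]:
def scale_3d_data(X, transformer):
    scaler = transformer
    # collapse all leading axes so the scaler sees a 2D (rows, features) array
    scaled_arr = scaler.fit_transform(X.reshape(-1, X.shape[-1]))
    # restore the original 3D shape and return the scaled values
    scaled_arr = scaled_arr.reshape(*X.shape)
    return scaled_arr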
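
The reshape, scale, reshape pattern can be checked on random data. The sketch below assumes a small made-up array and uses `RobustScaler`, which centres every column on its median.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

rng = np.random.default_rng(0)
demo_3d = rng.normal(loc=5.0, scale=2.0, size=(10, 4, 3))

# Collapse samples and rows so each of the 3 columns is scaled as one feature.
flat = demo_3d.reshape(-1, demo_3d.shape[-1])   # shape (40, 3)
scaled_flat = RobustScaler().fit_transform(flat)
scaled_3d = scaled_flat.reshape(demo_3d.shape)  # back to (10, 4, 3)

# After scaling, the median of every column should be (numerically) zero.
column_medians = np.median(scaled_3d.reshape(-1, 3), axis=0)
print(scaled_3d.shape, np.allclose(column_medians, 0.0))
```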

### **Preparing the 2D Data Set**

In [None]:
def prepare_2d_dataset(X, y, transformer):
    scaled_input = scale_2d_array(X, transformer)
    xTrain, xTest, yTrain, yTest = train_test_split(scaled_input, y, random_state=43, test_size=0.2)
    return xTrain, xTest, yTrain, yTest
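
`prepare_2d_dataset` fits the transformer on all samples before splitting. A common alternative, shown here only as a sketch on made-up data and not as the notebook's method, is to split first and fit the scaler on the training rows alone, so that the test rows do not influence the scaling statistics.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import RobustScaler

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 6))
y_demo = rng.integers(0, 2, size=50)

# Split first, so the test rows never reach the scaler's fit step.
x_train, x_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=43
)

demo_scaler = RobustScaler()
x_train_scaled = demo_scaler.fit_transform(x_train)  # statistics from training rows only
x_test_scaled = demo_scaler.transform(x_test)        # reuse those statistics

print(x_train_scaled.shape, x_test_scaled.shape)  # (40, 6) (10, 6)
```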

### **Making Predictions**

In [None]:
def predict_on_ml_models(model, X, y):
    predictions = model.predict(X)
    print(f'Accuracy: {accuracy_score(y, predictions)}')
    # print the matrix; a bare expression inside a function displays nothing
    print(confusion_matrix(y, predictions))
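
To make the two metrics concrete, here is a small worked example on made-up labels. Rows of the confusion matrix correspond to the true class and columns to the predicted class.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

demo_true = np.array([0, 0, 1, 1, 2, 2])
demo_pred = np.array([0, 1, 1, 1, 2, 0])

demo_accuracy = accuracy_score(demo_true, demo_pred)   # 4 of 6 predictions are correct
demo_matrix = confusion_matrix(demo_true, demo_pred)

print(demo_accuracy)
print(demo_matrix)
# [[1 1 0]
#  [0 2 0]
#  [1 0 1]]
```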

In [None]:
def predict_on_nn_models(model, X, y):
    # add a batch dimension for the single sample - model.predict() expects a 4d array in this case
    X = X[np.newaxis, ...] # array shape (1, rows, columns, 1)

    # perform prediction
    prediction = model.predict(X)

    # get index with max value
    predicted_index = np.argmax(prediction, axis=1)

    print("Target: {}, Predicted label: {}".format(y, predicted_index))
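
The two NumPy steps in this function can be tried in isolation. In the sketch below, the sample size and the probability vector are made up; no trained model is involved.

```python
import numpy as np

# One CNN-ready sample: 44 rows, 40 columns, 1 channel (sizes are illustrative).
demo_sample = np.zeros((44, 40, 1))
demo_batch = demo_sample[np.newaxis, ...]
print(demo_batch.shape)  # (1, 44, 40, 1)

# Made-up softmax output for a batch of one sample over 7 classes.
demo_probabilities = np.array([[0.05, 0.05, 0.10, 0.60, 0.05, 0.05, 0.10]])
demo_label = np.argmax(demo_probabilities, axis=1)
print(demo_label)  # [3]
```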

## **Training Traditional ML Models**

A Random Forest Classifier and an MLP Classifier are trained on the flattened features.

In [None]:
# Creating Inputs and Output

input_features = np.concatenate((xMels_rs_2d, xMfcc_rs_2d, xMels_tess_2d, xMfcc_tess_2d, xStft_2d, xCqt_2d))
target = np.concatenate((yMels_rs_2d, yMfcc_rs_2d, yMels_tess_2d, yMfcc_tess_2d, yStft_2d, yCqt_2d))

### **MLP Classifier**

In [None]:
from sklearn.neural_network import MLPClassifier

mlp_model = MLPClassifier(alpha=0.01, batch_size=32, 
                      epsilon=1e-08, hidden_layer_sizes=(300,), 
                      solver='adam', activation='relu',
                      verbose=True, warm_start=True,
                      early_stopping=True,
                      learning_rate='adaptive', max_iter=30)

# train the model
xTrain, xTest, yTrain, yTest = prepare_2d_dataset(input_features, target, RobustScaler())
mlp_model.fit(xTrain, yTrain)

# save the model
model_path = '/content/drive/MyDrive/Models/mlp_model.pkl'
param_path = '/content/drive/MyDrive/Models/mlp_params.pkl'
save_model(mlp_model, model_path, mlp_model.get_params(), param_path)

# make predictions
predict_on_ml_models(mlp_model, xTest, yTest)

Accuracy: 0.48014440433212996


### **Random Forest Classifier**

In [None]:
from sklearn.ensemble import RandomForestClassifier
rnd_clf = RandomForestClassifier(n_estimators=250, max_depth=20, min_samples_split=200, 
                                 min_samples_leaf=100, class_weight='balanced', random_state=2)


# train the model
xTrain, xTest, yTrain, yTest = prepare_2d_dataset(input_features, target, PowerTransformer())
rnd_clf.fit(xTrain, yTrain)

# save the model
model_path = '/content/drive/MyDrive/Models/rfc_model_pow.joblib'
param_path = '/content/drive/MyDrive/Models/rfc_param_pow.pkl'
save_model(rnd_clf, model_path, hyperparameters=rnd_clf.get_params(), param_path=param_path, NN=False)

# make predictions
predict_on_ml_models(rnd_clf, xTest, yTest)

## **Training ANN Models**

> I used CNN and LSTM models with three hidden layers. To reduce overfitting, L2 regularization is applied.




### **Computing Class Weights**

Since some neural networks do not compute class weights implicitly, we need to calculate the class weights and pass them explicitly as a hyperparameter.

In [None]:
from sklearn.utils import class_weight

class_weights = class_weight.compute_class_weight(class_weight='balanced', classes=np.unique(target), y=target)
# map class index -> weight, the dictionary form expected by Keras' class_weight argument
class_weights = dict(enumerate(class_weights))
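
With `class_weight='balanced'`, scikit-learn computes each weight as `n_samples / (n_classes * samples_in_class)`. The sketch below checks that formula on a small made-up label array.

```python
import numpy as np
from sklearn.utils import class_weight

demo_labels = np.array([0, 0, 0, 0, 1, 1, 2, 2])
demo_classes = np.unique(demo_labels)

sk_weights = class_weight.compute_class_weight(
    class_weight="balanced", classes=demo_classes, y=demo_labels
)

# The same formula written out: n_samples / (n_classes * samples_per_class).
manual_weights = len(demo_labels) / (len(demo_classes) * np.bincount(demo_labels))

print(dict(zip(demo_classes.tolist(), sk_weights.tolist())))
print(np.allclose(sk_weights, manual_weights))
```

The most frequent class (label 0) receives the smallest weight, so errors on rarer classes count more during training.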

### **Prepare 3D Data**

In [None]:
def prepare_ann_datasets(test_size, validation_size, transformer, lstm=False):

    features = scale_3d_data(input_features, transformer)
    X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=test_size)
    X_train, X_validation, y_train, y_validation = train_test_split(X_train, y_train, test_size=validation_size)

    if lstm:
        return X_train, X_validation, X_test, y_train, y_validation, y_test

    X_train = X_train[..., np.newaxis]
    X_validation = X_validation[..., np.newaxis]
    X_test = X_test[..., np.newaxis]

    return X_train, X_validation, X_test, y_train, y_validation, y_test
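
Because the validation set is taken from what remains after the test split, `validation_size` is a fraction of the remainder and not of the full data set. A sketch with 100 dummy samples and the sizes used later in the notebook (0.25 and 0.2) gives a 60/15/25 split; the fixed `random_state` is only there to make the sketch repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

demo_X = np.arange(100).reshape(100, 1)
demo_y = np.zeros(100, dtype=int)

# First split: hold out 25% of all samples for testing.
X_rest, X_test_demo, y_rest, y_test_demo = train_test_split(
    demo_X, demo_y, test_size=0.25, random_state=0
)
# Second split: 20% of the remaining 75 samples become the validation set.
X_train_demo, X_val_demo, y_train_demo, y_val_demo = train_test_split(
    X_rest, y_rest, test_size=0.2, random_state=0
)

print(len(X_train_demo), len(X_val_demo), len(X_test_demo))  # 60 15 25
```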

In [None]:
def plot_history(history):
    fig, axis = plt.subplots(2)

    axis[0].plot(history.history["accuracy"], label="Train Accuracy")
    axis[0].plot(history.history["val_accuracy"], label='Validation Accuracy')
    axis[0].legend()

    axis[1].plot(history.history["loss"], label="Train Error")
    axis[1].plot(history.history["val_loss"], label="Validation Error")
    axis[1].legend()

    plt.show()

### **Create Inputs and Targets**

All the individually extracted features must be concatenated so that they can be passed to the model being trained.

In [None]:
input_features = np.concatenate((xMels_rs, xMfcc_rs, xMels_tess, xMfcc_tess, xStft, xCqt))
target = np.concatenate((yMels_rs, yMfcc_rs, yMels_tess, yMfcc_tess, yStft, yCqt))

### **LSTM Model** 

In [None]:
def construct_lstm_model():
    # create network
    input_shape = (X_train.shape[1], X_train.shape[2]) # (time steps, features per step)

    # build network topology
    lstm_model = keras.Sequential()

    # 2 LSTM layers
    lstm_model.add(keras.layers.LSTM(64, input_shape=input_shape, return_sequences=True))
    lstm_model.add(keras.layers.LSTM(64))

    # dense layer
    lstm_model.add(keras.layers.Dense(64, activation='relu'))
    lstm_model.add(keras.layers.Dropout(0.3))

    # output layer
    lstm_model.add(keras.layers.Dense(7, activation='softmax'))

    # compile model
    optimiser = keras.optimizers.Adam(learning_rate=0.0001)
    lstm_model.compile(optimizer=optimiser,
                loss='sparse_categorical_crossentropy',
                metrics=['accuracy'])

    lstm_model.summary()

    # train model
    history = lstm_model.fit(X_train, y_train, validation_data=(X_validation, y_validation), batch_size=32, epochs=30, class_weight=class_weights)
    return lstm_model, history

### **CNN Model**

In [None]:
def construct_cnn_model():
    # create network
    input_shape = (X_train.shape[1], X_train.shape[2], 1)

    # build network topology
    cnn_model = keras.Sequential()

    # 1st conv layer
    cnn_model.add(keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=input_shape))
    cnn_model.add(keras.layers.MaxPooling2D((3, 3), strides=(2, 2), padding='same'))
    cnn_model.add(keras.layers.BatchNormalization())

    # 2nd conv layer
    cnn_model.add(keras.layers.Conv2D(32, (3, 3), activation='relu'))
    cnn_model.add(keras.layers.MaxPooling2D((3, 3), strides=(2, 2), padding='same'))
    cnn_model.add(keras.layers.BatchNormalization())

    # 3rd conv layer
    cnn_model.add(keras.layers.Conv2D(32, (2, 2), activation='relu'))
    cnn_model.add(keras.layers.MaxPooling2D((2, 2), strides=(2, 2), padding='same'))
    cnn_model.add(keras.layers.BatchNormalization())

    # flatten output and feed it into dense layer
    cnn_model.add(keras.layers.Flatten())
    cnn_model.add(keras.layers.Dense(64, activation='relu'))
    cnn_model.add(keras.layers.Dropout(0.3))

    # output layer
    cnn_model.add(keras.layers.Dense(7, activation='softmax'))

    # compile model
    optimiser = keras.optimizers.Adam(learning_rate=0.0001)
    cnn_model.compile(optimizer=optimiser,
            loss='sparse_categorical_crossentropy',
            metrics=['accuracy'])

    cnn_model.summary()

    # train model
    history = cnn_model.fit(X_train, y_train, validation_data=(X_validation, y_validation), batch_size=32, epochs=30, class_weight=class_weights)
    return cnn_model, history

In [None]:
# get train, validation, test splits for lstm models
X_train, X_validation, X_test, y_train, y_validation, y_test = prepare_ann_datasets(0.25, 0.2, transformer=RobustScaler(), lstm=True)
lstm_model, lstm_history = construct_lstm_model()

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 lstm_4 (LSTM)               (None, 44, 64)            26880     
                                                                 
 lstm_5 (LSTM)               (None, 64)                33024     
                                                                 
 dense_6 (Dense)             (None, 64)                4160      
                                                                 
 dropout_3 (Dropout)         (None, 64)                0         
                                                                 
 dense_7 (Dense)             (None, 7)                 455       
                                                                 
Total params: 64,519
Trainable params: 64,519
Non-trainable params: 0
_________________________________________________________________
Epoch 1/2
Epoch 2/2
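
The parameter counts in this summary can be reproduced by hand. The sketch below assumes 40 input features per time step, a value inferred from the first LSTM layer's count and not stated in the notebook; the helper names are illustrative.

```python
def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights and a bias vector.
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    # One weight per input-unit pair plus one bias per unit.
    return input_dim * units + units


counts = [
    lstm_params(40, 64),   # first LSTM layer (assumed 40 features per step)
    lstm_params(64, 64),   # second LSTM layer, fed by the 64 units of the first
    dense_params(64, 64),  # hidden dense layer
    dense_params(64, 7),   # softmax output layer with 7 classes
]
print(counts, sum(counts))  # [26880, 33024, 4160, 455] 64519
```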


In [None]:
# get train, validation, test splits for cnn models
X_train, X_validation, X_test, y_train, y_validation, y_test = prepare_ann_datasets(0.25, 0.2, transformer=RobustScaler())
cnn_model, cnn_history = construct_cnn_model()

Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_3 (Conv2D)           (None, 42, 38, 32)        320       
                                                                 
 max_pooling2d_3 (MaxPooling  (None, 21, 19, 32)       0         
 2D)                                                             
                                                                 
 batch_normalization_3 (Batc  (None, 21, 19, 32)       128       
 hNormalization)                                                 
                                                                 
 conv2d_4 (Conv2D)           (None, 19, 17, 32)        9248      
                                                                 
 max_pooling2d_4 (MaxPooling  (None, 10, 9, 32)        0         
 2D)                                                             
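
The output shapes and parameter counts listed in this part of the summary follow from the layer settings. The sketch below assumes a 44 x 40 single-channel input, inferred from the first Conv2D output of (42, 38); the helper names are illustrative.

```python
import math


def conv_valid(size, kernel):
    # 'valid' padding with stride 1: the window must fit inside the input.
    return size - kernel + 1


def pool_same(size, stride):
    # 'same' padding: the output size depends only on the stride.
    return math.ceil(size / stride)


def conv_params(kernel_h, kernel_w, in_channels, filters):
    # One weight per kernel cell, input channel and filter, plus one bias per filter.
    return kernel_h * kernel_w * in_channels * filters + filters


shape = (44, 40)  # assumed input size, inferred from the summary
shapes = []
for kernel in (3, 3):  # the first two conv blocks both use 3x3 kernels
    shape = tuple(conv_valid(s, kernel) for s in shape)
    shapes.append(shape)
    shape = tuple(pool_same(s, 2) for s in shape)
    shapes.append(shape)

print(shapes)  # [(42, 38), (21, 19), (19, 17), (10, 9)]
print(conv_params(3, 3, 1, 32), conv_params(3, 3, 32, 32))  # 320 9248
```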
                                                      

### **Model Evaluation and Prediction**

In [None]:
def evaluate(model, history):
    # plot accuracy/error for training and validation
    plot_history(history)

    # evaluate model on test set
    test_loss, test_acc = model.evaluate(X_test, y_test, verbose=2)
    print('\nTest accuracy:', test_acc)

In [None]:
evaluate(cnn_model, cnn_history)
X_to_predict = X_test[1]
y_to_predict = y_test[1]

# predict sample
predict_on_nn_models(cnn_model, X_to_predict, y_to_predict)

Target: 6, Predicted label: [3]
