#MFCC Extraction
In this section of our notebook, we are preparing our data for the model. We start by defining a function to extract Mel-Frequency Cepstral Coefficients (MFCCs) from our BERSt audio files. MFCCs are a type of feature commonly used in audio and speech processing as they provide a compact representation of sound.

Next, we process our data by loading the CSV files that contain information about the audio files and their labels. We construct the full paths for the chunk files, encode the 'affect' labels into integers, and then convert these integer labels into one-hot vectors. For each audio file, we extract the MFCCs and store them in a dictionary along with the corresponding one-hot vector.
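
The label steps above can be illustrated with plain NumPy. This is a minimal sketch that mirrors what `LabelEncoder` and `to_categorical` do; the label strings are hypothetical, not values from our CSV.

```python
import numpy as np

# Hypothetical 'affect' labels; the real values come from the CSV file.
labels = np.array(["happy", "sad", "angry", "sad", "happy"])

# Like LabelEncoder: sorted unique classes, then one integer code per sample.
classes, codes = np.unique(labels, return_inverse=True)

# Like to_categorical: one row per sample with a single 1 in the class column.
one_hot = np.eye(len(classes), dtype="float32")[codes]

print(classes)
print(codes)
print(one_hot.shape)
```

Because the integer codes are derived from the classes that are present, encoding the training and test files separately gives matching codes only if both files contain the same set of labels.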

Finally, we save the MFCCs for the training and test data to .npy files. This allows us to load the MFCCs directly in future runs, saving the time and computational resources that would be required to extract them again. This step is crucial for preparing our data for training all three of our models.
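
Since the features are stored as a Python dictionary rather than a plain array, saving and loading them relies on pickling. The sketch below shows the round trip on a small stand-in dictionary; the key `clip_0` and the file name `mfccs_demo.npy` are hypothetical.

```python
import os
import tempfile
import numpy as np

# Hypothetical stand-in for the real dictionary: path -> (mfcc, one-hot label).
features = {"clip_0": (np.zeros((13, 500)), np.array([1.0, 0.0]))}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mfccs_demo.npy")
    # np.save wraps the dict in a 0-d object array and pickles it.
    np.save(path, features)
    # Loading a pickled object array requires allow_pickle=True;
    # .item() unwraps the 0-d array back into the dict.
    restored = np.load(path, allow_pickle=True).item()

mfcc, label = restored["clip_0"]
print(mfcc.shape, label)
```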

In [1]:
import os
import librosa
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.utils import to_categorical


def extract_mfcc(file_path, n_fft, hop_length, max_sequence_length):
    # Load the audio file
    signal, sample_rate = librosa.load(file_path, sr=None)

    # Normalize the signal
    signal = librosa.util.normalize(signal)

    # Extract MFCC features
    mfcc = librosa.feature.mfcc(y=signal,
                                sr=sample_rate,
                                n_fft=n_fft,
                                hop_length=hop_length,
                                n_mfcc=13)

    # If the audio file is shorter than the max_sequence_length, pad it with zeros
    if (max_sequence_length > mfcc.shape[1]):
        pad_width = max_sequence_length - mfcc.shape[1]
        mfcc = np.pad(mfcc, pad_width=((0, 0), (0, pad_width)), mode='constant')

    # If the audio file is longer than the max_sequence_length, truncate it
    elif (max_sequence_length < mfcc.shape[1]):
        mfcc = mfcc[:, :max_sequence_length]

    return mfcc

def process_data(file_path):
    # Load the CSV file
    df = pd.read_csv(file_path)

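    # Replace '/media/data/' with '/content/drive/MyDrive/assigment/' and remove '.wav' extension in 'file_name'
    # regex=False treats both patterns literally, so the '.' in '.wav' cannot match any other character
    df['file_name'] = df['file_name'].str.replace('/media/data/', '/content/drive/MyDrive/assigment/', regex=False).str.replace('.wav', '', regex=False)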

    # Construct the full paths for the chunk files
    full_paths = df.apply(lambda row: row['file_name'] + row['chunk_name'], axis=1)

    # Get the labels from the 'affect' column
    labels = df['affect'].values

    # Encode the labels into integers
    le = LabelEncoder()
    y = le.fit_transform(labels)

    # Convert the integer labels into one-hot vectors
    y = to_categorical(y)

    # Initialize a dictionary to hold the MFCC features
    mfccs_dict = {}

    # Traverse through the full paths and extract MFCC features
    for i, file_path in enumerate(full_paths):
        # Extract MFCC features
        mfccs = extract_mfcc(file_path, n_fft=2048, hop_length=512, max_sequence_length=500)

        # Add the MFCC features to the dictionary
        mfccs_dict[file_path] = (mfccs, y[i])

    return mfccs_dict

# Process the training and test data
mfccs_train = process_data('/content/drive/MyDrive/assigment/filtered_training_data.csv')
mfccs_test = process_data('/content/drive/MyDrive/assigment/test_data.csv')

# Save the MFCC features to .npy files
np.save('mfccs_train.npy', mfccs_train)
np.save('mfccs_test.npy', mfccs_test)
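
The padding and truncation step inside `extract_mfcc` can be checked in isolation. This is a minimal sketch on dummy arrays; `pad_or_truncate` is a hypothetical helper name, and the sample rate used for the duration estimate is an assumption, since the notebook loads each file at its native rate with `sr=None`.

```python
import numpy as np

def pad_or_truncate(mfcc, max_frames):
    """Zero-pad or truncate the time axis (axis 1) to exactly max_frames."""
    n_frames = mfcc.shape[1]
    if n_frames < max_frames:
        return np.pad(mfcc, ((0, 0), (0, max_frames - n_frames)), mode="constant")
    return mfcc[:, :max_frames]

short_clip = np.ones((13, 320))   # hypothetical short clip
long_clip = np.ones((13, 740))    # hypothetical long clip
padded = pad_or_truncate(short_clip, 500)
cut = pad_or_truncate(long_clip, 500)
print(padded.shape, cut.shape)

# With hop_length=512, 500 frames cover roughly 500 * 512 samples.
assumed_sr = 16000                # assumption, not read from the audio files
seconds = 500 * 512 / assumed_sr
print(seconds, "seconds at", assumed_sr, "Hz")
```

Clips shorter than this window are padded with zeros, and anything beyond it is discarded, so the choice of `max_sequence_length` decides how much of each recording the model can see.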


#Training the CNN model

Next, we train our Convolutional Neural Network (CNN) model. We start by loading the previously saved MFCC features for the training and test data; the test set is the one that also carries the perceived-emotion labels. The MFCC features and labels are then extracted, and the training data is split into a training set (80%) and a validation set (20%).

We define our CNN model using the Sequential API from Keras, with layers including Conv2D, MaxPooling2D, Dropout, Flatten, Dense, and BatchNormalization. The model is compiled with a categorical cross-entropy loss function and the Adam optimizer.
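
To get a feel for the size of this architecture, the shapes can be traced by hand. The sketch below follows the layer settings used in the code cell of this section (3x3 convolutions with `padding='same'`, 2x2 max pooling) and assumes Keras' default behaviour for `MaxPooling2D`; the helper names are hypothetical.

```python
def conv_same(shape, filters):
    """Conv2D with padding='same' keeps height and width, sets the channel count."""
    h, w, _ = shape
    return (h, w, filters)

def max_pool(shape, pool=2):
    """MaxPooling2D with default 'valid' padding floors each spatial dimension."""
    h, w, c = shape
    return (h // pool, w // pool, c)

shape = (13, 500, 1)                    # MFCC coefficients x frames x channel
shape = max_pool(conv_same(shape, 32))  # after the first conv + pool block
shape = max_pool(conv_same(shape, 64))  # after the second conv + pool block

flat = shape[0] * shape[1] * shape[2]   # size of the Flatten output
dense_params = flat * 128 + 128         # weights + biases of the first Dense layer
print(shape, flat, dense_params)
```

Most of the trainable parameters sit in the first Dense layer, which may be one reason this model is prone to overfitting on a dataset of this size.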

The model is then trained on the training data for a specified number of epochs, with the validation data evaluated after each epoch. After training, the model's performance is evaluated on the test data, and the accuracy and weighted F1 score are printed. The model trained with a batch size of 64 for 50 epochs is also saved to a .keras file for future use.

To test different hyperparameters, we loop over the 2x2 combinations of batch size and number of epochs, which gives 4 sets of hyperparameters.
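
The same grid can be enumerated with `itertools.product`, which may be easier to extend if more hyperparameters are added later. This is only a sketch of the loop structure, not of the training itself.

```python
from itertools import product

batch_sizes = [32, 64]
epochs_list = [30, 50]

# product() yields the combinations in the same order as the nested for loops.
grid = list(product(batch_sizes, epochs_list))
for batch_size, epochs in grid:
    print(f"batch_size={batch_size}, epochs={epochs}")
```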

In [3]:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout, BatchNormalization
from keras.utils import to_categorical
from keras.optimizers import Adam
from keras.regularizers import l2
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score
from keras.callbacks import EarlyStopping
import numpy as np

# Load the MFCC features
mfccs_train = np.load('mfccs_train.npy', allow_pickle=True).item()
mfccs_test = np.load('mfccs_test.npy', allow_pickle=True).item()

# Get the MFCC features and labels
X_train = np.array([mfccs for mfccs, label in mfccs_train.values()])
y_train = np.array([label for mfccs, label in mfccs_train.values()])

X_test = np.array([mfccs for mfccs, label in mfccs_test.values()])
y_test = np.array([label for mfccs, label in mfccs_test.values()])

# Split the training data into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=30)

# Define the batch sizes and epochs
batch_sizes = [32, 64]
epochs_list = [30, 50]

# Testing different batch size and number of epochs
for batch_size in batch_sizes:
    for epochs in epochs_list:
        print(f"\nTraining model with batch size {batch_size} and {epochs} epochs")

        # Create a new instance of the model
        model = Sequential()
        model.add(Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(13, 500, 1)))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))
        model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.5))
        model.add(Flatten())
        model.add(Dense(128, activation='relu', kernel_regularizer=l2(0.01)))
        model.add(BatchNormalization())
        model.add(Dropout(0.5))
        model.add(Dense(256, activation='relu', kernel_regularizer=l2(0.01)))
        model.add(BatchNormalization())
        model.add(Dropout(0.25))
        model.add(Dense(y_train.shape[1], activation='softmax'))
        model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

        # Train the model with the current batch size and number of epochs
        model.fit(X_train, y_train, validation_data=(X_val, y_val), batch_size=batch_size, epochs=epochs)

        # Evaluate the model
        y_pred = model.predict(X_test)
        accuracy = accuracy_score(np.argmax(y_test, axis=1), np.argmax(y_pred, axis=1))
        f1 = f1_score(np.argmax(y_test, axis=1), np.argmax(y_pred, axis=1), average='weighted')
        print(f"Accuracy with batch size {batch_size} and {epochs} epochs: {accuracy}")
        print(f"F1 Score with batch size {batch_size} and {epochs} epochs: {f1}\n")

        # If the batch size is 64 and the number of epochs is 50, save the model
        if batch_size == 64 and epochs == 50:
            model.save('cnn.keras')




Training model with batch size 32 and 30 epochs
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
Accuracy with batch size 32 and 30 epochs: 0.27
F1 Score with batch size 32 and 30 epochs: 0.22764981648203025


Training model with batch size 32 and 50 epochs
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
...

Across the different hyperparameter sets, the results indicate that the model’s performance improved over the course of training. Here are some key observations:


*   In the initial epochs, the model had relatively low accuracy on both the training and validation sets, and accuracy improved as training progressed. The validation accuracy fluctuated from epoch to epoch, however, which suggests that our CNN model is overfitting. A smaller batch size tended to give a lower validation loss but noticeably lower accuracy.
*   After training, the model was evaluated on the test set. On the latest run with 50 epochs and a batch size of 64, it achieved an accuracy of 57% and an F1 score of approximately 0.56, compared with an accuracy of 0.27 for a batch size of 32 and 30 epochs. This indicates that the larger batch size and epoch count gave better accuracy.
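
The training cell imports `EarlyStopping` but never passes it to `fit`. As a possible way to limit overfitting, the patience rule behind such a callback can be sketched in plain Python. This illustrates the idea only and is not Keras' implementation; the loss values are hypothetical and not taken from our runs.

```python
def early_stop(val_losses, patience=3):
    """Return (stop_epoch, best_epoch), both 0-indexed, for a patience rule."""
    best, best_idx, waited = float("inf"), -1, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            # New best validation loss: remember it and reset the counter.
            best, best_idx, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_idx
    # Never ran out of patience: training runs to the last epoch.
    return len(val_losses) - 1, best_idx

# Hypothetical validation losses.
losses = [1.9, 1.6, 1.4, 1.45, 1.5, 1.43, 1.6, 1.7]
result = early_stop(losses, patience=3)
print(result)
```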










#Testing with train_test_split data
In this section of the notebook, the trained model is evaluated on a test set that is split directly from filtered_training_data.csv.
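
One thing to keep in mind with this setup is that the model was trained on a split of the same file made with `random_state=30`, while this evaluation split uses `random_state=42`. The sketch below estimates how much two such independent 80/20 splits overlap. It uses a seeded NumPy permutation rather than scikit-learn's exact shuffling, and the dataset size is an assumption.

```python
import numpy as np

n = 2955                      # hypothetical dataset size
idx = np.arange(n)

def split(indices, test_fraction, seed):
    """Shuffle with a seeded generator and cut off the last test_fraction."""
    shuffled = np.random.default_rng(seed).permutation(indices)
    cut = int(round(len(indices) * (1 - test_fraction)))
    return shuffled[:cut], shuffled[cut:]

fit_idx, val_idx = split(idx, 0.2, seed=30)    # stands in for the training split
_, test_idx = split(idx, 0.2, seed=42)         # stands in for this evaluation split

# Fraction of evaluation samples that also appear in the training portion.
seen = np.isin(test_idx, fit_idx).mean()
print(f"{seen:.0%} of the evaluation samples were also in the training split")
```

With two independent splits, roughly four out of five evaluation samples are expected to have been part of the training portion, so scores on this split tend to be optimistic.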

On our latest run, the model achieved an accuracy of approximately 0.716 and a weighted F1 score of approximately 0.719 on this test set. The F1 score being close to the accuracy suggests that performance is reasonably balanced between false positives and false negatives. These numbers should be read with caution, however: the model was trained on a split of the same file made with random_state=30, while this split uses random_state=42, so most of these test samples were already seen during training. The result is therefore optimistic and does not by itself show that the model generalizes to new, unseen data.
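
The reported metrics can be recomputed from the confusion matrix printed in this section's output, which is a useful sanity check. In the sketch below the matrix is copied from that output; rows are true classes and columns are predicted classes.

```python
import numpy as np

cm = np.array([
    [71,  5,  0,  3,  1,  1,  3],
    [13, 49,  1,  2,  2,  8,  4],
    [ 9,  4, 65,  4,  1,  5,  4],
    [10,  1,  0, 71,  0,  5,  1],
    [17,  9,  3,  3, 40,  3,  2],
    [11,  0,  4,  5,  1, 56,  1],
    [11,  0,  4,  5,  1,  1, 71],
])

accuracy = np.trace(cm) / cm.sum()          # correct predictions / all samples
precision = np.diag(cm) / cm.sum(axis=0)    # correct / everything predicted as the class
recall = np.diag(cm) / cm.sum(axis=1)       # correct / all true members of the class
f1 = 2 * precision * recall / (precision + recall)

print(round(accuracy, 4))
print(np.round(precision, 2))
print(np.round(recall, 2))
```

For class 0 this gives a precision of 0.50 against a recall of 0.85: half of all class 0 predictions are wrong, which shows that the model tends to over-predict that class.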


In [9]:
from sklearn.metrics import confusion_matrix, classification_report
from keras.models import load_model

# Re-extract the MFCC features and labels from the training CSV
mfccs_data = process_data('/content/drive/MyDrive/assigment/filtered_training_data.csv')

# Get the MFCC features and labels
X = np.array([mfccs for mfccs, label in mfccs_data.values()])
y = np.array([label for mfccs, label in mfccs_data.values()])

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Load the trained model
model_splittest = load_model('cnn.keras')

# Reshape the test data
X_test = X_test.reshape(X_test.shape[0], 13, 500, 1)

# Evaluate the model with the test set
y_pred = model_splittest.predict(X_test)
accuracy = accuracy_score(np.argmax(y_test, axis=1), np.argmax(y_pred, axis=1))
f1 = f1_score(np.argmax(y_test, axis=1), np.argmax(y_pred, axis=1), average='weighted')
print("Accuracy:", accuracy)
print("F1 Score:", f1)

# Convert predicted probabilities to labels
y_pred_labels = np.argmax(y_pred, axis=1)

# Display confusion matrix
conf_matrix = confusion_matrix(np.argmax(y_test, axis=1), y_pred_labels)
print('Confusion Matrix:')
print(conf_matrix)

# Print classification report
print('Classification Report:')
print(classification_report(np.argmax(y_test, axis=1), y_pred_labels))


Accuracy: 0.7157360406091371
F1 Score: 0.718704706867617
Confusion Matrix:
[[71  5  0  3  1  1  3]
 [13 49  1  2  2  8  4]
 [ 9  4 65  4  1  5  4]
 [10  1  0 71  0  5  1]
 [17  9  3  3 40  3  2]
 [11  0  4  5  1 56  1]
 [11  0  4  5  1  1 71]]
Classification Report:
              precision    recall  f1-score   support

           0       0.50      0.85      0.63        84
           1       0.72      0.62      0.67        79
           2       0.84      0.71      0.77        92
           3       0.76      0.81      0.78        88
           4       0.87      0.52      0.65        77
           5       0.71      0.72      0.71        78
           6       0.83      0.76      0.79        93

    accuracy                           0.72       591
   macro avg       0.75      0.71      0.72       591
weighted avg       0.75      0.72      0.72       591



##Testing with the label only

Following the MFCC test result, we also tested our model directly against the 'perceived emotion' label column from our own CSV dataset. Here are the key observations drawn from the result:

*   Accuracy and F1 Score: The model achieved an accuracy of 0.32 on the test data, meaning that it correctly predicted the perceived emotion for 32% of the test samples. The weighted average F1 score is 0.31. Since the F1 score combines precision and recall, this confirms that the model performs poorly against these labels: always predicting the most frequent class (27 of the 100 samples) would already reach 27% accuracy.
*   Confusion Matrix: This shows the predictions in detail. For example, the model correctly predicted class 0 twelve times, but also misclassified it as class 1 four times, class 2 three times, and so on.
*   Classification Report: For class 0, the model has a precision of 0.43, a recall of 0.44, and an F1-score of 0.44. This means that when the model predicts class 0, it is correct 43% of the time, and that it identifies 44% of all actual instances of class 0.
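
The difference between the macro and weighted averages in the classification report can be reproduced by hand. In this short sketch the per-class F1 scores and supports are copied from the report in this section's output.

```python
# Per-class F1 and support, copied from the classification report.
f1_scores = [0.44, 0.24, 0.37, 0.17, 0.17, 0.33, 0.32]
support = [27, 9, 9, 8, 19, 10, 18]

# Macro average: every class counts equally.
macro_f1 = sum(f1_scores) / len(f1_scores)

# Weighted average: each class counts in proportion to its number of samples.
weighted_f1 = sum(f * s for f, s in zip(f1_scores, support)) / sum(support)

print(round(macro_f1, 2), round(weighted_f1, 2))
```

The weighted score is slightly higher because class 0, the largest class, is also the one the model handles best.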

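From these results, we can conclude that the model performs significantly better on the MFCC test than on the perceived-emotion label test (about 0.72 versus 0.32 accuracy). Both tests use MFCC features as input, so the gap comes from what the predictions are compared against: the model was trained on the 'affect' labels, and the perceived emotion of a clip may differ from them. This suggests that the MFCC features carry useful information for predicting 'affect', but that a model trained this way transfers only weakly to perceived emotion.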
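
The comparison against the perceived-emotion column relies on a second `LabelEncoder` producing the same integer codes as the one used for training. That holds only if both columns contain the same set of class names. The sketch below shows how a mismatch would shift the codes; the label names are hypothetical, and `encoder_classes` is a hypothetical helper that mimics the sorted class order of `LabelEncoder`.

```python
def encoder_classes(labels):
    """Mimic LabelEncoder: class i is the i-th label in sorted order."""
    return sorted(set(labels))

# Hypothetical label sets; the real ones come from the two CSV columns.
affect = ["anger", "fear", "joy", "neutral", "sadness"]
perceived = ["anger", "joy", "neutral", "sadness"]   # 'fear' never appears

train_classes = encoder_classes(affect)
test_classes = encoder_classes(perceived)

if train_classes != test_classes:
    # Same integer, different emotion: code 1 means 'fear' vs 'joy' here.
    print("mismatch:", train_classes[1], "vs", test_classes[1])

# Safer: reuse the training classes to encode the evaluation labels.
code_of = {name: i for i, name in enumerate(train_classes)}
aligned = [code_of[name] for name in perceived]
print(aligned)
```

In our run both confusion matrices have seven classes, so the two encoders most likely agree, but reusing the training encoder would remove the assumption.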

In [25]:
# Load the model
model = load_model('cnn.keras')

# Load the test data
df = pd.read_csv("/content/drive/MyDrive/assigment/test_data.csv")

# Labels from the 'perceived emotion' column of the test data
labels = df['perceived emotion'].values

label_encoder = LabelEncoder()
label_encoder.fit(labels)

# Prepare the test features
X_test = np.array([mfccs for mfccs, label in mfccs_test.values()])

# Convert predicted probabilities to labels to test
y_pred_prob = model.predict(X_test)
y_pred = np.argmax(y_pred_prob, axis=1)

# Evaluate against 'perceived emotions'
perceived_emotions = df['perceived emotion']
perceived_emotions = label_encoder.transform(perceived_emotions)

# Get f1 score and accuracy
f1 = f1_score(perceived_emotions, y_pred, average='weighted')
accuracy = accuracy_score(perceived_emotions, y_pred)
print(f'F1 Score for Perceived Emotions on CNN: {f1}')
print(f'Accuracy for Perceived Emotions on CNN: {accuracy}')

# Display confusion matrix
conf_matrix = confusion_matrix(perceived_emotions, y_pred)
print('Confusion Matrix for Perceived Emotions on CNN:')
print(conf_matrix)

# Show classification report
print('Classification Report for Perceived Emotions on CNN:')
print(classification_report(perceived_emotions, y_pred))

F1 Score for Perceived Emotions on CNN: 0.31068235694124646
Accuracy for Perceived Emotions on CNN: 0.32
Confusion Matrix for Perceived Emotions on CNN:
[[12  4  3  6  0  2  0]
 [ 3  2  1  0  0  1  2]
 [ 3  0  5  0  0  0  1]
 [ 0  0  2  2  0  1  3]
 [ 3  1  4  2  2  5  2]
 [ 4  0  1  0  1  4  0]
 [ 3  1  2  5  1  1  5]]
Classification Report for Perceived Emotions on CNN:
              precision    recall  f1-score   support

           0       0.43      0.44      0.44        27
           1       0.25      0.22      0.24         9
           2       0.28      0.56      0.37         9
           3       0.13      0.25      0.17         8
           4       0.50      0.11      0.17        19
           5       0.29      0.40      0.33        10
           6       0.38      0.28      0.32        18

    accuracy                           0.32       100
   macro avg       0.32      0.32      0.29       100
weighted avg       0.37      0.32      0.31       100

