<h1>Importing Libraries</h1>

First, let's import all the necessary Python libraries.

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import librosa as rosa
import os
from sklearn.utils import resample
from sklearn.model_selection import train_test_split
import tensorflow as tf
import tensorflow.keras as keras
import statistics
from tensorflow.keras.callbacks import LearningRateScheduler

<h1>Setting the Random Seeds</h1>

Training a model involves several random operations, such as shuffling the data before the training and test split and randomly initializing the weights of a neural network. These operations produce different results every time we run our code, which makes it hard to see how much the model performance changes when we change a hyperparameter value. Therefore, while tuning the hyperparameters, we set the random seeds so that the random number generators produce the same sequence of values on every run. Once we finish tuning the hyperparameters, we can remove the seeds to see how the model performs for different shuffles and weights.

In [3]:
# Set the random seeds for replicating results over multiple runs.
np.random.seed(0)
tf.random.set_seed(0)
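
As a quick illustration of what seeding does, here is a minimal sketch that uses only NumPy. The helper `shuffled_indices` is a hypothetical name introduced for this example and is not part of the notebook. Two calls with the same seed return the same permutation, while a different seed will generally return a different one.

```python
import numpy as np

def shuffled_indices(seed, n=10):
    """Return a permutation of 0..n-1 drawn from a generator seeded with `seed`."""
    rng = np.random.default_rng(seed)
    return rng.permutation(n)

run_a = shuffled_indices(seed=0)
run_b = shuffled_indices(seed=0)   # same seed, so the same shuffle
run_c = shuffled_indices(seed=1)   # different seed, generally a different shuffle

print(np.array_equal(run_a, run_b))  # True
print(run_a, run_c)
```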

<h1>Importing Features from CSV</h1>

As we have already seen in the feature extraction code, it can take a while to extract all the audio features from the audio files. Because we saved those features in a CSV file, we can directly import them in this notebook without having to go over the time-consuming process again!

In [4]:
# Import dataframe/dataset into an instance/object 'df' using Pandas. Use first row as column header and first column as row header!
df = pd.read_csv("C:/Users/rezwa/Documents/RAVDESS_Librosa_RNN.csv", header=0, index_col=0)

# We used 36 features per frame, so the dataframe holds 36*median_num_frames feature columns plus one label column ('Emotion'). The -1 excludes the label column.
median_num_frames = (df.shape[1]-1)//36

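# Map the emotion names to numeric target labels. Assigning the result back avoids relying on an in-place change to a column selection.
df['Emotion'] = df['Emotion'].replace({"Neutral" : 1.0, "Happy" : 2.0, "Sad" : 3.0, "Angry" : 4.0, "Fearful" : 5.0, "Disgust" : 6.0, "Surprised" : 7.0})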
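
The column arithmetic above can be checked on a small stand-in table. This is only a sketch: `toy_table` and `toy_num_frames` are hypothetical, and a NumPy array replaces the real dataframe so that the example runs without the CSV file.

```python
import numpy as np

NUM_FEATURES = 36     # features per frame, as stated in the notebook
toy_num_frames = 5    # hypothetical frame count, for illustration only

# Stand-in for the dataframe values: 4 samples, 36*5 feature columns + 1 label column.
toy_table = np.zeros((4, NUM_FEATURES * toy_num_frames + 1))

num_feature_columns = toy_table.shape[1] - 1   # drop the label column
# If this fails, the table is not laid out as 36 features per frame.
assert num_feature_columns % NUM_FEATURES == 0, "feature columns are not a multiple of 36"

recovered_frames = num_feature_columns // NUM_FEATURES
print(recovered_frames)  # 5
```

Adding a divisibility check like this before the integer division would catch a CSV whose width does not match the expected layout.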

<h1>Balancing the Dataset</h1>

One common problem in machine learning is dataset imbalance. A dataset is said to be imbalanced if its classes do not all have the same number of data samples. In our case, the RAVDESS dataset is slightly imbalanced because the neutral class has fewer data samples than the other classes. To balance the dataset so that the model does not become biased towards the majority classes, we will *resample* (with replacement) examples from the minority class (i.e. neutral) until its count matches that of each majority class.

An important thing to note here is that we want to resample the neutral class AFTER splitting it into training, validation, and test sets. If we resample before splitting, copies of the same data sample might end up in both the training set and the validation or test set. The model would then look better in the testing phase than it really is and struggle in the real world!
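
The leakage described above can be sketched with sample ids instead of real features. This is an illustration under assumptions, not the notebook's code: `neutral_ids` stands in for the 96 neutral samples, and 192 is the per-class count implied by 1152 majority samples spread over six classes.

```python
import numpy as np

rng = np.random.default_rng(0)
neutral_ids = np.arange(96)  # one id per original neutral sample

# Wrong order: upsample first, then split.
# Copies of one sample can land on both sides of the split.
upsampled = rng.choice(neutral_ids, size=192, replace=True)
train_bad, heldout_bad = upsampled[:154], upsampled[154:]
leaked_ids = np.intersect1d(train_bad, heldout_bad)

# Right order: split the unique samples first, then upsample each part separately.
shuffled = rng.permutation(neutral_ids)
train_ids, test_ids = shuffled[:76], shuffled[86:]
train_good = rng.choice(train_ids, size=154, replace=True)
test_good = rng.choice(test_ids, size=19, replace=True)
clean_overlap = np.intersect1d(train_good, test_good)

print("ids shared after upsample-then-split:", leaked_ids.size)
print("ids shared after split-then-upsample:", clean_overlap.size)  # 0
```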

In [5]:
# Take data samples of each class from dataframe into separate dataframes.
df_happy = df[df.Emotion==2.0]
df_sad = df[df.Emotion==3.0]
df_angry = df[df.Emotion==4.0]
df_fearful = df[df.Emotion==5.0]
df_disgust = df[df.Emotion==6.0]
df_neutral = df[df.Emotion==1.0]
df_surprised = df[df.Emotion==7.0]

# Join only the majority classes, leaving out Neutral.
df_maj = pd.concat([df_happy, df_sad, df_angry, df_fearful, df_disgust, df_surprised])

# Extract labels of majority classes.
y_maj = df_maj.iloc[0:1152, 36*median_num_frames].values
# Extract features of majority classes.
X_maj = df_maj.iloc[0:1152, list(range(36*median_num_frames))].values

# Split and stratify majority class samples for training and testing.
X_train_temp_maj, X_test_maj, y_train_temp_maj, y_test_maj = train_test_split(X_maj, y_maj, test_size=115, random_state=0, stratify=y_maj) # training split = 90%, test split = 10%

# Further split and stratify the remaining majority class samples into training and validation sets.
X_train_maj, X_val_maj, y_train_maj, y_val_maj = train_test_split(X_train_temp_maj, y_train_temp_maj, test_size=115, random_state=0, stratify=y_train_temp_maj) # training split = 80%, validation split = 10%

# Take minority data samples from dataframe to array.
neutral_array = df_neutral.to_numpy()

# Shuffle the data samples of minority class.
np.random.shuffle(neutral_array)

# Split minority class Neutral in 80:10:10 ratio.
train_neutral = neutral_array[0:76, :]
val_neutral = neutral_array[76:86, :]
test_neutral = neutral_array[86:96, :]

# Resample Neutral data to match majority class samples.
train_neutral_resampled = resample(train_neutral, n_samples=154, replace=True, random_state=0)
val_neutral_resampled = resample(val_neutral, n_samples=19, replace=True, random_state=0)
test_neutral_resampled = resample(test_neutral, n_samples=19, replace=True, random_state=0)

# Separate features and target labels for Neutral data.
X_train_neutral = train_neutral_resampled[:, 0:36*median_num_frames]
X_val_neutral = val_neutral_resampled[:, 0:36*median_num_frames]
X_test_neutral = test_neutral_resampled[:, 0:36*median_num_frames]
y_train_neutral = train_neutral_resampled[:, 36*median_num_frames]
y_val_neutral = val_neutral_resampled[:, 36*median_num_frames]
y_test_neutral = test_neutral_resampled[:, 36*median_num_frames]

# Join upsampled minority data samples with majority data samples.
X_train = np.concatenate((X_train_maj, X_train_neutral), axis=0)
X_val = np.concatenate((X_val_maj, X_val_neutral), axis=0)
X_test = np.concatenate((X_test_maj, X_test_neutral), axis=0)
y_train = np.concatenate((y_train_maj, y_train_neutral), axis=0)
y_val = np.concatenate((y_val_maj, y_val_neutral), axis=0)
y_test = np.concatenate((y_test_maj, y_test_neutral), axis=0)

<h1>Scaling the Features</h1>

In this project, we have used four different types of features. Each feature type varies over a different range of values. If we feed the features to the model exactly as we extracted them, features with larger values will tend to dominate those with smaller values. To avoid this numerical bias, we need to bring all the feature values onto a common scale. One way to do this is standardization, as shown below.

In [6]:
# Calculate the mean and standard deviation of each feature from the training set only.
mean_X = np.mean(X_train, axis=0)
std_X = np.std(X_train, axis=0)

# Standardize the inputs.
X_train_centered = (X_train - mean_X)/std_X
X_val_centered = (X_val - mean_X)/std_X
X_test_centered = (X_test - mean_X)/std_X

# Delete old variables to save space.
del X_train, X_val, X_test, X_train_temp_maj, y_train_temp_maj

print(X_train_centered.shape, y_train.shape)
print(X_val_centered.shape, y_val.shape)
print(X_test_centered.shape, y_test.shape)
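
To make the standardization step concrete, here is a small sketch on made-up numbers (the `toy` arrays are hypothetical). It shows that the statistics come from the training data alone and are then reused unchanged for the validation data.

```python
import numpy as np

# Two features on very different scales.
X_train_toy = np.array([[1.0, 100.0],
                        [2.0, 300.0],
                        [3.0, 500.0]])
X_val_toy = np.array([[2.0, 200.0]])

# Statistics are computed per column, from the training rows only.
toy_mean = X_train_toy.mean(axis=0)
toy_std = X_train_toy.std(axis=0)

train_scaled = (X_train_toy - toy_mean) / toy_std
val_scaled = (X_val_toy - toy_mean) / toy_std   # reuse the training statistics

print(train_scaled.mean(axis=0))  # approximately [0, 0]
print(train_scaled.std(axis=0))   # approximately [1, 1]
print(val_scaled)
```

After scaling, both training columns have zero mean and unit standard deviation, so neither feature dominates purely because of its units.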

<h1>One-hot Encoding the Labels</h1>

The problem with using integer encoding for our output classes is that the model might assume there is a natural ordering, or hierarchy, among the classes. This can be misleading, as the algorithm will adjust the weights accordingly. To avoid this problem, we perform one-hot encoding. This turns each label into a vector with one column per class: if an audio file is labelled happy, there will be a '1' in the 'Happy' column and zeroes in all other columns.

In [7]:
# One-Hot Encode the classes.
y_train_onehot = keras.utils.to_categorical(y_train)
y_val_onehot = keras.utils.to_categorical(y_val)
y_test_onehot = keras.utils.to_categorical(y_test)
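
The following NumPy sketch mimics one-hot encoding without calling Keras. The function `one_hot` is a hypothetical stand-in written for this example. Like the default behaviour of `to_categorical`, it creates one column for every integer from 0 up to the largest label, so labels running from 1 to 7 give 8 columns and column 0 is never set.

```python
import numpy as np

def one_hot(labels):
    """One-hot encode integer-valued labels, one column per integer from 0 to max(labels)."""
    labels = np.asarray(labels).astype(int)
    num_columns = labels.max() + 1
    return np.eye(num_columns)[labels]

toy_labels = np.array([1.0, 2.0, 7.0])  # Neutral, Happy, Surprised
encoded = one_hot(toy_labels)

print(encoded.shape)        # (3, 8)
print(encoded[:, 0].sum())  # 0.0, because no label maps to column 0
```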

<h1>Reshaping Input Arrays</h1>

In Keras, RNNs require 3D arrays (tensors) for input. The three dimensions are batch (i.e. number of data samples), timesteps, and features per timestep. We want the RNN to learn the changes of feature values with each timestep (i.e. audio frame).

In [8]:
# Reshape X_train, X_val, and X_test to 3D NumPy arrays for feeding into the RNN. RNNs require 3D array input.
X_train_3D = np.reshape(X_train_centered, (X_train_centered.shape[0], median_num_frames, 36))
X_val_3D = np.reshape(X_val_centered, (X_val_centered.shape[0], median_num_frames, 36))
X_test_3D = np.reshape(X_test_centered, (X_test_centered.shape[0], median_num_frames, 36))

print(X_train_3D.shape, y_train.shape)
print(X_val_3D.shape, y_val.shape)
print(X_test_3D.shape, y_test.shape)

# Transpose tensors to shape (samples, 36, median_num_frames) so that rows=features and columns=frames, matching the input_shape of the first LSTM layer.
X_train_3D_posed = tf.transpose(X_train_3D, perm=[0, 2, 1])
X_val_3D_posed = tf.transpose(X_val_3D, perm=[0, 2, 1])
X_test_3D_posed = tf.transpose(X_test_3D, perm=[0, 2, 1])

print(X_train_3D_posed.shape, y_train.shape)
print(X_val_3D_posed.shape, y_val.shape)
print(X_test_3D_posed.shape, y_test.shape)
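
The reshape and transpose can be traced on a tiny array. This sketch uses hypothetical sizes (3 frames, 2 features) and `np.transpose` in place of `tf.transpose`. It assumes each flat row stores its values frame by frame, which is the layout the reshape above implies.

```python
import numpy as np

toy_frames, toy_features = 3, 2  # hypothetical; the notebook uses median_num_frames and 36

# One flattened sample: [[0 1 2 3 4 5]]
flat = np.arange(toy_frames * toy_features).reshape(1, -1)

# (samples, frames, features): each row is one frame.
as_3d = flat.reshape(1, toy_frames, toy_features)
# (samples, features, frames): each row is one feature traced across frames.
transposed = np.transpose(as_3d, (0, 2, 1))

print(as_3d[0].tolist())       # [[0, 1], [2, 3], [4, 5]]
print(transposed[0].tolist())  # [[0, 2, 4], [1, 3, 5]]
```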

<h1>Defining RNN Architecture</h1>

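We will be using an LSTM network rather than a vanilla RNN because its gating mechanism mitigates the vanishing gradient problem (exploding gradients can still occur and are usually handled separately, for example with gradient clipping). We will use 36 LSTM units in the first layer, matching our 36 features at the input.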

In [9]:
# Create an instance 'model' of the 'Sequential' class.
model = keras.models.Sequential()

model.add(
    keras.layers.LSTM( units=36,
                input_shape=(36, median_num_frames),
                kernel_initializer='glorot_uniform',
                bias_initializer='zeros',
                activation='tanh',
                recurrent_activation='sigmoid',
                dropout=0.30,
                recurrent_dropout=0.30,
                return_sequences=True))

# Only the first layer needs input_shape; later layers infer their input size from the previous layer.
model.add(
    keras.layers.LSTM( units=12,
                kernel_initializer='glorot_uniform',
                bias_initializer='zeros',
                activation='tanh',
                recurrent_activation='sigmoid',
                dropout=0.30))

model.add(
    keras.layers.Dense( units=y_train_onehot.shape[1],
                kernel_initializer='glorot_uniform',
                bias_initializer='zeros',
                activation='softmax'))
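
To get a feel for the size of this network, the trainable parameters can be counted by hand. This is a sketch under assumptions: `toy_num_frames = 100` is a hypothetical stand-in for `median_num_frames`, and the output layer is assumed to have 8 units (labels 1 to 7 one-hot encoded with an unused column 0). The helper names are introduced here for illustration.

```python
def lstm_params(input_dim, units):
    """Parameters of an LSTM layer: four gates, each with an
    input kernel, a recurrent kernel, and a bias vector."""
    return 4 * ((input_dim + units) * units + units)

def dense_params(input_dim, units):
    """Parameters of a fully connected layer: weights plus biases."""
    return input_dim * units + units

toy_num_frames = 100  # hypothetical value standing in for median_num_frames

first_lstm = lstm_params(toy_num_frames, 36)  # input per timestep has toy_num_frames values
second_lstm = lstm_params(36, 12)             # fed by the 36 outputs of the first layer
output_layer = dense_params(12, 8)            # fed by the 12 outputs of the second layer

print(first_lstm, second_lstm, output_layer)  # 19728 2352 104
```

With the real model, `model.summary()` reports the actual counts once the model is built.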

<h1>Defining the Optimizer and Loss Function</h1>

We will use the Adam optimizer and the categorical crossentropy loss. We will also use a learning rate schedule, a function that slowly decreases the learning rate as training progresses. This reduces the risk of overshooting a minimum of the loss because of a high learning rate during the final stage of training.

In [10]:
# Define the learning rate schedule. This can then be passed as the learning rate for the optimizer.
lrate = keras.optimizers.schedules.InverseTimeDecay(initial_learning_rate=0.01, decay_steps=1000, decay_rate=0.8)

adam_optimizer = keras.optimizers.Adam(learning_rate=lrate)

model.compile(optimizer=adam_optimizer, loss='categorical_crossentropy', metrics=[keras.metrics.CategoricalAccuracy()])
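
The schedule can be reproduced in plain Python to see how quickly the learning rate falls. This sketch assumes the non-staircase form of inverse time decay, `initial_lr / (1 + decay_rate * step / decay_steps)`, where a step is one batch update. The training set size of 1076 (922 majority plus 154 resampled neutral samples) is derived from the split sizes in the balancing cell and is an assumption about the actual data. The batch size of 16 and 50 epochs come from the fitting cell.

```python
import math

def inverse_time_decay(step, initial_lr=0.01, decay_steps=1000, decay_rate=0.8):
    """Learning rate after `step` optimizer updates (non-staircase inverse time decay)."""
    return initial_lr / (1.0 + decay_rate * step / decay_steps)

# Assumed sizes: 1076 training samples, batch size 16, 50 epochs.
steps_per_epoch = math.ceil(1076 / 16)
total_steps = steps_per_epoch * 50

for step in (0, 1000, total_steps):
    print(step, round(inverse_time_decay(step), 6))
```

Under these assumptions the learning rate at the end of training is a little over a quarter of its initial value.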

<h1>Fitting the Model</h1>

Now, we will fit the RNN to the training data and observe its performance on the validation data. As the training progresses, we will print out the performance metrics for each epoch to observe the learning process. You can try improving the model performance by changing the hyperparameters and comparing the training and validation accuracies.

In [None]:
# Train the RNN.
history = model.fit(X_train_3D_posed, y_train_onehot, batch_size=16, epochs=50, verbose=1, validation_data=(X_val_3D_posed, y_val_onehot)) # 80% training / 10% validation

print(history.history)

<h1>Plotting the Accuracy Curves</h1>



In [None]:
# Plot the training and validation accuracies vs. epochs.
fig = plt.figure()
plt.plot(history.history['categorical_accuracy'])
plt.plot(history.history['val_categorical_accuracy'])
plt.title('RNN_RAVDESS')
plt.grid()
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

<h1>Plotting the Loss Curves</h1>

In [None]:
# Plot the training and validation losses vs. epochs.
fig = plt.figure()
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('RNN_RAVDESS')
plt.grid()
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

<h1>Evaluating the Model on Test Set</h1>

Once we are satisfied with the hyperparameter values, we can see how our model performs on the test set.

In [None]:
# Evaluate the model on the test data using `evaluate`.
results = model.evaluate(X_test_3D_posed, y_test_onehot, batch_size=16)
print("test loss, test acc:", results)
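
The accuracy reported here is categorical accuracy: the fraction of clips whose highest-probability class matches the one-hot label. A minimal NumPy sketch of that calculation, using made-up softmax outputs (`toy_probs` and `toy_onehot` are hypothetical):

```python
import numpy as np

# Hypothetical softmax outputs for three clips over three classes.
toy_probs = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.3, 0.6],
                      [0.4, 0.5, 0.1]])
# Matching one-hot labels.
toy_onehot = np.array([[1, 0, 0],
                       [0, 0, 1],
                       [1, 0, 0]])

predicted = toy_probs.argmax(axis=1)  # most probable class per clip
actual = toy_onehot.argmax(axis=1)    # position of the 1 in each label
accuracy = float((predicted == actual).mean())

print(predicted.tolist(), actual.tolist())  # [0, 2, 1] [0, 2, 0]
print(accuracy)                             # two of three correct
```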

<h1>Saving the Model</h1>

The final step is to save the model as a separate file so that it can later be imported into the deployment code. Once you run the final cell, three new files (RNN_RAVDESS.h5, mean_X.npy, and std_X.npy) should be generated in the same directory as this notebook.

In [112]:
# Save the model as an h5 file.
model.save('RNN_RAVDESS.h5')

# Save mean and standard deviation arrays of features to npy files for standardizing data in other files!
np.save('mean_X.npy', mean_X)
np.save('std_X.npy', std_X)
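
The deployment code is not shown here, so the following is only a sketch of how the saved statistics might be reloaded and applied to new data. The arrays are made-up values, and a temporary directory is used so that the example does not touch the real files.

```python
import os
import tempfile
import numpy as np

toy_mean = np.array([2.0, 300.0])  # hypothetical per-feature training means
toy_std = np.array([0.5, 100.0])   # hypothetical per-feature training standard deviations

with tempfile.TemporaryDirectory() as tmp_dir:
    # Training side: save the statistics.
    np.save(os.path.join(tmp_dir, "mean_X.npy"), toy_mean)
    np.save(os.path.join(tmp_dir, "std_X.npy"), toy_std)
    # Deployment side: load them back.
    loaded_mean = np.load(os.path.join(tmp_dir, "mean_X.npy"))
    loaded_std = np.load(os.path.join(tmp_dir, "std_X.npy"))

# Standardize a new sample with the training statistics, exactly as during training.
new_clip = np.array([[3.0, 100.0]])
new_clip_scaled = (new_clip - loaded_mean) / loaded_std
print(new_clip_scaled.tolist())  # [[2.0, -2.0]]
```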