# Model Training

After the data has been pre-processed and split into train and test sets, the train set can be used to train the model. In this step, the algorithm learns a mapping from the features (the independent variables) to the output (the dependent variable). I will train multiple algorithms and compare their results.

## Artificial Neural Networks (ANN)

An artificial neural network (ANN), often simply called a neural network, is a collection of connected nodes arranged in layers. On complex data such as audio, it often outperforms traditional machine learning algorithms, although this is not guaranteed. First, I will train it with only the MFCC features, and then with the combined features.

### Train with MFCC features

In [38]:
X_train.shape[1:]

(20,)

Here, we set the input shape that the first layer of the neural network expects. The number of outputs is set to the number of classes the model has to predict.

In [39]:
input_shape = X_train.shape[1:]
n_outputs = 10

The model is initialised with the Sequential class from Keras. The first layer consists of 500 units with a ReLU activation function and the input shape defined above. The number of units in a layer is mostly found by trial and error; there is no fixed methodology, and it takes experience to understand how these numbers relate. As a rule of thumb for dense layers, the number of units starts large and decreases in later layers.

This network consists of 6 dense layers. The second, third, fourth, and fifth layers have 400, 300, 200, and 100 units respectively. All these layers use the ReLU activation function, which has performed well in my previous experiments. A dropout of 30% is added before the output layer to reduce overfitting. The output layer has 10 units, one per class, and a softmax activation function. Softmax turns the outputs into class probabilities that sum to 1, which suits multiclass classification problems.

In [40]:
model = Sequential()
model.add(Dense(500, activation='relu', input_shape=input_shape))
model.add(Dense(400, activation='relu'))
model.add(Dense(300, activation='relu'))
model.add(Dense(200, activation='relu'))
model.add(Dense(100, activation='relu'))
# drop 30% neurons
model.add(Dropout(0.3))
# output layer
model.add(Dense(n_outputs, activation='softmax'))
# check model summary
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 500)               10500     
_________________________________________________________________
dense_1 (Dense)              (None, 400)               200400    
_________________________________________________________________
dense_2 (Dense)              (None, 300)               120300    
_________________________________________________________________
dense_3 (Dense)              (None, 200)               60200     
_________________________________________________________________
dense_4 (Dense)              (None, 100)               20100     
_________________________________________________________________
dropout (Dropout)            (None, 100)               0         
_________________________________________________________________
dense_5 (Dense)              (None, 10)                1
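
The Param # column in the summary can be checked by hand: a dense layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The following is a small sketch in plain Python; the helper name `dense_params` is my own and not part of Keras.

```python
def dense_params(n_in, n_out):
    """Trainable parameters of a dense layer: weights plus one bias per unit."""
    return n_in * n_out + n_out

# Layer widths of the hidden layers above; 20 is the MFCC input dimension.
widths = [20, 500, 400, 300, 200, 100]
counts = [dense_params(a, b) for a, b in zip(widths, widths[1:])]
print(counts)  # [10500, 200400, 120300, 60200, 20100]
```

These values match the first five rows of the summary. The dropout layer has no trainable parameters, which is why its row shows 0.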

The model is compiled with a loss function, an optimiser, and an evaluation metric.
* Categorical crossentropy: A loss function for multiclass classification problems with one-hot encoded labels.
* Adam: An optimisation algorithm that combines ideas from the AdaGrad and RMSProp algorithms. It is a common default and usually works well on complex data such as images and audio.
* Accuracy: The fraction of samples that are classified correctly. It is a standard evaluation metric for classification models.
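
To make the loss and the metric concrete, here is a minimal sketch in plain Python for a single sample. It is an illustration only, not the Keras implementation; the function names and the scores are my own assumptions.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(one_hot, probs, eps=1e-7):
    """-sum(y * log(p)) for one sample with a one-hot encoded label."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))

def is_correct(one_hot, probs):
    """Accuracy counts a sample as correct if the most probable class is the label."""
    return one_hot.index(max(one_hot)) == probs.index(max(probs))

# Invented scores for one sample and three classes.
probs = softmax([2.0, 0.5, 0.1])
label = [1, 0, 0]
loss = categorical_crossentropy(label, probs)
print(probs, loss, is_correct(label, probs))
```

The loss depends only on the probability assigned to the true class, so a confident correct prediction gives a loss close to 0, while accuracy only checks which class has the highest probability.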

In [41]:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

The number of epochs depends on the complexity of the data and the model architecture. It tends to be high for complex problems such as object detection and low for simple ones; 100 is a reasonable upper limit for a mid-level problem. I also use the EarlyStopping callback from Keras, which stops the training automatically when the monitored quantity has not improved for a set number of epochs. The callback takes the quantity to monitor and a patience level. I set it to monitor the validation loss with a patience of 25, which means that training stops if the validation loss has not improved for 25 consecutive epochs. Early stopping helps to prevent overfitting of the model. The batch size is set to 32 based on the system resources and the amount of data.

In [42]:
epochs = 100
batch_size = 32
early_stop = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=25)
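
The patience rule can be sketched in plain Python. This is a simplified illustration of the idea, not the Keras source; the function name and the loss values are invented.

```python
def early_stop_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, wait = loss, 0  # improvement: remember it and reset the counter
        else:
            wait += 1             # no improvement in this epoch
            if wait >= patience:
                return epoch
    return None

# Invented validation losses: best value at epoch 3, no improvement afterwards.
losses = [1.0, 0.8, 0.6, 0.7, 0.65, 0.9]
print(early_stop_epoch(losses, patience=3))  # 6
```

Under this rule, a run that stops at epoch N with a patience of 25 had its best validation loss around epoch N - 25. Note that the callback above does not set `restore_best_weights`, so as far as I understand the model keeps the weights from the last epoch rather than from the best one.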

The fit method starts the training with these settings, and the validation set is evaluated after every epoch. The training history is saved so that it can be used for evaluation later.

In [43]:
history = model.fit(
    X_train, 
    y_train, 
    epochs=epochs, 
    batch_size=batch_size, 
    validation_data=(X_val, y_val), 
    callbacks=[early_stop]
)

Epoch 1/100
...
Epoch 55/100
Epoch 00055: early stopping
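
The object returned by fit keeps the per-epoch values in its `history` attribute, a dictionary of lists keyed by names such as `loss` and `val_loss`. A small sketch of how the best epoch could be read from it is shown below; the helper name and the numbers are made up for illustration and are not taken from this run.

```python
def best_epoch(history_dict, key="val_loss"):
    """Return (1-based epoch, value) where the given loss was lowest."""
    values = history_dict[key]
    idx = min(range(len(values)), key=values.__getitem__)
    return idx + 1, values[idx]

# Stand-in for history.history; the values are invented.
fake_history = {
    "loss": [1.9, 1.2, 0.8, 0.6],
    "val_loss": [1.7, 1.1, 0.9, 1.0],
}
print(best_epoch(fake_history))  # (3, 0.9)
```

With the real training run, the call would be `best_epoch(history.history)`.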


The model reaches a good test accuracy of about 91%. In the next section I will train with the combined features and compare the results.

In [44]:
score = model.evaluate(X_test, y_test)
print(score)

[0.5028848648071289, 0.9073226451873779]


### Train with combined features of Melspectrogram and MFCC

The train, validation, and test sets are created again from the concatenated feature array that was built in the preprocessing step, and their shapes are printed to check the dimensions. The input shape for the first layer of the neural network is updated to the new input dimension. The model architecture is kept the same, because I want to compare whether the combined features improve the accuracy.

In [45]:
# create train test validation sets on the concatenated features array
X_train, X_temp, y_train, y_temp = train_test_split(concat_arr, label_arr, test_size=0.2, random_state=1)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.2, random_state=2)
print(X_train.shape, y_train.shape)
print(X_val.shape, y_val.shape)
print(X_test.shape, y_test.shape)

# update input shape
input_shape = X_train.shape[1:]

# build model
model = Sequential()
model.add(Dense(500, activation='relu', input_shape=input_shape))
model.add(Dense(400, activation='relu'))
model.add(Dense(300, activation='relu'))
model.add(Dense(200, activation='relu'))
model.add(Dense(100, activation='relu'))
# drop 30% neurons
model.add(Dropout(0.3))
# output layer
model.add(Dense(n_outputs, activation='softmax'))
# check model summary
model.summary()

(6985, 148) (6985, 10)
(1397, 148) (1397, 10)
(350, 148) (350, 10)
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_6 (Dense)              (None, 500)               74500     
_________________________________________________________________
dense_7 (Dense)              (None, 400)               200400    
_________________________________________________________________
dense_8 (Dense)              (None, 300)               120300    
_________________________________________________________________
dense_9 (Dense)              (None, 200)               60200     
_________________________________________________________________
dense_10 (Dense)             (None, 100)               20100     
_________________________________________________________________
dropout_1 (Dropout)          (None, 100)               0         
_____________________________________________________
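
The printed set sizes follow from applying a 0.2 split twice. The sketch below reproduces them in plain Python, assuming that the held-out part is rounded up to a whole number of samples, which is consistent with the shapes above; the helper name `split_sizes` is my own.

```python
import math

def split_sizes(n_samples, held_out_fraction):
    """Return (kept, held_out) sizes, rounding the held-out part up."""
    n_held_out = math.ceil(n_samples * held_out_fraction)
    return n_samples - n_held_out, n_held_out

n_total = 6985 + 1397 + 350  # row counts from the printed shapes
n_train, n_temp = split_sizes(n_total, 0.2)
n_val, n_test = split_sizes(n_temp, 0.2)
print(n_train, n_val, n_test)  # 6985 1397 350
```

Because the second split again uses a test size of 0.2, the held-out 20% is divided into roughly 16% of the data for validation and 4% for testing, so the test score is based on a fairly small set of 350 samples.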

The compile and training settings are all the same as in the previous experiment.

In [46]:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

In [47]:
history_2 = model.fit(
    X_train, 
    y_train, 
    epochs=epochs, 
    batch_size=batch_size, 
    validation_data=(X_val, y_val), 
    callbacks=[early_stop]
)

Epoch 1/100
...
Epoch 40/100
Epoch 00040: early stopping


In [48]:
score_2 = model.evaluate(X_test, y_test)
print(score_2)

[0.37172171473503113, 0.9342857003211975]


The combined Melspectrogram and MFCC features give a very good result and improve the test accuracy to about 93%. I will save this model so that it can be used later for inference.

In [49]:
model.save('ann_detector.h5')

## Random Forest

Now, I will train a Random Forest classifier and see how it performs on this audio data. The classifier is imported from scikit-learn.

In [50]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

The train and test sets are created for the Random Forest model, starting with the MFCC features only.

In [None]:
# create train and test sets
X_train, X_test, y_train, y_test = train_test_split(mfcc_arr, label_arr, test_size=0.2, random_state=456)

The parameters for Random Forest are discussed below.
* n_estimators: The number of trees in the forest. The default is 100, but I set it to 500 because the data is high dimensional. Adding more trees does not usually cause overfitting, but it increases training time with diminishing returns, so 500 is a reasonable value.
* max_depth: The default is None, which lets each tree grow until its leaves are pure or too small to split, but I set it to 10. Increasing this value on high-dimensional data lets the model learn more complex patterns, at the cost of a much longer training time and a higher risk of overfitting.
* random_state: Seed used for the random number generator, so that the results are reproducible.
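
To show what the number of trees contributes, here is a simplified sketch of how a forest combines its trees: the class probabilities of the individual trees are averaged and the class with the highest mean is predicted. The function name and the probabilities are invented for illustration; this is not the scikit-learn code.

```python
def forest_predict(tree_probs):
    """Average per-tree class probabilities; return (predicted class, mean probabilities)."""
    n_trees = len(tree_probs)
    n_classes = len(tree_probs[0])
    mean = [sum(p[c] for p in tree_probs) / n_trees for c in range(n_classes)]
    return mean.index(max(mean)), mean

# Invented probabilities from 3 trees for one sample and 3 classes.
tree_probs = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.5, 0.4, 0.1]]
pred, mean = forest_predict(tree_probs)
print(pred, mean)
```

Averaging over more trees makes the prediction more stable, which is why n_estimators mainly trades training time for lower variance.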

In [51]:
# Random Forest parameters
rf_params = {
    'n_estimators': 500, # number of trees (default is 100)
    'max_depth': 10, # maximum depth of each tree (default is None)
    'random_state': 123 # Seed used for Random Number Generator
}
rf_model = RandomForestClassifier(**rf_params)

The fit method starts the training of the model.

In [52]:
rf_model.fit(X_train, y_train)

RandomForestClassifier(max_depth=10, n_estimators=500, random_state=123)

Let's use the predict method of the model on the test set.

In [53]:
# using the predict method on test set
rf_pred = rf_model.predict(X_test)

In [54]:
print('Accuracy score: ', accuracy_score(y_test, rf_pred))

Accuracy score:  0.48


At 48%, the accuracy is poor. Let's train it with the combined features.
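
One possible contributor to the low score, offered as an assumption rather than a diagnosis: the labels are one-hot encoded with shape (n, 10), so the Random Forest is trained as a multi-output problem in which every column is decided independently, and the accuracy score then counts a sample as correct only if the whole row matches. The plain Python sketch below illustrates this exact-match behaviour with invented data; the helper names are my own.

```python
def subset_accuracy(y_true, y_pred):
    """Exact-match accuracy: a row counts only if every column matches."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def to_class_index(one_hot_row):
    """Convert a one-hot row to an integer class label."""
    return one_hot_row.index(max(one_hot_row))

# Invented example: 4 samples, 3 classes, one-hot labels.
y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
# A multi-output prediction can contain rows with no class selected at all.
y_pred = [[1, 0, 0], [0, 0, 0], [0, 0, 1], [0, 0, 0]]
print(subset_accuracy(y_true, y_pred))      # 0.5
print([to_class_index(row) for row in y_true])  # [0, 1, 2, 1]
```

Training on integer class labels instead, for example `label_arr.argmax(axis=1)`, would make the forest predict exactly one class per sample. Whether this improves the score on this dataset is untested here.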

### Train with combined features of Melspectrogram and MFCC

The train and test sets are created from the concatenated array. This time I change some of the parameters, because the data is more complex and the model needs more capacity to fit it properly.
* n_estimators is increased to 1000 so that the forest averages over more trees.
* max_depth is increased to 20 so that each tree can learn more detailed patterns.

In [55]:
# create train and test sets
X_train, X_test, y_train, y_test = train_test_split(concat_arr, label_arr, test_size=0.2, random_state=234)
# print shapes of train and test sets
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)
# create model and set parameters
rf_params_2 = {
    'n_estimators': 1000,
    'max_depth': 20,
    'random_state': 123 # Seed used for Random Number Generator
}
rf_model_2 = RandomForestClassifier(**rf_params_2)
# train model
rf_model_2.fit(X_train, y_train)

(6985, 148) (6985, 10)
(1747, 148) (1747, 10)


RandomForestClassifier(max_depth=20, n_estimators=1000, random_state=123)

In [56]:
# using the predict method on test set
rf_pred_2 = rf_model_2.predict(X_test)
print('Accuracy score: ', accuracy_score(y_test, rf_pred_2))

Accuracy score:  0.6628506010303378


The accuracy has improved a lot with the combined features, from 48% to about 66%, but it is still lower than the ANN accuracy. The model is saved with the dump function from the joblib library.

In [57]:
import joblib

joblib.dump(rf_model_2, 'rf_detector.joblib')

['rf_detector.joblib']