## Keyword Spotting with Spectrograms

Train and evaluate a model for keyword spotting on the ***Mini Speech Commands*** dataset.

### 1.1 ***Data Ingestion and Pre-processing***

Compute resized spectrogram features with the following hyperparameters:
- Downsampling rate -> `16000 Hz`
- STFT frame length -> `40ms`
- STFT frame overlap -> `50%`
- Resize to `32x32`


In [1]:
import tensorflow as tf


Note: if the frame length is `40ms` and we want a frame overlap of `50%`, then the frame step is half of the frame length (`20ms`).
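The arithmetic behind these settings can be sketched in plain Python. This is a minimal sketch under two assumptions that are not stated in the notebook: each clip is exactly one second long (16000 samples after downsampling), and the FFT length equals the frame length. The helper name `stft_shape` is hypothetical.

```python
def stft_shape(num_samples, sampling_rate, frame_length_in_s, frame_step_in_s):
    """Return (num_frames, num_frequency_bins) of an STFT without padding.

    Assumes the FFT length equals the frame length (an assumption, see above).
    """
    # Convert durations in seconds to a number of samples
    frame_length = round(sampling_rate * frame_length_in_s)
    frame_step = round(sampling_rate * frame_step_in_s)

    # Number of full frames that fit in the signal
    num_frames = 1 + (num_samples - frame_length) // frame_step
    # A real-valued FFT of N points yields N // 2 + 1 frequency bins
    num_bins = frame_length // 2 + 1
    return num_frames, num_bins


# One-second clip at 16 kHz, 40 ms frames, 20 ms step (50% overlap)
shape = stft_shape(16000, 16000, 0.04, 0.02)
print(shape)
```

Under these assumptions a 40 ms frame is 640 samples and a 20 ms step is 320 samples.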

In [14]:
PREPROCESSING_ARGS = {
    'downsampling_rate' : 16000,
    'frame_lenght_in_s' : 0.04,
    'frame_step_in_s' : 0.02,
}

TRAINING_ARGS = {
    'batch_size' : 20,
    'initial_learining_rate' : 0.01,
    'end_learining_rate' : 1.e-5,
    'epochs' : 10,
}

We create `Train`, `Val` and `Test` datasets:

In [3]:
!ls /datasets/minispeechcommands

msc-test.zip  msc-train.zip  msc-val.zip


In [4]:
#!unzip -q /datasets/minispeechcommands/msc-test.zip
#!unzip -q /datasets/minispeechcommands/msc-train.zip
#!unzip -q /datasets/minispeechcommands/msc-val.zip

In [5]:
train_ds = tf.data.Dataset.list_files('msc-train/*')
val_ds = tf.data.Dataset.list_files('msc-val/*')
test_ds = tf.data.Dataset.list_files('msc-test/*')

from preprocessing import LABELS
from preprocessing import get_spectrogram
from functools import partial

def get_spectrogram_and_labels(filename, downsampling_rate, frame_lenght_in_s, frame_step_in_s):
    
    spectrogram, sampling_rate, label = get_spectrogram(filename, downsampling_rate, frame_lenght_in_s, frame_step_in_s)

    return spectrogram, label

# partial freezes the pre-processing arguments, so only the filename has to be passed as input
get_frozen_spectrogram = partial(get_spectrogram_and_labels, **PREPROCESSING_ARGS)

for spectrogram, label in train_ds.map(get_frozen_spectrogram).take(1):
    SHAPE = spectrogram.shape


Shape of the spectrogram returned by the pre-processing step:

In [6]:
print(SHAPE)

(49, 321)


In [7]:
def preprocess(filename):
    signal, label = get_frozen_spectrogram(filename)

    print(type(signal))
    signal.set_shape(SHAPE)
    signal = tf.expand_dims(signal, -1)

    # we resize the signal to the 32x32 shape
    signal = tf.image.resize(signal, [32,32])

    label_id = tf.argmax(label == LABELS)

    return signal, label_id

batch_size = TRAINING_ARGS['batch_size']
epochs = TRAINING_ARGS['epochs']

train_ds = train_ds.map(preprocess).batch(batch_size).cache()
val_ds = val_ds.map(preprocess).batch(batch_size).cache()
test_ds = test_ds.map(preprocess).batch(batch_size).cache()


<class 'tensorflow.python.framework.ops.Tensor'>
<class 'tensorflow.python.framework.ops.Tensor'>

In [9]:
for example_batch, example_labels in train_ds.take(1):
    print('Batch Shape:', example_batch.shape)
    print('Data Shape:', example_batch.shape[1:])
    print('Labels:', example_labels)

Batch Shape: (20, 32, 32, 1)
Data Shape: (32, 32, 1)
Labels: tf.Tensor([7 1 6 2 2 3 2 3 3 0 4 7 4 1 0 1 2 2 7 5], shape=(20,), dtype=int64)
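The integer labels in the batch come from `tf.argmax(label == LABELS)` in `preprocess`. The idea can be illustrated with a NumPy stand-in for the TensorFlow call. The contents of `LABELS` below are an assumption made for illustration only; the real list is defined in the `preprocessing` module and is not shown in this notebook. The helper name `label_to_id` is hypothetical.

```python
import numpy as np

# Hypothetical label list, for illustration only;
# the real LABELS is imported from the preprocessing module
LABELS = np.array(['down', 'go', 'left', 'no', 'right', 'stop', 'up', 'yes'])


def label_to_id(label):
    # label == LABELS is a boolean mask with True at the matching position;
    # argmax returns the index of that first True entry
    return int(np.argmax(label == LABELS))


label_id = label_to_id('left')
print(label_id)
```

In this NumPy version, a label that is missing from `LABELS` produces an all-`False` mask, so `argmax` returns `0` instead of raising an error. It may be worth validating labels before relying on the ids.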


### 1.2 ***Model Creation***

We build a Convolutional Neural Network (CNN) with the `tf.keras.Sequential` class, which groups a linear stack of layers into a `tf.keras.Model`.

In [10]:
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=example_batch.shape[1:]),
    tf.keras.layers.Conv2D(filters=128, kernel_size=[3,3], strides=[2,2], use_bias=False, padding='valid'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.ReLU(),
    tf.keras.layers.Conv2D(filters=128, kernel_size=[3,3], strides=[1,1], use_bias=False, padding='same'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.ReLU(),
    tf.keras.layers.Conv2D(filters=128, kernel_size=[3,3], strides=[1,1], use_bias=False, padding='same'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.ReLU(),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(units=len(LABELS)),
    tf.keras.layers.Softmax()
])

In [11]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 15, 15, 128)       1152      
                                                                 
 batch_normalization (BatchN  (None, 15, 15, 128)      512       
 ormalization)                                                   
                                                                 
 re_lu (ReLU)                (None, 15, 15, 128)       0         
                                                                 
 conv2d_1 (Conv2D)           (None, 15, 15, 128)       147456    
                                                                 
 batch_normalization_1 (Batc  (None, 15, 15, 128)      512       
 hNormalization)                                                 
                                                                 
 re_lu_1 (ReLU)              (None, 15, 15, 128)       0
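The output shapes and parameter counts in the summary can be checked by hand. The following is a small sketch of that arithmetic with hypothetical helper names. It relies on the standard formulas for convolution output size and on `BatchNormalization` storing four values per channel (gamma, beta, moving mean, moving variance).

```python
def conv_output_size(input_size, kernel_size, stride, padding):
    """Spatial output size of a convolution along one dimension."""
    if padding == 'valid':
        return (input_size - kernel_size) // stride + 1
    # 'same' padding: ceil(input_size / stride)
    return -(-input_size // stride)


def conv_params(kernel_size, in_channels, filters, use_bias=False):
    """Number of parameters of a square-kernel Conv2D layer."""
    weights = kernel_size * kernel_size * in_channels * filters
    return weights + (filters if use_bias else 0)


def batch_norm_params(channels):
    """gamma, beta, moving mean and moving variance for each channel."""
    return 4 * channels


# First convolution: 32x32x1 input, 3x3 kernel, stride 2, 'valid' padding
first_size = conv_output_size(32, 3, 2, 'valid')
# Second convolution: 3x3 kernel, stride 1, 'same' padding
second_size = conv_output_size(first_size, 3, 1, 'same')

print(first_size, second_size)
print(conv_params(3, 1, 128), conv_params(3, 128, 128), batch_norm_params(128))
```

Since every convolution uses `use_bias=False`, its parameter count is only the kernel weights; the bias role is covered by the `beta` term of the following batch normalization.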

### 1.3 ***Model Training***

We now train the model created above, using the following setup:
- `SparseCategoricalCrossentropy` with `from_logits=False` as the loss function (the model already ends with a `Softmax` layer)
- `Adam` as the optimizer, with a linear decay schedule for the learning rate
- `SparseCategoricalAccuracy` to evaluate the prediction quality
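The linear decay can be sketched in plain Python. This follows the formula documented for `PolynomialDecay` with its default `power=1.0` and `cycle=False`. The value of 320 steps per epoch is an assumption used only for illustration; in the notebook it is `len(train_ds)`.

```python
def linear_decay(step, initial_learning_rate, end_learning_rate, decay_steps, power=1.0):
    """Learning rate at a given training step (polynomial decay, no cycling)."""
    # After decay_steps the learning rate stays at end_learning_rate
    step = min(step, decay_steps)
    remaining = 1.0 - step / decay_steps
    return (initial_learning_rate - end_learning_rate) * remaining ** power + end_learning_rate


steps_per_epoch = 320  # assumed value, stands in for len(train_ds)
epochs = 10
decay_steps = steps_per_epoch * epochs

for epoch in range(0, epochs + 1, 5):
    lr = linear_decay(epoch * steps_per_epoch, 0.01, 1e-5, decay_steps)
    print(f'epoch {epoch:2d}: learning rate = {lr:.6f}')
```

Setting `decay_steps` to the total number of training steps makes the learning rate reach its final value exactly at the end of the last epoch.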

In [15]:
loss = tf.losses.SparseCategoricalCrossentropy(from_logits=False)
initial_learning_rate = TRAINING_ARGS['initial_learining_rate']
end_learining_rate = TRAINING_ARGS['end_learining_rate']

# note: by setting decay_steps to the total number of training steps,
# the learning rate decreases monotonically over the whole training run

linear_decay = tf.keras.optimizers.schedules.PolynomialDecay(
    initial_learning_rate=initial_learning_rate,
    end_learning_rate=end_learining_rate,
    decay_steps=len(train_ds) * epochs
)

optimizer = tf.optimizers.Adam(learning_rate=linear_decay)
metrics = [tf.metrics.SparseCategoricalAccuracy()]
model.compile(loss=loss, optimizer=optimizer, metrics=metrics)

history = model.fit(train_ds, epochs=epochs, validation_data=val_ds)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [16]:
history.history

{'loss': [0.3424413204193115,
  0.28197720646858215,
  0.22325679659843445,
  0.19155332446098328,
  0.16407480835914612,
  0.1558287888765335,
  0.12745708227157593,
  0.10640266537666321,
  0.09218808263540268,
  0.08317052572965622],
 'sparse_categorical_accuracy': [0.885937511920929,
  0.9051562547683716,
  0.9279687404632568,
  0.9379687309265137,
  0.9514062404632568,
  0.9528124928474426,
  0.964062511920929,
  0.9710937738418579,
  0.9759374856948853,
  0.9785937666893005],
 'val_loss': [2.966348648071289,
  1.7325356006622314,
  1.4478559494018555,
  0.786061704158783,
  1.367796778678894,
  0.837324321269989,
  0.650453507900238,
  0.5616068840026855,
  0.46985331177711487,
  0.4382949471473694],
 'val_sparse_categorical_accuracy': [0.5174999833106995,
  0.5887500047683716,
  0.6474999785423279,
  0.768750011920929,
  0.6650000214576721,
  0.78125,
  0.8224999904632568,
  0.84375,
  0.8700000047683716,
  0.8700000047683716]}

### 1.4 ***Model Testing***

We now evaluate the model on `test_ds`.

In [17]:
test_loss, test_accuracy = model.evaluate(test_ds)



In [18]:
training_loss = history.history['loss'][-1]
training_accuracy = history.history['sparse_categorical_accuracy'][-1]
val_loss = history.history['val_loss'][-1]
val_accuracy = history.history['val_sparse_categorical_accuracy'][-1]

print(f'Training Loss: {training_loss:.4f}')
print(f'Training Accuracy: {training_accuracy*100.:.2f}%')
print()
print(f'Validation Loss: {val_loss:.4f}')
print(f'Validation Accuracy: {val_accuracy*100.:.2f}%')
print()
print(f'Test Loss: {test_loss:.4f}')
print(f'Test Accuracy: {test_accuracy*100.:.2f}%')

Training Loss: 0.0832
Training Accuracy: 97.86%

Validation Loss: 0.4383
Validation Accuracy: 87.00%

Test Loss: 0.4258
Test Accuracy: 87.63%


### 1.5 ***Model Saving***

Finally, we save the model:

In [20]:
from time import time
import os

timestamp = int(time())

saved_model_dir = f'./saved_models/{timestamp}'
if not os.path.exists(saved_model_dir):
    os.makedirs(saved_model_dir)
model.save(saved_model_dir)

INFO:tensorflow:Assets written to: ./saved_models/1670756047/assets


We also append the hyperparameters and the resulting test accuracy to a CSV file:

In [21]:
import pandas as pd

output_dict = {
    'timestamp': timestamp,
    **PREPROCESSING_ARGS,
    **TRAINING_ARGS,
    'test_accuracy': test_accuracy
}

df = pd.DataFrame([output_dict])

output_path='./spectrogram_results.csv'
df.to_csv(output_path, mode='a', header=not os.path.exists(output_path), index=False)
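The `header=not os.path.exists(output_path)` argument writes the column names only when the file is first created, so repeated runs add one row each. The same append pattern is sketched here with the standard-library `csv` module instead of pandas, writing to a temporary directory. The helper name `append_row` and the row values are placeholders, not results from this notebook.

```python
import csv
import os
import tempfile


def append_row(path, row):
    """Append one result row, writing the header only for a new file."""
    write_header = not os.path.exists(path)
    with open(path, 'a', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if write_header:
            writer.writeheader()
        writer.writerow(row)


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'results.csv')
    # Placeholder rows standing in for two separate training runs
    append_row(path, {'timestamp': 1, 'test_accuracy': 0.5})
    append_row(path, {'timestamp': 2, 'test_accuracy': 0.6})
    with open(path, newline='') as f:
        lines = f.read().splitlines()

print(lines)
```

This pattern assumes every run writes the same columns in the same order; if a hyperparameter is added later, old and new rows no longer line up with the header.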
