# Neural Network for Keyword Spotting on Microcontrollers

This notebook provides the code for training a Neural Network to recognize spoken words. This task is commonly referred to as Keyword Spotting (KWS).

The goal of this notebook is to build a small enough model to be executed on microcontrollers, where computational power, energy consumption and memory availability are constraints to be taken into account.

## A note on datasets

To train the network in this notebook, you need a dataset ready to be processed. This notebook requires an audio dataset made of 1-second-long audio samples converted into MFCC spectrograms of shape (49, 40, 1): each spectrogram is a 49x40 image with a single channel (grayscale). The spectrograms must be saved in .npz files.

A notebook to convert audio data into a dataset ready to be processed by this notebook is provided.
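
As a quick illustration of the expected format, here is a minimal sketch that writes and reads back a tiny .npz file. The data is random and the file name `example.npz` is made up; the keys `arr_0` (spectrograms) and `arr_1` (labels) are the ones this notebook reads later on.

```python
import os
import tempfile
import numpy as np

# Synthetic stand-in for real MFCC data: 5 spectrograms of shape (49, 40, 1)
# and 5 integer class labels. Real data comes from the conversion notebook.
rng = np.random.default_rng(0)
spectrograms = rng.standard_normal((5, 49, 40, 1)).astype(np.float32)
labels = np.array([0, 1, 2, 1, 0])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.npz")
    # Arrays passed positionally are stored under the keys 'arr_0', 'arr_1', ...
    np.savez(path, spectrograms, labels)

    with np.load(path) as npz:
        x, y = npz["arr_0"], npz["arr_1"]

print(x.shape, y.shape)
```

If your own files use different keys or a different shape, the loading cell and the data generator below need to be adapted accordingly.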

## Libraries Import

First of all, let's import the needed libraries.

In [None]:
%pip install tensorflow==2.8.2

#Tensorflow import
import tensorflow as tf
#Numpy import
import numpy as np
#Matplotlib import
import matplotlib as mpl
import matplotlib.pyplot as plt
#Math import
import math

import os
import random
import pandas as pd
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.metrics import confusion_matrix

import shutil
tfk = tf.keras
tfkl = tf.keras.layers
print(tf.__version__)


2.8.2


Before continuing, let's set the seed of the random number generators. This gives us reproducible results across different executions of this notebook.

In [None]:
# Random seed for reproducibility

seed = 22  # Choose a fixed seed to have reproducible results (22 = Gonzales or Chiesa)

random.seed(seed)
os.environ['PYTHONHASHSEED'] = str(seed)
np.random.seed(seed)
tf.random.set_seed(seed)
tf.compat.v1.set_random_seed(seed)
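
To see what seeding buys us, here is a small sketch using only `random` and NumPy (TensorFlow is left out to keep it short). The helper `draw` is hypothetical and not part of the notebook: re-seeding with the same value reproduces the same sequence, while a different seed does not.

```python
import random
import numpy as np

def draw(seed):
    """Re-seed both generators and draw a few numbers from each."""
    random.seed(seed)
    np.random.seed(seed)
    return [random.random() for _ in range(3)], np.random.rand(3)

py_a, np_a = draw(22)
py_b, np_b = draw(22)
py_c, np_c = draw(23)

print(py_a == py_b, np.array_equal(np_a, np_b))  # same seed, same sequence
print(py_a == py_c)                              # different seed, different sequence
```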

## Dataset Import and Loading

If the dataset that you want to use is located in your Google Drive, execute the following cell to get access to the drive.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


Unpack the dataset:

In [None]:
#If the dataset is in your Google Drive:
shutil.unpack_archive("/content/drive/MyDrive/sheila_normalized_dataset.zip", "dataset")
#If the dataset has to be uploaded:
#shutil.unpack_archive("/content/sheila_normalized_dataset.zip", "dataset")

Read the .json file associated with the dataset:

In [None]:
import json

# Opening JSON file
with open("dataset/content/dataset_info.json", 'r') as openfile:

    # Reading from json file
    dataset_info = json.load(openfile)

print(dataset_info)

wanted_words = dataset_info['classes']
n_train_samples = dataset_info['train_samples_num']
n_testing_samples = dataset_info['testing_samples_num']
n_validation_samples = dataset_info['validation_samples_num']

{'classes': ['silence', 'unknown', 'sheila'], 'train_samples_num': 8109, 'testing_samples_num': 1015, 'validation_samples_num': 1013, 'representative_samples_num': 103, 'data_shape': [49, 40, 1]}


The dataset contains training, testing and validation sets. It also provides a representative dataset if a quantization of the model needs to be performed.
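
The sample counts printed above correspond to roughly an 80/10/10 split. The short sketch below recomputes the proportions from those counts; the representative set is left out here because the notebook does not say whether it overlaps with the other sets.

```python
# Counts copied from the dataset_info output above
split_sizes = {
    "train_samples_num": 8109,
    "testing_samples_num": 1015,
    "validation_samples_num": 1013,
}

total = sum(split_sizes.values())
for name, count in split_sizes.items():
    print(f"{name}: {count} samples ({100 * count / total:.1f}%)")
print("total:", total)
```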

Load each set into X (inputs) and y (outputs) arrays.

In [None]:
# Loading .npz files
train_dir = "/content/dataset/content/train.npz"
training_npz = np.load(train_dir)
x_train, y_train = training_npz['arr_0'], training_npz['arr_1']

val_dir = "/content/dataset/content/validation.npz"
validation_npz = np.load(val_dir)
x_val, y_val = validation_npz['arr_0'], validation_npz['arr_1']

testing_dir = "/content/dataset/content/testing.npz"
testing_npz = np.load(testing_dir)
x_test, y_test = testing_npz['arr_0'], testing_npz['arr_1']

representative_dir = "/content/dataset/content/representative.npz"
representative_npz = np.load(representative_dir)
x_rep, y_rep = representative_npz['arr_0'], representative_npz['arr_1']
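
Once the arrays are loaded, it is worth checking how many samples each class has. The sketch below uses a made-up label array in place of `y_train` and assumes the labels are integer class indices (which is how the data generator below treats them); if your labels are stored as floats or with an extra dimension, convert them first with `y.astype(int).ravel()`.

```python
import numpy as np

# Hypothetical labels standing in for y_train (integer class indices)
class_names = ["silence", "unknown", "sheila"]
y_example = np.array([0, 1, 1, 2, 2, 2, 1, 0, 2, 1])

# Count the occurrences of each class index
counts = np.bincount(y_example, minlength=len(class_names))
for name, count in zip(class_names, counts):
    print(f"{name}: {count}")
```

A strongly unbalanced count is a hint that accuracy alone may be a misleading metric.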

## Neural Network Design

The next section will allow you to design a Neural Network. There is no golden rule, so feel free to experiment with different architectures.

Since MFCC spectrograms can be considered images, we will perform an image classification task, trying to associate each spectrogram with the word it represents. Convolutional Neural Networks have shown very good results in accomplishing image classification tasks.

The first thing to do is define a Data Generator: a class (a subclass of `tf.keras.utils.Sequence`) that takes care of feeding batches of data to the Neural Network during training and evaluation.

In [None]:
class DataGenerator(tfk.utils.Sequence):
    'Generates data for Keras'
    def __init__(self, data, labels, n_samples, batch_size, dim, n_channels,
                 n_classes, shuffle=True):
        'Initialization'
        self.dim = dim
        self.batch_size = batch_size
        self.data = data
        self.labels = labels
        self.n_samples = n_samples
        self.n_channels = n_channels
        self.n_classes = n_classes
        self.shuffle = shuffle
        self.on_epoch_end()

    def __len__(self):
        'Denotes the number of batches per epoch'
        return int(np.floor(self.n_samples / self.batch_size))

    def __getitem__(self, index):
        'Generate one batch of data'
        # Generate indexes of the batch
        indexes = self.indexes[index*self.batch_size:(index+1)*self.batch_size]
        # Find list of IDs
        samples_list_temp = indexes
        # Generate data
        X, y = self.__data_generation(samples_list_temp)

        return X, y

    def on_epoch_end(self):
        'Updates indexes after each epoch'
        self.indexes = np.arange(self.n_samples)
        if self.shuffle == True:
            np.random.shuffle(self.indexes)

    def __data_generation(self, samples_list_temp):
        'Generates data containing batch_size samples' # X : (n_samples, *dim, n_channels)
        # Initialization
        X = np.empty((self.batch_size, *self.dim, self.n_channels))
        y = np.empty((self.batch_size), dtype=int)

        # Generate data
        for i, sample in enumerate(samples_list_temp):
            # Store sample, reshaped to (height, width, channels)
            mfcc = self.data[sample].reshape(*self.dim, self.n_channels)
            X[i,] = mfcc
            # Store class
            y[i] = self.labels[sample]

        return X, tfk.utils.to_categorical(y, num_classes=self.n_classes)
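
The generator returns the labels in one-hot form, which is what the categorical cross-entropy loss expects. As a rough NumPy equivalent of that last step (the helper `one_hot` is hypothetical, written only for illustration):

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) float32 one-hot matrix."""
    labels = np.asarray(labels, dtype=int)
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(num_classes, dtype=np.float32)[labels]

encoded = one_hot([0, 2, 1], num_classes=3)
print(encoded)
```

Each row contains a single 1 in the column of the sample's class.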

Now we instantiate the generators for each set: training, testing and validation.

In this section we also specify the batch size to be used during training. The batch size is the number of training samples that the network processes before updating its weights.
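
Because the generator's `__len__` rounds the number of batches down, a few samples are left out of every epoch. The sketch below works this out for a batch size of 8 and the sample counts reported in `dataset_info`.

```python
batch_size = 8
split_sizes = {"train": 8109, "validation": 1013, "testing": 1015}

# Same rule as DataGenerator.__len__: only full batches are produced
batches = {name: n // batch_size for name, n in split_sizes.items()}
dropped = {name: n - batches[name] * batch_size for name, n in split_sizes.items()}

for name in split_sizes:
    print(f"{name}: {batches[name]} batches, {dropped[name]} samples left out per epoch")
```

So one training epoch performs 1013 weight updates. Since the indexes are reshuffled at the end of each epoch, the samples that are left out change from one epoch to the next.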

In [None]:
batch_size = 8
n_classes = len(wanted_words)
spectrogram_size = (49,40,)
spectrogram_channels = 1

# Parameters
params = {'dim': spectrogram_size,
          'batch_size': batch_size,
          'n_classes': n_classes,
          'n_channels': spectrogram_channels,
          'shuffle': True}


# Generators
training_generator = DataGenerator(x_train, y_train, n_samples=n_train_samples, **params)
validation_generator = DataGenerator(x_val, y_val, n_samples=n_validation_samples, **params)
testing_generator = DataGenerator(x_test, y_test, n_samples=n_testing_samples, **params)

example_spectrogram = training_generator[0][0]  # inputs of the first batch
print("Neural Network input shape: " + str(example_spectrogram.shape))

Neural Network input shape: (8, 49, 40, 1)


It is now time to build the Neural Network.

In [None]:
input_shape = (49, 40, 1) #(*spectrogram_size, spectrogram_channels) #do not modify

# Assign the name you want to your model
model_name = 'Sheila-NormDoubleConvModel'


from tensorflow.keras import layers, models

model = models.Sequential()
#model.add(layers.Input(shape=(1960)))
#model.add(layers.Reshape([49,40,1]))
model.add(layers.Conv2D(4, (4, 10), strides = (2, 2), activation='relu', input_shape = input_shape))
model.add(layers.GlobalAveragePooling2D())
model.add(layers.Dense(units=n_classes,  # one output per class (3 here)
                  activation='softmax',
                  kernel_initializer=tfk.initializers.GlorotUniform(seed),
                  use_bias = True,
                  name='Output'))

optimizer = tfk.optimizers.Adam(learning_rate=0.0001)

# Compile the model
model.compile(loss=tfk.losses.CategoricalCrossentropy(),
              optimizer=optimizer,
              metrics=['accuracy'])



Print a summary of the network we just built and compiled, showing the layers, the output shape of each layer and the number of parameters.

In [None]:
#model = build_model(input_shape)
#model.build((-1,49,40,1))
model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_1 (Conv2D)           (None, 23, 16, 4)         164       
                                                                 
 global_average_pooling2d_1   (None, 4)                0         
 (GlobalAveragePooling2D)                                        
                                                                 
 Output (Dense)              (None, 3)                 15        
                                                                 
Total params: 179
Trainable params: 179
Non-trainable params: 0
_________________________________________________________________
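
The numbers in the summary can be checked by hand. The sketch below recomputes the output shape of the convolution (Keras `Conv2D` uses 'valid' padding by default, i.e. no padding) and the parameter counts of both layers; the helper `conv2d_valid_output` is hypothetical.

```python
def conv2d_valid_output(size, kernel, stride):
    """Output length of a 'valid' (no padding) convolution along one axis."""
    return (size - kernel) // stride + 1

height = conv2d_valid_output(49, kernel=4, stride=2)
width = conv2d_valid_output(40, kernel=10, stride=2)

filters, in_channels, n_classes = 4, 1, 3
conv_params = 4 * 10 * in_channels * filters + filters  # kernel weights + biases
dense_params = filters * n_classes + n_classes          # weights + biases

print((height, width, filters))
print(conv_params, dense_params, conv_params + dense_params)
```

Global average pooling then reduces each of the 4 feature maps to a single value, which is why the dense layer only needs 4 x 3 weights plus 3 biases.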


## Training the Neural Network

This section will train the neural network.

First of all, we define some callback functions to be executed at the end of each epoch. Remember that an epoch is a single pass through the entire dataset during training.

In [None]:
# Utility function to create folders and callbacks for training
from datetime import datetime

def create_folders_and_callbacks(model_name):

  exps_dir = "/content/callback_folder"
  now = datetime.now().strftime('%b%d_%H-%M-%S')
  exp_dir = os.path.join(exps_dir, model_name + '_' + str(now))

  callbacks = []

  # Model checkpoint
  # ----------------
  ckpt_dir = os.path.join(exp_dir, 'ckpts')
  os.makedirs(ckpt_dir, exist_ok=True)  # also creates exps_dir and exp_dir

  ckpt_callback = tf.keras.callbacks.ModelCheckpoint(filepath=os.path.join(ckpt_dir, 'cp.ckpt'),
                                                     save_weights_only=False, # True to save only weights
                                                     save_best_only=False) # True to save only the best epoch
  callbacks.append(ckpt_callback)

  # Visualize Learning on Tensorboard
  # ---------------------------------
  tb_dir = os.path.join(exp_dir, 'tb_logs')
  os.makedirs(tb_dir, exist_ok=True)

  # By default shows losses and metrics for both training and validation
  tb_callback = tf.keras.callbacks.TensorBoard(log_dir=tb_dir,
                                               profile_batch=0,
                                               histogram_freq=1)  # if > 0 (epochs) shows weights histograms
  callbacks.append(tb_callback)

  # Early Stopping
  # --------------
  # Stop when val_loss has not improved for 10 consecutive epochs
  es_callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
  callbacks.append(es_callback)

  return callbacks
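
To make the role of `patience` concrete, here is a simplified simulation of the early-stopping rule. It is a sketch of the idea, not the Keras implementation; the function name and the validation losses are made up, and epochs are counted from 0.

```python
def simulate_early_stopping(val_losses, patience):
    """Return (stop_epoch, best_epoch); stop_epoch is None if never triggered."""
    best_loss = float("inf")
    best_epoch = None
    wait = 0  # epochs elapsed since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch

losses = [1.00, 0.90, 0.95, 0.96, 0.97]  # made-up validation losses
print(simulate_early_stopping(losses, patience=2))
print(simulate_early_stopping(losses, patience=10))
```

With a patience of 2 the run stops two epochs after the best one; with a patience of 10 the five epochs finish before the rule can trigger.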

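Define the number of epochs to train your network, and then start the training. Note that with `epochs = 5` the early-stopping callback, which waits for 10 epochs without improvement, can never trigger; increase the number of epochs to make use of it.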

In [None]:
# How many epochs?
epochs = 5

# Callbacks creator
model_callbacks = create_folders_and_callbacks(model_name)
# Train the model
history = model.fit(
    x = training_generator,
    epochs = epochs,
    validation_data = validation_generator,
    callbacks = model_callbacks
).history

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


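Before evaluating, wrap the trained layers in a second model, `model2`, that accepts each spectrogram as a flat vector of 1960 values (49 x 40 x 1) and reshapes it back to (49, 40, 1). `model2` is the model that is converted to TFLite later in this notebook. The trained model is then evaluated on the testing dataset: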

In [None]:
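# Input is a flattened spectrogram: 49 * 40 * 1 = 1960 values per sample
inputs = tf.keras.Input(shape=(1960,))
x = inputs
x = layers.Reshape([49,40,1])(x)

# Reuse the trained layers (and their weights) on the reshaped input
for layer in model.layers:
  x = layer(x)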

model2 = tf.keras.Model(inputs, x, name='model2')
model2.summary()

Model: "model2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 1960)]            0         
                                                                 
 reshape (Reshape)           (None, 49, 40, 1)         0         
                                                                 
 conv2d_1 (Conv2D)           (None, 23, 16, 4)         164       
                                                                 
 global_average_pooling2d_1   (None, 4)                0         
 (GlobalAveragePooling2D)                                        
                                                                 
 Output (Dense)              (None, 3)                 15        
                                                                 
Total params: 179
Trainable params: 179
Non-trainable params: 0
______________________________________________________________
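
Flattening a spectrogram and reshaping it back does not lose or reorder any value, which is why `model2` behaves like the original model on flattened inputs. A small NumPy check, using a dummy spectrogram filled with increasing numbers:

```python
import numpy as np

spectrogram = np.arange(49 * 40, dtype=np.float32).reshape(49, 40, 1)
flat = spectrogram.reshape(-1)      # shape (1960,), row-major order
restored = flat.reshape(49, 40, 1)  # what the Reshape layer does for each sample

print(flat.shape, np.array_equal(spectrogram, restored))
```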

In [None]:
model_metrics = model.evaluate(testing_generator, return_dict=True)
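
Accuracy alone does not tell which classes get confused with each other. Below is a minimal NumPy sketch of a confusion matrix; the labels are made up for illustration. In this notebook the predicted classes could be obtained, for example, with `np.argmax(model.predict(...), axis=1)`, which is an assumption about how you would wire it in and is not shown here.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    # Add 1 at position (true, predicted) for every sample
    np.add.at(matrix, (y_true, y_pred), 1)
    return matrix

# Made-up labels for illustration; 0=silence, 1=unknown, 2=sheila
y_true = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0, 2])

cm = confusion_matrix(y_true, y_pred, n_classes=3)
accuracy = np.trace(cm) / cm.sum()  # correct predictions lie on the diagonal
print(cm)
print(f"accuracy = {accuracy:.3f}")
```

The `confusion_matrix` function imported from scikit-learn at the top of the notebook follows the same row/column convention.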



## Saving and exporting the trained model

This section saves the trained model, first in TensorFlow's SavedModel format and then in .h5 format, so that it can be processed by the Infineon ML Configurator tool available in ModusToolbox.

In [None]:
model.save(os.path.join('/content/models', model_name))

In [None]:
model = tfk.models.load_model(os.path.join('/content/models', model_name))
model_metrics = model.evaluate(testing_generator, return_dict=True)



In [None]:
h5_model_name = model_name + '.h5'
tfk.models.save_model(model, os.path.join('/content/models', h5_model_name))

## Conversion for TFLite Micro
The following section converts the model to the TensorFlow Lite format for use on a microcontroller, both as a float model and as an 8-bit quantized model.

Skip this section if you want to use the Infineon IFX engine, because it takes care of this conversion step itself.

In [None]:
# Float model export:

converter = tf.lite.TFLiteConverter.from_keras_model(model2)
tflite_model = converter.convert()
tflite_path = os.path.join('/content/models', model_name + '.tflite')
with open(tflite_path, "wb") as f:  # the file is closed once the model is written
  print("Float model size:", f.write(tflite_model))

Float model size: 3656


In [None]:
# Quantized model export:

# Definition of Representative Dataset generator:
def representative_data_gen():
  for sample in x_rep:
    # model2 takes a flattened spectrogram, so yield one sample of shape (1, 1960)
    data = sample.reshape(1, -1).astype(np.float32)
    yield [data]

#def representative_dataset():
#  for i in range(100):
#    yield [ np.array([(np.random.rand(1960)).astype(np.float32)]) ]

converter = tf.lite.TFLiteConverter.from_keras_model(model2)

converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen

converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8 # or tf.uint8
converter.inference_output_type = tf.int8  # or tf.uint8

tflite_model_quant = converter.convert()
quant_path = os.path.join('/content/models', model_name + '-int8.tflite')
with open(quant_path, "wb") as f:  # the file is closed once the model is written
  print("Quantized model size: ", f.write(tflite_model_quant))



Quantized model size:  3576
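
For intuition, 8-bit quantization maps float values to integers through a scale and a zero point, so that `real_value = (int8_value - zero_point) * scale`. The sketch below applies this affine scheme to a few values. The value range is hypothetical: in the conversion above, the converter derives the actual ranges from the representative dataset.

```python
import numpy as np

def quantize(x, scale, zero_point):
    """Map float values to int8 using an affine (scale, zero_point) scheme."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize(q, scale, zero_point):
    """Approximate inverse of quantize()."""
    return (q.astype(np.float32) - zero_point) * scale

# Hypothetical value range to be covered by the 256 int8 levels
x_min, x_max = -1.0, 3.0
scale = (x_max - x_min) / 255.0
zero_point = int(round(-128 - x_min / scale))

x = np.array([-1.0, 0.0, 1.5, 3.0], dtype=np.float32)
q = quantize(x, scale, zero_point)
x_hat = dequantize(q, scale, zero_point)
print(q, np.max(np.abs(x - x_hat)))
```

The round trip is not exact: each value can be off by a fraction of `scale`, which is the price paid for storing one byte per value instead of four.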


### Generate a TensorFlow Lite for Microcontrollers Model
To convert the TensorFlow Lite quantized model into a C source file that can be loaded by TensorFlow Lite for Microcontrollers on Arduino, use the `xxd` tool to convert the `.tflite` file into a `.cc` file.

In [None]:
!apt-get update && apt-get -qq install xxd

MODEL_TFLITE = '/content/models/'+ model_name +'-int8.tflite'
MODEL_TFLITE_MICRO = model_name + '-int8.cc'  # name the C file after the model
!xxd -i {MODEL_TFLITE} > {MODEL_TFLITE_MICRO}
#REPLACE_TEXT = MODEL_TFLITE.replace('/', '_').replace('.', '_')
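
If `xxd` is not available, a similar C array can be produced in plain Python. This is a rough sketch of what `xxd -i` prints (12 bytes per row, plus a length variable), not a drop-in replacement; the function `to_c_array` and the file name used in the example are made up.

```python
import re

def to_c_array(data: bytes, source_name: str) -> str:
    """Render bytes as a C array in the style of `xxd -i` output."""
    # Characters that are not valid in a C identifier become underscores
    var_name = re.sub(r"[^0-9a-zA-Z]", "_", source_name)
    rows = []
    for offset in range(0, len(data), 12):  # 12 bytes per row
        chunk = data[offset:offset + 12]
        rows.append("  " + ", ".join(f"0x{b:02x}" for b in chunk))
    body = ",\n".join(rows)
    return (f"unsigned char {var_name}[] = {{\n{body}\n}};\n"
            f"unsigned int {var_name}_len = {len(data)};\n")

# Dummy payload standing in for the bytes of a .tflite file
print(to_c_array(bytes(range(14)), "model-int8.tflite"))
```

In the notebook, the payload would be the content of the quantized `.tflite` file read in binary mode.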
