# Model Optimization for MVP Hardware Accelerator

This tutorial describes how to optimize a keyword spotting model for a Silicon Labs development board featuring the MVP hardware accelerator.
It uses the various tools offered by the MLTK to optimize a machine learning model so that it can efficiently run on the embedded hardware.

In this tutorial, we use the industry-standard classification model [MobileNetV2](https://ai.googleblog.com/2018/04/mobilenetv2-next-generation-of-on.html) to detect the keywords:

- __left__
- __right__
- __up__
- __down__
- __stop__
- __go__

MobileNetV2 is a common and useful model because it is generic enough that it can be applied to most classification tasks but still runs efficiently on embedded devices.

## Quick Links

- [GitHub Source](https://github.com/SiliconLabs/mltk/blob/master/mltk/tutorials/model_optimization.ipynb) - View this tutorial on GitHub
- [Run on Colab](https://colab.research.google.com/github/siliconlabs/mltk/blob/master/mltk/tutorials/model_optimization.ipynb) - Run this tutorial on Google Colab
- [Train in the "Cloud"](https://siliconlabs.github.io/mltk/mltk/tutorials/cloud_training_with_vast_ai.html) - _Vastly_ improve training times by training this model in the "cloud"
- [C++ Example Application](https://siliconlabs.github.io/mltk/docs/cpp_development/examples/audio_classifier.html) - View this tutorial's associated C++ example application
- [Machine Learning Model](https://siliconlabs.github.io/mltk/docs/python_api/models/siliconlabs/keyword_spotting_mobilenetv2.html) - View this tutorial's associated machine learning model

## Overview

In this tutorial, you will learn the following:  
- How to create a [model specification](https://siliconlabs.github.io/mltk/docs/guides/model_specification.html) using [MobileNetV2](https://arxiv.org/abs/1801.04381)
- How to use the [Model Profiler](https://siliconlabs.github.io/mltk/docs/guides/model_profiler.html) to profile the model on the development board
- How to use the `view_audio` command to view the spectrogram generated by the [AudioFeatureGenerator](https://siliconlabs.github.io/mltk/docs/audio/audio_feature_generator.html)
- How to adjust model parameters to fit within the resource constraints of the development board
- How to train the model

## Running this Tutorial
 
- This tutorial assumes the MLTK has been [installed](https://siliconlabs.github.io/mltk/docs/installation.html) and is available on the [command-line](https://siliconlabs.github.io/mltk/docs/command_line/index.html)
- All commands below should be run from a local terminal
- In your local terminal, replace the `!mltk` command with `mltk` (i.e. remove the `!` character to run the command)

## Recommended Reading

Before doing this tutorial, it is recommended to review the following documentation:

- [MLTK Overview](https://siliconlabs.github.io/mltk/docs/overview.html) - An overview of the core concepts used by the MLTK
- [Keyword Spotting Overview](https://siliconlabs.github.io/mltk/docs/audio/keyword_spotting_overview.html) - An overview of how keyword spotting works
- [Keyword Spotting Tutorial](https://siliconlabs.github.io/mltk/mltk/tutorials/keyword_spotting_on_off.html) - Detailed tutorial describing how to create a Keyword Spotting model using the MLTK

## Model Specification

The completed [model specification](https://siliconlabs.github.io/mltk/docs/guides/model_specification.html) for this tutorial may be found on GitHub: [keyword_spotting_mobilenetv2.py](https://github.com/siliconlabs/mltk/blob/master/mltk/models/siliconlabs/keyword_spotting_mobilenetv2.py).

This model is very similar to [keyword_spotting_on_off.py](https://github.com/siliconlabs/mltk/blob/master/mltk/models/siliconlabs/keyword_spotting_on_off.py) with the following changes:  
- Uses [MobileNetV2](https://ai.googleblog.com/2018/04/mobilenetv2-next-generation-of-on.html) instead of a custom-designed model
- Keyword "classes" changed
- AudioFeatureGeneratorSettings modified

This tutorial describes how to start with the [keyword_spotting_on_off.py](https://github.com/siliconlabs/mltk/blob/master/mltk/models/siliconlabs/keyword_spotting_on_off.py) model specification and end up with the [keyword_spotting_mobilenetv2.py](https://github.com/siliconlabs/mltk/blob/master/mltk/models/siliconlabs/keyword_spotting_mobilenetv2.py).

### Create the model specification file

First, copy the file [keyword_spotting_on_off.py](https://github.com/siliconlabs/mltk/blob/master/mltk/models/siliconlabs/keyword_spotting_on_off.py) to your working directory and rename it to `my_keyword_spotting_mobilenetv2.py` (or any name you like, just be sure to update the commands below accordingly).

`my_keyword_spotting_mobilenetv2.py` is your [model specification](https://siliconlabs.github.io/mltk/docs/guides/model_specification.html) file. It contains everything needed to train, evaluate, and quantize your model so that it can run on an embedded device.  

The rest of this tutorial describes how to modify this file to meet our project's requirements.

### Update the model description

Open the `my_keyword_spotting_mobilenetv2.py` model specification file in your favorite text editor and edit the model description:

In [None]:
my_model.description = 'Keyword spotting classifier using MobileNetV2'

### Update the Keywords

Next, update the keywords we want this model to detect:

In [None]:
my_model.classes = ['left', 'right', 'up', 'down', 'stop', 'go', '_unknown_', '_silence_']

### Update the AudioFeatureGenerator Settings

Next, let's update the [AudioFeatureGeneratorSettings](https://siliconlabs.github.io/mltk/docs/python_api/data_preprocessing/audio_feature_generator.html#audiofeaturegenerator-settings).
We do this because we are detecting more keywords and need higher-quality spectrograms, which help the ML model better distinguish between the different keywords.

To help us adjust the [AudioFeatureGenerator](https://siliconlabs.github.io/mltk/docs/python_api/data_preprocessing/audio_feature_generator.html) settings, the MLTK features the command: `view_audio`. 
This command provides a GUI to view the generated spectrograms in real-time as the settings are adjusted.

In [None]:
# Launch the AudioFeatureGenerator GUI
!mltk view_audio my_keyword_spotting_mobilenetv2

#### Select an audio file

First, select an audio file to analyze. By default, our model's audio samples may be found in:

```
<User Home Directory>/.mltk/datasets/speech_commands/v2
```

![select_audio_file.gif](https://siliconlabs.github.io/mltk/_static/images/select_audio_file.gif)

#### Adjust the AudioFeatureGenerator settings

Adjust the `Sample Length` to `1200` milliseconds. This effectively increases the audio buffer size to 1.2s. This is useful as it allows for longer words to fully fit within the buffer. It also increases the number of times the ML model can "see" the spectrogram generated from the keyword as it streams through the audio buffer. See the [Keyword Spotting Overview](https://siliconlabs.github.io/mltk/docs/audio/keyword_spotting_overview.html) for more details.  
Note that increasing the sample length effectively increases the spectrogram's height. You can also adjust the `Window Size` and `Window Step` to change the spectrogram's effective height.

Additionally, increase the `Num Channels` to `49`. This increases the number of frequency "bins" to include in the generated spectrogram which increases the spectrogram width.

Increasing the generated spectrogram's dimensions increases its resolution and should help the ML model better distinguish between the different keywords. This comes at a cost, though: a larger spectrogram means a larger ML model input, which increases the number of required ML model operations and therefore increases the inference time (i.e. the amount of time to execute the ML model).

Through experimentation, additional settings were also modified including:

- The sample rate was set to 16kHz, which helps to reduce aliasing at the cost of doubling the RAM requirements
- The noise reduction values were modified to help with execution on the embedded device
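
The RAM cost of the higher sample rate can be estimated with a little arithmetic. The sketch below is only an illustration: it assumes 16-bit (2-byte) audio samples and an 8kHz baseline, neither of which is stated in this tutorial, and `audio_buffer_bytes` is a hypothetical helper, not an MLTK function.

```python
def audio_buffer_bytes(sample_rate_hz: int, sample_length_ms: int,
                       bytes_per_sample: int = 2) -> int:
    """Estimate the raw audio buffer size (assumes 16-bit samples by default)."""
    n_samples = sample_rate_hz * sample_length_ms // 1000
    return n_samples * bytes_per_sample


# Compare an assumed 8kHz baseline against the 16kHz setting, both with a 1.2s buffer
baseline = audio_buffer_bytes(8000, 1200)
updated = audio_buffer_bytes(16000, 1200)

print(f"8kHz buffer:  {baseline} bytes")
print(f"16kHz buffer: {updated} bytes ({updated / baseline:.1f}x the baseline)")
```

Under these assumptions, the audio buffer alone doubles in size, which is consistent with the trade-off described above.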

In the model specification file, update the [AudioFeatureGeneratorSettings](https://siliconlabs.github.io/mltk/docs/python_api/data_preprocessing/audio_feature_generator.html#audiofeaturegenerator-settings) to:

In [None]:
frontend_settings = AudioFeatureGeneratorSettings()

frontend_settings.sample_rate_hz = 16000  # We use 16k for slightly better performance at the cost of more RAM
frontend_settings.sample_length_ms = 1200 # We use a 1.2s buffer to ensure we can process a sample multiple times
frontend_settings.window_size_ms = 30
frontend_settings.window_step_ms = 20
frontend_settings.filterbank_n_channels = 49
frontend_settings.filterbank_upper_band_limit = 4000.0-1 # Spoken language usually only goes up to 4k
frontend_settings.filterbank_lower_band_limit = 125.0
frontend_settings.noise_reduction_enable = True
frontend_settings.noise_reduction_smoothing_bits = 10
frontend_settings.noise_reduction_even_smoothing =  0.025
frontend_settings.noise_reduction_odd_smoothing = 0.06
frontend_settings.noise_reduction_min_signal_remaining = 0.03
frontend_settings.pcan_enable = False
frontend_settings.pcan_strength = 0.95
frontend_settings.pcan_offset = 80.0
frontend_settings.pcan_gain_bits = 21
frontend_settings.log_scale_enable = True
frontend_settings.log_scale_shift = 6
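
To see how these settings determine the model's input size, the spectrogram dimensions can be estimated by hand. This is a minimal sketch, not the AudioFeatureGenerator's actual code: the framing formula is an assumption, chosen because it reproduces the `59x49x1` input shape reported by the model summary and profiler later in this tutorial, and `spectrogram_shape` is a hypothetical helper.

```python
def spectrogram_shape(sample_length_ms: int, window_size_ms: int,
                      window_step_ms: int, n_channels: int) -> tuple:
    """Return (height, width) of the spectrogram.

    height = number of complete analysis windows that fit in the audio buffer
    width  = number of filterbank channels (frequency bins)
    """
    n_frames = (sample_length_ms - window_size_ms) // window_step_ms + 1
    return n_frames, n_channels


# Values taken from the frontend_settings above
height, width = spectrogram_shape(
    sample_length_ms=1200,
    window_size_ms=30,
    window_step_ms=20,
    n_channels=49,
)
print(f"Spectrogram: {height}x{width} -> model input shape {height}x{width}x1")
```

Changing `sample_length_ms`, `window_size_ms`, or `window_step_ms` alters the height, while `filterbank_n_channels` alters the width.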

### Update Model Layout

Next, update the model layout to use the [MobileNetV2](https://siliconlabs.github.io/mltk/docs/python_api/models/common_models.html#mobilenet-v2) architecture.

To do this, first `import` the model into your model specification Python script.   
Near the top of the Python script, add the following:

In [None]:
# Import the MobileNetV2 model
from mltk.models.shared import MobileNetV2

Next, near the bottom of your model specification script, update `def my_model_builder(model: MyModel):` with the following:

In [None]:
def my_model_builder(model: MyModel):
    keras_model = MobileNetV2( 
        input_shape=model.input_shape,
        classes=model.n_classes,
        alpha=1.0, 
        weights=None
    )
    keras_model.compile(
        loss=model.loss, 
        optimizer=model.optimizer, 
        metrics=model.metrics
    )
    return keras_model

my_model.build_model_function = my_model_builder

This creates a `MobileNetV2` model with an `alpha` of `1.0`. The `alpha` parameter is the width multiplier: it scales how many convolutional filters each layer of the model uses. Typically, more convolutional filters mean a larger model and better accuracy, but they also increase the computational complexity and thus the execution time of the model.
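
As a rough illustration of what the width multiplier does, the sketch below scales a layer's base filter count by `alpha` and rounds the result to a multiple of 8, which is the rounding used by the reference Keras MobileNetV2 implementation. Whether the MLTK's `MobileNetV2` rounds in exactly the same way is an assumption, and `make_divisible` and `scaled_filters` are illustrative helpers rather than MLTK functions.

```python
def make_divisible(value: float, divisor: int = 8) -> int:
    """Round to the nearest multiple of `divisor`, never dropping more than 10%."""
    rounded = max(divisor, int(value + divisor / 2) // divisor * divisor)
    if rounded < 0.9 * value:
        rounded += divisor
    return rounded


def scaled_filters(base_filters: int, alpha: float) -> int:
    """Number of filters a layer ends up with for a given width multiplier."""
    return make_divisible(base_filters * alpha)


# The first convolution of the reference MobileNetV2 has 32 base filters
for alpha in (1.0, 0.5, 0.15):
    print(f"alpha={alpha}: first conv_2d uses {scaled_filters(32, alpha)} filters")
```

Note that `alpha` changes the width of each layer, not the number of layers.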

### Summarize the model 

As a quick sanity check, view a summary of the model. We should see the updated model description, classes (i.e. keywords), and MobileNetV2 architecture printed to the terminal.

__NOTE:__ This may take a while the first time this runs as the model's dataset needs to be downloaded and extracted.

In [21]:
# Generate model summary
# Since the model hasn't been trained yet, we need to add the --build option
# Be sure to change the terminal's current directory to the model spec's directory
# e.g.: cd <same directory as my_keyword_spotting_mobilenetv2.py>
!mltk summarize my_keyword_spotting_mobilenetv2 --tflite --build

  layer_config = serialize_layer_fn(layer)
fully_quantize: 0, inference_type: 6, input_inference_type: 9, output_inference_type: 9



Epoch 00001: LearningRateScheduler setting learning rate to 0.001.

Epoch 00002: LearningRateScheduler setting learning rate to 0.00095.

Epoch 00003: LearningRateScheduler setting learning rate to 0.0009025.
+-------+-------------------+------------------+------------------+-------------------------------------------------------+
| Index | OpCode            | Input(s)         | Output(s)        | Config                                                |
+-------+-------------------+------------------+------------------+-------------------------------------------------------+
| 0     | conv_2d           | 59x49x1 (int8)   | 30x25x32 (int8)  | Padding:same stride:2x2 activation:relu6              |
|       |                   | 3x3x1 (int8)     |                  |                                                       |
|       |                   | 32 (int32)       |                  |                                                       |
| 1     | depthwise_conv_2d | 30x25x32 (int8) 

From the summary, we can see that the model file size, 2.7MB, is too large to fit on our target hardware which has 1.5MB of flash.

Let's reduce MobileNetV2's `alpha` parameter which will reduce the number of convolutional filters used by the model.

Update your model specification with:

In [None]:
keras_model = MobileNetV2( 
    input_shape=model.input_shape,
    classes=model.n_classes,
    alpha=0.5, # Change the alpha parameter to 0.5, which reduces the number of convolutional filters in each layer
    weights=None
)

Then re-run the model summary command:

In [20]:
# Re-generate model summary
# Since the model hasn't been trained yet, we need to add the --build option
# Be sure to change the terminal's current directory to the model spec's directory
# e.g.: cd <same directory as my_keyword_spotting_mobilenetv2.py>
!mltk summarize my_keyword_spotting_mobilenetv2 --tflite --build


Epoch 00001: LearningRateScheduler setting learning rate to 0.001.

Epoch 00002: LearningRateScheduler setting learning rate to 0.00095.

  layer_config = serialize_layer_fn(layer)
fully_quantize: 0, inference_type: 6, input_inference_type: 9, output_inference_type: 9




Epoch 00003: LearningRateScheduler setting learning rate to 0.0009025.
+-------+-------------------+-----------------+-----------------+-------------------------------------------------------+
| Index | OpCode            | Input(s)        | Output(s)       | Config                                                |
+-------+-------------------+-----------------+-----------------+-------------------------------------------------------+
| 0     | conv_2d           | 59x49x1 (int8)  | 30x25x16 (int8) | Padding:same stride:2x2 activation:relu6              |
|       |                   | 3x3x1 (int8)    |                 |                                                       |
|       |                   | 16 (int32)      |                 |                                                       |
| 1     | depthwise_conv_2d | 30x25x16 (int8) | 30x25x16 (int8) | Multipler:1 padding:same stride:1x1 activation:relu6  |
|       |                   | 3x3x16 (int8)   |                 |        

With the updated model, the model size is now 983kB which is small enough to fit within the embedded target's flash.
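
The flash-fit check above amounts to a simple comparison. The sketch below restates it using the sizes quoted in this tutorial. Treating "MB" and "kB" as decimal units is an assumption, and `fits_in_flash` with its `reserved_bytes` parameter (space set aside for application code) is a hypothetical helper.

```python
FLASH_BYTES = 1_500_000  # 1.5MB of flash on the target (decimal units assumed)


def fits_in_flash(model_bytes: int, flash_bytes: int = FLASH_BYTES,
                  reserved_bytes: int = 0) -> bool:
    """True if the model plus any reserved space fits in the available flash."""
    return model_bytes + reserved_bytes <= flash_bytes


# Model file sizes reported in this tutorial for each alpha value
model_sizes = {1.0: 2_700_000, 0.5: 983_200}

for alpha, size in model_sizes.items():
    verdict = "fits" if fits_in_flash(size) else "does NOT fit"
    print(f"alpha={alpha}: {size / 1000:.1f}kB {verdict} in {FLASH_BYTES / 1000:.0f}kB of flash")
```

In a real project the application firmware also occupies flash, so a non-zero `reserved_bytes` gives a more realistic answer.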

We now have the basics of our model specification complete. The next step is to profile the model to see how efficiently it runs on our embedded target.

## Profile Model

Before we spend the time and energy training our model, let's ensure that it can efficiently run on the embedded target.

We can do this using the [profile](https://siliconlabs.github.io/mltk/docs/guides/model_profiler.html) MLTK command.

The following command assumes you have a development board connected to your computer.  
If you do not have the development board, remove the `--device` argument to use the simulator instead of physical hardware.

In [24]:
# Profile the model on a development board using the MVP hardware accelerator
# Since the model hasn't been trained yet, we need to add the --build option
# Be sure to change the terminal's current directory to the model spec's directory
# e.g.: cd <same directory as my_keyword_spotting_mobilenetv2.py>
!mltk profile my_keyword_spotting_mobilenetv2 --build --accelerator MVP --device


Epoch 00001: LearningRateScheduler setting learning rate to 0.001.



  layer_config = serialize_layer_fn(layer)
fully_quantize: 0, inference_type: 6, input_inference_type: 9, output_inference_type: 9


Epoch 00002: LearningRateScheduler setting learning rate to 0.00095.

Epoch 00003: LearningRateScheduler setting learning rate to 0.0009025.

Profiling Summary
Name: my_keyword_spotting_mobilenetv2
Accelerator: MVP
Input Shape: 1x59x49x1
Input Data Type: int8
Output Shape: 1x8
Output Data Type: int8
Model File Size (bytes): 983.2k
Runtime Memory Size (bytes): 132.5k
# Operations: 14.8M
# Multiply-Accumulates: 6.8M
# Layers: 69
# Unsupported Layers: 1
# Accelerator Cycles: 10.4M
# CPU Cycles: 7.6M
CPU Utilization (%): 39.6
Clock Rate (hz): 80.0M
Time (s): 239.2m
Ops/s: 62.0M
MACs/s: 28.5M
Inference/s: 4.2

Model Layers
+-------+-------------------+--------+--------+------------+------------+----------+-----------------------------+--------------+-------------------------------------------------------+------------+-------------------------------------------------+
| Index | OpCode            | # Ops  | # MACs | Acc Cycles | CPU Cycles | Time (s) | Input Shape                 | Output Sha

The profiler reports that it takes 239 milliseconds to execute the model one time. Unfortunately, this is likely too slow to fit within our hardware constraints. For this model to work, it should really execute in less than 120ms. This is necessary because we also need to account for the time required to generate a spectrogram (which is _not_ included in the profiler's reported number). We need to process the generated spectrogram in the ML model multiple times so that we can average enough model predictions to have confidence that a keyword is actually detected. See the [Keyword Spotting Overview](https://siliconlabs.github.io/mltk/docs/audio/keyword_spotting_overview.html) for more details.
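
To make the idea of averaging model predictions concrete, here is a minimal sketch of one possible scheme. It is not the MLTK's actual detection logic: the `PredictionAverager` class, the window of 3 predictions, the 0.7 threshold, and the example scores are all made up for illustration.

```python
from collections import deque

CLASSES = ['left', 'right', 'up', 'down', 'stop', 'go', '_unknown_', '_silence_']


class PredictionAverager:
    """Average the last `window` score vectors and report a confident class."""

    def __init__(self, window: int = 3, threshold: float = 0.7):
        self.history = deque(maxlen=window)
        self.threshold = threshold

    def update(self, scores):
        """Add one inference result; return a class index, or None if unsure."""
        self.history.append(list(scores))
        if len(self.history) < self.history.maxlen:
            return None  # Not enough predictions collected yet
        count = len(self.history)
        averaged = [sum(column) / count for column in zip(*self.history)]
        best = max(range(len(averaged)), key=averaged.__getitem__)
        return best if averaged[best] >= self.threshold else None


averager = PredictionAverager()
detected = None

# Three consecutive (made-up) probability vectors that favor 'stop'
for stop_score in (0.75, 0.85, 0.80):
    scores = [0.0] * len(CLASSES)
    scores[CLASSES.index('stop')] = stop_score
    scores[CLASSES.index('_unknown_')] = 1.0 - stop_score
    detected = averager.update(scores)

print(CLASSES[detected] if detected is not None else 'no keyword detected')
```

The faster the model executes, the more predictions can be collected while the keyword is still inside the audio buffer, which is why the inference time matters here.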

Thus, let's adjust the MobileNetV2 model parameters to reduce its size further.

Change:
- `alpha=0.15`
- `last_block_filters=384`

The `alpha` parameter has already been discussed.  
The `last_block_filters` is an additional parameter enabled by the MLTK. This controls the number of filters in the last Conv2D block of the MobileNetV2. By default, this value is 1280 which is likely overkill for our purposes (the default MobileNetV2 is designed to classify 1000 different classes). Let's reduce this value to 384.
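
A back-of-the-envelope weight count shows why this parameter matters. The sketch assumes the standard MobileNetV2 head (a final 1x1 convolution followed by global average pooling and a dense classifier) and ignores batch normalization parameters. The 48 input channels are an illustrative assumption (the reference architecture's 320 base filters scaled by `alpha=0.15`), and `head_weights` is a hypothetical helper.

```python
def head_weights(in_channels: int, last_block_filters: int, n_classes: int) -> int:
    """Approximate weight count of the final 1x1 conv plus the dense classifier."""
    conv_weights = in_channels * last_block_filters              # 1x1 convolution kernel
    dense_weights = last_block_filters * n_classes + n_classes   # dense kernel + biases
    return conv_weights + dense_weights


IN_CHANNELS = 48  # Illustrative assumption, see the text above
N_CLASSES = 8     # The 6 keywords plus '_unknown_' and '_silence_'

default_head = head_weights(IN_CHANNELS, 1280, N_CLASSES)
reduced_head = head_weights(IN_CHANNELS, 384, N_CLASSES)

print(f"last_block_filters=1280: ~{default_head} weights")
print(f"last_block_filters=384:  ~{reduced_head} weights")
print(f"Reduction: {100 * (1 - reduced_head / default_head):.0f}%")
```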

In [None]:
keras_model = MobileNetV2( 
    input_shape=model.input_shape,
    classes=model.n_classes,
    alpha=0.15, # Change the alpha parameter to 0.15, which reduces the number of convolutional filters in each layer
    last_block_filters=384,
    weights=None
)

Then profile the model again:

In [25]:
# Profile the model on a development board using the MVP hardware accelerator
# Since the model hasn't been trained yet, we need to add the --build option
# Be sure to change the terminal's current directory to the model spec's directory
# e.g.: cd <same directory as my_keyword_spotting_mobilenetv2.py>
!mltk profile my_keyword_spotting_mobilenetv2 --build --accelerator MVP --device


Epoch 00001: LearningRateScheduler setting learning rate to 0.001.

Epoch 00002: LearningRateScheduler setting learning rate to 0.00095.

Epoch 00003: LearningRateScheduler setting learning rate to 0.0009025.

Profiling Summary
Name: my_keyword_spotting_mobilenetv2
Accelerator: MVP
Input Shape: 1x59x49x1
Input Data Type: int8
Output Shape: 1x8
Output Data Type: int8
Model File Size (bytes): 228.6k
Runtime Memory Size (bytes): 104.4k
# Operations: 4.2M
# Multiply-Accumulates: 1.7M
# Layers: 71
# Unsupported Layers: 0
# Accelerator Cycles: 3.5M
# CPU Cycles: 2.9M
CPU Utilization (%): 35.7
Clock Rate (hz): 80.0M
Time (s): 102.2m
Ops/s: 41.4M
MACs/s: 17.0M
Inference/s: 9.8

Model Layers
+-------+-------------------+--------+--------+------------+------------+----------+-------------------------+--------------+-------------------------------------------------------+
| Index | OpCode            | # Ops  | # MACs | Acc Cycles | CPU Cycles | Time (s) | Input Shape             | Output Shape | Options                                               |
+-------+-------------------+--------+--------+------------+------------+----------+-------------------------+--------------+-------------------------------------------------------+
| 0     | conv_2d           | 126.0k | 54.0k  | 184.3k     | 22.2k      | 2.5m     | 1x59x49x1,8x3x3x1,8    

With these parameters, the model now requires 102 milliseconds to execute which is within our target range.
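
The two profiling runs can be compared against the 120ms target with a few lines of arithmetic. The inference times below are the values reported by the profiler in this tutorial. The number of passes per audio buffer is only an upper bound, since it ignores the time needed to generate the spectrogram. The helper names are hypothetical.

```python
BUDGET_MS = 120.0   # Target maximum inference time from this tutorial
BUFFER_MS = 1200.0  # Audio buffer length (sample_length_ms)


def inferences_per_second(inference_ms: float) -> float:
    return 1000.0 / inference_ms


def max_passes_per_buffer(inference_ms: float, buffer_ms: float = BUFFER_MS) -> float:
    """Upper bound on inferences per buffer length (ignores spectrogram generation)."""
    return buffer_ms / inference_ms


profiles = {
    'alpha=0.5': 239.2,
    'alpha=0.15, last_block_filters=384': 102.2,
}

for label, inference_ms in profiles.items():
    verdict = 'within' if inference_ms < BUDGET_MS else 'over'
    print(f"{label}: {inference_ms}ms ({verdict} budget), "
          f"{inferences_per_second(inference_ms):.1f} inferences/s, "
          f"at most {max_passes_per_buffer(inference_ms):.1f} passes per buffer")
```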

__NOTE:__ Another way to reduce model execution time is to decrease the size of the spectrogram (which then decreases the model input size). Reducing the `frontend_settings.filterbank_n_channels` value is an effective way of doing this.

## Train the Model

With the model specification complete, it is time to fully train the model:

In [None]:
# NOTE: Be patient, this command will take a while!
!mltk train my_keyword_spotting_mobilenetv2 --clean

### Train in cloud

Alternatively, you can _vastly_ improve the model training time by training this model in the "cloud".  
See the tutorial: [Cloud Training with vast.ai](https://siliconlabs.github.io/mltk/mltk/tutorials/cloud_training_with_vast_ai.html) for more details.

## Test the Model

With the model fully trained, it's time to test it on the development board. This can be done by issuing the following command:

In [None]:
# Test the keyword_spotting_mobilenetv2 using the development board's microphone
# The red LED will turn on when a keyword is detected
# The green LED will turn on when there's audio activity
# NOTE: Your mouth should be ~2 inches from the board's microphone
!mltk classify_audio my_keyword_spotting_mobilenetv2 --device --accelerator MVP