In [None]:
# Copyright (C) 2023 Arm Limited or its affiliates. All rights reserved.
#
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the License); you may
# not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an AS IS BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# CNN_Small - Optimised

This package reproduces the CNN_Small model using our established codebase and the ModelPackage approach.

## Model-Package Overview:

| Model           	| CNN_Small                            	|
|:---------------:	|:---------------------------------------------------------------:	|
| <u>**Format**</u>:          	| Keras, Saved Model, TensorFlow Lite int8, TensorFlow Lite fp32 |
| <u>**Feature**</u>:         	| Keyword spotting for Arm Cortex-M CPUs |
| <u>**Architectural Delta w.r.t. Vanilla**</u>: | None |
| <u>**Domain**</u>:         	| Keyword spotting |
| <u>**Package Quality**</u>: 	| Optimised |

### Table of contents <a name="index_page"></a>

This how-to guide presents the key steps needed to reproduce everything in this package. The contents are organised as follows, with internal navigation links for jumping between sections.

    
* [1.0 Model recreation](#model_recreation)

* [2.0 Training](#training)

* [3.0 Testing](#testing)

* [4.0 Optimization](#optimization)

* [5.0 Quantization and TFLite conversion](#tflite_conversion)

* [6.0 Single inference of the TFLite model files](#tflite_inference)

## 1.0 Model Recreation<a name="model_recreation"></a>

To recreate the model, you first need ```Python3.7``` and the packages listed in ```requirements.txt```.

Once these requirements are satisfied, run the recreation script contained in this folder:

In [1]:
!bash ./recreate_model.sh

2023-01-31 13:13:21.365383: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
Untarring speech_commands_v0.02.tar.gz...
2023-01-31 13:14:12.415896: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2023-01-31 13:14:12.453662: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:03:00.0 name: NVIDIA TITAN Xp computeCapability: 6.1
coreClock: 1.582GHz coreCount: 30 deviceMemorySize: 11.91GiB deviceMemoryBandwidth: 510.07GiB/s
2023-01-31 13:14:12.453701: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2023-01-31 13:14:12.477025: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2023-01-31 13:14:12.477130: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Success

2023-01-31 13:14:39.184982: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
Untarring speech_commands_v0.02.tar.gz...
2023-01-31 13:15:30.798819: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2023-01-31 13:15:30.834958: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:03:00.0 name: NVIDIA TITAN Xp computeCapability: 6.1
coreClock: 1.582GHz coreCount: 30 deviceMemorySize: 11.91GiB deviceMemoryBandwidth: 510.07GiB/s
2023-01-31 13:15:30.834997: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2023-01-31 13:15:30.856434: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2023-01-31 13:15:30.856508: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Success

Running this script uses the pre-trained checkpoint files supplied in the ```./model_archive/model_source/weights``` folder to generate the TFLite files and evaluate them on the test set. Both an fp32 version and a quantized version are produced. The quantized version is fully quantized using post-training quantization.

To train from scratch, supply ```--train``` when running the script. For example:

```bash
bash ./recreate_model.sh --train
```

Training is then performed and should produce a model that reaches the accuracy stated in this repository. Note that the TFLite export still uses the baseline pre-trained checkpoint files, so to export your newly trained model you need to re-run the script and supply the path to the new checkpoint files, for example:

```bash
bash ./recreate_model.sh --ckpt <checkpoint_path>
```

## 2.0 Training<a name="training"></a>

The training scripts can be used to recreate any of the models from the [Hello Edge paper](https://arxiv.org/pdf/1711.07128.pdf), provided the right hyperparameters are used. The training commands with all the hyperparameters needed to reproduce the model in this repository are given [here](recreate_model.sh). The model in this part of the repository is just one of the model variants from the paper; other variants are covered in other parts of the repository.


As a general example, to train a DNN with 3 fully-connected layers of 128 neurons each, run:
```
python train.py --model_architecture dnn --model_size_info 128 128 128
```

The command line argument *--model_size_info* passes the neural network layer
dimensions (such as the number of layers and the convolution filter size/stride) as a list to models.py,
which builds the TensorFlow graph from the given model architecture
and layer dimensions. For more information on *model_size_info* for each network architecture, see
[models.py](model_core_utils/models.py).
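To illustrate how a space-separated list such as `128 128 128` can reach the model builder as a Python list, here is a minimal sketch using the standard `argparse` module. It is an assumption about how such a flag could be parsed, not a copy of the parser in `train.py`; the helper names `build_parser` and `parse_size_info` and the default values are hypothetical.

```python
import argparse


def build_parser():
    """Hypothetical parser; only the two flags shown in this guide are modelled."""
    parser = argparse.ArgumentParser(description="Sketch of the training CLI flags")
    parser.add_argument("--model_architecture", type=str, default="dnn",
                        help="Name of the network architecture, e.g. dnn")
    # nargs="+" collects one or more space-separated values into a list of ints.
    parser.add_argument("--model_size_info", type=int, nargs="+",
                        default=[128, 128, 128],
                        help="Layer dimensions, interpreted per architecture")
    return parser


def parse_size_info(argv):
    """Return (architecture, layer dimension list) for a list of CLI tokens."""
    args = build_parser().parse_args(argv)
    return args.model_architecture, args.model_size_info


arch, size_info = parse_size_info(
    ["--model_architecture", "dnn", "--model_size_info", "128", "128", "128"])
print(arch, size_info)  # dnn [128, 128, 128]
```

How the list is interpreted (for example, which entries are filter sizes and which are strides) depends on the architecture, so the authoritative description remains the one in `models.py`.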


## 3.0 Testing<a name="testing"></a>
To run inference on the trained model from a checkpoint and get accuracy on validation and test sets, run:
```
python evaluation.py --model_architecture dnn --model_size_info 128 128 128 --checkpoint <checkpoint_path>
```
**The model and feature extraction parameters passed to this script should match those used in the Training step.**
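One way to keep the model parameters identical across steps is to define them once and reuse them when building each command. The sketch below does this with the standard library only; the flags and script names are the ones used in this guide, while `build_command` and the checkpoint path are hypothetical and for illustration only.

```python
import shlex

# Flags taken from the commands in this guide; the values are the DNN example.
SHARED_MODEL_ARGS = ["--model_architecture", "dnn",
                     "--model_size_info", "128", "128", "128"]


def build_command(script, extra_args=()):
    """Return a shell-safe command line that always includes the shared model flags."""
    parts = ["python", script] + SHARED_MODEL_ARGS + list(extra_args)
    return " ".join(shlex.quote(p) for p in parts)


# Hypothetical checkpoint location, for illustration only.
checkpoint = "work/best_checkpoint"

train_cmd = build_command("train.py")
eval_cmd = build_command("evaluation.py", ["--checkpoint", checkpoint])
print(train_cmd)
print(eval_cmd)
```

Because both commands are built from the same `SHARED_MODEL_ARGS` list, changing the architecture in one place changes it for every step.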

## 4.0 Optimization<a name="optimization"></a>

We introduce an *optional* step to optimize the trained keyword spotting model for deployment.

Here we use TensorFlow's [weight clustering API](https://www.tensorflow.org/model_optimization/guide/clustering) to reduce the compressed model size and optimize inference on supported hardware. We use 32 weight clusters and the kmeans++ cluster initialization method as the clustering hyperparameters.

To optimize your trained model (e.g. a DNN), you need a trained model checkpoint on which to run clustering and fine-tuning.
You can use the pre-trained checkpoints provided, or train your own model and use the resulting checkpoint.

To apply the optimization and fine-tuning, run the following command:
```
python optimisations.py --model_architecture dnn --model_size_info 128 128 128 --checkpoint <checkpoint_path>
```
**The model and feature extraction parameters used here should match those used in the Training step, except for the number of training steps.
The number of training steps is reduced since the optimization step only requires fine-tuning.**

This will generate a clustered model checkpoint that can be used in the quantization step to generate a quantized and clustered TFLite model.
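To give an intuition for what weight clustering with kmeans++ initialization does, the following is a minimal pure-Python sketch on a flat list of weights. It is a conceptual illustration only and is not the TensorFlow Model Optimization implementation, which works on the layers of a Keras model and fine-tunes it; the function names and the random example weights are assumptions made for this sketch.

```python
import random


def kmeans_pp_init(values, k, rng):
    """Pick up to k starting centroids; later picks favour points far from earlier ones."""
    centroids = [rng.choice(values)]
    while len(centroids) < k:
        # Squared distance from each value to its nearest already-chosen centroid.
        d2 = [min((v - c) ** 2 for c in centroids) for v in values]
        if sum(d2) == 0:  # every value already coincides with a centroid
            break
        centroids.append(rng.choices(values, weights=d2, k=1)[0])
    return centroids


def cluster_weights_1d(values, k, iterations=10, seed=0):
    """Replace each weight by its nearest centroid after a few Lloyd iterations."""
    rng = random.Random(seed)
    centroids = kmeans_pp_init(values, k, rng)
    for _ in range(iterations):
        buckets = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)),
                          key=lambda i: (v - centroids[i]) ** 2)
            buckets[nearest].append(v)
        # Move each centroid to the mean of its bucket; keep it if the bucket is empty.
        centroids = [sum(b) / len(b) if b else c
                     for b, c in zip(buckets, centroids)]
    return [min(centroids, key=lambda c: (v - c) ** 2) for v in values]


# Synthetic weights, for illustration only.
rng = random.Random(42)
weights = [rng.gauss(0.0, 0.1) for _ in range(1000)]
clustered = cluster_weights_1d(weights, k=32)
print(len(set(weights)), "->", len(set(clustered)), "distinct values")
```

After clustering, the weights take at most 32 distinct values, which is what makes the model file compress well.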

## 5.0 Quantization and TFLite Conversion<a name="tflite_conversion"></a>

You can now use TensorFlow's
[post training quantization](https://www.tensorflow.org/lite/performance/post_training_quantization) to
quantize the trained models.

To quantize your trained model (e.g. a DNN) run:
```
python convert_to_tflite.py --model_architecture dnn --model_size_info 128 128 128 --checkpoint <checkpoint_path> [--inference_type int8|int16]
```
**The model and feature extraction parameters used here should match those used in the Training step.**

The ```inference_type``` parameter is *optional*; use it if you need a fully quantized model with inputs and outputs of type int8 or int16. It defaults to fp32.
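As background on what an int8 input or output means, TensorFlow Lite represents a real value as `scale * (q - zero_point)`, where `q` is the stored integer. The sketch below shows this affine mapping in plain Python. The function names and the example range are assumptions for illustration; in a real conversion the scale and zero point are chosen by the converter and stored in the TFLite file.

```python
def quant_params(x_min, x_max, q_min=-128, q_max=127):
    """Derive scale and zero point so that [x_min, x_max] maps onto [q_min, q_max]."""
    # The real range must contain 0.0 so that zero is represented exactly.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (q_max - q_min)
    if scale == 0.0:  # degenerate range, avoid division by zero
        scale = 1.0
    zero_point = int(round(q_min - x_min / scale))
    zero_point = max(q_min, min(q_max, zero_point))
    return scale, zero_point


def quantize(x, scale, zero_point, q_min=-128, q_max=127):
    """Map a real value to the nearest representable integer, clamped to the int8 range."""
    q = int(round(x / scale)) + zero_point
    return max(q_min, min(q_max, q))


def dequantize(q, scale, zero_point):
    """Recover the approximate real value from its integer representation."""
    return scale * (q - zero_point)


# Example range chosen for illustration only.
scale, zero_point = quant_params(-1.0, 1.0)
for x in [-1.0, -0.25, 0.0, 0.5, 1.0]:
    q = quantize(x, scale, zero_point)
    print(x, "->", q, "->", dequantize(q, scale, zero_point))
```

The round trip is lossy: each value comes back with an error of roughly one quantization step at most, which is the accuracy cost that the evaluation step measures.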

In this example, this step will produce a quantized TFLite file *dnn_quantized.tflite*.

You can test the accuracy of this quantized model on the test set by running:
```
python evaluation.py --tflite_path dnn_quantized.tflite
```
**The model and feature extraction parameters used here should match those used in the Training step.**

`convert_to_tflite.py` uses post-training quantization to generate a quantized model by default. If you wish to convert to a floating point TFLite model, use the command below:

```
python convert_to_tflite.py --model_architecture dnn --model_size_info 128 128 128 --checkpoint <checkpoint_path> --no-quantize
```

This will produce a floating point TFLite file *dnn.tflite*. You can test the accuracy of this floating point model using `evaluation.py` as above.


## 6.0 Single inference of the TFLite model files <a name="tflite_inference"></a>

You can run TFLite inference on a single WAV file with the fp32 and int8 model files by using the following command:

```python cnn_s_inference_tflite.py --labels validation_utils/labels.txt --wav <path_to_wav_file> --tflite_path <path_to_tflite_file>```

**The feature extraction parameters used here should match those used in the Training step.**
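The last step of a single inference is turning the model's output scores into a keyword. The sketch below shows that step in isolation, assuming a labels file with one label per line and one score per label. Both the file format and the example labels and scores are assumptions for illustration; the internals of `cnn_s_inference_tflite.py` are not shown in this guide.

```python
def load_labels(lines):
    """Strip whitespace and drop blank lines from an iterable of label lines."""
    return [line.strip() for line in lines if line.strip()]


def top_prediction(scores, labels):
    """Return the (label, score) pair with the highest score."""
    if len(scores) != len(labels):
        raise ValueError("expected one score per label")
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best], scores[best]


# Hypothetical labels and scores, for illustration only.
labels = load_labels(["_silence_\n", "_unknown_\n", "yes\n", "no\n"])
scores = [0.01, 0.04, 0.90, 0.05]
print(top_prediction(scores, labels))  # ('yes', 0.9)
```

With a real labels file, `load_labels` can be called on the open file object, since iterating over a text file yields its lines.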


