# Initialize the accelerator

In [1]:
from finn_examples import models
print(list(filter(lambda x: "kws" in x, dir(models))))

['kws_mlp']


In [2]:
accel = models.kws_mlp()

In [3]:
print("Expected input shape and datatype: %s %s" % (str(accel.ishape_normal), str(accel.idt)))
print("Expected output shape and datatype: %s %s" % (str(accel.oshape_normal), str(accel.odt)))

Expected input shape and datatype: (1, 490) DataType.INT8
Expected output shape and datatype: (1, 1) DataType.UINT8


# Validating network accuracy
In this first part we will be looking at the overall accuracy of the network.

The keyword spotting (KWS) network was trained on the Google Speech Commands v2 dataset, as published here: https://arxiv.org/abs/1804.03209

We then used a feature extraction technique called Mel Frequency Cepstral Coefficients (MFCC).
This method turns audio waveforms into single-channel 2D images, similar to the one shown below:

![MFCC features produced by python_speech_features](images/mfcc_py.png)

A more in-depth explanation of MFCC features can be found on Wikipedia: https://en.wikipedia.org/wiki/Mel-frequency_cepstrum

For this concrete case we used the Python library [python_speech_features](https://github.com/jameslyons/python_speech_features) to produce these features.

During the training of the KWS network we produced the MFCC features for the training and validation sets and then quantized the inputs to the network to eight bits.
We will load the pre-processed and quantized validation dataset in the next step.
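
To make the quantization step concrete, here is a minimal sketch of mapping floating-point features to the accelerator's `INT8` input format. It is an illustration only: the symmetric max-abs scaling, the helper name `quantize_to_int8`, the 49 x 10 feature layout, and the random stand-in data are all assumptions, and the preprocessing actually used for the shipped dataset may differ.

```python
import numpy as np

def quantize_to_int8(features, scale=None):
    """Scale float features into the signed 8-bit range and round to integers.

    Hypothetical helper for illustration; not part of finn_examples.
    """
    features = np.asarray(features, dtype=np.float32)
    if scale is None:
        # Assumed scheme: map the largest magnitude to 127 (symmetric scaling).
        max_abs = float(np.max(np.abs(features)))
        scale = 127.0 / max_abs if max_abs > 0 else 1.0
    quantized = np.clip(np.round(features * scale), -128, 127)
    return quantized.astype(np.int8), scale

# Random stand-in for real MFCC output. The 49 x 10 layout is an assumption
# chosen only because 49 * 10 = 490 matches the accelerator input width.
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(49, 10)).astype(np.float32)

quantized, scale = quantize_to_int8(mfcc)
flat_input = quantized.reshape(1, -1)  # same shape as accel.ishape_normal
print(flat_input.shape, flat_input.dtype)
```

The flattened array has the `(1, 490)` shape and `INT8` datatype reported by the accelerator in the first section.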


### Load preprocessed Google Speech Commands v2 validation dataset

In [23]:
import pkg_resources as pk
import numpy as np

input_npy = pk.resource_filename("finn_examples", "data/python_speech_preprocessing_all_validation_KWS_data_inputs_len_10102.npy")
golden_out_npy = pk.resource_filename("finn_examples", "data/python_speech_preprocessing_all_validation_KWS_data_outputs_len_10102.npy")

input_data = np.load(input_npy)
golden_out_data = np.load(golden_out_npy)
num_samples = input_data.shape[0]

print("Input data shape: " + str(input_data.shape))
print("Label shape: " + str(golden_out_data.shape))

Input data shape: (10102, 490)
Label shape: (10102,)


### Run validation on the FPGA

In [24]:
accel.batch_size = num_samples
accel_out_data = accel.execute(input_data)

print("Accelerator output shape: " + str(accel_out_data.shape))

Accelerator output shape: (10102, 1)


In [32]:
score = np.unique(accel_out_data.flatten() == golden_out_data.flatten(), return_counts=True)
print("Correctly predicted: %d / %d " % (score[1][1], num_samples))
print("Incorrectly predicted: %d / %d " % (score[1][0], num_samples))
print("Accuracy: %f%%" % (100.0 * score[1][1] / num_samples))

Correctly predicted: 9070 / 10102 
Incorrectly predicted: 1032 / 10102 
Accuracy: 89.784201%


Here you should see an accuracy of about 89.78 %, matching the output above.
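
The `np.unique` comparison above indexes the `False` and `True` counts by position, so it raises an `IndexError` if every prediction happens to be correct (or every one incorrect). A sketch of a more defensive variant follows; the helper name `accuracy_counts` and the small arrays are made up for illustration and stand in for `accel_out_data` and `golden_out_data`.

```python
import numpy as np

def accuracy_counts(predictions, labels):
    """Return (correct, incorrect) counts for two label arrays.

    Hypothetical helper; flattens both inputs so that an (N, 1) accelerator
    output can be compared against (N,) golden labels.
    """
    predictions = np.asarray(predictions).reshape(-1)
    labels = np.asarray(labels).reshape(-1)
    if predictions.shape != labels.shape:
        raise ValueError("predictions and labels must have the same length")
    correct = int(np.count_nonzero(predictions == labels))
    return correct, int(predictions.size) - correct

# Synthetic stand-ins: five samples, one of them misclassified.
labels = np.array([0, 3, 7, 7, 11], dtype=np.uint8)
predictions = np.array([[0], [3], [2], [7], [11]], dtype=np.uint8)

correct, incorrect = accuracy_counts(predictions, labels)
accuracy = 100.0 * correct / labels.size
print("Correctly predicted: %d / %d" % (correct, labels.size))
print("Incorrectly predicted: %d / %d" % (incorrect, labels.size))
print("Accuracy: %f%%" % accuracy)
```

With the real data, the call would be `accuracy_counts(accel_out_data, golden_out_data)`.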

# Assessing network throughput

Now we will take a look at how fast the FPGA can process the whole validation dataset.

### Using a naive timing benchmark from the notebook

In [33]:
def run_validation():
    accel_out_data = accel.execute(input_data)

In [34]:
full_validation_time = %timeit -n 5 -o run_validation()

5 loops, best of 3: 69 ms per loop


In [35]:
print(f"{(num_samples / float(full_validation_time.best)):.0f} samples per second including data movement")

146302 samples per second including data movement


While the result of over 140 thousand inferences per second is already very good, this naive benchmark
also includes data movement to and from the FPGA, and it is difficult to assess how much time is spent on
each part of running the FINN accelerator.
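
Outside a notebook, where the `%timeit` magic is not available, the same best-of-N measurement can be approximated with the standard library's `timeit` module. The sketch below uses a made-up workload, `fake_inference`, in place of the accelerator call, and the helper name `samples_per_second` is hypothetical.

```python
import timeit

def samples_per_second(run_once, num_samples, loops=5, repeats=3):
    """Estimate throughput from the best of several timed runs."""
    # timeit.repeat returns the total seconds for `loops` calls, once per repeat.
    totals = timeit.repeat(run_once, number=loops, repeat=repeats)
    best_per_call = min(totals) / loops
    return num_samples / best_per_call

def fake_inference():
    # Stand-in workload for illustration only; it does not touch the FPGA.
    sum(range(10_000))

rate = samples_per_second(fake_inference, num_samples=10102)
print(f"{rate:.0f} samples per second including data movement")
```

On the board, passing `lambda: accel.execute(input_data)` as `run_once` would time the same call as `run_validation()`.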

### Using the built-in performance benchmark

To measure the performance of individual components of the PYNQ stack and the FINN accelerator on the FPGA,
FINN comes with a built-in benchmark. This benchmark computes the throughput of the FINN accelerator as seen on the FPGA.

In [36]:
accel.throughput_test()

{'DRAM_in_bandwidth[Mb/s]': 121.19740179165815,
 'DRAM_out_bandwidth[Mb/s]': 0.24734163630950642,
 'batch_size': 10102,
 'copy_input_data_to_device[ms]': 26.500940322875977,
 'copy_output_data_from_device[ms]': 0.23293495178222656,
 'fclk[mhz]': 100.0,
 'fold_input[ms]': 0.16808509826660156,
 'pack_input[ms]': 0.1747608184814453,
 'runtime[ms]': 40.842294692993164,
 'throughput[images/s]': 247341.63630950643,
 'unfold_output[ms]': 0.19407272338867188,
 'unpack_output[ms]': 1.2056827545166016}
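
As a rough cross-check, the reported figures can be related to each other with a few lines of arithmetic. The sketch below copies the values from the output above; the assumption of one byte per `INT8` input element, and the reading of the per-step timings as additive, are mine rather than something the benchmark states.

```python
# Values copied from the throughput_test() output above.
results = {
    "batch_size": 10102,
    "copy_input_data_to_device[ms]": 26.500940322875977,
    "copy_output_data_from_device[ms]": 0.23293495178222656,
    "fold_input[ms]": 0.16808509826660156,
    "pack_input[ms]": 0.1747608184814453,
    "runtime[ms]": 40.842294692993164,
    "unfold_output[ms]": 0.19407272338867188,
    "unpack_output[ms]": 1.2056827545166016,
}

runtime_s = results["runtime[ms]"] / 1000.0

# Throughput on the FPGA: samples processed per second of pure runtime.
throughput = results["batch_size"] / runtime_s

# Input bandwidth, assuming 490 one-byte (INT8) elements per sample.
input_bytes = results["batch_size"] * 490
in_bandwidth_mbytes = input_bytes / runtime_s / 1e6

# Sum of all per-step timings, treated here as an end-to-end estimate.
end_to_end_ms = sum(v for k, v in results.items() if k.endswith("[ms]"))
runtime_share = results["runtime[ms]"] / end_to_end_ms

print(f"throughput: {throughput:.1f} samples/s")
print(f"input bandwidth: {in_bandwidth_mbytes:.3f} megabytes/s")
print(f"end-to-end estimate: {end_to_end_ms:.1f} ms")
print(f"share spent in accelerator runtime: {runtime_share:.0%}")
```

Under these assumptions the recomputed throughput and input bandwidth reproduce the reported values, which suggests the bandwidth figures are in megabytes per second. The summed timings come to about 69.3 ms, consistent with the 69 ms per loop from the naive benchmark, with copying the input to the device as the largest cost besides the runtime itself.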