ARM-software/AVH-TFLmicrospeech

Micro speech example for TensorFlow Lite

The program analyzes an audio input with a voice recognition model that can detect two keywords: "yes" and "no". The recognized keywords are printed to a serial interface. The voice recognition model is implemented using TensorFlow Lite for Microcontrollers.

The example project can be executed on Arm Virtual Hardware (AVH) as well as on physical hardware targets.

Structure

The repository is organized as follows:

  • ./micro_speech/: Contains the voice recognition model used by all targets. This part is similar to the original TF-Lite for Microcontrollers example, with just minor modifications. The TensorFlow calculation kernel is provided separately via the corresponding software packs listed in Prerequisites.
  • ./Platform_FVP_Corstone_SSE-300_Ethos-U55/: Project files specific to the Corstone SSE-300 AVH target.
  • ./Platform_IMXRT1050-EVKB/: Project files specific to the IMXRT1050-EVKB target.
  • ./Platform_MIMXRT1064-EVK/: Project files specific to the MIMXRT1064-EVK target.
  • ./VSI/: Implementation of the Audio Streaming Interface for FVP targets using the Virtual Streaming Interface (VSI).

Prerequisites

Toolchain

Arm Virtual Hardware - Corstone and Cortex-M CPUs available via AWS Marketplace

Specific to HW targets

Note that the CMSIS software packs used in a specific project are requested and installed automatically when using Keil MDK or CMSIS-Build.

Micro speech for AVH (Fast Model) for Corstone SSE-300 with Ethos-U55

Project directory: ./Platform_FVP_Corstone_SSE-300_Ethos-U55/

This example executes the program on Corstone SSE-300 with Ethos-U55 Fixed Virtual Platforms (FVPs).

The example project has the following targets:

  • Example: target runs on AVH with the Virtual Streaming Interface (VSI)
    • Uses the VHT_Corstone_SSE-300_Ethos-U55 model with VSI support.
      The model executable and binaries must be installed, and the directory containing them added to the system path, in order to run this example.
    • Audio test data is provided by the Python script ./VSI/audio/python/arm_vsi0.py from the WAVE file test.wav, which contains the keywords 'Yes' and 'No' alternating three times.
    • Open the example with Keil MDK (Windows only) using the uVision project microspeech.uvprojx and build it for the target Example.
    • Alternatively, compile with CMSIS-Build using the microspeech.Example.cprj project.
    • Run the example from the uVision project, or standalone with the script run_example.cmd.
    • While the example runs, the audio input is processed and detected keywords are output to the Telnet terminal together with their timestamps in the audio stream. The following output can be observed for the default test.wav file included with the example:
      Heard yes (152) @1100ms
      Heard no (141) @5500ms
      Heard yes (147) @9100ms
      Heard no (148) @13600ms
      Heard yes (147) @17100ms
      Heard no (148) @21600ms
      
  • Example Test: internal test for the Example target
  • Audio Provider Mock: runs on AVH
    • Uses the VHT_Corstone_SSE-300_Ethos-U55 model without VSI.
    • Audio test data is embedded in the test code and contains the keywords 'Yes' and 'No' alternating indefinitely.
    • Open the example with Keil MDK (Windows only) using the uVision project microspeech.uvprojx and build it for the target Audio Provider Mock.
    • Alternatively, compile with CMSIS-Build using the microspeech.Audio_Provider_Mock.cprj project.
    • Run the example from the uVision project, or standalone with the script run_audio_provider_mock.cmd.
    • While the example runs, the audio input is processed and detected keywords are continuously output to the Telnet terminal together with their timestamps in the audio stream. The following output can be observed with the embedded test data:
      Heard silence (149) @400ms
      Heard yes (158) @1200ms
      Heard no (143) @5600ms
      Heard yes (149) @9100ms
      Heard no (142) @13600ms
      Heard yes (149) @17100ms
      Heard no (142) @21600ms
      
  • Audio Provider Mock Test: internal test for Audio Provider Mock target

Micro speech for IMXRT1050-EVKB board

Project directory: ./Platform_IMXRT1050-EVKB/

This example executes the program on the NXP IMXRT1050-EVKB development board with an Arm Cortex-M7 processor. It uses the on-board microphone for audio input and prints recognized keywords to the serial interface. The project provides one target, IMXRT1050-EVKB.

The board shall be connected to a PC via USB port J28. Jumper J1 shall connect pins 5-6 to ensure the correct power supply in this setup. The project is configured to load the program into the on-board Hyper Flash, so boot switch SW7 shall be set to 0110.

IMXRT1050-EVKB board

Execute the program with the following steps:

  • Build the example with MDK using the uVision project microspeech.uvprojx, or with CMSIS-Build using the microspeech.IMXRT1050-EVKB.cprj project.
  • Program and run the example with MDK, or use drag-and-drop programming via the DAPLink drive.
  • Open the DAPLink Virtual COM port in a terminal (baud rate 115200) and monitor the recognized keywords.

Micro speech for MIMXRT1064-EVK board

Project directory: ./Platform_MIMXRT1064-EVK/

This example executes the program on the NXP MIMXRT1064-EVK development board with an Arm Cortex-M7 processor. It uses the on-board microphone for audio input and prints recognized keywords to the serial interface. The project provides one target, MIMXRT1064-EVK.

The board shall be connected to a PC via USB port J41. Jumper J1 shall connect pins 5-6 to ensure the correct power supply in this setup. The project is configured to load the program into the on-board QSPI NOR flash, so boot switch SW7 shall be set to 0010.

MIMXRT1064-EVK board

Execute the program with the following steps:

  • Build the example with MDK using the uVision project microspeech.uvprojx, or with CMSIS-Build using the microspeech.MIMXRT1064-EVK.cprj project.
  • Program and run the example with MDK, or use drag-and-drop programming via the DAPLink drive.
  • Open the DAPLink Virtual COM port in a terminal (baud rate 115200) and monitor the recognized keywords.

TensorFlow-Lite kernel variants

The micro speech example uses the tensorflow-lite-micro pack, which contains a Machine Learning software component that implements, among others, the universal kernel for executing TensorFlow ML operations independently of the actual workload type (audio, video, or other).

Implementations of these kernel operations are available in several variants optimized for Arm targets. When using the uVision project, the variant can be selected in the Manage Run-Time Environment window as shown in the picture below.

TF-Lite Kernel component variants

When using CMSIS-Build, the kernel variant is specified in the .cprj project file in the line:

    <component Cclass="Machine Learning" Cgroup="TensorFlow" Csub="Kernel" Cvariant="CMSIS-NN" Cvendor="tensorflow"/>

The following kernel variants are available:

  • CMSIS-NN
    Optimizes execution on Cortex-M devices by using mathematical functions from the CMSIS-NN software library.
    The underlying implementations automatically utilize target-specific hardware extensions such as Helium (MVE, M-Profile Vector Extension) and SIMD (Single Instruction, Multiple Data), maximizing computing performance and reducing code footprint.
    For devices with MVE (such as Cortex-M55), an additional configuration field Vector Extensions in the Options for Target... dialog specifies the use of MVE in the project.

MVE configuration in uVision

  • Ethos-U
    (Currently not functional.) Uses an implementation optimized for Arm Ethos-U NPUs.

  • Reference
    A target-independent software implementation. This is the fallback for operations that cannot be optimized for the target hardware.
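For example, to select the Reference variant with CMSIS-Build, the component line in the .cprj file would change only in its Cvariant attribute (a sketch based on the line shown above):

```xml
<component Cclass="Machine Learning" Cgroup="TensorFlow" Csub="Kernel" Cvariant="Reference" Cvendor="tensorflow"/>
```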

Performance measurement with Event Statistics

The file ./micro_speech/src/main_functions.cc in the repository contains Event Statistics annotations that allow measuring the performance of the ML algorithm in different configurations. This currently works only in a setup with Keil MDK. This video demonstrates program execution, including the views into Event Statistics.

There are three events defined:

  • Event C0: the signal-processing part that creates a spectrogram from the current data slice.

  • Event C1: the ML inference algorithm for the created spectrogram; the most compute-intensive part.

  • Event C2: verifies whether a keyword was detected.

After executing the program with different TF-Lite kernel variants, it can be observed that the implementations optimized for Arm hardware extensions achieve significantly better performance for ML inference (the C1 measurement) and also shorter times for signal processing (the C0 event).