Documentation: https://ambiqai.github.io/soundkit Source Code: https://github.com/AmbiqAI/soundkit
soundKIT is an AI Development Kit (ADK) designed to help developers build, train, and deploy real-time audio classification models onto Ambiq's family of ultra-low power SoCs. The kit includes task-specific datasets, energy-efficient model architectures, and built-in tools for optimization and deployment. It also integrates with NeuralSPOT, Ambiq’s open-source AI SDK, to streamline the deployment of inference models onto embedded hardware. Developers can use pre-trained models or create custom audio models tailored to their specific edge application.
Key Features:
- Real-time: Run low-latency inference on embedded edge devices.
- Efficient: Built for Ambiq’s ultra low-power hardware platforms.
- Customizable: Add new models, datasets, and audio tasks.
- End-to-End: Includes tools for training, quantization, evaluation, and deployment.
- Open Source: Available for use and contributions on GitHub.
- Int16x8 TFLite Quantization: Uses HeliaRT, a specialized fork of TensorFlow Lite for Microcontrollers (TFLM) developed by Ambiq. It supports the int16x8 quantization format, including LSTM and grouped conv2d (separable convolution) operators available exclusively in HeliaRT (a generic conversion sketch follows this list).
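For context, the int16x8 format can also be produced with the standard TensorFlow Lite converter. The sketch below is a generic post-training quantization example, not HeliaRT-specific; the model and calibration data are placeholders.

```python
import numpy as np
import tensorflow as tf

# Placeholder model; in practice this would be a trained soundKIT model.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(1960,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(12, activation="softmax"),
])

def representative_dataset():
    # Yield a few calibration samples so the converter can estimate activation ranges.
    for _ in range(100):
        yield [np.random.rand(1, 1960).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# int16 activations with int8 weights ("int16x8" quantization).
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.EXPERIMENTAL_TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8
]
tflite_model = converter.convert()

with open("model_int16x8.tflite", "wb") as f:
    f.write(tflite_model)
```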
- Ubuntu 20.04 / 22.04 / 24.04 (The CLI and installation scripts are currently Ubuntu-only).
The following are also required to compile/flash binaries for EVB demos:
```bash
git clone https://github.com/AmbiqAI/soundkit.git
cd soundkit
./install.sh
source .venv/bin/activate  # activate the soundKIT virtual environment
```

soundKIT can be used via the CLI or directly as a Python package. It supports a flexible configuration-based workflow to streamline training and deployment.
Refer to the Quickstart Guide to get started quickly.
soundKIT supports four core audio tasks, each with reference pipelines for training, evaluation, export, and deployment:
- SE (Speech Enhancement): Enhance speech quality in noisy audio.
- VAD (Voice Activity Detection): Detect presence or absence of human voice in audio streams.
- KWS (Keyword Spotting): Recognize short spoken keywords, such as wake words.
- ID (Speaker Identification): Identify speakers by their voice.

Custom tasks can be implemented using the task registry.
Each task supports the following operational modes:
- Data: Download supported datasets and generate TFRecords (see the sketch after this list).
- Train: Train a model using a config file or inline args.
- Evaluate: Benchmark model performance on validation/test sets.
- Export: Export trained model to TFLite/TFLM formats for deployment.
- Demo: Run on-device inference demos using PC or Ambiq EVB.
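For reference, the TFRecords produced by the Data mode use the standard TensorFlow record format. The following is a minimal, generic sketch of serializing audio clips and labels into a TFRecord; the feature keys and shapes are illustrative, not soundKIT's actual schema.

```python
import numpy as np
import tensorflow as tf

def serialize_example(audio: np.ndarray, label: int) -> bytes:
    # Pack a single audio clip and its label into a tf.train.Example.
    feature = {
        "audio": tf.train.Feature(float_list=tf.train.FloatList(value=audio.tolist())),
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature)).SerializeToString()

# Illustrative data: two 1-second clips at 16 kHz with integer labels.
clips = [np.random.randn(16000).astype(np.float32) for _ in range(2)]
labels = [0, 1]

with tf.io.TFRecordWriter("train.tfrecord") as writer:
    for audio, label in zip(clips, labels):
        writer.write(serialize_example(audio, label))
```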
soundKIT includes a flexible dataset factory that supports both speech and non-speech corpora, as well as labeled data for supervised tasks. The following datasets are supported for SE, VAD, and KWS tasks:
- LibriSpeech (train-clean-100, train-clean-360, train-other-500, dev-clean, test-clean): Large-scale read English speech corpus.
- THCHS-30: Mandarin speech corpus with train/dev splits.
- Qualcomm Keyword Speech Dataset: Manual download required. See the license for full terms here.
- WHAM! Noise: Background noise recordings with train/val splits.
- MUSAN: Contains music and noise clips suitable for data augmentation and robust training.
- FSD50K: Open dataset with diverse non-verbal sound events.
- ESC-50: Environmental sound classification dataset for non-speech events.
- RIRS_NOISES: Room impulse responses for augmenting audio with realistic reverberation.
Each dataset is loaded using a factory function and supports automatic file discovery or label mapping where applicable.
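As background on how reverberation augmentation with room impulse responses (such as those in RIRS_NOISES) generally works, below is a minimal sketch using SciPy. It is a generic illustration, not soundKIT's internal augmentation code, and the signals are synthetic placeholders.

```python
import numpy as np
from scipy.signal import fftconvolve

def add_reverb(speech: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Convolve clean speech with a room impulse response to simulate reverberation."""
    reverberant = fftconvolve(speech, rir, mode="full")[: len(speech)]
    # Rescale so the augmented clip keeps roughly the original energy.
    scale = np.sqrt(np.sum(speech**2) / (np.sum(reverberant**2) + 1e-12))
    return (reverberant * scale).astype(np.float32)

# Illustrative inputs: 1 s of speech and a 0.25 s impulse response at 16 kHz.
speech = np.random.randn(16000).astype(np.float32)
rir = (np.exp(-np.linspace(0, 8, 4000)) * np.random.randn(4000)).astype(np.float32)

augmented = add_reverb(speech, rir)
```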
Pre-trained models are available for SE, VAD, ID, and KWS tasks. Each model includes:
- Downloadable `.tflite` binaries
- Training configuration files
- Evaluation reports (accuracy, F1-score, latency)
- Deployment instructions
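For reference, an exported `.tflite` model can be exercised on a PC with the standard TensorFlow Lite interpreter before deploying it to an EVB. The sketch below is generic; the model path is a placeholder and a single input/output model is assumed.

```python
import numpy as np
import tensorflow as tf

# Placeholder path; substitute the downloaded pre-trained model.
interpreter = tf.lite.Interpreter(model_path="model_int16x8.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

# Dummy input matching the model's expected shape and dtype.
dummy = np.zeros(input_details["shape"], dtype=input_details["dtype"])
interpreter.set_tensor(input_details["index"], dummy)
interpreter.invoke()

scores = interpreter.get_tensor(output_details["index"])
print("Output shape:", scores.shape)
```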
Explore the Guides section for:
- Task-specific tutorials (KWS, VAD, SE, ID)
- Dataset preparation and augmentation
- Model customization and benchmarking
- Deployment on Ambiq Apollo EVBs
See LICENSE for full terms.