Documentation: https://ambiqai.github.io/soundkit Source Code: https://github.com/AmbiqAI/soundkit
soundKIT is an AI Development Kit (ADK) designed to help developers build, train, and deploy real-time audio classification models onto Ambiq's family of ultra-low power SoCs. The kit includes task-specific datasets, energy-efficient model architectures, and built-in tools for optimization and deployment. It also integrates with NeuralSPOT, Ambiq’s open-source AI SDK, to streamline the deployment of inference models onto embedded hardware. Developers can use pre-trained models or create custom audio models tailored to their specific edge application.
Key Features:
- Real-time: Run low-latency inference on embedded edge devices.
- Efficient: Built for Ambiq’s ultra low-power hardware platforms.
- Customizable: Add new models, datasets, and audio tasks.
- End-to-End: Includes tools for training, quantization, evaluation, and deployment.
- Open Source: Available for use and contributions on GitHub.
- Int16x8 TFLite Quantization: Uses HeliaRT, a specialized fork of TensorFlow Lite for Microcontrollers (TFLM) developed by Ambiq. It supports the int16x8 quantization format, including LSTM and grouped conv2d (separable convolution) operators available exclusively in HeliaRT (a generic conversion sketch follows this list).
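For context, the int16x8 format can also be produced with the standard TensorFlow Lite converter. The sketch below is a generic post-training quantization example, not HeliaRT-specific; the model and calibration data are placeholders.

```python
import numpy as np
import tensorflow as tf

# Placeholder model; in practice this would be a trained soundKIT model.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(1960,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(12, activation="softmax"),
])

def representative_dataset():
    # Yield a few calibration samples so the converter can estimate activation ranges.
    for _ in range(100):
        yield [np.random.rand(1, 1960).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# int16 activations with int8 weights ("int16x8" quantization).
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.EXPERIMENTAL_TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8
]
tflite_model = converter.convert()

with open("model_int16x8.tflite", "wb") as f:
    f.write(tflite_model)
```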
- Ubuntu 20.04 / 22.04 / 24.04 (The CLI and installation scripts are currently Ubuntu-only).
The following are also required to compile/flash binaries for EVB demos:
```bash
git clone https://github.com/AmbiqAI/soundkit.git
cd soundkit
./install.sh
source .venv/bin/activate  # activate the soundKIT virtual environment
```

soundKIT can be used via the CLI or directly as a Python package. It supports a flexible configuration-based workflow to streamline training and deployment.
Refer to the Quickstart Guide to get started quickly.
soundKIT supports four core audio tasks, each with reference pipelines for training, evaluation, export, and deployment:
- SE (Speech Enhancement): Enhance speech quality in noisy audio.
- VAD (Voice Activity Detection): Detect presence or absence of human voice in audio streams.
- KWS (Keyword Spotting): Recognize short spoken keywords, such as wake words.
- ID (Speaker Identification): Identify speakers by their voice.

Custom tasks can be implemented using the task registry.
Each task supports the following operational modes:
- Data: Download supported datasets and generate TFRecords (see the sketch after this list).
- Train: Train a model using a config file or inline args.
- Evaluate: Benchmark model performance on validation/test sets.
- Export: Export trained model to TFLite/TFLM formats for deployment.
- Demo: Run on-device inference demos using PC or Ambiq EVB.
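For reference, the TFRecords produced by the Data mode use the standard TensorFlow record format. The following is a minimal, generic sketch of serializing audio clips and labels into a TFRecord; the feature keys and shapes are illustrative, not soundKIT's actual schema.

```python
import numpy as np
import tensorflow as tf

def serialize_example(audio: np.ndarray, label: int) -> bytes:
    # Pack a single audio clip and its label into a tf.train.Example.
    feature = {
        "audio": tf.train.Feature(float_list=tf.train.FloatList(value=audio.tolist())),
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature)).SerializeToString()

# Illustrative data: two 1-second clips at 16 kHz with integer labels.
clips = [np.random.randn(16000).astype(np.float32) for _ in range(2)]
labels = [0, 1]

with tf.io.TFRecordWriter("train.tfrecord") as writer:
    for audio, label in zip(clips, labels):
        writer.write(serialize_example(audio, label))
```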
soundKIT includes a flexible dataset factory that supports both speech and non-speech corpora, as well as labeled data for supervised tasks. The following datasets are supported for SE, VAD, and KWS tasks:
- LibriSpeech (train-clean-100, train-clean-360, train-other-500, dev-clean, test-clean): Large-scale read English speech corpus.
- THCHS-30: Mandarin speech corpus with train/dev splits.
- Qualcomm Keyword Speech Dataset: Manual download required. See the license for full terms here.
- WHAM! Noise: Background noise recordings with train/val splits.
- MUSAN: Contains music and noise clips suitable for data augmentation and robust training.
- FSD50K: Open dataset with diverse non-verbal sound events.
- ESC-50: Environmental sound classification dataset for non-speech events.
- RIRS_NOISES: Room impulse responses for augmenting audio with realistic reverberation.
Each dataset is loaded using a factory function and supports automatic file discovery or label mapping where applicable.
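As background on how reverberation augmentation with room impulse responses (such as those in RIRS_NOISES) generally works, below is a minimal sketch using SciPy. It is a generic illustration, not soundKIT's internal augmentation code, and the signals are synthetic placeholders.

```python
import numpy as np
from scipy.signal import fftconvolve

def add_reverb(speech: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Convolve clean speech with a room impulse response to simulate reverberation."""
    reverberant = fftconvolve(speech, rir, mode="full")[: len(speech)]
    # Rescale so the augmented clip keeps roughly the original energy.
    scale = np.sqrt(np.sum(speech**2) / (np.sum(reverberant**2) + 1e-12))
    return (reverberant * scale).astype(np.float32)

# Illustrative inputs: 1 s of speech and a 0.25 s impulse response at 16 kHz.
speech = np.random.randn(16000).astype(np.float32)
rir = (np.exp(-np.linspace(0, 8, 4000)) * np.random.randn(4000)).astype(np.float32)

augmented = add_reverb(speech, rir)
```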
Pre-trained models are available for SE, VAD, ID, and KWS tasks. Each model includes:
- Downloadable `.tflite` binaries
- Training configuration files
- Evaluation reports (accuracy, F1-score, latency)
- Deployment instructions
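For reference, an exported `.tflite` model can be exercised on a PC with the standard TensorFlow Lite interpreter before deploying it to an EVB. The sketch below is generic; the model path is a placeholder and a single input/output model is assumed.

```python
import numpy as np
import tensorflow as tf

# Placeholder path; substitute the downloaded pre-trained model.
interpreter = tf.lite.Interpreter(model_path="model_int16x8.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

# Dummy input matching the model's expected shape and dtype.
dummy = np.zeros(input_details["shape"], dtype=input_details["dtype"])
interpreter.set_tensor(input_details["index"], dummy)
interpreter.invoke()

scores = interpreter.get_tensor(output_details["index"])
print("Output shape:", scores.shape)
```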
Explore the Guides section for:
- Task-specific tutorials (KWS, VAD, SE, ID)
- Dataset preparation and augmentation
- Model customization and benchmarking
- Deployment on Ambiq Apollo EVBs
See LICENSE for full terms.