# Introduction

This notebook demonstrates how to train custom openWakeWord models using pre-defined datasets and an automated process for dataset generation and training. While not guaranteed to always produce the best performing model, the methods shown in this notebook often produce baseline models with relatively strong performance.

Manual data preparation and model training (e.g., see the [training models](training_models.ipynb) notebook) remains an option for when full control over the model development process is needed.

At a high level, the automatic training process takes advantage of several techniques to try to produce a good model, including:

- Early-stopping and checkpoint averaging (similar to [stochastic weight averaging](https://arxiv.org/abs/1803.05407)) to search for the best models found during training, according to the validation data
- Variable learning rates with cosine decay and multiple cycles
- Adaptive batch construction to focus on only high-loss examples when the model begins to converge, combined with gradient accumulation to ensure that batch sizes are still large enough for stable training
- Cyclical weight schedules for negative examples to help the model reduce false-positive rates
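
To make the cosine-decay schedule with multiple cycles mentioned above more concrete, here is a minimal sketch of how such a learning rate could be computed. It is not the schedule implemented in `train.py`; the function name and the `max_lr`, `min_lr`, and `steps_per_cycle` values are assumptions chosen only for illustration.

```python
import math


def cosine_cycle_lr(step, steps_per_cycle, max_lr=1e-3, min_lr=1e-5):
    """Learning rate for `step` under cosine decay that restarts every cycle.

    Hypothetical helper for illustration; see train.py for the real schedule.
    """
    # Position within the current cycle, in [0, 1)
    progress = (step % steps_per_cycle) / steps_per_cycle
    # Cosine factor falls from 1 at the start of a cycle towards 0 at its end
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (max_lr - min_lr) * cosine


# The rate starts at max_lr, decays within a cycle, then restarts at step 1000
for step in (0, 500, 999, 1000):
    print(step, round(cosine_cycle_lr(step, steps_per_cycle=1000), 6))
```

Restarting the schedule gives the optimizer several chances to leave a poor region of the loss surface before the rate becomes small again.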

See the contents of the `train.py` file for more details.
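
As a rough illustration of checkpoint averaging, the sketch below keeps the checkpoints whose validation score is at or above a chosen percentile and averages their parameters element-wise. The function names, the toy checkpoints (plain dicts of float lists), and the nearest-rank percentile rule are all assumptions made for this example; the actual selection and merging logic lives in `train.py`.

```python
def top_percentile_indices(scores, percentile=90):
    """Return indices of scores at or above the nearest-rank percentile."""
    ranked = sorted(scores)
    # Integer ceiling of percentile/100 * n, clamped to at least rank 1
    rank = max(1, -(-percentile * len(ranked) // 100))
    threshold = ranked[rank - 1]
    return [i for i, score in enumerate(scores) if score >= threshold]


def average_checkpoints(checkpoints):
    """Element-wise mean of checkpoints given as {name: [floats]} dicts."""
    averaged = {}
    for name in checkpoints[0]:
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        averaged[name] = [sum(column) / len(checkpoints) for column in columns]
    return averaged


# Three toy checkpoints with one parameter each, plus a validation score per checkpoint
checkpoints = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}, {"w": [5.0, 6.0]}]
val_scores = [0.60, 0.90, 0.80]

keep = top_percentile_indices(val_scores, percentile=50)
merged = average_checkpoints([checkpoints[i] for i in keep])
print(keep, merged)
```

Averaging several good checkpoints tends to smooth out step-to-step noise in the weights, which is the intuition shared with stochastic weight averaging.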

# Environment Setup

To begin, we'll need to install the requirements for training custom models: in particular, a relatively recent version of PyTorch and a custom fork of the [piper-sample-generator](https://github.com/dscripka/piper-sample-generator) library for generating synthetic examples for the custom model.

**Important Note!** Currently, automated model training is only supported on Linux systems due to the requirements of the text-to-speech library used for synthetic sample generation (Piper). It may be possible to use Piper on Windows/Mac systems, but that has not (yet) been tested.

In [12]:
## Environment setup

# install piper-sample-generator (currently only supports linux systems)
!rm -rf piper-sample-generator  # Clean up if exists
!git clone https://github.com/rhasspy/piper-sample-generator
!wget -O piper-sample-generator/models/en_US-libritts_r-medium.pt 'https://github.com/rhasspy/piper-sample-generator/releases/download/v2.0.0/en_US-libritts_r-medium.pt'
!pip install --user piper-phonemize
!pip install --user webrtcvad

# install openwakeword - use normal install, not editable
!rm -rf openwakeword  # Clean up if exists
!git clone https://github.com/celesrenata/jupyter-training openwakeword
# Skip the editable install - openwakeword is already in the image
# If you need to modify openwakeword code, edit it directly in ./openwakeword/

# install other dependencies with compatible versions
!pip install --user 'numpy<2.0'
!pip install --user 'pyarrow<15.0.0'
!pip install --user mutagen==1.47.0
!pip install --user torchinfo==1.8.0
!pip install --user torchmetrics==1.2.0
!pip install --user speechbrain==0.5.14
!pip install --user audiomentations==0.33.0
!pip install --user torch-audiomentations==0.11.0
!pip install --user acoustics==0.2.6
!pip install --user tensorflow-gpu==2.8.1
!pip install --user tensorflow_probability==0.16.0
!pip install --user onnx_tf==1.10.0
!pip install --user pronouncing==0.2.0
!pip install --user 'datasets<4.0' --force-reinstall
!pip install --user deep-phonemizer==0.0.19
!pip install --user ipywidgets
!pip install --user soundfile librosa audioread
!pip install --user 'torchaudio<2.1.0' --force-reinstall

# Download required models
import os
os.makedirs("./openwakeword/openwakeword/resources/models", exist_ok=True)
!wget -nc https://github.com/dscripka/openWakeWord/releases/download/v0.5.1/embedding_model.onnx -O ./openwakeword/openwakeword/resources/models/embedding_model.onnx
!wget -nc https://github.com/dscripka/openWakeWord/releases/download/v0.5.1/embedding_model.tflite -O ./openwakeword/openwakeword/resources/models/embedding_model.tflite
!wget -nc https://github.com/dscripka/openWakeWord/releases/download/v0.5.1/melspectrogram.onnx -O ./openwakeword/openwakeword/resources/models/melspectrogram.onnx
!wget -nc https://github.com/dscripka/openWakeWord/releases/download/v0.5.1/melspectrogram.tflite -O ./openwakeword/openwakeword/resources/models/melspectrogram.tflite

Cloning into 'piper-sample-generator'...
remote: Enumerating objects: 161, done.
remote: Counting objects: 100% (92/92), done.
remote: Compressing objects: 100% (42/42), done.
remote: Total 161 (delta 64), reused 62 (delta 50), pack-reused 69 (from 1)
Receiving objects: 100% (161/161), 1.04 MiB | 9.03 MiB/s, done.
Resolving deltas: 100% (74/74), done.
--2026-01-18 08:08:02--  https://github.com/rhasspy/piper-sample-generator/releases/download/v2.0.0/en_US-libritts_r-medium.pt
Resolving github.com (github.com)... 140.82.116.4
Connecting to github.com (github.com)|140.82.116.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://release-assets.githubusercontent.com/github-production-release-asset/642029941/73f4af3c-7cf8-4547-a7b9-3bd29e7f3c33?sp=r&sv=2018-11-09&sr=b&spr=https&se=2026-01-18T08%3A44%3A33Z&rscd=attachment%3B+filename%3Den_US-libritts_r-medium.pt&rsct=application%2Foctet-stream&skoid=96c2d410-5711-43a1-aedd-ab1947aa7ab0&sktid=398a6654-997

In [1]:
# Imports

import sys
sys.path.insert(0, '/home/jovyan/.local/lib/python3.10/site-packages')

# Patch datasets before importing
import datasets.features.audio as audio_module
audio_module.TORCHCODEC_AVAILABLE = False

import os
import numpy as np
import torch
from pathlib import Path
import uuid
import yaml
import datasets
import scipy
import scipy.io.wavfile
from tqdm import tqdm
import soundfile


# Download Data

When training new openWakeWord models using the automated procedure, five specific types of data are required:

1) Synthetic examples of the target word/phrase generated with text-to-speech models

2) Synthetic examples of adversarial words/phrases generated with text-to-speech models

3) Room impulse responses and noise/background audio data to augment the synthetic examples and make them more realistic

4) Generic "negative" audio data that is very unlikely to contain examples of the target word/phrase in the context where the model should detect it. This data can be the original audio data, or precomputed openWakeWord features ready for model training.

5) Validation data to use for early-stopping when training the model.

For the purposes of this notebook, all five of these sources will either be generated manually or obtained from HuggingFace, thanks to their excellent `datasets` library and extremely generous hosting policy. Also note that while only a portion of some datasets is downloaded here, for the best possible performance it is recommended to download the entire dataset and keep a local copy for future training runs.
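
Since the cells in this section save audio as 16 kHz, 16-bit PCM wav files, it can be useful to confirm that a directory really contains files in that format before training. The following is a minimal sketch using only the standard library; the function name and its return format are assumptions made for illustration and are not part of openWakeWord.

```python
import wave
from pathlib import Path


def find_nonconforming_wavs(directory, expected_rate=16000, expected_width_bytes=2):
    """Return (filename, rate, width) for wav files that are not 16 kHz / 16-bit."""
    problems = []
    for wav_path in sorted(Path(directory).glob("*.wav")):
        with wave.open(str(wav_path), "rb") as wav:
            rate, width = wav.getframerate(), wav.getsampwidth()
        if rate != expected_rate or width != expected_width_bytes:
            problems.append((wav_path.name, rate, width))
    return problems


# Example usage (hypothetical directory from the cells below):
#   print(find_nonconforming_wavs("./mit_rirs"))
```

An empty list means every wav file in the directory matched the expected sample rate and sample width.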

In [3]:
# Download room impulse responses collected by MIT
# https://mcdermottlab.mit.edu/Reverb/IR_Survey.html

output_dir = "./mit_rirs"
os.makedirs(output_dir, exist_ok=True)

rir_dataset = datasets.load_dataset("davidscripka/MIT_environmental_impulse_responses", split="train", streaming=True)

# Save clips to 16-bit PCM wav files
for row in tqdm(rir_dataset):
    audio_data = row['audio']
    name = audio_data['path'].split('/')[-1]
    scipy.io.wavfile.write(os.path.join(output_dir, name), audio_data['sampling_rate'], (audio_data['array']*32767).astype(np.int16))

Resolving data files:   0%|          | 0/270 [00:00<?, ?it/s]


A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.2.6 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

Traceback (most recent call last):  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/ipykernel_launcher.py", line 18, in <module>
    app.launch_new_instance()
  File "/usr/local/lib/python3.10/dist-packages/traitlets/config/application.py", line 1075, in launch_instance
    app.start()
  File "/usr/local/lib/python3.10/dist-packages/ipykernel/kernelapp.p

In [4]:
import os
import tarfile
import zipfile
from pathlib import Path

import numpy as np
import scipy.io.wavfile
from tqdm import tqdm
import datasets
from huggingface_hub import hf_hub_download
import requests


# -------------------------
# CONFIG
# -------------------------
# Do NOT hardcode tokens. Export it instead:
#   export HF_TOKEN="hf_...."
HF_TOKEN = os.environ.get("HF_TOKEN") or None  # None if the variable is unset or empty
AUDISET_REPO = "agkphysics/AudioSet"
AUDISET_FILENAME = "bal_train09.tar"
AUDISET_REVISION = "196c0900867eff791b8f4d4be57db277e9a5b131"

# FMA is NOT stored as a zip in the HF dataset repo path you tried.
# Download from the official host instead:
FMA_ZIP_URL = "https://os.unil.cloud.switch.ch/fma/fma_small.zip"

AUDISET_DIR = Path("./audioset")
AUDISET_16K_DIR = Path("./audioset_16k")

FMA_DL_DIR = Path("./fma_download")
FMA_OUT_DIR = Path("./fma")


# -------------------------
# HELPERS
# -------------------------
def ensure_dir(p: Path) -> None:
    p.mkdir(parents=True, exist_ok=True)


def extract_tar(tar_path: Path, dst_dir: Path) -> None:
    ensure_dir(dst_dir)
    with tarfile.open(tar_path, "r:*") as tf:
        tf.extractall(dst_dir)


def extract_zip(zip_path: Path, dst_dir: Path) -> None:
    ensure_dir(dst_dir)
    with zipfile.ZipFile(zip_path, "r") as zf:
        zf.extractall(dst_dir)


def download_file(url: str, out_path: Path, chunk_size: int = 1024 * 1024) -> Path:
    ensure_dir(out_path.parent)
    with requests.get(url, stream=True, timeout=None) as r:
        r.raise_for_status()
        total = int(r.headers.get("content-length", 0))
        with open(out_path, "wb") as f, tqdm(
            total=total, unit="B", unit_scale=True, desc=out_path.name
        ) as pbar:
            for chunk in r.iter_content(chunk_size=chunk_size):
                if chunk:
                    f.write(chunk)
                    pbar.update(len(chunk))
    return out_path


def convert_audio_files_to_16k_wav(
    input_paths,
    output_dir: Path,
    sampling_rate: int = 16000,
    limit_files: int | None = None,
) -> int:
    """
    Uses HuggingFace datasets Audio feature to decode + resample.
    Requires ffmpeg for mp3/flac decoding in most environments.
    """
    ensure_dir(output_dir)

    input_paths = list(map(str, input_paths))
    if not input_paths:
        return 0

    if limit_files is not None:
        input_paths = input_paths[:limit_files]

    ds = datasets.Dataset.from_dict({"audio": input_paths})
    ds = ds.cast_column("audio", datasets.Audio(sampling_rate=sampling_rate))

    written = 0
    for row in tqdm(ds, total=len(ds), desc=f"Converting to {sampling_rate} Hz wav"):
        src_path = Path(row["audio"]["path"])
        arr = np.asarray(row["audio"]["array"], dtype=np.float32)

        arr = np.clip(arr, -1.0, 1.0)
        pcm16 = (arr * 32767.0).astype(np.int16)

        out_path = output_dir / src_path.with_suffix(".wav").name
        scipy.io.wavfile.write(out_path, sampling_rate, pcm16)
        written += 1

    return written


# -------------------------
# AUDIOSET
# -------------------------
ensure_dir(AUDISET_DIR)
print("Downloading AudioSet tar file...")
audioset_tar_path = hf_hub_download(
    repo_id=AUDISET_REPO,
    repo_type="dataset",
    filename=AUDISET_FILENAME,
    token=HF_TOKEN,
    revision=AUDISET_REVISION,
    local_dir=str(AUDISET_DIR),
    resume_download=True,
)

audioset_tar_path = Path(audioset_tar_path)
print(f"Extracting {audioset_tar_path} ...")
extract_tar(audioset_tar_path, AUDISET_DIR)

audioset_flacs = list(AUDISET_DIR.glob("**/*.flac"))
if not audioset_flacs:
    raise FileNotFoundError(
        f"No .flac files found under {AUDISET_DIR}. "
        "Inspect the extracted folder structure to update the glob."
    )

print(f"Found {len(audioset_flacs)} AudioSet FLACs. Converting to 16k wav...")
n_written = convert_audio_files_to_16k_wav(audioset_flacs, AUDISET_16K_DIR, sampling_rate=16000)
print(f"AudioSet: wrote {n_written} wav files to {AUDISET_16K_DIR}")


# -------------------------
# FMA (official host)
# -------------------------
ensure_dir(FMA_DL_DIR)
ensure_dir(FMA_OUT_DIR)

fma_zip_path = FMA_DL_DIR / "fma_small.zip"
if not fma_zip_path.exists():
    print("Downloading FMA zip file from official host...")
    download_file(FMA_ZIP_URL, fma_zip_path)
else:
    print(f"FMA zip already exists: {fma_zip_path}")

print(f"Extracting {fma_zip_path} ...")
extract_zip(fma_zip_path, FMA_DL_DIR)

fma_mp3s = list(FMA_DL_DIR.glob("**/*.mp3"))
if not fma_mp3s:
    raise FileNotFoundError(
        f"No .mp3 files found under {FMA_DL_DIR}. "
        "Inspect the extracted folder structure to update the glob."
    )

# Optional: limit by hours (rough heuristic)
n_hours = 1
approx_track_seconds = 30
approx_files = int(n_hours * 3600 / approx_track_seconds)

print(f"Found {len(fma_mp3s)} FMA MP3s. Converting ~{n_hours} hour(s) (~{approx_files} files) to 16k wav...")
n_written = convert_audio_files_to_16k_wav(
    fma_mp3s,
    FMA_OUT_DIR,
    sampling_rate=16000,
    limit_files=approx_files,
)
print(f"FMA: wrote {n_written} wav files to {FMA_OUT_DIR}")




Downloading AudioSet tar file...
Extracting audioset/bal_train09.tar ...
Found 685 AudioSet FLACs. Converting to 16k wav...


Converting to 16000 Hz wav: 100%|██████████| 685/685 [00:21<00:00, 32.42it/s]


AudioSet: wrote 685 wav files to audioset_16k
FMA zip already exists: fma_download/fma_small.zip
Extracting fma_download/fma_small.zip ...
Found 8000 FMA MP3s. Converting ~1 hour(s) (~120 files) to 16k wav...


[src/libmpg123/layer3.c:INT123_do_layer3():1774] error: part2_3_length (3360) too large for available bit count (3240)
[src/libmpg123/layer3.c:INT123_do_layer3():1774] error: part2_3_length (3328) too large for available bit count (3240)
Converting to 16000 Hz wav: 100%|██████████| 120/120 [00:12<00:00,  9.82it/s]

FMA: wrote 120 wav files to fma





In [5]:
# Download pre-computed openWakeWord features for training and validation
from huggingface_hub import hf_hub_download

HF_TOKEN = None  # Set to your token string or leave as None

# training set (~2,000 hours from the ACAV100M Dataset)
# See https://huggingface.co/datasets/davidscripka/openwakeword_features for more information
hf_hub_download(
    repo_id="davidscripka/openwakeword_features",
    filename="openwakeword_features_ACAV100M_2000_hrs_16bit.npy",
    repo_type="dataset",
    token=HF_TOKEN,
    local_dir="."
)

# validation set for false positive rate estimation (~11 hours)
hf_hub_download(
    repo_id="davidscripka/openwakeword_features",
    filename="validation_set_features.npy",
    repo_type="dataset",
    token=HF_TOKEN,
    local_dir="."
)

'validation_set_features.npy'

# Define Training Configuration

For automated model training, openWakeWord uses a specially designed training script and a [YAML](https://yaml.org/) configuration file that defines all of the information required for training a new wake word/phrase detection model.

It is strongly recommended that you review [the example config file](../examples/custom_model.yml), as each value is fully documented there. For the purposes of this notebook, we'll read in the YAML file to modify certain configuration parameters before saving a new YAML file for training our example model. Specifically:

- We'll train a detection model for the word "nixberry"
- We'll only generate 5,000 positive and negative examples (to save on time for this example)
- We'll only generate 1,000 validation positive and negative examples for early stopping (again to save time)
- The model will only be trained for 10,000 steps (larger datasets will benefit from longer training)
- We'll relax the target false-positive rate and lower the maximum weight on negative examples to account for the small dataset size and limited training.

On the topic of target metrics, there are no specific guidelines about what these metrics should be in practice, and you will need to conduct testing in your target deployment environment to establish good thresholds. However, from very limited testing, the default values in the config file (accuracy >= 0.7, recall >= 0.5, false-positive rate <= 0.2 per hour) seem to produce models with reasonable performance.
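
To show how metrics like these could be estimated from model scores, here is a minimal sketch. The function names, the 0.5 decision threshold, and the toy score lists are assumptions for illustration; it also treats every score at or above the threshold as a separate detection, which is a simplification of how a deployed detector would count activations.

```python
def false_positives_per_hour(scores, hours_of_audio, threshold=0.5):
    """Count scores at or above `threshold` on negative-only audio, per hour."""
    n_false_positives = sum(1 for score in scores if score >= threshold)
    return n_false_positives / hours_of_audio


def recall(scores, threshold=0.5):
    """Fraction of positive-clip scores at or above `threshold`."""
    return sum(1 for score in scores if score >= threshold) / len(scores)


# Toy scores: negatives from roughly 11 hours of audio, positives from four clips
negative_scores = [0.10, 0.70, 0.20, 0.90]
positive_scores = [0.80, 0.40, 0.95, 0.60]

fp_rate = false_positives_per_hour(negative_scores, hours_of_audio=11.0)
clip_recall = recall(positive_scores)
print(f"false positives/hour: {fp_rate:.3f}, recall: {clip_recall:.2f}")
print("meets default targets:", fp_rate <= 0.2 and clip_recall >= 0.5)
```

Raising the threshold generally lowers the false-positive rate at the cost of recall, so both numbers should be checked together.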


In [2]:
# Load default YAML config file for training
with open("openwakeword/examples/custom_model.yml", "r") as f:
    config = yaml.load(f, Loader=yaml.Loader)
config

{'model_name': 'my_model',
 'target_phrase': ['hey jarvis'],
 'custom_negative_phrases': [],
 'n_samples': 10000,
 'n_samples_val': 2000,
 'tts_batch_size': 50,
 'augmentation_batch_size': 16,
 'piper_sample_generator_path': './piper-sample-generator',
 'output_dir': './my_custom_model',
 'rir_paths': ['./mit_rirs'],
 'background_paths': ['./background_clips'],
 'background_paths_duplication_rate': [1],
 'false_positive_validation_data_path': './validation_set_features.npy',
 'augmentation_rounds': 1,
 'feature_data_files': {'ACAV100M_sample': './openwakeword_features_ACAV100M_2000_hrs_16bit.npy'},
 'batch_n_per_class': {'ACAV100M_sample': 1024,
  'adversarial_negative': 50,
  'positive': 50},
 'model_type': 'dnn',
 'layer_size': 32,
 'steps': 50000,
 'max_negative_weight': 1500,
 'target_false_positives_per_hour': 0.2}

In [3]:
config["target_phrase"] = ["nixberry"]
config["model_name"] = config["target_phrase"][0].replace(" ", "_")
config["n_samples"] = 5000  # Fewer samples than the default, to save time in this example
config["n_samples_val"] = 1000
config["steps"] = 10000
config["target_false_positives_per_hour"] = 1.0  # Allow more false positives for better recall
config["max_negative_weight"] = 500  # Reduce from 1500 to be less aggressive
config["piper_model_path"] = "./piper-sample-generator/models/en_US-libritts_r-medium.pt"
config["background_paths"] = ['./audioset_16k', './fma']
config["false_positive_validation_data_path"] = "validation_set_features.npy"
config["feature_data_files"] = {"ACAV100M_sample": "openwakeword_features_ACAV100M_2000_hrs_16bit.npy"}

with open('my_model.yaml', 'w') as file:
    yaml.dump(config, file)

# Train the Model

With the data downloaded and training configuration set, we can now start training the model. We'll do this in parts to better illustrate the sequence, but you can also execute every step at once for a fully automated process.
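
As one possible way to run the stages back to back, the sketch below builds one command per stage from the `--generate_clips`, `--augment_clips`, and `--train_model` flags used in the cells that follow, and runs them in order. The helper names and the `dry_run` switch are assumptions for illustration; the cells below also set `PYTHONPATH`, which this sketch leaves to the caller's environment.

```python
import subprocess
import sys

STAGE_FLAGS = ("--generate_clips", "--augment_clips", "--train_model")


def build_stage_commands(config_path, script_path="openwakeword/openwakeword/train.py"):
    """Return one command (argv list) per training stage, in execution order."""
    return [
        [sys.executable, script_path, "--training_config", config_path, flag]
        for flag in STAGE_FLAGS
    ]


def run_stages(config_path, dry_run=True):
    """Print each stage command and, unless dry_run is set, execute it."""
    for command in build_stage_commands(config_path):
        print(" ".join(command))
        if not dry_run:
            # check=True stops the sequence at the first failing stage
            subprocess.run(command, check=True)


run_stages("my_model.yaml", dry_run=True)  # prints the commands without running them
```

Running the stages separately, as done below, makes it easier to inspect the generated clips before committing to a full training run.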

In [37]:
# Step 1: Generate synthetic clips
import sys
import os

!pip install 'numpy<2' piper-tts

# Create resources directory in local clone
os.makedirs('./openwakeword/openwakeword/resources', exist_ok=True)

# Patch the local data.py to use exist_ok=True
import_path = './openwakeword/openwakeword/data.py'
with open(import_path, 'r') as f:
    content = f.read()

content = content.replace(
    'os.mkdir(os.path.join(os.path.dirname(os.path.abspath(__file__)), "resources"))',
    'os.makedirs(os.path.join(os.path.dirname(os.path.abspath(__file__)), "resources"), exist_ok=True)'
)

with open(import_path, 'w') as f:
    f.write(content)

# Use local openwakeword
sys.path.insert(0, '/home/jovyan/openwakeword')

# Run with proper PYTHONPATH
!PYTHONPATH=/home/jovyan/openwakeword:$PYTHONPATH python3 openwakeword/openwakeword/train.py --training_config my_model.yaml --generate_clips

Defaulting to user installation because normal site-packages is not writeable
  from pkg_resources import resource_stream
torchvision is not available - cannot save figures
INFO:root:##################################################
Generating positive clips for training
##################################################
DEBUG:generate_samples:Loading ./piper-sample-generator/models/en_US-libritts_r-medium.pt
INFO:generate_samples:Successfully loaded the model
DEBUG:generate_samples:CUDA available, using GPU
DEBUG:generate_samples:Batch 1/100 complete
DEBUG:generate_samples:Batch 2/100 complete
DEBUG:generate_samples:Batch 3/100 complete
DEBUG:generate_samples:Batch 4/100 complete
DEBUG:generate_samples:Batch 5/100 complete
DEBUG:generate_samples:Batch 6/100 complete
DEBUG:generate_samples:Batch 7/100 complete
DEBUG:generate_samples:Batch 8/100 complete
DEBUG:generate_samples:Batch 9/100 complete
DEBUG:generate_samples:Batch 10/100 complete
DEBUG:generate_samples:Batch 11/100 complete

In [None]:
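# Step 2: Fix sample-rate mismatches, then augment the generated clips
import numpy as np
import scipy.io.wavfile
import os
from pathlib import Path
from scipy import signal

def fix_sample_rate(directory, target_sr=16000):
    """Resample, in place, every wav file in `directory` that is not at `target_sr`."""
    for wav_file in Path(directory).glob("*.wav"):
        sr, data = scipy.io.wavfile.read(wav_file)
        if sr != target_sr:
            print(f"Resampling {wav_file.name} from {sr} to {target_sr}")
            # Simple FFT-based resampling using scipy (returns floating-point data)
            num_samples = int(len(data) * target_sr / sr)
            resampled = signal.resample(data, num_samples)
            # Clip to the integer range before casting back, to avoid wrap-around
            if np.issubdtype(data.dtype, np.integer):
                info = np.iinfo(data.dtype)
                resampled = np.clip(np.round(resampled), info.min, info.max)
            scipy.io.wavfile.write(wav_file, target_sr, resampled.astype(data.dtype))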

# Fix all generated clips
fix_sample_rate('./my_custom_model/nixberry/positive_train')
fix_sample_rate('./my_custom_model/nixberry/positive_test')
fix_sample_rate('./my_custom_model/nixberry/negative_train')
fix_sample_rate('./my_custom_model/nixberry/negative_test')

# Now run augmentation
import sys
sys.path.insert(0, '/home/jovyan/openwakeword')
!PYTHONPATH=/home/jovyan/openwakeword:$PYTHONPATH python3 openwakeword/openwakeword/train.py --training_config my_model.yaml --augment_clips

In [39]:
# Step 3: Train model
import sys
sys.path.insert(0, '/home/jovyan/openwakeword')
!PYTHONPATH=/home/jovyan/openwakeword:$PYTHONPATH python3 openwakeword/openwakeword/train.py --training_config my_model.yaml --train_model

  from pkg_resources import resource_stream
torchvision is not available - cannot save figures
INFO:root:##################################################
Starting training sequence 1...
##################################################
Training: 100%|███████████████████████████▉| 9999/10000 [03:55<00:00, 42.53it/s]
INFO:root:##################################################
Starting training sequence 2...
##################################################
INFO:root:Increasing weight on negative examples to reduce false positives...
Training: 100%|███████████████████████████▉| 999/1000.0 [02:38<00:00,  6.30it/s]
INFO:root:##################################################
Starting training sequence 3...
##################################################
INFO:root:Increasing weight on negative examples to reduce false positives...
Training: 100%|███████████████████████████▉| 999/1000.0 [02:41<00:00,  6.20it/s]
INFO:root:Merging checkpoints above the 90th percentile into single model.

In [6]:
# Step 4: Fix dependency conflicts and convert to tflite
# Run this cell, then RESTART KERNEL

import sys

# Install compatible versions
get_ipython().system(f'{sys.executable} -m pip install protobuf==3.20.3')
get_ipython().system(f'{sys.executable} -m pip install onnx==1.12.0')
get_ipython().system(f'{sys.executable} -m pip install onnx-tf==1.10.0')
get_ipython().system(f'{sys.executable} -m pip install tensorflow==2.11.0')

print("IMPORTANT: Restart the kernel now (Kernel -> Restart Kernel), then run the next cell")

Defaulting to user installation because normal site-packages is not writeable
Defaulting to user installation because normal site-packages is not writeable
Collecting onnx==1.12.0
  Downloading onnx-1.12.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 13.1/13.1 MB 34.6 MB/s eta 0:00:00
Collecting protobuf<=3.20.1,>=3.12.2
  Downloading protobuf-3.20.1-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 22.5 MB/s eta 0:00:00
Installing collected packages: protobuf, onnx
  Attempting uninstall: protobuf
    Found existing installation: protobuf 3.20.3
    Uninstalling protobuf-3.20.3:
      Successfully uninstalled protobuf-3.20.3
  Attempting uninstall: onnx
    Found existing installation: onnx 1.20.1
    Uninstalling onnx-1.20.1:
      Suc

In [4]:
# Step 5 (Optional): On Google Colab, sometimes the .tflite model isn't saved correctly
# If so, run this cell to retry

# Fix protobuf compatibility issue first
import sys
get_ipython().system(f'{sys.executable} -m pip install protobuf==3.20.3')

# Manually save to tflite as this doesn't work right in colab
def convert_onnx_to_tflite(onnx_model_path, output_path):
    """Converts an ONNX version of an openwakeword model to the Tensorflow tflite format."""
    # imports
    import os
    import onnx
    import logging
    import tempfile
    from onnx_tf.backend import prepare
    import tensorflow as tf

    # Convert to tflite from onnx model
    onnx_model = onnx.load(onnx_model_path)
    tf_rep = prepare(onnx_model, device="GPU")
    with tempfile.TemporaryDirectory() as tmp_dir:
        tf_rep.export_graph(os.path.join(tmp_dir, "tf_model"))
        converter = tf.lite.TFLiteConverter.from_saved_model(os.path.join(tmp_dir, "tf_model"))
        tflite_model = converter.convert()

        logging.info(f"####\nSaving tflite model to '{output_path}'")
        with open(output_path, 'wb') as f:
            f.write(tflite_model)

    return None

convert_onnx_to_tflite(f"my_custom_model/{config['model_name']}.onnx", f"my_custom_model/{config['model_name']}.tflite")


Defaulting to user installation because normal site-packages is not writeable
Collecting protobuf==3.20.3
  Using cached protobuf-3.20.3-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.1 MB)
Installing collected packages: protobuf
  Attempting uninstall: protobuf
    Found existing installation: protobuf 3.19.6
    Uninstalling protobuf-3.19.6:
      Successfully uninstalled protobuf-3.19.6
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow 2.11.0 requires protobuf<3.20,>=3.9.2, but you have protobuf 3.20.3 which is incompatible.
tensorflow-gpu 2.8.1 requires keras<2.9,>=2.8.0rc0, but you have keras 2.11.0 which is incompatible.
tensorflow-gpu 2.8.1 requires tensorboard<2.9,>=2.8, but you have tensorboard 2.11.2 which is incompatible.
tensorflow-gpu 2.8.1 requires tensorflow-estimator<2.9,>=2.8, but you have tensorflow-estimator 2.11.

2026-01-18 10:45:39.207367: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2026-01-18 10:45:39.370277: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2026-01-18 10:45:40.170021: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64
2026-01-18 10:45:40.170158: W tensorflow/compiler/xla/stream_executor/platfo

Instructions for updating:
Lambda fuctions will be no more assumed to be used in the statement where they are used, or at least in the same block. https://github.com/tensorflow/tensorflow/issues/56089




INFO:tensorflow:Assets written to: /tmp/tmphqovpqxf/tf_model/assets


INFO:tensorflow:Assets written to: /tmp/tmphqovpqxf/tf_model/assets
2026-01-18 10:45:48.309325: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:362] Ignored output_format.
2026-01-18 10:45:48.309360: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:365] Ignored drop_control_dependency.
2026-01-18 10:45:48.310171: I tensorflow/cc/saved_model/reader.cc:45] Reading SavedModel from: /tmp/tmphqovpqxf/tf_model
2026-01-18 10:45:48.311044: I tensorflow/cc/saved_model/reader.cc:89] Reading meta graph with tags { serve }
2026-01-18 10:45:48.311079: I tensorflow/cc/saved_model/reader.cc:130] Reading SavedModel debug info (if present) from: /tmp/tmphqovpqxf/tf_model
2026-01-18 10:45:48.314108: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:357] MLIR V1 optimization pass is not enabled
2026-01-18 10:45:48.314649: I tensorflow/cc/saved_model/loader.cc:229] Restoring SavedModel bundle.
2026-01-18 10:45:48.335327: I tensorflow/cc/saved_model/loader.cc

After the model finishes training, the auto training script will automatically convert it to ONNX and tflite versions, saving them as `my_custom_model/<model_name>.onnx/tflite` in the present working directory, where `<model_name>` is defined in the YAML training config file. Either version can be used as normal with `openwakeword`. I recommend testing them with the [`detect_from_microphone.py`](https://github.com/dscripka/openWakeWord/blob/main/examples/detect_from_microphone.py) example script to see how the model performs!
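
For testing on a recorded clip rather than a live microphone, audio can be fed to the model one frame at a time. The sketch below only shows the framing step in plain Python; the 1280-sample frame size (80 ms at 16 kHz) and the commented-out `openwakeword` calls are assumptions based on the example script, and the model path is hypothetical.

```python
FRAME_SAMPLES = 1280  # 80 ms of audio at 16 kHz (assumed frame size)


def iter_frames(samples, frame_samples=FRAME_SAMPLES):
    """Yield consecutive full-length frames; a trailing partial frame is dropped."""
    for start in range(0, len(samples) - frame_samples + 1, frame_samples):
        yield samples[start:start + frame_samples]


# Toy input: 3000 samples of silence gives two full frames
frames = list(iter_frames([0] * 3000))
print(len(frames), len(frames[0]))

# With openwakeword installed, each frame (as an int16 NumPy array) could then be
# scored along these lines (assumed usage, not executed here):
#   from openwakeword.model import Model
#   model = Model(wakeword_models=["my_custom_model/nixberry.tflite"])
#   scores = model.predict(frame)
```

Dropping the trailing partial frame keeps every frame the same length, at the cost of ignoring up to 80 ms at the end of the clip.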