## Training your own openWakeWord models


**Quick start:** to train a basic custom openWakeWord model with the default settings:

Follow the instructions for Step 1 below. Each time you change the wake word, click the play icon to the left of the cell title to generate a sample and check that it sounds correct. The first run takes a few minutes (it installs Piper and downloads the voice model), but later runs are quick.

Once you're satisfied with the pronunciation, open the "Runtime" menu in the upper left of the page and select "Run all". Keep the tab open, but feel free to do something else. After ~1 hour, your custom model will be ready and will be downloaded to your computer automatically.

If you are a Home Assistant user with the openWakeWord add-on, follow the instructions [here](https://github.com/home-assistant/addons/blob/master/openwakeword/DOCS.md#custom-wake-word-models) to install and enable your custom model.

---

If you are interested in learning more about the custom model training process (and increasing the accuracy of your custom models), read through each step in this notebook and try experimenting with different training parameters. If you have any questions or problems, feel free to start a discussion at the openWakeWord [repo](https://github.com/dscripka/openWakeWord/discussions).

In [3]:
# @title  { display-mode: "form" }
# @markdown # 1. Test Example Training Clip Generation
# @markdown Since openWakeWord models are trained on synthetic examples of your
# @markdown target wake word, it's a good idea to make sure that the examples
# @markdown sound correct. Type in your target wake word below, and run the
# @markdown cell to listen to it.
# @markdown
# @markdown Here are some tips that can help get the wake word to sound right:

# @markdown - If your wake word isn't being pronounced the way
# @markdown you want, try spelling out the sounds phonetically with underscores
# @markdown separating each part.
# @markdown For example: "hey siri" --> "hey_seer_e".

# @markdown - Spell out numbers ("2" --> "two")

# @markdown - Avoid all punctuation except "?" and "!", and remove special Unicode symbols such as emoji (Cyrillic letters are fine for the Russian voice)

import os
import sys
import subprocess
from IPython.display import Audio

# --- Install the required dependencies ---
!pip install piper-tts==1.3.0 webrtcvad 'torch<=2.5'

# --- Download the Russian voice model (Irina) ---
if not os.path.exists("./ru_RU-irina-medium.onnx"):
    !wget https://huggingface.co/rhasspy/piper-voices/resolve/main/ru/ru_RU/irina/medium/ru_RU-irina-medium.onnx
    !wget https://huggingface.co/rhasspy/piper-voices/resolve/main/ru/ru_RU/irina/medium/ru_RU-irina-medium.onnx.json

target_word = 'Люся' # @param {type:"string"}

RUSSIAN_MODEL_PATH = './ru_RU-irina-medium.onnx'
OUTPUT_WAV_PATH = "test_generation.wav"

# --- Speech generation via the Piper command-line tool ---
def text_to_speech(text):
    """
    Converts text to speech by calling the Piper command-line tool directly.
    """
    command = [
        'piper',
        '--model', RUSSIAN_MODEL_PATH,
        '--output_file', OUTPUT_WAV_PATH,
    ]
    # Run Piper, passing the text on standard input (stdin)
    process = subprocess.run(
        command,
        input=text.encode('utf-8'),
        capture_output=True,
        check=True
    )
    if process.stderr:
        print("Piper stderr:", process.stderr.decode('utf-8'))

# Generate the test audio file
text_to_speech(target_word)

# Play back the result
Audio(OUTPUT_WAV_PATH, autoplay=True)
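Piper's medium-quality voices generally synthesize at 22,050 Hz (the voice's `.onnx.json` file lists its rate), whereas every other audio source in this notebook is converted to 16 kHz. A quick way to see which rate the test clip actually has is to read its WAV header with Python's standard `wave` module. This is a minimal sketch: the helper name `check_sample_rate` is hypothetical, and the 16 kHz target is taken from the conversions done in Step 2.

```python
import wave

TARGET_SR = 16000  # rate used for all other audio in this notebook


def check_sample_rate(wav_path: str, target_sr: int = TARGET_SR) -> tuple[int, bool]:
    """Return (sample_rate, matches_target) for a WAV file.

    Hypothetical helper: it only reads the WAV header, so it is cheap to run
    on every generated clip.
    """
    with wave.open(wav_path, "rb") as wf:
        sample_rate = wf.getframerate()
    return sample_rate, sample_rate == target_sr
```

For example, `check_sample_rate("test_generation.wav")` after running the cell above returns the clip's rate and whether it already matches 16 kHz.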



In [4]:
# @title  { display-mode: "form" }
# @markdown # 2. Download Data
# @markdown Training custom models requires downloading a wide variety of data
# @markdown that will help make the model perform well in real-world scenarios.
# @markdown This example notebook will download small samples of background noise,
# @markdown music, and Room Impulse Responses (to add echo). This will still produce
# @markdown a custom model that performs well, but if you are interested in adding even more,
# @markdown feel free to extend this notebook to download the full datasets and even add
# @markdown your own!
# @markdown
# @markdown Downloading this example data will usually take about 15 minutes.

# @markdown **Important note!** The data downloaded here has a mixture of different
# @markdown licenses and usage restrictions. As such, any custom models trained with this
# @markdown data should be considered appropriate for **non-commercial** personal use only.

# ## Install all dependencies
# !pip install datasets
# !pip install scipy
# !pip install tqdm

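# Colab workaround: report UTF-8 as the preferred encoding so that "!" shell
# commands don't fail with a "UTF-8 locale is required" error
import locale
def getpreferredencoding(do_setlocale=True):
    return "UTF-8"
locale.getpreferredencoding = getpreferredencoding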

# install openwakeword (full installation to support training)
!git clone https://github.com/dscripka/openwakeword
!pip install -e ./openwakeword --no-deps
# !cd openwakeword

# install other dependencies
!pip install mutagen==1.47.0
!pip install torchinfo==1.8.0
!pip install torchmetrics==1.2.0
!pip install speechbrain==0.5.14
!pip install audiomentations==0.33.0
!pip install torch-audiomentations==0.11.0
!pip install acoustics==0.2.6
# !pip uninstall tensorflow -y
# !pip install tensorflow-cpu==2.8.1
# !pip install protobuf==3.20.3
# !pip install tensorflow_probability==0.16.0
# !pip install onnx_tf==1.10.0
!pip install onnxruntime==1.22.1 ai_edge_litert==1.4.0 onnxsim
!pip install onnx2tf
!pip install onnx
# !pip install ai_edge_litert==1.2.0
!pip install onnx_graphsurgeon
!pip install sng4onnx
!pip install pronouncing==0.2.0
!pip install datasets==2.14.6
!pip install deep-phonemizer==0.0.19

# Download required models (workaround for Colab)
import os
os.makedirs("./openwakeword/openwakeword/resources/models", exist_ok=True)
!wget https://github.com/dscripka/openWakeWord/releases/download/v0.5.1/embedding_model.onnx -O ./openwakeword/openwakeword/resources/models/embedding_model.onnx
!wget https://github.com/dscripka/openWakeWord/releases/download/v0.5.1/embedding_model.tflite -O ./openwakeword/openwakeword/resources/models/embedding_model.tflite
!wget https://github.com/dscripka/openWakeWord/releases/download/v0.5.1/melspectrogram.onnx -O ./openwakeword/openwakeword/resources/models/melspectrogram.onnx
!wget https://github.com/dscripka/openWakeWord/releases/download/v0.5.1/melspectrogram.tflite -O ./openwakeword/openwakeword/resources/models/melspectrogram.tflite

# Imports
import numpy as np
import torch
from pathlib import Path
import uuid
import yaml
import datasets
import scipy.io.wavfile
from tqdm import tqdm

## Download all data

## Download MIT RIR data (takes ~2 minutes)
output_dir = "./mit_rirs"
if not os.path.exists(output_dir):
    os.mkdir(output_dir)
    !git lfs install
    !git clone https://huggingface.co/datasets/davidscripka/MIT_environmental_impulse_responses
    rir_dataset = datasets.Dataset.from_dict({"audio": [str(i) for i in Path("./MIT_environmental_impulse_responses/16khz").glob("*.wav")]}).cast_column("audio", datasets.Audio())
    # Save clips to 16-bit PCM wav files
    for row in tqdm(rir_dataset):
        name = row['audio']['path'].split('/')[-1]
        scipy.io.wavfile.write(os.path.join(output_dir, name), 16000, (row['audio']['array']*32767).astype(np.int16))

## Download noise and background audio (takes ~3 minutes)

# Audioset Dataset (https://research.google.com/audioset/dataset/index.html)
# Download one part of the AudioSet .tar files, extract it, and convert to 16 kHz
# For full-scale training, it's recommended to download the entire dataset from
# https://huggingface.co/datasets/agkphysics/AudioSet, and
# even potentially combine it with other background noise datasets (e.g., FSD50k, Freesound, etc.)

if not os.path.exists("audioset"):
    os.mkdir("audioset")

    fname = "bal_train09.tar"
    out_dir = f"audioset/{fname}"
    link = "https://huggingface.co/datasets/agkphysics/AudioSet/resolve/main/data/" + fname
    !wget -O {out_dir} {link}
    !cd audioset && tar -xvf bal_train09.tar

    output_dir = "./audioset_16k"
    if not os.path.exists(output_dir):
        os.mkdir(output_dir)

    # Save clips to 16-bit PCM wav files
    audioset_dataset = datasets.Dataset.from_dict({"audio": [str(i) for i in Path("audioset/audio").glob("**/*.flac")]})
    audioset_dataset = audioset_dataset.cast_column("audio", datasets.Audio(sampling_rate=16000))
    for row in tqdm(audioset_dataset):
        name = row['audio']['path'].split('/')[-1].replace(".flac", ".wav")
        scipy.io.wavfile.write(os.path.join(output_dir, name), 16000, (row['audio']['array']*32767).astype(np.int16))

# Free Music Archive dataset
# https://github.com/mdeff/fma

output_dir = "./fma"
if not os.path.exists(output_dir):
    os.mkdir(output_dir)
    fma_dataset = datasets.load_dataset("rudraml/fma", name="small", split="train", streaming=True)
    fma_dataset = iter(fma_dataset.cast_column("audio", datasets.Audio(sampling_rate=16000)))

    # Save clips to 16-bit PCM wav files
    n_hours = 1  # use only 1 hour of clips for this example notebook; increase it for full-scale training
    for i in tqdm(range(n_hours*3600//30)):  # works because every FMA clip is 30 seconds long
        row = next(fma_dataset)
        name = row['audio']['path'].split('/')[-1].replace(".mp3", ".wav")
        scipy.io.wavfile.write(os.path.join(output_dir, name), 16000, (row['audio']['array']*32767).astype(np.int16))

# Download pre-computed openWakeWord features for training and validation

# training set (~2,000 hours from the ACAV100M Dataset)
# See https://huggingface.co/datasets/davidscripka/openwakeword_features for more information
if not os.path.exists("./openwakeword_features_ACAV100M_2000_hrs_16bit.npy"):
    !wget https://huggingface.co/datasets/davidscripka/openwakeword_features/resolve/main/openwakeword_features_ACAV100M_2000_hrs_16bit.npy

# validation set for false positive rate estimation (~11 hours)
if not os.path.exists("validation_set_features.npy"):
    !wget https://huggingface.co/datasets/davidscripka/openwakeword_features/resolve/main/validation_set_features.npy


Cloning into 'openwakeword'...
remote: Enumerating objects: 1189, done.
remote: Counting objects: 100% (514/514), done.
remote: Compressing objects: 100% (127/127), done.
remote: Total 1189 (delta 411), reused 387 (delta 387), pack-reused 675 (from 2)
Receiving objects: 100% (1189/1189), 3.21 MiB | 6.38 MiB/s, done.
Resolving deltas: 100% (734/734), done.
Obtaining file:///content/openwakeword
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
Building wheels for collected packages: openwakeword
  Building editable for openwakeword (pyproject.toml) ... done
  Created wheel for openwakeword: filename=openwakeword-0.6.0-0.editable-py3-none-any.whl size=17450 sha256=7a37216244c96d20da8c40fcb0f0e3a9ff1e6c7e380f1643a93110c6fb0978d1
  Stored in directory: /tmp/pip-ep

Collecting deep-phonemizer==0.0.19
  Downloading deep-phonemizer-0.0.19.tar.gz (29 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: deep-phonemizer
  Building wheel for deep-phonemizer (setup.py) ... done
  Created wheel for deep-phonemizer: filename=deep_phonemizer-0.0.19-py3-none-any.whl size=33272 sha256=bb6bd09164d0c5b3d166302ed5f2db8cc671dc0d6a82aeddc83756920cc92720
  Stored in directory: /root/.cache/pip/wheels/b9/d7/45/f2ae07184a29327b2a7f93b1f734a936c3a34e57225fca603b
Successfully built deep-phonemizer
Installing collected packages: deep-phonemizer
Successfully installed deep-phonemizer-0.0.19
--2025-09-20 07:46:52--  https://github.com/dscripka/openWakeWord/releases/download/v0.5.1/embedding_model.onnx
Resolving github.com (github.com)... 140.82.114.4
Connecting to github.com (github.com)|140.82.114.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://release-assets.githubuserconten

100%|██████████| 270/270 [00:14<00:00, 18.89it/s]


--2025-09-20 07:47:12--  https://huggingface.co/datasets/agkphysics/AudioSet/resolve/main/data/bal_train09.tar
Resolving huggingface.co (huggingface.co)... 3.167.192.123, 3.167.192.19, 3.167.192.4, ...
Connecting to huggingface.co (huggingface.co)|3.167.192.123|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://cas-bridge.xethub.hf.co/xet-bridge-us/64897793837ad032c6c25d5b/2da2b65f06f00bed3429be9aa923ef69e211152b1fb32ba30d24630ef3095c32?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=cas%2F20250920%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20250920T074712Z&X-Amz-Expires=3600&X-Amz-Signature=cf97b8e1cd02376394be960e5c1dfb2593545abc16550c6499b4f09d38804a5f&X-Amz-SignedHeaders=host&X-Xet-Cas-Uid=public&response-content-disposition=inline%3B+filename*%3DUTF-8%27%27bal_train09.tar%3B+filename%3D%22bal_train09.tar%22%3B&response-content-type=application%2Fx-tar&x-id=GetObject&Expires=1758358032&Policy=eyJTdGF0ZW1lbn

100%|██████████| 685/685 [00:20<00:00, 33.54it/s]
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Downloading builder script: 0.00B [00:00, ?B/s]

Downloading readme:   0%|          | 0.00/25.0 [00:00<?, ?B/s]

 99%|█████████▉| 119/120 [00:37<00:00,  3.16it/s]


--2025-09-20 07:48:52--  https://huggingface.co/datasets/davidscripka/openwakeword_features/resolve/main/openwakeword_features_ACAV100M_2000_hrs_16bit.npy
Resolving huggingface.co (huggingface.co)... 3.167.192.6, 3.167.192.4, 3.167.192.19, ...
Connecting to huggingface.co (huggingface.co)|3.167.192.6|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://cas-bridge.xethub.hf.co/xet-bridge-us/64f3a0b6918ffcc15af6923c/7e1cade4c3fda6a5081158383c8d43c4a3e1e42555150b596b373efddf9b5194?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=cas%2F20250920%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20250920T074853Z&X-Amz-Expires=3600&X-Amz-Signature=b3471c13dead2083ff0038157c5b12232190263f264fe13731114cdfb95bd211&X-Amz-SignedHeaders=host&X-Xet-Cas-Uid=public&response-content-disposition=inline%3B+filename*%3DUTF-8%27%27openwakeword_features_ACAV100M_2000_hrs_16bit.npy%3B+filename%3D%22openwakeword_features_ACAV100M_2000_hrs_16bit
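Each download step above saves audio as 16-bit PCM by computing `(array*32767).astype(np.int16)`. That scaling assumes samples lie in [-1.0, 1.0]; louder samples overflow the int16 range when cast, with platform-dependent results, instead of saturating. The stdlib-only sketch below performs the same conversion with explicit clipping and writes a mono 16 kHz WAV with the `wave` module. The function names are hypothetical; it is an illustration of the conversion, not code taken from openWakeWord.

```python
import array
import sys
import wave


def float_to_pcm16(samples):
    """Convert float samples (nominally in [-1.0, 1.0]) to signed 16-bit ints.

    Same scaling as the notebook's `(x * 32767).astype(np.int16)`, which truncates
    toward zero, but out-of-range samples are clipped first so they saturate
    at +/-32767 instead of overflowing.
    """
    pcm = array.array("h")  # "h" = signed short (16-bit)
    for s in samples:
        s = max(-1.0, min(1.0, s))
        pcm.append(int(s * 32767))  # int() truncates toward zero, like astype
    return pcm


def write_pcm16_wav(path, samples, sample_rate=16000):
    """Write mono 16-bit PCM audio to `path` (hypothetical helper)."""
    pcm = float_to_pcm16(samples)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())
```

In the notebook's NumPy code, the equivalent guard would be `np.clip(x, -1.0, 1.0)` before multiplying by 32767.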

In [None]:
# @title  { display-mode: "form" }
# @markdown # 3. Train the Model

import yaml
import os
import glob
import subprocess
from tqdm.auto import tqdm

# --- Training parameters ---
number_of_examples = 1000 # @param {type:"slider", min:100, max:50000, step:50}
number_of_training_steps = 10000  # @param {type:"slider", min:0, max:50000, step:100}
false_activation_penalty = 1500  # @param {type:"slider", min:100, max:5000, step:50}

# --- STEP 1: Generate audio clips and file lists ---

print("--- Step 1: Generating audio clips and file lists ---")

with open("openwakeword/examples/custom_model.yml", "r") as f:
    config = yaml.load(f, yaml.Loader)
config["target_phrase"] = [target_word]
config["model_name"] = config["target_phrase"][0].replace(" ", "_")
config["output_dir"] = "./my_custom_model"
n_val_samples = max(500, number_of_examples // 10)
RUSSIAN_MODEL_PATH = "./ru_RU-irina-medium.onnx"

# Create the audio output folders
train_clips_dir = os.path.join(config["output_dir"], "positive_clips", config["model_name"])
val_clips_dir = os.path.join(config["output_dir"], "validation_clips", config["model_name"])
os.makedirs(train_clips_dir, exist_ok=True)
os.makedirs(val_clips_dir, exist_ok=True)

def generate_clip(text, output_path):
    command = ['piper', '--model', RUSSIAN_MODEL_PATH, '--output_file', output_path]
    subprocess.run(command, input=text.encode('utf-8'), check=True, capture_output=True)

# Generate training clips
print(f"Generating {number_of_examples} training clips for '{target_word}'...")
for i in tqdm(range(number_of_examples)):
    output_path = os.path.join(train_clips_dir, f"clip_{i}.wav")
    generate_clip(target_word, output_path)

# Generate validation clips
print(f"Generating {n_val_samples} validation clips for '{target_word}'...")
for i in tqdm(range(n_val_samples)):
    output_path = os.path.join(val_clips_dir, f"val_clip_{i}.wav")
    generate_clip(target_word, output_path)

# --- Write .txt file lists of the generated clips (one path per line) ---
train_txt_path = os.path.join(config["output_dir"], "positive_clips", f"{config['model_name']}.txt")
with open(train_txt_path, 'w', encoding='utf-8') as f:
    for path in glob.glob(os.path.join(train_clips_dir, "*.wav")):
        f.write(path + '\n')

val_txt_path = os.path.join(config["output_dir"], "validation_clips", f"{config['model_name']}.txt")
with open(val_txt_path, 'w', encoding='utf-8') as f:
    for path in glob.glob(os.path.join(val_clips_dir, "*.wav")):
        f.write(path + '\n')

print("--- Clip generation and file lists complete! ---")


# --- Training configuration ---

config["n_samples"] = number_of_examples
config["n_samples_val"] = n_val_samples
config["steps"] = number_of_training_steps
config["target_accuracy"] = 0.5
config["target_recall"] = 0.25
config["max_negative_weight"] = false_activation_penalty
config["background_paths"] = ['./audioset_16k', './fma']
config["false_positive_validation_data_path"] = "validation_set_features.npy"
config["feature_data_files"] = {"ACAV100M_sample": "openwakeword_features_ACAV100M_2000_hrs_16bit.npy"}
# Important: piper_model_path isn't needed because the clips have already been generated
if "piper_model_path" in config:
    del config["piper_model_path"]

with open('my_model.yaml', 'w') as file:
    yaml.dump(config, file)

# --- STEPS 2-3: Augment the clips and train the model ---

print("\n--- Step 2: Augmenting clips ---")
!python openwakeword/openwakeword/train.py --training_config my_model.yaml --augment_clips

print("\n--- Step 3: Training model ---")
!python openwakeword/openwakeword/train.py --training_config my_model.yaml --train_model

# --- STEP 4: Convert and download ---

print("\n--- Step 4: Converting and downloading models ---")
onnx_model_path = f"my_custom_model/{config['model_name']}.onnx"
if os.path.exists(onnx_model_path):
    name1, name2 = f"my_custom_model/{config['model_name']}_float32.tflite", f"my_custom_model/{config['model_name']}.tflite"
    !onnx2tf -i {onnx_model_path} -o my_custom_model/ -kat onnx____Flatten_0
    !mv {name1} {name2}

    from google.colab import files
    if os.path.exists(f"my_custom_model/{config['model_name']}.onnx"):
        files.download(f"my_custom_model/{config['model_name']}.onnx")
    if os.path.exists(f"my_custom_model/{config['model_name']}.tflite"):
        files.download(f"my_custom_model/{config['model_name']}.tflite")
else:
    print(f"ERROR: Model training failed. '{onnx_model_path}' was not created.")

--- Step 1: Generating audio clips and file lists ---
Generating 1000 training clips for 'Люся'...


  0%|          | 0/1000 [00:00<?, ?it/s]
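The `.txt` file lists written in Step 1 are meant to hold one clip path per line. The minimal sketch below writes and re-reads such a list using only the standard library. It sorts the paths so the list is reproducible (`glob.glob` returns files in arbitrary order) and opens files with explicit UTF-8 encoding, since the model name here ("Люся") is Cyrillic. Both helper names are hypothetical.

```python
import glob
import os


def write_file_list(clips_dir: str, list_path: str) -> list[str]:
    """Write every .wav path in clips_dir to list_path, one per line (hypothetical helper)."""
    paths = sorted(glob.glob(os.path.join(clips_dir, "*.wav")))
    with open(list_path, "w", encoding="utf-8") as f:
        for path in paths:
            f.write(path + "\n")  # a real newline, not a backslash followed by "n"
    return paths


def read_file_list(list_path: str) -> list[str]:
    """Read a file list back, skipping blank lines."""
    with open(list_path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]
```

Reading a list back and comparing its length with `number_of_examples` is a cheap way to confirm that every clip was generated before starting augmentation.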