# 🎯 Training a microWakeWord Model

<div style="background-color: #f0f7fb; padding: 15px; border-radius: 10px; border-left: 5px solid #3498db; margin-bottom: 20px;">
    <h2 style="margin-top: 0; color: #3498db;">Welcome to microWakeWord Training!</h2>
    <p>This notebook steps you through training a basic microWakeWord model. It is intended as a <b>starting point</b> for users who want to create their own wake word model. You should use <b>Python 3.10</b> for the best experience.</p>
    <p>The training process follows these main steps:</p>
    <ol>
        <li>Set up the environment and install dependencies</li>
        <li>Generate wake word samples using text-to-speech</li>
        <li>Download and prepare background audio for training</li>
        <li>Set up audio augmentation to create robust training data</li>
        <li>Configure and train the neural network model</li>
        <li>Export the model for use with ESPHome</li>
    </ol>
</div>

<div style="background-color: #fff3cd; padding: 15px; border-radius: 10px; border-left: 5px solid #f0ad4e; margin-bottom: 20px;">
    <h3 style="margin-top: 0; color: #8a6d3b;">⚠️ Important Note</h3>
    <p>The generated model will most likely not be suitable for everyday use without experimentation; it may be difficult to trigger or may activate falsely too often. Expect to try many different settings before you obtain a decent model!</p>
</div>

At the end of this notebook, you will be able to download a tflite file. To use this in ESPHome, you need to write a model manifest JSON file. See the [ESPHome documentation](https://esphome.io/components/micro_wake_word) for the details and the [model repo](https://github.com/esphome/micro-wake-word-models/tree/main/models/v2) for examples.

## 📦 Step 1: Set Up the Environment

<div style="background-color: #e8f4f8; padding: 15px; border-radius: 10px; margin-bottom: 15px;">
    <p><b>What this step does:</b> Installs all necessary dependencies for microWakeWord training, including platform-specific requirements.</p>
    <p><b>Expected time:</b> 2-5 minutes depending on your internet connection</p>
    <p><b>Note:</b> You may need to restart your notebook kernel after this step completes.</p>
</div>

In [1]:
!wget https://github.com/korakot/kora/releases/download/v0.10/py310.sh
!bash ./py310.sh -b -f -p /usr/local
!python -m ipykernel install --name "py310" --user


In [2]:
# Confirm version
!python3 --version
# Python 3.10

Python 3.10.6
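
The `!python3 --version` call reports the interpreter found on the shell's `PATH`, which is not necessarily the one running the notebook kernel. A minimal sketch of an in-kernel check, assuming only the Python 3.10 recommendation from the introduction (`is_recommended_python` is a hypothetical helper name, not part of microWakeWord):

```python
import sys

RECOMMENDED = (3, 10)  # Taken from the notebook introduction


def is_recommended_python(version_info=sys.version_info):
    """Return True when the major/minor version matches the recommendation."""
    return tuple(version_info[:2]) == RECOMMENDED


if not is_recommended_python():
    print(
        f"Kernel runs Python {sys.version_info.major}.{sys.version_info.minor}; "
        f"this notebook recommends {RECOMMENDED[0]}.{RECOMMENDED[1]}."
    )
```

If the message appears, switching the notebook to the `py310` kernel installed above is probably the first thing to try.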


In [3]:
# Installs microWakeWord. Be sure to restart the session after this is finished.
import platform

if platform.system() == "Darwin":
    # `pymicro-features` is installed from a fork to support building on macOS
    !pip install 'git+https://github.com/puddly/pymicro-features@puddly/minimum-cpp-version'

# `audio-metadata` is installed from a fork to unpin `attrs` from a version that breaks Jupyter
!pip install 'git+https://github.com/whatsnowplaying/audio-metadata@d4ebb238e6a401bb1a5aaaac60c9e2b3cb30929f'

# Install ipywidgets for interactive notebook elements
!pip install ipywidgets
# Dataset tooling (this may only be needed on Google Colab)
!pip install -U datasets huggingface_hub fsspec torchcodec

#!git clone https://github.com/kahrendt/microWakeWord
!git clone https://github.com/BigPappy098/microWakeWord

!pip install -e ./microWakeWord


## 🔊 Step 2: Generate Wake Word Samples

<div style="background-color: #e8f4f8; padding: 15px; border-radius: 10px; margin-bottom: 15px;">
    <p><b>What this step does:</b> Generates a single sample of your wake word using text-to-speech so you can verify it sounds correct.</p>
    <p><b>Key parameter to modify:</b></p>
    <ul>
        <li><code>target_word</code> - Set this to your desired wake word (use underscores instead of spaces)</li>
    </ul>
    <p><b>Tips:</b></p>
    <ul>
        <li>Try phonetic spellings for better pronunciation (e.g., "hey_komputer" instead of "hey_computer")</li>
        <li>Listen to the generated sample to verify it sounds correct before proceeding</li>
    </ul>
</div>

In [1]:
# Generates 1 sample of the target word for manual verification.

target_word = 'sanja,'  # Phonetic spellings may produce better samples

import os
import sys
import platform

from IPython.display import Audio

if not os.path.exists("./piper-sample-generator"):
    if platform.system() == "Darwin":
        !git clone -b mps-support https://github.com/kahrendt/piper-sample-generator
    else:
        !git clone https://github.com/rhasspy/piper-sample-generator

    !wget -O piper-sample-generator/models/en_US-libritts_r-medium.pt 'https://github.com/rhasspy/piper-sample-generator/releases/download/v2.0.0/en_US-libritts_r-medium.pt'

    # Install Python dependencies
    !pip install torch torchaudio piper-phonemize-cross==1.2.1 piper-TTS torchcodec

# Make the generator importable even when the repository was cloned in an earlier run
if "piper-sample-generator/" not in sys.path:
    sys.path.append("piper-sample-generator/")

!python3 piper-sample-generator/generate_samples.py "{target_word}" \
--max-samples 1 \
--batch-size 1 \
--output-dir generated_samples \
--model piper-sample-generator/models/en_US-libritts_r-medium.pt

Audio("generated_samples/0.wav", autoplay=True)
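
Besides listening to the sample, it can help to confirm that the file has the format you expect. The sketch below reads the WAV header with Python's standard `wave` module; `describe_wav` is a hypothetical helper, and it assumes the generator writes uncompressed PCM WAV files.

```python
import wave


def describe_wav(path):
    """Return (sample_rate_hz, channels, duration_s) read from a PCM WAV header."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return rate, wav.getnchannels(), wav.getnframes() / rate


# Example (run after the cell above has produced the file):
# print(describe_wav("generated_samples/0.wav"))
```

A sample that is much longer than the spoken wake word may contain leading or trailing silence worth checking by ear.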


### 🔊 Step 2.1: Generate Multiple Wake Word Samples

<div style="background-color: #e8f4f8; padding: 15px; border-radius: 10px; margin-bottom: 15px;">
    <p><b>What this step does:</b> Generates a larger set of wake word samples (1000 by default) for training.</p>
    <p><b>Key parameters to modify:</b></p>
    <ul>
        <li><code>--max-samples</code> - Number of samples to generate (default: 1000)</li>
        <li><code>--batch-size</code> - How many samples to generate at once (default: 100)</li>
    </ul>
    <p><b>Advanced options:</b> See the <a href="https://github.com/rhasspy/piper-sample-generator">piper-sample-generator documentation</a> for additional parameters like:</p>
    <ul>
        <li><code>--noise-scales</code> - Controls voice variation (higher = more variation)</li>
        <li><code>--noise-scale-ws</code> - Controls speaking style variation</li>
        <li><code>--length-scales</code> - Controls speaking speed (higher = slower)</li>
    </ul>
</div>

In [2]:
# Generates a larger number of wake word samples.
# Start here when trying to improve your model.
# See https://github.com/rhasspy/piper-sample-generator for the full set of
# parameters. In particular, experiment with noise-scales and noise-scale-ws,
# generating negative samples similar to the wake word, and generating many more
# wake word samples, possibly with different phonetic pronunciations.

!python3 piper-sample-generator/generate_samples.py "{target_word}" \
--max-samples 1000 \
--batch-size 100 \
--output-dir generated_samples \
--model piper-sample-generator/models/en_US-libritts_r-medium.pt

DEBUG:__main__:Loading /content/piper-sample-generator/models/en_US-libritts_r-medium.pt
INFO:__main__:Successfully loaded the model
DEBUG:__main__:Batch 1/10 complete
DEBUG:__main__:Batch 2/10 complete
DEBUG:__main__:Batch 3/10 complete
DEBUG:__main__:Batch 4/10 complete
DEBUG:__main__:Batch 5/10 complete
DEBUG:__main__:Batch 6/10 complete
DEBUG:__main__:Batch 7/10 complete
DEBUG:__main__:Batch 8/10 complete
DEBUG:__main__:Batch 9/10 complete
DEBUG:__main__:Batch 10/10 complete
INFO:__main__:Done
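
The "Batch n/10" lines in the log follow from the two parameters: 1000 samples in batches of 100. A small sketch of that arithmetic, assuming the generator rounds a partial final batch up to a whole batch (`batch_count` is a hypothetical helper, not part of piper-sample-generator):

```python
import math


def batch_count(max_samples, batch_size):
    """Number of batches needed to produce max_samples, rounding up."""
    return math.ceil(max_samples / batch_size)


print(batch_count(1000, 100))  # 10, matching "Batch 10/10" in the log above
```

If you run out of memory, lowering `--batch-size` raises the batch count but leaves the total number of samples unchanged.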


## 🎵 Step 3: Download Background Audio Data

<div style="background-color: #e8f4f8; padding: 15px; border-radius: 10px; margin-bottom: 15px;">
    <p><b>What this step does:</b> Downloads audio data for augmentation, including room impulse responses and background noise.</p>
    <p><b>Expected time:</b> 10-20 minutes (this step can be slow!)</p>
    <p><b>Why this matters:</b> Good background audio is essential for training a robust wake word model that works in real environments.</p>
    <p><b>Note:</b> The data downloaded has mixed licenses and should be considered for <b>non-commercial personal use only</b>.</p>
</div>

In [3]:
# Downloads audio data for augmentation. This can be slow!
# Borrowed from openWakeWord's automatic_model_training.ipynb, accessed March 4, 2024
#
# **Important note!** The data downloaded here has a mixture of different
# licenses and usage restrictions. As such, any custom models trained with this
# data should be considered as appropriate for **non-commercial** personal use only.


import datasets
import scipy.io.wavfile  # Explicit submodule import; scipy.io.wavfile.write is used below
import os

import numpy as np

from pathlib import Path
from tqdm import tqdm

## Download MIT RIR data

output_dir = "./mit_rirs"
if not os.path.exists(output_dir):
    os.mkdir(output_dir)
    rir_dataset = datasets.load_dataset("davidscripka/MIT_environmental_impulse_responses", split="train", streaming=True)
    # Save clips to 16-bit PCM wav files
    for row in tqdm(rir_dataset):
        name = row['audio']['path'].split('/')[-1]
        scipy.io.wavfile.write(os.path.join(output_dir, name), 16000, (row['audio']['array']*32767).astype(np.int16))

## Download noise and background audio

# Audioset Dataset (https://research.google.com/audioset/dataset/index.html)
# Download one part of the audioset .tar files, extract, and convert to 16 kHz
# For full-scale training, it's recommended to download the entire dataset from
# https://huggingface.co/datasets/agkphysics/AudioSet, and
# even potentially combine it with other background noise datasets (e.g., FSD50k, Freesound, etc.)

if not os.path.exists("audioset"):
    os.mkdir("audioset")

    fname = "bal_train09.tar"
    out_dir = f"audioset/{fname}"
    link = "https://huggingface.co/datasets/agkphysics/AudioSet/resolve/main/data/" + fname
    !wget -O {out_dir} {link}
    !cd audioset && tar -xf bal_train09.tar

    output_dir = "./audioset_16k"
    if not os.path.exists(output_dir):
        os.mkdir(output_dir)

    # Save clips to 16-bit PCM wav files
    audioset_dataset = datasets.Dataset.from_dict({"audio": [str(i) for i in Path("audioset/audio").glob("**/*.flac")]})
    audioset_dataset = audioset_dataset.cast_column("audio", datasets.Audio(sampling_rate=16000))
    for row in tqdm(audioset_dataset):
        name = row['audio']['path'].split('/')[-1].replace(".flac", ".wav")
        scipy.io.wavfile.write(os.path.join(output_dir, name), 16000, (row['audio']['array']*32767).astype(np.int16))

# Free Music Archive dataset
# https://github.com/mdeff/fma
# (Third-party mchl914 extra small set)

output_dir = "./fma"
if not os.path.exists(output_dir):
    os.mkdir(output_dir)
    fname = "fma_xs.zip"
    link = "https://huggingface.co/datasets/mchl914/fma_xsmall/resolve/main/" + fname
    out_dir = f"fma/{fname}"
    !wget -O {out_dir} {link}
    !cd {output_dir} && unzip -q {fname}

    output_dir = "./fma_16k"
    if not os.path.exists(output_dir):
        os.mkdir(output_dir)

    # Save clips to 16-bit PCM wav files
    fma_dataset = datasets.Dataset.from_dict({"audio": [str(i) for i in Path("fma/fma_small").glob("**/*.mp3")]})
    fma_dataset = fma_dataset.cast_column("audio", datasets.Audio(sampling_rate=16000))
    for row in tqdm(fma_dataset):
        name = row['audio']['path'].split('/')[-1].replace(".mp3", ".wav")
        scipy.io.wavfile.write(os.path.join(output_dir, name), 16000, (row['audio']['array']*32767).astype(np.int16))


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



ImportError: To support decoding audio data, please install 'torchcodec'.
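
The `ImportError` above means the running kernel could not import `torchcodec`, which the setup step installs; restarting the session after installation, as Step 1 asks, is the likely remedy. A minimal sketch for checking up front which packages the kernel can actually see (`missing_packages` is a hypothetical helper; the package list is taken from the imports and the error message in this step):

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found by the current kernel."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(["datasets", "scipy", "numpy", "tqdm", "torchcodec"])
if missing:
    print("Install these and restart the session:", ", ".join(missing))
```

Running this before the download cell avoids discovering a missing dependency only after the datasets have started streaming.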

## 🔄 Step 4: Set Up Audio Augmentation

<div style="background-color: #e8f4f8; padding: 15px; border-radius: 10px; margin-bottom: 15px;">
    <p><b>What this step does:</b> Configures audio augmentation to create more varied training samples.</p>
    <p><b>Why this matters:</b> Augmentation helps the model learn to recognize your wake word in different environments and conditions.</p>
    <p><b>Key parameters to experiment with:</b></p>
    <ul>
        <li><code>augmentation_probabilities</code> - Chances of applying different audio effects</li>
        <li><code>background_min_snr_db</code> and <code>background_max_snr_db</code> - Signal-to-noise ratio range</li>
    </ul>
</div>

In [None]:
# Sets up the augmentations.
# To improve your model, experiment with these settings and use more sources of
# background clips.

from microwakeword.audio.augmentation import Augmentation
from microwakeword.audio.clips import Clips
from microwakeword.audio.spectrograms import SpectrogramGeneration

clips = Clips(input_directory='generated_samples',
              file_pattern='*.wav',
              max_clip_duration_s=None,
              remove_silence=False,
              random_split_seed=10,
              split_count=0.1,
              )
augmenter = Augmentation(augmentation_duration_s=3.2,
                         augmentation_probabilities = {
                                "SevenBandParametricEQ": 0.1,
                                "TanhDistortion": 0.1,
                                "PitchShift": 0.1,
                                "BandStopFilter": 0.1,
                                "AddColorNoise": 0.1,
                                "AddBackgroundNoise": 0.75,
                                "Gain": 1.0,
                                "RIR": 0.5,
                            },
                         impulse_paths = ['mit_rirs'],
                         background_paths = ['fma_16k', 'audioset_16k'],
                         background_min_snr_db = -5,
                         background_max_snr_db = 10,
                         min_jitter_s = 0.195,
                         max_jitter_s = 0.205,
                         )
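
To get a feel for the `background_min_snr_db` and `background_max_snr_db` range, it helps to translate decibels into a level ratio. The sketch below uses the standard amplitude definition, SNR(dB) = 20·log10(signal / noise); how the augmentation library measures levels internally is not shown in this notebook, so treat the numbers as a rough guide.

```python
def noise_to_signal_amplitude(snr_db):
    """RMS amplitude of the background relative to the wake word at a given SNR."""
    return 10 ** (-snr_db / 20)


for snr_db in (-5, 0, 10):
    ratio = noise_to_signal_amplitude(snr_db)
    print(f"{snr_db:>3} dB SNR -> background at {ratio:.2f}x the wake word level")
```

At the configured minimum of -5 dB the background is louder than the wake word (about 1.78x), while at 10 dB it sits at roughly a third of the wake word level.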


### 🔄 Step 4.1: Test Audio Augmentation

<div style="background-color: #e8f4f8; padding: 15px; border-radius: 10px; margin-bottom: 15px;">
    <p><b>What this step does:</b> Augments a random clip and plays it back so you can verify the augmentation sounds reasonable.</p>
    <p><b>What to listen for:</b> The wake word should still be recognizable despite background noise and effects.</p>
    <p><b>Tip:</b> If the augmentation is too strong (wake word not audible) or too weak (no background noise), adjust the parameters in the previous cell.</p>
</div>

In [None]:
# Augment a random clip and play it back to verify it works well

from IPython.display import Audio
from microwakeword.audio.audio_utils import save_clip

random_clip = clips.get_random_clip()
augmented_clip = augmenter.augment_clip(random_clip)
save_clip(augmented_clip, 'augmented_clip.wav')

Audio("augmented_clip.wav", autoplay=True)

## 🔄 Step 5: Generate Augmented Features

<div style="background-color: #e8f4f8; padding: 15px; border-radius: 10px; margin-bottom: 15px;">
    <p><b>What this step does:</b> Augments samples and saves training, validation, and testing sets.</p>
    <p><b>Why this matters:</b> This creates the actual data that will be used to train the neural network.</p>
    <p><b>Note:</b> The training set uses more repetition to help the model learn, while the testing set uses a streaming approach to better simulate real-world usage.</p>
</div>

In [None]:
# Augment samples and save the training, validation, and testing sets.
# Validating and testing samples generated the same way can make the model
# benchmark better than it performs in real-world use. Use real samples or TTS
# samples generated with a different TTS engine to potentially get more accurate
# benchmarks.

import os
from mmap_ninja.ragged import RaggedMmap

output_dir = 'generated_augmented_features'

if not os.path.exists(output_dir):
    os.mkdir(output_dir)

splits = ["training", "validation", "testing"]
for split in splits:
    out_dir = os.path.join(output_dir, split)
    if not os.path.exists(out_dir):
        os.mkdir(out_dir)

    split_name = "train"
    repetition = 2

    # slide_frames=10 uses the same spectrogram repeatedly, just shifted over by
    # one frame. This simulates the streaming inferences while
    # training/validating in nonstreaming mode.
    spectrograms = SpectrogramGeneration(
        clips=clips,
        augmenter=augmenter,
        slide_frames=10,
        step_ms=10,
    )
    if split == "validation":
        split_name = "validation"
        repetition = 1
    elif split == "testing":
        split_name = "test"
        repetition = 1
        # The testing set uses the streaming version of the model, so no
        # artificial repetition is necessary.
        spectrograms = SpectrogramGeneration(
            clips=clips,
            augmenter=augmenter,
            slide_frames=1,
            step_ms=10,
        )

    RaggedMmap.from_generator(
        out_dir=os.path.join(out_dir, "wakeword_mmap"),
        sample_generator=spectrograms.spectrogram_generator(split=split_name, repeat=repetition),
        batch_size=100,
        verbose=True,
    )
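
Before moving on, it can be worth confirming that all three feature sets were written, since the training configuration later points at this folder. A minimal sketch, assuming each memory map is stored as a directory named `wakeword_mmap` inside its split folder as in the cell above (`missing_feature_dirs` is a hypothetical helper):

```python
from pathlib import Path


def missing_feature_dirs(
    root="generated_augmented_features",
    splits=("training", "validation", "testing"),
    mmap_name="wakeword_mmap",
):
    """Return the expected feature directories that do not exist under root."""
    root = Path(root)
    return [
        str(root / split / mmap_name)
        for split in splits
        if not (root / split / mmap_name).is_dir()
    ]


print(missing_feature_dirs() or "All feature folders are present")
```

An empty list means every split has its folder; it does not verify that the spectrograms inside are complete.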

## 📥 Step 6: Download Negative Datasets

<div style="background-color: #e8f4f8; padding: 15px; border-radius: 10px; margin-bottom: 15px;">
    <p><b>What this step does:</b> Downloads pre-generated spectrogram features for various negative datasets.</p>
    <p><b>Why this matters:</b> Negative samples help the model learn what is NOT your wake word, reducing false activations.</p>
    <p><b>Datasets included:</b></p>
    <ul>
        <li><code>dinner_party</code> - Conversations in a dinner party setting</li>
        <li><code>dinner_party_eval</code> - Separate evaluation set of dinner party audio</li>
        <li><code>no_speech</code> - Environmental sounds without speech</li>
        <li><code>speech</code> - Various speech samples</li>
    </ul>
</div>

In [None]:
# Downloads pre-generated spectrogram features (made for microWakeWord in
# particular) for various negative datasets. This can be slow!

output_dir = './negative_datasets'
if not os.path.exists(output_dir):
    os.mkdir(output_dir)
    link_root = "https://huggingface.co/datasets/kahrendt/microwakeword/resolve/main/"
    filenames = ['dinner_party.zip', 'dinner_party_eval.zip', 'no_speech.zip', 'speech.zip']
    for fname in filenames:
        link = link_root + fname

        zip_path = f"negative_datasets/{fname}"
        !wget -O {zip_path} {link}
        !unzip -q {zip_path} -d {output_dir}

## ⚙️ Step 7: Configure Training Parameters

<div style="background-color: #e8f4f8; padding: 15px; border-radius: 10px; margin-bottom: 15px;">
    <p><b>What this step does:</b> Creates a YAML configuration file that controls the training process.</p>
    <p><b>Why this matters:</b> These hyperparameters can make a huge difference in model quality.</p>
    <p><b>Key parameters to experiment with:</b></p>
    <ul>
        <li><code>sampling_weight</code> - Controls how often samples from each dataset are used in training</li>
        <li><code>penalty_weight</code> - Controls how much incorrect predictions from each dataset are penalized</li>
        <li><code>training_steps</code> - Number of training iterations (increase for potentially better models)</li>
        <li><code>positive_class_weight</code> and <code>negative_class_weight</code> - Balance between false positives and false negatives</li>
    </ul>
</div>

In [None]:
# Save a yaml config that controls the training process
# These hyperparameters can make a huge difference in model quality.
# Experiment with sampling and penalty weights and increasing the number of
# training steps.

import yaml
import os

config = {}

config["window_step_ms"] = 10

config["train_dir"] = (
    "trained_models/wakeword"
)


# Each feature_dir should have at least one of the following folders with this structure:
#  training/
#    ragged_mmap_folders_ending_in_mmap
#  testing/
#    ragged_mmap_folders_ending_in_mmap
#  testing_ambient/
#    ragged_mmap_folders_ending_in_mmap
#  validation/
#    ragged_mmap_folders_ending_in_mmap
#  validation_ambient/
#    ragged_mmap_folders_ending_in_mmap
#
#  sampling_weight: Weight for choosing a spectrogram from this set in the batch
#  penalty_weight: Penalizing weight for incorrect predictions from this set
#  truth: Boolean whether this set has positive samples or negative samples
#  truncation_strategy: If spectrograms in the set are longer than necessary for training, how they are truncated
#       - random: choose a random portion of the entire spectrogram - useful for long negative samples
#       - truncate_start: remove the start of the spectrogram
#       - truncate_end: remove the end of the spectrogram
#       - split: Split the longer spectrogram into separate spectrograms offset by 100 ms. Only for ambient sets

config["features"] = [
    {
        "features_dir": "generated_augmented_features",
        "sampling_weight": 2.0,
        "penalty_weight": 1.0,
        "truth": True,
        "truncation_strategy": "truncate_start",
        "type": "mmap",
    },
    {
        "features_dir": "negative_datasets/speech",
        "sampling_weight": 10.0,
        "penalty_weight": 1.0,
        "truth": False,
        "truncation_strategy": "random",
        "type": "mmap",
    },
    {
        "features_dir": "negative_datasets/dinner_party",
        "sampling_weight": 10.0,
        "penalty_weight": 1.0,
        "truth": False,
        "truncation_strategy": "random",
        "type": "mmap",
    },
    {
        "features_dir": "negative_datasets/no_speech",
        "sampling_weight": 5.0,
        "penalty_weight": 1.0,
        "truth": False,
        "truncation_strategy": "random",
        "type": "mmap",
    },
    { # Only used for validation and testing
        "features_dir": "negative_datasets/dinner_party_eval",
        "sampling_weight": 0.0,
        "penalty_weight": 1.0,
        "truth": False,
        "truncation_strategy": "split",
        "type": "mmap",
    },
]

# Number of training steps in each iteration - various other settings are configured as lists that correspond to different steps
config["training_steps"] = [10000]

# Penalizing weight for incorrect class predictions - lists that correspond to training steps
config["positive_class_weight"] = [1]
config["negative_class_weight"] = [20]

config["learning_rates"] = [
    0.001,
]  # Learning rates for Adam optimizer - list that corresponds to training steps
config["batch_size"] = 128

config["time_mask_max_size"] = [
    0
]  # SpecAugment - list that corresponds to training steps
config["time_mask_count"] = [0]  # SpecAugment - list that corresponds to training steps
config["freq_mask_max_size"] = [
    0
]  # SpecAugment - list that corresponds to training steps
config["freq_mask_count"] = [0]  # SpecAugment - list that corresponds to training steps

config["eval_step_interval"] = (
    500  # Evaluate on the validation sets every this many steps
)
config["clip_duration_ms"] = (
    1500  # Maximum length of wake word that the streaming model will accept
)

# The best model weights are chosen by first driving the minimization metric below target_minimization.
# Once the target has been met, the weights with the highest maximization metric are kept. Set 'minimization_metric' to None to only maximize
# Available metrics:
#   - "loss" - cross entropy error on validation set
#   - "accuracy" - accuracy of validation set
#   - "recall" - recall of validation set
#   - "precision" - precision of validation set
#   - "false_positive_rate" - false positive rate of validation set
#   - "false_negative_rate" - false negative rate of validation set
#   - "ambient_false_positives" - count of false positives from the split validation_ambient set
#   - "ambient_false_positives_per_hour" - estimated number of false positives per hour on the split validation_ambient set
config["target_minimization"] = 0.9
config["minimization_metric"] = None  # Set to None to disable

config["maximization_metric"] = "average_viable_recall"

with open("training_parameters.yaml", "w") as file:
    yaml.dump(config, file)
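
To see what the `sampling_weight` values imply, the sketch below converts them into shares. It assumes each set is drawn in proportion to its weight divided by the sum of all weights; the trainer's actual sampling logic is not shown in this notebook, so this is an illustration only.

```python
# Sampling weights copied from the config above
sampling_weights = {
    "generated_augmented_features": 2.0,
    "negative_datasets/speech": 10.0,
    "negative_datasets/dinner_party": 10.0,
    "negative_datasets/no_speech": 5.0,
    "negative_datasets/dinner_party_eval": 0.0,
}


def sampling_shares(weights):
    """Normalize weights so that they sum to 1 (assumed proportional sampling)."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


for name, share in sampling_shares(sampling_weights).items():
    print(f"{name}: {share:.1%}")
```

Under that assumption the wake word features make up about 7% of the sampled spectrograms (2 out of 27), and the evaluation-only set is never sampled during training.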

## 🚀 Step 8: Train the Model

<div style="background-color: #e8f4f8; padding: 15px; border-radius: 10px; margin-bottom: 15px;">
    <p><b>What this step does:</b> Trains the neural network model using the data and configuration from previous steps.</p>
    <p><b>Expected time:</b> 30+ minutes (much faster with a GPU)</p>
    <p><b>What to expect:</b> The training process will print progress updates. When finished, it will convert the model to a streaming version suitable for on-device detection.</p>
    <p><b>Key parameters to modify:</b></p>
    <ul>
        <li><code>--train 1</code> - Set to 0 to only convert and test the best-weighted model without training</li>
        <li>Neural network architecture parameters at the end of the command</li>
    </ul>
</div>

In [None]:
# Trains a model. When finished, it will quantize and convert the model to a
# streaming version suitable for on-device detection.
# It will resume if stopped, but it will start over at the configured training
# steps in the yaml file.
# Change --train 0 to only convert and test the best-weighted model.
# On Google Colab, it doesn't print the mini-batch results, so it may appear
# stuck for several minutes! Additionally, it is very slow compared to training
# on a local GPU.

!python -m microwakeword.model_train_eval \
--training_config='training_parameters.yaml' \
--train 1 \
--restore_checkpoint 1 \
--test_tf_nonstreaming 0 \
--test_tflite_nonstreaming 0 \
--test_tflite_nonstreaming_quantized 0 \
--test_tflite_streaming 0 \
--test_tflite_streaming_quantized 1 \
--use_weights "best_weights" \
mixednet \
--pointwise_filters "64,64,64,64" \
--repeat_in_block  "1, 1, 1, 1" \
--mixconv_kernel_sizes '[5], [7,11], [9,15], [23]' \
--residual_connection "0,0,0,0" \
--first_conv_filters 32 \
--first_conv_kernel_size 5 \
--stride 3
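
When you edit the architecture arguments at the end of the command, a mismatch between the per-block lists is an easy mistake to make. The sketch below parses the four list-style arguments and checks that they have the same number of entries. It assumes each list carries one entry per block, which matches the values above but is not confirmed by this notebook; `parse_mixednet_args` is a hypothetical helper, not microWakeWord's own parser.

```python
import ast


def parse_mixednet_args(pointwise_filters, repeat_in_block, mixconv_kernel_sizes, residual_connection):
    """Parse the list-style arguments and check they have one entry per block."""
    parsed = {
        "pointwise_filters": [int(v) for v in pointwise_filters.split(",")],
        "repeat_in_block": [int(v) for v in repeat_in_block.split(",")],
        # Wrap in a tuple so that a single "[5]" also parses as one block
        "mixconv_kernel_sizes": [list(k) for k in ast.literal_eval(f"({mixconv_kernel_sizes},)")],
        "residual_connection": [int(v) for v in residual_connection.split(",")],
    }
    lengths = {name: len(values) for name, values in parsed.items()}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"Per-block lists differ in length: {lengths}")
    return parsed


print(parse_mixednet_args("64,64,64,64", "1, 1, 1, 1", "[5], [7,11], [9,15], [23]", "0,0,0,0"))
```

Running the check before launching a long training job catches a dropped or extra entry early.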

## 📤 Step 9: Export the Model

<div style="background-color: #e8f4f8; padding: 15px; border-radius: 10px; margin-bottom: 15px;">
    <p><b>What this step does:</b> Downloads the trained TFLite model file for use with ESPHome.</p>
    <p><b>Next steps:</b></p>
    <ol>
        <li>Create a model manifest JSON file based on the training results</li>
        <li>Adjust the probability threshold based on test results</li>
        <li>Upload both files to your ESPHome device</li>
    </ol>
    <p><b>Resources:</b></p>
    <ul>
        <li><a href="https://esphome.io/components/micro_wake_word">ESPHome documentation</a></li>
        <li><a href="https://github.com/esphome/micro-wake-word-models/tree/main/models/v2">Example model configurations</a></li>
    </ul>
</div>

In [None]:
# Downloads the tflite model file. To use on the device, you need to write a
# Model JSON file. See https://esphome.io/components/micro_wake_word for the
# documentation and
# https://github.com/esphome/micro-wake-word-models/tree/main/models/v2 for
# examples. Adjust the probability threshold based on the test results obtained
# after training is finished. You may also need to increase the tensor arena
# size if the model fails to load.

import os

# Get the model file path
model_path = "trained_models/wakeword/tflite_stream_state_internal_quant/stream_state_internal_quant.tflite"

# Check whether the notebook is running in Google Colab
try:
    from google.colab import files
    # If in Colab, use files.download
    files.download(model_path)
    print(f"Model downloaded from {model_path}")
except ImportError:
    # If not in Colab, just print the path
    print(f"\nModel saved at: {os.path.abspath(model_path)}")
    print("\nTo use this model with ESPHome:")
    print("1. Create a model manifest JSON file")
    print("2. Copy both files to your ESPHome configuration directory")
    print("3. Configure ESPHome to use the model")
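
The manifest mentioned in the comments above is a small JSON file. The sketch below builds one in Python. The field names are written from memory of the v2 examples in the model repository and every value is a placeholder (author, website, cutoff, window size, arena size, minimum version), so compare the result against a current example from the repository before using it. `build_manifest` is a hypothetical helper.

```python
import json


def build_manifest(wake_word, model_file, probability_cutoff=0.5):
    """Build a manifest dict; field names and values are assumptions to verify."""
    return {
        "type": "micro",
        "wake_word": wake_word,
        "author": "your name",  # Placeholder
        "website": "https://example.com",  # Placeholder
        "model": model_file,
        "trained_languages": ["en"],
        "version": 2,
        "micro": {
            "probability_cutoff": probability_cutoff,  # Tune from the test results
            "sliding_window_size": 5,  # Placeholder
            "feature_step_size": 10,  # Matches window_step_ms in the training config
            "tensor_arena_size": 30000,  # Placeholder; increase if the model fails to load
            "minimum_esphome_version": "2024.7.0",  # Placeholder
        },
    }


manifest = build_manifest("my wake word", "stream_state_internal_quant.tflite")
print(json.dumps(manifest, indent=2))
# To save it: write the printed JSON to a file such as my_wake_word.json
```

Keeping the manifest next to the `.tflite` file lets the `model` field stay a plain file name.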

## 🎉 Congratulations!

<div style="background-color: #dff0d8; padding: 15px; border-radius: 10px; border-left: 5px solid #3c763d; margin-bottom: 20px;">
    <h3 style="margin-top: 0; color: #3c763d;">You've Successfully Trained a Wake Word Model!</h3>
    <p>You've completed all the steps to train a custom wake word model with microWakeWord. Here's what you can do next:</p>
    <ol>
        <li><b>Test your model</b> - Try different probability thresholds to balance between detection rate and false positives</li>
        <li><b>Experiment</b> - Try different training parameters to improve your model</li>
        <li><b>Deploy to ESPHome</b> - Use your model on an ESP32 device</li>
    </ol>
    <p>Remember that wake word model training is an iterative process. You may need to adjust parameters and retrain several times to get the best results for your specific use case.</p>
</div>

### Example ESPHome Configuration

```yaml
# Wake word configuration. The model is referenced through its manifest JSON
# file, which in turn points to stream_state_internal_quant.tflite. A configured
# microphone component is also required; see the ESPHome documentation.
micro_wake_word:
  models:
    - model: my_wake_word.json  # Set the probability cutoff in this manifest based on training results
  on_wake_word_detected:
    - logger.log: "Wake word detected"

# Optional - log a message once the device has booted
esphome:
  on_boot:
    priority: -100
    then:
      - delay: 5s
      - logger.log: "Wake word detection ready"
```