# Easy microWakeWord Training

This notebook provides a simplified approach to training custom wake word models using microWakeWord. It's designed to be accessible to users with minimal machine learning experience while still producing high-quality models.

## What You'll Need

- Python 3.10 installed
- A GPU is recommended for faster training (but not required)
- Your desired wake word phrase (e.g., "hey computer")

## Setup

First, let's install the required packages:

In [None]:
# Install microWakeWord and dependencies
import platform

if platform.system() == "Darwin":
    # `pymicro-features` is installed from a fork to support building on macOS
    !pip install 'git+https://github.com/puddly/pymicro-features@puddly/minimum-cpp-version'

# `audio-metadata` is installed from a fork to unpin `attrs` from a version that breaks Jupyter
!pip install 'git+https://github.com/whatsnowplaying/audio-metadata@d4ebb238e6a401bb1a5aaaac60c9e2b3cb30929f'

# Install ipywidgets for interactive notebook elements
!pip install ipywidgets

!git clone https://github.com/BigPappy098/microWakeWord
!pip install -e ./microWakeWord

## Step 1: Choose Your Wake Word

Choose a wake word phrase that you want to use. Good wake words typically have:
- Multiple syllables (3-5 is ideal)
- Distinctive sounds that don't commonly appear in everyday speech
- Clear pronunciation

Examples: "hey computer", "jarvis", "alexa", "computer"

You can use phonetic spellings to improve recognition. For example, "computer" might be better as "kuhm-pyoo-ter".

In [None]:
# Set your wake word here
wake_word = "hey_computer"  # Use underscores instead of spaces

# Listen to a sample of how it will sound
import os
import sys
from IPython.display import Audio

if not os.path.exists("./piper-sample-generator"):
    !git clone https://github.com/rhasspy/piper-sample-generator
    !wget -O piper-sample-generator/models/en_US-libritts_r-medium.pt 'https://github.com/rhasspy/piper-sample-generator/releases/download/v2.0.0/en_US-libritts_r-medium.pt'
    !pip install torch torchaudio piper-phonemize-cross==1.2.1

# Add the generator to the import path on every run, not only right after cloning
if "piper-sample-generator/" not in sys.path:
    sys.path.append("piper-sample-generator/")

!mkdir -p sample_test
!python3 piper-sample-generator/generate_samples.py "{wake_word}" \
--max-samples 1 \
--batch-size 1 \
--output-dir sample_test

Audio("sample_test/0.wav", autoplay=True)
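
Beyond listening, it can help to look at the file's basic properties. The following is a minimal sketch using only the standard library; `wav_info` is a hypothetical helper, not part of microWakeWord or piper-sample-generator. The manifest in Step 6 lists a sample rate of 16000 Hz, so a sample reporting a different rate would be worth a second look. Whether the training pipeline resamples automatically is not stated in this notebook.

```python
import wave


def wav_info(path):
    """Return basic properties of a PCM WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate_hz": rate,
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_s": frames / rate,
        }


# Example (run after the cell above has written the file):
# print(wav_info("sample_test/0.wav"))
```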

## Step 2: Choose Training Parameters

Now, let's configure the training process based on your wake word and needs:

1. **Wake Word Length**: Choose a preset based on the length of your wake word
   - `short`: For 1-2 syllable wake words (e.g., "jarvis")
   - `medium`: For 3-4 syllable wake words (e.g., "hey computer")
   - `long`: For 5+ syllable wake words (e.g., "hey google assistant")

2. **Augmentation Level**: Choose how much to vary the training samples
   - `light`: Less variation, good for quiet environments
   - `medium`: Balanced variation, good for most home environments
   - `heavy`: High variation, good for noisy environments

3. **Sample Count**: How many synthetic samples to generate
   - 500-1000 is good for testing
   - 2000-5000 is recommended for production models
   
4. **Batch Size**: Size of batches during training
   - Larger values may train faster but require more memory
   - Smaller values use less memory but may train slower
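
If you are unsure which length preset applies, the syllable ranges above can be turned into a rough helper. This is only a sketch: `estimate_syllables` and `suggest_preset` are hypothetical names, and counting vowel groups is a crude approximation of English syllables (it miscounts silent vowels, for example), so treat the result as a starting point.

```python
import re


def estimate_syllables(phrase):
    """Roughly estimate syllables by counting vowel groups in each word."""
    words = re.split(r"[\s_\-]+", phrase.lower().strip())
    # Every non-empty word counts as at least one syllable.
    return sum(max(1, len(re.findall(r"[aeiouy]+", w))) for w in words if w)


def suggest_preset(phrase):
    """Map the estimate onto the ranges listed above (1-2, 3-4, 5+)."""
    count = estimate_syllables(phrase)
    if count <= 2:
        return "short"
    if count <= 4:
        return "medium"
    return "long"


print(suggest_preset("hey_computer"))
```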

In [None]:
# Configure training parameters
preset = "medium"  # Choose from: "short", "medium", "long"
augmentation_level = "medium"  # Choose from: "light", "medium", "heavy"
samples_count = 1000  # Number of samples to generate
batch_size = 128  # Batch size for training (larger values may be faster but require more memory)

# Output directory
output_dir = f"trained_models/{wake_word}"

## Step 3: Download Negative Samples

To train a robust model, we need "negative" samples - audio that is NOT the wake word. These help the model learn what to ignore.

In [None]:
# Download negative datasets
# Use a separate variable so `output_dir` from Step 2 is not overwritten
negative_dir = './negative_datasets'
if not os.path.exists(negative_dir):
    os.mkdir(negative_dir)
    link_root = "https://huggingface.co/datasets/kahrendt/microwakeword/resolve/main/"
    filenames = ['dinner_party.zip', 'dinner_party_eval.zip', 'no_speech.zip', 'speech.zip']
    for fname in filenames:
        link = link_root + fname
        zip_path = f"{negative_dir}/{fname}"
        !wget -O {zip_path} {link}
        !unzip -q {zip_path} -d {negative_dir}
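
The cell above relies on the `wget` and `unzip` command-line tools, which are not available on every system. As an alternative, here is a minimal sketch using only the Python standard library. `download_and_extract` is a hypothetical helper, not part of microWakeWord, and it assumes the archives are plain zip files reachable at the same URLs. Unlike the cell above, it checks each archive individually, so an interrupted run can be resumed.

```python
import os
import urllib.request
import zipfile


def download_and_extract(url, dest_dir):
    """Download a zip archive into dest_dir and unpack it there."""
    os.makedirs(dest_dir, exist_ok=True)
    zip_path = os.path.join(dest_dir, os.path.basename(url))
    if not os.path.exists(zip_path):
        # Download to a temporary name first so an interrupted transfer
        # is never mistaken for a complete archive.
        tmp_path = zip_path + ".part"
        urllib.request.urlretrieve(url, tmp_path)
        os.replace(tmp_path, zip_path)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    return zip_path


# Example usage (the archives are large, so this is left commented out):
# link_root = "https://huggingface.co/datasets/kahrendt/microwakeword/resolve/main/"
# for fname in ["dinner_party.zip", "dinner_party_eval.zip", "no_speech.zip", "speech.zip"]:
#     download_and_extract(link_root + fname, "./negative_datasets")
```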

## Step 4: Train Your Model

Now we'll use our simplified training interface to train the model. This process includes:
1. Generating synthetic wake word samples
2. Generating spectrograms from the samples
3. Creating a training configuration
4. Training the neural network
5. Converting to a streaming TFLite model for deployment

**Note:** A shape mismatch error such as "Invalid input shape for input Tensor" means the generated spectrograms do not have the dimensions the model expects. The latest version of the code should handle this automatically by setting the correct spectrogram dimensions.

In [None]:
from microwakeword.easy_train import WakeWordTrainer

# Create trainer
trainer = WakeWordTrainer(
    wake_word=wake_word,
    output_dir="trained_models",
    preset=preset,
    augmentation_level=augmentation_level,
    samples_count=samples_count,
    batch_size=batch_size
)

# Run the full training pipeline
model_path = trainer.run_full_pipeline()

## Step 5: Download Your Model

Once training is complete, you can download your model for use with ESPHome or other compatible systems.

In [None]:
from IPython.display import FileLink, display

# Path to the trained model
model_file = os.path.join("trained_models", wake_word, "model", "streaming_quantized.tflite")

if os.path.exists(model_file):
    print("Your model is ready! Click below to download:")
    display(FileLink(model_file))
else:
    print(f"Model file not found at {model_file}. Check for errors in the training process.")

## Step 6: Create a Model Manifest for ESPHome

To use your model with ESPHome, you need to create a model manifest JSON file. Here's a template:

In [None]:
import json

# Create a model manifest for ESPHome
manifest = {
    "name": wake_word,
    "version": 2,
    "type": "micro_speech",
    "description": f"Custom wake word model for '{wake_word}'",
    "specs": {
        "average_window_length": 10,
        "detection_threshold": 0.7,
        "suppression_ms": 1000,
        "minimum_count": 3,
        "sample_rate": 16000,
        "vocabulary": ["_silence_", "_unknown_", wake_word]
    }
}

manifest_file = os.path.join("trained_models", wake_word, "model", "manifest.json")
with open(manifest_file, 'w') as f:
    json.dump(manifest, f, indent=2)

print(f"Model manifest created at {manifest_file}")
display(FileLink(manifest_file))
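
If you later need to adjust the threshold (see Troubleshooting), you can edit the saved manifest rather than regenerate it. The following is a minimal sketch that assumes the manifest layout written by the cell above, with the value stored under `specs.detection_threshold`. `set_detection_threshold` is a hypothetical helper, and the check that the value lies strictly between 0 and 1 is an assumption that the threshold is a probability.

```python
import json


def set_detection_threshold(manifest_path, threshold):
    """Update specs.detection_threshold in place and return the old value."""
    # Assumption: the threshold is a probability, so 0 and 1 are excluded.
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold should be strictly between 0 and 1")
    with open(manifest_path) as f:
        manifest = json.load(f)
    previous = manifest["specs"]["detection_threshold"]
    manifest["specs"]["detection_threshold"] = threshold
    with open(manifest_path, "w") as f:
        json.dump(manifest, f, indent=2)
    return previous


# Example: raise the threshold to reduce false positives
# set_detection_threshold(manifest_file, 0.8)
```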

## Troubleshooting and Fine-Tuning

If your model doesn't perform as expected, here are some tips:

1. **False Positives** (activates too often):
   - Increase the `negative_class_weight` in the advanced configuration
   - Increase the `detection_threshold` in the manifest file
   - Try a different phonetic spelling of your wake word

2. **False Negatives** (doesn't activate when it should):
   - Decrease the `negative_class_weight` in the advanced configuration
   - Decrease the `detection_threshold` in the manifest file
   - Generate more training samples
   - Try a different phonetic spelling of your wake word

3. **Advanced Configuration**:
   - For more control, you can pass an `advanced_config` dictionary to the `WakeWordTrainer`

```python
advanced_config = {
    "training_steps": [30000],  # Train for longer
    "negative_class_weight": [30],  # Increase to reduce false positives
    "time_mask_max_size": [10],  # Increase for more augmentation
    "freq_mask_max_size": [10]   # Increase for more augmentation
}

trainer = WakeWordTrainer(
    wake_word=wake_word,
    output_dir="trained_models",
    preset=preset,
    augmentation_level=augmentation_level,
    samples_count=samples_count,
    batch_size=batch_size,
    advanced_config=advanced_config
)
```
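
To build intuition for the `detection_threshold` tips above, the sketch below models a detector that averages the most recent per-frame probabilities and fires when that average exceeds the threshold. This is an illustrative assumption about how `detection_threshold` and `average_window_length` interact, not the actual ESPHome or microWakeWord implementation, and `count_detections` is a hypothetical name.

```python
from collections import deque


def count_detections(probabilities, threshold=0.7, window=10):
    """Count activations of a simplified sliding-window detector."""
    recent = deque(maxlen=window)
    detections = 0
    armed = True  # True whenever the detector is not currently firing
    for p in probabilities:
        recent.append(p)
        mean = sum(recent) / len(recent)
        firing = len(recent) == window and mean > threshold
        if firing and armed:
            detections += 1  # count only the moment the average crosses over
        armed = not firing
    return detections


# A sustained run of high scores versus a brief blip
sustained = [0.1] * 20 + [0.9] * 15 + [0.1] * 20
blip = [0.1] * 20 + [0.9] * 5 + [0.1] * 20

print(count_detections(sustained))            # sustained run at the default threshold
print(count_detections(blip))                 # brief blip at the default threshold
print(count_detections(blip, threshold=0.4))  # same blip with a lower threshold
```

In this toy model the brief blip triggers only once the threshold is lowered, which mirrors the trade-off described above: a higher threshold suppresses false positives, while a lower one makes the detector more sensitive.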