# Czech CML Wake Word Training
## Train OpenWakeWord model for Czech "cé em el" wake word

This notebook trains a custom wake word detection model for the Czech pronunciation of "CML" (cé em el).

**Estimated time:** 60-90 minutes total
- Setup & data download: 15-20 min
- Sample generation: 10-15 min
- Augmentation: 5-10 min
- Training: 20-30 min

**Requirements:** GPU runtime (T4 recommended)

## 1. Check GPU and Install Dependencies

In [None]:
# Check GPU availability
!nvidia-smi
print("\n✅ If you see GPU info above, you're good to go!")
print("❌ If not, go to Runtime > Change runtime type > Select GPU")

In [None]:
# Install required packages
print("📦 Installing dependencies...\n")

!pip install -q torch torchaudio
!pip install -q openwakeword
!pip install -q piper-tts
!pip install -q pydub

print("\n✅ Dependencies installed!")

## 2. Clone OpenWakeWord Repository

In [None]:
# Clone OpenWakeWord for training utilities
!git clone -q https://github.com/dscripka/openWakeWord.git
!mv openWakeWord openwakeword

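import sys
# Put the clone first on sys.path so it takes precedence over the pip-installed
# copy; otherwise the train.py patch applied later would not affect the imported module
sys.path.insert(0, '/content/openwakeword')

print("✅ OpenWakeWord repository cloned")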

## 3. Download Training Data

In [None]:
# Download MIT room impulse responses (for reverb augmentation)
print("📥 Downloading room impulse responses...")
!wget -q https://github.com/dscripka/openWakeWord/releases/download/v0.1.1/impulse_responses.zip
!unzip -q impulse_responses.zip
!rm impulse_responses.zip

print("✅ Room impulse responses downloaded")

In [None]:
# Download FMA background audio (for noise augmentation)
print("📥 Downloading background audio...")
!wget -q https://github.com/dscripka/openWakeWord/releases/download/v0.1.1/background_audio.zip
!unzip -q background_audio.zip
!rm background_audio.zip

print("✅ Background audio downloaded")

In [None]:
# Download pre-computed training features (ACAV100M dataset - 16GB)
print("📥 Downloading training features (16GB - this will take 10-15 minutes)...\n")
!wget -q --show-progress https://github.com/dscripka/openWakeWord/releases/download/v0.5.0/openwakeword_features_train.npy

print("\n✅ Training features downloaded")

In [None]:
# Download validation features (176 MB)
print("📥 Downloading validation features...\n")
!wget -q --show-progress https://github.com/dscripka/openWakeWord/releases/download/v0.5.0/openwakeword_features_val.npy

print("\n✅ Validation features downloaded")

## 4. Download Czech Piper TTS Model

In [None]:
# Download Czech Piper TTS model (jirka voice - medium quality)
print("📥 Downloading Czech Piper TTS model...\n")

!wget -q --show-progress https://huggingface.co/rhasspy/piper-voices/resolve/main/cs/cs_CZ/jirka/medium/cs_CZ-jirka-medium.onnx
!wget -q https://huggingface.co/rhasspy/piper-voices/resolve/main/cs/cs_CZ/jirka/medium/cs_CZ-jirka-medium.onnx.json

print("\n✅ Czech Piper model downloaded!")
print("📁 Model: cs_CZ-jirka-medium.onnx")
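
The voice's JSON config can tell you whether generated clips will need resampling to the 16 kHz used later in this notebook. The following is a minimal sketch, assuming the config stores the native rate under `audio.sample_rate`; `read_voice_sample_rate` and `needs_resampling` are hypothetical helper names, not part of Piper.

```python
import json

def read_voice_sample_rate(config_path):
    """Return the voice's native sample rate in Hz, or None if the key is absent."""
    with open(config_path, encoding="utf-8") as f:
        voice_config = json.load(f)
    # Assumed layout: {"audio": {"sample_rate": <int>, ...}, ...}
    return voice_config.get("audio", {}).get("sample_rate")

def needs_resampling(native_rate, target_rate=16000):
    """True when generated audio must be resampled to reach target_rate."""
    return native_rate is not None and native_rate != target_rate
```

In this notebook it could be called as `read_voice_sample_rate('cs_CZ-jirka-medium.onnx.json')` once the download above has finished.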

## 5. Setup Piper Binary for Sample Generation

In [None]:
# Download and setup Piper binary
print("📥 Downloading Piper TTS binary...\n")

!wget -q --show-progress https://github.com/rhasspy/piper/releases/download/v1.2.0/piper_amd64.tar.gz
!tar -xzf piper_amd64.tar.gz
!rm piper_amd64.tar.gz

print("\n✅ Piper binary ready!")
print("📁 Location: ./piper/piper")

In [None]:
# Create custom generate_samples function using Piper binary
import os
import subprocess
import random
import string
from scipy.io import wavfile
import numpy as np
import librosa

def generate_samples_piper(model_path, text, n_samples, output_dir, sample_rate=16000):
    """
    Generate audio samples using Piper binary directly.
    Mimics the behavior of openwakeword.train.generate_samples()
    """
    os.makedirs(output_dir, exist_ok=True)
    
    # Add random variations to text for diversity
    text_variations = [
        text,
        text + '.',
        text + '!',
        text.upper(),
        text.lower(),
        f'{text} {text}',  # Repeat
    ]
    
    generated = 0
    batch_num = 0
    
    print(f"Generating {n_samples} samples...")
    
    while generated < n_samples:
        # Pick random variation
        current_text = random.choice(text_variations)
        
        # Create temp input file
        temp_txt = f'/tmp/piper_input_{batch_num}.txt'
        temp_wav = f'/tmp/piper_output_{batch_num}.wav'
        
        with open(temp_txt, 'w', encoding='utf-8') as f:
            f.write(current_text)
        
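        # Generate audio using Piper
        try:
            # Open the input file in a context manager so the handle is closed
            # even if Piper fails (otherwise one handle leaks per sample)
            with open(temp_txt, 'r', encoding='utf-8') as stdin_file:
                subprocess.run(
                    ['./piper/piper', '--model', model_path, '--output_file', temp_wav],
                    stdin=stdin_file,
                    stdout=subprocess.DEVNULL,
                    stderr=subprocess.DEVNULL,
                    check=True
                )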
            
            # Load and resample to 16kHz if needed
            audio, sr = librosa.load(temp_wav, sr=sample_rate)
            
            # Add slight random speed variation (0.9x to 1.1x)
            speed_factor = random.uniform(0.9, 1.1)
            audio = librosa.effects.time_stretch(audio, rate=speed_factor)
            
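            # Save to output directory (clip first: time stretching can overshoot
            # [-1, 1], which would wrap around when cast to int16)
            output_file = os.path.join(output_dir, f'sample_{generated:04d}.wav')
            wavfile.write(output_file, sample_rate, (np.clip(audio, -1.0, 1.0) * 32767).astype(np.int16))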
            
            generated += 1
            
            if generated % 100 == 0:
                print(f"  Generated {generated}/{n_samples} samples...")
            
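        except Exception as e:
            print(f"Warning: Failed to generate sample {batch_num}: {e}")
            # Abort instead of looping forever when Piper keeps failing
            # (batch_num counts attempts, generated counts successes)
            if batch_num + 1 - generated >= 20:
                raise RuntimeError("Too many failed Piper runs; check the binary and model path") from e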
        
        finally:
            # Clean up temp files
            if os.path.exists(temp_txt):
                os.remove(temp_txt)
            if os.path.exists(temp_wav):
                os.remove(temp_wav)
        
        batch_num += 1
    
    print(f"✅ Generated {generated} samples in {output_dir}")
    return generated

print("✅ Custom generate_samples_piper() function ready!")
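
Before moving on, it can help to spot-check that the generated clips really are 16 kHz mono and of plausible length. The following is a minimal sketch using only the standard-library `wave` module; `describe_wav`, `is_valid_clip`, and the 3-second upper bound are illustrative assumptions, not part of OpenWakeWord or Piper.

```python
import wave

def describe_wav(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    return rate, channels, duration

def is_valid_clip(path, expected_rate=16000, max_seconds=3.0):
    """Check that a clip is mono, at the expected rate, and not suspiciously long."""
    rate, channels, duration = describe_wav(path)
    return rate == expected_rate and channels == 1 and 0.0 < duration <= max_seconds
```

Running `is_valid_clip` over a handful of files in the output directory is usually enough to catch a wrong sample rate or an empty clip early.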

## 6. Fix Bug in OpenWakeWord train.py

In [None]:
# Fix all 4 occurrences of generate_samples() bug in train.py
import re

print("🔧 Fixing generate_samples() bug in train.py...\n")

train_py = "/content/openwakeword/openwakeword/train.py"

with open(train_py, 'r') as f:
    content = f.read()

# Fix pattern: add model= parameter before text=
# Using \s* to match 0 or more whitespace (catches all 4 lines including 685!)
pattern = r'generate_samples\(\s*text='
replacement = 'generate_samples(model=config["tts_model_path"], text='

fixed_content = re.sub(pattern, replacement, content)

# Count fixes
original_count = len(re.findall(pattern, content))
fixed_count = len(re.findall(r'generate_samples\(model=', fixed_content))

with open(train_py, 'w') as f:
    f.write(fixed_content)

print(f"✅ Fixed {fixed_count} occurrences in train.py")
print(f"   (Original had {original_count} broken calls)")
print("\n✅ Bug fix complete!")
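
To see what the substitution does before trusting it on the real file, the same pattern can be tried on a small string. This is a sketch only: `sample_source` is a made-up stand-in for the calls in `train.py`, not a copy of them.

```python
import re

PATTERN = r'generate_samples\(\s*text='
REPLACEMENT = 'generate_samples(model=config["tts_model_path"], text='

# Hypothetical snippet standing in for the calls in train.py
sample_source = '''
generate_samples(text=config["target_phrase"], max_samples=10)
generate_samples(
    text=adversarial_texts, max_samples=10)
generate_samples(model="already_set", text="x")
'''

# Count the calls that still lack model=, then patch them
n_patched = len(re.findall(PATTERN, sample_source))
patched = re.sub(PATTERN, REPLACEMENT, sample_source)

# Applying the substitution a second time changes nothing, so rerunning the cell is safe
assert re.sub(PATTERN, REPLACEMENT, patched) == patched
print(f"Patched {n_patched} calls")
print(patched)
```

Because `\s*` also matches newlines, the call split across two lines is caught, while the call that already passes `model=` is left untouched.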

## 7. Create Configuration for Czech CML

In [None]:
# Create config for Czech "cé em el" wake word
import yaml

config = {
    'target_phrase': 'cé em el',
    'tts_model_path': '/content/cs_CZ-jirka-medium.onnx',
    'n_samples': 1000,
    'val_samples': 100,
    'n_epochs': 30,
    'batch_size': 128,
    'learning_rate': 0.001,
    'model_name': 'cml_cs'
}

with open('cml_config.yaml', 'w') as f:
    yaml.dump(config, f, allow_unicode=True)

print("✅ Config created: cml_config.yaml")
print(f"🎯 Target phrase: '{config['target_phrase']}' (Czech pronunciation)")
print(f"🤖 TTS Model: {config['tts_model_path']}")
print(f"📊 Training samples: {config['n_samples']}")
print(f"📊 Validation samples: {config['val_samples']}")
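
Since sample generation and training take a long time, a quick check of the config values can save a wasted run. The following is a minimal sketch; `validate_config` is a hypothetical helper and the rules it enforces are assumptions about what sensible values look like, not requirements imposed by OpenWakeWord.

```python
def validate_config(cfg):
    """Return a list of human-readable problems; an empty list means the config looks sane."""
    problems = []
    if not str(cfg.get("target_phrase", "")).strip():
        problems.append("target_phrase is empty")
    for key in ("n_samples", "val_samples", "n_epochs", "batch_size"):
        value = cfg.get(key)
        # bool is a subclass of int, so exclude it explicitly
        if not isinstance(value, int) or isinstance(value, bool) or value <= 0:
            problems.append(f"{key} must be a positive integer")
    lr = cfg.get("learning_rate")
    if not isinstance(lr, (int, float)) or isinstance(lr, bool) or not 0 < lr < 1:
        problems.append("learning_rate must be between 0 and 1")
    if not str(cfg.get("tts_model_path", "")).endswith(".onnx"):
        problems.append("tts_model_path should point to an .onnx file")
    return problems
```

Calling `validate_config(config)` on the dictionary above should return an empty list.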

## 8. Generate Training Samples (Czech Pronunciation)

In [None]:
# Generate positive training samples with Czech pronunciation
import os

# Create output directories
os.makedirs('./custom_clips/positive', exist_ok=True)
os.makedirs('./custom_clips/negative', exist_ok=True)

print(f"🎤 Generating {config['n_samples']} training samples...")
print(f"🗣️  Phrase: '{config['target_phrase']}' (Czech)")
print(f"🤖 Using TTS: {config['tts_model_path']}")
print("\n⏰ This will take ~10-15 minutes...\n")

# Generate training clips using our custom function
clip_count = generate_samples_piper(
    model_path=config['tts_model_path'],
    text=config['target_phrase'],
    n_samples=config['n_samples'],
    output_dir='./custom_clips/positive'
)

print(f"\n✅ Generated {clip_count} Czech training clips!")

## 9. Generate Validation Samples

In [None]:
# Generate validation samples
os.makedirs('./custom_clips_val/positive', exist_ok=True)
os.makedirs('./custom_clips_val/negative', exist_ok=True)

print(f"🎤 Generating {config['val_samples']} validation samples...\n")

val_clip_count = generate_samples_piper(
    model_path=config['tts_model_path'],
    text=config['target_phrase'],
    n_samples=config['val_samples'],
    output_dir='./custom_clips_val/positive'
)

print(f"\n✅ Generated {val_clip_count} Czech validation clips!")

## 10. Augment Audio Clips

In [None]:
# Augment clips with background noise, room impulse responses, speed/pitch variation
from openwakeword.train import augment_clips

print("🔊 Augmenting training clips...")
print("   Adding: background noise, reverb, speed/pitch variation")
print("⏰ This will take ~5-10 minutes...\n")

# Augment training clips
augment_clips(
    positive_clips_dir='./custom_clips/positive',
    negative_clips_dir='./custom_clips/negative',
    output_dir='./custom_clips_augmented',
    background_paths='./background_audio',
    impulse_response_paths='./impulse_responses',
    n_positive=4000,
    n_negative=4000
)

print("✅ Training clips augmented!\n")

print("🔊 Augmenting validation clips...\n")

# Augment validation clips
augment_clips(
    positive_clips_dir='./custom_clips_val/positive',
    negative_clips_dir='./custom_clips_val/negative',
    output_dir='./custom_clips_val_augmented',
    background_paths='./background_audio',
    impulse_response_paths='./impulse_responses',
    n_positive=500,
    n_negative=500
)

print("\n✅ All clips augmented!")
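
To make the background-noise part of augmentation concrete, here is a conceptual sketch of mixing noise into speech at a chosen signal-to-noise ratio. It is dependency-free for illustration and is not claimed to be how `augment_clips` works internally; `rms` and `mix_at_snr` are hypothetical helpers, and the noise is assumed to be non-silent and at least as long as the speech.

```python
import math

def rms(samples):
    """Root-mean-square level of a sequence of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def mix_at_snr(speech, noise, snr_db):
    """Scale noise so the speech-to-noise ratio equals snr_db, then add it to speech."""
    noise = noise[:len(speech)]
    # Gain that brings the noise RMS to speech_rms / 10^(snr_db / 20)
    gain = rms(speech) / (rms(noise) * 10 ** (snr_db / 20))
    return [s + gain * n for s, n in zip(speech, noise)]
```

A lower `snr_db` produces a noisier clip, so drawing the SNR at random per clip is one way to cover both quiet and noisy rooms.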

## 11. Train the Czech CML Model

In [None]:
# Train the Czech CML wake word model
from openwakeword.train import train_model
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'

print("🚀 Starting Czech CML model training...")
print(f"🎯 Target: {config['target_phrase']}")
print(f"💾 Model name: {config['model_name']}")
print(f"🔧 Device: {device}")
print(f"📊 Epochs: {config['n_epochs']}")
print(f"📦 Batch size: {config['batch_size']}")
print("\n⏰ Estimated time: 20-30 minutes on GPU\n")
print("=" * 60)

# Train model
train_model(
    train_data_dir='./custom_clips_augmented',
    val_data_dir='./custom_clips_val_augmented',
    output_dir='./trained_models',
    model_name=config['model_name'],
    epochs=config['n_epochs'],
    batch_size=config['batch_size'],
    learning_rate=config['learning_rate']
)

print("\n" + "=" * 60)
print("✅ Czech CML model training complete!")
print(f"📁 Model saved to: ./trained_models/{config['model_name']}.onnx")

## 12. Download the Trained Model

In [None]:
# Download the trained Czech CML model
from google.colab import files
import os

model_path = f'./trained_models/{config["model_name"]}.onnx'

if os.path.exists(model_path):
    size_mb = os.path.getsize(model_path) / (1024*1024)
    print(f"📥 Downloading Czech CML model...")
    print(f"📊 Size: {size_mb:.2f} MB\n")
    
    files.download(model_path)
    
    print(f"\n✅ Model downloaded: {config['model_name']}.onnx")
    print("\n📝 Next steps:")
    print("   1. Upload this model to your Linux system")
    print("   2. Place it in: /home/jirka/oc/openwakeword-models/")
    print("   3. Update cml-wake-listener.py to use OpenWakeWord")
else:
    print(f"❌ Model not found at: {model_path}")
    print("   Make sure training completed successfully!")

## 13. Test the Model (Optional)

In [None]:
# Test the trained model on validation samples
from openwakeword.model import Model
import numpy as np
from scipy.io import wavfile
import random

print("🧪 Testing trained model...\n")

# Load model (the file is ONNX, so select the ONNX runtime explicitly)
oww = Model(wakeword_models=[model_path], inference_framework='onnx')

# Test on a few validation samples
val_positives = [f for f in os.listdir('./custom_clips_val/positive') if f.endswith('.wav')]
test_samples = random.sample(val_positives, min(5, len(val_positives)))

print(f"Testing {len(test_samples)} positive samples:\n")

for sample in test_samples:
    wav_path = f'./custom_clips_val/positive/{sample}'
    rate, audio = wavfile.read(wav_path)
    
    # Convert to int16 if needed
    if audio.dtype != np.int16:
        audio = (audio * 32767).astype(np.int16)
    
    # Process in chunks
    chunk_size = 1280  # OpenWakeWord chunk size (80 ms at 16 kHz)
    max_score = 0.0
    
    # "+ 1" keeps the final full chunk when the clip length is an exact multiple of chunk_size
    for i in range(0, len(audio) - chunk_size + 1, chunk_size):
        chunk = audio[i:i+chunk_size]
        prediction = oww.predict(chunk)
        score = prediction.get(config['model_name'], 0.0)
        max_score = max(max_score, score)
    
    status = "✅ DETECTED" if max_score > 0.5 else "❌ missed"
    print(f"{sample}: {max_score:.3f} {status}")

print("\n✅ Testing complete!")
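
The per-clip output above is easier to compare across runs as a single number. The following sketch shows two small helpers under stated assumptions: `split_into_chunks` zero-pads the last partial chunk instead of dropping it, and `detection_rate` summarises peak scores with the same 0.5 threshold used above. Both names are hypothetical and not part of OpenWakeWord.

```python
def split_into_chunks(samples, chunk_size=1280):
    """Split samples into chunks of chunk_size, zero-padding the last partial chunk."""
    chunks = []
    for start in range(0, len(samples), chunk_size):
        chunk = list(samples[start:start + chunk_size])
        chunk += [0] * (chunk_size - len(chunk))  # pad the tail with silence
        chunks.append(chunk)
    return chunks

def detection_rate(max_scores, threshold=0.5):
    """Fraction of clips whose peak score exceeds the threshold."""
    if not max_scores:
        return 0.0
    return sum(score > threshold for score in max_scores) / len(max_scores)
```

Collecting each clip's `max_score` in a list and passing it to `detection_rate` gives the recall on the synthetic validation clips; it says nothing about false activations, which need negative audio to measure.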

## Summary

✅ You now have a trained Czech wake word model for "cé em el"!

### Model Details:
- **Name:** cml_cs.onnx
- **Target:** "cé em el" (Czech pronunciation)
- **TTS:** Czech Piper (cs_CZ-jirka-medium)
- **Training samples:** 1000 → 4000 augmented
- **Validation samples:** 100 → 500 augmented

### Next Steps:
1. Download the model (cml_cs.onnx)
2. Install OpenWakeWord on your Linux system
3. Update cml-wake-listener.py to use OpenWakeWord instead of Porcupine
4. Test with Czech pronunciation of "cé em el"

### Installation on Linux:
```bash
pip install openwakeword
mkdir -p /home/jirka/oc/openwakeword-models
# Upload cml_cs.onnx to this directory
```