# Piper Voice Training Notebook
This notebook automates the process of training a Piper voice model, including:
1. Dataset preprocessing
2. Model training
3. Model exporting
4. Model testing

## Step 1: Install Dependencies
Install the system packages, the Python tooling, and the Piper training code.

In [None]:
!apt-get update
!apt-get install -y python3-dev espeak-ng
!pip install --upgrade pip wheel setuptools
!pip install pytorch-lightning jupyterlab

# Install Piper
!git clone https://github.com/rhasspy/piper.git
%cd piper/src/python
!pip install -e .
!bash build_monotonic_align.sh
%cd ../../..

## Step 2: Prepare Dataset
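Place your dataset in the `dataset` directory with the following structure. Each line of `metadata.csv` has the form `id|text`, where `id` is the name of a file in `wav/` without the `.wav` extension: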
```
dataset/
├── metadata.csv
├── wav/
│   ├── 1.wav
│   ├── 2.wav
│   └── ...
```

In [None]:
import os

# Create dataset directory
dataset_dir = "dataset"
os.makedirs(dataset_dir, exist_ok=True)

# Example: Create metadata.csv and wav directory
metadata_path = os.path.join(dataset_dir, "metadata.csv")
wav_dir = os.path.join(dataset_dir, "wav")
os.makedirs(wav_dir, exist_ok=True)

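# Example metadata.csv content (single speaker), one `id|text` row per wav file.
# Written only if no metadata.csv exists, so a real dataset is not overwritten.
if not os.path.exists(metadata_path):
    with open(metadata_path, "w", encoding="utf-8") as f:
        f.write("1|This is a test sentence.\n")
        f.write("2|Another test sentence.\n")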
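
Before preprocessing, it can help to confirm that every row in `metadata.csv` points at an existing audio file. The following is a minimal sketch, assuming the `id|text` layout and the `wav/` directory shown above; `find_dataset_problems` is a hypothetical helper name and is not part of Piper.

```python
import csv
from pathlib import Path


def find_dataset_problems(dataset_dir):
    """Return a list of human-readable problems found in the dataset layout.

    Hypothetical helper (not part of Piper). Assumes metadata.csv rows of the
    form `id|text` and audio files stored as wav/<id>.wav.
    """
    dataset_dir = Path(dataset_dir)
    problems = []
    with open(dataset_dir / "metadata.csv", newline="", encoding="utf-8") as f:
        # QUOTE_NONE keeps quotation marks in the transcript as literal text
        reader = csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE)
        for line_no, row in enumerate(reader, start=1):
            if not row:
                continue  # skip blank lines
            if len(row) < 2 or not row[0].strip() or not row[-1].strip():
                problems.append(f"line {line_no}: expected 'id|text'")
                continue
            wav_path = dataset_dir / "wav" / f"{row[0].strip()}.wav"
            if not wav_path.is_file():
                problems.append(f"line {line_no}: missing {wav_path}")
    return problems
```

Calling `find_dataset_problems(dataset_dir)` returns an empty list when every row has a matching wav file. With only the example `metadata.csv` and no audio in place, it reports both rows as missing.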

## Step 3: Preprocess Dataset
Run the preprocessing script to generate `config.json` and `dataset.jsonl`.

In [None]:
!python3 -m piper_train.preprocess \
  --language en-us \
  --input-dir {dataset_dir} \
  --output-dir training_dir \
  --dataset-format ljspeech \
  --single-speaker \
  --sample-rate 22050
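
After preprocessing finishes, a quick check that both output files exist and are parseable can catch problems before a long training run. This is a rough sketch that relies only on the two file names mentioned above; `summarize_training_dir` is a hypothetical helper, and it makes no assumption about the fields inside each record.

```python
import json
from pathlib import Path


def summarize_training_dir(training_dir):
    """Summarize the preprocessing output without assuming its schema.

    Hypothetical helper (not part of Piper). Reads config.json and counts
    the non-empty JSON lines in dataset.jsonl.
    """
    training_dir = Path(training_dir)
    with open(training_dir / "config.json", encoding="utf-8") as f:
        config = json.load(f)
    with open(training_dir / "dataset.jsonl", encoding="utf-8") as f:
        # One JSON object per line; blank lines are ignored
        utterances = [json.loads(line) for line in f if line.strip()]
    return {
        "config_keys": sorted(config),
        "num_utterances": len(utterances),
    }
```

If `summarize_training_dir("training_dir")["num_utterances"]` is lower than the number of rows in `metadata.csv`, some utterances were probably dropped during preprocessing and the preprocessing log is worth reading.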

## Step 4: Train the Model
Train the Piper voice model using the preprocessed dataset.

In [None]:
# Download a pre-trained model checkpoint (e.g., lessac medium quality).
# The URL below is a placeholder; replace it with the real checkpoint location.
!wget https://example.com/path/to/lessac/epoch=2164-step=1355540.ckpt -O lessac.ckpt

# Train the model
!python3 -m piper_train \
    --dataset-dir training_dir \
    --accelerator 'gpu' \
    --devices 1 \
    --batch-size 32 \
    --validation-split 0.0 \
    --num-test-examples 0 \
    --max_epochs 10000 \
    --resume_from_checkpoint lessac.ckpt \
    --checkpoint-epochs 1 \
    --precision 32

## Step 5: Export the Model
Export the trained model to ONNX format.

In [None]:
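# Find the most recently written checkpoint across all training runs
import glob
import os
checkpoints = glob.glob("training_dir/lightning_logs/version_*/checkpoints/*.ckpt")
if not checkpoints:
    raise FileNotFoundError("No checkpoints found under training_dir/lightning_logs")
# glob returns files in arbitrary order, so pick the newest by modification time
latest_checkpoint = max(checkpoints, key=os.path.getmtime)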

# Export to ONNX
!python3 -m piper_train.export_onnx \
    {latest_checkpoint} \
    model.onnx

# Copy config.json
!cp training_dir/config.json model.onnx.json

## Step 6: Test the Model
Test the exported model by generating audio from text.

In [None]:
# Create a test sentence
test_sentence = "This is a test sentence generated by Piper."

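# Generate audio (assumes the `piper` command-line tool is installed and on PATH;
# it reads model.onnx.json from the same directory as the model)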
!echo '{test_sentence}' | \
  piper -m model.onnx --output_file test.wav

# Play the audio (requires IPython and sound playback support)
from IPython.display import Audio
Audio("test.wav")
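
Listening is the real test, but a quick look at the WAV header can confirm that the file is non-empty and uses the expected sample rate. This is a small sketch using only the standard-library `wave` module; `describe_wav` is a hypothetical helper name.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file (hypothetical helper)."""
    with wave.open(str(path), "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "sample_rate": rate,
            "channels": wav_file.getnchannels(),
            "duration_seconds": frames / rate,
        }
```

For the model trained above, `describe_wav("test.wav")["sample_rate"]` would be expected to match the `--sample-rate` value passed to preprocessing, and a duration near zero suggests that synthesis failed.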

## Step 7: Monitor Training with TensorBoard
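Monitor training progress using TensorBoard. The training cell in Step 4 blocks the notebook until it finishes, so run this cell before starting training if you want to watch progress live.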

In [None]:
# Start TensorBoard
%load_ext tensorboard
%tensorboard --logdir training_dir/lightning_logs

## Step 8: Export and Download the Model
Copy the ONNX model and JSON config exported in Step 5 under download-friendly names and provide download links for both files.

In [None]:
import shutil
import json
from IPython.display import FileLink, display

# Define the source paths for the ONNX model and config
onnx_source_path = "model.onnx"
config_source_path = "model.onnx.json"

# Define the destination paths for download.
# Piper looks for the config next to the model as <model name>.onnx.json.
onnx_destination_path = "./piper_model.onnx"
config_destination_path = "./piper_model.onnx.json"

# Copy the ONNX model and config to the current working directory
shutil.copy(onnx_source_path, onnx_destination_path)
shutil.copy(config_source_path, config_destination_path)

# Load the config generated by preprocessing instead of writing one by hand.
# It holds the full phoneme_id_map, sample rate, and speaker settings the model
# was trained with; replacing it with a partial map would break synthesis.
with open(config_destination_path, "r", encoding="utf-8") as json_file:
    piper_json_data = json.load(json_file)

# Fill in inference defaults only where the config does not already set them
inference = piper_json_data.setdefault("inference", {})
inference.setdefault("noise_scale", 0.667)
inference.setdefault("length_scale", 1.0)
inference.setdefault("noise_w", 0.8)

# Write the JSON file
with open(config_destination_path, "w", encoding="utf-8") as json_file:
    json.dump(piper_json_data, json_file, indent=2)

# Generate download links for both files
print("Download your files:")
print("ONNX Model:")
display(FileLink(onnx_destination_path))
print("\nJSON Config:")
display(FileLink(config_destination_path))