# Piper TTS Fine-Tuning (2025 Compatible)

This notebook fine-tunes a Piper TTS model on recordings of your own voice. It uses the maintained `OHF-Voice/piper1-gpl` fork, because the original `rhasspy/piper` repository has been archived.

## 1. Setup Environment
Install necessary dependencies and the Piper training tools.

In [None]:
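# Install system dependencies (refresh the package index first so the install does not hit stale mirrors)
!sudo apt-get update -qq
!sudo apt-get install -y espeak-ng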

# Clone the maintained fork of Piper
!git clone https://github.com/OHF-Voice/piper1-gpl.git
%cd piper1-gpl/src/python

# Install Python dependencies (the extras spec is quoted so the shell does not treat [train] as a glob)
!pip install --upgrade pip
!pip install -e ".[train]"

# Build the monotonic alignment search (required for training)
!./build_monotonic_align.sh

## 2. Mount Google Drive
Mount your Google Drive so the notebook can read your dataset and save the trained model.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

# CONFIGURATION - Update these paths!
dataset_zip_path = "/content/drive/MyDrive/piper_training/my_voice_dataset.zip"
model_name = "my_custom_voice"

## 3. Prepare Dataset
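Unzip your dataset. The zip file should follow the LJSpeech layout: a `metadata.csv` with one `id|text` line per clip, plus a folder containing the matching WAV files. `metadata.csv` must end up directly inside `/content/dataset`, so zip the files themselves rather than an enclosing folder.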

In [None]:
!unzip -q "$dataset_zip_path" -d /content/dataset
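
Before preprocessing, it can help to confirm that every row in `metadata.csv` points at an audio file that actually exists. The following is a minimal sketch, not part of Piper: the function name `check_ljspeech_dataset` is hypothetical, and it assumes that the first column is the clip id (with or without a `.wav` suffix) and that the audio sits in the dataset root or in a `wav/` or `wavs/` subfolder.

```python
from pathlib import Path


def check_ljspeech_dataset(dataset_dir, wav_dirs=("wav", "wavs")):
    """Return a list of human-readable problems found in an LJSpeech-style dataset."""
    root = Path(dataset_dir)
    metadata = root / "metadata.csv"
    if not metadata.is_file():
        return [f"missing {metadata}"]

    problems = []
    with metadata.open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.rstrip("\n")
            if not line.strip():
                continue  # ignore blank lines
            fields = line.split("|")
            # Expect at least an id and a non-empty text column.
            if len(fields) < 2 or not fields[0].strip() or not fields[-1].strip():
                problems.append(f"line {line_no}: expected 'id|text'")
                continue
            clip_id = fields[0].strip()
            name = clip_id if clip_id.lower().endswith(".wav") else clip_id + ".wav"
            # Assumption: audio is in the dataset root or in a wav/ or wavs/ subfolder.
            candidates = [root / name] + [root / d / name for d in wav_dirs]
            if not any(p.is_file() for p in candidates):
                problems.append(f"line {line_no}: no audio file found for '{clip_id}'")
    return problems
```

Calling `check_ljspeech_dataset("/content/dataset")` in a new cell returns an empty list when every row is well formed and has a matching file.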

## 4. Preprocessing
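Preprocess your data so it is ready for training. The base checkpoint to fine-tune from (e.g., `en_US-lessac-medium`) is downloaded in the next section; choose the `--sample-rate` here to match that model's quality level.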

In [None]:
# Create directory for training artifacts
!mkdir -p /content/training

# Preprocess the dataset
# Adjust --language and --sample-rate (22050 for medium/high, 16000 for low) as needed
!python3 -m piper_train.preprocess \
  --language en \
  --input-dir /content/dataset \
  --output-dir /content/training \
  --dataset-format ljspeech \
  --single-speaker \
  --sample-rate 22050
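
Once preprocessing finishes, a quick look at its output can catch a wrong sample rate or an empty dataset before a long training run. The sketch below is hedged: it assumes the preprocessing step wrote a `config.json` and a `dataset.jsonl` (one utterance per line) into the output directory, and that the config uses the keys `audio.sample_rate`, `espeak.voice`, and `num_speakers`. Those names may differ in your version of the tools, and `summarize_preprocess_output` is a hypothetical helper, not a Piper function.

```python
import json
from pathlib import Path


def summarize_preprocess_output(training_dir):
    """Summarize the files written by the preprocessing step (assumed layout)."""
    root = Path(training_dir)
    config = json.loads((root / "config.json").read_text(encoding="utf-8"))

    # Count non-empty lines; each one is assumed to describe one utterance.
    with (root / "dataset.jsonl").open(encoding="utf-8") as f:
        num_utterances = sum(1 for line in f if line.strip())

    return {
        # .get() is used throughout so a missing key yields None instead of an error.
        "sample_rate": config.get("audio", {}).get("sample_rate"),
        "espeak_voice": config.get("espeak", {}).get("voice"),
        "num_speakers": config.get("num_speakers"),
        "num_utterances": num_utterances,
    }
```

For example, `summarize_preprocess_output("/content/training")` should report the sample rate passed to `--sample-rate` and an utterance count close to the number of rows in `metadata.csv`.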

## 5. Training (Fine-Tuning)
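Start training from a base checkpoint. Fine-tuning resumes an existing voice's checkpoint instead of training from scratch, so download one with `wget` first.
**Note:** `--max-epochs` is an absolute epoch count, not the number of additional epochs. It must be larger than the epoch number in the checkpoint's file name, or training stops immediately.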

In [None]:
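# Example: download the en_US-libritts_r-medium checkpoint (adjust the URL for your desired base model).
# Training checkpoints are published in the rhasspy/piper-checkpoints dataset on Hugging Face.
# Note: libritts_r is a multi-speaker voice, while the dataset above is preprocessed with --single-speaker,
# so a single-speaker base such as en_US-lessac-medium is likely a better match.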
!wget -O /content/training/epoch=2324-step=1355936.ckpt https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/libritts_r/medium/epoch%3D2324-step%3D1355936.ckpt

# Start Fine-Tuning
!python3 -m piper_train \
  --dataset-dir /content/training \
  --accelerator gpu \
  --devices 1 \
  --batch-size 32 \
  --validation-split 0.0 \
  --num-test-examples 0 \
  --max-epochs 6000 \
  --resume_from_checkpoint /content/training/epoch=2324-step=1355936.ckpt \
  --checkpoint-epochs 50 \
  --precision 16-mixed
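
Checkpoint file names such as `epoch=2324-step=1355936.ckpt` encode the epoch and step, which makes it possible to pick the most recent checkpoint programmatically, both for resuming after a Colab disconnect and for export. The helpers below are a minimal sketch with hypothetical names (`parse_checkpoint_name`, `latest_checkpoint`); they assume every checkpoint of interest follows that naming pattern and ignore any file that does not.

```python
import re
from pathlib import Path

# Matches names like "epoch=2324-step=1355936.ckpt".
CKPT_PATTERN = re.compile(r"epoch=(\d+)-step=(\d+)\.ckpt$")


def parse_checkpoint_name(path):
    """Return (epoch, step) parsed from a checkpoint file name, or None."""
    match = CKPT_PATTERN.search(Path(path).name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def latest_checkpoint(root_dir):
    """Return the checkpoint under root_dir with the highest (epoch, step), or None."""
    best_key, best_path = None, None
    for path in Path(root_dir).rglob("*.ckpt"):
        key = parse_checkpoint_name(path)
        if key is not None and (best_key is None or key > best_key):
            best_key, best_path = key, path
    return best_path
```

Pointing `latest_checkpoint` at `/content/training/lightning_logs` rather than `/content/training` keeps the downloaded base checkpoint out of the search.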

## 6. Export Model
Convert the trained checkpoint to ONNX format so that Piper can load it for inference.

In [None]:
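# Export to ONNX (the glob must match exactly one checkpoint; otherwise pass the file explicitly).
# {model_name} is used because IPython would read $model_name.onnx as an attribute lookup.
!python3 -m piper_train.export_onnx \
  /content/training/lightning_logs/version_0/checkpoints/*.ckpt \
  "/content/drive/MyDrive/piper_training/{model_name}.onnx"

# Piper reads the voice config from <model>.onnx.json, so copy it next to the model
!cp /content/training/config.json "/content/drive/MyDrive/piper_training/{model_name}.onnx.json"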
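
Piper expects a voice to consist of two files side by side: the `.onnx` model and a JSON config named `<model>.onnx.json`. The following sketch checks that both are present and that the config parses; `check_exported_voice` is a hypothetical helper and does not load or run the model itself.

```python
import json
from pathlib import Path


def check_exported_voice(onnx_path):
    """Return a list of problems with an exported voice; empty means the pair looks complete."""
    model = Path(onnx_path)
    config = Path(str(model) + ".json")  # e.g. my_custom_voice.onnx.json
    problems = []

    if not model.is_file():
        problems.append(f"model file not found: {model}")
    elif model.stat().st_size == 0:
        problems.append(f"model file is empty: {model}")

    if not config.is_file():
        problems.append(f"config file not found: {config}")
    else:
        try:
            json.loads(config.read_text(encoding="utf-8"))
        except json.JSONDecodeError as err:
            problems.append(f"config is not valid JSON: {err}")

    return problems
```

For the paths used in this notebook, that would be `check_exported_voice(f"/content/drive/MyDrive/piper_training/{model_name}.onnx")`.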