Multispeaker Community Vocoder model for DiffSinger
This is the code used to train the "HiFiPLN" vocoder.
A trained model for use with OpenUtau is available for download on the official release page.
Because a lot of PLN was spent training this thing.
Python 3.10 or 3.11 is required.
Preprocessing and splitting the dataset into smaller files is done using a single script. Note that input files shorter than `--length` seconds will be skipped, so it is best to provide full, unsegmented files. If your input files are already split into chunks, you can run with `--length 0` to disable splitting.
```
python preproc.py --config PATH_TO_CONFIG -o "dataset/train" --length 1 PATH_TO_TRAIN_DATASET
```
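The skip/split behaviour described above can be sketched as follows. This is a hypothetical illustration, not the actual `preproc.py` implementation; the function name and signature are made up for the example:

```python
# Sketch of the documented --length behaviour (hypothetical helper,
# not the real preproc.py code).
import numpy as np

def split_audio(samples: np.ndarray, sample_rate: int, length: float) -> list:
    """Split a mono signal into consecutive `length`-second chunks.

    Mirrors the documented behaviour: with length > 0, a file shorter than
    `length` seconds yields no chunks (it is skipped); with length == 0,
    the file is passed through unsplit.
    """
    if length == 0:
        return [samples]
    chunk = int(length * sample_rate)
    # Only full chunks are kept, so a too-short file produces nothing.
    return [samples[i : i + chunk] for i in range(0, len(samples) - chunk + 1, chunk)]
```

A 2.002-second file at 44.1 kHz yields two 1-second chunks with `length=1`, while a 0.5-second file yields none.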
You will also need to provide some validation audio files. Run preproc.py again with `--length 0` to disable segmenting:
```
python preproc.py --config PATH_TO_CONFIG -o "dataset/valid" --length 0 PATH_TO_VALIDATION_DATASET
```
```
python train.py --config "configs/hifipln.yaml"
```
- If you see an error saying "Total length of `Data Loader` across ranks is zero" then you do not have enough validation files.
- You may want to edit `configs/hifipln.yaml` and change `train: batch_size: 12` to a value that better fits your available VRAM.
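For reference, the relevant part of the config presumably nests the batch size under the `train` key, roughly like this (excerpt reconstructed from the option names above, not copied from the repo):

```yaml
# configs/hifipln.yaml (excerpt)
train:
  batch_size: 12   # lower this value if you run out of VRAM
```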
```
python train.py --config "configs/hifipln.yaml" --resume CKPT_PATH
```
You may set CKPT_PATH to a log directory (e.g. `logs/HiFiPLN`), and it will find the last checkpoint of the last run.
Download a checkpoint from https://utau.pl/hifipln/#checkpoints-for-finetuning.
Save the checkpoint as `ckpt/HiFiPLN.ckpt`, then run:
```
python train.py --config "configs/hifipln-finetune.yaml"
```
- Finetuning shouldn't be run for too long, especially on small datasets; 2-3 epochs or ~20000 steps should be enough.
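As a back-of-the-envelope check on how epochs relate to steps (the dataset and batch sizes below are made-up example numbers, not recommendations):

```python
# Rough relation between epochs and optimizer steps for a finetuning run.
def steps_for_epochs(num_samples: int, batch_size: int, epochs: int) -> int:
    """Optimizer steps taken over `epochs` full passes of the dataset."""
    steps_per_epoch = num_samples // batch_size
    return steps_per_epoch * epochs

# e.g. 40000 one-second segments at batch size 12 over 3 epochs:
# steps_for_epochs(40000, 12, 3) -> 9999 steps, i.e. on the same order
# as the ~20000-step guideline above.
```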
```
python export.py --config configs/hifipln.yaml --output out/hifipln --model CKPT_PATH
```
You may set CKPT_PATH to a log directory (e.g. `logs/HiFiPLN`), and it will find the last checkpoint of the last run.