An implementation of "Phonetic Posteriorgrams based Many-to-Many Singing Voice Conversion via Adversarial Training"

Prepare the following data:
- PPG features (10 ms frameshift)
- F0 features (10 ms frameshift)
- Speaker embedding (one embedding per wav file)
- Audio files (WAV format, 24000 Hz sample rate, mono)
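
A 10 ms frameshift at a 24000 Hz sample rate corresponds to 240 audio samples per frame. The sketch below shows that arithmetic and a basic check that a wav file is 24 kHz mono. The helper names `approx_num_frames` and `check_wav` are hypothetical and not part of this repository, and the exact frame count depends on how your feature extractor pads the signal.

```python
import wave

SAMPLE_RATE = 24000   # Hz, as required for the audio files
FRAMESHIFT_MS = 10    # ms, as required for the PPG and F0 features
HOP_SAMPLES = SAMPLE_RATE * FRAMESHIFT_MS // 1000  # 240 samples per frame


def approx_num_frames(num_samples, hop=HOP_SAMPLES):
    """Approximate frame count for a signal (assumes no padding)."""
    return num_samples // hop


def check_wav(path):
    """Verify that a wav file is 24 kHz mono and return its sample count.

    Hypothetical helper for illustration only.
    """
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != SAMPLE_RATE:
            raise ValueError(f"{path}: expected {SAMPLE_RATE} Hz, got {wav.getframerate()} Hz")
        if wav.getnchannels() != 1:
            raise ValueError(f"{path}: expected mono, got {wav.getnchannels()} channels")
        return wav.getnframes()
```

Comparing `approx_num_frames(check_wav(path))` with the length of the extracted PPG and F0 sequences is a quick way to spot a mismatched frameshift.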
 
Set paths, directories, and other configuration options in the .json files in the "configs" directory.
Rewrite the data loading function in utils/dataset.py to match your own data.
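
A custom data loading function usually has to make sure the frame-level features and the audio cover the same time span. The following is a minimal sketch of that step, assuming PPG and F0 are per-frame sequences and the audio is a per-sample sequence. The function name `align_features` is hypothetical, and the interface that utils/dataset.py actually expects is not shown here, so adapt it as needed.

```python
HOP_SAMPLES = 240  # 10 ms frameshift at a 24000 Hz sample rate


def align_features(ppg, f0, audio, hop=HOP_SAMPLES):
    """Trim frame-level features and audio to a common length.

    ppg and f0 are indexed by frame, audio is indexed by sample.
    Only len() and slicing are used, so lists and NumPy arrays both work.
    Hypothetical helper for illustration only.
    """
    # Use the shortest stream as the reference number of frames.
    n_frames = min(len(ppg), len(f0), len(audio) // hop)
    return ppg[:n_frames], f0[:n_frames], audio[: n_frames * hop]
```

The speaker embedding is one vector per wav file, so it needs no trimming and can be returned alongside the aligned features.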

Single GPU:
CUDA_VISIBLE_DEVICES=0 python train.py -c configs/stage1.json
CUDA_VISIBLE_DEVICES=0 python train.py -c configs/stage2.json
CUDA_VISIBLE_DEVICES=0 python train.py -c configs/stage3.json
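
If you prefer to launch the three commands from one script, the sketch below runs them one after another and stops at the first failure. It assumes the stages are meant to run in the order listed and that the script is started from the repository root. The names `build_command` and `run_all_stages` are hypothetical.

```python
import os
import subprocess
import sys

# Config files taken from the commands above.
STAGE_CONFIGS = [
    "configs/stage1.json",
    "configs/stage2.json",
    "configs/stage3.json",
]


def build_command(config_path):
    """Build the train.py command line for one stage config."""
    return [sys.executable, "train.py", "-c", config_path]


def run_all_stages(gpu_id="0"):
    """Run every stage in order on one GPU; raise if a stage fails."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu_id)
    for config in STAGE_CONFIGS:
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(build_command(config), env=env, check=True)
```

Calling `run_all_stages("0")` is then equivalent to typing the three commands by hand with `CUDA_VISIBLE_DEVICES=0`.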