PITS

PITS: Variational Pitch Inference without Fundamental Frequency for End-to-End Pitch-controllable TTS

Abstract: Previous pitch-controllable text-to-speech (TTS) models rely on directly modeling fundamental frequency, leading to low variance in synthesized speech. To address this issue, we propose PITS, an end-to-end pitch-controllable TTS model that utilizes variational inference to model pitch. Based on VITS, PITS incorporates the Yingram encoder, the Yingram decoder, and adversarial training of pitch-shifted synthesis to achieve pitch-controllability. Experiments demonstrate that PITS generates high-quality speech that is indistinguishable from ground truth speech and has high pitch-controllability without quality degradation. Code and audio samples will be available at https://github.com/anonymous-pits/pits.

Training code is uploaded.

The demo and checkpoint are available on Hugging Face Spaces 🤗

Audio samples are uploaded at github.io.

For pitch-shifted inference, we consistently use the scope-shift notation, s, instead of pitch-shift.
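To illustrate the scope-shift notation, the sketch below treats one Yingram frame as a plain list of bins and selects a fixed-width window whose start is offset by s. The bin layout, `base`, `width`, and the helper `crop_scope` are assumptions made for this example; they are not taken from the PITS code, and the sign convention of s should be checked against the model.

```python
def crop_scope(yingram_frame, base, width, s=0):
    """Return `width` bins starting at `base + s` (hypothetical layout)."""
    start = base + s
    if start < 0 or start + width > len(yingram_frame):
        raise ValueError("scope shift moves the window outside the available bins")
    return yingram_frame[start:start + width]


# Stand-in for one Yingram frame with 20 bins (values are just bin indices).
frame = list(range(20))

unshifted = crop_scope(frame, base=5, width=10)     # window starts at bin 5
shifted = crop_scope(frame, base=5, width=10, s=3)  # same width, starts at bin 8
print(unshifted[0], shifted[0])
```

The window width stays constant, so only the position of the scope changes with s.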

Preprint version contains some errors! Please wait for the update!

README IS WIP...

requirements.txt has been fixed for Colab.
Tested on Python 3.9.

Preparing the environment

git clone https://github.com/Kurisu-Preston/pits
cd pits

pip install -r requirements.txt

Then, inside the monotonic_align directory, build the extension:

python setup.py build_ext --inplace

Preprocess

Prepare the filelists train.list and val.list.

Supported languages: Chinese [ZH], Japanese [JA], English [EN], Korean [KO].

python preprocess.py
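The filelist format is not documented here. The sketch below assumes the VITS-style layout `audio_path|speaker|text`, with the text wrapped in one of the language tags listed above. Both the layout and the tag wrapping are assumptions, and `make_filelist_line` is a hypothetical helper, so check preprocess.py for the format it actually expects.

```python
SUPPORTED_TAGS = {"ZH", "JA", "EN", "KO"}


def make_filelist_line(audio_path, speaker, text, lang):
    """Build one filelist entry in the ASSUMED `path|speaker|[LANG]text[LANG]` form."""
    if lang not in SUPPORTED_TAGS:
        raise ValueError(f"unsupported language tag: {lang}")
    if "|" in text:
        raise ValueError("text must not contain the field separator '|'")
    return f"{audio_path}|{speaker}|[{lang}]{text}[{lang}]"


# Made-up entries, for illustration only.
lines = [
    make_filelist_line("wavs/a.wav", "spk1", "hello", "EN"),
    make_filelist_line("wavs/b.wav", "spk2", "こんにちは", "JA"),
]
print("\n".join(lines))
```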

Config

Edit the speakers list in configs/config_cjke.yaml.
You can also change keep_ckpts and log_path.
data_path is the root path of your data.
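Where these keys sit inside the YAML file is not shown here, so a small helper can locate them once the file is loaded. The sketch below works on an already-loaded dictionary (for example the result of `yaml.safe_load`, if PyYAML is installed). The sample `config` is made up for illustration and does not reflect the real file, and `find_key_paths` is a hypothetical helper.

```python
def find_key_paths(node, key, prefix=()):
    """Yield the path to every occurrence of `key` in nested dicts and lists."""
    if isinstance(node, dict):
        for k, v in node.items():
            path = prefix + (k,)
            if k == key:
                yield path
            yield from find_key_paths(v, key, path)
    elif isinstance(node, list):
        for i, v in enumerate(node):
            yield from find_key_paths(v, key, prefix + (i,))


# Made-up structure; the real nesting in config_cjke.yaml may differ.
config = {
    "data": {"data_path": "/path/to/data", "speakers": ["spk1", "spk2"]},
    "train": {"keep_ckpts": 3, "log_path": "logs"},
}

for name in ("speakers", "keep_ckpts", "log_path", "data_path"):
    print(name, list(find_key_paths(config, name)))
```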

Training

Download the pretrained checkpoint into logs/, where the fine-tuning command expects it

wget -P logs https://huggingface.co/spaces/anonymous-pits/pits/resolve/main/logs/pits_vctk_AD_3000.pth

Fine-tune from the pretrained checkpoint

CUDA_VISIBLE_DEVICES=0 python train.py -c configs/config_cjke.yaml -m cjke -t logs/pits_vctk_AD_3000.pth

Train from scratch

python train.py -c configs/config_cjke.yaml -m cjke

Resume from a previous training checkpoint

CUDA_VISIBLE_DEVICES=0 python train.py -c configs/config_cjke.yaml -m cjke -r logs/cjke/cjke_3000.pth
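The three invocations above differ only in the optional `-t` (pretrained checkpoint) and `-r` (resume checkpoint) arguments. The sketch below assembles the same command lines from Python. `build_train_command` is a hypothetical helper, and rejecting `-t` together with `-r` is an assumption of this sketch rather than documented behavior of train.py.

```python
import shlex


def build_train_command(config, model, pretrained=None, resume=None):
    """Assemble the train.py argument list for scratch, fine-tune, or resume."""
    if pretrained and resume:
        # Assumption: fine-tuning and resuming are treated as mutually exclusive.
        raise ValueError("pass either pretrained or resume, not both")
    cmd = ["python", "train.py", "-c", config, "-m", model]
    if pretrained:
        cmd += ["-t", pretrained]
    if resume:
        cmd += ["-r", resume]
    return cmd


scratch = build_train_command("configs/config_cjke.yaml", "cjke")
finetune = build_train_command(
    "configs/config_cjke.yaml", "cjke", pretrained="logs/pits_vctk_AD_3000.pth"
)
resume = build_train_command(
    "configs/config_cjke.yaml", "cjke", resume="logs/cjke/cjke_3000.pth"
)

for cmd in (scratch, finetune, resume):
    print(shlex.join(cmd))
```

To launch one of these, pass the list to `subprocess.run(cmd, check=True)`, setting `CUDA_VISIBLE_DEVICES` in the environment if a specific GPU is needed.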

References

Official VITS Implementation: https://github.com/jaywalnut310/vits
NANSY Implementation from dhchoi99: https://github.com/dhchoi99/NANSY
Official Avocodo Implementation: https://github.com/ncsoft/avocodo
Official PhaseAug Implementation: https://github.com/mindslab-ai/phaseaug
Tacotron Implementation from keithito: https://github.com/keithito/tacotron
CSTR VCTK Corpus (version 0.92): https://datashare.ed.ac.uk/handle/10283/3443
G2P for demo, g2p_en from Kyubyong: https://github.com/Kyubyong/g2p
