Unofficial PyTorch implementation of "Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech" (arXiv)
This is an unofficial implementation of Grad-TTS, built on top of Glow-TTS (https://github.com/jaywalnut310/glow-tts). We replace the GlowDecoder with a DiffusionDecoder that follows the settings of the original paper. We also replace torch.distributed with Horovod for convenience; fp16 training is currently not supported.
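As a rough illustration of what the diffusion decoder does at inference time, the sketch below integrates the reverse-diffusion (probability-flow ODE) step from the Grad-TTS paper in plain NumPy. Here `score_fn` stands in for the trained score network, and the linear noise schedule endpoints `beta0`/`beta1` and the step count are illustrative assumptions, not values read from this repo's configs.

```python
import numpy as np

def reverse_diffusion(mu, score_fn, n_steps=50, beta0=0.05, beta1=20.0):
    """Sample a mel-spectrogram by stepping the reverse-diffusion ODE
    from t=1 down to t=0, starting from noise centered at the aligned
    encoder output mu.

    mu:       (T, n_mel) aligned encoder output
    score_fn: callable (x, mu, t) -> array of the same shape as x,
              standing in for the trained score network
    """
    h = 1.0 / n_steps
    x = mu + np.random.randn(*mu.shape)  # x_1 ~ N(mu, I)
    for step in range(n_steps):
        t = 1.0 - step * h
        beta_t = beta0 + (beta1 - beta0) * t  # assumed linear noise schedule
        # Euler step of the ODE  dx = 0.5 * beta_t * (mu - x - score) dt,
        # taken backward in time (t -> t - h):
        dx = 0.5 * beta_t * (mu - x - score_fn(x, mu, t)) * h
        x = x - dx
    return x
```

A quick sanity check of the dynamics: the forward process has stationary distribution N(mu, I), whose score is `mu - x`; plugging that in makes the update vanish, so a sample from the stationary distribution is left unchanged, as expected.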
2021/07/28: LJSpeech samples uploaded; their quality matches the original paper's demo.
Please go to the egs/ folder and see run.sh and inference_waveglow_vocoder.py for example usage. Before training:

1. Download and extract the LJ Speech dataset, then rename or create a link to the dataset folder: `ln -s /path/to/LJSpeech-1.1/wavs DUMMY`
2. Build the Monotonic Alignment Search code (Cython): `cd monotonic_align; python setup.py build_ext --inplace`
Before inference, download the WaveGlow checkpoint from download_link and put it into the waveglow folder.
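For intuition about what the compiled Monotonic Alignment Search code computes, here is a minimal pure-NumPy sketch of the algorithm (as described in the Glow-TTS paper): a dynamic program that finds the monotonic text-to-frame path maximizing total log-likelihood. This is an illustrative reimplementation, not the Cython code in monotonic_align, which is heavily optimized.

```python
import numpy as np

def monotonic_alignment_search(log_p):
    """Find the monotonic alignment through a (T_text, T_mel) log-likelihood
    matrix that maximizes the total score.

    The path starts at (0, 0), ends at (T_text-1, T_mel-1), and at each mel
    frame either stays on the same text token or advances by one.
    Returns a 0/1 matrix of the same shape marking the chosen path.
    """
    T_text, T_mel = log_p.shape
    Q = np.full((T_text, T_mel), -np.inf)  # best cumulative score ending at (i, j)
    Q[0, 0] = log_p[0, 0]
    for j in range(1, T_mel):
        for i in range(min(j + 1, T_text)):  # can't be below row j (start is (0,0))
            stay = Q[i, j - 1]
            move = Q[i - 1, j - 1] if i > 0 else -np.inf
            Q[i, j] = log_p[i, j] + max(stay, move)
    # Backtrack from the bottom-right corner to recover the optimal path.
    path = np.zeros_like(log_p, dtype=np.int64)
    i = T_text - 1
    for j in range(T_mel - 1, -1, -1):
        path[i, j] = 1
        if i > 0 and Q[i - 1, j - 1] >= Q[i, j - 1]:
            i -= 1
    return path
```

During training, this hard alignment is what ties each mel frame to a text-side Gaussian; the Cython build step above exists because this double loop is far too slow in pure Python at real sequence lengths.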
References:
- Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech
- Score-Based Generative Modeling through Stochastic Differential Equations

Authors: Heyang Xue (https://github.com/WelkinYang) and Qicong Xie (https://github.com/QicongXie)