Tacotron 2 (with HiFi-GAN)

PyTorch implementation of Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions.

This implementation includes distributed and automatic mixed precision support and uses the RUSLAN dataset.

Distributed and Automatic Mixed Precision support relies on NVIDIA's Apex and AMP.

Generated samples

https://soundcloud.com/andrey-nikishaev/sets/russian-tts-nvidia-tacotron2

New

  • Added diagonal guided attention (DGA) from Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention (https://arxiv.org/abs/1710.08969); see the loss sketch after this list
  • Added Maximizing Mutual Information for Tacotron (MMI) (https://arxiv.org/abs/1909.01145)
    • Couldn't make it work as shown in the paper
    • DGA still gives better and much cleaner results
  • Added Russian text preparation with a simple stress dictionary (e.g., za'mok vs. zamo'k)
  • Uses HiFi-GAN as the vocoder
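
A minimal sketch of the DGA penalty, assuming a (batch, mel_steps, text_steps) alignment tensor and the paper's default sharpness g = 0.2; the function and argument names are illustrative, not this repo's exact code:

```python
import torch

def guided_attention_loss(attn, text_lens, mel_lens, g=0.2):
    """Diagonal guided attention (arXiv:1710.08969): penalize attention
    mass far from the diagonal so the text-to-mel alignment stays monotonic.

    attn: (batch, mel_steps, text_steps) soft alignments from the decoder.
    text_lens, mel_lens: valid lengths per utterance (padding is excluded).
    """
    loss = attn.new_zeros(())
    for b in range(attn.size(0)):
        N, T = int(text_lens[b]), int(mel_lens[b])
        n = torch.arange(N, device=attn.device, dtype=attn.dtype) / N
        t = torch.arange(T, device=attn.device, dtype=attn.dtype) / T
        # W[t, n] = 1 - exp(-(n/N - t/T)^2 / (2 g^2)), near zero on the diagonal
        W = 1.0 - torch.exp(-((n[None, :] - t[:, None]) ** 2) / (2.0 * g * g))
        loss = loss + (attn[b, :T, :N] * W).mean()
    return loss / attn.size(0)
```

This term is simply added to the Tacotron 2 loss with a small weight; it mostly matters early in training, before the alignment locks in.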

Prerequisites

  1. NVIDIA GPU with CUDA and cuDNN

Setup

  1. Download and extract the RUSLAN dataset
  2. Clone this repo: git clone https://github.com/creotiv/RussianTTS-Tacotron2.git
  3. cd into this repo: cd RussianTTS-Tacotron2
  4. Install PyTorch 1.0
  5. Install Apex
  6. Install Python requirements: pip install -r requirements.txt (or build the Docker image instead)

Training

  1. python train.py --output_directory=outdir --log_directory=logdir
  2. (OPTIONAL) tensorboard --logdir=outdir/logdir

Training using a pre-trained model

Training using a pre-trained model can lead to faster convergence. By default, the dataset-dependent text embedding layers are ignored (a simplified sketch of this warm start follows the steps below).

  1. Download our published Ruslan Model or LJ Speech model
  2. python train.py --output_directory=outdir --log_directory=logdir -c tacotron2_statedict.pt --warm_start
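
Roughly what --warm_start does, mirroring the warm-start loading in the upstream NVIDIA code: load the checkpoint's weights but drop the layers on an ignore list (the text embedding by default, since RUSLAN's symbol set differs from LJ Speech's). The function below is an illustrative sketch, not this repo's exact implementation:

```python
import torch

def warm_start(model, checkpoint_path, ignore_layers=("embedding.weight",)):
    """Load pre-trained weights, keeping fresh values for ignored layers."""
    state = torch.load(checkpoint_path, map_location="cpu")["state_dict"]
    state = {k: v for k, v in state.items() if k not in ignore_layers}
    merged = model.state_dict()   # freshly initialized weights
    merged.update(state)          # overwrite everything except ignored layers
    model.load_state_dict(merged)
    return model
```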

Multi-GPU (distributed) and Automatic Mixed Precision Training

  1. python -m multiproc train.py --output_directory=outdir --log_directory=logdir --hparams=distributed_run=True,fp16_run=True

Inference demo

  1. Download our published Ruslan Model or LJ Speech model
  2. Download published HiFi-GAN Model (Universal model recommended for non-English languages)
  3. jupyter notebook --ip=127.0.0.1 --port=31337
  4. Load inference.ipynb

N.B.: when performing mel-spectrogram to audio synthesis, make sure Tacotron 2 and the vocoder (HiFi-GAN) were trained on the same mel-spectrogram representation. A minimal sketch of the text-to-mel step follows.
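
A minimal text-to-mel sketch of what inference.ipynb does. The module layout (hparams.create_hparams, model.Tacotron2, text.text_to_sequence) is assumed from the upstream NVIDIA repo, and the cleaner name is a guess; check the notebook for the exact calls:

```python
import torch
from hparams import create_hparams   # layout assumed from NVIDIA/tacotron2
from model import Tacotron2
from text import text_to_sequence

hparams = create_hparams()
model = Tacotron2(hparams)
state = torch.load("tacotron2_statedict.pt", map_location="cpu")["state_dict"]
model.load_state_dict(state)
model.eval()

# "basic_cleaners" is an assumption; the repo may use Russian-specific cleaners.
text = "Привет, мир!"
sequence = torch.LongTensor(text_to_sequence(text, ["basic_cleaners"]))[None, :]
with torch.no_grad():
    _, mel_postnet, _, alignments = model.inference(sequence)
# mel_postnet is then fed to the HiFi-GAN generator to produce the waveform.
```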

Related repos

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

Acknowledgements

This implementation uses code from NVIDIA/tacotron2.
