PyTacotron

PyTorch implementation of Tacotron: Towards End-to-End Speech Synthesis, and
PyTorch implementation of Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions (Tacotron 2).

Features

Branches

New configurations can be created by merging features from the following branches; see the example after the list.

  • master: Basic Tacotron and Tacotron2 implementation

  • dynamic_r: Dynamic reduction factor (r) changing along with training schedule

  • gst: Global style token (GST) support

  • multispeaker: Multi-speaker support with speaker embeddings
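For example, to combine GST and multi-speaker support in a new working branch (a hypothetical combination; the merge may need manual conflict resolution):

    git checkout -b gst_multispeaker gst
    git merge multispeaker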

Setup

  1. Prepare the DATASET directory
    • Prepare the train.csv.txt and val.csv.txt files
    • Point training_files and validation_files in hparams.py to these two files respectively
    • Modify files_to_list in utils/dataset.py as needed so that it yields (mel_file_path, text) pairs; see the sketch after this list
  2. Install PyTorch
  3. Install the Python requirements or build the Docker image
    • Install Python requirements: pip install -r requirements.txt
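A minimal sketch of what files_to_list might look like, assuming each line of train.csv.txt is pipe-delimited as mel_file_path|text (the delimiter and column order are assumptions; adapt them to your dataset):

    # utils/dataset.py (sketch)
    def files_to_list(filename):
        """Parse a metadata file into (mel_file_path, text) pairs.

        Assumes one 'mel_file_path|text' entry per line.
        """
        with open(filename, encoding='utf-8') as f:
            return [line.strip().split('|', 1) for line in f if line.strip()]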

Training

Training from scratch

  1. python train.py -o outdir -l logdir
  2. (OPTIONAL) tensorboard --logdir=logdir

Training using a pre-trained model

Training using a pre-trained model can lead to faster convergence.
By default, the dataset-dependent text embedding layers are ignored (see the sketch after the steps below).

  1. Download the published Tacotron model
  2. python train.py -o outdir -l logdir -c tacotron_statedict.pt --warm_start
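A sketch of what --warm_start typically does, assuming an ignore_layers hparam that lists the dataset-dependent layers (e.g. the text embedding table). The names below follow common Tacotron codebases and are assumptions, not this repo's exact API:

    import torch

    def warm_start_model(checkpoint_path, model, ignore_layers):
        # Load the pre-trained weights on CPU to avoid device mismatches
        state_dict = torch.load(checkpoint_path, map_location='cpu')['state_dict']
        if ignore_layers:
            # Drop dataset-dependent layers such as the text embedding table
            state_dict = {k: v for k, v in state_dict.items()
                          if k not in ignore_layers}
            # Keep the freshly initialised weights for the dropped layers
            merged = model.state_dict()
            merged.update(state_dict)
            state_dict = merged
        model.load_state_dict(state_dict)
        return model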

Multi-GPU (distributed) Training

  1. python train.py -o outdir -l logdir --hparams=distributed_run=True
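Under the hood, distributed_run=True typically launches one process per GPU and initialises torch.distributed; a minimal sketch of that initialisation (the backend, address, and names are assumptions, not necessarily this repo's values):

    import torch
    import torch.distributed as dist

    def init_distributed(rank, n_gpus):
        # One process per GPU; NCCL is the usual backend for multi-GPU training
        torch.cuda.set_device(rank)
        dist.init_process_group(backend='nccl',
                                init_method='tcp://localhost:54321',
                                world_size=n_gpus, rank=rank)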

Inference demo

  1. Download the published Tacotron model
  2. Download the published WaveGAN model
  3. jupyter notebook --ip=127.0.0.1 --port=31337
  4. Load inference.ipynb
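A rough sketch of the text-to-mel half of inference, assuming module and function names common to Tacotron codebases (the imports, checkpoint key, and cleaner name are all assumptions; inference.ipynb is the authoritative reference):

    import torch
    from hparams import create_hparams          # assumed module layout
    from model import Tacotron                  # assumed class name
    from text import text_to_sequence           # assumed text frontend

    hparams = create_hparams()
    model = Tacotron(hparams)
    model.load_state_dict(torch.load('tacotron_statedict.pt',
                                     map_location='cpu')['state_dict'])
    model.eval()

    # Convert text to a batch of symbol IDs and predict a mel spectrogram
    sequence = torch.LongTensor(
        text_to_sequence('Hello world.', ['english_cleaners']))[None, :]
    with torch.no_grad():
        outputs = model.inference(sequence)  # mel spectrogram(s) + alignments
    # The predicted mel spectrogram is then passed to the vocoder for audio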

Note: When performing mel-spectrogram-to-audio synthesis, make sure Tacotron and the mel decoder were trained on the same mel-spectrogram representation.
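In practice, "the same representation" means the STFT and mel-filterbank parameters in hparams.py must match between the two models. The names and values below are illustrative assumptions, not this repo's defaults:

    # hparams.py (illustrative; must match between Tacotron and the vocoder)
    sampling_rate = 22050   # Hz
    filter_length = 1024    # STFT FFT size
    hop_length = 256        # STFT hop size (samples)
    win_length = 1024       # STFT window size (samples)
    n_mel_channels = 80     # number of mel filterbank channels
    mel_fmin = 0.0          # lowest mel band edge (Hz)
    mel_fmax = 8000.0       # highest mel band edge (Hz)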

Acknowledgements

This implementation uses code from the following repos as described in our code.