
Training and Testing


TTS can be trained on any language, including languages with different writing systems such as Chinese. If the target language is among the languages listed here, you can enable phoneme-based training. We suggest using phonemes for better pronunciation.
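Phoneme-based training is switched on in config.json. As a minimal sketch, assuming the field names used around this version of the repo (check the config shipped with your checkout if they differ):

  "use_phonemes": true,
  "phoneme_language": "en-us",
  "phoneme_cache_path": "path/to/phoneme_cache"

Here use_phonemes switches the text processing from characters to phonemes, phoneme_language selects the phonemizer language, and phoneme_cache_path is a placeholder for wherever you want the computed phonemes cached between runs.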

TTS also supports multi-GPU training with gradient averaging.

Setting up config.json

There are different config files under each sub-module (tts, vocoder, etc.). You need to adapt these files to your run and environment. Even though we provide default parameters, you might need to perform a hyper-parameter search on your dataset for the best performance. Note that each dataset has its own characteristics, so different datasets may perform differently with the same set of hyper-parameters.
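As an illustration, these are the kinds of fields you typically revisit for a new dataset. The names below follow the configs in this repo, but treat them as a sketch and verify them against your own config.json:

  "batch_size": 32,
  "eval_batch_size": 16,
  "lr": 0.0001,
  "run_eval": true,
  "output_path": "path/to/experiment/output/"

batch_size and eval_batch_size usually need tuning to your GPU memory and average utterance length, lr is the learning rate, run_eval enables the validation run discussed below, and output_path is a placeholder for where checkpoints and logs are written.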

How to Run

Note that the CUDA_VISIBLE_DEVICES environment variable defines the GPU(s) you would like to use for your run. The commands below use wildcards (train_*.py, TTS/*/configs) that stand for the model-specific script and config; a concrete example follows the list.

  • Training a New Model: CUDA_VISIBLE_DEVICES="0" python TTS/bin/train_*.py --config_path TTS/*/configs/config.json

  • Fine-tuning a Pre-Trained Model: This continues training of a previously saved model with the new parameters defined in the given config.json. If there is an architectural mismatch between the saved model and the new code, only the compatible layers are initialized from the checkpoint and the others are initialized randomly.

CUDA_VISIBLE_DEVICES="0" python TTS/bin/train_*.py --config_path TTS/*/configs/config.json --restore_path your/model/path.pth.tar

  • Resuming a Previous Training: The following command continues a previous run from the given training folder, picking up the config.json and the latest checkpoint stored there. By default it uses all the GPUs made visible by CUDA_VISIBLE_DEVICES.

CUDA_VISIBLE_DEVICES="0" python TTS/bin/train_*.py --continue_path your/training/folder/

  • Distributed Training: The following command uses all the GPUs made visible by CUDA_VISIBLE_DEVICES.

CUDA_VISIBLE_DEVICES="0,1,2" python TTS/bin/distribute_*.py --config_path TTS/*/configs/config.json

Inspecting Training

Throughout training, there are different ways to inspect model performance. On the terminal, you see basic model stats such as loss values, step times, and gradient norms. However, the best way is to use Tensorboard. If you enable the validation run (run_eval in config.json), the first thing to follow is the validation losses. The second important indicator is the attention alignment. The third, and the most telling, is listening to the synthesized audio. Keep in mind that all audio examples except the test audios are synthesized with teacher forcing; therefore, the test audios are the best indicator of real-life model performance.
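To follow these curves, point Tensorboard at your training output folder, assuming Tensorboard is installed and the path below is replaced by wherever your run writes its event files:

tensorboard --logdir your/training/folder/

Then open the printed URL (by default http://localhost:6006) in a browser.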

You can also watch the console logs during training. You'll see something like this:

 | > decoder_loss:  # decoder output loss (before the postnet)
 | > postnet_loss:  # postnet output (final model output) loss
 | > stopnet_loss:  # stop token prediction loss; measures how well the model predicts the end of the decoder iterations.
 | > ga_loss:       # guided attention loss; measures how diagonal the attention alignment is (only if guided attention is enabled).
 | > loss:          # total loss
 | > align_error:   # alignment error with respect to a fully diagonal alignment.
 | > avg_spec_length:   # average spectrogram length of the previous batch.
 | > avg_text_length:   # average text length of the previous batch.
 | > step_time:         # step time of the previous batch.
 | > loader_time:       # data loader delay for loading the previous batch.
 | > current_lr:        # current learning rate.

--> EVAL PERFORMANCE
 | > avg_decoder_loss:   # average decoder loss over the evaluation phase.
 | > avg_postnet_loss:   # average postnet loss over the evaluation phase. IMPORTANT!! This defines the quality of the generated spectrograms.
 | > avg_stopnet_loss:   # average stopnet loss over the evaluation phase.
 | > avg_ga_loss:        # average guided attention loss (only if guided attention is enabled).
 | > avg_loss:           # average total loss.
 | > avg_align_error:    # average alignment error. IMPORTANT!! It measures the quality of the attention alignment, assuming that more diagonal is better.

Stopping Training

Stop the training if your model starts to overfit (the validation loss increases while the training loss stays the same or decreases). Sometimes the attention module overfits as well without it showing in the loss values; you notice it when the attention alignment is misaligned on test examples but not on train and validation examples. If your final model does not work well at this stage, you can retrain the model with a higher weight decay, a larger dataset, or a higher dropout rate.
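If you go the regularization route, that usually means editing config.json and training again. The field names below are only illustrative (weight decay and prenet dropout as exposed by some Tacotron configs; your config may name them differently or not expose them at all), and the values are hypothetical:

  "wd": 0.00001,
  "prenet_dropout": true

The idea is to raise the weight decay relative to your current value and keep dropout enabled, then compare the new validation curves against the overfitting run.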

Testing Model

Currently, there are two good ways to test your trained model.

  • Notebooks: There are notebooks under the notebooks folder to test your models. Also check our Colab Notebooks.
  • Server Package: You can run a demo server and test your model on a simple web interface (see the example command below).
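To spin up the demo server, something like the command below should work; flag names have changed between versions, so check python TTS/server/server.py --help for the exact set, and note that the paths here are placeholders:

python TTS/server/server.py --tts_checkpoint your/model/path.pth.tar --tts_config your/model/config.json --port 5002

Then open http://localhost:5002 in a browser and type text into the input box to synthesize.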

FAQ

If you encounter any issues, please check the FAQ first.