PyTorch implementation of Tacotron2 (https://arxiv.org/pdf/1712.05884.pdf) with several attention mechanisms and model variants.
Attention mechanisms:
- GMM Attention (GMM)
- Location Sensitive Attention (LSA)
- Dynamic Convolution Attention (DCA)
- Stepwise Monotonic Attention (SMA)
Model variants:
- Tacotron2
- GST-Tacotron
- VAE-Tacotron
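The attention variants differ in how the alignment over encoder steps is computed at each decoder step. To illustrate one of them, here is a minimal pure-Python sketch of a single GMM attention step. It assumes softmax mixture weights, softplus step sizes (so the means can only move forward), softplus widths and unnormalized Gaussian components. The function name and its inputs are hypothetical, and the parameterization used in this repo may differ. In a real model the three parameter vectors would be predicted from the decoder state by a small network.

```python
import math
from typing import List, Sequence, Tuple


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)); always positive
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def gmm_attention_step(
    prev_means: Sequence[float],
    w_hat: Sequence[float],
    delta_hat: Sequence[float],
    sigma_hat: Sequence[float],
    num_enc_steps: int,
) -> Tuple[List[float], List[float]]:
    """One decoder step of a K-component GMM attention (hypothetical sketch).

    w_hat, delta_hat and sigma_hat are unnormalized per-component parameters
    that a real model would predict from the decoder state.
    Returns the alignment over encoder steps and the updated means.
    """
    # Mixture weights: softmax over the K components
    m = max(w_hat)
    exps = [math.exp(w - m) for w in w_hat]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Means only move forward because softplus(delta) > 0
    means = [mu + softplus(d) for mu, d in zip(prev_means, delta_hat)]
    # Component widths must be strictly positive
    sigmas = [softplus(s) + 1e-5 for s in sigma_hat]
    # Alignment weight of each encoder position j: weighted sum of Gaussians
    alignment = []
    for j in range(num_enc_steps):
        a = 0.0
        for w, mu, sd in zip(weights, means, sigmas):
            a += w * math.exp(-((j - mu) ** 2) / (2.0 * sd ** 2))
        alignment.append(a)
    return alignment, means
```

For example, gmm_attention_step([0.0, 0.0], [0.0, 0.0], [0.0, 1.0], [0.0, 0.0], 6) moves both means forward and returns an alignment that peaks at encoder step 1. The forward-only mean update is what gives GMM-style attention its monotonic behavior.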
This code has been validated on found data extracted from videos of Obama speaking, about 11 hours in total. Currently, only English is supported.
Text processing
Refer to bash/text-to-seq to preprocess the input transcriptions. The directory passed as save_dir in this script becomes the text input directory (--txt_dir) of the training script.

python text_processing.py \
  --txt_dir=${the path to the text dir} \
  --save_dir=${dir to save texts_seq}
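The core of such a preprocessing step is a mapping from characters to integer IDs. The sketch below illustrates that idea only: the symbol set, the cleaning rules, the end-of-sequence marker and the function names are all assumptions, not the contents of text_processing.py.

```python
import re
from typing import List

# Hypothetical symbol inventory; the real script may use a different set
# (e.g. phonemes or extra punctuation)
_PAD = "_"
_EOS = "~"
_CHARS = "abcdefghijklmnopqrstuvwxyz!'(),-.:;? "
SYMBOLS = [_PAD, _EOS] + list(_CHARS)
SYMBOL_TO_ID = {s: i for i, s in enumerate(SYMBOLS)}


def clean_text(text: str) -> str:
    # Lowercase and collapse runs of whitespace into single spaces
    text = text.lower()
    return re.sub(r"\s+", " ", text).strip()


def text_to_sequence(text: str) -> List[int]:
    # Map each known character to its ID, drop unknown ones, append EOS
    seq = [SYMBOL_TO_ID[c] for c in clean_text(text) if c in SYMBOL_TO_ID]
    seq.append(SYMBOL_TO_ID[_EOS])
    return seq
```

Reserving ID 0 for padding keeps batches of different-length sequences easy to pad, and an explicit end-of-sequence symbol gives the decoder a clear cue for where the text ends.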
Training
This code supports several attention mechanisms and reference embedding methods. The specific model is defined in a hyperparameter config file, and several preset hyperparameter files are already provided in /bash. To start training, follow the script bash/gst-train.sh:

python main.py \
  --cfg_file=config/gst-tts.yaml \
  --txt_dir=/path/to/texts_seq \
  --mel_dir=/path/to/mels \
  --file_dir=/path/to/filename_dir \
  --save_root=/logdir \
  --train
Note that --file_dir is the directory where the filename lists for training and test are saved and loaded. You do not have to create these files manually: if they do not exist, the code creates them.
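The load-or-create behavior of --file_dir can be sketched as follows. This is a minimal sketch under several assumptions: mel files are stored as .npy in mel_dir, the lists are named train.txt and test.txt, and 5% of the utterances are randomly held out for test. The function name, file names, format and split ratio are hypothetical and may differ from what main.py actually does.

```python
import os
import random
from typing import List, Tuple


def load_or_create_filelists(mel_dir: str, file_dir: str,
                             test_ratio: float = 0.05,
                             seed: int = 0) -> Tuple[List[str], List[str]]:
    """Hypothetical sketch: reuse train/test filename lists if present, else create them."""
    train_path = os.path.join(file_dir, "train.txt")  # hypothetical file names
    test_path = os.path.join(file_dir, "test.txt")
    if os.path.exists(train_path) and os.path.exists(test_path):
        # Lists already exist: load them so the split stays fixed across runs
        with open(train_path) as f:
            train = f.read().split()
        with open(test_path) as f:
            test = f.read().split()
        return train, test
    # Utterance IDs are the mel file names without the .npy extension
    names = sorted(os.path.splitext(n)[0] for n in os.listdir(mel_dir)
                   if n.endswith(".npy"))
    # Seeded shuffle, then hold out at least one utterance for test
    rng = random.Random(seed)
    rng.shuffle(names)
    n_test = max(1, int(len(names) * test_ratio))
    test, train = names[:n_test], names[n_test:]
    os.makedirs(file_dir, exist_ok=True)
    with open(train_path, "w") as f:
        f.write("\n".join(train))
    with open(test_path, "w") as f:
        f.write("\n".join(test))
    return train, test
```

Writing the split to disk on the first run means that later runs, including resumed training, see exactly the same train/test partition.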
Here are alignment results obtained with this code using the LSA and GMM attention mechanisms, respectively:
- Griffin-Lim is supported.
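Griffin-Lim recovers a waveform from a magnitude-only spectrogram by iteratively estimating a consistent phase. Below is a minimal NumPy sketch of the algorithm. It assumes a linear-magnitude spectrogram, a Hann window, n_fft=512 and hop=128; these are hypothetical values, not this repo's settings. A predicted mel spectrogram would first have to be mapped back to linear magnitudes (for example with librosa.feature.inverse.mel_to_stft). Library implementations also exist as librosa.griffinlim and torchaudio.transforms.GriffinLim.

```python
import numpy as np


def stft(x: np.ndarray, n_fft: int = 512, hop: int = 128) -> np.ndarray:
    # Hann-windowed frames -> one-sided spectra, shape (frames, n_fft // 2 + 1)
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec: np.ndarray, n_fft: int = 512, hop: int = 128) -> np.ndarray:
    # Weighted overlap-add: the least-squares signal estimate of Griffin and Lim
    win = np.hanning(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    length = n_fft + hop * (len(frames) - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame * win
        norm[i * hop:i * hop + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-8)


def griffin_lim(magnitude: np.ndarray, n_iter: int = 50, n_fft: int = 512,
                hop: int = 128, seed: int = 0) -> np.ndarray:
    # Start from random phase, then alternate between imposing the target
    # magnitude and keeping only the phase of the re-analyzed signal
    rng = np.random.default_rng(seed)
    angles = np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iter):
        signal = istft(magnitude * angles, n_fft, hop)
        rebuilt = stft(signal, n_fft, hop)
        angles = np.exp(1j * np.angle(rebuilt))
    return istft(magnitude * angles, n_fft, hop)
```

On a short test tone, griffin_lim(np.abs(stft(x))) yields a signal whose STFT magnitude is much closer to the target than a single random-phase reconstruction. More iterations generally reduce the mismatch further, at a linear cost in run time.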
This project is heavily based on the work below.