Keras implementation of DeepMind's Tacotron-2, a deep neural network architecture described in the paper: Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions.
- write a Keras implementation of Tacotron-2 (in progress)
- achieve a high-quality, human-like text-to-speech synthesizer based on DeepMind's paper
- achieve fast training and support for multi-GPU systems
- provide a pre-trained Tacotron-2 model
- provide compatibility with Mozilla's LPCNet project (optional)
Our preprocessing only supports LJSpeech and LJSpeech-like datasets (e.g. the M-AILABS speech data)! If your dataset is stored differently, you will probably need to write your own preprocessing script.
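For reference, an LJSpeech-style dataset stores its transcripts in a pipe-delimited `metadata.csv` with no header row: `id|raw transcript|normalized transcript`. A minimal sketch of reading that layout (the `parse_metadata` helper is hypothetical, not part of this repo's scripts):

```python
import csv
import io

def parse_metadata(text):
    """Parse LJSpeech-style metadata: one 'id|raw|normalized' record per line."""
    rows = []
    for rec in csv.reader(io.StringIO(text), delimiter="|", quoting=csv.QUOTE_NONE):
        if len(rec) < 3:
            # Some LJSpeech-like sets omit the normalized column; fall back to raw.
            rec = rec + [rec[-1]]
        rows.append(tuple(rec[:3]))
    return rows

sample = "LJ001-0001|Printing, in the only sense|Printing, in the only sense"
print(parse_metadata(sample)[0][0])  # LJ001-0001
```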
The model described by the authors can be divided into two parts:
- Spectrogram prediction network
- Vocoder (e.g. Wavenet vocoder)
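The first part predicts mel spectrograms from text, and the second synthesizes waveforms from them. The "mel" in mel spectrogram refers to a perceptual frequency scale; a common conversion between Hz and mels (the HTK-style formula, shown here as an illustration rather than this repo's exact preprocessing) is:

```python
import math

def hz_to_mel(f):
    """Convert a frequency in Hz to mels (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel: convert mels back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
```

Target spectrograms are computed by warping an ordinary STFT spectrogram onto this scale with a bank of triangular mel filters.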
For an in-depth exploration of the model architecture, training procedure, and preprocessing logic, refer to our wiki.
- Clone the repository: `$ git clone https://github.com/Stevel705/Tacotron-2-keras.git`
- Download an LJ-like dataset (e.g. the LJ Speech dataset)
- Extract the dataset to the `Tacotron-2-keras/data` folder
- Run `$ python3 1_create_audio_dataset.py` to process the audio
- Run `$ python3 2_create_text_dataset.py` to create the text data
- Train Tacotron: `$ python3 3_train.py`
- Test the pretrained model (optional): `$ python3 4_test.py`
- Synthesize mels and speech (in progress): `$ python3 5_syntezer.py`
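The text-dataset step has to turn each transcript into something the network can consume; Tacotron-style models typically feed a sequence of integer character IDs. A hypothetical sketch of such an encoding (the symbol set and helper below are illustrative, and may differ from what `2_create_text_dataset.py` actually does):

```python
# Illustrative character set: padding, end-of-sequence, then the usable symbols.
PAD, EOS = "_", "~"
SYMBOLS = PAD + EOS + " abcdefghijklmnopqrstuvwxyz.,!?'"
CHAR_TO_ID = {c: i for i, c in enumerate(SYMBOLS)}

def encode_text(text):
    """Lowercase, drop unknown characters, map to integer IDs, append EOS."""
    ids = [CHAR_TO_ID[c] for c in text.lower() if c in CHAR_TO_ID]
    return ids + [CHAR_TO_ID[EOS]]

print(encode_text("Hi!"))
```

The padding symbol lets variable-length sentences be batched together, and the end-of-sequence marker gives the decoder an explicit stop signal.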
MIT License