Add support for German TTS with Thorsten dataset #405
Changes from all commits
9b092a6
cdc35cf
😋 TensorFlowTTS
Real-Time State-of-the-art Speech Synthesis for Tensorflow 2
🤪 TensorFlowTTS provides real-time state-of-the-art speech synthesis architectures such as Tacotron-2, Melgan, Multiband-Melgan, FastSpeech, and FastSpeech2, based on TensorFlow 2. With TensorFlow 2, we can speed up training and inference, optimize further with fake-quantize-aware training and pruning, make TTS models run faster than real time, and deploy them on mobile devices or embedded systems.
What's new
Features
Requirements
This repository is tested on Ubuntu 18.04 with:
Other TensorFlow versions should work but have not been tested yet. This repo will try to work with the latest stable TensorFlow version. We recommend installing TensorFlow 2.3.0 for training if you want to use multi-GPU.
Installation
With pip
From source
Examples are included in the repository but are not shipped with the framework. Therefore, to run the latest version of the examples, you need to install from source.
If you want to upgrade the repository and its dependencies:
$ git pull
$ pip install --upgrade .
Supported Model architectures
TensorFlowTTS currently provides the following architectures:
We are also implementing some techniques to improve quality and convergence speed from the following papers:
Audio Samples
Here are audio samples on the validation set: tacotron-2, fastspeech, melgan, melgan.stft, fastspeech2, multiband_melgan
Tutorial End-to-End
Prepare Dataset
Prepare a dataset in the following format:
Where metadata.csv has the following format: id|transcription. This is an ljspeech-like format; you can skip the preprocessing steps if your dataset is in another format.
Note that NAME_DATASET should be [ljspeech/kss/baker/libritts], for example.
Preprocessing
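Before preprocessing, it can help to check that metadata.csv really follows the id|transcription format described above. The following is a minimal sketch, not part of TensorFlowTTS: the helper name read_metadata and the sample rows are hypothetical, and it assumes only that the first two pipe-separated fields are the utterance ID and its transcription.

```python
import csv
import io


def read_metadata(lines):
    """Parse ljspeech-like rows of the form `id|transcription`.

    Returns a list of (utt_id, text) tuples and raises ValueError on rows
    that do not contain both fields. (Hypothetical helper for illustration.)
    """
    items = []
    # QUOTE_NONE keeps quotation marks inside transcriptions untouched.
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    for line_no, row in enumerate(reader, start=1):
        if len(row) < 2 or not row[0].strip() or not row[1].strip():
            raise ValueError(f"line {line_no}: expected 'id|transcription'")
        items.append((row[0].strip(), row[1].strip()))
    return items


# Illustrative rows only; a real file would be opened with
# open("metadata.csv", encoding="utf-8") and passed to read_metadata.
sample = io.StringIO("utt_0001|Guten Morgen.\nutt_0002|Wie geht es dir?\n")
entries = read_metadata(sample)
print(entries)
```

Rows with extra fields are accepted and only the first two are used; rows with a missing ID or transcription are rejected early, before any audio is processed.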
The preprocessing has two steps:
To reproduce the steps above:
Right now we only support ljspeech, kss, baker, libritts and thorsten for the dataset argument. In the future, we intend to support more datasets.
Note: To run libritts preprocessing, please first read the instructions in examples/fastspeech2_libritts. The dataset needs to be reformatted before running preprocessing.
After preprocessing, the structure of the project folder should be:
stats.npy contains the mean and std of the training split mel spectrograms
stats_energy.npy contains the mean and std of energy values from the training split
stats_f0.npy contains the mean and std of F0 values in the training split
train_utt_ids.npy / valid_utt_ids.npy contain the training and validation utterance IDs, respectively
We use a suffix (ids, raw-feats, raw-energy, raw-f0, norm-feats, and wave) for each input type.
IMPORTANT NOTES:
The dump folder SHOULD follow the above structure to be able to use the training script, or you can modify it yourself 😄.
Training models
To learn how to train a model from scratch or fine-tune it on other datasets/languages, please see the details in the examples directory.
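As a small illustration of how the mean and std stored in stats.npy relate the raw-feats and norm-feats inputs, here is a sketch of per-bin normalization. It assumes, without confirmation from this README, that the file holds a [2, n_mels] array with the mean in row 0 and the std in row 1; the random features and the normalize/denormalize helpers are made up for the example.

```python
import os
import tempfile

import numpy as np

# Synthetic stand-in for training-split mel spectrogram frames: [frames, n_mels].
rng = np.random.default_rng(0)
train_mels = rng.normal(loc=-4.0, scale=2.0, size=(200, 80)).astype(np.float32)

# Assumed layout (not confirmed by the README): row 0 = mean, row 1 = std.
stats = np.stack([train_mels.mean(axis=0), train_mels.std(axis=0)])

with tempfile.TemporaryDirectory() as dump_dir:
    stats_path = os.path.join(dump_dir, "stats.npy")
    np.save(stats_path, stats)
    mean, std = np.load(stats_path)  # unpack the two rows


def normalize(mel, mean, std):
    """Map raw features to zero mean and unit variance per mel bin."""
    return (mel - mean) / std


def denormalize(norm_mel, mean, std):
    """Invert normalize(), e.g. before handing features to a vocoder."""
    return norm_mel * std + mean


norm = normalize(train_mels, mean, std)
restored = denormalize(norm, mean, std)
print(norm.shape, restored.shape)
```

The same pattern would apply to stats_energy.npy and stats_f0.npy, with scalar statistics instead of per-bin vectors, if they follow the same mean/std convention.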
Abstract Class Explanation
Abstract DataLoader Tensorflow-based dataset
A detailed implementation of the abstract dataset class is in tensorflow_tts/dataset/abstract_dataset. There are some functions you need to override and understand:
IMPORTANT NOTES:
Some examples that use this abstract_dataset are tacotron_dataset.py, fastspeech_dataset.py, melgan_dataset.py, fastspeech2_dataset.py
Abstract Trainer Class
A detailed implementation of base_trainer is in tensorflow_tts/trainer/base_trainer.py. It includes Seq2SeqBasedTrainer and GanBasedTrainer, which inherit from BasedTrainer. All trainers support both single and multi-GPU training. There are some functions you MUST override when implementing new_trainer:
All models in this repo are trained with GanBasedTrainer (see train_melgan.py, train_melgan_stft.py, train_multiband_melgan.py) or Seq2SeqBasedTrainer (see train_tacotron2.py, train_fastspeech.py).
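The general idea of a base trainer that owns the training loop while subclasses supply the per-step logic can be sketched as follows. This is a generic template-method illustration in plain Python, not the TensorFlowTTS API: SketchBaseTrainer, MeanSquareTrainer, train_step and fit are hypothetical names chosen for the example.

```python
from abc import ABC, abstractmethod


class SketchBaseTrainer(ABC):
    """Hypothetical base class: owns the loop, delegates the step."""

    def __init__(self, max_steps):
        self.max_steps = max_steps
        self.steps = 0
        self.history = []

    @abstractmethod
    def train_step(self, batch):
        """Run one update on `batch` and return a scalar loss."""

    def fit(self, batches):
        """Iterate over batches until max_steps is reached."""
        for batch in batches:
            if self.steps >= self.max_steps:
                break
            self.history.append(self.train_step(batch))
            self.steps += 1
        return self.history


class MeanSquareTrainer(SketchBaseTrainer):
    """Toy subclass: the 'loss' is the mean square of the batch values."""

    def train_step(self, batch):
        return sum(x * x for x in batch) / len(batch)


trainer = MeanSquareTrainer(max_steps=2)
losses = trainer.fit([[1.0, 2.0], [3.0], [4.0]])
print(losses)
```

Because train_step is abstract, the base class cannot be instantiated directly, which mirrors the requirement that a new trainer must override certain functions before it can be used.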
End-to-End Examples
You can learn how to run inference with each model in the notebooks, or see a colab (for English) or a colab (for Korean). Here is example code for end-to-end inference with fastspeech and melgan.
Contact
Minh Nguyen Quan Anh: nguyenquananhminh@gmail.com, erogol: erengolge@gmail.com, Kuan Chen: azraelkuan@gmail.com, Dawid Kobus: machineko@protonmail.com, Takuya Ebata: meguru.mokke@gmail.com, Trinh Le Quang: trinhle.cse@gmail.com, Yunchao He: yunchaohe@gmail.com, Alejandro Miguel Velasquez: xml506ok@gmail.com
License
Overall, almost all models here are licensed under Apache 2.0 for all countries in the world, except in Viet Nam, where this framework cannot be used for production in any way without permission from TensorFlowTTS's authors. There is one exception: Tacotron-2 can be used for any purpose. If you are Vietnamese and want to use this framework for production, you must contact us in advance.
Acknowledgement
We want to thank Tomoki Hayashi, who discussed Melgan, Multi-band melgan, Fastspeech, and Tacotron with us at length. This framework is based on his great open-source ParallelWaveGan project.