🤘 Support Multi-GPU gradient accumulation for trainer #377

Merged: 4 commits, Nov 19, 2020
20 changes: 11 additions & 9 deletions README.md
@@ -19,16 +19,17 @@
:zany_face: TensorFlowTTS provides real-time state-of-the-art speech synthesis architectures such as Tacotron-2, Melgan, Multiband-Melgan, FastSpeech, and FastSpeech2, based on TensorFlow 2. With TensorFlow 2, we can speed up training and inference, optimize further using [fake-quantize aware training](https://www.tensorflow.org/model_optimization/guide/quantization/training_comprehensive_guide) and [pruning](https://www.tensorflow.org/model_optimization/guide/pruning/pruning_with_keras), and make TTS models run faster than real time so they can be deployed on mobile devices or embedded systems.

## What's new
- 2020/08/23 **(NEW!)** Add Parallel WaveGAN tensorflow implementation. See [here](https://github.com/TensorSpeech/TensorFlowTTS/tree/master/examples/parallel_wavegan)
- 2020/08/23 **(NEW!)** Add MBMelGAN G + ParallelWaveGAN G example. See [here](https://github.com/TensorSpeech/TensorFlowTTS/tree/master/examples/multiband_pwgan)
- 2020/08/20 **(NEW!)** Add C++ inference code. Thank [@ZDisket](https://github.com/ZDisket). See [here](https://github.com/TensorSpeech/TensorFlowTTS/tree/master/examples/cppwin)
- 2020/08/18 **(NEW!)** Update [new base processor](https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/processor/base_processor.py). Add [AutoProcessor](https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/inference/auto_processor.py) and [pretrained processor](https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/processor/pretrained/) json file.
- 2020/08/14 **(NEW!)** Support Chinese TTS. Pls see the [colab](https://colab.research.google.com/drive/1YpSHRBRPBI7cnTkQn1UcVTWEQVbsUm1S?usp=sharing). Thank [@azraelkuan](https://github.com/azraelkuan).
- 2020/08/05 **(NEW!)** Support Korean TTS. Pls see the [colab](https://colab.research.google.com/drive/1ybWwOS5tipgPFttNulp77P6DAB5MtiuN?usp=sharing). Thank [@crux153](https://github.com/crux153).
- 2020/07/17 Support MultiGPU for all Trainer.
- 2020/07/05 Support Convert Tacotron-2, FastSpeech to Tflite. Pls see the [colab](https://colab.research.google.com/drive/1HudLLpT9CQdh2k04c06bHUwLubhGTWxA?usp=sharing). Thank @jaeyoo from the TFlite team for his support.
- 2020/11/19 **(NEW!)** Add Multi-GPU gradient accumulator. See [here](https://github.com/TensorSpeech/TensorFlowTTS/pull/377)
- 2020/08/23 Add Parallel WaveGAN tensorflow implementation. See [here](https://github.com/TensorSpeech/TensorFlowTTS/tree/master/examples/parallel_wavegan)
- 2020/08/23 Add MBMelGAN G + ParallelWaveGAN G example. See [here](https://github.com/TensorSpeech/TensorFlowTTS/tree/master/examples/multiband_pwgan)
- 2020/08/20 Add C++ inference code. Thank [@ZDisket](https://github.com/ZDisket). See [here](https://github.com/TensorSpeech/TensorFlowTTS/tree/master/examples/cppwin)
- 2020/08/18 Update [new base processor](https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/processor/base_processor.py). Add [AutoProcessor](https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/inference/auto_processor.py) and [pretrained processor](https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/processor/pretrained/) json file
- 2020/08/14 Support Chinese TTS. Pls see the [colab](https://colab.research.google.com/drive/1YpSHRBRPBI7cnTkQn1UcVTWEQVbsUm1S?usp=sharing). Thank [@azraelkuan](https://github.com/azraelkuan)
- 2020/08/05 Support Korean TTS. Pls see the [colab](https://colab.research.google.com/drive/1ybWwOS5tipgPFttNulp77P6DAB5MtiuN?usp=sharing). Thank [@crux153](https://github.com/crux153)
- 2020/07/17 Support MultiGPU for all Trainer
- 2020/07/05 Support Convert Tacotron-2, FastSpeech to Tflite. Pls see the [colab](https://colab.research.google.com/drive/1HudLLpT9CQdh2k04c06bHUwLubhGTWxA?usp=sharing). Thank @jaeyoo from the TFlite team for his support
- 2020/06/20 [FastSpeech2](https://arxiv.org/abs/2006.04558) implementation with Tensorflow is supported.
- 2020/06/07 [Multi-band MelGAN (MB MelGAN)](https://github.com/tensorspeech/TensorFlowTTS/blob/master/examples/multiband_melgan/) implementation with Tensorflow is supported.
- 2020/06/07 [Multi-band MelGAN (MB MelGAN)](https://github.com/tensorspeech/TensorFlowTTS/blob/master/examples/multiband_melgan/) implementation with Tensorflow is supported


## Features
@@ -38,6 +39,7 @@
- Suitable for deployment.
- Easy to implement a new model, based on an abstract class.
- Mixed precision to speed up training if possible.
- Support Single/Multi-GPU gradient accumulation.
- Support both Single/Multi GPU in base trainer class.
- TFlite conversion for all supported models.
- Android example.
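The new feature listed above, single/multi-GPU gradient accumulation, sums gradients over several micro-batches and applies a single optimizer update, which simulates a larger batch than fits on the GPUs at once. The sketch below shows the bare idea in TensorFlow 2; it is not the repository's Seq2SeqBasedTrainer, and the toy model, loss, and data are placeholders.

```python
import tensorflow as tf

# Toy model and data; everything named here is illustrative, not the repo's trainer.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(8,))])
optimizer = tf.keras.optimizers.Adam(1e-3)
accumulation_steps = 4  # plays the role of the new gradient_accumulation_steps key

accumulated = [tf.zeros_like(v) for v in model.trainable_variables]

def accumulate_and_apply(micro_batches):
    """Sum gradients over several micro-batches, then apply one optimizer update."""
    global accumulated
    for x, y in micro_batches:
        with tf.GradientTape() as tape:
            # Scale the loss so the summed gradient matches one large batch.
            loss = tf.reduce_mean(tf.square(model(x, training=True) - y)) / accumulation_steps
        grads = tape.gradient(loss, model.trainable_variables)
        accumulated = [a + g for a, g in zip(accumulated, grads)]
    optimizer.apply_gradients(zip(accumulated, model.trainable_variables))
    accumulated = [tf.zeros_like(v) for v in model.trainable_variables]

# One "global" step made of 4 micro-batches of 16 examples each.
micro = [(tf.random.normal([16, 8]), tf.random.normal([16, 1]))
         for _ in range(accumulation_steps)]
accumulate_and_apply(micro)
```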
5 changes: 3 additions & 2 deletions examples/fastspeech/conf/fastspeech.v1.yaml
@@ -46,7 +46,7 @@ fastspeech_params:
###########################################################
# DATA LOADER SETTING #
###########################################################
batch_size: 16 # Batch size.
batch_size: 16 # Batch size for each GPU, assuming gradient_accumulation_steps == 1.
remove_short_samples: true # Whether to remove samples whose length is less than batch_max_steps.
allow_cache: true # Whether to allow cache in dataset. If true, it requires cpu memory.
mel_length_threshold: 32 # remove all targets with mel_length <= 32
@@ -60,7 +60,8 @@ optimizer_params:
decay_steps: 150000 # < train_max_steps is recommended.
warmup_proportion: 0.02
weight_decay: 0.001


gradient_accumulation_steps: 1
var_train_expr: null # trainable variable expr (e.g. 'embeddings|encoder|decoder'),
# separated by |. If var_train_expr is null, all variables are trained.
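In practice the new gradient_accumulation_steps key changes how many examples feed one optimizer update: the training scripts later in this diff create the dataset with batch_size * num_replicas * gradient_accumulation_steps. A tiny sketch of that arithmetic, with the GPU count as an assumed example value:

```python
# Values are illustrative; only batch_size and gradient_accumulation_steps come from the yaml.
batch_size = 16                   # per-GPU batch size (the key above)
num_gpus = 2                      # assumed GPU count; the scripts read it from the tf.distribute strategy
gradient_accumulation_steps = 1   # the new key; 1 reproduces the old behaviour

examples_per_update = batch_size * num_gpus * gradient_accumulation_steps
print(examples_per_update)        # 16 * 2 * 1 = 32
```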
5 changes: 3 additions & 2 deletions examples/fastspeech/conf/fastspeech.v3.yaml
@@ -46,7 +46,7 @@ fastspeech_params:
###########################################################
# DATA LOADER SETTING #
###########################################################
batch_size: 16 # Batch size.
batch_size: 16 # Batch size for each GPU, assuming gradient_accumulation_steps == 1.
remove_short_samples: true # Whether to remove samples whose length is less than batch_max_steps.
allow_cache: true # Whether to allow cache in dataset. If true, it requires cpu memory.
mel_length_threshold: 32 # remove all targets with mel_length <= 32
@@ -60,7 +60,8 @@ optimizer_params:
decay_steps: 150000 # < train_max_steps is recommended.
warmup_proportion: 0.02
weight_decay: 0.001


gradient_accumulation_steps: 1
var_train_expr: null # trainable variable expr (e.g. 'embeddings|encoder|decoder'),
# separated by |. If var_train_expr is null, all variables are trained.
16 changes: 9 additions & 7 deletions examples/fastspeech/train_fastspeech.py
@@ -36,8 +36,7 @@
from tensorflow_tts.models import TFFastSpeech
from tensorflow_tts.optimizers import AdamWeightDecay, WarmUp
from tensorflow_tts.trainers import Seq2SeqBasedTrainer
from tensorflow_tts.utils import (calculate_2d_loss, calculate_3d_loss,
return_strategy)
from tensorflow_tts.utils import calculate_2d_loss, calculate_3d_loss, return_strategy


class FastSpeechTrainer(Seq2SeqBasedTrainer):
@@ -218,7 +217,7 @@ def main():
default="",
type=str,
nargs="?",
help='pretrained checkpoint file to load weights from. Auto-skips non-matching layers',
help="pretrained checkpoint file to load weights from. Auto-skips non-matching layers",
)
args = parser.parse_args()

@@ -302,7 +301,9 @@ def main():
).create(
is_shuffle=config["is_shuffle"],
allow_cache=config["allow_cache"],
batch_size=config["batch_size"] * STRATEGY.num_replicas_in_sync,
batch_size=config["batch_size"]
* STRATEGY.num_replicas_in_sync
* config["gradient_accumulation_steps"],
)

valid_dataset = CharactorDurationMelDataset(
@@ -335,11 +336,12 @@ def main():
)
fastspeech._build()
fastspeech.summary()

if len(args.pretrained) > 1:
fastspeech.load_weights(args.pretrained, by_name=True, skip_mismatch=True)
logging.info(f"Successfully loaded pretrained weight from {args.pretrained}.")

logging.info(
f"Successfully loaded pretrained weight from {args.pretrained}."
)

# AdamW for fastspeech
learning_rate_fn = tf.keras.optimizers.schedules.PolynomialDecay(
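The pretrained-checkpoint path in this script relies on Keras name-based loading, so layers whose names or shapes do not match are skipped instead of raising. A minimal standalone sketch of that call, with a placeholder model and a hypothetical checkpoint file name:

```python
import logging
import os

import tensorflow as tf

# Placeholder model and checkpoint path; the real script builds TFFastSpeech from its
# config and takes the path from the --pretrained argument.
model = tf.keras.Sequential([tf.keras.layers.Dense(4, name="dense_a", input_shape=(8,))])
pretrained_path = "fastspeech_pretrained.h5"  # hypothetical file name

# The script guards with `if len(args.pretrained) > 1:`; the extra exists() check just
# lets this sketch run even when no checkpoint is on disk.
if len(pretrained_path) > 1 and os.path.exists(pretrained_path):
    model.load_weights(pretrained_path, by_name=True, skip_mismatch=True)
    logging.info(f"Successfully loaded pretrained weight from {pretrained_path}.")
```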
5 changes: 3 additions & 2 deletions examples/fastspeech2/conf/fastspeech2.baker.v2.yaml
@@ -48,7 +48,7 @@ fastspeech2_params:
###########################################################
# DATA LOADER SETTING #
###########################################################
batch_size: 16 # Batch size.
batch_size: 16 # Batch size for each GPU, assuming gradient_accumulation_steps == 1.
remove_short_samples: true # Whether to remove samples whose length is less than batch_max_steps.
allow_cache: true # Whether to allow cache in dataset. If true, it requires cpu memory.
mel_length_threshold: 32 # remove all targets with mel_length <= 32
@@ -62,7 +62,8 @@ optimizer_params:
decay_steps: 150000 # < train_max_steps is recommended.
warmup_proportion: 0.02
weight_decay: 0.001


gradient_accumulation_steps: 1
var_train_expr: null # trainable variable expr (e.g. 'embeddings|encoder|decoder'),
# separated by |. If var_train_expr is null, all variables are trained.
5 changes: 3 additions & 2 deletions examples/fastspeech2/conf/fastspeech2.kss.v1.yaml
@@ -47,7 +47,7 @@ fastspeech2_params:
###########################################################
# DATA LOADER SETTING #
###########################################################
batch_size: 16 # Batch size.
batch_size: 16 # Batch size for each GPU, assuming gradient_accumulation_steps == 1.
remove_short_samples: true # Whether to remove samples whose length is less than batch_max_steps.
allow_cache: true # Whether to allow cache in dataset. If true, it requires cpu memory.
mel_length_threshold: 32 # remove all targets with mel_length <= 32
@@ -61,7 +61,8 @@ optimizer_params:
decay_steps: 150000 # < train_max_steps is recommended.
warmup_proportion: 0.02
weight_decay: 0.001


gradient_accumulation_steps: 1
var_train_expr: null # trainable variable expr (e.g. 'embeddings|encoder|decoder'),
# separated by |. If var_train_expr is null, all variables are trained.
5 changes: 3 additions & 2 deletions examples/fastspeech2/conf/fastspeech2.kss.v2.yaml
@@ -48,7 +48,7 @@ fastspeech2_params:
###########################################################
# DATA LOADER SETTING #
###########################################################
batch_size: 16 # Batch size.
batch_size: 16 # Batch size for each GPU, assuming gradient_accumulation_steps == 1.
remove_short_samples: true # Whether to remove samples whose length is less than batch_max_steps.
allow_cache: true # Whether to allow cache in dataset. If true, it requires cpu memory.
mel_length_threshold: 32 # remove all targets with mel_length <= 32
@@ -62,7 +62,8 @@ optimizer_params:
decay_steps: 150000 # < train_max_steps is recommended.
warmup_proportion: 0.02
weight_decay: 0.001


gradient_accumulation_steps: 1
var_train_expr: null # trainable variable expr (e.g. 'embeddings|encoder|decoder'),
# separated by |. If var_train_expr is null, all variables are trained.
5 changes: 3 additions & 2 deletions examples/fastspeech2/conf/fastspeech2.v1.yaml
@@ -46,7 +46,7 @@ fastspeech2_params:
###########################################################
# DATA LOADER SETTING #
###########################################################
batch_size: 16 # Batch size.
batch_size: 16 # Batch size for each GPU, assuming gradient_accumulation_steps == 1.
remove_short_samples: true # Whether to remove samples whose length is less than batch_max_steps.
allow_cache: true # Whether to allow cache in dataset. If true, it requires cpu memory.
mel_length_threshold: 32 # remove all targets with mel_length <= 32
@@ -60,7 +60,8 @@ optimizer_params:
decay_steps: 150000 # < train_max_steps is recommended.
warmup_proportion: 0.02
weight_decay: 0.001


gradient_accumulation_steps: 1
var_train_expr: null # trainable variable expr (e.g. 'embeddings|encoder|decoder'),
# separated by |. If var_train_expr is null, all variables are trained.
5 changes: 3 additions & 2 deletions examples/fastspeech2/conf/fastspeech2.v2.yaml
@@ -47,7 +47,7 @@ fastspeech2_params:
###########################################################
# DATA LOADER SETTING #
###########################################################
batch_size: 16 # Batch size.
batch_size: 16 # Batch size for each GPU, assuming gradient_accumulation_steps == 1.
remove_short_samples: true # Whether to remove samples whose length is less than batch_max_steps.
allow_cache: true # Whether to allow cache in dataset. If true, it requires cpu memory.
mel_length_threshold: 32 # remove all targets with mel_length <= 32
@@ -61,7 +61,8 @@ optimizer_params:
decay_steps: 150000 # < train_max_steps is recommended.
warmup_proportion: 0.02
weight_decay: 0.001


gradient_accumulation_steps: 1
var_train_expr: null # trainable variable expr (e.g. 'embeddings|encoder|decoder'),
# separated by |. If var_train_expr is null, all variables are trained.
17 changes: 9 additions & 8 deletions examples/fastspeech2/train_fastspeech2.py
@@ -33,15 +33,13 @@
from tqdm import tqdm

import tensorflow_tts
from examples.fastspeech2.fastspeech2_dataset import \
CharactorDurationF0EnergyMelDataset
from examples.fastspeech2.fastspeech2_dataset import CharactorDurationF0EnergyMelDataset
from examples.fastspeech.train_fastspeech import FastSpeechTrainer
from tensorflow_tts.configs import FastSpeech2Config
from tensorflow_tts.models import TFFastSpeech2
from tensorflow_tts.optimizers import AdamWeightDecay, WarmUp
from tensorflow_tts.trainers import Seq2SeqBasedTrainer
from tensorflow_tts.utils import (calculate_2d_loss, calculate_3d_loss,
return_strategy)
from tensorflow_tts.utils import calculate_2d_loss, calculate_3d_loss, return_strategy


class FastSpeech2Trainer(Seq2SeqBasedTrainer):
@@ -244,9 +242,8 @@ def main():
default="",
type=str,
nargs="?",
help='pretrained weights .h5 file to load weights from. Auto-skips non-matching layers',
help="pretrained weights .h5 file to load weights from. Auto-skips non-matching layers",
)


args = parser.parse_args()

@@ -330,7 +327,9 @@ def main():
).create(
is_shuffle=config["is_shuffle"],
allow_cache=config["allow_cache"],
batch_size=config["batch_size"] * STRATEGY.num_replicas_in_sync,
batch_size=config["batch_size"]
* STRATEGY.num_replicas_in_sync
* config["gradient_accumulation_steps"],
)

valid_dataset = CharactorDurationF0EnergyMelDataset(
@@ -367,7 +366,9 @@ def main():
fastspeech.summary()
if len(args.pretrained) > 1:
fastspeech.load_weights(args.pretrained, by_name=True, skip_mismatch=True)
logging.info(f"Successfully loaded pretrained weight from {args.pretrained}.")
logging.info(
f"Successfully loaded pretrained weight from {args.pretrained}."
)

# AdamW for fastspeech
learning_rate_fn = tf.keras.optimizers.schedules.PolynomialDecay(
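Both training scripts build their learning-rate schedule right where this hunk ends. The sketch below covers only the stock Keras part, using the values from the configs in this PR; the warmup-step arithmetic is an assumption based on the warmup_proportion key, and the repository's own WarmUp and AdamWeightDecay wrappers are only referenced in comments, not reproduced.

```python
import tensorflow as tf

# Values taken from the configs in this PR; the learning rates are illustrative.
decay_steps = 150000
warmup_proportion = 0.02
weight_decay = 0.001  # consumed by the repo's AdamWeightDecay, not shown here

# Assumed reading of warmup_proportion: warm up for the first 2% of decay_steps.
warmup_steps = int(decay_steps * warmup_proportion)  # 3000

learning_rate_fn = tf.keras.optimizers.schedules.PolynomialDecay(
    initial_learning_rate=0.001,
    decay_steps=decay_steps,
    end_learning_rate=0.00005,
)

# Stand-in optimizer: the actual scripts wrap `learning_rate_fn` with
# tensorflow_tts.optimizers.WarmUp and pass it to AdamWeightDecay together
# with `weight_decay`.
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate_fn)
print("warmup steps:", warmup_steps)
```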
5 changes: 3 additions & 2 deletions examples/fastspeech2_libritts/conf/fastspeech2libritts.yaml
@@ -46,7 +46,7 @@ fastspeech2_params:
###########################################################
# DATA LOADER SETTING #
###########################################################
batch_size: 32 # Batch size.
batch_size: 32 # Batch size for each GPU, assuming gradient_accumulation_steps == 1.
remove_short_samples: true # Whether to remove samples whose length is less than batch_max_steps.
allow_cache: true # Whether to allow cache in dataset. If true, it requires cpu memory.
mel_length_threshold: 48 # remove all targets with mel_length <= 48
@@ -60,7 +60,8 @@ optimizer_params:
decay_steps: 120000 # < train_max_steps is recommended.
warmup_proportion: 0.02
weight_decay: 0.001


gradient_accumulation_steps: 1
var_train_expr: null # trainable variable expr (e.g. 'embeddings|encoder|decoder'),
# separated by |. If var_train_expr is null, all variables are trained.
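Every config touched in this PR also carries var_train_expr, a '|'-separated expression that restricts which variables are trained. A minimal sketch of how such a filter could be applied by regex-matching variable names; the toy model and the exact matching logic are assumptions for illustration, not the trainer's actual code:

```python
import re

import tensorflow as tf

# Toy model whose variable names contain "encoder"/"decoder" so the filter has
# something to match; "postnet_dense" is there to show a non-matching layer.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, name="encoder_dense", input_shape=(4,)),
    tf.keras.layers.Dense(4, name="decoder_dense"),
    tf.keras.layers.Dense(2, name="postnet_dense"),
])

var_train_expr = "embeddings|encoder|decoder"  # null in the yaml means "train everything"

if var_train_expr is None:
    train_vars = model.trainable_variables
else:
    pattern = re.compile(var_train_expr)
    train_vars = [v for v in model.trainable_variables if pattern.search(v.name)]

print([v.name for v in train_vars])  # only encoder_*/decoder_* variables remain
```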