fastspeech2 training error #203

mataym · 2020-08-11T18:30:25Z

I have already created durations with MFA, and both preprocessing scripts (tensorflow-tts-preprocess, tensorflow-tts-normalize) ran without errors. But when I ran the training script, the following error occurred:
2020-08-12 02:19:06,034 (train_fastspeech2:289) INFO: batch_size = 16
2020-08-12 02:19:06,034 (train_fastspeech2:289) INFO: remove_short_samples = True
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: allow_cache = True
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: mel_length_threshold = 32
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: is_shuffle = True
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: optimizer_params = {'initial_learning_rate': 0.001, 'end_learning_rate': 5e-05, 'decay_steps': 150000, 'warmup_proportion': 0.02, 'weight_decay': 0.001}
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: train_max_steps = 200000
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: save_interval_steps = 5000
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: eval_interval_steps = 500
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: log_interval_steps = 200
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: num_save_intermediate_results = 1
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: train_dir = ./dump/train/
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: dev_dir = ./dump/valid/
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: use_norm = True
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: f0_stat = ./dump/stats_f0.npy
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: energy_stat = ./dump/stats_energy.npy
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: outdir = ./examples/fastspeech2/exp/train.fastspeech2.v1/
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: config = ./examples/fastspeech2/conf/fastspeech2.v1.yaml
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: resume =
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: verbose = 1
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: mixed_precision = True
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: version = 0.6.1
Traceback (most recent call last):
  File "examples/fastspeech2/train_fastspeech2.py", line 400, in <module>
    main()
  File "examples/fastspeech2/train_fastspeech2.py", line 316, in main
    mel_length_threshold=mel_length_threshold,
  File "/home/speechlab/TensorflowTTS/examples/fastspeech2/fastspeech2_dataset.py", line 104, in __init__
    ), f"Number of charactor, mel, duration, f0 and energy files are different"
AssertionError: Number of charactor, mel, duration, f0 and energy files are different
How do I solve this problem? Can anybody help me? Thanks a lot!

machineko · 2020-08-11T19:19:45Z

Run fix_mismatch.py to fix the few-frame difference between the audio and duration files, and save the durations into the dump directory (it's the last step in the MFA example):

https://github.com/TensorSpeech/TensorFlowTTS/tree/master/examples/mfa_extraction

python examples/mfa_extraction/fix_mismatch.py \
  --base_path ./dump \
  --trimmed_dur_path ./dataset/trimmed-durations \
  --dur_path ./dataset/durations
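Here is a small NumPy sketch of what "fixing a few frames of difference" means. It is not the code of fix_mismatch.py. It assumes the mismatch is only a few frames and that it is acceptable to absorb it into the last duration, which is often a trailing silence. `fix_duration_mismatch` and `max_diff` are hypothetical names.

```python
import numpy as np


def fix_duration_mismatch(durations, mel_len, max_diff=5):
    """Absorb a small frame-count mismatch into the last duration (hypothetical helper).

    durations: per-symbol frame counts for one utterance
    mel_len:   number of mel frames for the same utterance
    max_diff:  refuse to patch larger gaps, which usually mean a real misalignment
    """
    fixed = np.asarray(durations, dtype=np.int32).copy()
    diff = mel_len - int(fixed.sum())
    if abs(diff) > max_diff:
        raise ValueError(f"{diff} frames is too large a mismatch to patch")
    fixed[-1] += diff  # grow or shrink the last entry so the sum matches mel_len
    if fixed[-1] < 0:
        raise ValueError("last duration would become negative")
    return fixed


if __name__ == "__main__":
    print(fix_duration_mismatch([3, 4, 5], 14))  # last entry becomes 7
    print(fix_duration_mismatch([3, 4, 5], 10))  # last entry becomes 3
```

The `max_diff` guard exists because a large gap usually means the wrong duration files are being used. Silently patching such a gap would hide that problem.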

dathudeptrai · 2020-08-12T02:35:23Z

@mataym make sure all duration/character/mel/f0/energy files are in ./dump/train and ./dump/valid, and that each input has the same number of files.
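Comparing raw counts tells you *that* something is missing. Comparing utterance ids tells you *which* files are missing. The sketch below assumes file names of the form `<utt_id>-<feature>.npy` (e.g. `00000001-raw-f0.npy`), with one subdirectory per feature. Both helper names are hypothetical.

```python
import os


def utt_ids_by_subdir(split_dir):
    """Map each feature subdirectory to the set of utterance ids it contains.

    Assumes names like '<utt_id>-<feature>.npy', e.g. '00000001-raw-f0.npy'.
    """
    ids = {}
    for name in sorted(os.listdir(split_dir)):
        path = os.path.join(split_dir, name)
        if os.path.isdir(path):
            ids[name] = {f.split("-")[0] for f in os.listdir(path) if f.endswith(".npy")}
    return ids


def report_missing(ids_by_subdir):
    """Return {subdir: sorted ids that exist elsewhere but are missing here}."""
    all_ids = set().union(*ids_by_subdir.values())
    return {
        name: sorted(all_ids - ids)
        for name, ids in ids_by_subdir.items()
        if all_ids - ids
    }


if __name__ == "__main__":
    for split in ("./dump/train", "./dump/valid"):
        if os.path.isdir(split):
            ids = utt_ids_by_subdir(split)
            print(split, {name: len(v) for name, v in ids.items()})
            for name, missing in report_missing(ids).items():
                print(f"  {name} is missing {len(missing)} ids, e.g. {missing[:5]}")
```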

mataym · 2020-08-12T03:31:51Z

After I fixed the durations with the mfa_extraction script, I ran the training script (train_fastspeech2.py), but a fatal error occurred:
(tensorflowtts) [speechlab@localhost TensorflowTTS]$ CUDA_VISIBLE_DEVICES=0 python examples/fastspeech2/train_fastspeech2.py \
  --train-dir ./dump/train/ \
  --dev-dir ./dump/valid/ \
  --outdir ./examples/fastspeech2/exp/train.fastspeech2.v1/ \
  --config ./examples/fastspeech2/conf/fastspeech2.v1.yaml \
  --use-norm 1 \
  --f0-stat ./dump/stats_f0.npy \
  --energy-stat ./dump/stats_energy.npy \
  --mixed_precision 1 \
  --resume ""
2020-08-12 11:13:02.070006: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-08-12 11:13:03.297113: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-08-12 11:13:07.236504: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:2d:00.0 name: Tesla V100-PCIE-16GB computeCapability: 7.0
coreClock: 1.38GHz coreCount: 80 deviceMemorySize: 15.75GiB deviceMemoryBandwidth: 836.37GiB/s
2020-08-12 11:13:07.236595: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-08-12 11:13:07.239380: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-08-12 11:13:07.241273: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-08-12 11:13:07.241620: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-08-12 11:13:07.244302: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-08-12 11:13:07.246252: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-08-12 11:13:07.251879: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-08-12 11:13:07.255859: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
/home/speechlab/anaconda3/envs/tensorflowtts/lib/python3.7/site-packages/tensorflow_addons/utils/ensure_tf_install.py:68: UserWarning: Tensorflow Addons supports using Python ops for all Tensorflow versions above or equal to 2.2.0 and strictly below 2.3.0 (nightly versions are not supported).
The versions of TensorFlow you are currently using is 2.3.0 and is not supported.
Some things might work, some things might not.
If you were to encounter a bug, do not file an issue.
If you want to make sure you're using a tested and supported configuration, either change the TensorFlow version or the TensorFlow Addons's version.
You can find the compatibility matrix in TensorFlow Addon's readme:
https://github.com/tensorflow/addons
UserWarning,
2020-08-12 11:13:08.041127: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-08-12 11:13:08.059890: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2399950000 Hz
2020-08-12 11:13:08.061938: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55b70c762e80 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-12 11:13:08.062005: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-08-12 11:13:08.232431: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55b70c7cf5f0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-08-12 11:13:08.232483: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla V100-PCIE-16GB, Compute Capability 7.0
2020-08-12 11:13:08.234590: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:2d:00.0 name: Tesla V100-PCIE-16GB computeCapability: 7.0
coreClock: 1.38GHz coreCount: 80 deviceMemorySize: 15.75GiB deviceMemoryBandwidth: 836.37GiB/s
2020-08-12 11:13:08.234658: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-08-12 11:13:08.234700: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-08-12 11:13:08.234729: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-08-12 11:13:08.234756: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-08-12 11:13:08.234783: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-08-12 11:13:08.234806: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-08-12 11:13:08.234831: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-08-12 11:13:08.238585: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-08-12 11:13:08.238629: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-08-12 11:13:09.155030: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-12 11:13:09.155104: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0
2020-08-12 11:13:09.155144: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N
2020-08-12 11:13:09.160381: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14729 MB memory) -> physical GPU (device: 0, name: Tesla V100-PCIE-16GB, pci bus id: 0000:2d:00.0, compute capability: 7.0)
2020-08-12 11:13:09,179 (train_fastspeech2:289) INFO: hop_size = 256
2020-08-12 11:13:09,180 (train_fastspeech2:289) INFO: format = npy
2020-08-12 11:13:09,180 (train_fastspeech2:289) INFO: model_type = fastspeech2
2020-08-12 11:13:09,180 (train_fastspeech2:289) INFO: fastspeech2_params = {'n_speakers': 1, 'encoder_hidden_size': 384, 'encoder_num_hidden_layers': 4, 'encoder_num_attention_heads': 2, 'encoder_attention_head_size': 192, 'encoder_intermediate_size': 1024, 'encoder_intermediate_kernel_size': 3, 'encoder_hidden_act': 'mish', 'decoder_hidden_size': 384, 'decoder_num_hidden_layers': 4, 'decoder_num_attention_heads': 2, 'decoder_attention_head_size': 192, 'decoder_intermediate_size': 1024, 'decoder_intermediate_kernel_size': 3, 'decoder_hidden_act': 'mish', 'variant_prediction_num_conv_layers': 2, 'variant_predictor_filter': 256, 'variant_predictor_kernel_size': 3, 'variant_predictor_dropout_rate': 0.5, 'num_mels': 80, 'hidden_dropout_prob': 0.2, 'attention_probs_dropout_prob': 0.1, 'max_position_embeddings': 2048, 'initializer_range': 0.02, 'output_attentions': False, 'output_hidden_states': False}
2020-08-12 11:13:09,180 (train_fastspeech2:289) INFO: batch_size = 16
2020-08-12 11:13:09,180 (train_fastspeech2:289) INFO: remove_short_samples = True
2020-08-12 11:13:09,180 (train_fastspeech2:289) INFO: allow_cache = True
2020-08-12 11:13:09,180 (train_fastspeech2:289) INFO: mel_length_threshold = 32
2020-08-12 11:13:09,180 (train_fastspeech2:289) INFO: is_shuffle = True
2020-08-12 11:13:09,180 (train_fastspeech2:289) INFO: optimizer_params = {'initial_learning_rate': 0.001, 'end_learning_rate': 5e-05, 'decay_steps': 150000, 'warmup_proportion': 0.02, 'weight_decay': 0.001}
2020-08-12 11:13:09,180 (train_fastspeech2:289) INFO: train_max_steps = 200000
2020-08-12 11:13:09,180 (train_fastspeech2:289) INFO: save_interval_steps = 5000
2020-08-12 11:13:09,180 (train_fastspeech2:289) INFO: eval_interval_steps = 500
2020-08-12 11:13:09,180 (train_fastspeech2:289) INFO: log_interval_steps = 200
2020-08-12 11:13:09,180 (train_fastspeech2:289) INFO: num_save_intermediate_results = 1
2020-08-12 11:13:09,180 (train_fastspeech2:289) INFO: train_dir = ./dump/train/
2020-08-12 11:13:09,180 (train_fastspeech2:289) INFO: dev_dir = ./dump/valid/
2020-08-12 11:13:09,180 (train_fastspeech2:289) INFO: use_norm = True
2020-08-12 11:13:09,180 (train_fastspeech2:289) INFO: f0_stat = ./dump/stats_f0.npy
2020-08-12 11:13:09,180 (train_fastspeech2:289) INFO: energy_stat = ./dump/stats_energy.npy
2020-08-12 11:13:09,181 (train_fastspeech2:289) INFO: outdir = ./examples/fastspeech2/exp/train.fastspeech2.v1/
2020-08-12 11:13:09,181 (train_fastspeech2:289) INFO: config = ./examples/fastspeech2/conf/fastspeech2.v1.yaml
2020-08-12 11:13:09,181 (train_fastspeech2:289) INFO: resume =
2020-08-12 11:13:09,181 (train_fastspeech2:289) INFO: verbose = 1
2020-08-12 11:13:09,181 (train_fastspeech2:289) INFO: mixed_precision = True
2020-08-12 11:13:09,181 (train_fastspeech2:289) INFO: version = 0.6.1
2020-08-12 11:13:16.322486: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-08-12 11:13:17.858452: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
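Model: "tf_fast_speech2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embeddings (TFFastSpeechEmbe multiple                  844032
_________________________________________________________________
encoder (TFFastSpeechEncoder multiple                  11814400
_________________________________________________________________
length_regulator (TFFastSpee multiple                  0
_________________________________________________________________
decoder (TFFastSpeechDecoder multiple                  12601216
_________________________________________________________________
mel_before (Dense)           multiple                  30800
_________________________________________________________________
postnet (TFTacotronPostnet)  multiple                  4352400
_________________________________________________________________
f0_predictor (TFFastSpeechVa multiple                  493313
_________________________________________________________________
energy_predictor (TFFastSpee multiple                  493313
_________________________________________________________________
duration_predictor (TFFastSp multiple                  493313
_________________________________________________________________
f0_embeddings (Conv1D)       multiple                  3840
_________________________________________________________________
dropout_32 (Dropout)         multiple                  0
_________________________________________________________________
energy_embeddings (Conv1D)   multiple                  3840
_________________________________________________________________
dropout_33 (Dropout)         multiple                  0
=================================================================
Total params: 31,130,467
Trainable params: 29,552,579
Non-trainable params: 1,577,888
_________________________________________________________________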

[train]: 0%| | 0/200000 [00:00<?, ?it/s]2020-08-12 11:13:24.628420: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1345] No whitelist ops found, nothing to do
2020-08-12 11:13:24.643382: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1345] No whitelist ops found, nothing to do
2020-08-12 11:13:34.486241: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:172] Filling up shuffle buffer (this may take a while): 631 of 2050
2020-08-12 11:13:44.492164: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:172] Filling up shuffle buffer (this may take a while): 1286 of 2050
2020-08-12 11:13:54.491977: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:172] Filling up shuffle buffer (this may take a while): 1912 of 2050
2020-08-12 11:13:57.385583: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:221] Shuffle buffer filled.
/home/speechlab/anaconda3/envs/tensorflowtts/lib/python3.7/site-packages/tensorflow/python/framework/indexed_slices.py:432: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
2020-08-12 11:14:20.627617: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1924] Converted 1123/9897 nodes to float16 precision using 113 cast(s) to float16 (excluding Const and Variable casts)
2020-08-12 11:14:24.429316: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1924] Converted 0/8317 nodes to float16 precision using 0 cast(s) to float16 (excluding Const and Variable casts)
Traceback (most recent call last):
File "examples/fastspeech2/train_fastspeech2.py", line 400, in <module>
main()
File "examples/fastspeech2/train_fastspeech2.py", line 392, in main
resume=args.resume,
File "/home/speechlab/TensorflowTTS/tensorflow_tts/trainers/base_trainer.py", line 852, in fit
self.run()
File "/home/speechlab/TensorflowTTS/tensorflow_tts/trainers/base_trainer.py", line 101, in run
self._train_epoch()
File "/home/speechlab/TensorflowTTS/tensorflow_tts/trainers/base_trainer.py", line 123, in _train_epoch
self._train_step(batch)
File "/home/speechlab/TensorflowTTS/tensorflow_tts/trainers/base_trainer.py", line 666, in _train_step
self.one_step_forward(batch)
File "/home/speechlab/anaconda3/envs/tensorflowtts/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 780, in __call__
result = self._call(*args, **kwds)
File "/home/speechlab/anaconda3/envs/tensorflowtts/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 840, in _call
return self._stateless_fn(*args, **kwds)
File "/home/speechlab/anaconda3/envs/tensorflowtts/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 2829, in __call__
return graph_function._filtered_call(args, kwargs) # pylint: disable=protected-access
File "/home/speechlab/anaconda3/envs/tensorflowtts/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1848, in _filtered_call
cancellation_manager=cancellation_manager)
File "/home/speechlab/anaconda3/envs/tensorflowtts/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1924, in _call_flat
ctx, args, cancellation_manager=cancellation_manager))
File "/home/speechlab/anaconda3/envs/tensorflowtts/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 550, in call
ctx=ctx)
File "/home/speechlab/anaconda3/envs/tensorflowtts/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Incompatible shapes: [16,98,384] vs. [16,113,384]
[[node tf_fast_speech2/add_1 (defined at /home/speechlab/TensorflowTTS/tensorflow_tts/models/fastspeech2.py:181) ]]
[[tf_fast_speech2/length_regulator/while/LoopCond/_92/_132]]
(1) Invalid argument: Incompatible shapes: [16,98,384] vs. [16,113,384]
[[node tf_fast_speech2/add_1 (defined at /home/speechlab/TensorflowTTS/tensorflow_tts/models/fastspeech2.py:181) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference__one_step_forward_45935]

Errors may have originated from an input operation.
Input Source operations connected to node tf_fast_speech2/add_1:
tf_fast_speech2/encoder/layer_._3/mul (defined at /home/speechlab/TensorflowTTS/tensorflow_tts/models/fastspeech.py:380)

Input Source operations connected to node tf_fast_speech2/add_1:
tf_fast_speech2/encoder/layer_._3/mul (defined at /home/speechlab/TensorflowTTS/tensorflow_tts/models/fastspeech.py:380)

Function call stack:
_one_step_forward -> _one_step_forward

[train]: 0%| | 0/200000 [01:03<?, ?it/s]

dathudeptrai · 2020-08-12T03:43:54Z

@mataym can you check the two things below:

sum(duration) == len(mel) in all files.
every element in each duration file is non-negative (>= 0).
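Both checks can be scripted with NumPy. The sketch assumes the dump layout shown in this thread: durations in `fix_dur/<id>-durations.npy` and mels in `norm-feats/<id>-norm-feats.npy` and `raw-feats/<id>-raw-feats.npy`. `check_durations` and `check_split` are hypothetical helper names.

```python
import os

import numpy as np


def check_durations(durations, mel):
    """Return a list of problems for one utterance (an empty list means OK)."""
    durations = np.asarray(durations)
    problems = []
    if np.any(durations < 0):
        problems.append("negative duration")
    if int(durations.sum()) != len(mel):
        problems.append(f"sum(duration)={int(durations.sum())} != len(mel)={len(mel)}")
    return problems


def check_split(split_dir, mel_kind, dur_subdir="fix_dur"):
    """Check every duration file in split_dir against the matching mel file."""
    bad = []
    for fname in sorted(os.listdir(os.path.join(split_dir, dur_subdir))):
        utt_id = fname.split("-")[0]
        dur = np.load(os.path.join(split_dir, dur_subdir, fname))
        mel = np.load(os.path.join(split_dir, mel_kind, f"{utt_id}-{mel_kind}.npy"))
        bad.extend((utt_id, p) for p in check_durations(dur, mel))
    return bad


if __name__ == "__main__":
    for split in ("./dump/train", "./dump/valid"):
        if os.path.isdir(os.path.join(split, "fix_dur")):
            for mel_kind in ("norm-feats", "raw-feats"):
                for utt_id, problem in check_split(split, mel_kind):
                    print(f"{split} {mel_kind} {utt_id}: {problem}")
```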

mataym · 2020-08-12T06:19:12Z

Sorry, I'm very new to TTS and don't know how to do that. Can you explain in detail how to do it, or give me a link? Thanks.

dathudeptrai · 2020-08-12T06:52:37Z

@mataym can you show me the file structure of ./dump/train and ./dump/valid?

mataym · 2020-08-12T08:53:42Z

The structure of the dump folder in my workspace is as follows:
dump/
├── --config
├── --dev-dir
├── --energy-stat
├── --f0-stat
├── --mixed_precision
├── _o
├── --outdir
├── --resume
├── stats_energy.npy
├── stats_f0.npy
├── stats.npy
├── train
│   ├── fix_dur
│   │   ├── 00000001-durations.npy
│   │   ...
│   │   └── 00002158-durations.npy
│   ├── ids
│   │   ├── 00000001-ids.npy
│   │   ...
│   │   └── 00002158-ids.npy
│   ├── norm-feats
│   │   ├── 00000001-norm-feats.npy
│   │   ...
│   │   └── 00002158-norm-feats.npy
│   ├── raw-energies
│   │   ├── 00000001-raw-energy.npy
│   │   ...
│   │   └── 00002158-raw-energy.npy
│   ├── raw-f0
│   │   ├── 00000001-raw-f0.npy
│   │   ...
│   │   └── 00002158-raw-f0.npy
│   ├── raw-feats
│   │   ├── 00000001-raw-feats.npy
│   │   ...
│   │   └── 00002158-raw-feats.npy
│   └── wavs
│       ├── 00000001-wave.npy
│       ...
│       └── 00002158-wave.npy
├── --train-dir
├── train_utt_ids.npy
├── --use-norm
├── valid
│   ├── fix_dur
│   │   ├── 00000030-durations.npy
│   │   ...
│   │   └── 00002135-durations.npy
│   ├── ids
│   │   ├── 00000030-ids.npy
│   │   ...
│   │   └── 00002135-ids.npy
│   ├── norm-feats
│   │   ├── 00000030-norm-feats.npy
│   │   ...
│   │   └── 00002135-norm-feats.npy
│   ├── raw-energies
│   │   ├── 00000030-raw-energy.npy
│   │   ...
│   │   └── 00002135-raw-energy.npy
│   ├── raw-f0
│   │   ├── 00000030-raw-f0.npy
│   │   ...
│   │   └── 00002135-raw-f0.npy
│   ├── raw-feats
│   │   ├── 00000030-raw-feats.npy
│   │   ...
│   │   └── 00002135-raw-feats.npy
│   └── wavs
│       ├── 00000030-wave.npy
│       ...
│       └── 00002135-wave.npy
└── valid_utt_ids.npy
I have 2158 wav and txt files in my corpus.

dathudeptrai · 2020-08-12T08:57:15Z

@mataym in your fix_dur, can you load each file and check that every element is non-negative? And check that sum(np.load("./fix_dur/...-durations.npy")) == len(np.load("./norm-feats/...-norm-feats.npy")).

mataym · 2020-08-12T12:04:47Z

I checked the values in the ...-durations.npy files in fix_dur for both train and valid; all of them are positive.
I also checked sum(np.load("./fix_dur/...-durations.npy")) == len(np.load("./norm-feats/...-norm-feats.npy")) in the train and valid folders: the sum of each ...-durations.npy equals len(...-norm-feats.npy), with no unequal values.
What can I do next?

machineko · 2020-08-12T12:12:47Z

Upload the duration and norm-feats files somewhere and send them to me; I'll check them locally.

Or just run:

import os
import numpy as np

# run from inside ./dump
for i in ["train", "valid"]:
    for j in os.listdir(f"{i}/fix_dur"):
        assert np.sum(np.load(f"{i}/fix_dur/{j}")) == len(np.load(f"{i}/norm-feats/{j.split('-')[0]}-raw-feats.npy"))

mataym · 2020-08-12T12:39:03Z

Thanks. raw-feats.npy in your code should be changed to norm-feats.npy. After I ran the program, the assert passed. Does that mean the data preprocessing is OK?

machineko · 2020-08-12T12:49:13Z

Yes, it is. Are you using some sort of debugger? If so, check the values in the debugger right before training breaks.

mataym · 2020-08-12T13:39:19Z

In the fix_mismatch.py script, what is --trimmed_dur_path ./trimmed-durations? I don't have a ./trimmed-durations folder at all.
python examples/mfa_extraction/fix_mismatch.py \
  --base_path ./dump \
  --trimmed_dur_path ./trimmed-durations \
  --dur_path ./durations

machineko · 2020-08-12T13:44:36Z

I don't know where you saved the trimmed durations; that's up to you. If you just follow the MFA extraction steps, everything is in the dataset/ (or libritts/) folder.

mataym · 2020-08-13T07:25:00Z

How do I generate the trimmed-durations folder in this project?

dathudeptrai · 2020-08-13T07:30:09Z

@mataym see here (https://github.com/TensorSpeech/TensorFlowTTS/blob/master/preprocess/preprocess_libritts.yaml#L19). You should add trim_mfa: true to your preprocessing config. The trimmed-durations directory is created automatically when you run the preprocessing script (see https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/bin/preprocess.py#L129).

manmay-nakhashi · 2020-08-13T07:38:33Z

@dathudeptrai I got the same error in fastspeech2 when attempting symbol-based training on my dataset.
When the phonemes in preprocess.py and the MFA phonemes are the same, it works fine. But when I switch from phoneme-based to symbol-based training with MFA-extracted durations, it throws this error:

Traceback (most recent call last):
  File "examples/fastspeech2/train_fastspeech2.py", line 411, in <module>
    main()
  File "examples/fastspeech2/train_fastspeech2.py", line 403, in main
    resume=args.resume,
  File "/mnt/TensorflowTTS/tensorflow_tts/trainers/base_trainer.py", line 852, in fit
    self.run()
  File "/mnt/TensorflowTTS/tensorflow_tts/trainers/base_trainer.py", line 101, in run
    self._train_epoch()
  File "/mnt/TensorflowTTS/tensorflow_tts/trainers/base_trainer.py", line 123, in _train_epoch
    self._train_step(batch)
  File "/mnt/TensorflowTTS/tensorflow_tts/trainers/base_trainer.py", line 666, in _train_step
    self.one_step_forward(batch)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 780, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 840, in _call
    return self._stateless_fn(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 2829, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1848, in _filtered_call
    cancellation_manager=cancellation_manager)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1924, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 550, in call
    ctx=ctx)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument:  Incompatible shapes: [16,132,256] vs. [16,156,256]
         [[node tf_fast_speech2/add_1 (defined at /mnt/TensorflowTTS/tensorflow_tts/models/fastspeech2.py:181) ]]
         [[tf_fast_speech2/length_regulator/while/LoopCond/_92/_108]]
  (1) Invalid argument:  Incompatible shapes: [16,132,256] vs. [16,156,256]
         [[node tf_fast_speech2/add_1 (defined at /mnt/TensorflowTTS/tensorflow_tts/models/fastspeech2.py:181) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference__one_step_forward_41923]

Errors may have originated from an input operation.
Input Source operations connected to node tf_fast_speech2/add_1:
 tf_fast_speech2/encoder/layer_._2/mul (defined at /mnt/TensorflowTTS/tensorflow_tts/models/fastspeech.py:380)

Input Source operations connected to node tf_fast_speech2/add_1:
 tf_fast_speech2/encoder/layer_._2/mul (defined at /mnt/TensorflowTTS/tensorflow_tts/models/fastspeech.py:380)

Function call stack:
_one_step_forward -> _one_step_forward

dathudeptrai · 2020-08-13T07:42:58Z

@manmay-nakhashi did you follow the instructions? Please add trim_mfa: true to the preprocess config; you should also run the fix_mismatch script.

manmay-nakhashi · 2020-08-13T09:26:46Z

@dathudeptrai still the same error.

dathudeptrai · 2020-08-13T09:29:32Z

@manmay-nakhashi again, this bug is only caused by a mismatch between the durations and the mel length. Please check for yourself whether sum(duration) equals len(mel) (for both raw-feats and norm-feats), and make sure every element in each duration file is non-negative (>= 0). You should also check which duration files you are using for training: the durations extracted from the TextGrids, or the durations after fix_mismatch. In your log, the bug is here (https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/models/fastspeech2.py#L181), which means you should check len(ids), len(f0s) and len(energies). In (https://github.com/TensorSpeech/TensorFlowTTS/blob/master/examples/fastspeech2/fastspeech2_dataset.py#L158), please add the code below:

assert len(charactor) == len(f0) == len(energy)
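The same condition can also be checked offline, before touching the dataset code. The failing `add_1` node appears to combine an encoder output, whose length follows the input ids (98 or 132 in the logs), with a tensor of length 113 or 156. If f0 and energy are averaged per duration entry, their length equals len(durations). So a per-file comparison of len(ids) and len(durations) should expose the mismatch.

The sketch assumes ids in `ids/<id>-ids.npy` and durations in `fix_dur/<id>-durations.npy`. The helper names are hypothetical.

```python
import os

import numpy as np


def count_mismatch(ids, durations):
    """Return (len(ids), len(durations)) if they differ, else None."""
    if len(ids) != len(durations):
        return len(ids), len(durations)
    return None


def scan_split(split_dir, dur_subdir="fix_dur"):
    """Collect {utt_id: (len(ids), len(durations))} for every mismatching file."""
    bad = {}
    for fname in sorted(os.listdir(os.path.join(split_dir, dur_subdir))):
        utt_id = fname.split("-")[0]
        ids = np.load(os.path.join(split_dir, "ids", f"{utt_id}-ids.npy"))
        durations = np.load(os.path.join(split_dir, dur_subdir, fname))
        mismatch = count_mismatch(ids, durations)
        if mismatch is not None:
            bad[utt_id] = mismatch
    return bad


if __name__ == "__main__":
    for split in ("./dump/train", "./dump/valid"):
        if os.path.isdir(os.path.join(split, "fix_dur")):
            bad = scan_split(split)
            print(f"{split}: {len(bad)} files with len(ids) != len(durations)")
```

If many files mismatch, the symbols used to build the ids probably differ from the units MFA aligned. This would happen, for example, when characters are fed in but the durations are per phoneme.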

manmay-nakhashi · 2020-08-13T09:38:16Z

@dathudeptrai ok, I'll check that. We are doing tf_average_by_duration for f0 and energy; do we also have to do it for charactor?
After doing that, it no longer throws the error, but is it the right thing to do?
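To see why the lengths must line up, here is a NumPy illustration of averaging frame-level values by duration. It is a sketch of the idea, not the library's `tf_average_by_duration`. It assumes zero frames (e.g. unvoiced f0) are excluded from each mean. The output has exactly one value per duration entry, whatever the number of frames.

```python
import numpy as np


def average_by_duration(values, durations):
    """Average frame-level values (e.g. f0 or energy) over each duration segment.

    Zero frames are skipped; a segment with no nonzero frames, or a zero
    duration, yields 0. The result has len(durations) entries.
    """
    values = np.asarray(values, dtype=np.float32)
    durations = np.asarray(durations)
    ends = np.cumsum(durations)
    starts = ends - durations
    out = np.zeros(len(durations), dtype=np.float32)
    for i, (start, end) in enumerate(zip(starts, ends)):
        segment = values[start:end]
        nonzero = segment[segment != 0]
        if nonzero.size > 0:
            out[i] = nonzero.mean()
    return out


if __name__ == "__main__":
    print(average_by_duration([0, 1, 3, 0, 5, 5], [2, 2, 2]))
```

Each id already corresponds to exactly one symbol, so the ids are not averaged. What has to hold is that there is one duration per id. Otherwise the averaged f0/energy and the encoder output end up with different lengths.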

mataym · 2020-08-13T16:36:34Z

In the ljspeech.py file, I cannot understand the list valid_symbols[]. It has 84 elements; what are they? Are they English phones? As far as I know, English has only 39 phones.
In my case, I have 32 different phones in my-lexicon.txt.
The structure of the meta.yaml file in test-g2p-model.zip (a G2P model generated from test-lexicon.txt with MFA) is as follows:
architecture: phonetisaurus
graphemes: [a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z]
phones: [a, b, c, d, ddd, e, f, fff, g, ggg, h, hhh, i, j, jjj, k, kkk, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z]
version: 1.0.0
What configuration should I modify before preprocessing and training the model? Should I replace the elements of valid_symbols[] with my 32 phones? I'd appreciate the experts' advice, thanks a lot!

dathudeptrai · 2020-08-13T16:40:02Z

@mataym you just need to change the symbols here (https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/processor/ljspeech.py#L110) to your symbols :)). Note that the pad symbol always has id = 0 :D.
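Here is a sketch of building a symbol table from your own phone set while keeping the pad symbol at id 0. It uses the 32 phones from the meta.yaml above. The pad name `"pad"` and `build_symbol_table` are assumptions for illustration; the real list lives in ljspeech.py.

```python
MY_PHONES = [
    "a", "b", "c", "d", "ddd", "e", "f", "fff", "g", "ggg", "h", "hhh",
    "i", "j", "jjj", "k", "kkk", "l", "m", "n", "o", "p", "q", "r",
    "s", "t", "u", "v", "w", "x", "y", "z",
]


def build_symbol_table(phones, pad="pad"):
    """Return (symbols, symbol_to_id) with the pad symbol fixed at id 0."""
    symbols = [pad] + sorted(set(phones) - {pad})
    symbol_to_id = {s: i for i, s in enumerate(symbols)}
    return symbols, symbol_to_id


if __name__ == "__main__":
    symbols, symbol_to_id = build_symbol_table(MY_PHONES)
    print(len(symbols), symbol_to_id["pad"])
```

Sorting keeps the ids stable between runs. Changing the symbol list after preprocessing would change the ids, so the preprocessing would have to be rerun.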

mataym · 2020-08-13T18:21:08Z

Hi @dathudeptrai, all of my symbols are already in _letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz", so it is not necessary to add my symbols there, is it? But my phones are different from the English phones.

dathudeptrai · 2020-08-14T02:12:25Z

@mataym will you use phones or characters? You just need to make sure the symbols in the code cover all your symbols :)).
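Coverage can be checked mechanically by collecting the phones from the lexicon and diffing them against the symbol list. This sketch assumes a plain `WORD phone1 phone2 ...` lexicon (no pronunciation probabilities) and treats each phone as one symbol. The path `my-lexicon.txt` and both helper names are hypothetical. Note that multi-letter phones such as `ddd` are not single characters in `_letters`.

```python
import os


def phones_from_lexicon(lines):
    """Collect the phone set from lexicon lines of the form 'WORD p1 p2 ...'."""
    phones = set()
    for line in lines:
        parts = line.split()
        if len(parts) > 1:  # skip blank lines and words without phones
            phones.update(parts[1:])
    return phones


def uncovered(phones, symbols):
    """Phones that have no entry in the symbol list, sorted for readability."""
    return sorted(set(phones) - set(symbols))


if __name__ == "__main__":
    lexicon_path = "my-lexicon.txt"  # hypothetical path
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
    if os.path.exists(lexicon_path):
        with open(lexicon_path, encoding="utf-8") as f:
            phones = phones_from_lexicon(f)
        print(len(phones), "phones; not covered:", uncovered(phones, list(letters)))
```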

deepConnectionism · 2020-08-14T07:33:14Z

i have already created durations with MFA, and also ran well two preprocess script(tensorflow-tts-preprocess, tensorflow-tts-normalize) with no error. but when i ran the train script, there is an error occurred as follows:
2020-08-12 02:19:06,034 (train_fastspeech2:289) INFO: batch_size = 16
2020-08-12 02:19:06,034 (train_fastspeech2:289) INFO: remove_short_samples = True
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: allow_cache = True
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: mel_length_threshold = 32
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: is_shuffle = True
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: optimizer_params = {'initial_learning_rate': 0.001, 'end_learning_rate': 5e-05, 'decay_steps': 150000, 'warmup_proportion': 0.02, 'weight_decay': 0.001}
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: train_max_steps = 200000
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: save_interval_steps = 5000
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: eval_interval_steps = 500
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: log_interval_steps = 200
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: num_save_intermediate_results = 1
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: train_dir = ./dump/train/
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: dev_dir = ./dump/valid/
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: use_norm = True
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: f0_stat = ./dump/stats_f0.npy
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: energy_stat = ./dump/stats_energy.npy
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: outdir = ./examples/fastspeech2/exp/train.fastspeech2.v1/
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: config = ./examples/fastspeech2/conf/fastspeech2.v1.yaml
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: resume =
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: verbose = 1
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: mixed_precision = True
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: version = 0.6.1
Traceback (most recent call last): File "examples/fastspeech2/train_fastspeech2.py", line 400, in main() File "examples/fastspeech2/train_fastspeech2.py", line 316, in main mel_length_threshold=mel_length_threshold, File "/home/speechlab/TensorflowTTS/examples/fastspeech2/fastspeech2_dataset.py", line 104, in init ), f"Number of charactor, mel, duration, f0 and energy files are different" AssertionError: Number of charactor, mel, duration, f0 and energy files are different
How do I solve this problem? Can anybody help me? Thanks a lot!

You can also download these extracted durations (at 40k steps) at the link. Then put them in the appropriate folders.
You can refer to the folder placement here: Step 4: Extract duration from alignments for FastSpeech

machineko · 2020-08-14T13:01:00Z

@Hymnhyz If you're using your own dataset, you need to retrain taco2 (Tacotron2) for duration extraction (otherwise it works pretty badly).

mataym · 2020-08-18T18:26:07Z

@manmay-nakhashi again, this bug is caused only by a mismatch between the durations and the mel length. Please check for yourself that sum(duration) equals len(mel) (for both raw-feats and norm-feats), and also make sure every element in each duration file is positive. You should check which duration files you are using for training: the durations extracted from the TextGrids, or the durations after fixing the mismatch. In your log, the bug is here (https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/models/fastspeech2.py#L181), which means you should check len(ids), len(f0s) and len(energys). In (https://github.com/TensorSpeech/TensorFlowTTS/blob/master/examples/fastspeech2/fastspeech2_dataset.py#L158) please add the code below:
assert len(charactor) == len(f0) == len(energy)
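A minimal sketch of the checks suggested above, as a standalone helper. It assumes the character IDs, mel, durations, f0 and energy of one utterance are already loaded as NumPy arrays, and that f0 and energy are stored per character (which is what the suggested assert implies). `check_sample` is a hypothetical name, not part of TensorFlowTTS; the `len(ids) == len(durations)` check is added because FastSpeech predicts one duration per input token.

```python
import numpy as np


def check_sample(ids, mel, durations, f0, energy):
    """Return a list of problems found for one utterance (empty if none).

    Hypothetical helper: assumes per-character f0/energy, as implied by the
    suggested `assert len(charactor) == len(f0) == len(energy)`.
    """
    durations = np.asarray(durations)
    problems = []
    # Durations are frame counts per character, so they must add up to the
    # number of mel frames.
    total = int(durations.sum())
    if total != len(mel):
        problems.append(f"sum(duration)={total} != len(mel)={len(mel)}")
    # No duration may be negative.
    if np.any(durations < 0):
        problems.append("negative duration value")
    # One duration, one f0 and one energy value per input character.
    if not (len(ids) == len(durations) == len(f0) == len(energy)):
        problems.append(
            f"len(ids)={len(ids)}, len(duration)={len(durations)}, "
            f"len(f0)={len(f0)}, len(energy)={len(energy)}"
        )
    return problems
```

Run it on the raw-feats and norm-feats mels separately, since the advice above asks for both to match.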

I checked the duration files with sum(duration) == len(mel) for all files; there is no problem.
I checked that every duration file is positive (>= 0); there is also no problem.
Then, following your suggestion, I put assert(len(charactor) == len(f0) == len(energy)) in the fastspeech2_dataset generator function, and it raised an error for every file; none of the files passed this assert.
I suspect the problem lies in the generation of the TextGrid files. Because I don't have an acoustic model for my working language, I can't use the mfa_align script (bin/mfa_align corpus_directory dictionary_path acoustic_model_path output_directory). I used the mfa_train_and_align script (bin/mfa_train_and_align corpus_directory dictionary_path output_directory) instead, because it can generate TextGrid files without an acoustic model. All other steps are the same as in the official instructions.
I really don't know how to solve this problem.
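For context on how TextGrid alignments turn into duration files, here is a minimal sketch that converts phone intervals (start/end times in seconds, as in an MFA TextGrid phone tier) into frame durations. The sample rate of 22050 Hz and hop size of 256 are assumptions (common LJSpeech settings), and `intervals_to_durations` is a hypothetical helper, not the repository's extraction script. Rounding the cumulative boundaries, rather than each interval length, keeps the sum equal to the total frame count, but a phone shorter than roughly one hop can end up with a duration of 0.

```python
import numpy as np


def intervals_to_durations(intervals, sample_rate=22050, hop_size=256):
    """Convert (start_sec, end_sec) phone intervals into frame durations.

    Assumes contiguous intervals sorted by time; sample_rate and hop_size
    are assumed values and must match your feature extraction config.
    """
    frames_per_sec = sample_rate / hop_size
    # Round each boundary once so neighbouring phones share the same frame edge.
    boundaries = [round(intervals[0][0] * frames_per_sec)]
    for _, end in intervals:
        boundaries.append(round(end * frames_per_sec))
    # Differences of consecutive boundaries; a very short phone can get 0.
    return np.diff(np.array(boundaries, dtype=np.int32))


print(intervals_to_durations([(0.0, 0.1), (0.1, 0.105), (0.105, 0.5)]))
```

With the assumed settings, the 5 ms middle phone above gets a duration of 0. The total can also differ by a frame or so from len(mel), depending on how the audio was padded during feature extraction.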

dathudeptrai · 2020-08-31T10:11:37Z

This bug happens because there are zero values in the durations :))). So the condition is not >= 0; it should be > 0.
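A small sketch for locating those zero entries and testing the stricter "> 0" condition on one duration array. Both function names are hypothetical.

```python
import numpy as np


def zero_duration_indices(durations):
    """Return the indices of characters whose duration is exactly 0."""
    return np.flatnonzero(np.asarray(durations) == 0)


def all_positive(durations):
    """True only if every duration is strictly greater than 0."""
    return bool(np.all(np.asarray(durations) > 0))


print(zero_duration_indices([1, 0, 2, 7, 1]))  # [1]
print(all_positive([1, 0, 2, 7, 1]))           # False
```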

gbaian10 · 2020-09-01T12:18:35Z

@mataym can you check the 2 things below:

sum(duration) == len(mel) in all files.

all elements in each duration file are positive (>= 0)

@dathudeptrai If my sum(duration) != len(mel), how can I fix it?
I use LJSpeech and your Tacotron2-extracted durations from Google Drive.

machineko · 2020-09-01T14:55:16Z

@gbaian10 Just pad the data or change the duration value of the last token to match len(mel).
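A minimal sketch of the "change the last duration" suggestion: absorb the difference between len(mel) and sum(duration) into one token. The `index` parameter reflects the later remark in the thread that the +1 can go on any element. `match_mel_length` is a hypothetical helper and is not the repository's fix_mismatch.py.

```python
import numpy as np


def match_mel_length(durations, mel_len, index=-1):
    """Shift the duration at `index` so that sum(durations) == mel_len.

    Hypothetical helper; raises if the adjusted value would go negative.
    """
    durations = np.array(durations, dtype=np.int32)  # copy, input is untouched
    diff = mel_len - int(durations.sum())
    durations[index] += diff
    if durations[index] < 0:
        raise ValueError("mismatch too large to absorb in a single token")
    return durations


print(match_mel_length([2, 3, 4], 10))  # [2 3 5]
```

This is only sensible for small mismatches like the off-by-one reported below; a large difference usually points to misaligned files rather than rounding.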

gbaian10 · 2020-09-01T15:15:59Z

@machineko I don't know how to do it, but I found that their lengths differ by only 1.

machineko · 2020-09-01T15:18:17Z

@gbaian10 Add 1 to the last element in the duration array, or just follow https://github.com/TensorSpeech/TensorFlowTTS/blob/master/examples/mfa_extraction/fix_mismatch.py

gbaian10 · 2020-09-01T15:20:23Z

+1 to any element?

machineko · 2020-09-01T15:21:55Z

@gbaian10 Yes, you can add it to any element.

gbaian10 · 2020-09-02T11:52:27Z

@dathudeptrai
Does that include index 0? I have run "fix_mismatch.py".

I downloaded your duration files and ran the fix.
I checked all the duration files with sum(duration) == len(mel).
"fix_mismatch.py" added +1 to the last element.
But in some of the .npy files the element at index 0 is still 0, and even some elements in the middle are still 0.
How can I fix it?

dathudeptrai · 2020-09-02T14:05:31Z

Can you post a sample .npy that includes zero values here?

gbaian10 · 2020-09-02T14:21:19Z

How can I send it to you?

dathudeptrai · 2020-09-02T15:05:20Z

Just paste the numpy array here (print it and paste the output).

gbaian10 · 2020-09-02T16:02:45Z

np.load("LJ001-0001-durations.npy")
array([ 1, 5, 4, 6, 7, 6, 15, 3, 9, 18, 2, 8, 1, 2, 2, 3, 5,
7, 8, 6, 4, 5, 8, 9, 13, 4, 3, 4, 3, 4, 2, 2, 4, 2,
2, 4, 7, 3, 3, 4, 5, 5, 4, 5, 5, 3, 4, 5, 3, 5, 2,
7, 5, 6, 5, 2, 2, 6, 1, 7, 7, 13, 8, 8, 5, 7, 10, 20,
11, 6, 4, 3, 3, 13, 6, 9, 10, 4, 4, 2, 3, 7, 9, 9, 7,
5, 3, 4, 6, 6, 6, 7, 4, 3, 5, 3, 3, 7, 7, 9, 6, 6,
1, 1, 3, 6, 8, 5, 6, 2, 4, 3, 1, 4, 4, 4, 6, 11, 9,
10, 2, 6, 5, 10, 5, 3, 6, 4, 6, 4, 4, 5, 4, 9, 5, 5,
1, 2, 2, 3, 3, 10, 8, 6, 4, 5, 6, 5, 3, 6, 25],
dtype=int32)
np.load("LJ001-0002-durations.npy")
array([ 1, 8, 2, 5, 5, 6, 5, 1, 3, 4, 4, 2, 7, 10, 5, 7, 6,
8, 3, 4, 6, 5, 4, 6, 11, 8, 3, 8, 6, 12], dtype=int32)
np.load("LJ001-0003-durations.npy")
array([ 1, 3, 6, 7, 4, 4, 3, 4, 2, 4, 3, 3, 3, 3, 0, 1, 3,
7, 6, 11, 4, 15, 4, 2, 7, 5, 4, 4, 3, 2, 9, 2, 7, 3,
6, 5, 3, 2, 2, 8, 10, 10, 10, 5, 2, 2, 4, 5, 8, 5, 2,
3, 6, 6, 11, 9, 8, 6, 32, 8, 6, 8, 4, 8, 12, 1, 6, 4,
4, 4, 3, 3, 5, 5, 7, 6, 4, 6, 7, 5, 2, 3, 7, 8, 8,
4, 6, 3, 8, 5, 1, 5, 4, 3, 6, 6, 3, 5, 3, 1, 1, 1,
3, 3, 7, 5, 4, 6, 10, 5, 3, 2, 11, 3, 4, 5, 3, 3, 1,
2, 1, 6, 5, 7, 4, 1, 3, 4, 8, 12, 1, 8, 8, 2, 20, 10,
8, 8, 1, 3, 6, 7, 4, 4, 5, 4, 3, 4, 5, 2, 15, 8, 12,
7, 20], dtype=int32)
np.load("LJ001-0004-durations.npy")
array([ 1, 4, 3, 5, 11, 8, 4, 5, 4, 4, 4, 0, 5, 3, 5, 9, 4,
4, 5, 4, 4, 8, 8, 10, 16, 5, 11, 1, 2, 4, 3, 3, 2, 2,
2, 3, 5, 2, 2, 3, 8, 6, 2, 9, 7, 5, 4, 6, 4, 1, 3,
6, 2, 8, 4, 7, 6, 8, 4, 5, 5, 10, 2, 5, 4, 2, 2, 1,
2, 2, 5, 6, 3, 10, 5, 5, 5, 3, 5, 5, 5, 6, 5, 3, 6,
4, 8, 11, 16], dtype=int32)
np.load("LJ001-0005-durations.npy")
array([ 1, 1, 3, 2, 6, 4, 7, 5, 7, 3, 2, 5, 4, 3, 4, 3, 3,
9, 6, 6, 5, 4, 4, 3, 5, 9, 5, 7, 3, 2, 5, 7, 6, 6,
2, 4, 13, 3, 6, 2, 4, 2, 0, 1, 3, 2, 5, 5, 4, 2, 3,
4, 3, 3, 2, 3, 2, 1, 2, 3, 6, 5, 6, 10, 5, 5, 5, 4,
1, 4, 5, 6, 7, 4, 3, 8, 20, 15, 12, 3, 3, 6, 7, 10, 4,
7, 7, 5, 5, 5, 3, 7, 3, 2, 9, 5, 9, 5, 2, 3, 10, 6,
24, 9, 4, 5, 1, 3, 3, 5, 5, 3, 6, 7, 7, 4, 2, 8, 1,
3, 5, 1, 3, 1, 2, 3, 7, 5, 6, 5, 2, 3, 2, 3, 7, 3,
8, 4, 5, 8, 7, 1, 13], dtype=int32)
np.load("LJ001-0006-durations.npy")
array([ 1, 17, 12, 22, 2, 3, 2, 6, 3, 5, 4, 6, 7, 7, 0, 5, 5,
3, 7, 3, 4, 5, 7, 4, 5, 5, 4, 5, 9, 11, 5, 6, 14, 5,
10, 22, 2, 3, 12, 2, 13, 6, 2, 4, 4, 2, 3, 9, 8, 10, 9,
6, 3, 3, 12, 4, 5, 1, 8, 13, 7, 5, 3, 3, 12, 7, 16, 4,
3, 5, 5, 5, 11, 20], dtype=int32)
np.load("LJ001-0007-durations.npy")
array([ 1, 0, 2, 7, 1, 7, 9, 7, 5, 9, 5, 5, 5, 6, 3, 14, 7,
7, 9, 3, 4, 5, 7, 5, 5, 3, 3, 3, 5, 2, 4, 8, 6, 6,
6, 2, 4, 4, 6, 6, 17, 9, 6, 18, 11, 0, 11, 0, 2, 5, 10,
10, 8, 3, 3, 6, 4, 15, 5, 1, 14, 12, 6, 11, 6, 8, 6, 5,
5, 5, 6, 8, 4, 6, 6, 16, 4, 4, 2, 9, 14, 6, 6, 9, 21,
5, 4, 2, 5, 5, 8, 3, 5, 4, 5, 5, 5, 3, 9, 8, 7, 5,
3, 6, 5, 5, 8, 4, 6, 7, 21, 8, 1, 13], dtype=int32)
np.load("LJ001-0008-durations.npy")
array([ 1, 8, 0, 8, 6, 6, 5, 2, 3, 7, 4, 2, 2, 7, 4, 6, 6,
5, 5, 27, 8, 2, 15, 4, 12], dtype=int32)

np.load("LJ001-0009-durations.npy")
array([ 0, 5, 5, 7, 9, 4, 5, 1, 1, 3, 5, 5, 16, 7, 3, 1, 6,
3, 3, 4, 11, 7, 6, 3, 6, 9, 4, 7, 10, 2, 9, 24, 16, 3,
3, 3, 5, 3, 4, 6, 5, 3, 6, 10, 8, 4, 3, 1, 7, 2, 2,
5, 7, 3, 2, 3, 3, 9, 7, 10, 5, 9, 4, 4, 4, 7, 8, 7,
3, 4, 3, 4, 6, 7, 10, 6, 18, 24, 8, 11, 4, 6, 4, 8, 5,
8, 2, 5, 5, 5, 10, 7, 7, 5, 5, 6, 6, 7, 7, 14, 6, 6,
27, 1], dtype=int32)
np.load("LJ001-0036-durations.npy")
array([18, 5, 3, 30, 6, 7, 6, 5, 3, 0, 4, 0, 3, 6, 8, 11, 6,
2, 5, 5, 7, 10, 19, 16, 21, 8, 5, 5, 6, 10, 9, 4, 2, 9,
2, 4, 7, 4, 9, 11, 7, 6, 4, 14, 6, 13, 6, 6, 4, 17, 2,
4, 3, 3, 4, 7, 3, 9, 4, 6, 8, 4, 3, 2, 4, 6, 5, 15,
1, 5, 3, 2, 1, 3, 4, 2, 3, 5, 9, 4, 3, 8, 4, 5, 9,
4, 4, 8, 5, 6, 6, 7, 11, 7, 7, 20, 1], dtype=int32)
np.load("LJ001-0048-durations.npy")
array([ 0, 4, 9, 5, 5, 5, 5, 3, 2, 6, 4, 10, 3, 10, 15, 4, 6,
3, 5, 3, 4, 8, 7, 5, 6, 6, 7, 5, 4, 8, 9, 4, 6, 4,
7, 9, 6, 6, 4, 8, 10, 18, 11, 6, 8, 6, 12, 4, 4, 5, 6,
4, 7, 5, 6, 1, 5, 6, 3, 6, 4, 6, 4, 5, 7, 5, 5, 10,
2, 8, 4, 4, 5, 4, 4, 4, 6, 3, 4, 3, 1, 4, 5, 12, 5,
4, 3, 6, 6, 16, 3, 19, 2], dtype=int32)
np.load("LJ001-0051-durations.npy")
array([10, 7, 6, 4, 7, 7, 4, 3, 5, 2, 4, 3, 3, 4, 6, 5, 0,
6, 2, 5, 8, 5, 3, 2, 5, 10, 2, 5, 3, 1, 2, 1, 4, 3,
6, 6, 10, 1, 6, 4, 2, 6, 11, 7, 31, 2, 6, 6, 14, 3, 6,
5, 4, 4, 6, 6, 10, 12, 6, 6, 5, 3, 5, 2, 5, 4, 4, 7,
5, 7, 7, 9, 6, 17, 1], dtype=int32)
np.load("LJ001-0064-durations.npy")
array([ 5, 3, 12, 8, 3, 4, 3, 1, 3, 4, 4, 7, 5, 4, 5, 15, 9,
5, 8, 1, 0, 5, 4, 5, 14, 9, 3, 2, 4, 2, 4, 5, 2, 3,
3, 2, 9, 4, 1, 4, 2, 1, 4, 0, 3, 5, 9, 7, 7, 11, 8,
7, 7, 7, 1, 8, 10, 8, 5, 20, 23, 2, 5, 3, 3, 3, 2, 3,
4, 3, 5, 4, 9, 6, 9, 7, 3, 5, 4, 5, 5, 4, 5, 3, 5,
6, 4, 7, 5, 6, 8, 17, 1], dtype=int32)
np.load("LJ001-0100-durations.npy")
array([ 2, 3, 3, 3, 2, 3, 7, 7, 4, 12, 7, 6, 4, 2, 4, 2, 3,
6, 4, 11, 7, 12, 2, 12, 8, 7, 3, 4, 6, 6, 7, 28, 7, 10,
3, 13, 8, 4, 3, 4, 5, 4, 5, 2, 7, 5, 4, 6, 1, 5, 1,
2, 2, 4, 5, 3, 8, 8, 3, 3, 4, 5, 4, 13, 5, 5, 10, 3,
20, 12, 2, 4, 3, 3, 3, 3, 2, 1, 0, 4, 3, 5, 8, 7, 4,
5, 5, 6, 4, 4, 13, 4, 6, 7, 2, 4, 3, 4, 3, 5, 4, 6,
4, 2, 4, 6, 4, 23, 10, 3, 5, 6, 3, 2, 3, 6, 2, 10, 4,
4, 4, 2, 10, 5, 4, 4, 5, 9, 4, 42, 0, 1], dtype=int32)
np.load("LJ001-0120-durations.npy")
array([ 0, 7, 2, 2, 3, 3, 5, 6, 11, 6, 4, 5, 5, 11, 10, 7, 39,
4, 12, 2, 5, 11, 2, 6, 5, 3, 4, 9, 8, 5, 6, 5, 4, 3,
6, 2, 6, 7, 12, 3, 5, 3, 5, 3, 4, 5, 4, 4, 3, 4, 5,
7, 6, 6, 4, 15, 11, 9, 14, 7, 12, 17, 3, 6, 4, 5, 6, 6,
6, 5, 4, 7, 1, 9, 5, 4, 2, 4, 4, 4, 6, 5, 5, 11, 5,
5, 2, 5, 4, 4, 3, 3, 2, 2, 4, 5, 10, 2, 2, 4, 18, 1],
dtype=int32)

dathudeptrai · 2020-09-03T09:33:14Z

@gbaian10 can you count how many samples have zero values? If there aren't many, you can ignore those samples :D

gbaian10 · 2020-09-03T09:56:28Z

Out of 12445 training samples, 7237 files have zero values.
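A minimal sketch of how such a count can be produced over a folder of duration files. The `*-durations.npy` pattern follows the file names printed in this thread (e.g. LJ001-0001-durations.npy); the directory layout and the function name are assumptions.

```python
import glob
import os

import numpy as np


def count_files_with_zero_durations(duration_dir, pattern="*-durations.npy"):
    """Return (files_with_zero, total_files) for duration files in a folder.

    The glob pattern is an assumption based on the file names in this
    thread; adjust it to your own dump layout.
    """
    paths = sorted(glob.glob(os.path.join(duration_dir, pattern)))
    with_zero = sum(1 for p in paths if np.any(np.load(p) == 0))
    return with_zero, len(paths)
```

For the numbers above, 7237 / 12445 is about 58% of the training files, so dropping them all would discard more than half the data.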

dathudeptrai · 2020-09-03T10:01:54Z

@gbaian10 can you post the error you get when training FastSpeech2?

dathudeptrai · 2020-09-03T10:05:06Z

@gbaian10 Sorry, I think everything is OK: even when the duration of some characters is zero, the shapes still match :)). I will close the issue now; please open a new issue if needed.
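Why zero durations do not break the shapes: FastSpeech-style models expand each encoder output by its duration (the length regulator). The NumPy sketch below illustrates that expansion and is not TensorFlowTTS's implementation. A zero entry simply drops that character from the expanded sequence, while the output length stays sum(durations), which equals len(mel) once the mismatch is fixed.

```python
import numpy as np


def length_regulate(encoder_outputs, durations):
    """Repeat each encoder frame durations[i] times along the time axis.

    encoder_outputs: [num_chars, hidden]; durations: [num_chars] ints >= 0.
    """
    return np.repeat(encoder_outputs, durations, axis=0)


hidden = np.arange(8, dtype=np.float32).reshape(4, 2)  # 4 characters
durations = np.array([1, 0, 2, 3])                     # one zero duration
expanded = length_regulate(hidden, durations)
print(expanded.shape)  # (6, 2): sum(durations) frames, character 1 skipped
```

The per-character inputs (ids, durations, f0, energy) still have length 4 here, so a check like len(charactor) == len(f0) == len(energy) is unaffected by the zero.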

dathudeptrai self-assigned this Sep 2, 2020

dathudeptrai added the "bug 🐛" (Something isn't working) and "question ❓" (Further information is requested) labels Sep 2, 2020

dathudeptrai closed this as completed Sep 3, 2020
