fastspeech2 training error #203
Run fix mismatch to fix the few-frame difference between the audio and duration files and save the durations into the dump directory (it's the last step in the MFA example): https://github.com/TensorSpeech/TensorFlowTTS/tree/master/examples/mfa_extraction
python examples/mfa_extraction/fix_mismatch.py \
  --base_path ./dump \
  --trimmed_dur_path ./dataset/trimmed-durations \
  --dur_path ./dataset/durations
@mataym make sure all duration/character/mel/f0/energy files are in
After I fixed the durations with the mfa_extraction script, I ran the model training script (train_fastspeech2.py), but a fatal error occurred:
Layer (type)                 Output Shape              Param #
embeddings (TFFastSpeechEmbe multiple                  844032
encoder (TFFastSpeechEncoder multiple                  11814400
length_regulator (TFFastSpee multiple                  0
decoder (TFFastSpeechDecoder multiple                  12601216
mel_before (Dense)           multiple                  30800
postnet (TFTacotronPostnet)  multiple                  4352400
f0_predictor (TFFastSpeechVa multiple                  493313
energy_predictor (TFFastSpee multiple                  493313
duration_predictor (TFFastSp multiple                  493313
f0_embeddings (Conv1D)       multiple                  3840
dropout_32 (Dropout)         multiple                  0
energy_embeddings (Conv1D)   multiple                  3840
dropout_33 (Dropout)         multiple                  0
Total params: 31,130,467
[train]:   0%|          | 0/200000 [00:00<?, ?it/s]
2020-08-12 11:13:24.628420: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1345] No whitelist ops found, nothing to do
Errors may have originated from an input operation.
Input Source operations connected to node tf_fast_speech2/add_1:
Function call stack:
[train]:   0%|          | 0/200000 [01:03<?, ?it/s]
@mataym can you check the two things below:
@mataym can you show me the structure of your files in
The structure of the dump folder in my workspace is as follows:
@mataym in your
Upload the duration and norm-feats files somewhere and send them to me, and I'll check them locally. Or just run
Thanks. raw-feats.npy in your code should be changed to norm_feats.npy. After I ran the program, the assert passed. Does that mean data preprocessing is OK?
Yes, it is. Are you using some sort of debugger? If so, check the values in the debugger right before training breaks.
In the fix_mismatch.py script, what is --trimmed_dur_path ./trimmed-durations? I don't have a ./trimmed-durations folder anyway.
I don't know where you saved the trimmed durations; that's up to you. If you just follow the MFA extraction steps, everything is in the dataset/ (or libritts) folder.
How do I generate the trimmed-durations folder in this project?
@mataym see here (https://github.com/TensorSpeech/TensorFlowTTS/blob/master/preprocess/preprocess_libritts.yaml#L19). You should add
@dathudeptrai I got the same error in FastSpeech2 when attempting symbol-based training on my dataset.
|
@manmay-nakhashi did you follow the instructions? Please add
@dathudeptrai still the same error.
@manmay-nakhashi again, this bug is only caused by a mismatch between the duration and the mel length. Check for yourself whether the sum of the durations equals len(mel) (for both raw-feats and norm-feats), and make sure every element in each duration file is positive. Also check which duration files you are using for training: durations extracted from the TextGrids, or durations after fix mismatch. In your log, the bug is here (https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/models/fastspeech2.py#L181), which means you should check len(ids), len(f0s) and len(energys). In (https://github.com/TensorSpeech/TensorFlowTTS/blob/master/examples/fastspeech2/fastspeech2_dataset.py#L158) please add the code below:
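The checks listed above can also be scripted outside the training loop. A minimal sketch, assuming you have already loaded one utterance's arrays with np.load (file names depend on your dump layout) and that f0/energy are stored frame-level, one value per mel frame, before being averaged by duration. check_sample is a hypothetical helper, not part of TensorFlowTTS:

```python
import numpy as np


def check_sample(ids, durations, mel, f0, energy):
    """Return a list of problems found for one utterance (empty list = OK).

    Assumed shapes (a sketch, not TensorFlowTTS's own checks):
      ids, durations: 1-D, one entry per character/phoneme token
      mel:            2-D, [frames, n_mels]; run once with raw-feats and
                      once with norm-feats
      f0, energy:     1-D frame-level contours, before averaging by duration
    """
    ids = np.asarray(ids)
    durations = np.asarray(durations)
    n_frames = np.asarray(mel).shape[0]
    problems = []
    if len(ids) != len(durations):
        problems.append(f"len(ids)={len(ids)} != len(durations)={len(durations)}")
    if int(durations.sum()) != n_frames:
        problems.append(f"sum(durations)={int(durations.sum())} != len(mel)={n_frames}")
    for name, feat in (("f0", f0), ("energy", energy)):
        if len(feat) != n_frames:
            problems.append(f"len({name})={len(feat)} != len(mel)={n_frames}")
    # The advice above is that every duration should be positive.
    if np.any(durations <= 0):
        problems.append("non-positive value(s) in durations")
    return problems


ids = np.array([5, 12, 7])
durations = np.array([3, 2, 4])
mel = np.zeros((9, 80))
print(check_sample(ids, durations, mel, np.zeros(9), np.zeros(9)))  # []
```

Looping this over every utterance ID and printing only the non-empty results points directly at the files that break the shapes.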
@dathudeptrai OK, I'll check that. We are doing tf_average_by_duration for f0 and energy; do we also have to do it for the characters?
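Averaging by duration turns a frame-level contour into one value per token, so afterwards len(f0) and len(energy) equal len(durations), which is also len(ids). Character/phoneme ids are already one per token, so, assuming that is how the model consumes them, they need no averaging. A NumPy sketch of the idea (not TensorFlowTTS's tf_average_by_duration itself; skipping zero frames such as unvoiced f0 is an assumption of this sketch):

```python
import numpy as np


def average_by_duration(frames, durations):
    """Average a frame-level contour (f0 or energy) over each token's span.

    frames:    1-D array with sum(durations) entries, one per mel frame
    durations: 1-D int array, one entry per character/phoneme token
    Returns a float32 array with len(durations) entries. Zero-valued frames
    are skipped; a token whose span is empty or all zeros gets 0.0.
    """
    frames = np.asarray(frames, dtype=np.float32)
    durations = np.asarray(durations, dtype=np.int64)
    if frames.shape[0] != durations.sum():
        raise ValueError("sum(durations) must equal the number of frames")
    # Span boundaries: [0, d0, d0 + d1, ...]
    bounds = np.concatenate([[0], np.cumsum(durations)])
    out = np.zeros(len(durations), dtype=np.float32)
    for i, (start, end) in enumerate(zip(bounds[:-1], bounds[1:])):
        values = frames[start:end]
        values = values[values != 0.0]
        out[i] = values.mean() if values.size > 0 else 0.0
    return out


token_f0 = average_by_duration([100, 0, 120, 200, 210, 0], [3, 0, 2, 1])
print(token_f0)  # [110.   0. 205.   0.]
```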
In the ljspeech.py file, I can't understand the list valid_symbols[]. It has 84 elements; what are they? Are they English phones? As far as I know, English has only 39 phones.
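Assuming valid_symbols is the CMUdict ARPAbet list with lexical stress markers (the list used by the Tacotron cmudict module), the 84 entries are still the 39 phones: each of the 15 vowels appears bare and with stress digits 0, 1 and 2, while the 24 consonants appear once. A quick count:

```python
# ARPAbet as used by CMUdict: 15 vowels that can carry a stress digit and
# 24 consonants that never do (15 + 24 = 39 phones).
VOWELS = ["AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY",
          "IH", "IY", "OW", "OY", "UH", "UW"]
CONSONANTS = ["B", "CH", "D", "DH", "F", "G", "HH", "JH", "K", "L", "M", "N",
              "NG", "P", "R", "S", "SH", "T", "TH", "V", "W", "Y", "Z", "ZH"]

# Each vowel is listed bare and with stress 0 (none), 1 (primary), 2 (secondary).
symbols = list(CONSONANTS)
for v in VOWELS:
    symbols += [v, v + "0", v + "1", v + "2"]

print(len(VOWELS) + len(CONSONANTS), len(symbols))  # 39 84
```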
@mataym you just need to change the symbols here (https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/processor/ljspeech.py#L110) to your symbols :)). Note that
@mataym will you use phonemes or characters? You just need to make sure the symbols in the code cover all of your symbols :)).
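One way to verify that coverage before training is to collect every token in your transcripts and subtract the configured symbol list. A sketch with a hypothetical helper; how you load the transcripts and which symbol list you pass in are up to your setup:

```python
def uncovered_symbols(transcripts, valid_symbols, split=str.split):
    """Return the tokens that occur in transcripts but not in valid_symbols.

    transcripts:   iterable of strings, one per utterance
    valid_symbols: the symbol list configured for the processor
    split:         tokenizer; str.split for space-separated phonemes,
                   list for character-level input
    """
    known = set(valid_symbols)
    seen = set()
    for text in transcripts:
        seen.update(split(text))
    return seen - known


# Phoneme-level: "ER1" is missing from the symbol list.
print(uncovered_symbols(["HH AH0 L OW1", "W ER1 L D"],
                        ["HH", "AH0", "L", "OW1", "W", "D"]))  # {'ER1'}
# Character-level: pass split=list.
print(uncovered_symbols(["abc"], ["a", "b"], split=list))  # {'c'}
```

An empty set means every token in the data can be mapped to an id.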
You can also download these durations, extracted at 40k steps, from the link. Then put them in the appropriate folders.
@Hymnhyz If you're using your own dataset, you need to retrain Tacotron2 (taco2) for extraction (otherwise it works pretty badly).
|
This bug happens because there are zero values in the durations :))). So the condition is not >= 0; it should be > 0.
@dathudeptrai If my sum(duration) != len(mel), how can I fix it?
@gbaian10 Just pad the data or change the duration value of the last token to match len(mel).
@gbaian10 add +1 to the last element in the duration array, or just follow https://github.com/TensorSpeech/TensorFlowTTS/blob/master/examples/mfa_extraction/fix_mismatch.py
+1 to any element?
@gbaian10 yes, you can add it to any element.
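The adjustment suggested above can be written as a small function. This is a sketch of that advice (put the difference on the last token), not the logic of fix_mismatch.py; the backward walk for the "too many frames" case is an addition of this sketch so that no duration goes negative:

```python
import numpy as np


def match_duration_to_mel(durations, mel_len):
    """Return a copy of durations adjusted so that sum == mel_len.

    Too few frames: the missing frames are added to the last token.
    Too many frames: frames are removed starting from the last token and
    walking backwards, so no duration drops below zero.
    """
    durs = np.asarray(durations, dtype=np.int64).copy()
    if mel_len < 0 or np.any(durs < 0):
        raise ValueError("durations and mel_len must be non-negative")
    diff = int(mel_len) - int(durs.sum())
    if diff >= 0:
        durs[-1] += diff
        return durs
    excess = -diff
    for i in range(len(durs) - 1, -1, -1):
        take = min(excess, int(durs[i]))
        durs[i] -= take
        excess -= take
        if excess == 0:
            break
    return durs


print(match_duration_to_mel([3, 2, 4], 11))  # [3 2 6]
print(match_duration_to_mel([3, 2, 1], 4))   # [3 1 0]
```

As the second call shows, trimming can leave a token with duration 0.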
@dathudeptrai I downloaded your duration files and ran the fix.
Can you paste your sample .npy that includes zero values here?
How can I send it to you? |
Just paste the numpy array here (print it and paste the output).
|
@gbaian10 can you count how many samples have zero values? If there aren't many, you can ignore those samples :D
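Counting the affected files can be done with a few lines of NumPy. A sketch, assuming one .npy duration array per utterance in a single folder (the ./dataset/durations path is the --dur_path from the fix_mismatch command earlier in this thread; the glob pattern is an assumption):

```python
import glob
import os

import numpy as np


def zero_duration_files(duration_dir, pattern="*.npy"):
    """Return (paths_with_zero, n_files) for the duration arrays in duration_dir.

    pattern is an assumption: narrow it to your duration file naming if the
    folder also holds other .npy files.
    """
    paths = sorted(glob.glob(os.path.join(duration_dir, pattern)))
    with_zero = [p for p in paths if np.any(np.load(p) == 0)]
    return with_zero, len(paths)


bad, total = zero_duration_files("./dataset/durations")
print(f"{len(bad)} of {total} files contain a zero duration")
```

The returned list can also be used to drop those utterances if there are only a few.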
Out of 12445 training samples, a total of 7237 files have zero values.
@gbaian10 can you paste the error you get when training FastSpeech?
@gbaian10 sorry, I think everything is OK: even if the duration of some characters is zero, the shapes still match :)). I will close the issue now; please open a new issue if needed.
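Why a zero duration keeps the shapes consistent can be seen from the length regulator idea in FastSpeech: each token's hidden vector is repeated durations[i] times, so a token with duration 0 just contributes no frames and the output still has sum(durations) frames. A NumPy sketch of that idea (not TensorFlowTTS's TF implementation):

```python
import numpy as np


def length_regulate(token_features, durations):
    """Expand per-token features [tokens, dim] to frame level [frames, dim]
    by repeating row i durations[i] times. The result has sum(durations) rows.
    """
    token_features = np.asarray(token_features)
    durations = np.asarray(durations, dtype=np.int64)
    return np.repeat(token_features, durations, axis=0)


hidden = np.arange(8).reshape(4, 2)  # 4 tokens, 2 features each
frames = length_regulate(hidden, [2, 0, 1, 3])
print(frames.shape)  # (6, 2): token 1 (duration 0) produced no frames
```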
I have already created durations with MFA and also ran the two preprocessing scripts (tensorflow-tts-preprocess, tensorflow-tts-normalize) without errors. But when I ran the training script, the following error occurred:
2020-08-12 02:19:06,034 (train_fastspeech2:289) INFO: batch_size = 16
2020-08-12 02:19:06,034 (train_fastspeech2:289) INFO: remove_short_samples = True
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: allow_cache = True
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: mel_length_threshold = 32
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: is_shuffle = True
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: optimizer_params = {'initial_learning_rate': 0.001, 'end_learning_rate': 5e-05, 'decay_steps': 150000, 'warmup_proportion': 0.02, 'weight_decay': 0.001}
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: train_max_steps = 200000
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: save_interval_steps = 5000
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: eval_interval_steps = 500
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: log_interval_steps = 200
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: num_save_intermediate_results = 1
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: train_dir = ./dump/train/
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: dev_dir = ./dump/valid/
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: use_norm = True
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: f0_stat = ./dump/stats_f0.npy
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: energy_stat = ./dump/stats_energy.npy
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: outdir = ./examples/fastspeech2/exp/train.fastspeech2.v1/
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: config = ./examples/fastspeech2/conf/fastspeech2.v1.yaml
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: resume =
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: verbose = 1
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: mixed_precision = True
2020-08-12 02:19:06,035 (train_fastspeech2:289) INFO: version = 0.6.1
Traceback (most recent call last):
File "examples/fastspeech2/train_fastspeech2.py", line 400, in <module>
main()
File "examples/fastspeech2/train_fastspeech2.py", line 316, in main
mel_length_threshold=mel_length_threshold,
File "/home/speechlab/TensorflowTTS/examples/fastspeech2/fastspeech2_dataset.py", line 104, in __init__
), f"Number of charactor, mel, duration, f0 and energy files are different"
AssertionError: Number of charactor, mel, duration, f0 and energy files are different
How do I solve this problem? Can anybody help me? Thanks a lot!
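This assertion compares file counts, so a quick way to locate the culprit is to list, for each feature type, which utterance IDs are missing. A sketch with a hypothetical helper; the suffixes below are assumptions, so copy the real endings from the files in your ./dump/train/ folder, and make sure no suffix is a tail of another (e.g. "-f0.npy" would also match "-raw-f0.npy"):

```python
import glob
import os


def missing_features(dump_dir, suffixes):
    """Map each feature kind to the sorted utterance IDs that lack it.

    suffixes maps a kind to its filename ending, e.g. {"ids": "-ids.npy"}.
    Files are searched recursively under dump_dir; the utterance ID is the
    file name with the suffix removed.
    """
    ids_by_kind = {}
    for kind, suffix in suffixes.items():
        pattern = os.path.join(dump_dir, "**", "*" + suffix)
        paths = glob.glob(pattern, recursive=True)
        ids_by_kind[kind] = {os.path.basename(p)[: -len(suffix)] for p in paths}
    all_ids = set().union(*ids_by_kind.values())
    return {kind: sorted(all_ids - ids) for kind, ids in ids_by_kind.items()}


# Hypothetical suffixes: replace them with the names in your dump folder.
SUFFIXES = {
    "charactor": "-ids.npy",
    "mel": "-norm-feats.npy",
    "duration": "-durations.npy",
    "f0": "-raw-f0.npy",
    "energy": "-raw-energy.npy",
}
for kind, missing in missing_features("./dump/train/", SUFFIXES).items():
    print(kind, len(missing), missing[:5])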