Spectrogram Loss Value is NaN #73

Closed
kin0303 opened this issue Dec 28, 2022 · 11 comments

kin0303 commented Dec 28, 2022

I'm trying to do some training and found that the spectrogram loss is NaN. After reading the README again, I found in the FAQ section https://github.com/DigitalPhonetics/IMS-Toucan#faq-:~:text=Loss%20turns%20to,use%20for%20TTS. that I should try using the scorer. I did it like this:

  1. I ran python3 run_training_pipeline.py integration_test --gpu_id 0, but even now the result is still NaN and I can't find the file best.pt
  2. After that I ran python3 run_scorer.py

Are these steps correct? I'm trying to train on 1000 LJSpeech samples. What should I do so that the spectrogram loss value is not NaN? For information, I'm using batch size 8 and lr=0.001.
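
For anyone hitting the same symptom, here is a minimal, hypothetical sketch of guarding an update step against non-finite losses; this is not IMS-Toucan's actual training loop, and model, batch, and optimizer are placeholders:

```python
import torch

def training_step(model, batch, optimizer, max_grad_norm=1.0):
    """Hypothetical training step that refuses to apply a non-finite update."""
    optimizer.zero_grad()
    loss = model(**batch)  # placeholder forward pass returning a scalar loss
    if not torch.isfinite(loss):
        # Skip the update instead of poisoning the weights; inspect this batch.
        print("non-finite loss, skipping batch")
        return None
    loss.backward()
    # Gradient clipping is a common safeguard against exploding gradients.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()
    return loss.item()
```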

kin0303 changed the title from "Training using several languages" to "Spectrogram Loss Value is NaN" on Jan 2, 2023

Flux9665 (Collaborator) commented Jan 2, 2023

Have you added your data to those pipelines, is the dataset cache properly created, and are you using the pretrained models? What exactly is the configuration that you are running?

kin0303 (Author) commented Jan 3, 2023

Have you added your data to those pipelines, is the dataset cache properly created, and are you using the pretrained models?

Yes, I added it, and I'm using the pretrained models.

[Screenshot from 2023-01-03 08-44-51: the dataset integration in the training pipeline]

What exactly is the configuration that you are running?

python3 run_training_pipeline.py integration_test --gpu_id 0

kin0303 (Author) commented Jan 3, 2023

[comment content not captured in this copy of the page]

Flux9665 (Collaborator) commented Jan 4, 2023

The way you integrated the data into the pipeline looks good; I don't see an issue there. For LJSpeech, data cleaning with the scorer should not be necessary, because the data is already pretty clean, so I suspect that the problem is not in the data but that there is a mistake somewhere else. The hyperparameters are meant for testing, not necessarily to get good results, but the loss should not become NaN even with the settings of the integration test.

You're right about the acoustic_model missing; that part of the documentation is outdated. I will fix it in the next version. The acoustic model is now detected and loaded automatically.

if os.path.exists(os.path.join(MODELS_DIR, "Aligner", "aligner.pt")):
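
As a rough sketch of that detect-and-load pattern (the MODELS_DIR value and the loading details here are assumptions, not the repo's exact code):

```python
import os
import torch

MODELS_DIR = "Models"  # assumed value; the repo defines the real location

# If the aligner checkpoint exists on disk, load it automatically;
# otherwise fall back to telling the user it has to be obtained first.
aligner_checkpoint = os.path.join(MODELS_DIR, "Aligner", "aligner.pt")
if os.path.exists(aligner_checkpoint):
    # loading on CPU first keeps the check independent of GPU availability
    checkpoint = torch.load(aligner_checkpoint, map_location="cpu")
    print(f"found and loaded aligner checkpoint at {aligner_checkpoint}")
else:
    print("no aligner checkpoint found; train or download one first")
```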

I'm not sure where the problem lies, but you could try using this pipeline instead of the testing pipeline:

https://github.com/DigitalPhonetics/IMS-Toucan/blob/ControllableMultilingual/TrainingInterfaces/TrainingPipelines/FastSpeech2_Controllable.py

The documentation is pretty outdated; I have been very sick for a long time recently and am still recovering, so everything is a bit behind at the moment. When I'm better, I'll get back to updating the docs and prepare a new release.

kin0303 (Author) commented Jan 5, 2023

I have been very sick for a long time recently and still recovering

I hope you get better soon.

I'm not sure where the problem lies, but you could try using this pipeline instead of the testing pipeline:
https://github.com/DigitalPhonetics/IMS-Toucan/blob/ControllableMultilingual/TrainingInterfaces/TrainingPipelines/FastSpeech2_Controllable.py

I'll try this and report back.

kin0303 (Author) commented Jan 6, 2023

I'm not sure where the problem lies, but you could try using this pipeline instead of the testing pipeline:
https://github.com/DigitalPhonetics/IMS-Toucan/blob/ControllableMultilingual/TrainingInterfaces/TrainingPipelines/FastSpeech2_Controllable.py

The result is still the same.

@Flux9665 (Collaborator)

Is the loss already NaN at the first step, or does it turn to NaN over time?

kin0303 (Author) commented Jan 18, 2023

Is the loss already NaN at the first step, or does it turn to NaN over time?

The loss is NaN at the first step.
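
A NaN at the very first step usually points to the inputs or the first forward pass rather than slow divergence. As a hypothetical sketch of two quick checks (the dict-of-tensors batch format is an assumption, not IMS-Toucan's actual collate format):

```python
import torch

# makes backward() raise an error at the exact op that produced a NaN
torch.autograd.set_detect_anomaly(True)

def check_batch(batch):
    """Scan a batch for non-finite inputs before the forward pass.

    Assumes the batch is a dict of tensors; adapt to the real collate format.
    """
    for name, value in batch.items():
        if torch.is_tensor(value) and value.is_floating_point():
            if not torch.isfinite(value).all():
                print(f"non-finite values in '{name}'")
```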

@Flux9665 (Collaborator)

Then it really sounds like there is a bad datapoint in the dataset that causes this problem, maybe a complete mismatch of text and audio. Have you checked for your subset of LJSpeech that the texts and audios you are using actually match? Maybe there was a small mistake somewhere and the indices of text and audio have shifted.

If everything seems alright with the data and there are no obvious mismatches of text and audio, have you tried the scorer again? Was there still a problem that kept you from using it with the pretrained multilingual model?
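
One way to rule out such a mismatch is a quick scan over the corpus. Here is a hedged sketch for an LJSpeech-style layout (the paths, the pipe-separated metadata format, and the soundfile dependency are assumptions, not something from this thread):

```python
# Hypothetical sanity check: verify that every transcript line points to a
# readable, non-empty, all-finite audio file in an LJSpeech-style corpus.
import os

import numpy as np
import soundfile as sf  # assumed dependency

corpus_dir = "LJSpeech-1.1"  # assumed location of the corpus
with open(os.path.join(corpus_dir, "metadata.csv"), encoding="utf-8") as f:
    for line in f:
        parts = line.rstrip("\n").split("|")
        file_id, text = parts[0], parts[-1]
        wav_path = os.path.join(corpus_dir, "wavs", file_id + ".wav")
        if not os.path.exists(wav_path):
            print(f"missing audio for {file_id}")
            continue
        audio, sample_rate = sf.read(wav_path)
        if len(audio) == 0 or not np.isfinite(audio).all():
            print(f"empty or non-finite audio: {file_id}")
        if not text.strip():
            print(f"empty transcript: {file_id}")
```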

kin0303 (Author) commented Jan 25, 2023

Then it really sounds like there is a bad datapoint in the dataset that causes this problem [...] have you tried the scorer again?

After I switched to another computer, I didn't have this problem anymore. I don't know what the cause was; the data I used is exactly the same.

@Flux9665 (Collaborator)

It would be interesting to know what caused this, but I'm happy to hear that it works now!
