Spectrogram Loss Value is NaN #73

Closed
kin0303 opened this issue Dec 28, 2022 · 11 comments

kin0303 commented Dec 28, 2022

I'm trying to do some training and found that the spectrogram loss is NaN. After reading the README again, I found in the FAQ section https://github.com/DigitalPhonetics/IMS-Toucan#faq-:~:text=Loss%20turns%20to,use%20for%20TTS. that I should try using the scorer. I did it like this:

  1. I ran python3 run_training_pipeline.py integration_test --gpu_id 0, but even now the result is still NaN and I can't find the file best.pt
  2. After that I ran python3 run_scorer.py

Are these steps correct? I'm trying to train on 1000 LJSpeech samples. What should I do so that the spectrogram loss value is not NaN? For information, I'm using batch size 8 and lr=0.001.
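
For anyone hitting the same symptom, here is a minimal, hypothetical sketch of guarding an update step against non-finite losses; this is not IMS-Toucan's actual training loop, and model, batch, and optimizer are placeholders:

```python
import torch

def training_step(model, batch, optimizer, max_grad_norm=1.0):
    """Hypothetical training step that refuses to apply a non-finite update."""
    optimizer.zero_grad()
    loss = model(**batch)  # placeholder forward pass returning a scalar loss
    if not torch.isfinite(loss):
        # Skip the update instead of poisoning the weights; inspect this batch.
        print("non-finite loss, skipping batch")
        return None
    loss.backward()
    # Gradient clipping is a common safeguard against exploding gradients.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()
    return loss.item()
```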

kin0303 changed the title from "Training using several languages" to "Spectrogram Loss Value is NaN" on Jan 2, 2023

Flux9665 (Collaborator) commented Jan 2, 2023

Have you added your data to those pipelines, is the dataset cache properly created, and are you using the pretrained models? What exactly is the configuration that you are running?

kin0303 (Author) commented Jan 3, 2023

Have you added your data to those pipelines, is the dataset cache properly created, and are you using the pretrained models?

Yes, I added it, and I'm using the pretrained models.

[Screenshot from 2023-01-03 08-44-51: the dataset integration in the training pipeline]

What exactly is the configuration that you are running?

python3 run_training_pipeline.py integration_test --gpu_id 0

kin0303 (Author) commented Jan 3, 2023

[comment content not captured in this copy of the page]

Flux9665 (Collaborator) commented Jan 4, 2023

The way you integrated the data into the pipeline looks good; I don't see an issue there. For LJSpeech, data cleaning with the scorer should not be necessary, because the data is already pretty clean, so I suspect that the problem is not in the data but that there is a mistake somewhere else. The hyperparameters are meant for testing, not necessarily to get good results, but the loss should not become NaN even with the settings of the integration test.

You're right about the acoustic_model missing; that part of the documentation is outdated. I will fix it in the next version. The acoustic model is now detected and loaded automatically.

if os.path.exists(os.path.join(MODELS_DIR, "Aligner", "aligner.pt")):
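
As a rough sketch of that detect-and-load pattern (the MODELS_DIR value and the loading details here are assumptions, not the repo's exact code):

```python
import os
import torch

MODELS_DIR = "Models"  # assumed value; the repo defines the real location

# If the aligner checkpoint exists on disk, load it automatically;
# otherwise fall back to telling the user it has to be obtained first.
aligner_checkpoint = os.path.join(MODELS_DIR, "Aligner", "aligner.pt")
if os.path.exists(aligner_checkpoint):
    # loading on CPU first keeps the check independent of GPU availability
    checkpoint = torch.load(aligner_checkpoint, map_location="cpu")
    print(f"found and loaded aligner checkpoint at {aligner_checkpoint}")
else:
    print("no aligner checkpoint found; train or download one first")
```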

I'm not sure where the problem lies, but you could try using this pipeline instead of the testing pipeline:

https://github.com/DigitalPhonetics/IMS-Toucan/blob/ControllableMultilingual/TrainingInterfaces/TrainingPipelines/FastSpeech2_Controllable.py

The documentation is pretty outdated; I have been very sick for a long time recently and am still recovering, so everything is a bit behind at the moment. When I'm better, I'll get back to updating the docs and prepare a new release.

kin0303 (Author) commented Jan 5, 2023

I have been very sick for a long time recently and still recovering

I hope you get better soon.

I'm not sure where the problem lies, but you could try using this pipeline instead of the testing pipeline:
https://github.com/DigitalPhonetics/IMS-Toucan/blob/ControllableMultilingual/TrainingInterfaces/TrainingPipelines/FastSpeech2_Controllable.py

I'll try this and report back.

kin0303 (Author) commented Jan 6, 2023

I'm not sure where the problem lies, but you could try using this pipeline instead of the testing pipeline:
https://github.com/DigitalPhonetics/IMS-Toucan/blob/ControllableMultilingual/TrainingInterfaces/TrainingPipelines/FastSpeech2_Controllable.py

The result is still the same.

@Flux9665 (Collaborator)

Is the loss already NaN at the first step, or does it turn to NaN over time?

kin0303 (Author) commented Jan 18, 2023

Is the loss already NaN at the first step, or does it turn to NaN over time?

The loss is NaN at the first step.
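
A NaN at the very first step usually points to the inputs or the first forward pass rather than slow divergence. As a hypothetical sketch of two quick checks (the dict-of-tensors batch format is an assumption, not IMS-Toucan's actual collate format):

```python
import torch

# makes backward() raise an error at the exact op that produced a NaN
torch.autograd.set_detect_anomaly(True)

def check_batch(batch):
    """Scan a batch for non-finite inputs before the forward pass.

    Assumes the batch is a dict of tensors; adapt to the real collate format.
    """
    for name, value in batch.items():
        if torch.is_tensor(value) and value.is_floating_point():
            if not torch.isfinite(value).all():
                print(f"non-finite values in '{name}'")
```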

@Flux9665 (Collaborator)

Then it really sounds like there is a bad datapoint in the dataset that causes this problem, maybe a complete mismatch of text and audio. Have you checked for your subset of LJSpeech that the texts and audios you are using actually match? Maybe there was a small mistake somewhere and the indices of text and audio have shifted.

If everything seems alright with the data and there are no obvious mismatches of text and audio, have you tried the scorer again? Was there still a problem that kept you from using it with the pretrained multilingual model?
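
One way to rule out such a mismatch is a quick scan over the corpus. Here is a hedged sketch for an LJSpeech-style layout (the paths, the pipe-separated metadata format, and the soundfile dependency are assumptions, not something from this thread):

```python
# Hypothetical sanity check: verify that every transcript line points to a
# readable, non-empty, all-finite audio file in an LJSpeech-style corpus.
import os

import numpy as np
import soundfile as sf  # assumed dependency

corpus_dir = "LJSpeech-1.1"  # assumed location of the corpus
with open(os.path.join(corpus_dir, "metadata.csv"), encoding="utf-8") as f:
    for line in f:
        parts = line.rstrip("\n").split("|")
        file_id, text = parts[0], parts[-1]
        wav_path = os.path.join(corpus_dir, "wavs", file_id + ".wav")
        if not os.path.exists(wav_path):
            print(f"missing audio for {file_id}")
            continue
        audio, sample_rate = sf.read(wav_path)
        if len(audio) == 0 or not np.isfinite(audio).all():
            print(f"empty or non-finite audio: {file_id}")
        if not text.strip():
            print(f"empty transcript: {file_id}")
```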

kin0303 (Author) commented Jan 25, 2023

Then it really sounds like there is a bad datapoint in the dataset that causes this problem [...] have you tried the scorer again?

After I switched to another computer, I didn't have this problem anymore. I don't know what the cause was; the data I used is exactly the same.

@Flux9665 (Collaborator)

It would be interesting to know what caused this, but I'm happy to hear that it works now!
