Abrupt noise, #68

Closed
WendongGan opened this issue Jan 7, 2019 · 20 comments

Comments

@WendongGan

Has anybody else run into this problem? After training for 1000k steps on LJSpeech, an "abrupt noise" appears. For example:
image
image

The audio file is:
LJ001-0007.wav_synthesis_01.zip

My config.json file is:
image

I used a single GPU.

Looking forward to your help!

@WendongGan
Author

Some friends think the cause is that the dataset is too small, so the model overfits.

@WendongGan
Author

WendongGan commented Jan 7, 2019

My code is from commit f4c04e2, which was committed on Nov 10, 2018. Training takes so long that I have not tried the latest code yet. Does the latest code have this problem?

@Yeongtae

Yeongtae commented Jan 7, 2019

Did you generate the sample audio from a mel-spectrogram or from text?

@WendongGan
Author

The "abrupt noise" appears whether the audio is generated from a mel-spectrogram or from text. Both conditions produce the same noise.

@WendongGan
Author

WendongGan commented Jan 7, 2019

I'm trying the latest code, and I'd like to know whether the latest commits solve the problem. For example:
image

@Yeongtae

Yeongtae commented Jan 14, 2019

@UESTCgan Is it solved? My model has a similar noise.
8.zip

@WendongGan
Author

WendongGan commented Jan 15, 2019

> @UESTCgan Is it solved? My model has a similar noise.
> 8.zip

I listened to your sample. How many steps did you train for? How many hours of audio are in your training dataset? Do you mean that your noise is this one:
image

I also have this noise, but the "abrupt noise" is more serious. This is the noise I mean:
image

I'm trying the author's latest code. It has only reached 100k steps, which is not enough, so I'm not sure whether it solves the problem. (f4c04e2).

@Yeongtae

Yeongtae commented Jan 15, 2019

My model was trained for 1100 epochs,
but it has a reverb effect.

@WendongGan
Author

> My model was trained for 1100 epochs.

How many hours of audio are in your training dataset?

@Yeongtae

Yeongtae commented Jan 15, 2019

With 8 V100 GPUs on a GCP VM, it takes 5 days.
My experiment settings are as follows:
Num channels: 8bit
Batch size: 80 (10 per GPU)
The other parameters are default.

@WendongGan
Author

What is your sigma? I set it to 1.0 for both training and inference.

@Yeongtae

Yeongtae commented Jan 15, 2019

Sigma is sqrt(0.5) ≈ 0.7071 for training.
This is the default in the WaveGlow paper.

Sigma is 0.66 for inference. This is the default in the demo.
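
For readers following the sigma discussion: at inference, a flow-based vocoder such as WaveGlow draws a latent z from a zero-mean Gaussian with standard deviation sigma and inverts the flow on it, so sigma directly scales how much randomness goes into the output. The snippet below is only a minimal sketch of that sampling step using the values quoted above; `sample_latent` and `std` are hypothetical helper names, not functions from the repository.

```python
import math
import random

TRAIN_SIGMA = math.sqrt(0.5)  # ~0.7071, the training value quoted in this thread
INFER_SIGMA = 0.66            # the inference value quoted in this thread


def sample_latent(n, sigma, seed=0):
    """Draw n latent values z ~ N(0, sigma^2) (hypothetical helper)."""
    rng = random.Random(seed)
    return [sigma * rng.gauss(0.0, 1.0) for _ in range(n)]


def std(values):
    """Population standard deviation of a list of floats."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


# Compare the spread of the latent for the sigma values mentioned in the thread.
for sigma in (INFER_SIGMA, TRAIN_SIGMA, 1.0):
    z = sample_latent(20000, sigma)
    print(f"sigma={sigma:.4f}  empirical std of z={std(z):.3f}")
```

A smaller sigma keeps z closer to zero, which is why the inference value is usually chosen at or below the training value.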

@WendongGan
Author

If you increase sigma at inference, the background noise decreases.
image

@Yeongtae

Yeongtae commented Jan 15, 2019

But a large sigma produces more of a reverb effect.

@WendongGan
Author

> But a large sigma produces more of a reverb effect.

I see, thank you !

@Yeongtae

Yeongtae commented Jan 16, 2019

> > My model was trained for 1100 epochs.
>
> How many hours of audio are in your training dataset?

My dataset consists of 13,000 sentences, 10 hours of audio.
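
To compare the two runs in this thread (one reported in steps, the other in epochs), the epoch count can be converted to an approximate iteration count. This is a rough sketch under stated assumptions: each sentence contributes one training segment per epoch, the global batch is 8 GPUs × 10, and incomplete final batches are dropped by default. `total_iterations` is a hypothetical helper, not part of the training code.

```python
def total_iterations(num_clips, batch_size, epochs, drop_last=True):
    """Approximate optimizer steps for a run (hypothetical helper).

    Assumes every clip yields exactly one training segment per epoch.
    """
    if drop_last:
        per_epoch = num_clips // batch_size        # incomplete last batch is skipped
    else:
        per_epoch = -(-num_clips // batch_size)    # ceiling division
    return per_epoch * epochs


GPUS = 8
PER_GPU_BATCH = 10
global_batch = GPUS * PER_GPU_BATCH  # 80, as reported above

iters = total_iterations(num_clips=13000, batch_size=global_batch, epochs=1100)
print(f"global batch = {global_batch}, approx. iterations = {iters}")
```

Under these assumptions, 1100 epochs comes to roughly 178k iterations at batch size 80, well short of the 1000k single-GPU steps reported in the original post, although the batch sizes of the two runs may differ.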

@yxt132

yxt132 commented Jan 28, 2019


I saw you used a 16 kHz sampling rate. Isn't the sampling rate 22050 Hz for the LJSpeech dataset? Or does it not matter? What does the segment length do? Does it have to be consistent with the sampling rate?

@rafaelvalle
Contributor

Segment length is independent of sampling rate.
It is OK to convert LJS to 16 kHz. Note that if you train Tacotron in parallel, it must use the same audio specifications.
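
As a small illustration of the point above: segment length is counted in audio samples, so the two settings can be chosen independently, but the same segment length covers a different duration at each sampling rate. The value 16000 below is an assumed segment length for illustration only (the actual config in this thread is an image and is not reproduced here), and `segment_seconds` is a hypothetical helper.

```python
def segment_seconds(segment_length, sampling_rate):
    """Duration in seconds of a training segment given in samples (hypothetical helper)."""
    return segment_length / sampling_rate


SEGMENT_LENGTH = 16000  # assumed value, in samples

# Same segment length, two candidate sampling rates discussed in the thread.
for sr in (16000, 22050):
    seconds = segment_seconds(SEGMENT_LENGTH, sr)
    print(f"{SEGMENT_LENGTH} samples at {sr} Hz = {seconds:.3f} s")
```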

@rafaelvalle
Contributor

We've shared a quick hack to decrease the fixed noise caused by the model's bias in WaveGlow:
NVIDIA/tacotron2#142 (comment)
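
The linked comment is not reproduced here. One common way to reduce a fixed, input-independent noise is spectral subtraction: estimate the noise's magnitude spectrum once, subtract a scaled copy from every frame of the output's magnitude spectrum, and clamp at zero. The sketch below assumes that general approach and works on plain lists of magnitudes; `subtract_bias` and its `strength` parameter are hypothetical names, and computing the spectra (STFT and inverse STFT) is left out.

```python
def subtract_bias(frames, bias, strength=0.1):
    """Spectral subtraction on magnitude spectra (hypothetical helper).

    frames:   list of frames, each a list of per-bin magnitudes
    bias:     one list of per-bin magnitudes estimated for the fixed noise
    strength: how much of the bias estimate to remove (assumed default)
    """
    for frame in frames:
        if len(frame) != len(bias):
            raise ValueError("every frame must have as many bins as the bias")
    # Subtract the scaled bias per bin and never let a magnitude go negative.
    return [
        [max(0.0, mag - strength * b) for mag, b in zip(frame, bias)]
        for frame in frames
    ]


# Tiny example: two frames with three frequency bins each.
noisy = [[1.0, 0.05, 0.4], [0.8, 0.02, 0.3]]
bias_estimate = [2.0, 1.0, 0.0]
cleaned = subtract_bias(noisy, bias_estimate, strength=0.1)
print(cleaned)
```

A larger `strength` removes more of the fixed noise but can also remove parts of the speech that share those frequency bins.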

@rafaelvalle
Contributor

Closing due to inactivity.


4 participants