
Has anyone tested the inference process using a trained model? #5

Closed
azraelkuan opened this issue Nov 9, 2018 · 10 comments

@azraelkuan
Contributor

I have trained a model for about 60k iterations.
When I run inference.py with the initial checkpoint waveglow_0, the generated wav is all noise.
But when I use the trained model (60k), the generated wav is almost all zeros; there is nothing in it.
Has anyone else had this problem?

@hcwu1993 commented Nov 9, 2018

60k? The paper says 580k iterations; the model has not converged yet.

@azraelkuan
Contributor Author

@hcwu1993 Even if the model is not well trained, it should generate some noise, not all-zero values.

@Arbaletos

In my case, audible speech already appears at 52k, but I use a smaller model: 256 channels instead of the default 512.

@azraelkuan
Contributor Author

@Arbaletos Do you use distributed.py? What is your loss value at 56k? Mine is -18. Thanks.

@mkolod commented Nov 9, 2018

@azraelkuan Have you tried training all the way until convergence? If you're still getting noise after the indicated number of iterations, please provide the following information for reproducibility of your case:

  1. Driver, GPU type and VBIOS version
    nvidia-smi --query-gpu=gpu_name,vbios_version,driver_version --format=csv

  2. PyTorch build (pip package, from source (if so, git hash), Docker image, etc.)

Also, which distributed approach are you using: default PyTorch DDP or Apex DDP? The latter can be found in NVIDIA's Apex repository (https://github.com/NVIDIA/apex).

Are you doing fp16 or fp32 training?
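
For context, a minimal sketch of how the two wrappers are typically applied. The model and launch setup below are placeholders, not this repo's actual training code, and it assumes a launcher (e.g. torch.distributed.launch) that sets the rendezvous environment variables:

    import os
    import torch
    from torch.nn.parallel import DistributedDataParallel as TorchDDP
    # from apex.parallel import DistributedDataParallel as ApexDDP  # Apex variant

    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    torch.cuda.set_device(local_rank)
    torch.distributed.init_process_group(backend="nccl")

    model = torch.nn.Linear(80, 80).cuda()  # stand-in for the WaveGlow model

    # Default PyTorch DDP: gradients are all-reduced during backward().
    model = TorchDDP(model, device_ids=[local_rank])

    # Apex DDP uses the current CUDA device instead of taking device_ids:
    # model = ApexDDP(model)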

@rafaelvalle (Contributor) commented Nov 9, 2018

The code does not include FP16 training.

@azraelkuan
Contributor Author

@rafaelvalle I may have found the problem.
I load the wav with librosa.load rather than scipy, so the data read in is float in [-1, 1] rather than int16 values. That means in these lines:

waveglow/inference.py

Lines 51 to 52 in f4c04e2

audio = audio.cpu().numpy()
audio = audio.astype('int16')

the output audio is float32 values between -1 and 1, so casting to int16 truncates every value to zero.
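
The failure is easy to reproduce with numpy alone; a minimal sketch (the 2**15 factor matches the int16 full-scale value used elsewhere in the repo):

    import numpy as np

    # Inference output in librosa's convention: float32 in [-1, 1].
    audio = np.array([0.25, -0.5, 0.9], dtype=np.float32)

    # The bare cast truncates every |value| < 1 toward zero -> silence:
    print(audio.astype('int16'))            # [0 0 0]

    # Rescaling to the int16 range first preserves the signal:
    print((audio * 2**15).astype('int16'))  # [ 8192 -16384  29491]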

@rafaelvalle (Contributor) commented Nov 10, 2018

Yes, that's probably it. In the code we load the audio with scipy and divide it by 2^15.
Then, during inference we multiply the output by 2^15.
In your setup, try multiplying your inference output by 2^30.
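
To spell the arithmetic out with a sketch (assuming the training pipeline was left unchanged, i.e. it still divides the already-normalized librosa output by 2**15):

    import numpy as np

    # librosa.load already returns floats in [-1, 1].
    librosa_audio = np.array([0.5, -0.25], dtype=np.float32)

    # Dividing again by 2**15 during training shrinks the targets to
    # [-2**-15, 2**-15], so the model learns to produce that tiny range.
    model_output = librosa_audio / 2**15

    # At inference, one factor of 2**15 undoes the extra division and a second
    # factor of 2**15 maps [-1, 1] onto the int16 range: 2**15 * 2**15 = 2**30.
    print((model_output * 2**30).astype('int16'))  # [ 16384 -8192 ]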

@azraelkuan
Contributor Author

Yes, I removed the max_audio_value scaling and retrained. Thanks.

@HashiamKadhim

@azraelkuan Just wondering if your fix worked; if so, can you please explain exactly how you fixed it?

Thank you!
