
Can single GPU get good result? #12

Closed
Cheneng opened this issue Nov 11, 2018 · 18 comments

@Cheneng commented Nov 11, 2018

Has anyone trained this model on a single GPU (1080 Ti) and gotten good results? In this situation I can only run the model with a batch size of 1, because I don't have enough GPU memory...

@will-rice

I've been training with batch size 1 and it is doing pretty well. Definitely takes longer, but it seems to still work.

@belevtsoff

@will-rice how long does it take in your case, how many iterations? I tried running it with bsz=3 while shortening the segment length to 8000. After 80k iterations the speech is barely intelligible (I'm using about 12 hours of male voice, 16kHz).
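For anyone else shrinking the memory footprint for a single GPU, here is a minimal sketch of the kind of config edit involved. It assumes the config.json layout in this repo (a train_config block holding batch_size and a data_config block holding segment_length); key names may differ in your checkout, so verify against your copy.

```python
import json

# Sketch only: reduce batch size / segment length to fit a single 1080 Ti.
# Key names assume NVIDIA/waveglow's config.json layout; adjust as needed.
with open("config.json") as f:
    config = json.load(f)

config["train_config"]["batch_size"] = 3        # e.g. 1-3 on an 11 GB card
config["data_config"]["segment_length"] = 8000  # shorter audio segments per example

with open("config.json", "w") as f:
    json.dump(config, f, indent=4)
```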

@will-rice

I'm at 140k on LJSpeech. It doesn't sound great, but it continues to improve. On a smaller dataset I'm using, the speech at 165k is noisy, but definitely better than the LJSpeech one. According to the paper (https://arxiv.org/pdf/1811.00002.pdf), their model was trained with a batch size of 24 for 580k iterations. So by extremely rough math you are looking at well over 1 million iterations for results equivalent to the paper.
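A rough sketch of that arithmetic, simply equating the number of training samples seen (it ignores any effect batch size may have on optimization dynamics):

```python
# Paper setup: batch size 24 for 580k iterations.
paper_batch, paper_iters = 24, 580_000
samples_seen = paper_batch * paper_iters  # ~13.9M segments

for bsz in (1, 3, 12):
    print(f"batch size {bsz:2d}: ~{samples_seen // bsz:,} iterations to see as many samples")

# batch size  1: ~13,920,000 iterations
# batch size  3: ~4,640,000 iterations
# batch size 12: ~1,160,000 iterations
```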

@belevtsoff

Oh, I see, I overlooked that they used a batch size of 24! That explains a lot... Wow, the amount of training this thing requires is insane compared to WaveNet. Thanks.

@RPrenger commented Nov 12, 2018

Can I ask what value you're using for sigma during training and sample generation? And can you post a sample? We hear "decent speech" at ~160k iterations (though it definitely improves with more). I haven't seen a huge effect from the larger batch size, but we haven't done a lot of ablative analysis yet.

@will-rice commented Nov 12, 2018

https://soundcloud.com/user-667131267/waveglow-tedlium-150k
Training sigma was sqrt(0.5); sample sigma is 1.0.
Correction: training sigma is also 1.0.

@rafaelvalle (Contributor)

@will-rice try sampling with a smaller sigma, 0.8 or 0.6 for example
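For reference, a hedged sketch of what that looks like at inference time, following the waveglow.infer(mel, sigma=...) pattern used in this repo's inference.py; the checkpoint and mel paths below are placeholders, and the checkpoint layout may differ from yours.

```python
import torch

# Load a trained WaveGlow checkpoint (the repo's checkpoints store the model
# under the 'model' key; adjust if your checkpoint is organized differently).
waveglow = torch.load("waveglow_checkpoint.pt")["model"]
waveglow = waveglow.cuda().eval()

with torch.no_grad():
    mel = torch.load("mel_spectrogram.pt").cuda()  # placeholder mel-spectrogram input
    audio = waveglow.infer(mel, sigma=0.6)         # try 0.6-0.8 instead of 1.0
```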

@belevtsoff commented Nov 12, 2018

@RPrenger I just realized that I had a bug in the code that made the audio output in tensorboardX sound worse than it actually is. Anyway, here's an example at 120k steps (16kHz, bsz=3, segment=8000; 1080 Ti): https://soundcloud.com/belevtsoff/waveglow_120k. Both sigmas are 1.0.

@will-rice

@rafaelvalle Thanks! This is what I'm getting from LJSpeech at 250k now: https://soundcloud.com/user-667131267/in-domain-ljspeech-250k

@rafaelvalle (Contributor) commented Nov 12, 2018

@will-rice Sounds like it's training properly! For generating this LJS sample, what sigma value did you use?

@will-rice

@rafaelvalle sigma 0.85 for that one.

@dchaws commented Nov 13, 2018

@will-rice is that sample (https://soundcloud.com/user-667131267/in-domain-ljspeech-250k) from a model trained on a smaller dataset?

@will-rice

@dchaws That model was trained on the full ljspeech dataset with the default parameters.

@RPrenger

@belevtsoff That sounds reasonable for 120k iterations. It should keep improving with more iterations. Also try doing inference with sigma=0.8 or so.

@G-Wang commented Nov 14, 2018

@will-rice what is the synthesis speed on your setup? Faster than real time?

@will-rice commented Nov 14, 2018

@G-Wang Using a single 1080 Ti, that 9-second clip took about 2 seconds to generate.
Edit: I wanted to add that model inference is not the only thing running on this card.
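From those numbers the real-time factor works out to roughly 4-5x; the timing is approximate and depends on what else is sharing the card.

```python
# Back-of-the-envelope real-time factor from the figures above.
audio_seconds, wall_seconds = 9.0, 2.0
print(f"~{audio_seconds / wall_seconds:.1f}x faster than real time")  # ~4.5x
```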

@rafaelvalle (Contributor)
Closing issue. Please re-open if needed.

@yxt132 commented Jan 30, 2019

> Can I ask what value you're using for sigma during training and sample generation? And can you post a sample? We hear "decent speech" at ~160k iterations (though it definitely improves with more). I haven't seen a huge effect from the larger batch size, but we haven't done a lot of ablative analysis yet.

How many and what GPU(s) did you use? Did you train on LJSpeech or another dataset? Thanks.
