The choice of vocoder (WaveRNN vs WaveGlow) #82

snakers4 · 2019-08-08T03:49:26Z

Hi!

For a change, it is great to see a repo where a real human did something, as opposed to the endless stream of corporate / academic research that essentially cannot be reproduced.

We have published a huge STT dataset (https://github.com/snakers4/open_stt/) and are also planning to extend our TTS dataset with 30-40 voices (at least). Our datasets are in Russian, so if you would like to extend your language support, please stay tuned.

Anyway, I wanted to ask: why did you choose WaveRNN? It seems that WaveGlow / FloWaveNet are the go-to options now. I tested WaveGlow: it trains mostly as promised and the code is really easy to use.

oytunturk · 2019-08-08T05:09:35Z

According to the original papers, the MOS of both WaveGlow and FloWaveNet seems to be significantly worse than WaveNet's. WaveRNN's is much closer to WaveNet's.

orbisAI · 2019-08-08T05:37:52Z

Well, actually, WaveGlow's quality doesn't seem too bad compared to WaveRNN.
An important thing to note is that MOS is a subjective and relative measure of audio quality. In the original papers, the MOS reported for the training dataset (ground-truth audio) is much higher in the WaveNet paper than in the WaveGlow paper, which might indicate that, for the same quality, WaveGlow's MOS may come out lower.

Also, WaveGlow may have had a worse synthesizer than Google did. But as far as the vocoder's performance goes (spectrogram to waveform), WaveGlow's quality does not seem too bad, and it is much faster.
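One way to act on the point about MOS being relative is to compare each model against the ground-truth score from the same listening test instead of comparing absolute scores across papers. The sketch below illustrates that idea only. The function name `mos_gap` and every number in it are hypothetical and are not taken from any paper.

```python
def mos_gap(ground_truth_mos, model_mos):
    """Return how far a model's MOS falls below the ground-truth MOS
    measured in the same listening test (rounded to 2 decimals)."""
    return round(ground_truth_mos - model_mos, 2)


# Hypothetical scores for illustration only, NOT taken from any paper.
study_a = {"ground_truth": 4.6, "model": 4.3}
study_b = {"ground_truth": 4.2, "model": 3.9}

gap_a = mos_gap(study_a["ground_truth"], study_a["model"])
gap_b = mos_gap(study_b["ground_truth"], study_b["model"])

# The absolute scores differ between the two studies, but the gap to
# ground truth is the same, so neither model is clearly ahead.
print(gap_a, gap_b)
```

Under this assumption, a lower absolute MOS in one paper does not by itself mean a worse vocoder, since the raters in that study may simply have scored everything lower.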

CorentinJ · 2019-08-08T08:02:03Z

WaveGlow is both slower and of worse quality than WaveRNN. Do keep in mind that we're talking about the available public implementations, not the papers. But even in the papers it's the case.

snakers4 · 2019-08-08T08:06:21Z

> WaveGlow is both slower and of worse quality than WaveRNN

I (and some other people) measured the official public implementation of WaveGlow at 4-8x RTS on a single 1080 Ti (inference).
I have not tested WaveRNN yet, but I have heard reports of it being <1x RTS.

What are your benchmarks?

CorentinJ · 2019-08-08T08:13:38Z

Right, I guess I was wrong about the speed of WaveGlow then. WaveRNN uses batched inference, so its speed is proportional to the length of the spectrogram to synthesize. I've gone up to 20x real-time for WaveRNN, but on short sentences it's going to be around 1x.

Anyway, the reason I picked WaveRNN over WaveGlow was due to the quality of the samples each open source implementation presented.
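To picture why batched inference gets faster relative to real time as the input grows: as I understand it, the long sequence is cut into overlapping segments that are generated in parallel as one batch. The following is a minimal pure-Python sketch of that folding idea, not this repo's actual code. The function name `fold_with_overlap` and the segment sizes are assumptions chosen for illustration.

```python
def fold_with_overlap(samples, target, overlap, pad_value=0):
    """Cut a 1-D sequence into overlapping segments of equal length.

    Each segment holds `target + 2 * overlap` items and consecutive
    segments share `overlap` items, so they can be cross-faded later.
    The tail is padded with `pad_value` to fill the last segment.
    """
    hop = target + overlap            # distance between segment starts
    seg_len = target + 2 * overlap    # length of every segment
    # Ceiling division: segments needed to cover the input (at least one).
    num_folds = max(1, -(-(len(samples) - overlap) // hop))
    padded_len = num_folds * hop + overlap
    padded = list(samples) + [pad_value] * (padded_len - len(samples))
    return [padded[i * hop : i * hop + seg_len] for i in range(num_folds)]


# Illustrative sizes only (hypothetical, far smaller than real audio).
short_batch = fold_with_overlap(list(range(30)), target=40, overlap=5)
long_batch = fold_with_overlap(list(range(1000)), target=40, overlap=5)

# A short input yields a single segment, so nothing runs in parallel;
# a long input yields many segments that form one batch.
print(len(short_batch), len(long_batch))
```

With a short sentence the batch contains one segment, so the model generates sample by sample with no parallelism, which fits the "around 1x" figure above. Longer inputs fill a larger batch for roughly the same number of sequential steps.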

orbisAI · 2019-08-08T08:14:29Z

> WaveGlow is both slower and of worse quality than WaveRNN. Do keep in mind that we're talking about the available public implementations, not the papers. But even in the papers it's the case.

Nvidia's implementation of WaveGlow is giving me nearly 2000 kHz on my V100. I tried this repo's vocoder out of the box, and maybe I'm doing something wrong, but it's sub-real-time (0.6~0.8x RTS).
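For comparing the two kinds of numbers quoted in this thread, a generation rate in kHz can be converted to a real-time multiplier by dividing by the audio sample rate. The sketch below assumes a 22,050 Hz sample rate, which the thread does not state, and reads "RTS" as a real-time speed multiplier. Both function names are hypothetical.

```python
def real_time_factor(samples_per_second, sample_rate_hz):
    """Seconds of audio generated per second of wall-clock time."""
    return samples_per_second / sample_rate_hz


def synthesis_seconds(audio_seconds, rtf):
    """Wall-clock time needed to generate `audio_seconds` of audio."""
    return audio_seconds / rtf


# ASSUMPTION: 22,050 Hz output audio; the thread does not give the rate.
SAMPLE_RATE_HZ = 22_050

# "Nearly 2000 kHz" means about 2,000,000 generated samples per second.
waveglow_rtf = real_time_factor(2_000_000, SAMPLE_RATE_HZ)

# At the reported 0.6x to 0.8x, a 10-second clip takes longer than 10 s.
slow_case = synthesis_seconds(10, 0.6)
fast_case = synthesis_seconds(10, 0.8)

print(f"{waveglow_rtf:.1f}x real time")
print(f"{fast_case:.1f} s to {slow_case:.1f} s for a 10 s clip")
```

A different sample rate changes the multiplier proportionally, so the result is only as good as that assumption.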

orbisAI · 2019-08-08T08:14:54Z

> Right, I guess I was wrong about the speed of WaveGlow then. WaveRNN uses batched inference, so its speed is proportional to the length of the spectrogram to synthesize. I've gone up to 20x real-time for WaveRNN, but on short sentences it's going to be around 1x.
>
> Anyway, the reason I picked WaveRNN over WaveGlow was due to the quality of the samples each open source implementation presented.

Ah yes, I should test on longer sentences. I ran my tests on a single sentence of fewer than 100 characters.

oytunturk · 2019-08-08T08:15:10Z

I agree with CorentinJ, too. Judging by the MOS results and sample links in the original papers, the open-source implementations, and derivative work on speech synthesis and voice conversion, the alternative models don't seem to reach WaveNet/WaveRNN quality yet. Of course, WaveNet, which is kind of the gold standard in vocoder technology, is computationally too expensive to use at run time. I think WaveRNN is the best alternative vocoder so far. It's easier to implement and to train. LPCNet also looks very interesting, especially if you have more limited computational resources, no GPU, etc.

snakers4 · 2019-08-08T08:19:18Z

I see. Many thanks to everyone who took part in the discussion.
Given all of the above, I guess that for a production-like business setting (i.e. short sentences), WaveGlow and FloWaveNet are the most balanced options for now.

qo4on · 2020-04-02T14:44:49Z

Is WaveNet still the best vocoder today?

bryant0918 · 2022-06-07T22:32:19Z

I've read that WaveGlow is more robust in handling several languages, but WaveRNN is language-dependent and quickly degrades when you train on an additional language. If I were to create a multilingual system, would it still be better to use WaveRNN and train several different models? Or use a single WaveGlow model that could essentially handle any language? What would my cost be in quality and speed?

RuntimeRacer · 2022-07-29T00:37:41Z

> I've read that WaveGlow is more robust in handling several languages, but WaveRNN is language-dependent and quickly degrades when you train on an additional language. If I were to create a multilingual system, would it still be better to use WaveRNN and train several different models? Or use a single WaveGlow model that could essentially handle any language? What would my cost be in quality and speed?

I have not played around with it yet, but some time ago I came across this repo on multilingual TTS in a single synthesizer: https://github.com/Tomiinek/Multilingual_Text_to_Speech

They're using WaveRNN, so I assume the quality really just depends on whether the vocoder has been trained with good multilingual samples. In the end, the vocoder is used to make a generated AI voice sound more natural, so it 'should' not matter which language the voice is speaking in.

snakers4 closed this as completed Aug 13, 2019
