The choice of vocoder (WaveRNN vs WaveGlow) #82
Comments
The MOS quality of both WaveGlow and FloWaveNet seems to be significantly worse than WaveNet's according to the original papers. WaveRNN's is much closer to WaveNet's.
On Thu, Aug 8, 2019 at 6:49 AM, Alexander Veysov wrote:
Hi!
For a change, it is always great to see a repo where a real human did something, as opposed to an endless stream of corporate / academic research that essentially cannot be reproduced.
We have published a huge STT dataset <https://github.com/snakers4/open_stt/> and are also planning to extend our TTS dataset <https://github.com/snakers4/open_stt/> with 30-40 voices (at least). Our datasets are in Russian, so if you would like to extend your language support, please stay tuned.
Anyway, I wanted to ask: why did you choose WaveRNN? It seems that WaveGlow / FloWaveNet are the go-to options now. I tested WaveGlow; it trains mostly as promised and the code is really easy to use.
Well, actually, WaveGlow's quality doesn't seem too bad compared to WaveRNN's. It's also possible that the WaveGlow demo used a worse synthesizer than Google did. But as far as the vocoder's own performance goes (spectrogram to waveform), WaveGlow does not seem too bad, and it's much faster.
WaveGlow is both slower and of worse quality than WaveRNN. Do keep in mind that we're talking about the available public implementations, not the papers, though even in the papers it's the case.
I (and some other people) found that the official public implementation of WaveGlow runs at 4-8x real-time speed on a single 1080 Ti at inference. What are your benchmarks?
Right, I guess I was wrong about the speed of WaveGlow then. WaveRNN uses batched inference, so its speed is proportional to the length of the spectrogram being synthesized. I've gone up to 20x real-time with WaveRNN, but on short sentences it will be around 1x. Anyway, the reason I picked WaveRNN over WaveGlow was the quality of the samples each open-source implementation presented.
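For readers unfamiliar with batched vocoder inference: the idea is to fold one long spectrogram into a batch of overlapping segments, generate them in parallel, then cross-fade the overlaps back together, which is why throughput grows with utterance length. A minimal NumPy sketch of that idea (the function names and the target/overlap sizes are illustrative, not the repo's actual code):

```python
import numpy as np

def fold(mel, target=200, overlap=25):
    """Split a (T, n_mels) spectrogram into a batch of segments of
    length target + overlap so they can be vocoded in parallel."""
    n = max(1, int(np.ceil((mel.shape[0] - overlap) / target)))
    pad = n * target + overlap - mel.shape[0]
    padded = np.pad(mel, ((0, max(0, pad)), (0, 0)))
    return np.stack([padded[i * target : i * target + target + overlap]
                     for i in range(n)])

def unfold(wavs, overlap_samples):
    """Stitch per-segment waveforms back together, cross-fading each
    adjacent pair over the overlapping region to avoid clicks."""
    fade_out = np.linspace(1.0, 0.0, overlap_samples)
    out = wavs[0].copy()
    for w in wavs[1:]:
        out[-overlap_samples:] = (out[-overlap_samples:] * fade_out
                                  + w[:overlap_samples] * (1.0 - fade_out))
        out = np.concatenate([out, w[overlap_samples:]])
    return out
```

A 10-second spectrogram then becomes a batch of a handful of segments the GPU processes at once, while a one-sentence input yields a single segment and no speedup, matching the 1x figure above.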
Nvidia's implementation of WaveGlow is giving me nearly 2000 kHz (sample generation rate) on my V100. I tried this repo's vocoder out of the box, and maybe I'm doing something wrong, but it's sub real-time (0.6-0.8x real-time speed).
Ah yes, I should test on longer sentences. I ran my tests on a single sentence of under 100 characters.
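When comparing numbers like these, it helps to pin down what "x real-time" means: seconds of audio generated per second of wall-clock time, measured over a range of utterance lengths rather than one short sentence. A rough sketch of such a benchmark (the dummy vocoder, sample rate, and hop length are assumptions for illustration):

```python
import time
import numpy as np

SAMPLE_RATE = 22050   # assumed output sample rate
HOP_LENGTH = 275      # assumed audio samples per spectrogram frame

def real_time_factor(vocoder, mel):
    """Seconds of audio produced per wall-clock second; a value > 1
    means faster than real time."""
    start = time.perf_counter()
    wav = vocoder(mel)
    elapsed = time.perf_counter() - start
    return (len(wav) / SAMPLE_RATE) / elapsed

# Placeholder vocoder: just maps frames to a waveform of the right length.
dummy = lambda mel: np.zeros(mel.shape[0] * HOP_LENGTH)

for frames in (40, 400, 4000):  # short sentence vs. long utterance
    rtf = real_time_factor(dummy, np.random.rand(frames, 80))
    print(f"{frames} frames: {rtf:.1f}x real-time")
```

For GPU vocoders, the timing should also account for asynchronous execution (e.g. synchronizing the device before and after the call), or the measured time will understate the real cost.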
I agree with CorentinJ, too. The MOS results and sample links reported in the original papers, in open-source implementations, and in derivative work on speech synthesis and voice conversion using alternate models don't seem to reach WaveNet/WaveRNN quality yet. Of course, WaveNet, which is something of a gold standard in vocoder technology, is computationally too expensive to use at run time. I think WaveRNN is the best alternative vocoder so far: it's easier to implement and to train. LPCNet also looks very interesting, especially if you have more limited computational resources, no GPU, etc.
I see. Many thanks to all of the participants of the chat.
Is WaveNet still the best vocoder today?
I've read that WaveGlow is more robust at handling several languages, whereas WaveRNN is language-dependent and quickly degrades when you train on an additional language. If I were to create a multilingual system, would it still be better to use WaveRNN and train several separate models, or to use a single WaveGlow model that could essentially handle any language? What would my cost be in quality and speed?
I have not played around with it yet, but some time ago I came across this repo on multilingual TTS in a single synthesizer: https://github.com/Tomiinek/Multilingual_Text_to_Speech They're using WaveRNN, so I assume the quality really just depends on whether the vocoder has been trained on good multilingual samples. In the end, the vocoder is only used to make the generated voice sound more natural, so it "should" not matter which language the voice is speaking.