alternative vocoders? #261

DabiaoMa · 2017-10-13T07:10:02Z

Hi,

I am using the WORLD vocoder to reproduce wavs, but it seems WORLD may not be a good choice for producing high-quality voices. I extracted acoustic features with WORLD from the original wavs and then directly resynthesized wavs from them. Most of the time, the generated wavs are not good.

Maybe I need to modify some parameters of the WORLD vocoder, like frame shift or frame length, at the cost of longer synthesis time?
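A back-of-the-envelope sketch of that trade-off, assuming synthesis cost grows roughly linearly with the number of frames (an approximation, not a measurement): the frame count is about duration divided by frame shift, so halving the shift doubles the frames. 5 ms is WORLD's default frame period; the other shifts and the 3 s duration are arbitrary illustration values, and the `+ 1` counting convention is an assumption (exact counts differ between tools).

```python
def num_frames(duration_s: float, frame_shift_ms: float) -> int:
    """Approximate number of analysis frames for an utterance.

    Uses floor(duration / shift) + 1, i.e. one frame per shift plus the frame
    at t = 0 (assumed convention; exact counts differ between tools).
    """
    return int(duration_s * 1000.0 / frame_shift_ms) + 1


for shift_ms in (5.0, 2.5, 1.0):
    print(f"{shift_ms} ms shift -> {num_frames(3.0, shift_ms)} frames for a 3 s utterance")
```

Frame length (the analysis window) does not change the frame count; it only changes how much signal each frame sees.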

Can we use vocoders other than WORLD or STRAIGHT?

DeepMind announced that they have created a new version of WaveNet that is 1000 times faster than the previous one. Maybe the accelerated WaveNet is a good choice?

Tacotron uses the Griffin-Lim algorithm as its vocoder, but I do not know whether Griffin-Lim performs better.

Best,

ljuvela · 2017-10-13T12:24:06Z

I recommend trying https://github.com/gillesdegottex/pulsemodel
It should be relatively straightforward to integrate into Merlin.

As for WaveNet, the latest DeepMind announcement contained virtually no information on what they have done. Let's wait for a paper. Anyway, most recent papers with WaveNet waveform generator (e.g. Baidu's stuff) use some acoustic features for local conditioning, so let's not give up on the old vocoders yet.
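For readers unfamiliar with "local conditioning": the acoustic features come at frame rate, while a WaveNet-style generator runs at sample rate, so the features have to be upsampled first. A minimal sketch of the simplest option, plain repetition (learned upsampling layers are a common alternative); the 5 ms / 16 kHz numbers and the toy 2-dim features are assumptions for illustration only.

```python
from typing import List, Sequence


def upsample_conditioning(frames: Sequence[Sequence[float]], hop: int) -> List[List[float]]:
    """Repeat each frame-level feature vector `hop` times.

    The result has one conditioning vector per waveform sample, which is the
    rate a sample-level generator consumes. Repetition is the simplest choice.
    """
    if hop <= 0:
        raise ValueError("hop must be positive")
    return [list(vec) for vec in frames for _ in range(hop)]


# 5 ms frames at 16 kHz -> 80 samples per frame.
feats = [[0.1, 1.0], [0.2, 0.0]]  # two frames of toy 2-dim features
cond = upsample_conditioning(feats, hop=80)
print(len(cond))  # 160
```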

Griffin-Lim is not really a vocoder, but rather a method to create consistent phase information for a magnitude spectrogram. So the quality there depends entirely on how well you can predict magnitude spectra (which is not too easy).
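To make that point concrete, here is a minimal Griffin-Lim sketch: the target magnitudes stay fixed and only the phase is re-estimated, so the output can never be better than those magnitudes. It is pure Python with a naive DFT for readability (real implementations use an FFT library); the Hann window, toy signal, hop and iteration count are arbitrary assumptions. With the least-squares inverse STFT used here, the magnitude error is non-increasing across iterations.

```python
import cmath
import math
import random
from typing import List


def hann(n_fft: int) -> List[float]:
    # Periodic Hann window.
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]


def dft(x):
    # Naive O(N^2) DFT; fine for toy sizes only.
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n for t in range(n)]


def stft(x, win, hop):
    n_fft = len(win)
    return [dft([x[s + t] * win[t] for t in range(n_fft)])
            for s in range(0, len(x) - n_fft + 1, hop)]


def istft_ls(frames, win, hop, length):
    # Least-squares inverse STFT (Griffin & Lim): overlap-add of windowed
    # frames divided by the summed squared window.
    n_fft = len(win)
    num = [0.0] * length
    den = [0.0] * length
    for m, X in enumerate(frames):
        y = idft(X)
        for t in range(n_fft):
            num[m * hop + t] += win[t] * y[t].real
            den[m * hop + t] += win[t] ** 2
    return [a / b if b > 1e-12 else 0.0 for a, b in zip(num, den)]


def griffin_lim(target_mag, win, hop, length, n_iter=30):
    phases = [[1.0 + 0j] * len(fr) for fr in target_mag]  # start from zero phase
    errors = []
    x = []
    for _ in range(n_iter):
        spec = [[a * p for a, p in zip(fa, fp)] for fa, fp in zip(target_mag, phases)]
        x = istft_ls(spec, win, hop, length)
        reanalysis = stft(x, win, hop)
        # Squared error between |STFT(x)| and the target magnitudes.
        errors.append(sum((abs(X) - a) ** 2
                          for fr, tg in zip(reanalysis, target_mag)
                          for X, a in zip(fr, tg)))
        # Keep the phase of the re-analysed signal, discard its magnitude.
        phases = [[X / abs(X) if abs(X) > 1e-12 else 1.0 + 0j for X in fr]
                  for fr in reanalysis]
    return x, errors


random.seed(0)
sig = [math.sin(2 * math.pi * 0.05 * n) + 0.1 * random.uniform(-1, 1) for n in range(64)]
win = hann(16)
target = [[abs(X) for X in fr] for fr in stft(sig, win, 4)]
_, errs = griffin_lim(target, win, 4, len(sig), n_iter=20)
print(errs[0], errs[-1])
```

Here the magnitudes come from a real signal, so they are consistent; with predicted magnitudes that no real signal could produce, the error plateaus above zero, which is the quality limit described above.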

DabiaoMa · 2017-10-16T03:06:53Z

Thanks. I checked the paper 'Pulse Model in Log-domain for a Uniform Synthesizer'; the author concludes that its comparative mean opinion scores are a little worse than STRAIGHT's.

Hope DeepMind will publish details soon...

gillesdegottex · 2017-10-16T09:25:50Z

For PML, using an RNN with a proper output layer, the results are eventually better, as shown in the journal article:
http://gillesdegottex.eu/wp-content/papercite-data/pdf/DegottexG2017pmlj_acceptedversion.pdf
(out only a week ago).
Do not hesitate to ask me for new features in the code (feature extraction and synthesis options); I'll surely be happy to implement them.

Waveform synthesis is definitely a great solution for quality. The only problem is the technical "details" needed to run it fast, and the feedback I got at conferences about this is: "We can't speak about this". So you might have to wait quite a while before getting details :(

Happy to read any other solution you find @DabiaoMa

(Thanks @ljuvela !)

fosimoes · 2017-10-17T12:10:26Z

I was considering using MagPhase, which was recently presented at Interspeech.
https://github.com/CSTR-Edinburgh/magphase
MagPhase documentation has some guidelines on how to use it with Merlin.
It suggests that running some scripts and changing Merlin's config file should be enough. I believe, however, that some modification of the source code is necessary, since references to WORLD features (mgc, bap and lf0) are hard-coded into Merlin.
Has anyone tried to do it?

dreamk73 · 2017-10-17T13:25:29Z

Interesting suggestions to check out. @fosimoes, you only need to add some code to merlin/src/configuration/configuration.py, where you will see settings for STRAIGHT and WORLD. You can define a different vocoder there in the same manner, including which parameter directories/files it uses. I have tried it with GlottHMM and AHOcoder in the past.

DabiaoMa · 2017-10-18T02:40:41Z

@gillesdegottex Thanks, I will wait for the details.

felipeespic · 2017-10-29T02:25:47Z

@fosimoes Yes, I have tried it and it works! You just need to add the MagPhase parameters (mag, real, imag and their deltas) to the configuration.py file with these dimensions: mag: 60, dmag: 180, real: 45, dreal: 135, imag: 45, dimag: 135.
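A quick sanity check of those numbers, assuming Merlin's usual convention that a "d" stream holds static features plus deltas and delta-deltas (so 3× the static size). The plain dicts below are hypothetical and do not reproduce the actual structure of configuration.py; only the three streams listed in the comment are counted.

```python
# Hypothetical sketch: MagPhase stream dimensions from the comment above.
static_dims = {"mag": 60, "real": 45, "imag": 45}
dynamic_dims = {"dmag": 180, "dreal": 135, "dimag": 135}

# Each "d" stream is 3x its static stream: static + delta + delta-delta.
for name, dim in static_dims.items():
    assert dynamic_dims["d" + name] == 3 * dim

# Output size contributed by these three streams alone.
total = sum(dynamic_dims.values())
print(total)  # 450
```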

I am currently working on the MagPhase-Merlin integration, but for now you can run the scripts manually as you mentioned. That should work.

m-toman · 2017-11-09T11:27:42Z

@felipeespic Do you need help with the integration?
I also planned to try it out.

Do you think MagPhase can be ported to C++ easily? And does it require access to the whole feature vector or can you run it streaming i.e. on chunks/windows?

Great to see so many vocoders coming out now, for years we've been mostly stuck with hts_engine and STRAIGHT ;)

RasmusD · 2017-11-09T12:31:47Z

MagPhase can be ported to C++; whether it's easy to do is up to you ;-) It can also be run in individual frame chunks, so you can use it in a streaming fashion. If you're doing a C++ implementation, simply build that in from the beginning :-)
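A generic sketch of what frame-chunked output looks like, in Python since that is what MagPhase is written in: an overlap-add buffer where each new synthesized frame completes `hop` samples that can be emitted immediately. This is an assumption about how a streaming port could be structured, not MagPhase's actual synthesis code; the toy frames are made up, and the same loop maps directly to C++.

```python
from typing import Iterable, Iterator, List


def stream_overlap_add(frames: Iterable[List[float]], frame_len: int, hop: int) -> Iterator[List[float]]:
    """Overlap-add frames one at a time, yielding `hop` finished samples per frame."""
    if not 0 < hop <= frame_len:
        raise ValueError("need 0 < hop <= frame_len")
    buf = [0.0] * frame_len
    for frame in frames:
        if len(frame) != frame_len:
            raise ValueError("unexpected frame length")
        buf = [b + f for b, f in zip(buf, frame)]
        yield buf[:hop]                # these samples receive no further overlap
        buf = buf[hop:] + [0.0] * hop  # shift the pending tail forward
    yield buf[: frame_len - hop]       # flush the tail of the last frame


def batch_overlap_add(frames: List[List[float]], frame_len: int, hop: int) -> List[float]:
    # Reference: whole-utterance overlap-add.
    out = [0.0] * ((len(frames) - 1) * hop + frame_len)
    for m, frame in enumerate(frames):
        for t, v in enumerate(frame):
            out[m * hop + t] += v
    return out


frames = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0], [100.0, 200.0, 300.0, 400.0]]
streamed = [s for chunk in stream_overlap_add(frames, 4, 2) for s in chunk]
print(streamed == batch_overlap_add(frames, 4, 2))  # True
```

Latency is one frame: a sample is final as soon as no later frame can overlap it.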

felipeespic · 2017-11-09T18:03:45Z

Hi @m-toman, yes, I think I will need help with testing it. Could you help me with that?

Also, as @RasmusD mentioned, you can implement MagPhase in C++ for streaming.
I think that it should be quite simple if you are proficient in C++.

m-toman · 2017-11-09T18:56:37Z

@RasmusD @felipeespic
Well :)... let's see. I've briefly checked the source and the paper and I guess it should be possible for me. I'm not very proficient in signal processing, but I guess my C++ is OK (at least I ported the synthesis part of Merlin to C++).

@felipeespic Sure, I would be glad to. I'm currently looking at the code and stepping through the Merlin preprocessing steps. I've also started adapting configuration.py.
I should be able to test it with a couple of voices with very different levels of quality.

felipeespic · 2017-11-15T00:18:17Z

Hi @m-toman , I just pushed the slt_arctic demo using MagPhase to the branch in my fork: https://github.com/felipeespic/merlin/tree/magphase_integration
It should work out of the box.

Could you test that everything works OK? Also, just if you have time, could you implement the slt_arctic full voice with MagPhase, please?

m-toman · 2017-11-15T06:46:28Z

@felipeespic OK, thanks. I should be able to check it out later today, and I guess I'll have time to adapt the "full" version.

EDIT: Could you upload http://felipeespic.com/depot/databases/merlin_demos/slt_arctic_full_data_magphase.zip ?

The demo ran without problems, I'll start to integrate everything into my own scripts, which do an "out of source" build from scratch (so for a given folder of wavs and orthographic transcription). Makes it easier to test it on a dozen voices or so.

dreamk73 · 2017-11-15T09:40:04Z

@felipeespic When running the script to extract MagPhase acoustic features with my 48000 Hz audio data, I get an error. The same happens with 16000 Hz.

On line 392 of tools/magphase/src/magphase.py

it says: if (fs != 48000) or (fs != 16000)

The 'or' should be 'and'.
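Why 'and' is right: no value can equal both 48000 and 16000, so with 'or' at least one inequality always holds and the condition is true for every rate, which matches the error appearing for both 48000 and 16000 Hz. By De Morgan's law, `not (fs == 48000 or fs == 16000)` is `(fs != 48000) and (fs != 16000)`, which can also be written as a membership test. The function names below are illustrative, not the ones in magphase.py.

```python
SUPPORTED_FS = (48000, 16000)


def fs_check_buggy(fs: int) -> bool:
    # As written on line 392: always True, whatever fs is.
    return (fs != 48000) or (fs != 16000)


def fs_check_fixed(fs: int) -> bool:
    # True ("unsupported") only when fs is neither rate;
    # equivalent to (fs != 48000) and (fs != 16000).
    return fs not in SUPPORTED_FS


for rate in (48000, 16000, 22050):
    print(rate, fs_check_buggy(rate), fs_check_fixed(rate))
```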

Then, on line 1657, I get another error: now fs is a list of numbers instead of the single number it was before, so fs is being assigned something else somewhere.

dreamk73 · 2017-11-15T10:11:53Z

tools/magphase/src/magphase.py line 155

m_sp, m_ph, v_shift, m_frms, m_fft = analysis_with_del_comp_from_pm(v_in_sig, v_pm_smpls, nFFT, win_func=win_func, nwin_per_pitch_period=nwin_per_pitch_period)

should be

m_sp, m_ph, v_shift, m_frms, m_fft = analysis_with_del_comp_from_pm(v_in_sig, fs, v_pm_smpls, nFFT, win_func=win_func, nwin_per_pitch_period=nwin_per_pitch_period)

Also, that function only returns two variables and this line has five?
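For context, in Python a caller that unpacks five names from a function returning two values fails at the call site with a ValueError, so either the return statement or the caller has to change. A tiny illustration with a hypothetical stand-in function (not the real analysis_with_del_comp_from_pm):

```python
def returns_two():
    # Hypothetical stand-in for a function that still returns only two values.
    return "m_sp", "m_ph"


try:
    m_sp, m_ph, v_shift, m_frms, m_fft = returns_two()
    unpack_error = None
except ValueError as err:
    unpack_error = str(err)

print(unpack_error)  # not enough values to unpack (expected 5, got 2)
```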

dreamk73 · 2017-11-15T10:21:01Z

Change the return statement of analysis_with_del_comp_from_pm to return those five variables. The next issue is that the function get_fft_params_from_complex_data is not defined in magphase.py. It was in the previous version you made, but after copying and pasting it in, running the analysis gave lots of the same error:

fft : m must be a integer of power of 2!
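Reading that message at face value, the FFT routine received a length that is not a power of two (which routine raises it is not stated here). A minimal guard one could run on the FFT size before analysis; these are hypothetical helpers, not part of magphase.py, and 1500 is just an example of a bad size.

```python
def is_power_of_two(n: int) -> bool:
    # A positive integer is a power of two iff it has exactly one set bit.
    return n > 0 and (n & (n - 1)) == 0


def next_power_of_two(n: int) -> int:
    # Smallest power of two >= n (for n >= 1).
    if n < 1:
        raise ValueError("n must be >= 1")
    return 1 << (n - 1).bit_length()


for n_fft in (2048, 1500):
    if not is_power_of_two(n_fft):
        print(n_fft, "is not a power of 2; next size up is", next_power_of_two(n_fft))
```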

felipeespic · 2017-11-15T11:59:42Z

Hi @dreamk73, thank you for pointing that out. Those functions do not work because they are deprecated.

I have moved this to the "Issues" section of my fork (felipeespic#1), since that code is not part of the Merlin repo yet. Thanks!

PS: Could I remove these comments from here?

ljuvela · 2017-11-22T22:26:32Z

Details are out on Google's new production WaveNet:
https://deepmind.com/blog/high-fidelity-speech-synthesis-wavenet/

felipeespic · 2017-11-23T01:39:28Z

MagPhase vocoder integration with Merlin is done in PR #281.
