alternative vocoders? #261

Open
DabiaoMa opened this issue Oct 13, 2017 · 19 comments

Comments

@DabiaoMa

Hi,

I am using the WORLD vocoder to reproduce wavs, but it seems WORLD may not be a good choice for producing high-quality voices. I extracted acoustic features with WORLD from the original wavs and then synthesized wavs directly from them. Most of the time the generated wavs are not good.

Maybe I need to modify some parameters of the WORLD vocoder, such as the frame shift or frame length, though that would presumably increase synthesis time?
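
As a rough illustration of the cost side of that trade-off: with everything else fixed, the number of frames a vocoder has to process grows inversely with the frame shift. The sketch below assumes the frame count is roughly duration divided by frame shift, plus one. The function name and the 5 ms and 1 ms settings are illustrative only and are not taken from Merlin's scripts.

```python
def frame_count(num_samples, fs, frame_shift_ms):
    """Approximate number of analysis frames (assumed formula, for illustration)."""
    duration_ms = 1000.0 * num_samples / fs
    return int(duration_ms / frame_shift_ms) + 1

# Three seconds of 16 kHz audio at two hypothetical frame shifts.
three_seconds = 3 * 16000
coarse = frame_count(three_seconds, 16000, 5.0)
fine = frame_count(three_seconds, 16000, 1.0)
print(coarse, fine)
```

A smaller frame shift means proportionally more frames to extract, model and synthesize. Whether it actually improves quality is a separate question that this sketch does not address.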

Can we use vocoders other than WORLD or STRAIGHT?

DeepMind announced that they have created a new version of WaveNet that is 1000 times faster than the previous one. Maybe the accelerated WaveNet is a good choice?

Tacotron uses a Griffin-Lim algorithm as vocoder, but I do not know whether Griffin-Lim performs better or not.

Best,

@ljuvela

ljuvela commented Oct 13, 2017

I recommend trying https://github.com/gillesdegottex/pulsemodel
It should be relatively straightforward to integrate into Merlin.

As for WaveNet, the latest DeepMind announcement contained virtually no information on what they have done. Let's wait for a paper. Anyway, most recent papers with WaveNet waveform generator (e.g. Baidu's stuff) use some acoustic features for local conditioning, so let's not give up on the old vocoders yet.

Griffin-Lim is not really a vocoder, but rather a method to create consistent phase information for a magnitude spectrogram. So the quality there depends entirely on how well you can predict magnitude spectra (which is not too easy).
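
To make that concrete, here is a minimal pure-Python sketch of Griffin-Lim style phase reconstruction. It is not Tacotron's implementation: it uses a naive DFT and tiny frame sizes so that it runs without dependencies, and all function names are made up for this example. It starts from random phase and alternates between enforcing the given magnitudes and re-analysing the resynthesized signal.

```python
import cmath
import math
import random

def hann(n_fft):
    # Periodic Hann window.
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]

def stft(x, n_fft, hop):
    """Windowed STFT with a naive DFT (only sensible for tiny examples)."""
    win = hann(n_fft)
    frames = []
    for start in range(0, len(x) - n_fft + 1, hop):
        seg = [x[start + n] * win[n] for n in range(n_fft)]
        frames.append([
            sum(seg[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft))
            for k in range(n_fft)
        ])
    return frames

def istft(frames, n_fft, hop):
    """Least-squares overlap-add inverse; keeps the real part of each frame."""
    win = hann(n_fft)
    length = hop * (len(frames) - 1) + n_fft
    out = [0.0] * length
    norm = [0.0] * length
    for i, spec in enumerate(frames):
        for n in range(n_fft):
            val = sum(spec[k] * cmath.exp(2j * math.pi * k * n / n_fft)
                      for k in range(n_fft)).real / n_fft
            out[i * hop + n] += val * win[n]
            norm[i * hop + n] += win[n] ** 2
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]

def magnitude(frames):
    return [[abs(c) for c in frame] for frame in frames]

def griffin_lim(mag, n_fft, hop, n_iter=30, seed=0):
    """Estimate a waveform whose STFT magnitude approximates `mag`."""
    rng = random.Random(seed)
    # Only magnitudes are known, so start from random phase.
    spec = [[m * cmath.exp(2j * math.pi * rng.random()) for m in frame] for frame in mag]
    for _ in range(n_iter):
        est = stft(istft(spec, n_fft, hop), n_fft, hop)
        # Keep the target magnitude, adopt the phase of the re-analysed signal.
        spec = [[m * (e / abs(e)) if abs(e) > 1e-12 else m for m, e in zip(mf, ef)]
                for mf, ef in zip(mag, est)]
    return istft(spec, n_fft, hop)

def spectral_error(x, mag, n_fft, hop):
    """Squared distance between the STFT magnitude of `x` and the target."""
    est = magnitude(stft(x, n_fft, hop))
    return sum((a - b) ** 2 for fa, fb in zip(est, mag) for a, b in zip(fa, fb))

n_fft, hop = 16, 4
signal = [math.sin(2 * math.pi * 3 * n / 16) for n in range(64)]
target = magnitude(stft(signal, n_fft, hop))
recon = griffin_lim(target, n_fft, hop, n_iter=30)
err_1 = spectral_error(griffin_lim(target, n_fft, hop, n_iter=1), target, n_fft, hop)
err_30 = spectral_error(recon, target, n_fft, hop)
```

Here the target magnitudes come from a real signal, so they are achievable. In a TTS system they would be predicted by a model, and no amount of phase iteration can repair errors in the predicted magnitudes themselves.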

@DabiaoMa
Author

Thanks. I checked the paper 'Pulse Model in Log-domain for a Uniform Synthesizer'; the author concludes that the comparative mean opinion scores are slightly worse than STRAIGHT's.

Hopefully DeepMind will publish details soon...

@gillesdegottex
Contributor

For PML, using an RNN and a proper layer output, the results are eventually better, as shown in the journal article:
http://gillesdegottex.eu/wp-content/papercite-data/pdf/DegottexG2017pmlj_acceptedversion.pdf
(it has only been out for a week).
Do not hesitate to ask me for new features in the code (feature extraction and synthesis options); I'll be happy to implement them.

Waveform synthesis is definitely a great solution for quality. The only problem is the technical "details" necessary to run it fast, and the feedback I currently get at conferences about this is: "We can't speak about this". So you might have to wait quite a bit before getting details :(

Happy to read about any other solution you find, @DabiaoMa

(Thanks @ljuvela !)

@fosimoes

I was considering using MagPhase, which was recently presented at Interspeech.
https://github.com/CSTR-Edinburgh/magphase
MagPhase documentation has some guidelines on how to use it with Merlin.
It suggests that running some scripts and changing Merlin's config file should be enough. I believe, however, that some modification in source code is necessary, since references to WORLD features (mgc, bap and lf0) are hard-coded into Merlin.
Has anyone tried to do it?

@dreamk73

Interesting suggestions to check out. @fosimoes, you only need to add some code to merlin/src/configuration/configuration.py, where you will see settings for STRAIGHT and WORLD. You can define a different vocoder there in the same manner. You can define there what kind of parameter directories / files are used by the particular vocoder. I have tried it with GlottHMM and AHOcoder in the past.
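
The idea of selecting parameter directories per vocoder can be sketched as a simple lookup. This is not Merlin's configuration.py: the dictionary, the function name and the directory layout are hypothetical, and the stream lists contain only the feature names mentioned in this thread, so they may be incomplete.

```python
# Hypothetical lookup of the feature streams each vocoder produces.
# Illustrative only; Merlin's actual configuration code is organised differently.
VOCODER_STREAMS = {
    "WORLD": ["mgc", "bap", "lf0"],      # WORLD features named in this thread
    "MAGPHASE": ["mag", "real", "imag"],  # MagPhase features named in this thread
}

def feature_dirs(vocoder, data_dir):
    """Return one feature directory per stream for the chosen vocoder."""
    try:
        streams = VOCODER_STREAMS[vocoder.upper()]
    except KeyError:
        raise ValueError("unsupported vocoder: %s" % vocoder)
    return {stream: "%s/%s" % (data_dir, stream) for stream in streams}

print(feature_dirs("world", "data"))
```

Keeping the stream names in one table means the rest of the pipeline can iterate over them instead of hard-coding mgc, bap and lf0.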

@DabiaoMa
Author

@gillesdegottex Thanks and I will wait for the details

@felipeespic
Member

@fosimoes Yes, I have tried it and it works! You just need to add the MagPhase parameters (mag, real, imag and their deltas) in the configuration.py file with the dimensions: mag: 60, dmag: 180, real: 45, dreal: 135, imag: 45, dimag: 135.
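
The dimensions above follow a simple pattern: each "d" entry is three times its static dimension. That is consistent with stacking static, delta and delta-delta features, although that reading of the "d" entries is an assumption here. A small sanity check, with names invented for this example:

```python
# Static dimensions as quoted in the comment above.
STATIC_DIMS = {"mag": 60, "real": 45, "imag": 45}
# Delta-augmented dimensions as quoted in the comment above.
QUOTED_DELTA_DIMS = {"dmag": 180, "dreal": 135, "dimag": 135}

def with_deltas(static_dims, n_windows=3):
    """Assumes static + delta + delta-delta, i.e. three windows per stream."""
    return {"d" + name: dim * n_windows for name, dim in static_dims.items()}

computed = with_deltas(STATIC_DIMS)
total = sum(computed.values())
print(computed == QUOTED_DELTA_DIMS, total)
```

The total covers only the three streams listed in the comment; any further streams a full setup might need are not included.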

I am currently working on the MagPhase-Merlin integration, but for now you can run the scripts manually as you mentioned. That should work.

@m-toman
Contributor

m-toman commented Nov 9, 2017

@felipeespic Do you need help with the integration?
I also planned to try it out.

Do you think MagPhase can be ported to C++ easily? And does it require access to the whole feature vector or can you run it streaming i.e. on chunks/windows?

Great to see so many vocoders coming out now, for years we've been mostly stuck with hts_engine and STRAIGHT ;)

@RasmusD
Contributor

RasmusD commented Nov 9, 2017 via email

@felipeespic
Member

Hi @m-toman, yes, I think I will need help testing it. Could you help me with that?

Also, as @RasmusD mentioned, you can implement MagPhase in C++ for streaming.
I think that it should be quite simple if you are proficient in C++.

@m-toman
Contributor

m-toman commented Nov 9, 2017

@RasmusD @felipeespic
Well :)... let's see. I've briefly checked the source and the paper and guess it should be possible for me. I'm not very proficient with signal processing, but guess my C++ is OK (at least I ported the synthesis part of Merlin to C++).

@felipeespic Sure, would be glad to. I'm currently looking at the code and stepping through the merlin-preprocessing steps. Also started adapting the configuration.py.
I should be able to test it with a couple of voices with very different levels of quality.

@felipeespic
Member

Hi @m-toman , I just pushed the slt_arctic demo using MagPhase to the branch in my fork: https://github.com/felipeespic/merlin/tree/magphase_integration
It should work out of the box.

Could you test that everything works OK? Also, just if you have time, could you implement the slt_arctic full voice with MagPhase, please?

@m-toman
Contributor

m-toman commented Nov 15, 2017

@felipeespic OK thanks, should be able to check it out later today and guess should have the time to adapt the "full" version.

EDIT: Could you upload http://felipeespic.com/depot/databases/merlin_demos/slt_arctic_full_data_magphase.zip ?

The demo ran without problems, I'll start to integrate everything into my own scripts, which do an "out of source" build from scratch (so for a given folder of wavs and orthographic transcription). Makes it easier to test it on a dozen voices or so.

@dreamk73

dreamk73 commented Nov 15, 2017

@felipeespic When running the script to extract MagPhase acoustic features with my 48000 Hz audio data, I get an error. Same with 16000 Hz.

On line 392 of tools/magphase/src/magphase.py

it says: if (fs != 48000) or (fs != 16000)

The 'or' should be 'and'; as written, the condition is true for every sampling rate, including 48000 and 16000.

Then on line 1657 I get another error, but now fs is a list of numbers instead of the single number it was before, so somewhere fs is assigned something else.
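
A small self-contained sketch of why the 'or' version rejects every sampling rate. The function names are made up for this example and this is not the MagPhase source; only the condition itself is taken from the line quoted above.

```python
SUPPORTED_FS = (16000, 48000)

def accepted_buggy(fs):
    # fs cannot equal both values at once, so at least one inequality
    # always holds and the rejection branch is always taken.
    rejected = (fs != 48000) or (fs != 16000)
    return not rejected

def accepted_fixed(fs):
    # Reject only when fs differs from both supported rates.
    rejected = (fs != 48000) and (fs != 16000)
    return not rejected

def accepted_membership(fs):
    # Equivalent to the fixed version and harder to get wrong.
    return fs in SUPPORTED_FS

for fs in (16000, 44100, 48000):
    print(fs, accepted_buggy(fs), accepted_fixed(fs), accepted_membership(fs))
```

The membership test also makes it easy to add another rate later without touching the boolean logic.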

@dreamk73

tools/magphase/src/magphase.py line 155

m_sp, m_ph, v_shift, m_frms, m_fft = analysis_with_del_comp_from_pm(v_in_sig, v_pm_smpls, nFFT, win_func=win_func, nwin_per_pitch_period=nwin_per_pitch_period)

should be

m_sp, m_ph, v_shift, m_frms, m_fft = analysis_with_del_comp_from_pm(v_in_sig, fs, v_pm_smpls, nFFT, win_func=win_func, nwin_per_pitch_period=nwin_per_pitch_period)

Also, that function only returns two variables and this line has five?
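
For reference, this is how such a mismatch shows up in Python: unpacking a two-element return value into five names raises a ValueError at the call site. The stand-in function below is hypothetical and only mimics the shape of the problem; it is not the MagPhase code.

```python
def analysis_two_values():
    # Stand-in for a function that returns only two values.
    return "m_sp", "m_ph"

def try_unpack():
    """Attempt the five-name unpacking and report what happens."""
    try:
        m_sp, m_ph, v_shift, m_frms, m_fft = analysis_two_values()
    except ValueError as err:
        return str(err)
    return "ok"

message = try_unpack()
print(message)
```

Either the call site has to unpack fewer names or the function has to return all five values.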

@dreamk73

Change the return statement of analysis_with_del_comp_from_pm to have those five variables. The next issue is that the function get_fft_params_from_complex_data is not defined in magphase.py. I did have it in the previous version that you made, but after copying and pasting it in, running the analysis gave lots of the same error:

fft : m must be a integer of power of 2!
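
The message suggests that some FFT routine received a length that is not a power of two; which tool emits it and why is not established in this thread. As a generic illustration only, this is how a length can be checked and rounded up before calling such a routine. The helper names are made up for this example.

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ... and False otherwise."""
    return n > 0 and (n & (n - 1)) == 0

def next_power_of_two(n):
    """Smallest power of two that is >= n, for n >= 1."""
    return 1 << (n - 1).bit_length()

for length in (1000, 1024, 1025):
    print(length, is_power_of_two(length), next_power_of_two(length))
```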

@felipeespic
Member

felipeespic commented Nov 15, 2017

Hi @dreamk73, thank you for pointing that out. Those functions do not work because they are deprecated.

I have moved this to the "Issues" section in my fork (felipeespic#1), since that code is not part of the Merlin repo yet. Thanks!

PS: Could I remove these comments from here?

@ljuvela

ljuvela commented Nov 22, 2017

Details are out on Google's new production WaveNet:
https://deepmind.com/blog/high-fidelity-speech-synthesis-wavenet/

@felipeespic
Member

MagPhase vocoder integration with Merlin done in PR #281

8 participants