alternative vocoders? #261
I recommend trying https://github.com/gillesdegottex/pulsemodel. As for WaveNet, the latest DeepMind announcement contained virtually no information on what they have done; let's wait for a paper. In any case, most recent papers with a WaveNet waveform generator (e.g. Baidu's work) use acoustic features for local conditioning, so let's not give up on the old vocoders yet. Griffin-Lim is not really a vocoder, but rather a method for creating consistent phase information for a magnitude spectrogram. So the quality there depends entirely on how well you can predict magnitude spectra (which is not easy).
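For reference, the Griffin-Lim iteration mentioned above (alternate between enforcing the target magnitude and re-estimating phase from a resynthesized signal) can be sketched in a few lines of NumPy. This is a minimal illustration with simple STFT/iSTFT helpers, not any particular implementation:

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    # Hann-windowed frames -> one-sided spectra, shape (n_frames, n_fft//2 + 1)
    win = np.hanning(n_fft)
    frames = [win * x[i:i + n_fft] for i in range(0, len(x) - n_fft + 1, hop)]
    return np.array([np.fft.rfft(f) for f in frames])

def istft(S, n_fft=512, hop=128):
    # Windowed overlap-add with squared-window normalization
    win = np.hanning(n_fft)
    x = np.zeros((len(S) - 1) * hop + n_fft)
    norm = np.zeros_like(x)
    for i, spec in enumerate(S):
        frame = np.fft.irfft(spec, n=n_fft)
        x[i * hop:i * hop + n_fft] += win * frame
        norm[i * hop:i * hop + n_fft] += win ** 2
    return x / np.maximum(norm, 1e-8)

def griffin_lim(mag, n_iter=50, n_fft=512, hop=128):
    # Start from random phase, then iteratively project onto the set of
    # spectrograms with the given magnitude and the set of consistent STFTs.
    phase = np.exp(2j * np.pi * np.random.rand(*mag.shape))
    S = mag * phase
    for _ in range(n_iter):
        x = istft(S, n_fft, hop)
        S = mag * np.exp(1j * np.angle(stft(x, n_fft, hop)))
    return istft(S, n_fft, hop)
```

If the predicted magnitude spectrogram is poor, no amount of phase iteration will rescue it, which is the point made above.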
Thanks, I checked the paper 'Pulse Model in Log-domain for a Uniform Synthesizer'; the authors conclude that the comparative mean opinion score is slightly worse than STRAIGHT's. I hope DeepMind will publish details soon...
For PML, using an RNN and a proper output layer, the results are eventually better, as shown in the journal article. Waveform synthesis is definitely a great solution for quality. The only problem is the technical "details" necessary to run it fast, and the current feedback I got at conferences about this is: "We can't speak about this". So you might have to wait quite a bit before getting details :( Happy to read about any other solution you find @DabiaoMa (Thanks @ljuvela !)
I was considering using MagPhase, which was recently presented at Interspeech.
Interesting suggestions to check out. @fosimoes, you only need to add some code to merlin/src/configuration/configuration.py, where you will see settings for STRAIGHT and WORLD. You can define a different vocoder there in the same manner, specifying what kind of parameter directories/files the particular vocoder uses. I have tried it with GlottHMM and AHOcoder in the past.
@gillesdegottex Thanks, I will wait for the details.
@fosimoes Yes, I have tried it and it works! You just need to add the MagPhase parameters (mag, real, imag and their deltas) to the configuration.py file with these dimensions: mag: 60, dmag: 180, real: 45, dreal: 135, imag: 45, dimag: 135. I am currently working on the MagPhase-Merlin integration, but for now you can run the scripts manually as you mentioned. That should work.
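As a sanity check, the stream dimensions listed above follow the usual static/delta convention. The snippet below is only an illustration of that relationship (the dict name is hypothetical; Merlin's actual configuration.py uses its own structures):

```python
# MagPhase stream dimensions as given in the comment above.
# The variable name is hypothetical, for illustration only.
magphase_streams = {
    'mag': 60, 'dmag': 180,
    'real': 45, 'dreal': 135,
    'imag': 45, 'dimag': 135,
}

# Each delta stream holds static + delta + delta-delta coefficients,
# i.e. three times the static dimension.
for name in ('mag', 'real', 'imag'):
    assert magphase_streams['d' + name] == 3 * magphase_streams[name]

total_dim = sum(magphase_streams.values())  # overall output vector size
```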
@felipeespic Do you need help with the integration? I also planned to try it out. Do you think MagPhase can be ported to C++ easily? And does it require access to the whole feature vector, or can you run it streaming, i.e. on chunks/windows? Great to see so many vocoders coming out now; for years we've been mostly stuck with hts_engine and STRAIGHT ;)
MagPhase can be ported to C++; whether it's easy to do is up to you ;-) It can also be run on individual frame chunks, so you can use it in a streaming fashion. If you're doing a C++ implementation, simply plan for that from the beginning :-)
@RasmusD @felipeespic Sure, would be glad to. I'm currently looking at the code and stepping through the Merlin preprocessing steps. I've also started adapting configuration.py.
Hi @m-toman , I just pushed the slt_arctic demo using MagPhase to the branch in my fork: https://github.com/felipeespic/merlin/tree/magphase_integration Could you test that everything works OK? Also, if you have time, could you implement the slt_arctic full voice with MagPhase, please?
@felipeespic OK thanks, I should be able to check it out later today and should have time to adapt the "full" version. EDIT: Could you upload http://felipeespic.com/depot/databases/merlin_demos/slt_arctic_full_data_magphase.zip ? The demo ran without problems. I'll start integrating everything into my own scripts, which do an "out of source" build from scratch (i.e. starting from a given folder of wavs and an orthographic transcription). That makes it easier to test on a dozen voices or so.
@felipeespic When running the script to extract MagPhase acoustic features from my 48000 Hz audio data, I get an error (same with 16000 Hz). Line 392 of tools/magphase/src/magphase.py reads: if (fs != 48000) or (fs != 16000) The 'or' should be 'and', since the condition is always true as written. Then on line 1657 I get another error: fs is now a list of numbers instead of the single number it was before, so somewhere fs gets reassigned.
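To make the boolean bug concrete: no sampling rate can equal both 48000 and 16000 at once, so the or-condition rejects every input, including valid ones. A minimal demonstration (function names are hypothetical, not from magphase.py):

```python
def is_unsupported_fs_buggy(fs):
    # The original check: always True, because any fs differs
    # from at least one of the two values.
    return (fs != 48000) or (fs != 16000)

def is_unsupported_fs_fixed(fs):
    # Equivalent to (fs != 48000) and (fs != 16000):
    # rejects only rates that are neither 48 kHz nor 16 kHz.
    return fs not in (48000, 16000)
```

With the fix, 48000 and 16000 pass the check and anything else is rejected.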
tools/magphase/src/magphase.py line 155:
m_sp, m_ph, v_shift, m_frms, m_fft = analysis_with_del_comp_from_pm(v_in_sig, v_pm_smpls, nFFT, win_func=win_func, nwin_per_pitch_period=nwin_per_pitch_period)
should be:
m_sp, m_ph, v_shift, m_frms, m_fft = analysis_with_del_comp_from_pm(v_in_sig, fs, v_pm_smpls, nFFT, win_func=win_func, nwin_per_pitch_period=nwin_per_pitch_period)
Also, that function only returns two values, while this line unpacks five.
Changing the return statement of analysis_with_del_comp_from_pm to return those five variables fixes that. The next issue is that the function get_fft_params_from_complex_data is not defined in magphase.py. I did have it in the previous version you made, but copy-pasting it in and running the analysis produced many repetitions of the same error: fft: m must be an integer power of 2!
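That error message typically means an FFT routine was handed a length that is not a power of two. A common remedy, shown here only as a generic sketch and not as the actual magphase.py fix, is to round the window length up to the next power of two before calling the FFT:

```python
def next_pow2(n):
    """Smallest power of two >= n, a common choice for FFT sizes."""
    m = 1
    while m < n:
        m <<= 1
    return m

# e.g. a 1000-sample analysis window would use a 1024-point FFT
```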
Hi @dreamk73 , thank you for pointing this out. Those functions do not work because they are deprecated. I have moved this to the "Issues" section in my fork (felipeespic#1), since that code is not part of the Merlin repo yet. Thanks! PS: May I remove these comments from here?
Details are out on Google's new production WaveNet:
MagPhase vocoder integration with Merlin done in PR #281 |
Hi,
I am using the WORLD vocoder to reproduce wavs, but it seems WORLD may not be a good choice for producing high-quality voices. I extracted acoustic features from the original wavs with WORLD and then synthesized wavs directly from them; most of the time the generated wavs are not good.
Maybe I need to modify some WORLD parameters such as the frame shift or frame length, at the cost of more synthesis time?
Can we use vocal models other than WORLD or STRAIGHT? DeepMind announced that they created a new version of WaveNet that is 1000 times faster than the previous one; maybe the accelerated WaveNet is a good choice? Tacotron uses the Griffin-Lim algorithm as its vocoder, but I do not know whether Griffin-Lim performs better or not.
Best,