Discussion: NNSVS vs. NEUTRINO #101
I wonder why the higher-order mgc coefficients and bap generated by the GAN-based model are over-smoothed. Is it possible that MLPG contributes to this over-smoothing? I think neutrino's phrase-level synthesis may be meant to avoid running out of GPU memory.
I suspect MLPG causes over-smoothing. I tried disabling MLPG, but that actually degraded quality; in particular, the generated F0 became too flat. It may be worth disabling MLPG for the spectral features (mgc and bap) while keeping it enabled for F0. Also, note that the GAN-based model is still at an experimental stage; I am still struggling to make it work well. Yes, phrase-level synthesis could be useful to avoid GPU out-of-memory errors when using NSF. It would also be useful if we used a modulation-spectrum-based post-filter (search
Thank you for your rapid response. I'm sorry, I mistakenly thought the acoustic model of NNSVS was GAN-based because of the graph legends for MGC and BAP (I re-checked the descriptions of the samples on SoundCloud). And thank you for the information about the modulation-spectrum-based post-filter. I'll read the paper.
Sorry, that's my bad. I didn't include any detailed information in the description. Some notes:
For spectrogram/aperiodicity/F0, I used the
Top: NNSVS (w/ GAN-based post-filter)
My bad; the previous spectrogram visualization was wrong. I was assuming that neutrino uses the same mgc as ours, but it turned out they use a slightly different approach. Specifically,
I suppose there's no big difference, but we may want to try the same approach as Neutrino to see if it actually makes a difference.
I'll report a more detailed comparison by Jan 2023. I'll have a long vacation for a while.
Samples: https://soundcloud.com/r9y9/sets/nnsvs-and-neutrino-comparison
While I was looking into the differences between the nnsvs and neutrino samples, I noticed that there is MUCH room for improvement in the acoustic model. I will put some analysis results here for the record.
Global variance
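For reference, global variance (GV) is usually taken to be the per-dimension variance of a feature trajectory over time; over-smoothed output tends to show lower GV than natural speech. Below is a minimal sketch of that computation, not the script used for the plots here. The function name `global_variance` and the toy data are made up for illustration.

```python
from statistics import pvariance


def global_variance(frames):
    """Per-dimension variance over time for a (T, D) feature sequence."""
    if not frames:
        raise ValueError("need at least one frame")
    # zip(*frames) transposes (T, D) into D trajectories of length T.
    return [pvariance(trajectory) for trajectory in zip(*frames)]


# Toy data (made up): the "smoothed" sequence moves less in dim 0
# and is completely flat in dim 1.
natural = [[0.0, 1.0], [2.0, 3.0], [0.0, 1.0], [2.0, 3.0]]
smoothed = [[0.5, 2.0], [1.5, 2.0], [0.5, 2.0], [1.5, 2.0]]

gv_natural = global_variance(natural)
gv_smoothed = global_variance(smoothed)
print(gv_natural, gv_smoothed)
```

In practice one would compute this per utterance for each stream (mgc, bap, F0) and compare the values dimension by dimension between the two systems.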
Spectrogram
Upper: nnsvs, lower: neutrino
It looks like neutrino puts emphasis on the frequency bands below 8000 Hz.
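One way to check this visual impression numerically would be to compare the share of spectral energy below 8000 Hz for both systems. The following is only a sketch under assumptions: the power spectrogram is a list of frames with linear power values on bins evenly spaced from 0 Hz to Nyquist, and the 48 kHz sample rate in the example is assumed, not taken from the samples. The function name `low_band_energy_ratio` is hypothetical.

```python
def low_band_energy_ratio(power_spec, sample_rate, cutoff_hz=8000.0):
    """Fraction of total energy in bins strictly below cutoff_hz.

    power_spec: list of frames; each frame is a list of linear power
    values for bins evenly spaced from 0 Hz to sample_rate / 2.
    """
    n_bins = len(power_spec[0])
    hz_per_bin = (sample_rate / 2) / (n_bins - 1)
    low = 0.0
    total = 0.0
    for frame in power_spec:
        for k, power in enumerate(frame):
            total += power
            if k * hz_per_bin < cutoff_hz:
                low += power
    return low / total if total > 0 else 0.0


# Toy frame (made up): 7 bins at an assumed 48 kHz sample rate,
# i.e. bins at 0, 4000, 8000, ..., 24000 Hz.
toy_spec = [[3.0, 3.0, 1.0, 1.0, 1.0, 1.0, 0.0]]
ratio = low_band_energy_ratio(toy_spec, sample_rate=48000)
print(ratio)
```

A higher ratio for neutrino than for nnsvs on the same song would support the observation above.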
Aperiodicity
Upper: nnsvs, lower: neutrino
It seems that neutrino performs phrase-level synthesis (with phrases separated by rests, I guess?). Aperiodicity components are filled with constant values for pauses.
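If phrases are indeed delimited by rests, the segmentation could look roughly like the sketch below. This is a guess at the general idea, not neutrino's actual implementation. The note format (dicts with a `lyric` key) and the rest labels `pau` and `sil` are assumptions for illustration.

```python
def split_into_phrases(notes, rest_labels=("pau", "sil")):
    """Group consecutive non-rest notes into phrases.

    Rests act as phrase boundaries and are not included in any phrase.
    """
    phrases = []
    current = []
    for note in notes:
        if note["lyric"] in rest_labels:
            # A rest closes the phrase collected so far, if any.
            if current:
                phrases.append(current)
                current = []
        else:
            current.append(note)
    if current:
        phrases.append(current)
    return phrases


# Toy score (made up): two phrases separated by a pause.
score = [
    {"lyric": "pau"},
    {"lyric": "a"},
    {"lyric": "i"},
    {"lyric": "pau"},
    {"lyric": "u"},
    {"lyric": "sil"},
]
phrases = split_into_phrases(score)
print([[n["lyric"] for n in p] for p in phrases])
```

Each phrase could then be synthesized independently, which keeps the sequence length (and so memory use) bounded, with the rest regions filled in separately.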
F0
MGC
BAP
So what can we do?
So far I am thinking of the following ideas