Discussion: NNSVS vs. NEUTRINO #101

r9y9 · 2022-04-30T04:15:38Z

Samples: https://soundcloud.com/r9y9/sets/nnsvs-and-neutrino-comparison

While looking into the differences between the nnsvs and neutrino samples, I noticed that there is MUCH room for improvement in the acoustic model. I will put some analysis results here for the record.

Global variance

Spectrogram

Upper: nnsvs, lower: neutrino

It looks like neutrino puts emphasis on the frequency bands below 8000 Hz.

Aperiodicity

Upper: nnsvs, lower: neutrino

It seems that neutrino performs phrase-level synthesis (separated by rests, I guess?). The aperiodicity components are filled with constant values during pauses.

F0

MGC

mgc 0th: ours are shifted. This is not important because the gains of the training signals are different.
mgc higher dims: ours are clearly smoothed. Temporal fluctuations are clearly observed for neutrino, but not for nnsvs.
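One way to put a number on this kind of over-smoothing is the per-dimension global variance (GV), i.e. the variance of each feature dimension over time. The following is a minimal sketch, assuming the features are NumPy arrays of shape (T, D); the function names are illustrative and not part of NNSVS.

```python
import numpy as np


def global_variance(features):
    """Per-dimension variance over time for a (T, D) feature matrix."""
    features = np.asarray(features, dtype=np.float64)
    return features.var(axis=0)


def gv_ratio(generated, reference):
    """Ratio of generated GV to reference GV, one value per dimension.

    Values well below 1 suggest that the generated trajectory fluctuates
    less than the reference, which is what over-smoothing looks like.
    """
    ref_gv = np.maximum(global_variance(reference), 1e-12)  # avoid division by zero
    return global_variance(generated) / ref_gv
```

The two inputs do not need the same number of frames, since each variance is computed over its own time axis.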

BAP

Same as mgc: ours are over-smoothed.

So what can we do?

So far I am thinking of the following ideas

Try autoregressive models to alleviate the over-smoothing issues in mgc/bap modeling (see #15, "Improved acoustic model support: introducing autoregressive structure").
Design a post-filter to alleviate the over-smoothing issues. I guess a modulation spectrum based post-filter would work to some extent.
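To make the second idea concrete, here is a rough sketch of an utterance-level modulation spectrum (MS) post-filter. It assumes a (T, D) parameter trajectory with T no longer than a fixed FFT length, and per-bin means and standard deviations of the log modulation spectrum estimated separately for generated and natural features. All names are hypothetical, and this is not necessarily what NNSVS or neutrino implements.

```python
import numpy as np


def log_modulation_spectrum(x, n_fft):
    """Return (log power modulation spectrum, phase) of a (T, D) trajectory."""
    spec = np.fft.rfft(x, n=n_fft, axis=0)  # FFT along time, per dimension
    return np.log(np.abs(spec) ** 2 + 1e-12), np.angle(spec)


def ms_stats(trajectories, n_fft):
    """Per-bin mean and std of the log MS over a list of (T, D) trajectories."""
    logs = np.stack([log_modulation_spectrum(x, n_fft)[0] for x in trajectories])
    return logs.mean(axis=0), logs.std(axis=0) + 1e-8


def ms_postfilter(x, gen_mean, gen_std, nat_mean, nat_std, n_fft, k=1.0):
    """Move the log MS of x from the generated statistics toward the natural ones.

    k in [0, 1] is the emphasis strength; the phase is left unchanged.
    """
    T = x.shape[0]
    if T > n_fft:
        raise ValueError("trajectory is longer than n_fft")
    log_ms, phase = log_modulation_spectrum(x, n_fft)
    target = (log_ms - gen_mean) / gen_std * nat_std + nat_mean
    filtered = (1.0 - k) * log_ms + k * target
    amplitude = np.exp(0.5 * filtered)  # log power -> magnitude
    y = np.fft.irfft(amplitude * np.exp(1j * phase), n=n_fft, axis=0)
    return y[:T]
```

With `k=0` the trajectory comes back unchanged, so `k` can be tuned by ear between no filtering and full normalization.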

taroushirani · 2022-04-30T12:05:37Z

I wonder why the higher-order mgc and bap generated by the GAN-based model are over-smoothed. Is there any possibility that MLPG contributes to this over-smoothing?

I think neutrino's phrase-level synthesis may be meant to avoid running out of GPU memory.

r9y9 · 2022-04-30T13:51:33Z

I suspect MLPG causes over-smoothing. I tried disabling MLPG, but that actually caused quality degradation. In particular, the generated F0 became too flat. Maybe it would be worth trying to disable MLPG for the spectral features (mgc and bap) and enable it for F0. Also, note that the GAN-based model is still at an experimental stage. I am still struggling to make it work well.
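The per-stream idea can be sketched as follows. This is a simplified illustration, not the NNSVS code path: it assumes each stream is laid out as [static | delta | delta-delta], uses common 3-point windows, and replaces the predicted variances that real MLPG uses with unit variances. The stream names and the `generate` helper are hypothetical.

```python
import numpy as np

# Static, delta, and delta-delta windows (an assumption for this sketch).
WINDOWS = [
    np.array([0.0, 1.0, 0.0]),
    np.array([-0.5, 0.0, 0.5]),
    np.array([1.0, -2.0, 1.0]),
]


def build_window_matrix(T, windows=WINDOWS):
    """Stack one (T, T) band matrix per window into a (len(windows) * T, T) matrix."""
    W = np.zeros((len(windows) * T, T))
    for w_idx, win in enumerate(windows):
        half = len(win) // 2
        for t in range(T):
            for k, coef in enumerate(win):
                tau = min(max(t + k - half, 0), T - 1)  # replicate edge frames
                W[w_idx * T + t, tau] += coef
    return W


def mlpg_unit_variance(means, windows=WINDOWS):
    """Least-squares trajectory from (T, len(windows) * D) means; returns (T, D)."""
    T = means.shape[0]
    D = means.shape[1] // len(windows)
    W = build_window_matrix(T, windows)
    # Reorder to (len(windows) * T, D) so rows match the rows of W.
    mu = np.concatenate(
        [means[:, i * D:(i + 1) * D] for i in range(len(windows))], axis=0
    )
    return np.linalg.solve(W.T @ W, W.T @ mu)


def generate(streams, use_mlpg):
    """Apply MLPG only to the streams flagged in use_mlpg; otherwise keep statics."""
    out = {}
    for name, means in streams.items():
        if use_mlpg.get(name, False):
            out[name] = mlpg_unit_variance(means)
        else:
            D = means.shape[1] // len(WINDOWS)
            out[name] = means[:, :D]
    return out
```

For example, `generate(streams, {"lf0": True})` would smooth only the F0 stream and pass the static mgc and bap through untouched. How much smoothing real MLPG introduces depends on the predicted deltas and variances, so this toy solver only shows where the switch would go.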

Yes, phrase-level synthesis could be useful to avoid GPU out-of-memory errors when using NSF. It would also be useful if we use a modulation spectrum based post-filter (search for "segment-level post-filter" in https://ahcweb01.naist.jp/papers/journal/2016/201604_TASLP_Takamichi_1/201604_TASLP_Takamichi_1.paper.pdf).
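A minimal sketch of phrase-level processing, assuming a per-frame boolean rest mask is available (for example, derived from the score labels). The helper names are hypothetical; `fn` stands in for whatever runs per phrase, such as a post-filter or a vocoder front end that keeps the frame count.

```python
import numpy as np


def split_into_phrases(is_rest):
    """Return [(start, end)] frame ranges of non-rest regions, end exclusive."""
    is_rest = np.asarray(is_rest, dtype=bool)
    # Pad with rests on both sides so phrases touching the edges are detected.
    padded = np.concatenate([[True], is_rest, [True]]).astype(np.int8)
    change = np.diff(padded)
    starts = np.flatnonzero(change == -1)  # rest -> non-rest
    ends = np.flatnonzero(change == 1)     # non-rest -> rest
    return list(zip(starts.tolist(), ends.tolist()))


def process_by_phrase(features, is_rest, fn):
    """Apply fn to each phrase of a (T, D) array; rest frames are left as-is."""
    out = np.array(features, dtype=np.float64, copy=True)
    for start, end in split_into_phrases(is_rest):
        out[start:end] = fn(out[start:end])
    return out
```

Because each phrase is handled independently, peak memory scales with the longest phrase rather than with the whole song.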

taroushirani · 2022-04-30T14:46:36Z

Thank you for your rapid response. I'm sorry, I misunderstood and thought that the acoustic model of NNSVS was GAN-based because of the graph legends of MGC and BAP (I re-checked the descriptions of the samples on SoundCloud).

And thank you for the information about modulation spectrum based post-filter. I'll read the paper.

r9y9 · 2022-04-30T15:17:11Z

Sorry, that's my bad. I didn't include any detailed information in the description. Some notes:

baseline: a baseline ResSkipF0FFConvLSTM model
gan: my attempt to integrate a GAN into training the ResSkipF0FFConvLSTM model (not very good at the moment)
neutrino: neutrino.

For spectrogram/aperiodicity/F0, I used the baseline model. For mgc/bap, I used both the baseline and gan for comparison.

r9y9 · 2022-04-30T15:24:26Z

Good news: I've done an initial cut of the MS post-filter, and here is a spectrogram example:

From top to bottom: gan, gan with MS post-filter, neutrino

Findings so far:

I got patterns very similar to neutrino's with the MS-based post-filter. It's likely that neutrino also uses a similar (or the same) post-filtering technique.
Over-smoothing can be alleviated by the MS-based post-filter.

An illustration for 50-dim mgc with and without post-filter:

r9y9 · 2022-05-22T04:15:05Z

Top: NNSVS (w/ GAN-based post-filter)
Bottom: Neutrino

My bad; the previous spectrogram visualization was wrong. I was assuming that neutrino uses the same mgc as ours, but it turned out they use a slightly different approach. Specifically,

Neutrino: pyworld.code_spectral_envelope (or a C++ version of its implementation) to convert the spectral envelope to mgc
nnsvs: pysptk.sp2mc to convert the spectral envelope to mgc

I suppose there's no big difference, but we may want to try the same approach as Neutrino to see if it actually makes a difference.
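For reference, the two conversions can be put side by side roughly as below. This is a sketch that assumes `sp` is a WORLD-style power spectral envelope of shape (T, fft_size // 2 + 1); the wrapper names, the order, and the dimension arguments are illustrative, and `pysptk` and `pyworld` must be installed for the functions to run.

```python
import numpy as np


def mgc_nnsvs_style(sp, fs, order):
    """Spectral envelope -> mel-cepstrum via pysptk.sp2mc; returns (T, order + 1)."""
    import pysptk
    from pysptk.util import mcepalpha

    alpha = mcepalpha(fs)  # frequency-warping coefficient chosen for the sampling rate
    return pysptk.sp2mc(np.asarray(sp, dtype=np.float64), order=order, alpha=alpha)


def mgc_neutrino_style(sp, fs, dim):
    """Spectral envelope -> coded envelope via pyworld; returns (T, dim)."""
    import pyworld

    sp = np.ascontiguousarray(sp, dtype=np.float64)  # pyworld expects C-contiguous float64
    return pyworld.code_spectral_envelope(sp, fs, dim)
```

The two outputs have the same shape when `dim == order + 1`, but they come from different algorithms and need not be numerically identical, so models and statistics trained on one should not be fed the other. Each has its own inverse (`pysptk.mc2sp` and `pyworld.decode_spectral_envelope`), which has to be matched when reconstructing a spectrogram for visualization.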

r9y9 · 2022-11-30T18:19:37Z

#1 (comment)

r9y9 · 2022-12-01T02:56:29Z

I'll report a more detailed comparison by Jan 2023. I'll have a long vacation for a while.

r9y9 added the discussion label Apr 30, 2022
