Discussion: NNSVS vs. NEUTRINO #101

Open
r9y9 opened this issue Apr 30, 2022 · 8 comments

Comments

@r9y9
Collaborator

r9y9 commented Apr 30, 2022

Samples: https://soundcloud.com/r9y9/sets/nnsvs-and-neutrino-comparison

While looking into the differences between the nnsvs and neutrino samples, I noticed that there is MUCH room for improvement in the acoustic model. I will put some analysis results here for the record.

Global variance

download
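For readers who want to reproduce this kind of comparison, global variance (GV) is simply the per-dimension variance of a feature trajectory over time. The sketch below is a minimal NumPy illustration on synthetic data; the function names and the toy "smoothed" signal are assumptions for the example, not code from NNSVS.

```python
import numpy as np


def global_variance(features: np.ndarray) -> np.ndarray:
    """Per-dimension variance over time for a (T, D) feature matrix."""
    return np.var(features, axis=0)


def gv_ratio(generated: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Ratio of generated GV to reference GV; values below 1 suggest over-smoothing."""
    return global_variance(generated) / global_variance(reference)


# Toy data: "natural" features are white noise, "smoothed" features are the
# same trajectories after a 5-frame moving average along the time axis.
rng = np.random.default_rng(0)
natural = rng.normal(size=(500, 4))
kernel = np.ones(5) / 5
smoothed = np.stack(
    [np.convolve(natural[:, d], kernel, mode="same") for d in range(natural.shape[1])],
    axis=1,
)

print(gv_ratio(smoothed, natural))
```

A ratio well below 1 in the higher mgc dimensions would be consistent with the over-smoothing discussed further down.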

Spectrogram

Upper: nnsvs, lower: neutrino

download-1

It looks like neutrino puts emphasis on the frequency bands below 8000 Hz.

Aperiodicity

Upper: nnsvs, lower: neutrino

download-2

It seems that neutrino performs phrase-level synthesis (separated by rests, I guess?). Aperiodicity components are filled with constant values for pauses.
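If phrases are indeed delimited by rests (which is only a guess here), splitting a score into phrases could look like the sketch below. The segment tuple layout, the rest labels `pau` and `sil`, and the function name are all assumptions for illustration; they are not taken from neutrino or NNSVS.

```python
from typing import List, Tuple

# (phoneme label, start frame, end frame); a hypothetical layout for this example
Segment = Tuple[str, int, int]

# Assumed rest symbols; real label sets may differ
REST_LABELS = {"pau", "sil"}


def split_into_phrases(segments: List[Segment]) -> List[List[Segment]]:
    """Group consecutive non-rest segments into phrases, dropping the rests."""
    phrases: List[List[Segment]] = []
    current: List[Segment] = []
    for seg in segments:
        if seg[0] in REST_LABELS:
            # A rest closes the phrase collected so far (if any)
            if current:
                phrases.append(current)
                current = []
        else:
            current.append(seg)
    if current:
        phrases.append(current)
    return phrases


segments = [
    ("sil", 0, 10), ("k", 10, 14), ("a", 14, 40),
    ("pau", 40, 55), ("s", 55, 60), ("a", 60, 90), ("sil", 90, 100),
]
phrases = split_into_phrases(segments)
print(len(phrases))
```

Each phrase could then be synthesized independently, with the rest regions filled by a constant value.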

F0

download-3

MGC

download-4

  • mgc 0th: ours is shifted. This is not important because the signal gains differ at training time.
  • mgc higher dims: Ours are clearly smoothed. Temporal fluctuations are clearly observed for neutrino, but not for nnsvs.

BAP

download-5

  • Same as mgc: ours are over-smoothed.

So what can we do?

So far I am thinking of the following ideas

@taroushirani
Contributor

I wonder why the higher-order mgc and bap generated by the GAN-based model are over-smoothed. Is there any possibility that MLPG contributes to this over-smoothing?

I think neutrino's phrase-level synthesis may be meant to avoid running out of GPU memory.

@r9y9
Collaborator Author

r9y9 commented Apr 30, 2022

I suspect MLPG causes over-smoothing. I tried disabling MLPG, but that actually caused quality degradation. In particular, the generated F0 became too flat. Maybe it would be worth trying to disable MLPG for spectral features (mgc and bap) and enable it for F0. Also, note that the GAN-based model is still at an experimental stage. I am still struggling to make it work well.
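The idea of keeping MLPG for F0 while skipping it for mgc and bap can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the NNSVS implementation: the window coefficients, the (T, K, D) array layout, and the names `build_window_matrix`, `mlpg_1d`, and `generate` are all made up for the example. It solves the usual normal equations (WᵀPW)c = WᵀPμ with diagonal precisions.

```python
import numpy as np

# Assumed static / delta / delta-delta windows
WINDOWS = [
    np.array([0.0, 1.0, 0.0]),
    np.array([-0.5, 0.0, 0.5]),
    np.array([1.0, -2.0, 1.0]),
]


def build_window_matrix(T, windows):
    """Stack one banded (T, T) matrix per window into a (K*T, T) matrix."""
    W = np.zeros((len(windows) * T, T))
    for k, win in enumerate(windows):
        half = len(win) // 2
        for t in range(T):
            for j, coef in enumerate(win):
                tau = t + j - half
                if 0 <= tau < T:  # windows are truncated at the edges
                    W[k * T + t, tau] = coef
    return W


def mlpg_1d(means, variances, windows=WINDOWS):
    """means, variances: (T, K) for one feature dimension. Returns a (T,) trajectory."""
    T = means.shape[0]
    W = build_window_matrix(T, windows)
    mu = means.T.reshape(-1)  # all static frames, then all delta frames, ...
    precision = 1.0 / variances.T.reshape(-1)
    WtP = W.T * precision  # W^T P with diagonal P
    return np.linalg.solve(WtP @ W, WtP @ mu)


def generate(means, variances, use_mlpg):
    """means, variances: (T, K, D). Returns (T, D) static features."""
    if not use_mlpg:
        return means[:, 0, :]  # take the predicted static part as-is
    dims = range(means.shape[2])
    return np.stack([mlpg_1d(means[:, :, d], variances[:, :, d]) for d in dims], axis=1)


# Toy usage: MLPG on for F0, off for spectral features.
T = 100
rng = np.random.default_rng(0)
lf0_means = rng.normal(size=(T, 3, 1))
mgc_means = rng.normal(size=(T, 3, 60))
lf0 = generate(lf0_means, np.ones_like(lf0_means), use_mlpg=True)
mgc = generate(mgc_means, np.ones_like(mgc_means), use_mlpg=False)
```

The dense solve is O(T³) and only meant to show the math; a banded solver would be the practical choice for long utterances.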

Yes, phrase-level synthesis could be useful to avoid GPU out-of-memory errors when using NSF. It would also be useful if we use a modulation spectrum based post-filter (search for "segment-level post-filter" in https://ahcweb01.naist.jp/papers/journal/2016/201604_TASLP_Takamichi_1/201604_TASLP_Takamichi_1.paper.pdf).
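As a rough idea of what a modulation spectrum (MS) based post-filter does, here is a minimal utterance-level sketch in NumPy. It is written in the spirit of the linked paper but is not a faithful reimplementation: it assumes each trajectory is shorter than `n_fft`, it does not do the segment-level windowing, and the function names, the emphasis coefficient `k`, and the toy data are assumptions for the example.

```python
import numpy as np


def log_modulation_spectrum(x, n_fft):
    """x: (T, D) trajectory. Returns log power MS and phase, each (n_fft//2 + 1, D)."""
    spec = np.fft.rfft(x, n=n_fft, axis=0)  # zero-pads x to n_fft frames
    return np.log(np.abs(spec) ** 2 + 1e-12), np.angle(spec)


def ms_stats(trajectories, n_fft):
    """Mean and standard deviation of the log MS over a list of trajectories."""
    ms = np.stack([log_modulation_spectrum(x, n_fft)[0] for x in trajectories])
    return ms.mean(axis=0), ms.std(axis=0) + 1e-8


def ms_postfilter(x, stats_nat, stats_gen, n_fft, k=0.85):
    """Move the log MS of x from the generated statistics toward the natural ones."""
    T = x.shape[0]
    log_ms, phase = log_modulation_spectrum(x, n_fft)
    mu_n, sd_n = stats_nat
    mu_g, sd_g = stats_gen
    converted = sd_n / sd_g * (log_ms - mu_g) + mu_n
    filtered = (1.0 - k) * log_ms + k * converted
    amplitude = np.exp(0.5 * filtered)
    # Keep the original phase, go back to the time domain, drop the padding
    y = np.fft.irfft(amplitude * np.exp(1j * phase), n=n_fft, axis=0)
    return y[:T]


# Toy data: "natural" trajectories are white noise, "generated" ones are
# over-smoothed copies (5-frame moving average).
rng = np.random.default_rng(0)
n_fft = 256
kernel = np.ones(5) / 5
natural = [rng.normal(size=(200, 2)) for _ in range(20)]
generated = [
    np.stack([np.convolve(x[:, d], kernel, mode="same") for d in range(2)], axis=1)
    for x in natural
]

stats_nat = ms_stats(natural, n_fft)
stats_gen = ms_stats(generated, n_fft)
y = ms_postfilter(generated[0], stats_nat, stats_gen, n_fft)
print(generated[0].var(), y.var())
```

On this toy data the filtered trajectory regains much of the temporal fluctuation that smoothing removed. Note that the sketch also rescales the zeroth modulation frequency (the trajectory mean), which a real implementation may want to leave untouched.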

@taroushirani
Contributor

taroushirani commented Apr 30, 2022

Thank you for your rapid response. I'm sorry, I misunderstood and thought the acoustic model of NNSVS was GAN-based because of the graph legends of MGC and BAP (I re-checked the descriptions of the samples on SoundCloud).

And thank you for the information about modulation spectrum based post-filter. I'll read the paper.

@r9y9
Collaborator Author

r9y9 commented Apr 30, 2022

Sorry that's my bad. I didn't include any detailed information in the description. Some notes:

  • baseline: a baseline ResSkipF0FFConvLSTM model
  • gan: my attempt to integrate GAN for training ResSkipF0FFConvLSTM model (not very good at the moment)
  • neutrino: neutrino.

For spectrogram/aperiodicity/F0, I used the baseline model. For mgc/bap, I used both the baseline and gan for comparison.

@r9y9
Collaborator Author

r9y9 commented Apr 30, 2022

Good news: I've done an initial cut of the MS post-filter, and here is a spectrogram example:

From top to bottom: gan, gan with MS post-filter, neutrino
download

Findings so far:

  • With the MS-based post-filter I got patterns very similar to neutrino's. It's likely that neutrino also uses a similar (or the same) post-filtering technique.
  • Over-smoothing can be alleviated by the MS-based post-filter.

An illustration for 50-dim mgc with and without post-filter:

download-1

@r9y9
Collaborator Author

r9y9 commented May 22, 2022

download (7)

Top: NNSVS (w/ GAN-based post-filter)
Bottom: Neutrino

My bad; the previous spectrogram visualization was wrong. I was assuming that neutrino uses the same mgc as ours, but it turned out they use a slightly different approach. Specifically,

  • Neutrino: pyworld.code_spectral_envelope (or C++ version of its impl) to convert spectral envelope to mgc
  • nnsvs: pysptk.sp2mc to convert spectral envelope to mgc

I suppose there's no big difference, but we may want to try the same approach as Neutrino to see if it actually makes a difference.
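To try both conversions on the same spectral envelope, something like the following sketch could be used. It assumes `sp` is a WORLD spectral envelope (power spectrum, float64, shape (T, fft_size // 2 + 1)), and the default `order=59` and the helper names are assumptions for the example. The imports are placed inside the functions so that either route can be tried with only one of the two packages installed.

```python
import numpy as np


def mgc_pysptk_style(sp: np.ndarray, fs: int, order: int = 59) -> np.ndarray:
    """Spectral envelope to mel-cepstrum via pysptk.sp2mc. Returns (T, order + 1)."""
    import pysptk

    alpha = pysptk.util.mcepalpha(fs)  # frequency warping factor for this sampling rate
    return pysptk.sp2mc(sp, order=order, alpha=alpha)


def mgc_pyworld_style(sp: np.ndarray, fs: int, order: int = 59) -> np.ndarray:
    """Spectral envelope to coded envelope via pyworld. Returns (T, order + 1)."""
    import pyworld

    # pyworld expects a C-contiguous float64 array
    sp = np.ascontiguousarray(sp, dtype=np.float64)
    return pyworld.code_spectral_envelope(sp, fs, order + 1)
```

Given an envelope from `pyworld.cheaptrick`, calling both functions on it and plotting the outputs per dimension would show how far the two parameterizations actually diverge.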

@r9y9
Collaborator Author

r9y9 commented Nov 30, 2022

#1 (comment)

@r9y9
Collaborator Author

r9y9 commented Dec 1, 2022

I'll report a more detailed comparison by Jan 2023. I'll have a long vacation for a while.
