Natural speech training #4
I tried implementing NaturalSpeech here. It pretty much follows its architecture, with a different text encoder. You can swap out the text encoder with the one from VITS if you want to stay close to the paper. You also want to make sure that the softmax is calculated over the phoneme dimension in the Durator; I don't know if I corrected this. The most difficult part of the paper is its warped KL-loss. I implemented it in the most straightforward way, which involves creating three matrices of shape (batch_size, n_channels, n_phones, n_spec_frames): one for x, one for the means, and one for the standard deviations. Obviously those matrices were huge and caused CUDA out-of-memory errors.
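For illustration, here is a minimal sketch of that straightforward broadcasting approach, assuming VITS-style names: frame-level latents `z` of shape `(batch_size, n_channels, n_spec_frames)` and phoneme-level prior parameters `m_p`, `logs_p` of shape `(batch_size, n_channels, n_phones)`. The function name `pairwise_gaussian_nll` and these shapes are assumptions for the sketch, not the actual code in this repo.

```python
import math
import torch

def pairwise_gaussian_nll(z, m_p, logs_p):
    """Per-(phoneme, frame) Gaussian negative log-likelihood for the warped KL.

    z:      (B, C, T) frame-level latents
    m_p:    (B, C, P) phoneme-level prior means
    logs_p: (B, C, P) phoneme-level prior log standard deviations

    Returns a (B, P, T) cost matrix, but broadcasting materialises
    (B, C, P, T) intermediates -- the tensors that cause the OOM.
    """
    B, C, T = z.shape
    z = z.unsqueeze(2)          # (B, C, 1, T)
    m = m_p.unsqueeze(3)        # (B, C, P, 1)
    logs = logs_p.unsqueeze(3)  # (B, C, P, 1)

    # (z - m) broadcasts to (B, C, P, T): one huge tensor per term.
    sq = ((z - m) ** 2) * torch.exp(-2.0 * logs)   # (B, C, P, T)
    nll = 0.5 * sq.sum(dim=1)                      # (B, P, T)
    nll = nll + logs.sum(dim=1) + 0.5 * C * math.log(2.0 * math.pi)
    return nll
```

The same (B, P, T) matrix can be assembled from a few matrix multiplications over the channel dimension (roughly how VITS builds the score matrix for its alignment search), which avoids ever materialising the (B, C, P, T) intermediates; chunking over frames is another way to keep memory bounded.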
And you obviously also want to swap out the decoder with HiFiGAN, called Generator in the file.
After a long time of trying, the model has never been able to converge. I want to try the effect of
@dunky11 Have you looked at this Soft-DTW implementation? https://github.com/google-research/soft-dtw-divergences
In the paper, they use Soft Dynamic Time Warping in the KL loss. In your code, I didn't find it. So, is that part still in progress, or is it missing for another reason?
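For reference, here is a minimal sketch of the soft-DTW recurrence (Cuturi & Blondel) applied to a phoneme-by-frame cost matrix such as the one sketched above; the names `soft_min` and `soft_dtw` and the `gamma` value are illustrative assumptions, not code from this repo or from the NaturalSpeech authors.

```python
import torch

def soft_min(values, gamma):
    """Smoothed minimum: -gamma * logsumexp(-values / gamma)."""
    return -gamma * torch.logsumexp(-values / gamma, dim=-1)

def soft_dtw(cost, gamma=0.01):
    """Soft-DTW value for a single cost matrix of shape (P, T).

    cost[i, j] is the cost (e.g. Gaussian NLL) of aligning phoneme i with
    spectrogram frame j; smaller gamma approaches hard DTW.
    """
    P, T = cost.shape
    inf = torch.tensor(float("inf"), dtype=cost.dtype, device=cost.device)
    # R[i][j] holds the soft alignment cost of the first i phonemes
    # against the first j frames; a list of lists keeps autograd happy.
    R = [[inf for _ in range(T + 1)] for _ in range(P + 1)]
    R[0][0] = torch.zeros((), dtype=cost.dtype, device=cost.device)
    for i in range(1, P + 1):
        for j in range(1, T + 1):
            candidates = torch.stack([R[i - 1][j - 1], R[i - 1][j], R[i][j - 1]])
            R[i][j] = cost[i - 1, j - 1] + soft_min(candidates, gamma)
    return R[P][T]
```

In practice the pure-Python double loop is far too slow for training; a batched implementation with a custom backward pass, such as the soft-dtw-divergences package linked above, is what you would actually plug into the KL term.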