A more accurate English Model Introduction #15

jerryli27 · 2023-03-12T16:45:29Z

I found the English description a bit hard to understand. It omits the crucial detail that the model uses VITS. The Simplified Chinese (中文简体) intro is easier to understand.

... to extract source audio speech features, and inputs them together with F0 to replace the original text input to achieve the effect of song conversion. ...

The following would be better in my humble opinion:

... to extract source audio speech features, and inputs them together with F0 into VITS instead of the original text input to achieve the effect of song conversion. ...

Thanks!

P.S. Sorry, there is no Simplified Chinese input method on this laptop, so I'm writing in English.

Kerushii · 2023-03-13T07:10:16Z

"The vectors are fed directly into VITS instead of being converted to a text-based intermediate; thus the pitch and intonation are preserved"

Kerushii mentioned this issue Mar 13, 2023

address issues #17

Merged

Miuzarte closed this as completed Mar 13, 2023
