This repository has been archived by the owner on Nov 11, 2023. It is now read-only.

A more accurate English Model Introduction #15

Closed
jerryli27 opened this issue Mar 12, 2023 · 1 comment

Comments

@jerryli27

I found the English description a bit hard to understand. It misses the crucial point that the model uses VITS. The Simplified Chinese (中文简体) introduction is easier to understand.

... to extract source audio speech features, and inputs them together with F0 to replace the original text input to achieve the effect of song conversion. ...

The following would be better in my humble opinion:

... to extract source audio speech features, and inputs them together with F0 into VITS instead of the original text input to achieve the effect of song conversion. ...

Thanks!

P.S. Sorry, I have no Simplified Chinese (中文简体) input method on this laptop, so I'm writing in English.

@Kerushii
Contributor

"The vectors are directly fed into VITS instead of converting to a text based intermediate; thus the pitch and intonations are conserved"
