Hoping for your result #1
Comments
I've updated it.
Thanks, I'll try training it. Have you compared the speed of VITS and Tacotron 2? Which do you think is better in terms of speed and quality?
Of course VITS is better. It is amazing.
How about inference speed?
Do you think it is worth deploying (or at least reasonable to)? If so, I can help deploy it to TensorRT and make a C++ inference demo. TVM is also an option if the speed is good.
vits_样本.wav is about 100 seconds long, and inference takes about 800 ms on a 1080 GPU. If you need it faster, you can change the decoder from HiFi-GAN to MB-MelGAN.
The techniques in VITS: VAE, normalizing flow, GAN, MAS, multi-task training, adaptability to long sentences, etc. I think it will be a general framework for TTS in the future.
@dtx525942103 800 ms for 100 s means 20-30 s of audio needs only about 200 ms, which is tolerable. With TensorRT acceleration it could be about 3x faster on average. It seems it could even run on low-end compute devices such as a Raspberry Pi.
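As a rough check of the numbers above, here is a minimal sketch of the arithmetic. It assumes synthesis time scales linearly with audio length, which the thread implies but does not measure (only the single 100 s / 800 ms data point is reported). The 3x speedup is the commenter's estimate for TensorRT, not a benchmark, and the helper name `estimate_ms` is made up for illustration.

```python
# Reported in the thread: ~100 s of audio synthesized in ~800 ms on a 1080 GPU.
AUDIO_SECONDS = 100.0
SYNTH_MS = 800.0

# Real-time factor: compute time divided by audio duration (lower is faster).
rtf = (SYNTH_MS / 1000.0) / AUDIO_SECONDS


def estimate_ms(audio_seconds: float, speedup: float = 1.0) -> float:
    """Estimated synthesis time in ms.

    Assumes linear scaling with audio length (an assumption, not measured).
    `speedup` is a hypothetical acceleration factor, e.g. 3.0 for the
    TensorRT estimate quoted in the thread.
    """
    return audio_seconds * rtf * 1000.0 / speedup


if __name__ == "__main__":
    for seconds in (20, 30):
        base = estimate_ms(seconds)
        fast = estimate_ms(seconds, speedup=3.0)
        print(f"{seconds} s audio: ~{base:.0f} ms, ~{fast:.0f} ms at a hypothetical 3x speedup")
```

Under that linear assumption, 20 to 30 s of audio works out to roughly 160 to 240 ms, which is consistent with the "about 200 ms" figure quoted above.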
@jinfagang is your wav 16 kHz?
You can train on LJSpeech with the official VITS first.
OK, I will try. Thank you.
I think I have solved the problem. It was caused by the compilation of monotonic align. Thanks for the help.
Mm-hm.
Hoping to see your results from training VITS on Chinese.