Hi, that's a good question :) According to my testing, if the model is trained on one language, e.g., English, it can also be applied to Chinese, but the performance is not as good as on English, e.g., the original content of Chinese utterances may not be well preserved. The key to successful conversion for unseen languages is that the VQCPC-based content encoder can accurately discover acoustic units related to the underlying linguistic content or pronunciations of speech. Different languages may share some similar pronunciations, but there are still many articulation units that differ across languages. Therefore, if the model is trained on multiple languages (i.e., letting the content encoder discover more articulation units), it should generalize better to unseen languages.
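As a rough illustration (this is a minimal sketch, not the repository's actual code), the "discrete acoustic unit" step of a VQ content encoder boils down to a nearest-neighbor lookup in a learned codebook: each frame of continuous content features is replaced by the index of its closest codebook entry. The codebook size, feature dimension, and function names below are all hypothetical:

```python
import numpy as np

def quantize(features, codebook):
    """Map each frame vector to its nearest codebook entry (discrete unit).

    features: (T, D) continuous content features, one row per frame.
    codebook: (K, D) learned unit embeddings.
    Returns (unit indices of shape (T,), quantized vectors of shape (T, D)).
    """
    # Pairwise squared distances between every frame and every codebook entry: (T, K)
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    ids = dists.argmin(axis=1)       # discrete unit index per frame
    return ids, codebook[ids]        # indices and their quantized embeddings

# Toy example with made-up sizes: 512 units, 64-dim features, 100 frames.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 64))
frames = rng.normal(size=(100, 64))
ids, quantized = quantize(frames, codebook)
```

Under this view, a language whose articulations fall outside the span of the learned codebook gets mapped to the nearest (but wrong) units, which is one way to picture why content preservation degrades on unseen languages and why multilingual training, which populates the codebook with more units, should help.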
Hi, I wonder if this method can be used for any language without retraining? Thank you.