Adjusting for other languages #1

Open
AnnCod opened this issue May 7, 2024 · 5 comments

Comments

@AnnCod

AnnCod commented May 7, 2024

Hi,

Do you think this solution could be easily adapted to work with languages other than English?

@BakerBunker
Owner

BakerBunker commented May 7, 2024

Hi @AnnCod, I think it can work on non-English languages. We tested this solution on Chinese speech before and got good results, though the audio quality was not good. I suppose that is because:

  • The feature extractor: WavLM was trained on an English corpus, so non-English speech is out of distribution (OOD) for it
  • The vocoder: HiFiGAN was also trained on an English corpus, which causes the same OOD issue

If you want decent audio quality, you could try a model pretrained on a multilingual corpus, such as XLS-R, and then train a vocoder on your target language.

@AnnCod
Author

AnnCod commented May 8, 2024

Thanks for the reply. Is the demo working correctly? I get some errors when trying to run it on Colab.

@BakerBunker
Owner

Sorry, I accidentally misspelled a variable name. Fixed in 8060405.

@AnnCod
Author

AnnCod commented May 8, 2024

But there is still an error: "RuntimeError: The size of tensor a (23866) must match the size of tensor b (214) at non-singleton dimension 2"

@BakerBunker
Owner

I can't reproduce this error. Could you share your Colab notebook with this output?
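
For readers who hit the same message: it is PyTorch's broadcasting error, raised when two tensors are combined elementwise and their sizes at some dimension differ while neither size is 1. The cause in this notebook is not known from the thread. What follows is only a minimal sketch of the broadcasting rule in plain Python. `first_broadcast_conflict` is a hypothetical helper, not part of this repository or of PyTorch, and the 3-D shapes are assumed for illustration, since only the sizes at dimension 2 appear in the error.

```python
from itertools import zip_longest


def first_broadcast_conflict(shape_a, shape_b):
    """Return (dim, size_a, size_b) for the first conflict found when scanning
    from the trailing dimension, or None if the two shapes can broadcast."""
    ndim = max(len(shape_a), len(shape_b))
    # Align shapes at the last dimension; missing leading dimensions count as 1.
    pairs = zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1)
    for offset, (a, b) in enumerate(pairs):
        # Sizes are compatible only if they are equal or one of them is 1.
        if a != b and a != 1 and b != 1:
            return (ndim - 1 - offset, a, b)
    return None


# Hypothetical shapes: only the dimension-2 sizes come from the reported error.
print(first_broadcast_conflict((1, 1, 23866), (1, 1, 214)))  # (2, 23866, 214)
print(first_broadcast_conflict((1, 1, 214), (1, 80, 214)))   # None
```

Printing `.shape` for both operands just before the failing line in the notebook would show which two tensors carry the 23866 and 214 lengths.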
