Adjusting for other languages #1

AnnCod · 2024-05-07T12:49:55Z

Hi,

Do you think this solution can easily be adapted to work on languages other than English?

BakerBunker · 2024-05-07T14:20:03Z

Hi @AnnCod, I think it can work on non-English languages. We tested this solution on Chinese speech before and got a good result, though the audio quality was not good. I suspect this is because:

The feature extractor: WavLM was trained on an English corpus, so non-English speech is out of distribution (OOD) for it.
The vocoder: HiFiGAN was also trained on an English corpus, which causes the same OOD issue.

If you want decent audio quality, you could try a model pretrained on a multilingual corpus, such as XLS-R, as the feature extractor, and then train a vocoder on your target language.

AnnCod · 2024-05-08T06:57:28Z

Thanks for the reply. Is this demo working correctly? I have some errors while trying to run it on colab.

BakerBunker · 2024-05-08T07:22:37Z

Sorry, I accidentally misspelled a variable name; this is fixed in 8060405.

AnnCod · 2024-05-08T08:50:59Z

But there's still an error: "RuntimeError: The size of tensor a (23866) must match the size of tensor b (214) at non-singleton dimension 2"

BakerBunker · 2024-05-08T08:57:22Z

I can't reproduce this error. Would you consider sharing your Colab notebook along with its output?
