Char-Level Training: Great Results on Seen Data, Poor Generalization on Unseen Text #1011

etachi77 · 2025-05-01T18:59:11Z

Hi, I've trained an Arabic TTS model using your repo at the character level, based on 61 hours of high-quality audio data. The training ran up to 400,000 updates.

The model performs impressively on training sentences: pronunciation reaches around 85% accuracy at the character level. On new or unseen text, however, quality drops drastically, and the output becomes inaccurate and unintelligible.

What can I do to improve generalization on new Arabic text?

Is phoneme-level training a better choice for Arabic?

Would fine-tuning on diverse or augmented text help more than extending training?

Is there a better preprocessing approach (e.g., normalization, diacritics)?
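To make the preprocessing question concrete, here is a minimal sketch of the kind of normalization being asked about. It is an illustration only and not this repo's pipeline: `normalize_arabic` and `diacritic_ratio` are hypothetical helper names, and the choice of which marks count as diacritics (U+064B to U+0652 plus U+0670) is an assumption.

```python
import re
import unicodedata

# Assumption: treat the harakat range U+064B..U+0652 and the superscript
# alef U+0670 as "diacritics". Other marks may matter for a given corpus.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")
TATWEEL = "\u0640"  # elongation character, carries no pronunciation


def normalize_arabic(text: str, keep_diacritics: bool = True) -> str:
    """Hypothetical normalizer: NFC, drop tatweel, collapse whitespace."""
    text = unicodedata.normalize("NFC", text)
    text = text.replace(TATWEEL, "")
    if not keep_diacritics:
        text = DIACRITICS.sub("", text)
    return re.sub(r"\s+", " ", text).strip()


def diacritic_ratio(text: str) -> float:
    """Diacritic marks per Arabic letter (U+0621..U+064A, tatweel excluded)."""
    letters = sum(
        1 for ch in text if "\u0621" <= ch <= "\u064A" and ch != TATWEEL
    )
    marks = len(DIACRITICS.findall(text))
    return marks / letters if letters else 0.0
```

Running `diacritic_ratio` over the training transcripts and over the unseen test sentences would show whether the two sets are diacritized to the same degree. A mismatch there (for example, diacritized training text but undiacritized input at inference) is one possible cause of the gap described above, though not the only one.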
