Enabling Multilingual Support #87

ArshamBz · 2025-09-28T06:45:47Z

We’d like to explore how to make Piper models multilingual so that they can handle both English and Farsi seamlessly. This discussion aims to collect ideas, best practices, and potential technical approaches for adding multilingual capabilities.

Some open points to consider:

Architecture design: Should we train a single unified model for multiple languages, or separate models with a shared front-end?

Data requirements: What datasets are available for Farsi? How can we balance the amount of English and Farsi data?

Tokenization strategies: Is it better to use a shared tokenizer (e.g., SentencePiece/BPE covering both scripts) or language-specific ones?

Inference pipeline: Should the system auto-detect the language, or require explicit language selection from the user?

Evaluation metrics: How do we ensure quality across both languages (e.g., WER, BLEU, MOS for speech/text outputs)?

Deployment considerations: Performance implications of supporting multiple languages in a single model.

We invite contributions, suggestions, and references to prior work or experiments. If anyone has experience building multilingual systems (especially with low-resource languages like Farsi), your input would be extremely valuable.

jackusay · 2025-11-03T02:48:42Z

I'm not sure whether this is helpful.
Kokoro splits a sentence into per-language segments, then reduces everything to IPA in its G2P layer.
kokoro: text > G2P > vits-like > sound
G2P: misaki <--- espeak

G2P:

         ┌──────────────────────────────────────┐
         │              Misaki G2P              │
         │           (zh.ZHG2P class)           │
         └──────────────────────────────────────┘
                             │
                             ▼
         ┌──────────────────────────────────────┐
         │ Step 1: Chinese/English Segmentation │ ← Detect character type (Chinese / English / punctuation)
         └──────────────────────────────────────┘
                             │
            ┌────────────────┴──────────────┐
            ▼                               ▼
┌───────────────────────┐     ┌──────────────────────────┐
│   Chinese segments    │     │     English segments     │
│ (handled by pypinyin) │     │ (handled by en_callable) │
└───────────────────────┘     └──────────────────────────┘
            │                               │
            ▼                               ▼
      [zh phonemes]                   [en phonemes]
            │                               │
            └────────────────┬──────────────┘
                             ▼
┌────────────────────────────────────────────────────────┐
│ Step 3: Merge & Normalize                              │
│ (convert symbols → pauses, spaces, align phoneme sets) │
└────────────────────────────────────────────────────────┘
                             │
                             ▼
                 🗣️ Unified phoneme string
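
To make the flow in the diagram concrete, here is a minimal Python sketch of the same idea: split the text by script, send each chunk to a per-language G2P function, then merge the results into one string. This is not misaki's actual code. The function names (`segment_by_script`, `phonemize_mixed`) and the `g2p_by_lang` mapping are hypothetical, and the G2P callables are placeholders for whatever converter you actually use (for example a pypinyin-based one for Chinese and an espeak-based one for English). The Chinese range covers only the basic CJK Unified Ideographs block.

```python
import re
from typing import Callable, Dict, List, Tuple

# Assumption: only the basic CJK block and plain ASCII letters are recognised.
_SEGMENT_RE = re.compile(
    r"(?P<zh>[\u4e00-\u9fff]+)"                          # run of Chinese characters
    r"|(?P<en>[A-Za-z][A-Za-z' ]*[A-Za-z]|[A-Za-z])"     # English words, inner spaces allowed
    r"|(?P<punct>[^\u4e00-\u9fffA-Za-z]+)"               # everything else
)


def segment_by_script(text: str) -> List[Tuple[str, str]]:
    """Step 1: split text into (kind, chunk) pairs; kind is 'zh', 'en' or 'punct'."""
    return [(m.lastgroup, m.group()) for m in _SEGMENT_RE.finditer(text)]


def phonemize_mixed(text: str, g2p_by_lang: Dict[str, Callable[[str], str]]) -> str:
    """Phonemize each chunk with its own G2P, then merge into one string."""
    parts = []
    for kind, chunk in segment_by_script(text):
        if kind == "punct":
            # Merge step: punctuation and whitespace become a single space (pause).
            parts.append(" ")
        else:
            parts.append(g2p_by_lang[kind](chunk))
    # Normalise repeated spaces introduced by the merge.
    return re.sub(r"\s+", " ", "".join(parts)).strip()


if __name__ == "__main__":
    # Placeholder G2P functions that only tag their input; swap in real ones.
    stubs = {"zh": lambda s: f"<zh:{s}>", "en": lambda s: f"<en:{s}>"}
    print(phonemize_mixed("你好 hello world,再见", stubs))
```

For English plus Farsi, the same structure should carry over by replacing the Chinese range with the Arabic-script range and plugging in a Farsi G2P, though real text will need more careful handling of digits, diacritics and mixed punctuation.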

1PD-IS-NO-1 · 2026-02-27T05:34:46Z

Your info is correct. This approach will work for adding multilingual support to any phoneme-based TTS, such as StyleTTS2 or Piper.
At step 1, though, you need either a language classifier and chunker, or a regular expression.

For the regular-expression route, if you want language switching in StyleTTS2, Piper, or Kokoro, your text should be tagged like this:
text = (hindi)["fully hindi text here"] (chinese)["fully chinese text here"]
With that format, a regular expression can do the language detection and chunking during training. After that, the same phoneme workflow applies, which gives you effortless language switching in StyleTTS2 and Piper.
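
As a rough illustration of the regular-expression route, the tagged format above could be parsed with something like the following Python sketch. The function name `parse_tagged` and the exact pattern are my own assumptions, not part of StyleTTS2, Piper, or Kokoro. It also assumes the text inside the quotes never contains the sequence `"]`.

```python
import re
from typing import List, Tuple

# Matches (lang)["text"]; the language tag is assumed to be ASCII letters, '_' or '-'.
_TAGGED_RE = re.compile(r'\((?P<lang>[A-Za-z_-]+)\)\["(?P<text>.*?)"\]', re.DOTALL)


def parse_tagged(text: str) -> List[Tuple[str, str]]:
    """Return (language, chunk) pairs in the order they appear in the input."""
    return [(m.group("lang").lower(), m.group("text")) for m in _TAGGED_RE.finditer(text)]


if __name__ == "__main__":
    sample = '(hindi)["fully hindi text here"] (chinese)["fully chinese text here"]'
    for lang, chunk in parse_tagged(sample):
        # Each chunk would then go to the phonemizer for its language.
        print(lang, "->", chunk)
```

Anything outside a `(lang)["..."]` span is ignored by this pattern, so untagged text would need a default language or a validation step.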
