Question about supported language / dialect mapping for prompt-based ASR

Hi, thanks for sharing this ASR project. I’m interested in the prompt-based approach for controlling transcription behavior.

I’ve been reading through the code and doing some experiments, and I noticed that the current implementation seems to explicitly support the following prompt variants:

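- 语音转写 (auto; "transcribe speech")
- 语音转写成中文 ("transcribe speech into Chinese")
- 语音转写成英文 ("transcribe speech into English")
- 语音转写成日文 ("transcribe speech into Japanese")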

From the inference logic, it looks like language control is handled via a fixed mapping of language codes (e.g., zh / en / ja), each of which is then converted into a Chinese prompt such as “语音转写成中文 / 英文 / 日文”.
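
To make my reading concrete, here is a minimal sketch of what I think the logic amounts to. The names `LANGUAGE_NAMES` and `build_prompt` are my own and hypothetical, not taken from the repository, and raising an error on unknown codes is my assumption rather than observed behavior:

```python
from typing import Optional

# Hypothetical reconstruction of the fixed mapping; not the repository's code.
LANGUAGE_NAMES = {
    "zh": "中文",  # Chinese
    "en": "英文",  # English
    "ja": "日文",  # Japanese
}

AUTO_PROMPT = "语音转写"  # no target language given


def build_prompt(language: Optional[str] = None) -> str:
    """Return the Chinese instruction prompt for a language code.

    `None` or "auto" yields the language-agnostic prompt. Unknown codes
    raise ValueError (an assumption; the real code may fall back to auto).
    """
    if language is None or language == "auto":
        return AUTO_PROMPT
    if language not in LANGUAGE_NAMES:
        raise ValueError(f"unsupported language code: {language!r}")
    return f"{AUTO_PROMPT}成{LANGUAGE_NAMES[language]}"
```

If this is roughly right, then a language outside the table never reaches the model through the standard path, which is what prompted the questions below.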

I have two clarifying questions to make sure I understand this correctly:

1. Is there a recommended or official language mapping table for prompts? For example, if I write prompts like

- “语音转写成印度尼西亚文”
- “语音转写成印尼文”

(both mean "transcribe speech into Indonesian", using the full and the abbreviated name of the language), are these expected to work, or is the model only guaranteed to behave correctly for the explicitly supported languages?

Note: I checked the [Fun-ASR technical report](https://arxiv.org/pdf/2509.12508), but I couldn’t find an explicit definition of a prompt-level language mapping or a list of recommended language keywords. The current language handling seems to be an implementation choice rather than something specified in the paper.


2. How should dialects or non-standard language variants be handled in prompts?
As an example, for Min Nan (閩南語 / Minnan / Hokkien), which of the following would you recommend:

- prompting with “语音转写成閩南語” ("transcribe speech into Min Nan"),
- prompting with “语音转写成中文” ("transcribe speech into Chinese"), or
- leaving the language as auto?

My assumption is that dialect handling may depend more on the underlying acoustic/language model than on the prompt itself, but I wanted to confirm whether there is a preferred or recommended way to express dialects in prompts.
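
In the meantime, I plan to compare the three options empirically. Below is a rough sketch of that comparison. `compare_prompts` and the `transcribe` callable are hypothetical placeholders of mine; `transcribe` stands in for whatever function actually runs the model on an audio file with a given prompt:

```python
from typing import Callable, Dict, Sequence

# The three candidate prompts from the list above.
DIALECT_PROMPT_CANDIDATES = (
    "语音转写成閩南語",  # explicit dialect name
    "语音转写成中文",    # generic Chinese
    "语音转写",          # auto
)


def compare_prompts(
    transcribe: Callable[[str, str], str],
    audio_path: str,
    prompts: Sequence[str] = DIALECT_PROMPT_CANDIDATES,
) -> Dict[str, str]:
    """Run the same audio through each prompt and collect the transcripts.

    `transcribe(audio_path, prompt)` is supplied by the caller; this helper
    makes no assumption about how the model is invoked.
    """
    return {prompt: transcribe(audio_path, prompt) for prompt in prompts}
```

Comparing the outputs side by side on a few Min Nan recordings should at least show whether the prompt wording changes the result at all.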

Apologies if this is already documented somewhere — I may have missed it. Any guidance or pointers would be greatly appreciated.

Thanks again for your excellent work!
