Hi, thanks for sharing this ASR project. I’m interested in the prompt-based approach for controlling transcription behavior.
I’ve been reading through the code and doing some experiments, and I noticed that the current implementation seems to explicitly support the following prompt variants:
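- 语音转写 (auto; "transcribe speech")
- 语音转写成中文 ("transcribe speech into Chinese")
- 语音转写成英文 ("transcribe speech into English")
- 语音转写成日文 ("transcribe speech into Japanese")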
From the inference logic, it looks like the language control is handled via a fixed mapping (e.g., zh / en / ja) that is then converted into a Chinese prompt such as “语音转写成中文 / 英文 / 日文”.
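To make my reading concrete, here is a minimal sketch of the mapping as I understand it. The names `LANGUAGE_NAMES` and `build_prompt` are my own placeholders, not identifiers from the repository, and raising an error on an unknown code is an assumption on my part rather than observed behavior:

```python
# Hypothetical reconstruction of the language-to-prompt mapping described above.
# All names here are placeholders, not taken from the project's code.
from typing import Optional

# Language codes I believe are explicitly handled, mapped to the
# Chinese language name that is appended to the base prompt.
LANGUAGE_NAMES = {
    "zh": "中文",
    "en": "英文",
    "ja": "日文",
}

BASE_PROMPT = "语音转写"  # used as-is for automatic language detection


def build_prompt(language: Optional[str] = None) -> str:
    """Return the transcription prompt for a language code, or the auto prompt."""
    if language is None or language == "auto":
        return BASE_PROMPT
    if language not in LANGUAGE_NAMES:
        # Assumption: codes outside the fixed table are rejected rather than guessed.
        raise ValueError(f"unsupported language code: {language!r}")
    return f"{BASE_PROMPT}成{LANGUAGE_NAMES[language]}"


if __name__ == "__main__":
    for code in (None, "zh", "en", "ja"):
        print(code, "->", build_prompt(code))
```

If the real logic differs from this (for example, if arbitrary language names are passed through into the prompt), that would already answer part of my question below.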
I wanted to ask a clarification question to make sure I’m understanding this correctly:
- Is there a recommended or official language mapping table for prompts? For example, if I write prompts like
  - “语音转写成印度尼西亚文”
  - “语音转写成印尼文”

  (the full and abbreviated Chinese names for Indonesian), are these expected to work, or is the model only guaranteed to behave correctly for the explicitly supported languages?
Note: I checked the Fun-ASR technical report, but I couldn’t find an explicit definition of a prompt-level language mapping or a list of recommended language keywords. The current language handling appears to be an implementation choice rather than something specified in the paper.
- How should dialects or non-standard language variants be handled in prompts? As an example, for Min Nan (閩南語 / Minnan / Hokkien), would you recommend:
  - prompting with “语音转写成閩南語”,
  - prompting with “语音转写成中文”, or
  - leaving the language as auto?
My assumption is that dialect handling may depend more on the underlying acoustic/language model than on the prompt itself, but I wanted to confirm whether there is a preferred or recommended way to express dialects in prompts.
Apologies if this is already documented somewhere; I may have missed it. Any guidance or pointers would be greatly appreciated.
Thanks again for your excellent work!