
feat(asr): add Qwen3-ASR / SiliconFlow / GLM-ASR / Groq as optional providers #213

Merged
appergb merged 5 commits into main from feat/asr-providers-presets on May 3, 2026

Conversation

@appergb
Collaborator

@appergb appergb commented May 3, 2026

User description

Closes #212

Summary

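  • The ASR provider dropdown on the Settings page gains 4 entries: Tongyi Qianwen Qwen3-ASR (DashScope compatible mode), SiliconFlow SenseVoice, Zhipu GLM-ASR, and Groq Whisper-large-v3.
  • These vendors all expose an OpenAI-compatible /audio/transcriptions endpoint, so the existing WhisperBatchASR is reused and no new Rust client is added.
  • Switching presets auto-fills asr.endpoint and asr.model, matching the interaction the LLM presets already use.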

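Changes

Backend (surgical)

  • coordinator.rs: extract is_whisper_compatible_provider(id) and replace the two if active_asr == "whisper" checks with this helper; adding a vendor later only requires adding its id in one place.
  • The missing-credential error message changes from "请先在设置中填写 Whisper ASR API Key" ("Please fill in the Whisper ASR API Key in Settings first") to the more generic "请先在设置中填写 ASR 服务商 API Key" ("Please fill in the ASR provider API Key in Settings first").
  • The QA path is untouched: qa_hotkey forces Volcengine streaming (the comment at coordinator.rs:1760 requires low latency).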
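
As a rough illustration of the backend change, the sketch below routes a dictation session by provider id. The helper body mirrors the version quoted later in this conversation (which still lists "qwen"); the `DictationBackend` enum and `dictation_backend` function are hypothetical names introduced only for this example, and treating every other id as Volcengine is an assumption.

```rust
/// Which recognizer a dictation session would use. Both variant names are
/// illustrative; the PR itself only names `WhisperBatchASR` and Volcengine
/// streaming.
#[derive(Debug, PartialEq, Eq)]
enum DictationBackend {
    WhisperBatch,
    VolcengineStreaming,
}

/// Mirrors the helper as first proposed in this PR (it still lists "qwen").
fn is_whisper_compatible_provider(id: &str) -> bool {
    matches!(id, "whisper" | "qwen" | "siliconflow" | "zhipu" | "groq")
}

/// Hypothetical router: every OpenAI-compatible id shares the batch client.
/// Assumption: any other id stays on the Volcengine streaming path.
fn dictation_backend(active_asr: &str) -> DictationBackend {
    if is_whisper_compatible_provider(active_asr) {
        DictationBackend::WhisperBatch
    } else {
        DictationBackend::VolcengineStreaming
    }
}
```

With this shape, supporting another OpenAI-compatible vendor is a one-line change to the `matches!` list, which is the property the description relies on.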

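Frontend

  • Settings.tsx::ASR_PRESETS grows by 4 entries; each carries default baseUrl and model values.
  • onAsrProviderChange calls setCredential('asr.endpoint', ...) and setCredential('asr.model', ...) when switching to a non-volcengine preset.
  • The hard-coded defaultValue="https://api.openai.com/v1" / placeholder="whisper-1" are replaced with values rendered from the current preset.
  • i18n (zh-CN.ts / en.ts) adds three keys, asrQwen / asrZhipu / asrGroq; asrSiliconflow reuses the old key left over from issue #58 (fix: the Settings page SiliconFlow ASR option has no backend implementation and always fails).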

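Out of scope (V2 candidates, to be filed separately)

Verification

  • cargo check --manifest-path src-tauri/Cargo.toml passes (all warnings are pre-existing)
  • npm run build (tsc + vite) passes
  • The CLAUDE.md silent-fallback contract is preserved: missing credentials take the read_whisper_credentials defaults and do not raise a hard error

User-side manual testing

I do not have API keys for the four vendors above; please verify manually as needed after merging or on a local checkout:

  • Switch to SiliconFlow (bring your own key) → record a passage in Chinese → capsule → polish → insertion all succeed
  • Switch to Qwen3-ASR-Flash → same as above (note: during the issue research I did not curl-test whether DashScope compatible mode actually supports /audio/transcriptions; if it does not work, Qwen is deferred to a separate V2 adaptation and the other 3 vendors are unaffected)
  • Switch to GLM-ASR / Groq → same as above
  • With empty credentials, press start → the message "请先在设置中填写 ASR 服务商 API Key" appears and the app does not crash

Test plan

  • Start npm run tauri dev, open the Settings page, switch through the four new presets, and confirm that base URL / model are auto-filled
  • Switch back to the volcengine preset and confirm that the three fields App ID / Access Token / Resource ID are shown (the old path is unaffected)
  • For real ASR calls, see "User-side manual testing" above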

PR Type

Enhancement, Bug fix


Description

  • Add SiliconFlow, Zhipu, Groq ASR presets

  • Auto-fill endpoint/model on provider switch

  • Fix QA hotkey using wrong credentials check

  • Update i18n for new ASR providers


Diagram Walkthrough

flowchart LR
    A["New ASR presets (SiliconFlow, Zhipu, Groq)"] --> B["is_whisper_compatible_provider()"]
    B --> C["WhisperBatchASR"]
    A --> D["Settings.tsx auto-fills endpoint/model"]
    E["QA hotkey path"] --> F["ensure_qa_volcengine_credentials()"]
    F --> G["Volcengine stream ASR"]

File Walkthrough

Relevant files
Enhancement
coordinator.rs
Extend ASR routing for multiple providers and fix QA check

openless-all/app/src-tauri/src/coordinator.rs

  • Add is_whisper_compatible_provider() to route OpenAI-compatible ASR
    providers
  • Replace active_asr == "whisper" checks with the new helper
  • Add ensure_qa_volcengine_credentials() to fix QA hotkey using wrong
    credentials
  • Update missing API key error message to generic ASR wording
+31/-4   
Settings.tsx
Add ASR preset selection with pre-filled configuration     

openless-all/app/src/pages/Settings.tsx

  • Extend ASR_PRESETS with SiliconFlow, Zhipu, Groq entries including
    baseUrl and model
  • Implement auto-fill of asr.endpoint and asr.model on provider switch
  • Update input placeholders and defaults based on current ASR preset
  • Re-enable SiliconFlow option in UI after backend support
+24/-6   
Documentation
en.ts
Add i18n for new ASR presets                                                         

openless-all/app/src/i18n/en.ts

  • Add English translations for Zhipu GLM-ASR and Groq Whisper-large-v3
    presets
+2/-0     
zh-CN.ts
Add i18n for new ASR presets                                                         

openless-all/app/src/i18n/zh-CN.ts

  • Add Chinese translations for Zhipu GLM-ASR and Groq Whisper-large-v3
    presets
+2/-0     

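Reuse the existing OpenAI-compatible `/audio/transcriptions` channel (WhisperBatchASR);
no new Rust client is needed. Change the whisper branch condition in `coordinator.rs`
to `is_whisper_compatible_provider(id)`, so that adding a vendor is a one-place change.

The Settings page ASR dropdown gains 4 presets; switching auto-fills baseUrl and model,
so users do not hit the inevitable failure of forgetting to fill in the model ID.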

Closes #212

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 94d6a7199c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

  fn ensure_asr_credentials() -> Result<(), String> {
      let active_asr = CredentialsVault::get_active_asr();
-     if active_asr == "whisper" {
+     if is_whisper_compatible_provider(&active_asr) {

P1: Validate Volcengine creds for QA regardless of the active ASR preset

ensure_asr_credentials() now treats qwen/siliconflow/zhipu/groq as Whisper-compatible and only checks asr.api_key, but begin_qa_session() still hardcodes Volcengine streaming ASR. This means QA can fail in two user-facing ways after this change: selecting a new preset without setting asr.api_key blocks QA even when Volcengine credentials are valid, and having asr.api_key set can let QA proceed until open_session() fails with Volcengine credential errors. The credential gate for QA should validate Volcengine fields directly instead of branching on active ASR provider.

Collaborator Author


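Fixed in commit 0a4d33c

Following the feedback, the QA credential check is decoupled from ensure_asr_credentials: a new ensure_qa_volcengine_credentials() validates the Volcengine fields directly. The dictation path still branches on active_asr and is unaffected. Both regression paths are now closed:

  • active_asr=qwen with asr.api_key empty and Volcengine credentials complete → QA now passes
  • active_asr=qwen with asr.api_key filled and Volcengine credentials empty → QA now gets the correct error immediately, instead of failing only after reaching open_session()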

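Codex P1 (PR #213): after `ensure_asr_credentials` was branched by provider for the
main dictation path, the QA path still reused the same function, causing two
user-visible bugs:

1. The user selected qwen/siliconflow/zhipu/groq but did not fill in `asr.api_key` → QA
   reports "请先填写 ASR 服务商 API Key" even though the Volcengine credentials are complete.
2. Conversely, with `asr.api_key` filled and Volcengine credentials empty → QA passes the
   ensure check, enters open_session, and then fails with a Volcengine credential error,
   which is harder to diagnose.

The fix gives QA a dedicated `ensure_qa_volcengine_credentials()` that looks only at the
Volcengine fields; the dictation path keeps using `ensure_asr_credentials` unchanged.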
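
The two bugs and the fix can be reproduced in a small model. The sketch below is not the project's code: the `Credentials` struct, its field names, the English error strings, and the assumption that all three Volcengine fields are required are hypothetical stand-ins for whatever `CredentialsVault` actually stores.

```rust
/// Hypothetical snapshot of stored credentials. The real code reads these
/// through `CredentialsVault`; the field names here are assumptions.
#[derive(Default)]
struct Credentials {
    active_asr: String,
    asr_api_key: String,
    volc_app_id: String,
    volc_access_token: String,
    volc_resource_id: String,
}

fn is_whisper_compatible_provider(id: &str) -> bool {
    matches!(id, "whisper" | "qwen" | "siliconflow" | "zhipu" | "groq")
}

/// Assumption: all three Volcengine fields must be non-blank.
fn volcengine_fields_present(c: &Credentials) -> bool {
    [&c.volc_app_id, &c.volc_access_token, &c.volc_resource_id]
        .iter()
        .all(|v| !v.trim().is_empty())
}

/// Dictation gate: branches on the active provider.
fn ensure_asr_credentials(c: &Credentials) -> Result<(), String> {
    if is_whisper_compatible_provider(&c.active_asr) {
        if c.asr_api_key.trim().is_empty() {
            return Err("missing ASR provider API key".to_string());
        }
        return Ok(());
    }
    if volcengine_fields_present(c) {
        Ok(())
    } else {
        Err("missing Volcengine credentials".to_string())
    }
}

/// QA gate: ignores `active_asr`, because QA always uses Volcengine streaming.
fn ensure_qa_volcengine_credentials(c: &Credentials) -> Result<(), String> {
    if volcengine_fields_present(c) {
        Ok(())
    } else {
        Err("missing Volcengine credentials".to_string())
    }
}
```

With `active_asr` set to `"qwen"`, the dictation gate and the QA gate can now disagree, which is the intended outcome: each path checks only the credentials it will actually use.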

github-actions Bot commented May 3, 2026

PR Reviewer Guide 🔍

(Review updated until commit d048107)

Here are some key observations to aid the review process:

🎫 Ticket compliance analysis 🔶

212 - Partially compliant

Compliant requirements:

  • Added SiliconFlow, Zhipu GLM-ASR, and Groq presets.
  • Kept the Whisper preset.
  • Auto-fills asr.endpoint and asr.model when switching presets.
  • Added new i18n entries for the new ASR providers.
  • Reuses WhisperBatchASR for the OpenAI-compatible providers instead of adding a new Rust client.
  • Keeps the QA/Volcengine streaming path separate and unchanged.

Non-compliant requirements:

  • Qwen3-ASR preset is not added to the settings list.
  • Missing ASR API key still returns a hard error instead of a silent fallback.
  • No test or verification evidence is included for the required provider-switching scenarios.

Requires further human verification:

  • The required end-to-end provider checks (SiliconFlow, Qwen, GLM, credential-fallback behavior, and Volcengine/Whisper switching) cannot be confirmed from code alone.

58 - PR Code Verified

Compliant requirements:

  • SiliconFlow now routes through the OpenAI-compatible batch ASR path.
  • The SiliconFlow preset is present in the settings UI.

Requires further human verification:

  • None
⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Missing Qwen

The preset list still does not include a qwen option, so users cannot select Qwen3-ASR from the settings page even though that is part of the requested feature set.

const ASR_PRESETS = [
  { id: 'volcengine',   nameKey: 'asrVolcengine',   baseUrl: '',                                              model: ''                              },
  { id: 'siliconflow',  nameKey: 'asrSiliconflow',  baseUrl: 'https://api.siliconflow.cn/v1',                  model: 'FunAudioLLM/SenseVoiceSmall' },
  { id: 'zhipu',        nameKey: 'asrZhipu',        baseUrl: 'https://open.bigmodel.cn/api/paas/v4',           model: 'glm-asr-2512'                },
  { id: 'groq',         nameKey: 'asrGroq',         baseUrl: 'https://api.groq.com/openai/v1',                 model: 'whisper-large-v3-turbo'      },
  { id: 'whisper',      nameKey: 'asrWhisper',      baseUrl: 'https://api.openai.com/v1',                      model: 'whisper-1'                   },
Hard Failure

Selecting a Whisper-compatible provider without an API key still returns an error and aborts session start. The ticket requires missing credentials to fall back silently, so first-time users who have not entered a key cannot start dictation at all.

fn ensure_asr_credentials() -> Result<(), String> {
    let active_asr = CredentialsVault::get_active_asr();
    if is_whisper_compatible_provider(&active_asr) {
        let api_key = CredentialsVault::get(CredentialAccount::AsrApiKey)
            .ok()
            .flatten()
            .unwrap_or_default();
        if api_key.trim().is_empty() {
            return Err("请先在设置中填写 ASR 服务商 API Key".to_string());
        }
        return Ok(());

Collaborator Author

appergb commented May 3, 2026

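Thanks for the reviewer guide. Responding to the focus areas one by one:

Fallback Gap ("non-compliant: missing creds should fall back silently"): this one is advisory, not a regression introduced by this PR. On main, ensure_asr_credentials already returned Err("请先在设置中填写 Whisper ASR API Key") for the whisper branch, and the Volcengine branch hard-errors in the same way. This change only extends the same semantics to the 4 new presets; it does not change the "hard error vs. silent" policy.

The "silent fallback" described in CLAUDE.md refers to downstream behavior when credentials are missing at runtime (missing Ark → insert the raw text; missing Volc → mock placeholder), not to pre-validation at the UI layer. Whether to make ensure_asr_credentials pass silently as a whole and let missing credentials fall through to the runtime mock is a separate UX trade-off: the current hard error exists to tell the user explicitly to fill in credentials, whereas with a silent pass the user would record a passage, see no text inserted, and not know why. That is beyond the scope of this PR; I suggest opening a separate issue to discuss it.

I will also change the item "do not raise a hard error when credentials are missing" in the issue #212 acceptance checklist to "keep the existing hard-error semantics", to avoid misleading anyone.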


github-actions Bot commented May 3, 2026

Persistent review updated to latest commit 0a4d33c

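PR-Agent #213 reviewer guide: if the user manually clears asr.endpoint after switching
to a new preset, read_whisper_credentials returns an empty string, WhisperBatchASR
builds the relative path `/audio/transcriptions`, and the failure surfaces as a reqwest
error only after the user has finished recording. The same applies to model (the
fallback "whisper-1" is an invalid model name for SiliconFlow and similar vendors, so
the 400 arrives only at runtime).

Add pre-validation to the compatible branch of ensure_asr_credentials so the error
moves forward to the moment the hotkey is pressed, reusing the same hard-error
semantics as api_key.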
Collaborator Author

appergb commented May 3, 2026

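Fixed the endpoint/model validation gap raised in the reviewer guide, commit 3416919

The compatible branch of ensure_asr_credentials now validates asr.api_key, asr.endpoint, and asr.model in order; whichever is missing triggers an error capsule naming the specific field as soon as the hotkey is pressed, which avoids:

  • empty endpoint → reqwest builds the relative path /audio/transcriptions and fails at runtime
  • empty model → falls back to whisper-1, which is then sent to SiliconFlow / GLM and similar vendors → 400 model not found

The fallback defaults in read_whisper_credentials are left untouched (KISS: the fallback still exists; pre-validation only moves "you find out after recording" forward to "you find out on key press").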
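
A minimal sketch of the ordered pre-validation described above, under stated assumptions: `WhisperSettings`, `first_missing_field`, `prevalidate`, and the English error text are hypothetical names for illustration, and `transcription_url` is a naive string join used only to show why an empty endpoint degenerates into a relative path (the actual joining logic in WhisperBatchASR is not shown in this conversation).

```rust
/// Hypothetical view of the three Whisper-compatible settings.
struct WhisperSettings<'a> {
    api_key: &'a str,
    endpoint: &'a str,
    model: &'a str,
}

/// Returns the name of the first blank field, checking in the order
/// api_key, endpoint, model.
fn first_missing_field(s: &WhisperSettings) -> Option<&'static str> {
    let fields = [
        ("asr.api_key", s.api_key),
        ("asr.endpoint", s.endpoint),
        ("asr.model", s.model),
    ];
    fields
        .iter()
        .find(|(_, value)| value.trim().is_empty())
        .map(|(name, _)| *name)
}

/// Fails fast with an error that names the offending field.
fn prevalidate(s: &WhisperSettings) -> Result<(), String> {
    match first_missing_field(s) {
        Some(name) => Err(format!("missing {}", name)),
        None => Ok(()),
    }
}

/// Naive join (an assumption), showing the relative-path failure mode.
fn transcription_url(endpoint: &str) -> String {
    format!("{}/audio/transcriptions", endpoint.trim_end_matches('/'))
}
```

Calling `transcription_url("")` yields just `/audio/transcriptions`, which has no scheme or host; that is the request the pre-validation is meant to stop before any audio is recorded.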


github-actions Bot commented May 3, 2026

Persistent review updated to latest commit 3416919

@H-Chris233
Collaborator

This still needs to be addressed @appergb

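After commit 3416919, PR-Agent reversed its position and objected to the
pre-validation ("breaks silent-fallback contract, removes the previous backend
defaulting behavior"). Maintainer @H-Chris233 likewise asked for it to be addressed.

Roll back to the 0a4d33c state: the compatible branch validates only api_key
(consistent with the existing whisper branch on main), and a missing endpoint/model
goes through the existing read_whisper_credentials fallback (empty-string base_url,
model falling back to "whisper-1"), so the runtime failure signal comes out through
the coordinator's existing error-capsule path.

The api_key check stays because it was not introduced by this PR; it already exists
on main and is outside V1 scope.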
Collaborator Author

appergb commented May 3, 2026

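@H-Chris233 the endpoint/model pre-validation has been rolled back, commit 9809b19

Rationale:

  • At 0a4d33c, PR-Agent called "endpoint/model not validated" non-compliant; after 3416919 added the validation, it reversed itself and said this "breaks the silent-fallback contract". The two rounds contradict each other.
  • Actual repository behavior: a grep across the whole repository finds no ASR mock placeholder implementation. The "Missing Volcengine creds → mock pipeline copies a placeholder" described in CLAUDE.md is a stale comment as far as the code goes; the fallback in volcengine.rs only keeps the partial transcript alive.
  • Since there is no real silent fallback to take, the most surgical handling is to drop the endpoint/model validation this PR introduced (which PR-Agent's latest opinion opposes) and keep the existing api_key validation (already on main, not within this PR's scope).

Practical effect:

  • asr.api_key missing → error capsule on key press (same semantics as the whisper branch on main)
  • asr.endpoint / asr.model missing → after recording, the anyhow::bail in WhisperBatchASR.transcribe goes through the coordinator's existing error-capsule path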
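
For readers unfamiliar with the fallback being kept, the following is a sketch of the defaulting behavior as this thread describes it (empty-string base URL, model falling back to "whisper-1"). The signature and the `WhisperCredentials` struct are hypothetical, since the real `read_whisper_credentials` is not shown here, and treating whitespace-only values as missing is an assumption.

```rust
/// Hypothetical result type; the real `read_whisper_credentials` signature
/// is not shown in this conversation.
#[derive(Debug, PartialEq, Eq)]
struct WhisperCredentials {
    api_key: String,
    base_url: String,
    model: String,
}

/// Treats `None` and whitespace-only values alike (an assumption).
fn non_blank(value: Option<&str>) -> Option<String> {
    value
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(String::from)
}

/// Fallback as described in the rollback note: a missing endpoint becomes an
/// empty string and a missing model becomes "whisper-1".
fn read_whisper_credentials(
    api_key: Option<&str>,
    endpoint: Option<&str>,
    model: Option<&str>,
) -> WhisperCredentials {
    WhisperCredentials {
        api_key: non_blank(api_key).unwrap_or_default(),
        base_url: non_blank(endpoint).unwrap_or_default(),
        model: non_blank(model).unwrap_or_else(|| "whisper-1".to_string()),
    }
}
```

Under this model a cleared model field on a SiliconFlow preset silently becomes `whisper-1`, so the request is still sent and the vendor's rejection is what the user eventually sees.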

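If the project wants to actually implement the CLAUDE.md silent-fallback contract (including returning a placeholder transcript when credentials are missing), that is a semantic change spanning multiple files; I suggest opening a separate follow-up issue to discuss its scope.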


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9809b19eab


/// OpenAI-compatible `/audio/transcriptions`; all of these go through `WhisperBatchASR`.
/// To add an OpenAI-compatible ASR provider, add one entry here.
fn is_whisper_compatible_provider(id: &str) -> bool {
    matches!(id, "whisper" | "qwen" | "siliconflow" | "zhipu" | "groq")

P1: Route Qwen ASR through a Qwen-specific client path

Including qwen in is_whisper_compatible_provider sends Qwen requests through WhisperBatchASR, but that client always uploads a WAV file to .../audio/transcriptions (asr/whisper.rs). DashScope’s Qwen3-ASR compatible mode uses a /chat/completions contract instead of Whisper-style multipart transcription, so selecting the new Qwen preset will consistently fail at transcription time even with valid credentials.


github-actions Bot commented May 3, 2026

Persistent review updated to latest commit 9809b19

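A reviewer pointed out that DashScope's Qwen3-ASR-Flash does not use the OpenAI Whisper
multipart `/audio/transcriptions` endpoint but the MultiModalConversation protocol
(messages=[{content:[{audio:...}]}]). Checking two independent public implementations
(xinnan-tech/xiaozhi-esp32-server, jianchang512/pyvideotrans) confirms this: forcing
qwen into WhisperBatchASR is bound to fail at transcription.

Following the contingency in the PR description ("risk pending verification → defer to
V2 if it does not work"):
- Remove the qwen entry from ASR_PRESETS
- Remove "qwen" from is_whisper_compatible_provider
- Delete the asrQwen i18n strings
- State in a comment why Qwen is left for V2

The remaining four, SiliconFlow / GLM-ASR / Groq / OpenAI Whisper, all have public
OpenAI-compatible /audio/transcriptions endpoints and keep using the original channel.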
@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

Collaborator Author

appergb commented May 3, 2026

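Confirmed that the reviewer is right; Qwen has been removed from V1, commit d048107

Verification (decisions must be backed by evidence, not assumptions):

  • xinnan-tech/xiaozhi-esp32-server qwen3_asr_flash.py: dashscope.MultiModalConversation.call(model="qwen3-asr-flash", messages=[{content:[{audio:path}]}])
  • jianchang512/pyvideotrans _qwen3asr.py: also goes through dashscope.MultiModalConversation
  • DashScope's compatible-mode/v1 is mainly chat/text compatibility; ASR goes through a separate multimodal channel

WhisperBatchASR uploads a multipart WAV to /audio/transcriptions, which has a completely different shape from Qwen's messages protocol. Forcing reuse is bound to produce 400 / 404.

Changes

  • Remove "qwen" from is_whisper_compatible_provider
  • Remove the qwen entry from ASR_PRESETS
  • Remove the asrQwen i18n strings
  • A code comment names Qwen as left for V2 (it needs a dashscope multimodal conversation client)

V1 remainder: SiliconFlow / GLM-ASR / Groq / OpenAI Whisper. All four have public OpenAI-compatible /audio/transcriptions endpoints and keep using the original channel.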
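
One way to picture the resulting split is to classify provider ids by wire contract. This is only a sketch under assumptions: the `AsrProtocol` enum and `protocol_for` function are hypothetical, and the helper body reflects the list of changes above (with "qwen" removed) rather than the merged source.

```rust
/// Wire contracts mentioned in this thread. Variant names are illustrative.
#[derive(Debug, PartialEq, Eq)]
enum AsrProtocol {
    /// Multipart WAV upload to `<base>/audio/transcriptions` (WhisperBatchASR).
    OpenAiTranscriptions,
    /// DashScope MultiModalConversation-style `messages` payload (left for V2).
    DashScopeMultiModal,
    /// Volcengine streaming ASR.
    VolcengineStreaming,
}

/// After d048107 the helper no longer lists "qwen".
fn is_whisper_compatible_provider(id: &str) -> bool {
    matches!(id, "whisper" | "siliconflow" | "zhipu" | "groq")
}

/// Hypothetical classifier for a future V2 router. Returns `None` for ids
/// this sketch does not know about.
fn protocol_for(id: &str) -> Option<AsrProtocol> {
    if is_whisper_compatible_provider(id) {
        return Some(AsrProtocol::OpenAiTranscriptions);
    }
    match id {
        "qwen" => Some(AsrProtocol::DashScopeMultiModal),
        "volcengine" => Some(AsrProtocol::VolcengineStreaming),
        _ => None,
    }
}
```

Keeping Qwen out of the Whisper-compatible list means a V2 client can be added as a separate protocol variant without touching the providers that already work.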

cargo check + npm run build both pass. Qwen will get a separate V2 follow-up issue.


github-actions Bot commented May 3, 2026

Persistent review updated to latest commit d048107

@appergb appergb merged commit 98488cd into main May 3, 2026
2 checks passed
@appergb appergb deleted the feat/asr-providers-presets branch May 6, 2026 11:51

Development

Successfully merging this pull request may close these issues.

feat(asr): add Tongyi Qianwen / SiliconFlow / Zhipu GLM and common overseas ASR providers as options

2 participants