feat(asr): add Qwen3-ASR / SiliconFlow / GLM-ASR / Groq as optional providers #213
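Reuses the existing OpenAI-compatible `/audio/transcriptions` channel (WhisperBatchASR), so no new Rust client is needed. The whisper branch condition in `coordinator.rs` becomes `is_whisper_compatible_provider(id)`, so adding a vendor later means touching a single place. The ASR dropdown on the settings page gains 4 presets; switching presets auto-fills baseUrl and model, so users cannot forget the model ID and run into a guaranteed failure. Closes #212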
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 94d6a7199c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
fn ensure_asr_credentials() -> Result<(), String> {
    let active_asr = CredentialsVault::get_active_asr();
-   if active_asr == "whisper" {
+   if is_whisper_compatible_provider(&active_asr) {
Validate Volcengine creds for QA regardless of active ASR preset
ensure_asr_credentials() now treats qwen/siliconflow/zhipu/groq as Whisper-compatible and only checks asr.api_key, but begin_qa_session() still hardcodes Volcengine streaming ASR. This means QA can fail in two user-facing ways after this change: selecting a new preset without setting asr.api_key blocks QA even when Volcengine credentials are valid, and having asr.api_key set can let QA proceed until open_session() fails with Volcengine credential errors. The credential gate for QA should validate Volcengine fields directly instead of branching on active ASR provider.
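Fixed in commit 0a4d33c.
Per the feedback, QA credential validation is now decoupled from ensure_asr_credentials: a new ensure_qa_volcengine_credentials() validates the Volcengine fields directly. The dictation path still branches on active_asr and is unaffected. Both regression paths are closed:
- active_asr=qwen, asr.api_key empty, Volcengine credentials complete → QA now passes
- active_asr=qwen, asr.api_key set, Volcengine credentials empty → QA now returns the correct error immediately instead of crashing only once it reaches open_session()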
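Codex P1 (PR #213): after `ensure_asr_credentials` was branched for the main dictation path, the QA path still reused the same function, which caused two user-visible bugs:
1. The user selects qwen/siliconflow/zhipu/groq but has not set `asr.api_key` → QA reports "请先填写 ASR 服务商 API Key" ("Please fill in the ASR provider API Key first"), even though the Volcengine credentials are complete.
2. Conversely, `asr.api_key` is set and the Volcengine credentials are empty → QA passes the ensure check, enters open_session, and only then fails with a Volcengine credential error, which is harder to diagnose.
The fix adds a dedicated `ensure_qa_volcengine_credentials()` for QA that looks only at the Volcengine fields; the dictation path keeps using `ensure_asr_credentials` unchanged.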
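The decoupling described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from `coordinator.rs`: the `Credentials` struct and its field names (`volc_app_id`, `volc_access_token`, and so on) are hypothetical, the error strings are placeholders, and the real functions take no arguments and read from `CredentialsVault` instead.

```rust
/// Hypothetical snapshot of stored credentials; field names are illustrative only.
#[derive(Default)]
struct Credentials {
    active_asr: String,
    asr_api_key: String,
    volc_app_id: String,
    volc_access_token: String,
}

/// Provider list as it stood at this point in the PR.
fn is_whisper_compatible_provider(id: &str) -> bool {
    matches!(id, "whisper" | "qwen" | "siliconflow" | "zhipu" | "groq")
}

/// Shared helper: true when both (assumed) Volcengine fields are non-empty.
fn volcengine_fields_present(c: &Credentials) -> bool {
    !c.volc_app_id.trim().is_empty() && !c.volc_access_token.trim().is_empty()
}

/// QA gate: looks only at the Volcengine fields, whatever preset is active.
fn ensure_qa_volcengine_credentials(c: &Credentials) -> Result<(), String> {
    if volcengine_fields_present(c) {
        Ok(())
    } else {
        Err("missing Volcengine credentials".to_string())
    }
}

/// Dictation gate: still branches on the active provider.
fn ensure_asr_credentials(c: &Credentials) -> Result<(), String> {
    if is_whisper_compatible_provider(&c.active_asr) {
        if c.asr_api_key.trim().is_empty() {
            return Err("missing ASR provider API key".to_string());
        }
        Ok(())
    } else if volcengine_fields_present(c) {
        Ok(())
    } else {
        Err("missing Volcengine credentials".to_string())
    }
}
```

With the two gates separated, a QA session no longer depends on `asr.api_key`, and a missing Volcengine field is reported before any session is opened.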
PR Reviewer Guide 🔍 (Review updated until commit d048107)
Here are some key observations to aid the review process:
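Thanks for the reviewer guide. Responding to each focus area: Fallback Gap ("non-compliant: missing creds should silently fall back"): this item is advisory, not a regression introduced by this PR. The "silent fallback" described in CLAUDE.md refers to downstream behavior when credentials are missing at runtime (missing Ark → insert the raw text; missing Volc → mock placeholder), not to pre-validation at the UI layer. Whether to […] I will also change the item "no hard error when credentials are missing" in the issue #212 acceptance checklist to "keep the existing hard-error semantics", to avoid misleading readers.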
Persistent review updated to latest commit 0a4d33c
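PR-Agent #213 reviewer guide: after switching to a new preset, if the user manually clears asr.endpoint, read_whisper_credentials returns an empty string and WhisperBatchASR builds the relative path `/audio/transcriptions`, which fails with a reqwest error only after the user has finished recording. The same applies to model (the fallback "whisper-1" is an invalid model name for SiliconFlow and others, so the 400 only appears at runtime). Add pre-validation to the compatible branch of ensure_asr_credentials so the error surfaces the moment the hotkey is pressed, reusing the same hard-error semantics as api_key.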
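A pre-flight check of the kind this commit message describes might look like the sketch below. The function name `validate_whisper_config`, its parameters, and the error strings are assumptions made for illustration; the actual change lives inside `ensure_asr_credentials` and is not shown in this thread.

```rust
/// Hypothetical pre-flight check for a Whisper-compatible provider.
/// Rejects the configuration before recording starts, not at upload time.
fn validate_whisper_config(api_key: &str, endpoint: &str, model: &str) -> Result<(), String> {
    if api_key.trim().is_empty() {
        return Err("ASR API key is empty".to_string());
    }
    let endpoint = endpoint.trim();
    // An empty or scheme-less endpoint would otherwise yield a relative URL.
    if !(endpoint.starts_with("http://") || endpoint.starts_with("https://")) {
        return Err(format!("ASR endpoint must be an absolute http(s) URL, got {:?}", endpoint));
    }
    if model.trim().is_empty() {
        return Err("ASR model is empty".to_string());
    }
    Ok(())
}
```

The trade-off is a stricter gate at hotkey time in exchange for an earlier, clearer error.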
Fixed the endpoint/model validation gap raised in the reviewer guide, commit 3416919.
Unchanged:
Persistent review updated to latest commit 3416919
This still needs to be addressed @appergb
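After commit 3416919, PR-Agent reversed its position and objected to the pre-validation ("breaks silent-fallback contract, removes the previous backend defaulting behavior"). Maintainer @H-Chris233 also asked for this to be addressed. Roll back to the 0a4d33c state: the compatible branch validates only api_key (consistent with the existing whisper branch on main), and a missing endpoint/model goes through the existing fallback in read_whisper_credentials (base_url as an empty string, model defaulting to "whisper-1"), so the runtime failure signal surfaces through the coordinator's existing error-capsule path. The api_key check stays because it was not introduced by this PR: it already exists on main, and changing it is beyond V1 scope.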
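To make the restored fallback concrete, here is a small sketch of the defaulting behavior described above. The struct, the function names, and the string-concatenation URL join are assumptions; only the two defaults (an empty base_url and the model "whisper-1") come from the commit message.

```rust
/// Hypothetical resolved settings for the Whisper-compatible client.
struct WhisperCredentials {
    base_url: String,
    model: String,
}

/// Applies the defaults described above: a missing endpoint becomes an
/// empty string, a missing or blank model becomes "whisper-1".
fn read_whisper_credentials_sketch(endpoint: Option<&str>, model: Option<&str>) -> WhisperCredentials {
    WhisperCredentials {
        base_url: endpoint.unwrap_or("").trim().to_string(),
        model: match model.map(str::trim) {
            Some(m) if !m.is_empty() => m.to_string(),
            _ => "whisper-1".to_string(),
        },
    }
}

/// Assumed URL join: plain concatenation, so an empty base yields a relative path.
fn transcription_url(creds: &WhisperCredentials) -> String {
    format!("{}/audio/transcriptions", creds.base_url.trim_end_matches('/'))
}
```

With both settings missing, the sketch produces the relative path `/audio/transcriptions` and the model `whisper-1`, which is the runtime-failure case the thread is weighing against a hard error up front.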
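@H-Chris233 The endpoint/model pre-validation has been rolled back, commit 9809b19. Rationale:
Actual effect:
If the project wants to truly implement CLAUDE.md's silent-fallback contract (including returning a placeholder transcript when credentials are missing), that is a semantic change spanning multiple files; I suggest opening a separate follow-up issue to discuss the scope.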
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9809b19eab
/// OpenAI-compatible `/audio/transcriptions`; all of these go through `WhisperBatchASR`.
/// To add another OpenAI-compatible ASR, add one entry here.
fn is_whisper_compatible_provider(id: &str) -> bool {
    matches!(id, "whisper" | "qwen" | "siliconflow" | "zhipu" | "groq")
Route Qwen ASR through a Qwen-specific client path
Including qwen in is_whisper_compatible_provider sends Qwen requests through WhisperBatchASR, but that client always uploads a WAV file to .../audio/transcriptions (asr/whisper.rs). DashScope’s Qwen3-ASR compatible mode uses a /chat/completions contract instead of Whisper-style multipart transcription, so selecting the new Qwen preset will consistently fail at transcription time even with valid credentials.
Persistent review updated to latest commit 9809b19
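The reviewer pointed out that DashScope's Qwen3-ASR-Flash does not use OpenAI Whisper's
multipart `/audio/transcriptions`, but the MultiModalConversation protocol
(messages=[{content:[{audio:...}]}]). Two independent public implementations
(xinnan-tech/xiaozhi-esp32-server, jianchang512/pyvideotrans) confirm this:
forcing qwen into WhisperBatchASR is guaranteed to fail at transcription.
Following the contingency in the PR description ("risk pending verification → if it does not work, defer to V2"):
- Remove the qwen entry from ASR_PRESETS
- Drop "qwen" from is_whisper_compatible_provider
- Delete the asrQwen i18n strings
- Name in the comments the reason Qwen is left for V2
The remaining four (SiliconFlow / GLM-ASR / Groq / OpenAI Whisper) all have public
OpenAI-compatible /audio/transcriptions endpoints and keep using the original channel.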
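After this change the routing helper would read roughly as below. The `AsrBackend` enum and `select_backend` function are hypothetical names added to show how the helper could drive dispatch; the thread does not show how `coordinator.rs` actually selects a client, and the `"volcengine"` id is assumed from the PR description's mention of a non-volcengine preset.

```rust
/// Hypothetical backend selector used only for this illustration.
#[derive(Debug, PartialEq)]
enum AsrBackend {
    WhisperBatch,
    VolcengineStreaming,
}

/// Providers that speak the OpenAI-compatible `/audio/transcriptions` contract.
/// "qwen" is intentionally absent: its protocol differs, so it is left for V2.
fn is_whisper_compatible_provider(id: &str) -> bool {
    matches!(id, "whisper" | "siliconflow" | "zhipu" | "groq")
}

/// Maps a provider id to a backend; unknown ids (including "qwen") get None.
fn select_backend(active_asr: &str) -> Option<AsrBackend> {
    if is_whisper_compatible_provider(active_asr) {
        Some(AsrBackend::WhisperBatch)
    } else if active_asr == "volcengine" {
        Some(AsrBackend::VolcengineStreaming)
    } else {
        None
    }
}
```

Keeping the provider list in one `matches!` expression is what lets a later OpenAI-compatible vendor be added by editing a single line.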
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
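Confirmed that the reviewer is right; Qwen has been removed from V1, commit d048107. Verification (decisions must be backed by evidence, not assumptions):
WhisperBatchASR uploads a multipart WAV to […]
Changes:
Remaining in V1: SiliconFlow / GLM-ASR / Groq / OpenAI Whisper. All four have a public OpenAI […]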
Persistent review updated to latest commit d048107
User description
Closes #212
Summary
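- […] /audio/transcriptions, reusing the existing WhisperBatchASR; no new Rust client.
- […] asr.endpoint and asr.model, matching the interaction of the LLM presets.
Changes
Backend (surgical)
- coordinator.rs: extract is_whisper_compatible_provider(id) and replace both if active_asr == "whisper" checks with this helper; a new vendor only needs its id added in one place.
- […] "请先在设置中填写 Whisper ASR API Key" is changed to the more generic "请先在设置中填写 ASR 服务商 API Key" ("Please fill in the ASR provider API Key in Settings first").
- […] qa_hotkey forces Volcengine streaming (the comment at coordinator.rs:1760 requires low latency).
Frontend
- Settings.tsx::ASR_PRESETS grows by 4 entries; each carries baseUrl + model defaults.
- onAsrProviderChange calls setCredential('asr.endpoint', ...) and setCredential('asr.model', ...) when switching to a non-volcengine preset.
- […] defaultValue="https://api.openai.com/v1" / placeholder="whisper-1" are replaced with values rendered from the current preset.
- i18n (zh-CN.ts/en.ts) adds three entries, asrQwen/asrZhipu/asrGroq; asrSiliconflow reuses the old key left over from issue "fix: Settings page SiliconFlow ASR option has no backend implementation and always fails" #58.
Out of scope (V2 candidates, to be filed separately)
Verification
- cargo check --manifest-path src-tauri/Cargo.toml passes (all warnings are pre-existing)
- npm run build (tsc + vite) passes
- […] read_whisper_credentials defaults, no hard error
Manual user-side testing
- […] /audio/transcriptions: I did not verify this with curl during the issue research; if it does not work, it is deferred to V2 for a dedicated adapter, and the other 3 vendors are unaffected)
- […] "请先在设置中填写 ASR 服务商 API Key", no crash
Test plan
- npm run tauri dev, open the settings page, switch through the four new presets, and confirm base URL / model are auto-filled
PR Type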
Enhancement, Bug fix
Description
Add SiliconFlow, Zhipu, Groq ASR presets
Auto-fill endpoint/model on provider switch
Fix QA hotkey using wrong credentials check
Update i18n for new ASR providers
Diagram Walkthrough
flowchart LR
  A["New ASR presets (SiliconFlow, Zhipu, Groq)"] --> B["is_whisper_compatible_provider()"]
  B --> C["WhisperBatchASR"]
  A --> D["Settings.tsx auto-fills endpoint/model"]
  E["QA hotkey path"] --> F["ensure_qa_volcengine_credentials()"]
  F --> G["Volcengine stream ASR"]
File Walkthrough
coordinator.rs
Extend ASR routing for multiple providers and fix QA check
openless-all/app/src-tauri/src/coordinator.rs
- Add is_whisper_compatible_provider() to route OpenAI-compatible ASR providers
- Replace active_asr == "whisper" checks with the new helper
- Add ensure_qa_volcengine_credentials() to fix QA hotkey using wrong credentials
Settings.tsx
Add ASR preset selection with pre-filled configuration
openless-all/app/src/pages/Settings.tsx
- Extend ASR_PRESETS with SiliconFlow, Zhipu, Groq entries including baseUrl and model
- Auto-fill asr.endpoint and asr.model on provider switch
en.ts
Add i18n for new ASR presets
openless-all/app/src/i18n/en.ts
- […] presets
zh-CN.ts
Add i18n for new ASR presets
openless-all/app/src/i18n/zh-CN.ts
- […] presets