Integrate Bailian DashScope realtime ASR #385

Merged: H-Chris233 merged 3 commits into Open-Less:beta from H-Chris233:fix/issue-384-bailian-asr on May 9, 2026.


H-Chris233 (Collaborator) commented May 9, 2026

User description

Summary of changes

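  • Added a traditional Bailian / DashScope realtime ASR provider. It connects to wss://dashscope.aliyuncs.com/api-ws/v1/inference/ and consumes the existing 16 kHz mono PCM directly.
  • Wired the provider into the coordinator's dictation state machine. On start, recording begins immediately and PCM is buffered. Once the WebSocket reports task-started, the buffered audio is sent. On stop, the client sends finish-task and waits for the final result.
  • Added an "Alibaba Cloud Bailian realtime ASR" option to Settings. The default model is fun-asr-realtime, and the endpoint can be changed manually, e.g. to the Singapore address.
  • Added an optional asr.vocabulary_id field for passing the ID of a hot-word vocabulary that already exists on the Bailian side. Remote vocabulary resources are not created or managed automatically.
  • Added the provider strings to Overview and to all five UI languages.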
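The start-up ordering in the second bullet (record and buffer first, forward only after task-started) can be sketched roughly as below. This is a minimal std-only illustration, not the PR's code. `DeferredPcm` and its methods are hypothetical names rather than the actual DeferredAsrBridge API, and a `Vec` stands in for the WebSocket sink.

```rust
/// Hypothetical buffer that holds recorder PCM until the remote task starts.
/// `sent` stands in for the WebSocket sink in this sketch.
#[derive(Default)]
pub struct DeferredPcm {
    started: bool,
    pending: Vec<Vec<u8>>,
    sent: Vec<Vec<u8>>,
}

impl DeferredPcm {
    /// Called by the recorder for every PCM chunk.
    pub fn push(&mut self, chunk: &[u8]) {
        if self.started {
            // Task already running: forward immediately.
            self.sent.push(chunk.to_vec());
        } else {
            // Server not ready yet: keep the chunk so no speech is lost.
            self.pending.push(chunk.to_vec());
        }
    }

    /// Called once the server acknowledges with `task-started`:
    /// flush buffered audio in recording order, then switch to pass-through.
    pub fn mark_started(&mut self) {
        if self.started {
            return;
        }
        self.started = true;
        self.sent.extend(self.pending.drain(..));
    }

    pub fn sent(&self) -> &[Vec<u8>] {
        &self.sent
    }
}
```

The key property is ordering. Chunks recorded before the server is ready are flushed before any later chunk, so the opening words of a dictation are not dropped while the handshake completes.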

Trade-offs

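  • This PR implements the traditional DashScope realtime ASR line. It does not mix in the OpenAI Realtime protocol used by qwen3-asr-flash-realtime, whose events and audio submission format differ; that protocol is better suited to a separate provider.
  • The local vocabulary is not automatically synced into a Bailian hot-word vocabulary, to avoid silently creating or updating remote resources on the vendor's side. For now, users can enter an existing vocab-... ID.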
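The traditional DashScope line is driven by JSON control frames sent over the WebSocket. The sketch below builds a run-task and a finish-task message, and includes vocabulary_id only when the user supplied one. Some caveats apply:

  • The field layout (header.action, streaming: "duplex", payload.parameters with format and sample_rate) follows our reading of the public DashScope realtime ASR WebSocket documentation. Check it against the current docs before relying on it.
  • The helper names are hypothetical.
  • JSON is assembled by hand only to keep the example dependency-free. A real client would more likely use serde_json.

```rust
/// Minimal JSON string escaper (std-only, so the sketch needs no serde).
fn json_str(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Hypothetical builder for the run-task frame (assumed field layout).
/// vocabulary_id is only emitted when the user configured one.
pub fn run_task_message(task_id: &str, model: &str, vocabulary_id: Option<&str>) -> String {
    let vocab = match vocabulary_id {
        Some(id) => format!(",\"vocabulary_id\":{}", json_str(id)),
        None => String::new(),
    };
    format!(
        concat!(
            "{{\"header\":{{\"action\":\"run-task\",\"task_id\":{},\"streaming\":\"duplex\"}},",
            "\"payload\":{{\"task_group\":\"audio\",\"task\":\"asr\",\"function\":\"recognition\",",
            "\"model\":{},\"parameters\":{{\"format\":\"pcm\",\"sample_rate\":16000{}}},\"input\":{{}}}}}}"
        ),
        json_str(task_id),
        json_str(model),
        vocab
    )
}

/// Hypothetical builder for the finish-task frame sent when the user stops.
pub fn finish_task_message(task_id: &str) -> String {
    format!(
        "{{\"header\":{{\"action\":\"finish-task\",\"task_id\":{},\"streaming\":\"duplex\"}},\"payload\":{{\"input\":{{}}}}}}",
        json_str(task_id)
    )
}
```

Omitting the key when the ID is absent, rather than sending an empty string, keeps the request valid for users who never created a Bailian vocabulary.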

Verification

  • cargo check --manifest-path openless-all/app/src-tauri/Cargo.toml
  • cargo test --manifest-path openless-all/app/src-tauri/Cargo.toml --lib
  • cargo test --manifest-path openless-all/app/src-tauri/backend-tests/Cargo.toml
  • cd openless-all/app && npm run build

Not verified

  • No real DashScope API key or local microphone-permission environment was available, so a full live dictation test on macOS was not performed.

Closes #384


PR Type

Enhancement, Tests


Description

  • Added new Bailian Realtime ASR provider (bailian.rs) implementing DashScope WebSocket protocol.

  • Integrated provider into coordinator dictation lifecycle with deferred audio bridging.

  • Extended credential schema and UI settings for endpoint, model, and optional vocabulary ID.

  • Added English, Japanese, Korean, Simplified/Traditional Chinese i18n labels.

  • Included unit tests for credentials normalization, message building, and audio chunking.
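The credentials normalization mentioned above might look roughly like this:

  • Trim user input.
  • Fall back to the default endpoint and the fun-asr-realtime model when a field is blank.
  • Treat a blank vocabulary_id as unset.

This is a hypothetical sketch. `BailianSettings` and `normalize` are invented names, the default endpoint is assumed to be the URL from the change summary, and the PR's actual rules are not shown on this page.

```rust
// Assumed defaults, taken from the change summary above.
const DEFAULT_ENDPOINT: &str = "wss://dashscope.aliyuncs.com/api-ws/v1/inference/";
const DEFAULT_MODEL: &str = "fun-asr-realtime";

#[derive(Debug, PartialEq)]
pub struct BailianSettings {
    pub endpoint: String,
    pub model: String,
    pub vocabulary_id: Option<String>,
}

/// Trim user input, fall back to defaults for a blank endpoint/model,
/// and treat a blank vocabulary_id as "not set".
pub fn normalize(endpoint: &str, model: &str, vocabulary_id: &str) -> BailianSettings {
    fn non_empty(s: &str) -> Option<String> {
        let t = s.trim();
        if t.is_empty() {
            None
        } else {
            Some(t.to_string())
        }
    }
    BailianSettings {
        endpoint: non_empty(endpoint).unwrap_or_else(|| DEFAULT_ENDPOINT.to_string()),
        model: non_empty(model).unwrap_or_else(|| DEFAULT_MODEL.to_string()),
        vocabulary_id: non_empty(vocabulary_id),
    }
}
```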


Diagram Walkthrough

flowchart LR
  A["Settings: select Bailian ASR"] --> B["Credentials (api_key, endpoint, model, vocabulary_id)"]
  B --> C["Coordinator: begin_session"]
  C --> D["BailianRealtimeASR::open_session"]
  D --> E["WebSocket to DashScope /api-ws/v1/inference"]
  E --> F["Stream audio & receive transcription"]
  F --> G["Return RawTranscript"]
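The "Stream audio" step works on raw 16 kHz mono PCM. Assuming 16-bit signed little-endian samples, 100 ms of audio is 16,000 × 2 × 0.1 = 3,200 bytes. The sketch below converts samples to bytes and splits them into fixed-duration frames. The 100 ms frame size and the function names are illustrative assumptions, not values taken from the PR.

```rust
const SAMPLE_RATE_HZ: usize = 16_000;
const BYTES_PER_SAMPLE: usize = 2; // assumed 16-bit signed PCM, mono

/// Bytes covering `ms` milliseconds of 16 kHz mono 16-bit PCM.
pub fn bytes_per_frame(ms: usize) -> usize {
    SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * ms / 1000
}

/// Convert i16 samples to little-endian bytes and split them into
/// fixed-duration frames; the final frame may be shorter.
/// `ms` must be > 0 (chunks(0) panics).
pub fn pcm_frames(samples: &[i16], ms: usize) -> Vec<Vec<u8>> {
    let bytes: Vec<u8> = samples.iter().flat_map(|s| s.to_le_bytes()).collect();
    bytes.chunks(bytes_per_frame(ms)).map(|c| c.to_vec()).collect()
}
```

For example, 1.05 s of audio (16,800 samples) yields ten full 3,200-byte frames plus one 1,600-byte tail.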

File Walkthrough

Relevant files

Enhancement (8 files)

  • bailian.rs: Implement Bailian DashScope realtime ASR client (+592/-0)
  • mod.rs: Expose Bailian module and public types (+2/-0)
  • commands.rs: Add bailian provider validation and model listing (+59/-0)
  • coordinator.rs: Integrate bailian credential reading into coordinator (+35/-3)
  • dictation.rs: Handle dictation session lifecycle for bailian ASR (+117/-1)
  • resources.rs: Cancel bailian ASR on active ASR cleanup (+1/-0)
  • Overview.tsx: Map bailian ASR provider in overview page (+1/-0)
  • Settings.tsx: Add bailian ASR preset and settings UI (+18/-2)

Configuration changes (1 file)

  • persistence.rs: Persist vocabulary_id in credentials store (+14/-2)

Documentation (5 files)

  • en.ts: English localization for bailian ASR UI (+3/-0)
  • ja.ts: Japanese localization for bailian ASR UI (+3/-0)
  • ko.ts: Korean localization for bailian ASR UI (+3/-0)
  • zh-CN.ts: Simplified Chinese localization for bailian ASR UI (+3/-0)
  • zh-TW.ts: Traditional Chinese localization for bailian ASR UI (+3/-0)

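The traditional Bailian realtime recognition protocol can consume OpenLess's existing 16 kHz mono PCM directly. This change therefore adds a standalone ASR provider that reuses the existing DeferredAsrBridge / coordinator session state machine, instead of introducing a second recording pipeline. On the Settings side, it adds:

  • a Bailian option;
  • the default Beijing endpoint;
  • the default fun-asr-realtime model;
  • an optional vocabulary_id field, used to pass the ID of an existing Bailian hot-word vocabulary.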

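Constraint: issue Open-Less#384 asks for Bailian DashScope realtime ASR integration with configurable Beijing/Singapore endpoints.
Constraint: credentials must still go through the system credential store; the new asr.vocabulary_id is only an optional provider setting for the active ASR provider.
Rejected: also implementing the qwen3-asr-flash OpenAI Realtime protocol | its protocol and result events differ, which exceeds the minimal change scope.
Rejected: automatically creating a Bailian hot-word vocabulary from the local vocabulary | requires remote resource lifecycle management and user confirmation; vendor resources must not be managed silently on the user's behalf.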
Confidence: high
Scope-risk: moderate
Directive: the Bailian provider currently uses the traditional /api-ws/v1/inference protocol; do not mix qwen realtime event formats into the same implementation.
Tested: cargo check --manifest-path openless-all/app/src-tauri/Cargo.toml
Tested: cargo test --manifest-path openless-all/app/src-tauri/Cargo.toml --lib
Tested: cargo test --manifest-path openless-all/app/src-tauri/backend-tests/Cargo.toml
Tested: cd openless-all/app && npm run build
Not-tested: a real DashScope API key and microphone environment are required for live macOS dictation verification.
Related: Open-Less#384
Co-authored-by: OmX <omx@oh-my-codex.dev>
@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

github-actions Bot commented May 9, 2026

PR Reviewer Guide 🔍

(Review updated until commit 0a2b6bd)

Here are some key observations to aid the review process:

🎫 Ticket compliance analysis ✅

384 - PR Code Verified

Compliant requirements:

  • Added a Bailian realtime ASR provider over the DashScope WebSocket inference API.
  • Reuses the existing DashScope API key credential for authentication.
  • Streams recorder PCM through the ASR pipeline via AudioConsumer.
  • Handles interim and final transcript events.
  • Supports optional vocabulary ID input.
  • Endpoint is configurable from Settings.
  • Integrated the provider into coordinator session flow.
  • Added Settings UI support for selecting Bailian ASR.

Requires further human verification:

  • Full end-to-end dictation on macOS with a real DashScope API key and microphone permissions.
  • Real network validation of the configured DashScope endpoints in the target regions.
⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ No major issues detected

Bailian startup stores the ASR instance before opening the remote session, so recorder startup audio can be buffered. If the websocket setup then fails in the active startup path, the provider must be cancelled before the coordinator returns to idle. The partial-result fallback remains intentional and is now documented as matching the existing Volcengine no-loss behavior.

Constraint: OpenLess ASR paths prefer not losing already recognized user speech.
Rejected: Treat every websocket close with partial text as failure | would diverge from Volcengine fallback semantics.
Confidence: high
Scope-risk: narrow
Tested: cargo check --manifest-path openless-all/app/src-tauri/Cargo.toml
Tested: cargo test --manifest-path openless-all/app/src-tauri/Cargo.toml --lib
Tested: cargo test --manifest-path openless-all/app/src-tauri/backend-tests/Cargo.toml
Co-authored-by: OmX <omx@oh-my-codex.dev>
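The no-loss fallback described in this commit can be illustrated as follows. Committed sentences and the latest interim text are accumulated. If the socket closes before task-finished, any recognized text is still returned instead of an error. This is a hypothetical sketch with some simplifications:

  • The type names are invented.
  • Sentences are joined without separators, which suits Chinese but not space-delimited languages.
  • It does not model task-failed handling, which this page does not describe.

```rust
/// How the session ended, from the client's point of view (hypothetical).
#[derive(Debug, PartialEq)]
pub enum SessionEnd {
    Finished,    // server sent task-finished
    ClosedEarly, // socket closed before task-finished
}

#[derive(Default)]
pub struct TranscriptAccumulator {
    committed: Vec<String>, // sentences reported with sentence_end = true
    pending: String,        // latest interim text for the current sentence
}

impl TranscriptAccumulator {
    /// Interim results replace `pending`; a sentence end commits it.
    pub fn on_result(&mut self, text: &str, sentence_end: bool) {
        if sentence_end {
            self.committed.push(text.to_string());
            self.pending.clear();
        } else {
            self.pending = text.to_string();
        }
    }

    fn text(&self) -> String {
        let mut out = self.committed.concat();
        out.push_str(&self.pending);
        out
    }

    /// No-loss policy: an early close still returns recognized text if any.
    pub fn finish(self, end: SessionEnd) -> Result<String, String> {
        let text = self.text();
        match end {
            SessionEnd::Finished => Ok(text),
            SessionEnd::ClosedEarly if !text.is_empty() => Ok(text),
            SessionEnd::ClosedEarly => Err("connection closed before any result".into()),
        }
    }
}
```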
github-actions Bot commented May 9, 2026

Persistent review updated to latest commit 00f150a

Bailian uses Notify::notify_waiters when the remote task starts. send_last_frame must register the Notified future before checking the started flag, so a task-started event cannot land between the check and waiter registration.

Constraint: DashScope stop should not falsely time out when task-started races with user stop.
Rejected: Replace Notify with polling | broader and less direct than the Tokio enable pattern.
Confidence: high
Scope-risk: narrow
Tested: cargo check --manifest-path openless-all/app/src-tauri/Cargo.toml
Tested: cargo test --manifest-path openless-all/app/src-tauri/Cargo.toml --lib asr::bailian::tests
Co-authored-by: OmX <omx@oh-my-codex.dev>
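The race described here is a check-then-wait problem. Tokio's `notify_waiters` only wakes waiters that are already registered, so the usual Tokio fix is to pin the `Notified` future and call `Notified::enable()` before checking the flag.

The sketch below is an analogue, not the PR's code. It shows the same invariant with std's `Mutex` and `Condvar` instead of Tokio: the flag is checked and the wait begins under one lock, so a task-started notification cannot slip in between. `StartedSignal` is a hypothetical name.

```rust
use std::sync::{Condvar, Mutex};
use std::time::Duration;

/// Std analogue of the "task started" signal (hypothetical type).
#[derive(Default)]
pub struct StartedSignal {
    started: Mutex<bool>,
    cv: Condvar,
}

impl StartedSignal {
    /// Called by the reader side when `task-started` arrives.
    pub fn set_started(&self) {
        *self.started.lock().unwrap() = true;
        self.cv.notify_all();
    }

    /// Called on user stop before sending finish-task.
    /// Returns false only if the task never started within `timeout`.
    pub fn wait_started(&self, timeout: Duration) -> bool {
        let guard = self.started.lock().unwrap();
        // The predicate is evaluated under the lock before sleeping,
        // so a notification sent just before this call is not missed.
        let (guard, _timeout_result) = self
            .cv
            .wait_timeout_while(guard, timeout, |started| !*started)
            .unwrap();
        *guard
    }
}
```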
github-actions Bot commented May 9, 2026

Persistent review updated to latest commit 0a2b6bd

@H-Chris233 H-Chris233 merged commit b1c3872 into Open-Less:beta May 9, 2026
4 checks passed
@H-Chris233 H-Chris233 deleted the fix/issue-384-bailian-asr branch May 10, 2026 10:58


Development

Successfully merging this pull request may close these issues.

[asr] Add Bailian DashScope realtime speech recognition provider

1 participant