fix(local-asr): Qwen3-ASR drops the tail of long speech + long recordings time out #434
Merged
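Fixes two independent defects together.
Defect 1 (primary, content loss): transcribe_stream slices audio into 2 s chunks internally. When the user releases the key immediately after the last word, the recording buffer has no trailing silence, so the final chunk (< 2 s) gets no silence frames → the C engine does not finalize → that chunk's transcription is discarded. Waiting 5-10 seconds in silence before releasing the key works, because the trailing silence is recorded into the buffer.
Fix: in local_provider.rs, transcribe() converts the PCM to f32 and then appends 0.5 s of silence (8000 zero-valued f32 samples at 16 kHz) to give the C engine an end-of-speech signal. duration_ms is still computed from the original buffer length; the padding is not counted.
Defect 2 (secondary, long recordings time out): COORDINATOR_GLOBAL_TIMEOUT_SECS is a fixed 15 s. The user's RTF is ≈ 0.3 and can reach 0.5 on slow machines, so a 60 s recording needs ~18 s to transcribe and times out.
Fix: adds local_qwen_transcribe_timeout(audio_secs) -> Duration, with the formula max(15, ceil(audio_s × 0.6) + 10), used only on the Local Qwen path (the Volcengine / Whisper / Bailian / Foundry / QA paths are untouched). The companion LocalQwenAsr::buffer_duration_ms() reads the audio duration without consuming the buffer.
Adds 4 unit tests covering the formula: the 15 s floor for short recordings, linear scaling for long recordings, ceil on fractional seconds, and the 0-second boundary. All 39 coordinator tests pass.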
User description
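Background
Two independent defects combine so that local Qwen3-ASR loses content or fails the whole segment in long-speech / immediate-key-release scenarios.
Defect 1 (primary, content loss)
The comment at qwen_engine.rs:94 states explicitly that transcribe_stream slices audio into 2 s chunks internally. When the user presses the hotkey immediately after the last word, the recording buffer has no trailing silence → the final chunk, shorter than 2 s, gets no silence frames → the C engine does not treat it as "speech ended" → that chunk's transcription is discarded and the tail content disappears. Verified on the reproduction side: waiting 5-10 seconds in silence before pressing the hotkey lets that silence enter the buffer with the recording → the C engine sees silence → the final chunk finalizes normally → nothing is lost.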
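The padding fix described in the PR summary (convert PCM to f32, then append 0.5 s of zeros) might look roughly like the sketch below. This is not the PR's actual code: the helper name `pcm_to_padded_f32`, the constants, and the assumption that the input is 16-bit signed PCM are all illustrative.

```rust
/// Sample rate assumed in this sketch (the PR states 16 kHz).
const SAMPLE_RATE_HZ: usize = 16_000;
/// 0.5 s of trailing silence = 8000 zero samples at 16 kHz.
const TAIL_SILENCE_SAMPLES: usize = SAMPLE_RATE_HZ / 2;

/// Hypothetical helper (name and signature are assumptions, not from the PR).
/// Converts 16-bit PCM to f32 in [-1.0, 1.0) and appends trailing silence.
/// Returns the padded samples plus the duration of the ORIGINAL audio in ms,
/// so the padding never inflates the reported duration.
fn pcm_to_padded_f32(pcm: &[i16]) -> (Vec<f32>, u64) {
    // Duration is computed before padding, from the original sample count.
    let duration_ms = (pcm.len() as u64 * 1000) / SAMPLE_RATE_HZ as u64;

    let mut samples: Vec<f32> = Vec::with_capacity(pcm.len() + TAIL_SILENCE_SAMPLES);
    samples.extend(pcm.iter().map(|&s| f32::from(s) / 32768.0));
    // Trailing zeros act as the "speech ended" cue for the chunked engine.
    samples.extend(std::iter::repeat(0.0f32).take(TAIL_SILENCE_SAMPLES));

    (samples, duration_ms)
}
```

For a 1 s recording (16000 samples) this yields 24000 samples for the engine while still reporting 1000 ms.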
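Defect 2 (secondary, long recordings time out)
COORDINATOR_GLOBAL_TIMEOUT_SECS = 15 (coordinator.rs:3593). The local Qwen path takes the default true branch of asr_transcribe_uses_global_timeout (coordinator.rs:87-93) and therefore hits the 15 s global timeout. The user measured RTF ≈ 0.3, up to 0.5 on slow machines → a 60 s recording needs about 18 s to transcribe → it times out and the whole result is discarded.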
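To make the arithmetic concrete, here is a minimal sketch of the timeout formula quoted in the PR summary, max(15, ceil(audio_s × 0.6) + 10). The function name and constant come from the PR text, but the `f64` parameter type, the clamping of negative input, and the `estimated_transcribe_secs` helper are assumptions for illustration.

```rust
use std::time::Duration;

/// Fixed global timeout named in the PR.
const COORDINATOR_GLOBAL_TIMEOUT_SECS: u64 = 15;

/// Sketch of the helper; the f64 parameter type is an assumption.
/// Formula from the PR: max(15, ceil(audio_s * 0.6) + 10).
fn local_qwen_transcribe_timeout(audio_secs: f64) -> Duration {
    // Clamp negative input to zero (an assumption, not stated in the PR).
    let scaled = (audio_secs.max(0.0) * 0.6).ceil() as u64 + 10;
    Duration::from_secs(scaled.max(COORDINATOR_GLOBAL_TIMEOUT_SECS))
}

/// Hypothetical helper: expected transcription time for a given
/// real-time factor (RTF = processing time / audio duration).
fn estimated_transcribe_secs(audio_secs: f64, rtf: f64) -> f64 {
    audio_secs * rtf
}
```

With a 60 s recording at RTF 0.3 the estimate is about 18 s, which exceeds the fixed 15 s but fits within the 46 s the formula gives. The 0.6 multiplier leaves headroom above the 0.5 RTF reported for slow machines.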
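Fix
Defect 1 (local_provider.rs)
transcribe() converts the PCM to f32 and then appends 0.5 s of silence (8000 zero-valued f32 samples at 16 kHz) as an end-of-speech signal for the C engine. duration_ms is still computed from the original buffer length; the padding is not counted.
Defect 2 (coordinator.rs + coordinator/dictation.rs)
Adds a module-level helper, local_qwen_transcribe_timeout(audio_secs) -> Duration, which computes max(15, ceil(audio_s × 0.6) + 10).
In dictation.rs, the ActiveAsr::Local branch reads local.buffer_duration_ms() before calling transcribe(), derives audio_secs from it, and uses the new helper to choose the timeout. The other ASR paths (Volcengine / Whisper / Bailian / Foundry / QA) are unchanged and keep the fixed 15 s.
Also adds LocalQwenAsr::buffer_duration_ms() -> u64, which takes &self and does not consume the buffer.
Test plan
cargo check --manifest-path openless-all/app/src-tauri/Cargo.toml passes
cargo test --manifest-path openless-all/app/src-tauri/Cargo.toml --lib local_qwen_timeout: all 4 new unit tests pass (15 s floor for short recordings, linear scaling for long recordings, ceil on fractional seconds, 0-second boundary)
cargo test ... --lib coordinator: all 39 existing tests pass, zero regressions
Related
Two independent defects found during a follow-up investigation triggered by @aeoform's feedback in the comments on issue #420; they are not directly related to the main thread of #420 (Wayland CLI trigger).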
PR Type
Bug fix, Tests
Description
Fix local Qwen trailing audio loss
Replace fixed timeout with dynamic scaling
Add buffer duration reader for coordinator
Cover timeout rules with unit tests
Diagram Walkthrough
File Walkthrough
local_provider.rs
Add silence padding and buffer duration
openless-all/app/src-tauri/src/asr/local/local_provider.rs
buffer_duration_ms() to read queued audio length without consuming it.
duration_ms based on original audio only.
coordinator.rs
Add dynamic local Qwen timeout logic
openless-all/app/src-tauri/src/coordinator.rs
local_qwen_transcribe_timeout(audio_secs) helper.
max(15, ceil(audio_s × 0.6) + 10) for local Qwen ASR.
length.
dictation.rs
Use audio-based timeout during transcription
openless-all/app/src-tauri/src/coordinator/dictation.rs
timeout.
transcribe().
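A non-consuming duration reader like the buffer_duration_ms() mentioned above could be sketched as follows. The struct `LocalQwenAsrSketch`, its `Mutex<Vec<i16>>` buffer, and `push_samples` are hypothetical stand-ins; the PR does not show the real provider's fields.

```rust
use std::sync::Mutex;

/// Sample rate assumed in this sketch (the PR states 16 kHz).
const SAMPLE_RATE_HZ: u64 = 16_000;

/// Hypothetical stand-in for the provider; field layout is an assumption.
struct LocalQwenAsrSketch {
    buffer: Mutex<Vec<i16>>,
}

impl LocalQwenAsrSketch {
    /// Hypothetical helper that queues recorded samples.
    fn push_samples(&self, samples: &[i16]) {
        self.buffer.lock().unwrap().extend_from_slice(samples);
    }

    /// Reads the queued audio length in milliseconds. Takes `&self` and only
    /// inspects `len()`, so the buffer is left intact for `transcribe()`.
    fn buffer_duration_ms(&self) -> u64 {
        let len = self.buffer.lock().unwrap().len() as u64;
        len * 1000 / SAMPLE_RATE_HZ
    }
}
```

A caller would divide the returned milliseconds by 1000 to obtain audio_secs before choosing a timeout, then call transcribe() on the still-full buffer.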