fix(local-asr): Qwen3-ASR drops the tail of long utterances + long recordings time out #434

Merged: appergb merged 1 commit into beta from fix/qwen-asr-long-audio-loss on May 13, 2026

@appergb (Collaborator) commented May 13, 2026

User description

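Background

Two independent defects compound, causing local Qwen3-ASR to drop content or fail the whole segment when the speech is long or the key is released immediately.

Defect 1 (primary, dropped content)

The comment at qwen_engine.rs:94 states explicitly that transcribe_stream slices audio into 2s chunks internally. When the user presses the shortcut immediately after the last word, the recording buffer has no trailing silence → the final chunk, shorter than 2s, gets no silence frames → the C engine does not treat it as "speech has ended" → that chunk's transcription is discarded and the tail content disappears.

Reproduction-side check: wait 5–10 seconds in silence before pressing the shortcut, and that silence enters the buffer along with the recording → the C engine sees silence → the final chunk closes normally → nothing is lost.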
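
To make the "final chunk shorter than 2s" case concrete, here is a minimal sketch of the chunk arithmetic only. It assumes 16 kHz mono samples (the rate quoted later in this PR); `split_into_chunks` is a hypothetical name, and the code does not model how the C engine actually decides that speech has ended.

```rust
const SAMPLE_RATE_HZ: usize = 16_000; // assumption: 16 kHz, as quoted in this PR
const CHUNK_SAMPLES: usize = 2 * SAMPLE_RATE_HZ; // 2 s chunks, per the qwen_engine.rs comment

/// Hypothetical helper: returns (number of full 2 s chunks, samples left in the
/// trailing partial chunk). A non-zero remainder is the chunk at risk.
fn split_into_chunks(total_samples: usize) -> (usize, usize) {
    (total_samples / CHUNK_SAMPLES, total_samples % CHUNK_SAMPLES)
}
```

For a 5.2s recording (83,200 samples) this gives two full chunks plus a 1.2s remainder; with no silence after it, that remainder is the part the description says gets dropped.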

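Defect 2 (secondary, long-recording timeout)

COORDINATOR_GLOBAL_TIMEOUT_SECS = 15 (coordinator.rs:3593). The local Qwen path takes the default true branch of asr_transcribe_uses_global_timeout (coordinator.rs:87-93) and therefore hits the 15s global timeout.

User-measured RTF ≈ 0.3, up to 0.5 on slow machines → a 60s recording needs about 18s to transcribe (at RTF 0.3) → the timeout fires and the whole result is discarded.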
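
The arithmetic behind that claim can be sketched as follows. RTF (real-time factor) is processing time divided by audio duration, so the estimated transcription time is RTF × audio seconds. Both function names are hypothetical and used only for illustration.

```rust
/// Estimated wall-clock transcription time: RTF * audio duration.
fn estimated_transcribe_secs(audio_secs: f64, rtf: f64) -> f64 {
    audio_secs * rtf
}

/// True when the estimate exceeds a fixed timeout budget (in seconds).
fn exceeds_fixed_timeout(audio_secs: f64, rtf: f64, timeout_secs: u64) -> bool {
    estimated_transcribe_secs(audio_secs, rtf) > timeout_secs as f64
}
```

With a 15s budget, a 60s recording at RTF 0.3 (about 18s) already overshoots, while a 30s recording at the same RTF (about 9s) still fits.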

Fix

Defect 1 (local_provider.rs)

let mut samples_f32 = i16_le_bytes_to_f32(&pcm_bytes);
// End-of-chunk signal: append 0.5s of silence = 8000 f32 zeros @ 16kHz
samples_f32.extend(std::iter::repeat(0.0f32).take(8_000));

duration_ms is still computed from the original buffer length; the padding is not counted.
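
A self-contained sketch of the padding step, under stated assumptions: the body of `i16_le_bytes_to_f32` below is a guess (little-endian 16-bit PCM scaled by 1/32768), and `pad_for_transcription` is a hypothetical wrapper, not a function from this PR. It shows the ordering that matters: measure the duration first, then append the silence.

```rust
const SAMPLE_RATE_HZ: u64 = 16_000; // 16 kHz, as stated in the PR
const TAIL_SILENCE_SAMPLES: usize = 8_000; // 0.5 s of silence at 16 kHz

/// Assumed stand-in for the PR's `i16_le_bytes_to_f32`; the real one may differ.
fn i16_le_bytes_to_f32(bytes: &[u8]) -> Vec<f32> {
    bytes
        .chunks_exact(2)
        .map(|b| i16::from_le_bytes([b[0], b[1]]) as f32 / 32768.0)
        .collect()
}

/// Hypothetical wrapper: returns the padded samples plus the duration in
/// milliseconds of the original audio only (padding excluded).
fn pad_for_transcription(pcm_bytes: &[u8]) -> (Vec<f32>, u64) {
    let mut samples = i16_le_bytes_to_f32(pcm_bytes);
    // Compute the duration before padding so the silence is not counted.
    let duration_ms = samples.len() as u64 * 1000 / SAMPLE_RATE_HZ;
    samples.extend(std::iter::repeat(0.0f32).take(TAIL_SILENCE_SAMPLES));
    (samples, duration_ms)
}
```

One second of input (32,000 bytes) comes back as 24,000 samples with duration_ms still reported as 1000.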

Defect 2 (coordinator.rs + coordinator/dictation.rs)

New module-level helper:

fn local_qwen_transcribe_timeout(audio_secs: f64) -> std::time::Duration {
    let secs = ((audio_secs * 0.6).ceil() as u64)
        .saturating_add(10)
        .max(COORDINATOR_GLOBAL_TIMEOUT_SECS);
    std::time::Duration::from_secs(secs)
}

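In dictation.rs, the ActiveAsr::Local branch reads local.buffer_duration_ms() before calling transcribe(), derives audio_secs from it, and uses the new helper to pick the timeout. All other ASR paths (Volcengine / Whisper / Bailian / Foundry / QA) are untouched and still use the fixed 15s.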
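
A rough sketch of that call-site flow, using only the standard library. The real dictation.rs code is not shown in this PR page and may well use an async timeout instead; `run_with_timeout` and `transcribe_local` are hypothetical names, and only `local_qwen_transcribe_timeout` mirrors the helper quoted earlier.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

const COORDINATOR_GLOBAL_TIMEOUT_SECS: u64 = 15;

// Same formula as the helper in this PR.
fn local_qwen_transcribe_timeout(audio_secs: f64) -> Duration {
    let secs = ((audio_secs * 0.6).ceil() as u64)
        .saturating_add(10)
        .max(COORDINATOR_GLOBAL_TIMEOUT_SECS);
    Duration::from_secs(secs)
}

// Hypothetical helper: run `work` on a worker thread and wait at most `timeout`.
// Returns None on timeout; the worker thread is left to finish on its own.
fn run_with_timeout<T, F>(timeout: Duration, work: F) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let _ = tx.send(work());
    });
    rx.recv_timeout(timeout).ok()
}

// Hypothetical call site: derive the timeout from the buffered duration
// before starting the transcription, then report the dynamic value on failure.
fn transcribe_local<F>(buffer_duration_ms: u64, transcribe: F) -> Result<String, String>
where
    F: FnOnce() -> String + Send + 'static,
{
    let audio_secs = buffer_duration_ms as f64 / 1000.0;
    let timeout = local_qwen_transcribe_timeout(audio_secs);
    run_with_timeout(timeout, transcribe).ok_or_else(|| {
        format!(
            "local ASR timed out after {}s ({audio_secs:.1}s audio)",
            timeout.as_secs()
        )
    })
}
```

The duration has to be read before transcription starts, because transcribe() presumably drains the buffer; that is why a non-consuming reader is needed.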

Also adds LocalQwenAsr::buffer_duration_ms() -> u64, which takes &self and does not consume the buffer.
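
What a non-consuming duration reader could look like, as a sketch only: the real LocalQwenAsr fields are not shown in this PR, so `LocalQwenAsrSketch`, its `pcm` field, and the 16 kHz mono 16-bit layout behind `BYTES_PER_SECOND` are all assumptions.

```rust
use std::sync::Mutex;

const BYTES_PER_SECOND: u64 = 16_000 * 2; // assumption: 16 kHz, mono, 16-bit PCM

// Hypothetical stand-in for LocalQwenAsr; the real struct may differ.
struct LocalQwenAsrSketch {
    pcm: Mutex<Vec<u8>>,
}

impl LocalQwenAsrSketch {
    // Takes &self and only reads the length, so the queued audio stays
    // intact for the later transcribe() call.
    fn buffer_duration_ms(&self) -> u64 {
        let len = self.pcm.lock().expect("buffer lock poisoned").len() as u64;
        len * 1000 / BYTES_PER_SECOND
    }
}
```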

Test plan

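  • cargo check --manifest-path openless-all/app/src-tauri/Cargo.toml passes
  • cargo test --manifest-path openless-all/app/src-tauri/Cargo.toml --lib local_qwen_timeout passes all 4 new unit tests:
    • a short 5s recording falls back to 15s
    • a 60s recording returns 46s (matches the formula example the user gave)
    • a 10.1s recording is rounded up (ceil) to 17s
    • the 0s boundary returns 15s
  • cargo test ... --lib coordinator: all 39 existing tests pass, zero regressions
  • Real-device regression (macOS, required)
    • speak for 60s without pausing and release the key immediately → the tail must not lose words (verifies defect 1)
    • 90s long recording → no longer fails with a timeout (verifies defect 2)
    • short recordings (under 5s) behave as before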
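
The four formula cases listed above can be written as unit tests along these lines. The helper is copied from the PR description; the test names are made up here and need not match the ones in coordinator.rs.

```rust
use std::time::Duration;

const COORDINATOR_GLOBAL_TIMEOUT_SECS: u64 = 15;

// Helper as quoted in the PR description.
fn local_qwen_transcribe_timeout(audio_secs: f64) -> Duration {
    let secs = ((audio_secs * 0.6).ceil() as u64)
        .saturating_add(10)
        .max(COORDINATOR_GLOBAL_TIMEOUT_SECS);
    Duration::from_secs(secs)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn short_audio_falls_back_to_floor() {
        // ceil(5 * 0.6) + 10 = 13, below the 15 s floor.
        assert_eq!(local_qwen_transcribe_timeout(5.0), Duration::from_secs(15));
    }

    #[test]
    fn long_audio_scales_linearly() {
        // ceil(60 * 0.6) + 10 = 46.
        assert_eq!(local_qwen_transcribe_timeout(60.0), Duration::from_secs(46));
    }

    #[test]
    fn fractional_seconds_round_up() {
        // 10.1 * 0.6 = 6.06, ceil to 7, plus 10 = 17.
        assert_eq!(local_qwen_transcribe_timeout(10.1), Duration::from_secs(17));
    }

    #[test]
    fn zero_length_returns_floor() {
        assert_eq!(local_qwen_transcribe_timeout(0.0), Duration::from_secs(15));
    }
}
```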

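Related

Two independent defects found during a follow-up investigation prompted by @aeoform's feedback in the comments on issue #420; they are not directly related to the main topic of #420 (Wayland CLI triggering).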


PR Type

Bug fix, Tests


Description

  • Fix local Qwen trailing audio loss

  • Replace fixed timeout with dynamic scaling

  • Add buffer duration reader for coordinator

  • Cover timeout rules with unit tests


Diagram Walkthrough

flowchart LR
  A["Local audio buffer"] --> B["LocalQwenAsr transcribe"]
  B --> C["Append 0.5s silence padding"]
  A --> D["Read buffered duration"]
  D --> E["Dynamic Qwen timeout"]
  E --> F["Coordinator end_session"]
  F --> G["Transcribe with timeout"]

File Walkthrough

Relevant files
Bug fix
local_provider.rs
Add silence padding and buffer duration                                   

openless-all/app/src-tauri/src/asr/local/local_provider.rs

  • Added buffer_duration_ms() to read queued audio length without
    consuming it.
  • Appended 0.5 seconds of silence before local Qwen transcription.
  • Kept returned duration_ms based on original audio only.
  • Documented why padding helps the C engine finish the last chunk.
+14/-1   
coordinator.rs
Add dynamic local Qwen timeout logic                                         

openless-all/app/src-tauri/src/coordinator.rs

  • Added local_qwen_transcribe_timeout(audio_secs) helper.
  • Uses max(15, ceil(audio_s × 0.6) + 10) for local Qwen ASR.
  • Added unit tests for short audio, long audio, rounding, and zero
    length.
  • Preserved existing timeout behavior for other ASR paths.
+47/-0   
dictation.rs
Use audio-based timeout during transcription                         

openless-all/app/src-tauri/src/coordinator/dictation.rs

  • Switched local Qwen transcription from fixed timeout to dynamic
    timeout.
  • Read buffered audio duration before calling transcribe().
  • Added runtime logging for audio length and computed timeout.
  • Updated timeout error handling to report dynamic values.
+15/-6   

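Fixes two independent defects together:

Defect 1 (primary, dropped content): transcribe_stream slices audio into
2s chunks internally; when the user releases the key immediately after
the last word, the recording buffer has no trailing silence, so the
final chunk (< 2s) gets no silence frames → the C engine does not close
it out → that chunk's transcription is discarded. Waiting 5-10 seconds
in silence before releasing works, because the trailing silence is
recorded into the buffer.

Fix: transcribe() in local_provider.rs converts the PCM to f32 and then
appends 0.5 seconds of silence (8000 f32 zeros @ 16kHz) as an
end-of-speech signal for the C engine. duration_ms is still computed
from the original buffer length; the padding is not counted.

Defect 2 (secondary, long-recording timeout):
COORDINATOR_GLOBAL_TIMEOUT_SECS is fixed at 15s; with a user RTF ≈ 0.3
(up to 0.5 on slow machines), a 60s recording needs ~18s to transcribe
and times out.

Fix: add local_qwen_transcribe_timeout(audio_secs) -> Duration with the
formula max(15, ceil(audio_s × 0.6) + 10); used only on the Local Qwen
path (Volcengine / Whisper / Bailian / Foundry / QA paths untouched).
Companion LocalQwenAsr::buffer_duration_ms() reads the audio duration
without consuming the buffer.

Adds 4 unit tests covering the formula: short-recording fallback to 15s,
linear scaling for long recordings, ceil on fractional seconds, and the
0-second boundary. All 39 coordinator tests pass.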
@github-actions

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🎫 Ticket compliance analysis ❌

420 - Not compliant

Non-compliant requirements:

  • Global recording shortcut support on Debian Wayland.
  • Script/command guidance for configuring the shortcut in system settings.
  • End-to-end behavior of the shortcut in the global desktop environment.

Requires further human verification:

  • Runtime validation on an actual Debian Wayland desktop environment.
⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ No major issues detected

@appergb merged commit 1ab0807 into beta on May 13, 2026
4 checks passed
@appergb deleted the fix/qwen-asr-long-audio-loss branch on May 13, 2026 15:08
appergb pushed a commit that referenced this pull request on May 13, 2026
- PR #433: complete the Wayland callout with --toggle-qa / --cancel-dictation,
  listing the three commands side by side, with translations synced across five languages
- PR #434: trailing-silence padding for local Qwen3-ASR fixes the dropped final chunk;
  dynamic timeout max(15, ceil(audio_s × 0.6) + 10) fixes long-recording timeouts