fix(asr): Drop Whisper hallucinated segments using verbose_json metadata (OpenAI/Groq only) by katanumahotori · Pull Request #572 · Open-Less/openless

katanumahotori · 2026-06-01T05:22:14Z

User description

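Problem

On silent, faint, or noisy audio, Whisper generates plausible-sounding text that the user never said (a known hallucination defect). Silence before and after a recording, and the microphone noise floor, are often transcribed as unrelated words that pollute the final result. Currently transcribe_chunk takes json["text"] directly, with no filtering at all.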

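Approach

When the provider returns verbose_json, each segment carries no_speech_prob / avg_logprob / compression_ratio. Conservative thresholds drop segments that are clearly not speech:

no_speech_prob > 0.6 and avg_logprob < -0.5 (high silence probability + low confidence)
compression_ratio > 2.4 (repetitive hallucination; Whisper's standard threshold)
avg_logprob < -1.0 (extremely low confidence; noise turned into words)

Deleting real speech by mistake is the worst outcome, so the thresholds lean conservative. When the response has no segments, the code falls back to using text directly (same as the old behavior). When a metric field is missing, the segment is kept, so the filter is a harmless no-op for a provider that returns segments without the metrics.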
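The three drop rules and the two fallbacks can be sketched in plain Rust. This is a minimal illustration under stated assumptions, not the code from the PR: the `Segment` struct, the constant names, and the `is_hallucination` helper are hypothetical, the real function presumably reads these fields out of the parsed JSON response, and the choice to concatenate kept segment texts and to treat an empty segment list like a missing one is an assumption.

```rust
/// Hypothetical stand-in for one verbose_json segment, reduced to the
/// fields the filter reads. `None` models a metric the provider omitted.
#[derive(Debug, Clone)]
pub struct Segment {
    pub text: String,
    pub no_speech_prob: Option<f64>,
    pub avg_logprob: Option<f64>,
    pub compression_ratio: Option<f64>,
}

// Thresholds as listed in the PR description; the names are illustrative.
const NO_SPEECH_PROB_MAX: f64 = 0.6;
const LOW_LOGPROB: f64 = -0.5;
const VERY_LOW_LOGPROB: f64 = -1.0;
const COMPRESSION_RATIO_MAX: f64 = 2.4;

/// True when a segment trips any of the three drop rules.
/// A missing metric can never trip a rule, so it counts as "keep".
fn is_hallucination(seg: &Segment) -> bool {
    // Rule 1: likely silence AND low confidence (both metrics required).
    let silent_and_unsure = matches!(
        (seg.no_speech_prob, seg.avg_logprob),
        (Some(p), Some(lp)) if p > NO_SPEECH_PROB_MAX && lp < LOW_LOGPROB
    );
    // Rule 2: highly compressible text, i.e. repetitive output.
    let repetitive = seg
        .compression_ratio
        .map_or(false, |r| r > COMPRESSION_RATIO_MAX);
    // Rule 3: confidence so low that the text is probably noise.
    let very_unsure = seg.avg_logprob.map_or(false, |lp| lp < VERY_LOW_LOGPROB);
    silent_and_unsure || repetitive || very_unsure
}

/// Returns the text of the surviving segments, or the plain `text`
/// field when the response carried no segments.
pub fn extract_confident_text(text: &str, segments: Option<&[Segment]>) -> String {
    match segments {
        // Assumption: an empty list is handled like a missing one.
        Some(segs) if !segs.is_empty() => {
            let kept: String = segs
                .iter()
                .filter(|s| !is_hallucination(s))
                .map(|s| s.text.as_str())
                .collect();
            kept.trim().to_string()
        }
        _ => text.trim().to_string(),
    }
}
```

Note that rule 1 needs both metrics to be present and bad: a segment with a high no_speech_prob but a healthy avg_logprob is kept, which matches the stated preference for keeping real speech over trimming noise.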

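Provider gating (key)

verbose_json is enabled only for providers that are confirmed to support it and to benefit from it, to avoid breaking other backends:

| provider | Model | verbose_json | Handling |
| --- | --- | --- | --- |
| `whisper` (OpenAI) | Whisper | ✅ Full (includes the metrics above) | Enabled; filter is effective |
| `groq` | Whisper | ✅ Full (seek/avg_logprob/compression_ratio/no_speech_prob) | Enabled; filter is effective |
| `zhipu` (GLM-ASR) | GLM-ASR | Accepts the value but does not emit the metrics above | Stays on json (filter would be a no-op; minimizes behavior change) |
| `siliconflow` | SenseVoice / TeleSpeech | `response_format` not documented | Stays on json (avoids a 4xx from an unknown parameter) |

Basis: the current OpenAI and Groq docs both explicitly state that verbose_json returns the segment metrics above; the transcription endpoint in the SiliconFlow docs has no response_format parameter, and its models are SenseVoice/TeleSpeech; GLM-ASR accepts verbose_json but its segments have a different shape.

whisper_supports_verbose_json(provider_id) decides whether to enable it; WhisperBatchASR gains a verbose_json bool parameter. When enabled, temperature is also pinned to 0 (transcription is a deterministic task).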
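The gate itself is small enough to sketch. The function name and the provider IDs come from the PR description; the body is an assumption about how such an allow-list might be written. `extra_form_fields` is a hypothetical helper, not part of the PR, showing how the flag could translate into the two extra request fields while leaving the request untouched for providers that do not opt in.

```rust
/// Allow-list gate: only providers documented to return the segment
/// metrics opt in. The body is a sketch; only the name and the
/// provider IDs are taken from the PR description.
pub fn whisper_supports_verbose_json(provider_id: &str) -> bool {
    matches!(provider_id, "whisper" | "groq")
}

/// Hypothetical helper: the extra form fields a transcription request
/// would gain when the flag is on. When the flag is off nothing is
/// added, so the request stays exactly as it was before.
pub fn extra_form_fields(verbose_json: bool) -> Vec<(&'static str, &'static str)> {
    if verbose_json {
        vec![("response_format", "verbose_json"), ("temperature", "0")]
    } else {
        Vec::new()
    }
}
```

An allow-list fails closed: a provider ID added later gets no new parameters until someone confirms its documentation, which is the same reasoning the table applies to siliconflow.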

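Tests

extract_confident_text: drops hallucinated segments / keeps confident segments / falls back to text when there are no segments / keeps segments whose metrics are missing.
whisper_supports_verbose_json: true only for whisper/groq; false for siliconflow/zhipu.
cargo check --lib --tests passes.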

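Platform / compatibility

Only transcribe_chunk and the constructor parameters change. Behavior is completely unchanged for providers that do not opt in.

The fork maintainer ran into the hallucination problem during real-world use in a Japanese-language environment. SiliconFlow could not be tested locally (no credentials), so it is gated conservatively according to its docs and its behavior is left unchanged. If the naming or thresholds need adjusting, please say so directly.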

PR Type

Bug fix, Tests

Description

Add verbose_json support to filter hallucinated segments via metadata (no_speech_prob, avg_logprob, compression_ratio)
Gate the feature to only OpenAI/Groq to avoid breaking other providers
Add extract_confident_text function with conservative thresholds
Add unit tests for the new function and provider gating

File Walkthrough

Relevant files

Enhancement

whisper.rs `Add verbose_json hallucination filter and tests` openless-all/app/src-tauri/src/asr/whisper.rs +138/-3
- Added `verbose_json` boolean field to `WhisperBatchASR`
- Modified `transcribe` to conditionally request `response_format=verbose_json` and use `extract_confident_text` filter
- Added `extract_confident_text` function to drop hallucinated segments using thresholds (no_speech_prob, avg_logprob, compression_ratio)
- Added unit tests for the filtering logic

coordinator.rs `Gate verbose_json support to whisper/groq providers` openless-all/app/src-tauri/src/coordinator.rs +25/-0
- Added `whisper_supports_verbose_json` function to gate the feature only to providers "whisper" and "groq"
- Modified `build_qa_asr_start` to pass the flag when constructing `WhisperBatchASR`
- Added unit test to verify provider gating

dictation.rs `Pass verbose_json flag in dictation session` openless-all/app/src-tauri/src/coordinator/dictation.rs +1/-0
- Modified `begin_session` to pass the `verbose_json` flag when creating `WhisperBatchASR`

…I/Groq only)

Whisper fabricates plausible-but-unspoken text on silence/noise (the classic hallucination defect): leading/trailing silence or mic hiss turns into unrelated words. When the provider returns verbose_json, each segment carries no_speech_prob / avg_logprob / compression_ratio; use them to drop segments that clearly aren't speech (conservative thresholds so real speech is never trimmed). No segments in the response → fall back to text.

Provider-gated to avoid breaking non-Whisper backends:
- whisper (OpenAI) / groq: native Whisper, verbose_json fully supported with the metrics above, so the filter is effective. Verified against both providers' current docs.
- siliconflow: SenseVoice / TeleSpeech, response_format is undocumented; sending verbose_json risks a 4xx, so it stays on the existing json path.
- zhipu (GLM-ASR): accepts verbose_json but does not emit those metrics (filter would be a no-op), so it also stays on json to minimize behavior change.

Only whisper/groq opt in. whisper_supports_verbose_json(provider_id) decides the flag; WhisperBatchASR gains a verbose_json bool. Missing metric fields are treated as "keep" so the filter is harmless for any provider that returns segments without them.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-01T05:23:07Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ No major issues detected

Aligns the fork with PR Open-Less#572: the Whisper hallucination filter only requests response_format=verbose_json for providers that return the metrics (whisper/groq). SiliconFlow (SenseVoice/TeleSpeech, no response_format) and zhipu (GLM-ASR, no metrics) keep the plain json path. Previously the fork always sent verbose_json, which was fine on Groq but would risk a 4xx if switched to SiliconFlow.

WhisperBatchASR gains a verbose_json bool; whisper_supports_verbose_json decides it at construction. strip_prompt_echo still runs on both paths.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions Bot added the Review effort 3/5 label Jun 1, 2026

H-Chris233 merged commit ad62936 into Open-Less:beta Jun 1, 2026
4 checks passed

H-Chris233 merged 1 commit into Open-Less:beta from katanumahotori:fix/whisper-hallucination-verbose-json
