Bias Whisper transcription to English with beam/VAD tuning and auto-language fallback #61

Merged
Gokias merged 1 commit into main from codex/enhance-audio-transcription-accuracy
Feb 24, 2026
Gokias commented Feb 24, 2026

Motivation

  • Improve transcription accuracy for speakers where the model drifts or guesses the wrong language by biasing toward English and strengthening decode settings.
  • Provide configurable knobs to tune accuracy vs CPU cost and to fall back to auto language when forced-English output is low-confidence.

Description

  • Add new environment-configurable settings: TRANSCRIBE_LANGUAGE (default en), TRANSCRIBE_BEAM_SIZE (default 5), and TRANSCRIBE_FALLBACK_AVG_LOGPROB (default -1.2).
  • For faster_whisper, set language=TRANSCRIBE_LANGUAGE and stronger decode options (task="transcribe", condition_on_previous_text, beam_size, and vad_parameters) and centralize segment collection in a _collect_segments helper.
  • Implement a two-pass fallback for faster_whisper: when the average log-prob of the forced-English pass is below the configured threshold, retry with language=None (auto-detect) and keep the retry's output only if its average log-prob is better.
  • For the whisper engine path, bias calls by passing language=TRANSCRIBE_LANGUAGE and task="transcribe".
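
The settings and the two-pass selection rule described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code merged in this PR: `Segment`, `load_settings`, `mean_logprob`, `transcribe_with_fallback`, and the `run_pass` callable are hypothetical names, and the "average log-prob" is assumed to be an unweighted mean of per-segment `avg_logprob` values. In practice `run_pass` would wrap the engine's transcribe call with the chosen `language`, `beam_size`, and VAD options.

```python
import math
import os
from dataclasses import dataclass
from typing import Callable, Iterable, List, Mapping, Optional, Tuple


@dataclass
class Segment:
    """Hypothetical stand-in for a decoded segment (only the fields used here)."""
    text: str
    avg_logprob: float


def load_settings(env: Mapping[str, str] = os.environ) -> Tuple[str, int, float]:
    """Read the three knobs named in the PR, using the defaults it lists."""
    language = env.get("TRANSCRIBE_LANGUAGE", "en")
    beam_size = int(env.get("TRANSCRIBE_BEAM_SIZE", "5"))
    threshold = float(env.get("TRANSCRIBE_FALLBACK_AVG_LOGPROB", "-1.2"))
    return language, beam_size, threshold


def mean_logprob(segments: List[Segment]) -> float:
    """Assumed scoring: unweighted mean of segment log-probs; empty output scores -inf."""
    if not segments:
        return -math.inf
    return sum(s.avg_logprob for s in segments) / len(segments)


def transcribe_with_fallback(
    run_pass: Callable[[Optional[str]], Iterable[Segment]],
    language: Optional[str],
    threshold: float,
) -> Tuple[List[Segment], Optional[str]]:
    """Run the forced-language pass, retrying with auto-detect only when it scores low."""
    forced = list(run_pass(language))
    forced_score = mean_logprob(forced)
    # No retry if the first pass was already auto-detect or is confident enough.
    if language is None or forced_score >= threshold:
        return forced, language
    auto = list(run_pass(None))  # language=None means auto-detect
    # Keep the fallback only when it is strictly better than the forced pass.
    if mean_logprob(auto) > forced_score:
        return auto, None
    return forced, language
```

With the default threshold of -1.2, a forced pass averaging -1.6 would trigger the retry, and the auto-detect result would replace it only if it averaged above -1.6. Each low-confidence clip costs a second decode, which is the accuracy vs CPU trade-off the threshold is meant to tune.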

Testing

  • Ran python -m py_compile poopbot.py to validate syntax and it completed successfully.

Codex Task

Gokias merged commit b37e68b into main on Feb 24, 2026