Fix English text incorrectly showing 'translated by omi' badge#5591
Merged
…ranslations Adds should_persist_translation() that checks if translated text differs from source and if detected language matches target. Returns False for no-op translations (e.g. English->English) that would incorrectly show "translated by omi" badge. Also normalizes locale tags in update_from_translate_response for consistent comparison. Fixes #5582 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
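The decision this commit describes can be sketched roughly as below. This is an illustration only, not the code in the PR: the `_collapse_whitespace` helper is hypothetical, and treating every unchanged text as a no-op (including when the detected language differs from the target) is an assumption, so the two language arguments are kept only to mirror the signature named in the PR.

```python
from typing import Optional


def _collapse_whitespace(text: Optional[str]) -> str:
    """Hypothetical helper: trim and collapse runs of whitespace."""
    return " ".join((text or "").split())


def should_persist_translation(
    source: Optional[str],
    translated: Optional[str],
    detected_lang: Optional[str],
    target_lang: str,
) -> bool:
    """Return True only when the API response looks like a real translation.

    Assumption in this sketch: unchanged text is never persisted, so
    detected_lang and target_lang do not affect the result here.
    """
    src = _collapse_whitespace(source)
    out = _collapse_whitespace(translated)

    # None or empty input: nothing worth persisting.
    if not src or not out:
        return False

    # Same text after whitespace normalization: a no-op translation.
    if src == out:
        return False

    # Text changed, so treat it as a real translation.
    return True
```

With this sketch, `should_persist_translation("Hey.", "Hey.", "en", "en")` is rejected, while `should_persist_translation("Bonjour", "Hello", "fr", "en")` is accepted.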
… persistence After translate API returns, checks should_persist_translation before creating Translation object. When translation is a no-op (same text, same language), skips persistence and prunes pending entry. Prevents "translated by omi" badge on English text that was unnecessarily sent to API. Fixes #5582 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
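The ordering described here (update the language cache, then apply the guard, then persist) might look like the following sketch. Everything in it is hypothetical scaffolding: `SegmentState`, `handle_translate_response`, the cache keyed by segment id, and the simple text-equality check standing in for `should_persist_translation` are assumptions made for illustration, not the structures used in `transcribe.py`.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class SegmentState:
    """Hypothetical in-memory state for one transcription session."""
    pending: Set[str] = field(default_factory=set)
    lang_cache: Dict[str, str] = field(default_factory=dict)
    translations: Dict[str, str] = field(default_factory=dict)


def handle_translate_response(
    state: SegmentState,
    segment_id: str,
    source_text: str,
    translated_text: str,
    detected_lang: Optional[str],
) -> bool:
    """Return True if a translation was stored, False if it was skipped."""
    # 1. The cache learns from the API response even when we later skip.
    if detected_lang:
        state.lang_cache[segment_id] = detected_lang

    # 2. Guard (stand-in for should_persist_translation): unchanged text
    #    is a no-op, so prune the pending entry and stop here.
    if " ".join(source_text.split()) == " ".join(translated_text.split()):
        state.pending.discard(segment_id)
        return False

    # 3. Real translation: persist it. Pending-entry handling on this path
    #    is outside the scope of the sketch.
    state.translations[segment_id] = translated_text
    return True
```

Placing the cache update before the guard means a skipped segment still teaches the cache its language, which is the property the commit message calls out.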
Tests: same-language no-op not persisted, unchanged text without detection not persisted, mixed-language translation with changed text still persisted. Fixes #5582 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tests for short phrases ("Transcription service."), numeric text ("123. 123."),
short greetings ("Hey."), whitespace normalization, real translations, None
handling, and empty text. Addresses reviewer feedback on test coverage.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tests the full flow: API returns same text with target lang detected, guard triggers early return, pending entry is pruned, no translation persisted, and lang cache still updated. Also tests real translation path persists correctly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Verifies should_persist_translation is imported and called before Translation creation in transcribe.py. Catches accidental removal of the guard since _translate_segment is a closure that cannot be imported for direct testing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
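Because a closure cannot be imported, one way to write such a check is to scan the module source as text. The sketch below is an assumption about how that could be done, not the test from the PR: `guard_precedes_persistence` is a hypothetical name, and it checks only that a call to the guard appears before the first `Translation(...)` construction, not the import itself.

```python
import re


def guard_precedes_persistence(source_code: str) -> bool:
    """True if should_persist_translation(...) is called before Translation(...)."""
    guard = re.search(r"\bshould_persist_translation\s*\(", source_code)
    create = re.search(r"\bTranslation\s*\(", source_code)
    if guard is None or create is None:
        return False
    # Compare positions in the file: the guard must come first.
    return guard.start() < create.start()


# Tiny stand-ins for the real module text.
GUARDED = (
    "if not should_persist_translation(src, out, detected, target):\n"
    "    return\n"
    "translation = Translation(lang=target, text=out)\n"
)
UNGUARDED = "translation = Translation(lang=target, text=out)\n"
```

In a real test, the string would come from reading the module file (for example the `backend/routers/transcribe.py` path listed under Files changed) rather than from an inline constant.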
Tests for long English text, question marks, case/locale variant tags (EN-us vs en-US), and unchanged text with mismatched detected language. Addresses tester coverage gaps. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
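The case and locale variants mentioned here suggest comparing tags by their base language. A minimal sketch of that idea follows. It assumes normalization means lowercasing and keeping only the primary subtag. The function names are hypothetical stand-ins and may not match the behavior of the PR's `_normalize_base_language()` helper.

```python
from typing import Optional


def normalize_base_language(tag: Optional[str]) -> Optional[str]:
    """Reduce a locale tag such as 'EN-us' or 'pt_BR' to its base language."""
    if not tag:
        return None
    # Accept both '-' and '_' separators, then keep the primary subtag.
    base = tag.strip().replace("_", "-").split("-")[0].lower()
    return base or None


def same_base_language(a: Optional[str], b: Optional[str]) -> bool:
    """True when both tags are present and share a base language."""
    base_a = normalize_base_language(a)
    base_b = normalize_base_language(b)
    return base_a is not None and base_a == base_b
```

Under these assumptions, `EN-us` and `en-US` compare as equal, while a missing detection never matches any target.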
Tests that short foreign words (Bonjour, Hola, Danke) and multiple short foreign sentences are correctly translated and persisted (not skipped). Also verifies langdetect pre-filter does NOT misdetect foreign text as English, ensuring foreign segments always reach the translate API. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Level 1 Live Test: Local dev backend + real Google Translate API
- Q1: Short foreign words (target=en): all translate correctly
- Q2: Multiple short foreign sentences: all translate correctly
- Bug fix: Short English no-op correctly skipped
- Control: Long English pre-filtered
Result: ALL PASS (86/86 unit tests + level 1 live test)
by AI for @beastoin
lgtm
Prod Verification (T+0)
Deploy confirmed: Cloud Run revision
Translation pipeline: healthy
Sample translate_summary logs (prod)
Result: T+0 clean. Translation pipeline operating normally post-deploy.
by AI for @beastoin
Glucksberg pushed a commit to Glucksberg/omi-local that referenced this pull request (Apr 28, 2026)
Summary
- Adds a `should_persist_translation()` guard that skips creating Translation objects when translated text is identical to source text (same-language no-op)
- Calls the guard in `_translate_segment()` after the API response: checks before persisting, prunes pending entry on skip
- Normalizes locale tags in `update_from_translate_response()` for robust comparison

Root cause
PR #5384 (translate cost optimization) removed the old `if translated_text == segment_text: return` guard. Combined with `langdetect` misdetecting short English text (e.g. "Transcription service." → fr), short segments bypass the lang cache and reach the API, which returns identical text that gets persisted with a translation badge.

Fix mechanism
`should_persist_translation(source, translated, detected_lang, target_lang)`:
Guard placement: after lang cache update (so the cache still learns from the API) but before Translation object creation.

Testing

Files changed
- `backend/utils/translation_cache.py`: `should_persist_translation()` + `_normalize_base_language()` helper
- `backend/routers/transcribe.py`: import + guard in `_translate_segment()`
- `backend/tests/unit/test_translation_optimization.py`: 18 new tests

Deployment
- `gh workflow run gcp_backend.yml -f environment=prod -f branch=main`
- `transcribe.py` and `translation_cache.py`

Closes #5582
by AI for @beastoin