
Fix English text incorrectly showing 'translated by omi' badge #5591

Merged
beastoin merged 8 commits into main from fix/translate-same-language-badge-5582 on Mar 13, 2026


@beastoin (Collaborator) commented Mar 13, 2026

Summary

  • Add should_persist_translation() guard that skips creating Translation objects when translated text is identical to source text (same-language no-op)
  • Wire the guard into _translate_segment() right after the API response: it runs before persistence and prunes the pending entry when a translation is skipped
  • Normalize locale tags consistently in update_from_translate_response() for robust comparison

Root cause

PR #5384 (translate cost optimization) removed the old early-return guard, if translated_text == segment_text: return. Meanwhile, langdetect misdetects short English text (e.g. "Transcription service." → fr), so short segments bypass the lang cache and reach the API. The API returns the text unchanged, and without the guard that identical text is persisted with a translation badge.

Fix mechanism

should_persist_translation(source, translated, detected_lang, target_lang):

  1. If normalized texts differ → persist (real translation happened)
  2. If texts are identical AND detected_lang base matches target → skip (confirmed no-op)
  3. If texts are identical AND no detection → skip (conservative: unchanged text = no-op)

Guard placement: after the lang cache update (so the cache still learns from the API's detection) but before the Translation object is created.
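The ordering can be illustrated with a minimal, self-contained sketch. Everything here is hypothetical: the real _translate_segment is a closure inside transcribe.py, so this stand-alone translate_segment takes the API call and the guard as callables, models the lang cache as a list of detected languages and the pending entries as a dict, and uses a toy Translation dataclass in place of the real model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Translation:
    """Toy stand-in for the persisted translation model (hypothetical)."""
    lang: str
    text: str


def translate_segment(
    segment_id: str,
    text: str,
    target_lang: str,
    call_api: Callable[[str, str], Tuple[str, Optional[str]]],
    should_persist: Callable[[str, str, Optional[str], str], bool],
    lang_cache: List[str],
    pending: Dict[str, str],
    store: List[Translation],
) -> Optional[Translation]:
    translated, detected_lang = call_api(text, target_lang)

    # 1. Update the (hypothetical) lang cache first, so it learns even from no-op responses.
    if detected_lang:
        lang_cache.append(detected_lang)

    # 2. Guard: skip same-language no-ops and prune their pending entry.
    if not should_persist(text, translated, detected_lang, target_lang):
        pending.pop(segment_id, None)
        return None

    # 3. Only a real translation reaches Translation creation.
    translation = Translation(lang=target_lang, text=translated)
    store.append(translation)
    return translation
```

Running the guard after step 1 is what lets the cache keep learning from API responses that are then discarded.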

Testing

  • 86/86 unit tests pass (18 new tests)
  • Level 1 live test: real Google Translate API — all pass (short foreign words translate correctly, short English no-ops skipped)

Files changed

  • backend/utils/translation_cache.py: should_persist_translation() + _normalize_base_language() helper
  • backend/routers/transcribe.py — import + guard in _translate_segment()
  • backend/tests/unit/test_translation_optimization.py — 18 new tests

Deployment

  • Type: Code-only change (no env vars, no config changes)
  • Action needed: Deploy Backend to Cloud Run (gh workflow run gcp_backend.yml -f environment=prod -f branch=main)
  • Cloud Run deploy rebuilds the image AND restarts backend-listen pods — no separate Helm upgrade needed
  • No pusher/diarizer/VAD deploy needed — change is only in transcribe.py and translation_cache.py

Closes #5582

by AI for @beastoin

beastoin and others added 3 commits March 13, 2026 07:59
…ranslations

Adds should_persist_translation() that checks if translated text differs from source
and if detected language matches target. Returns False for no-op translations (e.g.
English->English) that would incorrectly show "translated by omi" badge. Also
normalizes locale tags in update_from_translate_response for consistent comparison.

Fixes #5582

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… persistence

After translate API returns, checks should_persist_translation before creating
Translation object. When translation is a no-op (same text, same language),
skips persistence and prunes pending entry. Prevents "translated by omi" badge
on English text that was unnecessarily sent to API.

Fixes #5582

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tests: same-language no-op not persisted, unchanged text without detection
not persisted, mixed-language translation with changed text still persisted.

Fixes #5582

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@greptile-apps (bot, Contributor) left a comment

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

beastoin and others added 5 commits March 13, 2026 08:03
Tests for short phrases ("Transcription service."), numeric text ("123. 123."),
short greetings ("Hey."), whitespace normalization, real translations, None
handling, and empty text. Addresses reviewer feedback on test coverage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tests the full flow: API returns same text with target lang detected, guard
triggers early return, pending entry is pruned, no translation persisted, and
lang cache still updated. Also tests real translation path persists correctly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Verifies should_persist_translation is imported and called before Translation
creation in transcribe.py. Catches accidental removal of the guard since
_translate_segment is a closure that cannot be imported for direct testing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tests for long English text, question marks, case/locale variant tags
(EN-us vs en-US), and unchanged text with mismatched detected language.
Addresses tester coverage gaps.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tests that short foreign words (Bonjour, Hola, Danke) and multiple short
foreign sentences are correctly translated and persisted (not skipped).
Also verifies langdetect pre-filter does NOT misdetect foreign text as
English, ensuring foreign segments always reach the translate API.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@beastoin (Collaborator, Author)

Level 1 Live Test: Local dev backend + real Google Translate API

Ran TranslationService.translate_text_by_sentence() against the real Google Translate API with dev credentials, verifying the full pipeline: langdetect pre-filter → API call → should_persist_translation guard.

Q1: Short foreign words (target=en) — all translate correctly

| Source | API result | Detected | Persist | Status |
|---|---|---|---|---|
| "Bonjour." | "Good morning." | fr | True | PASS |
| "Hola." | "Hello." | es | True | PASS |
| "Danke." | "Thanks." | de | True | PASS |
| "Merci." | "THANKS." | fr | True | PASS |

Q2: Multiple short foreign sentences — all translate correctly

| Source | API result | Detected | Persist | Status |
|---|---|---|---|---|
| "Bonjour. Comment allez-vous?" | "Good morning. How are you doing?" | fr | True | PASS |
| "Hola. Gracias. Adiós." | "Hello. Thank you. Bye bye." | es | True | PASS |
| "Oui. Non. Merci beaucoup." | "Yes. Non. Thank you so much." | fr | True | PASS |

Bug fix: Short English no-op correctly skipped

| Source | API result | Detected | Persist | Status |
|---|---|---|---|---|
| "Transcription service." | "Transcription service." | en | False | PASS |
| "Hey." | "Hey." | en | False | PASS |
| "123. 123." | "123. 123." | en | False | PASS |
| "Hello? Can you hear?" | "Hello? Can you hear?" | en | False | PASS |

Control: Long English pre-filtered

| Source | Pre-filter | Persist | Status |
|---|---|---|---|
| "Two three four five. The weather is nice today." | skip=True | False | PASS |

Result: ALL PASS (86/86 unit tests + level 1 live test)

by AI for @beastoin

@beastoin (Collaborator, Author)

lgtm

@beastoin merged commit d78230b into main on Mar 13, 2026 (1 check passed)
@beastoin deleted the fix/translate-same-language-badge-5582 branch on March 13, 2026 at 08:54
@beastoin (Collaborator, Author)

Prod Verification (T+0)

Deploy confirmed: Cloud Run revision backend-00839-z94, backend-listen pods rolling out.

Translation pipeline: healthy

  • Active sessions with translation enabled processing normally
  • lang_cache_skips working (e.g. total=66, lang_skip=65, buffered=1 — English segments correctly skipped by pre-filter)
  • Real translations flowing (e.g. total=88, buffered=84, translated=84 — foreign segments correctly translated)
  • Translation API calls active with cache hits
  • Zero translation errors in last 15min post-deploy

Sample translate_summary logs (prod)

total=66 buffered=1 translated=1 lang_skip=65  (mostly English, 1 foreign segment translated)
total=23 buffered=6 translated=6 lang_skip=17  (mixed session, all foreign segments translated)
total=88 buffered=84 translated=84 lang_skip=4  (heavily foreign session, all translated)
total=3 buffered=0 translated=0 lang_skip=3    (all English, correctly skipped)
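In all four samples, total equals buffered plus lang_skip, and translated never exceeds buffered. The PR does not state this as an invariant, so treat it as an assumption, but a small parser along these lines (hypothetical, not part of the backend) makes that sanity check over translate_summary lines easy:

```python
from typing import Dict


def parse_translate_summary(line: str) -> Dict[str, int]:
    """Collect integer key=value fields; free-text notes such as '(mostly English)' are ignored."""
    fields: Dict[str, int] = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep and value.isdigit():
            fields[key] = int(value)
    return fields


def looks_consistent(fields: Dict[str, int]) -> bool:
    """Assumed sanity check: every segment is either buffered or lang-skipped,
    and no more segments are translated than were buffered."""
    return (
        fields["total"] == fields["buffered"] + fields["lang_skip"]
        and fields["translated"] <= fields["buffered"]
    )


line = "total=66 buffered=1 translated=1 lang_skip=65  (mostly English, 1 foreign segment translated)"
print(parse_translate_summary(line))  # {'total': 66, 'buffered': 1, 'translated': 1, 'lang_skip': 65}
```

The check uses translated <= buffered rather than equality because, with the new guard, a buffered segment can be skipped as a same-language no-op.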

Result: T+0 clean. Translation pipeline operating normally post-deploy.

by AI for @beastoin

Glucksberg pushed a commit to Glucksberg/omi-local that referenced this pull request Apr 28, 2026


Development

Successfully merging this pull request may close these issues.

English text incorrectly shows 'translated by omi' badge — missing same-language guard in translate pipeline
