## Problem
The `listen` WebSocket endpoint (`transcribe.py:1508`) calls the Google Cloud Translate API with four compounding inefficiencies that together inflate translation costs roughly 6–8x above necessary levels:

- **Detection double-billing (~43% of spend is redundant):** Every segment triggers an explicit `detect_language()` API call (`translation.py:191`), then `translate_text()` auto-detects the source language again for free (no `source_language_code` is set at line 298). Both are billed at the same per-character rate, so the detection call is pure waste.
- **Growing segments re-translated 5–10x:** The 0.6s processing loop (`transcribe.py:1414`) re-sends the full segment text on every tick as Deepgram adds words. Early words in a segment are billed 5–10 times as the segment grows.
- **No persistent cache:** The translation cache is an in-memory per-session `OrderedDict` (`translation.py:250`), capped at 1000 items and lost on disconnect. The detection cache is global in-memory and lost on pod restart. There is zero cross-user or cross-session reuse. Redis exists in the codebase for other features but is not used for translation.
- **Per-sentence API calls with comma splitting:** `split_into_sentences()` (line 244) splits on `.?!,`; commas create 3–5 tiny fragments per segment, each sent as a separate `translate_text()` call. The API supports `contents=[]` lists (line 299), but the code only sends one item per call.
## Code Flow

```
stream_transcript_process() [transcribe.py:1414]
└─ every 0.6s, for each updated segment:
   └─ translate(updated_segments) [line 1508]
      ├─ detect_language(text) [translation_cache.py:26]
      │  └─ _client.detect_language() ← PAID, redundant
      └─ translate_text_by_sentence(text) [translation.py:269]
         ├─ split_into_sentences() ← splits on commas
         └─ for each sentence:
            └─ _client.translate_text(contents=[text]) ← PAID, 1 item at a time
```
## Solution (4 changes, Codex-reviewed)
### Change 1: Eliminate the explicit detect_language API call

- Remove `_client.detect_language()` calls entirely
- Use `response.translations[0].detected_language_code` from the translate response (free)
- Replace the `translated_text == segment_text` check (line 1155) with `detected_language_code == target_language`, since Translate normalizes punctuation/spacing even for same-language text
- Impact: ~43% reduction (eliminates the entire detection cost line)
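The replacement check can be sketched as a small pure helper (a sketch with hypothetical names: the `Translation` dataclass stands in for the two fields of the translate response this change relies on, and `apply_translation` is not an existing function in the codebase):

```python
from dataclasses import dataclass


@dataclass
class Translation:
    # Stand-in for the two response fields the change relies on.
    translated_text: str
    detected_language_code: str


def apply_translation(segment_text: str, target_language: str, translation: Translation) -> str:
    """Return the text to display for a segment.

    Comparing translated_text == segment_text fails because Translate
    normalizes punctuation/spacing even for same-language text. Instead,
    trust the free detected_language_code from the translate response:
    if the segment is already in the target language, keep the original
    text untouched.
    """
    if translation.detected_language_code == target_language:
        return segment_text  # already in target language; skip the normalized rewrite
    return translation.translated_text
```

For example, `apply_translation("Hello, world", "en", Translation("Hello, world.", "en"))` keeps the original `"Hello, world"` rather than the normalized round-trip.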
### Change 2: Per-segment debounce queue

- Translate immediately on first segment appearance (zero UX delay)
- Debounce updates with a 0.8–1.2s trailing window before re-translating
- If STT marks a segment as "final," translate immediately (skip the debounce)
- Track `segment_version` / `last_text_hash` to prevent stale out-of-order responses
- Pattern: a per-segment tracking map, not an asyncio queue:
  `pending[segment_id] = {last_text, last_hash, last_update_at, task}`
- Impact: 60–80% reduction in translated characters
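A minimal sketch of the debounce decision, polled from the 0.6s loop (field and function names are illustrative, not from the codebase; the clock is passed in explicitly so the logic stays testable, where production code would use `time.monotonic()`):

```python
DEBOUNCE_S = 1.0  # trailing window, within the suggested 0.8–1.2s range

# Per-segment tracking map, as sketched above (illustrative field names):
# pending[segment_id] = {last_text, last_hash, last_update_at, translated_hash}
pending: dict[str, dict] = {}


def should_translate(segment_id: str, text: str, is_final: bool, now: float) -> bool:
    """Trailing-window debounce, evaluated on every processing tick."""
    h = hash(text)
    entry = pending.get(segment_id)
    if entry is None:
        # First appearance: translate immediately (zero UX delay).
        pending[segment_id] = {
            "last_text": text, "last_hash": h,
            "last_update_at": now, "translated_hash": h,
        }
        return True
    if h != entry["last_hash"]:
        # Segment grew: record the change and restart the trailing window.
        entry.update(last_text=text, last_hash=h, last_update_at=now)
    if entry["translated_hash"] == h:
        return False  # current text is already translated
    if is_final or (now - entry["last_update_at"]) >= DEBOUNCE_S:
        # Final segments skip the debounce; otherwise wait for quiescence.
        entry["translated_hash"] = h
        return True
    return False
```

Because the decision is a pure function of the tracking map, a growing segment is only re-sent once it has been quiet for the full window, while a "final" flag cuts through immediately.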
### Change 3: Sentence-level Redis cache

- Key: `translate:v1:{md5(sentence)}:{dest_lang}` → translated text
- TTL: 7–30 days, plus Redis LRU eviction
- Sentence granularity gives the best cache hit rate (intra-session growing segments plus cross-user common phrases)
- Store `detected_language_code` alongside the cached translation for ambiguous short text
- Impact: an additional 10–15% reduction on the remaining characters
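The key and value formats above can be sketched as two helpers (hypothetical names; per the issue these would live next to the existing `cache_signed_url`-style helpers in `backend/database/redis_db.py`):

```python
import hashlib
import json

CACHE_VERSION = "v1"          # versioned key format for future compatibility
TTL_SECONDS = 14 * 24 * 3600  # example TTL within the suggested 7–30 day range


def translation_cache_key(sentence: str, dest_lang: str) -> str:
    """Build the key translate:v1:{md5(sentence)}:{dest_lang}."""
    digest = hashlib.md5(sentence.encode("utf-8")).hexdigest()
    return f"translate:{CACHE_VERSION}:{digest}:{dest_lang}"


def cache_value(translated_text: str, detected_language_code: str) -> str:
    """Serialize the translation together with its detected source language,
    so ambiguous short text keeps its detection result across sessions."""
    return json.dumps({"text": translated_text, "detected": detected_language_code})
```

The actual write would then be a standard `SET key value EX TTL_SECONDS`, with Redis LRU eviction handling overflow.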
### Change 4: Fix sentence splitting + batch API calls

- Remove comma-based splitting; it creates tiny, ambiguous fragments with worse translation quality
- Split on `.?!` and newlines only, with a length-based fallback (max 200–300 chars)
- Batch all cache-miss sentences into one `contents=[miss1, miss2, ...]` call
- Impact: overhead/latency reduction and better translation quality
## Correctness Guardrails

- Use `detected_language_code == target_language` instead of text equality, since Translate normalizes punctuation/spacing/quotes
- Track the segment version to prevent out-of-order responses overwriting newer translations
- Version the cache keys (`translate:v1:...`) for future compatibility
- Mixed-language segments: auto-detect picks the dominant language (a pre-existing limitation, unchanged)
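The out-of-order guardrail can be sketched as a per-segment version counter (hypothetical names; in practice this state would live alongside the per-segment tracking map):

```python
# Hypothetical in-memory version map: one monotonically increasing
# counter per segment, bumped each time a translation request is sent.
latest_version: dict[str, int] = {}


def start_translation(segment_id: str) -> int:
    """Capture a version number when a translation request is dispatched."""
    latest_version[segment_id] = latest_version.get(segment_id, 0) + 1
    return latest_version[segment_id]


def accept_result(segment_id: str, request_version: int) -> bool:
    """Reject responses that arrive after a newer request was dispatched,
    so a stale response never overwrites a newer translation."""
    return request_version == latest_version.get(segment_id)
```

Each in-flight request carries the version it was started with; a slow response for an older version of the segment text is simply dropped on arrival.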
## Files to Modify

| File | Changes |
|---|---|
| `backend/utils/translation.py` | Remove detect API, add Redis cache, fix splitting, add batching |
| `backend/utils/translation_cache.py` | Simplify or remove (detection no longer needed) |
| `backend/routers/transcribe.py` | Add debounce queue in `translate()`, use `detected_language_code` |
| `backend/database/redis_db.py` | Add translation cache helpers (follows existing `cache_signed_url` pattern) |
## Estimated Combined Impact

| Change | Reduction |
|---|---|
| Eliminate detection | ~43% |
| Debounce queue | ~60–80% of remainder |
| Redis cache | ~10–15% of remainder |
| Total | ~84–89% |
Analysis by @Chen (RepoOps), Codex-reviewed. All code references verified against current main.