perf: eliminate redundant Translate API calls — detection double-billing, segment re-translation, no persistent cache #4651

@beastoin

Description

Problem

The listen WebSocket endpoint (transcribe.py:1508) calls the Google Cloud Translate API with four compounding inefficiencies that together inflate translation costs roughly 6–8x above necessary levels:

  1. Detection double-billing (~43% of spend is redundant): Every segment triggers an explicit detect_language() API call (translation.py:191), then translate_text() auto-detects the source language again for free (no source_language_code is set at line 298). Both are billed at the same per-character rate. The detection call is pure waste.

  2. Growing segments re-translated 5–10x: The 0.6s processing loop (transcribe.py:1414) re-sends the full segment text on every tick as Deepgram adds words. Early words in a segment are billed 5–10 times as the segment grows.

  3. No persistent cache: Translation cache is an in-memory per-session OrderedDict (translation.py:250), max 1000 items, lost on disconnect. Detection cache is global in-memory, lost on pod restart. Zero cross-user or cross-session reuse. Redis exists in the codebase for other features but is not used for translation.

  4. Per-sentence API calls with comma splitting: split_into_sentences() (line 244) splits on .?!, — commas create 3–5 tiny fragments per segment, each sent as a separate translate_text() call. The API supports contents=[] lists (line 299) but code only sends 1 item per call.

Code Flow

stream_transcript_process() [transcribe.py:1414]
  └─ every 0.6s, for each updated segment:
      └─ translate(updated_segments) [line 1508]
          ├─ detect_language(text) [translation_cache.py:26]
          │   └─ _client.detect_language() ← PAID, redundant
          └─ translate_text_by_sentence(text) [translation.py:269]
              ├─ split_into_sentences() ← splits on commas
              └─ for each sentence:
                  └─ _client.translate_text(contents=[text]) ← PAID, 1 item at a time

Solution (4 changes, Codex-reviewed)

Change 1: Eliminate explicit detect_language API

  • Remove _client.detect_language() calls entirely
  • Use response.translations[0].detected_language_code from translate response (free)
  • Replace translated_text == segment_text check (line 1155) with detected_language_code == target_language — Translate normalizes punctuation/spacing even for same-language text
  • Impact: ~43% reduction (eliminates entire detection cost line)
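A minimal sketch of the replacement check, assuming the detected code is read from `response.translations[0].detected_language_code` on the translate response; `should_keep_original` and `_base` are hypothetical helper names, not existing code:

```python
def _base(code: str) -> str:
    # Strip region subtags: "en-US" -> "en", "zh-CN" -> "zh".
    return code.lower().split("-")[0]


def should_keep_original(detected_language_code: str, target_language: str) -> bool:
    """Keep the original segment text when the source already matches the target.

    Replaces the old `translated_text == segment_text` comparison, which
    misfires because Translate normalizes punctuation, spacing, and quotes
    even for same-language input.
    """
    return _base(detected_language_code) == _base(target_language)
```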

Change 2: Per-segment debounce queue

  • Translate immediately on first segment appearance (zero UX delay)
  • Debounce updates with 0.8–1.2s trailing window before re-translating
  • If STT marks segment as "final," translate immediately (skip debounce)
  • Track segment_version / last_text_hash to prevent stale out-of-order responses
  • Pattern: per-segment tracking map, not asyncio queue:
    pending[segment_id] = {last_text, last_hash, last_update_at, task}
  • Impact: 60–80% reduction in translation characters
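The per-segment tracking map above could look like the following sketch. All names here are hypothetical, `translate_fn` stands in for the real translation call, and the debounce window is shortened for the demo (the issue suggests 0.8–1.2s):

```python
import asyncio
import hashlib

DEBOUNCE_S = 0.05  # shortened for this demo; production would use 0.8-1.2s


class SegmentDebouncer:
    """First appearance translates immediately; later growth is debounced;
    stale responses are dropped via a per-segment version counter."""

    def __init__(self, translate_fn):
        self.translate_fn = translate_fn  # async fn(text) -> translated text
        self.pending = {}                 # segment_id -> {last_hash, version, task}
        self.results = {}                 # segment_id -> latest translation

    @staticmethod
    def _hash(text):
        return hashlib.md5(text.encode("utf-8")).hexdigest()

    async def on_segment(self, segment_id, text, is_final=False):
        h = self._hash(text)
        state = self.pending.get(segment_id)
        if state is None:
            # First appearance: translate immediately (zero UX delay).
            self.pending[segment_id] = {"last_hash": h, "version": 1, "task": None}
            await self._translate(segment_id, text, version=1)
            return
        if h == state["last_hash"]:
            return  # text unchanged, nothing to do
        state["last_hash"] = h
        state["version"] += 1
        if state["task"] is not None:
            state["task"].cancel()  # restart the trailing window
        if is_final:
            # STT marked the segment final: skip the debounce.
            await self._translate(segment_id, text, state["version"])
        else:
            state["task"] = asyncio.create_task(
                self._debounced(segment_id, text, state["version"])
            )

    async def _debounced(self, segment_id, text, version):
        await asyncio.sleep(DEBOUNCE_S)
        await self._translate(segment_id, text, version)

    async def _translate(self, segment_id, text, version):
        translated = await self.translate_fn(text)
        # Guard against stale out-of-order responses overwriting newer text.
        if self.pending[segment_id]["version"] == version:
            self.results[segment_id] = translated
```

Note the map holds one in-flight task per segment rather than a shared asyncio queue, so cancelling a superseded update is a single `task.cancel()`.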

Change 3: Sentence-level Redis cache

  • Key: translate:v1:{md5(sentence)}:{dest_lang} → translated text
  • TTL: 7–30 days + Redis LRU eviction
  • Sentence granularity provides best cache hits (intra-session growing segments + cross-user common phrases)
  • Store detected_language_code alongside cached translation for ambiguous short text
  • Impact: additional 10–15% reduction on remaining characters
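A possible shape for the cache helpers, following the key scheme above. Function names are hypothetical; the client argument is duck-typed against redis-py's `get`/`setex`:

```python
import hashlib
import json

CACHE_TTL_S = 7 * 24 * 3600  # 7 days; tune anywhere in the 7-30 day range


def translation_cache_key(sentence: str, dest_lang: str) -> str:
    # translate:v1:{md5(sentence)}:{dest_lang}, versioned for future changes.
    digest = hashlib.md5(sentence.strip().encode("utf-8")).hexdigest()
    return f"translate:v1:{digest}:{dest_lang}"


def cache_get(r, sentence: str, dest_lang: str):
    raw = r.get(translation_cache_key(sentence, dest_lang))
    return json.loads(raw) if raw else None


def cache_set(r, sentence: str, dest_lang: str, translated: str,
              detected_language_code: str):
    # Store detected_language_code alongside the translation so short,
    # ambiguous sentences need no re-detection on a cache hit.
    payload = json.dumps({"t": translated, "src": detected_language_code})
    r.setex(translation_cache_key(sentence, dest_lang), CACHE_TTL_S, payload)
```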

Change 4: Fix sentence splitting + batch API calls

  • Remove comma-based splitting — creates tiny ambiguous fragments with worse translation quality
  • Split on .?! and newlines only, length-based fallback (max 200–300 chars)
  • Batch all cache-miss sentences into one contents=[miss1, miss2, ...] call
  • Impact: overhead/latency reduction, better translation quality
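A sketch of the revised splitting and batching, under the assumption that `translate_batch_fn` wraps a single `client.translate_text(contents=[...])` call; all names here are hypothetical:

```python
import re

MAX_SENTENCE_LEN = 250  # length fallback within the suggested 200-300 range


def split_into_sentences(text: str):
    # Split on sentence terminators (.?!) and newlines only; commas stay
    # inside their sentence so the translator keeps full context.
    parts = [p.strip() for p in re.split(r"(?<=[.?!])\s+|\n+", text) if p.strip()]
    out = []
    for part in parts:
        while len(part) > MAX_SENTENCE_LEN:
            # Length fallback: cut at the last space before the limit.
            cut = part.rfind(" ", 0, MAX_SENTENCE_LEN)
            cut = cut if cut > 0 else MAX_SENTENCE_LEN
            out.append(part[:cut].strip())
            part = part[cut:].strip()
        if part:
            out.append(part)
    return out


def translate_missing(sentences, cache, translate_batch_fn):
    # Send every cache miss in ONE batched call instead of one call per
    # sentence. The cache is keyed by sentence only for brevity; production
    # keys would also include dest_lang (see Change 3).
    misses = [s for s in sentences if s not in cache]
    if misses:
        for sentence, translated in zip(misses, translate_batch_fn(misses)):
            cache[sentence] = translated
    return [cache[s] for s in sentences]
```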

Correctness Guardrails

  1. Use detected_language_code == target_language instead of text equality — Translate normalizes punctuation/spacing/quotes
  2. Track segment version to prevent out-of-order responses overwriting newer translations
  3. Version cache keys (translate:v1:...) for future compatibility
  4. Mixed-language segments: auto-detect picks dominant language (pre-existing limitation, unchanged)

Files to Modify

| File | Changes |
| --- | --- |
| backend/utils/translation.py | Remove detect API, add Redis cache, fix splitting, add batching |
| backend/utils/translation_cache.py | Simplify or remove (detection no longer needed) |
| backend/routers/transcribe.py | Add debounce queue in translate(), use detected_language_code |
| backend/database/redis_db.py | Add translation cache helpers (follows existing cache_signed_url pattern) |

Estimated Combined Impact

| Change | Reduction |
| --- | --- |
| Eliminate detection (Change 1) | ~43% |
| Debounce queue (Change 2) | ~35–40% of remainder |
| Redis cache (Change 3) | ~10–15% of remainder |
| Total | ~84–89% |

Analysis by @Chen (RepoOps), Codex-reviewed. All code references verified against current main.
