perf: eliminate redundant Translate API calls — detection double-billing, segment re-translation, no persistent cache #4651

@beastoin

Description

Problem

The listen WebSocket endpoint (transcribe.py:1508) calls the Google Cloud Translate API with four compounding inefficiencies that together inflate translation costs roughly 6–8x above necessary levels:

  1. Detection double-billing (~43% of spend is redundant): Every segment triggers an explicit detect_language() API call (translation.py:191), then translate_text() auto-detects the source language again for free (no source_language_code is set at line 298). Both are billed at the same per-character rate. The detection call is pure waste.

  2. Growing segments re-translated 5–10x: The 0.6s processing loop (transcribe.py:1414) re-sends the full segment text on every tick as Deepgram adds words. Early words in a segment are billed 5–10 times as the segment grows.

  3. No persistent cache: Translation cache is an in-memory per-session OrderedDict (translation.py:250), max 1000 items, lost on disconnect. Detection cache is global in-memory, lost on pod restart. Zero cross-user or cross-session reuse. Redis exists in the codebase for other features but is not used for translation.

  4. Per-sentence API calls with comma splitting: split_into_sentences() (line 244) splits on .?!, — commas create 3–5 tiny fragments per segment, each sent as a separate translate_text() call. The API supports contents=[] lists (line 299) but code only sends 1 item per call.

Code Flow

stream_transcript_process() [transcribe.py:1414]
  └─ every 0.6s, for each updated segment:
      └─ translate(updated_segments) [line 1508]
          ├─ detect_language(text) [translation_cache.py:26]
          │   └─ _client.detect_language() ← PAID, redundant
          └─ translate_text_by_sentence(text) [translation.py:269]
              ├─ split_into_sentences() ← splits on commas
              └─ for each sentence:
                  └─ _client.translate_text(contents=[text]) ← PAID, 1 item at a time

Solution (4 changes, Codex-reviewed)

Change 1: Eliminate explicit detect_language API

  • Remove _client.detect_language() calls entirely
  • Use response.translations[0].detected_language_code from translate response (free)
  • Replace translated_text == segment_text check (line 1155) with detected_language_code == target_language — Translate normalizes punctuation/spacing even for same-language text
  • Impact: ~43% reduction (eliminates entire detection cost line)
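A minimal sketch of the replacement check, assuming the detected code is read from `response.translations[0].detected_language_code` on the translate response; `should_keep_original` and `_base` are hypothetical helper names, not existing code:

```python
def _base(code: str) -> str:
    # Strip region subtags: "en-US" -> "en", "zh-CN" -> "zh".
    return code.lower().split("-")[0]


def should_keep_original(detected_language_code: str, target_language: str) -> bool:
    """Keep the original segment text when the source already matches the target.

    Replaces the old `translated_text == segment_text` comparison, which
    misfires because Translate normalizes punctuation, spacing, and quotes
    even for same-language input.
    """
    return _base(detected_language_code) == _base(target_language)
```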

Change 2: Per-segment debounce queue

  • Translate immediately on first segment appearance (zero UX delay)
  • Debounce updates with 0.8–1.2s trailing window before re-translating
  • If STT marks segment as "final," translate immediately (skip debounce)
  • Track segment_version / last_text_hash to prevent stale out-of-order responses
  • Pattern: per-segment tracking map, not asyncio queue:
    pending[segment_id] = {last_text, last_hash, last_update_at, task}
  • Impact: 60–80% reduction in translation characters
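The per-segment tracking map above could look like the following sketch. All names here are hypothetical, `translate_fn` stands in for the real translation call, and the debounce window is shortened for the demo (the issue suggests 0.8–1.2s):

```python
import asyncio
import hashlib

DEBOUNCE_S = 0.05  # shortened for this demo; production would use 0.8-1.2s


class SegmentDebouncer:
    """First appearance translates immediately; later growth is debounced;
    stale responses are dropped via a per-segment version counter."""

    def __init__(self, translate_fn):
        self.translate_fn = translate_fn  # async fn(text) -> translated text
        self.pending = {}                 # segment_id -> {last_hash, version, task}
        self.results = {}                 # segment_id -> latest translation

    @staticmethod
    def _hash(text):
        return hashlib.md5(text.encode("utf-8")).hexdigest()

    async def on_segment(self, segment_id, text, is_final=False):
        h = self._hash(text)
        state = self.pending.get(segment_id)
        if state is None:
            # First appearance: translate immediately (zero UX delay).
            self.pending[segment_id] = {"last_hash": h, "version": 1, "task": None}
            await self._translate(segment_id, text, version=1)
            return
        if h == state["last_hash"]:
            return  # text unchanged, nothing to do
        state["last_hash"] = h
        state["version"] += 1
        if state["task"] is not None:
            state["task"].cancel()  # restart the trailing window
        if is_final:
            # STT marked the segment final: skip the debounce.
            await self._translate(segment_id, text, state["version"])
        else:
            state["task"] = asyncio.create_task(
                self._debounced(segment_id, text, state["version"])
            )

    async def _debounced(self, segment_id, text, version):
        await asyncio.sleep(DEBOUNCE_S)
        await self._translate(segment_id, text, version)

    async def _translate(self, segment_id, text, version):
        translated = await self.translate_fn(text)
        # Guard against stale out-of-order responses overwriting newer text.
        if self.pending[segment_id]["version"] == version:
            self.results[segment_id] = translated
```

Note the map holds one in-flight task per segment rather than a shared asyncio queue, so cancelling a superseded update is a single `task.cancel()`.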

Change 3: Sentence-level Redis cache

  • Key: translate:v1:{md5(sentence)}:{dest_lang} → translated text
  • TTL: 7–30 days + Redis LRU eviction
  • Sentence granularity provides best cache hits (intra-session growing segments + cross-user common phrases)
  • Store detected_language_code alongside cached translation for ambiguous short text
  • Impact: additional 10–15% reduction on remaining characters
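A possible shape for the cache helpers, following the key scheme above. Function names are hypothetical; the client argument is duck-typed against redis-py's `get`/`setex`:

```python
import hashlib
import json

CACHE_TTL_S = 7 * 24 * 3600  # 7 days; tune anywhere in the 7-30 day range


def translation_cache_key(sentence: str, dest_lang: str) -> str:
    # translate:v1:{md5(sentence)}:{dest_lang}, versioned for future changes.
    digest = hashlib.md5(sentence.strip().encode("utf-8")).hexdigest()
    return f"translate:v1:{digest}:{dest_lang}"


def cache_get(r, sentence: str, dest_lang: str):
    raw = r.get(translation_cache_key(sentence, dest_lang))
    return json.loads(raw) if raw else None


def cache_set(r, sentence: str, dest_lang: str, translated: str,
              detected_language_code: str):
    # Store detected_language_code alongside the translation so short,
    # ambiguous sentences need no re-detection on a cache hit.
    payload = json.dumps({"t": translated, "src": detected_language_code})
    r.setex(translation_cache_key(sentence, dest_lang), CACHE_TTL_S, payload)
```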

Change 4: Fix sentence splitting + batch API calls

  • Remove comma-based splitting — creates tiny ambiguous fragments with worse translation quality
  • Split on .?! and newlines only, length-based fallback (max 200–300 chars)
  • Batch all cache-miss sentences into one contents=[miss1, miss2, ...] call
  • Impact: overhead/latency reduction, better translation quality
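A sketch of the revised splitting and batching, under the assumption that `translate_batch_fn` wraps a single `client.translate_text(contents=[...])` call; all names here are hypothetical:

```python
import re

MAX_SENTENCE_LEN = 250  # length fallback within the suggested 200-300 range


def split_into_sentences(text: str):
    # Split on sentence terminators (.?!) and newlines only; commas stay
    # inside their sentence so the translator keeps full context.
    parts = [p.strip() for p in re.split(r"(?<=[.?!])\s+|\n+", text) if p.strip()]
    out = []
    for part in parts:
        while len(part) > MAX_SENTENCE_LEN:
            # Length fallback: cut at the last space before the limit.
            cut = part.rfind(" ", 0, MAX_SENTENCE_LEN)
            cut = cut if cut > 0 else MAX_SENTENCE_LEN
            out.append(part[:cut].strip())
            part = part[cut:].strip()
        if part:
            out.append(part)
    return out


def translate_missing(sentences, cache, translate_batch_fn):
    # Send every cache miss in ONE batched call instead of one call per
    # sentence. The cache is keyed by sentence only for brevity; production
    # keys would also include dest_lang (see Change 3).
    misses = [s for s in sentences if s not in cache]
    if misses:
        for sentence, translated in zip(misses, translate_batch_fn(misses)):
            cache[sentence] = translated
    return [cache[s] for s in sentences]
```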

Correctness Guardrails

  1. Use detected_language_code == target_language instead of text equality — Translate normalizes punctuation/spacing/quotes
  2. Track segment version to prevent out-of-order responses overwriting newer translations
  3. Version cache keys (translate:v1:...) for future compatibility
  4. Mixed-language segments: auto-detect picks dominant language (pre-existing limitation, unchanged)

Files to Modify

| File | Changes |
| --- | --- |
| backend/utils/translation.py | Remove detect API, add Redis cache, fix splitting, add batching |
| backend/utils/translation_cache.py | Simplify or remove (detection no longer needed) |
| backend/routers/transcribe.py | Add debounce queue in translate(), use detected_language_code |
| backend/database/redis_db.py | Add translation cache helpers (follows existing cache_signed_url pattern) |

Estimated Combined Impact

| Change | Reduction |
| --- | --- |
| Eliminate detection (Change 1) | ~43% |
| Debounce queue (Change 2) | ~35–40% of remainder |
| Redis cache (Change 3) | ~10–15% of remainder |
| Total | ~84–89% |

Analysis by @Chen (RepoOps), Codex-reviewed. All code references verified against current main.
