
Offline sync transcription missing user preferences, vocabulary, language, and speaker identification #6172

@beastoin

Description

Problem

Offline-synced recordings skip user transcription preferences entirely. The sync path in `backend/routers/sync.py` calls `deepgram_prerecorded()` with hardcoded defaults, while the live streaming path in `backend/routers/transcribe.py` correctly fetches and applies user preferences, vocabulary, language, speaker identification, and translation.

Consolidates: #6140 (diarization + vocabulary), #5907 (speaker diarization), #5912 (custom vocabulary)

Code Comparison

Sync Path (sync.py:711)

```python
words, language = deepgram_prerecorded(url, speakers_count=3, attempts=0, return_language=True)
```

- No user preferences fetched
- No vocabulary/keywords passed
- No language parameter
- No model selection based on language
- No speaker identification

Realtime Path (transcribe.py:297-306, 1011-1035)

```python
transcription_prefs = get_user_transcription_preferences(uid)
vocabulary = list({"Omi"} | set(transcription_prefs.get('vocabulary', [])))
single_language_mode = transcription_prefs.get('single_language_mode', False)
```

Plus:

- Language-based model selection
- Speech profile support
- Speaker embedding extraction and person matching
- Translation service
- Text-based speaker detection

Feature Gap Table

| Feature | Realtime | Sync | Impact |
| --- | --- | --- | --- |
| Custom vocabulary/keywords | Yes | No | Domain terms transcribed incorrectly |
| User language preference | Yes | No | Ignores the user's language setting |
| Single language mode | Yes | No | Always multi-language; accuracy loss |
| Language-based model selection | Yes | No (always nova-3) | Chinese/Thai get the wrong model |
| Speaker identification (embeddings) | Yes | No | Cannot identify who is speaking |
| Speaker-to-person mapping | Yes | No | All segments have `person_id=None` |
| Text-based speaker detection | Yes | No | Missed "I am X" name detection |
| Translation service | Yes | No | No conversation translation |
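The missing speaker-to-person mapping could be recovered in post-processing by matching per-segment voice embeddings against the user's stored people embeddings. A minimal sketch, assuming embeddings are plain float vectors; `match_speaker_to_person` and its threshold are hypothetical, not existing backend helpers:

```python
from math import sqrt


def _cosine(a, b):
    """Cosine similarity between two equal-length float vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def match_speaker_to_person(segment_embedding, people_embeddings, threshold=0.7):
    """Return the person_id whose stored voice embedding is closest to the
    segment's embedding, or None when nothing clears the threshold."""
    best_id, best_score = None, threshold
    for person_id, embedding in people_embeddings.items():
        score = _cosine(segment_embedding, embedding)
        if score > best_score:
            best_id, best_score = person_id, score
    return best_id
```

With this, sync-path segments could get a `person_id` instead of always `None`, at the cost of one extra pass over the transcript.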

Root Cause

`process_segment()` in `sync.py` never calls `get_user_transcription_preferences(uid)` and passes no user-specific parameters to `deepgram_prerecorded()`. The function signature of `deepgram_prerecorded()` also lacks a `keywords` parameter.

Key Files

  • `backend/routers/sync.py` — `process_segment()` at line 693
  • `backend/routers/transcribe.py` — `_stream_handler()` at line 219
  • `backend/utils/stt/pre_recorded.py` — `deepgram_prerecorded()` at line 109
  • `backend/database/users.py` — `get_user_transcription_preferences()` at line 995

Feasible Fixes (pre-recorded API compatible)

These features can be added to the pre-recorded API path:

  1. Fetch user preferences — call `get_user_transcription_preferences(uid)`
  2. Pass vocabulary/keywords — add `keywords` param to `deepgram_prerecorded()`, use `keyterm` for nova-3 / `keywords` for nova-2
  3. Pass language — respect user language preference, apply model selection via `get_deepgram_model_for_language()`
  4. Speaker identification — post-process: extract embeddings from Deepgram segments, match against user's people embeddings
  5. Translation — post-process: translate segments if user has language preference
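For fix 2, the model-dependent parameter name could be isolated in a small helper. A sketch, assuming (per the fix list above) that nova-3 takes `keyterm` and nova-2 takes `keywords`; `build_keyword_params` is a hypothetical name, not an existing function:

```python
def build_keyword_params(model: str, vocabulary: list) -> dict:
    """Map a user's custom vocabulary onto the query parameter the
    selected Deepgram model expects for pre-recorded transcription."""
    if not vocabulary:
        return {}
    if model.startswith("nova-3"):
        # nova-3 uses keyterm prompting
        return {"keyterm": vocabulary}
    # nova-2 and earlier use the keywords parameter
    return {"keywords": vocabulary}
```

`deepgram_prerecorded()` could call this once the `keywords` parameter from fix 2 is added to its signature.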

Not feasible (streaming API only): speech profiles/preseconds, VAD gating, multi-channel, onboarding mode.
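Fixes 1-3 could be combined into one preference-assembly step in `process_segment()`. A sketch under stated assumptions: the extended `deepgram_prerecorded()` signature does not exist yet, and the inline zh/th model mapping is a placeholder for the real `get_deepgram_model_for_language()` in `transcribe.py`:

```python
def build_sync_transcription_kwargs(prefs: dict) -> dict:
    """Assemble the user-specific arguments the sync path currently omits,
    mirroring the realtime path's use of transcription preferences."""
    language = prefs.get('language')  # None -> keep multi-language detection
    # Realtime always seeds the vocabulary with "Omi" (transcribe.py:298)
    vocabulary = sorted({"Omi"} | set(prefs.get('vocabulary', [])))
    # Placeholder mapping; the real helper is get_deepgram_model_for_language()
    model = "nova-2" if language in ("zh", "th") else "nova-3"
    return {
        "language": language,
        "model": model,
        "keywords": vocabulary,
        "single_language_mode": prefs.get('single_language_mode', False),
    }
```

The call site in `process_segment()` would then become something like `deepgram_prerecorded(url, speakers_count=3, attempts=0, return_language=True, **build_sync_transcription_kwargs(get_user_transcription_preferences(uid)))`, once `deepgram_prerecorded()` accepts those parameters.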

Ref


Filed by @kelvin-agent, 2026-03-30. Code analysis from backend at HEAD.

Metadata


Assignees

No one assigned

Labels

`bug` (Something isn't working) · `capture` (Layer: Audio recording, device pairing, BLE) · `maintainer` (Lane: High-risk, cross-system changes) · `p2` (Priority: Important, score 14-21)
