fix: add speaker embedding matching to offline sync (issue #5907) #5946
sungdark wants to merge 1 commit into BasedHardware:main
Conversation
Offline sync (sync_local_files / process_segment) was skipping the speaker identification pipeline, causing all transcribed segments to show generic 'SPEAKER_00', 'SPEAKER_01' labels instead of being matched against stored person embeddings. Live recording runs speaker_identification_task which calls get_speech_profile_matching_predictions to identify speakers from their voice embeddings. This fix adds the same call to process_segment after postprocess_words returns. Fixes BasedHardware#5907
Greptile Summary

This PR fixes issue #5907 by adding speaker embedding matching to the offline sync path. Key changes:
Minor issues found:
Confidence Score: 4/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant sync_local_files
    participant process_segment
    participant deepgram_prerecorded
    participant get_speech_profile_matching_predictions
    participant SpeechProfileAPI
    participant DB
    Client->>sync_local_files: POST /v1/sync-local-files (audio .bin files)
    sync_local_files->>sync_local_files: decode_files_to_wav (.bin → .wav)
    sync_local_files->>sync_local_files: retrieve_vad_segments (split into speech segments)
    sync_local_files->>process_segment: process each segmented .wav (thread)
    process_segment->>deepgram_prerecorded: transcribe via signed URL
    deepgram_prerecorded-->>process_segment: transcript_segments (SPEAKER_00, SPEAKER_01…)
    Note over process_segment: NEW: speaker embedding matching
    process_segment->>get_speech_profile_matching_predictions: uid + wav_path + segments
    get_speech_profile_matching_predictions->>SpeechProfileAPI: POST audio + segments
    SpeechProfileAPI-->>get_speech_profile_matching_predictions: [{is_user, person_id}, …]
    get_speech_profile_matching_predictions-->>process_segment: matches list
    process_segment->>process_segment: set seg.is_user / seg.person_id
    process_segment->>DB: store/merge conversation with identified speakers
```
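The SpeechProfileAPI exchange in the diagram might be sketched as below. The payload shape, field names, and helper names are assumptions for illustration, not taken from the codebase; the pad-with-defaults behavior mirrors the error path described in the review comments.

```python
import base64
import json

def build_matching_request(uid, wav_bytes, segments):
    # Hypothetical payload: base64-encoded audio plus per-segment timing.
    return json.dumps({
        'uid': uid,
        'audio_b64': base64.b64encode(wav_bytes).decode('ascii'),
        'segments': [{'start': s, 'end': e} for s, e in segments],
    })

def parse_matching_response(body, n_segments):
    # Pad with "unknown speaker" defaults and truncate, so the caller
    # always receives exactly one entry per transcript segment.
    try:
        matches = json.loads(body)
        assert isinstance(matches, list)
    except Exception:
        matches = []
    default = {'is_user': False, 'person_id': None}
    return (matches + [default] * n_segments)[:n_segments]
```

Normalizing the response length at the client boundary would also remove the bounds-check concern raised later in the review.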
Reviews (1): Last reviewed commit: "fix: add speaker embedding matching to o..."
```python
# Speaker identification: match segments against stored person embeddings
# This uses the same pipeline as live recording (speaker_identification_task)
try:
    wav_path = path.replace('.bin', '.wav')
```
Unnecessary `.bin` → `.wav` path substitution
path passed to process_segment is always a .wav file — it comes from segmented_paths, which are populated inside retrieve_vad_segments with paths like f'{path_dir}/{segment_timestamp}.wav'. The .replace('.bin', '.wav') call has no effect here (there is no .bin in the path), so wav_path is always equal to path.
The misleading substitution is a silent no-op today, but it implies that a .bin path might arrive here. If the call-site ever changes, the speaker matching step would silently try to open a file whose name was never transformed, causing the except block to swallow the error with no speaker data written.
```diff
- wav_path = path.replace('.bin', '.wav')
+ wav_path = path  # path is already a .wav segment from retrieve_vad_segments
```
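If the call-site could ever pass a non-`.wav` path, failing loudly beats a silent substitution that the `except` block would swallow. A sketch; the helper name is hypothetical:

```python
import os

def resolve_wav_path(path):
    # Fail fast instead of silently passing an untransformed path
    # into the speaker-matching step, where the error would be swallowed.
    if not path.endswith('.wav'):
        raise ValueError(f'expected a .wav segment, got {os.path.basename(path)}')
    return path
```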
```python
for i, seg in enumerate(transcript_segments):
    seg.is_user = matches[i]['is_user']
    seg.person_id = matches[i].get('person_id')
```
No bounds-check before indexing `matches`
get_speech_profile_matching_predictions returns [{'is_user': False, 'person_id': None}] * len(segments) on the error paths, but on a successful API response it simply returns whatever the remote service returned — there is no guarantee the length matches transcript_segments. If the response contains fewer items, matches[i] raises an IndexError; if it contains more, extra matches are silently ignored.
The current except Exception wrapper will catch the IndexError and log it, so this is not a crash, but it means speaker identification is completely skipped when the API returns even one fewer result than expected.
Consider guarding the loop or falling back to a safe default when lengths differ:
```python
for i, seg in enumerate(transcript_segments):
    if i < len(matches):
        seg.is_user = matches[i]['is_user']
        seg.person_id = matches[i].get('person_id')
```
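An equivalent guard can be sketched with `zip`, which stops at the shorter sequence and so avoids the explicit index check; `apply_matches` here is a hypothetical helper, with `SimpleNamespace` standing in for the real segment objects:

```python
from types import SimpleNamespace

def apply_matches(transcript_segments, matches):
    # zip() truncates to the shorter input, so a short matches list
    # leaves trailing segments at their defaults instead of raising IndexError.
    for seg, match in zip(transcript_segments, matches):
        seg.is_user = match['is_user']
        seg.person_id = match.get('person_id')
    return transcript_segments

# Two segments, but the API returned only one match.
segments = [SimpleNamespace(is_user=False, person_id=None) for _ in range(2)]
result = apply_matches(segments, [{'is_user': True, 'person_id': 'p1'}])
```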
AI PRs with low effort are not welcome here. Thank you. — by CTO
Hey @sungdark 👋 Thank you so much for taking the time to contribute to Omi! We truly appreciate you putting in the effort to submit this pull request. After careful review, we've decided not to merge this particular PR. Please don't take this personally — we genuinely try to merge as many contributions as possible, but sometimes we have to make tough calls based on:
Your contribution is still valuable to us, and we'd love to see you contribute again in the future! If you'd like feedback on how to improve this PR or want to discuss alternative approaches, please don't hesitate to reach out. Thank you for being part of the Omi community! 💜
Fix: Offline sync no speaker diarization (issue #5907)
Problem
Offline recording sync (sync_local_files / process_segment) was skipping the speaker identification pipeline, causing all transcribed segments to show generic 'SPEAKER_00', 'SPEAKER_01' labels instead of being matched against stored person embeddings. Live recording worked correctly because it runs speaker_identification_task which calls get_speech_profile_matching_predictions to identify speakers from their voice embeddings.
Solution
Added the same speaker embedding matching call to process_segment after postprocess_words returns. The get_speech_profile_matching_predictions function extracts speaker embeddings from the audio and matches them against stored person embeddings, setting is_user and person_id on each segment.
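A minimal sketch of that step, assuming the function and field names from the description above; the exact signature of `get_speech_profile_matching_predictions` and the `try`/`except` wrapper follow the review excerpts, with a length guard added:

```python
from types import SimpleNamespace

def run_speaker_matching(uid, wav_path, transcript_segments, get_matches):
    """Sketch of the step added after postprocess_words returns.

    get_matches stands in for get_speech_profile_matching_predictions;
    its exact signature in the codebase is an assumption here.
    """
    try:
        matches = get_matches(uid, wav_path, transcript_segments)
        for i, seg in enumerate(transcript_segments):
            if i < len(matches):  # tolerate short API responses
                seg.is_user = matches[i]['is_user']
                seg.person_id = matches[i].get('person_id')
    except Exception as e:
        # Best-effort: segments keep their generic SPEAKER_XX labels on failure.
        print(f'speaker matching skipped: {e}')
    return transcript_segments

# Fake matcher: marks only the first segment as the profile owner.
fake = lambda uid, path, segs: [{'is_user': True, 'person_id': None}]
segs = [SimpleNamespace(is_user=False, person_id=None) for _ in range(2)]
out = run_speaker_matching('u1', '/tmp/seg.wav', segs, fake)
```

Because the matching is wrapped in `try`/`except`, any API failure degrades to the pre-fix behavior (generic speaker labels) rather than aborting the sync.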
Changes
Testing
The fix follows the same pattern used in postprocess_conversation.py's _handle_segment_embedding_matching function and the speaker_identification_task in transcribe.py.
Closes #5907