Speaker embedding dimension mismatch: 512-dim v1/v2 embeddings tagged as v3 crash cdist comparison #6238

### Problem

Four users are generating ~1,197 errors/day from a `scipy.spatial.distance.cdist` dimension mismatch in speaker identification:

```
ValueError: XA and XB must have the same number of columns (i.e. feature dimension.)
```

Crash site: `compare_embeddings()` in `speaker_embedding.py` when comparing 256-dim (v3 model) vs 512-dim (v1/v2 model) embeddings.

### Root Cause (verified via Firestore query)

The v2→v3 speaker embedding migration (`speaker_sample_migration.py:186-190`) has a bug: when a contact has **no speech_samples** (0 audio files), it sets `speech_samples_version=3` but **leaves the old 512-dim embedding untouched**.

Verified on affected user `0hqtnbQeNR...`:

| Contact | emb_dim | version | speech_samples | Status |
|---------|---------|---------|----------------|--------|
| Kat | 512 | 3 | 0 | **BUG** — v3 tag, old embedding |
| John P. | 512 | NOT SET | 0 | Legacy, never migrated |
| Chris Bond | 256 | 3 | 1 | Correct — re-extracted |
| Vince | 256 | 3 | 1 | Correct — re-extracted |

The migration path is effectively `if not samples: update_version(3); return`: it bumps the version without clearing or re-extracting the old embedding. The same bug exists in the v1→v2 path (lines 73-77).
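
To make the failure mode concrete, here is a minimal sketch of the no-samples path before and after the fix. It models a contact as a plain dict; the function names and dict layout are hypothetical and are not the actual code in `speaker_sample_migration.py`. Only the field names `speech_samples_version`, `speaker_embedding`, and `speech_samples` come from this issue.

```python
from typing import Any, Dict

TARGET_VERSION = 3  # v3 embeddings are 256-dim, per the table above


def migrate_no_samples_buggy(contact: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical model of the reported bug: bump the version, keep the old embedding."""
    migrated = dict(contact)
    migrated["speech_samples_version"] = TARGET_VERSION
    return migrated


def migrate_no_samples_fixed(contact: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical model of the fix: bump the version and clear the stale embedding."""
    migrated = dict(contact)
    migrated["speech_samples_version"] = TARGET_VERSION
    migrated["speaker_embedding"] = None  # nothing to re-extract from, so drop it
    return migrated


# Illustrative contact with no audio samples and a 512-dim embedding.
# The starting version (2) is an assumption made for this example.
contact = {
    "name": "Kat",
    "speaker_embedding": [0.0] * 512,
    "speech_samples": [],
    "speech_samples_version": 2,
}

buggy = migrate_no_samples_buggy(contact)
fixed = migrate_no_samples_fixed(contact)
```

After the buggy path the contact claims v3 but still carries 512 values, which is the state shown for Kat in the table. After the fixed path there is no embedding left to compare against.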

### Fix

PR #6240:
1. `compare_embeddings()` returns 2.0 (max distance) on dimension mismatch — crash prevention
2. Migration clears `speaker_embedding` when no samples exist (both v1→v2 and v2→v3)
3. Hardening: `transcribe.py` and `sync.py` skip loading embeddings for contacts without speech_samples
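
The dimension guard in item 1 could look roughly like the sketch below. It is a standard-library stand-in, not the real `compare_embeddings()`: the function name, the signature, and the use of cosine distance are assumptions (the issue only states that the comparison goes through `cdist` and that 2.0 is the maximum distance).

```python
import math
from typing import Sequence

MAX_DISTANCE = 2.0  # value the fix returns on a dimension mismatch


def compare_embeddings_guarded(a: Sequence[float], b: Sequence[float]) -> float:
    """Return a cosine distance, or MAX_DISTANCE if the vectors cannot be compared."""
    # Guard: a 256-dim vs 512-dim pair would otherwise raise in the distance call.
    if len(a) != len(b):
        return MAX_DISTANCE

    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))

    # Assumption for this sketch: treat zero-length vectors as maximally distant.
    if norm_a == 0.0 or norm_b == 0.0:
        return MAX_DISTANCE

    return 1.0 - dot / (norm_a * norm_b)


mismatch = compare_embeddings_guarded([0.1] * 256, [0.1] * 512)
same = compare_embeddings_guarded([1.0, 0.0], [1.0, 0.0])
opposite = compare_embeddings_guarded([1.0, 0.0], [-1.0, 0.0])
```

Returning the maximum distance means a mismatched contact simply never matches, which prevents the crash but does not repair the stored data. That is why the migration fix in item 2 is still needed.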

### Files to Modify

- `backend/utils/stt/speaker_embedding.py` — dimension guard in `compare_embeddings()`
- `backend/utils/speaker_sample_migration.py` — clear stale embedding in no-samples path
- `backend/routers/transcribe.py` — skip loading embeddings without speech_samples
- `backend/routers/sync.py` — same hardening
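
As a rough illustration of the hardening in `transcribe.py` and `sync.py`, the loader can filter contacts before any embedding is used. The helper name and the dict layout below are hypothetical. The extra check for a missing embedding is an assumption added for this sketch and is not stated in the issue.

```python
from typing import Any, Dict, List


def contacts_with_usable_embeddings(contacts: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only contacts that have at least one speech sample and an embedding."""
    return [
        c
        for c in contacts
        if c.get("speech_samples") and c.get("speaker_embedding") is not None
    ]


# Contacts shaped like the rows in the table above (sample paths are placeholders).
contacts = [
    {"name": "Kat", "speaker_embedding": [0.0] * 512, "speech_samples": []},
    {"name": "John P.", "speaker_embedding": [0.0] * 512, "speech_samples": []},
    {"name": "Chris Bond", "speaker_embedding": [0.0] * 256, "speech_samples": ["sample_1"]},
    {"name": "Vince", "speaker_embedding": [0.0] * 256, "speech_samples": ["sample_1"]},
]

usable = contacts_with_usable_embeddings(contacts)
usable_names = [c["name"] for c in usable]
```

With the table's data, only the two re-extracted 256-dim contacts survive the filter, so the stale 512-dim embeddings never reach the comparison.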

---
_by AI for @beastoin_