Replace presecond speech profile trick with speaker embedding for user identification (#6117)

## Problem

The backend listen pipeline uses a "presecond speech profile trick" to identify the user's voice: it prepends 10s of the user's speech profile audio plus 5s of padding before the actual stream, then assumes that the speaker Deepgram labels `speaker == 0` is the user. This approach has four problems:

1. **Redundant**: speaker embedding matching (`_match_speaker_embedding()` in `transcribe.py:1809-1980`) already exists and can identify the user by voice biometrics
2. **Adds 15s startup delay**: the pipeline waits `SPEECH_PROFILE_STABILIZE_DELAY = 15` seconds before actual transcription begins
3. **Fragile**: assumes the speech profile audio will cause Deepgram to assign `speaker_id=0` to the user, which Deepgram does not guarantee
4. **Creates dual sockets**: opens `deepgram_socket` (with preseconds=15) and `deepgram_profile_socket` (with preseconds=0), doubling Deepgram API costs per session

## Current Architecture

### The Presecond Trick (to be removed)
```
User speech profile (10s WAV) → 5s padding → actual audio stream
                                              ↓
                              Deepgram sees speaker_id=0 as "the user"
                              Words with start < 15s are filtered out
```

**Key code locations:**
- `streaming.py:20-23` — Constants: `SPEECH_PROFILE_FIXED_DURATION=10`, `PADDING=5`, `STABILIZE_DELAY=15`
- `streaming.py:347-350` — `is_user = True if word.speaker == 0 and preseconds > 0`
- `transcribe.py:970-1150` — Dual socket creation, profile audio sending, stabilization wait
- `transcribe.py:1117-1146` — `send_initial_file_path()` sends profile audio to Deepgram
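
For readers unfamiliar with the trick, the filtering step in the diagram can be sketched roughly as below. This is an illustration only, not the code in `streaming.py`: the `Word` dataclass and `apply_presecond_trick()` are hypothetical names, and shifting timestamps back by `preseconds` is an assumption about how the offset is handled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """Hypothetical stand-in for one diarized word returned by the STT service."""
    text: str
    start: float  # seconds from the start of the audio sent on the socket
    end: float
    speaker: int
    is_user: bool = False


def apply_presecond_trick(words: List[Word], preseconds: int) -> List[Word]:
    """Drop words that fall inside the prepended profile audio and
    label speaker 0 as the user (the assumption this issue wants to remove)."""
    kept = []
    for w in words:
        if w.start < preseconds:
            continue  # word belongs to the profile audio or padding
        kept.append(
            Word(
                text=w.text,
                # Assumption: timestamps are shifted so the real stream starts at 0
                start=w.start - preseconds,
                end=w.end - preseconds,
                speaker=w.speaker,
                is_user=(w.speaker == 0 and preseconds > 0),
            )
        )
    return kept
```

The sketch makes the fragility visible: `is_user` depends only on which integer the diarizer happened to assign, so if the user ends up as speaker 1, every one of their words is mislabeled.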

### Speaker Embedding (already exists, should replace)
```
User's stored embedding (512-dim vector)
    ↓
Extract embedding from live audio segment → cosine distance comparison
    ↓
If distance < 0.45 → same speaker (user identified)
```

**Key code locations:**
- `utils/stt/speaker_embedding.py` — `extract_embedding_from_bytes()`, `compare_embeddings()`, `is_same_speaker()`
- `transcribe.py:1809-1980` — `_match_speaker_embedding()` already matches speakers in real-time
- `database/users.py:240-264` — Speaker embedding stored per person in Firestore
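
The comparison in the diagram can be sketched as below. This is not the implementation in `utils/stt/speaker_embedding.py`: `cosine_distance()` and `is_same_speaker_sketch()` are hypothetical names, and the sketch assumes cosine distance is defined as one minus cosine similarity. Only the 0.45 threshold comes from this issue.

```python
import math
from typing import Sequence

SAME_SPEAKER_THRESHOLD = 0.45  # threshold quoted in this issue


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity (assumed definition of the distance)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat a zero vector as matching nothing
    return 1.0 - dot / (norm_a * norm_b)


def is_same_speaker_sketch(
    a: Sequence[float],
    b: Sequence[float],
    threshold: float = SAME_SPEAKER_THRESHOLD,
) -> bool:
    """True when the two embeddings are closer than the threshold."""
    return cosine_distance(a, b) < threshold
```

In the real pipeline the inputs would be the 512-dimensional vectors described above; the toy version works on vectors of any matching length.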

## Proposed Solution

1. **Extract user's speech profile embedding** at session start (or use pre-computed embedding from Firestore)
2. **When Deepgram assigns speaker IDs**, extract a short audio segment for each new speaker_id
3. **Compare extracted embedding against user's profile embedding** using existing `is_same_speaker()` (threshold 0.45)
4. **If match → mark as user** (same as current `is_user=True` behavior)
5. **Remove presecond trick**: no more profile audio prepending, no dual sockets, no 15s delay
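
One possible shape for steps 1-4 is sketched below. Everything here is hypothetical: `UserSpeakerMatcher`, its `extract_embedding` callback, and the per-speaker cache are illustrative names and design choices, not existing code. A real implementation would presumably call the existing `extract_embedding_from_bytes()` and `is_same_speaker()` helpers rather than the inline distance function used here to keep the sketch self-contained.

```python
import math
from typing import Callable, Dict, Optional, Sequence

Embedding = Sequence[float]


def _cosine_distance(a: Embedding, b: Embedding) -> float:
    """1 - cosine similarity; stands in for the existing comparison helper."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 if norm == 0.0 else 1.0 - dot / norm


class UserSpeakerMatcher:
    """Decides, once per diarized speaker ID, whether that speaker is the user."""

    def __init__(
        self,
        user_embedding: Optional[Embedding],
        extract_embedding: Callable[[bytes], Embedding],
        threshold: float = 0.45,
    ) -> None:
        self._user_embedding = user_embedding  # None when no speech profile exists
        self._extract = extract_embedding      # hypothetical audio -> embedding hook
        self._threshold = threshold
        self._decisions: Dict[int, bool] = {}  # speaker_id -> is_user

    def is_user(self, speaker_id: int, segment_audio: bytes) -> bool:
        # Backward compatibility: without a profile nobody is marked as the user.
        if self._user_embedding is None:
            return False
        if speaker_id not in self._decisions:
            candidate = self._extract(segment_audio)
            distance = _cosine_distance(candidate, self._user_embedding)
            self._decisions[speaker_id] = distance < self._threshold
        return self._decisions[speaker_id]
```

Caching the first decision per speaker ID is a simplification. The first segment may be too short or too noisy for a reliable embedding, so an actual implementation might defer the decision until enough audio has accumulated, or re-check periodically.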

## Benefits

- **Eliminates 15s startup delay** — transcription begins immediately
- **Saves Deepgram API costs** — single socket instead of dual
- **More accurate** — voice biometric matching vs fragile speaker_id=0 assumption
- **Simpler code** — removes ~100 lines of presecond handling

## Acceptance Criteria

- [ ] User identification uses speaker embedding comparison, not presecond trick
- [ ] Speech profile audio is NOT prepended to the stream
- [ ] Only one Deepgram socket opened per session (not two)
- [ ] No 15s startup delay — transcription begins immediately
- [ ] `is_user` flag is set correctly based on embedding match
- [ ] Existing speaker embedding infrastructure reused (no new ML models)
- [ ] All existing tests pass
- [ ] Backward compatible — users without speech profiles still work (just no user identification)
