Desktop: remove client-side API keys, route STT + Gemini through backend #5393

## Problem

The desktop app (macOS) bundles vendor API keys (`DEEPGRAM_API_KEY`, `GEMINI_API_KEY`) in the app bundle's `.env` file and calls external APIs directly from the client:

- **Deepgram STT**: `TranscriptionService.swift` connects directly to `wss://api.deepgram.com/v1/listen` with the API key in the WebSocket auth header
- **Gemini**: `GeminiClient.swift` and `EmbeddingService.swift` call Google APIs with the key in URL query parameters (`?key=<KEY>`)

**Security risks:**
- Keys are extractable from the app bundle (`Contents/Resources/.env` — plain text)
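- Keys are visible to anyone who inspects the app's own traffic, e.g. through a TLS-intercepting proxy (auth headers, URL params); keys in URL query strings also tend to end up in logs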
- No per-user attribution, rate limiting, or revocation granularity
- Blast radius is the full vendor account: a single leaked key can run up charges against the shared billing account

**Architectural inconsistency:**
- Mobile app routes ALL audio through the Python backend's `/v4/listen` WebSocket — API keys stay server-side
- Desktop app bypasses the backend entirely for STT — keys ship in the client
- Desktop misses backend features: VAD gate (~75% Deepgram cost savings), speech profiles, speaker identification, unified billing/monitoring
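For readers unfamiliar with the VAD gate idea: the saving comes from forwarding only frames that contain speech to the STT vendor. The sketch below is a toy energy-threshold gate, not the logic in `backend/utils/stt/vad_gate.py` (which is not reproduced here). `frame_rms`, `gate_frames`, and the threshold value are hypothetical names chosen for illustration, and the frames are assumed to be 16-bit PCM in native byte order.

```python
import array
import math


def frame_rms(frame: bytes) -> float:
    """Root-mean-square amplitude of a frame of 16-bit PCM samples."""
    samples = array.array("h")
    samples.frombytes(frame)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def gate_frames(frames, threshold=500.0):
    """Yield only frames whose RMS energy reaches the (assumed) threshold.

    Frames below the threshold are dropped and never reach the STT vendor.
    """
    for frame in frames:
        if frame_rms(frame) >= threshold:
            yield frame
```

If one frame in four carries speech, only a quarter of the audio is forwarded. The actual saving depends on how much of a session is silence, and a production gate would typically use a trained VAD model plus padding around speech rather than a raw energy threshold.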

## Proposed Solution

### Phase 1: Route desktop STT through `/v4/listen`

The Python backend already has a fully-featured `/v4/listen` WebSocket endpoint with Firebase auth, used by all mobile clients. Desktop should use it too.

**Swift changes:**
- Replace direct Deepgram WebSocket connection in `TranscriptionService.swift` with a WebSocket connection to the backend's `/v4/listen` (or `/v4/web/listen` which supports first-message token auth)
- Remove `DEEPGRAM_API_KEY` from client-side `.env`
- Desktop gets VAD gate, speech profiles, speaker ID for free

**Backend changes:**
- May need minor adjustments to handle desktop audio format (16kHz stereo PCM vs mobile's opus/pcm8)
- Add `source=desktop` parameter for monitoring/billing segmentation
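A minimal sketch of what the two backend adjustments could look like, assuming the desktop sends interleaved 16-bit little-endian stereo PCM and that downmixing to mono is the adjustment we want. Both are assumptions: if the two channels carry different sources (e.g. mic and system audio), they may need to be handled separately instead. `normalize_source`, `downmix_stereo_to_mono`, and `ALLOWED_SOURCES` are hypothetical names, not existing backend functions.

```python
import array
import sys

# Hypothetical set of labels used for monitoring/billing segmentation.
ALLOWED_SOURCES = {"desktop", "mobile"}


def normalize_source(raw):
    """Return a known source label, or 'unknown' for anything unexpected."""
    value = (raw or "").strip().lower()
    return value if value in ALLOWED_SOURCES else "unknown"


def downmix_stereo_to_mono(pcm: bytes) -> bytes:
    """Average interleaved 16-bit little-endian stereo frames into mono."""
    if len(pcm) % 4 != 0:
        raise ValueError("expected whole stereo frames (4 bytes each)")
    samples = array.array("h")
    samples.frombytes(pcm)
    if sys.byteorder == "big":
        samples.byteswap()  # input is little-endian on the wire
    mono = array.array(
        "h",
        ((samples[i] + samples[i + 1]) // 2 for i in range(0, len(samples), 2)),
    )
    if sys.byteorder == "big":
        mono.byteswap()  # emit little-endian again
    return mono.tobytes()
```

Mapping unexpected `source` values to `"unknown"` keeps the metric label set bounded, so a misbehaving client cannot create arbitrary label values.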

### Phase 2: Route Gemini through backend endpoints

- Add backend API endpoints for the proactive assistant operations currently calling Gemini directly (embeddings, generation)
- Remove `GEMINI_API_KEY` from client-side `.env`
- Enables server-side rate limiting, cost tracking, prompt governance
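To make the rate-limiting point concrete, here is a sketch of a per-user token bucket that a Gemini proxy endpoint could consult before forwarding a request. It is in-memory and single-process, so a real deployment would likely need shared state (e.g. a cache service). `TokenBucket`, `PerUserLimiter`, and the capacity/refill numbers are hypothetical and not taken from the backend.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `refill_per_second`."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self.tokens = float(capacity)
        self.updated_at = clock()

    def allow(self, cost=1.0):
        # Refill based on elapsed time, capped at capacity.
        now = self.clock()
        elapsed = now - self.updated_at
        self.updated_at = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PerUserLimiter:
    """Keeps one bucket per authenticated user id."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self._buckets = {}

    def allow(self, uid):
        bucket = self._buckets.get(uid)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.refill_per_second, self.clock)
            self._buckets[uid] = bucket
        return bucket.allow()
```

An endpoint would call `limiter.allow(uid)` after Firebase auth and reject the request (for example with HTTP 429) when it returns `False`. None of this is possible while the key lives in the client, because the backend never sees the calls.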

### Phase 3: Decommission direct API paths

- Remove direct Deepgram/Gemini code paths from desktop app
- Remove `.env` bundling of vendor keys from build pipeline
- Add CI check to block shipping vendor API keys in app bundles
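One possible shape for the CI check, as a sketch: walk the built app bundle, look at any `.env`-style file, and report lines that assign a non-empty value to a known vendor key name. The function name `find_bundled_keys` and the choice to scan only files whose name starts with `.env` are assumptions. A stricter check could also scan binaries and plists for key-shaped strings.

```python
import re
from pathlib import Path

# Key names taken from this issue; extend as new vendors are added.
VENDOR_KEY_NAMES = ("DEEPGRAM_API_KEY", "GEMINI_API_KEY")

# Matches e.g. `GEMINI_API_KEY=abc` or `export GEMINI_API_KEY = "abc"`.
ASSIGNMENT = re.compile(
    r"^\s*(?:export\s+)?(" + "|".join(VENDOR_KEY_NAMES) + r")\s*=\s*(\S.*)$"
)


def find_bundled_keys(bundle_root):
    """Return (path, line_number, key_name) for each non-empty vendor key."""
    findings = []
    for path in sorted(Path(bundle_root).rglob("*")):
        if not path.is_file() or not path.name.startswith(".env"):
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for line_number, line in enumerate(text.splitlines(), start=1):
            match = ASSIGNMENT.match(line)
            # Ignore empty placeholders such as KEY= or KEY="".
            if match and match.group(2).strip("'\" "):
                findings.append((str(path), line_number, match.group(1)))
    return findings
```

A CI step would run this against the packaged `.app` directory after the build and fail the job if the returned list is non-empty.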

## Benefits

| | Current (direct) | Proposed (backend proxy) |
|---|---|---|
| API key exposure | Client-side, extractable | Server-side only |
| Cost visibility | Invisible to backend | Unified monitoring |
| VAD gate savings | Not available | ~75% Deepgram cost reduction |
| Speech profiles | Not available | Speaker identification |
| Rate limiting | None | Per-user/device/session |
| Key rotation | Requires app update | Server-side, instant |
| Provider flexibility | Hardcoded Deepgram | Backend can switch STT providers |

## Latency Consideration

Routing through the backend adds one network hop and therefore some latency. In practice, with persistent WebSocket connections and region colocation, the increase is modest relative to STT model inference and endpointing delays. It can be further mitigated with dedicated streaming workers and autoscaling, the same infrastructure mobile already uses.

## References

- `desktop/Desktop/Sources/TranscriptionService.swift` — direct Deepgram connection
- `desktop/Desktop/Sources/ProactiveAssistants/Core/GeminiClient.swift` — direct Gemini calls  
- `backend/routers/transcribe.py` — existing `/v4/listen` endpoint
- `backend/utils/stt/streaming.py` — server-side STT providers
- `backend/utils/stt/vad_gate.py` — VAD gate (active on mobile)

_by AI for @beastoin_
