Desktop: WebSocket transcription disconnects — 64K events, 269 users (75% of all errors) #6193

@beastoin

Problem

The WebSocket connection between the desktop app and the transcription backend drops frequently. Two related issues account for 75% of all desktop Sentry error events:

  • OMI-DESKTOP-3: TranscriptionService: Receive error — "Socket is not connected" (receive path)
  • OMI-DESKTOP-4: TranscriptionService: Send error — "Connection reset by peer" (send path)
  • Combined: 64,326 events across 269+ users in Sentry (org: omi-nk3, project: omi-desktop)

Both the send and receive paths fail, which suggests the current reconnection/backoff logic is insufficient.

Root Cause Analysis

Traced through TranscriptionService.swift:

1. Connection handshake race condition

connectWithAuth() marks isConnected = true via a hardcoded 0.5s delayed dispatch after webSocketTask?.resume(), regardless of whether the handshake has actually completed. Audio sent before the 500ms timer fires is silently dropped (guard isConnected else { return }). If the WebSocket handshake takes longer than 500ms, isConnected is set before the socket is open, and early audio is lost with no error.
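A minimal sketch of what delegate-driven open confirmation could look like, instead of the timer. `SocketOpenTracker` and its `onOpen` callback are hypothetical names for illustration and are not taken from TranscriptionService.swift; only the two `URLSessionWebSocketDelegate` methods are real Foundation API.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

/// Hypothetical helper: flips `isConnected` only when URLSession reports
/// that the WebSocket handshake finished, rather than after a fixed delay.
final class SocketOpenTracker: NSObject, URLSessionWebSocketDelegate, @unchecked Sendable {
    private let lock = NSLock()          // delegate callbacks may arrive on a background queue
    private var connected = false
    private let onOpen: () -> Void       // assumed hook, e.g. to start sending audio

    init(onOpen: @escaping () -> Void = {}) {
        self.onOpen = onOpen
        super.init()
    }

    var isConnected: Bool {
        lock.lock(); defer { lock.unlock() }
        return connected
    }

    private func setConnected(_ value: Bool) {
        lock.lock(); defer { lock.unlock() }
        connected = value
    }

    // Called by URLSession once the handshake has completed.
    func urlSession(_ session: URLSession,
                    webSocketTask: URLSessionWebSocketTask,
                    didOpenWithProtocol negotiatedProtocol: String?) {
        setConnected(true)
        onOpen()
    }

    // Called by URLSession when the peer closes the connection.
    func urlSession(_ session: URLSession,
                    webSocketTask: URLSessionWebSocketTask,
                    didCloseWith closeCode: URLSessionWebSocketTask.CloseCode,
                    reason: Data?) {
        setConnected(false)
    }
}
```

To use it, the tracker would be passed as the `delegate:` argument when creating the `URLSession` that owns the WebSocket task. Note that failures which never reach a close frame are reported through `urlSession(_:task:didCompleteWithError:)`, which this sketch does not handle.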

2. Max reconnect limit causes permanent failure

handleDisconnection() uses exponential backoff (2^n seconds, max 32s) but caps at 10 attempts. After 10 failures, the service gives up permanently — users lose transcription until app restart. During backend outages or network instability, this limit is easily hit.
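For illustration, a capped exponential backoff without an attempt limit could be computed as below. This is a sketch, not the existing handleDisconnection() code: the function name, the 1-second base, and attempt numbering starting at 0 are assumptions; the 60-second cap is the value suggested under Proposed Fix.

```swift
import Foundation

/// Hypothetical helper: delay before reconnect attempt `attempt` (0-based).
/// Grows as base * 2^attempt and is clamped to `cap`, so it can be called
/// indefinitely without ever giving up.
func reconnectDelay(attempt: Int,
                    base: TimeInterval = 1,
                    cap: TimeInterval = 60) -> TimeInterval {
    // Clamp the exponent so very large attempt counts cannot overflow.
    let exponent = min(max(attempt, 0), 16)
    return min(base * pow(2, Double(exponent)), cap)
}
```

With these defaults the delays run 1, 2, 4, 8, 16, 32 seconds and then stay at 60 seconds for every later attempt.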

3. Stale connection detection is too slow

  • Keepalive interval: 8 seconds
  • Watchdog check interval: 30 seconds
  • Stale threshold: 60 seconds

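A connection that silently dies goes undetected until the 60-second stale threshold is crossed; if staleness is only evaluated on the 30-second watchdog tick, detection can take up to roughly 90 seconds. All audio sent during that time is lost.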
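The effect of these intervals on detection time can be worked out with a small model. It assumes staleness is only evaluated when the watchdog fires, which may not match the actual implementation; `WatchdogConfig` is a hypothetical type used for the arithmetic only.

```swift
import Foundation

/// Hypothetical model of the watchdog timing described above.
struct WatchdogConfig {
    var checkInterval: TimeInterval   // how often the watchdog runs
    var staleThreshold: TimeInterval  // silence after which the connection counts as dead

    /// True when no traffic has been seen for at least `staleThreshold`.
    func isStale(lastActivity: Date, now: Date) -> Bool {
        now.timeIntervalSince(lastActivity) >= staleThreshold
    }

    /// Upper bound on detection time if staleness is only checked on ticks:
    /// the threshold can be crossed just after a tick, so one more full
    /// interval may pass before the next check notices it.
    var worstCaseDetectionDelay: TimeInterval {
        staleThreshold + checkInterval
    }
}

let current = WatchdogConfig(checkInterval: 30, staleThreshold: 60)
let proposed = WatchdogConfig(checkInterval: 15, staleThreshold: 30)
```

Under that assumption, `current.worstCaseDetectionDelay` is 90 seconds and `proposed.worstCaseDetectionDelay` is 45 seconds.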

4. Silent audio drops with no user feedback

sendAudio() returns silently when isConnected == false — no error callback, no user notification. The caller (PushToTalkManager) doesn't know audio was dropped.
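One way the drop could be made visible to the caller is sketched below. `AudioSender`, `onDrop`, and the injected `transmit` closure are hypothetical names, not the real TranscriptionService API; the point is only that the not-connected branch reports something instead of returning silently.

```swift
import Foundation

/// Hypothetical sender that reports dropped audio instead of discarding it silently.
final class AudioSender {
    var isConnected = false
    /// Assumed hook: called with the number of bytes dropped while disconnected.
    var onDrop: ((Int) -> Void)?
    private let transmit: (Data) -> Void  // stands in for the WebSocket send

    init(transmit: @escaping (Data) -> Void) {
        self.transmit = transmit
    }

    /// Returns true if the chunk was handed to the socket, false if it was dropped.
    @discardableResult
    func sendAudio(_ chunk: Data) -> Bool {
        guard isConnected else {
            onDrop?(chunk.count)
            return false
        }
        transmit(chunk)
        return true
    }
}
```

A caller such as PushToTalkManager could then check the return value or subscribe to `onDrop` to show a warning or trigger a retry.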

5. Proxy connection drops

In Backend-Rust/src/routes/proxy.rs, proxy_ws_bidirectional() terminates the client connection abruptly on any relay error, manifesting as "Connection reset by peer" on the client.

Proposed Fix

  1. Replace 500ms delay with proper WebSocket open confirmation — use URLSessionWebSocketTask delegate didOpenWithProtocol instead of hardcoded timer
  2. Remove or significantly increase max reconnect limit — use infinite reconnects with increasing backoff (cap at 60s) while app is active
  3. Reduce watchdog thresholds — check every 15s, stale threshold 30s
  4. Buffer audio during reconnection — queue audio chunks while disconnected, replay on reconnect (with TTL to avoid stale data)
  5. Surface connection state to UI — show indicator when transcription is disconnected so users know
  6. Add graceful close handling in proxy — send WebSocket close frame before terminating
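As a rough sketch of item 4, a bounded buffer with a TTL could look like the following. The type name, the `maxChunks` bound, and the explicit `now` parameters (used to keep the example deterministic) are assumptions for illustration; nothing here is taken from the existing code.

```swift
import Foundation

/// Hypothetical buffer for audio captured while the socket is down.
struct PendingAudioBuffer {
    private var chunks: [(data: Data, enqueuedAt: Date)] = []
    let ttl: TimeInterval   // chunks older than this are discarded on drain
    let maxChunks: Int      // hard bound so a long outage cannot grow memory without limit

    init(ttl: TimeInterval, maxChunks: Int) {
        self.ttl = ttl
        self.maxChunks = maxChunks
    }

    /// Queue a chunk; the oldest chunks are evicted once `maxChunks` is exceeded.
    mutating func enqueue(_ data: Data, at now: Date = Date()) {
        chunks.append((data: data, enqueuedAt: now))
        if chunks.count > maxChunks {
            chunks.removeFirst(chunks.count - maxChunks)
        }
    }

    /// Return the chunks still within the TTL, in arrival order, and empty the buffer.
    mutating func drain(at now: Date = Date()) -> [Data] {
        let fresh = chunks
            .filter { now.timeIntervalSince($0.enqueuedAt) <= ttl }
            .map { $0.data }
        chunks.removeAll()
        return fresh
    }
}
```

On reconnect, the service would call `drain()` and send the returned chunks before resuming live audio. Whether replaying several seconds of delayed audio is acceptable to the transcription backend is an open question this sketch does not answer.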

Key Files

  • desktop/Desktop/Sources/TranscriptionService.swift — connection lifecycle, keepalive, watchdog, send/receive
  • desktop/Backend-Rust/src/routes/proxy.rs — WebSocket proxy relay

by AI for @beastoin

Metadata

Labels

capture (Layer: Audio recording, device pairing, BLE)
p0 (Priority: Existential, score >=30)
