Problem
The desktop batch transcription fails with HTTP 413 ("Failed to buffer the request body: length limit exceeded") when the VAD gate accumulates long speech chunks:
- OMI-DESKTOP-10: 413 Payload Too Large
- 5,574 events across 70 users
- Sentry breadcrumb: VADGate batch speech chunk complete 3,208,532 bytes (50.1s) -> TranscriptionService batch transcribing -> 413 rejected
The VAD gate collects ~3.2MB stereo PCM audio with no size limit, but the backend (or its reverse proxy) rejects payloads above a threshold.
Root Cause Analysis
Traced through VADGateService.swift, TranscriptionService.swift, and AppState.swift:
1. VAD gate has no maximum chunk size
VADGateService.swift: Speech audio is accumulated in batchAudioBuffer (line 262) until 2+ seconds of silence (hangover timeout, line 207: batchHangoverMs = 2000). There is no maximum duration or size limit. A user speaking continuously for 50+ seconds produces a single 3.2MB chunk.
2. Stereo format doubles payload size
Audio format: stereo Int16 PCM at 16kHz = 64 KB/s. For 50.1 seconds: 50.1 x 16000 x 4 bytes = 3,206,400 bytes (~3.2 MB).
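The arithmetic above can be checked with a short sketch. The format values (16 kHz, Int16, stereo) come from the analysis; `pcmByteCount` is a hypothetical helper used only for illustration and is not a function in the codebase.

```swift
// Payload size for raw interleaved PCM, using the format described in the issue.
let sampleRate = 16_000      // frames per second
let bytesPerSample = 2       // Int16

/// Bytes needed for `seconds` of audio with the given channel count.
/// Hypothetical helper, not part of VADGateService or TranscriptionService.
func pcmByteCount(seconds: Double, channels: Int = 2) -> Int {
    let bytesPerSecond = sampleRate * channels * bytesPerSample
    return Int((seconds * Double(bytesPerSecond)).rounded())
}

print(pcmByteCount(seconds: 1))                  // 64000 bytes per second in stereo
print(pcmByteCount(seconds: 50.1))               // 3206400, the failing chunk
print(pcmByteCount(seconds: 50.1, channels: 1))  // 1603200, half the size in mono
```

The mono figure shows why the stereo format matters here: the same 50.1 seconds of speech would be about 1.6 MB as a single channel.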
3. Single HTTP POST with no chunking
TranscriptionService.batchTranscribeFull() (lines 639-737) sends the entire buffer as a single POST request with Content-Type: application/octet-stream to /v1/proxy/deepgram/v1/listen. There is no logic to split large audio into smaller chunks before upload.
4. Backend body size limit
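The 413 indicates a body size limit at the backend or its reverse proxy (nginx, GCP Cloud Load Balancer, or Cloud Run). The exact limit has not been confirmed. It is estimated to be between 1 and 5 MB, and the rejected 3.2 MB chunk shows it is below that size. Cloud Run accepts requests up to 32 MB, but an nginx proxy (client_max_body_size) or middleware may impose a tighter limit.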
5. No retry with smaller chunks
When a 413 is received, the client logs the error and throws TranscriptionError.invalidResponse. There is no fallback that splits the payload and retries, so the audio is lost.
Proposed Fix
Client-side (recommended primary fix)
- Add max chunk duration in VAD gate — cap at 30 seconds (1.92MB stereo). When buffer exceeds this, emit the chunk as complete and start a new accumulation
- Add chunk splitting in TranscriptionService — if audio exceeds a size threshold (e.g., 2MB), split into overlapping segments (with 1-2s overlap for context) and transcribe separately, then merge results
- Handle 413 gracefully — on 413 response, split the payload in half and retry each half
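The three client-side steps could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the actual implementation: `CappedBuffer`, `splitWithOverlap`, and `uploadWithSplitOn413` are hypothetical names, the `upload` closure stands in for the real POST and only returns a status code, and the real code would be asynchronous and would also need to merge the transcripts of overlapping segments.

```swift
import Foundation

// Assumed format, taken from the analysis: stereo Int16 PCM at 16 kHz.
let bytesPerFrame = 4                          // 2 channels x 2 bytes
let bytesPerSecond = 16_000 * bytesPerFrame    // 64,000

/// Hypothetical accumulator that emits a chunk as soon as `maxBytes` is reached.
struct CappedBuffer {
    let maxBytes: Int
    private var buffer = Data()

    init(maxBytes: Int) {
        precondition(maxBytes > 0, "cap must be positive")
        self.maxBytes = maxBytes
    }

    /// Appends audio and returns every chunk that became complete.
    mutating func append(_ audio: Data) -> [Data] {
        buffer.append(audio)
        var complete: [Data] = []
        while buffer.count >= maxBytes {
            complete.append(Data(buffer.prefix(maxBytes)))
            buffer = Data(buffer.dropFirst(maxBytes))
        }
        return complete
    }
}

/// Splits `audio` into segments of at most `maxBytes`, each overlapping the
/// previous one by `overlapBytes`. Sizes are rounded down to whole frames so
/// a left/right sample pair is never cut apart.
func splitWithOverlap(_ audio: Data, maxBytes: Int, overlapBytes: Int) -> [Data] {
    let maxLen = (maxBytes / bytesPerFrame) * bytesPerFrame
    let overlap = (overlapBytes / bytesPerFrame) * bytesPerFrame
    precondition(maxLen > overlap, "segment must be longer than the overlap")
    guard audio.count > maxLen else { return [audio] }

    let base = audio.startIndex
    var segments: [Data] = []
    var start = 0
    while start < audio.count {
        let end = min(start + maxLen, audio.count)
        segments.append(audio.subdata(in: (base + start)..<(base + end)))
        if end == audio.count { break }
        start = end - overlap   // step back so consecutive segments share context
    }
    return segments
}

/// Uploads `audio`; on HTTP 413 splits it in half on a frame boundary and
/// retries each half. Returns the pieces the server accepted (status 200).
func uploadWithSplitOn413(_ audio: Data,
                          minBytes: Int = bytesPerSecond,
                          upload: (Data) -> Int) -> [Data] {
    let status = upload(audio)
    guard status == 413, audio.count >= 2 * minBytes else {
        return status == 200 ? [audio] : []
    }
    let mid = (audio.count / 2 / bytesPerFrame) * bytesPerFrame
    let base = audio.startIndex
    let first = audio.subdata(in: base..<(base + mid))
    let second = audio.subdata(in: (base + mid)..<audio.endIndex)
    return uploadWithSplitOn413(first, minBytes: minBytes, upload: upload)
         + uploadWithSplitOn413(second, minBytes: minBytes, upload: upload)
}

// Usage with the 50.1 s chunk from the Sentry breadcrumb (size from the formula above).
let longChunk = Data(count: 3_206_400)

var gate = CappedBuffer(maxBytes: 30 * bytesPerSecond)
let capped = gate.append(longChunk)
print(capped.map { $0.count })     // [1920000]; the remainder stays buffered

let segments = splitWithOverlap(longChunk, maxBytes: 2_000_000, overlapBytes: bytesPerSecond)
print(segments.map { $0.count })   // [2000000, 1270400]

// Fake server that rejects anything above 2 MB, as a stand-in for the real limit.
let accepted = uploadWithSplitOn413(longChunk) { $0.count > 2_000_000 ? 413 : 200 }
print(accepted.map { $0.count })   // [1603200, 1603200]
```

Aligning every boundary to the 4-byte frame size matters for interleaved stereo: a split at an odd offset would swap or corrupt the channels in the following segment. The `minBytes` floor keeps the 413 retry from recursing indefinitely if the server rejects even very small payloads.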
Backend-side (defense in depth)
- Increase body size limit — if the proxy or middleware has a limit below 5MB, increase it to at least 10MB
- Add streaming upload support — accept chunked transfer encoding for large audio payloads
Key Files
desktop/Desktop/Sources/VADGateService.swift — lines 554-684 (batch audio accumulation, no size limit)
desktop/Desktop/Sources/TranscriptionService.swift — lines 639-737 (batchTranscribeFull, single POST)
desktop/Desktop/Sources/AppState.swift — lines 1434-1438 (batchTranscribeChunk caller)
by AI for @beastoin