Problem
/v1/sync-local-files on backend-sync returns 504 Gateway Timeout — 41 occurrences in 48h across 17 users. All failures hit 125-170s latency, exceeding the 120s TimeoutMiddleware cutoff.
Successful requests: P50=17s, max=55s. Failures: larger payloads (1-32MB), more speech (up to 487s per request).
Pipeline Trace
The endpoint (backend/routers/sync.py:756) runs 4 serial stages, all synchronous within an async def:
| Stage | Function | Parallelism | Estimated time (large payload) |
|---|---|---|---|
| 1. Decode | decode_files_to_wav() | Sequential | ~5s |
| 2. VAD | retrieve_vad_segments() via chunk_threads | 5 threads/chunk | 30-60s (hosted VAD API has 300s timeout) |
| 3. Transcription + LLM | process_segment() via chunk_threads | 5 threads/chunk | 50-120s (Deepgram 10-30s + LLM 15-30s per segment) |
| 4. Cleanup | _cleanup_files() | Sequential | <1s |
For a 487s-speech payload producing 8 segments:
- VAD: 1-2 chunks × 30s = 30-60s
- process_segment: 2 chunks × (Deepgram + LLM) = 50-120s
- Total: 80-180s → exceeds 120s timeout
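The serial structure above can be sketched in miniature (stage names come from the table; bodies are stubs with scaled-down sleeps, not the real implementations):

```python
import asyncio
import time

# Stubs standing in for the real stages; names match the trace above,
# timings are scaled down (prod values in comments).
def decode_files_to_wav():   time.sleep(0.05)  # ~5s in prod
def retrieve_vad_segments(): time.sleep(0.30)  # 30-60s in prod
def process_segment():       time.sleep(0.50)  # 50-120s in prod
def _cleanup_files():        time.sleep(0.01)  # <1s in prod

async def sync_local_files():
    # Stages run strictly one after another, so total latency is the sum
    # of all four: the same arithmetic as the 80-180s estimate above.
    for stage in (decode_files_to_wav, retrieve_vad_segments,
                  process_segment, _cleanup_files):
        await asyncio.to_thread(stage)  # offload so the event loop isn't blocked

start = time.monotonic()
asyncio.run(sync_local_files())
print(f"elapsed: {time.monotonic() - start:.2f}s")
```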
Key Observations
- Data is NOT lost: The 504 fires via asyncio.wait_for, which cancels the coroutine, but the sync threads continue running in the background. The conversation is eventually created/updated.
- The 02:36-02:57 UTC Mar 22 cluster (13 failures, 9 IPs in 21 min) suggests Deepgram/LLM contention under concurrent load amplifies latency.
- process_segment serializes Deepgram + LLM per segment: Each thread makes a network call (Deepgram), then LLM calls (process_conversation → get_transcript_structure + extract_action_items + folder assignment). No parallelism within a segment.
- VAD hosted API has 300s timeout (vad.py:34) — a single slow VAD response can consume half the budget.
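The "data is not lost" behavior can be reproduced in isolation: wait_for cancels the awaiting coroutine at its deadline, but a worker thread already started keeps running to completion. A self-contained sketch (not the production code, timeouts scaled down):

```python
import asyncio
import threading
import time

result = {}

def sync_worker():
    # Stand-in for the chunk_threads work: it outlives the request.
    time.sleep(0.3)
    result["conversation"] = "created"

worker = threading.Thread(target=sync_worker)

async def handler():
    worker.start()           # kicked off like chunk_threads in the endpoint
    await asyncio.sleep(10)  # coroutine still awaiting when the cutoff hits

async def main():
    try:
        # TimeoutMiddleware-style cutoff, scaled down from 120s to 0.1s
        await asyncio.wait_for(handler(), timeout=0.1)
    except asyncio.TimeoutError:
        pass                 # this is where the 504 goes back to the client

asyncio.run(main())
worker.join()                # ...but the thread finishes anyway
print(result)                # {'conversation': 'created'}
```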
Instrumentation PR
A PR adds sync_timing structured logs at each stage boundary — it will show exactly where time is spent per request in prod.
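The stage-boundary logging could look something like this (a hypothetical sketch; the actual PR's helper name and field names may differ):

```python
import contextlib
import json
import time

@contextlib.contextmanager
def sync_timing(stage: str, request_id: str):
    # Hypothetical helper: emits one structured log line per stage boundary.
    start = time.monotonic()
    try:
        yield
    finally:
        print(json.dumps({
            "event": "sync_timing",
            "request_id": request_id,
            "stage": stage,
            "elapsed_s": round(time.monotonic() - start, 3),
        }))

# Usage at one stage boundary of the endpoint:
with sync_timing("vad", request_id="req-123"):
    time.sleep(0.05)  # retrieve_vad_segments(...) would go here
```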
Potential Fixes (for discussion, not CTO-verified)
- Background processing: Return 202 Accepted immediately, process async, notify via FCM when done
- Per-endpoint timeout override: TimeoutMiddleware.methods_timeout already supports this — set /v1/sync-local-files to 300s
- Parallelize within segment: Run Deepgram and LLM in parallel where possible
- Streaming progress: SSE endpoint for client to poll status
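For the "parallelize within segment" option: after transcription, the three LLM calls are independent of each other and could overlap via asyncio.gather instead of running back-to-back. A sketch with stubbed calls (function names taken from the observations above; the real signatures and latencies are assumptions):

```python
import asyncio

# Stubs for the per-segment calls; real signatures/latencies are assumptions.
async def transcribe_deepgram(segment):   # 10-30s in prod
    await asyncio.sleep(0.1)
    return f"transcript of {segment}"

async def get_transcript_structure(t):    # independent LLM call
    await asyncio.sleep(0.1)
    return {"structure": t}

async def extract_action_items(t):        # independent LLM call
    await asyncio.sleep(0.1)
    return ["item"]

async def assign_folder(t):               # independent LLM call
    await asyncio.sleep(0.1)
    return "inbox"

async def process_segment(segment):
    # Transcription must finish first, but the three LLM calls can
    # overlap, cutting the per-segment LLM time to the slowest call.
    transcript = await transcribe_deepgram(segment)
    structure, items, folder = await asyncio.gather(
        get_transcript_structure(transcript),
        extract_action_items(transcript),
        assign_folder(transcript),
    )
    return structure, items, folder

print(asyncio.run(process_segment("seg-1")))
```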