Problem
/v1/sync-local-files on backend-sync returns 504 Gateway Timeout — 41 occurrences in 48h across 17 users. All failures hit 125-170s latency, exceeding the 120s TimeoutMiddleware cutoff.
Successful requests: P50=17s, max=55s. Failures: larger payloads (1-32MB), more speech (up to 487s per request).
Pipeline Trace
The endpoint (backend/routers/sync.py:756) runs 4 serial stages, all synchronous within an async def:
| Stage | Function | Parallelism | Estimated time (large payload) |
|---|---|---|---|
| 1. Decode | decode_files_to_wav() | Sequential | ~5s |
| 2. VAD | retrieve_vad_segments() via chunk_threads | 5 threads/chunk | 30-60s (hosted VAD API has 300s timeout) |
| 3. Transcription + LLM | process_segment() via chunk_threads | 5 threads/chunk | 50-120s (Deepgram 10-30s + LLM 15-30s per segment) |
| 4. Cleanup | _cleanup_files() | Sequential | <1s |
For a 487s-speech payload producing 8 segments:
- VAD: 1-2 chunks × 30s = 30-60s
- process_segment: 2 chunks × (Deepgram + LLM) = 50-120s
- Total: 80-180s → exceeds 120s timeout
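The serial structure above can be sketched in miniature (stage names come from the table; bodies are stubs with scaled-down sleeps, not the real implementations):

```python
import asyncio
import time

# Stubs standing in for the real stages; names match the trace above,
# timings are scaled down (prod values in comments).
def decode_files_to_wav():   time.sleep(0.05)  # ~5s in prod
def retrieve_vad_segments(): time.sleep(0.30)  # 30-60s in prod
def process_segment():       time.sleep(0.50)  # 50-120s in prod
def _cleanup_files():        time.sleep(0.01)  # <1s in prod

async def sync_local_files():
    # Stages run strictly one after another, so total latency is the sum
    # of all four: the same arithmetic as the 80-180s estimate above.
    for stage in (decode_files_to_wav, retrieve_vad_segments,
                  process_segment, _cleanup_files):
        await asyncio.to_thread(stage)  # offload so the event loop isn't blocked

start = time.monotonic()
asyncio.run(sync_local_files())
print(f"elapsed: {time.monotonic() - start:.2f}s")
```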
Key Observations
- Data is NOT lost: The 504 fires via asyncio.wait_for, which cancels the coroutine, but the sync threads continue running in the background. The conversation is eventually created/updated.
- The 02:36-02:57 UTC Mar 22 cluster (13 failures, 9 IPs in 21 min) suggests Deepgram/LLM contention under concurrent load amplifies latency.
- process_segment serializes Deepgram + LLM per segment: Each thread makes a network call (Deepgram), then LLM calls (process_conversation → get_transcript_structure + extract_action_items + folder assignment). No parallelism within a segment.
- VAD hosted API has 300s timeout (vad.py:34) — a single slow VAD response can consume half the budget.
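The "data is not lost" behavior can be reproduced in isolation: wait_for cancels the awaiting coroutine at its deadline, but a worker thread already started keeps running to completion. A self-contained sketch (not the production code, timeouts scaled down):

```python
import asyncio
import threading
import time

result = {}

def sync_worker():
    # Stand-in for the chunk_threads work: it outlives the request.
    time.sleep(0.3)
    result["conversation"] = "created"

worker = threading.Thread(target=sync_worker)

async def handler():
    worker.start()           # kicked off like chunk_threads in the endpoint
    await asyncio.sleep(10)  # coroutine still awaiting when the cutoff hits

async def main():
    try:
        # TimeoutMiddleware-style cutoff, scaled down from 120s to 0.1s
        await asyncio.wait_for(handler(), timeout=0.1)
    except asyncio.TimeoutError:
        pass                 # this is where the 504 goes back to the client

asyncio.run(main())
worker.join()                # ...but the thread finishes anyway
print(result)                # {'conversation': 'created'}
```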
Instrumentation PR
A PR adds sync_timing structured logs at each stage boundary — it will show exactly where time is spent per request in prod.
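The stage-boundary logging could look something like this (a hypothetical sketch; the actual PR's helper name and field names may differ):

```python
import contextlib
import json
import time

@contextlib.contextmanager
def sync_timing(stage: str, request_id: str):
    # Hypothetical helper: emits one structured log line per stage boundary.
    start = time.monotonic()
    try:
        yield
    finally:
        print(json.dumps({
            "event": "sync_timing",
            "request_id": request_id,
            "stage": stage,
            "elapsed_s": round(time.monotonic() - start, 3),
        }))

# Usage at one stage boundary of the endpoint:
with sync_timing("vad", request_id="req-123"):
    time.sleep(0.05)  # retrieve_vad_segments(...) would go here
```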
Potential Fixes (for discussion, not CTO-verified)
- Background processing: Return 202 Accepted immediately, process async, notify via FCM when done
- Per-endpoint timeout override: TimeoutMiddleware.methods_timeout already supports this — set /v1/sync-local-files to 300s
- Parallelize within segment: Run Deepgram and LLM in parallel where possible
- Streaming progress: SSE endpoint for client to poll status
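For the "parallelize within segment" option: after transcription, the three LLM calls are independent of each other and could overlap via asyncio.gather instead of running back-to-back. A sketch with stubbed calls (function names taken from the observations above; the real signatures and latencies are assumptions):

```python
import asyncio

# Stubs for the per-segment calls; real signatures/latencies are assumptions.
async def transcribe_deepgram(segment):   # 10-30s in prod
    await asyncio.sleep(0.1)
    return f"transcript of {segment}"

async def get_transcript_structure(t):    # independent LLM call
    await asyncio.sleep(0.1)
    return {"structure": t}

async def extract_action_items(t):        # independent LLM call
    await asyncio.sleep(0.1)
    return ["item"]

async def assign_folder(t):               # independent LLM call
    await asyncio.sleep(0.1)
    return "inbox"

async def process_segment(segment):
    # Transcription must finish first, but the three LLM calls can
    # overlap, cutting the per-segment LLM time to the slowest call.
    transcript = await transcribe_deepgram(segment)
    structure, items, folder = await asyncio.gather(
        get_transcript_structure(transcript),
        extract_action_items(transcript),
        assign_folder(transcript),
    )
    return structure, items, folder

print(asyncio.run(process_segment("seg-1")))
```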