backend: lower precache concurrency cap 4 → 2 to free storage_executor for sync #7526
…sync

The storage_executor pool (96 workers) is shared between the sync v2 pipeline and the playback/precache flows. Under load, precache fan-out can pin up to ~36 workers per instance (4 outer slots × up to 9 each for chunk downloads), starving the sync pipeline's GCS reads. Halving the per-process precache concurrency cap halves precache's worst-case storage footprint to ~18 workers per instance, leaving the sync pipeline more headroom on the shared pool. The cost is slower speculative cache warming (a one-off hit of a few seconds when a user opens a brand-new conversation before warming completes); the on-demand /urls playback path is unchanged.
Greptile Summary
Halves …
Confidence Score: 5/5
Safe to merge: the change is a single constant reduction with no logic alterations and no new code paths. The semaphore value is the only thing that changed. The acquire/release pattern in … No files require special attention.
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[precache_conversation_audio] -->|postprocess_executor.submit| B[_precache_all]
B -->|_PRECACHE_FILE_SEM.acquire max=2 was 4| C[storage_executor.submit _cache_single]
C --> D[get_or_create_merged_audio]
D -->|cache hit| E[return cached WAV bytes]
D -->|cache miss| F[download_audio_chunks_and_merge]
F -->|STORAGE_CHUNK_SEM x8 in-flight per call| G[storage_executor chunk downloads]
G --> H[GCS download + decode]
C -->|done_callback: _PRECACHE_FILE_SEM.release| B
```
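The two-level fan-out in the flowchart can be sketched with the standard library. This is a minimal illustration under stated assumptions, not the backend's code: precache_all, cache_single, download_chunk and PeakCounter are hypothetical names, time.sleep stands in for a GCS download, and the chunk limit is modeled as a per-call semaphore on the reading that STORAGE_CHUNK_SEM allows 8 in flight per call. The submitter acquires an outer slot before each submit and a done callback releases it, so at most file_cap × (1 + chunk_cap) tasks occupy the shared pool at once.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class PeakCounter:
    """Counts tasks running right now and remembers the highest count seen."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        return self

    def __exit__(self, *exc_info):
        with self._lock:
            self.active -= 1
        return False  # never swallow exceptions


def precache_all(names, storage_pool, file_cap, chunks_per_file, chunk_cap, counter):
    """Hypothetical two-level fan-out onto one shared pool, mirroring the flowchart."""
    file_sem = threading.BoundedSemaphore(file_cap)  # plays the role of _PRECACHE_FILE_SEM

    def download_chunk():
        with counter:
            time.sleep(0.01)  # stand-in for a GCS download + decode (assumption)

    def cache_single(name):
        with counter:  # the outer task itself occupies a pool worker while it waits
            chunk_sem = threading.BoundedSemaphore(chunk_cap)  # per-call in-flight limit
            chunk_futures = []
            for _ in range(chunks_per_file):
                chunk_sem.acquire()
                fut = storage_pool.submit(download_chunk)
                fut.add_done_callback(lambda _f: chunk_sem.release())
                chunk_futures.append(fut)
            for fut in chunk_futures:
                fut.result()  # wait for every chunk and re-raise any download error
        return name

    futures = []
    for name in names:
        file_sem.acquire()  # blocks the submitter once file_cap files are in flight
        fut = storage_pool.submit(cache_single, name)
        fut.add_done_callback(lambda _f: file_sem.release())  # like the done_callback release
        futures.append(fut)
    return [fut.result() for fut in futures]


if __name__ == "__main__":
    for cap in (4, 2):
        counter = PeakCounter()
        with ThreadPoolExecutor(max_workers=96) as pool:  # shared pool size from the PR
            precache_all([f"conv-{i}" for i in range(6)], pool, cap, 8, 8, counter)
        print(f"file_cap={cap}: peak {counter.peak} busy workers, bound {cap * (1 + 8)}")
```

Because each outer task holds a pool worker while it waits on its chunks, the shared pool needs more workers than there are outer slots; otherwise every worker can end up blocked on chunk downloads that never get a thread.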
Reviews (1): Last reviewed commit: "backend: lower _PRECACHE_FILE_SEM 4 → 2 ..."
Summary
Lower _PRECACHE_FILE_SEM from 4 → 2 in backend/utils/other/storage.py.

Why
The storage_executor (96 workers) is shared between the sync v2 pipeline and the playback/precache flows. Under load, precache fan-out can pin up to ~36 workers per instance (4 outer slots × up to 9 each for chunk downloads), starving the sync pipeline's GCS reads.

Recent prod data (48h, all services):
- caller=process_conversation: 7,914 events (37% of audio_merge work)
- caller=precache_endpoint: 8,430 events (40%)
- caller=sync_urls_first: 3,562 events (17%)
- caller=sync_urls_bg: 1,285 events (6%)

Storage pool saturation warnings on backend-sync were averaging 96% utilization with a peak queue depth of 67. Sync's decode_ms p95 was sitting at 252s, largely queue wait.

Halving the cap drops precache's worst-case storage footprint to ~18 workers per instance, leaving ~78 free for sync.
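The worst-case figures follow from a simple product. A sketch, assuming (as the flowchart suggests) that "up to 9" per slot means one storage worker for _cache_single itself plus 8 chunk downloads under STORAGE_CHUNK_SEM; the function name is illustrative, not from the codebase:

```python
POOL_WORKERS = 96     # storage_executor size stated in the PR
CHUNKS_IN_FLIGHT = 8  # STORAGE_CHUNK_SEM in-flight downloads per call (from the flowchart)


def worst_case_storage_workers(file_slots: int, chunks_in_flight: int = CHUNKS_IN_FLIGHT) -> int:
    """Upper bound on storage workers pinned by precache (hypothetical helper).

    Each outer slot holds one worker for _cache_single itself plus up to
    `chunks_in_flight` workers for its chunk downloads.
    """
    return file_slots * (1 + chunks_in_flight)


for slots in (4, 2):
    used = worst_case_storage_workers(slots)
    print(f"slots={slots}: precache up to {used} workers, {POOL_WORKERS - used} left for sync")
# slots=4: precache up to 36 workers, 60 left for sync
# slots=2: precache up to 18 workers, 78 left for sync
```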
Tradeoff
Speculative cache warming takes longer. Users see a one-off slowdown only if they open a brand-new conversation before warming finishes; on-demand /v1/sync/audio/{conv}/urls playback is unchanged.

Test plan
- executor_pool_health warnings on backend-sync for 30 min post-deploy: storage pool max_q should drop noticeably
- sync_v2 bg complete decode_ms p95: should improve toward p50
- /v2/sync-local-files or /v1/sync/audio/*/precache
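For the decode_ms check, "should improve toward p50" can be made concrete as a shrinking p95/p50 ratio. A minimal sketch using Python's statistics.quantiles, assuming the decode_ms values have already been exported from the sync_v2 bg complete log lines as lists of positive milliseconds (how to pull them is out of scope; the helper names are hypothetical):

```python
import statistics


def p50_p95(samples_ms):
    """Return (p50, p95) for a list of latency samples in milliseconds."""
    cuts = statistics.quantiles(samples_ms, n=100)  # 99 cut points; cuts[k - 1] is the k-th percentile
    return cuts[49], cuts[94]


def tail_improved(before_ms, after_ms):
    """True if p95 moved toward p50, i.e. the p95/p50 ratio shrank after the deploy.

    Assumes p50 is positive in both windows (latencies, not zero-filled gaps).
    """
    b50, b95 = p50_p95(before_ms)
    a50, a95 = p50_p95(after_ms)
    return a95 / a50 < b95 / b50
```

Comparing the ratio rather than raw p95 keeps the check meaningful if overall load differs between the before and after windows.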