
backend: lower precache concurrency cap 4 → 2 to free storage_executor for sync #7526

Merged
mdmohsin7 merged 1 commit into main from caleb/precache-sem-cap on May 28, 2026

Conversation

@mdmohsin7
Member

Summary

  • Halve _PRECACHE_FILE_SEM from 4 → 2 in backend/utils/other/storage.py.

Why

The storage_executor (96 workers) is shared between the sync v2 pipeline and the playback / precache flows. Under load, precache fan-out can pin up to ~36 workers per instance (4 outer slots × up to 9 each for chunk downloads), starving the sync pipeline's GCS reads.

Recent prod data (48h, all services):

  • caller=process_conversation: 7,914 events (37% of audio_merge work)
  • caller=precache_endpoint: 8,430 events (40%)
  • caller=sync_urls_first: 3,562 events (17%)
  • caller=sync_urls_bg: 1,285 events (6%)

Storage pool saturation warnings on backend-sync were averaging 96% utilization with peak queue depth 67. Sync's decode_ms p95 was sitting at 252s, largely queue wait.

Halving the cap drops precache's worst-case storage footprint to ~18 workers per instance, leaving ~78 free for sync.
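The worker-budget arithmetic above can be reproduced with a short sketch. The constant and function names below are hypothetical and exist only for illustration; the numbers (96 workers, up to 9 threads per precache slot, caps of 4 and 2) come from this description.

```python
# Illustrative only: these names are hypothetical, not taken from storage.py.
POOL_WORKERS = 96      # storage_executor size quoted in this PR
THREADS_PER_SLOT = 9   # worst-case threads one precache slot can pin


def precache_footprint(cap: int, threads_per_slot: int = THREADS_PER_SLOT) -> int:
    """Worst-case storage_executor workers held by precache at a given cap."""
    return cap * threads_per_slot


def sync_headroom(cap: int, pool_workers: int = POOL_WORKERS) -> int:
    """Workers guaranteed to remain for other callers at a given cap."""
    return pool_workers - precache_footprint(cap)


for cap in (4, 2):
    print(f"cap={cap}: precache <= {precache_footprint(cap)}, "
          f"left for sync >= {sync_headroom(cap)}")
```

These are worst-case bounds per instance; actual usage depends on how many precache slots miss the cache at once.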

Tradeoff

Speculative cache warming takes longer. Users see a one-off slowdown only if they open a brand-new conversation before warming finishes; on-demand /v1/sync/audio/{conv}/urls playback is unchanged.

Test plan

  • Watch executor_pool_health warnings on backend-sync for 30 min post-deploy — storage pool max_q should drop noticeably
  • Watch sync_v2 bg complete decode_ms p95 — should improve toward p50
  • No new 5xx errors on /v2/sync-local-files or /v1/sync/audio/*/precache
  • Smoke: open a few recently-created conversations and confirm playback still works

…sync

The storage_executor pool (96 workers) is shared between the sync v2
pipeline and the playback/precache flows. Under load, precache fan-out
can pin up to ~36 workers per instance (4 outer slots × up to 9 each
for chunk downloads), starving the sync pipeline's GCS reads.

Halving the per-process precache concurrency cap halves precache's
worst-case storage footprint to ~18 workers per instance, leaving the
sync pipeline more headroom on the shared pool. The cost is slower
speculative cache warming (one-off ~few-second hit when a user opens a
brand-new conversation before warming completes); the on-demand /urls
playback path is unchanged.
@greptile-apps
Contributor

greptile-apps Bot commented May 28, 2026

Greptile Summary

Halves _PRECACHE_FILE_SEM from 4 → 2 in backend/utils/other/storage.py to reduce the peak storage_executor thread budget consumed by background audio pre-caching, freeing capacity for the sync v2 pipeline that shares the same 96-worker pool.

  • Each active precache slot occupies 1 blocking _cache_single thread plus up to 8 chunk-download threads via the _CHUNK_WINDOW_SIZE window, so the cap of 2 slots limits precache to ≤ 18 storage_executor workers (down from ≤ 36), leaving ~78 threads free for sync workloads.
  • The tradeoff is slower speculative cache warming; on-demand playback is unaffected because /v1/sync/audio/{conv}/urls calls get_or_create_merged_audio directly and bypasses _PRECACHE_FILE_SEM.
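As a rough illustration of why one slot can hold up to 9 workers, the sketch below shows a generic sliding-window download: one calling thread blocks while at most `window` chunk fetches are in flight. This is an assumption about the general technique, not the actual `_CHUNK_WINDOW_SIZE` implementation; `download_windowed`, `fetch`, and `run_demo` are hypothetical names.

```python
import threading
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def download_windowed(chunks, fetch, executor, window=8):
    """Fetch chunks on `executor` with at most `window` in flight; keeps input order."""
    results = [None] * len(chunks)
    pending = {}  # future -> index of the chunk it is fetching
    for idx, chunk in enumerate(chunks):
        if len(pending) >= window:
            # Window is full: block until at least one fetch finishes.
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                results[pending.pop(fut)] = fut.result()
        pending[executor.submit(fetch, chunk)] = idx
    for fut, idx in pending.items():
        results[idx] = fut.result()
    return results


def run_demo():
    """Run 20 fake fetches and record the peak number running concurrently."""
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def fake_fetch(chunk):
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        time.sleep(0.005)  # stand-in for a GCS download + decode
        with lock:
            state["active"] -= 1
        return chunk * 2

    with ThreadPoolExecutor(max_workers=32) as pool:
        results = download_windowed(list(range(20)), fake_fetch, pool, window=8)
    return state["peak"], results


peak, results = run_demo()
print(f"peak in-flight fetches: {peak} (window=8)")
```

In this sketch the caller plus a full window accounts for 1 + 8 = 9 busy threads, which matches the per-slot figure used in the summary.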

Confidence Score: 5/5

Safe to merge — the change is a single constant reduction with no logic alterations and no new code paths.

The semaphore value is the only thing that changed. The acquire/release pattern in _precache_all and its done-callback wiring are untouched, so there is no new deadlock surface. The new cap of 2 is mathematically consistent with the stated worker-budget goals (2 × 9 = 18 threads), and the production data cited in the PR description makes the direction of the change unambiguous.
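For readers unfamiliar with the acquire / done-callback release pattern mentioned above, here is a minimal sketch of how such a cap typically works with `concurrent.futures`. It is not the code in `storage.py`; `precache_all`, `cache_single`, and `PRECACHE_CAP` are hypothetical stand-ins.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

PRECACHE_CAP = 2  # mirrors the new cap; the name is hypothetical


def precache_all(files, cache_single, executor, cap=PRECACHE_CAP):
    """Submit one task per file, never allowing more than `cap` in flight."""
    sem = threading.BoundedSemaphore(cap)
    futures = []
    for name in files:
        sem.acquire()  # blocks while `cap` tasks are still in flight
        try:
            fut = executor.submit(cache_single, name)
        except Exception:
            sem.release()  # submission failed, so give the slot back
            raise
        # Release the slot when the task finishes, whether it succeeded or raised.
        fut.add_done_callback(lambda _fut: sem.release())
        futures.append(fut)
    return futures


def demo():
    """Run 10 fake precache tasks and record the peak number running at once."""
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def fake_cache_single(name):
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        time.sleep(0.01)  # stand-in for download + merge work
        with lock:
            state["active"] -= 1
        return name

    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = precache_all([f"file-{i}" for i in range(10)], fake_cache_single, pool)
        results = [f.result() for f in futures]
    return state["peak"], results


peak, results = demo()
print(f"peak concurrent precache tasks: {peak} (cap={PRECACHE_CAP})")
```

Because each acquire is paired with exactly one release in the callback, changing the cap only changes how many tasks overlap, which is consistent with the claim that the constant change adds no new deadlock surface.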

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| backend/utils/other/storage.py | Single-line change lowering _PRECACHE_FILE_SEM from 4 → 2; reduces worst-case storage_executor thread usage for background precache from ~36 to ~18 per instance. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[precache_conversation_audio] -->|postprocess_executor.submit| B[_precache_all]
    B -->|_PRECACHE_FILE_SEM.acquire max=2 was 4| C[storage_executor.submit _cache_single]
    C --> D[get_or_create_merged_audio]
    D -->|cache hit| E[return cached WAV bytes]
    D -->|cache miss| F[download_audio_chunks_and_merge]
    F -->|STORAGE_CHUNK_SEM x8 in-flight per call| G[storage_executor chunk downloads]
    G --> H[GCS download + decode]
    C -->|done_callback: _PRECACHE_FILE_SEM.release| B

Reviews (1): Last reviewed commit: "backend: lower _PRECACHE_FILE_SEM 4 → 2 ..."

@mdmohsin7 mdmohsin7 merged commit bf4a1d6 into main May 28, 2026
2 checks passed
@mdmohsin7 mdmohsin7 deleted the caleb/precache-sem-cap branch May 28, 2026 19:12