backend: raise storage_executor pool 96 → 128 #7529
Storage pool was averaging 100% utilization on backend-sync. Queue depth distribution (30-min steady state, post minScale=10 / concurrency=6 / PRECACHE_FILE_SEM=2): p50=19, p95=30, p99=38, max=43. The 32 added workers (96 + 32 = 128) cleanly absorb the observed p95 queue depth.
Memory headroom is comfortable: per-instance p99 sits at 13% of 8 GB, and Linux lazy-commits thread stacks, so the additional 32 threads add ~1–2 MB of actual RSS per instance, not the virtual address space figure.
CPU p99 bursts to 84%, but most storage work is I/O-bound (GCS calls, network). The GIL is released during blocking syscalls, so the extra threads scale with the I/O latency they hide, not with CPU. The CPU-bound portion (PCM merge / WAV encoding inside audio_merge) remains bounded by the 2 vCPU ceiling regardless.
If queue depth still trends above ~10 after this, the next move is a bump to 140 (covers max=43); 192 is overshoot for the observed distribution.
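The memory claim can be checked with rough arithmetic. This is a sketch only: the 8 MiB per-thread stack reservation is an assumption (a common Linux default, not stated in the PR), and MB is treated as MiB for simplicity. The other figures come from the description above.

```python
ADDED_THREADS = 32
# Assumption: 8 MiB stack reservation per thread (typical `ulimit -s 8192`).
ASSUMED_STACK_RESERVATION_MIB = 8
# "~1-2 MB actual RSS per instance" from the PR description.
RSS_ESTIMATE_MIB = (1, 2)
# 8 GB Cloud Run instance.
INSTANCE_MEMORY_MIB = 8 * 1024

# Virtual address space reserved for the new stacks (mostly never touched).
virtual_mib = ADDED_THREADS * ASSUMED_STACK_RESERVATION_MIB

# Resident memory per thread implied by the RSS estimate, in KiB.
per_thread_kib = tuple(mib * 1024 / ADDED_THREADS for mib in RSS_ESTIMATE_MIB)

# Upper RSS estimate as a share of instance memory, in percent.
rss_share_pct = RSS_ESTIMATE_MIB[1] / INSTANCE_MEMORY_MIB * 100

print(f"virtual reservation: {virtual_mib} MiB")
print(f"resident per thread: {per_thread_kib[0]:.0f}-{per_thread_kib[1]:.0f} KiB")
print(f"RSS share of instance memory: {rss_share_pct:.3f}%")
```

Under these assumptions the reserved virtual space is two orders of magnitude larger than the resident estimate, which is why the description quotes RSS rather than the virtual figure.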
Greptile Summary
This PR raises the storage_executor max_workers from 96 to 128 in backend/utils/executors.py.
Confidence Score: 5/5
Safe to merge: the diff is a single integer constant change backed by production queue-depth data and a full memory/CPU safety analysis.
The change touches exactly one value in a pool-size declaration. The +32 workers are I/O-bound (GCS/network), so they scale with I/O latency rather than CPU, and the PR's own memory analysis shows RSS impact of ~1–2 MB per instance, well within the reported 87% headroom. No logic, no interfaces, and no shutdown paths were altered.
No files require special attention. backend/utils/executors.py is the only changed file and the modification is a single constant.
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
subgraph PerInstance["Per Cloud Run Instance (2 vCPU / 8 GB)"]
direction TB
CE["critical_executor\nmax_workers=8"]
DE["db_executor\nmax_workers=24"]
LE["llm_executor\nmax_workers=6"]
SE["stripe_executor\nmax_workers=4"]
SYE["sync_executor\nmax_workers=16"]
PE["postprocess_executor\nmax_workers=24"]
STOE["storage_executor\nmax_workers=128\n(was 96 — changed)"]
end
REQ["Incoming Request"] --> CE & DE & LE & SE & SYE & PE & STOE
STOE -->|"GCS / network I/O\nGIL released"| GCS["Google Cloud Storage"]
STOE -->|"PCM merge / WAV encode\nbounded by 2 vCPU"| AUDIO["Audio processing"]
STOE -.->|"queue_depth p95=30\nmax=43"| QUEUE["Work queue\n(was pegged at 100% util)"]
style STOE fill:#ffd700,stroke:#333,stroke-width:2px
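The pool layout in the flowchart could be declared roughly as follows. This is a minimal sketch, not the contents of backend/utils/executors.py: the use of `concurrent.futures.ThreadPoolExecutor`, the `POOL_SIZES` dict, and the `total_worker_ceiling` helper are assumptions for illustration. Only the pool names and sizes come from the diagram.

```python
from concurrent.futures import ThreadPoolExecutor

# Pool sizes copied from the flowchart; the dict layout is hypothetical.
POOL_SIZES = {
    "critical": 8,
    "db": 24,
    "llm": 6,
    "stripe": 4,
    "sync": 16,
    "postprocess": 24,
    "storage": 128,  # was 96 before this PR
}

# One executor per workload, mirroring the boxes in the flowchart.
# ThreadPoolExecutor starts threads lazily, so nothing is spawned here.
executors = {
    name: ThreadPoolExecutor(max_workers=size, thread_name_prefix=name)
    for name, size in POOL_SIZES.items()
}

storage_executor = executors["storage"]


def total_worker_ceiling() -> int:
    """Upper bound on pool threads per instance if every pool is saturated."""
    return sum(POOL_SIZES.values())


print(total_worker_ceiling())
```

With these sizes the per-instance ceiling moves from 178 to 210 pool threads, all sharing the same 2 vCPU / 8 GB instance.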
Summary
storage_executor max_workers from 96 → 128 in backend/utils/executors.py.
Why
After the previous round of changes (minScale=10, concurrency=6, _PRECACHE_FILE_SEM=2) the storage pool on backend-sync is still pegged at 100% utilization. The queue depth distribution over the past 30 minutes (steady-state on the new revision): p50=19, p95=30, p99=38, max=43.
The 32 added workers (96 + 32 = 128) cleanly absorb the observed p95 queue depth. Sync work that was waiting in this queue should drop to near zero in the common case.
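The sizing argument can be written out as a small calculation. It rests on a simplifying assumption that is not in the PR: the queue depths were measured while all 96 workers were busy, and each added worker picks up exactly one queued task. `workers_to_cover` is a hypothetical helper.

```python
OLD_WORKERS = 96
NEW_WORKERS = 128

# Queue depth quantiles reported in the PR description.
QUEUE_DEPTH = {"p50": 19, "p95": 30, "p99": 38, "max": 43}

added = NEW_WORKERS - OLD_WORKERS  # 32 extra workers

# A quantile counts as absorbed when the added workers could take
# every task that was queued at that depth.
absorbed = {q: depth <= added for q, depth in QUEUE_DEPTH.items()}


def workers_to_cover(quantile: str) -> int:
    """Pool size needed to absorb the given quantile under the same assumption."""
    return OLD_WORKERS + QUEUE_DEPTH[quantile]


print(absorbed)
print(workers_to_cover("max"))
```

Under this model p50 and p95 are covered while p99 and max are not, and covering max needs 139 workers, which matches the proposed follow-up of 140.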
Safety analysis (verified, not assumed)
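Memory: per-instance p99 sits at 13% of 8 GB. Linux lazy-commits thread stacks, so the additional 32 threads add ~1–2 MB of actual RSS per instance, not the virtual address space figure.
CPU: p99 bursts to 84%, but most storage work is I/O-bound (GCS calls, network). The GIL is released during blocking syscalls, so the extra threads scale with the I/O latency they hide, not with CPU. The CPU-bound portion (PCM merge / WAV encoding inside audio_merge) remains bounded by the 2 vCPU ceiling regardless.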
Cloud Run autoscaler: unaffected. Autoscales on CPU + request concurrency; thread count is a per-instance knob.
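The point that I/O-bound threads scale past the CPU count can be illustrated with a toy benchmark. This is a sketch under stated assumptions: `fake_gcs_call` is a hypothetical stand-in that uses `time.sleep` (which releases the GIL) in place of a real GCS request, and the 50 ms latency is arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_gcs_call(latency_s: float = 0.05) -> None:
    """Stand-in for a blocking network call; time.sleep releases the GIL."""
    time.sleep(latency_s)


def run_batch(max_workers: int, tasks: int = 32) -> float:
    """Return wall-clock seconds needed to finish `tasks` blocking calls."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Consume the iterator so every task has completed before timing stops.
        list(pool.map(lambda _: fake_gcs_call(), range(tasks)))
    return time.perf_counter() - start


# 32 tasks on 4 workers run in about 8 sequential rounds of 50 ms.
narrow = run_batch(max_workers=4)
# 32 tasks on 32 workers all block concurrently, about one round.
wide = run_batch(max_workers=32)

print(f"4 workers: {narrow:.2f}s, 32 workers: {wide:.2f}s")
```

The wider pool finishes sooner even on a single core because the threads spend their time waiting, not computing. CPU-bound work such as the PCM merge / WAV encoding would not show this speedup.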
Test plan
Watch executor_pool_health warnings on backend-sync for 15–30 min post-deploy: storage max_q should drop substantially (target: well under 10 in p95)
/v2/sync-local-files or /v1/sync/audio/*
If max_q still > 10 consistently after this, follow-up bump to 140 (one-line change)
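The follow-up rule in the test plan could be expressed as a small check. This is a hypothetical sketch: `queue_depth` reads `_work_queue`, a private CPython attribute of `ThreadPoolExecutor` and not a public API. How `executor_pool_health` actually computes max_q is not shown in the PR. "Consistently" is interpreted here as at least half of the samples, which is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Sequence


def queue_depth(executor: ThreadPoolExecutor) -> int:
    """Approximate number of tasks waiting for a free worker.

    Relies on the private `_work_queue` attribute (CPython implementation detail).
    """
    return executor._work_queue.qsize()


def needs_follow_up_bump(
    max_q_samples: Sequence[int],
    threshold: int = 10,
    min_fraction: float = 0.5,
) -> bool:
    """True when at least `min_fraction` of the samples exceed `threshold`."""
    if not max_q_samples:
        return False
    over = sum(1 for q in max_q_samples if q > threshold)
    return over / len(max_q_samples) >= min_fraction


# Example: hypothetical max_q readings taken after the deploy.
print(needs_follow_up_bump([3, 5, 12, 4, 2]))
print(needs_follow_up_bump([14, 18, 12, 9, 15]))
```

A single spike above the threshold does not trigger the bump in this sketch; only a sustained pattern does, in line with the "consistently" wording of the test plan.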