fix(pusher): cap unbounded private_cloud_queue and increase storage workers to prevent OOM#6762
Conversation
…ency Remove intermediate bytes(data) copy in _async_trigger_realtime_audio_bytes that pinned an extra ~64KB per audio chunk for the full duration of the slowest webhook call. Pass the bytearray directly to httpx (it accepts bytes-like objects). Also cap per-call concurrency to 8 apps at a time to limit memory pressure from concurrent webhook calls. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…webhook Pass bytearray directly to httpx instead of creating an intermediate bytes copy. Eliminates one ~64KB allocation per audio chunk webhook call. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Both _webhook_circuit_breakers and _latest_wins_versions grew unbounded. Add TTL-based eviction: - Circuit breakers: evict idle entries after 1 hour (max 500 entries) - Latest-wins: evict UIDs not seen in 10 minutes (max 10000 entries) Also clear both registries on shutdown. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Greptile SummaryThis PR reduces pusher memory pressure by removing redundant
Confidence Score: 4/5Safe to merge after updating the one broken test assertion in test_async_webhooks.py. One P1 finding: the existing test backend/utils/webhooks.py (broken companion test) and backend/utils/http_client.py (open-CB eviction gap). Important Files Changed
Sequence DiagramsequenceDiagram
participant P as pusher.py
participant AI as app_integrations.py
participant WH as webhooks.py
participant CB as CircuitBreaker
participant LW as latest_wins
participant HX as httpx.AsyncClient
P->>AI: trigger_realtime_audio_bytes(uid, sample_rate, data: bytearray)
Note over AI: chunk_size = 8
loop for each chunk of 8 apps
AI->>LW: latest_wins_start(uid) → version
AI->>CB: get_webhook_circuit_breaker(url)
CB-->>AI: allow_request()?
AI->>LW: latest_wins_check(uid, version)
AI->>HX: client.post(url, content=data ← bytearray, no copy)
HX-->>AI: response
AI->>CB: record_success() / record_failure()
AI->>LW: latest_wins_check → break if stale
end
P->>WH: send_audio_bytes_developer_webhook(uid, sample_rate, data: bytearray)
WH->>CB: get_webhook_circuit_breaker(webhook_url)
WH->>HX: client.post(webhook_url, content=data ← bytearray, no copy)
HX-->>WH: response
Reviews (1): Last reviewed commit: "fix(http_client): add TTL eviction for c..." | Re-trigger Greptile |
| client = get_webhook_client() | ||
| response = await client.post( | ||
| webhook_url, content=bytes(data), headers={'Content-Type': 'application/octet-stream'} | ||
| webhook_url, content=data, headers={'Content-Type': 'application/octet-stream'} |
There was a problem hiding this comment.
Broken existing test assertion
test_async_webhooks.py::TestSendAudioBytesDeveloperWebhook::test_bytearray_converted_to_bytes (line 157) asserts isinstance(sent_content, bytes) — i.e., the content passed to client.post must be a bytes object. The test module stubs user_webhook_status_db to return True, so the client.post() call is always reached. Now that content=bytes(data) has been replaced with content=data, the argument type is bytearray, and isinstance(bytearray(b'\xab\xcd'), bytes) returns False in Python 3. This test will fail.
The PR lists only test_async_http_infrastructure and test_pusher_circuit_breaker as having been run — test_async_webhooks.py was not covered. The test should be updated to reflect the new intended type:
| webhook_url, content=data, headers={'Content-Type': 'application/octet-stream'} | |
| webhook_url, content=data, headers={'Content-Type': 'application/octet-stream'} |
And in test_async_webhooks.py, update line 157:
assert isinstance(sent_content, (bytes, bytearray))| def _evict_stale_circuit_breakers(): | ||
| """Remove circuit breaker entries that have been idle (closed, no recent failures).""" | ||
| now = time.monotonic() | ||
| stale_keys = [ | ||
| k | ||
| for k, cb in _webhook_circuit_breakers.items() | ||
| if cb._state == 'closed' | ||
| and (now - cb._last_failure_time > _CIRCUIT_BREAKER_IDLE_TTL or cb._last_failure_time == 0.0) | ||
| ] | ||
| for k in stale_keys: | ||
| del _webhook_circuit_breakers[k] | ||
| if stale_keys: | ||
| logger.info(f'Evicted {len(stale_keys)} stale circuit breakers, {len(_webhook_circuit_breakers)} remaining') |
There was a problem hiding this comment.
Open circuit breakers are never evicted
_evict_stale_circuit_breakers only removes entries whose _state == 'closed'. An entry that has been in 'open' state for hours (a permanently-failing webhook URL that never gets a successful probe) will never be picked up by this eviction, even though it is functionally idle. In a deployment with many transient plugin webhook URLs that go offline permanently, the 500-entry soft cap still allows up to ~500 stale-open entries to accumulate until the next pod restart.
A simple fix would extend the stale criteria to also evict entries that are open/half_open and idle beyond a larger TTL:
stale_keys = [
k
for k, cb in _webhook_circuit_breakers.items()
if cb._state == 'closed'
and (now - cb._last_failure_time > _CIRCUIT_BREAKER_IDLE_TTL or cb._last_failure_time == 0.0)
or (cb._state in ('open', 'half_open')
and now - cb._last_failure_time > _CIRCUIT_BREAKER_IDLE_TTL * 24) # 24 h for failed targets
]| if key not in _webhook_circuit_breakers: | ||
| if len(_webhook_circuit_breakers) > _CIRCUIT_BREAKER_MAX_ENTRIES: | ||
| _evict_stale_circuit_breakers() | ||
| _webhook_circuit_breakers[key] = WebhookCircuitBreaker(key) |
There was a problem hiding this comment.
Soft cap silently exceeded when eviction yields no candidates
The new entry is inserted unconditionally after _evict_stale_circuit_breakers(), regardless of whether any entries were actually freed. If all 500+ existing entries are 'open' or 'half_open' (e.g., during a widespread webhook outage), eviction is a no-op and the dict silently grows past _CIRCUIT_BREAKER_MAX_ENTRIES. The same pattern applies to _latest_wins_versions. This is fine as a best-effort cap but it's worth documenting as a soft upper bound in the constant name or docstring to avoid future confusion.
96+ concurrent private cloud upload coroutines compete for 4 workers, each requiring 3 sequential executor calls per flush. With 4 workers, only ~1.3 concurrent flushes are possible — far below the 96 uploads/min needed, causing private_cloud_queue to grow unboundedly (8.3GB observed). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… OOM Root cause of 4.5GB memory / 173 OOM kills per 24h: unbounded private_cloud_queue (List[dict]) grew to 150+ items per user (~147MB each) when storage_executor was saturated. 96 users × 90 avg items × 960KB = ~8.3GB queued audio across pods. Fix: deque(maxlen=10) drops oldest chunk when queue is full. An OOM kill loses ALL queued data for ALL users on the pod — dropping the oldest 60s chunk for one user is strictly better. Adds explicit warning logging before drops at both enqueue points. Also frees batch bytearray immediately after bytes() copy in _flush_batch to reduce peak memory during upload. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tcha Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CODEx review caught the third private_cloud_queue enqueue point (disconnect flush) was missing the drop-oldest warning log. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Previously only closed breakers were evicted. Open/half_open breakers for abandoned webhook URLs never aged out, allowing monotonic growth. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
httpx 0.28 treats bytearray as iterable, creating an IteratorByteStream that fails with AsyncClient. Convert bytearray to bytes inline at the httpx call site (not at function entry) to minimize copy lifetime. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…er_webhook Same httpx 0.28 bytearray compatibility fix as app_integrations. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Track _last_access_time on every allow_request() call so actively used breakers are never evicted, even if they never failed. Fixes issue where healthy endpoints with _last_failure_time=0.0 were incorrectly treated as stale on registry overflow. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- TestPrivateCloudQueueCap: verifies deque(maxlen=10), PRIVATE_CLOUD_QUEUE_MAX_SIZE=10, 3 overflow warning sites, and deque drop-oldest behavior - TestCircuitBreakerAccessTracking: verifies active breakers not evicted, stale breakers evicted, and allow_request() updates _last_access_time Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Verifies all 12 apps receive audio when sent in chunks of 8. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…se.conversations Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Guard http_client stub attributes with __file__ check to avoid overwriting the real module when tests are collected together. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
E2E Live Test Evidence — curl/wscat/pytest1. Service Startup2. wscat WebSocket TestsServer closes with 1006 after Redis auth fails for test UIDs (expected in local dev without prod Redis creds). Key: WS handshake and connection lifecycle works correctly. 3. curl HTTP Tests4. Code Verification (46/46 PASS)
5. Unit Tests (73/73 PASS)Key Evidence Summary
by AI for @beastoin |
With activeConnectionsPerPod=30 (not 62), the safe cap is much higher. Math: floor((4096-2048)/(30*2*0.92)) = 37 theoretical max. 20 provides 1.85x safety margin while halving data loss risk. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
lgtm |
…orkers to prevent OOM (BasedHardware#6762)
Summary
private_cloud_queue(List[dict]) in pusher.py grew to 150+ items per user (~147MB each) whenstorage_executor(4 workers) was saturated. 96 users × 90 avg items × 960KB ≈ 8.3GB queued audio across pods → OOM kills (173 kills/24h)private_cloud_queuewithdeque(maxlen=20)— drops oldest chunk when full instead of OOM-killing the entire pod (which loses ALL data for ALL users). Sized for 30 connections/pod (activeConnectionsPerPod: 30), safe cap = floor((4096-2048)/(30×2×0.92)) = 37, using 20 for 1.85× safety marginstorage_executorfrom 4 to 16 workers — 30 concurrent connections need > 4 workers (each flush requires 3 sequential executor calls). With 16 workers: capacity = 192 flushes/min vs 30 peak arrival = 6.4× headroom_flush_batchSizing math (Codex-verified)
Queue cap (pod limit 4096 MiB,
activeConnectionsPerPod: 30):Worker count (each flush = 3 sequential executor calls, 2-5s each):
Live profiling evidence (prod pod gqxxz, 905 MB RSS)
Per-connection queue sizes (sampled):
Memory growth trajectory (pod w5p69, 2-hour capture):
Kill timeline: 173 kills in 24h, pods hit 4608Mi limit in ~2 hours
Root cause chain:
storage_executorhas only 4 workers, queue=35 → drain rate < ingest rateprivate_cloud_queueis unboundedList[dict]→ grows at ~1.9 GB/hourHow this PR fixes each link:
deque(maxlen=20)(sized for 30 conns/pod)storage_executorqueue=35 (4 workers)bytes(data)copiesbytes()at call site onlyChanges
backend/routers/pusher.pyprivate_cloud_queuewithdeque(maxlen=20), add drop-oldest logging at all 3 enqueue points, free batch data in_flush_batchbackend/utils/executors.pystorage_executorfrom 4 to 16 workersbackend/utils/http_client.py_last_access_timetracking for circuit breaker eviction, TTL eviction for all states, latest-wins TTL evictionbackend/utils/app_integrations.pybytes()inline at httpx call, capasyncio.gatherto 8 apps/batchbackend/utils/webhooks.pybytes()inline at httpx callbackend/CLAUDE.mdbackend/tests/unit/test_async_http_infrastructure.pybackend/tests/unit/test_async_app_integrations.pybackend/tests/unit/test_async_webhooks.pyTest plan
Expected impact
Closes #6022
by AI for @beastoin