The runtime process heap grows monotonically until it hits the 256MB V8 limit, causing an abrupt exit-134 crash ~3x every 6 hours. Instead of letting V8 kill the process, detect when heap reaches 235MB and initiate a controlled drain: reject new requests with 503, give in-flight requests 5s to complete, then exit cleanly for Render to restart a fresh instance. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The prior graceful shutdown set a flag and returned 503s but never called server.close(), so the Node http.Server kept accepting TCP connections during the drain period.
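The intended drain sequence can be sketched as follows. This is a minimal illustration, not the PR's actual resilience.ts: the `Closeable` interface, the injectable `exit` parameter, and the `drainWindowMs` argument are assumptions made for a self-contained example.

```typescript
// Sketch of the controlled drain: flag new requests away, stop accepting
// connections, give in-flight requests a bounded window, then exit cleanly.
interface Closeable {
  close(onClosed?: () => void): unknown;
}

const DRAIN_WINDOW_MS = 5_000; // grace period for in-flight requests

export let draining = false;

export function gracefulShutdown(
  server: Closeable,
  exit: (code: number) => void = process.exit,
  drainWindowMs: number = DRAIN_WINDOW_MS,
): void {
  if (draining) return; // idempotent: only one drain may run
  draining = true;      // request handlers check this and return 503

  // Stop accepting new connections; in-flight requests keep running.
  server.close(() => exit(0)); // clean exit once existing sockets finish

  // Hard deadline: exit even if some sockets never close.
  setTimeout(() => exit(0), drainWindowMs);
}
```

Exiting with code 0 (rather than letting V8 abort with 134) lets Render treat the restart as routine rather than a crash loop.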
GeneralJerel left a comment
Review
The approach is sound — graceful shutdown to mitigate the OOM crash is a reasonable stopgap while the underlying leak is profiled.
Issues to address
1. setTimeout(...).unref() may skip the drain window entirely
.unref() on the drain timer tells Node not to keep the event loop alive for it. If server.close() finishes quickly and nothing else holds the loop open, the process exits immediately — before in-flight requests get their 5s. Consider removing .unref(), or documenting that the intent is "exit as soon as possible, up to 5s max."
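The difference is observable directly on Node's `Timeout` handle via `hasRef()` (Node >= 11). This snippet only demonstrates the semantics and is not code from the PR:

```typescript
// An unref'ed timer does not keep the event loop alive: if server.close()
// finishes and nothing else holds the loop open, the process exits before
// the timer fires, so the 5s window silently shrinks to "as soon as idle".
const drainTimer = setTimeout(() => { /* would call process.exit(0) */ }, 5_000);
drainTimer.unref();
const drainTimerHoldsLoop = drainTimer.hasRef(); // false: won't pin the loop

// Without .unref(), the timer itself keeps the process alive,
// guaranteeing in-flight requests the full window.
const plainTimer = setTimeout(() => { /* would call process.exit(0) */ }, 5_000);
const plainTimerHoldsLoop = plainTimer.hasRef(); // true: pins the loop

// Clean up so this demo itself exits promptly.
clearTimeout(drainTimer);
clearTimeout(plainTimer);
```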
2. Shutdown callbacks swallow errors
for (const cb of shutdownCallbacks) cb() — if any callback throws, the remaining callbacks and the drain logic below never run. Wrap in try/catch:
```ts
for (const cb of shutdownCallbacks) {
  try { cb(); } catch (e) { console.error("[shutdown] callback error", e); }
}
```

3. Race at startup: resilience interval starts before serve()
The import moved to the top of server.ts, so the setInterval in resilience.ts starts before serve() returns and before onShutdown(() => server.close()) is registered. If heap is already near 235MB at startup, gracefulShutdown() fires with no callbacks registered. Low probability but worth guarding — e.g. register the shutdown callback immediately after serve() (already done) and ensure server.close() tolerates being called on an undefined ref.
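A guard of that shape might look like the sketch below. The helper names are hypothetical, and `serverRef` stands in for however server.ts holds the handle returned by serve():

```typescript
// Sketch: tolerate gracefulShutdown() firing before serve() has returned.
interface Closeable {
  close(onClosed?: () => void): unknown;
}

let serverRef: Closeable | undefined; // unset until serve() returns

export function registerServer(server: Closeable): void {
  serverRef = server;
}

export function closeServerSafely(onClosed: () => void): void {
  if (!serverRef) {
    // Drain fired during the startup race: nothing is listening yet,
    // so skip straight to the drain-timeout path instead of throwing.
    onClosed();
    return;
  }
  serverRef.close(onClosed);
}
```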
Nice-to-haves
- `Retry-After` header on 503 — `c.header("Retry-After", "5")` helps well-behaved LBs/clients back off.
- 60s polling granularity is coarse — a burst of large streaming responses could jump past 235MB between checks. `v8.getHeapStatistics()` or a tighter interval near the threshold would reduce the window.
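One way to tighten the polling, as a sketch: the 235MB threshold comes from the PR, but the 80% knee and the two cadences are illustrative values, not existing code.

```typescript
import { getHeapStatistics } from "node:v8";

const THRESHOLD_BYTES = 235 * 1024 * 1024; // drain point from the PR
const SLOW_POLL_MS = 60_000; // the current coarse cadence
const FAST_POLL_MS = 1_000;  // illustrative tighter cadence near the limit

export function heapUsedBytes(): number {
  return getHeapStatistics().used_heap_size;
}

// Poll faster once heap is within 80% of the threshold so a burst of large
// streaming responses cannot jump past 235MB unseen between checks.
export function nextPollDelay(usedBytes: number): number {
  return usedBytes >= THRESHOLD_BYTES * 0.8 ? FAST_POLL_MS : SLOW_POLL_MS;
}

export function shouldDrain(usedBytes: number): boolean {
  return usedBytes >= THRESHOLD_BYTES;
}
```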
- Remove .unref() on drain timer so in-flight requests get the full 5s window
- Wrap shutdown callbacks in try/catch to prevent one failure from skipping the rest
- Guard server.close() against undefined ref during startup race
- Add Retry-After header on 503 to help LBs/clients back off
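Taken together, the 503 drain response could look like this framework-agnostic sketch. The PR itself uses Hono (where the header call is `c.header("Retry-After", "5")`); plain `node:http` types are used here so the block is self-contained, and `handleDuringDrain`/`beginDrain` are hypothetical names.

```typescript
import type { IncomingMessage, ServerResponse } from "node:http";

let draining = false; // flipped by the heap monitor when the drain begins

export function beginDrain(): void {
  draining = true;
}

export function handleDuringDrain(req: IncomingMessage, res: ServerResponse): void {
  if (draining) {
    res.statusCode = 503;
    res.setHeader("Retry-After", "5");    // back off past the 5s drain window
    res.setHeader("Connection", "close"); // tell the LB not to reuse the socket
    res.end("draining");
    return;
  }
  res.end("ok");
}
```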
Summary
- resilience.ts: when heap reaches 235MB, the process drains for 5s then exits cleanly instead of crashing
- server.ts: rejects new requests with 503 during drain (registered before routes), with `Connection: close` to signal the load balancer

Root cause
A memory leak (likely in streaming proxy internals) causes steady heap growth. This PR mitigates the crash impact — the underlying leak still needs profiling to find and fix.
Test plan
Observe `[memory] Heap at XMB — initiating graceful shutdown` instead of `FATAL ERROR: Ineffective mark-compacts near heap limit`.

🤖 Generated with Claude Code