
fix: graceful shutdown before V8 OOM crash #2

Open

GeneralJerel wants to merge 3 commits into main from fix/runtime-oom-crash

Conversation

@GeneralJerel (Collaborator)

Summary

  • Runtime crashes ~3x every 6 hours with exit code 134 (V8 OOM) as heap grows monotonically from ~228MB to 246MB+ against the 256MB limit
  • Added graceful shutdown in resilience.ts: when heap reaches 235MB, the process drains for 5s then exits cleanly instead of crashing
  • Added 503 middleware in server.ts (registered before routes) to reject new requests during drain, with Connection: close to signal the load balancer
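The watchdog-plus-drain flow described above can be sketched roughly as follows. This is an illustrative reconstruction, not the PR's actual resilience.ts code: the function names, the 60s polling interval, and the `beginDrain` hook are assumptions; only the 235MB threshold, the 5s window, and the log format come from the PR.

```typescript
// Hypothetical sketch of the memory watchdog; constants mirror the PR
// description (235MB trigger, 5s drain) but the structure is assumed.
const HEAP_LIMIT_MB = 235;     // drain before the 256MB V8 heap limit
const DRAIN_WINDOW_MS = 5_000; // give in-flight requests 5s to finish

// Pure threshold check, split out so it is trivially testable.
export function heapExceedsLimit(heapUsedBytes: number): boolean {
  return heapUsedBytes / (1024 * 1024) >= HEAP_LIMIT_MB;
}

export function startMemoryWatchdog(
  beginDrain: () => void, // assumed hook: flips the 503 flag, closes the server
  intervalMs = 60_000,
): NodeJS.Timeout {
  const timer = setInterval(() => {
    const { heapUsed } = process.memoryUsage();
    if (heapExceedsLimit(heapUsed)) {
      clearInterval(timer);
      console.log(
        `[memory] Heap at ${Math.round(heapUsed / 1024 / 1024)}MB — initiating graceful shutdown`,
      );
      beginDrain();
      // Exit cleanly (code 0) after the drain window so Render restarts a
      // fresh instance, instead of letting V8 abort with exit code 134.
      setTimeout(() => process.exit(0), DRAIN_WINDOW_MS);
    }
  }, intervalMs);
  return timer;
}
```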

Root cause

A memory leak (likely in streaming proxy internals) causes steady heap growth. This PR mitigates the crash impact — the underlying leak still needs profiling to find and fix.

Test plan

  • Deploy to Render and monitor logs for `[memory] Heap at XMB — initiating graceful shutdown` instead of `FATAL ERROR: Ineffective mark-compacts near heap limit`
  • Verify the service recovers without "Instance failed" events (clean exit 0 vs crash exit 134)
  • Confirm in-flight requests complete during the 5s drain window

🤖 Generated with Claude Code

GeneralJerel and others added 2 commits March 24, 2026 06:02
The runtime process heap grows monotonically until it hits the 256MB
V8 limit, causing an abrupt exit-134 crash ~3x every 6 hours. Instead
of letting V8 kill the process, detect when heap reaches 235MB and
initiate a controlled drain: reject new requests with 503, give
in-flight requests 5s to complete, then exit cleanly for Render to
restart a fresh instance.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ions

The prior graceful shutdown set a flag and returned 503s but never
called server.close(), so the Node http.Server kept accepting TCP
connections during the drain period.
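A sketch of the corrected drain, assuming a plain `node:http` server (the timeout fallback and function name are assumptions, not the commit's actual code): `server.close()` stops the listener from accepting new TCP connections while already-established requests are allowed to finish.

```typescript
import http from "node:http";

// Illustrative fix: actually call server.close() during the drain, with a
// hard deadline so a stuck socket cannot hold the process open forever.
export function drainServer(
  server: http.Server | undefined,
  windowMs = 5_000,
): Promise<void> {
  return new Promise((resolve) => {
    if (!server) return resolve(); // nothing listening yet
    const deadline = setTimeout(resolve, windowMs); // cap the drain window
    server.close(() => {
      clearTimeout(deadline);
      resolve(); // all sockets drained before the deadline
    });
  });
}
```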
@GeneralJerel (Collaborator, Author) left a comment


Review

The approach is sound — graceful shutdown to mitigate the OOM crash is a reasonable stopgap while the underlying leak is profiled.

Issues to address

1. setTimeout(...).unref() may skip the drain window entirely

.unref() on the drain timer tells Node not to keep the event loop alive for it. If server.close() finishes quickly and nothing else holds the loop open, the process exits immediately — before in-flight requests get their 5s. Consider removing .unref(), or documenting that the intent is "exit as soon as possible, up to 5s max."
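A tiny demonstration of the hazard, using `hasRef()` on Node's `Timeout` object (the 5s value mirrors the PR's drain window; the demo itself is illustrative):

```typescript
// An unref'd timer no longer keeps the event loop alive: if server.close()
// completes and nothing else is pending, the process exits before the
// drain callback ever runs.
const drainTimer = setTimeout(() => console.log("drain elapsed"), 5_000);
console.log(drainTimer.hasRef()); // true — this timer alone keeps the process alive

drainTimer.unref();
console.log(drainTimer.hasRef()); // false — the loop may exit before it fires

clearTimeout(drainTimer); // cleanup so this demo exits immediately
```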

2. Shutdown callbacks swallow errors

for (const cb of shutdownCallbacks) cb() — if any callback throws, the remaining callbacks and the drain logic below never run. Wrap in try/catch:

```ts
for (const cb of shutdownCallbacks) {
  try { cb(); } catch (e) { console.error("[shutdown] callback error", e); }
}
```

3. Race at startup: resilience interval starts before serve()

The import moved to the top of server.ts, so the setInterval in resilience.ts starts before serve() returns and before onShutdown(() => server.close()) is registered. If heap is already near 235MB at startup, gracefulShutdown() fires with no callbacks registered. Low probability but worth guarding — e.g. register the shutdown callback immediately after serve() (already done) and ensure server.close() tolerates being called on an undefined ref.

Nice-to-haves

  • Retry-After header on 503: `c.header("Retry-After", "5")` helps well-behaved LBs/clients back off.
  • 60s polling granularity is coarse — a burst of large streaming responses could jump past 235MB between checks. v8.getHeapStatistics() or a tighter interval near the threshold would reduce the window.
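Both nice-to-haves sketched together; the 20MB "near threshold" band, the 5s tight interval, and the function names are assumptions for illustration:

```typescript
import v8 from "node:v8";

// Headers for the 503 drain response: Retry-After tells well-behaved
// clients/LBs when to retry; Connection: close stops socket reuse.
export function drainHeaders(retryAfterSeconds = 5): Record<string, string> {
  return {
    "Retry-After": String(retryAfterSeconds),
    "Connection": "close",
  };
}

// Adaptive polling: tighten the interval once heap is near the threshold,
// so a burst of large streaming responses cannot slip past between checks.
export function nextPollMs(heapUsedMb: number, thresholdMb = 235): number {
  return thresholdMb - heapUsedMb <= 20 ? 5_000 : 60_000;
}

// v8.getHeapStatistics() reports heap numbers without forcing a GC.
const heapMb = v8.getHeapStatistics().used_heap_size / (1024 * 1024);
console.log(`next poll in ${nextPollMs(heapMb)}ms`);
```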

- Remove .unref() on drain timer so in-flight requests get the full 5s window
- Wrap shutdown callbacks in try/catch to prevent one failure from skipping the rest
- Guard server.close() against undefined ref during startup race
- Add Retry-After header on 503 to help LBs/clients back off