Skip to content

fix: two independent hanging-write bugs in the execution engine and local Bun server#158

Merged
RhysSullivan merged 4 commits intoRhysSullivan:mainfrom
aryasaatvik:fix/pausable-execution-fork-daemon
Apr 10, 2026
Merged

fix: two independent hanging-write bugs in the execution engine and local Bun server#158
RhysSullivan merged 4 commits intoRhysSullivan:mainfrom
aryasaatvik:fix/pausable-execution-fork-daemon

Conversation

@aryasaatvik
Copy link
Copy Markdown
Contributor

@aryasaatvik aryasaatvik commented Apr 10, 2026

Summary

Fixes two independent causes of hung non-SAFE tool calls that could still complete upstream, producing phantom writes where the mutation succeeded but the caller never received a response.

Changes

  • Use Effect.forkDaemon for pausable sandbox executions.

    • executeWithPause and resume are often driven by separate top-level Effect.runPromise calls, such as HTTP requests or CLI invocations.
    • A regular Effect.fork ties the sandbox fiber to the first caller scope, so it can be interrupted as soon as the paused result is returned.
    • forkDaemon lets the paused execution survive across the resume boundary.
  • Set Bun.serve({ idleTimeout: 0 }) for the local server.

    • Bun's default 10s idle timeout is too short for MCP elicitation and pause/resume flows involving human approval.
    • Disabling the Bun-level socket timeout prevents the server from closing the connection mid-flow.
    • MCP and HTTP clients still enforce their own request timeouts.

Tests

  • Added a regression test for a single-elicit tool across separate runPromise boundaries.
  • Verified the regression fails with Effect.fork and passes with Effect.forkDaemon.

Validation

  • bun run typecheck and bun x vitest run in packages/core/execution
  • bun run typecheck in apps/local
  • End-to-end check against Gmail labels.create

…omise boundaries

The execution engine's `startPausableExecution` forked the sandbox fiber
with `Effect.fork`, which attaches it to the current fiber's `Local`
FiberScope. Every host that drives `executeWithPause` and `resume` from
separate contexts (the HTTP API handlers, CLI `executor call` + `executor
resume`, and any host that pauses across an async boundary) runs each
call in its own top-level `Effect.runPromise`. When the first
runPromise's root fiber exits to return the paused result, its
`FiberRuntime.evaluateEffect` calls `interruptAllChildren()` and
interrupts the sandbox before the caller ever sees the response.

The subsequent `resume` then races `Fiber.join(paused.fiber)` — which
returns the interrupt exit immediately (converted to a defect by
`Effect.orDie` in `awaitCompletionOrPause`) — against
`Deferred.await(nextSignal)`. `Effect.race`'s `onSelfDone` on failure
waits for the loser, but nothing ever signals the next pause Deferred:
the tool's continuation runs on a separate root fiber spawned by the
quickjs sandbox bridge for each tool call, and once that completes with
no further elicitations, `nextSignal` is never filled. The resume call
hangs forever. Crucially, the underlying HTTP side effect of the tool
can still succeed in this window — producing "phantom writes" where the
upstream observes the mutation but the caller never sees the response.

`Effect.forkDaemon` attaches the sandbox fiber to the global FiberScope
instead of the parent's children set, so the parent's
`interruptAllChildren` on exit does not touch it. FiberRefs and Context
are copied at fork time via `unsafeMakeChildFiber`, so the daemon is
fully independent of the driving runPromise's lifetime.

Adds a regression test (`"resume returns across separate runPromise
boundaries for a single-elicit tool (HTTP-like)"`) that uses plain
`it(async () => ...)` so each engine call is its own top-level
`runPromise`, matching the HTTP handler shape. The test runs against a
new single-elicit `singleApproval` tool that mirrors the Gmail
`labels.create`-style "one approval, one side effect, one response"
shape. The existing `multiApproval`-based `it.effect` test masks the bug
because both calls share one runPromise scope AND the second elicit
eventually fills `nextSignal`, letting the race complete even with a
dead sandbox fiber.
@aryasaatvik aryasaatvik marked this pull request as draft April 10, 2026 14:34
…ion round-trips can complete

Bun.serve's default idleTimeout is 10 seconds (see
`src/bun.js/api/server/ServerConfig.zig:26`). That is too short for the
executor's HTTP endpoints:

- The MCP streamable-HTTP transport on `/mcp` keeps the TCP socket open
  for the duration of a tool call. When a non-SAFE operation trips the
  `requiresApproval` annotation in the google-discovery / openapi /
  graphql plugins, the executor's elicitation handler calls
  `server.server.elicitInput(...)`, which sends an `elicitation/create`
  request to the MCP client and awaits its response. For clients that
  surface an approval UI to a human operator, the full round-trip —
  render prompt, read it, click, serialize the response, send it back
  over the socket — routinely takes 5–10 seconds. Bun's 10s idle timeout
  was closing the socket mid-flight, either:
    (a) before the response returned — the tool call appeared to hang
        and the caller saw a transport error; or
    (b) after the sandbox had already received `{ action: "accept" }`
        and the underlying HTTP call to the upstream API had gone
        through — producing "phantom writes" where the upstream observed
        the mutation but the caller never saw the response.
  Both failure modes were reproduced against Gmail's `labels.create`
  during debugging.

- The executor's own `/api/executions` + `/api/executions/:id/resume`
  pause/resume flow also benefits: long-running sandboxed computations
  (e.g. large batch operations driven from a `tools.search` loop) can
  legitimately idle the socket beyond 10 seconds while awaiting tool
  continuations.

`idleTimeout: 0` disables the idle timeout entirely. This is a
documented behavior of Bun's uSockets: `us_socket_timeout` in
`packages/bun-usockets/src/socket.c:86-92` sets the internal `s->timeout`
field to the sentinel `255` — which never fires — when `seconds == 0`.
`idleTimeout: 0` is therefore the correct "no timeout" signal, not a
"close immediately" bug.

The executor's MCP host and HTTP API clients are responsible for
enforcing their own per-request timeouts (MCP SDK's
`DEFAULT_REQUEST_TIMEOUT_MSEC = 60000`, and any caller-side timeouts).
Disabling the Bun-level idle timeout only removes a racing cutoff that
was shorter than those logical timeouts and shorter than a realistic
human approval cycle.
@aryasaatvik aryasaatvik changed the title fix(execution): fork sandbox as daemon so pause/resume survives runPromise boundaries fix: two independent hanging-write bugs in the execution engine and local Bun server Apr 10, 2026
@aryasaatvik aryasaatvik marked this pull request as ready for review April 10, 2026 15:28

const invoker = makeFullInvoker(executor, { onElicitation: elicitationHandler });
fiber = yield* Effect.fork(codeExecutor.execute(code, invoker));
fiber = yield* Effect.forkDaemon(codeExecutor.execute(code, invoker));
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the core lifetime fix. A paused execution can cross multiple host calls: /api/executions returns the pause, then /api/executions/:id/resume joins the same sandbox later. With Effect.fork, the sandbox remains a child of the first runPromise, so returning the paused response can interrupt it before the caller has a chance to resume. forkDaemon moves that sandbox fiber into the global scope, which matches the execution ID lifecycle instead of the request lifecycle.

// nothing can accidentally unstick a dead-fiber race via a second
// elicitation (which is what masks the bug in the `multiApproval` test
// above).
it(
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The important detail here is using a normal async Vitest test instead of it.effect. That makes executeWithPause and resume run through separate top-level Effect.runPromise calls, matching HTTP/CLI usage. The single-elicit tool avoids the old multi-elicit test accidentally unblocking the dead-fiber race with a second pause signal.

// and `/api/executions` pause/resume can legitimately keep the socket
// idle longer than that. `0` is the "no timeout" sentinel per Bun's
// uSockets; MCP/HTTP clients enforce their own per-request timeouts.
idleTimeout: 0,
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This covers the other hang path. Bun defaults HTTP connections to a 10s idle timeout, which is short enough to cut off MCP elicitation while a human approval UI is still round-tripping. Setting this to 0 disables the server-level socket cutoff; MCP and callers still keep their own logical request timeouts.

@RhysSullivan RhysSullivan merged commit 2a6312c into RhysSullivan:main Apr 10, 2026
2 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants