Skip to content

fix: prevent 180s timeout cascade on dynamic worker failures#523

Merged
RhysSullivan merged 5 commits into
mainfrom
fix/execute-with-pause-failure-propagation
May 5, 2026
Merged

fix: prevent 180s timeout cascade on dynamic worker failures#523
RhysSullivan merged 5 commits into
mainfrom
fix/execute-with-pause-failure-propagation

Conversation

@RhysSullivan
Copy link
Copy Markdown
Owner

@RhysSullivan RhysSullivan commented May 4, 2026

Summary

Tool calls against any JavaScript-only sandbox could hang for the full upstream client timeout (180s on Claude / Cowork) when the user's code failed fast — e.g. a TypeScript type annotation that workerd / QuickJS rejects with Unexpected token ':'. One hang would also poison the per-id JSON-RPC queue and cascade to subsequent same-id calls. Three independent fixes, ordered by impact:

(a) Root cause — Effect.raceEffect.raceFirst in awaitCompletionOrPause. Effect.race in Effect v4 has prefer-success semantics ("returns the first successful result"), so when the forked executor.code.exec fiber failed, the race waited forever for the pause Deferred to succeed. raceFirst settles on whichever side completes first, success or failure.

(b) Strip TypeScript types before evaluation. The execute tool description tells callers to write TypeScript and tools.describe.tool() hands them TypeScript shapes, but the JS-only sandboxes evaluate plain JavaScript. Run user code through sucrase's TypeScript-only transform inside @executor-js/codemode-core (stripTypeScript), called from both runtime-dynamic-worker and runtime-quickjs. runtime-deno-subprocess skips because Deno parses TS natively. Sucrase is pure JS, ~280KB, works in workerd. On parse failure the error surfaces as a typed runtime error (now actually visible thanks to fix (a)) instead of a silent timeout. Description updated to document the behaviour.

(c) Cap JsonRpcRequestIdQueue previous-request wait at 60s. Defense in depth on top of fix (a). A previously hung POST with a given JSON-RPC id used to hold inFlight forever; subsequent same-id calls (Cowork reuses id: 1) blocked on Promise.all(previous) indefinitely. Now caps at 60s with a warn log and proceeds.

Repro: executor.runtime.evaluate ERROR'd in 12ms but mcp.host.tool.execute / mcp.execute / mcp.peek_response parent spans never closed, while the worker-side mcp.do.handle_request ran 184s before the client gave up.

Fix (a) already deployed to production. (b) and (c) still need a deploy after merge.

Test plan

  • @executor-js/execution — 17/17 (3 new in engine.test.ts covering failure propagation through executeWithPause)
  • @executor-js/host-mcp — 25/25 (resume path uses the same awaitCompletionOrPause)
  • @executor-js/codemode-core — 37/37 (9 new in strip-types.test.ts covering annotations, casts, generics, interfaces, type aliases, regression for the customer's failure shape)
  • @executor-js/runtime-dynamic-worker — 42/42
  • @executor-js/runtime-quickjs — 8/8 (now also goes through stripTypeScript)
  • worker queue tests — 4/4 (regression: hung previous request no longer blocks new same-id call)
  • tsgo --noEmit clean on all modified packages
  • Re-trigger the syntax-error path on a fresh Cowork session, watch for clean error response in <100ms

…ilure

Effect.race in Effect v4 has prefer-success semantics. When the
codeExecutor fiber failed fast (e.g. user code with a TS type
annotation, syntax error in <30ms), the race would keep waiting
for the pause Deferred to succeed and never settle. Outer Effect
hung until the upstream client gave up at 180s, which then
poisoned the per-id JSON-RPC queue and cascaded to subsequent
calls on the same MCP session.

raceFirst returns the first effect to complete, success or
failure, so a fiber failure now propagates immediately.
The execute tool description tells callers to write TypeScript and the
describe.tool output hands them TypeScript shapes, but workerd only
evaluates plain JavaScript. A single ': T' annotation in user code
threw 'Unexpected token' inside the dynamic worker, which used to
manifest as a 180s client timeout (now a clean DynamicWorkerExecutionError
after the engine race fix).

Run user code through sucrase's TypeScript-only transform before
buildExecutorModule. Sucrase is pure JS, ~280KB, works in workerd, and
is the same syntactic-only transform Node's experimental
strip-types feature uses. On parse failure the error surfaces as a
typed DynamicWorkerExecutionError so the model gets actionable
feedback instead of a silent timeout.

Description gets one extra rule documenting the behaviour so callers
know decorators / enum aren't supported.
…request can't poison subsequent calls

A previously hung POST with a given JSON-RPC id (e.g. id: 1, which
Cowork reuses across calls) used to hold inFlight forever, so any
subsequent call with the same id would block on Promise.all(previous)
indefinitely. Mostly moot now that the engine race fix prevents the
root hang, but keep this as defense in depth: cap the wait at 60s and
log if we hit the cap, then proceed with the new request.

Constructor accepts an override so tests can drive the timeout to
100ms instead of waiting wall-clock 60s in CI.
@cloudflare-workers-and-pages
Copy link
Copy Markdown

cloudflare-workers-and-pages Bot commented May 4, 2026

Deploying with  Cloudflare Workers  Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status Name Latest Commit Preview URL Updated (UTC)
✅ Deployment successful!
View logs
executor-marketing fe178e4 Commit Preview URL

Branch Preview URL
May 04 2026, 09:09 PM

@cloudflare-workers-and-pages
Copy link
Copy Markdown

cloudflare-workers-and-pages Bot commented May 4, 2026

Deploying with  Cloudflare Workers  Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status Name Latest Commit Updated (UTC)
✅ Deployment successful!
View logs
executor-cloud fe178e4 May 04 2026, 09:10 PM

@pkg-pr-new
Copy link
Copy Markdown

pkg-pr-new Bot commented May 4, 2026

Open in StackBlitz

@executor-js/codemode-core

npm i https://pkg.pr.new/@executor-js/codemode-core@523

@executor-js/runtime-quickjs

npm i https://pkg.pr.new/@executor-js/runtime-quickjs@523

@executor-js/cli

npm i https://pkg.pr.new/@executor-js/cli@523

@executor-js/config

npm i https://pkg.pr.new/@executor-js/config@523

@executor-js/execution

npm i https://pkg.pr.new/@executor-js/execution@523

@executor-js/sdk

npm i https://pkg.pr.new/@executor-js/sdk@523

@executor-js/storage-core

npm i https://pkg.pr.new/@executor-js/storage-core@523

@executor-js/plugin-file-secrets

npm i https://pkg.pr.new/@executor-js/plugin-file-secrets@523

@executor-js/plugin-google-discovery

npm i https://pkg.pr.new/@executor-js/plugin-google-discovery@523

@executor-js/plugin-graphql

npm i https://pkg.pr.new/@executor-js/plugin-graphql@523

@executor-js/plugin-keychain

npm i https://pkg.pr.new/@executor-js/plugin-keychain@523

@executor-js/plugin-mcp

npm i https://pkg.pr.new/@executor-js/plugin-mcp@523

@executor-js/plugin-onepassword

npm i https://pkg.pr.new/@executor-js/plugin-onepassword@523

@executor-js/plugin-openapi

npm i https://pkg.pr.new/@executor-js/plugin-openapi@523

executor

npm i https://pkg.pr.new/executor@523

commit: fe178e4

…n runtime-quickjs too

Originally scoped to runtime-dynamic-worker but the same TS-strip step
applies to any JavaScript-only sandbox. QuickJS doesn't understand TS
either, so a typed annotation in user code would surface as a generic
QuickJsExecutionError instead of a clear syntax error. Lift
stripTypeScript into kernel/core (alongside recoverExecutionBody),
re-export from @executor-js/codemode-core, and call it from both
runtime-dynamic-worker and runtime-quickjs. runtime-deno-subprocess
still skips because Deno parses TS natively.

Tests move with the implementation (kernel/core/src/strip-types.test.ts).
Sucrase dep moves from runtime-dynamic-worker to kernel/core.
Repo-wide oxlint rule enforces @effect/vitest over plain vitest for
test helpers. Add the missing devDep to kernel/core and update the
two new test files.
@RhysSullivan RhysSullivan merged commit aeec8aa into main May 5, 2026
9 checks passed
@RhysSullivan RhysSullivan mentioned this pull request May 10, 2026
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant