fix: prevent 180s timeout cascade on dynamic worker failures #523
Merged
…ilure
Effect.race in Effect v4 has prefer-success semantics. When the codeExecutor fiber failed fast (e.g. user code with a TS type annotation, which throws a syntax error in under 30ms), the race kept waiting for the pause Deferred to succeed and never settled. The outer Effect hung until the upstream client gave up at 180s, which then poisoned the per-id JSON-RPC queue and cascaded to subsequent calls on the same MCP session. raceFirst returns the first effect to complete, success or failure, so a fiber failure now propagates immediately.
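The difference between the two race semantics can be reproduced without Effect. The sketch below uses plain JavaScript promises as an analogy, not Effect's API: the hypothetical helper `raceFirstSuccess` models the prefer-success behaviour described above, and the built-in `Promise.race` models `raceFirst`. The exec side fails fast (after 12ms here) and the pause side, like a Deferred nobody completes, never settles. `settleWithin` and `compareRaces` are also illustrative names.

```typescript
type Outcome = "fulfilled" | "rejected" | "pending";

// Prefer-success race (hypothetical helper): settles with the first fulfilment and
// rejects only once every input has rejected.
function raceFirstSuccess<T>(inputs: Promise<T>[]): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const errors: unknown[] = [];
    let rejectedCount = 0;
    inputs.forEach((input, index) => {
      input.then(resolve, (error: unknown) => {
        errors[index] = error;
        rejectedCount += 1;
        if (rejectedCount === inputs.length) reject(errors);
      });
    });
  });
}

// Reports how `p` settled, or "pending" if it has not settled within `ms` milliseconds.
async function settleWithin(p: Promise<unknown>, ms: number): Promise<Outcome> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<Outcome>((resolve) => {
    timer = setTimeout(() => resolve("pending"), ms);
  });
  const settled = p.then(
    (): Outcome => "fulfilled",
    (): Outcome => "rejected",
  );
  const result = await Promise.race([settled, deadline]);
  clearTimeout(timer);
  return result;
}

// Models awaitCompletionOrPause: the exec side fails fast, the pause side never completes.
async function compareRaces(): Promise<{ preferSuccess: Outcome; firstSettled: Outcome }> {
  const failingExec = () =>
    new Promise<never>((_, reject) =>
      setTimeout(() => reject(new SyntaxError("Unexpected token ':'")), 12),
    );
  const pauseNeverCompletes = new Promise<never>(() => {});

  // Prefer-success: the early rejection is swallowed while the other side is still pending.
  const preferSuccess = await settleWithin(raceFirstSuccess([failingExec(), pauseNeverCompletes]), 200);
  // First-settled: the rejection wins as soon as it happens.
  const firstSettled = await settleWithin(Promise.race([failingExec(), pauseNeverCompletes]), 200);
  return { preferSuccess, firstSettled };
}
```

`compareRaces()` resolves to `{ preferSuccess: "pending", firstSettled: "rejected" }`. The 200ms deadline exists only so the demo terminates; without it the prefer-success race would stay pending for as long as the pause side does, which is how the outer call ended up waiting for the client's 180s timeout.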
The execute tool description tells callers to write TypeScript, and the describe.tool output hands them TypeScript shapes, but workerd only evaluates plain JavaScript. A single ': T' annotation in user code threw 'Unexpected token' inside the dynamic worker, which used to manifest as a 180s client timeout (now a clean DynamicWorkerExecutionError after the engine race fix). Run user code through sucrase's TypeScript-only transform before buildExecutorModule. Sucrase is pure JS (~280KB), works in workerd, and does the same kind of syntax-only type stripping as Node's experimental strip-types feature. On parse failure the error surfaces as a typed DynamicWorkerExecutionError, so the model gets actionable feedback instead of a silent timeout. The description gets one extra rule documenting this behaviour so callers know decorators and enums aren't supported.
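The failure mode and the error-wrapping step can be sketched without a sandbox. This sketch assumes a Node.js host: V8's `AsyncFunction` constructor (reached through `Object.getPrototypeOf(async function () {}).constructor`) stands in for a JavaScript-only engine, compiling the body without running it and rejecting a `: number` annotation with a `SyntaxError`. It is not how the worker itself evaluates code. `javaScriptParseError`, `stripForSandbox`, and `TypeStripError` are hypothetical names; the strip function is injected so the sketch has no dependencies, and with sucrase it would be `(code) => transform(code, { transforms: ["typescript"] }).code`.

```typescript
// Stand-in for a JavaScript-only engine's parser: compiles an async function body
// (so `await` and `return` are allowed) without executing it.
const AsyncFunction = Object.getPrototypeOf(async function () {}).constructor as new (
  body: string,
) => () => Promise<unknown>;

// Returns the SyntaxError message if `code` is not valid plain JavaScript, else undefined.
function javaScriptParseError(code: string): string | undefined {
  try {
    new AsyncFunction(code);
    return undefined;
  } catch (error) {
    if (error instanceof SyntaxError) return error.message;
    throw error;
  }
}

// Hypothetical tagged error; the PR surfaces DynamicWorkerExecutionError instead.
class TypeStripError extends Error {
  readonly _tag = "TypeStripError";
}

type StripResult = { ok: true; code: string } | { ok: false; error: TypeStripError };

// Hypothetical wrapper: a strip failure becomes a typed value the caller can report,
// rather than an exception thrown somewhere inside the sandbox.
function stripForSandbox(code: string, stripTypes: (code: string) => string): StripResult {
  try {
    return { ok: true, code: stripTypes(code) };
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    return { ok: false, error: new TypeStripError(`could not strip TypeScript: ${message}`) };
  }
}
```

For example, `javaScriptParseError("const total: number = 1 + 2;\nreturn total;")` returns V8's `SyntaxError` message, while the same code without the annotation returns `undefined`.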
…request can't poison subsequent calls
A previously hung POST with a given JSON-RPC id (e.g. id: 1, which Cowork reuses across calls) used to hold inFlight forever, so any subsequent call with the same id blocked on Promise.all(previous) indefinitely. This is mostly moot now that the engine race fix prevents the root hang, but keep it as defense in depth: cap the wait at 60s, log if the cap is hit, then proceed with the new request. The constructor accepts an override so tests can drive the timeout down to 100ms instead of waiting 60s of wall-clock time in CI.
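A minimal sketch of the capped wait, assuming a per-id map of in-flight promises like the one described above. `CappedRequestIdQueue` and its `run` method are hypothetical and are not the real `JsonRpcRequestIdQueue` API. Each request registers a slot before its first `await`, waits for earlier same-id slots with `Promise.all` raced against a timer, warns if the timer wins, then runs its own task. As in the commit, the cap is a constructor parameter so a test can use 100ms.

```typescript
type RequestId = string | number;

// Hypothetical sketch of a per-id request queue with a capped wait.
class CappedRequestIdQueue {
  private readonly inFlight = new Map<RequestId, Promise<void>[]>();

  constructor(
    private readonly maxWaitMs = 60_000, // tests can pass e.g. 100
    private readonly warn: (message: string) => void = console.warn,
  ) {}

  async run<T>(id: RequestId, task: () => Promise<T>): Promise<T> {
    // Snapshot earlier same-id requests and register this one synchronously,
    // so a concurrent caller with the same id queues behind it.
    const previous = this.inFlight.get(id) ?? [];
    let release: () => void = () => {};
    const slot = new Promise<void>((resolve) => {
      release = () => resolve();
    });
    this.inFlight.set(id, [...previous, slot]);

    try {
      await this.waitCapped(id, previous);
      return await task();
    } finally {
      release();
      const remaining = (this.inFlight.get(id) ?? []).filter((p) => p !== slot);
      if (remaining.length === 0) this.inFlight.delete(id);
      else this.inFlight.set(id, remaining);
    }
  }

  // Waits for earlier slots, but never longer than maxWaitMs.
  private async waitCapped(id: RequestId, previous: Promise<void>[]): Promise<void> {
    if (previous.length === 0) return;
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timedOut = new Promise<"timeout">((resolve) => {
      timer = setTimeout(() => resolve("timeout"), this.maxWaitMs);
    });
    const outcome = await Promise.race([Promise.all(previous).then(() => "done" as const), timedOut]);
    clearTimeout(timer);
    if (outcome === "timeout") {
      this.warn(`request id ${String(id)}: previous request still pending after ${this.maxWaitMs}ms; proceeding`);
    }
  }
}
```

One consequence of this design: a request that never settles keeps its slot, so each later request with the same id still waits the full cap once before proceeding. That is tolerable for a backstop, since the engine race fix is what removes the hang itself.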
Deploying with

| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
|---|---|---|---|---|
| ✅ Deployment successful! View logs | executor-marketing | fe178e4 | Commit Preview URL, Branch Preview URL | May 04 2026, 09:09 PM |
Deploying with

| Status | Name | Latest Commit | Updated (UTC) |
|---|---|---|---|
| ✅ Deployment successful! View logs | executor-cloud | fe178e4 | May 04 2026, 09:10 PM |
- `@executor-js/codemode-core`
- `@executor-js/runtime-quickjs`
- `@executor-js/cli`
- `@executor-js/config`
- `@executor-js/execution`
- `@executor-js/sdk`
- `@executor-js/storage-core`
- `@executor-js/plugin-file-secrets`
- `@executor-js/plugin-google-discovery`
- `@executor-js/plugin-graphql`
- `@executor-js/plugin-keychain`
- `@executor-js/plugin-mcp`
- `@executor-js/plugin-onepassword`
- `@executor-js/plugin-openapi`
- `executor`

commit:
…n runtime-quickjs too
Originally scoped to runtime-dynamic-worker, but the same TS-strip step applies to any JavaScript-only sandbox. QuickJS doesn't understand TS either, so a type annotation in user code would surface as a generic QuickJsExecutionError instead of a clear syntax error. Lift stripTypeScript into kernel/core (alongside recoverExecutionBody), re-export it from @executor-js/codemode-core, and call it from both runtime-dynamic-worker and runtime-quickjs. runtime-deno-subprocess still skips the step because Deno parses TS natively. Tests move with the implementation (kernel/core/src/strip-types.test.ts), and the sucrase dependency moves from runtime-dynamic-worker to kernel/core.
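The per-runtime decision reduces to a small lookup: strip where the engine only understands JavaScript, pass through where it parses TypeScript itself. In this sketch the runtime names, the `parsesTypeScriptNatively` table, and `sourceForRuntime` are hypothetical, and `stripTypeScript` is assumed to be a plain string-to-string function (the real export's signature may differ).

```typescript
// Hypothetical names for the three runtimes discussed in this PR.
type RuntimeKind = "dynamic-worker" | "quickjs" | "deno-subprocess";

// Whether each runtime's engine parses TypeScript itself (per the commit, only Deno does).
const parsesTypeScriptNatively: Record<RuntimeKind, boolean> = {
  "dynamic-worker": false, // workerd evaluates plain JavaScript
  quickjs: false, // QuickJS evaluates plain JavaScript
  "deno-subprocess": true, // Deno parses TypeScript natively
};

// One shared strip function, applied only where the engine needs it.
function sourceForRuntime(
  kind: RuntimeKind,
  code: string,
  stripTypeScript: (code: string) => string,
): string {
  return parsesTypeScriptNatively[kind] ? code : stripTypeScript(code);
}
```

Keeping a single strip implementation in the shared core means every JavaScript-only runtime gets the same behaviour and the same error shape, and adding another such runtime is a one-line table change.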
Repo-wide oxlint rule enforces @effect/vitest over plain vitest for test helpers. Add the missing devDep to kernel/core and update the two new test files.
Summary
Tool calls against any JavaScript-only sandbox could hang for the full upstream client timeout (180s on Claude / Cowork) when the user's code failed fast, e.g. a TypeScript type annotation that workerd / QuickJS rejects with `Unexpected token ':'`. One hang would also poison the per-id JSON-RPC queue and cascade to subsequent same-id calls.

Three independent fixes, ordered by impact:

(a) Root cause: `Effect.race` → `Effect.raceFirst` in `awaitCompletionOrPause`. `Effect.race` in Effect v4 has prefer-success semantics ("returns the first successful result"), so when the forked `executor.code.exec` fiber failed, the race waited forever for the pause Deferred to succeed. `raceFirst` settles on whichever side completes first, success or failure.

(b) Strip TypeScript types before evaluation. The execute tool description tells callers to write TypeScript and `tools.describe.tool()` hands them TypeScript shapes, but the JS-only sandboxes evaluate plain JavaScript. Run user code through sucrase's TypeScript-only transform inside `@executor-js/codemode-core` (`stripTypeScript`), called from both `runtime-dynamic-worker` and `runtime-quickjs`. `runtime-deno-subprocess` skips the step because Deno parses TS natively. Sucrase is pure JS, ~280KB, and works in workerd. On parse failure the error surfaces as a typed runtime error (now actually visible thanks to fix (a)) instead of a silent timeout. The tool description is updated to document the behaviour.

(c) Cap the `JsonRpcRequestIdQueue` previous-request wait at 60s. Defense in depth on top of fix (a). A previously hung POST with a given JSON-RPC id used to hold `inFlight` forever; subsequent same-id calls (Cowork reuses `id: 1`) blocked on `Promise.all(previous)` indefinitely. The wait now caps at 60s, logs a warning, and proceeds.

Repro: `executor.runtime.evaluate` ERROR'd in 12ms, but the `mcp.host.tool.execute` / `mcp.execute` / `mcp.peek_response` parent spans never closed, while the worker-side `mcp.do.handle_request` ran 184s before the client gave up.

Fix (a) is already deployed to production; (b) and (c) still need a deploy after merge.
Test plan
- `@executor-js/execution`: 17/17 (3 new in `engine.test.ts` covering failure propagation through `executeWithPause`)
- `@executor-js/host-mcp`: 25/25 (the resume path uses the same `awaitCompletionOrPause`)
- `@executor-js/codemode-core`: 37/37 (9 new in `strip-types.test.ts` covering annotations, casts, generics, interfaces, type aliases, and a regression for the customer's failure shape)
- `@executor-js/runtime-dynamic-worker`: 42/42
- `@executor-js/runtime-quickjs`: 8/8 (now also goes through `stripTypeScript`)
- `tsgo --noEmit` clean on all modified packages