Skip to content

Fix saveMessages cancellation race with external AbortSignal#1411

Merged
threepointone merged 3 commits intomainfrom
fix/savemessages-external-abort-signal
Apr 28, 2026
Merged

Fix saveMessages cancellation race with external AbortSignal#1411
threepointone merged 3 commits intomainfrom
fix/savemessages-external-abort-signal

Conversation

@threepointone
Copy link
Copy Markdown
Contributor

@threepointone threepointone commented Apr 28, 2026

Summary

Resolves #1406: Think.saveMessages (and the equivalent AIChatAgent.saveMessages) had no race-free way for callers to cancel an in-flight programmatic turn. The previous workaround required reaching into private _aborts state and still had a window where a cancel could land between turn-id assignment and registry insertion.

saveMessages and continueLastTurn now accept options.signal: AbortSignal. The external signal is bridged into the per-turn AbortRegistry controller via a new AbortRegistry.linkExternal(id, signal), which:

  • handles already-aborted signals synchronously,
  • attaches a one-shot abort listener with a returned detacher,
  • propagates signal.reason into the registry's cancel(id, reason),
  • coexists cleanly with MSG_CHAT_CANCEL, internal cancellation, and per-request listener cleanup.

SaveMessagesResult.status now includes "aborted" alongside "completed" and "skipped". Existing callers that only switch on "completed" are unaffected. Two new protected helpers — abortRequest(id, reason?) and abortAllRequests() — replace the historical (this as { _aborts: ... })._aborts.destroyAll() pattern.

The agents-as-tools example is updated to use the new contract: each helper turn now owns a local AbortController whose signal is passed to saveMessages, and the returned ReadableStream's cancel callback aborts that controller. This is the canonical pattern for bridging a parent DO's intent into a child DO across RPC.

Changes

Core (packages/agents)

  • AbortRegistry.linkExternal(id, signal) + cancel(id, reason) accepting an explicit reason.
  • New SaveMessagesOptions type exported from chat/.
  • SaveMessagesResult.status extended with "aborted".

Think (packages/think)

  • saveMessages(messages, options?) and continueLastTurn(options?) accept options.signal.
  • Protected abortRequest() / abortAllRequests() methods.
  • External signal lifecycle is wired through the turn-queue/inference loop with detacher cleanup in finally.

AIChatAgent (packages/ai-chat)

  • Mirror of the Think changes for saveMessages/continueLastTurn, including _runProgrammaticChatTurn honoring the external signal.

Example (examples/agents-as-tools)

  • Helper turn uses a local AbortController; stream cancel aborts it.
  • Removed the legacy _aborts bracket-access workaround.

Tests

  • 7 new AbortRegistry unit tests covering pre-abort, deferred abort, listener leakage across many turns, double-abort, coexistence with internal cancel, and reason propagation.
  • Type-level tests for SaveMessagesOptions (tests-d/save-messages-options.test-d.ts).
  • 9 new Think integration tests and 5 new AIChatAgent integration tests for completion / pre-abort / mid-stream abort / post-completion no-op / public abortAllRequests() / leak prevention.
  • Helper-stream end-to-end tests rewritten to use new test seams (testRunHelperPreCancelled, testRunHelperMidCancelled, getAbortRegistrySize, waitForAbortRegistryDrained) plus a delayed mock model to make the cancel callback land mid-inference reliably.

Docs

  • docs/think/sub-agents.md: options.signal, "aborted" status, "Crossing DO boundaries", "Hibernation and recovery".
  • docs/server-driven-messages.md: cancellation section + limitations.
  • docs/chat-agents.md: updated signatures and behavior.
  • docs/think/lifecycle-hooks.md: onChatResponse example for aborted status.
  • packages/ai-chat/README.md: usage example referencing Think.saveMessages should accept an external AbortSignal so callers can cancel an in-flight turn from outside #1406.
  • examples/agents-as-tools/README.md: updated B4 cancellation note.
  • wip/inline-sub-agent-events.md: B4 marked resolved.

Changesets (patch per repo convention for pre-1.0 additive features):

  • agents-abort-registry-link-external.md
  • think-savemessages-abort-signal.md
  • ai-chat-savemessages-abort-signal.md

Limitations (documented)

  • AbortSignal cannot cross Durable Object RPC. Construct the controller inside the DO that calls saveMessages. To bridge a parent's intent into a child DO, return a ReadableStream from the child whose cancel callback aborts a per-turn controller — see examples/agents-as-tools.
  • The signal lives in memory only. If the DO hibernates mid-turn and chatRecovery is enabled, the recovered turn runs without the original signal.

Test plan

  • pnpm -w turbo run test --filter=@cloudflare/agents --filter=@cloudflare/think --filter=@cloudflare/ai-chat --filter=agents-as-tools-example passes
  • pnpm -w turbo run typecheck lint passes
  • oxfmt --check passes (auto-applied via lint-staged on commit)
  • New AbortRegistry unit tests cover pre-abort, deferred abort, listener leak prevention, reason propagation
  • Think + AIChatAgent integration tests cover pre-abort, mid-stream abort, post-completion no-op, public abortAllRequests()
  • Helper-stream tests verify both pre-cancel and mid-cancel paths against the new race-free contract
  • CI green on all matrices

Closes #1406.

Made with Cursor


Open in Devin Review

Resolves the race in `Think.saveMessages` and `AIChatAgent.saveMessages`
where callers had no way to safely cancel an in-flight programmatic turn
without reaching into private `_aborts` state. `saveMessages` and
`continueLastTurn` now accept `options.signal`, the signal is bridged
into the per-turn `AbortRegistry` controller via a new
`AbortRegistry.linkExternal()`, and `SaveMessagesResult.status` reports
`"aborted"` when the external signal fires. Adds protected
`abortRequest()` / `abortAllRequests()` helpers so subclasses no longer
need bracket-access workarounds.

Updates the `agents-as-tools` example to use the new contract, expands
unit + integration coverage (registry, Think, AIChatAgent, helper
stream), and documents the API plus its DO RPC and hibernation
limitations.

See #1406.

Made-with: Cursor
@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented Apr 28, 2026

🦋 Changeset detected

Latest commit: 49f3ec0

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages
Name Type
agents Patch
@cloudflare/ai-chat Patch
@cloudflare/think Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

devin-ai-integration[bot]

This comment was marked as resolved.

@pkg-pr-new
Copy link
Copy Markdown

pkg-pr-new Bot commented Apr 28, 2026

Open in StackBlitz

agents

npm i https://pkg.pr.new/agents@1411

@cloudflare/ai-chat

npm i https://pkg.pr.new/@cloudflare/ai-chat@1411

@cloudflare/codemode

npm i https://pkg.pr.new/@cloudflare/codemode@1411

hono-agents

npm i https://pkg.pr.new/hono-agents@1411

@cloudflare/shell

npm i https://pkg.pr.new/@cloudflare/shell@1411

@cloudflare/think

npm i https://pkg.pr.new/@cloudflare/think@1411

@cloudflare/voice

npm i https://pkg.pr.new/@cloudflare/voice@1411

@cloudflare/worker-bundler

npm i https://pkg.pr.new/@cloudflare/worker-bundler@1411

commit: 49f3ec0

Fixes a listener leak in `_runProgrammaticChatTurn` where
`linkExternal` was called before the `runFiber` boundary while the
`try/finally` that calls `detachExternal()` lived inside
`programmaticBody`. If `runFiber` itself threw — e.g. a SQLite error
inserting the fiber row, or `keepAlive()` failing — the body never
ran, so the external-signal listener was never removed and the
registry entry was never cleaned up. Long-lived parent signals
driving many helper turns would accumulate listeners across failures.

The fix mirrors the structure already used by `Think.saveMessages`
and `Think.continueLastTurn`: the `try/finally` now wraps both the
`runFiber` and direct-call branches, so cleanup runs regardless of
where the throw originates. `AIChatAgent.continueLastTurn` was already
structurally safe (linkExternal runs *inside* the runFiber boundary)
and is unchanged.

Adds a regression test that monkey-patches `runFiber` to throw
synchronously and asserts the abort registry drains and no listener
remains on the external signal.

Made-with: Cursor
Use `Parameters<...>` of the bound `addEventListener`/`removeEventListener`
overloads (instead of redeclaring the signature) and reference the
non-polymorphic `RecoverySlowStreamAgent["runFiber"]` so `typeof this`
doesn't leak into the cast. Behavior is unchanged.

Made-with: Cursor
@threepointone threepointone merged commit 2fa68be into main Apr 28, 2026
2 checks passed
@threepointone threepointone deleted the fix/savemessages-external-abort-signal branch April 28, 2026 22:34
@github-actions github-actions Bot mentioned this pull request Apr 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Think.saveMessages should accept an external AbortSignal so callers can cancel an in-flight turn from outside

1 participant