Inline sub-agent event streaming: agents-as-tools example + design notes #1405
threepointone merged 28 commits into main
Lets non-chat consumers (e.g. helper sub-agents that share a WebSocket
with a parent's chat) stamp replay frames with a distinct wire-type tag
without colliding with the chat protocol. Defaults to
CHAT_MESSAGE_TYPES.USE_CHAT_RESPONSE so existing AIChatAgent and Think
callers preserve byte-identical behavior; all 28 existing
resumable-streaming.test.ts regression tests pass unchanged.
Also exports the new ResumableStreamOptions type from agents/chat.
This is the smaller version of the fix proposed in #1377: the
tablePrefix part of that proposal is intentionally omitted because the
recommended pattern for "events alongside a chat" puts helper events on
the helper sub-agent's own DO (so collisions are impossible by
isolation). See wip/inline-sub-agent-events.md for the full design.
Refs: #1377
Made-with: Cursor
…rwarded events
A focused minimal proof of the agents-as-tools pattern: during a
single chat turn, the assistant (a Think agent) dispatches a helper
sub-agent (Researcher) to do multi-step work, and the helper's
lifecycle events stream live into the chat UI.
Architecture (per the 2026-04-27 design pivot in
wip/inline-sub-agent-events.md):
- Researcher is a real sub-agent with its own DO and SQLite. It owns
its own ResumableStream configured with messageType: "helper-event"
so its replay frames don't collide with the chat protocol.
- Researcher.startAndStream(query, helperId) returns a
ReadableStream<{ sequence, body }> over DO RPC. Each emitted event
is durably stored on the helper's own SQLite before being written
to the stream.
- Assistant's tool execute reads from the helper's stream and
broadcasts each event onto the chat WS via this.broadcast(...).
Browser keeps one WS to the parent — no second connection.
- Assistant maintains a tiny active_helpers table for reconnect
replay: on onConnect, it walks the table, fetches each in-flight
helper's stored events via DO RPC, and forwards them as
replay: true frames to the connecting client.
- Assistant.onStart sweeps stale active_helpers rows from a previous
parent crash and deletes the leaked helper facets.
Multi-ai-chat compatible because helpers are sub-agents of whichever
DO terminates the WebSocket — top-level Assistant in this demo, or
a Chat (which is itself a sub-agent of Inbox) in multi-ai-chat. The
forwarding pattern is identical at any nesting depth.
Drill-in to a specific helper (a separate `useAgent({ sub: [...] })`
connection direct to the helper) is supported structurally by the
routing primitive but not wired in the example UI yet.
Wire protocol (src/protocol.ts) is shared between server and client,
zero Worker-runtime imports — the front-end bundle never transitively
pulls in agents/Think/workers-ai-provider through a stray value
import. The protocol carries six event kinds (started / step /
tool-call / tool-result / finished / error) plus a sequence field
for client-side dedup against the reconnect-window race where one
event can arrive both as a replay frame and as a live broadcast.
The client renders helper events inline under the matching tool
part in the assistant message, dedupes by (parentToolCallId,
sequence) Set semantics, and inserts in sorted position so the
timeline always renders in helper-emit order regardless of wire
arrival order.
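A minimal sketch of the client-side dedup and sorted insert described above. The frame shape and the `HelperTimelines` class are illustrative assumptions; only `parentToolCallId`, `sequence`, the six event kinds, and the `replay` flag come from the commit text.

```typescript
// Hypothetical frame shape for illustration; not the example's protocol.ts.
type HelperEventKind =
  | "started" | "step" | "tool-call" | "tool-result" | "finished" | "error";

interface HelperEventFrame {
  parentToolCallId: string;
  sequence: number;
  kind: HelperEventKind;
  replay?: boolean;
}

class HelperTimelines {
  private seen = new Set<string>();
  private byToolCall = new Map<string, HelperEventFrame[]>();

  // Returns false when the frame is a duplicate, e.g. the same event
  // arriving once as a replay frame and once as a live broadcast.
  add(frame: HelperEventFrame): boolean {
    const key = JSON.stringify([frame.parentToolCallId, frame.sequence]);
    if (this.seen.has(key)) return false;
    this.seen.add(key);

    const events = this.byToolCall.get(frame.parentToolCallId) ?? [];
    // Insert before the first event with a larger sequence so the list
    // stays in helper-emit order regardless of wire arrival order.
    const at = events.findIndex((e) => e.sequence > frame.sequence);
    events.splice(at === -1 ? events.length : at, 0, frame);
    this.byToolCall.set(frame.parentToolCallId, events);
    return true;
  }

  eventsFor(parentToolCallId: string): readonly HelperEventFrame[] {
    return this.byToolCall.get(parentToolCallId) ?? [];
  }
}
```

Keeping the seen-set separate from the rendered list means a duplicate is rejected before it can touch the timeline.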
Adds a one-line cross-reference from examples/ai-chat/README.md.
Refs: #1377
Made-with: Cursor
Captures the design context behind agents-as-tools and the messageType ctor option on ResumableStream. Frames issue #1377 as the visible tip of a missing first-class pattern (helpers running inside a single chat turn with their lifecycle events streaming into the chat UI), walks through the original "multi-channel ResumableStream on the parent DO" design, and records the 2026-04-27 pivot to "helpers as sub-agents with parent-forwarded events." The pivot rationale, condensed: - State containment. A helper's events belong to the helper's work, not the chat's. Putting them on the helper's own DO is the honest representation. Persistent / inspectable / drill-in-able helpers follow naturally; they don't with parent-side storage. - Reuses the routing primitive instead of inventing. Drill-in, addressing, lifecycle, parent/child RPC are all already shipped. - Smaller framework change. One ctor option (messageType) instead of multi-channel schema + per-channel state machine. Tables don't need to be parameterized — each DO has its own SQLite, so collisions are impossible by isolation. - Multi-ai-chat-compatible. The forwarding pattern works recursively; "the parent" is whichever DO terminates the WS. The doc preserves the original multi-channel design as a record of the design space that was explored. Multi-channel ResumableStream is parked, not killed — if a future use case needs one DO to multiplex N independent durable streams to its own clients (e.g. a workflow DO with parallel tracks), the design is captured and ready. Stages 1 and 2 land alongside this doc: - Stage 1: messageType ctor option on ResumableStream (back-compat patch to agents). - Stage 2: examples/agents-as-tools — focused minimal proof of helpers-as-sub-agents. 
Open questions around event vocabulary (Ring 2), parallel helpers (empirically should work, v0.2 stress-test follow-up), persistent lifetime (Ring 5), and the AIChatAgent port (Ring 6) are explicit in the doc as the things the example needs to validate before Stage 3's RFC commits to a public API. Made-with: Cursor
Tighten the agents-as-tools prototype based on runtime testing and
review feedback.
Key runtime fix: helper event streams now use byte chunks over DO RPC.
`Researcher.startAndStream` returns `ReadableStream<Uint8Array>` where
each chunk is NDJSON (`{ sequence, body }`) because workerd's DO RPC
stream bridge only transports byte streams. Object chunks caused the
parent's first `reader.read()` to fail with the opaque
"Network connection lost" error. The parent now decodes bytes,
splits on newlines, and forwards each parsed helper event frame.
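The decode-and-split step on the parent side could look roughly like the sketch below. `NdjsonLineDecoder` and `HelperEnvelope` are hypothetical names, and typing `body` as `unknown` is an assumption; the commit only states that each chunk carries NDJSON `{ sequence, body }` lines.

```typescript
// Hypothetical envelope type; the commit names only `sequence` and `body`.
interface HelperEnvelope {
  sequence: number;
  body: unknown;
}

class NdjsonLineDecoder {
  private decoder = new TextDecoder();
  private buffer = "";

  // Feed one byte chunk; returns every complete line parsed so far.
  // A trailing partial line is kept in the buffer until its newline arrives.
  push(chunk: Uint8Array): HelperEnvelope[] {
    // `stream: true` keeps multi-byte characters split across chunks intact.
    this.buffer += this.decoder.decode(chunk, { stream: true });
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop() ?? "";
    return lines
      .filter((line) => line.trim() !== "")
      .map((line) => JSON.parse(line) as HelperEnvelope);
  }
}
```

Chunk boundaries on a byte stream need not line up with event boundaries, which is why the decoder buffers the tail instead of parsing each chunk as a whole.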
Also improves helper error semantics: helper-side failures are still
emitted as inline `error` events, but the parent now turns them into
real tool failures instead of returning a successful empty summary.
If no `finished` event produces a summary, the tool fails explicitly.
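As a sketch of those error semantics, under the assumption that `finished` carries a `summary` string and `error` carries an `error` string (both field names are illustrative, as is `resolveToolResult`):

```typescript
// Hypothetical event shapes; only the kind names come from the commit text.
type HelperEvent =
  | { kind: "finished"; summary: string }
  | { kind: "error"; error: string }
  | { kind: "started" | "step" | "tool-call" | "tool-result" };

// Turns a helper's event list into a tool result, or throws so the
// tool call is reported as a real failure rather than an empty success.
function resolveToolResult(events: HelperEvent[]): string {
  let summary: string | undefined;
  for (const event of events) {
    if (event.kind === "error") {
      throw new Error(`Helper failed: ${event.error}`);
    }
    if (event.kind === "finished") {
      summary = event.summary;
    }
  }
  if (summary === undefined) {
    throw new Error("Helper ended without a finished event");
  }
  return summary;
}
```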
UI polish:
- add a Clear button wired to `clearHistory()` and local helper-event
cleanup
- reset helper-event state when messages are cleared from another tab
- replace the composer textarea with a single-line Input so Enter
submits normally
- render text and reasoning parts with Streamdown + @streamdown/code,
using the standard Kumo theme bridge
- clean Tailwind class warnings in JSON pre blocks
Operational cleanup:
- bump the example compat date to 2026-04-15 so partyserver 0.5.x can
rely on `ctx.id.name` inside alarm handlers
- update README and WIP notes to describe the byte-stream design, the
Streamdown UI, and the follow-up issues filed from the prototype
- remove the duplicate DEMO_USER export from server.ts so protocol.ts
stays the single runtime-safe shared module
Issues filed while debugging:
- cloudflare/partykit#390 for fresh partyserver 0.5.x DOs + old compat dates
- cloudflare/workerd#6675 for object ReadableStreams failing with
"Network connection lost"
- #1399 for discussing Rpc.Stub<T>-narrowed sub-agent types
Made-with: Cursor
Move the agents-as-tools helper timeline registry from a short-lived
active_helpers table to a durable cf_agent_helper_runs table. Helper
runs now track running/completed/error/interrupted status and retain
completed helper facets so timelines can replay after refresh even
after the assistant turn has completed.
Runtime behavior:
- insert running helper run before reading the helper event stream
- mark completed/error when the helper finishes
- keep the helper DO after completion because it owns the durable
ResumableStream event log
- on parent wake, mark any stale running rows as interrupted instead of
deleting helper facets
- on reconnect, replay stored events for all helper runs and append a
synthetic terminal error for interrupted runs (or malformed error runs
with no terminal event)
- Clear now calls clearHelperRuns before clearHistory, deleting retained
helper facets before the chat-clear broadcast to avoid a stale replay
race in other tabs
Also updates README and the WIP plan to reflect the hibernation state:
active-helper refresh, completed-helper refresh, and interrupted replay
now work; helper-side keepAliveWhile, helper fibers, live-tail
reattachment, TTL/count GC, and a hibernation test matrix remain open
follow-ups.
Made-with: Cursor
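The wake-time sweep and the synthetic terminal event on replay can be sketched over in-memory rows. The row and event shapes here are assumptions for illustration; the real registry lives in the `cf_agent_helper_runs` SQLite table.

```typescript
type RunStatus = "running" | "completed" | "error" | "interrupted";

// Hypothetical in-memory stand-ins for registry rows and stored events.
interface HelperRunRow {
  helperId: string;
  status: RunStatus;
}
interface StoredEvent {
  sequence: number;
  kind: string;
}

// On parent wake: a row still marked running belongs to a run that did
// not survive, so mark it interrupted. Other rows are left untouched,
// which makes the sweep idempotent.
function sweepOnWake(rows: HelperRunRow[]): HelperRunRow[] {
  return rows.map((row): HelperRunRow =>
    row.status === "running" ? { ...row, status: "interrupted" } : row
  );
}

// On reconnect: replay stored events, appending a synthetic terminal
// error for interrupted runs, or for error runs with no terminal event.
function framesForReplay(row: HelperRunRow, stored: StoredEvent[]): StoredEvent[] {
  const hasTerminal = stored.some((e) => e.kind === "finished" || e.kind === "error");
  const needsSynthetic =
    row.status === "interrupted" || (row.status === "error" && !hasTerminal);
  if (!needsSynthetic) return stored;
  const nextSequence =
    stored.length === 0 ? 0 : Math.max(...stored.map((e) => e.sequence)) + 1;
  return [...stored, { sequence: nextSequence, kind: "error" }];
}
```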
Wrap the Researcher helper's ReadableStream body in keepAliveWhile so
helper live execution has the same Agents-level lifecycle shape as a
main Think chat turn.
Today keepAlive() is a soft no-op on facets because workerd does not
yet support independent facet alarms, so this does not change crash
recovery semantics: active RPC stream / Promise chain liveness still
carries the run, and parent wake still marks running helper rows as
interrupted. Keeping the wrapper is intentional future-proofing for the
moment alarms start working inside facets, and keeps the helper code in
the shape the eventual framework helper should generate.
Update README and WIP notes to make that nuance explicit.
Made-with: Cursor
Adds 25 tests across four files modeled on examples/assistant/src/tests:
- registry.test.ts: cf_agent_helper_runs schema, the running →
interrupted onStart sweep, and that the sweep is idempotent and
leaves completed/error/interrupted rows untouched.
- clear-helper-runs.test.ts: empty-registry no-op, mixed-status
cleanup of both rows and helper sub-agents, idempotent re-clear,
and the production "missing sub-agent" best-effort path.
- helper-stream.test.ts: drives Researcher.startAndStream end-to-end
through subAgent and pins down the byte-stream contract — NDJSON
envelope, sequence monotonic from 0, started-first ordering,
durable storage round-trip, and the synthesize error path that
fires when env.AI is unbound.
- reconnect-replay.test.ts: every branch of Assistant.onConnect
replay — empty, completed, running, error-with-stored-terminal,
error-without-terminal (synthetic appended), interrupted (with
and without stored events), and multiple runs replayed in
started_at order with per-run sequence preserved.
Test worker subclasses Assistant and Researcher with a focused
seed/inspect surface (testSeedHelperRun, testReadHelperRuns,
testRunHelperToCompletion, Researcher.testWriteEvents) so tests can
construct lifecycle states without a Workers AI binding. Mirrors the
pattern in packages/ai-chat/src/tests/worker.ts; production code
stays untouched modulo a single private → protected on
Researcher._stream / .stream so the test subclass can write into the
helper's own ResumableStream the same way startAndStream does.
Made-with: Cursor
After comparing v0.1 against the screenshots in #1377 (comment
4328296343), pin down what's done vs. what's still needed for the
workload the OP is actually shipping. Refreshes three sections of the
WIP doc:
- New "Coverage gap vs. issue #1377's actual workload" subsection with a
status matrix mapping each piece of his use case to its implementation
status and where to find it.
- New "Decisions confirmed 2026-04-28" subsection: helpers must run
their own inference loop (Think-first; ai-chat later); parallel fan-out
is in scope and orchestrator-driven; tablePrefix is not shipping (the
pivot is the answer); per-helper drill-in is in scope; first-class
framework integration (helperTool) deferred until the protocol is
validated against multi-turn and parallel cases.
- Status block updated: vitest harness landed (25 tests, 4 files),
"Hibernation test matrix" bullet trimmed to the work that's actually
still missing (real eviction cycles), next-steps rewritten to reflect
the new ordering (multi-turn Think helper, parallel fan-out, then
drill-in detail UI), and an explicit "not in this near-term list"
call-out for tablePrefix / helperTool / AIChatAgent port.
Made-with: Cursor
… helper
Closes the "multi-turn helpers" gap from issue #1377's actual workload.
Implements "Option B" from wip/inline-sub-agent-events.md.
Server side:
- `Researcher` now `extends Think<Env>` with its own getModel,
getSystemPrompt, getTools (one simulated `web_search`). Helper runs are
real Think turns driven by `saveMessages`. Think's own
`_resumableStream` is the canonical durable event log; there is no
second `ResumableStream` on the helper, so the same-DO collision the
original #1377 was about cannot occur.
- Forwarder is wired by overriding `broadcast`: while a
`runTurnAndStream` is in flight, MSG_CHAT_RESPONSE chunks are tee'd
into the active RPC stream. Other broadcasts (state, identity,
MSG_CHAT_MESSAGES, direct WS clients of the helper for drill-in) pass
through untouched.
- Wire vocabulary collapsed from six kinds to four: `started` / `chunk`
/ `finished` / `error`. Lifecycle events are synthesized by the parent
from `cf_agent_helper_runs` row data; `chunk` carries an opaque
JSON-encoded UIMessageChunk forwarded verbatim from the helper's
`_streamResult`.
- Schema gained `helper_type`, `query`, `summary`, `error_message`
columns so the parent can synthesize lifecycle events on `onConnect`
replay without RPCing the helper for metadata.
- `getChatChunksForReplay()` and `getFinalAssistantText()` expose
Think's own stored chunks and the synthesized summary for the parent's
reconnect-replay and tool-output paths.
Client side:
- Per-helper UIMessage accumulator using `applyChunkToParts` from
`agents/chat`, the same primitive `useAgentChat` uses for the
assistant's main message. The helper panel now renders text, reasoning
blocks, and internal tool calls as a mini-chat, matching the shape of
GLips's screenshots in #1377-comment-4328296343.
- `seenSequencesRef` Set handles the small reconnect-window race where a
chunk arrives both as a replay frame and as a live broadcast
(applyChunkToParts mutates the parts array, so dedup has to happen
before application).
Tests:
- `TestResearcher` overrides `getModel()` with a deterministic mock
LanguageModel V3, so the helper's Think inference loop runs end-to-end
inside the harness with no Workers AI binding.
- Reconnect-replay tests seed pre-built UIMessageChunk bodies via
`testWriteChunks(chunks, status)`, which writes through Think's own
`_resumableStream` exactly the way production does.
- Coverage updated for the new four-kind vocabulary and new schema
columns. 26 tests across 4 files, all green.
Documentation: README + wip doc reflect Option B landing, the new wire
vocabulary, and the next-steps reordering (parallel fan-out is now item
1, drill-in detail UI item 2).
Made-with: Cursor
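The `broadcast` tee described under "Server side" can be sketched in isolation. Everything here is a stand-in: `TeeBroadcaster` is not the real helper class, the `delivered` array replaces the framework's actual WebSocket broadcast, and the string value given to `MSG_CHAT_RESPONSE` is a placeholder rather than the real constant.

```typescript
// Placeholder value; the real constant lives in the chat protocol.
const MSG_CHAT_RESPONSE = "chat-response-placeholder";

function isChatResponse(message: string): boolean {
  try {
    const parsed: unknown = JSON.parse(message);
    return (
      typeof parsed === "object" &&
      parsed !== null &&
      (parsed as { type?: unknown }).type === MSG_CHAT_RESPONSE
    );
  } catch {
    return false; // Non-JSON frames are never forwarded.
  }
}

class TeeBroadcaster {
  private activeForwarder: ((frame: string) => void) | null = null;
  // Stands in for frames delivered to the helper's own WS clients.
  readonly delivered: string[] = [];

  startForwarding(forwarder: (frame: string) => void): void {
    this.activeForwarder = forwarder;
  }

  stopForwarding(): void {
    this.activeForwarder = null;
  }

  broadcast(message: string): void {
    // Tee: chat-response frames are copied to the active forwarder while
    // a turn is in flight; every frame still goes out as before.
    if (this.activeForwarder !== null && isChatResponse(message)) {
      this.activeForwarder(message);
    }
    this.delivered.push(message);
  }
}
```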
Followups to the multi-turn Think helper review. Fixes 8 of the 9
items called out (B3 schema migration deferred — still a prototype).
Server changes:
- **B1 + B2 (error surfacing).** `Researcher.broadcast` override
now detects `parsed.error === true` chat-response frames (which
Think's `_streamResult` broadcasts on inference errors with the
error string as the body, not a `UIMessageChunk`). Those frames
are stashed into `_lastStreamError` instead of being mis-forwarded
through the chunk pipeline (where `applyChunkToParts` would
silently drop them on the client). The parent's `runResearchHelper`
now reads `helper.getLastStreamError()` when no summary is
produced and surfaces the actual error message instead of the
generic "Researcher finished without producing assistant text"
fallback.
- **H1 (drill-in safety).** Replaced `getFinalAssistantText` with
`getFinalTurnText`, which identifies the assistant message
produced by THIS turn by diffing message ids against a snapshot
taken at the start of the turn — robust against drill-in clients
appending their own turns before the parent reads the summary.
- **H2 (concurrent-call guard).** `_runInProgress` boolean is set
sync at entry, cleared in finally/cancel. Prevents two concurrent
`runTurnAndStream` calls on the same helper from overwriting each
other's `_activeForwarder` / `_activeRequestId` state.
- **H4 (chatRecovery off).** `Researcher` now declares
`override chatRecovery = false`. Helpers are per-turn workers;
Think's default chat-recovery fiber would silently re-run the
inference loop into a parent that's no longer listening on every
helper hibernate-and-wake cycle.
- **B4 (abort propagation).** Capture the helper's `requestId` from
`saveMessages`'s return value into `_activeRequestId`. The
ReadableStream's `cancel` callback now calls `abortCurrentTurn`,
which dispatches into Think's `_aborts` registry to actually
cancel the in-flight inference loop. No more burning Workers AI
on output the parent already abandoned.
- **S1 (drop redundant keepAliveWhile).** `saveMessages` already
wraps its body in `keepAliveWhile`; the outer wrap was redundant.
- **S2 (orphan stream cleanup).** `getChatChunksForReplay` detects
a stream still marked `streaming` whose live LLM reader is gone
(orphaned by hibernation) and finalizes the metadata before
returning chunks. Prevents per-helper streaming-row leaks that
would otherwise wait 24h for `_maybeCleanupOldStreams` to GC.
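The snapshot/diff idea behind H1 can be illustrated with plain data. The `Msg` shape and both function names are hypothetical, and choosing the first new assistant message assumes that any drill-in follow-up appends its messages after this turn's; the real `getFinalTurnText` may resolve this differently.

```typescript
// Hypothetical, simplified message shape for illustration.
interface Msg {
  id: string;
  role: "user" | "assistant";
  text: string;
}

// Taken at the start of the turn, before any new message is appended.
function snapshotIds(messages: Msg[]): Set<string> {
  return new Set(messages.map((m) => m.id));
}

// Returns the text of the first assistant message that did not exist
// when the snapshot was taken, or null if the turn produced none.
function finalTurnText(before: Set<string>, after: Msg[]): string | null {
  const produced = after.find(
    (m) => m.role === "assistant" && !before.has(m.id)
  );
  return produced ? produced.text : null;
}
```

Compared with "take the latest assistant message", the diff keeps the parent's summary tied to its own turn even when a drill-in client has appended a later one.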
Tests added:
- **B2.** Drives a turn against a mock model whose `doStream`
throws synchronously; asserts `getLastStreamError` returns the
actual error message and `getFinalTurnText` returns null.
- **H1.** `getFinalTurnText` returns null on a helper that never
ran a turn, and returns THIS turn's text after a successful run.
The H2 concurrent-call guard is verified by code review (the in-test
seam approach lit up an unhandled-rejection trail through workerd's
JSRPC bridge that doesn't affect correctness but lights up vitest's
detector; documented inline alongside where the test would have lived).
Made-with: Cursor
Append a "Stage 2 (Option B review fixes)" subsection to the Status
block summarizing what landed in 8357dee: B1+B2 error surfacing, B4
abort propagation, H1 final-turn text by snapshot/diff, H2
concurrent-call guard, H4 chatRecovery=false, S1 drop redundant
keepAliveWhile, S2 orphan stream cleanup. B3 schema migration explicitly
deferred. Bump test count from 26 to 29 and update the stale
getFinalAssistantText reference to getFinalTurnText.
Made-with: Cursor
…demux
Closes the "parallel fan-out" gap from #1377's actual workload (image 3
of comment 4328296343 shows several sibling subagents under one parent
tool call). Two fan-out shapes are now wired and tested:
- **Alpha (LLM-driven).** The orchestrator LLM calls `research`
multiple times in one turn (AI SDK `parallel_tool_calls` default). Each
helper has its own `parentToolCallId` and renders as one panel under
one chat tool part.
- **Beta (programmer-driven).** New `compare(a, b)` tool whose
`execute` dispatches both helpers via `Promise.all`. Both share the
chat tool call's `parentToolCallId`; the wire format distinguishes them
by the `helperId` carried inside each event. Renders as two sibling
`<HelperPanel>`s under one `<ToolPart>`, the visible GLips pattern.
Server changes:
- Add `compare(a, b)` tool that dispatches two `runResearchHelper`
calls in parallel and returns both summaries. Adjusted system prompt to
nudge the LLM toward `compare` for compare/contrast queries.
- Test seam: `Assistant.testRunResearchHelper(query, parentToolCallId)`
on the test subclass so tests can drive concurrent helpers without
going through the LLM.
Client changes:
- State shape: `Record<parentToolCallId, Record<helperId,
HelperState>>`. Single-helper tool calls show one panel; fan-out tool
calls show several stacked siblings.
- Dedup key extended from `(parentToolCallId, sequence)` to
`(parentToolCallId, helperId, sequence)` because two parallel helpers
under one tool call both legitimately emit a `sequence: 0` started
event.
- `<ToolPart>` accepts `helperStates: HelperState[]` and renders them in
arrival order (insertion-order-preserving `Object.values` on the
bucket).
Tests (`parallel-fanout.test.ts`, 3 new tests, 32 total):
- Alpha live broadcast: two concurrent `runResearchHelper` calls with
different parentToolCallIds; both complete, registry has two distinct
rows, frames split cleanly per parentToolCallId with monotonic
sequences.
- Beta live broadcast: two concurrent calls sharing parentToolCallId;
both complete, frames split per helperId, each helper's sequences run
0/1/2/.../N.
- Beta replay: `onConnect` walks two seeded rows sharing
parentToolCallId and emits per-helper frames. Both helpers'
`sequence: 0` started events arrive without colliding because the dedup
key now includes helperId.
New helper: `startCollectingHelperEvents(ws)`, a persistent message
accumulator. The existing `collectHelperEvents` lazily attaches a
`once`-listener per next-message and would miss broadcasts that fire
synchronously inside an awaited `Promise.all` before any test-side
await.
Docs: README adds a "Tools" section and updates the test-coverage list.
WIP doc marks parallel fan-out as landed in the coverage matrix, adds a
Stage 2 entry summarizing what landed, and reorders next-steps so
per-helper drill-in detail UI is now item 1.
Made-with: Cursor
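A rough sketch of the per-helper demux that the extended dedup key makes possible. `FanoutFrame`, `Buckets`, and `demux` are illustrative names; the nested-record layout mirrors the `Record<parentToolCallId, Record<helperId, ...>>` state shape from the commit, with plain sequence lists standing in for `HelperState`.

```typescript
// Hypothetical frame shape carrying the three dedup-key fields.
interface FanoutFrame {
  parentToolCallId: string;
  helperId: string;
  sequence: number;
}

// parentToolCallId -> helperId -> sequences seen, in arrival order.
type Buckets = Record<string, Record<string, number[]>>;

function demux(frames: FanoutFrame[]): Buckets {
  const out: Buckets = {};
  for (const frame of frames) {
    const byHelper = (out[frame.parentToolCallId] ??= {});
    const sequences = (byHelper[frame.helperId] ??= []);
    // Dedup on (parentToolCallId, helperId, sequence): two helpers under
    // one tool call may both emit sequence 0 without colliding.
    if (!sequences.includes(frame.sequence)) {
      sequences.push(frame.sequence);
    }
  }
  return out;
}
```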
…istic ordering
Five review findings from the post-fan-out walkthrough addressed.
- **B1: `compare` uses `Promise.allSettled`** instead of
`Promise.all`. A partial failure (one helper errored, the other
succeeded) used to flip the whole tool call to error while the
surviving helper's panel still showed "Done" — a confusing mixed
signal. The new shape returns `{ a: { query, summary | error },
b: { query, summary | error } }` so the orchestrator LLM can
react to "one of two succeeded" honestly. Killing the surviving
helper on first failure is left for a future B4-style abort
propagation pass.
- **B2: deterministic panel ordering.** `started` event now
carries `order: number`; the parent stamps it from a new
`displayOrder` parameter on `runResearchHelper` (defaults to 0
for the single-helper `research` tool; `compare` passes 0/1).
The client sorts each tool-call's helper bucket by `order` so
panels appear left-to-right matching the LLM's input position
rather than the random arrival order of `started` broadcasts.
Persisted in `cf_agent_helper_runs.display_order` so `onConnect`
replay synthesizes the same ordering. Schema bump applied via
an idempotent `try { ALTER TABLE ... ADD COLUMN } catch {}` in
`onStart`, which doubles as a real (if minimal) migration path
for the v0.1 → v0.2 schema gap.
- **N3: bulletproof dedup key.** Client's seen-sequence map is
now keyed by `JSON.stringify([parentToolCallId, helperId])`
rather than a `${parent}::${helper}` template. Removes the
theoretical collision when either id contains `::`.
- **C1: 3-helper Beta test.** New test stresses the parent's
broadcast path with three concurrent helpers under one
parentToolCallId. All three rows complete; live frames demux
per-helper with monotonic sequences each starting at 0.
- **C2: replay-order assertion** added to the existing Beta
replay test. Asserts (a) `started` events on replay carry the
row's `display_order` as `order`, and (b) `onConnect` replay
does NOT interleave per-helper frames — helper-x's last frame
arrives before helper-y's first. Pins down the per-row
serialization `onConnect` does today.
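The B1 result shape can be sketched with `Promise.allSettled` directly. `compareBranches` and its injected `run` callback are hypothetical stand-ins for the `compare` tool and `runResearchHelper`; only the `{ a: { query, summary | error }, b: ... }` shape comes from the text above.

```typescript
type Branch =
  | { query: string; summary: string }
  | { query: string; error: string };

// `run` stands in for dispatching one helper and awaiting its summary.
async function compareBranches(
  a: string,
  b: string,
  run: (query: string) => Promise<string>
): Promise<{ a: Branch; b: Branch }> {
  // allSettled waits for both helpers even when one rejects, so a
  // partial failure no longer discards the surviving result.
  const [resultA, resultB] = await Promise.allSettled([run(a), run(b)]);

  const toBranch = (query: string, result: PromiseSettledResult<string>): Branch =>
    result.status === "fulfilled"
      ? { query, summary: result.value }
      : {
          query,
          error:
            result.reason instanceof Error
              ? result.reason.message
              : String(result.reason),
        };

  return { a: toBranch(a, resultA), b: toBranch(b, resultB) };
}
```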
Tests: 33 (was 32). Both typechecks clean.
Made-with: Cursor
Closes the third v0.2 gap from #1377's actual workload (the screenshots
showed nested chat-like detail under each subagent). Confirms the
"drill-in is real chat, not a custom event view" promise of Option B.
Each helper panel grew a small ↗ button; clicking it opens a side panel
that runs `useAgentChat` directly against the helper's sub-agent URL:
    useAgent({ agent: "Assistant", name: DEMO_USER,
      sub: [{ agent: "Researcher", name: helperId }] })
The framework's `subAgent` routing primitive does all the work: no
parent intervention, no cross-DO state, just a normal chat hook against
a sub-agent. The side panel renders messages with the same
`<MessageParts>` component the main chat uses; sending a follow-up
message in the panel triggers a real Think turn on the helper with the
parent's original query already in context.
Client changes:
- New `<DrillInPanel>` component: full-height side overlay, backdrop /
Escape / ✕ to close, full `useAgentChat` against the helper sub-agent
connection, `<MessageParts>` for rendering, an `<Input>` composer for
follow-up turns.
- `<HelperPanel>` accepts an `onDrillIn?: (helperId) => void` callback
and renders an ↗ button next to the status badge when set. Threaded
through `<MessageParts>` → `<ToolPart>` → `<HelperPanel>`.
- App owns a `drillInHelperId: string | null` state; the panel reads
`helperType` and `query` from the existing `helperStateByToolCall` map.
Cleared on chat clear (this tab and cross-tab via the messages.length
effect).
While a turn is running, both the inline panel (parent's broadcast tee)
and the side panel (helper's own chat-protocol broadcasts) update live
with the same chunks viewed two ways.
Caveats kept honest in the README and WIP doc:
- `onBeforeSubAgent` is open: any `helperId` will spawn a fresh facet
if it doesn't exist. Production should add a `cf_agent_helper_runs`
lookup gate.
- Recursive drill-in (helper → its own sub-helper) isn't wired; helpers
don't dispatch their own helpers in this example.
- Sending in the drill-in panel goes through `saveMessages`, which
reads `_lastClientTools` / `_lastBody`. With no client tools defined
this is a no-op leak; documented under H3.
WIP doc: marks drill-in done in the coverage matrix, adds a Stage 2
entry summarizing the implementation, and updates the next-steps list.
The example-side roadmap (multi-turn, parallel fan-out, drill-in) is now
complete; the next motions are a second helper class to validate the
protocol against non-research workloads, `helperTool(Cls)` framework
promotion, and the Stage 3 RFC.
Tests still pass (33/33); drill-in is purely a client UI change.
Made-with: Cursor
Four review findings from the post-drill-in walkthrough addressed.
- **D1: replay reads back THIS turn's chunks, not "latest".**
Previously, after a drill-in user fired a follow-up turn through
the side panel's composer, the helper had a NEW stream stored.
On parent reconnect, `getChatChunksForReplay()` picked the most
recent stream by `created_at` and the inline panel rebuilt from
the follow-up turn — even though the parent's tool output and
`summary` row column reflected the original turn. The two views
drifted on every refresh.
Fix: capture the helper's stream id after `saveMessages`
resolves (`_lastTurnStreamId`, exposed via `getLastTurnStreamId`),
stash it in `cf_agent_helper_runs.stream_id`, and have
`getChatChunksForReplay(streamId?)` accept an explicit id.
`onConnect` reads `row.stream_id` and passes it through. Schema
bump applied via the same idempotent `try { ALTER TABLE … ADD
COLUMN } catch {}` pattern as `display_order`.
Regression test in `reconnect-replay.test.ts`: seeds turn 1's
chunks (capturing the row's stream_id), then writes turn 2's
chunks via the new `testWriteAdditionalHelperChunks` seam,
asserts replay returns turn 1's body and not turn 2's.
- **D2: `<DrillInPanel>` keyed by `helperId`.** Switching from one
helper's drill-in to another now fully unmounts/remounts the
panel — tears down the previous `useAgent` WebSocket cleanly,
resets the composer's input state, and avoids any prop-vs-
hook-arg drift. One-line fix.
- **N1: status badge in drill-in header.** Mirrors the inline
panel's Running / Done / Error badge so the side panel's header
feels consistent with the panel the user just clicked through
from. Prop named `helperStatus` (not `status`) to avoid
colliding with `useAgentChat`'s own `status` symbol.
- **N2: system-prompt nudge for `compare` partial failure.** Added
one line to `Assistant.getSystemPrompt`: "If a `compare` result
includes an `error` field for one branch, acknowledge the gap
and synthesize from the successful branch only." The structured
`Promise.allSettled` shape from the previous polish pass already
gives the LLM the data; this nudge tells it what to do with it.
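The D1 fix amounts to selecting stored chunks by an explicit stream id and only falling back to "latest" when none is given. A minimal in-memory sketch, where `StoredStream` and `chunksForReplay` are assumptions for illustration rather than the helper's real storage API:

```typescript
// Hypothetical stand-in for one stored stream and its chunk bodies.
interface StoredStream {
  id: string;
  createdAt: number;
  chunks: string[];
}

function chunksForReplay(streams: StoredStream[], streamId?: string): string[] {
  if (streamId !== undefined) {
    // Explicit id: replay exactly the turn the parent recorded, even if
    // a drill-in follow-up has since stored a newer stream.
    return streams.find((s) => s.id === streamId)?.chunks ?? [];
  }
  // No id recorded: fall back to the most recently created stream.
  const newestFirst = [...streams].sort((x, y) => y.createdAt - x.createdAt);
  return newestFirst[0]?.chunks ?? [];
}
```

With the id stored on the run row, the inline panel and the parent's recorded `summary` stay tied to the same turn across refreshes.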
WIP doc: appended a "drill-in review polish" Stage 2 entry
summarizing what landed AND what we explicitly punted on (N3
side-panel parent-turn indicator; E1 stream-metadata growth; E2
concurrent drill-in tabs; E3 STREAM_RESUMING smoke-test; E4 open
`onBeforeSubAgent` gate; focus trap / aria-modal; drill-in tests;
duplicated `query` in `compare` output) so we have a paper trail
for the choices.
Tests: 34 (was 33); D1 regression test added.
Made-with: Cursor
…nt` base
Closes the "is the helper-event vocabulary right?" gap from Ring 2 of
the design notes. With a second helper class running the same
protocol against a meaningfully different workload, the answer is
yes — the chunk firehose generalizes without any vocabulary changes.
Server changes:
- **Extracted `HelperAgent extends Think<Env>` as the shared base.**
All helper-protocol bits — the `broadcast` tee, `runTurnAndStream`,
`chatRecovery = false`, the concurrent-call guard, and every
lifecycle accessor (`getChatChunksForReplay`,
`getLastTurnStreamId`, `getFinalTurnText`, `getLastStreamError`)
— live there. Concrete helpers stay thin: pick a model, a system
prompt, and a tool surface.
- **`Researcher` slimmed down to `extends HelperAgent`** with just
its three overrides. Behavior unchanged.
- **New `Planner extends HelperAgent`** that produces structured
implementation plans (Overview / Affected files / Step-by-step
/ Open questions) with a single simulated `inspect_file` tool.
Different system prompt, different tool surface, same protocol.
- **`Assistant._runHelperTurn(cls, query, parentToolCallId,
displayOrder?)`** generalizes the previous `runResearchHelper`.
`cls` is typed as `HelperClass = typeof Researcher | typeof
Planner`; `cls.name` feeds the row's `helper_type` and
`subAgent(cls, ...)` spawns the right facet.
- **Class registry `helperClassByType: Record<string,
HelperClass>`** used by `onConnect` and `clearHelperRuns` to
resolve the row's stored `helper_type` string back to a class.
Defensive fallback to `Researcher` for unknown types.
- **New `plan(description)` tool** dispatching `Planner` via
`_runHelperTurn`. Updated system prompt to nudge the LLM
toward `plan` for "how do I implement X" queries.
- **Wrangler v2 migration** adds `Planner` to `new_sqlite_classes`;
same in `tests/wrangler.jsonc`. Idempotent — existing v1
deployments pick it up additively.
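The class registry and its defensive fallback can be sketched with stub classes. The class and registry names follow the commit text; the class bodies and the `describe` method are placeholders, since the real helpers extend `Think<Env>`.

```typescript
// Stub classes for illustration only.
abstract class HelperAgent {
  abstract describe(): string;
}
class Researcher extends HelperAgent {
  describe(): string {
    return "research helper";
  }
}
class Planner extends HelperAgent {
  describe(): string {
    return "planning helper";
  }
}

type HelperClass = typeof Researcher | typeof Planner;

// Maps the `helper_type` string stored on a row back to its class.
const helperClassByType: Record<string, HelperClass> = {
  Researcher,
  Planner,
};

// Unknown types fall back to Researcher, as the commit describes.
function helperClassFor(helperType: string): HelperClass {
  return helperClassByType[helperType] ?? Researcher;
}
```

Because `cls.name` feeds the stored `helper_type`, the registry keys need to match the class names exactly.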
Test changes:
- `Planner` test subclass with the same mock-model + `testWriteChunks`
surface as the `Researcher` test class. Deliberately duplicated
rather than mixed in (TypeScript class mixins cost more in complexity
than two ~30-line classes cost in noise).
- All seams that used to hardcode `Researcher` now accept an
optional `className: "Researcher" | "Planner"` arg, defaulting
to `Researcher` so existing tests don't have to thread it.
- Two new tests (36 total, was 34):
- **Planner end-to-end** in `helper-stream.test.ts` — drives a
Planner turn through the byte-stream protocol and verifies the
same NDJSON / chunk storage / final-text pipe holds.
- **Mixed-class clear** in `clear-helper-runs.test.ts` — seeds
one Researcher row + one Planner row, verifies
`clearHelperRuns` resolves the right facet table for each
via the new class registry (would leak a Planner facet if the
old hardcoded `Researcher` lookup were still in place).
What this validates for Stage 4:
- `HelperAgent` IS the shape `helperTool(Cls)` will accept —
`Cls extends HelperAgent` is a plausible constraint.
- The class-registry pattern is what `helperTool(Cls)` would
generate as part of its setup.
- `_runHelperTurn` is the ~80-line body that should move into
the framework helper. Everything else in `Assistant`
(`getTools`, `onStart`, schema migration) stays as
consumer-side code.
WIP doc: marks the second-helper-class step done in the
next-actionable list, adds a Stage 2 entry summarizing the
extraction and what it unblocks for Stage 4, updates the test count.
Made-with: Cursor
…rcher"
Symptom: clicking ↗ on a `Planner` helper panel opens the side
panel, which sits on "Connecting to helper…" forever. No errors or
warnings — silent failure.
Cause: `<DrillInPanel>` hardcoded `agent: "Researcher"` in the
`useAgent({ sub: [...] })` call. For a Planner helper, this routed
to a Researcher facet with that helper's id. Because
`onBeforeSubAgent` is open, the framework auto-spawned a fresh
empty Researcher facet, which `useAgentChat` connected to and
showed `messages: []` indefinitely. Researcher helpers worked by
coincidence — the hardcoded class happened to match the actual
class.
Fix: pass `helperType` from the helper's state into the `sub.agent`
field. The drill-in now routes correctly for any helper class —
Researcher to `Researcher`, Planner to `Planner`. The framework's
kebab-case URL builder turns the class name into the right path
segment (`/sub/researcher/...` or `/sub/planner/...`).
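A minimal sketch of why the dynamic class name fixes the routing, assuming a simple PascalCase-to-kebab-case conversion. `toKebabCase` and `subPathPrefix` are hypothetical stand-ins for the framework's URL builder, not its actual API, and the rest of the path after the class segment is deliberately left out:

```typescript
// Hypothetical stand-in for the framework's kebab-case URL builder.
function toKebabCase(className: string): string {
  return className.replace(/([a-z0-9])([A-Z])/g, "$1-$2").toLowerCase();
}

// Builds only the class segment of the drill-in path; whatever follows
// it (helper id, etc.) is not modeled here.
function subPathPrefix(helperType: string): string {
  return `/sub/${toKebabCase(helperType)}`;
}

// Before the fix: the class was hardcoded, so every helper routed to
// the Researcher segment regardless of its real class.
const hardcoded = subPathPrefix("Researcher");

// After the fix: the segment follows the helper's own `helperType`.
const forPlanner = subPathPrefix("Planner");

console.log(hardcoded, forPlanner);
```

With the hardcoded value, a Planner helper's id would be looked up under the Researcher segment, which matches the silent empty-facet symptom described above.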
README also updated so the documented snippet reflects the
dynamic-class pattern, not a hardcoded class name.
This was a real consequence of the wip doc's "no drill-in tests"
punt — the bug wouldn't have shipped if there were a test that
opened drill-in for a non-Researcher helper. Worth keeping in mind
when we revisit the React-side test gap.
Tests still pass (36/36) since drill-in isn't covered.
Made-with: Cursor
Review fixes for the `Planner` / `HelperAgent` extraction (`02ab6d05`) based on a deep read across that commit and the drill-in routing fix (`e9c0e0ff`):
- **M2**: `helperClassByType` is now `as const` and `HelperClass` is derived from `keyof typeof` it. Adding a class is one site (the registry); the type, `_runHelperTurn`'s arg, and the `helperClassFor` lookup all flow from there. The unknown-`helper_type` fallback also `console.warn`s once so drift surfaces early.
- **C1**: Planner-specific `onConnect` replay test in `reconnect-replay.test.ts` — seeds a `helperType: "Planner"` row + chunks and asserts replay emits `started` (carrying `helperType: "Planner"`), the seeded `chunk`, and `finished`. Pins the registry lookup so we don't regress to a hardcoded class.
- **C2**: `<DrillInPanel>` now validates `helperType` against a `KNOWN_HELPER_TYPES` set before opening `useAgent`. On miss it renders an explicit "Unknown helper class: X" error state instead of the silent "Connecting to helper…" hang the 2026-04-28 routing bug exposed; composer is also disabled in that state.
- **N1**: removed `className = "Researcher"` defaults from all test seams (`hasHelper`, `testRunHelperToCompletion`, `testReadStoredHelperChunks`, `testReadHelperFinalText`, `testReadHelperStreamError`, `testSetHelperMockMode`, `testWriteAdditionalHelperChunks`); renamed `testRunResearchHelper` → `testRunHelper(className, query, parentToolCallId, displayOrder?)` with class first to match production. Existing tests updated to pass `"Researcher"` explicitly. Closes the footgun where a future Planner test could silently check Researcher's facet table and pass for the wrong reason.
- **Wrangler v2 → v1 consolidation**: rolled the v2 entry that added `Planner` to `new_sqlite_classes` into v1 in both example and test `wrangler.jsonc` files. Nothing's deployed; cleaner for first-time deploys.
- **M1 reasoned away**: `cls.name` is stable across the esbuild + `@cloudflare/vite-plugin` build because workerd's `ctx.exports` requires top-level class export names to match the wrangler binding strings. If tooling ever did mangle them, migration is a one-shot SQL `UPDATE` on `cf_agent_helper_runs.helper_type`. Documented in the wip doc; not blocking.
Doc polish:
- README "How to read this code" walkthrough refreshed to mention `HelperAgent` and the class registry.
- README "If you want to extend it" rewritten — both prior bullets (parallel fan-out, drill-in) are now shipped features.
- README's "Tests" section updated for `TestPlanner` + Planner end-to-end + C1 + D1 + 3-helper Beta stress test.
- README's diagram and inline drill-in snippet updated to use `helperType` rather than hardcoded `"Researcher"`.
- Renamed `runResearchHelper` → `_runHelperTurn` references in README, server doc-comments, test file headers, and the Stage 1 + earlier sections of `wip/inline-sub-agent-events.md` that described current state. Rename history entries left as-is.
- New Stage 2 entry in the wip doc tracking M1 (skipped), M2 / C1 / C2 / N1 (landed), wrangler consolidation, and the polish pass.
Tests: 37 (was 36); one new C1 Planner replay test.
Made-with: Cursor
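The M2 registry shape can be sketched roughly as follows. This is an illustration under assumptions, not the example's actual code: the two classes are empty stand-ins for the real Think sub-agents, the `HelperType` alias is introduced here for readability, and the body of `helperClassFor` (including warning once globally rather than once per type) is a guess at behavior the commit only summarizes.

```typescript
// Empty stand-ins; in the example these are Think sub-agent classes.
class Researcher {}
class Planner {}

// Single site to edit when adding a helper class.
const helperClassByType = { Researcher, Planner } as const;

// Everything else is derived from the registry.
type HelperType = keyof typeof helperClassByType; // "Researcher" | "Planner"
type HelperClass = (typeof helperClassByType)[HelperType];

let warnedUnknownType = false;

// Resolves a stored `helper_type` string back to a class, with a
// defensive fallback to Researcher for unknown values.
function helperClassFor(helperType: string): HelperClass {
  if (Object.prototype.hasOwnProperty.call(helperClassByType, helperType)) {
    return helperClassByType[helperType as HelperType];
  }
  if (!warnedUnknownType) {
    warnedUnknownType = true;
    console.warn(
      `Unknown helper_type "${helperType}"; falling back to Researcher`
    );
  }
  return helperClassByType.Researcher;
}
```

Using an own-property check rather than `in` keeps inherited keys such as `toString` from being treated as registered helper types.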
…nt gate
Two of the items from the README's "out of scope" table were really
"deferred but small" rather than genuinely out-of-scope. Shipping them
lets the example be honestly described as production-shaped rather
than demo-shaped.
**B4 cancellation propagation: fully wired.**
Helper-side cancel was already in place (the RPC stream's `cancel`
callback aborts via Think's `_aborts`). What was missing was the
parent-side thread: the AI SDK passes an `abortSignal` on each tool
execute's second arg, but the example wasn't reading it.
Each tool execute now destructures `{ toolCallId, abortSignal }` and
threads the signal into `_runHelperTurn` via a new `opts.abortSignal`.
The function registers an `abort` listener that cancels the helper RPC
reader; the cancel propagates over JSRPC to the helper's `cancel`
callback, which calls `abortCurrentTurn`. The post-loop arm checks
`signal.aborted` and surfaces the abort as an error (rather than a
silent empty summary), which flows through the existing catch arm:
row marked `error`, synthesized `error` event broadcast, panel
doesn't sit on "Running…". A `finally` arm detaches the listener so
a parent that runs many helpers across many turns doesn't accumulate
stale closures on its abort signals.
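The parent-side thread described above can be sketched with standard Web Streams and `AbortSignal` primitives. `drainWithAbort` is a hypothetical name and the error message is illustrative; the real `_runHelperTurn` does considerably more (row updates, event broadcast), none of which is modeled here.

```typescript
// Reads a helper's event stream, cancelling the reader when the
// parent's abort signal fires. Hypothetical helper, not the example's API.
async function drainWithAbort<T>(
  stream: ReadableStream<T>,
  onEvent: (event: T) => void,
  signal?: AbortSignal
): Promise<void> {
  const reader = stream.getReader();
  // Cancelling the reader resolves any pending read with `done: true`
  // and invokes the underlying source's `cancel` callback.
  const onAbort = () => {
    void reader.cancel(new Error("aborted")).catch(() => {});
  };
  signal?.addEventListener("abort", onAbort, { once: true });
  try {
    // A pre-aborted signal never fires the listener, so handle it here.
    if (signal?.aborted) onAbort();
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      onEvent(value);
    }
    // Surface the abort as an error rather than a silent empty result.
    if (signal?.aborted) throw new Error("Helper turn aborted");
  } finally {
    // Detach so long-lived signals do not accumulate stale closures.
    signal?.removeEventListener("abort", onAbort);
    reader.releaseLock();
  }
}
```

The explicit pre-aborted check matters because an `abort` listener attached after the signal has already fired is never called.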
**E4 `onBeforeSubAgent` gate: production-shaped.**
`Assistant` now overrides `onBeforeSubAgent` to look up the requested
`(helperType, helperId)` in `cf_agent_helper_runs` and return a 404
if the row doesn't exist. Drill-in URLs are no longer guessable; an
attacker can't drill into a Researcher facet by routing through the
Planner endpoint (the gate's `WHERE` clause is on
`(helper_id, helper_type)`, so cross-class access fails closed).
Internal `subAgent(...)` calls bypass the hook by design (matches
`getAgentByName` bypassing `onBeforeConnect`), so `_runHelperTurn`'s
own helper spawn isn't blocked by its own check.
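A sketch of the gate's lookup logic, using an in-memory array in place of the `cf_agent_helper_runs` table. `gateDrillIn` and the `HelperRunRow` shape are hypothetical, and the convention of returning `null` to allow the request is an assumption; the source only states that a missing row yields a 404.

```typescript
// Minimal stand-in for a row in `cf_agent_helper_runs`.
type HelperRunRow = { helper_id: string; helper_type: string };

// Returns a 404 Response to reject, or null to let the request through
// (the allow convention is assumed, not taken from the framework).
function gateDrillIn(
  rows: readonly HelperRunRow[],
  helperType: string,
  helperId: string
): Response | null {
  // Match on BOTH columns, mirroring a WHERE clause on
  // (helper_id, helper_type), so cross-class access fails closed.
  const found = rows.some(
    (r) => r.helper_id === helperId && r.helper_type === helperType
  );
  return found ? null : new Response("Helper not found", { status: 404 });
}

const rows: HelperRunRow[] = [{ helper_id: "h1", helper_type: "Researcher" }];
const allowed = gateDrillIn(rows, "Researcher", "h1");
const crossClass = gateDrillIn(rows, "Planner", "h1");
const unseeded = gateDrillIn(rows, "Researcher", "h2");
```

Matching on the id alone would let a valid Researcher id open a Planner facet, which is the cross-class case the sixth test pins.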
**Helper-class-agnostic error message.**
The empty-summary fallback used to say "Researcher finished without
producing assistant text"; updated to use `${helperType}` so a Planner
failure now reads "Planner finished…" rather than impersonating
Researcher.
**Tests.**
New `cancellation-and-gate.test.ts` (6 cases):
- B4: pre-aborted signal rejects with an abort error
- B4: pre-aborted signal marks the row `error` with abort message
- B4: same path works for Planner (class-agnostic)
- E4: gate rejects an unseeded helperId with 404
- E4: gate accepts a seeded helperId with 101 WS upgrade
- E4: gate rejects cross-class drill-in (seeded as Researcher,
drilled-in as Planner → 404)
Tests: 43 (was 37, +6 new).
**Docs.**
README's "out of scope" table is now four rows instead of six. The
drill-in section's caveat about an "open gate" is replaced with a
note documenting the production posture. New "Cancellation
propagation (B4)" paragraph next to "Parent-crash recovery". The
wip doc's Stage 2 entry now includes a "production-shape polish"
sub-section, the older B4 entry is updated to clarify "helper-side
half" landed earlier and the parent-side thread landed today, and
the punted E4 item is struck through with a pointer to the new
landing entry.
Made-with: Cursor
The 2026-04-28 drill-in routing bug shipped because the existing test harness only covers DO RPC + WebSocket frame paths. Bugs that live in the React layer — `useAgent` URL resolution, drill-in routing, replay-vs-live state reducers — slip through.
Adds a Playwright suite at `examples/agents-as-tools/e2e/` that boots `vite dev` and drives the real app in Chromium against the production `ai` binding (`remote: true`). High-fidelity: actual WS frames, actual DO routing, actual LLM tool selection.
**Tests (7):**
- `smoke.e2e.ts` — page loads, WS handshake completes, composer becomes interactive.
- `research-drill-in.e2e.ts` — research prompt spawns Researcher panel; drill-in connects to a Researcher facet and renders messages.
- `planner-drill-in.e2e.ts` — same flow for `plan`. Pins the `e9c0e0ff` regression: with the bug, drill-in to a Planner panel hung on "Connecting to helper…".
- `compare-fanout.e2e.ts` — `compare` prompt renders TWO Researcher panels under one chat tool call.
- `refresh-replay.e2e.ts` — completed runs survive a page reload. Single-helper + Researcher+Planner two-helper cases.
- `clear.e2e.ts` — Clear wipes both surfaces; reload after Clear doesn't bring panels back.
**Per-test fresh DO:** Each test goes to `/?user=<unique>`. The client now honors that param as an override for `DEMO_USER`, so each test runs against its own Assistant DO. Sidesteps a framework gap: alarms scheduled inside helper facets lose `ctx.id.name` when they fire after a dev-server restart (the 2026-04-15 compat-date fix covers top-level DOs, not facets). With unique users, each test's DO is fresh — no in-flight alarms from a previous run. Captured the framework gap in the wip doc as an upstream-needed fix in partyserver / agents.
**Stable selectors:** Added minimal `data-testid` hooks to `client.tsx`: `helper-panel` (with `data-helper-type` / `data-helper-id` / `data-helper-status`) and `drill-in-panel`. The rest of the suite uses ARIA semantics (`getByRole`, `getByPlaceholder`). Drive-by: fixed two Kumo a11y warnings by adding `aria-label` to the parent and drill-in composers.
**Config:** `playwright.config.ts` boots `vite dev` via `webServer`; `workers: 1` serializes tests so they don't fight over Workers AI capacity; `retries: 1` rides out occasional 504s; `timeout: 180_000` covers the slow `kimi-k2.5` model. `expect`'s per-action timeout is 60s. Scripts: `npm run test:e2e` (headless), `test:e2e:headed`, `test:e2e:ui` (interactive).
**Local-only for now:** User's stated workflow is local. CI integration would need `playwright install --with-deps chromium` and a Workers AI auth shape from the runner; punted.
Full suite ~4-5 minutes locally. 7/7 passing.
Made-with: Cursor
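The per-test fresh-DO pattern above amounts to generating a distinct `user` query param for every test. A minimal sketch, assuming the id only needs to be unique within and across local runs; the function bodies and the `e2e` prefix are illustrative and not the contents of the suite's actual helpers:

```typescript
let counter = 0;

// Produces a user id that differs per call and per run, so each test
// lands on its own Assistant DO. The id format here is an assumption.
function uniqueUser(prefix = "e2e"): string {
  counter += 1;
  return `${prefix}-${Date.now().toString(36)}-${counter}`;
}

// Builds the app URL with the `user` override the client honors.
function appUrlFor(user: string): string {
  return `/?user=${encodeURIComponent(user)}`;
}

const first = appUrlFor(uniqueUser());
const second = appUrlFor(uniqueUser());
console.log(first, second);
```

The in-process counter guards against two calls landing in the same millisecond, which a timestamp alone would not.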
…6-04-28
The branch covers a lot of ground (20 commits, ~8500 insertions across the example, packages/agents, and the wip doc). Adds a session-handoff read at the top of the wip doc and a status banner on the example README so the next session opens to a clear "what's shipped, what's next" view rather than scrolling through the chronological log.
**`wip/inline-sub-agent-events.md`** — new "Resuming this work — snapshot 2026-04-28" section right after the intro:
- What's shipped on this branch (Stage 1 + Stage 2 + tests, with pointers to Stage 2's two-helper-class extraction, drill-in, cancellation, gate, and e2e suite).
- What's NOT shipped (Stage 3 RFC, Stage 4 framework helper, Stage 5 AIChatAgent port).
- The newly-surfaced framework gap (alarms inside helper facets lose `ctx.id.name` after dev-server restarts; the 2026-04-15 compat-date fix covers top-level DOs but not facets) — captured as a near-term next-step candidate.
- Three concrete next-step candidates ranked by leverage: Stage 3 RFC (highest), framework facet-alarm fix (medium scope), Stage 4 helperTool(Cls) (premature without the RFC).
- "How to resume tactically" — exact commands to verify the current state (`npm test`, `npm run test:e2e`, `npm start`) and the canonical entry points in the doc + code.
**`examples/agents-as-tools/README.md`** — new "Status (2026-04-28)" section right after the intro: one paragraph summary of what the example covers (three tools, two helper classes, drill-in, cancellation, gate, replay), the test counts (43 vitest + 7 Playwright), and the next-up work pointer to the wip doc.
No code changes — the comments + doc-comments inline already capture the per-bug context (e.g. the framework facet-alarm gap is documented at the e2e helper that works around it).
Made-with: Cursor
🦋 Changeset detected. Latest commit: 0148a7b. The changes in this PR will be included in the next version bump.
| // `this.name` getter when a wake path goes through alarm() instead | ||
| // of fetch() with idFromName(). Symptom is | ||
| // "Error in Assistant:<unnamed> fetch: … this.ctx.id.name is not set". | ||
| "compatibility_date": "2026-04-15", |
🟡 wrangler.jsonc uses compatibility_date: "2026-04-15" violating AGENTS.md convention of "2026-01-28"
The AGENTS.md Workers conventions section states: "All wrangler configs use compatibility_date: "2026-01-28" and compatibility_flags: ["nodejs_compat"]". Both examples/agents-as-tools/wrangler.jsonc:10 and examples/agents-as-tools/src/tests/wrangler.jsonc:5 use "2026-04-15" instead. The deviation is documented (needed for ctx.id.name in alarm handlers) and the AGENTS.md has an "Ask first" carve-out for compatibility date changes, but the stated convention is still violated.
…ream`
Reverts `acce611c` (the very first commit on this branch). The option
landed before the v0.2 design pivot moved helper events onto each
helper's own DO. After the pivot, no caller in the repo uses a
non-default value:
- Think's only `new ResumableStream(this.sql.bind(this))` doesn't
pass options.
- The `agents-as-tools` example doesn't instantiate
`ResumableStream` directly at all — helpers use Think's, and
`helper-event` envelopes are broadcast outside the
`ResumableStream` machinery (parent `broadcast()` over its
own WebSocket, not a second `ResumableStream` over the same
connection).
- No test exercises a non-default `messageType`.
The option's whole purpose was to prevent frame-type collisions when
two `ResumableStream` instances share a WebSocket. The pivot ensured
that situation cannot arise in the example, and the changeset itself
acknowledged the same isolation argument applies to the dropped
`tablePrefix` from #1377. Shipping `messageType` therefore added
public API surface that nothing exercised — speculative API we
hadn't validated. Removing it now keeps the framework's surface
honest.
What's reverted:
- `ResumableStreamOptions` type + `messageType` ctor arg in
`packages/agents/src/chat/resumable-stream.ts` — back to the
original `constructor(sql: SqlTaggedTemplate)` signature.
- The four `this._messageType` usages in `replayChunks` restored
to hardcoded `CHAT_MESSAGE_TYPES.USE_CHAT_RESPONSE`.
- `ResumableStreamOptions` removed from `packages/agents/src/chat/index.ts`
public exports.
- Changeset `.changeset/resumable-stream-message-type.md` deleted.
Verification:
- `git diff main -- packages/agents .changeset` is empty.
- 28 `resumable-streaming.test.ts` regression tests in `packages/ai-chat`
still pass.
- 43 `agents-as-tools` vitest tests still pass (none touched
`ResumableStream`'s ctor surface).
- Typecheck clean across `packages/agents`, the example, its
test worker, and the e2e suite.
wip doc updates:
- Stage 1 status flipped from "landed" to "not shipped" in three
places (top handoff, Status section, Stage 1 plan section
header). The original plan kept as historical context.
- The "Decisions confirmed 2026-04-28" entry that ruled out
`tablePrefix` is rewritten to also cover `messageType` —
same isolation argument applies, both options are unnecessary
after the pivot.
- "What's shipped vs unshipped" table row for `messageType`
flipped to "not shipping".
- "Explicitly NOT in this near-term list" entry now mentions
both ctor options.
- The v0.1 narrative section that describes the original
`messageType: "helper-event"` setup is left as-is; it
accurately captures historical state at the time and is
superseded by the v0.2 update entry that follows it.
If a future use case needs a non-default `messageType` (or a
`tablePrefix`), it's a small additive change with a real caller to
point at.
Made-with: Cursor
Title changed: agents-as-tools example + messageType ctor option → agents-as-tools example + design notes
…at` README
The "Related" section pointing at `examples/agents-as-tools` was added in `d95f0b53` (the initial example commit) before the design firmed up. Cross-linking the two examples implies a parity between them — "this is the AIChatAgent equivalent of that" — that doesn't exist yet: `agents-as-tools` is Think-only and the AIChatAgent port is explicitly Stage 5 (deferred). Promoting the link from `ai-chat`'s README before the port lands sets a misleading expectation.
The reverse direction (`agents-as-tools` → `ai-chat`) stays — it correctly describes `ai-chat` as "the canonical AIChatAgent reference" without claiming the example is itself ported there.
Will re-add this link to `ai-chat`'s README when the AIChatAgent port lands as part of Stage 5.
Made-with: Cursor
| <head> | ||
| <meta charset="UTF-8" /> | ||
| <link rel="icon" href="/favicon.ico" /> | ||
| <meta name="viewport" content="width=device-width, initial-scale=1.0" /> |
🔴 Missing required public/favicon.ico directory per examples/AGENTS.md
The examples/AGENTS.md requires every example to include public/favicon.ico. This example has no public/ directory at all. The index.html references <link rel="icon" href="/favicon.ico" /> which will 404 since the file doesn't exist. Other examples (e.g. examples/playground, examples/assistant) include a public/favicon.ico.
| // `this.name` getter when a wake path goes through alarm() instead | ||
| // of fetch() with idFromName(). Symptom is | ||
| // "Error in Assistant:<unnamed> fetch: … this.ctx.id.name is not set". | ||
| "compatibility_date": "2026-04-15", |
🔴 compatibility_date is "2026-04-15" instead of the required "2026-01-28"
examples/AGENTS.md and the root AGENTS.md both mandate compatibility_date: "2026-01-28" for all wrangler configs. This example uses "2026-04-15" in wrangler.jsonc (line 10). The deviation is documented with a technical reason (alarm handler fix for ctx.id.name propagation), but the root AGENTS.md says to "Ask first" before "Changing wrangler.jsonc compatibility dates across the repo".
Prompt for agents
The examples/AGENTS.md convention mandates compatibility_date: 2026-01-28. This example uses 2026-04-15 with a documented reason (alarm handler ctx.id.name propagation). If the later date is genuinely required for the example to function, the deviation should be pre-approved and the examples/AGENTS.md rule should be updated to allow exceptions. If the alarm handler issue only affects the e2e test suite (which already works around it with per-test unique users), the production wrangler.jsonc could potentially use the standard date while the test wrangler.jsonc uses the newer one. Both wrangler.jsonc files (examples/agents-as-tools/wrangler.jsonc and examples/agents-as-tools/src/tests/wrangler.jsonc) need to be aligned with whatever decision is made.
…-recovery gap
`partyserver` 0.5.4 fixes the bug filed at [`cloudflare/partykit#390`](cloudflare/partykit#390): fresh 0.5.x DOs with `compatibility_date` older than 2026-03-15 would lose `this.name` on alarm wake (no `ctx.id.name` propagation in old runtimes, and 0.5.x had stopped writing the `__ps_name` legacy fallback record). The fix is a defensive one-time `__ps_name` write on first fetch — idempotent; restores the safety net pre-0.5.x had.
Surfaced in this branch during e2e suite development. Worked around at the time by giving each test its own Assistant DO via a `?user=<id>` query-param override, so each test's DO was fresh and never hit the alarm path with stale state. With 0.5.4, the workaround is no longer needed for the bug. The per-test unique-user pattern stays purely for test isolation (no helper-row / chat-history state leaks across tests).
Verification:
- `npm ls partyserver` shows 0.5.4 deduped across `agents`, `examples/voice-input`, and the root.
- 43 vitest tests still pass.
- 7 Playwright e2e tests still pass — output no longer contains the partyserver "Attempting to read .name… `this.ctx.id.name` is not set" error that fired in the background of every previous run.
Updates to docs:
- `wip/inline-sub-agent-events.md` — handoff snapshot's "Open framework gap" reframed to "surfaced and fixed"; next-step candidates list drops the framework facet-alarm fix (it landed); two Stage 2 entries reframed; related-issues paragraph marks #390 as fixed; e2e helpers comment updated.
- `examples/agents-as-tools/README.md` — e2e tests blurb no longer claims the unique-user pattern works around a framework gap; 0.5.4 link added as a parenthetical.
- `examples/agents-as-tools/e2e/helpers.ts` — `uniqueUser` doc-comment updated.
What's still open at the framework level (not addressed by 0.5.4):
- workerd doesn't yet support independent facet alarms. The helper-side `keepAliveWhile` wrapper is a soft no-op on facets for that reason. Documented in the wip doc's "Hibernation / fibers gaps" section; not in scope for this PR.
Made-with: Cursor
…troyAll()` + add changeset
**Adds the missing changeset** for the partyserver 0.5.4 bump (`.changeset/agents-partyserver-0.5.4.md`) — should have landed with `c4a0d887` and didn't. Patch bump on `agents` documenting the peer-dependency raise.
**Fixes a real bug** in the helper-side cancellation path. The review found that `_activeRequestId` is only set AFTER `saveMessages` resolves (line 377 of `src/server.ts` was after the await), and `releaseClaim()` immediately clears it back to undefined (line 400). The synchronous span between line 377 and line 400 has no awaits, so the `cancel()` callback at line 407 cannot ever observe `_activeRequestId !== undefined` during a real cancellation — the entire abort path was dead. The B4 vitest tests still pass because they validate the PARENT-SIDE error surfacing (`signal.aborted` check + thrown error + row update + synthesized `error` event), which works end-to-end. The helper-side cancellation never actually fired.
What the fix does:
1. **Drops the dead `_activeRequestId` field** — no longer used for anything. The stream-id capture at the same point (`_lastTurnStreamId`) is unaffected.
2. **Switches `abortCurrentTurn` to `_aborts.destroyAll()`.** The helper is single-purpose (one in-flight turn at a time, guarded by `_runInProgress`), so the only controller in the registry, if any, is the one we want to cancel. destroyAll doesn't need the requestId Think generates internally.
3. **Honest doc-comment on `abortCurrentTurn`** explaining the remaining race window: Think's `saveMessages` lazily creates the controller via `_aborts.getSignal(requestId)` after several internal awaits (`keepAliveWhile` → `_turnQueue.enqueue` → `appendMessage` → `_broadcastMessages` → then `getSignal`). If `cancel()` arrives before that point, the registry is empty and `destroyAll()` is a no-op; the inference runs to completion.
4. **Updated `cancel()` callback comment** documenting the same.
In practice cancels arrive mid-inference (Stop button after several seconds of streaming) and the controller exists by the time we destroyAll, so the abort works. Early cancels (a pre-aborted signal, or an instant tab close) still waste one inference pass. The proper fix needs `Think.saveMessages` to accept an external `AbortSignal` so the helper can pre-create a controller it owns from the start of the turn. That's a Think public API addition — deliberately out of scope for this PR; tracked in the wip doc as a Stage 4 / framework follow-up.
**Doc updates:**
- `examples/agents-as-tools/README.md` — "Cancellation propagation (B4)" rewritten to be honest about the best-effort nature and the race window. The "Try the cancellation propagation path" extension hint reframed accordingly.
- `wip/inline-sub-agent-events.md` — both B4 entries (chronological "helper-side abort plumbing" and Stage 2 recap) updated. First-attempt approach, the dead-code finding, and the destroyAll fix all captured. Notes that the B4 vitest tests validate parent-side surfacing, not helper-side abort.
Verification:
- 43 vitest tests pass (no behavior change — they were validating the parent-side path, not the helper-side claim).
- Typecheck clean across the example, test worker, and e2e.
Made-with: Cursor
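The race window described in this commit can be reproduced with a toy registry. `AbortRegistry` below is a hypothetical stand-in written for illustration; it borrows the `getSignal` / `destroyAll` names from the commit message but is not Think's internal implementation, and the count returned by `destroyAll` is added here only to make the no-op visible.

```typescript
// Toy registry that creates controllers lazily, on first getSignal().
class AbortRegistry {
  private controllers = new Map<string, AbortController>();

  getSignal(requestId: string): AbortSignal {
    let controller = this.controllers.get(requestId);
    if (!controller) {
      controller = new AbortController();
      this.controllers.set(requestId, controller);
    }
    return controller.signal;
  }

  // Aborts every registered controller and reports how many there were.
  destroyAll(): number {
    const count = this.controllers.size;
    for (const controller of this.controllers.values()) controller.abort();
    this.controllers.clear();
    return count;
  }
}

const aborts = new AbortRegistry();

// Early cancel: nothing registered yet, so this is a no-op.
const earlyCount = aborts.destroyAll();

// The turn later creates its controller; the early cancel is lost.
const lateSignal = aborts.getSignal("turn-1");
const abortedAfterEarlyCancel = lateSignal.aborted;

// Mid-inference cancel: the controller exists, so the abort lands.
const midCount = aborts.destroyAll();
const abortedAfterMidCancel = lateSignal.aborted;
```

Accepting an external `AbortSignal` at the start of the turn, as the commit proposes, would close the gap because the signal would exist before any cancel could arrive.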
Filed [#1406](#1406): `Think.saveMessages` should accept an external `AbortSignal` so callers can cancel an in-flight turn from outside. Captures the cancellation race window documented in this branch across three places that all referenced the gap as "Stage 4 / framework follow-up" without a concrete link to follow:
- `examples/agents-as-tools/src/server.ts` — `abortCurrentTurn` doc-comment now links to #1406 instead of the abstract "Stage 4" pointer.
- `examples/agents-as-tools/README.md` — "Cancellation propagation (B4)" paragraph likewise.
- `wip/inline-sub-agent-events.md` — related-issues paragraph adds #1406 alongside the existing partykit#390 and workerd#6675 links; the B4 chronological entry replaces "Stage 4 / framework follow-up" with the explicit issue link.
Also promoted the saveMessages-AbortSignal fix to a concrete next-step candidate at the top of the wip doc, slotted between the Stage 3 RFC and Stage 4 framework helper. It's a small additive API change that could land before either of those and would unblock proper helper-side cancellation without waiting on the broader framework promotion.
No code changes — purely cross-linking.
Made-with: Cursor
| declare namespace Cloudflare { | ||
| interface GlobalProps { | ||
| mainModule: typeof import("./src/server"); | ||
| durableNamespaces: "Assistant" | "Researcher"; |
🔴 Stale env.d.ts missing Planner from durableNamespaces
The env.d.ts declares durableNamespaces: "Assistant" | "Researcher" but wrangler.jsonc:27 lists three classes in new_sqlite_classes: ["Assistant", "Researcher", "Planner"]. The file header says it was generated by wrangler types but it was not regenerated after Planner was added. While Researcher (also a sub-agent, not a top-level binding) is correctly listed, Planner is missing. This inconsistency means the Cloudflare Vite plugin may not discover the Planner class for sub-agent routing, which could cause drill-in to Planner facets to fail at runtime. Regenerating with npx wrangler types env.d.ts --include-runtime false would fix this.
| durableNamespaces: "Assistant" | "Researcher"; | |
| durableNamespaces: "Assistant" | "Researcher" | "Planner"; |
Update partyserver dependency to ^0.5.5 in package.json (and workspace packages) and refresh the lockfile. Reduce test flakiness by widening streaming chunk delays in ai-chat tests (increase chunkDelayMs to give more wall-clock headroom) and add clarifying comments. Also relax a strict ordering assertion in message-concurrency.test.ts to assert set-equality (sort the request IDs) to avoid transient microtask ordering failures while preserving the intended guarantees.
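The relaxed assertion can be pictured as an order-insensitive comparison of request IDs. A minimal sketch, with `sameIds` as a hypothetical helper name; the actual test presumably sorts and compares inside its own assertion library call:

```typescript
// True when both lists contain the same IDs with the same multiplicity,
// regardless of order. Sorting copies leaves the inputs untouched.
function sameIds(
  actual: readonly string[],
  expected: readonly string[]
): boolean {
  const a = [...actual].sort();
  const b = [...expected].sort();
  return a.length === b.length && a.every((id, i) => id === b[i]);
}

// Microtask ordering may swap the two IDs; both orders are accepted.
const reordered = sameIds(["req-2", "req-1"], ["req-1", "req-2"]);
// A missing or extra ID still fails.
const missing = sameIds(["req-1"], ["req-1", "req-2"]);
```

This keeps the guarantee that every expected request was observed exactly once while dropping the incidental claim about arrival order.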
| "agents": patch | ||
| --- | ||
|
|
||
| Bump `partyserver` peer dependency to `^0.5.4`. 0.5.4 closes [`cloudflare/partykit#390`](https://github.com/cloudflare/partykit/issues/390): fresh 0.5.x DOs with `compatibility_date` older than 2026-03-15 could lose `this.name` on alarm wake (no `ctx.id.name` propagation in older runtimes, and 0.5.x had stopped writing the `__ps_name` legacy fallback record). The fix is a defensive one-time `__ps_name` write on first fetch — idempotent, restores the safety net pre-0.5.x had. Affects any project on a pre-cutoff `compatibility_date` whose DOs schedule alarms (which includes Think's `_chatRecoveryContinue`). |
🟡 Changeset claims ^0.5.4 but actual dependency is bumped to ^0.5.5
The changeset file says "Bump partyserver peer dependency to ^0.5.4" but the actual changes in both package.json:65 and packages/agents/package.json:34 set the version to ^0.5.5. The published changelog will say 0.5.4 is the minimum, but the package.json requires 0.5.5+. A user reading the changelog who installs partyserver@0.5.4 would not satisfy the actual ^0.5.5 constraint.
| Bump `partyserver` peer dependency to `^0.5.4`. 0.5.4 closes [`cloudflare/partykit#390`](https://github.com/cloudflare/partykit/issues/390): fresh 0.5.x DOs with `compatibility_date` older than 2026-03-15 could lose `this.name` on alarm wake (no `ctx.id.name` propagation in older runtimes, and 0.5.x had stopped writing the `__ps_name` legacy fallback record). The fix is a defensive one-time `__ps_name` write on first fetch — idempotent, restores the safety net pre-0.5.x had. Affects any project on a pre-cutoff `compatibility_date` whose DOs schedule alarms (which includes Think's `_chatRecoveryContinue`). | |
| Bump `partyserver` peer dependency to `^0.5.5`. 0.5.4 closes [`cloudflare/partykit#390`](https://github.com/cloudflare/partykit/issues/390): fresh 0.5.x DOs with `compatibility_date` older than 2026-03-15 could lose `this.name` on alarm wake (no `ctx.id.name` propagation in older runtimes, and 0.5.x had stopped writing the `__ps_name` legacy fallback record). The fix is a defensive one-time `__ps_name` write on first fetch — idempotent, restores the safety net pre-0.5.x had. Affects any project on a pre-cutoff `compatibility_date` whose DOs schedule alarms (which includes Think's `_chatRecoveryContinue`). |
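The mismatch flagged above follows from how caret ranges work for `0.x` versions: `^0.5.5` means `>=0.5.5 <0.6.0`, so `0.5.4` falls below the floor. A small hand-rolled check illustrates this; it is a simplified sketch that ignores prerelease and build tags, and `satisfiesCaret` is written here for illustration rather than taken from any semver library.

```typescript
type Version = [number, number, number];

function parseVersion(v: string): Version {
  const [major, minor, patch] = v.split(".").map(Number);
  return [major, minor, patch];
}

// Caret semantics: allow changes that do not modify the leftmost
// non-zero component of the base version.
function satisfiesCaret(version: string, base: string): boolean {
  const [M, m, p] = parseVersion(version);
  const [bM, bm, bp] = parseVersion(base);

  // Lower bound: version >= base.
  const atLeastBase =
    M !== bM ? M > bM : m !== bm ? m > bm : p >= bp;
  if (!atLeastBase) return false;

  // Upper bound depends on the leftmost non-zero component of base.
  if (bM > 0) return M === bM;
  if (bm > 0) return M === 0 && m === bm;
  return M === 0 && m === 0 && p === bp;
}

const changelogVersionOk = satisfiesCaret("0.5.4", "0.5.5");
const bumpedVersionOk = satisfiesCaret("0.5.5", "0.5.5");
const nextMinorOk = satisfiesCaret("0.6.0", "0.5.5");
```

Here `changelogVersionOk` is `false`, which is the situation the review comment warns about: a reader who installs the version named in the changelog would not satisfy the declared range.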
Summary
This PR is the empirical work behind issue #1377 (`ResumableStream` is hardcoded to chat use case). It does not ship the framework patch from #1377 — the v0.2 design pivot rendered both ctor options unnecessary. What ships:
An example of the pattern that #1377 (`ResumableStream` is hardcoded to chat use case) was the visible tip of: helpers-as-sub-agents-with-parent-forwarded-events. A chat agent dispatches a helper sub-agent inside a tool execute; the helper is itself a Think instance with its own model, system prompt, tools, and inference loop; its chat stream forwards live into the parent chat UI as an inline mini-panel under the matching tool call.
Net diff vs `main` in `packages/agents`'s code: zero. A `messageType` option on `ResumableStream` was briefly shipped as `acce611c` (the very first commit on this branch, before the pivot) and reverted on 2026-04-28 once it became clear no caller used a non-default value: helpers run on their own DOs, the example forwards `helper-event` envelopes via `broadcast()` rather than a second `ResumableStream`, and the same DO-isolation argument that killed `tablePrefix` kills `messageType`. The wip doc captures the reasoning under "Decisions confirmed 2026-04-28" #3. If a future use case needs either ctor option, it's a small additive change with a real caller to point at.
The example is feature-complete for v0.2 and is the empirical grounding for the Stage 3 RFC, which is the natural next motion (in a separate session/PR).
What lands in `examples/agents-as-tools`
Tests
Framework gap surfaced in this branch — fixed in `partyserver` 0.5.4
Filed `cloudflare/partykit#390`: fresh partyserver 0.5.x DOs with `compatibility_date` older than 2026-03-15 would lose `this.name` on alarm wake. Surfaced by this branch's e2e suite during dev-server restarts when stale alarms woke helper facets. Fixed in `partyserver` 0.5.4 via a defensive one-time `__ps_name` write on first fetch; this PR bumps the pin. The e2e suite's per-test unique-user pattern stays for test isolation but no longer compensates for an upstream bug. Verified locally — the partyserver "Attempting to read .name…" error that fired in the background of every previous e2e run is now absent.
Test plan
Where to start reading
Branch has 23 commits. Squash-merging is fine; the chronological history is captured in `wip/inline-sub-agent-events.md`'s Status section.