
Inline sub-agent event streaming: agents-as-tools example + design notes #1405

Merged
threepointone merged 28 commits into main from
agents-as-tools
Apr 28, 2026

Conversation

@threepointone
Contributor

@threepointone threepointone commented Apr 28, 2026

Summary

This PR is the empirical work behind issue #1377 (`ResumableStream` is hardcoded to chat use case). It does not ship the framework patch from #1377 — the v0.2 design pivot rendered both ctor options unnecessary. What ships:

  1. A complete prototype example (`examples/agents-as-tools`) of the bigger thing #1377 was the visible tip of: helpers-as-sub-agents-with-parent-forwarded-events. A chat agent dispatches a helper sub-agent inside a tool execute; the helper is itself a Think instance with its own model, system prompt, tools, and inference loop; its chat stream forwards live into the parent chat UI as an inline mini-panel under the matching tool call.
  2. A design doc (`wip/inline-sub-agent-events.md`) capturing the design space, the 2026-04-27 pivot to "helpers as sub-agents", decisions confirmed 2026-04-28, and the staged plan for promoting the example pattern into a framework primitive (Stage 3 RFC, Stage 4 `helperTool(Cls)`, Stage 5 AIChatAgent port).
  3. A small framework dependency bump: `partyserver` ^0.5.3 → ^0.5.4 across the monorepo. 0.5.4 fixes `cloudflare/partykit#390`, a facet-alarm name-recovery bug surfaced by this branch's e2e suite. The fix is a defensive, idempotent one-time `__ps_name` storage write on first fetch.

Net diff vs `main` in `packages/agents`'s code: zero. A `messageType` option on `ResumableStream` was briefly shipped as `acce611c` (the very first commit on this branch, before the pivot) and reverted on 2026-04-28 once it became clear no caller used a non-default value: helpers run on their own DOs, the example forwards `helper-event` envelopes via `broadcast()` rather than a second `ResumableStream`, and the same DO-isolation argument that killed `tablePrefix` kills `messageType`. The wip doc captures the reasoning under "Decisions confirmed 2026-04-28" #3. If a future use case needs either ctor option, it's a small additive change with a real caller to point at.

The example is feature-complete for v0.2 and is the empirical grounding for the Stage 3 RFC, which is the natural next motion (in a separate session/PR).

What lands in `examples/agents-as-tools`

  • Three helper-dispatching tools: `research` (single `Researcher`), `plan` (single `Planner`), `compare` (parallel two-`Researcher` fan-out via `Promise.allSettled`).
  • Two helper classes sharing a `HelperAgent extends Think` base. Concrete classes (`Researcher`, `Planner`) are thin — pick a model, system prompt, and tools.
  • Per-helper drill-in side panel — clicking ↗ on any inline panel opens a `useAgentChat` against the helper's sub-agent URL. Real chat, not a custom event view.
  • Parallel fan-out: `compare` dispatches both helpers under the same chat tool call; the client renders them as siblings under one `<ToolPart>` with deterministic left-to-right order.
  • Cancellation propagation (B4): parent abort threads through the AI SDK's `abortSignal` into `_runHelperTurn`, cancels the helper RPC reader, fires the helper's `cancel` callback, calls `abortCurrentTurn` to terminate the in-flight Think turn via `_aborts`. No more burning Workers AI on output the user already moved on from.
  • Production sub-agent gate (E4): `Assistant.onBeforeSubAgent` validates the requested `(helperType, helperId)` against `cf_agent_helper_runs` — drill-in URLs are no longer guessable, cross-class drill-in is blocked.
  • Full reconnect-replay: `Assistant.onConnect` walks the registry, resolves the row's `helper_type` to the right class via `helperClassByType`, and replays `(started, ...chunks, finished)` from durable storage. Pinned `stream_id` per row keeps drill-in follow-up turns from shadowing the original turn's chunks.
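
As a rough illustration of the `compare` fan-out listed above, the sketch below folds `Promise.allSettled` results into a per-branch summary-or-error shape. `BranchResult`, `toBranchResult`, and `compareQueries` are hypothetical names, and the `run` callback stands in for whatever dispatches a helper; this is not the example's actual code.

```typescript
// Hypothetical per-branch outcome: either a summary or an error message.
type BranchResult =
  | { query: string; summary: string }
  | { query: string; error: string };

// Fold one settled promise into a BranchResult without throwing.
function toBranchResult(
  query: string,
  settled: PromiseSettledResult<string>
): BranchResult {
  if (settled.status === "fulfilled") {
    return { query, summary: settled.value };
  }
  const reason: unknown = settled.reason;
  return {
    query,
    error: reason instanceof Error ? reason.message : String(reason)
  };
}

// Run both branches in parallel; one failure does not reject the whole call.
async function compareQueries(
  a: string,
  b: string,
  run: (query: string) => Promise<string> // assumed helper dispatcher
): Promise<{ a: BranchResult; b: BranchResult }> {
  const [ra, rb] = await Promise.allSettled([run(a), run(b)]);
  return { a: toBranchResult(a, ra), b: toBranchResult(b, rb) };
}
```

With this shape a caller can tell "one of two succeeded" apart from "both failed" without catching an exception.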

Tests

  • 43 vitest tests (`src/tests/`): `registry.test.ts`, `clear-helper-runs.test.ts`, `helper-stream.test.ts`, `reconnect-replay.test.ts`, `parallel-fanout.test.ts`, `cancellation-and-gate.test.ts`. `vitest-pool-workers` against a real Assistant DO with deterministic mock models on the helpers.
  • 7 Playwright e2e tests (`e2e/`): `smoke`, `research-drill-in`, `planner-drill-in` (regression for the `agent: "Researcher"` hardcode bug fixed in `e9c0e0ff`), `compare-fanout`, `refresh-replay` (single + multi-helper), `clear`. Real Workers AI binding (`@cf/moonshotai/kimi-k2.5`), real WS, real DO routing.

Framework gap surfaced in this branch — fixed in partyserver 0.5.4

Filed `cloudflare/partykit#390`: fresh partyserver 0.5.x DOs with `compatibility_date` older than 2026-03-15 would lose `this.name` on alarm wake. Surfaced by this branch's e2e suite during dev-server restarts when stale alarms woke helper facets. Fixed in `partyserver` 0.5.4 via a defensive one-time `__ps_name` write on first fetch; this PR bumps the pin. The e2e suite's per-test unique-user pattern stays for test isolation but no longer compensates for an upstream bug. Verified locally — the partyserver "Attempting to read .name…" error that fired in the background of every previous e2e run is now absent.

Test plan

  • `packages/ai-chat` regression suite (28 `resumable-streaming.test.ts` tests) — pass byte-identical (no framework code changes)
  • Server-side tests — `npm test -w @cloudflare/agents-agents-as-tools-example` → 43/43
  • Browser-side tests — `cd examples/agents-as-tools && npm run test:e2e` → 7/7 (~4-5min, real Workers AI; clean run with no partyserver background errors)
  • Manual smoke — `cd examples/agents-as-tools && npm start`, then prompts like "Compare HTTP/3 and gRPC" and "Plan how I'd add dark mode" exercise the full surface
  • Drill-in routing for both Researcher and Planner panels (the `e9c0e0ff` regression)

Where to start reading

  1. `wip/inline-sub-agent-events.md` — design doc, top-of-file "Resuming this work — snapshot 2026-04-28" section is the TL;DR.
  2. `examples/agents-as-tools/README.md` — developer-facing walkthrough. "How to read this code in order" is the canonical entry point.
  3. `examples/agents-as-tools/src/server.ts` — `HelperAgent` (lines ~150-540), then `Researcher` / `Planner`, then `Assistant._runHelperTurn` (~947), then `Assistant.onConnect` (~1280).
  4. `examples/agents-as-tools/src/client.tsx` — start at `applyHelperEvent` reducer.

Branch has 23 commits. Squash-merging is fine; the chronological history is captured in `wip/inline-sub-agent-events.md`'s Status section.

Lets non-chat consumers (e.g. helper sub-agents that share a WebSocket
with a parent's chat) stamp replay frames with a distinct wire-type
tag without colliding with the chat protocol. Defaults to
CHAT_MESSAGE_TYPES.USE_CHAT_RESPONSE so existing AIChatAgent and
Think callers preserve byte-identical behavior — all 28 existing
resumable-streaming.test.ts regression tests pass unchanged.

Also exports the new ResumableStreamOptions type from agents/chat.

This is the smaller version of the fix proposed in #1377: the
tablePrefix part of that proposal is intentionally omitted because
the recommended pattern for "events alongside a chat" puts helper
events on the helper sub-agent's own DO (so collisions are
impossible by isolation). See wip/inline-sub-agent-events.md for
the full design.

Refs: #1377
Made-with: Cursor
…rwarded events

A focused minimal proof of the agents-as-tools pattern: during a
single chat turn, the assistant (a Think agent) dispatches a helper
sub-agent (Researcher) to do multi-step work, and the helper's
lifecycle events stream live into the chat UI.

Architecture (per the 2026-04-27 design pivot in
wip/inline-sub-agent-events.md):

- Researcher is a real sub-agent with its own DO and SQLite. It owns
  its own ResumableStream configured with messageType: "helper-event"
  so its replay frames don't collide with the chat protocol.
- Researcher.startAndStream(query, helperId) returns a
  ReadableStream<{ sequence, body }> over DO RPC. Each emitted event
  is durably stored on the helper's own SQLite before being written
  to the stream.
- Assistant's tool execute reads from the helper's stream and
  broadcasts each event onto the chat WS via this.broadcast(...).
  Browser keeps one WS to the parent — no second connection.
- Assistant maintains a tiny active_helpers table for reconnect
  replay: on onConnect, it walks the table, fetches each in-flight
  helper's stored events via DO RPC, and forwards them as
  replay: true frames to the connecting client.
- Assistant.onStart sweeps stale active_helpers rows from a previous
  parent crash and deletes the leaked helper facets.

Multi-ai-chat compatible because helpers are sub-agents of whichever
DO terminates the WebSocket — top-level Assistant in this demo, or
a Chat (which is itself a sub-agent of Inbox) in multi-ai-chat. The
forwarding pattern is identical at any nesting depth.

Drill-in to a specific helper (a separate `useAgent({ sub: [...] })`
connection direct to the helper) is supported structurally by the
routing primitive but not wired in the example UI yet.

Wire protocol (src/protocol.ts) is shared between server and client,
zero Worker-runtime imports — the front-end bundle never transitively
pulls in agents/Think/workers-ai-provider through a stray value
import. The protocol carries six event kinds (started / step /
tool-call / tool-result / finished / error) plus a sequence field
for client-side dedup against the reconnect-window race where one
event can arrive both as a replay frame and as a live broadcast.

The client renders helper events inline under the matching tool
part in the assistant message, dedupes by (parentToolCallId,
sequence) Set semantics, and inserts in sorted position so the
timeline always renders in helper-emit order regardless of wire
arrival order.
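
The dedup-then-sorted-insert behavior described above could look roughly like the following for a single tool call's timeline. `TimelineEvent` and `insertEvent` are hypothetical names, and the sketch assumes one `seen` set per `parentToolCallId`; the example's real reducer may differ.

```typescript
// Hypothetical event shape: only the fields this sketch needs.
interface TimelineEvent {
  sequence: number;
  kind: string;
}

// Insert `event` into `timeline` (kept sorted by sequence), skipping
// duplicates. Returns true if inserted, false if the sequence was seen before.
function insertEvent(
  timeline: TimelineEvent[],
  seen: Set<number>,
  event: TimelineEvent
): boolean {
  if (seen.has(event.sequence)) return false; // replay/live duplicate
  seen.add(event.sequence);

  // First position whose sequence is greater; append when there is none.
  const idx = timeline.findIndex((e) => e.sequence > event.sequence);
  timeline.splice(idx === -1 ? timeline.length : idx, 0, event);
  return true;
}
```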

Adds a one-line cross-reference from examples/ai-chat/README.md.

Refs: #1377
Made-with: Cursor
Captures the design context behind agents-as-tools and the
messageType ctor option on ResumableStream. Frames issue #1377 as
the visible tip of a missing first-class pattern (helpers running
inside a single chat turn with their lifecycle events streaming
into the chat UI), walks through the original "multi-channel
ResumableStream on the parent DO" design, and records the
2026-04-27 pivot to "helpers as sub-agents with parent-forwarded
events."

The pivot rationale, condensed:

- State containment. A helper's events belong to the helper's work,
  not the chat's. Putting them on the helper's own DO is the honest
  representation. Persistent / inspectable / drill-in-able helpers
  follow naturally; they don't with parent-side storage.
- Reuses the routing primitive instead of inventing. Drill-in,
  addressing, lifecycle, parent/child RPC are all already shipped.
- Smaller framework change. One ctor option (messageType) instead of
  multi-channel schema + per-channel state machine. Tables don't
  need to be parameterized — each DO has its own SQLite, so
  collisions are impossible by isolation.
- Multi-ai-chat-compatible. The forwarding pattern works
  recursively; "the parent" is whichever DO terminates the WS.

The doc preserves the original multi-channel design as a record of
the design space that was explored. Multi-channel ResumableStream
is parked, not killed — if a future use case needs one DO to
multiplex N independent durable streams to its own clients (e.g. a
workflow DO with parallel tracks), the design is captured and ready.

Stages 1 and 2 land alongside this doc:

- Stage 1: messageType ctor option on ResumableStream (back-compat
  patch to agents).
- Stage 2: examples/agents-as-tools — focused minimal proof of
  helpers-as-sub-agents.

Open questions around event vocabulary (Ring 2), parallel helpers
(empirically should work, v0.2 stress-test follow-up), persistent
lifetime (Ring 5), and the AIChatAgent port (Ring 6) are explicit
in the doc as the things the example needs to validate before
Stage 3's RFC commits to a public API.

Made-with: Cursor
Tighten the agents-as-tools prototype based on runtime testing and
review feedback.

Key runtime fix: helper event streams now use byte chunks over DO RPC.
`Researcher.startAndStream` returns `ReadableStream<Uint8Array>` where
each chunk is NDJSON (`{ sequence, body }`) because workerd's DO RPC
stream bridge only transports byte streams. Object chunks caused the
parent's first `reader.read()` to fail with the opaque
"Network connection lost" error. The parent now decodes bytes,
splits on newlines, and forwards each parsed helper event frame.
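
A minimal sketch of the decode-and-split step, assuming frames of the form `{ sequence, body }` encoded one per line. `NdjsonFrameDecoder` and `encodeFrame` are hypothetical names, not the example's own; only the standard `TextDecoder` / `TextEncoder` APIs are used.

```typescript
// Frame shape taken from the description above; `body` is left opaque here.
interface HelperFrame {
  sequence: number;
  body: unknown;
}

// Stateful splitter: feed it byte chunks, get back complete parsed frames.
class NdjsonFrameDecoder {
  private readonly decoder = new TextDecoder();
  private buffered = "";

  push(chunk: Uint8Array): HelperFrame[] {
    // stream: true keeps multi-byte characters split across chunks intact.
    this.buffered += this.decoder.decode(chunk, { stream: true });
    const lines = this.buffered.split("\n");
    // The last element is an incomplete line (or ""); keep it for next time.
    this.buffered = lines.pop() ?? "";
    return lines
      .filter((line) => line.trim() !== "")
      .map((line) => JSON.parse(line) as HelperFrame);
  }
}

// Encode one frame as a single NDJSON line (what a producer might emit).
function encodeFrame(frame: HelperFrame): Uint8Array {
  return new TextEncoder().encode(JSON.stringify(frame) + "\n");
}
```

Buffering the trailing partial line matters because byte chunks arriving over the stream are not guaranteed to align with line boundaries.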

Also improves helper error semantics: helper-side failures are still
emitted as inline `error` events, but the parent now turns them into
real tool failures instead of returning a successful empty summary.
If no `finished` event produces a summary, the tool fails explicitly.

UI polish:
- add a Clear button wired to `clearHistory()` and local helper-event
  cleanup
- reset helper-event state when messages are cleared from another tab
- replace the composer textarea with a single-line Input so Enter
  submits normally
- render text and reasoning parts with Streamdown + @streamdown/code,
  using the standard Kumo theme bridge
- clean Tailwind class warnings in JSON pre blocks

Operational cleanup:
- bump the example compat date to 2026-04-15 so partyserver 0.5.x can
  rely on `ctx.id.name` inside alarm handlers
- update README and WIP notes to describe the byte-stream design, the
  Streamdown UI, and the follow-up issues filed from the prototype
- remove the duplicate DEMO_USER export from server.ts so protocol.ts
  stays the single runtime-safe shared module

Issues filed while debugging:
- cloudflare/partykit#390 for fresh partyserver 0.5.x DOs + old compat dates
- cloudflare/workerd#6675 for object ReadableStreams failing with
  "Network connection lost"
- #1399 for discussing Rpc.Stub<T>-narrowed sub-agent types

Made-with: Cursor
Move the agents-as-tools helper timeline registry from a short-lived
active_helpers table to a durable cf_agent_helper_runs table. Helper
runs now track running/completed/error/interrupted status and retain
completed helper facets so timelines can replay after refresh even
after the assistant turn has completed.

Runtime behavior:
- insert running helper run before reading the helper event stream
- mark completed/error when the helper finishes
- keep the helper DO after completion because it owns the durable
  ResumableStream event log
- on parent wake, mark any stale running rows as interrupted instead
  of deleting helper facets
- on reconnect, replay stored events for all helper runs and append a
  synthetic terminal error for interrupted runs (or malformed error
  runs with no terminal event)
- Clear now calls clearHelperRuns before clearHistory, deleting retained
  helper facets before the chat-clear broadcast to avoid a stale replay
  race in other tabs
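
To illustrate the wake-time sweep described above, here is an in-memory sketch. The example presumably performs this against the `cf_agent_helper_runs` table; the array, `HelperRun`, and `sweepStaleRuns` below are hypothetical stand-ins.

```typescript
type RunStatus = "running" | "completed" | "error" | "interrupted";

// Hypothetical registry row; the real table has more columns.
interface HelperRun {
  helperId: string;
  status: RunStatus;
}

// Mark stale `running` rows as `interrupted`; leave every other row untouched.
// Running the sweep twice gives the same result as running it once.
function sweepStaleRuns(rows: HelperRun[]): HelperRun[] {
  return rows.map((row): HelperRun =>
    row.status === "running" ? { ...row, status: "interrupted" } : row
  );
}
```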

Also updates README and the WIP plan to reflect the hibernation state:
active-helper refresh, completed-helper refresh, and interrupted replay
now work; helper-side keepAliveWhile, helper fibers, live-tail
reattachment, TTL/count GC, and a hibernation test matrix remain open
follow-ups.

Made-with: Cursor
Wrap the Researcher helper's ReadableStream body in keepAliveWhile so
helper live execution has the same Agents-level lifecycle shape as a
main Think chat turn.

Today keepAlive() is a soft no-op on facets because workerd does not
yet support independent facet alarms, so this does not change crash
recovery semantics: active RPC stream / Promise chain liveness still
carries the run, and parent wake still marks running helper rows as
interrupted. Keeping the wrapper is intentional future-proofing for the
moment alarms start working inside facets, and keeps the helper code in
the shape the eventual framework helper should generate.

Update README and WIP notes to make that nuance explicit.

Made-with: Cursor
Adds 25 tests across four files modeled on examples/assistant/src/tests:

  - registry.test.ts: cf_agent_helper_runs schema, the running →
    interrupted onStart sweep, and that the sweep is idempotent and
    leaves completed/error/interrupted rows untouched.
  - clear-helper-runs.test.ts: empty-registry no-op, mixed-status
    cleanup of both rows and helper sub-agents, idempotent re-clear,
    and the production "missing sub-agent" best-effort path.
  - helper-stream.test.ts: drives Researcher.startAndStream end-to-end
    through subAgent and pins down the byte-stream contract — NDJSON
    envelope, sequence monotonic from 0, started-first ordering,
    durable storage round-trip, and the synthesize error path that
    fires when env.AI is unbound.
  - reconnect-replay.test.ts: every branch of Assistant.onConnect
    replay — empty, completed, running, error-with-stored-terminal,
    error-without-terminal (synthetic appended), interrupted (with
    and without stored events), and multiple runs replayed in
    started_at order with per-run sequence preserved.

Test worker subclasses Assistant and Researcher with a focused
seed/inspect surface (testSeedHelperRun, testReadHelperRuns,
testRunHelperToCompletion, Researcher.testWriteEvents) so tests can
construct lifecycle states without a Workers AI binding. Mirrors the
pattern in packages/ai-chat/src/tests/worker.ts; production code
stays untouched modulo a single private → protected on
Researcher._stream / .stream so the test subclass can write into the
helper's own ResumableStream the same way startAndStream does.

Made-with: Cursor
After comparing v0.1 against the screenshots in
#1377 (comment 4328296343), pin down what's done
vs. what's still needed for the workload the OP is actually
shipping. Refreshes three sections of the WIP doc:

  - New "Coverage gap vs. issue #1377's actual workload" subsection
    with a status matrix mapping each piece of his use case to its
    implementation status and where to find it.
  - New "Decisions confirmed 2026-04-28" subsection: helpers must run
    their own inference loop (Think-first; ai-chat later); parallel
    fan-out is in scope and orchestrator-driven; tablePrefix is not
    shipping (the pivot is the answer); per-helper drill-in is in
    scope; first-class framework integration (helperTool) deferred
    until the protocol is validated against multi-turn and parallel
    cases.
  - Status block updated: vitest harness landed (25 tests, 4 files),
    "Hibernation test matrix" bullet trimmed to the work that's
    actually still missing (real eviction cycles), next-steps
    rewritten to reflect the new ordering — multi-turn Think helper,
    parallel fan-out, then drill-in detail UI — and an explicit
    "not in this near-term list" call-out for tablePrefix /
    helperTool / AIChatAgent port.

Made-with: Cursor
… helper

Closes the "multi-turn helpers" gap from issue #1377's actual workload.
Implements "Option B" from wip/inline-sub-agent-events.md.

Server side:

  - `Researcher` now `extends Think<Env>` with its own getModel,
    getSystemPrompt, getTools (one simulated `web_search`). Helper
    runs are real Think turns driven by `saveMessages`. Think's own
    `_resumableStream` is the canonical durable event log — there is
    no second `ResumableStream` on the helper, so the same-DO
    collision the original #1377 was about cannot occur.
  - Forwarder is wired by overriding `broadcast`: while a
    `runTurnAndStream` is in flight, MSG_CHAT_RESPONSE chunks are
    tee'd into the active RPC stream. Other broadcasts (state,
    identity, MSG_CHAT_MESSAGES, direct WS clients of the helper for
    drill-in) pass through untouched.
  - Wire vocabulary collapsed from six kinds to four: `started` /
    `chunk` / `finished` / `error`. Lifecycle events are synthesized
    by the parent from `cf_agent_helper_runs` row data; `chunk`
    carries an opaque JSON-encoded UIMessageChunk forwarded verbatim
    from the helper's `_streamResult`.
  - Schema gained `helper_type`, `query`, `summary`, `error_message`
    columns so the parent can synthesize lifecycle events on
    `onConnect` replay without RPCing the helper for metadata.
  - `getChatChunksForReplay()` and `getFinalAssistantText()` expose
    Think's own stored chunks and the synthesized summary for the
    parent's reconnect-replay and tool-output paths.
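
One way to picture the four-kind vocabulary and the row-driven lifecycle synthesis is the sketch below. The field names on `HelperEvent`, the `HelperRunRow` shape, and `synthesizeReplay` are all assumptions made for illustration (only the column names listed above come from this commit), and the fallback error text is invented.

```typescript
// Assumed event union for the four kinds named above; field names are guesses.
type HelperEvent =
  | { kind: "started"; helperId: string; helperType: string; query: string }
  | { kind: "chunk"; helperId: string; body: string } // opaque JSON-encoded chunk
  | { kind: "finished"; helperId: string; summary: string }
  | { kind: "error"; helperId: string; error: string };

// Assumed row shape, loosely based on the columns listed above.
interface HelperRunRow {
  helper_id: string;
  helper_type: string;
  query: string;
  status: "running" | "completed" | "error" | "interrupted";
  summary: string | null;
  error_message: string | null;
}

// Build (started, ...chunks, terminal?) from a row plus stored chunk bodies.
// A still-running row gets no terminal event.
function synthesizeReplay(row: HelperRunRow, chunkBodies: string[]): HelperEvent[] {
  const helperId = row.helper_id;
  const events: HelperEvent[] = [
    { kind: "started", helperId, helperType: row.helper_type, query: row.query },
    ...chunkBodies.map((body): HelperEvent => ({ kind: "chunk", helperId, body }))
  ];
  if (row.status === "completed") {
    events.push({ kind: "finished", helperId, summary: row.summary ?? "" });
  } else if (row.status !== "running") {
    events.push({
      kind: "error",
      helperId,
      error: row.error_message ?? "Helper did not finish" // invented fallback
    });
  }
  return events;
}
```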

Client side:

  - Per-helper UIMessage accumulator using `applyChunkToParts` from
    `agents/chat` — the same primitive `useAgentChat` uses for the
    assistant's main message. The helper panel now renders text,
    reasoning blocks, and internal tool calls as a mini-chat,
    matching the shape of GLips's screenshots in
    #1377-comment-4328296343.
  - `seenSequencesRef` Set handles the small reconnect-window race
    where a chunk arrives both as a replay frame and as a live
    broadcast (applyChunkToParts mutates the parts array, so dedup
    has to happen before application).

Tests:

  - `TestResearcher` overrides `getModel()` with a deterministic mock
    LanguageModel V3, so the helper's Think inference loop runs
    end-to-end inside the harness with no Workers AI binding.
  - Reconnect-replay tests seed pre-built UIMessageChunk bodies via
    `testWriteChunks(chunks, status)`, which writes through Think's
    own `_resumableStream` exactly the way production does.
  - Coverage updated for the new four-kind vocabulary and new schema
    columns. 26 tests across 4 files, all green.

Documentation: README + wip doc reflect Option B landing, the new
wire vocabulary, and the next-steps reordering (parallel fan-out is
now item 1, drill-in detail UI item 2).

Made-with: Cursor
Followups to the multi-turn Think helper review. Fixes 8 of the 9
items called out (B3 schema migration deferred — still a prototype).

Server changes:

  - **B1 + B2 (error surfacing).** `Researcher.broadcast` override
    now detects `parsed.error === true` chat-response frames (which
    Think's `_streamResult` broadcasts on inference errors with the
    error string as the body, not a `UIMessageChunk`). Those frames
    are stashed into `_lastStreamError` instead of being mis-forwarded
    through the chunk pipeline (where `applyChunkToParts` would
    silently drop them on the client). The parent's `runResearchHelper`
    now reads `helper.getLastStreamError()` when no summary is
    produced and surfaces the actual error message instead of the
    generic "Researcher finished without producing assistant text"
    fallback.
  - **H1 (drill-in safety).** Replaced `getFinalAssistantText` with
    `getFinalTurnText`, which identifies the assistant message
    produced by THIS turn by diffing message ids against a snapshot
    taken at the start of the turn — robust against drill-in clients
    appending their own turns before the parent reads the summary.
  - **H2 (concurrent-call guard).** `_runInProgress` boolean is set
    sync at entry, cleared in finally/cancel. Prevents two concurrent
    `runTurnAndStream` calls on the same helper from overwriting each
    other's `_activeForwarder` / `_activeRequestId` state.
  - **H4 (chatRecovery off).** `Researcher` now declares
    `override chatRecovery = false`. Helpers are per-turn workers;
    Think's default chat-recovery fiber would silently re-run the
    inference loop into a parent that's no longer listening on every
    helper hibernate-and-wake cycle.
  - **B4 (abort propagation).** Capture the helper's `requestId` from
    `saveMessages`'s return value into `_activeRequestId`. The
    ReadableStream's `cancel` callback now calls `abortCurrentTurn`,
    which dispatches into Think's `_aborts` registry to actually
    cancel the in-flight inference loop. No more burning Workers AI
    on output the parent already abandoned.
  - **S1 (drop redundant keepAliveWhile).** `saveMessages` already
    wraps its body in `keepAliveWhile`; the outer wrap was redundant.
  - **S2 (orphan stream cleanup).** `getChatChunksForReplay` detects
    a stream still marked `streaming` whose live LLM reader is gone
    (orphaned by hibernation) and finalizes the metadata before
    returning chunks. Prevents per-helper streaming-row leaks that
    would otherwise wait 24h for `_maybeCleanupOldStreams` to GC.
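
The snapshot-and-diff idea behind H1 can be sketched as follows. `ChatMessage`, `snapshotIds`, and `finalTurnText` are hypothetical names, and the sketch assumes the turn's own output is the earliest assistant message that was not present in the snapshot; the example's `getFinalTurnText` may apply different rules.

```typescript
// Hypothetical, simplified message shape.
interface ChatMessage {
  id: string;
  role: "user" | "assistant" | "system";
  text: string;
}

// Capture the ids present before the turn starts.
function snapshotIds(messages: ChatMessage[]): Set<string> {
  return new Set(messages.map((m) => m.id));
}

// Return the text of the assistant message this turn produced, or null.
// Assumption: the earliest new assistant message belongs to this turn; later
// ones may come from drill-in follow-up turns.
function finalTurnText(before: Set<string>, after: ChatMessage[]): string | null {
  const produced = after.find(
    (m) => m.role === "assistant" && !before.has(m.id)
  );
  return produced ? produced.text : null;
}
```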

Tests added:

  - **B2.** Drives a turn against a mock model whose `doStream`
    throws synchronously; asserts `getLastStreamError` returns the
    actual error message and `getFinalTurnText` returns null.
  - **H1.** `getFinalTurnText` returns null on a helper that never
    ran a turn, and returns THIS turn's text after a successful run.

The H2 concurrent-call guard is verified by code review only. The
in-test seam approach produced an unhandled-rejection trail through
workerd's JSRPC bridge that doesn't affect correctness but trips
vitest's detector; this is documented inline where the test would have lived.

Made-with: Cursor
Append a "Stage 2 (Option B review fixes)" subsection to the Status
block summarizing what landed in 8357dee: B1+B2 error surfacing,
B4 abort propagation, H1 final-turn text by snapshot/diff, H2
concurrent-call guard, H4 chatRecovery=false, S1 drop redundant
keepAliveWhile, S2 orphan stream cleanup. B3 schema migration
explicitly deferred. Bump test count from 26 to 29 and update the
stale getFinalAssistantText reference to getFinalTurnText.

Made-with: Cursor
…demux

Closes the "parallel fan-out" gap from #1377's
actual workload (image 3 of comment 4328296343 shows several
sibling subagents under one parent tool call).

Two fan-out shapes are now wired and tested:

  - **Alpha (LLM-driven).** The orchestrator LLM calls `research`
    multiple times in one turn (AI SDK `parallel_tool_calls`
    default). Each helper has its own `parentToolCallId` and renders
    as one panel under one chat tool part.
  - **Beta (programmer-driven).** New `compare(a, b)` tool whose
    `execute` dispatches both helpers via `Promise.all`. Both share
    the chat tool call's `parentToolCallId`; the wire format
    distinguishes them by the `helperId` carried inside each event.
    Renders as two sibling `<HelperPanel>`s under one `<ToolPart>` —
    the visible GLips pattern.

Server changes:

  - Add `compare(a, b)` tool that dispatches two `runResearchHelper`
    calls in parallel and returns both summaries. Adjusted system
    prompt to nudge the LLM toward `compare` for compare/contrast
    queries.
  - Test seam: `Assistant.testRunResearchHelper(query, parentToolCallId)`
    on the test subclass so tests can drive concurrent helpers
    without going through the LLM.

Client changes:

  - State shape: `Record<parentToolCallId, Record<helperId, HelperState>>`.
    Single-helper tool calls show one panel; fan-out tool calls show
    several stacked siblings.
  - Dedup key extended from `(parentToolCallId, sequence)` to
    `(parentToolCallId, helperId, sequence)` because two parallel
    helpers under one tool call both legitimately emit a `sequence: 0`
    started event.
  - `<ToolPart>` accepts `helperStates: HelperState[]` and renders
    them in arrival order (insertion-order-preserving `Object.values`
    on the bucket).

Tests (`parallel-fanout.test.ts`, 3 new tests, 32 total):

  - Alpha live broadcast — two concurrent `runResearchHelper` calls
    with different parentToolCallIds; both complete, registry has
    two distinct rows, frames split cleanly per parentToolCallId
    with monotonic sequences.
  - Beta live broadcast — two concurrent calls sharing
    parentToolCallId; both complete, frames split per helperId, each
    helper's sequences run 0/1/2/.../N.
  - Beta replay — `onConnect` walks two seeded rows sharing
    parentToolCallId and emits per-helper frames. Both helpers'
    `sequence: 0` started events arrive without colliding because
    the dedup key now includes helperId.

  New helper: `startCollectingHelperEvents(ws)` — persistent
  message accumulator. The existing `collectHelperEvents` lazily
  attaches a `once`-listener per next-message and would miss
  broadcasts that fire synchronously inside an awaited `Promise.all`
  before any test-side await.

Docs: README adds a "Tools" section and updates the test-coverage
list. WIP doc marks parallel fan-out as landed in the coverage
matrix, adds a Stage 2 entry summarizing what landed, and reorders
next-steps so per-helper drill-in detail UI is now item 1.

Made-with: Cursor
…istic ordering

Five review findings from the post-fan-out walkthrough addressed.

  - **B1: `compare` uses `Promise.allSettled`** instead of
    `Promise.all`. A partial failure (one helper errored, the other
    succeeded) used to flip the whole tool call to error while the
    surviving helper's panel still showed "Done" — a confusing mixed
    signal. The new shape returns `{ a: { query, summary | error },
    b: { query, summary | error } }` so the orchestrator LLM can
    react to "one of two succeeded" honestly. Killing the surviving
    helper on first failure is left for a future B4-style abort
    propagation pass.
  - **B2: deterministic panel ordering.** `started` event now
    carries `order: number`; the parent stamps it from a new
    `displayOrder` parameter on `runResearchHelper` (defaults to 0
    for the single-helper `research` tool; `compare` passes 0/1).
    The client sorts each tool-call's helper bucket by `order` so
    panels appear left-to-right matching the LLM's input position
    rather than the random arrival order of `started` broadcasts.
    Persisted in `cf_agent_helper_runs.display_order` so `onConnect`
    replay synthesizes the same ordering. Schema bump applied via
    an idempotent `try { ALTER TABLE ... ADD COLUMN } catch {}` in
    `onStart`, which doubles as a real (if minimal) migration path
    for the v0.1 → v0.2 schema gap.
  - **N3: bulletproof dedup key.** Client's seen-sequence map is
    now keyed by `JSON.stringify([parentToolCallId, helperId])`
    rather than a `${parent}::${helper}` template. Removes the
    theoretical collision when either id contains `::`.
  - **C1: 3-helper Beta test.** New test stresses the parent's
    broadcast path with three concurrent helpers under one
    parentToolCallId. All three rows complete; live frames demux
    per-helper with monotonic sequences each starting at 0.
  - **C2: replay-order assertion** added to the existing Beta
    replay test. Asserts (a) `started` events on replay carry the
    row's `display_order` as `order`, and (b) `onConnect` replay
    does NOT interleave per-helper frames — helper-x's last frame
    arrives before helper-y's first. Pins down the per-row
    serialization `onConnect` does today.
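
To make N3 and B2 concrete, the sketch below contrasts the two key encodings and shows a sort by `order`. `templateKey`, `jsonKey`, `HelperPanelState`, and `sortPanels` are hypothetical names chosen for illustration.

```typescript
// Template-style key: ambiguous when an id itself contains "::".
function templateKey(parentToolCallId: string, helperId: string): string {
  return `${parentToolCallId}::${helperId}`;
}

// JSON-array key: each id is quoted and escaped, so the pair is unambiguous.
function jsonKey(parentToolCallId: string, helperId: string): string {
  return JSON.stringify([parentToolCallId, helperId]);
}

// Hypothetical per-helper client state; only the fields needed for ordering.
interface HelperPanelState {
  helperId: string;
  order: number;
}

// Sort a tool call's helper bucket by the `order` stamped on `started`.
function sortPanels(bucket: Record<string, HelperPanelState>): HelperPanelState[] {
  return Object.values(bucket).sort((x, y) => x.order - y.order);
}
```

For example, the pairs `("a::b", "c")` and `("a", "b::c")` produce the same template key but different JSON keys.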

Tests: 33 (was 32). Both typechecks clean.
Made-with: Cursor
Closes the third v0.2 gap from #1377's actual
workload (the screenshots showed nested chat-like detail under each
subagent). Confirms the "drill-in is real chat, not a custom event
view" promise of Option B.

Each helper panel grew a small ↗ button; clicking it opens a side
panel that runs `useAgentChat` directly against the helper's
sub-agent URL:

    useAgent({
      agent: "Assistant",
      name: DEMO_USER,
      sub: [{ agent: "Researcher", name: helperId }]
    })

The framework's `subAgent` routing primitive does all the work —
no parent intervention, no cross-DO state, just a normal chat hook
against a sub-agent. The side panel renders messages with the same
`<MessageParts>` component the main chat uses; sending a follow-up
message in the panel triggers a real Think turn on the helper with
the parent's original query already in context.

Client changes:

  - New `<DrillInPanel>` component: full-height side overlay,
    backdrop / Escape / ✕ to close, full `useAgentChat` against the
    helper sub-agent connection, `<MessageParts>` for rendering, an
    `<Input>` composer for follow-up turns.
  - `<HelperPanel>` accepts an `onDrillIn?: (helperId) => void`
    callback and renders an ↗ button next to the status badge when
    set. Threaded through `<MessageParts>` → `<ToolPart>` →
    `<HelperPanel>`.
  - App owns a `drillInHelperId: string | null` state; the panel
    reads `helperType` and `query` from the existing
    `helperStateByToolCall` map. Cleared on chat clear (this tab and
    cross-tab via the messages.length effect).

While a turn is running, both the inline panel (parent's broadcast
tee) and the side panel (helper's own chat-protocol broadcasts) update
live with the same chunks viewed two ways.

Caveats kept honest in the README and WIP doc:

  - `onBeforeSubAgent` is open — any `helperId` will spawn a fresh
    facet if it doesn't exist. Production should add a
    `cf_agent_helper_runs` lookup gate.
  - Recursive drill-in (helper → its own sub-helper) isn't wired;
    helpers don't dispatch their own helpers in this example.
  - Sending in the drill-in panel goes through `saveMessages`, which
    reads `_lastClientTools` / `_lastBody`. With no client tools
    defined this is a no-op leak; documented under H3.

WIP doc: marks drill-in done in the coverage matrix, adds a Stage 2
entry summarizing the implementation, and updates the next-steps
list. The example-side roadmap (multi-turn, parallel fan-out,
drill-in) is now complete; the next motions are a second helper
class to validate the protocol against non-research workloads,
`helperTool(Cls)` framework promotion, and the Stage 3 RFC.

Tests still pass (33/33); drill-in is purely a client UI change.

Made-with: Cursor
Four review findings from the post-drill-in walkthrough addressed.

  - **D1: replay reads back THIS turn's chunks, not "latest".**
    Previously, after a drill-in user fired a follow-up turn through
    the side panel's composer, the helper had a NEW stream stored.
    On parent reconnect, `getChatChunksForReplay()` picked the most
    recent stream by `created_at` and the inline panel rebuilt from
    the follow-up turn — even though the parent's tool output and
    `summary` row column reflected the original turn. The two views
    drifted on every refresh.

    Fix: capture the helper's stream id after `saveMessages`
    resolves (`_lastTurnStreamId`, exposed via `getLastTurnStreamId`),
    stash it in `cf_agent_helper_runs.stream_id`, and have
    `getChatChunksForReplay(streamId?)` accept an explicit id.
    `onConnect` reads `row.stream_id` and passes it through. Schema
    bump applied via the same idempotent `try { ALTER TABLE … ADD
    COLUMN } catch {}` pattern as `display_order`.

    Regression test in `reconnect-replay.test.ts`: seeds turn 1's
    chunks (capturing the row's stream_id), then writes turn 2's
    chunks via the new `testWriteAdditionalHelperChunks` seam,
    asserts replay returns turn 1's body and not turn 2's.

  - **D2: `<DrillInPanel>` keyed by `helperId`.** Switching from one
    helper's drill-in to another now fully unmounts/remounts the
    panel — tears down the previous `useAgent` WebSocket cleanly,
    resets the composer's input state, and avoids any prop-vs-
    hook-arg drift. One-line fix.

  - **N1: status badge in drill-in header.** Mirrors the inline
    panel's Running / Done / Error badge so the side panel's header
    feels consistent with the panel the user just clicked through
    from. Prop named `helperStatus` (not `status`) to avoid
    colliding with `useAgentChat`'s own `status` symbol.

  - **N2: system-prompt nudge for `compare` partial failure.** Added
    one line to `Assistant.getSystemPrompt`: "If a `compare` result
    includes an `error` field for one branch, acknowledge the gap
    and synthesize from the successful branch only." The structured
    `Promise.allSettled` shape from the previous polish pass already
    gives the LLM the data; this nudge tells it what to do with it.
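
The D1 drift and its fix can be sketched with an in-memory model. This is an illustration under assumptions: the `StoredStream` type and `chunksForReplay` function are hypothetical stand-ins, whereas the example reads from SQLite-backed stream storage via `getChatChunksForReplay(streamId?)`.

```typescript
// Hypothetical in-memory model of the D1 fix; type and function names
// here are assumed for illustration only.
interface StoredStream {
  id: string;
  createdAt: number;
  chunks: string[];
}

function chunksForReplay(streams: StoredStream[], streamId?: string): string[] {
  if (streamId !== undefined) {
    // Explicit id: replay exactly the turn the parent's row points at.
    return streams.find((s) => s.id === streamId)?.chunks ?? [];
  }
  // No id: fall back to the old "most recent by created_at" behavior.
  const latest = [...streams].sort((x, y) => y.createdAt - x.createdAt)[0];
  return latest?.chunks ?? [];
}

const streams: StoredStream[] = [
  { id: "turn-1", createdAt: 1, chunks: ["original answer"] },
  { id: "turn-2", createdAt: 2, chunks: ["drill-in follow-up"] }
];

// Without an id the follow-up turn wins (the drift described in D1).
const withoutId = chunksForReplay(streams);
// With the row's stored stream id, replay stays on the original turn.
const withId = chunksForReplay(streams, "turn-1");
```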

WIP doc: appended a "drill-in review polish" Stage 2 entry
summarizing what landed AND what we explicitly punted on (N3
side-panel parent-turn indicator; E1 stream-metadata growth; E2
concurrent drill-in tabs; E3 STREAM_RESUMING smoke-test; E4 open
`onBeforeSubAgent` gate; focus trap / aria-modal; drill-in tests;
duplicated `query` in `compare` output) so we have a paper trail
for the choices.

Tests: 34 (was 33); D1 regression test added.
Made-with: Cursor
…nt` base

Closes the "is the helper-event vocabulary right?" gap from Ring 2 of
the design notes. With a second helper class running the same
protocol against a meaningfully different workload, the answer is
yes — the chunk firehose generalizes without any vocabulary changes.

Server changes:

  - **Extracted `HelperAgent extends Think<Env>` as the shared base.**
    All helper-protocol bits — the `broadcast` tee, `runTurnAndStream`,
    `chatRecovery = false`, the concurrent-call guard, and every
    lifecycle accessor (`getChatChunksForReplay`,
    `getLastTurnStreamId`, `getFinalTurnText`, `getLastStreamError`)
    — live there. Concrete helpers stay thin: pick a model, a system
    prompt, and a tool surface.
  - **`Researcher` slimmed down to `extends HelperAgent`** with just
    its three overrides. Behavior unchanged.
  - **New `Planner extends HelperAgent`** that produces structured
    implementation plans (Overview / Affected files / Step-by-step
    / Open questions) with a single simulated `inspect_file` tool.
    Different system prompt, different tool surface, same protocol.
  - **`Assistant._runHelperTurn(cls, query, parentToolCallId,
    displayOrder?)`** generalizes the previous `runResearchHelper`.
    `cls` is typed as `HelperClass = typeof Researcher | typeof
    Planner`; `cls.name` feeds the row's `helper_type` and
    `subAgent(cls, ...)` spawns the right facet.
  - **Class registry `helperClassByType: Record<string,
    HelperClass>`** used by `onConnect` and `clearHelperRuns` to
    resolve the row's stored `helper_type` string back to a class.
    Defensive fallback to `Researcher` for unknown types.
  - **New `plan(description)` tool** dispatching `Planner` via
    `_runHelperTurn`. Updated system prompt to nudge the LLM
    toward `plan` for "how do I implement X" queries.
  - **Wrangler v2 migration** adds `Planner` to `new_sqlite_classes`;
    same in `tests/wrangler.jsonc`. Idempotent — existing v1
    deployments pick it up additively.

Test changes:

  - `Planner` test subclass with the same mock-model + `testWriteChunks`
    surface as the `Researcher` test class. Deliberately duplicated
    rather than mixed in (a TypeScript class mixin would add more
    complexity than two duplicated ~30-line classes add noise).
  - All seams that used to hardcode `Researcher` now accept an
    optional `className: "Researcher" | "Planner"` arg, defaulting
    to `Researcher` so existing tests don't have to thread it.
  - Two new tests (36 total, was 34):
    - **Planner end-to-end** in `helper-stream.test.ts` — drives a
      Planner turn through the byte-stream protocol and verifies the
      same NDJSON / chunk storage / final-text pipe holds.
    - **Mixed-class clear** in `clear-helper-runs.test.ts` — seeds
      one Researcher row + one Planner row, verifies
      `clearHelperRuns` resolves the right facet table for each
      via the new class registry (would leak a Planner facet if the
      old hardcoded `Researcher` lookup were still in place).

What this validates for Stage 4:

  - `HelperAgent` IS the shape `helperTool(Cls)` will accept —
    `Cls extends HelperAgent` is a plausible constraint.
  - The class-registry pattern is what `helperTool(Cls)` would
    generate as part of its setup.
  - `_runHelperTurn` is the ~80-line body that should move into
    the framework helper. Everything else in `Assistant`
    (`getTools`, `onStart`, schema migration) stays as
    consumer-side code.

WIP doc: marks the second-helper-class step done in the
next-actionable list, adds a Stage 2 entry summarizing the
extraction and what it unblocks for Stage 4, updates the test count.

Made-with: Cursor
…rcher"

Symptom: clicking ↗ on a `Planner` helper panel opens the side
panel, which sits on "Connecting to helper…" forever. No errors or
warnings — silent failure.

Cause: `<DrillInPanel>` hardcoded `agent: "Researcher"` in the
`useAgent({ sub: [...] })` call. For a Planner helper, this routed
to a Researcher facet with that helper's id. Because
`onBeforeSubAgent` is open, the framework auto-spawned a fresh
empty Researcher facet, which `useAgentChat` connected to and
showed `messages: []` indefinitely. Researcher helpers worked by
coincidence — the hardcoded class happened to match the actual
class.

Fix: pass `helperType` from the helper's state into the `sub.agent`
field. The drill-in now routes correctly for any helper class —
Researcher to `Researcher`, Planner to `Planner`. The framework's
kebab-case URL builder turns the class name into the right path
segment (`/sub/researcher/...` or `/sub/planner/...`).
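
As a rough illustration of the routing fix, the sketch below derives a path segment from a helper's class name. It is not the framework's URL builder: `toKebabCase` and `subAgentPath` are hypothetical names, and the path shape is modeled only on the `/sub/researcher/...` segments mentioned above.

```typescript
// Illustration only: a simple PascalCase to kebab-case conversion. The
// framework's own URL builder may handle more cases than this sketch.
function toKebabCase(className: string): string {
  return className.replace(/([a-z0-9])([A-Z])/g, "$1-$2").toLowerCase();
}

// Hypothetical path shape; the real route layout is owned by the framework.
function subAgentPath(helperType: string, helperId: string): string {
  return `/sub/${toKebabCase(helperType)}/${encodeURIComponent(helperId)}`;
}

// Passing the helper's own type (rather than a hardcoded class name)
// yields a different segment per class.
const researcherPath = subAgentPath("Researcher", "abc123");
const plannerPath = subAgentPath("Planner", "abc123");
```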

README also updated so the documented snippet reflects the
dynamic-class pattern, not a hardcoded class name.

This was a real consequence of the wip doc's "no drill-in tests"
punt — the bug wouldn't have shipped if there were a test that
opened drill-in for a non-Researcher helper. Worth keeping in mind
when we revisit the React-side test gap.

Tests still pass (36/36) since drill-in isn't covered.

Made-with: Cursor
Review fixes for the `Planner` / `HelperAgent` extraction (`02ab6d05`)
based on a deep read across that commit and the drill-in routing fix
(`e9c0e0ff`):

- **M2**: `helperClassByType` is now `as const` and `HelperClass` is
  derived from `keyof typeof` it. Adding a class is one site (the
  registry); the type, `_runHelperTurn`'s arg, and the `helperClassFor`
  lookup all flow from there. The unknown-`helper_type` fallback also
  `console.warn`s once so drift surfaces early.
- **C1**: Planner-specific `onConnect` replay test in
  `reconnect-replay.test.ts` — seeds a `helperType: "Planner"` row +
  chunks and asserts replay emits `started` (carrying
  `helperType: "Planner"`), the seeded `chunk`, and `finished`. Pins
  the registry lookup so we don't regress to a hardcoded class.
- **C2**: `<DrillInPanel>` now validates `helperType` against a
  `KNOWN_HELPER_TYPES` set before opening `useAgent`. On miss it
  renders an explicit "Unknown helper class: X" error state instead
  of the silent "Connecting to helper…" hang the 2026-04-28 routing
  bug exposed; composer is also disabled in that state.
- **N1**: removed `className = "Researcher"` defaults from all test
  seams (`hasHelper`, `testRunHelperToCompletion`,
  `testReadStoredHelperChunks`, `testReadHelperFinalText`,
  `testReadHelperStreamError`, `testSetHelperMockMode`,
  `testWriteAdditionalHelperChunks`); renamed `testRunResearchHelper`
  → `testRunHelper(className, query, parentToolCallId, displayOrder?)`
  with class first to match production. Existing tests updated to pass
  `"Researcher"` explicitly. Closes the footgun where a future Planner
  test could silently check Researcher's facet table and pass for the
  wrong reason.
- **Wrangler v2 → v1 consolidation**: rolled the v2 entry that added
  `Planner` to `new_sqlite_classes` into v1 in both example and test
  `wrangler.jsonc` files. Nothing's deployed; cleaner for first-time
  deploys.
- **M1 reasoned away**: `cls.name` is stable across the
  esbuild + `@cloudflare/vite-plugin` build because workerd's
  `ctx.exports` requires top-level class export names to match the
  wrangler binding strings. If tooling ever did mangle them, migration
  is a one-shot SQL `UPDATE` on `cf_agent_helper_runs.helper_type`.
  Documented in the wip doc; not blocking.
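
A minimal sketch of the M2 registry pattern follows. The stand-in classes are empty placeholders (the example's real helpers extend `HelperAgent`), and the exact type derivation shown is an assumption about how `HelperClass` might be derived from the registry.

```typescript
// Stand-in classes; the example's real helpers extend `HelperAgent`.
class Researcher {}
class Planner {}

// Single source of truth: adding a helper class means adding one entry here.
const helperClassByType = { Researcher, Planner } as const;

// Assumed derivation: both types flow from the registry's keys.
type HelperType = keyof typeof helperClassByType;
type HelperClass = (typeof helperClassByType)[HelperType];

let warnedUnknownType = false;

// Resolve a stored `helper_type` string back to a class. Unknown values
// fall back to `Researcher` and warn once so drift surfaces early.
function helperClassFor(helperType: string): HelperClass {
  if (Object.prototype.hasOwnProperty.call(helperClassByType, helperType)) {
    return helperClassByType[helperType as HelperType];
  }
  if (!warnedUnknownType) {
    warnedUnknownType = true;
    console.warn(`Unknown helper_type "${helperType}"; falling back to Researcher`);
  }
  return helperClassByType.Researcher;
}
```

With this shape, `helperClassFor("Planner").name` gives back the string stored in the row's `helper_type` column, which is the round trip `onConnect` and `clearHelperRuns` rely on.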

Doc polish:

- README "How to read this code" walkthrough refreshed to mention
  `HelperAgent` and the class registry.
- README "If you want to extend it" rewritten — both prior bullets
  (parallel fan-out, drill-in) are now shipped features.
- README's "Tests" section updated for `TestPlanner` + Planner
  end-to-end + C1 + D1 + 3-helper Beta stress test.
- README's diagram and inline drill-in snippet updated to use
  `helperType` rather than hardcoded `"Researcher"`.
- Renamed `runResearchHelper` → `_runHelperTurn` references in
  README, server doc-comments, test file headers, and the Stage 1 +
  earlier sections of `wip/inline-sub-agent-events.md` that described
  current state. Rename history entries left as-is.
- New Stage 2 entry in the wip doc tracking M1 (skipped),
  M2 / C1 / C2 / N1 (landed), wrangler consolidation, and the
  polish pass.

Tests: 37 (was 36); one new C1 Planner replay test.
Made-with: Cursor
…nt gate

Two of the items from the README's "out of scope" table were really
"deferred but small" rather than genuinely out-of-scope. Shipping them
lets the example be honestly described as production-shaped rather
than demo-shaped.

**B4 cancellation propagation: fully wired.**

Helper-side cancel was already in place (the RPC stream's `cancel`
callback aborts via Think's `_aborts`). What was missing was the
parent-side thread: the AI SDK passes an `abortSignal` on each tool
execute's second arg, but the example wasn't reading it.

Each tool execute now destructures `{ toolCallId, abortSignal }` and
threads the signal into `_runHelperTurn` via a new `opts.abortSignal`.
The function registers an `abort` listener that cancels the helper RPC
reader; the cancel propagates over JSRPC to the helper's `cancel`
callback, which calls `abortCurrentTurn`. The post-loop arm checks
`signal.aborted` and surfaces the abort as an error (rather than a
silent empty summary), which flows through the existing catch arm:
row marked `error`, synthesized `error` event broadcast, panel
doesn't sit on "Running…". A `finally` arm detaches the listener so
a parent that runs many helpers across many turns doesn't accumulate
stale closures on its abort signals.
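
The listener lifecycle described above can be sketched as follows. This is a simplified, synchronous illustration: `linkAbort` and `runWithAbort` are hypothetical names, and the real `_runHelperTurn` is asynchronous and cancels an RPC stream reader rather than calling a plain callback.

```typescript
// Hypothetical helper: wires a parent abort signal to a cancel callback
// and returns a function that detaches the listener again.
function linkAbort(
  signal: AbortSignal | undefined,
  cancel: (reason: unknown) => void
): () => void {
  if (!signal) return () => {};
  if (signal.aborted) {
    // A pre-aborted signal never fires "abort", so cancel immediately.
    cancel(signal.reason);
    return () => {};
  }
  const onAbort = () => cancel(signal.reason);
  signal.addEventListener("abort", onAbort, { once: true });
  return () => signal.removeEventListener("abort", onAbort);
}

// Usage shape: surface an abort as an error after the body finishes, and
// detach in `finally` so the closure does not outlive the turn.
function runWithAbort<T>(
  signal: AbortSignal | undefined,
  cancel: () => void,
  body: () => T
): T {
  const detach = linkAbort(signal, cancel);
  try {
    const result = body();
    if (signal?.aborted) throw new Error("Helper turn aborted");
    return result;
  } finally {
    detach();
  }
}
```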

**E4 `onBeforeSubAgent` gate: production-shaped.**

`Assistant` now overrides `onBeforeSubAgent` to look up the requested
`(helperType, helperId)` in `cf_agent_helper_runs` and return a 404
if the row doesn't exist. Drill-in URLs are no longer guessable; an
attacker can't drill into a Researcher facet by routing through the
Planner endpoint (the gate's `WHERE` clause is on
`(helper_id, helper_type)`, so cross-class access fails closed).
Internal `subAgent(...)` calls bypass the hook by design (matches
`getAgentByName` bypassing `onBeforeConnect`), so `_runHelperTurn`'s
own helper spawn isn't blocked by its own check.
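
The fail-closed behavior of the gate can be sketched with an in-memory lookup. The row shape, the `gateSubAgent` function, and the `GateDecision` type are hypothetical stand-ins; the example queries `cf_agent_helper_runs` in SQLite and returns an HTTP 404 from `onBeforeSubAgent`.

```typescript
// Hypothetical in-memory stand-in for the `cf_agent_helper_runs` lookup;
// names here are assumptions for illustration.
interface HelperRunRow {
  helperId: string;
  helperType: string;
}

type GateDecision = { allow: true } | { allow: false; status: 404 };

function gateSubAgent(
  rows: HelperRunRow[],
  helperType: string,
  helperId: string
): GateDecision {
  // Match on BOTH columns so a Researcher row cannot be reached through
  // the Planner route, and vice versa.
  const found = rows.some(
    (row) => row.helperId === helperId && row.helperType === helperType
  );
  return found ? { allow: true } : { allow: false, status: 404 };
}

const rows: HelperRunRow[] = [{ helperId: "h1", helperType: "Researcher" }];
const seeded = gateSubAgent(rows, "Researcher", "h1");
const unseeded = gateSubAgent(rows, "Researcher", "h2");
const crossClass = gateSubAgent(rows, "Planner", "h1");
```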

**Helper-class-agnostic error message.**

The empty-summary fallback used to say "Researcher finished without
producing assistant text"; updated to use `${helperType}` so a Planner
failure now reads "Planner finished…" rather than impersonating
Researcher.

**Tests.**

New `cancellation-and-gate.test.ts` (6 cases):

  - B4: pre-aborted signal rejects with an abort error
  - B4: pre-aborted signal marks the row `error` with abort message
  - B4: same path works for Planner (class-agnostic)
  - E4: gate rejects an unseeded helperId with 404
  - E4: gate accepts a seeded helperId with 101 WS upgrade
  - E4: gate rejects cross-class drill-in (seeded as Researcher,
    drilled-in as Planner → 404)

Tests: 43 (was 37, +6 new).

**Docs.**

README's "out of scope" table is now four rows instead of six. The
drill-in section's caveat about an "open gate" is replaced with a
note documenting the production posture. New "Cancellation
propagation (B4)" paragraph next to "Parent-crash recovery". The
wip doc's Stage 2 entry now includes a "production-shape polish"
sub-section, the older B4 entry is updated to clarify "helper-side
half" landed earlier and the parent-side thread landed today, and
the punted E4 item is struck through with a pointer to the new
landing entry.

Made-with: Cursor
The 2026-04-28 drill-in routing bug shipped because the existing
test harness only covers DO RPC + WebSocket frame paths. Bugs
that live in the React layer — `useAgent` URL resolution,
drill-in routing, replay-vs-live state reducers — slip through.

Adds a Playwright suite at `examples/agents-as-tools/e2e/` that
boots `vite dev` and drives the real app in Chromium against the
production `ai` binding (`remote: true`). High-fidelity: actual
WS frames, actual DO routing, actual LLM tool selection.

**Tests (7):**

- `smoke.e2e.ts` — page loads, WS handshake completes, composer
  becomes interactive.
- `research-drill-in.e2e.ts` — research prompt spawns Researcher
  panel; drill-in connects to a Researcher facet and renders
  messages.
- `planner-drill-in.e2e.ts` — same flow for `plan`. Pins the
  `e9c0e0ff` regression: with the bug, drill-in to a Planner
  panel hung on "Connecting to helper…".
- `compare-fanout.e2e.ts` — `compare` prompt renders TWO
  Researcher panels under one chat tool call.
- `refresh-replay.e2e.ts` — completed runs survive a page reload.
  Single-helper + Researcher+Planner two-helper cases.
- `clear.e2e.ts` — Clear wipes both surfaces; reload after Clear
  doesn't bring panels back.

**Per-test fresh DO:**

Each test goes to `/?user=<unique>`. The client now honors that
param as an override for `DEMO_USER`, so each test runs against
its own Assistant DO. Sidesteps a framework gap: alarms scheduled
inside helper facets lose `ctx.id.name` when they fire after a
dev-server restart (the 2026-04-15 compat-date fix covers
top-level DOs, not facets). With unique users, each test's DO
is fresh — no in-flight alarms from a previous run.
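
A minimal sketch of the `?user=` override, assuming a hypothetical `resolveUser` function and a placeholder value for `DEMO_USER` (the example's actual constant value is not shown in this commit message):

```typescript
// Placeholder fallback; the example's actual `DEMO_USER` value is assumed.
const DEMO_USER = "demo";

function resolveUser(href: string): string {
  const fromQuery = new URL(href).searchParams.get("user");
  // Ignore empty values so `/?user=` still falls back to the default.
  return fromQuery && fromQuery.length > 0 ? fromQuery : DEMO_USER;
}

// Each e2e test would pass its own unique id and so reach its own DO name.
const overridden = resolveUser("http://localhost:5173/?user=e2e-abc");
const fallback = resolveUser("http://localhost:5173/");
```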

Captured the framework gap in the wip doc as an upstream-needed
fix in partyserver / agents.

**Stable selectors:**

Added minimal `data-testid` hooks to `client.tsx`:
`helper-panel` (with `data-helper-type` / `data-helper-id` /
`data-helper-status`) and `drill-in-panel`. The rest of the
suite uses ARIA semantics (`getByRole`, `getByPlaceholder`).

Drive-by: fixed two Kumo a11y warnings by adding `aria-label`
to the parent and drill-in composers.

**Config:**

`playwright.config.ts` boots `vite dev` via `webServer`;
`workers: 1` serializes tests so they don't fight over Workers
AI capacity; `retries: 1` rides out occasional 504s; `timeout:
180_000` covers the slow `kimi-k2.5` model. `expect`'s
per-action timeout is 60s.
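
For orientation, the settings above map onto a Playwright config object roughly like the sketch below. The `command`, `url`, and `reuseExistingServer` values are assumptions, not copied from the example's `playwright.config.ts`.

```typescript
// Sketch of a Playwright config object carrying the settings described
// above. Values marked "assumed" are not taken from the example.
const playwrightConfig = {
  timeout: 180_000, // per-test budget for the slow model
  expect: { timeout: 60_000 }, // timeout applied to `expect` assertions
  workers: 1, // serialize tests so they do not compete for capacity
  retries: 1, // ride out occasional 504s
  webServer: {
    command: "npx vite dev", // assumed
    url: "http://localhost:5173", // assumed Vite default port
    reuseExistingServer: true // assumed
  }
};
```

In a real `playwright.config.ts` an object like this would be the default export, typically wrapped in `defineConfig` from `@playwright/test` for type checking.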

Scripts: `npm run test:e2e` (headless), `test:e2e:headed`,
`test:e2e:ui` (interactive).

**Local-only for now:**

User's stated workflow is local. CI integration would need
`playwright install --with-deps chromium` and a Workers AI
auth shape from the runner; punted.

Full suite ~4-5 minutes locally. 7/7 passing.

Made-with: Cursor
…6-04-28

The branch covers a lot of ground (20 commits, ~8500 insertions across
the example, packages/agents, and the wip doc). Adds a session-handoff
read at the top of the wip doc and a status banner on the example
README so the next session opens to a clear "what's shipped, what's
next" view rather than scrolling through the chronological log.

**`wip/inline-sub-agent-events.md`** — new "Resuming this work —
snapshot 2026-04-28" section right after the intro:

- What's shipped on this branch (Stage 1 + Stage 2 + tests, with
  pointers to Stage 2's two-helper-class extraction, drill-in,
  cancellation, gate, and e2e suite).
- What's NOT shipped (Stage 3 RFC, Stage 4 framework helper,
  Stage 5 AIChatAgent port).
- The newly-surfaced framework gap (alarms inside helper facets
  lose `ctx.id.name` after dev-server restarts; the 2026-04-15
  compat-date fix covers top-level DOs but not facets) — captured
  as a near-term next-step candidate.
- Three concrete next-step candidates ranked by leverage: Stage 3
  RFC (highest), framework facet-alarm fix (medium scope), Stage 4
  helperTool(Cls) (premature without the RFC).
- "How to resume tactically" — exact commands to verify the
  current state (`npm test`, `npm run test:e2e`, `npm start`) and
  the canonical entry points in the doc + code.

**`examples/agents-as-tools/README.md`** — new "Status (2026-04-28)"
section right after the intro: one paragraph summary of what the
example covers (three tools, two helper classes, drill-in,
cancellation, gate, replay), the test counts (43 vitest + 7
Playwright), and the next-up work pointer to the wip doc.

No code changes — the comments + doc-comments inline already
capture the per-bug context (e.g. the framework facet-alarm gap is
documented at the e2e helper that works around it).

Made-with: Cursor
@changeset-bot

changeset-bot Bot commented Apr 28, 2026

🦋 Changeset detected

Latest commit: 0148a7b

The changes in this PR will be included in the next version bump.

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 1 potential issue.

// `this.name` getter when a wake path goes through alarm() instead
// of fetch() with idFromName(). Symptom is
// "Error in Assistant:<unnamed> fetch: … this.ctx.id.name is not set".
"compatibility_date": "2026-04-15",

🟡 wrangler.jsonc uses compatibility_date: "2026-04-15" violating AGENTS.md convention of "2026-01-28"

The AGENTS.md Workers conventions section states: "All wrangler configs use compatibility_date: "2026-01-28" and compatibility_flags: ["nodejs_compat"]". Both examples/agents-as-tools/wrangler.jsonc:10 and examples/agents-as-tools/src/tests/wrangler.jsonc:5 use "2026-04-15" instead. The deviation is documented (needed for ctx.id.name in alarm handlers) and the AGENTS.md has an "Ask first" carve-out for compatibility date changes, but the stated convention is still violated.

…ream`

Reverts `acce611c` (the very first commit on this branch). The option
landed before the v0.2 design pivot moved helper events onto each
helper's own DO. After the pivot, no caller in the repo uses a
non-default value:

  - Think's only `new ResumableStream(this.sql.bind(this))` doesn't
    pass options.
  - The `agents-as-tools` example doesn't instantiate
    `ResumableStream` directly at all — helpers use Think's, and
    `helper-event` envelopes are broadcast outside the
    `ResumableStream` machinery (parent `broadcast()` over its
    own WebSocket, not a second `ResumableStream` over the same
    connection).
  - No test exercises a non-default `messageType`.

The option's whole purpose was to prevent frame-type collisions when
two `ResumableStream` instances share a WebSocket. The pivot ensured
that situation cannot arise in the example, and the changeset itself
acknowledged the same isolation argument applies to the dropped
`tablePrefix` from #1377. Shipping `messageType` therefore added
public API surface that nothing exercised — speculative API we
hadn't validated. Removing it now keeps the framework's surface
honest.

What's reverted:

  - `ResumableStreamOptions` type + `messageType` ctor arg in
    `packages/agents/src/chat/resumable-stream.ts` — back to the
    original `constructor(sql: SqlTaggedTemplate)` signature.
  - The four `this._messageType` usages in `replayChunks` restored
    to hardcoded `CHAT_MESSAGE_TYPES.USE_CHAT_RESPONSE`.
  - `ResumableStreamOptions` removed from `packages/agents/src/chat/index.ts`
    public exports.
  - Changeset `.changeset/resumable-stream-message-type.md` deleted.

Verification:

  - `git diff main -- packages/agents .changeset` is empty.
  - 28 `resumable-streaming.test.ts` regression tests in `packages/ai-chat`
    still pass.
  - 43 `agents-as-tools` vitest tests still pass (none touched
    `ResumableStream`'s ctor surface).
  - Typecheck clean across `packages/agents`, the example, its
    test worker, and the e2e suite.

wip doc updates:

  - Stage 1 status flipped from "landed" to "not shipped" in three
    places (top handoff, Status section, Stage 1 plan section
    header). The original plan kept as historical context.
  - The "Decisions confirmed 2026-04-28" entry that ruled out
    `tablePrefix` is rewritten to also cover `messageType` —
    same isolation argument applies, both options are unnecessary
    after the pivot.
  - "What's shipped vs unshipped" table row for `messageType`
    flipped to "not shipping".
  - "Explicitly NOT in this near-term list" entry now mentions
    both ctor options.
  - The v0.1 narrative section that describes the original
    `messageType: "helper-event"` setup is left as-is; it
    accurately captures historical state at the time and is
    superseded by the v0.2 update entry that follows it.

If a future use case needs a non-default `messageType` (or a
`tablePrefix`), it's a small additive change with a real caller to
point at.

Made-with: Cursor
@threepointone threepointone changed the title Inline sub-agent event streaming — agents-as-tools example + messageType ctor option Inline sub-agent event streaming: agents-as-tools example + design notes Apr 28, 2026
…at` README

The "Related" section pointing at `examples/agents-as-tools` was
added in `d95f0b53` (the initial example commit) before the design
firmed up. Cross-linking the two examples implies a parity between
them — "this is the AIChatAgent equivalent of that" — that doesn't
exist yet: `agents-as-tools` is Think-only and the AIChatAgent port
is explicitly Stage 5 (deferred). Promoting the link from
`ai-chat`'s README before the port lands sets a misleading
expectation.

The reverse direction (`agents-as-tools` → `ai-chat`) stays — it
correctly describes `ai-chat` as "the canonical AIChatAgent
reference" without claiming the example is itself ported there.

Will re-add this link to `ai-chat`'s README when the AIChatAgent
port lands as part of Stage 5.

Made-with: Cursor
@pkg-pr-new

pkg-pr-new Bot commented Apr 28, 2026

agents

npm i https://pkg.pr.new/agents@1405

@cloudflare/ai-chat

npm i https://pkg.pr.new/@cloudflare/ai-chat@1405

@cloudflare/codemode

npm i https://pkg.pr.new/@cloudflare/codemode@1405

hono-agents

npm i https://pkg.pr.new/hono-agents@1405

@cloudflare/shell

npm i https://pkg.pr.new/@cloudflare/shell@1405

@cloudflare/think

npm i https://pkg.pr.new/@cloudflare/think@1405

@cloudflare/voice

npm i https://pkg.pr.new/@cloudflare/voice@1405

@cloudflare/worker-bundler

npm i https://pkg.pr.new/@cloudflare/worker-bundler@1405

commit: 0148a7b

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 3 potential issues.

Comment thread examples/agents-as-tools/src/server.ts
<head>
<meta charset="UTF-8" />
<link rel="icon" href="/favicon.ico" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />

@devin-ai-integration devin-ai-integration Bot Apr 28, 2026

🔴 Missing required public/favicon.ico directory per examples/AGENTS.md

The examples/AGENTS.md requires every example to include public/favicon.ico. This example has no public/ directory at all. The index.html references <link rel="icon" href="/favicon.ico" /> which will 404 since the file doesn't exist. Other examples (e.g. examples/playground, examples/assistant) include a public/favicon.ico.

// `this.name` getter when a wake path goes through alarm() instead
// of fetch() with idFromName(). Symptom is
// "Error in Assistant:<unnamed> fetch: … this.ctx.id.name is not set".
"compatibility_date": "2026-04-15",

🔴 compatibility_date is "2026-04-15" instead of the required "2026-01-28"

examples/AGENTS.md and the root AGENTS.md both mandate compatibility_date: "2026-01-28" for all wrangler configs. This example uses "2026-04-15" in wrangler.jsonc (line 10). The deviation is documented with a technical reason (alarm handler fix for ctx.id.name propagation), but the root AGENTS.md says to "Ask first" before "Changing wrangler.jsonc compatibility dates across the repo".

Prompt for agents
The examples/AGENTS.md convention mandates compatibility_date: 2026-01-28. This example uses 2026-04-15 with a documented reason (alarm handler ctx.id.name propagation). If the later date is genuinely required for the example to function, the deviation should be pre-approved and the examples/AGENTS.md rule should be updated to allow exceptions. If the alarm handler issue only affects the e2e test suite (which already works around it with per-test unique users), the production wrangler.jsonc could potentially use the standard date while the test wrangler.jsonc uses the newer one. Both wrangler.jsonc files (examples/agents-as-tools/wrangler.jsonc and examples/agents-as-tools/src/tests/wrangler.jsonc) need to be aligned with whatever decision is made.

…-recovery gap

`partyserver` 0.5.4 fixes the bug filed at
`cloudflare/partykit#390`:
fresh 0.5.x DOs with `compatibility_date` older than 2026-03-15
would lose `this.name` on alarm wake (no `ctx.id.name`
propagation in old runtimes, and 0.5.x had stopped writing the
`__ps_name` legacy fallback record). The fix is a defensive
one-time `__ps_name` write on first fetch — idempotent;
restores the safety net pre-0.5.x had.

Surfaced in this branch during e2e suite development. Worked
around at the time by giving each test its own Assistant DO
via a `?user=<id>` query-param override, so each test's DO was
fresh and never hit the alarm path with stale state.

With 0.5.4, the workaround is no longer needed for the bug. The
per-test unique-user pattern stays purely for test isolation
(no helper-row / chat-history state leaks across tests).

Verification:

- `npm ls partyserver` shows 0.5.4 deduped across `agents`,
  `examples/voice-input`, and the root.
- 43 vitest tests still pass.
- 7 Playwright e2e tests still pass — output no longer contains
  the partyserver "Attempting to read .name… `this.ctx.id.name`
  is not set" error that fired in the background of every
  previous run.

Updates to docs:

- `wip/inline-sub-agent-events.md` — handoff snapshot's "Open
  framework gap" reframed to "surfaced and fixed"; next-step
  candidates list drops the framework facet-alarm fix (it
  landed); two Stage 2 entries reframed; related-issues paragraph
  marks #390 as fixed; e2e helpers comment updated.
- `examples/agents-as-tools/README.md` — e2e tests blurb no
  longer claims the unique-user pattern works around a framework
  gap; 0.5.4 link added as a parenthetical.
- `examples/agents-as-tools/e2e/helpers.ts` — `uniqueUser` doc-
  comment updated.

What's still open at the framework level (not addressed by 0.5.4):

- workerd doesn't yet support independent facet alarms. The
  helper-side `keepAliveWhile` wrapper is a soft no-op on facets
  for that reason. Documented in the wip doc's "Hibernation /
  fibers gaps" section; not in scope for this PR.

Made-with: Cursor
…troyAll()` + add changeset

**Adds the missing changeset** for the partyserver 0.5.4 bump
(`.changeset/agents-partyserver-0.5.4.md`) — should have landed
with `c4a0d887` and didn't. Patch bump on `agents` documenting
the peer-dependency raise.

**Fixes a real bug** in the helper-side cancellation path. The
review found that `_activeRequestId` is only set AFTER
`saveMessages` resolves (line 377 of `src/server.ts` was after
the await), and `releaseClaim()` immediately clears it back to
undefined (line 400). The synchronous span between line 377
and line 400 has no awaits, so the `cancel()` callback at
line 407 cannot ever observe `_activeRequestId !== undefined`
during a real cancellation — the entire abort path was dead.

The B4 vitest tests still pass because they validate the
PARENT-SIDE error surfacing (`signal.aborted` check + thrown
error + row update + synthesized `error` event), which works
end-to-end. The helper-side cancellation never actually fired.
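The dead path described above can be reproduced with a simplified synchronous model. This is a sketch, not the code in `src/server.ts`: `BuggyHelper` is a hypothetical class, and the `interleave` callback stands in for the await points inside `saveMessages`, the only places where a `cancel()` could run.

```typescript
class BuggyHelper {
  private activeRequestId: string | undefined;
  aborted = false;

  // `interleave` models an await point: other callbacks, such as
  // cancel(), can only run while the turn is suspended here.
  private saveMessages(interleave: () => void): string {
    interleave(); // the whole inference is in flight at this point
    return "req-1";
  }

  cancel(): void {
    // Only aborts if a request id is currently recorded.
    if (this.activeRequestId !== undefined) {
      this.aborted = true;
    }
  }

  runTurn(interleave: () => void): void {
    const id = this.saveMessages(interleave);
    this.activeRequestId = id; // set only after the turn has finished
    // No await between these two assignments, so nothing can observe the id.
    this.activeRequestId = undefined;
  }
}

const buggy = new BuggyHelper();
// The cancel arrives mid-inference, which is the realistic case.
buggy.runTurn(() => buggy.cancel());
```

After the turn, `buggy.aborted` is still `false`: the cancel ran while the id was unset, so the guard never passed.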

What the fix does:

1. **Drops the dead `_activeRequestId` field** — no longer used
   for anything. The stream-id capture at the same point
   (`_lastTurnStreamId`) is unaffected.
2. **Switches `abortCurrentTurn` to `_aborts.destroyAll()`.** The
   helper is single-purpose (one in-flight turn at a time,
   guarded by `_runInProgress`), so the only controller in the
   registry, if any, is the one we want to cancel. destroyAll
   doesn't need the requestId Think generates internally.
3. **Honest doc-comment on `abortCurrentTurn`** explaining the
   remaining race window: Think's `saveMessages` lazily creates
   the controller via `_aborts.getSignal(requestId)` after
   several internal awaits (`keepAliveWhile` →
   `_turnQueue.enqueue` → `appendMessage` →
   `_broadcastMessages` → then `getSignal`). If `cancel()`
   arrives before that point, the registry is empty and
   `destroyAll()` is a no-op; the inference runs to completion.
4. **Updated `cancel()` callback comment** documenting the same.
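The behavior of the fix and its remaining race window can be sketched with a small registry. The source names `getSignal(requestId)` and `destroyAll()` on `_aborts`; everything else here, including the `AbortRegistry` class and the numeric return value of `destroyAll`, is an assumption made for illustration.

```typescript
class AbortRegistry {
  private controllers = new Map<string, AbortController>();

  // Lazily creates the controller the first time a signal is requested.
  getSignal(requestId: string): AbortSignal {
    let controller = this.controllers.get(requestId);
    if (controller === undefined) {
      controller = new AbortController();
      this.controllers.set(requestId, controller);
    }
    return controller.signal;
  }

  // Aborts every registered controller; returns how many were aborted
  // (the count is added here only to make the no-op case visible).
  destroyAll(): number {
    const count = this.controllers.size;
    this.controllers.forEach((controller) => controller.abort());
    this.controllers.clear();
    return count;
  }
}

const registry = new AbortRegistry();

// Early cancel: no controller exists yet, so nothing is aborted.
const earlyAborted = registry.destroyAll();

// Mid-inference cancel: the controller exists and gets aborted.
const turnSignal = registry.getSignal("req-1");
const lateAborted = registry.destroyAll();
```

Because the helper runs one turn at a time, aborting everything in the registry does not require knowing the request id.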

In practice cancels arrive mid-inference (Stop button after
several seconds of streaming) and the controller exists by the
time we destroyAll, so the abort works. Early cancels (a
pre-aborted signal, or an instant tab close) still waste one
inference pass.

The proper fix needs `Think.saveMessages` to accept an external
`AbortSignal` so the helper can pre-create a controller it owns
from the start of the turn. That's a Think public API addition
— deliberately out of scope for this PR; tracked in the wip
doc as a Stage 4 / framework follow-up.
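To illustrate why a caller-owned signal would close the race, here is a toy model. It is not Think's API: `runTurn` and its `options.signal` parameter are hypothetical, and the chunk loop stands in for the inference loop.

```typescript
// Hypothetical turn runner that accepts a caller-owned AbortSignal.
function runTurn(
  chunks: string[],
  options: { signal?: AbortSignal } = {}
): string[] {
  const produced: string[] = [];
  for (const chunk of chunks) {
    // The signal exists from the start of the turn, so even a
    // pre-aborted signal stops work before the first chunk.
    if (options.signal?.aborted) {
      break;
    }
    produced.push(chunk);
  }
  return produced;
}

const controller = new AbortController();
controller.abort(); // cancel arrives before the turn starts

const cancelledOutput = runTurn(["a", "b"], { signal: controller.signal });
const normalOutput = runTurn(["a", "b"]);
```

Here the early cancel produces no output at all, whereas with a lazily created controller it would waste one inference pass.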

**Doc updates:**

- `examples/agents-as-tools/README.md` — "Cancellation
  propagation (B4)" rewritten to be honest about the
  best-effort nature and the race window. The "Try the
  cancellation propagation path" extension hint reframed
  accordingly.
- `wip/inline-sub-agent-events.md` — both B4 entries
  (chronological "helper-side abort plumbing" and Stage 2
  recap) updated. First-attempt approach, the dead-code finding,
  and the destroyAll fix all captured. Notes that the B4 vitest
  tests validate parent-side surfacing, not helper-side abort.

Verification:

- 43 vitest tests pass (no behavior change — they were
  validating the parent-side path, not the helper-side claim).
- Typecheck clean across the example, test worker, and e2e.

Made-with: Cursor
Filed [#1406](#1406):
`Think.saveMessages` should accept an external `AbortSignal` so
callers can cancel an in-flight turn from outside.

Captures the cancellation race window documented in this branch
across three places that all referenced the gap as "Stage 4 /
framework follow-up" without a concrete link to follow:

- `examples/agents-as-tools/src/server.ts` — `abortCurrentTurn`
  doc-comment now links to #1406 instead of the abstract "Stage 4"
  pointer.
- `examples/agents-as-tools/README.md` — "Cancellation
  propagation (B4)" paragraph likewise.
- `wip/inline-sub-agent-events.md` — related-issues paragraph
  adds #1406 alongside the existing partykit#390 and workerd#6675
  links; the B4 chronological entry replaces "Stage 4 / framework
  follow-up" with the explicit issue link.

Also promoted the saveMessages-AbortSignal fix to a concrete
next-step candidate at the top of the wip doc, slotted between
the Stage 3 RFC and Stage 4 framework helper. It's a small
additive API change that could land before either of those and
would unblock proper helper-side cancellation without waiting on
the broader framework promotion.

No code changes — purely cross-linking.

Made-with: Cursor

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 1 new potential issue.

View 15 additional findings in Devin Review.

declare namespace Cloudflare {
interface GlobalProps {
mainModule: typeof import("./src/server");
durableNamespaces: "Assistant" | "Researcher";

🔴 Stale env.d.ts missing Planner from durableNamespaces

The `env.d.ts` declares `durableNamespaces: "Assistant" | "Researcher"`, but `wrangler.jsonc:27` lists three classes in `new_sqlite_classes`: `["Assistant", "Researcher", "Planner"]`. The file header says it was generated by `wrangler types`, but it was not regenerated after `Planner` was added. While `Researcher` (also a sub-agent, not a top-level binding) is correctly listed, `Planner` is missing. This inconsistency means the Cloudflare Vite plugin may not discover the `Planner` class for sub-agent routing, which could cause drill-in to `Planner` facets to fail at runtime. Regenerating with `npx wrangler types env.d.ts --include-runtime false` would fix this.

Suggested change
- durableNamespaces: "Assistant" | "Researcher";
+ durableNamespaces: "Assistant" | "Researcher" | "Planner";
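A drift check of this kind can be expressed as a small comparison. The sketch below is hypothetical and does not parse `wrangler.jsonc` or `env.d.ts`; it assumes the two name lists have already been extracted by some other means.

```typescript
// Returns the class names configured for SQLite-backed DOs that the
// generated type declaration does not mention.
function missingNamespaces(
  sqliteClasses: string[],
  declaredNamespaces: string[]
): string[] {
  return sqliteClasses.filter(
    (name) => declaredNamespaces.indexOf(name) === -1
  );
}

const missing = missingNamespaces(
  ["Assistant", "Researcher", "Planner"], // from new_sqlite_classes
  ["Assistant", "Researcher"] // from the durableNamespaces union
);
```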

Update partyserver dependency to ^0.5.5 in package.json (and workspace packages) and refresh the lockfile. Reduce test flakiness by widening streaming chunk delays in ai-chat tests (increase chunkDelayMs to give more wall-clock headroom) and add clarifying comments. Also relax a strict ordering assertion in message-concurrency.test.ts to assert set-equality (sort the request IDs) to avoid transient microtask ordering failures while preserving the intended guarantees.
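The relaxed assertion might look roughly like the following. This is a sketch of the technique rather than the code in `message-concurrency.test.ts`; `sameIds` and the sample ids are hypothetical.

```typescript
// Compares two lists of request ids while ignoring order. Both lists
// are copied before sorting so the inputs are not mutated.
function sameIds(actual: string[], expected: string[]): boolean {
  const a = [...actual].sort();
  const b = [...expected].sort();
  return a.length === b.length && a.every((id, i) => id === b[i]);
}

// Arrival order differs, but the same requests were handled.
const reordered = sameIds(["req-2", "req-1"], ["req-1", "req-2"]);

// A dropped request is still detected.
const dropped = sameIds(["req-1"], ["req-1", "req-2"]);
```

Sorting before comparing keeps the guarantee that every request was handled exactly once, while no longer depending on microtask ordering.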

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 1 new potential issue.

View 18 additional findings in Devin Review.

"agents": patch
---

Bump `partyserver` peer dependency to `^0.5.4`. 0.5.4 closes [`cloudflare/partykit#390`](https://github.com/cloudflare/partykit/issues/390): fresh 0.5.x DOs with `compatibility_date` older than 2026-03-15 could lose `this.name` on alarm wake (no `ctx.id.name` propagation in older runtimes, and 0.5.x had stopped writing the `__ps_name` legacy fallback record). The fix is a defensive one-time `__ps_name` write on first fetch — idempotent, restores the safety net pre-0.5.x had. Affects any project on a pre-cutoff `compatibility_date` whose DOs schedule alarms (which includes Think's `_chatRecoveryContinue`).

🟡 Changeset claims ^0.5.4 but actual dependency is bumped to ^0.5.5

The changeset file says "Bump `partyserver` peer dependency to `^0.5.4`", but the actual changes in both `package.json:65` and `packages/agents/package.json:34` set the version to `^0.5.5`. The published changelog will say 0.5.4 is the minimum, but the `package.json` requires 0.5.5+. A user reading the changelog who installs `partyserver@0.5.4` would not satisfy the actual `^0.5.5` constraint.

Suggested change
- Bump `partyserver` peer dependency to `^0.5.4`. 0.5.4 closes [`cloudflare/partykit#390`](https://github.com/cloudflare/partykit/issues/390): fresh 0.5.x DOs with `compatibility_date` older than 2026-03-15 could lose `this.name` on alarm wake (no `ctx.id.name` propagation in older runtimes, and 0.5.x had stopped writing the `__ps_name` legacy fallback record). The fix is a defensive one-time `__ps_name` write on first fetch — idempotent, restores the safety net pre-0.5.x had. Affects any project on a pre-cutoff `compatibility_date` whose DOs schedule alarms (which includes Think's `_chatRecoveryContinue`).
+ Bump `partyserver` peer dependency to `^0.5.5`. 0.5.4 closes [`cloudflare/partykit#390`](https://github.com/cloudflare/partykit/issues/390): fresh 0.5.x DOs with `compatibility_date` older than 2026-03-15 could lose `this.name` on alarm wake (no `ctx.id.name` propagation in older runtimes, and 0.5.x had stopped writing the `__ps_name` legacy fallback record). The fix is a defensive one-time `__ps_name` write on first fetch — idempotent, restores the safety net pre-0.5.x had. Affects any project on a pre-cutoff `compatibility_date` whose DOs schedule alarms (which includes Think's `_chatRecoveryContinue`).
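The mismatch matters because of how caret ranges behave below 1.0.0: for a `0.x` base with a non-zero minor, `^0.5.5` means `>=0.5.5 <0.6.0`. The sketch below hand-rolls that single case for illustration; it is not the npm `semver` package, and `satisfiesCaretZeroMajor` is a hypothetical helper that ignores pre-release tags.

```typescript
function parseVersion(version: string): [number, number, number] {
  const [major = 0, minor = 0, patch = 0] = version.split(".").map(Number);
  return [major, minor, patch];
}

// Caret semantics for bases of the form 0.y.z with y > 0:
// the minor is pinned and the patch must be at least the base patch.
function satisfiesCaretZeroMajor(version: string, base: string): boolean {
  const [vMajor, vMinor, vPatch] = parseVersion(version);
  const [bMajor, bMinor, bPatch] = parseVersion(base);
  return vMajor === 0 && bMajor === 0 && vMinor === bMinor && vPatch >= bPatch;
}

const oldRelease = satisfiesCaretZeroMajor("0.5.4", "0.5.5"); // below the floor
const exactRelease = satisfiesCaretZeroMajor("0.5.5", "0.5.5");
const nextMinor = satisfiesCaretZeroMajor("0.6.0", "0.5.5"); // outside the range
```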

@threepointone threepointone merged commit 03620a6 into main Apr 28, 2026
3 checks passed
@threepointone threepointone deleted the agents-as-tools branch April 28, 2026 19:33
@github-actions github-actions Bot mentioned this pull request Apr 28, 2026

Development

Successfully merging this pull request may close these issues.

Alarms on fresh 0.5.x DOs with pre-2026-03-15 compat date fail to recover this.name
