update (most) dependencies #390
Merged
threepointone merged 1 commit into main on Aug 4, 2025
🦋 Changeset detected
Latest commit: 70d93cc
The changes in this PR will be included in the next version bump. This PR includes changesets to release 2 packages.
threepointone added a commit that referenced this pull request on Apr 28, 2026
…-recovery gap

`partyserver` 0.5.4 fixes the bug filed at [`cloudflare/partykit#390`](cloudflare/partykit#390): fresh 0.5.x DOs with `compatibility_date` older than 2026-03-15 would lose `this.name` on alarm wake (no `ctx.id.name` propagation in old runtimes, and 0.5.x had stopped writing the `__ps_name` legacy fallback record). The fix is a defensive one-time `__ps_name` write on first fetch — idempotent; restores the safety net pre-0.5.x had.

Surfaced in this branch during e2e suite development. Worked around at the time by giving each test its own Assistant DO via a `?user=<id>` query-param override, so each test's DO was fresh and never hit the alarm path with stale state.

With 0.5.4, the workaround is no longer needed for the bug. The per-test unique-user pattern stays purely for test isolation (no helper-row / chat-history state leaks across tests).

Verification:

- `npm ls partyserver` shows 0.5.4 deduped across `agents`, `examples/voice-input`, and the root.
- 43 vitest tests still pass.
- 7 Playwright e2e tests still pass — output no longer contains the partyserver "Attempting to read .name… `this.ctx.id.name` is not set" error that fired in the background of every previous run.

Updates to docs:

- `wip/inline-sub-agent-events.md` — handoff snapshot's "Open framework gap" reframed to "surfaced and fixed"; next-step candidates list drops the framework facet-alarm fix (it landed); two Stage 2 entries reframed; related-issues paragraph marks #390 as fixed; e2e helpers comment updated.
- `examples/agents-as-tools/README.md` — e2e tests blurb no longer claims the unique-user pattern works around a framework gap; 0.5.4 link added as a parenthetical.
- `examples/agents-as-tools/e2e/helpers.ts` — `uniqueUser` doc-comment updated.

What's still open at the framework level (not addressed by 0.5.4):

- workerd doesn't yet support independent facet alarms. The helper-side `keepAliveWhile` wrapper is a soft no-op on facets for that reason. Documented in the wip doc's "Hibernation / fibers gaps" section; not in scope for this PR.

Made-with: Cursor
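The "defensive one-time `__ps_name` write" described in this commit can be pictured with a minimal sketch. This is not partyserver's code: `KeyValueStore`, `MemoryStore`, `persistNameOnce`, and `resolveName` are hypothetical names, and the store is a synchronous in-memory stand-in for Durable Object storage. Only the `__ps_name` key comes from the commit message.

```typescript
// Hypothetical synchronous key-value stand-in for a Durable Object's storage.
interface KeyValueStore {
  get(key: string): string | undefined;
  put(key: string, value: string): void;
}

class MemoryStore implements KeyValueStore {
  writes = 0; // counts puts so idempotence is observable
  private data = new Map<string, string>();
  get(key: string): string | undefined {
    return this.data.get(key);
  }
  put(key: string, value: string): void {
    this.writes += 1;
    this.data.set(key, value);
  }
}

// Key name taken from the commit message above.
const LEGACY_NAME_KEY = "__ps_name";

// Called on every fetch: writes the name the first time, does nothing afterwards.
function persistNameOnce(store: KeyValueStore, name: string): boolean {
  if (store.get(LEGACY_NAME_KEY) !== undefined) return false;
  store.put(LEGACY_NAME_KEY, name);
  return true;
}

// On alarm wake: prefer the runtime-provided name, fall back to the stored record.
function resolveName(
  store: KeyValueStore,
  runtimeName: string | undefined
): string | undefined {
  return runtimeName ?? store.get(LEGACY_NAME_KEY);
}

const demoStore = new MemoryStore();
persistNameOnce(demoStore, "user-123"); // first fetch writes the record
persistNameOnce(demoStore, "user-123"); // later fetches are no-ops
console.log(resolveName(demoStore, undefined), demoStore.writes);
```

The point of the guard is that an old runtime, which passes no name on alarm wake, still finds the record written during the first fetch.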
threepointone added a commit that referenced this pull request on Apr 28, 2026
…notes (#1405)

* feat(agents): add messageType ctor option to ResumableStream

Lets non-chat consumers (e.g. helper sub-agents that share a WebSocket with a parent's chat) stamp replay frames with a distinct wire-type tag without colliding with the chat protocol. Defaults to CHAT_MESSAGE_TYPES.USE_CHAT_RESPONSE so existing AIChatAgent and Think callers preserve byte-identical behavior — all 28 existing resumable-streaming.test.ts regression tests pass unchanged.

Also exports the new ResumableStreamOptions type from agents/chat.

This is the smaller version of the fix proposed in #1377: the tablePrefix part of that proposal is intentionally omitted because the recommended pattern for "events alongside a chat" puts helper events on the helper sub-agent's own DO (so collisions are impossible by isolation). See wip/inline-sub-agent-events.md for the full design.

Refs: #1377
Made-with: Cursor

* feat(example): agents-as-tools — helpers as sub-agents with parent-forwarded events

A focused minimal proof of the agents-as-tools pattern: during a single chat turn, the assistant (a Think agent) dispatches a helper sub-agent (Researcher) to do multi-step work, and the helper's lifecycle events stream live into the chat UI.

Architecture (per the 2026-04-27 design pivot in wip/inline-sub-agent-events.md):

- Researcher is a real sub-agent with its own DO and SQLite. It owns its own ResumableStream configured with messageType: "helper-event" so its replay frames don't collide with the chat protocol.
- Researcher.startAndStream(query, helperId) returns a ReadableStream<{ sequence, body }> over DO RPC. Each emitted event is durably stored on the helper's own SQLite before being written to the stream.
- Assistant's tool execute reads from the helper's stream and broadcasts each event onto the chat WS via this.broadcast(...). Browser keeps one WS to the parent — no second connection.
- Assistant maintains a tiny active_helpers table for reconnect replay: on onConnect, it walks the table, fetches each in-flight helper's stored events via DO RPC, and forwards them as replay: true frames to the connecting client.
- Assistant.onStart sweeps stale active_helpers rows from a previous parent crash and deletes the leaked helper facets.

Multi-ai-chat compatible because helpers are sub-agents of whichever DO terminates the WebSocket — top-level Assistant in this demo, or a Chat (which is itself a sub-agent of Inbox) in multi-ai-chat. The forwarding pattern is identical at any nesting depth.

Drill-in to a specific helper (a separate `useAgent({ sub: [...] })` connection direct to the helper) is supported structurally by the routing primitive but not wired in the example UI yet.

Wire protocol (src/protocol.ts) is shared between server and client, zero Worker-runtime imports — the front-end bundle never transitively pulls in agents/Think/workers-ai-provider through a stray value import. The protocol carries six event kinds (started / step / tool-call / tool-result / finished / error) plus a sequence field for client-side dedup against the reconnect-window race where one event can arrive both as a replay frame and as a live broadcast.

The client renders helper events inline under the matching tool part in the assistant message, dedupes by (parentToolCallId, sequence) Set semantics, and inserts in sorted position so the timeline always renders in helper-emit order regardless of wire arrival order.

Adds a one-line cross-reference from examples/ai-chat/README.md.

Refs: #1377
Made-with: Cursor

* chore(wip): inline sub-agent events design + 2026-04-27 pivot

Captures the design context behind agents-as-tools and the messageType ctor option on ResumableStream.
Frames issue #1377 as the visible tip of a missing first-class pattern (helpers running inside a single chat turn with their lifecycle events streaming into the chat UI), walks through the original "multi-channel ResumableStream on the parent DO" design, and records the 2026-04-27 pivot to "helpers as sub-agents with parent-forwarded events."

The pivot rationale, condensed:

- State containment. A helper's events belong to the helper's work, not the chat's. Putting them on the helper's own DO is the honest representation. Persistent / inspectable / drill-in-able helpers follow naturally; they don't with parent-side storage.
- Reuses the routing primitive instead of inventing. Drill-in, addressing, lifecycle, parent/child RPC are all already shipped.
- Smaller framework change. One ctor option (messageType) instead of multi-channel schema + per-channel state machine. Tables don't need to be parameterized — each DO has its own SQLite, so collisions are impossible by isolation.
- Multi-ai-chat-compatible. The forwarding pattern works recursively; "the parent" is whichever DO terminates the WS.

The doc preserves the original multi-channel design as a record of the design space that was explored. Multi-channel ResumableStream is parked, not killed — if a future use case needs one DO to multiplex N independent durable streams to its own clients (e.g. a workflow DO with parallel tracks), the design is captured and ready.

Stages 1 and 2 land alongside this doc:

- Stage 1: messageType ctor option on ResumableStream (back-compat patch to agents).
- Stage 2: examples/agents-as-tools — focused minimal proof of helpers-as-sub-agents.

Open questions around event vocabulary (Ring 2), parallel helpers (empirically should work, v0.2 stress-test follow-up), persistent lifetime (Ring 5), and the AIChatAgent port (Ring 6) are explicit in the doc as the things the example needs to validate before Stage 3's RFC commits to a public API.
Made-with: Cursor

* fix(example): make agents-as-tools helper streaming production-shaped

Tighten the agents-as-tools prototype based on runtime testing and review feedback.

Key runtime fix: helper event streams now use byte chunks over DO RPC. `Researcher.startAndStream` returns `ReadableStream<Uint8Array>` where each chunk is NDJSON (`{ sequence, body }`) because workerd's DO RPC stream bridge only transports byte streams. Object chunks caused the parent's first `reader.read()` to fail with the opaque "Network connection lost" error. The parent now decodes bytes, splits on newlines, and forwards each parsed helper event frame.

Also improves helper error semantics: helper-side failures are still emitted as inline `error` events, but the parent now turns them into real tool failures instead of returning a successful empty summary. If no `finished` event produces a summary, the tool fails explicitly.

UI polish:

- add a Clear button wired to `clearHistory()` and local helper-event cleanup
- reset helper-event state when messages are cleared from another tab
- replace the composer textarea with a single-line Input so Enter submits normally
- render text and reasoning parts with Streamdown + @streamdown/code, using the standard Kumo theme bridge
- clean Tailwind class warnings in JSON pre blocks

Operational cleanup:

- bump the example compat date to 2026-04-15 so partyserver 0.5.x can rely on `ctx.id.name` inside alarm handlers
- update README and WIP notes to describe the byte-stream design, the Streamdown UI, and the follow-up issues filed from the prototype
- remove the duplicate DEMO_USER export from server.ts so protocol.ts stays the single runtime-safe shared module

Issues filed while debugging:

- cloudflare/partykit#390 for fresh partyserver 0.5.x DOs + old compat dates
- cloudflare/workerd#6675 for object ReadableStreams failing with "Network connection lost"
- cloudflare/agents#1399 for discussing Rpc.Stub<T>-narrowed sub-agent types

Made-with: Cursor

* feat(example): retain helper runs for hibernation replay

Move the agents-as-tools helper timeline registry from a short-lived active_helpers table to a durable cf_agent_helper_runs table. Helper runs now track running/completed/error/interrupted status and retain completed helper facets so timelines can replay after refresh even after the assistant turn has completed.

Runtime behavior:

- insert running helper run before reading the helper event stream
- mark completed/error when the helper finishes
- keep the helper DO after completion because it owns the durable ResumableStream event log
- on parent wake, mark any stale running rows as interrupted instead of deleting helper facets
- on reconnect, replay stored events for all helper runs and append a synthetic terminal error for interrupted runs (or malformed error runs with no terminal event)
- Clear now calls clearHelperRuns before clearHistory, deleting retained helper facets before the chat-clear broadcast to avoid a stale replay race in other tabs

Also updates README and the WIP plan to reflect the hibernation state: active-helper refresh, completed-helper refresh, and interrupted replay now work; helper-side keepAliveWhile, helper fibers, live-tail reattachment, TTL/count GC, and a hibernation test matrix remain open follow-ups.

Made-with: Cursor

* chore(example): wrap helper stream body in keepAliveWhile

Wrap the Researcher helper's ReadableStream body in keepAliveWhile so helper live execution has the same Agents-level lifecycle shape as a main Think chat turn.

Today keepAlive() is a soft no-op on facets because workerd does not yet support independent facet alarms, so this does not change crash recovery semantics: active RPC stream / Promise chain liveness still carries the run, and parent wake still marks running helper rows as interrupted.
Keeping the wrapper is intentional future-proofing for the moment alarms start working inside facets, and keeps the helper code in the shape the eventual framework helper should generate.

Update README and WIP notes to make that nuance explicit.

Made-with: Cursor

* test(example): vitest+workers harness for agents-as-tools

Adds 25 tests across four files modeled on examples/assistant/src/tests:

- registry.test.ts: cf_agent_helper_runs schema, the running → interrupted onStart sweep, and that the sweep is idempotent and leaves completed/error/interrupted rows untouched.
- clear-helper-runs.test.ts: empty-registry no-op, mixed-status cleanup of both rows and helper sub-agents, idempotent re-clear, and the production "missing sub-agent" best-effort path.
- helper-stream.test.ts: drives Researcher.startAndStream end-to-end through subAgent and pins down the byte-stream contract — NDJSON envelope, sequence monotonic from 0, started-first ordering, durable storage round-trip, and the synthesize error path that fires when env.AI is unbound.
- reconnect-replay.test.ts: every branch of Assistant.onConnect replay — empty, completed, running, error-with-stored-terminal, error-without-terminal (synthetic appended), interrupted (with and without stored events), and multiple runs replayed in started_at order with per-run sequence preserved.

Test worker subclasses Assistant and Researcher with a focused seed/inspect surface (testSeedHelperRun, testReadHelperRuns, testRunHelperToCompletion, Researcher.testWriteEvents) so tests can construct lifecycle states without a Workers AI binding. Mirrors the pattern in packages/ai-chat/src/tests/worker.ts; production code stays untouched modulo a single private → protected on Researcher._stream / .stream so the test subclass can write into the helper's own ResumableStream the same way startAndStream does.
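The byte-stream contract that helper-stream.test.ts pins down (NDJSON envelope, one `{ sequence, body }` frame per line) can be illustrated with a small standalone sketch. The `HelperFrame` type, `encodeFrame`, and `NdjsonDecoder` are hypothetical names for illustration, not the example's actual code; the sketch only assumes the standard `TextEncoder` / `TextDecoder` globals.

```typescript
// Hypothetical frame shape, mirroring the `{ sequence, body }` envelope.
interface HelperFrame {
  sequence: number;
  body: string;
}

// Helper side: one JSON object per line, encoded as bytes.
function encodeFrame(frame: HelperFrame): Uint8Array {
  return new TextEncoder().encode(JSON.stringify(frame) + "\n");
}

// Parent side: byte chunks can split a frame anywhere, so buffer until a newline.
class NdjsonDecoder {
  private decoder = new TextDecoder();
  private buffer = "";

  push(chunk: Uint8Array): HelperFrame[] {
    // `stream: true` keeps multi-byte characters intact across chunk boundaries.
    this.buffer += this.decoder.decode(chunk, { stream: true });
    const lines = this.buffer.split("\n");
    // The last element is an incomplete line (or "") and stays buffered.
    this.buffer = lines.pop() ?? "";
    return lines
      .filter((line) => line.length > 0)
      .map((line) => JSON.parse(line) as HelperFrame);
  }
}

// Two frames concatenated, then re-split at an awkward byte offset.
const first = encodeFrame({ sequence: 0, body: '{"kind":"started"}' });
const second = encodeFrame({ sequence: 1, body: '{"kind":"finished"}' });
const wire = new Uint8Array(first.length + second.length);
wire.set(first, 0);
wire.set(second, first.length);

const ndjson = new NdjsonDecoder();
const decodedFrames = [
  ...ndjson.push(wire.slice(0, 10)),
  ...ndjson.push(wire.slice(10))
];
console.log(decodedFrames.map((frame) => frame.sequence));
```

Because `JSON.stringify` escapes newlines inside string values, a literal `\n` byte can only ever be a frame separator, which is what makes the split-on-newline step safe.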
Made-with: Cursor

* chore(wip): inline sub-agent events — capture 2026-04-28 decisions

After comparing v0.1 against the screenshots in cloudflare/agents#1377 (comment 4328296343), pin down what's done vs. what's still needed for the workload the OP is actually shipping.

Refreshes three sections of the WIP doc:

- New "Coverage gap vs. issue #1377's actual workload" subsection with a status matrix mapping each piece of his use case to its implementation status and where to find it.
- New "Decisions confirmed 2026-04-28" subsection: helpers must run their own inference loop (Think-first; ai-chat later); parallel fan-out is in scope and orchestrator-driven; tablePrefix is not shipping (the pivot is the answer); per-helper drill-in is in scope; first-class framework integration (helperTool) deferred until the protocol is validated against multi-turn and parallel cases.
- Status block updated: vitest harness landed (25 tests, 4 files), "Hibernation test matrix" bullet trimmed to the work that's actually still missing (real eviction cycles), next-steps rewritten to reflect the new ordering — multi-turn Think helper, parallel fan-out, then drill-in detail UI — and an explicit "not in this near-term list" call-out for tablePrefix / helperTool / AIChatAgent port.

Made-with: Cursor

* feat(example): agents-as-tools — Researcher is now a multi-turn Think helper

Closes the "multi-turn helpers" gap from issue #1377's actual workload. Implements "Option B" from wip/inline-sub-agent-events.md.

Server side:

- `Researcher` now `extends Think<Env>` with its own getModel, getSystemPrompt, getTools (one simulated `web_search`). Helper runs are real Think turns driven by `saveMessages`. Think's own `_resumableStream` is the canonical durable event log — there is no second `ResumableStream` on the helper, so the same-DO collision the original #1377 was about cannot occur.
- Forwarder is wired by overriding `broadcast`: while a `runTurnAndStream` is in flight, MSG_CHAT_RESPONSE chunks are tee'd into the active RPC stream. Other broadcasts (state, identity, MSG_CHAT_MESSAGES, direct WS clients of the helper for drill-in) pass through untouched.
- Wire vocabulary collapsed from six kinds to four: `started` / `chunk` / `finished` / `error`. Lifecycle events are synthesized by the parent from `cf_agent_helper_runs` row data; `chunk` carries an opaque JSON-encoded UIMessageChunk forwarded verbatim from the helper's `_streamResult`.
- Schema gained `helper_type`, `query`, `summary`, `error_message` columns so the parent can synthesize lifecycle events on `onConnect` replay without RPCing the helper for metadata.
- `getChatChunksForReplay()` and `getFinalAssistantText()` expose Think's own stored chunks and the synthesized summary for the parent's reconnect-replay and tool-output paths.

Client side:

- Per-helper UIMessage accumulator using `applyChunkToParts` from `agents/chat` — the same primitive `useAgentChat` uses for the assistant's main message. The helper panel now renders text, reasoning blocks, and internal tool calls as a mini-chat, matching the shape of GLips's screenshots in cloudflare/agents#1377-comment-4328296343.
- `seenSequencesRef` Set handles the small reconnect-window race where a chunk arrives both as a replay frame and as a live broadcast (applyChunkToParts mutates the parts array, so dedup has to happen before application).

Tests:

- `TestResearcher` overrides `getModel()` with a deterministic mock LanguageModel V3, so the helper's Think inference loop runs end-to-end inside the harness with no Workers AI binding.
- Reconnect-replay tests seed pre-built UIMessageChunk bodies via `testWriteChunks(chunks, status)`, which writes through Think's own `_resumableStream` exactly the way production does.
- Coverage updated for the new four-kind vocabulary and new schema columns. 26 tests across 4 files, all green.
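As a rough illustration of the four-kind wire vocabulary described above, the events could be modeled as a discriminated union with a status reducer on the client. The field names (`helperType`, `query`, `summary`, `error`) and the `HelperStatus` values are assumptions chosen to echo the schema columns in this commit; the real definitions live in src/protocol.ts and may differ.

```typescript
// Hypothetical typing of the four event kinds: started / chunk / finished / error.
type HelperEvent =
  | { kind: "started"; helperType: string; query: string }
  | { kind: "chunk"; body: string } // opaque JSON-encoded UIMessageChunk
  | { kind: "finished"; summary: string }
  | { kind: "error"; error: string };

type HelperStatus = "idle" | "running" | "done" | "error";

// Lifecycle events move the status; chunks only carry content.
function reduceStatus(status: HelperStatus, event: HelperEvent): HelperStatus {
  switch (event.kind) {
    case "started":
      return "running";
    case "chunk":
      return status;
    case "finished":
      return "done";
    case "error":
      return "error";
    default: {
      // Compile-time exhaustiveness check: adding a fifth kind fails here.
      const unreachable: never = event;
      return unreachable;
    }
  }
}

const sampleEvents: HelperEvent[] = [
  { kind: "started", helperType: "Researcher", query: "compare A and B" },
  { kind: "chunk", body: '{"type":"text-delta"}' },
  { kind: "finished", summary: "A is faster than B." }
];

const initialStatus: HelperStatus = "idle";
const finalStatus = sampleEvents.reduce(reduceStatus, initialStatus);
console.log(finalStatus);
```

Keeping `chunk` opaque at this layer matches the commit's description: the parent forwards the body verbatim and leaves interpretation to the client's chunk accumulator.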
Documentation: README + wip doc reflect Option B landing, the new wire vocabulary, and the next-steps reordering (parallel fan-out is now item 1, drill-in detail UI item 2).

Made-with: Cursor

* fix(example): address Option B review findings

Followups to the multi-turn Think helper review. Fixes 8 of the 9 items called out (B3 schema migration deferred — still a prototype).

Server changes:

- **B1 + B2 (error surfacing).** `Researcher.broadcast` override now detects `parsed.error === true` chat-response frames (which Think's `_streamResult` broadcasts on inference errors with the error string as the body, not a `UIMessageChunk`). Those frames are stashed into `_lastStreamError` instead of being mis-forwarded through the chunk pipeline (where `applyChunkToParts` would silently drop them on the client). The parent's `runResearchHelper` now reads `helper.getLastStreamError()` when no summary is produced and surfaces the actual error message instead of the generic "Researcher finished without producing assistant text" fallback.
- **H1 (drill-in safety).** Replaced `getFinalAssistantText` with `getFinalTurnText`, which identifies the assistant message produced by THIS turn by diffing message ids against a snapshot taken at the start of the turn — robust against drill-in clients appending their own turns before the parent reads the summary.
- **H2 (concurrent-call guard).** `_runInProgress` boolean is set sync at entry, cleared in finally/cancel. Prevents two concurrent `runTurnAndStream` calls on the same helper from overwriting each other's `_activeForwarder` / `_activeRequestId` state.
- **H4 (chatRecovery off).** `Researcher` now declares `override chatRecovery = false`. Helpers are per-turn workers; Think's default chat-recovery fiber would silently re-run the inference loop into a parent that's no longer listening on every helper hibernate-and-wake cycle.
- **B4 (abort propagation).** Capture the helper's `requestId` from `saveMessages`'s return value into `_activeRequestId`. The ReadableStream's `cancel` callback now calls `abortCurrentTurn`, which dispatches into Think's `_aborts` registry to actually cancel the in-flight inference loop. No more burning Workers AI on output the parent already abandoned.
- **S1 (drop redundant keepAliveWhile).** `saveMessages` already wraps its body in `keepAliveWhile`; the outer wrap was redundant.
- **S2 (orphan stream cleanup).** `getChatChunksForReplay` detects a stream still marked `streaming` whose live LLM reader is gone (orphaned by hibernation) and finalizes the metadata before returning chunks. Prevents per-helper streaming-row leaks that would otherwise wait 24h for `_maybeCleanupOldStreams` to GC.

Tests added:

- **B2.** Drives a turn against a mock model whose `doStream` throws synchronously; asserts `getLastStreamError` returns the actual error message and `getFinalTurnText` returns null.
- **H1.** `getFinalTurnText` returns null on a helper that never ran a turn, and returns THIS turn's text after a successful run.

The H2 concurrent-call guard is verified by code review (the in-test seam approach lit up an unhandled-rejection trail through workerd's JSRPC bridge that doesn't affect correctness but lights up vitest's detector; documented inline alongside where the test would have lived).

Made-with: Cursor

* chore(wip): inline sub-agent events — capture Option B review fixes

Append a "Stage 2 (Option B review fixes)" subsection to the Status block summarizing what landed in 8357dee7: B1+B2 error surfacing, B4 abort propagation, H1 final-turn text by snapshot/diff, H2 concurrent-call guard, H4 chatRecovery=false, S1 drop redundant keepAliveWhile, S2 orphan stream cleanup. B3 schema migration explicitly deferred. Bump test count from 26 to 29 and update the stale getFinalAssistantText reference to getFinalTurnText.
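The "final-turn text by snapshot/diff" idea (H1) can be sketched in a few lines. This is an illustration under stated assumptions, not the example's `getFinalTurnText`: `ChatMessage`, `snapshotIds`, and `finalTurnText` are hypothetical, and the sketch assumes each turn produces a single assistant message, so the earliest assistant message missing from the snapshot belongs to this turn.

```typescript
// Hypothetical, simplified message shape.
interface ChatMessage {
  id: string;
  role: "user" | "assistant";
  text: string;
}

// Taken at the start of the turn, before any new messages exist.
function snapshotIds(messages: readonly ChatMessage[]): Set<string> {
  return new Set(messages.map((message) => message.id));
}

// Returns the text of the first assistant message that was not in the snapshot,
// or null if the turn produced none. Later messages (for example a drill-in
// follow-up appended afterwards) are ignored because only the earliest counts.
function finalTurnText(
  before: ReadonlySet<string>,
  after: readonly ChatMessage[]
): string | null {
  const produced = after.find(
    (message) => message.role === "assistant" && !before.has(message.id)
  );
  return produced ? produced.text : null;
}

const history: ChatMessage[] = [
  { id: "u1", role: "user", text: "earlier question" },
  { id: "a1", role: "assistant", text: "earlier answer" }
];
const before = snapshotIds(history);

const afterTurn: ChatMessage[] = [
  ...history,
  { id: "u2", role: "user", text: "parent's query" },
  { id: "a2", role: "assistant", text: "this turn's answer" },
  { id: "u3", role: "user", text: "drill-in follow-up" },
  { id: "a3", role: "assistant", text: "follow-up answer" }
];

console.log(finalTurnText(before, afterTurn));
```

A "latest assistant message" lookup would return the follow-up answer here, which is the drift the H1 fix is described as avoiding.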
Made-with: Cursor

* feat(example): parallel helper fan-out — `compare` tool + per-helper demux

Closes the "parallel fan-out" gap from cloudflare/agents#1377's actual workload (image 3 of comment 4328296343 shows several sibling subagents under one parent tool call). Two fan-out shapes are now wired and tested:

- **Alpha (LLM-driven).** The orchestrator LLM calls `research` multiple times in one turn (AI SDK `parallel_tool_calls` default). Each helper has its own `parentToolCallId` and renders as one panel under one chat tool part.
- **Beta (programmer-driven).** New `compare(a, b)` tool whose `execute` dispatches both helpers via `Promise.all`. Both share the chat tool call's `parentToolCallId`; the wire format distinguishes them by the `helperId` carried inside each event. Renders as two sibling `<HelperPanel>`s under one `<ToolPart>` — the visible GLips pattern.

Server changes:

- Add `compare(a, b)` tool that dispatches two `runResearchHelper` calls in parallel and returns both summaries. Adjusted system prompt to nudge the LLM toward `compare` for compare/contrast queries.
- Test seam: `Assistant.testRunResearchHelper(query, parentToolCallId)` on the test subclass so tests can drive concurrent helpers without going through the LLM.

Client changes:

- State shape: `Record<parentToolCallId, Record<helperId, HelperState>>`. Single-helper tool calls show one panel; fan-out tool calls show several stacked siblings.
- Dedup key extended from `(parentToolCallId, sequence)` to `(parentToolCallId, helperId, sequence)` because two parallel helpers under one tool call both legitimately emit a `sequence: 0` started event.
- `<ToolPart>` accepts `helperStates: HelperState[]` and renders them in arrival order (insertion-order-preserving `Object.values` on the bucket).
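The extended dedup key from the client changes above can be sketched without committing to any particular string encoding by nesting maps. `SeenSequences` and `markSeen` are hypothetical names for illustration; the example's own client keeps its state in a React ref and may be shaped differently.

```typescript
// Tracks which (parentToolCallId, helperId, sequence) triples were already applied.
class SeenSequences {
  private seen = new Map<string, Map<string, Set<number>>>();

  // Returns true the first time a triple is seen, false for a duplicate.
  markSeen(parentToolCallId: string, helperId: string, sequence: number): boolean {
    let byHelper = this.seen.get(parentToolCallId);
    if (!byHelper) {
      byHelper = new Map<string, Set<number>>();
      this.seen.set(parentToolCallId, byHelper);
    }
    let sequences = byHelper.get(helperId);
    if (!sequences) {
      sequences = new Set<number>();
      byHelper.set(helperId, sequences);
    }
    if (sequences.has(sequence)) return false;
    sequences.add(sequence);
    return true;
  }
}

const tracker = new SeenSequences();
// Two helpers under one tool call both emit sequence 0: both must be accepted.
const helperAFirst = tracker.markSeen("tool-1", "helper-a", 0);
const helperBFirst = tracker.markSeen("tool-1", "helper-b", 0);
// The same frame arriving again (replay plus live broadcast) is dropped.
const helperAReplay = tracker.markSeen("tool-1", "helper-a", 0);
console.log(helperAFirst, helperBFirst, helperAReplay);
```

With the older `(parentToolCallId, sequence)` key, the second helper's `sequence: 0` would have been treated as a duplicate of the first helper's.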
Tests (`parallel-fanout.test.ts`, 3 new tests, 32 total):

- Alpha live broadcast — two concurrent `runResearchHelper` calls with different parentToolCallIds; both complete, registry has two distinct rows, frames split cleanly per parentToolCallId with monotonic sequences.
- Beta live broadcast — two concurrent calls sharing parentToolCallId; both complete, frames split per helperId, each helper's sequences run 0/1/2/.../N.
- Beta replay — `onConnect` walks two seeded rows sharing parentToolCallId and emits per-helper frames. Both helpers' `sequence: 0` started events arrive without colliding because the dedup key now includes helperId.

New helper: `startCollectingHelperEvents(ws)` — persistent message accumulator. The existing `collectHelperEvents` lazily attaches a `once`-listener per next-message and would miss broadcasts that fire synchronously inside an awaited `Promise.all` before any test-side await.

Docs: README adds a "Tools" section and updates the test-coverage list. WIP doc marks parallel fan-out as landed in the coverage matrix, adds a Stage 2 entry summarizing what landed, and reorders next-steps so per-helper drill-in detail UI is now item 1.

Made-with: Cursor

* fix(example): parallel fan-out polish — partial-failure UX + deterministic ordering

Five review findings from the post-fan-out walkthrough addressed.

- **B1: `compare` uses `Promise.allSettled`** instead of `Promise.all`. A partial failure (one helper errored, the other succeeded) used to flip the whole tool call to error while the surviving helper's panel still showed "Done" — a confusing mixed signal. The new shape returns `{ a: { query, summary | error }, b: { query, summary | error } }` so the orchestrator LLM can react to "one of two succeeded" honestly. Killing the surviving helper on first failure is left for a future B4-style abort propagation pass.
- **B2: deterministic panel ordering.** `started` event now carries `order: number`; the parent stamps it from a new `displayOrder` parameter on `runResearchHelper` (defaults to 0 for the single-helper `research` tool; `compare` passes 0/1). The client sorts each tool-call's helper bucket by `order` so panels appear left-to-right matching the LLM's input position rather than the random arrival order of `started` broadcasts. Persisted in `cf_agent_helper_runs.display_order` so `onConnect` replay synthesizes the same ordering. Schema bump applied via an idempotent `try { ALTER TABLE ... ADD COLUMN } catch {}` in `onStart`, which doubles as a real (if minimal) migration path for the v0.1 → v0.2 schema gap.
- **N3: bulletproof dedup key.** Client's seen-sequence map is now keyed by `JSON.stringify([parentToolCallId, helperId])` rather than a `${parent}::${helper}` template. Removes the theoretical collision when either id contains `::`.
- **C1: 3-helper Beta test.** New test stresses the parent's broadcast path with three concurrent helpers under one parentToolCallId. All three rows complete; live frames demux per-helper with monotonic sequences each starting at 0.
- **C2: replay-order assertion** added to the existing Beta replay test. Asserts (a) `started` events on replay carry the row's `display_order` as `order`, and (b) `onConnect` replay does NOT interleave per-helper frames — helper-x's last frame arrives before helper-y's first. Pins down the per-row serialization `onConnect` does today.

Tests: 33 (was 32). Both typechecks clean.

Made-with: Cursor

* feat(example): per-helper drill-in detail view

Closes the third v0.2 gap from cloudflare/agents#1377's actual workload (the screenshots showed nested chat-like detail under each subagent). Confirms the "drill-in is real chat, not a custom event view" promise of Option B.
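For reference, the partial-failure result shape that the fan-out polish commit describes for `compare` (finding B1) could be produced along these lines. This is a sketch, not the example's tool code: `Branch`, `toBranch`, and the `run` callback are hypothetical stand-ins for the helper dispatch; only `Promise.allSettled` and the `{ a: { query, summary | error }, b: { ... } }` shape come from the commit message.

```typescript
// One branch of the compare result: either a summary or an error, never both.
type Branch =
  | { query: string; summary: string }
  | { query: string; error: string };

// Maps a settled promise onto the branch shape without throwing.
function toBranch(query: string, result: PromiseSettledResult<string>): Branch {
  if (result.status === "fulfilled") {
    return { query, summary: result.value };
  }
  const reason: unknown = result.reason;
  return { query, error: reason instanceof Error ? reason.message : String(reason) };
}

// `run` is a hypothetical stand-in for dispatching one helper and awaiting its summary.
async function compare(
  a: string,
  b: string,
  run: (query: string) => Promise<string>
): Promise<{ a: Branch; b: Branch }> {
  // allSettled waits for both helpers even if one rejects.
  const [resultA, resultB] = await Promise.allSettled([run(a), run(b)]);
  return { a: toBranch(a, resultA), b: toBranch(b, resultB) };
}

// Usage: one branch fails, the other still reports its summary.
compare("workers", "lambdas", async (query) => {
  if (query === "lambdas") throw new Error("helper failed");
  return `summary of ${query}`;
}).then((result) => console.log(JSON.stringify(result)));
```

With `Promise.all`, the rejection of the second helper would have rejected the whole tool call and discarded the first helper's summary.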
Each helper panel grew a small ↗ button; clicking it opens a side panel that runs `useAgentChat` directly against the helper's sub-agent URL:

    useAgent({ agent: "Assistant", name: DEMO_USER, sub: [{ agent: "Researcher", name: helperId }] })

The framework's `subAgent` routing primitive does all the work — no parent intervention, no cross-DO state, just a normal chat hook against a sub-agent. The side panel renders messages with the same `<MessageParts>` component the main chat uses; sending a follow-up message in the panel triggers a real Think turn on the helper with the parent's original query already in context.

Client changes:

- New `<DrillInPanel>` component: full-height side overlay, backdrop / Escape / ✕ to close, full `useAgentChat` against the helper sub-agent connection, `<MessageParts>` for rendering, an `<Input>` composer for follow-up turns.
- `<HelperPanel>` accepts an `onDrillIn?: (helperId) => void` callback and renders an ↗ button next to the status badge when set. Threaded through `<MessageParts>` → `<ToolPart>` → `<HelperPanel>`.
- App owns a `drillInHelperId: string | null` state; the panel reads `helperType` and `query` from the existing `helperStateByToolCall` map. Cleared on chat clear (this tab and cross-tab via the messages.length effect).

While a turn is running, both the inline panel (parent's broadcast tee) and the side panel (helper's own chat-protocol broadcasts) update live with the same chunks viewed two ways.

Caveats kept honest in the README and WIP doc:

- `onBeforeSubAgent` is open — any `helperId` will spawn a fresh facet if it doesn't exist. Production should add a `cf_agent_helper_runs` lookup gate.
- Recursive drill-in (helper → its own sub-helper) isn't wired; helpers don't dispatch their own helpers in this example.
- Sending in the drill-in panel goes through `saveMessages`, which reads `_lastClientTools` / `_lastBody`. With no client tools defined this is a no-op leak; documented under H3.
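The first caveat suggests gating drill-in on a `cf_agent_helper_runs` lookup. The core check might look like the sketch below. It deliberately does not use the real `onBeforeSubAgent` signature, which is not shown in this commit; `HelperRunRow`, its `helper_id` field, and `isKnownHelper` are hypothetical (only the `helper_type` column name appears in the commit messages).

```typescript
// Hypothetical projection of a cf_agent_helper_runs row.
interface HelperRunRow {
  helper_id: string; // assumed column name
  helper_type: string; // column name from the commit messages
}

// Allow a drill-in connection only if the parent actually dispatched this helper
// under this class. Unknown ids are rejected instead of spawning a fresh facet.
function isKnownHelper(
  rows: readonly HelperRunRow[],
  helperType: string,
  helperId: string
): boolean {
  return rows.some(
    (row) => row.helper_id === helperId && row.helper_type === helperType
  );
}

const registeredRuns: HelperRunRow[] = [
  { helper_id: "h-1", helper_type: "Researcher" }
];

console.log(isKnownHelper(registeredRuns, "Researcher", "h-1"));
console.log(isKnownHelper(registeredRuns, "Researcher", "made-up-id"));
```

Matching on both id and class means a request with a valid id but the wrong class is also rejected rather than routed to an empty facet.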
WIP doc: marks drill-in done in the coverage matrix, adds a Stage 2 entry summarizing the implementation, and updates the next-steps list. The example-side roadmap (multi-turn, parallel fan-out, drill-in) is now complete; the next motions are a second helper class to validate the protocol against non-research workloads, `helperTool(Cls)` framework promotion, and the Stage 3 RFC.

Tests still pass (33/33); drill-in is purely a client UI change.

Made-with: Cursor

* fix(example): drill-in review polish — replay-isolation + UX consistency

Four review findings from the post-drill-in walkthrough addressed.

- **D1: replay reads back THIS turn's chunks, not "latest".** Previously, after a drill-in user fired a follow-up turn through the side panel's composer, the helper had a NEW stream stored. On parent reconnect, `getChatChunksForReplay()` picked the most recent stream by `created_at` and the inline panel rebuilt from the follow-up turn — even though the parent's tool output and `summary` row column reflected the original turn. The two views drifted on every refresh.

  Fix: capture the helper's stream id after `saveMessages` resolves (`_lastTurnStreamId`, exposed via `getLastTurnStreamId`), stash it in `cf_agent_helper_runs.stream_id`, and have `getChatChunksForReplay(streamId?)` accept an explicit id. `onConnect` reads `row.stream_id` and passes it through. Schema bump applied via the same idempotent `try { ALTER TABLE … ADD COLUMN } catch {}` pattern as `display_order`.

  Regression test in `reconnect-replay.test.ts`: seeds turn 1's chunks (capturing the row's stream_id), then writes turn 2's chunks via the new `testWriteAdditionalHelperChunks` seam, asserts replay returns turn 1's body and not turn 2's.
- **D2: `<DrillInPanel>` keyed by `helperId`.** Switching from one helper's drill-in to another now fully unmounts/remounts the panel — tears down the previous `useAgent` WebSocket cleanly, resets the composer's input state, and avoids any prop-vs-hook-arg drift. One-line fix.
- **N1: status badge in drill-in header.** Mirrors the inline panel's Running / Done / Error badge so the side panel's header feels consistent with the panel the user just clicked through from. Prop named `helperStatus` (not `status`) to avoid colliding with `useAgentChat`'s own `status` symbol.
- **N2: system-prompt nudge for `compare` partial failure.** Added one line to `Assistant.getSystemPrompt`: "If a `compare` result includes an `error` field for one branch, acknowledge the gap and synthesize from the successful branch only." The structured `Promise.allSettled` shape from the previous polish pass already gives the LLM the data; this nudge tells it what to do with it.

WIP doc: appended a "drill-in review polish" Stage 2 entry summarizing what landed AND what we explicitly punted on (N3 side-panel parent-turn indicator; E1 stream-metadata growth; E2 concurrent drill-in tabs; E3 STREAM_RESUMING smoke-test; E4 open `onBeforeSubAgent` gate; focus trap / aria-modal; drill-in tests; duplicated `query` in `compare` output) so we have a paper trail for the choices.

Tests: 34 (was 33); D1 regression test added.

Made-with: Cursor

* feat(example): second helper class — `Planner` + extracted `HelperAgent` base

Closes the "is the helper-event vocabulary right?" gap from Ring 2 of the design notes. With a second helper class running the same protocol against a meaningfully different workload, the answer is yes — the chunk firehose generalizes without any vocabulary changes.

Server changes:

- **Extracted `HelperAgent extends Think<Env>` as the shared base.** All helper-protocol bits — the `broadcast` tee, `runTurnAndStream`, `chatRecovery = false`, the concurrent-call guard, and every lifecycle accessor (`getChatChunksForReplay`, `getLastTurnStreamId`, `getFinalTurnText`, `getLastStreamError`) — live there. Concrete helpers stay thin: pick a model, a system prompt, and a tool surface.
- **`Researcher` slimmed down to extends `HelperAgent`** with just its three overrides. Behavior unchanged. - **New `Planner extends HelperAgent`** that produces structured implementation plans (Overview / Affected files / Step-by-step / Open questions) with a single simulated `inspect_file` tool. Different system prompt, different tool surface, same protocol. - **`Assistant._runHelperTurn(cls, query, parentToolCallId, displayOrder?)`** generalizes the previous `runResearchHelper`. `cls` is typed as `HelperClass = typeof Researcher | typeof Planner`; `cls.name` feeds the row's `helper_type` and `subAgent(cls, ...)` spawns the right facet. - **Class registry `helperClassByType: Record<string, HelperClass>`** used by `onConnect` and `clearHelperRuns` to resolve the row's stored `helper_type` string back to a class. Defensive fallback to `Researcher` for unknown types. - **New `plan(description)` tool** dispatching `Planner` via `_runHelperTurn`. Updated system prompt to nudge the LLM toward `plan` for "how do I implement X" queries. - **Wrangler v2 migration** adds `Planner` to `new_sqlite_classes`; same in `tests/wrangler.jsonc`. Idempotent — existing v1 deployments pick it up additively. Test changes: - `Planner` test subclass with the same mock-model + `testWriteChunks` surface as the `Researcher` test class. Deliberately duplicated rather than mixed in (TypeScript class mixins are gnarlier than two ~30-line classes are noisy). - All seams that used to hardcode `Researcher` now accept an optional `className: "Researcher" | "Planner"` arg, defaulting to `Researcher` so existing tests don't have to thread it. - Two new tests (36 total, was 34): - **Planner end-to-end** in `helper-stream.test.ts` — drives a Planner turn through the byte-stream protocol and verifies the same NDJSON / chunk storage / final-text pipe holds. 
- **Mixed-class clear** in `clear-helper-runs.test.ts` — seeds one Researcher row + one Planner row, verifies `clearHelperRuns` resolves the right facet table for each via the new class registry (would leak a Planner facet if the old hardcoded `Researcher` lookup were still in place). What this validates for Stage 4: - `HelperAgent` IS the shape `helperTool(Cls)` will accept — `Cls extends HelperAgent` is a plausible constraint. - The class-registry pattern is what `helperTool(Cls)` would generate as part of its setup. - `_runHelperTurn` is the ~80-line body that should move into the framework helper. Everything else in `Assistant` (`getTools`, `onStart`, schema migration) stays as consumer-side code. WIP doc: marks the second-helper-class step done in the next-actionable list, adds a Stage 2 entry summarizing the extraction and what it unblocks for Stage 4, updates the test count. Made-with: Cursor * fix(example): drill-in routes by helper's class, not hardcoded "Researcher" Symptom: clicking ↗ on a `Planner` helper panel opens the side panel, which sits on "Connecting to helper…" forever. No errors or warnings — silent failure. Cause: `<DrillInPanel>` hardcoded `agent: "Researcher"` in the `useAgent({ sub: [...] })` call. For a Planner helper, this routed to a Researcher facet with that helper's id. Because `onBeforeSubAgent` is open, the framework auto-spawned a fresh empty Researcher facet, which `useAgentChat` connected to and showed `messages: []` indefinitely. Researcher helpers worked by coincidence — the hardcoded class happened to match the actual class. Fix: pass `helperType` from the helper's state into the `sub.agent` field. The drill-in now routes correctly for any helper class — Researcher to `Researcher`, Planner to `Planner`. The framework's kebab-case URL builder turns the class name into the right path segment (`/sub/researcher/...` or `/sub/planner/...`). 
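The routing fix above can be sketched as follows. This is a minimal illustration, not the framework's URL builder: `toKebabCase`, `drillInPath`, and the `HelperRow` shape are hypothetical names, and the assumption that the helper id is the path segment after the class name is ours (the commit text elides it as `...`).

```typescript
// Hypothetical row shape; only the `helperType` / `helperId` names come from the commit text.
type HelperRow = { helperId: string; helperType: string };

// Illustrative stand-in for the framework's kebab-case URL builder (assumption, not its real code).
function toKebabCase(className: string): string {
  return className.replace(/([a-z0-9])([A-Z])/g, "$1-$2").toLowerCase();
}

// Derive the drill-in path from the helper's own class instead of a hardcoded "Researcher".
// Assumption: the helper id follows the class segment.
function drillInPath(row: HelperRow): string {
  return `/sub/${toKebabCase(row.helperType)}/${encodeURIComponent(row.helperId)}`;
}
```

With the hardcoded class, a Planner row would have produced a `/sub/researcher/` path; deriving the segment from `helperType` keeps the route and the stored class in step.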
README also updated so the documented snippet reflects the dynamic-class pattern, not a hardcoded class name. This was a real consequence of the wip doc's "no drill-in tests" punt — the bug wouldn't have shipped if there were a test that opened drill-in for a non-Researcher helper. Worth keeping in mind when we revisit the React-side test gap. Tests still pass (36/36) since drill-in isn't covered. Made-with: Cursor * fix(example): second-helper-class review follow-ups + doc polish Review fixes for the `Planner` / `HelperAgent` extraction (`02ab6d05`) based on a deep read across that commit and the drill-in routing fix (`e9c0e0ff`): - **M2**: `helperClassByType` is now `as const` and `HelperClass` is derived from `keyof typeof` it. Adding a class is one site (the registry); the type, `_runHelperTurn`'s arg, and the `helperClassFor` lookup all flow from there. The unknown-`helper_type` fallback also `console.warn`s once so drift surfaces early. - **C1**: Planner-specific `onConnect` replay test in `reconnect-replay.test.ts` — seeds a `helperType: "Planner"` row + chunks and asserts replay emits `started` (carrying `helperType: "Planner"`), the seeded `chunk`, and `finished`. Pins the registry lookup so we don't regress to a hardcoded class. - **C2**: `<DrillInPanel>` now validates `helperType` against a `KNOWN_HELPER_TYPES` set before opening `useAgent`. On miss it renders an explicit "Unknown helper class: X" error state instead of the silent "Connecting to helper…" hang the 2026-04-28 routing bug exposed; composer is also disabled in that state. - **N1**: removed `className = "Researcher"` defaults from all test seams (`hasHelper`, `testRunHelperToCompletion`, `testReadStoredHelperChunks`, `testReadHelperFinalText`, `testReadHelperStreamError`, `testSetHelperMockMode`, `testWriteAdditionalHelperChunks`); renamed `testRunResearchHelper` → `testRunHelper(className, query, parentToolCallId, displayOrder?)` with class first to match production. 
Existing tests updated to pass `"Researcher"` explicitly. Closes the footgun where a future Planner test could silently check Researcher's facet table and pass for the wrong reason. - **Wrangler v2 → v1 consolidation**: rolled the v2 entry that added `Planner` to `new_sqlite_classes` into v1 in both example and test `wrangler.jsonc` files. Nothing's deployed; cleaner for first-time deploys. - **M1 reasoned away**: `cls.name` is stable across the esbuild + `@cloudflare/vite-plugin` build because workerd's `ctx.exports` requires top-level class export names to match the wrangler binding strings. If tooling ever did mangle them, migration is a one-shot SQL `UPDATE` on `cf_agent_helper_runs.helper_type`. Documented in the wip doc; not blocking. Doc polish: - README "How to read this code" walkthrough refreshed to mention `HelperAgent` and the class registry. - README "If you want to extend it" rewritten — both prior bullets (parallel fan-out, drill-in) are now shipped features. - README's "Tests" section updated for `TestPlanner` + Planner end-to-end + C1 + D1 + 3-helper Beta stress test. - README's diagram and inline drill-in snippet updated to use `helperType` rather than hardcoded `"Researcher"`. - Renamed `runResearchHelper` → `_runHelperTurn` references in README, server doc-comments, test file headers, and the Stage 1 + earlier sections of `wip/inline-sub-agent-events.md` that described current state. Rename history entries left as-is. - New Stage 2 entry in the wip doc tracking M1 (skipped), M2 / C1 / C2 / N1 (landed), wrangler consolidation, and the polish pass. Tests: 37 (was 36); one new C1 Planner replay test. Made-with: Cursor * feat(example): production-shape polish — B4 cancellation + E4 sub-agent gate Two of the items from the README's "out of scope" table were really "deferred but small" rather than genuinely out-of-scope. Shipping them lets the example be honestly described as production-shaped rather than demo-shaped. 
**B4 cancellation propagation: fully wired.** Helper-side cancel was already in place (the RPC stream's `cancel` callback aborts via Think's `_aborts`). What was missing was the parent-side thread: the AI SDK passes an `abortSignal` on each tool execute's second arg, but the example wasn't reading it. Each tool execute now destructures `{ toolCallId, abortSignal }` and threads the signal into `_runHelperTurn` via a new `opts.abortSignal`. The function registers an `abort` listener that cancels the helper RPC reader; the cancel propagates over JSRPC to the helper's `cancel` callback, which calls `abortCurrentTurn`. The post-loop arm checks `signal.aborted` and surfaces the abort as an error (rather than a silent empty summary), which flows through the existing catch arm: row marked `error`, synthesized `error` event broadcast, panel doesn't sit on "Running…". A `finally` arm detaches the listener so a parent that runs many helpers across many turns doesn't accumulate stale closures on its abort signals. **E4 `onBeforeSubAgent` gate: production-shaped.** `Assistant` now overrides `onBeforeSubAgent` to look up the requested `(helperType, helperId)` in `cf_agent_helper_runs` and return a 404 if the row doesn't exist. Drill-in URLs are no longer guessable; an attacker can't drill into a Researcher facet by routing through the Planner endpoint (the gate's `WHERE` clause is on `(helper_id, helper_type)`, so cross-class access fails closed). Internal `subAgent(...)` calls bypass the hook by design (matches `getAgentByName` bypassing `onBeforeConnect`), so `_runHelperTurn`'s own helper spawn isn't blocked by its own check. **Helper-class-agnostic error message.** The empty-summary fallback used to say "Researcher finished without producing assistant text"; updated to use `${helperType}` so a Planner failure now reads "Planner finished…" rather than impersonating Researcher. 
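The parent-side B4 thread described above (abort listener, post-loop `aborted` check, `finally` detach) can be sketched with standard `AbortSignal` APIs. This is a simplified stand-in, not the example's `_runHelperTurn`: `ChunkReader` and `drainHelperStream` are hypothetical names, and the real code also updates the row and broadcasts an `error` event.

```typescript
// Hypothetical minimal reader interface standing in for the helper's RPC stream reader.
interface ChunkReader {
  read(): Promise<{ done: boolean; value?: string }>;
  cancel(): Promise<void>;
}

async function drainHelperStream(reader: ChunkReader, signal?: AbortSignal): Promise<string> {
  // Cancelling the reader is what propagates the abort to the helper side.
  const onAbort = () => {
    void reader.cancel();
  };
  signal?.addEventListener("abort", onAbort, { once: true });
  try {
    // A pre-aborted signal never fires "abort" again, so handle it explicitly.
    if (signal?.aborted) onAbort();
    let text = "";
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      text += value ?? "";
    }
    // Surface the abort as an error rather than returning a silent empty summary.
    if (signal?.aborted) throw new Error("helper turn aborted by parent");
    return text;
  } finally {
    // Detach so long-lived signals do not accumulate stale closures.
    signal?.removeEventListener("abort", onAbort);
  }
}
```

Throwing from the post-loop check lets a single catch arm in the caller handle aborts and ordinary helper failures the same way.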
**Tests.** New `cancellation-and-gate.test.ts` (6 cases): - B4: pre-aborted signal rejects with an abort error - B4: pre-aborted signal marks the row `error` with abort message - B4: same path works for Planner (class-agnostic) - E4: gate rejects an unseeded helperId with 404 - E4: gate accepts a seeded helperId with 101 WS upgrade - E4: gate rejects cross-class drill-in (seeded as Researcher, drilled-in as Planner → 404) Tests: 43 (was 37, +6 new). **Docs.** README's "out of scope" table is now four rows instead of six. The drill-in section's caveat about an "open gate" is replaced with a note documenting the production posture. New "Cancellation propagation (B4)" paragraph next to "Parent-crash recovery". The wip doc's Stage 2 entry now includes a "production-shape polish" sub-section, the older B4 entry is updated to clarify "helper-side half" landed earlier and the parent-side thread landed today, and the punted E4 item is struck through with a pointer to the new landing entry. Made-with: Cursor * test(example): browser-level e2e suite — Playwright + real Workers AI The 2026-04-28 drill-in routing bug shipped because the existing test harness only covers DO RPC + WebSocket frame paths. Bugs that live in the React layer — `useAgent` URL resolution, drill-in routing, replay-vs-live state reducers — slip through. Adds a Playwright suite at `examples/agents-as-tools/e2e/` that boots `vite dev` and drives the real app in Chromium against the production `ai` binding (`remote: true`). High-fidelity: actual WS frames, actual DO routing, actual LLM tool selection. **Tests (7):** - `smoke.e2e.ts` — page loads, WS handshake completes, composer becomes interactive. - `research-drill-in.e2e.ts` — research prompt spawns Researcher panel; drill-in connects to a Researcher facet and renders messages. - `planner-drill-in.e2e.ts` — same flow for `plan`. Pins the `e9c0e0ff` regression: with the bug, drill-in to a Planner panel hung on "Connecting to helper…". 
- `compare-fanout.e2e.ts` — `compare` prompt renders TWO Researcher panels under one chat tool call. - `refresh-replay.e2e.ts` — completed runs survive a page reload. Single-helper + Researcher+Planner two-helper cases. - `clear.e2e.ts` — Clear wipes both surfaces; reload after Clear doesn't bring panels back. **Per-test fresh DO:** Each test goes to `/?user=<unique>`. The client now honors that param as an override for `DEMO_USER`, so each test runs against its own Assistant DO. Sidesteps a framework gap: alarms scheduled inside helper facets lose `ctx.id.name` when they fire after a dev-server restart (the 2026-04-15 compat-date fix covers top-level DOs, not facets). With unique users, each test's DO is fresh — no in-flight alarms from a previous run. Captured the framework gap in the wip doc as an upstream-needed fix in partyserver / agents. **Stable selectors:** Added minimal `data-testid` hooks to `client.tsx`: `helper-panel` (with `data-helper-type` / `data-helper-id` / `data-helper-status`) and `drill-in-panel`. The rest of the suite uses ARIA semantics (`getByRole`, `getByPlaceholder`). Drive-by: fixed two Kumo a11y warnings by adding `aria-label` to the parent and drill-in composers. **Config:** `playwright.config.ts` boots `vite dev` via `webServer`; `workers: 1` serializes tests so they don't fight over Workers AI capacity; `retries: 1` rides out occasional 504s; `timeout: 180_000` covers the slow `kimi-k2.5` model. `expect`'s per-action timeout is 60s. Scripts: `npm run test:e2e` (headless), `test:e2e:headed`, `test:e2e:ui` (interactive). **Local-only for now:** User's stated workflow is local. CI integration would need `playwright install --with-deps chromium` and a Workers AI auth shape from the runner; punted. Full suite ~4-5 minutes locally. 7/7 passing. 
Made-with: Cursor * docs(wip+example): session-handoff snapshot for `agents-as-tools` 2026-04-28 The branch covers a lot of ground (20 commits, ~8500 insertions across the example, packages/agents, and the wip doc). Adds a session-handoff read at the top of the wip doc and a status banner on the example README so the next session opens to a clear "what's shipped, what's next" view rather than scrolling through the chronological log. **`wip/inline-sub-agent-events.md`** — new "Resuming this work — snapshot 2026-04-28" section right after the intro: - What's shipped on this branch (Stage 1 + Stage 2 + tests, with pointers to Stage 2's two-helper-class extraction, drill-in, cancellation, gate, and e2e suite). - What's NOT shipped (Stage 3 RFC, Stage 4 framework helper, Stage 5 AIChatAgent port). - The newly-surfaced framework gap (alarms inside helper facets lose `ctx.id.name` after dev-server restarts; the 2026-04-15 compat-date fix covers top-level DOs but not facets) — captured as a near-term next-step candidate. - Three concrete next-step candidates ranked by leverage: Stage 3 RFC (highest), framework facet-alarm fix (medium scope), Stage 4 helperTool(Cls) (premature without the RFC). - "How to resume tactically" — exact commands to verify the current state (`npm test`, `npm run test:e2e`, `npm start`) and the canonical entry points in the doc + code. **`examples/agents-as-tools/README.md`** — new "Status (2026-04-28)" section right after the intro: one paragraph summary of what the example covers (three tools, two helper classes, drill-in, cancellation, gate, replay), the test counts (43 vitest + 7 Playwright), and the next-up work pointer to the wip doc. No code changes — the comments + doc-comments inline already capture the per-bug context (e.g. the framework facet-alarm gap is documented at the e2e helper that works around it). 
Made-with: Cursor * revert(agents): drop unused `messageType` ctor option on `ResumableStream` Reverts `acce611c` (the very first commit on this branch). The option landed before the v0.2 design pivot moved helper events onto each helper's own DO. After the pivot, no caller in the repo uses a non-default value: - Think's only `new ResumableStream(this.sql.bind(this))` doesn't pass options. - The `agents-as-tools` example doesn't instantiate `ResumableStream` directly at all — helpers use Think's, and `helper-event` envelopes are broadcast outside the `ResumableStream` machinery (parent `broadcast()` over its own WebSocket, not a second `ResumableStream` over the same connection). - No test exercises a non-default `messageType`. The option's whole purpose was to prevent frame-type collisions when two `ResumableStream` instances share a WebSocket. The pivot ensured that situation cannot arise in the example, and the changeset itself acknowledged the same isolation argument applies to the dropped `tablePrefix` from #1377. Shipping `messageType` therefore added public API surface that nothing exercised — speculative API we hadn't validated. Removing it now keeps the framework's surface honest. What's reverted: - `ResumableStreamOptions` type + `messageType` ctor arg in `packages/agents/src/chat/resumable-stream.ts` — back to the original `constructor(sql: SqlTaggedTemplate)` signature. - The four `this._messageType` usages in `replayChunks` restored to hardcoded `CHAT_MESSAGE_TYPES.USE_CHAT_RESPONSE`. - `ResumableStreamOptions` removed from `packages/agents/src/chat/index.ts` public exports. - Changeset `.changeset/resumable-stream-message-type.md` deleted. Verification: - `git diff main -- packages/agents .changeset` is empty. - 28 `resumable-streaming.test.ts` regression tests in `packages/ai-chat` still pass. - 43 `agents-as-tools` vitest tests still pass (none touched `ResumableStream`'s ctor surface). 
- Typecheck clean across `packages/agents`, the example, its test worker, and the e2e suite. wip doc updates: - Stage 1 status flipped from "landed" to "not shipped" in three places (top handoff, Status section, Stage 1 plan section header). The original plan kept as historical context. - The "Decisions confirmed 2026-04-28" entry that ruled out `tablePrefix` is rewritten to also cover `messageType` — same isolation argument applies, both options are unnecessary after the pivot. - "What's shipped vs unshipped" table row for `messageType` flipped to "not shipping". - "Explicitly NOT in this near-term list" entry now mentions both ctor options. - The v0.1 narrative section that describes the original `messageType: "helper-event"` setup is left as-is; it accurately captures historical state at the time and is superseded by the v0.2 update entry that follows it. If a future use case needs a non-default `messageType` (or a `tablePrefix`), it's a small additive change with a real caller to point at. Made-with: Cursor * Update inline-sub-agent-events.md * revert(example): drop premature `agents-as-tools` mention from `ai-chat` README The "Related" section pointing at `examples/agents-as-tools` was added in `d95f0b53` (the initial example commit) before the design firmed up. Cross-linking the two examples implies a parity between them — "this is the AIChatAgent equivalent of that" — that doesn't exist yet: `agents-as-tools` is Think-only and the AIChatAgent port is explicitly Stage 5 (deferred). Promoting the link from `ai-chat`'s README before the port lands sets a misleading expectation. The reverse direction (`agents-as-tools` → `ai-chat`) stays — it correctly describes `ai-chat` as "the canonical AIChatAgent reference" without claiming the example is itself ported there. Will re-add this link to `ai-chat`'s README when the AIChatAgent port lands as part of Stage 5. 
Made-with: Cursor

* chore(deps): bump partyserver to ^0.5.4 - closes the facet-alarm name-recovery gap

`partyserver` 0.5.4 fixes the bug filed at [`cloudflare/partykit#390`](https://github.com/cloudflare/partykit/issues/390): fresh 0.5.x DOs with `compatibility_date` older than 2026-03-15 would lose `this.name` on alarm wake (no `ctx.id.name` propagation in old runtimes, and 0.5.x had stopped writing the `__ps_name` legacy fallback record). The fix is a defensive one-time `__ps_name` write on first fetch (idempotent; it restores the safety net pre-0.5.x had).

Surfaced in this branch during e2e suite development. Worked around at the time by giving each test its own Assistant DO via a `?user=<id>` query-param override, so each test's DO was fresh and never hit the alarm path with stale state. With 0.5.4, the workaround is no longer needed for the bug. The per-test unique-user pattern stays purely for test isolation (no helper-row / chat-history state leaks across tests).

Verification:
- `npm ls partyserver` shows 0.5.4 deduped across `agents`, `examples/voice-input`, and the root.
- 43 vitest tests still pass.
- 7 Playwright e2e tests still pass; the output no longer contains the partyserver "Attempting to read .name… `this.ctx.id.name` is not set" error that fired in the background of every previous run.

Updates to docs:
- `wip/inline-sub-agent-events.md`: handoff snapshot's "Open framework gap" reframed to "surfaced and fixed"; next-step candidates list drops the framework facet-alarm fix (it landed); two Stage 2 entries reframed; related-issues paragraph marks #390 as fixed; e2e helpers comment updated.
- `examples/agents-as-tools/README.md`: e2e tests blurb no longer claims the unique-user pattern works around a framework gap; 0.5.4 link added as a parenthetical.
- `examples/agents-as-tools/e2e/helpers.ts`: `uniqueUser` doc-comment updated.

What's still open at the framework level (not addressed by 0.5.4):
- workerd doesn't yet support independent facet alarms.
The helper-side `keepAliveWhile` wrapper is a soft no-op on facets for that reason. Documented in the wip doc's "Hibernation / fibers gaps" section; not in scope for this PR. Made-with: Cursor * fix(example): helper-side abort was dead code; switch to `_aborts.destroyAll()` + add changeset **Adds the missing changeset** for the partyserver 0.5.4 bump (`.changeset/agents-partyserver-0.5.4.md`) — should have landed with `c4a0d887` and didn't. Patch bump on `agents` documenting the peer-dependency raise. **Fixes a real bug** in the helper-side cancellation path. The review found that `_activeRequestId` is only set AFTER `saveMessages` resolves (line 377 of `src/server.ts` was after the await), and `releaseClaim()` immediately clears it back to undefined (line 400). The synchronous span between line 377 and line 400 has no awaits, so the `cancel()` callback at line 407 cannot ever observe `_activeRequestId !== undefined` during a real cancellation — the entire abort path was dead. The B4 vitest tests still pass because they validate the PARENT-SIDE error surfacing (`signal.aborted` check + thrown error + row update + synthesized `error` event), which works end-to-end. The helper-side cancellation never actually fired. What the fix does: 1. **Drops the dead `_activeRequestId` field** — no longer used for anything. The stream-id capture at the same point (`_lastTurnStreamId`) is unaffected. 2. **Switches `abortCurrentTurn` to `_aborts.destroyAll()`.** The helper is single-purpose (one in-flight turn at a time, guarded by `_runInProgress`), so the only controller in the registry, if any, is the one we want to cancel. destroyAll doesn't need the requestId Think generates internally. 3. 
**Honest doc-comment on `abortCurrentTurn`** explaining the remaining race window: Think's `saveMessages` lazily creates the controller via `_aborts.getSignal(requestId)` after several internal awaits (`keepAliveWhile` → `_turnQueue.enqueue` → `appendMessage` → `_broadcastMessages` → then `getSignal`). If `cancel()` arrives before that point, the registry is empty and `destroyAll()` is a no-op; the inference runs to completion. 4. **Updated `cancel()` callback comment** documenting the same. In practice cancels arrive mid-inference (Stop button after several seconds of streaming) and the controller exists by the time we destroyAll, so the abort works. Early cancels (a pre-aborted signal, or an instant tab close) still waste one inference pass. The proper fix needs `Think.saveMessages` to accept an external `AbortSignal` so the helper can pre-create a controller it owns from the start of the turn. That's a Think public API addition — deliberately out of scope for this PR; tracked in the wip doc as a Stage 4 / framework follow-up. **Doc updates:** - `examples/agents-as-tools/README.md` — "Cancellation propagation (B4)" rewritten to be honest about the best-effort nature and the race window. The "Try the cancellation propagation path" extension hint reframed accordingly. - `wip/inline-sub-agent-events.md` — both B4 entries (chronological "helper-side abort plumbing" and Stage 2 recap) updated. First-attempt approach, the dead-code finding, and the destroyAll fix all captured. Notes that the B4 vitest tests validate parent-side surfacing, not helper-side abort. Verification: - 43 vitest tests pass (no behavior change — they were validating the parent-side path, not the helper-side claim). - Typecheck clean across the example, test worker, and e2e. 
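The race window described in this commit can be illustrated with a toy registry. This is a sketch under stated assumptions, not Think's `_aborts` implementation: `AbortRegistry` is a hypothetical class that only mirrors the two method names the commit mentions (`getSignal`, `destroyAll`), and the numeric return value of `destroyAll` is added here purely for illustration.

```typescript
// Hypothetical stand-in for a lazily populated abort-controller registry.
class AbortRegistry {
  private controllers = new Map<string, AbortController>();

  // The controller is created on first use, i.e. only once the turn reaches this call.
  getSignal(requestId: string): AbortSignal {
    let controller = this.controllers.get(requestId);
    if (!controller) {
      controller = new AbortController();
      this.controllers.set(requestId, controller);
    }
    return controller.signal;
  }

  // Aborts every registered controller; returns how many were aborted (illustrative only).
  destroyAll(): number {
    const count = this.controllers.size;
    this.controllers.forEach((controller) => controller.abort());
    this.controllers.clear();
    return count;
  }
}

const registry = new AbortRegistry();

// Early cancel: nothing is registered yet, so this is a no-op.
const earlyAborted = registry.destroyAll();

// The turn later creates its controller; the early cancel is not remembered.
const lateSignal = registry.getSignal("turn-1");
const missedByEarlyCancel = !lateSignal.aborted;

// Mid-inference cancel: the controller exists, so the abort lands.
const midAborted = registry.destroyAll();
```

Because the registry keeps no memory of an early `destroyAll()`, the only way to close the window is for the caller to own the controller from the start of the turn, which is what the proposed external `AbortSignal` would allow.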
Made-with: Cursor

* docs(wip+example): link to filed issue cloudflare/agents#1406

Filed [cloudflare/agents#1406](https://github.com/cloudflare/agents/issues/1406): `Think.saveMessages` should accept an external `AbortSignal` so callers can cancel an in-flight turn from outside. Captures the cancellation race window documented in this branch across three places that all referenced the gap as "Stage 4 / framework follow-up" without a concrete link to follow:
- `examples/agents-as-tools/src/server.ts`: `abortCurrentTurn` doc-comment now links to #1406 instead of the abstract "Stage 4" pointer.
- `examples/agents-as-tools/README.md`: "Cancellation propagation (B4)" paragraph likewise.
- `wip/inline-sub-agent-events.md`: related-issues paragraph adds #1406 alongside the existing partykit#390 and workerd#6675 links; the B4 chronological entry replaces "Stage 4 / framework follow-up" with the explicit issue link.

Also promoted the saveMessages-AbortSignal fix to a concrete next-step candidate at the top of the wip doc, slotted between the Stage 3 RFC and Stage 4 framework helper. It's a small additive API change that could land before either of those and would unblock proper helper-side cancellation without waiting on the broader framework promotion.

No code changes; purely cross-linking.

Made-with: Cursor

* Bump partyserver and stabilize tests

Update partyserver dependency to ^0.5.5 in package.json (and workspace packages) and refresh the lockfile. Reduce test flakiness by widening streaming chunk delays in ai-chat tests (increase `chunkDelayMs` to give more wall-clock headroom) and add clarifying comments. Also relax a strict ordering assertion in `message-concurrency.test.ts` to assert set-equality (sort the request IDs) to avoid transient microtask ordering failures while preserving the intended guarantees.
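The relaxed assertion in the last commit can be sketched as an order-insensitive comparison. This is a minimal illustration, not the code in `message-concurrency.test.ts`: `sameMembers` is a hypothetical helper, and the request ids below are made up.

```typescript
// Compare two id lists ignoring order: sort copies of both, then compare element-wise.
function sameMembers(actual: string[], expected: string[]): boolean {
  if (actual.length !== expected.length) return false;
  const a = actual.slice().sort();
  const b = expected.slice().sort();
  return a.every((value, index) => value === b[index]);
}

// Hypothetical request ids observed in a different order than they were issued.
const observed = ["req-2", "req-1", "req-3"];
const issued = ["req-1", "req-2", "req-3"];
const matches = sameMembers(observed, issued);
```

Sorting copies keeps the guarantee that every request id appears exactly once while dropping the dependence on microtask ordering.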
No description provided.