sub-agent routing: RFC + implementation (phases 1–3) #1355

Merged
threepointone merged 18 commits into main from spike-subagent-routing
Apr 22, 2026

@threepointone threepointone commented Apr 21, 2026

Foundation for #1350's Chats composition pattern. RFC + full server/client implementation of external addressability for sub-agents.

What ships

Let a client reach a facet (child DO created by Agent#subAgent()) directly via a nested URL:

/agents/{parent-class}/{parent-name}/sub/{child-class}/{child-name}[/...]

Four new public primitives forming a symmetric table with the existing top-level APIs:

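            | Get a stub                           | Handle a full request
  ----------|--------------------------------------|---------------------------------------------------------------
  Top-level | getAgentByName(namespace, name)      | routeAgentRequest(req, env) (runs onBeforeConnect/Request)
  Sub-agent | getSubAgentByName(parent, Cls, name) | routeSubAgentRequest(req, parent, opts) (runs onBeforeSubAgent)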

Plus:

  • onBeforeSubAgent(req, { className, name }) — parent-side middleware hook on Agent; mirrors onBeforeConnect / onBeforeRequest (return Request | Response | void).
  • this.parentPath / this.selfPath — ancestor chain on the Agent base. Inductive across recursive nesting; populated at facet init time.
  • this.hasSubAgent(class, name) / this.listSubAgents(class?) — parent-side introspection, maintained as a registry side-effect of subAgent() / deleteSubAgent().
  • useAgent({ agent, name, sub: [...] }) — flat-array nesting on the React client, with .path on the hook return.

All additive. Existing consumers unaffected.

Commit sequence

  1. Spike — a self-contained proof that WS upgrade + HTTP propagate through a two-hop fetch() chain (Worker → parent DO → facet Fetcher), and that after upgrade the parent is not in the hot path.
  2. RFC: design/rfc-sub-agent-routing.md. Design discussion, decisions, edge cases, follow-ups.
  3. RFC tightening — renames to the final API names (onBeforeSubAgent, routeSubAgentRequest, getSubAgentByName), middleware-hook framing, flat sub array.
  4. RFC deep review — flat array, parentPath/selfPath, hasSubAgent/listSubAgents, retry semantics for reconnect, edge-cases section, 4-quadrant primitive table.
  5. Spike extension → design locked — confirmed RpcTarget doesn't survive across separate RPC calls (lifetime tied to the returning call). The viable path is a stateless per-call bridge; RFC updated with the resolved design.
  6. Phase 1 — Foundation: parentPath / selfPath, registry-backed hasSubAgent / listSubAgents, _cf_initAsFacet extended to carry the path, null-character validation on names.
  7. Phase 2 — Routing: onBeforeSubAgent hook, Agent base fetch dispatch arm, sub-routing.ts with parseSubAgentPath + routeSubAgentRequest + forwardToFacet, _cf_invokeSubAgent bridge, getSubAgentByName Proxy.
  8. Phase 3 — Client: UseAgentOptions.sub flat array, URL construction (buildSubPath), cache key on full chain, .path on return, return-type uses Omit<PartySocket, "path">.
  9. RFC tracking — the multi-session and routing RFCs now reference the landed primitives instead of describing them hypothetically.
  10. Changeset.

Test coverage

  • Spike (9 tests): WS/HTTP pass-through, parent-out-of-hot-path invariant, per-child isolation, unknown-class 404, plus the per-call bridge in 4 variants — getSubAgentByName direct RPC, state readback across WS + RPC paths, multiple independent calls, .fetch() rejection pointer.
  • Phase 1 / sub-agent (9 new tests): direct-child parentPath, selfPath, nested two-level parentPath, persistence across abort+re-access, hasSubAgent round-trips, listSubAgents enumerate + filter, null-char rejected.
  • Phase 2 / sub-agent-routing (15 tests): getSubAgentByName RPC round-trip, registry reflects spawns, .fetch() rejection, thenable guard, onBeforeSubAgent allow/reject/mutate with strict-registry + 401+WWW-Authenticate patterns, routeSubAgentRequest from a custom handler, parseSubAgentPath matches default/trailing/encoded/unknown-class/missing/truncated.

Full npm run check clean (74 projects typecheck, tests green).

What's NOT in this PR

Relationship to other PRs

Test plan

  • npm run check — all 74 tsconfigs typecheck, sherif / export checks / oxfmt / oxlint clean.
  • @cloudflare/agents tests (all 71 test files): passing. Includes the spike, the sub-agent suite (43 tests), and sub-agent-routing (15 tests).
  • examples/elevenlabs-starter pre-existing error unrelated.

Proves that a facet (created via `subAgent()` and obtained via
`ctx.facets.get()`) is reachable from outside the Worker via a
two-hop `fetch()` chain: Worker → parent DO → facet Fetcher. This is
the foundation that "Option B" in the upcoming sub-agent routing RFC
rests on.

The test worker exposes `/spike-sub/{parent}/sub/SpikeSubChild/{name}`,
which routes to a SpikeSubParent DO. The parent's `fetch()` override
matches `/sub/{class}/{name}`, resolves the facet via
`ctx.facets.get()`, rewrites the URL to strip the prefix, and returns
`fetcher.fetch(req)`. The child is a regular Agent with `onConnect` /
`onMessage` / `onRequest`.

Five tests, all green. Key confirmations:

1. WS upgrade propagates through the double hop. The 101 response
   carries a `webSocket` that the client can accept and use.
2. After upgrade, application frames go direct to the child — the
   parent's `fetch()` counter stays at 1 no matter how many messages
   the client sends. This is the critical invariant that makes the
   pattern usable for per-chat DOs: the parent gatekeeps connects
   and then gets out of the hot path.
3. Agent protocol messages (e.g. `cf_agent_identity`) sent from the
   facet reach the client — so the Agent base's connect-time broadcast
   still works end to end across the chain.
4. HTTP works symmetrically. `POST /spike-sub/.../sub/.../anything`
   reaches `onRequest` with the `/sub/{class}/{name}` prefix stripped.
5. Per-child isolation holds across multiple children rooted at the
   same parent.

This is intentionally kept small and self-contained in the existing
agents test harness — a single new pair of agent classes plus one
test file. It is **not** the public API; the RFC and subsequent
implementation will wire this through `routeAgentRequest`, add a
parent-side `authorizeSubAgent` hook, and expose a client-side
nested-address API for `useAgent`.

Made-with: Cursor
Complements the sub-agent routing spike (97e814d). Proposes:

- URL shape: /agents/{parent}/{parent-name}/sub/{child}/{child-name}
  with configurable `subPrefix` (defaults to "sub"), recursive
  through additional /sub/... segments.
- Parent-side `authorizeSubAgent(req, { class, name })` hook, called
  after `onBeforeConnect`/`onBeforeRequest`; default permissive.
  Apps override to implement strict registry-based access without
  needing a separate option.
- Client-side nested `useAgent({ agent, name, sub: { agent, name } })`
  with `.path` added to the hook's return surface and identity
  messages. `.agent`/`.name` still refer to the leaf so downstream
  hooks like useAgentChat are unchanged.
- Composable `forwardToSubAgent()` helper so users with custom
  routing (basePath, custom prefixes) can wire this into their own
  fetch handlers.
- WS and HTTP are symmetric; auth is a three-tier model layered over
  existing `onBeforeConnect`/`onBeforeRequest`.
- Zero-migration for existing consumers; opt-in via the `sub` API
  and URL shape.

Implementation plan is split into four steps (sub-routing helper,
parent-side dispatch in the Agent base, client API extensions,
tests) with explicit follow-ups for capability tokens, enumeration
APIs, and the partyserver backport.

Made-with: Cursor
changeset-bot Bot commented Apr 21, 2026

🦋 Changeset detected

Latest commit: 961cded

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
agents Patch

@devin-ai-integration devin-ai-integration Bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.

pkg-pr-new Bot commented Apr 21, 2026

agents

npm i https://pkg.pr.new/agents@1355

@cloudflare/ai-chat

npm i https://pkg.pr.new/@cloudflare/ai-chat@1355

@cloudflare/codemode

npm i https://pkg.pr.new/@cloudflare/codemode@1355

hono-agents

npm i https://pkg.pr.new/hono-agents@1355

@cloudflare/shell

npm i https://pkg.pr.new/@cloudflare/shell@1355

@cloudflare/think

npm i https://pkg.pr.new/@cloudflare/think@1355

@cloudflare/voice

npm i https://pkg.pr.new/@cloudflare/voice@1355

@cloudflare/worker-bundler

npm i https://pkg.pr.new/@cloudflare/worker-bundler@1355

commit: 961cded

Renames the parent-side middleware hook to match the existing
`onBeforeConnect` / `onBeforeRequest` pattern exactly — same prefix,
same return-type shape (`Request | Response | void`), same mental
model. The earlier name boxed the hook into "auth only"; the real
shape is a middleware that can allow (default), mutate the request,
short-circuit with a response, or reject.

- D2 section reshaped to lead with "middleware hook" framing; auth
  is documented as one of several use cases (strict gate, header
  injection, cached short-circuit).
- Return type now `Request | Response | void`, mirroring the top-
  level hooks instead of `boolean | Response`.
- Implementation pseudocode updated to branch on `instanceof
  Response` / `instanceof Request` / fall-through.
- Tests section lists the three cases (Response → verbatim,
  Request → forward mutated, void → forward original).
- Added a "Decided" section under Open questions to capture the
  naming decision and rationale, so we don't re-litigate.
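
The three-way branch described above can be sketched as follows. This is a minimal illustration rather than the RFC's pseudocode: `resolveHookResult` and the `Outcome` type are hypothetical names introduced here, and the sketch assumes a runtime with global `Request` / `Response` (Workers, or Node 18+).

```typescript
// Return type of the hook as described in this commit message.
type HookResult = Request | Response | void;

// Hypothetical result shape for this sketch: either answer directly,
// or forward some request on to the child.
type Outcome =
  | { kind: "respond"; response: Response }
  | { kind: "forward"; request: Request };

// Decide what to do with whatever the hook returned.
function resolveHookResult(original: Request, result: HookResult): Outcome {
  // Response: short-circuit, returned verbatim to the client.
  if (result instanceof Response) return { kind: "respond", response: result };
  // Request: forward the mutated request instead of the original.
  if (result instanceof Request) return { kind: "forward", request: result };
  // void: fall through and forward the original request unchanged.
  return { kind: "forward", request: original };
}
```

A hook that returns nothing therefore behaves exactly like having no hook, which is what keeps the default permissive.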

Made-with: Cursor
Two linked changes that tighten the public surface:

1. Rename `forwardToSubAgent` → `routeSubAgentRequest`. Symmetric
   with `routeAgentRequest` (same verb, same noun, same return
   shape). Users who know the top-level router immediately know the
   sub-agent one.

2. Add `getSubAgentByName(parent, Cls, name)` as the sub-agent
   analog of `getAgentByName`. For callers outside the parent DO
   that want a typed stub — not to handle an incoming request, but
   to make a single RPC call into a specific child.

Together these form a clean four-quadrant table of primitives:

                  | Get a stub                  | Handle a request
  ----------------|-----------------------------|--------------------------
  Top-level       | getAgentByName              | routeAgentRequest
  Sub-agent       | getSubAgentByName           | routeSubAgentRequest

Semantics pinned: `routeSubAgentRequest` runs `onBeforeSubAgent`
(parent is on the external-routing path). `getSubAgentByName` does
not (caller already has the parent stub, responsibility moves up
the stack). Same split as `routeAgentRequest` vs `getAgentByName`
with respect to `onBeforeConnect`.

Implementation plan now in five steps with the cross-DO RPC
question called out explicitly — the spike proved `.fetch()` across
the double hop, but we need an extra test confirming that a
Fetcher returned from a parent RPC call still supports full RPC
semantics on the caller side. If that doesn't work, documented
fallback is "getSubAgentByName returns a fetch-only stub, expose
explicit parent bridge methods for RPC."

D8 rewritten to lead with the four-quadrant table, then walk
through `routeSubAgentRequest` and `getSubAgentByName` with code
examples. Test plan updated to cover both, including a specific
test that pins the `onBeforeSubAgent` firing semantic.

Decided section captures both rename decisions with rationale.

Made-with: Cursor
devin-ai-integration[bot]

This comment was marked as resolved.

Consolidates the findings from a careful review pass into a tighter
document. Substantive changes:

Design
- D1: name URL-encoding spec; null-char reserved; class-name vs
  subPrefix collision rule. Documented class-collision-with-top-
  level-binding footgun.
- D3: lazy-create is default; strict is one-line opt-in using the
  new `hasSubAgent` primitive.
- D4: `sub` becomes a flat array `[{agent,name}, ...]` — cleaner
  dynamic construction, symmetric with `.path`. Drop `path` from
  the wire protocol; client computes it locally from its own input.
- D6: client retry hardening — 4xx + terminal WS close codes stop
  reconnection. Needed for sub-agent deletion UX, and a general
  improvement anyway.
- D7 (new): parent-side introspection — `parentPath`, `selfPath`,
  `hasSubAgent`, `listSubAgents`. Registry maintained inside the
  existing `subAgent` / `deleteSubAgent` calls. Collapses the
  planned "parent-side enumeration" follow-up into v1.
- Edge cases: consolidated semantics section covering hook throws,
  hook-before-class-check ordering, URL rewrite scope, header
  passthrough, recursive auth, name encoding, class case.

Docs/structure
- Rewritten top-to-bottom for readability — same rough outline, but
  tighter prose and fewer redundant paragraphs.
- Removed spike commit hash (will go stale); reference the test
  file instead.
- "Decided" section is now a running log of locked-in choices with
  the reasoning, for reviewer context.

Implementation plan
- Five steps with the `parentPath` + registry wired through step 2.
- `_cf_initAsFacet` extended to take `parentPath`; derivation is
  inductive across recursive nesting.
- Cross-DO RPC-via-returned-Fetcher flagged as the one remaining
  spike we'd run before the feature PR lands.

No public-surface decisions changed beyond what was reviewed; this
commit records the answers from those review conversations.

Made-with: Cursor
Spike extension confirms the cross-DO stub passthrough question the
RFC left open, and lands the answer in the design doc.

## Finding

- Returning a facet stub from a parent's RPC method fails with
  DataCloneError ("Could not serialize object of type DurableObject").
  Same for a top-level DO stub — this is a general limit, not
  facet-specific.
- An `RpcTarget` wrapper that holds the facet stub and proxies an
  `invoke(method, args)` surface *does* cross the boundary, but its
  lifetime is scoped to the RPC call that returned it. Reusing the
  reference across separate calls breaks with "internal error".
- The viable path is a stateless per-call bridge: the parent
  exposes one RPC method (`invokeSubAgent(childName, method, args)`)
  that resolves the facet via `this.subAgent(...)` (idempotent) and
  dispatches the call fresh each time. The caller-side
  `getSubAgentByName` wraps it in a JS Proxy so the public API
  stays exactly as the RFC specified.

Cost: one extra RPC hop per call (caller → parent → facet).
Benefit: works across hibernation, no reference-lifetime gotchas.
Limitation: the returned Proxy supports RPC method calls only, not
`.fetch()`. External HTTP/WS routing goes through
`routeSubAgentRequest`.

Subtle gotcha pinned: the parent-side bridge must call
`handle[method](...args)` in one expression. Extracting via `const
fn = handle[method]; fn.apply(handle, args)` detaches the workerd
RpcProperty binding and throws.
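
A minimal in-memory sketch of the stateless per-call bridge, assuming nothing about workerd itself: `FakeParent`, `CounterChild`, and `subAgentProxy` are hypothetical stand-ins, and only the `invokeSubAgent(childName, method, args)` shape comes from the finding above. Plain JS objects do not reproduce the RpcProperty detachment, so the single-expression call is shown only to mirror the required form.

```typescript
// Caller-facing surface of the parent; `invokeSubAgent` is the bridge method
// named above, everything else in this block is an illustrative stand-in.
interface ParentBridge {
  invokeSubAgent(childName: string, method: string, args: unknown[]): Promise<unknown>;
}

// Stand-in child; in the real system this would be a facet stub.
class CounterChild {
  private count = 0;
  increment(by: number): number {
    this.count += by;
    return this.count;
  }
}

// In-memory stand-in for the parent Durable Object.
class FakeParent implements ParentBridge {
  bridgeCalls = 0;
  private children = new Map<string, CounterChild>();

  // Idempotent resolve, playing the role of `this.subAgent(...)`.
  private resolve(childName: string): CounterChild {
    let child = this.children.get(childName);
    if (!child) {
      child = new CounterChild();
      this.children.set(childName, child);
    }
    return child;
  }

  async invokeSubAgent(childName: string, method: string, args: unknown[]): Promise<unknown> {
    this.bridgeCalls++;
    const handle = this.resolve(childName) as unknown as Record<
      string,
      (...a: unknown[]) => unknown
    >;
    // Look up and call in one expression so the method keeps its receiver.
    return handle[method](...args);
  }
}

// Caller-side Proxy: every method call becomes one fresh bridge call.
function subAgentProxy<T extends object>(parent: ParentBridge, childName: string): T {
  return new Proxy({} as T, {
    get(_target, prop) {
      // Returning undefined for `then` keeps `await proxy` from issuing an RPC.
      if (typeof prop !== "string" || prop === "then") return undefined;
      return (...args: unknown[]) => parent.invokeSubAgent(childName, prop, args);
    }
  });
}
```

Usage would look like `const chat = subAgentProxy<{ increment(by: number): Promise<number> }>(parent, "a"); await chat.increment(2);`. No reference to the child outlives a single call, which is the property that sidesteps the lifetime problem.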

## Tests

Spike file now has 9 tests (up from 5): the original WS/HTTP
passthrough confirmations, plus four covering the per-call bridge
(direct invoke, reading state mutated via WS, JS-Proxy ergonomics,
reuse across multiple independent calls).

## RFC updates

- Step 3 of implementation plan rewritten around the per-call
  bridge and its gotcha. Previous direct-stub-return sketch + open
  question replaced with the actual answer.
- D8 section for `getSubAgentByName` documents the .fetch()
  limitation and the extra RPC hop.
- Decided section captures the finding.
- Tests section updated.

## Devin review feedback (PR #1355)

- "Broken link to rfc-ai-chat-maintenance.md" — the file lives on
  PR #1353 (not yet on main). Replaced the dead relative link with
  a PR reference that'll resolve.
- "New RFC not added to design/AGENTS.md Current contents table" —
  added the row.

Made-with: Cursor
…tion

Additive foundation for the sub-agent routing RFC. No new public
behavior users must opt into; all changes layer cleanly under
existing primitives.

## On `Agent`

- `parentPath` (readonly getter) and `selfPath` (getter) —
  root-first ancestor chain. `parentPath` is empty for top-level
  DOs; populated at facet init time from the parent's own
  `selfPath`. Inductive across recursive nesting: Tenant → Inbox →
  Chat correctly produces a two-level chain on the Chat.

- `_cf_initAsFacet(name, parentPath)` — signature extended (second
  arg defaults to []). Persists the chain to storage. Restored on
  boot alongside the existing `_isFacet` flag.

- `hasSubAgent(className, name)` and `listSubAgents(className?)` —
  parent-side introspection backed by an auto-maintained
  `cf_agents_sub_agents` SQLite table. Rows are written by
  `subAgent()` and deleted by `deleteSubAgent()`. The table is
  created lazily on first use. Primarily for strict-registry
  access patterns in `onBeforeSubAgent` (coming in phase 2).

## On `subAgent` / `deleteSubAgent`

- `subAgent()` now derives the child's `parentPath` from its own
  `selfPath` and passes it to `_cf_initAsFacet`. Also records a
  registry row. Null character in `name` is rejected (reserved for
  the facet composite key).

- `deleteSubAgent()` removes the registry row in addition to the
  existing `ctx.facets.delete(...)` call.

- `destroy()` drops the new `cf_agents_sub_agents` table.
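
The inductive path derivation can be sketched without any Durable Object machinery. `SketchAgent` and `spawn` below are hypothetical names for illustration only; the entry shape uses `className` / `name`, following the naming this PR settles on later. Storage persistence and the registry table are omitted.

```typescript
interface PathEntry {
  className: string;
  name: string;
}

// Illustrative stand-in for an Agent that knows its ancestor chain.
class SketchAgent {
  readonly className: string;
  readonly name: string;
  // Root-first ancestors; empty for a top-level agent.
  readonly parentPath: ReadonlyArray<PathEntry>;

  constructor(className: string, name: string, parentPath: ReadonlyArray<PathEntry> = []) {
    this.className = className;
    this.name = name;
    this.parentPath = parentPath;
  }

  // selfPath = parentPath plus this agent's own entry.
  get selfPath(): PathEntry[] {
    return [...this.parentPath, { className: this.className, name: this.name }];
  }

  // The child's parentPath is the parent's selfPath, so nesting composes.
  spawn(className: string, name: string): SketchAgent {
    if (name.includes("\0")) {
      throw new Error("Sub-agent name must not contain a null character");
    }
    return new SketchAgent(className, name, this.selfPath);
  }
}
```

With `Tenant → Inbox → Chat`, the Chat's `parentPath` holds the Tenant and Inbox entries in that order, matching the two-level chain described above.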

## Tests

9 new tests in `sub-agent.test.ts`, covering:

- Direct-child parentPath / selfPath structure.
- Nested parentPath (two-level).
- parentPath survives abort + re-access (persisted in storage).
- `hasSubAgent` returns true after spawn, false before.
- `hasSubAgent` returns false after `deleteSubAgent`.
- `listSubAgents` enumerates all spawned children.
- `listSubAgents` filters by class.
- Name with null character rejected.

Full agents test suite green (both the existing 34 sub-agent tests
and the new 9; broader suite also passes).

Made-with: Cursor
Server-side of the RFC. All pieces wired through the base Agent
class; existing consumers unaffected (URLs without /sub/ fall
through to super.fetch unchanged; default onBeforeSubAgent is void).

## What lands

- `Agent#fetch` override that detects `/sub/{child-class}/{child-name}`
  in the incoming URL, calls `onBeforeSubAgent`, and forwards the
  request into the facet Fetcher with the prefix stripped. The
  child sees a clean request URL.

- `Agent#onBeforeSubAgent(req, child)` — overridable middleware
  hook mirroring `onBeforeConnect` / `onBeforeRequest`. Returns
  `Request | Response | void`. Default: void.

- `_cf_invokeSubAgent(className, name, method, args)` — bridge RPC
  method. Called by `getSubAgentByName`'s client-side Proxy to
  dispatch typed RPC into a facet. The parent resolves the facet
  fresh each call (idempotent via `this.subAgent`) and dispatches.
  Survives hibernation.

- `_cf_resolveSubAgent(className, name)` — shared internal that
  `subAgent(cls, name)` and `_cf_invokeSubAgent` both funnel
  through. Takes the class-name string rather than the class ref,
  so `_cf_invokeSubAgent` can work from an RPC-marshalled string
  without the class-name lookup footgun where `ctx.exports[cls.name]`
  returns a wrapper whose `.name` is no longer the original.

- New `sub-routing.ts` module:
  - `parseSubAgentPath(url, { subPrefix?, knownClasses? })` —
    splits a URL into `{ childClass, childName, remainingPath }`.
    Handles kebab↔CamelCase class resolution, URL-decoded names.
  - `routeSubAgentRequest(req, parent, { fromPath?, subPrefix? })`
    — sub-agent analog of `routeAgentRequest`. Use in custom fetch
    handlers that don't match the default `/agents/...` shape.
    Runs `onBeforeSubAgent` on the parent.
  - `getSubAgentByName(parent, Cls, name)` — sub-agent analog of
    `getAgentByName`. Returns a typed Proxy. RPC-only: `.fetch()`
    throws with a pointer at `routeSubAgentRequest`. Does *not*
    run `onBeforeSubAgent` (consistent with `getAgentByName` not
    running `onBeforeConnect`). Thenable guard on the Proxy so
    `await` doesn't probe `.then` and trigger a ghost RPC.
  - `DEFAULT_SUB_PREFIX` export for documentation / testing.

- Public exports from `agents` package: `routeSubAgentRequest`,
  `getSubAgentByName`, `parseSubAgentPath`, `DEFAULT_SUB_PREFIX`,
  `SubAgentPathMatch`.

## Spike refactor

The spike parent no longer hand-rolls `/sub/` detection or the
per-call bridge — both now live on the base Agent. `SpikeSubParent`
becomes a thin agent with an `onBeforeSubAgent` override that
counts invocations, which lets the "parent is out of the hot path
post-upgrade" invariant stay pinned against the production code.

The spike test was using CamelCase class segments in URLs; updated
to kebab-case to match the production URL convention (which is
what `useAgent` / `routePartykitRequest` have always used).

## New test file: `sub-agent-routing.test.ts`

Production-path coverage via the `TestSubAgentParent` /
`CounterSubAgent` pair plus a new `HookingSubAgentParent` that
exercises `onBeforeSubAgent` return variants. 15 tests covering:

- `getSubAgentByName` RPC round-trip
- Registry reflects spawns via the real bridge
- `.fetch()` rejected with a helpful error message
- `await getSubAgentByName(...)` doesn't trigger the thenable probe
- `parseSubAgentPath` — default match, trailing path, URL-decoded
  names, unknown-class → null, missing `/sub/` → null, truncated
  match → null
- `onBeforeSubAgent` variants: void (pass through), Response
  (short-circuit 404, 401 with WWW-Authenticate), strict-registry
  pattern using `hasSubAgent`
- `routeSubAgentRequest` from a custom fetch handler honors the
  hook end-to-end

Full agents test suite green (spike 9 + sub-agent 43 + sub-agent-
routing 15 = 67 directly exercised by this branch, plus existing
coverage).

Made-with: Cursor
React client support for the sub-agent routing primitive. The hook
gains a flat `sub` array, builds the nested URL, surfaces the full
`.path` on the returned object, and keys its cache on the chain so
nested sessions with the same leaf name don't collide.

## UseAgentOptions

- `sub?: ReadonlyArray<{ agent, name }>` — flat root-first chain,
  matching the server-side URL shape. Optional; when unset the
  hook behaves exactly as before.
- `subPrefix?: string` — defaults to `"sub"`, matches the server's
  URL parser.

## Returned hook object

- `.agent` / `.name` — the **leaf** identity (the deepest sub
  entry). Downstream hooks like `useAgentChat(agent)` see the
  child they actually talk to, unchanged.
- `.path` — root-first chain including the leaf. Exposed as an
  array for observability, reconnect keying, and UI. Single entry
  `[{ agent, name }]` when `sub` isn't set (just the top-level
  address).
- Return-type uses `Omit<PartySocket, "path">` to shadow
  PartySocket's own string `.path` property.

## URL construction

`buildSubPath(subChain, subPrefix, extraPath)` assembles the
`/sub/{agent-kebab}/{name}/...` tail from the chain and merges it
with any user-provided `path`. Name segments are
`encodeURIComponent`-encoded, matching the server parser's decode.
Class segments are kebab-cased.
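
A rough sketch of that assembly, under stated assumptions: `toKebab` is a simplified stand-in for the package's kebab-casing helper, and how the real `buildSubPath` joins the user path (leading slash, separators) is not shown in this message, so the join below is a guess.

```typescript
interface SubEntry {
  agent: string;
  name: string;
}

// Simplified CamelCase -> kebab-case (assumption; the real helper has more branches).
function toKebab(className: string): string {
  return className.replace(/([a-z0-9])([A-Z])/g, "$1-$2").toLowerCase();
}

// Assemble `/sub/{agent-kebab}/{name}` per chain entry, then append the user path.
function buildSubPath(
  subChain: ReadonlyArray<SubEntry>,
  subPrefix: string = "sub",
  extraPath?: string
): string {
  const tail = subChain
    .map(({ agent, name }) => `/${subPrefix}/${toKebab(agent)}/${encodeURIComponent(name)}`)
    .join("");
  if (!extraPath) return tail;
  // Strip leading slashes on the user path so the join never doubles them.
  return `${tail}/${extraPath.replace(/^\/+/, "")}`;
}
```

For example, a chain of `Inbox/i1` then `Chat/c1` yields `/sub/inbox/i1/sub/chat/c1`, and a name such as `a b` is emitted as `a%20b` so the server's decode restores it.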

## Cache key

`createCacheKey` is extended to include the sub-chain. To avoid
invalidating existing caches for non-sub consumers, an empty
chain produces the same key as the old 3-arg shape. The helper
keeps backwards compatibility — old 3-arg callers (including the
`_testUtils` surface and the existing react-tests) still work.

## Not in this phase

Retry hardening on 4xx / terminal WS close codes — the spec
change the RFC calls for. This needs to hook into partysocket's
reconnection loop, which lives in a separate package. Deferring
to a follow-up (tracked in the RFC follow-ups table) so this
phase stays focused on the agents-package surface.

## Tests

Full agents test suite (71 test files) green, including the
existing react-tests that exercise `createCacheKey` via
`_testUtils`.

Made-with: Cursor
Updates both RFCs now that the routing implementation (phases 1–3)
is on this branch.

## rfc-sub-agent-routing.md

- Migration section rewritten: explicit about which downstream
  consumers migrate where. The Think multi-session RFC builds on
  top; `examples/multi-ai-chat` lives on #1353 and will rebase
  there; user-facing docs are a follow-up.

## rfc-think-multi-session.md

- Adds a "Related" link to the routing RFC, marking it as landed.
- Replaces the planned `parentAgent<T>()` helper with the real
  pattern using the now-shipped `this.parentPath` + a DO namespace
  lookup. Strictly more flexible (grandparents work too) and
  consistent with the example code.
- `useChats()` is now explicitly a thin wrapper over the landed
  `useAgent({ sub: [...] })` primitive instead of a nested
  useAgent-in-useAgent dance.
- The migration section removes the landed items (registry,
  parentPath, useAgent sub) and lists what still needs building
  (Chats class, RemoteContextProvider/SearchProvider, useChats,
  examples/chats, docs updates).
- Summary section reflects the primitive is no longer hypothetical.

No implementation changes in this commit — docs tracking reality.

Made-with: Cursor
Lists every public API the routing implementation adds: the four
top-level functions (`routeSubAgentRequest`, `getSubAgentByName`,
`parseSubAgentPath`, `DEFAULT_SUB_PREFIX`), the `onBeforeSubAgent`
middleware hook, `parentPath`/`selfPath` ancestor chain,
`hasSubAgent`/`listSubAgents` introspection, and the `sub` /
`subPrefix` options on `useAgent`.

All additive — no breaking changes to existing consumers.

Made-with: Cursor
@threepointone threepointone changed the title from "sub-agent routing: spike + RFC" to "sub-agent routing: RFC + implementation (phases 1–3)" Apr 21, 2026

Round of code-review fixes before merging the sub-agent routing
primitive:

- Rename `{ class, name }` → `{ className, name }` everywhere it's
  user-facing (`onBeforeSubAgent`, `parentPath`, `selfPath`,
  `listSubAgents` rows). Destructuring the hook no longer requires the
  `{ class: cls }` keyword dance. SQL columns stay `class` internally.
- Drop `subPrefix` customization: the `/sub/` separator is hardcoded
  across server, client, and helpers. Rename `DEFAULT_SUB_PREFIX` →
  `SUB_PREFIX` (kept public for symbolic URL building).
- Make the Agent base's `fetch` dispatch use a static
  `import { parseSubAgentPath }` instead of a dynamic `await import(...)`.
- Reject sub-agent class literally named `Sub` at spawn time with a
  clear error (the `/sub/` URL separator is reserved).
- Scrub 404/400 response bodies on the routing path to terse
  `"Not Found"` / `"Bad Request"` strings; keep the real error in
  worker logs via `console.error`.
- Make `deleteSubAgent` idempotent. `ctx.facets.delete` throws on
  missing keys — swallow that so double-delete and
  delete-never-spawned both succeed silently.
- Validate `parentPath` shape on restore from storage (defensive
  against corrupted / legacy records).
- Overload `hasSubAgent` / `listSubAgents` to accept either a class
  constructor or a CamelCase name string.
- Add a JSDoc callout about preserving `Upgrade` / `Sec-WebSocket-*`
  headers when mutating requests in `onBeforeSubAgent`.

Tests added:
- class literally named `Sub` is rejected at spawn
- `hasSubAgent` / `listSubAgents` accept both class ref and string
- `deleteSubAgent` never-spawned and double-delete both silent
- scrubbed `"Bad Request"` body for null-char child names

RFC and changeset updated to match the final API shape.

Made-with: Cursor

Two independent bugs reported in PR review:

**1. `parseSubAgentPath` mis-matched on earlier `sub` segments.**

`parts.indexOf(SUB_PREFIX)` returned the first occurrence of `sub`
among path segments, not the one that actually marks a parent↔child
boundary. When the parent's instance name was literally `"sub"`, or
when the `basePath` contained a `sub` segment (e.g. `/subscriptions/`),
the parser picked the wrong position, `resolveClassName` typically
null-ed out on the next segment, and `parseSubAgentPath` returned
null. The parent's fetch override then fell through to
`super.fetch()` silently — sub-agent routing was broken for that
shape of URL with no error surfaced.

Fix: walk every occurrence of the `sub` segment and return the
first position where `parts[i+1]` resolves to a valid class. With
`knownClasses` supplied (the parent's fetch path), this pins the
real boundary. Existing matches and recursive nesting are
unchanged.

Regression tests:
- parent instance named literally `"sub"`
- `basePath` segment containing `sub` (e.g. `/subscriptions/...`)
- nested `/sub/A/a/sub/B/b` — outer parse returns first hop only
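
The fixed scan can be sketched as below. This is a simplified illustration, not the module's code: it takes a pathname and a required list of known classes (the real function accepts a URL and an options object), and `classToKebab` is a stand-in for the package's kebab-casing.

```typescript
interface SubAgentPathMatch {
  childClass: string;
  childName: string;
  remainingPath: string;
}

// Simplified CamelCase -> kebab-case (assumption).
function classToKebab(className: string): string {
  return className.replace(/([a-z0-9])([A-Z])/g, "$1-$2").toLowerCase();
}

function parseSubAgentPath(
  pathname: string,
  knownClasses: ReadonlyArray<string>
): SubAgentPathMatch | null {
  const parts = pathname.split("/").filter(Boolean);
  // Walk every `sub` segment instead of stopping at the first one.
  for (let i = 0; i < parts.length - 2; i++) {
    if (parts[i] !== "sub") continue;
    const childClass = knownClasses.find((cls) => classToKebab(cls) === parts[i + 1]);
    // Not a real boundary (e.g. a parent instance named "sub"): keep scanning.
    if (!childClass) continue;
    return {
      childClass,
      childName: decodeURIComponent(parts[i + 2]),
      // Everything after this hop is left for the child to parse.
      remainingPath: "/" + parts.slice(i + 3).join("/")
    };
  }
  return null;
}
```

Because the scan returns at the first valid boundary, a nested URL yields only the outer hop and the inner `/sub/...` stays in `remainingPath`, which is how recursive nesting keeps working.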

**2. `useAgent` client dropped sub-agent path segments.**

The destructure at the top of `useAgent` extracted `query`,
`queryDeps`, `cacheTtl`, and `sub`, but NOT `path`. So the user's
raw `path` stayed in `restOptions`. The socket options then set
`path: combinedPath` (which includes `/sub/{child}/{name}/...` +
user path) and spread `...restOptions` afterwards — the later
`restOptions.path` overwrote `combinedPath`. Result: when a caller
passed both `sub` and `path`, every `/sub/...` segment was silently
dropped from the WebSocket URL.

Fix: destructure `path` into `userPath` so it's excluded from
`restOptions`. `buildSubPath` already composes sub-chain + user
path correctly at the `combinedPath` level.

Regression tests (in `useAgent.test.tsx`):
- `sub` + `path` → URL is
  `/agents/{parent}/{name}/sub/{child}/{name}/{user-path}`
- `sub` alone → URL ends at
  `/agents/{parent}/{name}/sub/{child}/{name}`
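
The overwrite comes down to object-spread ordering, which a small sketch makes concrete. `HookOptions` and both function names are hypothetical; only the destructure-and-spread pattern is taken from the description above.

```typescript
interface HookOptions {
  host: string;
  path?: string;
  sub?: ReadonlyArray<{ agent: string; name: string }>;
}

// Before the fix: `path` is not destructured, so it survives in restOptions
// and the later spread replaces combinedPath.
function buggySocketOptions(options: HookOptions, combinedPath: string) {
  const { sub: _sub, ...restOptions } = options;
  return { path: combinedPath, ...restOptions };
}

// After the fix: `path` is pulled out as userPath, so restOptions cannot clobber it.
function fixedSocketOptions(options: HookOptions, combinedPath: string) {
  const { sub: _sub, path: _userPath, ...restOptions } = options;
  return { path: combinedPath, ...restOptions };
}
```

Both versions agree when the caller passes no `path`, which is why the bug only showed up for `sub` combined with `path`.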

Made-with: Cursor

Two follow-on review items.

**1. Comprehensive JS-internal guards on getSubAgentByName Proxy.**

The Proxy only guarded `then` and `fetch`. The existing
`createStubProxy` in client.ts guards a much larger set (`toJSON`,
`catch`, `finally`, `valueOf`, `toString`, `constructor`,
`prototype`, `$$typeof`, `@@toStringTag`, `asymmetricMatch`,
`nodeType`, plus all symbol keys). Without those guards, routine JS
operations — `JSON.stringify(stub)`, `console.log(stub)`, Vitest
matcher duck-typing — fire bogus RPC calls that fail on the child
with "Method not found".

Extract the shared guard list into `utils.ts` as
`INTERNAL_JS_STUB_PROPS` + `isInternalJsStubProp()`. Use it in both
`createStubProxy` (replacing the inline list) and
`getSubAgentByName` (replacing the `then`-only check, keeping the
`fetch` error path).
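
A sketch of what the shared guard might look like. The two exported names and the property list come from this message; the bodies, the `makeGuardedStub` helper, and the exact `fetch` error text are assumptions made for illustration.

```typescript
// Properties that JS internals, serializers, and test matchers probe on objects.
const INTERNAL_JS_STUB_PROPS: ReadonlySet<string> = new Set([
  "then", "catch", "finally", "toJSON", "valueOf", "toString",
  "constructor", "prototype", "$$typeof", "@@toStringTag",
  "asymmetricMatch", "nodeType"
]);

// True for any symbol key or any listed string key.
function isInternalJsStubProp(prop: string | symbol): boolean {
  return typeof prop === "symbol" || INTERNAL_JS_STUB_PROPS.has(prop);
}

// Hypothetical stub factory: `dispatch` stands in for the RPC bridge.
function makeGuardedStub(
  dispatch: (method: string, args: unknown[]) => unknown
): Record<string, any> {
  return new Proxy({} as Record<string, any>, {
    get(_target, prop) {
      // Internal probes resolve to undefined and never reach the bridge.
      if (isInternalJsStubProp(prop)) return undefined;
      if (prop === "fetch") {
        return () => {
          throw new Error("fetch() is not supported here; use routeSubAgentRequest");
        };
      }
      return (...args: unknown[]) => dispatch(prop as string, args);
    }
  });
}
```

With the guard in place, `JSON.stringify(stub)` sees no `toJSON` and no own keys, so it produces `"{}"` without dispatching anything.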

**2. Reserved-name check uses kebab-case comparison, not literal "Sub".**

`className === "Sub"` only catches the titlecase spelling. But
`camelCaseToKebabCase` has an all-uppercase special branch: `"SUB"`
lowercases to `"sub"`. `"Sub_"` also kebab-cases to `"sub"`
(trailing-dash is stripped). All three spellings collide with the
`/sub/` URL separator and must be rejected uniformly.

Fix: compare against `camelCaseToKebabCase(className) === SUB_PREFIX`.
Error message quotes the offending class name verbatim and points at
the URL form.

Regression tests:
- Proxy returns `undefined` for each guarded property (`toJSON`,
  `then`, `catch`, `finally`, `valueOf`, `toString`, `constructor`,
  `prototype`, `$$typeof`, `@@toStringTag`, `asymmetricMatch`,
  `nodeType`); `JSON.stringify(stub) === "{}"`; real RPC methods
  still dispatch after internal probes.
- Rejects `SUB` at spawn (all-uppercase branch).
- Rejects `Sub_` at spawn (trailing-underscore stripped by kebab).

Made-with: Cursor

`routeSubAgentRequest({ fromPath })` constructed the forwarded URL
via `new URL(fromPath, req.url).toString()`. When `fromPath` is an
absolute path (which it typically is — callers extract it from
`url.pathname.match(...)`), `new URL(absolutePath, base)` discards
the base's search and hash. Every client query param — auth tokens,
feature flags, PartySocket's `_pk=...` handshake key — silently
vanished between the custom handler and the parent DO.

The default path (`fromPath` omitted) is unaffected: `req` is
forwarded as-is and the parent's fetch override parses the URL
intact. The bug was specific to the `fromPath` branch — an
inconsistency with the rest of the routing layer (`_cf_forwardToFacet`
mutates only `pathname` and preserves search).

Fix: extract a local `rewritePathname(url, fromPath)` that mutates
the original URL's `pathname` (and, if `fromPath` carries its own
`?query`, its `search`), matching `_cf_forwardToFacet`'s semantics.
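
A sketch of the difference, using only the standard `URL` API. The name `rewritePathname` comes from the description above; the body here is an assumption about its shape, not the shipped implementation, and a `#hash` inside `fromPath` is not handled.

```ts
// Mutate only the pathname of the original URL. If `fromPath` carries its
// own `?query`, that query replaces the original one.
function rewritePathname(originalUrl: string, fromPath: string): string {
  const url = new URL(originalUrl);
  const queryStart = fromPath.indexOf("?");
  if (queryStart === -1) {
    url.pathname = fromPath; // original search is left untouched
  } else {
    url.pathname = fromPath.slice(0, queryStart);
    url.search = fromPath.slice(queryStart);
  }
  return url.toString();
}

// For contrast, the old approach: resolving an absolute path against a base
// replaces the base's path and drops its query string.
function resolveAgainstBase(originalUrl: string, fromPath: string): string {
  return new URL(fromPath, originalUrl).toString();
}
```

Given `https://example.com/custom-sub/c1?token=xyz&flag=1`, `resolveAgainstBase` returns a URL with an empty search, while `rewritePathname` keeps `?token=xyz&flag=1`.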

Regression tests:
- `/custom-sub/...?token=xyz&flag=1` → parent observes both params
  in the URL surfaced to `onBeforeSubAgent`.
- `fromPath` with an explicit `?overridden=yes` wins over the
  original request's `?original=keep` (standard URL-rewrite
  semantics).

The `HookingSubAgentParent` test fixture grew a `last_url` table
plus a `lastObservedUrl()` RPC so tests can inspect what the hook
saw.

Made-with: Cursor
The `isolates per-child state across different child names` spike
test has a latent race that surfaces in batch runs:

    wsA.send("from-a");
    wsB.send("from-b");
    const [replyA] = await collectMessages(wsA, 1);  // attaches A here
    const [replyB] = await collectMessages(wsB, 1);  // attaches B here — too late

`collectMessages` installs its `addEventListener` inside the Promise
executor (synchronous at call time). The single-socket pattern works
by accident because both the send and the subsequent
`collectMessages(...)` call happen before the next event-loop tick,
so the listener is in place before workerd delivers the reply.

With two sockets, awaiting wsA's reply yields. During that yield,
the server can deliver wsB's reply — but wsB has no listener yet,
so the message is dropped. When `collectMessages(wsB, 1)` finally
attaches its listener, the reply is long gone, and the test hits
the 2000ms timeout with `replyB === undefined`.

Fix: construct both `collectMessages(...)` promises (which
synchronously register their listeners) BEFORE calling
`ws.send(...)`. This is the correct shape for any "send on N
sockets, observe replies" pattern — `addEventListener` is not
replay-safe.
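
The ordering problem can be modeled synchronously. `FakeSocket` and `collect` below are hypothetical stand-ins (not the test suite's `collectMessages` or a real WebSocket): replies queue up on `send` and are handed to whatever listeners exist when `deliver()` runs, so a reply delivered before a listener is attached is lost.

```ts
// Minimal stand-in for a socket whose listeners are not replay-safe.
class FakeSocket {
  private listeners: Array<(data: string) => void> = [];
  private pending: string[] = [];

  addMessageListener(fn: (data: string) => void): void {
    this.listeners.push(fn);
  }

  // The fake "server" echoes every message; the reply waits for deliver().
  send(data: string): void {
    this.pending.push(`echo:${data}`);
  }

  // Simulates the runtime delivering queued replies on a later tick.
  deliver(): void {
    for (const reply of this.pending.splice(0)) {
      // With no listeners attached, the reply is simply dropped.
      for (const fn of this.listeners) fn(reply);
    }
  }
}

// Registers its listener synchronously at call time.
function collect(ws: FakeSocket): string[] {
  const received: string[] = [];
  ws.addMessageListener((data) => received.push(data));
  return received;
}
```

Calling `collect` on every socket before any `send` guarantees each reply has a listener waiting, regardless of delivery order.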

Confirmed fix: 3x consecutive batch runs, 9/9 spike tests pass;
71/71 across the three sub-agent test files in one run.

Made-with: Cursor
The RFC claimed "The /sub/{class}/{name} segment is stripped before
the hook sees the request", but the implementation passes the
original un-stripped request through to `onBeforeSubAgent` — which
is actually the right call (it mirrors how partyserver's
`onBeforeConnect` / `onBeforeRequest` receive the un-stripped
request). The stripping happens downstream in `_cf_forwardToFacet`,
not before the hook.

A second (related) inaccuracy: the RFC said "If the hook returns a
modified Request, that's the URL the child sees". In reality, the
framework always overrides `pathname` to `match.remainingPath`
(computed at parse time) when forwarding to the facet — the hook's
headers, body, method, and query string do flow through, but
pathname is fixed. The route decision is frozen at parse time, so
URL-rewriting in the hook can't redirect to a different facet.

Fix the RFC text and add the same clarification to the JSDoc on
`onBeforeSubAgent` so users reaching for URL rewriting know to
customize via headers/body instead.
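
The "pathname is fixed" rule can be sketched as follows. `HookRequest` and `forwardToFacetModel` are hypothetical simplifications (plain objects instead of real `Request` instances) and do not reproduce `_cf_forwardToFacet`; they only model the contract described above.

```ts
// Simplified stand-in for the Request a hook may return.
interface HookRequest {
  url: string;
  headers: Record<string, string>;
}

// Whatever URL the hook returned, the pathname is overwritten with the
// `remainingPath` computed when the original URL was parsed. Headers and
// the query string flow through unchanged.
function forwardToFacetModel(
  hookResult: HookRequest,
  remainingPath: string
): HookRequest {
  const url = new URL(hookResult.url);
  url.pathname = remainingPath; // route decision was frozen at parse time
  return { url: url.toString(), headers: { ...hookResult.headers } };
}
```

A hook that rewrites the path to point at a different child therefore has no routing effect; a header it adds does reach the facet.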

No behavior change. The existing query-param test
(`preserves original request query params when fromPath is used`)
already pins the "hook sees the un-stripped URL" contract by
asserting that the observed pathname is
`/sub/counter-sub-agent/{name}/anything`.

Made-with: Cursor
Everything that landed in this PR is marked @experimental, and no
published API shape changes, so a patch-level bump is the accurate
semver choice.

Made-with: Cursor
@threepointone threepointone merged commit df2023f into main Apr 22, 2026
2 checks passed
@threepointone threepointone deleted the spike-subagent-routing branch April 22, 2026 04:40
@github-actions github-actions Bot mentioned this pull request Apr 21, 2026
threepointone added a commit that referenced this pull request Apr 22, 2026
Rebuilds the example on top of the sub-agent routing primitive that
landed in #1355. The original commit on this branch was written
before that primitive existed and used two top-level DO bindings
(`Inbox` + `Chat`) with direct namespace RPC between them. Now that
the routing primitive is merged, the example can — and should —
demonstrate it.

## Server (`src/server.ts`)

- `Chat` becomes a **facet** of `Inbox`. No top-level binding; no
  namespace lookup for the child. `Inbox.createChat` calls
  `this.subAgent(Chat, id)` to spawn the facet and register it in
  the parent's sub-agent registry. `deleteChat` calls
  `this.deleteSubAgent(Chat, id)`.
- `Inbox.onBeforeSubAgent` implements a strict-registry gate using
  `hasSubAgent`. A chat becomes reachable only after `createChat`
  has spawned it; unknown ids get a 404 before any facet is woken.
- `Chat` reaches its parent via `this.parentPath[0]` — the root-first
  ancestor chain the framework populates at facet-init time. No
  hardcoded user id inside the chat.
- Worker entry collapses to a one-line `routeAgentRequest` call:
  `/agents/inbox/{user}/sub/chat/{chatId}` is handled natively.
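
The strict-registry gate can be modeled without the framework. `InboxModel` and `GateResult` below are hypothetical: an in-memory set stands in for the parent's sub-agent registry, and a plain object stands in for the `Response` the real hook would return.

```ts
// Stand-in for a short-circuit Response from the hook.
interface GateResult {
  status: number;
  body: string;
}

class InboxModel {
  // Stand-in for the registry maintained by subAgent()/deleteSubAgent().
  private registry = new Set<string>();

  private key(className: string, name: string): string {
    return JSON.stringify([className, name]);
  }

  createChat(id: string): void {
    this.registry.add(this.key("Chat", id));
  }

  deleteChat(id: string): void {
    this.registry.delete(this.key("Chat", id));
  }

  hasSubAgent(className: string, name: string): boolean {
    return this.registry.has(this.key(className, name));
  }

  // Returning undefined lets the request through; returning a result
  // short-circuits before any child would be woken.
  onBeforeSubAgent(className: string, name: string): GateResult | undefined {
    if (!this.hasSubAgent(className, name)) {
      return { status: 404, body: `${className} "${name}" not found` };
    }
    return undefined;
  }
}
```

In this model a chat id is reachable only between `createChat` and `deleteChat`; any other id yields the 404.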

## Client (`src/client.tsx`)

- `ActiveChat` connects via
  `useAgent({ agent: "Inbox", name: DEMO_USER, sub: [{ agent: "Chat", name: chatId }] })`
  — the hook builds the nested `/sub/chat/{chatId}` URL; everything
  downstream (identity, state sync, `useAgentChat`) works unchanged.
  The sidebar connection stays as a plain `useAgent({ agent: "Inbox", ... })`.

## Config

- `wrangler.jsonc` drops the `Chat` top-level binding but keeps
  `Chat` in `new_sqlite_classes` so the runtime can still construct
  it as a facet.
- `env.d.ts` drops the `Chat: DurableObjectNamespace<...>` entry for
  the same reason.

## Docs

- README rewritten to describe the actual mechanics (URLs, hook
  gate, parentPath) rather than a forward-looking "Chats pattern
  sketch". Adds a link to the now-landed sub-agent routing RFC.
- Changeset updated to note the example exercises the routing
  primitive end-to-end.

The `Chats` base class from `rfc-think-multi-session.md` will
collapse `Inbox`'s chat bookkeeping (create / delete / list /
`onBeforeSubAgent` gate) into framework defaults. When that lands,
this example's `Inbox` becomes ~10 lines.

Made-with: Cursor
threepointone added a commit that referenced this pull request Apr 22, 2026
**User-visible bug**: In `examples/multi-ai-chat`, the assistant's
streaming reply didn't appear in the chat UI until the user
refreshed the page. The sidebar "last message preview" updated in
real time (it goes through `recordChatTurn` RPC to the parent
Inbox), but the streaming chunks never reached the browser over the
WebSocket. On refresh, `/get-messages` fetched the persisted turn
from the facet's SQLite and it showed up — so data was being
written; only live broadcast was silent.

**Root cause**: two guards in `Agent` — an early-return in
`_broadcastProtocol` and an override on `broadcast` itself — that
no-op'd whenever `_isFacet` was true. The comments explained the
concern:

> Facets share the parent DO's WebSocket registry: getConnections()
> returns parent-owned sockets, so iterating from a facet throws
> "Cannot perform I/O on behalf of a different Durable Object".
> Sub-agents are RPC-only and have no WS clients of their own.

That was accurate for the pre-routing world where facets existed
only as RPC targets reachable by the parent. Sub-agent routing
(#1355) changed the model: clients now connect directly to facets
via `/agents/{parent}/{name}/sub/{class}/{name}`, and those
WebSockets are upgraded on — and owned by — the facet's isolate.
`getConnections()` inside the facet returns the facet's own
sockets. The "cross-DO I/O" concern no longer applies.

The consequence was that every `this.broadcast(...)` call on a
facet silently did nothing. That includes:

- `AIChatAgent._broadcastChatMessage` — streaming chunks to the
  client during a chat turn. **This is the one that broke the
  demo.**
- `setState()` → `_broadcastProtocol` → `CF_AGENT_STATE` — state
  sync to connected clients from a facet.
- `broadcastMcpServers` — MCP server updates.
- Any user-defined broadcast from subclass code.

**Fix**: remove both guards. `this.broadcast(...)` and
`this._broadcastProtocol(...)` now iterate the facet's own
connections — same behavior as a top-level DO.

Regression test (spike suite): a facet is connected to directly,
then invokes `this.broadcast(...)` from `onMessage`. The client
receives the broadcast. Before this fix the broadcast was
silently dropped; now it round-trips.

Other `_isFacet` guards are unchanged:
- `schedule()` / `cancelSchedule()` / `keepAlive()` still special-case
  facets — workerd doesn't support alarms on SQLite-backed facets
  today. The previous commit documents `keepAlive`'s soft-no-op
  semantics.
- `destroy()`'s `deleteAlarm` skip for facets stays (facets never
  set alarms, so there's nothing to clear).

Fixes the "chat UI doesn't update until refresh" symptom in
`examples/multi-ai-chat`.

Made-with: Cursor
threepointone added a commit that referenced this pull request Apr 22, 2026
…uting primitive (#1353)

* ai-chat: align with think + maintenance RFC + multi-ai-chat example

Mechanical alignments between `@cloudflare/ai-chat` and `@cloudflare/think`,
paired with a stance RFC and a reference example for multi-session chat.

## Code changes

- `AIChatAgent` gains a `Props` generic to match the Think change we
  just shipped: `AIChatAgent<Env, State, Props>` extending
  `Agent<Env, State, Props>`. `this.ctx.props` is typed now.
- `ChatResponseResult`, `ChatRecoveryContext`, `ChatRecoveryOptions`,
  `SaveMessagesResult`, and `MessageConcurrency` move into
  `agents/chat/lifecycle.ts`. Both `@cloudflare/ai-chat` and
  `@cloudflare/think` import from `agents/chat` and re-export. No
  behavior change; one place to edit when the shapes evolve.
- `AIChatAgent` drops the `UIMessage as ChatMessage` import alias and
  uses `UIMessage` everywhere. The `ChatMessage` type is no longer
  exported from `@cloudflare/ai-chat`. Internal `message-reconciler`
  also drops its local alias.
- `AIChatAgent.messages` becomes a getter over a protected
  `_messages` backing field. Prevents `this.messages = [...]`
  reassignment from subclasses. The returned array type stays mutable
  for AI SDK compat (`convertToModelMessages(this.messages)` works
  unchanged); signatures on the `reconcileMessages` helpers and the
  `OutgoingMessage` wire type accept `readonly UIMessage[]` where
  they only read.

## Docs

- `design/rfc-ai-chat-maintenance.md` captures the stance:
  `AIChatAgent` stays first-class and fully supported while `Think`
  stabilizes. New features land in `agents/chat` where both benefit.
  Deferred structural work (hoisting protocol handling, promoting
  `agents/chat` to a public toolkit, `onChatMessage` signature
  revision) is listed with rationale.

## Example

- `examples/multi-ai-chat/` — a hand-rolled preview of the `Chats`
  pattern from `rfc-think-multi-session.md`, using `AIChatAgent`
  children. An `Inbox` parent DO owns the chat list + per-user
  shared memory; per-chat `AIChatAgent` DOs run in parallel. Client
  wires up via `useAgent` + `useAgentChat` directly, so when the
  `Chats` base class lands, the migration is ~10 lines.

Made-with: Cursor

* example: evolve multi-ai-chat onto the sub-agent routing primitive

* agents: make keepAlive()/identity-warning facet-safe

Two regressions surfaced by running the multi-ai-chat example:

**1. `keepAlive()` threw inside a facet, breaking streaming chats.**

`AIChatAgent._reply` wraps the streaming turn in `keepAliveWhile(...)`
to guarantee the DO finishes committing the final message even if
the client disconnects mid-stream. That path crashed every turn
inside a Chat facet with:

    Error: keepAlive() is not supported in sub-agents.

The original guard assumed "facets delegate lifecycle to the parent"
but that left a real hole: a facet's `_reply` can't just give up
keepalive bookkeeping because the parent doesn't know about it.

workerd doesn't support independent alarms on facets yet (attempting
to set one fails with "alarms are not yet implemented for SQLite-backed
Durable Objects"), so the fix can't be "add an alarm on the facet". Instead,
make `keepAlive()` a **soft no-op** in facets: return an inert
disposer, don't throw. Facets piggyback on the parent isolate —
active Promise chains, WebSockets, and the parent's own alarm all
keep the shared isolate alive; the defensive keepalive is redundant
in that context. Documented in the JSDoc with a pointer at
"call `keepAlive()` on the parent via RPC if you really need it".

**2. `sendIdentityOnConnect` mis-warned for facet instances.**

The warning fires when the instance name isn't visible in the
URL — but it checks the request URL the DO itself sees, which for
a facet has been rewritten by `_cf_forwardToFacet` to strip
`/sub/{class}/{name}`. The CLIENT always put the name in the URL
(that's literally how sub-agent routing works). Suppress the
warning for facets; the concern doesn't apply.

Tests:
- `keepAlive() works inside a sub-agent` (no throw, returns a
  working disposer)
- `keepAliveWhile() runs to completion inside a sub-agent` — same
  call shape as AIChatAgent._reply, pins the multi-ai-chat
  regression
- The old "keepAlive throws in facets" assertion is flipped to
  assert it succeeds.
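
The soft no-op described in item 1 can be sketched as below. `AgentModel` is a hypothetical stand-in: the reference counter replaces whatever bookkeeping the real `keepAlive()` performs for top-level agents, which this commit message does not describe.

```ts
class AgentModel {
  private isFacet: boolean;
  private refs = 0; // stand-in for real keepalive bookkeeping

  constructor(isFacet: boolean) {
    this.isFacet = isFacet;
  }

  get activeKeepAlives(): number {
    return this.refs;
  }

  keepAlive(): () => void {
    // Facets: return an inert disposer instead of throwing.
    if (this.isFacet) return () => {};
    this.refs += 1;
    let disposed = false;
    return () => {
      if (disposed) return; // disposing twice is harmless
      disposed = true;
      this.refs -= 1;
    };
  }

  // Same call shape for facets and top-level agents.
  async keepAliveWhile<T>(fn: () => Promise<T>): Promise<T> {
    const dispose = this.keepAlive();
    try {
      return await fn();
    } finally {
      dispose();
    }
  }
}
```

Callers such as a streaming reply wrapper need no facet-specific branch: the disposer is always callable, it just does nothing in a facet.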

Made-with: Cursor

* agents: let facets broadcast to their own WebSocket clients

* example(multi-ai-chat): render reasoning + tool parts, add shared-memory tools

Three small tools on the `Chat` agent to make the demo actually
agentic:

- `rememberFact(fact)` — persists a fact to the parent Inbox's
  shared memory (`inbox.setSharedMemory`). Every sibling chat
  picks up the fact on the next turn. Demonstrates cross-DO RPC
  from inside a tool `execute` that runs in a facet.
- `recallMemory()` — reads the current shared memory.
- `getCurrentTime()` — returns the server's ISO time. Included
  mostly to give the model a tool to pick when the user just
  wants small talk about the clock.

The model now runs in a multi-step agentic loop (`stopWhen:
stepCountIs(5)`) so it can call a tool, observe the output, and
respond in the same turn.

Client rendering overhaul:

- Drop the "join all text parts into one string" renderer.
- Render `UIMessage.parts` in order: text → bubble, `reasoning` →
  dimmed "Thinking" block, tool parts → panel with state badge
  (Running/Done/Error), input JSON, output JSON, and errorText.
- Streaming cursor only appears on the trailing text part of the
  last assistant message.
- Ignore `step-start`, `source-*`, `file`; `examples/ai-chat` has a
  fuller treatment if needed.

README points people at things to try:
_"Remember I prefer TypeScript"_ exercises `rememberFact`, and
_"What time is it?"_ exercises `getCurrentTime`. Saving memory
via the sidebar still works for the no-tool-call case.

Made-with: Cursor

* agents: parentAgent() helper + multi-ai-chat review polish

Six small follow-ups from a self-review pass on the PR. All tests
pass (1325/1325 in agents); all 75 projects typecheck.

**1. `parentAgent<T>(namespace)` on the Agent base class.**

Every facet-based app was about to hand-roll a `getParent()` helper
that reads `this.parentPath[0]` and opens a stub via `getAgentByName`.
Codify it on the base class — pass the parent's namespace binding,
get back a typed `DurableObjectStub<T>` with the right instance
name resolved for you:

    class Chat extends AIChatAgent<Env> {
      private getInbox() {
        return this.parentAgent(this.env.Inbox);
      }
    }

Throws a clear error when called on a top-level (non-facet) agent.

Tests: `resolves the parent stub from within a facet`, `throws a
clear error when called on a non-facet`.

**2. `examples/multi-ai-chat`: `listSubAgents(Chat)` as the source of truth.**

Previously the example maintained a parallel `inbox_chats` table
alongside `cf_agents_sub_agents` — both tracked "this chat exists",
and a crash between the two writes could leave them out of sync.

Now: the sub-agent registry is authoritative for existence, and a
thin `chat_meta` table holds app-owned decoration (title, preview,
updated_at). `_refreshState` joins `listSubAgents(Chat)` against
`chat_meta` to build the sidebar. A chat with a missing meta row
just gets a default title.
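
A sketch of that join, assuming `listSubAgents(Chat)` yields entries with at least a `name`. The `SubAgentEntry` shape, the `ChatMeta` fields, the `"New chat"` default, and the sort order are all assumptions for illustration, not the example's actual code.

```ts
interface SubAgentEntry {
  name: string; // assumed shape of a registry entry
}

interface ChatMeta {
  title: string;
  preview: string;
  updatedAt: number;
}

interface SidebarItem extends ChatMeta {
  id: string;
}

const DEFAULT_TITLE = "New chat"; // hypothetical default

// The registry decides which chats exist; meta only decorates them.
function buildSidebar(
  registry: SubAgentEntry[],
  meta: Map<string, ChatMeta>
): SidebarItem[] {
  return registry
    .map(({ name }) => {
      const row = meta.get(name);
      return {
        id: name,
        title: row?.title ?? DEFAULT_TITLE,
        preview: row?.preview ?? "",
        updatedAt: row?.updatedAt ?? 0
      };
    })
    .sort((a, b) => b.updatedAt - a.updatedAt);
}
```

Iterating the registry rather than the meta table means an orphaned meta row never produces a phantom chat, and a missing meta row never hides a real one.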

**3. Drop the redundant `className !== "Chat"` check in `onBeforeSubAgent`.**

`Agent.fetch` filters URLs via `knownClasses: Object.keys(ctx.exports)`
before the hook runs, so by the time `onBeforeSubAgent` fires the
class is guaranteed to be in exports. The subsequent `hasSubAgent`
check acts as the real gate.

**4. `Chat.getInbox()` now delegates to `this.parentAgent(...)`.**

Two hardcoded ancestor-shape assertions collapse into the framework
helper.

**5. Client `AnyToolPart` type cleanup.**

Drop the hand-rolled intersection type. `ToolPart` now takes
`Parameters<typeof getToolName>[0]` — the same narrowed union
`isToolUIPart` returns — and reads optional fields via `"x" in part`
checks instead of re-widening. Type-safe with no casts.

**6. Trim `keepAlive()` docstring.**

The previous text pointed users at `getAgentByName(parent).keepAlive()`
as an escape hatch. In practice nobody needs it — the soft no-op is
sufficient because facets share the parent's isolate and the active
Promise chain plus open WebSockets already keep the machine alive
for the duration of real work.

Made-with: Cursor

* agents: parentAgent(Cls) — class-ref API with runtime safety

The `parentAgent(namespace)` signature from the previous commit had
a silent-corruption footgun: passing the wrong binding resolved a
stub for a different DO against the recorded parent name. If the
target class happened to share method names with the recorded
parent, calls would succeed silently against the wrong data.

Change the API to take a class reference (symmetric with
`subAgent(Cls, name)` on the parent side), plus two runtime
guards:

1. `cls.name === parentPath[0].className` — catches the wrong-class
   mistake directly. Error names both the passed and the recorded
   class so the diagnostic is actionable.
2. `env[cls.name]` exists — catches the "binding name ≠ class name"
   case with a suggestion to use `getAgentByName(env.X, this.parentPath[0].name)`
   directly.

Usage collapses from

    await this.parentAgent(this.env.Inbox as DurableObjectNamespace<Inbox>)

to

    await this.parentAgent(Inbox)

Symmetric with `this.subAgent(Chat, id)`.
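
The two runtime guards can be sketched in isolation. `resolveParentBinding`, `PathEntry`, and the error wording below are hypothetical; the sketch takes the recorded parent entry as a parameter and does not model how the framework selects it or opens the stub.

```ts
interface PathEntry {
  className: string;
  name: string;
}

// Any class constructor satisfies this: functions carry a `name`.
type AgentClass = { name: string };

function resolveParentBinding<B>(
  cls: AgentClass,
  recorded: PathEntry,
  env: Record<string, B | undefined>
): B {
  // Guard 1: the passed class must match the recorded parent class.
  if (cls.name !== recorded.className) {
    throw new Error(
      `parentAgent(${cls.name}): recorded parent class is ${recorded.className}`
    );
  }
  // Guard 2: a binding named after the class must exist on env.
  const binding = env[cls.name];
  if (binding === undefined) {
    throw new Error(
      `No binding named "${cls.name}"; use getAgentByName(env.X, "${recorded.name}") directly`
    );
  }
  return binding;
}
```

Both failures surface as thrown errors that name the classes involved, rather than as a stub that silently points at a different Durable Object.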

JSDoc now also documents how to reach grandparents (iterate
`this.parentPath`; there's no framework helper for further
ancestors — the one-hop case is 95% of usage).

Example `multi-ai-chat`:
- `Chat.getInbox()` uses the new form: `this.parentAgent(Inbox)`.
- `Inbox.onBeforeSubAgent` now returns a class-agnostic
  `"${className} "${name}" not found"` body (previously said
  "Chat not found" for anything, stale after we dropped the
  className-equality guard).

Tests:
- Existing `resolves the parent stub from within a facet` test now
  exercises the class-ref form (casts dropped).
- New `throws when the passed class doesn't match the recorded
  parent class` test verifies the class-mismatch guard. Asserts
  both class names appear in the error body.

Made-with: Cursor

* ai-chat: restore ChatMessage/messages compat + align docs/RFCs

Implements the review decisions directly:

1. **`messages` stays a public field.**
   Revert the getter + `_messages` backing field experiment in
   `AIChatAgent`. The compatibility cost was real, the benefit was
   thin, and existing subclasses may legitimately assign
   `this.messages = [...]` or mutate it directly. Internals now write
   `this.messages` again.

2. **`ChatMessage` stays exported.**
   Internally the codebase still standardizes on `UIMessage`, but the
   package now keeps `export type ChatMessage = UIMessage` so existing
   user imports from `@cloudflare/ai-chat` do not break.

3. **Docs / README / changeset sweep.**
   - `packages/ai-chat/README.md`
     - API header updated to `AIChatAgent<Env, State, Props>`
     - `messages` described as public + mutable for compatibility
     - exports table includes `ChatMessage`
   - `docs/chat-agents.md`
     - `ChatRecoveryContext.messages` → `UIMessage[]`
     - stale `this.messages = []` example → `await this.saveMessages([])`
   - top-level `README.md`
     - adds Sub-agents feature row
     - includes `examples/multi-ai-chat` in the examples tour
   - `packages/agents/README.md`
     - adds a new Sub-agents section (`subAgent`, `onBeforeSubAgent`,
       `useAgent({ sub })`, `parentAgent`)
   - `packages/agents/AGENTS.md`
     - refreshes the source layout (`sub-routing.ts`, `chat/`)
     - adds `agents/chat` export, but explicitly frames it as a
       sibling-package support layer rather than a broad user-facing
       surface
     - updates the stale `src/index.ts` line count and test-suite list
   - `design/AGENTS.md`
     - adds missing entries for `rfc-think-multi-session.md` and the
       AIChatAgent stance RFC
   - `.changeset/ai-chat-cleanups.md`
     - reflects the actual compatibility decisions (`ChatMessage`
       kept, `messages` stays mutable, `parentAgent(Inbox)` in the
       example)

4. **Rewrite the AIChatAgent RFC around the real stance.**
   `design/rfc-ai-chat-maintenance.md` is now:
   - retitled to remove "maintenance"
   - marked `Status: accepted`
   - explicit that `AIChatAgent` is first-class, production-ready,
     and continuing to get features
   - corrected to say `messages` stays mutable and `ChatMessage`
     stays exported
   - reframed `agents/chat` as primarily a sibling-package shared
     toolkit today (published, versioned, but not yet over-marketed)

5. **Update RFCs for shipped reality.**
   - `rfc-think-multi-session.md` now reflects the shipped
     `parentAgent(Cls)` helper instead of the old generic / manual
     `parentPath` lookup text.
   - `rfc-sub-agent-routing.md` now reflects `className`,
     `parentAgent(Cls)`, current `listSubAgents` return shape, and the
     post-launch facet semantics (facet broadcasts, keepAlive no-op).

Checks:
- `npm run check` — all 75 projects typecheck successfully
- `packages/ai-chat` workers tests — 414/414 passing
- `packages/agents` workers tests — 1005/1005 passing (7 skipped)

Note: full workspace browser projects still require Playwright browsers
installed locally; they were not runnable in this environment.

Made-with: Cursor

* docs(ai-chat): use ChatMessage as the public type language

Normalizes the user-facing wording around AIChatAgent:

- `ChatMessage` is the public message type name in docs / README /
  changeset / RFCs
- `messages` is described simply as the public field users already
  know, without compatibility framing
- removes the lingering  /
  language from user-facing AIChat docs

This matches the actual public stance: we never shipped a breaking
change to , and we don't need to narrate the public API as
an apology for a change that never landed.

Also updates the chat API design doc so the analysis uses the same
public terminology () instead of oscillating between
 and .

Made-with: Cursor

* agents: fix parentAgent root-vs-direct + example polish

Four independent review fixes:

1. parentAgent root-vs-direct-parent bug (real, silent-corruption
   footgun). parentPath is root-first, so the direct parent is the
   LAST entry, not the first. The previous implementation did
   `const [parent] = this._parentPath` which destructures the first
   element — fine for one-level chains (Root -> Chat), but for any
   deeper chain (Root -> Outer -> Inner) `parentPath[0]` is the root
   and not the spawning parent. `parentAgent(Outer)` from Inner
   would then either throw a confusingly wrong class-match error,
   or — if the caller passed `Root` to silence the error — quietly
   resolve a stub to the wrong DO.

   Fix: use `this._parentPath[this._parentPath.length - 1]`. Update
   the JSDoc and the diagnostic error messages to reference
   `parentPath.at(-1)`. Regression test added: a doubly-nested
   Inner facet calling `parentAgent(TestSubAgentParent)` must throw
   with the real direct parent `OuterSubAgent` named in the error.

2. `_cf_initAsFacet` JSDoc claimed setting `_isFacet` early was
   needed so broadcasts would be suppressed during the first
   `onStart()`. That guard was removed in `e5827d54` ("let facets
   broadcast to their own WebSocket clients"). The note has been
   rewritten to reflect the actual remaining reason (schedule
   guards still branch on `_isFacet`, not broadcasts).

3. Example violated the "no `dark:` Tailwind variants" rule in
   `examples/AGENTS.md`. Replaced `bg-red-50 dark:bg-red-950/20` /
   `text-red-600 dark:text-red-400` with the Kumo semantic tokens
   (`bg-kumo-danger-tint`, `text-kumo-danger`).

4. Example was missing the required `public/favicon.ico`. Copied
   from `examples/assistant/public/favicon.ico`.

Also updated the server header comment in `examples/multi-ai-chat/src/server.ts`
and the `rfc-sub-agent-routing.md` note about "the last entry of
parentPath" so the public docs match the implementation.
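
The root-first ordering behind fix 1 can be illustrated with a short sketch. `AncestorEntry` and `directParent` are hypothetical names; the class names in the usage note mirror the regression test described above.

```ts
interface AncestorEntry {
  className: string;
  name: string;
}

// parentPath is root-first, so the spawning parent is the LAST entry.
function directParent(parentPath: AncestorEntry[]): AncestorEntry {
  const last = parentPath[parentPath.length - 1];
  if (last === undefined) {
    throw new Error("parentAgent() called on a top-level (non-facet) agent");
  }
  return last;
}
```

For a chain `TestSubAgentParent -> OuterSubAgent -> Inner`, the path seen from `Inner` has the root at index 0 and `OuterSubAgent` last. For a one-level chain both positions coincide, which is why the original `[0]` lookup went unnoticed.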

Made-with: Cursor

* docs: add sub-agents reference + correct long-running guide

This fills the biggest documentation gap around sub-agents / facets:
there was no single user-facing page that explained the shipped
primitive end-to-end. Users had to piece it together from the routing
RFC, the Think-specific `chat()` docs, the long-running-agents guide,
and the multi-ai-chat example.

## New: `docs/sub-agents.md`

A dedicated user-facing reference page covering the primitive as it
works today:

- what a sub-agent / facet is
- when to use it vs a top-level DO
- `subAgent`, `deleteSubAgent`, `abortSubAgent`
- `onBeforeSubAgent`
- `hasSubAgent`, `listSubAgents`
- `parentPath`, `selfPath`, `parentAgent(Cls)`
- `useAgent({ sub: [...] })`
- `routeSubAgentRequest`, `getSubAgentByName`
- lifecycle / routing flow
- current limitations (no independent alarms on facets **yet**)
- link to the multi-ai-chat example

## Fix: `docs/long-running-agents.md`

The existing "Delegating to sub-agents" section said:

> Sub-agents are independent Durable Objects. They have their own
> state, their own schedules, and their own lifecycle.

That is not true today. Facets have their own state and lifecycle,
but *not* their own alarms. `schedule()` / `scheduleEvery()` are
unsupported on facets at the moment. The text now says so explicitly,
notes that support is coming soon, and points readers at the new
sub-agents page for the full routing / client / parent-lookup story.

## Navigation

- `docs/index.md` now links to `./sub-agents.md` under Core Concepts.
- `docs/think/sub-agents.md` now makes its scope explicit: it covers
  Think's `chat()` RPC method and programmatic turns, while the
  generic framework primitive lives in `../sub-agents.md`.

## Design docs

- Add `design/sub-agent-routing.md` as the living design doc for the
  shipped primitive (the RFC remains the historical decision record).
- Register it in `design/AGENTS.md` and `design/README.md`.
- Fix one confusing example in `design/rfc-sub-agent-routing.md`
  where the array order in the `parentPath` example contradicted the
  comment (`root -> direct parent`).

Made-with: Cursor

* Add favicon to multi-ai-chat example

Insert a <link rel="icon" href="/favicon.ico" /> tag into the head of examples/multi-ai-chat/index.html so the page displays the site favicon and improves UX.