refactor(phase8 #73 D8.1): backend AI SDK v5 stream emitter #1695

Merged
earayu merged 1 commit into main from phase8/d81-stream-emitter
Apr 25, 2026

@earayu earayu commented Apr 25, 2026

Phase 8 task #73 — D8.1 backend AI SDK v5-compatible stream emitter

Closes task #73 (msg=6a5a8459) per PM scope-lock msg=82ba98fc + architect contract first-cut LGTM msg=27ca7cee.

Scope (PM-locked)

Backend wire emission only. Replace the legacy AgentTimelineEventEnvelope SSE wire with AI SDK v5 UI Message Stream Protocol parts (per docs/modularization/agent-message-protocol-design.md).

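NOT touched (handed to other lanes): DB / Redis storage (#74), tool consent / elicitation / SafeToolName plumbing (#75), FE consumer (#76), agent reasoning loop.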

Write set (5 files, +1571/-43)

Backend

  • aperag/domains/agent_runtime/wire/__init__.py (NEW) — public surface (StreamPart, StreamPartAdapter, TranslatorState, translate_envelope, dump_part, parse_part)
  • aperag/domains/agent_runtime/wire/parts.py (NEW, 411 LOC) — Pydantic models for AI SDK v5 stream parts
  • aperag/domains/agent_runtime/wire/translator.py (NEW, 420 LOC) — pure translate_envelope(envelope, state, *, safe_tool_name_resolver=None) -> list[StreamPart] + per-turn TranslatorState
  • aperag/domains/agent_runtime/api/routes.py — SSE route updated:
    • Header now includes x-vercel-ai-ui-message-stream: v1
    • _format_part_frame replaces legacy _format_sse (no event: field, single-line JSON data:)
    • Sequence semantics: Envelope-atomic replay (see §Sequence Convention below)
    • Generator wrapped in try/except → emits synthetic error part on uncaught exception

Tests

  • tests/unit_test/test_agent_runtime_wire_parts.py (NEW, 570 LOC) — 14 tests
  • 2 existing agent_runtime tests updated to v5 shape

Sequence Convention (canonical lock per architect msg=7b2169c4 + clarification msg=0b8516b6)

Implementation choice: Option 2 — Envelope-atomic replay (per architect's enumeration).

Mechanics:

  • Each AgentTimelineEvent.sequence (DB-backed) maps 1:1 to one envelope, which the translator may fan out into N stream parts.
  • On the SSE wire, only the LAST part of a fan-out group carries the id: {sequence} line; intermediate parts are emitted with data: only (no id:).
  • HTML5 SSE spec: client Last-Event-ID only advances when a frame with id: is received. So if a client disconnects mid-fan-out (before the closing id: of the current envelope), its last advanced cursor remains at the previous envelope's sequence.
  • On reconnect (Last-Event-ID: <prev>), server replays from sequence > prev, which means the entire current envelope is replayed from scratch — including parts the client may have already received.
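
The mechanics above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in routes.py: `format_part_frame`, `frames_for_envelope` and `envelopes_to_replay` are hypothetical names, and parts are plain dicts rather than the Pydantic `StreamPart` models.

```python
import json
from typing import Iterator, Optional


def format_part_frame(part: dict, sequence: Optional[int] = None) -> str:
    """Render one part as an SSE frame: optional id: line, single-line JSON data:."""
    lines = []
    if sequence is not None:
        lines.append(f"id: {sequence}")
    lines.append("data: " + json.dumps(part, separators=(",", ":")))
    return "\n".join(lines) + "\n\n"


def frames_for_envelope(sequence: int, parts: list) -> Iterator[str]:
    """Only the LAST part of the fan-out carries the envelope sequence as id:."""
    last = len(parts) - 1
    for index, part in enumerate(parts):
        yield format_part_frame(part, sequence if index == last else None)


def envelopes_to_replay(envelopes: list, last_event_id: int) -> list:
    """Envelope-atomic resume: keep every (sequence, parts) pair with sequence > cursor."""
    return [(seq, parts) for seq, parts in envelopes if seq > last_event_id]
```

A client that drops after the first frame of envelope 7 still holds cursor 6, so `envelopes_to_replay(..., 6)` hands back envelope 7 in full.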

Client expectations (relevant to #76 D8.4a FE):

  • Must dedup by stable part identifiers: toolCallId (tool parts), artifact_id (citations / source-urls), text-block id (text-start/-delta/-end). On replay, an already-received part is identified by the same stable id and dropped.
  • AI SDK v5 useChat / message-store layers handle this naturally — their reducers are id-keyed.

Trade-off rationale:

  • Pro: zero DB / Redis schema change, full alignment with existing sequence semantics, #74 at-rest storage unchanged.
  • Con: mid-fan-out disconnect causes client to receive duplicates of earlier parts in the in-flight envelope; client-side dedup expected.

Errata for first-cut msg=df929617 §B.4

The first-cut said "keep sequence monotonically increasing during fan-out (one sequence per new part)"; that wording would imply Option 3 (per-part new sequence). The actual implementation is Option 2 above; the PR description is the authoritative canonical reference.

Contract first-cut for #74 / #75 unblock (per PM msg=387fd639)

Wire format

HTTP/1.1 200 OK
Content-Type: text/event-stream
x-vercel-ai-ui-message-stream: v1

[ intermediate parts of fan-out: ]
data: {"type":"<part-type>", ...}

[ last part of fan-out: ]
id: <sequence>
data: {"type":"<part-type>", ...}

: heartbeat
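
To make the framing concrete, here is a small consumer-side sketch of how such a stream could be read. It is a simplified reading of the SSE rules (it does not handle `\r\n` line endings or the `retry:` field), and `consume_sse` is a hypothetical helper, not part of this PR or of the AI SDK.

```python
import json


def consume_sse(raw: str, last_event_id: str = ""):
    """Parse an SSE text stream and return (parts, last_event_id).

    Frames are separated by a blank line. Lines starting with ':' are
    comments (the heartbeat) and are ignored. The cursor only advances
    when a frame carries an id: line.
    """
    parts = []
    for frame in raw.split("\n\n"):
        frame_id = None
        data_lines = []
        for line in frame.split("\n"):
            if not line or line.startswith(":"):
                continue
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # one optional space after the colon
            if field == "id":
                frame_id = value
            elif field == "data":
                data_lines.append(value)
        if data_lines:
            parts.append(json.loads("\n".join(data_lines)))
        if frame_id is not None:
            last_event_id = frame_id
    return parts, last_event_id
```

Feeding it an intermediate frame alone leaves the cursor where it was, which is exactly why a mid-fan-out disconnect replays the whole envelope.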

Resume / error / abort

  • Resume: Last-Event-ID: <last-seen-sequence> header → replay from sequence > last-seen (envelope-atomic).
  • Error: turn-level uncaught exception → synthetic error {errorText} + finish part emitted before stream closes.
  • Abort: user cancel / lease loss → abort + finish part emitted before stream closes.
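
The error path can be pictured with a small wrapper like the one below. It is a synchronous sketch for brevity (the real route is presumably an async generator), and `guarded_frames` / `_data_frame` are hypothetical names; only the `error {errorText}` + `finish` ordering is taken from the contract above.

```python
import json
from typing import Iterable, Iterator


def _data_frame(part: dict) -> str:
    """Frame without an id: line, so the resume cursor is not advanced."""
    return "data: " + json.dumps(part, separators=(",", ":")) + "\n\n"


def guarded_frames(frames: Iterable[str]) -> Iterator[str]:
    """Pass frames through; on an uncaught exception emit error + finish, then re-raise."""
    try:
        yield from frames
    except Exception as exc:
        yield _data_frame({"type": "error", "errorText": str(exc)})
        yield _data_frame({"type": "finish"})
        raise
```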

Hand-off seams (for #74 / #75)

Acceptance gates

  • pytest test_agent_runtime_wire_parts + agent_runtime/ + test_v1_ghost_guard + test_modularization_boundaries + test_openapi_spec -q: 55 passed
  • ✅ Pre-commit make lint + make add-license clean
  • ✅ Rebased on latest main 128409ba (#66 G5b-impl included)

Caveats / known gaps (out of #73 scope, flagged for downstream)

  1. reference_bundle items not yet inlined in envelope data: today runtime.py emits data={artifact_id, artifact_type} only. Translator looks for data['items'] first then data['payload']['items'], defaulting to empty list. Without runtime inlining items into the envelope (or storage materializing them when SSE reads), no citations will surface to FE. Out of #73 scope; flag for #74 / runtime inlining hook decision before #76 D8.4a integration.
  2. text-end not emitted on mid-stream turn.failed: failure paths emit error + finish but don't close any open text-start block. FE should treat error/finish as implicit close.
  3. SafeToolName + metadata population: deferred to #75 D8.3 via the safe_tool_name_resolver hook. Translator currently emits raw envelope tool_name with empty metadata={}.
  4. data-tool-consent / data-elicitation flow: part-type literals reserved in parts.py; flow logic is #75's lane.
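
For caveat 1, the lookup order can be sketched as below. `reference_items` is a hypothetical name, and treating a present-but-empty `data['items']` as final (no fall-through to `payload`) is an assumption of this sketch, not something the translator is confirmed to do.

```python
from typing import Any


def reference_items(data: dict) -> list:
    """Look for data['items'] first, then data['payload']['items'], else []."""
    items: Any = data.get("items")
    if items is None:
        payload = data.get("payload")
        if isinstance(payload, dict):
            items = payload.get("items")
    return list(items) if items else []
```

With today's envelope shape (`artifact_id` and `artifact_type` only) this returns an empty list, which is why no citations reach the FE until items are inlined.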

Ghost-check

none. No new backend coupling, no DB schema change, no FE touch. The wire format change is the explicit hard-cut per Phase 8 philosophy (earayu2 msg=78fdb6fc) — FE consumers updated in #76 D8.4a (standby).

🤖 Generated with Claude Code

Land the wire-emission half of D8.1 — the agent-runtime SSE endpoint
now emits AI SDK v5 ``UI Message Stream Protocol`` part frames in
place of the legacy ``AgentTimelineEventEnvelope`` JSON, advertising
itself via the ``x-vercel-ai-ui-message-stream: v1`` response header
that the FE ``@ai-sdk/react`` consumer (#76) keys on.

New ``aperag/domains/agent_runtime/wire/`` sub-package:
* ``parts.py`` — Pydantic models for every v5 part type the runtime
  emits + ``data-citation`` (Anthropic-shape) / ``data-activity``
  ApeRAG extensions + placeholder ``data-tool-consent`` /
  ``data-elicitation`` literals reserved for #75 chenyexuan; exposed
  as a discriminated ``StreamPart`` union with a ``TypeAdapter`` for
  round-trip parsing.
* ``translator.py`` — pure ``translate_envelope(envelope, state)``
  function mapping each timeline envelope to one-or-more parts per
  the D8.1 mapping table; per-turn ``TranslatorState`` carries
  text-block lifecycle bookkeeping; ``safe_tool_name_resolver`` hook
  reserved for #75 (raw tool name + empty metadata until then).

SSE route (``api/routes.py``) updated:
* New ``_format_part_frame`` writes ``id: <seq>\ndata: <json>\n\n``
  AI SDK v5 frames; only the LAST part of an envelope fan-out gets
  the SSE ``id:``, so a ``Last-Event-ID`` resume restarts at the first
  envelope not yet fully delivered (translator docstring documents
  the invariant).
* ``stream_turn_events_view`` now wraps each envelope through the
  translator and yields one frame per part. Heartbeat switched to
  the SSE-comment form (``: heartbeat\n\n``) which is invisible to
  the v5 consumer. Generator wrapped in try/except that emits a
  synthetic ``error`` part on uncaught exceptions before re-raising.

Out of scope (per PM lock msg=82ba98fc): DB / Redis storage (#74),
tool consent / elicitation / SafeToolName plumbing (#75), FE consumer
(#76), agent reasoning loop. The translator is read-only over
envelopes; storage shape is unchanged.

Tests:
* ``tests/unit_test/test_agent_runtime_wire_parts.py`` — 14 contract
  tests covering every envelope→part mapping, JSON round-trip across
  the union, ``safe_tool_name_resolver`` plug-in seam, SSE response
  headers (v5 marker + Content-Type), and ``Last-Event-ID`` resume
  semantics.
* Updated ``test_agent_runtime_v3.py`` and
  ``test_agent_runtime_openapi_contract.py`` to assert on the new
  AI SDK v5 wire shape (hard-cut per Phase 8 msg=78fdb6fc — no dual
  emission, no envelope-format fallback).

Acceptance gates green: wire-parts suite + modularization_boundaries
+ v1_ghost_guard + openapi_spec all pass; ``make lint`` +
``make add-license`` clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@earayu earayu merged commit 5113730 into main Apr 25, 2026
4 checks passed
@earayu earayu deleted the phase8/d81-stream-emitter branch April 25, 2026 12:42
earayu added a commit that referenced this pull request Apr 25, 2026
… elicitation (#1696)

* feat(phase8 #74 D8.2): first-cut UIMessage at-rest storage for agent path

Phase 8 task #74 (D8.2) — first cut of the at-rest UIMessage storage
layer per the canonical ``docs/modularization/agent-message-protocol-design.md``
and ``docs/modularization/agent-runtime-mcp-design.md`` (in main).

This PR delivers the foundation:

* ``aperag/domains/agent_runtime/uimessage.py`` (NEW) — pydantic
  schema for ``UIMessage`` and every ``UIMessagePart`` variant
  (text / tool / source-url / source-document / data-citation /
  data-activity / data-tool-consent / data-elicitation), plus
  ``persistable_parts`` / ``args_preview`` / ``args_hash`` helpers
  enforcing D9 §A7 raw-args-private rule.

* ``aperag/domains/agent_runtime/db/models.py`` — new ``AgentMessage``
  ORM (``agent_message`` table; 1:1 with ``agent_turn`` via
  ``turn_id``; ``parts`` JSON column carries the full UIMessage at
  rest; ``schema_version`` tag for FE forward-compat). Legacy
  ``AgentArtifact`` / ``AgentTimelineEvent`` tables retained during
  D8.x rollout — D8.6 (#80) will drop them once the FE renderer is
  consuming AgentMessage exclusively.

* ``aperag/migration/versions/...d8e2c4a17b91_add_agent_message_table.py``
  — new alembic revision chained off ``7c4e9e1f8b21``; pure additive
  (no rename / drop in this PR), idempotent migration.

* ``aperag/domains/agent_runtime/storage.py`` — extend
  ``AgentRuntimeRedisStore`` with ``write_message_snapshot`` /
  ``read_message_snapshot`` / ``delete_message_snapshot`` keyed on
  ``agent_runtime:turn:<id>:message``; same TTL as the live event
  buffer.

* ``aperag/domains/agent_runtime/uimessage_store.py`` (NEW) —
  ``UIMessageStore`` wraps the DB row + Redis snapshot behind a
  single ``write`` / ``read`` / ``delete`` surface. ``write``
  filters transient parts (currently only ``data-activity``);
  ``read`` prefers Redis but falls back to the durable DB row when
  the snapshot is cold. ``UIMessageDbOps`` is a SQLAlchemy-bound
  helper kept separate so unit tests can inject in-memory fakes.

* ``tests/unit_test/agent_runtime/test_uimessage_at_rest.py`` (NEW)
  — at-rest reload contract tests pinning the three invariants
  Weston named as the prerequisite for unblocking D8.4b
  (msg=50c90f6f / msg=cef89ed8): round-trip fidelity across every
  persistable part variant, transient exclusion, snapshot
  consistency between Redis and DB.
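
The read/write behaviour described above could look roughly like this with in-memory fakes. It is a sketch only: `InMemoryMessageStore` and `persistable` are hypothetical stand-ins (two dicts play the roles of the Redis snapshot and the DB row), not the ``UIMessageStore`` API.

```python
from typing import Optional

TRANSIENT_TYPES = {"data-activity"}  # currently the only transient part type


def persistable(parts: list) -> list:
    """Drop transient parts before anything is written at rest."""
    return [p for p in parts if p.get("type") not in TRANSIENT_TYPES]


class InMemoryMessageStore:
    def __init__(self) -> None:
        self.snapshot: dict = {}  # Redis-like, may go cold (TTL)
        self.rows: dict = {}      # durable DB-like rows

    def write(self, turn_id: str, parts: list) -> None:
        kept = persistable(parts)
        self.rows[turn_id] = kept
        self.snapshot[turn_id] = kept

    def read(self, turn_id: str) -> Optional[list]:
        # Prefer the snapshot; fall back to the durable row when it is cold.
        if turn_id in self.snapshot:
            return self.snapshot[turn_id]
        return self.rows.get(turn_id)

    def delete(self, turn_id: str) -> None:
        self.snapshot.pop(turn_id, None)
        self.rows.pop(turn_id, None)
```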

Out of scope (left for follow-up commits / sibling lanes per PM
msg=a3c31f79):

* Wire/streaming emitter — D8.1 (#73, cuiwenbo)
* Tool / citation / consent / elicitation enforcement of the
  7-point D9 §A4 contract — D8.3 (#75, chenyexuan)
* Full event-to-UIMessage projection in the runtime services —
  follow-up commit on this branch once #73 stream contract is
  visible
* Drop of legacy ``agent_artifact`` / ``agent_timeline_event``
  tables — D8.6 (#80)
* Non-agent bot path migration — D8.5 (#79)
* FE renderer — D8.4a/b/c (#76/#77/#78)

Gates: 709 pass / 29 skip / 1 deselect / 0 fail unit suite (incl.
7 new contract tests + 24 boundary intact); ruff lint+format clean.

* refactor(phase8 #73 D8.1): backend AI SDK v5 stream emitter

* fix(phase8 #74 D8.2): wrap data-* parts in {type, data: {...}} per D8 §2 canonical

Architect canonical lock 2026-04-25 (msg=ad6168e7) + PM scope-tightening
(msg=1ff7ed9e): persisted data-* parts must round-trip byte-for-byte
with the wire shape produced by #73 cuiwenbo's emitter — D8 §2 forbids
a wire/at-rest converter layer.

Pre-fix at-rest used flat fields (DataCitationPart.cited_text/.location,
DataToolConsentPart.tool_call_id/..., DataElicitationPart.elicitation_id/...)
which violated the same-schema canonical and would have forced #75
chenyexuan or the FE renderer (#76/#77) to maintain dual code paths.

This commit:
- Introduces inner data classes (CitationData / ActivityData /
  ToolConsentData / ElicitationData) so each data-* part follows
  {type, data: {...}} with the field set unchanged.
- Updates the every-part fixture in the contract test to construct
  parts via the wrapped form.
- Adds test_data_parts_use_wrapped_data_shape — a dedicated lock that
  reads the persisted DB row and asserts each data-* part's keys are
  exactly {type, data} and that data carries the canonical fields.
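
As a rough illustration of the wrapped shape, the sketch below builds a data-* part and checks that its keys are exactly {type, data}. `wrap_data_part` and `is_wrapped` are hypothetical helpers written with plain dicts; the real models are Pydantic classes.

```python
def wrap_data_part(part_type: str, **fields) -> dict:
    """Build a data-* part in the {type, data: {...}} form."""
    if not part_type.startswith("data-"):
        raise ValueError(f"not a data-* part type: {part_type!r}")
    return {"type": part_type, "data": dict(fields)}


def is_wrapped(part: dict) -> bool:
    """True when the part has exactly the keys {type, data} and data is an object."""
    return set(part) == {"type", "data"} and isinstance(part["data"], dict)
```

A pre-fix flat part such as `{"type": "data-citation", "cited_text": "..."}` fails `is_wrapped`, while `wrap_data_part("data-citation", cited_text="...")` passes.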

Tests: 8/8 in agent_runtime/test_uimessage_at_rest.py pass; full unit
suite 711/711 (29 skip), ruff check + format clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(phase8 #74 D8.2): align ToolPart with D8 §2.4 / D9 §A1+§A6 SafeToolName shape

Weston minimal CR (msg=1812fb03) + architect canonical affirm (msg=8412dce5):
the at-rest ToolPart used a flat `type: "tool"` literal plus a separate
`tool_name` field, which is neither the AI SDK v5 streaming form
(`tool-input-*` / `tool-output-*`) nor the v5 consolidated form
(`type: "tool-<safeName>"`). That third intermediate shape would have
forced #75 emit + #76/#77 FE renderer to do `tool` -> `tool-<name>`
conversion — the same wire/at-rest schema drift class we just rejected
for the data-* parts.

This commit:
- Encodes the SafeToolName directly in `ToolPart.type` via a regex-
  validated `^tool-[A-Za-z0-9_-]+$` discriminator string, matching
  D8 §2.4 + D9 §A1/§A6.
- Drops the redundant `tool_name` field; MCP server/tool identity
  remains carried in `metadata`.
- Replaces the misplaced `args_preview` / `args_hash` fields with the
  canonical `input: Optional[Any]`. Those redaction helpers stay
  module-level (`args_preview()` / `args_hash()`) so #75 D8.3 can use
  them when building DataToolConsentPart.data per D9 §A7.
- Updates the every-part fixture and the round-trip expected_types to
  the new tool-`<name>` discriminator.
- Adds test_tool_part_type_uses_safe_tool_name_form — pins the
  persisted tool part `type` matches the SafeToolName regex and
  confirms no top-level `tool_name` field leaks back.

SafeToolName *resolution* (raw MCP name → safe form, collision hash
suffix per D9 §A6) remains #75's scope; #74 only enforces the
canonical storage shape.
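
The storage-shape check can be approximated with the standard library alone. This is a minimal sketch: the pattern is the one quoted above, while `is_tool_part_type` and `check_tool_part` are hypothetical names, and `fullmatch` is used so a trailing newline cannot slip past `$`.

```python
import re

TOOL_PART_TYPE = re.compile(r"^tool-[A-Za-z0-9_-]+$")


def is_tool_part_type(value: str) -> bool:
    """True when value has the tool-<safeName> discriminator form."""
    return TOOL_PART_TYPE.fullmatch(value) is not None


def check_tool_part(part: dict) -> None:
    """Reject parts with a bad discriminator or a leaked top-level tool_name."""
    if not is_tool_part_type(part.get("type", "")):
        raise ValueError("type must match ^tool-[A-Za-z0-9_-]+$")
    if "tool_name" in part:
        raise ValueError("tool_name must not appear at the top level")
```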

Tests: 9/9 in agent_runtime/test_uimessage_at_rest.py pass; full unit
suite 711/711 (29 skip) — the one observed concurrent_control flake
passes on rerun. Ruff check + format clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(phase8 #74 D8.2): persist UIMessage parts with canonical camelCase aliases

Weston minimal CR (msg=59a459c6) + architect canonical affirm: the
at-rest part models lacked Pydantic aliases, so `model_dump(by_alias=True)`
fell back to snake_case (`source_id`, `tool_call_id`, `args_preview`,
`elicitation_id`, etc.) — diverging from cuiwenbo wire `parts.py` (#73)
which already serializes camelCase per AI SDK v5. That breaks the D8 §2
same-schema invariant a third time and would have forced #76/#77 FE
renderer to handle two casings.

This commit attaches `Field(alias=...)` + `ConfigDict(populate_by_name=True)`
to every camelCase-canonical field so JSON serialization matches the
wire byte-for-byte while Python call sites still use snake_case:

- SourceUrlPart.source_id        → sourceId
- SourceDocumentPart.source_id   → sourceId
- SourceDocumentPart.media_type  → mediaType
- ToolPart.tool_call_id          → toolCallId
- ToolPart.error_text            → errorText
- ToolConsentData.tool_call_id   → toolCallId
- ToolConsentData.tool_name      → toolName
- ToolConsentData.args_preview   → argsPreview
- ToolConsentData.args_hash      → argsHash
- ToolConsentData.requested_at   → requestedAt
- ElicitationData.elicitation_id → elicitationId

Snake_case stays where D8 §2 / Anthropic-shape canon requires it:
CitationData.cited_text and the four CitationLocation variants
(char_location / page_location / content_block_location / url_citation
plus their internal start_char / end_char / doc_index / doc_title /
page_index / block_index fields) follow the Anthropic citation
convention unchanged.
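
The renaming rule can be pictured without Pydantic as a plain lookup table. This is a dependency-free stand-in for the `Field(alias=...)` declarations, not how the models are implemented; `WIRE_ALIASES` and `to_wire_keys` are hypothetical names, and the function only renames top-level keys.

```python
WIRE_ALIASES = {
    "source_id": "sourceId",
    "media_type": "mediaType",
    "tool_call_id": "toolCallId",
    "error_text": "errorText",
    "tool_name": "toolName",
    "args_preview": "argsPreview",
    "args_hash": "argsHash",
    "requested_at": "requestedAt",
    "elicitation_id": "elicitationId",
}


def to_wire_keys(fields: dict) -> dict:
    """Rename aliased keys to camelCase; everything else passes through unchanged."""
    return {WIRE_ALIASES.get(key, key): value for key, value in fields.items()}
```

Citation fields such as `cited_text` or `start_char` are absent from the table on purpose, so they keep their Anthropic-shape snake_case.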

Tests:
- test_data_parts_use_wrapped_data_shape now asserts the wrapped
  data-tool-consent / data-elicitation payloads carry camelCase keys
  (toolCallId / argsPreview / requestedAt / elicitationId, etc.).
- New test_persisted_keys_use_canonical_camelcase locks the camelCase
  contract end-to-end against the persisted DB row, explicitly
  failing if any of the legacy snake_case forms reappear.
- test_tool_part_type_uses_safe_tool_name_form additionally pins
  toolCallId on the tool part.

Gates: 10/10 in agent_runtime/test_uimessage_at_rest.py pass; full
unit suite 712 passed / 29 skipped / 0 failed (concurrent_control flake deselected,
pre-existing). Ruff check + format clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(phase8 #75 D8.3): backend tool lifecycle + citations + consent + elicitation

Implements the seven-point D9 §A4 contract that gates tool execution
in the agent runtime, plus the Anthropic-shape citation transform:

- tools/safe_name.py     -- D9 §A1+§A6 SafeToolName + collision sha256
                            suffix + (mcpServer, mcpToolName, safeName)
                            reverse lookup
- tools/registry.py      -- D9 §1.1+§A5 three-tier MCP registry with
                            system-namespace reservation and audit-logged
                            admin alias (no silent override)
- tools/authorization.py -- D9 §2 three-level auth (visibility /
                            invocation / consent) with §2.2 default
                            policy + per-tool risk overrides
- tools/args_cache.py    -- D9 §A7 backend-private raw-args cache with
                            short TTL; wire-side argsPreview / argsHash
                            re-exported from the canonical helpers in
                            aperag/domains/agent_runtime/uimessage.py
                            (single-source-of-truth)
- tools/consent.py       -- D9 §3 consent request <-> decision flow with
                            asyncio.Event waiter, single-use raw-args
                            consume, denial-drops-cache invariant
- tools/elicitation.py   -- D9 §5 elicitation request <-> answer flow
                            with schema-validated response + cancel
                            hook; pluggable validator (default checks
                            JSON Schema required fields)
- tools/lifecycle.py     -- envelope event-type constants for
                            tool.consent.* / tool.elicitation.* +
                            translate_lifecycle_envelope() translator
                            extension + LifecycleEmitter glue between
                            consent/elicitation services and the
                            runtime's EventService.append_event path
- tools/citations.py     -- typed Anthropic-shape citation builder for
                            char_location / page_location /
                            content_block_location / url_citation, fed
                            from RAG ReferenceBundleItem metadata
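
The consent waiter described for tools/consent.py can be sketched as follows. This is an assumption-laden illustration, not the real ``ConsentService``: `ConsentSketch` and its method names are hypothetical, there is no TTL, and the ownership checks added later are left out.

```python
import asyncio


class ConsentSketch:
    def __init__(self) -> None:
        self._raw_args: dict = {}   # backend-private raw args, never sent on the wire
        self._events: dict = {}     # one asyncio.Event per pending tool call
        self._decisions: dict = {}

    def request(self, tool_call_id: str, raw_args: dict) -> None:
        self._raw_args[tool_call_id] = raw_args
        self._events[tool_call_id] = asyncio.Event()

    def decide(self, tool_call_id: str, approved: bool) -> None:
        if tool_call_id not in self._events:
            raise KeyError(tool_call_id)
        if tool_call_id in self._decisions:
            raise ValueError("already resolved")
        self._decisions[tool_call_id] = approved
        if not approved:
            self._raw_args.pop(tool_call_id, None)  # denial drops the cache
        self._events[tool_call_id].set()            # wake the runtime waiter

    async def wait_for_args(self, tool_call_id: str):
        await self._events[tool_call_id].wait()
        # Single-use consume: a second call gets None.
        return self._raw_args.pop(tool_call_id, None)
```

The runtime side awaits `wait_for_args`, while the HTTP handler calls `decide`; an approved call yields the raw args exactly once and a denied call yields `None`.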

Wire-side refinement:
- wire/parts.py DataToolConsentPart + DataElicitationPart placeholders
  refined to use the canonical wrapped {type, data: ToolConsentData /
  ElicitationData} shape (no more `transient: True` placeholder; per D9
  §3.1 / §5.1 these parts are persisted, audit-trail relevant)

api/routes.py:
- chained translate_lifecycle_envelope() after translate_envelope() so
  consent/elicitation envelopes emit DataToolConsentPart /
  DataElicitationPart on the SSE stream
- new POST /agent/turns/{turn_id}/consent/{tool_call_id} -- records the
  user's decision, wakes the runtime waiter, appends the
  tool.consent.decided envelope so SSE replay carries the resolved part
- new POST /agent/turns/{turn_id}/elicit/{elicitation_id} -- submits a
  schema-validated response, wakes the waiter, appends the
  tool.elicitation.resolved envelope

Contract tests (focused unit_test/agent_runtime/test_tools_*.py, 82
new tests, all passing locally; full unit suite 814 / 29 skip / 0
fail):
- test_tools_safe_name.py     (12 tests) -- D9 §A1+§A6 lock
- test_tools_registry.py      (12 tests) -- D9 §1.1+§A5 lock
- test_tools_authorization.py (11 tests) -- D9 §2 lock
- test_tools_args_cache.py    (12 tests) -- D9 §A7 raw-args privacy lock
- test_tools_consent.py       ( 9 tests) -- D9 §3 consent flow lock
- test_tools_elicitation.py   ( 9 tests) -- D9 §5 elicitation lock
- test_tools_lifecycle.py     ( 9 tests) -- D9 §6 translator extension
- test_tools_citations.py     ( 9 tests) -- D8 §2.5 typed citation lock

7-point D9 §A4 verification:
1. SafeToolName + MCP metadata (D9 §A1+§A6)              -- safe_name.py
2. AI SDK v5 + data-tool-consent custom data-part (§A2)  -- wire/parts.py + lifecycle.py
3. argsPreview + argsHash backend-private (§A7)          -- args_cache.py + consent.py
4. Registry no silent system override (§A5)              -- registry.py
5. data-elicitation schema-validated input (§5)          -- elicitation.py
6. Three-level authorization (§2)                        -- authorization.py
7. PydanticAI as default candidate (§A3)                 -- runtime backbone unchanged
                                                            (per architect msg=ff619d8a /
                                                            Weston msg=50c90f6f C2 lock,
                                                            this PR scope explicitly excludes
                                                            backbone rewrite)

Built on:
- #73 D8.1 wire emitter (cuiwenbo, PR #1695 / 5113730 in main)
  -- consumes wire/parts.py + chains lifecycle translator via api/routes.py
- #74 D8.2 at-rest UIMessage storage (Bryce, PR #1694 head be7406c)
  -- imports ToolConsentData / ElicitationData / args_preview / args_hash
  from aperag/domains/agent_runtime/uimessage.py for wire/at-rest
  same-schema canonical

* fix(phase8 #74 D8.2): align DataElicitationPart with D9 §5.1 canonical

Weston minimal CR (msg=51dffdc9) + PM lock (msg=042b0a7b): the at-rest
ElicitationData was missing the canonical `serverName` field and used
a non-canonical `submitted` state literal. D9 §5.1 locks the shape as:

    { type: "data-elicitation", data: {
        elicitationId: string,
        serverName: string,          // MCP server requesting input
        prompt: string,
        schema: JsonSchema,
        state: "pending" | "answered" | "cancelled"
    }}

This commit:
- Adds `server_name: str = Field(alias="serverName")` to ElicitationData
  so MCP server identity round-trips with the elicitation request.
- Tightens `state` to `Literal["pending", "answered", "cancelled"]` per
  D9 §5.1 / §6.3 — the previous `submitted` would have forced #75 emit
  to translate state on every elicitation reply.
- Keeps `response: Optional[dict[str, Any]]` per PM msg=042b0a7b
  ("may be kept, but must not replace the canonical fields"); it carries the user's submitted
  value at-rest after the POST endpoint completes the round-trip.

Tests:
- Updates the every-part fixture with a representative serverName.
- test_data_parts_use_wrapped_data_shape now asserts `serverName` is
  in the persisted data-elicitation keys.
- test_persisted_keys_use_canonical_camelcase locks `serverName` (not
  `server_name`) and the canonical state literal.
- New test_data_elicitation_answered_state_round_trip — explicit
  round-trip of a `state="answered"` elicitation with a populated
  response, pinning the canonical state vocabulary against regression.

Gates: 11/11 in agent_runtime/test_uimessage_at_rest.py pass; full
unit suite 713 passed / 29 skipped / 0 failed (concurrent_control
flake deselected, pre-existing). Ruff check + format clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(phase8 #75 D8.3): align elicitation to D9 §5 / D9.1 canonical (serverName + state="answered")

Fast-follow per PR description's Test plan TODO. Reconciles
``ElicitationService`` and ``LifecycleEmitter.request_elicitation``
with the canonical ``ElicitationData`` shape locked by Bryce's
#1694 head ``04d268be`` (Weston msg=89bafde9 4th-blocker fix +
architect msg=8a76e5e0 D9.1 amend):

- ``ElicitationOutcome`` literal: ``"submitted"`` -> ``"answered"``
  (canonical state vocabulary per D9 §5.1 / D9.1)
- ``ElicitationService.request_input(*, server_name=...)``: required
  kwarg threaded through to populate ``ElicitationData.server_name``
  so the FE consent UI can surface which MCP server initiated the
  elicitation
- ``LifecycleEmitter.request_elicitation(*, server_name=...)``:
  matching kwarg propagated to the underlying service
- contract tests updated: ``test_payload_carries_canonical_server_name``
  + ``test_request_input_rejects_empty_server_name`` added; existing
  state assertions flipped to ``"answered"``

Tests: ``pytest tests/unit_test/agent_runtime/test_tools_*.py -q``
=> 84 passed (was 82 + 2 new server_name tests).

Wire / at-rest shape stays canonical-clean: ``ElicitationData`` is
imported directly from ``aperag/domains/agent_runtime/uimessage.py``
so the field set + alias casing follow #74 ``be7406c5`` -> ``04d268be``
single-source-of-truth.

* fix(phase8 #75 D8.3): tenant ownership + multi-tenant registry + default-deny auth

Address Weston's three blockers from minimal CR (msg=57cf4632) +
the architect-upgraded fourth blocker (msg=19f2c9a9). All within
PR scope per PM lock (msg=ab2ed5d3); none deferred.

## B2 (tenant-bound consent + elicitation ownership)

- ``ConsentService`` records ``ConsentBinding(turn_id, user_id)``
  at ``request_consent`` time; ``decide()`` raises
  :class:`ConsentOwnershipError` when ``actor_user_id`` does not
  match the bound user, or when ``expected_turn_id`` is provided
  and does not match the bound turn (defense in depth even when
  the user matches).
- ``ElicitationService`` mirrors the same pattern via
  ``ElicitationBinding`` + :class:`ElicitationOwnershipError`.
  ``cancel(*, bypass_ownership=True)`` is reserved for
  internal-only callers (timeout sweeper / abort path) so user-
  facing handlers cannot accidentally skip the check.
- ``LifecycleEmitter.request_consent`` /
  ``LifecycleEmitter.request_elicitation`` thread the new
  ``turn_id`` + ``user_id`` kwargs through to the underlying
  services.
- HTTP endpoints moved to ``chat_id``-scoped paths to align with
  the existing pattern (``/agent/chats/{chat_id}/turns/{turn_id}/...``)
  and to leverage ``turn_service.get_turn_snapshot(user, chat,
  turn)`` for HTTP-layer ownership pre-check (raises
  ``ResourceNotFoundException`` -> 404 on cross-user / unknown
  turn). New endpoints:
    POST /agent/chats/{chat_id}/turns/{turn_id}/consent/{tool_call_id}
    POST /agent/chats/{chat_id}/turns/{turn_id}/elicit/{elicitation_id}
  Both translate ``ConsentOwnershipError`` /
  ``ElicitationOwnershipError`` -> 403, ``KeyError`` -> 404,
  ``ValueError`` -> 409 (already resolved) or 422 (validation).
- Regression tests:
    test_decide_rejects_cross_user_actor / cross_turn_actor (consent)
    test_submit_rejects_cross_user_actor / cross_turn_actor (elicitation)
    test_request_consent_rejects_empty_turn_or_user
    test_request_input_rejects_empty_server_name (already there)
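
The B2 ownership rule can be sketched like this. ``ConsentBinding`` and ``ConsentOwnershipError`` are named in this commit, but their definitions here are guesses, and `check_ownership` is a hypothetical free function standing in for the check inside ``decide()``.

```python
from dataclasses import dataclass
from typing import Optional


class ConsentOwnershipError(Exception):
    """Raised when the actor or turn does not match the recorded binding."""


@dataclass(frozen=True)
class ConsentBinding:
    turn_id: str
    user_id: str


def check_ownership(
    binding: ConsentBinding,
    *,
    actor_user_id: str,
    expected_turn_id: Optional[str] = None,
) -> None:
    if actor_user_id != binding.user_id:
        raise ConsentOwnershipError("actor is not the bound user")
    # Defense in depth: the turn is checked even when the user matches.
    if expected_turn_id is not None and expected_turn_id != binding.turn_id:
        raise ConsentOwnershipError("turn does not match the bound turn")
```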

## B3 (registry composite key per scope_ref)

- ``_ScopeIndex.entries`` keyed on ``(scope_ref, name)`` tuple;
  system tier uses ``scope_ref=None`` (single global namespace).
  Bot/user tiers use the owning ``scope_ref`` so different bots /
  users can independently register the same name without
  collision -- per D9 §1.1 multi-tenant boundary.
- New ``_tier_key()`` helper composes the right key shape per
  scope.
- ``effective_servers()`` switched to keyed iteration so the
  ``scope_ref`` filter happens at lookup time (was after
  iteration, which was too late once a same-name entry had
  already been overwritten).
- ``unregister(scope, name, *, scope_ref=None)`` API added so
  bot/user removals can target the right (scope_ref, name) pair.
- Regression tests:
    test_two_bots_can_register_same_name_without_collision
    test_two_users_can_register_same_name_without_collision
    test_user_register_does_not_leak_to_other_user_resolution
    test_bot_register_does_not_leak_to_other_bot_resolution
    test_unregister_is_scope_ref_aware_for_bot_user_tiers
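
A minimal sketch of the composite key, assuming one index object per tier. `ScopeIndexSketch` is a hypothetical simplification of ``_ScopeIndex``; the `system` flag and the string server values are illustration only.

```python
from typing import Optional


class ScopeIndexSketch:
    """One registry tier; entries are keyed on (scope_ref, name)."""

    def __init__(self, *, system: bool = False) -> None:
        self.system = system
        self.entries: dict = {}

    def _tier_key(self, name: str, scope_ref: Optional[str]) -> tuple:
        if self.system:
            return (None, name)  # single global namespace
        if not scope_ref:
            raise ValueError("bot/user tiers need a scope_ref")
        return (scope_ref, name)

    def register(self, name: str, server: str, *, scope_ref: Optional[str] = None) -> None:
        self.entries[self._tier_key(name, scope_ref)] = server

    def resolve(self, name: str, *, scope_ref: Optional[str] = None) -> Optional[str]:
        # The scope_ref filter happens at lookup time, via the key itself.
        return self.entries.get(self._tier_key(name, scope_ref))

    def unregister(self, name: str, *, scope_ref: Optional[str] = None) -> None:
        self.entries.pop(self._tier_key(name, scope_ref), None)
```

Because the owner is part of the key, two bots registering the same name land in different slots and neither overwrites the other.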

## B4 (unknown-risk default-deny)

- ``ToolAuthorizationPolicy.evaluate`` -- when the
  ``risk_resolver`` returns ``None`` for an unknown tool, the
  policy now returns ``visible=True, can_invoke_auto=False,
  requires_consent=True, risk="writes_user_data"`` instead of the
  previous ``READ_ONLY`` auto-invocable default. Per architect
  canonical lock msg=19f2c9a9: misclassified side-effect tools
  must NOT silently bypass the consent gate; the security-first
  fail-closed posture only costs an extra consent prompt for
  tools that operators forget to classify as ``READ_ONLY``.
- Regression tests:
    test_unknown_tool_default_deny_per_security_canonical
    test_unknown_tool_filter_visible_keeps_consent_required_tool
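
The fail-closed branch can be sketched as below. Only the unknown-risk result (`visible=True, can_invoke_auto=False, requires_consent=True, risk="writes_user_data"`) comes from this commit; the `Decision` dataclass, the `"read_only"` string value and the handling of the other branches are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional

READ_ONLY = "read_only"  # assumed string value for the READ_ONLY risk class


@dataclass(frozen=True)
class Decision:
    visible: bool
    can_invoke_auto: bool
    requires_consent: bool
    risk: str


def evaluate(tool_name: str, risk_resolver: Callable[[str], Optional[str]]) -> Decision:
    risk = risk_resolver(tool_name)
    if risk is None:
        # Unknown tool: stay visible, but never auto-invoke without consent.
        return Decision(True, False, True, "writes_user_data")
    if risk == READ_ONLY:
        return Decision(True, True, False, risk)
    return Decision(True, False, True, risk)
```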

## Gates

- ``pytest tests/unit_test/agent_runtime/test_tools_*.py -q``: 95 passed
  (was 84 + 11 new B2/B3/B4 tests; old elicitation tests
  re-targeted to ``actor_user_id="user-1"`` to match the
  test-fixture binding ``user_id="user-1"``)
- ``pytest tests/unit_test/ -q --deselect concurrent_control/test_performance_comparison.py``:
  828 passed / 29 skipped / 0 failed
- ``ruff check`` + ``ruff format --check``: clean

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>