F106: Canonical Agent Exchange Transcript (JSONL) #369

@pocky

Scope

In Scope

  • Canonical append-only JSONL transcript per run (storage/transcripts/<run-id>.jsonl) as durable, replayable source of truth
  • Domain model (internal/domain/transcript/): ExchangeEvent envelope, closed EventType vocabulary, ContentBlock with typed blocks (text, thinking, tool_use, tool_result, command, stream), fidelity marker (router / agent_emitted), Recorder port
  • Infrastructure (internal/infrastructure/transcript/): atomic JSONL writer (O_APPEND, mode 0o600, mutex-serialized writes for lines larger than PIPE_BUF), bounded fan-out with drop policy, monotonic Seq allocator, write-then-broadcast composition
  • Lifecycle instrumentation in ExecutionService for all step types (agent, command, operation, terminal, parallel, for_each, while, call_workflow, generic custom)
  • Agent seam emission: message{role:user} carrying resolved prompt + composed system_prompt
  • Composite step tracking via path / iteration; sub-workflow linkage via child_run_id / parent_run_id (separate file per child run)
  • Tool capture at tools.Router.CallTool seam (in-process, plugin + builtin) producing tool_use + tool_result blocks marked fidelity:"router"
  • Stdio proxy mode capture: tool_use marked fidelity:"agent_emitted" from NDJSON
  • Per-provider normalization extending existing DisplayEvent parsers to single ContentBlock mapping (Claude, Codex with NUL handling, Gemini, Copilot, OpenAI HTTP tool_result)
  • Live fan-out wired to a first consumer
  • Arch-lint rules: domain-transcript + infra-transcript
  • doc.go (100+ lines) for both new packages covering model, threat model, versioning

Out of Scope

  • Token-by-token streaming deltas
  • Merging or removing existing DisplayEvent and audit.jsonl channels (must coexist unchanged)
  • Wiring all interfaces to consume the transcript (delivered in F107)
  • Default secret masking (opt-in, deferred)
  • Instrumenting awf mcp-serve subprocess for full stdio fidelity (reserved; fidelity marker enables future transition)

Deferred

| Item | Rationale | Follow-up |
|---|---|---|
| Secret masking by default | Requires policy design; opt-in path sufficient for foundation | future |
| Full stdio fidelity for awf mcp-serve | Out-of-process instrumentation deferred; fidelity:"agent_emitted" marker documents the gap honestly | future |
| Removal/merge of audit.jsonl and DisplayEvent | Coexistence guarantees zero regression while transcript stabilizes | F107+ |
| Facade Event derivation and consumer migration | Consumption layer is a distinct concern | F107 |
| Tool content hardening (uniform ToolContent surface) | Vocabulary defined here, hardened there | F108 Axis C |

User Stories

US1: Replayable run transcript (P1 - Must Have)

As a workflow operator,
I want every run to produce a single append-only JSONL file capturing the full lifecycle and agent exchange,
So that I can replay, audit, and reconstruct the execution tree offline without re-running the workflow.

Why this priority: Without a durable canonical stream, downstream features (F107 facade, F108 tool hardening) have no source of truth. This is the foundation; nothing else lands without it.

Acceptance Scenarios:

  1. Given a workflow run with id run-abc, When any step or agent exchange event occurs, Then an envelope line is appended to storage/transcripts/run-abc.jsonl with monotonic Seq, file mode 0o600, and atomic write semantics.
  2. Given a completed transcript file, When it is read sequentially, Then the full step tree (including path, iteration, and child runs) can be reconstructed without loss.
  3. Given a run is killed mid-execution, When the process restarts, Then prior events on disk remain intact and valid JSONL with no torn writes.

Independent Test: Run any existing workflow; assert that storage/transcripts/<run-id>.jsonl exists, every line decodes into ExchangeEvent, Seq is strictly monotonic, and a tree reconstruction tool produces the same step graph as the in-memory state machine.
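
The decode and ordering part of this test could be sketched as follows. This is a minimal illustration, not the project's replay tool: the `seq` JSON key is taken from the Key Entities attribute list, and `verifyTranscript` and `seqOnly` are hypothetical names.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
)

// seqOnly decodes just the field this check needs. The "seq" key is an
// assumption based on the attribute list in Key Entities.
type seqOnly struct {
	Seq *uint64 `json:"seq"`
}

// verifyTranscript returns the number of events read. It fails on the first
// line that is not valid JSON (for example a torn final line), that lacks a
// seq, or whose seq is not strictly greater than the previous one.
func verifyTranscript(data []byte) (int, error) {
	sc := bufio.NewScanner(bytes.NewReader(data))
	sc.Buffer(make([]byte, 0, 64*1024), 16*1024*1024) // allow long lines
	var prev uint64
	count := 0
	for sc.Scan() {
		var ev seqOnly
		if err := json.Unmarshal(sc.Bytes(), &ev); err != nil {
			return count, fmt.Errorf("line %d: invalid JSON: %w", count+1, err)
		}
		if ev.Seq == nil {
			return count, fmt.Errorf("line %d: missing seq", count+1)
		}
		if count > 0 && *ev.Seq <= prev {
			return count, fmt.Errorf("line %d: seq %d is not greater than %d", count+1, *ev.Seq, prev)
		}
		prev = *ev.Seq
		count++
	}
	return count, sc.Err()
}

func main() {
	good := []byte("{\"seq\":1}\n{\"seq\":2}\n{\"seq\":3}\n")
	n, err := verifyTranscript(good)
	fmt.Println(n, err)

	torn := []byte("{\"seq\":1}\n{\"seq\":") // simulated torn final line
	_, err = verifyTranscript(torn)
	fmt.Println(err)
}
```

Comparing the reconstructed step tree against the in-memory state machine is left out here because it depends on the final shape of path and iteration.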

US2: Per-provider normalized agent exchange (P1 - Must Have)

As a workflow operator running agents across multiple providers,
I want Claude, Codex, Gemini, Copilot, and OpenAI HTTP outputs normalized into a uniform ContentBlock stream,
So that consumers handle one vocabulary instead of N divergent provider formats.

Why this priority: The whole point of the canonical transcript is to absorb provider divergence in exactly one place. Without normalization, downstream features re-implement the same parsing per provider — defeating the foundation.

Acceptance Scenarios:

  1. Given a Codex agent step emitting JSONL with embedded NUL bytes, When parsed, Then the recorder emits well-formed ContentBlock entries with NUL bytes handled and no corruption.
  2. Given a Claude agent step emitting thinking + text + tool_use, When parsed, Then the transcript contains a thinking block, a text block, and a tool_use block with stable identifiers.
  3. Given any provider emitting a tool_use without a matching tool_result, When the step completes, Then the transcript records the dangling tool_use and the parser does not panic or drop the message.

Independent Test: Replay real fixtures for each provider; assert the resulting ContentBlock sequence matches a golden file, and that the same workflow run across providers produces structurally equivalent transcripts (same block types in the same order modulo provider-specific extras).
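
One possible reading of "NUL bytes handled" in scenario 1 is sketched below. Stripping raw NUL bytes before decoding is an assumption, since the spec does not say how they are handled, and `decodeProviderLine` and `encodeTranscriptLine` are hypothetical helpers. Go's `encoding/json` rejects raw control characters inside string literals, so some pre-processing is needed before a line containing them can be decoded.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// decodeProviderLine decodes one line of provider output after removing raw
// NUL bytes. Stripping is an assumption: the spec only says NUL bytes are
// handled, not how.
func decodeProviderLine(line []byte) (map[string]any, error) {
	clean := bytes.ReplaceAll(line, []byte{0}, nil)
	var obj map[string]any
	if err := json.Unmarshal(clean, &obj); err != nil {
		return nil, fmt.Errorf("decode provider line: %w", err)
	}
	return obj, nil
}

// encodeTranscriptLine re-encodes a value as one JSONL line. encoding/json
// escapes control characters in strings, so the output never contains a raw
// NUL byte even if a string value does.
func encodeTranscriptLine(obj map[string]any) ([]byte, error) {
	out, err := json.Marshal(obj)
	if err != nil {
		return nil, err
	}
	return append(out, '\n'), nil
}

func main() {
	// A NUL inside a string value and another one trailing the object.
	raw := []byte("{\"type\":\"text\",\"text\":\"he\x00llo\"}\x00")
	obj, err := decodeProviderLine(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	line, err := encodeTranscriptLine(obj)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(string(line))
}
```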

US3: Tool capture at the router seam (P2 - Should Have)

As a workflow operator,
I want every tools.Router.CallTool invocation recorded with fidelity:"router",
So that in-process tool calls are captured uniformly across plugin and builtin tools without depending on agent-emitted output.

Why this priority: The router is the single seam where every in-process tool call passes. Capturing there guarantees coverage even when an agent omits or malforms its tool_use output. Marked P2 because US1+US2 deliver the foundational stream; router capture sharpens fidelity.

Acceptance Scenarios:

  1. Given a builtin tool invocation through tools.Router.CallTool, When it completes, Then a tool_use and tool_result block pair appears in the transcript with fidelity:"router".
  2. Given a stdio proxy plugin invocation, When the NDJSON tool_use arrives, Then the corresponding block is recorded with fidelity:"agent_emitted" and no double-counting occurs against router-captured events.

Independent Test: Run a workflow invoking one builtin and one plugin tool; assert exactly one tool_use/tool_result pair per call, with the correct fidelity marker for each path.
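
Capture at the router seam could take the shape of a decorator around the call path, sketched below. The real signature of tools.Router.CallTool is not given in this spec, so `ToolCaller`, `Block`, the `call-N` identifier scheme, and the `emit` callback are all hypothetical stand-ins.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// ToolCaller is a hypothetical stand-in for the tools.Router.CallTool seam.
type ToolCaller func(name string, args map[string]any) (string, error)

// Block is a pared-down stand-in for ContentBlock; field names are assumptions.
type Block struct {
	Type     string
	Fidelity string
	ID       string
	Name     string
	Content  string
	IsError  bool
}

// recordingCaller wraps next so that every call emits exactly one tool_use
// block before the call and one tool_result block after it, both marked
// fidelity "router" and sharing the same ID.
func recordingCaller(next ToolCaller, emit func(Block)) ToolCaller {
	var counter uint64
	return func(name string, args map[string]any) (string, error) {
		id := fmt.Sprintf("call-%d", atomic.AddUint64(&counter, 1))
		emit(Block{Type: "tool_use", Fidelity: "router", ID: id, Name: name})
		out, err := next(name, args)
		result := Block{Type: "tool_result", Fidelity: "router", ID: id, Name: name, Content: out}
		if err != nil {
			result.IsError = true
			result.Content = err.Error()
		}
		emit(result)
		return out, err
	}
}

// echoTool and failingTool are toy tools used only for the demonstration.
func echoTool(name string, args map[string]any) (string, error) {
	return fmt.Sprint(args["text"]), nil
}

func failingTool(name string, args map[string]any) (string, error) {
	return "", errors.New("tool failed")
}

func main() {
	emit := func(b Block) { fmt.Printf("%+v\n", b) }
	call := recordingCaller(echoTool, emit)
	out, err := call("echo", map[string]any{"text": "hi"})
	fmt.Println(out, err)
}
```

Because the wrapper sits outside the tool, plugin and builtin tools are recorded the same way, and a failing call still yields a tool_result rather than a dangling tool_use.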

US4: Bounded live fan-out (P2 - Should Have)

As a consumer of the transcript,
I want to subscribe to live events with a bounded buffer and explicit drop policy,
So that a slow consumer cannot stall the recorder or the workflow execution.

Why this priority: The transcript is the live channel for F107; back-pressure semantics must be settled now. P2 because correctness of the on-disk file (US1) is the durable contract; fan-out is the consumption affordance.

Acceptance Scenarios:

  1. Given a subscriber consuming slower than the producer, When the buffer fills, Then events are dropped per the documented policy and the recorder continues writing to disk without blocking.
  2. Given a subscriber calls Close, When further events arrive, Then the subscriber channel is closed cleanly and Close is idempotent.

Independent Test: Run a producer emitting at high rate with a deliberately slow subscriber; assert the JSONL file is complete and monotonic while the subscriber receives a documented subset, with no deadlock and no panic on repeated Close.
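
A bounded fan-out with non-blocking sends could look like the sketch below. It assumes a drop-newest policy purely for illustration, since the drop direction and buffer size are still open verification items, and it closes all subscribers at once rather than modelling per-subscriber Close. `FanOut`, `Broadcast`, and `Dropped` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a pared-down envelope carrying only Seq.
type Event struct{ Seq uint64 }

// FanOut broadcasts events to buffered subscriber channels. When a
// subscriber's buffer is full, the new event is dropped for that subscriber
// and counted (drop-newest, an assumed policy).
type FanOut struct {
	mu      sync.Mutex
	size    int
	subs    []chan Event
	dropped uint64
	closed  bool
}

func NewFanOut(bufSize int) *FanOut { return &FanOut{size: bufSize} }

// Subscribe returns a channel that receives events until Close is called.
func (f *FanOut) Subscribe() <-chan Event {
	f.mu.Lock()
	defer f.mu.Unlock()
	ch := make(chan Event, f.size)
	if f.closed {
		close(ch)
		return ch
	}
	f.subs = append(f.subs, ch)
	return ch
}

// Broadcast never blocks on a slow subscriber.
func (f *FanOut) Broadcast(ev Event) {
	f.mu.Lock()
	defer f.mu.Unlock()
	if f.closed {
		return
	}
	for _, ch := range f.subs {
		select {
		case ch <- ev:
		default:
			f.dropped++
		}
	}
}

// Dropped reports how many deliveries were skipped because a buffer was full.
func (f *FanOut) Dropped() uint64 {
	f.mu.Lock()
	defer f.mu.Unlock()
	return f.dropped
}

// Close closes every subscriber channel; calling it again is a no-op.
func (f *FanOut) Close() error {
	f.mu.Lock()
	defer f.mu.Unlock()
	if f.closed {
		return nil
	}
	f.closed = true
	for _, ch := range f.subs {
		close(ch)
	}
	return nil
}

func main() {
	f := NewFanOut(2)
	ch := f.Subscribe()
	for i := uint64(1); i <= 5; i++ {
		f.Broadcast(Event{Seq: i}) // nobody is reading, so 3 of 5 are dropped
	}
	f.Close()
	f.Close() // idempotent
	for ev := range ch {
		fmt.Println("received", ev.Seq)
	}
	fmt.Println("dropped", f.Dropped())
}
```

Sending and closing under the same mutex avoids a send on a closed channel, which would otherwise panic.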

US5: Sub-workflow linkage (P3 - Nice to Have)

As a workflow author,
I want call_workflow steps to produce a linked child transcript with child_run_id and parent_run_id,
So that I can navigate from parent to child runs while keeping each file self-contained.

Why this priority: Composite workflows are real but the linkage shape (one Recorder per sub-run vs. shared multi-file) needs verification. P3 because single-run replay (US1) delivers value first.

Acceptance Scenarios:

  1. Given a parent run invoking call_workflow, When the child run starts, Then the parent transcript contains a step event referencing child_run_id, and the child transcript's lifecycle events carry parent_run_id.
  2. Given a child run completes, When the parent reads both files, Then the linked tree is reconstructible without ambiguity.

Independent Test: Run a workflow with call_workflow; assert two transcript files exist, cross-references are consistent in both directions, and reconstruction yields a single connected tree.
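
The cross-reference check could be sketched as a recursive walk over the decoded transcripts. This is a simplification under stated assumptions: every event in a child transcript is assumed to carry parent_run_id (the spec only requires it on lifecycle events), and `Event` and `countLinkedRuns` are hypothetical names.

```go
package main

import "fmt"

// Event keeps only the linkage fields of the envelope.
type Event struct {
	RunID       string
	ParentRunID string
	ChildRunID  string
}

// countLinkedRuns walks from runID through every child_run_id it references
// and returns how many runs are reachable. It fails if a referenced child has
// no transcript, if a child's parent_run_id does not point back at the run
// that referenced it, or if a run is reachable from two different parents.
func countLinkedRuns(runID, parentID string, all map[string][]Event, seen map[string]bool) (int, error) {
	if seen[runID] {
		return 0, fmt.Errorf("run %q is linked more than once", runID)
	}
	seen[runID] = true
	events, ok := all[runID]
	if !ok {
		return 0, fmt.Errorf("missing transcript for run %q", runID)
	}
	children := map[string]bool{} // a child may be referenced by several events
	for _, ev := range events {
		if ev.ParentRunID != parentID {
			return 0, fmt.Errorf("run %q: parent_run_id %q, want %q", runID, ev.ParentRunID, parentID)
		}
		if ev.ChildRunID != "" {
			children[ev.ChildRunID] = true
		}
	}
	total := 1
	for child := range children {
		n, err := countLinkedRuns(child, runID, all, seen)
		if err != nil {
			return 0, err
		}
		total += n
	}
	return total, nil
}

func main() {
	all := map[string][]Event{
		"root": {{RunID: "root"}, {RunID: "root", ChildRunID: "c1"}},
		"c1":   {{RunID: "c1", ParentRunID: "root", ChildRunID: "c2"}},
		"c2":   {{RunID: "c2", ParentRunID: "c1"}},
	}
	n, err := countLinkedRuns("root", "", all, map[string]bool{})
	fmt.Println(n, err) // prints: 3 <nil>
}
```

If the returned count equals the number of transcript files, the files form a single connected tree.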

Edge Cases

  • Codex emits embedded NUL bytes in JSONL → parser handles them; transcript line remains valid JSON.
  • Agent emits tool_use without tool_result (timeout, crash) → dangling block is recorded; no panic.
  • Subscriber buffer overflows → documented drop policy applied; disk write unaffected.
  • Concurrent writes from multiple goroutines → monotonic Seq strictly enforced; writes serialized via mutex when lines exceed PIPE_BUF.
  • Process killed mid-write → O_APPEND + atomic line semantics ensure no torn lines on resume.
  • Unknown EventType encountered by reader → tolerant decode (forward-compat); event surfaced with type intact.
  • Close called twice on recorder or subscriber → idempotent; no panic.
  • Sub-workflow recursion deeper than 1 level → each level produces its own file with consistent parent linkage.
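
The dangling tool_use case can be illustrated with a small pairing pass over recorded blocks. The `ID` and `ToolUseID` fields are hypothetical: the spec requires stable identifiers but does not name them.

```go
package main

import "fmt"

// Block is a pared-down stand-in for ContentBlock. ID and ToolUseID are
// assumed field names for the stable identifiers the spec calls for.
type Block struct {
	Type      string // "text", "thinking", "tool_use", "tool_result", ...
	ID        string // set on tool_use blocks
	ToolUseID string // set on tool_result blocks; refers to a tool_use ID
}

// danglingToolUses returns the IDs of tool_use blocks that never received a
// matching tool_result, in the order the tool_use blocks appeared. The blocks
// themselves stay in the transcript; nothing is dropped.
func danglingToolUses(blocks []Block) []string {
	answered := make(map[string]bool)
	for _, b := range blocks {
		if b.Type == "tool_result" {
			answered[b.ToolUseID] = true
		}
	}
	var dangling []string
	for _, b := range blocks {
		if b.Type == "tool_use" && !answered[b.ID] {
			dangling = append(dangling, b.ID)
		}
	}
	return dangling
}

func main() {
	blocks := []Block{
		{Type: "thinking"},
		{Type: "tool_use", ID: "t1"},
		{Type: "tool_result", ToolUseID: "t1"},
		{Type: "tool_use", ID: "t2"}, // step ended before a result arrived
	}
	fmt.Println(danglingToolUses(blocks)) // prints: [t2]
}
```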

Requirements

Functional Requirements

  • FR-001: System MUST emit one append-only JSONL file per run at storage/transcripts/<run-id>.jsonl with mode 0o600, opened with O_APPEND, and serialize writes when payloads exceed PIPE_BUF.
  • FR-002: System MUST define ExchangeEvent (envelope) and ContentBlock (typed content) in internal/domain/transcript/ with a closed EventType vocabulary and block types text, thinking, tool_use, tool_result, command, stream, plus a fidelity marker (router | agent_emitted).
  • FR-003: System MUST allocate Seq from a single monotonic source and apply write-then-broadcast ordering (disk first, fan-out second).
  • FR-004: System MUST instrument every step type (agent, command, operation, terminal, parallel, for_each, while, call_workflow, generic custom) in ExecutionService with start/end lifecycle events.
  • FR-005: System MUST emit, at the agent seam, a message{role:user} event carrying the fully resolved prompt and composed system_prompt.
  • FR-006: System MUST record composite step location via path and iteration fields for parallel, for_each, while, and call_workflow executors.
  • FR-007: System MUST link sub-workflows via child_run_id on the parent event and parent_run_id on the child lifecycle events, with each run written to a separate file.
  • FR-008: System MUST capture tool_use and tool_result at the tools.Router.CallTool seam with fidelity:"router" for both plugin and builtin tools.
  • FR-009: System MUST capture stdio proxy tool_use from NDJSON with fidelity:"agent_emitted" and MUST NOT double-count against router-captured events.
  • FR-010: System MUST normalize Claude, Codex (with NUL byte tolerance), Gemini, Copilot, and OpenAI HTTP outputs into the unified ContentBlock vocabulary in a single mapping layer extending the existing DisplayEvent parsers.
  • FR-011: System MUST expose a Recorder port with Record, Subscribe, and Close operations; Close MUST be idempotent.
  • FR-012: System MUST provide bounded fan-out with a documented buffer size and drop policy so a slow subscriber cannot block writes.
  • FR-013: System MUST preserve existing DisplayEvent and audit.jsonl channels unchanged (coexistence).
  • FR-014: System MUST publish doc.go (100+ lines) in both internal/domain/transcript/ and internal/infrastructure/transcript/ documenting model, threat model, and versioning.
  • FR-015: System MUST register domain-transcript and infra-transcript packages in .go-arch-lint.yml with dependency rules; existing plugin rules MUST remain unchanged.
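
FR-001 and FR-003 together could be sketched as a single writer that allocates Seq, appends the line, and only then broadcasts. The sketch serializes every write under the mutex, which is a simplification of the "beyond PIPE_BUF" rule, and `Writer`, `OpenWriter`, the `broadcast` hook, and `demo` are hypothetical names. The mode passed to `os.OpenFile` applies only when the file is created and is still subject to the process umask.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"sync"
)

// Event is a pared-down envelope; only Seq and Type are modelled here.
type Event struct {
	Seq  uint64 `json:"seq"`
	Type string `json:"type"`
}

// Writer appends one JSON line per event under a single mutex.
type Writer struct {
	mu        sync.Mutex
	f         *os.File
	seq       uint64
	broadcast func(Event) // hypothetical fan-out hook; must not block
}

// OpenWriter opens <dir>/<runID>.jsonl for appending, creating it with 0o600.
func OpenWriter(dir, runID string, broadcast func(Event)) (*Writer, error) {
	path := filepath.Join(dir, runID+".jsonl")
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o600)
	if err != nil {
		return nil, err
	}
	return &Writer{f: f, broadcast: broadcast}, nil
}

// Record allocates the next Seq, writes the line to disk, and only then
// broadcasts, so subscribers never see an event that is not on disk.
func (w *Writer) Record(eventType string) error {
	w.mu.Lock()
	defer w.mu.Unlock()
	ev := Event{Seq: w.seq + 1, Type: eventType}
	line, err := json.Marshal(ev)
	if err != nil {
		return err
	}
	if _, err := w.f.Write(append(line, '\n')); err != nil {
		return err
	}
	w.seq = ev.Seq
	if w.broadcast != nil {
		w.broadcast(ev)
	}
	return nil
}

// Close closes the underlying file.
func (w *Writer) Close() error {
	w.mu.Lock()
	defer w.mu.Unlock()
	return w.f.Close()
}

// demo records n events from concurrent goroutines in a temporary directory
// and returns the number of lines on disk plus the Seq values as broadcast.
func demo(n int) (lines int, seqs []uint64, err error) {
	dir, err := os.MkdirTemp("", "transcripts")
	if err != nil {
		return 0, nil, err
	}
	defer os.RemoveAll(dir)
	w, err := OpenWriter(dir, "run-abc", func(ev Event) { seqs = append(seqs, ev.Seq) })
	if err != nil {
		return 0, nil, err
	}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() { defer wg.Done(); _ = w.Record("step") }()
	}
	wg.Wait()
	if err := w.Close(); err != nil {
		return 0, nil, err
	}
	data, err := os.ReadFile(filepath.Join(dir, "run-abc.jsonl"))
	return bytes.Count(data, []byte("\n")), seqs, err
}

func main() {
	lines, seqs, err := demo(100)
	fmt.Println(lines, len(seqs), err)
}
```

Allocating Seq inside the same critical section as the write keeps file order and Seq order identical, at the cost of calling the broadcast hook while the lock is held.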

Non-Functional Requirements

  • NFR-001: Test coverage for new packages MUST exceed 85% and make test-race MUST pass with zero data races, including concurrency tests for monotonic Seq, write-then-broadcast, idempotent Close, bounded replay, and slow-subscriber non-blocking.
  • NFR-002: Transcript file permissions MUST be 0o600; no secrets are masked by default but the design MUST NOT preclude future opt-in masking.
  • NFR-003: Plugin impact MUST be zero — capture occurs at boundaries (router seam, agent seam), not inside plugin internals.
  • NFR-004: A golden-transcript anti-divergence test MUST run in CI to detect unintended changes to the canonical stream shape.
  • NFR-005: JSONL reader MUST tolerate unknown EventType values without failing (forward-compatible decode).
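
NFR-005 could be met by keeping EventType a plain string type on the reader side, as in the sketch below. The three constant values are hypothetical placeholders for the closed vocabulary, which this spec does not enumerate, and `decodeTolerant` is an illustrative name.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// EventType is a string so that values unknown to this reader still decode.
type EventType string

// Hypothetical members of the closed vocabulary.
const (
	EventStepStart EventType = "step_start"
	EventStepEnd   EventType = "step_end"
	EventMessage   EventType = "message"
)

var knownTypes = map[EventType]bool{
	EventStepStart: true,
	EventStepEnd:   true,
	EventMessage:   true,
}

// Event is a pared-down envelope.
type Event struct {
	Seq  uint64    `json:"seq"`
	Type EventType `json:"type"`
}

// decodeTolerant decodes one line and reports whether its type is known.
// An unknown type is not an error: the event is returned with Type intact.
// Unknown JSON fields are ignored by encoding/json by default.
func decodeTolerant(line []byte) (ev Event, known bool, err error) {
	if err = json.Unmarshal(line, &ev); err != nil {
		return Event{}, false, err
	}
	return ev, knownTypes[ev.Type], nil
}

func main() {
	ev, known, err := decodeTolerant([]byte(`{"seq":9,"type":"future_event","extra":true}`))
	fmt.Println(ev.Seq, ev.Type, known, err) // prints: 9 future_event false <nil>
}
```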

Success Criteria

  • SC-001: Every workflow run produces a valid storage/transcripts/<run-id>.jsonl file with strictly monotonic Seq and no torn lines, verified by a replay tool reconstructing the full step tree.
  • SC-002: A single per-provider normalization layer absorbs Claude, Codex, Gemini, Copilot, and OpenAI HTTP divergence; no provider-specific parsing leaks outside internal/infrastructure/transcript/.
  • SC-003: New transcript packages achieve >85% test coverage and make test-race passes with zero races.
  • SC-004: Existing DisplayEvent and audit.jsonl outputs remain byte-identical for unchanged workflows (coexistence guarantee verified by golden tests).
  • SC-005: A slow subscriber consuming at 10% of producer rate causes zero blocking of disk writes and zero deadlocks across a 10k-event run.
  • SC-006: F107 can derive its facade.Event from ExchangeEvent via an identity-shaped mapping (same types, same Seq) without additional translation logic.

Key Entities

| Entity | Description | Key Attributes |
|---|---|---|
| ExchangeEvent | Envelope for every transcript line; the unit of replay and fan-out | seq (monotonic), run_id, parent_run_id, child_run_id, type (EventType), path, iteration, timestamp, payload |
| EventType | Closed vocabulary of envelope types covering step lifecycle and agent exchange | enumerated constants in internal/domain/transcript/event.go |
| ContentBlock | Typed block of agent exchange content carried within message events | type (BlockType), fidelity (router / agent_emitted), block-specific fields |
| BlockType | Closed vocabulary: text, thinking, tool_use, tool_result, command, stream | enumerated constants in internal/domain/transcript/content.go |
| Recorder | Domain port for emission and subscription | Record(ExchangeEvent) error, Subscribe() <-chan ExchangeEvent, Close() error |
| JSONLWriter | Infrastructure adapter: atomic append, O_APPEND, 0o600, mutex beyond PIPE_BUF | file path, mutex, file handle |
| FanOut | Infrastructure adapter: bounded subscriber buffer with documented drop policy | buffer size, subscribers, drop counter |
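
The entities above might translate into Go types along the following lines. This is a sketch only: the JSON keys follow the attribute names listed, but the Go field types, the two EventType constants, the `Text` field, and the `encodeLine` / `decodeLine` helpers are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

type (
	EventType string
	BlockType string
	Fidelity  string
)

const (
	// Hypothetical EventType values; the spec does not enumerate them.
	EventStepStart EventType = "step_start"
	EventMessage   EventType = "message"

	BlockText       BlockType = "text"
	BlockThinking   BlockType = "thinking"
	BlockToolUse    BlockType = "tool_use"
	BlockToolResult BlockType = "tool_result"
	BlockCommand    BlockType = "command"
	BlockStream     BlockType = "stream"

	FidelityRouter       Fidelity = "router"
	FidelityAgentEmitted Fidelity = "agent_emitted"
)

// ContentBlock carries typed content; Text is one assumed block-specific field.
type ContentBlock struct {
	Type     BlockType `json:"type"`
	Fidelity Fidelity  `json:"fidelity,omitempty"`
	Text     string    `json:"text,omitempty"`
}

// ExchangeEvent is the envelope written as one JSONL line.
type ExchangeEvent struct {
	Seq         uint64          `json:"seq"`
	RunID       string          `json:"run_id"`
	ParentRunID string          `json:"parent_run_id,omitempty"`
	ChildRunID  string          `json:"child_run_id,omitempty"`
	Type        EventType       `json:"type"`
	Path        []string        `json:"path,omitempty"`
	Iteration   *int            `json:"iteration,omitempty"`
	Timestamp   time.Time       `json:"timestamp"`
	Payload     json.RawMessage `json:"payload,omitempty"`
}

// Recorder mirrors the port signature listed in the table.
type Recorder interface {
	Record(ExchangeEvent) error
	Subscribe() <-chan ExchangeEvent
	Close() error
}

// encodeLine renders an event as a single newline-terminated JSON line.
func encodeLine(ev ExchangeEvent) ([]byte, error) {
	b, err := json.Marshal(ev)
	if err != nil {
		return nil, err
	}
	return append(b, '\n'), nil
}

// decodeLine parses one transcript line back into an envelope.
func decodeLine(line []byte) (ExchangeEvent, error) {
	var ev ExchangeEvent
	err := json.Unmarshal(line, &ev)
	return ev, err
}

func main() {
	payload, _ := json.Marshal([]ContentBlock{{Type: BlockText, Text: "hello"}})
	line, err := encodeLine(ExchangeEvent{
		Seq: 1, RunID: "run-abc", Type: EventMessage,
		Timestamp: time.Now().UTC(), Payload: payload,
	})
	if err != nil {
		fmt.Println("encode:", err)
		return
	}
	fmt.Print(string(line))
}
```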

Assumptions

  • The run_id reused for the transcript filename is the same identifier already produced by ExecutionService for state and audit channels (verification item from the spec).
  • The composed system_prompt and the resolved user prompt are both available at the agent seam at the moment of emission (verification item from the spec).
  • tools.Router.CallTool is the single in-process seam through which all builtin and plugin tool invocations pass; capture there is exhaustive for non-stdio paths.
  • DisplayEvent parsers per provider are the correct starting point for normalization and can be extended without breaking their current consumers.
  • A bounded buffer with a drop-oldest (or drop-newest, to be verified) policy is acceptable for live fan-out; disk remains the durable contract.
  • One Recorder per sub-run (separate file per call_workflow) is preferred over a shared multi-file Recorder (verification item from the spec).

Metadata

  • Status: backlog
  • Version: v0.11.0
  • Priority: high
  • Estimation: XL

Dependencies

  • Blocked by: F103
  • Unblocks: F107, F108

Clarifications

Section populated during clarify step with resolved ambiguities.

Open verification items carried from the source spec:

  • Back-pressure semantics: drop policy direction (oldest vs newest) and buffer size default.
  • Shape of the reused run_id (format, uniqueness guarantees) at the transcript seam.
  • Availability of system_prompt and resolved prompt at the exact agent seam.
  • Per-provider tool_result fidelity across CLI providers (Claude, Codex, Gemini, Copilot).
  • Conversation mode: handling of intermediate turns within a single agent step.
  • Construction of path / iteration inside executeParallel, loop_executor, executeCallWorkflowStep.
  • child_run_id generation and parent_run_id propagation: one Recorder per sub-run vs. shared multi-file.
  • Serialized result shape of custom steps.

Notes

  • Position in roadmap: 4/6, FOUNDATION. Source spec: .agent/specs/2026-06-02-agent-exchange-transcript-design.md. Research: research-improvements.md §5.
  • Cross-feature coordination with F107: facade.Event derives from ExchangeEvent — mapping should ideally be identity (same types and Seq). Resolves the §1.4 "poll vs push" question by establishing push from a single source.
  • Cross-feature coordination with F108 Axis C: ToolContent (F108) and ContentBlock.tool_result (here) share the same vocabulary and fidelity marker. The single source is defined here and reused there.
  • Why this position: F103 first cleans Codex JSONL parity so normalization starts from a clean extraction; F104 and F105 stabilize MCP and ACP boundaries before the transcript instruments them.
  • Plugin impact: none. Capture is at boundaries (router seam, agent seam), not inside plugin internals.
  • Coexistence guarantee: DisplayEvent and audit.jsonl remain unchanged for the duration of F106; their evolution is a separate concern handled in F107+.
  • Threat model (to be detailed in doc.go): file permissions 0o600, no secrets masked by default, opt-in masking path preserved, atomic append against torn writes, monotonic Seq as the only ordering authority.
  • Anti-divergence guard: a golden-transcript fixture replayed in CI detects unintended changes to envelope or block shape across refactors.
