# F106: Canonical Agent Exchange Transcript (JSONL)

## Scope

### In Scope

- Canonical append-only JSONL transcript per run (`storage/transcripts/<run-id>.jsonl`) as durable, replayable source of truth
- Domain model (`internal/domain/transcript/`): `ExchangeEvent` envelope, closed `EventType` vocabulary, `ContentBlock` with typed blocks (`text`, `thinking`, `tool_use`, `tool_result`, `command`, `stream`), `fidelity` marker (`router` / `agent_emitted`), `Recorder` port
- Infrastructure (`internal/infrastructure/transcript/`): atomic JSONL writer (`O_APPEND`, `0o600`, mutex beyond PIPE_BUF), bounded fan-out with drop policy, monotonic `Seq` allocator, write-then-broadcast composition
- Lifecycle instrumentation in `ExecutionService` for all step types (`agent`, `command`, `operation`, `terminal`, `parallel`, `for_each`, `while`, `call_workflow`, generic custom)
- Agent seam emission: `message{role:user}` carrying resolved prompt + composed `system_prompt`
- Composite step tracking via `path` / `iteration`; sub-workflow linkage via `child_run_id` / `parent_run_id` (separate file per child run)
- Tool capture at `tools.Router.CallTool` seam (in-process, plugin + builtin) producing `tool_use` + `tool_result` blocks marked `fidelity:"router"`
- Stdio proxy mode capture: `tool_use` marked `fidelity:"agent_emitted"` from NDJSON
- Per-provider normalization extending existing `DisplayEvent` parsers to single `ContentBlock` mapping (Claude, Codex with NUL handling, Gemini, Copilot, OpenAI HTTP `tool_result`)
- Live fan-out wired to a first consumer
- Arch-lint rules: `domain-transcript` + `infra-transcript`
- `doc.go` (100+ lines) for both new packages covering model, threat model, versioning

### Out of Scope

- Token-by-token streaming deltas
- Merging or removing existing `DisplayEvent` and `audit.jsonl` channels (must coexist unchanged)
- Wiring all interfaces to consume the transcript (delivered in F107)
- Default secret masking (opt-in, deferred)
- Instrumenting `awf mcp-serve` subprocess for full stdio fidelity (reserved; `fidelity` marker enables future transition)

### Deferred

| Item | Rationale | Follow-up |
|------|-----------|-----------|
| Secret masking by default | Requires policy design; opt-in path sufficient for foundation | future |
| Full stdio fidelity for `awf mcp-serve` | Out-of-process instrumentation deferred; `fidelity:"agent_emitted"` marker documents the gap honestly | future |
| Removal/merge of `audit.jsonl` and `DisplayEvent` | Coexistence guarantees zero regression while transcript stabilizes | F107+ |
| Facade `Event` derivation and consumer migration | Consumption layer is a distinct concern | F107 |
| Tool content hardening (uniform `ToolContent` surface) | Vocabulary defined here, hardened there | F108 Axis C |

---

## User Stories

### US1: Replayable run transcript (P1 - Must Have)

**As a** workflow operator,
**I want** every run to produce a single append-only JSONL file capturing the full lifecycle and agent exchange,
**So that** I can replay, audit, and reconstruct the execution tree offline without re-running the workflow.

**Why this priority**: Without a durable canonical stream, downstream features (F107 facade, F108 tool hardening) have no source of truth. This is the foundation; nothing else lands without it.

**Acceptance Scenarios:**
1. **Given** a workflow run with id `run-abc`, **When** any step or agent exchange event occurs, **Then** an envelope line is appended to `storage/transcripts/run-abc.jsonl` with monotonic `Seq`, file mode `0o600`, and atomic write semantics.
2. **Given** a completed transcript file, **When** it is read sequentially, **Then** the full step tree (including `path`, `iteration`, and child runs) can be reconstructed without loss.
3. **Given** a run is killed mid-execution, **When** the process restarts, **Then** prior events on disk remain intact and valid JSONL with no torn writes.

**Independent Test:** Run any existing workflow; assert that `storage/transcripts/<run-id>.jsonl` exists, every line decodes into `ExchangeEvent`, `Seq` is strictly monotonic, and a tree reconstruction tool produces the same step graph as the in-memory state machine.

### US2: Per-provider normalized agent exchange (P1 - Must Have)

**As a** workflow operator running agents across multiple providers,
**I want** Claude, Codex, Gemini, Copilot, and OpenAI HTTP outputs normalized into a uniform `ContentBlock` stream,
**So that** consumers handle one vocabulary instead of N divergent provider formats.

**Why this priority**: The whole point of the canonical transcript is to absorb provider divergence in exactly one place. Without normalization, every downstream feature re-implements the same parsing per provider, which defeats the purpose of the foundation.

**Acceptance Scenarios:**
1. **Given** a Codex agent step emitting JSONL with embedded NUL bytes, **When** parsed, **Then** the recorder emits well-formed `ContentBlock` entries with NUL bytes handled and no corruption.
2. **Given** a Claude agent step emitting thinking + text + tool_use, **When** parsed, **Then** the transcript contains a `thinking` block, a `text` block, and a `tool_use` block with stable identifiers.
3. **Given** any provider emitting a `tool_use` without a matching `tool_result`, **When** the step completes, **Then** the transcript records the dangling `tool_use` and the parser does not panic or drop the message.

**Independent Test:** Replay real fixtures for each provider; assert the resulting `ContentBlock` sequence matches a golden file, and that the same workflow run across providers produces structurally equivalent transcripts (same block types in the same order modulo provider-specific extras).

### US3: Tool capture at the router seam (P2 - Should Have)

**As a** workflow operator,
**I want** every `tools.Router.CallTool` invocation recorded with `fidelity:"router"`,
**So that** in-process tool calls are captured uniformly across plugin and builtin tools without depending on agent-emitted output.

**Why this priority**: The router is the single seam where every in-process tool call passes. Capturing there guarantees coverage even when an agent omits or malforms its tool_use output. Marked P2 because US1+US2 deliver the foundational stream; router capture sharpens fidelity.

**Acceptance Scenarios:**
1. **Given** a builtin tool invocation through `tools.Router.CallTool`, **When** it completes, **Then** a `tool_use` and `tool_result` block pair appears in the transcript with `fidelity:"router"`.
2. **Given** a stdio proxy plugin invocation, **When** the NDJSON `tool_use` arrives, **Then** the corresponding block is recorded with `fidelity:"agent_emitted"` and no double-counting occurs against router-captured events.

**Independent Test:** Run a workflow invoking one builtin and one plugin tool; assert exactly one `tool_use`/`tool_result` pair per call, with the correct `fidelity` marker for each path.

### US4: Bounded live fan-out (P2 - Should Have)

**As a** consumer of the transcript,
**I want** to subscribe to live events with a bounded buffer and explicit drop policy,
**So that** a slow consumer cannot stall the recorder or the workflow execution.

**Why this priority**: The transcript is the live channel for F107; back-pressure semantics must be settled now. P2 because correctness of the on-disk file (US1) is the durable contract; fan-out is the consumption affordance.

**Acceptance Scenarios:**
1. **Given** a subscriber consuming slower than the producer, **When** the buffer fills, **Then** events are dropped per the documented policy and the recorder continues writing to disk without blocking.
2. **Given** a subscriber has called `Close`, **When** further events arrive, **Then** the subscriber channel stays cleanly closed and calling `Close` again is a no-op (idempotent).

**Independent Test:** Run a producer emitting at high rate with a deliberately slow subscriber; assert the JSONL file is complete and monotonic while the subscriber receives a documented subset, with no deadlock and no panic on repeated `Close`.
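
A bounded fan-out with a non-blocking send might look like the sketch below. It assumes a drop-newest policy, which is still an open verification item, and it passes the buffer size to `Subscribe` for brevity even though the `Recorder` port takes no argument; all names are illustrative.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a trimmed envelope used only for this sketch.
type Event struct{ Seq uint64 }

// FanOut delivers events to bounded subscriber channels without blocking.
type FanOut struct {
	mu      sync.Mutex
	subs    []chan Event
	dropped uint64
	closed  bool
}

// Subscribe registers a channel with the given buffer size (an assumption;
// the spec leaves the default size open).
func (f *FanOut) Subscribe(buf int) <-chan Event {
	f.mu.Lock()
	defer f.mu.Unlock()
	ch := make(chan Event, buf)
	if f.closed {
		close(ch)
		return ch
	}
	f.subs = append(f.subs, ch)
	return ch
}

// Broadcast never blocks: when a subscriber's buffer is full, the new
// event is dropped for that subscriber and counted.
func (f *FanOut) Broadcast(ev Event) {
	f.mu.Lock()
	defer f.mu.Unlock()
	if f.closed {
		return
	}
	for _, ch := range f.subs {
		select {
		case ch <- ev:
		default:
			f.dropped++
		}
	}
}

// Close closes every subscriber channel once; later calls are no-ops.
func (f *FanOut) Close() {
	f.mu.Lock()
	defer f.mu.Unlock()
	if f.closed {
		return
	}
	f.closed = true
	for _, ch := range f.subs {
		close(ch)
	}
}

// Dropped reports how many deliveries were skipped.
func (f *FanOut) Dropped() uint64 {
	f.mu.Lock()
	defer f.mu.Unlock()
	return f.dropped
}

func main() {
	f := &FanOut{}
	ch := f.Subscribe(2) // the subscriber does not read until after Close
	for i := uint64(1); i <= 5; i++ {
		f.Broadcast(Event{Seq: i})
	}
	f.Close()
	f.Close() // idempotent
	n := 0
	for range ch {
		n++
	}
	fmt.Println(n, f.Dropped()) // prints: 2 3
}
```

A drop-oldest policy would instead discard the head of the buffer before sending; either choice keeps the disk write path independent of subscriber speed.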

### US5: Sub-workflow linkage (P3 - Nice to Have)

**As a** workflow author,
**I want** `call_workflow` steps to produce a linked child transcript with `child_run_id` and `parent_run_id`,
**So that** I can navigate from parent to child runs while keeping each file self-contained.

**Why this priority**: Composite workflows are real but the linkage shape (one Recorder per sub-run vs. shared multi-file) needs verification. P3 because single-run replay (US1) delivers value first.

**Acceptance Scenarios:**
1. **Given** a parent run invoking `call_workflow`, **When** the child run starts, **Then** the parent transcript contains a step event referencing `child_run_id`, and the child transcript's lifecycle events carry `parent_run_id`.
2. **Given** a child run completes, **When** the parent reads both files, **Then** the linked tree is reconstructible without ambiguity.

**Independent Test:** Run a workflow with `call_workflow`; assert two transcript files exist, cross-references are consistent in both directions, and reconstruction yields a single connected tree.
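
The cross-reference assertion could be sketched as below, assuming each transcript has already been decoded into a slice of events keyed by run id. `checkLinks` and the trimmed `Event` type are hypothetical; only the field names come from the Key Entities table.

```go
package main

import "fmt"

// Event carries only the linkage fields named in the Key Entities table.
type Event struct {
	RunID       string
	ParentRunID string
	ChildRunID  string
}

// references reports whether any event in the parent transcript points at childID.
func references(parent []Event, childID string) bool {
	for _, ev := range parent {
		if ev.ChildRunID == childID {
			return true
		}
	}
	return false
}

// declaresParent reports whether any event in the child transcript names parentID.
func declaresParent(child []Event, parentID string) bool {
	for _, ev := range child {
		if ev.ParentRunID == parentID {
			return true
		}
	}
	return false
}

// checkLinks (hypothetical) verifies both directions of the linkage across
// a set of transcripts keyed by run id, one slice per file.
func checkLinks(runs map[string][]Event) error {
	for runID, events := range runs {
		for _, ev := range events {
			// Child to parent: the declared parent must reference this run.
			if ev.ParentRunID != "" && !references(runs[ev.ParentRunID], runID) {
				return fmt.Errorf("run %s: parent %s never references it", runID, ev.ParentRunID)
			}
			// Parent to child: the referenced child must declare this run as parent.
			if ev.ChildRunID != "" && !declaresParent(runs[ev.ChildRunID], runID) {
				return fmt.Errorf("run %s: child %s does not declare it as parent", runID, ev.ChildRunID)
			}
		}
	}
	return nil
}

func main() {
	runs := map[string][]Event{
		"run-parent": {{RunID: "run-parent"}, {RunID: "run-parent", ChildRunID: "run-child"}},
		"run-child":  {{RunID: "run-child", ParentRunID: "run-parent"}},
	}
	fmt.Println(checkLinks(runs)) // prints: <nil>
}
```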

### Edge Cases

- Codex emits embedded NUL bytes in JSONL → parser handles them; transcript line remains valid JSON.
- Agent emits `tool_use` without `tool_result` (timeout, crash) → dangling block is recorded; no panic.
- Subscriber buffer overflows → documented drop policy applied; disk write unaffected.
- Concurrent writes from multiple goroutines → monotonic `Seq` strictly enforced; writes serialized via mutex when lines exceed `PIPE_BUF`.
- Process killed mid-write → `O_APPEND` + atomic line semantics ensure no torn lines on resume.
- Unknown `EventType` encountered by reader → tolerant decode (forward-compat); event surfaced with type intact.
- `Close` called twice on recorder or subscriber → idempotent; no panic.
- Sub-workflow recursion deeper than 1 level → each level produces its own file with consistent parent linkage.

---

## Requirements

### Functional Requirements

- **FR-001**: System MUST emit one append-only JSONL file per run at `storage/transcripts/<run-id>.jsonl` with mode `0o600`, opened with `O_APPEND`, and serialize writes when payloads exceed `PIPE_BUF`.
- **FR-002**: System MUST define `ExchangeEvent` (envelope) and `ContentBlock` (typed content) in `internal/domain/transcript/` with a closed `EventType` vocabulary and block types `text`, `thinking`, `tool_use`, `tool_result`, `command`, `stream`, plus a `fidelity` marker (`router` | `agent_emitted`).
- **FR-003**: System MUST allocate `Seq` from a single monotonic source and apply write-then-broadcast ordering (disk first, fan-out second).
- **FR-004**: System MUST instrument every step type (`agent`, `command`, `operation`, `terminal`, `parallel`, `for_each`, `while`, `call_workflow`, generic custom) in `ExecutionService` with start/end lifecycle events.
- **FR-005**: System MUST emit, at the agent seam, a `message{role:user}` event carrying the fully resolved prompt and composed `system_prompt`.
- **FR-006**: System MUST record composite step location via `path` and `iteration` fields for `parallel`, `for_each`, `while`, and `call_workflow` executors.
- **FR-007**: System MUST link sub-workflows via `child_run_id` on the parent event and `parent_run_id` on the child lifecycle events, with each run written to a separate file.
- **FR-008**: System MUST capture `tool_use` and `tool_result` at the `tools.Router.CallTool` seam with `fidelity:"router"` for both plugin and builtin tools.
- **FR-009**: System MUST capture stdio proxy `tool_use` from NDJSON with `fidelity:"agent_emitted"` and MUST NOT double-count against router-captured events.
- **FR-010**: System MUST normalize Claude, Codex (with NUL byte tolerance), Gemini, Copilot, and OpenAI HTTP outputs into the unified `ContentBlock` vocabulary in a single mapping layer extending the existing `DisplayEvent` parsers.
- **FR-011**: System MUST expose a `Recorder` port with `Record`, `Subscribe`, and `Close` operations; `Close` MUST be idempotent.
- **FR-012**: System MUST provide bounded fan-out with a documented buffer size and drop policy so a slow subscriber cannot block writes.
- **FR-013**: System MUST preserve existing `DisplayEvent` and `audit.jsonl` channels unchanged (coexistence).
- **FR-014**: System MUST publish `doc.go` (100+ lines) in both `internal/domain/transcript/` and `internal/infrastructure/transcript/` documenting model, threat model, and versioning.
- **FR-015**: System MUST register `domain-transcript` and `infra-transcript` packages in `.go-arch-lint.yml` with dependency rules; existing plugin rules MUST remain unchanged.

### Non-Functional Requirements

- **NFR-001**: Test coverage for new packages MUST exceed 85% and `make test-race` MUST pass with zero data races, including concurrency tests for monotonic `Seq`, write-then-broadcast, idempotent `Close`, bounded replay, and slow-subscriber non-blocking.
- **NFR-002**: Transcript file permissions MUST be `0o600`; no secrets are masked by default but the design MUST NOT preclude future opt-in masking.
- **NFR-003**: Plugin impact MUST be zero — capture occurs at boundaries (router seam, agent seam), not inside plugin internals.
- **NFR-004**: A golden-transcript anti-divergence test MUST run in CI to detect unintended changes to the canonical stream shape.
- **NFR-005**: JSONL reader MUST tolerate unknown `EventType` values without failing (forward-compatible decode).
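
NFR-005 can be met by keeping `EventType` a plain string on the wire and validating membership separately from decoding. A minimal sketch follows; apart from `message`, the constant names and values are hypothetical, and `decodeLine` is an illustrative helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// EventType stays a string so unknown values decode without error.
type EventType string

const (
	EventMessage     EventType = "message"
	EventStepStarted EventType = "step.started" // hypothetical value
)

// Known reports whether t belongs to the vocabulary this reader was built with.
func (t EventType) Known() bool {
	switch t {
	case EventMessage, EventStepStarted:
		return true
	}
	return false
}

// ExchangeEvent keeps the payload raw so unknown event types pass through intact.
type ExchangeEvent struct {
	Seq     uint64          `json:"seq"`
	Type    EventType       `json:"type"`
	Payload json.RawMessage `json:"payload,omitempty"`
}

// decodeLine (hypothetical) decodes one JSONL line without rejecting
// unknown types; callers decide what to do with them via Known.
func decodeLine(line []byte) (ExchangeEvent, error) {
	var ev ExchangeEvent
	err := json.Unmarshal(line, &ev)
	return ev, err
}

func main() {
	ev, err := decodeLine([]byte(`{"seq":7,"type":"future.event","payload":{"x":1}}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(ev.Type, ev.Type.Known(), string(ev.Payload)) // prints: future.event false {"x":1}
}
```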

---

## Success Criteria

- **SC-001**: Every workflow run produces a valid `storage/transcripts/<run-id>.jsonl` file with strictly monotonic `Seq` and no torn lines, verified by a replay tool reconstructing the full step tree.
- **SC-002**: A single per-provider normalization layer absorbs Claude, Codex, Gemini, Copilot, and OpenAI HTTP divergence; no provider-specific parsing leaks outside `internal/infrastructure/transcript/`.
- **SC-003**: New transcript packages achieve >85% test coverage and `make test-race` passes with zero races.
- **SC-004**: Existing `DisplayEvent` and `audit.jsonl` outputs remain byte-identical for unchanged workflows (coexistence guarantee verified by golden tests).
- **SC-005**: A slow subscriber consuming at 10% of producer rate causes zero blocking of disk writes and zero deadlocks across a 10k-event run.
- **SC-006**: F107 can derive its `facade.Event` from `ExchangeEvent` via an identity-shaped mapping (same types, same `Seq`) without additional translation logic.

---

## Key Entities

| Entity | Description | Key Attributes |
|--------|-------------|----------------|
| `ExchangeEvent` | Envelope for every transcript line; the unit of replay and fan-out | `seq` (monotonic), `run_id`, `parent_run_id`, `child_run_id`, `type` (`EventType`), `path`, `iteration`, `timestamp`, `payload` |
| `EventType` | Closed vocabulary of envelope types covering step lifecycle and agent exchange | enumerated constants in `internal/domain/transcript/event.go` |
| `ContentBlock` | Typed block of agent exchange content carried within message events | `type` (`BlockType`), `fidelity` (`router` \| `agent_emitted`), block-specific fields |
| `BlockType` | Closed vocabulary: `text`, `thinking`, `tool_use`, `tool_result`, `command`, `stream` | enumerated constants in `internal/domain/transcript/content.go` |
| `Recorder` | Domain port for emission and subscription | `Record(ExchangeEvent) error`, `Subscribe() <-chan ExchangeEvent`, `Close() error` |
| `JSONLWriter` | Infrastructure adapter: atomic append, `O_APPEND`, `0o600`, mutex beyond `PIPE_BUF` | file path, mutex, file handle |
| `FanOut` | Infrastructure adapter: bounded subscriber buffer with documented drop policy | buffer size, subscribers, drop counter |

---

## Assumptions

- The `run_id` reused for the transcript filename is the same identifier already produced by `ExecutionService` for state and audit channels (verification item from the spec).
- The composed `system_prompt` and the resolved user prompt are both available at the agent seam at the moment of emission (verification item from the spec).
- `tools.Router.CallTool` is the single in-process seam through which all builtin and plugin tool invocations pass; capture there is exhaustive for non-stdio paths.
- `DisplayEvent` parsers per provider are the correct starting point for normalization and can be extended without breaking their current consumers.
- A bounded buffer with a drop-oldest (or drop-newest, to be verified) policy is acceptable for live fan-out; disk remains the durable contract.
- One Recorder per sub-run (separate file per `call_workflow`) is preferred over a shared multi-file Recorder (verification item from the spec).

---

## Metadata

- **Status**: backlog
- **Version**: v0.11.0
- **Priority**: high
- **Estimation**: XL

## Dependencies

- **Blocked by**: F103
- **Unblocks**: F107, F108

## Clarifications

_Section populated during clarify step with resolved ambiguities._

Open verification items carried from the source spec:

- Back-pressure semantics: drop policy direction (oldest vs newest) and buffer size default.
- Shape of the reused `run_id` (format, uniqueness guarantees) at the transcript seam.
- Availability of `system_prompt` and resolved prompt at the exact agent seam.
- Per-provider `tool_result` fidelity across CLI providers (Claude, Codex, Gemini, Copilot).
- Conversation mode: handling of intermediate turns within a single agent step.
- Construction of `path` / `iteration` inside `executeParallel`, `loop_executor`, `executeCallWorkflowStep`.
- `child_run_id` generation and `parent_run_id` propagation: one Recorder per sub-run vs. shared multi-file.
- Serialized result shape of custom steps.

## Notes

- **Position in roadmap**: 4/6, FOUNDATION. Source spec: `.agent/specs/2026-06-02-agent-exchange-transcript-design.md`. Research: `research-improvements.md` §5.
- **Cross-feature coordination with F107**: `facade.Event` derives from `ExchangeEvent`; the mapping should ideally be identity (same types and `Seq`). This resolves the §1.4 "poll vs push" question by establishing push from a single source.
- **Cross-feature coordination with F108 Axis C**: `ToolContent` (F108) and `ContentBlock.tool_result` (here) share the same vocabulary and `fidelity` marker. The single source is defined here and reused there.
- **Why this position**: F103 first cleans Codex JSONL parity so normalization starts from a clean extraction; F104 and F105 stabilize MCP and ACP boundaries before the transcript instruments them.
- **Plugin impact**: none. Capture is at boundaries (router seam, agent seam), not inside plugin internals.
- **Coexistence guarantee**: `DisplayEvent` and `audit.jsonl` remain unchanged for the duration of F106; their evolution is a separate concern handled in F107+.
- **Threat model** (to be detailed in `doc.go`): file permissions `0o600`, no secrets masked by default, opt-in masking path preserved, atomic append against torn writes, monotonic `Seq` as the only ordering authority.
- **Anti-divergence guard**: a golden-transcript fixture replayed in CI detects unintended changes to envelope or block shape across refactors.


F106: Canonical Agent Exchange Transcript (JSONL) #369
