
cn8: workflow steps emit structured beads records as canonical output (graph as source of truth)#4

Open
richardkiene wants to merge 43 commits into main from feat/cn8-structured-records

@richardkiene

Epic millworks-cn8 (ADR-0009 D44). A workflow step's canonical output becomes first-class structured beads records (decision, risk, requirement, intent, task, healing — each carrying its prose in its description), not a single prose blob in STEP notes. The beads graph is the source of truth for "what was decided / what happened / what needs doing". Cross-surface (pi + Claude), lockstep. Builds on c30 (#3) and inc5 run-tracking/settle.

What's in this PR (11 beads)

Shared core

  • tools/millworks-emit — the sole least-privilege beads write-path. emit auto-stamps step:/wfrun: labels + a discovered-from link from MILLWORKS_STEP_ID/WFRUN_ID; complete --summary sets STEP notes + the self-report:complete marker. (millworks-thz)
  • persona-picker parses an emits: frontmatter field and surfaces it in JSON. (millworks-40a)
  • context-pack-assembler expands a scoped STEP into its emitted records, notes-only when none: a graceful c30 superset. (millworks-2qe)
  • requirement is registered as a custom type. (millworks-6q0)
  • The millworks:beads skill documents the emit mechanics. (millworks-clb)
  • Personas declare conservative emits contracts, and the 5 emitting ones are rewritten to records. (millworks-kma)

Both surfaces (pi extensions/workflow-runner, Claude surfaces/claude/mcp-server), lockstep

  • Dispatch injects step/wfrun env + emit access + a universal completion instruction (every step must millworks-emit complete to settle; emit-types requirement only when emits is non-empty). (millworks-ypd / millworks-d8q)
  • Settle flipped to beads authority: the self-report:complete marker is the settle trigger (transcript/pane demoted to a health signal); the runtime validates the emits contract and is the sole writer of the outcome:success close (validate-then-commit; a contract violation kills the pane and re-dispatches; the inc5 transcript→notes write is removed — notes come from the agent). (millworks-q2h / millworks-kaa)
  • Recovery re-resolves each recovered step's emits and re-validates a marker-seen step (no false auto-pass; a persona that can't be re-resolved fails the run). (millworks-1i7)

Why this matters

Solves the fragile-settle problem: a user interrupting a subagent (ending its transcript turn) no longer reads as "settled" — settlement is now a durable, content-addressed outcome in beads, definitive across crashes and interruptions.

Verification

  • Claude 328 tests, pi 186 tests, Rust crates green; unit + gated real-bd smokes on both surfaces.
  • A cross-surface reconciliation review caught and fixed a Claude miss (marker loop built but unwired) and several lockstep divergences; the two surfaces are byte-lockstep on the completion instruction and behaviorally lockstep on the settle/recovery state machines.

Deferred (tracked beads, not blocking)

  • millworks-26e — live end-to-end + parity verification (owner-driven plugin rebuild + driven run with kill/recover). Pending.
  • millworks-5wz — pi emit-scoping hardening (pi --tools has no per-command scoping, so emitting personas get full bash for now — Decision A; structural scoping tracked).
  • millworks-qaq — direct-persona: steps skip the emits contract (a persona:-pinned step bypasses the picker → emits:[]).

Pre-existing (NOT cn8 regressions, will show in CI)

  • millworks-rrp — 4 context-pack-assembler unit tests failing on main.
  • millworks-7s4 — pi vitest picks up ambient.d.ts as a test file (false failure).

Design in bd show millworks-cn8 --design + ADR-0009 D44.

Record the resolved design for millworks-cn8 (steps emit structured beads
records as canonical output; graph as source of truth) as ADR-0009 D44, and
the 11 child planning beads (millworks-thz/40a/clb/2qe/kma/ypd/d8q/q2h/kaa/1i7/26e)
exported to .beads/issues.jsonl. Design canonical in 'bd show millworks-cn8 --design'.
…llworks-clb)

Adds "Emitting structured output (workflow steps)" to the shared
millworks:beads skill (content/skills/beads/SKILL.md) — DRY mechanics
live once here, referenced by every emitting persona (ADR-0009 D44 M-4).

Covers: prose-in-description principle (D-c); millworks-emit emit and
complete subcommand interfaces verbatim; auto-stamping of step:/wfrun:/
discovered-from by the CLI (agents must not hand-stamp); optional --link
for domain links between emitted records; the self-report:complete
terminal marker as the final act (D-g); emits contract concept (D-a/D-b);
worked requirements-analyst example; and a "What NOT to do" guard list.

Add `emits: [<type>...]` frontmatter support to the shared persona loader
so both runtimes receive a persona's output contract (ADR-0009 D44 D-a).

Changes:
- `RawFrontmatter`: add `emits: Option<serde_yaml::Value>` (mirrors tools)
- `Persona`: add `emits: Vec<String>` (normalized; absent → empty vec)
- `PickResult`: add `emits: Vec<String>` (surfaced in picker JSON output)
- `PickerError::MalformedEmits`: fail-fast for non-string/non-list emits
- `normalize_string_or_list()`: shared helper (DRY) — string or list →
  Vec<String>; absent → []; malformed → MalformedEmits error
- All PickResult construction sites in picker.rs carry emits through
- 6 new unit tests (list, string, absent, integer, mapping, PickResult)
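
The normalization rule above (string or list → list; absent → empty; anything else fails fast) can be sketched in TypeScript. This is only an illustration of the rule, not the Rust helper itself: `MalformedEmitsError` and the `unknown`-typed input are assumptions, and the source does not say how an explicit YAML null is treated, so this sketch rejects it.

```typescript
// Hypothetical error type standing in for PickerError::MalformedEmits.
class MalformedEmitsError extends Error {
  constructor(detail: string) {
    super(`malformed emits: ${detail}`);
    this.name = "MalformedEmitsError";
  }
}

// `value` is whatever a YAML parser produced for the `emits` key
// (undefined when the key is absent). Assumption: null is treated as malformed.
function normalizeStringOrList(value: unknown): string[] {
  if (value === undefined) return []; // absent → empty contract
  if (typeof value === "string") return [value]; // bare string → one-element list
  if (Array.isArray(value)) {
    const out: string[] = [];
    for (const item of value) {
      if (typeof item !== "string") {
        // A list such as [requirement, 42] fails fast on the non-string element.
        throw new MalformedEmitsError(`non-string element ${JSON.stringify(item)}`);
      }
      out.push(item);
    }
    return out;
  }
  // Integers, mappings, and anything else fail fast.
  throw new MalformedEmitsError(`expected a string or a list, got ${typeof value}`);
}
```

For example, `normalizeStringOrList("intent")` yields a one-element list, while `normalizeStringOrList({ a: 1 })` throws.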
…orks-thz)

New Rust crate `tools/millworks-emit` — the sole beads write-path granted to
Millworks workflow subagents (least-privilege; no arbitrary shell). Realizes
ADR-0009 D44 decisions M-2, M-3, M-5, D-d, D-g.

CLI surface (canonical):
  millworks-emit emit --type <T> --title <S> --description <S> [--link <type>:<id>…]
    Creates a bd record, stamps step:<id>/wfrun:<id> labels and a
    discovered-from link (FROM new record TO STEP). Prints new id to stdout.
  millworks-emit complete --summary <S>
    Sets STEP notes to <S> then adds self-report:complete label (in that order).
Both subcommands fail fast (non-zero, clear stderr) if MILLWORKS_STEP_ID or
MILLWORKS_WFRUN_ID is unset/empty.
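
As a rough TypeScript illustration of the fail-fast and auto-stamping behavior described above (the real CLI is a Rust crate; `requireEnv` and `provenanceLabels` here are hypothetical names, and the environment is passed in as a plain object so the sketch stays self-contained):

```typescript
type EnvMap = Record<string, string | undefined>;

// Fail fast when a required variable is unset or empty. The value is trimmed
// so padding cannot leak into a label, mirroring the trim fix noted for
// env::require_env later in this PR.
function requireEnv(env: EnvMap, name: string): string {
  const value = (env[name] ?? "").trim();
  if (value.length === 0) {
    throw new Error(`${name} is unset or empty`);
  }
  return value;
}

// Build the two provenance labels that every emitted record is stamped with.
function provenanceLabels(env: EnvMap): string[] {
  const stepId = requireEnv(env, "MILLWORKS_STEP_ID");
  const wfrunId = requireEnv(env, "MILLWORKS_WFRUN_ID");
  return [`step:${stepId}`, `wfrun:${wfrunId}`];
}
```

Because the ids come only from the environment, an agent never has to know or hand-stamp its own step or run id.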

Design:
  - bd I/O isolated in `runner::BdRunner` trait + `RealBdRunner` impl so
    argv construction in `commands.rs` is unit-testable without bd (mirrors
    assembler's run_bd_show seam pattern).
  - `parse_created_id` handles mixed warning+JSON stdout from bd create --json.
  - `tools/millworks/src/lib.rs`: added "millworks-emit" to MILLWORKS_BINARIES
    so millworks setup and build-claude both provision it (install.sh + bin/ symlink
    in the Claude plugin — same wiring as the other shared-core CLIs).

Tests: 33 unit tests (argv construction, env fail-fast, id parsing) + 4 real-bd
smoke tests (gated MILLWORKS_SMOKE=1): emit attribution round-trip verifies
step:/wfrun: labels and discovered-from link; complete verifies notes + label.
…ote)

Review polish for millworks-clb:
- --link synopsis metavar uses TYPE:TARGET to match millworks-emit's
  clap value_name.
- Note that the requirement record type is registered by the cn8
  rollout (millworks-6q0 adds the table row), so a reader isn't
  confused by its absence from the 9-types table.
- Make normalize_string_or_list genuinely reusable (DRY): MalformedEmits
  now carries `field: String`, included in the Display message; call site
  passes "emits". Removed the false comment and the `let _ = field;` no-op.
- Malformed-emits unit tests now assert the error names the `emits` field.
- New unit test: explicit `emits: []` (YAML empty sequence) -> empty Vec.
- New unit test: list with a non-string element (emits: [requirement, 42])
  -> fail-fast MalformedEmits.
- New integration test: run the binary against a fixture persona with a
  non-empty emits and assert the JSON output's `emits` array values.
- parse_created_id: replace hand-rolled brace counting (which corrupted on a
  literal `}` inside a title/description, e.g. `closes {issue}`) with a
  serde_json streaming parse from the first `{` — respects string contents.
  Adds two unit tests (brace-in-title, warning-prefix + brace-in-string).
- env::require_env: return the TRIMMED value so a padded env can't leak
  whitespace into a `step:`/`wfrun:` label. Adds a trim test.
- EmitArgs: drop the unused `extra_links` field (it's applied post-create via
  emit_argv, not a create input) — removes an unnecessary clone in main.rs and
  makes the struct cohesive.
- Move the gated real-bd smokes from `src/smoke_tests.rs` (a pub module that
  compiled into the release lib) to `tests/smoke.rs` (an integration-test crate,
  test-only) — the idiom used by context-pack-assembler. Resolves the binary via
  CARGO_BIN_EXE_millworks-emit instead of path-guessing, and asserts `bd dep
  list` exit success so a broken dep-list can't be silently ignored.
- Fix two clippy doc-list warnings in lib.rs.
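
To illustrate why naive brace counting breaks on a literal `}` inside a string, here is a TypeScript sketch of a string-aware scan. It is not the serde_json streaming parse the commit uses; it swaps in a hand-written scanner that tracks string and escape state, then hands the slice to `JSON.parse`. The `{"id": ...}` output shape and the example ids are assumptions.

```typescript
// Return the first complete top-level JSON object found in mixed output,
// ignoring braces that appear inside JSON string literals.
function extractFirstJsonObject(stdout: string): string {
  const start = stdout.indexOf("{");
  if (start < 0) throw new Error("no JSON object in output");
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < stdout.length; i++) {
    const ch = stdout[i];
    if (inString) {
      if (escaped) escaped = false; // character after a backslash is literal
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}") {
      depth--;
      if (depth === 0) return stdout.slice(start, i + 1);
    }
  }
  throw new Error("unterminated JSON object in output");
}

// Assumption: the created record's id is a string under the `id` key.
function parseCreatedId(stdout: string): string {
  const parsed: unknown = JSON.parse(extractFirstJsonObject(stdout));
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("output is not a JSON object");
  }
  const id = (parsed as Record<string, unknown>)["id"];
  if (typeof id !== "string" || id.length === 0) {
    throw new Error("output has no string id");
  }
  return id;
}
```

A title such as `closes {issue}` or even a lone `}` inside a string no longer shifts the object boundary, because braces are only counted outside string literals.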

All green: 27 lib + 5 bin unit tests, 4 real-bd smokes (MILLWORKS_SMOKE=1),
clippy clean.

Add 'requirement' to the custom beads types list (intent,risk,healing,
wfrun,step → +requirement) so cn8 requirements-analyst personas can emit
first-class queryable requirement records rather than modeling them as
task/feature.

- recipes/init-beads.sh: CUSTOM_TYPES gains requirement; count comments
  updated (5→6 custom, 9→10 total)
- docs/beads-mapping.md: Requirement row in summary table; full per-type
  detail section added before WFRUN section
- docs/adr/0003-beads-schema-mapping.md: D16 updated to 10 types/6 custom;
  REQUIREMENT row in domain table; bd config set example and Consequence
  paragraph updated for cn8
- content/skills/beads/SKILL.md: "The 9 record types" heading → 10; new
  Requirement row in Domain records table; error-recovery snippet updated

Verified: bd types lists 'requirement'; bd create -t requirement succeeds
in a fresh scratch workspace.
…-2qe)

When the context-pack-assembler renders a scoped STEP, after the notes
summary it now queries bd list --label step:<id> --json, gathers the
emitted records, and renders each as type+id+title+description under an
"#### Emitted Records" sub-heading (D44 D-e).

Key mechanics:
- run_bd_list_by_label: isolated bd I/O seam for the label query
- render_emitted_records: pure fn over raw JSON list (unit-testable
  without bd); skips malformed records (fail-fast per record), returns ""
  for zero records
- summarize_bd_record_with_emits: pure fn composing the step heading +
  notes + emitted-records block; "" emits block => output identical to
  c30 (superset/graceful-degrade rule)
- summarize_bd_record delegates to summarize_bd_record_with_emits("")
  so all existing c30 tests pin unchanged behavior

New tests (all pass):
- render_emitted_records_lists_type_id_description_per_record
- render_emitted_records_empty_list_returns_empty_string
- render_emitted_records_tolerates_missing_optional_fields
- step_with_zero_emitted_records_renders_notes_only_identical_to_c30
- step_with_emitted_records_appends_them_after_notes
- smoke_step_with_emitted_records_surfaces_in_bundle (MILLWORKS_SMOKE=1)

Pre-existing rrp failures (bare_task_only, task_with_persona,
non_skill_dir_is_ignored, pruning_occurs_when_over_budget) unchanged.

Declare emits contracts in all 20 content/agents/*.md personas and rewrite
Output sections for the 5 roles with non-empty contracts.

Emits mapping applied:
  intake-interviewer    -> emits: [intent]
  requirements-analyst  -> emits: [requirement]
  plan-reviewer         -> emits: [decision]
  architect             -> emits: [decision]
  plan-writer           -> emits: [task]
  all others (15)       -> emits: []

For the 5 non-empty emits personas the Output section is rewritten so that
canonical output is structured beads records emitted via millworks-emit, with
full prose in each record's --description field. Each rewrite:
- instructs emit per unit of substance (one intent / requirement / decision / task)
- uses --link for domain links between emitted records
- ends with millworks-emit complete --summary as the terminal act
- cross-references the millworks:beads skill for mechanics (DRY)
- preserves the persona's posture and quality voice

For the 15 emits:[] personas only the frontmatter field is added; no body
changes (clean audits/reviews find nothing and must still settle).

Verified: cargo test -p persona-picker all 53 tests pass; manual pick check
confirms requirements-analyst -> emits:[requirement], all others as mapped.
…truction at dispatch (millworks-ypd)

D44 M-1: inject MILLWORKS_STEP_ID/MILLWORKS_WFRUN_ID into the spawned pane's
environment via tmux -e so millworks-emit can stamp provenance without the
subagent knowing its own ids.

D44 M-2: always grant Bash(millworks-emit:*) in allowedTools for every
workflow-step subagent (least-privilege scoped emit path); mapStepTools now
returns string[] (never undefined) with the emit tool always appended.

D44 M-4: generate the output-contract instruction from the dispatched persona's
emits (single source: frontmatter → picker → drive loop → dispatch args).
buildContractInstruction(emits) returns undefined for empty emits (uniform rule:
emits: [] → no instruction, cn8 a clean superset of c30). The real impl in
index.ts appends the instruction to the assembled bundle file before spawning.

Widen resolvePersonaViaCli to return { file, emits } (was: string | null).
The picker output already contained emits (PickResult.emits); the TypeScript
cast at workflow-cli.ts:100 is now widened to include it. Direct persona:
references return emits: [] (no picker invoked).

Tests: 8 new unit tests (TDD: watched each fail before implementing);
276 total passing (up from 268); typecheck clean.
…st (millworks-d8q)

D44 M-1/M-2/M-4 on the pi surface (lockstep mirror of Claude ypd):

- buildWrapperEnvExports: injects MILLWORKS_STEP_ID / MILLWORKS_WFRUN_ID as
  export lines in the subagent wrapper.sh (single-quoted, process-env durable)
- addEmitToolAccess: ensures 'bash' is in the pi --tools allowlist when the
  persona declares a non-empty emits contract (least-privilege emit path; bash
  is the closest pi analog to Claude Code's Bash(millworks-emit:*))
- buildContractInstruction: generates the canonical output-contract instruction
  from the persona emits list (null for empty emits — degrades to c30); appended
  to the assembler bundle, not a separate flag
- resolveRoleToPersona: widened from Promise<string> to Promise<PersonaPickResult>
  = { file, emits } to carry the persona-picker emits output through to dispatch
- dispatchStep: wires all three mechanics; personaEmits drives the conditional
  tool-access and instruction injection
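
A minimal sketch of the single-quoted export lines, assuming a POSIX shell wrapper. The function body and the quoting helper are illustrative rather than the actual implementation; the hard throw on an empty id follows the fail-fast rule this PR applies to env injection.

```typescript
// Quote a value for a POSIX shell: wrap in single quotes and rewrite each
// embedded single quote as '\'' (close quote, escaped quote, reopen quote).
function shellSingleQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Produce the export lines written into the subagent wrapper script.
function buildWrapperEnvExports(stepId: string, wfrunId: string): string[] {
  if (stepId.length === 0 || wfrunId.length === 0) {
    // An empty id would make provenance stamping fail or mis-attribute.
    throw new Error("step id and wfrun id must be non-empty");
  }
  return [
    `export MILLWORKS_STEP_ID=${shellSingleQuote(stepId)}`,
    `export MILLWORKS_WFRUN_ID=${shellSingleQuote(wfrunId)}`,
  ];
}
```

Single quotes prevent the shell from expanding `$`, backticks, or globs in the id, which is why they are the safer default for generated scripts.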

22 new unit tests (buildContractInstruction, addEmitToolAccess, buildWrapperEnvExports).
150 pass total (was 128); 4 pre-existing MILLWORKS_SMOKE smokes skipped.
…lowedTools to string[] (millworks-ypd review)
…d (millworks-d8q)

Review fixes:
- addEmitToolAccess doc: stop claiming a scoped-PATH security property that
  isn't implemented. The wrapper inherits full PATH/rc, so the bash grant is
  full bash; the contract instruction is a behavioral nudge only. Structural
  per-command scoping is tracked as hardening bead millworks-5wz.
- dispatchStep env injection: replace `state.stepRecords[step.id] ?? ""` silent
  fallback with a hard throw — an empty MILLWORKS_STEP_ID would make
  millworks-emit mis-attribute/fail silently (project fail-fast rule).

150 tests pass (pre-existing ambient.d.ts glob false-failure tracked as millworks-7s4).

1. plan-writer: phase->phase ordering link example used the wrong link type
   (until is task->decision). Changed to blocks:<phase-task-id> and fixed the
   comment to describe phase ordering, not a decision gate.

2. decompile-synthesizer: converted its record-writing from raw bd create to
   millworks-emit emit (risk + decision). bd create bypasses the auto-stamp of
   step:/wfrun:/discovered-from, leaving records unattributed and invisible to
   assembler expansion. millworks-emit is the only granted, attributed write
   path. provenance:decompiled (a label, not supported by the emit CLI surface)
   folded into the decision --description prose. bd remember stays as a direct
   bd call (free-text memory, not a step-output record). No required-records
   language added — persona remains emits:[].

3. plan-reviewer: completion-summary template used man-page optional-bracket
   notation [, <N> risk] that an LLM might emit literally; rewritten as plain
   prose.

Re-verified: cargo test -p persona-picker (53 + 8 tests) green; the three
edited personas parse with expected emits (task / decision / []).
…lworks-2qe review)

Addresses fail-fast review findings (project rule: never silence errors):

1. run_bd_show no longer swallows a bd list COMMAND failure into "zero
   records" (notes-only). run_bd_list_by_label's Err now propagates via `?`.
   A command that SUCCEEDS but lists zero records still degrades silently to
   notes-only (legitimate D-e graceful-degrade) — distinguished from a real
   command failure.

2. render_emitted_records now returns Result<String> and FAILS FAST on
   malformed bd output (non-empty non-JSON, non-array, or a record missing a
   required `id`/`title` field) via new AssemblerError::MalformedRecord,
   instead of silently dropping bad records. A valid empty array `[]` and an
   empty/blank input string remain the legitimate Ok("") degrade path.

3. summarize_bd_record_with_emits now returns Result<Option<String>> so the
   malformed-record error propagates through the seam. summarize_bd_record
   keeps its Option surface for the c30 tests (empty emits can't be malformed).

4. Fixed the contradictory doc comment that claimed "skips malformed ones,
   keeps valid ones" — now describes the actual fail-fast behavior.

5. main.rs maps MalformedRecord to exit code 2 (bad-data class).
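
The distinction between the legitimate empty path and the fail-fast path can be sketched in TypeScript as follows. The real function is Rust and returns a `Result`. Here a thrown `MalformedRecordError` plays that role, and both the rendered line layout and the `type` field name are assumptions for illustration.

```typescript
// Hypothetical stand-in for AssemblerError::MalformedRecord.
class MalformedRecordError extends Error {
  constructor(detail: string) {
    super(`malformed emitted record: ${detail}`);
    this.name = "MalformedRecordError";
  }
}

function renderEmittedRecords(rawJson: string): string {
  // Blank input is the legitimate notes-only degrade path.
  if (rawJson.trim() === "") return "";
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawJson);
  } catch (_e) {
    throw new MalformedRecordError("output is not JSON");
  }
  if (!Array.isArray(parsed)) throw new MalformedRecordError("output is not a JSON array");
  if (parsed.length === 0) return ""; // valid empty array: also notes-only

  const lines: string[] = ["#### Emitted Records"];
  for (const rec of parsed as unknown[]) {
    if (typeof rec !== "object" || rec === null) {
      throw new MalformedRecordError("record is not an object");
    }
    const r = rec as Record<string, unknown>;
    const id = r["id"];
    const title = r["title"];
    if (typeof id !== "string" || typeof title !== "string") {
      throw new MalformedRecordError("record is missing required id/title");
    }
    // Optional fields are tolerated when absent.
    const type = typeof r["type"] === "string" ? (r["type"] as string) : "untyped";
    lines.push(`- [${type}] ${id}: ${title}`);
    if (typeof r["description"] === "string") lines.push(`  ${r["description"] as string}`);
  }
  return lines.join("\n");
}
```

The key property is that "zero records" and "bad data" never collapse into the same return value.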

New tests (all pass):
- render_emitted_records_fails_fast_on_malformed_json
- render_emitted_records_fails_fast_on_record_missing_required_field
- summarize_propagates_malformed_emits_error
- render_emitted_records_empty_list_returns_empty_string (now asserts Ok(""))

Test results: 31 pass, 4 pre-existing rrp failures unchanged, 1 smoke
(MILLWORKS_SMOKE=1) ignored by default and passing against live bd.
… runtime closes (millworks-kaa)

STATE MACHINE (lockstep with Claude q2h):
- marker=YES → validate emits → SETTLED (runtime writes outcome:success)
- marker=YES + contract unmet → EmitsContractError → retry path (no false success)
- marker=NO + pane dead → crashed → existing retry/fail path
- marker=NO + pane alive → still running (interruption is not a failure)
- timeout + no marker → TIMEOUT → retry path
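
The transitions listed above can be read as a pure classification over four observations. The sketch below is an assumption-laden illustration, not the `waitForSettle` implementation: the type and function names are hypothetical, and the timeout is checked first because a later review note in this PR states that the timeout check precedes the marker check.

```typescript
type SettleStateSketch = "settled" | "contract-unmet" | "crashed" | "running" | "timeout";

interface SettleObservation {
  markerPresent: boolean; // self-report:complete label seen in beads
  contractMet: boolean;   // every required emits type has at least one record
  paneAlive: boolean;     // health input only, never the settle trigger
  timedOut: boolean;      // elapsed >= timeout
}

function classifySettle(o: SettleObservation): SettleStateSketch {
  if (o.timedOut) return "timeout"; // backstop, goes to the retry path
  if (o.markerPresent) {
    // Beads is the authority: the marker triggers validation, not success.
    return o.contractMet ? "settled" : "contract-unmet";
  }
  // No marker: a live pane means "still running", even after an interruption.
  return o.paneAlive ? "running" : "crashed";
}
```

Note that `paneAlive` only matters when the marker is absent, which is what demoting the pane to a health signal means in practice.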

CHANGES:
- waitForSettle: reworked to poll bdHasMarker (beads is settle AUTHORITY);
  transcript/done-file/pane demote to HEALTH inputs. Injectable WaitForSettleDeps
  for deterministic unit testing (DI seam per millworks-n0f intent).
- validateEmitsContract: validate-then-commit before any outcome:success write.
  Throws EmitsContractError for missing required types (fail-fast, never silent).
- markStepSettled: calls validateEmitsContract BEFORE writing outcome:success
  (the sole-writer invariant; agent never writes terminal state — D44 D-g).
- stepProduced removed from processReadyStep: agent's `millworks-emit complete`
  already sets STEP notes; runtime must not overwrite them (inc5 notes-write removed).
- buildContractInstruction: ALWAYS returns the completion instruction for ALL steps
  (universal-completion); emit-types requirement APPENDED only when emits non-empty.
  COMPLETION_INSTRUCTION constant exported for lockstep verification.
- addEmitToolAccess: bash granted for ALL steps (not conditioned on emits.length).
  Every step needs millworks-emit complete access (the universal settle signal).
- bdHasMarker + bdCountEmittedByType: new bd helpers for marker poll and validation.
- drainSessionFile: extracted from old waitForSettle for progress/health use.
- StepResult.personaEmits: new field threads persona emits from dispatchStep to
  markStepSettled for post-settle validation.
- adoptStep: updated to use new waitForSettle + bdHasMarker; cwd added to signature.
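
A minimal sketch of validate-then-commit, assuming a synchronous counting callback in place of the real bd query. The class shape and the function signatures are illustrative, not the actual ones.

```typescript
// Hypothetical shape for the error thrown on an unmet emits contract.
class EmitsContractError extends Error {
  missing: string[];
  constructor(missing: string[]) {
    super(`emits contract unmet, no records of type: ${missing.join(", ")}`);
    this.name = "EmitsContractError";
    this.missing = missing;
  }
}

// `countByType` stands in for a per-type record count scoped to the step.
function validateEmitsContract(
  required: readonly string[],
  countByType: (type: string) => number,
): void {
  const missing = required.filter((type) => countByType(type) === 0);
  if (missing.length > 0) throw new EmitsContractError(missing);
}

// Validation runs BEFORE the write, so a violation can never leave an
// outcome:success behind. An empty contract validates trivially.
function markStepSettled(
  required: readonly string[],
  countByType: (type: string) => number,
  writeOutcome: (label: string) => void,
): void {
  validateEmitsContract(required, countByType);
  writeOutcome("outcome:success");
}
```

Keeping the write inside the runtime, after validation, is what makes the runtime the sole writer of terminal state.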

PI-SPECIFIC vs q2h:
- bash granted (not scoped Bash(millworks-emit:*)) per accepted d8q decision (5wz
  tracks scoping hardening). Recovery paths pass personaEmits:[] (1i7 follow-up).

TESTS: 24 new unit tests (COMPLETION_INSTRUCTION lockstep, buildContractInstruction
universal-completion x5, waitForSettle state matrix x7, validateEmitsContract x3,
2 gated real-bd smoke tests: settle-by-marker round-trip + fail-fast on unmet contract).
Total: 174 pass, 8 skipped (4 new gated smokes). The only failure is the pre-existing ambient.d.ts one.
…emits → runtime closes

State machine (beads-authoritative, D44 D-f/D-g):
  marker=YES + emits met   → runtime writes outcome:success (validate-then-commit)
  marker=YES + type missing → contract-violation → step failure (no false success ever written)
  marker=NO + pane dead    → crashed → retry/re-dispatch
  marker=NO + pane alive   → still running (interruption is NOT a settle)
  elapsed >= timeout       → step failure (backstop for never-signaling agent)

Key changes:
- settle.ts: beads-authoritative state machine (pollSettleMarker, waitForMarker) with
  full DI seam; pane/transcript demotes to HEALTH input only
- workflow.ts: buildContractInstruction always returns completion instruction (universal,
  not conditioned on emits); emits types appended only when non-empty; acceptStep validates
  emits contract BEFORE writing outcome:success (validate-then-commit); inc5 notes-write
  removed (agent's millworks-emit complete --summary sets notes, runtime does NOT overwrite)
- workflow.ts: StepResult gains emits:[] field (for validate-then-commit routing);
  rebuildRunState, recovery paths, and tests updated to include emits:[]
- bd.ts: validateStepEmits added (bd list --label step:<id> --type T for each required type)
- index.ts: validateEmits wired into controllerDeps via validateStepEmits
- settle.marker.test.ts + workflow.settle.test.ts: unit coverage of all 5 state transitions
- settle.marker.smoke.test.ts: gated real-bd round-trip (MILLWORKS_SMOKE=1)
- Completion instruction string byte-matches pi mirror (millworks-kaa) exactly

Fixed gaps left by prior agent (tsc --noEmit failures):
  server.test.ts: 2x StepResult object literals missing emits field
  workflow.substitute.test.ts: settled() helper missing emits field
  workflow.recovery.test.ts: expected StepResult missing emits field
  workflow.ts:rebuildRunState: constructed StepResult missing emits field
… kaa

1. [CRITICAL] Wire waitForMarker into production: buildController.dispatch
   now uses waitForMarker (beads-authoritative) for workflow steps (stepBeadsId
   provided). The settle AUTHORITY is the self-report:complete label polled from
   beads; pane/transcript demotes to health. Ad-hoc dispatch_subagent (no
   stepBeadsId) keeps transcript-based waitForSettle — no regression.
   - Add bdHasMarker + bdReadNotes to bd.ts
   - Import waitForMarker + BeadsSettleState in index.ts
   - Thread stepBeadsId + stepEmits through WorkflowDeps.dispatch args
   - Override deps.wait per-dispatch with marker-poll lambda for workflow steps
   - Read agent notes from beads after marker-settle resolves

2. [CRITICAL] Remove inc5 notes-write: stepProduced no longer called from
   dispatchStepWithRetry or processAdoptedOutcome. Notes come from the agent's
   millworks-emit complete --summary call. Update all tests accordingly.

3. [IMPORTANT] Align buildContractInstruction to kaa byte-for-byte:
   - Add COMPLETION_INSTRUCTION constant (exported, lockstep with kaa)
   - Reorder: completion instruction FIRST, emit-types appended after
   - New emit-types wording: "MUST also emit..." + env trailer
   - Update workflow.settle.test.ts and workflow.drive.test.ts assertions

4. [IMPORTANT] Fix timeout-before-marker ordering in pollSettleMarker:
   elapsed >= timeout is now checked BEFORE the marker (matching kaa's
   waitForSettle). Add test proving timeout wins over a present marker.

5. [MINOR] Remove dead paneCheckEvery field from WaitMarkerDeps (the loop
   body was an empty comment; remove unused field + loop counter variable).
   Update settle.marker.test.ts to drop the field from all test objects.

6. [MINOR] Route validateEmits bd-errors to step-failure path at all three
   acceptStep call sites (driveWorkflow, applyGateAndResume, processAdoptedOutcome)
   so a transient bd throw never propagates uncaught to the MCP caller.
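
The ordering rule from item 3 (completion instruction first, emit-types requirement appended only for a non-empty contract) might look like the sketch below. The instruction strings are placeholders: the real `COMPLETION_INSTRUCTION` wording is not shown on this page, so nothing here should be read as the byte-lockstep text.

```typescript
// Placeholder only. The real constant's wording is not reproduced here.
const COMPLETION_INSTRUCTION_PLACEHOLDER = "<completion instruction text>";

function buildContractInstructionSketch(emits: readonly string[]): string {
  // Every step gets the completion instruction, regardless of its contract.
  const parts: string[] = [COMPLETION_INSTRUCTION_PLACEHOLDER];
  if (emits.length > 0) {
    // Placeholder for the emit-types requirement and its env trailer.
    parts.push(`<emit-types requirement for: ${emits.join(", ")}>`);
  }
  return parts.join("\n\n");
}
```

Exporting the constant on both surfaces lets a test compare the two strings directly instead of trusting that they were edited in step.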

Tests: 310 passed (up from pre-existing 300), 13 skipped. Known failures:
index.integration.test.ts (esbuild not in worktree) + ambient.d.ts (no suite).
…lockstep

Final lockstep divergence on the settle path: a contract violation (marker
present, but a required emits type has 0 records) was mapped to status `errored`
→ markStepFailed (PERMANENT fail, no retry). kaa + the D44 design route a
contract violation to the EXISTING RETRY PATH (re-dispatch up to max-retries,
then outcome:failed). Fix:

- Add a distinct `contract-violation` DispatchOutcome status (workflow.ts) so
  ONLY contract violations get kill-then-retry; a genuine `errored` (pane alive,
  no marker, wait failed) keeps its non-retryable behavior.
- Add WorkflowDeps.killStepPane({wfrunBeadsId, stepId}) — kills the lingering
  subagent pane before a re-dispatch so it can't double-spawn (mirrors kaa's
  killOrphanedPanes-before-retry). Production impl (index.ts) looks up the
  tagged SubagentRecord and calls realTmux.kill (idempotent).
- dispatchStepWithRetry: on `contract-violation`, killStepPane then retryOrFail
  (the retryable path) instead of markStepFailed. validate-then-commit holds —
  no outcome:success is ever written for a violation.
- index.ts marker-wait: capture failed-contract in a closure flag and return an
  `exited` sentinel from the wait (no throw → not mis-recorded as `errored`),
  then override the DispatchOutcome to `contract-violation` after dispatchSubagent
  returns. This distinguishes it from genuine errors through dispatchSubagent's
  fixed status vocabulary.
- Add killStepPane to all 8 WorkflowDeps test fakes.
- New tests (workflow.settle.test.ts): contract-violation re-dispatches up to
  max-retries then succeeds (proves retryable + pane killed before retry); and
  exhausts retries → outcome:failed with pane killed each attempt and NO false
  success (validate-then-commit invariant preserved).
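
A simplified, synchronous sketch of the kill-then-retry routing described above. The real code is asynchronous and handles more statuses; the three-value status type and the dependency shape here are assumptions for illustration.

```typescript
type DispatchStatusSketch = "settled" | "contract-violation" | "errored";

interface RetryDepsSketch {
  dispatch: () => DispatchStatusSketch;
  killStepPane: () => void; // assumed idempotent
  maxRetries: number;
}

function dispatchStepWithRetrySketch(
  deps: RetryDepsSketch,
): "outcome:success" | "outcome:failed" {
  for (let attempt = 0; ; attempt++) {
    const status = deps.dispatch();
    if (status === "settled") return "outcome:success";
    // A genuine error keeps its non-retryable behavior.
    if (status === "errored") return "outcome:failed";
    // Contract violation: kill the lingering pane before any re-dispatch
    // so the step cannot double-spawn, then retry or give up.
    deps.killStepPane();
    if (attempt >= deps.maxRetries) return "outcome:failed";
  }
}
```

With `maxRetries` of 2, a step that always violates its contract is dispatched three times, has its pane killed three times, and ends as `outcome:failed` without a success ever being written.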

Tests: 312 passed (up from 310), 13 skipped. Known failures only:
index.integration.test.ts (esbuild not in worktree) + ambient.d.ts (no suite).
…(millworks-1i7)

Fixes the gap left by kaa: recovered steps were passing `personaEmits: []` through
all three recovery paths (gate-after, reconcile/adoptStep, pending-validation) which
caused validate-then-commit to auto-pass for any step restarted after a crash.

Recovery now RE-RESOLVES emits via the same resolveRoleToPersona path that dispatch
uses, and re-validates the contract before writing outcome:success.

State-machine additions (pi surface, lockstep with Claude 1i7):
- `BeadsStepRecovery.hasSelfReportComplete`: detected from bd labels in
  recoveryViewFromRecords; true when STEP open + self-report:complete (crash in
  the validation window).
- `ResumePlan.pending-validation`: new plan kind — STEP open + marker present.
  Takes priority over reconcile (agent finished; pane may be gone). driveRun
  re-resolves emits, then passes a StepResult to processReadyStep which calls
  markStepSettled → validateEmitsContract → outcome:success (or fails/retries).
- `resolveStepEmits()`: helper that mirrors dispatchStep's resolution path;
  throws UnrecoverableRunError when the persona/role cannot be resolved (fail
  the run, same transient-vs-malformed split as inc5).
- `adoptStep()`: now calls resolveStepEmits before entering waitForSettle so
  the re-resolved emits flow into the returned StepResult. Removes the 1i7
  follow-up comment (gap is now closed).
- driveRun gate-after path: also re-resolves emits instead of passing [].
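
The priority rule for the new plan kind can be sketched as a small pure function. Only `pending-validation` and `reconcile` are named in the commit; the `terminal` kind and the `closed` field are assumptions added so the sketch covers already-closed steps.

```typescript
interface RecoveredStepView {
  closed: boolean;                // assumption: STEP already closed with an outcome
  hasSelfReportComplete: boolean; // marker label present on the STEP
}

type ResumePlanKindSketch = "terminal" | "pending-validation" | "reconcile";

function planResumeKind(step: RecoveredStepView): ResumePlanKindSketch {
  if (step.closed) return "terminal";
  // Open + marker: the agent finished but the runtime crashed before closing.
  // This wins over reconcile because the pane may already be gone.
  if (step.hasSelfReportComplete) return "pending-validation";
  // Open + no marker: reconcile against the live pane as before.
  return "reconcile";
}
```

A `pending-validation` step still has to pass the re-resolved emits contract before the runtime writes `outcome:success`.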

Tests (in-source vitest):
- recoveryViewFromRecords: 3 new tests pinning hasSelfReportComplete detection.
- planResume: 3 new tests — pending-validation produced for open+marker; priority
  over plain reconcile; false positive excluded (marker absent → reconcile).
- All 174 pre-existing tests still pass (186 total, +12 new).
…ude surface)

Extends inc5 beads-authoritative recovery with the millworks-1i7 contract:
a STEP that carried `self-report:complete` but was not yet closed (crash in
the validate-then-close window) is now re-validated on recovery — never
auto-passed via emits:[]. A running step with a live pane (no marker) now
carries re-resolved emits into the adopted waitForMarker.

Recovery state machine (lockstep with pi):
  - STEP closed outcome:success/failed → terminal (unchanged, inc5).
  - STEP open + self-report:complete → PENDING VALIDATION: re-resolve
    persona emits via deps.resolvePersona, validateEmits, acceptStep
    (success) or markStepFailed (contract violation). No pane adoption.
  - STEP open + no marker + live pane → adopt, carrying re-resolved emits
    into waitForMarker (not []). Re-dispatch if pane gone.
  - After-gate recovery: reconstructGate re-resolves persona emits so
    gate_approve validates the real contract (not auto-passing).
FAIL-FAST: unresolvable persona/emits propagates as a transient error.

Changes:
  - workflow.ts: BeadsStepRecovery gains markerPresent; RunState gains
    pendingValidationStepIds + pendingValidationOutputs; rebuildRunState
    populates both; resumeRecoveredRun handles all three recovery shapes;
    reconstructGate is now async + re-resolves emits; adoptStep interface
    adds stepEmits.
  - run-tracker.ts: loadRecovery sets markerPresent from SELF_REPORT_COMPLETE.
  - run-tracker.testing.ts: recoveryView() includes markerPresent: false.
  - index.ts: adoptStep uses waitForMarker with re-resolved stepEmits.
  - Tests: 11 new unit + controller-level tests; inc5 recovery tests extended.
…nsient retry), lockstep with pi

Lockstep divergence fix on the Claude side. When recovery re-resolves a
recovered step's persona emits and resolvePersona FAILS, the prior code let
the error propagate as transient — effectively retried next session. A
deterministic resolution failure (the role no longer resolves) would strand
the run open forever, worse than a loud fail. pi's resolveStepEmits throws
UnrecoverableRunError; D44's fail-fast intent is "fail the run, don't auto-pass".

Changes (workflow.ts):
  - New resolveRecoveredEmits helper: wraps deps.resolvePersona; any failure
    throws UnrecoverableRunError (the inc5 malformed-recovery path), carrying
    the original cause + step id. Used at all three re-resolution sites.
  - resumeRecoveredRun (pending-validation + no-marker adopt paths): use the
    helper. A persona failure now fails the run (runDrive closes the WFRUN
    failed), not a silent retry.
  - reconstructGate (after-gate): use the helper; a paused:after:<stepId> for
    a step absent from the re-parsed workflow is also UnrecoverableRunError.
  - doRecover: the gate-pause branch now catches UnrecoverableRunError from
    reconstructGate, closes the WFRUN failed, and starts clean — currentRun is
    armed ONLY after a successful reconstruction (no half-built armed run). A
    transient bd/CLI blip still propagates (run left open, retryable).
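
A minimal sketch of the wrapper described above, written synchronously for brevity. The class fields and the `resolvePersona` callback shape are assumptions; the point is only that any resolution failure is converted into a run-failing error carrying the step id and original cause.

```typescript
// Hypothetical shape. The field is named `originalCause` here to avoid
// clashing with the standard Error `cause` property.
class UnrecoverableRunError extends Error {
  stepId: string;
  originalCause: unknown;
  constructor(stepId: string, originalCause: unknown) {
    super(`cannot re-resolve persona emits for step ${stepId}`);
    this.name = "UnrecoverableRunError";
    this.stepId = stepId;
    this.originalCause = originalCause;
  }
}

function resolveRecoveredEmits(
  stepId: string,
  resolvePersona: () => { file: string; emits: string[] },
): string[] {
  try {
    return resolvePersona().emits;
  } catch (cause) {
    // Fail the run loudly instead of retrying forever or auto-passing with [].
    throw new UnrecoverableRunError(stepId, cause);
  }
}
```

Routing every re-resolution site through one helper keeps the failure mode identical across the pending-validation, adopt, and after-gate paths.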

Tests:
  - resume: persona-unresolvable on a no-marker step and on a pending-validation
    step both reject with UnrecoverableRunError (was: generic throw).
  - controller-recovery: mid-step (no-marker) and after-gate recovered steps
    whose persona can't be re-resolved close the WFRUN outcome:failed (not
    left open); the controller stays clean and a fresh run can start.
  - substitute test: RunState literal gains the two new recovery fields.

npm test: 326 passed, 13 skipped (only the pre-existing esbuild integration
failure). tsc --noEmit: clean.
…illworks-6q0)

/millworks:init registers beads custom types via the Rust `millworks init` binary
(init.rs), NOT recipes/init-beads.sh. 6q0 updated the recipe but missed init.rs:129,
so `requirement` never registered at workflow runtime — caught during cn8 live
verification (26e). Extract the list to CUSTOM_BEADS_TYPES (incl. requirement),
add a regression test covering this binary path.