cn8: workflow steps emit structured beads records as canonical output (graph as source of truth) #4
Open
richardkiene wants to merge 43 commits into
Record the resolved design for millworks-cn8 (steps emit structured beads records as canonical output; the graph is the source of truth) as ADR-0009 D44, and the 11 child planning beads (millworks-thz/40a/clb/2qe/kma/ypd/d8q/q2h/kaa/1i7/26e), exported to .beads/issues.jsonl. The design is canonical in 'bd show millworks-cn8 --design'.
…llworks-clb) Adds "Emitting structured output (workflow steps)" to the shared millworks:beads skill (content/skills/beads/SKILL.md) — DRY mechanics live once here, referenced by every emitting persona (ADR-0009 D44 M-4). Covers: prose-in-description principle (D-c); millworks-emit emit and complete subcommand interfaces verbatim; auto-stamping of step:/wfrun:/ discovered-from by the CLI (agents must not hand-stamp); optional --link for domain links between emitted records; the self-report:complete terminal marker as the final act (D-g); emits contract concept (D-a/D-b); worked requirements-analyst example; and a "What NOT to do" guard list.
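To make the emit-then-complete ordering from the skill concrete, here is a minimal TypeScript sketch that only builds argument vectors for the two documented subcommands. The helper names `emitArgv`/`completeArgv`, the `EmitRecord` fields, the example ids, and passing one `--link` flag per link are assumptions for illustration; the flag names themselves come from the documented CLI surface.

```typescript
// Hypothetical helpers (not part of millworks-emit): build argv for the two
// documented subcommands. One --link flag per link is an assumption.
interface EmitRecord {
  type: string;         // a registered record type, e.g. "requirement"
  title: string;
  description: string;  // the full prose belongs here, not in STEP notes
  links?: string[];     // each "TYPE:TARGET", e.g. "blocks:mw-123" (hypothetical id)
}

function emitArgv(rec: EmitRecord): string[] {
  const argv = ["emit", "--type", rec.type, "--title", rec.title, "--description", rec.description];
  for (const link of rec.links ?? []) argv.push("--link", link);
  // No step:/wfrun: labels or discovered-from link here: the CLI stamps those itself.
  return argv;
}

function completeArgv(summary: string): string[] {
  return ["complete", "--summary", summary];
}

// A requirements-analyst step in order: one emit per requirement, then `complete` as the final act.
const stepInvocations: string[][] = [
  emitArgv({
    type: "requirement",
    title: "Export is idempotent",
    description: "Re-running the export produces no duplicate rows.",
  }),
  completeArgv("Emitted 1 requirement covering export idempotency."),
];
```

Building arrays rather than shell strings (for example, to hand to Node's `child_process.execFile`) avoids quoting problems when a description contains quotes or newlines.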
Add `emits: [<type>...]` frontmatter support to the shared persona loader so both runtimes receive a persona's output contract (ADR-0009 D44 D-a).
Changes:
- `RawFrontmatter`: add `emits: Option<serde_yaml::Value>` (mirrors tools)
- `Persona`: add `emits: Vec<String>` (normalized; absent → empty vec)
- `PickResult`: add `emits: Vec<String>` (surfaced in picker JSON output)
- `PickerError::MalformedEmits`: fail-fast for non-string/non-list emits
- `normalize_string_or_list()`: shared helper (DRY); string or list → Vec<String>; absent → []; malformed → MalformedEmits error
- All PickResult construction sites in picker.rs carry emits through
- 6 new unit tests (list, string, absent, integer, mapping, PickResult)
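A TypeScript analog of the string-or-list normalization, for readers tracing the rules; the real helper is Rust over `serde_yaml::Value`. Here `raw` is assumed to be an already-parsed YAML value (`undefined` or `null` when the key is absent), and `MalformedFieldError` is a hypothetical stand-in for `PickerError::MalformedEmits` that carries the field name so the helper stays reusable.

```typescript
// Hypothetical stand-in for PickerError::MalformedEmits; names the offending field.
class MalformedFieldError extends Error {
  readonly field: string;
  constructor(field: string, detail: string) {
    super(`malformed \`${field}\` frontmatter: ${detail}`);
    this.name = "MalformedFieldError";
    this.field = field;
  }
}

// string -> [string]; list of strings -> same list; absent -> []; anything else -> error.
function normalizeStringOrList(field: string, raw: unknown): string[] {
  if (raw === undefined || raw === null) return []; // absent key: empty contract
  if (typeof raw === "string") return [raw];        // `emits: decision`
  if (Array.isArray(raw)) {
    return raw.map((item, i) => {
      if (typeof item !== "string") {
        // e.g. `emits: [requirement, 42]`: fail fast instead of dropping the element
        throw new MalformedFieldError(field, `element ${i} is ${typeof item}, expected string`);
      }
      return item;
    });
  }
  // integers, mappings, booleans: refuse to guess
  throw new MalformedFieldError(field, `expected string or list, got ${typeof raw}`);
}
```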
…orks-thz)
New Rust crate `tools/millworks-emit` — the sole beads write-path granted to
Millworks workflow subagents (least-privilege; no arbitrary shell). Realizes
ADR-0009 D44 decisions M-2, M-3, M-5, D-d, D-g.
CLI surface (canonical):
millworks-emit emit --type <T> --title <S> --description <S> [--link <type>:<id>…]
Creates a bd record, stamps step:<id>/wfrun:<id> labels and a
discovered-from link (FROM new record TO STEP). Prints new id to stdout.
millworks-emit complete --summary <S>
Sets STEP notes to <S> then adds self-report:complete label (in that order).
Both subcommands fail fast (non-zero, clear stderr) if MILLWORKS_STEP_ID or
MILLWORKS_WFRUN_ID is unset/empty.
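A sketch of the fail-fast attribution contract, written in TypeScript for illustration (the crate itself is Rust). `requireEnv` and `attributionLabels` are hypothetical names; trimming before use matches the later `env::require_env` fix, so a padded value cannot leak whitespace into a label.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical: returns the TRIMMED value, or throws if unset/blank.
function requireEnv(env: Env, name: string): string {
  const value = (env[name] ?? "").trim();
  if (value === "") {
    throw new Error(`${name} is unset or empty; refusing to write an unattributed record`);
  }
  return value;
}

// The two provenance labels the CLI stamps on every emitted record.
function attributionLabels(env: Env): string[] {
  return [
    `step:${requireEnv(env, "MILLWORKS_STEP_ID")}`,
    `wfrun:${requireEnv(env, "MILLWORKS_WFRUN_ID")}`,
  ];
}
```

In a Node process the caller would pass `process.env` as `env`; taking it as a parameter keeps the check unit-testable.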
Design:
- bd I/O isolated in `runner::BdRunner` trait + `RealBdRunner` impl so
argv construction in `commands.rs` is unit-testable without bd (mirrors
assembler's run_bd_show seam pattern).
- `parse_created_id` handles mixed warning+JSON stdout from bd create --json.
- `tools/millworks/src/lib.rs`: added "millworks-emit" to MILLWORKS_BINARIES
so millworks setup and build-claude both provision it (install.sh + bin/ symlink
in the Claude plugin — same wiring as the other shared-core CLIs).
Tests: 33 unit tests (argv construction, env fail-fast, id parsing) + 4 real-bd
smoke tests (gated MILLWORKS_SMOKE=1): emit attribution round-trip verifies
step:/wfrun: labels and discovered-from link; complete verifies notes + label.
…ote) Review polish for millworks-clb:
- --link synopsis metavar uses TYPE:TARGET to match millworks-emit's clap value_name.
- Note that the requirement record type is registered by the cn8 rollout (millworks-6q0 adds the table row), so a reader isn't confused by its absence from the 9-types table.
- Make normalize_string_or_list genuinely reusable (DRY): MalformedEmits now carries `field: String`, included in the Display message; call site passes "emits". Removed the false comment and the `let _ = field;` no-op.
- Malformed-emits unit tests now assert the error names the `emits` field.
- New unit test: explicit `emits: []` (YAML empty sequence) -> empty Vec.
- New unit test: list with a non-string element (emits: [requirement, 42]) -> fail-fast MalformedEmits.
- New integration test: run the binary against a fixture persona with a non-empty emits and assert the JSON output's `emits` array values.
- parse_created_id: replace hand-rolled brace counting (which corrupted on a
literal `}` inside a title/description, e.g. `closes {issue}`) with a
serde_json streaming parse from the first `{` — respects string contents.
Adds two unit tests (brace-in-title, warning-prefix + brace-in-string).
- env::require_env: return the TRIMMED value so a padded env can't leak
whitespace into a `step:`/`wfrun:` label. Adds a trim test.
- EmitArgs: drop the unused `extra_links` field (it's applied post-create via
emit_argv, not a create input) — removes an unnecessary clone in main.rs and
makes the struct cohesive.
- Move the gated real-bd smokes from `src/smoke_tests.rs` (a pub module that
compiled into the release lib) to `tests/smoke.rs` (an integration-test crate,
test-only) — the idiom used by context-pack-assembler. Resolves the binary via
CARGO_BIN_EXE_millworks-emit instead of path-guessing, and asserts `bd dep
list` exit success so a broken dep-list can't be silently ignored.
- Fix two clippy doc-list warnings in lib.rs.
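The idea behind the `parse_created_id` fix, sketched in TypeScript. The Rust code uses a serde_json streaming parse from the first `{`; `JSON.parse` has no streaming mode, so this sketch swaps in a different technique: it tries each slice that starts at the first `{` and ends at some `}`, and keeps the first one that parses. A JSON object has exactly one closing brace that ends it, so the first successful slice is the complete object, and braces inside strings (such as `closes {issue}`) cannot cut it short. The `id` field name is an assumption about `bd create --json` output.

```typescript
// Hypothetical TS analog; quadratic in the worst case, which is fine for short CLI output.
function parseCreatedId(stdout: string): string {
  const start = stdout.indexOf("{"); // skip any warning text before the JSON
  if (start < 0) throw new Error("no JSON object in bd create output");
  for (let end = stdout.indexOf("}", start); end >= 0; end = stdout.indexOf("}", end + 1)) {
    let parsed: unknown = undefined;
    try {
      parsed = JSON.parse(stdout.slice(start, end + 1));
    } catch {
      continue; // this "}" is inside a string or closes a nested object; extend the slice
    }
    const id = (parsed as { id?: unknown }).id;
    if (typeof id !== "string" || id === "") throw new Error("bd create JSON has no string `id`");
    return id;
  }
  throw new Error("unterminated JSON object in bd create output");
}
```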
All green: 27 lib + 5 bin unit tests, 4 real-bd smokes (MILLWORKS_SMOKE=1),
clippy clean.
Add 'requirement' to the custom beads types list (intent, risk, healing, wfrun, step → +requirement) so cn8 requirements-analyst personas can emit first-class queryable requirement records rather than modeling them as task/feature.
- recipes/init-beads.sh: CUSTOM_TYPES gains requirement; count comments updated (5→6 custom, 9→10 total)
- docs/beads-mapping.md: Requirement row in summary table; full per-type detail section added before WFRUN section
- docs/adr/0003-beads-schema-mapping.md: D16 updated to 10 types/6 custom; REQUIREMENT row in domain table; bd config set example and Consequence paragraph updated for cn8
- content/skills/beads/SKILL.md: "The 9 record types" heading → 10; new Requirement row in Domain records table; error-recovery snippet updated
Verified: `bd types` lists 'requirement'; `bd create -t requirement` succeeds in a fresh scratch workspace.
…-2qe)
When the context-pack-assembler renders a scoped STEP, after the notes
summary it now queries bd list --label step:<id> --json, gathers the
emitted records, and renders each as type+id+title+description under an
"#### Emitted Records" sub-heading (D44 D-e).
Key mechanics:
- run_bd_list_by_label: isolated bd I/O seam for the label query
- render_emitted_records: pure fn over raw JSON list (unit-testable
without bd); skips malformed records (fail-fast per record), returns ""
for zero records
- summarize_bd_record_with_emits: pure fn composing the step heading +
notes + emitted-records block; "" emits block => output identical to
c30 (superset/graceful-degrade rule)
- summarize_bd_record delegates to summarize_bd_record_with_emits("")
so all existing c30 tests pin unchanged behavior
New tests (all pass):
- render_emitted_records_lists_type_id_description_per_record
- render_emitted_records_empty_list_returns_empty_string
- render_emitted_records_tolerates_missing_optional_fields
- step_with_zero_emitted_records_renders_notes_only_identical_to_c30
- step_with_emitted_records_appends_them_after_notes
- smoke_step_with_emitted_records_surfaces_in_bundle (MILLWORKS_SMOKE=1)
Pre-existing rrp failures (bare_task_only, task_with_persona,
non_skill_dir_is_ignored, pruning_occurs_when_over_budget) unchanged.
Declare emits contracts in all 20 content/agents/*.md personas and rewrite Output sections for the 5 roles with non-empty contracts.
Emits mapping applied:
- intake-interviewer -> emits: [intent]
- requirements-analyst -> emits: [requirement]
- plan-reviewer -> emits: [decision]
- architect -> emits: [decision]
- plan-writer -> emits: [task]
- all others (15) -> emits: []
For the 5 non-empty emits personas the Output section is rewritten so that canonical output is structured beads records emitted via millworks-emit, with full prose in each record's --description field. Each rewrite:
- instructs emit per unit of substance (one intent / requirement / decision / task)
- uses --link for domain links between emitted records
- ends with millworks-emit complete --summary as the terminal act
- cross-references the millworks:beads skill for mechanics (DRY)
- preserves the persona's posture and quality voice
For the 15 emits:[] personas only the frontmatter field is added; no body changes (clean audits/reviews find nothing and must still settle).
Verified: `cargo test -p persona-picker`: all 53 tests pass; manual pick check confirms requirements-analyst -> emits:[requirement], all others as mapped.
…truction at dispatch (millworks-ypd)
D44 M-1: inject MILLWORKS_STEP_ID/MILLWORKS_WFRUN_ID into the spawned pane's
environment via tmux -e so millworks-emit can stamp provenance without the
subagent knowing its own ids.
D44 M-2: always grant Bash(millworks-emit:*) in allowedTools for every
workflow-step subagent (least-privilege scoped emit path); mapStepTools now
returns string[] (never undefined) with the emit tool always appended.
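A minimal sketch of the M-2 rule that the emit tool is always granted. The `stepTools` input and the de-duplication are assumptions; the commit only states that `mapStepTools` always returns a `string[]` with `Bash(millworks-emit:*)` appended.

```typescript
const EMIT_TOOL = "Bash(millworks-emit:*)";

// Hypothetical signature: `stepTools` is whatever the step already requested (may be absent).
function mapStepTools(stepTools?: readonly string[]): string[] {
  const tools = [...(stepTools ?? [])]; // copy; never mutate the caller's list
  if (!tools.includes(EMIT_TOOL)) tools.push(EMIT_TOOL); // always present, never duplicated
  return tools;
}
```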
D44 M-4: generate the output-contract instruction from the dispatched persona's
emits (single source: frontmatter → picker → drive loop → dispatch args).
buildContractInstruction(emits) returns undefined for empty emits (uniform rule:
emits: [] → no instruction, cn8 a clean superset of c30). The real impl in
index.ts appends the instruction to the assembled bundle file before spawning.
Widen resolvePersonaViaCli to return { file, emits } (was: string | null).
The picker output already contained emits (PickResult.emits); the TypeScript
cast at workflow-cli.ts:100 is now widened to include it. Direct persona:
references return emits: [] (no picker invoked).
Tests: 8 new unit tests (TDD: watched each fail before implementing);
276 total passing (up from 268); typecheck clean.
…st (millworks-d8q)
D44 M-1/M-2/M-4 on the pi surface (lockstep mirror of Claude ypd):
- buildWrapperEnvExports: injects MILLWORKS_STEP_ID / MILLWORKS_WFRUN_ID as
export lines in the subagent wrapper.sh (single-quoted, process-env durable)
- addEmitToolAccess: ensures 'bash' is in the pi --tools allowlist when the
persona declares a non-empty emits contract (least-privilege emit path; bash
is the closest pi analog to Claude Code's Bash(millworks-emit:*))
- buildContractInstruction: generates the canonical output-contract instruction
from the persona emits list (null for empty emits — degrades to c30); appended
to the assembler bundle, not a separate flag
- resolveRoleToPersona: widened from Promise<string> to Promise<PersonaPickResult>
= { file, emits } to carry the persona-picker emits output through to dispatch
- dispatchStep: wires all three mechanics; personaEmits drives the conditional
tool-access and instruction injection
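One way `buildWrapperEnvExports` could produce durable, single-quoted export lines, sketched under assumptions: the signature is hypothetical, and refusing empty ids mirrors the fail-fast rule adopted in the d8q review rather than anything stated about this helper. POSIX single quotes cannot contain a quote character, so each `'` is written as `'\''` (close the quote, emit an escaped quote, reopen).

```typescript
// POSIX-safe single quoting for a shell word.
function shellSingleQuote(value: string): string {
  return `'${value.replace(/'/g, `'\\''`)}'`;
}

// Hypothetical signature; emits one `export NAME='value'` line per id.
function buildWrapperEnvExports(stepId: string, wfrunId: string): string {
  const pairs: Array<[string, string]> = [
    ["MILLWORKS_STEP_ID", stepId],
    ["MILLWORKS_WFRUN_ID", wfrunId],
  ];
  for (const [name, value] of pairs) {
    // an empty id would make every emit unattributable, so refuse up front
    if (value.trim() === "") throw new Error(`${name} would be empty; refusing to write the wrapper`);
  }
  return pairs.map(([name, value]) => `export ${name}=${shellSingleQuote(value)}`).join("\n") + "\n";
}
```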
22 new unit tests (buildContractInstruction, addEmitToolAccess, buildWrapperEnvExports).
150 pass total (was 128); 4 pre-existing MILLWORKS_SMOKE smokes skipped.
…lowedTools to string[] (millworks-ypd review)
…d (millworks-d8q) Review fixes:
- addEmitToolAccess doc: stop claiming a scoped-PATH security property that isn't implemented. The wrapper inherits full PATH/rc, so the bash grant is full bash; the contract instruction is a behavioral nudge only. Structural per-command scoping is tracked as hardening bead millworks-5wz.
- dispatchStep env injection: replace `state.stepRecords[step.id] ?? ""` silent fallback with a hard throw, because an empty MILLWORKS_STEP_ID would make millworks-emit mis-attribute/fail silently (project fail-fast rule).
150 tests pass (pre-existing ambient.d.ts glob false-failure tracked as millworks-7s4).
1. plan-writer: phase->phase ordering link example used the wrong link type (`until` is task->decision). Changed to blocks:<phase-task-id> and fixed the comment to describe phase ordering, not a decision gate.
2. decompile-synthesizer: converted its record-writing from raw bd create to millworks-emit emit (risk + decision). bd create bypasses the auto-stamp of step:/wfrun:/discovered-from, leaving records unattributed and invisible to assembler expansion. millworks-emit is the only granted, attributed write path. provenance:decompiled (a label, not supported by the emit CLI surface) folded into the decision --description prose. bd remember stays as a direct bd call (free-text memory, not a step-output record). No required-records language added; persona remains emits:[].
3. plan-reviewer: completion-summary template used man-page optional-bracket notation [, <N> risk] that an LLM might emit literally; rewritten as plain prose.
Re-verified: cargo test -p persona-picker (53 + 8 tests) green; the three edited personas parse with expected emits (task / decision / []).
…lworks-2qe review)
Addresses fail-fast review findings (project rule: never silence errors):
1. run_bd_show no longer swallows a bd list COMMAND failure into "zero
records" (notes-only). run_bd_list_by_label's Err now propagates via `?`.
A command that SUCCEEDS but lists zero records still degrades silently to
notes-only (legitimate D-e graceful-degrade) — distinguished from a real
command failure.
2. render_emitted_records now returns Result<String> and FAILS FAST on
malformed bd output (non-empty non-JSON, non-array, or a record missing a
required `id`/`title` field) via new AssemblerError::MalformedRecord,
instead of silently dropping bad records. A valid empty array `[]` and an
empty/blank input string remain the legitimate Ok("") degrade path.
3. summarize_bd_record_with_emits now returns Result<Option<String>> so the
malformed-record error propagates through the seam. summarize_bd_record
keeps its Option surface for the c30 tests (empty emits can't be malformed).
4. Fixed the contradictory doc comment that claimed "skips malformed ones,
keeps valid ones" — now describes the actual fail-fast behavior.
5. main.rs maps MalformedRecord to exit code 2 (bad-data class).
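A TypeScript sketch of the fail-fast `render_emitted_records` contract described above (the real function is Rust returning `Result<String>`). The record field names (`id`, `title`, `type`, `description`) and the per-record line layout are assumptions; what the sketch pins is the degrade-versus-fail split: blank input and `[]` yield an empty string, while non-JSON, a non-array, or a record without `id`/`title` throws.

```typescript
// Assumed shape of one `bd list --json` element; only id/title are treated as required.
interface EmittedRecord { id: string; title: string; type?: string; description?: string }

class MalformedRecordError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "MalformedRecordError";
  }
}

function renderEmittedRecords(raw: string): string {
  if (raw.trim() === "") return ""; // blank output: legitimate notes-only degrade
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch (e) {
    throw new MalformedRecordError(`bd list output is not JSON: ${String(e)}`);
  }
  if (!Array.isArray(parsed)) throw new MalformedRecordError("bd list output is not a JSON array");
  if (parsed.length === 0) return ""; // `[]`: zero emitted records, notes-only

  const lines = ["#### Emitted Records", ""];
  parsed.forEach((r: unknown, i: number) => {
    const rec = r as Partial<EmittedRecord> | null;
    if (rec === null || typeof rec !== "object" || typeof rec.id !== "string" || typeof rec.title !== "string") {
      throw new MalformedRecordError(`record ${i} is missing a required id/title`); // never silently dropped
    }
    lines.push(`- [${rec.type ?? "unknown"}] ${rec.id}: ${rec.title}`);
    if (rec.description) lines.push(`  ${rec.description}`);
  });
  return lines.join("\n") + "\n";
}
```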
New tests (all pass):
- render_emitted_records_fails_fast_on_malformed_json
- render_emitted_records_fails_fast_on_record_missing_required_field
- summarize_propagates_malformed_emits_error
- render_emitted_records_empty_list_returns_empty_string (now asserts Ok(""))
Test results: 31 pass, 4 pre-existing rrp failures unchanged, 1 smoke
(MILLWORKS_SMOKE=1) ignored by default and passing against live bd.
… runtime closes (millworks-kaa)
STATE MACHINE (lockstep with Claude q2h):
- marker=YES → validate emits → SETTLED (runtime writes outcome:success)
- marker=YES + contract unmet → EmitsContractError → retry path (no false success)
- marker=NO + pane dead → crashed → existing retry/fail path
- marker=NO + pane alive → still running (interruption is not a failure)
- timeout + no marker → TIMEOUT → retry path
CHANGES:
- waitForSettle: reworked to poll bdHasMarker (beads is settle AUTHORITY); transcript/done-file/pane demote to HEALTH inputs. Injectable WaitForSettleDeps for deterministic unit testing (DI seam per millworks-n0f intent).
- validateEmitsContract: validate-then-commit before any outcome:success write. Throws EmitsContractError for missing required types (fail-fast, never silent).
- markStepSettled: calls validateEmitsContract BEFORE writing outcome:success (the sole-writer invariant; agent never writes terminal state, D44 D-g).
- stepProduced removed from processReadyStep: agent's `millworks-emit complete` already sets STEP notes; runtime must not overwrite them (inc5 notes-write removed).
- buildContractInstruction: ALWAYS returns the completion instruction for ALL steps (universal-completion); emit-types requirement APPENDED only when emits non-empty. COMPLETION_INSTRUCTION constant exported for lockstep verification.
- addEmitToolAccess: bash granted for ALL steps (not conditioned on emits.length). Every step needs millworks-emit complete access (the universal settle signal).
- bdHasMarker + bdCountEmittedByType: new bd helpers for marker poll and validation.
- drainSessionFile: extracted from old waitForSettle for progress/health use.
- StepResult.personaEmits: new field threads persona emits from dispatchStep to markStepSettled for post-settle validation.
- adoptStep: updated to use new waitForSettle + bdHasMarker; cwd added to signature.
PI-SPECIFIC vs q2h:
- bash granted (not scoped Bash(millworks-emit:*)) per accepted d8q decision (5wz tracks scoping hardening). Recovery paths pass personaEmits:[] (1i7 follow-up).
TESTS: 24 new unit tests (COMPLETION_INSTRUCTION lockstep, buildContractInstruction universal-completion x5, waitForSettle state matrix x7, validateEmitsContract x3, 2 gated real-bd smoke tests: settle-by-marker round-trip + fail-fast on unmet contract). Total: 174 pass, 8 skipped (4 new gated smokes). Only the pre-existing ambient.d.ts failure remains.
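The kaa state machine can be read as a pure classification of one poll tick. This is a condensed sketch, not the runtime's code: `SettleTick` and `classifySettle` are hypothetical, and the real loop gathers these inputs from bd and tmux through injected deps (with contract validation happening in `markStepSettled`). The timeout check comes first, matching the ordering the Claude-side review later aligned to.

```typescript
type SettleState =
  | "timeout"            // backstop: never signaled in time -> retry path
  | "settled"            // marker + contract met -> runtime writes outcome:success
  | "contract-violation" // marker, but a required emits type has 0 records -> retry path
  | "crashed"            // no marker, pane gone -> existing retry/fail path
  | "running";           // no marker, pane alive: an interruption is not a failure

// Hypothetical snapshot of one poll; the pane is a HEALTH input only.
interface SettleTick {
  markerPresent: boolean; // STEP carries self-report:complete
  missingTypes: string[]; // required emits types with zero records
  paneAlive: boolean;
  elapsedMs: number;
  timeoutMs: number;
}

function classifySettle(t: SettleTick): SettleState {
  if (t.elapsedMs >= t.timeoutMs) return "timeout"; // checked before the marker
  if (t.markerPresent) return t.missingTypes.length === 0 ? "settled" : "contract-violation";
  return t.paneAlive ? "running" : "crashed";
}
```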
…emits → runtime closes
State machine (beads-authoritative, D44 D-f/D-g):
- marker=YES + emits met → runtime writes outcome:success (validate-then-commit)
- marker=YES + type missing → contract-violation → step failure (no false success ever written)
- marker=NO + pane dead → crashed → retry/re-dispatch
- marker=NO + pane alive → still running (interruption is NOT a settle)
- elapsed >= timeout → step failure (backstop for never-signaling agent)
Key changes:
- settle.ts: beads-authoritative state machine (pollSettleMarker, waitForMarker) with full DI seam; pane/transcript demotes to HEALTH input only
- workflow.ts: buildContractInstruction always returns completion instruction (universal, not conditioned on emits); emits types appended only when non-empty; acceptStep validates emits contract BEFORE writing outcome:success (validate-then-commit); inc5 notes-write removed (agent's millworks-emit complete --summary sets notes, runtime does NOT overwrite)
- workflow.ts: StepResult gains emits:[] field (for validate-then-commit routing); rebuildRunState, recovery paths, and tests updated to include emits:[]
- bd.ts: validateStepEmits added (bd list --label step:<id> --type T for each required type)
- index.ts: validateEmits wired into controllerDeps via validateStepEmits
- settle.marker.test.ts + workflow.settle.test.ts: unit coverage of all 5 state transitions
- settle.marker.smoke.test.ts: gated real-bd round-trip (MILLWORKS_SMOKE=1)
- Completion instruction string byte-matches pi mirror (millworks-kaa) exactly
Fixed gaps left by prior agent (tsc --noEmit failures):
- server.test.ts: 2x StepResult object literals missing emits field
- workflow.substitute.test.ts: settled() helper missing emits field
- workflow.recovery.test.ts: expected StepResult missing emits field
- workflow.ts:rebuildRunState: constructed StepResult missing emits field
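Validate-then-commit reduces to one ordering rule: count the step's emitted records per required type, and write `outcome:success` only if none is missing. A minimal sketch with hypothetical injected deps (`countEmittedByType`, `writeOutcomeSuccess`); `EmitsContractError` borrows the name used on the pi side.

```typescript
// Hypothetical deps standing in for the bd helpers named in the commits.
interface AcceptDeps {
  countEmittedByType(stepId: string, type: string): Promise<number>;
  writeOutcomeSuccess(stepId: string): Promise<void>;
}

class EmitsContractError extends Error {
  readonly stepId: string;
  readonly missingTypes: string[];
  constructor(stepId: string, missingTypes: string[]) {
    super(`step ${stepId} settled without required emits: ${missingTypes.join(", ")}`);
    this.name = "EmitsContractError";
    this.stepId = stepId;
    this.missingTypes = missingTypes;
  }
}

async function acceptStep(deps: AcceptDeps, stepId: string, requiredEmits: readonly string[]): Promise<void> {
  const missing: string[] = [];
  for (const type of requiredEmits) {
    if ((await deps.countEmittedByType(stepId, type)) === 0) missing.push(type);
  }
  // Validate first: on a violation nothing is written, so no false success can exist.
  if (missing.length > 0) throw new EmitsContractError(stepId, missing);
  await deps.writeOutcomeSuccess(stepId); // commit only after validation passes
}
```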
… kaa
1. [CRITICAL] Wire waitForMarker into production: buildController.dispatch now uses waitForMarker (beads-authoritative) for workflow steps (stepBeadsId provided). The settle AUTHORITY is the self-report:complete label polled from beads; pane/transcript demotes to health. Ad-hoc dispatch_subagent (no stepBeadsId) keeps transcript-based waitForSettle, so there is no regression.
   - Add bdHasMarker + bdReadNotes to bd.ts
   - Import waitForMarker + BeadsSettleState in index.ts
   - Thread stepBeadsId + stepEmits through WorkflowDeps.dispatch args
   - Override deps.wait per-dispatch with marker-poll lambda for workflow steps
   - Read agent notes from beads after marker-settle resolves
2. [CRITICAL] Remove inc5 notes-write: stepProduced is no longer called from dispatchStepWithRetry or processAdoptedOutcome. Notes come from the agent's millworks-emit complete --summary call. Update all tests accordingly.
3. [IMPORTANT] Align buildContractInstruction to kaa byte-for-byte:
   - Add COMPLETION_INSTRUCTION constant (exported, lockstep with kaa)
   - Reorder: completion instruction FIRST, emit-types appended after
   - New emit-types wording: "MUST also emit..." + env trailer
   - Update workflow.settle.test.ts and workflow.drive.test.ts assertions
4. [IMPORTANT] Fix timeout-before-marker ordering in pollSettleMarker: elapsed >= timeout is now checked BEFORE the marker (matching kaa's waitForSettle). Add test proving timeout wins over a present marker.
5. [MINOR] Remove dead paneCheckEvery field from WaitMarkerDeps (the loop body was an empty comment; remove unused field + loop counter variable). Update settle.marker.test.ts to drop the field from all test objects.
6. [MINOR] Route validateEmits bd-errors to step-failure path at all three acceptStep call sites (driveWorkflow, applyGateAndResume, processAdoptedOutcome) so a transient bd throw never propagates uncaught to the MCP caller.
Tests: 310 passed (up from pre-existing 300), 13 skipped.
Known failures: index.integration.test.ts (esbuild not in worktree) + ambient.d.ts (no suite).
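A sketch of the aligned `buildContractInstruction` shape: completion instruction first, emit-types requirement appended only for a non-empty `emits`. Both instruction strings are placeholders; the actual byte-matched wording and the env trailer are not quoted in the commits.

```typescript
// Placeholder wording; the real constant is byte-matched across the pi and Claude surfaces.
const COMPLETION_INSTRUCTION =
  "When your work is done, run `millworks-emit complete --summary \"<summary>\"` as your final action.";

function buildContractInstruction(emits: readonly string[]): string {
  // Universal: every step is told how to settle, even with `emits: []`.
  const parts = [COMPLETION_INSTRUCTION];
  if (emits.length > 0) {
    // Appended after the completion instruction, never before it.
    parts.push(
      "You MUST also emit at least one record of each of these types with `millworks-emit emit` " +
        `before completing: ${emits.join(", ")}.`,
    );
  }
  return parts.join("\n\n");
}
```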
…lockstep
Final lockstep divergence on the settle path: a contract violation (marker
present, but a required emits type has 0 records) was mapped to status `errored`
→ markStepFailed (PERMANENT fail, no retry). kaa + the D44 design route a
contract violation to the EXISTING RETRY PATH (re-dispatch up to max-retries,
then outcome:failed). Fix:
- Add a distinct `contract-violation` DispatchOutcome status (workflow.ts) so
ONLY contract violations get kill-then-retry; a genuine `errored` (pane alive,
no marker, wait failed) keeps its non-retryable behavior.
- Add WorkflowDeps.killStepPane({wfrunBeadsId, stepId}) — kills the lingering
subagent pane before a re-dispatch so it can't double-spawn (mirrors kaa's
killOrphanedPanes-before-retry). Production impl (index.ts) looks up the
tagged SubagentRecord and calls realTmux.kill (idempotent).
- dispatchStepWithRetry: on `contract-violation`, killStepPane then retryOrFail
(the retryable path) instead of markStepFailed. validate-then-commit holds —
no outcome:success is ever written for a violation.
- index.ts marker-wait: capture failed-contract in a closure flag and return an
`exited` sentinel from the wait (no throw → not mis-recorded as `errored`),
then override the DispatchOutcome to `contract-violation` after dispatchSubagent
returns. This distinguishes it from genuine errors through dispatchSubagent's
fixed status vocabulary.
- Add killStepPane to all 8 WorkflowDeps test fakes.
- New tests (workflow.settle.test.ts): contract-violation re-dispatches up to
max-retries then succeeds (proves retryable + pane killed before retry); and
exhausts retries → outcome:failed with pane killed each attempt and NO false
success (validate-then-commit invariant preserved).
Tests: 312 passed (up from 310), 13 skipped. Known failures only:
index.integration.test.ts (esbuild not in worktree) + ambient.d.ts (no suite).
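The retry routing above, condensed into a sketch. `DispatchOutcome`, `RetryDeps`, and `dispatchStepWithRetry` here are simplified, hypothetical shapes (the real function also handles gates, adoption, and bookkeeping); the sketch only shows that `contract-violation` kills the pane and re-dispatches up to `maxRetries`, while `errored` fails immediately, and no success is recorded for a violation.

```typescript
type DispatchOutcome =
  | { status: "settled" }                                    // contract validated, success committed
  | { status: "contract-violation"; missingTypes: string[] } // retryable
  | { status: "errored"; error: string };                    // non-retryable

interface StepIds { wfrunBeadsId: string; stepId: string }

interface RetryDeps {
  dispatch(attempt: number): Promise<DispatchOutcome>;
  killStepPane(ids: StepIds): Promise<void>; // idempotent in production
  markStepFailed(reason: string): Promise<void>;
}

async function dispatchStepWithRetry(deps: RetryDeps, ids: StepIds, maxRetries: number): Promise<"success" | "failed"> {
  let lastMissing: string[] = [];
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const outcome = await deps.dispatch(attempt);
    if (outcome.status === "settled") return "success";
    if (outcome.status === "errored") {
      await deps.markStepFailed(outcome.error); // genuine error: no retry
      return "failed";
    }
    // contract-violation: kill the lingering pane first so a re-dispatch can't double-spawn
    lastMissing = outcome.missingTypes;
    await deps.killStepPane(ids);
  }
  await deps.markStepFailed(`emits contract unmet after retries: missing ${lastMissing.join(", ")}`);
  return "failed";
}
```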
…(millworks-1i7) Fixes the gap left by kaa: recovered steps were passing `personaEmits: []` through all three recovery paths (gate-after, reconcile/adoptStep, pending-validation), which caused validate-then-commit to auto-pass for any step restarted after a crash. Recovery now RE-RESOLVES emits via the same resolveRoleToPersona path that dispatch uses, and re-validates the contract before writing outcome:success.
State-machine additions (pi surface, lockstep with Claude 1i7):
- `BeadsStepRecovery.hasSelfReportComplete`: detected from bd labels in recoveryViewFromRecords; true when STEP open + self-report:complete (crash in the validation window).
- `ResumePlan.pending-validation`: new plan kind for STEP open + marker present. Takes priority over reconcile (agent finished; pane may be gone). driveRun re-resolves emits, then passes a StepResult to processReadyStep, which calls markStepSettled → validateEmitsContract → outcome:success (or fails/retries).
- `resolveStepEmits()`: helper that mirrors dispatchStep's resolution path; throws UnrecoverableRunError when the persona/role cannot be resolved (fail the run, same transient-vs-malformed split as inc5).
- `adoptStep()`: now calls resolveStepEmits before entering waitForSettle so the re-resolved emits flow into the returned StepResult. Removes the 1i7 follow-up comment (gap is now closed).
- driveRun gate-after path: also re-resolves emits instead of passing [].
Tests (in-source vitest):
- recoveryViewFromRecords: 3 new tests pinning hasSelfReportComplete detection.
- planResume: 3 new tests: pending-validation produced for open+marker; priority over plain reconcile; false positive excluded (marker absent → reconcile).
- All 174 pre-existing tests still pass (186 total, +12 new).
…ude surface)
Extends inc5 beads-authoritative recovery with the millworks-1i7 contract:
a STEP that carried `self-report:complete` but was not yet closed (crash in
the validate-then-close window) is now re-validated on recovery — never
auto-passed via emits:[]. A running step with a live pane (no marker) now
carries re-resolved emits into the adopted waitForMarker.
Recovery state machine (lockstep with pi):
- STEP closed outcome:success/failed → terminal (unchanged, inc5).
- STEP open + self-report:complete → PENDING VALIDATION: re-resolve
persona emits via deps.resolvePersona, validateEmits, acceptStep
(success) or markStepFailed (contract violation). No pane adoption.
- STEP open + no marker + live pane → adopt, carrying re-resolved emits
into waitForMarker (not []). Re-dispatch if pane gone.
- After-gate recovery: reconstructGate re-resolves persona emits so
gate_approve validates the real contract (not auto-passing).
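The recovery shapes listed above can be sketched as a pure planning function. The `RecoveredStep` fields and plan names are hypothetical; the point is the priority order: terminal outcome, then the marker (pending validation wins over pane adoption), then live-pane adoption versus re-dispatch. Treating a closed STEP with no outcome label as an error is this sketch's own fail-fast choice.

```typescript
// Hypothetical view of one STEP as reconstructed from beads + tmux after a crash.
interface RecoveredStep {
  closed: boolean;
  outcome?: "success" | "failed";
  markerPresent: boolean; // self-report:complete seen on the STEP
  paneAlive: boolean;
}

type RecoveryPlan =
  | { kind: "terminal"; outcome: "success" | "failed" }
  | { kind: "pending-validation" } // re-resolve emits, validate, then accept or fail
  | { kind: "adopt" }              // wait on the live pane with re-resolved emits
  | { kind: "redispatch" };

function planStepRecovery(s: RecoveredStep): RecoveryPlan {
  if (s.closed) {
    if (s.outcome === undefined) throw new Error("closed STEP without an outcome label");
    return { kind: "terminal", outcome: s.outcome };
  }
  if (s.markerPresent) return { kind: "pending-validation" }; // never auto-passed with emits: []
  return s.paneAlive ? { kind: "adopt" } : { kind: "redispatch" };
}
```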
FAIL-FAST: unresolvable persona/emits propagates as a transient error.
Changes:
- workflow.ts: BeadsStepRecovery gains markerPresent; RunState gains
pendingValidationStepIds + pendingValidationOutputs; rebuildRunState
populates both; resumeRecoveredRun handles all three recovery shapes;
reconstructGate is now async + re-resolves emits; adoptStep interface
adds stepEmits.
- run-tracker.ts: loadRecovery sets markerPresent from SELF_REPORT_COMPLETE.
- run-tracker.testing.ts: recoveryView() includes markerPresent: false.
- index.ts: adoptStep uses waitForMarker with re-resolved stepEmits.
- Tests: 11 new unit + controller-level tests; inc5 recovery tests extended.
…nsient retry), lockstep with pi
Lockstep divergence fix on the Claude side. When recovery re-resolves a
recovered step's persona emits and resolvePersona FAILS, the prior code let
the error propagate as transient — effectively retried next session. A
deterministic resolution failure (the role no longer resolves) would strand
the run open forever, worse than a loud fail. pi's resolveStepEmits throws
UnrecoverableRunError; D44's fail-fast intent is "fail the run, don't auto-pass".
Changes (workflow.ts):
- New resolveRecoveredEmits helper: wraps deps.resolvePersona; any failure
throws UnrecoverableRunError (the inc5 malformed-recovery path), carrying
the original cause + step id. Used at all three re-resolution sites.
- resumeRecoveredRun (pending-validation + no-marker adopt paths): use the
helper. A persona failure now fails the run (runDrive closes the WFRUN
failed), not a silent retry.
- reconstructGate (after-gate): use the helper; a paused:after:<stepId> for
a step absent from the re-parsed workflow is also UnrecoverableRunError.
- doRecover: the gate-pause branch now catches UnrecoverableRunError from
reconstructGate, closes the WFRUN failed, and starts clean — currentRun is
armed ONLY after a successful reconstruction (no half-built armed run). A
transient bd/CLI blip still propagates (run left open, retryable).
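A sketch of the `resolveRecoveredEmits` idea: any resolution failure becomes an `UnrecoverableRunError` carrying the step id and the original cause, so the caller fails the run instead of retrying forever. The `resolvePersona` signature and the `{ file, emits }` result shape are assumptions modeled on the widened picker result described earlier in this PR; the error's field is named `original` here to avoid clashing with the built-in `Error.cause`.

```typescript
class UnrecoverableRunError extends Error {
  readonly stepId: string;
  readonly original: unknown;
  constructor(stepId: string, original: unknown) {
    super(`cannot re-resolve persona emits for recovered step ${stepId}: ${String(original)}`);
    this.name = "UnrecoverableRunError";
    this.stepId = stepId;
    this.original = original;
  }
}

interface PersonaPick { file: string; emits: string[] }

// Hypothetical signature; wraps every failure, never returns [] as a fallback.
async function resolveRecoveredEmits(
  resolvePersona: (role: string) => Promise<PersonaPick>,
  stepId: string,
  role: string,
): Promise<string[]> {
  try {
    return (await resolvePersona(role)).emits;
  } catch (original) {
    // A role that no longer resolves will not fix itself next session: fail the run loudly.
    throw new UnrecoverableRunError(stepId, original);
  }
}
```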
Tests:
- resume: persona-unresolvable on a no-marker step and on a pending-validation
step both reject with UnrecoverableRunError (was: generic throw).
- controller-recovery: mid-step (no-marker) and after-gate recovered steps
whose persona can't be re-resolved close the WFRUN outcome:failed (not
left open); the controller stays clean and a fresh run can start.
- substitute test: RunState literal gains the two new recovery fields.
npm test: 326 passed, 13 skipped (only the pre-existing esbuild integration
failure). tsc --noEmit: clean.
…teps (millworks-1i7)
…en steps (millworks-1i7)
…illworks-6q0) /millworks:init registers beads custom types via the Rust `millworks init` binary (init.rs), NOT recipes/init-beads.sh. 6q0 updated the recipe but missed init.rs:129, so `requirement` never registered at workflow runtime; this was caught during cn8 live verification (26e). Extract the list to CUSTOM_BEADS_TYPES (incl. requirement) and add a regression test covering this binary path.
Epic millworks-cn8 (ADR-0009 D44). A workflow step's canonical output becomes first-class structured beads records (decision, risk, requirement, intent, task, healing, each carrying its prose in its `description`), not a single prose blob in STEP `notes`. The beads graph is the source of truth for "what was decided / what happened / what needs doing". Cross-surface (pi + Claude), lockstep. Builds on c30 (#3) and inc5 run-tracking/settle.
What's in this PR (11 beads)
Shared core
- `tools/millworks-emit`: the sole least-privilege beads write-path. `emit` auto-stamps `step:`/`wfrun:` labels + a `discovered-from` link from `MILLWORKS_STEP_ID`/`WFRUN_ID`; `complete --summary` sets STEP `notes` + the `self-report:complete` marker. (millworks-thz)
- `persona-picker` parses an `emits:` frontmatter field and surfaces it in JSON (millworks-40a); `context-pack-assembler` expands a scoped STEP → its emitted records, notes-only when none, a graceful c30 superset (millworks-2qe); `requirement` registered as a custom type (millworks-6q0); `millworks:beads` skill documents the emit mechanics (millworks-clb); personas declare conservative `emits` contracts + the 5 emitting ones rewritten to records (millworks-kma).
Both surfaces (pi `extensions/workflow-runner`, Claude `surfaces/claude/mcp-server`), lockstep
- `millworks-emit complete` to settle; emit-types requirement only when `emits` is non-empty). (millworks-ypd / millworks-d8q)
- `self-report:complete` marker is the settle trigger (transcript/pane demoted to a health signal); the runtime validates the emits contract and is the sole writer of the `outcome:success` close (validate-then-commit; a contract violation kills the pane and re-dispatches; the inc5 transcript→notes write is removed, so notes come from the agent). (millworks-q2h / millworks-kaa)
- `emits` and re-validates a marker-seen step (no false auto-pass; a persona that can't be re-resolved fails the run). (millworks-1i7)
Why this matters
Solves the fragile-settle problem: a user interrupting a subagent (ending its transcript turn) no longer reads as "settled" — settlement is now a durable, content-addressed outcome in beads, definitive across crashes and interruptions.
Verification
`bd` smokes on both surfaces.
Deferred (tracked beads, not blocking)
- `millworks-26e`: live end-to-end + parity verification (owner-driven plugin rebuild + driven run with kill/recover). Pending.
- `millworks-5wz`: pi emit-scoping hardening (pi `--tools` has no per-command scoping, so emitting personas get full `bash` for now, per Decision A; structural scoping tracked).
- `millworks-qaq`: direct-`persona:` steps skip the emits contract (a `persona:`-pinned step bypasses the picker → `emits:[]`).
Pre-existing (NOT cn8 regressions, will show in CI)
- `millworks-rrp`: 4 `context-pack-assembler` unit tests failing on main.
- `millworks-7s4`: pi vitest picks up `ambient.d.ts` as a test file (false failure).
Design in `bd show millworks-cn8 --design` + ADR-0009 D44.