fix(pivot): align remaining-work regeneration by MacAttak · Pull Request #199 · Obsidian-Owl/specwright

MacAttak · 2026-04-21T04:21:01Z

Summary

align pivot-driven remaining-work regeneration semantics across sw-plan, sw-build, sw-verify, sw-ship, and git-freshness
restore the exact sw-verify mutation-surface wording required by the standing regression proof suites
keep the shipped scope intact while making manual freshness-reconcile guidance non-looping across build, verify, and ship

Approval Lineage

design: STALE (artifact-set-changed) via /sw-plan on 2026-04-20T07:10:13Z; the active design artifact set no longer matches the last approved hash.
unit-spec (02-remaining-work-regeneration-alignment): APPROVED via /sw-build on 2026-04-21T04:02:09.775Z and matches the active spec.md / plan.md / context.md surface.

What Changed

core/skills/sw-plan/SKILL.md now treats structural pivots as regeneration of affected remaining-unit artifacts only, while preserving shipped units and recorded target/freshness metadata.
core/skills/sw-build/SKILL.md, core/skills/sw-verify/SKILL.md, core/skills/sw-ship/SKILL.md, and core/protocols/git-freshness.md now share one manual-reconcile contract with linked-worktree ownership guidance and no silent target rewrite.
evals/tests/test_pivot_regeneration_alignment.py adds regression coverage for planner regeneration, cross-stage freshness wording, and approval-refresh behavior.
core/skills/sw-verify/SKILL.md was repaired in Task 4 to restore the exact mutation-output phrases required by the standing verify-proof regressions.
Blast radius stays limited to lifecycle skill/protocol docs and regression suites; no runtime product code changed in this unit.

Why The Agent Implemented It This Way

The drift lived in the replanning and lifecycle contracts, so the fix updated the planner surface and the shared freshness protocol together instead of patching a single stage in isolation.
Approval refresh was proved against the real approvals helper in a temporary filesystem so the pivoted supersession path is exercised the same way the workflow uses it.
The verify repair stayed narrowly scoped because the failing proof surface was phrase-level contract drift, not a broader lifecycle regression.

Acceptance Criteria

AC-1 PASS: sw-plan limits regeneration to affected remaining units, preserves shipped baseline units, and keeps target/freshness metadata intact.
AC-2 PASS: sw-build treats pivot-updated artifacts as the current approval surface and requires refreshed unit-spec approval before task execution.
AC-3 PASS: sw-verify and sw-ship share the same manual-reconcile contract with build and do not direct users into an equally blocked loop.
AC-4 PASS: git-freshness and the consuming skills use the same reconcile vocabulary, linked-worktree guidance, and non-mutating target metadata semantics.
AC-5 PASS: regression coverage now fails if replanning, approval refresh, or manual reconcile guidance drifts again.

Spec Conformance

| AC | Status | Proof summary |
| --- | --- | --- |
| AC-1 | PASS | `core/skills/sw-plan/SKILL.md:73-77,101-106,150-153` and `evals/tests/test_pivot_regeneration_alignment.py:45-83` |
| AC-2 | PASS | `core/skills/sw-build/SKILL.md:49-51` and `evals/tests/test_pivot_regeneration_alignment.py:95-103,155-222` |
| AC-3 | PASS | `core/skills/sw-verify/SKILL.md:78-91,193-195,222`, `core/skills/sw-ship/SKILL.md:59-64,144`, and `evals/tests/test_pivot_regeneration_alignment.py:114-131` |
| AC-4 | PASS | `core/protocols/git-freshness.md:167-177` plus `evals/tests/test_pivot_regeneration_alignment.py:133-149` |
| AC-5 | PASS | `evals/tests/test_pivot_regeneration_alignment.py:45-222` and `evals/tests/test_verify_mutation_surface.py:20-83` |

Gate Summary

| Gate | Verdict | Evidence |
| --- | --- | --- |
| build | PASS | `evidence/build-report.md` |
| tests | PASS | `evidence/test-quality.md` |
| security | PASS | `evidence/security-report.md` |
| wiring | PASS | `evidence/wiring-report.md` |
| semantic | PASS | `evidence/semantic-report.md` |
| spec | PASS | `evidence/spec-compliance.md` |

Remaining Attention

WARN: the work-level design approval lineage is stale (artifact-set-changed). The implementation and unit-spec proof surface are current, but reviewer approval of the latest design artifact set has not been refreshed.

Evidence Links

Reviewer-usable approval lineage, conformance, and gate summaries are inlined above because this repo uses clone-local work artifacts.
The underlying evidence files remain recorded in workflow state under units/02-remaining-work-regeneration-alignment/evidence/.


github-actions · 2026-04-21T04:21:41Z

Eval Smoke Results

✅ Smoke evals clean — 0 regressions, 3 improvements

| Eval | Pass Rate | Duration | Verdict |
| --- | --- | --- | --- |
| grader-function-tests | 1.00 (+0.00) | 399ms (-1095ms) | improved |
| structural-handoff-template | 1.00 (+0.00) | 30ms (-13ms) | improved |
| structural-skill-validation | 1.00 (+0.00) | 3383ms (+825ms) | ok |
| structural-state-enforcement | 1.00 (+0.00) | 75ms (-16ms) | improved |
| workflow-yaml-validation | 1.00 (+0.00) | 1305ms (+5ms) | ok |

_Posted by Specwright eval-smoke workflow. This comment is updated on each push._

claude · 2026-04-21T04:25:43Z

Review: fix(pivot): align remaining-work regeneration

The scope and intent are clear: align the manual-reconcile contract across sw-build, sw-verify, sw-ship, and git-freshness so operators get one consistent instruction and no stage can silently bypass a freshness block. The sw-plan regeneration semantics are also well-tightened. Overall this is a solid fix. A few issues worth addressing before merge:

Correctness

sw-verify/SKILL.md — duplicate gate list sentence (line 100)
After the PR, the gate-execution section contains:

All six gates are eligible when enabled in config: build, tests, security, wiring, semantic, spec.
Eligible gates: build, tests, security, wiring, semantic, spec.

The second sentence is a word-for-word restatement of the first. This is a diff-merge artifact; it should be deleted. An LLM executing this skill may treat the two sentences as separate constraints and behave unexpectedly.
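
A regression check for this kind of merge artifact could count how often the full gate list is enumerated once line wraps are collapsed. The following is only a sketch under assumptions: `count_gate_list_statements` and the sample strings are hypothetical and are not taken from the repo's test suite.

```python
import re

# Gate names in the order the skill text lists them.
GATES = ["build", "tests", "security", "wiring", "semantic", "spec"]


def count_gate_list_statements(text: str) -> int:
    """Count enumerations of the full gate list, tolerating line wraps."""
    # Collapse whitespace runs so a list wrapped across lines still matches.
    normalized = re.sub(r"\s+", " ", text)
    pattern = re.compile(r",\s*".join(re.escape(gate) for gate in GATES))
    return len(pattern.findall(normalized))


# Hypothetical excerpts modelled on the two sentences quoted in the review.
after_pr = (
    "All six gates are eligible when enabled in config: build, tests, security,\n"
    "wiring, semantic, spec.\n"
    "Eligible gates: build, tests, security, wiring, semantic, spec.\n"
)
deduplicated = (
    "All six gates are eligible when enabled in config: build, tests, security,\n"
    "wiring, semantic, spec.\n"
)

duplicate_count = count_gate_list_statements(after_pr)
clean_count = count_gate_list_statements(deduplicated)
```

A test built on this idea would assert that the count is exactly one, which fails on the duplicated text and passes once the second sentence is removed.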

sw-verify/SKILL.md — decision.md protocol reference lost its key descriptor (line 206)
The description was trimmed from "autonomous decision framework and gate handoff" to "gate handoff". Skills use these one-line descriptors to decide which protocols to load during autonomous execution. Dropping "autonomous decision framework" risks the protocol being skipped when an agent needs to apply the convergence-loop or assumption-lifecycle logic from protocols/decision.md.

sw-plan/SKILL.md — "open scope" is an undefined term (line ~99)
"regenerate integration-criteria.md for the current open scope defined by the affected remaining units" introduces "open scope" without a prior definition in the skill. The concept it describes is already expressed as "affected remaining units" four lines above. Using a new term here creates an unnecessary ambiguity risk for agents parsing this skill; "remaining open work" or just restating "affected remaining units" would be clearer.

Test quality

test_pivot_regeneration_alignment.py — _run_node_json helper is duplicated
The function at lines 20–36 is functionally identical to the one in test_pivot_rebaselining_foundation.py (lines 17–28), with only an added env parameter. This should be moved to evals/tests/_text_helpers.py and the prior file updated to use the shared version. Two copies will diverge.

Wide regex windows risk missing real contract drift
Several chained patterns use [\s\S]{0,220} or [\s\S]{0,260} multiple times in a row (e.g., test_build_treats_pivoted_unit_artifacts_as_current_approval_surface). With up to 440 characters between the first and last anchor, these tests would still pass if the matched concepts appear in entirely separate paragraphs. Tightening to {0,100} or splitting into two independent assertions would catch actual proximity drift.

TestApprovalRefreshProof — missing stale detection path
The proof verifies that calling recordApproval a second time supersedes the first entry and that the latest entry assesses as APPROVED. It does not verify the inverse: that the pre-pivot (v1) entry assesses as STALE against the v2 artifacts. That is the actual regression risk: a stale first entry must not look APPROVED after the pivot changes artifacts. A second assertion on entries[0] assessed against v2 artifacts would complete the proof.

Design approval lineage

The PR itself notes the work-level design approval is STALE (artifact-set-changed). The gate evidence and unit-spec approval are current, but a reviewer cannot confirm the implementation matches the last-reviewed design without re-approving the design artifact set. This should be resolved before merge or explicitly documented as a known deviation.

claude · 2026-04-21T04:26:04Z

-All six gates are eligible: build, tests, security, wiring, semantic, spec.
+All six gates are eligible when enabled in config: build, tests, security,
+wiring, semantic, spec.
+Eligible gates: build, tests, security, wiring, semantic, spec.


Redundant sentence — the gate list was already stated on the line above. This appears to be a diff-merge artifact.

Suggested change

Eligible gates: build, tests, security, wiring, semantic, spec.

Execute enabled gates in dependency order: gate-build → gate-tests →

claude · 2026-04-21T04:26:06Z

- `protocols/decision.md` -- autonomous decision framework and gate handoff
- `protocols/state.md` -- workflow state and locking
+- `protocols/stage-boundary.md` -- stage boundary
+- `protocols/decision.md` -- gate handoff


The prior description "autonomous decision framework and gate handoff" was load-bearing: agents use these one-line descriptors to decide which protocols are relevant for their current task. Trimming the descriptor to just "gate handoff" hides decision.md from agents during autonomous decision steps (convergence loop, assumption lifecycle, CCR), even though the protocol covers those steps as well as gate handoff.

Suggested change

-- `protocols/decision.md` -- gate handoff
+- `protocols/decision.md` -- autonomous decision framework and gate handoff

claude · 2026-04-21T04:26:15Z

+FRESHNESS_PROTOCOL = ROOT_DIR / "core" / "protocols" / "git-freshness.md"
+
+
+def _run_node_json(script: str, env: dict[str, str] | None = None) -> dict:


This is a near-copy of _run_node_json in test_pivot_rebaselining_foundation.py (lines 17–28), with only the env parameter added. Two copies will diverge. Consider moving this (with the env parameter) to _text_helpers.py and updating the prior file to import from there.
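
For illustration, a shared helper along these lines could live in `_text_helpers.py`. The body of the real `_run_node_json` is not shown in this thread, so everything below is an assumption: the name `run_node_json`, the use of `node -e`, and the `command` parameter (added here only so the helper can be exercised without Node installed) are hypothetical.

```python
import json
import os
import subprocess
from typing import Mapping, Optional, Sequence


def run_node_json(
    script: str,
    env: Optional[Mapping[str, str]] = None,
    command: Sequence[str] = ("node", "-e"),
) -> dict:
    """Run an inline script and parse its stdout as one JSON document.

    Assumption: the script prints a single JSON object to stdout.
    """
    # Layer caller-supplied variables over the current environment so that
    # PATH and similar settings remain available to the child process.
    merged_env = {**os.environ, **dict(env or {})}
    completed = subprocess.run(
        [*command, script],
        capture_output=True,
        text=True,
        env=merged_env,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)
```

Both test modules could then import the one helper, and the older call sites that pass no `env` would keep working because the parameter defaults to `None`.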

claude · 2026-04-21T04:26:20Z

+            self.build_text,
+            re.compile(
+                r"(pivot|replan|regenerated)[\s\S]{0,220}(approval surface|unit-spec)[\s\S]{0,220}(refresh|record)|"
+                r"(refresh|record)[\s\S]{0,220}(unit-spec|approval surface)[\s\S]{0,220}(pivot|replan|regenerated)",
+                re.IGNORECASE,
+            ),


The chained [\s\S]{0,220} windows let up to 440 characters separate the first and last anchors — wide enough to match concepts appearing in separate paragraphs. This test would pass even if "pivot", "unit-spec", and "refresh" were mentioned in completely unrelated sentences.

Consider splitting into two independent assertions (pivot→approval-surface, then approval-surface→refresh) each with a tighter {0,100} window, or verifying paragraph-level proximity instead.
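
The difference can be shown with a small sketch. `WIDE` below is the first alternative of the pattern under review; `tight_match`, `paragraphs`, and both sample texts are hypothetical and only illustrate the paragraph-level variant of the suggestion.

```python
import re

# First alternative of the pattern quoted above, with its 220-character windows.
WIDE = re.compile(
    r"(pivot|replan|regenerated)[\s\S]{0,220}(approval surface|unit-spec)"
    r"[\s\S]{0,220}(refresh|record)",
    re.IGNORECASE,
)

# Two independent, tighter checks (hypothetical replacement).
PIVOT_TO_SURFACE = re.compile(
    r"(pivot|replan|regenerated)[\s\S]{0,100}(approval surface|unit-spec)",
    re.IGNORECASE,
)
SURFACE_TO_REFRESH = re.compile(
    r"(approval surface|unit-spec)[\s\S]{0,100}(refresh|record)",
    re.IGNORECASE,
)


def paragraphs(text: str) -> list:
    """Split text on blank lines and drop empty chunks."""
    return [chunk for chunk in re.split(r"\n\s*\n", text) if chunk.strip()]


def tight_match(text: str) -> bool:
    """True only if a single paragraph satisfies both tight checks."""
    return any(
        bool(PIVOT_TO_SURFACE.search(p)) and bool(SURFACE_TO_REFRESH.search(p))
        for p in paragraphs(text)
    )


# The three concepts appear in unrelated paragraphs: contract drift.
drifted = (
    "After a pivot, shipped units stay untouched.\n\n"
    "The unit-spec lives next to plan.md.\n\n"
    "Operators may record notes in the evidence folder.\n"
)
# The three concepts appear in one sentence: the intended contract.
aligned = (
    "After a pivot regenerates unit artifacts, treat them as the current "
    "approval surface and refresh the unit-spec approval before task execution.\n"
)

wide_accepts_drift = WIDE.search(drifted) is not None
tight_accepts_drift = tight_match(drifted)
tight_accepts_aligned = tight_match(aligned)
```

In this sketch the wide pattern accepts the drifted text, while the paragraph-scoped check rejects it and still accepts the aligned sentence.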

claude · 2026-04-21T04:26:29Z

+        self.assertEqual(result["entryCount"], 2)
+        self.assertEqual(result["firstStatus"], "SUPERSEDED")
+        self.assertEqual(result["latestStatus"], "APPROVED")
+        self.assertEqual(result["assessmentStatus"], "APPROVED")


The proof shows that the second approval (v2) supersedes the first and assesses as APPROVED — correct. But it doesn't test the actual pivot regression: the first entry (v1 approval) assessed against v2 artifacts should return STALE. That's the state an operator would see if they ran sw-build without refreshing the approval after a pivot, and it's the case this unit is explicitly guarding against.

Consider adding:

const staleAssessment = assessApprovalEntry(entries[0], { baseDir: unitRoot, artifacts: ['context.md', 'plan.md', 'spec.md'] });

and asserting staleAssessment.status === 'STALE' (or SUPERSEDED).
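
To make the missing case concrete, here is a toy model of hash-based approval lineage, written in Python. It is not the real approvals helper: `artifact_set_hash`, `record_approval`, `assess_entry`, and the rule that a hash mismatch yields STALE are all assumptions made for illustration.

```python
import hashlib
from typing import Dict, List


def artifact_set_hash(artifacts: Dict[str, str]) -> str:
    """Hash artifact names and contents in a stable (sorted) order."""
    digest = hashlib.sha256()
    for name in sorted(artifacts):
        digest.update(name.encode("utf-8") + b"\0")
        digest.update(artifacts[name].encode("utf-8") + b"\0")
    return digest.hexdigest()


def record_approval(entries: List[dict], artifacts: Dict[str, str]) -> None:
    """Append a new approval and mark earlier approvals as superseded."""
    for entry in entries:
        if entry["status"] == "APPROVED":
            entry["status"] = "SUPERSEDED"
    entries.append({"status": "APPROVED", "hash": artifact_set_hash(artifacts)})


def assess_entry(entry: dict, artifacts: Dict[str, str]) -> str:
    """Assumed rule: an entry whose hash no longer matches is STALE."""
    if entry["hash"] != artifact_set_hash(artifacts):
        return "STALE"
    return entry["status"]


# Pre-pivot (v1) and post-pivot (v2) artifact sets; contents are made up.
v1 = {"context.md": "ctx", "plan.md": "plan", "spec.md": "spec v1"}
v2 = {"context.md": "ctx", "plan.md": "plan", "spec.md": "spec v2"}

entries: List[dict] = []
record_approval(entries, v1)
record_approval(entries, v2)

latest_assessment = assess_entry(entries[-1], v2)  # the path already proved
stale_assessment = assess_entry(entries[0], v2)    # the missing inverse path
```

Under these assumptions the existing four assertions correspond to the entry count, the two statuses, and `latest_assessment`, and the proposed fifth assertion corresponds to `stale_assessment`.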

Specwright Tests added 9 commits April 21, 2026 12:05

feat(pivot): broaden rebaselining contract

51b6fd4

feat(approvals): document pivot stale lineage

ec968d0

test(pivot): lock broadened closeout contract

9155859

fix(pivot): address review gaps and smoke guard

ec29e35

Merge remote-tracking branch 'origin/main' into work/02-remaining-wor…

b121c7d

…k-regeneration-alignment

docs(plan): scope replanning to open units

9345165

docs(freshness): align manual reconcile guidance

770c616

test(pivot): deepen regeneration alignment proofs

0ece90a

fix: restore sw-verify mutation surface contract

9018a45

claude Bot reviewed Apr 21, 2026


Specwright Tests added 2 commits April 21, 2026 14:46

fix(review): address PR 199 feedback

1ea81a6

test(review): relax wrapped gate list assertion

ee6f252

MacAttak merged commit 267c953 into main Apr 21, 2026
13 checks passed

MacAttak deleted the work/02-remaining-work-regeneration-alignment branch April 21, 2026 04:49

github-actions Bot mentioned this pull request Apr 22, 2026

chore(release): v0.32.0 #209

Merged

MacAttak merged 11 commits into main from work/02-remaining-work-regeneration-alignment
