fix(pivot): align remaining-work regeneration #199
…k-regeneration-alignment
Eval Smoke Results
✅ Smoke evals clean: 0 regressions, 3 improvements
Posted by the Specwright eval-smoke workflow. This comment is updated on each push.
Review: fix(pivot): align remaining-work regeneration
The scope and intent are clear: align the manual-reconcile contract across …
Correctness
The second sentence is a word-for-word restatement of the first. This is a diff-merge artifact; it should be deleted. An LLM executing this skill may treat the two sentences as separate constraints and behave unexpectedly.
Test quality
Wide regex windows risk missing real contract drift
Design approval lineage
The PR itself notes the work-level …
All six gates are eligible: build, tests, security, wiring, semantic, spec.
All six gates are eligible when enabled in config: build, tests, security,
wiring, semantic, spec.
Eligible gates: build, tests, security, wiring, semantic, spec.
Redundant sentence: the gate list is already stated on the line above. This appears to be a diff-merge artifact.
Eligible gates: build, tests, security, wiring, semantic, spec.
Execute enabled gates in dependency order: gate-build → gate-tests →
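A guard like the following could keep this class of merge artifact from returning. It is a minimal sketch, assuming the skill text is available as a plain string (as `self.build_text` is in the regression tests); `GATES`, `GATE_LIST`, and `gate_list_occurrences` are hypothetical names. The check counts how many times the complete gate sequence appears, tolerating a line wrap after any comma.

```python
import re

# The six gate names, in the order the skill text lists them.
GATES = ("build", "tests", "security", "wiring", "semantic", "spec")

# One complete listing of the gates; "\s+" after each comma tolerates a line wrap.
GATE_LIST = re.compile(r"\b" + r",\s+".join(GATES) + r"\b")


def gate_list_occurrences(text: str) -> int:
    """Count how many times the full gate list appears in `text`."""
    return len(GATE_LIST.findall(text))


merged = (
    "All six gates are eligible when enabled in config: build, tests, security,\n"
    "wiring, semantic, spec.\n"
    "Eligible gates: build, tests, security, wiring, semantic, spec."
)
print(gate_list_occurrences(merged))  # 2: the redundant restatement is detected
```

In a test case this would become something like `self.assertEqual(gate_list_occurrences(skill_text), 1)` (with `skill_text` standing in for whatever attribute holds the skill file), so the suite fails as soon as a second listing appears.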
- `protocols/decision.md` -- autonomous decision framework and gate handoff
- `protocols/state.md` -- workflow state and locking
- `protocols/stage-boundary.md` -- stage boundary
- `protocols/decision.md` -- gate handoff
The prior description "autonomous decision framework and gate handoff" was load-bearing: agents use these one-line descriptors to decide which protocols are relevant to their current task. Trimming it to "gate handoff" makes decision.md invisible during autonomous decision steps other than gate handoff (convergence loop, assumption lifecycle, CCR).
- `protocols/decision.md` -- gate handoff
- `protocols/decision.md` -- autonomous decision framework and gate handoff
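To make the reviewer's point concrete, the sketch below routes tasks to protocols by substring-matching the one-line descriptors. Real agents select protocols with model judgment, not substring search, so `parse_protocol_index` and `relevant_protocols` are a deliberately naive, hypothetical stand-in; they only show how dropping words from a descriptor removes a protocol from consideration.

```python
from __future__ import annotations

import re

# Matches index lines such as: - `protocols/state.md` -- workflow state and locking
DESCRIPTOR = re.compile(r"^- `(?P<path>[^`]+)` -- (?P<desc>.+)$")


def parse_protocol_index(text: str) -> dict[str, str]:
    """Map each protocol path to its one-line descriptor."""
    index = {}
    for line in text.splitlines():
        match = DESCRIPTOR.match(line.strip())
        if match:
            index[match["path"]] = match["desc"]
    return index


def relevant_protocols(index: dict[str, str], phrase: str) -> list[str]:
    """Naive routing: keep protocols whose descriptor mentions `phrase`."""
    needle = phrase.lower()
    return sorted(path for path, desc in index.items() if needle in desc.lower())


before = "- `protocols/decision.md` -- autonomous decision framework and gate handoff"
after = "- `protocols/decision.md` -- gate handoff"
print(relevant_protocols(parse_protocol_index(before), "autonomous decision"))  # ['protocols/decision.md']
print(relevant_protocols(parse_protocol_index(after), "autonomous decision"))  # []
```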
FRESHNESS_PROTOCOL = ROOT_DIR / "core" / "protocols" / "git-freshness.md"


def _run_node_json(script: str, env: dict[str, str] | None = None) -> dict:
This is a near-copy of _run_node_json in test_pivot_rebaselining_foundation.py (lines 17–28), with only the env parameter added. Two copies will diverge. Consider moving this (with the env parameter) to _text_helpers.py and updating the prior file to import from there.
    self.build_text,
    re.compile(
        r"(pivot|replan|regenerated)[\s\S]{0,220}(approval surface|unit-spec)[\s\S]{0,220}(refresh|record)|"
        r"(refresh|record)[\s\S]{0,220}(unit-spec|approval surface)[\s\S]{0,220}(pivot|replan|regenerated)",
        re.IGNORECASE,
    ),
The chained [\s\S]{0,220} windows let up to 440 characters separate the first and last anchors — wide enough to match concepts appearing in separate paragraphs. This test would pass even if "pivot", "unit-spec", and "refresh" were mentioned in completely unrelated sentences.
Consider splitting into two independent assertions (pivot→approval-surface, then approval-surface→refresh) each with a tighter {0,100} window, or verifying paragraph-level proximity instead.
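The sketch below illustrates the paragraph-level alternative and contrasts it with the existing pattern. It assumes a "paragraph" is a blank-line-separated block; `CONCEPTS`, `WIDE`, and `one_paragraph_covers_all` are hypothetical names, and the sample texts are invented. Note that bullet items separated by single newlines count as one paragraph here, so the check is coarser in list-heavy skill files.

```python
import re

# Concept groups from the regex under review; order no longer matters.
CONCEPTS = (
    re.compile(r"pivot|replan|regenerated", re.IGNORECASE),
    re.compile(r"approval surface|unit-spec", re.IGNORECASE),
    re.compile(r"refresh|record", re.IGNORECASE),
)

# The chained-window alternation from the test, reproduced for comparison.
WIDE = re.compile(
    r"(pivot|replan|regenerated)[\s\S]{0,220}(approval surface|unit-spec)[\s\S]{0,220}(refresh|record)|"
    r"(refresh|record)[\s\S]{0,220}(unit-spec|approval surface)[\s\S]{0,220}(pivot|replan|regenerated)",
    re.IGNORECASE,
)


def one_paragraph_covers_all(text: str) -> bool:
    """True if a single blank-line-separated paragraph mentions every concept."""
    paragraphs = re.split(r"\n\s*\n", text)
    return any(all(c.search(p) for c in CONCEPTS) for p in paragraphs)


# The three concepts appear only in separate, unrelated paragraphs.
drifted = (
    "After a pivot, sw-plan regenerates remaining units.\n\n"
    "The unit-spec lives under units/.\n\n"
    "Record gate results in evidence/."
)
aligned = (
    "After a pivot, the regenerated artifacts become the approval surface; "
    "refresh the unit-spec approval before building."
)
print(bool(WIDE.search(drifted)), one_paragraph_covers_all(drifted))  # True False
print(one_paragraph_covers_all(aligned))  # True
```

The wide pattern accepts the drifted text because each 220-character window can bridge a paragraph break; the paragraph-scoped check rejects it.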
self.assertEqual(result["entryCount"], 2)
self.assertEqual(result["firstStatus"], "SUPERSEDED")
self.assertEqual(result["latestStatus"], "APPROVED")
self.assertEqual(result["assessmentStatus"], "APPROVED")
The proof shows that the second approval (v2) supersedes the first and assesses as APPROVED — correct. But it doesn't test the actual pivot regression: the first entry (v1 approval) assessed against v2 artifacts should return STALE. That's the state an operator would see if they ran sw-build without refreshing the approval after a pivot, and it's the case this unit is explicitly guarding against.
Consider adding:
const stale_assessment = assessApprovalEntry(entries[0], {
  baseDir: unitRoot,
  artifacts: ['context.md', 'plan.md', 'spec.md'],
});
and assert that stale_assessment.status === 'STALE' (or SUPERSEDED).
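The missing case is easy to see in a stripped-down model. The Python below is a hypothetical illustration, assuming an approval records a digest of the artifact set it covered (the Approval Lineage summary's "no longer matches the last approved hash" points that way). `ApprovalEntry`, `approve`, and `assess` are invented for this sketch and are not Specwright's `assessApprovalEntry`; they show the v1 approval assessing as STALE against v2 artifacts before any refresh.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass
class ApprovalEntry:
    artifact_hash: str          # digest of the artifact set that was approved
    status: str = "APPROVED"    # lineage status: APPROVED or SUPERSEDED


def hash_artifacts(artifacts: dict[str, str]) -> str:
    """Order-independent SHA-256 over artifact names and contents."""
    digest = hashlib.sha256()
    for name in sorted(artifacts):
        digest.update(name.encode() + b"\0" + artifacts[name].encode() + b"\0")
    return digest.hexdigest()


def approve(ledger: list[ApprovalEntry], artifacts: dict[str, str]) -> None:
    """Record a new approval; earlier approvals become SUPERSEDED."""
    for entry in ledger:
        if entry.status == "APPROVED":
            entry.status = "SUPERSEDED"
    ledger.append(ApprovalEntry(hash_artifacts(artifacts)))


def assess(entry: ApprovalEntry, artifacts: dict[str, str]) -> str:
    """APPROVED only if the entry still matches the current artifact set."""
    return "APPROVED" if entry.artifact_hash == hash_artifacts(artifacts) else "STALE"


v1 = {"context.md": "ctx", "plan.md": "plan v1", "spec.md": "spec v1"}
v2 = {**v1, "plan.md": "plan v2 (after pivot)"}

ledger = []
approve(ledger, v1)
print(assess(ledger[0], v2))  # STALE: building now would rely on an outdated approval
approve(ledger, v2)
print(ledger[0].status, ledger[1].status, assess(ledger[1], v2))  # SUPERSEDED APPROVED APPROVED
```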
Summary
sw-plan, sw-build, sw-verify, sw-ship, and git-freshness
sw-verify mutation-surface wording required by the standing regression proof suites
Approval Lineage
design: STALE (artifact-set-changed) via /sw-plan on 2026-04-20T07:10:13Z; the active design artifact set no longer matches the last approved hash.
unit-spec (02-remaining-work-regeneration-alignment): APPROVED via /sw-build on 2026-04-21T04:02:09.775Z and matches the active spec.md / plan.md / context.md surface.
What Changed
core/skills/sw-plan/SKILL.md now treats structural pivots as regeneration of affected remaining-unit artifacts only, while preserving shipped units and recorded target/freshness metadata.
core/skills/sw-build/SKILL.md, core/skills/sw-verify/SKILL.md, core/skills/sw-ship/SKILL.md, and core/protocols/git-freshness.md now share one manual-reconcile contract with linked-worktree ownership guidance and no silent target rewrite.
evals/tests/test_pivot_regeneration_alignment.py adds regression coverage for planner regeneration, cross-stage freshness wording, and approval-refresh behavior.
core/skills/sw-verify/SKILL.md was repaired in Task 4 to restore the exact mutation-output phrases required by the standing verify-proof regressions.
Why The Agent Implemented It This Way
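The first What Changed item describes a selection rule: on a structural pivot, only affected remaining units are regenerated, while shipped units and the recorded target/freshness metadata carry over unchanged. The Python below models that rule under stated assumptions; `Unit`, `WorkPlan`, `regenerate_after_pivot`, the `regenerate` callback, and the unit IDs are all hypothetical, and the real behavior lives in the sw-plan skill instructions, not in code like this.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Unit:
    unit_id: str
    shipped: bool
    artifacts: dict[str, str]  # e.g. {"spec.md": ..., "plan.md": ...}


@dataclass(frozen=True)
class WorkPlan:
    units: tuple[Unit, ...]
    target_ref: str            # recorded target branch; never rewritten here
    freshness: dict[str, str]  # recorded freshness metadata; carried over as-is


def regenerate_after_pivot(
    plan: WorkPlan,
    affected: set[str],
    regenerate: Callable[[Unit], dict[str, str]],
) -> WorkPlan:
    """Regenerate artifacts for affected remaining units only."""
    units = tuple(
        unit  # shipped units win even if listed as affected
        if unit.shipped or unit.unit_id not in affected
        else replace(unit, artifacts=regenerate(unit))
        for unit in plan.units
    )
    # replace() copies every other field, so target_ref and freshness are untouched.
    return replace(plan, units=units)


shipped = Unit("01-baseline", True, {"spec.md": "v1"})
remaining = Unit("02-remaining-work", False, {"spec.md": "v1"})
unaffected = Unit("03-later-work", False, {"spec.md": "v1"})
plan = WorkPlan((shipped, remaining, unaffected), "main", {"baseRef": "abc123"})

new_plan = regenerate_after_pivot(
    plan, {"01-baseline", "02-remaining-work"}, lambda unit: {"spec.md": "v2"}
)
print([u.artifacts["spec.md"] for u in new_plan.units])  # ['v1', 'v2', 'v1']
```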
Acceptance Criteria
AC-1 PASS: sw-plan limits regeneration to affected remaining units, preserves shipped baseline units, and keeps target/freshness metadata intact.
AC-2 PASS: sw-build treats pivot-updated artifacts as the current approval surface and requires refreshed unit-spec approval before task execution.
AC-3 PASS: sw-verify and sw-ship share the same manual-reconcile contract with build and do not direct users into an equally blocked loop.
AC-4 PASS: git-freshness and the consuming skills use the same reconcile vocabulary, linked-worktree guidance, and non-mutating target metadata semantics.
AC-5 PASS: regression coverage now fails if replanning, approval refresh, or manual reconcile guidance drifts again.
Spec Conformance
core/skills/sw-plan/SKILL.md:73-77,101-106,150-153 and evals/tests/test_pivot_regeneration_alignment.py:45-83
core/skills/sw-build/SKILL.md:49-51 and evals/tests/test_pivot_regeneration_alignment.py:95-103,155-222
core/skills/sw-verify/SKILL.md:78-91,193-195,222, core/skills/sw-ship/SKILL.md:59-64,144, and evals/tests/test_pivot_regeneration_alignment.py:114-131
core/protocols/git-freshness.md:167-177 plus evals/tests/test_pivot_regeneration_alignment.py:133-149
evals/tests/test_pivot_regeneration_alignment.py:45-222 and evals/tests/test_verify_mutation_surface.py:20-83
Gate Summary
evidence/build-report.md
evidence/test-quality.md
evidence/security-report.md
evidence/wiring-report.md
evidence/semantic-report.md
evidence/spec-compliance.md
Remaining Attention
design approval lineage is stale (artifact-set-changed). The implementation and unit-spec proof surface are current, but reviewer approval of the latest design artifact set has not been refreshed.
Evidence Links
units/02-remaining-work-regeneration-alignment/evidence/.