fix(pivot): align remaining-work regeneration#199

Merged
MacAttak merged 11 commits into main from
work/02-remaining-work-regeneration-alignment
Apr 21, 2026

@MacAttak

Summary

  • align pivot-driven remaining-work regeneration semantics across sw-plan, sw-build, sw-verify, sw-ship, and git-freshness
  • restore the exact sw-verify mutation-surface wording required by the standing regression proof suites
  • keep the shipped scope intact while making manual freshness-reconcile guidance non-looping across build, verify, and ship

Approval Lineage

  • design: STALE (artifact-set-changed) via /sw-plan on 2026-04-20T07:10:13Z; the active design artifact set no longer matches the last approved hash.
  • unit-spec (02-remaining-work-regeneration-alignment): APPROVED via /sw-build on 2026-04-21T04:02:09.775Z and matches the active spec.md / plan.md / context.md surface.

What Changed

  • core/skills/sw-plan/SKILL.md now treats structural pivots as regeneration of affected remaining-unit artifacts only, while preserving shipped units and recorded target/freshness metadata.
  • core/skills/sw-build/SKILL.md, core/skills/sw-verify/SKILL.md, core/skills/sw-ship/SKILL.md, and core/protocols/git-freshness.md now share one manual-reconcile contract with linked-worktree ownership guidance and no silent target rewrite.
  • evals/tests/test_pivot_regeneration_alignment.py adds regression coverage for planner regeneration, cross-stage freshness wording, and approval-refresh behavior.
  • core/skills/sw-verify/SKILL.md was repaired in Task 4 to restore the exact mutation-output phrases required by the standing verify-proof regressions.
  • Blast radius stays limited to lifecycle skill/protocol docs and regression suites; no runtime product code changed in this unit.

Why The Agent Implemented It This Way

  • The drift lived in the replanning and lifecycle contracts, so the fix updated the planner surface and the shared freshness protocol together instead of patching a single stage in isolation.
  • Approval refresh was proved against the real approvals helper in a temporary filesystem so the pivoted supersession path is exercised the same way the workflow uses it.
  • The verify repair stayed narrowly scoped because the failing proof surface was phrase-level contract drift, not a broader lifecycle regression.

Acceptance Criteria

  • AC-1 PASS: sw-plan limits regeneration to affected remaining units, preserves shipped baseline units, and keeps target/freshness metadata intact.
  • AC-2 PASS: sw-build treats pivot-updated artifacts as the current approval surface and requires refreshed unit-spec approval before task execution.
  • AC-3 PASS: sw-verify and sw-ship share the same manual-reconcile contract with build and do not direct users into an equally blocked loop.
  • AC-4 PASS: git-freshness and the consuming skills use the same reconcile vocabulary, linked-worktree guidance, and non-mutating target metadata semantics.
  • AC-5 PASS: regression coverage now fails if replanning, approval refresh, or manual reconcile guidance drifts again.

Spec Conformance

| AC | Status | Proof summary |
| --- | --- | --- |
| AC-1 | PASS | core/skills/sw-plan/SKILL.md:73-77,101-106,150-153 and evals/tests/test_pivot_regeneration_alignment.py:45-83 |
| AC-2 | PASS | core/skills/sw-build/SKILL.md:49-51 and evals/tests/test_pivot_regeneration_alignment.py:95-103,155-222 |
| AC-3 | PASS | core/skills/sw-verify/SKILL.md:78-91,193-195,222, core/skills/sw-ship/SKILL.md:59-64,144, and evals/tests/test_pivot_regeneration_alignment.py:114-131 |
| AC-4 | PASS | core/protocols/git-freshness.md:167-177 plus evals/tests/test_pivot_regeneration_alignment.py:133-149 |
| AC-5 | PASS | evals/tests/test_pivot_regeneration_alignment.py:45-222 and evals/tests/test_verify_mutation_surface.py:20-83 |

Gate Summary

| Gate | Verdict | Evidence |
| --- | --- | --- |
| build | PASS | evidence/build-report.md |
| tests | PASS | evidence/test-quality.md |
| security | PASS | evidence/security-report.md |
| wiring | PASS | evidence/wiring-report.md |
| semantic | PASS | evidence/semantic-report.md |
| spec | PASS | evidence/spec-compliance.md |

Remaining Attention

  • WARN: the work-level design approval lineage is stale (artifact-set-changed). The implementation and unit-spec proof surface are current, but reviewer approval of the latest design artifact set has not been refreshed.

Evidence Links

  • Reviewer-usable approval lineage, conformance, and gate summaries are inlined above because this repo uses clone-local work artifacts.
  • The underlying evidence files remain recorded in workflow state under units/02-remaining-work-regeneration-alignment/evidence/.

@github-actions

github-actions Bot commented Apr 21, 2026

Eval Smoke Results

✅ Smoke evals clean — 0 regressions, 3 improvements

| Eval | Pass Rate | Duration | Tokens (input+output) | Verdict |
| --- | --- | --- | --- | --- |
| grader-function-tests | 1.00 (+0.00) | 399ms (-1095ms) | 0 (+0) | improved |
| structural-handoff-template | 1.00 (+0.00) | 30ms (-13ms) | 0 (+0) | improved |
| structural-skill-validation | 1.00 (+0.00) | 3383ms (+825ms) | 0 (+0) | ok |
| structural-state-enforcement | 1.00 (+0.00) | 75ms (-16ms) | 0 (+0) | improved |
| workflow-yaml-validation | 1.00 (+0.00) | 1305ms (+5ms) | 0 (+0) | ok |

Posted by Specwright eval-smoke workflow. This comment is updated on each push.

@claude

claude Bot commented Apr 21, 2026

Review: fix(pivot): align remaining-work regeneration

The scope and intent are clear: align the manual-reconcile contract across sw-build, sw-verify, sw-ship, and git-freshness so operators get one consistent instruction and no stage can silently bypass a freshness block. The sw-plan regeneration semantics are also well-tightened. Overall this is a solid fix. A few issues worth addressing before merge:


Correctness

sw-verify/SKILL.md — duplicate gate list sentence (line 100)
After the PR, the gate-execution section contains:

All six gates are eligible when enabled in config: build, tests, security, wiring, semantic, spec.
Eligible gates: build, tests, security, wiring, semantic, spec.

The second sentence is a word-for-word restatement of the first. This is a diff-merge artifact; it should be deleted. An LLM executing this skill may treat the two sentences as separate constraints and behave unexpectedly.
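
A regression check for this kind of duplication could be sketched as follows. It is only an illustration: `count_gate_lists` and the sample strings are hypothetical and not part of the repository's test suite; the gate names and sentences are taken from the comment above.

```python
import re

GATES = ["build", "tests", "security", "wiring", "semantic", "spec"]
# The six gate names as one comma-separated run; \s* tolerates line wraps.
GATE_LIST = re.compile(r",\s*".join(GATES))


def count_gate_lists(text: str) -> int:
    """Count how many times the full six-gate list is spelled out."""
    return len(GATE_LIST.findall(text))


# Sample text mirroring the post-PR wording quoted in the review.
duplicated = (
    "All six gates are eligible when enabled in config: build, tests, security,\n"
    "wiring, semantic, spec.\n"
    "Eligible gates: build, tests, security, wiring, semantic, spec.\n"
)
# Same text with the redundant second sentence dropped.
deduplicated = duplicated.rsplit("Eligible gates:", 1)[0]

print(count_gate_lists(duplicated))    # 2
print(count_gate_lists(deduplicated))  # 1
```

A test asserting that the count equals 1 would fail on the duplicated wording and pass once the second sentence is deleted.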

sw-verify/SKILL.md: decision.md protocol reference lost its key descriptor (line 206)
The description was trimmed from "autonomous decision framework and gate handoff" to "gate handoff". Skills use these one-line descriptors to decide which protocols to load during autonomous execution. Dropping "autonomous decision framework" risks the protocol being skipped when an agent needs to apply the convergence-loop or assumption-lifecycle logic from protocols/decision.md.

sw-plan/SKILL.md — "open scope" is an undefined term (line ~99)
"regenerate integration-criteria.md for the current open scope defined by the affected remaining units" introduces "open scope" without a prior definition in the skill. The concept it describes is already expressed as "affected remaining units" four lines above. Using a new term here creates an unnecessary ambiguity risk for agents parsing this skill; "remaining open work" or just restating "affected remaining units" would be clearer.


Test quality

test_pivot_regeneration_alignment.py: _run_node_json helper is duplicated
The function at lines 20–36 is functionally identical to the one in test_pivot_rebaselining_foundation.py (lines 17–28), with only an added env parameter. This should be moved to evals/tests/_text_helpers.py and the prior file updated to use the shared version. Two copies will diverge.

Wide regex windows risk missing real contract drift
Several chained patterns use [\s\S]{0,220} or [\s\S]{0,260} multiple times in a row (e.g., test_build_treats_pivoted_unit_artifacts_as_current_approval_surface). With up to 440 characters between the first and last anchor, these tests would still pass if the matched concepts appear in entirely separate paragraphs. Tightening to {0,100} or splitting into two independent assertions would catch actual proximity drift.

TestApprovalRefreshProof — missing stale detection path
The proof verifies that calling recordApproval a second time supersedes the first entry and that the latest entry assesses as APPROVED. It does not verify the inverse: that the pre-pivot (v1) entry assesses as STALE against the v2 artifacts. That is the actual regression risk: a stale first entry must not look APPROVED after the pivot changes artifacts. A second assertion on entries[0] assessed against v2 artifacts would complete the proof.


Design approval lineage

The PR itself notes the work-level design approval is STALE (artifact-set-changed). The gate evidence and unit-spec approval are current, but a reviewer cannot confirm the implementation matches the last-reviewed design without re-approving the design artifact set. This should be resolved before merge or explicitly documented as a known deviation.

Comment thread core/skills/sw-verify/SKILL.md Outdated
All six gates are eligible: build, tests, security, wiring, semantic, spec.
All six gates are eligible when enabled in config: build, tests, security,
wiring, semantic, spec.
Eligible gates: build, tests, security, wiring, semantic, spec.

Redundant sentence — the gate list was already stated on the line above. This appears to be a diff-merge artifact.

Suggested change
Eligible gates: build, tests, security, wiring, semantic, spec.
Execute enabled gates in dependency order: gate-build → gate-tests →

Comment thread core/skills/sw-verify/SKILL.md Outdated
- `protocols/decision.md` -- autonomous decision framework and gate handoff
- `protocols/state.md` -- workflow state and locking
- `protocols/stage-boundary.md` -- stage boundary
- `protocols/decision.md` -- gate handoff

The prior description "autonomous decision framework and gate handoff" was load-bearing: agents use these one-line descriptors to decide which protocols are relevant for their current task. Trimming it to just "gate handoff" makes decision.md invisible during autonomous decision steps (convergence loop, assumption lifecycle, CCR), which the protocol covers in addition to gate handoff.

Suggested change
- `protocols/decision.md` -- gate handoff
- `protocols/decision.md` -- autonomous decision framework and gate handoff

FRESHNESS_PROTOCOL = ROOT_DIR / "core" / "protocols" / "git-freshness.md"


def _run_node_json(script: str, env: dict[str, str] | None = None) -> dict:

This is a near-copy of _run_node_json in test_pivot_rebaselining_foundation.py (lines 17–28), with only the env parameter added. Two copies will diverge. Consider moving this (with the env parameter) to _text_helpers.py and updating the prior file to import from there.
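
One possible shape for the shared helper is sketched below. The body of the real `_run_node_json` is not shown in this thread, so everything here is an assumption: the `run_json` / `run_node_json` names, the use of `node -e`, and the choice to merge `env` into the current environment rather than replace it.

```python
import json
import os
import subprocess
import sys
from typing import Optional


def run_json(cmd: list[str], env: Optional[dict[str, str]] = None) -> dict:
    """Run cmd with env overrides merged into os.environ; parse stdout as JSON."""
    merged_env = {**os.environ, **(env or {})}
    completed = subprocess.run(
        cmd, capture_output=True, text=True, check=True, env=merged_env
    )
    return json.loads(completed.stdout)


def run_node_json(script: str, env: Optional[dict[str, str]] = None) -> dict:
    """Hypothetical shared helper: evaluate script with `node -e` (assumed invocation)."""
    return run_json(["node", "-e", script], env=env)


# Demonstration with the Python interpreter standing in for node,
# so the sketch runs even where node is not installed.
child_script = "import json, os; print(json.dumps({'mode': os.environ['DEMO_MODE']}))"
demo = run_json([sys.executable, "-c", child_script], env={"DEMO_MODE": "shared"})
print(demo)
```

Keeping `env` optional lets the earlier test file import the same function without changing its call sites.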

Comment on lines +97 to +102
self.build_text,
re.compile(
r"(pivot|replan|regenerated)[\s\S]{0,220}(approval surface|unit-spec)[\s\S]{0,220}(refresh|record)|"
r"(refresh|record)[\s\S]{0,220}(unit-spec|approval surface)[\s\S]{0,220}(pivot|replan|regenerated)",
re.IGNORECASE,
),

The chained [\s\S]{0,220} windows let up to 440 characters separate the first and last anchors — wide enough to match concepts appearing in separate paragraphs. This test would pass even if "pivot", "unit-spec", and "refresh" were mentioned in completely unrelated sentences.

Consider splitting into two independent assertions (pivot→approval-surface, then approval-surface→refresh) each with a tighter {0,100} window, or verifying paragraph-level proximity instead.
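
The difference between the chained pattern and the split form can be illustrated with a small sketch. The `within` helper and both sample texts are hypothetical; only the forward-order half of the original alternation is reproduced, and the 100-character window is the value suggested above.

```python
import re

# Forward-order half of the pattern under review (two chained 220-char windows).
WIDE = re.compile(
    r"(pivot|replan|regenerated)[\s\S]{0,220}(approval surface|unit-spec)"
    r"[\s\S]{0,220}(refresh|record)",
    re.IGNORECASE,
)


def within(first: str, second: str, window: int = 100):
    """Build a pattern requiring `second` within `window` characters after `first`."""
    return re.compile(rf"(?:{first})[\s\S]{{0,{window}}}(?:{second})", re.IGNORECASE)


PIVOT_TO_SURFACE = within("pivot|replan|regenerated", "approval surface|unit-spec")
SURFACE_TO_REFRESH = within("approval surface|unit-spec", "refresh|record")


def tight_match(text: str) -> bool:
    """Two independent proximity assertions instead of one chained pattern."""
    return bool(PIVOT_TO_SURFACE.search(text)) and bool(SURFACE_TO_REFRESH.search(text))


# The three concepts appear in one sentence, as the contract intends.
aligned = (
    "After a pivot, the regenerated unit-spec is the current approval surface; "
    "refresh the unit-spec approval before task execution."
)
# The same keywords scattered across unrelated paragraphs.
drifted = (
    "After a pivot, shipped units stay untouched.\n\n"
    + "Unrelated guidance about worktree ownership. " * 3
    + "\n\nThe unit-spec lives beside plan.md.\n\n"
    + "More unrelated guidance about target metadata. " * 3
    + "\n\nRecord the build report when gates finish.\n"
)

print(bool(WIDE.search(aligned)), tight_match(aligned))  # True True
print(bool(WIDE.search(drifted)), tight_match(drifted))  # True False
```

On the drifted text the wide pattern still matches, while the split assertions reject it because the keywords sit more than 100 characters apart.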

Comment on lines +219 to +222
self.assertEqual(result["entryCount"], 2)
self.assertEqual(result["firstStatus"], "SUPERSEDED")
self.assertEqual(result["latestStatus"], "APPROVED")
self.assertEqual(result["assessmentStatus"], "APPROVED")

The proof shows that the second approval (v2) supersedes the first and assesses as APPROVED — correct. But it doesn't test the actual pivot regression: the first entry (v1 approval) assessed against v2 artifacts should return STALE. That's the state an operator would see if they ran sw-build without refreshing the approval after a pivot, and it's the case this unit is explicitly guarding against.

Consider adding:

stale_assessment = assessApprovalEntry(entries[0], {
    baseDir: unitRoot,
    artifacts: ['context.md', 'plan.md', 'spec.md']
})

and asserting stale_assessment.status === 'STALE' (or SUPERSEDED).
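
The expected behaviour can be modelled with a toy sketch in Python. This is not the real approvals helper: `artifact_set_hash`, `record_approval`, and `assess_entry` are hypothetical stand-ins that assume staleness is decided by comparing a hash of the artifact set, which is how the PR description characterises the design approval (artifact-set-changed).

```python
import hashlib
from dataclasses import dataclass


def artifact_set_hash(artifacts: dict[str, str]) -> str:
    """Hash artifact names and contents in a stable (sorted) order."""
    digest = hashlib.sha256()
    for name in sorted(artifacts):
        digest.update(name.encode() + b"\0" + artifacts[name].encode() + b"\0")
    return digest.hexdigest()


@dataclass
class ApprovalEntry:
    artifact_hash: str
    status: str = "APPROVED"


def record_approval(entries: list, artifacts: dict[str, str]) -> None:
    """Mark earlier entries SUPERSEDED, then append a fresh approval."""
    for entry in entries:
        entry.status = "SUPERSEDED"
    entries.append(ApprovalEntry(artifact_set_hash(artifacts)))


def assess_entry(entry: ApprovalEntry, artifacts: dict[str, str]) -> str:
    """An entry is current only if its hash matches the artifacts on disk."""
    if entry.artifact_hash == artifact_set_hash(artifacts):
        return "APPROVED"
    return "STALE"


v1 = {"context.md": "ctx", "plan.md": "plan v1", "spec.md": "spec v1"}
v2 = {**v1, "plan.md": "plan v2 after pivot"}

entries: list = []
record_approval(entries, v1)  # pre-pivot approval
record_approval(entries, v2)  # refreshed approval after the pivot

print([e.status for e in entries])    # ['SUPERSEDED', 'APPROVED']
print(assess_entry(entries[-1], v2))  # APPROVED
print(assess_entry(entries[0], v2))   # STALE
```

The last line is the inverse case the comment asks for: the v1 entry must not assess as APPROVED against the v2 artifacts.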

@MacAttak MacAttak merged commit 267c953 into main Apr 21, 2026
13 checks passed
@MacAttak MacAttak deleted the work/02-remaining-work-regeneration-alignment branch April 21, 2026 04:49
@github-actions github-actions Bot mentioned this pull request Apr 22, 2026