release: v4.7.0 — auto-rebake PRs + drift issues lead with structured verdict#320
Merged
PR #317 (tonight's first real-world auto-rebake) showed the chain works end-to-end but surfaced an ergonomic gap: the PR body opened with raw [bake] log output, then a unified-line diff. A reviewer had to read ~60 lines to decide ship-or-investigate. The common case (text-only drift, ship it) looked identical at a glance to the rare case (tools removed, investigate).

v4.7.0 leads with a one-line verdict + per-axis bullets.

scripts/drift-report.mjs gains two exports:
- interpretDrift(diff): classifies the slot-level diff into a structured summary { toolsAdded, toolsRemoved, betasAdded, betasRemoved, systemPromptDelta, agentIdentityChanged, bodyFieldOrderChanged, headerOrderChanged } + a single verdict: 'benign' | 'moderate' | 'substantive'. The verdict ladder is conservative: substantive dominates moderate, which dominates benign.
- formatDriftSummary(interpretation): renders the structured summary as markdown for direct embedding in PR + issue bodies. Leads with **Verdict:** ✅/🟡/🔴 + label, then per-axis bullets with brief context.

Verdict tiers:
- benign: only text content changed (the 90%+ case)
- moderate: tools added, betas changed, agent_identity changed
- substantive: tools REMOVED, body_field_order or header_order changed (can break canonical-rebuild paths)

Wiring:
- capture-and-bake.mjs --check: prints the verdict-led summary before the unified-line detail; writes drift-summary.md to disk so the workflow can drop it verbatim into PR/issue bodies without grep-parsing the [bake] log.
- cc-drift-template-watch.yml: both the auto-rebake PR body and the drift tracking issue body lead with a "### Summary" section before the existing "### Drift report" code block. Guarded by `[ -f drift-summary.md ]` for backward compat.

Tests: test/bake-drift-report.mjs gains 12 headers / 27 assertions covering empty-diff verdict, per-slot verdict promotions, multi-axis aggregation, comma-split parsing, and formatDriftSummary emoji + label + bullet rendering. 69/69 file tests pass; 75/75 full suite green. No src/ changes.
Compat test: ❌ FAILED
What does this PR do?
Closes an ergonomic gap that PR #317 (tonight's first real-world auto-rebake) exposed: the PR body opened with raw `[bake]` log output, then a unified-line diff. A reviewer had to read ~60 lines of detail to decide ship-or-investigate. The common case (text-only drift, ship it) looked identical at a glance to the rare case (tools removed, body_field_order changed, investigate).
v4.7.0 leads with a one-line verdict + per-axis bullets.
Two new exports in `scripts/drift-report.mjs`
`interpretDrift(diff)` — classifies the slot-level diff into a structured summary:
```typescript
{
toolsAdded: string[]
toolsRemoved: string[]
betasAdded: string[]
betasRemoved: string[]
systemPromptDelta: number // chars added/removed, signed
agentIdentityChanged: boolean
bodyFieldOrderChanged: boolean
headerOrderChanged: boolean
verdict: 'benign' | 'moderate' | 'substantive'
}
```
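Verdict ladder (conservative: substantive dominates moderate, which dominates benign):
- benign: only text content changed (the 90%+ case)
- moderate: tools added, betas changed, agent_identity changed
- substantive: tools REMOVED, body_field_order or header_order changed (can break canonical-rebuild paths)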
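As a rough illustration of how a "dominates" ladder like this can be computed, here is a minimal sketch. It is not the code in `scripts/drift-report.mjs`: the names `RANK`, `pickVerdict`, and `emptySummary` are hypothetical, and the rule that an empty diff yields `benign` is an assumption.

```javascript
// Hypothetical sketch, not the PR's implementation.
// Ranks encode the ladder: substantive > moderate > benign.
const RANK = { benign: 0, moderate: 1, substantive: 2 };

// Assumed baseline: a summary with no drift on any axis.
const emptySummary = {
  toolsAdded: [], toolsRemoved: [],
  betasAdded: [], betasRemoved: [],
  systemPromptDelta: 0,
  agentIdentityChanged: false,
  bodyFieldOrderChanged: false,
  headerOrderChanged: false,
};

function pickVerdict(summary) {
  let verdict = 'benign';
  // A verdict can only be promoted, never demoted.
  const promote = (next) => {
    if (RANK[next] > RANK[verdict]) verdict = next;
  };

  // Moderate tier: tools added, betas changed, agent identity changed.
  if (summary.toolsAdded.length > 0) promote('moderate');
  if (summary.betasAdded.length > 0 || summary.betasRemoved.length > 0) promote('moderate');
  if (summary.agentIdentityChanged) promote('moderate');

  // Substantive tier: tools removed, or body/header ordering changed.
  if (summary.toolsRemoved.length > 0) promote('substantive');
  if (summary.bodyFieldOrderChanged || summary.headerOrderChanged) promote('substantive');

  // Text-only drift (systemPromptDelta) never promotes, so it stays benign.
  return verdict;
}
```

Because promotion is monotonic, the order in which axes are checked does not affect the result, which is what makes multi-axis aggregation straightforward.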
`formatDriftSummary(interpretation)` — renders the structured summary as markdown for embedding in PR + issue bodies. Lead line: `Verdict: ✅ Benign` / `🟡 Moderate` / `🔴 Substantive`, then per-axis bullets with brief context.
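A sketch of what such a renderer could look like follows. The function name `renderSummary`, the `BADGE` table, and the exact bullet wording are assumptions for illustration; only the verdict lead line and the array-of-lines return shape (implied by the `.join('\n')` call in the test snippet) come from this PR.

```javascript
// Hypothetical sketch, not the PR's formatDriftSummary.
const BADGE = {
  benign: '✅ Benign',
  moderate: '🟡 Moderate',
  substantive: '🔴 Substantive',
};

// Returns an array of markdown lines; callers join them with '\n'.
function renderSummary(interp) {
  const lines = [`**Verdict:** ${BADGE[interp.verdict]}`, ''];

  // One bullet per axis that actually changed (bullet wording is assumed).
  if (interp.toolsAdded.length) lines.push(`- Tools added: ${interp.toolsAdded.join(', ')}`);
  if (interp.toolsRemoved.length) lines.push(`- Tools removed: ${interp.toolsRemoved.join(', ')}`);
  if (interp.betasAdded.length) lines.push(`- Betas added: ${interp.betasAdded.join(', ')}`);
  if (interp.betasRemoved.length) lines.push(`- Betas removed: ${interp.betasRemoved.join(', ')}`);
  if (interp.systemPromptDelta !== 0) {
    // Negative numbers already carry their sign; add '+' for growth.
    const sign = interp.systemPromptDelta > 0 ? '+' : '';
    lines.push(`- System prompt: ${sign}${interp.systemPromptDelta} chars`);
  }
  if (interp.agentIdentityChanged) lines.push('- Agent identity changed');
  if (interp.bodyFieldOrderChanged) lines.push('- Body field order changed');
  if (interp.headerOrderChanged) lines.push('- Header order changed');

  return lines;
}
```

Skipping unchanged axes keeps the benign case down to a verdict line and one or two bullets, which matches the goal of letting a reviewer decide at a glance.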
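Wiring
- `capture-and-bake.mjs --check`: prints the verdict-led summary before the unified-line detail, and writes `drift-summary.md` to disk so the workflow can drop it verbatim into PR/issue bodies without grep-parsing the `[bake]` log.
- `cc-drift-template-watch.yml`: both the auto-rebake PR body and the drift tracking issue body lead with a "### Summary" section before the existing "### Drift report" code block. Guarded by `[ -f drift-summary.md ]` for backward compatibility.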
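The actual workflow assembles the body in shell, guarded by `[ -f drift-summary.md ]`. The same fallback logic can be sketched in JavaScript for illustration; `composeBody` is a hypothetical helper, and passing `null` stands in for the summary file being absent.

```javascript
// Hypothetical sketch of the body assembly; the real logic lives in the workflow YAML.
const FENCE = '`'.repeat(3); // markdown code fence, built indirectly to keep this sample readable

// summaryMarkdown: contents of drift-summary.md, or null when the file does not exist
// (for example, when an older bake script produced no summary).
function composeBody(driftReport, summaryMarkdown) {
  const sections = [];

  // Equivalent of the `[ -f drift-summary.md ]` guard: only lead with a
  // summary section when there is one to show.
  if (typeof summaryMarkdown === 'string' && summaryMarkdown.trim() !== '') {
    sections.push('### Summary', '', summaryMarkdown.trim(), '');
  }

  // The pre-existing drift report block is always present.
  sections.push('### Drift report', '', FENCE, driftReport.trimEnd(), FENCE);
  return sections.join('\n');
}
```

With a summary present, the body opens with "### Summary"; without one, it degrades to the pre-v4.7.0 layout that starts at "### Drift report".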
What a reviewer sees on the next drift PR (before reading any detail)
Click merge. Done. The unified diff stays inline for the rare case where slot-level signal isn't enough.
Tests
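`test/bake-drift-report.mjs` gains 12 headers (20-31) / 27 assertions covering:
- empty-diff verdict
- per-slot verdict promotions
- multi-axis aggregation
- comma-split parsing
- `formatDriftSummary` emoji + label + bullet rendering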
69/69 file tests pass; 75/75 full suite green.
How to test
```bash
git fetch origin feat/v4.7.0-drift-summary-header
git checkout feat/v4.7.0-drift-summary-header
npm run build && npm test # 75/75 (no src/ changes; new tests in test/bake-drift-report.mjs)
# Optional: simulate the summary output:
node -e "
import('./scripts/drift-report.mjs').then(({ interpretDrift, formatDriftSummary }) => {
const interp = interpretDrift([
{ summary: 'tools added: NewTool' },
{ summary: 'system_prompt content changed (12000 → 12150 chars, delta +150)' },
]);
console.log(formatDriftSummary(interp).join('\n'));
});
"
```
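The diff entries in the snippet above carry free-text `summary` strings, so some parsing has to happen before classification. The following is only a guess at how that might be done: `parseList` and `parsePromptDelta` are hypothetical names, and the string formats are inferred solely from the two sample entries shown.

```javascript
// Hypothetical parsers; the formats are inferred from the sample entries only.

// 'tools added: A, B' -> ['A', 'B'] (comma-split, trimmed, empties dropped).
function parseList(summary, prefix) {
  if (!summary.startsWith(prefix)) return [];
  return summary
    .slice(prefix.length)
    .split(',')
    .map((name) => name.trim())
    .filter(Boolean);
}

// '... (12000 → 12150 chars, delta +150)' -> 150; returns 0 when no delta is present.
function parsePromptDelta(summary) {
  const match = /delta ([+-]\d+)/.exec(summary);
  return match ? Number(match[1]) : 0;
}
```

For example, `parseList('tools added: NewTool', 'tools added:')` yields `['NewTool']`, and `parsePromptDelta` applied to the second sample entry yields `150`.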
Checklist