release: v4.7.0 — auto-rebake PRs + drift issues lead with structured verdict#320

Merged
askalf merged 1 commit into master from feat/v4.7.0-drift-summary-header
May 18, 2026

Conversation

@askalf
Owner

@askalf askalf commented May 18, 2026

What does this PR do?

Closes an ergonomic gap that PR #317 (tonight's first real-world auto-rebake) exposed: the PR body opened with raw `[bake]` log output, then a unified-line diff. A reviewer had to read ~60 lines of detail to decide ship-or-investigate. The common case (text-only drift, ship it) looked identical at a glance to the rare case (tools removed, body_field_order changed, investigate).

v4.7.0 leads with a one-line verdict + per-axis bullets.

Two new exports in `scripts/drift-report.mjs`

`interpretDrift(diff)` — classifies the slot-level diff into a structured summary:

```typescript
{
  toolsAdded: string[]
  toolsRemoved: string[]
  betasAdded: string[]
  betasRemoved: string[]
  systemPromptDelta: number // chars added/removed, signed
  agentIdentityChanged: boolean
  bodyFieldOrderChanged: boolean
  headerOrderChanged: boolean
  verdict: 'benign' | 'moderate' | 'substantive'
}
```
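
As a rough illustration of how slot-level diff entries might be folded into part of this structure, here is a minimal sketch. It is not the code in `scripts/drift-report.mjs`: the helper names (`parseToolList`, `parseSystemPromptDelta`, `collectSlots`) are hypothetical, and the summary-string formats are assumed from the example under "How to test" (the `tools removed:` prefix is a guess by analogy).

```javascript
// Hypothetical sketch: helper names and the 'tools removed:' prefix are
// assumptions; only the two summary formats in the sample diff come from the PR.
function parseToolList(summary, prefix) {
  if (!summary.startsWith(prefix)) return [];
  // Comma-split, trim whitespace, and drop empty fragments.
  return summary
    .slice(prefix.length)
    .split(',')
    .map((name) => name.trim())
    .filter((name) => name.length > 0);
}

function parseSystemPromptDelta(summary) {
  // Picks up the signed number in e.g. "... delta +150)" or "... delta -2107)".
  const match = /delta ([+-]?\d+)/.exec(summary);
  return match ? Number(match[1]) : 0;
}

function collectSlots(diff) {
  const slots = { toolsAdded: [], toolsRemoved: [], systemPromptDelta: 0 };
  for (const { summary } of diff) {
    slots.toolsAdded.push(...parseToolList(summary, 'tools added:'));
    slots.toolsRemoved.push(...parseToolList(summary, 'tools removed:'));
    if (summary.startsWith('system_prompt')) {
      slots.systemPromptDelta += parseSystemPromptDelta(summary);
    }
  }
  return slots;
}

const slots = collectSlots([
  { summary: 'tools added: NewTool, OtherTool' },
  { summary: 'system_prompt content changed (12000 → 12150 chars, delta +150)' },
]);
console.log(slots.toolsAdded.length); // 2
console.log(slots.systemPromptDelta); // 150
```

The remaining fields (betas, agent identity, field and header order) would presumably be collected the same way, one prefix per slot.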

Verdict ladder (conservative — substantive dominates moderate dominates benign):

| Verdict | Triggers | Action |
| --- | --- | --- |
| `benign` | only text-content drift (system_prompt / agent_identity / tool descriptions) | ship (the 90%+ case) |
| `moderate` | tools added, betas added/removed, agent_identity changed | probably ship, closer read |
| `substantive` | tools removed, body_field_order or header_order changed | investigate; can break canonical-rebuild |
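
The ladder above can be sketched as a short function that checks the most severe tier first. This is an illustration under stated assumptions, not the shipped implementation: `classifyVerdict` is a hypothetical name, and the sketch treats any `agentIdentityChanged` flag as moderate, following the `moderate` row.

```javascript
// Sketch only: field names follow the structure listed in this PR;
// classifyVerdict itself is a hypothetical name.
function classifyVerdict(s) {
  // Substantive triggers are checked first so they dominate everything else.
  const substantive =
    s.toolsRemoved.length > 0 || s.bodyFieldOrderChanged || s.headerOrderChanged;
  if (substantive) return 'substantive';

  // Moderate triggers are only consulted when nothing substantive fired.
  const moderate =
    s.toolsAdded.length > 0 ||
    s.betasAdded.length > 0 ||
    s.betasRemoved.length > 0 ||
    s.agentIdentityChanged;
  if (moderate) return 'moderate';

  // Whatever remains is text-content drift or no drift at all.
  return 'benign';
}

const emptySummary = {
  toolsAdded: [], toolsRemoved: [], betasAdded: [], betasRemoved: [],
  systemPromptDelta: 0, agentIdentityChanged: false,
  bodyFieldOrderChanged: false, headerOrderChanged: false,
};

console.log(classifyVerdict(emptySummary)); // benign
console.log(classifyVerdict({ ...emptySummary, toolsAdded: ['NewTool'] })); // moderate
console.log(classifyVerdict({
  ...emptySummary, toolsAdded: ['NewTool'], toolsRemoved: ['OldTool'],
})); // substantive
```

Checking tiers from most to least severe makes the dominance rule fall out of control flow, so a diff that both adds and removes tools can never be reported as merely moderate.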

`formatDriftSummary(interpretation)` — renders the structured summary as markdown for embedding in PR + issue bodies. Lead line: `Verdict: ✅ Benign` / `🟡 Moderate` / `🔴 Substantive`, then per-axis bullets with brief context.
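
A minimal sketch of that rendering step follows, assuming the formatter returns an array of markdown lines (the "How to test" snippet joins its result with `\n`). The name `renderSummary`, the label map, and the exact bullet wording are hypothetical; the real bullets carry more context.

```javascript
// Hypothetical sketch of a verdict-led markdown summary; not the shipped formatter.
const VERDICT_LABELS = {
  benign: '✅ Benign',
  moderate: '🟡 Moderate',
  substantive: '🔴 Substantive',
};

function renderSummary(interp) {
  // Lead line first, then a blank line, then one bullet per axis that changed.
  const lines = [`**Verdict:** ${VERDICT_LABELS[interp.verdict]}`, ''];
  if (interp.toolsAdded.length > 0) {
    lines.push(`- tools added: ${interp.toolsAdded.join(', ')}`);
  }
  if (interp.toolsRemoved.length > 0) {
    lines.push(`- tools removed: ${interp.toolsRemoved.join(', ')}`);
  }
  if (interp.systemPromptDelta !== 0) {
    // Negative numbers already carry their sign; add '+' for positive ones.
    const sign = interp.systemPromptDelta > 0 ? '+' : '';
    lines.push(`- system_prompt: ${sign}${interp.systemPromptDelta} chars net`);
  }
  return lines;
}

const rendered = renderSummary({
  verdict: 'benign',
  toolsAdded: [],
  toolsRemoved: [],
  systemPromptDelta: -2107,
});
console.log(rendered.join('\n'));
```

Returning lines rather than a single string leaves the caller free to join them for a PR body or write them to a file unchanged.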

Wiring

  • `scripts/capture-and-bake.mjs --check`: prints the verdict-led summary before the unified-line detail. Also writes `drift-summary.md` to disk so the workflow can drop it verbatim into PR/issue bodies.
  • `.github/workflows/cc-drift-template-watch.yml`: both the auto-rebake PR body and the drift tracking issue body lead with a "### Summary" section before the existing "### Drift report" code block.

What a reviewer sees on the next drift PR (before reading any detail)

Verdict: ✅ Benign

  • system_prompt: -2107 chars net (text-content drift — see unified diff below)

Click merge. Done. The unified diff stays inline for the rare case where slot-level signal isn't enough.

Tests

`test/bake-drift-report.mjs` gains 12 headers (20-31) / 27 assertions covering:

  • Empty diff returns benign verdict, zero counts
  • Per-slot verdict promotions (benign → moderate → substantive)
  • Substantive-dominates-moderate ordering
  • Multi-tool comma-split parsing
  • `formatDriftSummary` emoji + label + bullet rendering across the three verdicts

69/69 file tests pass; 75/75 full suite green.

How to test

```bash
git fetch origin feat/v4.7.0-drift-summary-header
git checkout feat/v4.7.0-drift-summary-header
npm run build && npm test # 75/75 (no src/ changes; new tests in test/bake-drift-report.mjs)
```

Optional: simulate the summary output:

```bash
node -e "
import('./scripts/drift-report.mjs').then(({ interpretDrift, formatDriftSummary }) => {
  const interp = interpretDrift([
    { summary: 'tools added: NewTool' },
    { summary: 'system_prompt content changed (12000 → 12150 chars, delta +150)' },
  ]);
  console.log(formatDriftSummary(interp).join('\n'));
});
"
```

Checklist

  • `npm run build` passes
  • `npm test` passes (offline regression test, no credentials required) — 75/75
  • For changes that touch `proxy.ts`, `cc-template.ts`, or streaming behavior: tested with `dario proxy --verbose` + `node test/compat.mjs` (requires credentials) — N/A: scripts + workflow + tests only
  • No new runtime dependencies added
  • No tokens/secrets in code or logs

PR #317 (tonight's first real-world auto-rebake) showed the
chain works end-to-end but surfaced an ergonomic gap: the PR
body opened with raw [bake] log output, then a unified-line
diff. A reviewer had to read ~60 lines to decide ship-or-
investigate. The common case (text-only drift, ship it) looked
identical at a glance to the rare case (tools removed,
investigate).

v4.7.0 leads with a one-line verdict + per-axis bullets.

scripts/drift-report.mjs gains two exports:
- interpretDrift(diff) — classifies the slot-level diff into a
  structured summary { toolsAdded, toolsRemoved, betasAdded,
  betasRemoved, systemPromptDelta, agentIdentityChanged,
  bodyFieldOrderChanged, headerOrderChanged } + a single
  verdict: 'benign' | 'moderate' | 'substantive'. Verdict
  ladder is conservative; substantive dominates moderate
  dominates benign.
- formatDriftSummary(interpretation) — renders the structured
  summary as markdown for direct embedding in PR + issue
  bodies. Leads with **Verdict:** ✅/🟡/🔴 + label, then
  per-axis bullets with brief context.

Verdict tiers:
- benign — only text content changed (the 90%+ case)
- moderate — tools added, betas changed, agent_identity changed
- substantive — tools REMOVED, body_field_order or header_order
  changed (can break canonical-rebuild paths)

Wiring:
- capture-and-bake.mjs --check: prints verdict-led summary
  before the unified-line detail; writes drift-summary.md to
  disk so the workflow can drop it verbatim into PR/issue
  bodies without grep-parsing the [bake] log.
- cc-drift-template-watch.yml: both the auto-rebake PR body and
  the drift tracking issue body lead with a "### Summary"
  section before the existing "### Drift report" code block.
  Guarded by `[ -f drift-summary.md ]` for backward compat.

Tests: test/bake-drift-report.mjs gains 12 headers / 27
assertions covering empty-diff verdict, per-slot verdict
promotions, multi-axis aggregation, comma-split parsing,
formatDriftSummary emoji + label + bullet rendering. 69/69
file tests pass; 75/75 full suite green.

No src/ changes.
@askalf askalf enabled auto-merge (squash) May 18, 2026 00:46
@github-actions
Contributor

Compat test: ❌ FAILED

Ran node test/compat.mjs against dario proxy --passthrough on the self-hosted runner for commit f6c5c167bdf332dcde360ea14880e5ae326a7fba.

Output
============================================================
  dario Compatibility Validation (--passthrough)
  2026-05-18T00:47:01.991Z
============================================================

--- Anthropic Messages API (Hermes) ---
❌ #1 Anthropic non-stream: HTTP 429: {"type":"error","error":{"type":"rate_limit_error","message":"Rate limited (rejected). Limiting wind
❌ #2 Anthropic stream: HTTP 429: {"type":"error","error":{"type":"rate_limit_error","message":"Rate limited (rejected). Limiting wind
❌ #3 SSE framing: HTTP 429

--- Passthrough Verification ---
❌ #4 No thinking injection: HTTP 429
❌ #5 Client betas preserved: HTTP 429: {"type":"error","error":{"type":"rate_limit_error","message":"Rate limited (rejected). Limiting wind

--- Tool Use (OpenClaw) ---
❌ #6 Tool use: stop_reason=undefined tool=false
❌ #7 Tool use stream: HTTP 429

--- OpenAI Compat ---
❌ #8 OpenAI non-stream: HTTP 429: {"type":"error","error":{"type":"rate_limit_error","message":"Rate limited (rejected). Limiting wind
❌ #9 OpenAI stream: HTTP 429

--- Header Visibility ---
✅ #10 Header visibility: request-id=true | ratelimit=false (0 headers)

============================================================
  RESULTS: 1 passed, 9 failed, 0 warnings
============================================================

Failed:
  #1 Anthropic non-stream: HTTP 429: {"type":"error","error":{"type":"rate_limit_error","message":"Rate limited (rejected). Limiting wind
  #2 Anthropic stream: HTTP 429: {"type":"error","error":{"type":"rate_limit_error","message":"Rate limited (rejected). Limiting wind
  #3 SSE framing: HTTP 429
  #4 No thinking injection: HTTP 429
  #5 Client betas preserved: HTTP 429: {"type":"error","error":{"type":"rate_limit_error","message":"Rate limited (rejected). Limiting wind
  #6 Tool use: stop_reason=undefined tool=false
  #7 Tool use stream: HTTP 429
  #8 OpenAI non-stream: HTTP 429: {"type":"error","error":{"type":"rate_limit_error","message":"Rate limited (rejected). Limiting wind
  #9 OpenAI stream: HTTP 429

Full workflow run

@askalf askalf merged commit b72789d into master May 18, 2026
9 of 10 checks passed
@askalf askalf deleted the feat/v4.7.0-drift-summary-header branch May 18, 2026 00:48