
v0.12.0 — evidence-tracker hook


@alexherrero alexherrero released this 23 May 19:24
· 449 commits to main since this release

Minor release: the 4th base hook in the toolkit, after kill-switch, steer, and commit-on-stop (ADR 0003). Paired with agentic-harness v2.6.0, which ships the corresponding /work §5b spec amendment.

What's new

Ships evidence-tracker — a PreToolUse hook that enforces a default-FAIL evidence contract on harness /work task closeouts.

Before a Write/Edit that flips a PLAN.md task from [ ] to [x] is allowed, the agent must demonstrably read (via the Read tool, which the hook observes) a spec/test/evidence file matching the task's requirement. Otherwise the hook blocks the call (exit 2) and prints a helpful stderr message with 3 recovery paths.
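The gate described above can be sketched roughly as follows. This is a minimal illustration, not the shipped evidence_tracker.py: the function names (`flipped_tasks`, `decide`), the checkbox regexes, and the idea of passing the old and new text in as plain strings are all assumptions made for the example. Only the behaviour comes from the notes above: Write/Edit calls that flip a PLAN.md task are blocked with exit code 2 unless evidence was read first.

```python
import re

# Assumed task syntax: markdown list items with "[ ]" / "[x]" checkboxes.
UNCHECKED = re.compile(r"^\s*[-*] \[ \] (.+)$", re.MULTILINE)
CHECKED = re.compile(r"^\s*[-*] \[[xX]\] (.+)$", re.MULTILINE)


def flipped_tasks(old_text, new_text):
    """Return titles that are unchecked in old_text and checked in new_text."""
    before = {m.group(1).strip() for m in UNCHECKED.finditer(old_text)}
    after = {m.group(1).strip() for m in CHECKED.finditer(new_text)}
    return sorted(before & after)


def decide(tool_name, file_path, old_text, new_text, evidence_read):
    """Return (exit_code, stderr_message): 0 allows the call, 2 blocks it.

    `evidence_read` is a hypothetical flag meaning "a matching evidence file
    was observed being read earlier in the session".
    """
    if tool_name not in ("Write", "Edit"):
        return 0, ""
    if not file_path.endswith("PLAN.md"):
        return 0, ""
    flips = flipped_tasks(old_text, new_text)
    if not flips:
        return 0, ""
    if evidence_read:
        return 0, ""
    # Default-FAIL: a flip with no observed evidence read is blocked.
    message = "evidence-tracker: blocked closeout of: " + ", ".join(flips)
    return 2, message
```

A real hook would take these inputs from the tool-call payload and pass the exit code to `sys.exit`; keeping the decision in a pure function makes the default-FAIL rule easy to unit test.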

Why this matters

Today's /work trusts the agent to verify before flipping a task to [x]. The verification step is in the contract but is not observable, and the agent sometimes claims completion based on partial signals. Per the cwc-long-running-agents pattern: "The only evidence that counts is a file matching the patterns." This hook makes that verification observable and enforced.

Evidence resolution — HYBRID

  • HEURISTIC by default: files under tests/ or spec/, matching *.spec.* / *.test.* / *_test.py / test_*.py with a code extension (markdown explicitly excluded), OR any path literally named in the task's **Verification:** text.
  • Per-task override via **Evidence:** <glob-or-paths> task-body annotation.
  • Explicit opt-out via **Evidence:** none — <rationale> (rationale mandatory; becomes audit trail).
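The three resolution modes above could be sketched as below. This is an illustrative reading of the rules, not the shipped implementation: the set of code extensions, the helper names, the comma/space splitting of override values, and the interpretation that the code-extension requirement applies to both the directory rule and the filename rule are all assumptions.

```python
import re
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

CODE_EXTENSIONS = {".py", ".js", ".ts", ".go", ".rs", ".sh"}  # assumed list
NAME_PATTERNS = ("*.spec.*", "*.test.*", "*_test.py", "test_*.py")
EVIDENCE_DIRS = {"tests", "spec"}

EVIDENCE_LINE = re.compile(r"^\*\*Evidence:\*\*[ \t]*(.*)$", re.MULTILINE)
VERIFICATION_LINE = re.compile(r"^\*\*Verification:\*\*[ \t]*(.*)$", re.MULTILINE)
OPT_OUT = re.compile(r"^none\s*(?:—|--|-)\s*(\S.*)$", re.IGNORECASE)


def heuristic_match(path):
    """Default heuristic: code file under tests/ or spec/, or test-like name."""
    p = PurePosixPath(path)
    if p.suffix.lower() not in CODE_EXTENSIONS:
        return False  # markdown and other non-code files never match
    in_dir = any(part in EVIDENCE_DIRS for part in p.parts[:-1])
    by_name = any(fnmatchcase(p.name, pat) for pat in NAME_PATTERNS)
    return in_dir or by_name


def evidence_mode(task_body):
    """Classify a task body as heuristic, override, or opt-out."""
    m = EVIDENCE_LINE.search(task_body)
    if not m:
        return "heuristic", []
    value = m.group(1).strip()
    if re.match(r"none\b", value, re.IGNORECASE):
        out = OPT_OUT.match(value)
        if not out:
            raise ValueError("opt-out requires a rationale")
        return "opt-out", [out.group(1).strip()]
    return "override", value.replace(",", " ").split()


def evidence_satisfied(task_body, read_paths):
    """True if the observed reads satisfy the task's evidence rule."""
    mode, values = evidence_mode(task_body)
    if mode == "opt-out":
        return True  # the rationale itself is the audit trail
    if mode == "override":
        return any(fnmatchcase(p, g) for p in read_paths for g in values)
    found = VERIFICATION_LINE.search(task_body)
    named = found.group(1) if found else ""
    return any(heuristic_match(p) or (p and p in named) for p in read_paths)
```

In this sketch an opt-out without a rationale raises an error instead of silently passing, mirroring the "rationale mandatory" rule; how the real hook reports that case is not stated in these notes.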

Locked design calls

| Q | Decision |
|---|---|
| Q1 Evidence resolution | HYBRID (heuristic + per-task override + explicit opt-out) |
| Q2 Granularity | Per-task PLAN.md [ ] → [x] flips ONLY (not features.json passes:true) |
| Q3 Bypass | EXPLICIT OPT-OUT only (no auto-detection) |

Full rationale + 4 load-bearing assumptions with re-audit triggers in ADR 0009.

Operator-facing doc

See the "Use The Evidence-Tracker Hook" how-to: a when-it-fires per-tool table, 3 worked scenarios, a 6-row troubleshooting table, and a dogfood walkthrough.

Internal

  • hooks/evidence-tracker/evidence_tracker.py (~720 lines, stdlib-only) with 61 unit tests across 9 classes; wired into all 3 OS CI workflows.
  • Installer install_hook / Install-Hook extended to copy sibling .py helpers alongside hook entry scripts — first hook to ship a Python sidecar; pattern documented for future hooks.
  • 21 → 0 toolkit-side check-wiki structural drift cleanup landed alongside (commit 8793237).

Coordinated-release ordering

6th consecutive paired release (after v0.9.0 / v0.9.2 / v0.10.0 / v0.11.0 / v0.11.1). First real-substance toolkit MINOR since v0.11.0 (the intervening v0.11.1 was wiki-only). This release was tagged first; the harness v2.6.0 notes link back here, per [[coordinated-release-order]].

Full changelog

v0.11.1..v0.12.0. See CHANGELOG.md for the full entry.