Release v0.6.0 — evaluator agent + kind=agent installer support · alexherrero/crickets

First customization beyond skills: the evaluator sub-agent, a read-only grader that runs in a fresh context. The caller supplies an ARTIFACT: reference and a numbered RUBRIC: list inline in the dispatch prompt; the evaluator returns a structured PASS / NEEDS_WORK verdict, with per-rubric-item reasoning that cites the artifact.
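
Here is a minimal Python sketch of both contracts. It assumes a plain-text layout. The prompt builder emits an ARTIFACT: line followed by a numbered RUBRIC: list. The parser expects a PASS or NEEDS_WORK header, then one "N. PASS" / "N. FAIL" line per rubric item, then a closing Verdict: line. The helper names build_dispatch_prompt and parse_verdict are hypothetical, and so are the exact line shapes. agents/evaluator.md remains the authority on the real format.

```python
import re
from dataclasses import dataclass


def build_dispatch_prompt(artifact: str, rubric: list[str]) -> str:
    """Assemble the labeled ARTIFACT: and RUBRIC: sections of a dispatch prompt."""
    lines = [f"ARTIFACT: {artifact}", "RUBRIC:"]
    # Number the items so the verdict can refer back to each one.
    lines += [f"{i}. {item}" for i, item in enumerate(rubric, start=1)]
    return "\n".join(lines)


@dataclass
class Verdict:
    overall: str           # "PASS" or "NEEDS_WORK"
    items: dict[int, str]  # rubric item number -> "PASS" or "FAIL"


# Assumed per-item line shape, e.g. "2. FAIL: no test covers empty input".
ITEM_RE = re.compile(r"^(\d+)\.\s+(PASS|FAIL)\b")


def parse_verdict(text: str) -> Verdict:
    """Parse the assumed header / per-item / Verdict-line layout."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or lines[0] not in ("PASS", "NEEDS_WORK"):
        raise ValueError("missing PASS / NEEDS_WORK header")
    items: dict[int, str] = {}
    overall = None
    for ln in lines[1:]:
        m = ITEM_RE.match(ln)
        if m:
            items[int(m.group(1))] = m.group(2)
        elif ln.startswith("Verdict:"):
            overall = ln.split(":", 1)[1].strip()
    # The closing Verdict line should agree with the header.
    if overall != lines[0]:
        raise ValueError("Verdict line missing or disagrees with header")
    return Verdict(overall, items)
```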

Tool allowlist is [Read, Glob, Grep] only — no Bash, no Write, no Edit. The "fresh context that never saw the build" framing is structurally protected: the evaluator cannot re-execute, cannot mutate, cannot fetch.
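
A lint over the agent file is one way to stop the read-only guarantee from regressing. This sketch assumes Claude Code-style frontmatter with a comma-separated tools: field. In Claude Code, omitting that field gives the sub-agent every tool, so the check treats a missing field as a violation. The parser is a line-based stand-in rather than a full YAML reader. check_read_only is a hypothetical helper.

```python
ALLOWED = {"Read", "Glob", "Grep"}


def frontmatter(text: str) -> dict[str, str]:
    """Return the key: value pairs between the leading '---' fences."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing frontmatter")
    fields: dict[str, str] = {}
    for ln in lines[1:]:
        if ln.strip() == "---":
            return fields
        key, sep, value = ln.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    raise ValueError("unterminated frontmatter")


def check_read_only(text: str) -> set[str]:
    """Return the tools that break the allowlist (empty set means compliant)."""
    fm = frontmatter(text)
    if "tools" not in fm:
        # An omitted tools field would grant every tool, so flag it.
        return {"<all tools>"}
    tools = {t.strip() for t in fm["tools"].split(",") if t.strip()}
    return tools - ALLOWED
```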

Paired with agentic-harness v2.1.0, which adds a new optional §3b "evaluator augmentation" section to /review documenting how to dispatch the evaluator alongside the existing adversarial-reviewer. The two are complementary: the adversarial-reviewer hunts for defects ("the code contains bugs"), while the evaluator grades against an explicit rubric ("did this satisfy claims 1–5?"). The harness still works standalone without the toolkit: §3b skips gracefully when agent-toolkit is absent.
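
You can picture the graceful skip in §3b as a presence check that runs before dispatch. The sketch below assumes the evaluator is detected by its installed file under .claude/agents or .gemini/agents. review_plan is a hypothetical name, and the real §3b text may detect the toolkit differently.

```python
from pathlib import Path

# Hypothetical install locations, mirroring the .claude/agents and
# .gemini/agents directories the installer manages.
EVALUATOR_PATHS = (
    Path(".claude/agents/evaluator.md"),
    Path(".gemini/agents/evaluator.md"),
)


def review_plan(project_root: Path) -> list[str]:
    """List the sub-agents a /review run would dispatch."""
    plan = ["adversarial-reviewer"]  # always runs: hunts for defects
    if any((project_root / p).is_file() for p in EVALUATOR_PATHS):
        plan.append("evaluator")     # §3b: grades against an explicit rubric
    # Without the toolkit, §3b is skipped and /review proceeds unchanged.
    return plan
```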

Highlights

agents/evaluator.md — full body covering Purpose / Tool allowlist / Input contract (ARTIFACT + RUBRIC labeled sections) / Output contract (PASS/NEEDS_WORK header + per-rubric-item PASS/FAIL + Verdict line) / Workflow / 4 Failure modes / 7 Anti-patterns / 2 worked examples.
First-class dispatch for kind: agent across install.sh + install.ps1: per-host paths (single file for claude-code + gemini-cli; sub-agent-as-skill wrap for Antigravity), a new --agent flag, agent awareness in validate-manifests.py, and bundle dispatch that handles inner agents.
ADR 0002 — evaluator design — captures Context (forces driving the new primitive), Decision (5 locked design calls), Consequences (5 positive + 4 negative + load-bearing assumptions).
How to use the evaluator — practical recipe with three worked rubrics and four common failure modes with symptoms + fixes.
MANAGED_PARENTS extended with .claude/agents + .gemini/agents, so true-sync --update cleans up orphaned toolkit-managed agents.
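
This sketch shows the per-host path mapping and the orphan computation behind true-sync --update. It assumes agents install as <name>.md directly under the managed parent. It also assumes the installer knows which agent names it installed previously, so user-authored files in the same directory are never flagged. agent_target and orphans are hypothetical helpers. Antigravity's sub-agent-as-skill wrap is left out because its layout isn't described here.

```python
from pathlib import Path

# Managed parents per host (from MANAGED_PARENTS); the <name>.md layout
# under each parent is an assumption for this sketch.
AGENT_DIRS = {
    "claude-code": Path(".claude/agents"),
    "gemini-cli": Path(".gemini/agents"),
}


def agent_target(host: str, name: str) -> Path:
    """Relative single-file install path for an agent on a host."""
    if host not in AGENT_DIRS:
        # Antigravity is not handled here; it uses a sub-agent-as-skill wrap.
        raise ValueError(f"no single-file agent path for host {host!r}")
    return AGENT_DIRS[host] / f"{name}.md"


def orphans(root: Path, host: str, previously_installed: set[str],
            manifest: set[str]) -> list[Path]:
    """Toolkit-installed agent files that the current manifest no longer lists."""
    parent = root / AGENT_DIRS[host]
    stale = previously_installed - manifest
    # Only names the toolkit installed are candidates; user files are untouched.
    return sorted(p for p in (parent / f"{n}.md" for n in stale) if p.is_file())
```

Orphans are computed from the previously installed set, not from the whole directory listing. That choice is what limits cleanup to toolkit-managed agents.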

Future consumers

The evaluator is the load-bearing primitive for several future roadmap items:

Design skill (#6) — per-step grader for the design-doc → execution loop.
Quality-gates bundle (#10) — packages evaluator + kill-switch + steer + commit-on-stop + evidence-tracker for one-shot install.
Long-running custom skills — any skill that wants automated PASS / NEEDS_WORK grading on its own output.
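
The consumers above share a grade-and-retry pattern. run_with_grading is a hypothetical helper. It assumes the caller wraps the evaluator dispatch in a grade callable that returns only the overall verdict string. A real loop would likely also feed the per-rubric-item reasoning back into the next attempt.

```python
from typing import Callable


def run_with_grading(
    attempt: Callable[[int], str],
    grade: Callable[[str], str],
    max_rounds: int = 3,
) -> tuple[str, str, int]:
    """Re-run a step until the grader returns PASS or the rounds run out.

    attempt(round) produces an artifact; grade(artifact) returns the
    overall verdict, "PASS" or "NEEDS_WORK".
    """
    artifact, verdict = "", "NEEDS_WORK"
    for rnd in range(1, max_rounds + 1):
        artifact = attempt(rnd)
        verdict = grade(artifact)
        if verdict == "PASS":
            return artifact, verdict, rnd
    # Out of rounds: surface the last artifact and its NEEDS_WORK verdict.
    return artifact, verdict, max_rounds
```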

See CHANGELOG.md for the full v0.6.0 entry.
