v0.2.0
Baseline-first supergoal skill: the role-separated critic→fixer→verify loop is the default, plus new recording / DB / aesthetic capabilities, a polished-by-default UI/UX overlay, and a hardened harness-eval methodology backed by an extensive eval sweep.
New
- Surfaced-requirements record. The role-loop critic now writes each implicit requirement it surfaces to
docs/surfaced-requirements.md(what the spec implies / why it's required though unstated / covering test / status); the verifier closes them out. A durable, human-readable trail alongside the failing tests. (reference/role-loop.md,templates/surfaced-requirements.md,tests/role-loop-contract.test.sh) - Self-contained read-only DB access.
templates/db-access/(db-access.mjs+ helper scripts +.env.example) andreference/db-access.md, wired into GREENFIELD/DEBUG/LEGACY/QA-ONLY for optional schema/data evidence. (tests/db-access-contract.test.sh) - Expressive aesthetic families. Selectable UI/UX families — minimalist-ui, high-end-visual-design, industrial-brutalist-ui — in
reference/taste-aesthetics.md, wired throughreference/ui-ux.mdandagents/designer.md.
UI/UX: polished by default
- Expressive/polished is the default for ALL user-facing UI. The overlay no longer classifies a surface into Expressive vs Functional tiers where Functional shipped a plainer result.
reference/taste-skill-v2.mdis now the authority for every user-facing surface, applied from Frame through Build and QA. functional-ui.mddemoted to a density add-on. It is layered ON TOP of the Expressive baseline for dense admin/dashboard surfaces (density discipline + complete UI states), never a reason to lower polish.- Core-principle carve-out. Scope-minimalism governs code surface area and feature count, NOT visual quality — for user-facing UI a polished result is baseline correctness, not padding to defer until asked. (
SKILL.md,reference/ui-ux.md)
Harness-eval methodology (hardened)
- Pick a discriminating regime (spec-completeness × baseline strength, not difficulty tier), validate the fixture discriminates (stub fails / reference passes / lazy fails) before spending compute, and a mandatory equal-compute control with a two-claim framing: (a) skill vs one-shot default, (b) mechanism vs equal compute.
- RevFactory corpus + runnable fixtures (
templates/harness-eval-cases/fixtures/, authored under-spec specs). - Eval sweep (codex via headroom, ground-truth scoring): explicit-spec medium/hard/expert all tie baseline vs harness; on an under-specified latent-correctness task the skill catches a prototype-pollution false-GREEN a one-shot ships (useful via forced verification) — but an equal-compute naive loop matches it, so the active ingredient is the extra passes, not the role-separation.
Changed / removed
- Removed the dead HARNESS-MAKE machinery and its test suites.
- All 10 contract suites green.