Skip to content

v0.2.0

Choose a tag to compare

@cskwork cskwork released this 07 Jun 12:45
· 12 commits to main since this release

Baseline-first supergoal skill: the role-separated critic→fixer→verify loop is the default, plus new recording / DB / aesthetic capabilities, a polished-by-default UI/UX overlay, and a hardened harness-eval methodology backed by an extensive eval sweep.

New

  • Surfaced-requirements record. The role-loop critic now writes each implicit requirement it surfaces to docs/surfaced-requirements.md (what the spec implies / why it's required though unstated / covering test / status); the verifier closes them out. A durable, human-readable trail alongside the failing tests. (reference/role-loop.md, templates/surfaced-requirements.md, tests/role-loop-contract.test.sh)
  • Self-contained read-only DB access. templates/db-access/ (db-access.mjs + helper scripts + .env.example) and reference/db-access.md, wired into GREENFIELD/DEBUG/LEGACY/QA-ONLY for optional schema/data evidence. (tests/db-access-contract.test.sh)
  • Expressive aesthetic families. Selectable UI/UX families — minimalist-ui, high-end-visual-design, industrial-brutalist-ui — in reference/taste-aesthetics.md, wired through reference/ui-ux.md and agents/designer.md.

UI/UX: polished by default

  • Expressive/polished is the default for ALL user-facing UI. The overlay no longer classifies a surface into Expressive vs Functional tiers where Functional shipped a plainer result. reference/taste-skill-v2.md is now the authority for every user-facing surface, applied from Frame through Build and QA.
  • functional-ui.md demoted to a density add-on. It is layered ON TOP of the Expressive baseline for dense admin/dashboard surfaces (density discipline + complete UI states), never a reason to lower polish.
  • Core-principle carve-out. Scope-minimalism governs code surface area and feature count, NOT visual quality — for user-facing UI a polished result is baseline correctness, not padding to defer until asked. (SKILL.md, reference/ui-ux.md)

Harness-eval methodology (hardened)

  • Pick a discriminating regime (spec-completeness × baseline strength, not difficulty tier), validate the fixture discriminates (stub fails / reference passes / lazy fails) before spending compute, and a mandatory equal-compute control with a two-claim framing: (a) skill vs one-shot default, (b) mechanism vs equal compute.
  • RevFactory corpus + runnable fixtures (templates/harness-eval-cases/fixtures/, authored under-spec specs).
  • Eval sweep (codex via headroom, ground-truth scoring): explicit-spec medium/hard/expert all tie baseline vs harness; on an under-specified latent-correctness task the skill catches a prototype-pollution false-GREEN a one-shot ships (useful via forced verification) — but an equal-compute naive loop matches it, so the active ingredient is the extra passes, not the role-separation.

Changed / removed

  • Removed the dead HARNESS-MAKE machinery and its test suites.
  • All 10 contract suites green.