Skip to content

Releases: cskwork/supergoal-skill

v0.3.0 - REVIEW-ONLY mode

09 Jun 18:40

Choose a tag to compare

v0.3.0 - REVIEW-ONLY mode, lean-loop coherence, leaner domain-context

New: REVIEW-ONLY mode (9th mode)

"review / audit this code/diff/PR - no fixes" is now a routed intent (reference/review-only.md):

  • Two independent reviewers in parallel: code-reviewer in findings-only stance (each untested
    required behavior becomes a finding naming the missing test - the repo stays untouched) plus
    security-reviewer.
  • Every CRITICAL/HIGH finding is re-checked against the cited code before it enters the report.
  • Deliverable: severity-ordered report.md with Untested behaviors: and Not covered: anchors.
    Fixes are a new objective - route to DEBUG/LEGACY with the report as input.
  • New contract test (tests/review-only-contract.test.sh, 15 checks). Mode grids updated in
    SKILL.md, README, README.ko, and the landing page (8 -> 9 modes).

Changed: reference/agents aligned with the lean loop

The baseline-first rewrite had removed the gated ceremony (Validate / Human Feedback / Committee /
Deliver, delivery-gate.sh, claims.md, ten-rules), but many reference and persona files still
spoke that vocabulary. All live files now speak the current loop (Frame -> Build -> Critic -> Fixer
-> Verify):

  • agents/code-reviewer.md is now the Critic persona (it previously lacked the Write tool and could
    not author the failing tests role-loop requires).
  • agents/executor.md is now Builder/Fixer with the fixer constraints (never edit test files, no
    padding); claims.md is retired in favor of a run-to-prove line in the run vault README.md.
  • DEBUG no longer blocks on a "Human Feedback" round-trip: read-only until the root cause is
    confirmed by direct evidence, user stops only on hard stops (destructive fix / genuine ambiguity).
  • The external diagnose-skill dependency is gone; the trusted feedback-loop requirement is inlined.
  • tests/role-loop-contract.test.sh retargeted to the docs/changelog/<YYYY-MM>/<DD-topic>/ vault
    (it was failing against the old path).

Changed: reference/domain-context.md compressed 255 -> 142 lines (-44%)

Same contract, fewer tokens: per-file contract folded to one line per file, duplicated loops merged,
all contract-pinned strings and the Domain Brief format preserved.

Verification

11/11 contract suites, 363 assertions pass, 0 fail.

Full changelog: log/changelog-2026-06-10.md (commits 98c6bf8, 4d99bf5)

v0.2.2

09 Jun 16:46

Choose a tag to compare

Run-vault doc layout: month-grouped + surfaced-requirements unified

Per-task artifacts now live in one month-grouped run vault instead of a flat date-prefixed folder plus a separate global file.

  • Run vault: docs/changelog/<date>-<slug>/ -> docs/changelog/<YYYY-MM>/<DD-topic>/ (e.g. docs/changelog/2026-06/10-add-auth/).
  • Surfaced requirements: flat global docs/surfaced-requirements.md -> per-run …/<DD-topic>/surfaced-requirements.md — one file per run, same lifecycle as the run's other evidence.
  • QA vault and harness-eval persist_path follow the same layout.
  • Keeps the changelog/ namespace so run vaults don't pollute docs/ root; existing history left as-is.

Also includes: interview-style checks in LEARN docs.

Files: SKILL.md, reference/{domain-context,role-loop,qa-only}.md, templates/surfaced-requirements.md, templates/qa-gate.sh, qa-only-gate.sh, templates/harness-eval-case.yaml + harness-eval-cases/*.yaml (16).

v0.2.1

08 Jun 05:51

Choose a tag to compare

Portable /supergoal docs and contract release.

Changed

  • Present /supergoal as a portable agent skill for Claude Code, Codex, agy, and other agent CLIs instead of a Claude Code-only workflow.
  • Polish the Korean README and landing page copy so the value proposition, workflow, install path, and proof sections read naturally for developers and non-developers.
  • Sync the landing page interview metric to the current <=5 clarification-question contract.
  • Add the difficulty-gate contract: beyond very easy tasks, red-green evidence is required, and DB evidence is required when persisted data is load-bearing.

Verified

  • tests/*.test.sh all pass.
  • README.md and README.ko.md local links pass.
  • docs/index.html EN/KO language blocks are balanced at 79/79, with key HTML tag counts balanced.

v0.2.0

07 Jun 12:45

Choose a tag to compare

Baseline-first supergoal skill: the role-separated critic→fixer→verify loop is the default, plus new recording / DB / aesthetic capabilities, a polished-by-default UI/UX overlay, and a hardened harness-eval methodology backed by an extensive eval sweep.

New

  • Surfaced-requirements record. The role-loop critic now writes each implicit requirement it surfaces to docs/surfaced-requirements.md (what the spec implies / why it's required though unstated / covering test / status); the verifier closes them out. A durable, human-readable trail alongside the failing tests. (reference/role-loop.md, templates/surfaced-requirements.md, tests/role-loop-contract.test.sh)
  • Self-contained read-only DB access. templates/db-access/ (db-access.mjs + helper scripts + .env.example) and reference/db-access.md, wired into GREENFIELD/DEBUG/LEGACY/QA-ONLY for optional schema/data evidence. (tests/db-access-contract.test.sh)
  • Expressive aesthetic families. Selectable UI/UX families — minimalist-ui, high-end-visual-design, industrial-brutalist-ui — in reference/taste-aesthetics.md, wired through reference/ui-ux.md and agents/designer.md.

UI/UX: polished by default

  • Expressive/polished is the default for ALL user-facing UI. The overlay no longer classifies a surface into Expressive vs Functional tiers where Functional shipped a plainer result. reference/taste-skill-v2.md is now the authority for every user-facing surface, applied from Frame through Build and QA.
  • functional-ui.md demoted to a density add-on. It is layered ON TOP of the Expressive baseline for dense admin/dashboard surfaces (density discipline + complete UI states), never a reason to lower polish.
  • Core-principle carve-out. Scope-minimalism governs code surface area and feature count, NOT visual quality — for user-facing UI a polished result is baseline correctness, not padding to defer until asked. (SKILL.md, reference/ui-ux.md)

Harness-eval methodology (hardened)

  • Pick a discriminating regime (spec-completeness × baseline strength, not difficulty tier), validate the fixture discriminates (stub fails / reference passes / lazy fails) before spending compute, and a mandatory equal-compute control with a two-claim framing: (a) skill vs one-shot default, (b) mechanism vs equal compute.
  • RevFactory corpus + runnable fixtures (templates/harness-eval-cases/fixtures/, authored under-spec specs).
  • Eval sweep (codex via headroom, ground-truth scoring): explicit-spec medium/hard/expert all tie baseline vs harness; on an under-specified latent-correctness task the skill catches a prototype-pollution false-GREEN a one-shot ships (useful via forced verification) — but an equal-compute naive loop matches it, so the active ingredient is the extra passes, not the role-separation.

Changed / removed

  • Removed the dead HARNESS-MAKE machinery and its test suites.
  • All 10 contract suites green.

v0.1.8 - SKILL-MINE mode

05 Jun 14:27
7373ad7

Choose a tag to compare

v0.1.8 — SKILL-MINE mode

A seventh mode for turning your repeated work into a reusable skill. Trigger it with "make a skill", "스킬 만들어", "learn new skill", "make skill from history", or "이거 자주 하는데 스킬로".

It mines your recent agent session history (~/.claude/projects/*.jsonl, adaptive 7-30 day window), surfaces 3-5 candidate skills ranked by frequency × payoff, and lets you pick / reject / name a new one. On your pick it forges ONE cross-agent-portable SKILL.md (the agentskills.io open standard) and installs it. The human pick is a hard gate — it never creates or installs a skill you did not approve (the validation step that Hermes-style autonomous skill creation lacks). It writes no production code and no worktree.

Highlights

  • Pipeline: Intake → Window → Mine → Rank → Suggest → Human pick/reject → Forge → Verify → Install → Journal. No worktree, no delivery gate.
  • Mechanical miner (templates/skill-mine/mine.mjs, pure Node, no deps): reads the full tool-call transcripts (not the prompt-only history.jsonl), picks an adaptive window, and emits frequent Bash command signatures + first-prompt intent clusters + already-used skills. A real-data experiment confirmed raw tool-name n-grams are generic noise; the signal lives in command signatures and intent clusters.
  • Portable forge: generates a directory + SKILL.md to the agentskills.io standard. A frontmatter gate (templates/skill-frontmatter-gate.mjs) enforces the portable limits — name lowercase/≤64/no reserved word, description+when_to_use ≤1536 chars, body ≤~5k tokens.
  • Cross-agent install, no auto-sync: one canonical real dir (e.g. ~/.agents/skills/<name>/) symlinked into each agent (Claude Code / Codex / opencode / Hermes), so one edit lands everywhere. Every run ends by stating where the source skill lives.
  • Anti over-suggestion gate: surface a candidate only when frequency ≥ minsup AND it plausibly recurs (Horvitz mixed-initiative); rejection is free and ends the run.

Wiring & tests

New files: reference/skill-mine.md, templates/skill-mine/mine.mjs, templates/skill-frontmatter-gate.mjs, templates/skill.md.template, agents/skill-miner.md, agents/skill-forger.md. Wired into SKILL / Step 0 routing / reference map / template scripts; README + landing updated to 7 modes.

Frontmatter gate: 5/5 (valid passes; uppercase name, empty description, combined >1536, missing SKILL.md each fail). Miner verified on real history (7d, 19 sessions, 1290 tool-calls), sparse-history widen-to-30d, and --all (850 sessions). Dogfooded end-to-end — forged 4 real skills (gh-release, sync-skill, refine-skill-terse, update-gh-pages) installed across 4 agents.

Full rationale: log/changelog-2026-06-05.md.

Full changelog: v0.1.7...v0.1.8

v0.1.7 - QA-ONLY mode

04 Jun 23:17

Choose a tag to compare

v0.1.7 — QA-ONLY mode

A sixth mode for when you want QA, data verification, or a data comparison and explicitly no code change. Trigger it with "QA only", "검증만", "데이터 정합성 / 데이터 비교", "API 수정 전후 확인", or "A/B".

It exercises an already-running app (and a read-only, DB-independent database) like a user, writes no production code, creates no worktree, and runs none of the implementation gates. It produces a human-friendly report.md (what worked / what didn't / what it discovered) and persists a reusable, indexed QA suite under .domain-agent/qa/ so the same check re-runs fast.

Highlights

  • Pipeline: Intake → Target & Access → Scenario checkpoint → Exercise → Cross-check → Report → Persist. No worktree, no delivery gate.
  • Browser driver: agent-browser by default; attach-to-browser (Playwright CLI, attach-to-browser-skill) for authenticated / real logged-in sessions. The agent-browser doctor preflight is always recorded.
  • Read-only, DB-independent DB access: MySQL / PostgreSQL / SQLite via an intelligence skill if installed (e.g. aidt-mysql-cli, postgres-intelligence), else the raw CLI. SELECT-class only — enforced both in the agent prompt and by a gate SQL scan (per-DB:-line read-only; blocks INSERT/UPDATE/DELETE/MERGE/REPLACE/DDL/GRANT/REVOKE/CALL).
  • Two isolated subagents: qa-auditor drives the app, db-reader reads the DB — raw rows/PII never enter the browser agent. Conductor-mediated handoff via a small sanitized qa/expected.md (which also serves as the before-after baseline); auth is transient and never written to a file.
  • Action cap (default 100): each browser interaction and DB query counts as one action; summed across both agents and gate-enforced so a run stays scoped and the user does not get lost in an unbounded crawl.
  • New terminal gate templates/qa-only-gate.sh: report anchors + qa-gate.sh browser/CLI evidence + action_count <= action_cap + DB read-only backstop.

Wiring & tests

New files: reference/qa-only.md, reference/db-access.md, agents/qa-auditor.md, agents/db-reader.md, templates/qa-report.md, templates/qa-only-gate.sh, tests/qa-only-contract.test.sh. Wired into SKILL / pipeline / qa / experts / domain-context / index template / vault / state.json / README.

Tests: qa-only-contract 56/0; full contract suite 293/0, no regressions. Dogfooded end-to-end against a live Cloudflare-protected site — the mode correctly tried agent-browser, recorded an honest BLOCKED verdict (no fake pass), recommended attach-to-browser as the remediation, and its gate passed on truthful evidence.

Full rationale: docs/changelog/changelog-2026-06-05.md.

Full changelog: v0.1.6...v0.1.7

v0.1.6

04 Jun 14:37

Choose a tag to compare

Seventh release. Focus: a human-facing onboarding handbook for LEARN-DOMAIN, plus wiring so the saved domain knowledge actually reaches the agents that build/debug/extend code. After the agent learns a codebase, the new Onboard step renders the grounded .domain-agent/ knowledge into one self-contained HTML page for fast human onboarding - what the domain is, key terms, architecture, flows, and the rules that must not break - styled to the Functional UI tier. And the ## Domain Brief is now a first-class dispatch input, so GREENFIELD/DEBUG/LEGACY agents receive it instead of it living only in conductor-facing docs. Plus README/docs surfacing and a repo-root cleanup.

LEARN-DOMAIN Onboard step (new)

  • New terminal step in the LEARN-DOMAIN pipeline: Intake -> Survey -> Scope checkpoint -> Map -> Deepen -> Ground -> Persist -> **Onboard** -> Freshness.
  • Renders the grounded pack into one self-contained <knowledgePath>/onboarding.html (default .domain-agent/onboarding.html, gitignored) for humans only. The markdown pack stays the agent's source of truth; the HTML is a derived snapshot, regenerated on every Freshness run.
  • Seven sections, plain summary first then expert detail in <details>: Orientation (mental model), Key terms (glossary in this repo's meaning), Architecture (systems/contexts/entry points + one inline diagram), Key flows (per flows/*.md), Rules that must not break (invariants with grounding status), Get hands-on (test-map.md commands), and Trust & freshness (verified/unverified legend + lastUpdated).
  • Grounding discipline carries over: rendered from the pack only (no new facts; never upgrade unverified to verified), and every load-bearing fact shows a visible verified/unverified badge so a reader is never misled.
  • Functional tier (reference/functional-ui.md), not the Expressive taste tier: one accent + one type/spacing/radius scale, computed WCAG-AA contrast (body AAA), information density, minimal motion honoring prefers-reduced-motion, declared color-scheme with light+dark tokens, no empty decoration.
  • Self-contained and offline: inline CSS only, no external scripts/fonts/CDN/network requests, inline SVG for diagrams (zero added security surface). This constraint overrides functional-ui's "adopt one named design system" rule, so the baseline is implemented with a small inline token set instead of Fluent/Carbon/shadcn.
  • LEARN-DOMAIN runs no implementation gates, so the render deliberately does not pull the product Designer's claims.md / QA contrast gate / committee apparatus; the agent self-applies the functional-ui baseline. Dispatch is explore (reads the persisted pack, writes the doc).
  • New templates/domain-onboarding.html skeleton (token set, the seven sections with source-mapped <!-- FILL ... --> markers, ToC, badge legend, <details> progressive disclosure, light/dark). Wired into SKILL.md (pipeline row, LEARN-vs-LEARN-DOMAIN note, template-script table) and reference/learn-domain.md (Step 7 + dispatch row + stop condition). tests/learn-domain-contract.test.sh grows from 17 to 29 assertions.

Domain Brief reaches the dispatched agents (new)

  • Closes a consumption gap: the saved .domain-agent/ knowledge was described only in conductor-facing docs - no persona and not the locked-prompt template referenced the pack or the Brief, and debugger/explore did not even have it in their READ scope. Domain context could be built and still never reach the agent doing the work.
  • reference/experts.md: adds a DOMAIN BRIEF slot to the locked-prompt template (routing index only; verify against current code, current code wins; never bulk-read the pack) and a dispatch step that passes the Brief to the architect (GREENFIELD Plan), debugger/tracer (DEBUG Reproduce/Diagnose), and explorer (LEGACY Explore).
  • agents/explore.md + agents/debugger.md: add the run's ## Domain Brief to READ scope plus a one-line rule to route by it but verify load-bearing facts against current code. agents/architect.md: names the Brief explicitly (was a vague "selected domain/architecture docs"). executor is unchanged - it builds from the frozen plan.md.
  • Passes the capped, current-code-verified Brief (<=5 files, <=80 lines), never the raw pack path, so the retrieval cap / current-code-wins / no-bulk-read discipline holds. The human onboarding.html is never an agent input. tests/domain-context-contract.test.sh grows from 30 to 37 assertions.

Other

  • docs: add LEARN-DOMAIN mode + Onboard handbook to README - the Modes table listed only four modes (prose said "three modes") even though LEARN-DOMAIN shipped in v0.1.5; adds the LEARN-DOMAIN row (pipeline ends in Onboard (human handbook)), a usage example, and lists interview + learn-domain in the Layout summary.
  • chore: keep root to SKILL/README/LICENSE; move DESIGN + preference files - repo root trimmed to SKILL.md, README.md, LICENSE; DESIGN.md -> docs/DESIGN.md, USER_PREFERENCE.template.md and the git-ignored USER_PREFERENCE.md -> learn/.

Verification

Full contract suite: domain-context 37/0, gate-scenarios 100/0, interview-contract 26/0, learn-contract 11/0, learn-domain-contract 29/0, ui-ux 17/0, worktree 17/0 = 237 passed, 0 failed. No regressions. Template confirmed self-contained (a grep for external scripts/links/CDN/url() matches only the in-comment instruction forbidding them).

Included changes

  • feat: pass the Domain Brief to dispatched agents (close consumption gap)
  • docs: add LEARN-DOMAIN mode + Onboard handbook to README
  • feat: complete LEARN-DOMAIN Onboard step (Step 7 spec, Functional tier, tests)
  • feat: wire LEARN-DOMAIN Onboard step (SKILL spine + handbook template)
  • chore: keep root to SKILL/README/LICENSE; move DESIGN + preference files

Full changelog: v0.1.5...v0.1.6

v0.1.5

04 Jun 13:29

Choose a tag to compare

Sixth release. Focus: an ambiguity-gated clarifying interview before plan freeze, a new LEARN-DOMAIN mode that learns a codebase for the agent with execution-grounded verification, distributed-systems DEBUG hardening, user-language output, and Windows (LF) compatibility.

Clarifying interview before plan freeze (new)

  • reference/interview.md (new): a conditional, ambiguity-gated clarifying interview inserted after context-gathering and before plan freeze for GREENFIELD, DEBUG, and LEGACY (LEARN / LEARN-DOMAIN exempt).
  • Fires only on genuine ambiguity (multiple plausible interpretations or an unclear load-bearing detail); skips when clear or a cheap code read answers it, and logs the skip.
  • Code-first: code-answerable questions are resolved by reading current docs/code (reuses plan-grounding.md); only user-only load-bearing choices reach the user.
  • Capped at 3-5 high-leverage questions, one round, one at a time, each with a recommended answer; drawn from a six-dimension menu (objective, definition-of-done, scope, constraints, environment, safety/reversibility) and selected by information gain.
  • Hard gate: blocks plan freeze until must-have answers or a user-approved assumption. DEBUG variant presents 3-5 ranked root-cause hypotheses for re-ranking at end of Diagnose (non-blocking; proceeds on its own ranking if AFK).
  • Backed by a fact-checked deep-research pass (Ambig-SWE/CMU, Active Task Disambiguation/ICLR 2025, SAGE-Agent, Ask-before-Plan, ClarEval). Procedural step recorded in plan.md ## Interview; no new machine gate.
  • Wired into SKILL.md, reference/pipeline.md, reference/debugging.md; new tests/interview-contract.test.sh (26 assertions).

LEARN-DOMAIN mode (new)

  • A fifth mode that learns a large/cryptic codebase for the agent and persists a source-grounded .domain-agent/ wiki (distinct from LEARN, which teaches a human).
  • Pipeline: Intake -> Survey -> Scope checkpoint -> Map -> Deepen -> Ground -> Persist -> Freshness. Writes no production code; its only writes are the knowledge pack plus throwaway sandbox probes.
  • reference/learn-domain.md (new) encodes six research-backed choices: agentic discovery (not embeddings/RAG), markdown-first persistence (Aider repo-map pattern), bottom-up symbol->repo hierarchy, optional structural index only (cache, never required), balanced retrieval budget, and execution-grounded verification.
  • New gate templates/learn-grounding-gate.mjs: every populated invariant/flow must carry a Grounding: verified|unverified marker, index.md must name a concrete entry point, and a high-precision secret scan must pass. Facts that cannot be executed are marked unverified, never faked.
  • Supporting edits to SKILL.md, templates/domain-agent/code-map.md (Aider-style key-symbol signatures), templates/domain-agent/invariants.md, templates/domain-agent/flows/README.md; new tests/learn-domain-contract.test.sh (17 assertions) plus a 7-case grounding-gate scenario.

DEBUG hardening (distributed triage, F->P repro, evidence ledger)

  • Six senior-engineer debugging disciplines encoded into reference/debugging.md, sourced from Google SRE Book/Workbook, OpenTelemetry, W3C Trace Context, AWS Builders' Library, and SWE-bench-line agent papers (Agentless, SWE-Adept, SWT-Bench):
    1. Golden-signal triage + symptom-vs-cause split (chase only definite, imminent causes).
    2. Correlation-ID propagation as a precondition for cross-service RCA.
    3. Known-good vs known-bad differential.
    4. Hypothesis ledger with evidence on both sides ("definite & imminent?"); only a confirmed cause advances.
    5. Reproduce-first as a fail-to-pass (F->P) gate; flaky/timing bugs must fail consistently over N runs.
    6. Context isolation + minimal-diff checkpointing + breadth-based escalation (single-driver stays the default).
  • Adds a microservice failure-pattern checklist (cascading overload, retry storms, missing deadline propagation, partial-failure bimodal latency). Supporting edits in reference/pipeline.md and reference/vault.md.

Output language follows the user

  • Agent-authored prose (vault README.md/brief.md/plan.md/claims.md/verification.md, both Human Feedback briefs, run notes, changelog, LEARN journals, returned summaries) is written in the user's language, defaulting to English when unknown.
  • Machine-checked anchors, structural keys, code, identifiers, file paths, shell commands, and commit messages stay verbatim English so the gates' literal greps keep matching. Applied in SKILL.md, reference/experts.md, reference/vault.md.

Windows compatibility (LF)

  • .gitattributes (new) forces LF (eol=lf) on *.sh, *.mjs, *.js, *.md, *.json, and tracked scripts were re-checked-out to LF. A Windows checkout with core.autocrlf=true had materialized the gate/test scripts as CRLF, which bash refused to parse ($'\r': command not found) - the real blocker to running the suite on Windows. Target support is bash-on-Windows (Git Bash or WSL); run the suite under WSL bash.

Other

  • chore: Require explicit merge-commit for integrations - stricter integration policy (explicit merge commit required); tests/worktree-contract.test.sh re-anchored to the new wording.

Verification

Full suite under WSL: interview-contract 26/0, gate-scenarios 100/0, domain-context 30/0, worktree 17/0, ui-ux 17/0, learn-domain 17/0, learn 11/0 = 218 passed, 0 failed. No regressions.

Included changes

  • feat: add ambiguity-gated clarifying interview before plan freeze
  • feat: add LEARN-DOMAIN mode with agentic-discovery wiki and grounding gate
  • feat: agent-authored docs follow the user's language
  • feat: harden DEBUG with distributed triage, F->P repro, evidence ledger; add Windows LF support
  • chore: Require explicit merge-commit for integrations

Full changelog: v0.1.4...v0.1.5

v0.1.4

03 Jun 12:45

Choose a tag to compare

Fifth release. Focus: a new Functional UI design tier so dashboards, admin panels, and internal tools get design discipline (not just landing pages), a repo-local Domain Context overlay, LEARN-mode atomic-decomposition teaching, worktree-isolation hardening, and a round of gate hardening that turns four prose-only contract claims into machine-checked gates.

Functional UI design tier (new)

  • reference/functional-ui.md (new): a lightweight design authority for dashboards, data tables, admin panels, internal tools, settings, wizards, and CRUD forms. Enforces a real design system (Carbon/Fluent/Material/Polaris/Atlaskit/Radix/shadcn), computed WCAG contrast, consistent accent/type/spacing/radius tokens, complete UI states (loading/empty/error/focus/...), density-first dials, and minimal motion - without the marketing-only rules.
  • reference/ui-ux.md: now a surface -> tier dispatcher. Expressive (landing/marketing) routes to taste-skill-v2; Functional (dashboard/admin) routes to functional-ui. Dashboards were previously skipped entirely, so neither the Designer agent nor the contrast gate ran on them.
  • agents/designer.md: tier-aware authority. Universal bans (one accent, computed contrast, declared color-scheme, no empty decoration) are marked with (*) and apply to both tiers; Functional adds its own bans (design-system-not-hand-rolled, table empty/loading/error states, density-matches-dial, low motion).

Gate hardening (prose contracts -> machine gates)

  • Contrast is now gate-enforced. templates/qa-gate.sh detects a UI-tier: line (or a qa/contrast-pairs.json) and runs contrast-gate.mjs, failing QA - and, via the delivery backstop, Deliver - on a missing pair list or any sub-threshold pair. The "contrast blocks Deliver" guarantee was previously prose with no wired gate.
  • Committee approval + plan freeze enforced at Deliver. templates/delivery-gate.sh now requires a Committee: line in verification.md with architect + security + code-review all APPROVED (no reject/changes-requested), and recomputes a CR-stripped sha256 of plan.md to match state.json.plan_hash (with a RE-PLAN: escape in README.md). Both were declared in the contract but checked nowhere.
  • Cycle bound. templates/cycle-bound.mjs (new) trips at max_cycles_per_phase (default 5) regardless of error identity - the runaway-retry case the identical-signature circuit breaker never caught.
  • reference/experts.md: the Designer dispatch row was stale (pointed only at taste-skill-v2, mis-routing Functional surfaces); now tier-routed via ui-ux.md.
  • tests/ui-ux-contract.test.sh (new): 17-assertion contract test guarding the tier split. Suite is now 167 green (gate-scenarios 74 -> 92).

Domain Context overlay (new)

  • reference/domain-context.md + templates/domain-agent/: a compact repo-local ## Domain Brief built from a .domain-agent/ knowledge pack, loaded at GREENFIELD Plan, DEBUG Reproduce/Diagnose, and LEGACY Explore. Missing pack prompts for a path and gitignores it.

LEARN mode

  • Human-to-code bridge, atomic concept decomposition guidance, enforced atom map + process traces in explanations, and naturalized Korean term labels (핵심 용어 / 구성 요소 instead of literal 원자).

Worktree isolation hardening

  • Verify both base and target branch refs exist in the target repo before git worktree add; ask for corrected names instead of substituting nearby branches.
  • Retention contract: keep the three most recent completed run worktrees; prune only the oldest repo-managed one, never the active run, original checkout, or manual worktrees.

Plan grounding

  • Added a standalone plan-grounding pressure test.

Included changes

  • feat: enforce committee, plan-freeze, contrast, cycle-bound; finish UI/UX tier split
  • feat: add Functional UI design tier for dashboard/admin surfaces
  • docs: naturalize LEARN Korean term labels
  • docs: enforce LEARN atom process traces
  • fix: Verify branch refs before worktree creation
  • Update supergoal worktree retention contract
  • docs: add standalone plan grounding pressure test
  • Add domain context overlay to supergoal
  • docs: Add atomic concept decomposition guidance
  • docs: add human-to-code bridge to learn mode

Full changelog: v0.1.3...v0.1.4

v0.1.3

02 Jun 15:20

Choose a tag to compare

Fourth release. Focus: QA and the "contrast is computed" UI/UX rule are now machine-enforceable literal gates, runs are isolated in a git worktree, and browser QA requires an agent-browser preflight instead of silently falling back to headless Chrome.

QA + UI/UX gates (now machine-checkable)

  • templates/qa-gate.sh (new): browser apps must produce qa/as-is-* + qa/to-be-* evidence and a ## QA Tool: line; any non-agent-browser driver needs a Fallback: justification, blocking silent headless-Chrome fallback. Wired into the GREENFIELD/LEGACY QA exit gate.
  • templates/contrast-gate.mjs (new): the agent enumerates text/bg pairs into qa/contrast-pairs.json; the script computes WCAG ratios and judges them (body AAA, other text AA, large >=3) so contrast can't be eyeballed.
  • templates/delivery-gate.sh: if the vault has a qa/ dir, delivery re-runs qa-gate.sh as a final backstop (CLI/DEBUG runs have no qa/ dir, unaffected).
  • Gate test suite grew 51 -> 73 cases, all green.

Browser QA preflight

  • The agent-browser-first sequence is now two steps (npm i -g + agent-browser install for the Chrome binary).
  • A missing browser binary is explicitly NOT "install impossible"; browser QA is gated on the preflight succeeding rather than falling back silently.

Worktree isolation

  • supergoal runs are now required to execute inside an isolated git worktree.

Docs

  • Compressed supergoal skill references and refreshed the landing page.

Included changes

  • feat: make QA + UI/UX contrast machine-enforceable gates
  • fix: require agent-browser CLI preflight for QA
  • fix: gate browser QA on agent-browser preflight
  • Require worktree isolation for supergoal runs
  • docs: compress supergoal skill references
  • docs: refresh supergoal landing page

Full changelog: v0.1.2...v0.1.3