Releases · cskwork/supergoal-skill

09 Jun 18:40

cskwork

v0.3.0

4d99bf5

v0.3.0 - REVIEW-ONLY mode Latest

Latest

v0.3.0 - REVIEW-ONLY mode, lean-loop coherence, leaner domain-context

New: REVIEW-ONLY mode (9th mode)

"review / audit this code/diff/PR - no fixes" is now a routed intent (reference/review-only.md):

Two independent reviewers in parallel: code-reviewer in findings-only stance (each untested
required behavior becomes a finding naming the missing test - the repo stays untouched) plus
security-reviewer.
Every CRITICAL/HIGH finding is re-checked against the cited code before it enters the report.
Deliverable: severity-ordered report.md with Untested behaviors: and Not covered: anchors.
Fixes are a new objective - route to DEBUG/LEGACY with the report as input.
New contract test (tests/review-only-contract.test.sh, 15 checks). Mode grids updated in
SKILL.md, README, README.ko, and the landing page (8 -> 9 modes).

Changed: reference/agents aligned with the lean loop

The baseline-first rewrite had removed the gated ceremony (Validate / Human Feedback / Committee /
Deliver, delivery-gate.sh, claims.md, ten-rules), but many reference and persona files still
spoke that vocabulary. All live files now speak the current loop (Frame -> Build -> Critic -> Fixer
-> Verify):

agents/code-reviewer.md is now the Critic persona (it previously lacked the Write tool and could
not author the failing tests role-loop requires).
agents/executor.md is now Builder/Fixer with the fixer constraints (never edit test files, no
padding); claims.md is retired in favor of a run-to-prove line in the run vault README.md.
DEBUG no longer blocks on a "Human Feedback" round-trip: read-only until the root cause is
confirmed by direct evidence, user stops only on hard stops (destructive fix / genuine ambiguity).
The external diagnose-skill dependency is gone; the trusted feedback-loop requirement is inlined.
tests/role-loop-contract.test.sh retargeted to the docs/changelog/<YYYY-MM>/<DD-topic>/ vault
(it was failing against the old path).

Changed: reference/domain-context.md compressed 255 -> 142 lines (-44%)

Same contract, fewer tokens: per-file contract folded to one line per file, duplicated loops merged,
all contract-pinned strings and the Domain Brief format preserved.

Verification

11/11 contract suites, 363 assertions pass, 0 fail.

Full changelog: log/changelog-2026-06-10.md (commits 98c6bf8, 4d99bf5)

Assets 2

09 Jun 16:46

cskwork

v0.2.2

8680d18

v0.2.2

Run-vault doc layout: month-grouped + surfaced-requirements unified

Per-task artifacts now live in one month-grouped run vault instead of a flat date-prefixed folder plus a separate global file.

Run vault: docs/changelog/<date>-<slug>/ -> docs/changelog/<YYYY-MM>/<DD-topic>/ (e.g. docs/changelog/2026-06/10-add-auth/).
Surfaced requirements: flat global docs/surfaced-requirements.md -> per-run …/<DD-topic>/surfaced-requirements.md — one file per run, same lifecycle as the run's other evidence.
QA vault and harness-eval persist_path follow the same layout.
Keeps the changelog/ namespace so run vaults don't pollute docs/ root; existing history left as-is.

Also includes: interview-style checks in LEARN docs.

Files: SKILL.md, reference/{domain-context,role-loop,qa-only}.md, templates/surfaced-requirements.md, templates/qa-gate.sh, qa-only-gate.sh, templates/harness-eval-case.yaml + harness-eval-cases/*.yaml (16).

Assets 2

08 Jun 05:51

cskwork

v0.2.1

fbffe2b

v0.2.1

Portable /supergoal docs and contract release.

Changed

Present /supergoal as a portable agent skill for Claude Code, Codex, agy, and other agent CLIs instead of a Claude Code-only workflow.
Polish the Korean README and landing page copy so the value proposition, workflow, install path, and proof sections read naturally for developers and non-developers.
Sync the landing page interview metric to the current <=5 clarification-question contract.
Add the difficulty-gate contract: beyond very easy tasks, red-green evidence is required, and DB evidence is required when persisted data is load-bearing.

Verified

tests/*.test.sh all pass.
README.md and README.ko.md local links pass.
docs/index.html EN/KO language blocks are balanced at 79/79, with key HTML tag counts balanced.

Assets 2

07 Jun 12:45

cskwork

v0.2.0

9881f29

v0.2.0

Baseline-first supergoal skill: the role-separated critic→fixer→verify loop is the default, plus new recording / DB / aesthetic capabilities, a polished-by-default UI/UX overlay, and a hardened harness-eval methodology backed by an extensive eval sweep.

New

Surfaced-requirements record. The role-loop critic now writes each implicit requirement it surfaces to docs/surfaced-requirements.md (what the spec implies / why it's required though unstated / covering test / status); the verifier closes them out. A durable, human-readable trail alongside the failing tests. (reference/role-loop.md, templates/surfaced-requirements.md, tests/role-loop-contract.test.sh)
Self-contained read-only DB access. templates/db-access/ (db-access.mjs + helper scripts + .env.example) and reference/db-access.md, wired into GREENFIELD/DEBUG/LEGACY/QA-ONLY for optional schema/data evidence. (tests/db-access-contract.test.sh)
Expressive aesthetic families. Selectable UI/UX families — minimalist-ui, high-end-visual-design, industrial-brutalist-ui — in reference/taste-aesthetics.md, wired through reference/ui-ux.md and agents/designer.md.

UI/UX: polished by default

Expressive/polished is the default for ALL user-facing UI. The overlay no longer classifies a surface into Expressive vs Functional tiers where Functional shipped a plainer result. reference/taste-skill-v2.md is now the authority for every user-facing surface, applied from Frame through Build and QA.
functional-ui.md demoted to a density add-on. It is layered ON TOP of the Expressive baseline for dense admin/dashboard surfaces (density discipline + complete UI states), never a reason to lower polish.
Core-principle carve-out. Scope-minimalism governs code surface area and feature count, NOT visual quality — for user-facing UI a polished result is baseline correctness, not padding to defer until asked. (SKILL.md, reference/ui-ux.md)

Harness-eval methodology (hardened)

Pick a discriminating regime (spec-completeness × baseline strength, not difficulty tier), validate the fixture discriminates (stub fails / reference passes / lazy fails) before spending compute, and a mandatory equal-compute control with a two-claim framing: (a) skill vs one-shot default, (b) mechanism vs equal compute.
RevFactory corpus + runnable fixtures (templates/harness-eval-cases/fixtures/, authored under-spec specs).
Eval sweep (codex via headroom, ground-truth scoring): explicit-spec medium/hard/expert all tie baseline vs harness; on an under-specified latent-correctness task the skill catches a prototype-pollution false-GREEN a one-shot ships (useful via forced verification) — but an equal-compute naive loop matches it, so the active ingredient is the extra passes, not the role-separation.

Changed / removed

Removed the dead HARNESS-MAKE machinery and its test suites.
All 10 contract suites green.

Assets 2

05 Jun 14:27

cskwork

v0.1.8

7373ad7

v0.1.8 - SKILL-MINE mode

v0.1.8 — SKILL-MINE mode

A seventh mode for turning your repeated work into a reusable skill. Trigger it with "make a skill", "스킬 만들어", "learn new skill", "make skill from history", or "이거 자주 하는데 스킬로".

It mines your recent agent session history (~/.claude/projects/*.jsonl, adaptive 7-30 day window), surfaces 3-5 candidate skills ranked by frequency × payoff, and lets you pick / reject / name a new one. On your pick it forges ONE cross-agent-portable SKILL.md (the agentskills.io open standard) and installs it. The human pick is a hard gate — it never creates or installs a skill you did not approve (the validation step that Hermes-style autonomous skill creation lacks). It writes no production code and no worktree.

Highlights

Pipeline: Intake → Window → Mine → Rank → Suggest → Human pick/reject → Forge → Verify → Install → Journal. No worktree, no delivery gate.
Mechanical miner (templates/skill-mine/mine.mjs, pure Node, no deps): reads the full tool-call transcripts (not the prompt-only history.jsonl), picks an adaptive window, and emits frequent Bash command signatures + first-prompt intent clusters + already-used skills. A real-data experiment confirmed raw tool-name n-grams are generic noise; the signal lives in command signatures and intent clusters.
Portable forge: generates a directory + SKILL.md to the agentskills.io standard. A frontmatter gate (templates/skill-frontmatter-gate.mjs) enforces the portable limits — name lowercase/≤64/no reserved word, description+when_to_use ≤1536 chars, body ≤~5k tokens.
Cross-agent install, no auto-sync: one canonical real dir (e.g. ~/.agents/skills/<name>/) symlinked into each agent (Claude Code / Codex / opencode / Hermes), so one edit lands everywhere. Every run ends by stating where the source skill lives.
Anti over-suggestion gate: surface a candidate only when frequency ≥ minsup AND it plausibly recurs (Horvitz mixed-initiative); rejection is free and ends the run.

Wiring & tests

New files: reference/skill-mine.md, templates/skill-mine/mine.mjs, templates/skill-frontmatter-gate.mjs, templates/skill.md.template, agents/skill-miner.md, agents/skill-forger.md. Wired into SKILL / Step 0 routing / reference map / template scripts; README + landing updated to 7 modes.

Frontmatter gate: 5/5 (valid passes; uppercase name, empty description, combined >1536, missing SKILL.md each fail). Miner verified on real history (7d, 19 sessions, 1290 tool-calls), sparse-history widen-to-30d, and --all (850 sessions). Dogfooded end-to-end — forged 4 real skills (gh-release, sync-skill, refine-skill-terse, update-gh-pages) installed across 4 agents.

Full rationale: log/changelog-2026-06-05.md.

Full changelog: v0.1.7...v0.1.8

Assets 2

04 Jun 23:17

cskwork

v0.1.7

c569153

v0.1.7 - QA-ONLY mode

v0.1.7 — QA-ONLY mode

A sixth mode for when you want QA, data verification, or a data comparison and explicitly no code change. Trigger it with "QA only", "검증만", "데이터 정합성 / 데이터 비교", "API 수정 전후 확인", or "A/B".

It exercises an already-running app (and a read-only, DB-independent database) like a user, writes no production code, creates no worktree, and runs none of the implementation gates. It produces a human-friendly report.md (what worked / what didn't / what it discovered) and persists a reusable, indexed QA suite under .domain-agent/qa/ so the same check re-runs fast.

Highlights

Pipeline: Intake → Target & Access → Scenario checkpoint → Exercise → Cross-check → Report → Persist. No worktree, no delivery gate.
Browser driver: agent-browser by default; attach-to-browser (Playwright CLI, attach-to-browser-skill) for authenticated / real logged-in sessions. The agent-browser doctor preflight is always recorded.
Read-only, DB-independent DB access: MySQL / PostgreSQL / SQLite via an intelligence skill if installed (e.g. aidt-mysql-cli, postgres-intelligence), else the raw CLI. SELECT-class only — enforced both in the agent prompt and by a gate SQL scan (per-DB:-line read-only; blocks INSERT/UPDATE/DELETE/MERGE/REPLACE/DDL/GRANT/REVOKE/CALL).
Two isolated subagents: qa-auditor drives the app, db-reader reads the DB — raw rows/PII never enter the browser agent. Conductor-mediated handoff via a small sanitized qa/expected.md (which also serves as the before-after baseline); auth is transient and never written to a file.
Action cap (default 100): each browser interaction and DB query counts as one action; summed across both agents and gate-enforced so a run stays scoped and the user does not get lost in an unbounded crawl.
New terminal gate templates/qa-only-gate.sh: report anchors + qa-gate.sh browser/CLI evidence + action_count <= action_cap + DB read-only backstop.

Wiring & tests

New files: reference/qa-only.md, reference/db-access.md, agents/qa-auditor.md, agents/db-reader.md, templates/qa-report.md, templates/qa-only-gate.sh, tests/qa-only-contract.test.sh. Wired into SKILL / pipeline / qa / experts / domain-context / index template / vault / state.json / README.

Tests: qa-only-contract 56/0; full contract suite 293/0, no regressions. Dogfooded end-to-end against a live Cloudflare-protected site — the mode correctly tried agent-browser, recorded an honest BLOCKED verdict (no fake pass), recommended attach-to-browser as the remediation, and its gate passed on truthful evidence.

Full rationale: docs/changelog/changelog-2026-06-05.md.

Full changelog: v0.1.6...v0.1.7

Assets 2

04 Jun 14:37

cskwork

v0.1.6

02870f6

v0.1.6

Seventh release. Focus: a human-facing onboarding handbook for LEARN-DOMAIN, plus wiring so the saved domain knowledge actually reaches the agents that build/debug/extend code. After the agent learns a codebase, the new Onboard step renders the grounded .domain-agent/ knowledge into one self-contained HTML page for fast human onboarding - what the domain is, key terms, architecture, flows, and the rules that must not break - styled to the Functional UI tier. And the ## Domain Brief is now a first-class dispatch input, so GREENFIELD/DEBUG/LEGACY agents receive it instead of it living only in conductor-facing docs. Plus README/docs surfacing and a repo-root cleanup.

LEARN-DOMAIN Onboard step (new)

New terminal step in the LEARN-DOMAIN pipeline: Intake -> Survey -> Scope checkpoint -> Map -> Deepen -> Ground -> Persist -> **Onboard** -> Freshness.
Renders the grounded pack into one self-contained <knowledgePath>/onboarding.html (default .domain-agent/onboarding.html, gitignored) for humans only. The markdown pack stays the agent's source of truth; the HTML is a derived snapshot, regenerated on every Freshness run.
Seven sections, plain summary first then expert detail in <details>: Orientation (mental model), Key terms (glossary in this repo's meaning), Architecture (systems/contexts/entry points + one inline diagram), Key flows (per flows/*.md), Rules that must not break (invariants with grounding status), Get hands-on (test-map.md commands), and Trust & freshness (verified/unverified legend + lastUpdated).
Grounding discipline carries over: rendered from the pack only (no new facts; never upgrade unverified to verified), and every load-bearing fact shows a visible verified/unverified badge so a reader is never misled.
Functional tier (reference/functional-ui.md), not the Expressive taste tier: one accent + one type/spacing/radius scale, computed WCAG-AA contrast (body AAA), information density, minimal motion honoring prefers-reduced-motion, declared color-scheme with light+dark tokens, no empty decoration.
Self-contained and offline: inline CSS only, no external scripts/fonts/CDN/network requests, inline SVG for diagrams (zero added security surface). This constraint overrides functional-ui's "adopt one named design system" rule, so the baseline is implemented with a small inline token set instead of Fluent/Carbon/shadcn.
LEARN-DOMAIN runs no implementation gates, so the render deliberately does not pull the product Designer's claims.md / QA contrast gate / committee apparatus; the agent self-applies the functional-ui baseline. Dispatch is explore (reads the persisted pack, writes the doc).
New templates/domain-onboarding.html skeleton (token set, the seven sections with source-mapped  markers, ToC, badge legend, <details> progressive disclosure, light/dark). Wired into SKILL.md (pipeline row, LEARN-vs-LEARN-DOMAIN note, template-script table) and reference/learn-domain.md (Step 7 + dispatch row + stop condition). tests/learn-domain-contract.test.sh grows from 17 to 29 assertions.

Domain Brief reaches the dispatched agents (new)

Closes a consumption gap: the saved .domain-agent/ knowledge was described only in conductor-facing docs - no persona and not the locked-prompt template referenced the pack or the Brief, and debugger/explore did not even have it in their READ scope. Domain context could be built and still never reach the agent doing the work.
reference/experts.md: adds a DOMAIN BRIEF slot to the locked-prompt template (routing index only; verify against current code, current code wins; never bulk-read the pack) and a dispatch step that passes the Brief to the architect (GREENFIELD Plan), debugger/tracer (DEBUG Reproduce/Diagnose), and explorer (LEGACY Explore).
agents/explore.md + agents/debugger.md: add the run's ## Domain Brief to READ scope plus a one-line rule to route by it but verify load-bearing facts against current code. agents/architect.md: names the Brief explicitly (was a vague "selected domain/architecture docs"). executor is unchanged - it builds from the frozen plan.md.
Passes the capped, current-code-verified Brief (<=5 files, <=80 lines), never the raw pack path, so the retrieval cap / current-code-wins / no-bulk-read discipline holds. The human onboarding.html is never an agent input. tests/domain-context-contract.test.sh grows from 30 to 37 assertions.

Other

docs: add LEARN-DOMAIN mode + Onboard handbook to README - the Modes table listed only four modes (prose said "three modes") even though LEARN-DOMAIN shipped in v0.1.5; adds the LEARN-DOMAIN row (pipeline ends in Onboard (human handbook)), a usage example, and lists interview + learn-domain in the Layout summary.
chore: keep root to SKILL/README/LICENSE; move DESIGN + preference files - repo root trimmed to SKILL.md, README.md, LICENSE; DESIGN.md -> docs/DESIGN.md, USER_PREFERENCE.template.md and the git-ignored USER_PREFERENCE.md -> learn/.

Verification

Full contract suite: domain-context 37/0, gate-scenarios 100/0, interview-contract 26/0, learn-contract 11/0, learn-domain-contract 29/0, ui-ux 17/0, worktree 17/0 = 237 passed, 0 failed. No regressions. Template confirmed self-contained (a grep for external scripts/links/CDN/url() matches only the in-comment instruction forbidding them).

Included changes

feat: pass the Domain Brief to dispatched agents (close consumption gap)
docs: add LEARN-DOMAIN mode + Onboard handbook to README
feat: complete LEARN-DOMAIN Onboard step (Step 7 spec, Functional tier, tests)
feat: wire LEARN-DOMAIN Onboard step (SKILL spine + handbook template)
chore: keep root to SKILL/README/LICENSE; move DESIGN + preference files

Full changelog: v0.1.5...v0.1.6

Assets 2

04 Jun 13:29

cskwork

v0.1.5

bde1261

v0.1.5

Sixth release. Focus: an ambiguity-gated clarifying interview before plan freeze, a new LEARN-DOMAIN mode that learns a codebase for the agent with execution-grounded verification, distributed-systems DEBUG hardening, user-language output, and Windows (LF) compatibility.

Clarifying interview before plan freeze (new)

reference/interview.md (new): a conditional, ambiguity-gated clarifying interview inserted after context-gathering and before plan freeze for GREENFIELD, DEBUG, and LEGACY (LEARN / LEARN-DOMAIN exempt).
Fires only on genuine ambiguity (multiple plausible interpretations or an unclear load-bearing detail); skips when clear or a cheap code read answers it, and logs the skip.
Code-first: code-answerable questions are resolved by reading current docs/code (reuses plan-grounding.md); only user-only load-bearing choices reach the user.
Capped at 3-5 high-leverage questions, one round, one at a time, each with a recommended answer; drawn from a six-dimension menu (objective, definition-of-done, scope, constraints, environment, safety/reversibility) and selected by information gain.
Hard gate: blocks plan freeze until must-have answers or a user-approved assumption. DEBUG variant presents 3-5 ranked root-cause hypotheses for re-ranking at end of Diagnose (non-blocking; proceeds on its own ranking if AFK).
Backed by a fact-checked deep-research pass (Ambig-SWE/CMU, Active Task Disambiguation/ICLR 2025, SAGE-Agent, Ask-before-Plan, ClarEval). Procedural step recorded in plan.md ## Interview; no new machine gate.
Wired into SKILL.md, reference/pipeline.md, reference/debugging.md; new tests/interview-contract.test.sh (26 assertions).

LEARN-DOMAIN mode (new)

A fifth mode that learns a large/cryptic codebase for the agent and persists a source-grounded .domain-agent/ wiki (distinct from LEARN, which teaches a human).
Pipeline: Intake -> Survey -> Scope checkpoint -> Map -> Deepen -> Ground -> Persist -> Freshness. Writes no production code; its only writes are the knowledge pack plus throwaway sandbox probes.
reference/learn-domain.md (new) encodes six research-backed choices: agentic discovery (not embeddings/RAG), markdown-first persistence (Aider repo-map pattern), bottom-up symbol->repo hierarchy, optional structural index only (cache, never required), balanced retrieval budget, and execution-grounded verification.
New gate templates/learn-grounding-gate.mjs: every populated invariant/flow must carry a Grounding: verified|unverified marker, index.md must name a concrete entry point, and a high-precision secret scan must pass. Facts that cannot be executed are marked unverified, never faked.
Supporting edits to SKILL.md, templates/domain-agent/code-map.md (Aider-style key-symbol signatures), templates/domain-agent/invariants.md, templates/domain-agent/flows/README.md; new tests/learn-domain-contract.test.sh (17 assertions) plus a 7-case grounding-gate scenario.

DEBUG hardening (distributed triage, F->P repro, evidence ledger)

Six senior-engineer debugging disciplines encoded into reference/debugging.md, sourced from Google SRE Book/Workbook, OpenTelemetry, W3C Trace Context, AWS Builders' Library, and SWE-bench-line agent papers (Agentless, SWE-Adept, SWT-Bench):
1. Golden-signal triage + symptom-vs-cause split (chase only definite, imminent causes).
2. Correlation-ID propagation as a precondition for cross-service RCA.
3. Known-good vs known-bad differential.
4. Hypothesis ledger with evidence on both sides ("definite & imminent?"); only a confirmed cause advances.
5. Reproduce-first as a fail-to-pass (F->P) gate; flaky/timing bugs must fail consistently over N runs.
6. Context isolation + minimal-diff checkpointing + breadth-based escalation (single-driver stays the default).
Adds a microservice failure-pattern checklist (cascading overload, retry storms, missing deadline propagation, partial-failure bimodal latency). Supporting edits in reference/pipeline.md and reference/vault.md.

Output language follows the user

Agent-authored prose (vault README.md/brief.md/plan.md/claims.md/verification.md, both Human Feedback briefs, run notes, changelog, LEARN journals, returned summaries) is written in the user's language, defaulting to English when unknown.
Machine-checked anchors, structural keys, code, identifiers, file paths, shell commands, and commit messages stay verbatim English so the gates' literal greps keep matching. Applied in SKILL.md, reference/experts.md, reference/vault.md.

Windows compatibility (LF)

.gitattributes (new) forces LF (eol=lf) on *.sh, *.mjs, *.js, *.md, *.json, and tracked scripts were re-checked-out to LF. A Windows checkout with core.autocrlf=true had materialized the gate/test scripts as CRLF, which bash refused to parse ($'\r': command not found) - the real blocker to running the suite on Windows. Target support is bash-on-Windows (Git Bash or WSL); run the suite under WSL bash.

Other

chore: Require explicit merge-commit for integrations - stricter integration policy (explicit merge commit required); tests/worktree-contract.test.sh re-anchored to the new wording.

Verification

Full suite under WSL: interview-contract 26/0, gate-scenarios 100/0, domain-context 30/0, worktree 17/0, ui-ux 17/0, learn-domain 17/0, learn 11/0 = 218 passed, 0 failed. No regressions.

Included changes

feat: add ambiguity-gated clarifying interview before plan freeze
feat: add LEARN-DOMAIN mode with agentic-discovery wiki and grounding gate
feat: agent-authored docs follow the user's language
feat: harden DEBUG with distributed triage, F->P repro, evidence ledger; add Windows LF support
chore: Require explicit merge-commit for integrations

Full changelog: v0.1.4...v0.1.5

Assets 2

03 Jun 12:45

cskwork

v0.1.4

e5d8249

v0.1.4

Fifth release. Focus: a new Functional UI design tier so dashboards, admin panels, and internal tools get design discipline (not just landing pages), a repo-local Domain Context overlay, LEARN-mode atomic-decomposition teaching, worktree-isolation hardening, and a round of gate hardening that turns four prose-only contract claims into machine-checked gates.

Functional UI design tier (new)

reference/functional-ui.md (new): a lightweight design authority for dashboards, data tables, admin panels, internal tools, settings, wizards, and CRUD forms. Enforces a real design system (Carbon/Fluent/Material/Polaris/Atlaskit/Radix/shadcn), computed WCAG contrast, consistent accent/type/spacing/radius tokens, complete UI states (loading/empty/error/focus/...), density-first dials, and minimal motion - without the marketing-only rules.
reference/ui-ux.md: now a surface -> tier dispatcher. Expressive (landing/marketing) routes to taste-skill-v2; Functional (dashboard/admin) routes to functional-ui. Dashboards were previously skipped entirely, so neither the Designer agent nor the contrast gate ran on them.
agents/designer.md: tier-aware authority. Universal bans (one accent, computed contrast, declared color-scheme, no empty decoration) are marked with (*) and apply to both tiers; Functional adds its own bans (design-system-not-hand-rolled, table empty/loading/error states, density-matches-dial, low motion).

Gate hardening (prose contracts -> machine gates)

Contrast is now gate-enforced. templates/qa-gate.sh detects a UI-tier: line (or a qa/contrast-pairs.json) and runs contrast-gate.mjs, failing QA - and, via the delivery backstop, Deliver - on a missing pair list or any sub-threshold pair. The "contrast blocks Deliver" guarantee was previously prose with no wired gate.
Committee approval + plan freeze enforced at Deliver. templates/delivery-gate.sh now requires a Committee: line in verification.md with architect + security + code-review all APPROVED (no reject/changes-requested), and recomputes a CR-stripped sha256 of plan.md to match state.json.plan_hash (with a RE-PLAN: escape in README.md). Both were declared in the contract but checked nowhere.
Cycle bound. templates/cycle-bound.mjs (new) trips at max_cycles_per_phase (default 5) regardless of error identity - the runaway-retry case the identical-signature circuit breaker never caught.
reference/experts.md: the Designer dispatch row was stale (pointed only at taste-skill-v2, mis-routing Functional surfaces); now tier-routed via ui-ux.md.
tests/ui-ux-contract.test.sh (new): 17-assertion contract test guarding the tier split. Suite is now 167 green (gate-scenarios 74 -> 92).

Domain Context overlay (new)

reference/domain-context.md + templates/domain-agent/: a compact repo-local ## Domain Brief built from a .domain-agent/ knowledge pack, loaded at GREENFIELD Plan, DEBUG Reproduce/Diagnose, and LEGACY Explore. Missing pack prompts for a path and gitignores it.

LEARN mode

Human-to-code bridge, atomic concept decomposition guidance, enforced atom map + process traces in explanations, and naturalized Korean term labels (핵심 용어 / 구성 요소 instead of literal 원자).

Worktree isolation hardening

Verify both base and target branch refs exist in the target repo before git worktree add; ask for corrected names instead of substituting nearby branches.
Retention contract: keep the three most recent completed run worktrees; prune only the oldest repo-managed one, never the active run, original checkout, or manual worktrees.

Plan grounding

Added a standalone plan-grounding pressure test.

Included changes

feat: enforce committee, plan-freeze, contrast, cycle-bound; finish UI/UX tier split
feat: add Functional UI design tier for dashboard/admin surfaces
docs: naturalize LEARN Korean term labels
docs: enforce LEARN atom process traces
fix: Verify branch refs before worktree creation
Update supergoal worktree retention contract
docs: add standalone plan grounding pressure test
Add domain context overlay to supergoal
docs: Add atomic concept decomposition guidance
docs: add human-to-code bridge to learn mode

Full changelog: v0.1.3...v0.1.4

Assets 2

02 Jun 15:20

cskwork

v0.1.3

36de553

v0.1.3

Fourth release. Focus: QA and the "contrast is computed" UI/UX rule are now machine-enforceable literal gates, runs are isolated in a git worktree, and browser QA requires an agent-browser preflight instead of silently falling back to headless Chrome.

QA + UI/UX gates (now machine-checkable)

templates/qa-gate.sh (new): browser apps must produce qa/as-is-* + qa/to-be-* evidence and a ## QA Tool: line; any non-agent-browser driver needs a Fallback: justification, blocking silent headless-Chrome fallback. Wired into the GREENFIELD/LEGACY QA exit gate.
templates/contrast-gate.mjs (new): the agent enumerates text/bg pairs into qa/contrast-pairs.json; the script computes WCAG ratios and judges them (body AAA, other text AA, large >=3) so contrast can't be eyeballed.
templates/delivery-gate.sh: if the vault has a qa/ dir, delivery re-runs qa-gate.sh as a final backstop (CLI/DEBUG runs have no qa/ dir, unaffected).
Gate test suite grew 51 -> 73 cases, all green.

Browser QA preflight

The agent-browser-first sequence is now two steps (npm i -g + agent-browser install for the Chrome binary).
A missing browser binary is explicitly NOT "install impossible"; browser QA is gated on the preflight succeeding rather than falling back silently.

Worktree isolation

supergoal runs are now required to execute inside an isolated git worktree.

Docs

Compressed supergoal skill references and refreshed the landing page.

Included changes

feat: make QA + UI/UX contrast machine-enforceable gates
fix: require agent-browser CLI preflight for QA
fix: gate browser QA on agent-browser preflight
Require worktree isolation for supergoal runs
docs: compress supergoal skill references
docs: refresh supergoal landing page

Full changelog: v0.1.2...v0.1.3

Assets 2

Releases: cskwork/supergoal-skill

v0.3.0 - REVIEW-ONLY mode

v0.3.0 - REVIEW-ONLY mode, lean-loop coherence, leaner domain-context

New: REVIEW-ONLY mode (9th mode)

Changed: reference/agents aligned with the lean loop

Changed: reference/domain-context.md compressed 255 -> 142 lines (-44%)

Verification

Uh oh!

v0.2.2

Run-vault doc layout: month-grouped + surfaced-requirements unified

Uh oh!

v0.2.1

Changed

Verified

Uh oh!

v0.2.0

New

UI/UX: polished by default

Harness-eval methodology (hardened)

Changed / removed

Uh oh!

v0.1.8 - SKILL-MINE mode

v0.1.8 — SKILL-MINE mode

Highlights

Wiring & tests

Uh oh!

v0.1.7 - QA-ONLY mode

v0.1.7 — QA-ONLY mode

Highlights

Wiring & tests

Uh oh!

v0.1.6

LEARN-DOMAIN Onboard step (new)

Domain Brief reaches the dispatched agents (new)

Other

Verification

Included changes

Uh oh!

v0.1.5

Clarifying interview before plan freeze (new)

LEARN-DOMAIN mode (new)

DEBUG hardening (distributed triage, F->P repro, evidence ledger)

Output language follows the user

Windows compatibility (LF)

Other

Verification

Included changes

Uh oh!

v0.1.4

Functional UI design tier (new)

Gate hardening (prose contracts -> machine gates)

Domain Context overlay (new)

LEARN mode

Worktree isolation hardening

Plan grounding

Included changes

Uh oh!

v0.1.3

QA + UI/UX gates (now machine-checkable)

Browser QA preflight

Worktree isolation

Docs

Included changes

Uh oh!