Skip to content

v0.1.5

Choose a tag to compare

@cskwork cskwork released this 04 Jun 13:29
· 55 commits to main since this release

Sixth release. Focus: an ambiguity-gated clarifying interview before plan freeze, a new LEARN-DOMAIN mode that learns a codebase for the agent with execution-grounded verification, distributed-systems DEBUG hardening, user-language output, and Windows (LF) compatibility.

Clarifying interview before plan freeze (new)

  • reference/interview.md (new): a conditional, ambiguity-gated clarifying interview inserted after context-gathering and before plan freeze for GREENFIELD, DEBUG, and LEGACY (LEARN / LEARN-DOMAIN exempt).
  • Fires only on genuine ambiguity (multiple plausible interpretations or an unclear load-bearing detail); skips when clear or a cheap code read answers it, and logs the skip.
  • Code-first: code-answerable questions are resolved by reading current docs/code (reuses plan-grounding.md); only user-only load-bearing choices reach the user.
  • Capped at 3-5 high-leverage questions, one round, one at a time, each with a recommended answer; drawn from a six-dimension menu (objective, definition-of-done, scope, constraints, environment, safety/reversibility) and selected by information gain.
  • Hard gate: blocks plan freeze until must-have answers or a user-approved assumption. DEBUG variant presents 3-5 ranked root-cause hypotheses for re-ranking at end of Diagnose (non-blocking; proceeds on its own ranking if AFK).
  • Backed by a fact-checked deep-research pass (Ambig-SWE/CMU, Active Task Disambiguation/ICLR 2025, SAGE-Agent, Ask-before-Plan, ClarEval). Procedural step recorded in plan.md ## Interview; no new machine gate.
  • Wired into SKILL.md, reference/pipeline.md, reference/debugging.md; new tests/interview-contract.test.sh (26 assertions).

LEARN-DOMAIN mode (new)

  • A fifth mode that learns a large/cryptic codebase for the agent and persists a source-grounded .domain-agent/ wiki (distinct from LEARN, which teaches a human).
  • Pipeline: Intake -> Survey -> Scope checkpoint -> Map -> Deepen -> Ground -> Persist -> Freshness. Writes no production code; its only writes are the knowledge pack plus throwaway sandbox probes.
  • reference/learn-domain.md (new) encodes six research-backed choices: agentic discovery (not embeddings/RAG), markdown-first persistence (Aider repo-map pattern), bottom-up symbol->repo hierarchy, optional structural index only (cache, never required), balanced retrieval budget, and execution-grounded verification.
  • New gate templates/learn-grounding-gate.mjs: every populated invariant/flow must carry a Grounding: verified|unverified marker, index.md must name a concrete entry point, and a high-precision secret scan must pass. Facts that cannot be executed are marked unverified, never faked.
  • Supporting edits to SKILL.md, templates/domain-agent/code-map.md (Aider-style key-symbol signatures), templates/domain-agent/invariants.md, templates/domain-agent/flows/README.md; new tests/learn-domain-contract.test.sh (17 assertions) plus a 7-case grounding-gate scenario.

DEBUG hardening (distributed triage, F->P repro, evidence ledger)

  • Six senior-engineer debugging disciplines encoded into reference/debugging.md, sourced from Google SRE Book/Workbook, OpenTelemetry, W3C Trace Context, AWS Builders' Library, and SWE-bench-line agent papers (Agentless, SWE-Adept, SWT-Bench):
    1. Golden-signal triage + symptom-vs-cause split (chase only definite, imminent causes).
    2. Correlation-ID propagation as a precondition for cross-service RCA.
    3. Known-good vs known-bad differential.
    4. Hypothesis ledger with evidence on both sides ("definite & imminent?"); only a confirmed cause advances.
    5. Reproduce-first as a fail-to-pass (F->P) gate; flaky/timing bugs must fail consistently over N runs.
    6. Context isolation + minimal-diff checkpointing + breadth-based escalation (single-driver stays the default).
  • Adds a microservice failure-pattern checklist (cascading overload, retry storms, missing deadline propagation, partial-failure bimodal latency). Supporting edits in reference/pipeline.md and reference/vault.md.

Output language follows the user

  • Agent-authored prose (vault README.md/brief.md/plan.md/claims.md/verification.md, both Human Feedback briefs, run notes, changelog, LEARN journals, returned summaries) is written in the user's language, defaulting to English when unknown.
  • Machine-checked anchors, structural keys, code, identifiers, file paths, shell commands, and commit messages stay verbatim English so the gates' literal greps keep matching. Applied in SKILL.md, reference/experts.md, reference/vault.md.

Windows compatibility (LF)

  • .gitattributes (new) forces LF (eol=lf) on *.sh, *.mjs, *.js, *.md, *.json, and tracked scripts were re-checked-out to LF. A Windows checkout with core.autocrlf=true had materialized the gate/test scripts as CRLF, which bash refused to parse ($'\r': command not found) - the real blocker to running the suite on Windows. Target support is bash-on-Windows (Git Bash or WSL); run the suite under WSL bash.

Other

  • chore: Require explicit merge-commit for integrations - stricter integration policy (explicit merge commit required); tests/worktree-contract.test.sh re-anchored to the new wording.

Verification

Full suite under WSL: interview-contract 26/0, gate-scenarios 100/0, domain-context 30/0, worktree 17/0, ui-ux 17/0, learn-domain 17/0, learn 11/0 = 218 passed, 0 failed. No regressions.

Included changes

  • feat: add ambiguity-gated clarifying interview before plan freeze
  • feat: add LEARN-DOMAIN mode with agentic-discovery wiki and grounding gate
  • feat: agent-authored docs follow the user's language
  • feat: harden DEBUG with distributed triage, F->P repro, evidence ledger; add Windows LF support
  • chore: Require explicit merge-commit for integrations

Full changelog: v0.1.4...v0.1.5