feat(coder): package skeleton + loop.py with DEFAULT_LOOP #819

Merged
kovtcharov merged 3 commits into coder from feature/gaia-coder-scaffold on Apr 20, 2026

Conversation

@kovtcharov
Collaborator

Summary

Phase 1 scaffolding for gaia-coder per docs/plans/coder-agent.mdx. Creates the package structure every follow-up task imports from — no runtime behaviour, no LLM calls, no network, no SQLite. Just the load-bearing shapes: her own base class, the editable 20-state ReAct loop, the CLI surface, and the three living documents the spec anchors on.

This is the trunk the two sibling branches (feature/gaia-coder-stores, feature/gaia-coder-mixins-1) rebase onto once it lands.

Threads

  • Package skeleton (§3.1): src/gaia/coder/ with sub-packages for prompts/, stores/, tools/, review/, introspect/, self_fix/, tests/, skills/. Every __init__.py is a stub pointing at the phase that fills it in. Matters because downstream tasks need stable import paths now without racing on who creates which directory.
  • CLI stub (§3.1): gaia-coder console script wired through setup.py. All 17 subcommands from the spec (daemon, status, ask, note, critical, inbox, feedback, promote, demote, trust, audit, spend, egress, introspect, skill, doctor, rag) print "<name>: not yet implemented" and exit 0. Matters so docs, tests, and muscle memory can start using the real surface immediately. Uses argparse (matching existing gaia CLI) rather than click, which is not a project dependency.
  • State machine (§5.1, §15.3): src/gaia/coder/loop.py with immutable Transition, State, Loop dataclasses and DEFAULT_LOOP containing all 20 states grouped into the seven stages. The self_review state carries the updated three-way transition (publish | debug | edit based on failure_is_complex) from §15.3: shallow failures go straight back to edit, complex ones enter the dedicated debug sub-loop. Matters because a typo here silently regresses her review discipline.
  • Base class (§5.1): CoderAgent that does NOT inherit from gaia.agents.base.Agent. Product-agent assumptions (request-scoped, single LLM round-trip) do not match her daemon-scoped lifecycle with durable queues and editable control flow. Composition from the GAIA base is still allowed at use-site (@tool, AgentConsole, PathValidator). Matters so sibling tasks can from gaia.coder import CoderAgent today.
  • Living documents (§4.6, §6.5): GAIA.md (identity: principles, persona, Karpathy working-style rules), ARCHITECTURE.md (how she is composed), PROJECT_MAP.md (how the project she's building is composed), plus skills/catalog.toml seed. All short Phase-1 placeholders; real content lands in Phase 3/5. Matters because these documents are referenced from every system prompt, so they need to exist on disk before the runner does.
  • Introspection seed (§7.7): introspect_state_machine() in loop.py returns a JSON snapshot + stateDiagram-v2 Mermaid render of the loop. The Phase 3 IntrospectionToolsMixin wraps this as the public @tool; keeping the implementation next to the loop it describes avoids duplication.
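
For readers without the diff open, the state-machine and introspection threads above could look roughly like the sketch below. It is not the merged loop.py: the field names (target, condition, stage), the four-state MINI_LOOP excerpt, and the choice to return the snapshot as a JSON string are all assumptions made for illustration. Only the class names, the three self_review targets, the "json" / "mermaid" payload keys, and the stateDiagram-v2 dialect come from this PR.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Transition:
    target: str          # name of the destination state
    condition: str = ""  # human-readable guard; empty means unconditional


@dataclass(frozen=True)
class State:
    name: str
    stage: str
    transitions: tuple[Transition, ...] = ()


@dataclass(frozen=True)
class Loop:
    version: int
    states: tuple[State, ...]


# Hypothetical four-state excerpt; the real DEFAULT_LOOP has 20 states.
MINI_LOOP = Loop(
    version=1,
    states=(
        State("edit", "Build", (Transition("self_review"),)),
        State(
            "self_review",
            "Verify",
            (
                Transition("publish", "review passed"),
                Transition("debug", "failed and failure_is_complex"),
                Transition("edit", "failed and not failure_is_complex"),
            ),
        ),
        State("debug", "Verify", (Transition("edit"),)),
        State("publish", "Publish"),
    ),
)


def introspect_state_machine(loop: Loop = MINI_LOOP) -> dict[str, str]:
    """Return a JSON snapshot and a stateDiagram-v2 Mermaid render."""
    lines = ["stateDiagram-v2"]
    for state in loop.states:
        for t in state.transitions:
            label = f": {t.condition}" if t.condition else ""
            lines.append(f"    {state.name} --> {t.target}{label}")
    return {"json": json.dumps(asdict(loop)), "mermaid": "\n".join(lines)}
```

Because every dataclass is frozen and the containers are tuples, the only way to change the graph in this sketch is to edit the literal, which is the property the PR relies on.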

Deviations from spec

  • Spec says "wired via click (already a project dep)" — click is not actually in setup.py or any extras. Switched to argparse to match the existing src/gaia/cli.py convention and avoid adding a dependency. The subcommand surface is unchanged.
  • Spec suggests 4 commits; this PR has 3. Splitting loop.py across two commits (dataclasses in b, DEFAULT_LOOP + introspection helper in c) required temporarily deleting and re-adding content, which felt worse than a clean "state machine + base class" commit. Commits are: (1) scaffold + CLI, (2) loop.py + base.py + __init__.py re-exports, (3) tests.
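
As a rough illustration of the argparse-based stub surface described above (not the merged cli.py), the sketch below registers a few subcommands and has each one print the "not yet implemented" message and return 0. The help strings, the build_parser name, and the decision to make a subcommand mandatory are assumptions; only three of the 17 subcommand names are shown.

```python
import argparse
from typing import Optional, Sequence

# Three of the 17 subcommand names from the PR; the help strings are made up.
SUBCOMMANDS = (
    ("daemon", "run the long-lived agent process"),
    ("status", "show current task and queue depth"),
    ("doctor", "check the local installation"),
)


def build_parser(
    subcommands: Sequence[tuple[str, str]] = SUBCOMMANDS,
) -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="gaia-coder")
    sub = parser.add_subparsers(dest="command", required=True)
    for name, help_text in subcommands:
        sub.add_parser(name, help=help_text)
    return parser


def main(argv: Optional[Sequence[str]] = None) -> int:
    args = build_parser().parse_args(argv)
    # Phase 1 behaviour: every subcommand is a stub that succeeds.
    print(f"{args.command}: not yet implemented")
    return 0
```

Calling main(["status"]) prints the stub message and returns 0; an unknown subcommand is rejected by argparse itself before any handler runs.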

Test plan

  • pytest tests/coder/test_skeleton.py -xvs — 5/5 pass, including the guard against regressing self_review's three-way transition.
  • python util/lint.py --black --isort --flake8 on the new files — all clean (project flake8 uses --max-line-length=88 with E501 ignored per util/lint.py).
  • uv pip install -e . — succeeds; gaia-coder --help prints the 17-subcommand list; stub subcommands exit 0 (gaia-coder introspect, gaia-coder status, etc.).
  • git diff feature/coding-agent...HEAD --stat — changes only under src/gaia/coder/, tests/coder/, and setup.py. No edits to src/gaia/agents/ or other existing modules.
  • Existing gaia-code entry (legacy Next.js scaffolder) untouched per §1.

Phase 1 scaffolding per docs/plans/coder-agent.mdx §3.1. Creates the
package skeleton every downstream task imports from: subpackages for
prompts, stores, tools, review, introspect, self_fix, tests, and skills;
the three living documents (GAIA.md, ARCHITECTURE.md, PROJECT_MAP.md) as
short placeholders; the skills catalog.toml seed; and the gaia-coder
console script backed by an argparse-based CLI whose 17 subcommands
(§3.1) are all Phase 1 stubs that print "not yet implemented" and
exit 0.

setup.py registers the new packages and the gaia-coder entry point; the
existing gaia-code entry (legacy Next.js scaffolder) is untouched per §1.

State machine (loop.py) and base class (base.py) land in follow-up
commits on this branch; sibling branches feature/gaia-coder-stores and
feature/gaia-coder-mixins-1 will fill in stores/ and tools/ after this
lands.

Adds the canonical ReAct control flow for gaia-coder per
docs/plans/coder-agent.mdx §5.1 and §15.3:

- src/gaia/coder/loop.py defines the immutable Transition / State / Loop
  dataclasses and ships DEFAULT_LOOP — 20 states grouped into the seven
  stages (Intake, Understand, Design, Build, Verify, Publish, Land).
  The self_review state emits the updated three-way transition
  (publish | debug | edit based on failure_is_complex) from §15.3 so
  shallow failures go straight back to edit while complex failures
  enter the dedicated debug sub-loop.
- introspect_state_machine() renders the loop as a JSON snapshot plus
  a stateDiagram-v2 Mermaid string — the Phase 3 IntrospectionToolsMixin
  (§7.7) will wrap this as the public tool.
- src/gaia/coder/base.py defines CoderAgent: her own base class that
  does NOT inherit from gaia.agents.base.Agent, since product-agent
  assumptions (request-scoped, single LLM round-trip) do not match her
  daemon-scoped lifecycle. Composition from the GAIA base (@tool,
  AgentConsole, PathValidator) is still allowed at use-site.

The top-level package re-exports CoderAgent, DEFAULT_LOOP, Loop, State,
and Transition so sibling branches have a stable public surface to
import from.

Five tests pin the public surface sibling branches rely on:

- gaia-coder --help runs and exits 0 with a subcommand list.
- `from gaia.coder import CoderAgent` and `from gaia.coder.loop import
  DEFAULT_LOOP` both succeed.
- DEFAULT_LOOP has exactly 20 states (§15.3).
- The self_review state has exactly three transitions targeting
  publish / debug / edit — the updated three-way transition from §15.3
  that a typo or edit to loop.py would regress silently.
- introspect_state_machine() returns a Mermaid render; the test accepts
  either the `graph TD` or `stateDiagram` dialect since §7.7 does not
  pin the choice.

CLI invocation uses `python -m gaia.coder.cli` so the test runs whether
or not the gaia-coder console script has been installed — same
invariant either route exercises.
@github-actions (bot) added the dependencies and tests labels on Apr 20, 2026
@kovtcharov changed the base branch from feature/coding-agent to coder on April 20, 2026 08:23
@kovtcharov changed the base branch from coder to main on April 20, 2026 08:25
@kovtcharov changed the base branch from main to coder on April 20, 2026 08:25
@kovtcharov marked this pull request as ready for review on April 20, 2026 09:20
@kovtcharov merged commit 2370e4b into coder on Apr 20, 2026
7 checks passed
@kovtcharov deleted the feature/gaia-coder-scaffold branch on April 20, 2026 09:20
@github-actions
Contributor

Review — feat(coder): package skeleton + loop.py with DEFAULT_LOOP (#819)

Note: this PR is already merged. Posting a retrospective review so the follow-up sibling branches (feature/gaia-coder-stores, feature/gaia-coder-mixins-1) can pick up the threads.

Summary

Clean, well-scoped Phase 1 scaffolding. Introduces a new src/gaia/coder/ package that deliberately sits outside the src/gaia/agents/ product-agent ecosystem, adds a gaia-coder CLI stub whose 17 subcommands all exit 0, and pins the §15.3 loop shape (20 states, three-way self_review branch) with a real test. The single most important thing to know: every docstring, placeholder document, and the PR body itself cite docs/plans/coder-agent.mdx, but that file has never existed in the repo (git log --all -- docs/plans/coder-agent.mdx returns nothing). Readers following the breadcrumbs will hit dead ends until the spec is committed.

Issues

🟡 Important

Referenced spec docs/plans/coder-agent.mdx does not exist — in pr-diff.txt I count 19 references to docs/plans/coder-agent.mdx across __init__.py (src/gaia/coder/__init__.py:13), base.py (src/gaia/coder/base.py:15,58,67), cli.py (src/gaia/coder/cli.py:101), loop.py (src/gaia/coder/loop.py:6), and all three living documents (GAIA.md:5, ARCHITECTURE.md:3, PROJECT_MAP.md:3). ls docs/plans/ confirms every neighbour exists (agent-hub.mdx, autonomy-engine.mdx, skill-format.mdx, …) except coder-agent.mdx. Every §X.Y citation in the scaffolding currently points at vapour. Either land the spec in a follow-up PR referenced from here, or relax the docstrings to say "spec to land in a follow-up" so first-time readers aren't chasing a ghost.

docs/docs.json and docs/reference/cli.mdx not updated for the new gaia-coder console script. CLAUDE.md is unambiguous: "New CLI commands → update docs/reference/cli.mdx" and "New pages must be added to docs/docs.json navigation". The PR adds a new user-facing binary (setup.py:216 adds gaia-coder = gaia.coder.cli:main) but skips both docs touchpoints. Phase 1 is a reasonable time to defer this since every subcommand is a stub, but an explicit note in the PR body ("docs land with Phase 2 when subcommands do real work") would have made the deferral legible. Please capture in a follow-up issue so it isn't lost.

🟢 Minor

test_introspect_state_machine_returns_mermaid has a vestigial graph TD branch (tests/coder/test_skeleton.py:99). The assertion is:

assert "graph TD" in mermaid or "stateDiagram" in mermaid

but loop.py:_mermaid (line 118) only ever emits stateDiagram-v2. The "graph TD" arm is dead. That's fine as forward-looking flexibility, but the docstring at test_skeleton.py:92–97 implies both dialects are "accepted", which misleads future readers into thinking the implementation sometimes produces graph TD. Either tighten the assertion or drop the "graph TD" half of the docstring:

def test_introspect_state_machine_returns_mermaid() -> None:
    """The introspection helper must return a Mermaid stateDiagram render.

    §7.7 specifies the tool renders both JSON and a Mermaid diagram for
    the EM's inspection. The current implementation uses
    ``stateDiagram-v2`` so the seven-stage grouping renders as nested
    blocks; this test pins that dialect.
    """
    from gaia.coder.loop import introspect_state_machine

    payload = introspect_state_machine()

    assert isinstance(payload, dict)
    assert "json" in payload
    assert "mermaid" in payload

    mermaid = payload["mermaid"]
    assert isinstance(mermaid, str)
    assert "stateDiagram" in mermaid, (
        f"Mermaid render should use the `stateDiagram` dialect; "
        f"got:\n{mermaid[:200]}"
    )

Optional[X] where X | None would be more idiomatic (src/gaia/coder/base.py:275,316, src/gaia/coder/loop.py:656,715,716). setup.py:219 pins python_requires=">=3.10", so PEP 604 union syntax is always available. The PR already uses it in the CLI (cli.py:149, argv: list[str] | None = None), so the inconsistency is within the PR itself. Not blocking, since the repo has mixed usage, but worth normalising when the real bodies land.

gaia-coder not listed in the CLAUDE.md "Console Script Entry Points" table (repo-level CLAUDE.md, Project Structure section). That table lists gaia, gaia-mcp, gaia-code, gaia-emr but not the new binary. Adding it keeps the repo-level doc self-describing and prevents new contributors from asking "what's gaia-coder vs gaia-code?".

_build_parser's subcommands parameter type is Iterable[tuple[str, str]] but it's iterated only once (src/gaia/coder/cli.py:94). Harmless for the default call, but if a test ever passes a generator and then calls _build_parser a second time, the generator is exhausted. Narrow to Sequence[tuple[str, str]] to make re-iteration safe, or document the constraint.
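
The exhaustion hazard is easy to reproduce in isolation. The snippet below is a minimal sketch that stands in for _build_parser with a hypothetical helper, names_from, which simply walks its argument once; the subcommand pairs are illustrative.

```python
from typing import Iterable


def names_from(subcommands: Iterable[tuple[str, str]]) -> list[str]:
    # Stand-in for a function that iterates its argument exactly once.
    return [name for name, _help in subcommands]


pairs = (("status", "show status"), ("doctor", "run checks"))

gen = (pair for pair in pairs)
first_from_gen = names_from(gen)
second_from_gen = names_from(gen)  # the generator is already exhausted here

first_from_tuple = names_from(pairs)
second_from_tuple = names_from(pairs)  # a tuple can be walked again safely
```

The second generator-backed call silently yields an empty list, which for a parser builder would mean a CLI with no subcommands and no error. Annotating the parameter as Sequence lets a type checker reject the generator at the call site.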

Strengths

  • Immutable graph modelling is correct for the stated use case. @dataclass(frozen=True) on Transition/State/Loop plus the "an edit means a diff to this file, not a runtime mutation API" comment (loop.py:43–46) is the right stance — any future runtime that tries to mutate the graph will fail loudly, which matches the §7.8 invariant that every merged edit bumps Loop.version.
  • Test pins the load-bearing invariants, not implementation details. test_default_loop_has_20_states and test_self_review_has_three_transitions would both fail if someone silently regressed §15.3. That's high-signal coverage for a skeleton PR.
  • The "not inheriting from gaia.agents.base.Agent" decision is justified in-place (base.py:5–12). The docstring names both what's different (daemon lifecycle, durable queues, editable graph) and what composition is still welcome (@tool, AgentConsole, PathValidator). Future readers won't have to go spelunking for the rationale.
  • Fail-loudly is honoured where it matters. Loop.state_by_name (loop.py:159) raises KeyError with the known state list instead of returning None; identity_document (base.py:170) lets FileNotFoundError propagate with a docstring explaining why absence is a bug not a fallback. Matches repo CLAUDE.md.
  • Scope discipline is clean. Only new files under src/gaia/coder/ + tests/coder/, plus the setup.py package/entry-point additions. No drive-by edits to src/gaia/agents/ or other modules.
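
A compact illustration of the first and fourth strengths, assuming a much-reduced Loop with only a name per state (the merged classes carry more fields, and the exact error message wording here is invented):

```python
import dataclasses
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    name: str


@dataclass(frozen=True)
class Loop:
    version: int
    states: tuple[State, ...]

    def state_by_name(self, name: str) -> State:
        for state in self.states:
            if state.name == name:
                return state
        # Fail loudly and list what is known instead of returning None.
        known = ", ".join(s.name for s in self.states)
        raise KeyError(f"unknown state {name!r}; known states: {known}")


loop = Loop(version=1, states=(State("edit"), State("self_review")))

frozen_error = None
try:
    loop.version = 2  # frozen dataclasses reject attribute assignment
except dataclasses.FrozenInstanceError as exc:
    frozen_error = exc

# An "edit" produces a new object with a bumped version instead.
bumped = dataclasses.replace(loop, version=loop.version + 1)
```

Runtime mutation raises FrozenInstanceError, while dataclasses.replace gives a new Loop and leaves the original untouched.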

Verdict

Approve with suggestions (retrospective — PR is already merged). No blocking issues. The missing docs/plans/coder-agent.mdx is the one thing that genuinely warrants a follow-up: land it (or retarget the references) before the next scaffolding PR, because every sibling branch will inherit the same broken-citation pattern otherwise.

kovtcharov added a commit that referenced this pull request Apr 20, 2026
Three review-followups from the #818/#819/#820 merge:

- src/gaia/coder/__init__.py: restore the CoderAgent/DEFAULT_LOOP/Loop/State/Transition
  re-exports that #819 added and the #818 rebase accidentally dropped. test_package_imports
  in test_skeleton.py relies on these.

- src/gaia/coder/tools/cli.py: replace the bare except Exception: pass in the stream-reader
  teardown with a targeted OSError catch + logger.debug. Per CLAUDE.md's fail-loudly rule,
  silent swallows hide reader-thread bugs; a stream close failing under pipe tear-down is
  real and worth a debug line.

- src/gaia/coder/tools/search.py: stop reaching into the private _TOOL_REGISTRY dict from
  grep(). Use get_tool_metadata() — the public accessor added in base/tools.py:113 exists
  precisely for this.

Tests: all 73 (5 skeleton + 38 mixins + 30 stores) pass on coder HEAD with the fixes.
kovtcharov added a commit that referenced this pull request Apr 20, 2026
#823)

## Summary

Three review-followups from the #818 / #819 / #820 merge, flagged by the
auto-review bot. All tests (73/73 on `coder`) pass with the fixes.

## What this changes

- **Critical** — `src/gaia/coder/__init__.py`: restore the `CoderAgent`
/ `DEFAULT_LOOP` / `Loop` / `State` / `Transition` re-exports that #819
added and #818's rebase accidentally dropped. Without this,
`tests/coder/test_skeleton.py::test_package_imports` fails on `coder`
HEAD.

- **Important** — `src/gaia/coder/tools/cli.py:135`: replace bare
`except Exception: pass` in the stream-reader teardown with a targeted
`OSError` catch + `logger.debug`. CLAUDE.md's *No Silent Fallbacks* rule
explicitly forbids this pattern.

- **Important** — `src/gaia/coder/tools/search.py:59`: stop reaching
into the private `_TOOL_REGISTRY` from `grep()`. Use
`get_tool_metadata()` — the public accessor at
`src/gaia/agents/base/tools.py:113` exists for this.
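
The shape of the second fix, sketched under assumptions: `close_stream` is a hypothetical helper name and the logger name is a guess, since the actual teardown code at `cli.py:135` is not quoted here. The point is the narrowing from a bare `except Exception: pass` to an `OSError` catch that leaves a debug line.

```python
import logging

# Logger name is an assumption for this sketch.
logger = logging.getLogger("gaia.coder.tools.cli")


def close_stream(stream) -> None:
    """Close a reader stream without hiding unexpected failures."""
    try:
        stream.close()
    except OSError as exc:
        # Plausible while a pipe is being torn down; record it and move on.
        logger.debug("stream close failed during teardown: %s", exc)
    # Any other exception type propagates to the caller.
```

`BrokenPipeError` is a subclass of `OSError`, so the usual pipe tear-down failure is logged, while a `ValueError` or `AttributeError` from a reader-thread bug still surfaces.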

## Test plan

- [x] `pytest tests/coder/ -x` — 73/73 pass (5 skeleton + 38 mixins + 30
stores)
- [x] `python -c "from gaia.coder import CoderAgent, DEFAULT_LOOP, Loop,
State, Transition"` — all imports resolve
kovtcharov added a commit that referenced this pull request Apr 20, 2026
Addresses the auto-review findings on #819 and #820:

**Land the spec.** docs/plans/coder-agent.mdx was being referenced from 19 places in
the scaffold (every module docstring, every living doc, every CLI body) but the file
itself was untracked — every citation pointed at vapour. Commit it now so Phase 2+
branches inherit real citations. Also lands docs/superpowers/specs/2026-04-19-gaia-code-agent-analysis.md
as the retained historical context per the plan's §14.

**SQL identifier hardening** (src/gaia/coder/stores/_common.py). The CRUD primitives
string-interpolated table names, column names, and ORDER BY clauses. SQLite has no
built-in parameterisation for identifiers, so whitelist matching is the only safe
path. Adds _safe_ident() + _SAFE_IDENT_RE and applies it at every interpolation site,
plus multi-clause ORDER BY validation (split by comma, each clause validated for
optional ASC/DESC direction). Auto-review flagged this as a latent injection surface;
treating it as a latent vuln, not hypothetical.

**Narrow the conftest catch** (tests/coder/conftest.py). The bare `except Exception`
during stub fallback was masking import-time bugs in stores (AttributeError,
SyntaxError). Now catches only ImportError/ModuleNotFoundError and logs the
underlying reason so real failures surface in CI rather than silently stubbing over.
Scaffold is now landed so step-1 (real import) always succeeds — this is a safety net,
not a hot path.
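
The narrowed fallback can be pictured as below. This is a sketch, not the conftest: `import_or_stub` and `stub_factory` are hypothetical names, and the real fixture wiring is not shown.

```python
import importlib
import logging

logger = logging.getLogger(__name__)


def import_or_stub(module_name: str, stub_factory):
    """Import a module, falling back to a stub only when it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:  # ModuleNotFoundError is a subclass
        logger.warning("falling back to stub for %s: %s", module_name, exc)
        return stub_factory()
    # AttributeError, SyntaxError, etc. raised at import time propagate.
```

Logging the underlying reason matters because an `ImportError` raised by a broken transitive import inside the module would also take the fallback path; the log line is what makes that visible in CI.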

**Register gaia-coder in CLAUDE.md + docs.json**. The new console_scripts entry
was missing from the Console Script Entry Points table (§Project Structure) and from
the "Standalone binaries" section. Also adds plans/coder-agent to docs.json under
the Agents group so Mintlify renders it.

All 73 coder tests pass (5 skeleton + 38 mixins + 30 stores).
kovtcharov added a commit that referenced this pull request Apr 20, 2026
## Summary

Addresses the auto-review findings on #819 + #820 and lands the plan
that every scaffold citation was pointing at.

## Changes

- **Land `docs/plans/coder-agent.mdx`** (3,584 lines). 19 scaffold
citations were pointing at a file that didn't exist in tracked state.
Also lands
`docs/superpowers/specs/2026-04-19-gaia-code-agent-analysis.md` as the
retained historical context.
- **SQL identifier hardening** (`src/gaia/coder/stores/_common.py`).
`_safe_ident()` + `_SAFE_IDENT_RE` validation at every interpolation
site (table, column, ORDER BY). Multi-clause ORDER BY supported.
- **Narrow conftest catch** (`tests/coder/conftest.py`). Only catches
`ImportError` / `ModuleNotFoundError`; logs the reason. Real import-time
bugs in stores now surface.
- **Register `gaia-coder`** in `CLAUDE.md` Console Script Entry Points +
Standalone binaries, and add `plans/coder-agent` to `docs/docs.json`
under Agents.

## Test plan

- [x] `pytest tests/coder/` — 73/73 pass (5 skeleton + 38 mixins + 30
stores)
- [x] `docs/plans/coder-agent.mdx` resolves for every scaffold docstring
citation
- [x] SQL injection guard: `_safe_ident("x; DROP TABLE")` raises
`ValueError`
kovtcharov added a commit that referenced this pull request Apr 20, 2026
## Summary

Phase 4 of the `gaia-coder` plan: the seven-pass self-review gate that
runs before she ever calls `gh_pr_create`. Implements the full §8 table
— deterministic checks (lint, tests, security, prose) plus LLM-driven
passes (architectural, persona, adversarial, feedback-binding) — and
surfaces an aggregated verdict with the confidence score the §7.6
auto-merge path reads.

This is the "deep-review discipline" the trust contract is built on.
Every PR she opens must clear this gate; self-fix PRs additionally
require Pass 7 (feedback-binding), which confirms the diff actually
addresses the EM's wording and that the regression test fails on `coder`
and passes on the branch.

## What this PR adds

- **Seven review passes** (`pass_1_static` through
`pass_7_feedback_binding`) — each a focused module that returns a common
`PassResult` envelope, so the gate never has to special-case any of
them.
- **`gate.run_all_passes`** — orchestrator with cost-aware
short-circuiting: a hard-fail on the deterministic cheap gates (1, 2, 4)
skips the expensive Opus calls, because there is no point paying for
adversarial review on a branch that fails lint.
- **`ReviewToolsMixin`** — exposes every pass as a `@tool`
(`review_diff_self_static`, …) plus the one-shot `review_diff_gate`, so
the agent, the EM's TUI, and the evaluation harness all share one code
path.
- **Canonical prompt files** for Passes 3, 5, 6, 7 under
`src/gaia/coder/prompts/`, materialised verbatim from §15.8 — self-fix
PRs that touch a prompt are now a single-file `git diff`, the whole
point of the §6.4 whitelist.
- **22 unit tests** that cover every pass (one pass + one fail per
module), the gate's short-circuit behaviour, the self-fix gating of Pass
7, the confidence-score contract, and a prompt-files-exist guard so the
canonical templates can't silently drift.
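
The cost-aware short-circuit can be sketched as follows. Everything except the names `PassResult` and `run_all_passes` is an assumption: the `ReviewPass` wrapper, the `status` strings, and the `uses_llm` flag are illustrative, and the real envelope likely carries findings and a confidence score as well.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class PassResult:
    name: str
    status: str  # "pass", "fail", or "skipped"


@dataclass(frozen=True)
class ReviewPass:
    name: str
    run: Callable[[], bool]  # returns True when the pass succeeds
    uses_llm: bool


def run_all_passes(passes: Sequence[ReviewPass]) -> list[PassResult]:
    results = []
    cheap_failed = False
    # Deterministic passes run first because they cost no tokens.
    for p in passes:
        if not p.uses_llm:
            ok = p.run()
            cheap_failed = cheap_failed or not ok
            results.append(PassResult(p.name, "pass" if ok else "fail"))
    # LLM-driven passes run only if every cheap gate held.
    for p in passes:
        if p.uses_llm:
            if cheap_failed:
                results.append(PassResult(p.name, "skipped"))
            else:
                ok = p.run()
                results.append(PassResult(p.name, "pass" if ok else "fail"))
    return results
```

Recording skipped passes explicitly, rather than omitting them, keeps the aggregated verdict honest about what was and was not checked.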

## Why the LLM seam matters

Every Opus call flows through `gaia.coder.review._llm.call_opus`. Tests
patch that one name — cheap, stable, vendor-agnostic — rather than
reaching into the Anthropic SDK. When we ship the Claude Agent SDK
integration for Passes 3 and 6 (the `architecture-reviewer` /
`code-reviewer` subagents in §15.8), the switch happens at a single
module boundary without churning every test.

## Scope constraint

Changes are confined to `src/gaia/coder/review/`,
`src/gaia/coder/prompts/`, and `tests/coder/test_review.py`. No other
files are touched. Pass 7's differential-pytest worktree machinery is
wired but opt-out via `skip_differential_pytest=True` for unit tests,
because Phase 4 intentionally does not ship a running daemon — that's
Phase 5.

## Dependencies

- Stacks on top of #818 (mixins; uses
`gaia.coder.tools.cli._check_denylist` for subprocess safety) and #819
(scaffold). Both appear in this PR until they merge to `coder` — the
diff will clean itself up afterwards.
- Does not depend on #820 (stores).

## Citations

- [`docs/plans/coder-agent.mdx`](../blob/feature/gaia-coder-review/docs/plans/coder-agent.mdx) §8: the seven-pass table
- §15.8: canonical prompt templates for Passes 3/5/6/7
- §7.6: confidence-score gate on auto-merge
- §7.4: self-correction loop that Pass 7 bookends

## Test plan

- [x] `pytest tests/coder/test_review.py -xvs` — 22/22 pass
- [x] `pytest tests/coder/` — 65/65 pass (sibling tests untouched)
- [x] `python util/lint.py --black --isort` — clean on in-scope files
- [x] Smoke: `from gaia.coder.review import ReviewToolsMixin; m =
ReviewToolsMixin(); m.register_review_tools()` registers exactly 8 tools
- [ ] Integration run against a real branch with Anthropic creds present
(Phase 5 territory — deferred to the daemon wiring task)

## Do not merge

Draft: this stacks on two unmerged sibling PRs and the Phase 4 runtime
wiring lives in Phase 5.
kovtcharov added a commit that referenced this pull request Apr 20, 2026
## Summary

Lands Phase 5 of `docs/plans/coder-agent.mdx` — the EM-facing surface of
`gaia-coder`. Before this PR the CLI verbs were stubs; after it, the EM
can bootstrap the agent, read her trust contract, promote/demote her,
and queue messages. Every LLM call now injects her identity triplet
(`GAIA.md` + `ARCHITECTURE.md` + `PROJECT_MAP.md`) as a cacheable
prefix.

## Threads

- **`trust.py`** — `EMConfig` / `RepoBinding` Pydantic models, the 0-5
  `CapabilityTier` ladder, TOML round-trip, and `promote` / `demote`
  functions with audit-log writes. Promotion refuses mismatched EM
  signatures; demotion is immediate. *Why it matters:* §4.2 makes
  promotion explicit, so a quiet accept would let any caller escalate
  tier. Fail-loudly TrustError surfaces exactly what to fix.
- **`inbox.py`** — thin CRUD over `em_inbox.db` with the §4.5 5-second
  non-LLM auto-ack, channel-agnostic dispatch callable, and escalation
  into `feedback.db` with severity translation. *Why it matters:*
  §4.5 says the ack is non-negotiable in latency; keeping it template-
  only (no model call) makes the SLA automatic.
- **`intent.py`** — LLM-driven conversational intent classifier for
  §15.4 + §15.8 P9, temperature 0 Opus 4.7, mockable via an injected
  `llm` callable. Low confidence (< 70) and unknown intents coerce to
  `free_form`. Handler functions cover every §15.4 intent. *Why it
  matters:* regex matchers would miss paraphrases ("let me give you
  self-edit for now"); LLM routing keeps the grammar maintainable.
- **`prompt_composer.py`** — builds Anthropic-format message blocks
  with `cache_control={"type":"ephemeral"}` on the identity triplet
  + per-skill blocks, per §3.2 / §4.6 / §6.5. *Why it matters:* §3.1
  mandates prompt caching; this is the single place that decides what
  gets cached.
- **`cli.py`** — replaces seven stub handlers (`trust`, `promote`,
  `demote`, `ask`, `note`, `critical`, `inbox`) with real ones. Config
  dir honours `$GAIA_CODER_HOME` so tests never touch real user state.
  Stubs remain for the Phase 6+ verbs (`daemon`, `status`, `feedback`,
  `doctor`, etc.).
- **`prompts/intent_classifier.md` + `prompts/standup.md`** — §15.8 P9
  and P10 prompt templates landed verbatim for future `prompt`-class
  self-fix PRs.
- **Tests (58 new)** — each module has a dedicated test file; CLI tests
  run as subprocess to exercise the full argparse + env-var path.
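
For the `prompt_composer.py` thread, a minimal sketch of what cacheable system blocks might look like. The `cache_control={"type":"ephemeral"}` marker and the identity triplet come from the PR text; the function name `compose_system_blocks`, the way documents are joined, and the one-block-per-skill layout are assumptions.

```python
def compose_system_blocks(
    identity_docs: dict[str, str], skills: list[str]
) -> list[dict]:
    """Build system content blocks with cache markers (illustrative)."""
    # Stable content goes first so the cached prefix stays identical
    # from one call to the next.
    identity_text = "\n\n".join(
        f"# {name}\n{body}" for name, body in identity_docs.items()
    )
    blocks = [
        {
            "type": "text",
            "text": identity_text,
            "cache_control": {"type": "ephemeral"},
        }
    ]
    for skill in skills:
        blocks.append(
            {
                "type": "text",
                "text": skill,
                "cache_control": {"type": "ephemeral"},
            }
        )
    return blocks


blocks = compose_system_blocks(
    {"GAIA.md": "identity", "ARCHITECTURE.md": "composition", "PROJECT_MAP.md": "map"},
    ["skill: run tests before publishing"],
)
```

Keeping this in one function means the ordering of cached content is decided in exactly one place, which is what the thread above argues for.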

## Dependencies

This PR depends on sibling branches that have not yet merged to `coder`:

- **#819 scaffold** — imports `gaia.coder.{__init__,base,loop}` and the
  `GAIA.md` / `ARCHITECTURE.md` / `PROJECT_MAP.md` placeholders.
- **#820 stores** — hard dep on `gaia.coder.stores.{em_inbox, feedback,
  audit}`.
- **#818 mixins** — optional import of `gaia.coder.tools.cli` for
  subprocess helpers (not used in this PR's code paths).

Rebase onto `coder` once those land.

## Test plan

- [ ] `pytest
tests/coder/test_{trust,intent,inbox,prompt_composer,cli_trust}.py -xvs`
— all 58 tests pass
- [ ] `gaia-coder trust` on a fresh `$GAIA_CODER_HOME` prints the §4.1
bootstrap question and exits 0
- [ ] `gaia-coder trust --bootstrap --em-handle <you> --em-channel <ch>`
then `gaia-coder trust` renders the §4.2 template verbatim
- [ ] `gaia-coder promote --to-tier 2 --reason "..." --em-signature
<you>` updates `em.toml` and writes an audit row
- [ ] `gaia-coder promote ... --em-signature wrong-user` exits 1 with
the mismatch message on stderr
- [ ] `gaia-coder ask "enable self-edit"` prints the auto-ack template
and enqueues a pending row in `em_inbox.db`
- [ ] `gaia-coder inbox` lists the pending row
kovtcharov added a commit that referenced this pull request Apr 20, 2026
…825)

## Summary

Phase 6 implements **gaia-coder's self-correction loop** — the core
value proposition. When the EM gives feedback, the agent now triages it
into one of eight fix classes, localises the cause, drafts a
regression-tested plan, applies a fix on an
`auto/gaia-coder/<feedback_id>` branch, and opens a draft PR targeting
`coder`. Wires §7.2 continuous critique, §7.3 feedback intake, §7.4 loop
steps 1-10, and §7.9 verification. Loosely coupled to Phase 4 (review
gate) and Phase 5 (trust inbox) so they can land in any order.

## Threads

- **Prompt templates (§15.8 P1/P2/P3).** `prompts/triage.md`,
`prompts/critique.md`, and `prompts/plan_review.md` — the three
canonical text bodies the loop consumes. Stored as plain Markdown so she
can edit them via prompt-class self-fix PRs.
- **Triage (§7.4 step 1-2).** `classify_fix_class` runs P1 on Opus 4.7;
`< 60` confidence is rewritten to `out-of-scope` so the loop never
commits to a guess. `localise` is deterministic grep — no LLM.
- **Planner (§5.1 Stage 3 / §7.4 step 3).** `draft_plan` +
`is_large_job` + `request_em_approval`. Large jobs (> 200 LoC, or
`architectural` / `state-machine`, or cross-mixin) post the P3 message
to the EM inbox and wait for ✅.
- **Fixer (§7.4 steps 4-5).** `generate_fix` creates the self-fix branch
and applies edits; `write_regression_test` emits a pytest file + marker
flag so it *genuinely* fails on `coder` and passes on the fix branch (no
clever mocks); `verify_test_differential` raises on the pass-on-both or
fail-on-both failure modes.
- **Publisher (§7.4 steps 7-8).** `open_self_fix_pr` refuses to open
without a regression test (§7.4 step 5 hard rule). PR body cites
`feedback_id` and quotes the EM's wording verbatim — Pass 7
(feedback-binding) depends on both.
- **Verifier (§7.4 step 10).** `verify_on_merge` re-runs the regression
test on the merged SHA, transitions the feedback row to `verified`, and
writes `failure_patterns` + `review_patterns` memory records so the same
symptom is recognised next time.
- **Continuous critique (§7.2).** Cheap one-shot Opus call after every
state-changing tool; findings `< 60` confidence are dropped; `≥ 80`
surface inline for pre-transition action.
- **Loop driver.** `FeedbackLoopDriver.process_pending_feedback()`
orchestrates the whole thing with full `pending → triaged → in-fix →
fix-pr-open → verified | rejected → closed` transitions written to
`feedback.notes_json`.
- **SelfFixToolsMixin.** Registers **10 tools** (well above the §15.2 ≥
7 contract) so the loop is callable from the agent's tool registry.
Phase-7 tools (`classify_failure`, `pause_current_task`, `restart_self`,
`edit_self_file`) are intentionally out of scope.
- **CLI.** `gaia-coder feedback "<body>" --severity high --on <url>`
enqueues, `gaia-coder self-fix process` runs one iteration.
- **Tests.** 47 new tests covering every §7.4 step, every §7.2 filtering
rule, and the §15.2 mixin contract. Mocks Anthropic and `gh` at the
callable boundary; uses a tmp git repo with a `coder` branch for real
branch creation and pytest differential runs.
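
Two of the rules above reduce to small pure functions, sketched here under assumptions. `coerce_fix_class` and `DifferentialError` are hypothetical names, and this `verify_test_differential` takes two booleans, whereas the real one presumably runs pytest on both branches. The third error case (passes on base, fails on fix) is an addition not named in the PR text.

```python
FIX_CLASS_MIN_CONFIDENCE = 60  # threshold from the Triage thread


def coerce_fix_class(fix_class: str, confidence: int) -> str:
    """Rewrite low-confidence classifications to out-of-scope."""
    if confidence < FIX_CLASS_MIN_CONFIDENCE:
        return "out-of-scope"
    return fix_class


class DifferentialError(RuntimeError):
    """The regression test does not separate base from fix."""


def verify_test_differential(passes_on_base: bool, passes_on_fix: bool) -> None:
    if passes_on_base and passes_on_fix:
        raise DifferentialError("passes on both: test does not reproduce the bug")
    if not passes_on_base and not passes_on_fix:
        raise DifferentialError("fails on both: fix does not resolve the bug")
    if passes_on_base and not passes_on_fix:
        # Not named in the PR text; added here for completeness.
        raise DifferentialError("passes on base, fails on fix: fix regresses")
    # Remaining case (fails on base, passes on fix) is the only valid one.
```

Only the fails-on-base, passes-on-fix combination returns without raising, which is the "genuinely fails on `coder` and passes on the fix branch" condition the Fixer thread describes.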

## Out of scope (explicit non-goals per the Phase 6 task)

- EventBridge ingestion of `@gaia-coder feedback:` comments (Phase 10
repo binding).
- Auto-merge (§7.6) — blocked on Phase 4 review-gate confidence score.
- Dev-mode self-edit (§7.5) and `restart_self` — Phase 7.
- ReAct loop self-edit (§7.8) — Phase 7.

## Dependencies

- **#818 (mixins):** Imports `edit_file` semantics inline rather than
instantiating `FileToolsMixin`, to keep the module usable without the
mixin wired.
- **#819 (scaffold):** Package skeleton + GAIA.md + prompts/ dir.
- **#820 (stores):** Uses `gaia.coder.stores.feedback` and
`gaia.coder.stores.memory` directly.
- **Phase 4 review gate (sibling):** Loose-coupled —
`review_gate_runner` is optional; absent, we publish a "(review gate not
available)" placeholder in the PR body.
- **Phase 5 trust (sibling):** Loose-coupled — `request_em_approval`
uses `gaia.coder.trust.inbox.enqueue` if importable, else defers.

## Test plan

- [ ] `pytest tests/coder/test_self_fix/ -xvs` — 47 tests pass.
- [ ] `pytest tests/coder/ -q` — full coder suite (120 tests) pass.
- [ ] `python util/lint.py --all` — no new critical errors (pre-existing
pylint warnings in `cli.py`, `discovery.py`, etc. are not touched).
- [ ] `python -c "from gaia.coder.self_fix import SelfFixToolsMixin; m =
SelfFixToolsMixin(); assert len(m.register_self_fix_tools()) >= 7"` —
smoke.
- [ ] `gaia-coder feedback "..." --severity high --on
https://github.com/amd/gaia/pull/9999 --id fb-test --db-path /tmp/fb.db`
— writes a row; follow with `gaia-coder self-fix process --db-path
/tmp/fb.db --repo-root <worktree> --skip-differential-verify
--skip-fix-apply` for the end-to-end path (PR creation mocked).

## Merge plan

- Draft PR, **do not merge**. Rebase onto `coder` after Phase 4 and
Phase 5 land so the review gate and inbox wiring become real imports.
@kovtcharov mentioned this pull request on Apr 26, 2026
2 tasks