feat(coder): package skeleton + loop.py with DEFAULT_LOOP by kovtcharov · Pull Request #819 · amd/gaia

kovtcharov · 2026-04-20T08:19:33Z

Summary

Phase 1 scaffolding for gaia-coder per docs/plans/coder-agent.mdx. Creates the package structure every follow-up task imports from — no runtime behaviour, no LLM calls, no network, no SQLite. Just the load-bearing shapes: her own base class, the editable 20-state ReAct loop, the CLI surface, and the three living documents the spec anchors on.

This is the trunk the two sibling branches (feature/gaia-coder-stores, feature/gaia-coder-mixins-1) rebase onto once it lands.

Threads

Package skeleton (§3.1) — src/gaia/coder/ with sub-packages for prompts/, stores/, tools/, review/, introspect/, self_fix/, tests/, skills/. Every __init__.py is a stub pointing at the phase that fills it in. Matters because downstream tasks need stable import paths now without racing on who creates which directory.
CLI stub (§3.1) — gaia-coder console script wired through setup.py. All 17 subcommands from the spec (daemon, status, ask, note, critical, inbox, feedback, promote, demote, trust, audit, spend, egress, introspect, skill, doctor, rag) print "<name>: not yet implemented" and exit 0. Matters so docs, tests, and muscle memory can start using the real surface immediately. Uses argparse (matching existing gaia CLI) rather than click, which is not a project dependency.
State machine (§5.1, §15.3) — src/gaia/coder/loop.py with immutable Transition, State, Loop dataclasses and DEFAULT_LOOP containing all 20 states grouped into the seven stages. The self_review state carries the updated three-way transition (publish | debug | edit based on failure_is_complex) from §15.3 — shallow failures go straight back to edit, complex ones enter the dedicated debug sub-loop. Matters because a typo here silently regresses her review discipline.
Base class (§5.1) — CoderAgent that does NOT inherit from gaia.agents.base.Agent. Product-agent assumptions (request-scoped, single LLM round-trip) do not match her daemon-scoped lifecycle with durable queues and editable control flow. Composition from the GAIA base is still allowed at use-site (@tool, AgentConsole, PathValidator). Matters so sibling tasks can from gaia.coder import CoderAgent today.
Living documents (§4.6, §6.5) — GAIA.md (identity: principles, persona, Karpathy working-style rules), ARCHITECTURE.md (how she is composed), PROJECT_MAP.md (how the project she's building is composed), plus skills/catalog.toml seed. All short Phase-1 placeholders; real content lands in Phase 3/5. Matters because all three documents are referenced from every system prompt — they need to exist on disk before the runner does.
Introspection seed (§7.7) — introspect_state_machine() in loop.py returns a JSON snapshot + stateDiagram-v2 Mermaid render of the loop. The Phase 3 IntrospectionToolsMixin wraps this as the public @tool; keeping the implementation next to the loop it describes avoids duplication.
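To make the state-machine thread above concrete, here is a minimal sketch of immutable Transition / State / Loop dataclasses with a three-way self_review branch. The field names (`target`, `condition`, `stage`, `version`), the condition labels, the stage assigned to each state, and the two-state `TOY_LOOP` are assumptions for illustration only; the real definitions live in src/gaia/coder/loop.py, where DEFAULT_LOOP carries all 20 states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    target: str          # name of the state to move to
    condition: str = ""  # hypothetical guard label; empty means unconditional


@dataclass(frozen=True)
class State:
    name: str
    stage: str           # assumed field: which of the seven stages owns the state
    transitions: tuple[Transition, ...] = ()


@dataclass(frozen=True)
class Loop:
    version: int
    states: tuple[State, ...]

    def state_by_name(self, name: str) -> State:
        """Look a state up by name, raising KeyError with the known names."""
        for state in self.states:
            if state.name == name:
                return state
        known = ", ".join(s.name for s in self.states)
        raise KeyError(f"unknown state {name!r}; known states: {known}")


# Three-way branch: publish on success, debug for complex failures,
# straight back to edit for shallow ones.
SELF_REVIEW = State(
    name="self_review",
    stage="Verify",  # assumed stage placement
    transitions=(
        Transition("publish", "review_passed"),
        Transition("debug", "failure_is_complex"),
        Transition("edit", "not failure_is_complex"),
    ),
)

TOY_LOOP = Loop(
    version=1,
    states=(State("edit", "Build", (Transition("self_review"),)), SELF_REVIEW),
)
```

A test in the spirit of the PR's guard would then assert that `TOY_LOOP.state_by_name("self_review")` exposes exactly the targets publish, debug, and edit.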

Deviations from spec

Spec says "wired via click (already a project dep)" — click is not actually in setup.py or any extras. Switched to argparse to match the existing src/gaia/cli.py convention and avoid adding a dependency. The subcommand surface is unchanged.
Spec suggests 4 commits; this PR has 3. Splitting loop.py across two commits (dataclasses in b, DEFAULT_LOOP + introspection helper in c) required temporarily deleting and re-adding content, which felt worse than a clean "state machine + base class" commit. Commits are: (1) scaffold + CLI, (2) loop.py + base.py + __init__.py re-exports, (3) tests.

Test plan

pytest tests/coder/test_skeleton.py -xvs — 5/5 pass, including the guard against regressing self_review's three-way transition.
python util/lint.py --black --isort --flake8 on the new files — all clean (project flake8 uses --max-line-length=88 with E501 ignored per util/lint.py).
uv pip install -e . — succeeds; gaia-coder --help prints the 17-subcommand list; stub subcommands exit 0 (gaia-coder introspect, gaia-coder status, etc.).
git diff feature/coding-agent...HEAD --stat — changes only under src/gaia/coder/, tests/coder/, and setup.py. No edits to src/gaia/agents/ or other existing modules.
Existing gaia-code entry (legacy Next.js scaffolder) untouched per §1.

Phase 1 scaffolding per docs/plans/coder-agent.mdx §3.1. Creates the package skeleton every downstream task imports from: subpackages for prompts, stores, tools, review, introspect, self_fix, tests, and skills; the three living documents (GAIA.md, ARCHITECTURE.md, PROJECT_MAP.md) as short placeholders; the skills catalog.toml seed; and the gaia-coder console script backed by an argparse-based CLI whose 17 subcommands (§3.1) are all Phase 1 stubs that print "not yet implemented" and exit 0. setup.py registers the new packages and the gaia-coder entry point; the existing gaia-code entry (legacy Next.js scaffolder) is untouched per §1. State machine (loop.py) and base class (base.py) land in follow-up commits on this branch; sibling branches feature/gaia-coder-stores and feature/gaia-coder-mixins-1 will fill in stores/ and tools/ after this lands.


Adds the canonical ReAct control flow for gaia-coder per docs/plans/coder-agent.mdx §5.1 and §15.3:

- src/gaia/coder/loop.py defines the immutable Transition / State / Loop dataclasses and ships DEFAULT_LOOP — 20 states grouped into the seven stages (Intake, Understand, Design, Build, Verify, Publish, Land). The self_review state emits the updated three-way transition (publish | debug | edit based on failure_is_complex) from §15.3 so shallow failures go straight back to edit while complex failures enter the dedicated debug sub-loop.
- introspect_state_machine() renders the loop as a JSON snapshot plus a stateDiagram-v2 Mermaid string — the Phase 3 IntrospectionToolsMixin (§7.7) will wrap this as the public tool.
- src/gaia/coder/base.py defines CoderAgent: her own base class that does NOT inherit from gaia.agents.base.Agent, since product-agent assumptions (request-scoped, single LLM round-trip) do not match her daemon-scoped lifecycle. Composition from the GAIA base (@tool, AgentConsole, PathValidator) is still allowed at use-site.

The top-level package re-exports CoderAgent, DEFAULT_LOOP, Loop, State, and Transition so sibling branches have a stable public surface to import from.
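As a rough illustration of the JSON-plus-Mermaid rendering described in this commit message, the sketch below renders a toy graph. `render_state_machine` is a hypothetical helper, not the real `introspect_state_machine()`; the input shape (a dict from state name to target names) and the choice to return the JSON snapshot as a string are assumptions. Only the `json` and `mermaid` keys and the `stateDiagram-v2` dialect come from the source.

```python
import json


def render_state_machine(graph: dict[str, list[str]]) -> dict[str, str]:
    """Return a JSON snapshot and a Mermaid stateDiagram-v2 string for a toy graph."""
    lines = ["stateDiagram-v2"]
    for source, targets in graph.items():
        for target in targets:
            # One Mermaid edge per transition.
            lines.append(f"    {source} --> {target}")
    return {
        "json": json.dumps(graph, sort_keys=True),
        "mermaid": "\n".join(lines),
    }


payload = render_state_machine({"self_review": ["publish", "debug", "edit"]})
print(payload["mermaid"])
```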

Five tests pin the public surface sibling branches rely on:

- gaia-coder --help runs and exits 0 with a subcommand list.
- `from gaia.coder import CoderAgent` and `from gaia.coder.loop import DEFAULT_LOOP` both succeed.
- DEFAULT_LOOP has exactly 20 states (§15.3).
- The self_review state has exactly three transitions targeting publish / debug / edit — the updated three-way transition from §15.3 that a typo or edit to loop.py would regress silently.
- introspect_state_machine() returns a Mermaid render; the test accepts either the `graph TD` or `stateDiagram` dialect since §7.7 does not pin the choice.

CLI invocation uses `python -m gaia.coder.cli` so the test runs whether or not the gaia-coder console script has been installed; either route exercises the same invariant.

github-actions · 2026-04-20T09:23:26Z

Review — `feat(coder): package skeleton + loop.py with DEFAULT_LOOP` (#819)

Note: this PR is already merged. Posting a retrospective review so the follow-up sibling branches (feature/gaia-coder-stores, feature/gaia-coder-mixins-1) can pick up the threads.

Summary

Clean, well-scoped Phase 1 scaffolding. Introduces a new src/gaia/coder/ package that deliberately sits outside the src/gaia/agents/ product-agent ecosystem, adds a gaia-coder CLI stub whose 17 subcommands all exit 0, and pins the §15.3 loop shape (20 states, three-way self_review branch) with a real test. The single most important thing to know: every docstring, placeholder document, and the PR body itself cite docs/plans/coder-agent.mdx, but that file has never existed in the repo (git log --all -- docs/plans/coder-agent.mdx returns nothing). Readers following the breadcrumbs will hit dead ends until the spec is committed.

Issues

🟡 Important

Referenced spec docs/plans/coder-agent.mdx does not exist — in pr-diff.txt I count 19 references to docs/plans/coder-agent.mdx across __init__.py (src/gaia/coder/__init__.py:13), base.py (src/gaia/coder/base.py:15,58,67), cli.py (src/gaia/coder/cli.py:101), loop.py (src/gaia/coder/loop.py:6), and all three living documents (GAIA.md:5, ARCHITECTURE.md:3, PROJECT_MAP.md:3). ls docs/plans/ confirms every neighbour exists (agent-hub.mdx, autonomy-engine.mdx, skill-format.mdx, …) except coder-agent.mdx. Every §X.Y citation in the scaffolding currently points at vapour. Either land the spec in a follow-up PR referenced from here, or relax the docstrings to say "spec to land in a follow-up" so first-time readers aren't chasing a ghost.

docs/docs.json and docs/reference/cli.mdx not updated for the new gaia-coder console script — CLAUDE.md is unambiguous: "New CLI commands → update docs/reference/cli.mdx" and "New pages must be added to docs/docs.json navigation". The PR adds a new user-facing binary (setup.py:216 adds gaia-coder = gaia.coder.cli:main) but skips both docs touchpoints. Phase 1 is a reasonable time to defer this since every subcommand is a stub, but an explicit note in the PR body ("docs land with Phase 2 when subcommands do real work") would have made the deferral legible. Please capture in a follow-up issue so it isn't lost.

🟢 Minor

test_introspect_state_machine_returns_mermaid has a vestigial graph TD branch (tests/coder/test_skeleton.py:99). The assertion is:

assert "graph TD" in mermaid or "stateDiagram" in mermaid

but loop.py:_mermaid (line 118) only ever emits stateDiagram-v2, so the "graph TD" arm of the assertion is dead. That's fine as forward-looking flexibility, but the docstring at test_skeleton.py:92–97 implies both dialects are "accepted", which misleads future readers into thinking the implementation sometimes produces graph TD. Either tighten the assertion or drop the "graph TD" half of the docstring:

def test_introspect_state_machine_returns_mermaid() -> None:
    """The introspection helper must return a Mermaid stateDiagram render.

    §7.7 specifies the tool renders both JSON and a Mermaid diagram for
    the EM's inspection. The current implementation uses
    ``stateDiagram-v2`` so the seven-stage grouping renders as nested
    blocks; this test pins that dialect.
    """
    from gaia.coder.loop import introspect_state_machine

    payload = introspect_state_machine()

    assert isinstance(payload, dict)
    assert "json" in payload
    assert "mermaid" in payload

    mermaid = payload["mermaid"]
    assert isinstance(mermaid, str)
    assert "stateDiagram" in mermaid, (
        f"Mermaid render should use the `stateDiagram` dialect; "
        f"got:\n{mermaid[:200]}"
    )

Optional[X] where X | None would be more idiomatic (src/gaia/coder/base.py:275,316, src/gaia/coder/loop.py:656,715,716). setup.py:219 pins python_requires=">=3.10", so PEP 604 union syntax is always available. The PR already uses it in the CLI (cli.py:149 — argv: list[str] | None = None), so the inconsistency is within the PR itself. Not blocking — the repo has mixed usage — but worth normalising when the real bodies land.

gaia-coder not listed in the CLAUDE.md "Console Script Entry Points" table (repo-level CLAUDE.md, Project Structure section). That table lists gaia, gaia-mcp, gaia-code, gaia-emr but not the new binary. Adding it keeps the repo-level doc self-describing and prevents new contributors from asking "what's gaia-coder vs gaia-code?".

_build_parser's subcommands parameter is typed Iterable[tuple[str, str]], which admits one-shot iterators, and the function iterates it once (src/gaia/coder/cli.py:94). Harmless for the default call, but if a test ever passes a generator and then calls _build_parser a second time with the same object, the generator is already exhausted. Narrow to Sequence[tuple[str, str]] to make re-iteration safe, or document the constraint.
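The exhaustion hazard is easy to reproduce in isolation. The sketch below uses two hypothetical helpers, `names_from_iterable` and `names_from_sequence`, standing in for `_build_parser`; they are not taken from cli.py. Note that the `Sequence` annotation is not enforced at runtime: its value is that a type checker rejects a generator argument.

```python
from collections.abc import Iterable, Sequence


def names_from_iterable(subcommands: Iterable[tuple[str, str]]) -> list[str]:
    """Accepts any iterable, including one-shot generators."""
    return [name for name, _help in subcommands]


def names_from_sequence(subcommands: Sequence[tuple[str, str]]) -> list[str]:
    """Same body; the narrower annotation lets a type checker reject generators."""
    return [name for name, _help in subcommands]


pairs = [("status", "show status"), ("doctor", "run checks")]
gen = (pair for pair in pairs)

first = names_from_iterable(gen)   # consumes the generator
second = names_from_iterable(gen)  # generator is exhausted: nothing left to yield

stable = tuple(pairs)
again_1 = names_from_sequence(stable)
again_2 = names_from_sequence(stable)  # a tuple can be iterated repeatedly
```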

Strengths

Immutable graph modelling is correct for the stated use case. @dataclass(frozen=True) on Transition/State/Loop plus the "an edit means a diff to this file, not a runtime mutation API" comment (loop.py:43–46) is the right stance — any future runtime that tries to mutate the graph will fail loudly, which matches the §7.8 invariant that every merged edit bumps Loop.version.
Test pins the load-bearing invariants, not implementation details. test_default_loop_has_20_states and test_self_review_has_three_transitions would both fail if someone silently regressed §15.3. That's high-signal coverage for a skeleton PR.
The "not inheriting from gaia.agents.base.Agent" decision is justified in-place (base.py:5–12). The docstring names both what's different (daemon lifecycle, durable queues, editable graph) and what composition is still welcome (@tool, AgentConsole, PathValidator). Future readers won't have to go spelunking for the rationale.
Fail-loudly is honoured where it matters. Loop.state_by_name (loop.py:159) raises KeyError with the known state list instead of returning None; identity_document (base.py:170) lets FileNotFoundError propagate with a docstring explaining why absence is a bug not a fallback. Matches repo CLAUDE.md.
Scope discipline is clean. Only new files under src/gaia/coder/ + tests/coder/, plus the setup.py package/entry-point additions. No drive-by edits to src/gaia/agents/ or other modules.
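The "fail loudly on mutation" behaviour credited to `@dataclass(frozen=True)` above can be shown with a small standalone sketch. The `Loop` class here is a stand-in with assumed fields (`version`, `state_names`), not the class from loop.py, and bumping the version via `dataclasses.replace` is only one possible way to express an edit as a new value.

```python
import dataclasses
from dataclasses import dataclass


@dataclass(frozen=True)
class Loop:
    version: int
    state_names: tuple[str, ...]


loop = Loop(version=1, state_names=("edit", "self_review"))

mutation_error = None
try:
    loop.version = 2  # runtime mutation of a frozen dataclass is rejected
except dataclasses.FrozenInstanceError as exc:
    mutation_error = exc

# An edit is expressed as a new value with the version bumped;
# the original object is left untouched.
edited = dataclasses.replace(
    loop,
    version=loop.version + 1,
    state_names=loop.state_names + ("debug",),
)
```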

Verdict

Approve with suggestions (retrospective — PR is already merged). No blocking issues. The missing docs/plans/coder-agent.mdx is the one thing that genuinely warrants a follow-up: land it (or retarget the references) before the next scaffolding PR, because every sibling branch will inherit the same broken-citation pattern otherwise.

Three review-followups from the #818/#819/#820 merge:

- src/gaia/coder/__init__.py: restore the CoderAgent/DEFAULT_LOOP/Loop/State/Transition re-exports that #819 added and the #818 rebase accidentally dropped. test_package_imports in test_skeleton.py relies on these.
- src/gaia/coder/tools/cli.py: replace the bare except Exception: pass in the stream-reader teardown with a targeted OSError catch + logger.debug. Per CLAUDE.md's fail-loudly rule, silent swallows hide reader-thread bugs; a stream close failing under pipe tear-down is real and worth a debug line.
- src/gaia/coder/tools/search.py: stop reaching into the private _TOOL_REGISTRY dict from grep(). Use get_tool_metadata() — the public accessor added in base/tools.py:113 exists precisely for this.

Tests: all 73 (5 skeleton + 38 mixins + 30 stores) pass on coder HEAD with the fixes.

#823)

## Summary

Three review-followups from the #818 / #819 / #820 merge, flagged by the auto-review bot. All tests (73/73 on `coder`) pass with the fixes.

## What this changes

- **Critical** — `src/gaia/coder/__init__.py`: restore the `CoderAgent` / `DEFAULT_LOOP` / `Loop` / `State` / `Transition` re-exports that #819 added and #818's rebase accidentally dropped. Without this, `tests/coder/test_skeleton.py::test_package_imports` fails on `coder` HEAD.
- **Important** — `src/gaia/coder/tools/cli.py:135`: replace bare `except Exception: pass` in the stream-reader teardown with a targeted `OSError` catch + `logger.debug`. CLAUDE.md's *No Silent Fallbacks* rule explicitly forbids this pattern.
- **Important** — `src/gaia/coder/tools/search.py:59`: stop reaching into the private `_TOOL_REGISTRY` from `grep()`. Use `get_tool_metadata()` — the public accessor at `src/gaia/agents/base/tools.py:113` exists for this.

## Test plan

- [x] `pytest tests/coder/ -x` — 73/73 pass (5 skeleton + 38 mixins + 30 stores)
- [x] `python -c "from gaia.coder import CoderAgent, DEFAULT_LOOP, Loop, State, Transition"` — all imports resolve
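The teardown change described above, a targeted `OSError` catch with a debug log instead of a bare `except Exception: pass`, might look roughly like this. `close_stream` is a hypothetical helper written for illustration; the actual teardown code in src/gaia/coder/tools/cli.py is not shown in this conversation.

```python
import logging
from typing import IO

logger = logging.getLogger(__name__)


def close_stream(stream: IO[str]) -> None:
    """Close a reader stream during teardown without hiding unexpected errors."""
    try:
        stream.close()
    except OSError as exc:
        # A close can fail under pipe tear-down; record it at debug level.
        # Any other exception type propagates to the caller.
        logger.debug("stream close failed during teardown: %s", exc)
```

The narrow `except` means a genuine bug in the reader thread (say, an `AttributeError`) still surfaces instead of being swallowed.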

Addresses the auto-review findings on #819 and #820:

**Land the spec.** docs/plans/coder-agent.mdx was being referenced from 19 places in the scaffold (every module docstring, every living doc, every CLI body) but the file itself was untracked — every citation pointed at vapour. Commit it now so Phase 2+ branches inherit real citations. Also lands docs/superpowers/specs/2026-04-19-gaia-code-agent-analysis.md as the retained historical context per the plan's §14.

**SQL identifier hardening** (src/gaia/coder/stores/_common.py). The CRUD primitives string-interpolated table names, column names, and ORDER BY clauses. SQLite has no built-in parameterisation for identifiers, so whitelist matching is the only safe path. Adds _safe_ident() + _SAFE_IDENT_RE and applies it at every interpolation site, plus multi-clause ORDER BY validation (split by comma, each clause validated for optional ASC/DESC direction). Auto-review flagged this as a latent injection surface; treating it as a latent vuln, not hypothetical.

**Narrow the conftest catch** (tests/coder/conftest.py). The bare `except Exception` during stub fallback was masking import-time bugs in stores (AttributeError, SyntaxError). Now catches only ImportError/ModuleNotFoundError and logs the underlying reason so real failures surface in CI rather than silently stubbing over. Scaffold is now landed so step-1 (real import) always succeeds — this is a safety net, not a hot path.

**Register gaia-coder in CLAUDE.md + docs.json**. The new console_scripts entry was missing from the Console Script Entry Points table (§Project Structure) and from the "Standalone binaries" section. Also adds plans/coder-agent to docs.json under the Agents group so Mintlify renders it.

All 73 coder tests pass (5 skeleton + 38 mixins + 30 stores).
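The narrowed conftest catch can be pictured with the following sketch. `import_or_stub` is a hypothetical name, and the real tests/coder/conftest.py may be structured quite differently; the point illustrated is only that the fallback triggers on `ImportError` (of which `ModuleNotFoundError` is a subclass) and logs the reason, while other import-time exceptions propagate.

```python
import importlib
import logging
from types import ModuleType

logger = logging.getLogger(__name__)


def import_or_stub(module_name: str, stub: ModuleType) -> ModuleType:
    """Import the real module, falling back to a stub only when it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:  # also covers ModuleNotFoundError
        # Log why the stub is being used so the fallback is visible in CI.
        logger.warning("falling back to stub for %s: %s", module_name, exc)
        return stub
```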

## Summary

Addresses the auto-review findings on #819 + #820 and lands the plan that every scaffold citation was pointing at.

## Changes

- **Land `docs/plans/coder-agent.mdx`** (3,584 lines). 19 scaffold citations were pointing at a file that didn't exist in tracked state. Also lands `docs/superpowers/specs/2026-04-19-gaia-code-agent-analysis.md` as the retained historical context.
- **SQL identifier hardening** (`src/gaia/coder/stores/_common.py`). `_safe_ident()` + `_SAFE_IDENT_RE` validation at every interpolation site (table, column, ORDER BY). Multi-clause ORDER BY supported.
- **Narrow conftest catch** (`tests/coder/conftest.py`). Only catches `ImportError` / `ModuleNotFoundError`; logs the reason. Real import-time bugs in stores now surface.
- **Register `gaia-coder`** in `CLAUDE.md` Console Script Entry Points + Standalone binaries, and add `plans/coder-agent` to `docs/docs.json` under Agents.

## Test plan

- [x] `pytest tests/coder/` — 73/73 pass (5 skeleton + 38 mixins + 30 stores)
- [x] `docs/plans/coder-agent.mdx` resolves for every scaffold docstring citation
- [x] SQL injection guard: `_safe_ident("x; DROP TABLE")` raises `ValueError`

## Summary

Phase 4 of the `gaia-coder` plan: the seven-pass self-review gate that runs before she ever calls `gh_pr_create`. Implements the full §8 table — deterministic checks (lint, tests, security, prose) plus LLM-driven passes (architectural, persona, adversarial, feedback-binding) — and surfaces an aggregated verdict with the confidence score the §7.6 auto-merge path reads.

This is the "deep-review discipline" the trust contract is built on. Every PR she opens must clear this gate; self-fix PRs additionally require Pass 7 (feedback-binding), which confirms the diff actually addresses the EM's wording and that the regression test fails on `coder` and passes on the branch.

## What this PR adds

- **Seven review passes** (`pass_1_static` through `pass_7_feedback_binding`) — each a focused module that returns a common `PassResult` envelope, so the gate never has to special-case any of them.
- **`gate.run_all_passes`** — orchestrator with cost-aware short-circuiting: a hard-fail on the deterministic cheap gates (1, 2, 4) skips the expensive Opus calls, because there is no point paying for adversarial review on a branch that fails lint.
- **`ReviewToolsMixin`** — exposes every pass as a `@tool` (`review_diff_self_static`, …) plus the one-shot `review_diff_gate`, so the agent, the EM's TUI, and the evaluation harness all share one code path.
- **Canonical prompt files** for Passes 3, 5, 6, 7 under `src/gaia/coder/prompts/`, materialised verbatim from §15.8 — self-fix PRs that touch a prompt are now a single-file `git diff`, the whole point of the §6.4 whitelist.
- **22 unit tests** that cover every pass (one pass + one fail per module), the gate's short-circuit behaviour, the self-fix gating of Pass 7, the confidence-score contract, and a prompt-files-exist guard so the canonical templates can't silently drift.

## Why the LLM seam matters

Every Opus call flows through `gaia.coder.review._llm.call_opus`. Tests patch that one name — cheap, stable, vendor-agnostic — rather than reaching into the Anthropic SDK. When we ship the Claude Agent SDK integration for Passes 3 and 6 (the `architecture-reviewer` / `code-reviewer` subagents in §15.8), the switch happens at a single module boundary without churning every test.

## Scope constraint

Changes are confined to `src/gaia/coder/review/`, `src/gaia/coder/prompts/`, and `tests/coder/test_review.py`. No other files are touched. Pass 7's differential-pytest worktree machinery is wired but opt-out via `skip_differential_pytest=True` for unit tests, because Phase 4 intentionally does not ship a running daemon — that's Phase 5.

## Dependencies

- Stacks on top of #818 (mixins; uses `gaia.coder.tools.cli._check_denylist` for subprocess safety) and #819 (scaffold). Both appear in this PR until they merge to `coder` — the diff will clean itself up afterwards.
- Does not depend on #820 (stores).

## Citations

- [`docs/plans/coder-agent.mdx`](../blob/feature/gaia-coder-review/docs/plans/coder-agent.mdx) §8 — the seven-pass table
- §15.8 — canonical prompt templates for Passes 3/5/6/7
- §7.6 — confidence-score gate on auto-merge
- §7.4 — self-correction loop that Pass 7 bookends

## Test plan

- [x] `pytest tests/coder/test_review.py -xvs` — 22/22 pass
- [x] `pytest tests/coder/` — 65/65 pass (sibling tests untouched)
- [x] `python util/lint.py --black --isort` — clean on in-scope files
- [x] Smoke: `from gaia.coder.review import ReviewToolsMixin; m = ReviewToolsMixin(); m.register_review_tools()` registers exactly 8 tools
- [ ] Integration run against a real branch with Anthropic creds present (Phase 5 territory — deferred to the daemon wiring task)

## Do not merge

Draft: this stacks on two unmerged sibling PRs and the Phase 4 runtime wiring lives in Phase 5.
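The cost-aware short-circuit described for `gate.run_all_passes` can be sketched as below. This is an illustration under assumptions, not the real orchestrator: `run_all_passes_sketch`, the `skipped` field, and modelling each pass as a zero-argument callable returning a bool are all hypothetical. Only the idea that a hard-fail on the cheap gates (1, 2, 4) skips the expensive passes comes from the PR body.

```python
from dataclasses import dataclass
from typing import Callable

CHEAP_GATES = frozenset({1, 2, 4})  # deterministic passes named in the PR body


@dataclass(frozen=True)
class PassResult:
    pass_id: int
    passed: bool
    skipped: bool = False  # assumed field marking a pass that never ran


def run_all_passes_sketch(passes: dict[int, Callable[[], bool]]) -> list[PassResult]:
    """Run cheap gates first; skip every other pass if any cheap gate fails."""
    results: dict[int, PassResult] = {}
    for pass_id in sorted(p for p in passes if p in CHEAP_GATES):
        results[pass_id] = PassResult(pass_id, passes[pass_id]())
    cheap_failed = any(not r.passed for r in results.values())

    for pass_id in sorted(p for p in passes if p not in CHEAP_GATES):
        if cheap_failed:
            # The expensive callable is never invoked.
            results[pass_id] = PassResult(pass_id, passed=False, skipped=True)
        else:
            results[pass_id] = PassResult(pass_id, passes[pass_id]())
    return [results[k] for k in sorted(results)]
```

Because each pass is injected as a callable, a test can supply a stand-in for the LLM-backed passes and assert that it was never called when lint fails.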

## Summary

Lands Phase 5 of `docs/plans/coder-agent.mdx` — the EM-facing surface of `gaia-coder`. Before this PR the CLI verbs were stubs; after it, the EM can bootstrap the agent, read her trust contract, promote/demote her, and queue messages. Every LLM call now injects her identity triplet (`GAIA.md` + `ARCHITECTURE.md` + `PROJECT_MAP.md`) as a cacheable prefix.

## Threads

- **`trust.py`** — `EMConfig` / `RepoBinding` Pydantic models, the 0-5 `CapabilityTier` ladder, TOML round-trip, and `promote` / `demote` functions with audit-log writes. Promotion refuses mismatched EM signatures; demotion is immediate. *Why it matters:* §4.2 makes promotion explicit, so a quiet accept would let any caller escalate tier. Fail-loudly TrustError surfaces exactly what to fix.
- **`inbox.py`** — thin CRUD over `em_inbox.db` with the §4.5 5-second non-LLM auto-ack, channel-agnostic dispatch callable, and escalation into `feedback.db` with severity translation. *Why it matters:* §4.5 says the ack is non-negotiable in latency; keeping it template-only (no model call) makes the SLA automatic.
- **`intent.py`** — LLM-driven conversational intent classifier for §15.4 + §15.8 P9, temperature 0 Opus 4.7, mockable via an injected `llm` callable. Low confidence (< 70) and unknown intents coerce to `free_form`. Handler functions cover every §15.4 intent. *Why it matters:* regex matchers would miss paraphrases ("let me give you self-edit for now"); LLM routing keeps the grammar maintainable.
- **`prompt_composer.py`** — builds Anthropic-format message blocks with `cache_control={"type":"ephemeral"}` on the identity triplet + per-skill blocks, per §3.2 / §4.6 / §6.5. *Why it matters:* §3.1 mandates prompt caching; this is the single place that decides what gets cached.
- **`cli.py`** — replaces seven stub handlers (`trust`, `promote`, `demote`, `ask`, `note`, `critical`, `inbox`) with real ones. Config dir honours `$GAIA_CODER_HOME` so tests never touch real user state. Stubs remain for the Phase 6+ verbs (`daemon`, `status`, `feedback`, `doctor`, etc.).
- **`prompts/intent_classifier.md` + `prompts/standup.md`** — §15.8 P9 and P10 prompt templates landed verbatim for future `prompt`-class self-fix PRs.
- **Tests (58 new)** — each module has a dedicated test file; CLI tests run as subprocess to exercise the full argparse + env-var path.

## Dependencies

This PR depends on sibling branches that have not yet merged to `coder`:

- **#819 scaffold** — imports `gaia.coder.{__init__,base,loop}` and the `GAIA.md` / `ARCHITECTURE.md` / `PROJECT_MAP.md` placeholders.
- **#820 stores** — hard dep on `gaia.coder.stores.{em_inbox, feedback, audit}`.
- **#818 mixins** — optional import of `gaia.coder.tools.cli` for subprocess helpers (not used in this PR's code paths).

Rebase onto `coder` once those land.

## Test plan

- [ ] `pytest tests/coder/test_{trust,intent,inbox,prompt_composer,cli_trust}.py -xvs` — all 58 tests pass
- [ ] `gaia-coder trust` on a fresh `$GAIA_CODER_HOME` prints the §4.1 bootstrap question and exits 0
- [ ] `gaia-coder trust --bootstrap --em-handle <you> --em-channel <ch>` then `gaia-coder trust` renders the §4.2 template verbatim
- [ ] `gaia-coder promote --to-tier 2 --reason "..." --em-signature <you>` updates `em.toml` and writes an audit row
- [ ] `gaia-coder promote ... --em-signature wrong-user` exits 1 with the mismatch message on stderr
- [ ] `gaia-coder ask "enable self-edit"` prints the auto-ack template and enqueues a pending row in `em_inbox.db`
- [ ] `gaia-coder inbox` lists the pending row
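To illustrate what "Anthropic-format message blocks with `cache_control`" amounts to, here is a small sketch of a composer. `compose_system_blocks` and its arguments are hypothetical and not taken from `prompt_composer.py`; joining the identity triplet into a single block is also an assumption. The block shape (`type`, `text`, `cache_control={"type": "ephemeral"}`) is the one quoted in the PR body. The Anthropic API caps the number of cache breakpoints per request, so a real composer would need to bound the per-skill blocks.

```python
from typing import Any


def compose_system_blocks(
    identity_docs: dict[str, str], skills: list[str]
) -> list[dict[str, Any]]:
    """Build text blocks with an ephemeral cache marker on each cacheable prefix."""
    # Assumption: the identity documents are concatenated into one cached block.
    identity_text = "\n\n".join(
        f"# {name}\n{body}" for name, body in identity_docs.items()
    )
    blocks: list[dict[str, Any]] = [
        {"type": "text", "text": identity_text, "cache_control": {"type": "ephemeral"}}
    ]
    for skill_text in skills:
        blocks.append(
            {"type": "text", "text": skill_text, "cache_control": {"type": "ephemeral"}}
        )
    return blocks


blocks = compose_system_blocks(
    {"GAIA.md": "identity", "ARCHITECTURE.md": "structure", "PROJECT_MAP.md": "map"},
    ["skill: write regression tests"],
)
```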

…825)

## Summary

Phase 6 implements **gaia-coder's self-correction loop** — the core value proposition. When the EM gives feedback, the agent now triages it into one of eight fix classes, localises the cause, drafts a regression-tested plan, applies a fix on an `auto/gaia-coder/<feedback_id>` branch, and opens a draft PR targeting `coder`. Wires §7.2 continuous critique, §7.3 feedback intake, §7.4 loop steps 1-10, and §7.9 verification. Loosely coupled to Phase 4 (review gate) and Phase 5 (trust inbox) so they can land in any order.

## Threads

- **Prompt templates (§15.8 P1/P2/P3).** `prompts/triage.md`, `prompts/critique.md`, and `prompts/plan_review.md` — the three canonical text bodies the loop consumes. Stored as plain Markdown so she can edit them via prompt-class self-fix PRs.
- **Triage (§7.4 step 1-2).** `classify_fix_class` runs P1 on Opus 4.7; `< 60` confidence is rewritten to `out-of-scope` so the loop never commits to a guess. `localise` is deterministic grep — no LLM.
- **Planner (§5.1 Stage 3 / §7.4 step 3).** `draft_plan` + `is_large_job` + `request_em_approval`. Large jobs (> 200 LoC, or `architectural` / `state-machine`, or cross-mixin) post the P3 message to the EM inbox and wait for ✅.
- **Fixer (§7.4 steps 4-5).** `generate_fix` creates the self-fix branch and applies edits; `write_regression_test` emits a pytest file + marker flag so it *genuinely* fails on `coder` and passes on the fix branch (no clever mocks); `verify_test_differential` raises on the pass-on-both or fail-on-both failure modes.
- **Publisher (§7.4 steps 7-8).** `open_self_fix_pr` refuses to open without a regression test (§7.4 step 5 hard rule). PR body cites `feedback_id` and quotes the EM's wording verbatim — Pass 7 (feedback-binding) depends on both.
- **Verifier (§7.4 step 10).** `verify_on_merge` re-runs the regression test on the merged SHA, transitions the feedback row to `verified`, and writes `failure_patterns` + `review_patterns` memory records so the same symptom is recognised next time.
- **Continuous critique (§7.2).** Cheap one-shot Opus call after every state-changing tool; findings `< 60` confidence are dropped; `≥ 80` surface inline for pre-transition action.
- **Loop driver.** `FeedbackLoopDriver.process_pending_feedback()` orchestrates the whole thing with full `pending → triaged → in-fix → fix-pr-open → verified | rejected → closed` transitions written to `feedback.notes_json`.
- **SelfFixToolsMixin.** Registers **10 tools** (well above the §15.2 ≥ 7 contract) so the loop is callable from the agent's tool registry. Phase-7 tools (`classify_failure`, `pause_current_task`, `restart_self`, `edit_self_file`) are intentionally out of scope.
- **CLI.** `gaia-coder feedback "<body>" --severity high --on <url>` enqueues, `gaia-coder self-fix process` runs one iteration.
- **Tests.** 47 new tests covering every §7.4 step, every §7.2 filtering rule, and the §15.2 mixin contract. Mocks Anthropic and `gh` at the callable boundary; uses a tmp git repo with a `coder` branch for real branch creation and pytest differential runs.

## Out of scope (explicit non-goals per the Phase 6 task)

- EventBridge ingestion of `@gaia-coder feedback:` comments (Phase 10 repo binding).
- Auto-merge (§7.6) — blocked on Phase 4 review-gate confidence score.
- Dev-mode self-edit (§7.5) and `restart_self` — Phase 7.
- ReAct loop self-edit (§7.8) — Phase 7.

## Dependencies

- **#818 (mixins):** Imports `edit_file` semantics inline rather than instantiating `FileToolsMixin`, to keep the module usable without the mixin wired.
- **#819 (scaffold):** Package skeleton + GAIA.md + prompts/ dir.
- **#820 (stores):** Uses `gaia.coder.stores.feedback` and `gaia.coder.stores.memory` directly.
- **Phase 4 review gate (sibling):** Loose-coupled — `review_gate_runner` is optional; absent, we publish a "(review gate not available)" placeholder in the PR body.
- **Phase 5 trust (sibling):** Loose-coupled — `request_em_approval` uses `gaia.coder.trust.inbox.enqueue` if importable, else defers.

## Test plan

- [ ] `pytest tests/coder/test_self_fix/ -xvs` — 47 tests pass.
- [ ] `pytest tests/coder/ -q` — full coder suite (120 tests) pass.
- [ ] `python util/lint.py --all` — no new critical errors (pre-existing pylint warnings in `cli.py`, `discovery.py`, etc. are not touched).
- [ ] `python -c "from gaia.coder.self_fix import SelfFixToolsMixin; m = SelfFixToolsMixin(); assert len(m.register_self_fix_tools()) >= 7"` — smoke.
- [ ] `gaia-coder feedback "..." --severity high --on https://github.com/amd/gaia/pull/9999 --id fb-test --db-path /tmp/fb.db` — writes a row; follow with `gaia-coder self-fix process --db-path /tmp/fb.db --repo-root <worktree> --skip-differential-verify --skip-fix-apply` for the end-to-end path (PR creation mocked).

## Merge plan

- Draft PR, **do not merge**. Rebase onto `coder` after Phase 4 and Phase 5 land so the review gate and inbox wiring become real imports.
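The differential check attributed to `verify_test_differential` (a regression test must fail on `coder` and pass on the fix branch) reduces to a small decision table. The sketch below is a hypothetical reduction that takes the two outcomes as booleans; the real function runs pytest in worktrees, and the `DifferentialError` type, the function name `verify_differential`, and the handling of the "passes on base, fails on fix" case are assumptions not stated in the PR body.

```python
class DifferentialError(RuntimeError):
    """Hypothetical error for a regression test that does not discriminate."""


def verify_differential(passes_on_base: bool, passes_on_fix: bool) -> None:
    """Accept only the outcome: fails on the base branch, passes on the fix branch."""
    if passes_on_base and passes_on_fix:
        raise DifferentialError(
            "test passes on both branches; it does not capture the reported bug"
        )
    if not passes_on_base and not passes_on_fix:
        raise DifferentialError(
            "test fails on both branches; the fix does not resolve the bug"
        )
    if passes_on_base and not passes_on_fix:
        # Not named in the PR body; treated here as an error by assumption.
        raise DifferentialError("test passes on base but fails on the fix branch")
```

Keeping the decision separate from the machinery that produces the two booleans makes the rule testable without a git repository.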

kovtcharov added 3 commits April 20, 2026 01:15

github-actions Bot added the dependencies (Dependency updates) and tests (Test changes) labels Apr 20, 2026

kovtcharov changed the base branch from feature/coding-agent to coder April 20, 2026 08:23

kovtcharov changed the base branch from coder to main April 20, 2026 08:25

kovtcharov changed the base branch from main to coder April 20, 2026 08:25

kovtcharov marked this pull request as ready for review April 20, 2026 09:20

kovtcharov requested a review from kovtcharov-amd as a code owner April 20, 2026 09:20

kovtcharov merged commit 2370e4b into coder Apr 20, 2026
7 checks passed

kovtcharov deleted the feature/gaia-coder-scaffold branch April 20, 2026 09:20

github-actions Bot mentioned this pull request Apr 20, 2026

feat(coder): File/CLI/Search mixins per §15.2 #818

Merged

5 tasks

This was referenced Apr 20, 2026

feat(coder): multi-pass self-review gate (Passes 1-7) #821

Merged

feat(coder): trust contract + EM CLI + prompt composer #822

Merged

kovtcharov mentioned this pull request Apr 20, 2026

fix(coder): restore package exports + CLAUDE.md fail-loudly compliance #823

Merged

2 tasks

kovtcharov mentioned this pull request Apr 20, 2026

fix(coder): review followups + land the plan doc #824

Merged

3 tasks

kovtcharov mentioned this pull request Apr 20, 2026

feat(coder): self-correction loop (§7.3-§7.9 + continuous critique) #825

Merged

5 tasks

kovtcharov mentioned this pull request Apr 26, 2026

feat: GAIA Code initialization #151

Open

2 tasks

kovtcharov merged 3 commits into coder from feature/gaia-coder-scaffold
