feat(coder): File/CLI/Search mixins per §15.2 by kovtcharov · Pull Request #818 · amd/gaia

kovtcharov · 2026-04-20T08:15:56Z

Summary

Three foundational no-network mixins for `gaia-coder` (per §15.2 of `docs/plans/coder-agent.mdx`). Unblocks scaffold and downstream mixin work: the scaffold task wires these into `CoderAgent.init`; GitHub/Web/OSSReuse mixins build on top.

Sibling streams: `feature/gaia-coder-scaffold` (agent class + loop), `feature/gaia-coder-stores` (§15.1 SQLite stores).

What's in the box

`FileToolsMixin` — `read_file`, `write_file`, `edit_file`, `search_code`, `glob`, `generate_diff`. Pure-Python (`re` + `pathlib` + `difflib`) so behaviour is deterministic on every platform. `read_file`/`edit_file` raise loudly on missing paths and non-unique matches per §2 principle 3.
`CLIToolsMixin` — `run_cli_command`, `stop_process`, `list_processes`, `get_process_logs`, plus the static-denylist layer of the §6.8 guardrail stack (`rm -rf /`, `sudo`, `chmod 777`, `curl | bash`, `git push origin main`) and a local `ShellDeniedError` (§15.8 will fold it into a shared taxonomy). Background processes land in a module-level registry with reader threads draining stdio so tails don't deadlock on pipe buffers.
`SearchToolsMixin` — `grep` (delegates to `search_code`), `find_symbol` (Python-only via `ast`), `list_files`. Non-Python `find_symbol` calls log WARN and return `[]` rather than silently skipping.

Test plan

`pytest tests/coder/test_mixins.py` — 38/38 passing (happy + failure + registry per tool)
`flake8 --max-line-length=88 --select=E9,F63,F7,F82,F401,F811 src/gaia/coder tests/coder` — clean
`black --check` / `isort --check-only` — clean
Integration smoke: `from gaia.coder.tools import *; FileToolsMixin().register_file_tools()` — 13 tools land in `_TOOL_REGISTRY`
`git diff --stat` — changes confined to `src/gaia/coder/` and `tests/coder/`

Not in this PR

`CoderAgent` class / `loop.py` — Stream A (`feature/gaia-coder-scaffold`)
SQLite stores — Stream C (`feature/gaia-coder-stores`)
`GitHubToolsMixin`, `WebToolsMixin`, `OSSReuseMixin` — follow-up tasks (require network)
Full exception taxonomy for denylist — §15.8

Status: draft — do not merge until the scaffold PR lands and wiring is verified end-to-end.

Implements §15.2 of the gaia-coder plan (docs/plans/coder-agent.mdx): read_file, write_file, edit_file, search_code, glob, generate_diff. Pure-Python (re + pathlib + difflib) so behaviour is deterministic on every platform — no dependency on ripgrep or external binaries. read_file and edit_file raise loudly on missing paths / non-unique matches per the fail-loudly principle rather than returning error envelopes. 18 tests under tests/coder/test_mixins.py cover happy path, failure modes, and registry wiring.

Implements §15.2 CLI tools: run_cli_command, stop_process, list_processes, get_process_logs, plus the static-denylist layer of the §6.8 guardrail stack (rm -rf /, sudo, chmod 777, curl | bash, git push origin main). Background processes land in a module-level _PROCESS_REGISTRY with reader threads draining stdout/stderr so foreground callers can tail logs without deadlocking pipe buffers. ShellDeniedError is defined locally for now — §15.8 will fold it into a shared exception taxonomy. 12 new tests covering happy path, non-zero exit, denylist, background lifecycle, log capture, and registry wiring (30 total).

Implements §15.2 Search tools: grep (delegates to search_code), find_symbol (Python-only via ast walk for FunctionDef/ClassDef/Import), list_files. SearchToolsMixin inherits from FileToolsMixin so agents that opt into search automatically get file tools — in practice you always want both. find_symbol logs a WARN and returns [] on repos with no .py files rather than silently skipping (§2 principle 3). 8 new tests covering happy path, kind-filter, unsupported-language WARN, recursive list, directory exclusion, and registry wiring (38 total).

github-actions · 2026-04-20T09:26:37Z

Code Review — #818 `feat(coder): File/CLI/Search mixins per §15.2`

Summary

Clean, well-tested foundation for the three no-network coder mixins. Stdlib-only implementation is portable and deterministic, the happy/failure/registration triad per tool is exemplary, and the 514-line test file is mostly readable. However, the post-rebase state of src/gaia/coder/__init__.py appears to drop the public re-exports added in #819 (CoderAgent, DEFAULT_LOOP, …), which breaks tests/coder/test_skeleton.py::test_package_imports. A handful of smaller issues around private-API coupling, a CLAUDE.md-forbidden silent except, and filesystem-walk hygiene are worth cleaning up in a follow-up.

Since the PR is already merged, flag these for a fast follow-up rather than a revert.

Issues Found

🔴 Critical

Package-level re-exports from #819 got dropped in the rebase (src/gaia/coder/__init__.py)

The diff replaces the __init__.py that #819 landed (which exposed CoderAgent, DEFAULT_LOOP, Loop, State, Transition) with a comment-only module. tests/coder/test_skeleton.py:50 still asserts those imports:

def test_package_imports() -> None:
    """Public re-exports must resolve from the top-level package."""
    from gaia.coder import CoderAgent  # noqa: F401  (import is the assertion)

Verified — after this PR's merge, gaia.coder.__init__ exports nothing. base.py:105 also has from gaia.coder import CoderAgent in its docstring example, so the docs are also wrong now.

Suggested fix — restore the exports while keeping the terser docstring from this PR:

# Copyright(C) 2026 Advanced Micro Devices, Inc. All rights reserved.
# SPDX-License-Identifier: MIT
"""gaia-coder — dev-tooling coding agent (see docs/plans/coder-agent.mdx).

This package is intentionally separate from ``src/gaia/agents/`` — the coder
is infrastructure for building GAIA, not a GAIA product agent.
"""

from gaia.coder.base import CoderAgent
from gaia.coder.loop import DEFAULT_LOOP, Loop, State, Transition

__all__ = [
    "CoderAgent",
    "DEFAULT_LOOP",
    "Loop",
    "State",
    "Transition",
]

🟡 Important

Silent except Exception: pass violates CLAUDE.md "No Silent Fallbacks" (src/gaia/coder/tools/cli.py:208)

finally:
    try:
        stream.close()
    except Exception:  # pylint: disable=broad-except
        pass

CLAUDE.md § No Silent Fallbacks — Fail Loudly explicitly calls this out and warns "The codebase has pre-existing except Exception: pass blocks… They are tech debt, not precedent". A stream-close failure here is real (e.g., OSError on a torn-down pipe) and swallowing it hides reader-thread bugs. Use a specific exception or re-raise with context:

        finally:
            try:
                stream.close()
            except OSError as e:
                logger.debug("reader thread: stream close failed (%s)", e)

Reaching into a private registry instead of the public accessor (src/gaia/coder/tools/search.py:672)

from gaia.agents.base.tools import _TOOL_REGISTRY
search_code = _TOOL_REGISTRY["search_code"]["function"]

The base module already ships a public helper at src/gaia/agents/base/tools.py:116:

def get_tool(tool_name: str) -> ...:
    """This is the public accessor for ``_TOOL_REGISTRY``.  Consumers outside
    [...] should use this function instead of importing ``_TOOL_REGISTRY``
    directly.
    """
    return _TOOL_REGISTRY.get(tool_name)

Use it:

            # Fetch search_code from the registry rather than re-importing so
            # tests that monkey-patch it still work.
            from gaia.agents.base.tools import get_tool

            entry = get_tool("search_code")
            if entry is None:
                raise RuntimeError(
                    "grep requires search_code to be registered; "
                    "call register_file_tools() first"
                )
            search_code = entry["function"]

Denylist is trivially bypassable AND prone to false positives (src/gaia/coder/tools/cli.py:146-164)

_SHELL_DENYLIST: tuple = ("rm -rf /", "sudo", "chmod 777", "curl | bash", "git push origin main")

Literal-substring matching against a joined argv:

sudo matches benign commands like sudoku, pseudo-compiler, --help-sudo.
git push origin main does not match ["git", "push", "-f", "origin", "main"] (different argv order / extra flags) — the most important case to block slips through.
curl | bash cannot match anything because subprocess never sees a pipe when shell=False.

The PR description calls out §15.8 will harden this, and the in-file comment is honest. But the v1 denylist gives the illusion of a guardrail without the substance — consider either (a) matching on argv[0] + flag intersection, or (b) making the denylist empty in v1 and landing §15.8 before any agent wires this in. Discuss with maintainer — happy to defer if §15.8 is near.

Recursive tree walks don't prune heavy directories (src/gaia/coder/tools/file.py:597-607, src/gaia/coder/tools/search.py:692)

search_code / find_symbol / list_files all walk .git/, .venv/, node_modules/, __pycache__/. On this repo alone .git/ has ~35k objects and node_modules/ in src/gaia/apps/webui/ has ~300 MB. A single find_symbol("foo") call will read every one of those files before returning. This is a perf regression and a latency footgun for the LLM loop.

Cheapest fix is a default skip-set in _iter_text_files:

_DEFAULT_SKIP_DIRS = frozenset({
    ".git", ".hg", ".svn", ".venv", "venv", "__pycache__",
    "node_modules", ".mypy_cache", ".pytest_cache", ".ruff_cache",
    "dist", "build", ".next",
})


def _iter_text_files(base: Path, glob_pattern: Optional[str]):
    """Yield files under ``base`` that match ``glob_pattern`` (or everything)."""
    if base.is_file():
        if glob_pattern is None or fnmatch.fnmatch(base.name, glob_pattern):
            yield base
        return
    for root, dirs, files in os.walk(base):
        dirs[:] = [d for d in dirs if d not in _DEFAULT_SKIP_DIRS]
        for name in files:
            if glob_pattern is not None and not fnmatch.fnmatch(name, glob_pattern):
                continue
            yield Path(root) / name

Apply the same _DEFAULT_SKIP_DIRS filter inside find_symbol's rglob loop (search.py:692) — or better, route find_symbol through _iter_text_files with glob="*.py".

🟢 Minor

Dead background processes are never GC'd from the registry (src/gaia/coder/tools/cli.py:341-357)

After a background command exits naturally (not via stop_process), its entry stays in _PROCESS_REGISTRY forever — list_processes keeps returning it with running=False. A long-running agent accumulates zombie-like entries. Cheap fix: prune entries where proc.poll() is not None inside list_processes.

SearchToolsMixin.register_search_tools() silently double-registers file tools (src/gaia/coder/tools/search.py:661)

SearchToolsMixin inherits FileToolsMixin and calls self.register_file_tools() inside register_search_tools(). A class that composes both mixins and calls both registration methods (a natural pattern given the KNOWN_TOOLS convention in src/gaia/agents/registry.py) will re-register every file tool. Base @tool decorator overwrites silently, so the bug is invisible until someone monkey-patches. Either document that callers must pick one, or add an idempotency guard.

Timing-dependent test (tests/coder/test_mixins.py:1177)

time.sleep(0.8)
logs = get_process_logs(pid, tail_lines=10)
assert "line" in logs

0.8s is fine on a dev laptop; on a loaded CI runner it will occasionally miss. Prefer polling with a short timeout:

        deadline = time.monotonic() + 5.0
        logs = ""
        while time.monotonic() < deadline:
            logs = get_process_logs(pid, tail_lines=10)
            if "line" in logs:
                break
            time.sleep(0.05)
        assert "line" in logs

search_code reads full file before decode-fail (src/gaia/coder/tools/file.py:546-548)

file_path.read_text(encoding="utf-8") loads the whole blob into memory, then UnicodeDecodeError fires on the first bad byte. A 500 MB vendored binary still gets fully read. A cheap size gate (skip files >1 MB for regex search) would harden this and pairs naturally with the skip-dir fix above.

PR description says draft — do not merge until the scaffold PR lands and wiring is verified end-to-end

It was merged anyway. Not a code issue — but worth flagging to @kovtcharov-amd that merge gates didn't honor the stated intent, and that "wiring verified end-to-end" is precisely the step that would have caught the Critical issue above.

Strengths

Test coverage is the right shape. Happy / failure / registration triad per tool, with tmp_path isolation and a registry_snapshot fixture that prevents test cross-contamination. test_edit_file_non_unique_raises, test_run_cli_command_denylist_sudo, and test_find_symbol_unsupported_language show care about edge cases.
Pipe-buffer deadlock avoidance is correct. _spawn_reader draining stdout/stderr on dedicated daemon threads is the standard fix for long-running background processes that produce stdio; easy to get wrong and this PR gets it right.
edit_file semantics mirror Claude Code Edit. Exact-match + uniqueness enforcement + opt-in replace_all is exactly what the calling LLM expects, and the ValueError messages name the problem in a way an agent can recover from (per CLAUDE.md actionable-errors rule).

Verdict

Approve with follow-up required — the PR itself is good work, but the rebase dropped the __init__.py exports from #819, which breaks tests/coder/test_skeleton.py. That needs a fix-forward PR within a day, along with the private-registry coupling in search.py and the except Exception: pass in cli.py:208. The rest (denylist hardening, skip-dirs, registry GC) can ride §15.8 or a dedicated polish pass.

Three review-followups from the #818/#819/#820 merge: - src/gaia/coder/__init__.py: restore the CoderAgent/DEFAULT_LOOP/Loop/State/Transition re-exports that #819 added and the #818 rebase accidentally dropped. test_package_imports in test_skeleton.py relies on these. - src/gaia/coder/tools/cli.py: replace the bare except Exception: pass in the stream-reader teardown with a targeted OSError catch + logger.debug. Per CLAUDE.md's fail-loudly rule, silent swallows hide reader-thread bugs; a stream close failing under pipe tear-down is real and worth a debug line. - src/gaia/coder/tools/search.py: stop reaching into the private _TOOL_REGISTRY dict from grep(). Use get_tool_metadata() — the public accessor added in base/tools.py:113 exists precisely for this. Tests: all 73 (5 skeleton + 38 mixins + 30 stores) pass on coder HEAD with the fixes.

#823) ## Summary Three review-followups from the #818 / #819 / #820 merge, flagged by the auto-review bot. All tests (73/73 on `coder`) pass with the fixes. ## What this changes - **Critical** — `src/gaia/coder/__init__.py`: restore the `CoderAgent` / `DEFAULT_LOOP` / `Loop` / `State` / `Transition` re-exports that #819 added and #818's rebase accidentally dropped. Without this, `tests/coder/test_skeleton.py::test_package_imports` fails on `coder` HEAD. - **Important** — `src/gaia/coder/tools/cli.py:135`: replace bare `except Exception: pass` in the stream-reader teardown with a targeted `OSError` catch + `logger.debug`. CLAUDE.md's *No Silent Fallbacks* rule explicitly forbids this pattern. - **Important** — `src/gaia/coder/tools/search.py:59`: stop reaching into the private `_TOOL_REGISTRY` from `grep()`. Use `get_tool_metadata()` — the public accessor at `src/gaia/agents/base/tools.py:113` exists for this. ## Test plan - [x] `pytest tests/coder/ -x` — 73/73 pass (5 skeleton + 38 mixins + 30 stores) - [x] `python -c "from gaia.coder import CoderAgent, DEFAULT_LOOP, Loop, State, Transition"` — all imports resolve

…rential Implements §7.4 steps 4-5 end-to-end: * generate_fix() creates auto/gaia-coder/<feedback_id> off the base ref (default 'coder' — §5.6, never main), then walks the caller-supplied EditHunks via an _edit_file_impl free function that mirrors the exact semantics of FileToolsMixin.edit_file from PR #818 (unique-match, opt-in replace_all, fail-loudly on missing). The branch is idempotent: re-runs check out the existing branch rather than error. * write_regression_test() emits a pytest file plus a marker flag so the generated test genuinely fails on the base branch (marker missing) and passes on the fix branch (marker present) — the §7.4 step 5 contract without needing any cleverness. Refuses to run unless the working tree is on a SELF_FIX_BRANCH_PREFIX branch. * verify_test_differential() runs pytest on both refs and raises RuntimeError if the fail-then-pass contract is violated (pass on both → regression not exercised; fail on both → fix does not fix). sys.executable is used for subprocess pytest invocations so hosts without a `python` symlink on PATH still work.

## Summary Phase 4 of the `gaia-coder` plan: the seven-pass self-review gate that runs before she ever calls `gh_pr_create`. Implements the full §8 table — deterministic checks (lint, tests, security, prose) plus LLM-driven passes (architectural, persona, adversarial, feedback-binding) — and surfaces an aggregated verdict with the confidence score the §7.6 auto-merge path reads. This is the "deep-review discipline" the trust contract is built on. Every PR she opens must clear this gate; self-fix PRs additionally require Pass 7 (feedback-binding), which confirms the diff actually addresses the EM's wording and that the regression test fails on `coder` and passes on the branch. ## What this PR adds - **Seven review passes** (`pass_1_static` through `pass_7_feedback_binding`) — each a focused module that returns a common `PassResult` envelope, so the gate never has to special-case any of them. - **`gate.run_all_passes`** — orchestrator with cost-aware short-circuiting: a hard-fail on the deterministic cheap gates (1, 2, 4) skips the expensive Opus calls, because there is no point paying for adversarial review on a branch that fails lint. - **`ReviewToolsMixin`** — exposes every pass as a `@tool` (`review_diff_self_static`, …) plus the one-shot `review_diff_gate`, so the agent, the EM's TUI, and the evaluation harness all share one code path. - **Canonical prompt files** for Passes 3, 5, 6, 7 under `src/gaia/coder/prompts/`, materialised verbatim from §15.8 — self-fix PRs that touch a prompt are now a single-file `git diff`, the whole point of the §6.4 whitelist. - **22 unit tests** that cover every pass (one pass + one fail per module), the gate's short-circuit behaviour, the self-fix gating of Pass 7, the confidence-score contract, and a prompt-files-exist guard so the canonical templates can't silently drift. ## Why the LLM seam matters Every Opus call flows through `gaia.coder.review._llm.call_opus`. Tests patch that one name — cheap, stable, vendor-agnostic — rather than reaching into the Anthropic SDK. When we ship the Claude Agent SDK integration for Passes 3 and 6 (the `architecture-reviewer` / `code-reviewer` subagents in §15.8), the switch happens at a single module boundary without churning every test. ## Scope constraint Changes are confined to `src/gaia/coder/review/`, `src/gaia/coder/prompts/`, and `tests/coder/test_review.py`. No other files are touched. Pass 7's differential-pytest worktree machinery is wired but opt-out via `skip_differential_pytest=True` for unit tests, because Phase 4 intentionally does not ship a running daemon — that's Phase 5. ## Dependencies - Stacks on top of #818 (mixins; uses `gaia.coder.tools.cli._check_denylist` for subprocess safety) and #819 (scaffold). Both appear in this PR until they merge to `coder` — the diff will clean itself up afterwards. - Does not depend on #820 (stores). ## Citations - [`docs/plans/coder-agent.mdx`](../blob/feature/gaia-coder-review/docs/plans/coder-agent.mdx) §8 — the seven-pass table - §15.8 — canonical prompt templates for Passes 3/5/6/7 - §7.6 — confidence-score gate on auto-merge - §7.4 — self-correction loop that Pass 7 bookends ## Test plan - [x] `pytest tests/coder/test_review.py -xvs` — 22/22 pass - [x] `pytest tests/coder/` — 65/65 pass (sibling tests untouched) - [x] `python util/lint.py --black --isort` — clean on in-scope files - [x] Smoke: `from gaia.coder.review import ReviewToolsMixin; m = ReviewToolsMixin(); m.register_review_tools()` registers exactly 8 tools - [ ] Integration run against a real branch with Anthropic creds present (Phase 5 territory — deferred to the daemon wiring task) ## Do not merge Draft: this stacks on two unmerged sibling PRs and the Phase 4 runtime wiring lives in Phase 5.

## Summary Lands Phase 5 of `docs/plans/coder-agent.mdx` — the EM-facing surface of `gaia-coder`. Before this PR the CLI verbs were stubs; after it, the EM can bootstrap the agent, read her trust contract, promote/demote her, and queue messages. Every LLM call now injects her identity triplet (`GAIA.md` + `ARCHITECTURE.md` + `PROJECT_MAP.md`) as a cacheable prefix. ## Threads - **`trust.py`** — `EMConfig` / `RepoBinding` Pydantic models, the 0-5 `CapabilityTier` ladder, TOML round-trip, and `promote` / `demote` functions with audit-log writes. Promotion refuses mismatched EM signatures; demotion is immediate. *Why it matters:* §4.2 makes promotion explicit, so a quiet accept would let any caller escalate tier. Fail-loudly TrustError surfaces exactly what to fix. - **`inbox.py`** — thin CRUD over `em_inbox.db` with the §4.5 5-second non-LLM auto-ack, channel-agnostic dispatch callable, and escalation into `feedback.db` with severity translation. *Why it matters:* §4.5 says the ack is non-negotiable in latency; keeping it template- only (no model call) makes the SLA automatic. - **`intent.py`** — LLM-driven conversational intent classifier for §15.4 + §15.8 P9, temperature 0 Opus 4.7, mockable via an injected `llm` callable. Low confidence (< 70) and unknown intents coerce to `free_form`. Handler functions cover every §15.4 intent. *Why it matters:* regex matchers would miss paraphrases ("let me give you self-edit for now"); LLM routing keeps the grammar maintainable. - **`prompt_composer.py`** — builds Anthropic-format message blocks with `cache_control={"type":"ephemeral"}` on the identity triplet + per-skill blocks, per §3.2 / §4.6 / §6.5. *Why it matters:* §3.1 mandates prompt caching; this is the single place that decides what gets cached. - **`cli.py`** — replaces seven stub handlers (`trust`, `promote`, `demote`, `ask`, `note`, `critical`, `inbox`) with real ones. Config dir honours `$GAIA_CODER_HOME` so tests never touch real user state. Stubs remain for the Phase 6+ verbs (`daemon`, `status`, `feedback`, `doctor`, etc.). - **`prompts/intent_classifier.md` + `prompts/standup.md`** — §15.8 P9 and P10 prompt templates landed verbatim for future `prompt`-class self-fix PRs. - **Tests (58 new)** — each module has a dedicated test file; CLI tests run as subprocess to exercise the full argparse + env-var path. ## Dependencies This PR depends on sibling branches that have not yet merged to `coder`: - **#819 scaffold** — imports `gaia.coder.{__init__,base,loop}` and the `GAIA.md` / `ARCHITECTURE.md` / `PROJECT_MAP.md` placeholders. - **#820 stores** — hard dep on `gaia.coder.stores.{em_inbox, feedback, audit}`. - **#818 mixins** — optional import of `gaia.coder.tools.cli` for subprocess helpers (not used in this PR's code paths). Rebase onto `coder` once those land. ## Test plan - [ ] `pytest tests/coder/test_{trust,intent,inbox,prompt_composer,cli_trust}.py -xvs` — all 58 tests pass - [ ] `gaia-coder trust` on a fresh `$GAIA_CODER_HOME` prints the §4.1 bootstrap question and exits 0 - [ ] `gaia-coder trust --bootstrap --em-handle <you> --em-channel <ch>` then `gaia-coder trust` renders the §4.2 template verbatim - [ ] `gaia-coder promote --to-tier 2 --reason "..." --em-signature <you>` updates `em.toml` and writes an audit row - [ ] `gaia-coder promote ... --em-signature wrong-user` exits 1 with the mismatch message on stderr - [ ] `gaia-coder ask "enable self-edit"` prints the auto-ack template and enqueues a pending row in `em_inbox.db` - [ ] `gaia-coder inbox` lists the pending row

…rential Implements §7.4 steps 4-5 end-to-end: * generate_fix() creates auto/gaia-coder/<feedback_id> off the base ref (default 'coder' — §5.6, never main), then walks the caller-supplied EditHunks via an _edit_file_impl free function that mirrors the exact semantics of FileToolsMixin.edit_file from PR #818 (unique-match, opt-in replace_all, fail-loudly on missing). The branch is idempotent: re-runs check out the existing branch rather than error. * write_regression_test() emits a pytest file plus a marker flag so the generated test genuinely fails on the base branch (marker missing) and passes on the fix branch (marker present) — the §7.4 step 5 contract without needing any cleverness. Refuses to run unless the working tree is on a SELF_FIX_BRANCH_PREFIX branch. * verify_test_differential() runs pytest on both refs and raises RuntimeError if the fail-then-pass contract is violated (pass on both → regression not exercised; fail on both → fix does not fix). sys.executable is used for subprocess pytest invocations so hosts without a `python` symlink on PATH still work.

…825) ## Summary Phase 6 implements **gaia-coder's self-correction loop** — the core value proposition. When the EM gives feedback, the agent now triages it into one of eight fix classes, localises the cause, drafts a regression-tested plan, applies a fix on an `auto/gaia-coder/<feedback_id>` branch, and opens a draft PR targeting `coder`. Wires §7.2 continuous critique, §7.3 feedback intake, §7.4 loop steps 1-10, and §7.9 verification. Loosely coupled to Phase 4 (review gate) and Phase 5 (trust inbox) so they can land in any order. ## Threads - **Prompt templates (§15.8 P1/P2/P3).** `prompts/triage.md`, `prompts/critique.md`, and `prompts/plan_review.md` — the three canonical text bodies the loop consumes. Stored as plain Markdown so she can edit them via prompt-class self-fix PRs. - **Triage (§7.4 step 1-2).** `classify_fix_class` runs P1 on Opus 4.7; `< 60` confidence is rewritten to `out-of-scope` so the loop never commits to a guess. `localise` is deterministic grep — no LLM. - **Planner (§5.1 Stage 3 / §7.4 step 3).** `draft_plan` + `is_large_job` + `request_em_approval`. Large jobs (> 200 LoC, or `architectural` / `state-machine`, or cross-mixin) post the P3 message to the EM inbox and wait for ✅. - **Fixer (§7.4 steps 4-5).** `generate_fix` creates the self-fix branch and applies edits; `write_regression_test` emits a pytest file + marker flag so it *genuinely* fails on `coder` and passes on the fix branch (no clever mocks); `verify_test_differential` raises on the pass-on-both or fail-on-both failure modes. - **Publisher (§7.4 steps 7-8).** `open_self_fix_pr` refuses to open without a regression test (§7.4 step 5 hard rule). PR body cites `feedback_id` and quotes the EM's wording verbatim — Pass 7 (feedback-binding) depends on both. - **Verifier (§7.4 step 10).** `verify_on_merge` re-runs the regression test on the merged SHA, transitions the feedback row to `verified`, and writes `failure_patterns` + `review_patterns` memory records so the same symptom is recognised next time. - **Continuous critique (§7.2).** Cheap one-shot Opus call after every state-changing tool; findings `< 60` confidence are dropped; `≥ 80` surface inline for pre-transition action. - **Loop driver.** `FeedbackLoopDriver.process_pending_feedback()` orchestrates the whole thing with full `pending → triaged → in-fix → fix-pr-open → verified | rejected → closed` transitions written to `feedback.notes_json`. - **SelfFixToolsMixin.** Registers **10 tools** (well above the §15.2 ≥ 7 contract) so the loop is callable from the agent's tool registry. Phase-7 tools (`classify_failure`, `pause_current_task`, `restart_self`, `edit_self_file`) are intentionally out of scope. - **CLI.** `gaia-coder feedback "<body>" --severity high --on <url>` enqueues, `gaia-coder self-fix process` runs one iteration. - **Tests.** 47 new tests covering every §7.4 step, every §7.2 filtering rule, and the §15.2 mixin contract. Mocks Anthropic and `gh` at the callable boundary; uses a tmp git repo with a `coder` branch for real branch creation and pytest differential runs. ## Out of scope (explicit non-goals per the Phase 6 task) - EventBridge ingestion of `@gaia-coder feedback:` comments (Phase 10 repo binding). - Auto-merge (§7.6) — blocked on Phase 4 review-gate confidence score. - Dev-mode self-edit (§7.5) and `restart_self` — Phase 7. - ReAct loop self-edit (§7.8) — Phase 7. ## Dependencies - **#818 (mixins):** Imports `edit_file` semantics inline rather than instantiating `FileToolsMixin`, to keep the module usable without the mixin wired. - **#819 (scaffold):** Package skeleton + GAIA.md + prompts/ dir. - **#820 (stores):** Uses `gaia.coder.stores.feedback` and `gaia.coder.stores.memory` directly. - **Phase 4 review gate (sibling):** Loose-coupled — `review_gate_runner` is optional; absent, we publish a "(review gate not available)" placeholder in the PR body. - **Phase 5 trust (sibling):** Loose-coupled — `request_em_approval` uses `gaia.coder.trust.inbox.enqueue` if importable, else defers. ## Test plan - [ ] `pytest tests/coder/test_self_fix/ -xvs` — 47 tests pass. - [ ] `pytest tests/coder/ -q` — full coder suite (120 tests) pass. - [ ] `python util/lint.py --all` — no new critical errors (pre-existing pylint warnings in `cli.py`, `discovery.py`, etc. are not touched). - [ ] `python -c "from gaia.coder.self_fix import SelfFixToolsMixin; m = SelfFixToolsMixin(); assert len(m.register_self_fix_tools()) >= 7"` — smoke. - [ ] `gaia-coder feedback "..." --severity high --on https://github.com/amd/gaia/pull/9999 --id fb-test --db-path /tmp/fb.db` — writes a row; follow with `gaia-coder self-fix process --db-path /tmp/fb.db --repo-root <worktree> --skip-differential-verify --skip-fix-apply` for the end-to-end path (PR creation mocked). ## Merge plan - Draft PR, **do not merge**. Rebase onto `coder` after Phase 4 and Phase 5 land so the review gate and inbox wiring become real imports.

github-actions Bot added the tests Test changes label Apr 20, 2026

kovtcharov changed the base branch from feature/coding-agent to coder April 20, 2026 08:17

kovtcharov changed the base branch from coder to main April 20, 2026 08:25

kovtcharov changed the base branch from main to coder April 20, 2026 08:25

kovtcharov marked this pull request as ready for review April 20, 2026 09:20

kovtcharov requested a review from kovtcharov-amd as a code owner April 20, 2026 09:20

kovtcharov added 3 commits April 20, 2026 02:21

kovtcharov force-pushed the feature/gaia-coder-mixins-1 branch from 4a03aef to 67eb7a5 Compare April 20, 2026 09:21

kovtcharov merged commit bf18bbe into coder Apr 20, 2026
2 checks passed

kovtcharov deleted the feature/gaia-coder-mixins-1 branch April 20, 2026 09:21

This was referenced Apr 20, 2026

feat(coder): multi-pass self-review gate (Passes 1-7) #821

Merged

feat(coder): trust contract + EM CLI + prompt composer #822

Merged

kovtcharov mentioned this pull request Apr 20, 2026

fix(coder): restore package exports + CLAUDE.md fail-loudly compliance #823

Merged

2 tasks

kovtcharov mentioned this pull request Apr 20, 2026

feat(coder): self-correction loop (§7.3-§7.9 + continuous critique) #825

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(coder): File/CLI/Search mixins per §15.2#818

feat(coder): File/CLI/Search mixins per §15.2#818
kovtcharov merged 3 commits into
coderfrom
feature/gaia-coder-mixins-1

kovtcharov commented Apr 20, 2026

Uh oh!

Uh oh!

github-actions Bot commented Apr 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

kovtcharov commented Apr 20, 2026

Summary

What's in the box

Test plan

Not in this PR

Uh oh!

Uh oh!

github-actions Bot commented Apr 20, 2026

Code Review — #818 feat(coder): File/CLI/Search mixins per §15.2

Summary

Issues Found

🔴 Critical

🟡 Important

🟢 Minor

Strengths

Verdict

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Code Review — #818 `feat(coder): File/CLI/Search mixins per §15.2`