feat(coder): SQLite stores per §15.1 DDL#820
Conversation
Shared infrastructure for the per-store modules landing in subsequent commits: - src/gaia/coder/stores/_common.py exposes open_connection() with the canonical PRAGMAs (WAL, foreign_keys=ON, busy_timeout=5000) plus low-level insert / fetch / update helpers every per-store module wraps with typed signatures. The primitives are deliberately untyped at row-level — each store attaches its own Pydantic model. - tests/coder/conftest.py pre-populates sys.modules with a stub for gaia.coder.base so the stores tests stay isolated from in-flight sibling scaffold work on feature/gaia-coder-scaffold.
Phase 1 §15.1 verbatim DDL for em_inbox.db — the EM input queue. One row per inbound message from the bound EM (cli | tui | gh-comment | email | daily-standup-reply). CHECK constraints on channel, severity, and state. Indices on state and received_at. Exposes DDL, create_tables(), open_store() with canonical PRAGMAs, and the EmInboxRow pydantic model plus insert_row / get_row / update_row / list_rows helpers. tests cover table creation, round-trip, and CHECK rejection on invalid channel, severity, and state values.
§15.1 verbatim DDL for tasks.db — the agent's task queue. One row per engineering task through its full lifecycle: pending → running → waiting → blocked → paused → done → abandoned. Seven-valued CHECK on state; indices on state and on (priority DESC, created_at) so the worker can claim the highest-priority pending task efficiently. TaskRow pydantic model, full CRUD, and tests for creation, round-trip (including priority-DESC ordering), and CHECK rejection of an invalid state.
§15.1 verbatim DDL for feedback.db — the graduated-from-em_inbox feedback queue. Two CHECK enums: severity (low | med | high | critical) and fix_class covering the seven agent-fix classes plus out-of-scope (eight total: prompt, doc, test, tool, policy, architectural, state-machine, out-of-scope). Seven-valued state lifecycle: pending → triaged → in-fix → fix-pr-open → verified → rejected → closed. FeedbackRow pydantic model + full CRUD. Tests exercise all eight fix_class values round-trip and confirm CHECK rejection for invalid severity and invalid fix_class.
§15.1 verbatim DDL for spend.db — per-call cost ledger. Captures the four Anthropic token dimensions separately (input, cache_read, cache_create, output) plus the resolved USD amount, the call_site tag (triage | plan | pass_6_adversarial | continuous_critique | …), and the optional task_id FK. Indices on occurred_at and task_id power the weekly spend summary (§6.6). SpendRow pydantic model + full CRUD. Tests cover creation, a cache-heavy round-trip, and NOT-NULL enforcement for the required usd and input_tokens columns.
§15.1 verbatim DDL for audit.log.db — the append-only tool-call audit. One row per tool invocation: tool name, JSON args, JSON result, duration, exception-class-name on failure, and the loop_version / stage / state the call ran under. Integer AUTOINCREMENT primary key; three indices on occurred_at, task_id, and tool_name support §7.7 introspection and §8 review passes. AuditRow pydantic model. insert_row() ignores any caller-supplied id and returns the DB-assigned lastrowid so the audit log is append-only by construction. Tests exercise creation, full round-trip, NOT-NULL enforcement, and strictly-increasing auto-ids.
§15.1 verbatim DDL for ci_history.db — workflow-duration cache for §6.2 GitHub Actions monitoring. Compound primary key (workflow_name, branch, run_id) lets the agent estimate expected runtime for a workflow on a given branch without re-querying the API. CiHistoryRow pydantic model. get_row / update_row take the three PK components explicitly rather than a single id so callers can't confuse which column is which. Tests cover creation, compound-key round-trip, and duplicate-PK rejection via IntegrityError.
§15.1 verbatim DDL for memory.db — the relational side of the hybrid SQLite + FAISS memory store (§6.8). Eight-valued topic CHECK (review_patterns | failure_patterns | flaky_tests | em_preferences | adr_decisions | tool_usage_heuristics | task_outcomes | mutation_seeds) plus confidence BETWEEN 0 AND 100. The embedding_key column stores a FAISS id that Phase 10 will populate — FAISS integration is explicitly out of scope here. MemoryRow pydantic model + full CRUD. Tests cover creation, full round-trip with recall_count increment, and CHECK rejection for both an unknown topic value and an out-of-range confidence.
§7.5 / §15.1 JSON-per-file snapshots under paused-tasks/. Not SQLite: a paused task is read whole, written whole, never queried, and needs to round-trip arbitrary tool-call histories without schema migrations. write_snapshot() uses a .tmp-then-rename for atomic replacement. read_snapshot() raises FileNotFoundError rather than returning None (per CLAUDE.md "no silent fallbacks"). task_ids are validated against path-traversal characters. list_snapshots() returns every task_id present under the root. Tests cover round-trip, path-traversal rejection, and the loud-error contract for missing snapshots.
§6.4 / §15.1 append-only JSONL log of every self-edit PR — one JSON
object per line. Schema per §15.1 verbatim: ts, pr, fix_class,
files[], before_sha, after_sha, review_passes{static, functional,
arch, security, prose, adversarial, feedback_binding}, confidence,
em_review, auto_merged, feedback_id.
SelfEditRecord + nested ReviewPasses pydantic models validate both
writes and reads. append() writes one line; iter_records() / read_all()
replay them. append_dict() validates raw dicts against the schema so
callers can feed telemetry payloads directly.
Tests round-trip two records (confirming append semantics) and
confirm schema rejection for a malformed payload.
§4.6 / §15.1 append-only plain-text log of candidate GAIA.md promotions. Canonical single-line format: <ISO-8601 UTC> [task=<task_id>] Observed: "<observation>" → <candidate> Promotion candidate is constrained to 'GAIA.md', 'dismissed', or 'skill:<name>'. Observations must be single-line so grep/awk stay useful. LearningEntry pydantic model with two field validators enforces both rules at construction time. format_entry() and parse_line() keep the on-disk text reversible with the model. append() writes one line; iter_entries() / read_all() replay them, failing loudly on malformed lines. Tests round-trip two entries (GAIA.md + skill:<name> candidates), confirm the on-disk format is human-readable, and reject both an unknown promotion candidate and a multi-line observation.
Stores package entry point. Re-exports all ten sibling store modules
for `from gaia.coder.stores import em_inbox, tasks, ...` ergonomics
and adds the convenience helper `create_all_stores(root)`.
create_all_stores() materialises every store under a single directory
(the canonical layout is ~/.gaia/coder/) and returns a
{store_name: Path} map. For SQLite stores it opens the connection
with the canonical PRAGMAs, applies the DDL, and closes. For the
three non-SQL stores it creates the target directory
(paused-tasks/) or ensures the parent of the log file exists. The
function is idempotent so callers can run it on every boot.
Smoke test: test_create_all_stores() asserts every expected store
name is present, every SQLite file exists with its table
materialised, paused-tasks/ is a directory, and the helper returns
identical paths on a second invocation.
e5ac03b to
c468427
Compare
SummaryImplements the ten persistent stores called out in §15.1 of Issues Found🟡 SQL injection surface:
|
Three review-followups from the #818/#819/#820 merge: - src/gaia/coder/__init__.py: restore the CoderAgent/DEFAULT_LOOP/Loop/State/Transition re-exports that #819 added and the #818 rebase accidentally dropped. test_package_imports in test_skeleton.py relies on these. - src/gaia/coder/tools/cli.py: replace the bare except Exception: pass in the stream-reader teardown with a targeted OSError catch + logger.debug. Per CLAUDE.md's fail-loudly rule, silent swallows hide reader-thread bugs; a stream close failing under pipe tear-down is real and worth a debug line. - src/gaia/coder/tools/search.py: stop reaching into the private _TOOL_REGISTRY dict from grep(). Use get_tool_metadata() — the public accessor added in base/tools.py:113 exists precisely for this. Tests: all 73 (5 skeleton + 38 mixins + 30 stores) pass on coder HEAD with the fixes.
#823) ## Summary Three review-followups from the #818 / #819 / #820 merge, flagged by the auto-review bot. All tests (73/73 on `coder`) pass with the fixes. ## What this changes - **Critical** — `src/gaia/coder/__init__.py`: restore the `CoderAgent` / `DEFAULT_LOOP` / `Loop` / `State` / `Transition` re-exports that #819 added and #818's rebase accidentally dropped. Without this, `tests/coder/test_skeleton.py::test_package_imports` fails on `coder` HEAD. - **Important** — `src/gaia/coder/tools/cli.py:135`: replace bare `except Exception: pass` in the stream-reader teardown with a targeted `OSError` catch + `logger.debug`. CLAUDE.md's *No Silent Fallbacks* rule explicitly forbids this pattern. - **Important** — `src/gaia/coder/tools/search.py:59`: stop reaching into the private `_TOOL_REGISTRY` from `grep()`. Use `get_tool_metadata()` — the public accessor at `src/gaia/agents/base/tools.py:113` exists for this. ## Test plan - [x] `pytest tests/coder/ -x` — 73/73 pass (5 skeleton + 38 mixins + 30 stores) - [x] `python -c "from gaia.coder import CoderAgent, DEFAULT_LOOP, Loop, State, Transition"` — all imports resolve
Addresses the auto-review findings on #819 and #820: **Land the spec.** docs/plans/coder-agent.mdx was being referenced from 19 places in the scaffold (every module docstring, every living doc, every CLI body) but the file itself was untracked — every citation pointed at vapour. Commit it now so Phase 2+ branches inherit real citations. Also lands docs/superpowers/specs/2026-04-19-gaia-code-agent-analysis.md as the retained historical context per the plan's §14. **SQL identifier hardening** (src/gaia/coder/stores/_common.py). The CRUD primitives string-interpolated table names, column names, and ORDER BY clauses. SQLite has no built-in parameterisation for identifiers, so whitelist matching is the only safe path. Adds _safe_ident() + _SAFE_IDENT_RE and applies it at every interpolation site, plus multi-clause ORDER BY validation (split by comma, each clause validated for optional ASC/DESC direction). Auto-review flagged this as a latent injection surface; treating it as a latent vuln, not hypothetical. **Narrow the conftest catch** (tests/coder/conftest.py). The bare `except Exception` during stub fallback was masking import-time bugs in stores (AttributeError, SyntaxError). Now catches only ImportError/ModuleNotFoundError and logs the underlying reason so real failures surface in CI rather than silently stubbing over. Scaffold is now landed so step-1 (real import) always succeeds — this is a safety net, not a hot path. **Register gaia-coder in CLAUDE.md + docs.json**. The new console_scripts entry was missing from the Console Script Entry Points table (§Project Structure) and from the "Standalone binaries" section. Also adds plans/coder-agent to docs.json under the Agents group so Mintlify renders it. All 73 coder tests pass (5 skeleton + 38 mixins + 30 stores).
## Summary Addresses the auto-review findings on #819 + #820 and lands the plan that every scaffold citation was pointing at. ## Changes - **Land `docs/plans/coder-agent.mdx`** (3,584 lines). 19 scaffold citations were pointing at a file that didn't exist in tracked state. Also lands `docs/superpowers/specs/2026-04-19-gaia-code-agent-analysis.md` as the retained historical context. - **SQL identifier hardening** (`src/gaia/coder/stores/_common.py`). `_safe_ident()` + `_SAFE_IDENT_RE` validation at every interpolation site (table, column, ORDER BY). Multi-clause ORDER BY supported. - **Narrow conftest catch** (`tests/coder/conftest.py`). Only catches `ImportError` / `ModuleNotFoundError`; logs the reason. Real import-time bugs in stores now surface. - **Register `gaia-coder`** in `CLAUDE.md` Console Script Entry Points + Standalone binaries, and add `plans/coder-agent` to `docs/docs.json` under Agents. ## Test plan - [x] `pytest tests/coder/` — 73/73 pass (5 skeleton + 38 mixins + 30 stores) - [x] `docs/plans/coder-agent.mdx` resolves for every scaffold docstring citation - [x] SQL injection guard: `_safe_ident("x; DROP TABLE")` raises `ValueError`
## Summary Phase 4 of the `gaia-coder` plan: the seven-pass self-review gate that runs before she ever calls `gh_pr_create`. Implements the full §8 table — deterministic checks (lint, tests, security, prose) plus LLM-driven passes (architectural, persona, adversarial, feedback-binding) — and surfaces an aggregated verdict with the confidence score the §7.6 auto-merge path reads. This is the "deep-review discipline" the trust contract is built on. Every PR she opens must clear this gate; self-fix PRs additionally require Pass 7 (feedback-binding), which confirms the diff actually addresses the EM's wording and that the regression test fails on `coder` and passes on the branch. ## What this PR adds - **Seven review passes** (`pass_1_static` through `pass_7_feedback_binding`) — each a focused module that returns a common `PassResult` envelope, so the gate never has to special-case any of them. - **`gate.run_all_passes`** — orchestrator with cost-aware short-circuiting: a hard-fail on the deterministic cheap gates (1, 2, 4) skips the expensive Opus calls, because there is no point paying for adversarial review on a branch that fails lint. - **`ReviewToolsMixin`** — exposes every pass as a `@tool` (`review_diff_self_static`, …) plus the one-shot `review_diff_gate`, so the agent, the EM's TUI, and the evaluation harness all share one code path. - **Canonical prompt files** for Passes 3, 5, 6, 7 under `src/gaia/coder/prompts/`, materialised verbatim from §15.8 — self-fix PRs that touch a prompt are now a single-file `git diff`, the whole point of the §6.4 whitelist. - **22 unit tests** that cover every pass (one pass + one fail per module), the gate's short-circuit behaviour, the self-fix gating of Pass 7, the confidence-score contract, and a prompt-files-exist guard so the canonical templates can't silently drift. ## Why the LLM seam matters Every Opus call flows through `gaia.coder.review._llm.call_opus`. Tests patch that one name — cheap, stable, vendor-agnostic — rather than reaching into the Anthropic SDK. When we ship the Claude Agent SDK integration for Passes 3 and 6 (the `architecture-reviewer` / `code-reviewer` subagents in §15.8), the switch happens at a single module boundary without churning every test. ## Scope constraint Changes are confined to `src/gaia/coder/review/`, `src/gaia/coder/prompts/`, and `tests/coder/test_review.py`. No other files are touched. Pass 7's differential-pytest worktree machinery is wired but opt-out via `skip_differential_pytest=True` for unit tests, because Phase 4 intentionally does not ship a running daemon — that's Phase 5. ## Dependencies - Stacks on top of #818 (mixins; uses `gaia.coder.tools.cli._check_denylist` for subprocess safety) and #819 (scaffold). Both appear in this PR until they merge to `coder` — the diff will clean itself up afterwards. - Does not depend on #820 (stores). ## Citations - [`docs/plans/coder-agent.mdx`](../blob/feature/gaia-coder-review/docs/plans/coder-agent.mdx) §8 — the seven-pass table - §15.8 — canonical prompt templates for Passes 3/5/6/7 - §7.6 — confidence-score gate on auto-merge - §7.4 — self-correction loop that Pass 7 bookends ## Test plan - [x] `pytest tests/coder/test_review.py -xvs` — 22/22 pass - [x] `pytest tests/coder/` — 65/65 pass (sibling tests untouched) - [x] `python util/lint.py --black --isort` — clean on in-scope files - [x] Smoke: `from gaia.coder.review import ReviewToolsMixin; m = ReviewToolsMixin(); m.register_review_tools()` registers exactly 8 tools - [ ] Integration run against a real branch with Anthropic creds present (Phase 5 territory — deferred to the daemon wiring task) ## Do not merge Draft: this stacks on two unmerged sibling PRs and the Phase 4 runtime wiring lives in Phase 5.
## Summary
Lands Phase 5 of `docs/plans/coder-agent.mdx` — the EM-facing surface of
`gaia-coder`. Before this PR the CLI verbs were stubs; after it, the EM
can bootstrap the agent, read her trust contract, promote/demote her,
and queue messages. Every LLM call now injects her identity triplet
(`GAIA.md` + `ARCHITECTURE.md` + `PROJECT_MAP.md`) as a cacheable
prefix.
## Threads
- **`trust.py`** — `EMConfig` / `RepoBinding` Pydantic models, the 0-5
`CapabilityTier` ladder, TOML round-trip, and `promote` / `demote`
functions with audit-log writes. Promotion refuses mismatched EM
signatures; demotion is immediate. *Why it matters:* §4.2 makes
promotion explicit, so a quiet accept would let any caller escalate
tier. Fail-loudly TrustError surfaces exactly what to fix.
- **`inbox.py`** — thin CRUD over `em_inbox.db` with the §4.5 5-second
non-LLM auto-ack, channel-agnostic dispatch callable, and escalation
into `feedback.db` with severity translation. *Why it matters:*
§4.5 says the ack is non-negotiable in latency; keeping it template-
only (no model call) makes the SLA automatic.
- **`intent.py`** — LLM-driven conversational intent classifier for
§15.4 + §15.8 P9, temperature 0 Opus 4.7, mockable via an injected
`llm` callable. Low confidence (< 70) and unknown intents coerce to
`free_form`. Handler functions cover every §15.4 intent. *Why it
matters:* regex matchers would miss paraphrases ("let me give you
self-edit for now"); LLM routing keeps the grammar maintainable.
- **`prompt_composer.py`** — builds Anthropic-format message blocks
with `cache_control={"type":"ephemeral"}` on the identity triplet
+ per-skill blocks, per §3.2 / §4.6 / §6.5. *Why it matters:* §3.1
mandates prompt caching; this is the single place that decides what
gets cached.
- **`cli.py`** — replaces seven stub handlers (`trust`, `promote`,
`demote`, `ask`, `note`, `critical`, `inbox`) with real ones. Config
dir honours `$GAIA_CODER_HOME` so tests never touch real user state.
Stubs remain for the Phase 6+ verbs (`daemon`, `status`, `feedback`,
`doctor`, etc.).
- **`prompts/intent_classifier.md` + `prompts/standup.md`** — §15.8 P9
and P10 prompt templates landed verbatim for future `prompt`-class
self-fix PRs.
- **Tests (58 new)** — each module has a dedicated test file; CLI tests
run as subprocess to exercise the full argparse + env-var path.
## Dependencies
This PR depends on sibling branches that have not yet merged to `coder`:
- **#819 scaffold** — imports `gaia.coder.{__init__,base,loop}` and the
`GAIA.md` / `ARCHITECTURE.md` / `PROJECT_MAP.md` placeholders.
- **#820 stores** — hard dep on `gaia.coder.stores.{em_inbox, feedback,
audit}`.
- **#818 mixins** — optional import of `gaia.coder.tools.cli` for
subprocess helpers (not used in this PR's code paths).
Rebase onto `coder` once those land.
## Test plan
- [ ] `pytest
tests/coder/test_{trust,intent,inbox,prompt_composer,cli_trust}.py -xvs`
— all 58 tests pass
- [ ] `gaia-coder trust` on a fresh `$GAIA_CODER_HOME` prints the §4.1
bootstrap question and exits 0
- [ ] `gaia-coder trust --bootstrap --em-handle <you> --em-channel <ch>`
then `gaia-coder trust` renders the §4.2 template verbatim
- [ ] `gaia-coder promote --to-tier 2 --reason "..." --em-signature
<you>` updates `em.toml` and writes an audit row
- [ ] `gaia-coder promote ... --em-signature wrong-user` exits 1 with
the mismatch message on stderr
- [ ] `gaia-coder ask "enable self-edit"` prints the auto-ack template
and enqueues a pending row in `em_inbox.db`
- [ ] `gaia-coder inbox` lists the pending row
…825) ## Summary Phase 6 implements **gaia-coder's self-correction loop** — the core value proposition. When the EM gives feedback, the agent now triages it into one of eight fix classes, localises the cause, drafts a regression-tested plan, applies a fix on an `auto/gaia-coder/<feedback_id>` branch, and opens a draft PR targeting `coder`. Wires §7.2 continuous critique, §7.3 feedback intake, §7.4 loop steps 1-10, and §7.9 verification. Loosely coupled to Phase 4 (review gate) and Phase 5 (trust inbox) so they can land in any order. ## Threads - **Prompt templates (§15.8 P1/P2/P3).** `prompts/triage.md`, `prompts/critique.md`, and `prompts/plan_review.md` — the three canonical text bodies the loop consumes. Stored as plain Markdown so she can edit them via prompt-class self-fix PRs. - **Triage (§7.4 step 1-2).** `classify_fix_class` runs P1 on Opus 4.7; `< 60` confidence is rewritten to `out-of-scope` so the loop never commits to a guess. `localise` is deterministic grep — no LLM. - **Planner (§5.1 Stage 3 / §7.4 step 3).** `draft_plan` + `is_large_job` + `request_em_approval`. Large jobs (> 200 LoC, or `architectural` / `state-machine`, or cross-mixin) post the P3 message to the EM inbox and wait for ✅. - **Fixer (§7.4 steps 4-5).** `generate_fix` creates the self-fix branch and applies edits; `write_regression_test` emits a pytest file + marker flag so it *genuinely* fails on `coder` and passes on the fix branch (no clever mocks); `verify_test_differential` raises on the pass-on-both or fail-on-both failure modes. - **Publisher (§7.4 steps 7-8).** `open_self_fix_pr` refuses to open without a regression test (§7.4 step 5 hard rule). PR body cites `feedback_id` and quotes the EM's wording verbatim — Pass 7 (feedback-binding) depends on both. - **Verifier (§7.4 step 10).** `verify_on_merge` re-runs the regression test on the merged SHA, transitions the feedback row to `verified`, and writes `failure_patterns` + `review_patterns` memory records so the same symptom is recognised next time. - **Continuous critique (§7.2).** Cheap one-shot Opus call after every state-changing tool; findings `< 60` confidence are dropped; `≥ 80` surface inline for pre-transition action. - **Loop driver.** `FeedbackLoopDriver.process_pending_feedback()` orchestrates the whole thing with full `pending → triaged → in-fix → fix-pr-open → verified | rejected → closed` transitions written to `feedback.notes_json`. - **SelfFixToolsMixin.** Registers **10 tools** (well above the §15.2 ≥ 7 contract) so the loop is callable from the agent's tool registry. Phase-7 tools (`classify_failure`, `pause_current_task`, `restart_self`, `edit_self_file`) are intentionally out of scope. - **CLI.** `gaia-coder feedback "<body>" --severity high --on <url>` enqueues, `gaia-coder self-fix process` runs one iteration. - **Tests.** 47 new tests covering every §7.4 step, every §7.2 filtering rule, and the §15.2 mixin contract. Mocks Anthropic and `gh` at the callable boundary; uses a tmp git repo with a `coder` branch for real branch creation and pytest differential runs. ## Out of scope (explicit non-goals per the Phase 6 task) - EventBridge ingestion of `@gaia-coder feedback:` comments (Phase 10 repo binding). - Auto-merge (§7.6) — blocked on Phase 4 review-gate confidence score. - Dev-mode self-edit (§7.5) and `restart_self` — Phase 7. - ReAct loop self-edit (§7.8) — Phase 7. ## Dependencies - **#818 (mixins):** Imports `edit_file` semantics inline rather than instantiating `FileToolsMixin`, to keep the module usable without the mixin wired. - **#819 (scaffold):** Package skeleton + GAIA.md + prompts/ dir. - **#820 (stores):** Uses `gaia.coder.stores.feedback` and `gaia.coder.stores.memory` directly. - **Phase 4 review gate (sibling):** Loose-coupled — `review_gate_runner` is optional; absent, we publish a "(review gate not available)" placeholder in the PR body. - **Phase 5 trust (sibling):** Loose-coupled — `request_em_approval` uses `gaia.coder.trust.inbox.enqueue` if importable, else defers. ## Test plan - [ ] `pytest tests/coder/test_self_fix/ -xvs` — 47 tests pass. - [ ] `pytest tests/coder/ -q` — full coder suite (120 tests) pass. - [ ] `python util/lint.py --all` — no new critical errors (pre-existing pylint warnings in `cli.py`, `discovery.py`, etc. are not touched). - [ ] `python -c "from gaia.coder.self_fix import SelfFixToolsMixin; m = SelfFixToolsMixin(); assert len(m.register_self_fix_tools()) >= 7"` — smoke. - [ ] `gaia-coder feedback "..." --severity high --on https://github.com/amd/gaia/pull/9999 --id fb-test --db-path /tmp/fb.db` — writes a row; follow with `gaia-coder self-fix process --db-path /tmp/fb.db --repo-root <worktree> --skip-differential-verify --skip-fix-apply` for the end-to-end path (PR creation mocked). ## Merge plan - Draft PR, **do not merge**. Rebase onto `coder` after Phase 4 and Phase 5 land so the review gate and inbox wiring become real imports.
Summary
Implements the ten persistent stores called out in §15.1 of
docs/plans/coder-agent.mdx— the durable-state layer every Phase 1 / Phase 3 gaia-coder feature depends on (task queue, EM inbox, feedback queue, spend ledger, audit log, CI history, memory, paused-task snapshots, self-edit JSONL, learnings plain-text log). No business logic — just DDL, Pydantic row models, typed CRUD, and tests.em_inbox.db,tasks.db,feedback.db,spend.db,audit.log.db,ci_history.db,memory.db. Each ships the §15.1 DDL verbatim (grep-verifiable), acreate_tables()/open_store()pair with canonical PRAGMAs (WAL,foreign_keys=ON,busy_timeout=5000), andinsert_row/get_row/update_row/list_rowshelpers keyed off a per-store Pydantic v2 model.paused_tasks(JSON-per-file),self_edits_log(JSONL),learnings_log(plain text with a strict single-line grammar). All three use Pydantic for schema validation on the write side and fail loudly on read errors.stores/__init__.py::create_all_stores(root)— idempotent convenience that materialises every store under one directory and returns a{store_name: Path}map.tests/coder/test_stores.py— create / round-trip / CHECK-constraint for every SQLite store, round-trip / schema-rejection for the append-only logs, smoke test forcreate_all_stores.tests/coder/conftest.py— defensivesys.modulesstubs forgaia.coder.base/gaia.coder.loopso these tests stay isolated from sibling scaffold churn; becomes inert once the sibling branch stabilises.FAISS index wiring for the
memorystore is intentionally out of scope — that's Phase 10.Dependencies
Branched off
feature/gaia-coder-scaffold(sibling task, already merged into this branch via rebase). The three scaffold commits (daefa1d,a160148,da051cd) at the tip are required forfrom gaia.coder import CoderAgentto resolve, but nothing in this PR's 12 commits depends on the scaffold beyond defensive test stubs.Commits
One commit per store module (ten) plus one for the infra (
_common.py+ test conftest) and one for the package__init__withcreate_all_stores:feat(coder): stores — SQLite connection helpers + test harnessfeat(coder): em_inbox store + CRUD + testsfeat(coder): tasks store + CRUD + testsfeat(coder): feedback store + CRUD + testsfeat(coder): spend store + CRUD + testsfeat(coder): audit store + CRUD + testsfeat(coder): ci_history store + CRUD + testsfeat(coder): memory store + CRUD + tests (SQL side only)feat(coder): paused_tasks snapshot store + testsfeat(coder): self_edits_log JSONL append-only store + testsfeat(coder): learnings_log plain-text append-only store + testsfeat(coder): stores/__init__ with create_all_stores + smoke testTest plan
python -m pytest tests/coder/test_stores.py -xvs— 30 passed locally.python util/lint.py --black --isort— clean.src/gaia/coder/stores/:Do not merge
Draft — blocked on the sibling
feature/gaia-coder-scaffoldPR landing first so the git history stays clean once both land onfeature/coding-agent.