feat(coder): trust contract + EM CLI + prompt composer#822

Merged
kovtcharov merged 6 commits into coder from feature/gaia-coder-trust on Apr 20, 2026

@kovtcharov
Collaborator

Summary

Lands Phase 5 of docs/plans/coder-agent.mdx — the EM-facing surface of
gaia-coder. Before this PR the CLI verbs were stubs; after it, the EM
can bootstrap the agent, read her trust contract, promote/demote her,
and queue messages. Every LLM call now injects her identity triplet
(GAIA.md + ARCHITECTURE.md + PROJECT_MAP.md) as a cacheable prefix.

Threads

  • trust.py: EMConfig / RepoBinding Pydantic models, the 0-5
    CapabilityTier ladder, TOML round-trip, and promote / demote
    functions with audit-log writes. Promotion refuses mismatched EM
    signatures; demotion is immediate. Why it matters: §4.2 makes
    promotion explicit, so a quiet accept would let any caller escalate
    tier. Fail-loudly TrustError surfaces exactly what to fix.
  • inbox.py — thin CRUD over em_inbox.db with the §4.5 5-second
    non-LLM auto-ack, channel-agnostic dispatch callable, and escalation
    into feedback.db with severity translation. Why it matters:
    §4.5 says the ack is non-negotiable in latency; keeping it template-
    only (no model call) makes the SLA automatic.
  • intent.py — LLM-driven conversational intent classifier for
    §15.4 + §15.8 P9, temperature 0 Opus 4.7, mockable via an injected
    llm callable. Low confidence (< 70) and unknown intents coerce to
    free_form. Handler functions cover every §15.4 intent. Why it
    matters: regex matchers would miss paraphrases ("let me give you
    self-edit for now"); LLM routing keeps the grammar maintainable.
  • prompt_composer.py — builds Anthropic-format message blocks
    with cache_control={"type":"ephemeral"} on the identity triplet
    + per-skill blocks, per §3.2 / §4.6 / §6.5. Why it matters: §3.1
    mandates prompt caching; this is the single place that decides what
    gets cached.
  • cli.py — replaces seven stub handlers (trust, promote,
    demote, ask, note, critical, inbox) with real ones. Config
    dir honours $GAIA_CODER_HOME so tests never touch real user state.
    Stubs remain for the Phase 6+ verbs (daemon, status, feedback,
    doctor, etc.).
  • prompts/intent_classifier.md + prompts/standup.md — §15.8 P9
    and P10 prompt templates landed verbatim for future prompt-class
    self-fix PRs.
  • Tests (58 new) — each module has a dedicated test file; CLI tests
    run as subprocess to exercise the full argparse + env-var path.

Dependencies

This PR depends on sibling branches that have not yet merged to coder:

Rebase onto coder once those land.

Test plan

  • pytest tests/coder/test_{trust,intent,inbox,prompt_composer,cli_trust}.py -xvs — all 58 tests pass
  • gaia-coder trust on a fresh $GAIA_CODER_HOME prints the §4.1 bootstrap question and exits 0
  • gaia-coder trust --bootstrap --em-handle <you> --em-channel <ch> then gaia-coder trust renders the §4.2 template verbatim
  • gaia-coder promote --to-tier 2 --reason "..." --em-signature <you> updates em.toml and writes an audit row
  • gaia-coder promote ... --em-signature wrong-user exits 1 with the mismatch message on stderr
  • gaia-coder ask "enable self-edit" prints the auto-ack template and enqueues a pending row in em_inbox.db
  • gaia-coder inbox lists the pending row

@github-actions github-actions Bot added the tests Test changes label Apr 20, 2026
Adds the durable-state half of the §4 trust contract for gaia-coder:

- EMConfig Pydantic model mirrors em.toml's schema from §7.1 (em_handle,
  em_channel, persona_name, dev_mode_self_edit, allow_state_machine_edit,
  auto_merge_classes, current_tier).
- CapabilityTier IntEnum 0-5 with the §4.2 labels.
- RepoBinding Pydantic model for repo_binding.toml from §15.6.
- load_em_config / save_em_config — TOML round-trip (hand-rolled writer
  since tomli_w is not in the dep set; hand-validated by the test suite).
- promote() rejects mismatched EM signatures with a TrustError that names
  the expected handle, the received handle, and the remediation command.
  This is load-bearing — §4.2 makes promotion explicit; a silent accept
  would let any caller escalate tier.
- demote() is immediate and requires no signature per §4.2; a reason is
  captured for the audit row but not enforced.
- tier_history() reconstructs the timeline from audit.log.db for
  `gaia-coder trust --history`.

Fail-loudly policy: every invariant violation raises TrustError with
what-failed / what-to-do / where-to-look. No silent fallbacks.

Tests cover em.toml round-trip, bad-signature promote, empty-reason
promote, tier-range validation, demote-target ordering, chronological
history reconstruction, tier label stability, and RepoBinding schema
validation.
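The signature check described above can be sketched roughly as follows. This is an illustration only, not the code in trust.py: it uses a plain dataclass in place of the Pydantic EMConfig, models only two of the em.toml fields, skips the audit-log write, and the error wording is an assumption.

```python
from dataclasses import dataclass, replace


class TrustError(Exception):
    """Raised when a trust-contract invariant is violated."""


@dataclass(frozen=True)
class EMConfig:
    # Hypothetical stand-in for the Pydantic model; only two fields modelled.
    em_handle: str
    current_tier: int = 0


def promote(cfg: EMConfig, to_tier: int, reason: str, em_signature: str) -> EMConfig:
    """Return a new config at to_tier, or raise TrustError saying what to fix."""
    if em_signature != cfg.em_handle:
        # Name the expected handle, the received handle, and the remediation.
        raise TrustError(
            f"signature mismatch: expected {cfg.em_handle!r}, got {em_signature!r}; "
            f"re-run with --em-signature {cfg.em_handle}"
        )
    if not reason.strip():
        raise TrustError("promotion requires a non-empty --reason")
    if not 0 <= to_tier <= 5:
        raise TrustError(f"tier {to_tier} is outside the 0-5 ladder")
    return replace(cfg, current_tier=to_tier)
```

Returning a new frozen object rather than mutating in place keeps the "nothing changed unless promote() returned" property easy to test.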
Thin stateless wrapper over em_inbox.db:

- enqueue(severity, body, from_handle, channel) with invariant checks on
  severity and channel — we raise InboxError rather than let SQLite's
  CHECK constraint surface an opaque IntegrityError, so the caller gets
  an actionable message.
- auto_ack() emits the §4.5 non-LLM template ("I see your message; will
  respond at next breakpoint"); deliberately no model call so the ack
  cannot be delayed by a slow turn. Takes an optional dispatch callable
  so CLI, GitHub comment, email, and daily-standup-reply channels can
  plug in without this module growing channel adapters.
- mark_seen / mark_answered / escalate cover the four §15.1 state
  transitions. escalate() translates em_inbox's info/question/critical
  ladder to feedback.db's low/med/high/critical ladder per §7.3.
- poll_at_breakpoint() returns pending rows oldest-first for the ReAct
  loop to service at natural breakpoints (§4.5).

Tests include a round-trip latency measurement (must be under 5s per
§4.5; in practice < 1ms), severity translation, idempotent mark_seen,
and escalation moving a row from em_inbox into feedback with the
right fields copied over.
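A minimal sketch of the enqueue / auto_ack split, under stated assumptions: the table schema, the channel names, and the `[id]` prefix on the ack are invented for illustration (the PR does not show them); only the severity ladder and the ack wording come from the description above.

```python
import sqlite3
from typing import Callable, Optional

ACK_TEMPLATE = "I see your message; will respond at next breakpoint"
VALID_SEVERITIES = {"info", "question", "critical"}
# Channel names are assumptions made for this sketch.
VALID_CHANNELS = {"cli", "github", "email", "standup"}


class InboxError(Exception):
    """Raised before SQLite gets a chance to reject a bad row."""


def open_inbox() -> sqlite3.Connection:
    # Hypothetical schema; the real em_inbox.db layout is not shown in the PR.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE em_inbox ("
        "id INTEGER PRIMARY KEY, severity TEXT, body TEXT, "
        "from_handle TEXT, channel TEXT, state TEXT DEFAULT 'pending')"
    )
    return conn


def enqueue(conn: sqlite3.Connection, severity: str, body: str,
            from_handle: str, channel: str) -> int:
    """Validate in Python so the caller sees an actionable message."""
    if severity not in VALID_SEVERITIES:
        raise InboxError(
            f"invalid severity {severity!r}; expected one of {sorted(VALID_SEVERITIES)}"
        )
    if channel not in VALID_CHANNELS:
        raise InboxError(
            f"invalid channel {channel!r}; expected one of {sorted(VALID_CHANNELS)}"
        )
    cur = conn.execute(
        "INSERT INTO em_inbox (severity, body, from_handle, channel) "
        "VALUES (?, ?, ?, ?)",
        (severity, body, from_handle, channel),
    )
    conn.commit()
    return cur.lastrowid


def auto_ack(msg_id: int, dispatch: Optional[Callable[[str], None]] = None) -> str:
    """Format the template ack (no model call) and hand it to the channel."""
    text = f"[{msg_id}] {ACK_TEMPLATE}"
    (dispatch or print)(text)
    return text
```

Because auto_ack only formats a string and calls the injected dispatch, its latency is bounded by the channel, not by a model turn.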
…8 P9)

The EM writes free text ("enable self-edit permanently", "promote to
tier 3") and the agent needs to map that to a concrete action. Keyword
matchers would need one branch per phrasing; a small Opus call with the
full intent table in the prompt is both more maintainable and more
accurate (§15.4's grammar is broader than any reasonable regex).

This commit adds:

- INTENT_CATALOG listing every §15.4 intent with a one-line description
  the classifier prompt renders verbatim.
- classify_intent(em_message, allowed_intents, llm=None) — LLM-driven,
  temperature 0, ≤200 tokens out. The llm parameter is a Callable[[str],
  str] so tests inject a lambda that returns canned JSON. The default
  lazy-imports the Anthropic SDK and fails loudly with install guidance
  if the SDK is missing — no hidden fallback to a regex matcher.
- build_prompt() is exposed publicly so §15.4's requirement "every
  classification is audit-logged with the raw message and the returned
  JSON" can be satisfied without this module doing the audit write
  itself.
- Low-confidence (< MIN_CONFIDENCE=70) and unknown-intent responses are
  coerced to free_form per §15.4's bail-out rule; the original
  classification is preserved in args so the audit log can still show
  what the model thought.
- Handler functions for every intent in the table: enable_self_edit_*,
  disable_self_edit, promote/demote_tier, grant_per_call_selfedit,
  what_tier, spend_query (stub — aggregator lands in Phase 3), pause,
  resume, authorise_sensitive, feedback (escalates to feedback.db via
  gaia.coder.inbox.escalate), skill_invoke (stub — Phase 4.7 loader).
- HandlerContext dataclass bundles every resource a handler might need
  (em.toml path, parsed EMConfig, three open connections, mutable
  session dict) so adding a future handler doesn't require editing
  every existing signature.

Tests use a mock llm and cover the five §15.4 canonical phrases, the
low-confidence bail-out, hallucinated-intent coercion, malformed-JSON
loudness, prompt-building invariants, triple-quote escaping, and the
session-flag / em.toml / audit side effects of the happy-path handlers.
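The injected-callable contract and the free_form bail-out can be sketched as below. This is a simplified illustration: the prompt text and the JSON field names `intent` and `confidence` are assumptions, `llm` is required here rather than defaulting to the Anthropic SDK, and the audit write is left out.

```python
import json
from typing import Callable, Iterable

MIN_CONFIDENCE = 70


def classify_intent(em_message: str, allowed_intents: Iterable[str],
                    llm: Callable[[str], str]) -> dict:
    """Classify em_message via the injected llm; coerce doubtful results."""
    allowed = set(allowed_intents)
    # Hypothetical prompt; the real one is rendered from intent_classifier.md.
    prompt = (
        "Classify the message into one of: " + ", ".join(sorted(allowed))
        + "\nMessage: " + em_message
    )
    raw = llm(prompt)
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as e:
        # Malformed output is an error, not a silent fallback.
        raise ValueError(f"classifier returned malformed JSON: {raw!r}") from e
    if not isinstance(parsed, dict):
        raise ValueError(f"classifier returned non-object JSON: {raw!r}")
    intent = parsed.get("intent")
    confidence = parsed.get("confidence", 0)
    if intent not in allowed or confidence < MIN_CONFIDENCE:
        # Keep the original classification so an audit log can still show it.
        return {"intent": "free_form", "confidence": confidence,
                "args": {"original": parsed}}
    return {"intent": intent, "confidence": confidence,
            "args": parsed.get("args", {})}
```

In a test, `llm` can be a lambda returning canned JSON, so no network access is needed.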
… §6.5)

Every LLM call ships the same three-document identity prefix: GAIA.md
(who she is, §4.6), ARCHITECTURE.md (how she is composed, §6.5), and
PROJECT_MAP.md (what she is building, §6.5). §15.8 prompt templates all
refer to these tags as cacheable prefix segments — this module is where
the list of blocks actually gets built.

- compose_system_prompt(LoopContext, matched_skills) returns a list of
  Anthropic-format MessageBlock dicts. Every identity block carries
  cache_control={"type":"ephemeral"} so the §3.1 / §6.6 mandatory prompt
  caching actually hits.
- Matched skills land after the identity prefix (§4.7 loading algorithm:
  "Matching skills are loaded into the turn's system prompt *after*
  GAIA.md + ARCHITECTURE.md but *before* the task instructions"). Each
  skill is cache-keyed independently so the identity block's cache hit
  is preserved when the skill set changes turn-over-turn.
- extra_suffix appends un-cached per-turn facts ("current tier: 3") so
  dynamic state doesn't force the identity docs to churn.
- Missing identity docs raise FileNotFoundError with an actionable
  message — §4.6 calls these files load-bearing; silent omission would
  be a bigger bug than a loud crash.

LoopContext is kept tiny on purpose: identity_root + skill_paths +
extra_suffix. If the composer needs more loop state, we surface it
through this dataclass rather than have the composer reach into the
loop directly.

Tests include the order check (GAIA → ARCHITECTURE → PROJECT_MAP), the
cache_control assertion, skill-block placement, the extra_suffix un-cached
tail, missing-doc loudness, and an integration-style check that the
default identity root actually contains the three shipped files.
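The block ordering described above might look like the following sketch. It flattens LoopContext into plain arguments and skips the XML-tag wrapping, both simplifications made here for brevity; the block shape (`type`, `text`, `cache_control`) follows the Anthropic Messages API format for system content blocks.

```python
from pathlib import Path
from typing import Iterable, Optional

IDENTITY_DOCS = ("GAIA.md", "ARCHITECTURE.md", "PROJECT_MAP.md")
EPHEMERAL = {"type": "ephemeral"}


def compose_system_prompt(identity_root: Path,
                          skill_paths: Iterable[Path] = (),
                          extra_suffix: Optional[str] = None) -> list:
    """Build system blocks: cached identity docs, cached skills, uncached tail."""
    blocks = []
    for name in IDENTITY_DOCS:
        path = Path(identity_root) / name
        if not path.is_file():
            # Fail loudly: a missing identity doc must not be silently skipped.
            raise FileNotFoundError(
                f"identity doc {name} not found under {identity_root}; "
                "restore it before making any LLM call"
            )
        blocks.append({
            "type": "text",
            "text": path.read_text(encoding="utf-8"),
            "cache_control": dict(EPHEMERAL),
        })
    for skill in skill_paths:
        # Skills follow the identity prefix and are cached independently.
        blocks.append({
            "type": "text",
            "text": Path(skill).read_text(encoding="utf-8"),
            "cache_control": dict(EPHEMERAL),
        })
    if extra_suffix:
        # Per-turn facts carry no cache_control, so they never churn the prefix.
        blocks.append({"type": "text", "text": extra_suffix})
    return blocks
```

Keeping the dynamic suffix last means a change such as "current tier: 3" leaves every block before it byte-identical from one turn to the next.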
…templates

§15.8 of the spec lists ten canonical prompt templates under
src/gaia/coder/prompts/. This commit lands the two Phase 5 ones
verbatim from the spec:

- intent_classifier.md (P9): fires on every `gaia-coder ask "..."`.
  Opus 4.7, temperature 0, max 200 tokens. Returns JSON matching the
  §15.4 intent grammar; confidence < 70 or unknown intent coerces to
  free_form. :mod:`gaia.coder.intent`'s build_prompt() renders this
  template at runtime — the file is here for human review, audit
  traceability, and the future prompt-class self-fix workflow (§7.4).
- standup.md (P10) — fires daily 09:00 EM-local and weekly Fri 17:00
  per §4.4. Opus 4.7, temperature 0.3 (a touch of variety on prose per
  §15.8 header), max 2000 tokens. Wires the cacheable
  <gaia_md_persona_section> tag that prompt_composer.py also emits.

Both prompts follow the §15.8 "Convention" rule: explicit
response-format section, XML tags for slots, cacheable prefix segments
(GAIA.md / ARCHITECTURE.md / persona section) separated from per-turn
suffix content.
…al/inbox

Replaces the Phase-1 "not yet implemented" stubs for the seven EM-facing
verbs with real handlers backed by gaia.coder.{trust,inbox,intent}.

- `gaia-coder trust` renders the §4.2 tier summary verbatim (Tier:, EM:,
  "At this tier you may:", "At this tier you may NOT yet:" — these
  labels are load-bearing; a typo breaks the "first-glance view of the
  trust contract"). When em.toml is absent, trust halts with the §4.1
  bootstrap question rather than silently falling through to Tier 0.
- `gaia-coder trust --bootstrap` records em.toml from --em-handle /
  --em-channel / --persona-name so the user never has to hand-edit TOML
  (§7.1's rationale: config-file UX is a feature that gets used rarely).
- `gaia-coder trust --history` dumps tier-change events oldest-first
  from audit.log.db, one line per event with handle + reason.
- `gaia-coder promote` validates the EM signature via
  trust.promote(); rejection surfaces the exception message to stderr
  and exits 1. Silent accept would violate §4.2.
- `gaia-coder demote` runs trust.demote() with no signature check.
- `gaia-coder ask | note | critical "..."` enqueue inbox rows at
  question/info/critical severity and print the §4.5 auto-ack template.
- `gaia-coder inbox` lists pending rows + recent history.

Config-dir resolution honours $GAIA_CODER_HOME so tests can isolate
state in a pytest tmp_path — no shared ~/.gaia/coder/ between tests.

Stubs (daemon, status, feedback, audit, spend, egress, introspect,
skill, doctor, rag) still print "not yet implemented" — those land in
Phase 6+.

Tests spawn the CLI as a subprocess to exercise the full argparse wiring
+ env-var path: bootstrap-prompt on fresh install, bootstrap flags +
tier render, promote + history + bad-signature rejection, demote,
ask-enqueues-inbox (hits the durable store, not stdout parsing), note +
critical severity routing, inbox listing, and the four remaining stubs.
@kovtcharov kovtcharov force-pushed the feature/gaia-coder-trust branch from 08d2158 to 34e77bd Compare April 20, 2026 09:43
@kovtcharov kovtcharov marked this pull request as ready for review April 20, 2026 09:45
@kovtcharov kovtcharov merged commit 0617343 into coder Apr 20, 2026
10 of 11 checks passed
@kovtcharov kovtcharov deleted the feature/gaia-coder-trust branch April 20, 2026 09:45
@github-actions
Contributor

Summary

This PR turns seven gaia-coder CLI verbs from stubs into working implementations and adds the data-layer primitives (trust.py, inbox.py, intent.py, prompt_composer.py) that back them. The code is well-structured, fails loudly on contract violations per CLAUDE.md, and is backed by 58 new tests exercising both module-level logic and the subprocess-invoked CLI. The biggest thing worth flagging is that it lands on coder while relying on sibling PRs (#818/#819/#820) that hadn't merged yet; imports like gaia.coder.stores.em_inbox, gaia.coder.stores.feedback, and gaia.coder.stores.audit will break any fresh checkout until those land.

Issues Found

🟡 Important

1. Dead/broken XML-tag fallback in prompt_composer._identity_block (src/gaia/coder/prompt_composer.py:1729-1736 in diff)

The computed fallback tag is never exercised (every path is covered by tag_map), but it also contains a no-op .replace("_", "_") that looks like a typo — probably intended as .replace(" ", "_") or .replace("-", "_"). Either wire the fallback correctly, or drop it entirely and raise if the label isn't in tag_map:

    tag_map = {
        "gaia.md": "gaia_md",
        "architecture.md": "architecture_md",
        "project_map.md": "project_map_md",
    }
    try:
        xml_tag = tag_map[label.lower()]
    except KeyError as e:
        raise ValueError(
            f"no XML tag mapping for identity doc {label!r}; "
            "add it to tag_map in _identity_block()."
        ) from e
    wrapped = f"<{xml_tag}>\n{body.rstrip()}\n</{xml_tag}>"

This also matches the fail-loudly policy: a new identity doc added to IDENTITY_DOCS without a tag_map entry would currently produce a silently wrong tag (project_map without the _md suffix the §15.8 prompts reference), which is exactly the kind of quiet-wrong-answer the policy warns against.

2. Docstring/implementation mismatch: ask handler does not run the intent classifier (src/gaia/coder/cli.py:_handle_ask + inbox.py:_enqueue_em_message)

The PR description says "The ask handler uses it to run the intent classifier" and the _enqueue_em_message docstring repeats that. In practice, _handle_ask only enqueues + prints the queued id — the classifier is never invoked, and nothing in cli.py imports gaia.coder.intent. The classifier is fully built but dead-on-arrival from the CLI's perspective.

Either:

  • wire the classifier into _handle_ask (which is what the commit title wire EM CLI verbs — trust/promote/demote/ask/note/critical/inbox promises), or
  • update the two docstrings to say "future daemon will run the classifier; Phase 5 CLI just enqueues."

This isn't a blocker, but a new contributor reading the PR today will reasonably expect gaia-coder ask "promote me to tier 3" to act on the intent.

3. Duplicated _utc_now_iso() across three modules (trust.py:_utc_now_iso, inbox.py:_utc_now_iso, intent.py:_utc_now_iso)

The same function is defined three times. One lives inside a function body (intent.py), which adds a per-call import cost. Extract to gaia.coder.time or an existing util module:

from gaia.coder.time import utc_now_iso

Low-severity on its own, but it's three copies in one PR, and the lint bar in this repo wants DRY on utilities.
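A shared helper could be as small as the sketch below. The exact timestamp format the three existing copies produce is not shown in the diff excerpt, so the `+00:00` offset form here is an assumption; the module path gaia.coder.time is the one suggested above.

```python
from datetime import datetime, timezone


def utc_now_iso() -> str:
    """Return the current UTC time as an ISO-8601 string with an explicit offset."""
    # Timezone-aware on purpose: naive timestamps are ambiguous in audit rows.
    return datetime.now(timezone.utc).isoformat()
```

With the import at module level, callers pay the import cost once instead of on every call.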

🟢 Minor

4. Greedy regex in _parse_llm_response (src/gaia/coder/intent.py:1142)

_JSON_OBJ_RE = re.compile(r"\{.*\}", re.DOTALL) is greedy across newlines: if the model ever emits two brace-delimited spans (e.g. a prose preamble with curly-brace examples followed by the JSON answer), this matches from the first { to the last } and json.loads fails on the combined mess. Non-greedy plus balanced-brace matching is overkill, but non-greedy at least fails predictably on the first object (note that it stops at the first }, so it is only safe while the response contains no nested objects):

_JSON_OBJ_RE = re.compile(r"\{.*?\}", re.DOTALL)
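If the classifier's response can contain nested objects (a populated args field, for instance), neither regex form is reliable: the non-greedy pattern stops at the first closing brace. One alternative, sketched here with a hypothetical helper name, is to let the standard library's `json.JSONDecoder.raw_decode` find where the first valid object ends:

```python
import json


def first_json_object(text: str) -> dict:
    """Return the first JSON object embedded in text, or raise ValueError."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores whatever trails it.
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None  # not valid JSON at this brace; try the next one
        if isinstance(obj, dict):
            return obj
        start = text.find("{", start + 1)
    raise ValueError(f"no JSON object found in model response: {text!r}")
```

This skips brace-delimited prose that is not valid JSON and handles nesting without a hand-written brace counter.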

5. Implicit string concatenation reads like a bug (src/gaia/coder/inbox.py:783, inbox.py:836)

raise InboxError(
    f"invalid channel {channel!r}; expected one of " f"{sorted(VALID_CHANNELS)}"
)

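The split into two adjacent f-strings is unnecessary and reads like two arguments (the second fragment does need its f prefix as long as it stays separate, because it interpolates VALID_CHANNELS). Collapse:

raise InboxError(
    f"invalid channel {channel!r}; expected one of {sorted(VALID_CHANNELS)}"
)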

Same pattern at line 836 ("...{msg_id!r}; " "enqueue the message first").

6. Dead helper _render_intent_table in intent.py:1109

Defined but never called — build_prompt inlines the table render. Drop it (it's small and trivially reconstructable if needed again):

# (delete _render_intent_table — build_prompt renders inline)

7. resolve_config_dir mkdir fires even on --help (src/gaia/coder/cli.py:resolve_config_dir)

Path.mkdir(parents=True, exist_ok=True) runs at argparse time indirectly (every subcommand handler calls _em_toml_path() etc.). Running gaia-coder --help from a user with no GAIA install still creates ~/.gaia/coder/. Lazy-create on first write instead — move mkdir into the save paths (save_em_config already handles this; open_store presumably does too).
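A sketch of the lazy-create shape, assuming a hypothetical `save_text` write helper (the real save paths are save_em_config and open_store): path resolution stays side-effect free, and the directory appears only when something is actually written.

```python
import os
from pathlib import Path


def resolve_config_dir() -> Path:
    """Resolve the config dir from $GAIA_CODER_HOME without touching the disk."""
    override = os.environ.get("GAIA_CODER_HOME")
    return Path(override) if override else Path.home() / ".gaia" / "coder"


def save_text(path: Path, text: str) -> None:
    """Hypothetical write helper: create the parent directory at write time."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
```

With this split, `gaia-coder --help` can resolve paths freely and still leave the filesystem untouched.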

8. Over-broad pytest.raises(Exception) in trust tests (tests/coder/test_trust.py:3422,3546)

Both call sites have a # pydantic ValidationError comment explaining what they mean. Import the real exception and assert on it — the comment is load-bearing precisely because Exception matches too much (a typo in a fixture would silently pass):

from pydantic import ValidationError
# ...
    with pytest.raises(ValidationError):
        trust_mod.EMConfig(em_handle="", em_channel="cli")

9. Silent no-op on demote from tier 0 (src/gaia/coder/trust.py:demote)

max(0, current - 1) means demote(cfg_at_tier_0, ...) returns a new EMConfig at tier 0 and still writes an audit row. Either skip the audit write when no change occurred, or raise TrustError("already at floor tier") — writing an audit row for a no-op is a quiet-wrong-answer per §4.4 ("silent demotions are concealment-adjacent").
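The raise-at-floor option could look like this sketch. The signature is simplified for illustration (the real demote() takes an EMConfig), and `write_audit` is a hypothetical callback standing in for the audit.log.db write.

```python
from typing import Callable


class TrustError(Exception):
    """Raised when a trust-contract invariant is violated."""


def demote(current_tier: int, reason: str,
           write_audit: Callable[[int, int, str], None]) -> int:
    """Drop one tier and record it; refuse when there is nothing to drop."""
    if current_tier <= 0:
        # No state change, so no audit row: the log only records real moves.
        raise TrustError("already at floor tier 0; nothing to demote")
    new_tier = current_tier - 1
    write_audit(current_tier, new_tier, reason)
    return new_tier
```

Raising before the audit write keeps every row in the log tied to an actual tier change.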

Strengths

  • Fail-loudly errors everywhere that matter. TrustError / InboxError name what failed, what to do, and where to look — exactly the shape CLAUDE.md asks for (trust.py:2306-2312 is a textbook example).
  • CLI tests run as real subprocesses (tests/coder/test_cli_trust.py:_run_cli) with GAIA_CODER_HOME redirection — this exercises the full argparse + env-var + import path the way a user hits it, which the testing guidance explicitly asks for.
  • Mockable LLM contract (intent.py:classify_intent(..., llm=None)) — tests never touch the network; the default lazy-imports anthropic only when actually called. Clean separation.
  • Prompt caching is done right (prompt_composer.py) — identity triplet cached ephemerally, per-turn facts explicitly uncached to avoid cache thrash. The block-by-block test (test_prompt_composer_extra_suffix_is_uncached) pins that invariant.
  • Explicit severity-ladder translation (inbox.py:_ESCALATE_SEVERITY_MAP) with a docstring explaining why the two tables use different ladders — exactly the kind of non-obvious WHY the CLAUDE.md comment policy asks for.

Verdict

Approve with suggestions — the code is sound and well-tested, but Issues 1 and 2 are worth a follow-up: the broken _identity_block fallback is a latent fail-silently bug the next identity doc will trip on, and the ask → classifier wiring claimed in the PR title isn't actually there. Everything else is minor polish. Since the PR is already merged, consider landing a follow-up that addresses 1, 2, and 3.
