
refactor(agents): remove YAML manifest agent support (#912) #914

Merged
itomek-amd merged 3 commits into main from
912-refactor-remove-yaml-manifest-agent-support-consolidate-on-python-custom-agents
Apr 27, 2026

Conversation

@itomek
Collaborator

@itomek itomek commented Apr 27, 2026

Summary

Consolidates custom agents on a single definition format: Python agent.py. Deletes ~200 lines of dynamic type()-based class construction, Pydantic manifest validation, and per-agent MCP-config merging from src/gaia/agents/registry.py. Unblocks upcoming external-service work (OAuth-bound integrations) where capability declarations like REQUIRED_CONNECTIONS need a single class-attribute home the Agent UI can introspect.

The agent.yaml companion sidecar (declares models: next to a Python agent) is unchanged.

Closes #912

Why this matters

  • One way to define a custom agent. Removes the parallel YAML schema and dynamic type() builder, so every agent is a regular Python class readable by mypy, IDEs, and git grep.
  • Smaller, simpler registry. −276 lines from registry.py; the dynamic class builder, _TOOL_REGISTRY.clear() race risk, and the merged-MCP-config disk write are all gone.
  • Foundation for Phase-1 connection work. OAuth scope declarations now have an unambiguous Python-class home rather than a parallel YAML field needing doubled validation and doubled UI handling.

Behaviour

  • _load_from_dir loads only agent.py. A directory containing only agent.yaml (no sibling agent.py) emits a DeprecationWarning and is skipped.
  • Companion-YAML guard. When both agent.py and agent.yaml exist, only models: is read. Legacy manifest fingerprint keys (manifest_version, tools, instructions, mcp_servers, id) trigger a DeprecationWarning enumerating the ignored keys instead of silently discarding them. A scalar models: value is rejected with a logger warning rather than leaking through as a string that would later iterate as characters.
  • Type narrowing. AgentRegistration.source (dataclass) and AgentInfo.source (Pydantic) are now Literal["builtin", "custom_python"]. Pydantic enforces at the API boundary.
  • Importer. Bundles whose entry directories ship YAML-only or no agent.py at all surface an actionable ImportResult.errors message rather than silently moving an unloadable directory into ~/.gaia/agents/. Exporter drops the YAML-only branch from _is_custom_agent_dir.
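
The loading rule described above can be sketched roughly as follows. This is a minimal illustration and not the code in registry.py: the helper name classify_agent_dir and its return convention are hypothetical, and only the file names and the DeprecationWarning behaviour are taken from the description.

```python
import warnings
from pathlib import Path
from typing import Optional


def classify_agent_dir(agent_dir: Path) -> Optional[Path]:
    """Return the agent.py to load, or None if the directory is skipped.

    Hypothetical helper: only agent.py is loadable, and a YAML-only
    directory is skipped with a DeprecationWarning.
    """
    py_file = agent_dir / "agent.py"
    yaml_file = agent_dir / "agent.yaml"

    if py_file.is_file():
        return py_file

    if yaml_file.is_file():
        # Legacy manifest with no Python sibling: warn loudly, load nothing.
        warnings.warn(
            f"{yaml_file}: YAML manifest agents are no longer loaded; "
            "convert this agent to agent.py",
            DeprecationWarning,
            stacklevel=2,
        )
    return None
```

Returning None for both the YAML-only and the empty-directory case keeps the caller simple; only the YAML-only case produces a warning, because that is the one with a migration path.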

Removed

  • AgentManifest Pydantic model and its validate_tools / validate_id validators.
  • _load_manifest_agent, _create_manifest_agent_class, _write_merged_mcp_config from AgentRegistry.
  • schemas/agent-manifest.schema.json and util/gen_manifest_schema.py (and the corresponding lint check in util/check_agent_conventions.py).
  • All tests/unit/agents/test_registry.py cases that exercised the manifest loader.
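
A regression guard against these symbols reappearing could look something like the sketch below. The helper find_reintroduced and the stand-in namespace are hypothetical; the real TestRemovedSymbols tests check the actual module and the AgentRegistry class.

```python
import types

# Names taken from the "Removed" list above.
REMOVED_SYMBOLS = (
    "AgentManifest",
    "_load_manifest_agent",
    "_create_manifest_agent_class",
    "_write_merged_mcp_config",
)


def find_reintroduced(namespace: object, names=REMOVED_SYMBOLS) -> list:
    """Return the removed names that exist again on a module or class."""
    return [name for name in names if hasattr(namespace, name)]


# Stand-in for the real registry module so the sketch runs on its own.
fake_registry = types.SimpleNamespace(AgentRegistry=object)
```

A test would then assert that find_reintroduced(...) returns an empty list for each namespace it guards.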

Pre-merge verification

  • No first-party agent ships as YAML. Built-in registrations in _register_builtin_agents are Python-only.
  • BuilderAgent emits Python only. src/gaia/agents/builder/template.py generate_agent_source() returns Python source for all three render paths; tests/unit/agents/test_builder_agent.py already asserts no agent.yaml is written. No BuilderAgent changes required.
  • Internal demos / customer integrations. Please confirm before merge that no internal workshop, eval set, or customer hand-off relies on ~/.gaia/agents/*/agent.yaml.

Test plan

  • python -m pytest tests/unit/agents/test_registry.py tests/unit/test_export_import.py -xvs — 45 passed
  • python -m pytest tests/unit/ -x — 1642 passed, 15 skipped (3 pre-existing chat/ui/test_sse_confirmation.py flakes verified independent of this PR)
  • python util/lint.py --all --fix — 6/6 quality checks pass
  • grep -rn "AgentManifest\|_load_manifest_agent\|_create_manifest_agent_class\|_write_merged_mcp_config\|custom_manifest" src/ tests/ docs/ — no residual references outside the deprecation-warning string, the regression-guard test, and the release-notes entry
  • python -c "import json; json.load(open('docs/docs.json'))" — Mintlify nav still valid JSON

Migration

Users with ~/.gaia/agents/<id>/agent.yaml manifests need to convert them to agent.py. The release notes link to a new "Migrating from a YAML Manifest" section in docs/guides/custom-agent.mdx with a field-by-field translation table and a worked example. The fastest path is the Gaia Builder Agent (gaia chat --ui, then + Build a Custom Agent), which scaffolds a working agent.py.
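
As a rough picture of the target shape, a converted agent.py is an ordinary class whose metadata lives in class attributes. The sketch below is an assumption-heavy illustration: the Agent base class is a local stand-in so the snippet runs on its own, and the agent name and values are made up. The mapping for instructions, tools, and mcp_servers is covered by the translation table in the guide and is deliberately not guessed at here.

```python
class Agent:
    """Local stand-in for GAIA's Agent base class (not the real one)."""


class WeatherHelperAgent(Agent):
    # Formerly `id:` and `name:` in agent.yaml (values are illustrative).
    AGENT_ID = "weather-helper"
    AGENT_NAME = "Weather Helper"

    # One-line class attribute, readable by mypy, IDEs, and git grep.
    CONVERSATION_STARTERS = ["What is the forecast for today?"]
```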

Consolidate custom agents on a single definition format: Python `agent.py`.
Deletes ~200 lines of dynamic `type()`-based class construction, Pydantic
manifest validation, and per-agent MCP-config merging from the registry.

The `agent.yaml` companion sidecar (used to declare a `models:` preference
list next to a Python agent) is unchanged.

Behaviour
- `_load_from_dir` now loads only `agent.py`. A directory containing only
  `agent.yaml` (no sibling `agent.py`) emits a `DeprecationWarning` and is
  skipped — `~/.gaia/agents/<id>/agent.yaml` as the sole agent definition
  is no longer loaded.
- Companion-YAML guard: when both files exist, only `models:` is read. If
  the sidecar carries legacy manifest fingerprint keys (`manifest_version`,
  `tools`, `instructions`, `mcp_servers`, `id`), a `DeprecationWarning`
  enumerates the ignored keys instead of silently discarding them.
- Companion `models:` is now type-checked: a scalar value is rejected with
  a logger warning and never leaks into the registration as a string.
- `AgentRegistration.source` and `AgentInfo.source` narrowed to
  `Literal["builtin", "custom_python"]`.
- Importer rejects bundles whose entry directories ship YAML-only or no
  `agent.py` at all, with an actionable `ImportResult.errors` message
  instead of silently moving an unloadable directory into place. Exporter
  drops the YAML-only branch from `_is_custom_agent_dir`.
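
The companion-YAML guard could be sketched as below. This is a minimal illustration that works on an already-parsed dict; the function name `warn_on_legacy_keys` is hypothetical, and only the key list and the warning category come from the text above.

```python
import warnings

# Manifest fingerprint keys listed in the commit message.
LEGACY_MANIFEST_KEYS = (
    "manifest_version",
    "tools",
    "instructions",
    "mcp_servers",
    "id",
)


def warn_on_legacy_keys(yaml_data: dict, yaml_path: str) -> list:
    """Warn about manifest-era keys found in a companion agent.yaml.

    Returns the ignored keys so callers and tests can inspect them.
    """
    ignored = [key for key in LEGACY_MANIFEST_KEYS if key in yaml_data]
    if ignored:
        warnings.warn(
            f"{yaml_path}: ignoring legacy manifest keys {ignored}; "
            "only 'models' is read from a companion agent.yaml",
            DeprecationWarning,
            stacklevel=2,
        )
    return ignored
```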

Removed
- `AgentManifest` Pydantic model and its `validate_tools` / `validate_id`
  validators.
- `_load_manifest_agent`, `_create_manifest_agent_class`, and
  `_write_merged_mcp_config` from `AgentRegistry`.
- `schemas/agent-manifest.schema.json` and `util/gen_manifest_schema.py`
  (and the corresponding lint check in `util/check_agent_conventions.py`).
- All `tests/unit/agents/test_registry.py` cases that exercised the
  manifest loader.

Tests
- New `TestYamlOnlyDeprecation`, `TestCompanionYamlGuard`, and
  `TestRemovedSymbols` classes in `tests/unit/agents/test_registry.py`,
  plus a `sys.modules` cleanup fixture so `importlib`-based loads cannot
  leak across tests.
- Two new bundle-import tests in `tests/unit/test_export_import.py`
  covering legacy YAML-only bundles and bundles missing `agent.py`
  entirely.
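
The idea behind the `sys.modules` cleanup can be shown without pytest as a small context manager. This is a sketch under assumptions: the name `purge_new_modules` and the prefix-based filter are hypothetical, and the real fixture may select modules differently.

```python
import sys
from contextlib import contextmanager


@contextmanager
def purge_new_modules(prefix: str):
    """Remove modules matching `prefix` that were imported inside the block."""
    before = set(sys.modules)
    try:
        yield
    finally:
        # Only touch modules that appeared during the block.
        for name in set(sys.modules) - before:
            if name.startswith(prefix):
                del sys.modules[name]
```

Wrapping each test in such a block means a module loaded by one test cannot be served from the cache to the next one.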

Docs
- `docs/guides/custom-agent.mdx`: added a "Migrating from a YAML Manifest"
  section with a field-by-field translation table and a worked example.
- `docs/spec/plugin-registry.mdx` and `docs/sdk/patterns.mdx`: rewritten
  to remove manifest-format coverage.
- `docs/releases/v0.17.5.mdx`: new release-notes entry flagging the
  breaking change.

Closes #912
@itomek itomek linked an issue Apr 27, 2026 that may be closed by this pull request
9 tasks
@github-actions github-actions Bot added the documentation, tests, and agents labels Apr 27, 2026
@itomek itomek self-assigned this Apr 27, 2026
@itomek itomek marked this pull request as ready for review April 27, 2026 20:53
@itomek itomek requested a review from kovtcharov-amd as a code owner April 27, 2026 20:53
itomek added 2 commits April 27, 2026 16:54
The Verify external URLs check fails because the linked anchor lives in
this PR's docs/guides/custom-agent.mdx and is not on the live amd-gaia.ai
site yet. Use a Mintlify-internal relative link, which the internal
cross-reference check (already passing) validates without hitting the
live site.
Release notes for v0.17.5 will land in the release PR rather than this
feature PR, to avoid merge conflicts with sibling milestone work.
@github-actions
Contributor

Summary

Clean, well-scoped removal of YAML-manifest agent support. The registry change is small and surgical (~−276 lines), the migration path is well-documented in docs/guides/custom-agent.mdx, and the importer/exporter changes correctly surface actionable errors instead of silently breaking. The biggest miss is stale guidance left in repo-onboarding docs (CLAUDE.md, .github/workflows/claude.yml, .claude/agents/*.md) — they still tell contributors to register agents via YAML manifests, which directly contradicts the new behaviour.


Issues Found

🟡 Important — Stale guidance in onboarding docs still recommends the removed YAML format

These files are not in the diff but actively contradict the PR. Both repo contributors and the gaia-agent-builder Claude agent read them as authoritative.

CLAUDE.md (4 lines, all stale):

  • CLAUDE.md:284: └── registry.py # YAML-manifest agent registry + KNOWN_TOOLS map
  • CLAUDE.md:377: "New agents are preferably registered via YAML manifests validated by Pydantic..."
  • CLAUDE.md:389: "...register it in KNOWN_TOOLS so YAML-manifest agents can declare it."
  • CLAUDE.md:680: "gaia-agent-builder — Creating a new GAIA agent (Python class or YAML manifest)..."

.github/workflows/claude.yml:195-196 — duplicated review-bot guidance:

            - **New agents** must be registered as a Python `agent.py` (subclass of `Agent` with `AGENT_ID` / `AGENT_NAME` class attributes). YAML manifests were removed in v0.17.5 (#912).
            - **New tool mixins** MUST be added to `KNOWN_TOOLS` in `src/gaia/agents/registry.py`; the BuilderAgent template uses this map to scaffold imports for new `agent.py` files.

.claude/agents/gaia-agent-builder.md — has an entire "B. YAML manifest" section (lines 38-41) and references to the deleted AgentManifest type (lines 28, 41, 65, 71).

.claude/agents/sdk-architect.md:32 — references AgentManifest (YAML spec).

This is the same kind of doc/code drift the PR's regression test (TestRemovedSymbols) was written to prevent — worth catching it in CLAUDE.md and the agent definitions too.

🟢 Minor — Obsolete subsections in code-index docs

  • docs/sdk/sdks/code-index.mdx:204-214: "Opting in from a YAML-manifest agent" with a class_path: / tools: YAML example. The whole subsection is now dead; either delete it or rewrite it as a Python-agent recipe pointing back to docs/guides/custom-agent.mdx.
  • docs/guides/code-index.mdx:134: "YAML-manifest agents can opt in by listing code_index under tools..." Same fix: reword to "Custom agents can compose CodeIndexToolsMixin directly...".

🟢 Minor — Stale docstring entry in convention linter

util/check_agent_conventions.py:17 still lists "Manifest JSON Schema is not stale" under "Hard checks", but _check_manifest_schema() was deleted in this PR.

- ``KNOWN_TOOLS`` entries in registry.py resolve to importable classes

(Drop the Manifest JSON Schema is not stale bullet entirely — it has no implementation behind it any more.)

🟢 Minor — Export silently drops YAML-only dirs; users get no migration prompt from that path

src/gaia/installer/export_import.py:99: _is_custom_agent_dir returns False for YAML-only dirs, and the test test_export_skips_yaml_only_dirs documents this as intentional. The discovery path emits a DeprecationWarning for the same case, but the export path is silent. If a user runs export_custom_agents on a .gaia/agents/ containing one Python and one YAML agent, the resulting bundle quietly contains only the Python one, and they get no signal to migrate.

Suggest emitting a single log.warning per skipped YAML-only directory, with the same migration link the importer uses. Not blocking; the test correctly captures the chosen tradeoff.
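
A sketch of what that per-directory warning might look like on the export side is below. It is illustrative only: select_exportable_dirs and the logger name are hypothetical, and the real export code may walk the tree differently.

```python
import logging
from pathlib import Path

log = logging.getLogger("export_sketch")


def select_exportable_dirs(agents_root: Path) -> list:
    """Return directories with agent.py; warn once per YAML-only directory."""
    exportable = []
    for child in sorted(p for p in agents_root.iterdir() if p.is_dir()):
        if (child / "agent.py").is_file():
            exportable.append(child)
        elif (child / "agent.yaml").is_file():
            # Skipped, but not silently: the user learns they must migrate.
            log.warning(
                "export: skipping %s: YAML-only agents are no longer "
                "supported; convert it to agent.py before exporting",
                child.name,
            )
    return exportable
```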

🟢 Minor — Non-string entries in companion models: are silently dropped

src/gaia/agents/registry.py:281: models = [m for m in raw_models if isinstance(m, str)]. A list like [123, "Qwen3-0.6B-GGUF"] passes the outer isinstance(raw_models, list) guard, drops the int, and registers ["Qwen3-0.6B-GGUF"] with no warning. A scalar models: value already triggers a warning, so for consistency consider warning whenever the filter drops anything:

                    raw_models = yaml_data.get("models")
                    if isinstance(raw_models, list):
                        bad = [m for m in raw_models if not isinstance(m, str)]
                        if bad:
                            logger.warning(
                                "registry: companion YAML %s: 'models' contains "
                                "non-string entries %r — ignoring those",
                                yaml_file,
                                bad,
                            )
                        models = [m for m in raw_models if isinstance(m, str)]
                    elif raw_models is not None:
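
A self-contained version of the check suggested above might look like the following. It is a sketch, not the registry code: the function name normalize_models is hypothetical, and returning an empty list for a rejected scalar is an assumption about what "rejected" means for the registration.

```python
import logging

logger = logging.getLogger("registry_sketch")


def normalize_models(raw_models, yaml_file: str) -> list:
    """Return the string entries of a companion `models:` value.

    Anything dropped along the way is reported through a warning.
    """
    if raw_models is None:
        return []
    if not isinstance(raw_models, list):
        # Scalar (or mapping) value: reject it rather than iterate it.
        logger.warning(
            "registry: companion YAML %s: 'models' must be a list, got %s",
            yaml_file,
            type(raw_models).__name__,
        )
        return []
    bad = [m for m in raw_models if not isinstance(m, str)]
    if bad:
        logger.warning(
            "registry: companion YAML %s: 'models' contains "
            "non-string entries %r; ignoring those",
            yaml_file,
            bad,
        )
    return [m for m in raw_models if isinstance(m, str)]
```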

Strengths

  • Excellent regression-guard test (TestRemovedSymbols). Parametrised over both module-level and class-level deletions — exactly the pattern that prevents the deletions from accidentally being re-introduced by a later refactor.
  • Companion-YAML manifest-fingerprint guard is a smart UX touch. Detecting manifest_version / tools / instructions / mcp_servers / id next to a Python agent and emitting an enumerated DeprecationWarning (rather than silently dropping the keys) preserves the principle from CLAUDE.md that legacy state should fail loudly.
  • New _purge_custom_agent_modules autouse fixture (tests/unit/agents/test_registry.py:33) correctly prevents sys.modules cache leakage from importlib.util.spec_from_file_location reuse — a subtle bug that bites only when test order changes. Good defensive test hygiene.
  • Type narrowing of AgentRegistration.source and AgentInfo.source to Literal["builtin", "custom_python"] removes a string-typed enum at the API boundary and lets Pydantic enforce it.
  • Importer error messages name the agent ID, the cause, and the migration URL — exactly the three-part actionable error the CLAUDE.md "fail loudly" rule asks for.
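
The type-narrowing point above can be illustrated with the standard library alone. This is a trimmed, hypothetical stand-in for AgentRegistration: the real dataclass has more fields, and the explicit __post_init__ check is added here only to show at runtime what Pydantic enforces automatically for AgentInfo.

```python
from dataclasses import dataclass
from typing import Literal, get_args

AgentSource = Literal["builtin", "custom_python"]


@dataclass
class AgentRegistration:
    """Trimmed stand-in; field names other than `source` are illustrative."""

    agent_id: str
    source: AgentSource

    def __post_init__(self) -> None:
        # Dataclasses do not validate Literal at runtime, so check explicitly.
        if self.source not in get_args(AgentSource):
            raise ValueError(f"unsupported source: {self.source!r}")
```

With the narrowed type, a leftover value such as "custom_manifest" is flagged by a type checker and, in this sketch, rejected at construction time.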

Verdict

Approve with suggestions. No blockers. The 🟡 onboarding-doc inconsistencies should ideally be cleaned up in this PR (they're the same scope of work as the docs/guides/custom-agent.mdx rewrite already in the diff), but they can also be a follow-up if the maintainer prefers to keep this PR tightly scoped.

@kovtcharov-amd
Collaborator

@claude can you evaluate the pros/cons of removing yaml support? I do think there's space for both where you implement the programmatic portions of an agent in a python file while using yaml for simple configurations of the agent. analyze what is cleanest and architecturally sound.

@github-actions
Contributor

@kovtcharov-amd — good question, and worth pulling apart. Two things are getting bundled in the term "YAML support" and they have different tradeoffs.

What this PR removed vs. what it kept

| Definition format | Before | After |
| --- | --- | --- |
| Full YAML manifest (id, name, instructions, tools, mcp_servers; entire agent declared in YAML, class built dynamically via type()) | ✅ Supported | ❌ Removed |
| Companion sidecar agent.yaml next to agent.py, scoped to declarative config (models:) | ✅ Supported | ✅ Kept |

The deletion is specifically the YAML-as-class-definition path — the one that did dynamic class construction, MRO stitching, _TOOL_REGISTRY.clear() per agent, and merged-MCP-config disk writes. The hybrid you're describing (Python for code, YAML for declarative config) is already in the codebase as the sidecar — see registry.py:262-292. Today it only reads models:, but that's a deliberate door, not a hard ceiling.

Pros / cons of going further on the hybrid

Pros of expanding the sidecar (YAML-for-config):

  • Separation of concerns. Editing a description, conversation starter, or OAuth scope shouldn't require touching agent.py.
  • Non-developer authoring. Domain experts / PMs can tweak metadata; Agent UI can render a form that round-trips YAML cleanly.
  • Future Agent Hub fit (docs/plans/agent-hub.mdx). Shared Python "agent kits" + per-deployment YAML overrides is a well-trodden plugin shape.
  • Multi-environment overrides. Same Python, different YAML per environment (rare but real).

Cons / what bit us last time:

  • Two sources of truth. When agent.py has AGENT_ID = "x" and agent.yaml has id: y, someone has to define precedence and the UI has to know which one to surface. The current manifest-fingerprint guard at registry.py:268-278 exists precisely because this drift is real.
  • mypy / IDE blindness. Anything in YAML is invisible to type-checkers and git grep.
  • Doubled UI handling. The Phase-1 connection plumbing motivation (REQUIRED_CONNECTIONS) is exactly what gets cheaper with one path — adding YAML back doubles the UI handler surface again.
  • YAGNI on most "config" fields. AGENT_ID, AGENT_NAME, CONVERSATION_STARTERS are one-line class attributes today. Moving them to YAML costs a parser, a schema, and a precedence rule for ~3 lines saved.

Architectural recommendation

I'd argue the cleanest position is what we have right now, with a clear rule for the sidecar:

Python is canonical. The sidecar carries only declarative fields where (a) the value is plain data, (b) the field is plausibly edited without a code change, (c) the Agent UI consumes it as a form.

models: qualifies because users tune available models per machine. Future candidates that pass that bar:

  • oauth_scopes: / required_connections: — per-deployment, surfaced in a consent dialog
  • mcp_servers: overrides (already a sibling mcp_servers.json, not a registry concern)
  • branding / panel defaults if/when those exist

Fields that should not drift back into YAML:

  • instructions / system prompt — frequently references self, dynamic context
  • tools — mixin selection is a class-definition concern; @tool registration has Python semantics
  • id / name / description — fine as Python class attrs; moving them to YAML buys nothing

Concrete next step if you want to push further on the hybrid: when REQUIRED_CONNECTIONS lands, add it to the sidecar (with Pydantic validation) rather than as a class attribute, and let the Agent UI read it without importing Python. That's the test case that proves the sidecar pattern scales — and it's the motivation called out in the PR description.

So: keep this PR as-is, treat the models:-only sidecar as the seed of the hybrid model rather than its end state, and grow the sidecar narrowly when an OAuth/connection field arrives that genuinely benefits from being declarative.

@itomek-amd itomek-amd added this pull request to the merge queue Apr 27, 2026
Merged via the queue into main with commit 098e08e Apr 27, 2026
34 of 35 checks passed
@itomek-amd itomek-amd deleted the 912-refactor-remove-yaml-manifest-agent-support-consolidate-on-python-custom-agents branch April 27, 2026 21:59
pull Bot pushed a commit to bhardwajRahul/gaia that referenced this pull request Apr 28, 2026
…#918)

## Summary

Follow-up to [amd#914](amd#914). The
github-actions review-bot left a batch of doc-cleanup suggestions that
landed after merge — onboarding files (`CLAUDE.md`,
`.github/workflows/claude.yml`, `.claude/agents/*.md`) and a couple of
SDK doc subsections still told contributors to register agents via YAML
manifests, directly contradicting the Python-only registry shipped in
amd#914. Same drift class the
[`TestRemovedSymbols`](https://github.com/amd/gaia/blob/main/tests/unit/agents/test_registry.py)
regression guard exists to prevent — it just didn't extend to prose docs
and `.claude/agents/`.

Also folds in the two silent-drop polish items the bot flagged:
companion YAML `models:` filtering and `export_custom_agents` skipping
legacy YAML-only directories.

## Threads

- **Onboarding/agent-doc cleanup** — `CLAUDE.md` (4 spots),
`.github/workflows/claude.yml` (review-bot guidance),
`.claude/agents/{gaia-agent-builder,sdk-architect,code-reviewer,architecture-reviewer,rag-specialist}.md`.
Why: contributors and the `gaia-agent-builder` Claude agent read these
as authoritative — leaving them stale would steer new agents back into
the deleted YAML path.
- **SDK doc subsections** — drops the dead "Opting in from a
YAML-manifest agent" snippet in
[docs/sdk/sdks/code-index.mdx](docs/sdk/sdks/code-index.mdx) and
rewrites the equivalent line in
[docs/guides/code-index.mdx](docs/guides/code-index.mdx) to describe
Python-class composition. Why: example showed a YAML format that no
longer loads.
- **Linter docstring** — `util/check_agent_conventions.py` listed
"Manifest JSON Schema is not stale" under hard checks, but the
corresponding `_check_manifest_schema()` was deleted in amd#914. Drop the
stale bullet.
- **Loud-failure polish** — [`registry.py`](src/gaia/agents/registry.py)
now warns when companion YAML `models:` contains non-string entries
instead of silently filtering them;
[`export_import.py`](src/gaia/installer/export_import.py) emits one
`log.warning` per skipped YAML-only directory during export so users get
the same migration signal the discovery path already gives. Why: matches
the "no silent fallback / actionable error" rule from CLAUDE.md.

Two new test cases guard the warnings.

## Test plan

- [x] `python -m pytest tests/unit/agents/test_registry.py
tests/unit/test_export_import.py -xvs` — 46 passed
- [x] `python util/check_agent_conventions.py` — 0 errors
- [x] `python util/lint.py --black --isort` — pass
- [x] `grep -rn "YAML-manifest agents can\|or YAML manifest\|registered via YAML manifest\|AgentManifest" CLAUDE.md .claude/ .github/ docs/sdk/ docs/guides/code-index.mdx util/`
returns zero hits (migration guide and release notes intentionally
retained)

Refs amd#914.
pull Bot pushed a commit to bhardwajRahul/gaia that referenced this pull request Apr 30, 2026
## Changes

This pull request introduces a new optional governance layer for GAIA
agents, providing action-level governance (ACGS-lite semantics) with
extension points for future workflow-level features. The governance
system is opt-in and does not affect existing agents unless explicitly
enabled. The changes include the addition of a new `gaia.governance`
package, a comprehensive example agent demonstrating governance
features, and detailed documentation to guide users. The governance
framework is modular, allowing developers to mix in governance
capabilities, tag tools with risk levels, and configure policy engines,
reviewers, and audit logging.

The most important changes are:

**New Governance Framework:**

* Added the `gaia.governance` package, introducing a modular governance
layer for GAIA agents. This includes the `GovernedAgentMixin`,
`GaiaGovernanceAdapter`, risk tagging decorators, and extension points
for policy engines, receipt services, and checkpoint runtimes.
* Implemented the `GaiaGovernanceAdapter` class, which composes policy
evaluation, checkpointing, receipt issuance, and policy version binding
into a single entry point. It ensures secure, auditable, and extensible
governance flows for agent tool calls.
* Provided an `action_mapper` utility to map GAIA tool calls into
governance action requests, standardizing how actions are represented
for policy evaluation.

**Documentation and Examples:**

* Added a comprehensive `README.md` for the `gaia.governance` package,
including quick start instructions, configuration options, security
properties, and extension points. This documentation enables developers
to quickly understand and adopt the governance system.
* Introduced a new example, `examples/governed_weather_agent.py`,
demonstrating how to wrap an agent with governance, define risk-tagged
tools, and handle governance decisions (ALLOW, BLOCK, REVIEW) with local
and MCP tools.

**Packaging:**

* Updated `setup.py` to include the new `gaia.governance` package in the
distribution, ensuring it is installed and available for import.

---

## Hardening & Polish (added in 4 follow-up commits)

Triggered by a PR-review pass that surfaced merge blockers and
architectural feedback. All concerns addressed without expanding feature
scope.

**Merge blockers fixed** — `f242e28 fix(governance): harden error
handling and align docs with additive tags`

* Tightened five `except Exception` sites that were silently swallowing
errors. The most important one (`_resolve_canonical_tool_name`) now logs
unexpected resolver errors with `exc_info=True` instead of falling
through silently. This closes the alias-bypass risk where governance
could check tags on the wrong key when the resolver had a bug. The other
four sites (`_lookup_tool_fn`, `_invoke_callback`, `_prompt_review`,
`JsonlReceiptService._read_all`) now use specific exception types and
log at WARNING.
* `_prompt_review` now returns `(approved, exception_or_None)` so
`_handle_review_checkpoint` can stamp the exception type and message
into the receipt's `metadata.evidence.resolution.reason` (`15bc40b`).
The audit log can now distinguish "reviewer chose no" from "reviewer
crashed" — previously both produced the same boilerplate `"reviewer
rejected"` reason.
* Documentation now matches the code: tag merge is **additive (union,
deduplicated)** — *not* "explicit dict wins". Updated README, the
`@govern` decorator's docstring, and the inline comment in
`mixin._build_action_request` to describe what the tests have always
asserted.
* `_canonical_hash` for BLOCK-receipt evidence now handles non-JSON tool
args, complex types, and cycles without falling back to `repr()`,
keeping receipts deterministically hashable across all inputs.
* `JsonlReceiptService.issue_receipt` now performs strict canonical JSON
validation at issue time, rejecting non-canonical metadata (NaN/Inf,
opaque objects) so tampered or unparseable receipts cannot land in the
audit log.
* Public docs registered: new `docs/sdk/sdks/governance.mdx` plus an
entry in `docs/docs.json` SDK navigation. Closes the missing-docs
blocker.
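
The issue-time strictness described above can be approximated with the standard library. This is a sketch under assumptions, not the package's `_canonical_hash` or `issue_receipt`: the function name `strict_canonical_digest` is hypothetical, and the exact canonical form used by the governance package is not stated in this description.

```python
import hashlib
import json


def strict_canonical_digest(payload) -> str:
    """SHA-256 over a canonical JSON encoding of `payload`.

    Raises ValueError for NaN/Infinity and TypeError for objects that
    json cannot encode, so such metadata never reaches the audit log.
    """
    encoded = json.dumps(
        payload,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no incidental whitespace
        ensure_ascii=True,
        allow_nan=False,         # reject NaN and +/-Infinity
    )
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()
```

Sorting keys and fixing separators makes the digest independent of dict insertion order, which is what keeps receipts deterministically hashable.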

**CI guard** — `2ed500d ci(test_api): cap job runtime at 30 minutes`

* The API Tests job had no `timeout-minutes` and was hanging for 4+
hours on the in-flight CI run for this PR. Added a 30-minute cap (covers
worst-case Lemonade boot + model pull + tests) so future runs fail fast
on hangs.

**Polish** — `ca941a9 refactor(governance): polish pass — drop dead
code, tighten lock, deep-copy tags`

Driven by a parallel three-agent review (code-reviewer +
architecture-reviewer + test-engineer):

* Deleted `workflow_mapper.py` and
`StaticPolicyBindingService.bind_receipt`. Both were "forward-compat
seams" with zero callers in src/, tests/, examples/, or docs/. They'll
come back in the PR that adds the real event surface, when the actual
signature is known. YAGNI.
* Tightened `JsonlReceiptService.get_receipt`: cache reads/writes were
unsynchronized while a concurrent `issue_receipt` was mutating the same
dict under `_lock`. Both paths are now under the lock.
* `GovernedAgentMixin.__init__` now deep-copies inner risk-tag lists so
a caller cannot mutate the agent's tag table after construction by
holding onto the original list reference.
* Added a comment on the `bool`-before-`int` ordering in
`_canonical_json_value` (subclass relationship — without the order,
`True` would canonicalize as `1`).
* Debug breadcrumb on receipt-log malformed-line skips, so an operator
chasing a missing receipt has something to grep.
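
The `bool`-before-`int` ordering matters because `bool` is a subclass of `int` in Python. The sketch below shows the pattern in isolation; `canonical_scalar` and its tagged-tuple output are hypothetical and much simpler than the real `_canonical_json_value`.

```python
def canonical_scalar(value):
    """Tag a scalar with its type; bool is tested before int on purpose."""
    if isinstance(value, bool):
        # Must come first: isinstance(True, int) is also True.
        return ("bool", value)
    if isinstance(value, int):
        return ("int", value)
    if isinstance(value, str):
        return ("str", value)
    raise TypeError(f"unsupported scalar: {type(value).__name__}")
```

If the two `isinstance` checks were swapped, `True` would take the `int` branch and become indistinguishable from `1`.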

**Test additions** — `5cdfee5 test(governance): cover hardened error
paths and fail-closed branches`

Added 6 new tests covering branches that had no regression guard:

* `test_resolver_unexpected_exception_logs_and_governs_raw_name` —
proves a buggy `_resolve_tool_name` raising RuntimeError still triggers
governance on the raw name AND emits an operator-visible warning. Future
regression where the warning is swapped for a silent fallback fails this
test.
* `test_resolver_lookup_error_is_silent_and_governs_raw_name` — proves
the expected "tool not in registry" case (`LookupError`) is absorbed
silently with no log noise.
* `test_unknown_transition_outcome_fails_closed` — proves a custom
`CheckpointRuntime` returning a status the mixin doesn't know is denied,
not let through.
* `test_handle_transition_rejects_unknown_decision_type` — same idea at
the adapter layer for an unknown `GovernanceDecision.decision`.
* `test_read_all_skips_malformed_lines` — proves a corrupt line in the
middle of an audit log doesn't block readers from finding subsequent
valid records.
* Existing callback-exception and reviewer-exception tests gained
`caplog` assertions so a future silent-swallow regression is caught.
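
The fail-closed behaviour these tests protect can be sketched as follows. The `Outcome` enum, its string values, and `is_execution_permitted` are hypothetical names for illustration; only the ALLOW / BLOCK / REVIEW decisions and the deny-on-unknown rule come from the description.

```python
from enum import Enum


class Outcome(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    REVIEW = "review"


def is_execution_permitted(status: str, review_approved: bool = False) -> bool:
    """Permit execution only for explicitly recognised, approving outcomes."""
    try:
        outcome = Outcome(status)
    except ValueError:
        # Unknown status from a custom runtime: deny rather than guess.
        return False
    if outcome is Outcome.ALLOW:
        return True
    if outcome is Outcome.REVIEW:
        return review_approved
    return False
```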

Plus two readability fixes: renamed
`test_explicit_dict_overrides_decorated_tags` →
`test_explicit_empty_dict_does_not_downgrade_decorator_tags` (the body
asserted additive semantics, the old name said the opposite); replaced
hardcoded `"test_governance_adapter.SlotOnlyEvidence"` qualname strings
with `f"{Cls.__module__}.{Cls.__qualname__}"` so the tests survive a
file rename.
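
The additive semantics that the renamed test asserts can be sketched as a small merge helper. The name `merge_risk_tags` and the first-seen ordering are assumptions; the description only states that the merge is a deduplicated union.

```python
def merge_risk_tags(decorator_tags, explicit_tags):
    """Union of both tag lists, first-seen order, no duplicates."""
    merged = []
    for tag in list(decorator_tags) + list(explicit_tags):
        if tag not in merged:
            merged.append(tag)
    return merged
```

Under this rule an empty explicit list cannot remove a tag that the decorator declared, which is the downgrade the new test name rules out.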

**Verification (fresh evidence at HEAD `15bc40b`)**

* Governance test suite: **67 passed** (was 27 before the polish — added
5 from the in-flight strict-evidence work and 6 from the polish review).
* `python util/lint.py --black --isort`: PASS.
* No dead code residue: `git grep` of `workflow_mapper`,
`map_gaia_event_to_transition`, `bind_receipt` returns zero matches.
* Public-import smoke test: `GaiaGovernanceAdapter.default()` constructs
with the four expected components.
* Broader unit tests (excl. `tests/unit/chat/` which needs the optional
`[ui]` extra): **946 passed, 16 skipped** — no regressions introduced.
* Upstream merge of `amd:main` (10+ commits including the
YAML-manifest-removal refactor `amd#914`) is incorporated. `_TOOL_REGISTRY`
survived that refactor; governance imports remain green.

**Items intentionally not in this PR** (deferred for follow-up):

* `Agent.__init__` accepting `**kwargs` so multi-mixin composition
(`MCPAgent + GovernedAgentMixin + ApiAgent`) doesn't trip on closed
signatures — touches `agents/base/agent.py` and is a separate concern.
* Public accessor for `_TOOL_REGISTRY` to replace the
`gaia.agents.base.tools._TOOL_REGISTRY` private import in
`mixin._lookup_tool_fn`.
* Extracting `_canonical_hash` and `_canonical_json_value` to a public
`gaia.governance.canonical` module so any conforming
`ReceiptServiceProtocol` can verify or recompute hashes independently.
* `default()` accepting component overrides for `policy_engine`,
`receipt_service`, `checkpoint_runtime`, `policy_binding` so third
parties can swap engines without forgoing the factory.

These are good ideas that expand public API surface and belong in a
focused follow-up PR rather than bundled into this merge.


---

## Governance REVIEW + existing confirmation path

Follow-up for PR review 4197475871: this PR takes Path A. Governance
remains an opt-in policy layer, but REVIEW decisions now reuse GAIA
Agent UI confirmation when the active console advertises
`blocking_confirmation = True` (`SSEOutputHandler`). An explicit
`governance_reviewer` still takes precedence for non-UI or custom
approval flows, and default `AgentConsole` remains fail-closed because
its confirmation method auto-approves.

Regression coverage added:

* Blocking-console fallback: governance REVIEW delegates to
`console.confirm_tool_execution` only for consoles marked
`blocking_confirmation = True`.
* Agent UI path: a governance-tagged REVIEW tool with `SSEOutputHandler`
emits the existing `permission_request` event and runs only after
approval.
* Default-console safety: unmarked consoles are not treated as implicit
reviewers, preserving fail-closed behavior.
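
The precedence described in this section can be sketched as a single resolution function. It is a rough illustration: `resolve_reviewer` is a hypothetical name, and the one-argument callable signature is an assumption, since the real `confirm_tool_execution` signature is not given here.

```python
from typing import Callable, Optional


def resolve_reviewer(
    governance_reviewer: Optional[Callable[[str], bool]],
    console: object,
) -> Optional[Callable[[str], bool]]:
    """Pick who answers a REVIEW decision; None means fail closed."""
    # 1. An explicit reviewer always wins.
    if governance_reviewer is not None:
        return governance_reviewer
    # 2. Fall back to the console only if it is marked as blocking.
    if getattr(console, "blocking_confirmation", False) is True:
        return console.confirm_tool_execution
    # 3. Unmarked consoles are never treated as implicit reviewers.
    return None
```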

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: dislovelhl <dislovelhl@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions github-actions Bot mentioned this pull request May 1, 2026
5 tasks

Labels

agents documentation Documentation changes tests Test changes

Development

Successfully merging this pull request may close these issues.

refactor: Remove YAML manifest agent support; consolidate on Python custom agents

3 participants