feat(coder): Phases 9+10 — OSS reuse + codebase research + repo binding + RAG freshness #827
11 `gh_*` tools wrapping the `gh` CLI via subprocess. Tests mock the `_run_gh` boundary to avoid real API calls. Every non-zero `gh` exit raises `GitHubCLIError` with the stderr attached — fail-loudly per CLAUDE.md rule. Defaults `--repo` to $GITHUB_REPO so the LLM never repeats owner/name and a stray call cannot target a different repo. 21 tests, all passing.
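The fail-loudly boundary described above can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: the function name `run_gh`, the injectable `runner` and `env` parameters, and the choice to raise when `GITHUB_REPO` is unset are all assumptions made for the example.

```python
import os
import subprocess
from typing import Callable, Mapping, Optional, Sequence


class GitHubCLIError(RuntimeError):
    """Raised on any non-zero `gh` exit; keeps stderr for the caller."""

    def __init__(self, cmd: Sequence[str], returncode: int, stderr: str) -> None:
        super().__init__(f"{' '.join(cmd)} exited {returncode}: {stderr.strip()}")
        self.cmd = list(cmd)
        self.returncode = returncode
        self.stderr = stderr


def run_gh(
    args: Sequence[str],
    *,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Run `gh <args>`, defaulting --repo to $GITHUB_REPO, and return stdout."""
    env = os.environ if env is None else env
    cmd = ["gh", *args]
    if "--repo" not in args:
        repo = env.get("GITHUB_REPO")
        if not repo:
            # Assumption: an unset variable is an error, not a silent fallback.
            raise GitHubCLIError(cmd, -1, "GITHUB_REPO is not set")
        cmd += ["--repo", repo]
    # check=False so the non-zero exit is converted into our own error type.
    proc = runner(cmd, capture_output=True, text=True, check=False)
    if proc.returncode != 0:
        raise GitHubCLIError(cmd, proc.returncode, proc.stderr)
    return proc.stdout
```

Injecting `runner` is one way to let tests replace the subprocess call without patching; the PR itself mocks `_run_gh` instead.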
Four tools (gh_search_code, gh_search_repos, vet_license, import_with_attribution) implementing the license-aware-reuse contract from §5.4:

* Permissive allowlist: MIT, BSD-2/3-Clause, Apache-2.0, ISC, Unlicense, 0BSD. GPL/AGPL/LGPL/SSPL/BUSL hard-fail with LicenseIncompatibleError.
* import_with_attribution enforces four guarantees at the tool layer: compatible license, SHA (not branch) pin, per-file '# Adapted from <repo> @ <sha> — <license>' header, and an append-only THIRD_PARTY_NOTICES.md entry at repo root.
* All network I/O funnels through _gh_api / _fetch_raw; tests mock both.

18 tests, all passing.
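A minimal sketch of the license gate and the attribution header, under stated assumptions: the helper name `attribution_header`, the rule that any SPDX id outside the allowlist is rejected (including unknown ids), and the requirement of a full 40-character hex SHA are illustrative choices, not confirmed details of `oss_reuse.py`.

```python
import re

# Allowlist taken from the commit message above.
PERMISSIVE = frozenset(
    {"MIT", "BSD-2-Clause", "BSD-3-Clause", "Apache-2.0", "ISC", "Unlicense", "0BSD"}
)

# Assumption: a "SHA pin" means a full 40-character lowercase hex commit id.
_FULL_SHA = re.compile(r"[0-9a-f]{40}")


class LicenseIncompatibleError(ValueError):
    """The source license is not on the permissive allowlist."""


class AttributionError(ValueError):
    """The import request cannot be attributed safely."""


def vet_license(spdx_id: str) -> str:
    """Return the id if it is allowlisted, otherwise raise."""
    if spdx_id not in PERMISSIVE:
        raise LicenseIncompatibleError(f"{spdx_id} is not on the permissive allowlist")
    return spdx_id


def attribution_header(repo: str, sha: str, spdx_id: str) -> str:
    """Build the per-file header after checking license and SHA pin."""
    vet_license(spdx_id)
    if not _FULL_SHA.fullmatch(sha):
        raise AttributionError(f"{sha!r} is not a full commit SHA (branch names are rejected)")
    return f"# Adapted from {repo} @ {sha} — {spdx_id}"
```

Checking the license inside `attribution_header` means a caller cannot produce a header for an incompatible source by skipping the vetting step.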
* RepoBinding Pydantic model mirroring the §15.6 TOML layout. Missing or malformed fields raise RepoBindingError; no silent defaults.
* doctor() aggregates four bootstrap checks (App install, PEM decrypts, webhook signature round-trip, coder branch exists) into a DoctorResult. The agent refuses to act until result.green is True.
* verify_webhook_signature for §15.5 HMAC-SHA256 header validation.
* agents_md_entry renders the canonical §5.11 discoverability block for AGENTS.md / .github/copilot-instructions.md.

17 tests, all passing. keyring_getter + gh_runner are injected so tests never touch real credentials or the network.
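The HMAC-SHA256 validation and the round-trip check can be sketched as below. This assumes GitHub's `sha256=<hex digest>` signature header format; the helper names `sign_payload` and `round_trip_ok`, and the exact set of negative cases, are hypothetical and chosen for illustration.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, payload: bytes) -> str:
    """Produce a signature header value in the `sha256=<hex>` form."""
    return "sha256=" + hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify_webhook_signature(secret: bytes, payload: bytes, header: str) -> bool:
    """Return True only if the header matches the HMAC of the payload."""
    if not header.startswith("sha256="):
        return False
    expected = sign_payload(secret, payload)
    # Compare as bytes in constant time to avoid leaking match position.
    return hmac.compare_digest(expected.encode(), header.encode())


def round_trip_ok(secret: bytes) -> bool:
    """Sign a probe payload, then confirm accept and reject paths both work."""
    payload = b'{"zen": "ping"}'
    good = sign_payload(secret, payload)
    wrong_sig = "sha256=" + "0" * 64
    return (
        verify_webhook_signature(secret, payload, good)
        and not verify_webhook_signature(secret, payload, wrong_sig)
        and not verify_webhook_signature(secret, payload + b"x", good)
    )
```

A round-trip that only exercises the accepting path would also pass for a verifier that always returns True, which is why the sketch includes a wrong signature and a wrong payload.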
* FreshnessContract.default() declares the five §6.9 corpora (source_tree, pr_descriptions, issues, adrs_plans, claude_agents_md) with a 12h per-corpus staleness threshold and a 36h watchdog.
* ensure_fresh_or_raise raises StaleIndexError, per the §6.9 'fail loudly, never silently degrade' rule. The error message points at the exact CLI command that fixes it.
* reindex_watchdog surfaces a 'critical'-severity EM-inbox message if any corpus is past 36h (or never indexed at all).
* check_citation_valid wraps 'git cat-file -e <ref>:<path>' and raises CitationStaleError on a missing path. This is the Pass 3 architectural check for 'cites a file that was indexed but has been deleted/renamed'.
* rag_status / rag_refresh / rag_rebuild expose the Python API behind the CLI commands (CLI wiring lives in the unification follow-up).

22 tests, all passing. Provider/runner callables injected so tests never touch real RAG backends or git.
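The two thresholds above can be illustrated with a small sketch. It assumes freshness is tracked as one last-indexed timestamp per corpus and that `now` is passed in by the caller; the function signatures are hypothetical, and the CLI hint in the real error message is left out because the command name is not given here.

```python
from datetime import datetime, timedelta
from typing import List, Mapping, Optional

CORPORA = ("source_tree", "pr_descriptions", "issues", "adrs_plans", "claude_agents_md")
STALE_AFTER = timedelta(hours=12)     # per-corpus staleness threshold
WATCHDOG_AFTER = timedelta(hours=36)  # watchdog escalation threshold

Stamps = Mapping[str, Optional[datetime]]


class StaleIndexError(RuntimeError):
    """The corpus index is too old to answer from."""


def ensure_fresh_or_raise(last_indexed: Stamps, corpus: str, now: datetime) -> None:
    """Raise instead of answering from a stale or missing index."""
    stamp = last_indexed.get(corpus)
    if stamp is None:
        raise StaleIndexError(f"{corpus} has never been indexed")
    age = now - stamp
    if age > STALE_AFTER:
        raise StaleIndexError(f"{corpus} index is {age} old (limit {STALE_AFTER})")


def reindex_watchdog(last_indexed: Stamps, now: datetime) -> List[str]:
    """Return corpora that are past the watchdog limit or never indexed."""
    overdue = []
    for corpus in CORPORA:
        stamp = last_indexed.get(corpus)
        if stamp is None or now - stamp > WATCHDOG_AFTER:
            overdue.append(corpus)
    return overdue
```

Passing `now` explicitly keeps both functions deterministic under test, in the same spirit as the injected provider callables mentioned above.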
research(source, question, ...) dispatches a short-lived subagent into a scratch workspace under ~/.gaia/coder/research/<session>/ by default. The subagent answers the question and returns a StructuredAnalysis matching the §5.10 schema verbatim. Hard budget enforcement via ResearchBudget: wall-clock (default 10m), dollar (default $2), and tool-call (default 40) ceilings. Tripping any ceiling raises BudgetExceededError with the partial analysis attached so the caller can still log what was produced. The LLM engine is injected via the ResearchEngine protocol; this module never imports an Anthropic / Claude SDK — the caller owns the model binding. Tests inject a canned engine + stub cloner so no network, no real clone. 15 tests, all passing. Scratch workspaces are cleaned up on exit unless keep=True, and cleanup still runs on budget-exception paths via a try/finally block.
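A rough sketch of the three-ceiling budget and the cleanup guarantee follows. The field names, the `charge` method, the injectable `clock`, and the `run_with_cleanup` helper are assumptions for illustration; only the default ceilings (10 minutes, $2, 40 tool calls), the error name, and the try/finally behaviour come from the description above.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable, List, Tuple


class BudgetExceededError(RuntimeError):
    """A ceiling was tripped; `partial` carries whatever was produced so far."""

    def __init__(self, ceiling: str, partial: Any) -> None:
        super().__init__(f"research budget exceeded: {ceiling}")
        self.ceiling = ceiling
        self.partial = partial


@dataclass
class ResearchBudget:
    max_seconds: float = 600.0
    max_dollars: float = 2.0
    max_tool_calls: int = 40
    clock: Callable[[], float] = time.monotonic
    spent_dollars: float = field(default=0.0, init=False)
    tool_calls: int = field(default=0, init=False)
    started_at: float = field(default=0.0, init=False)

    def __post_init__(self) -> None:
        self.started_at = self.clock()

    def charge(self, dollars: float, partial: Any = None) -> None:
        """Record one tool call and its cost, then check every ceiling."""
        self.tool_calls += 1
        self.spent_dollars += dollars
        if self.clock() - self.started_at > self.max_seconds:
            raise BudgetExceededError("wall-clock", partial)
        if self.spent_dollars > self.max_dollars:
            raise BudgetExceededError("dollar", partial)
        if self.tool_calls > self.max_tool_calls:
            raise BudgetExceededError("tool-call", partial)


def run_with_cleanup(
    steps: Iterable[Tuple[float, str]],
    budget: ResearchBudget,
    cleanup: Callable[[], None],
) -> List[str]:
    """Consume (cost, finding) steps; cleanup runs even when the budget trips."""
    findings: List[str] = []
    try:
        for cost, finding in steps:
            findings.append(finding)
            budget.charge(cost, findings)
        return findings
    finally:
        cleanup()
```

Attaching the partial result to the exception lets the caller log it without the subagent needing a separate reporting channel.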
Adds the two new coder mixins to KNOWN_TOOLS so YAML-manifest agents can opt in via `tools: [github, oss_reuse]` (per §15.2 / §5.2). Regenerates schemas/agent-manifest.schema.json to match. Also folds in a black+isort formatting pass on the five new modules and their tests, and removes the unused imports that flake8 flagged. Net 80+ line reduction vs. the initial drop. Tests unchanged: all 93 new tests still pass, and the full coder suite is still at 291 passed / 2 skipped.
543c9d0 to 8b99a6b
Summary
This PR lands Phases 9+10 of
Most important finding:
Issues Found
🔴 Critical
🔒 SECURITY CONCERN: Path traversal in
…ime) (#832)

## Summary

Six fixes flagged by the auto-review bot: one Critical (security), five Important (two on #827, two on #828, one on both). All 395 tests pass on `coder` with the fixes.

## Changes

**Critical (security):**

- `oss_reuse.py` `import_with_attribution`: path traversal on LLM-controlled `dest_path`. Now resolves + `relative_to(root)`-checks; raises `AttributionError` on escape.

**Important:**

- `oss_reuse.py` `_validate_license_filter`: unknown SPDX ids silently dropped; now raises per CLAUDE.md fail-loudly.
- `tools/github.py` `gh_pr_merge`: hardcoded `--admin`; now gated behind `admin_override=False` default.
- `repo_binding.py` webhook round-trip: only did positive check; added wrong-signature + wrong-payload discrimination.
- `tools/debug.py` `add_instrumented_trace`: emitted `logger.debug(...)` requiring pre-bound `logger`; now inlines `__import__('logging')` lookup.
- `tools/debug.py` `diff_behavior`: `git switch -` after two detached switches returns to wrong ref; now captures + explicitly restores original HEAD.

## Test plan

- [x] `pytest tests/coder/ tests/eval/`: 395 pass
- [x] 7 new regression tests in `test_fixes_827_828.py` cover each fix
- [x] `test_add_instrumented_trace_*` now asserts the mutated module actually imports (previously asserted only the string was written)
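The resolve-then-`relative_to(root)` guard named in the Critical fix can be sketched like this. The function name `resolve_dest` and its signature are hypothetical; only the technique and the `AttributionError` name come from the fix description.

```python
from pathlib import Path


class AttributionError(ValueError):
    """The requested destination is not acceptable for a vendored file."""


def resolve_dest(root: Path, dest_path: str) -> Path:
    """Resolve dest_path under root and refuse anything that escapes it."""
    root = root.resolve()
    # Joining with an absolute dest_path discards root, so absolute paths
    # are caught by the same relative_to check as `..` segments.
    candidate = (root / dest_path).resolve()
    try:
        candidate.relative_to(root)
    except ValueError as exc:
        raise AttributionError(f"{dest_path!r} escapes the repo root {root}") from exc
    return candidate
```

Resolving before the containment check matters: a purely textual prefix test would accept `vendor/../../outside.py`, while the resolved path makes the escape visible.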
Summary
Bundles Phases 9 and 10 of `gaia-coder` into one draft PR to keep the public PR count low. Lands five new modules + the registry entries that make them discoverable. Everything is additive: no existing coder code is touched, no CLI wiring yet (that's the unification follow-up).

- `OSSReuseMixin` hard-fails GPL/AGPL/LGPL/SSPL/BUSL and vendors compatible sources with a `# Adapted from <repo> @ <sha> — <license>` header plus a `THIRD_PARTY_NOTICES.md` append. Prevents the "quietly vendored GPL into MIT repo" regression class at the tool layer, not just in review.
- `research(source, question, ...)` runs a short-lived observational subagent with a hard wall-clock + $ + tool-call budget. Keeps external-repo investigation out of the primary RAG (cache pollution) and away from `write_file` / `gh` / memory writes (safety).
- `repo_binding.toml`; `doctor()` aggregates four bootstrap invariants (App install, PEM decrypts, webhook signature round-trip, `coder` branch exists). Agent is blocked until `doctor().green is True`, the §15.6 hard bootstrap gate.
- `StaleIndexError` raised when a query's youngest-indexed doc is older than 12h; `reindex_watchdog` fires `critical`-severity at 36h; `check_citation_valid` is the Pass 3 gate that rejects "cites a file that was indexed but has been deleted/renamed."
- `GitHubToolsMixin` (§15.2): the 11 `gh_*` wrappers. Foundational for OSS reuse (which composes `_run_gh`) and for the whole §5.5 event-trigger layer that lands later.
- `KNOWN_TOOLS` registration: `github` + `oss_reuse` added so YAML-manifest agents can opt in via `tools: [github, oss_reuse]`. Schema regenerated.

Test plan

- `pytest tests/coder/test_github_tools.py`: 21 passing
- `pytest tests/coder/test_oss_reuse.py`: 18 passing (incl. explicit `torvalds/linux` → GPL-rejected and `pallets/click` → BSD-3-accepted acceptance checks)
- `pytest tests/coder/test_repo_binding.py`: 17 passing
- `pytest tests/coder/test_rag_freshness.py`: 22 passing
- `pytest tests/coder/test_codebase_research.py`: 15 passing
- `pytest tests/coder/` full coder suite: 291 passed, 2 skipped (nothing regressed)
- `python util/lint.py --all`: no errors on new files
- `git diff --stat origin/coder..HEAD` is limited to the five new modules, their tests, the two-line `KNOWN_TOOLS` addition, and the auto-regenerated schema

No real `gh` / network / keyring calls in tests: every external boundary is mocked (`_run_gh`, `_gh_api`, `_fetch_raw`, `cloner`, `keyring_getter`, `gh_runner`).

Do not merge: draft for review.