feat(code-audit): add PR auditor as second code-audit slice#18

Merged
DavidHavoc merged 1 commit into main from feat/pr-auditor
May 15, 2026

@W00DSRULES
Collaborator

openworkers audit pr <github-url> extracts atomic claims from a PR description and checks each one against the actual unified diff, assigning it a verdict. It has the same pipeline shape as the README auditor (planner → deterministic researcher → checker + trust gate → critic), which shows that the SourceAdapter abstraction generalises to a second backend.

New modules:

  • core/sources/github.py — GitHubAdapter over a unified diff via a PrSpec value object; live fetch (fetch_pr_from_github) is a sibling helper, decoupled from the adapter so tests stay network-free. parse_pr_url + load_pr_fixture as supporting helpers.
  • core/orchestrator/pr_flow.py — PrAuditOrchestrator.
  • core/orchestrator/audit_prompts.py — shared audit-prompt loader (extracted from readme_flow.py so each new auditor registers its templates in one place).
  • prompts/code_audit/pr_planner.md + pr_checker.md — PR-specific claim types (add | remove | fix | refactor | test | behavior | doc | other) and diff-aware verdict rules.
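As a rough illustration of the URL-parsing helper mentioned above, here is a minimal sketch. It is not the code from core/sources/github.py: the function name `parse_pr_url_sketch`, the `PrRef` return type, and the accepted URL shapes are all assumptions made for this example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class PrRef:
    """Hypothetical value object: which PR a URL points at."""
    owner: str
    repo: str
    number: int


# Assumed URL shape: https://github.com/<owner>/<repo>/pull/<number>[/...]
_PR_URL = re.compile(
    r"^https?://github\.com/(?P<owner>[^/]+)/(?P<repo>[^/]+)"
    r"/pull/(?P<number>\d+)(?:[/?#].*)?$"
)


def parse_pr_url_sketch(url: str) -> PrRef:
    """Split a GitHub PR URL into owner, repo, and PR number."""
    match = _PR_URL.match(url.strip())
    if match is None:
        raise ValueError(f"not a GitHub pull request URL: {url!r}")
    return PrRef(match["owner"], match["repo"], int(match["number"]))
```

Keeping parsing as a pure function, separate from any network call, is what lets a URL-parser test run without touching GitHub.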

Schema:

  • core/schemas_audit.py exposes AuditClaim / AuditClaimList as aliases of ReadmeClaim / ReadmeClaimList so PR code reads cleanly without churning the README slice. AuditReport.target captures the audited artefact id (PR URL, etc.).
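The aliasing approach can be pictured roughly as follows. This is a sketch only: it uses plain dataclasses so that it is self-contained, and every field name except `target` is a placeholder assumed for illustration, not the actual contents of core/schemas_audit.py.

```python
from dataclasses import dataclass, field


@dataclass
class ReadmeClaim:
    # Placeholder fields; the real schema is not shown in this PR description.
    text: str
    claim_type: str = "other"


@dataclass
class ReadmeClaimList:
    claims: list[ReadmeClaim] = field(default_factory=list)


# Aliases: PR code can say AuditClaim while the README slice stays untouched.
AuditClaim = ReadmeClaim
AuditClaimList = ReadmeClaimList


@dataclass
class AuditReport:
    # Identifier of the audited artefact, for example a PR URL.
    target: str
    claims: AuditClaimList = field(default_factory=AuditClaimList)
```

Because an alias is the same class object under a second name, existing README code and new PR code share one type and no migration is needed.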

Agents:

  • providers/code_audit_agents.py adds PrPlannerAgent + PrCheckerAgent alongside the README versions. The same _enforce_trust_gate runs after the LLM responds, so any claim with no diff evidence is forced to unsupported regardless of LLM output. AuditCriticAgent is reused as-is.
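The trust-gate rule described above can be sketched in a few lines. The `CheckedClaim` type and the function name `enforce_trust_gate_sketch` are hypothetical stand-ins, not the project's `_enforce_trust_gate`; only the rule itself (no evidence means `unsupported`) comes from the description.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CheckedClaim:
    """Hypothetical shape of a claim after the LLM checker has run."""
    text: str
    verdict: str                      # verdict proposed by the LLM
    evidence: tuple[str, ...] = ()    # diff snippets found by the researcher


def enforce_trust_gate_sketch(claims: list[CheckedClaim]) -> list[CheckedClaim]:
    """Force every claim that has no diff evidence to 'unsupported'."""
    return [
        claim if claim.evidence else replace(claim, verdict="unsupported")
        for claim in claims
    ]
```

Running the gate after the model responds means the final verdict for an evidence-free claim does not depend on what the model said.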

CLI:

  • openworkers audit pr <url> (or --fixture <dir> for offline runs) with optional --token falling back to GITHUB_TOKEN / GH_TOKEN.
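One plausible way to implement the token fallback is sketched below. The helper name `resolve_token` is hypothetical, and checking GITHUB_TOKEN before GH_TOKEN is an assumption based only on the order in which the two variables are listed.

```python
import os
from typing import Mapping, Optional


def resolve_token(
    cli_token: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Pick the GitHub token: explicit --token first, then the environment."""
    if cli_token:
        return cli_token
    # Assumed precedence: GITHUB_TOKEN, then GH_TOKEN.
    for name in ("GITHUB_TOKEN", "GH_TOKEN"):
        value = env.get(name)
        if value:
            return value
    # No token available; a caller could fall back to unauthenticated access.
    return None
```

Passing the environment as a parameter keeps the helper easy to test without modifying real environment variables.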

Tests:

  • tests/fixtures/sample_pr/ — canned PR JSON + unified diff with one verified, one drifted, one contradicted, one fabricated (no-evidence) claim.
  • tests/code_audit/test_pr_flow.py — URL parser, fixture loader, adapter grep, end-to-end audit with verdict distribution, and an explicit trust-gate-override assertion for the hallucinated WIDGETLIB_DEBUG claim.
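To make the "adapter grep" idea concrete, here is a minimal sketch of a deterministic search over a unified diff. The function `grep_diff` and its return shape are assumptions for illustration; the real GitHubAdapter interface is not shown in this description.

```python
def grep_diff(diff_text: str, needle: str) -> list[tuple[str, str, str]]:
    """Return (path, sign, line) for each added or removed line containing needle.

    Limitation of this sketch: a changed line whose own content begins with
    "-- " or "++ " would be mistaken for a file header and skipped.
    """
    hits: list[tuple[str, str, str]] = []
    path = ""
    for raw in diff_text.splitlines():
        if raw.startswith("+++ "):
            # "+++ b/path" names the post-change file; drop the "b/" prefix.
            target = raw[4:].strip()
            path = target[2:] if target.startswith("b/") else target
            continue
        if raw.startswith(("--- ", "@@", "diff ")):
            continue  # headers and hunk markers carry no claim evidence
        sign = raw[:1]
        if sign in ("+", "-") and needle in raw[1:]:
            hits.append((path, sign, raw[1:]))
    return hits
```

A claim whose search terms produce no hits has no diff evidence, which is the situation the trust-gate-override test for the hallucinated WIDGETLIB_DEBUG claim is described as covering.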

Docs: README.md, ROADMAP.md, CHANGELOG.md, AGENTS.md updated with the new slice, shared pipeline section, and PR-audit-flow walkthrough.

Verification: 159/159 tests pass (153 existing + 6 new PR tests), mypy strict clean on new modules, ruff clean on new files, black formatted. CLI smoke-runs in text and JSON modes against the fixture.

Summary

Changes

Testing

  • pytest tests/ -v passes
  • ruff check . passes
  • black --check . passes
  • mypy core/ providers/ --strict --ignore-missing-imports passes

Checklist

  • I have read CONTRIBUTING.md
  • Code follows project style (black, ruff, mypy strict for core/providers)
  • Tests added or updated for the changes
  • Documentation updated if needed

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@DavidHavoc DavidHavoc merged commit c8e21ce into main May 15, 2026
5 checks passed