<h3 align="center">An AI that ships pull requests, and reviews its own work before opening them.</h3>
<p align="center">
<a href="#-quick-start">Quick Start</a> •
<a href="#-what-makes-this-different">Why github-agent</a> •
<a href="#οΈ-architecture">Architecture</a> •
<a href="#-safety-guardrails">Safety</a> •
<a href="#-roadmap">Roadmap</a>
</p>
<p align="center">
<img src="https://img.shields.io/badge/model-Claude%20Sonnet-blueviolet?style=flat-square&logo=anthropic" alt="Claude Sonnet">
<img src="https://img.shields.io/badge/license-MIT-green?style=flat-square" alt="MIT License">
<img src="https://img.shields.io/badge/node-%3E%3D18-brightgreen?style=flat-square&logo=node.js" alt="Node 18+">
</p>
---
`github-agent` is an **autonomous engineering pipeline** built on Claude. Give it a GitHub issue URL; it clones the repo, edits the code, runs the tests, has a **second AI instance review the diff**, and opens a pull request, all in one command.
```bash
node src/pipeline.js issue https://github.com/your/repo/issues/42
```

```
$ node src/pipeline.js issue https://github.com/qiskit/qiskit/issues/9421 --fork --comment

┌────────────────────────────────────────────┐
│   github-agent - autonomous PR engineer    │
│   engineering → self-review → ship         │
└────────────────────────────────────────────┘

▸ Issue qiskit/qiskit#9421
  title: Transpiler drops global phase on conditional gates
  default branch: main

▸ Cloning + branching
  ✓ branch: fix/issue-9421
  test command: tox
  lint commands: ruff check ., black --check ., mypy .
  monorepo sub-packages: terra, aer, ibmq
  guessed sub-package for issue: terra
  CONTRIBUTING.md found at CONTRIBUTING.md
  Project requires DCO Signed-off-by, will auto-sign commits.
  20 file(s) prefiltered as likely relevant
  pre-fix HEAD: 3f4a1b2

▸ Engineering agent: autonomous fix loop
  [turn 1] Scoring the shortlist. transpiler/passes/optimization looks like the hit.
  🔧 find_relevant_files(query="transpiler global phase conditional gates")
  🔧 read_file(qiskit/transpiler/passes/optimization/consolidate_blocks.py)
  [turn 2] Found it: line 142 drops .global_phase on IfElseOp. Patching.
  🔧 apply_patch(qiskit/transpiler/passes/optimization/consolidate_blocks.py, ...)
  🔧 run_tests(tox) → PASS
  🔧 run_lint(ruff check .) → PASS
  🔧 run_lint(mypy .) → PASS
  🔧 finish({"pr_summary":"Preserve global_phase through IfElseOp consolidation..."})
  ✓ Agent finished after 6 turn(s)

▸ Self-review: auditing the diff
  ✓ Review verdict: APPROVE

Token usage (engineering + revision)
  input: 18,204 tok · output: 2,131 tok · cache_read: 14,067 tok
  cost: $0.4912

▸ Committing + pushing
  ✓ added DCO Signed-off-by trailer
  ✓ pushed fix/issue-9421 to Hadar01/qiskit

▸ Opening pull request
  ✓ PR opened: https://github.com/qiskit/qiskit/pull/11504
  ✓ commented on issue: https://github.com/qiskit/qiskit/issues/9421#issuecomment-...
```
Most AI coding tools generate code and hand it to a human. github-agent ships it, after auditing its own work: it refuses to ship bad work, and it handles OSS repos you don't own.
| | Copilot / Cursor | Devin / SWE-agent | github-agent |
|---|---|---|---|
| Generates code | β | β | β |
| Runs tests autonomously | β | β | β |
| Runs project linters autonomously | β | partial | β |
| Opens the PR for you | β | β | β |
| Reviews its own diff before shipping | β | β | β |
| Refuses to ship on bad self-review | β | β | β |
| Revises based on its own review | β | β | β |
| Knows when to give up | β | β | β |
| Works on repos you don't own (fork + PR) | β | β | β |
| Human-readable audit trail in PR body | β | partial | β |
| Cost estimate + kill switch per run | β | β | β |
A second Claude instance, with a completely fresh context and a different system prompt, audits the diff for:
- 🐛 Bug risk: logic errors, off-by-ones, null dereferences, drift from the original issue intent
- 🎲 Edge cases: inputs the engineering agent didn't consider
- 🧪 Test coverage: is the change actually tested?
- 🎯 Scope creep: did the agent touch things it shouldn't?
The verdict is one of `APPROVE`, `REQUEST_CHANGES`, or `NEEDS_DISCUSSION`. On `REQUEST_CHANGES` the engineering agent does a revision pass with the review as input. On anything that isn't `APPROVE`, the pipeline refuses to open the PR; you have to pass `--force-pr` to override. No silent bad PRs.
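The gating rule above can be sketched as a small pure function. This is an illustration only: `decideNextStep`, the `revised` flag, and the returned step names are hypothetical and not taken from `src/orchestrator.js`. It also assumes the revised diff is reviewed again before the gate is applied, which the README does not state.

```javascript
// Hypothetical sketch of the verdict gate. All names are illustrative.
const VERDICTS = ["APPROVE", "REQUEST_CHANGES", "NEEDS_DISCUSSION"];

/**
 * Decide what the pipeline does after a self-review.
 * @param {string} verdict one of VERDICTS
 * @param {{revised?: boolean, forcePr?: boolean}} opts
 *   revised: a revision pass has already happened (assumption)
 *   forcePr: the user passed --force-pr
 * @returns {"open_pr" | "revise" | "refuse"}
 */
function decideNextStep(verdict, { revised = false, forcePr = false } = {}) {
  if (!VERDICTS.includes(verdict)) {
    throw new Error(`Unknown verdict: ${verdict}`);
  }
  if (verdict === "APPROVE") return "open_pr";
  // One revision pass is attempted before giving up on REQUEST_CHANGES.
  if (verdict === "REQUEST_CHANGES" && !revised) return "revise";
  // Anything that is not APPROVE blocks the PR unless explicitly forced.
  return forcePr ? "open_pr" : "refuse";
}

console.log(decideNextStep("NEEDS_DISCUSSION"));
console.log(decideNextStep("NEEDS_DISCUSSION", { forcePr: true }));
```

Keeping the decision in one function with no I/O makes the "no silent bad PRs" rule easy to unit-test in isolation.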
Working on a 50-file toy repo is easy. Working on Qiskit, Cirq, or TQEC is not. github-agent has specific affordances for large scientific-Python-class codebases:
| Problem on a Qiskit-scale repo | What github-agent does |
|---|---|
| Thousands of files, so context blows up | Keyword relevance prefilter scores every file against the issue text; the top 20 are injected as a starting hint. No embeddings API needed. |
| Narrow language support misses `.pyx`/`.pxd`/`.pyi`/`.rst`/config | Walks all of them, plus `Makefile`, `tox.ini`, `noxfile.py`, `CONTRIBUTING.md`, PR templates. |
| Monorepos with sub-packages (`qiskit-terra`, `qiskit-aer`, …) | Auto-detects sub-packages, guesses from the issue text which one the change belongs to, and tells the agent. |
| Test command isn't bare `pytest`: it's `tox`, `nox`, `make test` | Priority-ordered detection: `Makefile` `test:` target → `make test`; `tox.ini` → `tox`; `noxfile.py` → `nox`; then Python/Node/Rust. |
| CI gates on `ruff`, `black`, `mypy`, not just tests | Lint gate: auto-detects configured linters, and the agent must pass them all before `finish()`. |
| Deeply-indented Python makes `apply_patch` brittle | Whitespace-normalized fallback, plus `apply_patch_range` (replace by line numbers) when strings won't disambiguate. |
| DCO sign-off / PR templates / `CONTRIBUTING.md` rules | All read and honored. `Signed-off-by:` trailer appended automatically. PR template preserved at top of PR body. |
| Scientific deps fail to install (BLAS/CUDA/compiled extensions) | `run_tests` detects `ModuleNotFoundError`/`ImportError` and flags `env_error:true`. The agent gives up gracefully instead of thrashing. |
| Complex issues need human judgment | The agent can call `give_up({reason, explanation, blockers})`. With `--comment` it posts the reason on the issue so a human picks up with full context. |
| Duplicate runs open duplicate PRs | Duplicate-PR guard: scans open PRs for `Resolves`/`Fixes`/`Closes #N` or a matching `fix/issue-N` branch before cloning. |
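
To make the first row of the table concrete, here is a minimal sketch of a keyword relevance prefilter. It is an assumption-laden illustration, not the scorer in `src/mapper/fileRelevance.js`: the `rankFiles` and `tokenize` names, the stopword list, and the weights (3 for a path hit, 1 for a content hit) are all made up for the example.

```javascript
// Hypothetical keyword prefilter. Weights and stopwords are illustrative.
const STOPWORDS = new Set(["the", "a", "an", "on", "in", "of", "to", "and", "is", "for"]);

// Lowercase, split on anything that is not a letter or digit, drop noise.
function tokenize(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 1 && !STOPWORDS.has(token));
}

/**
 * Score every file against the issue text and keep the best matches.
 * @param {string} issueText title plus body of the issue
 * @param {{path: string, content: string}[]} files
 * @param {number} topN how many files to keep (the README mentions 20)
 */
function rankFiles(issueText, files, topN = 20) {
  const keywords = new Set(tokenize(issueText));
  return files
    .map((file) => {
      const pathTokens = new Set(tokenize(file.path));
      const bodyTokens = new Set(tokenize(file.content));
      let score = 0;
      for (const keyword of keywords) {
        if (pathTokens.has(keyword)) score += 3; // path hits count more
        if (bodyTokens.has(keyword)) score += 1;
      }
      return { path: file.path, score };
    })
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score || a.path.localeCompare(b.path))
    .slice(0, topN);
}
```

Because the scorer only needs token sets, it runs locally with no embeddings API, which matches the trade-off the table describes.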
Honest limitation: we don't provision test environments. If a repo needs GPU / BLAS / conda, you'll want to run the agent inside a pre-warmed Docker image. That executor is on the roadmap.
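
Environment provisioning aside, the priority-ordered test-command detection from the table can be sketched as an ordered list of checks. This is a sketch under assumptions: `detectTestCommand` is a hypothetical name, and the marker files used for the final "Python/Node/Rust" fallbacks (`pyproject.toml`, `setup.py`, `package.json`, `Cargo.toml`) are guesses, since the README does not say which files the real detector inspects.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical sketch: first matching detector wins, in priority order.
function detectTestCommand(repoRoot) {
  const has = (name) => fs.existsSync(path.join(repoRoot, name));
  // A Makefile only counts if it declares a `test:` target at line start.
  const makefileHasTestTarget = () =>
    has("Makefile") &&
    /^test:/m.test(fs.readFileSync(path.join(repoRoot, "Makefile"), "utf8"));

  const detectors = [
    [makefileHasTestTarget, "make test"],
    [() => has("tox.ini"), "tox"],
    [() => has("noxfile.py"), "nox"],
    // Fallbacks below are assumptions about the language-level defaults.
    [() => has("pyproject.toml") || has("setup.py"), "pytest"],
    [() => has("package.json"), "npm test"],
    [() => has("Cargo.toml"), "cargo test"],
  ];

  const hit = detectors.find(([matches]) => matches());
  return hit ? hit[1] : null; // null: no known test runner found
}
```

An ordered array keeps the priority visible in one place, so adding a new runner is a one-line change.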
You can run github-agent on any public open-source project, even without write access. A `public_repo`-scoped PAT is enough.
```bash
# Fork-and-PR: pushes to your own fork, opens PR upstream, links back to the issue.
node src/pipeline.js issue https://github.com/qiskit/qiskit/issues/9421 --fork --comment

# Review a PR in a project you're not a maintainer of.
# --post submits the review as a PR comment (falls back to issue comment if permissions block).
node src/pipeline.js review https://github.com/qiskit/qiskit/pull/11504 --post

# Triage multiple issues in one shot.
node src/pipeline.js triage https://github.com/qiskit/qiskit --label=bug --max=5 --fork --comment
```

The `review` subcommand exits non-zero on `REQUEST_CHANGES`, so you can wire it straight into CI as a pre-merge gate.
If the verdict is `REQUEST_CHANGES`, the engineering agent does a revision pass with the review feedback as input. The full report ships in the PR body: no black box, no guessing what the AI did.
- Node.js 18+
- An Anthropic API key
- A GitHub Personal Access Token: `public_repo` for OSS work, `repo` for private repos
```bash
git clone https://github.com/Hadar01/github-agents.git
cd github-agents
npm install
cp .env.example .env
# edit .env:
#   ANTHROPIC_API_KEY=sk-ant-...
#   GITHUB_TOKEN=ghp_...
```

```bash
# Dry run first: full pipeline, no commits/push/PR
node src/pipeline.js issue https://github.com/your/repo/issues/42 --dry-run

# Ship it for real
node src/pipeline.js issue https://github.com/your/repo/issues/42

# Review an existing PR (no editing, just the audit)
node src/pipeline.js review https://github.com/your/repo/pull/123
```

Or use the npm shorthand scripts:

```bash
npm run issue -- https://github.com/your/repo/issues/42
npm run review -- https://github.com/your/repo/pull/123
```

```
┌─────────────────┐
│  GitHub Issue   │
└────────┬────────┘
         │
         ▼
┌─────────────────────────────────────────────────────┐
│  Engineering Agent  (Claude + tool use)             │
│                                                     │
│  Tools: read_file    list_files   write_file        │
│         apply_patch  run_tests    git_diff          │
│         git_status   finish                         │
│                                                     │
│  Loop:  explore → patch → test → repeat             │
└────────────────────┬────────────────────────────────┘
                     │ diff
                     ▼
┌─────────────────────────────────────────────────────┐
│  Self-Review  (Claude, fresh context)               │
│                                                     │
│  Audits:  bug risk · edge cases                     │
│           test coverage · scope creep               │
│                                                     │
│  Verdict: APPROVE / REQUEST_CHANGES / DISCUSS       │
└─────────────┬───────────────────────────────────────┘
              │
       ┌──────┴────────────────────┐
       │ APPROVE                   │ REQUEST_CHANGES
       │                           ▼
       │               ┌───────────────────────┐
       │               │  Revision Pass        │
       │               │  (engineering agent   │
       │               │   + review feedback)  │
       │               └───────────┬───────────┘
       │                           │
       ▼                           ▼
┌────────────────────────────────────┐
│  Commit → Push → Open PR           │
│  (PR body includes review report)  │
└────────────────────────────────────┘
```
The agent has real write access to files on disk. We've put real fences around it:
| Guardrail | Detail |
|---|---|
| Path traversal blocked | `read_file`, `write_file`, `apply_patch` reject any path escaping the repo root |
| Test command allowlist | `run_tests` only accepts `npm test`, `pytest`, `go test`, `cargo test`, etc.; no arbitrary shell |
| Iteration cap | Hard stop at 18 agent turns per pass |
| Cost kill-switch | Configurable per-run USD ceiling (default $5.00); aborts before overspending |
| Token leak prevention | GitHub PAT is used for clone + push but never written to `.git/config` |
| Patch uniqueness | `apply_patch` requires the target string to be unique in the file, so no accidental multi-site rewrites |
| `--dry-run` mode | Full pipeline simulation without committing, pushing, or opening anything |
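
Two of the guardrails in the table, path-traversal blocking and patch uniqueness, are easy to illustrate. The sketch below is not the code in `src/agents/tools.js`; `resolveInsideRepo` and `applyUniquePatch` are hypothetical helpers that show one common way to implement each check.

```javascript
const path = require("node:path");

// Hypothetical helper: resolve a tool-supplied path, rejecting repo escapes.
function resolveInsideRepo(repoRoot, requestedPath) {
  const root = path.resolve(repoRoot);
  const target = path.resolve(root, requestedPath);
  const relative = path.relative(root, target);
  const escapes =
    relative === ".." ||
    relative.startsWith(".." + path.sep) ||
    path.isAbsolute(relative); // e.g. a different drive on Windows
  if (escapes) {
    throw new Error(`Path escapes repo root: ${requestedPath}`);
  }
  return target;
}

// Hypothetical helper: replace oldStr only if it occurs exactly once.
function applyUniquePatch(content, oldStr, newStr) {
  if (oldStr === "") throw new Error("Target string must not be empty");
  const first = content.indexOf(oldStr);
  if (first === -1) throw new Error("Target string not found");
  if (content.indexOf(oldStr, first + 1) !== -1) {
    throw new Error("Target string is not unique in the file");
  }
  // Slice-and-join avoids the `$` substitution patterns of String.replace.
  return content.slice(0, first) + newStr + content.slice(first + oldStr.length);
}
```

Note that this path check is purely lexical; a production version would probably also need to account for symlinks inside the repo, which this sketch ignores.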
Every run prints a token breakdown and a USD estimate. The audit trail records the same numbers in the PR body.
Typical cost per issue: $0.20 to $1.50, depending on repo size and whether the self-review triggers a revision pass.
```
Token usage (engineering + revision)
  input:        12,403 tok  @ $3.00 / MTok
  output:        1,847 tok  @ $15.00 / MTok
  cache read:    8,912 tok  @ $0.30 / MTok
  ─────────────────────────────────────
  estimated cost: $0.0676
```

Rates are read from `src/config.js` (`COST_INPUT_PER_MTOK`, `COST_OUTPUT_PER_MTOK`, `COST_CACHE_READ_PER_MTOK`). Update them there if Anthropic's pricing changes.
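
The arithmetic behind the sample breakdown can be reproduced in a few lines. This is a sketch, not the code in `src/utils/cost.js`: the `RATES` object, `estimateCostUsd`, and `exceedsCeiling` are hypothetical names, the rates are copied from the sample output, and the $5.00 ceiling is the default quoted in the guardrails table.

```javascript
// Hypothetical sketch. Rates are USD per million tokens, from the sample.
const RATES = { input: 3.0, output: 15.0, cacheRead: 0.3 };

// usage: { input, output, cacheRead } token counts for one run.
function estimateCostUsd(usage, rates = RATES) {
  const microDollars =
    usage.input * rates.input +
    usage.output * rates.output +
    usage.cacheRead * rates.cacheRead;
  return microDollars / 1e6;
}

// Kill-switch check: true once the running estimate passes the ceiling.
function exceedsCeiling(usage, ceilingUsd = 5.0, rates = RATES) {
  return estimateCostUsd(usage, rates) > ceilingUsd;
}

const sample = { input: 12403, output: 1847, cacheRead: 8912 };
console.log(`estimated cost: $${estimateCostUsd(sample).toFixed(4)}`);
// prints "estimated cost: $0.0676"
```

Checking `exceedsCeiling` after every agent turn, rather than once at the end, is what would let a run abort before overspending.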
```
github-agent/
├── src/
│   ├── pipeline.js              ← CLI entry + subcommands
│   ├── orchestrator.js          ← engineering → review → revision → PR + project discovery
│   ├── config.js                ← model, limits, cost rates
│   ├── agents/
│   │   ├── engineeringAgent.js  ← issue → autonomous fix
│   │   ├── reviewCopilot.js     ← diff → structured audit
│   │   ├── agentLoop.js         ← multi-turn tool-use loop, retries, cost ceiling
│   │   └── tools.js             ← tool schemas + sandboxed handlers
│   ├── prompts/
│   │   ├── engineering.js       ← agentic system prompt, monorepo/lint/contrib hints
│   │   └── review.js            ← review system prompt + verdict format
│   ├── mapper/
│   │   ├── repoMap.js           ← big-project file walker, ignore-dirs, truncation
│   │   └── fileRelevance.js     ← keyword scorer → starting-file prefilter
│   ├── utils/
│   │   ├── cost.js              ← pricing math (input/output/cache)
│   │   └── githubUrl.js         ← parse owner/repo/number from URLs
│   ├── cli/
│   │   └── output.js            ← pretty terminal + cost summary
│   └── web/
│       ├── server.js            ← Express SSE dashboard
│       └── public/index.html    ← live agent feed
├── tests/                       ← 127 tests across 9 suites
└── .github/workflows/test.yml   ← CI matrix: Linux/macOS/Windows × Node 18/20/22
```
```bash
npm test
```

127 tests across 9 suites cover path traversal, shell-injection guards, patch fallback strategies, repo walker truncation, big-project ignore-dirs, orchestrator verdict parsing, monorepo detection, CONTRIBUTING/DCO reading, cost math (including cache creation), audit trail structure, PR body + template honoring, and a mocked-SDK end-to-end run with retry semantics.

CI runs the full suite on Linux / macOS / Windows × Node 18 / 20 / 22 for every push and pull request. See CONTRIBUTING.md for the contributor workflow and TESTING.md for live, end-to-end feature testing recipes.
- Docker/devcontainer executor, so `pytest` works on Qiskit-class repos that need BLAS / CUDA / compiled extensions
- Embedding-based relevance: drop-in replacement for the keyword prefilter on very abstract issues
- Parallel triage: one dashboard pane per issue when batching
- LangSmith / Helicone telemetry export
- Pluggable language adapters: `rustfmt` + `cargo`, `gofmt` + `go vet`, etc.
See CONTRIBUTING.md. Short version: one behaviour change per PR, add a test with every behaviour change, and `npm test` must be green on Node 18/20/22.
MIT: use it, fork it, ship it.