
🤖 github-agent

An AI that ships pull requests, and reviews its own work before opening them.

Quick Start • Why github-agent • Big Projects • Architecture • Safety • Roadmap

Claude Sonnet 4.6 • 127 tests passing • Node 18+ • MIT License • CI matrix


github-agent is an autonomous engineering pipeline built on Claude. Give it a GitHub issue URL; it clones the repo, edits the code, runs the tests, has a second AI instance review the diff, refuses to ship a PR that fails its own review, and opens a pull request, all in one command.

node src/pipeline.js issue https://github.com/your/repo/issues/42

✨ See it in action

$ node src/pipeline.js issue https://github.com/qiskit/qiskit/issues/9421 --fork --comment

   ╔════════════════════════════════════════════╗
   β•‘   github-agent β€” autonomous PR engineer    β•‘
   β•‘   engineering β†’ self-review β†’ ship         β•‘
   β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•

β–Έ Issue qiskit/qiskit#9421
  title: Transpiler drops global phase on conditional gates
  default branch: main

β–Έ Cloning + branching
  βœ“ branch: fix/issue-9421
  test command: tox
  lint commands: ruff check ., black --check ., mypy .
  monorepo sub-packages: terra, aer, ibmq
  guessed sub-package for issue: terra
  CONTRIBUTING.md found at CONTRIBUTING.md
  Project requires DCO Signed-off-by β€” will auto-sign commits.
  20 file(s) prefiltered as likely relevant
  pre-fix HEAD: 3f4a1b2

β–Έ Engineering agent β€” autonomous fix loop
  πŸ’­ [turn 1] Scoring the shortlist β€” transpiler/passes/optimization looks like the hit.
  πŸ”§ find_relevant_files(query="transpiler global phase conditional gates")
  πŸ”§ read_file(qiskit/transpiler/passes/optimization/consolidate_blocks.py)
  πŸ’­ [turn 2] Found it β€” line 142 drops .global_phase on IfElseOp. Patching.
  πŸ”§ apply_patch(qiskit/transpiler/passes/optimization/consolidate_blocks.py, ...)
  πŸ”§ run_tests(tox)      β†’ PASS
  πŸ”§ run_lint(ruff check .)   β†’ PASS
  πŸ”§ run_lint(mypy .)         β†’ PASS
  πŸ”§ finish({"pr_summary":"Preserve global_phase through IfElseOp consolidation..."})
  βœ“ Agent finished after 6 turn(s)

▸ Self-review: auditing the diff
  ✓ Review verdict: APPROVE

Token usage (engineering + revision)
  input: 18,204 tok · output: 2,131 tok · cache_read: 14,067 tok
  cost: $0.4912

▸ Committing + pushing
  ✓ added DCO Signed-off-by trailer
  ✓ pushed fix/issue-9421 to Hadar01/qiskit

▸ Opening pull request
  ✓ PR opened: https://github.com/qiskit/qiskit/pull/11504
  ✓ commented on issue: https://github.com/qiskit/qiskit/issues/9421#issuecomment-...

🏆 What makes this different

Most AI coding tools generate code and hand it to a human. github-agent ships it: it audits itself first, refuses to ship bad work, and handles OSS repos you don't own.

| | Copilot / Cursor | Devin / SWE-agent | github-agent |
|---|---|---|---|
| Generates code | ✅ | ✅ | ✅ |
| Runs tests autonomously | ❌ | ✅ | ✅ |
| Runs project linters autonomously | ❌ | partial | ✅ |
| Opens the PR for you | ❌ | ✅ | ✅ |
| Reviews its own diff before shipping | ❌ | ❌ | ✅ |
| Refuses to ship on bad self-review | ❌ | ❌ | ✅ |
| Revises based on its own review | ❌ | ❌ | ✅ |
| Knows when to give up | ❌ | ❌ | ✅ |
| Works on repos you don't own (fork + PR) | ❌ | ❌ | ✅ |
| Human-readable audit trail in PR body | ❌ | partial | ✅ |
| Cost estimate + kill switch per run | ❌ | ❌ | ✅ |

The self-review loop: the killer feature

A second Claude instance, with a completely fresh context and a different system prompt, audits the diff for:

  • 🐛 Bug risk: logic errors, off-by-ones, null dereferences, drift from the original issue intent
  • 🔲 Edge cases: inputs the engineering agent didn't consider
  • 🧪 Test coverage: is the change actually tested?
  • 🎯 Scope creep: did the agent touch things it shouldn't?

The verdict is one of APPROVE / REQUEST_CHANGES / NEEDS_DISCUSSION. On REQUEST_CHANGES the engineering agent does a revision pass with the review as input. On anything that isn't APPROVE, the pipeline refuses to open the PR; you have to pass --force-pr to override. No silent bad PRs.
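
The gating rule can be pictured as a small decision function. The sketch below is illustrative only: `nextStep`, `revisionDone`, and `forcePr` are hypothetical names rather than the project's actual API, and it assumes the diff is reviewed again after a single revision pass, which the text above does not spell out.

```javascript
// Sketch only: names and the "one revision, then re-review" flow are assumptions.
const VERDICTS = ["APPROVE", "REQUEST_CHANGES", "NEEDS_DISCUSSION"];

function nextStep(verdict, { revisionDone = false, forcePr = false } = {}) {
  if (!VERDICTS.includes(verdict)) {
    throw new Error(`Unknown verdict: ${verdict}`);
  }
  // A clean review always ships.
  if (verdict === "APPROVE") return "open_pr";
  // REQUEST_CHANGES triggers a revision pass (assumed: at most one).
  if (verdict === "REQUEST_CHANGES" && !revisionDone) return "revise";
  // Anything else that is not APPROVE blocks the PR unless the user overrides.
  return forcePr ? "open_pr" : "refuse";
}

console.log(nextStep("REQUEST_CHANGES"));                      // revise
console.log(nextStep("NEEDS_DISCUSSION"));                     // refuse
console.log(nextStep("NEEDS_DISCUSSION", { forcePr: true }));  // open_pr
```

Keeping the decision in one pure function makes the "no silent bad PRs" rule easy to unit-test in isolation.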


🔬 Built for big open-source projects

Working on a 50-file toy repo is easy. Working on Qiskit, Cirq, or TQEC is not. github-agent has specific affordances for large scientific-Python-class codebases:

| Problem on a Qiskit-scale repo | What github-agent does |
|---|---|
| Thousands of files: context blows up | Keyword relevance prefilter scores every file against issue text; top-20 injected as starting hint. No embeddings API needed. |
| Narrow language support misses .pyx/.pxd/.pyi/.rst/config | Walks all of them, plus Makefile, tox.ini, noxfile.py, CONTRIBUTING.md, PR templates. |
| Monorepos with sub-packages (qiskit-terra, qiskit-aer, …) | Auto-detects sub-packages, guesses from issue text which one the change belongs to, tells the agent. |
| Test command isn't bare pytest: it's tox, nox, make test | Priority-ordered detection: Makefile test: target → make test. tox.ini → tox. noxfile.py → nox. Then Python/Node/Rust. |
| CI gates on ruff, black, mypy, not just tests | Lint gate: auto-detects configured linters and the agent must pass them all before finish(). |
| Deeply-indented Python makes apply_patch brittle | Whitespace-normalized fallback + apply_patch_range (replace by line numbers) when strings won't disambiguate. |
| DCO sign-off / PR templates / CONTRIBUTING.md rules | All read and honored. Signed-off-by: trailer appended automatically. PR template preserved at top of PR body. |
| Scientific deps fail to install (BLAS/CUDA/compiled extensions) | run_tests detects ModuleNotFoundError/ImportError and flags env_error:true. The agent gives up gracefully instead of thrashing. |
| Complex issues need human judgment | The agent can call give_up({reason, explanation, blockers}). With --comment it posts the reason on the issue so a human picks up with full context. |
| Duplicate runs open duplicate PRs | Duplicate-PR guard: scans open PRs for Resolves/Fixes/Closes #N or matching fix/issue-N branch before cloning. |
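
To make the keyword relevance prefilter concrete, here is a minimal sketch of how files could be scored against issue text and cut to a top-20 shortlist. It is not the code in src/mapper/fileRelevance.js: `tokenize`, `rankFiles`, the stop-word list, and the 3:1 weighting of path hits over content hits are all assumptions made for illustration.

```javascript
// Sketch only: scoring weights and helper names are hypothetical.
const STOP_WORDS = new Set(["the", "and", "for", "with", "that", "this", "from"]);

// Lowercase, split on anything that is not a letter or digit, drop short and stop words.
function tokenize(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((word) => word.length > 2 && !STOP_WORDS.has(word));
}

// files: array of { path, content }. Returns up to `limit` paths, best match first.
function rankFiles(issueText, files, limit = 20) {
  const keywords = new Set(tokenize(issueText));
  return files
    .map(({ path, content }) => {
      let score = 0;
      for (const token of tokenize(path)) if (keywords.has(token)) score += 3; // path hits weigh more
      for (const token of tokenize(content)) if (keywords.has(token)) score += 1;
      return { path, score };
    })
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((entry) => entry.path);
}

const shortlist = rankFiles("Transpiler drops global phase on conditional gates", [
  { path: "qiskit/transpiler/passes/consolidate_blocks.py", content: "global_phase handling for conditional gates" },
  { path: "docs/index.rst", content: "Welcome to the docs" },
  { path: "qiskit/circuit/gate.py", content: "class Gate: phase" },
]);
console.log(shortlist);
```

A plain keyword count like this needs no embeddings API, which is the trade-off the table describes: cheap and local, at the cost of missing files that use different vocabulary from the issue.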

🛑 Honest limitation: we don't provision test environments. If a repo needs GPU / BLAS / conda, you'll want to run the agent inside a pre-warmed Docker image. That executor is on the roadmap.
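
The priority-ordered test-command detection listed in the table can be sketched as a pure function over the files found at the repo root. This is an illustration, not the orchestrator's code: `detectTestCommand` is a hypothetical name, and the marker files used for the Python/Node/Rust fallbacks (pyproject.toml, setup.py, package.json, Cargo.toml) are assumptions.

```javascript
// Sketch only. rootFiles maps a root-level filename to its text content.
function detectTestCommand(rootFiles) {
  const has = (name) => Object.prototype.hasOwnProperty.call(rootFiles, name);

  // 1. A Makefile with a `test:` target wins (ignores `test := ...` assignments).
  if (has("Makefile") && /^test\s*:(?!=)/m.test(rootFiles["Makefile"])) return "make test";
  // 2. Then tox, then nox.
  if (has("tox.ini")) return "tox";
  if (has("noxfile.py")) return "nox";
  // 3. Language fallbacks (marker files are assumed, not taken from the source).
  if (has("pyproject.toml") || has("setup.py")) return "pytest";
  if (has("package.json")) return "npm test";
  if (has("Cargo.toml")) return "cargo test";
  return null; // nothing recognised: let the caller decide
}

console.log(detectTestCommand({ "Makefile": "build:\n\tcc main.c\n", "tox.ini": "[tox]" })); // tox
```

Passing file contents in, rather than reading the disk inside the function, keeps the priority order testable without a fixture repo.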


🀝 Contributing to repos you don't own

You can run github-agent on any public open-source project, even without write access. A public_repo-scoped PAT is enough.

# Fork-and-PR: pushes to your own fork, opens PR upstream, links back to the issue.
node src/pipeline.js issue https://github.com/qiskit/qiskit/issues/9421 --fork --comment

# Review a PR in a project you're not a maintainer of.
# --post submits the review as a PR comment (falls back to issue comment if permissions block).
node src/pipeline.js review https://github.com/qiskit/qiskit/pull/11504 --post

# Triage multiple issues in one shot.
node src/pipeline.js triage https://github.com/qiskit/qiskit --label=bug --max=5 --fork --comment

The review subcommand exits non-zero on REQUEST_CHANGES so you can wire it straight into CI as a pre-merge gate. If the verdict is REQUEST_CHANGES, the engineering agent does a revision pass with the review feedback as input. The full report ships in the PR body: no black box, no guessing what the AI did.


🚀 Quick start

Prerequisites

Node.js 18 or later, an Anthropic API key, and a GitHub personal access token.

Installation

git clone https://github.com/Hadar01/github-agents.git
cd github-agents
npm install
cp .env.example .env
# edit .env:
#   ANTHROPIC_API_KEY=sk-ant-...
#   GITHUB_TOKEN=ghp_...

Your first run

# Dry run first: full pipeline, no commits/push/PR
node src/pipeline.js issue https://github.com/your/repo/issues/42 --dry-run

# Ship it for real
node src/pipeline.js issue https://github.com/your/repo/issues/42

# Review an existing PR (no editing, just the audit report)
node src/pipeline.js review https://github.com/your/repo/pull/123

Or use the npm shorthand scripts:

npm run issue -- https://github.com/your/repo/issues/42
npm run review -- https://github.com/your/repo/pull/123

🏗️ Architecture

┌─────────────────┐
│  GitHub Issue   │
└────────┬────────┘
         │
         ▼
┌─────────────────────────────────────────────────────┐
│  Engineering Agent  (Claude + tool use)             │
│                                                     │
│  Tools:  read_file   list_files   write_file        │
│          apply_patch  run_tests   git_diff          │
│          git_status   finish                        │
│                                                     │
│  Loop:   explore → patch → test → repeat            │
└────────────────────┬────────────────────────────────┘
                     │  diff
                     ▼
┌─────────────────────────────────────────────────────┐
│  Self-Review  (Claude, fresh context)               │
│                                                     │
│  Audits:  bug risk · edge cases                     │
│           test coverage · scope creep               │
│                                                     │
│  Verdict:  APPROVE / REQUEST_CHANGES / DISCUSS      │
└─────────────┬───────────────────────────────────────┘
              │
       ┌──────┴────────────────────┐
       │ APPROVE                   │ REQUEST_CHANGES
       │                           ▼
       │               ┌───────────────────────┐
       │               │  Revision Pass        │
       │               │  (engineering agent   │
       │               │   + review feedback)  │
       │               └──────────┬────────────┘
       │                          │
       ▼                          ▼
┌────────────────────────────────────┐
│  Commit → Push → Open PR           │
│  (PR body includes review report)  │
└────────────────────────────────────┘

🛡️ Safety guardrails

The agent has real write access to files on disk. We've put real fences around it:

| Guardrail | Detail |
|---|---|
| Path traversal blocked | read_file, write_file, apply_patch reject any path escaping the repo root |
| Test command allowlist | run_tests only accepts npm test, pytest, go test, cargo test, etc.; no arbitrary shell |
| Iteration cap | Hard stop at 18 agent turns per pass |
| Cost kill-switch | Configurable per-run USD ceiling (default $5.00); aborts before overspending |
| Token leak prevention | GitHub PAT is used for clone + push but never written to .git/config |
| Patch uniqueness | apply_patch requires the target string to be unique in the file; no accidental multi-site rewrites |
| --dry-run mode | Full pipeline simulation without committing, pushing, or opening anything |
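
Two of these guardrails, the path-traversal check and the patch-uniqueness rule, are small enough to sketch. The helpers below are hypothetical (`resolveInsideRepo` and `applyUniquePatch` are not names from src/agents/tools.js) and only show one plausible way to enforce each rule. Note that this simple check does not resolve symlinks.

```javascript
const path = require("node:path");

// Sketch only: resolve a tool-supplied path and reject anything outside the repo root.
function resolveInsideRepo(repoRoot, requestedPath) {
  const root = path.resolve(repoRoot);
  const target = path.resolve(root, requestedPath);
  const relative = path.relative(root, target);
  const escapes =
    relative === ".." ||
    relative.startsWith(".." + path.sep) ||
    path.isAbsolute(relative); // e.g. a different drive on Windows
  if (escapes) {
    throw new Error(`Path escapes repo root: ${requestedPath}`);
  }
  return target;
}

// Sketch only: replace `search` with `replacement`, but only if it occurs exactly once.
function applyUniquePatch(source, search, replacement) {
  if (search === "") throw new Error("Search string must not be empty");
  const first = source.indexOf(search);
  if (first === -1) throw new Error("Search string not found");
  if (source.indexOf(search, first + 1) !== -1) {
    throw new Error("Search string is not unique in the file");
  }
  // Slicing avoids the special `$` patterns that String.prototype.replace interprets.
  return source.slice(0, first) + replacement + source.slice(first + search.length);
}

console.log(applyUniquePatch("a b c", "b", "x")); // a x c
```

Throwing on ambiguity, rather than patching the first match, is what prevents the accidental multi-site rewrites the table mentions.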

💰 Cost transparency

Every run prints a token breakdown and a USD estimate. The audit trail records the same numbers in the PR body.

Typical cost per issue: $0.20 to $1.50, depending on repo size and whether the self-review triggers a revision pass.

Token usage (engineering + revision)
  input:      12,403 tok   @  $3.00 / MTok
  output:      1,847 tok   @ $15.00 / MTok
  cache read:  8,912 tok   @  $0.30 / MTok
  ─────────────────────────────────────
  estimated cost:  $0.0676

Rates are read from src/config.js (COST_INPUT_PER_MTOK, COST_OUTPUT_PER_MTOK, COST_CACHE_READ_PER_MTOK). Update them there if Anthropic's pricing changes.
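
The arithmetic behind the estimate is a per-million-token multiplication. The sketch below reproduces the sample breakdown above; `estimateCostUsd` and `exceedsCeiling` are hypothetical names (the real math lives in src/utils/cost.js and may differ, for example by also counting cache-creation tokens), and the rates are copied from the sample rather than read from src/config.js.

```javascript
// Sketch only. Rates are USD per million tokens, taken from the sample breakdown.
const RATES = { input: 3.0, output: 15.0, cacheRead: 0.3 };

function estimateCostUsd(usage, rates = RATES) {
  const perToken = (count, ratePerMTok) => (count * ratePerMTok) / 1_000_000;
  return (
    perToken(usage.input, rates.input) +
    perToken(usage.output, rates.output) +
    perToken(usage.cacheRead, rates.cacheRead)
  );
}

// Kill-switch check against the per-run ceiling (default $5.00, as in the guardrails).
function exceedsCeiling(costUsd, ceilingUsd = 5.0) {
  return costUsd > ceilingUsd;
}

const cost = estimateCostUsd({ input: 12403, output: 1847, cacheRead: 8912 });
console.log(cost.toFixed(4));      // 0.0676
console.log(exceedsCeiling(cost)); // false
```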


📁 Project structure

github-agent/
├── src/
│   ├── pipeline.js              ← CLI entry + subcommands
│   ├── orchestrator.js          ← engineering → review → revision → PR + project discovery
│   ├── config.js                ← model, limits, cost rates
│   ├── agents/
│   │   ├── engineeringAgent.js  ← issue → autonomous fix
│   │   ├── reviewCopilot.js     ← diff → structured audit
│   │   ├── agentLoop.js         ← multi-turn tool-use loop, retries, cost ceiling
│   │   └── tools.js             ← tool schemas + sandboxed handlers
│   ├── prompts/
│   │   ├── engineering.js       ← agentic system prompt, monorepo/lint/contrib hints
│   │   └── review.js            ← review system prompt + verdict format
│   ├── mapper/
│   │   ├── repoMap.js           ← big-project file walker, ignore-dirs, truncation
│   │   └── fileRelevance.js     ← keyword scorer: starting-file prefilter
│   ├── utils/
│   │   ├── cost.js              ← pricing math (input/output/cache)
│   │   └── githubUrl.js         ← parse owner/repo/number from URLs
│   ├── cli/
│   │   └── output.js            ← pretty terminal + cost summary
│   └── web/
│       ├── server.js            ← Express SSE dashboard
│       └── public/index.html    ← live agent feed
├── tests/                       ← 127 tests across 9 suites
└── .github/workflows/test.yml   ← CI matrix: Linux/macOS/Windows × Node 18/20/22

🧪 Tests

npm test

127 tests across 9 suites covering path traversal, shell-injection guards, patch fallback strategies, repo walker truncation, big-project ignore-dirs, orchestrator verdict parsing, monorepo detection, CONTRIBUTING/DCO reading, cost math (including cache creation), audit trail structure, PR body + template honoring, and a mocked-SDK end-to-end run with retry semantics.

CI runs the full suite on Linux / macOS / Windows × Node 18 / 20 / 22 for every push and pull request. See CONTRIBUTING.md for the contributor workflow and TESTING.md for live, end-to-end feature testing recipes.


🗺️ Roadmap

  • Docker/devcontainer executor: so pytest works on Qiskit-class repos that need BLAS / CUDA / compiled extensions
  • Embedding-based relevance: drop-in replacement for the keyword prefilter on very abstract issues
  • Parallel triage: one dashboard pane per issue when batching
  • LangSmith / Helicone telemetry export
  • Pluggable language adapters: rustfmt+cargo, gofmt+go vet, etc.

🤝 Contributing

See CONTRIBUTING.md. Short version: one behaviour change per PR, add a test with every behaviour change, npm test must be green on Node 18/20/22.


📄 License

MIT. Use it, fork it, ship it.
