/bug-hunter

Adversarial AI bug detection and auto-fix for coding agents
Multi-agent pipeline that finds security vulnerabilities, logic errors, and runtime bugs — then fixes them autonomously on a safe branch.

Why Bug Hunter?

LLMs are sycophantic code reviewers. Ask one to find bugs and it over-reports. Ask it to verify those bugs and it agrees with itself. The result: noise, false positives, wasted time.

Bug Hunter solves this by pitting multiple AI agents against each other in isolated contexts. Each agent has competing incentives — adversarial tension that produces high-fidelity bug reports with minimal false positives.

Unlike traditional static analysis tools, Bug Hunter understands runtime behavior, cross-file dependencies, and framework-specific patterns. It catches the bugs that linters miss.

How Adversarial Bug Detection Works

Phase 1 — Find and Verify Bugs

                    +-- Hunter-A (Security lens) --+           +-- Skeptic-A (cluster 1) --+
Recon (map) ------->|                              |-- merge ->|                           |-- merge --> Referee
                    +-- Hunter-B (Logic lens)    --+           +-- Skeptic-B (cluster 2) --+

Step	Agent	What it does	Scoring incentive
1	Recon	Maps codebase architecture, identifies trust boundaries, computes context budget	Accurate risk map = better Hunter coverage
2	Hunters	Dual-lens scan (security + logic) with mandatory security checklist per file	+1/+5/+10 per real Low/Medium/Critical bug; -3 per false positive
3	Skeptics	Adversarially challenge every finding, verify claims against real docs via Context7	+points for each false positive disproved; double the points lost for each real bug wrongly dismissed
4	Referee	Reads code independently, spot-checks evidence quotes, makes final verdicts	Symmetric +1/-1 scoring. Ground truth framing

Every agent runs in a completely isolated context: agents cannot see each other's reasoning, only structured findings. This prevents anchoring bias and forces independent verification.
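The hand-off between isolated agents can be pictured as a filter that lets only structured fields through. The sketch below is illustrative only: the `Finding` fields, `SHARED_FIELDS`, and the `handoff` helper are hypothetical names, not Bug Hunter's actual schema.

```python
from dataclasses import dataclass, asdict


@dataclass
class Finding:
    """Hypothetical shape of one Hunter finding (not Bug Hunter's real schema)."""
    bug_id: str
    file: str
    line: int
    severity: str        # "Low", "Medium", or "Critical"
    claim: str           # what the Hunter asserts is wrong
    evidence_quote: str  # code quoted in support of the claim
    reasoning: str       # the Hunter's private reasoning, never shared


# Fields the next agent is allowed to see; free-form reasoning is excluded.
SHARED_FIELDS = ("bug_id", "file", "line", "severity", "claim", "evidence_quote")


def handoff(finding: Finding) -> dict:
    """Return only the structured fields that cross the agent boundary."""
    data = asdict(finding)
    return {key: data[key] for key in SHARED_FIELDS}
```

Under this assumption a Skeptic receives the claim and the quoted evidence but has to rebuild the argument for or against it on its own.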

Phase 2 — Auto-Fix and Verify

                  +-- Fixer-A (worktree 1) --+
Git branch ------>|                          |-- merge --> Test diff --> Report
                  +-- Fixer-B (worktree 2) --+

Step	Agent	What it does
5	Fixers	Apply minimal surgical fixes in isolated git worktrees, one checkpoint commit per bug
6	Verify	Run test suite, diff against baseline, auto-revert any fix that introduces regressions
7	Re-scan	Lightweight Hunter scans only changed lines to catch fixer-introduced bugs

Each fix is an individual commit that can be reverted independently. Failed fixes are auto-reverted — the codebase stays clean.
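The regression check in the Verify step comes down to a set difference between failing tests before and after a fix. This is a minimal sketch assuming test results are available as sets of failing test names; the function names are hypothetical and the real pipeline may compare results differently.

```python
def new_failures(baseline: set[str], after_fix: set[str]) -> set[str]:
    """Tests failing after the fix that were not already failing before it."""
    return after_fix - baseline


def should_revert(baseline: set[str], after_fix: set[str]) -> bool:
    """A fix is reverted only if it introduces at least one new failure."""
    return bool(new_failures(baseline, after_fix))


# A test that was already failing before the fix does not count against it.
already_broken = {"test_legacy_export"}
print(should_revert(already_broken, {"test_legacy_export"}))                # False
print(should_revert(already_broken, {"test_legacy_export", "test_login"}))  # True
```

Diffing against a baseline rather than requiring a green suite means pre-existing failures do not block fixes.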

Supported Coding Agents and Editors

Coding Agents (CLI)

Agent	Status
Claude Code	Full support
OpenAI Codex CLI	Full support
GitHub Copilot CLI	Full support
Kiro CLI (AWS)	Full support
Pi Coding Agent	Full support
Opencode	Full support
Gemini CLI	Full support
Amp	Full support

Editors and IDEs

Editor	Status
Cursor	Full support
VS Code / Windsurf	Full support
JetBrains (IntelliJ, PyCharm, WebStorm)	Full support
Antigravity (Google)	Full support
Neovim / Vim	Full support via terminal

Terminals

Works in any terminal — iTerm2, Ghostty, Warp, Alacritty, Kitty, Hyper, Windows Terminal.

If your coding agent or editor supports skills, Bug Hunter works out of the box.

Installation

git clone https://github.com/codexstar69/bug-hunter.git ~/.claude/skills/bug-hunter

Coding agents auto-discover skills in ~/.claude/skills/.

Set Up Context7 (Recommended)

Bug Hunter verifies claims about library and framework behavior against real documentation using the Context7 API. This significantly reduces false positives from hallucinated framework assumptions.

1. Get a free API key from context7.com.
2. Add it to your shell profile (.zshrc, .bashrc, etc.):

export CONTEXT7_API_KEY="your-api-key-here"

3. Restart your terminal.

On first run, Bug Hunter checks for the key and runs a smoke test. If missing, it prompts you to set it up.
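To confirm the variable is visible to new processes before the first run, a check along these lines can help. It is a sketch, not Bug Hunter's own first-run check: `context7_key_status` is a hypothetical helper, and it only inspects the environment without calling the Context7 API.

```python
import os
from typing import Mapping, Optional


def context7_key_status(env: Optional[Mapping[str, str]] = None) -> str:
    """Classify the CONTEXT7_API_KEY setting as 'missing', 'placeholder', or 'present'."""
    env = os.environ if env is None else env
    key = env.get("CONTEXT7_API_KEY", "").strip()
    if not key:
        return "missing"
    if key == "your-api-key-here":
        # The sample value from the export line was pasted unchanged.
        return "placeholder"
    return "present"


if __name__ == "__main__":
    print(context7_key_status())
```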

Usage

/bug-hunter                              # Scan entire project
/bug-hunter src/                         # Scan specific directory
/bug-hunter lib/auth.ts                  # Scan specific file
/bug-hunter -b feature-xyz               # Scan files changed in feature-xyz vs main
/bug-hunter -b feature-xyz --base dev    # Scan files changed in feature-xyz vs dev
/bug-hunter --staged                     # Scan staged files (pre-commit check)
/bug-hunter --fix src/                   # Find bugs AND auto-fix them
/bug-hunter --fix -b feature-xyz         # Find + fix on branch diff
/bug-hunter --fix --approve src/         # Find + fix, but approve each fix manually
/bug-hunter --loop src/                  # Loop mode: audit until 100% coverage
/bug-hunter --loop --fix src/            # Loop mode: find + fix until clean
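The branch and staged scopes above correspond naturally to `git diff` invocations. The mapping below is one plausible reading, not a description of Bug Hunter's internals: `changed_files_command` is a hypothetical helper, and the choice of the three-dot range (changes on the branch since it diverged from the base) is an assumption.

```python
from typing import Optional


def changed_files_command(
    branch: Optional[str] = None,
    base: str = "main",
    staged: bool = False,
) -> Optional[list[str]]:
    """Build a git command that lists the files in scope, or None for a path scan."""
    if staged:
        # Files currently in the index, as a pre-commit check would see them.
        return ["git", "diff", "--cached", "--name-only"]
    if branch is not None:
        # Assumption: compare against the merge base of `base` and `branch`.
        return ["git", "diff", "--name-only", f"{base}...{branch}"]
    return None


print(changed_files_command(branch="feature-xyz", base="dev"))
```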

Auto-Scaling Modes

The pipeline auto-selects the right mode based on codebase size. Recon dynamically computes the context budget per agent based on average file sizes.

Mode	Source files	Agents launched
Single-file	1	1 Hunter + 1 Skeptic + 1 Referee
Small	2-10	1 Hunter + 1 Skeptic + 1 Referee
Parallel	11-40	Recon + 2 Hunters + 2 Skeptics + Referee
Extended	41-80	Recon + 4 Hunters + 2 Skeptics + Referee
Scaled	81-120	Recon + 6 Hunters + 3 Skeptics + Referee
Loop	121+	Iterates in batches until full coverage achieved
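The thresholds in the table can be read as a simple lookup on the source file count. This sketch assumes the count alone decides the mode and that exactly 120 files still counts as Scaled; `select_mode` is a hypothetical name, and the real selection may also weigh the context budget that Recon computes.

```python
def select_mode(source_files: int) -> str:
    """Map a source file count to the mode name from the table above."""
    if source_files < 1:
        raise ValueError("need at least one source file")
    if source_files == 1:
        return "Single-file"
    if source_files <= 10:
        return "Small"
    if source_files <= 40:
        return "Parallel"
    if source_files <= 80:
        return "Extended"
    if source_files <= 120:
        return "Scaled"
    return "Loop"


for count in (1, 10, 11, 80, 120, 121):
    print(count, select_mode(count))
```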

What Bugs Does It Catch?

Bug Hunter scans for behavioral bugs — issues that cause incorrect behavior at runtime:

Security vulnerabilities — SQL injection, authentication bypass, SSRF, path traversal, hardcoded secrets, JWT without expiry
Logic errors — off-by-one, wrong comparisons, inverted conditions, broken pagination
Error handling gaps — silent error swallowing, missing null checks, unhandled promise rejections
Type safety issues — type coercion traps across boundaries, non-string inputs to string-only APIs
Race conditions — async I/O interleaving, shared mutable state without coordination
API contract violations — wrong status codes, missing required fields, broken callers
Data integrity bugs — truncation, encoding issues, timezone bugs, integer overflow
Cross-file bugs — assumption mismatches across module boundaries, auth gaps in call chains
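As an illustration of the "broken pagination" category, here is a made-up example (not taken from the test fixture) of an off-by-one that passes linting and type checks yet returns the wrong rows at runtime.

```python
def page_buggy(items: list, page: int, page_size: int) -> list:
    """Bug: treats the 1-based page number as 0-based, so page 1 skips the first page."""
    start = page * page_size
    return items[start:start + page_size]


def page_fixed(items: list, page: int, page_size: int) -> list:
    """Fix: convert the 1-based page number to a 0-based offset."""
    start = (page - 1) * page_size
    return items[start:start + page_size]


rows = list(range(1, 8))          # [1, 2, 3, 4, 5, 6, 7]
print(page_buggy(rows, 1, 3))     # [4, 5, 6]
print(page_fixed(rows, 1, 3))     # [1, 2, 3]
```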

What It Skips (By Design)

Style, formatting, naming conventions, unused imports, missing types, TODO comments, test coverage gaps, dependency versions. Those are linter and type-checker responsibilities.

Scoring System

The scoring incentives are load-bearing — they exploit each agent's desire to maximize its score:

Agent	Scoring	Effect
Hunter	+1/+5/+10 per real Low/Medium/Critical bug. -3 per false positive	Motivates thoroughness but penalizes sloppiness
Skeptic	+points for valid disproves; double the points lost for wrongly dismissing a real bug	Creates calibrated caution: with a +1/-2 payoff, disproving only pays off above the 2/3 break-even, so the Skeptic disproves only when >67% confident
Referee	Symmetric +1/-1 with ground truth framing	Precise rather than biased toward either side

Five real bugs beat twenty false positives. Quality over quantity.
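The arithmetic behind these incentives is short enough to write out. The sketch below uses the point values from the table; the function names are hypothetical, and it assumes a Skeptic gains `points` for a correct disprove and loses twice that for a wrong one.

```python
HUNTER_POINTS = {"Low": 1, "Medium": 5, "Critical": 10}
FALSE_POSITIVE_PENALTY = 3


def hunter_score(real_bugs: list[str], false_positives: int) -> int:
    """Sum severity points for real bugs, minus 3 per false positive."""
    earned = sum(HUNTER_POINTS[severity] for severity in real_bugs)
    return earned - FALSE_POSITIVE_PENALTY * false_positives


def skeptic_expected_gain(confidence: float, points: float = 1.0) -> float:
    """Expected score for disproving: +points if right, -2 * points if wrong."""
    return confidence * points - (1 - confidence) * 2 * points


print(hunter_score(["Low"] * 5, 0))   # 5
print(hunter_score([], 20))           # -60
print(skeptic_expected_gain(0.5))     # -0.5
```

Setting the expected gain to zero gives c - 2(1 - c) = 0, so c = 2/3. Below that confidence a disprove loses points on average, which is where the 67% figure comes from.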

Autonomous Fix Pipeline

When you pass --fix, the pipeline runs fully autonomously by default: bugs are found, fixed, tested, and verified end-to-end with no human intervention.

All fixes happen on a separate branch. Your working branch is never touched. Review the diff, then merge when you're ready.

Mode	Behavior
`--fix` (default)	Fully autonomous — creates branch, applies fixes, runs tests, auto-reverts failures
`--fix --approve`	Pauses before each fix for manual approval

Git Safety and Branch Protection

Dedicated fix branch — creates bug-hunter-fix-<timestamp> from your current branch. Your code stays untouched until you merge.
Stashes uncommitted work — any dirty working tree is stashed before fixes begin, restored after
Test baseline — captures pre-fix test results for accurate regression diffing
Checkpoint commits — each bug fix is a separate fix(bug-hunter): BUG-N commit
Auto-revert on regression — if a fix causes new test failures, it is automatically reverted via git revert
Post-fix re-scan — a lightweight Hunter scans only changed lines to catch fixer-introduced bugs
Individual revertability — any single fix can be surgically reverted without affecting others
Test hook auto-detection — auto-detects test runner, typecheck, and build commands from your project config
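The naming conventions in the list above can be sketched as small helpers. These are hypothetical functions for illustration: the timestamp format is an assumption (the text only says `<timestamp>`), and treating `fix(bug-hunter): BUG-N` as the whole commit subject is also an assumption. `git revert --no-edit` is a standard git invocation.

```python
from datetime import datetime


def fix_branch_name(now: datetime) -> str:
    """Build a fix branch name; the timestamp format here is an assumption."""
    return "bug-hunter-fix-" + now.strftime("%Y%m%d-%H%M%S")


def checkpoint_message(bug_number: int) -> str:
    """Commit subject for one bug fix, following the fix(bug-hunter): BUG-N pattern."""
    return f"fix(bug-hunter): BUG-{bug_number}"


def revert_command(commit_sha: str) -> list[str]:
    """Command that undoes a single checkpoint commit without opening an editor."""
    return ["git", "revert", "--no-edit", commit_sha]


print(fix_branch_name(datetime(2025, 1, 2, 3, 4, 5)))
print(checkpoint_message(3))
```

Because each bug gets its own commit, reverting one fix adds a new commit and leaves the other fixes in place.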

Supported Languages

Bug Hunter works with any language your coding agent can read. It has been tested extensively with:

TypeScript / JavaScript (Node.js, React, Next.js, Express)
Python (Django, Flask, FastAPI)
Go
Rust
Java / Kotlin
Ruby
PHP

Project Structure

bug-hunter/
  SKILL.md              # Core dispatcher (argument parsing, mode routing, report)
  prompts/              # Agent prompt files
    recon.md            # Architecture mapper
    hunter.md           # Bug finder (dual-lens: security + logic)
    skeptic.md          # Adversarial challenger
    referee.md          # Final arbiter
    fixer.md            # Surgical code fixer
    doc-lookup.md       # Context7 doc verification reference
  modes/                # Execution mode files (loaded on demand)
    single-file.md      # 1 file
    small.md            # 2-10 files
    parallel.md         # 11-40 files
    extended.md         # 41-80 files
    scaled.md           # 81-120 files
    loop.md             # Coverage tracking across iterations
    fix-pipeline.md     # Phase 2: fix + verify
    fix-loop.md         # Combined find + fix loop
  scripts/
    context7-api.cjs    # Context7 doc lookup CLI
    init-test-fixture.sh # Initialize test fixture git repo
  test-fixture/         # Self-test app with planted bugs
  assets/               # Images and diagrams

Self-Test

Bug Hunter ships with a test fixture — a small Express app with 6 intentionally planted bugs (2 Critical, 2 Medium, 2 Low). Run it to validate the pipeline:

/bug-hunter test-fixture/

Expected results:

Recon classifies 3 files as CRITICAL, 1 as HIGH
Hunters find all 6 bugs
Skeptic challenges at least 1 false positive
Referee confirms all planted bugs
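If you want to check a self-test run mechanically, one rough approach is to compare confirmed severities against the planted counts. The report format is not specified here, so this sketch assumes you have already extracted a list of confirmed severities; `missing_planted` is a hypothetical helper.

```python
from collections import Counter

# Planted bug counts stated for the test fixture.
EXPECTED_PLANTED = {"Critical": 2, "Medium": 2, "Low": 2}


def missing_planted(confirmed_severities: list[str]) -> dict[str, int]:
    """Return how many planted bugs per severity are still unconfirmed."""
    found = Counter(confirmed_severities)
    return {
        severity: expected - found[severity]
        for severity, expected in EXPECTED_PLANTED.items()
        if found[severity] < expected
    }


print(missing_planted(["Critical", "Low"]))
```

Counting by severity is a coarse check: an unrelated confirmed finding of the same severity would hide a missed planted bug, so compare bug locations as well when it matters.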

FAQ

How is this different from a linter or static analysis tool?

Linters check syntax and style. Static analysis tools mostly check type safety and known code patterns. Bug Hunter targets runtime behavioral bugs: logic errors, security vulnerabilities, race conditions, and cross-file assumption mismatches that linters are not designed to detect. It reasons about what your code does, not just how it looks.

Does it modify my code directly?

No. All fixes are applied on a dedicated branch (bug-hunter-fix-<timestamp>). Your working branch is never modified. You review the diff and merge when ready.

What if a fix breaks something?

Each fix is a separate checkpoint commit. If a fix causes new test failures, it is automatically reverted — no manual cleanup needed. The codebase stays clean.

How do I review fixes before they're applied?

Use --approve mode: /bug-hunter --fix --approve src/. The system pauses before each fix and waits for your approval.

What languages does it support?

Any language your coding agent can read. Tested with TypeScript, JavaScript, Python, Go, Rust, Java, Kotlin, Ruby, and PHP.

How does it reduce false positives?

Three mechanisms: (1) adversarial Skeptic agents that challenge every finding, (2) Context7 doc verification against real library documentation, and (3) an independent Referee that reads code from scratch before making final verdicts. On top of these, Hunter and Skeptic scoring is asymmetric: a Hunter loses 3 points per false positive, and a Skeptic loses double for wrongly dismissing a real bug.

Update

cd ~/.claude/skills/bug-hunter && git pull

Uninstall

rm -rf ~/.claude/skills/bug-hunter

License

MIT — see LICENSE for details.
