Evidence, not alerts. Stalk. Wait. Strike.
🌐 mantishack.com · Quickstart · Docs
Autonomous offensive-security platform that runs on any AI model, any agent harness. A 7-phase finite state machine, wrapped in the Mantis DISCOVER → REASON → TEST → LEARN methodology, orchestrates specialist agents that do recon, hunt in parallel waves, verify findings across three skeptical rounds, grade them on a 5-axis rubric, and produce submission-ready reports. End to end from a single command.
Works with any model · Claude (Opus / Sonnet / Haiku) · GPT-5 / o3 · Gemini 2.5 Pro / Flash · DeepSeek-V3 · Llama 3.3 · Qwen 3 · anything on OpenRouter · local Ollama
Drop into any harness: Claude Code · OpenCode · Aider · Cline · Cursor / Continue / Goose / raw MCP
⭐ Built by a strong community of hackers. Help support us by giving us a star before you try it.
git clone https://github.com/deonmenezes/bountyhunter.git mantis
cd mantis
./install.sh /path/to/project
The installer asks you to pick a harness: [c] Claude Code · [o] OpenCode (any model) · [a] Both. Pick o or a if you want to use anything other than Claude. Then drop the matching API key and go:
# Claude Code (Anthropic)
claude → /mantis target.com
# OpenCode (any provider)
export OPENAI_API_KEY=... # or ANTHROPIC_API_KEY / GOOGLE_API_KEY / OPENROUTER_API_KEY
opencode → @mantis-orchestrator target.com
To switch models per agent role, edit opencode.json: every agent has its own model: line. See adapters/MODELS.md for the per-role cross-provider matrix.
- What it is
- The Mantis methodology
- Use with any AI / any harness
- Why this architecture
- System architecture
- The 7-phase FSM
- Specialist agent catalog
- MCP control plane
- Safety rails
- Fleet Intelligence
- Directory layout
- Data model
- Install & usage
- Tech stack
- Quality assessment
A multi-agent orchestration layer that turns Claude Code into a semi-autonomous offensive-security operator. You give it one command:
/mantis target.com
…and it runs the full pipeline: recon → auth → parallel hunting waves → chain-building → three-round verification → grading → report writing. State is persisted to disk between phases, so long runs can be paused and resumed.
It is not:
- A scanner (nuclei/subfinder are just inputs).
- A single monolithic prompt.
- A black box; every phase writes structured JSON artifacts you can inspect.
It is:
- A finite state machine that knows which phase it's in and what's next.
- A set of ~10 specialist agents (Web Application, API, Network, Identity Provider, plus chain-builder, verifiers, grader, reporter, disclosure-sender), each with a narrow role and tool whitelist.
- A local MCP server that acts as the control plane: findings, verifications, grades, and hand-offs all flow through typed tool calls instead of free-form prose.
- A self-defending host: agents treat target responses as untrusted instruction streams and refuse to execute injected payloads (see .claude/rules/hunting.md, §"Self-defense").
Mantis wraps the underlying 7-phase FSM in a four-phase methodology mirrored on mantiscore.ai. Each phase is the visual idiom of a real praying mantis hunting:
| DISCOVER | REASON | TEST (STRIKE) | LEARN |
|---|---|---|---|
| stalk · fingerprint · map | plan · chain · pick the bypass | parallel waves · 3-round verify | grade · report · remember |
| Mantis phase | Maps to FSM | What it produces |
|---|---|---|
| DISCOVER | RECON + AUTH | Attack surface, fingerprints, auth profiles |
| REASON | HUNT (planning), CHAIN | Per-surface bypass selection, kill-chain hypotheses |
| TEST (the strike) | HUNT (execution), VERIFY (×3) | Multi-Step Evidence: three adversarial verification rounds |
| LEARN | GRADE, REPORT, EXPLORE, fleet write-back | Verdicts, submission-ready reports, Fleet Intelligence updates |
The contract is Evidence, not alerts: a finding only ships if every verification round produced a fresh, reproducible PoC. See docs/MANTIS_METHODOLOGY.md for the full mapping.
The MCP server is harness-agnostic. The agent prompts are plain markdown. The bypass tables are plain text. You can drive Mantis from whatever agent runner you prefer, with whatever model you have an API key for.
git clone https://github.com/deonmenezes/bountyhunter.git mantis
cd mantis
./install.sh /path/to/project # default: Claude Code
./install.sh /path/to/project --harness=opencode # OpenCode (any model)
./install.sh /path/to/project --harness=all # both, side-by-side
| Harness | Invocation | Models | Notes |
|---|---|---|---|
| Claude Code | /mantis target.com | Anthropic | Deepest integration; parallel hunter waves via run_in_background, PreToolUse safety hooks |
| OpenCode | @mantis-orchestrator target.com | any (Anthropic, OpenAI, Google, OpenRouter, local Ollama, …) | One config (opencode.json), 12 named agents, per-agent model overrides |
| Aider / Cline | chat-driven, single-model | any | Manual FSM driving; no parallel waves, but the MCP tools all work |
| raw MCP (Cursor / Continue / Goose / custom) | depends on client | any | Point any MCP client at mcp/server.js. Documented wire format. |
See adapters/MODELS.md for per-agent model picks across Anthropic / OpenAI / Google / open-weight providers. The verifier rounds and chain-builder are the only roles you should keep at top-tier.
The core problem with LLM-based hunting is hallucination and drift. A single long-running agent will happily invent findings, inflate severity, and forget what it already tested. This framework addresses that with five decisions:
| Problem | Decision |
|---|---|
| Agents invent findings | Three adversarial verification rounds re-run PoCs with fresh HTTP requests |
| Agents inflate severity | A separate grader with a 5-axis rubric issues SUBMIT/HOLD/SKIP |
| Agents forget state | The MCP server is the single source of truth: JSON files, not prose |
| Agents lose focus | Each agent is spawned per-task with a narrow tool whitelist and role prompt |
| Waves step on each other | Per-wave assignment files (wave-N-assignments.json) dedupe surfaces |
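The per-wave dedupe in the last row can be sketched roughly as follows. This is a minimal illustration, not the project's code: the function name `assignWave`, the `explored` set, the `maxAgents` cap, and the `wN-aN` key format are assumptions made for the example. Only the idea of an agent → surface_id map that prevents double-testing comes from the table above.

```javascript
// Hypothetical sketch: build a wave-N assignment map (agent -> surface_id)
// that never hands out a surface already tested in an earlier wave.
function assignWave(waveNumber, surfaceIds, explored, maxAgents) {
  const assignments = {};
  let agentIndex = 1;
  for (const surfaceId of surfaceIds) {
    if (agentIndex > maxAgents) break;       // the wave is full
    if (explored.has(surfaceId)) continue;   // already tested or reserved: skip
    assignments[`w${waveNumber}-a${agentIndex}`] = surfaceId;
    explored.add(surfaceId);                 // reserve it so no other agent gets it
    agentIndex += 1;
  }
  return assignments;
}

// s-1 was covered in wave 1, so wave 2 only receives s-2 and s-3.
const explored = new Set(["s-1"]);
const wave2 = assignWave(2, ["s-1", "s-2", "s-3", "s-2"], explored, 2);
console.log(wave2);
```

Keeping the explored set alongside the assignment map is what lets a later wave skip surfaces without re-reading every earlier hand-off.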
┌─────────────────────────────────────────────────────────────────────┐
│ Claude Code (host) │
│ │
│ /mantis target.com │
│ │ │
│ ▼ │
│ ┌──────────────────┐ ┌──────────────────┐ │
│ │ Orchestrator │◄──►│ FSM state.json │ │
│ │ (slash command) │ │ phase / wave │ │
│ └────────┬─────────┘ └──────────────────┘ │
│ │ spawns │
│ ▼ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Specialized agents (parallel) │ │
│ │ │ │
│ │ recon │ hunter × N │ chain │ verifier × 3 │ ... │ │
│ └──────────────────┬───────────────────────────────┘ │
│ │ calls tools │
│ ▼ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ MCP server (stdio, Node.js, zero deps) │ │
│ │ │ │
│ │ mantis_http_scan mantis_record_finding │ │
│ │ mantis_*_handoff mantis_write_verification │ │
│ │ mantis_write_grade mantis_transition_phase ... │ │
│ └──────────────────┬───────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Session directory (on disk, per-target) │ │
│ │ ~/mantis-sessions/target.com/ │ │
│ │ state.json │ findings.jsonl │ handoff-wN-aN.* │ │
│ │ brutalist.json │ balanced.json │ verified-final.json │ │
│ │ grade.json │ report.md │ │
│ └──────────────────────────────────────────────────┘ │
│ │
│ PreToolUse hooks ─── scope-guard.sh (blocks out-of-scope Bash) │
│ StatusLine ─── mantis-statusline.js (phase/wave/findings) │
└─────────────────────────────────────────────────────────────────────┘
- Control plane = structured JSON artifacts written by the MCP server. The orchestrator and every downstream agent read these. This is the only source of truth for what's been found, verified, and graded.
- Data plane = markdown mirrors (findings.md, brutalist.md, etc.) written best-effort for humans to eyeball. Prompts and code never parse them.
This split is deliberate: it is the main thing keeping the system deterministic enough to resume across sessions.
RECON ──► AUTH ──► HUNT ──► CHAIN ──► VERIFY ──► GRADE ──► REPORT
│ │ │ │ │ │ │
▼ ▼ ▼ ▼ ▼ ▼ ▼
recon optional parallel A→B 3 rounds 5-axis write
tools creds waves chains (B/B/F) score submit-
ready
| Phase | Input | Agents | Output |
|---|---|---|---|
| RECON | target domain | recon-agent | attack_surface.json (subdomains, live hosts, archived URLs, nuclei results, JS-extracted endpoints/secrets) |
| AUTH | user-provided cookies / tokens (optional) | orchestrator | auth profile stored in MCP; unauth mode if none |
| HUNT | attack surface | hunter-agent × N in parallel waves | handoff-wN-aN.json per hunter, merged into findings.jsonl |
| CHAIN | raw findings | chain-builder | chains.md, A→B exploit chains that elevate severity |
| VERIFY | findings + chains | brutalist-verifier → balanced-verifier → final-verifier | brutalist.json, balanced.json, verified-final.json |
| GRADE | verified findings | grader | grade.json, per-finding 5-axis score + SUBMIT/HOLD/SKIP |
| REPORT | graded SUBMITs | report-writer | report.md, platform-ready, under 600 words, with PoC + CVSS |
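The phase ordering above can be expressed as a small transition check. The sketch below is illustrative only: `nextPhase` and `canTransition` are hypothetical helpers, and it assumes strictly linear, single-step transitions. The real mantis_transition_phase tool may allow other moves, such as the EXPLORE iterations used by loop mode.

```javascript
// Phase order taken from the FSM diagram; the helpers themselves are hypothetical.
const PHASES = ["RECON", "AUTH", "HUNT", "CHAIN", "VERIFY", "GRADE", "REPORT"];

// Return the phase that follows `current`, or null once REPORT is reached.
function nextPhase(current) {
  const index = PHASES.indexOf(current);
  if (index === -1) throw new Error(`unknown phase: ${current}`);
  return index + 1 < PHASES.length ? PHASES[index + 1] : null;
}

// Allow only single forward steps, so a resumed run cannot skip VERIFY or GRADE.
function canTransition(from, to) {
  return nextPhase(from) === to;
}

console.log(nextPhase("HUNT"));              // CHAIN
console.log(canTransition("HUNT", "GRADE")); // false
```

Because the current phase is a single string persisted in state.json, a resume only needs to read that value and ask what comes next.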
- Round 1, brutalist (max skepticism): re-runs every PoC; the default answer is "this isn't real, prove me wrong." Kills hallucinated findings.
- Round 2, balanced: looks for false negatives the brutalist rejected too aggressively. Catches severity under-correction.
- Round 3, final: fresh HTTP requests with fresh context on only the survivors. Last confirmation before grading.
Findings survive only if all three rounds agree. This is slow but it's the reason the submission validity ratio stays high.
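The "all three rounds agree" rule amounts to an intersection over the round artifacts. In the sketch below, each round is modelled as a plain map from finding ID to a boolean; that shape is an assumption for illustration, and the real brutalist.json, balanced.json, and verified-final.json schemas may differ.

```javascript
// Hypothetical round artifacts: each maps a finding ID to "confirmed" (true/false).
// A finding survives only if every round explicitly confirmed it.
function survivors(rounds) {
  const [first, ...rest] = rounds;
  return Object.keys(first).filter(
    (id) => first[id] === true && rest.every((round) => round[id] === true)
  );
}

const brutalist = { "F-1": true, "F-2": false, "F-3": true };
const balanced = { "F-1": true, "F-2": true, "F-3": true };
const finalRound = { "F-1": true, "F-3": false };

// F-2 fails round 1 and F-3 fails round 3, so only F-1 is passed to the grader.
console.log(survivors([brutalist, balanced, finalRound])); // [ 'F-1' ]
```

A finding that is missing from any round counts as unconfirmed here, which matches the default-skeptical stance described above.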
Each agent is a markdown file in .claude/agents/ declaring its role prompt, allowed tools, and model preference. The orchestrator spawns them with injected context; they don't see the full conversation, only what they need. The hunter agent self-specialises by reading tech_stack from its brief: webapp (OWASP, IDOR, SQLi, XSS), api (GraphQL/REST/gRPC/WebSocket), network (Nmap/CVE, when available), or identity (SAML/OAuth/OIDC/JWT).
| Agent | Mantis role | Tools |
|---|---|---|
| recon-agent | DISCOVER: asset discovery, fingerprinting, archived URLs, nuclei, JS extraction | Bash, Read, Write, Glob, Grep |
| hunter-agent | REASON+TEST: specialist hunter (webapp/api/network/identity per brief) | Bash, Read, Grep, Glob, MCP |
| triage-agent | DISCOVER: Haiku-grade surface scoring (promote/defer/kill) | Read, Write, Grep, Glob |
| chain-builder | REASON: A→B kill-chain construction | Read, Write, Bash, MCP |
| brutalist-verifier | TEST: Multi-Step Evidence round 1 (maximum skepticism) | Bash, Read, MCP |
| balanced-verifier | TEST: Multi-Step Evidence round 2 (catch false negatives) | Bash, Read, MCP |
| final-verifier | TEST: Multi-Step Evidence round 3 (fresh PoC confirmation) | Bash, MCP |
| grader | LEARN: 5-axis scoring + SUBMIT/HOLD/SKIP, "Evidence, Not Alerts" | MCP |
| report-writer | LEARN: submission-ready report under 600 words | Write, MCP |
| patch-writer | LEARN: suggested code-level fix per finding | Read, Write, MCP |
| disclosure-sender | LEARN: gated email send of report to verified security contact | Read, Write, Bash, Gmail MCP |
A local stdio MCP server (mcp/server.js), zero dependencies, pure Node, exposes typed tools for every state transition. Agents never write session files directly; everything goes through the server.
Tool families:
| Family | Purpose |
|---|---|
| mantis_http_scan | HTTP request + auto-analysis (tech fingerprint, secret detection, endpoint extraction) |
| mantis_record_finding / mantis_read_findings / mantis_list_findings | Finding CRUD, append-only findings.jsonl |
| mantis_write_verification_round / mantis_read_verification_round | Per-round structured verification artifacts |
| mantis_write_grade_verdict / mantis_read_grade_verdict | Grader output |
| mantis_write_handoff / mantis_write_wave_handoff / mantis_merge_wave_handoffs / mantis_wave_handoff_status | Cross-session and cross-wave hand-offs |
| mantis_init_session / mantis_transition_phase / mantis_read_session_state | FSM lifecycle |
| mantis_auth_manual / mantis_auth_store | Auth profile storage |
| mantis_temp_email / mantis_signup_detect / mantis_auto_signup | Disposable-email sign-up for targets that need accounts |
| mantis_log_dead_ends | Negative-result memory so later waves don't repeat dead leads |
Why MCP instead of direct file writes? Three reasons: schema validation on every write, provenance (every artifact has a source tool), and dedupe (findings get canonical IDs F-1, F-2…).
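A rough sketch of how those three properties could combine in a single write path is shown below. It is not the server's implementation: the in-memory `store`, the required field names (`title`, `severity`, `surface_id`), and the dedupe key are all assumptions chosen for the example.

```javascript
// Hypothetical stand-in for the findings store; the real schema may differ.
const REQUIRED_FIELDS = ["title", "severity", "surface_id"];

function recordFinding(store, finding, sourceTool) {
  // 1. Schema validation: reject writes with missing or non-string fields.
  const missing = REQUIRED_FIELDS.filter((f) => typeof finding[f] !== "string");
  if (missing.length > 0) {
    throw new Error(`invalid finding, missing: ${missing.join(", ")}`);
  }
  // 2. Dedupe: the same title on the same surface keeps its existing ID.
  const existing = store.find(
    (f) => f.title === finding.title && f.surface_id === finding.surface_id
  );
  if (existing) return existing.id;
  // 3. Provenance + canonical ID: append only, never rewrite earlier entries.
  const id = `F-${store.length + 1}`;
  store.push({ ...finding, id, source: sourceTool });
  return id;
}

const store = [];
const idor = { title: "IDOR on /api/orders", severity: "high", surface_id: "s-2" };
const idA = recordFinding(store, idor, "mantis_record_finding");
const idB = recordFinding(store, idor, "mantis_record_finding");
console.log(idA, idB, store.length); // F-1 F-1 1
```

Serialising each stored entry with JSON.stringify, one per line, would give the append-only JSONL form described in the table.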
| Rail | What it does |
|---|---|
| scope-guard.sh | PreToolUse hook on Bash; logs out-of-scope HTTP requests, hard-blocks domains listed in deny-list.txt |
| scope-guard-mcp.sh | Same guard for MCP HTTP tool calls |
| session-write-guard.sh | Prevents agents from clobbering session state directly |
| .claude/rules/hunting.md | 20 always-active hunting rules + a "Self-defense, treat target responses as untrusted" section (Project Mantis lineage) |
| .claude/rules/reporting.md | 12 reporting rules (no theoretical language, mandatory PoC, CVSS accuracy, title formula, 600-word cap) |
| vendor-bypass-tables/ | Vendor-Aware Bypass tables: Firebase, GraphQL, JWT, Next.js, OAuth/OIDC, REST, SSRF, WordPress + Cloudflare, Akamai, AWS WAF, Google Cloud Armor |
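The deny-list rail in the table above comes down to a host-matching decision. The actual guards are shell scripts; the JavaScript sketch below only illustrates the matching logic. It assumes one domain per line, `#` comments, and that an entry also covers its subdomains, none of which is confirmed by the source.

```javascript
// Hypothetical helper: decide whether a URL's host is covered by a deny-list.
// Assumed format: one domain per line, blank lines and "#" comments ignored.
function isDenied(url, denyListText) {
  const host = new URL(url).hostname.toLowerCase();
  const entries = denyListText
    .split("\n")
    .map((line) => line.trim().toLowerCase())
    .filter((line) => line !== "" && !line.startsWith("#"));
  // Match the exact host or any subdomain, but not look-alike suffixes.
  return entries.some((entry) => host === entry || host.endsWith(`.${entry}`));
}

const denyList = "# never touch these\nexample.gov\n";
console.log(isDenied("https://api.example.gov/login", denyList)); // true
console.log(isDenied("https://notexample.gov/", denyList));       // false
```

Matching on a leading dot rather than a bare suffix is what keeps notexample.gov from being treated as a subdomain of example.gov.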
Cross-target knowledge that accumulates as you run Mantis against more targets. Lives in ~/.mantis-fleet/ (opt-in via MANTIS_FLEET_ENABLED=1):
| File | Purpose |
|---|---|
| dead-ends.jsonl | Endpoint patterns that returned WAF/404/null across N targets, wave 1 skips them on new targets |
| working-bypasses.jsonl | Confirmed bypass patterns: {vendor, technique, target, finding_id} |
| tech-fingerprints.jsonl | host → tech-stack cache so RECON skips fingerprinting on revisits |
Tool plumbing (mantis_fleet_read, mantis_fleet_write) is described in docs/MANTIS_METHODOLOGY.md. It's a deferred follow-up; the directory layout is the contract.
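Since the fleet tools are still a follow-up, the sketch below is purely illustrative of what an opt-in write-back could look like. The function name `recordWorkingBypass` and its `fleetDir` parameter are hypothetical; the record fields, the file name, and the MANTIS_FLEET_ENABLED switch come from the text above.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical write-back helper: append one confirmed bypass to the fleet store.
function recordWorkingBypass(fleetDir, record, env = process.env) {
  if (env.MANTIS_FLEET_ENABLED !== "1") return false; // opt-in only
  for (const key of ["vendor", "technique", "target", "finding_id"]) {
    if (typeof record[key] !== "string") {
      throw new Error(`missing field: ${key}`);
    }
  }
  fs.mkdirSync(fleetDir, { recursive: true });
  // One JSON object per line keeps the file append-only and easy to scan.
  fs.appendFileSync(
    path.join(fleetDir, "working-bypasses.jsonl"),
    JSON.stringify(record) + "\n"
  );
  return true;
}

// Example call (not executed here), assuming the default fleet location:
// recordWorkingBypass(path.join(require("node:os").homedir(), ".mantis-fleet"),
//   { vendor: "cloudflare", technique: "...", target: "target.com", finding_id: "F-1" });
```

Passing the environment in as a parameter keeps the opt-in check easy to test without touching the real process environment.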
mantis/
├── install.sh # installer for target projects
├── dev-sync.sh # dev-only sync to test workspace
├── mcp/
│ ├── server.js # MCP server, all tools, harness-agnostic
│ └── auto-signup.js # headless browser signup helper
├── .claude/ # Claude Code harness
│ ├── agents/ # 12 specialist agent definitions (.md)
│ ├── commands/ # /mantis + speed-mode slash commands
│ ├── hooks/ # scope-guard, mantis-statusline, write-guard
│ ├── rules/ # hunting.md (incl. self-defense), reporting.md
│ ├── vendor-bypass-tables/ # 12 vendor + vuln-class bypass cheatsheets
│ └── settings.json # hook wiring + status line config
├── opencode.json # OpenCode harness, registers MCP + 12 agents
├── adapters/ # multi-harness docs (OpenCode, Aider, Cline, raw MCP)
│ ├── README.md
│ ├── MODELS.md # per-agent model picks across providers
│ ├── opencode.md / claude-code.md / aider.md / cline.md / raw-mcp.md
├── docs/
│ ├── MANTIS_METHODOLOGY.md # DISCOVER → REASON → TEST → LEARN mapping
│ ├── HARNESSES.md # speed-mode reference
│ └── mantis-architecture.svg
├── scripts/
│ └── mantis-worktree.sh # concurrent multi-target hunting
├── test/ # mcp-server + prompt-contracts tests
├── package.json # node --test runner
├── mantis-upgrade-requirements.md # rebrand spec & acceptance checklist
└── CLAUDE.md # project instructions for Claude Code
Per-target runtime state lives outside the repo, at ~/mantis-sessions/<domain>/.
Session directory (~/mantis-sessions/<domain>/):
| File | Format | Purpose |
|---|---|---|
| state.json | JSON | FSM phase, wave count, pending wave, findings count, explored surface IDs, exclusions, lead routing |
| attack_surface.json | JSON | Recon output grouped by priority |
| wave-N-assignments.json | JSON | Per-wave agent → surface_id map (prevents double-testing) |
| handoff-wN-aN.json | JSON | Authoritative hunter hand-off (deterministic merge) |
| handoff-wN-aN.md | Markdown | Freeform hunter notes (humans + chain-builder) |
| findings.jsonl | JSONL | Append-only canonical findings |
| findings.md | Markdown | Human mirror |
| brutalist.json / .md | JSON + MD | Round 1 verification |
| balanced.json / .md | JSON + MD | Round 2 verification |
| verified-final.json / .md | JSON + MD | Round 3 verification |
| chains.md | Markdown | A→B exploit chains |
| grade.json / .md | JSON + MD | 5-axis score + verdict |
| report.md | Markdown | Submission-ready |
| SESSION_HANDOFF.md | Markdown | Cross-session resume hint |
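Because findings.jsonl is append-only JSONL, inspecting it outside Mantis takes only a few lines. The helper below is a generic JSONL reader rather than anything shipped in the repo, and the sample records use made-up fields (`id`, `severity`) purely for illustration.

```javascript
// Parse the text of a JSONL file into an array of objects.
// Blank lines are ignored; a malformed line raises with its line number.
function parseJsonl(text) {
  const records = [];
  text.split("\n").forEach((line, index) => {
    if (line.trim() === "") return;
    try {
      records.push(JSON.parse(line));
    } catch (err) {
      throw new Error(`line ${index + 1}: ${err.message}`);
    }
  });
  return records;
}

// Hypothetical sample; real findings carry more fields than this.
const sample = '{"id":"F-1","severity":"high"}\n{"id":"F-2","severity":"low"}\n';
console.log(parseJsonl(sample).map((f) => f.id)); // [ 'F-1', 'F-2' ]
```

Reporting the line number on a parse failure makes a truncated final line, for example from an interrupted run, straightforward to locate.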
git clone https://github.com/deonmenezes/bountyhunter.git mantis
cd mantis
chmod +x install.sh
./install.sh /absolute/path/to/your/project
The installer drops agents/hooks/rules into <project>/.claude/, wires the MCP server into <project>/.mcp.json (under the mantis key), and configures the status line. If the target already has .claude/settings.json or .mcp.json, it prints exactly the keys to merge instead of overwriting.
Migrating from the old bountyagent brand? Run mv ~/bounty-agent-sessions ~/mantis-sessions once if you have an in-flight session, then re-run ./install.sh against your target project so .mcp.json is updated to the mantis server key.
Then, from the target project:
claude
Inside the session:
/mantis target.com # full autonomous run
/mantis resume target.com # pick up from last phase
/mantis resume target.com force-merge # reconcile a stuck wave
The standard /mantis is balanced. When you need different tradeoffs, use one of the harness commands. They are dispatch policies built on top of the same 7-phase FSM; they do not re-implement it.
| Mode | Use when | Cost vs std | Pipeline |
|---|---|---|---|
| /mantis-fast | Triage an unknown target cheaply | ~10–15% | recon → triage (Haiku) → 1 hunter wave → brutalist verify → grade |
| /mantis | Default | 100% | full 7-phase FSM |
| /mantis-ultra | High-value, deadline pressure | ~200–300% | full FSM, 3× wider waves, parallel verifier dispatch |
| /mantis-loop | Long mission, find rare bugs | budgeted | repeats /mantis via EXPLORE iterations until findings/time budget hit |
| /mantis-fullsend | Auto-disclose after a verified find | ~110% | adds PATCH + DISCLOSE phases |
| /mantisplan | Pre-HUNT sanity check | <1% | reads recon, writes plan.md, no HTTP |
/mantis-fast https://target.com # cheap pre-screen
/mantis-ultra https://target.com # wide parallel
/mantis-loop https://target.com --findings 3 --budget-min 240
/mantisplan target.com # plan before HUNT
The Haiku-grade triage-agent runs after recon and scores surfaces as promote / defer / kill. The fast, ultra, and plan modes use it implicitly to cut wasted hunter spawns.
For concurrent multi-target hunting, use the worktree helper:
./scripts/mantis-worktree.sh target1.com # creates ~/mantis-worktrees/target1.com on its own branch
./scripts/mantis-worktree.sh target2.com # second worktree, independent MCP server, independent state
Full mode reference: docs/HARNESSES.md.
- Claude Code with Opus access
- curl, python3, Node.js 18+
- Optional recon tools for better RECON:
go install github.com/projectdiscovery/subfinder/v2/cmd/subfinder@latest
go install github.com/projectdiscovery/httpx/cmd/httpx@latest
go install github.com/projectdiscovery/nuclei/v3/cmd/nuclei@latest
All optional; if missing, RECON steps skip cleanly.
For working on the framework itself:
./dev-sync.sh /absolute/path/to/test-workspace
Backs up the test workspace's .mcp.json + .claude/settings.json, re-runs the installer, and runs claude mcp list as a smoke check.
| Layer | Tech | Why |
|---|---|---|
| Host | Claude Code (CLI) | Has the slash command, hook, MCP, and sub-agent primitives |
| Orchestrator | Markdown slash commands | /mantis, /mantis-fullsend |
| Agents | Markdown frontmatter definitions | Declarative role + tool whitelist, versionable in git |
| Control plane | Custom stdio MCP server (Node.js) | Zero-dep, fast startup, schema-enforced tool I/O |
| Transport | JSON-RPC over stdio | Standard MCP wire format |
| Persistence | Plain JSON + JSONL on disk | Human-inspectable, resumable, no DB to run |
| Hooks | POSIX shell + Node.js | PreToolUse guards, status line |
| Recon | subfinder, httpx, nuclei, curl | Optional, degrades gracefully |
| Signup automation | patchright (Chromium) | Optional, only when a target needs account creation |
| Tests | node --test | Built-in runner, no framework |
What's well-executed
- Separation of concerns. Each agent has one job, a narrow tool whitelist, and a short role prompt. This is the single most important anti-hallucination lever.
- Control plane / data plane split. JSON is authoritative, markdown is a debug mirror. Prompts and code never parse the markdown. This is the reason resume works reliably.
- Three-round verification. The brutalist → balanced → final chain catches both over- and under-correction without a single agent needing to be "right."
- Append-only findings with canonical IDs. findings.jsonl + F-N IDs mean you can reason about findings across waves without races.
- Hooks as real safety rails. scope-guard.sh runs before any Bash call; requests to deny-listed domains are blocked, not just logged. The rules in .claude/rules/*.md are loaded into every agent's context.
- Zero-dependency MCP server. Pure Node, stdio transport, trivial to install, no version drift.
- Graceful degradation. Missing subfinder/httpx/nuclei? RECON continues with what's there. Missing auth? Runs unauthenticated. No single-point-of-failure dependency.
Trade-offs / limits
- Opus-only economics. The multi-round verification is expensive. Cheaper models produce noticeably worse grading. This is a cost-versus-validity trade, and the framework chooses validity.
- No provenance enforcement on MCP writes. mantis_merge_wave_handoffs trusts the structured hand-offs it's given. A malicious or broken hunter could write plausibly-shaped garbage. Comments in server.js explicitly call this out as out of scope for the current patch.
- Synchronous waves. Parallelism is within a wave, not across waves. CHAIN and VERIFY wait for HUNT to finish. For a large target with 6 waves, wall-clock time is the bottleneck.
- Scope enforcement is denylist-based. deny-list.txt is a hard block; everything else is allowed. An allowlist-first model would be safer but more configuration-heavy.
- Recon leans on external tools. subfinder/nuclei are rate-limited and can get you blocked. No built-in throttling beyond what those tools provide.
Overall
For what it is (an open-source, harness-agnostic, resumable Mantis-style offensive-security agent), the architecture is legitimately good. The FSM + MCP control plane + three-round Multi-Step Evidence verification is the right shape for the hallucination problem, and the code matches the design. It's not a production security product, but it is a genuinely useful research-grade tool and a clean template for anyone building multi-agent systems.
Only test systems you own or are explicitly authorized to test. That means your own apps, your employer's apps (with written approval), or assets listed in a public bug-bounty program with a safe-harbor clause (HackerOne, Bugcrowd, Intigriti, Immunefi, etc.).
Running Mantis against any system without permission may violate computer-crime laws — including the CFAA (US), Computer Misuse Act (UK), IT Act § 43 / 66 (India), and equivalent legislation in every other jurisdiction. It may also violate the target's terms of service and result in account suspension or permanent bans from the platform you're using to test, the AI provider you're using to drive Mantis, or any cloud / hosting service involved.
By using Mantis you accept full responsibility for your actions. The authors, contributors, and project maintainers are not liable for any consequences of use or misuse — including but not limited to:
- civil or criminal proceedings, investigations, or charges;
- lawsuits, settlements, or legal fees;
- account suspensions, bans, or revoked API access;
- terms-of-service violations;
- financial damages, lost revenue, or reputational harm;
- any other direct, indirect, incidental, or consequential outcome.
Mantis is provided "as is", without warranty of any kind, express or implied, including but not limited to merchantability, fitness for a particular purpose, or non-infringement. The built-in scope-guard hook helps prevent obvious mistakes — it cannot save you from bad inputs, careless targeting, or deliberate misuse.
Mantis is a tool. You are the operator. Use it lawfully. If you're unsure whether a target is in scope, don't run it. Ask first.





