A decentralized bug bounty protocol where AI attacker agents find vulnerabilities, independent verifier agents confirm them in sandboxed environments, CVSS v4.0 scores determine severity, and smart contracts distribute bounties — trustlessly, on-chain, with zero human triage.
Bug bounties are broken. $100M+ sits in bounty pools across platforms like Immunefi and HackerOne, yet:
- Manual triage is the bottleneck. Every submission needs a human security expert to validate — expensive, slow, and unscalable.
- Smart contracts can't wait. They're immutable, transparent, and hold real assets. The Ronin Bridge hack drained roughly $625M, and the DAO hack roughly $60M. The DAO exploit was a reentrancy bug, the kind of flaw a thorough audit is meant to catch.
- No standard severity scoring on-chain. Traditional cybersecurity has CVSS — an industry-standard 0–10 severity scoring framework used by NIST, CERT, and every major security team. Web3 bounty platforms have... vibes.
ExploitArena brings the rigor of traditional cybersecurity — automated red-teaming, sandboxed verification, CVSS scoring — to a trustless, on-chain bounty protocol.
Developer submits contract + bounty + deadline
│
▼
┌───────────────────────────────┐
│ ATTACKER AGENT POOL │
│ Each agent gets an isolated │
│ cloud sandbox with shell, │
│ compiles, writes PoCs, and │
│ TESTS exploits before submit │
└───────────────┬───────────────┘
│ tested exploit submitted
▼
┌───────────────────────────────┐
│ VERIFIER AGENT POOL │
│ Each verifier independently: │
│ 1. Gets own isolated sandbox │
│ 2. Reproduces the exploit │
│ 3. Measures actual impact │
│ 4. Computes CVSS v4.0 score │
│ 5. Casts on-chain vote │
└───────────────┬───────────────┘
│ consensus + CVSS score
▼
┌───────────────────────────────┐
│ BOUNTY ESCROW CONTRACT │
│ If supermajority confirms: │
│ → payout scaled to CVSS │
│ If deadline expires w/o │
│ valid exploit: │
│ → funds returned to dev │
└───────────────────────────────┘
- A developer deploys a bounty. They submit their smart contract (by GitHub URL or source), deposit a bounty amount in ETH, and set a deadline. The bounty is locked in an on-chain escrow contract.
- Attacker agents compete in sandboxes. A pool of AI agents, each running inside an isolated E2B cloud sandbox with full shell, Node.js, Python, and git, independently analyzes the codebase. Each agent explores the repo, identifies vulnerabilities, writes exploit code, and must test it inside the sandbox and see it succeed before submitting. No theoretical submissions are accepted.
- Verifier agents reproduce in sandboxes. Each submitted exploit is independently verified by multiple agents, each running in its own isolated E2B sandbox with the target repo cloned and the exploit pre-loaded. Verifiers have no shared memory or state. They attempt to reproduce the exploit, measure concrete impact (funds drained, state corrupted, access escalated), and compute a CVSS v4.0 score.
- CVSS v4.0 scoring. Verifiers don't just say "yes/no." They compute a CVSS v4.0 severity score (the same framework used by NIST's National Vulnerability Database, CERT, and enterprise security teams worldwide). The score considers attack vector, complexity, privileges required, user interaction, and impact on confidentiality, integrity, and availability.
- On-chain consensus and payout. If a supermajority of verifiers (default: 3-of-5) confirms the exploit is valid and reproducible, the bounty escrow automatically disburses a payout proportional to the CVSS severity. If the deadline expires with no confirmed exploit, the developer gets their full bounty back.
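The settlement rule in the last step can be sketched as a small pure function. This is a minimal TypeScript model, not the BountyEscrow contract itself: the names `BountyState`, `Outcome`, and `settle` are hypothetical, amounts are plain numbers rather than wei, and checking confirmations before the deadline is an assumption about ordering.

```typescript
// Hypothetical snapshot of one bounty at settlement time.
interface BountyState {
  pool: number;          // escrowed amount in ETH (plain number for illustration only)
  deadline: number;      // unix seconds
  confirmations: number; // verifiers that reproduced the exploit
  quorum: number;        // e.g. 3 for the default 3-of-5
  payoutPercent: number; // share of the pool implied by the CVSS severity band
}

type Outcome =
  | { kind: "payout"; toAttacker: number; remainingInEscrow: number }
  | { kind: "refund"; toDeveloper: number }
  | { kind: "pending" };

function settle(b: BountyState, now: number): Outcome {
  // Assumption: a confirmed exploit takes precedence over an expired deadline.
  if (b.confirmations >= b.quorum) {
    const toAttacker = (b.pool * b.payoutPercent) / 100;
    // What happens to the remainder is not specified here, so it is only reported.
    return { kind: "payout", toAttacker: toAttacker, remainingInEscrow: b.pool - toAttacker };
  }
  // No confirmed exploit and the deadline has passed: full refund.
  if (now >= b.deadline) return { kind: "refund", toDeveloper: b.pool };
  return { kind: "pending" };
}

// Example: a High-severity exploit confirmed by 3 verifiers before the deadline.
const paid = settle(
  { pool: 10, deadline: 1000, confirmations: 3, quorum: 3, payoutPercent: 60 },
  500
);
```

A real contract would work in integer wei and would not trust an off-chain caller for `now` or `confirmations`; the sketch only shows the branching.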
ExploitArena adapts the CVSS v4.0 Base metrics for on-chain vulnerability assessment. Verifier agents evaluate each exploit across these dimensions:
| Metric | What It Measures | Smart Contract Context |
|---|---|---|
| Attack Vector | How the exploit is delivered | Network (external call), Adjacent (cross-contract), Local (owner-only) |
| Attack Complexity | Conditions needed beyond attacker's control | Low (direct call) vs High (requires specific state/timing) |
| Privileges Required | Auth level needed | None (any EOA), Low (token holder), High (admin/owner) |
| User Interaction | Does a victim need to act? | None (fully autonomous) vs Required (phishing/social) |
| Impact: Confidentiality | Data exposure | Storage reads, private data leakage |
| Impact: Integrity | State corruption | Unauthorized state changes, balance manipulation |
| Impact: Availability | Service disruption | DoS, gas griefing, bricking the contract |
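As an illustration of the first four rows, the mapping from smart-contract context to metric values could be written as a lookup. This is a sketch under assumptions: the `Finding` shape and the function name `toMetrics` are hypothetical, and the metric set is the simplified subset from the table, not a complete CVSS v4.0 vector (the full standard also defines Attack Requirements and separate subsequent-system impact metrics).

```typescript
// Metric values as they appear in the table above (simplified subset).
type AttackVector = "Network" | "Adjacent" | "Local";
type Privileges = "None" | "Low" | "High";

// Hypothetical description of a finding in smart-contract terms.
interface Finding {
  entryPoint: "externalCall" | "crossContract" | "ownerOnly";
  caller: "anyEOA" | "tokenHolder" | "admin";
  needsSpecificStateOrTiming: boolean;
  needsVictimAction: boolean;
}

interface ExploitabilityMetrics {
  attackVector: AttackVector;
  attackComplexity: "Low" | "High";
  privilegesRequired: Privileges;
  userInteraction: "None" | "Required";
}

const VECTOR_BY_ENTRY: Record<Finding["entryPoint"], AttackVector> = {
  externalCall: "Network",
  crossContract: "Adjacent",
  ownerOnly: "Local",
};

const PRIVILEGES_BY_CALLER: Record<Finding["caller"], Privileges> = {
  anyEOA: "None",
  tokenHolder: "Low",
  admin: "High",
};

function toMetrics(f: Finding): ExploitabilityMetrics {
  return {
    attackVector: VECTOR_BY_ENTRY[f.entryPoint],
    attackComplexity: f.needsSpecificStateOrTiming ? "High" : "Low",
    privilegesRequired: PRIVILEGES_BY_CALLER[f.caller],
    userInteraction: f.needsVictimAction ? "Required" : "None",
  };
}

// A reentrancy bug reachable by any account through a direct call.
const reentrancyMetrics = toMetrics({
  entryPoint: "externalCall",
  caller: "anyEOA",
  needsSpecificStateOrTiming: false,
  needsVictimAction: false,
});
```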
The resulting 0.0–10.0 score maps to a severity rating and a bounty multiplier:
| CVSS Score | Severity | Bounty Payout |
|---|---|---|
| 9.0 – 10.0 | Critical | 100% of bounty pool |
| 7.0 – 8.9 | High | 60% |
| 4.0 – 6.9 | Medium | 30% |
| 0.1 – 3.9 | Low | 10% |
This is the same severity scale used by the National Vulnerability Database. Traditional cybersecurity standards, enforced trustlessly on-chain.
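The score-to-payout table translates directly into a lookup. The sketch below is one possible reading, not the code in `cvss.ts`: the names `classify` and `payoutEth` are hypothetical, and treating a score of 0.0 as "no payout" is an assumption, since the table starts at 0.1.

```typescript
type Severity = "None" | "Low" | "Medium" | "High" | "Critical";

interface Band {
  min: number;           // inclusive lower bound of the CVSS band
  severity: Severity;
  payoutPercent: number; // share of the bounty pool paid out
}

// Bands copied from the table, ordered highest first so the first match wins.
const BANDS: Band[] = [
  { min: 9.0, severity: "Critical", payoutPercent: 100 },
  { min: 7.0, severity: "High", payoutPercent: 60 },
  { min: 4.0, severity: "Medium", payoutPercent: 30 },
  { min: 0.1, severity: "Low", payoutPercent: 10 },
];

function classify(score: number): Band {
  if (!isFinite(score) || score < 0 || score > 10) {
    throw new RangeError("CVSS score must be between 0.0 and 10.0");
  }
  // CVSS scores carry one decimal place, so round before comparing to band edges.
  const rounded = Math.round(score * 10) / 10;
  for (const band of BANDS) {
    if (rounded >= band.min) return band;
  }
  // Assumption: 0.0 means no severity and no payout.
  return { min: 0, severity: "None", payoutPercent: 0 };
}

// Amounts are plain ETH numbers for readability; on-chain code would use integer wei.
function payoutEth(poolEth: number, score: number): number {
  return (poolEth * classify(score).payoutPercent) / 100;
}

// Example: a 10 ETH pool and a CVSS 7.5 finding pays 60% of the pool.
const highPayout = payoutEth(10, 7.5);
```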
A single AI can hallucinate. It can generate an exploit that looks valid in text but fails on a real EVM. You can't disburse real funds based on one model's opinion.
ExploitArena's verifier pool provides adversarial independence:
- Isolated sandboxes. Each verifier runs in its own E2B cloud sandbox — separate VM, no shared memory, no shared state.
- Real execution required. Agents have shell access and must actually compile, run, and observe exploit output — no theoretical reasoning accepted.
- Supermajority consensus. Bounty payouts require 3-of-5 verifiers to independently confirm.
- On-chain audit trail. Every vote is recorded on-chain — immutable, transparent, auditable.
This mirrors how real security audit firms operate: independent reviewers, peer-reviewed findings, signed attestations. Except it runs in minutes, not weeks, and the incentives are enforced by code, not legal agreements.
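A rough sketch of how independent votes might be tallied against a quorum follows. The `Vote` shape and `tallyVotes` are hypothetical names, and two details are assumptions rather than documented behavior: repeat votes from the same verifier are ignored, and the reported score is the mean over confirming votes, rounded to one decimal.

```typescript
interface Vote {
  verifier: string;   // verifier address
  confirmed: boolean; // did the exploit reproduce in this verifier's sandbox?
  cvss: number;       // score this verifier computed, 0.0 to 10.0
}

interface Tally {
  confirmations: number;
  quorumReached: boolean;
  averageCvss: number | null; // null when no verifier confirmed
}

function tallyVotes(votes: Vote[], quorum: number): Tally {
  const seen: { [verifier: string]: boolean } = {};
  const scores: number[] = [];
  for (const vote of votes) {
    const key = vote.verifier.toLowerCase();
    if (seen[key]) continue; // assumption: one vote per verifier, first one counts
    seen[key] = true;
    if (vote.confirmed) scores.push(vote.cvss);
  }
  const sum = scores.reduce((acc, s) => acc + s, 0);
  return {
    confirmations: scores.length,
    quorumReached: scores.length >= quorum,
    averageCvss:
      scores.length > 0 ? Math.round((sum / scores.length) * 10) / 10 : null,
  };
}

// Example: three verifiers confirm with the same score, quorum of 3.
const demoTally = tallyVotes(
  [
    { verifier: "0xA1", confirmed: true, cvss: 9.3 },
    { verifier: "0xB2", confirmed: true, cvss: 9.3 },
    { verifier: "0xC3", confirmed: true, cvss: 9.3 },
  ],
  3
);
```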
The demo showcases the full end-to-end flow against a deliberately vulnerable contract:
A VulnerableVault contract sends ETH before updating its internal balance — the classic reentrancy bug that caused the DAO hack.
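To make the bug concrete, here is a small TypeScript model of the pattern. It is not the Solidity in `demos/`: the `Vault` class, the callback standing in for the external ETH transfer, and the account names are all hypothetical. The only thing it aims to show is why ordering matters: when the balance is zeroed after the external call, the receiver can re-enter `withdraw` and be paid again.

```typescript
// The callback stands in for the recipient's fallback function.
type Receiver = (amount: number) => void;

class Vault {
  private balances: { [account: string]: number } = {};
  private effectsFirst: boolean;
  pool = 0; // total ETH held by the vault

  constructor(effectsFirst: boolean) {
    this.effectsFirst = effectsFirst;
  }

  deposit(account: string, amount: number): void {
    this.balances[account] = (this.balances[account] || 0) + amount;
    this.pool += amount;
  }

  withdraw(account: string, receive: Receiver): void {
    const amount = this.balances[account] || 0;
    if (amount === 0 || this.pool < amount) return;
    if (this.effectsFirst) this.balances[account] = 0;  // fixed: update state first
    this.pool -= amount;
    receive(amount);                                    // external call
    if (!this.effectsFirst) this.balances[account] = 0; // bug: state updated too late
  }
}

// Attacker deposits 1 ETH next to a victim's 9 ETH, then re-enters on every payout.
function attack(vault: Vault): number {
  let stolen = 0;
  vault.deposit("victim", 9);
  vault.deposit("attacker", 1);
  const onReceive: Receiver = (amount) => {
    stolen += amount;
    vault.withdraw("attacker", onReceive); // re-enter before the balance is zeroed
  };
  vault.withdraw("attacker", onReceive);
  return stolen;
}

const drainedFromVulnerable = attack(new Vault(false)); // attacker ends up with the whole pool
const drainedFromFixed = attack(new Vault(true));       // attacker only gets their own deposit
```

Updating state before making the external call is the checks-effects-interactions ordering; the `effectsFirst` flag switches between the two orderings so the difference is easy to observe.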
$ arena demo
⚡ ExploitArena — Full On-Chain Demo
RPC: http://127.0.0.1:8545
✔ BountyEscrow deployed at 0x5FbDB...
✔ 3 verifiers authorized on-chain
✔ Bounty #0 created — 10 ETH escrowed
On-chain status: Open
Escrowed: 10 ETH
Running AI pipeline: attack → verify → auto-resolve
──────────────────────────────────────────────────
✔ Exploit found: Reentrancy in withdraw() (Critical)
Description: withdraw() sends ETH before zeroing balance...
Attack Steps:
1. Deploy attacker contract
2. Call deposit() then withdraw() to trigger reentrant call
─── Verifications ───
CONFIRMED — CVSS 9.3 (Critical)
CONFIRMED — CVSS 9.3 (Critical)
CONFIRMED — CVSS 9.3 (Critical)
─── On-chain Result ───
Status: Resolved
Exploit count: 1
Exploit #0 status: Confirmed
Avg CVSS: 9.3
✓ BOUNTY RESOLVED — exploit confirmed on-chain
Attacker's withdrawable balance: 10.0 ETH
| Step | What It Shows |
|---|---|
| Bounty submission | Developer-facing UX: deposit ETH + contract + deadline |
| Attacker agent in sandbox | AI explores codebase, writes exploit, tests it in a real sandbox |
| Sandboxed verification | Each verifier independently reproduces the exploit in its own sandbox |
| CVSS scoring | Industry-standard severity assessment, computed by each verifier |
| Auto-resolution | Contract resolves automatically when quorum is reached — no admin call needed |
| Pull-based withdrawal | Attacker can withdraw earned payout at any time |
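The pull-based withdrawal in the last row can be sketched as a ledger that records what is owed at resolution time and pays only when the recipient asks. This is an illustrative TypeScript model with hypothetical names (`PayoutLedger`, `credit`, `withdraw`), not the escrow contract's actual interface.

```typescript
class PayoutLedger {
  private owed: { [account: string]: number } = {};

  // Called on resolution: record the payout, transfer nothing yet.
  credit(account: string, amount: number): void {
    if (amount <= 0) throw new RangeError("amount must be positive");
    this.owed[account] = (this.owed[account] || 0) + amount;
  }

  withdrawable(account: string): number {
    return this.owed[account] || 0;
  }

  // Called by the recipient: zero the entry first, then hand over the funds,
  // so a re-entrant call during `send` finds nothing left to withdraw.
  withdraw(account: string, send: (amount: number) => void): number {
    const amount = this.withdrawable(account);
    if (amount === 0) return 0;
    this.owed[account] = 0;
    send(amount);
    return amount;
  }
}

// Example: a 10 ETH payout is credited, then withdrawn later by the attacker.
const ledger = new PayoutLedger();
ledger.credit("attacker", 10);
const beforeWithdraw = ledger.withdrawable("attacker");
const firstWithdraw = ledger.withdraw("attacker", () => {});
const secondWithdraw = ledger.withdraw("attacker", () => {});
```

Separating "credit" from "withdraw" means resolution never depends on the recipient being able to receive funds at that moment.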
exploit-arena/
├── apps/
│ └── web/ # Next.js frontend + API routes
│ ├── app/ # Pages: bounties, leaderboard, submit
│ │ └── api/ # REST endpoints for bounties, scan, leaderboard, pipeline
│ ├── components/ # Header, wallet connect, theme toggle (shadcn/ui)
│ └── lib/ # Wagmi config, scan store, utilities
├── packages/
│ ├── agents/ # AI agent system
│ │ ├── attacker/ # LLM agent with sandbox tools
│ │ ├── verifier/ # Independent verification agent
│ │ ├── sandbox/ # E2B sandbox management + tool factory
│ │ ├── orchestrator/ # Pipeline: attack → verify → commit on-chain
│ │ ├── chain.ts # Viem helpers for all on-chain operations
│ │ ├── provider.ts # OpenAI-compatible LLM provider
│ │ └── SKILLS.md # Agent workflow instructions
│ ├── cli/ # arena demo / scan / submit / status
│ ├── contracts/ # Solidity: BountyEscrow + demo contracts
│ │ ├── BountyEscrow.sol # Multi-exploit escrow with auto-resolution
│ │ └── demos/ # VulnerableVault, ReentrancyAttacker
│ ├── mcp/ # MCP server for external AI agents
│ │ └── index.ts # MCP tools: read chain state, manage sandboxes
│ └── shared/ # Types, ABI, CVSS v4.0 scoring
│ └── src/
│ ├── types.ts # On-chain struct mirrors & view types
│ ├── abi.ts # Contract ABI exports
│ └── cvss.ts # CVSS v4.0 scoring utilities
├── docker-compose.yml # Local dev services (web, hardhat, mcp)
├── turbo.json # Turbo build pipeline
└── pnpm-workspace.yaml # pnpm monorepo config
- Node.js ≥ 18
- pnpm ≥ 9
- An E2B API key (for cloud sandboxes)
- An OpenAI API key (or any OpenAI-compatible endpoint)
git clone https://github.com/your-team/exploit-arena && cd exploit-arena
# Install dependencies
pnpm install
# Configure environment
cp .env.example .env
# Required: E2B_API_KEY, OPENAI_API_KEY
# Optional: OPENAI_BASE_URL, OPENAI_MODEL
# Chain: NEXT_PUBLIC_CHAIN=hardhat (default) or sepolia
# Contract: NEXT_PUBLIC_ESCROW_ADDRESS (set after deploying)
# Pipeline: ATTACKER_PRIVATE_KEY, VERIFIER_PRIVATE_KEYS (comma-separated)
# MCP: CHAIN, ESCROW_ADDRESS, RPC_URL
# Build all packages
pnpm build

# Terminal 1: Start local Hardhat node
cd packages/contracts && pnpm node
# Terminal 2: Run the demo
pnpm --filter @exploit-arena/cli exec arena demo

# Scan a Solidity file (requires a deployed escrow and an active bounty)
pnpm --filter @exploit-arena/cli exec arena scan \
--escrow 0xYOUR_ESCROW_ADDRESS \
--bounty-id 0 \
--source path/to/Contract.sol
# Scan with more agents and custom keys
pnpm --filter @exploit-arena/cli exec arena scan \
--escrow 0xYOUR_ESCROW_ADDRESS \
--bounty-id 0 \
--source path/to/Contract.sol \
--attackers 3 --verifiers 5 --quorum 3

# Start the MCP server (SSE mode)
pnpm --filter @exploit-arena/mcp start
# Or stdio mode for direct integration
pnpm --filter @exploit-arena/mcp start -- --stdio

pnpm --filter @exploit-arena/web dev
# Opens at http://localhost:3000

# Start all services: web, hardhat node, MCP server
docker compose up
# web → http://localhost:3000
# hardhat → http://localhost:8545
# mcp → http://localhost:3001/sse

Open WebUI is included in docker-compose.yml as openwebui.
# Start Open WebUI with the existing stack
docker compose up -d openwebui mcp node web
# Open WebUI
# http://localhost:3002

Optional (recommended) in your root .env:
# Use a long random value in real setups
OPENWEBUI_SECRET_KEY=replace-with-a-long-random-secret

Open WebUI supports MCP Streamable HTTP (v0.6.31+).
- Open Admin Settings in Open WebUI.
- Go to Tools and add a new tool connection.
- Set Type to MCP (Streamable HTTP) (not OpenAPI).
- Set the MCP URL:
  - MCP running on host (dev server via pnpm dev) + Open WebUI in Docker: http://host.docker.internal:3001/sse
  - Both running on host outside Docker: http://localhost:3001/sse
  - MCP also containerised in the same compose network: http://mcp:3001/sse
- Save, then test the connection by listing available tools (for example list_bounties).
- In Open WebUI, open Admin Settings.
- Go to Connections > OpenAI > Manage.
- Add New Connection, then choose Standard / Compatible.
- Set:
  - API URL: your endpoint with /v1 (example: http://host.docker.internal:8000/v1)
  - API Key: your provider key (or none if your endpoint does not require auth)
  - Optional model filter: restrict visible model IDs.
- Save and pick the model in chat.
Notes:
- If your inference server runs on the host machine and Open WebUI runs in Docker, use http://host.docker.internal:PORT/v1.
- If your inference server is another Compose service, use http://<service-name>:PORT/v1.
- Keep tool Type as MCP for MCP servers. Using OpenAPI type for MCP can fail or hang.
Every agent (attacker and verifier) runs inside an isolated E2B cloud sandbox — a full Linux VM with shell, Node.js, Python, and git. Agents interact with their sandbox through three tools:
| Tool | Description |
|---|---|
| shell | Execute any shell command (ls, npm install, compile, test, run scripts) |
| read_file | Read source files from the repo or sandbox |
| write_file | Create exploit code, test files, configs |
All on-chain submissions (exploit hashes, verification votes, CVSS scores) are committed directly by the orchestrator pipeline — agents focus purely on analysis and reproduction.
The LLM drives the agent autonomously for up to 30 steps (attackers) or 25 steps (verifiers), using the tools to explore, analyze, write code, and execute it — with the strict requirement that exploits must be tested and verified in the sandbox before submission.
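The step budget and the "test before submit" rule can be sketched as a bounded loop. Everything here is a hypothetical model rather than the code in `packages/agents`: the `Agent` interface stands in for the LLM, the tools are plain synchronous functions, and the `EXPLOIT_OK` marker used to detect a successful run is an invented convention for this example only.

```typescript
type ToolName = "shell" | "read_file" | "write_file";

interface ToolCall {
  tool: ToolName;
  input: string;
}

type Decision =
  | { kind: "call"; call: ToolCall }
  | { kind: "submit"; exploit: string };

// Stand-in for the LLM: given the transcript so far, choose the next action.
interface Agent {
  next(transcript: string[]): Decision;
}

type Tools = Record<ToolName, (input: string) => string>;

interface RunResult {
  submitted: string | null; // null when the step budget ran out
  steps: number;
}

function runAgent(agent: Agent, tools: Tools, maxSteps: number): RunResult {
  const transcript: string[] = [];
  let exploitRanOk = false;
  for (let step = 1; step <= maxSteps; step++) {
    const decision = agent.next(transcript);
    if (decision.kind === "submit") {
      // Submissions are refused until the exploit has succeeded in the sandbox.
      if (exploitRanOk) return { submitted: decision.exploit, steps: step };
      transcript.push("rejected: exploit has not succeeded in the sandbox yet");
      continue;
    }
    const output = tools[decision.call.tool](decision.call.input);
    transcript.push(decision.call.tool + ": " + output);
    // Assumption: a successful run prints this marker; the real signal may differ.
    if (decision.call.tool === "shell" && output.indexOf("EXPLOIT_OK") !== -1) {
      exploitRanOk = true;
    }
  }
  return { submitted: null, steps: maxSteps };
}
```

Calling `runAgent(agent, tools, 30)` for attackers and `runAgent(agent, tools, 25)` for verifiers would mirror the limits above; a rejected submission still consumes a step in this sketch.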
See packages/agents/SKILLS.md for the full agent workflow specification.
| Layer | Technology |
|---|---|
| Smart Contracts | Solidity, Hardhat |
| Local Blockchain | Hardhat Network |
| Agent Framework | TypeScript, Vercel AI SDK |
| LLM | OpenAI (or any OpenAI-compatible endpoint) |
| Sandbox Isolation | E2B cloud sandboxes |
| MCP Server | Model Context Protocol (SSE + stdio) |
| Frontend | Next.js, Tailwind CSS v4, shadcn/ui |
| Wallet | Wagmi v2 (Hardhat + Sepolia support via NEXT_PUBLIC_CHAIN) |
| Monorepo | pnpm workspaces, Turborepo |
ExploitArena is designed for authorized security research only. Only submit contracts you own or have explicit permission to test. All demo scenarios use contracts deployed on local forks and testnets with no real funds at risk.
MIT — see LICENSE
Built at KJSSE GajShield Hack X · April 2026 · Mumbai