WhyAsh5114/exploit-arena

⚡ ExploitArena

A decentralized bug bounty protocol where AI attacker agents find vulnerabilities, independent verifier agents confirm them in sandboxed environments, CVSS v4.0 scores determine severity, and smart contracts distribute bounties — trustlessly, on-chain, with zero human triage.

License: MIT


The Problem

Bug bounties are broken. $100M+ sits in bounty pools across platforms like Immunefi and HackerOne, yet:

  • Manual triage is the bottleneck. Every submission needs a human security expert to validate — expensive, slow, and unscalable.
  • Smart contracts can't wait. They're immutable, transparent, and hold real assets. The Ronin Bridge hack drained about $625M in two transactions. The DAO hack: about $60M, through a reentrancy flaw of the kind a thorough audit should catch.
  • No standard severity scoring on-chain. Traditional cybersecurity has CVSS — an industry-standard 0–10 severity scoring framework used by NIST, CERT, and every major security team. Web3 bounty platforms have... vibes.

ExploitArena brings the rigor of traditional cybersecurity — automated red-teaming, sandboxed verification, CVSS scoring — to a trustless, on-chain bounty protocol.


How It Works

Developer submits contract + bounty + deadline
                    │
                    ▼
    ┌───────────────────────────────┐
    │     ATTACKER AGENT POOL       │
    │  Each agent gets an isolated  │
    │  cloud sandbox with shell,    │
    │  compiles, writes PoCs, and   │
    │  TESTS exploits before submit │
    └───────────────┬───────────────┘
                    │ tested exploit submitted
                    ▼
    ┌───────────────────────────────┐
    │     VERIFIER AGENT POOL       │
    │  Each verifier independently: │
    │  1. Gets own isolated sandbox │
    │  2. Reproduces the exploit    │
    │  3. Measures actual impact    │
    │  4. Computes CVSS v4.0 score  │
    │  5. Casts on-chain vote       │
    └───────────────┬───────────────┘
                    │ consensus + CVSS score
                    ▼
    ┌───────────────────────────────┐
    │     BOUNTY ESCROW CONTRACT    │
    │  If supermajority confirms:   │
    │  → payout scaled to CVSS      │
    │  If deadline expires w/o      │
    │    valid exploit:             │
    │  → funds returned to dev      │
    └───────────────────────────────┘

The Flow

  1. A developer deploys a bounty. They submit their smart contract (by GitHub URL or source), deposit a bounty amount in ETH, and set a deadline. The bounty is locked in an on-chain escrow contract.

  2. Attacker agents compete in sandboxes. A pool of AI agents — each running inside an isolated E2B cloud sandbox with full shell, Node.js, Python, and git — independently analyze the codebase. Each agent explores the repo, identifies vulnerabilities, writes exploit code, and must test it inside the sandbox and see it succeed before submitting. No theoretical submissions are accepted.

  3. Verifier agents reproduce in sandboxes. Each submitted exploit is independently verified by multiple agents, each running in its own isolated E2B sandbox with the target repo cloned and the exploit pre-loaded. Verifiers have no shared memory or state. They attempt to reproduce the exploit, measure concrete impact (funds drained, state corrupted, access escalated), and compute a CVSS v4.0 score.

  4. CVSS v4.0 scoring. Verifiers don't just say "yes/no." They compute a CVSS v4.0 severity score (the same framework used by NIST's National Vulnerability Database, CERT, and enterprise security teams worldwide). The score considers attack vector, complexity, privileges required, and impact on confidentiality, integrity, and availability.

  5. On-chain consensus and payout. If a supermajority of verifiers (default: 3-of-5) confirms the exploit is valid and reproducible, the bounty escrow automatically disburses a payout proportional to the CVSS severity. If the deadline expires with no confirmed exploit, the developer gets their full bounty back.
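The settlement rule in step 5 can be sketched as a small pure function. This is a hypothetical model, not the BountyEscrow contract: the names `BountyState`, `settle`, and `confirmedPayoutPct` are illustrative, amounts are plain ETH numbers rather than wei, and it assumes a confirmed exploit always takes precedence over the deadline check.

```typescript
// Hypothetical model of the escrow outcome; not the BountyEscrow ABI.
type Settlement =
  | { kind: "pending" }
  | { kind: "payout"; toAttackerEth: number }
  | { kind: "refund"; toDeveloperEth: number };

interface BountyState {
  escrowedEth: number;               // amount locked by the developer
  deadline: number;                  // unix timestamp in seconds
  confirmedPayoutPct: number | null; // payout % of a confirmed exploit, null if none
}

function settle(bounty: BountyState, nowSeconds: number): Settlement {
  // A confirmed exploit pays out a share of the pool (assumed to win over the deadline).
  if (bounty.confirmedPayoutPct !== null) {
    return {
      kind: "payout",
      toAttackerEth: (bounty.escrowedEth * bounty.confirmedPayoutPct) / 100,
    };
  }
  // No confirmed exploit and the deadline has passed: full refund to the developer.
  if (nowSeconds >= bounty.deadline) {
    return { kind: "refund", toDeveloperEth: bounty.escrowedEth };
  }
  // Otherwise the bounty stays open.
  return { kind: "pending" };
}
```

A real contract would track amounts in wei and enforce these branches on-chain; the sketch only shows the decision order.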


CVSS Scoring On-Chain

ExploitArena adapts the CVSS v4.0 Base metrics for on-chain vulnerability assessment. Verifier agents evaluate each exploit across these dimensions:

| Metric | What It Measures | Smart Contract Context |
|---|---|---|
| Attack Vector | How the exploit is delivered | Network (external call), Adjacent (cross-contract), Local (owner-only) |
| Attack Complexity | Conditions needed beyond attacker's control | Low (direct call) vs High (requires specific state/timing) |
| Privileges Required | Auth level needed | None (any EOA), Low (token holder), High (admin/owner) |
| User Interaction | Does a victim need to act? | None (fully autonomous) vs Required (phishing/social) |
| Impact: Confidentiality | Data exposure | Storage reads, private data leakage |
| Impact: Integrity | State corruption | Unauthorized state changes, balance manipulation |
| Impact: Availability | Service disruption | DoS, gas griefing, bricking the contract |
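For reference, the CVSS v4.0 specification serializes a Base assessment as a vector string with eleven metrics, including Attack Requirements (AT) and separate Subsequent System impacts (SC/SI/SA) that the table above does not list. The sketch below shows one way a verifier's assessment might be carried as a typed value and serialized. The type name, helper name, and example values are assumptions for illustration, not taken from `cvss.ts`.

```typescript
// Hypothetical typed representation of CVSS v4.0 Base metrics.
type Level = "H" | "L" | "N";

interface CvssV4Base {
  AV: "N" | "A" | "L" | "P"; // Attack Vector
  AC: "L" | "H";             // Attack Complexity
  AT: "N" | "P";             // Attack Requirements
  PR: "N" | "L" | "H";       // Privileges Required
  UI: "N" | "P" | "A";       // User Interaction
  VC: Level; VI: Level; VA: Level; // Vulnerable System impact
  SC: Level; SI: Level; SA: Level; // Subsequent System impact
}

// Metric order as it appears in a v4.0 vector string.
const ORDER = ["AV", "AC", "AT", "PR", "UI", "VC", "VI", "VA", "SC", "SI", "SA"] as const;

function toVector(m: CvssV4Base): string {
  return "CVSS:4.0/" + ORDER.map((k) => `${k}:${m[k]}`).join("/");
}

// Illustrative values only; not the scoring used in the demo.
const example: CvssV4Base = {
  AV: "N", AC: "L", AT: "N", PR: "N", UI: "N",
  VC: "N", VI: "H", VA: "H",
  SC: "N", SI: "N", SA: "N",
};

const vector = toVector(example);
```

Turning a vector into a numeric score requires the lookup procedure defined in the specification, which this sketch does not attempt.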

The resulting 0.0–10.0 score maps to a severity rating and a bounty multiplier:

| CVSS Score | Severity | Bounty Payout |
|---|---|---|
| 9.0 – 10.0 | Critical | 100% of bounty pool |
| 7.0 – 8.9 | High | 60% |
| 4.0 – 6.9 | Medium | 30% |
| 0.1 – 3.9 | Low | 10% |

This is the same severity scale used by the National Vulnerability Database. Traditional cybersecurity standards, enforced trustlessly on-chain.
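The score-to-payout mapping above is simple enough to sketch directly. The function names are illustrative, and treating a score of 0.0 as "None" with a 0% payout is an assumption, since the table starts at 0.1.

```typescript
type Severity = "None" | "Low" | "Medium" | "High" | "Critical";

// Map a 0.0-10.0 score to the qualitative severity bands in the table.
function severityOf(score: number): Severity {
  if (!Number.isFinite(score) || score < 0 || score > 10) {
    throw new RangeError(`CVSS score out of range: ${score}`);
  }
  const s = Math.round(score * 10) / 10; // CVSS scores carry one decimal place
  if (s >= 9.0) return "Critical";
  if (s >= 7.0) return "High";
  if (s >= 4.0) return "Medium";
  if (s >= 0.1) return "Low";
  return "None";
}

// Payout percentages from the table; "None" -> 0 is an assumption.
const PAYOUT_PCT: Record<Severity, number> = {
  None: 0,
  Low: 10,
  Medium: 30,
  High: 60,
  Critical: 100,
};

function payoutPercent(score: number): number {
  return PAYOUT_PCT[severityOf(score)];
}
```

With the demo's score of 9.3 on a 10 ETH bounty, `payoutPercent(9.3)` gives 100, which matches the 10.0 ETH withdrawable balance shown in the demo output.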


Why Multi-Agent Verification?

A single AI can hallucinate. It can generate an exploit that looks valid in text but fails on a real EVM. You can't disburse real funds based on one model's opinion.

ExploitArena's verifier pool provides adversarial independence:

  • Isolated sandboxes. Each verifier runs in its own E2B cloud sandbox — separate VM, no shared memory, no shared state.
  • Real execution required. Agents have shell access and must actually compile, run, and observe exploit output — no theoretical reasoning accepted.
  • Supermajority consensus. Bounty payouts require 3-of-5 verifiers to independently confirm.
  • On-chain audit trail. Every vote is recorded on-chain — immutable, transparent, auditable.

This mirrors how real security audit firms operate: independent reviewers, peer-reviewed findings, signed attestations. Except it runs in minutes, not weeks, and the incentives are enforced by code, not legal agreements.
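A minimal sketch of the quorum check, assuming one vote per verifier and a CVSS average taken over confirming votes only (the demo output reports an "Avg CVSS", but how it is computed is not stated here). `Vote` and `tally` are hypothetical names.

```typescript
interface Vote {
  verifier: string;   // verifier address or ID
  confirmed: boolean; // did the exploit reproduce?
  cvss: number;       // score computed by this verifier
}

interface TallyResult {
  confirmed: boolean;
  confirmations: number;
  avgCvss: number | null;
}

function tally(votes: Vote[], quorum: number): TallyResult {
  const seen: Record<string, boolean> = {};
  const scores: number[] = [];
  for (const v of votes) {
    if (seen[v.verifier]) continue; // ignore repeat votes from the same verifier
    seen[v.verifier] = true;
    if (v.confirmed) scores.push(v.cvss);
  }
  if (scores.length < quorum) {
    return { confirmed: false, confirmations: scores.length, avgCvss: null };
  }
  const avg = scores.reduce((a, b) => a + b, 0) / scores.length;
  // Round to one decimal place, the precision CVSS scores use.
  return { confirmed: true, confirmations: scores.length, avgCvss: Math.round(avg * 10) / 10 };
}
```

Deduplicating by verifier keeps a single agent from reaching quorum alone; on-chain, the same effect would come from authorizing verifier addresses and rejecting second votes.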


Demo

The demo showcases the full end-to-end flow against a deliberately vulnerable contract:

Scenario: Reentrancy Exploit

A VulnerableVault contract sends ETH before updating its internal balance — the classic reentrancy bug that caused the DAO hack.
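The bug can be illustrated without an EVM. The TypeScript model below is a simulation under stated assumptions, not the Solidity source of VulnerableVault: the "external call" is a callback, balances are plain numbers, and all names are illustrative.

```typescript
// Callback standing in for the recipient's receive/fallback function.
type Receiver = (amount: number) => void;

class VulnerableVault {
  private balances: Record<string, number> = {};
  total = 0; // ETH currently held by the vault

  deposit(who: string, amount: number): void {
    this.balances[who] = (this.balances[who] || 0) + amount;
    this.total += amount;
  }

  withdraw(who: string, receive: Receiver): void {
    const amount = this.balances[who] || 0;
    if (amount === 0 || this.total < amount) return;
    this.total -= amount;   // ETH leaves the vault...
    receive(amount);        // ...and control passes to the recipient
    this.balances[who] = 0; // BUG: balance is zeroed only after the external call
  }
}

const vault = new VulnerableVault();
vault.deposit("victim", 9);
vault.deposit("attacker", 1);

let loot = 0;
const reenter: Receiver = (amount) => {
  loot += amount;
  // The attacker's balance has not been zeroed yet, so withdraw again.
  if (vault.total >= amount) vault.withdraw("attacker", reenter);
};
vault.withdraw("attacker", reenter);
// At this point loot is 10 and vault.total is 0: a 1 ETH deposit drained the pool.
```

Moving `this.balances[who] = 0` above the `receive(amount)` call (checks-effects-interactions) stops the loop after the first transfer.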

$ arena demo

⚡ ExploitArena — Full On-Chain Demo

  RPC: http://127.0.0.1:8545

✔ BountyEscrow deployed at 0x5FbDB...
✔ 3 verifiers authorized on-chain
✔ Bounty #0 created — 10 ETH escrowed
  On-chain status: Open
  Escrowed: 10 ETH

Running AI pipeline: attack → verify → auto-resolve
──────────────────────────────────────────────────
✔ Exploit found: Reentrancy in withdraw() (Critical)

  Description: withdraw() sends ETH before zeroing balance...
  Attack Steps:
    1. Deploy attacker contract
    2. Call deposit() then withdraw() to trigger reentrant call

─── Verifications ───
  CONFIRMED — CVSS 9.3 (Critical)
  CONFIRMED — CVSS 9.3 (Critical)
  CONFIRMED — CVSS 9.3 (Critical)

─── On-chain Result ───
  Status: Resolved
  Exploit count: 1
  Exploit #0 status: Confirmed
  Avg CVSS: 9.3

✓ BOUNTY RESOLVED — exploit confirmed on-chain
  Attacker's withdrawable balance: 10.0 ETH

What the Demo Proves

| Step | What It Shows |
|---|---|
| Bounty submission | Developer-facing UX: deposit ETH + contract + deadline |
| Attacker agent in sandbox | AI explores codebase, writes exploit, tests it in a real sandbox |
| Sandboxed verification | Each verifier independently reproduces the exploit in its own sandbox |
| CVSS scoring | Industry-standard severity assessment, computed by each verifier |
| Auto-resolution | Contract resolves automatically when quorum is reached; no admin call needed |
| Pull-based withdrawal | Attacker can withdraw earned payout at any time |

Project Structure

exploit-arena/
├── apps/
│   └── web/                     # Next.js frontend + API routes
│       ├── app/                 # Pages: bounties, leaderboard, submit
│       │   └── api/             # REST endpoints for bounties, scan, leaderboard, pipeline
│       ├── components/          # Header, wallet connect, theme toggle (shadcn/ui)
│       └── lib/                 # Wagmi config, scan store, utilities
├── packages/
│   ├── agents/                  # AI agent system
│   │   ├── attacker/            # LLM agent with sandbox tools
│   │   ├── verifier/            # Independent verification agent
│   │   ├── sandbox/             # E2B sandbox management + tool factory
│   │   ├── orchestrator/        # Pipeline: attack → verify → commit on-chain
│   │   ├── chain.ts             # Viem helpers for all on-chain operations
│   │   ├── provider.ts          # OpenAI-compatible LLM provider
│   │   └── SKILLS.md            # Agent workflow instructions
│   ├── cli/                     # arena demo / scan / submit / status
│   ├── contracts/               # Solidity: BountyEscrow + demo contracts
│   │   ├── BountyEscrow.sol     # Multi-exploit escrow with auto-resolution
│   │   └── demos/               # VulnerableVault, ReentrancyAttacker
│   ├── mcp/                     # MCP server for external AI agents
│   │   └── index.ts             # MCP tools: read chain state, manage sandboxes
│   └── shared/                  # Types, ABI, CVSS v4.0 scoring
│       └── src/
│           ├── types.ts         # On-chain struct mirrors & view types
│           ├── abi.ts           # Contract ABI exports
│           └── cvss.ts          # CVSS v4.0 scoring utilities
├── docker-compose.yml           # Local dev services (web, hardhat, mcp)
├── turbo.json                   # Turbo build pipeline
└── pnpm-workspace.yaml          # pnpm monorepo config

Quick Start

Prerequisites

  • Node.js ≥ 18
  • pnpm ≥ 9
  • An E2B API key (for cloud sandboxes)
  • An OpenAI API key (or any OpenAI-compatible endpoint)

Setup

git clone https://github.com/your-team/exploit-arena && cd exploit-arena

# Install dependencies
pnpm install

# Configure environment
cp .env.example .env
# Required: E2B_API_KEY, OPENAI_API_KEY
# Optional: OPENAI_BASE_URL, OPENAI_MODEL
# Chain:    NEXT_PUBLIC_CHAIN=hardhat (default) or sepolia
# Contract: NEXT_PUBLIC_ESCROW_ADDRESS (set after deploying)
# Pipeline: ATTACKER_PRIVATE_KEY, VERIFIER_PRIVATE_KEYS (comma-separated)
# MCP:      CHAIN, ESCROW_ADDRESS, RPC_URL

# Build all packages
pnpm build
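
The environment variables listed in the setup comments could be validated at startup along these lines. This is a hypothetical loader, not the project's actual config code: `ArenaEnv` and `loadEnv` are illustrative names, and it covers only a subset of the variables.

```typescript
interface ArenaEnv {
  e2bApiKey: string;
  openaiApiKey: string;
  openaiBaseUrl?: string;
  openaiModel?: string;
  chain: "hardhat" | "sepolia";
  verifierPrivateKeys: string[];
}

function loadEnv(env: Record<string, string | undefined>): ArenaEnv {
  const required = (name: string): string => {
    const value = env[name];
    if (!value) throw new Error(`Missing required environment variable: ${name}`);
    return value;
  };

  // NEXT_PUBLIC_CHAIN defaults to hardhat, as noted in the setup comments.
  const chain = env["NEXT_PUBLIC_CHAIN"] || "hardhat";
  if (chain !== "hardhat" && chain !== "sepolia") {
    throw new Error(`Unsupported NEXT_PUBLIC_CHAIN: ${chain}`);
  }

  return {
    e2bApiKey: required("E2B_API_KEY"),
    openaiApiKey: required("OPENAI_API_KEY"),
    openaiBaseUrl: env["OPENAI_BASE_URL"],
    openaiModel: env["OPENAI_MODEL"],
    chain,
    // VERIFIER_PRIVATE_KEYS is comma-separated; drop blanks and stray whitespace.
    verifierPrivateKeys: (env["VERIFIER_PRIVATE_KEYS"] || "")
      .split(",")
      .map((k) => k.trim())
      .filter((k) => k.length > 0),
  };
}
```

In a Node.js process this would typically be called once as `loadEnv(process.env)`, so a missing key fails fast instead of surfacing mid-pipeline.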

Run the Demo (full on-chain cycle)

# Terminal 1: Start local Hardhat node
cd packages/contracts && pnpm node

# Terminal 2: Run the demo
pnpm --filter @exploit-arena/cli exec arena demo

Scan a Source File (on-chain)

# Scan a Solidity file — requires a deployed escrow and an active bounty
pnpm --filter @exploit-arena/cli exec arena scan \
  --escrow 0xYOUR_ESCROW_ADDRESS \
  --bounty-id 0 \
  --source path/to/Contract.sol

# Scan with more agents and a custom quorum
pnpm --filter @exploit-arena/cli exec arena scan \
  --escrow 0xYOUR_ESCROW_ADDRESS \
  --bounty-id 0 \
  --source path/to/Contract.sol \
  --attackers 3 --verifiers 5 --quorum 3

Run the MCP Server (for external AI agents)

# Start the MCP server (SSE mode)
pnpm --filter @exploit-arena/mcp start

# Or stdio mode for direct integration
pnpm --filter @exploit-arena/mcp start -- --stdio

Run the Web Dashboard

pnpm --filter @exploit-arena/web dev
# Opens at http://localhost:3000

Docker Compose

# Start all services: web, hardhat node, MCP server
docker compose up

# web     → http://localhost:3000
# hardhat → http://localhost:8545
# mcp     → http://localhost:3001/sse

Open WebUI (Docker, local)

Open WebUI is included in docker-compose.yml as openwebui.

# Start Open WebUI with the existing stack
docker compose up -d openwebui mcp node web

# Open WebUI
# http://localhost:3002

Optional (recommended) in your root .env:

# Use a long random value in real setups
OPENWEBUI_SECRET_KEY=replace-with-a-long-random-secret

Add ExploitArena MCP (SSE) to Open WebUI

Open WebUI supports MCP Streamable HTTP (v0.6.31+).

  1. Open Admin Settings in Open WebUI.
  2. Go to Tools and add a new tool connection.
  3. Set Type to MCP (Streamable HTTP) (not OpenAPI).
  4. Set the MCP URL:
    • MCP running on host (dev server via pnpm dev) + Open WebUI in Docker: http://host.docker.internal:3001/sse
    • Both running on host outside Docker: http://localhost:3001/sse
    • MCP also containerised in the same compose network: http://mcp:3001/sse
  5. Save, then test the connection by listing available tools (for example list_bounties).

Add a Custom OpenAI-Compatible Inference Endpoint

  1. In Open WebUI, open Admin Settings.
  2. Go to Connections > OpenAI > Manage.
  3. Add New Connection, then choose Standard / Compatible.
  4. Set:
    • API URL: your endpoint with /v1 (example: http://host.docker.internal:8000/v1)
    • API Key: your provider key (or none if your endpoint does not require auth)
    • Optional model filter: restrict visible model IDs.
  5. Save and pick the model in chat.

Notes:

  • If your inference server runs on the host machine and Open WebUI runs in Docker, use http://host.docker.internal:PORT/v1.
  • If your inference server is another Compose service, use http://<service-name>:PORT/v1.
  • Keep tool Type as MCP for MCP servers. Using OpenAPI type for MCP can fail or hang.

Architecture: Agent Sandbox Model

Every agent (attacker and verifier) runs inside an isolated E2B cloud sandbox — a full Linux VM with shell, Node.js, Python, and git. Agents interact with their sandbox through three tools:

| Tool | Description |
|---|---|
| shell | Execute any shell command (ls, npm install, compile, test, run scripts) |
| read_file | Read source files from the repo or sandbox |
| write_file | Create exploit code, test files, configs |

All on-chain submissions (exploit hashes, verification votes, CVSS scores) are committed directly by the orchestrator pipeline — agents focus purely on analysis and reproduction.

The LLM drives the agent autonomously for up to 30 steps (attackers) or 25 steps (verifiers), using the tools to explore, analyze, write code, and execute it — with the strict requirement that exploits must be tested and verified in the sandbox before submission.
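The step-bounded loop described above can be sketched as follows. This is a simplified, synchronous model: the `Sandbox` interface, the `done` action, and the `Policy` function standing in for the LLM are all assumptions, and the real agents use E2B sandboxes and an LLM SDK rather than these stubs.

```typescript
type ToolCall =
  | { tool: "shell"; command: string }
  | { tool: "read_file"; path: string }
  | { tool: "write_file"; path: string; content: string }
  | { tool: "done"; summary: string }; // hypothetical terminal action

interface Sandbox {
  shell(command: string): string;
  readFile(path: string): string;
  writeFile(path: string, content: string): void;
}

// Stand-in for the LLM: picks the next tool call from the transcript so far.
type Policy = (history: string[]) => ToolCall;

interface AgentResult {
  steps: number;
  summary: string | null; // null when the step budget ran out
}

function runAgent(decide: Policy, sandbox: Sandbox, maxSteps: number): AgentResult {
  const history: string[] = [];
  for (let step = 1; step <= maxSteps; step++) {
    const call = decide(history);
    switch (call.tool) {
      case "shell":
        history.push(`shell> ${call.command}\n${sandbox.shell(call.command)}`);
        break;
      case "read_file":
        history.push(`read ${call.path}\n${sandbox.readFile(call.path)}`);
        break;
      case "write_file":
        sandbox.writeFile(call.path, call.content);
        history.push(`wrote ${call.path}`);
        break;
      case "done":
        return { steps: step, summary: call.summary };
    }
  }
  return { steps: maxSteps, summary: null }; // budget exhausted, nothing to submit
}
```

An attacker would run with `maxSteps` of 30 and a verifier with 25, per the limits stated above. Returning `null` on exhaustion keeps untested exploits from being submitted by default.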

See packages/agents/SKILLS.md for the full agent workflow specification.


Tech Stack

| Layer | Technology |
|---|---|
| Smart Contracts | Solidity, Hardhat |
| Local Blockchain | Hardhat Network |
| Agent Framework | TypeScript, Vercel AI SDK |
| LLM | OpenAI (or any OpenAI-compatible endpoint) |
| Sandbox Isolation | E2B cloud sandboxes |
| MCP Server | Model Context Protocol (SSE + stdio) |
| Frontend | Next.js, Tailwind CSS v4, shadcn/ui |
| Wallet | Wagmi v2 (Hardhat + Sepolia support via NEXT_PUBLIC_CHAIN) |
| Monorepo | pnpm workspaces, Turborepo |

Responsible Use

ExploitArena is designed for authorized security research only. Only submit contracts you own or have explicit permission to test. All demo scenarios use contracts deployed on local forks and testnets with no real funds at risk.


License

MIT — see LICENSE


Built at KJSSE GajShield Hack X · April 2026 · Mumbai
