Clone the repository:
```bash
git clone https://github.com/Swarm-Security/SwarmSecurity.git
cd SwarmSecurity
```
Run the Makefile setup:
```bash
make all
```
This command will:
- Create a `.env` file (prompting for required API keys)
- Set up a Python virtual environment
- Install all Python dependencies
- Install npm dependencies (localtunnel)
- Start the audit-agent server
- Start a localtunnel for webhook access

Note: The `make all` command will prompt you for:
- `OPENAI_API_KEY`: Your OpenAI API key
- `OPENAI_MODEL`: Model to use (default: `gpt-4.1-nano-2025-04-14`)
- `LOG_LEVEL`: Logging level (default: `INFO`)
- `LOG_FILE`: Log file path (default: `agent.log`)
- `AGENTARENA_API_KEY`: (Optional) For server mode
- `WEBHOOK_AUTH_TOKEN`: (Optional) For server mode
- `DATA_DIR`: Data directory (default: `./data`)
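The variables that `make all` prompts for can be read at startup with their documented defaults applied. The following is a minimal sketch using only `os.environ`; the `Settings` class, the `load_settings` function, and the `server_mode_ready` check are hypothetical names, and the project itself may load `.env` differently (for example through a dotenv library).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables prompted for by `make all`."""
    openai_api_key: str
    openai_model: str
    log_level: str
    log_file: str
    agentarena_api_key: Optional[str]
    webhook_auth_token: Optional[str]
    data_dir: str

    @property
    def server_mode_ready(self) -> bool:
        # Server mode needs both optional credentials to be present.
        return bool(self.agentarena_api_key and self.webhook_auth_token)


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from an environment mapping, applying the documented defaults."""
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise ValueError("OPENAI_API_KEY is required")
    return Settings(
        openai_api_key=api_key,
        openai_model=env.get("OPENAI_MODEL", "gpt-4.1-nano-2025-04-14"),
        log_level=env.get("LOG_LEVEL", "INFO"),
        log_file=env.get("LOG_FILE", "agent.log"),
        agentarena_api_key=env.get("AGENTARENA_API_KEY") or None,
        webhook_auth_token=env.get("WEBHOOK_AUTH_TOKEN") or None,
        data_dir=env.get("DATA_DIR", "./data"),
    )
```

Passing a plain dict (for example `load_settings({"OPENAI_API_KEY": "sk-test"})`) makes the defaults easy to exercise in tests without touching the real environment.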
Alternative: Manual Setup
```bash
make setup          # Create .env and venv
source venv/bin/activate
make install        # Install Python packages
make install-deps   # Install npm dependencies
make server         # Start server (or make tunnel for webhook)
```
Swarm.Security is an advanced AI-powered security audit agent designed to automatically detect vulnerabilities in Solidity smart contracts. The system employs a swarm intelligence architecture where multiple specialized AI personas work collaboratively to analyze code from different security perspectives.
Unlike traditional static analysis tools, Swarm.Security uses Large Language Models (LLMs) to understand contract logic, identify complex attack vectors, and provide detailed explanations of potential vulnerabilities. The agent can operate in two modes:
- Server Mode: Receives audit tasks via webhook from the Nethermind Agent Arena platform
- Local Mode: Directly audits GitHub repositories for testing and development
The system is built with modularity and extensibility in mind, allowing new security expert personas to be easily added to the swarm.
Swarm.Security was specifically developed to compete in the Nethermind Agent Arena, a platform that evaluates AI agents' ability to find security vulnerabilities in smart contracts. The agent is designed to:
- Receive Tasks via Webhook: The platform sends notifications when new audit challenges are available, including:
  - Task ID and repository URL
  - Selected files to audit
  - Additional documentation and context
  - QA responses from the challenge creator
- Process Contracts Intelligently: The agent analyzes the provided Solidity files using its swarm of expert personas, identifying vulnerabilities across multiple security domains.
- Submit Findings: Results are automatically formatted and submitted back to the platform via API, with findings classified by severity (Critical, High, Medium, Low, Informational).
- Comply with Platform Standards: The agent adheres to the Agent Arena's API specifications, including proper authentication, finding format, and response structure.
The architecture is optimized for the competitive environment, balancing thoroughness with efficiency to maximize vulnerability detection within time constraints.
To validate the effectiveness of Swarm.Security, we benchmarked it against real-world exploits. The tests used:
- Production Code: The same codebase that ran on the Agent Arena server during the competition
- Real-World Exploits: Verified historical exploits from public security repositories
We verified the agent's detection capabilities against recent exploits documented in DeFiHackLabs, a comprehensive repository of DeFi hack reproductions using Foundry. This repository contains real-world exploit scenarios that have been successfully executed and verified.
- 20251007 TokenHolder - Access Control
  - Vulnerability Type: Access Control
  - Successfully detected by the agent's `AccessControlExpert` and `Thief` personas
- 20250827 0xf340 - Access Control
  - Vulnerability Type: Access Control
  - Detected through comprehensive access control analysis
- 20251004 MIMSpell3 - Bypassed Insolvency Check
  - Vulnerability Type: Logic Flaw (Insolvency Check Bypass)
  - Identified by the `LogicExpert`, `DeFiAnalyst`, and `EconomicExpert` personas
The agent has also been tested against exploits from learn-solidity-hacks, which provides educational examples of common Solidity vulnerabilities. These tests help ensure the agent can detect a wide range of vulnerability patterns across different contract architectures and attack vectors.
- Repository Cloning: Each exploit repository is cloned locally
- Contract Extraction: Target contracts are identified and extracted
- Agent Analysis: Swarm.Security analyzes the contracts using its swarm of expert personas
- Result Validation: Findings are compared against known vulnerabilities to verify detection accuracy
- False Positive Analysis: The agent's findings are reviewed to minimize false positives while maintaining high detection rates
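The result-validation and false-positive steps amount to comparing the vulnerability types known for an exploit with the types the agent reported. A minimal sketch follows; the `score_detection` function, the `DetectionScore` fields, and matching on a normalized type label are assumptions for illustration, since the document does not specify how the comparison is performed.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass
class DetectionScore:
    detected: Set[str]  # expected vulnerability types that the agent reported
    missed: Set[str]    # expected types the agent did not report
    extra: Set[str]     # reported types outside the expected set (candidate false positives)

    @property
    def recall(self) -> float:
        total = len(self.detected) + len(self.missed)
        return len(self.detected) / total if total else 1.0


def _normalize(label: str) -> str:
    # "Access-Control" and "access control" should count as the same type.
    return " ".join(label.lower().replace("-", " ").split())


def score_detection(expected: Iterable[str], reported: Iterable[str]) -> DetectionScore:
    """Compare an exploit's known vulnerability types with the agent's reported types."""
    exp = {_normalize(x) for x in expected}
    rep = {_normalize(x) for x in reported}
    return DetectionScore(detected=exp & rep, missed=exp - rep, extra=rep - exp)
```

The `extra` set is only a starting point: a reported type outside the expected set may still be a genuine issue, which is why the false-positive step involves manual review.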
This benchmarking approach is intended to confirm that Swarm.Security detects real-world vulnerabilities accurately while operating within the constraints of the Agent Arena platform.
Swarm.Security follows a multi-stage analysis pipeline:
- Input:
  - Server Mode: A FastAPI webhook endpoint receives notifications from Agent Arena
  - Local Mode: GitHub repositories are processed directly via the CLI
- Repository processing:
  - Downloads/clones the repository
  - Extracts the selected Solidity files
  - Loads additional context (documentation, links, QA responses)
- Persona routing:
  - The `RoutingAnalyst` persona analyzes each file's metadata (imports, pragma, contract names, heuristics)
  - Selects the most relevant expert personas for that specific contract
  - Falls back to a core set of always-on personas if routing fails
- Parallel analysis:
  - Selected personas analyze the contract concurrently using `ThreadPoolExecutor`
  - Each persona uses an LLM with a specialized system prompt
  - Each persona returns structured JSON findings with vulnerability details
- Aggregation:
  - Findings from all personas are collected
  - Findings are deduplicated and ranked by severity and consensus
  - The top findings (at most 20) are selected
- Submission:
  - Results are formatted for the Agent Arena API
  - Findings are sent back to the platform via a POST request, including detailed descriptions, attack logic, and code snippets
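The submission step attaches code snippets with line numbers to each finding. A minimal sketch of that formatting is below; the `code_snippet` and `to_payload` helpers, the `context` parameter, and the payload keys (`title`, `severity`, `description`, `snippet`) are hypothetical and are not taken from the Agent Arena API specification.

```python
from typing import Dict


def code_snippet(source: str, start: int, end: int, context: int = 1) -> str:
    """Return lines start..end (1-indexed, inclusive) plus `context` surrounding lines,
    each prefixed with its right-aligned line number."""
    lines = source.splitlines()
    lo = max(1, start - context)
    hi = min(len(lines), end + context)
    width = len(str(hi))
    return "\n".join(f"{n:>{width}} | {lines[n - 1]}" for n in range(lo, hi + 1))


def to_payload(title: str, severity: str, description: str,
               source: str, start: int, end: int) -> Dict[str, str]:
    """Assemble one finding in a hypothetical submission shape."""
    return {
        "title": title,
        "severity": severity,
        "description": description,
        "snippet": code_snippet(source, start, end),
    }
```

Keeping a line or two of context around the flagged range lets a reviewer see the enclosing function without opening the file.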
All expert personas inherit from `BasePersona`, which provides:
- OpenAI client initialization
- A standardized `hunt()` method that:
  - Formats system and user prompts
  - Calls the LLM with JSON response format
  - Handles errors gracefully

Each persona implements `get_system_prompt()` with its domain-specific expertise.
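The base-class contract described above can be sketched as follows. This is a minimal sketch, not the project's code: to keep it runnable offline, the LLM call is injected as a plain callable (`complete`, a hypothetical parameter) in place of the OpenAI client, and the `{"findings": [...]}` response shape is an assumption.

```python
import json
from abc import ABC, abstractmethod
from typing import Callable, Dict, List

# Hypothetical signature: (system_prompt, user_prompt) -> raw JSON text from the model.
CompleteFn = Callable[[str, str], str]


class BasePersona(ABC):
    name = "BasePersona"

    def __init__(self, complete: CompleteFn):
        # Stand-in for the OpenAI client that the real base class initializes.
        self.complete = complete

    @abstractmethod
    def get_system_prompt(self) -> str:
        """Domain-specific instructions supplied by each persona subclass."""

    def hunt(self, source_code: str, filename: str) -> List[Dict]:
        user_prompt = f"Audit the Solidity file {filename}:\n\n{source_code}"
        try:
            data = json.loads(self.complete(self.get_system_prompt(), user_prompt))
            findings = data.get("findings", []) if isinstance(data, dict) else []
        except Exception:  # network failures, invalid JSON, etc.
            return []
        if not isinstance(findings, list):
            return []
        # Tag each finding with its persona so agreement can be counted later.
        return [dict(f, persona=self.name) for f in findings if isinstance(f, dict)]


class ReentrancyExpert(BasePersona):
    name = "ReentrancyExpert"

    def get_system_prompt(self) -> str:
        return ("You are a reentrancy specialist. Respond only with JSON of the form "
                '{"findings": [{"title": "...", "severity": "..."}]}.')
```

Returning an empty list on any failure keeps one misbehaving persona from aborting the whole swarm run.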
The `Swarm` class orchestrates the multi-persona analysis:
- Agent Selection (`_select_agents`):
  - Extracts lightweight context (pragma, imports, contract names, heuristics)
  - Calls `RoutingAnalyst` with this context
  - The router returns a recommended persona list
  - Falls back to the always-on set: `Thief`, `AccessControlExpert`, `ReentrancyExpert`, `LogicExpert`, `Logician`, `DeFiAnalyst`, `GasOptimizationExpert`
- Heuristic Detection:
  - Scans code for keywords indicating specific vulnerability types
  - Examples: "oracle", "flashloan", "proxy", "signature", "token", etc.
  - These heuristics inform the router's decision
- Parallel Execution:
  - Uses `ThreadPoolExecutor` to run the selected personas concurrently
  - Each persona's `hunt()` method is called with the full source code
  - Results are collected and processed
- Finding Formatting:
  - Extracts vulnerability details from persona responses
  - Adds code snippets with line numbers
  - Maps findings to the Agent Arena finding format
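The selection, heuristic, and parallel-execution steps above can be sketched together. This is a minimal sketch under stated assumptions: the keyword table in `HEURISTICS`, the regular expressions, the `detect_heuristics`, `extract_context`, and `run_swarm` names, and the router and persona callables are all hypothetical; only the always-on persona names and the use of `ThreadPoolExecutor` come from the description above.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

ALWAYS_ON = ["Thief", "AccessControlExpert", "ReentrancyExpert", "LogicExpert",
             "Logician", "DeFiAnalyst", "GasOptimizationExpert"]

# Hypothetical keyword table: heuristic tag -> lower-case keywords that suggest it.
HEURISTICS = {
    "oracle": ["oracle", "latestrounddata", "getprice"],
    "flashloan": ["flashloan", "flash loan"],
    "proxy": ["proxy", "delegatecall", "upgradeto"],
    "signature": ["signature", "ecrecover", "permit"],
    "token": ["token", "erc20", "transferfrom"],
}


def detect_heuristics(source: str) -> List[str]:
    text = source.lower()
    return [tag for tag, words in HEURISTICS.items() if any(w in text for w in words)]


def extract_context(source: str) -> Dict:
    """Lightweight metadata passed to the router instead of the full file."""
    return {
        "pragma": re.findall(r"pragma\s+solidity\s+([^;]+);", source),
        "imports": re.findall(r'import\s+[^;]*?"([^"]+)"', source),  # double-quoted only
        "contracts": re.findall(r"\b(?:contract|library|interface)\s+(\w+)", source),
        "heuristics": detect_heuristics(source),
    }


def run_swarm(source: str,
              router: Callable[[Dict], List[str]],
              personas: Dict[str, Callable[[str], List[Dict]]],
              max_workers: int = 8) -> List[Dict]:
    try:
        selected = [n for n in router(extract_context(source)) if n in personas]
    except Exception:
        selected = []
    if not selected:  # router failed or returned nothing usable
        selected = [n for n in ALWAYS_ON if n in personas]
    # LLM calls are I/O-bound, so threads let the personas wait on the API concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda n: personas[n](source), selected)
        return [finding for batch in results for finding in batch]
```

Sending the router only this compact context, rather than the whole file, keeps the routing call cheap; the full source goes only to the personas that are selected.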
The `SolidityAuditor` class:
- Initializes `Swarm` and `Scout` (for static analysis)
- Processes a list of `SolidityFile` objects
- Calls `swarm.analyze_file()` for each contract
- Aggregates findings and applies deduplication
- Returns an `Audit` object with a list of `VulnerabilityFinding` objects
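The auditor's per-file loop can be sketched as below. The `SolidityFile`, `VulnerabilityFinding`, and `Audit` names come from the description above, but their fields here are assumptions; `analyze_file` is passed in as a callable rather than a `Swarm` instance, the `audit_files` name is hypothetical, and the static-analysis `Scout` step is omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class SolidityFile:
    path: str
    content: str


@dataclass
class VulnerabilityFinding:
    title: str
    severity: str
    file_path: str
    personas: List[str] = field(default_factory=list)


@dataclass
class Audit:
    findings: List[VulnerabilityFinding]


def audit_files(files: List[SolidityFile],
                analyze_file: Callable[[SolidityFile], List[Dict]]) -> Audit:
    merged: Dict[Tuple[str, str, str], VulnerabilityFinding] = {}
    for f in files:
        for raw in analyze_file(f):
            # Exact-duplicate merge within a file: same normalized title and severity.
            key = (f.path, raw["title"].strip().lower(), raw["severity"])
            finding = merged.setdefault(
                key, VulnerabilityFinding(raw["title"], raw["severity"], f.path))
            # Record every persona that reported it; this is the consensus signal.
            if raw.get("persona") and raw["persona"] not in finding.personas:
                finding.personas.append(raw["persona"])
    return Audit(findings=list(merged.values()))
```

This only collapses exact repeats; fuzzier, similarity-based deduplication belongs to the ranking step.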
The select_top_findings function:
- Removes duplicate findings based on similarity
- Ranks by severity (Critical > High > Medium > Low > Informational)
- Limits to top 20 findings to meet platform constraints
- Prioritizes findings with higher consensus (detected by multiple personas)
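The ranking rules above can be sketched with the standard library. This is an illustration under assumptions, not the project's implementation: similarity is approximated with `difflib.SequenceMatcher` on titles, the 0.85 threshold is arbitrary, and findings are plain dicts with hypothetical `title`, `severity`, and `personas` keys.

```python
from difflib import SequenceMatcher
from typing import Dict, List

SEVERITY_RANK = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3, "Informational": 4}


def _similar(a: str, b: str, threshold: float) -> bool:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def select_top_findings(findings: List[Dict], limit: int = 20,
                        threshold: float = 0.85) -> List[Dict]:
    unique: List[Dict] = []
    for f in findings:
        match = next((u for u in unique if _similar(u["title"], f["title"], threshold)), None)
        if match is None:
            # Copy so the caller's dicts are never mutated.
            unique.append({**f, "personas": list(f.get("personas", []))})
            continue
        # Merge near-duplicates: keep the more severe label and pool the personas.
        if SEVERITY_RANK[f["severity"]] < SEVERITY_RANK[match["severity"]]:
            match["severity"] = f["severity"]
        for p in f.get("personas", []):
            if p not in match["personas"]:
                match["personas"].append(p)
    # Severity first; within one severity, more personas (higher consensus) ranks higher.
    unique.sort(key=lambda u: (SEVERITY_RANK[u["severity"]], -len(u["personas"])))
    return unique[:limit]
```

The pairwise comparison is quadratic in the number of findings, which is acceptable at the scale of a few dozen persona outputs per audit.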
The FastAPI server (`server.py`):
- `/webhook` endpoint receives Agent Arena notifications
- Validates the webhook authorization token
- Processes notifications in background tasks
- Downloads repository ZIP, extracts files
- Fetches task details (selected files, docs, links)
- Calls auditor service
- Submits results back to platform
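The authorization and background-processing steps can be sketched independently of FastAPI. In this minimal sketch the `Authorization: Bearer <token>` header format, the lower-cased header keys, the `task_id` payload field, and the `handle_webhook`, `schedule`, and `process_task` names are all assumptions; in the real server, FastAPI's background tasks would take the role of `schedule`.

```python
import hmac
from typing import Callable, Dict, Mapping, Tuple


def is_authorized(headers: Mapping[str, str], expected_token: str) -> bool:
    # Assumed header format "Authorization: Bearer <token>", with lower-cased keys.
    scheme, _, token = headers.get("authorization", "").partition(" ")
    if scheme.lower() != "bearer" or not token:
        return False
    # Constant-time comparison avoids leaking token contents through timing.
    return hmac.compare_digest(token.encode(), expected_token.encode())


def handle_webhook(headers: Mapping[str, str], payload: Dict, expected_token: str,
                   schedule: Callable[[Callable[[str], None], str], None],
                   process_task: Callable[[str], None]) -> Tuple[int, Dict]:
    """Return (status_code, body); the audit itself runs after the response is sent."""
    if not is_authorized(headers, expected_token):
        return 401, {"detail": "invalid webhook token"}
    task_id = payload.get("task_id")
    if not task_id:
        return 400, {"detail": "missing task_id"}
    # Download, audit, and submission happen in the background task.
    schedule(process_task, task_id)
    return 202, {"status": "accepted", "task_id": task_id}
```

Answering immediately and deferring the audit keeps the webhook call short, since a full multi-persona run can take far longer than a typical webhook timeout.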
The local processor (local.py):
- Clones GitHub repositories
- Finds all `.sol` files (or allows interactive selection)
- Creates `SolidityFile` objects
- Calls the auditor service
- Saves results to JSON file
- Records benchmark metrics
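Locating the Solidity files in a cloned repository can be sketched with `pathlib`. The `find_solidity_files` name and the excluded directory names (`node_modules`, `lib`, `test`, `tests`, `script`, `.git`) are assumptions based on typical Foundry and Hardhat layouts, not documented behavior of `local.py`.

```python
from pathlib import Path
from typing import Iterable, List

DEFAULT_EXCLUDES = frozenset({"node_modules", "lib", "test", "tests", "script", ".git"})


def find_solidity_files(repo_root: str,
                        excludes: Iterable[str] = DEFAULT_EXCLUDES) -> List[Path]:
    """Return repository-relative paths of .sol files outside excluded directories."""
    root = Path(repo_root)
    skip = set(excludes)
    found = []
    for path in root.rglob("*.sol"):
        rel = path.relative_to(root)
        # Skip dependencies and test code at any depth (directory parts only).
        if any(part in skip for part in rel.parts[:-1]):
            continue
        found.append(rel)
    return sorted(found)
```

Checking only the repository-relative parts means a checkout that happens to live under a directory named `test` is not filtered out entirely.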
```text
           Agent Arena → Webhook → Server
                          ↓
                    Download Repo
                          ↓
                   Load Contracts
                          ↓
                   SolidityAuditor
                          ↓
                Swarm.analyze_file()
                          ↓
          RoutingAnalyst (selects personas)
                          ↓
        ┌─────────────────┴─────────────────┐
        ↓                                   ↓
Persona 1 (Thief)               Persona N (OracleExpert)
        ↓                                   ↓
  LLM Analysis                        LLM Analysis
        ↓                                   ↓
  JSON Findings                       JSON Findings
        └─────────────────┬─────────────────┘
                          ↓
                 Aggregate Findings
                          ↓
                    Deduplication
                          ↓
                  Top 20 Selection
                          ↓
                   Format for API
                          ↓
                 POST to Agent Arena
```
Swarm.Security employs 25 specialized AI personas, each an expert in a specific security domain. Below is a breakdown of each persona:
- Focus: Access control and asset draining vulnerabilities
- Targets: Public `init`/`initialize` functions, unprotected withdrawal functions
- Key Patterns: Privilege escalation, missing `onlyOwner` modifiers
- Always Active: Yes
- Focus: Authorization and privilege management
- Expertise: Missing access modifiers, `tx.origin` authentication, public initializers, RBAC issues
- Real-World: Ronin Bridge ($625M), Poly Network ($611M), Parity Wallet ($150M)
- Key Patterns: Unprotected admin functions, default visibility issues, constructor vs initializer confusion
- Focus: All forms of reentrancy attacks
- Expertise: Single-function, cross-function, cross-contract, cross-chain, and read-only reentrancy
- Real-World: The DAO ($60M), Cream Finance ($130M), Curve Finance ($69M, caused by a Vyper compiler bug that broke reentrancy locks)
- Key Patterns: Check-Effects-Interactions violations, missing reentrancy guards, state reads during callbacks
- Focus: General logical flaws and state management issues
- Targets: Business logic errors, incorrect state transitions, flawed conditional logic
- Always Active: Yes
- Focus: Methodical analysis of reentrancy and state update patterns
- Targets: Check-Effects-Interaction violations, state updates after external calls
- Always Active: Yes
- Focus: Economic exploits and cross-contract interactions in DeFi protocols
- Priority Targets: Oracle manipulation, flash loan attacks, liquidation edge cases, misconfigured parameters, token quirks
- Always Active: Yes
- Focus: Flash loan attack vectors and price manipulation
- Expertise: Price manipulation, oracle manipulation via flash loans, governance attacks, liquidity pool manipulation, TWAP manipulation
- Real-World: PancakeBunny ($45M), bZx ($954K)
- Key Patterns: Spot price calculations, single DEX oracle reliance, governance voting with flash loans
- Focus: Price oracle manipulation and oracle-related vulnerabilities
- Expertise: Spot price vs TWAP, low-liquidity pool exploitation, oracle staleness, weak fallback oracles, single oracle dependency
- Real-World: Mango Markets ($116M), Polter Finance ($12M)
- Key Patterns: Single DEX price reliance, missing staleness checks, no fallback mechanism
- Focus: Economic model vulnerabilities and incentive misalignments
- Expertise: Tokenomics flaws, reward calculation errors, incentive manipulation, economic attacks
- Focus: ERC token standard vulnerabilities
- Expertise: ERC4626 inflation attacks, ERC20 approval race conditions, fee-on-transfer incompatibility, callback reentrancy (ERC721/1155), missing return value checks
- Key Patterns: First depositor manipulation, non-standard token handling, NFT callback reentrancy
- Focus: Signature replay attacks and cryptographic vulnerabilities
- Expertise: Same-chain/cross-chain replay, signature malleability, `ecrecover` misuse, missing nonces, missing chain ID validation (EIP-155)
- Real-World: Wormhole Bridge ($325M)
- Key Patterns: Signatures without nonce, missing chainId, zero-address returns from `ecrecover`
- Focus: Integer overflow/underflow and precision vulnerabilities
- Expertise: Overflow/underflow (pre-Solidity 0.8), precision loss in division, rounding errors, unsafe type casting, negative value handling
- Real-World: BeautyChain (BEC) BatchOverflow
- Key Patterns: Arithmetic without SafeMath, division before multiplication, unchecked loops
- Focus: Input validation and boundary checking
- Expertise: Missing input validation, boundary condition errors, unchecked user inputs, validation bypasses
- Focus: Storage collision, proxy vulnerabilities, and uninitialized contracts
- Expertise: Storage collision in UUPS/Transparent proxies, uninitialized proxy/implementation contracts, storage layout mismatches, uninitialized storage pointers, upgrade vulnerabilities
- Real-World: Parity Wallet ($150M+)
- Key Patterns: Storage slot collisions, missing `_disableInitializers()`, public `initialize()` functions
- Focus: Compiler-specific issues and version compatibility
- Expertise: Solidity version issues, compiler bugs, ABI encoding problems, optimization pitfalls
- Focus: Inheritance and contract composition vulnerabilities
- Expertise: Diamond inheritance issues, function selector collisions, multiple inheritance problems, override errors
- Focus: Interface implementation and external contract interaction issues
- Expertise: Interface compliance, missing function implementations, incorrect interface usage
- Focus: Low-level call vulnerabilities (`call`, `delegatecall`, `staticcall`, assembly)
- Expertise: Unchecked return values, delegatecall storage collisions, assembly vulnerabilities, gas griefing
- Focus: Timestamp and block number manipulation
- Expertise: `block.timestamp` manipulation, `block.number` dependencies, VRF vulnerabilities, time-based logic flaws
- Focus: Error handling and exception management
- Expertise: Missing error handling, unchecked return values, exception propagation issues, error recovery flaws
- Focus: Denial of Service vulnerabilities
- Expertise: Gas griefing, unbounded loops, external call DoS, state exhaustion attacks
- Focus: Frontrunning and MEV vulnerabilities
- Expertise: Transaction ordering dependencies, predictable randomness, sandwich attacks, MEV exploitation
- Focus: Centralization risks and single points of failure
- Expertise: Single owner risks, centralized oracles, admin key management, governance centralization
- Focus: Gas inefficiencies and optimization opportunities
- Expertise: Storage optimization (packing variables), loop optimization, function visibility, redundant operations, expensive operations in loops
- Note: Returns `optimization_opportunity: true` instead of vulnerabilities
- Always Active: Yes
- Focus: Intelligent persona selection for each contract
- Function: Analyzes contract metadata (filename, heuristics, pragma, imports, contracts, code excerpts) and recommends which personas should analyze the file
- Output: List of recommended persona names with justification
- Fallback: If router fails or returns empty, uses always-on persona set
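Because the router's output is generated by an LLM, it is worth validating before use. A minimal sketch, assuming the router replies with JSON of the form `{"personas": [{"name": ..., "justification": ...}]}` (a hypothetical shape; bare name strings are also accepted): unknown names are dropped, duplicates are removed, and the always-on set is used when nothing valid remains. The `parse_router_reply` function and its `roster` parameter are hypothetical.

```python
import json
from typing import List, Set

ALWAYS_ON = ["Thief", "AccessControlExpert", "ReentrancyExpert", "LogicExpert",
             "Logician", "DeFiAnalyst", "GasOptimizationExpert"]


def parse_router_reply(raw: str, roster: Set[str]) -> List[str]:
    """Turn a raw router reply into a validated persona list, with fallback."""
    try:
        entries = json.loads(raw).get("personas", [])
    except (json.JSONDecodeError, AttributeError):  # invalid JSON or not an object
        entries = []
    if not isinstance(entries, list):
        entries = []
    selected: List[str] = []
    for entry in entries:
        name = entry.get("name") if isinstance(entry, dict) else entry
        # Accept only real persona names, once each; the model may invent names.
        if isinstance(name, str) and name in roster and name not in selected:
            selected.append(name)
    return selected or [n for n in ALWAYS_ON if n in roster]
```

Filtering against the roster means a hallucinated persona name degrades to a smaller selection, or to the always-on set, instead of raising a lookup error mid-audit.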
Create a `.env` file with the following variables:
```bash
# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key
OPENAI_MODEL=gpt-4.1-nano-2025-04-14

# Logging
LOG_LEVEL=INFO
LOG_FILE=agent.log

# Server Mode (Optional)
AGENTARENA_API_KEY=aa-...
WEBHOOK_AUTH_TOKEN=your_webhook_auth_token
DATA_DIR=./data
```
Run the agent as a webhook server:
```bash
audit-agent server
```
Or with a custom port:
```bash
audit-agent server --port 8008
```
Audit a GitHub repository directly:
```bash
audit-agent local --repo https://github.com/example/repo.git --output audit.json
```
For interactive file selection:
```bash
audit-agent local --repo https://github.com/example/repo.git --only-selected-files
```
CC NC