critfinds/Web3ExploitEngineML---AproxResearch

AproxResearch — Exploit ML Engine

A modular ML framework for detecting smart contract vulnerabilities and real-time on-chain exploits. Built for the AMD Instinct MI100X (ROCm) compute cluster.


Overview

AproxResearch is a FastAPI backend that orchestrates multiple security analysis tools and machine-learning models to audit Solidity smart contracts and detect live on-chain exploit patterns.

Core capabilities:

  • Multi-phase SAST → symbolic execution → fuzzing → ML scoring pipeline
  • Hybrid ML detector (Random Forest + Isolation Forest + LSTM autoencoder)
  • Tool-to-tensor feature fusion across Slither, Aderyn, Semgrep, Mythril, Echidna, and 4naly3er
  • Real-time blockchain scanning via Alchemy WebSocket
  • Circuit-breaker alerts via Telegram and generic webhooks
  • Supabase (PostgREST + Storage) as the persistence layer
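One way to picture the hybrid detector is as a weighted blend of three per-contract scores. The sketch below is illustrative only: the `DetectorScores` fields, the `fuse` function, and the weights are assumptions made for this example, not the engine's actual fusion logic, and each score is assumed to be rescaled to [0, 1] beforehand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectorScores:
    rf_prob: float     # Random Forest probability of "vulnerable", in [0, 1]
    iso_score: float   # Isolation Forest anomaly score, rescaled to [0, 1]
    lstm_error: float  # LSTM autoencoder reconstruction error, rescaled to [0, 1]


def fuse(scores: DetectorScores, weights=(0.5, 0.25, 0.25)) -> float:
    """Weighted average of the three detector outputs, clamped to [0, 1].

    The weights are hypothetical; a real system would tune them on held-out data.
    """
    parts = (scores.rf_prob, scores.iso_score, scores.lstm_error)
    total = sum(w * p for w, p in zip(weights, parts)) / sum(weights)
    return min(1.0, max(0.0, total))


example = fuse(DetectorScores(rf_prob=0.9, iso_score=0.6, lstm_error=0.2))
```

A linear blend keeps each model's contribution easy to inspect; the real detector may combine the models differently.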

Architecture

POST /orchestrate/audit
  │
  ├── Phase 1 — Rapid SAST (parallel)
  │     ├── Slither + Aderyn   → CFG, known patterns
  │     └── Semgrep            → deprecated patterns, logic flaws
  │
  ├── Phase 2 — Deep Logic & Invariants (parallel)
  │     ├── Mythril / Halmos   → symbolic path exploration
  │     └── Echidna / Medusa   → property-based fuzzing
  │
  ├── Phase 3 — Complexity Analysis
  │     └── 4naly3er           → cyclomatic complexity, gas, NC issues
  │
  └── ML Brain
        ├── Tool-to-Tensor     → 69-dim feature vector
        ├── Hybrid Detector    → RF + Isolation Forest + LSTM fusion
        └── Confidence score   → confirmed findings with root-cause
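The control flow in the diagram (tools run in parallel inside a phase, phases run in sequence) can be sketched with `asyncio`. This is a minimal sketch, not the project's orchestrator: `run_tool`, `run_phase`, and `orchestrate` are hypothetical names, and the tool calls are replaced by short sleeps so the example runs on its own.

```python
import asyncio
import time


async def run_tool(name: str, delay: float = 0.01) -> dict:
    # Hypothetical stand-in for a real tool wrapper (Slither, Mythril, ...).
    await asyncio.sleep(delay)
    return {"tool": name, "findings": []}


async def run_phase(tools: list) -> list:
    # Tools within one phase run concurrently; gather keeps input order.
    return await asyncio.gather(*(run_tool(t) for t in tools))


async def orchestrate() -> dict:
    phases = {
        "phase1": ["slither", "aderyn", "semgrep"],
        "phase2": ["mythril", "echidna"],
        "phase3": ["4naly3er"],
    }
    results = {}
    for phase, tools in phases.items():  # phases run one after another
        start = time.perf_counter()
        outputs = await run_phase(tools)
        results[phase] = {
            "outputs": outputs,
            "duration_ms": (time.perf_counter() - start) * 1000,
        }
    return results


report = asyncio.run(orchestrate())
```

Timing each phase separately mirrors the `phase_results` field described in the API reference.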

Requirements

Component   Version
Python      3.12+
PyTorch     ROCm 6.x build (torch==2.5.0+rocm6.1)
ROCm        6.1+
Uvicorn     0.31.0
FastAPI     0.115.0

Optional (significantly improves analysis):

  • slither-analyzer — pip install slither-analyzer
  • semgrep — pip install semgrep (bundled in venv)
  • mythril — pip install mythril
  • echidna — install via GitHub releases
  • 4naly3er — install via npm: npm i -g @4naly3er/4naly3er

Installation

# Clone the repo
git clone <repo-url>
cd Backend---AproxResearch

# Create virtual environment
python3 -m venv venv
source venv/bin/activate

# Install Python dependencies
pip install -r requirements.txt

# Install PyTorch for ROCm 6.1 (AMD MI100X)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.1

# Install training extras (needed for ML training scripts)
pip install -r requirements-training.txt

Configuration

Copy the example file and fill in your credentials:

cp .env.example .env

Minimum required variables:

# Security
API_KEY=<random-hex-string>          # authenticate all API calls via X-API-Key header

# Blockchain scanning
ALCHEMY_API_KEY=<your-alchemy-key>
ETHERSCAN_API_KEY=<your-etherscan-key>

# Database
SUPABASE_URL=https://<project>.supabase.co
SUPABASE_SERVICE_ROLE_KEY=<service-role-key>

Optional but recommended:

# Telegram exploit alerts
TELEGRAM_BOT_TOKEN=<bot-token>
TELEGRAM_CHAT_ID=<chat-id>

# Weights & Biases training monitoring
WANDB_API_KEY=<wandb-key>

# Alert webhook (Slack, Discord, or custom)
WEBHOOK_URL=<webhook-url>
WEBHOOK_SECRET=<webhook-secret>

# Chain scanners for BSC / Polygon
BSCSCAN_API_KEY=<bscscan-key>
POLYGONSCAN_API_KEY=<polygonscan-key>

# Auto-train baseline models if none found on startup
RETRAIN_ON_STARTUP=false
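A quick pre-flight check can catch a missing variable before the service starts. The sketch below assumes the five variables listed under "Minimum required variables"; the helper names `missing_required` and `env_flag` are made up for this example, and the accepted boolean spellings are an assumption about how a flag such as RETRAIN_ON_STARTUP might be parsed.

```python
import os
from typing import Mapping, Optional

# Names taken from the "Minimum required variables" list above.
REQUIRED_VARS = (
    "API_KEY",
    "ALCHEMY_API_KEY",
    "ETHERSCAN_API_KEY",
    "SUPABASE_URL",
    "SUPABASE_SERVICE_ROLE_KEY",
)


def missing_required(env: Optional[Mapping[str, str]] = None) -> list:
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


def env_flag(value: Optional[str], default: bool = False) -> bool:
    """Interpret a string flag; the accepted spellings are an assumption."""
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}
```

For example, `missing_required()` returns an empty list once `.env` has been loaded into the environment with all five values set.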

Running the Service

Development (with hot reload)

source venv/bin/activate
DEBUG=true uvicorn main:app --host 0.0.0.0 --port 8000 --reload

Production (single process)

source venv/bin/activate
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 1

Use --workers 1 — the LSTM autoencoder and chain scanner are stateful singletons. For horizontal scaling, run multiple instances behind a load balancer.

Health check

curl http://localhost:8000/health

Running 24/7 (systemd)

1. Create the service file

sudo tee /etc/systemd/system/aproxresearch.service > /dev/null <<'EOF'
[Unit]
Description=AproxResearch Exploit ML Engine
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=root
WorkingDirectory=/root/Backend---AproxResearch
EnvironmentFile=/root/Backend---AproxResearch/.env

# ROCm / HIP device visibility: expose GPU 0 only (list more indices to expose more devices)
Environment=ROCR_VISIBLE_DEVICES=0
Environment=HIP_VISIBLE_DEVICES=0
Environment=HSA_OVERRIDE_GFX_VERSION=10.3.0

ExecStart=/root/Backend---AproxResearch/venv/bin/uvicorn main:app \
    --host 0.0.0.0 \
    --port 8000 \
    --workers 1 \
    --log-level info

Restart=always
RestartSec=5
StandardOutput=journal
StandardError=journal
SyslogIdentifier=aproxresearch

# Allow more open file descriptors for WebSocket connections
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
EOF

2. Enable and start

sudo systemctl daemon-reload
sudo systemctl enable aproxresearch
sudo systemctl start aproxresearch

3. Check status

sudo systemctl status aproxresearch
sudo journalctl -u aproxresearch -f          # live logs
sudo journalctl -u aproxresearch --since today

4. Stop / restart

sudo systemctl stop aproxresearch
sudo systemctl restart aproxresearch

API Reference

All endpoints (except /health) require the X-API-Key header.

export API_KEY="your-api-key-here"
export BASE="http://localhost:8000"   # must match the --port passed to uvicorn (8000 in the examples above)

Health

# Basic liveness (no API key required)
curl $BASE/health

# ML model status
curl -H "X-API-Key: $API_KEY" $BASE/health/models

# ML backtest status
curl -H "X-API-Key: $API_KEY" $BASE/health/backtest

Analyze a contract

curl -X POST $BASE/analyze \
  -H "X-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "bytecode": "0x6080604052...",
    "contract_address": "0xabc...def",
    "chain": "ethereum"
  }'

Full security audit (orchestrated)

# Audit from source code
curl -X POST $BASE/orchestrate/audit \
  -H "X-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "source_code": "pragma solidity ^0.8.0;\ncontract Foo {...}",
    "contract_name": "Foo",
    "chain": "ethereum",
    "phases": ["phase1", "phase2", "phase3"]
  }'

# Audit from on-chain address (fetches source from Etherscan)
curl -X POST $BASE/orchestrate/audit \
  -H "X-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contract_address": "0xabc...def",
    "chain": "ethereum"
  }'
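The same audit request can be built from Python with only the standard library. This is a sketch under assumptions: `build_audit_request` is a hypothetical helper, and the base URL must match wherever the service is actually listening. The request is only constructed here; sending it is left as a commented line.

```python
import json
import urllib.request


def build_audit_request(base_url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to /orchestrate/audit."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/orchestrate/audit",
        data=body,
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )


req = build_audit_request(
    "http://localhost:8000",  # assumed base URL; adjust to your deployment
    "your-api-key-here",
    {"contract_address": "0xabc...def", "chain": "ethereum"},
)
# To send it: urllib.request.urlopen(req) and json.load() the response.
```

Note that `urllib` normalizes header capitalization internally (for example `X-api-key`); HTTP header names are case-insensitive, so the server sees the same header.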

Response fields:

Field                  Description
audit_id               Unique audit identifier (audit-<uuid>)
confirmed_findings     List of confirmed vulnerabilities with severity, root cause, confidence
phase_results          Timing and raw output per phase
dominant_root_cause    Most common vulnerability class detected
tool_consensus_score   Agreement level across all tools (0–1)
tool_feature_norm      L2 norm of the 69-dim tool feature vector
total_duration_ms      End-to-end latency in milliseconds
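Two of these fields are simple aggregates, and a small sketch shows what they mean. The code below is illustrative: the `root_cause` key inside each finding and the sample values are assumptions, not the service's actual schema.

```python
import math
from collections import Counter


def l2_norm(vector) -> float:
    """Euclidean length of a feature vector (what tool_feature_norm reports)."""
    return math.sqrt(sum(x * x for x in vector))


def dominant_root_cause(findings):
    """Most frequent root cause among findings, or None if there are none.

    Assumes each finding is a dict with a 'root_cause' key (hypothetical).
    """
    counts = Counter(f["root_cause"] for f in findings)
    return counts.most_common(1)[0][0] if counts else None


sample_findings = [
    {"root_cause": "reentrancy"},
    {"root_cause": "access-control"},
    {"root_cause": "reentrancy"},
]
top = dominant_root_cause(sample_findings)
```

With this sample, `top` is `"reentrancy"` because it appears twice and every other class appears once.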

Monitor (live blockchain scanning)

# List tracked addresses
curl -H "X-API-Key: $API_KEY" $BASE/monitor/addresses

# Add an address to scan
curl -X POST $BASE/monitor/watch \
  -H "X-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"address": "0xabc...def", "chain": "ethereum"}'

# List recent alerts
curl -H "X-API-Key: $API_KEY" $BASE/monitor/alerts

Human Feedback Loop

# Submit analyst feedback on a finding
curl -X POST $BASE/feedback \
  -H "X-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "audit_id": "audit-abc123",
    "finding_id": "finding-xyz",
    "label": "confirmed",
    "analyst_notes": "Verified reentrancy via manual trace"
  }'

ML Training

Generate training data

source venv/bin/activate

# Generate EVM bytecode training batches
python scripts/generate_evm_training_data.py
python scripts/generate_evm_batch2.py
python scripts/generate_evm_batch3.py

# Inject into Supabase
python scripts/inject_evm_training.py

# Verify injection
python scripts/verify_evm_training.py

Train models

# Train baseline models (RF + Isolation Forest + LSTM)
python scripts/train_models.py

# Run quality gate (validates model performance thresholds)
python scripts/quality_gate.py

Trained models are saved to ./models/:

  • rf_detector.pkl — Random Forest (labelled data)
  • iso_forest.pkl — Isolation Forest (anomaly detection)
  • lstm_autoencoder.pt — LSTM autoencoder (sequence anomalies)
  • lstm_config.json — LSTM hyperparameters
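Before starting the service, it can help to confirm that all four artifacts are present. A minimal sketch, assuming the file names listed above; `missing_artifacts` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# File names taken from the list above.
EXPECTED_ARTIFACTS = (
    "rf_detector.pkl",
    "iso_forest.pkl",
    "lstm_autoencoder.pt",
    "lstm_config.json",
)


def missing_artifacts(model_dir: str = "./models") -> list:
    """Return the expected model files that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_ARTIFACTS if not (root / name).is_file()]
```

An empty list from `missing_artifacts()` means every expected file exists; it does not verify that the files load correctly.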

Sync models to/from Supabase Storage

python -c "
from app.models.sync import upload_models, download_models
upload_models('./models')    # push to Supabase Storage
download_models('./models')  # pull from Supabase Storage
"

AMD ROCm / MI100X Setup

The engine auto-detects the GPU backend at startup via app/ml/rocm_inference.py.

Verify ROCm installation

rocminfo | grep -E "gfx|Name"
python -c "import torch; print(torch.version.hip, torch.cuda.is_available())"

Expected startup log output

[rocm] AMD Instinct MI100X | backend=ROCM dtype=bfloat16 flash_attn=False vram=32GB hbm=900GB/s

BF16 inference

On ROCm 6.x or later with MI-series GPUs, the engine automatically uses BF16 autocast:

  • Halves memory footprint and bandwidth demand relative to FP32
  • Keeps the FP32 exponent range, so it is less prone to overflow than FP16
  • Controlled by DeviceConfig.dtype_str in app/ml/rocm_inference.py
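The range claim can be checked without a GPU. BF16 is the top 16 bits of an FP32 value, so it keeps all 8 exponent bits and only 7 mantissa bits. The sketch below uses plain `struct` and simple truncation (hardware typically rounds to nearest even instead); `to_bf16` and `fits_fp16` are names made up for this example.

```python
import struct


def to_bf16(x: float) -> float:
    """Truncate a value to bfloat16 precision by keeping the top 16 bits of its FP32 form."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    bits &= 0xFFFF0000  # drop the low 16 mantissa bits
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def fits_fp16(x: float) -> bool:
    """True if x is within IEEE half-precision range (max finite value 65504)."""
    try:
        struct.pack("<e", x)
        return True
    except OverflowError:
        return False


big = 1e38
bf16_big = to_bf16(big)      # still finite, within about 1% of 1e38
fp16_ok = fits_fp16(big)     # False: far beyond the FP16 range
```

The trade-off is precision: `to_bf16(1.01)` gives `1.0078125`, since only 7 mantissa bits survive.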

Environment variables for ROCm

ROCR_VISIBLE_DEVICES=0         # expose GPU 0 to ROCm
HIP_VISIBLE_DEVICES=0          # same, via HIP
HSA_OVERRIDE_GFX_VERSION=10.3.0  # needed for some MI100 variants

Flash Attention

Flash Attention 3.0 is probed at startup. On MI100X, it falls back to PyTorch SDPA via Triton if flash-attn-rocm is not installed — the engine continues to function normally.
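A probe-and-fall-back pattern like this can be written with `importlib`. This is a sketch, not the code in app/ml/rocm_inference.py: `pick_attention_backend` is a hypothetical function, and the import name `flash_attn` is an assumption about what the optional package exposes.

```python
import importlib.util


def pick_attention_backend(module_name: str = "flash_attn") -> str:
    """Return 'flash_attn' if the optional module is importable, else 'sdpa'.

    find_spec returns None for a top-level module that is not installed,
    so the check never raises when the package is missing.
    """
    if importlib.util.find_spec(module_name) is not None:
        return "flash_attn"
    return "sdpa"  # fall back to PyTorch scaled dot-product attention


backend = pick_attention_backend()
```

Probing with `find_spec` avoids importing a heavy package just to test for its presence.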


Testing

source venv/bin/activate

# Run full test suite
pytest tests/ -q

# Run with coverage report
pytest tests/ --cov=app --cov-report=term-missing -q

# Run a single module
pytest tests/test_hybrid_detector.py -v

# Run only fast tests (skip slow integration tests)
pytest tests/ -q -m "not slow"

Current coverage: 67% overall (517 tests passing)

Key coverage highlights:

Module                      Coverage
app/utils/address.py        100%
app/ml/rocm_inference.py    100%
app/ml/tool_to_tensor.py    100%
app/models/schemas.py       100%
app/ml/hybrid_detector.py   91%
app/db/repository.py        96%
app/models/sync.py          85%

Database Migrations

Run against your Supabase project using the SQL editor or psql:

# Core schema (vulnerabilities, alerts, feedback)
psql "$DATABASE_URL" < supabase_migration.sql

# Human feedback loop tables
psql "$DATABASE_URL" < supabase_feedback_migration.sql

Tables created:

  • vulnerabilities — confirmed findings per contract
  • alerts — live exploit detections from chain scanner
  • feedback — analyst labels for RLHF training loop
  • audit_contests — tracked Code4rena / Sherlock contest metadata

Deployment (Railway)

The repo includes a railway.json for one-click Railway deployment:

railway up

Set all required env vars in the Railway dashboard under Variables.
