A modular ML framework for detecting smart contract vulnerabilities and real-time on-chain exploits. Built for the AMD Instinct MI100X (ROCm) compute cluster.
- Overview
- Architecture
- Requirements
- Installation
- Configuration
- Running the Service
- Running 24/7 (systemd)
- API Reference
- ML Training
- AMD ROCm / MI100X Setup
- Testing
- Database Migrations
AproxResearch is a FastAPI backend that orchestrates multiple security analysis tools and machine-learning models to audit Solidity smart contracts and detect live on-chain exploit patterns.
Core capabilities:
- Multi-phase SAST → symbolic execution → fuzzing → ML scoring pipeline
- Hybrid ML detector (Random Forest + Isolation Forest + LSTM autoencoder)
- Tool-to-tensor feature fusion across Slither, Aderyn, Semgrep, Mythril, Echidna, and 4naly3er
- Real-time blockchain scanning via Alchemy WebSocket
- Circuit-breaker alerts via Telegram and generic webhooks
- Supabase (PostgREST + Storage) as the persistence layer
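The README does not say how the three detector outputs are combined into one confidence score. The sketch below shows one plausible late-fusion scheme: each score is squashed into [0, 1] and the results are averaged with weights. Everything in it is a hypothetical illustration, not the project's implementation. This includes `DetectorScores`, `fuse_scores`, the weights, and the choice to normalise the LSTM error against a threshold.

```python
import math
from dataclasses import dataclass


@dataclass
class DetectorScores:
    rf_probability: float    # Random Forest P(vulnerable), already in [0, 1]
    iso_anomaly: float       # Isolation Forest score, higher = more anomalous
                             # (e.g. the negated sklearn decision_function output)
    lstm_recon_error: float  # LSTM autoencoder reconstruction error
    lstm_threshold: float    # hypothetical error level treated as the anomaly boundary


def _sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |x|).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def fuse_scores(s: DetectorScores, weights=(0.5, 0.25, 0.25)) -> float:
    """Hypothetical late fusion: weighted mean of three scores mapped to [0, 1]."""
    w_rf, w_iso, w_lstm = weights
    iso = _sigmoid(s.iso_anomaly)
    # Error above the threshold pushes toward 1, below it toward 0.
    lstm = _sigmoid((s.lstm_recon_error - s.lstm_threshold) / max(s.lstm_threshold, 1e-9))
    return (w_rf * s.rf_probability + w_iso * iso + w_lstm * lstm) / (w_rf + w_iso + w_lstm)
```

A weighted mean keeps the result interpretable as a single 0 to 1 confidence. It also lets the supervised Random Forest dominate while the two unsupervised detectors adjust it.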
POST /orchestrate/audit
│
├── Phase 1 — Rapid SAST (parallel)
│ ├── Slither + Aderyn → CFG, known patterns
│ └── Semgrep → deprecated patterns, logic flaws
│
├── Phase 2 — Deep Logic & Invariants (parallel)
│ ├── Mythril / Halmos → symbolic path exploration
│ └── Echidna / Medusa → property-based fuzzing
│
├── Phase 3 — Complexity Analysis
│ └── 4naly3er → cyclomatic complexity, gas, NC issues
│
└── ML Brain
├── Tool-to-Tensor → 69-dim feature vector
├── Hybrid Detector → RF + Isolation Forest + LSTM fusion
└── Confidence score → confirmed findings with root-cause
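The phase layout above maps naturally onto asyncio. Tools within a phase run concurrently, and the phases themselves run in order. The following is a minimal sketch, assuming each tool is wrapped as an async function that returns a list of findings. `run_phase`, `orchestrate`, and `fake_tool` are hypothetical stand-ins, not the orchestrator's real code. A real wrapper would shell out to slither, semgrep, and the other tools.

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict, List

Finding = Dict[str, str]
ToolFn = Callable[[str], Awaitable[List[Finding]]]


async def run_phase(tools: Dict[str, ToolFn], source: str) -> Dict[str, object]:
    """Run every tool in one phase concurrently and time the phase."""
    start = time.perf_counter()
    names = list(tools)
    # return_exceptions=True keeps one failing tool from cancelling the others.
    results = await asyncio.gather(*(tools[n](source) for n in names), return_exceptions=True)
    outputs = {
        name: ([] if isinstance(result, BaseException) else result)
        for name, result in zip(names, results)
    }
    return {"duration_ms": (time.perf_counter() - start) * 1000, "outputs": outputs}


async def orchestrate(phases: Dict[str, Dict[str, ToolFn]], source: str) -> Dict[str, object]:
    """Run phases sequentially, in dict insertion order."""
    phase_results = {}
    for phase_name, tools in phases.items():
        phase_results[phase_name] = await run_phase(tools, source)
    return phase_results


def fake_tool(label: str) -> ToolFn:
    """Hypothetical tool wrapper that returns one canned finding."""
    async def _run(source: str) -> List[Finding]:
        await asyncio.sleep(0.01)  # stands in for the external tool's runtime
        return [{"tool": label, "issue": "example"}]
    return _run


if __name__ == "__main__":
    phases = {
        "phase1": {"slither": fake_tool("slither"), "aderyn": fake_tool("aderyn"),
                   "semgrep": fake_tool("semgrep")},
        "phase2": {"mythril": fake_tool("mythril"), "echidna": fake_tool("echidna")},
        "phase3": {"4naly3er": fake_tool("4naly3er")},
    }
    results = asyncio.run(orchestrate(phases, "pragma solidity ^0.8.0;"))
    for name, r in results.items():
        print(name, sorted(r["outputs"]))
```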
| Component | Version |
|---|---|
| Python | 3.12+ |
| PyTorch | ROCm 6.x build (torch==2.5.0+rocm6.1) |
| ROCm | 6.1+ |
| Uvicorn | 0.31.0 |
| FastAPI | 0.115.0 |
Optional (significantly improves analysis):
- slither-analyzer: pip install slither-analyzer
- semgrep: pip install semgrep (bundled in venv)
- mythril: pip install mythril
- echidna: install via GitHub releases
- 4naly3er: install via npm (npm i -g @4naly3er/4naly3er)
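Before running an audit, it can help to see which optional tools are actually on PATH. Here is a minimal sketch using only `shutil.which`. The executable names are assumptions. `myth` is Mythril's CLI entry point, while `4naly3er` is a guess, since that tool may instead be invoked through node or yarn. Aderyn is included because Phase 1 uses it.

```python
import shutil
from typing import Dict

# Tool name -> executable looked up on PATH (names partly assumed, see above).
OPTIONAL_TOOLS: Dict[str, str] = {
    "slither": "slither",
    "aderyn": "aderyn",
    "semgrep": "semgrep",
    "mythril": "myth",
    "echidna": "echidna",
    "4naly3er": "4naly3er",
}


def available_tools(tools: Dict[str, str] = OPTIONAL_TOOLS) -> Dict[str, bool]:
    """Map each tool to True if its executable is found on PATH."""
    return {name: shutil.which(exe) is not None for name, exe in tools.items()}


if __name__ == "__main__":
    for name, found in available_tools().items():
        print(f"{name:10s} {'found' if found else 'missing'}")
```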
# Clone the repo
git clone <repo-url>
cd Backend---AproxResearch
# Create virtual environment
python3 -m venv venv
source venv/bin/activate
# Install Python dependencies
pip install -r requirements.txt
# Install PyTorch for ROCm 6.1 (AMD MI100X)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.1
# Install training extras (needed for ML training scripts)
pip install -r requirements-training.txt

Copy the example file and fill in your credentials:
cp .env.example .env

Minimum required variables:
# Security: API_KEY authenticates all API calls via the X-API-Key header
API_KEY=<random-hex-string>
# Blockchain scanning
ALCHEMY_API_KEY=<your-alchemy-key>
ETHERSCAN_API_KEY=<your-etherscan-key>
# Database
SUPABASE_URL=https://<project>.supabase.co
SUPABASE_SERVICE_ROLE_KEY=<service-role-key>

Optional but recommended:
# Telegram exploit alerts
TELEGRAM_BOT_TOKEN=<bot-token>
TELEGRAM_CHAT_ID=<chat-id>
# Weights & Biases training monitoring
WANDB_API_KEY=<wandb-key>
# Alert webhook (Slack, Discord, or custom)
WEBHOOK_URL=<webhook-url>
WEBHOOK_SECRET=<webhook-secret>
# Chain scanners for BSC / Polygon
BSCSCAN_API_KEY=<bscscan-key>
POLYGONSCAN_API_KEY=<polygonscan-key>
# Auto-train baseline models if none found on startup
RETRAIN_ON_STARTUP=false

Development mode (auto-reload):
source venv/bin/activate
DEBUG=true uvicorn main:app --host 0.0.0.0 --port 8000 --reload

Production mode:
source venv/bin/activate
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 1

Use --workers 1: the LSTM autoencoder and chain scanner are stateful singletons. For horizontal scaling, run multiple instances behind a load balancer.
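A missing credential is easier to diagnose before the server starts than from a stack trace. The following is a minimal pre-flight sketch that reads .env and lists required variables that are absent, empty, or still set to a `<placeholder>` from .env.example. `parse_env_file` and `missing_vars` are hypothetical helpers, not part of the repo. The parser is deliberately simple and does not handle quoting, escapes, or inline comments.

```python
from pathlib import Path
from typing import Dict, List

# The "minimum required variables" listed in the Configuration section.
REQUIRED_VARS = [
    "API_KEY",
    "ALCHEMY_API_KEY",
    "ETHERSCAN_API_KEY",
    "SUPABASE_URL",
    "SUPABASE_SERVICE_ROLE_KEY",
]


def parse_env_file(path: Path) -> Dict[str, str]:
    """Minimal KEY=VALUE parser: skips blank lines and full-line # comments."""
    values: Dict[str, str] = {}
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_vars(values: Dict[str, str], required: List[str] = REQUIRED_VARS) -> List[str]:
    """Return required variables that are absent, empty, or still <placeholders>."""
    return [k for k in required if not values.get(k) or values[k].startswith("<")]


if __name__ == "__main__":
    env_path = Path(".env")
    if not env_path.exists():
        print(".env not found; copy .env.example first")
    else:
        missing = missing_vars(parse_env_file(env_path))
        print("Missing: " + ", ".join(missing) if missing else "All required variables are set.")
```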
curl http://localhost:8000/health

Install the service as a systemd unit so it starts on boot and restarts automatically:
sudo tee /etc/systemd/system/aproxresearch.service > /dev/null <<'EOF'
[Unit]
Description=AproxResearch Exploit ML Engine
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
User=root
WorkingDirectory=/root/Backend---AproxResearch
EnvironmentFile=/root/Backend---AproxResearch/.env
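# ROCm / HIP device visibility: expose only GPU 0 (comma-separate indices to expose more)
Environment=ROCR_VISIBLE_DEVICES=0
Environment=HIP_VISIBLE_DEVICES=0
# Override for GPUs ROCm does not officially support; 10.3.0 maps to gfx1030 (RDNA2).
# MI100 (gfx908) is officially supported and should not need it.
Environment=HSA_OVERRIDE_GFX_VERSION=10.3.0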
ExecStart=/root/Backend---AproxResearch/venv/bin/uvicorn main:app \
--host 0.0.0.0 \
--port 8000 \
--workers 1 \
--log-level info
Restart=always
RestartSec=5
StandardOutput=journal
StandardError=journal
SyslogIdentifier=aproxresearch
# Allow more open file descriptors for WebSocket connections
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable aproxresearch
sudo systemctl start aproxresearch

sudo systemctl status aproxresearch
sudo journalctl -u aproxresearch -f # live logs
sudo journalctl -u aproxresearch --since today

sudo systemctl stop aproxresearch
sudo systemctl restart aproxresearch

All endpoints (except /health) require the X-API-Key header.
export API_KEY="your-api-key-here"
export BASE="http://localhost:8000"   # matches --port 8000 in the uvicorn commands and systemd unit

# Basic liveness (no API key required)
curl $BASE/health
# ML model status
curl -H "X-API-Key: $API_KEY" $BASE/health/models
# ML backtest status
curl -H "X-API-Key: $API_KEY" $BASE/health/backtest

# Analyze contract bytecode
curl -X POST $BASE/analyze \
-H "X-API-Key: $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"bytecode": "0x6080604052...",
"contract_address": "0xabc...def",
"chain": "ethereum"
}'

# Audit from source code
curl -X POST $BASE/orchestrate/audit \
-H "X-API-Key: $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"source_code": "pragma solidity ^0.8.0;\ncontract Foo {...}",
"contract_name": "Foo",
"chain": "ethereum",
"phases": ["phase1", "phase2", "phase3"]
}'
# Audit from on-chain address (fetches source from Etherscan)
curl -X POST $BASE/orchestrate/audit \
-H "X-API-Key: $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"contract_address": "0xabc...def",
"chain": "ethereum"
}'

Response fields:
| Field | Description |
|---|---|
| `audit_id` | Unique audit identifier (`audit-<uuid>`) |
| `confirmed_findings` | List of confirmed vulnerabilities with severity, root cause, and confidence |
| `phase_results` | Timing and raw output per phase |
| `dominant_root_cause` | Most common vulnerability class detected |
| `tool_consensus_score` | Agreement level across all tools (0–1) |
| `tool_feature_norm` | L2 norm of the 69-dim tool feature vector |
| `total_duration_ms` | End-to-end latency in milliseconds |
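The same audit request can be issued from Python using only the standard library. This is a minimal sketch, assuming the response body is the JSON object described in the response-fields table. `build_audit_request` and `summarize_audit` are hypothetical helper names. The timeout in the commented-out send is arbitrary and is long only because a full audit runs several phases.

```python
import json
import os
import urllib.request
from typing import Any, Dict


def build_audit_request(base_url: str, api_key: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) a POST /orchestrate/audit request."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/orchestrate/audit",
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def summarize_audit(response: Dict[str, Any]) -> str:
    """One-line summary built from the documented response fields."""
    findings = response.get("confirmed_findings", [])
    return (
        f"{response.get('audit_id')}: {len(findings)} confirmed finding(s), "
        f"dominant root cause={response.get('dominant_root_cause')}, "
        f"consensus={response.get('tool_consensus_score')}, "
        f"{response.get('total_duration_ms')} ms"
    )


if __name__ == "__main__":
    req = build_audit_request(
        os.environ.get("BASE", "http://localhost:8000"),
        os.environ.get("API_KEY", ""),
        {"contract_address": "0xabc...def", "chain": "ethereum"},
    )
    # Sending requires a running server:
    # with urllib.request.urlopen(req, timeout=600) as resp:
    #     print(summarize_audit(json.load(resp)))
    print(req.get_method(), req.full_url)
```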
# List tracked addresses
curl -H "X-API-Key: $API_KEY" $BASE/monitor/addresses
# Add an address to scan
curl -X POST $BASE/monitor/watch \
-H "X-API-Key: $API_KEY" \
-H "Content-Type: application/json" \
-d '{"address": "0xabc...def", "chain": "ethereum"}'
# List recent alerts
curl -H "X-API-Key: $API_KEY" $BASE/monitor/alerts

# Submit analyst feedback on a finding
curl -X POST $BASE/feedback \
-H "X-API-Key: $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"audit_id": "audit-abc123",
"finding_id": "finding-xyz",
"label": "confirmed",
"analyst_notes": "Verified reentrancy via manual trace"
}'

source venv/bin/activate
# Generate EVM bytecode training batches
python scripts/generate_evm_training_data.py
python scripts/generate_evm_batch2.py
python scripts/generate_evm_batch3.py
# Inject into Supabase
python scripts/inject_evm_training.py
# Verify injection
python scripts/verify_evm_training.py

# Train baseline models (RF + Isolation Forest + LSTM)
python scripts/train_models.py
# Run quality gate (validates model performance thresholds)
python scripts/quality_gate.py

Trained models are saved to ./models/:
- rf_detector.pkl: Random Forest (labelled data)
- iso_forest.pkl: Isolation Forest (anomaly detection)
- lstm_autoencoder.pt: LSTM autoencoder (sequence anomalies)
- lstm_config.json: LSTM hyperparameters
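A quick existence check on these four artifacts can save a failed startup, particularly when RETRAIN_ON_STARTUP=false. Here is a minimal sketch. `missing_model_files` is a hypothetical helper, not part of the repo. It only checks that the files are present, not that they load or pass the quality gate.

```python
from pathlib import Path
from typing import List

# Artifact names as listed above.
MODEL_FILES = ["rf_detector.pkl", "iso_forest.pkl", "lstm_autoencoder.pt", "lstm_config.json"]


def missing_model_files(model_dir: str = "./models") -> List[str]:
    """Return the expected model artifacts that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in MODEL_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    print("Missing models: " + ", ".join(missing) if missing else "All model artifacts present.")
```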
python -c "
from app.models.sync import upload_models, download_models
upload_models('./models') # push to Supabase Storage
download_models('./models') # pull from Supabase Storage
"

The engine auto-detects the GPU backend at startup via app/ml/rocm_inference.py. To verify that the GPU is visible:
rocminfo | grep -E "gfx|Name"
python -c "import torch; print(torch.version.hip, torch.cuda.is_available())"

On a correctly configured node, the startup log reports the detected device:
[rocm] AMD Instinct MI100X | backend=ROCM dtype=bfloat16 flash_attn=False vram=32GB hbm=900GB/s
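app/ml/rocm_inference.py is not reproduced here. As a rough approximation of the same idea, the backend can be inferred from the installed PyTorch build. ROCm builds expose AMD GPUs through the `torch.cuda` API and set `torch.version.hip` to a version string, which is `None` on CUDA builds. `detect_backend` is a hypothetical helper, not the engine's function.

```python
def detect_backend() -> str:
    """Return "rocm", "cuda", or "cpu" based on the installed PyTorch build."""
    try:
        import torch
    except ImportError:
        return "cpu"
    if not torch.cuda.is_available():
        return "cpu"
    # On ROCm builds torch.version.hip is a version string; on CUDA builds it is None.
    return "rocm" if getattr(torch.version, "hip", None) else "cuda"


if __name__ == "__main__":
    print(detect_backend())
```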
On ROCm 6.x+ with MI-series GPUs, the engine automatically uses BF16 autocast:
- Halves memory traffic compared with FP32 (2 bytes per value instead of 4)
- Maintains FP32 exponent range (better than FP16 for training stability)
- Controlled by DeviceConfig.dtype_str in app/ml/rocm_inference.py
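The exponent-range point can be checked in pure Python. bfloat16 is the top 16 bits of an IEEE float32 (sign, 8 exponent bits, 7 mantissa bits), so it represents values up to about 3.4e38. FP16 tops out at 65504. The sketch below rounds to bfloat16 with round-half-to-even and uses struct's half-precision "e" format as the FP16 reference. It illustrates the number format only and is not the engine's autocast code. In PyTorch, mixed precision is typically enabled with `torch.autocast(device_type="cuda", dtype=torch.bfloat16)`, which also applies on ROCm builds.

```python
import math
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (round-half-to-even).

    NaN and values that round past float32 max are not handled specially.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)  # ties go to the even 16-bit pattern
    bits = (bits + rounding_bias) & 0xFFFF0000   # keep only the top 16 bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def fits_fp16(x: float) -> bool:
    """True if x can be packed as IEEE half precision (max finite value 65504)."""
    try:
        struct.pack("<e", x)
        return True
    except OverflowError:
        return False


if __name__ == "__main__":
    print("fp16 can hold 1e38:", fits_fp16(1e38))
    print("bf16(1e38) =", to_bf16(1e38))
    print("bf16(1 + 2**-10) =", to_bf16(1.0 + 2 ** -10))  # below bf16 precision at 1.0
```

The trade-off shows up in the last line. bfloat16 keeps the range but only about 3 significant decimal digits, which is why it suits activations and gradients better than precise accumulators.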
GPU environment variables (also set in the systemd unit):
ROCR_VISIBLE_DEVICES=0 # expose GPU 0 to ROCm
HIP_VISIBLE_DEVICES=0 # same, via HIP
HSA_OVERRIDE_GFX_VERSION=10.3.0 # only for GPUs ROCm does not officially support (10.3.0 = gfx1030, RDNA2); MI100 (gfx908) should not need it

Flash Attention 3.0 is probed at startup. On MI100X, it falls back to PyTorch SDPA via Triton if flash-attn-rocm is not installed; the engine continues to function normally.
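A probe like this can be done without importing the package, which avoids paying its import cost or triggering kernel compilation at startup. Here is a minimal sketch using `importlib.util.find_spec`. It assumes the ROCm flash-attention build installs under the module name `flash_attn`. `attention_backend` is a hypothetical helper, and the fallback it names is PyTorch's `torch.nn.functional.scaled_dot_product_attention` (PyTorch 2.0+).

```python
import importlib.util


def attention_backend() -> str:
    """Return "flash_attn" if the flash_attn package is importable, else "sdpa"."""
    # Assumption: flash-attn-rocm installs as the flash_attn module.
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attn"
    # Fallback: torch.nn.functional.scaled_dot_product_attention.
    return "sdpa"


if __name__ == "__main__":
    print(attention_backend())
```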
source venv/bin/activate
# Run full test suite
pytest tests/ -q
# Run with coverage report
pytest tests/ --cov=app --cov-report=term-missing -q
# Run a single module
pytest tests/test_hybrid_detector.py -v
# Run only fast tests (skip slow integration tests)
pytest tests/ -q -m "not slow"

Current coverage: 67% overall (517 tests passing).
Key coverage highlights:
| Module | Coverage |
|---|---|
| `app/utils/address.py` | 100% |
| `app/ml/rocm_inference.py` | 100% |
| `app/ml/tool_to_tensor.py` | 100% |
| `app/models/schemas.py` | 100% |
| `app/ml/hybrid_detector.py` | 91% |
| `app/db/repository.py` | 96% |
| `app/models/sync.py` | 85% |
Run the migrations against your Supabase project using the SQL editor or psql, where DATABASE_URL is the project's Postgres connection string:
# Core schema (vulnerabilities, alerts, feedback)
psql $DATABASE_URL < supabase_migration.sql
# Human feedback loop tables
psql $DATABASE_URL < supabase_feedback_migration.sql

Tables created:
- vulnerabilities: confirmed findings per contract
- alerts: live exploit detections from the chain scanner
- feedback: analyst labels for the RLHF training loop
- audit_contests: tracked Code4rena / Sherlock contest metadata
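Because these tables live in Supabase, they can also be reached directly through its PostgREST endpoint at `<SUPABASE_URL>/rest/v1/<table>`, not only through the API. The following is a minimal sketch that builds an insert into `feedback` without sending it. It assumes the table's columns mirror the /feedback payload (audit_id, finding_id, label, analyst_notes). `build_feedback_insert` is a hypothetical helper. The service role key bypasses row-level security, so a request like this should only ever be built server-side.

```python
import json
import urllib.request
from typing import Any, Dict


def build_feedback_insert(supabase_url: str, service_key: str, row: Dict[str, Any]) -> urllib.request.Request:
    """Build (not send) a PostgREST insert into the feedback table."""
    return urllib.request.Request(
        url=f"{supabase_url.rstrip('/')}/rest/v1/feedback",
        data=json.dumps(row).encode("utf-8"),
        headers={
            "apikey": service_key,
            "Authorization": f"Bearer {service_key}",
            "Content-Type": "application/json",
            # Ask PostgREST to return the inserted row in the response body.
            "Prefer": "return=representation",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_feedback_insert(
        "https://<project>.supabase.co",
        "<service-role-key>",
        {"audit_id": "audit-abc123", "finding_id": "finding-xyz", "label": "confirmed"},
    )
    print(req.get_method(), req.full_url)
```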
The repo includes a railway.json for one-click Railway deployment:
railway up

Set all required environment variables in the Railway dashboard under Variables.