AproxResearch — Exploit ML Engine

A modular ML framework for detecting smart contract vulnerabilities and real-time on-chain exploits. Built for the AMD Instinct MI100X (ROCm) compute cluster.

Overview

AproxResearch is a FastAPI backend that orchestrates multiple security analysis tools and machine-learning models to audit Solidity smart contracts and detect live on-chain exploit patterns.

Core capabilities:

Multi-phase SAST → symbolic execution → fuzzing → ML scoring pipeline
Hybrid ML detector (Random Forest + Isolation Forest + LSTM autoencoder)
Tool-to-tensor feature fusion across Slither, Aderyn, Semgrep, Mythril, Echidna, and 4naly3er
Real-time blockchain scanning via Alchemy WebSocket
Circuit-breaker alerts via Telegram and generic webhooks
Supabase (PostgREST + Storage) as the persistence layer

Architecture

POST /orchestrate/audit
  │
  ├── Phase 1 — Rapid SAST (parallel)
  │     ├── Slither + Aderyn   → CFG, known patterns
  │     └── Semgrep            → deprecated patterns, logic flaws
  │
  ├── Phase 2 — Deep Logic & Invariants (parallel)
  │     ├── Mythril / Halmos   → symbolic path exploration
  │     └── Echidna / Medusa   → property-based fuzzing
  │
  ├── Phase 3 — Complexity Analysis
  │     └── 4naly3er           → cyclomatic complexity, gas, NC issues
  │
  └── ML Brain
        ├── Tool-to-Tensor     → 69-dim feature vector
        ├── Hybrid Detector    → RF + Isolation Forest + LSTM fusion
        └── Confidence score   → confirmed findings with root-cause
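
The phase layout above can be sketched as sequential phases whose tools run concurrently. This is a minimal illustration, not the engine's orchestrator: `run_tool`, `run_phase`, `orchestrate`, and the `PHASES` mapping are hypothetical names, and the tool call is a stand-in that returns an empty result.

```python
import asyncio

# Hypothetical phase-to-tool mapping, taken from the diagram above.
PHASES = {
    "phase1": ["slither", "aderyn", "semgrep"],
    "phase2": ["mythril", "echidna"],
    "phase3": ["4naly3er"],
}


async def run_tool(name: str) -> dict:
    """Stand-in for one tool invocation; a real runner would spawn the tool."""
    await asyncio.sleep(0)  # yield control, as a subprocess wait would
    return {"tool": name, "findings": []}


async def run_phase(tools: list[str]) -> list[dict]:
    """Run all tools of one phase concurrently; results keep the input order."""
    return list(await asyncio.gather(*(run_tool(t) for t in tools)))


async def orchestrate(phases: list[str]) -> dict:
    """Run the requested phases one after another."""
    results = {}
    for phase in phases:
        results[phase] = await run_phase(PHASES[phase])
    return results


if __name__ == "__main__":
    out = asyncio.run(orchestrate(["phase1", "phase2", "phase3"]))
    for phase, res in out.items():
        print(phase, [r["tool"] for r in res])
```

Running the tools of a phase with `asyncio.gather` keeps a slow tool from blocking the others, while the outer loop preserves the phase order shown in the diagram.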

Requirements

Component	Version
Python	3.12+
PyTorch	ROCm 6.x build (`torch==2.5.0+rocm6.1`)
ROCm	6.1+
Uvicorn	0.31.0
FastAPI	0.115.0

Optional (significantly improves analysis):

slither-analyzer — pip install slither-analyzer
semgrep — pip install semgrep (bundled in venv)
mythril — pip install mythril
echidna — install via GitHub releases
4naly3er — install via npm: npm i -g @4naly3er/4naly3er

Installation

# Clone the repo
git clone <repo-url>
cd Backend---AproxResearch

# Create virtual environment
python3 -m venv venv
source venv/bin/activate

# Install Python dependencies
pip install -r requirements.txt

# Install PyTorch for ROCm 6.1 (AMD MI100X)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.1

# Install training extras (needed for ML training scripts)
pip install -r requirements-training.txt

Configuration

Copy the example file and fill in your credentials:

cp .env.example .env

Minimum required variables:

# Security
API_KEY=<random-hex-string>          # authenticate all API calls via X-API-Key header

# Blockchain scanning
ALCHEMY_API_KEY=<your-alchemy-key>
ETHERSCAN_API_KEY=<your-etherscan-key>

# Database
SUPABASE_URL=https://<project>.supabase.co
SUPABASE_SERVICE_ROLE_KEY=<service-role-key>
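
A quick way to catch a missing credential before starting the service is to check the required variables up front. The sketch below is an assumption about how such a check could look; `missing_vars` is a hypothetical helper and is not part of the codebase.

```python
import os

# The minimum required variables listed above.
REQUIRED_VARS = (
    "API_KEY",
    "ALCHEMY_API_KEY",
    "ETHERSCAN_API_KEY",
    "SUPABASE_URL",
    "SUPABASE_SERVICE_ROLE_KEY",
)


def missing_vars(env=None) -> list[str]:
    """Return the required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        raise SystemExit("Missing required variables: " + ", ".join(missing))
    print("All required variables are set.")
```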

Optional but recommended:

# Telegram exploit alerts
TELEGRAM_BOT_TOKEN=<bot-token>
TELEGRAM_CHAT_ID=<chat-id>

# Weights & Biases training monitoring
WANDB_API_KEY=<wandb-key>

# Alert webhook (Slack, Discord, or custom)
WEBHOOK_URL=<webhook-url>
WEBHOOK_SECRET=<webhook-secret>

# Chain scanners for BSC / Polygon
BSCSCAN_API_KEY=<bscscan-key>
POLYGONSCAN_API_KEY=<polygonscan-key>

# Auto-train baseline models if none found on startup
RETRAIN_ON_STARTUP=false
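
To confirm that TELEGRAM_BOT_TOKEN and TELEGRAM_CHAT_ID are valid before relying on alerts, a test message can be sent through the Telegram Bot API `sendMessage` method. The sketch below only builds the request; `build_telegram_request` is a hypothetical helper, and the message format the engine itself sends is not documented here.

```python
import json
import urllib.request


def build_telegram_request(bot_token: str, chat_id: str, text: str) -> urllib.request.Request:
    """Build a POST request for the Telegram Bot API sendMessage method."""
    url = f"https://api.telegram.org/bot{bot_token}/sendMessage"
    body = json.dumps({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    import os

    req = build_telegram_request(
        os.environ["TELEGRAM_BOT_TOKEN"],
        os.environ["TELEGRAM_CHAT_ID"],
        "AproxResearch alert test",
    )
    # Sending requires network access and valid credentials.
    with urllib.request.urlopen(req, timeout=10) as resp:
        print(resp.status)
```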

Running the Service

Development (with hot reload)

source venv/bin/activate
DEBUG=true uvicorn main:app --host 0.0.0.0 --port 8000 --reload

Production (single process)

source venv/bin/activate
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 1

Keep --workers 1: the LSTM autoencoder and chain scanner are stateful singletons, and each additional Uvicorn worker process would load its own copy. To scale horizontally, run separate instances behind a load balancer.

Health check

curl http://localhost:8000/health

Running 24/7 (systemd)

1. Create the service file

sudo tee /etc/systemd/system/aproxresearch.service > /dev/null <<'EOF'
[Unit]
Description=AproxResearch Exploit ML Engine
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=root
WorkingDirectory=/root/Backend---AproxResearch
EnvironmentFile=/root/Backend---AproxResearch/.env

# ROCm / HIP device visibility — expose GPU 0 only
Environment=ROCR_VISIBLE_DEVICES=0
Environment=HIP_VISIBLE_DEVICES=0
Environment=HSA_OVERRIDE_GFX_VERSION=10.3.0

ExecStart=/root/Backend---AproxResearch/venv/bin/uvicorn main:app \
    --host 0.0.0.0 \
    --port 8000 \
    --workers 1 \
    --log-level info

Restart=always
RestartSec=5
StandardOutput=journal
StandardError=journal
SyslogIdentifier=aproxresearch

# Allow more open file descriptors for WebSocket connections
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
EOF

2. Enable and start

sudo systemctl daemon-reload
sudo systemctl enable aproxresearch
sudo systemctl start aproxresearch

3. Check status

sudo systemctl status aproxresearch
sudo journalctl -u aproxresearch -f          # live logs
sudo journalctl -u aproxresearch --since today

4. Stop / restart

sudo systemctl stop aproxresearch
sudo systemctl restart aproxresearch

API Reference

All endpoints (except /health) require the X-API-Key header.

export API_KEY="your-api-key-here"
export BASE="http://localhost:8000"   # port used by the uvicorn commands and systemd unit above; adjust if your deployment differs

Health

# Basic liveness (no API key required)
curl $BASE/health

# ML model status
curl -H "X-API-Key: $API_KEY" $BASE/health/models

# ML backtest status
curl -H "X-API-Key: $API_KEY" $BASE/health/backtest

Analyze a contract

curl -X POST $BASE/analyze \
  -H "X-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "bytecode": "0x6080604052...",
    "contract_address": "0xabc...def",
    "chain": "ethereum"
  }'

Full security audit (orchestrated)

# Audit from source code
curl -X POST $BASE/orchestrate/audit \
  -H "X-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "source_code": "pragma solidity ^0.8.0;\ncontract Foo {...}",
    "contract_name": "Foo",
    "chain": "ethereum",
    "phases": ["phase1", "phase2", "phase3"]
  }'

# Audit from on-chain address (fetches source from Etherscan)
curl -X POST $BASE/orchestrate/audit \
  -H "X-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contract_address": "0xabc...def",
    "chain": "ethereum"
  }'
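
The same audit request can be issued from Python. This is a minimal sketch using only the standard library; `build_audit_request` and `submit_audit` are hypothetical helper names, and the long default timeout is an assumption, since a full audit may take a while.

```python
import json
import urllib.request


def build_audit_request(base_url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build the POST /orchestrate/audit request with the X-API-Key header."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/orchestrate/audit",
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def submit_audit(req: urllib.request.Request, timeout: float = 600.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    import os

    req = build_audit_request(
        os.environ.get("BASE", "http://localhost:8000"),
        os.environ["API_KEY"],
        {"contract_address": "0xabc...def", "chain": "ethereum"},
    )
    print(submit_audit(req).get("audit_id"))
```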

Response fields:

Field	Description
`audit_id`	Unique audit identifier (`audit-<uuid>`)
`confirmed_findings`	List of confirmed vulnerabilities with severity, root cause, confidence
`phase_results`	Timing and raw output per phase
`dominant_root_cause`	Most common vulnerability class detected
`tool_consensus_score`	Agreement level across all tools (0–1)
`tool_feature_norm`	L2 norm of the 69-dim tool feature vector
`total_duration_ms`	End-to-end latency in milliseconds
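
Two of these fields lend themselves to a short illustration. The L2 norm reported in tool_feature_norm is the square root of the sum of squared vector components, and confirmed_findings can be filtered client-side. In the sketch below, the per-finding key `confidence` is an assumption about the response shape, and both function names are hypothetical.

```python
import math


def l2_norm(vector) -> float:
    """Euclidean (L2) norm: sqrt(x_1^2 + ... + x_n^2)."""
    return math.sqrt(sum(x * x for x in vector))


def high_confidence(response: dict, threshold: float = 0.8) -> list[dict]:
    """Keep findings whose (assumed) 'confidence' key meets the threshold."""
    return [
        finding
        for finding in response.get("confirmed_findings", [])
        if finding.get("confidence", 0.0) >= threshold
    ]


if __name__ == "__main__":
    # Illustrative response with made-up values, not real engine output.
    sample = {
        "confirmed_findings": [
            {"severity": "high", "confidence": 0.93},
            {"severity": "low", "confidence": 0.41},
        ]
    }
    print(high_confidence(sample))
    print(l2_norm([3.0, 4.0]))
```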

Monitor (live blockchain scanning)

# List tracked addresses
curl -H "X-API-Key: $API_KEY" $BASE/monitor/addresses

# Add an address to scan
curl -X POST $BASE/monitor/watch \
  -H "X-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"address": "0xabc...def", "chain": "ethereum"}'

# List recent alerts
curl -H "X-API-Key: $API_KEY" $BASE/monitor/alerts
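
For orientation, watching an address over a WebSocket connection typically means sending a standard `eth_subscribe` JSON-RPC request with a `logs` filter. The sketch below builds such a message and the Alchemy WebSocket URL; it illustrates the general protocol and is not a description of the engine's scanner, whose subscription types are not documented here. Both function names are hypothetical.

```python
import json


def alchemy_ws_url(api_key: str, network: str = "eth-mainnet") -> str:
    """WebSocket endpoint for an Alchemy app on the given network."""
    return f"wss://{network}.g.alchemy.com/v2/{api_key}"


def logs_subscription(addresses, request_id: int = 1) -> str:
    """JSON-RPC eth_subscribe request for logs emitted by the given addresses."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "eth_subscribe",
            "params": ["logs", {"address": list(addresses)}],
        }
    )


if __name__ == "__main__":
    print(alchemy_ws_url("<your-alchemy-key>"))
    print(logs_subscription(["0xabc...def"]))
```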

Human Feedback Loop

# Submit analyst feedback on a finding
curl -X POST $BASE/feedback \
  -H "X-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "audit_id": "audit-abc123",
    "finding_id": "finding-xyz",
    "label": "confirmed",
    "analyst_notes": "Verified reentrancy via manual trace"
  }'

ML Training

Generate training data

source venv/bin/activate

# Generate EVM bytecode training batches
python scripts/generate_evm_training_data.py
python scripts/generate_evm_batch2.py
python scripts/generate_evm_batch3.py

# Inject into Supabase
python scripts/inject_evm_training.py

# Verify injection
python scripts/verify_evm_training.py

Train models

# Train baseline models (RF + Isolation Forest + LSTM)
python scripts/train_models.py

# Run quality gate (validates model performance thresholds)
python scripts/quality_gate.py

Trained models are saved to ./models/:

rf_detector.pkl — Random Forest (labelled data)
iso_forest.pkl — Isolation Forest (anomaly detection)
lstm_autoencoder.pt — LSTM autoencoder (sequence anomalies)
lstm_config.json — LSTM hyperparameters
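
After training, it can be useful to confirm that all four artifacts are present before starting the service. A minimal sketch, assuming the file names listed above; `missing_artifacts` is a hypothetical helper.

```python
from pathlib import Path

# Artifact names as listed above.
EXPECTED_ARTIFACTS = (
    "rf_detector.pkl",
    "iso_forest.pkl",
    "lstm_autoencoder.pt",
    "lstm_config.json",
)


def missing_artifacts(model_dir: str = "./models") -> list[str]:
    """Return the expected artifact names that are not files in model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_ARTIFACTS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_artifacts()
    print("missing:", missing if missing else "none")
```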

Sync models to/from Supabase Storage

python -c "
from app.models.sync import upload_models, download_models
upload_models('./models')    # push to Supabase Storage
download_models('./models')  # pull from Supabase Storage
"

AMD ROCm / MI100X Setup

The engine auto-detects the GPU backend at startup via app/ml/rocm_inference.py.

Verify ROCm installation

rocminfo | grep -E "gfx|Name"
python -c "import torch; print(torch.version.hip, torch.cuda.is_available())"
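
A slightly more verbose version of the one-liner above reports the same information and also handles the case where PyTorch is not installed. `describe_backend` is a hypothetical helper, separate from app/ml/rocm_inference.py. On ROCm builds of PyTorch, `torch.version.hip` holds a version string and the GPU is addressed through the `torch.cuda` API.

```python
def describe_backend() -> dict:
    """Summarize the PyTorch GPU backend without failing if torch is absent."""
    try:
        import torch
    except ImportError:
        return {
            "torch_installed": False,
            "hip_version": None,
            "gpu_available": False,
            "device_name": None,
        }
    gpu = bool(torch.cuda.is_available())
    return {
        "torch_installed": True,
        # None on non-ROCm builds of PyTorch.
        "hip_version": getattr(torch.version, "hip", None),
        "gpu_available": gpu,
        "device_name": torch.cuda.get_device_name(0) if gpu else None,
    }


if __name__ == "__main__":
    for key, value in describe_backend().items():
        print(f"{key}: {value}")
```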

Expected startup log output

[rocm] AMD Instinct MI100X | backend=ROCM dtype=bfloat16 flash_attn=False vram=32GB hbm=900GB/s

BF16 inference

On ROCm 6.x+ with MI-series GPUs, the engine automatically uses BF16 autocast:

Halves memory traffic relative to FP32
Maintains FP32 exponent range (better than FP16 for training stability)
Controlled by DeviceConfig.dtype_str in app/ml/rocm_inference.py
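
The exponent-range point can be demonstrated without a GPU. BF16 keeps the sign, the 8 exponent bits, and the top 7 mantissa bits of an FP32 value, so large magnitudes survive with reduced precision, whereas FP16 overflows above 65504. The sketch below emulates BF16 by truncating the FP32 bit pattern, a simplification, since hardware normally rounds to nearest. The function names are hypothetical.

```python
import struct


def to_bf16(x: float) -> float:
    """Emulate BF16 by keeping only the top 16 bits of the FP32 encoding."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


def fits_fp16(x: float) -> bool:
    """True if x can be packed as an IEEE 754 half-precision float."""
    try:
        struct.pack("<e", x)
        return True
    except (OverflowError, struct.error):
        return False


if __name__ == "__main__":
    big = 1e38
    print("bf16 value:", to_bf16(big))     # still finite, slightly below 1e38
    print("fits fp16:", fits_fp16(big))    # out of FP16 range
    print("bytes per value: fp32=4, bf16=2")
```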

Environment variables for ROCm

ROCR_VISIBLE_DEVICES=0         # expose GPU 0 to ROCm
HIP_VISIBLE_DEVICES=0          # same, via HIP
HSA_OVERRIDE_GFX_VERSION=10.3.0  # needed for some MI100 variants

Flash Attention

The engine probes for Flash Attention 3.0 at startup. On MI100X, if flash-attn-rocm is not installed, it falls back to PyTorch SDPA via Triton and continues to function normally.
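
A probe of this kind can be as simple as checking whether the module is importable. The sketch below is an assumption about the general approach, not the engine's actual probe; `has_module` and `pick_attention_backend` are hypothetical names, and `flash_attn` is assumed to be the import name of the installed package.

```python
import importlib.util


def has_module(name: str) -> bool:
    """True if a top-level module with this name can be located."""
    return importlib.util.find_spec(name) is not None


def pick_attention_backend(module_name: str = "flash_attn") -> str:
    """Prefer the Flash Attention package when present, else fall back to SDPA."""
    return "flash_attn" if has_module(module_name) else "sdpa"


if __name__ == "__main__":
    print("attention backend:", pick_attention_backend())
```

With the fallback, the calling code would use `torch.nn.functional.scaled_dot_product_attention`, which is available in PyTorch 2.x.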

Testing

source venv/bin/activate

# Run full test suite
pytest tests/ -q

# Run with coverage report
pytest tests/ --cov=app --cov-report=term-missing -q

# Run a single module
pytest tests/test_hybrid_detector.py -v

# Run only fast tests (skip slow integration tests)
pytest tests/ -q -m "not slow"

Current coverage: 67% overall (517 tests passing)

Key coverage highlights:

Module	Coverage
`app/utils/address.py`	100%
`app/ml/rocm_inference.py`	100%
`app/ml/tool_to_tensor.py`	100%
`app/models/schemas.py`	100%
`app/ml/hybrid_detector.py`	91%
`app/db/repository.py`	96%
`app/models/sync.py`	85%

Database Migrations

Run the migration files against your Supabase project using the SQL editor or psql:

# Core schema (vulnerabilities, alerts, feedback)
psql $DATABASE_URL < supabase_migration.sql

# Human feedback loop tables
psql $DATABASE_URL < supabase_feedback_migration.sql

Tables created:

vulnerabilities — confirmed findings per contract
alerts — live exploit detections from chain scanner
feedback — analyst labels for RLHF training loop
audit_contests — tracked Code4rena / Sherlock contest metadata

Deployment (Railway)

The repo includes a railway.json for one-click Railway deployment:

railway up

Set all required env vars in the Railway dashboard under Variables.
