One command to detect your hardware, recommend the optimal setup, and auto-tune any ONNX model -- on any GPU from any vendor.
ISAT (Inference Stack Auto-Tuner) is a production-grade CLI toolkit for ONNX inference optimization. It auto-detects your hardware (AMD, NVIDIA, Intel, Apple, Qualcomm), classifies it (iGPU/dGPU/APU/SoC), and generates copy-paste-ready inference configurations. It then jointly searches across memory strategy, kernel backend, precision, graph transforms, batch size, and thread tuning -- benchmarking each combination with thermal-aware cooldowns, statistical rigor, Pareto analysis, and Bayesian optimization.
```bash
pip install isat-tuner

# Detect your hardware and get instant recommendations:
isat tune

# Detect + recommend + auto-tune a specific model:
isat tune model.onnx

# Full production tuning with cloud profile:
isat tune model.onnx --profile cloud
```
**Install note:** On modern Linux (Ubuntu 23.04+, Debian 12+), a bare `pip install` is blocked by PEP 668. Use `pipx install isat-tuner` instead -- it creates an isolated environment and puts `isat` on your PATH automatically. If you don't have pipx: `sudo apt install pipx && pipx ensurepath`.
## Why ISAT?
Deploying an ONNX model today means manually tweaking dozens of settings:
| Setting | Choices | Impact |
|---|---|---|
| `HSA_XNACK` | 0 or 1 | Up to 30% on APUs |
| `MIGRAPHX_DISABLE_MLIR` | 0 or 1 | 5-15% GEMM performance |
| `MIGRAPHX_SET_GEMM_PROVIDER` | default, rocblas, hipblaslt | 10-25% on GEMM-heavy models |
| Precision | FP32, FP16, INT8 | 2-4x throughput |
| Batch size | 1 to 256 | Linear throughput scaling |
| Graph optimization level | 0-99 | 5-20% latency reduction |
| Inter/intra-op threads | 1 to N | CPU-side parallelism |
A single wrong choice can leave 40%+ of performance on the table, and with 6 dimensions at 4+ choices each, the grid quickly runs into the thousands of combinations (4^6 alone is 4,096). Nobody has time to test them all manually.
ISAT does it automatically.
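To make the scale concrete, here is a minimal sketch that enumerates one plausible joint grid. The per-dimension values are sampled from the tables later in this README, not ISAT's actual internal search space:

```python
import itertools

# Candidate values per dimension, sampled from the tables in this README;
# ISAT's real search space is larger and hardware-dependent.
dimensions = {
    "memory_strategy": ["xnack0_default", "xnack1_default",
                        "xnack1_coarse_grain", "xnack1_oversubscribe"],
    "kernel_backend": ["mlir_default", "rocblas_explicit", "hipblaslt_explicit"],
    "precision": ["fp32_native", "fp16_migraphx", "int8_qdq"],
    "graph_transform": ["raw_opt99", "sim_opt99", "pinned_opt99"],
    "batch_size": [1, 2, 4, 8, 16, 32, 64, 128, 256],
    "threads": [1, 2, 4, 8],
}

grid = list(itertools.product(*dimensions.values()))
print(f"{len(grid)} combinations to benchmark")  # 3888 for this sample alone
```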
## All 55 Commands

### Auto-Tuning & Search

| Command | What it does |
|---|---|
| `isat tune` | Auto-detect hardware + recommend + tune (works with or without a model) |
| `isat profiles` | List available tuning profiles (edge, cloud, latency, etc.) |
| `isat init` | Generate a default `isat.yaml` config file |
| `isat batch` | Find optimal batch size (latency vs. throughput tradeoff) |
| `isat shapes` | Benchmark model across dynamic input shapes |
### Model Analysis & Inspection

| Command | What it does |
|---|---|
| `isat inspect` | Deep-fingerprint a model without benchmarking |
| `isat diff` | Structural diff between two ONNX models |
| `isat fusion` | Analyze operator fusion (fused vs. unfused ops) |
| `isat attention` | Profile attention heads in transformer models |
| `isat weight-sharing` | Detect shared/tied weights across layers |
| `isat visualize` | Visualize the ONNX graph (DOT, ASCII, histogram) |
| `isat scan` | Security and compliance scan of an ONNX model |
| `isat compat-matrix` | Operator compatibility across providers |
### Benchmarking & Profiling

| Command | What it does |
|---|---|
| `isat profile` | Decompose latency into load/compile/inference phases |
| `isat llm-bench` | LLM token throughput (TPS, TTFT, ITL with P95) |
| `isat compiler-compare` | Benchmark the same model across ALL execution providers |
| `isat stress` | Sustained/burst/ramp stress testing |
| `isat leak-check` | Detect memory leaks during inference |
| `isat power` | Profile power efficiency (perf/watt, energy/inference) |
| `isat thermal` | Thermal throttle detection during inference |
| `isat gpu-frag` | GPU memory fragmentation analysis |
| `isat warmup` | Analyze warmup behavior, find optimal iterations |
### Model Optimization

| Command | What it does |
|---|---|
| `isat optimize` | Optimize an ONNX model (simplify, quantize, export) |
| `isat prune` | Prune model weights (magnitude/percentage/global) |
| `isat surgery` | ONNX graph surgery (remove/rename/extract nodes) |
| `isat quant-sensitivity` | Per-layer quantization sensitivity analysis |
| `isat distill` | Knowledge distillation planning for teacher models |
| `isat drift` | Monitor output quality and detect confidence drift |
| `isat regression` | Performance regression detection across versions |
| `isat replay` | Record or replay inference requests |
### Planning & Cost

| Command | What it does |
|---|---|
| `isat cost` | Estimate cloud inference cost |
| `isat sla` | Validate inference against SLA requirements |
| `isat recommend` | Hardware recommendation for a model |
| `isat migrate` | Generate a migration plan between providers |
| `isat memory` | Estimate memory usage and predict OOM risk |
### Infrastructure & Utilities

| Command | What it does |
|---|---|
| `isat hwinfo` | Print the hardware fingerprint |
| `isat doctor` | Pre-flight system health and compatibility check |
| `isat history` | Show past tuning results from the database |
| `isat export` | Re-generate reports from the database |
| `isat compare` | Compare two configs with significance testing |
| `isat abtest` | A/B test two models with statistical rigor |
| `isat snapshot` | Capture environment state for reproducibility |
| `isat cache` | Manage the compilation cache (MIGraphX/ORT) |
| `isat zoo` | List pre-tuned model configurations |
| `isat download` | Download an ONNX model by name or URL |
| `isat registry` | Model version registry (register, promote, diff) |
| `isat pipeline` | Profile a multi-model inference pipeline |
## Installation

```bash
# From PyPI
pip install isat-tuner

# From GitHub (latest)
pip install git+https://github.com/SID-Devu/isat-tuner.git

# With all optional features
pip install "isat-tuner[all]"

# Platform-specific
pip install "isat-tuner[rocm]"      # ROCm GPU support
pip install "isat-tuner[cuda]"      # NVIDIA CUDA support
pip install "isat-tuner[server]"    # REST API server
pip install "isat-tuner[bayesian]"  # Bayesian optimization (scipy)

# Development
git clone https://github.com/SID-Devu/isat-tuner.git
cd isat-tuner && pip install -e ".[dev,all]"
```
## Quick Start

```bash
# One-command auto-tune
isat tune model.onnx --warmup 3 --runs 5 --cooldown 60

# Use a deployment profile
isat tune model.onnx --profile edge
isat tune model.onnx --profile cloud

# Bayesian optimization (smarter than grid search)
isat tune model.onnx --bayesian --max-configs 20

# Inspect a model
isat inspect model.onnx

# Check your hardware
isat hwinfo

# System health check
isat doctor

# LLM token benchmarking
isat llm-bench model.onnx --seq-lengths 32,64,128,256

# Compare across all available providers
isat compiler-compare model.onnx

# Prune a model
isat prune model.onnx --strategy magnitude --sparsity 0.5

# Analyze operator fusion
isat fusion model.onnx

# Generate C++ inference code
isat codegen model.onnx --output-dir cpp_build/

# Canary deployment (safe model rollout)
isat canary baseline.onnx candidate.onnx

# Monitor output drift
isat drift model.onnx

# Graph surgery (remove Identity/Dropout nodes)
isat surgery model.onnx --remove-op Identity --remove-op Dropout

# Launch the REST API
isat serve --port 8000
```
## Search Dimensions

### 1. Memory Strategy

| Config | Environment | When to use |
|---|---|---|
| `xnack0_default` | `HSA_XNACK=0` | Discrete GPUs, no demand paging |
| `xnack1_default` | `HSA_XNACK=1` | APUs, unified memory |
| `xnack1_coarse_grain` | `XNACK=1` + coarse-grain | Large models on an APU |
| `xnack1_oversubscribe` | `XNACK=1` + queue limit | Models exceeding VRAM |
### 2. Kernel Backend

| Config | Environment | When to use |
|---|---|---|
| `mlir_default` | (default) | General-purpose, fused kernels |
| `rocblas_explicit` | `MIGRAPHX_DISABLE_MLIR=1` | GEMM-heavy models |
| `hipblaslt_explicit` | `MIGRAPHX_SET_GEMM_PROVIDER=hipblaslt` | Latest GEMM tuning |
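Dimensions 1 and 2 are plain environment variables, so a hand-rolled equivalent of one tuned configuration looks like the sketch below. This is a minimal illustration, not ISAT's implementation: it assumes an onnxruntime build that ships the MIGraphX execution provider, and the variables must be set before the GPU stack initializes -- which is what the generated `best_config.sh` automates.

```python
import os

# Memory strategy and kernel backend are environment variables; set them
# before onnxruntime initializes the GPU stack.
os.environ["HSA_XNACK"] = "1"                           # xnack1_default: APU / unified memory
os.environ["MIGRAPHX_SET_GEMM_PROVIDER"] = "hipblaslt"  # hipblaslt_explicit backend

import onnxruntime as ort  # import after the env vars are in place

session = ort.InferenceSession(
    "model.onnx",
    providers=["MIGraphXExecutionProvider", "CPUExecutionProvider"],
)
```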
### 3. Precision

| Config | Method | Typical speedup |
|---|---|---|
| `fp32_native` | Original | Baseline |
| `fp16_migraphx` | MIGraphX built-in | 1.5-2x |
| `int8_qdq` | ORT static quantization | 2-4x |
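For reference, the `int8_qdq` row corresponds to ONNX Runtime's static quantization API. Below is a minimal sketch of that path; the input name `input`, the shape, and the random calibration reader are placeholders for illustration -- real calibration needs representative samples:

```python
import numpy as np
from onnxruntime.quantization import (
    CalibrationDataReader, QuantFormat, QuantType, quantize_static,
)

class RandomCalibrationReader(CalibrationDataReader):
    """Feeds a few random batches; replace with representative data."""
    def __init__(self, input_name="input", shape=(1, 3, 224, 224), n=8):
        self._batches = iter(
            [{input_name: np.random.rand(*shape).astype(np.float32)} for _ in range(n)]
        )

    def get_next(self):
        # Returning None signals that calibration data is exhausted.
        return next(self._batches, None)

quantize_static(
    "model.onnx",
    "model.int8.onnx",
    RandomCalibrationReader(),
    quant_format=QuantFormat.QDQ,     # insert QuantizeLinear/DequantizeLinear pairs
    activation_type=QuantType.QInt8,
    weight_type=QuantType.QInt8,
)
```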
### 4. Graph Transforms

| Config | Transform | Effect |
|---|---|---|
| `raw_opt99` | None + full ORT opt | Default |
| `sim_opt99` | onnxsim + full ORT opt | Remove dead ops |
| `pinned_opt99` | Freeze dynamic dims | Better kernel selection |
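As a sketch of what `sim_opt99` does under the hood -- assuming, as the table suggests, that it combines onnx-simplifier with ORT's highest graph optimization level:

```python
import onnx
import onnxruntime as ort
from onnxsim import simplify

# "sim": fold constants and strip dead ops with onnx-simplifier
model = onnx.load("model.onnx")
model_simp, ok = simplify(model)
assert ok, "simplified model failed validation"
onnx.save(model_simp, "model.sim.onnx")

# "opt99": let ORT apply its full graph optimization pipeline
so = ort.SessionOptions()
so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
session = ort.InferenceSession("model.sim.onnx", sess_options=so,
                               providers=["CPUExecutionProvider"])
```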
### 5. Batch Size
Auto-explores powers of 2 up to GPU memory limit.
### 6. Thread Tuning
Explores inter/intra thread counts and sequential vs parallel execution modes.
### 7. Execution Provider (Multi-Platform)
Auto-detects available providers: MIGraphX, CUDA, TensorRT, OpenVINO, ROCm, DirectML, CPU.
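At the ONNX Runtime API level, dimensions 6 and 7 map onto `SessionOptions` and the provider list. A minimal sketch follows; the thread counts are arbitrary examples of points ISAT would search over:

```python
import onnxruntime as ort

# Dimension 7: which execution providers this build/hardware actually offers
print(ort.get_available_providers())
# e.g. ['MIGraphXExecutionProvider', 'ROCMExecutionProvider', 'CPUExecutionProvider']

# Dimension 6: CPU-side thread and execution-mode knobs
so = ort.SessionOptions()
so.intra_op_num_threads = 4                          # threads within one operator
so.inter_op_num_threads = 2                          # threads across independent operators
so.execution_mode = ort.ExecutionMode.ORT_PARALLEL   # vs. ort.ExecutionMode.ORT_SEQUENTIAL

session = ort.InferenceSession("model.onnx", sess_options=so,
                               providers=ort.get_available_providers())
```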
## Deployment Profiles

| Profile | Warmup | Runs | Cooldown | Priority | Use case |
|---|---|---|---|---|---|
| `edge` | 3 | 10 | 30s | Latency | IoT, mobile, embedded |
| `cloud` | 5 | 20 | 120s | Throughput | Serving, batch processing |
| `latency` | 5 | 30 | 60s | P99 | Real-time inference |
| `throughput` | 3 | 15 | 120s | FPS | Max batch throughput |
| `power` | 3 | 10 | 60s | Perf/watt | Battery, thermal-constrained |
| `quick` | 1 | 3 | 15s | Latency | Fast exploration |
| `exhaustive` | 5 | 50 | 180s | Latency | Leave no stone unturned |
| `apu` | 3 | 10 | 60s | Latency | APU-specific optimization |
## Output & Reports

| File | Description |
|---|---|
| `isat_report.html` | Interactive HTML dashboard |
| `isat_report.json` | Machine-readable results for automation |
| `best_config.sh` | Shell script -- source it to apply the best env vars |
| `isat_results.db` | SQLite database of all historical results |
| `config.pbtxt` | Triton Inference Server config |
| `isat.prom` | Prometheus metrics |
| `traces_*.json` | OpenTelemetry-compatible trace export |
| `isat_inference.cpp` | Generated C++ inference code |
## REST API Server

```bash
isat serve --port 8000
```

| Endpoint | Method | Description |
|---|---|---|
| `/api/v1/tune` | POST | Submit a tuning job |
| `/api/v1/jobs` | GET | List all jobs |
| `/api/v1/jobs/{id}` | GET | Get job status + results |
| `/api/v1/jobs/{id}/report` | GET | Get JSON report |
| `/api/v1/jobs/{id}/report/html` | GET | Get HTML dashboard |
| `/api/v1/inspect` | POST | Fingerprint a model |
| `/api/v1/hardware` | GET | Get hardware fingerprint |
| `/api/v1/history` | GET | Query historical results |
| `/health` | GET | Health check |
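As a hedged sketch of driving the server from Python: only the endpoints and methods come from the table above; the request payload, response fields, and status value below are assumptions for illustration, so check the actual API schema before relying on them.

```python
import time
import requests

BASE = "http://localhost:8000"

# Submit a tuning job. The JSON payload shape is an assumption, not the
# documented schema -- only the endpoint and method are from the table.
resp = requests.post(f"{BASE}/api/v1/tune", json={"model_path": "/models/model.onnx"})
resp.raise_for_status()
job_id = resp.json()["id"]  # hypothetical response field

# Poll job status, then fetch the JSON report once the job finishes.
while requests.get(f"{BASE}/api/v1/jobs/{job_id}").json().get("status") != "completed":
    time.sleep(5)

report = requests.get(f"{BASE}/api/v1/jobs/{job_id}/report").json()
print(report)
```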
## Docker

```bash
docker-compose up -d

# Or standalone
docker build -t isat .
docker run --device /dev/kfd --device /dev/dri --group-add video \
  -v ./models:/models isat tune /models/model.onnx
```