Quick Start

Gaurav14cs17 edited this page Jun 21, 2026 · 1 revision

Get started with FlashOptim in minutes.

1. Quantize a Model (Fastest Optimization)

from flashoptim import FlashOptim, PTQuantizer

# Load your model
model = FlashOptim("pretrained/model.pth")

# Apply INT8 quantization
quantizer = PTQuantizer(dtype="int8", calibration_samples=500)
quantized = quantizer.quantize(model, calibration_data="data/calibration/")

# Export
quantized.export("optimized/model_int8.onnx")
print(f"Size reduction: {quantized.compression_ratio}x")
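For intuition about what INT8 quantization does to a weight tensor, here is a minimal pure-Python sketch of symmetric per-tensor quantization. It is not FlashOptim's implementation: the helper names quantize_int8 and dequantize are hypothetical, and the scheme (a single scale, no calibration data) is an assumption chosen for brevity.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: w is approximated by scale * q, q in [-127, 127].

    Hypothetical helper for illustration; not part of the FlashOptim API.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Map the largest magnitude onto 127; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the integer codes."""
    return [scale * v for v in q]


weights = [0.5, -1.27, 0.0, 0.31]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The rounding error per weight is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each weight is stored as one signed byte plus a shared float scale, which is where the size reduction comes from. A calibrated quantizer would typically also choose scales for activations from sample data.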

2. Prune a Model

from flashoptim import FlashOptim, UnstructuredPruner

model = FlashOptim("pretrained/model.pth")

pruner = UnstructuredPruner(sparsity=0.5, method="magnitude")
pruned = pruner.prune(model)

pruned.export("optimized/model_pruned.onnx")
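To make the sparsity=0.5, method="magnitude" setting concrete, the sketch below shows one common reading of unstructured magnitude pruning: zero out the fraction of weights with the smallest absolute value. The function magnitude_prune is a hypothetical helper written for this page, and whether FlashOptim prunes globally or per layer is not stated here, so treat this as an illustration of the idea only.

```python
def magnitude_prune(weights, sparsity):
    """Zero the `sparsity` fraction of weights with the smallest magnitude.

    Hypothetical helper for illustration; not part of the FlashOptim API.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    num_to_drop = int(len(weights) * sparsity)
    # Rank indices by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:num_to_drop])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2]
pruned = magnitude_prune(weights, sparsity=0.5)
achieved = pruned.count(0.0) / len(pruned)
print(pruned, achieved)
```

The pruned tensor keeps its shape; the savings only materialize when the export format or runtime can exploit the zeros.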

3. CLI Workflow

# Quantize
flashoptim quantize --config configs/flashoptim_quantize_int8.yaml

# Prune
flashoptim prune --config configs/flashoptim_prune_unstructured.yaml

# Benchmark the result
flashoptim benchmark --model optimized/model_int8.onnx --device cpu
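If you want a rough latency number from Python as a sanity check alongside the CLI, a timing loop like the one below is a reasonable starting point. It uses only the standard library; measure_latency_ms and dummy_inference are hypothetical names, and dummy_inference is a stand-in for whatever call runs your exported model. How the flashoptim benchmark command measures latency is not documented on this page.

```python
import statistics
import time


def measure_latency_ms(fn, warmup=5, runs=50):
    """Return the median wall-clock latency of fn() in milliseconds."""
    # Warm-up calls absorb one-time costs such as caching and lazy initialization.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def dummy_inference():
    """Stand-in workload; replace with a real model call."""
    return sum(i * i for i in range(10_000))


latency_ms = measure_latency_ms(dummy_inference)
print(f"Median latency: {latency_ms:.3f} ms")
```

The median is used instead of the mean so that a few slow outlier runs do not dominate the result.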

4. Auto-Optimize (Recommended for Beginners)

from flashoptim import AutoOptimizer

optimizer = AutoOptimizer(
    model_path="pretrained/model.pth",
    target="edge",
    constraints={"latency_ms": 10}
)
result = optimizer.run()
result.export("optimized/model_auto.onnx")

Next Steps
