A simple tool for quantizing ONNX models to various precisions using ONNX Runtime.

Features:
- Single model quantization (`convert.py`)
- Batch conversion of all models in a directory (`batch_convert.py`)
- Support for multiple quantization types: QInt8, QUInt8, QFLOAT8E4M3FN, QInt16, QUInt16, QInt4, QUInt4
- Dynamic quantization for weights only (see the sketch after this list)
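
Under the hood, this style of dynamic quantization maps onto ONNX Runtime's `quantize_dynamic`. A minimal sketch (file paths are illustrative, and not every listed type is accepted as a `weight_type` on every ONNX Runtime version):

```python
from onnxruntime.quantization import QuantType, quantize_dynamic

# Dynamic quantization rewrites the weight initializers to the requested
# integer type; activations are quantized on the fly at inference time.
quantize_dynamic(
    model_input="input_model.onnx",    # illustrative input path
    model_output="output_model.onnx",  # illustrative output path
    weight_type=QuantType.QInt8,       # one of the types listed above
)
```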
Requirements:

- Python 3.10+ (recommended)
- ONNX Runtime
- tqdm
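
A `requirements.txt` matching this list might look like the following (the version pin is an illustrative assumption; `onnx` is imported by ONNX Runtime's quantization tooling and may need to be listed explicitly):

```
onnxruntime>=1.17
onnx
tqdm
```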
Create a virtual environment:

```bash
python -m venv venv
source venv/bin/activate  # On Windows use venv\Scripts\activate
```

Install dependencies:
```bash
pip install -r requirements.txt
```
To quantize a single ONNX model:

```bash
python convert.py input_model.onnx output_model.onnx --quant-type QInt8
```
You can specify different quantization types:

```bash
python convert.py input_model.onnx output_model.onnx --quant-type QUInt8
```
To convert all `.onnx` files in the `models/` directory:

```bash
python batch_convert.py --quant-type QInt8
```

The quantized models are saved in the `output/` directory with the quantization type as a suffix (e.g., `_qint8.onnx`).
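
The `--quant-type` flag suggests the scripts map the CLI string onto ONNX Runtime's `QuantType` enum. A minimal sketch of what `convert.py` might look like (argument names and structure are assumptions, not the script's actual code):

```python
import argparse

from onnxruntime.quantization import QuantType, quantize_dynamic


def main() -> None:
    parser = argparse.ArgumentParser(description="Quantize a single ONNX model.")
    parser.add_argument("input_model", help="Path to the source .onnx file")
    parser.add_argument("output_model", help="Path for the quantized .onnx file")
    parser.add_argument(
        "--quant-type",
        default="QInt8",
        choices=[t.name for t in QuantType],  # QInt8, QUInt8, ...
    )
    args = parser.parse_args()

    # QuantType is an enum, so the CLI string maps directly to a member.
    quantize_dynamic(
        args.input_model,
        args.output_model,
        weight_type=QuantType[args.quant_type],
    )


if __name__ == "__main__":
    main()
```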
Project structure:

```
.
├── convert.py          # Single model quantization script
├── batch_convert.py    # Batch conversion script
├── requirements.txt    # Python dependencies
├── models/             # Input ONNX models directory
│   ├── model1.onnx
│   ├── model2.onnx
│   └── ...
└── output/             # Output quantized models directory
    ├── model1_qint8.onnx
    ├── model2_qint8.onnx
    └── ...
```
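
For completeness, the batch loop implied by this layout could be as simple as the sketch below (the helper name, directory defaults, and suffix logic are assumptions based on the tree above):

```python
from pathlib import Path

from onnxruntime.quantization import QuantType, quantize_dynamic
from tqdm import tqdm


def batch_convert(quant_type: str = "QInt8",
                  in_dir: str = "models",
                  out_dir: str = "output") -> None:
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    for model in tqdm(sorted(Path(in_dir).glob("*.onnx"))):
        # e.g. models/model1.onnx -> output/model1_qint8.onnx
        target = out / f"{model.stem}_{quant_type.lower()}.onnx"
        quantize_dynamic(str(model), str(target),
                         weight_type=QuantType[quant_type])


if __name__ == "__main__":
    batch_convert()
```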