ONNX Quantizer

A simple tool for quantizing ONNX models to various precisions using ONNX Runtime.

Features

  • Single model quantization (convert.py)
  • Batch conversion of all models in a directory (batch_convert.py)
  • Support for multiple quantization types: QInt8, QUInt8, QFLOAT8E4M3FN, QInt16, QUInt16, QInt4, QUInt4
  • Dynamic quantization (weights only; see the sketch after this list)
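
Under the hood, weights-only dynamic quantization in ONNX Runtime goes through onnxruntime.quantization.quantize_dynamic. A minimal sketch of that core call (file names are placeholders; the repository's scripts may wrap it differently):

from onnxruntime.quantization import QuantType, quantize_dynamic

# Weights are converted to 8-bit integers ahead of time; activations
# remain float and are quantized on the fly during inference.
quantize_dynamic(
    model_input="input_model.onnx",    # placeholder path
    model_output="output_model.onnx",  # placeholder path
    weight_type=QuantType.QInt8,
)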

Requirements

  • Python 3.10+ is recommended
  • ONNX Runtime
  • tqdm

Installation

Create virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`

Install dependencies:

pip install -r requirements.txt
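
Based on the Requirements section above, requirements.txt presumably lists at least the following (shown unpinned here; the actual file may pin versions):

onnxruntime
tqdm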

Usage

Single Model Conversion

To quantize a single ONNX model:

python convert.py input_model.onnx output_model.onnx --quant-type QInt8

You can specify different quantization types:

python convert.py input_model.onnx output_model.onnx --quant-type QUInt8
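
A minimal sketch of how a script like convert.py can map the --quant-type string onto ONNX Runtime's QuantType enum (the argument handling below is an assumption, not the repository's actual code):

import argparse

from onnxruntime.quantization import QuantType, quantize_dynamic

parser = argparse.ArgumentParser(description="Quantize a single ONNX model.")
parser.add_argument("input_model")
parser.add_argument("output_model")
parser.add_argument("--quant-type", default="QInt8",
                    choices=[t.name for t in QuantType])
args = parser.parse_args()

# QuantType members are named exactly like the CLI values (QInt8,
# QUInt8, ...), so the string can be looked up on the enum directly.
quantize_dynamic(args.input_model, args.output_model,
                 weight_type=QuantType[args.quant_type])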

Batch Conversion

To convert all .onnx files in the models/ directory:

python batch_convert.py --quant-type QInt8

The quantized models will be saved in the output/ directory with the quantization type as a suffix (e.g., _qint8.onnx).
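
A batch pass over models/ can be as small as the sketch below (a hedged reconstruction, not the repository's code; it assumes the suffix convention described above):

from pathlib import Path

from onnxruntime.quantization import QuantType, quantize_dynamic
from tqdm import tqdm

quant_type = QuantType.QInt8
out_dir = Path("output")
out_dir.mkdir(exist_ok=True)

# Quantize every .onnx file in models/, writing e.g.
# models/model1.onnx -> output/model1_qint8.onnx.
for model_path in tqdm(sorted(Path("models").glob("*.onnx"))):
    out_name = f"{model_path.stem}_{quant_type.name.lower()}.onnx"
    quantize_dynamic(str(model_path), str(out_dir / out_name),
                     weight_type=quant_type)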

Directory Structure

.
├── convert.py              # Single model quantization script
├── batch_convert.py        # Batch conversion script
├── requirements.txt        # Python dependencies
├── models/                 # Input ONNX models directory
│   ├── model1.onnx
│   ├── model2.onnx
│   └── ...
└── output/                 # Output quantized models directory
    ├── model1_qint8.onnx
    ├── model2_qint8.onnx
    └── ...
