# Quantization?

In [19]:
import onnx
from onnx import numpy_helper

onnx_model = onnx.load(onnx_model_path)
graph = onnx_model.graph

print("Model Initializers (Weights) Info:")
total_params = 0
for initializer in graph.initializer:
    arr = numpy_helper.to_array(initializer)
    print(f"  Name: {initializer.name}")
    print(f"    ONNX data_type: {initializer.data_type} (see onnx.TensorProto for meaning)")
    print(f"    Shape: {arr.shape}")
    print(f"    Num parameters: {arr.size}")
    print(f"    Dtype in numpy: {arr.dtype}\n")
    total_params += arr.size

print(f"Total number of parameters in the model: {total_params}\n")



Model Initializers (Weights) Info:
  Name: backbone.stem.0.weight
    ONNX data_type: 1 (see onnx.TensorProto for meaning)
    Shape: (10, 1, 3, 3)
    Num parameters: 90
    Dtype in numpy: float32

  Name: backbone.stem.0.bias
    ONNX data_type: 1 (see onnx.TensorProto for meaning)
    Shape: (10,)
    Num parameters: 10
    Dtype in numpy: float32

  Name: backbone.stage0.2.weight
    ONNX data_type: 1 (see onnx.TensorProto for meaning)
    Shape: (10, 10, 1, 1)
    Num parameters: 100
    Dtype in numpy: float32

  Name: backbone.stage0.2.bias
    ONNX data_type: 1 (see onnx.TensorProto for meaning)
    Shape: (10,)
    Num parameters: 10
    Dtype in numpy: float32

  Name: backbone.stage0.3.conv_block.conv1.weight
    ONNX data_type: 1 (see onnx.TensorProto for meaning)
    Shape: (10, 1, 3, 3)
    Num parameters: 90
    Dtype in numpy: float32

  Name: backbone.stage0.3.conv_block.conv1pw.weight
    ONNX data_type: 1 (see onnx.TensorProto for meaning)
    Shape: (10, 10, 1, 1)


In [20]:
# Check for quantization ops in the graph
# A very rough indicator of whether the model is using ONNX-quantized operators is if it has
# QuantizeLinear or DequantizeLinear ops in the graph. (There are other ways to check for
# custom quantization, but this is a good start.)
quant_ops = {"QuantizeLinear", "DequantizeLinear"}
found_quant_nodes = any(node.op_type in quant_ops for node in graph.node)

if found_quant_nodes:
    print("This model has quantization nodes (QuantizeLinear/DequantizeLinear).")
else:
    print("No quantization nodes detected. It may still be floating-point or use another quant scheme.")


No quantization nodes detected. It may still be floating-point or use another quant scheme.
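The exact-match check above only looks for two op names, so it would miss, for example, a node named `DynamicQuantizeLinear`. As a slightly broader sketch, the helper below tallies several standard ONNX operators that usually indicate a quantized graph. The name `count_quant_ops` and the particular op list are illustrative assumptions, not part of ONNX or ONNX Runtime. It takes plain op-type strings, so with a loaded model you would pass `(node.op_type for node in graph.node)`.

```python
from collections import Counter

# Standard ONNX operators that usually indicate a quantized graph.
# This list is an assumption for illustration and is not exhaustive.
QUANT_OP_TYPES = {
    "QuantizeLinear",
    "DequantizeLinear",
    "DynamicQuantizeLinear",
    "QLinearConv",
    "QLinearMatMul",
    "ConvInteger",
    "MatMulInteger",
}


def count_quant_ops(op_types):
    """Return a Counter of the quantization-related op types in op_types."""
    return Counter(op for op in op_types if op in QUANT_OP_TYPES)


# Hypothetical op list standing in for [node.op_type for node in graph.node].
example_ops = ["Conv", "Relu", "DynamicQuantizeLinear", "MatMulInteger", "Add"]
found = count_quant_ops(example_ops)
print(dict(found) if found else "No quantization ops found")
```

An empty result still does not prove the model is floating-point, since custom or vendor-specific quantization ops are not in the list.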


Quantization approaches in ONNX Runtime:

    Dynamic Quantization (quantize_dynamic)
        Quickest and easiest.
        Quantizes the weights ahead of time. Activation scale and zero point are computed at runtime, so no calibration data is needed.
        Can help reduce model size and possibly speed up inference on some targets.

    Static Post-Training Quantization (quantize_static)
        More advanced.
        Requires calibration data (representative input examples).
        Quantizes weights and activations to int8.
        Tends to yield better speedups or smaller models for hardware that supports int8.

    Quantization-Aware Training (quantize_qat)
        Performed during training.
        You must have trained the model in a QAT-friendly framework.
        Produces the best accuracy for int8.
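To make "quantizes to int8" concrete, here is a minimal NumPy sketch of per-tensor affine quantization: a float array is mapped to int8 through a scale and a zero point, then mapped back. The function names `quantize_affine` and `dequantize_affine` are hypothetical, and the scheme is a textbook simplification. ONNX Runtime's own quantizers differ in details such as symmetric ranges and per-channel scales.

```python
import numpy as np


def quantize_affine(x, qmin=-128, qmax=127):
    """Map a float array to int8 with one scale and zero point per tensor."""
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo = min(float(x.min()), 0.0)
    hi = max(float(x.max()), 0.0)
    # Fall back to 1.0 when the input is all zeros, to avoid dividing by zero.
    scale = (hi - lo) / (qmax - qmin) or 1.0
    zero_point = int(round(qmin - lo / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point


def dequantize_affine(q, scale, zero_point):
    """Recover an approximate float32 array from the int8 values."""
    return (q.astype(np.float32) - zero_point) * scale


# Random stand-in for a conv weight; the shape mirrors backbone.stem.0.weight.
rng = np.random.default_rng(0)
weights = rng.normal(size=(10, 1, 3, 3)).astype(np.float32)

q, scale, zero_point = quantize_affine(weights)
restored = dequantize_affine(q, scale, zero_point)
max_error = float(np.abs(weights - restored).max())
print(f"dtype={q.dtype}, scale={scale:.6f}, zero_point={zero_point}, max_error={max_error:.6f}")
```

Each int8 value takes one byte instead of four, which is where the size reduction comes from. The price is a rounding error on the order of one quantization step (`scale`).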

## Dynamic Quantization Example

In [23]:
from onnxruntime.quantization import QuantType, quantize_dynamic

input_model_path = "ReDimNet_no_mel.onnx"
output_model_path = "ReDimNet_no_mel_int8.onnx"

# QuantType.QInt8 -> quantize weights to int8
quantize_dynamic(
    model_input=input_model_path,
    model_output=output_model_path,
    weight_type=QuantType.QInt8,  # or QuantType.QUInt8
)



In [25]:
!ls -lha ReDimNet_no_mel_int8.onnx

-rw-rw-r-- 1 vlad vlad 1.8M Mar  8 11:30 ReDimNet_no_mel_int8.onnx
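The listing above shows only the quantized file. To compare it against the original, a small standard-library sketch like the following can help. The helper names `size_report` and `format_report` are hypothetical, and the commented usage line assumes both model files exist on disk.

```python
import os


def size_report(original_path, quantized_path):
    """Return (original_bytes, quantized_bytes, ratio) for two files on disk."""
    original = os.path.getsize(original_path)
    quantized = os.path.getsize(quantized_path)
    # Guard against an empty output file.
    ratio = original / quantized if quantized else float("inf")
    return original, quantized, ratio


def format_report(original, quantized, ratio):
    """Format byte counts as MiB along with the size reduction factor."""
    mib = 1024 * 1024
    return f"{original / mib:.2f} MiB -> {quantized / mib:.2f} MiB ({ratio:.2f}x smaller)"


# Usage in this notebook (assumes both files are present):
# print(format_report(*size_report(input_model_path, output_model_path)))
```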


In [27]:
onnx_model = onnx.load(output_model_path)
graph = onnx_model.graph

print("After quantization:")
for initializer in graph.initializer:
    print(f"  {initializer.name} -> data_type: {initializer.data_type}")

After quantization:
  backbone.stem.0.bias -> data_type: 1
  backbone.stage0.2.bias -> data_type: 1
  backbone.stage0.3.conv_block.conv1pw.bias -> data_type: 1
  backbone.stage0.3.conv_block.bn1.weight -> data_type: 1
  backbone.stage0.3.conv_block.bn1.bias -> data_type: 1
  backbone.stage0.3.conv_block.bn1.running_mean -> data_type: 1
  backbone.stage0.3.conv_block.bn1.running_var -> data_type: 1
  backbone.stage0.4.conv_block.conv1pw.bias -> data_type: 1
  backbone.stage0.4.conv_block.bn1.weight -> data_type: 1
  backbone.stage0.4.conv_block.bn1.bias -> data_type: 1
  backbone.stage0.4.conv_block.bn1.running_mean -> data_type: 1
  backbone.stage0.4.conv_block.bn1.running_var -> data_type: 1
  backbone.stage0.6.red_dim_conv.0.bias -> data_type: 1
  backbone.stage0.6.tcm.0.dwconvs.0.bias -> data_type: 1
  backbone.stage0.6.tcm.0.norm.weight -> data_type: 1
  backbone.stage0.6.tcm.0.norm.bias -> data_type: 1
  backbone.stage0.6.tcm.0.norm.running_mean -> data_type: 1
  backbone.stage0.6

data_type: 1 in an ONNX initializer means float32 (TensorProto.FLOAT).

Quantized weights instead show data_type: 3 (INT8) or data_type: 2 (UINT8).
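To read these codes at a glance, the sketch below counts initializers by type name. The lookup table is a subset of the `onnx.TensorProto` codes, copied here so the example runs without `onnx` installed. With `onnx` available, `onnx.TensorProto.DataType.Name(code)` returns the same names. The helper `summarize_dtypes` and the example codes are hypothetical. On a loaded model you would pass `(init.data_type for init in graph.initializer)`.

```python
from collections import Counter

# Subset of onnx.TensorProto.DataType codes, hard-coded for illustration.
TENSOR_DTYPE_NAMES = {
    1: "FLOAT",
    2: "UINT8",
    3: "INT8",
    6: "INT32",
    7: "INT64",
    10: "FLOAT16",
    11: "DOUBLE",
}


def summarize_dtypes(data_types):
    """Count how many initializers use each ONNX element type."""
    names = (TENSOR_DTYPE_NAMES.get(code, f"UNKNOWN({code})") for code in data_types)
    return Counter(names)


# Hypothetical codes standing in for the data_type of each initializer.
example_codes = [1, 1, 3, 3, 3, 2]
summary = summarize_dtypes(example_codes)
for name, count in summary.most_common():
    print(f"{name}: {count}")
```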