Problem
The Relax TFLite frontend currently has two related blockers for quantized TFLite import.
First, quantized tensors are rejected early by the tensor quantization metadata guard in get_tensors(). Once tensor-level quantization metadata (scale, zero_point, and axis) is preserved and the frontend is allowed to proceed, the next blocker appears at the operator conversion stage:
```
NameError: name '_qnn' is not defined
```
This happens because the frontend contains quantized operator conversion paths that reference non-existent _qnn.op.* APIs.
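For illustration, the broken call sites follow the pattern sketched below. This is a hypothetical reconstruction, not the frontend's actual code: the helper names (get_input_tensors, get_output_tensors) and the qnn_params packing are illustrative assumptions.

```python
# Hypothetical sketch of a broken conversion path (names are illustrative).
# The module never imports or defines `_qnn`, so Python raises
# NameError: name '_qnn' is not defined as soon as this path is evaluated.
def convert_quantize(self, op):
    input_tensor = self.get_input_tensors(op)[0]    # hypothetical helper
    output_tensor = self.get_output_tensors(op)[0]  # hypothetical helper
    return _qnn.op.quantize(
        input_tensor,
        output_tensor.qnn_params["scale"],
        output_tensor.qnn_params["zero_point"],
        out_dtype="int8",
    )
```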
At the same time, Relax already provides quantize / dequantize operators with C++ registration, Python APIs, legalization to TE, and tests. This suggests that quantized TFLite operators may initially be imported using QDQ decomposition around existing Relax ops, rather than requiring a new set of fused QNN operators as the first step.
This issue tracks the work needed to support quantized TFLite operator import in the Relax frontend.
Affected _qnn.op.* calls
The TFLite frontend (python/tvm/relax/frontend/tflite/tflite_frontend.py) references 7 non-existent QNN ops across 18 call sites:
| Op | Call sites | Typical context |
| --- | --- | --- |
| quantize | 1 | float → int8 in convert_quantize() |
| dequantize | 4 | int8 → float in convert_dequantize() and convert_detection_postprocess() |
| requantize | 9 | post-conv/dense/relu/reshape/reduce scale adjustment |
| conv2d | 1 | quantized 2D convolution in convert_conv() |
| dense | 1 | quantized fully connected in convert_fully_connected() |
| concat | 1 | quantized concatenation in convert_concatenation() |
| conv2d_transpose | 1 | quantized transposed convolution in convert_transpose_conv() |
Existing Relax quantization infrastructure
Relax already has two QDQ operators with C++ registration, Python APIs, legalization, and tests:
- relax.op.quantize(data, scale, zero_point, axis, out_dtype) — clip(round(input / scale) + zp, min, max)
- relax.op.dequantize(data, scale, zero_point, axis, out_dtype) — scale * (input - zp)
These are defined in:
- C++: src/relax/op/tensor/qdq.cc
- Python API: python/tvm/relax/op/qdq.py
- Legalization: python/tvm/relax/transform/legalize_ops/qdq.py
- Tests: tests/python/relax/test_op_qdq.py, tests/python/relax/test_transform_legalize_ops_qdq.py
Both support per-tensor and per-axis (channel-wise) quantization via the axis parameter.
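As a quick orientation, the sketch below builds per-channel quantize/dequantize expressions with these ops. The shapes, scales, and zero points are made-up example values, and the snippet assumes a TVM build where relax.op.quantize/dequantize are available as described above.

```python
import numpy as np
import tvm
from tvm import relax

# Input in float32; quantization parameters are per-channel along axis=1,
# so scale and zero_point are 1-D tensors with one entry per channel.
x = relax.Var("x", relax.TensorStructInfo((2, 3), "float32"))
scale = relax.const(np.array([0.1, 0.2, 0.4], dtype="float32"))
zero_point = relax.const(np.array([0, 0, 0], dtype="int8"))

q = relax.op.quantize(x, scale, zero_point, axis=1, out_dtype="int8")
dq = relax.op.dequantize(q, scale, zero_point, axis=1, out_dtype="float32")

# Per-tensor quantization is the same call with scalar parameters, e.g.
# relax.op.quantize(x, relax.const(0.05, "float32"), relax.const(0, "int8"))
```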
Possible implementation directions
There are at least two possible paths:
- Add explicit fused Relax QNN operators, such as qnn.conv2d, qnn.dense, and qnn.requantize.
- Reuse existing Relax QDQ operators and import quantized TFLite operators as QDQ patterns around existing Relax compute ops.
I propose starting with the second path. The QDQ-based approach has a smaller API surface and can reuse existing Relax quantize/dequantize infrastructure. Explicit fused QNN operators may still be useful later for optimized int8 execution or backend-specific pattern matching, and can be discussed as a follow-up if needed.
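To make the proposal concrete, here is a minimal sketch of the QDQ-based import. The helper names (qdq_requantize, qdq_conv2d) and the (scale, zero_point) parameter packing are assumptions for illustration; the actual frontend would supply its own plumbing, layouts, and attributes.

```python
from tvm import relax

def qdq_requantize(x, in_scale, in_zp, out_scale, out_zp, dtype="int8"):
    # requantize expressed as dequantize -> quantize: rescale an int8
    # tensor from (in_scale, in_zp) to (out_scale, out_zp).
    f = relax.op.dequantize(x, in_scale, in_zp, out_dtype="float32")
    return relax.op.quantize(f, out_scale, out_zp, out_dtype=dtype)

def qdq_conv2d(data, weight, data_params, weight_params, out_params):
    # quantized conv2d expressed as dequantize -> float conv2d -> quantize,
    # reusing the existing relax.op.nn.conv2d for the compute.
    df = relax.op.dequantize(data, *data_params, out_dtype="float32")
    wf = relax.op.dequantize(weight, *weight_params, out_dtype="float32")
    of = relax.op.nn.conv2d(df, wf)
    return relax.op.quantize(of, *out_params, out_dtype="int8")
```

If fused int8 execution later becomes a goal, backend-specific pattern matching can recognize and fuse these dequantize → compute → quantize chains, as noted under out of scope.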
Task list
- Preserve tensor quantization metadata in get_tensors() (scale, zero_point, and axis) and remove the global quantization guard
- Extend QDQ-based conversion to the remaining quantized operators (concat, conv2d_transpose, requantize paths)
Out of scope
- ONNX QLinearConv / QLinearMatMul — may benefit from similar infrastructure but tracked separately
- End-to-end int8 kernel optimization — may require explicit fused QNN ops or backend-specific QDQ pattern matching, and is not the first milestone
- Per-channel axis remap for arbitrary ops — only addressed for conv2d and dense where weight layout transpose occurs
References
- python/tvm/relax/frontend/tflite/tflite_frontend.py

cc @leandron @tlopex