[Tracking Issue][TFLite] Support quantized operator import in Relax frontend #19534

@Aharrypotter

Problem

The Relax TFLite frontend currently has two related blockers for quantized TFLite import.

First, quantized tensors are blocked early in get_tensors() by the tensor quantization metadata guard. After preserving tensor-level quantization metadata (scale, zero_point, and axis) and allowing the frontend to proceed further, the next blocker appears at the operator conversion stage:

NameError: name '_qnn' is not defined

This happens because the frontend contains quantized operator conversion paths that reference non-existent _qnn.op.* APIs.

At the same time, Relax already provides quantize / dequantize operators with C++ registration, Python APIs, legalization to TE, and tests. This suggests that quantized TFLite operators may initially be imported using QDQ decomposition around existing Relax ops, rather than requiring a new set of fused QNN operators as the first step.

This issue tracks the work needed to support quantized TFLite operator import in the Relax frontend.

Affected _qnn.op.* calls

The TFLite frontend (python/tvm/relax/frontend/tflite/tflite_frontend.py) references 7 non-existent QNN ops across 18 call sites:

| Op | Call sites | Typical context |
|---|---|---|
| quantize | 1 | float → int8 in convert_quantize() |
| dequantize | 4 | int8 → float in convert_dequantize() and convert_detection_postprocess() |
| requantize | 9 | post-conv/dense/relu/reshape/reduce scale adjustment |
| conv2d | 1 | quantized 2D convolution in convert_conv() |
| dense | 1 | quantized fully connected in convert_fully_connected() |
| concat | 1 | quantized concatenation in convert_concatenation() |
| conv2d_transpose | 1 | quantized transposed convolution in convert_transpose_conv() |
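
Since requantize accounts for half of the call sites, it may help to see how it could be expressed without a dedicated op. The following is a plain-Python numeric sketch (not Relax code) that treats requantize as a dequantize followed by a quantize. The function name and the int8 clipping range are assumptions for illustration, and Python's built-in round() resolves ties to even, which may not match the rounding used by TFLite or the Relax legalization.

```python
def requantize_via_qdq(qvalues, in_scale, in_zp, out_scale, out_zp,
                       qmin=-128, qmax=127):
    """Hypothetical helper: requantize expressed as dequantize -> quantize."""
    out = []
    for q in qvalues:
        # Dequantize with the input parameters: scale * (input - zp)
        real = in_scale * (q - in_zp)
        # Quantize with the output parameters: clip(round(x / scale) + zp)
        # Note: round() here ties to even; the real op may round differently.
        requant = round(real / out_scale) + out_zp
        out.append(max(qmin, min(qmax, requant)))
    return out
```

For instance, `requantize_via_qdq([10, 20], 0.5, 0, 0.25, -5)` maps the real values 5.0 and 10.0 onto the finer output scale.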

Existing Relax quantization infrastructure

Relax already has two QDQ operators with C++ registration, Python APIs, legalization, and tests:

  • relax.op.quantize(data, scale, zero_point, axis, out_dtype): clip(round(input / scale) + zp, min, max)
  • relax.op.dequantize(data, scale, zero_point, axis, out_dtype): scale * (input - zp)

These are defined in:

  • C++: src/relax/op/tensor/qdq.cc
  • Python API: python/tvm/relax/op/qdq.py
  • Legalization: python/tvm/relax/transform/legalize_ops/qdq.py
  • Tests: tests/python/relax/test_op_qdq.py, tests/python/relax/test_transform_legalize_ops_qdq.py

Both support per-tensor and per-axis (channel-wise) quantization via the axis parameter.
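
As a reference for the two formulas above, here is a minimal pure-Python sketch of their arithmetic, including a per-axis variant for axis=0. It is only a numeric illustration, not the Relax implementation: the helper names, the int8 range defaults, and the round-half-away-from-zero tie rule are assumptions.

```python
import math


def _round_half_away(x: float) -> int:
    # Assumption: ties round away from zero; the actual legalization may differ.
    magnitude = int(math.floor(abs(x) + 0.5))
    return magnitude if x >= 0 else -magnitude


def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """clip(round(input / scale) + zp, min, max) with one scale/zero_point."""
    return [
        max(qmin, min(qmax, _round_half_away(v / scale) + zero_point))
        for v in values
    ]


def dequantize(qvalues, scale, zero_point):
    """scale * (input - zp) with one scale/zero_point."""
    return [scale * (q - zero_point) for q in qvalues]


def quantize_per_axis0(rows, scales, zero_points):
    """Per-axis case for axis=0: row i uses scales[i] and zero_points[i]."""
    return [quantize(r, s, z) for r, s, z in zip(rows, scales, zero_points)]
```

In the per-tensor case a single scale and zero point apply to every element, while in the per-axis case each slice along the chosen axis has its own pair.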

Possible implementation directions

There are at least two possible paths:

  1. Add explicit fused Relax QNN operators, such as qnn.conv2d, qnn.dense, and qnn.requantize.
  2. Reuse existing Relax QDQ operators and import quantized TFLite operators as QDQ patterns around existing Relax compute ops.

I propose starting with the second path. The QDQ-based approach has a smaller API surface and can reuse existing Relax quantize/dequantize infrastructure. Explicit fused QNN operators may still be useful later for optimized int8 execution or backend-specific pattern matching, and can be discussed as a follow-up if needed.
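
To make the second path concrete, the sketch below shows the arithmetic of a QDQ-decomposed fully connected layer in plain Python: dequantize the inputs, run the ordinary float computation, then quantize the result. It is a numeric illustration under assumptions (per-tensor parameters, no bias, int8 output range, a hypothetical function name), not the frontend's actual conversion code.

```python
def qdq_dense(q_x, q_w, x_scale, x_zp, w_scale, w_zp,
              out_scale, out_zp, qmin=-128, qmax=127):
    """Hypothetical QDQ decomposition of a quantized dense layer."""
    # 1. Dequantize the input vector and the weight matrix.
    x = [x_scale * (v - x_zp) for v in q_x]
    w = [[w_scale * (v - w_zp) for v in row] for row in q_w]
    # 2. Ordinary float dense: one output element per weight row.
    y = [sum(a * b for a, b in zip(x, row)) for row in w]
    # 3. Quantize the float result with the output parameters.
    #    round() ties to even here; the real op may round differently.
    return [max(qmin, min(qmax, round(v / out_scale) + out_zp)) for v in y]
```

The compute step in the middle is an unmodified float op, which is why this path needs no new operators beyond quantize and dequantize.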

Task list

  • Preserve tensor quantization metadata in get_tensors() (scale, zero_point, and axis) and remove the global quantization guard
  • Replace quantize/dequantize helpers with Relax QDQ ops
  • Support quantized Conv2D via QDQ decomposition
  • Add per-channel Conv2D weight support
  • Support quantized FullyConnected / Dense via QDQ
  • Support remaining quantized ops (concat, conv2d_transpose, requantize paths)
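
For the first task, one possible shape for the preserved metadata is a small immutable record holding scale, zero_point, and axis. The class and function names below are hypothetical, and the rule that axis is only meaningful when there is more than one scale is an assumption of this sketch rather than something taken from the frontend.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class QuantParams:  # hypothetical name, not an existing TVM class
    scale: Tuple[float, ...]
    zero_point: Tuple[int, ...]
    axis: Optional[int] = None  # None means per-tensor quantization

    @property
    def is_per_channel(self) -> bool:
        return len(self.scale) > 1


def make_quant_params(scale, zero_point, quantized_dimension):
    """Build a QuantParams record from raw tensor quantization fields."""
    scale = tuple(float(s) for s in scale)
    zero_point = tuple(int(z) for z in zero_point)
    # Assumption: the axis only matters when several scales are present.
    axis = quantized_dimension if len(scale) > 1 else None
    return QuantParams(scale, zero_point, axis)
```

Carrying the three fields together would let each operator converter decide between per-tensor and per-axis handling without re-reading the flatbuffer.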

Out of scope

  • ONNX QLinearConv / QLinearMatMul — may benefit from similar infrastructure but tracked separately
  • End-to-end int8 kernel optimization — may require explicit fused QNN ops or backend-specific QDQ pattern matching, and is not the first milestone
  • Per-channel axis remap for arbitrary ops — only addressed for conv2d and dense where weight layout transpose occurs
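
The axis remap mentioned in the last item can be sketched as a lookup in the transpose permutation. The helper name is hypothetical, and the OHWI to HWIO layout pair in the usage note is only an illustrative assumption about which transpose the frontend applies.

```python
def remap_quant_axis(axis, perm):
    """Return the position of source dimension `axis` after a transpose.

    `perm` follows the usual transpose convention: output dimension i
    is taken from input dimension perm[i].
    """
    return list(perm).index(axis)
```

For example, if a weight quantized along its output-channel dimension 0 in OHWI were transposed to HWIO with perm (1, 2, 3, 0), `remap_quant_axis(0, (1, 2, 3, 0))` gives the new quantization axis.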

References

cc @leandron @tlopex
