Problem
The Relax TFLite frontend currently has two related blockers for quantized TFLite import.
First, quantized tensors are rejected early by the tensor quantization metadata guard in get_tensors(). Once tensor-level quantization metadata (scale, zero_point, and axis) is preserved and the frontend is allowed to proceed, the next blocker appears at the operator conversion stage:
```
NameError: name '_qnn' is not defined
```
This happens because the frontend contains quantized operator conversion paths that reference non-existent _qnn.op.* APIs.
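For illustration, the broken call sites follow the pattern sketched below. This is a hypothetical reconstruction, not the frontend's actual code: the helper names (get_input_tensors, get_output_tensors) and the qnn_params packing are illustrative assumptions.

```python
# Hypothetical sketch of a broken conversion path (names are illustrative).
# The module never imports or defines `_qnn`, so Python raises
# NameError: name '_qnn' is not defined as soon as this path is evaluated.
def convert_quantize(self, op):
    input_tensor = self.get_input_tensors(op)[0]    # hypothetical helper
    output_tensor = self.get_output_tensors(op)[0]  # hypothetical helper
    return _qnn.op.quantize(
        input_tensor,
        output_tensor.qnn_params["scale"],
        output_tensor.qnn_params["zero_point"],
        out_dtype="int8",
    )
```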
At the same time, Relax already provides quantize / dequantize operators with C++ registration, Python APIs, legalization to TE, and tests. This suggests that quantized TFLite operators may initially be imported using QDQ decomposition around existing Relax ops, rather than requiring a new set of fused QNN operators as the first step.
This issue tracks the work needed to support quantized TFLite operator import in the Relax frontend.
Affected _qnn.op.* calls
The TFLite frontend (python/tvm/relax/frontend/tflite/tflite_frontend.py) references 7 non-existent QNN ops across 18 call sites:
| Op | Call sites | Typical context |
| --- | --- | --- |
| quantize | 1 | float → int8 in convert_quantize() |
| dequantize | 4 | int8 → float in convert_dequantize() and convert_detection_postprocess() |
| requantize | 9 | post-conv/dense/relu/reshape/reduce scale adjustment |
| conv2d | 1 | quantized 2D convolution in convert_conv() |
| dense | 1 | quantized fully connected in convert_fully_connected() |
| concat | 1 | quantized concatenation in convert_concatenation() |
| conv2d_transpose | 1 | quantized transposed convolution in convert_transpose_conv() |
Existing Relax quantization infrastructure
Relax already has two QDQ operators with C++ registration, Python APIs, legalization, and tests:
- relax.op.quantize(data, scale, zero_point, axis, out_dtype) — clip(round(input / scale) + zp, min, max)
- relax.op.dequantize(data, scale, zero_point, axis, out_dtype) — scale * (input - zp)
These are defined in:
- C++: src/relax/op/tensor/qdq.cc
- Python API: python/tvm/relax/op/qdq.py
- Legalization: python/tvm/relax/transform/legalize_ops/qdq.py
- Tests: tests/python/relax/test_op_qdq.py, tests/python/relax/test_transform_legalize_ops_qdq.py
Both support per-tensor and per-axis (channel-wise) quantization via the axis parameter.
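As a quick orientation, the sketch below builds per-channel quantize/dequantize expressions with these ops. The shapes, scales, and zero points are made-up example values, and the snippet assumes a TVM build where relax.op.quantize/dequantize are available as described above.

```python
import numpy as np
import tvm
from tvm import relax

# Input in float32; quantization parameters are per-channel along axis=1,
# so scale and zero_point are 1-D tensors with one entry per channel.
x = relax.Var("x", relax.TensorStructInfo((2, 3), "float32"))
scale = relax.const(np.array([0.1, 0.2, 0.4], dtype="float32"))
zero_point = relax.const(np.array([0, 0, 0], dtype="int8"))

q = relax.op.quantize(x, scale, zero_point, axis=1, out_dtype="int8")
dq = relax.op.dequantize(q, scale, zero_point, axis=1, out_dtype="float32")

# Per-tensor quantization is the same call with scalar parameters, e.g.
# relax.op.quantize(x, relax.const(0.05, "float32"), relax.const(0, "int8"))
```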
Possible implementation directions
There are at least two possible paths:
- Add explicit fused Relax QNN operators, such as qnn.conv2d, qnn.dense, and qnn.requantize.
- Reuse existing Relax QDQ operators and import quantized TFLite operators as QDQ patterns around existing Relax compute ops.
I propose starting with the second path. The QDQ-based approach has a smaller API surface and can reuse existing Relax quantize/dequantize infrastructure. Explicit fused QNN operators may still be useful later for optimized int8 execution or backend-specific pattern matching, and can be discussed as a follow-up if needed.
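To make the proposal concrete, here is a minimal sketch of the QDQ-based import. The helper names (qdq_requantize, qdq_conv2d) and the (scale, zero_point) parameter packing are assumptions for illustration; the actual frontend would supply its own plumbing, layouts, and attributes.

```python
from tvm import relax

def qdq_requantize(x, in_scale, in_zp, out_scale, out_zp, dtype="int8"):
    # requantize expressed as dequantize -> quantize: rescale an int8
    # tensor from (in_scale, in_zp) to (out_scale, out_zp).
    f = relax.op.dequantize(x, in_scale, in_zp, out_dtype="float32")
    return relax.op.quantize(f, out_scale, out_zp, out_dtype=dtype)

def qdq_conv2d(data, weight, data_params, weight_params, out_params):
    # quantized conv2d expressed as dequantize -> float conv2d -> quantize,
    # reusing the existing relax.op.nn.conv2d for the compute.
    df = relax.op.dequantize(data, *data_params, out_dtype="float32")
    wf = relax.op.dequantize(weight, *weight_params, out_dtype="float32")
    of = relax.op.nn.conv2d(df, wf)
    return relax.op.quantize(of, *out_params, out_dtype="int8")
```

If fused int8 execution later becomes a goal, backend-specific pattern matching can recognize and fuse these dequantize → compute → quantize chains, as noted under out of scope.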
Task list
- Preserve tensor quantization metadata in get_tensors() (scale, zero_point, and axis) and remove the global quantization guard
- Extend QDQ-based conversion to the remaining quantized operators (concat, conv2d_transpose, requantize paths)
Out of scope
- ONNX QLinearConv / QLinearMatMul — may benefit from similar infrastructure but tracked separately
- End-to-end int8 kernel optimization — may require explicit fused QNN ops or backend-specific QDQ pattern matching, and is not the first milestone
- Per-channel axis remap for arbitrary ops — only addressed for conv2d and dense where weight layout transpose occurs
References
- python/tvm/relax/frontend/tflite/tflite_frontend.py

cc @leandron @tlopex