Add a quantize operator that converts bfloat16 tensors to int8 using per-group symmetric scaling. It is the counterpart to the dequantize operator in #95.
Together with INT8 GEMM (#93), this completes the W8A8 quantized inference pipeline on the NPU:
bf16 activations → quantize (bf16→i8) → INT8 GEMM (i8×i8→i32) → dequant (i32→bf16) → bf16
Proposed behavior:
- Input: bfloat16 tensor + group_size parameter
- Output: int8 tensor + bfloat16 scale factors (one per group)
- Scaling: symmetric, per-group: scale = max(abs(group)) / 127; out = clamp(round(in / scale), -128, 127)
- Follows the dequant_i32 operator pattern (custom MLIROperator with mixed input/output dtypes)
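The proposed behavior can be sketched as a host-side NumPy reference (this is an illustration only, not the NPU kernel; `quantize_per_group` is a hypothetical name, and float32 stands in for bfloat16 since NumPy has no native bf16 dtype):

```python
import numpy as np

def quantize_per_group(x: np.ndarray, group_size: int):
    """Reference per-group symmetric quantization (float32 stands in for bf16).

    Returns (int8 values with x's shape, one float scale per group).
    """
    assert x.size % group_size == 0, "tensor size must be divisible by group_size"
    groups = x.reshape(-1, group_size).astype(np.float32)
    # scale = max(abs(group)) / 127; guard all-zero groups to avoid div-by-zero
    scale = np.abs(groups).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0.0, 1.0, scale)
    # out = clamp(round(in / scale), -128, 127)
    q = np.clip(np.round(groups / scale), -128, 127).astype(np.int8)
    return q.reshape(x.shape), scale.squeeze(1)

# Round-trip check: dequantized values stay within half a quantization step
x = np.random.randn(4, 64).astype(np.float32)
q, s = quantize_per_group(x, group_size=32)
deq = q.reshape(-1, 32).astype(np.float32) * s[:, None]
assert np.all(np.abs(deq.reshape(x.shape) - x) <= s.max() / 2 + 1e-6)
```

Because the scale is derived from the group's own max-abs value, `round(in / scale)` already lands in [-127, 127], so the -128 end of the clamp is only a safety net for the int8 range.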
Related:
- #93 : INT8 GEMM support
- #95 : Dequant i32→bf16 operator
- Existing iron/operators/dequant/ (int4→bf16) and iron/operators/dequant_i32/ as implementation references