Support 8 bit weights "unpacked" compute mode in MatmulNBits kernel #24959

hariharans29 · 2025-06-05T01:23:50Z

Description

MLAS doesn't have optimized kernel implementations for 8-bit weights that cover every user-requested configuration allowed by the MatMulNBits op kernel spec. For example, only an accuracy_level of 4 is supported when weights are 8 bits. For such cases, provide a "fallback" execution path like the one already available for 4-bit weights.

This change:

1. Adds an execution path in the "unpacked compute" mode for 8-bit weights in the MatmulNBits kernel, as MLASDequantize already supports 8-bit weights.
   1.1 Hitting the non-MLAS de-quantization path, which doesn't support 8 bits, should be very rare, and an enforce is added to guard against it.
2. Adds logging to alert users that the non-optimal "unpacked compute" mode is being used in the MatmulNBits kernel when MLAS doesn't have an optimized kernel for that configuration.
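
For readers unfamiliar with the idea, the "unpacked compute" fallback amounts to de-quantizing the weights to float and then running an ordinary matrix multiply. The following is only a minimal sketch of that idea, not the code in this PR: `DequantizeBlockwise8Bit` and `UnpackedMatMul` are hypothetical names, and the weight layout (N rows, blocks of `block_size` along K, one scale and one optional zero point per block, default zero point of 128) is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an ONNX Runtime or MLAS API): de-quantize block-wise
// quantized 8-bit weights into a dense row-major [N x K] float matrix.
// Assumed layout: each row stores blocks_per_row * block_size uint8 values
// (padded along K), with one scale and one optional zero point per block.
std::vector<float> DequantizeBlockwise8Bit(const std::vector<uint8_t>& q,
                                           const std::vector<float>& scales,
                                           const std::vector<uint8_t>& zero_points,
                                           size_t N, size_t K, size_t block_size) {
  const size_t blocks_per_row = (K + block_size - 1) / block_size;
  const size_t row_stride = blocks_per_row * block_size;
  assert(q.size() >= N * row_stride);
  assert(scales.size() == N * blocks_per_row);
  assert(zero_points.empty() || zero_points.size() == scales.size());

  std::vector<float> w(N * K);
  for (size_t n = 0; n < N; ++n) {
    for (size_t k = 0; k < K; ++k) {
      const size_t blk = n * blocks_per_row + k / block_size;
      // Assumption: a missing zero point defaults to 128, the midpoint of uint8.
      const float zp = zero_points.empty() ? 128.0f
                                           : static_cast<float>(zero_points[blk]);
      w[n * K + k] = (static_cast<float>(q[n * row_stride + k]) - zp) * scales[blk];
    }
  }
  return w;
}

// Plain float matmul on the de-quantized weights: Y[M x N] = A[M x K] * W^T,
// where W is the row-major [N x K] matrix produced above.
std::vector<float> UnpackedMatMul(const std::vector<float>& a,
                                  const std::vector<float>& w,
                                  size_t M, size_t N, size_t K) {
  assert(a.size() == M * K);
  assert(w.size() == N * K);
  std::vector<float> y(M * N, 0.0f);
  for (size_t m = 0; m < M; ++m) {
    for (size_t n = 0; n < N; ++n) {
      float acc = 0.0f;
      for (size_t k = 0; k < K; ++k) {
        acc += a[m * K + k] * w[n * K + k];
      }
      y[m * N + n] = acc;
    }
  }
  return y;
}
```

A caller would first de-quantize the weights once and then pass the resulting float matrix to the matmul. This costs extra memory and compute compared with a kernel that works on the quantized data directly, which is why the mode is described as non-optimal.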

Motivation and Context

Prevent users from hitting errors when MLAS doesn't have optimized kernels for 8-bit weights.

github-actions

You can commit the suggested changes from lintrunner.

onnxruntime/contrib_ops/cpu/quantization/matmul_nbits.cc

hariharans29 added 2 commits June 4, 2025 18:12

Support 8 bit weights unpacked mode compute in MatmulNBits kernel

9fbda7f

Modify error message

e9b8e7f

hariharans29 changed the title ~~Support 8 bit weights unpacked mode compute in MatmulNBits kernel~~ Support 8 bit weights "unpacked" compute mode in MatmulNBits kernel Jun 5, 2025

hariharans29 requested review from jywu-msft and edgchen1 June 5, 2025 01:24

github-actions bot reviewed Jun 5, 2025

onnxruntime/contrib_ops/cpu/quantization/matmul_nbits.cc (outdated, resolved)

onnxruntime/contrib_ops/cpu/quantization/matmul_nbits.cc (outdated, resolved)

hariharans29 and others added 4 commits June 4, 2025 18:32

Update onnxruntime/contrib_ops/cpu/quantization/matmul_nbits.cc

e3bf04f

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Merge branch 'main' of https://github.com/microsoft/onnxruntime into …

39bbbbd

…hari/matmulnbits_8_bit_fallback

Fix lint issues

8b2c682

Merge branch 'hari/matmulnbits_8_bit_fallback' of https://github.com/…

64f7ac5

…microsoft/onnxruntime into hari/matmulnbits_8_bit_fallback

edgchen1 reviewed Jun 6, 2025

onnxruntime/contrib_ops/cpu/quantization/matmul_nbits.cc (resolved)

onnxruntime/contrib_ops/cpu/quantization/matmul_nbits.cc (outdated, resolved)

hariharans29 added 2 commits June 10, 2025 13:26

PR feedback

954069e

Nit

0723529

edgchen1 reviewed Jun 11, 2025

onnxruntime/contrib_ops/cpu/quantization/matmul_nbits.cc (outdated, resolved)

edgchen1 previously approved these changes Jun 11, 2025

Update onnxruntime/contrib_ops/cpu/quantization/matmul_nbits.cc

6966ff2

PR feedback Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>

hariharans29 dismissed edgchen1’s stale review via 6966ff2 June 11, 2025 03:44

edgchen1 approved these changes Jun 12, 2025

hariharans29 merged commit 3b855e1 into main Jun 12, 2025
89 checks passed

hariharans29 deleted the hari/matmulnbits_8_bit_fallback branch June 12, 2025 18:14
