
Support 8 bit weights "unpacked" compute mode in MatmulNBits kernel #24959


Merged: 9 commits merged into main from hari/matmulnbits_8_bit_fallback, Jun 12, 2025

Conversation

@hariharans29 (Member) commented Jun 5, 2025

Description

MLAS doesn't have optimized kernel implementations for 8-bit weights covering all user-requested configurations (as allowed by the MatMulNBits op kernel spec). For example, only an accuracy_level of 4 is supported when weights are 8-bit. For such cases, this change provides a "fallback" execution path like the one already available for 4-bit weights.

This change:

  1. Adds an execution path in the "unpacked compute" mode for 8-bit weights in the MatmulNBits kernel, since MLAS de-quantization already supports 8-bit weights.
    1.1 Hitting the non-MLAS de-quantization path, which doesn't support 8 bits, should be very rare; an enforce is added to guard against it.

  2. Adds logging to alert users that the non-optimal "unpacked compute" mode is being used in the MatmulNBits kernel when MLAS doesn't have an optimized kernel for that configuration.
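The "unpacked compute" fallback described above boils down to de-quantizing the 8-bit weights back to float and running an ordinary float matmul, rather than using a fused quantized kernel. A minimal NumPy sketch of the idea, assuming block-wise quantization with per-block scales and zero points (the helper names and layout here are illustrative, not ONNX Runtime or MLAS APIs):

```python
import numpy as np

def dequantize_8bit(qweight, scales, zero_points, block_size):
    """De-quantize block-wise 8-bit weights back to float32.

    qweight:     (N, K) uint8 quantized weights
    scales:      (N, K // block_size) float32, one scale per block
    zero_points: (N, K // block_size) uint8, one zero point per block
    """
    N, K = qweight.shape
    n_blocks = K // block_size
    q = qweight.astype(np.float32).reshape(N, n_blocks, block_size)
    s = scales.reshape(N, n_blocks, 1)
    zp = zero_points.astype(np.float32).reshape(N, n_blocks, 1)
    return ((q - zp) * s).reshape(N, K)

def matmul_nbits_fallback(A, qweight, scales, zero_points, block_size):
    # "Unpacked compute": materialize the float weights once, then run a
    # plain GEMM, instead of computing directly on the packed 8-bit data.
    B = dequantize_8bit(qweight, scales, zero_points, block_size)
    return A @ B.T  # A: (M, K), B: (N, K) -> (M, N)
```

This is slower and uses more memory than a fused quantized kernel, which is why the PR also logs a warning when this mode is chosen; the point of the change is that an 8-bit configuration without an optimized MLAS kernel now takes this path instead of erroring out.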

Motivation and Context

Prevents users from hitting errors when MLAS doesn't provide optimized kernels for 8-bit weights.

@hariharans29 hariharans29 changed the title Support 8 bit weights unpacked mode compute in MatmulNBits kernel Support 8 bit weights "unpacked" compute mode in MatmulNBits kernel Jun 5, 2025
@github-actions (Contributor) left a comment:


You can commit the suggested changes from lintrunner.

hariharans29 and others added 4 commits June 4, 2025 18:32
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
edgchen1 previously approved these changes Jun 11, 2025
PR feedback

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
@hariharans29 hariharans29 merged commit 3b855e1 into main Jun 12, 2025
89 checks passed
@hariharans29 hariharans29 deleted the hari/matmulnbits_8_bit_fallback branch June 12, 2025 18:14

2 participants