Commit 9e54a65

JartX authored and charlifu committed
[BUGFIX] GPTQ quantization compatibility for Qwen3 Next MOE models (AutoGPTQ and AutoRound-GPTQ) (vllm-project#25268)
Signed-off-by: JartX <sagformas@epdcenter.es>
Signed-off-by: charlifu <charlifu@amd.com>
1 parent c0d7622 commit 9e54a65

1 file changed, +5 -3 lines changed

vllm/model_executor/models/qwen3_next.py

Lines changed: 5 additions & 3 deletions
@@ -148,9 +148,11 @@ def __init__(
 
     def _maybe_ignore_quant_config(self, quant_config: QuantizationConfig):
         # GPTQ configs do not have a list of ignored modules, however AutoGPTQ
-        # seems to avoid gate quantization.
-        # See: https://huggingface.co/Qwen/Qwen3-30B-A3B-GPTQ-Int4
-        if isinstance(quant_config, (GPTQConfig, GPTQMarlinConfig)):
+        # seems to avoid gate quantization while AutoRound does.
+        if isinstance(
+                quant_config,
+            (GPTQConfig,
+             GPTQMarlinConfig)) and not quant_config.autoround_version:
             return None
         return quant_config
 
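
The effect of the new condition can be sketched in isolation. This is a minimal illustration only, not vLLM code: `FakeGPTQConfig` and `FakeOtherConfig` are hypothetical stand-ins for the real config classes, the version string `"0.6.0"` is an arbitrary example value, and the only thing taken from the diff is the check `isinstance(...) and not quant_config.autoround_version`.

```python
from dataclasses import dataclass


@dataclass
class FakeGPTQConfig:
    """Hypothetical stand-in for GPTQConfig / GPTQMarlinConfig."""
    # Assumption: an empty value means the checkpoint was not made by AutoRound.
    autoround_version: str = ""


@dataclass
class FakeOtherConfig:
    """Hypothetical stand-in for any non-GPTQ quantization config."""
    name: str = "other"


def maybe_ignore_quant_config(quant_config):
    # Same shape as the patched check: drop the config only for GPTQ-style
    # configs that do not carry an AutoRound version.
    if isinstance(quant_config, FakeGPTQConfig) and not quant_config.autoround_version:
        return None
    return quant_config


autogptq_cfg = FakeGPTQConfig()                            # plain AutoGPTQ export
autoround_cfg = FakeGPTQConfig(autoround_version="0.6.0")  # AutoRound export in GPTQ format
other_cfg = FakeOtherConfig()

print(maybe_ignore_quant_config(autogptq_cfg))   # config dropped
print(maybe_ignore_quant_config(autoround_cfg))  # config kept
print(maybe_ignore_quant_config(other_cfg))      # config kept
```

Before the change, both GPTQ-style configs in this sketch would have been dropped. Going by the code comment, returning `None` corresponds to treating the gate as unquantized, which appears to be what AutoGPTQ checkpoints expect but not what AutoRound checkpoints contain.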
