Open
Labels: feature request, rocm
Description
I'm testing RedHatAI/DeepSeek-R1-0528-quantized.w4a16 on 4xMI300X with this command:
vllm serve RedHatAI/DeepSeek-R1-0528-quantized.w4a16 --host 0.0.0.0 --port 3000 --max-model-len 8192 --max-seq-len-to-capture 8192 --enable-chunked-prefill --enable-prefix-caching --trust-remote-code --disable-log-requests --tensor-parallel-size 4 --gpu-memory-utilization 0.95 --served-model-name deepseek-chat
And I get:
'_OpNamespace' '_C' object has no attribute 'gptq_marlin_repack'
I've tried setting VLLM_USE_TRITON_AWQ=1 (it seems to be enabled automatically on ROCm devices anyway), but it looks like there is no gptq_marlin_repack in awq_triton.py, so that didn't help: https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/layers/quantization/awq_triton.py
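For anyone who wants to confirm that the op is simply absent from their build (rather than failing for some other reason), a rough check like the one below might help. It is only a sketch: the helper names `missing_ops` and `check_vllm_ops` are made up for illustration, and it assumes that importing `vllm._C` is what registers the compiled kernels under `torch.ops._C`, which may differ between vLLM versions.

```python
def missing_ops(namespace, op_names):
    """Return the names in op_names that cannot be resolved on namespace."""
    missing = []
    for name in op_names:
        try:
            getattr(namespace, name)
        except AttributeError:
            # Same failure mode as the error above: the attribute lookup fails.
            missing.append(name)
    return missing


def check_vllm_ops(op_names=("gptq_marlin_repack",)):
    """Hypothetical helper: report which ops are missing from torch.ops._C.

    Returns None if torch or the compiled vLLM extension cannot be imported.
    """
    try:
        import torch
        # Assumption: this import registers the compiled ops under torch.ops._C.
        import vllm._C  # noqa: F401
    except (ImportError, OSError):
        return None
    return missing_ops(torch.ops._C, op_names)


if __name__ == "__main__":
    result = check_vllm_ops()
    if result is None:
        print("torch or vllm._C could not be imported")
    elif result:
        print("missing ops:", ", ".join(result))
    else:
        print("all requested ops are registered")
```

If `gptq_marlin_repack` shows up as missing here, that would be consistent with the Marlin kernels not being compiled into the ROCm build.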