[Feature]: AWQ DeepSeek support on MI300X #19727

@josephrocca

I'm testing RedHatAI/DeepSeek-R1-0528-quantized.w4a16 on 4xMI300X with this command:

vllm serve RedHatAI/DeepSeek-R1-0528-quantized.w4a16 --host 0.0.0.0 --port 3000 --max-model-len 8192 --max-seq-len-to-capture 8192 --enable-chunked-prefill --enable-prefix-caching --trust-remote-code --disable-log-requests --tensor-parallel-size 4 --gpu-memory-utilization 0.95 --served-model-name deepseek-chat

And I get:

'_OpNamespace' '_C' object has no attribute 'gptq_marlin_repack'

I've tried setting VLLM_USE_TRITON_AWQ=1 (which seems to be enabled automatically on ROCm devices anyway), but awq_triton.py doesn't appear to contain a gptq_marlin_repack implementation, so that didn't help: https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/layers/quantization/awq_triton.py
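
For anyone trying to reproduce this, here is a minimal sketch for checking whether a compiled op is actually present in a given build. The helper name `probe_ops` is hypothetical (it is not part of vLLM), and the commented-out usage assumes that vLLM's compiled kernels show up under `torch.ops._C` after `import vllm._C`, which is what the error message above suggests but which I haven't verified for every build.

```python
from types import SimpleNamespace


def probe_ops(namespace, names):
    """Return {name: bool}: whether each attribute resolves on `namespace`.

    `probe_ops` is a hypothetical helper for illustration, not a vLLM API.
    """
    found = {}
    for name in names:
        try:
            getattr(namespace, name)
            found[name] = True
        except (AttributeError, RuntimeError):
            # A missing op surfaces as an attribute lookup failure,
            # like the error reported in this issue.
            found[name] = False
    return found


# Stand-in namespace so the sketch runs without torch or vLLM installed.
fake_ns = SimpleNamespace(some_present_op=object())
print(probe_ops(fake_ns, ["some_present_op", "gptq_marlin_repack"]))

# Against a real install (assumption: ops are registered under torch.ops._C):
#   import torch
#   import vllm._C
#   print(probe_ops(torch.ops._C, ["gptq_marlin_repack"]))
```

If the real check reports `False` for `gptq_marlin_repack`, the op was not compiled into that build, which would be consistent with the error above.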

Related:

Labels: feature request, rocm