[ROCm] [AITER] [Bugfix] Patch for AITER commit 648764942e552a8bb5fe16026703716a81f05374
#18990
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
cc: @842974287
@houseroad
…6026703716a81f05374` (vllm-project#18990) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
…6026703716a81f05374` (vllm-project#18990) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Signed-off-by: minpeter <kali2005611@gmail.com>
This PR is a follow-up bugfix to PR (#18596). Merge this ONLY after PR (#18596) has been merged.
This PR also upgrades the AITER commit used in the Dockerfile.
PR (#18596) introduced the use of AITER MHA, which depends on a new AITER commit, `648764942e552a8bb5fe16026703716a81f05374`.
AITER commit ROCm/aiter@a02a93d introduced a new enum value in a backward-incompatible way.
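As an illustration only, here is one common way a newly added enum value can break callers: a member inserted in the middle shifts the integer values of the members that follow it. The enum and member names below (`QuantKindOld`, `QuantKindNew`, `PER_GROUP`, and so on) are hypothetical and are not taken from AITER; the sketch does not claim this is the exact failure fixed by this PR.

```python
from enum import IntEnum


# Hypothetical enum as it looked before the upstream change.
class QuantKindOld(IntEnum):
    NO = 0
    PER_TENSOR = 1
    PER_TOKEN = 2
    PER_BLOCK = 3


# Hypothetical enum after a member is inserted before PER_BLOCK.
class QuantKindNew(IntEnum):
    NO = 0
    PER_TENSOR = 1
    PER_TOKEN = 2
    PER_GROUP = 3  # newly inserted member (hypothetical)
    PER_BLOCK = 4  # value shifted from 3 to 4


# A caller that hardcoded the old integer now selects the wrong member.
HARDCODED_PER_BLOCK = 3


def resolve_by_name(enum_cls, name):
    """Look a member up by name, which survives renumbering."""
    return enum_cls[name]


if __name__ == "__main__":
    print(QuantKindOld(HARDCODED_PER_BLOCK).name)
    print(QuantKindNew(HARDCODED_PER_BLOCK).name)
    print(resolve_by_name(QuantKindNew, "PER_BLOCK").value)
```

Referring to members by name rather than by integer value keeps the caller stable across this kind of renumbering.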
lm_eval after fix
Qwen/Qwen3-235B-A22B-FP8
mistralai/Mixtral-8x7B-Instruct-v0.1
mistralai/Mixtral-8x7B-Instruct-v0.1 dynamic fp8 quantization
deepseek-ai/DeepSeek-V3
Note:
Although AITER commit `648764942e552a8bb5fe16026703716a81f05374` introduced a new input argument, `min_seqlen_q`, to `flash_attn_varlen_func`, its default value appears to be `0`, which retains compatibility with how MHA is used in the ROCm MLA v1 class. Refer to https://github.com/ROCm/aiter/blob/2c1a21adad9c5b5e02619c7dd05d63f9afda3642/aiter/ops/mha.py#L1369
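The compatibility argument in the note can be sketched in plain Python: adding a keyword parameter with a default leaves existing call sites valid. The two functions below are simplified, hypothetical stand-ins and do not reproduce the real `flash_attn_varlen_func` signature, which takes more arguments; only the `min_seqlen_q=0` default is taken from the note above.

```python
import inspect


# Hypothetical stand-in for the function before the new argument existed.
def varlen_attn_old(q, k, v, cu_seqlens_q, cu_seqlens_k):
    return "old"


# Hypothetical stand-in after the change: one extra argument with a default.
def varlen_attn_new(q, k, v, cu_seqlens_q, cu_seqlens_k, min_seqlen_q=0):
    return f"new(min_seqlen_q={min_seqlen_q})"


def accepts_call(func, *args, **kwargs):
    """Return True if func's signature accepts the given arguments."""
    try:
        inspect.signature(func).bind(*args, **kwargs)
    except TypeError:
        return False
    return True


# Placeholder arguments; a real caller would pass tensors.
legacy_args = ("q", "k", "v", "cu_q", "cu_k")

if __name__ == "__main__":
    # A call site written against the old signature still binds.
    print(accepts_call(varlen_attn_new, *legacy_args))
    print(varlen_attn_new(*legacy_args))
    # The reverse is not true: the old function rejects the new keyword.
    print(accepts_call(varlen_attn_old, *legacy_args, min_seqlen_q=0))
```

A check like `accepts_call` only verifies that the call binds; whether the default value preserves the previous numerical behavior still has to be confirmed against the library source, as the linked line does.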