[Bugfix] Enable PP with AITER+V1 #19822
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Warning: Gemini encountered an error creating the summary. You can try again by commenting /gemini summary.
/gemini summary
Summary of Changes
This pull request primarily focuses on enabling Pipeline Parallelism (PP) with AITER and V1 by addressing two key areas: fixing an AITER Multi-head Latent Attention (MLA) setting error and properly integrating the AITER RMSNorm operations. The changes ensure that AITER-based normalization functions are correctly registered and dispatched as custom PyTorch operations, and that the MLA decode logic behaves as expected for V1.
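As a rough, generic illustration of what "registered and dispatched as custom PyTorch operations" means (plain PyTorch ≥ 2.4 API, not code from this PR; the `demo::rms_norm` op name and the RMSNorm body are made up for the example):

```python
import torch


# Register a function as a custom op with a "fake" implementation so it can
# be traced and compiled. This is the general mechanism that lets
# device-specific kernels (such as AITER's) be dispatched as regular torch ops.
@torch.library.custom_op("demo::rms_norm", mutates_args=())
def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float) -> torch.Tensor:
    variance = x.pow(2).mean(dim=-1, keepdim=True)
    return x * torch.rsqrt(variance + eps) * weight


@rms_norm.register_fake
def _(x: torch.Tensor, weight: torch.Tensor, eps: float) -> torch.Tensor:
    # Shape/dtype-only implementation used during tracing and compilation.
    return torch.empty_like(x)


x = torch.randn(2, 8)
w = torch.ones(8)
out = torch.ops.demo.rms_norm(x, w, 1e-6)
```

The fake implementation only produces correctly shaped empty tensors, which is all that tracing and compilation need when the real kernel is not available on the current device.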
/gemini review
Code Review
This pull request aims to enable Pipeline Parallelism with AITER for V1. The key changes include:
- Fixing an AITER MLA setting error by ensuring `max_seqlen_qo` is set to 1 in the decode forward pass in `rocm_aiter_mla.py`.
- Enabling AITER RMSNorm for V1 by refactoring `layernorm.py` to use `direct_register_custom_op` for `rocm_aiter_rms_norm` and `rocm_aiter_fused_add_rms_norm`. This involves renaming implementation functions and adding fake implementations for compatibility.

The changes in `layernorm.py` appear to be a clean refactor for custom op registration. The fix in `rocm_aiter_mla.py` directly addresses the described MLA setting error. Minor formatting changes are also included in other model files.
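Below is a minimal sketch of the registration pattern the review describes, assuming vLLM's `direct_register_custom_op` helper takes `(op_name, op_func, mutates_args, fake_impl)` arguments and registers the op in the `vllm` namespace; the kernel body is a plain reference RMSNorm stand-in rather than the actual AITER call, and none of this is the PR's literal diff:

```python
import torch

from vllm.utils import direct_register_custom_op  # location may vary by vLLM version


def rocm_aiter_rms_norm_impl(x: torch.Tensor, weight: torch.Tensor,
                             variance_epsilon: float) -> torch.Tensor:
    # Stand-in body: a plain reference RMSNorm. The real op would call the
    # AITER kernel instead.
    variance = x.pow(2).mean(dim=-1, keepdim=True)
    return x * torch.rsqrt(variance + variance_epsilon) * weight


def rocm_aiter_rms_norm_fake(x: torch.Tensor, weight: torch.Tensor,
                             variance_epsilon: float) -> torch.Tensor:
    # Fake (meta) implementation: only shapes and dtypes matter for tracing.
    return torch.empty_like(x)


direct_register_custom_op(
    op_name="rocm_aiter_rms_norm",
    op_func=rocm_aiter_rms_norm_impl,
    mutates_args=[],
    fake_impl=rocm_aiter_rms_norm_fake,
)

# Once registered, the op is dispatched like any other torch custom op:
#   out = torch.ops.vllm.rocm_aiter_rms_norm(x, weight, 1e-6)
```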
Purpose
Enable Pipeline Parallelism with AITER + V1.
Problem resolved
This command:
VLLM_ROCM_USE_AITER=1 VLLM_ROCM_USE_AITER_RMSNORM=0 VLLM_USE_V1=1 vllm serve /models/DeepSeek-R1/ -pp 8 -tp 1 --block-size 1 --max-model-len 32768 --disable-log-requests --distributed-executor-backend mp
fails due to a bad max_seqlen_qo setting. This PR fixes that problem.
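For context, here is a hedged sketch of the kind of fix described: in the V1 engine each decode step processes exactly one query token per sequence, so `max_seqlen_qo` can simply be pinned to 1 before invoking the MLA decode kernel. The names `mla_decode_fwd_stub` and `forward_decode` are illustrative stand-ins, not the PR's actual functions:

```python
import torch


def mla_decode_fwd_stub(q: torch.Tensor, kv_cache: torch.Tensor,
                        max_seqlen_qo: int) -> torch.Tensor:
    """Stand-in for the AITER MLA decode kernel (an assumption, not the real API)."""
    assert max_seqlen_qo == 1, "V1 decode processes one query token per sequence"
    return q  # placeholder output


def forward_decode(q: torch.Tensor, kv_cache: torch.Tensor) -> torch.Tensor:
    # In the V1 engine each decode step has exactly one query token per
    # sequence, so max_seqlen_qo must be 1; a stale or unset value is the
    # kind of bad setting that broke PP + AITER in the reported command.
    return mla_decode_fwd_stub(q, kv_cache, max_seqlen_qo=1)


if __name__ == "__main__":
    q = torch.randn(4, 1, 16)          # [batch, 1 decode token, head_dim]
    kv_cache = torch.randn(4, 128, 16)  # toy cache, shapes for illustration only
    print(forward_decode(q, kv_cache).shape)
```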