【Inference】Migrate MoE Kernel from Paddle Inner #10063
Conversation
Thanks for your contribution!
Codecov Report
Attention: patch coverage is below target.

@@ Coverage Diff @@
##           develop   #10063      +/-   ##
===========================================
- Coverage    49.97%   49.93%   -0.04%
===========================================
  Files          757      757
  Lines       122498   122586      +88
===========================================
+ Hits         61217    61218       +1
- Misses       61281    61368      +87
yuanlehome left a comment:
LGTM
Before submitting
Add test cases into the tests folder. If there are codecov issues, please add test cases first.
PR types
PR changes
Description
Paddle removes the MoE operator: PR#71610
This PR migrates the framework's MoE operator into a PaddleNLP custom operator and streamlines parts of the code. This speeds up framework compilation, makes it easier to adopt CUTLASS 3.x, provides a basis for supporting MoE kernels at other precisions, and simplifies precision profiling.
Some header files still include Paddle internal headers.
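For readers unfamiliar with what the migrated operator computes, here is a minimal NumPy sketch of MoE (mixture-of-experts) semantics: softmax gating, top-k expert routing, and weighted combination of expert outputs. This illustrates the operator's math only, not the fused CUDA kernel in this PR; all names and shapes are illustrative.

```python
import numpy as np

def moe_forward(x, gate_w, expert_ws, top_k=2):
    """x: [tokens, hidden]; gate_w: [hidden, num_experts];
    expert_ws: list of per-expert [hidden, hidden] weight matrices."""
    logits = x @ gate_w                                   # [tokens, experts]
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)                 # softmax gate
    topk_idx = np.argsort(-probs, axis=-1)[:, :top_k]     # chosen experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for j in topk_idx[t]:                             # dispatch + combine
            out[t] += probs[t, j] * (x[t] @ expert_ws[j])
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8)).astype(np.float32)
gate_w = rng.standard_normal((8, 3)).astype(np.float32)
experts = [rng.standard_normal((8, 8)).astype(np.float32) for _ in range(3)]
y = moe_forward(x, gate_w, experts)
print(y.shape)  # (4, 8)
```

The fused kernel replaces the per-token Python loop with a permute/grouped-GEMM/unpermute pipeline, but the result is mathematically the same combination.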
Unit-test precision is aligned with the original operator.
The deepseek-v2-lite wint4 model produces normal output.
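A precision-alignment check like the one referenced above can be sketched as a tolerance comparison between the original and migrated operators. The function names `moe_ref` and `moe_migrated` below are placeholders, not the actual PaddleNLP test entry points, and the stand-in implementations only differ by accumulation precision.

```python
import numpy as np

def moe_ref(x, w):        # placeholder for the original in-framework operator
    return x @ w

def moe_migrated(x, w):   # placeholder for the migrated custom operator
    return (x.astype(np.float64) @ w.astype(np.float64)).astype(np.float32)

rng = np.random.default_rng(42)
x = rng.standard_normal((16, 32)).astype(np.float32)
w = rng.standard_normal((32, 32)).astype(np.float32)

# Quantized paths (wint8/wint4) would need looser tolerances than fp32.
np.testing.assert_allclose(moe_ref(x, w), moe_migrated(x, w),
                           rtol=1e-3, atol=1e-3)
print("precision aligned")
```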
Benchmark setup: Qwen/Qwen1.5-MoE-A2.7B model, single H-series GPU, 128 concurrent requests, 1000 samples, wint8, input 1152 / output 201, block-bs 28.
Built-in MoE operator:
Operator after migration: