【Inference】Migrate MoE Kernel from Paddle Inner #10063
Conversation
Thanks for your contribution!
Codecov Report
Attention: patch coverage is below target.

@@ Coverage Diff @@
##           develop   #10063      +/-   ##
===========================================
- Coverage    49.97%   49.93%   -0.04%
===========================================
  Files          757      757
  Lines       122498   122586      +88
===========================================
+ Hits         61217    61218       +1
- Misses       61281    61368      +87
yuanlehome left a comment:
LGTM
Before submitting
Add test cases into the tests folder. If there are codecov issues, please add test cases first.
PR types
PR changes
Description
Paddle removes the MoE operator: PR#71610
This PR migrates the framework's MoE operator into a PaddleNLP custom operator and streamlines parts of the code. This speeds up framework compilation, makes it easier to adopt CUTLASS 3.x, provides a basis for supporting MoE kernels at other precisions, and simplifies precision profiling.
Some header files still include Paddle internal headers.
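For readers unfamiliar with what the migrated operator computes, here is a minimal NumPy sketch of MoE (mixture-of-experts) semantics: softmax gating, top-k expert routing, and weighted combination of expert outputs. This illustrates the operator's math only, not the fused CUDA kernel in this PR; all names and shapes are illustrative.

```python
import numpy as np

def moe_forward(x, gate_w, expert_ws, top_k=2):
    """x: [tokens, hidden]; gate_w: [hidden, num_experts];
    expert_ws: list of per-expert [hidden, hidden] weight matrices."""
    logits = x @ gate_w                                   # [tokens, experts]
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)                 # softmax gate
    topk_idx = np.argsort(-probs, axis=-1)[:, :top_k]     # chosen experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for j in topk_idx[t]:                             # dispatch + combine
            out[t] += probs[t, j] * (x[t] @ expert_ws[j])
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8)).astype(np.float32)
gate_w = rng.standard_normal((8, 3)).astype(np.float32)
experts = [rng.standard_normal((8, 8)).astype(np.float32) for _ in range(3)]
y = moe_forward(x, gate_w, experts)
print(y.shape)  # (4, 8)
```

The fused kernel replaces the per-token Python loop with a permute/grouped-GEMM/unpermute pipeline, but the result is mathematically the same combination.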
Unit-test precision is aligned with the original operator.
The deepseek-v2-lite wint4 model produces normal output.
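A precision-alignment check like the one referenced above can be sketched as a tolerance comparison between the original and migrated operators. The function names `moe_ref` and `moe_migrated` below are placeholders, not the actual PaddleNLP test entry points, and the stand-in implementations only differ by accumulation precision.

```python
import numpy as np

def moe_ref(x, w):        # placeholder for the original in-framework operator
    return x @ w

def moe_migrated(x, w):   # placeholder for the migrated custom operator
    return (x.astype(np.float64) @ w.astype(np.float64)).astype(np.float32)

rng = np.random.default_rng(42)
x = rng.standard_normal((16, 32)).astype(np.float32)
w = rng.standard_normal((32, 32)).astype(np.float32)

# Quantized paths (wint8/wint4) would need looser tolerances than fp32.
np.testing.assert_allclose(moe_ref(x, w), moe_migrated(x, w),
                           rtol=1e-3, atol=1e-3)
print("precision aligned")
```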
Benchmark setup: Qwen/Qwen1.5-MoE-A2.7B model, single H-series GPU, 128 concurrent requests, 1000 samples, wint8, input 1152 / output 201, block-bs 28.
Built-in MoE operator:
Operator after migration: