[Feature] Support MOE Cutlass backend for latent MOE #7428
chang-wenbin merged 5 commits into PaddlePaddle:develop from
Conversation

Thanks for your contribution!
Codecov Report ✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
## develop #7428 +/- ##
==========================================
Coverage ? 73.95%
==========================================
Files ? 398
Lines ? 54947
Branches ? 8609
==========================================
Hits ? 40634
Misses ? 11594
Partials ? 2719

Flags with carried forward coverage won't be shown.
|
PaddlePaddle-bot
left a comment
🤖 AI Code Review | 2026-04-16 20:26 CST
📋 Review Summary
PR overview: adds Cutlass backend support for latent MOE models, applying fc1/fc2 projection layers before and after the MoE computation.
Scope of changes: model_executor/layers/moe/, model_executor/layers/quantization/
Impact tag: OP
Issues

| Severity | File | Summary |
|---|---|---|
| 🔴 Bug | fused_moe_backend_base.py:244 | The base class `apply()` passes `fc1_latent_proj`/`fc2_latent_proj` positionally to `apply_tp()`, which is incompatible with the signatures of `BlackwellGemmFusedMoeMethod` and `ModelOptNvFp4FusedMoE` |
| 🟡 Suggestion | fused_moe_deepgemm_backend.py:741 | The DeepGemm backend accepts but silently ignores the latent proj parameters |
Overall assessment
The latent projection implementation in the Cutlass backend is clear, and the tests cover the core paths. However, the base class `apply()` calls `apply_tp()` with positional arguments that are incompatible with several subclass signatures, risking runtime crashes and misaligned arguments; this should be fixed before merging.
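For context, the latent-projection data flow this PR wires into the Cutlass backend can be sketched as follows. This is a hypothetical NumPy stand-in with made-up shapes and a placeholder `moe` function, not the actual `CutlassMoEMethod` code (which uses `paddle.nn.Layer` projections):

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, latent = 8, 4
x = rng.standard_normal((2, hidden))          # [tokens, hidden]
fc1 = rng.standard_normal((hidden, latent))   # fc1_latent_proj: down-projection
fc2 = rng.standard_normal((latent, hidden))   # fc2_latent_proj: up-projection

def moe(z):
    # placeholder for the fused expert computation in the latent space
    return np.maximum(z, 0.0)

z = x @ fc1          # projection applied before MoE
y = moe(z) @ fc2     # projection applied after MoE
print(y.shape)       # back in the hidden dimension: (2, 8)
```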
```diff
             )
         else:
-            return self.apply_tp(layer, x, gate, topk_ids_hookfunc)
+            return self.apply_tp(layer, x, gate, topk_ids_hookfunc, fc1_latent_proj, fc2_latent_proj)
```
🔴 Bug The base class `apply()` passes `fc1_latent_proj` and `fc2_latent_proj` to `apply_tp()` as positional arguments, but some subclasses' `apply_tp` signatures are incompatible, which leads to runtime errors:

- `BlackwellGemmFusedMoeMethod.apply_tp()` (fused_moe_blackwell_backend.py:854) has no `fc1_latent_proj`/`fc2_latent_proj` parameters in its signature; passing non-None values raises `TypeError: apply_tp() takes 5 positional arguments but 7 were given`.
- `ModelOptNvFp4FusedMoE.apply_tp()` (nvfp4.py:860) has `shared_experts` as its 5th positional parameter, while the 5th positional argument the base class passes is `fc1_latent_proj`. As a result, `fc1_latent_proj` is mistakenly assigned to `shared_experts`, `fc2_latent_proj` is assigned to `fc1_latent_proj`, and the actual `fc2_latent_proj` is always None.

Suggested fixes (pick one):

Option A: change the base-class call to keyword arguments and make all subclass signatures consistent:

```python
return self.apply_tp(
    layer, x, gate, topk_ids_hookfunc,
    fc1_latent_proj=fc1_latent_proj,
    fc2_latent_proj=fc2_latent_proj,
)
```

Also add `fc1_latent_proj=None, fc2_latent_proj=None` to the `BlackwellGemmFusedMoeMethod.apply_tp()` signature.

Option B (recommended): unify the parameter order of every subclass's `apply_tp` so it matches the base class's abstract method signature.
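Both failure modes can be reproduced in isolation. The classes below are simplified stand-ins that mirror the reported signature shapes, not the real FastDeploy code:

```python
class Base:
    def apply(self, layer, x, gate, hook, fc1_latent_proj=None, fc2_latent_proj=None):
        # Base class forwards the latent projections positionally.
        return self.apply_tp(layer, x, gate, hook, fc1_latent_proj, fc2_latent_proj)

class BlackwellLike(Base):
    # No latent-proj parameters in the signature -> TypeError when they are passed.
    def apply_tp(self, layer, x, gate, hook):
        return "ok"

class NvFp4Like(Base):
    # `shared_experts` occupies the 5th positional slot, so it silently
    # receives fc1_latent_proj instead of a shared-experts module.
    def apply_tp(self, layer, x, gate, hook, shared_experts=None, fc1_latent_proj=None):
        return shared_experts, fc1_latent_proj

try:
    BlackwellLike().apply(None, None, None, None, "fc1", "fc2")
except TypeError as e:
    print("TypeError:", e)

# No crash here, but the arguments are silently shifted by one slot.
print(NvFp4Like().apply(None, None, None, None, "fc1", "fc2"))
# prints ('fc1', 'fc2'): fc1 lands in shared_experts, fc2 in fc1_latent_proj
```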
```python
    topk_ids_hookfunc: Callable = None,
    fc1_latent_proj: nn.Layer = None,
    fc2_latent_proj: nn.Layer = None,
) -> paddle.Tensor:
```
🟡 Suggestion `DeepGemmFusedMoeMethod.apply_tp()` accepts `fc1_latent_proj` and `fc2_latent_proj` but never uses them. The PR description notes the feature is "not implemented", but if a caller passes non-None latent projections, they are silently ignored without any error, which can lead to hard-to-diagnose correctness problems.

Consider adding an explicit warning or exception:

```python
if fc1_latent_proj is not None or fc2_latent_proj is not None:
    raise NotImplementedError(
        "DeepGemm backend does not yet support latent projection. "
        "Please use the Cutlass backend instead."
    )
```
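A self-contained sketch of this fail-loudly pattern, with an illustrative stand-in `apply_tp` rather than the actual DeepGemm backend method:

```python
def apply_tp(layer, x, gate, topk_ids_hookfunc=None,
             fc1_latent_proj=None, fc2_latent_proj=None):
    # Guard: reject unsupported latent projections instead of silently
    # dropping them, so misconfiguration surfaces immediately.
    if fc1_latent_proj is not None or fc2_latent_proj is not None:
        raise NotImplementedError(
            "DeepGemm backend does not yet support latent projection. "
            "Please use the Cutlass backend instead."
        )
    return "deepgemm-moe-result"  # placeholder for the real computation

print(apply_tp(None, None, None))  # regular path still works
try:
    apply_tp(None, None, None, fc1_latent_proj=object())
except NotImplementedError as e:
    print("raised:", e)
```

The cost of the guard is negligible, and it turns a silent correctness bug into an immediate, actionable error at the call site.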
Motivation

Add Cutlass backend support for latent MOE models, allowing fc1/fc2 projection layers to be applied before and after the MoE computation.

Modifications

- Add optional `fc1_latent_proj` and `fc2_latent_proj` parameters to `MoEMethodBase.apply_tp()` and `apply()`.
- Implement the latent projection logic in `CutlassMoEMethod.apply_tp()`.
- Update the method signature of `DeepGemmFusedMoeMethod.apply_tp()` (feature not implemented).

Usage or Command
Accuracy Tests
Checklist
- Tag list: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run `pre-commit` before commit.
- If submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch first, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.