[Feature] Support MOE Cutlass backend for latent MOE #7428
chang-wenbin merged 5 commits into PaddlePaddle:develop from
Conversation

Thanks for your contribution!
Codecov Report ✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
## develop #7428 +/- ##
==========================================
Coverage ? 73.95%
==========================================
Files ? 398
Lines ? 54947
Branches ? 8609
==========================================
Hits ? 40634
Misses ? 11594
Partials ? 2719

Flags with carried forward coverage won't be shown.
|
PaddlePaddle-bot
left a comment
🤖 AI Code Review | 2026-04-16 20:26 CST
📋 Review Summary
PR overview: adds Cutlass backend support for latent MOE models, applying fc1/fc2 projection layers before and after the MoE computation.
Scope of changes: model_executor/layers/moe/, model_executor/layers/quantization/
Impact tag: OP
Issues

| Severity | File | Summary |
|---|---|---|
| 🔴 Bug | fused_moe_backend_base.py:244 | The base class `apply()` passes `fc1_latent_proj`/`fc2_latent_proj` positionally to `apply_tp()`, which is incompatible with the signatures of `BlackwellGemmFusedMoeMethod` and `ModelOptNvFp4FusedMoE` |
| 🟡 Suggestion | fused_moe_deepgemm_backend.py:741 | The DeepGemm backend accepts but silently ignores the latent proj parameters |
Overall assessment
The latent projection implementation in the Cutlass backend is clear, and the tests cover the core paths. However, the base class `apply()` calls `apply_tp()` with positional arguments that are incompatible with several subclass signatures, risking runtime crashes and misaligned arguments; this should be fixed before merging.
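For context, the latent-projection data flow this PR wires into the Cutlass backend can be sketched as follows. This is a hypothetical NumPy stand-in with made-up shapes and a placeholder `moe` function, not the actual `CutlassMoEMethod` code (which uses `paddle.nn.Layer` projections):

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, latent = 8, 4
x = rng.standard_normal((2, hidden))          # [tokens, hidden]
fc1 = rng.standard_normal((hidden, latent))   # fc1_latent_proj: down-projection
fc2 = rng.standard_normal((latent, hidden))   # fc2_latent_proj: up-projection

def moe(z):
    # placeholder for the fused expert computation in the latent space
    return np.maximum(z, 0.0)

z = x @ fc1          # projection applied before MoE
y = moe(z) @ fc2     # projection applied after MoE
print(y.shape)       # back in the hidden dimension: (2, 8)
```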
```diff
             )
         else:
-            return self.apply_tp(layer, x, gate, topk_ids_hookfunc)
+            return self.apply_tp(layer, x, gate, topk_ids_hookfunc, fc1_latent_proj, fc2_latent_proj)
```
🔴 Bug The base class `apply()` passes `fc1_latent_proj` and `fc2_latent_proj` to `apply_tp()` as positional arguments, but some subclasses' `apply_tp` signatures are incompatible, which leads to runtime errors:

- `BlackwellGemmFusedMoeMethod.apply_tp()` (fused_moe_blackwell_backend.py:854) has no `fc1_latent_proj`/`fc2_latent_proj` parameters in its signature; passing non-None values raises `TypeError: apply_tp() takes 5 positional arguments but 7 were given`.
- `ModelOptNvFp4FusedMoE.apply_tp()` (nvfp4.py:860) has `shared_experts` as its 5th positional parameter, while the 5th positional argument the base class passes is `fc1_latent_proj`. As a result, `fc1_latent_proj` is mistakenly assigned to `shared_experts`, `fc2_latent_proj` is assigned to `fc1_latent_proj`, and the actual `fc2_latent_proj` is always None.

Suggested fixes (pick one):

Option A: change the base-class call to keyword arguments and make all subclass signatures consistent:

```python
return self.apply_tp(
    layer, x, gate, topk_ids_hookfunc,
    fc1_latent_proj=fc1_latent_proj,
    fc2_latent_proj=fc2_latent_proj,
)
```

Also add `fc1_latent_proj=None, fc2_latent_proj=None` to the `BlackwellGemmFusedMoeMethod.apply_tp()` signature.

Option B (recommended): unify the parameter order of every subclass's `apply_tp` so it matches the base class's abstract method signature.
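Both failure modes can be reproduced in isolation. The classes below are simplified stand-ins that mirror the reported signature shapes, not the real FastDeploy code:

```python
class Base:
    def apply(self, layer, x, gate, hook, fc1_latent_proj=None, fc2_latent_proj=None):
        # Base class forwards the latent projections positionally.
        return self.apply_tp(layer, x, gate, hook, fc1_latent_proj, fc2_latent_proj)

class BlackwellLike(Base):
    # No latent-proj parameters in the signature -> TypeError when they are passed.
    def apply_tp(self, layer, x, gate, hook):
        return "ok"

class NvFp4Like(Base):
    # `shared_experts` occupies the 5th positional slot, so it silently
    # receives fc1_latent_proj instead of a shared-experts module.
    def apply_tp(self, layer, x, gate, hook, shared_experts=None, fc1_latent_proj=None):
        return shared_experts, fc1_latent_proj

try:
    BlackwellLike().apply(None, None, None, None, "fc1", "fc2")
except TypeError as e:
    print("TypeError:", e)

# No crash here, but the arguments are silently shifted by one slot.
print(NvFp4Like().apply(None, None, None, None, "fc1", "fc2"))
# prints ('fc1', 'fc2'): fc1 lands in shared_experts, fc2 in fc1_latent_proj
```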
```python
    topk_ids_hookfunc: Callable = None,
    fc1_latent_proj: nn.Layer = None,
    fc2_latent_proj: nn.Layer = None,
) -> paddle.Tensor:
```
🟡 Suggestion `DeepGemmFusedMoeMethod.apply_tp()` accepts `fc1_latent_proj` and `fc2_latent_proj` but never uses them. The PR description notes the feature is "not implemented", but if a caller passes non-None latent projections, they are silently ignored without any error, which can lead to hard-to-diagnose correctness problems.

Consider adding an explicit warning or exception:

```python
if fc1_latent_proj is not None or fc2_latent_proj is not None:
    raise NotImplementedError(
        "DeepGemm backend does not yet support latent projection. "
        "Please use the Cutlass backend instead."
    )
```
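A self-contained sketch of this fail-loudly pattern, with an illustrative stand-in `apply_tp` rather than the actual DeepGemm backend method:

```python
def apply_tp(layer, x, gate, topk_ids_hookfunc=None,
             fc1_latent_proj=None, fc2_latent_proj=None):
    # Guard: reject unsupported latent projections instead of silently
    # dropping them, so misconfiguration surfaces immediately.
    if fc1_latent_proj is not None or fc2_latent_proj is not None:
        raise NotImplementedError(
            "DeepGemm backend does not yet support latent projection. "
            "Please use the Cutlass backend instead."
        )
    return "deepgemm-moe-result"  # placeholder for the real computation

print(apply_tp(None, None, None))  # regular path still works
try:
    apply_tp(None, None, None, fc1_latent_proj=object())
except NotImplementedError as e:
    print("raised:", e)
```

The cost of the guard is negligible, and it turns a silent correctness bug into an immediate, actionable error at the call site.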
Motivation

Add Cutlass backend support for latent MOE models, allowing fc1/fc2 projection layers to be applied before and after the MoE computation.

Modifications

- Add optional `fc1_latent_proj` and `fc2_latent_proj` parameters to `MoEMethodBase.apply_tp()` and `apply()`.
- Implement the latent projection logic in `CutlassMoEMethod.apply_tp()`.
- Update the method signature of `DeepGemmFusedMoeMethod.apply_tp()` (feature not implemented).

Usage or Command
Accuracy Tests
Checklist
- Tag list: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run `pre-commit` before commit.
- If submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch first, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.