[Feature] Support MOE Cutlass backend for latent MOE #7428

Merged: chang-wenbin merged 5 commits into PaddlePaddle:develop from chang-wenbin:mla_latent_moe on Apr 16, 2026

Conversation

chang-wenbin (Collaborator) commented Apr 16, 2026

Motivation

Add Cutlass backend support for latent MOE models, allowing fc1/fc2 projection layers to be applied before and after the MoE computation.

Modifications

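  • Add the optional parameters fc1_latent_proj and fc2_latent_proj to MoEMethodBase.apply_tp() and apply()
  • Implement the latent projection logic in CutlassMoEMethod.apply_tp()
  • Update the method signature of DeepGemmFusedMoeMethod.apply_tp() (the functionality itself is not implemented there)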
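
The data flow behind these changes can be sketched without any framework. This is a minimal illustration and not the PR's implementation: `apply_with_latent_proj` and `moe_compute` are hypothetical names, the projections are modeled as plain callables rather than `nn.Layer` objects, and it is an assumption that fc1 runs before the MoE computation and fc2 after it.

```python
from typing import Callable, List, Optional

Vector = List[float]
Proj = Callable[[Vector], Vector]


def apply_with_latent_proj(
    x: Vector,
    moe_compute: Proj,
    fc1_latent_proj: Optional[Proj] = None,
    fc2_latent_proj: Optional[Proj] = None,
) -> Vector:
    """Hypothetical helper: wrap an MoE computation with optional projections."""
    if fc1_latent_proj is not None:
        x = fc1_latent_proj(x)      # applied before the MoE computation
    out = moe_compute(x)
    if fc2_latent_proj is not None:
        out = fc2_latent_proj(out)  # applied after the MoE computation
    return out


# Toy stand-ins for the real layers (assumptions for illustration only).
pre = lambda v: v[:2]                 # shrink the input to 2 elements
moe = lambda v: [2.0 * e for e in v]  # placeholder for the expert computation
post = lambda v: v + [0.0, 0.0]       # grow the output back to 4 elements

print(apply_with_latent_proj([1.0, 2.0, 3.0, 4.0], moe, pre, post))
print(apply_with_latent_proj([1.0, 2.0], moe))  # both projections omitted
```

Because both projections default to None, callers that do not use a latent MoE model follow the same path as before.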

Usage or Command

Accuracy Tests

Checklist

  • Add at least one tag to the PR title.
    • Tag list: [[FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code and run pre-commit before committing.
  • Add unit tests. If there are none, please state the reason in this PR.
  • Provide accuracy results.
  • If the current PR targets the release branch, make sure it has first been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot bot commented Apr 16, 2026

Thanks for your contribution!

chang-wenbin changed the title from "Support MOE Cutlass backend for latent MOE" to "[Feature] Support MOE Cutlass backend for latent MOE" on Apr 16, 2026

codecov-commenter commented Apr 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@a498720).

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #7428   +/-   ##
==========================================
  Coverage           ?   73.95%           
==========================================
  Files              ?      398           
  Lines              ?    54947           
  Branches           ?     8609           
==========================================
  Hits               ?    40634           
  Misses             ?    11594           
  Partials           ?     2719           
Flag Coverage Δ
GPU 73.95% <100.00%> (?)

zhoutianzi666 previously approved these changes Apr 16, 2026

PaddlePaddle-bot left a comment

🤖 AI Code Review | 2026-04-16 20:26 CST

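📋 Review Summary

PR overview: Add Cutlass backend support for latent MOE models, applying fc1/fc2 projection layers before and after the MoE computation
Scope of changes: model_executor/layers/moe/, model_executor/layers/quantization/
Impact tag: OP

Issues

| Level | File | Summary |
| --- | --- | --- |
| 🔴 Bug | fused_moe_backend_base.py:244 | The base class apply() passes fc1/fc2_latent_proj to apply_tp() as positional arguments, which is incompatible with the signatures of BlackwellGemmFusedMoeMethod and ModelOptNvFp4FusedMoE |
| 🟡 Suggestion | fused_moe_deepgemm_backend.py:741 | The DeepGemm backend accepts but silently ignores the latent proj parameters |

Overall Assessment

The latent projection logic in the Cutlass backend is clear, and the tests cover the core path. However, the base class apply() calls apply_tp() with positional arguments, which is incompatible with the signatures of several subclasses and risks runtime crashes and misbound arguments. This needs to be fixed before merging.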

         )
     else:
-        return self.apply_tp(layer, x, gate, topk_ids_hookfunc)
+        return self.apply_tp(layer, x, gate, topk_ids_hookfunc, fc1_latent_proj, fc2_latent_proj)

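🔴 Bug: The base class apply() passes fc1_latent_proj and fc2_latent_proj to apply_tp() as positional arguments, but the apply_tp signatures of some subclasses are incompatible, which leads to runtime errors:

  1. BlackwellGemmFusedMoeMethod.apply_tp() (fused_moe_blackwell_backend.py:854) has no fc1_latent_proj/fc2_latent_proj parameters in its signature. When non-None values are passed, it raises TypeError: apply_tp() takes 5 positional arguments but 7 were given.

  2. ModelOptNvFp4FusedMoE.apply_tp() (nvfp4.py:860) has shared_experts as its 5th positional parameter, whereas the 5th positional argument passed by the base class is fc1_latent_proj. As a result, fc1_latent_proj is wrongly bound to shared_experts, fc2_latent_proj is bound to fc1_latent_proj, and the actual fc2_latent_proj parameter is always None.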
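
Both failure modes can be reproduced in plain Python. The classes below are hypothetical stand-ins, not the real backends: `NoLatentParams` mimics a subclass whose apply_tp lacks the two new parameters, and `ExtraParamFirst` mimics one that has another parameter in the 5th slot, as described for shared_experts above. The default values in the stand-in signatures are assumptions.

```python
class Base:
    def apply(self, layer, x, gate, topk_ids_hookfunc=None,
              fc1_latent_proj=None, fc2_latent_proj=None):
        # Positional forwarding, mirroring the reviewed line.
        return self.apply_tp(layer, x, gate, topk_ids_hookfunc,
                             fc1_latent_proj, fc2_latent_proj)


class NoLatentParams(Base):
    # Stand-in: signature without the latent projection parameters.
    def apply_tp(self, layer, x, gate, topk_ids_hookfunc=None):
        return "ok"


class ExtraParamFirst(Base):
    # Stand-in: another parameter occupies the 5th positional slot.
    def apply_tp(self, layer, x, gate, topk_ids_hookfunc=None,
                 shared_experts=None, fc1_latent_proj=None,
                 fc2_latent_proj=None):
        return {
            "shared_experts": shared_experts,
            "fc1_latent_proj": fc1_latent_proj,
            "fc2_latent_proj": fc2_latent_proj,
        }


# Case 1: too many positional arguments raises TypeError.
try:
    NoLatentParams().apply("layer", "x", "gate", None, "fc1", "fc2")
except TypeError as exc:
    arity_error = str(exc)
    print("TypeError:", arity_error)

# Case 2: no exception, but every value lands one slot too early.
bound = ExtraParamFirst().apply("layer", "x", "gate", None, "fc1", "fc2")
print(bound)
```

In this sketch the arity check in case 1 depends only on the number of arguments, not on their values, while case 2 fails silently, which is what makes it the harder of the two to notice.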

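Suggested fixes (choose one):

Option A: Change the base class call to use keyword arguments, and make sure all subclass signatures are consistent: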

return self.apply_tp(
    layer, x, gate, topk_ids_hookfunc,
    fc1_latent_proj=fc1_latent_proj,
    fc2_latent_proj=fc2_latent_proj,
)

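Also add fc1_latent_proj=None, fc2_latent_proj=None to the signature of BlackwellGemmFusedMoeMethod.apply_tp().

Option B: Unify the parameter order of apply_tp across all subclasses so that it matches the abstract method signature in the base class (recommended).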
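
A small sketch of why keyword forwarding is robust to parameter order. The class names are hypothetical stand-ins for the real backends, and the signatures are assumptions chosen to mirror the two cases discussed in this comment.

```python
class Base:
    def apply(self, layer, x, gate, topk_ids_hookfunc=None,
              fc1_latent_proj=None, fc2_latent_proj=None):
        # Keyword forwarding: binding no longer depends on parameter order.
        return self.apply_tp(
            layer, x, gate, topk_ids_hookfunc,
            fc1_latent_proj=fc1_latent_proj,
            fc2_latent_proj=fc2_latent_proj,
        )


class ExtraParamFirst(Base):
    # Stand-in: an extra parameter precedes the latent projections.
    def apply_tp(self, layer, x, gate, topk_ids_hookfunc=None,
                 shared_experts=None, fc1_latent_proj=None,
                 fc2_latent_proj=None):
        return (shared_experts, fc1_latent_proj, fc2_latent_proj)


class PatchedBackend(Base):
    # Stand-in: a backend after the two defaults were added to its signature.
    def apply_tp(self, layer, x, gate, topk_ids_hookfunc=None,
                 fc1_latent_proj=None, fc2_latent_proj=None):
        return (fc1_latent_proj, fc2_latent_proj)


reordered = ExtraParamFirst().apply("layer", "x", "gate", None, "fc1", "fc2")
patched = PatchedBackend().apply("layer", "x", "gate", None, "fc1", "fc2")
print(reordered)  # shared_experts stays None; fc1 and fc2 bind by name
print(patched)
```

Keyword forwarding still requires every subclass to accept the two names, so a backend that lacks them would continue to raise TypeError until its signature is updated.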

         topk_ids_hookfunc: Callable = None,
+        fc1_latent_proj: nn.Layer = None,
+        fc2_latent_proj: nn.Layer = None,
     ) -> paddle.Tensor:

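🟡 Suggestion: DeepGemmFusedMoeMethod.apply_tp() accepts the fc1_latent_proj and fc2_latent_proj parameters but never uses them. The PR description notes that the functionality is "not implemented", but if a caller passes a non-None latent projection, it is silently ignored without any error, which can easily lead to correctness problems that are hard to diagnose.

Consider adding an explicit warning or exception: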

if fc1_latent_proj is not None or fc2_latent_proj is not None:
    raise NotImplementedError(
        "DeepGemm backend does not yet support latent projection. "
        "Please use the Cutlass backend instead."
    )
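
For comparison, the warning variant mentioned above could look like the following sketch. `check_latent_proj_unsupported` and its `strict` flag are hypothetical and not part of the PR; with strict=False the projection is still dropped, only no longer silently.

```python
import warnings


def check_latent_proj_unsupported(fc1_latent_proj=None, fc2_latent_proj=None,
                                  strict=True):
    """Hypothetical guard for a backend without latent projection support."""
    if fc1_latent_proj is None and fc2_latent_proj is None:
        return  # nothing was requested, so nothing can be ignored
    message = "This backend does not yet support latent projection."
    if strict:
        raise NotImplementedError(message)
    # Non-strict mode: keep running but make the dropped input visible.
    warnings.warn(message, RuntimeWarning, stacklevel=2)


# Usage: capture the warning instead of printing it to stderr.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_latent_proj_unsupported(fc1_latent_proj=object(), strict=False)
print(len(caught), caught[0].category.__name__)
```

Raising is the safer default for a correctness issue like this one, since a warning can be filtered out or missed in logs.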

K11OntheBoat (Collaborator) left a comment

LGTM

chang-wenbin merged commit 6ce4854 into PaddlePaddle:develop on Apr 16, 2026
34 of 38 checks passed

5 participants