
Group_topk: moe_fused_gate support num expert is not power of 2#2604

Merged
valarLip merged 1 commit into main from jun/group_topk on Apr 3, 2026

Conversation

@junhaha666 (Contributor)

Tested on MI308 (benchmark screenshots attached in the PR).


Copilot AI review requested due to automatic review settings April 3, 2026 05:45

github-actions bot commented Apr 3, 2026

🏷️ CI Guide

Runs automatically on every PR:

  • ✅ Pre-checks (submodule verification, code formatting)
  • ✅ Aiter op tests (gfx942 + gfx950)
  • ✅ Triton tests (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

| Label | Tests |
| --- | --- |
| `ci:triton-355` | Run Triton tests on MI355 in addition to MI325 |
| `ci:sglang` | SGLang integration tests |
| `ci:atom` | ATOM benchmark (DeepSeek-R1 + GPT-OSS) |
| `ci:vllm` | vLLM benchmark |
| `ci:all` | All of the above |

Add labels via the sidebar or `gh pr edit 2604 --add-label <label>`

@junhaha666 junhaha666 requested a review from a team April 3, 2026 05:49

Copilot AI left a comment


Pull request overview

This PR extends the MoE fused gating/top-k path to handle non-power-of-two expert counts by relaxing the prior host-side constraint and adding specialized dispatch configurations, and updates the Python wrapper to drop the previous “power-of-2 experts only” limitation.

Changes:

  • Relaxed moe_fused_gate’s host-side “num_experts must be power-of-2” constraint and added tuned specializations for num_experts = 192 and 96.
  • Adjusted moe_fused_gate_impl loading logic to use scalar loads for dynamic params and vector loads for static params.
  • Updated biased_grouped_topk routing logic to no longer force a fallback for non-power-of-two expert counts.
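The routing change in the wrapper can be sketched as follows. This is a minimal illustrative sketch, not aiter's actual code: `is_supported_expert_count` is a hypothetical helper, and the specialized counts (96, 192) are taken from the PR description.

```python
def is_power_of_two(n: int) -> bool:
    # Standard bit trick: a positive power of two has exactly one bit set,
    # so n & (n - 1) clears that bit and yields zero.
    return n > 0 and (n & (n - 1)) == 0


def is_supported_expert_count(num_experts: int) -> bool:
    # Hypothetical helper illustrating the routing decision.
    # Before this PR: only power-of-2 expert counts took the fused path.
    # After: tuned specializations such as 96 and 192 are also accepted,
    # so the wrapper no longer forces a fallback for them.
    return is_power_of_two(num_experts) or num_experts in (96, 192)
```

With such a predicate, `biased_grouped_topk` could dispatch to `moe_fused_gate` for 96 or 192 experts instead of falling back to the unfused path.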

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

File Description
csrc/kernels/moe_fused_gate.cu Adds new expert-count specializations, changes input/bias loading strategy, and introduces a HIP device guard; removes (comments out) the power-of-2 check.
aiter/ops/topk.py Updates wrapper routing logic to use moe_fused_gate without checking num_experts power-of-2; updates copyright year.
Comments suppressed due to low confidence (1)

csrc/kernels/moe_fused_gate.cu:581

  • The comment above the divisibility check is incorrect: num_experts % num_expert_group == 0 does not imply num_expert_group is a power of 2. Also, multithread_reduce(..., thread_num) in hip_reduce.h only has explicit implementations for thread_num in {1,2,4,8,16,32}, so non-power-of-two num_expert_group will fall through and produce incorrect results. Please add a TORCH_CHECK that num_expert_group is a supported power-of-two (and ideally <= 32), and update/remove the misleading comment.
```cpp
// Check 2: Ensure that num_experts is divisible by num_expert_group. (this also means
// num_expert_group is power of 2)
TORCH_CHECK(num_experts % num_expert_group == 0,
            "num_experts must be divisible by num_expert_group, but got ",
            num_experts,
            " / ",
            num_expert_group);
```
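The reviewer's point is that divisibility does not imply a power of two (e.g. 192 % 3 == 0, but 3 is not a power of two), so the host-side check should validate `num_expert_group` directly. A minimal sketch of the suggested condition, written here in Python for clarity (the actual check would be a `TORCH_CHECK` in the C++ host code):

```python
def is_valid_expert_group(num_expert_group: int) -> bool:
    # Per the review comment, multithread_reduce only has specializations
    # for thread_num in {1, 2, 4, 8, 16, 32}, so the group count must be
    # a power of two no larger than 32. The bit trick num & (num - 1) == 0
    # holds exactly when a positive integer has a single bit set.
    return (0 < num_expert_group <= 32
            and (num_expert_group & (num_expert_group - 1)) == 0)
```

For example, `num_experts = 192` with `num_expert_group = 3` passes the divisibility check but would produce incorrect reductions; the stricter predicate rejects it.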


Review comment threads: csrc/kernels/moe_fused_gate.cu (5), aiter/ops/topk.py (2)
@valarLip valarLip merged commit a381890 into main Apr 3, 2026
37 of 39 checks passed
@valarLip valarLip deleted the jun/group_topk branch April 3, 2026 10:35
