Update some HIP kernels to support different warp_size (topksoftmax/grouptopk, cache, sample) by junhaha666 · Pull Request #2599 · ROCm/aiter

junhaha666 · 2026-04-02T15:05:40Z

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

github-actions · 2026-04-02T15:06:36Z

🏷️ CI Guide

Runs automatically on every PR:

✅ Pre-checks (submodule verification, code formatting)
✅ Aiter op tests (gfx942 + gfx950)
✅ Triton tests (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

Label	Tests
`ci:triton-355`	Run Triton tests on MI355 in addition to MI325
`ci:sglang`	SGLang integration tests
`ci:atom`	ATOM benchmark (DeepSeek-R1 + GPT-OSS)
`ci:vllm`	vLLM benchmark
`ci:all`	All of the above

Add labels via the sidebar or with `gh pr edit 2599 --add-label <label>`.

Copilot

Pull request overview

This PR updates several HIP kernels and helpers to better handle GPUs with different warp (wavefront) sizes by replacing hard-coded 64 assumptions with WARP_SIZE / runtime queries, and tightening architecture gating for certain asm paths.
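To show why a hard-coded 64 breaks portability, here is a minimal host-side C++ sketch of a butterfly (XOR-shuffle) sum reduction parameterized on the warp size. It is a model under stated assumptions, not aiter's `wave_reduce`: the name `wave_reduce_sum_sim` is hypothetical, and the inner loop only emulates what a device shuffle such as `__shfl_xor` would return. The number of steps is log2 of the warp size (5 on wave32, 6 on wave64). A starting offset of 32, which is what a wave64 assumption implies, would address lanes that do not exist on wave32 hardware. Deriving it from the template parameter avoids that.

```cpp
#include <array>
#include <cassert>

// Hypothetical host-side model of a wave-level butterfly sum reduction.
// It emulates shuffle semantics on a plain array; it is not aiter's wave_reduce.
template <int WarpSize>
std::array<float, WarpSize> wave_reduce_sum_sim(std::array<float, WarpSize> lanes) {
    static_assert(WarpSize > 0 && (WarpSize & (WarpSize - 1)) == 0,
                  "warp size must be a power of two");
    // The first offset is derived from WarpSize (16 on wave32, 32 on wave64),
    // so lane ^ offset always stays inside the wave.
    for (int offset = WarpSize / 2; offset > 0; offset /= 2) {
        std::array<float, WarpSize> next{};
        for (int lane = 0; lane < WarpSize; ++lane) {
            // Emulates __shfl_xor(value, offset): add the partner lane's value.
            next[lane] = lanes[lane] + lanes[lane ^ offset];
        }
        lanes = next;
    }
    // After log2(WarpSize) steps every lane holds the full sum.
    return lanes;
}
```

With every lane set to 1.0f, each lane ends up holding 32 in the wave32 instantiation and 64 in the wave64 one.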

Changes:

Make topk-softmax (including grouped topk) kernels compute warp-related launch/reduction behavior using WARP_SIZE / get_warp_size_func() instead of assuming wave64.
Update cache reshape+quant kernels and sampling kernels to remove hard-coded warp-size template parameters and use shared warp-size helpers.
Gate MoE asm topk-softmax usage/tests to specific gfx targets (gfx942, gfx950).
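The launch-side effect of the first change can be sketched as follows. This assumes a FasterTransformer-style topk-softmax layout, where each row of `num_experts` gate logits is split across a group of consecutive lanes. `LaunchGeometry`, `compute_geometry` and all parameter names are hypothetical and are not taken from the PR. The sketch only demonstrates that rows-per-warp, rows-per-CTA and grid size all depend on the warp size. Computing them from a fixed 64 on wave32 hardware would assign each warp more rows than it has lanes for and undersize the grid.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical launch geometry for a kernel that maps one row to a group of lanes.
struct LaunchGeometry {
    int threads_per_row;  // lanes cooperating on one row
    int rows_per_warp;    // rows one wave processes at a time
    int rows_per_cta;     // rows one thread block processes
    int grid_size;        // number of thread blocks to launch
};

// warp_size is a runtime argument so the same host code serves wave32 and wave64.
LaunchGeometry compute_geometry(int num_rows, int num_experts, int elems_per_thread,
                                int warp_size, int warps_per_cta) {
    if (num_experts <= 0 || elems_per_thread <= 0 || num_experts % elems_per_thread != 0)
        throw std::invalid_argument("num_experts must be a positive multiple of elems_per_thread");
    LaunchGeometry g{};
    g.threads_per_row = num_experts / elems_per_thread;
    if (g.threads_per_row > warp_size || warp_size % g.threads_per_row != 0)
        throw std::invalid_argument("a row must fit evenly inside one warp");
    g.rows_per_warp = warp_size / g.threads_per_row;
    g.rows_per_cta = g.rows_per_warp * warps_per_cta;
    g.grid_size = (num_rows + g.rows_per_cta - 1) / g.rows_per_cta;  // ceiling division
    return g;
}
```

For 1024 rows, 256 experts, 8 experts per thread and 4 warps per CTA, this gives 2 rows per warp and a grid of 128 blocks at wave64. At wave32, it gives 1 row per warp and 256 blocks.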

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.

File	Description
op_tests/test_moeTopkSoftmax.py	Skips asm path except on `gfx942/gfx950` to match supported targets.
csrc/kernels/topk_softmax_kernels.cu	Makes launch bounds and load vectorization more warp-size aware; adjusts BYTES_PER_LDG selection.
csrc/kernels/topk_softmax_kernels_group.cu	Reworks reductions to use shared `wave_reduce` and makes defaults depend on `WARP_SIZE`.
csrc/kernels/sample_kernels.cu	Removes fixed warp-size template parameters to rely on shared warp utilities.
csrc/kernels/moe_fused_gate.cu	Computes rows-per-warp/CTA from `WARP_SIZE` inside the kernel for portability.
csrc/kernels/cache_kernels.cu	Uses `WARP_SIZE` in per-token-quant reshape kernel indexing and adjusts launch sizing.
csrc/include/hip_reduce.h	Changes default warp-size template parameters to use `WARP_SIZE`.
csrc/include/aiter_hip_common.h	Adds warning about using `WARP_SIZE` as a host-side constexpr.
aiter/fused_moe.py	Restricts asm fused_topk usage to `gfx942/gfx950`.
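The `aiter_hip_common.h` warning about using `WARP_SIZE` as a host-side constexpr points to a common pattern: read the wave size at run time and dispatch to a kernel instantiated for that size. The sketch below assumes that pattern. The launcher is a hypothetical stand-in that returns a string instead of launching a kernel. In real HIP host code, the runtime value could come from `hipDeviceProp_t::warpSize` (filled in by `hipGetDeviceProperties`) or from a helper such as the PR's `get_warp_size_func()`, whose exact signature is not shown here.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a kernel launcher templated on the wave size.
// Real code would launch a __global__ kernel; this only reports the choice.
template <int WarpSize>
std::string launch_topk_softmax_sim(int num_rows) {
    return "wave" + std::to_string(WarpSize) + " rows=" + std::to_string(num_rows);
}

// Host-side dispatch: runtime_warp_size should be queried from the device,
// not taken from a WARP_SIZE macro evaluated during host compilation.
std::string dispatch_by_warp_size(int runtime_warp_size, int num_rows) {
    switch (runtime_warp_size) {
        case 32: return launch_topk_softmax_sim<32>(num_rows);
        case 64: return launch_topk_softmax_sim<64>(num_rows);
        default: throw std::invalid_argument("unsupported warp size");
    }
}
```

Each instantiation is still compiled with a compile-time warp size, so device-side loops can be unrolled, while the choice between instantiations follows the device that is actually present.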

…topk, cache, sample) (#2599)
* update topk_softmax
* update hip group topk
* rm warpsize in sample_kernels.cu
* update cache.cu
* update
* update2

junhaha666 added 5 commits April 2, 2026 15:03

update topk_softmax (1343f20)
update hip group topk (2c640e8)
rm warpsize in sample_kernels.cu (ad0b71e)
update cache.cu (1b45395)
update (d8b530e)

junhaha666 requested review from a team and Copilot April 2, 2026 15:05

Copilot started reviewing on behalf of junhaha666 April 2, 2026 15:07

Copilot AI reviewed Apr 2, 2026

Comment threads: csrc/kernels/topk_softmax_kernels_group.cu (2), csrc/kernels/topk_softmax_kernels.cu (3)

update2 (93bd3c6)

junhaha666 added the ci:atom label Apr 3, 2026

valarLip approved these changes Apr 4, 2026

valarLip merged commit 1e9d2cc into main Apr 4, 2026
44 of 47 checks passed

valarLip deleted the jun/wip_045_new branch April 4, 2026 08:02

sunway513 mentioned this pull request Apr 5, 2026 in "Add FlyDSL fused RoPE + KV Cache backend" #2617 (Closed, 3 tasks)

valarLip merged 6 commits into main from jun/wip_045_new

3 participants

Conversation

junhaha666 commented Apr 2, 2026

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

Uh oh!

github-actions bot commented Apr 2, 2026

🏷️ CI Guide

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants