
[TRITON] Sagev2 patch #2240

Merged
juuso-oskari merged 37 commits into main from sagev2_patch on Mar 11, 2026
Conversation

juuso-oskari (Contributor) commented Mar 10, 2026

This PR updates the Sage attention kernels (both the vanilla and the MXFP4 variants):

  • Use int64 offsets so that B x S x H x D index arithmetic cannot overflow int32 (a minimal sketch follows this list).
  • Move the Sage attention quantization kernels into their own files.
  • Revert to the unfused quantization path for a performance boost in MXFP4 Sage attention.
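For illustration, here is a minimal Triton sketch of the overflow fix; the kernel, argument names, and strides are hypothetical stand-ins, not the actual fav3_sage code. The key point is promoting tl.program_id to tl.int64 before it is multiplied by strides: at B=8, H=64, S=32768, D=128 the tensor already holds 2^31 elements, one past the int32 maximum of 2^31 - 1, so int32 offsets would silently wrap.

```python
import triton
import triton.language as tl

@triton.jit
def _attn_fwd_sketch(Q, stride_qh, stride_qm,  # strides measured in elements
                     BLOCK_M: tl.constexpr, HEAD_DIM: tl.constexpr):
    # Promote program ids to int64 BEFORE they are multiplied by strides.
    # With int32 ids, pid_bh * stride_qh can exceed 2**31 - 1 on large
    # (batch, heads, seq, head_dim) shapes and silently wrap around.
    pid_m = tl.program_id(0).to(tl.int64)
    pid_bh = tl.program_id(1).to(tl.int64)

    offs_m = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)
    offs_d = tl.arange(0, HEAD_DIM)
    # int64 pointer arithmetic: no overflow even when the flat element
    # offset passes 2**31.
    q_ptrs = Q + pid_bh * stride_qh + offs_m[:, None] * stride_qm + offs_d[None, :]
    return tl.load(q_ptrs)
```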

@juuso-oskari juuso-oskari requested review from a team and Copilot March 10, 2026 09:08
github-actions (Contributor) commented:

🏷️ CI Guide

Runs automatically on every PR:

  • ✅ Pre-checks (submodule verification, code formatting)
  • ✅ Aiter op tests (gfx942 + gfx950)
  • ✅ Triton tests (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

  Label         Tests
  ci:sglang     SGLang integration tests
  ci:atom       ATOM benchmark (DeepSeek-R1 + GPT-OSS)
  ci:multi-gpu  Multi-GPU op tests (8 GPUs)
  ci:vllm       vLLM benchmark
  ci:all        All of the above

Add labels via the sidebar or gh pr edit 2240 --add-label <label>

Copilot AI (Contributor) left a comment:

Pull request overview

This PR updates SageAttention (vanilla + MXFP4) Triton paths to avoid int32 overflow in pointer arithmetic by promoting offsets to int64, and refactors MXFP4 quantization to use an unfused quantization path for better performance.

Changes:

  • Cast key program ids / offsets to tl.int64 in Sage attention kernels to prevent overflow in pointer calculations.
  • Move Sage quantization logic into dedicated quant wrapper / kernel modules and switch MXFP4 wrapper to the unfused quantization path.
  • Update the MXFP4 benchmark to use CUDA-graph benchmarking and add a -test flag to optionally run accuracy checks (see the sketch after this list).
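To make the benchmarking change concrete, here is a hedged sketch of the pattern described above; the flag name -test comes from this summary, while the harness, tensor shape, and the softmax stand-in for the attention call are illustrative assumptions. triton.testing.do_bench_cudagraph captures the callable in a CUDA graph and times graph replays, which strips Python and kernel-launch overhead out of the measurement.

```python
import argparse

import torch
import triton.testing

def bench(fn) -> float:
    # Capture fn in a CUDA graph and time graph replays (mean ms by default).
    return triton.testing.do_bench_cudagraph(fn)

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-test", action="store_true",
                        help="also run accuracy checks against a reference")
    args = parser.parse_args()

    q = torch.randn(1, 8, 4096, 128, device="cuda", dtype=torch.bfloat16)
    ms = bench(lambda: torch.softmax(q, dim=-1))  # stand-in for the attention call
    print(f"forward: {ms:.3f} ms")

    if args.test:
        # An accuracy check would compare kernel output to a reference here.
        ...
```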

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.

Summary per file:

  • op_tests/op_benchmarks/triton/bench_fav3_sage_mxfp4.py: switches the benchmark timing method and gates correctness tests behind a new CLI flag.
  • aiter/ops/triton/quant/sage_attention_quant_wrappers.py: introduces Python-level quantization wrappers (rotation/smoothing/downcast) used by Sage attention.
  • aiter/ops/triton/attention/fav3_sage_attention_mxfp4_wrapper.py: rewires the MXFP4 forward wrapper to use the new unfused quantization wrapper.
  • aiter/ops/triton/attention/fav3_sage.py: redirects the Sage quant import to the new quant wrapper module.
  • aiter/ops/triton/_triton_kernels/quant/sage_attention_quant.py: adds Triton kernels for Sage quantization (including int64 pid handling).
  • aiter/ops/triton/_triton_kernels/attention/fav3_sage_attention_mxfp4.py: promotes program ids to int64 and removes in-file quantization helpers.
  • aiter/ops/triton/_triton_kernels/attention/fav3_sage_attention.py: promotes program ids to int64 and removes in-file quantization helpers.
  • 3rdparty/composable_kernel: updates the CK submodule revision.
Comments suppressed due to low confidence (1)

aiter/ops/triton/attention/fav3_sage_attention_mxfp4_wrapper.py:1

  • The wrapper’s public flags (hadamard_rotation, q_smooth, R, BLOCK_R) are no longer passed into quantization, so toggling them has no effect despite what the API suggests. Either (a) plumb these arguments through to a quant path that honors them (e.g., call fused_sage_quant_mxfp4 when requested, or extend the unfused path to accept and implement them), or (b) raise when they are enabled to avoid silent misconfiguration (see the guard sketch below).
# SPDX-License-Identifier: MIT
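A minimal sketch of option (b); the wrapper signature below is assumed from the flag names in the comment, not copied from the repo:

```python
def fav3_sage_mxfp4_fwd(q, k, v, hadamard_rotation=False, q_smooth=False,
                        R=None, BLOCK_R=None):
    # Fail loudly instead of silently ignoring options that the unfused
    # quantization path does not implement.
    if hadamard_rotation or q_smooth or R is not None or BLOCK_R is not None:
        raise NotImplementedError(
            "hadamard_rotation / q_smooth / R / BLOCK_R are not supported "
            "by the unfused MXFP4 quantization path"
        )
    ...  # unfused quantization + attention kernel launch
```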


juuso-oskari and others added 6 commits March 10, 2026 09:16
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@juuso-oskari juuso-oskari requested a review from Copilot March 10, 2026 09:31

This comment was marked as outdated.

Chi-Chu319 previously approved these changes Mar 10, 2026
jcaraban previously approved these changes Mar 10, 2026
@juuso-oskari juuso-oskari dismissed stale reviews from Chi-Chu319 and jcaraban via f7e02b1 March 10, 2026 17:09
@juuso-oskari juuso-oskari requested a review from azaidy March 11, 2026 13:09
@juuso-oskari juuso-oskari self-assigned this Mar 11, 2026
azaidy (Contributor) left a comment:


LGTM!

@juuso-oskari juuso-oskari merged commit 3903934 into main Mar 11, 2026
35 checks passed
@juuso-oskari juuso-oskari deleted the sagev2_patch branch March 11, 2026 14:57
@juuso-oskari juuso-oskari restored the sagev2_patch branch March 11, 2026 14:57
valarLip pushed a commit that referenced this pull request Mar 18, 2026
* revert to unfused quant kernels for perf
* int64 offsets to avoid bhsd overflow of int32
AMD-yanfeiwang pushed a commit to AMD-yanfeiwang/aiter that referenced this pull request Mar 18, 2026
* revert to unfused quant kernels for perf
* int64 offsets to avoid bhsd overflow of int32
