[Codegen][GPU] Lower gpu.subgroup_reduce to DPP intrinsics on AMD GPUs #20468
Conversation
Force-pushed 0154cf1 to 6d4c462
Force-pushed 3be73fd to 0588b97
Force-pushed d7689df to 6e33136
Force-pushed e69e013 to 1969b6b
Force-pushed 1f63b8e to e553482
Code-wise, this looks fine to me. However:
- Can we get a test that just checks that DPP shows up in a case where we expect it to?
- Can we get perf numbers? I figure SDXL UNet with/without this patch might be enlightening.
Force-pushed 8acf341 to 60dc379
Force-pushed 6234f1b to 6184e2f
Deactivated this change on the SPIRV pipeline because of this issue: #20872
Force-pushed a463098 to c820e8f
Force-pushed c820e8f to e765dca
A commit has been merged upstream to fix CI failures from this PR: llvm/llvm-project@893ef7f. After the next integrate this will be mergeable.
Signed-off-by: Muzammiluddin Syed <muzasyed@amd.com>
…blems Signed-off-by: Muzammiluddin Syed <muzasyed@amd.com>
…onToGPU to allow pass to make decisions based on backend target Signed-off-by: Muzammiluddin Syed <muzasyed@amd.com>
Force-pushed e765dca to 907b55f
Just some nits
compiler/src/iree/compiler/Codegen/LLVMGPU/test/ROCDL/pipeline_vector_distribute_gfx950.mlir (outdated, resolved)
compiler/src/iree/compiler/Codegen/LLVMGPU/ROCDLLowerExecutableTarget.cpp (outdated, resolved)
compiler/src/iree/compiler/Codegen/LLVMGPU/test/ROCDL/pipeline_vector_distribute_gfx1100.mlir (outdated, resolved)
Force-pushed 48faabb to f611789
Looks good to me
// SPIRV doesn't support clustered reduction, so if possible, avoid adding
// problematic attribute until it is supported.
I can see cluster sizes in the spec, e.g.: https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpGroupNonUniformIAdd
What is missing?
There is a lowering missing in GPUToSPIRV for subgroup reduce ops when a cluster size attribute is specified.
I'd update the comment to say that the lowering is missing, not that SPIR-V doesn't support it.
To provide more context, there is varying support for subgroup reduce lowering in the three GPU-related paths we support:
- Path A) ROCDL
- Path B) NVVM
- Path C) SPIRV
A) subgroup_reduce is fully supported.
B) & C) NVVM & SPIRV can't lower clustered subgroup_reduce ops, but they do support full-warp reductions.
So the issue is that the other backends have varying levels of support for subgroup_reduce, but VectorReductionToGPUPass touches all three. Some murky decisions were made to account for this:
- We do not preserve subgroup reductions that would produce clustering, because of SPIRV's lack of support. An example in this pass is the warp reduction function, which first reduces within warps and then across warps. We choose to lower the cross-warp reduction to gpu.shuffles, because reductions across warps require clustered reductions at the moment. The fix would be to add a proper lowering for subgroup reduce in the clustered case.
- We introduced a forROCDL flag to GPU passes that ideally should not need to treat NVVM and ROCDL differently, but it was necessary because NVVM lacks clustered subgroup reduce lowering support. This is easy to fix: we just need to create a non-gpu.shuffle lowering in ExpandGPUOps for NVVM, like we did for AMDGPU, and then add support for lowering subgroup reduce in the clustered case.
Ideally, at some point we could do this cleanup in a follow-up PR and undo these two decisions.
edit: I guess I should create an issue for this: #21006
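To make the "clustered" distinction above concrete: a clustered subgroup reduce partitions the lanes of a subgroup into contiguous groups of cluster_size and reduces within each group, with every lane receiving its cluster's result; a full-warp reduce is the special case where the cluster spans the whole subgroup. A hypothetical Python model of these semantics (not IREE or MLIR code, just an illustration):

```python
import functools

def clustered_subgroup_reduce(lane_values, cluster_size, op=lambda a, b: a + b):
    """Model of a clustered subgroup reduction.

    Lanes are split into contiguous clusters of `cluster_size`;
    each lane receives the reduction of its own cluster.
    """
    result = []
    for start in range(0, len(lane_values), cluster_size):
        cluster = lane_values[start:start + cluster_size]
        reduced = functools.reduce(op, cluster)
        result.extend([reduced] * len(cluster))
    return result

# Example: 8 lanes, cluster size 4 -> two independent reductions.
print(clustered_subgroup_reduce(list(range(8)), 4))
```

With cluster_size equal to the subgroup width, this degenerates to the full-warp reduction that all three backends already handle.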
> I'd update the comment to say that the lowering is missing, not that SPIR-V doesn't support it.
done
Ok, turns out I did create an issue for SPIRV, #20872, and it already has a PR open: llvm/llvm-project#141402
Force-pushed f611789 to b7a554a
… AMD GPUs (iree-org#20468)" This reverts commit 0c342e0.
When performing cross-lane reductions using subgroup_reduce ops across contiguous lanes on AMD GPUs, lower to Data Parallel Primitives (DPP) ops when possible. This reduces latency on applicable devices.
See related #20007
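The cross-lane reduction this patch targets is typically lowered as a log2(width) butterfly of lane exchanges; the patch's contribution is letting the contiguous-lane exchanges use DPP permutes instead of gpu.shuffle on AMD hardware. A hypothetical Python model of the butterfly pattern itself (not IREE code; the DPP selection is hardware-specific and not modeled here):

```python
def subgroup_reduce_butterfly(lane_values, op=lambda a, b: a + b):
    """Model of a butterfly (xor-shuffle) subgroup reduction.

    At step k, every lane combines its value with the value held by
    lane (i XOR 2**k). After log2(n) steps, every lane holds the full
    reduction. On AMD GPUs, the small-offset exchanges in this pattern
    can be served by DPP cross-lane permutes instead of full shuffles.
    Assumes a power-of-two lane count.
    """
    vals = list(lane_values)
    n = len(vals)
    offset = 1
    while offset < n:
        vals = [op(vals[i], vals[i ^ offset]) for i in range(n)]
        offset *= 2
    return vals

# Example: 8 lanes; every lane ends up with the sum of all lanes.
print(subgroup_reduce_butterfly(list(range(8))))
```

The latency win in the PR description comes from replacing the generic cross-lane shuffles in this pattern with cheaper DPP operations where the exchange distance permits it.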