[ROCm] Limit number of values per thread for reductions on three dimensions #2460

jerrymannil · 2025-08-05T17:16:14Z

In the current implementation of reductions on three dimensions for AMD GPUs, the number of values per thread is unbounded and can reach the hundreds of thousands for certain tensors, which hurts performance. This patch fixes the issue by increasing parallelism, which lowers the number of values per thread to a reasonable limit, i.e. fewer than 2048 values per thread. The performance gains can be 10x-17x for certain examples where the number of values per thread was originally very high.
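To make the arithmetic concrete, here is a minimal sketch of how spreading a reduction over more threads brings the per-thread workload under the 2048 limit mentioned above. It is not the patch's kernel configuration logic: the function names, the starting thread count, and the thread-doubling heuristic are all assumptions chosen for illustration.

```python
# Illustrative only: hypothetical helpers, not code from the patch.
MAX_VALUES_PER_THREAD = 2048  # limit quoted in the PR description


def values_per_thread(reduction_size: int, threads: int) -> int:
    """Values each thread accumulates (integer ceiling division)."""
    return (reduction_size + threads - 1) // threads


def threads_needed(reduction_size: int, threads: int,
                   limit: int = MAX_VALUES_PER_THREAD) -> int:
    """Double the thread count (assumed heuristic) until each thread
    handles fewer than `limit` values."""
    while values_per_thread(reduction_size, threads) >= limit:
        threads *= 2
    return threads


if __name__ == "__main__":
    size = 64_000_000          # hypothetical number of reduced elements
    start_threads = 256        # hypothetical initial parallelism
    print(values_per_thread(size, start_threads))   # work per thread before
    new_threads = threads_needed(size, start_threads)
    print(new_threads, values_per_thread(size, new_threads))  # after
```

With these assumed numbers, each of the 256 threads would start with 250,000 values; after repeated doubling, the per-thread count falls below 2048.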

cherry-pick of pytorch#159652

Cherry-picked to release/2.8 branch via #2469


rocm-repo-management-api · 2025-08-05T17:47:15Z

Jenkins build for 53fbf7866d00ae012510c041cb5c3e13d3a1c214 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

jerrymannil · 2025-08-05T17:56:54Z

reproducer details at pytorch#159652 (comment)

pruthvistony

Please check for any regressing models; the numbers may need to be rebased.

jerrymannil · 2025-08-06T20:45:36Z

! cherry-pick --onto release/2.8


okakarpa · 2025-08-06T20:51:47Z

Created branch autogenerated/release/2.8_cherry-pick_pr-2460 and #2469

…d for reductions on three dimensions (#2469) Cherry-pick of #2460 Co-authored-by: Jerry Mannil <65309407+jerrymannil@users.noreply.github.com>

[ROCm] Limit number of values per thread for reductions on three dimensions

53fbf78

jerrymannil requested a review from pruthvistony August 5, 2025 17:16

jerrymannil self-assigned this Aug 5, 2025

pruthvistony approved these changes Aug 6, 2025


jerrymannil merged commit 5cd45f9 into release/2.7 Aug 6, 2025
2 of 6 checks passed

jerrymannil deleted the 2.7_reduce_sum_fix branch August 6, 2025 18:13

okakarpa mentioned this pull request Aug 6, 2025

[AUTOGENERATED] [release/2.8] [ROCm] Limit number of values per thread for reductions on three dimensions #2469

Merged

4 participants
