@okakarpa
Collaborator

@okakarpa okakarpa commented Aug 6, 2025

Cherry-pick of #2460

…nsions (#2460)

In the current implementation of reductions in three dimensions for AMD
GPUs, the number of values per thread is unbounded and can reach the
hundreds of thousands for certain tensors, which hurts performance.
This patch fixes the issue by increasing the parallelism, which lowers
the number of values per thread to a reasonable limit, i.e. fewer than
2048 values per thread. Performance gains range from 10x to 17x for
certain examples where the number of values per thread was originally
very high.

cherry-pick of pytorch#159652
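
The relationship between parallelism and values per thread described above can be illustrated with a little arithmetic. The following is only a sketch and is not the kernel code from the patch: the function names, the even split of work across threads, and the example sizes are all assumptions made for illustration. Only the bound of fewer than 2048 values per thread comes from the description.

```python
# Illustrative arithmetic only; NOT the PyTorch/ROCm reduction kernel logic.
# All names and the even-split model below are hypothetical.

MAX_VALUES_PER_THREAD = 2048  # exclusive bound quoted in the PR description


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating point."""
    return -(-a // b)


def values_per_thread(reduction_size: int, threads: int) -> int:
    """Values each thread accumulates if the work is split evenly (rounded up)."""
    return ceil_div(reduction_size, threads)


def threads_needed(reduction_size: int, limit: int = MAX_VALUES_PER_THREAD) -> int:
    """Smallest thread count keeping every thread strictly below `limit` values."""
    return ceil_div(reduction_size, limit - 1)


# Hypothetical example: one million values reduced by only four threads.
before = values_per_thread(1_000_000, 4)
# Raising the parallelism until the per-thread load is under the bound.
threads = threads_needed(1_000_000)
after = values_per_thread(1_000_000, threads)
print(before, threads, after)
```

In this toy model, spreading the same reduction over more threads is what brings the per-thread load under the bound; how the actual patch distributes the work across blocks and threads is not specified in this conversation.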
@rocm-repo-management-api

rocm-repo-management-api bot commented Aug 6, 2025

Jenkins build for 33deb83483e7827f75996182d80117661610a417 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

@jerrymannil jerrymannil marked this pull request as ready for review August 6, 2025 21:07
@jerrymannil jerrymannil merged commit ade02ad into release/2.8 Aug 6, 2025
0 of 2 checks passed
@jerrymannil jerrymannil deleted the autogenerated/release/2.8_cherry-pick_pr-2460 branch August 6, 2025 21:07
tvukovic-amd pushed a commit that referenced this pull request Aug 20, 2025
…d for reductions on three dimensions (#2469)

Cherry-pick of #2460

Co-authored-by: Jerry Mannil <65309407+jerrymannil@users.noreply.github.com>