@dhonnappa-amd

Cherry-pick of #2584

This change removes the need for fences in global_reduce by converting
the stores to reduce_buffer[] into atomic operations that return a value
(atomics+return). This is crucial for performance on architectures with
split caches (e.g. MI300), where fences are inherently costly.
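
For readers unfamiliar with the pattern, here is a rough host-side analogy in C++. It is not the kernel code from this PR. Each thread stands in for one GPU block: it publishes its partial result with a returning atomic exchange rather than a plain store followed by a standalone fence, and the last thread to arrive combines the partials. Only the name `reduce_buffer` comes from the description above; `no_fence_global_reduce`, `arrived`, and the overall structure are hypothetical. The sketch relies on the default sequentially consistent ordering of `std::atomic` and does not model GPU cache behavior or the cost of fences on MI300.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical host-side stand-in for a cross-block reduction.
// Each worker thread plays the role of one GPU block.
long no_fence_global_reduce(const std::vector<long>& partials) {
    const std::size_t n = partials.size();
    if (n == 0) return 0;

    // One slot per "block"; named after reduce_buffer[] in the PR text.
    std::vector<std::atomic<long>> reduce_buffer(n);
    for (auto& slot : reduce_buffer) slot.store(0);

    std::atomic<std::size_t> arrived{0};  // assumed arrival counter
    std::atomic<long> result{0};

    auto worker = [&](std::size_t id) {
        // Publish the partial with a returning atomic (exchange)
        // instead of a plain store plus a standalone fence.
        long previous = reduce_buffer[id].exchange(partials[id]);
        (void)previous;  // the returned value itself is not needed

        // The last worker to arrive combines all partials.
        if (arrived.fetch_add(1) + 1 == n) {
            long sum = 0;
            for (auto& slot : reduce_buffer) sum += slot.load();
            result.store(sum);
        }
    };

    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < n; ++i) threads.emplace_back(worker, i);
    for (auto& t : threads) t.join();
    return result.load();
}
```

Calling `no_fence_global_reduce({1, 2, 3, 4})` sums the four partials in whichever thread arrives last. There is no explicit `std::atomic_thread_fence` anywhere in the sketch; ordering comes entirely from the atomic read-modify-write operations.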

cherry-pick of pytorch#161180
@rocm-repo-management-api

rocm-repo-management-api bot commented Aug 27, 2025

Jenkins build for f26d454ef821afe238e7745df648afe65f8b61e5 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

@jerrymannil jerrymannil marked this pull request as ready for review August 27, 2025 16:37
@jerrymannil jerrymannil merged commit 28f820a into rocm7.1_internal_testing Aug 27, 2025
1 of 3 checks passed
@jerrymannil jerrymannil deleted the autogenerated/rocm7.1_internal_testing_cherry-pick_pr-2584 branch August 27, 2025 16:37
jerrymannil added a commit that referenced this pull request Sep 5, 2025
…uce (#2586)

Cherry-pick of #2584

Co-authored-by: Jerry Mannil <65309407+jerrymannil@users.noreply.github.com>