
replace activemask with ballot function #838

Merged: 1 commit, merged on Oct 10, 2022


DieGoldeneEnte
Collaborator

This replaces __activemask with a ballot function to prevent retrieving the wrong thread mask. The change is also applied to the HIP backend, although it is most likely not necessary there.

I tested this on an AMD GPU, but didn't do any performance measurements.

fixes #823
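To illustrate the problem this PR addresses: CUDA's __activemask() only reports the lanes that happen to be converged at the moment it executes and does not synchronize the warp, so under independent thread scheduling its result can leave out lanes that will still take part in a following *_sync intrinsic. A ballot over the full warp (__ballot_sync) makes the named lanes synchronize before the mask is returned. The following is a minimal sketch, assuming a one-dimensional block whose size is a multiple of 32; participating_mask, warp_reduce_sum, warp_sum_kernel and warp_sum_demo are hypothetical names, not the code merged in this PR.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical helper (not hipSYCL's actual code): returns the mask of lanes
// in the calling warp that reach this call. Unlike __activemask(), the
// __ballot_sync call synchronizes the named lanes first, so lanes that have
// temporarily diverged are still included. Every non-exited lane of the warp
// must call this helper.
__device__ unsigned participating_mask() {
  return __ballot_sync(0xffffffffu, 1);
}

// Tree reduction over one warp using shuffles that are restricted to the
// mask obtained above. Assumes all 32 lanes of the warp are present.
__device__ int warp_reduce_sum(int value) {
  const unsigned mask = participating_mask();
  for (int offset = warpSize / 2; offset > 0; offset /= 2) {
    value += __shfl_down_sync(mask, value, offset);
  }
  return value;  // lane 0 now holds the sum over the whole warp
}

// Each thread contributes its global index; lane 0 of every warp adds the
// warp's partial sum to *out.
__global__ void warp_sum_kernel(int* out) {
  const int tid = blockIdx.x * blockDim.x + threadIdx.x;
  const int warp_sum = warp_reduce_sum(tid);
  if (threadIdx.x % warpSize == 0) {
    atomicAdd(out, warp_sum);
  }
}

// Host helper: launches one block of n_threads threads (a multiple of 32)
// and returns the sum of all thread indices, or -1 on a CUDA error.
long long warp_sum_demo(int n_threads) {
  assert(n_threads > 0 && n_threads % 32 == 0 && "block must hold full warps");
  int* d_out = nullptr;
  if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) return -1;
  cudaMemset(d_out, 0, sizeof(int));
  warp_sum_kernel<<<1, n_threads>>>(d_out);
  if (cudaGetLastError() != cudaSuccess) {
    cudaFree(d_out);
    return -1;
  }
  int h_out = 0;
  const cudaError_t err =
      cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
  cudaFree(d_out);
  return err == cudaSuccess ? h_out : -1;
}
```

On a CUDA-capable device, warp_sum_demo(64) should return 2016, the sum of the thread indices 0 to 63. Passing the result of __activemask() to __shfl_down_sync instead of the ballot mask would compile just as well, which is why this kind of misuse is easy to miss.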

@DieGoldeneEnte
Collaborator Author

@hipSYCL-ci-bot test please :)

@fodinabor
Collaborator

fodinabor commented Sep 30, 2022

Tested on a mixed GPU setup; seems good.
For both HIP and CUDA, group_reduce and group_inclusive_scan perform in the same ballpark as before (according to your sycl-bench fork :))

Collaborator

@illuhad left a comment


Thanks!

@illuhad merged commit eef3132 into AdaptiveCpp:develop on Oct 10, 2022
Development: successfully merging this pull request may close the issue "Incorrect use of __activemask".
4 participants