Remove volatile from atomics #1672
Conversation
@MrBurmark, would you be willing to look at the changes to HIP atomics? The previous implementation of atomicCAS relied on volatile, so I changed the implementation to use an atomic load. To avoid a circular dependency, I've changed the implementation of atomicLoad to use the intrinsic if available; otherwise I fall back to atomicOr(address, 0). Does that make sense, or would it be better to fall back to atomicCAS(address, 0, 0) or something else? I think atomicOr will be better than atomicAdd, but I'm not sure whether atomicCAS can avoid the write in some cases. I also modified atomicExchange to use reinterpret casting instead of atomicCAS in a loop, then made atomicStore use atomicExchange if the intrinsic is not available. If these changes make sense, then I will do basically the same thing in CUDA and we can fully get rid of volatile.
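The fallback idea described above can be sketched in standard C++, with `std::atomic` standing in for the HIP/CUDA device intrinsics (the function names here are hypothetical illustrations, not the actual RAJA implementation):

```cpp
#include <atomic>
#include <cstdint>

// Load fallback: OR with 0 is a read-modify-write that leaves the stored
// value unchanged and returns the previous (i.e., current) value, so it
// behaves as an atomic load when no dedicated load intrinsic exists.
std::uint32_t atomic_load_fallback(std::atomic<std::uint32_t>& a) {
    return a.fetch_or(0u);
}

// Store fallback: an exchange writes the new value atomically; the
// returned old value is simply discarded.
void atomic_store_fallback(std::atomic<std::uint32_t>& a, std::uint32_t v) {
    (void)a.exchange(v);
}
```

Note the trade-off the comment raises: `fetch_or(0)` (like `atomicOr(address, 0)`) is still a read-modify-write, whereas a compare-and-swap such as `atomicCAS(address, 0, 0)` only writes when the comparison succeeds, which matters for write traffic on contended lines.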
It's best to avoid using
I'll be out most of this afternoon and all day tomorrow, so if the tests come back passing, feel free to merge when you think it is ready.
Thanks @adayton1
This is looking pretty good. You double checked which types were available on which hardware for CUDA and HIP? It looks like the CUDA and HIP backends are more similar now, but they still differ a bit in which types are supported on which hardware.
Yeah, I double checked the types. The main differences are that HIP doesn't provide an atomicInc or atomicDec, and CUDA supports an additional type for atomicMin and atomicMax.
I keep hitting unrelated errors in the CI: [info: cloning spack develop branch from github]
Summary
`__CUDA_ARCH__` is less than 350, since RAJA requires that as the minimum supported architecture anyway
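With sm_35 as the minimum supported architecture, a pre-350 fallback branch becomes dead code and can be dropped. A hypothetical sketch of enforcing that floor at compile time (not the actual RAJA source):

```cuda
// Hypothetical guard: reject device compilation below compute capability 3.5,
// so no runtime fallback for older architectures is needed.
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 350)
#error "RAJA assumes compute capability 3.5 (sm_35) or higher"
#endif
```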