Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove volatile from atomics #1672

Merged
merged 43 commits into from
Jun 20, 2024
Merged

Conversation

adayton1
Copy link
Member

@adayton1 adayton1 commented Jun 14, 2024

Summary

  • This PR is a refactoring
  • It does the following:
    • Modifies hip and cuda generic atomic compare and swap algorithms to use atomic loads instead of relying on volatile
    • Re-implements atomic loads in terms of builtin atomics for cuda and hip (so that the generic compare and swap functions can use it)
    • Removes volatile qualifier in atomic function signatures
    • Uses cuda::atomic_ref in newer versions of CUDA to back atomicLoad/Store
    • Uses atomicAdd as a fallback for atomicSub in CUDA
    • Refactors CUDA atomics to reduce duplicate code (now more closely matches the Hip atomic implementations)
    • Removes checks where __CUDA_ARCH__ is less than 350 since RAJA requires that as the minimum supported architecture anyway

@adayton1
Copy link
Member Author

adayton1 commented Jun 15, 2024

@MrBurmark, would you be willing to look at the changes to hip atomics? The previous implementation of atomicCAS relied on volatile, so I changed the implementation to use an atomic load. To avoid a circular dependency, I've changed in the implementation of atomicLoad to use the intrinsic if available, otherwise I fall back to atomicOr(address, 0). Does that make sense, or would it be better to fall back to atomicCAS(address, 0, 0) or something else? I think atomicOr will be better than atomicAdd, but I'm not sure if atomicCAS can avoid the write in some cases. I also modified atomicExchange to use reinterpret casting instead of atomicCAS in a loop, then made atomicStore use atomicExchange if the intrinsic is not available.

If these changes make sense, then I will do basically the same thing in CUDA and then we can fully get rid of volatile.

@MrBurmark
Copy link
Member

Its best to avoid using atomicCAS(0,0) where possible as its slower than doing something like atomicOr(0) or atomicAdd(0). Between atomicOr(0) or atomicAdd(0) I don't know which is faster, maybe #1624 could provide some insight?

@adayton1 adayton1 requested a review from MrBurmark June 18, 2024 20:18
@adayton1
Copy link
Member Author

I'll be out most of this afternoon and all day tomorrow, so if the tests come back passing, feel free to merge when you think it is ready.

@adayton1 adayton1 requested a review from rhornung67 June 18, 2024 20:40
Copy link
Member

@rhornung67 rhornung67 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @adayton1

@MrBurmark
Copy link
Member

This is looking pretty good. You double checked which types were available in which hardware for cuda and hip? It looks like the cuda and hip backend are more similar now but they are still a bit different on which types are supported for which hardware.

@adayton1
Copy link
Member Author

This is looking pretty good. You double checked which types were available in which hardware for cuda and hip? It looks like the cuda and hip backend are more similar now but they are still a bit different on which types are supported for which hardware.

Yeah, I double checked the types. The main differences are that Hip doesn't provide an atomicInc or atomicDec, and CUDA supports an additional type for atomicMin and atomicMax.

@adayton1 adayton1 requested a review from MrBurmark June 20, 2024 15:44
@adayton1
Copy link
Member Author

I keep hitting unrelated errors in the CI:

[info: cloning spack develop branch from github]
[exe: git clone --single-branch --depth=1 -b develop-2024-05-26 https://github.com/spack/spack.git spack]
Cloning into 'spack'...
error: RPC failed; curl 18 transfer closed with outstanding read data remaining
error: 3939 bytes of body are still expected
fatal: the remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed
[spack python: /bin/sh: /dev/shm/lassen53-1939417/spack/bin/spack: No such file or directory]
[Checking for concretizer options...]
[disabling config scope (except defaults) in: /dev/shm/lassen53-1939417/spack/lib/spack/spack/config.py]
Traceback (most recent call last):
File "./scripts/uberenv/uberenv.py", line 1389, in
sys.exit(main())
File "./scripts/uberenv/uberenv.py", line 1335, in main
env.patch()
File "./scripts/uberenv/uberenv.py", line 892, in patch
self.disable_spack_config_scopes()
File "./scripts/uberenv/uberenv.py", line 862, in disable_spack_config_scopes
cfg_script = open(spack_lib_config).read()
FileNotFoundError: [Errno 2] No such file or directory: '/dev/shm/lassen53-1939417/spack/lib/spack/spack/config.py'

@adayton1 adayton1 merged commit 70e57de into develop Jun 20, 2024
24 checks passed
@adayton1 adayton1 deleted the feature/dayton8/no_volatile_atomics branch June 20, 2024 18:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants