Status of atomicAddNoRet #19
Comments
@al42and there are restrictions on when global_atomic_add_f32 can be used, so the compiler can't generate it by default. You can either get it, as you indicate, via a call to atomicAddNoRet, or by adding -munsafe-fp-atomics to the compiler options. |
Thanks for the reply, @b-sumner! So, |
@al42and correct, it is not going away in the near future. |
@b-sumner, Thank you for confirming! One more question, if allowed by NDA: shall |
@al42and I'd suggest using atomicAddNoRet() only on gfx908. On gfx90a only, unsafeAtomicAdd() can be used instead (it also supports a return value). There is a double overload in addition to the float overload. But I still encourage -munsafe-fp-atomics so that you can use the standard atomicAdd(). |
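For readers who prefer not to enable the flag globally, the per-architecture advice above could be wrapped roughly as follows. This is only a sketch under stated assumptions: addDiscardingResult and SKETCH_DEVICE are hypothetical names, not part of HIP; it assumes the compiler defines __gfx908__ / __gfx90a__ during device compilation for those targets, and that unsafeAtomicAdd is available in the ROCm release being used. In a plain C++ build it falls back to a non-atomic add so that the file still compiles.

```cpp
// Hypothetical wrapper (not part of HIP): selects a float atomic add per target
// when the return value is not needed.
#if defined(__HIPCC__)
#include <hip/hip_runtime.h>
#define SKETCH_DEVICE __device__
#else
#define SKETCH_DEVICE  // plain C++ build: no device qualifier
#endif

SKETCH_DEVICE inline void addDiscardingResult(float* address, float value) {
#if defined(__HIPCC__) && defined(__gfx908__)
    // MI100: deprecated, but suggested in this thread for gfx908 only.
    atomicAddNoRet(address, value);
#elif defined(__HIPCC__) && defined(__gfx90a__)
    // gfx90a: the result is discarded here on purpose.
    (void)unsafeAtomicAdd(address, value);
#elif defined(__HIPCC__)
    // Any other target (and the host pass): standard atomicAdd.
    (void)atomicAdd(address, value);
#else
    // Assumption for illustration: host-only stand-in, NOT atomic.
    *address += value;
#endif
}
```

Because the wrapper returns void, callers cannot accidentally depend on a return value that differs between architectures.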
The problem with this solution for me is that I'm working on a pretty large codebase. Currently, this option can be enabled just fine because the return value from But introducing an option that globally alters the behavior of a common function in a major way (a different return value), but only on certain hardware (MI100), is very dangerous long-term. We might introduce a new kernel or add a library that relies on a standard-compliant behavior of |
Note that if the return value is used, then the MI100 no-return atomic add instruction won't be generated with -munsafe-fp-atomics. |
Oh, that's great news! My questions are answered, but I think it would be good if FP atomic support were covered in more detail in the docs. I saw some scattered mentions that they are not supported on AMD hardware, but any deeper info (e.g., that the "noret" version exists and that |
Thanks, I'll pass this along. |
@b-sumner can you please help clarify things a bit further? The atomics support and intrinsics are, frustratingly, undocumented by AMD. In addition, you suggest that |
@pszi1ard, the documentation issue is known and steps are being taken to improve it. Regarding IEEE 754 compliance, note that C++20 states that the floating-point environment for atomic arithmetic operations on But the main issue here is that, for the devices that support them, non-shared-memory atomic floating-point add is implemented in the device L2 cache, and if the pointed-to memory is not cacheable, the add may have no effect. The compiler has no control over where the pointer is pointing, so it is up to the developer to assert that they accept this behavior, either by using atomicAddNoRet (gfx908) or unsafeAtomicAdd (gfx90a), or by passing -munsafe-fp-atomics. |
Hello!
I would like to inquire about the state of the atomicAddNoRet function. It gives our code (GROMACS) a 2x speed-up in one of the kernels when running on MI100 (gfx908), compared to a plain atomicAdd (which gets compiled into a CAS loop). So, I would really like to keep using the noret version, since the return value is ignored anyway.
However, atomicAddNoRet is marked as deprecated, and a plain atomicAdd is suggested instead (with no indication of possible performance degradation, by the way!). Could you please advise on what function should be used? I also considered using the __ockl_atomic_add_noret_f32 intrinsic directly, but it's also not documented.
We are using ROCm 4.5.2 and hipSYCL for our code. However, the problem is easily demonstrated with plain HIP (ROCm 4.5.2 and 5.0.0 tested):
Examining the test-hip-amdgcn-amd-amdhsa-gfx908.s file, we see that _Z15atomicAddKernelPf contains a loop of global_atomic_cmpswap, while _Z20atomicAddNoRetKernelPf only has one nice little global_atomic_add_f32 call.
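To illustrate what the CAS loop in the first kernel amounts to, here is a minimal host-side sketch in standard C++. It is an analogy only, not the code the compiler emits for the GPU: casLoopAdd is a hypothetical name, and std::atomic<float> stands in for the device memory location. The add is retried until no other thread has changed the value between the read and the compare-and-swap, which is why it tends to be slower than a single hardware add instruction under contention.

```cpp
#include <atomic>

// Hypothetical helper: emulates a float atomic add with a compare-and-swap loop.
// Returns the value that was stored before the addition.
inline float casLoopAdd(std::atomic<float>& target, float value) {
    float observed = target.load(std::memory_order_relaxed);
    // On failure, compare_exchange_weak writes the current value into
    // `observed`, so the next iteration recomputes the sum from fresh data.
    while (!target.compare_exchange_weak(observed, observed + value,
                                         std::memory_order_relaxed)) {
        // Retry until the swap succeeds.
    }
    return observed;
}
```

A caller that ignores the returned value gets the same final result in memory, which is the situation described in the post.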