Use non locking atomic intrinsics #97527
Makes the PAL use intrinsics that are guaranteed not to lock with Clang. With other compilers (probably only GCC), it checks the C11 `lock free atomics` defines, as there seems to be no better way to verify safety there. Fixes dotnet#97452.
src/coreclr/pal/inc/pal.h
Outdated
@@ -3639,24 +3639,42 @@ Define_InterlockMethod(
CHAR,
InterlockedExchange8(IN OUT CHAR volatile *Target, CHAR Value),
InterlockedExchange8(Target, Value),
#ifdef __clang__
return __sync_swap(pDst, iValue);
#elif ATOMIC_CHAR_LOCK_FREE == 2
We cannot depend on defines like this in CoreCLR PAL headers. CoreCLR PAL headers cannot depend on system headers where they are defined.
Would moving the checks for those to the FCall definitions make them okay? Or would they be not accessible there either and need to be somehow wrapped with CMake?
No. Nothing in CoreCLR outside the PAL implementation can depend on system headers.
One way to fix this is to use the clang-specific `__sync_swap` intrinsic for clang, and implement it using `__sync_val_compare_and_swap` for non-clang. Like:
#ifdef __clang__
Define_InterlockMethod(
    LONGLONG,
    InterlockedExchange64(IN OUT LONGLONG volatile *Target, IN LONGLONG Value),
    InterlockedExchange64(Target, Value),
    __sync_swap(Target, Value)
)
#else
inline LONGLONG InterlockedExchange64(LONGLONG volatile * Target, LONGLONG Value)
{
    LONGLONG Old;
    do {
        Old = *Target;
    } while (__sync_val_compare_and_swap(Target, Old, Value) != Old);
    PAL_InterlockedOperationBarrier();
    return Old;
}
#endif
`__sync_val_compare_and_swap` isn't guaranteed to be lock free in GCC, like we've discussed earlier (the GCC docs say it's in fact implemented with `__atomic`), so that wouldn't help here.
Maybe the define test could just be done in the CMake files completely instead?
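For readers following the lock-freedom question, here is a minimal sketch of a compile-time check that needs no system headers. It assumes a GCC or Clang toolchain, both of which predefine `__GCC_ATOMIC_LLONG_LOCK_FREE` and provide the `__atomic_always_lock_free` builtin. The `kSketch...` names are hypothetical and are not part of the PAL or of this PR.

```cpp
// Sketch only: kSketch* names are hypothetical, not PAL identifiers.
// Assumes GCC or Clang; neither construct below requires an #include.

// Predefined by the compiler itself:
// 0 = never lock-free, 1 = sometimes lock-free, 2 = always lock-free.
#if defined(__GCC_ATOMIC_LLONG_LOCK_FREE)
constexpr int kSketchLlongLockFree = __GCC_ATOMIC_LLONG_LOCK_FREE;
#else
constexpr int kSketchLlongLockFree = 0; // unknown compiler: assume the worst
#endif

// Compiler builtin: true only if an 8-byte atomic never needs a lock on the
// current target. The second argument 0 means "typical alignment for this size".
constexpr bool kSketchExchange64IsLockFree =
    __atomic_always_lock_free(sizeof(long long), 0);

// Enabling this line would turn a silent locking fallback into a build error:
// static_assert(kSketchExchange64IsLockFree, "64-bit atomics would need a lock");
```

Whether a check like this is acceptable in the PAL headers is a separate question; the sketch only shows that the information is available without including `<stdatomic.h>` or `<atomic>`.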
I understand that it is not guaranteed, but it is reasonable to assume that the implementation is lock-free on platforms that we care about.
How would you do that in CMake files?
clang is fine. I would like to avoid some complicated solution that is just for gcc. As I have said earlier, we do not have any testing for gcc-based builds, so there is no way to validate that it actually works.
> How would you do that in CMake files?
Pushed a commit; part of it is commented out because I want to verify it on the CI.
> As I have said earlier, we do not have any testing for gcc-based builds, so there is no way to validate that it actually works.
I've seen that some Linux distros ship GCC-built dotnet in their repositories, so I'd prefer to guarantee it works for them.
#if __has_builtin(__c11_atomic_exchange)
...
#elif __has_builtin(__sync_swap)
...
#else
...
#endif
cc @janvorli
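The `__has_builtin` chain above could be filled in roughly as follows. This is a sketch under stated assumptions, not the code from the PR: `SketchExchange64` is a hypothetical name, the `__c11_atomic_exchange` branch is left out because it operates on `_Atomic` types, and the `PAL_InterlockedOperationBarrier()` call from the earlier suggestion is omitted to keep the example self-contained.

```cpp
// GCC before version 10 and some other compilers lack __has_builtin;
// treat a missing __has_builtin as "builtin not available".
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

// Hypothetical helper, not the PAL's InterlockedExchange64.
inline long long SketchExchange64(long long volatile* target, long long value)
{
#if __has_builtin(__sync_swap)
    // Clang: atomic exchange with a full barrier.
    return __sync_swap(target, value);
#else
    // Fallback: retry compare-and-swap until no other thread has changed
    // *target between the read and the swap.
    long long old;
    do {
        old = *target;
    } while (__sync_val_compare_and_swap(target, old, value) != old);
    return old;
#endif
}
```

Note that a feature test of this kind only reports whether the builtin exists. It does not by itself say whether the compiler lowers the operation to inline instructions or to a library call on a given target.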
The presence of |
|
It will however return false on platforms without supported lockfree atomics which will make the code use the locking ones which is explicitly what we don't want here. |
Not seeing how? Suggestion was to use |
The issue is that we don't want to fallback to |
It will not fallback to |
We want to fail building on platforms where it's not available though as |
Also, I'm not sure if |
@jkotas I've just discovered that Clang will use the locking implementation even for |
I am not sure if I completely follow all the reasoning here. I have also considered atomics without a lock to be possible for sizes smaller than or equal to 32 / 64 bits, depending on the architecture. That means the 32-bit CPUs that we support would be able to do 32-bit atomic ops, and 64-bit architectures 64-bit atomic operations at most. The runtime should never try to use atomic operations for anything larger than the architecture size, so I am not sure what the real issue being solved here is.
We have managed APIs that require atomic 64 bit ops even on 32 bit CPUs. All current 32 bit CPUs support that, except RISCV 32 (without extensions). We are unlikely to ever support RISCV 32. |
So the code that is there today is about as unportable as alternatives, and there is nothing to do? |
I'd guess the only solution would be to hardcode the ASM GCC generates which is lock free for ARM32 and just check the defines for other platforms? |
What are the compilers, intrinsics and platforms that have an actual problem? If the problem is limited to gcc InterlockedExchange64 on arm32, the fix can be to enable runtime/src/coreclr/pal/inc/pal.h Line 3671 in 492ed0e
|
Looking on godbolt, calls seem to be emitted for:
|
Do these helpers actually use locks, or is it just a bogus warning? |
As far as I can see, |
The managed |
Apparently setting |
Looking at our current runtime armv7 compiled code, I don't see any calls to the sync helpers, but rather inline ldrex / strex for 32 bit and ldrexd/strexd for 64 bit operations. Expanded e.g. from the InterlockedCompareExchange64/InterlockedCompareExchange. So, I am not sure why the linked output from the compiler explorer shows calls to the helpers. I've compiled runtime with clang 10, but the compiler explorer shows the helpers for that clang too. I was thinking that maybe the linker replaces those calls, but even the object files contain the inlined instructions. |
We set march to armv7: runtime/eng/native/configurecompiler.cmake Line 747 in 9a3cacd
|
Does that maybe not get piped through to all the places? The CMake check I've added failed on Arm32 which seemed to suggest it wasn't compiling with that. |
Is this PR ready to review/merge? |
This PR is a cleanup to make the code more portable. It is not clear whether it is an improvement from the discussion so far. As far as we know, there is no real problem fixed by this change on the currently supported platforms. |
Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it. |
cc @jkotas