Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use non locking atomic intrinsics #97527

Closed
wants to merge 6 commits into from

Conversation

MichalPetryka
Copy link
Contributor

Makes PAL use intrinsics that are guaranteed to not lock with Clang and with other compilers (probably only GCC) checks the C11 lock free atomics defines as there seems to be no better way to verify safety there.

Fixes #97452.
cc @jkotas

Makes PAL use intrinsics that are guaranteed to not lock with Clang and with other compilers (probably only GCC) checks the C11 `lock free atomics` defines as there seems to be no better way to verify safety there.

Fixes dotnet#97452.
@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Jan 25, 2024
@@ -3639,24 +3639,42 @@ Define_InterlockMethod(
CHAR,
InterlockedExchange8(IN OUT CHAR volatile *Target, CHAR Value),
InterlockedExchange8(Target, Value),
#ifdef __clang__
return __sync_swap(pDst, iValue);
#elif ATOMIC_CHAR_LOCK_FREE == 2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot depend on defines like this in CoreCLR PAL headers. CoreCLR PAL headers cannot depend on system headers where they are defined.

Copy link
Contributor Author

@MichalPetryka MichalPetryka Jan 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would moving the checks for those to the FCall definitions make them okay? Or would they be not accessible there either and need to be somehow wrapped with CMake?

Copy link
Member

@jkotas jkotas Jan 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. Nothing in CoreCLR outside the PAL implementation can depend on system headers.

One way to fix this is to use __sync_swap clang-specific intrinsic for clang, and implement it using __sync_val_compare_and_swap for non-clang. Like:

#ifdef __clang__

Define_InterlockMethod(
    LONGLONG,
    InterlockedExchange64(IN OUT LONGLONG volatile *Target, IN LONGLONG Value),
    InterlockedExchange64(Target, Value),
    __sync_swap(Target, Value)
)

#else

inline LONGLONG InterlockedExchange64(LONGLONG volatile * Target, LONGLONG Value)
{
    LONGLONG Old;
    do {
        Old = *Target;
    } while (__sync_val_compare_and_swap(Target, Old, Value) != Old);

    PAL_InterlockedOperationBarrier();

    return Old;
}

#endif

Copy link
Contributor Author

@MichalPetryka MichalPetryka Jan 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__sync_val_compare_and_swap isn't guaranteed to be lock free in GCC like we've discussed earlier (GCC docs say it's implemented with __atomic in fact) so that wouldn't help here.
Maybe the define test could just be done in the CMake files completely instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that it is not guaranteed, but it is reasonable to assume that the implementation is lock-free on platforms that we care about.

How would you do that in CMake files?

clang is fine. I would like to avoid some complicated solution that is just for gcc. As I have said earlier, we do not have any testing for gcc-based builds, so there is no way to validate that it actually works.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would you do that in CMake files?

Pushed a commit, a part of it is commented out cause I want to verify on the CI.

As I have said earlier, we do not have any testing for gcc-based builds, so there is no way to validate that it actually works.

I've seen that some Linux distros ship GCC built dotnet in their repositories so I'd prefer to guarantee its working for them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image
Seems working I think.

@am11
Copy link
Member

am11 commented Jan 26, 2024

#if __llvm / __clang type compiler-specific conditions were previously replaced by __has_builtin checks, please prefer the better structure for such changes:

#if __has_builtin(__c11_atomic_exchange)
...
#elif __has_builtin(__sync_swap)
...
#else
...
#endif

cc @janvorli

@MichalPetryka
Copy link
Contributor Author

#if __llvm / __clang type compiler-specific conditions were previously replaced by __has_builtin checks, please prefer the better structure for such changes:

#if __has_builtin(__c11_atomic_exchange)
...
#elif __has_builtin(__sync_swap)
...
#else
...
#endif

cc @janvorli

The presence of __has_builtin(__sync_swap) and others doesn't guarantee them to be lock free, it's a Clang specific guarantee here.

@am11
Copy link
Member

am11 commented Jan 26, 2024

The presence of __has_builtin(__sync_swap) and others doesn't guarantee them to be lock free, it's a Clang specific guarantee here.

__sync_swap is clang-specific, so __has_builtin(__sync_swap) will be resolved only when compiling with clang, without the additional cmake checks.

@MichalPetryka
Copy link
Contributor Author

The presence of __has_builtin(__sync_swap) and others doesn't guarantee them to be lock free, it's a Clang specific guarantee here.

__sync_swap is clang-specific, so __has_builtin(__sync_swap) will be resolved only when compiling with clang, without the additional cmake checks.

It will however return false on platforms without supported lockfree atomics which will make the code use the locking ones which is explicitly what we don't want here.

@am11
Copy link
Member

am11 commented Jan 26, 2024

It will however return false on platforms without supported lockfree atomics which will make the code use the locking ones which is explicitly what we don't want here.

Not seeing how? Suggestion was to use __c11_atomic_exchange if available, then __sync_swap and finally fallback to the existing __atomic_exchange_n.

@MichalPetryka
Copy link
Contributor Author

It will however return false on platforms without supported lockfree atomics which will make the code use the locking ones which is explicitly what we don't want here.

Not seeing how? Suggestion was to use __c11_atomic_exchange if available, then __sync_swap and finally fallback to the existing __atomic_exchange_n.

The issue is that we don't want to fallback to __atomic_exchange_n on Clang as it can be implemented with locking on some platforms.

@am11
Copy link
Member

am11 commented Jan 26, 2024

The issue is that we don't want to fallback to __atomic_exchange_n on Clang as it can be implemented with locking on some platforms.

It will not fallback to __atomic_exchange_n on clang because #if __has_builtin(__sync_swap) will not return false if it is available. Official builds are using clang 16, and even with clang 9 (the version we were having problems with and decided to upgrade toolchain on build machines) returns true: https://godbolt.org/z/PG5hbn3PP.

@MichalPetryka
Copy link
Contributor Author

The issue is that we don't want to fallback to __atomic_exchange_n on Clang as it can be implemented with locking on some platforms.

It will not fallback to __atomic_exchange_n on clang because #if __has_builtin(__sync_swap) will not return false if it is available. Official builds are using clang 16, and even with clang 9 (the version we were having problems with and decided to upgrade toolchain on build machines) returns true: https://godbolt.org/z/PG5hbn3PP.

We want to fail building on platforms where it's not available though as __atomic_exchange_n will be unsafe there.

@MichalPetryka
Copy link
Contributor Author

Also, I'm not sure if __has_builtin(__sync_swap) is true when all sizes are supported or only when any size is.

@MichalPetryka
Copy link
Contributor Author

MichalPetryka commented Jan 26, 2024

@jkotas I've just discovered that Clang will use the locking implementation even for __sync_swap and __sync_val_compare_and_swap on some platforms, which means the guarantee can't be relied upon.

@janvorli
Copy link
Member

I am not sure if I completely follow all the reasoning here. I have also considered atomic without lock to be possible for sizes smaller or equal to the 32 / 64 bits depending on the architectures - that means 32 bit CPUs that we support would be able to do 32 bit atomic ops and 64 bit architectures 64 bit atomic operations at most. Runtime should never try to use atomic operations for anything larger than the architecure size, so I am not sure what is the real issue that's being solved here.

@jkotas
Copy link
Member

jkotas commented Jan 26, 2024

means 32 bit CPUs that we support would be able to do 32 bit atomic ops

We have managed APIs that require atomic 64 bit ops even on 32 bit CPUs. All current 32 bit CPUs support that, except RISCV 32 (without extensions). We are unlikely to ever support RISCV 32.

@jkotas
Copy link
Member

jkotas commented Jan 26, 2024

I've just discovered that Clang will use the locking implementation even for __sync_swap and __sync_val_compare_and_swap on some platforms, which means the guarantee can't be relied upon.

So the code that is there today is about as unportable as alternatives, and there is nothing to do?

@MichalPetryka
Copy link
Contributor Author

I've just discovered that Clang will use the locking implementation even for __sync_swap and __sync_val_compare_and_swap on some platforms, which means the guarantee can't be relied upon.

So the code that is there today is about as unportable as alternatives, and there is nothing to do?

I'd guess the only solution would be to hardcode the ASM GCC generates which is lock free for ARM32 and just check the defines for other platforms?

@jkotas
Copy link
Member

jkotas commented Jan 26, 2024

What are the compilers, intrinsics and platforms that have actual problem?

If the problem is limited to gcc InterlockedExchange64 on arm32, the fix can be to enable

#if defined(HOST_X86)
for gcc on arm32.

@MichalPetryka
Copy link
Contributor Author

What are the compilers, intrinsics and platforms that have actual problem?

Looking on godbolt, calls seem to be emitted for:

  1. Clang arm32 all types - uses helpers and warns about locks
  2. Clang arm64 all types - uses helpers but I think those are fine
  3. GCC arm64 all types - uses helpers but __sync_val_compare_and_swap seems to use a different ordering __aarch64_cas8_sync than clang with __aarch64_cas8_acq_rel
  4. RISC-V 32bit all compilers, 64bit int - uses locking helpers

@jkotas
Copy link
Member

jkotas commented Jan 26, 2024

Clang arm32 all types - uses helpers and warns about locks

Do these helpers actually use locks, or is it just a bogus warning?

@MichalPetryka
Copy link
Contributor Author

MichalPetryka commented Jan 26, 2024

Clang arm32 all types - uses helpers and warns about locks

Do these helpers actually use locks, or is it just a bogus warning?

As far as I can see, atomic_is_lock_free works for runtime querying whether the helpers are locking based and gets folded to true when intrinsics are supported, would adding a check for this and switching to preemptive on false work for the FCalls? For GCC there's a -fno-sync-libcalls setting which seems to guarantee intrinsics too but it's not supported by Clang.

@jkotas
Copy link
Member

jkotas commented Jan 26, 2024

switching to preemptive on false work for the FCalls?

The managed Interlocked.* APIs require the operations to be atomic. Lock-based implementation of these APIs is not an option for conformant implementation.

@MichalPetryka
Copy link
Contributor Author

Apparently setting -march=armv6 makes Clang not use helper calls on arm32 and makes it say atomics are always lock free and since AFAIR CoreCLR requires that already, I guess it should be enough to enable it and just check the defines for safety?

@janvorli
Copy link
Member

Looking at our current runtime armv7 compiled code, I don't see any calls to the sync helpers, but rather inline ldrex / strex for 32 bit and ldrexd/strexd for 64 bit operations. Expanded e.g. from the InterlockedCompareExchange64/InterlockedCompareExchange. So, I am not sure why the linked output from the compiler explorer shows calls to the helpers. I've compiled runtime with clang 10, but the compiler explorer shows the helpers for that clang too. I was thinking that maybe the linker replaces those calls, but even the object files contain the inlined instructions.

@janvorli
Copy link
Member

Apparently setting -march=armv6 makes Clang not use helper calls on arm32 and makes it say atomics are always lock free and since AFAIR CoreCLR requires that already

We set march to armv7:

add_compile_options(-march=armv7-a)

@MichalPetryka
Copy link
Contributor Author

Apparently setting -march=armv6 makes Clang not use helper calls on arm32 and makes it say atomics are always lock free and since AFAIR CoreCLR requires that already

We set march to armv7:

add_compile_options(-march=armv7-a)

Does that maybe not get piped through to all the places? The CMake check I've added failed on Arm32 which seemed to suggest it wasn't compiling with that.

@mangod9
Copy link
Member

mangod9 commented Mar 18, 2024

Is this PR ready to review/merge?

@jkotas
Copy link
Member

jkotas commented Mar 18, 2024

Is this PR ready to review/merge?

This PR is a cleanup to make the code more portable. It is not clear whether it is an improvement from the discussion so far.

As far as we know, there is no real problem fixed by this change on the currently supported platforms.

@jkotas jkotas marked this pull request as draft March 18, 2024 21:20
Copy link
Contributor

Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.

@github-actions github-actions bot locked and limited conversation to collaborators May 18, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-PAL-coreclr community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ARM32 (and possibly some others) might take locks in Interlocked FCalls
5 participants