Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARM32 (and possibly some others) might take locks in Interlocked FCalls #97452

Open
MichalPetryka opened this issue Jan 24, 2024 · 6 comments
Open
Labels
arch-arm32 area-PAL-coreclr in-pr There is an active PR which will close this issue when it is merged
Milestone

Comments

@MichalPetryka
Copy link
Contributor

While working on #92974, I've realised that on some platforms without builtin atomics for all types, compilers might use a locking based implementation for them.
Checking the C11 defines, it seems like at least ARM32 is affected with Clang, but other platforms might be affected with GCC too.

Looking at the exact codegen, ARM32 and ARM 8.0 without atomics seem to use helper calls for Interlocked, only arm32 gives warnings about locking though.

The code should be fixed to avoid potential issues with this, I see 4 possible solutions:

  1. Make the helper calls QCalls - expensive
  2. Switch to preemptive in the FCall on affected platforms - also expensive
  3. Implement Interlocked with always expand intrinsics on affected platforms - requires JIT work (should already be handled on ARM 8.0 in .NET 9)
  4. Rewrite the unmanaged helpers in raw assembly in a way that doesn't use locks - requires VM work

Relying on the compiler to be lock free on any platform that uses helper calls for atomics seems unreliable as AFAIR those are free to implement them using locks.
cc @jkotas

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Jan 24, 2024
@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Jan 24, 2024
@jkotas
Copy link
Member

jkotas commented Jan 24, 2024

Do __sync... intrinsics have this problem? I think the simplest solution is to avoid __atomic_exchange_n and switch to __sync... intrinsics: #92974 (comment) .

@MichalPetryka
Copy link
Contributor Author

Do __sync... intrinsics have this problem? I think the simplest solution is to avoid __atomic_exchange_n and switch to __sync... intrinsics: #92974 (comment) .

They also emit helper calls, they don't issue a warning though. I wouldn't rely on it being fine however as their docs say:

Not all operations are supported by all target processors. If a particular operation cannot be implemented on the target processor, a warning will be generated and a call an external function will be generated. The external function will carry the same name as the builtin, with an additional suffix `_n' where n is the size of the data type.

which makes it seem like there should be a warning in such case.

@jkotas
Copy link
Member

jkotas commented Jan 24, 2024

It is reasonable to assume that the helper calls that __sync intrinsics compile into do not use locks. The helper calls can be used e.g. to select the most efficient implementation for given hardware.

Per LLVM documentation, __sync_... intrinsics are required to not use locks. From https://github.com/llvm/llvm-project/blob/main/llvm/docs/Atomics.rst#libcalls-__sync_ : "the Target in LLVM can claim support for atomics of an appropriate size, and then implement some subset of the operations via libcalls to a _sync* function. Such functions must not use locks in their implementation"

@MichalPetryka
Copy link
Contributor Author

This doesn't seem to be guaranteed in GCC:

These functions are implemented in terms of the ‘__atomic’ builtins (see https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html).
They should not be used for new code which should use the ‘__atomic’ builtins instead.

Not all operations are supported by all target processors.
If a particular operation cannot be implemented on the target processor, a warning is generated and a call to an external function is generated.
The external function carries the same name as the built-in version, with an additional suffix ‘_n’ where n is the size of the data type.

@jkotas
Copy link
Member

jkotas commented Jan 24, 2024

It may not be guaranteed, but it is reasonable to assume that it is the case.

Also, our officially supported build is clang/llvm. There is very little testing of gcc based builds. It is likely that gcc based builds have number of issues, so one more potential gcc-specific issue is not really a problem.

@MichalPetryka
Copy link
Contributor Author

It seems however that with GCC all atomics are lock free with the exception of 64bit integers on RISC-V 32bit.
It seems to not use helpers on arm32.

@filipnavara filipnavara added arch-arm32 area-PAL-coreclr and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Jan 25, 2024
MichalPetryka added a commit to MichalPetryka/runtime that referenced this issue Jan 25, 2024
Makes PAL use intrinsics that are guaranteed to not lock with Clang and with other compilers (probably only GCC) checks the C11 `lock free atomics` defines as there seems to be no better way to verify safety there.

Fixes dotnet#97452.
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jan 25, 2024
@mangod9 mangod9 removed the untriaged New issue has not been triaged by the area owner label Jul 3, 2024
@mangod9 mangod9 added this to the Future milestone Jul 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arch-arm32 area-PAL-coreclr in-pr There is an active PR which will close this issue when it is merged
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants