No LDREX/STREX-based implementations of __cxa_guard_acquire/release/abort in ARM code? #1393

amosnier · 2022-01-10T21:39:15Z

From C++ ABI for the ARM architecture, ARM IHI 0041D:

3.2.3.1 Guard variables
To support the potential use of initialization guard variables as semaphores that are the target of ARM SWP and LDREX/STREX synchronizing instructions we define a static initialization guard variable to be a 4-byte aligned, 4-byte word with the following inline access protocol.

#define INITIALIZED 1

// inline guard test...
if ((obj_guard & INITIALIZED) != INITIALIZED) {
    // TST obj_guard, #1; BNE already_initialized
    if (__cxa_guard_acquire(&obj_guard)) {
        ...
    }
}

Usually, a guard variable should be allocated in the same data section as the object whose construction it guards.

3.2.3.2 One-time construction API

extern "C" int __cxa_guard_acquire(int *guard_object);

If the guarded object has not yet been initialized, this function returns 1. Otherwise it returns 0.
If it returns 1, a semaphore might have been claimed and associated with guard_object, and either
__cxa_guard_release or __cxa_guard_abort must be called with the same argument to release the semaphore.

extern "C" void __cxa_guard_release(int *guard_object);

This function is called on completing the initialization of the guarded object. It sets the least significant bit of
guard_object (allowing subsequent inline checks to succeed) and releases any semaphore associated with it.

extern "C" void __cxa_guard_abort(int *guard_object);

This function is called if any part of the initialization of the guarded object terminates by throwing an exception. It
releases any semaphore associated with guard_object.
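
The acquire/release/abort contract quoted above can be made concrete with a small sketch of the control flow a compiler conceptually emits around a function-local static. This is only an illustration under stated assumptions: `demo_guard_acquire`, `demo_guard_release`, `demo_guard_abort`, `Widget` and `get_widget` are hypothetical names, and the three `demo_guard_*` functions are deliberately trivial single-threaded stand-ins, not a usable implementation.

```cpp
#include <cassert>
#include <new>
#include <stdexcept>

// Hypothetical single-threaded stand-ins (NOT thread-safe). They exist only
// to make the control flow below runnable. Bit 0 is the "initialized" bit.
int demo_guard_acquire(int *guard) { return (*guard & 1) == 0 ? 1 : 0; }
void demo_guard_release(int *guard) { *guard |= 1; }
void demo_guard_abort(int *) { /* nothing to unlock in this stand-in */ }

struct Widget {
    explicit Widget(bool fail) {
        if (fail) throw std::runtime_error("constructor failed");
    }
    int value = 42;
};

int widget_guard = 0;                                   // the 4-byte guard word
alignas(Widget) unsigned char widget_storage[sizeof(Widget)];
Widget *widget_ptr = nullptr;

// Roughly what is generated for: static Widget w(fail); return w;
Widget &get_widget(bool fail) {
    if ((widget_guard & 1) != 1) {                      // inline guard test
        if (demo_guard_acquire(&widget_guard)) {
            try {
                widget_ptr = new (widget_storage) Widget(fail);
            } catch (...) {
                demo_guard_abort(&widget_guard);        // bit 0 stays clear
                throw;
            }
            demo_guard_release(&widget_guard);          // sets bit 0
        }
    }
    assert(widget_ptr != nullptr);
    return *widget_ptr;
}
```

If the first call throws, the guard bit stays clear and the next call retries the construction; once a construction succeeds, later calls take only the inline test.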

Is my interpretation correct that only one bit of the obj_guard variable is accessed at all by the code that provides it (the code that invokes the __cxa_guard_xxx functions), and that because the rest is unused, the obj_guard variable itself could be used for the semaphore implementation?

I certainly hope so: an alternative implementation would have to "manually" allocate semaphore memory alongside every static variable, which would be quite cumbersome, or would have to use some kind of recursive mutex that also handles the case when the OS is not started, and not everyone has that kind of luxury. If my interpretation is correct, how come the following search on the CMSIS_5 repository gives no result in source code?

$ git grep __cxa_guard_acquire

I mean, if there is a possibly trivial implementation based on the ABI documentation (of the three functions), why would ARM themselves not provide it? Or is it provided in some other repository? I made such an implementation myself, but seeing naive non-thread-safe implementations all over the Internet (which systematically break C++ static object creation semantics!) really makes me wonder whether we are not missing a great opportunity of improving many embedded C++ applications with a small effort.

What am I missing?


amosnier · 2022-01-10T21:50:51Z

I might reopen this, but I will think a little more about it first.

amosnier · 2022-01-10T22:05:18Z

Reopening, after a little more thinking.
One broken example, just for reference: https://github.com/BrewPi/firmware/blob/master/platform/spark/firmware/hal/src/stm32/newlib.cpp.
And one example, which is much more complex, seems to at least try to do the right thing for multiple configurations, and does seem to confirm that the guard object itself is supposed to be used for the semaphore implementation: https://github.com/llvm-mirror/libcxxabi/blob/master/src/cxa_guard.cpp#L188 and corresponding implementations.

So my questions above stand.

JonatanAntoni · 2022-01-13T17:12:49Z

Hi @amosnier,

I guess the most important aspect is that CMSIS is still pure C code and does not contain any C++ related content so far.

Now, we need to set the scene to sort your request. Am I right that you are looking to use CMSIS-RTOS2 in a C++ environment? E.g., by using a concrete implementation of this RTOS API like the reference implementation RTX5?

Cheers,
Jonatan

amosnier · 2022-01-13T20:49:51Z

Hi @JonatanAntoni,

Thanks for taking the time to answer.

In my use case, an RTOS is available, although it does not implement the CMSIS-RTOS2 interface.

Assuming that an RTOS is used, if static object creation is only performed once the RTOS has started, a __cxa_guard_acquire implementation can be based on a single recursive mutex instance for a whole application, if such a facility is available in the RTOS. This is the case for me, but recursive mutexes are currently not in use in the concerned source code base, so they would at least have to be thoroughly tested first.

Additionally, in a typical C++ embedded application using an RTOS, in the general case, it is likely that most objects, but not necessarily all of them, will be constructed before the RTOS is started, at a time when OS-mutexes are not available for locking.
It is possible to wrap the OS-mutex implementation in an API that checks whether or not the RTOS is started, to make __cxa_guard_acquire reliably work both before and after the RTOS has been started. In summary, assuming that the RTOS provides recursive mutexes, __cxa_guard_acquire can be implemented.
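
The wrapping idea described above can be sketched as follows. This is a minimal sketch, not a tested RTOS port: `rtos_started`, `rtos_guard_mutex`, `guard_lock`, `guard_unlock` and the `rtos_guard_*` functions are hypothetical names, `std::recursive_mutex` stands in for whatever recursive mutex the RTOS offers, and the guard is typed as `std::atomic<int>` for portability where the real API receives an `int *`. It also assumes that `rtos_started` never changes while an initialization is in flight.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical stand-ins for RTOS facilities; none of these names come from
// a real RTOS API.
bool rtos_started = false;               // set once, when the scheduler starts
std::recursive_mutex rtos_guard_mutex;   // one recursive mutex for the whole application

// Before the scheduler runs there is a single context, so locking is skipped.
void guard_lock()   { if (rtos_started) rtos_guard_mutex.lock(); }
void guard_unlock() { if (rtos_started) rtos_guard_mutex.unlock(); }

int rtos_guard_acquire(std::atomic<int> *guard) {
    guard_lock();
    if (guard->load(std::memory_order_acquire) & 1) {
        guard_unlock();                  // someone finished while we waited
        return 0;
    }
    return 1;                            // caller initializes, still holding the mutex
}

void rtos_guard_release(std::atomic<int> *guard) {
    assert((guard->load(std::memory_order_relaxed) & 1) == 0);
    guard->fetch_or(1, std::memory_order_release);  // set the "initialized" bit
    guard_unlock();
}

void rtos_guard_abort(std::atomic<int> *) { guard_unlock(); }
```

The mutex is recursive so that an initializer which itself triggers the construction of another static object can re-enter `rtos_guard_acquire` on the same thread without deadlocking.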

But that is obviously not what the ARM C++ ABI refers to, when it says:

To support the potential use of initialization guard variables as semaphores that are the target of ARM SWP and LDREX/STREX synchronizing instructions we define a static initialization guard variable to be a 4-byte aligned, 4-byte word [...]

and

extern "C" void __cxa_guard_release(int *guard_object);

This function is called on completing the initialization of the guarded object. It sets the least significant bit of
guard_object (allowing subsequent inline checks to succeed) and releases any semaphore associated with it.

If I understand this correctly, and LLVM's general implementation of __cxa_guard_acquire seems to confirm that, guard_object itself would be used to store a semaphore that would be used to implement the locking, which gives us one semaphore per static object, instead of a single recursive OS-mutex for all the objects.
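
The "guard word as its own semaphore" reading described above can be sketched with a compare-exchange on the guard itself. This is a sketch under assumptions, not the ABI authors' intended design: only bit 0 is fixed by the quoted text, the use of bit 1 as an "in progress" flag is this example's own choice, the `spin_guard_*` names are hypothetical, and the guard is typed as `std::atomic<int>` where the real API receives an `int *`. On cores that have LDREX/STREX, compilers typically lower the compare-exchange to an LDREX/STREX retry loop.

```cpp
#include <atomic>
#include <cassert>

constexpr int kInitialized = 1;  // bit 0, fixed by the quoted ABI text
constexpr int kInProgress  = 2;  // bit 1, this sketch's own choice

int spin_guard_acquire(std::atomic<int> *guard) {
    for (;;) {
        int seen = 0;  // expected value: nobody has touched the guard yet
        if (guard->compare_exchange_weak(seen, kInProgress,
                                         std::memory_order_acquire,
                                         std::memory_order_acquire)) {
            return 1;  // we own the initialization
        }
        if (seen & kInitialized) {
            return 0;  // another context completed the initialization
        }
        // Otherwise another context is initializing (or the exchange failed
        // spuriously): busy-wait and retry. No OS service is involved here.
    }
}

void spin_guard_release(std::atomic<int> *guard) {
    assert(guard->load(std::memory_order_relaxed) == kInProgress);
    guard->store(kInitialized, std::memory_order_release);
}

void spin_guard_abort(std::atomic<int> *guard) {
    assert(guard->load(std::memory_order_relaxed) == kInProgress);
    guard->store(0, std::memory_order_release);  // let another context retry
}
```

The retry loop is the fragile part. On a single core with strict priority scheduling, a higher-priority thread spinning here would never let a preempted lower-priority initializer finish, and an initializer that recursively reaches the same guard would spin forever.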

It seems to me that we have two quite different solutions, one which relies on the RTOS and one which is completely independent of any RTOS, and which, according to ARM themselves, could be based on ARM SWP and LDREX/STREX, instructions which I guess are available on all ARM cores.

Both have benefits and drawbacks:

In the RTOS-based solution, the locking call does not busy-wait.
In the ARM SWP and LDREX/STREX solution, static objects can be created independently of each other, but the resolution of conflict in case of multiple simultaneous constructions of a single static object probably requires some kind of busy-waiting, although in practice it should be an exceptional case.

To link back to my original questions, it seems to me that a common implementation based on ARM SWP and LDREX/STREX could be provided by ARM.

For the sake of completeness, I should add that static objects of non-POD types should arguably be frowned upon, because of the Static Initialization Order Fiasco and because declaring static objects in functions does not really solve the whole issue. In an ideal application, one could therefore choose to generate a compilation error if __cxa_guard_acquire is ever invoked, but in my experience such a policy is unfortunately not very realistic.

Best regards,

Alain Mosnier

JonatanAntoni · 2022-01-17T13:52:21Z

Hi Alain,

To be honest, I don't see how your request could be addressed in terms of CMSIS.

Using an RTOS-based solution requires at least some knowledge about the RTOS API.

I cannot imagine how a solution that is totally independent of the RTOS in use could work. If I understand your requirement correctly, it is about thread safety when function-local static objects are initialized. Calling a function with a function-local static object before the RTOS scheduler has been launched should always be safe. But calling it from different thread contexts concurrently could cause issues.

Let's say thread A calls such a function first and the static object initialization is started. Now, the initialization is interrupted and the scheduler switches to thread B. Thread B calls the same function. Now, the static object cannot be used as it is not yet fully initialized. But thread B must not start to initialize the object itself (again). Instead, thread B would need to wait for thread A to complete object initialization before executing the function.

The only way to achieve this is by using some RTOS synchronization mechanism (semaphore, mutex, event). I don't understand how busy waiting would work in this case. Hence, I don't see how one might use LDREX/STREX or SWP instructions in this scenario.
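
The thread A / thread B scenario described above can be reproduced on a hosted toolchain, where the C++ runtime already provides thread-safe guards (this is guaranteed since C++11 unless the feature is disabled, for example with `-fno-threadsafe-statics` on GCC or Clang). The sketch below uses hypothetical names (`Slow`, `instance`, `run_two_threads`) and `std::thread` in place of RTOS threads. It only shows the behavior an embedded implementation of the guards would need to match.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

std::atomic<int> constructions{0};

struct Slow {
    Slow() {
        ++constructions;
        // Widen the window in which the other thread arrives mid-initialization.
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
        ready = true;
    }
    bool ready = false;
};

// Function-local static: the compiler wraps its construction in guard calls.
Slow &instance() {
    static Slow s;
    return s;
}

// Returns true if both threads observed a fully constructed object.
bool run_two_threads() {
    bool a_ok = false;
    bool b_ok = false;
    std::thread a([&] { a_ok = instance().ready; });
    std::thread b([&] { b_ok = instance().ready; });
    a.join();
    b.join();
    assert(constructions.load() == 1);  // constructed exactly once
    return a_ok && b_ok;
}
```

Whichever thread arrives second has to wait until the first one finishes the constructor, which is exactly the waiting that the guard functions have to provide.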

Cheers,
Jonatan

JonatanAntoni · 2022-01-17T16:54:35Z

Hi @amosnier,

A default implementation of these guards is part of LLVM, you can find the implementation here: https://github.com/llvm/llvm-project/blob/main/libcxxabi/src/cxa_guard.cpp

Having discussed this with some compiler experts, I still don't see how an RTOS-independent solution could work.

The LLVM implementation contains a variant using a global mutex plus a global condition variable. I think a similar implementation could be done for the specific RTOS used.
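
A global mutex plus global condition variable scheme of the kind mentioned above might look roughly like this. It is a sketch written for this discussion, not LLVM's code: the `cv_guard_*` names and the use of bit 1 as an "in progress" flag are assumptions, the standard library primitives stand in for RTOS ones, and the guard is typed as `std::atomic<int>` where the real API receives an `int *`.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>

std::mutex guard_mutex;            // one mutex shared by all guards
std::condition_variable guard_cv;  // one condition variable shared by all guards

constexpr int kInitialized = 1;    // bit 0, as in the ABI text quoted earlier
constexpr int kInProgress  = 2;    // bit 1, this sketch's own choice

int cv_guard_acquire(std::atomic<int> *guard) {
    std::unique_lock<std::mutex> lock(guard_mutex);
    // Sleep (rather than spin) while another thread is initializing.
    guard_cv.wait(lock, [&] { return (guard->load() & kInProgress) == 0; });
    if (guard->load() & kInitialized) {
        return 0;
    }
    guard->store(kInProgress);
    return 1;  // the mutex is released while the initializer itself runs
}

void cv_guard_release(std::atomic<int> *guard) {
    {
        std::lock_guard<std::mutex> lock(guard_mutex);
        assert(guard->load() == kInProgress);
        guard->store(kInitialized);
    }
    guard_cv.notify_all();  // wake every waiter; each re-checks its own guard
}

void cv_guard_abort(std::atomic<int> *guard) {
    {
        std::lock_guard<std::mutex> lock(guard_mutex);
        guard->store(0);    // a waiter may now retry the initialization
    }
    guard_cv.notify_all();
}
```

Because the mutex is not held during the initializer, unrelated static objects can be constructed concurrently, and waiters block in the OS instead of busy-waiting.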

Cheers,
Jonatan

amosnier · 2022-01-18T18:42:31Z

Hi @JonatanAntoni,

You wrote:

The only way to achieve this is by using some RTOS synchronization mechanism (semaphore, mutex, event). I don't understand how busy waiting would work in this case. Hence, I don't see how one might use LDREX/STREX or SWP instructions in this scenario.

But ARM also specifies (C++ ABI for the ARM architecture):

3.2.3 Guard variables and the one-time construction API
3.2.3.1 Guard variables
To support the potential use of initialization guard variables as semaphores that are the target of ARM SWP and LDREX/STREX synchronizing instructions we define a static initialization guard variable to be a 4-byte aligned, 4-byte word with the following inline access protocol.

and:

3.1 Summary of differences from and additions to the generic C++ ABI
[...]
GC++ABI §2.8 Initialization guard variables
Static initialization guard variables are 4 bytes long not 8, and there is a different protocol for using them which
allows a guard variable to implement a semaphore when used as the target of ARM SWP or LDREX and STREX
instructions. See §3.2.3.

So my question is: in what scenarios does ARM mean that static object initialization guard variables should be "the target of ARM SWP or LDREX and STREX instructions", and why would CMSIS not include a reference implementation of those principles?

As a side note, the ARM® Synchronization Primitives article includes several OS-independent thread synchronization primitives, a mutex among others, based on LDREX/STREX. I do not think any of them should be used directly to guard static object initialization, and they are probably sub-optimal for general purposes if an OS is available, but the article still shows that OS-independent thread synchronization based on LDREX/STREX is achievable.

Best regards,

Alain Mosnier

JonatanAntoni · 2022-01-19T08:47:25Z

Hi @amosnier,

The C++ ABI might be misleading here. But I am not an expert in this area and, as I already stated, CMSIS is C software. This is why CMSIS does not contain C++ implementations yet.

RTX5 for instance makes use of LDREX/STREX instructions to achieve atomicity of its synchronization primitives. But these implementations are clearly RTX5 specific and not OS-independent.

Sorry, if this is not the answer you were looking for.

Cheers,
Jonatan

amosnier · 2022-01-19T18:15:03Z

Hi @JonatanAntoni,

You wrote

The C++ ABI might be misleading here.

Since we are talking about a major interface specification, we would have to report such an issue to the group who maintains it, right? What is the procedure for that? For the record, I in fact do not think that the ABI documentation is misleading. My feeling is that the group who wrote it knows exactly what they are talking about, and even have a solution in mind. Maybe they could just tell us?

But I am not an expert in this area and as I already stated CMSIS is C software. This is why CMSIS does not contain C++ implementations, yet.

The functions are declared extern "C" (see earlier posts, or the C++ ABI for the ARM architecture). I.e. the API we are talking about is very carefully designed to be possible to implement in C.

Best regards,

Alain Mosnier

JonatanAntoni · 2022-01-20T09:17:05Z

Hi @amosnier,

you could try to address this in the Arm Community.

Cheers,
Jonatan

amosnier · 2022-01-20T18:40:24Z

Hi @JonatanAntoni,

You wrote

The C++ ABI might be misleading here.

Since you think you have found some misleading guidance in a major ARM specification, I trust you will report that as an issue to the group responsible for maintaining it. Since I expect that they in fact have a precise idea of what an implementation of the __cxa_guard API in C should look like, I would be thankful if you could report any progress here.

You could try to address this in the Arm Community.

Sure, I can submit this question to them too.

Best regards,

Alain Mosnier

amosnier · 2022-01-25T07:42:05Z

Hi @JonatanAntoni,
I can now see that you were right about the OS-dependency. Thank you for your patience. I will now close this issue.
Best regards,
Alain Mosnier

amosnier · 2022-01-25T17:54:06Z

If anyone reads this and is interested, they might also be interested in a related ARM community discussion.

amosnier closed this as completed Jan 10, 2022

amosnier reopened this Jan 10, 2022

amosnier closed this as completed Jan 25, 2022
