Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add atomic loads and stores and barriers #9247

Merged
merged 1 commit into from Jan 21, 2019

Conversation

Projects
None yet
7 participants
@kjbracey-arm
Copy link
Contributor

commented Jan 3, 2019

Description

Add atomic load and store functions, and add barriers to the existing atomic functions.

File currently has no explicit barriers - we don't support SMP, so don't need CPU barriers.

But we do need to worry about compiler barriers - particularly if link time optimisation is activated so that the compiler can see inside these functions. The assembler or intrinsics that access PRIMASK for
enter/exit critical act as barriers, but LDREX, STREX and simple volatile pointer loads and stores do not.

Pull request type

[ ] Fix
[ ] Refactor
[ ] Target update
[X] Functionality change
[ ] Docs update
[ ] Test update
[ ] Breaking change

Reviewers

@c1728p9, @pan-

@kjbracey-arm

This comment has been minimized.

Copy link
Contributor Author

commented Jan 3, 2019

I'm in two minds about this one - if people are directly accessing atomics with raw load or store, they're probably doing something weird and should be using something higher-level, but maybe there's no good alternative. This is being added to cover the _pending flag in InternetSocket - see #9248.

@pan-
Copy link
Member

left a comment

I'm puzzled with the atomic load/store implementation proposed - I must be missing something 😕 .
From what I understand MBED_BARRIER is a compiler fence; nothing more. I believe we should use __LDREX? and __STREX? on M3 and M4 architecture and we can use a critical section on M0 architecture.

platform/mbed_toolchain.h Outdated
* Stop the compiler moving memory accesses.
*/
#ifdef __CC_ARM
#define MBED_BARRIER() __memory_changed()

This comment has been minimized.

Copy link
@pan-

pan- Jan 3, 2019

Member

It would be nice to have a detailed description - and maybe more appropriate naming - of what MBED_BARRIER is; where it should be used, what it does and what it doesn't do.

From my understanding asm volatile("" : : : "memory") forces the compiler to not reorder the code it produces however there is no guarantee that the code won't be reordered by the CPU or that memory writes have completed. It doesn't replaces CPU barrier instructions.

MBED_FORCEINLINE void core_util_atomic_flag_clear(volatile core_util_atomic_flag *flagPtr)
{
MBED_BARRIER();
flagPtr->_flag = false;

This comment has been minimized.

Copy link
@c1728p9

c1728p9 Jan 3, 2019

Contributor

Why isn't there a memory barrier after _flag is set to false?

This comment has been minimized.

Copy link
@kjbracey-arm

kjbracey-arm Jan 4, 2019

Author Contributor

You're right, one is needed there. I was thinking it wasn't needed for single-CPU, but sequential consistency requires it for the same reason you would need two DMBs (https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html). Assuming we want sequential consistency (the C++11 default), then we need to guarantee that

atomic_store(x); // release
atomic_load(y); // acquire

can't be reordered. So that has to end up as:

BARRIER // for release semantics - keep preceding above
x = foo;
BARRIER // stop store+load being reordered
bar = y;
BARRIER // acquire semantics - keep following below

And it's store that gets the extra sequential-consistency barrier (like the extra DMB), because stores are probably less common.

*/
MBED_FORCEINLINE uint8_t core_util_atomic_load_u8(const volatile uint8_t *valuePtr)
{
uint8_t value = *valuePtr;

This comment has been minimized.

Copy link
@c1728p9

c1728p9 Jan 3, 2019

Contributor

Why isn't there a memory barrier before *valuePtr is read?

This comment has been minimized.

Copy link
@kjbracey-arm

kjbracey-arm Jan 4, 2019

Author Contributor

Because load has "acquire" semantics, as per C++11/C11. It only needs to stop stuff from afterwards moving before it.

@kjbracey-arm

This comment has been minimized.

Copy link
Contributor Author

commented Jan 4, 2019

I'm puzzled with the atomic load/store implementation proposed - I must be missing something 😕 .

mbed_critical currently has no SMP support, and has no CPU barriers at all. These new functions are consistent with the rest of the file - compiler barriers only, assuming 1 CPU, and protect against other threads or interrupts on that single CPU.

I believe we should use __LDREX? and __STREX? on M3 and M4 architecture

Those don't have any barrier/ordering functionality in the CPU. The LDREX and STREX macros/intrinsics only provide compiler barriers - if you were doing SMP you would either need to add DMB instructions for CPU barriers (ARMv7) or use LDAEX and STLEX to get CPU acquire/release semantics (ARMv8).

So these implementations are to add the necessary equivalent compiler barrier on a plain load or store, that we get automatically whenever we happen to use LDREX/STREX or touch PRIMASK.

(Edit, oops, no, LDREX and STREX are not actually compiler barriers, necessarily. I'm going to need to add some.)

Without the barrier, this can go wrong:

if (!atomic_flag_test_and_set(&flag)) {
     do some stuff_to_state_protected_by_flag();
     atomic_flag_clear(&flag);
}

The "do some stuff" could be reordered after the flag clear by the compiler, so that we are modifying the state with the flag clear. Existing attempts to do the equivalent with a "volatile bool" suffer the same problem - they should change to use atomic_flag to get the barrier that's being added here.

With LTO off, this was avoided because the "call outside this C file" to atomic_flag_clear is an effective compiler barrier in itself. But the moment LTO is on, and it can see inside the definition (or once it's made inline, like it is in this patch), that reordering will happen.

An alternative would be to implement the atomic loads and stores like asm volatile("STR %0,[%1]" : : : "memory"), so that they are barriers like the STREX or PRIMASK, but that would significantly impede code generation.

@kjbracey-arm kjbracey-arm force-pushed the kjbracey-arm:atomic_load_store branch Jan 4, 2019

@kjbracey-arm kjbracey-arm changed the title Add atomic loads and stores with barriers Add atomic loads and stores and barriers Jan 4, 2019

@pan-

pan- approved these changes Jan 4, 2019

Copy link
Member

left a comment

Thanks for taking the time to answer my concerns @kjbracey-arm ; it all makes sense now.

@kjbracey-arm kjbracey-arm force-pushed the kjbracey-arm:atomic_load_store branch Jan 4, 2019

@kjbracey-arm

This comment has been minimized.

Copy link
Contributor Author

commented Jan 4, 2019

Reworked the comments more - provide both MBED_COMPILER_BARRIER and MBED_BARRIER (which is currently MBED_COMPILER_BARRIER because non-SMP). I can see that ARMv8 atomics will want "compiler barrier" specifically, while ARMv7 atomics want "DMB or compiler barrier", so provide both.

@kjbracey-arm kjbracey-arm force-pushed the kjbracey-arm:atomic_load_store branch 2 times, most recently Jan 9, 2019

@kjbracey-arm

This comment has been minimized.

Copy link
Contributor Author

commented Jan 14, 2019

Don't think this needs any more work - maybe more review.

@0xc0170 0xc0170 requested a review from ARMmbed/mbed-os-core Jan 14, 2019

@cmonr cmonr added needs: review and removed needs: work labels Jan 15, 2019

@0xc0170

This comment has been minimized.

Copy link
Member

commented Jan 16, 2019

@SenRamakri @deepikabhavnani If you could review this one, it's nearly ready for CI

@deepikabhavnani
Copy link
Contributor

left a comment

Looks good to me 👍

@cmonr cmonr added needs: CI and removed needs: review labels Jan 17, 2019

@cmonr

This comment has been minimized.

Copy link
Contributor

commented Jan 17, 2019

CI started

@mbed-ci

This comment has been minimized.

Copy link

commented Jan 17, 2019

Test run: FAILED

Summary: 1 of 1 test jobs failed
Build number : 1
Build artifacts

Failed test jobs:

  • jenkins-ci/mbed-os-ci_unittests

@cmonr cmonr added needs: work and removed needs: CI labels Jan 17, 2019

Add atomic loads and stores and barriers
Add atomic load and store functions, and add barriers to the existing atomic
functions.

File currently has no explicit barriers - we don't support SMP, so don't
need CPU barriers.

But we do need to worry about compiler barriers - particularly if link time
optimisation is activated so that the compiler can see inside these
functions. The assembler or intrinsics that access PRIMASK for
enter/exit critical act as barriers, but LDREX, STREX and simple
volatile pointer loads and stores do not.

@kjbracey-arm kjbracey-arm force-pushed the kjbracey-arm:atomic_load_store branch to 703e440 Jan 18, 2019

@0xc0170

This comment has been minimized.

Copy link
Member

commented Jan 18, 2019

CI restarted

@cmonr cmonr removed the needs: work label Jan 18, 2019

@mbed-ci

This comment has been minimized.

Copy link

commented Jan 18, 2019

Test run: SUCCESS

Summary: 11 of 11 test jobs passed
Build number : 2
Build artifacts

@cmonr cmonr added ready for merge and removed needs: CI labels Jan 19, 2019

@0xc0170 0xc0170 merged commit a23a850 into ARMmbed:master Jan 21, 2019

22 checks passed

continuous-integration/jenkins/pr-head This commit looks good
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
jenkins-ci/build-ARM Success
Details
jenkins-ci/build-GCC_ARM Success
Details
jenkins-ci/build-IAR Success
Details
jenkins-ci/cloud-client-test Success
Details
jenkins-ci/dynamic-memory-usage RTOS ROM(+0 bytes) RAM(+0 bytes)
Details
jenkins-ci/exporter Success
Details
jenkins-ci/greentea-test Success
Details
jenkins-ci/mbed2-build-ARM Success
Details
jenkins-ci/mbed2-build-GCC_ARM Success
Details
jenkins-ci/mbed2-build-IAR Success
Details
jenkins-ci/unittests Success
Details
travis-ci/astyle Local astyle testing has passed
Details
travis-ci/docs Local docs testing has passed
Details
travis-ci/doxy-spellcheck Local doxy-spellcheck testing has passed
Details
travis-ci/events Passed, runtime is 9056 cycles (-1435 cycles)
Details
travis-ci/gitattributestest Local gitattributestest testing has passed
Details
travis-ci/licence_check Local licence_check testing has passed
Details
travis-ci/littlefs Passed, code size is 8372B (+0.00%)
Details
travis-ci/psa-autogen Local psa-autogen testing has passed
Details
travis-ci/tools-py2.7 Local tools-py2.7 testing has passed
Details

@kjbracey-arm kjbracey-arm deleted the kjbracey-arm:atomic_load_store branch Jan 24, 2019

kjbracey-arm added a commit to kjbracey-arm/mbed-os that referenced this pull request Jan 29, 2019

Second barrier for core_util_atomic_flag_clear
As barriers were added in ARMmbed#9247, review comments pointed out that atomic
stores needed a second barrier, and they got one. But atomic_flag_clear
was missed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.