Consider not using _mm_mfence even when it is available #36
Comments
If my reading of the C++ standard ([intro.races]/9) is correct, a release operation on an atomic should make all prior memory accesses visible to other threads once they perform an acquire operation on that atomic that observes the released value. The standard doesn't distinguish between regular and non-temporal stores, but it can be argued that it doesn't have to, because both are "evaluations" in the standard's language. Put simply, non-temporal or not, the stores before the release fence should probably be flushed. I know Boost.Atomic is issuing … As for …
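To make the non-temporal-store concern concrete, here is a minimal sketch that assumes an x86-64 target with GCC or Clang. A producer fills a buffer with the SSE2 streaming-store intrinsic _mm_stream_si32 and then publishes it with a release store. Because streaming stores are weakly ordered, and a release store on x86 is just a plain mov, the sketch issues an explicit _mm_sfence() between them. That is the conventional way to get this guarantee without relying on whatever the fence implementation happens to emit. The names payload, ready, publish_stream, consume_sum and run_stream_demo are hypothetical and used only for illustration.

```cpp
#include <atomic>
#include <immintrin.h>  // _mm_stream_si32 (SSE2), _mm_sfence (SSE)
#include <thread>

// Hypothetical shared buffer and publication flag (not from Boost.Atomic).
alignas(64) int payload[16];
std::atomic<bool> ready{false};

// Producer: write the buffer with non-temporal stores, then publish it.
void publish_stream(int base) {
    for (int i = 0; i < 16; ++i)
        _mm_stream_si32(&payload[i], base + i);  // weakly ordered streaming store
    _mm_sfence();  // order the streaming stores before the flag store
    ready.store(true, std::memory_order_release);
}

// Consumer: wait for the flag with acquire, then read the published data.
int consume_sum() {
    while (!ready.load(std::memory_order_acquire))
        std::this_thread::yield();  // spin politely until the producer publishes
    int sum = 0;
    for (int i = 0; i < 16; ++i)
        sum += payload[i];
    return sum;
}

// Runs one producer and one consumer thread and returns what the consumer saw.
int run_stream_demo(int base) {
    ready.store(false, std::memory_order_relaxed);  // reset before threads start
    int result = 0;
    std::thread consumer([&result] { result = consume_sum(); });
    std::thread producer(publish_stream, base);
    producer.join();
    consumer.join();
    return result;
}
```

With base = 0 the consumer sums 0 + 1 + … + 15. Dropping the _mm_sfence() would leave the streaming stores unordered with respect to the flag store, which is the gap the comment above is worried about.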
OTOH, all compilers seem to generate …
The particular choice of an interlocked operation is a matter of micro-optimization, such as using as few registers as possible (including the result register), as few flags as possible, avoiding a stack variable, etc. Specifically, this one avoids a stack variable. It can be approximately simulated by … So it is implementable only on x86 (32- and 64-bit), and it exists as an intrinsic only on 64-bit x86.
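A minimal sketch of the two kinds of full barrier being compared, assuming GCC or Clang with AT&T-syntax extended inline assembly on an x86 target. The locked variant ORs zero into the word already at the top of the stack, so it leaves memory unchanged and needs no separate stack variable. MSVC's __faststorefence on x64 is reportedly built on the same trick. The function names are hypothetical and are not Boost.Atomic's internal API.

```cpp
// Hypothetical helpers (not Boost.Atomic's API); GCC/Clang extended asm, x86 only.

// Full barrier via MFENCE, available on SSE2-capable CPUs.
inline void fence_via_mfence() {
    __asm__ __volatile__("mfence" ::: "memory");
}

// Full barrier via a dummy locked read-modify-write: OR zero into the word at
// the top of the stack. The stored value is unchanged and no extra variable is
// needed; similar in spirit to MSVC's __faststorefence on x64.
inline void fence_via_locked_or() {
#if defined(__x86_64__)
    __asm__ __volatile__("lock; orl $0, (%%rsp)" ::: "memory", "cc");
#elif defined(__i386__)
    __asm__ __volatile__("lock; orl $0, (%%esp)" ::: "memory", "cc");
#else
#error "fence_via_locked_or requires an x86 target"
#endif
}
```

Both functions are also compiler barriers because of the "memory" clobber. The issue's claim is that the locked variant is the cheaper of the two; the relative cost depends on the microarchitecture.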
No, I don't want that. The standard does not define non-temporal stores, and they are only ever reached through intrinsics, so I don't think they should be covered.
Relevant SO thread: https://stackoverflow.com/a/61382843/2945027
I wouldn't mind if you prefer keeping …
Given that other implementations don't issue …
The weirdest thing about emitting …

    void inc_seq_cst(int* v)
    {
        __atomic_fetch_add(v, 1, __ATOMIC_SEQ_CST);
    }

is no different from the others. Similarly, … Maybe compiler developers hope that …
Thanks, especially for the link against … Then probably the best fence one can do is …
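For context on what any such fence has to achieve, here is a minimal store-buffering (Dekker-style) litmus test using only standard C++. With a memory_order_seq_cst fence between each thread's store and its load, the C++ memory model forbids the outcome r1 == 0 && r2 == 0. On x86 this is exactly the store-load reordering that only a full barrier (mfence or a locked instruction) prevents. The function and struct names are illustrative. A run that never sees the forbidden outcome is consistent with a correct fence but does not prove one.

```cpp
#include <atomic>
#include <thread>

// Result of one store-buffering run.
struct sb_result { int r1; int r2; };

// Each thread stores to its own flag, executes a seq_cst fence, then reads the
// other flag. The fences forbid both threads from reading 0.
sb_result run_store_buffering_once() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;
    std::thread t1([&] {
        x.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_relaxed);
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_relaxed);
    });
    t1.join();
    t2.join();
    return {r1, r2};
}

// Counts how often the forbidden outcome (0, 0) appears; it should stay at 0.
int count_forbidden_outcomes(int iterations) {
    int forbidden = 0;
    for (int i = 0; i < iterations; ++i) {
        sb_result r = run_store_buffering_once();
        if (r.r1 == 0 && r.r2 == 0)
            ++forbidden;
    }
    return forbidden;
}
```

Replacing the fences with nothing (or with release/acquire fences) would make the (0, 0) outcome legal, which is why x86 code generation cannot use a plain mov here.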
Thanks, …
Consider either making BOOST_ATOMIC_NO_MFENCE override the detection of SSE2 or x64, or even always defaulting to not using mfence. The reason is that mfence is slower than a dummy interlocked operation. On x64 there's even an intrinsic for that; see __faststorefence.
There might be a reason to use precisely mfence: an interlocked operation does not fence non-temporal stores. But this is out of scope for implementing the C++ memory model on x86 in the usual way.
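A hypothetical sketch of the selection logic this issue proposes, in which a user-defined BOOST_ATOMIC_NO_MFENCE takes precedence over SSE2/x64 detection. Only the macro name comes from the issue; the enum and function are illustrative and are not Boost.Atomic's actual configuration code. The sketch assumes an x86 target.

```cpp
// Hypothetical selection logic; only BOOST_ATOMIC_NO_MFENCE comes from the issue.
enum class full_fence_kind { mfence, locked_rmw };

constexpr full_fence_kind selected_full_fence() {
#if defined(BOOST_ATOMIC_NO_MFENCE)
    // Proposed behavior: the user override wins over SSE2 / x64 detection.
    return full_fence_kind::locked_rmw;
#elif defined(__SSE2__) || defined(__x86_64__) || defined(_M_X64) || \
      (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    // SSE2 is guaranteed or detected, so MFENCE is available.
    return full_fence_kind::mfence;
#else
    // No SSE2: fall back to a dummy interlocked operation.
    return full_fence_kind::locked_rmw;
#endif
}
```

The issue's second option, always defaulting to not using mfence, would amount to returning full_fence_kind::locked_rmw unconditionally on x86.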