# Theoretical data race in non-raw mode, due to incorrect atomic orderings (#14)
Ah, I see, the theoretical data race that I overlooked is that at the time of a buffer swap, hardware can do this:
I cannot think of a practical hardware implementation in which all of this can happen in the short time span between the moment where the reader's back-buffer index write has been committed and the moment where the reader's old back-buffer read is performed, which is most likely why you did not observe it on real hardware. But that behavior is indeed allowed by the memory model, so a Sufficiently Evil Optimizer (which is what Loom tries to model) would be within its rights to generate code that makes this data race much more likely to happen.

I guess I need to give up on the Acquire/Release optimization and use AcqRel in all operating modes, then...

Since this was the main motivation for putting the "raw" interface behind a feature flag, I could use this bugfix as an opportunity to question whether I should keep the feature flag at all. Getting rid of it would simplify the code, but would arguably be a semver-breaking change. However, I can make a semver-compatible version of this change by keeping a "raw" feature flag that does nothing for a while, and not renaming methods right away.
There is now a fix for this on
I'm not very familiar with how atomics map onto generated code (I've only really spent time studying high-level memory models, not CPU coherence and barriers), but do you think memory fences would be faster or slower than atomic swaps? (I don't know fences well, but Mintomic, an older atomics library, only offers fences, not read/write orderings.)
The swap must be atomic in any case, because otherwise you would have this race condition:
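To illustrate, here is a minimal sketch (the function name is mine, not the crate's actual code) of why splitting the swap into a separate load and store would race: two threads can both observe the same old index before either store lands.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical *broken* version of the back-buffer index swap, split into a
// separate load and store instead of a single atomic read-modify-write.
fn broken_swap(back_info: &AtomicUsize, new_index: usize) -> usize {
    // If two threads run this concurrently, both can execute this load
    // before either one reaches the store below...
    let old_index = back_info.load(Ordering::Acquire);
    back_info.store(new_index, Ordering::Release);
    // ...so both can return the same "old" index and believe they have
    // exclusive ownership of the same buffer.
    old_index
}
```

A single `AtomicUsize::swap` makes the load and the store one indivisible operation, so no interleaving can hand out the same index twice.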
However, as an atomic swap, it could be a Relaxed atomic operation surrounded by fences (you need to put a fence before a write for Release ordering, and a fence after a read for Acquire ordering). This should be slower than an AcqRel swap on any hardware architecture where Acquire and Release require actual synchronization (i.e. basically any modern architecture other than x86), especially if the CPU architecture has load-acquire and store-release instructions (see e.g. ARMv8), because...
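To make the comparison concrete, here is a hedged sketch of the two variants using `std::sync::atomic` (the function names are illustrative, not the crate's):

```rust
use std::sync::atomic::{fence, AtomicUsize, Ordering};

// Variant 1: a single AcqRel swap, as used in the fix.
fn swap_acqrel(back_info: &AtomicUsize, new_index: usize) -> usize {
    back_info.swap(new_index, Ordering::AcqRel)
}

// Variant 2: a Relaxed swap bracketed by standalone fences, as described
// above: a Release fence *before* the write half, an Acquire fence *after*
// the read half. Standalone fences are typically at least as expensive as
// the orderings they replace, and cannot be lowered to ARMv8's cheaper
// load-acquire/store-release instructions.
fn swap_fenced(back_info: &AtomicUsize, new_index: usize) -> usize {
    fence(Ordering::Release);
    let old_index = back_info.swap(new_index, Ordering::Relaxed);
    fence(Ordering::Acquire);
    old_index
}
```

Both variants return the previous back-buffer index; they differ only in how the ordering guarantees are expressed.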
By the way, I just tagged the bugfix, so this bug is now officially resolved ;)
While writing my own version of this crate (I independently came across the same concept of a triple buffer plus an atomic bitflag), I tested my code with Loom to check for atomic ordering bugs. (Unfortunately, Loom's version of `UnsafeCell` removes the APIs which return `&` or `&mut T`, and only supplies APIs which accept callbacks, so I had to cripple my API on Loom-enabled builds, and use the crippled API for Loom testing.)

If I either wrote `Input::publish()` to use `Release` ordering, or `Output::update()` to use `Acquire` ordering, then Loom would detect a data race when accessing the `UnsafeCell`s. After turning on Loom debugging and adding debug prints, I traced how the data race occurred, and translated my types/variables/methods to their triple-buffer equivalents.

## Trace
Output thread runs:

1. `Output::output_buffer()` returns `&mut` into `SharedState::buffers[2]`, `Output::read()` casts it to `&`, and it's subsequently read (a)
   - The `&mut` is UB if you're not using `AcqRel` on both the reader and writer... but this is moot, because you can get a data race even if `Output` only reads.
2. `Output::update()` swaps atomic `SharedState::back_info`: store 2 with `Relaxed` ordering, load 1 with `Acquire` ordering (b)

Input thread runs:

3. `Input::publish()` swaps atomic `SharedState::back_info`: store 0 with `Release` ordering, load 2 with `Relaxed` ordering (c)
4. `Input::input_buffer()` returns `&mut` into `buffers[2]`, and `Input::write()` writes into it (d)

In Rust, for each atomic memory location, all threads see all reads/writes in the same order (because C++ and Rust only expose LLVM's Monotonic and stronger levels). But the order may not be consistent between different memory locations. As I understand it, thread 1 reads memory location A before writing to B, and thread 2 reads the new value of B before writing to A, but thread 2's write to A could happen before thread 1's read of A (due to hardware reordering or caching or something).
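The situation described above is essentially the classic "store buffering" litmus test. A hedged sketch with two unrelated atomics (`A` and `B` are illustrative, not triple-buffer state):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

static A: AtomicUsize = AtomicUsize::new(0);
static B: AtomicUsize = AtomicUsize::new(0);

// Thread 1: write A, then read B.
fn thread1() -> usize {
    A.store(1, Ordering::Release);
    B.load(Ordering::Acquire)
}

// Thread 2: write B, then read A.
fn thread2() -> usize {
    B.store(1, Ordering::Release);
    A.load(Ordering::Acquire)
}

// Even with Release stores and Acquire loads, the concurrent outcome
// thread1() == 0 && thread2() == 0 is allowed: an Acquire load only
// synchronizes-with the Release store whose value it actually reads,
// and here each load may read the old 0 from before the other store.
```

When run sequentially the functions behave as expected; only a truly concurrent execution exposes the weakly ordered outcome.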
Explained differently, (a) and (d) form a data race (in spite of (b) and (c)), because the output's `Relaxed` store (b) does not synchronize-with the input's `Relaxed` load (c), so it does not force (a) to happen-before (d) (terminology from Preshing on Programming).
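In `std::sync::atomic` terms, a swap takes a single `Ordering` that governs both its load half and its store half, which is exactly where the buggy pairing comes from. A sketch (variable names and values are illustrative):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

fn demo(back_info: &AtomicUsize) {
    // Buggy reader side (b): an Acquire swap has acquire semantics on its
    // load half, but only relaxed semantics on its store half.
    let _old = back_info.swap(2, Ordering::Acquire);

    // Buggy writer side (c): a Release swap has release semantics on its
    // store half, but only relaxed semantics on its load half. The writer's
    // relaxed load does not synchronize-with the reader's relaxed store,
    // so (a) is not forced to happen-before (d).
    let _old = back_info.swap(0, Ordering::Release);

    // The fix: AcqRel gives the swap both acquire and release semantics,
    // so each side's swap synchronizes-with the other's.
    let _old = back_info.swap(1, Ordering::AcqRel);
}
```

This matches the Rust documentation's note that using `Acquire` on a read-modify-write operation makes its store part `Relaxed`, and `Release` makes its load part `Relaxed`.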
## Does it happen in practice? (no)
Looking at Wikipedia's article on memory ordering, on ARM devices (a) can be reordered with (b), and (c) can be reordered with (d). I ran triple-buffer's unit tests on ARM32 and ARM64 Android devices (using Termux's `rust` package), but they didn't detect any reorderings in practice. The `contended_concurrent_read_write` test (append `--exact tests::contended_concurrent_read_write` to the command line) operates on a `RaceCell`, which points to both a stack and a heap element. However, it never found any inconsistencies.

(a) and (b) happen in different function calls, and so do (c) and (d), which probably limits the ability of compilers to reorder them. I'm not sure if hardware can reorder them, since I haven't learned about atomic operations at the assembly/hardware/cache level.
## Solution
I switched to `Ordering::AcqRel` for both atomic accesses, to silence Loom (which also fixed the underlying bug). I don't know how much performance you lose by doing that regardless of "raw" mode, though.
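For reference, a minimal sketch of the fixed swap (the type and field names mirror the issue's description; the crate's actual internals may differ):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

struct SharedState {
    back_info: AtomicUsize,
}

impl SharedState {
    // Both Input::publish() and Output::update() now swap the back-buffer
    // index with AcqRel, so each swap's store half is a release operation
    // and its load half is an acquire operation. The acquire half of one
    // side synchronizes-with the release half of the other, establishing
    // the happens-before edges that were missing with the split
    // Release/Acquire pairing.
    fn swap_back_info(&self, new_index: usize) -> usize {
        self.back_info.swap(new_index, Ordering::AcqRel)
    }
}
```

With this, the buffer accesses before one side's swap are guaranteed to happen-before the other side's accesses after its swap, which is exactly the property Loom was checking for.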