ArcCell not lock-free #160

jethrogb commented Dec 11, 2017

The current implementation of `ArcCell` is not lock-free. It uses a spin lock. I very much want to use this functionality, but I don't think this implementation should be included in `crossbeam` right now (and frankly, the fact that it's included right now is quite misleading).
Comments
You're right. We should probably implement a lock-free `ArcCell`. Do you perhaps have an example of how you want to use `ArcCell`?
jethrogb commented Dec 11, 2017
I've needed something like this over and over again to store program-global configuration. Alternatively, I think you can implement this fairly easily using 128-bit atomics on platforms that support them, but I don't think those are available on stable Rust.
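To make the 128-bit-atomics idea concrete: the pointer and a version word are packed into a single value so both can be swapped with one compare-exchange. A minimal sketch, assuming the `portable-atomic` crate for a stable `AtomicU128`; the type name is invented, and the counting protocol a `get` would need to safely clone the `Arc` is omitted entirely:

```rust
// Sketch only: packs an Arc's raw pointer together with a version tag into
// one AtomicU128 so that `set` is a single 128-bit CAS (the tag avoids ABA).
// A real lock-free ArcCell also needs a read-side protocol, and this sketch
// leaks the stored Arc on drop (no Drop impl).
use std::marker::PhantomData;
use std::sync::Arc;

use portable_atomic::{AtomicU128, Ordering};

struct TaggedArcCell<T> {
    // Low 64 bits: Arc<T> raw pointer; high 64 bits: version tag.
    state: AtomicU128,
    _marker: PhantomData<Arc<T>>,
}

impl<T> TaggedArcCell<T> {
    fn new(value: Arc<T>) -> Self {
        let ptr = Arc::into_raw(value) as u64 as u128;
        TaggedArcCell { state: AtomicU128::new(ptr), _marker: PhantomData }
    }

    /// Atomically replace the stored Arc, returning the previous one.
    fn set(&self, value: Arc<T>) -> Arc<T> {
        let new_ptr = Arc::into_raw(value) as u64 as u128;
        let mut cur = self.state.load(Ordering::Acquire);
        loop {
            // Bump the version tag and install the new pointer in one CAS.
            let new = ((cur >> 64).wrapping_add(1) << 64) | new_ptr;
            match self
                .state
                .compare_exchange_weak(cur, new, Ordering::AcqRel, Ordering::Acquire)
            {
                Ok(old) => return unsafe { Arc::from_raw(old as u64 as *const T) },
                Err(actual) => cur = actual,
            }
        }
    }
}
```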
---
I've attempted a partially wait-free version of `ArcCell`. That said, crossbeam is currently undergoing a major reorganisation, and new types with similar functionality might (and will, if people need them) appear in the future.
---
@Vtec234 I'm slightly worried that wait-free reads in that design come at a price for writers.

@jethrogb Actually, I believe a primitive tentatively called `HazardCell` (based on hazard pointers) would cover your use case.
---
@stjepang That should work, yes. My implementation is essentially a very simple rw-lock, and I'm sure much better implementations are possible (maybe based on HP, like you said).
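For reference, the rw-lock shape being described is roughly the following (my illustration with std's `RwLock`, not the actual code): readers clone the `Arc` under a shared lock, so they don't block each other, but the structure is still not lock-free.

```rust
use std::sync::{Arc, RwLock};

// Rough shape of an rw-lock-based cell (illustration only).
struct LockedArcCell<T> {
    inner: RwLock<Arc<T>>,
}

impl<T> LockedArcCell<T> {
    fn new(value: Arc<T>) -> Self {
        LockedArcCell { inner: RwLock::new(value) }
    }

    /// Readers take the shared lock just long enough to clone the Arc.
    fn get(&self) -> Arc<T> {
        self.inner.read().unwrap().clone()
    }

    /// Writers swap in a new Arc under the exclusive lock,
    /// returning the old one.
    fn set(&self, value: Arc<T>) -> Arc<T> {
        std::mem::replace(&mut *self.inner.write().unwrap(), value)
    }
}
```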
vorner commented Apr 1, 2018
Hello, I played with this a bit and I have an implementation that is lock-free (wait-free in the case of the readers), and it shouldn't be possible to starve the writers. It also works with the standard `Arc`.

So, if you'd like to accept it, I can create a pull request and place it somewhere (and rename it to ArcCell, to keep consistency with the current implementation). What do you think about it?
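If I'm not mistaken, this design is what vorner went on to publish as the arc-swap crate. For illustration, a short usage sketch against arc-swap's published API, applied to the global-configuration use case mentioned earlier (the code discussed in this comment may have differed in details):

```rust
use std::sync::Arc;

use arc_swap::ArcSwap;

struct Config {
    threads: usize,
}

fn main() {
    // The cell holds a plain std::sync::Arc<Config>.
    let current = ArcSwap::from_pointee(Config { threads: 4 });

    // Reader hot path: a cheap load that doesn't block writers.
    assert_eq!(current.load().threads, 4);

    // Writer: atomically replace the whole configuration.
    current.store(Arc::new(Config { threads: 8 }));
    assert_eq!(current.load().threads, 8);
}
```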
---
@vorner Hey, that looks neat! Designing an atomic shared pointer (like an `ArcCell`) that stays fast in all cases is surprisingly hard.

I think the biggest drawback of this kind of approach is contention among concurrent loads. Consider the fact that loading from the cell updates a shared reference counter, so all reading threads fight over the same cache line, and performance degrades as more threads load at once.

The solution is to not update any shared counters when loading from the cell. That is what hazard pointers buy us: each reader writes only to its own slot.
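The hazard-pointer load works roughly like this (a minimal sketch with invented names, not the crossbeam prototype): the reader publishes the pointer it is about to use in its own slot, then re-checks the cell; only if the pointer is still current is it safe to use, because writers are obliged to scan published slots before freeing anything.

```rust
use std::sync::atomic::{AtomicPtr, Ordering};

// Minimal hazard-pointer read protocol (illustrative, not a real API).
// `slot` is this thread's hazard slot; a writer that swaps the cell must
// scan all slots and defer freeing any pointer it finds published there.
fn protected_load<T>(cell: &AtomicPtr<T>, slot: &AtomicPtr<T>) -> *mut T {
    loop {
        let ptr = cell.load(Ordering::Acquire);
        // Announce intent to use `ptr`. No shared counter is touched.
        slot.store(ptr, Ordering::SeqCst);
        // Re-check: if the cell still holds `ptr`, the announcement happened
        // before any writer could have retired it, so `ptr` is protected.
        if cell.load(Ordering::SeqCst) == ptr {
            return ptr;
        }
        // A writer swapped the value in between; retry. (This is the reader
        // loop vorner raises as a concern in the next comment.)
    }
}
```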
vorner commented Apr 7, 2018
Reading about the hazard-pointer approach, it does look interesting. But I'm a bit nervous about two things: the fact that an update may force a reader to loop (the slow-path operation hurting the hot-path one, possibly leaving a reader stuck for a long time, or forever, under a steady inflow of changes), and the fact that the chains of registries and thread entries can grow long and, as far as I can tell, never shrink back in the code. Anyway, it is quite possible this will be faster than my approach.

Is there any estimate on when your approach could be ready? I ask because I want to know whether it makes sense for me to clean mine up, add all the license files, and publish it, or (in case hazard-cell will be ready soon in crossbeam) whether it's not worth it.
---
If updates are so frequent that they manage to force reads to spin in a loop, then updates kind of are the hot-path. :) But I see what you're saying, it's a fair concern that readers aren't wait-free.
Yes, the current prototype isn't tackling this problem. It's in the plan to shrink back the linked lists.
There are other things on my schedule at the moment, but I'd say in a few weeks.
If you have a use case for it right now, I'd say it's worth cleaning up and publishing anyway.
acsearle commented Sep 12, 2018
I'll throw my hat in the ring too. I have a reasonably complete AtomicArc now, using differential reference counting, inspired by Folly::AtomicSharedPtr. I suspect it is close to what @stjepang was envisioning in the second comment. It has the same problem of reads degrading under contention, has to use its own reimplementation of Arc, and will only work on x86_64 and (hopefully) AArch64, using their spare pointer bits. On the brighter side, it also provides AtomicWeak, has no thread_local or global structures, and no requirement for the user to call anything outside the std::sync::atomic interface.

https://github.com/acsearle/barc

Might be an interesting comparison point if nothing else.
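The spare-pointer-bits trick this relies on can be illustrated as follows (my sketch; barc's actual CountedPtr will differ): on x86_64, user-space pointers leave the upper 16 bits unused, so a small "local" count can travel with the pointer and be updated by a single 64-bit CAS.

```rust
// Illustration of packing a local count into the unused upper bits of an
// x86_64 user-space pointer (not barc's actual implementation).
const COUNT_BITS: u32 = 16;
const PTR_MASK: u64 = (1u64 << (64 - COUNT_BITS)) - 1;

#[derive(Clone, Copy, PartialEq, Eq)]
struct CountedPtr(u64);

impl CountedPtr {
    fn new(ptr: *const u8, count: u16) -> Self {
        debug_assert_eq!(ptr as u64 & !PTR_MASK, 0, "pointer uses reserved bits");
        CountedPtr(((count as u64) << (64 - COUNT_BITS)) | ptr as u64)
    }

    fn ptr(self) -> *const u8 {
        (self.0 & PTR_MASK) as *const u8
    }

    /// The "local" count carried in the spare bits; when it runs out, the
    /// scheme must replenish from the real reference count (the slow path).
    fn count(self) -> u16 {
        (self.0 >> (64 - COUNT_BITS)) as u16
    }
}
```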
---
@acsearle Thanks for sharing -- that looks really good, I'm impressed!

So I've just whipped up a simple set of benchmarks to compare the performance of the load operation with `barc`, my hazard-pointer prototype, and a `parking_lot` mutex around an `Arc`.
I wonder what you think about using spare pointer bits in light of near-future migrations to 56-bit address spaces on x86-64 (and maybe even 64-bit). That seems like an unfortunate portability nightmare. :( There was an interesting discussion about this topic in the LuaJIT repo. |
jethrogb commented Sep 13, 2018
@acsearle Have you considered an implementation using double-word atomics? You should be able to test with 64-bit atomics on 32-bit platforms while we wait for rust-lang/rust#53514 to land.
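As a concrete version of that testing idea (a sketch of mine, not code from any of the implementations above): on a 32-bit target, a pointer and a 32-bit count fit into one stable `AtomicU64`, so the double-word algorithm can be exercised today.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// On a 32-bit target, (pointer, count) fits in one AtomicU64, so the
// double-word algorithm can be tested with stable atomics. Illustrative only.
struct PackedCell(AtomicU64);

impl PackedCell {
    fn pack(ptr: u32, count: u32) -> u64 {
        ((count as u64) << 32) | ptr as u64
    }

    fn unpack(word: u64) -> (u32, u32) {
        (word as u32, (word >> 32) as u32)
    }

    /// Swap the pointer and bump the count in a single CAS,
    /// returning the previous (pointer, count) pair.
    fn swap_ptr(&self, new_ptr: u32) -> (u32, u32) {
        let mut cur = self.0.load(Ordering::Acquire);
        loop {
            let (_, count) = Self::unpack(cur);
            let next = Self::pack(new_ptr, count.wrapping_add(1));
            match self
                .0
                .compare_exchange_weak(cur, next, Ordering::AcqRel, Ordering::Acquire)
            {
                Ok(old) => return Self::unpack(old),
                Err(actual) => cur = actual,
            }
        }
    }
}
```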
acsearle commented Sep 13, 2018
Wow, the performance of parking_lot Mutex is phenomenal! Is it enough to just .lock() the mutex, or would the better comparison be with .lock().clone(), so we get and destroy a usable object per loop, as returned by AtomicWeightedArc::load() and (I think) hazard's .get()?

The disheartening result on contested load is consistent with what I saw in my own jury-rigged benchmarks: inherently slow, since each load has to win a compare_exchange.

I wasn't aware that losing those bits was imminent (there's a relatively recent talk by Chandler Carruth somewhere where he talks about how LLVM uses them). That's unfortunate. Re other architectures, I have thought about it a little. For both of these questions, it should be fairly easy to experiment, since it requires only a platform-specific reimplementation of CountedPtr and CountedNonNull. Another option is using alignment bits, perhaps over-aligning the ArcInner allocation to get more of them (if the Rust allocator supports this yet). But yes, as we lose bits, we have to replenish more frequently, and the number of threads that can stay lock-free degrades until we effectively become a spinlock.

Anyway, my concern (having learned a lot from implementing it) is that this style of implementation, independent of whether the count bits are in the pointer or a second word, can never be competitive on contested loads with implementations that manage to use an atomic read for that purpose, and so is a bit of a dead end :(
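To make the two benchmark variants in that question concrete (a sketch assuming parking_lot; the bodies stand in for the actual benchmark loops):

```rust
use std::sync::Arc;

use parking_lot::Mutex;

// Variant 1: only lock/unlock; no Arc escapes the critical section.
fn lock_only(m: &Mutex<Arc<u64>>) -> u64 {
    **m.lock()
}

// Variant 2: clone the Arc under the lock and use it outside, matching what
// AtomicWeightedArc::load() (and hazard's get()) hand back per iteration.
fn lock_and_clone(m: &Mutex<Arc<u64>>) -> u64 {
    let arc: Arc<u64> = m.lock().clone();
    *arc // the clone is dropped at end of scope, like a real consumer's copy
}
```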