Faster mutex implementation (optionally reentrant) #8295
Conversation
Weird, we usually want the exact opposite to avoid contention: waking all resources will lead them to fight again to get the lock... when n - 1 of them will just go back to sleep.
@ysbaddaden this is not waking all resources. It just wakes one of the fibers, and that fiber will have to compete with those still not sleeping, if they exist. If they don't, it will acquire the lock right away. Otherwise, if it loses, it will spin a little bit one more time and hopefully get the lock before going back to sleep. It might sound weird for that fiber, but the global throughput is improved because it reduces the time the mutex is not owned by any non-sleeping fiber.
Ah, thanks for the clarification. It makes sense now. Yes, spinning is always a good idea :)
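For illustration, here is a rough, self-contained sketch of the strategy discussed above (not the actual PR code; all names are made up, and the real implementation parks the fiber in the mutex's wait queue rather than calling `Fiber.yield`):

```crystal
# Try to grab the lock with an atomic swap, spin briefly on contention,
# and only then give up the CPU.
class SpinThenSleepLock
  SPIN = 100 # arbitrary spin budget for this sketch

  def initialize
    @held = Atomic(Int32).new(0)
  end

  def lock : Nil
    loop do
      SPIN.times do
        # swap returns the previous value; 0 means we just acquired the lock.
        return if @held.swap(1) == 0
      end
      # Still contended after spinning: yield here as a stand-in for
      # enqueueing the fiber and sleeping until an unlock wakes it.
      Fiber.yield
    end
  end

  def unlock : Nil
    @held.set(0)
  end
end
```

Under light contention the swap in the spin loop usually succeeds before the fiber ever sleeps, which is where the throughput win described above comes from.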
Merged commit 048444e into crystal-lang:master
GavinRay97 commented Oct 10, 2019
Microseconds to nanoseconds?!
I really don't think having non-reentrant (non-recursive) be the default is a good idea. What happens if you accidentally recurse or unlock an unlocked lock? A deadlock? Corruption? The only acceptable state is an exception, or having a recursive lock. There may be an unsafe option to make the lock much faster at the cost of having to be extra careful with your critical sections, but safety must be the default, not speed. The stdlib is nowhere near the only consumer of `Mutex`.
I agree. Crystal should always put safety before performance.
If you are using mutexes, it's because you're dealing with "unsafe" code. I mean, there is some shared state that must be protected with a mutex in order to deal with potential race conditions. In most cases a non-reentrant mutex is all you need. A non-reentrant mutex is not owned by any fiber, and that means it can be unlocked by a different fiber. That is actually a feature of the mutex: you can lock, pass a message to another fiber, assume the ownership there, and unlock later. If you accidentally try to recursively lock a mutex, it will deadlock and the program will fail, much faster and more deterministically than the failures that can occur from not locking at all. In other words, we're not adding safety just by making the mutex reentrant; a non-reentrant mutex is much faster and all you need in most cases. That was the rationale behind making the non-reentrant option the default.
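As an illustration of that handoff pattern, here is a hypothetical usage sketch (it assumes a mutex that does not validate which fiber performs the unlock):

```crystal
mutex = Mutex.new
work  = Channel(Int32).new
done  = Channel(Nil).new

spawn do
  value = work.receive
  # ... update the state guarded by the mutex using `value` ...
  mutex.unlock    # released by a different fiber than the one that locked it
  done.send(nil)
end

mutex.lock        # acquire in the main fiber
work.send(123)    # hand over the work (and, implicitly, the lock ownership)
done.receive
```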
Not "unsafe" in terms of "can generate segfaults". Mutexes will be commonly used in shards code. Misuse of a mutex in a shard should be debuggable. The minimal requirement for that is an exception instead of a deadlock. I'm fine with them not being reentrant, but they must fail gracefully. |
The story of having better diagnostics/debugging tools is independent of the API the mutex should have. Using a non-reentrant mutex as a reentrant one will block but not segfault, so the behavior is on the safe/sound side. The performance difference and the more common requirement for non-reentrant seem like enough reason to stay with the optionally reentrant API.
I think this will mostly hit users in a silent way when doing stuff like:

```crystal
# called from multiple functions
def helper_function
  @mutex.synchronize do
    # some data
  end
end

def main_function
  @mutex.synchronize do
    do_something
    helper_function # Oops, deadlock, not easy to debug
  end
end
```
Would it cost too much performance to require an explicit lock owner, akin to Java's?

```crystal
def lock(owner)
  raise "Already locked" if @current_owner == owner.object_id
  real_lock
  @current_owner = owner.object_id
end
```

Or actually a CAS operation for that. Then Ary's example, using the current fiber as the owner, would raise instead of silently deadlocking.
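A minimal sketch of that CAS-based idea (illustrative only; `OwnedMutex`, its wrapping of a plain `Mutex`, and the error messages are made up for the example):

```crystal
class OwnedMutex
  def initialize
    @mutex = Mutex.new
    @owner = Atomic(UInt64).new(0_u64) # 0 means "not locked"
  end

  def lock(owner = Fiber.current) : Nil
    id = owner.object_id
    # Re-locking by the same owner would deadlock; fail loudly instead.
    raise "Already locked by #{owner}" if @owner.get == id
    @mutex.lock
    @owner.set(id)
  end

  def unlock(owner = Fiber.current) : Nil
    # Only the recorded owner may release: CAS the owner id back to 0.
    _, ok = @owner.compare_and_set(owner.object_id, 0_u64)
    raise "Unlocked by a fiber that does not hold the lock" unless ok
    @mutex.unlock
  end
end
```

Defaulting the owner to `Fiber.current` would turn Ary's example into an exception with a backtrace instead of a hang, while still allowing an explicit owner to be passed for handoff scenarios.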
But if you are using a lock that is already held, you want to wait for it to be free, not fail immediately. Maybe a reasonable timeout in debug mode could help, but I am not sure I even like that alternative. Or do you mean that only for the non-reentrant mutex, to signal "hey, you might need a reentrant mutex"?
Yes, the reentrant one wouldn't raise in that case, or even do anything with the owner I guess (other than keeping it for debug purposes maybe).
It's not about debugging tools, it's about being able to debug in the first place. Users will not appreciate having to dig out gdb on a production server to debug their deadlocks. This is especially annoying because hung-process detection and restarting is far less common and trickier to set up than auto-restart on crash. They'd much prefer the process to just crash and hand them a backtrace to investigate. If you can have a lock owner, you can have the default lock owner be the current fiber object, which already provides the desired semantics. Is a benchmark of non-reentrant vs. reentrant available?
I still don't understand why we have the deadlocking but more performant behavior as the default, instead of a safer but slower one. One can always optimize a bottleneck once it's found. Having to deal with deadlocks or crashes upfront is not fun. But we'll see with feedback.
Still:
With reentrant you must access Fiber.current, thus access a thread local, then make more writes to memory when locking and unlocking. For the tight usages we make of Mutex, it's necessarily faster not to have all these writes. I had two mutexes in my MT experiment: one private (no validation) and one public (deadlock detection, optionally reentrant) wrapping the private one. Needless to say, the private one was much more performant (for tight usages, such as in Channel, ...).

I wish there was a third option, on by default: detect and raise on deadlock or on the wrong fiber unlocking the mutex. Then you know there is an issue and have a backtrace to understand it, or you can use a reentrant mutex if that's acceptable (though they're a smell for me: they bypass a bug, and can still lead to deadlock when using lock/unlock), instead of a hanging process.

I agree with Ary: validation shouldn't be opt-in. You can disable all checks if there is a performance issue (or you're a boss), though unless in tight benchmarks/loops it may not be that significant (except for MT synchronization primitives).
This is what I have been asking for all along... Have I been unclear? The users of mutexes could opt out of this checking.
lribeiro commented Nov 23, 2019
That would be my preference as well, safe by default, fast by need.
Well, there's non-reentrancy, and skipping deadlock checks in unsafe code for speed. They're different things to me (feature vs safety).
Yes, and that's why I insisted above on having all 3 states:

- checked (default): raise on recursive lock and on unlock by a fiber that doesn't hold the lock
- reentrant: the same fiber may lock again
- unchecked: no validation, for performance-critical cases

I.e. similar to the pthread options but checked by default. It prevents deadlocks by default and reports where a deadlock/invalid unlock occurred, without having to reproduce it in GDB. It keeps some flexibility with reentrant, and it won't impact performance when checks are disabled (we already check for reentrant).
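For illustration, one possible shape for such a three-state API (a sketch only; the `Protection` enum, `ProtectedMutex` class, and error messages are made up and not necessarily what the stdlib ends up with):

```crystal
enum Protection
  Checked   # default: raise on recursive lock and on unlock by a non-owner
  Reentrant # the same fiber may lock again (counted)
  Unchecked # no validation: fastest, caller is fully responsible
end

class ProtectedMutex
  def initialize(@protection : Protection = Protection::Checked)
    @mutex = Mutex.new
    @owner = Atomic(UInt64).new(0_u64)
    @count = 0
  end

  def lock : Nil
    id = Fiber.current.object_id
    if !@protection.unchecked? && @owner.get == id
      raise "Deadlock: mutex already locked by this fiber" if @protection.checked?
      @count += 1 # reentrant: just bump the nesting count
      return
    end
    @mutex.lock
    @owner.set(id)
    @count = 1
  end

  def unlock : Nil
    unless @protection.unchecked?
      id = Fiber.current.object_id
      raise "Mutex unlocked by a fiber that doesn't hold it" if @owner.get != id
      if @protection.reentrant? && (@count -= 1) > 0
        return # still nested; keep holding the lock
      end
      @owner.set(0_u64)
    end
    @mutex.unlock
  end
end
```

This roughly mirrors the pthread mutex types (error-checking, recursive, normal) mentioned above, with the error-checking flavor as the default.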
Does somebody want to open an issue about this so we don't forget it?
@jhass this is blocking the release so it certainly won't be forgotten
Is it blocking the release? I think we can release it like that for 0.32.0 and adjust in 0.32.1 or 0.33.0.
It's too easy to forget about a regression if it makes it into a release. |
waj commented Oct 8, 2019
I made some improvements in the `Mutex` implementation to optimize the time it takes to lock/unlock and hand off the ownership to waiting fibers.

The current implementation is quite slow (especially in MT mode) because it always tries to be extremely fair by passing the ownership to the first fiber in the waiting queue. But that fiber might be running in a thread that is already sleeping, and waking up threads is costly and slow. So now the fiber is still woken, but it doesn't automatically receive ownership of the lock. Instead it will compete again with other fibers that might not be sleeping yet. Also, the lock will spin for some time before sleeping, allowing small critical sections to be acquired without passing through the waiting queue at all.
So, the changes included in this PR are:

- `lock` will spin some time before sleeping
- `Mutex` is not reentrant by default. Reentrancy requires keeping track of the locking fiber. A non-reentrant mutex just stores an atomic bit, and it's much faster to "swap" the content than to do an atomic CAS operation (a rough sketch of this difference is included at the end of this description).
- `unlock` is faster if no fibers are waiting.

I was running a benchmark https://gist.github.com/waj/0c4b5835af088e8921fee4cc2c6006ed#file-bm_lock-cr (provided by @carlhoerberg, thanks Carl!) and these are the results:
Before changes:
After changes:
I was also running another test (https://gist.github.com/waj/0c4b5835af088e8921fee4cc2c6006ed#file-test-mutex-cr) to evaluate the correctness of the mutex:
Before changes:
After changes:
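As a footnote to the swap-vs-CAS point in the changes list above, a rough hypothetical sketch of why the non-reentrant fast path is cheaper (names made up; not the actual implementation):

```crystal
class FastPathComparison
  def initialize
    @state = Atomic(Int32).new(0)      # non-reentrant: just "locked or not"
    @owner = Atomic(UInt64).new(0_u64) # owner-tracking: must record who holds it
  end

  # Non-reentrant path: a single unconditional atomic swap.
  def try_lock_swap? : Bool
    @state.swap(1) == 0
  end

  # Owner-tracking path: read Fiber.current (a thread-local lookup),
  # then a compare-and-set that can fail and may need to be retried.
  def try_lock_cas? : Bool
    _, ok = @owner.compare_and_set(0_u64, Fiber.current.object_id)
    ok
  end
end
```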