core.thread: Implement critical regions. #204
Conversation
 * until it is outside a critical region.
 *
 * This function is, in particular, meant to help maintain garbage collector
 * invariants when a lock is not used.
You really should put a very strong warning in here, like the one you put in your commit message:
WARNING
Using critical regions is extremely error-prone. For instance,
using a lock inside a critical region will most likely result in
an application deadlocking because the stop-the-world routine will
attempt to suspend and resume the thread forever, to no avail.
Done.
Did a lot of indentation change? There are a ton of diffs in this request.
I moved …
Anything else that needs fixing? I'd like to manage to get this and my cooperative suspension patch into the next release.
Folks, this has been sitting here for 20 days with no word on what's going to happen to it (and it has passed the auto tester since day one), and I have two other patches depending on it, and an ambitious goal of getting those in before 2.060 is released. Can we please try to speed this up? Reviews seem to happen reasonably quickly, but someone seriously needs to hit the merge button or give a reason why this shouldn't be pulled in, instead of leaving it here to rot.

I completely understand that people may not have the time to review and merge/reject all pull requests quickly, but I think that's a clear indicator that we don't have enough people with the rights to press that merge button to begin with. Delegation is key to the success of open source projects. If you look at other open source projects, changes are generally reviewed within one day and accepted (with any needed adjustments) within 2-3 days, while in the D community, the process seems to take eons simply because of the "final verdict" bottleneck.

(Mind you, I'm not trying to say I want aforementioned rights - I really don't care (in fact, I fully believe we shouldn't accept code from anyone without having at least one extra set of eyes look at it). I just wish there were more people to actually review and merge pull requests, since it is clear we don't have enough right now.)

I hope I didn't come across as demanding or whatever. I'm just trying to make the point that our process is way too slow. Consider that I've been in the D community for many months now and I'm still annoyed by the slow pull request process at this point. Now consider how a newcomer feels if their pull requests take just as long to be reviewed/merged/rejected. They won't be particularly encouraged to continue contributing. So, we need a long-term, proper solution to this. This isn't just me screaming "merge this one pull request already dammit".

...that turned out much more TL;DR than I intended it to be...
@alexrp: I can understand your frustration, as I have experienced similar situations, e.g. right now with dlang/phobos#555. In this specific case, the added feature is not exactly controversial, but helpful (also, I'd really like to make use of it in an upcoming article), and several people on IRC and GitHub already provided feedback (although I seem to have messed up the commit comments with rebasing). However, after a month it is still sitting there, without any comment on what is actually holding up the merge.

Please note that not receiving any feedback is probably the single most discouraging thing for most open source contributors, much more so than e.g. overly picky style guides, formal requirements, etc. Thus, I think it is really important that we figure out a way to efficiently handle pull requests, since the problem will (hopefully) just get worse over time.

And please also don't forget that there are people like Alex and me who'd actually love to help find a way to alleviate the situation.
It's a big change that I haven't had the time to consider carefully. If it helps, I tend to be very conservative about changes to core.thread. It's an easy module to break in subtle ways. More generally, I think it makes sense for runtime development to move a bit slowly since literally every D program relies on it.
Sean, I completely understand your hesitation to merge core.thread changes too quickly. Whenever I edit that file, I think I spend more time thinking about various ways I could end up breaking the threading infrastructure with my changes than I do on actually writing code... As they say, threading is something to be debugged with a brain, not a debugger.

But what I have implemented here is not exactly new, innovative, or dangerous: It's what almost every virtual machine (including Mono, where I originally found the concept) uses to make lock-free GC allocations possible.

Regarding slow development, I'm inclined to disagree. It is completely true that every D program depends on druntime, and any breakage (especially in core.thread) would be disastrous for a release. But at the same time, I don't think the chances of breakage are that high. A little comparison:
247 druntime commits in the last 6 months, versus 464 commits to Mono's virtual machine. Now consider that Mono's VM consists of a JIT compiler, IR optimization passes, profiler, interpreter, metadata (CIL) manipulation layer, garbage collection, threading infrastructure, ... The potential for breakage in Mono's runtime is insanely high compared to druntime because it's written in C of all languages and their VM has even more responsibilities than our runtime. And yet, they have more commits than us and have way more applications depending on their VM. Not to mention, the commercial products MonoTouch and Mono for Android completely depend on Mono. Also consider that they have to support way more architectures and operating systems than we do.

I regularly work with a Mono built straight from Git master and have only once in the last 6 months run into an actual VM problem. It was fixed almost instantaneously and never made it into a release. I think we can do just as well as they can.

I think our release cycle is in part to blame for the mentality I see here. If a release is made, and a critical bug is found halfway into the release cycle, no hot fix release is made. Instead, people have to wait for the fix in the next release, which might be more than a month off. This model doesn't work. Release early, release often.

Not to mention, the slow development, regardless of our release cycle, does not scale. It will fall on its knees (if it hasn't already) as more people come to the D community, and they are going to be just as frustrated as I am.

Walter has just proposed another DMD beta, which most likely means these changes won't make it in for 2.060 as I had hoped. If this is the case, then I will have no choice but to fork druntime for my virtual machine work. I can't wait another entire release cycle. This is a sad state of affairs, but I have code that needs to be written, and I can't wait for this slow bureaucracy any longer.

(The above paragraph is not to say that I won't send my patches upstream. I will. But I just don't have the time to wait this long for patches to be reviewed/merged.)
Rebased the pull request on …
Looks green again. The only failure is due to that odd …
I find it hard to get behind this. Programming with "all interrupts disabled" is a technique that seems to have died somewhere in the 1980s. I can't find on the net a generally-agreed upon notion of "critical region" that is in accordance with this one, and the only one I found is in .NET and has a different, unrelated meaning. http://stackoverflow.com/questions/747551/difference-between-critical-section-critical-region-and-constrained-execut

Are you sure we really need this?
100% sure. As pointed out in the comments in the code, this is absolutely not something you should ever be using in normal code. If someone would use this as a synchronization mechanism, they'd be doing it completely wrong (and the comments do point this out).

Critical regions are a low-level mechanism tightly coupled with the garbage collector's stop-the-world machinery. Implementing lock-free allocation in a garbage-collected environment is close to impossible (if not impossible; I don't know of any other way) without this mechanism. You need a way to tell the GC "okay, I'm going to be doing something here that would normally require that I acquire the GC lock, but I need a tiny invariant to always hold, so don't interrupt me for this tiny interval". This tiny interval is usually ridiculously short (think a single store instruction in machine code). Also, note that critical regions are implemented as nothing more than a simple atomic Boolean condition.

In short, you can think of critical regions as a GC-aware/supported lock-free programming mechanism, much like CAS operations in hardware. That's all they're made for and all they're supposed to be used for. And your average programmer is never going to need it.
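To make the "atomic Boolean condition" remark concrete, here is a minimal sketch of the idea, assuming invented names throughout; a single global flag stands in for what would really be one flag per thread (druntime keeps the equivalent state on the Thread object):

```d
import core.atomic;

// Purely illustrative: one global flag standing in for a per-thread
// "am I in a critical region" marker. All names here are made up.
shared bool inCriticalRegion;

void enterCriticalRegionSketch() nothrow
{
    atomicStore(inCriticalRegion, true);
}

void exitCriticalRegionSketch() nothrow
{
    atomicStore(inCriticalRegion, false);
}

// The stop-the-world code would read the flag with atomicLoad() after
// suspending the thread and resume/retry while it is still set.
```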
For reference, the term "critical region" comes from Mono: https://github.com/mono/mono/blob/master/mono/metadata/sgen-gc.h#L925

I will add references to the docs.
After a discussion with @andralex on IRC, we'll go with leaving the critical region functions undocumented (i.e. they don't show up in Ddoc) until we want to commit to the interface. I think this is reasonable enough, since we can give it a release or two to mature.
core.thread: Implement critical regions.
Could you please make up a convincing example where this is needed?
Consider this made up code:
(I know the code doesn't look much like what we have in druntime, …)

Explanation: This is an object allocation function intended for …

OK, so in the first block of code, we have the critical region …

Now, you might be wondering how an inconsistent view of an object …

In the code below the critical region, we have the normal locked …

You asked why we can't just use …

What I have described above is all very specific to D. If someone …

This is in general why I have been sending patches that expose …
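A rough sketch of the kind of allocation path being described, under stated assumptions: tlabPtr, tlabEnd and allocObject are invented names for a thread-local bump allocator, GC.malloc merely stands in for the locked slow path, and only the thread_enterCriticalRegion/thread_exitCriticalRegion calls correspond to the API this pull request adds:

```d
import core.memory : GC;
import core.thread : thread_enterCriticalRegion, thread_exitCriticalRegion;

// Invented thread-local bump-allocation state; nothing here is the
// original example from the discussion.
ubyte* tlabPtr;  // next free byte of this thread's allocation buffer
ubyte* tlabEnd;  // one past the end of the buffer

void* allocObject(size_t size)
{
    ubyte* p = tlabPtr;
    if (p + size <= tlabEnd)
    {
        // Fast path: no GC lock taken. The stores that publish the new
        // object must never be seen half-done by the collector, so they
        // sit inside a critical region; if the world stops now, this
        // thread is resumed until it has left the region.
        thread_enterCriticalRegion();
        scope (exit) thread_exitCriticalRegion();

        tlabPtr = p + size;
        // ... initialize the object header / type pointer here ...
        return p;
    }

    // Slow path: buffer exhausted; fall back to an ordinary locked
    // allocation (GC.malloc as a stand-in).
    return GC.malloc(size);
}
```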
A precise collector for D will have to cope with uninitialized memory and unions.
This is super expensive. Suspending/Resuming and waiting for a thread in a critical region requires 4 context switches.
Yeah, that's the risk with enable/disable, but serializing threads doesn't scale.
The GC wasn't designed to serve anyone but the language/compiler. Therefore it's not a perfect fit for your application, and I'm not sure to what extent we need to solve this within druntime. It would be interesting, though, to see your application (open source? link?).
Fair. Still, to implement something like a cardtable efficiently, you're going to need a critical region for when you mark the memory region of a bit-blitted value type. There are probably many other uses I haven't thought of yet. Keep in mind that critical regions are basically the inverse of GC safe points.
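As a purely hypothetical illustration of the card-table point: a write barrier might blit the value and dirty the matching cards inside one critical region, so the collector never observes the copy without its cards marked. Nothing like this exists in druntime; every name except the critical-region calls is invented.

```d
import core.stdc.string : memcpy;
import core.thread : thread_enterCriticalRegion, thread_exitCriticalRegion;

enum cardShift = 9;                 // one card per 512 bytes, say
__gshared ubyte[1 << 16] cardTable; // sized arbitrarily for the sketch

void blitWithBarrier(void* dst, const(void)* src, size_t len)
{
    // The copy and the card marking must look atomic to the collector,
    // so both steps share one critical region.
    thread_enterCriticalRegion();
    scope (exit) thread_exitCriticalRegion();

    memcpy(dst, src, len);

    const first = (cast(size_t) dst) >> cardShift;
    const last  = (cast(size_t) dst + len - 1) >> cardShift;
    foreach (card; first .. last + 1)
        cardTable[card % cardTable.length] = 1; // modulo only to stay in bounds here
}
```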
It's the cheapest approach that doesn't potentially let the process die from an OOME. What you have here is at the most basic level a choice between crashing or continuing (at a relatively small price in execution speed).
It isn't entirely correct to call it serial; the suspending and resuming is serial indeed (like the stop-the-world code always was), but threads that are in critical regions get to finish their work in parallel once resumed. It does mean a few extra loops on the rare occasion that a thread is suspended inside a critical region, but I would be very surprised if this showed up in profiling results at all. Besides, stop-the-world isn't scalable at all in the first place.

I understand that the suspend/resume loop for critical regions seems intimidating and evil, but as I explained, it will actually happen very rarely that a thread is suspended inside a critical region (and thus the suspend/resume loop will do no significant work). The only way it could happen so much that it would affect performance severely would be if someone abused the critical region API as a general synchronization mechanism (or maybe in laboratory cases that intentionally try to bring the suspend machinery to its knees, but I don't think this is very interesting to look at; a GC can be brought to its knees in many ways already, after all).

Consider: Mono has a total of 6 critical regions in the entire virtual machine, 4 of these being in lock-free allocation routines, 2 others being in the cardtable implementation. That's all. Out of all the code that the VM will execute, it's extremely unlikely for a thread to be suspended inside one of these. Granted, D is not virtualized, but the situation isn't really different for us.
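The suspend/resume loop being described might look roughly like this; suspend(), resume() and inCriticalRegion() are stand-ins for druntime's platform-specific internals and per-thread flag, not the real names:

```d
import core.thread : Thread;

// Placeholder hooks; in druntime these map onto the platform-specific
// suspend/resume primitives and the thread's critical-region flag.
void suspend(Thread t) { /* SuspendThread, a suspend signal, ... */ }
void resume(Thread t)  { /* ResumeThread, a resume signal, ... */ }
bool inCriticalRegion(Thread t) { return false; /* read the thread's flag */ }

void suspendForStopTheWorld(Thread t)
{
    for (;;)
    {
        suspend(t);               // stop the thread first
        if (!inCriticalRegion(t)) // only then inspect its flag
            return;               // safe to leave it suspended and scan it
        resume(t);                // in a critical region: let it keep running
        Thread.yield();           // back off briefly before retrying
    }
}
```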
I certainly wouldn't expect the GC to cater to every odd need out there. The reason I submitted this to druntime is that my virtual machine basically relies entirely on druntime's threading infrastructure. I could write my own, of course, but then I would have to fight druntime and its GC when it comes to the stop-the-world routine, thread state management, and so on. It wouldn't be pretty. Plus, it would be incredibly wasteful when druntime already has all of this infrastructure and abstraction in place.

One of the goals of the virtual machine is that it should be trivially embeddable in any D application. That means that it must be able to cooperate with druntime's GC. The project is here: https://github.com/lycus/mci

Note that there's no code written yet that uses critical regions. Since we're targeting a large number of systems and architectures, we have to stick to released versions of druntime, DMD, etc. Using unreleased or patched versions would introduce too much friction.

To see some actual code that uses critical regions, see:

(Search for ENTER_CRITICAL_REGION.)
(Not to be confused with critical sections.)
Critical regions are areas of code where a thread absolutely must
not be suspended. Entering and exiting a critical region is very
similar to acquiring and releasing a mutex, but critical regions
do not block. They simply ensure that when the world is stopped,
a thread will not be suspended inside critical code; i.e. it will
be continually suspended and resumed until it is no longer in a
critical region.
This feature is useful to maintain invariants in garbage collectors
where you would normally use a lock. Empirically, it only happens
very rarely that a thread is suspended inside a critical region, so
the suspend/resume loop is a cheaper approach than always locking,
since locking results in high contention.
The way this commit implements critical regions, they are useful
beyond garbage collection, however; they are not specifically
coupled to any GC code.
Using critical regions is extremely error-prone. For instance,
using a lock inside a critical region will most likely result in
an application deadlocking because the stop-the-world routine will
attempt to suspend and resume the thread forever, to no avail. This
means that you really shouldn't be messing with this feature unless
you *really* do know what you're doing.
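A minimal usage sketch of the API this description covers, assuming the thread_enterCriticalRegion/thread_exitCriticalRegion pair found in core.thread; the globals are invented and stand in for state a collector must always see as a consistent pair:

```d
import core.thread : thread_enterCriticalRegion, thread_exitCriticalRegion;

// Invented names: a freshly built object and the metadata the collector
// must observe together with it.
__gshared void*  gObj;
__gshared size_t gObjSize;

void publish(void* p, size_t size)
{
    thread_enterCriticalRegion();
    scope (exit) thread_exitCriticalRegion();

    // The handful of stores the collector must never observe half-done.
    gObjSize = size;
    gObj     = p;

    // Per the warning above: taking a lock in here, e.g. synchronized (...),
    // risks deadlocking the stop-the-world routine.
}
```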
As part of this pull request, I factored suspend() and resume() out of thread_suspendAll() and thread_resumeAll() to make it easier to get an overview of the code.