This repository was archived by the owner on Oct 12, 2022. It is now read-only.

core.thread: Implement critical regions. #204

Merged
merged 5 commits into from
Jul 9, 2012

Conversation

alexrp
Contributor

@alexrp alexrp commented May 4, 2012

(Not to be confused with critical sections.)

Critical regions are areas of code where a thread absolutely must
not be suspended. Entering and exiting a critical region is very
similar to acquiring and releasing a mutex, but critical regions
do not block. They simply ensure that when the world is stopped,
a thread will not be suspended inside critical code; i.e. it will
be continually suspended and resumed until it is no longer in a
critical region.

This feature is useful to maintain invariants in garbage collectors
where you would normally use a lock. Empirically, it only happens
very rarely that a thread is suspended inside a critical region, so
the suspend/resume loop is a cheaper approach than always locking,
since locking results in high contention.

As implemented in this commit, however, critical regions are useful
beyond garbage collection; they are not specifically coupled to any
GC code.

Using critical regions is extremely error-prone. For instance,
using a lock inside a critical region will most likely result in
an application deadlocking because the stop-the-world routine will
attempt to suspend and resume the thread forever, to no avail.

As part of this pull request, I factored suspend() and resume() out of thread_suspendAll() and thread_resumeAll() to make it easier to get an overview of the code.
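The suspend/resume loop described above can be sketched as follows. This is a hedged, self-contained simulation, not the patch's actual code: `inCriticalRegion`, `suspendOutsideCriticalRegion`, and the delegate parameters are illustrative stand-ins for the per-thread flag and the OS-level suspend()/resume() primitives.

```d
import core.atomic;

// One flag standing in for the per-thread critical-region state.
shared bool inCriticalRegion;

void enterCriticalRegion() { atomicStore(inCriticalRegion, true); }
void exitCriticalRegion()  { atomicStore(inCriticalRegion, false); }

// Stop-the-world side: keep suspending and resuming the target thread
// until it is observed outside a critical region.  Returns how many
// suspension attempts were needed.
size_t suspendOutsideCriticalRegion(void delegate() suspend,
                                    void delegate() resume)
{
    size_t attempts;
    for (;;)
    {
        suspend();
        ++attempts;
        if (!atomicLoad(inCriticalRegion))
            return attempts; // parked outside critical code; safe to scan
        resume(); // let the thread run until it leaves the region
    }
}

void main()
{
    enterCriticalRegion();
    // Simulate the common case: while "resumed", the target thread
    // finishes its critical work and leaves the region.
    auto n = suspendOutsideCriticalRegion(
        () {},                         // pretend to suspend
        () { exitCriticalRegion(); }); // thread leaves region while running
    assert(n == 2); // suspended once inside the region, once outside
    assert(!atomicLoad(inCriticalRegion));
}
```

Note that the loop never blocks on a lock; it only retries suspension, which is why the common (non-critical) case pays nothing.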

* until it is outside a critical region.
*
* This function is, in particular, meant to help maintain garbage collector
* invariants when a lock is not used.

You really should put a very strong warning in here, like the one you put in your commit message:

WARNING
Using critical regions is extremely error-prone. For instance,
using a lock inside a critical region will most likely result in
an application deadlocking because the stop-the-world routine will
attempt to suspend and resume the thread forever, to no avail.

Contributor Author


Done.

@complexmath
Contributor

Did a lot of indentation change? There are a ton of diffs in this request.

@alexrp
Contributor Author

alexrp commented May 5, 2012

I moved suspend() and resume() out of thread_suspendAll() and thread_resumeAll(), so their entire indentation changed. I didn't actually change their code.

@alexrp
Contributor Author

alexrp commented May 11, 2012

Anything else that needs fixing? I'd like to manage to get this and my cooperative suspension patch into the next release.

@alexrp
Contributor Author

alexrp commented May 25, 2012

Folks, this has been sitting here for 20 days with no word on what's going to happen to it (and it has passed the auto tester since day one). I have two other patches depending on it, and an ambitious goal of getting those in before 2.060 is released. Can we please try to speed this up? Reviews seem to happen reasonably quickly, but someone seriously needs to hit the merge button or give a reason why this shouldn't be pulled in, instead of leaving it here to rot.

I completely understand that people may not have the time to review and merge/reject all pull requests quickly, but I think that's a clear indicator that we don't have enough people with the rights to press that merge button to begin with. Delegation is key to the success of open source projects. If you look at other open source projects, changes are generally reviewed within one day and accepted (with any needed adjustments) within 2-3 days, while in the D community, the process seems to take eons simply because of the "final verdict" bottleneck.

(Mind you, I'm not trying to say I want aforementioned rights - I really don't care (in fact, I fully believe we shouldn't accept code from anyone without having at least one extra set of eyes look at it). I just wish there were more people to actually review and merge pull requests, since it is clear we don't have enough right now.)

I hope I didn't come across as demanding or whatever. I'm just trying to make the point that our process is way too slow. Consider that I've been in the D community for many months now and I'm still annoyed by the slow pull request process at this point. Now consider how a newcomer feels if their pull requests take just as long to be reviewed/merged/rejected. They won't be particularly encouraged to continue contributing.

So, we need a long-term, proper solution to this. This isn't just me screaming "merge this one pull request already dammit".

...that turned out much more TL;DR than I intended it to be...

@dnadlinger
Contributor

@alexrp: I can understand your frustration, as I have experienced similar situations, e.g. right now with dlang/phobos#555. In this specific case, the added feature is not exactly controversial, but helpful (also, I'd really like to make use of it in an upcoming article), and several people on IRC and GitHub already provided feedback (although I seem to have messed up the commit comments with rebasing). However, after a month it is still sitting there, without any comment on what is actually holding up the merge.

Please note that not receiving any feedback is probably the single most discouraging thing for most open source contributors, much more so than e.g. overly picky style guides, formal requirements, etc. Thus, I think it is really important that we figure out a way to efficiently handle pull requests, since the problem will (hopefully) just get worse over time. And please also don't forget that there are people like Alex and me who'd actually love to help finding a way to alleviate the situation.

@complexmath
Contributor

It's a big change that I haven't had the time to consider carefully. If it helps, I tend to be very conservative about changes to core.thread. It's an easy module to break in subtle ways. More generally, I think it makes sense for runtime development to move a bit slowly since literally every D program relies on it.

@alexrp
Contributor Author

alexrp commented May 28, 2012

Sean, I completely understand your hesitation to merge core.thread changes too quickly. Whenever I edit that file, I think I spend more time thinking about the various ways I could end up breaking the threading infrastructure than I do on actually writing code. As they say, threading is something to be debugged with a brain, not a debugger.

But what I have implemented here is not exactly new, innovative, or dangerous: It's what almost every virtual machine (including Mono, where I originally found the concept) uses to make lock-free GC allocations possible.

Regarding slow development, I'm inclined to disagree. It is completely true that every D program depends on druntime, and any breakage (and especially in core.thread) would be disastrous for a release. But at the same time, I don't think the chances of breakage are that high.

A little comparison:

alexrp@alexrp ~/Projects/druntime $ git log --oneline --since=6.month . | wc -l
247
alexrp@alexrp ~/Projects/mono/mono $ git log --oneline --since=6.month . | wc -l
464

So, 247 druntime commits in the last 6 months, and 464 commits to Mono's virtual machine. Now consider that Mono's VM consists of a JIT compiler, IR optimization passes, profiler, interpreter, metadata (CIL) manipulation layer, garbage collection, threading infrastructure, ... The potential for breakage in Mono's runtime is insanely high compared to druntime because it's written in C of all languages and their VM has even more responsibilities than our runtime. And yet, they have more commits than us and have way more applications depending on their VM. Not to mention, the commercial products MonoTouch and Mono for Android completely depend on Mono. Also consider that they have to support way more architectures and operating systems than we do.

I regularly work with a Mono built straight from Git master and have only once in the last 6 months run into an actual VM problem. It was fixed almost instantaneously and never made it into a release.

I think we can do just as well as they do. I think our release cycle is in part to blame for the mentality I see here. If a release is made, and a critical bug is found halfway into the release cycle, no hotfix release is made. Instead, people have to wait for the fix in the next release, which might be more than a month off. This model doesn't work. Release early, release often.

Not to mention, the slow development, regardless of our release cycle, does not scale. It will fall on its knees (if it hasn't already) as more people come to the D community, and they are going to be just as frustrated as I am.

Walter has just proposed another DMD beta, which most likely means these changes won't make it in for 2.060 as I had hoped. If this is the case, then I will have no choice but to fork druntime for my virtual machine work. I can't wait another entire release cycle. This is a sad state of affairs, but I have code that needs to be written, and I can't wait for this slow bureaucracy any longer.

(The above paragraph is not to say that I won't send my patches upstream. I will. But I just don't have the time to wait this long for patches to be reviewed/merged.)

alexrp added 2 commits July 7, 2012 00:28
(Not to be confused with critical sections.)

Critical regions are areas of code where a thread absolutely must
not be suspended. Entering and exiting a critical region is very
similar to acquiring and releasing a mutex, but critical regions
do not block. They simply ensure that when the world is stopped,
a thread will not be suspended inside critical code; i.e. it will
be continually suspended and resumed until it is no longer in a
critical region.

This feature is useful to maintain invariants in garbage collectors
where you would normally use a lock. Empirically, it only happens
very rarely that a thread is suspended inside a critical region, so
the suspend/resume loop is a cheaper approach than always locking,
since locking results in high contention.

As implemented in this commit, however, critical regions are useful
beyond garbage collection; they are not specifically coupled to any
GC code.

Using critical regions is extremely error-prone. For instance,
using a lock inside a critical region will most likely result in
an application deadlocking because the stop-the-world routine will
attempt to suspend and resume the thread forever, to no avail. This
means that you really shouldn't be messing with this feature unless
you *really* do know what you're doing.
@alexrp
Contributor Author

alexrp commented Jul 6, 2012

Rebased the pull request on HEAD. Let's see what the auto tester says.

@alexrp
Contributor Author

alexrp commented Jul 7, 2012

Looks green again. The only failure is due to that odd core.sync.condition bug.

@andralex
Member

andralex commented Jul 8, 2012

I find it hard to get behind this. Programming with "all interrupts disabled" is a technique that seems to have died somewhere in the 1980s. I can't find on the net a generally-agreed upon notion of "critical region" that is in accordance with this one, and the only one I found is in .NET and has a different, unrelated meaning. http://stackoverflow.com/questions/747551/difference-between-critical-section-critical-region-and-constrained-execut

Are you sure we really need this?

@alexrp
Contributor Author

alexrp commented Jul 8, 2012

100% sure.

As pointed out in the comments in the code, this is absolutely not something you should ever use in normal code. If someone used this as a synchronization mechanism, they'd be doing it completely wrong (and the comments do point this out).

Critical regions are a low-level mechanism tightly coupled with the garbage collector's stop-the-world machinery. Implementing lock-free allocation in a garbage-collected environment is close to impossible (if not impossible; I don't know of any other way) without this mechanism. You need a way to tell the GC "okay, I'm going to be doing something here that would normally require that I acquire the GC lock, but I need a tiny invariant to always hold, so don't interrupt me for this tiny interval". This tiny interval is usually ridiculously short (think a single store instruction in machine code). Also, note that critical regions are implemented as nothing more than a simple atomic Boolean condition.

In short, you can think of critical regions as a GC-aware/supported lock-free programming mechanism, much like CAS operations in hardware. That's all they're made for and all they're supposed to be used for. And your average programmer is never going to need it.

@alexrp
Contributor Author

alexrp commented Jul 8, 2012

For reference, the term "critical region" comes from Mono: https://github.com/mono/mono/blob/master/mono/metadata/sgen-gc.h#L925

I will add references to the docs.

@alexrp
Contributor Author

alexrp commented Jul 9, 2012

After a discussion with @andralex on IRC, we'll go with leaving the critical region functions undocumented (i.e. they don't show up in Ddoc) until we want to commit to the interface. I think this is reasonable enough, since we can give it a release or two to mature.

andralex added a commit that referenced this pull request Jul 9, 2012
core.thread: Implement critical regions.
@andralex andralex merged commit 1da26de into dlang:master Jul 9, 2012
@MartinNowak
Member

Could you please provide a convincing example where this is needed?
We already have GC.enable/disable, which does a very similar thing but doesn't lock up other threads.

@alexrp
Contributor Author

alexrp commented Jul 10, 2012

Consider this made up code:

Object allocateObject(TypeInfo ti)
{
    // Fast path: lock-free allocation.  The critical region ensures the
    // world cannot stop between allocation and initialization.
    {
        thread_enterCriticalRegion();

        scope (exit)
            thread_exitCriticalRegion();

        if (void* mem = gc_tryAllocNoLock(ti))
        {
            (cast(ubyte*)mem)[0 .. ti.init.length] = ti.init[];
            return cast(Object)mem;
        }
    }

    // Slow path: fall back to the global allocation lock.
    gc_acquireAllocLock();

    scope (exit)
        gc_releaseAllocLock();

    if (void* mem = gc_allocLocked(ti))
    {
        (cast(ubyte*)mem)[0 .. ti.init.length] = ti.init[];
        return cast(Object)mem;
    }
    else
        outOfMemoryError();

    assert(false);
}

(I know the code doesn't look much like what we have in druntime,
but I wrote it the way I did to make it easier to follow.)

Explanation: This is an object allocation function intended for
a precise garbage collector (which is what we're moving towards
in D). Being precise, it relies on sane information to be present
in GC-tracked fields of an object at all times. The only sensible
values in such fields are valid pointers to other GC-tracked
objects and the null value. Now, in addition to being precise, it
also supports lock-free allocation. There are many ways to do
lock-free allocation in GCs, and I won't get into the gritty
details here. Just know that gc_tryAllocNoLock() tries to do
a lock-free allocation and returns null if the operation could
not be done without a lock.

OK, so in the first block of code, we have the critical region
case. This code tries to allocate memory for the object without
acquiring a global allocation lock. Once memory is allocated, the
code initializes it with the object's vtable and the various
initial values of its fields. Now, this is where critical regions
are important. In the edge case where the world is suspended just
after the allocation and just before the initialization of the
object (or during it), the GC would get an inconsistent view of
the object's state. Now, it's fairly clear that this would only
happen rarely out of all the world suspensions in a normal
program. It is, nonetheless, a possibility. By wrapping this
operation in a critical region, we merely tell the STW routine
to not suspend the thread until we have finished initializing the
object, thus ensuring a consistent heap view for the GC.

Now, you might be wondering how an inconsistent view of an object
during GC matters at all. The thing to notice is that we are
initializing the object byte-by-byte. This means that a field the
size of a word which holds a GC pointer could have only half of
its bits set when the world stops, thus resulting in an invalid
GC pointer causing Bad Things (TM) to occur. The obvious fix is
to initialize word-by-word, but we can't do this since that would
require all objects to have a word-aligned size, which is an
unreasonable requirement for a systems language like D.
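The half-written-field hazard can be demonstrated directly. This is an
illustrative sketch (not druntime code), using ulong as a stand-in for
a pointer-sized GC field:

```d
void main()
{
    ulong src = 0x1111_2222_3333_4444; // the "real" GC pointer value
    ulong dst = 0;                     // freshly allocated, zeroed field

    auto s = cast(ubyte*)&src;
    auto d = cast(ubyte*)&dst;

    // Blit only half the bytes -- the state a collector would observe
    // if the world stopped in the middle of the byte-wise copy.
    foreach (i; 0 .. ulong.sizeof / 2)
        d[i] = s[i];

    assert(dst != 0 && dst != src); // torn: neither null nor the pointer
}
```

Regardless of endianness, the torn value is neither the null value nor
the original pointer, so a precise collector would misinterpret it.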

In the code below the critical region, we have the normal locked
allocation. There probably isn't anything worth noting here. It's
just the fallback for the case where a lock-free allocation could
not be done.

You asked why we can't just use GC.enable()/GC.disable() for
the same thing critical regions are used for. It is true that the
same effect could be achieved by using these. However, this
approach has a fundamental problem, which is exactly what you
said: Other threads still get a chance to run while the GC is
in a disabled state. This is actually bad. It means that while
a thread is doing some work inside a GC-disabled region, other
threads get a chance to also reach such a region. If threads
repeatedly keep reaching these regions, a GC run can, in edge
cases, be prevented entirely. This is a real problem in programs with
lots of allocation. Critical regions ensure that all threads are
suspended incrementally, not giving any thread a chance to help
another thread stay active, therefore ensuring that a program
won't get an OOME due to contention in these edge cases.

What I have described above is all very specific to D. If someone
is using druntime's GC as a library (or just using its threading
infrastructure and attempting to be cooperative with the druntime
GC) in a virtual machine (hint hint, I am), then the ability to
use critical regions outside of just druntime becomes absolutely
essential. For instance, in my virtual machine, arrays know their
length (i.e. they have a field which holds it). This field must
be written to inside a critical region so the GC (be it the
druntime GC, libgc, or whatever) sees it. If this is not done,
the GC could miss references stored inside the array.
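The array-length case can be sketched like this. VmArray, append, and
the inCritical flag are hypothetical; the two atomicStore calls stand
in for the thread_enterCriticalRegion()/thread_exitCriticalRegion()
pair this patch adds:

```d
import core.atomic;

shared bool inCritical; // stand-in for the per-thread flag

struct VmArray
{
    void*[4] slots;  // fixed element storage, enough for the sketch
    size_t length;   // the field the GC reads while scanning
}

void append(ref VmArray a, void* p)
{
    atomicStore(inCritical, true);               // enter critical region
    scope (exit) atomicStore(inCritical, false); // exit on all paths
    a.slots[a.length] = p; // store the reference...
    ++a.length;            // ...and publish the length, with no
                           // suspension point between the two stores
}

void main()
{
    VmArray a;
    int x;
    append(a, &x);
    assert(a.length == 1 && a.slots[0] is &x);
    assert(!atomicLoad(inCritical));
}
```

Because the world cannot stop between the two stores, a scanning GC
never sees a length that excludes an already-stored reference.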

This is in general why I have been sending patches that expose
some low-level aspects of druntime. Cooperating with druntime in
things like virtual machines was previously too hard and in some
cases (such as this one) impossible. I believe that druntime,
being the runtime of a systems language, has to expose some of
its low-level functionality for people to be able to write such
software as virtual machines in D rather than C.

@MartinNowak
Member

Being precise, it relies on sane information to be present in GC-tracked fields of an object at all times. The only sensible values in such fields are valid pointers to other GC-tracked objects and the null value.

A precise collector for D will have to cope with uninitialized memory and unions.

By wrapping this operation in a critical region, we merely tell the STW routine to not suspend the thread until we have finished initializing the object, thus ensuring a consistent heap view for the GC.

This is super expensive. Suspending, resuming, and waiting for a thread in a critical region requires four context switches.

won't get an OOME due to contention in these edge cases

Yeah, that's the risk with enable/disable but serializing threads doesn't scale.

If someone is using druntime's GC as a library in a virtual machine

The GC wasn't designed to serve anyone but the language/compiler. Therefore it's not a perfect fit for your application, and I'm not sure to what extent we need to solve this within druntime. It would be interesting, though, to see your application (open source? link?).

@alexrp
Contributor Author

alexrp commented Jul 13, 2012

A precise collector for D will have to cope with uninitialized memory and unions.

Fair. Still, to implement something like a cardtable efficiently, you're going to need a critical region for when you mark the memory region of a bit-blitted value type. There are probably many other uses I haven't thought of yet. Keep in mind that critical regions are basically the inverse of GC safe points.

This is super expensive. Suspending/Resuming and waiting for a thread in a critical region requires 4 context switches.

It's the cheapest approach that doesn't potentially let the process die from an OOME. What you have here is, at the most basic level, a choice between crashing and continuing (at a relatively small price in execution speed).

Yeah, that's the risk with enable/disable but serializing threads doesn't scale.

It isn't entirely correct to call it serial; the suspending and resuming is serial indeed (like the stop-the-world code always was), but threads that are in critical regions get to finish their work in parallel once resumed. It does mean a few extra loops on the rare occasion that a thread is suspended inside a critical region, but I would be very surprised if this showed up in profiling results at all. Besides, stop-the-world isn't scalable at all in the first place.

I understand that the suspend/resume loop for critical regions seems intimidating and evil, but as I explained, it will actually happen very rarely that a thread is suspended inside a critical region (and thus the suspend/resume loop will do no significant work). The only way it could happen so much that it would affect performance severely would be if someone abused the critical region API as a general synchronization mechanism (or maybe in laboratory cases that intentionally try to bring the suspend machinery to its knees, but I don't think this is very interesting to look at; a GC can be brought to its knees in many ways already, after all).

Consider: Mono has a total of 6 critical regions in the entire virtual machine, 4 of these being in lock-free allocation routines, 2 others being in the cardtable implementation. That's all. Out of all the code that the VM will execute, it's extremely unlikely for a thread to be suspended inside one of these. Granted, D is not virtualized, but the situation isn't really different for us.

The GC wasn't designed to serve anyone but the language/compiler. Therefor it's not a perfect fit for your application and I'm not sure to which extend we need to solve this within druntime. It would be interesting though to see your application (open source? link?).

I certainly wouldn't expect the GC to cater to every odd need out there. The reason I submitted this to druntime is that my virtual machine basically relies entirely on druntime's threading infrastructure. I could write my own, of course, but then I would have to fight druntime and its GC when it comes to the stop-the-world routine, thread state management, and so on. It wouldn't be pretty. Plus, it would be incredibly wasteful when druntime already has all of this infrastructure and abstraction in place. One of the goals of the virtual machine is that it should be trivially embeddable in any D application. That means that it must be able to cooperate with druntime's GC.

The project is here: https://github.com/lycus/mci

Note that there's no code written yet that uses critical regions. Since we're targeting a large amount of systems and architectures, we have to stick to released versions of druntime, DMD, etc. Using unreleased or patched versions would introduce too much friction. To see some actual code that uses critical regions, see:

(Search for ENTER_CRITIAL_REGION.)
