Skip to content
This repository has been archived by the owner on Oct 12, 2022. It is now read-only.

GC Initialized on First Alloc #2057

Merged
merged 5 commits into from Feb 14, 2018
Merged

GC Initialized on First Alloc #2057

merged 5 commits into from Feb 14, 2018

Conversation

somzzz
Copy link
Contributor

@somzzz somzzz commented Jan 24, 2018

This PR removes the requirement of having the GC initialized at startup. Instead, gc_init is called when the GC is first invoked.

@dlang-bot
Copy link
Contributor

dlang-bot commented Jan 24, 2018

Thanks for your pull request and interest in making D better, @somzzz! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

  • My PR is fully covered with tests (you can see the annotated coverage diff directly on GitHub with CodeCov's browser extension
  • My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
  • I have provided a detailed rationale explaining my changes
  • New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.


If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

@somzzz
Copy link
Contributor Author

somzzz commented Jan 24, 2018

Fail seems unrelated #2048
@andralex

src/gc/proxy.d Outdated
}

alias GCInitNoThrow = void function() nothrow @nogc;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this appears to be unused

src/gc/proxy.d Outdated
{
instanceLock.lock();
scope(exit) instanceLock.unlock();
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary extra scope?

src/gc/proxy.d Outdated

fprintf(stderr, "No GC was initialized, please recheck the name of the selected GC ('%.*s').\n", cast(int)config.gc.length, config.gc.ptr);
exit(1);
if (atomicLoad(*pinstance) is null)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whenever I do double-checked locking, I like to put a comment :)
// using double-checked locking

src/gc/proxy.d Outdated
exit(1);
}

atomicStore(*pinstance, cast(shared GC)auxInstance);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Race condition? setting instance to "not null" should be the last thing done correct? Otherwise another thread could come in, see that instance is not null and then continue before thread_init has finished. However, my comment is assuming that thread_init should be finished before another thread can start using the GC, is this assumption correct?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depends on the startup, there could be multiple external "C" threads calling thread_attachThis concurrently.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MartinNowak so what's the right course of action here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mutual dependency between gc_init and thread_init isn't properly resolved.

if (atomicLoad(*pinstance) is null))
{
    instanceLock.lock();
    scope (exit) instanceLock.unlock();
    if (atomicLoad(*pinstance) is null))
    {
        // init
        // ...
        atomicStore(*pinstance, cast(shared GC)auxInstance);

        // thread_init <- should be moved before storing pInstance or some other thread could start to use the GC before thread_init finishes
    }
}

The problem is that thread_init depends on the GC, but any GC usage depends on thread_init (for suspending).
Let's figure out whether we can avoid the GC for Thread.getThis, if not then pass auxInstance directly to thread_init.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And BTW, use weaker (and much cheaper) orderings.

if (atomicLoad!(MemoryOrder.acq)(*pInstance) is null)
{
    // lock...
    if (atomicLoad!(MemoryOrder.acq)(*pInstance) is null)
    {
        // init...
        atomicStore!(MemoryOrder.rel)(*pInstance, auxInstance);
    }
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MartinNowak let's make it most conservative and optimize in a subsequent step thx

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is nastier than it looks.
thread_init calls thread_attachThis which calls GC.disable and GC.enable.
https://github.com/dlang/druntime/blob/master/src/core/thread.d#L2138

While gc_init has not yet finished, the spinlock is still locked. Calling disable will redirect to gc_init (with the instance still null because store hasn't been done yet) -> deadlock.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, just directly pass the newly created GC instance to thread_init, so that thread_attachThis can directly access the GC interface instead of calling the global gc_disable/gc_enable functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That works for gc_disable/gc_enable.
However, the code then creates an object new Thread which will become a call to _d_newclass in lifetime.d.
https://github.com/dlang/druntime/blob/master/src/rt/lifetime.d#L93

This function calls GC.malloc. I can't pass the instance there and the code hangs on the spinlock (as GC.malloc tries to call gc_init).

A slightly ugly workaround is to create a static tlsInstance for the thread doing the initialization and use this one, if available, in gc_enable, gc_disable and gc_malloc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_d_newclass should forward to a function that accepts a gc instance, and then instead of calling new Thread, call the function directly (with the auxInstance of course).

src/gc/proxy.d Outdated
import core.stdc.stdlib : exit;

fprintf(stderr, "No GC was initialized, please recheck the name of the selected GC ('%.*s').\n", cast(int)config.gc.length, config.gc.ptr);
instanceLock.unlock();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've unlocked twice! (scope(exit) instanceLock.unlock(); above)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes please delete

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason, if unlock is not called before exit(1), this test hangs: druntime/test/exceptions/src/unknown_gc.d

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uh oh, that may be a bug with D's scope(exit) implementation (anyone correct me if I'm wrong). Given this, I would suggest not using scope(exit) here and instead insert instancelock.unlock() at every "exit point". There should just be 2, just before calling exit and at the end of the block.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW we have a nice language feature for exactly this:

synchronized (instanceLock)
{
   // code here
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If scope(exit) doesn't work with the exit function, my guess is that this might suffer from the same bug. @somzzz can you try this out and confirm?

Copy link
Member

@schveiguy schveiguy Jan 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, exit() doesn't return, so it can't exit the scope. This seems like the code is correct to me...

scope(exit) only works if the scope is exited, via stack unwinding or normal execution. Calling exit function is neither.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keep scope(exit) and also unlock explicitly before calling exit thx

src/gc/proxy.d Outdated
instanceLock.lock();
scope(exit) instanceLock.unlock();
{
if (atomicLoad(*pinstance) is null)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think atomicLoad is necessary once inside the lock, since the value is guaranteed not to be modified while a thread is inside the lock.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder whether you have tried the following trick of using a variable to cache and thus providing as fast, hot-path for the default case:

__gshared gcHasBeenInitialized;
if (!gcHasBeenInitialized)
{
 // do your normal locking here
 gcHasBeenInitialized = true;
}

TLS variables could be faster, but you could benchmark this for dynamic modules too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what that is, I believe, only fixed to be correct.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(A TLS cache can be made thread-safe without atomics, of course.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would love to see a performance comparison on TLS vs atomic double-checked locking.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd expect that to be hard to do as a microbenchmark on x86 in a sensible fashion, since it depends a lot on non-local considerations. If there is no contention on the cache line, I'd expect double checked locking to be at least as fast (acq loads are just simple movs).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double-checked locking should be faster, TLS reads go through an extra indirection and can be very slow if TLS is emulated. Just reading a shared variable by multiple threads doesn't cost much (except maybe for the non-inlined atomicRead!acq overhead with dmd).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@somzzz let's conservatively leave atomicLoad here and review it in a future pass thx

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, precisely. On x86_64 TLS can pretty much only be faster if the indirection is cheap (i.e. static TLS model) and the other variant is burdened down by concurrent writes to other data on the same cache line (hence my above comment). I wouldn't worry about it.

src/gc/proxy.d Outdated
thread_init();
void gc_init_nothrow() nothrow
{
try
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gc_init_throw is going to be called often. Rather than incurring the overhead of a try/catch for every gc_init_nothrow, this should probably go inside the double checked lock. That way you only have to pay the price of a try/catch once, when you initialize the GC.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, use sth. similar to initOnce.
Instead of a separate flag variable, you could directly load/store the shared instance variable. Also seems like the spinLock would be fine to be used as mutex.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the purpose here was just to trick nothrow.
A short idiom for that is the following.

void foo() nothrow
{
    scope (failure) assert(0, "unexpected exception");
    stmt; // not yet nothrow annotated
}

Calling abort is cleaner as it prints a better error message, so you might just want to stick with the current code. In that case please add a comment that gc_init should be annotated nothrow instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I understand, try/catch and scope(failure) incur some amount of overhead. By moving this inside the lock we only have to incur it once. I still recommend doing so.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on @MartinNowak's suggestions, initOnce looks like it could work here so long as you use a mutex and wrap the init argument in a tryCatch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

initOnce is in Phobos, which is the reason I didn't suggest it earlier. We cut put something like it in druntime, of course, for internal use.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dmd is fairly bad at optimizing initOnce, so doing it manually seems fine here.

src/gc/proxy.d Outdated
@@ -68,132 +103,159 @@ extern (C)
// NOTE: Due to popular demand, this has been re-enabled. It still has
// the problems mentioned above though, so I guess we'll see.

instance.collectNoStack(); // not really a 'collect all' -- still scans
// static data area, roots, and ranges.
if (instance !is null)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with who calls gc_term. Would it be better to make this synchronized like you have done in gc_init? Or is this only going to be called by a single thread?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gc_term is called in dmain2.d->rt_term after thread_joinAll.
https://github.com/dlang/druntime/blob/master/src/rt/dmain2.d#L223

@marler8997
Copy link
Contributor

Thanks for making this change. I like the idea.

src/gc/proxy.d Outdated
instance.enable();
}

void gc_disable()
{
assert(instance !is null);
instance.disable();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this break code that calls GC.disable at the beginning of the program? Sometimes it's useful to do this and schedule your own calls to GC.collect; I've done this before to optimize GC performance.

Would it be better to also initialize the GC in gc_enable / gc_disable, in addition to doing it upon the first memory allocation? IOW, initialize the GC the first time it's used in some way, whether it's a memory allocation or enable/disable, etc..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is a use case for it, then yes - it should be initialized.
Which of these methods do you think should initialize the GC?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely GC.disable as it's fairly common. Don't think anyone has explicitly called GC.enable yet. Even better if we could memorize GC.disable and lazily initialize and disable the GC if it's used later on. It would be weird to initialize a GC just to disable it.
Don't make things too complicated though, when in doubt just initialize the GC on any access.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm... just had a weird idea.

What if the instance is initialized with a GC that when first used, figures out what GC to allocate, sets the 'instance' to the new one, and calls the appropriate function on it. Then we can handle things like a bool to track the disabling without having to initialize until someone tries to call malloc, etc.

Then once the real instance is in place, there is no longer any checking for initialization.

You'd still need to atomically load the instance every time you use it, unfortunately.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this idea.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the methods requiring that behavior disable, addRoot, removeRoot, rootIter, addRange, removeRange, rangeIter? Sounds like a reasonable approach for lazy initialization.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, that just requires an atomicLoad!(MemoryOrder.acq), and that's free on strongly-ordered CPUs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would initialize a full GC on anything that requires storing an array of things to do, because that involves allocation. Just having a global bool is OK to cache until you need the whole GC.

that's free on strongly-ordered CPUs

Right, x86 doesn't have issues with loads of a word, so those will be fine. It's the other architectures I don't know enough about how the performance might be. I would leave that decision/discussion to those who know more :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would initialize a full GC on anything that requires storing an array of things to do

Hm... I guess if you need to allocate using the C heap you can, as long as the behavior matches what the GC would actually do. You might even pass those into the GC initializer, and it would just take over ownership of the arrays.

Copy link
Member

@andralex andralex Jan 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, that just requires an atomicLoad!(MemoryOrder.acq), and that's free on strongly-ordered CPUs.

Please KISS (Keep It Simple and conServative) for now. Then we'll take a separate PR with loads and stores tightened. If we ever have trouble with those we'll be able to track that easily. Thanks!

src/gc/proxy.d Outdated
import core.stdc.stdlib : exit;
import core.atomic : atomicLoad, atomicStore;

auto pinstance = cast(shared(GC)*) &instance;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that if you remove the second atomicLoad (see comment below). I believe you could just do
if (atomicLoad(instance))
and you don't need the pinstance variable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marler8997 I think that won't work because atomicLoad expects a ref to a shared object

Copy link
Member

@MartinNowak MartinNowak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, the ugly startup order has long been lingering around. I guess we want to add a test to druntime/test/ that checks that a @nogc main can be run without initializing the GC.
There is already unknown_gc which should now require a GC allocation to pass, but you can adopt it to check that no GC is ever initialized.

src/gc/proxy.d Outdated

// NOTE: The GC must initialize the thread library
// before its first collection.
thread_init();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you indent to do about Thread.getThis? It is currently a GC allocated instance that is eagerly created in thread_init.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That may be why *pinstance had to be set just above! Seems fragile. Perhaps we can not use the GC for getThis?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can currently retain a Thread object longer than the threads lives, e.g. there is even a Thread.join methods, so there isn't a clear owner of that instance. Not too easy to avoid the GC here.

src/gc/proxy.d Outdated
thread_init();
void gc_init_nothrow() nothrow
{
try
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, use sth. similar to initOnce.
Instead of a separate flag variable, you could directly load/store the shared instance variable. Also seems like the spinLock would be fine to be used as mutex.

src/gc/proxy.d Outdated
exit(1);
}

atomicStore(*pinstance, cast(shared GC)auxInstance);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depends on the startup, there could be multiple external "C" threads calling thread_attachThis concurrently.

@rainers
Copy link
Member

rainers commented Jan 24, 2018

  • This is pretty performance sensitive code, please benchmark the GC suite with benchmark/runbench

  • We have removed null-checks for the instance not too long ago (though they have been used for a different purpose). Adding them back with atomics is probably not so great. An alternative would be to implement the GCInterface with an object that switches the instance global to the standard GC if necessary.

  • all the assert(instance !is null) can probably do/return something sensible knowing that no GC memory has been allocated.

  • I also suspect that this change will probably not have an immediate effect because other init functions are likely to access the GC, e.g. initStaticDataGC calls GC.addRange.

src/gc/proxy.d Outdated
}

alias GCInitNoThrow = void function() nothrow @nogc;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make it private

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(if used)

src/gc/proxy.d Outdated
import core.stdc.stdlib : exit;
import core.atomic : atomicLoad, atomicStore;

auto pinstance = cast(shared(GC)*) &instance;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marler8997 I think that won't work because atomicLoad expects a ref to a shared object

src/gc/proxy.d Outdated
import core.stdc.stdlib : exit;

fprintf(stderr, "No GC was initialized, please recheck the name of the selected GC ('%.*s').\n", cast(int)config.gc.length, config.gc.ptr);
instanceLock.unlock();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes please delete

src/gc/proxy.d Outdated
exit(1);
}

atomicStore(*pinstance, cast(shared GC)auxInstance);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MartinNowak so what's the right course of action here?

src/gc/proxy.d Outdated

// NOTE: The GC must initialize the thread library
// before its first collection.
thread_init();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That may be why *pinstance had to be set just above! Seems fragile. Perhaps we can not use the GC for getThis?

@MartinNowak
Copy link
Member

That's the main blocker to make this work https://github.com/dlang/druntime/pull/2057/files#r163907886.

@schveiguy
Copy link
Member

This may have been lost in the noise, so I'll link to it here:

#2057 (comment)

Might be a viable option that eliminates some of the possible performance drawbacks.

@marler8997
Copy link
Contributor

@schveiguy I'm actually working an an alternative that explores your idea. Will make a PR soon if it turns out well.

src/gc/proxy.d Outdated

fprintf(stderr, "No GC was initialized, please recheck the name of the selected GC ('%.*s').\n", cast(int)config.gc.length, config.gc.ptr);
instanceLock.unlock();
exit(1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should add an assert(0); here to note that exit(1) shouldn't return.

src/gc/proxy.d Outdated
instanceLock.lock();
scope(exit) instanceLock.unlock();
{
if (atomicLoad(*pinstance) is null)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@somzzz let's conservatively leave atomicLoad here and review it in a future pass thx

src/gc/proxy.d Outdated
import core.stdc.stdlib : exit;

fprintf(stderr, "No GC was initialized, please recheck the name of the selected GC ('%.*s').\n", cast(int)config.gc.length, config.gc.ptr);
instanceLock.unlock();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keep scope(exit) and also unlock explicitly before calling exit thx

src/gc/proxy.d Outdated
exit(1);
}

atomicStore(*pinstance, cast(shared GC)auxInstance);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MartinNowak let's make it most conservative and optimize in a subsequent step thx

src/gc/proxy.d Outdated
instance.enable();
}

void gc_disable()
{
assert(instance !is null);
instance.disable();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't make things too complicated though, when in doubt just initialize the GC on any access.

let's do that now, refine later

@rainers
Copy link
Member

rainers commented Jan 26, 2018

Possible @rainers had the same thought, I just didn't see it

Thanks for noticing ;-) I think with the switching proxy object no atomics are needed to access the instance pointer in the general case if pointer writes cannot appear as partial operations in other threads (which should cover ost architectures). The synchronization is only needed in the switching proxy, i.e. it is a one time operation and doesn't need to be very fast.

The disable state of the GC can be saved to gc.config.disable before atually initializing the GC.

BTW: how about not allocating any pool during GC initialization. That would have almost the same effect as not initializing it, but a few tiny allocations.

@andralex
Copy link
Member

There are 2 simple acceptance criteria, the overhead of lazy GC must be negligible, and we must not initialize and allocate OS resources when the GC is not used (lazy initialization). The latter one still isn't met, but very simple to achieve.

I realized I omitted a discussion I've had with @somzzz. Next on her list to move the code that adds roots in _Dmain from src/rt/sections_elf_shared.d. The roots will be added lazily during the initialization. That eliminates the internal use of addRoot/addRange. I do agree that if one of the first things _Dmain does is to call addRoot/addRange, then this PR doesn't help a lot.

So please let's get this in without more features. Accumulating addRoot/addRange calls in an array is not trivial - where do you allocate the array? There's no GC. So then we need to use malloc/free, an additional complication. And once we remove addRoot/addRange from _Dmain, the clientele of that complication will be virtually nobody.

class ProtoGC : GC
{
__gshared Array!Root roots;
__gshared Array!Range ranges;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do those really need to be __gshared?

@MartinNowak
Copy link
Member

So please let's get this in without more features. Accumulating addRoot/addRange calls in an array is not trivial - where do you allocate the array? There's no GC.

As said earlier, we have @nogc containers in druntime for those purposes.
In fact they are already added (https://github.com/dlang/druntime/pull/2057/files#diff-254a9fe518c0b5c278aeec1838d7adf8R34), but not used.

@andralex
Copy link
Member

@MartinNowak in wake of the upcoming removal of addRange from _Dmain, what would be the usefulness of your proposed added feature?

@MartinNowak
Copy link
Member

I realized I omitted a discussion I've had with @somzzz. Next on her list to move the code that adds roots in _Dmain from src/rt/sections_elf_shared.d. The roots will be added lazily during the initialization. That eliminates the internal use of addRoot/addRange. I do agree that if one of the first things _Dmain does is to call addRoot/addRange, then this PR doesn't help a lot.

That sounds nice in theory, but you're likely underestimating the complexity in that area, because GC roots are dynamically added/removed when loading/unloading shared libraries. So instead of conflating a complex with another complex, let's just go for the straightforward solution and cache roots/ranges until lazily initializing the GC.
In fact we've prolly spent 2x more time discussing this, than would be necessary to do the change.

@andralex
Copy link
Member

@MartinNowak meet on slack?

@andralex
Copy link
Member

After discussing, the shared lib example is quite compelling. @somzzz will implement the changes with this PR.

Copy link
Member

@MartinNowak MartinNowak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks great.

@MartinNowak
Copy link
Member

Another nice thing that comes with this PR, once #1924 is implemented, applications wouldn't even need to link against any GC.

@dlang-bot dlang-bot merged commit fb17fe9 into dlang:master Feb 14, 2018
@andralex
Copy link
Member

Congrats @somzzz and many thanks to all reviewers!

@MartinNowak
Copy link
Member

Congrats @somzzz and many thanks to all reviewers!

Yes, thanks for this important step towards a no-GC language, and thanks @schveiguy for the nice ProtoGC idea.
The amount of feedback was a bit overwhelming, but this is certainly due to the interest in the topic.

@schveiguy
Copy link
Member

thanks @schveiguy for the nice ProtoGC idea

Thanks for the credit, I also have to mention that @rainers had the same idea (before I did), but just expressed it in a different way here: #2057 (comment).

size_t reserve(size_t size) nothrow
{
gc_init_nothrow();
return reserve(size);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this call?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops! It should have been gc_reserve


void addRange(void* p, size_t sz, const TypeInfo ti = null) nothrow @nogc
{
ranges.insertBack(Range(p, p + sz, cast() ti));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem thread-safe – are we sure there can never be a situation where ProtoGC sticks around long enough to be racy (e.g. after improving druntime thread startup to be nogc and using statically allocated Thread instances, or when attaching to multiple C threads)?

Has this been thoroughly thought through for race conditions in general, especially considering architectures where loading the global instance doesn't automatically have acquire semantics?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good point. Likely you need to take the lock. I'm wondering if adding/removing ranges or roots should simply initialize the GC at that point.

I thought the instance should be atomically loaded, see Martin's suggestion here. But that doesn't help if you need atomic array appending.

@dnadlinger
Copy link
Member

Ping @somzzz, @andralex – I don't have the bandwidth to look into this myself right now, but someone should (or make sure it isn't actually an issue).

@dnadlinger
Copy link
Member

Ping @somzzz, @andralex – still waiting for a response.

@andralex
Copy link
Member

andralex commented Mar 1, 2018

@klickverbot thanks for the reminder. @somzzz is off to a scholarship in Singapore so she's off the project. The concern about addRange seems legit. Can you please create a bugzilla issue? Thanks!

@dnadlinger
Copy link
Member

I've created an "umbrella" issue – no time to delve into the code right now to figure out what exactly is broken right now.

@ghost
Copy link

ghost commented Jun 18, 2018

This change causes a regression in phobos Array container. gc_init should be made public now. see bugzilla

@schveiguy
Copy link
Member

gc_init should be made public now.

No, the point of ProtoGC is to initialize properly when needed, we should not need to make gc_init public (nor should we require calling it).

The problem causing the issue is that removeRoot and removeRange do not behave correctly. I'll file a PR to fix this.

@schveiguy
Copy link
Member

Fixing PR: #2220

JohanEngelen added a commit to weka/ldc that referenced this pull request Jan 9, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
10 participants