
Conversation

MartinNowak
Member

  • uses double-checked locking (the correct one)
  • all instances shared the same global mutex
  • minimal overhead and simple API

@MartinNowak
Member Author

Herb Sutter recently mentioned that call_once is implemented using something other than DCL, but I couldn't find the details anywhere.
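For context, the double-checked locking named in the first bullet can be sketched with C++11 atomics. This is an illustrative pattern only (the helper name call_once_dcl is made up here), not the druntime implementation:

```cpp
#include <atomic>
#include <mutex>

std::atomic<bool> initialized{false};
std::mutex init_mutex;

template <typename F>
void call_once_dcl(F f) {
    // Fast path: after initialization, only a cheap acquire load is paid.
    if (!initialized.load(std::memory_order_acquire)) {
        std::lock_guard<std::mutex> lock(init_mutex);
        // Re-check under the lock: another thread may have won the race.
        if (!initialized.load(std::memory_order_relaxed)) {
            f();
            // Release store publishes f()'s side effects to other threads.
            initialized.store(true, std::memory_order_release);
        }
    }
}
```

The acquire/release pair is what makes this the "correct" variant of DCL, as opposed to the broken pre-C++11 folklore version with plain loads and stores.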

@MartinNowak
Member Author

all instances shared the same global mutex

There is a potential for deadlock here when one thread waits in callOnce for the result of another thread that also invokes callOnce. It's probably not a practical issue, and I'd rather save all the mutex allocations.

* Executes the given $(D_PARAM callable) exactly once, even when invoked by
* multiple threads simultaneously. The implementation guarantees that all
* calling threads block until $(D_PARAM callable) returns and that any
* side-effects of $(D_PARAM callable) is globally visible afterwards.
Contributor

nitpick: is -> are

Member Author

done

@MartinNowak MartinNowak force-pushed the master branch 2 times, most recently from 61d7958 to 53412fc Compare November 10, 2014 15:26
@dnadlinger
Contributor

Did you measure whether the variant using atomics or the TLS one is actually better for typical applications (typical, of course, being hard to define)? I'd actually expect the latter to be faster in cases where there are a lot of concurrent accesses.

Furthermore, I don't think the dead-lock issue should be brushed aside so lightly. Concurrency primitives that come with hidden traps are rarely a good thing.

@MartinNowak
Member Author

Did you measure whether the variant using atomics or the TLS one is actually better for typical applications (typical, of course, being hard to define)?
I'd actually expect the latter to be faster in cases where there are a lot of concurrent accesses.

No. There is going to be only a single write (no contention), acquire/release is good enough (and cheap or even free), and TLS access is slower (emulated TLS).
For dmd there is a little extra overhead, because core.atomic doesn't use intrinsics.

@MartinNowak
Member Author

Furthermore, I don't think the dead-lock issue should be brushed aside so lightly. Concurrency primitives that come with hidden traps are rarely a good thing.

I added a note in the docs. But I could still only come up with very contrived examples where this would be a problem.

@mihails-strasuns

Usual ping to @complexmath

@dnadlinger
Contributor

Depending on the platform/TLS model, TLS accesses might be nearly free too, but I guess you are right: in general they aren't, and at least on x86 the read should be free (unless you get extremely unlucky with what else is on your cache line, of course, but that's beside the point here).

@MartinNowak MartinNowak force-pushed the master branch 2 times, most recently from 996d8bc to 2e80999 Compare November 12, 2014 00:05
@MartinNowak
Member Author

Are we still waiting for something?

@MartinNowak MartinNowak force-pushed the master branch 2 times, most recently from 07c94d0 to 2e80999 Compare November 14, 2014 00:28
@dnadlinger
Contributor

I'd like one or two more opinions before adding a new API artifact, just as a matter of principle.

{
import core.atomic;

static shared bool flag;
Contributor

Maybe a silly question, but shouldn't __gshared be used here instead of shared?

Member Author

It's used in atomic operations and therefore requires shared. And shared static variables are global, not thread-local, just like __gshared.

Contributor

@MartinNowak thanks


Also, I think the general rule of thumb is to always try going for shared instead of __gshared unless there are transitivity issues that prevent it (as with Mutex).

@quickfur
Member

I'd like to help review this, but my experience with concurrent programming is rather limited, so I don't know if I can give a meaningful review. But at least on the surface, the code looks OK to me.

@dnadlinger
Contributor

Hm, one additional, non-obvious pitfall is that multiple callOnce invocations with, say, the same function declaration as a parameter instead of a lambda will also only invoke the target once. I'm starting to think that we should simply use something like std::once_flag in C++.

This way, the deadlock issue would, by default, be drastically reduced in scope to areas that are already coupled in source code, as we'd just put the Mutex in that object. If the few extra bytes were critical for a given application, the user could always make the decision to re-use it on their own.

You could also make specifying the once_flag analog optional, defaulting to a separate one per invocation.
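For reference, this is how the C++ analogue reads; the flag object, not the callable, carries the once state (the names ensure_resource and init_resource are made up for illustration):

```cpp
#include <mutex>

// The once state lives in the flag object, not in the callable's identity.
std::once_flag resource_flag;
int init_count = 0;

void init_resource() { ++init_count; }

void ensure_resource() {
    // Exactly one caller runs init_resource; concurrent callers
    // block until it has completed.
    std::call_once(resource_flag, init_resource);
}
```

Reusing the same flag from several call sites couples them explicitly in source code, which is the scoping property argued for above.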

@DmitryOlshansky
Member

I'm starting to think that we should simply use something like std::once_flag in C++.

Agreed.

* $(D_PARAM callable) should not wait for another thread that might
* also invoke callOnce or a dead-lock can happen.
*/
void callOnce(alias callable)()
Contributor

Do we want to have a template constraint making sure that callable is callable?

@IgorStepanov
Contributor

@MartinNowak this implementation looks correct, but I think that a global mutex with static this (which may be instantiated more than once) is stodgy. Why don't you use a bool flag instead of the second mutex?

void callOnce(alias callable)()
{
    import core.atomic;
    import core.thread : Thread;

    static shared bool flag;  // set once callable has completed
    static shared bool flag2; // guards the call itself
    if (!atomicLoad!(MemoryOrder.acq)(flag))
    {
        if (cas(&flag2, false, true))
        {
            if (!atomicLoad!(MemoryOrder.acq)(flag))
            {
                callable();
                atomicStore!(MemoryOrder.rel)(flag, true);
                atomicStore!(MemoryOrder.rel)(flag2, false);
            }
            else
            {
                while (atomicLoad!(MemoryOrder.acq)(flag2))
                    Thread.yield();
            }
        }
    }
}
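For comparison, here is the same spin-waiting idea sketched with C++11 atomics (spin_once is a made-up name). Note that in the D snippet above, a thread that loses the cas race falls through without waiting; this sketch instead makes losers spin until the winner finishes, which is what callOnce's blocking guarantee requires:

```cpp
#include <atomic>
#include <thread>

std::atomic<bool> done{false};  // set once f has completed
std::atomic<bool> busy{false};  // claimed by the thread that runs f

template <typename F>
void spin_once(F f) {
    if (done.load(std::memory_order_acquire))
        return;  // already initialized, fast path
    if (!busy.exchange(true, std::memory_order_acq_rel)) {
        // We won the race and run the callable exactly once.
        f();
        done.store(true, std::memory_order_release);
    } else {
        // Lost the race: spin until the winner has finished.
        while (!done.load(std::memory_order_acquire))
            std::this_thread::yield();
    }
}
```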

@MartinNowak
Member Author

Hm, one additional, non-obvious pitfall is that multiple callOnce invocations with, say, the same function declaration as a parameter instead of a lambda will also only invoke the target once.

Yes, and that's perfectly fine. Because it means that multiple invocations of, say, a library initialization will only happen once, it's a sane default. You can easily work around it by wrapping your function call in a lambda.

I'm starting to think that we should simply use something like std::once_flag in C++.

No, I thought about this quite a while and I don't think it's worth the trouble.
Not only is the API much more complex (and requires way more documentation), but the dead-lock problem is even bigger with a separate once_flag. It's a single flag that you can associate with different functions, which allows for hard-to-debug lock ordering problems.
My conclusion was to go with a simple API, and add things like separate flags when there is an actual need.

This way, the deadlock issue would by default drastically be reduced in scope to areas that are already coupled in source code, as we'd just put the Mutex in that object. If the few extra bytes would be critical for a given application, the user could always make the decision to re-use it on their own.

A once_flag can only be used once, hence the name.

@dnadlinger
Contributor

Hm, one additional, non-obvious pitfall is that multiple callOnce invocations with, say, the same function declaration as a parameter instead of a lambda will also only invoke the target once.

Yes, and that's perfectly fine. Because it means that multiple invocations of, say, a library initialization will only happen once, it's a sane default. You can easily work around it by wrapping your function call in a lambda.

It's also perfectly unexpected if you think of it as "do this function call once". Your "easy" workaround will appear completely mysterious to many users, because it relies on the fact that two semantically identical lambdas are still distinct symbols, something that, if I remember correctly, isn't even documented right now.

Note that you can write your library initialization very naturally in the other model too, and this time without relying on non-obvious details:

void apiClient1() { ensureInitialized(); … } 
void apiClient2() { ensureInitialized(); … }
void ensureInitialized() { callOnce({ auto h = dlopen(…); … }); }

I'm starting to think that we should simply use something like std::once_flag in C++.

No, I thought about this quite a while and I don't think it's worth the trouble.
Not only is the API much more complex (and requires way more documentation)

"much more complex"? It adds a single type. I don't think it would necessarily entail more documentation (and certainly not "way"), if you, from a user's point of view, properly documented the details of your proposal.

the dead-lock problem is even bigger with a separate once_flag.

The problem that exists in your implementation certainly isn't; it is at least as big as with a separate once_flag, and in my opinion much worse (as you can't even see the sharing in the code). There might be a different one, but let's keep the discussion to the point.

It's a single flag that you can associate with different functions, which allows for hard-to-debug lock ordering problems.

That is indeed an issue to consider.

My conclusion was to go with a simple API, and add things like separate flags if there is an actual need.

I don't think your API is conceptually simpler than having a separate flag. The latter can easily be explained as the thread-global equivalent of static bool flag; if (!flag) { flag = true; … }.

This way, the deadlock issue would by default drastically be reduced in scope to areas that are already coupled in source code, as we'd just put the Mutex in that object. If the few extra bytes would be critical for a given application, the user could always make the decision to re-use it on their own.

A once_flag can only be used once, hence the name.

Err, yes, of course.

@MartinNowak
Member Author

To cite http://en.cppreference.com/w/cpp/thread/call_once

Exactly one execution of exactly one of the functions (passed as f to the invocations in the group) is performed. It is undefined which function will be selected for execution. The selected function runs in the same thread as the call_once invocation it was passed to.
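A minimal C++ sketch of the cited group semantics (the names are made up). When the calls happen sequentially, the first function in the group runs and the second is never invoked:

```cpp
#include <mutex>

std::once_flag group_flag;
int which = 0;

void first()  { which = 1; }
void second() { which = 2; }

// Both call_once invocations belong to the same group (same flag),
// so exactly one of first/second executes. Sequentially, first wins.
void demo() {
    std::call_once(group_flag, first);
    std::call_once(group_flag, second);
}
```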

The problem that exists in your implementation certainly isn't; it is at least as big as with a separate once_flag, and in my opinion much worse (as you can't even see the sharing in the code).

It's mentioned in the documentation, and because it's a recursive mutex, you'd need to interact with another thread that is already inside callOnce. I don't agree that this is a very practical problem, but we could add an overload of callOnce that takes a specific mutex argument to resolve it.

Note that you can write your library initialization very naturally in the other model too, and this time without relying on non-obvious details:

Sure, it's cleaner if the library provides a thread-safe initialization function, but if it doesn't, you're stuck. And if you have multiple libraries in your app that use the same underlying library, you'll end up initializing it twice.

I don't think your API is conceptually more simple than having a separate flag.

It saves the not-so-trivial boilerplate of declaring and initializing a separate flag.

class Foo
{
    // hypothetical OnceFlag-based API
    private static shared OnceFlag fooFlag;
    private static __gshared Foo _instance;

    shared static this()
    {
        fooFlag = OnceFlag();
    }

    static Foo instance()
    {
        callOnce!({ _instance = new Foo(); })(fooFlag);
        return _instance;
    }
}
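For comparison, a sketch of what this boilerplate looks like with C++'s std::once_flag, whose constexpr default constructor makes the shared static this() initialization step unnecessary:

```cpp
#include <mutex>

class Foo {
public:
    static Foo& instance() {
        // constexpr-constructible flag: no static constructor needed.
        static std::once_flag flag;
        static Foo* obj = nullptr;
        std::call_once(flag, [] { obj = new Foo(); });
        return *obj;
    }
};
```

Whether the declaration boilerplate or the explicit flag object is the lesser evil is exactly the trade-off being debated here.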

It's also perfectly unexpected if you think of it as "do this function call once".

I added an example for this, but I'm still missing a valid use case where you'd actually want this behavior.

@MartinNowak
Member Author

The problem that exists in your implementation certainly isn't; it is at least as big as with a separate once_flag

It's simpler because there is no lock order issue. The following code will always dead-lock, but again I can only come up with very contrived examples.

import std.concurrency, std.stdio;

callOnce!({
    auto tid = spawn({
        callOnce!({ writeln("foobar"); })();
        ownerTid.send(true);
    });
    receiveOnly!bool();
});

@IgorStepanov
Contributor

The following code will always dead-lock...

Does it happen because you use one global mutex for all callOnce instances? Why not use separate mutexes/flags for each callOnce instance?

@MartinNowak
Member Author

Does it happen because you use one global mutex for all callOnce instances?

It would be much harder to dead-lock with a mutex per instance.

Why don't use separate mutexes/flags for each callOnce instance?

Because it's a waste of resources; each mutex would be used at most once.
And I still don't think that it's a practical issue, but I'll add an overload that takes a mutex to resolve the discussion.

@JakobOvrum
Contributor

Will do it just fine in thread-local case.

Whether to use lazy or eager initialization (and whether to use compile-time or runtime initialization in the eager case) is orthogonal to whether it's TLS or global. TLS objects benefit from lazy initialization in exactly the same ways:

  • The object may not always be used, so the cost of initialization is only paid when necessary, or start-up time is reduced by delaying initialization to time-of-use.
  • Initialization may depend on runtime arguments only available at the place of use.
  • Runtime initialization is desired but a static constructor is out of the question, such as in circular dependency scenarios.

Sure, but it's also easy.

It's cluttered imperative boilerplate:

static bool isInitialized = false;
static T theSingleton;
if(!isInitialized)
{
    theSingleton = ...;
    isInitialized = true;
}
// use `theSingleton`

vs declarative:

static T theSingleton;
initOnce!theSingleton(...);
// use `theSingleton`

Of course the global singleton is much more involved and error-prone than the TLS one, but initOnce helps in the TLS case too.
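For reference, C++11 achieves the same declarative shape with a block-scope static, whose first-use initialization the compiler is required to make thread-safe (theConfig and Config are made-up example names):

```cpp
#include <string>

struct Config { std::string name; };

// C++11 "magic static": the initializer runs exactly once, even under
// concurrent first calls; the compiler emits the synchronization.
Config& theConfig() {
    static Config cfg{"default"};
    return cfg;
}
```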

edit:

To be clear, I'm not against putting it in std.concurrency, I think that's perfectly reasonable.

@DmitryOlshansky
Member

@JakobOvrum The point is that

static T theSingleton;
initOnce!theSingleton(...);
// use `theSingleton`

Most likely has to be:

@property auto theSingleton()
{
    static Singleton s;
    initOnce!(s)(...);
    return s;
}

So for singletons that's the same accessor boilerplate anyway and trading one for another really is not a deal breaker.

@DmitryOlshansky
Member

Only in C++. D doesn't have runtime static variable initializers; they have to be computable at compile time.

Right sorry for confusion.

@JakobOvrum
Contributor

So for singletons that's the same accessor boilerplate anyway and trading one for another really is not a deal breaker.

Firstly, it's clearly not the same amount of boilerplate. But more importantly, lazy initialization is not just for the singleton pattern.

@MartinNowak
Member Author

Firstly, it's clearly not the same amount of boilerplate. But more importantly, lazy initialization is not just for the singleton pattern.

You correctly identified the usability problem. Now, it happens quite often to me that I accidentally declare a thread-local variable instead of a global one. That would result in quite some debugging and confusion if initOnce could be used with TLS variables, e.g. (to stick with my use case) calling the thread-unsafe curl_global_init multiple times, possibly from multiple threads.
So the usability problem isn't really resolved; therefore we should disallow TLS variables altogether.

@MartinNowak
Member Author

Firstly, it's clearly not the same amount of boilerplate.

But it's almost the same amount, and we have a policy of not adding trivial stuff to Phobos.

static Instance var;
if (!var) var = new Instance();

@JakobOvrum
Contributor

You correctly identified the usability problem. Now, it happens quite often to me that I accidentally declare a thread-local variable instead of a global one. That would result in quite some debugging and confusion if initOnce could be used with TLS variables, e.g. (to stick with my use case) calling the thread-unsafe curl_global_init multiple times, possibly from multiple threads.
So the usability problem isn't really resolved; therefore we should disallow TLS variables altogether.

I think this is a good point for callOnce, but I'm not so sure about initOnce. The D type system prevents you from initializing a non-shared TLS variable with a shared object, so this will already fail to compile.

But almost and we have the policy to not add trivial stuff to phobos.

Most lazily initialized variables will be of reference type, so I guess that's a good point.

@MartinNowak
Member Author

Most lazily initialized variables will be of reference type, so I guess that's a good point.

Updated and all things addressed, at least I hope so ;).

@MartinNowak
Member Author

Thanks @JakobOvrum, this really came out much better than the initial callOnce.
The only thing I don't like that much is that the documentation for the mutex overload is equally prominent.

@MartinNowak
Member Author

The only thing I don't like that much is that the documentation for the mutex overload is equally prominent.

Partly fixed now; I attached each unittest example to its own overload, so the one taking a mutex is separated a little better.
https://dlang.dawg.eu/phobos-prerelease/std_concurrency.html#.initOnce
Ready to merge.

auto ref initOnce(alias var)(lazy typeof(var) init, Mutex mutex)
{
    // check that var is global; can't take the address of a TLS variable
    static assert(is(typeof({ __gshared p = &var; })),
        "var must be 'static shared' or '__gshared'.");
Contributor

Why are TLS variables disabled when the type system prevents you from doing it wrong anyway?

@MartinNowak
Member Author

I think this is a good point for callOnce, but I'm not so sure about initOnce.

Why are you starting that discussion again? This is a primitive for race-free lazy initialization. There are no races when assigning TLS variables.

The D type system prevents you from initializing a non-shared TLS variable with a shared object, so this will already fail to compile.

Most of the time it's not even possible to use shared, a prominent example being Mutex.
Not sure why you're trying so hard to squeeze lazy TLS initialization into a concurrency primitive.
But this example here is a real usability problem.

static Mutex mutex; // should be __gshared
void foo()
{
    synchronized (initOnce!mutex(new Mutex)) {
        // Oh, not mutually exclusive at all. Who's going to debug this?
    }
}
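The analogous trap exists in C++ with thread_local. A small sketch (made-up names) showing that a mistakenly thread-local mutex provides no cross-thread exclusion, since every thread locks its own private copy:

```cpp
#include <mutex>
#include <thread>

// Mistakenly thread-local: each thread gets its OWN mutex object.
thread_local std::mutex tls_mutex;

// Returns true if another thread can take "the" lock even while the
// calling thread holds its own copy, i.e. there is no mutual exclusion.
bool other_thread_can_lock() {
    bool locked = false;
    std::thread t([&] {
        locked = tls_mutex.try_lock();  // locks a different object!
        if (locked)
            tls_mutex.unlock();
    });
    t.join();
    return locked;
}
```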

@MartinNowak
Member Author

Anything left to do?

@MartinNowak
Member Author

ping

* simultaneously calling initOnce with the same $(D_PARAM var) argument block
* until $(D_PARAM var) is fully initialized. All side-effects of $(D_PARAM init)
* are globally visible afterwards.
*
Member

Missing Params: block

Member Author

Should I seriously write var - the variable to initialize and init - the initializer even though the first sentence perfectly describes what this function does? This feels a lot like auto sum = a + b; // add a and b comments.

Member

Yes. The main description can be more of an overview, and be more of an explanation of what the point of this function is. For example:

http://www.cplusplus.com/reference/cmath/sin/

That's normal and expected.

Member Author

Done, but I still disagree.
We don't improve our documentation by adding noise.
What's that supposed to tell me?
http://www.cplusplus.com/reference/cmath/pow/#parameters


/**
* Initializes $(D_PARAM var) with the lazy $(D_PARAM init) value in a
* thread-safe manner.
Contributor

Could you please add a blank line to make the first part a summary.

Member Author

done

- uses double-checked locking (the correct one)

- minimal overhead and simple API

- default mutex is shared among all instances
@MartinNowak
Member Author

Anything left to do? This has been on hold for yet another week and almost two months in total.

@MartinNowak
Member Author

Ping again

@DmitryOlshansky
Member

LGTM

@MartinNowak
Member Author

Someone merge this please?

@MartinNowak
Member Author

Auto-merge toggled on

MartinNowak added a commit that referenced this pull request Jan 20, 2015
@MartinNowak MartinNowak merged commit dcee2cd into dlang:master Jan 20, 2015