monarchdodra
Collaborator

Because it's been broken for months, with no improvements in sight. Having a correct emplace is (IMO) critical. I'm opening this again (3rd time)

I wrote this fix, which is

  • small and relatively simple (as simple as emplace gets)
  • stand alone (no dependency on other pulls)
  • efficient
  • complete (AFAIK)

Fixes:

  • correct emplace(void) implementation for static arrays.
  • unittest emplace(args) for static arrays.
  • emplace never calls opAssign.
  • emplace chooses postblit over constructor (when both are possible).
  • Correctly postblits and/or constructs nested members...
  • ...but refuses to call disabled postblits.
  • Correctly handles emplacement from alias this.
  • Correctly refuses to modify const data
  • Supports bug 8847.
  • Correctly refuses to postblit from an immutable, when implicit cast is not possible.

...phew ! And unittests for all of that, of course. Did I miss anything?

Also, can work in safe and nothrow code! And sometimes CTFE to boot (yay).

Has a deprecated branch for calling opCall: arguably, opCall is not construction, so it should not be supported. However, the old implementation allowed opCall, so I added a deprecated branch.

Finally: ALWAYS diagnoses illegal args at top level with a verbose explanation for special cases (EG, no internal errors).

In regards to the context pointer, it is copied when available. Otherwise, emplace will initialize it to null. This is kind of ugly, but it is consistent with how voldemort types are handled (such as when they are constructed inside any template).


PLEASE PLEASE PLEASE take the time to review this.

This isn't some small bug fix, or some petty performance improvement. This needs to make it into phobos.

There may be ways to optimize certain branches (thinking static array default emplacement), but correctness trumps efficiency at this point...

If you have any doubts, please voice them, as much as you can. I can justify the behavior (or lack thereof) of everything in this pull.

@alexrp
Contributor

alexrp commented Jan 21, 2013

Waiting for the auto tester, then I'll review.

else static if (is(typeof(T(args))))
{
// Struct without constructor that has one matching field for
// each argument
*chunk = T(args);
// each argument. Individually emplace each attribute
Contributor

nit: field

@quickfur
Member

Did dmd git HEAD break again? Why is the autotester failing in dmd test cases when this is only a Phobos pull request?

@jmdavis
Member

jmdavis commented Jan 25, 2013

Did dmd git HEAD break again? Why is the autotester failing in dmd test cases when this is only a Phobos pull request?

Because the dmd test suite uses Phobos.

@MartinNowak
Member

The reason I opened my pull request was that I use std.conv.emplace as reference implementation for initializing raw memory. Finding obvious bugs in this function is bad, so I totally agree with you that this is important.
It bothers me that the implementation of such a fundamental idiom has become so complex.
Maybe we should make this a druntime function object.initialize as complement to destroy?
We should get @andralex and @WalterBright on board for correct semantics.

memcpy(chunk, p, T.sizeof);
else
memset(chunk, 0, T.sizeof);
}
Member

IIUC the purpose of bitblit+optional postblit is performance. Here you prefer the compiler generated assign-bitblit over memcpy.
Can the compiler actually do better than calling memcpy?

Collaborator Author

AFAIK, the difference is that memcpy is a 100% run-time call: memcpy does not know the size of the thing to be copied, nor the source/destination addresses. This makes it inherently slower. It has to do a lot of run-time checking, and must implement a copy loop.

By comparison, a (non-elaborate) opAssign is a straight-up assembly memcopy: the compiler knows all (or most) of the parameters at compile time: the source and the size. This means the call is just replaced by an assembly copy.

Also: memcpy is neither safe nor CTFE-able.

Member

OK, I had the same intuition.
Even though you have separate functions for the different semantics, it wasn't immediately obvious to me that these two static if branches are supposed to do the same thing. Maybe a comment, or not separating the functions, would be more expressive.

Member

memcpy is handled as an intrinsic in many C++ compilers, so it does exploit size and alignment when known statically. Not sure whether dmd does anything in particular with it. cc @WalterBright

Collaborator Author

In any case, regardless of memcpy's performance, it is neither safe nor CTFE. It may also have to pay for a pre-typeid run-time call (That's run-time... right?)

Member

Currently memcpy is not handled as an intrinsic by dmd. Also, it does not seem to take any advantage of D array copies with known size.

@monarchdodra
Collaborator Author

It bothers me that the implementation of such a fundamental idiom has become so complex.

One of the reasons it is so complex is that emplace (for structs) merges both the notions of emplace from another instance (postblit) and actual construction from args. This means it has to try to analyze which one you are trying to call, which can be difficult when you take into account that D does have constructors that take the same type:

import std.stdio : writeln;

struct S
{
    this(this){writeln("postblit");}
    this(S)   {writeln("CC");}
    this(int) {writeln("this(int)");}
}

void main()
{
    S a;        //Nothing
    S b = a;    //postblit
    S c = S(a); //CC
    S d = S(5); //Constructor
}

In particular, there are some strange corner cases where, if the source and destination types match but postblit is disabled, emplace has to try and guess whether you want to fall back to CC'ing. It makes a mess of things.

Things would have been much simpler if we had two explicit functions: the first would be the normal emplace (basically emplace with 1 arg of matching type) that just postblits. Then, we'd have one that calls an actual constructor (e.g. emplace from arguments).

The caller should always know which of the two he wants anyways. It would have made things simpler and safer. EG:

S a = void;
S b;
emplace(&a, b); //Explicit request for postblit initialization
emplaceArgs(&a, 1); //Explicit request to *construct* a from the *arguments* "1".

Oh well, that's how it is now... Still, I think it is worth thinking about such a change.

@andralex
Member

I'm not particularly worried, though indeed simpler would be nicer. This is highly generic and highly leveraged code, the kind that is usually inside the compiler. We can hoist it into the language proper because of introspection, and from that perspective it looks as expected.

@MartinNowak
Member

emplace (for structs) merges both the notions of emplace from another instance (postblit) and actual construction from args

Thanks for the detailed explanation.
So basically emplace supports all initialization schemes for structs and should have exactly the same semantics (nice test case). Therefore it also supported static opCall.
This really makes sense and is actually very simple.

@monarchdodra
Collaborator Author

So basically emplace supports all initialization schemes for structs and should have exactly the same semantics.

That's the goal yes. The idea is:

  • Do the same as S a = "args";
  • IF that doesn't work, do the same as S a = S(args);
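
The two forms listed above can be illustrated roughly as follows. This is only a minimal sketch, assuming the `std.conv.emplace` overload that takes a pointer followed by arguments; the struct `S` is made up for the example and is not taken from the pull.

```d
import std.conv : emplace;

// Hypothetical struct, used only for illustration.
struct S
{
    int x;
    this(int v) { x = v; }
}

void main()
{
    S a = void;    // uninitialized storage, must not be read yet
    S b = S(7);

    // The argument already has the target's type: same effect as `S a = b;`.
    emplace(&a, b);
    assert(a.x == 7);

    // The argument is an int: same effect as `S a = S(5);`, the constructor runs.
    emplace(&a, 5);
    assert(a.x == 5);
}
```

Since `S` has no destructor here, emplacing twice over the same storage is harmless; with a type that owns resources, the old value would need to be destroyed first.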

Therefore it also supported static opCall.

Well... technically, static opCall is not a construction scheme, so emplace is not supposed to call it. The reason for this is that opCall is just a function like any other that returns a value, and that value can then be assigned/postblitted onto your current instance.

If you want to emplace from an opCall, then you should just call emplace(&a, S(args));

That's my stance anyway. It's up for debate.

@DmitryOlshansky
Member

Given preparations for beta, it would be great if we could squeeze this into the next release. Thoughts?

@monarchdodra
Collaborator Author

Rebased.

Added a unittest for bug 9559, which is new but already fixed by this pull:

http://d.puremagic.com/issues/show_bug.cgi?id=9559

This was referenced Mar 19, 2013
unittest
{
////Works, but breaks in "-w -O" because of @@@9332@@@.
////Uncomment test when 9332 is fixed.
Contributor

This should be the case now?

Collaborator Author

Yes, TY. Uncommenting.

@monarchdodra
Collaborator Author

Fixes:

Issue 9824 - Emplace is broken

http://d.puremagic.com/issues/show_bug.cgi?id=9824

@IgorStepanov
Contributor

What is the status of this pull?
Does it allow emplacing a variable at compile time?

    static struct Json {
       int a;
        void opAssign(Json) {}
        size_t length() const { return aa.length; }
    }
    struct Pair(A,B)
    {
       A first;
       B second;
       this(A f, B s)
       {
          emplace(&first, f);
          emplace(&second, s);
       }
    }
    static p = Pair!(int, const(Json))(4, Json(99)); //CTFE; simple assignment is not allowed, because Json.opAssign is not const.

@monarchdodra
Collaborator Author

@IgorStepanov

Thank you for your code participation. Allow me to answer:

What is the status of this pull?

Waiting on a good Samaritan to review it. I am also crawling the boards looking for usecases for emplace to test it.

Does it allow emplacing a variable at compile time?

Very partially, and experimentally. Partially, because it requires the types to have a non-elaborate assign. Experimentally, because it has a tendency to break the compiler (pointer stuff).

static p = Pair!(int, const(Json))(4, Json(99)); //CTFE; simple assignment is not allowed, because Json.opAssign is not const.

There is a dual problem in that code.

The first problem is the const: if the input type T (in this case, a Json) is const, then emplace simply can't do the emplace. I suggest changing the code to:

    struct Pair(A,B)
    {
       A first;
       B second;
       this(A f, B s)
       {
          emplace(cast(Unqual!A*)&first, f);
          emplace(cast(Unqual!B*)&second, s);
       }
    }

This works, although there may be some "unexpected consequences" to the cast I have not thought of? AFAIK, it should be safe.

The second problem is that this can't be done at compile time, because Json has an elaborate opAssign, ergo Pair has an elaborate opAssign. If you comment it out, it should work, but you'll actually just get an internal compiler error. As I said: Experimental.

@monarchdodra
Collaborator Author

I rebased, did some more tweaking/simplifying/documenting. Added some more early diagnostics for qualified objects.

I also hit a compiler bug with CTFE:
http://d.puremagic.com/issues/show_bug.cgi?id=9982

@IgorStepanov
Contributor

@donc please see this bug: http://d.puremagic.com/issues/show_bug.cgi?id=9982
The error occurs when you get the address of a struct member and dereference it. Is it hard to fix?

@monarchdodra
Collaborator Author

Apart from the compiler bug (which is not blocking, and only sometimes triggers with CTFE), AFAIK, everything works.

Could I get a second review on this? I reworked the code a little, and now the "flow" inside emplace is (IMO) simple, clear and straightforward. There are (I'd say) enough unittests to validate correct behavior.

Could we try to get this through? It's important.

@IgorStepanov
Contributor

Does emplace strongly depend on phobos? Maybe the correct place for this function is druntime?
Will it require a lot of effort?

@monarchdodra
Collaborator Author

Does emplace strongly depend on phobos? Maybe the correct place for this function is druntime?

Not much, it only uses a few minor traits: hasElaborateAssign, isAssignable, and isStaticArray. The only problem is that it does need access to the passed parameter types, and, AFAIK, there are no templates inside druntime. Furthermore, emplace currently uses things like typeid.postblit: This makes a runtime call that is actually not necessary with correct compile time introspection.

I think it would be better to leave it in phobos for now.

If you ask me though, the best place to put such functionality would be straight into the compiler, via placement new. The compiler knows best; emplace merely reverse engineers what the compiler does for its construction sequence...

@MartinNowak
Member

I too think it belongs in druntime because it's kind of the complement to destroy and manual memory management is an intrinsic language property.

Not much, it only uses a few minor traits: hasElaborateAssign, isAssignable, and isStaticArray. The only problem is that it does need access to the passed parameter types, and, AFAIK, there are no templates inside druntime.

The template is not the problem, not having access to std.traits is.

@monarchdodra
Collaborator Author

New reg:

http://d.puremagic.com/issues/show_bug.cgi?id=10690

I think we should try to review this. Even if the plan is to move it to druntime, or change it to typeid(construct), there is currently a lot of code that relies on emplace, and it should be fixed.

@pinver

pinver commented Aug 25, 2013

I would like to push for a fix of emplace: it's like disseminating mines in the phobos fields, and it's a pain for learners like me to hit some of them (see http://forum.dlang.org/thread/nxbdgtdlmwscocbiypjs@forum.dlang.org)

@monarchdodra
Collaborator Author

I would like to push for a fix of emplace: it's like disseminating mines in the phobos fields, and it's a pain for learners like me to hit some of them (see http://forum.dlang.org/thread/nxbdgtdlmwscocbiypjs@forum.dlang.org)

Thank you for your comment. I just checked, and your code does work perfectly fine with this fix. I added your code to the test suite.

I think I've had just about enough of this broken emplace. Bugs are reported for it all the time, but it doesn't get fixed.

Returns: A pointer to the newly constructed object (which is the same
as $(D chunk)).
*/
T* emplace(T)(T* chunk)
if (!is(T == class))
T* emplace(T)(T* chunk) @safe nothrow pure
Member

A function that dereferences a pointer isn't @safe.

Contributor

No, it is safe as long as it dereferences it at offset 0. It's the arithmetic that is unsafe.

Collaborator Author

I'm unsure anyways in these cases, as emplacing over something already constructed can bypass the destructor, leading to a state that may corrupt memory integrity, and/or leak.

That said, extracting a pointer is usually an unsafe operation to begin with...

Member

You're right @klickverbot, it's pointer arithmetic and reinterpreting memory which are unsafe.
Using a reference would be a cleaner solution though.

Collaborator Author

Well, while emplace might be "memory safe", it is still a very dangerous function to use. By making emplace take a pointer as an argument, it means emplace cannot be used in a pure @safe scope, unless some @trusted function first provided the pointer. I think this is a good thing.

@MartinNowak
Member

I will review it a second time when I find some time.
Meanwhile I wonder whether the ()@trusted{}() trick made it into the official idioms list.
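
For readers unfamiliar with it, the `()@trusted{}()` trick is an immediately invoked lambda marked `@trusted`, so that only one expression inside a `@safe` function escapes the compiler's safety checks. A minimal sketch, assuming a made-up helper name `rawCopy` (not part of this pull):

```d
import core.stdc.string : memcpy;

// Hypothetical helper: overwrites `*dst` with the raw bytes of `src`.
void rawCopy(T)(T* dst, ref const T src) @safe nothrow
{
    // memcpy is @system, so calling it directly would not compile in @safe code.
    // The immediately invoked @trusted lambda limits the unchecked part to this
    // one call, leaving the rest of the function compiler-checked.
    () @trusted { memcpy(dst, &src, T.sizeof); }();
}

void main()
{
    int a = 1;
    int b = 2;
    rawCopy(&a, b);
    assert(a == 2);
}
```

The caller is still responsible for handing in a valid pointer; the lambda only narrows what has to be reviewed by hand.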

dnadlinger added a commit that referenced this pull request Aug 28, 2013
@dnadlinger dnadlinger merged commit 40c6760 into dlang:master Aug 28, 2013
@dnadlinger
Contributor

Okay, I went out on a limb and merged this.

While it is potentially a high-impact change (since emplace is so widely used), I just reviewed it for a second (third? fourth?) time and couldn't find any serious issues.

There is some cleanup/further fixes left to do (see e.g. the comments – @monarchdodra, are there issues for those?), but this fixes a host of difficult to track down bugs, and we absolutely need to ship a fix for those soon.

@monarchdodra
Collaborator Author

Thank you @klickverbot. A bold move, but I think it was the right move. As you saw, this fixes a couple of bugs. I'm now doing some follow up, and fixing things that depended on emplace being correct.

I just opened 2 new pulls:

  1. Fixup unittest following emplace fix #1528
  2. Fix appender form elaborate assign types #1529

The first is mostly trivial, and merely writes unittests.
The second is more complicated, but it fixes the remaining issues in the bug tracker that weren't immediately fixed.

else
{
static immutable T i;
()@trusted{memcpy(chunk, &i, T.sizeof);}();
Member

We discussed compiler optimization, so how about (cast(ubyte*)chunk)[0..T.sizeof] = (cast(ubyte*)&i)[0..T.sizeof]? It will be rewritten as memcpy but the compiler might directly copy small arrays.

Collaborator Author

Or better yet ?

enum N = T.sizeof;
(*cast(ubyte[N]*)chunk) = (*cast(ubyte[N]*)&i);

This statically calls static array copy. I'm no assembler expert, but I'd be curious to compare the generated asm.

Member

Yeah, I also thought of this after posting. It looks slightly better (and saves a few keystrokes to type), but in both cases the compiler has the same knowledge: it's copying an array with constant bounds.
Also note that dmd doesn't use this, it will simply call memcpy.
It should be fairly simple to add an optimization because the backend already has an IR elem OPmemcpy and also directly uses rep movsq for struct blitting.

Collaborator Author

So what is your recommendation? Leave it as memcpy, or, cast to ubyte[N]?

If you think we should be casting, then it might be worth writing a helper void binaryBlit(T)(T* chunk, ref S s) function. It might be worth writing it either way, as the emplace implementations use this a lot, it would consolidate it to whatever we choose to do.
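
Such a helper might look roughly like the sketch below. It is an assumption-laden illustration, not code from this pull: the signature is adapted so that the source has the same (mutable) type as the target, and it uses the `ubyte[N]` cast discussed above.

```d
// Hypothetical helper, sketched from the signature suggested above; it assumes
// the source has the same (mutable) type as the target.
void binaryBlit(T)(T* chunk, ref const T source) @trusted nothrow
{
    enum N = T.sizeof;
    // Copy through fixed-size byte arrays, so the size is a compile-time
    // constant and no opAssign or postblit of T runs.
    *cast(ubyte[N]*) chunk = *cast(const(ubyte[N])*) &source;
}

// Made-up plain struct for the demonstration.
struct P { int x; int y; }

void main()
{
    P dst = void;
    P src = P(3, 4);
    binaryBlit(&dst, src);
    assert(dst.x == 3 && dst.y == 4);
}
```

Whether this compiles to anything better than a memcpy call depends on the compiler, as noted in the comments above.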

@denis-sh
Contributor

Sorry, but I have to write it again. It has been obvious to me for ages that emplace is broken by design. And I can't even imagine any arguments against my opinion. If someone still didn't think about it, just try to formulate what exactly does emplace do? This is magic stuff just like destroy.

@denis-sh
Contributor

Also I'm sure this opinion is already proven by (tons of?) fundamental errors in e.g. Phobos range algorithms because of emplace's design. So imagine what I feel when I show someone a broken-by-design function, the potential errors with it, the (lots of?) real errors he made because of it, and get as a response: "dude, you are incorrect. I'm correct because I'm correct".

So I sincerely ask those who care about D to think about it.

@monarchdodra
Collaborator Author

Could you elaborate how it is broken "by design"? There are, to my knowledge, no more broken cases with emplace (bar using static arrays, which I am currently fixing).

If you do know of a broken use case, please share it.

@denis-sh
Contributor

Could you elaborate how it is broken "by design" ?

as I wrote:

just try to formulate what exactly does emplace do?

@monarchdodra
Collaborator Author

just try to formulate what exactly does emplace do?

Builds a T at memory address chunk from the arguments arg?

@denis-sh
Contributor

How do you define "builds"?
I'm asking because, when I was fixing emplace, this was the question I was trying to answer for hours (no joke), reading examples and source code. IMO, this is also the same problem as with the definition of what destroy does.

@JakobOvrum
Contributor

emplace should construct a T at the given address. Surely that's a sufficient definition? I think "construct" can be defined trivially and uncontroversially for most types. One exception I can think of are associative array types, those might have some room for interpretation.

As for destroy, I think the documentation is plentiful. It's not being very specific because it doesn't need to; the point is that trying to reuse the destroyed value is a logic error. With the abstract definition, its implementation is open to change because it's an error if user code relied on implementation details.
