Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Fix move #923

Closed
wants to merge 4 commits into from
Closed

[RFC] Fix move #923

wants to merge 4 commits into from

Conversation

denis-sh
Copy link
Contributor

@denis-sh denis-sh commented Nov 3, 2012

It's better to analyze by commit and see commits comments for description.

And yes, I also dislike moving const/immutabe objects. But have no idea how to avoid it and not make move unusable.

[EDITED] 3 times
My move:

void move(T)(ref T source, ref T target)
in { assert(&source == &target || !pointsTo(source, source)); }
body
{
    // Performance optimization:
    // Do not compare addresses if we don't have side effects,
    // will not need addresses in `memcpy`, and T fits in register.
    static if (hasElaborateCopyConstructor!T || hasElaborateAssign!T ||
               hasElaborateDestructor!T || !isAssignable!T ||
               T.sizeof > size_t.sizeof)
        if (&source == &target) return;

    static if (hasElaborateDestructor!T)
        destruct(target, false);

    // If there is no elaborate assign and no elaborate copy constructor,
    // `target = source` has no side effects.
    static if (hasElaborateAssign!T || hasElaborateCopyConstructor!T || !isAssignable!T)
        memcpy(cast(void*) &target, cast(const void*) &source, T.sizeof);
    else
        target = source;

    static if (hasElaborateCopyConstructor!T || hasElaborateDestructor!T)
        setInitialState(source);
}

T move(T)(ref T source)
{
    // Can avoid to check aliasing here.

    static if (hasElaborateCopyConstructor!T || hasElaborateDestructor!T)
    {
        T result = void;
        memcpy(cast(void*) &result, cast(const void*) &source, T.sizeof);
        setInitialState(source); // from pull 928
        return result;
    }
    else
    {
        return source;
    }
}

I also dislike that move doesn't always set source to it's init. As it is documented now, it's not such a big problem as it was but it still looks inconsistent.

So can I just tell move to write init to source obligatory?

And don't ask me to fill all fixed move bug in bugzilla as it will take a full day at least and I will probably still miss something.

Requires setInitialState from pull #928 and destruct from #929.

* also improve precondition skipping comment
@denis-sh denis-sh mentioned this pull request Nov 3, 2012
@DmitryOlshansky
Copy link
Member

While trying to make move work under CTFE I realized move can (and in fact already does in Phobos unittest) overwrite immutable/const fileds of structs without so much as a notice. WTF.
I'm still curious how to add this bit-blaster attitude to CTFE version as it disallows reinterpret casts and memcpy...

Other then this - one more thing. Please move aliasing check &soource == &target to apply only for structs, it makes little sense to do this check for primitives. (I'd argue that some small structs could be fine wth plain assign and no aliasing check)
Judging by asm output DMD's optimizer keep this check in for e.g. integers ... just madness.Needless yo say it leads to some performance problems I've found in
#787

@monarchdodra
Copy link
Collaborator

Nitpick:

  1. I'm not entirelly sure we need such a huge example of the dangers of move. A simple warning should do, imo.
  2. avoid using pronouns in documentation : ˋs/you/the caller/ ˋ

@denis-sh
Copy link
Contributor Author

denis-sh commented Nov 3, 2012

@monarchdodra's documentation change and @blackwhale's optimization are incorporated.

@denis-sh
Copy link
Contributor Author

denis-sh commented Nov 3, 2012

to @monarchdodra:

I'm not entirelly sure we need such a huge example of the dangers of move. A simple warning should do, imo.

Maybe. Will wait others to show their opinions here.

@monarchdodra
Copy link
Collaborator

Your new implementation of ˋT move(T)(ref T source)ˋ is awesome. It puts my consolidation change to shame...

@monarchdodra
Copy link
Collaborator

Hum... your approach to the context pointer issue is indeed a 180° approach from mine. I'm not entirelly sure this is what the standard had in mind about "an init value must be destroyable".

In particular, I can't imagine any sane implementation of a destructor that starts with ˋif (this == init)ˋ

I think this is especially relevent in the sense that the context pointer's value isn't really part of the object's state.

Will wait for others to show their opinions.


I think my context pointer fix is a valid one, so I'll keep my pull open. I may cherry pick some stuff from here though... hope you don't mind.

@denis-sh
Copy link
Contributor Author

denis-sh commented Nov 4, 2012

Fixed my @blackwhale's optimization implementation.

@DmitryOlshansky
Copy link
Member

I like the way optimization looks, very simple and generic. But I'm still a bit uneasy about how compiler copies small structs (IRC it should just do a couple of moves but maybe..), does it work for odd-length structs e.g. 3-byte ones?

Of course it would be extreamly useful if it was ever specified how compiler copies structs depending on size, something like:
a) < X register widths - unrolled sequence of moves
b) > X but < Y register widths - rep movsq/movsw/movsb (copy loop)
c) everything else goes to memcpy (function call)

Looking at this kind of list I'd say that a should skip aliasing check but everything from b and c can afford it do it.
Problem is that X & Y are CPU specific most likely...

By far the most interesting case is a struct that contains an array slice (2 words) and it would be great to make it be just 2 moves. Not sure if we can count on it being legal.

Maybe we can get some help from compiler gurus..
@donc can you shed some light on this, please?

@donc
Copy link
Collaborator

donc commented Nov 5, 2012

The relevant code is cdstreq() in backend/cod2.c. ('streq' = 'struct equals' = struct assignment). It seems to basically do a rep movs as long as it is more than two registers long. I don't think it ever calls memcpy. Which makes sense since DMC's memcpy is not very fast, it just does rep movsd. If it is less than two registers I have a nasty feeling it just does a single movsd or movsw. Hope I'm wrong, that would be pretty awful.

@DmitryOlshansky
Copy link
Member

Thanks for the input. It is as I feared it's pretty much tied to backend. Peeked at LDC backend, appears to be transformed into memcpy or analoguos intrinsic.

One last thing I'd love to know is if there is any danger at all of doing memcpy(&this, &this, this.sizeof); - self copying (aside from potentialy wasting cycles) w.r.t. to potential SIMD optimizations and what not. Keeping in mind that memcpy is typically an instrinsic.

Looking at the code I see that this is a primary branch for small structs:

 if (numbytes <= REGSIZE * (6 + (REGSIZE == 4)))
  {     while (numbytes >= REGSIZE)
        {
            c3 = gen1(c3,0xA5);         /* MOVSW                        */
            code_orrex(c3, rex);
            numbytes -= REGSIZE;
        }
        while (numbytes--)
            c3 = gen1(c3,0xA4);         /* MOVSB                        */
  }

From what I understand it should do an unrolled sequence of:
x times movsw
y times movsb
Except actually it does movsd not movsw. The code appears to be quite old.

And that might be okay but... if I'm not mistaken it does:
a) setups esi/edi
b) kills out order execution by using interdependent movs(b/w/d/q)

A lot of work to transport 1-2 words.

Hope I'm wrong, that would be pretty awful.

I'm afraid it is the case :(
For this 6 byte struct it does 1 movsd + 2 movsb:

align(1)
struct C{
    align(1) short abc;
    align(1) short efg;
    align(1) short hij;
}
static assert(C.sizeof == 6);

void main(){
    C c = C(1, 2, 3);
    C x;
    x = c;
    assert(x.abc == 1);
    assert(x.efg == 2);
    assert(x.hij == 3);
}

Generated asm:

enter   10h, 0
push    ebx
push    esi
push    edi
mov     ax, 1
mov     [ebp+var_10], ax
mov     cx, 2
mov     [ebp+var_E], cx
mov     dx, 3
mov     [ebp+var_C], dx
lea     ebx, [ebp+var_8]
xor     eax, eax
mov     [ebx], eax
mov     [ebx+4], ax
lea     esi, [ebp+var_10]
lea     edi, [ebp+var_8]
movsd
movsb
movsb
... //series of cmp/call assert

@denis-sh
Copy link
Contributor Author

denis-sh commented Nov 7, 2012

To @blackwhale :

One last thing I'd love to know is if there is any danger at all of doing memcpy(&this, &this, this.sizeof); - self copying (aside from potentialy wasting cycles) w.r.t. to potential SIMD optimizations and what not. Keeping in mind that memcpy is typically an instrinsic.

Why are you taking memcpy which behavior is undefined in the case of overlapping source and destination and trying to analyze if a concrete realization will work for this exact overlapping case instead of using memmove which allow overlapping?

@DmitryOlshansky
Copy link
Member

To @denis-sh I'm not even sure now;) The main argument was being able to make it as fast as possible. I believe you see that checking aliasing for small things (which memmove most definetly does) is not good enough.

My line of reasoning was going like this:

  1. Simple built-in structs are copied bitwise on assignment, there is no mention how this magic works but obviously self assignment must work. (peeking at compilers reveals all kind of _builtin_memcpy though).
  2. Now we need to do the same to 'move' small things with elaborate destructor/postblit that we'd rather avoid.
  3. And all of a sudden we can't get this bitwise assignment magic that we want to imitate because it was never specified in detail.
  4. And looking at C's spec we can't use memcpy directly without aliasing check... even exact overlap

So for now the case of smallish PODs is optimized and it's fine. Elaborate small structs are not for the reasons above.

My lasy problem is with making move CTFE-able. With CTFE-ability requirement I don't see how to implement the bitwise move in the library. If we disallow moving structs with immutable fields it can be implemented as copy (given it's during compile-time I couldn't care less). I'd love to disallow it but we need a consensus esp as it breaks some obscure unittest in std.container.

Anyway I think I got carried away way a bit. The focus should be on the pull request itself.
And I'd say it's good to go as is.

@denis-sh
Copy link
Contributor Author

denis-sh commented Nov 8, 2012

To @blackwhale:

I believe you see that checking aliasing for small things (which memmove most definetly does) is not good enough.

I don't know much about C's memmove implementations but I don't see why it should check aliasing as it can just compare pointers and choose a direction.

IMHO, it's very bad and is a source of bugs currently and will lead to more bugs in the future that we use C library (especially buggy Digital Mars one) without any reason.

A already proposed to have our own implementation (my variant is called copyOverlapped, it is rather fast but not optimized for small structs), but, as always, nobody needed:
http://forum.dlang.org/thread/jiqkgp$29os$1@digitalmars.com

@DmitryOlshansky
Copy link
Member

I don't know much about C's memmove implementations but I don't see why it should check aliasing as it can just compare pointers and choose a direction.

That's it. Comparing pointers if we copy say 4 bytes. Yeah, call me crazy but as we see generic code ends up doing this stuff and even worse. In fact, the call itself is too much but we can only hope on inliner.

IMHO, it's very bad and is a source of bugs currently and will lead to more bugs in the future that we use C library (especially buggy Digital Mars one) without any reason.

I tried to rise the topic of over relience on C's mem* well probably folks are alright with it but not going to step in:
http://www.digitalmars.com/d/archives/digitalmars/D/rawCopy_rawTransfer_rawFind_179379.html

The signature it uses allows it to optimize. The intended use is to cover both arrays of and just single structs.
And knowing the exact type it can optimize.

Also truth be told glibc has real fast memcpy with SIMD under the hood. On windows we have no such luck...

A already proposed to have our own implementation (my variant is called copyOverlapped, it is rather fast but not optimized for small structs), but, as always, nobody needed:
http://forum.dlang.org/thread/jiqkgp$29os$1@digitalmars.com

I recall that quite some time ago.
It's fine but to be truly awesome it needs to have versions involving SIMD with at least SSE2. I think array ops in D actually started using them. Need to disassemble again or dig up druntime to check.

Another important moment about it is that not knowning the type it can't optimize for small structs. In other words it's a great as fallback function to use when all special cases are handled. In my propsal of rawTransfer if there is no better way it should forward to something like this.

@DmitryOlshansky
Copy link
Member

I think I've tackled the CTFE problem, take a look : #936
It just copies for now but there is little else I can do at CTFE.

@denis-sh denis-sh mentioned this pull request Nov 12, 2012
@DmitryOlshansky
Copy link
Member

Somehow it started failing the autotester.

@denis-sh
Copy link
Contributor Author

To @blackwhale:

Somehow it started failing the autotester.

I added dependencies on my other pulls. See Requires at the end of pull description.

@denis-sh
Copy link
Contributor Author

By the way, my move(source, target) had incorrect condition for memcpy call. Fixed and added comment.

@DmitryOlshansky
Copy link
Member

There is a way to list them with some keyword so that auto-tester pulls them too. Can't remember the exact magic though...

@ghost ghost mentioned this pull request Nov 16, 2012
@andralex
Copy link
Member

LGTM once it passes the tester

@denis-sh
Copy link
Contributor Author

It can't pass the tester as it requires functions from unmerged pulls.


Specifically:
$(UL
$(LI Do nothing if $(D &source is &target) (for the first overload only).)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

»do« -> »does«, etc. for conformity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Issues fixed:
* `move` is over-complicated
* `move` doesn't work with qualified structs
* `move` doesn't work with const/immutable non-struct types
* for a nested struct, `move` tries to call destructor on possibly invalid struct filled by `init` except with a context pointer of `target` (such mix is unexpected and invalid e.g. if default constructor is disabled)
* `move` fails do the correct mix of `init` and context pointer of `target` in previous point in some cases
* lots of other issues I'm to tired to search in list here
@DmitryOlshansky
Copy link
Member

@denis-sh
Still inclined to go for it?
How about rebasing and putting in in all dependant primitives as private helpers?

Another matter is making it CTFE-able with yours rawCopy
http://denis-sh.github.com/phobos-additions/unstd.array.html#rawCopy
but probably we'd better start with this fix alone.

@DmitryOlshansky
Copy link
Member

Failed to penetrate. More over issues mentioned in commit comments seem to be addressed in the meantime.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants