New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RyuJIT: Invalid ordering when assigning ref-return #19243

Closed
jakobbotsch opened this Issue Aug 2, 2018 · 5 comments

Comments

Projects
None yet
3 participants
@jakobbotsch
Collaborator

jakobbotsch commented Aug 2, 2018

The following example gives different results in debug and release:

// Debug: Outputs 8614979244451975600
// Release: Outputs 0
struct S0
{
    public long F0;
    public sbyte F4;
    public S0(long f0): this() { F0 = f0; }
}

class C0
{
    public S0 F5;
    public C0(S0 f5) { F5 = f5; }
}

public class Program
{
    static C0 s_13 = new C0(new S0(0));
    static S0 s_37;
    public static void Main()
    {
        M7() = s_13.F5;
        System.Console.WriteLine(s_37.F0);
    }

    static ref S0 M7()
    {
        s_13 = new C0(new S0(8614979244451975600L));
        return ref s_37;
    }
}

The disassembly shows that the s_13 reference is loaded before the call to M7:

       48B9C8456396FA7F0000 mov      rcx, 0x7FFA966345C8
       BA03000000           mov      edx, 3
       E854CD815F           call     CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
       48B878291C2943020000 mov      rax, 0x243291C2978
       488B00               mov      rax, gword ptr [rax]           ; [s_13]
       488D7008             lea      rsi, bword ptr [rax+8]
       E8DEFBFFFF           call     Program:M7():byref

As far as I can tell, everything looks fine before rationalize:

***** BB01, stmt 1
     ( 47, 43) [000015] ------------              *  STMT      void  (IL   ???...  ???)
N019 ( 29, 35) [000012] -ACXG--N----              |  /--*  IND       struct <l:$383, c:$382>
N017 (  1,  1) [000034] ------------              |  |  |  /--*  CNS_INT   long   8 field offset Fseq[F5] $181
N018 ( 26, 33) [000035] -ACXG-------              |  |  \--*  ADD       byref  <l:$343, c:$342>
N015 (  5, 12) [000004] x---G-------              |  |     |  /--*  IND       ref    <l:$2c0, c:$300>
N014 (  3, 10) [000040] ------------              |  |     |  |  \--*  CNS_INT(h) long   0x2d423b92978 static Fseq[s_13] $280
N016 ( 24, 31) [000011] -ACXG-------              |  |     \--*  COMMA     ref    <l:$202, c:$201>
N012 (  1,  1) [000050] ------------              |  |        |  /--*  LCL_VAR   long   V01 cse0          $241
N013 ( 19, 19) [000051] -ACXG-------              |  |        \--*  COMMA     long   $241
N009 ( 18, 18) [000010] H-CXG-------              |  |           |  /--*  CALL help long   HELPER.CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE $241
N005 (  3, 10) [000006] ------------ arg0 in rcx  |  |           |  |  +--*  CNS_INT   long   0x7ffa71e245c8 $180
N006 (  1,  1) [000007] ------------ arg1 in rdx  |  |           |  |  \--*  CNS_INT   int    3 $41
N011 ( 18, 18) [000049] -ACXG---R---              |  |           \--*  ASG       long   $VN.Void
N010 (  1,  1) [000048] D------N----              |  |              \--*  LCL_VAR   long   V01 cse0          $241
N022 ( 47, 43) [000014] -ACXG---R---              \--*  ASG       struct (copy) $VN.Void
N021 ( 17,  7) [000013] --CXG-------                 \--*  BLK(16)   struct
N020 ( 14,  5) [000001] --CXG-------                    \--*  CALL      byref  Program.M7 $400

But rationalize seems to end up with this in the wrong order:

Rewriting GT_ASG(BLK(X), Y) to STORE_BLK(X,Y):
N014 (  3, 10) [000040] ------------        t40 =    CNS_INT(h) long   0x2d423b92978 static Fseq[s_13] $280
                                                 /--*  t40    long   
N015 (  5, 12) [000004] x---G-------         t4 = *  IND       ref    <l:$2c0, c:$300>
N017 (  1,  1) [000034] ------------        t34 =    CNS_INT   long   8 field offset Fseq[F5] $181
                                                 /--*  t4     ref    
                                                 +--*  t34    long   
N018 ( 26, 33) [000035] ---XG-------        t35 = *  ADD       byref  <l:$343, c:$342>
                                                 /--*  t35    byref  
N019 ( 29, 35) [000012] ---XG--N----        t12 = *  IND       struct <l:$383, c:$382>
N020 ( 14,  5) [000001] --CXG-------         t1 =    CALL      byref  Program.M7 $400
                                                 /--*  t1     byref  
                                                 +--*  t12    struct 
N021 ( 17,  7) [000013] -ACXG-------              *  STORE_BLK(16) struct (copy)
@jakobbotsch

This comment has been minimized.

Show comment
Hide comment
@jakobbotsch

jakobbotsch Aug 2, 2018

Collaborator

I have just now noticed/learned about the GTF_REVERSE_OPS flag which is set on the GT_ASG node, so it does seem like rationalize at least works correctly. However that flag already appears during import, and also in minopts. The difference is that in minopts there is a temp for the result of the call, and the call happens in a previous statement, so the ordering is correct that way.

EDIT: That flag is set in gtBlockOpInit. I'm not sure what the appropriate fix would be -- should import always introduce a temp for this case?

Collaborator

jakobbotsch commented Aug 2, 2018

I have just now noticed/learned about the GTF_REVERSE_OPS flag which is set on the GT_ASG node, so it does seem like rationalize at least works correctly. However that flag already appears during import, and also in minopts. The difference is that in minopts there is a temp for the result of the call, and the call happens in a previous statement, so the ordering is correct that way.

EDIT: That flag is set in gtBlockOpInit. I'm not sure what the appropriate fix would be -- should import always introduce a temp for this case?

@mikedn

This comment has been minimized.

Show comment
Hide comment
@mikedn

mikedn Aug 3, 2018

Contributor

EDIT: That flag is set in gtBlockOpInit. I'm not sure what the appropriate fix would be -- should import always introduce a temp for this case?

I suppose @CarolEidt might know. Maybe "the use must occur before the def" due to overlapping concerns? Structs are unfortunately rather complicated to deal with.

Contributor

mikedn commented Aug 3, 2018

EDIT: That flag is set in gtBlockOpInit. I'm not sure what the appropriate fix would be -- should import always introduce a temp for this case?

I suppose @CarolEidt might know. Maybe "the use must occur before the def" due to overlapping concerns? Structs are unfortunately rather complicated to deal with.

@CarolEidt CarolEidt self-assigned this Aug 6, 2018

@CarolEidt

This comment has been minimized.

Show comment
Hide comment
@CarolEidt

CarolEidt Aug 6, 2018

Member

I was a bit stumped by this. The block ops (cpblk, initblk, cpobj) have their operands on the evaluation stack in the same order as the stind - i.e. dest addr is pushed first, then the value/src addr. So the order of side effects must be preserved in the same way. I searched the code history to see if I could track down the origin of the "the use must occur before the def" comment. (Clearly one needs to load the source value before storing to the dest, but that's the job of the internal ordering of the codegen for the store.)

I found the change that introduced this comment (a subsequent change moved it, and the associated setting of GTF_REVERSE_OPS to where it is now). In the original change (about 5 years ago), the following comment was added to Compiler::fgSetTreeSeqHelper():

    // This is a special kind of GT_LIST that only occurs under initBlk and copyBlk.
    // It is used as a pair, where op1 is the dst and op2 is the src (value or location)
    // The use must appear before the def because of the case where a local is cpblk'ed to itself.
    // If it were otherwise, upstream stores to the local would appear to be dead.

I believe this is no longer relevant, because we no longer represent block ops this way, and although we still have this messiness to deal with in ASG nodes (which hopefully someday we can eliminate altogether, not just in the backend), that is handled without artificially changing the execution order on the assignment operands.

There's probably now other code that relies on this in some way, but I'm going to remove this setting of GTF_REVERSE_OPS, and see what else needs cleaning up.

Member

CarolEidt commented Aug 6, 2018

I was a bit stumped by this. The block ops (cpblk, initblk, cpobj) have their operands on the evaluation stack in the same order as the stind - i.e. dest addr is pushed first, then the value/src addr. So the order of side effects must be preserved in the same way. I searched the code history to see if I could track down the origin of the "the use must occur before the def" comment. (Clearly one needs to load the source value before storing to the dest, but that's the job of the internal ordering of the codegen for the store.)

I found the change that introduced this comment (a subsequent change moved it, and the associated setting of GTF_REVERSE_OPS to where it is now). In the original change (about 5 years ago), the following comment was added to Compiler::fgSetTreeSeqHelper():

    // This is a special kind of GT_LIST that only occurs under initBlk and copyBlk.
    // It is used as a pair, where op1 is the dst and op2 is the src (value or location)
    // The use must appear before the def because of the case where a local is cpblk'ed to itself.
    // If it were otherwise, upstream stores to the local would appear to be dead.

I believe this is no longer relevant, because we no longer represent block ops this way, and although we still have this messiness to deal with in ASG nodes (which hopefully someday we can eliminate altogether, not just in the backend), that is handled without artificially changing the execution order on the assignment operands.

There's probably now other code that relies on this in some way, but I'm going to remove this setting of GTF_REVERSE_OPS, and see what else needs cleaning up.

@CarolEidt

This comment has been minimized.

Show comment
Hide comment
@CarolEidt

CarolEidt Aug 7, 2018

Member

The first round of changes (just respecting the correct evaluation order), results in a small number of diffs - more negative than positive, but still a relatively small number of methods. I've as yet done only diff testing on x64, but am going to pursue a second round of changes.

When I did the first set of "first class structs" changes (changing the block ops from GT_blkop(GT_LIST(dst,src),size) to GT_ASG(blkop(dst,size),src), I wanted to preserve identical behavior (zero diffs). To do this, I had to support the case where the size was evaluated either before or after the dst and src, requiring that it be special cased as if it were a child of the assignment. I'd planned to clean this up later, but never got that done.

I'm now working on eliminating the weird special handling of size, which I expect to also have a minimal impact, but in any event should have a PR up in the next day or two.

Member

CarolEidt commented Aug 7, 2018

The first round of changes (just respecting the correct evaluation order), results in a small number of diffs - more negative than positive, but still a relatively small number of methods. I've as yet done only diff testing on x64, but am going to pursue a second round of changes.

When I did the first set of "first class structs" changes (changing the block ops from GT_blkop(GT_LIST(dst,src),size) to GT_ASG(blkop(dst,size),src), I wanted to preserve identical behavior (zero diffs). To do this, I had to support the case where the size was evaluated either before or after the dst and src, requiring that it be special cased as if it were a child of the assignment. I'd planned to clean this up later, but never got that done.

I'm now working on eliminating the weird special handling of size, which I expect to also have a minimal impact, but in any event should have a PR up in the next day or two.

@CarolEidt

This comment has been minimized.

Show comment
Hide comment
@CarolEidt

CarolEidt Aug 7, 2018

Member

cc @dotnet/jit-contrib - in case anyone has thoughts on this (see the comments above).

Member

CarolEidt commented Aug 7, 2018

cc @dotnet/jit-contrib - in case anyone has thoughts on this (see the comments above).

CarolEidt added a commit to CarolEidt/coreclr that referenced this issue Aug 7, 2018

Fix evaluation order for block copy
This order was changed five years ago, as a workaround for data flow analysis issues around the block copy. Now that block copies are handled as regular assignments, this is no longer needed - and is, in fact, incorrect as it causes side effects to be handled in the wrong order.

Fix #19243

CarolEidt added a commit to CarolEidt/coreclr that referenced this issue Aug 8, 2018

Fix evaluation order for block copy
This order was changed five years ago, as a workaround for data flow analysis issues around the block copy. Now that block copies are handled as regular assignments, this is no longer needed - and is, in fact, incorrect as it causes side effects to be handled in the wrong order.

In the process, also remove the special-casing for the size operand of `DYN_BLK` under `ASG`.

Fix #19243

CarolEidt added a commit to CarolEidt/coreclr that referenced this issue Aug 9, 2018

Fix evaluation order for block copy
This order was changed five years ago, as a workaround for data flow analysis issues around the block copy. Now that block copies are handled as regular assignments, this is no longer needed - and is, in fact, incorrect as it causes side effects to be handled in the wrong order.

However, for the case where the lhs is an indirection of a local address, it must be evaluated 2nd for SSA renaming to be correct.

In the process, also remove the special-casing for the size operand of `DYN_BLK` under `ASG`.

Fix #19243

CarolEidt added a commit to CarolEidt/coreclr that referenced this issue Aug 9, 2018

Fix evaluation order for block copy
This order was changed five years ago, as a workaround for data flow analysis issues around the block copy. Now that block copies are handled as regular assignments, this is no longer needed - and is, in fact, incorrect as it causes side effects to be handled in the wrong order.

However, for the case where the lhs is an indirection of a local address, it must be evaluated 2nd for SSA renaming to be correct.

In the process, also remove the special-casing for the size operand of `DYN_BLK` under `ASG`.

Fix #19243

@CarolEidt CarolEidt closed this in #19334 Aug 20, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment