New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RyuJIT: Morph forgets about side effects when optimizing casted shift #19272

Closed
jakobbotsch opened this Issue Aug 3, 2018 · 6 comments

Comments

Projects
None yet
5 participants
@jakobbotsch
Collaborator

jakobbotsch commented Aug 3, 2018

The following example does not print anything in release:

// Debug: Prints 1 line(s)
// Release: Prints 0 line(s)
struct S0
{
    public ulong F0;
    public sbyte F6;
    public S0(sbyte f6): this() { F6 = f6; }
}

public class Program
{
    static S0 s_1 = new S0(127);
    public static void Main()
    {
        int vr0 = (int)((ulong)M1() << 33) / s_1.F6;
    }

    static byte M1()
    {
        System.Console.WriteLine(s_1.F0);
        return 0;
    }
}

It looks like morph optimizes the left-hand side of the division to 0, but gets completely rid of the call in the process:

fgMorphTree BB01, stmt 1 (before)
               [000021] ------------              /--*  NOP       void  
               [000022] --CXG-------              *  COMMA     void  
               [000019] --CXG-------              |  /--*  FIELD     byte   F6
               [000010] ------------              |  |  |     /--*  CNS_INT   long   8 Fseq[#FirstElem]
               [000011] ------------              |  |  |  /--*  ADD       byref 
               [000009] ------------              |  |  |  |  \--*  IND       ref   
               [000008] ------------              |  |  |  |     \--*  CNS_INT(h) long   0x198583d2970 static Fseq[s_1]
               [000018] --CXG-------              |  |  \--*  COMMA     byref 
               [000017] H-CXG-------              |  |     \--*  CALL help long   HELPER.CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
               [000013] ------------ arg0         |  |        +--*  CNS_INT   long   0x7ffa6bac45c8
               [000014] ------------ arg1         |  |        \--*  CNS_INT   int    2
               [000020] --CXG-------              \--*  DIV       int   
               [000007] --C---------                 \--*  CAST      int <- long
               [000005] ------------                    |  /--*  CNS_INT   int    33
               [000006] --C---------                    \--*  LSH       long  
               [000004] --C------U--                       \--*  CAST      long <- ulong <- uint
               [000001] --C-G-------                          \--*  CALL      int    Program.M1
Morphing args for 17.CALL:
argSlots=2, preallocatedArgCount=4, nextSlotNum=4, outgoingArgSpaceSize=32

Sorting the arguments:
Deferred argument ('rcx'):
               [000013] -----+------              *  CNS_INT   long   0x7ffa6bac45c8
Replaced with placeholder node:
               [000030] ----------L-              *  ARGPLACE  long  
Deferred argument ('rdx'):
               [000014] -----+------              *  CNS_INT   int    2
Replaced with placeholder node:
               [000032] ----------L-              *  ARGPLACE  int   

Shuffled argument table:    rcx rdx 
fgArgTabEntry[arg 0 13.CNS_INT, 1 reg: rcx, align=1, lateArgInx=0, processed]
fgArgTabEntry[arg 1 14.CNS_INT, 1 reg: rdx, align=1, lateArgInx=1, processed]

fgMorphTree BB01, stmt 1 (after)
               [000021] -----+------              /--*  NOP       void  
               [000022] --CXG+------              *  COMMA     void  
               [000019] *-CXG+------              |  /--*  IND       byte  
               [000028] -----+------              |  |  |  /--*  CNS_INT   long   8 field offset Fseq[F6]
               [000029] --CXG+------              |  |  \--*  ADD       byref 
               [000010] -----+------              |  |     |     /--*  CNS_INT   long   8 Fseq[#FirstElem]
               [000011] -----+------              |  |     |  /--*  ADD       byref 
               [000009] x----+------              |  |     |  |  \--*  IND       ref   
               [000008] -----+------              |  |     |  |     \--*  CNS_INT(h) long   0x198583d2970 static Fseq[s_1]
               [000018] --CXG+------              |  |     \--*  COMMA     byref 
               [000017] H-CXG+------              |  |        \--*  CALL help long   HELPER.CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
               [000013] -----+------ arg0 in rcx  |  |           +--*  CNS_INT   long   0x7ffa6bac45c8
               [000014] -----+------ arg1 in rdx  |  |           \--*  CNS_INT   int    2
               [000020] --CXG+------              \--*  DIV       int   
               [000027] -----+------                 \--*  CNS_INT   int    0
@mikedn

This comment has been minimized.

Show comment
Hide comment
@mikedn

mikedn Aug 3, 2018

Contributor

It looks like morph optimizes the left-hand side of the division to 0, but gets completely rid of the call in the process:

Yep, it should have extracted the side effects from the LHS of the shift.

Though it seems that morphing rarely uses gtExtractSideEffList. Maybe this optimization should be done only in VN/assertion propagation. It's not like people write such shifts all day long for this to be a common scenario worth handling in morph.

Contributor

mikedn commented Aug 3, 2018

It looks like morph optimizes the left-hand side of the division to 0, but gets completely rid of the call in the process:

Yep, it should have extracted the side effects from the LHS of the shift.

Though it seems that morphing rarely uses gtExtractSideEffList. Maybe this optimization should be done only in VN/assertion propagation. It's not like people write such shifts all day long for this to be a common scenario worth handling in morph.

@jakobbotsch

This comment has been minimized.

Show comment
Hide comment
@jakobbotsch

jakobbotsch Aug 6, 2018

Collaborator

Yep, it should have extracted the side effects from the LHS of the shift.
Though it seems that morphing rarely uses gtExtractSideEffList. Maybe this optimization should be done only in VN/assertion propagation.

So the 'localized' fix would be to just call gtExtractSideEffList and return a new comma node here:

coreclr/src/jit/morph.cpp

Lines 462 to 468 in ac8f8e5

else if (shiftAmountValue >= 32)
{
// Result of the shift is zero.
DEBUG_DESTROY_NODE(tree);
GenTree* zero = gtNewZeroConNode(TYP_INT);
return fgMorphTree(zero);
}

If so, I can submit a PR for this. Although if it would be better to refactor it then I wouldn't know where exactly this optimization would fit in (in VN/assertion propagation) and how it would look.

Collaborator

jakobbotsch commented Aug 6, 2018

Yep, it should have extracted the side effects from the LHS of the shift.
Though it seems that morphing rarely uses gtExtractSideEffList. Maybe this optimization should be done only in VN/assertion propagation.

So the 'localized' fix would be to just call gtExtractSideEffList and return a new comma node here:

coreclr/src/jit/morph.cpp

Lines 462 to 468 in ac8f8e5

else if (shiftAmountValue >= 32)
{
// Result of the shift is zero.
DEBUG_DESTROY_NODE(tree);
GenTree* zero = gtNewZeroConNode(TYP_INT);
return fgMorphTree(zero);
}

If so, I can submit a PR for this. Although if it would be better to refactor it then I wouldn't know where exactly this optimization would fit in (in VN/assertion propagation) and how it would look.

@mikedn

This comment has been minimized.

Show comment
Hide comment
@mikedn

mikedn Aug 6, 2018

Contributor

Although if it would be better to refactor it then I wouldn't know where exactly this optimization would fit in (in VN/assertion propagation) and how it would look.

I haven't checked yet but the optimization may already exists in VN. I did try to remove this case from morph and I got 0 diffs, that means either that this optimization happens somewhere else or that there are no occurrences in real world code.

Contributor

mikedn commented Aug 6, 2018

Although if it would be better to refactor it then I wouldn't know where exactly this optimization would fit in (in VN/assertion propagation) and how it would look.

I haven't checked yet but the optimization may already exists in VN. I did try to remove this case from morph and I got 0 diffs, that means either that this optimization happens somewhere else or that there are no occurrences in real world code.

@jakobbotsch

This comment has been minimized.

Show comment
Hide comment
@jakobbotsch

jakobbotsch Aug 6, 2018

Collaborator

I did try to remove this case from morph and I got 0 diffs

Right, it does seem like an oddly specific optimization. FYI, removing that case fixes the issue in the example posted so it doesn't seem like the optimization exists somewhere else.

Collaborator

jakobbotsch commented Aug 6, 2018

I did try to remove this case from morph and I got 0 diffs

Right, it does seem like an oddly specific optimization. FYI, removing that case fixes the issue in the example posted so it doesn't seem like the optimization exists somewhere else.

@mikedn

This comment has been minimized.

Show comment
Hide comment
@mikedn

mikedn Aug 7, 2018

Contributor

FYI, removing that case fixes the issue in the example posted so it doesn't seem like the optimization exists somewhere else.

I see, it doesn't exist indeed. The generated code becomes:

       E8DEFBFFFF           call     Program:M1():ubyte
       8BC8                 mov      ecx, eax
       48C1E121             shl      rcx, 33
       8BC1                 mov      eax, ecx
       89442424             mov      dword ptr [rsp+24H], eax
       48B95845A507FB7F0000 mov      rcx, 0x7FFB07A54558
       BA02000000           mov      edx, 2
       E806D17F5F           call     CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
       48B87029BCD7E2010000 mov      rax, 0x1E2D7BC2970
       488B00               mov      rax, gword ptr [rax]
       480FBE4810           movsx    rcx, byte  ptr [rax+16]
       8B442424             mov      eax, dword ptr [rsp+24H]
       99                   cdq
       F7F9                 idiv     edx:eax, ecx

The call to M1 is here as it should but the shift is there too.

Still, I lean towards removing this optimization, it doesn't seem useful to have it in morph and morph is too messy as it is. VN could likely handle this if we really want.

@AndyAyersMS Opinions about removing this shift optimization?

Contributor

mikedn commented Aug 7, 2018

FYI, removing that case fixes the issue in the example posted so it doesn't seem like the optimization exists somewhere else.

I see, it doesn't exist indeed. The generated code becomes:

       E8DEFBFFFF           call     Program:M1():ubyte
       8BC8                 mov      ecx, eax
       48C1E121             shl      rcx, 33
       8BC1                 mov      eax, ecx
       89442424             mov      dword ptr [rsp+24H], eax
       48B95845A507FB7F0000 mov      rcx, 0x7FFB07A54558
       BA02000000           mov      edx, 2
       E806D17F5F           call     CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
       48B87029BCD7E2010000 mov      rax, 0x1E2D7BC2970
       488B00               mov      rax, gword ptr [rax]
       480FBE4810           movsx    rcx, byte  ptr [rax+16]
       8B442424             mov      eax, dword ptr [rsp+24H]
       99                   cdq
       F7F9                 idiv     edx:eax, ecx

The call to M1 is here as it should but the shift is there too.

Still, I lean towards removing this optimization, it doesn't seem useful to have it in morph and morph is too messy as it is. VN could likely handle this if we really want.

@AndyAyersMS Opinions about removing this shift optimization?

@AndyAyersMS

This comment has been minimized.

Show comment
Hide comment
@AndyAyersMS

AndyAyersMS Aug 7, 2018

Member

You mean the >= 32 case? Seems reasonable. #15077 came from some actual user code but I don't think we need to optimize such cases.

Member

AndyAyersMS commented Aug 7, 2018

You mean the >= 32 case? Seems reasonable. #15077 came from some actual user code but I don't think we need to optimize such cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment