New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RyuJIT incorrectly narrows value on ARM32/x86 in release #18780

Closed
jakobbotsch opened this Issue Jul 4, 2018 · 17 comments

Comments

Projects
None yet
4 participants
@jakobbotsch
Collaborator

jakobbotsch commented Jul 4, 2018

On ARM32 and x86 with .NET core 2.1, the following program prints 4294967295 in debug, but 255 in release:

// Generated by Fuzzlyn on 2018-07-03 07:50:20
// Seed: 10009979209080502034
// Reduced from 429.6 KiB to 0.4 KiB
// Debug: Outputs 4294967295
// Release: Outputs 255
public class Program
{
    public static void Main()
    {
        M1(0);
    }

    static void M1(byte arg2)
    {
        byte vr23 = arg2++;
        uint vr13 = uint.MaxValue * arg2;
        ulong vr40 = vr13;
        System.Console.WriteLine(vr40);
    }
}

It does not repro on AMD64 (Windows or Linux). Also, changing byte vr23 = arg2++; to arg2++; fixes it.

@jakobbotsch

This comment has been minimized.

Show comment
Hide comment
@jakobbotsch

jakobbotsch Jul 4, 2018

Collaborator

Possibly related (EDIT: this is #18235 and fixed in master already):

// Generated by Fuzzlyn on 2018-07-03 22:42:26
// Seed: 9909092169244373026
// Reduced from 36.6 KiB to 0.4 KiB
// Debug: Outputs 0
// Release: Outputs 256
public class Program
{
    static byte[] s_1 = new byte[]{0};
    static int[][] s_2;
    public static void Main()
    {
        M1(0);
    }

    static void M1(ushort arg0)
    {
        s_1[0] = 255;
        arg0 = (ushort)(s_1[0]++ % -1);
        bool vr2 = true;
        if (vr2)
        {
            s_2 = s_2;
        }

        System.Console.WriteLine(arg0);
    }
}

Since I'm not sure I will wait with opening an issue.

Collaborator

jakobbotsch commented Jul 4, 2018

Possibly related (EDIT: this is #18235 and fixed in master already):

// Generated by Fuzzlyn on 2018-07-03 22:42:26
// Seed: 9909092169244373026
// Reduced from 36.6 KiB to 0.4 KiB
// Debug: Outputs 0
// Release: Outputs 256
public class Program
{
    static byte[] s_1 = new byte[]{0};
    static int[][] s_2;
    public static void Main()
    {
        M1(0);
    }

    static void M1(ushort arg0)
    {
        s_1[0] = 255;
        arg0 = (ushort)(s_1[0]++ % -1);
        bool vr2 = true;
        if (vr2)
        {
            s_2 = s_2;
        }

        System.Console.WriteLine(arg0);
    }
}

Since I'm not sure I will wait with opening an issue.

@mikedn

This comment has been minimized.

Show comment
Hide comment
@mikedn

mikedn Jul 4, 2018

Contributor

It looks to me that the current x86 build is also broken as it generates:

G_M55886_IG02:
       0FB6C9       movzx    ecx, cl
       41           inc      ecx
       0FB6C9       movzx    ecx, cl
       8BC1         mov      eax, ecx
       F7D8         neg      eax
       0FB6C0       movzx    eax, al ; huh ?!!
       6A00         push     0
       50           push     eax
       FF1588411E03 call     [Program:WriteLine(long)]

And the dump shows that morph produces a byte NEG node:

fgMorphTree BB01, stmt 3 (before)
               [000017] --C-G-------              *  CALL      void   Program.Write
               [000016] ---------U-- arg0         \--*  CAST      long <- ulong <- uint
               [000014] ------------                 |  /--*  LCL_VAR   ubyte  V00 arg0         
               [000015] ------------                 \--*  MUL       int   
               [000013] ------------                    \--*  CNS_INT   int    -1
Upping fgPtrArgCntMax from 0 to 2
fgArgTabEntry[arg 0 16.CAST, numSlots=2, slotNum=0, align=1]

fgMorphTree BB01, stmt 3 (after)
               [000017] --CXG+------              *  CALL      void   Program.Write
               [000016] -----+---U-- arg0         \--*  CAST      long <- ulong <- uint
               [000024] -----+------                 \--*  NEG       ubyte 
               [000014] -----+------                    \--*  LCL_VAR   ubyte  V00 arg0         
Contributor

mikedn commented Jul 4, 2018

It looks to me that the current x86 build is also broken as it generates:

G_M55886_IG02:
       0FB6C9       movzx    ecx, cl
       41           inc      ecx
       0FB6C9       movzx    ecx, cl
       8BC1         mov      eax, ecx
       F7D8         neg      eax
       0FB6C0       movzx    eax, al ; huh ?!!
       6A00         push     0
       50           push     eax
       FF1588411E03 call     [Program:WriteLine(long)]

And the dump shows that morph produces a byte NEG node:

fgMorphTree BB01, stmt 3 (before)
               [000017] --C-G-------              *  CALL      void   Program.Write
               [000016] ---------U-- arg0         \--*  CAST      long <- ulong <- uint
               [000014] ------------                 |  /--*  LCL_VAR   ubyte  V00 arg0         
               [000015] ------------                 \--*  MUL       int   
               [000013] ------------                    \--*  CNS_INT   int    -1
Upping fgPtrArgCntMax from 0 to 2
fgArgTabEntry[arg 0 16.CAST, numSlots=2, slotNum=0, align=1]

fgMorphTree BB01, stmt 3 (after)
               [000017] --CXG+------              *  CALL      void   Program.Write
               [000016] -----+---U-- arg0         \--*  CAST      long <- ulong <- uint
               [000024] -----+------                 \--*  NEG       ubyte 
               [000014] -----+------                    \--*  LCL_VAR   ubyte  V00 arg0         
@jakobbotsch

This comment has been minimized.

Show comment
Hide comment
@jakobbotsch

jakobbotsch Jul 4, 2018

Collaborator

That's weird, on x86 I get this:

; Assembly listing for method Program:M1(ubyte)
; Emitting BLENDED_CODE for X64 CPU with AVX
; optimized code
; rsp based frame
; fully interruptible
; Final local variable assignments
;
;  V00 arg0         [V00,T00] (  5,  5   )   ubyte  ->  rcx
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [rsp+0x00]
;* V02 tmp1         [V02    ] (  0,  0   )     int  ->  zero-ref
;
; Lcl frame size = 0

G_M23187_IG01:

G_M23187_IG02:
       0FB6C9               movzx    rcx, cl
       FFC1                 inc      ecx
       0FB6C9               movzx    rcx, cl
       F7D9                 neg      ecx
       8BC9                 mov      ecx, ecx
       48B838229D6CFD7F0000 mov      rax, 0x7FFD6C9D2238

G_M23187_IG03:
       48FFE0               rex.jmp  rax

; Total bytes of code 25, prolog size 0 for method Program:M1(ubyte)
; ============================================================
4294967295

Any clues why?

EDIT: Oh, whoops. Let me check 32-bit...
Right, I see it now. I will edit the thread and title (I meant x86-64/AMD64)

Collaborator

jakobbotsch commented Jul 4, 2018

That's weird, on x86 I get this:

; Assembly listing for method Program:M1(ubyte)
; Emitting BLENDED_CODE for X64 CPU with AVX
; optimized code
; rsp based frame
; fully interruptible
; Final local variable assignments
;
;  V00 arg0         [V00,T00] (  5,  5   )   ubyte  ->  rcx
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [rsp+0x00]
;* V02 tmp1         [V02    ] (  0,  0   )     int  ->  zero-ref
;
; Lcl frame size = 0

G_M23187_IG01:

G_M23187_IG02:
       0FB6C9               movzx    rcx, cl
       FFC1                 inc      ecx
       0FB6C9               movzx    rcx, cl
       F7D9                 neg      ecx
       8BC9                 mov      ecx, ecx
       48B838229D6CFD7F0000 mov      rax, 0x7FFD6C9D2238

G_M23187_IG03:
       48FFE0               rex.jmp  rax

; Total bytes of code 25, prolog size 0 for method Program:M1(ubyte)
; ============================================================
4294967295

Any clues why?

EDIT: Oh, whoops. Let me check 32-bit...
Right, I see it now. I will edit the thread and title (I meant x86-64/AMD64)

@jakobbotsch jakobbotsch changed the title from RyuJIT incorrectly narrows value on ARM32 in release to RyuJIT incorrectly narrows value on ARM32/x86 in release Jul 4, 2018

@mikedn

This comment has been minimized.

Show comment
Hide comment
@mikedn

mikedn Jul 4, 2018

Contributor

That's weird, on x86 I get this:

Did you test with the current coreclr build? The release 2.1 does work correctly but the current build does not.

Contributor

mikedn commented Jul 4, 2018

That's weird, on x86 I get this:

Did you test with the current coreclr build? The release 2.1 does work correctly but the current build does not.

@jakobbotsch

This comment has been minimized.

Show comment
Hide comment
@jakobbotsch

jakobbotsch Jul 4, 2018

Collaborator

Did you test with the current coreclr build? The release 2.1 does work correctly but the current build does not.

I have just now tested it with x86 .NET core 2.1.300, and it repros it too it appears.

Collaborator

jakobbotsch commented Jul 4, 2018

Did you test with the current coreclr build? The release 2.1 does work correctly but the current build does not.

I have just now tested it with x86 .NET core 2.1.300, and it repros it too it appears.

@mikedn

This comment has been minimized.

Show comment
Hide comment
@mikedn

mikedn Jul 4, 2018

Contributor

The second example is another VN/CSE issue:

     ( 23,  7)              [000047] ------------              *  STMT      void  (IL 0x01F...0x023)
N004 ( 23,  7) CSE #05 (use)[000044] ---X--------              |  /--*  CAST      int <- ushort <- int <l:$40, c:$246>
N002 (  1,  1)              [000042] ------------              |  |  |  /--*  CNS_INT   int    -1 $41
N003 ( 22,  5) CSE #05 (def)[000043] ---X--------              |  |  \--*  MOD       int    <l:$40, c:$245>
N001 (  1,  1)              [000041] ------------              |  |     \--*  LCL_VAR   int    V01 loc0         u:3 (last use) <l:$45, c:$242>
N006 ( 23,  7)              [000046] -A-X----R---              \--*  ASG       ushort <l:$40, c:$247>
N005 (  2,  2)              [000045] D------N----                 \--*  LCL_VAR   ushort V00 arg0         d:3 <l:$40, c:$247>

…

     ( 17, 10)              [000074] ------------              *  STMT      void  (IL 0x032...0x038)
N006 ( 17, 10)              [000072] --CXG-------              \--*  CALL      void   Program.WriteLine $VN.Void
N004 (  3,  4) CSE #05 (use)[000118] ------------ arg0 in ecx     \--*  CAST      int <- ushort <- int <l:$40, c:$248>
N003 (  2,  2)              [000071] ------------                    \--*  LCL_VAR   int    V00 arg0         u:3 (last use) <l:$40, c:$247>

So MOD has VN = $40, gets stored as short and the subsequent load as int still produces VN $40. Then CSE picks up and messes up the whole thing:

     ( 23,  6) [000047] ------------              *  STMT      void  (IL 0x01F...0x023)
N006 (  1,  1) [000125] ------------              |     /--*  LCL_VAR   int    V05 cse0          <l:$40, c:$245>
N007 ( 23,  6) [000126] -A-X--------              |  /--*  COMMA     int    <l:$40, c:$245>
N002 (  1,  1) [000042] ------------              |  |  |     /--*  CNS_INT   int    -1 $41
N003 ( 22,  5) [000043] ---X--------              |  |  |  /--*  MOD       int    <l:$40, c:$245>
N001 (  1,  1) [000041] ------------              |  |  |  |  \--*  LCL_VAR   int    V01 loc0         u:3 (last use) <l:$45, c:$242>
N005 ( 22,  5) [000122] -A-X----R---              |  |  \--*  ASG       int    $VN.Void
N004 (  1,  1) [000121] D------N----              |  |     \--*  LCL_VAR   int    V05 cse0          <l:$40, c:$245>
N009 ( 23,  6) [000046] -A-X----R---              \--*  ASG       ushort <l:$40, c:$247>
N008 (  2,  2) [000045] D------N----                 \--*  LCL_VAR   ushort V00 arg0         d:3 <l:$40, c:$247>

…

***** BB01, stmt 7
     ( 15,  7) [000074] ------------              *  STMT      void  (IL 0x032...0x038)
N005 ( 15,  7) [000072] --CXG-------              \--*  CALL      void   Program.WriteLine $VN.Void
N003 (  1,  1) [000127] ------------ arg0 in ecx     \--*  LCL_VAR   int    V05 cse0          <l:$40, c:$245>

Anyway, different issue (or perhaps existing one, I'm starting to lose track of all these issues)

Contributor

mikedn commented Jul 4, 2018

The second example is another VN/CSE issue:

     ( 23,  7)              [000047] ------------              *  STMT      void  (IL 0x01F...0x023)
N004 ( 23,  7) CSE #05 (use)[000044] ---X--------              |  /--*  CAST      int <- ushort <- int <l:$40, c:$246>
N002 (  1,  1)              [000042] ------------              |  |  |  /--*  CNS_INT   int    -1 $41
N003 ( 22,  5) CSE #05 (def)[000043] ---X--------              |  |  \--*  MOD       int    <l:$40, c:$245>
N001 (  1,  1)              [000041] ------------              |  |     \--*  LCL_VAR   int    V01 loc0         u:3 (last use) <l:$45, c:$242>
N006 ( 23,  7)              [000046] -A-X----R---              \--*  ASG       ushort <l:$40, c:$247>
N005 (  2,  2)              [000045] D------N----                 \--*  LCL_VAR   ushort V00 arg0         d:3 <l:$40, c:$247>

…

     ( 17, 10)              [000074] ------------              *  STMT      void  (IL 0x032...0x038)
N006 ( 17, 10)              [000072] --CXG-------              \--*  CALL      void   Program.WriteLine $VN.Void
N004 (  3,  4) CSE #05 (use)[000118] ------------ arg0 in ecx     \--*  CAST      int <- ushort <- int <l:$40, c:$248>
N003 (  2,  2)              [000071] ------------                    \--*  LCL_VAR   int    V00 arg0         u:3 (last use) <l:$40, c:$247>

So MOD has VN = $40, gets stored as short and the subsequent load as int still produces VN $40. Then CSE picks up and messes up the whole thing:

     ( 23,  6) [000047] ------------              *  STMT      void  (IL 0x01F...0x023)
N006 (  1,  1) [000125] ------------              |     /--*  LCL_VAR   int    V05 cse0          <l:$40, c:$245>
N007 ( 23,  6) [000126] -A-X--------              |  /--*  COMMA     int    <l:$40, c:$245>
N002 (  1,  1) [000042] ------------              |  |  |     /--*  CNS_INT   int    -1 $41
N003 ( 22,  5) [000043] ---X--------              |  |  |  /--*  MOD       int    <l:$40, c:$245>
N001 (  1,  1) [000041] ------------              |  |  |  |  \--*  LCL_VAR   int    V01 loc0         u:3 (last use) <l:$45, c:$242>
N005 ( 22,  5) [000122] -A-X----R---              |  |  \--*  ASG       int    $VN.Void
N004 (  1,  1) [000121] D------N----              |  |     \--*  LCL_VAR   int    V05 cse0          <l:$40, c:$245>
N009 ( 23,  6) [000046] -A-X----R---              \--*  ASG       ushort <l:$40, c:$247>
N008 (  2,  2) [000045] D------N----                 \--*  LCL_VAR   ushort V00 arg0         d:3 <l:$40, c:$247>

…

***** BB01, stmt 7
     ( 15,  7) [000074] ------------              *  STMT      void  (IL 0x032...0x038)
N005 ( 15,  7) [000072] --CXG-------              \--*  CALL      void   Program.WriteLine $VN.Void
N003 (  1,  1) [000127] ------------ arg0 in ecx     \--*  LCL_VAR   int    V05 cse0          <l:$40, c:$245>

Anyway, different issue (or perhaps existing one, I'm starting to lose track of all these issues)

@jakobbotsch

This comment has been minimized.

Show comment
Hide comment
@jakobbotsch

jakobbotsch Jul 4, 2018

Collaborator

Hmm, I don't see where that is wrong. As long as V01 is properly narrowed, isn't that correct? That is, V01 should be the temp for s_1[0]++. Note that the cast in this case doesn't actually discard anything.

Collaborator

jakobbotsch commented Jul 4, 2018

Hmm, I don't see where that is wrong. As long as V01 is properly narrowed, isn't that correct? That is, V01 should be the temp for s_1[0]++. Note that the cast in this case doesn't actually discard anything.

@jakobbotsch

This comment has been minimized.

Show comment
Hide comment
@jakobbotsch

jakobbotsch Jul 4, 2018

Collaborator

Number two repros on .NET core 2.1, but not on current head, as far as I can tell. So it would seem it has already been fixed. Perhaps it is a variant of #18235.
As a side note I really need to figure out how to build coreclr on ARM32...

Collaborator

jakobbotsch commented Jul 4, 2018

Number two repros on .NET core 2.1, but not on current head, as far as I can tell. So it would seem it has already been fixed. Perhaps it is a variant of #18235.
As a side note I really need to figure out how to build coreclr on ARM32...

@mikedn

This comment has been minimized.

Show comment
Hide comment
@mikedn

mikedn Jul 4, 2018

Contributor

Note that the cast in this case doesn't actually discard anything.

You mean that it doesn't discard anything because the result of MOD is 0. Yes, but the code still seems fishy. Let me check a bit more.

Contributor

mikedn commented Jul 4, 2018

Note that the cast in this case doesn't actually discard anything.

You mean that it doesn't discard anything because the result of MOD is 0. Yes, but the code still seems fishy. Let me check a bit more.

@jakobbotsch

This comment has been minimized.

Show comment
Hide comment
@jakobbotsch

jakobbotsch Jul 4, 2018

Collaborator

I just rechecked and the second one is definitely #18235. It works after #18627. Interesting because it doesn't really look anything like the other variants...

Collaborator

jakobbotsch commented Jul 4, 2018

I just rechecked and the second one is definitely #18235. It works after #18627. Interesting because it doesn't really look anything like the other variants...

@mikedn

This comment has been minimized.

Show comment
Hide comment
@mikedn

mikedn Jul 5, 2018

Contributor

Hmm, I don't see where that is wrong. As long as V01 is properly narrowed, isn't that correct? That is, V01 should be the temp for s_1[0]++. Note that the cast in this case doesn't actually discard anything.

So, it's complicated. $40 is the VN of constant 0 so it's obvious that narrowing to ushort is useless in such a case. However, I still find this transform a bit fishy due to the way it works with multithreading. This is easier to observe in this slightly modified example:

    static int[] s_1 = new int[] { 0 };
    static int[][] s_2;
    public static void Main() => Test(0);
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Test(ushort arg0)
    {
        s_1[0] = 255;
        arg0 = (ushort)(s_1[0]++ % 70000);
        bool vr2 = true;
        if (vr2)
        {
            s_2 = s_2;
        }
        WriteLine(arg0 + 42);
    }

that generates

       B940407802   mov      ecx, 0x2784040
       BA01000000   mov      edx, 1
       E8231DF6FD   call     CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
       8B15B01CBC05 mov      edx, gword ptr [classVar[0x2784144]]
       8BCA         mov      ecx, edx
       8B4104       mov      eax, dword ptr [ecx+4]
       83F800       cmp      eax, 0
       762E         jbe      SHORT G_M55886_IG04
       C74108FF000000 mov      dword ptr [ecx+8], 255
       83C208       add      edx, 8
       8B0A         mov      ecx, dword ptr [edx]
       8BC1         mov      eax, ecx
       40           inc      eax
       8902         mov      dword ptr [edx], eax
       8D15B41CBC05 lea      edx, [classVar[0x2784154]]
       A1B41CBC05   mov      eax, gword ptr [classVar[0x2784154]]
       E8C2D2E10A   call     CORINFO_HELP_CHECKED_ASSIGN_REF_EAX
       83C12A       add      ecx, 42
       FF15E0417802 call     [Program:WriteLine(int)]

There's no trace of MOD nor the cast to ushort, the generated code (without the post-increment stuff) is basically WriteLine(s_1[0] + 42). JIT's logic is "Hey, I see that you stored 255 to s_1[0] so I replace 255 % 70000 with 255. But then I'm too lazy to actually do constant propagation correctly and replace s_1[0] MOD 70000 with 255, I'll just replace it with s_1[0].".

That works fine until another thread modifies s_1[0] and puts there 1234567. In that case the result will be 1234567 + 42 and not (ushort)(1234567 % 70000) + 42 nor 255 + 42.

I don't know if this is a bug. In general race conditions attract undefined behavior thought that's usually due to memory ordering and caching issues. It's less common to find that the compiler has altered the shape of the code in such a way that you get an otherwise "impossible" result.

Contributor

mikedn commented Jul 5, 2018

Hmm, I don't see where that is wrong. As long as V01 is properly narrowed, isn't that correct? That is, V01 should be the temp for s_1[0]++. Note that the cast in this case doesn't actually discard anything.

So, it's complicated. $40 is the VN of constant 0 so it's obvious that narrowing to ushort is useless in such a case. However, I still find this transform a bit fishy due to the way it works with multithreading. This is easier to observe in this slightly modified example:

    static int[] s_1 = new int[] { 0 };
    static int[][] s_2;
    public static void Main() => Test(0);
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Test(ushort arg0)
    {
        s_1[0] = 255;
        arg0 = (ushort)(s_1[0]++ % 70000);
        bool vr2 = true;
        if (vr2)
        {
            s_2 = s_2;
        }
        WriteLine(arg0 + 42);
    }

that generates

       B940407802   mov      ecx, 0x2784040
       BA01000000   mov      edx, 1
       E8231DF6FD   call     CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
       8B15B01CBC05 mov      edx, gword ptr [classVar[0x2784144]]
       8BCA         mov      ecx, edx
       8B4104       mov      eax, dword ptr [ecx+4]
       83F800       cmp      eax, 0
       762E         jbe      SHORT G_M55886_IG04
       C74108FF000000 mov      dword ptr [ecx+8], 255
       83C208       add      edx, 8
       8B0A         mov      ecx, dword ptr [edx]
       8BC1         mov      eax, ecx
       40           inc      eax
       8902         mov      dword ptr [edx], eax
       8D15B41CBC05 lea      edx, [classVar[0x2784154]]
       A1B41CBC05   mov      eax, gword ptr [classVar[0x2784154]]
       E8C2D2E10A   call     CORINFO_HELP_CHECKED_ASSIGN_REF_EAX
       83C12A       add      ecx, 42
       FF15E0417802 call     [Program:WriteLine(int)]

There's no trace of MOD nor the cast to ushort, the generated code (without the post-increment stuff) is basically WriteLine(s_1[0] + 42). JIT's logic is "Hey, I see that you stored 255 to s_1[0] so I replace 255 % 70000 with 255. But then I'm too lazy to actually do constant propagation correctly and replace s_1[0] MOD 70000 with 255, I'll just replace it with s_1[0].".

That works fine until another thread modifies s_1[0] and puts there 1234567. In that case the result will be 1234567 + 42 and not (ushort)(1234567 % 70000) + 42 nor 255 + 42.

I don't know if this is a bug. In general race conditions attract undefined behavior thought that's usually due to memory ordering and caching issues. It's less common to find that the compiler has altered the shape of the code in such a way that you get an otherwise "impossible" result.

@RussKeldorph RussKeldorph added this to the 3.0 milestone Jul 5, 2018

@jakobbotsch

This comment has been minimized.

Show comment
Hide comment
@jakobbotsch

jakobbotsch Jul 5, 2018

Collaborator

I don't know if this is a bug. In general race conditions attract undefined behavior thought that's usually due to memory ordering and caching issues. It's less common to find that the compiler has altered the shape of the code in such a way that you get an otherwise "impossible" result.

I think that result would be unexpected for most people, even with non-determinism due to threading. I wonder if your example could be extended to violate memory safety by dereferencing an array beyond its end?

Collaborator

jakobbotsch commented Jul 5, 2018

I don't know if this is a bug. In general race conditions attract undefined behavior thought that's usually due to memory ordering and caching issues. It's less common to find that the compiler has altered the shape of the code in such a way that you get an otherwise "impossible" result.

I think that result would be unexpected for most people, even with non-determinism due to threading. I wonder if your example could be extended to violate memory safety by dereferencing an array beyond its end?

@mikedn

This comment has been minimized.

Show comment
Hide comment
@mikedn

mikedn Jul 6, 2018

Contributor

I wonder if your example could be extended to violate memory safety by dereferencing an array beyond its end?

Probably not, redundant array range check elimination relies on conservative value numbers that account for updates done by other threads. It also relies on the existence of relops like i < a.Length, if an optimization removes those, range check elimination will be blocked. But I wouldn't go as far to say that it's impossible 😁. And this is certainly something to keep an eye on while doing work in this area of the JIT.

Contributor

mikedn commented Jul 6, 2018

I wonder if your example could be extended to violate memory safety by dereferencing an array beyond its end?

Probably not, redundant array range check elimination relies on conservative value numbers that account for updates done by other threads. It also relies on the existence of relops like i < a.Length, if an optimization removes those, range check elimination will be blocked. But I wouldn't go as far to say that it's impossible 😁. And this is certainly something to keep an eye on while doing work in this area of the JIT.

@jakobbotsch

This comment has been minimized.

Show comment
Hide comment
@jakobbotsch

jakobbotsch Jul 7, 2018

Collaborator

Much simpler repro:

// Generated by Fuzzlyn on 2018-07-05 23:09:33
// Seed: 5347005863841687628
// Reduced from 228.4 KiB to 0.2 KiB
// Debug: Outputs -1
// Release: Outputs 255
public class Program
{
    static byte s_74 = 1;
    public static void Main()
    {
        int vr24 = -1 * s_74;
        long vr29 = vr24;
        System.Console.WriteLine(vr29);
    }
}
Collaborator

jakobbotsch commented Jul 7, 2018

Much simpler repro:

// Generated by Fuzzlyn on 2018-07-05 23:09:33
// Seed: 5347005863841687628
// Reduced from 228.4 KiB to 0.2 KiB
// Debug: Outputs -1
// Release: Outputs 255
public class Program
{
    static byte s_74 = 1;
    public static void Main()
    {
        int vr24 = -1 * s_74;
        long vr29 = vr24;
        System.Console.WriteLine(vr29);
    }
}
@jakobbotsch

This comment has been minimized.

Show comment
Hide comment
@jakobbotsch

jakobbotsch Jul 7, 2018

Collaborator

@AndyAyersMS You can assign this one to me, I can try to take a look. I assume it won't be too hard as it just looks like the optimization of the -1 multiplication ends up with the wrong node type.

Collaborator

jakobbotsch commented Jul 7, 2018

@AndyAyersMS You can assign this one to me, I can try to take a look. I assume it won't be too hard as it just looks like the optimization of the -1 multiplication ends up with the wrong node type.

@AndyAyersMS

This comment has been minimized.

Show comment
Hide comment
@AndyAyersMS

AndyAyersMS Jul 7, 2018

Member

Sure, will do.

Member

AndyAyersMS commented Jul 7, 2018

Sure, will do.

@jakobbotsch

This comment has been minimized.

Show comment
Hide comment
@jakobbotsch

jakobbotsch Jul 7, 2018

Collaborator

Here is a case that repros in both x86 and x64 RyuJIT:

public class Program
{
    static byte s_74 = 1;
    public static void Main()
    {
        int vr24 = -1 * s_74;
        System.Console.WriteLine((byte)vr24); // 255 in debug, -1 in release
    }
}

The temp for vr24 incorrectly gets type TYP_UBYTE, so the byte cast is just removed by fgMorphCast.
It looks simple to fix by just typing the NEG node correctly. I will submit a fix in a few days once I have some time to get diffs.

Collaborator

jakobbotsch commented Jul 7, 2018

Here is a case that repros in both x86 and x64 RyuJIT:

public class Program
{
    static byte s_74 = 1;
    public static void Main()
    {
        int vr24 = -1 * s_74;
        System.Console.WriteLine((byte)vr24); // 255 in debug, -1 in release
    }
}

The temp for vr24 incorrectly gets type TYP_UBYTE, so the byte cast is just removed by fgMorphCast.
It looks simple to fix by just typing the NEG node correctly. I will submit a fix in a few days once I have some time to get diffs.

jakobbotsch added a commit to jakobbotsch/coreclr that referenced this issue Jul 11, 2018

Properly type optimized NEG nodes
When the JIT was morphing trees like '-1 * expr', it would turn the
multiplication into a NEG node with the same type as its right operand.
This is a problem when the right operand was a small type like TYP_UBYTE
because the NEG node always produces a widened result. This could cause
problems when the negation was fed into something that depended on the
type, such as a cast to TYP_UBYTE: here the JIT would conclude that the
cast could be dropped, and end up producing a widened result.

The solution is to give the tree the same type as the result of the
original multiplication.

Also add a test for this case and for a similar case of '0 - expr',
which already had a fix.

Fix #18780

jakobbotsch added a commit to jakobbotsch/coreclr that referenced this issue Jul 11, 2018

Properly type optimized NEG nodes
When the JIT was morphing trees like '-1 * expr', it would turn the
multiplication into a NEG node with the same type as its right operand.
This is a problem when the right operand was a small type like TYP_UBYTE
because the NEG node always produces a widened result. This could cause
problems when the negation was fed into something that depended on the
type, such as a cast to TYP_UBYTE: here the JIT would conclude that the
cast could be dropped, and end up producing a widened result.

The solution is to give the tree the actual type of the NEG node.

Also add a test for this case and for a similar case of '0 - expr',
which already had a fix.

Fix #18780

@erozenfeld erozenfeld closed this in #18837 Jul 18, 2018

erozenfeld added a commit that referenced this issue Jul 18, 2018

Properly type morphed NEG nodes (#18837)
* Properly type optimized NEG nodes

When the JIT was morphing trees like '-1 * expr', it would turn the
multiplication into a NEG node with the same type as its right operand.
This is a problem when the right operand was a small type like TYP_UBYTE
because the NEG node always produces a widened result. This could cause
problems when the negation was fed into something that depended on the
type, such as a cast to TYP_UBYTE: here the JIT would conclude that the
cast could be dropped, and end up producing a widened result.

The solution is to give the tree the actual type of the NEG node.

Also add a test for this case and for a similar case of '0 - expr',
which already had a fix.

Fix #18780

* Address PR feedback

* Clarify comment
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment