Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RyuJIT: Argument written to stack too early on Linux #19256

Closed
jakobbotsch opened this Issue Aug 2, 2018 · 8 comments

Comments

Projects
None yet
4 participants
@jakobbotsch
Copy link
Collaborator

jakobbotsch commented Aug 2, 2018

On Linux x64, the following program prints 1 1234 in release, but should print 1 2.

struct S2
{
    public uint F0;
    public ulong F1, F2;
    public S2(uint f0): this() { F0 = f0; }
}

public class Program
{
    static S2 s_one = new S2(1);
    static S2 s_two = new S2(2);
    public static void Main()
    {
        M28(s_two, M28(s_one, s_one));
    }

    static ref S2 M28(S2 arg0, S2 arg1)
    {
        System.Console.WriteLine(arg0.F0);
        arg0.F0 = 1234; // this is printed in next invocation
        System.GC.KeepAlive(arg0); // ensure that assignment isn't removed
        return ref s_one;
    }
}

The disassembly shows that RyuJIT does not properly pass the first argument to the outer call; it is written to the stack before the inner call:

; Assembly listing for method Program:Main()
; Emitting BLENDED_CODE for X64 CPU with AVX
; optimized code
; rbp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 OutArgs      [V00    ] (  1,  1   )  lclBlk (48) [rsp+0x00]
;* V01 tmp1         [V01    ] (  0,  0   )  struct (24) zero-ref
;  V02 tmp2         [V02,T02] (  5,  5   )     int  ->  rdi         V01.F0(offs=0x00) P-INDEP
;  V03 tmp3         [V03,T03] (  5,  5   )    long  ->  rsi         V01.F1(offs=0x08) P-INDEP
;  V04 tmp4         [V04,T04] (  5,  5   )    long  ->  rax         V01.F2(offs=0x10) P-INDEP
;  V05 tmp5         [V05,T01] (  4,  8   )   byref  ->  rax
;  V06 cse0         [V06,T00] ( 11, 11   )   byref  ->  rax
;  V07 cse1         [V07,T05] (  3,  3   )    long  ->  [rbp-0x08]
;
; Lcl frame size = 64

G_M5092_IG01:
       55                   push     rbp
       4883EC40             sub      rsp, 64
       C5F877               vzeroupper
       488D6C2440           lea      rbp, [rsp+40H]

G_M5092_IG02:
       48BF98463F9E4C7F0000 mov      rdi, 0x7F4C9E3F4698
       BE02000000           mov      esi, 2
       E87F0F9F78           call     CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
       48B8180900904C7F0000 mov      rax, 0x7F4C90000918                 
       488B00               mov      rax, gword ptr [rax]
       4883C008             add      rax, 8                              ; rax = address of s_two
       8B38                 mov      edi, dword ptr [rax]                |
       488B7008             mov      rsi, qword ptr [rax+8]              |
       488B4010             mov      rax, qword ptr [rax+16]             |
       893C24               mov      dword ptr [rsp], edi                |
       4889742408           mov      qword ptr [rsp+08H], rsi            |
       4889442410           mov      qword ptr [rsp+10H], rax            | write s_two as first arg (too early)
       48B8100900904C7F0000 mov      rax, 0x7F4C90000910
       488B00               mov      rax, gword ptr [rax]
       4883C008             add      rax, 8                              ; rax = address of s_one
       C4E17A6F00           vmovdqu  xmm0, qword ptr [rax]               |
       C4E17A7F0424         vmovdqu  qword ptr [rsp], xmm0               |
       488B7810             mov      rdi, qword ptr [rax+16]             |
       48897C2410           mov      qword ptr [rsp+10H], rdi            ; write s_one as first arg
       C4E17A6F00           vmovdqu  xmm0, qword ptr [rax]               |
       C4E17A7F442418       vmovdqu  qword ptr [rsp+18H], xmm0           |
       488B7810             mov      rdi, qword ptr [rax+16]             |
       48897C2428           mov      qword ptr [rsp+28H], rdi            ; write s_one as second arg
       E888FBFFFF           call     Program:M28(struct,struct):byref    ; inner call
       C4E17A6F00           vmovdqu  xmm0, qword ptr [rax]               |
       C4E17A7F442418       vmovdqu  qword ptr [rsp+18H], xmm0           |
       488B7810             mov      rdi, qword ptr [rax+16]             |
       48897C2428           mov      qword ptr [rsp+28H], rdi            | write second arg
       E86EFBFFFF           call     Program:M28(struct,struct):byref    ; outer call. First arg was passed too early
       90                   nop

G_M5092_IG03:
       488D6500             lea      rsp, [rbp]
       5D                   pop      rbp
       C3                   ret

; Total bytes of code 169, prolog size 13 for method Program:Main()

This does not repro on Windows, which is likely related to the fact that both args are passed by reference there.

@BruceForstall BruceForstall added this to the 3.0 milestone Aug 3, 2018

@BruceForstall

This comment has been minimized.

Copy link
Member

BruceForstall commented Aug 3, 2018

@jakobbotsch

This comment has been minimized.

Copy link
Collaborator Author

jakobbotsch commented Nov 10, 2018

Not surprisingly this can also cause RyuJIT to generate code that crashes the process unconditionally:

class C0
{
    public uint F0;
    public ulong F1;
    public byte F2;
}

struct S1
{
    public ushort F1;
    public int F2;
    public C0 F3;
    public uint F4;
    public S1(C0 f3): this()
    {
        F3 = f3;
    }
}

public class Program
{
    static S1 s_21 = new S1(new C0());
    static byte[][][] s_31 = new byte[][][]{new byte[][]{new byte[]{0}}};
    static S1 s_50 = new S1(new C0());
    static C0 s_74 = new C0();
    static C0 s_78 = new C0();
    public static void Main()
    {
        var vr21 = s_78.F1;
        var vr22 = s_31[0][0];
        var vr23 = new C0();
        uint vr37 = s_50.F3.F0;
        var vr25 = new C0();
        var vr26 = s_74.F2;
        M5(s_21, M62(vr21, vr22, 0, vr37, vr23, vr25, vr26));
    }

    static C0[] M5(S1 arg0, S1 arg1)
    {
        ++arg0.F3.F1;
        return new C0[]{new C0()};
    }

    static ref S1 M62(ulong arg0, byte[] arg1, sbyte arg2, uint arg3, C0 arg4, C0 arg5, short arg6)
    {
        if (18148934694853299104UL == (ushort)(arg3 ^ 0))
        {
        }

        return ref s_21;
    }
}

which crashes in M5:

Thread 1 "corerun" received signal SIGSEGV, Segmentation fault.
0x00007fff84dc1dca in ?? ()
(gdb) x/20i 0x7fff84dc1dc0
   0x7fff84dc1dc0:      push   rbx
   0x7fff84dc1dc1:      nop    DWORD PTR [rax+0x0]
   0x7fff84dc1dc5:      mov    rdi,QWORD PTR [rsp+0x10]
=> 0x7fff84dc1dca:      inc    QWORD PTR [rdi+0x8]
   0x7fff84dc1dce:      movabs rdi,0x7fff83a575e0
   0x7fff84dc1dd8:      mov    esi,0x1
   0x7fff84dc1ddd:      call   0x7ffffd8a7f20 <JIT_NewArr1OBJ_MP_FastPortable(CORINFO_CLASS_STRUCT_*, long)>
   0x7fff84dc1de2:      mov    rbx,rax
   0x7fff84dc1de5:      movabs rdi,0x7fff83a558f8
   0x7fff84dc1def:      call   0x7ffffd8a5e40 <JIT_NewS_MP_FastPortable(CORINFO_CLASS_STRUCT_*)>
   0x7fff84dc1df4:      mov    rdx,rax
   0x7fff84dc1df7:      mov    rdi,rbx
   0x7fff84dc1dfa:      xor    esi,esi
   0x7fff84dc1dfc:      call   0x7ffffd9b1ae0 <JIT_Stelem_Ref>
   0x7fff84dc1e01:      mov    rax,rbx
   0x7fff84dc1e04:      pop    rbx
   0x7fff84dc1e05:      ret
   0x7fff84dc1e06:      add    BYTE PTR [rax],al
   0x7fff84dc1e08:      sbb    DWORD PTR [rcx],eax
   0x7fff84dc1e0a:      add    DWORD PTR [rax],eax
(gdb) info registers rdi
rdi            0x7fff00000000   140733193388032

The call sequence in Main looks like

G_M5096_IG04:
       488D3C24             lea      rdi, [rsp]
       488D75C8             lea      rsi, [rbp-38H]
       488B0E               mov      rcx, gword ptr [rsi]
       48890C24             mov      gword ptr [rsp], rcx
       4883C608             add      rsi, 8
       4883C708             add      rdi, 8
       48A5                 movsq
       48A5                 movsq
       44890424             mov      dword ptr [rsp], r8d
       488BFB               mov      rdi, rbx
       498BF6               mov      rsi, r14
       418BCC               mov      ecx, r12d
       4D8BC7               mov      r8, r15
       33D2                 xor      edx, edx
       E880FAFFFF           call     Program:M62(long,ref,byte,int,ref,ref,short):byref
       488D7C2418           lea      rdi, [rsp+18H]
       488BF0               mov      rsi, rax
       488B0E               mov      rcx, gword ptr [rsi]
       48894C2418           mov      gword ptr [rsp+18H], rcx
       4883C608             add      rsi, 8
       4883C708             add      rdi, 8
       48A5                 movsq
       48A5                 movsq
       E857FAFFFF           call     Program:M5(struct,struct):ref

which appears to be the same problem as in the original post, i.e. the code tries to pass the first arg to M5 too early.

@jakobbotsch

This comment has been minimized.

Copy link
Collaborator Author

jakobbotsch commented Nov 10, 2018

The problem seems to be that morph does not call EvalArgsToTemp if the function has only stack args:

coreclr/src/jit/morph.cpp

Lines 4262 to 4279 in 54fba14

// For UNIX_AMD64, the condition without hasStackArgCopy cannot catch
// all cases of fgMakeOutgoingStructArgCopy() being called. hasStackArgCopy
// is added to make sure to call EvalArgsToTemp.
if (!reMorphing && (call->fgArgInfo->HasRegArgs()))
{
// This is the first time that we morph this call AND it has register arguments.
// Follow into the code below and do the 'defer or eval to temp' analysis.
call->fgArgInfo->SortArgs();
call->fgArgInfo->EvalArgsToTemps();
// We may have updated the arguments
if (call->gtCallArgs)
{
UpdateGT_LISTFlags(call->gtCallArgs);
}
}

note there is a comment on it which seems outdated. It looks like the struct check was removed in #18358.

If a third simple int arg is added so that there is a register argument, or if the check is changed to

if (!reMorphing && (call->fgArgInfo->HasRegArgs() || call->fgArgInfo->HasStackArgs()))

then the bug disappears. Not sure if this check is right, however. Running diffs now...

cc @CarolEidt

@jakobbotsch

This comment has been minimized.

Copy link
Collaborator Author

jakobbotsch commented Nov 12, 2018

I did try that fix, though after reading some more of the code around there it is probably not the right fix. It looks more like the problem is somewhere in fgMakeOutgoingStructArgCopy, though I haven't looked closer.

Anyway there is an instance of this in frameworks:

Crossgen Diffs for System.Private.CoreLib.dll, framework assemblies for x64 linuxnonjit.dll
Summary:
(Lower is better)
Total bytes of diff: 114 (0,00 % of base)
    diff is a regression.
Top file regressions by size (bytes):
         114 : System.ComponentModel.TypeConverter.dasm (0,02 % of base)
1 total files with size differences (0 improved, 1 regressed), 128 unchanged.
Top method regressions by size (bytes):
         114 (2,04 % of base) : System.ComponentModel.TypeConverter.dasm - ColorConverter:ConvertTo(ref,ref,ref,ref):ref:this (2 methods)
Top method regressions by size (percentage):
         114 (2,04 % of base) : System.ComponentModel.TypeConverter.dasm - ColorConverter:ConvertTo(ref,ref,ref,ref):ref:this (2 methods)
1 total methods with size differences (0 improved, 1 regressed), 122948 unchanged.
Completed analysis in 21,65s

which is in this code:
https://github.com/dotnet/corefx/blob/9ae14ba629e7547b2d27180431cfd7f5a3c14709/src/System.ComponentModel.TypeConverter/src/System/Drawing/ColorConverter.cs#L61
Currently it generates:

       lea      rdi, [rsp]
       lea      rsi, [rbp-40H]
       mov      rcx, gword ptr [rsi]
       mov      gword ptr [rsp], rcx
       add      rsi, 8
       add      rdi, 8
       movsq    
       movsq    
       mov      rax, (reloc)
       call     bword ptr [rax]CORINFO_HELP_READYTORUN_STATIC_BASE
       lea      rdi, [rsp+18H]
       mov      rsi, rax
       mov      rcx, gword ptr [rsi]
       mov      gword ptr [rsp+18H], rcx
       add      rsi, 8
       add      rdi, 8
       movsq    
       movsq    
       mov      rax, (reloc)
       call     qword ptr [rax]Color:op_Equality(struct,struct):bool

I don't think this causes problems in this instance, but with the fix from above this becomes:

       mov      rax, (reloc)
       call     bword ptr [rax]CORINFO_HELP_READYTORUN_STATIC_BASE

G_M29275_IG09:
       movdqu   xmm0, qword ptr [rax]
       movdqu   qword ptr [rbp-98H], xmm0
       mov      rdi, qword ptr [rax+16]
       mov      qword ptr [rbp-88H], rdi

G_M29275_IG10:
       lea      rdi, [rsp+18H]
       lea      rsi, [rbp-98H]
       mov      rcx, gword ptr [rsi]
       mov      gword ptr [rsp+18H], rcx
       add      rsi, 8
       add      rdi, 8
       movsq    
       movsq    
       lea      rdi, [rsp]
       lea      rsi, [rbp-B0H]
       mov      rcx, gword ptr [rsi]
       mov      gword ptr [rsp], rcx
       add      rsi, 8
       add      rdi, 8
       movsq    
       movsq    
       mov      rax, (reloc)
       call     qword ptr [rax]Color:op_Equality(struct,struct):bool
@AndyAyersMS

This comment has been minimized.

Copy link
Member

AndyAyersMS commented Feb 18, 2019

I'll investigate.

@AndyAyersMS

This comment has been minimized.

Copy link
Member

AndyAyersMS commented Feb 19, 2019

This still repros. Looks like a regression from 2.2, likely from #18358.

If we have a call site with two or more on-stack args (and no other args), and the second of those args is a call that itself has an on-stack arg, we need to make sure to evaluate the calls before writing the other args to the stack.

There used to be an extra flag hasStackArgCopy feeding into the decision in fgMorphArgs to sort args and eval to temps; it was removed in #18358 though still referred to in comments. Suspect it avoided the issue here but perhaps also introduced unnecessary copies in other cases.

@CarolEidt perhaps best if you took a look.

CarolEidt added a commit to CarolEidt/coreclr that referenced this issue Feb 22, 2019

Fix condition for calling EvalArgsToTemps
`fgArgInfo::ArgsComplete()` checks for additional conditions requiring temps that are not checked for in the body of `fgMorphArgs()`. However, if there are no register args, it won't be called.

Fix dotnet#19256

CarolEidt added a commit to CarolEidt/coreclr that referenced this issue Feb 26, 2019

Fix condition for calling EvalArgsToTemps
`fgArgInfo::ArgsComplete()` checks for additional conditions requiring temps that are not checked for in the body of `fgMorphArgs()`. However, if there are no register args, it won't be called.

Fix dotnet#19256
@jakobbotsch

This comment has been minimized.

Copy link
Collaborator Author

jakobbotsch commented Mar 4, 2019

This marks the last JIT bug found by Fuzzlyn (for now) as fixed. 🎉
Congratulations and great job to all!

@CarolEidt

This comment has been minimized.

Copy link
Member

CarolEidt commented Mar 4, 2019

And many thanks to you @jakobbotsch for finding all these!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.