New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deterministic program outputs indeterministic results on Linux in release #18522

Closed
jakobbotsch opened this Issue Jun 17, 2018 · 11 comments

Comments

Projects
None yet
5 participants
@jakobbotsch
Collaborator

jakobbotsch commented Jun 17, 2018

On Ubuntu 14.04 using current HEAD (71f4199), the following program outputs 0 in debug but seemingly random values in release. On Windows, both debug and release output 0.

struct S0
{
    public ushort F0;
}

struct S1
{
    public S0 F0;
    public ushort F1;
}

public class Program
{
    static S1 s_36;
    public static void Main()
    {
        s_36.F0 = M113();
        System.Console.WriteLine(s_36.F1);
    }

    static S0 M113()
    {
        return new S0();
    }
}

@jkotas jkotas added this to the 3.0 milestone Jun 17, 2018

@AndyAyersMS

This comment has been minimized.

Show comment
Hide comment
@AndyAyersMS

AndyAyersMS Jun 19, 2018

Member

I'll take an initial look.

Member

AndyAyersMS commented Jun 19, 2018

I'll take an initial look.

@AndyAyersMS

This comment has been minimized.

Show comment
Hide comment
@AndyAyersMS

AndyAyersMS Jun 19, 2018

Member

Looks like on Linux we widen a store inappropriately and clobber an adjoining field.

;;; Main (windows)
       48B988EF4697FD7F0000 mov      rcx, 0x7FFD9746EF88
       BA03000000           mov      edx, 3
       E8583F805F           call     CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
       48B9D02BAB5BE8010000 mov      rcx, 0x1E85BAB2BD0
       488B09               mov      rcx, gword ptr [rcx]
       4883C108             add      rcx, 8
       66C7010000           mov      word  ptr [rcx], 0        // zero s_36.F0 (2 bytes)

;;; Main (Linux)

       48BF88EF4497FD7F0000 mov      rdi, 0x7FFD9744EF88
       BE03000000           mov      esi, 3
       E800000000           call     CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
       48BFD02B93E114020000 mov      rdi, 0x214E1932BD0
       488B3F               mov      rdi, gword ptr [rdi]
       4883C708             add      rdi, 8
       66C74424100000       mov      word  ptr [rsp+10H], 0
       66C74424100000       mov      word  ptr [rsp+10H], 0
       8B442410             mov      eax, dword ptr [rsp+10H]
       8907                 mov      dword ptr [rdi], eax      // zero 4 bytes... ??

Looking at the jit dumps now....

Member

AndyAyersMS commented Jun 19, 2018

Looks like on Linux we widen a store inappropriately and clobber an adjoining field.

;;; Main (windows)
       48B988EF4697FD7F0000 mov      rcx, 0x7FFD9746EF88
       BA03000000           mov      edx, 3
       E8583F805F           call     CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
       48B9D02BAB5BE8010000 mov      rcx, 0x1E85BAB2BD0
       488B09               mov      rcx, gword ptr [rcx]
       4883C108             add      rcx, 8
       66C7010000           mov      word  ptr [rcx], 0        // zero s_36.F0 (2 bytes)

;;; Main (Linux)

       48BF88EF4497FD7F0000 mov      rdi, 0x7FFD9744EF88
       BE03000000           mov      esi, 3
       E800000000           call     CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
       48BFD02B93E114020000 mov      rdi, 0x214E1932BD0
       488B3F               mov      rdi, gword ptr [rdi]
       4883C708             add      rdi, 8
       66C74424100000       mov      word  ptr [rsp+10H], 0
       66C74424100000       mov      word  ptr [rsp+10H], 0
       8B442410             mov      eax, dword ptr [rsp+10H]
       8907                 mov      dword ptr [rdi], eax      // zero 4 bytes... ??

Looking at the jit dumps now....

@AndyAyersMS

This comment has been minimized.

Show comment
Hide comment
@AndyAyersMS

AndyAyersMS Jun 19, 2018

Member

Seems like this happens in impAssignStructPtr where because the struct returning call has been retyped as int we somehow propagate that to the subsequent assign:

;;; windows
    [ 2]  10 (0x00a) stfld 04000002
.... generating stfld assign
               [000018] --C---------              *  RET_EXPR  struct(inl return from call [000012])

.... to ...
               [000019] ---XG-------              *  FIELD     struct F0
               [000017] ------------              \--*  LCL_VAR   byref  V01 tmp1         


               [000023] ------------              *  STMT      void  (IL   ???...  ???)
               [000018] --C---------              |  /--*  RET_EXPR  int   (inl return from call [000012])
               [000022] -ACXG-------              \--*  ASG       short 
               [000021] *--XG-------                 \--*  IND       short 
               [000020] ---XG-------                    \--*  ADDR      byref 
               [000019] ---XG-------                       \--*  FIELD     struct F0
               [000017] ------------                          \--*  LCL_VAR   byref  V01 tmp1 

;;; linux

    [ 2]  10 (0x00a) stfld 04000002
.... generating stfld assign
               [000018] --C---------              *  RET_EXPR  struct(inl return from call [000012])

.... to ...
               [000019] ---XG-------              *  FIELD     struct F0
               [000017] ------------              \--*  LCL_VAR   byref  V01 tmp1         


               [000023] ------------              *  STMT      void  (IL   ???...  ???)
               [000018] --C---------              |  /--*  RET_EXPR  int   (inl return from call [000012])
               [000022] -ACXG-------              \--*  ASG       int   
               [000021] *--XG-------                 \--*  IND       int   
               [000020] ---XG-------                    \--*  ADDR      byref 
               [000019] ---XG-------                       \--*  FIELD     struct F0
               [000017] ------------  

Lots of convoluted logic in this method....

Member

AndyAyersMS commented Jun 19, 2018

Seems like this happens in impAssignStructPtr where because the struct returning call has been retyped as int we somehow propagate that to the subsequent assign:

;;; windows
    [ 2]  10 (0x00a) stfld 04000002
.... generating stfld assign
               [000018] --C---------              *  RET_EXPR  struct(inl return from call [000012])

.... to ...
               [000019] ---XG-------              *  FIELD     struct F0
               [000017] ------------              \--*  LCL_VAR   byref  V01 tmp1         


               [000023] ------------              *  STMT      void  (IL   ???...  ???)
               [000018] --C---------              |  /--*  RET_EXPR  int   (inl return from call [000012])
               [000022] -ACXG-------              \--*  ASG       short 
               [000021] *--XG-------                 \--*  IND       short 
               [000020] ---XG-------                    \--*  ADDR      byref 
               [000019] ---XG-------                       \--*  FIELD     struct F0
               [000017] ------------                          \--*  LCL_VAR   byref  V01 tmp1 

;;; linux

    [ 2]  10 (0x00a) stfld 04000002
.... generating stfld assign
               [000018] --C---------              *  RET_EXPR  struct(inl return from call [000012])

.... to ...
               [000019] ---XG-------              *  FIELD     struct F0
               [000017] ------------              \--*  LCL_VAR   byref  V01 tmp1         


               [000023] ------------              *  STMT      void  (IL   ???...  ???)
               [000018] --C---------              |  /--*  RET_EXPR  int   (inl return from call [000012])
               [000022] -ACXG-------              \--*  ASG       int   
               [000021] *--XG-------                 \--*  IND       int   
               [000020] ---XG-------                    \--*  ADDR      byref 
               [000019] ---XG-------                       \--*  FIELD     struct F0
               [000017] ------------  

Lots of convoluted logic in this method....

@AndyAyersMS

This comment has been minimized.

Show comment
Hide comment
@AndyAyersMS

AndyAyersMS Jun 19, 2018

Member

Similar issue if M113 is marked noinline -- so there are at least two clauses in impAssignStructPtr with faulty logic (for GT_CALL and GT_RET_EXPR).

Member

AndyAyersMS commented Jun 19, 2018

Similar issue if M113 is marked noinline -- so there are at least two clauses in impAssignStructPtr with faulty logic (for GT_CALL and GT_RET_EXPR).

@AndyAyersMS

This comment has been minimized.

Show comment
Hide comment
@AndyAyersMS

AndyAyersMS Jun 19, 2018

Member

Still not sure where to actually try and fix this one.

I wonder about the asymmetry in getArgTypeForStruct and getReturnTypeForStruct: the former carefully preserves short types, the latter doesn't.... (though on windows we also retype the call to return int).

@CarolEidt any ideas?
cc @dotnet/jit-contrib

Member

AndyAyersMS commented Jun 19, 2018

Still not sure where to actually try and fix this one.

I wonder about the asymmetry in getArgTypeForStruct and getReturnTypeForStruct: the former carefully preserves short types, the latter doesn't.... (though on windows we also retype the call to return int).

@CarolEidt any ideas?
cc @dotnet/jit-contrib

@CarolEidt

This comment has been minimized.

Show comment
Hide comment
@CarolEidt

CarolEidt Jun 19, 2018

Member

I don't have any real ideas. I would certainly suspect that getArgTypeForStruct and getReturnTypeForStruct should behave similarly (at least when the passing/returning rules are the same).

Member

CarolEidt commented Jun 19, 2018

I don't have any real ideas. I would certainly suspect that getArgTypeForStruct and getReturnTypeForStruct should behave similarly (at least when the passing/returning rules are the same).

@briansull

This comment has been minimized.

Show comment
Hide comment
@briansull

briansull Jun 19, 2018

Contributor

For Args we have to read them as small type and the upper bits can be garbage. (Managed code won't pass uninitialized bits in the upper bits of an argument, but C++ code can and does)
For Returns in managed code we always zero/sign extend the short type so that it returns a register sized value.

Contributor

briansull commented Jun 19, 2018

For Args we have to read them as small type and the upper bits can be garbage. (Managed code won't pass uninitialized bits in the upper bits of an argument, but C++ code can and does)
For Returns in managed code we always zero/sign extend the short type so that it returns a register sized value.

@AndyAyersMS

This comment has been minimized.

Show comment
Hide comment
@AndyAyersMS

AndyAyersMS Jun 19, 2018

Member

Sure, but why then retype consumers of the returns...?

I can probably hack what is going on in impAssignStructPtr to recover the typing information we just threw away in impFixupCallStructReturn but that seems like just paving over one problem with another.

One of my struggles here is that I have a deep seated belief that we should not be mixing in ABI concerns this early in the phase ordering. It's hard enough just getting a consistent view of things during importation. Trying to also bridge between logical and physical notions (what is being passed/returned vs how it is being passed/returned) at the same time seems to me to just be asking for problems like this...

Member

AndyAyersMS commented Jun 19, 2018

Sure, but why then retype consumers of the returns...?

I can probably hack what is going on in impAssignStructPtr to recover the typing information we just threw away in impFixupCallStructReturn but that seems like just paving over one problem with another.

One of my struggles here is that I have a deep seated belief that we should not be mixing in ABI concerns this early in the phase ordering. It's hard enough just getting a consistent view of things during importation. Trying to also bridge between logical and physical notions (what is being passed/returned vs how it is being passed/returned) at the same time seems to me to just be asking for problems like this...

@CarolEidt

This comment has been minimized.

Show comment
Hide comment
@CarolEidt

CarolEidt Jun 19, 2018

Member

I have a deep seated belief that we should not be mixing in ABI concerns this early in the phase ordering

I agree - I think we need to take ABI constraints into consideration when making struct promotion decisions, and possibly elsewhere. But retyping expressions in this way has caused numerous issues over the years. IMO the transformations done to normalize small types should be done just once (presumably in morph, not in the importer), so that the logic doesn't need to be repeatedly duplicated. To me it would make more sense for the importer to simply insert explicit casts where appropriate, and then allow morph to optimize them away where possible.

Member

CarolEidt commented Jun 19, 2018

I have a deep seated belief that we should not be mixing in ABI concerns this early in the phase ordering

I agree - I think we need to take ABI constraints into consideration when making struct promotion decisions, and possibly elsewhere. But retyping expressions in this way has caused numerous issues over the years. IMO the transformations done to normalize small types should be done just once (presumably in morph, not in the importer), so that the logic doesn't need to be repeatedly duplicated. To me it would make more sense for the importer to simply insert explicit casts where appropriate, and then allow morph to optimize them away where possible.

@AndyAyersMS

This comment has been minimized.

Show comment
Hide comment
@AndyAyersMS

AndyAyersMS Jun 19, 2018

Member

FWIW on windows we also realize we're going to return the type in a register, but we've set gtReturnType to short.

So that suggests that the right path is perhaps to modify getReturnTypeForStruct to preserve short integer types.

Member

AndyAyersMS commented Jun 19, 2018

FWIW on windows we also realize we're going to return the type in a register, but we've set gtReturnType to short.

So that suggests that the right path is perhaps to modify getReturnTypeForStruct to preserve short integer types.

AndyAyersMS added a commit to AndyAyersMS/coreclr that referenced this issue Jun 20, 2018

JIT: fix bug returning small structs on linux x64
The jit was retyping all calls with small struct returns as 32 bit ints
when generating code for linux x64. When the results of those calls were
assigned to fields the jit would use 32 bit stores which could corrupt
neighboring fields.

The fix is to keep the small int types for 8 and 16 byte structs so that
the corresponding stores are the right size.

Fixes #18522.
@AndyAyersMS

This comment has been minimized.

Show comment
Hide comment
@AndyAyersMS

AndyAyersMS Jun 21, 2018

Member

Think I have the return via temp part working, but looking at where things have ended up it seems like we might as well just re-fetch the struct size from the VM and check it in impFixupCallStructReturn instead of calling IsEnclosingType on the return type descriptor -- that lets us back out a bunch of the other changes including SPK_EnclosingType and leaves what should be a fairly surgical change.

So am going to work on that next.

Still haven't looked at ARM64, two reg return cases, updated tests lists, etc.

Member

AndyAyersMS commented Jun 21, 2018

Think I have the return via temp part working, but looking at where things have ended up it seems like we might as well just re-fetch the struct size from the VM and check it in impFixupCallStructReturn instead of calling IsEnclosingType on the return type descriptor -- that lets us back out a bunch of the other changes including SPK_EnclosingType and leaves what should be a fairly surgical change.

So am going to work on that next.

Still haven't looked at ARM64, two reg return cases, updated tests lists, etc.

AndyAyersMS added a commit to AndyAyersMS/coreclr that referenced this issue Jun 25, 2018

JIT: fix bug returning small structs on linux x64
The jit was retyping all calls with small struct returns as 32 bit ints
when generating code for linux x64. When the results of those calls were
assigned to fields the jit would use 32 bit stores which could corrupt
neighboring fields.

The fix is to keep the small int types for 8 and 16 byte structs so that
the corresponding stores are the right size.

Fixes #18522.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment