Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT doesn't elide some "this != null" checks when accessing fixed-size buffers in structs #49180

Closed
GrabYourPitchforks opened this issue Mar 4, 2021 · 3 comments · Fixed by #49238
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
Milestone

Comments

@GrabYourPitchforks
Copy link
Member

When accessing fixed-sized buffers inside a struct's instance methods, the JIT sometimes unnecessarily includes defenses against the case where "this == nullptr". In one-off instance methods this isn't a huge problem, but when calling these instance methods within hot loops it bloats codegen and lowers throughput.

Repro code, demonstrating the problem outside of a loop:

using System;
using System.Runtime.CompilerServices;

public class C {
    private int _i;
    private MyStruct _mys;
    
    public void M(uint index) {
        var _ = _i; // force 'this' to be dereferenced, so JIT can propagate this != null
        Console.WriteLine(_mys.GetValue(index));
    }
    
    private unsafe struct MyStruct
    {
        private fixed byte Buffer[8];
        
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public byte GetValue(uint index)
        {
            if (index < 8)
            {
                return Buffer[index];
            }
            else
            {
                return 0;
            }
        }
    }
}

x64 codegen, courtesy SharpLab:

C.M(UInt32)
    L0000: mov eax, [rcx+8]  ; dereference this._i, which should allow JIT to propagate "this != nullptr" below
    L0003: add rcx, 0x10
    L0007: cmp edx, 8
    L000a: jae short L0016
    L000c: cmp [rcx], ecx  ; unnecessary check for "ref _mys.Buffer is <unaddressable>", which should be elided by L0000 above
    L000e: mov eax, edx
    L0010: movzx ecx, byte ptr [rcx+rax]
    L0014: jmp short L0018
    L0016: xor ecx, ecx
    L0018: jmp System.Console.WriteLine(Int32)

If the helper method is made an instance method of the class which has MyStruct as a field - rather than an instance method of MyStruct itself - the check is properly elided, as shown below.

using System;
using System.Runtime.CompilerServices;

public class D {
    private int _i;
    private MyStruct _mys;
    
    public void N(uint index)
    {
        var _ = _i; // force 'this' to be dereferenced, so JIT can propagate this != null
        Console.WriteLine(GetValue(index));
    }
    
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private unsafe int GetValue(uint index)
    {
        if (index < 8)
        {
            return _mys.Buffer[index];
        }
        else
        {
            return 0;
        }
    }
    
    private unsafe struct MyStruct
    {
        internal fixed byte Buffer[8];
    }
}
D.N(UInt32)
    L0000: mov eax, [rcx+8]  ; deref this
    L0003: cmp edx, 8
    L0006: jae short L0011
    L0008: mov eax, edx
    L000a: movzx ecx, byte ptr [rcx+rax+0x10]  ; no additional guard needed before deref into _mys.Buffer
    L000f: jmp short L0013
    L0011: xor ecx, ecx
    L0013: jmp System.Console.WriteLine(Int32)

Moving this logic down is a viable workaround, but it harms code maintainability. It means that I can no longer keep the unsafe code wholly self-contained within the struct. It must instead be split across a few different types, and it leaks implementation details to callers who really shouldn't need to deal with unsafe concepts.

@GrabYourPitchforks GrabYourPitchforks added tenet-performance Performance related issue area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI labels Mar 4, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Mar 4, 2021
@AndyAyersMS
Copy link
Member

Seems like optAssertionProp_Ind could check if the address VN decomposes into an Add, not that the tree is literally an add.

Here we have and indir though tmp2, where we know:

N001 [000044]   LCL_VAR   V04 tmp2         u:1 => $141 {ADD($80, $101)}

In BB01 New Global Constant Assertion: (128, 0) ($80,$0) V00.01 != null index=#01, mask=0000000000000001

but we fail to deduce that

N002 (  2,  2) [000045] ---X---N----                    |  +--*  NULLCHECK byte   $241
N001 (  1,  1) [000044] ------------                    |  |  \--*  LCL_VAR   byref  V04 tmp2         u:1 $141

can be optimized away.

@AndyAyersMS
Copy link
Member

Seems to work out as expected... will kick off more detailed testing.

;  V01 arg1         [V01,T01] (  4,  3.50)     int  ->  rdx
 ;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [rsp+0x00]   "OutgoingArgSpace"
 ;  V03 tmp1         [V03,T03] (  3,  2   )   ubyte  ->  rcx         "Inline return value spill temp"
-;  V04 tmp2         [V04,T02] (  3,  4   )   byref  ->  rcx         "Inlining Arg"
+;  V04 tmp2         [V04,T02] (  2,  3   )   byref  ->  rcx         "Inlining Arg"
 ;
 ; Lcl frame size = 0

@@ -22,11 +22,10 @@ G_M44576_IG02:
        jae      SHORT G_M44576_IG04
                                                ;; bbWeight=1    PerfScore 3.50
 G_M44576_IG03:
-       cmp      dword ptr [rcx], ecx
        mov      eax, edx
        movzx    rcx, byte  ptr [rcx+rax]
        jmp      SHORT G_M44576_IG05
-                                               ;; bbWeight=0.50 PerfScore 3.12
+                                               ;; bbWeight=0.50 PerfScore 2.12
 G_M44576_IG04:
        xor      ecx, ecx
                                                ;; bbWeight=0.50 PerfScore 0.12
@@ -34,7 +33,7 @@ G_M44576_IG05:
        jmp      Console:WriteLine(int)
                                                ;; bbWeight=1    PerfScore 2.00

-; Total bytes of code 29, prolog size 0, PerfScore 11.65, instruction count 10, allocated bytes for code 29 (MethodHash=7cd251df) for method C:M(int):this
+; Total bytes of code 27, prolog size 0, PerfScore 10.45, instruction count 9, allocated bytes for code 27 (MethodHash=7cd251df) for method C:M(int):this

@AndyAyersMS AndyAyersMS removed the untriaged New issue has not been triaged by the area owner label Mar 5, 2021
@AndyAyersMS AndyAyersMS self-assigned this Mar 5, 2021
@AndyAyersMS AndyAyersMS added this to the 6.0.0 milestone Mar 5, 2021
@AndyAyersMS AndyAyersMS added this to Needs Triage in .NET Core CodeGen via automation Mar 5, 2021
@AndyAyersMS AndyAyersMS moved this from Needs Triage to Optimizations in .NET Core CodeGen Mar 5, 2021
@AndyAyersMS
Copy link
Member

Note there are likely more places in assertion prop where we miss things by looking at the IR not destructuring the VNs instead.

AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this issue Mar 5, 2021
In addition to checking for assertions based on the VN of an address, try and
destructure the VN to find the "base" address, and check its VNs as well.

This lets us get rid of some extra null checks, typically ones that are at
an offset from an existing non-null pointer.

Closes dotnet#49180.
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Mar 5, 2021
.NET Core CodeGen automation moved this from Optimizations to Done Mar 6, 2021
AndyAyersMS added a commit that referenced this issue Mar 6, 2021
In addition to checking for assertions based on the VN of an address, try and
destructure the VN to find the "base" address, and check its VNs as well.

This lets us get rid of some extra null checks, typically ones that are at
an offset from an existing non-null pointer.

Closes #49180.
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Mar 6, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Apr 5, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

2 participants