Suboptimal codegen for UIntPtr in contrast to nuint #54463

@MichalPetryka

Description

Casting a nuint to uint/ulong and passing the result to a method call is optimized into a tailcall on x64, but the same code using UIntPtr is not, as can be seen here:
SharpLab x64

On x86 the codegen for both is the same (x86 doesn't have tailcalls afaik):
SharpLab x86

using System.Runtime.CompilerServices;
using System;

public static unsafe class Test
{
    public static void A(nuint n)
    {
        C((ulong)n);
    }
    
    public static void B(nuint n)
    {
        if (sizeof(nuint) == sizeof(uint))
            C((uint)n);
        else
            C((ulong)n);
    }
    
    public static void X(UIntPtr n)
    {
        C((ulong)n);
    }
    
    public static void Y(UIntPtr n)
    {
        if (UIntPtr.Size == sizeof(uint))
            C((uint)n);
        else
            C((ulong)n);
    }
    
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void C(uint n) {  }
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void C(ulong n) {  }
}

Configuration

Tested on SharpLab with Core CLR v5.0.621.22011 on amd64

Regression?

I haven't tested older versions, only the one on SharpLab.

Data

x64 codegen: SharpLab x64
x86 codegen: SharpLab x86

Analysis

Analysis from SingleAccretion#3645 on the dotnet discord:

So the direct reason is x64 supports implicit tailcall optimization here, while x86 does not.
Now the real question is: why doesn't it tailcall both on x64?
The reason is that UIntPtr carries some cruft that nuint does not: casts from the former actually go through framework methods.
Usually that's OK because the methods are inlined and the resulting codegen is the same.
In this case something blocked the tailcall opt, and it is not clear what.
Here's the IR in morph:

fgMorphTree BB01, STMT00003 (before)
               [000009] -A----------              *  ASG       long  
               [000008] D------N----              +--*  LCL_VAR   long   V02 tmp1         
               [000000] ------------              \--*  LCL_VAR   long   V00 arg0         
Notify VM instruction set (SSE2) must be supported.
GenTreeNode creates assertion:
               [000009] -A----------              *  ASG       long  
In BB01 New Local Copy     Assertion: V02 == V00 index=#01, mask=0000000000000001

fgMorphTree BB01, STMT00001 (before)
               [000003] --C-G-------              *  CALL      void   RyuJitReproduction.Program.C
               [000007] ------------ arg0         \--*  LCL_VAR   long   V02 tmp1         

Rejecting tail call in morph for call [000003]: Local address taken V02

Suspicion is this early tree:

LocalAddressVisitor visiting statement:
STMT00001 (IL   ???...  ???)
               [000003] --C-G-------              *  CALL      void   RyuJitReproduction.Program.C
               [000007] ----G------- arg0         \--*  FIELD     long   _value
               [000006] ------------                 \--*  ADDR      byref 
               [000005] -------N----                    \--*  LCL_VAR   long   V02 tmp1         
Replacing the field in normed struct with local var V02
LocalAddressVisitor modified statement:
STMT00001 (IL   ???...  ???)
               [000003] --C-G-------              *  CALL      void   RyuJitReproduction.Program.C
               [000007] ------------ arg0         \--*  LCL_VAR   long   V02 tmp1         

But V02 shouldn't have been marked as address-taken...
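
To illustrate the "casts go through framework methods" point, here is a minimal sketch that mimics the shape involved. WrappedPtr is a hypothetical stand-in, not the actual UIntPtr source; the only detail taken from the dump above is that the struct has a single field named _value. The names Sketch and Last are made up for the example, and it assumes C# 9 with unsafe code enabled.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical stand-in for UIntPtr: a single-field struct whose cast to
// ulong is an ordinary static method (op_Explicit), not an IL conversion.
public readonly unsafe struct WrappedPtr
{
    private readonly void* _value;

    public WrappedPtr(ulong value) => _value = (void*)value;

    // The cast reads the field out of its struct argument.
    public static explicit operator ulong(WrappedPtr value) => (ulong)value._value;
}

public static class Sketch
{
    // Records the last value passed to C so the calls can be observed.
    public static ulong Last;

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void C(ulong n) => Last = n;

    // Same shape as Test.X: the cast is a call to op_Explicit. Once that
    // call is inlined, the struct argument presumably lives in a temp
    // (compare V02 tmp1 and the FIELD _value node in the dump).
    public static void X(WrappedPtr n) => C((ulong)n);

    // Same shape as Test.A: the cast is a plain conversion, no method call.
    public static void A(nuint n) => C((ulong)n);
}
```

Both X and A pass the same value to C; the difference of interest is only in the IR the JIT builds along the way. Whether this stand-in reproduces the rejected tailcall has not been checked, it only shows the pattern the analysis refers to.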

Labels

area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI), tenet-performance (Performance related issue)