Remove invalid Unsafe.As from array helpers #99778

Merged (15 commits) on Mar 21, 2024

MichalPetryka
Contributor

This probably doesn't cause issues today but it might start to do so when we start inlining helpers.

Causes almost no diffs outside of calling a managed throw helper now.

@dotnet-policy-service bot added the community-contribution label (indicates that the PR has been added by a community member) on Mar 14, 2024
@jkotas
Member

jkotas commented Mar 15, 2024

public static unsafe void StelemRef(Array array, nint index, object obj) has a clone of this code with the same potential issue.

@MichalPetryka
Contributor Author

> has a clone of this code

What's the INPLACE_RUNTIME stuff there? Is it related to some old multimodule mode? (I'll change that file tomorrow)

@jkotas
Member

jkotas commented Mar 15, 2024

> What's the INPLACE_RUNTIME stuff there? Is it related to some old multimodule mode? (I'll change that file tomorrow)

I think we build with INPLACE_RUNTIME defined everywhere currently. !INPLACE_RUNTIME was a build configuration in ProjectN where the GC and the lowest level of the runtime (including these helpers) were built as a separate library that could be used by multiple higher-level runtimes all running in the same process.

@MichalPetryka
Contributor Author

> I think we build with INPLACE_RUNTIME defined everywhere currently. !INPLACE_RUNTIME was a build configuration in ProjectN where the GC and the lowest level of the runtime (including these helpers) were built as a separate library that could be used by multiple higher-level runtimes all running in the same process.

Do we want to clean up the code then and remove those paths later (and keep it as is for now)? Or should I also change the !INPLACE_RUNTIME path?

@jkotas
Member

jkotas commented Mar 16, 2024

> Do we want to clean up the code then and remove those paths later (and keep it as is for now)? Or should I also change the !INPLACE_RUNTIME path?

Any of these options is fine with me.

@jkotas
Member

jkotas commented Mar 16, 2024

Is the code for the helper the same or better with this change?

@MichalPetryka
Contributor Author

> Is the code for the helper the same or better with this change?

The codegen for the normal path is almost the same (the operands for cmp are swapped, but that doesn't matter).
The cold throw path is a bit bigger because it calls the managed throw helper instead of the native one:

 G_M56901_IG04:
-       call     CORINFO_HELP_RNGCHKFAIL
-						;; size=5 bbWeight=0 PerfScore 0.00
-G_M56901_IG05:
        mov      rax, 0xD1FFAB1E      ; code for System.Runtime.CompilerServices.CastHelpers:ThrowArrayMismatchException():byref
        call     [rax]System.Runtime.CompilerServices.CastHelpers:ThrowArrayMismatchException():byref
+						;; size=12 bbWeight=0 PerfScore 0.00
+G_M56901_IG05:
+       mov      rax, 0xD1FFAB1E      ; code for System.Runtime.CompilerServices.CastHelpers:ThrowIndexOutOfRangeException()
+       call     [rax]System.Runtime.CompilerServices.CastHelpers:ThrowIndexOutOfRangeException()
        int3     
 						;; size=13 bbWeight=0 PerfScore 0.00

@MichalPetryka
Contributor Author

@MihuBot

@MichalPetryka
Contributor Author

@jkotas Do you know what's going on with the CI refusing to run pipelines on PRs recently?

@jkotas
Member

jkotas commented Mar 16, 2024

> Do you know what's going on with the CI refusing to run pipelines on PRs recently?

Yes, it is broken. The people responsible for our eng system are not on-call during the weekend, so it may take till Monday to resolve.

It actually runs them; only the GitHub status is not updating. For example, here is the pipeline link for this PR:
https://dev.azure.com/dnceng-public/public/_build/results?buildId=605885&view=results

@MichalPetryka
Contributor Author

@MihuBot


#if INPLACE_RUNTIME
            if ((nuint)index >= (uint)array.Length)
                ThrowIndexOutOfRangeException(array);
Member

This can just use ThrowHelper.ThrowIndexOutOfRangeException given that this is for INPLACE_RUNTIME. It is unnecessary to go through the GetClasslibException indirection for INPLACE_RUNTIME. The main point of INPLACE_RUNTIME is to avoid these indirections.

Contributor Author

I wanted to remove the !INPLACE_RUNTIME code in the next PR and keep this as it was for now.
Also, I think that ThrowHelper is not in Test.CoreLib (not sure what that is for, tbh, but right now there's a build failure here because MemoryMarshal is not there either).

Member

Test.CoreLib is a very minimal implementation of CoreLib for testing, debugging and various experiments. It is fine to add minimal implementations of MemoryMarshal or ThrowHelper to it to keep it working.
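
As an illustration of the pattern in the snippet under review, here is a minimal sketch of a single unsigned bounds check paired with a separate throw helper. The names `BoundsSketch`, `IsInRange`, `ThrowIndexOutOfRange` and `Get` are hypothetical and are not the runtime's ThrowHelper or CastHelpers code; the sketch only assumes standard C# and BCL attributes.

```csharp
#nullable enable
using System;
using System.Diagnostics.CodeAnalysis;
using System.Runtime.CompilerServices;

public static class BoundsSketch
{
    // A negative nint reinterpreted as nuint becomes a very large value,
    // so one unsigned compare covers both index < 0 and index >= length.
    public static bool IsInRange(nint index, int length)
        => unchecked((nuint)index) < (uint)length;

    // Hypothetical throw helper: keeping the throw in its own non-inlined
    // method keeps the caller's hot path small.
    [DoesNotReturn]
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void ThrowIndexOutOfRange()
        => throw new IndexOutOfRangeException();

    public static T Get<T>(T[] array, nint index)
    {
        if (!IsInRange(index, array.Length))
            ThrowIndexOutOfRange();
        return array[index];
    }

    public static void Main()
    {
        int[] data = { 10, 20, 30 };
        Console.WriteLine(Get(data, 2));      // prints 30
        Console.WriteLine(IsInRange(-1, 3));  // prints False
    }
}
```

Whether the throw goes through a direct helper like this or through an indirection is exactly the trade-off discussed above; the sketch only shows the direct form.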

@MichalPetryka
Contributor Author

@MihuBot

@MichalPetryka
Contributor Author

@MihuBot

src/coreclr/vm/metasig.h (outdated review thread, resolved)
Co-authored-by: Jan Kotas <jkotas@microsoft.com>
@MichalPetryka
Contributor Author

@jkotas This caused StelemRef_Helper to no longer be inlined into StelemRef, but StelemRef_Helper_NoCacheLookup is now being inlined into it. Should I bring the previous behaviour back with AggressiveInlining and NoInlining?

@jkotas
Member

jkotas commented Mar 17, 2024

Does it affect performance of any of the fast paths?

> This caused StelemRef_Helper to not be inlined into StelemRef anymore

StelemRef_Helper is marked with [MethodImpl(MethodImplOptions.NoInlining)] both before and after this change. I do not see how StelemRef_Helper could ever have been inlined.

@MichalPetryka
Contributor Author

> StelemRef_Helper is marked with [MethodImpl(MethodImplOptions.NoInlining)] both before and after this change. I do not see how StelemRef_Helper could ever have been inlined.

Ah wait, never mind, I accidentally diffed StelemRef_Helper with StelemRef.

The non-cached one being inlined is the only difference now, then: MihuBot/runtime-utils#332 (comment)

@huoyaoyuan
Member

> This probably doesn't cause issues today but it might start to do so when we start inlining helpers.

I wonder what the problem would be. Is there any context?

@jkotas
Member

jkotas commented Mar 19, 2024

> > This probably doesn't cause issues today but it might start to do so when we start inlining helpers.
>
> I wonder what the problem would be. Is there any context?

Unsafe.As used by the helper code today is undefined behavior. In theory, it can interact poorly with other JIT optimizations and produce bad code. See discussion in #87374 for more details.
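
To make the concern concrete, here is a minimal sketch of a store that derives the element reference from the array's real static type rather than reinterpreting the array object. It is not the runtime's code: `StelemSketch` and `StoreElement` are hypothetical names, and the sketch deliberately omits the element-type check a real store helper needs, so it is only safe when the array's exact type is object[]. It assumes only the public `MemoryMarshal.GetArrayDataReference` and `Unsafe.Add` APIs.

```csharp
#nullable enable
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public static class StelemSketch
{
    // Hypothetical stand-in for an array-store helper. No covariance check is
    // done here, so callers must pass an array whose exact type is object[].
    public static void StoreElement(object?[] array, nint index, object? value)
    {
        // One unsigned compare rejects both negative and too-large indices.
        if (unchecked((nuint)index) >= (uint)array.Length)
            throw new IndexOutOfRangeException();

        // The reference is obtained from the array's declared type, so the
        // JIT is never told that the object is something it is not.
        ref object? first = ref MemoryMarshal.GetArrayDataReference(array);
        Unsafe.Add(ref first, index) = value;

        // The pattern under discussion would instead reinterpret the array as
        // an unrelated array type with Unsafe.As, which the comment above
        // describes as undefined behavior.
    }

    public static void Main()
    {
        var items = new object?[3];
        StoreElement(items, 1, "hello");
        Console.WriteLine(items[1]);  // prints hello
    }
}
```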

@jkotas
Member

jkotas commented Mar 19, 2024

> Does it affect performance of any of the fast paths?

@MichalPetryka Did you have a chance to check the perf impact of these changes?

@MichalPetryka
Contributor Author

> Did you have a chance to check the perf impact of these changes?

I'll do so today.

@MichalPetryka
Contributor Author

MichalPetryka commented Mar 19, 2024


BenchmarkDotNet v0.13.12, Windows 10 (10.0.19045.4170/22H2/2022Update)
AMD Ryzen 9 7900X, 1 CPU, 24 logical and 12 physical cores
.NET SDK 8.0.200
  [Host]     : .NET 8.0.2 (8.0.224.6711), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-JUCZRX : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-NEFMBN : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  .NET 8.0   : .NET 8.0.3 (8.0.324.11423), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

Runtime=.NET 8.0  

| Method | Job | Toolchain | s | Mean | Error | StdDev | Ratio | RatioSD | Code Size | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|
| WriteNull | Job-JUCZRX | main | ? | 29.33 ns | 0.076 ns | 0.067 ns | 1.00 | 0.00 | 480 B | - | NA |
| WriteNull | Job-NEFMBN | pr | ? | 34.87 ns | 0.070 ns | 0.062 ns | 1.19 | 0.00 | 356 B | - | NA |
| WriteNull | .NET 8.0 | Default | ? | 34.56 ns | 0.135 ns | 0.127 ns | 1.18 | 0.00 | 53 B | - | NA |
| WriteToObject | Job-JUCZRX | main | | 63.41 ns | 0.330 ns | 0.292 ns | 1.00 | 0.00 | 331 B | - | NA |
| WriteToObject | Job-NEFMBN | pr | | 57.73 ns | 0.282 ns | 0.264 ns | 0.91 | 0.01 | 356 B | - | NA |
| WriteToObject | .NET 8.0 | Default | | 63.91 ns | 0.558 ns | 0.494 ns | 1.01 | 0.01 | 53 B | - | NA |
| WriteToInterface | Job-JUCZRX | main | | 117.00 ns | 0.711 ns | 0.630 ns | 1.00 | 0.00 | 422 B | - | NA |
| WriteToInterface | Job-NEFMBN | pr | | 88.45 ns | 1.282 ns | 1.425 ns | 0.75 | 0.01 | 369 B | - | NA |
| WriteToInterface | .NET 8.0 | Default | | 93.90 ns | 0.541 ns | 0.480 ns | 0.80 | 0.01 | 53 B | - | NA |
| WriteToType | Job-JUCZRX | main | | 41.50 ns | 0.684 ns | 0.640 ns | 1.00 | 0.00 | 60 B | - | NA |
| WriteToType | Job-NEFMBN | pr | | 40.00 ns | 0.070 ns | 0.058 ns | 0.96 | 0.02 | 60 B | - | NA |
| WriteToType | .NET 8.0 | Default | | 39.99 ns | 0.062 ns | 0.048 ns | 0.96 | 0.02 | 58 B | - | NA |

@jkotas I've benchmarked with string. The paths for non-null values seem to be better, but the test with null seems to confuse PGO, which now places the null path in the slow code behind the throw helper. Not sure if this can be fixed without JIT changes; the JIT should probably assume that throw helpers are always colder than cold paths.
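
For context on the paths being measured, here is a small sketch in plain C# of the outcomes a reference-array store can have: a null store, a store whose type matches, and a mismatching store that throws. It is independent of the benchmark code, which is not shown in this thread, and `CovarianceSketch` and `TryStore` are hypothetical names.

```csharp
#nullable enable
using System;

public static class CovarianceSketch
{
    // Returns a short description of what happened when value was stored.
    public static string TryStore(object?[] array, object? value)
    {
        try
        {
            array[0] = value;  // a reference-array store, type checked at run time
            return "stored";
        }
        catch (ArrayTypeMismatchException)
        {
            return "mismatch";
        }
    }

    public static void Main()
    {
        object?[] exact = new object?[1];
        object?[] covariant = new string?[1];  // legal: reference arrays are covariant

        Console.WriteLine(TryStore(exact, 42));        // prints stored
        Console.WriteLine(TryStore(covariant, "ok"));  // prints stored
        Console.WriteLine(TryStore(covariant, null));  // prints stored (null needs no type check)
        Console.WriteLine(TryStore(covariant, 42));    // prints mismatch
    }
}
```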

Full codegen:

.NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

; ConsoleApp7.Benchmark.WriteNull(System.String)
       push      rdi
       push      rsi
       push      rbp
       push      rbx
       sub       rsp,28
       mov       rbx,rdx
       mov       rsi,[rcx+8]
       xor       edi,edi
       mov       ebp,[rsi+8]
       test      ebp,ebp
       jle       short M00_L01
M00_L00:
       movsxd    rdx,edi
       mov       rcx,rsi
       mov       r8,rbx
       call      System.Runtime.CompilerServices.CastHelpers.StelemRef(System.Array, IntPtr, System.Object)
       inc       edi
       cmp       ebp,edi
       jg        short M00_L00
M00_L01:
       add       rsp,28
       pop       rbx
       pop       rbp
       pop       rsi
       pop       rdi
       ret
; Total bytes of code 53
; System.Runtime.CompilerServices.CastHelpers.StelemRef(System.Array, IntPtr, System.Object)
       sub       rsp,28
       mov       eax,[rcx+8]
       cmp       rdx,rax
       jae       short M01_L02
       lea       rax,[rcx+rdx*8+10]
       mov       rdx,[rcx]
       mov       rdx,[rdx+38]
       test      r8,r8
       jne       short M01_L00
       xor       ecx,ecx
       mov       [rax],rcx
       add       rsp,28
       ret
M01_L00:
       cmp       rdx,[r8]
       je        short M01_L01
       mov       r10,offset MT_System.Object[]
       cmp       [rcx],r10
       je        short M01_L01
       mov       rcx,rax
       add       rsp,28
       jmp       qword ptr [7FFB68CB4498]; System.Runtime.CompilerServices.CastHelpers.StelemRef_Helper(System.Object ByRef, Void*, System.Object)
M01_L01:
       mov       rcx,rax
       mov       rdx,r8
       add       rsp,28
       jmp       near ptr System.Runtime.CompilerServices.CastHelpers.WriteBarrier(System.Object ByRef, System.Object)
M01_L02:
       call      CORINFO_HELP_RNGCHKFAIL
       int       3
; Total bytes of code 93
; System.Runtime.CompilerServices.CastHelpers.StelemRef_Helper(System.Object ByRef, Void*, System.Object)
       push      rdi
       push      rsi
       push      rbx
       sub       rsp,20
       mov       rdi,rcx
       mov       rbx,rdx
       mov       rsi,r8
       mov       rcx,290F9800B80
       mov       rcx,[rcx]
       mov       rax,[rsi]
       add       rcx,10
       rorx      rdx,rax,20
       xor       rdx,rbx
       mov       r8,9E3779B97F4A7C15
       imul      rdx,r8
       mov       r8d,[rcx]
       shrx      rdx,rdx,r8
       xor       r8d,r8d
M02_L00:
       lea       r10d,[rdx+1]
       movsxd    r10,r10d
       lea       r10,[r10+r10*2]
       lea       r10,[rcx+r10*8]
       mov       r9d,[r10]
       mov       r11,[r10+8]
       and       r9d,0FFFFFFFE
       cmp       r11,rax
       jne       short M02_L01
       mov       r11,rbx
       xor       r11,[r10+10]
       cmp       r11,1
       jbe       short M02_L02
M02_L01:
       test      r9d,r9d
       je        short M02_L03
       inc       r8d
       add       edx,r8d
       and       edx,[rcx+4]
       cmp       r8d,8
       jl        short M02_L00
       jmp       short M02_L03
M02_L02:
       cmp       r9d,[r10]
       jne       short M02_L03
       cmp       r11d,1
       jne       short M02_L03
       mov       rcx,7FFB69DC54C8
       call      CORINFO_HELP_COUNTPROFILE32
       mov       rcx,rdi
       mov       rdx,rsi
       call      System.Runtime.CompilerServices.CastHelpers.WriteBarrier(System.Object ByRef, System.Object)
       nop
       add       rsp,20
       pop       rbx
       pop       rsi
       pop       rdi
       ret
M02_L03:
       mov       rcx,7FFB69DC54CC
       call      CORINFO_HELP_COUNTPROFILE32
       mov       rcx,rdi
       mov       rdx,rbx
       mov       r8,rsi
       call      qword ptr [7FFB68CB44B0]; System.Runtime.CompilerServices.CastHelpers.StelemRef_Helper_NoCacheLookup(System.Object ByRef, Void*, System.Object)
       nop
       add       rsp,20
       pop       rbx
       pop       rsi
       pop       rdi
       ret
; Total bytes of code 221
; System.Runtime.CompilerServices.CastHelpers.StelemRef_Helper_NoCacheLookup(System.Object ByRef, Void*, System.Object)
       push      rdi
       push      rsi
       push      rbx
       sub       rsp,20
       mov       rsi,rcx
       mov       rdi,rdx
       mov       rbx,r8
       test      rbx,rbx
       jne       short M03_L00
       mov       rdx,2D18E150008
       mov       rcx,rdx
       call      qword ptr [7FFB68D1D3B0]
M03_L00:
       mov       rcx,rdi
       mov       rdx,rbx
       call      System.Runtime.CompilerServices.CastHelpers.IsInstanceOfAny_NoCacheLookup(Void*, System.Object)
       mov       rbx,rax
       test      rbx,rbx
       je        short M03_L01
       mov       rcx,rsi
       mov       rdx,rbx
       add       rsp,20
       pop       rbx
       pop       rsi
       pop       rdi
       jmp       near ptr System.Runtime.CompilerServices.CastHelpers.WriteBarrier(System.Object ByRef, System.Object)
M03_L01:
       mov       rcx,offset MT_System.ArrayTypeMismatchException
       call      CORINFO_HELP_NEWSFAST
       mov       rbx,rax
       mov       rcx,rbx
       call      qword ptr [7FFB69827C78]
       mov       rcx,rbx
       call      CORINFO_HELP_THROW
       int       3
; Total bytes of code 113

Extern method
System.Runtime.CompilerServices.CastHelpers.WriteBarrier(System.Object ByRef, System.Object)
System.Runtime.CompilerServices.CastHelpers.IsInstanceOfAny_NoCacheLookup(Void*, System.Object)

.NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

; ConsoleApp7.Benchmark.WriteNull(System.String)
       push      rdi
       push      rsi
       push      rbp
       push      rbx
       sub       rsp,28
       mov       rbx,rdx
       mov       rsi,[rcx+8]
       xor       edi,edi
       mov       ebp,[rsi+8]
       test      ebp,ebp
       jle       short M00_L01
M00_L00:
       movsxd    rdx,edi
       mov       rcx,rsi
       mov       r8,rbx
       call      System.Runtime.CompilerServices.CastHelpers.StelemRef(System.Object[], IntPtr, System.Object)
       inc       edi
       cmp       ebp,edi
       jg        short M00_L00
M00_L01:
       add       rsp,28
       pop       rbx
       pop       rbp
       pop       rsi
       pop       rdi
       ret
; Total bytes of code 53
; System.Runtime.CompilerServices.CastHelpers.StelemRef(System.Object[], IntPtr, System.Object)
       sub       rsp,28
       mov       eax,[rcx+8]
       cmp       rax,rdx
       jbe       short M01_L01
       lea       rax,[rcx+rdx*8+10]
       mov       rdx,[rcx]
       mov       rdx,[rdx+30]
       test      r8,r8
       jne       short M01_L02
       xor       ecx,ecx
       mov       [rax],rcx
       add       rsp,28
       ret
M01_L00:
       mov       r10,offset MT_System.Object[]
       cmp       [rcx],r10
       je        short M01_L03
       mov       rcx,rax
       add       rsp,28
       jmp       qword ptr [7FFB6BDB44B0]; System.Runtime.CompilerServices.CastHelpers.StelemRef_Helper(System.Object ByRef, Void*, System.Object)
M01_L01:
       call      qword ptr [7FFB6BDB4450]
       int       3
M01_L02:
       cmp       rdx,[r8]
       jne       short M01_L00
M01_L03:
       mov       rcx,rax
       mov       rdx,r8
       add       rsp,28
       jmp       near ptr System.Runtime.CompilerServices.CastHelpers.WriteBarrier(System.Object ByRef, System.Object)
; Total bytes of code 94
; System.Runtime.CompilerServices.CastHelpers.StelemRef_Helper(System.Object ByRef, Void*, System.Object)
       push      rdi
       push      rsi
       push      rbx
       sub       rsp,20
       mov       rdi,rcx
       mov       rbx,rdx
       mov       rsi,r8
       call      qword ptr [7FFBC8B31840]
       mov       rdx,[rax+0B50]
       mov       rax,[rsi]
       add       rdx,10
       mov       r8,rax
       rol       r8,20
       xor       r8,rbx
       mov       r10,9E3779B97F4A7C15
       imul      r8,r10
       mov       ecx,[rdx]
       shr       r8,cl
       xor       ecx,ecx
M02_L00:
       lea       r10d,[r8+1]
       movsxd    r10,r10d
       lea       r10,[r10+r10*2]
       lea       r10,[rdx+r10*8]
       mov       r9d,[r10]
       mov       r11,[r10+8]
       and       r9d,0FFFFFFFE
       cmp       r11,rax
       jne       short M02_L01
       mov       r11,rbx
       xor       r11,[r10+10]
       cmp       r11,1
       jbe       short M02_L02
M02_L01:
       test      r9d,r9d
       je        short M02_L03
       inc       ecx
       add       r8d,ecx
       and       r8d,[rdx+4]
       cmp       ecx,8
       jl        short M02_L00
       jmp       short M02_L03
M02_L02:
       cmp       r9d,[r10]
       jne       short M02_L03
       cmp       r11d,1
       jne       short M02_L03
       mov       rcx,rdi
       mov       rdx,rsi
       call      qword ptr [7FFBC8B4A2A8]; System.Runtime.CompilerServices.CastHelpers.WriteBarrier(System.Object ByRef, System.Object)
       nop
       add       rsp,20
       pop       rbx
       pop       rsi
       pop       rdi
       ret
M02_L03:
       mov       rcx,rbx
       mov       rdx,rsi
       call      qword ptr [7FFBC8B4A288]; System.Runtime.CompilerServices.CastHelpers.IsInstanceOfAny_NoCacheLookup(Void*, System.Object)
       test      rax,rax
       je        short M02_L04
       mov       rcx,rdi
       mov       rdx,rax
       call      qword ptr [7FFBC8B4A2A8]; System.Runtime.CompilerServices.CastHelpers.WriteBarrier(System.Object ByRef, System.Object)
       nop
       add       rsp,20
       pop       rbx
       pop       rsi
       pop       rdi
       ret
M02_L04:
       call      qword ptr [7FFBC8B4A2E0]
       int       3
; Total bytes of code 209

Extern method
System.Runtime.CompilerServices.CastHelpers.WriteBarrier(System.Object ByRef, System.Object)
System.Runtime.CompilerServices.CastHelpers.IsInstanceOfAny_NoCacheLookup(Void*, System.Object)

.NET 8.0.3 (8.0.324.11423), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

; ConsoleApp7.Benchmark.WriteNull(System.String)
       push      rdi
       push      rsi
       push      rbp
       push      rbx
       sub       rsp,28
       mov       rbx,rdx
       mov       rsi,[rcx+8]
       xor       edi,edi
       mov       ebp,[rsi+8]
       test      ebp,ebp
       jle       short M00_L01
M00_L00:
       movsxd    rdx,edi
       mov       rcx,rsi
       mov       r8,rbx
       call      CORINFO_HELP_ARRADDR_ST
       inc       edi
       cmp       ebp,edi
       jg        short M00_L00
M00_L01:
       add       rsp,28
       pop       rbx
       pop       rbp
       pop       rsi
       pop       rdi
       ret
; Total bytes of code 53

.NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

; ConsoleApp7.Benchmark.WriteToObject(System.String)
       push      rdi
       push      rsi
       push      rbp
       push      rbx
       sub       rsp,28
       mov       rbx,rdx
       mov       rsi,[rcx+8]
       xor       edi,edi
       mov       ebp,[rsi+8]
       test      ebp,ebp
       jle       short M00_L01
M00_L00:
       movsxd    rdx,edi
       mov       rcx,rsi
       mov       r8,rbx
       call      System.Runtime.CompilerServices.CastHelpers.StelemRef(System.Array, IntPtr, System.Object)
       inc       edi
       cmp       ebp,edi
       jg        short M00_L00
M00_L01:
       add       rsp,28
       pop       rbx
       pop       rbp
       pop       rsi
       pop       rdi
       ret
; Total bytes of code 53
; System.Runtime.CompilerServices.CastHelpers.StelemRef(System.Array, IntPtr, System.Object)
       sub       rsp,28
       mov       eax,[rcx+8]
       cmp       rdx,rax
       jae       short M01_L02
       lea       rax,[rcx+rdx*8+10]
       mov       rdx,[rcx]
       mov       rdx,[rdx+38]
       test      r8,r8
       je        short M01_L01
       cmp       rdx,[r8]
       je        short M01_L00
       mov       r10,offset MT_System.Object[]
       cmp       [rcx],r10
       jne       short M01_L03
M01_L00:
       mov       rcx,rax
       mov       rdx,r8
       add       rsp,28
       jmp       near ptr System.Runtime.CompilerServices.CastHelpers.WriteBarrier(System.Object ByRef, System.Object)
M01_L01:
       xor       ecx,ecx
       mov       [rax],rcx
       add       rsp,28
       ret
M01_L02:
       call      CORINFO_HELP_RNGCHKFAIL
M01_L03:
       mov       rcx,rax
       add       rsp,28
       jmp       qword ptr [7FFB68CD4498]; System.Runtime.CompilerServices.CastHelpers.StelemRef_Helper(System.Object ByRef, Void*, System.Object)
; Total bytes of code 92
; System.Runtime.CompilerServices.CastHelpers.StelemRef_Helper(System.Object ByRef, Void*, System.Object)
       push      rdi
       push      rsi
       push      rbx
       sub       rsp,20
       mov       rdi,rcx
       mov       rbx,rdx
       mov       rsi,r8
       call      qword ptr [7FFBC7912090]
       mov       rdx,[rax+0B48]
       mov       r8,[rsi]
       add       rdx,10
       mov       rax,r8
       rol       rax,20
       xor       rax,rbx
       mov       r10,9E3779B97F4A7C15
       imul      rax,r10
       mov       ecx,[rdx]
       shr       rax,cl
       xor       ecx,ecx
M02_L00:
       lea       r10d,[rax+1]
       movsxd    r10,r10d
       lea       r10,[r10+r10*2]
       lea       r10,[rdx+r10*8]
       mov       r9d,[r10]
       mov       r11,[r10+8]
       and       r9d,0FFFFFFFE
       cmp       r11,r8
       jne       short M02_L01
       mov       r11,rbx
       xor       r11,[r10+10]
       cmp       r11,1
       jbe       short M02_L02
M02_L01:
       test      r9d,r9d
       je        short M02_L03
       inc       ecx
       add       eax,ecx
       and       eax,[rdx+4]
       cmp       ecx,8
       jl        short M02_L00
       jmp       short M02_L03
M02_L02:
       cmp       r9d,[r10]
       jne       short M02_L03
       cmp       r11d,1
       jne       short M02_L03
       mov       rcx,rdi
       mov       rdx,rsi
       call      qword ptr [7FFBC792CD88]; System.Runtime.CompilerServices.CastHelpers.WriteBarrier(System.Object ByRef, System.Object)
       nop
       add       rsp,20
       pop       rbx
       pop       rsi
       pop       rdi
       ret
M02_L03:
       mov       rcx,rdi
       mov       rdx,rbx
       mov       r8,rsi
       call      qword ptr [7FFBC792CDC8]; Precode of System.Runtime.CompilerServices.CastHelpers.StelemRef_Helper_NoCacheLookup(System.Object ByRef, Void*, System.Object)
       nop
       add       rsp,20
       pop       rbx
       pop       rsi
       pop       rdi
       ret
; Total bytes of code 186

Extern method
System.Runtime.CompilerServices.CastHelpers.WriteBarrier(System.Object ByRef, System.Object)

.NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

; ConsoleApp7.Benchmark.WriteToObject(System.String)
       push      rdi
       push      rsi
       push      rbp
       push      rbx
       sub       rsp,28
       mov       rbx,rdx
       mov       rsi,[rcx+8]
       xor       edi,edi
       mov       ebp,[rsi+8]
       test      ebp,ebp
       jle       short M00_L01
M00_L00:
       movsxd    rdx,edi
       mov       rcx,rsi
       mov       r8,rbx
       call      System.Runtime.CompilerServices.CastHelpers.StelemRef(System.Object[], IntPtr, System.Object)
       inc       edi
       cmp       ebp,edi
       jg        short M00_L00
M00_L01:
       add       rsp,28
       pop       rbx
       pop       rbp
       pop       rsi
       pop       rdi
       ret
; Total bytes of code 53
; System.Runtime.CompilerServices.CastHelpers.StelemRef(System.Object[], IntPtr, System.Object)
       sub       rsp,28
       mov       eax,[rcx+8]
       cmp       rax,rdx
       jbe       short M01_L02
       lea       rax,[rcx+rdx*8+10]
       mov       rdx,[rcx]
       mov       rdx,[rdx+30]
       test      r8,r8
       je        short M01_L01
       cmp       rdx,[r8]
       je        short M01_L00
       mov       r10,offset MT_System.Object[]
       cmp       [rcx],r10
       jne       short M01_L03
M01_L00:
       mov       rcx,rax
       mov       rdx,r8
       add       rsp,28
       jmp       near ptr System.Runtime.CompilerServices.CastHelpers.WriteBarrier(System.Object ByRef, System.Object)
M01_L01:
       xor       ecx,ecx
       mov       [rax],rcx
       add       rsp,28
       ret
M01_L02:
       call      qword ptr [7FFB68CA4450]
       int       3
M01_L03:
       mov       rcx,rax
       add       rsp,28
       jmp       qword ptr [7FFB68CA44B0]; System.Runtime.CompilerServices.CastHelpers.StelemRef_Helper(System.Object ByRef, Void*, System.Object)
; Total bytes of code 94
; System.Runtime.CompilerServices.CastHelpers.StelemRef_Helper(System.Object ByRef, Void*, System.Object)
       push      rdi
       push      rsi
       push      rbx
       sub       rsp,20
       mov       rdi,rcx
       mov       rbx,rdx
       mov       rsi,r8
       call      qword ptr [7FFBC8661840]
       mov       rdx,[rax+0B50]
       mov       rax,[rsi]
       add       rdx,10
       mov       r8,rax
       rol       r8,20
       xor       r8,rbx
       mov       r10,9E3779B97F4A7C15
       imul      r8,r10
       mov       ecx,[rdx]
       shr       r8,cl
       xor       ecx,ecx
M02_L00:
       lea       r10d,[r8+1]
       movsxd    r10,r10d
       lea       r10,[r10+r10*2]
       lea       r10,[rdx+r10*8]
       mov       r9d,[r10]
       mov       r11,[r10+8]
       and       r9d,0FFFFFFFE
       cmp       r11,rax
       jne       short M02_L01
       mov       r11,rbx
       xor       r11,[r10+10]
       cmp       r11,1
       jbe       short M02_L02
M02_L01:
       test      r9d,r9d
       je        short M02_L03
       inc       ecx
       add       r8d,ecx
       and       r8d,[rdx+4]
       cmp       ecx,8
       jl        short M02_L00
       jmp       short M02_L03
M02_L02:
       cmp       r9d,[r10]
       jne       short M02_L03
       cmp       r11d,1
       jne       short M02_L03
       mov       rcx,rdi
       mov       rdx,rsi
       call      qword ptr [7FFBC867A2A8]; System.Runtime.CompilerServices.CastHelpers.WriteBarrier(System.Object ByRef, System.Object)
       nop
       add       rsp,20
       pop       rbx
       pop       rsi
       pop       rdi
       ret
M02_L03:
       mov       rcx,rbx
       mov       rdx,rsi
       call      qword ptr [7FFBC867A288]; System.Runtime.CompilerServices.CastHelpers.IsInstanceOfAny_NoCacheLookup(Void*, System.Object)
       test      rax,rax
       je        short M02_L04
       mov       rcx,rdi
       mov       rdx,rax
       call      qword ptr [7FFBC867A2A8]; System.Runtime.CompilerServices.CastHelpers.WriteBarrier(System.Object ByRef, System.Object)
       nop
       add       rsp,20
       pop       rbx
       pop       rsi
       pop       rdi
       ret
M02_L04:
       call      qword ptr [7FFBC867A2E0]
       int       3
; Total bytes of code 209

Extern method
System.Runtime.CompilerServices.CastHelpers.WriteBarrier(System.Object ByRef, System.Object)
System.Runtime.CompilerServices.CastHelpers.IsInstanceOfAny_NoCacheLookup(Void*, System.Object)

.NET 8.0.3 (8.0.324.11423), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

; ConsoleApp7.Benchmark.WriteToObject(System.String)
       push      rdi
       push      rsi
       push      rbp
       push      rbx
       sub       rsp,28
       mov       rbx,rdx
       mov       rsi,[rcx+8]
       xor       edi,edi
       mov       ebp,[rsi+8]
       test      ebp,ebp
       jle       short M00_L01
M00_L00:
       movsxd    rdx,edi
       mov       rcx,rsi
       mov       r8,rbx
       call      CORINFO_HELP_ARRADDR_ST
       inc       edi
       cmp       ebp,edi
       jg        short M00_L00
M00_L01:
       add       rsp,28
       pop       rbx
       pop       rbp
       pop       rsi
       pop       rdi
       ret
; Total bytes of code 53

.NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

; ConsoleApp7.Benchmark.WriteToInterface(System.String)
       push      rdi
       push      rsi
       push      rbp
       push      rbx
       sub       rsp,28
       mov       rbx,rdx
       mov       rsi,[rcx+10]
       xor       edi,edi
       mov       ebp,[rsi+8]
       test      ebp,ebp
       jle       short M00_L01
M00_L00:
       movsxd    rdx,edi
       mov       rcx,rsi
       mov       r8,rbx
       call      System.Runtime.CompilerServices.CastHelpers.StelemRef(System.Array, IntPtr, System.Object)
       inc       edi
       cmp       ebp,edi
       jg        short M00_L00
M00_L01:
       add       rsp,28
       pop       rbx
       pop       rbp
       pop       rsi
       pop       rdi
       ret
; Total bytes of code 53
; System.Runtime.CompilerServices.CastHelpers.StelemRef(System.Array, IntPtr, System.Object)
       sub       rsp,28
       mov       eax,[rcx+8]
       cmp       rdx,rax
       jae       short M01_L00
       lea       rax,[rcx+rdx*8+10]
       mov       rdx,[rcx]
       mov       rdx,[rdx+38]
       test      r8,r8
       je        short M01_L02
       cmp       rdx,[r8]
       je        short M01_L01
       mov       r10,offset MT_System.Object[]
       cmp       [rcx],r10
       je        short M01_L01
       mov       rcx,rax
       add       rsp,28
       jmp       qword ptr [7FFB68CD4498]; System.Runtime.CompilerServices.CastHelpers.StelemRef_Helper(System.Object ByRef, Void*, System.Object)
M01_L00:
       call      CORINFO_HELP_RNGCHKFAIL
M01_L01:
       mov       rcx,rax
       mov       rdx,r8
       add       rsp,28
       jmp       near ptr System.Runtime.CompilerServices.CastHelpers.WriteBarrier(System.Object ByRef, System.Object)
M01_L02:
       xor       ecx,ecx
       mov       [rax],rcx
       add       rsp,28
       ret
; Total bytes of code 92
; System.Runtime.CompilerServices.CastHelpers.StelemRef_Helper(System.Object ByRef, Void*, System.Object)
       push      rdi
       push      rsi
       push      rbx
       sub       rsp,20
       mov       rax,25A67800B80
       mov       rax,[rax]
       mov       r10,[r8]
       add       rax,10
       rorx      r9,r10,20
       xor       r9,rdx
       mov       r11,9E3779B97F4A7C15
       imul      r9,r11
       mov       r11d,[rax]
       shrx      r9,r9,r11
       xor       r11d,r11d
M02_L00:
       lea       ebx,[r9+1]
       movsxd    rbx,ebx
       lea       rbx,[rbx+rbx*2]
       lea       rbx,[rax+rbx*8]
       mov       esi,[rbx]
       mov       rdi,[rbx+8]
       and       esi,0FFFFFFFE
       cmp       rdi,r10
       jne       short M02_L01
       mov       rdi,rdx
       xor       rdi,[rbx+10]
       cmp       rdi,1
       jbe       short M02_L02
M02_L01:
       test      esi,esi
       je        short M02_L03
       inc       r11d
       add       r9d,r11d
       and       r9d,[rax+4]
       cmp       r11d,8
       jl        short M02_L00
       jmp       short M02_L03
M02_L02:
       cmp       esi,[rbx]
       jne       short M02_L03
       cmp       edi,1
       jne       short M02_L03
       mov       rdx,r8
       call      System.Runtime.CompilerServices.CastHelpers.WriteBarrier(System.Object ByRef, System.Object)
       nop
       add       rsp,20
       pop       rbx
       pop       rsi
       pop       rdi
       ret
M02_L03:
       call      qword ptr [7FFB68CD44B0]; System.Runtime.CompilerServices.CastHelpers.StelemRef_Helper_NoCacheLookup(System.Object ByRef, Void*, System.Object)
       nop
       add       rsp,20
       pop       rbx
       pop       rsi
       pop       rdi
       ret
; Total bytes of code 166
; System.Runtime.CompilerServices.CastHelpers.StelemRef_Helper_NoCacheLookup(System.Object ByRef, Void*, System.Object)
       push      rdi
       push      rsi
       push      rbx
       sub       rsp,20
       mov       rsi,rcx
       mov       rdi,rdx
       mov       rbx,r8
       test      rbx,rbx
       jne       short M03_L00
       mov       rcx,[7FFBC79636C8]
       mov       rdx,[rcx]
       mov       rcx,rdx
       call      qword ptr [7FFBC79300B8]
M03_L00:
       mov       rcx,rdi
       mov       rdx,rbx
       call      qword ptr [7FFBC792CD68]; System.Runtime.CompilerServices.CastHelpers.IsInstanceOfAny_NoCacheLookup(Void*, System.Object)
       mov       rbx,rax
       test      rbx,rbx
       je        short M03_L01
       mov       rdx,rbx
       mov       rcx,rsi
       lea       rax,[7FFBC792CD90]
       add       rsp,20
       pop       rbx
       pop       rsi
       pop       rdi
       jmp       qword ptr [rax]
M03_L01:
       call      qword ptr [7FFBC791B5E8]
       mov       rbx,rax
       mov       rcx,rbx
       call      qword ptr [7FFBC7924598]
       mov       rcx,rbx
       call      qword ptr [7FFBC7911760]; CORINFO_HELP_THROW
       int       3
; Total bytes of code 111

Extern method
System.Runtime.CompilerServices.CastHelpers.WriteBarrier(System.Object ByRef, System.Object)
System.Runtime.CompilerServices.CastHelpers.IsInstanceOfAny_NoCacheLookup(Void*, System.Object)

.NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

; ConsoleApp7.Benchmark.WriteToInterface(System.String)
       push      rdi
       push      rsi
       push      rbp
       push      rbx
       sub       rsp,28
       mov       rbx,rdx
       mov       rsi,[rcx+10]
       xor       edi,edi
       mov       ebp,[rsi+8]
       test      ebp,ebp
       jle       short M00_L01
M00_L00:
       movsxd    rdx,edi
       mov       rcx,rsi
       mov       r8,rbx
       call      System.Runtime.CompilerServices.CastHelpers.StelemRef(System.Object[], IntPtr, System.Object)
       inc       edi
       cmp       ebp,edi
       jg        short M00_L00
M00_L01:
       add       rsp,28
       pop       rbx
       pop       rbp
       pop       rsi
       pop       rdi
       ret
; Total bytes of code 53
; System.Runtime.CompilerServices.CastHelpers.StelemRef(System.Object[], IntPtr, System.Object)
       sub       rsp,28
       mov       eax,[rcx+8]
       cmp       rax,rdx
       jbe       short M01_L00
       lea       rax,[rcx+rdx*8+10]
       mov       rdx,[rcx]
       mov       rdx,[rdx+30]
       test      r8,r8
       je        short M01_L02
       cmp       rdx,[r8]
       je        short M01_L01
       mov       r10,offset MT_System.Object[]
       cmp       [rcx],r10
       je        short M01_L01
       mov       rcx,rax
       add       rsp,28
       jmp       qword ptr [7FFB68CC44B0]; System.Runtime.CompilerServices.CastHelpers.StelemRef_Helper(System.Object ByRef, Void*, System.Object)
M01_L00:
       call      qword ptr [7FFB68CC4450]
       int       3
M01_L01:
       mov       rcx,rax
       mov       rdx,r8
       add       rsp,28
       jmp       near ptr System.Runtime.CompilerServices.CastHelpers.WriteBarrier(System.Object ByRef, System.Object)
M01_L02:
       xor       ecx,ecx
       mov       [rax],rcx
       add       rsp,28
       ret
; Total bytes of code 94
; System.Runtime.CompilerServices.CastHelpers.StelemRef_Helper(System.Object ByRef, Void*, System.Object)
       push      rdi
       push      rsi
       push      rbx
       sub       rsp,20
       mov       rax,21C53C00B88
       mov       rax,[rax]
       mov       r10,[r8]
       add       rax,10
       rorx      r9,r10,20
       xor       r9,rdx
       mov       r11,9E3779B97F4A7C15
       imul      r9,r11
       mov       r11d,[rax]
       shrx      r9,r9,r11
       xor       r11d,r11d
M02_L00:
       lea       ebx,[r9+1]
       movsxd    rbx,ebx
       lea       rbx,[rbx+rbx*2]
       lea       rbx,[rax+rbx*8]
       mov       esi,[rbx]
       mov       rdi,[rbx+8]
       and       esi,0FFFFFFFE
       cmp       rdi,r10
       jne       short M02_L01
       mov       rdi,rdx
       xor       rdi,[rbx+10]
       cmp       rdi,1
       jbe       short M02_L02
M02_L01:
       test      esi,esi
       je        short M02_L03
       inc       r11d
       add       r9d,r11d
       and       r9d,[rax+4]
       cmp       r11d,8
       jl        short M02_L00
       jmp       short M02_L03
M02_L02:
       cmp       esi,[rbx]
       jne       short M02_L03
       cmp       edi,1
       jne       short M02_L03
       mov       rdx,r8
       call      System.Runtime.CompilerServices.CastHelpers.WriteBarrier(System.Object ByRef, System.Object)
       nop
       add       rsp,20
       pop       rbx
       pop       rsi
       pop       rdi
       ret
M02_L03:
       call      qword ptr [7FFB68CC44C8]; System.Runtime.CompilerServices.CastHelpers.StelemRef_Helper_NoCacheLookup(System.Object ByRef, Void*, System.Object)
       nop
       add       rsp,20
       pop       rbx
       pop       rsi
       pop       rdi
       ret
; Total bytes of code 166
; System.Runtime.CompilerServices.CastHelpers.StelemRef_Helper_NoCacheLookup(System.Object ByRef, Void*, System.Object)
       push      rbx
       sub       rsp,20
       mov       rbx,rcx
       mov       rax,r8
       mov       rcx,rdx
       mov       rdx,rax
       call      qword ptr [7FFBC867A288]; System.Runtime.CompilerServices.CastHelpers.IsInstanceOfAny_NoCacheLookup(Void*, System.Object)
       test      rax,rax
       je        short M03_L00
       mov       rdx,rax
       mov       rcx,rbx
       lea       rax,[7FFBC867A2B0]
       add       rsp,20
       pop       rbx
       jmp       qword ptr [rax]
M03_L00:
       call      qword ptr [7FFBC867A2E0]
       int       3
; Total bytes of code 56

Extern method
System.Runtime.CompilerServices.CastHelpers.WriteBarrier(System.Object ByRef, System.Object)
System.Runtime.CompilerServices.CastHelpers.IsInstanceOfAny_NoCacheLookup(Void*, System.Object)

.NET 8.0.3 (8.0.324.11423), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

; ConsoleApp7.Benchmark.WriteToInterface(System.String)
       push      rdi
       push      rsi
       push      rbp
       push      rbx
       sub       rsp,28
       mov       rbx,rdx
       mov       rsi,[rcx+10]
       xor       edi,edi
       mov       ebp,[rsi+8]
       test      ebp,ebp
       jle       short M00_L01
M00_L00:
       movsxd    rdx,edi
       mov       rcx,rsi
       mov       r8,rbx
       call      CORINFO_HELP_ARRADDR_ST
       inc       edi
       cmp       ebp,edi
       jg        short M00_L00
M00_L01:
       add       rsp,28
       pop       rbx
       pop       rbp
       pop       rsi
       pop       rdi
       ret
; Total bytes of code 53

.NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

; ConsoleApp7.Benchmark.WriteToType(System.String)
       push      rdi
       push      rsi
       push      rbp
       push      rbx
       sub       rsp,28
       mov       rbx,rdx
       mov       rsi,[rcx+18]
       xor       edi,edi
       mov       ebp,[rsi+8]
       test      ebp,ebp
       jle       short M00_L01
       nop       dword ptr [rax+rax]
M00_L00:
       lea       rcx,[rsi+rdi*8+10]
       mov       rdx,rbx
       call      CORINFO_HELP_ASSIGN_REF
       inc       edi
       cmp       ebp,edi
       jg        short M00_L00
M00_L01:
       add       rsp,28
       pop       rbx
       pop       rbp
       pop       rsi
       pop       rdi
       ret
; Total bytes of code 60

.NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

; ConsoleApp7.Benchmark.WriteToType(System.String)
       push      rdi
       push      rsi
       push      rbp
       push      rbx
       sub       rsp,28
       mov       rbx,rdx
       mov       rsi,[rcx+18]
       xor       edi,edi
       mov       ebp,[rsi+8]
       test      ebp,ebp
       jle       short M00_L01
       nop       dword ptr [rax+rax]
M00_L00:
       lea       rcx,[rsi+rdi*8+10]
       mov       rdx,rbx
       call      CORINFO_HELP_ASSIGN_REF
       inc       edi
       cmp       ebp,edi
       jg        short M00_L00
M00_L01:
       add       rsp,28
       pop       rbx
       pop       rbp
       pop       rsi
       pop       rdi
       ret
; Total bytes of code 60

.NET 8.0.3 (8.0.324.11423), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

; ConsoleApp7.Benchmark.WriteToType(System.String)
       push      rdi
       push      rsi
       push      rbp
       push      rbx
       mov       rbx,rdx
       mov       rsi,[rcx+18]
       xor       edi,edi
       mov       ebp,[rsi+8]
       test      ebp,ebp
       jle       short M00_L01
       nop       dword ptr [rax]
       nop       dword ptr [rax+rax]
M00_L00:
       mov       ecx,edi
       lea       rcx,[rsi+rcx*8+10]
       mov       rdx,rbx
       call      CORINFO_HELP_ASSIGN_REF
       inc       edi
       cmp       ebp,edi
       jg        short M00_L00
M00_L01:
       pop       rbx
       pop       rbp
       pop       rsi
       pop       rdi
       ret
; Total bytes of code 58

@jkotas
Member

jkotas commented Mar 20, 2024

I agree that the cold block ordering with PGO can be better in some cases. I do not see any material code difference caused by this change. The exact code for these helpers can differ from workload to workload and from run to run depending on how PGO rolls the dice.

I have pushed a commit with a minor cleanup as I was looking at the code.

Member

@jkotas jkotas left a comment

Thank you!

Labels
area-VM-coreclr community-contribution Indicates that the PR has been added by a community member

3 participants