perf: mark RuntimeHelpers.GetSubArray with AggressiveInlining #127505

Closed
unsafePtr wants to merge 1 commit into dotnet:main from unsafePtr:perf/get-sub-array-aggressive-inline

@unsafePtr
Contributor

Summary

RuntimeHelpers.GetSubArray<T> is what the C# compiler emits for every array[x..y] range expression; user code rarely calls it directly. Despite being a compiler-generated call, it currently exceeds the JIT's default inline budget (~119 IL bytes) and is never inlined, which prevents the JIT from using constant information available at the call site.
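To make the lowering concrete, the sketch below puts the range syntax next to the helper call the compiler emits for it. The class and method names (RangeLoweringSketch, ViaRangeSyntax, ViaHelper) are hypothetical and exist only for illustration; calling the helper by hand here is just a way to show the equivalence.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class RangeLoweringSketch
{
    // What user code writes.
    public static int[] ViaRangeSyntax(int[] arr, int start, int end) => arr[start..end];

    // Roughly what the compiler emits for the expression above:
    // a Range is built from the two bounds and passed to the helper.
    public static int[] ViaHelper(int[] arr, int start, int end) =>
        RuntimeHelpers.GetSubArray(arr, new Range(start, end));
}
```

Both methods should return a fresh array with the same contents, so any cost inside GetSubArray is paid at every range-sliced call site.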

Problem

When GetSubArray is not inlined, the JIT compiles it as a standalone method where length is an opaque runtime value. Even when the range size is a compile-time constant — which is common for slicing patterns like arr[offset..(offset + chunkSize)] — the JIT cannot fold any of the internal branches.

Without inlining, the call site looks like this:

mov  rdx, 0x3200000000          ; Range(0..50) — compiler already folded it to a constant
call RuntimeHelpers:GetSubArray[int]   ; but that constant stops here

Inside the standalone GetSubArray[int], length is just a register — branches cannot be eliminated.

Fix

Adding [AggressiveInlining] allows the JIT to inline GetSubArray into the caller. Once inlined, the JIT can propagate constants from the call site through GetOffsetAndLength and fold the internal branches.
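As a rough illustration of where the attribute goes and which branches it exposes to the caller, here is a simplified stand-in written against public APIs only. It is not the runtime source: SubArraySketch is a hypothetical name, and Array.Copy is used in place of the internal Buffer.Memmove.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class SubArraySketch
{
    // Hypothetical stand-in for the helper, mirroring the branches discussed in this PR.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static T[] GetSubArray<T>(T[] array, Range range)
    {
        ArgumentNullException.ThrowIfNull(array);

        // For a call site like arr[offset..(offset + 50)], length is 50 here.
        (int offset, int length) = range.GetOffsetAndLength(array.Length);

        T[] dest;
        if (typeof(T[]) == array.GetType())
        {
            // Exact-type path: the zero-length check is one of the branches
            // that can fold once length is a known constant.
            if (length == 0)
            {
                return Array.Empty<T>();
            }
            dest = new T[length];
        }
        else
        {
            // Covariant path: array is really a U[] with U : T,
            // so the copy keeps the same element type.
            dest = (T[])Array.CreateInstance(array.GetType().GetElementType()!, length);
        }

        Array.Copy(array, offset, dest, 0, length);
        return dest;
    }
}
```

Whether the JIT actually folds each branch depends on what it knows at the call site; the sketch only shows the shape of the code the attribute would pull into the caller.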

What gets eliminated when the range size is a constant (arr[offset..(offset+50)]):

; typeof(T[]) == array.GetType() → eliminated when array type is known
; length == 0 → eliminated (50 != 0)
; Memmove byte count → constant-folded (50 * sizeof(int) = 200)
mov  edx, 50
call CORINFO_HELP_NEWARR_1_VC
mov  r8d, 200
call Memmove

The range size being constant is enough — the array itself can be dynamic. arr[i..(i+50)] where i is a variable still produces length = 50 after GetOffsetAndLength is inlined, so all the same folds apply.
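The arithmetic behind this claim can be checked directly with Range.GetOffsetAndLength. In the sketch below, RangeLengthSketch and Resolve are hypothetical names; the point is only that the resolved length does not depend on i.

```csharp
using System;

public static class RangeLengthSketch
{
    // Resolves arr[i..(i + 50)] against an array of the given length.
    // The offset follows i, while the length is (i + 50) - i = 50.
    public static (int Offset, int Length) Resolve(int arrayLength, int i)
    {
        Range range = i..(i + 50);
        return range.GetOffsetAndLength(arrayLength);
    }
}
```

For any i where i + 50 fits inside the array, the returned length is 50; out-of-bounds ranges throw ArgumentOutOfRangeException instead.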

The only case that does not benefit from constant folding is a fully dynamic range: arr[i..j] where both bounds are variables. Those call sites still gain from eliminating the function call overhead.

Future work

Since the array returned by GetSubArray is immediately and completely overwritten by Buffer.Memmove, zero-initializing it first is redundant for unmanaged types. Replacing new T[length] with GC.AllocateUninitializedArray<T>(length) would skip that work for large arrays. With [AggressiveInlining] already in place, the threshold branch inside AllocateUninitializedArray (length < 2048 / sizeof(T)) would also constant-fold away at call sites with known range sizes — producing a single direct allocation path with no branches at all. That change is worth pursuing separately once this lands.
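A minimal sketch of that follow-up idea, restricted to unmanaged element types and written with public APIs, might look like the following. UninitializedSliceSketch and Slice are hypothetical names, and this is not a proposed patch to the runtime helper.

```csharp
using System;

public static class UninitializedSliceSketch
{
    // Slices without zero-initializing the destination first.
    // This is safe only because every element is overwritten by the copy below.
    public static T[] Slice<T>(T[] array, Range range) where T : unmanaged
    {
        ArgumentNullException.ThrowIfNull(array);

        (int offset, int length) = range.GetOffsetAndLength(array.Length);
        if (length == 0)
        {
            return Array.Empty<T>();
        }

        // Contents are unspecified until the copy completes.
        T[] dest = GC.AllocateUninitializedArray<T>(length);
        array.AsSpan(offset, length).CopyTo(dest);
        return dest;
    }
}
```

The unmanaged constraint keeps the sketch away from reference types, where skipping zero-initialization is not an option.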

dotnet-policy-service (bot) added the community-contribution label on Apr 28, 2026
@dotnet-policy-service
Contributor

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

@unsafePtr
Contributor Author

unsafePtr commented Apr 28, 2026

Code examples

Case 1: fixed-size chunk slicing (dynamic array, constant range size)

// arr is a parameter, so its length is unknown to the JIT,
// but the range size (50) is a constant
static T[] Slice<T>(T[] arr, int offset) => arr[offset..(offset + 50)];

Before: GetSubArray is not inlined, so constants at the call site cannot propagate into the method body:

mov  rdx, 0x3200000000          ; Range(0..50) — compiler folded it, but it stops at the call boundary
call RuntimeHelpers:GetSubArray[int]

Inside the standalone GetSubArray[int], length is just a register. The type check and zero-length check are both runtime branches:

mov  rcx, 0x7FFBB615B940        ; typeof(int[]) — constant
cmp  qword ptr [rbx], rcx       ; array.GetType() — runtime read, branch stays
je   exact_type_path

test edi, edi                    ; length == 0 — runtime, branch stays
je   return_empty

movsxd rdx, edi
call CORINFO_HELP_NEWARR_1_VC    ; new T[length]

After: GetSubArray is inlined, so constants propagate through GetOffsetAndLength:

; typeof check → eliminated (exact type known at call site)
; length == 0 → eliminated (50 != 0, constant)
; Memmove byte count → constant-folded to 200 (50 * sizeof(int))
mov  edx, 50
call CORINFO_HELP_NEWARR_1_VC
mov  r8d, 200
call Memmove

Case 2: fully constant (local array + literal range)

var src = new int[100];
var slice = src[0..50];

Same result as above.


Note on GC.AllocateUninitializedArray

The array returned by GetSubArray is immediately and completely overwritten by Buffer.Memmove, so zero-initializing it first is wasted work for unmanaged types. Replacing new T[length] with GC.AllocateUninitializedArray<T>(length) would skip that. The two changes are complementary — worth pursuing as a follow-up once this lands.

@jkotas
Member

jkotas commented Apr 28, 2026

This is a slow convenience API. Do you have an example of real-world code where the inlining helps, to justify the code bloat?

@unsafePtr
Contributor Author

Right, if it's rarely used, there is no point in looking into it at all. That feeling when you think you've found an actual improvement, but it actually isn't one 😄

Labels

area-System.Runtime, community-contribution, tenet-performance

2 participants