perf: mark RuntimeHelpers.GetSubArray with AggressiveInlining #127505
unsafePtr wants to merge 1 commit into dotnet:main
Tagging subscribers to this area: @dotnet/area-system-runtime
Code examples

Case 1: fixed-size chunk slicing (dynamic array, constant range size)

    // arr is a parameter: its length is unknown to the JIT,
    // but the range size (50) is a constant
    T[] Slice(T[] arr, int offset) => arr[offset..(offset + 50)];

Before:

    mov rdx, 0x3200000000 ; Range(0..50): the compiler folded it, but the constant stops at the call boundary
    call RuntimeHelpers:GetSubArray[int]

Inside the standalone GetSubArray[int]:

    mov rcx, 0x7FFBB615B940 ; typeof(int[]), constant
    cmp qword ptr [rbx], rcx ; array.GetType(), runtime read, branch stays
    je exact_type_path
    test edi, edi ; length == 0, runtime, branch stays
    je return_empty
    movsxd rdx, edi
    call CORINFO_HELP_NEWARR_1_VC ; new T[length]

After (inlined, constants propagate through):

    ; typeof check → eliminated (exact type known at call site)
    ; length == 0 → eliminated (50 != 0, constant)
    ; Memmove byte count → constant-folded to 200 (50 * sizeof(int))
    mov edx, 50
    call CORINFO_HELP_NEWARR_1_VC
    mov r8d, 200
    call Memmove

Case 2: fully constant (local array + literal range)

    var src = new int[100];
    var slice = src[0..50];

Same result as above. Note on
This is a slow convenience API. Do you have an example of real-world code where the inlining helps enough to justify the code bloat?
Right, if it's rarely used, there is no point in looking into it at all. That feeling when you think you found an actual improvement, but it actually isn't one 😄
Summary
RuntimeHelpers.GetSubArray<T> is emitted by the C# compiler for every array[x..y] range expression; user code does not normally call it directly. Despite being a compiler-generated call, it currently exceeds the JIT's default inline budget (~119 IL bytes) and is never inlined, which prevents the JIT from using constant information available at the call site.

Problem
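For context, the lowering described in the summary can be observed from user code. The following is a minimal sketch, assuming a .NET 6 or later console project; LoweringDemo is an illustrative name, and the explicit RuntimeHelpers.GetSubArray call is spelled out only to show the equivalence, not because user code would normally write it.

```csharp
using System;
using System.Linq;
using System.Runtime.CompilerServices;

static class LoweringDemo
{
    static void Main()
    {
        int[] arr = Enumerable.Range(0, 10).ToArray();

        // Range expression as it appears in user code.
        int[] viaSyntax = arr[2..5];

        // The helper call that the PR description says the compiler emits for it.
        int[] viaHelper = RuntimeHelpers.GetSubArray(arr, 2..5);

        Console.WriteLine(viaSyntax.SequenceEqual(viaHelper)); // True
        Console.WriteLine(string.Join(", ", viaHelper));        // 2, 3, 4
    }
}
```

Both forms allocate a new array and copy elements 2 through 4, so every range expression on an array pays for one GetSubArray call when that call is not inlined.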
When GetSubArray is not inlined, the JIT compiles it as a standalone method where length is an opaque runtime value. Even when the range size is a compile-time constant, which is common for slicing patterns like arr[offset..(offset + chunkSize)], the JIT cannot fold any of the internal branches.

Without inlining, the call site looks like this:
Inside the standalone GetSubArray[int], length is just a register, so its branches cannot be eliminated.

Fix
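To make the branches under discussion concrete, here is a simplified stand-in for the method's shape. It is reconstructed from the disassembly comments in this thread (the typeof check, the length == 0 check, the allocation, the copy) and is not copied from the runtime source. GetSubArraySketch is a hypothetical name, Array.Copy stands in for the internal Buffer.Memmove, and the attribute placement mirrors what this PR proposes.

```csharp
using System;
using System.Runtime.CompilerServices;

static class SubArraySketch
{
    // Simplified stand-in for RuntimeHelpers.GetSubArray<T>; NOT the runtime source.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static T[] GetSubArraySketch<T>(T[] array, Range range)
    {
        ArgumentNullException.ThrowIfNull(array);
        (int offset, int length) = range.GetOffsetAndLength(array.Length);

        T[] dest;
        if (typeof(T[]) == array.GetType())
        {
            // Exact-type path: the type check the PR expects to disappear once T is known.
            if (length == 0)
            {
                return Array.Empty<T>();
            }
            dest = new T[length];
        }
        else
        {
            // Covariant array (for example a string[] passed as object[]):
            // allocate with the runtime element type so the result matches the source.
            dest = (T[])Array.CreateInstance(array.GetType().GetElementType()!, length);
        }

        // Array.Copy is used here in place of the internal Buffer.Memmove.
        Array.Copy(array, offset, dest, 0, length);
        return dest;
    }

    static void Main()
    {
        int[] src = { 10, 20, 30, 40, 50 };
        int[] slice = GetSubArraySketch(src, 1..4);
        Console.WriteLine(string.Join(", ", slice)); // 20, 30, 40
    }
}
```

Whether the JIT actually folds the two checks in a given caller depends on the runtime version and tiering, so the disassembly is the authority here, not this sketch.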
Adding [AggressiveInlining] allows the JIT to inline GetSubArray into the caller. Once inlined, the JIT can propagate constants from the call site through GetOffsetAndLength and fold the internal branches.

What gets eliminated when the range size is a constant (arr[offset..(offset+50)]):

The range size being constant is enough; the array itself can be dynamic.
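The arithmetic behind that statement can be checked with the public Range.GetOffsetAndLength API. The sketch below assumes a .NET 6 or later console project; SliceLength is a hypothetical helper. It only shows that the computed length does not depend on the start index, and it says nothing about what the JIT folds at compile time.

```csharp
using System;

static class RangeLengthDemo
{
    // Hypothetical helper: the length that arr[i..(i + 50)] would have.
    static int SliceLength(int[] arr, int i)
    {
        Range r = i..(i + 50);
        (_, int length) = r.GetOffsetAndLength(arr.Length);
        return length;
    }

    static void Main()
    {
        int[] arr = new int[1000];
        foreach (int i in new[] { 0, 7, 950 })
        {
            // Prints "length = 50" for every start index that keeps the range in bounds.
            Console.WriteLine($"i = {i}: length = {SliceLength(arr, i)}");
        }
    }
}
```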
arr[i..(i+50)] where i is a variable still produces length = 50 after GetOffsetAndLength is inlined, so all the same folds apply.

The only case that does not benefit from constant folding is a fully dynamic range: arr[i..j] where both bounds are variables. Those call sites still gain from eliminating the function call overhead.

Future work
Since the array returned by GetSubArray is immediately and completely overwritten by Buffer.Memmove, zero-initializing it first is redundant for unmanaged types. Replacing new T[length] with GC.AllocateUninitializedArray<T>(length) would skip that work for large arrays. With [AggressiveInlining] already in place, the threshold branch inside AllocateUninitializedArray (length < 2048 / sizeof(T)) would also constant-fold away at call sites with known range sizes, producing a single direct allocation path with no branches at all. That change is worth pursuing separately once this lands.
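As a rough illustration of that follow-up, the sketch below slices an array of an unmanaged element type without zero-initializing the destination first. SliceUninitialized is a hypothetical user-level helper, not the proposed runtime change, and a span copy stands in for the internal Buffer.Memmove. It assumes .NET 5 or later, where GC.AllocateUninitializedArray<T> is available.

```csharp
using System;

static class UninitializedSliceSketch
{
    // Hypothetical helper, restricted to unmanaged T so stale memory can never
    // be observed as an object reference.
    static T[] SliceUninitialized<T>(T[] array, Range range) where T : unmanaged
    {
        (int offset, int length) = range.GetOffsetAndLength(array.Length);
        if (length == 0)
        {
            return Array.Empty<T>();
        }

        // Contents are unspecified until the copy below overwrites every element.
        T[] dest = GC.AllocateUninitializedArray<T>(length);
        array.AsSpan(offset, length).CopyTo(dest);
        return dest;
    }

    static void Main()
    {
        int[] src = new int[4096];
        for (int i = 0; i < src.Length; i++)
        {
            src[i] = i;
        }

        int[] slice = SliceUninitialized(src, 1000..4000);
        Console.WriteLine(slice.Length);               // 3000
        Console.WriteLine($"{slice[0]}, {slice[^1]}"); // 1000, 3999
    }
}
```

The copy covers exactly length elements, which is what makes skipping the zeroing safe; any path that could return before the copy would have to keep using new T[length].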