Improve performance of iterating with ForEach over ImmutableArray #1183

hnrqbaggio · 2019-12-27T03:26:34Z

Fixes: #780.

Use AggressiveInline on ImmutableArray<T>.GetEnumerator so the JIT can optimize the foreach loops to be closer to the code when using an simple T[].

The benchmark for String doesn't show the same improvement because it is coded as a generic class, and the JIT can't apply the same optimization on a shared generic method called inside another shared generic class. But it's possible to see the same performance gain than the Int32 case in a non-generic benchmark.

Based on the reported assembly from the benchmark, the code size doesn't increase even when inlining the method call, because the JIT is able to optimize the loop further and make it smaller (see the discussion on the issue for more details).

Benchmark results, including a non-generic version for comparison.

BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18363
Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R), 1 CPU, 8 logical and 4 physical cores
  [Host]     : .NET Framework 4.8 (4.8.4075.0), X64 RyuJIT
  Job-RUPWLA : .NET Core 5.0.0 (CoreCLR 5.0.19.61901, CoreFX 5.0.19.61901), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  Toolchain=CoreRun  IterationTime=250.0000 ms  
MaxIterationCount=20  MinIterationCount=15  WarmupCount=1

Type	Method	Size	Mean	Error	StdDev	Median	Min	Max	Gen 0	Gen 1	Gen 2	Allocated	BranchInstructions/Op	BranchMispredictions/Op	CacheMisses/Op
IterateForEach<Int32>	Array	512	369.9 ns	15.74 ns	16.84 ns	364.3 ns	351.5 ns	404.8 ns	-	-	-	-	519	1	0
IterateForEach<String>	Array	512	305.1 ns	10.97 ns	12.63 ns	304.9 ns	288.7 ns	334.0 ns	-	-	-	-	518	1	0
IterateForEach_Int32	Array	512	288.5 ns	25.03 ns	28.83 ns	272.9 ns	254.4 ns	331.7 ns	-	-	-	-	517	1	0
IterateForEach_String	Array	512	317.7 ns	25.76 ns	29.67 ns	321.8 ns	251.4 ns	371.4 ns	-	-	-	-	517	1	0
IterateForEach<Int32>	ImmutableArray	512	396.7 ns	9.58 ns	10.25 ns	395.1 ns	378.2 ns	419.2 ns	-	-	-	-	1,031	1	0
IterateForEach<String>	ImmutableArray	512	1,531.5 ns	91.32 ns	93.78 ns	1,514.5 ns	1,439.6 ns	1,823.3 ns	-	-	-	-	1,059	2	1
IterateForEach_Int32	ImmutableArray	512	314.8 ns	28.15 ns	32.42 ns	303.5 ns	270.7 ns	359.0 ns	-	-	-	-	1,030	1	0
IterateForEach_String	ImmutableArray	512	421.9 ns	8.41 ns	9.35 ns	423.7 ns	408.8 ns	437.3 ns	-	-	-	-	1,031	1	0

cc @adamsitnik, @danmosemsft, @AndyAyersMS

Analysis of the generated ASM code vs the same benchmark for Array shows that the GetEnumerator call is not being inlined (the loop itself is). In the case of the ValueType ImmutableArray.GetEnumerator method, there's a call to ThrowNullRefIfNotInitialized for validation. By adding MethodImplAttribute(MethodImplOptions.AggressiveInlining) to both methods, we are able to force the JIT to inline the call and get similar results in the benchmark. Looking at the hardware counters collected in the benchmark, there are less CacheMisses and BranchMispredictions/Op when the inlining happens. Unfortunately, the same fix didn't seem to work for the other overloads of GetEnumerator, for the explicit generic implementation. That still needs more investigation.

…ne GetEnumerator

Wraith2 · 2019-12-27T09:04:03Z

I believe the JIT should make inlining decisions wherever possible. Use of the AgressiveInliining hint turns up a lot in benchmarking where it can be shown to improve those individual cases but don't show the larger picture of increased code sizes in consumers.

jkotas · 2019-12-27T17:00:23Z

Use of the AgressiveInliining hint turns up a lot in benchmarking where it can be shown to improve those individual cases but don't show the larger picture of increased code sizes in consumers.

Agree in general. As discussed in the issue, this specific use of AggressiveInlining makes the code smaller. For example, foreach loop over ImmutableArray<int> is typically 30+ bytes smaller with this change on x64.

jkotas

Thank you!

adamsitnik · 2020-01-22T10:51:14Z

@hnrqbaggio big thanks for the PR! I've missed it during the winter holidays (I was offline for 2+ weeks) but I just wanted to say that you have performed a very impressive investigation using available docs and tools and the performance improvement is really great. Once again thanks!

Henrique Fernandes Baggio added 2 commits December 21, 2019 14:05

AggressiveInline in ThrowNullRefIfNotInitialized isn't needed to inli…

b277a2f

…ne GetEnumerator

Dotnet-GitSync-Bot added the area-System.Collections label Dec 27, 2019

jkotas approved these changes Dec 27, 2019

View reviewed changes

jkotas merged commit 7ae48cf into dotnet:master Dec 27, 2019

AndyAyersMS mentioned this pull request Dec 28, 2019

Iterating with ForEach over ImmutableArray is slower than over Array #780

Closed

hnrqbaggio deleted the iterateforeach_immutablearray branch December 28, 2019 19:07

adamsitnik added the tenet-performance Performance related issue label Jan 22, 2020

adamsitnik changed the title ~~Improve ForEach micro-benchmark for ImmutableArray~~ Improve performance of iterating with ForEach over ImmutableArray Jan 22, 2020

ghost locked as resolved and limited conversation to collaborators Dec 11, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve performance of iterating with ForEach over ImmutableArray #1183

Improve performance of iterating with ForEach over ImmutableArray #1183

hnrqbaggio commented Dec 27, 2019

Wraith2 commented Dec 27, 2019

jkotas commented Dec 27, 2019

jkotas left a comment

adamsitnik commented Jan 22, 2020

Improve performance of iterating with ForEach over ImmutableArray #1183

Improve performance of iterating with ForEach over ImmutableArray #1183

Conversation

hnrqbaggio commented Dec 27, 2019

Wraith2 commented Dec 27, 2019

jkotas commented Dec 27, 2019

jkotas left a comment

Choose a reason for hiding this comment

adamsitnik commented Jan 22, 2020