Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of iterating with ForEach over ImmutableArray #1183

Merged
merged 2 commits into from
Dec 27, 2019

Conversation

hnrqbaggio
Copy link
Contributor

Fixes: #780.

Use AggressiveInline on ImmutableArray<T>.GetEnumerator so the JIT can optimize the foreach loops to be closer to the code when using an simple T[].

The benchmark for String doesn't show the same improvement because it is coded as a generic class, and the JIT can't apply the same optimization on a shared generic method called inside another shared generic class. But it's possible to see the same performance gain than the Int32 case in a non-generic benchmark.

Based on the reported assembly from the benchmark, the code size doesn't increase even when inlining the method call, because the JIT is able to optimize the loop further and make it smaller (see the discussion on the issue for more details).

Benchmark results, including a non-generic version for comparison.

BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18363
Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R), 1 CPU, 8 logical and 4 physical cores
  [Host]     : .NET Framework 4.8 (4.8.4075.0), X64 RyuJIT
  Job-RUPWLA : .NET Core 5.0.0 (CoreCLR 5.0.19.61901, CoreFX 5.0.19.61901), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  Toolchain=CoreRun  IterationTime=250.0000 ms  
MaxIterationCount=20  MinIterationCount=15  WarmupCount=1  
Type Method Size Mean Error StdDev Median Min Max Gen 0 Gen 1 Gen 2 Allocated BranchInstructions/Op BranchMispredictions/Op CacheMisses/Op
IterateForEach<Int32> Array 512 369.9 ns 15.74 ns 16.84 ns 364.3 ns 351.5 ns 404.8 ns - - - - 519 1 0
IterateForEach<String> Array 512 305.1 ns 10.97 ns 12.63 ns 304.9 ns 288.7 ns 334.0 ns - - - - 518 1 0
IterateForEach_Int32 Array 512 288.5 ns 25.03 ns 28.83 ns 272.9 ns 254.4 ns 331.7 ns - - - - 517 1 0
IterateForEach_String Array 512 317.7 ns 25.76 ns 29.67 ns 321.8 ns 251.4 ns 371.4 ns - - - - 517 1 0
IterateForEach<Int32> ImmutableArray 512 396.7 ns 9.58 ns 10.25 ns 395.1 ns 378.2 ns 419.2 ns - - - - 1,031 1 0
IterateForEach<String> ImmutableArray 512 1,531.5 ns 91.32 ns 93.78 ns 1,514.5 ns 1,439.6 ns 1,823.3 ns - - - - 1,059 2 1
IterateForEach_Int32 ImmutableArray 512 314.8 ns 28.15 ns 32.42 ns 303.5 ns 270.7 ns 359.0 ns - - - - 1,030 1 0
IterateForEach_String ImmutableArray 512 421.9 ns 8.41 ns 9.35 ns 423.7 ns 408.8 ns 437.3 ns - - - - 1,031 1 0

cc @adamsitnik, @danmosemsft, @AndyAyersMS

Henrique Fernandes Baggio added 2 commits December 21, 2019 14:05
Analysis of the generated ASM code vs the same benchmark for Array shows
that the GetEnumerator call is not being inlined (the loop itself is).

In the case of the ValueType ImmutableArray.GetEnumerator method,
there's a call to ThrowNullRefIfNotInitialized for validation.

By adding MethodImplAttribute(MethodImplOptions.AggressiveInlining)
to both methods, we are able to force the JIT to inline the call and get
similar results in the benchmark.
Looking at the hardware counters collected in the benchmark, there are less
CacheMisses and BranchMispredictions/Op when the inlining happens.

Unfortunately, the same fix didn't seem to work for the other overloads of
GetEnumerator, for the explicit generic implementation. That still needs
more investigation.
@Wraith2
Copy link
Contributor

Wraith2 commented Dec 27, 2019

I believe the JIT should make inlining decisions wherever possible. Use of the AgressiveInliining hint turns up a lot in benchmarking where it can be shown to improve those individual cases but don't show the larger picture of increased code sizes in consumers.

@jkotas
Copy link
Member

jkotas commented Dec 27, 2019

Use of the AgressiveInliining hint turns up a lot in benchmarking where it can be shown to improve those individual cases but don't show the larger picture of increased code sizes in consumers.

Agree in general. As discussed in the issue, this specific use of AggressiveInlining makes the code smaller. For example, foreach loop over ImmutableArray<int> is typically 30+ bytes smaller with this change on x64.

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@jkotas jkotas merged commit 7ae48cf into dotnet:master Dec 27, 2019
@hnrqbaggio hnrqbaggio deleted the iterateforeach_immutablearray branch December 28, 2019 19:07
@adamsitnik adamsitnik added the tenet-performance Performance related issue label Jan 22, 2020
@adamsitnik
Copy link
Member

@hnrqbaggio big thanks for the PR! I've missed it during the winter holidays (I was offline for 2+ weeks) but I just wanted to say that you have performed a very impressive investigation using available docs and tools and the performance improvement is really great. Once again thanks!

@adamsitnik adamsitnik changed the title Improve ForEach micro-benchmark for ImmutableArray Improve performance of iterating with ForEach over ImmutableArray Jan 22, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 11, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Iterating with ForEach over ImmutableArray is slower than over Array
5 participants