.NET 5.0 Microbenchmarks Performance Study Report #41871

Closed
17 of 21 tasks
adamsitnik opened this issue Sep 4, 2020 · 12 comments
Labels
area-Meta discussion tenet-performance Performance related issue tenet-performance-benchmarks Issue from performance benchmark tracking This issue is tracking the completion of other related issues.

@adamsitnik
Member

adamsitnik commented Sep 4, 2020

Goals

The main goal of my study was to ensure that we ship .NET 5.0 without any performance regressions, and to validate whether, in the near future, we can fully rely on the regression auto-filing bot written by @DrewScoggins.
My other goal was to get .NET Libraries Team members involved and keep growing the performance culture.

tl;dr: The bot is doing a great job of detecting regressions. Most serious regressions have already been fixed; however, a few investigations are still in progress.

Methodology (and how it evolved)

In 2018 I had the pleasure of reviewing @AndreyAkinshin's "Pro .NET Benchmarking" book. The "Statistics for Performance Engineers" and "Performance Analysis and Performance Testing" chapters inspired me to implement a small tool called Results Comparer. The tool uses the Mann-Whitney U statistical test to detect performance regressions in results exported by BenchmarkDotNet. It is used (or at least it should be) as part of our benchmarking workflow to prevent introducing regressions to .NET.
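To illustrate what the Mann-Whitney U test does, here is a minimal Python sketch. It assumes two lists of per-iteration timings for the same benchmark, one from the base build and one from the diff build. It uses the normal approximation with average ranks for ties and omits the tie correction to the variance. The names `mann_whitney_u` and `compare`, and the `alpha` parameter, are hypothetical. This is not the Results Comparer's actual code.

```python
import math
from statistics import median


def mann_whitney_u(base, diff):
    """Return (U for `base`, two-sided p-value) using the normal approximation.

    Tied values get the average of their ranks; the tie correction to the
    variance is omitted to keep the sketch short. Both lists must be non-empty.
    """
    combined = sorted([(v, 0) for v in base] + [(v, 1) for v in diff])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        # Find the run of equal values starting at position i.
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n1, n2 = len(base), len(diff)
    r1 = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return u1, p


def compare(base, diff, alpha=0.05):
    """Label a benchmark 'Slower', 'Faster' or 'Same' (lower time is better)."""
    _, p = mann_whitney_u(base, diff)
    if p >= alpha:
        return "Same"
    return "Slower" if median(diff) > median(base) else "Faster"
```

For example, with made-up samples where every diff timing is about five times larger than every base timing, `compare` returns "Slower". Identical samples give "Same". A rank-based test suits benchmarks because it makes no normality assumption and is robust to a few outlier iterations.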

In 2019 I was asked by @danmosemsft to verify .NET Core 3.0 performance. Initially, I ran all the microbenchmarks from the dotnet/performance repository on a single machine that dual-booted Windows 10 and Ubuntu 18.04 x64, and used the Results Comparer to find regressions. It very quickly turned out that such a sample was far too small to make sure that we didn't have any regressions. Some benchmarks were simply unstable. Some architectures, like ARM and ARM64, were not covered at all, and neither were other Linux distros and CPU families.

Then I ran the benchmarks on all the PCs, laptops, and VMs that I could access. But I was still missing AMD and ARM results, so I asked @tannergooding and @BruceForstall for help. @tannergooding ran the benchmarks on all his AMD machines. @BruceForstall gave me access to a document that explains how to use the ARM machines owned by the JIT Team. This turned out to be invaluable, as I have used these machines many, many times since, including this year during the 5.0 investigation.

Once I had enough samples to cover our matrix of supported OSes and architectures, I built a simple console app on top of ResultsComparer (source code available here). The tool uses the very same statistical test to detect regressions, aggregates the results from all the different configurations, and sorts them from the biggest regression to the biggest improvement.
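Here is a minimal sketch of the aggregation and sorting step. It assumes each row is a (benchmark, configuration, base mean, diff mean) tuple. The ratio is computed as Base / Diff, which matches the Ratio column in this report's tables (570.88 / 3069.76 ≈ 0.19). Scoring each benchmark by the geometric mean of its per-configuration ratios is an assumption made for illustration, not necessarily what the tool does. `ratio` and `rank_benchmarks` are hypothetical names.

```python
import math
from collections import defaultdict


def ratio(base, diff):
    """Base / Diff, like the Ratio column: below 1 means the diff build is slower."""
    return base / diff


def rank_benchmarks(rows):
    """Sort benchmarks from the biggest regression to the biggest improvement.

    `rows` holds (benchmark, configuration, base_mean, diff_mean) tuples.
    Each benchmark is scored by the geometric mean of its ratios across
    configurations (an assumption of this sketch).
    """
    per_benchmark = defaultdict(list)
    for name, _config, base, diff in rows:
        per_benchmark[name].append(ratio(base, diff))
    scores = {
        name: math.exp(sum(math.log(r) for r in ratios) / len(ratios))
        for name, ratios in per_benchmark.items()
    }
    # Lowest score first: the most severe regression leads the list.
    return sorted(scores.items(), key=lambda item: item[1])
```

Fed a few rows taken from the tables in this report, `FirstWithPredicate_LastElementMatches` (ratios ≈ 0.19 and 0.16) sorts ahead of `Perf_CancellationToken.Cancel` (ratios ≈ 0.97 and 0.72). A geometric mean treats "2x slower" and "2x faster" symmetrically, which an arithmetic mean of ratios would not.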

This approach allows for very quick identification of regressions of all kinds. In the tables below, Ratio is Base / Diff, so a value below 1.00 means the diff build is slower:

  • affecting every configuration

System.Linq.Tests.Perf_Enumerable.FirstWithPredicate_LastElementMatches(input: IOrderedEnumerable)

Result Base Diff Ratio Operating System Bit
Slower 570.88 3069.76 0.19 Windows 10.0.19041.388 X64
Slower 610.20 3674.19 0.17 Windows 10.0.18363.959 X64
Slower 598.37 3519.26 0.17 Windows 10.0.18363.959 X64
Slower 700.86 4238.85 0.17 Windows 10.0.19041.450 X64
Slower 583.19 3538.60 0.16 Windows 10.0.19041.450 X64
Slower 546.58 3015.23 0.18 Windows 10.0.19042 X64
Slower 665.53 3776.10 0.18 Windows 10.0.19041.450 X64
Slower 515.15 3162.05 0.16 Windows 10.0.19041.450 X64
Slower 626.94 3928.55 0.16 ubuntu 18.04 X64
Slower 630.90 4196.01 0.15 manjaro X64
Slower 813.80 4605.57 0.18 pop 20.04 X64
Slower 608.59 3587.44 0.17 alpine 3.11 X64
Slower 615.67 3390.01 0.18 ubuntu 18.04 X64
Slower 2148.33 10335.71 0.21 ubuntu 16.04 Arm64
Slower 2183.77 10620.53 0.21 ubuntu 16.04 Arm64
Slower 2163.67 10815.16 0.20 ubuntu 16.04 Arm64
Slower 1176.33 11641.04 0.10 ubuntu 18.04 Arm64
Slower 1550.48 5183.74 0.30 ubuntu 20.04 Arm64
Slower 568.67 3637.59 0.16 Windows 10.0.18363.959 X86
Slower 664.86 4576.24 0.15 Windows 10.0.19041.450 X86
Slower 972.74 8054.46 0.12 Windows 10.0.18363.1016 Arm
Slower 790.15 5171.92 0.15 macOS Catalina 10.15.6 X64
Slower 668.62 4153.54 0.16 macOS Catalina 10.15.6 X64
Slower 743.69 4727.58 0.16 macOS Mojave 10.14.5 X64
  • affecting specific OS families (Windows, Unix)

System.Globalization.Tests.StringSearch.IsPrefix_DifferentFirstChar(Options: (en-US, IgnoreSymbols, False))

Result Base Diff Ratio Operating System Bit
Slower 53.24 26589.31 0.00 Windows 10.0.19041.388 X64
Slower 65.47 28371.93 0.00 Windows 10.0.18363.959 X64
Slower 63.89 27952.39 0.00 Windows 10.0.18363.959 X64
Slower 75.24 35910.74 0.00 Windows 10.0.19041.450 X64
Slower 67.29 55198.94 0.00 Windows 10.0.19041.450 X64
Slower 58.36 31008.73 0.00 Windows 10.0.19042 X64
Slower 70.38 34632.87 0.00 Windows 10.0.19041.450 X64
Slower 58.92 27533.16 0.00 Windows 10.0.19041.450 X64
Same 24197.26 24316.40 1.00 ubuntu 18.04 X64
Same 23317.93 23585.42 0.99 manjaro X64
Same 30855.66 30176.99 1.02 pop 20.04 X64
Same 29081.88 28590.29 1.02 alpine 3.11 X64
Same 23929.07 23728.33 1.01 ubuntu 18.04 X64
Same 51918.86 51256.87 1.01 ubuntu 16.04 Arm64
Same 51674.77 51693.86 1.00 ubuntu 16.04 Arm64
Same 51690.93 52015.88 0.99 ubuntu 16.04 Arm64
Same 61071.92 43711.17 1.40 ubuntu 18.04 Arm64
Faster 43870.66 26020.13 1.69 ubuntu 20.04 Arm64
Slower 78.42 36208.27 0.00 Windows 10.0.18363.959 X86
Slower 88.01 42312.37 0.00 Windows 10.0.19041.450 X86
Slower 104.29 57622.86 0.00 Windows 10.0.18363.1016 Arm
Same 38089.02 40079.68 0.95 macOS Catalina 10.15.6 X64
Same 32208.09 32537.00 0.99 macOS Catalina 10.15.6 X64
Same 32575.17 32782.69 0.99 macOS Mojave 10.14.5 X64
  • affecting specific Linux distros

System.Threading.Tests.Perf_CancellationToken.Cancel

Result Base Diff Ratio Operating System Bit
Same 116.42 120.28 0.97 Windows 10.0.19041.388 X64
Same 148.25 146.53 1.01 Windows 10.0.18363.959 X64
Same 144.37 144.09 1.00 Windows 10.0.18363.959 X64
Same 154.82 151.57 1.02 Windows 10.0.19041.450 X64
Same 134.57 133.40 1.01 Windows 10.0.19041.450 X64
Same 122.52 119.39 1.03 Windows 10.0.19042 X64
Same 154.48 150.92 1.02 Windows 10.0.19041.450 X64
Same 128.87 122.90 1.05 Windows 10.0.19041.450 X64
Same 169.50 168.46 1.01 ubuntu 18.04 X64
Faster 171.67 155.11 1.11 manjaro X64
Same 179.54 175.17 1.02 pop 20.04 X64
Slower 146.39 203.94 0.72 alpine 3.11 X64
Same 179.39 180.75 0.99 ubuntu 18.04 X64
Same 1068.08 1029.35 1.04 ubuntu 16.04 Arm64
Same 1066.73 1056.79 1.01 ubuntu 16.04 Arm64
Same 1111.72 1037.54 1.07 ubuntu 16.04 Arm64
Same 751.74 622.83 1.21 ubuntu 18.04 Arm64
Faster 675.51 318.18 2.12 ubuntu 20.04 Arm64
Same 258.80 257.15 1.01 Windows 10.0.18363.959 X86
Same 194.61 192.96 1.01 Windows 10.0.19041.450 X86
Same 486.93 508.05 0.96 Windows 10.0.18363.1016 Arm
Same 200.25 203.78 0.98 macOS Catalina 10.15.6 X64
Same 168.62 163.47 1.03 macOS Catalina 10.15.6 X64
Same 174.95 177.88 0.98 macOS Mojave 10.14.5 X64
  • affecting specific CPU families

System.Buffers.Text.Tests.Base64EncodeDecodeInPlaceTests.Base64EncodeInPlace(NumberOfBytes: 200000000)

Result Base Diff Ratio Operating System Bit Processor Name
Same 125616750.00 125476550.00 1.00 Windows 10.0.19041.388 X64 AMD Ryzen 9 3900X
Same 161388400.00 156493500.00 1.03 Windows 10.0.18363.959 X64 Intel Xeon CPU E5-1650 v4 3.60GHz
Same 154933500.00 154730800.00 1.00 Windows 10.0.18363.959 X64 Intel Xeon CPU E5-1650 v4 3.60GHz
Same 180481800.00 180129900.00 1.00 Windows 10.0.19041.450 X64 Intel Core i7-5557U CPU 3.10GHz (Broadwell)
Slower 161742300.00 211160300.00 0.77 Windows 10.0.19041.450 X64 Intel Core i7-6700 CPU 3.40GHz (Skylake)
Same 152928600.00 150232700.00 1.02 Windows 10.0.19042 X64 Intel Core i7-7700 CPU 3.60GHz (Kaby Lake)
Same 206708750.00 206860050.00 1.00 Windows 10.0.19041.450 X64 Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R)
Slower 140924300.00 185228400.00 0.76 Windows 10.0.19041.450 X64 Intel Core i7-8700 CPU 3.20GHz (Coffee Lake)
Same 154948321.00 154788579.50 1.00 ubuntu 18.04 X64 Intel Xeon CPU E5-1650 v4 3.60GHz
Same 175860282.50 163007313.50 1.08 manjaro X64 Intel Core i7-4771 CPU 3.50GHz (Haswell)
Slower 199713880.00 255270486.50 0.78 pop 20.04 X64 Intel Core i7-6600U CPU 2.60GHz (Skylake)
Same 151256100.00 168661900.00 0.90 alpine 3.11 X64 Intel Core i7-7700 CPU 3.60GHz (Kaby Lake)
Same 171229200.00 165843050.00 1.03 ubuntu 18.04 X64 Intel Core i7-7700 CPU 3.60GHz (Kaby Lake)
Same 503785101.00 505992400.50 1.00 ubuntu 16.04 Arm64 Unknown processor
Same 503901205.00 506190175.00 1.00 ubuntu 16.04 Arm64 Unknown processor
Same 504131772.50 506220395.00 1.00 ubuntu 16.04 Arm64 Unknown processor
Same 473629200.00 541631800.00 0.87 ubuntu 18.04 Arm64 Unknown processor
Same 331381500.00 333779500.00 0.99 ubuntu 20.04 Arm64 Unknown processor
Same 246876150.00 247010200.00 1.00 Windows 10.0.18363.959 X86 Intel Xeon CPU E5-1650 v4 3.60GHz
Same 290036150.00 289409500.00 1.00 Windows 10.0.19041.450 X86 Intel Core i7-5557U CPU 3.10GHz (Broadwell)
Same 418007450.00 415404450.00 1.01 Windows 10.0.18363.1016 Arm Microsoft SQ1 3.0 GHz
Same 204196936.50 204410652.50 1.00 macOS Catalina 10.15.6 X64 Intel Core i5-4278U CPU 2.60GHz (Haswell)
Same 176763730.00 175647563.50 1.01 macOS Catalina 10.15.6 X64 Intel Core i7-4870HQ CPU 2.50GHz (Haswell)
Same 180812724.00 184849205.00 0.98 macOS Mojave 10.14.5 X64 Intel Core i7-5557U CPU 3.10GHz (Broadwell)
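Telling these patterns apart comes down to grouping the per-configuration verdicts by an attribute and checking which groups regressed. The attribute might be OS family, distro, or CPU model. Here is a minimal sketch, assuming each row is a dict with hypothetical `result`, `os` and `arch` keys. The strict rule "every configuration in the group is Slower" and the Windows-versus-Unix classifier are simplifications chosen for illustration.

```python
from collections import Counter, defaultdict


def verdicts_by(rows, key):
    """Count Slower/Same/Faster verdicts per group; `key` extracts the grouping attribute."""
    groups = defaultdict(Counter)
    for row in rows:
        groups[key(row)][row["result"]] += 1
    return groups


def regressed_groups(rows, key):
    """Return (sorted) the groups in which every configuration was reported as Slower."""
    return sorted(
        group
        for group, counts in verdicts_by(rows, key).items()
        if set(counts) == {"Slower"}
    )


def os_family(row):
    # Hypothetical classifier: anything that is not Windows counts as Unix.
    return "Windows" if row["os"].startswith("Windows") else "Unix"
```

Applied to a subset of the `IsPrefix_DifferentFirstChar` rows above, grouping by `os_family` returns `["Windows"]`. Grouping by `arch` returns `["Arm"]`, but only because the single Arm configuration runs Windows. This shows why several attributes have to be checked before blaming one of them.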

The tool had one major flaw: it was not automated, so we found out about regressions only when we went looking for them.

This was recognized, and a new project was started. In 2020, @DrewScoggins started implementing a GitHub bot that uses data from the microbenchmark runs in the performance lab (a set of machines owned by the .NET Performance Team) to detect and auto-file regressions. So far, the bot has been reporting new issues in a dedicated repository. Once a week, a workgroup led by @DrewScoggins and consisting of @AndyAyersMS, @kunalspathak, @tannergooding, and myself went through the list and triaged the issues. Issues deemed to be actual regressions were labeled as Needs Transfer and later moved by @DrewScoggins to the runtime repo.

A few weeks ago, we were getting close to the "code freeze" for .NET 5, and I asked myself a question: are we sure that the bot has reported all possible regressions for all the supported OS versions?

The bot uses different statistical methods to detect regressions, and so far it has been enabled only for Windows 10 x64, Ubuntu 18.04 x64, and Windows 10 x86. So I decided to spend some time verifying it with the old tool that I wrote. To increase the sample size and get other .NET Libraries Team members involved, I simply asked the Team to run the benchmarks and share the results with me.

Running the performance repo microbenchmarks against the latest .NET Core SDK is super easy thanks to a Python script implemented by @jorive. The script downloads the right SDK and starts benchmarking with cleared environment variables:

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f netcoreapp3.1 netcoreapp5.0 --filter '*'

Data

The data I received from the .NET Libraries Team members allowed me to cover a big part of the entire matrix of supported configurations:

Operating System Arch Processor Name Provided by
Windows 10.0.19041.388 X64 AMD Ryzen 9 3900X @tannergooding
Windows 10.0.18363.959 X64 Intel Xeon CPU E5-1650 v4 3.60GHz @adamsitnik
Windows 10.0.19041.450 X64 Intel Core i7-5557U CPU 3.10GHz (Broadwell) @adamsitnik
Windows 10.0.19041.450 X64 Intel Core i7-6700 CPU 3.40GHz (Skylake) @GrabYourPitchforks
Windows 10.0.19042 X64 Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) @danmosemsft
Windows 10.0.19041.450 X64 Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R) @jeffhandley
Windows 10.0.19041.450 X64 Intel Core i7-8700 CPU 3.20GHz (Coffee Lake) @jeffhandley
ubuntu 18.04 X64 Intel Xeon CPU E5-1650 v4 3.60GHz @adamsitnik
manjaro X64 Intel Core i7-4771 CPU 3.50GHz (Haswell) @ManickaP
pop 20.04 X64 Intel Core i7-6600U CPU 2.60GHz (Skylake) @carlossanlop
alpine 3.11 (WSL2) X64 Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) @danmosemsft
ubuntu 18.04 (WSL2) X64 Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) @danmosemsft
ubuntu 16.04 Arm64 Qualcomm Centriq @adamsitnik
ubuntu 18.04 (WSL2) Arm64 Microsoft SQ1 3.0 GHz (Surface Pro X) @carlossanlop
ubuntu 20.04 (WSL2) Arm64 Microsoft SQ1 3.0 GHz (Surface Pro X) @pgovind
Windows 10.0.18363.959 X86 Intel Xeon CPU E5-1650 v4 3.60GHz @adamsitnik
Windows 10.0.19041.450 X86 Intel Core i7-5557U CPU 3.10GHz (Broadwell) @adamsitnik
Windows 10.0.18363.1016 Arm Microsoft SQ1 3.0 GHz (Surface Pro X) @adamsitnik
macOS Catalina 10.15.6 X64 Intel Core i5-4278U CPU 2.60GHz (Haswell) @jeffhandley
macOS Catalina 10.15.6 X64 Intel Core i7-4870HQ CPU 2.50GHz (Haswell) @carlossanlop
macOS Mojave 10.14.5 X64 Intel Core i7-5557U CPU 3.10GHz (Broadwell) @adamsitnik

Everyone interested can download the data from here. The full report generated by the tool is available here.

Moreover, the full historical data turned out to be extremely useful. I used it every time I was not sure whether something was a regression or just an unstable or multimodal benchmark:

Regressions

Already fixed

Investigation in progress

By design or Acceptable

Moved to 6.0

Unstable or multimodal benchmarks

There were, of course, more of them. Here are the ones that I noted to use as Contract Tests in the near future (to reduce the noise produced by the bot):

Summary

  • The bot has reported all major performance issues for the configurations that it was enabled for (Windows x64, x86, and Ubuntu x64). Great work @DrewScoggins!
  • The full historical data turned out to be extremely useful to exclude all false positives for multimodal and unstable benchmarks.
  • We missed one important x86 bug during triage (human error), but it was discovered during the study ([Perf -196%] System.Collections.ContainsTrue<Int32> (6) #41167 (comment)). To avoid such problems in the future, and to enable the bot in the runtime repo, the bot's noise needs to be reduced. Currently, it's quite high, mostly due to the multimodal nature of the benchmarks.
  • The study detected a relatively large number of new ARM64 perf problems late in the release. The sooner we enable the bot for ARM64, the better. Moreover, we should ask for ARM64 results more often when reviewing big changes that affect the performance of frequently used features (like sorting arrays).
  • The study has shown that measuring the performance of GNU libc based Linux distros like Ubuntu is not enough to detect musl libc specific regressions. We should consider adding Alpine runs to the perf lab.
  • This time no important issues specific to macOS and different CPU families were discovered. It has proven that the perf lab has good hardware coverage.
  • The Alpine regression has shown that an increased number of Gen 0 collections can be a very valuable metric to detect regressions. We should consider extending the bot to use it.
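A Gen 0 check could be as simple as comparing the "Gen 0" column between the base and diff builds. When BenchmarkDotNet's MemoryDiagnoser is enabled, that column reports Gen 0 collections per 1,000 operations. Here is a minimal sketch; `gen0_regression` and its 10% `tolerance` are hypothetical and are not part of the bot.

```python
def gen0_regression(base_gen0, diff_gen0, tolerance=0.10):
    """Flag a regression when Gen 0 collections per 1,000 operations grow too much.

    `tolerance` is a hypothetical relative threshold (10% by default).
    A benchmark that did not allocate enough to trigger Gen 0 collections before
    (base == 0) is flagged as soon as any collections appear.
    """
    if base_gen0 == 0:
        return diff_gen0 > 0
    return (diff_gen0 - base_gen0) / base_gen0 > tolerance
```

Because GC counts are derived from allocations rather than wall-clock time, a check like this is typically less sensitive to machine noise than timing comparisons. That is one reason the metric looks attractive for reducing false positives.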

Big thanks to everyone involved!

@adamsitnik adamsitnik added tenet-performance Performance related issue tenet-performance-benchmarks Issue from performance benchmark discussion Triaged labels Sep 4, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Sep 4, 2020
@Dotnet-GitSync-Bot
Collaborator

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@adamsitnik adamsitnik removed the untriaged New issue has not been triaged by the area owner label Sep 4, 2020
@danmoseley
Member

Great work @adamsitnik. I am pleased that our systems have improved since the last cycle, and that you and @DrewScoggins will be using this exercise to improve them so that more issues are found earlier and not at the end of the cycle.

Cc @Lxiamail @jkotas

no important issues specific to macOS and different CPU families were discovered. It has proven that the perf lab has good hardware coverage.

I thought the perf lab did not cover Mac and only had one type of CPU. How can we catch such regression before the end of the cycle?

@adamsitnik
Member Author

thought the perf lab did not cover Mac and only had one type of CPU.

You are right.

no important issues specific to macOS and different CPU families were discovered. It has proven that the perf lab has good hardware coverage.

I meant that in my study I gathered results for macOS and different CPU families, but I did not find any important regressions specific to them (macOS etc.), so adding macOS or more types of CPUs to the perf lab is not needed.

@danmoseley
Member

Could an effort like this (a script run over manually collected submissions) be made cheap enough that we could do it occasionally during the cycle?

@Lxiamail
Member

Lxiamail commented Sep 4, 2020

@adamsitnik Great job! Based on the data out of this exercise, we will add Alpine to .net perf lab.

@ladeak
Contributor

ladeak commented Sep 4, 2020

Is the collected data internal only or available publicly as well?

@jeffhandley
Member

Could an effort like this (a script run over manually collected submissions) be made cheap enough that we could do it occasionally during the cycle?

We talked about this offline, but I'm sharing the comment here too. We'll be putting effort into that between now and the first 6.0 previews, with the goal of completing targeted manual runs for each of the 6.0 Preview/RC releases.

@joperezr joperezr added the tracking This issue is tracking the completion of other related issues. label Sep 8, 2020
@DrewScoggins
Member

thought the perf lab did not cover Mac and only had one type of CPU.

You are right.

no important issues specific to macOS and different CPU families were discovered. It has proven that the perf lab has good hardware coverage.

I meant that in my study I gathered results for macOS and different CPU families, but I did not find any important regressions specific to them (macOS etc.), so adding macOS or more types of CPUs to the perf lab is not needed.

Just for clarity: we will have a brand-new batch of AMD hardware this calendar year. So while we will not be covering macOS in the lab, we will have both Intel and AMD hardware for coverage.

@tannergooding
Member

@DrewScoggins, do we know what the hardware specs are?

@DrewScoggins
Member

I do not, @billwert did the work of speccing out the machines.

@richlander
Member

@adamsitnik, that SharePoint link didn't work for me. Can you just publish the data to GH?

@richlander
Member

My mistake. I was using the wrong browser profile.


10 participants