Add MemoryRandomization attribute #3973

Merged 3 commits into dotnet:main on Mar 5, 2024

Conversation

@DrewScoggins (Member)

This check-in adds the `MemoryRandomization` attribute to a select set of benchmarks in the performance repo.
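
For reference, opting a benchmark in looks roughly like the sketch below. The benchmark class and method are hypothetical, and the attribute is assumed to live in this repo's BenchmarkDotNet extensions harness rather than in BenchmarkDotNet proper (BenchmarkDotNet itself exposes the underlying setting as a job characteristic).

```csharp
using BenchmarkDotNet.Attributes;
// Assumption: [MemoryRandomization] is defined in this repo's harness
// (e.g. the BenchmarkDotNet.Extensions project), not in BenchmarkDotNet itself.

[MemoryRandomization] // randomize memory alignment between iterations
public class StringHashing // hypothetical benchmark class
{
    private readonly string _value = new string('x', 1024);

    [Benchmark]
    public int HashValue() => _value.GetHashCode();
}
```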

In order to determine which benchmarks should be selected, the following procedure was used (a rough sketch of the resulting criterion follows the list).

  1. All tests were run in a separate config with the `MemoryRandomization` attribute turned on.
  2. 5 weeks of data from these runs was used to calculate an average coefficient of variation (CoV) for the runs with the attribute.
  3. This was then compared against the CoV of the regular runs that happened on the same builds and machines.
  4. Any test whose CoV improved by more than 30% with the attribute in all 5 weeks was included.
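
Concretely, the criterion in steps 2-4 amounts to something like this sketch (hypothetical helper names; the actual tooling and queries live elsewhere, as discussed below):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class RandomizationSelection
{
    // CoV = standard deviation divided by mean, over one week's measurements.
    static double CoV(IReadOnlyList<double> samples)
    {
        double mean = samples.Average();
        double variance = samples.Sum(x => (x - mean) * (x - mean)) / samples.Count;
        return Math.Sqrt(variance) / mean;
    }

    // A test qualifies only if the randomized runs beat the regular runs'
    // CoV by more than 30% in every one of the 5 weeks.
    static bool Qualifies(double[][] randomizedWeeks, double[][] regularWeeks) =>
        randomizedWeeks.Zip(regularWeeks, (rand, reg) => CoV(rand) < 0.7 * CoV(reg))
                       .All(improved => improved);
}
```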

@DrewScoggins (Member, Author)

cc @AndyAyersMS

@AndyAyersMS (Member)

Happy to see this at long last.

Any thoughts on how this will get applied to new benchmarks? Will we need to redo the experiment every so often?

@cincuranet (Contributor) previously approved these changes Feb 23, 2024 and left a comment:


Do we have some process planned for keeping the list of benchmarks that need randomization up-to-date (for example, when new benchmarks are added)? I see @AndyAyersMS already asked basically the same question.

@DrewScoggins (Member, Author) commented Mar 1, 2024

> Happy to see this at long last.
>
> Any thoughts on how this will get applied to new benchmarks? Will we need to redo the experiment every so often?

I have all of the tooling and queries that I used to generate this data. My guess is that we should check that in, but it should probably not go in this repo; we don't really have tooling here unless it is directly for measuring performance.

I will make a PR in our infra repo.

So to me it looks like we have two main problems that we want to address. The first: among existing tests, are there any that were missed or improperly classified by this exercise? The second: when new tests are added, should they use this attribute?

I'll start with the second case. This feels like something that we should be able to do when new tests are checked in. It should be fairly straightforward to run the tests that are being added both with and without memory randomization. I think the criterion should be that any time we see multiple modes across the iterations with memory randomization, we turn it on for those tests (a rough sketch of such a check follows). We should build out some documentation for how this process works so that we have something to point at when people go to add tests.
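
As a rough illustration of the kind of gate this could be (a hypothetical heuristic, not settled tooling; BenchmarkDotNet's summary already flags multimodal distributions via its mvalue statistic):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ModeCheck
{
    // Hypothetical heuristic: flag a run as multimodal when a single gap
    // between sorted iteration times covers a large share of the total
    // spread, which suggests two distinct timing clusters.
    public static bool LooksMultimodal(IReadOnlyList<double> iterations)
    {
        if (iterations.Count < 3) return false;

        double[] sorted = iterations.OrderBy(x => x).ToArray();
        double range = sorted[^1] - sorted[0];
        if (range == 0) return false;

        double maxGap = 0;
        for (int i = 1; i < sorted.Length; i++)
            maxGap = Math.Max(maxGap, sorted[i] - sorted[i - 1]);

        // Treat one gap spanning more than a third of the range as bimodal.
        return maxGap > range / 3;
    }
}
```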

For the first case, I have all of the tooling and queries that I used to generate this test list, and that should be persisted. I will likely do it in our infrastructure repo, as that seems a better fit than here. I am not sure exactly what the cadence for redoing the comparison should be. We were planning to turn off the memory randomization experiment to give us back room for other trials. To me, something in the range of every 6-12 months makes sense for taking another pass and making sure that things are still behaving how we expect, but I am open to other ideas.

@DrewScoggins (Member, Author)

Failures are all on 8.0 and unrelated to this change.

@DrewScoggins merged commit d97774f into dotnet:main on Mar 5, 2024
29 of 59 checks passed