Add MemoryRandomization attribute #3973

Merged 3 commits into dotnet:main on Mar 5, 2024

Conversation

@DrewScoggins (Member)

This check-in adds the `MemoryRandomization` attribute to a select set of benchmarks in the performance repo.
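
For reference, opting a benchmark in looks roughly like the sketch below. The benchmark class and method are hypothetical, and the attribute is assumed to live in this repo's BenchmarkDotNet extensions harness rather than in BenchmarkDotNet proper (BenchmarkDotNet itself exposes the underlying setting as a job characteristic).

```csharp
using BenchmarkDotNet.Attributes;
// Assumption: [MemoryRandomization] is defined in this repo's harness
// (e.g. the BenchmarkDotNet.Extensions project), not in BenchmarkDotNet itself.

[MemoryRandomization] // randomize memory alignment between iterations
public class StringHashing // hypothetical benchmark class
{
    private readonly string _value = new string('x', 1024);

    [Benchmark]
    public int HashValue() => _value.GetHashCode();
}
```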

In order to determine which benchmarks should be selected, the following procedure was used (a rough sketch of the resulting criterion follows the list).

  1. All tests were run in a separate config with the `MemoryRandomization` attribute turned on.
  2. 5 weeks of data from these runs was used to calculate an average coefficient of variation (CoV) for the runs with the attribute.
  3. This was then compared against the CoV of the regular runs that happened on the same builds and machines.
  4. Any test whose CoV improved by more than 30% with the attribute in all 5 weeks was included.
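
Concretely, the criterion in steps 2-4 amounts to something like this sketch (hypothetical helper names; the actual tooling and queries live elsewhere, as discussed below):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class RandomizationSelection
{
    // CoV = standard deviation divided by mean, over one week's measurements.
    static double CoV(IReadOnlyList<double> samples)
    {
        double mean = samples.Average();
        double variance = samples.Sum(x => (x - mean) * (x - mean)) / samples.Count;
        return Math.Sqrt(variance) / mean;
    }

    // A test qualifies only if the randomized runs beat the regular runs'
    // CoV by more than 30% in every one of the 5 weeks.
    static bool Qualifies(double[][] randomizedWeeks, double[][] regularWeeks) =>
        randomizedWeeks.Zip(regularWeeks, (rand, reg) => CoV(rand) < 0.7 * CoV(reg))
                       .All(improved => improved);
}
```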

@DrewScoggins (Member, Author)

cc @AndyAyersMS

@AndyAyersMS (Member)

Happy to see this at long last.

Any thoughts on how this will get applied to new benchmarks? Will we need to redo the experiment every so often?

@cincuranet (Contributor) previously approved these changes Feb 23, 2024 and left a comment:


Do we have some process planned for keeping the list of benchmarks that need randomization up-to-date (for example, when new benchmarks are added)? I see @AndyAyersMS already asked basically the same question.

@DrewScoggins (Member, Author) commented Mar 1, 2024

> Happy to see this at long last.
>
> Any thoughts on how this will get applied to new benchmarks? Will we need to redo the experiment every so often?

I have all of the tooling and queries that I used to generate this data. My guess is that we should check that in, but it should probably not go in this repo; we don't really have tooling here unless it is directly for measuring performance.

I will make a PR in our infra repo.

So to me it looks like we have two main problems that we want to address. The first: among existing tests, are there any that were missed or improperly classified by this exercise? The second: when new tests are added, should they use this attribute?

I'll start with the second case. This feels like something that we should be able to do when new tests are checked in. It should be fairly straightforward to run the tests that are being added both with and without memory randomization. I think the criterion should be that any time we see multiple modes across the iterations with memory randomization, we turn it on for those tests (a rough sketch of such a check follows). We should build out some documentation for how this process works so that we have something to point at when people go to add tests.
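
As a rough illustration of the kind of gate this could be (a hypothetical heuristic, not settled tooling; BenchmarkDotNet's summary already flags multimodal distributions via its mvalue statistic):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ModeCheck
{
    // Hypothetical heuristic: flag a run as multimodal when a single gap
    // between sorted iteration times covers a large share of the total
    // spread, which suggests two distinct timing clusters.
    public static bool LooksMultimodal(IReadOnlyList<double> iterations)
    {
        if (iterations.Count < 3) return false;

        double[] sorted = iterations.OrderBy(x => x).ToArray();
        double range = sorted[^1] - sorted[0];
        if (range == 0) return false;

        double maxGap = 0;
        for (int i = 1; i < sorted.Length; i++)
            maxGap = Math.Max(maxGap, sorted[i] - sorted[i - 1]);

        // Treat one gap spanning more than a third of the range as bimodal.
        return maxGap > range / 3;
    }
}
```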

For the first case, I have all of the tooling and queries that I used to generate this test list, and that should be persisted. I will likely do it in our infrastructure repo, as that seems a better fit than here. I am not sure exactly what the cadence for redoing the comparison should be. We were planning to turn off the memory randomization experiment to give us back room for other trials. To me, something in the range of every 6-12 months makes sense for taking another pass and making sure that things are still behaving how we expect, but I am open to other ideas.

@DrewScoggins (Member, Author)

Failures are all on 8.0 and unrelated to this change.

@DrewScoggins merged commit d97774f into dotnet:main on Mar 5, 2024
29 of 59 checks passed