Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Connect ASV benchmarking to CI #496

Merged
merged 6 commits into from Oct 30, 2023
Merged

Connect ASV benchmarking to CI #496

merged 6 commits into from Oct 30, 2023

Conversation

anissa111
Copy link
Member

@anissa111 anissa111 commented Oct 27, 2023

PR Summary

Closes #472

To see this in action, see my fork.

  • Benchmarks run in this action, producing this commit on NCAR/geocat-asv
  • Then, I manually re-run the benchmarks in this action. It does not re-run the previously run benchmarks in the previous actions, and fails here on purpose.

In this PR, I made a couple changes from the proof of concept runs above:

  • I've increased the number of commits that gets benchmarked. For our actual CI, I've set it to benchmark all commits from the v2022.10.0 tag forward. This will take a while to run the first time, but should only produce long CI runs when a new benchmark is added in the future.
  • I've altered the last step, the push to NCAR/geocat-comp-asv, to only run if the CI is coming from this repository from the main branch. This addition is to make sure that any manual runs on branches other than main do not get included in our archived benchmark results.

Here are a couple of design decisions that might warrant further discussion:

  • the action will fail if there are no new benchmarks to be committed. I thought about putting some checks to make sure the action passed anyway, but decided against it. If this CI, which runs automatically on pushes to main and tagged versions, can't find a new push to main or tagged version to benchmark (or fails to produce benchmarks), I assume we might actually want that failure to notify us if something has gone weird. @kafitzgerald, what do you think here?
  • How long do we actually want to produce a commit history for? Is v2022.10.0 too long or too short?
  • With this change, we'll really want to make sure to have clean commit histories before pushing to main, should we add any additional guard rails up for that?
  • Just noting again that it feels really un-ideal to have my personal info added to authenticate the pushes, but I'm unsure how to get around that without fine-grained tokens and/or a bot

PR Checklist

General

  • Make an issue if one doesn't already exist
  • Link the issue this PR resolves by adding closes #XXX to the PR description where XXX is the number of the issue.
  • Add a brief summary of changes to docs/release-notes.rst in a relevant section for the next unreleased release. Possible sections include: Documentation, New Features, Bug Fixes, Internal Changes, Breaking Changes/Deprecated
  • Add appropriate labels to this PR
  • Make your changes in a forked repository rather than directly in this repo
  • Passes precommit. To set up on your local, run pre-commit install from the top level of the repository. To manually run pre-commits, use pre-commit run --all-files and re-add any changed files before committing again and pushing.
  • If needed, squash and merge PR commits into a single commit to clean up commit history

@anissa111 anissa111 added developer feature For development standardization / best practices / enhancement testing Issue related to testing scalability and performance Related to scalability, performance, or benchmarking labels Oct 27, 2023
@philipc2
Copy link
Member

With this change, we'll really want to make sure to have clean commit histories before pushing to main, should we add any additional guard rails up for that?

By this, do you mean squashing any excessive commits? Using myself as a bad example, I made the mistake of not squashing a series of small commits before merging a PR (see here). Would ASV run a benchmark on each of these commits?

@anissa111
Copy link
Member Author

anissa111 commented Oct 27, 2023

With this change, we'll really want to make sure to have clean commit histories before pushing to main, should we add any additional guard rails up for that?

By this, do you mean squashing any excessive commits? Using myself as a bad example, I made the mistake of not squashing a series of small commits before merging a PR (see here). Would ASV run a benchmark on each of these commits?

With the way we're currently set up in this PR, yes. The asv-run documentation says that they passes the commit list generated from git rev-list. From the commit list I got on uxarray, I can see it's grabbing all the un-squashed commits, which means that asv would benchmach all of them with this set up.

(base) ➜  uxarray git:(main) git rev-list v2023.10.1..main
97fc1a2f443d9ffc997460530f6ff3ed7ae296c3
505209bd783920a6fcc6b4ab0fefd1a4f3df8873
e7c85ac19e9b256b1b63713ead2bff1db362e220
fddb407037f953fc1b8d3d6921eb0f40b3fb6b29
2bd81d47ef6827ec11f7f6d4104678a783b6c7d3
01a34ffa2ff4e9c027871fee72aa65d608ae63ff
9e1c9605845449ecc09aaa3e0a164ac564ed1502
feb0985ba01498c39411ac92fe08408ac7b15c9d
5bf042b3f4fef84f7641a47194cc6fe6057135ea
c72b6e9d4958a9ba159955941abfa5fa039979ff
487ce1f957521d90e3de0636f24fe42a8b7c8056
84bc776b8892ddc3b2d58b84ea3e9dc9d1cd325a
28fcf1dfdf93c861f5fd185d4c9748d9e1973607
6cf5cde795936b2ebd6ff876998fd1593f533218
802749dcebe69b06084a54ee7bf487895cce1ea4
8394dd5e1a2240f8867eea82a52c78de04d9faa6
01e13ca4d921140c1df7aafcb8d3fe710ce7360b
f4c57fe0928fa2927fd2a12b04b12927f53a2cb1
b43157dc39969530058a9e8dfd0d9fa1418f7bda
f046cea7b9bb48e5d8fe645999058cfdd89ec076
d28f3b30e7606063dfb076adea86b66d701e0329
69308cdfe539b967b7602664de86f597c0304abe
da03e5a440671b33f2764acc1b4772eda266f836
5133a62e637b6f5feada702c5ef825f58b93ca30
9b50bdf30eed249b9af2bd931863f4330f78f515
943c9d8d3238ad1e27ff51f13cd04f1e8303eed1
7d444f7f509cd8cb2caa09ed51cf14b682876743
535fbb7e6a054d994cdd88985ce3029dbfd91fde
59d6fd1627439f6d0fcd7d5bb7a77771f0e61c47
f38e4c9fe1acd943c87bd14849c9922097cd8fb0
8baf53dd53360daadd21669ef27d185906605f76
147e2314f131aa53ab426f0cbe4bf6a63658935a
3ed9a94df5873228b966539e71326b484ebcfe45
a00aa1062adb9dbf667299c26cb58bdba8c3d522
b23d33120cff9aedb5992bcbda5fc5fe46060682
63e7d5f06d406d69b55f5c6bda222064d7c3a64b
828f29654927feeefdef2d809fabe1ce60b4cc22
dbbf2362f5c78dbaaab29bbf7bcdbce927d0a003
6bb7416728b4c5d6fcb5d18a8e33a4dff729c0f3
e6d2a33be122553a9fb7026f6522ea6e7fefc608
f00e182f0aaffbe9631d4ec070e5a5ed19d1bea1
7a1bccfd33bc0bf2a1da95e19b4cac3f90a3fb1c

@philipc2
Copy link
Member

philipc2 commented Oct 27, 2023

With this change, we'll really want to make sure to have clean commit histories before pushing to main, should we add any additional guard rails up for that?

By this, do you mean squashing any excessive commits? Using myself as a bad example, I made the mistake of not squashing a series of small commits before merging a PR (see here). Would ASV run a benchmark on each of these commits?

With the way we're currently set up in this PR, yes. The asv-run documentation says that they passes the commit list generated from git rev-list. From the commit list I got on uxarray, I can see it's grabbing all the un-squashed commits, which means that asv would benchmach all of them with this set up.

(base) ➜  uxarray git:(main) git rev-list v2023.10.1..main
97fc1a2f443d9ffc997460530f6ff3ed7ae296c3
505209bd783920a6fcc6b4ab0fefd1a4f3df8873
e7c85ac19e9b256b1b63713ead2bff1db362e220
fddb407037f953fc1b8d3d6921eb0f40b3fb6b29
2bd81d47ef6827ec11f7f6d4104678a783b6c7d3
01a34ffa2ff4e9c027871fee72aa65d608ae63ff
9e1c9605845449ecc09aaa3e0a164ac564ed1502
feb0985ba01498c39411ac92fe08408ac7b15c9d
5bf042b3f4fef84f7641a47194cc6fe6057135ea
c72b6e9d4958a9ba159955941abfa5fa039979ff
487ce1f957521d90e3de0636f24fe42a8b7c8056
84bc776b8892ddc3b2d58b84ea3e9dc9d1cd325a
28fcf1dfdf93c861f5fd185d4c9748d9e1973607
6cf5cde795936b2ebd6ff876998fd1593f533218
802749dcebe69b06084a54ee7bf487895cce1ea4
8394dd5e1a2240f8867eea82a52c78de04d9faa6
01e13ca4d921140c1df7aafcb8d3fe710ce7360b
f4c57fe0928fa2927fd2a12b04b12927f53a2cb1
b43157dc39969530058a9e8dfd0d9fa1418f7bda
f046cea7b9bb48e5d8fe645999058cfdd89ec076
d28f3b30e7606063dfb076adea86b66d701e0329
69308cdfe539b967b7602664de86f597c0304abe
da03e5a440671b33f2764acc1b4772eda266f836
5133a62e637b6f5feada702c5ef825f58b93ca30
9b50bdf30eed249b9af2bd931863f4330f78f515
943c9d8d3238ad1e27ff51f13cd04f1e8303eed1
7d444f7f509cd8cb2caa09ed51cf14b682876743
535fbb7e6a054d994cdd88985ce3029dbfd91fde
59d6fd1627439f6d0fcd7d5bb7a77771f0e61c47
f38e4c9fe1acd943c87bd14849c9922097cd8fb0
8baf53dd53360daadd21669ef27d185906605f76
147e2314f131aa53ab426f0cbe4bf6a63658935a
3ed9a94df5873228b966539e71326b484ebcfe45
a00aa1062adb9dbf667299c26cb58bdba8c3d522
b23d33120cff9aedb5992bcbda5fc5fe46060682
63e7d5f06d406d69b55f5c6bda222064d7c3a64b
828f29654927feeefdef2d809fabe1ce60b4cc22
dbbf2362f5c78dbaaab29bbf7bcdbce927d0a003
6bb7416728b4c5d6fcb5d18a8e33a4dff729c0f3
e6d2a33be122553a9fb7026f6522ea6e7fefc608
f00e182f0aaffbe9631d4ec070e5a5ed19d1bea1
7a1bccfd33bc0bf2a1da95e19b4cac3f90a3fb1c

Got it. What about only running benchmarks on commits that originated after merging a pull request? Could that possibly be one solution?

@anissa111
Copy link
Member Author

anissa111 commented Oct 30, 2023

Got it. What about only running benchmarks on commits that originated after merging a pull request? Could that possibly be one solution?

@philipc2 Ah! Yeah there is an option for this, it's git rev-list --first-parent.

To continue the example from Friday, this is what that change produces on uxarray:

(base) ➜  uxarray git:(main) git rev-list main --first-parent v2023.10.1..main
9158ae47c46071d8c1a5e5e5d3f07c66279dfe22
97fc1a2f443d9ffc997460530f6ff3ed7ae296c3
505209bd783920a6fcc6b4ab0fefd1a4f3df8873
e7c85ac19e9b256b1b63713ead2bff1db362e220
fddb407037f953fc1b8d3d6921eb0f40b3fb6b29
8baf53dd53360daadd21669ef27d185906605f76
147e2314f131aa53ab426f0cbe4bf6a63658935a
3ed9a94df5873228b966539e71326b484ebcfe45
63e7d5f06d406d69b55f5c6bda222064d7c3a64b

I'll try to see if I can pass flags through asv-run

EDIT: So, I did manage to get it to run with the --first-parent flag, but when checking it out in verbose to confirm that less commits were found with the flag, I discovered that by default asv run adds the --first-parent flag 🤦‍♀️. Good news though we should already be set.

Copy link
Contributor

@kafitzgerald kafitzgerald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I'm also a bit worried about the "failure" state being confusing, but ok with merging for now and seeing how that plays out in practice

Not so much about this specific PR, but the set-up here in general:

  • This will likely run on a lot of commits. We'll need to take that into account when writing benchmarks and perhaps create another class of benchmarks to run elsewhere.
  • Given that we may be running on different shared hardware (and not the most controlled environment), it'll be interesting to see how much variability we have in run time for various benchmarks.

I think these are just things we figure out as we go though.

I guess one more question before I approve this... If we did end up refactoring how this runs and wanted to rerun (say from a different release) and start over on the archive repo, how much of a problem would that be?

@philipc2
Copy link
Member

Got it. What about only running benchmarks on commits that originated after merging a pull request? Could that possibly be one solution?

@philipc2 Ah! Yeah there is an option for this, it's git rev-list --first-parent.

To continue the example from Friday, this is what that change produces on uxarray:

(base) ➜  uxarray git:(main) git rev-list main --first-parent v2023.10.1..main
9158ae47c46071d8c1a5e5e5d3f07c66279dfe22
97fc1a2f443d9ffc997460530f6ff3ed7ae296c3
505209bd783920a6fcc6b4ab0fefd1a4f3df8873
e7c85ac19e9b256b1b63713ead2bff1db362e220
fddb407037f953fc1b8d3d6921eb0f40b3fb6b29
8baf53dd53360daadd21669ef27d185906605f76
147e2314f131aa53ab426f0cbe4bf6a63658935a
3ed9a94df5873228b966539e71326b484ebcfe45
63e7d5f06d406d69b55f5c6bda222064d7c3a64b

I'll try to see if I can pass flags through asv-run

EDIT: So, I did manage to get it to run with the --first-parent flag, but when checking it out in verbose to confirm that less commits were found with the flag, I discovered that by default asv run adds the --first-parent flag 🤦‍♀️. Good news though we should already be set.

Thats good to hear then!

@philipc2
Copy link
Member

philipc2 commented Oct 30, 2023

Given that we may be running on different hardware (and not the most controlled environment), it'll be interesting to see how much variability we have in run time for various benchmarks.

Adding to what @kafitzgerald mentioned, I'm worried on what type of results the default actions instance would produce, since I don't belive they have substantial hardware.

For an "automatic" action that we run, would it not be worth considering a fixed hardware (i.e. known CPU, Memory) instance on something like AWS (or even casper, if that's possible) that runs the bencharmk, instead of the default action instance?

However, ASV does support keeping track of the machine that ran the bencharmk (see numpy), so we should be able to distinguish between these automatic ones and manually comitted benchmarks.

EDIT: This could also cause issues if, when running on Github actions, we get an instance that has different hardware than the previous run, which would lead to adding another "machine" to our list.

This is a good read

@anissa111
Copy link
Member Author

@kafitzgerald

  • I'm also a bit worried about the "failure" state being confusing, but ok with merging for now and seeing how that plays out in practice

Fair enough, I think this won't actually happen that often, if ever, in the regular use of this action.

Not so much about this specific PR, but the set-up here in general:

  • This will likely run on a lot of commits. We'll need to take that into account when writing benchmarks and perhaps create another class of benchmarks to run elsewhere.

Yeah, totally. Any weighty benchmarks should probably be filtered out of this regular CI

I guess one more question before I approve this... If we did end up refactoring how this runs and wanted to rerun (say from a different release) and start over on the archive repo, how much of a problem would that be?

We can reconfigure this action without needing to delete all our archives (though we could also do that too). The results are only grouped by machine, which we're manually setting. If we wanted to switch to start from an earlier release, we could add to the existing archive by altering the asv run line.

Copy link
Member

@philipc2 philipc2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on the same page now, this looks great!

Copy link
Contributor

@kafitzgerald kafitzgerald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

@anissa111 anissa111 merged commit a5bdf49 into NCAR:main Oct 30, 2023
9 checks passed
@anissa111 anissa111 deleted the asv-ci branch October 30, 2023 22:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
developer feature For development standardization / best practices / enhancement scalability and performance Related to scalability, performance, or benchmarking testing Issue related to testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Connect asv benchmarking to CI
4 participants