Automation of benchmark comparison #292
I went through the codebase. @jrbourbeau @ncclementi @ian-r-rose please correct me if I say something wrong or impossible. The
If that's the case, my plan is fairly straightforward:
Sample workflow

I want to test dask/distributed#6614. I create a new branch of coiled-runtime in my own fork. In the new branch, I create the following files. Each file is a bunch of

`AB_parameters/6614_before.yaml`:

```yaml
conda:
  - python=3.9
  - coiled-runtime=0.1.0
pip:
  - dask==2022.8.1
  - git+https://github.com/dask/distributed@f07f384eb0b8b6bad7518e8535e42fdba106ed4b
```

`AB_parameters/6614_inf.yaml`:

```yaml
conda:
  - python=3.9
  - coiled-runtime=0.1.0
pip:
  - dask==2022.8.1
  - git+https://github.com/dask/distributed@dd81b424971e81616e1a52fa09ce4698a5002d41
```

`AB_parameters/6614_12.yaml`:

```yaml
conda:
  - python=3.9
  - coiled-runtime=0.1.0
pip:
  - dask==2022.8.1
  - git+https://github.com/dask/distributed@dd81b424971e81616e1a52fa09ce4698a5002d41
environ:
  DASK_DISTRIBUTED__SCHEDULER__WORKER_SATURATION: "1.2"
```

Notes
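To make the shape of these files concrete, here is a minimal sketch of a helper that consumes an already-parsed AB spec (the `conda`/`pip`/`environ` keys shown in the sample files) and splits it into the pieces a CI job would need. The function name and return shape are assumptions for illustration, not part of coiled-runtime.

```python
# Hypothetical helper: split a parsed AB_parameters spec into the pieces
# a CI job needs. The conda/pip/environ schema follows the sample files
# above; everything else is illustrative.

def build_ab_environment(spec):
    """Return (conda_packages, pip_packages, environ) from an AB spec dict."""
    conda = list(spec.get("conda", []))
    pip = list(spec.get("pip", []))
    environ = dict(spec.get("environ", {}))
    return conda, pip, environ


# The 6614_12 case from above, as it would look after YAML parsing:
spec = {
    "conda": ["python=3.9", "coiled-runtime=0.1.0"],
    "pip": [
        "dask==2022.8.1",
        "git+https://github.com/dask/distributed@dd81b424971e81616e1a52fa09ce4698a5002d41",
    ],
    "environ": {"DASK_DISTRIBUTED__SCHEDULER__WORKER_SATURATION": "1.2"},
}

conda, pip, environ = build_ab_environment(spec)
print(environ["DASK_DISTRIBUTED__SCHEDULER__WORKER_SATURATION"])  # prints "1.2"
```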
I then have to update

```yaml
matrix:
  AB_environments: []
```

to

```yaml
matrix:
  AB_environments:
    - 6614_before
    - 6614_inf
    - 6614_12
```

and

```yaml
benchmarks:
  ...
  strategy:
    fail-fast: false
    matrix:
      os: ["ubuntu-latest"]
      python-version: ["3.9"]
      runtime-version: ["latest", "0.0.4", "0.1.0"]
```

to

```yaml
matrix:
  os: ["ubuntu-latest"]
  python-version: ["3.9"]
  runtime-version: [AB_6614_before, AB_6614_inf, AB_6614_12]
```

Note: I would love to not need to touch

Finally, I submit a PR named

The static report will show, side by side, the time, peak memory, and average memory for each test from each test suite that has run and has the measure.

Further improvements, out of scope for this ticket:

Requirements

All requirements in the OP are satisfied.

Nice to haves
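On the point about not wanting to hand-edit the workflow file for every experiment: the matrix entries could in principle be derived from the files in `AB_parameters/`. A sketch, where the directory layout and the `AB_<name>` naming convention come from the example above and the function itself is hypothetical:

```python
# Sketch: derive the workflow matrix entries from the AB_parameters/ files
# instead of hand-editing the workflow YAML. Only the "AB_<name>" convention
# is taken from the proposal; the function is not part of coiled-runtime.
from pathlib import Path
import tempfile


def ab_matrix_entries(params_dir):
    """Map each <params_dir>/<name>.yaml file to an AB_<name> matrix entry."""
    return sorted(f"AB_{p.stem}" for p in Path(params_dir).glob("*.yaml"))


# Demo against a throwaway directory standing in for AB_parameters/
with tempfile.TemporaryDirectory() as d:
    for name in ("6614_before", "6614_inf", "6614_12"):
        Path(d, f"{name}.yaml").touch()
    print(ab_matrix_entries(d))  # ['AB_6614_12', 'AB_6614_before', 'AB_6614_inf']
```

A CI step could emit this list as JSON and feed it to the workflow's matrix via an output variable.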
Known issues and limitations
The tests workflow is already pretty complicated; I wouldn't mind writing something new. If it actually is easier to develop within this file, I'm OK. Would love to get @ian-r-rose's thoughts on this.
nit: if you are running on your fork, there will be no credentials. Right now, we have to run everything on the main repo. I don't think this impacts your proposal
Agreed, but I think that's an OK workflow for now.
Looks like the checkout action supports checking out anything you want; it just defaults to the current commit: https://github.com/actions/checkout. I suggest doing this in a follow-up.
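For reference, a minimal sketch of pointing `actions/checkout` at an arbitrary repository and ref (the `repository` and `ref` inputs are real checkout-action inputs; the specific values here are just the ones from the example earlier in this thread):

```yaml
- uses: actions/checkout@v3
  with:
    repository: dask/distributed   # any repo, not just the one running the workflow
    ref: f07f384eb0b8b6bad7518e8535e42fdba106ed4b  # branch, tag, or commit SHA
```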
Nice follow-up: the CI job posts the final results to the PR and closes the PR automatically.
I would expect these to be much simpler once we merge #235 and live in a
No objection here :)
This would also be unnecessary in a
On this front, I pushed some WIP that I have found useful for looking at regressions in #294. You might be able to use that with minimal changes (it could probably use a bit of CLI work).
I agree that it's quite complicated. However, there are two things that would simplify it dramatically:
Would that allay your concerns about having something in
@ian-r-rose beat me to it, but I was about to say just that. I think once Package sync is in and we simplify CI, things should get much simpler to work with.
@crusaderky I'd recommend holding off on making major time investments into changing the software environment setup process until #235 is in, or building on top of that PR.
I'd like to get it in today or tomorrow. Before we merge it, do you have any concerns about using |
How is this supported? Let's say I wanted to test dask/dask#9446 while it's still a prototype PR. I want to run 3 cases:
How would you express the fact that different test code needs to run in different cases? With this approach, it seems like I'd have to add some way to my dask fork to express the two different cardinalities via environment variables. Or another case: what if I wanted to benchmark the above code on 3 different cluster sizes as well?
Another way of approaching this would be to have the ability to compare across different branches of the

You'd create a branch for each variation you want to test. In that branch, you can change the

You also can change all code however you need to between branches. This allows you to:
Pros:
Cons:
The downside, of course, is that you don't have this nice YAML file describing exactly what's different between the cases, which could be nice for dashboarding. But inevitably, that YAML is going to have to grow and grow to encompass more types of things you want to compare, and eventually require its own documentation and tests. (The alternative is to push all that complexity into the forked code you're running, which I think is a bad idea. Maintaining a branch of a branch, which adds hacks to alter behavior based on environment variables, would be brittle and frustrating.)

Making the changes in different branches could become tedious, especially with a large parameter space. I could certainly see making a script which takes a YAML configuration file and sets up all the branches for you :)

There's a question of whether we want to have thousands of branches in this repo. (We certainly wouldn't open PRs for all of them; just pushing would have to be sufficient.) Creating a shared git config to ignore fetching refspecs following a pattern (like

The workflow would probably be something like:
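The branch-setup script mentioned above could start as something very small. A sketch that, given a base branch and a list of variation names, emits the git commands to run; the `AB/` branch prefix and all names are assumptions for illustration:

```python
# Sketch of the "script which sets up all the branches for you" idea:
# emit one AB/<name> branch per variation. It returns the commands rather
# than running them, so the branching plan can be reviewed first.

def branch_setup_commands(base_branch, variations):
    """Return the git commands to create and push one AB/<name> branch each."""
    cmds = []
    for name in variations:
        cmds.append(f"git checkout -b AB/{name} {base_branch}")
        cmds.append("# ...apply this variation's changes and commit...")
        cmds.append(f"git push origin AB/{name}")
    return cmds


for cmd in branch_setup_commands("main", ["6614_before", "6614_inf", "6614_12"]):
    print(cmd)
```

A real version would read the variations from the YAML configuration file and run the commands via `subprocess`.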
You'd do what you always do when running tests against different versions of a dependency:

```python
@pytest.mark.skipif(dask.__version__ < (whatever is generated by your commit))
```
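One wrinkle with the skipif approach is that dask's `CalVer` release strings must not be compared as plain strings. A minimal sketch of a version gate, with no dependency on `packaging` or pytest (the parser and the sample version numbers are illustrative):

```python
# Minimal version gate for a skipif condition: parse "YYYY.M.P" release
# strings into tuples so they compare numerically. Plain string comparison
# gets this wrong once a month or patch number reaches two digits.

def parse_version(version):
    """Parse a release string like '2022.8.1' into a comparable tuple."""
    return tuple(int(part) for part in version.split(".")[:3])


assert parse_version("2022.10.0") > parse_version("2022.8.1")  # numerically later
assert "2022.10.0" < "2022.8.1"  # ...but string comparison says otherwise
```

In a real test suite you would typically use `packaging.version.Version` for this instead of a hand-rolled parser.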
Cluster size is hardcoded in the tests, and it varies from test to test, so you can't. In a future iteration we can revisit the tests so that they use an environment variable whenever sensible.
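The environment-variable idea could look something like this: each test keeps its hardcoded size as the default, but an AB environment can override it. The `AB_CLUSTER_SIZE` variable name is an assumption for illustration, not something coiled-runtime defines:

```python
# Sketch: let a benchmark read its cluster size from the AB environment,
# falling back to the per-test hardcoded default. AB_CLUSTER_SIZE is a
# hypothetical variable name.
import os


def cluster_size(default):
    """Cluster size for a test: AB_CLUSTER_SIZE env var if set, else default."""
    return int(os.environ.get("AB_CLUSTER_SIZE", default))
```

An AB environment file could then set `AB_CLUSTER_SIZE` in its `environ:` section, the same way `6614_12.yaml` sets the worker-saturation config above.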
This feels extremely complicated to me. So you'd have a "lead" PR which contains a list of git handles, and get everything from git twice?
E.g. use a branch in dask that hardcodes a value, or uses environment variables / configs to read this out. Solving such a problem specifically for the case at hand is much simpler than figuring out a generic infrastructure solution that can handle everything.
I'm with Guido on this. Let's keep our first iteration simple. Once it is up and running, we can evaluate which features are still missing.