-
Notifications
You must be signed in to change notification settings - Fork 6
Description
Summary
Currently, we run 6-15+ jobs simultaneously for each CI trigger which all execute the notebooks. In addition, some jobs execute the notebooks in parallel, and some notebooks execute calls in parallel. I think this is resulting in too many simultaneous calls to IRSA. My sense is that an appropriate max number of simultaneous calls is around 10.
Details
| Trigger | Jobs | Runs in Forks[*] | Total |
|---|---|---|---|
| PR to main | • 1x build-docs (CircleCI) • 5x test matrix (GHA) |
No | 6 |
| Push to main | • 1x publish_html (GHA) • 1x build-docs (CircleCI) • 5x test matrix (GHA) |
No | 7 |
| PR to any branch | • 5x test matrix (GHA) | No | 5 |
| Weekly cron | • 5x test matrix (GHA) | Yes | 15+ |
[*] This is what runs in my fork and one other that I know of. Not sure if there are more.
Two things compounding the problem:
- Some jobs execute the notebooks in parallel -- is this all GHA jobs?
- Some notebooks make calls in parallel. For example, the spherex_cutouts.md notebook uses
max_workers=10to get cutouts in parallel. This means we're likely making between 60 - 150 simultaneous calls for data from that notebook alone, plus calls from the other notebooks.
So I'm concerned that we're being unfriendly at best, and may actually be causing problems for ourselves (execution failures), IRSA, and/or other users. This is especially true if IRSA is already experiencing heavy load from other users, as has been the case recently for SPHEREx data in particular.
Changes that could help
Sorted by impact. I'm looking for help with this table, both correcting what's there (esp. impact and implementation estimates) and adding new rows/ideas. @bsipocz and others, what do you think?
| Done | Trigger | Job | Change | Impact (hi/med/low) | Implementation (easy/hard) | |
|---|---|---|---|---|---|---|
| A | [x] #160 | Weekly cron | test matrix | stop these from running in forks | hi | easy |
| B | [ ] | Weekly cron | test matrix | Spread out so one matrix cell runs each day | med/hi | easy |
| C | [ ] | PR to main, Push to main, and PR to any branch | test matrix | IMO, running the test matrix should be done once before pushing to main, but running every time a commit is pushed to a PR is too much. Also, it's not necessary to run again after merge (on push to main) if it's been run before merging. How can we accomplish that? Some ideas: 1. Remove from trigger "PR to main"; only run it on "Push to main". It wouldn't run until after merge and we'd have to make sure to monitor the results and respond to failures. So, non-ideal, but maybe an acceptable trade off? 2. Implement a GH label to skip the test matrix job. Auto-add it to PRs when opened and require it to be removed (and test matrix run) before merging. I think(?) all this is doable in the config files, so maybe a better option than the above? |
med/hi | ? |
| D | [x] #176 | Push to main | build-docs | remove from this trigger | low/med | easy |
| E | [ ] #143 | PR to main | build-docs | only execute notebooks with changes. render the others with no output |
low/med | easy? (shell hack, diff with main and add non-changed notebooks to excluded list?) |
| F | [ ] | Push to main | publish_html | only execute notebooks with changes. use cache for the rest |
low/med | ? (how to implement the caching?) |