feat(ci): dynamic test scheduler / balancer #12180
Hey! Great job! The time savings are amazing! 🚀
I left a bunch of comments, but these are really just suggestions.
I come from a slightly different code style, and I'm trying to better understand what is preferred here so I can adjust accordingly.
repo-path: Kong/gateway-action-storage/main/.ci/runtimes.json

- name: Schedule tests
  uses: Kong/gateway-test-scheduler/schedule@main
Using a commit SHA would be better for security, but as we are still iterating it's not a blocker from my side. We can change it later.
Agreed. Once the scheduler implementation has settled a little, we should pin to a SHA or a tag. Given that the "Build & Test" workflow has no outputs other than success or failure, the risk should be manageable.
luacov.stats.out
name: schedule-test-files
path: test-chunk.*
retention-days: 7
It would be nice if we could `cat` the content of the two schedule files into $GITHUB_STEP_SUMMARY so we can view it without downloading the artifact.
I'll add a report to the schedule action that makes it easy to find which runner runs what.
Please have a look at the output of the "Schedule Tests" step in https://github.com/Kong/kong/actions/runs/7207036134/job/19653095500 - Is this what you've asked for? I decided to not re-sort the output so that one can see the order in which tests are run. To find a particular test file, one needs to search. If a list sorted by test filename and suite mapping to runner would be useful, I can also include that.
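For readers unfamiliar with the step summary mentioned in this thread, here is a minimal sketch of how a schedule could be appended to it. GitHub Actions exposes the summary as a file whose path is in the `GITHUB_STEP_SUMMARY` environment variable. The schedule shape and the function names `renderScheduleSummary` and `writeScheduleSummary` are assumptions made for illustration; this is not the code of the schedule action.

```javascript
const fs = require("fs");

// Hypothetical schedule shape: one entry per runner, each listing its
// test files in the order they will run.
function renderScheduleSummary(schedule) {
  const lines = ["| Runner | Order | Test file |", "| --- | --- | --- |"];
  schedule.forEach((runner, runnerIndex) => {
    runner.files.forEach((file, order) => {
      // Keep the run order instead of sorting by file name.
      lines.push(`| ${runnerIndex + 1} | ${order + 1} | ${file} |`);
    });
  });
  return lines.join("\n") + "\n";
}

// GitHub Actions sets GITHUB_STEP_SUMMARY to a file path; Markdown appended
// to that file is rendered on the workflow run page.
function writeScheduleSummary(schedule, summaryPath = process.env.GITHUB_STEP_SUMMARY) {
  if (!summaryPath) return false; // not running inside GitHub Actions
  fs.appendFileSync(summaryPath, renderScheduleSummary(schedule));
  return true;
}
```

Rendering a table keeps the run order visible, and the browser's page search is enough to find a particular test file.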
assert(pl_path.mkdir(usr_interface_path))
os.execute("mkdir -p " .. usr_interface_path)
nit/non-blocker, but Penlight has a `mkdir -p` equivalent:
https://lunarmodules.github.io/Penlight/libraries/pl.dir.html#makepath
"create a directory path. This will create subdirectories as necessary!"
Total duration 12m 11s
Awesome!
This kind of large change to CI always comes with some anxiety of "what if it breaks with a cryptic error at the worst possible moment, and $person_who_implemented_it_and_holds_all_the_knowledge isn't around to help debug?", so my one ask is for some documentation about the scheduler and its moving pieces to help the rest of us get up to speed as needed. I'm not sure where the appropriate place for that is--maybe here, maybe in the https://github.com/Kong/gateway-test-scheduler repo.
All that said, we don't need to release from master any time soon, so this is as good a time as any to merge such a change, as we have runway to work out any kinks.
Nice job, looking forward to shorter test runs! 🏆
This commit adds an automatic scheduler for running busted tests. It replaces the static, shell-script-based scheduler with a mechanism that distributes the load onto a number of runners. Each runner gets to work on a portion of the tests that need to be run. The scheduler uses historic run time information to distribute the work evenly across runners, with the goal of making them all run for the same amount of time. With the 7 runners configured in the PR, the overall time it takes to run tests is reduced from around 30 minutes to around 11 minutes.

Previously, the scheduling for tests was defined by what the run_tests.sh shell script did. This has now changed so that the new JSON file `test_suites.json` is instead used to define the tests that need to run. Like before, each of the test suites can have its own set of environment variables and test exclusions.

The test runner has been rewritten in JavaScript in order to make it easier to interface with the declarative configuration file and to facilitate reporting and interfacing with busted. It resides in the https://github.com/Kong/gateway-test-scheduler repository and provides its functionality through custom GitHub Actions.

A couple of tests had to be changed to isolate them from other tests better. As the tests are no longer run in identical order every time, it has become more important that each test performs any required cleanup before it runs.
The running test title
No, because the runners run tests from all suites based on their schedule now.
Due to false green observed on `master`.
Summary
This PR adds an automatic scheduler for running busted tests. It replaces the static, shell-script-based scheduler with a mechanism that distributes the load onto a number of runners. Each runner gets to work on a portion of the tests that need to be run. The scheduler uses historic run time information to distribute the work evenly across runners, with the goal of making them all run for the same amount of time. With the 7 runners configured in the PR, the overall time it takes to run tests is reduced from around 30 minutes to around 11 minutes.
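To illustrate the idea of balancing by historic run time, here is a minimal sketch of one common approach: a greedy "longest processing time first" assignment. It is an assumption for illustration only; the PR text does not say which algorithm the scheduler uses, and the function name `scheduleTests` and the shape of `durations` are hypothetical.

```javascript
// Hypothetical sketch: greedy "longest processing time first" balancing.
// `durations` maps a test file name to its historic run time in seconds.
function scheduleTests(durations, runnerCount) {
  const runners = Array.from({ length: runnerCount }, () => ({ total: 0, files: [] }));
  // Place the slowest files first so the short ones can even out the totals.
  const sorted = Object.entries(durations).sort((a, b) => b[1] - a[1]);
  for (const [file, seconds] of sorted) {
    // Pick the runner with the least accumulated time so far.
    const target = runners.reduce((min, r) => (r.total < min.total ? r : min));
    target.files.push(file);
    target.total += seconds;
  }
  return runners;
}

// Example: five files spread over two runners.
const example = scheduleTests({ a: 10, b: 8, c: 6, d: 4, e: 2 }, 2);
console.log(example.map((r) => `${r.total}s: ${r.files.join(", ")}`));
```

A greedy assignment like this is not guaranteed to be optimal, but it is simple and usually leaves the runners with similar total run times.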
Previously, the scheduling for tests was defined by what the run_tests.sh shell script did. This has now changed so that the new JSON file `test_suites.json` is instead used to define the tests that need to run. Like before, each of the test suites can have its own set of environment variables and test exclusions.
The test runner has been rewritten in JavaScript in order to make it easier to interface with the declarative configuration file and to facilitate reporting and interfacing with busted. It resides in the https://github.com/Kong/gateway-test-scheduler repository and provides its functionality through custom GitHub Actions.
A couple of tests had to be changed to isolate them from other tests better. As the tests are no longer run in identical order every time, it has become more important that each test performs any required cleanup before it runs.
Checklist
changelog/unreleased/kong or skip-changelog label added on PR if changelog is unnecessary. README.md
Issue reference
KAG-3196