Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(ci): dynamic test scheduler / balancer #12180

Merged
merged 1 commit into from Dec 15, 2023
Merged

Conversation

hanshuebner
Copy link
Contributor

@hanshuebner hanshuebner commented Dec 8, 2023

Summary

This PR adds an automatic scheduler for running busted tests. It replaces the static, shell script based scheduler by a mechanism that distributes the load onto a number of runners. Each runner gets to work on a portion of the tests that need to be run. The scheduler uses historic run time information to distribute the work evenly across runners, with the goal of making them all run for the same amount of time. With the 7 runners configured in the PR, the overall time it takes to run tests is reduced from around 30 minutes to around 11 minutes.

Previously, the scheduling for tests was defined by what the run_tests.sh shell script did. This has now changed so that the new JSON file test_suites.json is instead used to define the tests that need to run. Like before, each of the test suites can have its own set of environment variables and test exclusions.

The test runner has been rewritten in Javascript in order to make it easier to interface with the declarative configuration file and to facilitate reporting and interfacing with busted. It resides in the https://github.com/Kong/gateway-test-scheduler repository and provides its functionality through custom GitHub Actions.

A couple of tests had to be changed to isolate them from other tests better. As the tests are no longer run in identical order every time, it has become more important that each test performs any required cleanup before it runs.

Checklist

  • The Pull Request has tests
  • A changelog file has been created under changelog/unreleased/kong or skip-changelog label added on PR if changelog is unnecessary. README.md
  • There is a user-facing docs PR against https://github.com/Kong/docs.konghq.com - PUT DOCS PR HERE

Issue reference

KAG-3196

@github-actions github-actions bot added the chore Not part of the core functionality of kong, but still needed label Dec 8, 2023
.github/workflows/build_and_test.yml Outdated Show resolved Hide resolved
.ci/runtimes.txt Outdated Show resolved Hide resolved
Copy link
Contributor

@nowNick nowNick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey! Great job! The time saving are amazing! 🚀

I left a bunch of comments but these are really just suggestions.

I come from a little bit different code style and I'm trying to better understand what is preferred here so I could adjust myself appropriately.

.ci/runtimes.txt Outdated Show resolved Hide resolved
.ci/schedule-tests/src/append-to-file.js Outdated Show resolved Hide resolved
.ci/schedule-tests/src/combine-statistics.js Outdated Show resolved Hide resolved
.ci/schedule-tests/src/combine-statistics.js Outdated Show resolved Hide resolved
.ci/schedule-tests/src/combine-statistics.js Outdated Show resolved Hide resolved
.ci/schedule-tests/src/test-runner.js Outdated Show resolved Hide resolved
.ci/schedule-tests/src/test-runner.js Outdated Show resolved Hide resolved
.ci/schedule-tests/src/test-runner.js Outdated Show resolved Hide resolved
.ci/schedule-tests/package.json Outdated Show resolved Hide resolved
.ci/schedule-tests/src/test-runner.js Outdated Show resolved Hide resolved
@CLAassistant
Copy link

CLAassistant commented Dec 12, 2023

CLA assistant check
All committers have signed the CLA.

repo-path: Kong/gateway-action-storage/main/.ci/runtimes.json

- name: Schedule tests
uses: Kong/gateway-test-scheduler/schedule@main
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use a commit sha will benefit to security, but as we are still iterrating it's not a blocker from my side. we can change it later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Once the scheduler implementation has settled a little, we should pin to a SHA or a tag. Given that the "Build & Test"-Workflow has no outputs other than the success or failure, the risk should be managable.

luacov.stats.out
name: schedule-test-files
path: test-chunk.*
retention-days: 7
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be nice if we can cat the content of two schedule files into $GITHUB_STEP_SUMMARY so we can view it without downloading the artifact

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a report to the schedule action that makes it easy to find which runner runs what.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please have a look at the output of the "Schedule Tests" step in https://github.com/Kong/kong/actions/runs/7207036134/job/19653095500 - Is this what you've asked for? I decided to not re-sort the output so that one can see the order in which tests are run. To find a particular test file, one needs to search. If a list sorted by test filename and suite mapping to runner would be useful, I can also include that.

Comment on lines -186 to +188
assert(pl_path.mkdir(usr_interface_path))
os.execute("mkdir -p " .. usr_interface_path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit/non-blocker, but Penlight has a mkdir -p equivalent:

https://lunarmodules.github.io/Penlight/libraries/pl.dir.html#makepath

create a directory path. This will create subdirectories as necessary!

Copy link
Contributor

@locao locao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Total duration 12m 11s

Awesome!

Copy link
Contributor

@flrgh flrgh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of large change to CI always comes with some anxiety of "what if it breaks with a cryptic error at the worst possible moment, and $person_who_implemented_it_and_holds_all_the_knowledge isn't around to help debug?", so my one ask is for some documentation about the scheduler and its moving pieces to help the rest of us get up to speed as needed. I'm not sure where is appropriate for that--maybe here, maybe in the https://github.com/Kong/gateway-test-scheduler repo.

All that said, we don't need to release from master any time soon, so this is as good a time as any to merge such a change, as we have runway to work out any kinks.

Nice job, looking forward to shorter test runs! 🏆

This commit adds an automatic scheduler for running busted tests.  It
replaces the static, shell script based scheduler by a mechanism that
distributes the load onto a number of runners.  Each runner gets to
work on a portion of the tests that need to be run.  The scheduler
uses historic run time information to distribute the work evenly
across runners, with the goal of making them all run for the same
amount of time.  With the 7 runners configured in the PR, the overall
time it takes to run tests is reduced from around 30 minutes to around
11 minutes.

Previously, the scheduling for tests was defined by what the
run_tests.sh shell script did.  This has now changed so that the new
JSON file `test_suites.json` is instead used to define the tests that
need to run.  Like before, each of the test suites can have its own
set of environment variables and test exclusions.

The test runner has been rewritten in Javascript in order to make it
easier to interface with the declarative configuration file and to
facilitate reporting and interfacing with busted.  It resides in the
https://github.com/Kong/gateway-test-scheduler repository and
provides its functionality through custom GitHub Actions.

A couple of tests had to be changed to isolate them from other tests
better.  As the tests are no longer run in identical order every time,
it has become more important that each test performs any required
cleanup before it runs.
@dndx dndx requested a review from ADD-SP December 15, 2023 04:18
@dndx dndx merged commit ac59ffd into master Dec 15, 2023
27 checks passed
@dndx dndx deleted the feat/test-run-scheduler branch December 15, 2023 05:58
@chronolaw
Copy link
Contributor

chronolaw commented Dec 18, 2023

The running test title Build & Test / Busted test runner X (pull_request) is a bit indistinct, could we change it back to old Build & Test / Postgres plugins - first tests (pull_request) and others?

@hanshuebner
Copy link
Contributor Author

The running test title Build & Test / Busted test runner X (pull_request) is a bit indistinct, could we change it back to old Build & Test / Postgres plugins - first tests (pull_request) and others?

No, because the runners run tests from all suites based on their schedule now.

@chronolaw chronolaw restored the feat/test-run-scheduler branch December 29, 2023 02:55
@dndx dndx deleted the feat/test-run-scheduler branch December 29, 2023 06:19
dndx pushed a commit that referenced this pull request Dec 29, 2023
Due to false green observed on `master`.
dndx pushed a commit that referenced this pull request Dec 29, 2023
Due to false green observed on `master`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
chore Not part of the core functionality of kong, but still needed size/XL skip-changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet