feat(ci): dynamic test scheduler / balancer #12180

hanshuebner · 2023-12-08T17:39:31Z

Summary

This PR adds an automatic scheduler for running busted tests. It replaces the static, shell-script-based scheduler with a mechanism that distributes the load across a number of runners. Each runner works on a portion of the tests that need to be run. The scheduler uses historic run-time information to distribute the work evenly across runners, with the goal of having them all run for the same amount of time. With the 7 runners configured in this PR, the overall time it takes to run the tests drops from around 30 minutes to around 11 minutes.

Previously, test scheduling was defined by what the run_tests.sh shell script did. Now the new JSON file test_suites.json defines the tests that need to run. As before, each test suite can have its own set of environment variables and test exclusions.

The test runner has been rewritten in JavaScript to make it easier to interface with the declarative configuration file and to facilitate reporting and interfacing with busted. It resides in the https://github.com/Kong/gateway-test-scheduler repository and provides its functionality through custom GitHub Actions.

A couple of tests had to be changed to better isolate them from other tests. As the tests no longer run in the same order every time, it has become more important that each test performs any required cleanup before it runs.
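
For readers who want a feel for run-time-based balancing, here is a minimal sketch. It is not the code from the gateway-test-scheduler repository: the function name `scheduleTests`, the shape of the `runtimes` map, and the 60-second default for files without history are all assumptions made for illustration. It uses the common "longest first, least-loaded runner next" greedy heuristic, which may or may not match what the scheduler actually does.

```javascript
// Hypothetical sketch: balance test files across runners by historic run time.
// Names, data shapes and the default duration are assumptions, not the real scheduler.
function scheduleTests(runtimes, runnerCount, defaultSeconds = 60) {
  // One bucket per runner, tracking assigned files and total expected seconds.
  const runners = Array.from({ length: runnerCount }, () => ({ files: [], total: 0 }))

  // Sort longest tests first so large files are spread out before small ones fill the gaps.
  // Files with no recorded run time (null/undefined) fall back to defaultSeconds.
  const tests = Object.entries(runtimes)
    .map(([file, seconds]) => ({ file, seconds: seconds ?? defaultSeconds }))
    .sort((a, b) => b.seconds - a.seconds)

  for (const test of tests) {
    // Assign the next test to the runner with the least expected work so far.
    const target = runners.reduce((min, r) => (r.total < min.total ? r : min))
    target.files.push(test.file)
    target.total += test.seconds
  }
  return runners
}
```

Calling `scheduleTests(historicRuntimes, 7)` with a map of file name to seconds would return seven buckets whose `total` values are roughly equal, provided no single file dominates the overall run time.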

Checklist

The Pull Request has tests
A changelog file has been created under changelog/unreleased/kong or skip-changelog label added on PR if changelog is unnecessary. README.md
There is a user-facing docs PR against https://github.com/Kong/docs.konghq.com - PUT DOCS PR HERE

Issue reference

KAG-3196

.github/workflows/build_and_test.yml

.ci/runtimes.txt

.ci/schedule-tests/src/combine-statistics.js

.github/workflows/build_and_test.yml

nowNick

Hey! Great job! The time savings are amazing! 🚀

I left a bunch of comments but these are really just suggestions.

I come from a slightly different code style and I'm trying to better understand what is preferred here so I can adjust accordingly.

.ci/runtimes.txt

.ci/schedule-tests/src/append-to-file.js

.ci/schedule-tests/src/combine-statistics.js

.ci/schedule-tests/src/test-runner.js

.ci/schedule-tests/package.json

.ci/schedule-tests/src/test-runner.js

CLAassistant · 2023-12-12T15:43:04Z

All committers have signed the CLA.

fffonion · 2023-12-14T11:16:28Z

.github/workflows/build_and_test.yml

+        repo-path: Kong/gateway-action-storage/main/.ci/runtimes.json
+
+    - name: Schedule tests
+      uses: Kong/gateway-test-scheduler/schedule@main


Using a commit SHA would be better for security, but as we are still iterating it's not a blocker from my side. We can change it later.

Agreed. Once the scheduler implementation has settled a little, we should pin to a SHA or a tag. Given that the "Build & Test" workflow has no outputs other than success or failure, the risk should be manageable.

fffonion · 2023-12-14T11:20:39Z

.github/workflows/build_and_test.yml

-          luacov.stats.out
+        name: schedule-test-files
+        path: test-chunk.*
+        retention-days: 7


It would be nice if we could cat the content of the two schedule files into $GITHUB_STEP_SUMMARY so we can view it without downloading the artifact.

I'll add a report to the schedule action that makes it easy to find which runner runs what.

Please have a look at the output of the "Schedule Tests" step in https://github.com/Kong/kong/actions/runs/7207036134/job/19653095500 - Is this what you've asked for? I decided to not re-sort the output so that one can see the order in which tests are run. To find a particular test file, one needs to search. If a list sorted by test filename and suite mapping to runner would be useful, I can also include that.
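
As a rough illustration of the mechanism discussed here: GitHub Actions exposes the path of a per-step summary file in the `GITHUB_STEP_SUMMARY` environment variable, and Markdown appended to that file is rendered on the run's summary page. The sketch below is not the report the schedule action produces. The function names and the assumed `schedule` shape (one array of `{ suite, filename }` entries per runner) are hypothetical.

```javascript
const fs = require("fs")

// Hypothetical sketch: render a schedule as a Markdown table, in run order.
// `schedule` is an assumed shape: one array of { suite, filename } per runner.
function renderScheduleSummary(schedule) {
  const lines = ["| Runner | Suite | Test file |", "| --- | --- | --- |"]
  schedule.forEach((tests, index) => {
    for (const { suite, filename } of tests) {
      lines.push(`| ${index + 1} | ${suite} | ${filename} |`)
    }
  })
  return lines.join("\n") + "\n"
}

// Append the table to the step summary file, if one is available.
function writeScheduleSummary(schedule, summaryPath = process.env.GITHUB_STEP_SUMMARY) {
  if (!summaryPath) return false // not running inside GitHub Actions
  fs.appendFileSync(summaryPath, renderScheduleSummary(schedule))
  return true
}
```

Keeping the rows in schedule order, as in the report described above, preserves the order in which each runner executes its tests. A second table sorted by file name could be rendered from the same data.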

flrgh · 2023-12-14T20:01:53Z

spec/01-unit/29-admin_gui/02-admin_gui_template_spec.lua

-        assert(pl_path.mkdir(usr_interface_path))
+        os.execute("mkdir -p " .. usr_interface_path)


nit/non-blocker, but Penlight has a mkdir -p equivalent:

https://lunarmodules.github.io/Penlight/libraries/pl.dir.html#makepath

create a directory path. This will create subdirectories as necessary!

.ci/test_suites.json

locao

Total duration 12m 11s

Awesome!

flrgh

This kind of large change to CI always comes with some anxiety of "what if it breaks with a cryptic error at the worst possible moment, and $person_who_implemented_it_and_holds_all_the_knowledge isn't around to help debug?", so my one ask is for some documentation about the scheduler and its moving pieces to help the rest of us get up to speed as needed. I'm not sure where is appropriate for that--maybe here, maybe in the https://github.com/Kong/gateway-test-scheduler repo.

All that said, we don't need to release from master any time soon, so this is as good a time as any to merge such a change, as we have runway to work out any kinks.

Nice job, looking forward to shorter test runs! 🏆


.github/workflows/build_and_test.yml

chronolaw · 2023-12-18T02:38:19Z

The check title `Build & Test / Busted test runner X (pull_request)` is a bit indistinct. Could we change it back to the old `Build & Test / Postgres plugins - first tests (pull_request)` and the other per-suite titles?

hanshuebner · 2023-12-19T15:40:20Z

> The running test title `Build & Test / Busted test runner X (pull_request)` is a bit indistinct, could we change it back to old `Build & Test / Postgres plugins - first tests (pull_request)` and others?

No, because the runners run tests from all suites based on their schedule now.


hanshuebner added the skip-changelog label Dec 8, 2023

pull-request-size bot added the size/XXL label Dec 8, 2023

github-actions bot assigned hanshuebner Dec 8, 2023

github-actions bot added the chore Not part of the core functionality of kong, but still needed label Dec 8, 2023

dndx requested changes Dec 9, 2023

.github/workflows/build_and_test.yml (outdated, resolved)

hanshuebner force-pushed the feat/test-run-scheduler branch from be7382a to 0fa5599 December 11, 2023 09:06

hanshuebner requested a review from dndx December 11, 2023 09:14

fffonion reviewed Dec 11, 2023

.ci/runtimes.txt (outdated, resolved)

fffonion reviewed Dec 11, 2023

.ci/schedule-tests/src/combine-statistics.js (outdated, resolved)

fffonion reviewed Dec 11, 2023

.github/workflows/build_and_test.yml (outdated, resolved)

AndyZhang0707 reviewed Dec 11, 2023

.github/workflows/build_and_test.yml (outdated, resolved)

.github/workflows/build_and_test.yml (resolved)

hanshuebner force-pushed the feat/test-run-scheduler branch from 0fa5599 to 651d791 December 11, 2023 12:17

nowNick reviewed Dec 11, 2023

pull-request-size bot added size/XL size/XXL and removed size/XXL size/XL labels Dec 12, 2023

hanshuebner force-pushed the feat/test-run-scheduler branch from 3484afd to b5c9f74 December 13, 2023 07:28

pull-request-size bot added size/XL and removed size/XXL labels Dec 13, 2023

hanshuebner force-pushed the feat/test-run-scheduler branch from 975610e to d5571a0 December 14, 2023 07:19

fffonion reviewed Dec 14, 2023

fffonion approved these changes Dec 14, 2023

flrgh reviewed Dec 14, 2023

.ci/test_suites.json (resolved)

locao approved these changes Dec 14, 2023

flrgh approved these changes Dec 14, 2023

locao force-pushed the feat/test-run-scheduler branch from b4af8c1 to 543004c December 14, 2023 20:57

dndx reviewed Dec 15, 2023

.github/workflows/build_and_test.yml (resolved)

dndx requested a review from ADD-SP December 15, 2023 04:18

windmgc approved these changes Dec 15, 2023

ADD-SP approved these changes Dec 15, 2023

dndx approved these changes Dec 15, 2023

dndx merged commit ac59ffd into master Dec 15, 2023
27 checks passed

dndx deleted the feat/test-run-scheduler branch December 15, 2023 05:58

chronolaw restored the feat/test-run-scheduler branch December 29, 2023 02:55

chronolaw mentioned this pull request Dec 29, 2023

chore(ci): revert dynamic test scheduler and fix tests #12268

Merged

3 tasks

dndx deleted the feat/test-run-scheduler branch December 29, 2023 06:19

dndx pushed a commit that referenced this pull request Dec 29, 2023

chore(actions): revert dynamic test scheduler (#12180)

298e928

Due to false green observed on `master`.

dndx pushed a commit that referenced this pull request Dec 29, 2023

chore(actions): revert dynamic test scheduler (#12180)

e804fd4

Due to false green observed on `master`.

This was referenced Jan 23, 2024

[backport 3.5]feat(router/atc): http segments matching and other improvements #12397

Merged

[backport 3.4]feat(router/atc): http segments matching and other improvements #12400

Merged
