feat(k6): add automated state consistency checking #5661

lc525 · 2024-06-04T10:13:15Z

Adds a number of reusable features to the k6 scenarios, so that we may periodically check that the state of objects (models, pipelines) on the seldon scheduler matches their state as viewed from the operator (k8s) side.

- k6 periodic rendezvous implementation, with time slots where VU 1 executes code exclusively and other VUs wait. This is needed because other VUs must stop making changes to the state while we check consistency
- fetching object status from the scheduler via k6 gRPC streaming
- use of async operations during VU iteration code
- detailed state consistency checks for Models and Pipelines
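The rendezvous idea can be illustrated with a minimal sketch. This is not the code from this PR: the function name `rendezvousAction`, its parameters, and the choice to place the check slot at the start of each period are all assumptions made for illustration.

```javascript
// Hypothetical helper (not from the PR): decide what a VU should do at a
// given moment, assuming each period of `periodMs` starts with an exclusive
// check slot of `slotMs` milliseconds.
function rendezvousAction(vuId, elapsedMs, periodMs, slotMs) {
  const offset = elapsedMs % periodMs; // position inside the current period

  if (offset >= slotMs) {
    // Outside the check slot: every VU runs its normal create/update/delete ops.
    return { action: "mutate", waitMs: 0 };
  }
  if (vuId === 1) {
    // Inside the check slot: only VU 1 runs, and it runs the consistency checks.
    return { action: "check", waitMs: 0 };
  }
  // Inside the check slot: all other VUs wait until the slot ends.
  return { action: "wait", waitMs: slotMs - offset };
}

// Example: 60 s period, 5 s check slot.
console.log(rendezvousAction(1, 1000, 60000, 5000)); // VU 1 checks
console.log(rendezvousAction(2, 1000, 60000, 5000)); // VU 2 waits 4000 ms
console.log(rendezvousAction(2, 7000, 60000, 5000)); // VU 2 mutates state
```

In a k6 script the VU id would come from `__VU` and the wait would use `sleep` from the `k6` module (which takes seconds). A real implementation would also have to account for operations still in flight when a slot begins, which this sketch ignores.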

Which issue(s) this PR fixes:

INFRA-1026 (Internal): Add automated state consistency checks to k6

lc525 · 2024-06-04T10:20:42Z

tests/k6/scenarios/core2_qa_control_plane_ops.js

    const numModelTypes = config.modelType.length

-    var idx = Math.floor(Math.random() * numModelTypes)


Rather than picking a random model type and then checking whether we have configured at least 1 model of that type, we now directly generate a random index among the model types whose configured maximum is > 0. Giving up on the rejection sampling strategy means we avoid remaining in the loop for too long (i.e. when lots of model types are configured to have a max of 0 models).
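A minimal sketch of this sampling strategy, assuming the per-type maximums are available as a plain array. The names `pickModelTypeIndex` and `maxModelsPerType` are hypothetical and not taken from the PR:

```javascript
// Hypothetical helper (not from the PR): pick a random model type index,
// considering only the types configured with a maximum of more than 0 models.
// `rand` is injectable so the choice can be made deterministic in tests.
function pickModelTypeIndex(maxModelsPerType, rand = Math.random) {
  // Collect the indices of all eligible model types up front.
  const candidates = [];
  maxModelsPerType.forEach((max, idx) => {
    if (max > 0) candidates.push(idx);
  });

  if (candidates.length === 0) return -1; // nothing is eligible

  // One draw always suffices; there is no retry loop as in rejection sampling.
  return candidates[Math.floor(rand() * candidates.length)];
}

console.log(pickModelTypeIndex([0, 0, 3, 0])); // prints 2, the only eligible type
```

Building the candidate list costs one pass over the model types, but the number of random draws is then always exactly one, regardless of how many types are configured with a max of 0.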

tests/k6/components/utils.js

- Until now, pipelines which were not deleted together with their model would never get deleted. We now pick a victim out of the existing pipelines which gets deleted when the pipeline corresponding to the model doesn't.
- Cleanup k8s module, better function naming and organisation.
- Initial (incomplete) support for experiments as part of state checking.

lc525 · 2024-06-06T12:44:37Z

tests/k6/scenarios/core2_qa_control_plane_ops.js

+                    // means that the probability to delete the pipeline
+                    // associated with the deleted model is slightly larger than
+                    // 0.8
+                    let unloadAltPipeline = Math.random() > 0.8 ? 0 : 1


With probability 0.5, too many pipelines were left inactive (required model missing). Once a model was deleted, the corresponding pipeline was deleted only with probability 0.5, and only when the operation was a delete (which happens 33% of the time when there is no bias between operations). So the probability of the "orphan" pipeline being deleted ended up being ~0.17 (0.5 * 0.33), and even lower if a bias towards create/update ops was introduced.
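The arithmetic can be checked with a short sketch. The function name `orphanDeleteProbability` is hypothetical, and 0.8 is used as a round figure for the new value (the code comment describes it as slightly larger than 0.8):

```javascript
// Hypothetical helper (not from the PR): per-operation probability that the
// pipeline of an already-deleted model gets deleted, computed as
// P(operation is delete) * P(associated pipeline is chosen | delete).
function orphanDeleteProbability(pDeleteOp, pDeleteAssociated) {
  return pDeleteOp * pDeleteAssociated;
}

// Assumption: create, update and delete are equally likely (no bias).
const pDeleteOp = 1 / 3;

const before = orphanDeleteProbability(pDeleteOp, 0.5);
const after = orphanDeleteProbability(pDeleteOp, 0.8);

console.log(before.toFixed(3), after.toFixed(3)); // prints "0.167 0.267"
```

Any bias towards create/update operations lowers `pDeleteOp` and therefore both values.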

sakoush

Great work! This is important to assert consistency for Core 2.

tests/k6/Makefile

tests/k6/components/k8s.js

tests/k6/components/scheduler.js

tests/k6/components/utils.js

tests/k6/scenarios/core2_qa_control_plane_ops.js

lc525 added 3 commits June 4, 2024 10:50

add: control the k6 version being built via Makefile var

004313a

- chosen version propagated to Dockerfile
- also used for local k6 build

feat: add option to enable/disable state consistency checking

c20faf4

lc525 added the v2 label Jun 4, 2024

lc525 requested a review from sakoush as a code owner June 4, 2024 10:13

lc525 added 7 commits June 4, 2024 12:42

fix matching version checks

7b417bf

fix: minor logic mistakes

9fa83e6

- fix ExperimentStatus scheduler subscription request
- fix Pipeline state consistency version check
- fix wrong Pipeline name on Update operation

fix: update random pipeline if the one associated with the model is gone

f2efb2e

fix: improve control-plane test code readability

4fd1b3b

fix: pipeline selection on update

1c2f97c

fix: delete the pipeline together with the model more frequently

5a716dc

This is to get more "functional" pipelines remaining in the pool, rather than most of them having unavailable models.

sakoush approved these changes Jun 19, 2024

minor fixes/cleanup following code review

58c0fc7

lc525 merged commit 5ab82b6 into SeldonIO:v2 Jun 21, 2024
3 checks passed
