feat(k6): add automated state consistency checking #5661

Merged on Jun 21, 2024 (11 commits)

@lc525 lc525 commented Jun 4, 2024

Adds a number of reusable features to the k6 scenarios so that we can periodically check that the state of objects (models, pipelines) on the Seldon scheduler matches their state as seen from the operator (k8s) side.

  • k6 periodic rendezvous implementation, with time slots in which VU 1 executes code exclusively while the other VUs wait. This is needed because the other VUs must stop changing state while consistency is checked
  • fetching object status from the scheduler via k6 gRPC streaming
  • use of async operations in VU iteration code
  • detailed state consistency checks for Models and Pipelines
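
The time-slot rendezvous described in the first bullet can be sketched as below. This is a minimal illustration under stated assumptions, not the PR's implementation: the function names, the fixed period, and the check window at the end of each period are all hypothetical. In a k6 script the VU id would typically come from the `__VU` global, and the elapsed time from a shared start timestamp.

```javascript
// Hypothetical helpers; names and slot layout are assumptions, not PR code.
// Time is split into fixed periods. The last `checkWindowMs` of each period
// is a "check slot" reserved for the checker VU (VU 1 here).

function isCheckSlot(elapsedMs, periodMs, checkWindowMs) {
  // Position inside the current period, in [0, periodMs)
  const offset = elapsedMs % periodMs
  return offset >= periodMs - checkWindowMs
}

function actionFor(vuId, elapsedMs, periodMs, checkWindowMs) {
  if (isCheckSlot(elapsedMs, periodMs, checkWindowMs)) {
    // Only VU 1 runs during the check slot; everyone else pauses
    return vuId === 1 ? 'check' : 'wait'
  }
  // Outside the check slot all VUs (including VU 1) do normal work
  return 'mutate'
}

// Example: 60 s period, last 5 s reserved for the consistency check
const PERIOD_MS = 60000
const CHECK_WINDOW_MS = 5000
console.log(actionFor(1, 59000, PERIOD_MS, CHECK_WINDOW_MS))
console.log(actionFor(2, 59000, PERIOD_MS, CHECK_WINDOW_MS))
console.log(actionFor(2, 10000, PERIOD_MS, CHECK_WINDOW_MS))
```

A real implementation would likely also need a guard interval before the check starts, so that operations still in flight when the slot begins can finish before state is compared.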

Which issue(s) this PR fixes:

  • INFRA-1026 (Internal): Add automated state consistency checks to k6

lc525 added 3 commits June 4, 2024 10:50
- chosen version propagated to Dockerfile
- also used for local k6 build
@lc525 lc525 added the v2 label Jun 4, 2024
@lc525 lc525 requested a review from sakoush as a code owner June 4, 2024 10:13
const numModelTypes = config.modelType.length

var idx = Math.floor(Math.random() * numModelTypes)
Member Author

Rather than picking a random model type and then checking whether at least one model of that type is configured, we now generate a random index directly among the model types whose configured maximum is > 0. Dropping the rejection-sampling strategy avoids staying in the loop for too long (i.e., when many model types are configured with a maximum of 0 models).
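
The approach in this comment might look roughly like the sketch below. It is an illustration only: `pickModelTypeIndex` and the `maxModels` array are hypothetical names, not identifiers from the PR, and the injectable `rand` parameter exists just to make the behaviour easy to test.

```javascript
// Hypothetical sketch: sample only among model types that can have models.
// maxModels[i] is assumed to be the configured maximum for model type i.
function pickModelTypeIndex(maxModels, rand = Math.random) {
  // Collect the indices whose configured maximum is > 0
  const eligible = []
  maxModels.forEach((max, i) => {
    if (max > 0) eligible.push(i)
  })
  // Nothing to pick from: signal this with -1 instead of looping
  if (eligible.length === 0) return -1
  // One uniform draw over the eligible indices; no retries needed
  return eligible[Math.floor(rand() * eligible.length)]
}

// Types 0 and 2 are disabled, so only indices 1 and 3 can be returned
console.log(pickModelTypeIndex([0, 3, 0, 2]))
```

Compared with rejection sampling, this does a bounded amount of work per call regardless of how many model types are set to 0.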

lc525 added 7 commits June 4, 2024 12:42
- Until now, pipelines that were not deleted together with their model
would never get deleted. We now pick a victim from the existing pipelines,
which gets deleted when the pipeline corresponding to the model isn't.
- Clean up k8s module, better function naming and organisation.
- Initial (incomplete) support for experiments as part of state checking.
- fix ExperimentStatus scheduler subscription request
- fix Pipeline state consistency version check
- fix wrong Pipeline name on Update operation
This is to get more "functional" pipelines remaining in the pool,
rather than most of them having unavailable models.
// means that the probability to delete the pipeline
// associated with the deleted model is slightly larger than
// 0.8
let unloadAltPipeline = Math.random() > 0.8 ? 0 : 1
Member Author

@lc525 lc525 Jun 6, 2024

With probability 0.5, too many pipelines were left inactive (required model missing). Once a model was deleted, the corresponding pipeline was deleted only with probability 0.5, and only when the operation was a delete (which happens about 33% of the time when there is no bias between operations). So the probability of the "orphan" pipeline being deleted ended up being ~0.17 (0.5 * 0.33), and even lower once a bias towards create/update ops is introduced.
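
The arithmetic in this comment can be checked with a short sketch. The function name is hypothetical, and it assumes three equally likely operations (create, update, delete) and that the 0.8 threshold from the diff above replaces the previous 0.5.

```javascript
// Hypothetical helper: chance that an iteration is a delete operation
// AND that the delete also removes the pipeline tied to the model.
function orphanDeletionProbability(pDeleteOp, pDeletePipelineWithModel) {
  return pDeleteOp * pDeletePipelineWithModel
}

// Assumption: no bias between create/update/delete, so delete is 1 in 3
const pDeleteOp = 1 / 3

const before = orphanDeletionProbability(pDeleteOp, 0.5)
const after = orphanDeletionProbability(pDeleteOp, 0.8)

console.log(before.toFixed(3)) // 0.167
console.log(after.toFixed(3)) // 0.267
```

Any bias towards create/update operations lowers `pDeleteOp` and therefore both results.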

Member

@sakoush sakoush left a comment

Great work! This is important to assert consistency for Core 2.

tests/k6/Makefile (outdated, resolved)
tests/k6/components/k8s.js (outdated, resolved)
tests/k6/components/k8s.js (outdated, resolved)
tests/k6/components/scheduler.js (resolved)
tests/k6/components/utils.js (resolved)
tests/k6/components/utils.js (outdated, resolved)
tests/k6/components/utils.js (outdated, resolved)
tests/k6/components/utils.js (outdated, resolved)
tests/k6/scenarios/core2_qa_control_plane_ops.js (outdated, resolved)
@lc525 lc525 merged commit 5ab82b6 into SeldonIO:v2 Jun 21, 2024
3 checks passed