feat(k6): add automated state consistency checking #5661
Conversation
- chosen version propagated to Dockerfile
- also used for local k6 build
Adds a number of reusable features to the k6 scenarios, so that we can periodically check that the state of objects (models, pipelines) on the Seldon scheduler matches their state as viewed from the operator (k8s) side.

- k6 periodic rendezvous implementation, with time slots where VU 1 executes code exclusively while the other VUs wait. This is needed because the other VUs must stop making changes to the state while we check consistency.
- fetching object status from the scheduler via k6 gRPC streaming
- use of async operations during VU iteration code
- detailed state consistency checks for Models and Pipelines

**Which issue(s) this PR fixes:**
- INFRA-1026 (Internal): Add automated state consistency checks to k6
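The periodic rendezvous described above can be sketched as follows. This is an illustrative model only, not the PR's actual implementation: every `periodMs`, a `windowMs`-long slot opens in which VU 1 runs the consistency check exclusively while all other VUs wait out the slot instead of generating load. All names (`inCheckWindow`, `maybeRendezvous`, `checkFn`, `sleepFn`) are hypothetical.

```javascript
// Does the current time fall inside the exclusive check window?
function inCheckWindow(nowMs, periodMs, windowMs) {
  return nowMs % periodMs < windowMs;
}

// Called at the top of each VU iteration. Returns true if this iteration
// was consumed by the rendezvous (check or wait), false for a normal
// load-generating iteration. `checkFn` and `sleepFn` are injected so the
// sketch stays runtime-agnostic.
async function maybeRendezvous(vuId, nowMs, periodMs, windowMs, checkFn, sleepFn) {
  if (!inCheckWindow(nowMs, periodMs, windowMs)) return false;
  if (vuId === 1) {
    await checkFn(); // VU 1 runs the state-consistency check exclusively
  } else {
    // other VUs wait out the remainder of the window before resuming load
    await sleepFn(windowMs - (nowMs % periodMs));
  }
  return true;
}
```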
const numModelTypes = config.modelType.length

var idx = Math.floor(Math.random() * numModelTypes)
Rather than picking a random model type and then checking whether at least one model of that type is configured, we now generate a random index directly among the types whose configured maximum is > 0. Giving up on the rejection-sampling strategy means we avoid staying in the loop for too long (i.e., when many model types are configured with a maximum of 0 models).
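A minimal sketch of the sampling change described above, assuming a hypothetical `maxPerType` array standing in for the per-type maximums from the k6 config (the real config shape may differ). Instead of rejection-sampling and retrying on types with a maximum of 0, we collect the eligible indices once and pick uniformly among them:

```javascript
// Pick a uniformly random index among the model types with max > 0.
// `rand` is injectable for deterministic testing; defaults to Math.random.
function pickModelTypeIdx(maxPerType, rand = Math.random) {
  const eligible = [];
  for (let i = 0; i < maxPerType.length; i++) {
    if (maxPerType[i] > 0) eligible.push(i);
  }
  if (eligible.length === 0) return -1; // no model type configured at all
  return eligible[Math.floor(rand() * eligible.length)];
}
```

This always terminates in one pass over the config, whereas rejection sampling degrades as the fraction of zero-maximum types grows.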
- Until now, pipelines which were not deleted together with their model would never get deleted. We now pick a victim among the existing pipelines, which gets deleted when the pipeline corresponding to the model doesn't.
- Clean up the k8s module; better function naming and organisation.
- Initial (incomplete) support for experiments as part of state checking.
- fix ExperimentStatus scheduler subscription request
- fix Pipeline state-consistency version check
- fix wrong Pipeline name on Update operation
This is to get more "functional" pipelines remaining in the pool, rather than most of them having unavailable models.
// means that the probability to delete the pipeline
// associated with the deleted model is slightly larger than 0.8
let unloadAltPipeline = Math.random() > 0.8 ? 0 : 1
With probability 0.5, too many pipelines were left inactive (required model missing). Once a model was deleted, the corresponding pipeline was deleted only with probability 0.5, and only when the operation was a delete (which happens ~33% of the time with no bias between operations). So the probability of the "orphan" pipeline being deleted ended up being ~0.17 (0.5 * 0.33), and even lower once a bias towards create/update ops was introduced.
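The arithmetic behind the comment above, spelled out: with three equally likely operations (create / update / delete) and a 0.5 chance of also deleting the corresponding pipeline on a delete, the orphan pipeline is removed with probability (1/3) * 0.5 ≈ 0.17 per iteration, and less once operations are biased towards create/update. The variable names here are illustrative:

```javascript
// unbiased chance that a given operation is a delete (one of three ops)
const pDeleteOp = 1 / 3;
// old chance of also deleting the corresponding pipeline on a delete
const pPipelineToo = 0.5;
// resulting chance that an orphan pipeline ever gets cleaned up
const pOrphanDeleted = pDeleteOp * pPipelineToo; // ≈ 0.17
```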
Great work! This is important to assert consistency for Core 2.