-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set initial scale during update of a router deployment #350
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
krithika369
force-pushed
the
set_initial_scale
branch
from
July 8, 2023 03:42
40a0c65
to
cb6ab24
Compare
krithika369
force-pushed
the
set_initial_scale
branch
from
July 8, 2023 08:27
cb6ab24
to
fa2afcc
Compare
5 tasks
krithika369
added a commit
to caraml-dev/merlin
that referenced
this pull request
Jul 18, 2023
<!-- Thanks for sending a pull request! Here are some tips for you: 1. Run unit tests and ensure that they are passing 2. If your change introduces any API changes, make sure to update the e2e tests 3. Make sure documentation is updated for your PR! --> **What this PR does / why we need it**: <!-- Explain here the context and why you're making the change. What is the problem you're trying to solve. ---> This PR implements setting the initial scale of a model version when it is redeployed, to match the current scale it is at. This is done using the [Knative Initial Scale](https://knative.dev/docs/serving/autoscaling/scale-bounds/#initial-scale) annotation, as the Predictor / Transformer specs do not directly support it. One caveat is that annotations applied to inference services will be applied to all components (predictor and transformer) and thus, it is not possible to individually control their initial scales (see related issue: kserve/kserve#666). Thus, the max of the 2 values (current scale of predictor, and transformer, if enabled), will be used as the initial scale. Autoscaling will eventually correct the replicas as needed, and within the max threshold for the specific component. Note that this change is only applicable to **serverless deployments**. Main changes: * `api/cluster/controller.go` - Add method `GetCurrentDeploymentScale` to the controller interface, to retrieve the current scale from the Knative revision. If patching an existing deployment that is serverless, get the current scale of the components and pass it down to the templater. * `api/cluster/resource/templater.go` - `PatchInferenceServiceSpec` now takes in the current replicas of the existing inference service deployment and applies the relevant annotation. Related Turing PR: caraml-dev/turing#350 **Does this PR introduce a user-facing change?**: <!-- If no, just write "NONE" in the release-note block below. If yes, a release note is required. Enter your extended release note in the block below. If the PR requires additional action from users switching to the new release, include the string "action required". For more information about release notes, see kubernetes' guide here: http://git.k8s.io/community/contributors/guide/release-notes.md --> ```release-note When re-deploying a model version using serverless deployment, its current scale will be applied at start up. ``` **Checklist** - [x] Added unit test, integration, and/or e2e tests - [x] Tested locally - [ ] Updated documentation - [ ] Update Swagger spec if the PR introduce API changes - [ ] Regenerated Golang and Python client if the PR introduce API changes
ariefrahmansyah
approved these changes
Jul 20, 2023
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Awesome job as always, @krithika369!
Thanks for the detailed review, @ariefrahmansyah ! Merging. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary
This PR includes the addition of the Knative Intial Scale annotation to the components deployed for the router. When deploying a router version, if there is an existing version of the router already deployed, their respective components' current desired scale will be copied over as the initial scale for the new deployment. This way, the new deployment will start with a number of replicas that matches the old deployment, rather than at min replicas.
Changes
api/turing/api/deployment_controller.go
- Pass down the current router version to the deployment serviceapi/turing/cluster/controller.go
- AddGetKnativeServiceDesiredReplicas
to retrieve the currently deployed revision's desired replicas, if existsapi/turing/cluster/knative_service.go
- AddInitialReplicas
to theKnativeService
struct and copy its value to the K8s object's annotations, if setapi/turing/cluster/servicebuilder/*
- Propagate the initial scale down to theKnativeService
structapi/turing/service/router_deployment_service.go
-createServices
has the main logic changes to compare current version's properties and set the initial scale on the new deployment