Set initial scale during update of a router deployment #350

krithika369 · 2023-07-08T01:32:52Z

Summary

This PR adds the Knative Initial Scale annotation to the components deployed for the router. When a router version is deployed and another version of the router is already running, the current desired scale of each existing component is copied over as the initial scale of the corresponding component in the new deployment. This way, the new deployment starts with the same number of replicas as the old deployment, rather than at min replicas.

Changes

api/turing/api/deployment_controller.go - Pass down the current router version to the deployment service
api/turing/cluster/controller.go - Add GetKnativeServiceDesiredReplicas to retrieve the currently deployed revision's desired replicas, if one exists
api/turing/cluster/knative_service.go - Add InitialReplicas to the KnativeService struct and copy its value to the K8s object's annotations, if set
api/turing/cluster/servicebuilder/* - Propagate the initial scale down to the KnativeService struct
api/turing/service/router_deployment_service.go - createServices holds the main logic change: it compares the current version's properties and sets the initial scale on the new deployment
Updated unit tests
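As a rough illustration of the knative_service.go change listed above, the sketch below copies an optional initial replica count into a map of annotations. The `knativeService` type and the `buildAnnotations` helper are hypothetical stand-ins, not the actual Turing code. The annotation key is the one documented by Knative for initial scale.

```go
package main

import (
	"fmt"
	"strconv"
)

// Annotation key documented by Knative for a revision's initial scale.
const knativeInitialScaleAnnotation = "autoscaling.knative.dev/initial-scale"

// knativeService is a hypothetical, trimmed-down stand-in for the
// KnativeService struct mentioned in the PR; only the field relevant to
// this sketch is modelled.
type knativeService struct {
	Name string
	// InitialReplicas is nil when there is no existing deployment to copy from.
	InitialReplicas *int
}

// buildAnnotations returns the annotations for the service, adding the
// initial-scale annotation only when InitialReplicas is set.
func buildAnnotations(svc knativeService) map[string]string {
	annotations := map[string]string{}
	if svc.InitialReplicas != nil {
		annotations[knativeInitialScaleAnnotation] = strconv.Itoa(*svc.InitialReplicas)
	}
	return annotations
}

func main() {
	current := 4 // assumed desired replicas of the already deployed version
	fmt.Println(buildAnnotations(knativeService{Name: "router", InitialReplicas: &current}))
	fmt.Println(buildAnnotations(knativeService{Name: "router"}))
}
```

Using a pointer for the replica count keeps "no previous deployment" distinct from an explicit value of zero. Whether the real struct does the same is an assumption here.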

**What this PR does / why we need it**:

This PR implements setting the initial scale of a model version when it is redeployed, to match the scale it is currently at. This is done using the [Knative Initial Scale](https://knative.dev/docs/serving/autoscaling/scale-bounds/#initial-scale) annotation, as the Predictor / Transformer specs do not directly support it.

One caveat is that annotations applied to inference services are applied to all components (predictor and transformer), so it is not possible to control their initial scales individually (see related issue: kserve/kserve#666). The maximum of the two values (the current scale of the predictor and of the transformer, if enabled) is therefore used as the initial scale. Autoscaling will eventually correct the replicas as needed, within the max threshold for the specific component.

Note that this change is only applicable to **serverless deployments**.

Main changes:
* `api/cluster/controller.go` - Add method `GetCurrentDeploymentScale` to the controller interface, to retrieve the current scale from the Knative revision. If patching an existing deployment that is serverless, get the current scale of the components and pass it down to the templater.
* `api/cluster/resource/templater.go` - `PatchInferenceServiceSpec` now takes in the current replicas of the existing inference service deployment and applies the relevant annotation.

Related Turing PR: caraml-dev/turing#350

**Does this PR introduce a user-facing change?**:

```release-note
When re-deploying a model version using serverless deployment, its current scale will be applied at start up.
```

**Checklist**
- [x] Added unit test, integration, and/or e2e tests
- [x] Tested locally
- [ ] Updated documentation
- [ ] Update Swagger spec if the PR introduces API changes
- [ ] Regenerated Golang and Python client if the PR introduces API changes
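The rule described above for inference services, where a single annotation has to cover both the predictor and the optional transformer, can be sketched as follows. The `initialScale` function and its parameters are hypothetical names chosen for illustration and are not taken from the Merlin code.

```go
package main

import "fmt"

// initialScale picks a single initial scale for an inference service from
// the current scale of its components. The transformer is optional, so its
// scale is passed as a pointer and ignored when nil.
func initialScale(predictorReplicas int, transformerReplicas *int) int {
	scale := predictorReplicas
	if transformerReplicas != nil && *transformerReplicas > scale {
		scale = *transformerReplicas
	}
	return scale
}

func main() {
	transformer := 5 // assumed current scale of an enabled transformer
	fmt.Println(initialScale(3, &transformer))
	fmt.Println(initialScale(3, nil))
}
```

Taking the maximum means one component may start with more replicas than it currently runs. As the description notes, autoscaling is expected to bring it back within its own bounds.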

api/turing/cluster/controller.go

api/turing/cluster/knative_service.go

api/turing/cluster/servicebuilder/service_builder.go

api/turing/service/router_deployment_service.go

ariefrahmansyah

LGTM! Awesome job as always, @krithika369!

krithika369 · 2023-07-20T03:21:34Z

Thanks for the detailed review, @ariefrahmansyah! Merging.

Krithika Sundararajan added 2 commits July 8, 2023 08:41

Set initial scale on new deployment if the router is already deployed

2c68e3c

Send current router version down to deployment service

ba619a3

krithika369 marked this pull request as draft July 8, 2023 01:32

Krithika Sundararajan added 2 commits July 8, 2023 10:14

Fix test failure

b49e157

Fix test failure

e533d11

krithika369 force-pushed the set_initial_scale branch from 40a0c65 to cb6ab24 Compare July 8, 2023 03:42

Fix test failure

fa2afcc

krithika369 force-pushed the set_initial_scale branch from cb6ab24 to fa2afcc Compare July 8, 2023 08:27

Krithika Sundararajan added 4 commits July 8, 2023 17:30

Add controller test for GetKnativeServiceDesiredReplicas

136b21d

Fix failing unit tests

8962f73

Upgrade CI lint action

5177f80

Fix lint errors

48ad967

krithika369 marked this pull request as ready for review July 10, 2023 01:53

krithika369 requested review from ariefrahmansyah and deadlycoconuts July 10, 2023 01:53

krithika369 mentioned this pull request Jul 17, 2023

Set initial scale during redeployment of a model version caraml-dev/merlin#431

Merged

5 tasks

ariefrahmansyah reviewed Jul 18, 2023

Krithika Sundararajan added 2 commits July 18, 2023 15:14

Merge from main

b85afe1

Address code review comments

c0c256f

krithika369 requested a review from ariefrahmansyah July 18, 2023 08:14

ariefrahmansyah approved these changes Jul 20, 2023

krithika369 merged commit f6fb569 into caraml-dev:main Jul 20, 2023
