Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set initial scale during update of a router deployment #350

Merged
merged 11 commits into from
Jul 20, 2023

Conversation

krithika369
Copy link
Collaborator

@krithika369 krithika369 commented Jul 8, 2023

Summary

This PR includes the addition of the Knative Intial Scale annotation to the components deployed for the router. When deploying a router version, if there is an existing version of the router already deployed, their respective components' current desired scale will be copied over as the initial scale for the new deployment. This way, the new deployment will start with a number of replicas that matches the old deployment, rather than at min replicas.

Changes

  • api/turing/api/deployment_controller.go - Pass down the current router version to the deployment service
  • api/turing/cluster/controller.go - Add GetKnativeServiceDesiredReplicas to retrieve the currently deployed revision's desired replicas, if exists
  • api/turing/cluster/knative_service.go - Add InitialReplicas to the KnativeService struct and copy its value to the K8s object's annotations, if set
  • api/turing/cluster/servicebuilder/* - Propagate the initial scale down to the KnativeService struct
  • api/turing/service/router_deployment_service.go - createServices has the main logic changes to compare current version's properties and set the initial scale on the new deployment
  • Updated unit tests

@krithika369 krithika369 marked this pull request as draft July 8, 2023 01:32
@krithika369 krithika369 marked this pull request as ready for review July 10, 2023 01:53
krithika369 added a commit to caraml-dev/merlin that referenced this pull request Jul 18, 2023
<!--  Thanks for sending a pull request!  Here are some tips for you:

1. Run unit tests and ensure that they are passing
2. If your change introduces any API changes, make sure to update the
e2e tests
3. Make sure documentation is updated for your PR!

-->

**What this PR does / why we need it**:
<!-- Explain here the context and why you're making the change. What is
the problem you're trying to solve. --->

This PR implements setting the initial scale of a model version when it
is redeployed, to match the current scale it is at. This is done using
the [Knative Initial
Scale](https://knative.dev/docs/serving/autoscaling/scale-bounds/#initial-scale)
annotation, as the Predictor / Transformer specs do not directly support
it. One caveat is that annotations applied to inference services will be
applied to all components (predictor and transformer) and thus, it is
not possible to individually control their initial scales (see related
issue: kserve/kserve#666). Thus, the max of
the 2 values (current scale of predictor, and transformer, if enabled),
will be used as the initial scale. Autoscaling will eventually correct
the replicas as needed, and within the max threshold for the specific
component.

Note that this change is only applicable to **serverless deployments**.

Main changes:
* `api/cluster/controller.go` - Add method `GetCurrentDeploymentScale`
to the controller interface, to retrieve the current scale from the
Knative revision. If patching an existing deployment that is serverless,
get the current scale of the components and pass it down to the
templater.
* `api/cluster/resource/templater.go` - `PatchInferenceServiceSpec` now
takes in the current replicas of the existing inference service
deployment and applies the relevant annotation.

Related Turing PR: caraml-dev/turing#350

**Does this PR introduce a user-facing change?**:
<!--
If no, just write "NONE" in the release-note block below.
If yes, a release note is required. Enter your extended release note in
the block below.
If the PR requires additional action from users switching to the new
release, include the string "action required".

For more information about release notes, see kubernetes' guide here:
http://git.k8s.io/community/contributors/guide/release-notes.md
-->

```release-note
When re-deploying a model version using serverless deployment, its current scale will be applied at start up.
```

**Checklist**

- [x] Added unit test, integration, and/or e2e tests
- [x] Tested locally
- [ ] Updated documentation
- [ ] Update Swagger spec if the PR introduce API changes
- [ ] Regenerated Golang and Python client if the PR introduce API
changes
api/turing/cluster/controller.go Outdated Show resolved Hide resolved
api/turing/cluster/knative_service.go Outdated Show resolved Hide resolved
api/turing/cluster/servicebuilder/service_builder.go Outdated Show resolved Hide resolved
api/turing/service/router_deployment_service.go Outdated Show resolved Hide resolved
api/turing/service/router_deployment_service.go Outdated Show resolved Hide resolved
Copy link
Contributor

@ariefrahmansyah ariefrahmansyah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Awesome job as always, @krithika369!

@krithika369
Copy link
Collaborator Author

Thanks for the detailed review, @ariefrahmansyah ! Merging.

@krithika369 krithika369 merged commit f6fb569 into caraml-dev:main Jul 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants