Disable autoscaling tracking on agent for models that are fixed #4501
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: cliveseldon. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files.
* initial commit for pipeline inputs from other pipelines
* Add extended example
* add trigger example
* fix kafka header test - headers now lowercase
* Fix cli x-seldon-route and also step tensor map and add step pipeline example
* Update notebook
* Ensure tensorMap works across pipelines by sending pipeline name in tensormap
* Update docs/source/contents/pipelines/index.md (Co-authored-by: Sherif Akoush <sherif.akoush@gmail.com>)
* Update docs/source/contents/pipelines/index.md (Co-authored-by: Sherif Akoush <sherif.akoush@gmail.com>)
* [SCv2] Improve inference docs re routing and headers (#4481)
* Capitalise Seldon when used as proper noun
* Formatting, capitalisation, etc.
* Add detail to inference docs page
* Formatting and typo fixes
* Use consistent capitalisation of model & pipeline through inference docs page
* Fix typo in inference docs for Kafka topics
* Specify use of inference v2 protocol high up in inference docs
* Add mention of headers to sync inference introduction
* Formatting + minor rewording for clarity
* Use tabs for Compose vs. k8s methods for finding the seldon-mesh endpoint
* Add note on port-forwarding seldon-mesh svc for inference requests
* Add note on service meshes for sending inference requests
* Add section on inference request routing with headers
* Add section on path-based routing for inference endpoints
* Add subsection header for Seldon routing (vs. ingress routing)
* Add section on routing from ingress -> seldon-mesh for inference calls
* Add links to RFCs for host & authority headers
* Update link to RFC for HTTP/1 Host header (RFC 7230 obsoletes RFC 2616, the previous link)
* Add line describing virtual hosts vs. physical ones
* Use tabs for alternate ways of making inference requests
* Add inference request example with Seldon CLI
* Use consistent capitalisation of v2 for inference protocol
* Add note on Kafka headers for pipelines
* Use ordinal numbering for bullet points
* Update URI for consistency and to avoid confusion
* Move section on making requests above section on routing
* Use interpolation syntax to clarify usage of path-based routing in Seldon mesh
* Add second form of path-based routing for pipelines in Seldon mesh
* Clarify wording re virtual endpoints in SCv2
* Add section for header-based routing examples (this section builds on the examples from the prior section on making inference requests)
* Update basic examples to exclude routing headers (routing headers are then given in the examples relevant to that section)
* Formatting
* Use group-tabs for example requests with different clients
* Add emphasis to header lines in examples for header-based routing
* Add notes on support for subdomain-based routing
* Add example snippets for subdomain routing
* Add Open Inference schema for iris model for examples
* Move pipeline inference tip lower for better flow
* Fix datatype for iris model inputs
* Bump MLServer version to 1.2.1 (#4503)
* add a notebook test for changing model replicas (#4504)
* Disable autoscaling tracking on agent for models that are fixed (#4501)
* add flag for autoscaling in grpc msg
* autogen files
* extract helper function
* adjust comment
* wire up autoscaling flag in server
* wire up autoscaling in agent client
* set thresholds for scaling in local deployment
* add autoscaling flag to scheduler
* add a toggle for autoscaling service
* revert autoscaling envs set in local deployment
* disable scaling for local deployment
* use a disable toggle instead
* do not disable by default scaling service
* Upgrading docker compose CLI command (#4498). Not sure if this is necessary, but it took me some time to figure out, as I was sure I already had `docker compose` installed. According to the [Docker documentation](https://docs.docker.com/compose/reference/), the spaced version looks like the newer one, and maybe the Makefile should be updated for that.
* Ensure x-request-id header matches kafka key (#4511)
* Fix possible SIGSEV after producer close in modelgateway (#4515)
* Fix possible SIGSEV after producer close in modelgateway
* Set running after setup
* review comments
* Link how to install docker compose v2 from github releases (#4516)
* link compose github for easier installation
* Update docs/source/contents/getting-started/docker-installation/index.md (Co-authored-by: Alex Rakowski <20504869+agrski@users.noreply.github.com>)
* review comments

Co-authored-by: Sherif Akoush <sherif.akoush@gmail.com>
Co-authored-by: Alex Rakowski <20504869+agrski@users.noreply.github.com>
Co-authored-by: Adrian Gonzalez-Martin <agm@seldon.io>
Co-authored-by: Sherif Akoush <sa@seldon.io>
Co-authored-by: Saeid <s.ghafouri@qmul.ac.uk>
Co-authored-by: RafalSkolasinski <r.j.skolasinski@gmail.com>
What this PR does / why we need it:
Previously, the agent would track autoscaling metrics for all models, regardless of whether the user wanted to autoscale them, and the scheduler would later reject any autoscaling events. This strategy suffers from unnecessary gRPC messages flowing from agent to scheduler.
This PR disables tracking of autoscaling metrics for models that are fixed (i.e. models on which autoscaling is not set), so no events are fired from the agent to the scheduler. This is done by the scheduler setting an enable-autoscaling flag during model load. There is a caveat with this strategy, though.
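The gating described above can be sketched as follows. Note this is a minimal illustration, not Seldon's actual agent code: the `ModelSettings` struct and `trackLag` helper are hypothetical stand-ins for the per-model settings the scheduler sends at load time and the agent's metric-tracking path.

```go
package main

import "fmt"

// ModelSettings is a hypothetical stand-in for the per-model load settings
// the scheduler sends to the agent; AutoscalingEnabled mirrors the
// enable-autoscaling flag set by the scheduler during model load.
type ModelSettings struct {
	Name               string
	AutoscalingEnabled bool
}

// trackLag records an autoscaling metric only for models that may scale,
// so fixed models never trigger agent-to-scheduler scaling events.
func trackLag(m ModelSettings, lag int) (tracked bool) {
	if !m.AutoscalingEnabled {
		// Fixed replica count: skip tracking entirely, avoiding
		// unnecessary gRPC traffic to the scheduler.
		return false
	}
	fmt.Printf("model %s lag=%d\n", m.Name, lag)
	return true
}

func main() {
	fixed := ModelSettings{Name: "iris", AutoscalingEnabled: false}
	scalable := ModelSettings{Name: "income", AutoscalingEnabled: true}
	fmt.Println(trackLag(fixed, 5))
	fmt.Println(trackLag(scalable, 5))
}
```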
This PR also adds a `disable-autoscaling` flag to the scheduler's command-line arguments to disable the (model) autoscaling service. It is set by default in the local (Docker Compose) deployment, as it is not possible to add model replicas given we have only one server replica (so far).
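As a hedged sketch of how such a command-line toggle typically works (the flag name comes from this PR; the flag-set name and helper function are illustrative, not the scheduler's actual code):

```go
package main

import (
	"flag"
	"fmt"
)

// autoscalingEnabled illustrates how a disable-autoscaling command-line
// flag could gate whether the (model) autoscaling service starts at all.
// This helper and its flag set are hypothetical, for illustration only.
func autoscalingEnabled(args []string) bool {
	fs := flag.NewFlagSet("scheduler", flag.ContinueOnError)
	disable := fs.Bool("disable-autoscaling", false,
		"disable the (model) autoscaling service")
	if err := fs.Parse(args); err != nil {
		return false
	}
	return !*disable
}

func main() {
	// The local docker-compose deployment would pass the flag explicitly,
	// since a single server replica cannot host extra model replicas.
	fmt.Println(autoscalingEnabled([]string{"--disable-autoscaling"}))
	fmt.Println(autoscalingEnabled([]string{}))
}
```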