Disable autoscaling tracking on agent for models that are fixed #4501
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: cliveseldon. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files.
* initial commit for pipeline inputs from other pipelines
* Add extended example
* add trigger example
* fix kafka header test - headers now lowercase
* Fix cli x-seldon-route and also step tensor map and add step pipeline example
* Update notebook
* Ensure tensorMap works across pipelines by sending pipeline name in tensormap
* Update docs/source/contents/pipelines/index.md (Co-authored-by: Sherif Akoush <sherif.akoush@gmail.com>)
* Update docs/source/contents/pipelines/index.md (Co-authored-by: Sherif Akoush <sherif.akoush@gmail.com>)
* [SCv2] Improve inference docs re routing and headers (#4481)
* Capitalise Seldon when used as proper noun
* Formatting, capitalisation, etc.
* Add detail to inference docs page
* Formatting and typo fixes
* Use consistent capitalisation of model & pipeline through inference docs page
* Fix typo in inference docs for Kafka topics
* Specify use of inference v2 protocol high up in inference docs
* Add mention of headers to sync inference introduction
* Formatting + minor rewording for clarity
* Use tabs for Compose vs. k8s methods for finding the seldon-mesh endpoint
* Add note on port-forwarding seldon-mesh svc for inference requests
* Add note on service meshes for sending inference requests
* Add section on inference request routing with headers
* Add section on path-based routing for inference endpoints
* Add subsection header for Seldon routing (vs. ingress routing)
* Add section on routing from ingress -> seldon-mesh for inference calls
* Add links to RFCs for host & authority headers
* Update link to RFC for HTTP/1 Host header (RFC 7230 obsoletes RFC 2616, the previous link)
* Add line describing virtual hosts vs. physical ones
* Use tabs for alternate ways of making inference requests
* Add inference request example with Seldon CLI
* Use consistent capitalisation of v2 for inference protocol
* Add note on Kafka headers for pipelines
* Use ordinal numbering for bullet points
* Update URI for consistency and to avoid confusion
* Move section on making requests above section on routing
* Use interpolation syntax to clarify usage of path-based routing in Seldon mesh
* Add second form of path-based routing for pipelines in Seldon mesh
* Clarify wording re virtual endpoints in SCv2
* Add section for header-based routing examples (this section builds on the examples from the prior section on making inference requests)
* Update basic examples to exclude routing headers (routing headers are then given in the examples relevant to that section)
* Formatting
* Use group-tabs for example requests with different clients
* Add emphasis to header lines in examples for header-based routing
* Add notes on support for subdomain-based routing
* Add example snippets for subdomain routing
* Add Open Inference schema for iris model for examples
* Move pipeline inference tip lower for better flow
* Fix datatype for iris model inputs
* Bump MLServer version to 1.2.1 (#4503)
* add a notebook test for changing model replicas (#4504)
* Disable autoscaling tracking on agent for models that are fixed (#4501)
* add flag for autoscaling in grpc msg
* autogen files
* extract helper function
* adjust comment
* wire up autoscaling flag in server
* wire up autoscaling in agent client
* set thresholds for scaling in local deployment
* add autoscaling flag to scheduler
* add a toggle for autoscaling service
* revert autoscaling envs set in local deployment
* disable scaling for local deployment
* use a disable toggle instead
* do not disable by default scaling service
* Upgrading docker compose CLI command (#4498). Not sure if this is necessary, but it took me some time to figure out, as I was sure I already had `docker compose` installed. According to the [Docker documentation](https://docs.docker.com/compose/reference/), the spaced version looks like the newer one, and maybe the Makefile should be updated for that.
* Ensure x-request-id header matches kafka key (#4511)
* Fix possible SIGSEV after producer close in modelgateway (#4515)
* Fix possible SIGSEV after producer close in modelgateway
* Set running after setup
* review comments
* Link how to install docker compose v2 from github releases (#4516)
* link compose github for easier installation
* Update docs/source/contents/getting-started/docker-installation/index.md (Co-authored-by: Alex Rakowski <20504869+agrski@users.noreply.github.com>)
* review comments

Co-authored-by: Sherif Akoush <sherif.akoush@gmail.com>
Co-authored-by: Alex Rakowski <20504869+agrski@users.noreply.github.com>
Co-authored-by: Adrian Gonzalez-Martin <agm@seldon.io>
Co-authored-by: Sherif Akoush <sa@seldon.io>
Co-authored-by: Saeid <s.ghafouri@qmul.ac.uk>
Co-authored-by: RafalSkolasinski <r.j.skolasinski@gmail.com>
What this PR does / why we need it:
Previously, the agent would track autoscaling metrics for all models, regardless of whether the user wanted to autoscale them, and the scheduler would later reject any autoscaling events. This strategy suffers from unnecessary gRPC messages flowing from agent to scheduler.
This PR disables tracking of autoscaling metrics for models that are fixed (i.e. models on which autoscaling is not set), so no events are fired from the agent to the scheduler. This is done by the scheduler setting an enable-autoscaling flag during model load. There is a caveat with this strategy, though.
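The gating described above can be sketched as follows. Note this is a minimal illustration, not Seldon's actual agent code: the `ModelSettings` struct and `trackLag` helper are hypothetical stand-ins for the per-model settings the scheduler sends at load time and the agent's metric-tracking path.

```go
package main

import "fmt"

// ModelSettings is a hypothetical stand-in for the per-model load settings
// the scheduler sends to the agent; AutoscalingEnabled mirrors the
// enable-autoscaling flag set by the scheduler during model load.
type ModelSettings struct {
	Name               string
	AutoscalingEnabled bool
}

// trackLag records an autoscaling metric only for models that may scale,
// so fixed models never trigger agent-to-scheduler scaling events.
func trackLag(m ModelSettings, lag int) (tracked bool) {
	if !m.AutoscalingEnabled {
		// Fixed replica count: skip tracking entirely, avoiding
		// unnecessary gRPC traffic to the scheduler.
		return false
	}
	fmt.Printf("model %s lag=%d\n", m.Name, lag)
	return true
}

func main() {
	fixed := ModelSettings{Name: "iris", AutoscalingEnabled: false}
	scalable := ModelSettings{Name: "income", AutoscalingEnabled: true}
	fmt.Println(trackLag(fixed, 5))
	fmt.Println(trackLag(scalable, 5))
}
```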
This PR also adds a `disable-autoscaling` flag to the scheduler's command-line arguments to disable the (model) autoscaling service. It is set by default in the local (Docker Compose) deployment, as it is not possible to add model replicas given we have only one server replica (so far).
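As a hedged sketch of how such a command-line toggle typically works (the flag name comes from this PR; the flag-set name and helper function are illustrative, not the scheduler's actual code):

```go
package main

import (
	"flag"
	"fmt"
)

// autoscalingEnabled illustrates how a disable-autoscaling command-line
// flag could gate whether the (model) autoscaling service starts at all.
// This helper and its flag set are hypothetical, for illustration only.
func autoscalingEnabled(args []string) bool {
	fs := flag.NewFlagSet("scheduler", flag.ContinueOnError)
	disable := fs.Bool("disable-autoscaling", false,
		"disable the (model) autoscaling service")
	if err := fs.Parse(args); err != nil {
		return false
	}
	return !*disable
}

func main() {
	// The local docker-compose deployment would pass the flag explicitly,
	// since a single server replica cannot host extra model replicas.
	fmt.Println(autoscalingEnabled([]string{"--disable-autoscaling"}))
	fmt.Println(autoscalingEnabled([]string{}))
}
```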