Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(api,ui,sdk): Make CPU limits configurable #586

Merged
merged 26 commits into from
Jun 3, 2024

Conversation

deadlycoconuts
Copy link
Contributor

@deadlycoconuts deadlycoconuts commented May 17, 2024

Description

As of present, users are not able to configure the CPU limits of the pods in which Merlin models and transformers are deployed in - they are instead determined automatically on the platform-level (Merlin API server). Depending on how the API server has been configured, one of the following happens:

  • the CPU limit of a model is set as its CPU request value, multiplied by a scaling factor (e.g. 2 CPU * 1.5) or,
    • Note that this is the existing way memory limits are automatically set by the Merlin API server
  • the CPU limit is left unset
    • Note that because KServe does not currently allow CPU limits to be completely unset, the Merlin API server instead sets an arbitrary value (ideally one that is very big) as the CPU limit instead

This PR introduces a new workflow which would allow users to instead override the platform-level CPU limits (described in the paragraph above) set on a model. This workflow is available via the UI, SDK and by extension, directly calling the API endpoint of the API server.

UI:
Screenshot 2024-05-24 at 2 13 46 PM

Screenshot 2024-05-24 at 2 23 42 PM

SDK:

merlin.deploy(
    version_1,
    resource_request=merlin.ResourceRequest(
        min_replica=0,
        max_replica=0,
        cpu_request="0.5",
        cpu_limit="2",
        memory_request="1Gi",
    ),
)

In addition, this PR adds a new configuration, DefaultEnvVarsWithoutCPULimits, which is a list of env vars that automatically get added to all Merlin models and transformers when CPU limits are not set. This allows the Merlin API server's operators to set env vars platform-wide that can potentially improve these deployments' performance, e.g. env vars involving concurrency.

Modifications

  • api/cluster/resource/templater.go - Refactoring of templater methods to set default env vars when cpu limits are not explicitly set and when the cpu limit scaling factor is set as 0
  • api/config/config.go - Addition of the new field DefaultEnvVarsWithoutCPULimits
  • api/config/config_test.go - Addition of a new unit test to test the parsing of configs from .yaml files
  • docs/user/templates/model_deployment/01_deploying_a_model_version.md - Addition of docs to demonstrate how the platform-level CPU limits can be overriden
  • python/sdk/merlin/resource_request.py - Addition of a new cpu limit field to the resource request class
  • ui/src/pages/version/components/forms/components/CPULimitsFormGroup.js - Addition of a new form group to allow cpu limits to be specified on the UI

Tests

  • Deploying existing models (and transformers) with and without CPU limits set

Checklist

  • Added PR label
  • Added unit test, integration, and/or e2e tests
  • Tested locally
  • Updated documentation
  • Update Swagger spec if the PR introduce API changes
  • Regenerated Golang and Python client if the PR introduces API changes

Release Notes

NONE

@deadlycoconuts deadlycoconuts added the enhancement New feature or request label May 17, 2024
@deadlycoconuts deadlycoconuts self-assigned this May 17, 2024
Copy link

codecov bot commented May 17, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 60.50%. Comparing base (cfd27a2) to head (6f1f13b).
Report is 8 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #586      +/-   ##
==========================================
+ Coverage   60.31%   60.50%   +0.19%     
==========================================
  Files         274      274              
  Lines       21930    22066     +136     
==========================================
+ Hits        13226    13351     +125     
- Misses       7855     7859       +4     
- Partials      849      856       +7     
Flag Coverage Δ
api-test 58.46% <ø> (+0.22%) ⬆️
sdk-test-3.10 75.43% <ø> (-0.05%) ⬇️
sdk-test-3.8 75.35% <ø> (-0.05%) ⬇️
sdk-test-3.9 75.35% <ø> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@deadlycoconuts deadlycoconuts force-pushed the make_cpu_limits_configurable branch 2 times, most recently from d42ac0e to 267781e Compare May 27, 2024 03:54
Copy link
Contributor

@leonlnj leonlnj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, LGTM.

@deadlycoconuts deadlycoconuts merged commit c05af1d into main Jun 3, 2024
35 checks passed
@deadlycoconuts deadlycoconuts deleted the make_cpu_limits_configurable branch June 3, 2024 02:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants