-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(sdk): Fix SDK model redeployment #466
Conversation
c729a13
to
654b4cd
Compare
python/sdk/merlin/model.py
Outdated
@@ -1029,6 +1029,21 @@ def deploy(self, environment_name: str = None, | |||
:param protocol: protocol to be used for deploying the model (default: HTTP_JSON) | |||
:return: VersionEndpoint object | |||
""" | |||
current_endpoint = self.endpoint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line checks for the existence of an existing endpoint that has been deployed.
# Check that the deployment configs of v2 have remained the same as those used in v1 | ||
assert endpoint.resource_request.cpu_request == "123m" | ||
assert endpoint.resource_request.memory_request == "234Mi" | ||
assert endpoint.env_vars == {"green": "TRUE"} | ||
assert endpoint.deployment_mode == DeploymentMode.RAW_DEPLOYMENT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additional checks to ensure the original configs do not get overwritten by the default configs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should add another test case for redeployment where the first version is deployed with transformer and the second version without transformer. The expected result will be no transformer in the second version, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, that's actually not how the API server works right now though 👀 It seems like the intended behaviour (from the implementation in the API server) for the update endpoint workflow is to only 'override' or 'add on' additional configs to the existing endpoint configs. In the case of the transformer, if there's a transformer in the original 'configs' but no transformer specified in the new 'configs', the existing transformer will simply not get overwritten and remain in the newly deployed endpoint. So there's actually no way to 'remove' the transformer config by using this UpdateEndpoint
endpoint.
Actually this behaviour exists for all the other endpoint configs, even the 'null-able' ones, like env vars and the logger. I suppose this is a design choice for the update endpoint workflow and not a bug? In order to remove the transformer, env vars or the logger, I think the user has to create an entirely new model version right? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see. I have forgotten this. Thanks for checking it.
-> I suppose this is a design choice for the update endpoint workflow and not a bug?
Yes, this is a design choice.
-> In order to remove the transformer, env vars or the logger, I think the user has to create an entirely new model version right?
It's not necessarily to create an entirely new model version. You can still using redeploy but with empty or disabled value. For example:
merlin.deploy(v, transformer=StandardTransformer(enabled=False))
a0e7469
to
c70ab65
Compare
python/sdk/merlin/model.py
Outdated
target_deployment_mode = None | ||
if deployment_mode is None: | ||
if current_endpoint is None: | ||
target_deployment_mode = DeploymentMode.SERVERLESS.value | ||
else: | ||
target_deployment_mode = deployment_mode.value | ||
|
||
target_protocol = None | ||
if protocol is None: | ||
if current_endpoint is None: | ||
target_protocol = Protocol.HTTP_JSON.value | ||
else: | ||
target_protocol = protocol.value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of ordering them by each parameter, what do you think of rearranging the order by grouping the activity. For example, something like this:
# Initialize target parameters
target_deployment_mode = None
target_protocol = None
...
# Get the currently deployed endpoint and if there's no deployed endpoint yet, use the default value
current_endpoint = self.endpoint
if current_endpoint is None:
target_deployment_mode = DeploymentMode.SERVERLESS.value
target_protocol = Protocol.HTTP_JSON.value
...
# Update target values from given user inputs
if deployment_mode is not None:
target_deployment_mode = deployment_mode.value
if protocol is not None:
target_protocol = protocol.value
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yes, that's an awesome suggestion 😁 I was also thinking the deploy
method was starting to look a little messy with the setting of each parameter manually but yeah I think your suggestion looks really clean; let me try refactoring this a little, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay I refactored the deploy
method quite a bit so now it looks like what you had suggested, and also by creating some helper functions here and there to make the method easier to understand.
python/sdk/merlin/model.py
Outdated
resource_request.cpu_request, resource_request.memory_request) | ||
if resource_request.gpu_request is not None and resource_request.gpu_name is not None: | ||
env_api = EnvironmentApi(self._api_client) | ||
env_list = env_api.environments_get() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: Currently, the deploy() function could call List Environments API up to three times. We can optimize it in different PR. This comment is just me thinking out loud.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point, while refactoring the deploy
method, I actually went to simplify it also so it does not repeat the ListEnvironments API call multiple times. It now calls that API once and reuses the response of that call throughout the entire deploy
method.
# Check that the deployment configs of v2 have remained the same as those used in v1 | ||
assert endpoint.resource_request.cpu_request == "123m" | ||
assert endpoint.resource_request.memory_request == "234Mi" | ||
assert endpoint.env_vars == {"green": "TRUE"} | ||
assert endpoint.deployment_mode == DeploymentMode.RAW_DEPLOYMENT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should add another test case for redeployment where the first version is deployed with transformer and the second version without transformer. The expected result will be no transformer in the second version, right?
5400bf2
to
9f4c510
Compare
@@ -1029,82 +1029,67 @@ def deploy(self, environment_name: str = None, | |||
:param protocol: protocol to be used for deploying the model (default: HTTP_JSON) | |||
:return: VersionEndpoint object | |||
""" | |||
current_endpoint = self.endpoint | |||
env_list = self._get_env_list() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This call reduces all the redundant calls to the ListEnvironments API to just 1 call at the start of the entire method.
return target_env_vars | ||
|
||
@staticmethod | ||
def _create_transformer_spec(transformer: Transformer, target_env_name: str, env_list: List[client.models.Environment]) -> client.Transformer: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually almost the same as the original create_transformer_spec
method - the only changes I made was make this method static and reused some other helper methods.
09db525
to
90685e7
Compare
995321f
to
69f1870
Compare
@@ -136,7 +136,7 @@ def requests(): | |||
retry_strategy = Retry( | |||
total=5, | |||
status_forcelist=[429, 500, 502, 503, 504, 404], | |||
method_whitelist=["POST"], | |||
allowed_methods=["POST"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change in this argument is done because the keyword argument method_whitelist
has been deprecated as of urllib3>=2
. allowed_methods
was introduced as a replacement from urllib3>=1.26
onwards (the arguments for method_whitelist
get passed to allowed_methods
under the hood anyway).
This was changed because the e2e tests were breaking: https://github.com/caraml-dev/merlin/actions/runs/6490922979/job/17627543254#step:14:222.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing this!
"urllib3>=1.23", | ||
"urllib3>=1.26", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bumped up min version of urllib3
. Context: https://github.com/caraml-dev/merlin/pull/466/files#r1356056883
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Zi Yi!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for improving the codebase as well, Zi Yi!
"urllib3>=1.23", | ||
"urllib3>=1.26", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Zi Yi!
@@ -136,7 +136,7 @@ def requests(): | |||
retry_strategy = Retry( | |||
total=5, | |||
status_forcelist=[429, 500, 502, 503, 504, 404], | |||
method_whitelist=["POST"], | |||
allowed_methods=["POST"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks a lot for the detailed review and the suggestions @ariefrahmansyah and @tkpd-hafizhan ! Merging this now! 🚀 |
commit 637050b Author: Ewe Zi Yi <36802364+deadlycoconuts@users.noreply.github.com> Date: Fri Oct 13 10:45:16 2023 +0800 fix(sdk): Fix SDK model redeployment (#466) **What this PR does / why we need it**: This PR introduces a fix to PR #455, which itself introduced a new feature to allow the redeployment of existing Merlin model versions using the `deploy` method. While this new feature assumes that a user would be able to only **_specify fields he/she intends to update_** in the model version configuration when calling the `deploy` method, the existing implementation however unexpectedly overwrites some of the existing configuration with certain default configuration values if they are not explicitly specified. These configuration values include: - protocol - deployment mode - resources - autoscaling policy This happens because the 'deploy' method indirectly sets default values for the aforementioned configuration if they aren't specified (this was primarily intended to make the **initial** deployment process simpler so users wouldn't have to specify too many config values), and does not distinguish between a user calling `deploy` on an existing model version or on a new model version. This PR thus makes changes to the SDK to set these defaults selectively. A separate change is made to the `override` method used in the Merlin API server update model version endpoint to override the existing model protocol ONLY when the new model version explicitly specifies a protocol. This change is a **_partial reversion_** of PR #300 (change 2) which means that from now on users need to explicitly specify the HTTP protocol (as opposed to leaving it as `null`) when updating a UPI model to use HTTP instead. It's ultimately a minor change since the main fix that had been introduced in that PR involves change 3 (in that PR). **Which issue(s) this PR fixes**: Fixes #455 **Does this PR introduce a user-facing change?**: ```release-note Users need to explicitly specify the HTTP protocol (as opposed to leaving it as `null`) when updating a UPI model to use HTTP instead ``` **Checklist** - [x] Added unit test, integration, and/or e2e tests - [x] Tested locally - [ ] Updated documentation - [ ] Update Swagger spec if the PR introduce API changes - [ ] Regenerated Golang and Python client if the PR introduce API changes
…out deployment type specified (#474) **What this PR does / why we need it**: As a follow up to #455 and #466, this tiny change to the update endpoint _endpoint_ allows requests that do not specify a deployment type to avoid the check that checks for different deployment types between a deployed endpoint and the new endpoint configuration. What will instead happen if a deployment type isn't specified is that the existing deployed endpoint's deployment type will be used. The main change is just the addition of another boolean condition, but there's a unit test that I added which is just an almost exact copy of another unit test case. **Which issue(s) this PR fixes**: Fixes # **Does this PR introduce a user-facing change?**: ```release-note NONE ``` **Checklist** - [x] Added unit test, integration, and/or e2e tests - [x] Tested locally - [ ] Updated documentation - [ ] Update Swagger spec if the PR introduce API changes - [ ] Regenerated Golang and Python client if the PR introduce API changes
What this PR does / why we need it:
This PR introduces a fix to PR #455, which itself introduced a new feature to allow the redeployment of existing Merlin model versions using the
deploy
method.While this new feature assumes that a user would be able to only specify fields he/she intends to update in the model version configuration when calling the
deploy
method, the existing implementation however unexpectedly overwrites some of the existing configuration with certain default configuration values if they are not explicitly specified. These configuration values include:This happens because the 'deploy' method indirectly sets default values for the aforementioned configuration if they aren't specified (this was primarily intended to make the initial deployment process simpler so users wouldn't have to specify too many config values), and does not distinguish between a user calling
deploy
on an existing model version or on a new model version.This PR thus makes changes to the SDK to set these defaults selectively.
A separate change is made to the
override
method used in the Merlin API server update model version endpoint to override the existing model protocol ONLY when the new model version explicitly specifies a protocol. This change is a partial reversion of PR #300 (change 2) which means that from now on users need to explicitly specify the HTTP protocol (as opposed to leaving it asnull
) when updating a UPI model to use HTTP instead. It's ultimately a minor change since the main fix that had been introduced in that PR involves change 3 (in that PR).Which issue(s) this PR fixes:
Fixes #455
Does this PR introduce a user-facing change?:
Checklist