Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(sdk): Fix SDK model redeployment #466

Merged
merged 19 commits into from
Oct 13, 2023
Merged

Conversation

deadlycoconuts
Copy link
Contributor

@deadlycoconuts deadlycoconuts commented Sep 21, 2023

What this PR does / why we need it:
This PR introduces a fix to PR #455, which itself introduced a new feature to allow the redeployment of existing Merlin model versions using the deploy method.

While this new feature assumes that a user would be able to only specify fields he/she intends to update in the model version configuration when calling the deploy method, the existing implementation however unexpectedly overwrites some of the existing configuration with certain default configuration values if they are not explicitly specified. These configuration values include:

  • protocol
  • deployment mode
  • resources
  • autoscaling policy

This happens because the 'deploy' method indirectly sets default values for the aforementioned configuration if they aren't specified (this was primarily intended to make the initial deployment process simpler so users wouldn't have to specify too many config values), and does not distinguish between a user calling deploy on an existing model version or on a new model version.

This PR thus makes changes to the SDK to set these defaults selectively.

A separate change is made to the override method used in the Merlin API server update model version endpoint to override the existing model protocol ONLY when the new model version explicitly specifies a protocol. This change is a partial reversion of PR #300 (change 2) which means that from now on users need to explicitly specify the HTTP protocol (as opposed to leaving it as null) when updating a UPI model to use HTTP instead. It's ultimately a minor change since the main fix that had been introduced in that PR involves change 3 (in that PR).

Which issue(s) this PR fixes:
Fixes #455

Does this PR introduce a user-facing change?:

Users need to explicitly specify the HTTP protocol (as opposed to leaving it as `null`) when updating a UPI model to use HTTP instead

Checklist

  • Added unit test, integration, and/or e2e tests
  • Tested locally
  • Updated documentation
  • Update Swagger spec if the PR introduce API changes
  • Regenerated Golang and Python client if the PR introduce API changes

@deadlycoconuts deadlycoconuts added the bug Something isn't working label Sep 21, 2023
@deadlycoconuts deadlycoconuts self-assigned this Sep 21, 2023
@ghost
Copy link

ghost commented Sep 21, 2023

👇 Click on the image for a new way to code review

Review these changes using an interactive CodeSee Map

Legend

CodeSee Map legend

@@ -1029,6 +1029,21 @@ def deploy(self, environment_name: str = None,
:param protocol: protocol to be used for deploying the model (default: HTTP_JSON)
:return: VersionEndpoint object
"""
current_endpoint = self.endpoint
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line checks for the existence of an existing endpoint that has been deployed.

Comment on lines +1127 to +1131
# Check that the deployment configs of v2 have remained the same as those used in v1
assert endpoint.resource_request.cpu_request == "123m"
assert endpoint.resource_request.memory_request == "234Mi"
assert endpoint.env_vars == {"green": "TRUE"}
assert endpoint.deployment_mode == DeploymentMode.RAW_DEPLOYMENT
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional checks to ensure the original configs do not get overwritten by the default configs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add another test case for redeployment where the first version is deployed with transformer and the second version without transformer. The expected result will be no transformer in the second version, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, that's actually not how the API server works right now though 👀 It seems like the intended behaviour (from the implementation in the API server) for the update endpoint workflow is to only 'override' or 'add on' additional configs to the existing endpoint configs. In the case of the transformer, if there's a transformer in the original 'configs' but no transformer specified in the new 'configs', the existing transformer will simply not get overwritten and remain in the newly deployed endpoint. So there's actually no way to 'remove' the transformer config by using this UpdateEndpoint endpoint.

Actually this behaviour exists for all the other endpoint configs, even the 'null-able' ones, like env vars and the logger. I suppose this is a design choice for the update endpoint workflow and not a bug? In order to remove the transformer, env vars or the logger, I think the user has to create an entirely new model version right? 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see. I have forgotten this. Thanks for checking it.

-> I suppose this is a design choice for the update endpoint workflow and not a bug?
Yes, this is a design choice.

-> In order to remove the transformer, env vars or the logger, I think the user has to create an entirely new model version right?
It's not necessarily to create an entirely new model version. You can still using redeploy but with empty or disabled value. For example:

merlin.deploy(v, transformer=StandardTransformer(enabled=False))

api/service/version_endpoint_service.go Outdated Show resolved Hide resolved
python/sdk/merlin/model.py Outdated Show resolved Hide resolved
Comment on lines 1034 to 1046
target_deployment_mode = None
if deployment_mode is None:
if current_endpoint is None:
target_deployment_mode = DeploymentMode.SERVERLESS.value
else:
target_deployment_mode = deployment_mode.value

target_protocol = None
if protocol is None:
if current_endpoint is None:
target_protocol = Protocol.HTTP_JSON.value
else:
target_protocol = protocol.value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of ordering them by each parameter, what do you think of rearranging the order by grouping the activity. For example, something like this:

# Initialize target parameters
target_deployment_mode = None
target_protocol = None
...

# Get the currently deployed endpoint and if there's no deployed endpoint yet, use the default value
current_endpoint = self.endpoint
if current_endpoint is None:
        target_deployment_mode = DeploymentMode.SERVERLESS.value
        target_protocol = Protocol.HTTP_JSON.value
        ...

# Update target values from given user inputs
if deployment_mode is not None:
        target_deployment_mode = deployment_mode.value

if protocol is not None:
        target_protocol = protocol.value

...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, that's an awesome suggestion 😁 I was also thinking the deploy method was starting to look a little messy with the setting of each parameter manually but yeah I think your suggestion looks really clean; let me try refactoring this a little, thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I refactored the deploy method quite a bit so now it looks like what you had suggested, and also by creating some helper functions here and there to make the method easier to understand.

resource_request.cpu_request, resource_request.memory_request)
if resource_request.gpu_request is not None and resource_request.gpu_name is not None:
env_api = EnvironmentApi(self._api_client)
env_list = env_api.environments_get()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: Currently, the deploy() function could call List Environments API up to three times. We can optimize it in different PR. This comment is just me thinking out loud.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point, while refactoring the deploy method, I actually went to simplify it also so it does not repeat the ListEnvironments API call multiple times. It now calls that API once and reuses the response of that call throughout the entire deploy method.

Comment on lines +1127 to +1131
# Check that the deployment configs of v2 have remained the same as those used in v1
assert endpoint.resource_request.cpu_request == "123m"
assert endpoint.resource_request.memory_request == "234Mi"
assert endpoint.env_vars == {"green": "TRUE"}
assert endpoint.deployment_mode == DeploymentMode.RAW_DEPLOYMENT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add another test case for redeployment where the first version is deployed with transformer and the second version without transformer. The expected result will be no transformer in the second version, right?

@deadlycoconuts deadlycoconuts force-pushed the fix_sdk_model_redeployment branch 2 times, most recently from 5400bf2 to 9f4c510 Compare September 26, 2023 07:58
@@ -1029,82 +1029,67 @@ def deploy(self, environment_name: str = None,
:param protocol: protocol to be used for deploying the model (default: HTTP_JSON)
:return: VersionEndpoint object
"""
current_endpoint = self.endpoint
env_list = self._get_env_list()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This call reduces all the redundant calls to the ListEnvironments API to just 1 call at the start of the entire method.

return target_env_vars

@staticmethod
def _create_transformer_spec(transformer: Transformer, target_env_name: str, env_list: List[client.models.Environment]) -> client.Transformer:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually almost the same as the original create_transformer_spec method - the only changes I made was make this method static and reused some other helper methods.

@@ -136,7 +136,7 @@ def requests():
retry_strategy = Retry(
total=5,
status_forcelist=[429, 500, 502, 503, 504, 404],
method_whitelist=["POST"],
allowed_methods=["POST"],
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change in this argument is done because the keyword argument method_whitelist has been deprecated as of urllib3>=2. allowed_methods was introduced as a replacement from urllib3>=1.26 onwards (the arguments for method_whitelist get passed to allowed_methods under the hood anyway).

This was changed because the e2e tests were breaking: https://github.com/caraml-dev/merlin/actions/runs/6490922979/job/17627543254#step:14:222.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this!

"urllib3>=1.23",
"urllib3>=1.26",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bumped up min version of urllib3. Context: https://github.com/caraml-dev/merlin/pull/466/files#r1356056883

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Zi Yi!

Copy link
Contributor

@ariefrahmansyah ariefrahmansyah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for improving the codebase as well, Zi Yi!

"urllib3>=1.23",
"urllib3>=1.26",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Zi Yi!

@@ -136,7 +136,7 @@ def requests():
retry_strategy = Retry(
total=5,
status_forcelist=[429, 500, 502, 503, 504, 404],
method_whitelist=["POST"],
allowed_methods=["POST"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this!

Copy link
Contributor

@tkpd-hafizhan tkpd-hafizhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@deadlycoconuts
Copy link
Contributor Author

Thanks a lot for the detailed review and the suggestions @ariefrahmansyah and @tkpd-hafizhan ! Merging this now! 🚀

@deadlycoconuts deadlycoconuts merged commit 637050b into main Oct 13, 2023
42 checks passed
@deadlycoconuts deadlycoconuts deleted the fix_sdk_model_redeployment branch October 13, 2023 02:45
tkpd-hafizhan added a commit that referenced this pull request Oct 16, 2023
commit 637050b
Author: Ewe Zi Yi <36802364+deadlycoconuts@users.noreply.github.com>
Date:   Fri Oct 13 10:45:16 2023 +0800

    fix(sdk): Fix SDK model redeployment (#466)

    **What this PR does / why we need it**:
    This PR introduces a fix to PR #455, which itself introduced a new
    feature to allow the redeployment of existing Merlin model versions
    using the `deploy` method.

    While this new feature assumes that a user would be able to only
    **_specify fields he/she intends to update_** in the model version
    configuration when calling the `deploy` method, the existing
    implementation however unexpectedly overwrites some of the existing
    configuration with certain default configuration values if they are not
    explicitly specified. These configuration values include:
    - protocol
    - deployment mode
    - resources
    - autoscaling policy

    This happens because the 'deploy' method indirectly sets default values
    for the aforementioned configuration if they aren't specified (this was
    primarily intended to make the **initial** deployment process simpler so
    users wouldn't have to specify too many config values), and does not
    distinguish between a user calling `deploy` on an existing model version
    or on a new model version.

    This PR thus makes changes to the SDK to set these defaults selectively.

    A separate change is made to the `override` method used in the Merlin
    API server update model version endpoint to override the existing model
    protocol ONLY when the new model version explicitly specifies a
    protocol. This change is a **_partial reversion_** of PR #300 (change 2)
    which means that from now on users need to explicitly specify the HTTP
    protocol (as opposed to leaving it as `null`) when updating a UPI model
    to use HTTP instead. It's ultimately a minor change since the main fix
    that had been introduced in that PR involves change 3 (in that PR).

    **Which issue(s) this PR fixes**:
    Fixes #455

    **Does this PR introduce a user-facing change?**:
    ```release-note
    Users need to explicitly specify the HTTP protocol (as opposed to leaving it as `null`) when updating a UPI model to use HTTP instead
    ```

    **Checklist**

    - [x] Added unit test, integration, and/or e2e tests
    - [x] Tested locally
    - [ ] Updated documentation
    - [ ] Update Swagger spec if the PR introduce API changes
    - [ ] Regenerated Golang and Python client if the PR introduce API
    changes
ariefrahmansyah pushed a commit that referenced this pull request Oct 19, 2023
…out deployment type specified (#474)

**What this PR does / why we need it**:
As a follow up to #455 and #466, this tiny change to the update endpoint
_endpoint_ allows requests that do not specify a deployment type to
avoid the check that checks for different deployment types between a
deployed endpoint and the new endpoint configuration. What will instead
happen if a deployment type isn't specified is that the existing
deployed endpoint's deployment type will be used.

The main change is just the addition of another boolean condition, but
there's a unit test that I added which is just an almost exact copy of
another unit test case.

**Which issue(s) this PR fixes**:
Fixes #

**Does this PR introduce a user-facing change?**:
```release-note
NONE
```

**Checklist**

- [x] Added unit test, integration, and/or e2e tests
- [x] Tested locally
- [ ] Updated documentation
- [ ] Update Swagger spec if the PR introduce API changes
- [ ] Regenerated Golang and Python client if the PR introduce API
changes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants