
Launch TorchServe without repackaging model contents #117

Open

fm1ch4 opened this issue Mar 15, 2022 · 5 comments

fm1ch4 commented Mar 15, 2022

Currently, the startup code repackages the model contents in environment.model_dir into the TorchServe (TS) archive format using the TS model archiver: https://github.com/aws/sagemaker-pytorch-inference-toolkit/blob/master/src/sagemaker_pytorch_serving_container/torchserve.py#L78

This means the model contents are read and rewritten to disk on every container startup, which increases startup time. For SageMaker Serverless Inference, it lengthens cold starts, and increasingly so for larger models.

TorchServe is making a change to support loading models from a directory without the need to repackage the model as a .mar file: pytorch/serve#1498

This issue requests that this inference toolkit adopt the new feature and avoid repackaging the model contents. I *think* this should be as simple as removing the model archiver command execution and setting --models in the TorchServe command to --models model=environment.model_dir. A rough sketch of what I mean is below.
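For illustration, here is a minimal sketch of the simplified launch. This is my assumption of the shape of the change, not the toolkit's actual code; _start_torchserve is an illustrative name, and the exact flag plumbing may differ:

```python
import subprocess

from sagemaker_inference import environment

DEFAULT_TS_MODEL_NAME = "model"  # same name the toolkit reuses in its TS config file


def _start_torchserve():
    # Illustrative sketch: skip the _adapt_to_ts_format / model-archiver step
    # entirely and point TorchServe straight at the unpacked model directory.
    env = environment.Environment()
    ts_command = [
        "torchserve",
        "--start",
        "--model-store", env.model_dir,
        "--models", f"{DEFAULT_TS_MODEL_NAME}={env.model_dir}",
    ]
    subprocess.Popen(ts_command)
```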

mseth10 (Contributor) commented Mar 25, 2022

Hi @fm1ch4 , I am working on the changes you suggested. If I understand correctly, we no longer need the _adapt_to_ts_format function. That function used DEFAULT_TS_MODEL_NAME to name the generated .mar file, and the same name is also referenced here in TS_CONFIG_FILE:

"{DEFAULT_TS_MODEL_NAME}": {{\\
"1.0": {{\\
"defaultVersion": true,\\
"marName": "{DEFAULT_TS_MODEL_NAME}.mar",\\

What should the model name in the config file be after we remove the formatting function? I don't quite understand how TorchServe processes the --models model=environment.model_dir parameter to determine the model name.

fm1ch4 (Author) commented Mar 28, 2022

Hey @mseth10, thanks for taking a look.

For the --models model=environment.model_dir argument, model is the model name. It could actually be replaced with the DEFAULT_TS_MODEL_NAME variable instead, and we can continue using that name throughout the config file. Here is the TorchServe doc on the --models parameter (https://pytorch.org/serve/server.html):

> Models to be loaded using [model_name=]model_location format. Location can be a HTTP URL, a model archive file, or a directory containing model archive files in MODEL_STORE.
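To make the [model_name=]model_location format concrete, here is a small hypothetical illustration; the directory path is just the usual SageMaker model dir, and the resnet.mar filename is made up:

```python
DEFAULT_TS_MODEL_NAME = "model"
model_dir = "/opt/ml/model"  # typical SageMaker model directory

# Explicit name + directory location: TorchServe registers the model as
# "model", so the config file can keep using DEFAULT_TS_MODEL_NAME unchanged.
models_arg = f"{DEFAULT_TS_MODEL_NAME}={model_dir}"  # "model=/opt/ml/model"

# Name omitted + archive location: TorchServe derives the name from the
# archive filename, e.g. "resnet.mar" is registered as "resnet".
models_arg = "resnet.mar"
```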

Qingzi-Lan commented
Hi @mseth10 , do we have an estimate for a fix for this?

mseth10 (Contributor) commented Apr 6, 2022

> Hi @mseth10 , do we have an estimate for a fix for this?

Hi @Qingzi-Lan , we are currently seeing some specific test failures, as mentioned in #118 (comment) . Can you help us understand why these tests pass with the DLC but fail with the generic image? Also, can we skip them and merge the PR to unblock the DLC customers?

Qingzi-Lan commented

> Hi @Qingzi-Lan , we are currently seeing some specific test failures, as mentioned in #118 (comment) . Can you help us understand why these tests pass with the DLC but fail with the generic image?

Which image are the tests using as the generic image?
