I'm trying to deploy BLIP-2 (specifically Salesforce/blip2-opt-2.7b) to a SageMaker (SM) endpoint, but I'm running into some problems.
We can deploy this model by tar'ing the model artifacts as model.tar.gz and hosting it on S3, but creating a ~9GB tar file is time-consuming and leads to slow deployment feedback loops.
Alternatively, the toolkit has experimental support for downloading models from the 🤗 Hub on startup, which is more time- and space-efficient.
However, this functionality only supports passing HF_TASK and HF_MODEL_ID as env vars. In order to run inference on this model using the GPUs available on SM (T4/A10), we need to pass additional model_kwargs, for example:
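For reference, this is roughly what those kwargs look like when building the pipeline directly with transformers; the specific values (fp16, device_map) are illustrative assumptions for fitting the model on a single T4/A10, not settings taken from the toolkit:

```python
# Illustrative sketch only: the extra kwargs BLIP-2 needs to fit on a T4/A10.
import torch
from transformers import pipeline

generator = pipeline(
    "image-to-text",
    model="Salesforce/blip2-opt-2.7b",
    model_kwargs={
        "torch_dtype": torch.float16,  # half precision so the ~2.7B-parameter model fits in GPU memory
        "device_map": "auto",          # let accelerate place the weights on the GPU
    },
)
```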
A potential solution to this would be:
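For example (a sketch only: HF_MODEL_KWARGS is a hypothetical env var that the toolkit does not currently support, and the container versions/instance type are assumptions):

```python
from sagemaker.huggingface import HuggingFaceModel

huggingface_model = HuggingFaceModel(
    role=role,                    # assumed to be defined elsewhere in the deployment script
    transformers_version="4.26",  # assumed DLC versions
    pytorch_version="1.13",
    py_version="py39",
    env={
        "HF_MODEL_ID": "Salesforce/blip2-opt-2.7b",
        "HF_TASK": "image-to-text",
        # Hypothetical: JSON-encoded kwargs that the toolkit would parse and
        # forward to the pipeline; this env var does not exist today.
        "HF_MODEL_KWARGS": '{"torch_dtype": "float16", "device_map": "auto"}',
    },
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.xlarge",  # A10-backed instance (assumption)
)
```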
On line 104 of handler_service.py the ability to pass kwargs has not been implemented, even though the get_pipeline function already accepts kwargs.
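A minimal sketch of what that forwarding could look like inside the handler's load step, assuming a JSON-encoded HF_MODEL_KWARGS env var (the name is hypothetical) and the task/model_dir/self.device variables that already exist in that context:

```python
import json
import os

# Hypothetical addition to handler_service.py: parse JSON kwargs from an
# env var and forward them to get_pipeline, which already accepts **kwargs.
model_kwargs = json.loads(os.environ.get("HF_MODEL_KWARGS", "{}"))
hf_pipeline = get_pipeline(
    task=os.environ["HF_TASK"],
    model_dir=model_dir,    # provided by the existing load() context
    device=self.device,     # provided by the existing load() context
    **model_kwargs,
)
```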
Thank you for opening the request. It is a good idea to think about adding "HF_KWARGS" as a parameter.
In the meantime, you can work around this by creating a custom inference.py. See here for an example: https://www.philschmid.de/custom-inference-huggingface-sagemaker
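For BLIP-2, such an inference.py could look roughly like this (a sketch only: the processor/model classes come from transformers, but the fp16/device_map choices and the shape of the request data are assumptions):

```python
# inference.py -- minimal sketch of the custom-handler workaround.
import torch
from transformers import Blip2ForConditionalGeneration, Blip2Processor

MODEL_ID = "Salesforce/blip2-opt-2.7b"


def model_fn(model_dir):
    # Load BLIP-2 in half precision so it fits on a single T4/A10 (assumption).
    processor = Blip2Processor.from_pretrained(MODEL_ID)
    model = Blip2ForConditionalGeneration.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.float16,
        device_map="auto",
    )
    return model, processor


def predict_fn(data, model_and_processor):
    model, processor = model_and_processor
    image = data["image"]  # assumed to be a PIL image produced by the input handler
    inputs = processor(images=image, return_tensors="pt").to(model.device, torch.float16)
    generated_ids = model.generate(**inputs, max_new_tokens=50)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)
```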