Describe the bug
While Model / FrameworkModel's prepare_container_def() supports (here) manually configuring script mode environment variables for an existing model.tar.gz package, HuggingFaceModel's override implementation does not (here). User-configured env={"SAGEMAKER_PROGRAM": ..., "SAGEMAKER_SUBMIT_DIRECTORY": ..., ...} are ignored regardless of whether re-packing of new entry point code is requested.
This matters when importing large (multi-GB) pre-trained models to SageMaker inference, because it forces use of the SDK class's re-packing functionality to add inference code, which is significantly slower in some cases: it can add tens of minutes of extra delay.
To reproduce
- Prepare a model.tar.gz in S3, already containing a code/inference.py alongside (whatever) model artifacts. For a simple reproduction, you could use no model artifacts at all and add a trivial custom model loader to inference.py, something like model_fn(data_dir): return lambda x: x (a minimal sketch appears after these steps). In my current use case, my model artifacts are about 5GB and constructing/uploading this archive takes ~10min, regardless of whether the small script code is included.
- Create and deploy a Hugging Face Model from the archive on S3 via the SageMaker Python SDK, indicating which code directory and entry point should be used:
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

model = HuggingFaceModel(
    model_data="s3://.../model.tar.gz",  # (Contains code/inference.py)
    role=sagemaker.get_execution_role(),
    py_version="py38",
    pytorch_version="1.10",
    transformers_version="4.17",
    env={
        "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
        "SAGEMAKER_PROGRAM": "inference.py",
        "SAGEMAKER_REGION": "ap-southeast-1",
        "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code",
    },
)
predictor = model.deploy(instance_type="ml.g4dn.xlarge", initial_instance_count=1)
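For reference, a minimal inference.py along these lines is enough to reproduce the issue (a sketch only; the identity model_fn stands in for real artifact loading, and predict_fn just applies it):

# code/inference.py - minimal entry point packaged inside model.tar.gz
def model_fn(model_dir):
    # Trivial "model"; a real handler would load artifacts from model_dir
    return lambda x: x

def predict_fn(input_data, model):
    # Apply the (identity) model to the deserialized request payload
    return model(input_data)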
Observed behavior
The endpoint will fail to find the inference.py entry point (and therefore will not use the custom model_fn(), and will fail to load the model).
This is because HuggingFaceModel overrides the SAGEMAKER_PROGRAM and SAGEMAKER_SUBMIT_DIRECTORY environment variables to empty values, even though no entry_point or source_dir was provided.
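To illustrate the effect (a simplified sketch of the merge, not the exact SDK source): the script-mode variables derived from entry_point/source_dir are merged over the user-supplied env, so the empty placeholders win:

# Simplified sketch of the problematic merge (illustrative, not SDK source)
user_env = {
    "SAGEMAKER_PROGRAM": "inference.py",
    "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code",
}
script_mode_env = {
    "SAGEMAKER_PROGRAM": "",           # entry_point is None -> ""
    "SAGEMAKER_SUBMIT_DIRECTORY": "",  # source_dir is None -> ""
}
deploy_env = dict(user_env)
deploy_env.update(script_mode_env)  # user-specified values are clobbered with ""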
Expected behavior
The HuggingFaceModel should correctly propagate the user-specified environment variables, to support using a pre-prepared model.tar.gz without re-packing. In this case, the container would find the pre-loaded inference.py entry point and correctly use the override model_fn.
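One possible shape for a fix, continuing the sketch above (illustrative only, not a patch against the actual SDK code): only apply script-mode values that were actually derived, so user-provided settings survive:

# Sketch: skip empty script-mode placeholders so user env wins
user_env = {
    "SAGEMAKER_PROGRAM": "inference.py",
    "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code",
}
script_mode_env = {"SAGEMAKER_PROGRAM": "", "SAGEMAKER_SUBMIT_DIRECTORY": ""}
deploy_env = dict(user_env)
deploy_env.update({k: v for k, v in script_mode_env.items() if v})
assert deploy_env["SAGEMAKER_PROGRAM"] == "inference.py"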
Screenshots or logs
N/A
System information
A description of your system. Please provide:
- SageMaker Python SDK version: 2.92.1
- Framework name (e.g. PyTorch) or algorithm (e.g. KMeans): HuggingFace
- Framework version: 4.17
- Python version: py38
- CPU or GPU: GPU
- Custom Docker image (Y/N): N
Additional context
I am able to deploy a working endpoint by having my code folder and inference.py locally and adding these options to the model: HuggingFaceModel(source_dir="code", entry_point="inference.py", ...).
The problem is this doubles the time and resources taken to prepare the package:
- ~10min to produce an initial "model-raw.tar.gz" and upload it to S3
- ~10min for the SageMaker SDK to download that archive, extract and re-pack it to add the code folder, and re-upload it to a new location
Since the use case here is just to prepare the model from local artifacts + code, it would also be OK if model_data were able to accept a local, uncompressed folder, as the ~10min tarball creation would still only need to be done once. From my tests, though, this doesn't seem to be possible?
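For context, the one-time packaging step looks roughly like this (a sketch; the artifacts/ and code/ paths and the bucket are placeholders, and S3Uploader is the SDK's upload helper):

import tarfile
from sagemaker.s3 import S3Uploader

# Build model.tar.gz once, with inference code already under code/,
# so no SDK-side re-packing would be needed afterwards
with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add("artifacts", arcname=".")  # multi-GB model artifacts (placeholder path)
    tar.add("code", arcname="code")    # contains code/inference.py

model_data = S3Uploader.upload("model.tar.gz", "s3://my-bucket/models")  # placeholder URI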