HuggingFaceModel does not properly accept script mode environment variables #3361

@athewsey

Description

Describe the bug

While Model / FrameworkModel's prepare_container_def() supports (here) manually configuring script mode environment variables for an existing model.tar.gz package, HuggingFaceModel's override implementation does not (here). User-configured entries such as env={"SAGEMAKER_PROGRAM": ..., "SAGEMAKER_SUBMIT_DIRECTORY": ...} are ignored, regardless of whether re-packing of new entry point code is requested.

This is important when importing large (multi-GB) pre-trained models into SageMaker inference, because it forces us to use the SDK class's re-packing functionality to add inference code, which is significantly slower in some cases and can add tens of minutes of extra delay.

To reproduce

  • Prepare a model.tar.gz in S3 that already contains a code/inference.py alongside (whatever) model artifacts. For a simple reproduction, you could use no model artifacts at all and add a trivial custom model loader to inference.py, e.g. def model_fn(model_dir): return lambda x: x (see the packaging sketch after these steps).

In my current use case, my model artifacts are about 5GB and constructing/uploading this archive takes ~10 min, regardless of whether the small script code is included.

  • Create and deploy a Hugging Face Model from the archive on S3 via SageMaker Python SDK, indicating what code directory and entry point should be used:
model = HuggingFaceModel(
    model_data="s3://.../model.tar.gz",  # (contains code/inference.py)
    role=sagemaker.get_execution_role(),
    py_version="py38",
    pytorch_version="1.10",
    transformers_version="4.17",
    env={
        "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
        "SAGEMAKER_PROGRAM": "inference.py",
        "SAGEMAKER_REGION": "ap-southeast-1",
        "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code",
    },
)
predictor = model.deploy(instance_type="ml.g4dn.xlarge", initial_instance_count=1)
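
For reference, a minimal sketch of the step-1 packaging (the no-op model_fn and file names are illustrative only; real model artifacts would sit alongside the code/ folder):

# code/inference.py -- trivial custom model loader for a minimal reproduction
def model_fn(model_dir):
    return lambda x: x  # no-op "model" so the endpoint can start without artifacts

# pack_model.py -- build the archive with the script already inside
import tarfile

with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add("code/inference.py")  # real model artifacts would be added here too
# then upload, e.g.: aws s3 cp model.tar.gz s3://.../model.tar.gz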

Observed behavior

The endpoint fails to find the inference.py entry point (and therefore does not use the custom model_fn(), so the model fails to load).

This is because HuggingFaceModel overrides the SAGEMAKER_PROGRAM and SAGEMAKER_SUBMIT_DIRECTORY environment variables with empty values even though no entry_point or source_dir is provided.
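
A simplified sketch of the mechanism (paraphrased from my reading of the SDK, not its exact code):

# Inside HuggingFaceModel.prepare_container_def (paraphrased, not exact SDK code):
deploy_env = dict(self.env)                      # user-supplied env vars
deploy_env.update(self._script_mode_env_vars())  # applied unconditionally
# _script_mode_env_vars() falls back to "" when entry_point / source_dir are
# unset, so the user's SAGEMAKER_PROGRAM / SAGEMAKER_SUBMIT_DIRECTORY values
# get clobbered with empty strings.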

Expected behavior

The HuggingFaceModel should correctly propagate the user-specified environment variables, to support using a pre-prepared model.tar.gz without re-packing. In this case, the container would find the pre-packaged inference.py entry point and correctly use the overridden model_fn().

Screenshots or logs

N/A

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: 2.92.1
  • Framework name (e.g. PyTorch) or algorithm (e.g. KMeans): HuggingFace
  • Framework version: 4.17
  • Python version: py38
  • CPU or GPU: GPU
  • Custom Docker image (Y/N): N

Additional context

I am able to deploy a working endpoint by keeping my code folder and inference.py locally and adding these options to the model (sketched below): HuggingFaceModel(source_dir="code", entry_point="inference.py", ...).
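
For completeness, a sketch of that working (but slow, re-packing) configuration, assuming the same versions and paths as above:

model = HuggingFaceModel(
    model_data="s3://.../model-raw.tar.gz",  # artifacts only; code is added by re-packing
    role=sagemaker.get_execution_role(),
    py_version="py38",
    pytorch_version="1.10",
    transformers_version="4.17",
    source_dir="code",          # triggers the SDK's download / re-pack / re-upload
    entry_point="inference.py",
)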

The problem is that this approach more than doubles the time and resources taken to prepare the package:

  • ~10 min to produce the initial model-raw.tar.gz and upload it to S3
  • ~10 min for the SageMaker SDK to download that archive, extract and re-pack it to add the code folder, and re-upload it to a new location

Since the use case here is just to prepare the model from local artifacts + code, it would also be acceptable if model_data could accept a local, uncompressed folder, as the ~10 min tarball creation would then only need to happen once. From my tests, though, this doesn't seem to be possible?
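
Until this is fixed, one possible (untested) workaround sketch is to subclass HuggingFaceModel and re-apply the user-supplied env after the parent builds the container definition (class name is hypothetical):

from sagemaker.huggingface import HuggingFaceModel

class PrePackagedHuggingFaceModel(HuggingFaceModel):
    """Hypothetical subclass: preserve user-supplied script-mode env vars."""

    def prepare_container_def(self, *args, **kwargs):
        container_def = super().prepare_container_def(*args, **kwargs)
        # Re-apply user env so SAGEMAKER_PROGRAM / SAGEMAKER_SUBMIT_DIRECTORY
        # are not blanked out when no entry_point / source_dir is given.
        container_def["Environment"].update(self.env or {})
        return container_def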
