Batch transform fails to find container image that is present in model description #820

@mmalahe

System Information

  • Framework (e.g. TensorFlow) / Algorithm (e.g. KMeans): TensorFlow
  • Framework Version: 1.13.1
  • Python Version: 3.6
  • CPU or GPU: GPU
  • Python SDK Version: 1.22.0
  • Are you using a custom image: Yes

Describe the problem

When I create a batch transform job for a model through the Python SDK, the transform call fails because it cannot find the container image in the model description. The failure does not occur when I create a batch transform job for the same model through the web interface.

Minimal repro / logs

I create a model from my own custom image through the web interface. This results in a model called "model-name" with a single container whose image is "xxxxxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/sagemaker/model-name:tag". I then create a batch transform job through the Python SDK.

  • Exact command to reproduce:
from sagemaker.transformer import Transformer

# S3 locations for the batch input and output
input_dir = "s3://some-bucket/test-input/"
output_dir = "s3://some-bucket/test-output/"

# "model-name" is the existing model created through the web interface
transformer = Transformer(
    model_name="model-name",
    instance_count=2,
    instance_type="ml.p2.xlarge",
    assemble_with="Line",
    output_path=output_dir,
    env={"ENV_VAR": "VAL"},
)

# Raises KeyError: 'PrimaryContainer' (see traceback below)
transformer.transform(input_dir, data_type="S3Prefix", split_type="Line")

This results in

Traceback (most recent call last):
...
  File "/home/mmalahe/envs/ml/lib/python3.6/site-packages/sagemaker/transformer.py", line 108, in transform
    base_name = self.base_transform_job_name or base_name_from_image(self._retrieve_image_name())
  File "/home/mmalahe/envs/ml/lib/python3.6/site-packages/sagemaker/transformer.py", line 126, in _retrieve_image_name
    return model_desc['PrimaryContainer']['Image']
KeyError: 'PrimaryContainer'

The _retrieve_image_name function looks for the "PrimaryContainer" key in the model description, but that key is not present. What is present is a key called "Containers", whose value is a list with a single entry, and that entry has "Image" defined correctly. In a debugger:

Running 'cont' or 'step' will restart the program
> /home/mmalahe/envs/ml/lib/python3.6/site-packages/sagemaker/transformer.py(126)_retrieve_image_name()
-> return model_desc['PrimaryContainer']['Image']
(Pdb) model_desc
{'ModelName': 'model-name', 'Containers': [{'Image': 'xxxxxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/sagemaker/model-name:tag', ... }], ... }
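
For illustration, a lookup that tolerates both shapes of the model description could look roughly like the sketch below. The function name retrieve_image_name is hypothetical and is not part of the SDK. The sketch assumes that, when only "Containers" is present, the image of the first entry is an acceptable basis for the job name. The sample dictionary mirrors only the keys visible in the debugger output above.

```python
from typing import Any, Dict


def retrieve_image_name(model_desc: Dict[str, Any]) -> str:
    """Return the container image from a model description dict.

    Hypothetical helper for illustration; not part of the SageMaker SDK.
    """
    # Some model descriptions expose the container under "PrimaryContainer".
    primary = model_desc.get("PrimaryContainer")
    if primary and "Image" in primary:
        return primary["Image"]

    # The model in this report lists its container under "Containers".
    containers = model_desc.get("Containers") or []
    if containers and "Image" in containers[0]:
        # Assumption: the first container's image is good enough to name the job.
        return containers[0]["Image"]

    raise KeyError(
        "model description has no 'Image' under 'PrimaryContainer' or 'Containers'"
    )


# Sample description containing only the keys shown in the debugger session.
model_desc = {
    "ModelName": "model-name",
    "Containers": [
        {"Image": "xxxxxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/sagemaker/model-name:tag"}
    ],
}
print(retrieve_image_name(model_desc))
```

Judging only from the traceback line `self.base_transform_job_name or base_name_from_image(...)`, the image lookup appears to be skipped when a base transform job name is already set on the transformer, which might serve as a stopgap. I have not verified this.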
