System Information
- Framework (e.g. TensorFlow) / Algorithm (e.g. KMeans): TensorFlow
- Framework Version: 1.13.1
- Python Version: 3.6
- CPU or GPU: GPU
- Python SDK Version: 1.22.0
- Are you using a custom image: Yes
Describe the problem
When I create a batch transform job for a model through the Python SDK, the transform call fails with a KeyError while looking up the model's container image. The error does not occur when I create a batch transform job for the same model through the web interface.
Minimal repro / logs
I create a model from my own custom image through the web interface. This results in a model called "model-name" with a single container whose image is "xxxxxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/sagemaker/model-name:tag". I then create a batch transform job through the Python SDK.
- Exact command to reproduce:
    from sagemaker.transformer import Transformer

    input_dir = "s3://some-bucket/test-input/"
    output_dir = "s3://some-bucket/test-output/"
    transformer = Transformer(
        model_name="model-name",
        instance_count=2,
        instance_type='ml.p2.xlarge',
        assemble_with='Line',
        output_path=output_dir,
        env={'ENV_VAR': 'VAL'},
    )
    transformer.transform(input_dir, data_type='S3Prefix', split_type='Line')
This results in
    Traceback (most recent call last):
      ...
      File "/home/mmalahe/envs/ml/lib/python3.6/site-packages/sagemaker/transformer.py", line 108, in transform
        base_name = self.base_transform_job_name or base_name_from_image(self._retrieve_image_name())
      File "/home/mmalahe/envs/ml/lib/python3.6/site-packages/sagemaker/transformer.py", line 126, in _retrieve_image_name
        return model_desc['PrimaryContainer']['Image']
    KeyError: 'PrimaryContainer'
The _retrieve_image_name function looks for the "PrimaryContainer" key in the model description, which is not present. What is present is a key called "Containers", a list with a single entry whose "Image" is set correctly. In a debugger:
Running 'cont' or 'step' will restart the program
> /home/mmalahe/envs/ml/lib/python3.6/site-packages/sagemaker/transformer.py(126)_retrieve_image_name()
-> return model_desc['PrimaryContainer']['Image']
(Pdb) model_desc
{'ModelName': 'model-name', 'Containers': [{'Image': 'xxxxxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/sagemaker/model-name:tag', ... }], ... }
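For illustration, here is a minimal sketch of a lookup that would tolerate both shapes of the model description seen above. The function name image_from_model_description is hypothetical and not part of the SDK. It assumes only what the debugger output shows: the description is a dict that carries either a "PrimaryContainer" dict or a "Containers" list, each holding an "Image" entry.

```python
def image_from_model_description(model_desc):
    """Return the container image URI from a model description dict.

    Hypothetical helper, not part of the SageMaker SDK. It prefers
    'PrimaryContainer' when present and otherwise falls back to the
    first entry of the 'Containers' list.
    """
    primary = model_desc.get("PrimaryContainer")
    if primary and "Image" in primary:
        return primary["Image"]

    containers = model_desc.get("Containers") or []
    if containers and "Image" in containers[0]:
        return containers[0]["Image"]

    raise KeyError("no container image found in model description")


# The shape reported in the debugger session above.
desc = {
    "ModelName": "model-name",
    "Containers": [
        {"Image": "xxxxxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/sagemaker/model-name:tag"}
    ],
}
image = image_from_model_description(desc)
```

Judging only from line 108 of the traceback, the image lookup is the right-hand side of an `or`, so setting base_transform_job_name on the Transformer might sidestep the failing call as a temporary workaround. I have not verified this.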