Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sagemaker.image_uris.retrieve returns a wrong image when instance type comes from a ParameterString #3024

Closed
jonsnowseven opened this issue Mar 23, 2022 · 13 comments
Assignees
Labels
bug component: pipelines Relates to the SageMaker Pipeline Platform

Comments

@jonsnowseven
Copy link

jonsnowseven commented Mar 23, 2022

Describe the bug
sagemaker.image_uris.retrieve returns a wrong image when instance type comes from a Parameter

To reproduce

import sagemaker
from sagemaker.workflow.parameters import ParameterString 

training_instance_type = ParameterString(
    name="TrainingInstanceType",
    default_value="ml.m5.xlarge",
)
...
tf_model_image_uri = sagemaker.image_uris.retrieve(
    framework="tensorflow",
    region=region,
    version="1.15",
    image_scope="training",
    py_version="py3",
    instance_type=training_instance_type, # ParameterString
)

estimator = TensorFlowimage_uri=tf_model_image_uri,
    ...
)
... # rest of the code for the SageMaker Pipeline definition

pipeline.start(dict(TrainingInstanceType="ml.p2.xlarge"))

# after starting an execution, the image that is inputed into the estimator is 
# <account>.dkr.ecr.<region>.amazonaws.com/tensorflow-training:1.15-cpu-py3 
# which is a CPU image **instead of GPU image**

Expected behavior
Lazy evaluation of the image to allow getting the actual parameter value for the instance type.

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: 2.80.0
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): TensorFlow
  • Framework version: 1.15
  • Python version: 3.7
  • CPU or GPU: GPU
  • Custom Docker image (Y/N): N
@ahsan-z-khan ahsan-z-khan self-assigned this Mar 24, 2022
@ahsan-z-khan
Copy link
Member

Hi @jonsnowseven ,

Thanks for using Amazon SageMaker.

When I do print(tf_model_image_uri) it returns me 763104351884.dkr.ecr.us-west-2.amazonaws.com/tensorflow-training:1.15-cpu-py3 which is expected. That means ParameterString is working correctly as expected. Is the issue different? Are you saying this pipeline.start(dict(TrainingInstanceType="ml.p2.xlarge")) is not working as expected?

@ahsan-z-khan ahsan-z-khan added the component: pipelines Relates to the SageMaker Pipeline Platform label Mar 24, 2022
@ahsan-z-khan ahsan-z-khan removed their assignment Mar 24, 2022
@jonsnowseven
Copy link
Author

@ahsan-z-khan, yes.

When I start a pipeline execution, the image that is retrieved into the estimator is not the correct one (should be a gpu image). Sorry, the code points to a different thing. I'll fix it.

@qidewenwhen qidewenwhen self-assigned this Mar 25, 2022
@jerrypeng7773
Copy link
Contributor

Hey @jonsnowseven, the sagemaker.image_uris.retrieve will get executed during compile time, however, the parameter you supplied can only be evaluated at pipeline runtime. Hence, sagemaker.image_uris.retrieve won't support parameterization.

@qidewenwhen
Copy link
Member

qidewenwhen commented Mar 30, 2022

the sagemaker.image_uris.retrieve will get executed during compile time, however, the parameter you supplied can only be evaluated at pipeline runtime. Hence, sagemaker.image_uris.retrieve won't support parameterization.

To give more context, the compile time means the time when generating the pipeline definition in Python SDK stage, while the pipeline runtime means the time when pipeline is running an execution in backend.
Therefore, as @jerrypeng7773 mentioned above, the sagemaker.image_uris.retrieve does not support inputs of any Pipeline variables such as ParameterString, Properties, JsonGet, Join etc. since these Pipeline variables are not parsed/evaluated in Python SDK scope.

I'll update to throw exception if Pipeline variables are passed into the sagemaker.image_uris.retrieve to prevent such behaviors.

@jonsnowseven
Copy link
Author

Thanks for the explanation @jerrypeng7773 and @qidewenwhen!

On which version will that change be included, @qidewenwhen?

@qidewenwhen
Copy link
Member

I'll update to throw exception if Pipeline variables are passed into the sagemaker.image_uris.retrieve to prevent such behaviors.

Sorry for the delay. Just opened a PR for this action item. Once the PR gets merged, it will be released within a week. Will update the release version at that time.

@qidewenwhen
Copy link
Member

The PR has been released in v2.91.0, closing this issue.

@jonsnowseven feel free to reopen if you have any questions.

@qidewenwhen
Copy link
Member

Reopen this issue as the release needs to be reverted for some reason. Will update once it's ready to put back.

@jerrypeng7773
Copy link
Contributor

@jonsnowseven can you confirm PR 3130 unblocked you, let us know, thanks.

@qidewenwhen
Copy link
Member

In v2.92.0, we added a validation in image_uris.retrieve, which explicitly throws exceptions when passing in any pipeline variables.

@m3et
Copy link

m3et commented May 29, 2022

commit has breaking changes, please give proper note in advance next time.

@pranjal-joshi
Copy link

Hi, facing similar issue.
Even though I've hardcoded the image URI in pipeline.py I get following exception.

Exception: instance_type should not be a pipeline variable (<class 'sagemaker.workflow.parameters.ParameterString'>)
Traceback (most recent call last):
  File "/root/.pyenv/versions/3.8.13/lib/python3.8/site-packages/pipelines/run_pipeline.py", line 77, in main
    pipeline = get_pipeline_driver(args.module_name, args.kwargs)
  File "/root/.pyenv/versions/3.8.13/lib/python3.8/site-packages/pipelines/_utils.py", line 33, in get_pipeline_driver
    return _imports.get_pipeline(**kwargs)
  File "/root/.pyenv/versions/3.8.13/lib/python3.8/site-packages/pipelines/nhg/pipeline.py", line 118, in get_pipeline
    step_args = estimator.fit(
  File "/root/.pyenv/versions/3.8.13/lib/python3.8/site-packages/sagemaker/workflow/pipeline_context.py", line 206, in wrapper
    run_func(*args, **kwargs)
  File "/root/.pyenv/versions/3.8.13/lib/python3.8/site-packages/sagemaker/estimator.py", line 976, in fit
    self._prepare_for_training(job_name=job_name)
  File "/root/.pyenv/versions/3.8.13/lib/python3.8/site-packages/sagemaker/estimator.py", line 2700, in _prepare_for_training
    super(Framework, self)._prepare_for_training(job_name=job_name)
  File "/root/.pyenv/versions/3.8.13/lib/python3.8/site-packages/sagemaker/estimator.py", line 621, in _prepare_for_training
    self._current_job_name = self._get_or_create_name(job_name)
  File "/root/.pyenv/versions/3.8.13/lib/python3.8/site-packages/sagemaker/estimator.py", line 595, in _get_or_create_name
    self._ensure_base_job_name()
  File "/root/.pyenv/versions/3.8.13/lib/python3.8/site-packages/sagemaker/estimator.py", line 579, in _ensure_base_job_name
    or base_name_from_image(self.training_image_uri())
  File "/root/.pyenv/versions/3.8.13/lib/python3.8/site-packages/sagemaker/estimator.py", line 2847, in training_image_uri
    return image_uris.get_training_image_uri(
  File "/root/.pyenv/versions/3.8.13/lib/python3.8/site-packages/sagemaker/image_uris.py", line 504, in get_training_image_uri
    return retrieve(
  File "/root/.pyenv/versions/3.8.13/lib/python3.8/site-packages/sagemaker/image_uris.py", line 117, in retrieve
    raise ValueError("%s should not be a pipeline variable (%s)" % (name, type(val)))
ValueError: instance_type should not be a pipeline variable (<class 'sagemaker.workflow.parameters.ParameterString'>)

Any idea about what is going wrong as all my ParameterString have default values set properly and works nicely through the manual trigger of pipeline from SageMaker studio.
CC: @rdadlaney

@qidewenwhen
Copy link
Member

Hi @pranjal-joshi, could you open a new issue for the issue if it still persists?

Please add 1) sample code snippet and 2) the SDK version in the new issue to help us for further investigation.

I suspect your SDK version is not up-to-date compared with that in your SM studio

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug component: pipelines Relates to the SageMaker Pipeline Platform
Projects
None yet
Development

No branches or pull requests

6 participants