Parametrising spot instance use in a Sagemaker Pipeline is not possible #4140

@lorenzwalthert

Describe the feature you'd like

Outside a pipeline context, I used a variable to set use_spot_instances to either True or False.

sagemaker.sklearn.SKLearn(
        use_spot_instances=params["use_spot_instances"],
        max_run=3 * 3600, 
        max_wait=1800 * 48,  
    )

Where params["use_spot_instances"] is a boolean. Note that setting max_wait without enabling spot training results in an error:

ClientError: Failed to invoke sagemaker:CreateTrainingJob. Error Details: Invalid MaxWaitTimeInSeconds. It is only supported when EnableManagedSpotTraining is set to true

So I changed it to

sagemaker.sklearn.SKLearn(
        use_spot_instances=params["use_spot_instances"],
        max_run=3 * 3600, 
        max_wait=1800 * 48 if params["use_spot_instances"] else None,  
    )

However, if use_spot_instances is a pipeline variable instead of a literal boolean (which the type annotations allow), max_wait must be set whenever use_spot_instances evaluates to True at run time, and it must be None when I don't want spot training. The problem: I can't use a native Python if/else at build time (i.e. when I define the pipeline) to set the argument only when params["use_spot_instances"] is true, because the expression should only be evaluated at run time.
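To illustrate the build-time problem, here is a minimal sketch that does not use the SageMaker SDK at all. `BuildTimePlaceholder` is a hypothetical stand-in for a pipeline parameter; whether the real pipeline variable classes behave exactly like this under a truth test is an assumption, but the underlying point holds for any object that has no concrete value until an execution starts.

```python
class BuildTimePlaceholder:
    """Hypothetical stand-in for a pipeline parameter (not an SDK class).

    It carries a name and a default, but no concrete value exists until
    a pipeline execution is started.
    """

    def __init__(self, name, default_value):
        self.name = name
        self.default_value = default_value


use_spot = BuildTimePlaceholder("UseSpotInstances", default_value=False)

# A plain Python object is truthy, so this branch is decided once, while the
# pipeline is being defined, regardless of the value chosen at execution time.
max_wait = 1800 * 48 if use_spot else None

print(max_wait)
```

Even though the default is False, max_wait ends up set, which is exactly the combination the service rejects when the execution runs without spot training.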

The simplest solution would be to ignore max_wait whenever use_spot_instances is not enabled, and emit an info or warning message instead of failing. Then there would be no need for delayed if/else evaluation logic similar to sagemaker.workflow.functions.Join and friends.
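The requested behaviour (drop max_wait when spot training is off and warn instead of failing) could look roughly like the sketch below. `build_training_request_fields` is a hypothetical helper written for this issue, not an SDK function. It assumes the check runs at a point where the spot flag has already been resolved to a real boolean. The dictionary keys follow the CreateTrainingJob field names that appear in the error message above.

```python
import warnings


def build_training_request_fields(use_spot_instances, max_run, max_wait=None):
    """Hypothetical helper: assemble the spot-related request fields.

    max_wait is only forwarded when spot training is enabled; otherwise it is
    dropped with a warning instead of causing the request to be rejected.
    """
    fields = {
        "EnableManagedSpotTraining": bool(use_spot_instances),
        "StoppingCondition": {"MaxRuntimeInSeconds": max_run},
    }
    if max_wait is not None:
        if use_spot_instances:
            fields["StoppingCondition"]["MaxWaitTimeInSeconds"] = max_wait
        else:
            warnings.warn(
                "max_wait is ignored because use_spot_instances is False",
                UserWarning,
            )
    return fields


# On-demand run: max_wait is silently dropped (apart from the warning).
on_demand_fields = build_training_request_fields(False, 3 * 3600, 1800 * 48)

# Spot run: max_wait is passed through unchanged.
spot_fields = build_training_request_fields(True, 3 * 3600, 1800 * 48)
```

With this behaviour the same estimator arguments would be valid for both spot and on-demand executions, so no build-time branching would be needed.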

How would this feature be used? Please describe.

Allow choosing between spot instances and regular instances for training when starting a pipeline execution, not just at build time (i.e. when creating the pipeline).

Describe alternatives you've considered

  • Always use spot or always use regular.
  • Upsert the pipeline definition (and lose caching) to switch between the two.
