Description
Describe the feature you'd like
Outside a pipeline context, I used a variable to set use_spot_instances to either True or False.
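```python
import sagemaker.sklearn

# params is loaded elsewhere; params["use_spot_instances"] is a plain Python bool
estimator = sagemaker.sklearn.SKLearn(
    # entry_point, role, instance_type, framework_version, etc. omitted for brevity
    use_spot_instances=params["use_spot_instances"],
    max_run=3 * 3600,  # 3 hours
    max_wait=1800 * 48,  # 24 hours
)
```

Here params["use_spot_instances"] is a boolean. I noticed that setting max_wait without enabling spot training results in an error:

```
ClientError: Failed to invoke sagemaker:CreateTrainingJob. Error Details: Invalid MaxWaitTimeInSeconds. It is only supported when EnableManagedSpotTraining is set to true
```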
So I changed it to:

```python
estimator = sagemaker.sklearn.SKLearn(
    # entry_point, role, instance_type, framework_version, etc. omitted for brevity
    use_spot_instances=params["use_spot_instances"],
    max_run=3 * 3600,
    # Only pass max_wait when spot training is enabled; otherwise leave it unset.
    max_wait=1800 * 48 if params["use_spot_instances"] else None,
)
```

However, if use_spot_instances is a pipeline variable rather than a literal boolean value (which the type annotations allow), then max_wait must be set whenever use_spot_instances may evaluate to True at run time, yet it must be None when I don't want spot training. The problem: I can't use a native Python if/else at build time (i.e., when I define the pipeline) to set the argument only when params["use_spot_instances"] is true, because that value is only known, and the expression should only be evaluated, at run time.
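To make the build-time versus run-time distinction concrete, here is a toy model in plain Python. It is not the SageMaker SDK: BoolParam, IfElse, resolve_value and execute are hypothetical names, loosely in the spirit of deferred expressions such as sagemaker.workflow.functions.Join. The pipeline definition only holds placeholders, and the conditional is resolved separately for each execution.

```python
from dataclasses import dataclass
from typing import Any, Dict


# Hypothetical stand-ins for pipeline variables (NOT SageMaker SDK classes).
@dataclass(frozen=True)
class BoolParam:
    """A named parameter whose value is only known at execution time."""
    name: str

    def resolve(self, args: Dict[str, Any]) -> bool:
        return bool(args[self.name])


@dataclass(frozen=True)
class IfElse:
    """A deferred conditional: the branch is chosen at execution time."""
    condition: BoolParam
    if_true: Any
    if_false: Any

    def resolve(self, args: Dict[str, Any]) -> Any:
        return self.if_true if self.condition.resolve(args) else self.if_false


def resolve_value(value: Any, args: Dict[str, Any]) -> Any:
    """Resolve deferred expressions; plain literals pass through unchanged."""
    return value.resolve(args) if hasattr(value, "resolve") else value


def execute(definition: Dict[str, Any], args: Dict[str, Any]) -> Dict[str, Any]:
    """Simulate one pipeline execution with its own parameter values."""
    return {key: resolve_value(value, args) for key, value in definition.items()}


# Build time: the definition contains placeholders, not booleans.
use_spot = BoolParam("UseSpotInstances")
definition = {
    "use_spot_instances": use_spot,
    "max_run": 3 * 3600,
    "max_wait": IfElse(use_spot, 1800 * 48, None),
}

# A native if/else here tests the placeholder object itself. In this toy the
# object is always truthy, so the spot branch is picked for every execution.
naive_max_wait = 1800 * 48 if use_spot else None

print(execute(definition, {"UseSpotInstances": True}))
# {'use_spot_instances': True, 'max_run': 10800, 'max_wait': 86400}
print(execute(definition, {"UseSpotInstances": False}))
# {'use_spot_instances': False, 'max_run': 10800, 'max_wait': None}
```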
The simplest solution would be to ignore max_wait whenever use_spot_instances evaluates to False, and to emit an info or warning message instead of failing. Then there would be no need for deferred if/else evaluation logic similar to sagemaker.workflow.functions.Join and friends.
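A minimal sketch of the proposed behavior, assuming it is applied once the value of use_spot_instances is actually known (at execution time, not when the pipeline is defined). The helper stopping_fields and its lenient flag are hypothetical and not part of the SageMaker Python SDK; EnableManagedSpotTraining, StoppingCondition, MaxRuntimeInSeconds and MaxWaitTimeInSeconds are the CreateTrainingJob request fields involved. lenient=False mimics today's rejection; lenient=True drops max_wait with a warning.

```python
import warnings
from typing import Any, Dict, Optional


def stopping_fields(
    use_spot_instances: bool,
    max_run: int,
    max_wait: Optional[int] = None,
    lenient: bool = True,
) -> Dict[str, Any]:
    """Build the spot/stopping part of a CreateTrainingJob-style request.

    Hypothetical helper for illustration only.
    """
    stopping: Dict[str, Any] = {"MaxRuntimeInSeconds": max_run}
    if max_wait is not None:
        if use_spot_instances:
            stopping["MaxWaitTimeInSeconds"] = max_wait
        elif lenient:
            # Proposed: ignore the irrelevant setting instead of failing.
            warnings.warn(
                "max_wait is ignored because use_spot_instances is False",
                UserWarning,
            )
        else:
            # Current behavior: the request is rejected outright.
            raise ValueError(
                "Invalid MaxWaitTimeInSeconds. It is only supported when "
                "EnableManagedSpotTraining is set to true"
            )
    return {
        "EnableManagedSpotTraining": use_spot_instances,
        "StoppingCondition": stopping,
    }


print(stopping_fields(True, 3 * 3600, 1800 * 48))
# {'EnableManagedSpotTraining': True, 'StoppingCondition': {'MaxRuntimeInSeconds': 10800, 'MaxWaitTimeInSeconds': 86400}}
print(stopping_fields(False, 3 * 3600, 1800 * 48))  # warns; no MaxWaitTimeInSeconds
```

With this behavior the same pipeline definition could always pass max_wait, and each execution would decide between spot and regular instances.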
How would this feature be used? Please describe.
Allow choosing between spot instances and regular instances for training when starting a pipeline execution, not just at build time (i.e., when creating the pipeline).
Describe alternatives you've considered
- Always use spot or always use regular.
- Upsert the pipeline definition (and lose caching) to switch between the two.