HyperparameterTuner.attach() does not get use_spot_instances setting #1817

@dz902


Describe the bug

When using the following:

PARENT_TUNER = HyperparameterTuner.attach(
    tuning_job_name = PARENT_TUNING_JOB_NAME
)

...on a tuning job whose training job definition contains:

...
        "StoppingCondition": {
            "MaxRuntimeInSeconds": 3600,
            "MaxWaitTimeInSeconds": 7200
        },
        "EnableNetworkIsolation": false,
        "EnableInterContainerTrafficEncryption": false,
        "EnableManagedSpotTraining": true
...

The max_wait and use_spot_instances settings are both None. I traced this back to:

def _prepare_init_params_from_job_description(cls, job_details, model_channel_name=None):
    """Convert the job description to init params that can be handled by the
    class constructor
    Args:
        job_details: the returned job details from a describe_training_job
            API call.
        model_channel_name (str): Name of the channel where pre-trained
            model data will be downloaded.
    Returns:
        dictionary: The transformed init_params
    """
    init_params = dict()
    init_params["role"] = job_details["RoleArn"]
    init_params["instance_count"] = job_details["ResourceConfig"]["InstanceCount"]
    init_params["instance_type"] = job_details["ResourceConfig"]["InstanceType"]
    init_params["volume_size"] = job_details["ResourceConfig"]["VolumeSizeInGB"]
    init_params["max_run"] = job_details["StoppingCondition"]["MaxRuntimeInSeconds"]
    init_params["input_mode"] = job_details["AlgorithmSpecification"]["TrainingInputMode"]
    init_params["base_job_name"] = base_from_name(job_details["TrainingJobName"])
    init_params["output_path"] = job_details["OutputDataConfig"]["S3OutputPath"]
    init_params["output_kms_key"] = job_details["OutputDataConfig"]["KmsKeyId"]
    if "EnableNetworkIsolation" in job_details:
        init_params["enable_network_isolation"] = job_details["EnableNetworkIsolation"]
    has_hps = "HyperParameters" in job_details
    init_params["hyperparameters"] = job_details["HyperParameters"] if has_hps else {}
    if "AlgorithmName" in job_details["AlgorithmSpecification"]:
        init_params["algorithm_arn"] = job_details["AlgorithmSpecification"]["AlgorithmName"]
    elif "TrainingImage" in job_details["AlgorithmSpecification"]:
        init_params["image_uri"] = job_details["AlgorithmSpecification"]["TrainingImage"]
    else:
        raise RuntimeError(
            "Invalid AlgorithmSpecification. Either TrainingImage or "
            "AlgorithmName is expected. None was found."
        )
    if "MetricDefinitons" in job_details["AlgorithmSpecification"]:
        init_params["metric_definitions"] = job_details["AlgorithmSpecification"][
            "MetricsDefinition"
        ]
    if "EnableInterContainerTrafficEncryption" in job_details:
        init_params["encrypt_inter_container_traffic"] = job_details[
            "EnableInterContainerTrafficEncryption"
        ]
    subnets, security_group_ids = vpc_utils.from_dict(job_details.get(vpc_utils.VPC_CONFIG_KEY))
    if subnets:
        init_params["subnets"] = subnets
    if security_group_ids:
        init_params["security_group_ids"] = security_group_ids
    if "InputDataConfig" in job_details and model_channel_name:
        for channel in job_details["InputDataConfig"]:
            if channel["ChannelName"] == model_channel_name:
                init_params["model_channel_name"] = model_channel_name
                init_params["model_uri"] = channel["DataSource"]["S3DataSource"]["S3Uri"]
                break
    return init_params

It seems use_spot_instances and max_wait are never read from the job description, so they do not get carried over to the newly created estimator.
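To illustrate the kind of mapping that appears to be missing, here is a minimal sketch that reads the two spot-related fields from a job description dict. The helper name spot_params_from_job_description is hypothetical and is not part of the SDK; the dict keys are the ones shown in the job definition above, and this is not meant as the actual fix.

```python
def spot_params_from_job_description(job_details):
    """Hypothetical helper (not part of the SageMaker SDK).

    Extracts the spot-training init params from a dict shaped like the
    job definition quoted in this issue.
    """
    params = {}
    # Only set the flag when the key is present in the job description.
    if "EnableManagedSpotTraining" in job_details:
        params["use_spot_instances"] = bool(job_details["EnableManagedSpotTraining"])
    # MaxWaitTimeInSeconds sits next to MaxRuntimeInSeconds in StoppingCondition.
    max_wait = job_details.get("StoppingCondition", {}).get("MaxWaitTimeInSeconds")
    if max_wait is not None:
        params["max_wait"] = max_wait
    return params


job_details = {
    "StoppingCondition": {"MaxRuntimeInSeconds": 3600, "MaxWaitTimeInSeconds": 7200},
    "EnableManagedSpotTraining": True,
}
print(spot_params_from_job_description(job_details))
# prints {'use_spot_instances': True, 'max_wait': 7200}
```

Something along these lines could be merged into init_params, assuming the estimator constructor accepts use_spot_instances and max_wait as keyword arguments.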

To reproduce

See above.

Expected behavior

use_spot_instances, max_wait, and related settings should all be carried over to the newly attach()ed tuner. This also affects warm start helpers like identical_dataset_and_algorithm_tuner().
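Until this is fixed, one possible workaround is to copy the settings onto the estimator by hand after attaching. The sketch below uses a stand-in object instead of a real estimator so that it runs without AWS access; restore_spot_settings is a hypothetical helper, and whether the SDK honors attributes set this way when launching new jobs is an assumption I have not verified.

```python
from types import SimpleNamespace


def restore_spot_settings(estimator, job_details):
    """Hypothetical workaround (not part of the SageMaker SDK).

    Copies the spot settings from a job description dict onto an
    estimator-like object and returns that same object.
    """
    estimator.use_spot_instances = bool(
        job_details.get("EnableManagedSpotTraining", False)
    )
    estimator.max_wait = job_details.get("StoppingCondition", {}).get(
        "MaxWaitTimeInSeconds"
    )
    return estimator


# Stand-in for the estimator of an attached tuner, with both settings lost.
stub_estimator = SimpleNamespace(use_spot_instances=None, max_wait=None)
restore_spot_settings(
    stub_estimator,
    {
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600, "MaxWaitTimeInSeconds": 7200},
        "EnableManagedSpotTraining": True,
    },
)
print(stub_estimator.use_spot_instances, stub_estimator.max_wait)
# prints True 7200
```

With a real tuner, the object passed in would presumably be the estimator held by PARENT_TUNER, and the dict would come from a describe_training_job call on one of its training jobs.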


**System information**
A description of your system. Please provide:
- **SageMaker Python SDK version**: v2.0.0
- **Framework name (eg. PyTorch) or algorithm (eg. KMeans)**:
- **Framework version**:
- **Python version**:
- **CPU or GPU**:
- **Custom Docker image (Y/N)**: N, official image classification image

