Labels: XGBoost, component: training, type: bug
Description
Describe the bug
When using the XGBoost estimator in script mode, users are unable to provide custom tunable hyperparameters in their script. There appears to be a check in the SDK (or the service behind it) that assumes the XGBoost hyperparameters must match those of the built-in algorithm.
To reproduce
import sagemaker
from sagemaker.xgboost import XGBoost
from sagemaker.tuner import ContinuousParameter, IntegerParameter, CategoricalParameter, HyperparameterTuner

sess = sagemaker.Session()
bucket = sess.default_bucket()
role = sagemaker.get_execution_role()

static_hyperparameters = {'num_round': 50}

estimator = XGBoost(
    entry_point='train.py',
    source_dir='xgb_src',
    role=role,
    framework_version='1.2-1',
    model_dir='/opt/ml/model',
    output_path="s3://{}/{}/output".format(bucket, 'xgb-hpo-demo'),
    instance_type='ml.m5.xlarge',
    instance_count=1,
    hyperparameters=static_hyperparameters,
)

train_loc = sess.upload_data(path='./train.csv', bucket=bucket, key_prefix='churn/train')
val_loc = sess.upload_data(path='./validation.csv', bucket=bucket, key_prefix='churn/val')

hyperparameter_range = {
    'eta': ContinuousParameter(0.1, 0.8),
    'feature_xform': CategoricalParameter(['onehot', 'ordinal']),
}

objective_metric_name = 'validation:error'

tuner = HyperparameterTuner(
    estimator,
    objective_metric_name,
    hyperparameter_range,
    strategy='Bayesian',
    max_jobs=4,
    max_parallel_jobs=2,
    objective_type='Minimize',
)

tuner.fit(inputs={"train": train_loc, "validation": val_loc})
Expected behavior
Custom hyperparameters defined in the training script should be accepted as tunable; in script mode, tunable HPO parameters are typically passed through to the script via argparse.
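The entry point itself is not included above. For illustration, a minimal hypothetical xgb_src/train.py that consumes these hyperparameters as command-line arguments might look like the sketch below (everything besides the hyperparameter names is an assumption):

# Hypothetical minimal xgb_src/train.py; the actual entry point is not
# included in this report.
import argparse
import os

import pandas as pd
import xgboost as xgb

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    # Static and tuned hyperparameters arrive as command-line arguments.
    parser.add_argument('--num_round', type=int, default=50)
    parser.add_argument('--eta', type=float, default=0.3)
    # Custom, non-XGBoost hyperparameter controlling feature preprocessing;
    # this is the parameter the tuning job rejects as untunable.
    parser.add_argument('--feature_xform', type=str, default='onehot')
    parser.add_argument('--model_dir', type=str, default=os.environ.get('SM_MODEL_DIR'))
    parser.add_argument('--train', type=str, default=os.environ.get('SM_CHANNEL_TRAIN'))
    args = parser.parse_args()

    train_df = pd.read_csv(os.path.join(args.train, 'train.csv'))
    # ... apply one-hot or ordinal encoding based on args.feature_xform ...
    dtrain = xgb.DMatrix(train_df.iloc[:, 1:], label=train_df.iloc[:, 0])
    booster = xgb.train({'eta': args.eta}, dtrain, num_boost_round=args.num_round)
    booster.save_model(os.path.join(args.model_dir, 'xgboost-model'))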
Screenshots or logs
---------------------------------------------------------------------------
ClientError Traceback (most recent call last)
<ipython-input-16-fa2298a1a26b> in <module>
1 # Train without buckets being parameters
2 channels = {"train": train_loc, "validation": val_loc}
----> 3 tuner.fit(inputs=channels)
/opt/conda/lib/python3.7/site-packages/sagemaker/tuner.py in fit(self, inputs, job_name, include_cls_metadata, estimator_kwargs, wait, **kwargs)
442 """
443 if self.estimator is not None:
--> 444 self._fit_with_estimator(inputs, job_name, include_cls_metadata, **kwargs)
445 else:
446 self._fit_with_estimator_dict(inputs, job_name, include_cls_metadata, estimator_kwargs)
/opt/conda/lib/python3.7/site-packages/sagemaker/tuner.py in _fit_with_estimator(self, inputs, job_name, include_cls_metadata, **kwargs)
453 self._prepare_estimator_for_tuning(self.estimator, inputs, job_name, **kwargs)
454 self._prepare_for_tuning(job_name=job_name, include_cls_metadata=include_cls_metadata)
--> 455 self.latest_tuning_job = _TuningJob.start_new(self, inputs)
456
457 def _fit_with_estimator_dict(self, inputs, job_name, include_cls_metadata, estimator_kwargs):
/opt/conda/lib/python3.7/site-packages/sagemaker/tuner.py in start_new(cls, tuner, inputs)
1507 ]
1508
-> 1509 tuner.sagemaker_session.create_tuning_job(**tuner_args)
1510 return cls(tuner.sagemaker_session, tuner._current_job_name)
1511
/opt/conda/lib/python3.7/site-packages/sagemaker/session.py in create_tuning_job(self, job_name, tuning_config, training_config, training_config_list, warm_start_config, tags)
2027 LOGGER.info("Creating hyperparameter tuning job with name: %s", job_name)
2028 LOGGER.debug("tune request: %s", json.dumps(tune_request, indent=4))
-> 2029 self.sagemaker_client.create_hyper_parameter_tuning_job(**tune_request)
2030
2031 def describe_tuning_job(self, job_name):
/opt/conda/lib/python3.7/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
355 "%s() only accepts keyword arguments." % py_operation_name)
356 # The "self" in this scope is referring to the BaseClient.
--> 357 return self._make_api_call(operation_name, kwargs)
358
359 _api_call.__name__ = str(py_operation_name)
/opt/conda/lib/python3.7/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
674 error_code = parsed_response.get("Error", {}).get("Code")
675 error_class = self.exceptions.from_code(error_code)
--> 676 raise error_class(parsed_response, operation_name)
677 else:
678 return parsed_response
ClientError: An error occurred (ValidationException) when calling the CreateHyperParameterTuningJob operation: The hyperparameter tuning job that you requested has the following untunable hyperparameters: [feature_xform]. For the algorithm, 246618743249.dkr.ecr.us-west-2.amazonaws.com/sagemaker-xgboost:1.2-1, you can tune only [colsample_bytree, lambda, eta, max_depth, alpha, colsample_bynode, num_round, colsample_bylevel, subsample, min_child_weight, max_delta_step, gamma]. Delete untunable hyperparameters.
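As the error message indicates, the request is rejected only because feature_xform is not on the service's allowed list for the built-in XGBoost image. As a sanity check, a tuning job restricted to parameters from that list should start without a ValidationException; a hypothetical sketch reusing the estimator and channels from the reproduction above:

# Hypothetical sanity check: tuning only parameters from the allowed list
# in the error message (e.g. eta, max_depth) avoids the ValidationException,
# suggesting the check is scoped to the built-in algorithm's hyperparameters.
allowed_only_range = {
    'eta': ContinuousParameter(0.1, 0.8),
    'max_depth': IntegerParameter(3, 10),
}
tuner_allowed = HyperparameterTuner(
    estimator,
    objective_metric_name,
    allowed_only_range,
    strategy='Bayesian',
    max_jobs=4,
    max_parallel_jobs=2,
    objective_type='Minimize',
)
tuner_allowed.fit(inputs={"train": train_loc, "validation": val_loc})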
System information
A description of your system. Please provide:
- SageMaker Python SDK version: 2.25.2
- Framework name (eg. PyTorch) or algorithm (eg. KMeans): XGBoost
- Framework version: Any
- Python version: Py3
- CPU or GPU: Doesn't matter
- Custom Docker image (Y/N): N