Skip to content

Latest commit

 

History

History
468 lines (327 loc) · 19.1 KB

v2.rst

File metadata and controls

468 lines (327 loc) · 19.1 KB

Use Version 2.x of the SageMaker Python SDK

Installation

To install the latest version:

pip install --upgrade sagemaker

If you are executing this pip install command in a notebook, make sure to restart your kernel.

Breaking Changes

This section is for major changes that may require updates to your SageMaker Python SDK code. For the full list of changes, see the CHANGELOG.

Removals

Python 2 Support

This library is no longer compatible with Python 2. Python 2 has been EOL since January 1, 2020. Please upgrade to Python 3 if you haven't already.

Remove Legacy TensorFlow

TensorFlow versions 1.4-1.10 and some variations of versions 1.11-1.12 (see What Constitutes "Legacy TensorFlow Support") are no longer natively supported by the SageMaker Python SDK.

To use those versions of TensorFlow, you must specify the Docker image URI explicitly, and configure settings via hyperparameters or environment variables rather than using SDK parameters. For more information, see Upgrade from Legacy TensorFlow Support.

SageMaker Python SDK CLI

The SageMaker Python SDK CLI has been removed. (This is different from the AWS CLI.)

delete_endpoint() for Estimators and HyperparameterTuner

The delete_endpoint() method for estimators and HyperparameterTuner is now a no-op. Please use sagemaker.predictor.Predictor.delete_endpoint instead.

update_endpoint in deploy()

The update_endpoint argument in deploy() methods for estimators and models is now a no-op. Please use sagemaker.predictor.Predictor.update_endpoint instead.

serializer and deserializer in create_model()

The serializer and deserializer arguments in sagemaker.estimator.Estimator.create_model are now no-ops. Please specify serializers and deserializers in deploy() methods instead.

content_type and accept in the Predictor Constructor

The content_type and accept parameters are now no-ops in the following classes and methods:

  • sagemaker.predictor.Predictor
  • sagemaker.estimator.Estimator.create_model
  • sagemaker.algorithms.AlgorithmEstimator.create_model
  • sagemaker.tensorflow.model.TensorFlowPredictor

Please specify content types in a serializer or deserializer class instead.

Changes in Default Behavior

Require framework_version and py_version for Frameworks

Framework estimator and model classes now require framework_version and py_version instead of supplying defaults, unless an image URI is explicitly supplied.

For example:

from sagemaker.tensorflow import TensorFlow

TensorFlow(
    entry_point="script.py",
    framework_version="2.2.0",  # now required
    py_version="py37",  # now required
    role="my-role",
    instance_type="ml.m5.xlarge",
    instance_count=1,
)

from sagemaker.mxnet import MXNetModel

MXNetModel(
    model_data="s3://bucket/model.tar.gz",
    role="my-role",
    entry_point="inference.py",
    framework_version="1.6.0",  # now required
    py_version="py3",  # now required
)

Log Display Behavior with attach()

Logs are no longer printed when using attach() with an estimator. To view logs after attaching a training job to an estimator, use sagemaker.estimator.EstimatorBase.logs.

HyperparameterTuner.fit() and Transformer.transform()

sagemaker.tuner.HyperparameterTuner.fit and sagemaker.transformer.Transformer.transform now wait until the completion of the Hyperparameter Tuning Job or Batch Transform Job, respectively. To make the function non-blocking, use wait=False.

XGBoost Predictor

The default serializer of sagemaker.xgboost.model.XGBoostPredictor has been changed from NumpySerializer to LibSVMSerializer.

Parameter Order Changes

sagemaker.model.Model Parameter Order

The parameter order for sagemaker.model.Model changed: instead of model_data being first, image_uri (formerly image) is first. As a result, model_data has been made into an optional parameter.

If you are using the sagemaker.model.Model class, your code should be changed as follows:

# v1.x
Model("s3://bucket/path/model.tar.gz", "my-image:latest")

# v2.0 and later
Model("my-image:latest", model_data="s3://bucket/path/model.tar.gz")

Airflow Parameter Order

For sagemaker.workflow.airflow.model_config and sagemaker.workflow.airflow.model_config_from_estimator, instance_type is no longer the first positional argument and is now an optional keyword argument.

Dependency Changes

SciPy

SciPy is no longer a required dependency of the SageMaker Python SDK.

If you use sagemaker.amazon.common.write_spmatrix_to_sparse_tensor and don't already install SciPy in your environment, you can use our scipy installation target:

pip install sagemaker[scipy]

TensorFlow

The tensorflow installation target has been removed, as it is no longer needed for any SageMaker Python SDK functionality.

If you want to install TensorFlow, see the TensorFlow documentation.

Non-Breaking Changes

Deprecations

Pre-instantiated Serializer and Deserializer Objects

The csv_serializer, json_serializer, npy_serializer, csv_deserializer, json_deserializer, and numpy_deserializer objects have been deprecated.

Please instantiate the objects instead.

v1.x v2.0 and later
sagemaker.predictor.csv_serializer sagemaker.serializers.CSVSerializer()
sagemaker.predictor.json_serializer sagemaker.serializers.JSONSerializer()
sagemaker.predictor.npy_serializer sagemaker.serializers.NumpySerializer()
sagemaker.predictor.csv_deserializer sagemaker.deserializers.CSVDeserializer()
sagemaker.predictor.json_deserializer sagemaker.deserializers.JSONDeserializer()
sagemaker.predictor.numpy_deserializer sagemaker.deserializers.NumpyDeserializer()

sagemaker.content_types

The sagemaker.content_types module is deprecated in v2.0 and later of the SageMaker Python SDK.

Instead of importing constants from sagemaker.content_types, explicitly write MIME types as a string.

v1.x v2.0 and later
CONTENT_TYPE_JSON "application/json"
CONTENT_TYPE_CSV "text/csv"
CONTENT_TYPE_OCTET_STREAM "application/octet-stream"
CONTENT_TYPE_NPY "application/x-npy"

Image URI Functions (e.g. get_image_uri)

The following functions have been deprecated in favor of sagemaker.image_uris.retrieve:

  • sagemaker.amazon_estimator.get_image_uri()
  • sagemaker.fw_utils.create_image_uri()
  • sagemaker.fw_registry.registry()
  • sagemaker.utils.get_ecr_image_uri_prefix()

For more information about usage, see sagemaker.image_uris.retrieve.

enable_cloudwatch_metrics for Estimators and Models

The parameter enable_cloudwatch_metrics has been deprecated. CloudWatch metrics are already emitted for all Training Jobs, etc.

sagemaker.fw_utils.parse_s3_url

The sagemaker.fw_utils.parse_s3_url function has been deprecated. Please use sagemaker.s3.parse_s3_url instead.

sagemaker.session.ModelContainer

The class sagemaker.session.ModelContainer has been deprecated, as it is not needed for creating inference pipelines.

sagemaker.workflow.condition_step.JsonGet

The class sagemaker.workflow.condition_step.JsonGet has been deprecated. Please use sagemaker.workflow.functions.JsonGet instead.

Parameter and Class Name Changes

Estimators

Renamed Estimator Parameters

The following estimator parameters have been renamed:

v1.x v2.0 and later
train_instance_count instance_count
train_instance_type instance_type
train_max_run max_run
train_use_spot_instances use_spot_instances
train_max_wait max_wait
train_volume_size volume_size
train_volume_kms_key volume_kms_key
Serializer and Deserializer Classes

The follow serializer/deserializer classes have been renamed and/or moved:

v1.x v2.0 and later
sagemaker.predictor._CsvDeserializer sagemaker.deserializers.CSVDeserializer
sagemaker.predictor._CsvSerializer sagemaker.serializers.CSVSerializer
sagemaker.predictor.BytesDeserializer sagemaker.deserializers.BytesDeserializers
sagemaker.predictor.StringDeserializer sagemaker.deserializers.StringDeserializer
sagemaker.predictor.StreamDeserializer sagemaker.deserializers.StreamDeserializer
sagemaker.predictor._JsonSerializer sagemaker.serializers.JSONSerializer
sagemaker.predictor._NumpyDeserializer sagemaker.deserializers.NumpyDeserializer
sagemaker.predictor._NPYSerializer sagemaker.serializers.NumpySerializer
sagemaker.amazon.common.numpy_to_record_serializer sagemaker.amazon.common.RecordSerializer
sagemaker.amazon.common.record_deserializer sagemaker.amazon.common.RecordDeserializer
sagemaker.predictor._JsonDeserializer sagemaker.deserializers.JSONDeserializer

sagemaker.serializers.LibSVMSerializer has been added in v2.0.

distributions

For TensorFlow and MXNet estimators, distributions has been renamed to distribution.

Specify Custom Training Images

The image_name parameter has been renamed to image_uri for specifying a custom Docker image URI to use with training.

Models

Specify Custom Serving Image

The image parameter has been renamed to image_uri for specifying a custom Docker image URI to use with inference.

TensorFlow Serving Model

sagemaker.tensorflow.serving.Model has been renamed to sagemaker.tensorflow.model.TensorFlowModel. (For the previous implementation of that class, see Remove Legacy TensorFlow).

Predictors

Generic Predictor Class Name

sagemaker.predictor.RealTimePredictor has been renamed to sagemaker.predictor.Predictor.

Endpoint Argument Name

For sagemaker.predictor.Predictor, sagemaker.sparkml.model.SparkMLPredictor, and predictors for Amazon algorithm (e.g. Factorization Machines, Linear Learner, etc.), the endpoint attribute has been renamed to endpoint_name.

TensorFlow Serving Predictor

sagemaker.tensorflow.serving.Predictor has been renamed to sagemaker.tensorflow.model.TensorFlowPredictor. (For the previous implementation of that class, see Remove Legacy TensorFlow).

Inputs

s3_input

sagemaker.session.s3_input has been renamed to sagemaker.inputs.TrainingInput.

ShuffleConfig

sagemaker.session.ShuffleConfig has been renamed to sagemaker.inputs.ShuffleConfig.

Airflow

For sagemaker.workflow.airflow.model_config, sagemaker.workflow.airflow.model_config_from_estimator, and sagemaker.workflow.airflow.transform_config_from_estimator, the image argument has been renamed to image_uri.

Automatically Upgrade Your Code

To help make your transition as seamless as possible, v2 of the SageMaker Python SDK comes with a command-line tool to automate updating your code. It automates as much as possible, but there are still syntactical and stylistic changes that cannot be performed by the script.

Warning

While the tool is intended to be easy to use, we recommend using it as part of a process that includes testing before and after you run the tool.

Usage

Currently, the tool supports only converting one file at a time:

$ sagemaker-upgrade-v2 --in-file input.py --out-file output.py
$ sagemaker-upgrade-v2 --in-file input.ipynb --out-file output.ipynb

You can apply it to a set of files using a loop:

$ for file in $(find input-dir); do sagemaker-upgrade-v2 --in-file $file --out-file output-dir/$file; done

Limitations

Jupyter Notebook Cells with Shell Commands

If your Jupyter notebook has a code cell with lines that start with either %% or !, the tool ignores that cell. The other cells in the notebook are still updated.

Aliased Imports

The tool checks for a limited number of patterns when looking for constructors. For example, if you are using a TensorFlow estimator, only the following invocation styles are handled:

TensorFlow()
sagemaker.tensorflow.TensorFlow()
sagemaker.tensorflow.estimator.TensorFlow()

If you have aliased an import, e.g. from sagemaker.tensorflow import TensorFlow as TF, the tool does not take care of updating its parameters.

TensorFlow Serving

If you are using the sagemaker.tensorflow.serving.Model class, the tool does not take care of adding a framework version or changing it to sagemaker.tensorflow.TensorFlowModel.

sagemaker.model.Model

If you are using the sagemaker.model.Model class, the tool does not take care of switching the order between model_data and image_uri (formerly image).

update_endpoint and delete_endpoint

The tool does not take care of removing the update_endpoint argument from a deploy call. If you are using that argument, please modify your code to use sagemaker.predictor.Predictor.update_endpoint instead.

The tool also does not handle delete_endpoint calls on estimators or HyperparameterTuner. If you are using that method, please modify your code to use sagemaker.predictor.Predictor.delete_endpoint instead.