feature: support MXNet 1.4 with MMS #812

Merged
merged 10 commits into from
May 24, 2019
Changes from 3 commits
42 changes: 25 additions & 17 deletions doc/using_mxnet.rst
@@ -6,9 +6,9 @@ Using MXNet with the SageMaker Python SDK

With the SageMaker Python SDK, you can train and host MXNet models on Amazon SageMaker.

Supported versions of MXNet: ``1.3.0``, ``1.2.1``, ``1.1.0``, ``1.0.0``, ``0.12.1``.
Supported versions of MXNet: ``1.4.0``, ``1.3.0``, ``1.2.1``, ``1.1.0``, ``1.0.0``, ``0.12.1``.

Supported versions of MXNet for Elastic Inference: ``1.3.0``.
Supported versions of MXNet for Elastic Inference: ``1.4.0``, ``1.3.0``.

Training with MXNet
-------------------
@@ -38,7 +38,7 @@ Preparing the MXNet training script
+----------------------------------------------------------------------------------------------------------------------------------------------------------+
| WARNING |
+==========================================================================================================================================================+
| The structure for training scripts changed with MXNet version 1.3. |
| The structure for training scripts changed starting at MXNet version 1.3. |
| Make sure you refer to the correct section of this README when you prepare your script. |
| For information on how to upgrade an old script to the new format, see `"Updating your MXNet training script" <#updating-your-mxnet-training-script>`__. |
+----------------------------------------------------------------------------------------------------------------------------------------------------------+
@@ -700,6 +700,13 @@ Where ``model`` is the model objected loaded by ``model_fn``, ``request_body`` i
This one function should handle processing the input, performing a prediction, and processing the output.
The return object should be one of the following:

For versions 1.4 and higher:
----------------------------
- a tuple with two items: the response data and ``accept_type`` (the content type of the response data), or
- the response data: (the content type of the response will be set to either the accept header in the initial request or a default)
Contributor: what is the default?

Contributor Author: added

For versions 1.3 and lower:
---------------------------
- a tuple with two items: the response data and ``accept_type`` (the content type of the response data), or
- a Flask response object: http://flask.pocoo.org/docs/1.0/api/#response-objects
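
As an illustration of the return values listed above, here is a minimal sketch of a ``transform_fn`` that returns the two-item tuple, the one form that appears in both the 1.4-and-higher list and the 1.3-and-lower list. It assumes the four-argument signature ``(model, request_body, content_type, accept_type)`` described in the SageMaker MXNet documentation. The JSON handling and the callable ``model`` are stand-ins chosen for this example and are not part of this PR.

```python
import json


def transform_fn(model, request_body, content_type, accept_type):
    """Decode the request, run a prediction, and encode the response."""
    if content_type != 'application/json':
        raise ValueError('Unsupported content type: {}'.format(content_type))

    data = json.loads(request_body)

    # Assumption: ``model`` is any callable that maps decoded input to a
    # JSON-serializable result. A real MXNet model would need NDArray handling.
    prediction = model(data)

    # Two-item tuple: the response data and the content type of that data.
    return json.dumps(prediction), accept_type
```

Returning the tuple keeps a script portable across the version boundary, whereas a Flask response object is only listed for versions 1.3 and lower.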

@@ -802,23 +809,24 @@ Your MXNet training script will be run on version 1.2.1 by default. (See below f

The Docker images have the following dependencies installed:

+-------------------------+--------------+-------------+-------------+-------------+-------------+
| Dependencies            | MXNet 0.12.1 | MXNet 1.0.0 | MXNet 1.1.0 | MXNet 1.2.1 | MXNet 1.3.0 |
+-------------------------+--------------+-------------+-------------+-------------+-------------+
| Python                  | 2.7 or 3.5   | 2.7 or 3.5  | 2.7 or 3.5  | 2.7 or 3.5  | 2.7 or 3.5  |
+-------------------------+--------------+-------------+-------------+-------------+-------------+
| CUDA (GPU image only)   | 9.0          | 9.0         | 9.0         | 9.0         | 9.0         |
+-------------------------+--------------+-------------+-------------+-------------+-------------+
| numpy                   | 1.13.3       | 1.13.3      | 1.13.3      | 1.14.5      | 1.14.6      |
+-------------------------+--------------+-------------+-------------+-------------+-------------+
| onnx                    | N/A          | N/A         | N/A         | 1.2.1       | 1.2.1       |
+-------------------------+--------------+-------------+-------------+-------------+-------------+
| keras-mxnet             | N/A          | N/A         | N/A         | N/A         | 2.2.2       |
+-------------------------+--------------+-------------+-------------+-------------+-------------+
+-------------------------+--------------+-------------+-------------+-------------+-------------+-------------+
| Dependencies            | MXNet 0.12.1 | MXNet 1.0.0 | MXNet 1.1.0 | MXNet 1.2.1 | MXNet 1.3.0 | MXNet 1.4.0 |
+-------------------------+--------------+-------------+-------------+-------------+-------------+-------------+
| Python                  | 2.7 or 3.5   | 2.7 or 3.5  | 2.7 or 3.5  | 2.7 or 3.5  | 2.7 or 3.5  | 2.7 or 3.6  |
+-------------------------+--------------+-------------+-------------+-------------+-------------+-------------+
| CUDA (GPU image only)   | 9.0          | 9.0         | 9.0         | 9.0         | 9.0         | 9.2         |
+-------------------------+--------------+-------------+-------------+-------------+-------------+-------------+
| numpy                   | 1.13.3       | 1.13.3      | 1.13.3      | 1.14.5      | 1.14.6      | 1.16.3      |
+-------------------------+--------------+-------------+-------------+-------------+-------------+-------------+
| onnx                    | N/A          | N/A         | N/A         | 1.2.1       | 1.2.1       | 1.4.1       |
+-------------------------+--------------+-------------+-------------+-------------+-------------+-------------+
| keras-mxnet             | N/A          | N/A         | N/A         | N/A         | 2.2.2       | 2.2.4.1     |
+-------------------------+--------------+-------------+-------------+-------------+-------------+-------------+

The Docker images extend Ubuntu 16.04.

You can select version of MXNet by passing a ``framework_version`` keyword arg to the MXNet Estimator constructor. Currently supported versions are listed in the above table. You can also set ``framework_version`` to only specify major and minor version, e.g ``1.2``, which will cause your training script to be run on the latest supported patch version of that minor version, which in this example would be 1.2.1.
Alternatively, you can build your own image by following the instructions in the SageMaker MXNet containers repository, and passing ``image_name`` to the MXNet Estimator constructor.

You can visit the SageMaker MXNet containers repository here: https://github.com/aws/sagemaker-mxnet-container
You can visit the SageMaker MXNet training containers repository here: https://github.com/aws/sagemaker-mxnet-container
You can visit the SageMaker MXNet serving containers repository here: https://github.com/aws/sagemaker-mxnet-serving-container
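
The ``framework_version`` rule described above (a major.minor value such as ``1.2`` runs on the latest supported patch release, here ``1.2.1``) can be sketched as follows. This only illustrates the rule. ``resolve_framework_version`` and ``SUPPORTED_VERSIONS`` are hypothetical names made up for this example, and neither the SDK nor the container images are claimed to resolve versions this way.

```python
# Versions taken from the "Supported versions of MXNet" line in this diff.
SUPPORTED_VERSIONS = ['0.12.1', '1.0.0', '1.1.0', '1.2.1', '1.3.0', '1.4.0']


def _as_tuple(version):
    """Turn '1.2.1' into (1, 2, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split('.'))


def resolve_framework_version(requested, supported=SUPPORTED_VERSIONS):
    """Return the latest supported version that starts with ``requested``."""
    prefix = _as_tuple(requested)
    matches = [v for v in supported if _as_tuple(v)[:len(prefix)] == prefix]
    if not matches:
        raise ValueError('Unsupported MXNet version: {}'.format(requested))
    return max(matches, key=_as_tuple)


print(resolve_framework_version('1.2'))    # 1.2.1
print(resolve_framework_version('1.4.0'))  # 1.4.0
```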
2 changes: 1 addition & 1 deletion src/sagemaker/fw_utils.py
@@ -45,7 +45,7 @@
'Please add framework_version={} to your constructor to avoid this error.'

VALID_PY_VERSIONS = ['py2', 'py3']
VALID_EIA_FRAMEWORKS = ['tensorflow', 'tensorflow-serving', 'mxnet']
VALID_EIA_FRAMEWORKS = ['tensorflow', 'tensorflow-serving', 'mxnet', 'mxnet-serving']
VALID_ACCOUNTS_BY_REGION = {'us-gov-west-1': '246785580436',
'us-iso-east-1': '744548109606'}

29 changes: 21 additions & 8 deletions src/sagemaker/model.py
@@ -14,9 +14,11 @@

import json
import logging
import os

import sagemaker
from sagemaker import fw_utils, local, session, utils
from sagemaker.fw_utils import UploadedCode
from sagemaker.transformer import Transformer

LOGGER = logging.getLogger('sagemaker')
@@ -408,6 +410,7 @@ def __init__(self, model_data, image, role, entry_point, source_dir=None, predic
        else:
            self.bucket, self.key_prefix = None, None
        self.uploaded_code = None
        self.repacked_model_data = None

    def prepare_container_def(self, instance_type, accelerator_type=None): # pylint disable=unused-argument
        """Return a container definition with framework configuration set in model environment variables.
@@ -428,18 +431,28 @@ def prepare_container_def(self, instance_type, accelerator_type=None): # pylint
        deploy_env.update(self._framework_env_vars())
        return sagemaker.container_def(self.image, self.model_data, deploy_env)

    def _upload_code(self, key_prefix):
    def _upload_code(self, key_prefix, repack=False):
        local_code = utils.get_config_value('local.local_code', self.sagemaker_session.config)
        if self.sagemaker_session.local_mode and local_code:
            self.uploaded_code = None
        else:
            bucket = self.bucket or self.sagemaker_session.default_bucket()
            self.uploaded_code = fw_utils.tar_and_upload_dir(session=self.sagemaker_session.boto_session,
                                                             bucket=bucket,
                                                             s3_key_prefix=key_prefix,
                                                             script=self.entry_point,
                                                             directory=self.source_dir,
                                                             dependencies=self.dependencies)
            if repack:
Contributor: seems like this part of the conditional should also include bucket = self.bucket or self.sagemaker_session.default_bucket() and pass that along to repack_model()?

Contributor Author: out of scope

Contributor: (talked offline.) fair enough. let's track it internally as a possible enhancement though.

                self.repacked_model_data = utils.repack_model(inference_script=self.entry_point,
                                                              source_directory=self.source_dir,
                                                              model_uri=self.model_data,
                                                              sagemaker_session=self.sagemaker_session)

                self.uploaded_code = UploadedCode(s3_prefix=self.repacked_model_data,
                                                  script_name=os.path.basename(self.entry_point))

            else:
                bucket = self.bucket or self.sagemaker_session.default_bucket()
                self.uploaded_code = fw_utils.tar_and_upload_dir(session=self.sagemaker_session.boto_session,
                                                                 bucket=bucket,
                                                                 s3_key_prefix=key_prefix,
                                                                 script=self.entry_point,
                                                                 directory=self.source_dir,
                                                                 dependencies=self.dependencies)

    def _framework_env_vars(self):
        if self.uploaded_code:
35 changes: 18 additions & 17 deletions src/sagemaker/mxnet/README.rst
@@ -4,9 +4,9 @@ Using MXNet with the SageMaker Python SDK

With the SageMaker Python SDK, you can train and host MXNet models on Amazon SageMaker.

Supported versions of MXNet: ``1.3.0``, ``1.2.1``, ``1.1.0``, ``1.0.0``, ``0.12.1``.
Supported versions of MXNet: ``1.4.0``, ``1.3.0``, ``1.2.1``, ``1.1.0``, ``1.0.0``, ``0.12.1``.

Supported versions of MXNet for Elastic Inference: ``1.3.0``.
Supported versions of MXNet for Elastic Inference: ``1.3.0``, ``1.4.0``.
Contributor: I'd keep the order here consistent with the others (i.e. 1.4 first)

Contributor Author: oh wow nice catch

For information about using MXNet with the SageMaker Python SDK, see https://sagemaker.readthedocs.io/en/stable/using_mxnet.html.

@@ -15,29 +15,30 @@ SageMaker MXNet Containers

When training and deploying training scripts, SageMaker runs your Python script in a Docker container with several libraries installed. When creating the Estimator and calling deploy to create the SageMaker Endpoint, you can control the environment your script runs in.

SageMaker runs MXNet Estimator scripts in either Python 2.7 or Python 3.5. You can select the Python version by passing a ``py_version`` keyword arg to the MXNet Estimator constructor. Setting this to ``py2`` (the default) will cause your training script to be run on Python 2.7. Setting this to ``py3`` will cause your training script to be run on Python 3.5. This Python version applies to both the Training Job, created by fit, and the Endpoint, created by deploy.
SageMaker runs MXNet Estimator scripts in either Python 2.7 or Python 3.6. You can select the Python version by passing a ``py_version`` keyword arg to the MXNet Estimator constructor. Setting this to ``py2`` (the default) will cause your training script to be run on Python 2.7. Setting this to ``py3`` will cause your training script to be run on Python 3.5. This Python version applies to both the Training Job, created by fit, and the Endpoint, created by deploy.
Contributor:
  • s/MXNet Estimator scripts/MXNet scripts
  • s/3.5/3.6 (there's a second one)

Contributor Author: changed

Your MXNet training script will be run on version 1.2.1 by default. (See below for how to choose a different version, and currently supported versions.) The decision to use the GPU or CPU version of MXNet is made by the ``train_instance_type``, set on the MXNet constructor. If you choose a GPU instance type, your training job will be run on a GPU version of MXNet. If you choose a CPU instance type, your training job will be run on a CPU version of MXNet. Similarly, when you call deploy, specifying a GPU or CPU deploy_instance_type, will control which MXNet build your Endpoint runs.

The Docker images have the following dependencies installed:

+-------------------------+--------------+-------------+-------------+-------------+-------------+
| Dependencies            | MXNet 0.12.1 | MXNet 1.0.0 | MXNet 1.1.0 | MXNet 1.2.1 | MXNet 1.3.0 |
+-------------------------+--------------+-------------+-------------+-------------+-------------+
| Python                  | 2.7 or 3.5   | 2.7 or 3.5  | 2.7 or 3.5  | 2.7 or 3.5  | 2.7 or 3.5  |
+-------------------------+--------------+-------------+-------------+-------------+-------------+
| CUDA (GPU image only)   | 9.0          | 9.0         | 9.0         | 9.0         | 9.0         |
+-------------------------+--------------+-------------+-------------+-------------+-------------+
| numpy                   | 1.13.3       | 1.13.3      | 1.13.3      | 1.14.5      | 1.14.6      |
+-------------------------+--------------+-------------+-------------+-------------+-------------+
| onnx                    | N/A          | N/A         | N/A         | 1.2.1       | 1.2.1       |
+-------------------------+--------------+-------------+-------------+-------------+-------------+
| keras-mxnet             | N/A          | N/A         | N/A         | N/A         | 2.2.2       |
+-------------------------+--------------+-------------+-------------+-------------+-------------+
+-------------------------+--------------+-------------+-------------+-------------+-------------+-------------+
| Dependencies            | MXNet 0.12.1 | MXNet 1.0.0 | MXNet 1.1.0 | MXNet 1.2.1 | MXNet 1.3.0 | MXNet 1.4.0 |
+-------------------------+--------------+-------------+-------------+-------------+-------------+-------------+
| Python                  | 2.7 or 3.5   | 2.7 or 3.5  | 2.7 or 3.5  | 2.7 or 3.5  | 2.7 or 3.5  | 2.7 or 3.6  |
+-------------------------+--------------+-------------+-------------+-------------+-------------+-------------+
| CUDA (GPU image only)   | 9.0          | 9.0         | 9.0         | 9.0         | 9.0         | 9.2         |
+-------------------------+--------------+-------------+-------------+-------------+-------------+-------------+
| numpy                   | 1.13.3       | 1.13.3      | 1.13.3      | 1.14.5      | 1.14.6      | 1.16.3      |
+-------------------------+--------------+-------------+-------------+-------------+-------------+-------------+
| onnx                    | N/A          | N/A         | N/A         | 1.2.1       | 1.2.1       | 1.4.1       |
+-------------------------+--------------+-------------+-------------+-------------+-------------+-------------+
| keras-mxnet             | N/A          | N/A         | N/A         | N/A         | 2.2.2       | 2.2.4.1     |
+-------------------------+--------------+-------------+-------------+-------------+-------------+-------------+

The Docker images extend Ubuntu 16.04.

You can select version of MXNet by passing a ``framework_version`` keyword arg to the MXNet Estimator constructor. Currently supported versions are listed in the above table. You can also set ``framework_version`` to only specify major and minor version, e.g ``1.2``, which will cause your training script to be run on the latest supported patch version of that minor version, which in this example would be 1.2.1.
Alternatively, you can build your own image by following the instructions in the SageMaker MXNet containers repository, and passing ``image_name`` to the MXNet Estimator constructor.

You can visit the SageMaker MXNet containers repository here: https://github.com/aws/sagemaker-mxnet-container
You can visit the SageMaker MXNet training containers repository here: https://github.com/aws/sagemaker-mxnet-container
You can visit the SageMaker MXNet serving containers repository here: https://github.com/aws/sagemaker-mxnet-serving-container
2 changes: 1 addition & 1 deletion src/sagemaker/mxnet/estimator.py
@@ -30,7 +30,7 @@ class MXNet(Framework):
    __framework_name__ = 'mxnet'
    _LOWEST_SCRIPT_MODE_VERSION = ['1', '3']

    LATEST_VERSION = '1.3'
    LATEST_VERSION = '1.4'
    """The latest version of MXNet included in the SageMaker pre-built Docker images."""

    def __init__(self, entry_point, source_dir=None, hyperparameters=None, py_version='py2',
16 changes: 13 additions & 3 deletions src/sagemaker/mxnet/model.py
@@ -14,6 +14,8 @@

import logging

from pkg_resources import parse_version

import sagemaker
from sagemaker.fw_utils import create_image_uri, model_code_key_prefix, python_deprecation_warning
from sagemaker.model import FrameworkModel, MODEL_SERVER_WORKERS_PARAM_NAME
@@ -45,6 +47,7 @@ class MXNetModel(FrameworkModel):
    """An MXNet SageMaker ``Model`` that can be deployed to a SageMaker ``Endpoint``."""

    __framework_name__ = 'mxnet'
    _LOWEST_MMS_VERSION = '1.4'

    def __init__(self, model_data, role, entry_point, image=None, py_version='py2', framework_version=MXNET_VERSION,
                 predictor_cls=MXNetPredictor, model_server_workers=None, **kwargs):
@@ -89,17 +92,24 @@ def prepare_container_def(self, instance_type, accelerator_type=None):
        Returns:
            dict[str, str]: A container definition object usable with the CreateModel API.
        """
        mms_version = parse_version(self.framework_version) >= parse_version(self._LOWEST_MMS_VERSION)

        deploy_image = self.image
        if not deploy_image:
            region_name = self.sagemaker_session.boto_session.region_name
            deploy_image = create_image_uri(region_name, self.__framework_name__, instance_type,

            framework_name = self.__framework_name__
            if mms_version:
                framework_name += '-serving'

            deploy_image = create_image_uri(region_name, framework_name, instance_type,
                                            self.framework_version, self.py_version, accelerator_type=accelerator_type)

        deploy_key_prefix = model_code_key_prefix(self.key_prefix, self.name, deploy_image)
        self._upload_code(deploy_key_prefix)
        self._upload_code(deploy_key_prefix, mms_version)
        deploy_env = dict(self.env)
        deploy_env.update(self._framework_env_vars())

        if self.model_server_workers:
            deploy_env[MODEL_SERVER_WORKERS_PARAM_NAME.upper()] = str(self.model_server_workers)
        return sagemaker.container_def(deploy_image, self.model_data, deploy_env)
        return sagemaker.container_def(deploy_image, self.repacked_model_data or self.model_data, deploy_env)
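
The version gate in this hunk can be illustrated in isolation. The sketch below is a simplified stand-in that compares integer tuples instead of calling ``parse_version``, so it runs without ``pkg_resources``. ``serving_framework_name`` and ``_version_tuple`` are hypothetical helpers written for this note, not SDK functions, and the tuple parsing assumes plain numeric versions with no pre-release suffix such as ``rc1``.

```python
LOWEST_MMS_VERSION = '1.4'


def _version_tuple(version):
    """Turn '1.4.0' into (1, 4, 0) for numeric comparison."""
    return tuple(int(part) for part in version.split('.'))


def serving_framework_name(framework_name, framework_version):
    """Append '-serving' when the version is at or above the MMS threshold."""
    uses_mms = _version_tuple(framework_version) >= _version_tuple(LOWEST_MMS_VERSION)
    return framework_name + '-serving' if uses_mms else framework_name


print(serving_framework_name('mxnet', '1.3.0'))  # mxnet
print(serving_framework_name('mxnet', '1.4'))    # mxnet-serving
```

Comparing parsed versions rather than raw strings matters here: as strings, ``'1.10'`` sorts before ``'1.4'``, which would send a hypothetical 1.10 release to the non-MMS image.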
2 changes: 1 addition & 1 deletion src/sagemaker/utils.py
@@ -344,7 +344,7 @@ def repack_model(inference_script, source_directory, model_uri, sagemaker_sessio
            local_code_path = os.path.join(tmp, 'local_code.tar.gz')
            download_file_from_url(source_directory, local_code_path, sagemaker_session)

            with tarfile.open(name=local_model_path, mode='r:gz') as t:
            with tarfile.open(name=local_code_path, mode='r:gz') as t:
                t.extractall(path=code_dir)

        elif source_directory:
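
For readers unfamiliar with the repacking step this fix touches, here is a rough local-only sketch of the idea: unpack a model archive, place an inference script under a ``code`` directory, and write a new archive. It is not the body of ``utils.repack_model``. ``repack_with_code`` is a hypothetical helper, the ``code/`` layout is an assumption based on the ``code_dir`` name in the hunk above, and all S3 download and upload handling is left out.

```python
import os
import shutil
import tarfile
import tempfile


def repack_with_code(model_tar, script_path, output_tar):
    """Extract model_tar, add script_path under code/, and write output_tar."""
    tmp = tempfile.mkdtemp()
    try:
        model_dir = os.path.join(tmp, 'model')
        code_dir = os.path.join(model_dir, 'code')
        os.makedirs(code_dir)

        # Unpack the existing model artifacts next to the code directory.
        with tarfile.open(name=model_tar, mode='r:gz') as t:
            t.extractall(path=model_dir)

        # Copy the inference script into code/ (assumed layout).
        shutil.copy2(script_path, code_dir)

        # Re-archive everything with paths relative to the model directory.
        with tarfile.open(name=output_tar, mode='w:gz') as t:
            for name in sorted(os.listdir(model_dir)):
                t.add(os.path.join(model_dir, name), arcname=name)
    finally:
        shutil.rmtree(tmp)
```

The one-line fix in this hunk is the same kind of detail: the archive opened for extraction into ``code_dir`` has to be the downloaded code archive, not the model archive.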
2 changes: 1 addition & 1 deletion tests/conftest.py
@@ -108,7 +108,7 @@ def chainer_version(request):


@pytest.fixture(scope='module', params=['0.12', '0.12.1', '1.0', '1.0.0', '1.1', '1.1.0', '1.2',
                                        '1.2.1', '1.3', '1.3.0'])
                                        '1.2.1', '1.3', '1.3.0', '1.4', '1.4.0'])
def mxnet_version(request):
    return request.param
