Support TensorFlow-1.5.0 and MXNet-1.0.0 (#82)
* Update .gitignore to ignore pytest_cache.

* Support TensorFlow-1.5.0 and MXNet-1.0.0

* Update and refactor tests. Add tests for fw_utils.

* Fix typo.
nadiaya committed Feb 27, 2018
1 parent 1380a26 commit c8fa80b
Showing 25 changed files with 828 additions and 223 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -21,3 +21,4 @@ doc/_build
**/.DS_Store
venv/
*~
.pytest_cache/
18 changes: 12 additions & 6 deletions CHANGELOG.rst
@@ -2,29 +2,35 @@
CHANGELOG
=========

1.1.0
=====

* feature: Estimators: add support for TensorFlow-1.5.0
* feature: Estimators: add support for MXNet-1.0.0

1.0.4
=====

* feature: Estimators: add support for Amazon Neural Topic Model(NTM) algorithm
* feature: Documentation: Fix description of an argument of sagemaker.session.train
* feature: Documentation: Add FM and LDA to the documentation
* feature: Documentation: fix description of an argument of sagemaker.session.train
* feature: Documentation: add FM and LDA to the documentation
* feature: Estimators: add support for async fit
* bug-fix: Estimators: fix estimator role expansion

1.0.3
=====

* feature: Estimators: add support for Amazon LDA algorithm
* feature: Hyperparameters: Add data_type to hyperparameters
* feature: Documentation: Update TensorFlow examples following API change
* feature: Session: Support multi-part uploads
* feature: Hyperparameters: add data_type to hyperparameters
* feature: Documentation: update TensorFlow examples following API change
* feature: Session: support multi-part uploads


1.0.2
=====

* feature: Estimators: add support for Amazon FactorizationMachines algorithm
* feature: Session: Correctly handle TooManyBuckets error_code in default_bucket method
* feature: Session: correctly handle TooManyBuckets error_code in default_bucket method
* feature: Tests: add training failure tests for TF and MXNet
* feature: Documentation: show how to make predictions against existing endpoint
* feature: Estimators: implement write_spmatrix_to_sparse_tensor to support any scipy.sparse matrix
89 changes: 60 additions & 29 deletions README.rst
@@ -39,7 +39,7 @@ You can install from source by cloning this repository and issuing a pip install

git clone https://github.com/aws/sagemaker-python-sdk.git
python setup.py sdist
pip install dist/sagemaker-1.0.4.tar.gz
pip install dist/sagemaker-1.1.0.tar.gz

Supported Python versions
~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -114,6 +114,8 @@ MXNet SageMaker Estimators

With MXNet Estimators, you can train and host MXNet models on Amazon SageMaker.

Supported versions of MXNet: ``1.0.0``, ``0.12.1``.

Training with MXNet
~~~~~~~~~~~~~~~~~~~

@@ -185,7 +187,7 @@ If you want to run your training script locally via the Python interpreter, look
Using MXNet and numpy
^^^^^^^^^^^^^^^^^^^^^

You can import both ``mxnet`` and ``numpy`` in your training script. When your script runs in SageMaker, it will run with access to MXNet version 0.12 and numpy version 1.12.0. For more information on the environment your script runs in, please see `SageMaker MXNet Containers <#sagemaker-mxnet-containers>`__.
You can import both ``mxnet`` and ``numpy`` in your training script. When your script runs in SageMaker, it will run with access to MXNet version 1.0.0 and numpy version 1.13.3 by default. For more information on the environment your script runs in, please see `SageMaker MXNet Containers <#sagemaker-mxnet-containers>`__.

Running an MXNet training script in SageMaker
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -682,26 +684,33 @@ When training and deploying training scripts, SageMaker runs your Python script
SageMaker runs MXNet Estimator scripts in either Python 2.7 or Python 3.5. You can select the Python version by passing a ``py_version`` keyword arg to the MXNet Estimator constructor. Setting this to ``py2`` (the default) will cause your training script to be run on Python 2.7. Setting this to ``py3`` will cause your training script to be run on Python 3.5. This Python version applies to both the Training Job, created by fit, and the Endpoint, created by deploy.
Your MXNet training script will be run on version 0.12 of MXNet, built for either GPU or CPU use. The decision to use the GPU or CPU version of MXNet is made by the train_instance_type, set on the MXNet constructor. If you choose a GPU instance type, your training job will be run on a GPU version of MXNet. If you choose a CPU instance type, your training job will be run on a CPU version of MXNet. Similarly, when you call deploy, specifying a GPU or CPU deploy_instance_type, will control which MXNet build your Endpoint runs.
Your MXNet training script will be run on version 1.0.0 (by default) or 0.12 of MXNet, built for either GPU or CPU use. The decision to use the GPU or CPU version of MXNet is made by the ``train_instance_type``, set on the MXNet constructor. If you choose a GPU instance type, your training job will be run on a GPU version of MXNet. If you choose a CPU instance type, your training job will be run on a CPU version of MXNet. Similarly, when you call deploy, specifying a GPU or CPU deploy_instance_type, will control which MXNet build your Endpoint runs.
Each Docker container has the following dependencies installed:
The Docker images have the following dependencies installed:
- Python 2.7 or Python 3.5, depending on the ``py_version`` argument on
the MXNet constructor.
- MXNet 0.12, built for either GPU or CPU, depending on the instance
type for training or deploying.
- CUDA 9.0
- numpy 1.12
+-------------------------+--------------+-------------+
| Dependencies | MXNet 0.12.1 | MXNet 1.0.0 |
+-------------------------+--------------+-------------+
| Python | 2.7 or 3.5 | 2.7 or 3.5|
+-------------------------+--------------+-------------+
| CUDA | 9.0 | 9.0 |
+-------------------------+--------------+-------------+
| numpy | 1.13.3 | 1.13.3 |
+-------------------------+--------------+-------------+
The Docker images extend Ubuntu 16.04.
You can select the version of MXNet by passing a ``framework_version`` keyword arg to the MXNet Estimator constructor. Currently supported versions are ``1.0.0`` and ``0.12.1``. You can also set ``framework_version`` to ``1.0 (default)`` or ``0.12`` which will cause your training script to be run on the latest supported MXNet 1.0 or 0.12 versions respectively.
TensorFlow SageMaker Estimators
-------------------------------
TensorFlow SageMaker Estimators allow you to run your own TensorFlow
training algorithms on SageMaker Learner, and to host your own TensorFlow
models on SageMaker Hosting.
Supported versions of TensorFlow: ``1.4.1``, ``1.5.0``.
Training with TensorFlow
~~~~~~~~~~~~~~~~~~~~~~~~
@@ -735,7 +744,7 @@ Preparing the TensorFlow training script
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Your TensorFlow training script must be a **Python 2.7** source file. The current supported TensorFlow
version is **1.4.0**. This training script **must contain** the following functions:
versions are **1.5.0 (default)** and **1.4.1**. This training script **must contain** the following functions:
- ``model_fn``: defines the model that will be trained.
- ``train_input_fn``: preprocess and load training data.
@@ -1419,27 +1428,49 @@ A example with ``input_fn`` and ``output_fn`` above can be found in
SageMaker TensorFlow Docker containers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The TensorFlow Docker container supports Python 2.7. The Docker container has the following Python modules installed:
- awscli 1.12.1
- boto3 1.4.7
- botocore 1.5.92
- futures 2.2.0
- gevent 1.2.2
- grpcio 1.7.0
- numpy 1.13.3
- pandas 0.21.0
- protobuf 3.4.0
- requests 2.14.2
- scikit-learn 0.19.1
- scipy 1.0.0
- six 1.10.0
- sklearn 0.0
- tensorflow 1.4.0
- tensorflow-serving-api 1.4.0
- tensorflow-tensorboard 0.4.0rc2
The TensorFlow Docker images support Python 2.7 and have the following Python modules installed:
+------------------------+------------------+------------------+
| Dependencies | tensorflow 1.4.1 | tensorflow 1.5.0 |
+------------------------+------------------+------------------+
| awscli | 1.12.1 | 1.14.35 |
+------------------------+------------------+------------------+
| boto3 | 1.4.7 | 1.5.22 |
+------------------------+------------------+------------------+
| botocore | 1.5.92 | 1.8.36 |
+------------------------+------------------+------------------+
| futures | 2.2.0 | 2.2.0 |
+------------------------+------------------+------------------+
| gevent | 1.2.2 | 1.2.2 |
+------------------------+------------------+------------------+
| grpcio | 1.7.0 | 1.9.0 |
+------------------------+------------------+------------------+
| numpy | 1.13.3 | 1.14.0 |
+------------------------+------------------+------------------+
| pandas | 0.21.0 | 0.22.0 |
+------------------------+------------------+------------------+
| protobuf | 3.4.0 | 3.5.1 |
+------------------------+------------------+------------------+
| requests | 2.14.2 | 2.18.4 |
+------------------------+------------------+------------------+
| scikit-learn | 0.19.1 | 0.19.1 |
+------------------------+------------------+------------------+
| scipy | 1.0.0 | 1.0.0 |
+------------------------+------------------+------------------+
| six | 1.10.0 | 1.10.0 |
+------------------------+------------------+------------------+
| sklearn | 0.0 | 0.0 |
+------------------------+------------------+------------------+
| tensorflow | 1.4.1 | 1.5.0 |
+------------------------+------------------+------------------+
| tensorflow-serving-api | 1.4.0 | 1.5.0 |
+------------------------+------------------+------------------+
| tensorflow-tensorboard | 0.4.0 | 1.5.1 |
+------------------------+------------------+------------------+
The Docker images extend Ubuntu 16.04.
You can select the version of TensorFlow by passing a ``framework_version`` keyword arg to the TensorFlow Estimator constructor. Currently supported versions are ``1.5.0`` and ``1.4.1``. You can also set ``framework_version`` to ``1.5 (default)`` or ``1.4`` which will cause your training script to be run on the latest supported TensorFlow 1.5 or 1.4 versions respectively.
AWS SageMaker Estimators
------------------------
2 changes: 1 addition & 1 deletion doc/conf.py
@@ -18,7 +18,7 @@ def __getattr__(cls, name):
'tensorflow.python.framework', 'tensorflow_serving', 'tensorflow_serving.apis']
sys.modules.update((mod_name, Mock()) for mod_name in MOCK_MODULES)

version = '1.0.4'
version = '1.1.0'
project = u'sagemaker'

# Add any Sphinx extension module names here, as strings. They can be extensions
2 changes: 1 addition & 1 deletion setup.py
@@ -11,7 +11,7 @@ def read(fname):


setup(name="sagemaker",
version="1.0.4",
version="1.1.0",
description="Open source library for training and deploying models on Amazon SageMaker.",
packages=find_packages('src'),
package_dir={'': 'src'},
46 changes: 32 additions & 14 deletions src/sagemaker/fw_utils.py
@@ -1,4 +1,4 @@
# Copyright 2017 Amazon.com, Inc. or its affiliates. All Rights Reserved.
# Copyright 2017-2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"). You
# may not use this file except in compliance with the License. A copy of
@@ -28,26 +28,28 @@
"""


def create_image_uri(region, framework, instance_type, py_version='py2', tag='1.0', account='520713654638'):
def create_image_uri(region, framework, instance_type, framework_version, py_version, account='520713654638'):
"""Return the ECR URI of an image.
Args:
region (str): AWS region where the image is uploaded.
framework (str): framework used by the image.
instance_type (str): EC2 instance type. Used to determine whether to use the CPU image or GPU image.
py_version (str): Python version. (default: 'py2')
tag (str): ECR image tag, which denotes the image version. (default: '1.0')
framework_version (str): The version of the framework.
py_version (str): Python version. One of 'py2' or 'py3'.
account (str): AWS account that contains the image. (default: '520713654638')
Returns:
str: The appropriate image URI based on the given parameters.
"""
device_version = 'cpu'
# Instance types that start with G, P are GPU powered: https://aws.amazon.com/ec2/instance-types/
device_type = 'cpu'
# Instance types that start with G, P are GPU powered: https://aws.amazon.com/sagemaker/pricing/instance-types/
if instance_type[3] in ['g', 'p']:
device_version = 'gpu'
device_type = 'gpu'

tag = "{}-{}-{}".format(framework_version, device_type, py_version)
return "{}.dkr.ecr.{}.amazonaws.com/sagemaker-{}-{}-{}:{}" \
.format(account, region, framework, py_version, device_version, tag)
.format(account, region, framework, py_version, device_type, tag)


def tar_and_upload_dir(session, bucket, s3_key_prefix, script, directory):
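
Note on the hunk above: the new ``create_image_uri`` builds the image tag from the framework version, device type and Python version instead of taking a fixed ``tag`` argument. The following is a minimal standalone sketch that copies the new function body from this diff so the resulting URI shape can be inspected without installing the SDK. The region, instance types and version strings in the calls are illustrative assumptions, not values taken from the commit.

```python
def create_image_uri(region, framework, instance_type, framework_version, py_version,
                     account='520713654638'):
    """Standalone copy of the new function body from the diff, for illustration only."""
    device_type = 'cpu'
    # The fourth character of e.g. 'ml.p2.xlarge' is the instance family letter.
    if instance_type[3] in ['g', 'p']:
        device_type = 'gpu'

    tag = "{}-{}-{}".format(framework_version, device_type, py_version)
    return "{}.dkr.ecr.{}.amazonaws.com/sagemaker-{}-{}-{}:{}".format(
        account, region, framework, py_version, device_type, tag)


# Illustrative inputs (assumed, not from the commit).
cpu_uri = create_image_uri('us-west-2', 'mxnet', 'ml.c4.xlarge', '1.0', 'py2')
gpu_uri = create_image_uri('us-west-2', 'tensorflow', 'ml.p2.xlarge', '1.5.0', 'py2')
print(cpu_uri)
print(gpu_uri)
```

With this change the device type and Python version appear twice in the URI, once in the repository name and once in the tag, while the framework version appears only in the tag.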
@@ -112,21 +114,37 @@ def framework_name_from_image(image_name):
tuple: A tuple containing:
str: The framework name
str: The Python version
str: The image tag
"""
# image name format: <account>.dkr.ecr.<region>.amazonaws.com/sagemaker-<framework>-<py_ver>-<device>:<tag>
sagemaker_pattern = re.compile('^(\d+)(\.)dkr(\.)ecr(\.)(.+)(\.)amazonaws.com(/)(.*)(:)(.*)$')
sagemaker_pattern = re.compile('^(\d+)(\.)dkr(\.)ecr(\.)(.+)(\.)amazonaws.com(/)(.*:.*)$')
sagemaker_match = sagemaker_pattern.match(image_name)
if sagemaker_match is None:
return None, None
return None, None, None
else:
# extract framework and python version
name_pattern = re.compile('^sagemaker-(tensorflow|mxnet)-(py2|py3)-(cpu|gpu)$')
# extract framework, python version and image tag
name_pattern = re.compile('^sagemaker-(tensorflow|mxnet)-(py2|py3)-(cpu|gpu):(.*)$')

name_match = name_pattern.match(sagemaker_match.group(8))

if name_match is None:
return None, None
return None, None, None
else:
return name_match.group(1), name_match.group(2)
return name_match.group(1), name_match.group(2), name_match.group(4)


def framework_version_from_tag(image_tag):
"""Extract the framework version from the image tag.
Args:
image_tag (str): Image tag, which should take the form '<framework_version>-<device>-<py_version>'
Returns:
str: The framework version.
"""
tag_pattern = re.compile('^(.*)-(cpu|gpu)-(py2|py3)$')
tag_match = tag_pattern.match(image_tag)
return None if tag_match is None else tag_match.group(1)


def parse_s3_url(url):
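
Note on the hunk above: ``framework_name_from_image`` now returns a three-element tuple that includes the image tag, and the new ``framework_version_from_tag`` recovers the framework version from that tag. The sketch below copies the new regular expressions from this diff (written as raw strings here, which is an editorial choice) and runs them on a sample URI. The sample URI and the old-style tag ``'1.0'`` are assumptions chosen for illustration.

```python
import re


def framework_name_from_image(image_name):
    """Standalone copy of the new parsing logic from the diff, for illustration only."""
    # image name format: <account>.dkr.ecr.<region>.amazonaws.com/sagemaker-<framework>-<py_ver>-<device>:<tag>
    sagemaker_pattern = re.compile(r'^(\d+)(\.)dkr(\.)ecr(\.)(.+)(\.)amazonaws.com(/)(.*:.*)$')
    sagemaker_match = sagemaker_pattern.match(image_name)
    if sagemaker_match is None:
        return None, None, None
    # extract framework, python version and image tag
    name_pattern = re.compile(r'^sagemaker-(tensorflow|mxnet)-(py2|py3)-(cpu|gpu):(.*)$')
    name_match = name_pattern.match(sagemaker_match.group(8))
    if name_match is None:
        return None, None, None
    return name_match.group(1), name_match.group(2), name_match.group(4)


def framework_version_from_tag(image_tag):
    """Return the framework version, or None if the tag is not '<version>-<device>-<py_version>'."""
    tag_pattern = re.compile(r'^(.*)-(cpu|gpu)-(py2|py3)$')
    tag_match = tag_pattern.match(image_tag)
    return None if tag_match is None else tag_match.group(1)


# Sample URI (assumed) in the new tag format.
uri = '520713654638.dkr.ecr.us-west-2.amazonaws.com/sagemaker-mxnet-py2-cpu:1.0-cpu-py2'
framework, py_version, tag = framework_name_from_image(uri)
version = framework_version_from_tag(tag)
print(framework, py_version, tag, version)

# A tag without the device and Python suffix does not match the new pattern.
legacy_version = framework_version_from_tag('1.0')
# A string that is not an ECR image URI yields three None values.
unparsed = framework_name_from_image('not-an-ecr-image')
```

Callers that unpacked the previous two-element return value need to be updated to accept three values, and should handle ``None`` from ``framework_version_from_tag`` when a tag does not follow the new format.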
4 changes: 1 addition & 3 deletions src/sagemaker/mxnet/__init__.py
@@ -1,4 +1,4 @@
# Copyright 2017 Amazon.com, Inc. or its affiliates. All Rights Reserved.
# Copyright 2017-2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"). You
# may not use this file except in compliance with the License. A copy of
@@ -14,5 +14,3 @@
from sagemaker.mxnet.model import MXNetModel, MXNetPredictor

__all__ = [MXNet, MXNetModel, MXNetPredictor]

DOCKER_TAG = "1.0"
13 changes: 13 additions & 0 deletions src/sagemaker/mxnet/defaults.py
@@ -0,0 +1,13 @@
# Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"). You
# may not use this file except in compliance with the License. A copy of
# the License is located at
#
# http://aws.amazon.com/apache2.0/
#
# or in the "license" file accompanying this file. This file is
# distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF
# ANY KIND, either express or implied. See the License for the specific
# language governing permissions and limitations under the License.
MXNET_VERSION = '1.0'
