Support TensorFlow-1.5.0 and MXNet-1.0.0 (#82)

* Update .gitignore to ignore pytest_cache.
* Support TensorFlow-1.5.0 and MXNet-1.0.0
* Update and refactor tests. Add tests for fw_utils.
* Fix typo.
aws · Feb 27, 2018 · c8fa80b
1 parent 1380a26

Showing 25 changed files with 828 additions and 223 deletions.
diff --git a/.gitignore b/.gitignore
@@ -21,3 +21,4 @@ doc/_build
 **/.DS_Store
 venv/
 *~
+.pytest_cache/
diff --git a/CHANGELOG.rst b/CHANGELOG.rst
@@ -2,29 +2,35 @@
 CHANGELOG
 =========
 
+1.1.0
+=====
+
+* feature: Estimators: add support for TensorFlow-1.5.0
+* feature: Estimators: add support for MXNet-1.0.0
+
 1.0.4
 =====
 
 * feature: Estimators: add support for Amazon Neural Topic Model(NTM) algorithm
-* feature: Documentation: Fix description of an argument of sagemaker.session.train
-* feature: Documentation: Add FM and LDA to the documentation
+* feature: Documentation: fix description of an argument of sagemaker.session.train
+* feature: Documentation: add FM and LDA to the documentation
 * feature: Estimators: add support for async fit
 * bug-fix: Estimators: fix estimator role expansion
 
 1.0.3
 =====
 
 * feature: Estimators: add support for Amazon LDA algorithm
-* feature: Hyperparameters: Add data_type to hyperparameters
-* feature: Documentation: Update TensorFlow examples following API change
-* feature: Session: Support multi-part uploads
+* feature: Hyperparameters: add data_type to hyperparameters
+* feature: Documentation: update TensorFlow examples following API change
+* feature: Session: support multi-part uploads
 
 
 1.0.2
 =====
 
 * feature: Estimators: add support for Amazon FactorizationMachines algorithm
-* feature: Session: Correctly handle TooManyBuckets error_code in default_bucket method
+* feature: Session: correctly handle TooManyBuckets error_code in default_bucket method
 * feature: Tests: add training failure tests for TF and MXNet
 * feature: Documentation: show how to make predictions against existing endpoint
 * feature: Estimators: implement write_spmatrix_to_sparse_tensor to support any scipy.sparse matrix

diff --git a/README.rst b/README.rst
@@ -39,7 +39,7 @@ You can install from source by cloning this repository and issuing a pip install
 
     git clone https://github.com/aws/sagemaker-python-sdk.git
     python setup.py sdist
-    pip install dist/sagemaker-1.0.4.tar.gz
+    pip install dist/sagemaker-1.1.0.tar.gz
 
 Supported Python versions
 ~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -114,6 +114,8 @@ MXNet SageMaker Estimators
 
 With MXNet Estimators, you can train and host MXNet models on Amazon SageMaker.
 
+Supported versions of MXNet: ``1.0.0``, ``0.12.1``.
+
 Training with MXNet
 ~~~~~~~~~~~~~~~~~~~
 
@@ -185,7 +187,7 @@ If you want to run your training script locally via the Python interpreter, look
 Using MXNet and numpy
 ^^^^^^^^^^^^^^^^^^^^^
 
-You can import both ``mxnet`` and ``numpy`` in your training script. When your script runs in SageMaker, it will run with access to MXNet version 0.12 and numpy version 1.12.0. For more information on the environment your script runs in, please see `SageMaker MXNet Containers <#sagemaker-mxnet-containers>`__.
+You can import both ``mxnet`` and ``numpy`` in your training script. When your script runs in SageMaker, it will run with access to MXNet version 1.0.0 and numpy version 1.13.3 by default. For more information on the environment your script runs in, please see `SageMaker MXNet Containers <#sagemaker-mxnet-containers>`__.
 
 Running an MXNet training script in SageMaker
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -682,26 +684,33 @@ When training and deploying training scripts, SageMaker runs your Python script
 
 SageMaker runs MXNet Estimator scripts in either Python 2.7 or Python 3.5. You can select the Python version by passing a ``py_version`` keyword arg to the MXNet Estimator constructor. Setting this to ``py2`` (the default) will cause your training script to be run on Python 2.7. Setting this to ``py3`` will cause your training script to be run on Python 3.5. This Python version applies to both the Training Job, created by fit, and the Endpoint, created by deploy.
 
-Your MXNet training script will be run on version 0.12 of MXNet, built for either GPU or CPU use. The decision to use the GPU or CPU version of MXNet is made by the train_instance_type, set on the MXNet constructor. If you choose a GPU instance type, your training job will be run on a GPU version of MXNet. If you choose a CPU instance type, your training job will be run on a CPU version of MXNet. Similarly, when you call deploy, specifying a GPU or CPU deploy_instance_type, will control which MXNet build your Endpoint runs.
+Your MXNet training script will be run on version 1.0.0 (by default) or 0.12 of MXNet, built for either GPU or CPU use. The decision to use the GPU or CPU version of MXNet is made by the ``train_instance_type``, set on the MXNet constructor. If you choose a GPU instance type, your training job will be run on a GPU version of MXNet. If you choose a CPU instance type, your training job will be run on a CPU version of MXNet. Similarly, when you call ``deploy``, the GPU or CPU ``deploy_instance_type`` you specify controls which MXNet build your Endpoint runs.
 
-Each Docker container has the following dependencies installed:
+The Docker images have the following dependencies installed:
 
--  Python 2.7 or Python 3.5, depending on the ``py_version`` argument on
-   the MXNet constructor.
--  MXNet 0.12, built for either GPU or CPU, depending on the instance
-   type for training or deploying.
--  CUDA 9.0
--  numpy 1.12
++-------------------------+--------------+-------------+
+| Dependencies            | MXNet 0.12.1 | MXNet 1.0.0 |
++-------------------------+--------------+-------------+
+| Python                  |   2.7 or 3.5 |  2.7 or 3.5 |
++-------------------------+--------------+-------------+
+| CUDA                    |          9.0 |         9.0 |
++-------------------------+--------------+-------------+
+| numpy                   |       1.13.3 |      1.13.3 |
++-------------------------+--------------+-------------+
 
 The Docker images extend Ubuntu 16.04.
 
+You can select the version of MXNet by passing a ``framework_version`` keyword arg to the MXNet Estimator constructor. Currently supported versions are ``1.0.0`` and ``0.12.1``. You can also set ``framework_version`` to ``1.0`` (the default) or ``0.12``, which will cause your training script to be run on the latest supported MXNet 1.0 or 0.12 version, respectively.
+
 TensorFlow SageMaker Estimators
 -------------------------------
 
 TensorFlow SageMaker Estimators allow you to run your own TensorFlow
 training algorithms on SageMaker Learner, and to host your own TensorFlow
 models on SageMaker Hosting.
 
+Supported versions of TensorFlow: ``1.4.1``, ``1.5.0``.
+
 Training with TensorFlow
 ~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -735,7 +744,7 @@ Preparing the TensorFlow training script
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Your TensorFlow training script must be a **Python 2.7** source file. The current supported TensorFlow
-version is **1.4.0**. This training script **must contain** the following functions:
+versions are **1.5.0 (default)** and **1.4.1**. This training script **must contain** the following functions:
 
 - ``model_fn``: defines the model that will be trained.
 - ``train_input_fn``: preprocess and load training data.
@@ -1419,27 +1428,49 @@ A example with ``input_fn`` and ``output_fn`` above can be found in
 SageMaker TensorFlow Docker containers
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-The TensorFlow Docker container supports Python 2.7. The Docker container has the following Python modules installed:
-- awscli 1.12.1
-- boto3 1.4.7
-- botocore 1.5.92
-- futures 2.2.0
-- gevent 1.2.2
-- grpcio 1.7.0
-- numpy 1.13.3
-- pandas 0.21.0
-- protobuf 3.4.0
-- requests 2.14.2
-- scikit-learn 0.19.1
-- scipy 1.0.0
-- six 1.10.0
-- sklearn 0.0
-- tensorflow 1.4.0
-- tensorflow-serving-api 1.4.0
-- tensorflow-tensorboard 0.4.0rc2
+The TensorFlow Docker images support Python 2.7 and have the following Python modules installed:
+
++------------------------+------------------+------------------+
+| Dependencies           | tensorflow 1.4.1 | tensorflow 1.5.0 |
++------------------------+------------------+------------------+
+| awscli                 |           1.12.1 |          1.14.35 |
++------------------------+------------------+------------------+
+| boto3                  |            1.4.7 |           1.5.22 |
++------------------------+------------------+------------------+
+| botocore               |           1.5.92 |           1.8.36 |
++------------------------+------------------+------------------+
+| futures                |            2.2.0 |            2.2.0 |
++------------------------+------------------+------------------+
+| gevent                 |            1.2.2 |            1.2.2 |
++------------------------+------------------+------------------+
+| grpcio                 |            1.7.0 |            1.9.0 |
++------------------------+------------------+------------------+
+| numpy                  |           1.13.3 |           1.14.0 |
++------------------------+------------------+------------------+
+| pandas                 |           0.21.0 |           0.22.0 |
++------------------------+------------------+------------------+
+| protobuf               |            3.4.0 |            3.5.1 |
++------------------------+------------------+------------------+
+| requests               |           2.14.2 |           2.18.4 |
++------------------------+------------------+------------------+
+| scikit-learn           |           0.19.1 |           0.19.1 |
++------------------------+------------------+------------------+
+| scipy                  |            1.0.0 |            1.0.0 |
++------------------------+------------------+------------------+
+| six                    |           1.10.0 |           1.10.0 |
++------------------------+------------------+------------------+
+| sklearn                |              0.0 |              0.0 |
++------------------------+------------------+------------------+
+| tensorflow             |            1.4.1 |            1.5.0 |
++------------------------+------------------+------------------+
+| tensorflow-serving-api |            1.4.0 |            1.5.0 |
++------------------------+------------------+------------------+
+| tensorflow-tensorboard |            0.4.0 |            1.5.1 |
++------------------------+------------------+------------------+
 
 The Docker images extend Ubuntu 16.04.
 
+You can select the version of TensorFlow by passing a ``framework_version`` keyword arg to the TensorFlow Estimator constructor. Currently supported versions are ``1.5.0`` and ``1.4.1``. You can also set ``framework_version`` to ``1.5`` (the default) or ``1.4``, which will cause your training script to be run on the latest supported TensorFlow 1.5 or 1.4 version, respectively.
 
 AWS SageMaker Estimators
 ------------------------

diff --git a/doc/conf.py b/doc/conf.py
@@ -18,7 +18,7 @@ def __getattr__(cls, name):
                 'tensorflow.python.framework', 'tensorflow_serving', 'tensorflow_serving.apis']
 sys.modules.update((mod_name, Mock()) for mod_name in MOCK_MODULES)
 
-version = '1.0.4'
+version = '1.1.0'
 project = u'sagemaker'
 
 # Add any Sphinx extension module names here, as strings. They can be extensions

diff --git a/setup.py b/setup.py
@@ -11,7 +11,7 @@ def read(fname):
 
 
 setup(name="sagemaker",
-      version="1.0.4",
+      version="1.1.0",
       description="Open source library for training and deploying models on Amazon SageMaker.",
       packages=find_packages('src'),
       package_dir={'': 'src'},

diff --git a/src/sagemaker/fw_utils.py b/src/sagemaker/fw_utils.py
@@ -1,4 +1,4 @@
-# Copyright 2017 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+# Copyright 2017-2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License"). You
 # may not use this file except in compliance with the License. A copy of
@@ -28,26 +28,28 @@
 """
 
 
-def create_image_uri(region, framework, instance_type, py_version='py2', tag='1.0', account='520713654638'):
+def create_image_uri(region, framework, instance_type, framework_version, py_version, account='520713654638'):
     """Return the ECR URI of an image.
 
     Args:
         region (str): AWS region where the image is uploaded.
         framework (str): framework used by the image.
         instance_type (str): EC2 instance type. Used to determine whether to use the CPU image or GPU image.
-        py_version (str): Python version. (default: 'py2')
-        tag (str): ECR image tag, which denotes the image version. (default: '1.0')
+        framework_version (str): The version of the framework.
+        py_version (str): Python version. One of 'py2' or 'py3'.
         account (str): AWS account that contains the image. (default: '520713654638')
 
     Returns:
         str: The appropriate image URI based on the given parameters.
     """
-    device_version = 'cpu'
-    # Instance types that start with G, P are GPU powered: https://aws.amazon.com/ec2/instance-types/
+    device_type = 'cpu'
+    # Instance types that start with G, P are GPU powered: https://aws.amazon.com/sagemaker/pricing/instance-types/
     if instance_type[3] in ['g', 'p']:
-        device_version = 'gpu'
+        device_type = 'gpu'
+
+    tag = "{}-{}-{}".format(framework_version, device_type, py_version)
     return "{}.dkr.ecr.{}.amazonaws.com/sagemaker-{}-{}-{}:{}" \
-        .format(account, region, framework, py_version, device_version, tag)
+        .format(account, region, framework, py_version, device_type, tag)
 
 
 def tar_and_upload_dir(session, bucket, s3_key_prefix, script, directory):
@@ -112,21 +114,37 @@ def framework_name_from_image(image_name):
         tuple: A tuple containing:
             str: The framework name
             str: The Python version
+            str: The image tag
     """
     # image name format: <account>.dkr.ecr.<region>.amazonaws.com/sagemaker-<framework>-<py_ver>-<device>:<tag>
-    sagemaker_pattern = re.compile('^(\d+)(\.)dkr(\.)ecr(\.)(.+)(\.)amazonaws.com(/)(.*)(:)(.*)$')
+    sagemaker_pattern = re.compile('^(\d+)(\.)dkr(\.)ecr(\.)(.+)(\.)amazonaws.com(/)(.*:.*)$')
     sagemaker_match = sagemaker_pattern.match(image_name)
     if sagemaker_match is None:
-        return None, None
+        return None, None, None
     else:
-        # extract framework and python version
-        name_pattern = re.compile('^sagemaker-(tensorflow|mxnet)-(py2|py3)-(cpu|gpu)$')
+        # extract framework, python version and image tag
+        name_pattern = re.compile('^sagemaker-(tensorflow|mxnet)-(py2|py3)-(cpu|gpu):(.*)$')
+
         name_match = name_pattern.match(sagemaker_match.group(8))
 
         if name_match is None:
-            return None, None
+            return None, None, None
         else:
-            return name_match.group(1), name_match.group(2)
+            return name_match.group(1), name_match.group(2), name_match.group(4)
+
+
+def framework_version_from_tag(image_tag):
+    """Extract the framework version from the image tag.
+
+    Args:
+        image_tag (str): Image tag, which should take the form '<framework_version>-<device>-<py_version>'
+
+    Returns:
+        str: The framework version.
+    """
+    tag_pattern = re.compile('^(.*)-(cpu|gpu)-(py2|py3)$')
+    tag_match = tag_pattern.match(image_tag)
+    return None if tag_match is None else tag_match.group(1)
 
 
 def parse_s3_url(url):
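
Note on the fw_utils.py changes above: the three helpers are meant to round-trip. ``create_image_uri`` encodes the framework version into the image tag, ``framework_name_from_image`` returns that tag as a third tuple element, and ``framework_version_from_tag`` recovers the version from it. The sketch below is a condensed, standalone restatement of the diffed logic, not the committed code: the regular expressions are simplified and written as raw strings, and the region and instance type in the usage lines are illustrative assumptions.

```python
import re

# Simplified patterns (an assumption of this sketch; the commit uses
# non-raw strings with extra capture groups).
URI_PATTERN = re.compile(r'^\d+\.dkr\.ecr\.(.+)\.amazonaws\.com/(.*:.*)$')
NAME_PATTERN = re.compile(r'^sagemaker-(tensorflow|mxnet)-(py2|py3)-(cpu|gpu):(.*)$')
TAG_PATTERN = re.compile(r'^(.*)-(cpu|gpu)-(py2|py3)$')


def create_image_uri(region, framework, instance_type, framework_version,
                     py_version, account='520713654638'):
    """Build the ECR URI; the tag is '<framework_version>-<device>-<py_version>'."""
    # As in the diff: the fourth character of e.g. 'ml.p2.xlarge' picks the device.
    device_type = 'gpu' if instance_type[3] in ('g', 'p') else 'cpu'
    tag = "{}-{}-{}".format(framework_version, device_type, py_version)
    return "{}.dkr.ecr.{}.amazonaws.com/sagemaker-{}-{}-{}:{}".format(
        account, region, framework, py_version, device_type, tag)


def framework_name_from_image(image_name):
    """Return (framework, py_version, tag), or (None, None, None) if no match."""
    uri_match = URI_PATTERN.match(image_name)
    if uri_match is None:
        return None, None, None
    name_match = NAME_PATTERN.match(uri_match.group(2))
    if name_match is None:
        return None, None, None
    return name_match.group(1), name_match.group(2), name_match.group(4)


def framework_version_from_tag(image_tag):
    """Return the framework version part of the tag, or None if it does not match."""
    tag_match = TAG_PATTERN.match(image_tag)
    return None if tag_match is None else tag_match.group(1)


# Round trip with illustrative inputs.
uri = create_image_uri('us-west-2', 'mxnet', 'ml.p2.xlarge', '1.0.0', 'py3')
framework, py_version, tag = framework_name_from_image(uri)
version = framework_version_from_tag(tag)
```

One consequence worth noting: a tag in the old style, such as ``1.0``, does not match the tag pattern, so ``framework_version_from_tag`` returns ``None`` for it and callers need to handle that case.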

diff --git a/src/sagemaker/mxnet/__init__.py b/src/sagemaker/mxnet/__init__.py
@@ -1,4 +1,4 @@
-# Copyright 2017 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+# Copyright 2017-2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License"). You
 # may not use this file except in compliance with the License. A copy of
@@ -14,5 +14,3 @@
 from sagemaker.mxnet.model import MXNetModel, MXNetPredictor
 
 __all__ = [MXNet, MXNetModel, MXNetPredictor]
-
-DOCKER_TAG = "1.0"
diff --git a/src/sagemaker/mxnet/defaults.py b/src/sagemaker/mxnet/defaults.py
@@ -0,0 +1,13 @@
+# Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License"). You
+# may not use this file except in compliance with the License. A copy of
+# the License is located at
+#
+#     http://aws.amazon.com/apache2.0/
+#
+# or in the "license" file accompanying this file. This file is
+# distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF
+# ANY KIND, either express or implied. See the License for the specific
+# language governing permissions and limitations under the License.
+MXNET_VERSION = '1.0'