*****************************
Make sure you are using the `conda_python3` Jupyter Kernel.
We will install the necessary libraries.
*****************************


# Training and hosting SageMaker Models using the Apache MXNet Module API

The **SageMaker Python SDK** makes it easy to train and deploy MXNet models. In this example, we train a simple neural network using the Apache MXNet [Module API](https://mxnet.apache.org/api/python/module/module.html) and the MNIST dataset. The MNIST dataset is widely used for handwritten digit classification and consists of 70,000 labeled 28x28 pixel grayscale images of handwritten digits. The dataset is split into 60,000 training images and 10,000 test images. There are 10 classes (one for each of the 10 digits). The task is to train a model on the 60,000 training images and then test its classification accuracy on the 10,000 test images.

## Setup

First we need to define a few variables that will be needed later in the example.

### Install MXNet and SageMaker
_Note:  Ignore Warnings and Errors Below_

In [8]:
!pip uninstall -y mxnet

Uninstalling mxnet-1.5.1:
  Successfully uninstalled mxnet-1.5.1
You are using pip version 10.0.1, however version 19.3.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.


In [10]:
!pip3 uninstall -y mxnet

Skipping mxnet as it is not installed.


In [3]:
!pip3 install mxnet==1.5.1 --upgrade --ignore-installed --no-cache

Collecting mxnet==1.5.1
  Downloading https://files.pythonhosted.org/packages/98/57/549a1ac1179c9d590cd8fa1c8b4c6bdeb581139f65e3790812c81ae98bbf/mxnet-1.5.1-py2.py3-none-manylinux1_x86_64.whl (23.1MB)
Collecting graphviz<0.9.0,>=0.8.1 (from mxnet==1.5.1)
  Downloading https://files.pythonhosted.org/packages/53/39/4ab213673844e0c004bed8a0781a0721a3f6bb23eb8854ee75c236428892/graphviz-0.8.4-py2.py3-none-any.whl
Collecting requests<3,>=2.20.0 (from mxnet==1.5.1)
  Downloading https://files.pythonhosted.org/packages/51/bd/23c926cd341ea6b7dd0b2a00aba99ae0f828be89d72b2190f27c11d4b7fb/requests-2.22.0-py2.py3-none-any.whl (57kB)
Collecting numpy<2.0.0,>1.16.0 (from mxnet==1.5.1)
  Downloading https://files.pythonhosted.org/packages/92/e6/45f71bd24f4e37629e9db5fb75caab919507deae6a5a257f9e4685a5f931/numpy-1.18.0-cp36-cp36m

In [4]:
!pip3 install sagemaker --upgrade --ignore-installed --no-cache --user

Collecting sagemaker
  Downloading https://files.pythonhosted.org/packages/d1/af/9f4c8faf81faee2d51ef4bd91f2e081f1a8f3ea2ae28836ac2cfd00d333f/sagemaker-1.50.0.tar.gz (291kB)
Collecting boto3>=1.10.32 (from sagemaker)
  Downloading https://files.pythonhosted.org/packages/3f/f9/9798c5221d45b637ae1f42f0e0467e3bdfc3af46769b6bc7a29d93b2ecf6/boto3-1.10.46-py2.py3-none-any.whl (128kB)
Collecting numpy>=1.9.0 (from sagemaker)
  Downloading https://files.pythonhosted.org/packages/92/e6/45f71bd24f4e37629e9db5fb75caab919507deae6a5a257f9e4685a5f931/numpy-1.18.0-cp36-cp36m-manylinux1_x86_64.whl (20.1MB)
Collecting protobuf>=3.1 (from sagemaker)
  Downloading https://files.pythonhosted.org/packages/ca/ac/838c8c8a5f33a58132dd2ad2a30329f6ae161

In [6]:
#!pip3 install requests==2.20.1 --user

## Restart the Kernel to Recognize New Dependencies Above

In [None]:
from IPython.display import display_html
display_html("<script>Jupyter.notebook.kernel.restart()</script>", raw=True)

In [None]:
!pip3 list

## Create the SageMaker Session

In [10]:
import sagemaker
from sagemaker import get_execution_role

sagemaker_session = sagemaker.Session()

## Setup the Service Execution Role and Region
Get the IAM role ARN that gives training and hosting access to your data. See the documentation for how to create these roles. Note: if more than one role is required for notebook instances, training, and/or hosting, replace `sagemaker.get_execution_role()` with the appropriate full IAM role ARN string(s).

In [11]:
# S3 bucket for saving code and model artifacts.
# Feel free to specify a different bucket here if you wish.
bucket = sagemaker_session.default_bucket()

# Location to save your custom code in tar.gz format.
custom_code_upload_location = 's3://{}/customcode/mxnet'.format(bucket)

# Location where results of model training are saved.
model_artifacts_location = 's3://{}/artifacts'.format(bucket)

In [12]:
role = get_execution_role()
print('RoleARN:  {}\n'.format(role))

region = sagemaker_session.boto_session.region_name
print('Region:  {}'.format(region))

RoleARN:  arn:aws:iam::835319576252:role/service-role/AmazonSageMaker-ExecutionRole-20191006T135881

Region:  us-east-1
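
If you do supply an explicit role ARN instead of calling `get_execution_role()`, a quick format check can catch copy-paste mistakes before a training job is submitted. The sketch below is illustrative only: the helper name `resolve_role`, the regular expression, and the example ARN are assumptions, not part of the SageMaker SDK.

```python
import re

# Loose pattern for an IAM role ARN (an assumption for illustration; it does
# not cover every AWS naming rule).
ROLE_ARN_PATTERN = re.compile(r"^arn:aws[a-z-]*:iam::\d{12}:role/.+$")


def resolve_role(explicit_arn=None, fallback=None):
    """Return explicit_arn if it looks like a role ARN, else call fallback()."""
    if explicit_arn is not None:
        if not ROLE_ARN_PATTERN.match(explicit_arn):
            raise ValueError("Not a valid IAM role ARN: {}".format(explicit_arn))
        return explicit_arn
    if fallback is None:
        raise ValueError("Provide either explicit_arn or a fallback callable")
    return fallback()


# Hypothetical ARN; replace it with your own role.
example_role = resolve_role("arn:aws:iam::123456789012:role/MyTrainingRole")
print(example_role)
```

In this notebook, passing `fallback=get_execution_role` would keep the default behavior when no explicit ARN is given.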


### The training script

The ``mnist_mxnet.py`` script provides all the code we need for training and hosting a SageMaker model. The script also checkpoints the model at the end of every epoch, saving the model graph, parameters, and optimizer state in the folder `/opt/ml/checkpoints`. If that folder does not exist, checkpointing is skipped. The script is adapted from the Apache MXNet [MNIST tutorial](https://mxnet.incubator.apache.org/tutorials/python/mnist.html).
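
The skip-if-missing checkpoint behavior described here can be sketched roughly as follows. The helper name `checkpoint_prefix` and the `mnist` file prefix are assumptions for illustration; the actual logic in `mnist_mxnet.py` may differ.

```python
import os

CHECKPOINT_DIR = "/opt/ml/checkpoints"


def checkpoint_prefix(checkpoint_dir=CHECKPOINT_DIR, name="mnist"):
    """Return a file prefix for checkpoints, or None if the folder is missing."""
    if not os.path.isdir(checkpoint_dir):
        # No checkpoint folder: the caller should skip checkpointing.
        return None
    return os.path.join(checkpoint_dir, name)


prefix = checkpoint_prefix()
if prefix is None:
    print("Checkpoint folder not found; skipping checkpoints")
else:
    # With MXNet installed, a per-epoch callback could be created with
    # mx.callback.do_checkpoint(prefix) and passed to Module.fit through
    # its epoch_end_callback argument.
    print("Checkpoints will be written with prefix {}".format(prefix))
```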



In [2]:
!cat ./src/mnist_mxnet.py

import argparse
import gzip
import json
import logging
import os
import struct

import mxnet as mx
import numpy as np


def load_data(path):
    with gzip.open(find_file(path, "labels.gz")) as flbl:
        struct.unpack(">II", flbl.read(8))
        labels = np.fromstring(flbl.read(), dtype=np.int8)
    with gzip.open(find_file(path, "images.gz")) as fimg:
        _, _, rows, cols = struct.unpack(">IIII", fimg.read(16))
        images = np.fromstring(fimg.read(), dtype=np.uint8).reshape(len(labels), rows, cols)
        images = images.reshape(images.shape[0], 1, 28, 28).astype(np.float32) / 255
    return labels, images


def find_file(root_path, file_name):
    for root, dirs, files in os.walk(root_path):
        if file_name in files:
            return os.path.join(root, file_name)


def build_graph():
    data = mx.sym.var('data')
    data = mx.sym.flatten(data=data)
    fc1 = mx.sym.FullyConnected(data=data, num_hidden=128)
    act1 = mx.sym.Activat

### SageMaker's MXNet Estimator Class

The SageMaker ```MXNet``` estimator allows us to run single machine or distributed training in SageMaker, using CPU or GPU-based instances.

When we create the estimator, we pass in the filename of our training script, the name of our IAM execution role, and the S3 locations we defined in the setup section. We also provide a few other parameters. ``train_instance_count`` and ``train_instance_type`` determine the number and type of SageMaker instances that will be used for the training job. The ``hyperparameters`` parameter is a ``dict`` of values that will be passed to your training script -- you can see how to access these values in the ``mnist_mxnet.py`` script above.

For this example, we will choose one ``ml.m4.xlarge`` instance.

In [3]:
from sagemaker.mxnet import MXNet

mnist_estimator = MXNet(entry_point='mnist_mxnet.py',
                        source_dir='./src',
                        role=role,
                        output_path=model_artifacts_location,
                        code_location=custom_code_upload_location,
                        train_instance_count=1,
                        train_instance_type='ml.m4.xlarge',
                        framework_version='1.4.1',
                        py_version='py3',
                        distributions={'parameter_server': {'enabled': True}},
                        hyperparameters={'learning-rate': 0.1})
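
Because the script listing above is cut off, here is a minimal sketch of how a script-mode training script typically receives these values. SageMaker passes each hyperparameter as a command-line argument (for example `--learning-rate 0.1`) and exposes the model output directory in the `SM_MODEL_DIR` environment variable. The function name `parse_args` and the defaults are assumptions, not the contents of `mnist_mxnet.py`.

```python
import argparse
import os


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Matches the 'learning-rate' key in the estimator's hyperparameters dict.
    parser.add_argument("--learning-rate", type=float, default=0.1)
    # SM_MODEL_DIR is set inside the training container; the fallback is the
    # conventional path and is only used when the variable is absent.
    parser.add_argument("--model-dir", type=str,
                        default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    # Ignore any arguments this sketch does not know about.
    args, _unknown = parser.parse_known_args(argv)
    return args


# Simulate the arguments the container would pass for this estimator.
args = parse_args(["--learning-rate", "0.05"])
print(args.learning_rate)
```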

### Running the Training Job

After we've constructed our MXNet object, we can fit it using data stored in S3. Below we run SageMaker training on two input channels: **train** and **test**.

During training, SageMaker makes the data stored in S3 available in the local filesystem where the training script is running. The ``mnist_mxnet.py`` script simply loads the train and test data from disk.

In [13]:
%%time

import boto3

region = boto3.Session().region_name
train_data_location = 's3://sagemaker-sample-data-{}/mxnet/mnist/train'.format(region)
test_data_location = 's3://sagemaker-sample-data-{}/mxnet/mnist/test'.format(region)

mnist_estimator.fit({'train': train_data_location, 'test': test_data_location}, wait=False)

training_job_name = mnist_estimator.latest_training_job.name
print('training_job_name:  {}'.format(training_job_name))

training_job_name:  mxnet-training-2020-01-06-18-54-07-615
CPU times: user 34.9 ms, sys: 4.56 ms, total: 39.4 ms
Wall time: 344 ms
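
Inside the training container, each input channel passed to `fit` is exposed as a local directory. A minimal sketch of how a script could locate those directories, assuming the `SM_CHANNEL_<NAME>` environment variables that SageMaker sets and the conventional `/opt/ml/input/data/<name>` path as a fallback (the helper name `channel_dir` is hypothetical):

```python
import os


def channel_dir(channel_name):
    """Return the local directory for a named input channel."""
    env_var = "SM_CHANNEL_{}".format(channel_name.upper())
    default = "/opt/ml/input/data/{}".format(channel_name)
    return os.environ.get(env_var, default)


for name in ("train", "test"):
    print("{} -> {}".format(name, channel_dir(name)))
```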


In [None]:
from sagemaker.mxnet import MXNet

mnist_estimator = MXNet.attach(training_job_name=training_job_name)

2020-01-06 18:54:09 Starting - Launching requested ML instances.

In [5]:
# `model_artifacts_location` is the `output_path` passed to the estimator above.
!aws --region {region} s3 ls --recursive {model_artifacts_location}/{training_job_name}


In [None]:
print(model_artifacts_location)

### Creating an inference Endpoint

After training, we use the ``MXNet`` estimator object to build and deploy an ``MXNetPredictor``. This creates a SageMaker **Endpoint** -- a hosted prediction service that we can use to perform inference.

The arguments to the ``deploy`` function allow us to set the number and type of instances that will be used for the Endpoint. These do not need to be the same as the values we used for the training job. For example, you can train a model on a set of GPU-based instances, and then deploy the Endpoint to a fleet of CPU-based instances. Here we will deploy the model to a single ``ml.m4.xlarge`` instance.

In [None]:
%%time

predictor = mnist_estimator.deploy(initial_instance_count=1,
                                   instance_type='ml.m4.xlarge')

The request handling behavior of the Endpoint is determined by the ``mnist_mxnet.py`` script. In this case, the script doesn't include any request handling functions, so the Endpoint will use the default handlers provided by SageMaker. These default handlers allow us to perform inference on input data encoded as a multi-dimensional JSON array.
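
As an illustration of that input format, the sketch below builds a JSON payload for one 28x28 grayscale image. The `(1, 1, 28, 28)` batch shape mirrors the reshape in `load_data` above; the all-zero image and the helper name `to_json_payload` are placeholders, and the exact shape a deployed model expects depends on how it was trained.

```python
import json


def to_json_payload(image_rows):
    """Wrap a 28x28 list of pixel rows as a (1, 1, 28, 28) nested JSON array."""
    if len(image_rows) != 28 or any(len(row) != 28 for row in image_rows):
        raise ValueError("Expected a 28x28 image")
    batch = [[image_rows]]  # add the batch and channel dimensions
    return json.dumps(batch)


# Placeholder image: every pixel is 0.0.
blank_image = [[0.0] * 28 for _ in range(28)]
payload = to_json_payload(blank_image)
print(payload[:40])
```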

### Making an inference request

Now that our Endpoint is deployed and we have a ``predictor`` object, we can use it to classify handwritten digits.

To see inference in action, draw a digit in the image box below. The pixel data from your drawing will be loaded into a ``data`` variable in this notebook. 

*Note: after drawing the image, you'll need to move to the next notebook cell.*

In [None]:
from IPython.display import HTML
HTML(open("input.html").read())

Now we can use the ``predictor`` object to classify the handwritten digit:

In [None]:
response = predictor.predict(data)
print('Raw prediction result:')
response = response[0]
print(response)

labeled_predictions = list(zip(range(10), response))
print('Labeled predictions: ')
print(labeled_predictions)

labeled_predictions.sort(key=lambda label_and_prob: 1.0 - label_and_prob[1])
print('Most likely answer: {}'.format(labeled_predictions[0]))
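
If only the top class is needed, sorting the whole list is unnecessary. A sketch of the same selection using `max` (the sample probabilities are made up for illustration and are not output from the Endpoint):

```python
def most_likely_digit(probabilities):
    """Return (digit, probability) for the highest-scoring class."""
    return max(enumerate(probabilities), key=lambda pair: pair[1])


# Made-up probabilities for the ten digit classes.
sample = [0.01, 0.02, 0.05, 0.70, 0.02, 0.05, 0.05, 0.04, 0.03, 0.03]
digit, prob = most_likely_digit(sample)
print("Most likely answer: {} ({:.2f})".format(digit, prob))
```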

### (Optional) Delete the Endpoint

After you have finished with this example, remember to delete the prediction endpoint to release the instance(s) associated with it.

In [None]:
print("Endpoint name: " + predictor.endpoint)

In [None]:
import sagemaker

predictor.delete_endpoint()