## MNIST Training and Prediction with SageMaker Chainer

[MNIST](http://yann.lecun.com/exdb/mnist/), the "Hello World" of machine learning, is a popular dataset for handwritten digit classification. It consists of 70,000 28x28 grayscale images labeled in 10 digit classes (0 to 9). This tutorial will show how to train a model to predict handwritten digits on the MNIST dataset by running a Chainer script on SageMaker using the sagemaker-python-sdk.

For more on the Chainer container, please visit the sagemaker-chainer-containers repository and the sagemaker-python-sdk repository:

* https://github.com/aws/sagemaker-chainer-containers
* https://github.com/aws/sagemaker-python-sdk

For more on Chainer, please visit the Chainer repository:

* https://github.com/chainer/chainer

This notebook is adapted from the [MNIST](https://github.com/chainer/chainer/tree/master/examples/mnist) example in the Chainer repository.

In [None]:
import sagemaker
from sagemaker import get_execution_role

sagemaker_session = sagemaker.Session()

# Get a SageMaker-compatible role used by this Notebook Instance.
role = get_execution_role()

This notebook shows how to use the SageMaker Python SDK to run your code in a local container before deploying to SageMaker's managed training or hosting environments. Just change your estimator's `train_instance_type` to `local` or `local_gpu`. For more information, see [local mode](https://github.com/aws/sagemaker-python-sdk#local-mode).

To use this feature, you'll need to install docker-compose (and nvidia-docker if training with a GPU). Running the following script installs docker-compose or nvidia-docker-compose and configures the notebook environment for you.

Note that you can run only one local notebook at a time.

In [None]:
!/bin/bash ./setup.sh

## Download MNIST datasets

We can use Chainer's built-in `get_mnist()` method to download, import and preprocess the MNIST dataset.

In [None]:
import chainer

train, test = chainer.datasets.get_mnist()

## Parse, save, and upload the data

We save our data, then use `sagemaker_session.upload_data` to upload the data to an S3 location used for training. The return value identifies the S3 path to the uploaded data.

In [None]:
import os
import shutil
import numpy as np

train_images = np.array([data[0] for data in train])
train_labels = np.array([data[1] for data in train])
test_images = np.array([data[0] for data in test])
test_labels = np.array([data[1] for data in test])

try:
    os.makedirs('/tmp/data/train')
    os.makedirs('/tmp/data/test')

    np.savez('/tmp/data/train/train.npz', images=train_images, labels=train_labels)
    np.savez('/tmp/data/test/test.npz', images=test_images, labels=test_labels)

    train_input = sagemaker_session.upload_data(path=os.path.join('/tmp/data', 'train'), key_prefix='notebook/chainer/mnist')
    test_input = sagemaker_session.upload_data(path=os.path.join('/tmp/data', 'test'), key_prefix='notebook/chainer/mnist')
finally:
    shutil.rmtree('/tmp/data')

## Writing the Chainer script to run on Amazon SageMaker

### Training

We need to provide a training script that can run on the SageMaker platform. The training script is very similar to a training script you might run outside of SageMaker, but you can construct a `sagemaker_containers.env.TrainingEnv` instance to discover useful properties from the training environment, such as:

  * `training_env.model_dir (str)`: path to the directory to write model artifacts to. These artifacts are uploaded to S3 for model hosting.
  * `training_env.num_gpus (int)`: The number of GPUs available to the host.
  * `training_env.channel_input_dirs (dict of str: str)`: A map of input channel names (like 'train' and 'test') to filesystem paths to data in those input channels. 
  * `training_env.output_data_dir (str)`: The filesystem path to write output artifacts to. Output artifacts may include checkpoints, graphs, and other files to save, not including model artifacts. These artifacts are compressed and uploaded to S3 to the same S3 prefix as the model artifacts.
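
As a rough illustration of how a script might read its input channels, the sketch below loads an `.npz` archive from a channel directory. It assumes the archive uses the `images` and `labels` keys written earlier in this notebook; `load_channel` is a hypothetical helper, and a plain dictionary backed by a temporary directory stands in for `training_env.channel_input_dirs` so the example runs outside a container.

```python
import os
import tempfile

import numpy as np


def load_channel(channel_dir, filename):
    """Hypothetical helper: load 'images' and 'labels' from an .npz file in a channel directory."""
    with np.load(os.path.join(channel_dir, filename)) as archive:
        return archive['images'], archive['labels']


# A temporary directory plays the role of the 'train' channel here.
with tempfile.TemporaryDirectory() as tmp:
    train_dir = os.path.join(tmp, 'train')
    os.makedirs(train_dir)

    # Write a tiny fake dataset using the same keys as the upload step.
    np.savez(os.path.join(train_dir, 'train.npz'),
             images=np.zeros((4, 784), dtype=np.float32),
             labels=np.arange(4, dtype=np.int32))

    # Stand-in for training_env.channel_input_dirs (an assumption for this sketch).
    channel_input_dirs = {'train': train_dir}
    images, labels = load_channel(channel_input_dirs['train'], 'train.npz')

print(images.shape, labels.tolist())
```

Inside a real training job, the dictionary would come from the training environment rather than being built by hand.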

A typical training script loads data from the input channels, configures training with hyperparameters, trains a model, and saves a model to `model_dir` so that it can be hosted later. Hyperparameters are passed to your script as arguments and can be retrieved with an `argparse.ArgumentParser` instance.

Because the Chainer container imports your training script, you should always put your training code in a main guard (`if __name__=='__main__':`) so that the container does not inadvertently run your training code at the wrong point in execution.
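
The overall shape of such a script can be sketched as follows. This is a minimal outline, not the script used in this notebook: the `train` function is a hypothetical placeholder, the default values are arbitrary, and the argument names simply mirror the `epochs` and `batch_size` hyperparameters used later on.

```python
import argparse


def parse_hyperparameters(argv=None):
    """Read hyperparameters from command-line arguments."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--epochs', type=int, default=20)
    parser.add_argument('--batch_size', type=int, default=100)
    # parse_known_args ignores any extra arguments this sketch does not define.
    args, _unknown = parser.parse_known_args(argv)
    return args


def train(epochs, batch_size):
    """Hypothetical placeholder for the real training loop."""
    return 'training for {} epochs with batch size {}'.format(epochs, batch_size)


# The main guard keeps training from running when the module is merely imported,
# for example when the container imports the script for hosting.
if __name__ == '__main__':
    args = parse_hyperparameters()
    print(train(args.epochs, args.batch_size))
```

Calling `parse_hyperparameters(['--epochs', '3', '--batch_size', '128'])` directly is a convenient way to check the parsing logic without launching a training job.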

For more on `TrainingEnv`, please visit https://github.com/aws/sagemaker-containers.

### Hosting and Inference

We will use the same script to host the Chainer model as we will use to train it (but this is not necessary -- we could use separate training and hosting scripts). In contrast with the training script, the hosting script requires you to implement functions with particular function signatures (or rely on defaults for those functions).

These function hooks load your model, deserialize data sent by a client, obtain inferences from your loaded model, and serialize predictions back to a client:


* `model_fn(model_dir)`: This function is invoked to load model artifacts from those written into `model_dir` during training.
* `input_fn(input_data, content_type)`: This function is invoked to deserialize prediction data when a prediction request is made. The return value is passed to `predict_fn`. `input_fn` accepts two arguments: `input_data`, which is the serialized input data in the body of the prediction request, and `content_type`, the MIME type of the data.
* `predict_fn(input_data, model)`: This function accepts the return value of `input_fn` (as `input_data`) and the return value of `model_fn` (as `model`), and returns inferences obtained from the model.
* `output_fn(prediction, accept)`: This function is invoked to serialize the return value from `predict_fn`, passed in via `prediction`, back to the SageMaker client in response to prediction requests.


`model_fn` is always required, but defaults exist for the remaining functions. These defaults deserialize a NumPy array, invoke the model's `__call__` method on the input data, and serialize a NumPy array back to the client.
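
To make the flow between these hooks concrete, here is a simplified NumPy-only sketch of the request path. It is not the container's actual implementation: the `application/x-npy` content type, the error handling, and `toy_model` are assumptions chosen for illustration, and a real `predict_fn` for Chainer would also deal with devices and Chainer variables.

```python
import io

import numpy as np

# Assumed MIME type for a serialized NumPy array in this sketch.
NPY = 'application/x-npy'


def input_fn(input_data, content_type):
    """Deserialize the request body into a NumPy array."""
    if content_type != NPY:
        raise ValueError('unsupported content type: {}'.format(content_type))
    return np.load(io.BytesIO(input_data), allow_pickle=False)


def predict_fn(input_data, model):
    """Run the model by calling it on the deserialized input."""
    return model(input_data)


def output_fn(prediction, accept):
    """Serialize the prediction back into bytes for the client."""
    if accept != NPY:
        raise ValueError('unsupported accept type: {}'.format(accept))
    buffer = io.BytesIO()
    np.save(buffer, np.asarray(prediction), allow_pickle=False)
    return buffer.getvalue()


def toy_model(batch):
    """Hypothetical stand-in for the object returned by model_fn."""
    return batch * 2.0


# Simulate one request: serialize, run through the three hooks, deserialize.
request = io.BytesIO()
np.save(request, np.ones((2, 3), dtype=np.float32))
response = output_fn(predict_fn(input_fn(request.getvalue(), NPY), toy_model), NPY)
result = np.load(io.BytesIO(response))
print(result)
```

The same chaining order applies when you override only some of the hooks: each function receives the return value of the one before it.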

This notebook relies on the default `input_fn`, `predict_fn`, and `output_fn`. See the Chainer sentiment analysis notebook for an example of how one can implement these hosting functions.

Please examine the script below. Training occurs behind the main guard, and `model_fn` loads the model saved into `model_dir` during training.



For more on writing Chainer scripts to run on SageMaker, or for more on the Chainer container itself, please see the following repositories: 

* For writing Chainer scripts to run on SageMaker: https://github.com/aws/sagemaker-python-sdk
* For more on the Chainer container and default hosting functions: https://github.com/aws/sagemaker-chainer-containers


In [None]:
!cat 'chainer_mnist_single_machine.py'

## Create SageMaker Chainer estimator

To run our Chainer training script on SageMaker, we construct a `sagemaker.chainer.estimator.Chainer` estimator, which accepts several constructor arguments:

* `entry_point`: The path to the Python script SageMaker runs for training and prediction.


* `train_instance_count`: An integer representing how many training instances to start.


* `train_instance_type`: The type of SageMaker instances for training. We pass the string `local` or `local_gpu` here to enable the local mode for training in the local environment. `local` is for cpu training and `local_gpu` is for gpu training. If you want to train on a remote instance, specify a SageMaker ML instance type here accordingly. See [Amazon SageMaker ML Instance Types](https://aws.amazon.com/sagemaker/pricing/instance-types/) for a list of instance types.


* `hyperparameters`: A dictionary of hyperparameters, passed to the training script as arguments.

In [None]:
import subprocess

from sagemaker.chainer.estimator import Chainer

instance_type = 'local'

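try:
    # nvidia-smi exits with status 0 when a GPU is present.
    if subprocess.call('nvidia-smi') == 0:
        instance_type = 'local_gpu'
except OSError:
    # nvidia-smi is not installed, so keep the CPU instance type.
    pass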
    
print("Instance type = " + instance_type)

chainer_estimator = Chainer(entry_point='chainer_mnist_single_machine.py', role=role,
                            train_instance_count=1, train_instance_type=instance_type,
                            hyperparameters={'epochs': 3, 'batch_size': 128})

## Train on MNIST data in S3

After we've constructed our Chainer object, we can fit it using the MNIST data we uploaded to S3. SageMaker makes sure our data is available in the local filesystem, so our user script can simply read the data from disk.

In [None]:
chainer_estimator.fit({'train': train_input, 'test': test_input})

Our user script writes various artifacts, such as plots, to the `output_data_dir` directory, the contents of which SageMaker uploads to S3. Now we collect the plot images from these artifacts into a local folder.

In [None]:
import glob
import shutil

# Create the destination folder if it does not already exist.
os.makedirs('output/single_machine_mnist', exist_ok=True)

chainer_training_job = chainer_estimator.latest_training_job.name

desc = chainer_estimator.sagemaker_session.sagemaker_client.describe_training_job(TrainingJobName=chainer_training_job)
output_data = desc['ModelArtifacts']['S3ModelArtifacts'].replace('model', 'output')
for file in glob.glob(output_data + '/**/*.png', recursive=True):
    shutil.copy(file, 'output/single_machine_mnist')

These plots show the accuracy and loss over each epoch:

In [None]:
from IPython.display import Image
from IPython.display import display

accuracy_graph = Image(filename = "output/single_machine_mnist/accuracy.png", width=800, height=800)
loss_graph = Image(filename = "output/single_machine_mnist/loss.png", width=800, height=800)

display(accuracy_graph, loss_graph)

## Deploy model to endpoint

After training, we deploy the model to an endpoint. Here we also set `instance_type` to `local` or `local_gpu` so that the model is deployed in the local environment.

In [None]:
predictor = chainer_estimator.deploy(initial_instance_count=1, instance_type=instance_type)

## Predict Handwritten Digits

We can use this predictor returned by `deploy` to send inference requests to our locally-hosted model. Let's get some random test images in MNIST first.

In [None]:
import random

import matplotlib.pyplot as plt

num_samples = 5
# Sample from every valid index, including the last test image.
indices = random.sample(range(test_images.shape[0]), num_samples)
images, labels = test_images[indices], test_labels[indices]

for i in range(num_samples):
    plt.subplot(1,num_samples,i+1)
    plt.imshow(images[i].reshape(28, 28), cmap='gray')
    plt.title(labels[i])
    plt.axis('off')

Now let's see if we can make correct predictions.

In [None]:
prediction = predictor.predict(images)
predicted_label = prediction.argmax(axis=1)
print('The predicted labels are: {}'.format(predicted_label))
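
The `argmax(axis=1)` step picks, for each image, the class with the highest score. The small sketch below illustrates this on made-up numbers, assuming the prediction is a 2-D array with one row per image and one column per class; the scores, the three-class layout, and `true_labels` are invented for illustration and are not output from the model.

```python
import numpy as np

# Made-up scores for 3 images over 3 classes (one row per image).
scores = np.array([[0.1, 2.5, 0.3],
                   [1.9, 0.2, 0.4],
                   [0.0, 0.1, 3.2]])

# Made-up ground-truth labels for the same 3 images.
true_labels = np.array([1, 0, 1])

# The index of the largest score in each row is the predicted class.
predicted = scores.argmax(axis=1)

# Fraction of images whose predicted class matches the true label.
accuracy = float((predicted == true_labels).mean())

print(predicted, accuracy)
```

The same comparison can be applied to `predicted_label` and `labels` from the cells above to check how many of the sampled digits were classified correctly.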

Now let's get some test data from you! Drawing into the image box loads the pixel data into a variable named 'data' in this notebook, which we can then pass to the Chainer predictor.

In [None]:
from IPython.display import HTML
HTML(open("input.html").read())

Now let's see if your writing can be recognized!

In [None]:
image = np.array(data, dtype=np.float32)
prediction = predictor.predict(image)
predicted_label = prediction.argmax(axis=1)[0]
print('What you wrote is: {}'.format(predicted_label))

## Clean resources

After you have finished with this example, remember to delete the prediction endpoint to release the instance associated with it.

In [None]:
chainer_estimator.delete_endpoint()