# AutoEncoder Example: Distributed training and Hyperparameter Tuning with Amazon SageMaker

Training the spatio-temporal stacked-frame autoencoder is significantly slower. For this reason, we use SageMaker in this notebook to tune the model's hyperparameters and to train the model in parallel on multiple hosts.

In [None]:
import sagemaker

sagemaker_session = sagemaker.Session()
role = sagemaker.get_execution_role()
print(role)

## Train and Deploy the standard way
You need to upload the training data to an S3 bucket; see the script ```upload_data.py``` for how to do this.
Once the data is uploaded, you can define the MXNet Estimator. It takes an entry point ```train.py```, the role, the training instance type, the output path where the results are stored, and another path where the code is uploaded. If you don't specify these paths, SageMaker uses a default bucket. We also have to define the version of the deep learning framework we want to use and the hyperparameters.

For debugging, it can be useful to set the instance type to ```local``` at first. SageMaker then executes your code on your local notebook instance.

In [None]:
from sagemaker.mxnet import MXNet

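# Name of your S3 bucket (without the s3:// prefix)
MY_S3_BUCKET = ''

mxnet_estimator = MXNet('train.py',
                        role=role,
                        train_instance_type='local',  # switch to 'ml.m5.xlarge' for a remote job
                        train_instance_count=1,
                        # Insert the bucket name instead of passing the literal placeholder text
                        output_path='s3://{}'.format(MY_S3_BUCKET),
                        code_location='s3://{}'.format(MY_S3_BUCKET),
                        framework_version='1.3.0', py_version='py2',
                        hyperparameters={'batch_size': 16,
                                         'epochs': 10,
                                         'learning_rate': 0.0001,
                                         'wd': 0.0})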


Now we can call ```fit``` on our training data. Behind the scenes, SageMaker spins up the EC2 instance specified in ```train_instance_type``` (unless it is set to local). Once the instance is ready, SageMaker downloads an MXNet Docker container and executes the function ```train()``` from ```train.py```, which creates and trains the model. After training, the model is saved.

In [None]:

mxnet_estimator.fit({'train': 's3://{}/data/input_data.npy'.format(MY_S3_BUCKET)})
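
To make the role of ```train()``` more concrete, here is a minimal sketch of how such an entry point might receive its settings. It assumes the legacy MXNet container interface, in which the function is handed a ```hyperparameters``` dictionary and a ```channel_input_dirs``` dictionary that maps each channel name to a local directory. The helper ```resolve_config```, the defaults, and the file name are hypothetical and are not taken from the actual ```train.py```.

```python
import os

# Hypothetical defaults; the real train.py may use different values.
DEFAULTS = {'batch_size': 16, 'epochs': 10, 'learning_rate': 0.0001, 'wd': 0.0}


def resolve_config(hyperparameters, channel_input_dirs, file_name='input_data.npy'):
    """Merge user hyperparameters with defaults and locate the training file."""
    config = dict(DEFAULTS)
    config.update(hyperparameters)
    # Cast defensively in case values arrive as strings.
    config['batch_size'] = int(config['batch_size'])
    config['epochs'] = int(config['epochs'])
    config['learning_rate'] = float(config['learning_rate'])
    config['wd'] = float(config['wd'])
    # 'train' is the channel name used as the key in fit({'train': ...}).
    config['train_file'] = os.path.join(channel_input_dirs['train'], file_name)
    return config


def train(hyperparameters, channel_input_dirs, **kwargs):
    """Entry point sketch: resolve the settings, then build and fit the model."""
    config = resolve_config(hyperparameters, channel_input_dirs)
    # Model construction and the training loop would go here; the real
    # function would return the trained network instead of the config.
    return config


config = train({'epochs': '5'}, {'train': '/opt/ml/input/data/train'})
print(config['epochs'], config['train_file'])
```

Keeping the settings logic in a separate helper makes it easy to check outside of SageMaker, which fits well with debugging in local mode first.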

Once our autoencoder model is trained, we can deploy it. Here we specify that the endpoint runs on an ```ml.m5.xlarge``` instance, which does not provide any GPUs. Inference won't be very fast, but this instance type is cheaper than a GPU instance.

In [None]:
predictor = mxnet_estimator.deploy(instance_type='ml.m5.xlarge', initial_instance_count=1)

Now that the endpoint is ready, we can send requests to it. SageMaker provides default code for model inference, but it is often useful to customize these functions. For this reason, ```train.py``` overrides the default ```model_fn```. In the following example, we send a NumPy array filled with zeros to the endpoint. The endpoint verifies the request, parses the input, loads the model, and returns the inference results.

In [None]:
from sagemaker.predictor import numpy_deserializer, npy_serializer
import numpy as np

# Uncomment the following lines to exchange data with the endpoint in the binary .npy format
#predictor.accept = 'application/x-npy'
#predictor.content_type = 'application/x-npy'
#predictor.deserializer = numpy_deserializer
#predictor.serializer = npy_serializer
print(predictor.predict(np.zeros((10, 10, 227, 277))))
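
The commented-out lines above refer to the ```application/x-npy``` content type. As a rough illustration of what that format amounts to, the following sketch round-trips an array through NumPy's binary ```.npy``` representation. It assumes that serialization is essentially ```np.save``` into an in-memory buffer and parsing is ```np.load``` from one. The helper names ```to_npy_bytes``` and ```from_npy_bytes``` are hypothetical, and the small array is only a stand-in for a real request.

```python
import io

import numpy as np


def to_npy_bytes(array):
    """Serialize an array into the binary .npy format."""
    buffer = io.BytesIO()
    np.save(buffer, array)
    return buffer.getvalue()


def from_npy_bytes(payload):
    """Parse .npy bytes back into an array."""
    return np.load(io.BytesIO(payload))


# Small stand-in for a real batch of stacked frames.
request = np.zeros((2, 10, 8, 8), dtype=np.float32)
payload = to_npy_bytes(request)
restored = from_npy_bytes(payload)
print(restored.shape, restored.dtype)
```

Unlike a JSON payload, the ```.npy``` format carries the shape and dtype in its header, so the receiving side does not have to guess them.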

## Run Hyperparameter Tuning Job
Define a ```HyperparameterTuner```, which takes our MXNet estimator, the hyperparameter ranges to search, and the metric to optimize.

In [None]:
from sagemaker.tuner import HyperparameterTuner, ContinuousParameter, IntegerParameter 

tuner = HyperparameterTuner(estimator=mxnet_estimator,  
                               objective_metric_name='loss',
                               hyperparameter_ranges={'learning_rate': ContinuousParameter(0.00001, 0.0001), 
                                                      'epochs': IntegerParameter(5,50),
                                                      'wd': ContinuousParameter(0, 0.001) },
                               metric_definitions=[{'Name': 'loss', 'Regex': 'loss:([0-9\\.]+)'}],
                               max_jobs=40,
                               max_parallel_jobs=5,
                               objective_type='Minimize')
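
The ```Regex``` entry in ```metric_definitions``` determines which numbers are read from the training logs as the ```loss``` metric. The following sketch applies the same pattern to a few made-up log lines to show what it captures. The log lines and the helper ```extract_losses``` are hypothetical; the actual output of ```train.py``` may look different.

```python
import re

# Same pattern as the 'Regex' entry in metric_definitions above.
LOSS_PATTERN = re.compile(r'loss:([0-9\.]+)')

# Hypothetical log lines for illustration only.
log_lines = [
    'epoch 1 loss:0.0421',
    'epoch 2 loss:0.0389',
    'epoch 3 loss: 0.0375',  # space after the colon, so the pattern does not match
]


def extract_losses(lines):
    """Return every loss value the pattern captures, in log order."""
    values = []
    for line in lines:
        match = LOSS_PATTERN.search(line)
        if match:
            values.append(float(match.group(1)))
    return values


print(extract_losses(log_lines))
```

Two details are worth checking against the real log format: the pattern expects the number directly after ```loss:```, and it captures only digits and dots, so a value printed in scientific notation such as ```1e-05``` would be cut off at the ```e```.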


Start the hyperparameter tuning job. It launches 40 training jobs in total, with at most 5 running in parallel.

In [None]:
tuner.fit({'train': 's3://{}/data/input_data.npy'.format(MY_S3_BUCKET)})

Once the tuning job has finished, we can deploy the best model as an endpoint:

In [None]:
tuner.deploy(instance_type='ml.m5.xlarge', initial_instance_count=1)

## Distributed Training
The goal is to run a data-parallel job, where each host trains the same model but on different data.
Instead of using ```train.py``` as the entry point, we use a modified version, ```train_distributed.py```. The new script takes care of reading the right input file, setting up the ```kvstore```, and gathering and merging the overall losses. We have to increase ```train_instance_count```; otherwise the job won't run on multiple hosts.

In [None]:
mxnet_estimator = MXNet('train_distributed.py',
                        role=role,
                        train_instance_type='ml.m5.xlarge',
                        train_instance_count=2, 
                        output_path='s3://{}'.format(MY_S3_BUCKET),
                        code_location='s3://{}'.format(MY_S3_BUCKET),
                        framework_version='1.3.0', py_version='py2',
                        hyperparameters={'batch_size': 16,
                         'epochs': 10,
                         'learning_rate': 0.0001,
                         'wd': 0.0})

Start the distributed training.

In [None]:
mxnet_estimator.fit('s3://{}/data'.format(MY_S3_BUCKET))
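
As a minimal sketch of the data-parallel idea described above, the following code shows one way a host could pick its own input files and how per-host losses could be merged by averaging. It assumes that every host sees the full list of files and knows the host list and its own name. The functions ```pick_shard``` and ```merge_losses``` and the file names are hypothetical and are not taken from ```train_distributed.py```; setting up the ```kvstore``` is omitted here.

```python
def pick_shard(files, hosts, current_host):
    """Select the input files for one host by its position in the sorted host list."""
    index = sorted(hosts).index(current_host)
    ordered = sorted(files)
    # Round-robin: host i takes files i, i + n, i + 2n, ... for n hosts.
    return ordered[index::len(hosts)]


def merge_losses(per_host_losses):
    """Average the per-host losses into a single value."""
    return sum(per_host_losses) / len(per_host_losses)


hosts = ['algo-1', 'algo-2']
files = ['input_data_0.npy', 'input_data_1.npy']  # hypothetical file names

print(pick_shard(files, hosts, 'algo-1'))
print(pick_shard(files, hosts, 'algo-2'))
print(merge_losses([0.5, 0.25]))
```

Sorting both lists makes the assignment deterministic, so every host arrives at the same partition without any communication.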