# Welcome to Duckietown!

This is the companion tutorial for learning how to use Amazon SageMaker to train your Duckietown AIDO submission... **in the cloud**!

We'll be building off of our [Reinforcement Learning](https://goo.gl/YFTjn3) tutorial, where we take DDPG and use SageMaker to train it faster!

This tutorial will walk you through, step by step, how to set up SageMaker in your AWS account and use it to train an AIDO Lane Following submission.

Some prerequisites we expect you to have:
1. An AWS account (you can get one by signing up [here](https://aws.amazon.com/)).
2. A good overview of the code we'll be looking at. We'll be building off [this repository](https://github.com/duckietown/challenge-aido1_LF1-baseline-RL-sim-pytorch), and this code can be found [here](???). A good place to start is the video tutorial posted above.

We've broken this tutorial down into four parts:

1. Getting Started with AWS and Sagemaker
2. Walking through the code
3. Submitting your model
4. Improvements and Faster Training with Sagemaker

### The parts of the sample container

The `container` directory has all the components you need to extend the SageMaker PyTorch container for use as a sample algorithm:

    .
    ├── Dockerfile
    ├── build_and_push.sh
    └── cifar10
        └── cifar10.py

Let's discuss each of these in turn:

* __`Dockerfile`__ describes how to build your Docker container image. More details are provided below.
* __`build_and_push.sh`__ is a script that uses the Dockerfile to build your container image and then pushes it to ECR. We invoke the commands directly later in this notebook, but you can also copy and run the script for your own algorithms.
* __`cifar10`__ is the directory which contains our user code to be invoked.

In this simple application, we install only one file in the container. That may be all you need, but if you have many supporting routines, you may wish to install more.

The files that we put in the container are:

* __`cifar10.py`__ is the program that implements our training algorithm and handles loading our model for inferences.

### The Dockerfile

The Dockerfile describes the image that we want to build. You can think of it as describing the complete operating system installation of the system that you want to run. A running Docker container is quite a bit lighter than a full operating system, however, because it relies on the host machine's Linux kernel for the basic operations.

We start from the SageMaker PyTorch image as the base. The base image is an ECR image, so its name follows this pattern:
* `{account}.dkr.ecr.{region}.amazonaws.com/sagemaker-{framework}:{framework_version}-{processor_type}-{python_version}`

Here is an explanation of each field.
1. account - AWS account ID the ECR image belongs to. AWS's public deep learning framework images are all under the 520713654638 account.
2. region - The region the ECR image belongs to. [Available regions](https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/).
3. framework - The deep learning framework.
4. framework_version - The version of the deep learning framework.
5. processor_type - CPU or GPU.
6. python_version - The supported version of Python.

So the SageMaker PyTorch 0.4.0 CPU image for Python 3 in `us-west-2` would be:

`520713654638.dkr.ecr.us-west-2.amazonaws.com/sagemaker-pytorch:0.4.0-cpu-py3`
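The naming pattern above is plain string formatting. The minimal sketch below fills it in and reproduces the example URI; the function name `sagemaker_framework_image_uri` is our own, not a SageMaker SDK function, and it only assembles the string without checking that such an image is actually published.

```python
def sagemaker_framework_image_uri(region: str,
                                  framework: str,
                                  framework_version: str,
                                  processor_type: str,
                                  python_version: str,
                                  account: str = "520713654638") -> str:
    """Fill in the SageMaker framework image pattern described above.

    Hypothetical helper: it formats the string only; it does not verify that
    the image exists for that version/region combination.
    """
    if processor_type not in ("cpu", "gpu"):
        raise ValueError("processor_type must be 'cpu' or 'gpu'")
    return (f"{account}.dkr.ecr.{region}.amazonaws.com/"
            f"sagemaker-{framework}:{framework_version}-{processor_type}-{python_version}")


# Reproduces the example image name from the text
print(sagemaker_framework_image_uri("us-west-2", "pytorch", "0.4.0", "cpu", "py3"))
# 520713654638.dkr.ecr.us-west-2.amazonaws.com/sagemaker-pytorch:0.4.0-cpu-py3
```

Passing `"gpu"` instead of `"cpu"` yields the matching GPU tag name; whether a GPU image is published for a given version and region is something to confirm in the SDK README linked below.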

Information on supported frameworks and versions can be found in this [README](https://github.com/aws/sagemaker-python-sdk).

Next, we add the code that implements our specific algorithm to the container and set up the right environment for it to run under.

**DISCLAIMER: As of now, the two environment variables below are supported only by the SageMaker Chainer (4.1.0+) and PyTorch (0.4.0+) containers.**

Finally, we need to specify two environment variables.
1. SAGEMAKER_SUBMIT_DIRECTORY - the directory within the container containing our Python script for training and inference.
2. SAGEMAKER_PROGRAM - the Python script that should be invoked for training and inference.
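Conceptually, the two variables combine into the path of the script the container runs. The sketch below is a simplified illustration of that relationship only; the real container startup logic is more involved, and the values in `example_env` (`/opt/ml/code`, `train.py`) are hypothetical, not the ones used in this tutorial's Dockerfile.

```python
import os


def resolve_entry_point(env=None) -> str:
    """Return the script path implied by the two SageMaker variables.

    Simplified illustration: it just joins the submit directory and the
    program name; it does not reproduce the container's actual behavior.
    """
    env = os.environ if env is None else env
    submit_dir = env["SAGEMAKER_SUBMIT_DIRECTORY"]  # directory holding our code
    program = env["SAGEMAKER_PROGRAM"]              # script to invoke inside it
    return os.path.join(submit_dir, program)


# Hypothetical values, for illustration only
example_env = {
    "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/code",
    "SAGEMAKER_PROGRAM": "train.py",
}
print(resolve_entry_point(example_env))  # /opt/ml/code/train.py (on Linux)
```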

Let's look at the Dockerfile for this example.

In [1]:
!cat container/Dockerfile

# Copyright 2017-2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"). You
# may not use this file except in compliance with the License. A copy of
# the License is located at
#
#     http://aws.amazon.com/apache2.0/
#
# or in the "license" file accompanying this file. This file is
# distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF
# ANY KIND, either express or implied. See the License for the specific
# language governing permissions and limitations under the License.

# For more information on creating a Dockerfile
# https://docs.docker.com/compose/gettingstarted/#step-2-create-a-dockerfile
# https://github.com/awslabs/amazon-sagemaker-examples/master/advanced_functionality/pytorch_extending_our_containers/pytorch_extending_our_containers.ipynb
ARG REGION=us-east-1

# SageMaker PyTorch image

# needs to also be gpu??
FROM 520713654638.dkr.ecr.$REGION.amazonaws.com/sage

### Building and registering the container

The following shell code shows how to build the container image using `docker build` and push it to ECR using `docker push`. This code is also available as the shell script `container/build_and_push.sh`, which you can run as `build_and_push.sh duckietown-extending` to build the image `duckietown-extending`.

This code looks for an ECR repository in the account you're using and the current default region (if you're using a SageMaker notebook instance, this is the region where the notebook instance was created). If the repository doesn't exist, the script will create it. In addition, since we are using the SageMaker PyTorch image as the base, we will need to retrieve ECR credentials to pull this public image.
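If you prefer to do the repository check from Python rather than the shell, it could be sketched with boto3's ECR client, which provides `describe_repositories` and `create_repository`. This is a sketch, not part of the original tutorial; `ensure_ecr_repository` is our own name, and the client is passed in so the function can be exercised without AWS access.

```python
def ensure_ecr_repository(ecr_client, name: str) -> bool:
    """Create the ECR repository `name` if it does not exist yet.

    `ecr_client` is expected to behave like boto3.client("ecr").
    Returns True if a new repository was created, False if it already existed.
    """
    try:
        ecr_client.describe_repositories(repositoryNames=[name])
        return False
    except ecr_client.exceptions.RepositoryNotFoundException:
        # Only a "not found" answer triggers creation; other errors propagate
        ecr_client.create_repository(repositoryName=name)
        return True


# Usage on a notebook instance with the right IAM permissions:
# import boto3
# ensure_ecr_repository(boto3.client("ecr"), "duckietown-extending")
```

Unlike the shell version, which creates the repository after any error from `describe-repositories`, this sketch creates it only when the repository is reported missing, so credential or permission problems surface instead of being masked.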

In [12]:
%%sh

# Note: the IAM role used here needs the AmazonEC2ContainerRegistryFullAccess policy attached

# The name of our algorithm
algorithm_name=duckietown-extending

cd container

account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration (default to us-east-1 if none is defined)
region=$(aws configure get region)
region=${region:-us-east-1}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

# If the repository doesn't exist in ECR, create it.

aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

# Get the login command from ECR and execute it directly
$(aws ecr get-login --region ${region} --no-include-email)

# Get the login command from ECR in order to pull down the SageMaker PyTorch image
$(aws ecr get-login --registry-ids 520713654638 --region ${region} --no-include-email)

# Build the docker image locally with the image name and then push it to ECR
# with the full name.

docker build  -t ${algorithm_name} . --build-arg REGION=${region}
docker tag ${algorithm_name} ${fullname}

docker push ${fullname}

Login Succeeded
Login Succeeded
Sending build context to Docker daemon  35.33kB
Step 1/10 : ARG REGION=us-east-1
Step 2/10 : FROM 520713654638.dkr.ecr.$REGION.amazonaws.com/sagemaker-pytorch:0.4.0-cpu-py3
 ---> 3b7a3b2dfec0
Step 3/10 : RUN apt-get install -y freeglut3-dev xvfb xorg-dev libglu1-mesa libgl1-mesa-dev libxinerama1 libxcursor1
 ---> Using cache
 ---> 637529d2a75b
Step 4/10 : RUN git clone -b aido1_lf1_r3-v3 https://github.com/duckietown/gym-duckietown src/gym-duckietown
 ---> Using cache
 ---> b99e07fb24c2
Step 5/10 : RUN pip install -e src/gym-duckietown/
 ---> Using cache
 ---> 0c4323bb88e4
Step 6/10 : ENV PATH="/opt/ml/code:${PATH}"
 ---> Using cache
 ---> 6969ed5f5298
Step 7/10 : COPY /duckietown-rl /opt/ml/code/duckietown-rl
 ---> a7e4f6846620
Step 8/10 : ENV PYTHONPATH="/opt/ml/code/duckietown-rl:/opt/ml/code/:${PYTHONPATH}"
 ---> Running in a68c141ff7f2
Removing intermediate container a68c141ff7f2
 ---> 379f66570fe6
Step 9/10 : ENV SAGEMAKER_SUBMIT_DIRECTORY /opt/m


## Testing your algorithm on your local machine

When you're packaging your first algorithm to use with Amazon SageMaker, you probably want to test it yourself to make sure it's working correctly. We use the [SageMaker Python SDK](https://github.com/aws/sagemaker-python-sdk) to test both locally and on SageMaker. For more examples with the SageMaker Python SDK, see [Amazon SageMaker Examples](https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker-python-sdk). In order to test our algorithm, we need our dataset.

## SageMaker Python SDK Local Training
To represent our training, we use the Estimator class, which needs to be configured in five steps. 
1. IAM role - our AWS execution role
2. train_instance_count - number of instances to use for training.
3. train_instance_type - type of instance to use for training. For training locally, we specify `local` or `local_gpu`.
4. image_name - our custom PyTorch Docker image we created.
5. hyperparameters - hyperparameters we want to pass.
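To make the five settings concrete, here is a small hypothetical helper (`estimator_config` and `is_local_mode` are our own names, not part of the SageMaker SDK) that gathers them into the keyword arguments accepted by the SDK v1 `Estimator` used in this notebook. It is a sketch for organizing the configuration, not a required step.

```python
def is_local_mode(instance_type: str) -> bool:
    """True for the instance types that train on this machine via Docker."""
    return instance_type in ("local", "local_gpu")


def estimator_config(role: str,
                     instance_type: str,
                     image_name: str,
                     hyperparameters: dict,
                     instance_count: int = 1) -> dict:
    """Gather the five Estimator settings into keyword arguments.

    Keyword names follow the SageMaker Python SDK v1 Estimator used in this
    notebook; SDK v2 renamed them (instance_count, instance_type, image_uri).
    """
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    return {
        "role": role,                              # 1. IAM execution role
        "train_instance_count": instance_count,    # 2. number of training instances
        "train_instance_type": instance_type,      # 3. 'local', 'local_gpu', or an ml.* type
        "image_name": image_name,                  # 4. our custom Docker image
        "hyperparameters": dict(hyperparameters),  # 5. copied so later edits don't leak in
    }


# Usage in the notebook (requires the sagemaker package and `role`):
# from sagemaker.estimator import Estimator
# estimator = Estimator(**estimator_config(role, instance_type,
#                                          "duckietown-extending:latest",
#                                          {"max_timesteps": 300}))
```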

Let's start with setting up our IAM role. We make use of a helper function within the Python SDK. This function throws an exception if run outside of a SageMaker notebook instance, as it gets metadata from the notebook instance. If running outside, you must provide an IAM role with the proper access stated above in [Permissions](#Permissions).

In [3]:
from sagemaker import get_execution_role

role = get_execution_role()

## Fit, Deploy, Predict

Now that the rest of our estimator is configured, we can call `fit()` with a local input path prefixed with `file://` (the code below passes `file:///tmp`). This invokes our PyTorch container with `train` and passes our hyperparameters and other metadata, as JSON files in `/opt/ml/input/config`, to the program entry point defined in the Dockerfile.
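On the container side, the training program can read those hyperparameters back from `/opt/ml/input/config/hyperparameters.json`. The sketch below assumes that file name and assumes values may arrive as strings (SageMaker's training API stores hyperparameters as string-to-string pairs), so it converts explicitly. `load_hyperparameters` and `get_int` are hypothetical helpers, not functions from the duckietown-rl code.

```python
import json
from pathlib import Path

# Where SageMaker places the hyperparameters file inside the training container
HYPERPARAMETERS_PATH = Path("/opt/ml/input/config/hyperparameters.json")


def load_hyperparameters(path=HYPERPARAMETERS_PATH) -> dict:
    """Read the hyperparameters passed to this training job."""
    with open(path) as f:
        return json.load(f)


def get_int(hyperparameters: dict, key: str, default: int) -> int:
    """Return `key` as an int; values may arrive as strings such as "300"."""
    if key not in hyperparameters:
        return default
    return int(hyperparameters[key])


# Inside the training script (the fallback of 1000 is a hypothetical default):
# hps = load_hyperparameters()
# max_timesteps = get_int(hps, "max_timesteps", 1000)
```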

After our training has succeeded, our training algorithm outputs our trained model within the /opt/ml/model directory, which is used to handle predictions.

We can then call `deploy()` with an `instance_count` and `instance_type`, here 1 and `local`. This invokes our PyTorch container with `serve`, which sets up our container to handle prediction requests as defined [here](https://github.com/aws/sagemaker-pytorch-container/blob/master/src/sagemaker_pytorch_container/serving.py#L103). It returns a predictor, which is used to make inferences against our trained model.

After our prediction, we can delete our endpoint.

We recommend testing your training algorithm locally first, as it allows quicker iterations and easier debugging.

In [4]:
# Let's set up our SageMaker notebook instance for local mode.
!/bin/bash ./utils/setup.sh

SageMaker instance route table setup is ok. We are good to go.
SageMaker instance routing for Docker is ok. We are good to go!


In [5]:
import shutil
import subprocess

instance_type = 'local'

# Use local GPU mode only if nvidia-smi is installed and runs successfully
if shutil.which('nvidia-smi') and subprocess.call('nvidia-smi') == 0:
    instance_type = 'local_gpu'

print("Instance type = " + instance_type)

Instance type = local


In [None]:
from sagemaker.estimator import Estimator

hyperparameters = {'max_timesteps': 300}

# Pass the hyperparameters through so the container actually receives them
estimator = Estimator(role=role,
                      train_instance_count=1,
                      train_instance_type=instance_type,
                      image_name='duckietown-extending:latest',
                      hyperparameters=hyperparameters)

estimator.fit('file:///tmp')
