# Welcome to Duckietown!

This is the companion tutorial file for learning how to use Amazon SageMaker to train your Duckietown AIDO submission... **in the cloud**!

We'll be building off of our [Reinforcement Learning](https://goo.gl/YFTjn3) tutorial: here we take imitation learning (IL) and use SageMaker to train it quickly!

This tutorial walks you through, step by step, how to get your SageMaker account running and use it to train an AIDO Lane Following submission.

Some prerequisites we expect you to have:
1. An AWS Account (You can get one by signing up [here](https://aws.amazon.com/))
2. A good overview of the code we'll be looking at. We'll be building off [this repository](https://github.com/duckietown/challenge-aido1_LF1-baseline-RL-sim-pytorch), and the code for this tutorial can be found [here](https://github.com/duckietown/aido-on-sagemaker). A good start would be the video tutorial posted above.
3. The ability to submit with `duckietown-shell` (which means you already have a [Duckietown Account](https://www.duckietown.org/research/ai-driving-olympics/ai-do-register)), as well as `git` on your computer.
4. An understanding of our more thorough [PyTorch Reinforcement Learning Tutorial on SageMaker](https://github.com/duckietown/aido-on-sagemaker/blob/master/duckietown-pytorch-rl/duckietown-extending.ipynb).

In [None]:
!cat container/Dockerfile

### Building and registering the container

The following shell code shows how to build the container image using `docker build` and push the container image to ECR using `docker push`. This code is also available as the shell script `container/build-and-push.sh`.

This code looks for an ECR repository in the account you're using and the current default **region** (if you're using a SageMaker notebook instance, this is the region where the notebook instance was created). If the repository doesn't exist, the script will create it. In addition, the script retrieves ECR credentials and logs Docker in, so that the image it builds can be pushed to that repository.
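To make the naming scheme concrete, here is a small Python sketch of how the full ECR image name is assembled from the account, region, and algorithm name. It mirrors the `fullname` variable and the `us-west-2` fallback in the shell script; the function name `ecr_image_uri` and the account number `123456789012` are placeholders for illustration, not part of the repository.

```python
def ecr_image_uri(account, region, algorithm_name, tag="latest",
                  default_region="us-west-2"):
    """Compose the ECR image name the same way the shell script does.

    `account` is the AWS account ID as a string. If `region` is empty or
    None, fall back to `default_region`, like `${region:-us-west-2}`.
    """
    region = region or default_region
    return f"{account}.dkr.ecr.{region}.amazonaws.com/{algorithm_name}:{tag}"


# Placeholder account ID; no region configured, so the default is used.
print(ecr_image_uri("123456789012", None, "duckietown-imitation"))
```

If you rename the algorithm, both the repository name and the image name change with it.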

The main thing to note is `algorithm_name`: it is used both as the ECR repository name and as the name of the Docker image.

In [36]:
%%sh

# The name of our algorithm
algorithm_name=duckietown-imitation

cd container

chmod +x duckietown-il/train

account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration (default to us-west-2 if none defined)
region=$(aws configure get region)
region=${region:-us-west-2}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

# If the repository doesn't exist in ECR, create it.

aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

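# Get the login command from ECR and execute it directly.
# Note: `aws ecr get-login` exists only in AWS CLI v1. With AWS CLI v2, use instead:
#   aws ecr get-login-password --region ${region} | docker login --username AWS --password-stdin ${account}.dkr.ecr.${region}.amazonaws.com
$(aws ecr get-login --region ${region} --no-include-email)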

# Build the docker image locally with the image name and then push it to ECR
# with the full name.

docker build  -t ${algorithm_name} .
docker tag ${algorithm_name} ${fullname}

docker push ${fullname}

Login Succeeded
Sending build context to Docker daemon  20.99kB
Step 1/14 : FROM tensorflow/tensorflow:1.8.0-py3
 ---> a83a3dd79ff9
Step 2/14 : RUN apt-get update && apt-get install -y --no-install-recommends nginx curl
 ---> Using cache
 ---> 02279947e273
Step 3/14 : RUN apt-get install -y freeglut3-dev xvfb xorg-dev libglu1-mesa libgl1-mesa-dev libxinerama1 libxcursor1
 ---> Using cache
 ---> 425d968c770f
Step 4/14 : RUN apt-get install -y git python-pip
 ---> Using cache
 ---> 50f9284e138d
Step 5/14 : RUN git clone -b aido1_lf1_r3-v3 https://github.com/duckietown/gym-duckietown src/gym-duckietown
 ---> Using cache
 ---> e9562fae869c
Step 6/14 : RUN pip install -e src/gym-duckietown/
 ---> Using cache
 ---> 489894b58bd5
Step 7/14 : RUN pip install opencv-python
 ---> Using cache
 ---> 1e1dd599eee8
Step 8/14 : ENV PATH="/opt/ml/code:${PATH}"
 ---> Using cache
 ---> dbca79060914
Step 9/14 : COPY /duckietown-il /opt/ml/code/
 ---> Using cache
 ---> c241387eb877
Step 10/14 : ENV PYTHON




## SageMaker Training
To represent our training, we use the `Estimator` class, which is configured with five settings:
1. `role` - our AWS IAM execution role.
2. `train_instance_count` - the number of instances to use for training.
3. `train_instance_type` - the type of instance to use for training. For training locally, we specify `local` or `local_gpu`.
4. `image_name` - the custom Docker image we built above.
5. `hyperparameters` - the hyperparameters we want to pass to the training script.
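The five settings listed above can be gathered into one dictionary of keyword arguments before constructing the estimator. The sketch below is only an illustration: the helper `estimator_settings` and the hyperparameter names `epochs` and `batch_size` are hypothetical. The keyword names are the ones used by version 1 of the SageMaker Python SDK, which this notebook uses; version 2 renamed three of them (`instance_count`, `instance_type`, `image_uri`), so the helper can emit either spelling.

```python
def estimator_settings(role, instance_type, image, hyperparameters=None,
                       instance_count=1, sdk_v2=False):
    """Return Estimator keyword arguments for SDK v1 (default) or SDK v2."""
    if sdk_v2:
        count_key, type_key, image_key = "instance_count", "instance_type", "image_uri"
    else:
        count_key, type_key, image_key = ("train_instance_count",
                                          "train_instance_type", "image_name")
    return {
        "role": role,
        count_key: instance_count,
        type_key: instance_type,
        image_key: image,
        # Hyperparameters are optional; an empty dict means "none passed".
        "hyperparameters": dict(hyperparameters or {}),
    }


# Hypothetical values, for illustration only.
settings = estimator_settings(
    role="my-execution-role",
    instance_type="local",
    image="duckietown-imitation:latest",
    hyperparameters={"epochs": 10, "batch_size": 32},
)
print(settings)
# With the SDK installed, this would be used as: Estimator(**settings)
```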

Our container will run `duckietown-il/train`, which is still a Python script, just without the `.py` extension.
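For orientation, here is a minimal sketch of the container-side conventions such a `train` script typically follows. This is not the actual `duckietown-il/train` script. It relies on two SageMaker conventions: hyperparameters are provided in `/opt/ml/input/config/hyperparameters.json` (with all values as strings), and anything written to `/opt/ml/model` is packaged into `model.tar.gz` when the job ends. The function names, the `epochs` key, and the `model.json` file name are hypothetical, and the `prefix` argument exists only so the sketch can be tried outside a container.

```python
import json
import os


def load_hyperparameters(prefix="/opt/ml"):
    """Read the hyperparameters file if present; return {} otherwise."""
    path = os.path.join(prefix, "input", "config", "hyperparameters.json")
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


def save_model(payload, prefix="/opt/ml", filename="model.json"):
    """Write a (toy) model artifact into the model directory and return its path."""
    model_dir = os.path.join(prefix, "model")
    os.makedirs(model_dir, exist_ok=True)
    out_path = os.path.join(model_dir, filename)
    with open(out_path, "w") as f:
        json.dump(payload, f)
    return out_path


def run(prefix="/opt/ml"):
    """Toy training entry point: read settings, 'train', save an artifact."""
    params = load_hyperparameters(prefix)
    # Values arrive as strings, so convert explicitly (hypothetical key).
    epochs = int(params.get("epochs", "1"))
    return save_model({"epochs_trained": epochs}, prefix)
```

A real training script would replace the body of `run` with the actual training loop and save the learned weights instead of the toy JSON file.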

In [3]:
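import os
import shutil
import subprocess

from sagemaker import get_execution_role

role = get_execution_role()

# Default to CPU-only local mode
instance_type = 'local'

# Switch to GPU local mode only if nvidia-smi is installed and runs successfully.
# (Calling a missing binary through subprocess would raise FileNotFoundError.)
if shutil.which('nvidia-smi') and subprocess.call('nvidia-smi') == 0:
    instance_type = 'local_gpu'

# When you're ready to train for real, check the available instance types
# and pick a hosted one, for example:
# instance_type = 'ml.m4.xlarge'

print("Instance type = " + instance_type)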

Instance type = local


In [None]:
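from sagemaker.estimator import Estimator

# In local mode, image_name refers to the image we built and tagged above
estimator = Estimator(role=role,
                      train_instance_count=1,
                      train_instance_type=instance_type,
                      image_name='duckietown-imitation:latest',
                      )

# 'file:///tmp' is the local input channel handed to the container
estimator.fit('file:///tmp', wait=False)
print("All done!")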

INFO:sagemaker:Created S3 bucket: sagemaker-us-east-1-945394400746
INFO:sagemaker:Creating training-job with name: duckietown-imitation-2018-11-17-05-17-28-144


[{'DataUri': 'file:///tmp', 'ChannelName': 'training', 'DataSource': {'FileDataSource': {'FileDataDistributionType': 'FullyReplicated', 'FileUri': 'file:///tmp'}}}]
Creating tmp6apx_k_algo-1-V0EYO_1_9887a7e1d5d6 ... done
Attaching to tmp6apx_k_algo-1-V0EYO_1_ccaa16d7374a
algo-1-V0EYO_1_ccaa16d7374a | Starting Xvfb
algo-1-V0EYO_1_ccaa16d7374a | Executing command train
algo-1-V0EYO_1_ccaa16d7374a |   from ._conv import register_converters as _register_converters
algo-1-V0EYO_1_ccaa16d7374a | INFO:gym-duckietown:gym-duckietown 2018.10.1
algo-1-V0EYO_1_ccaa16d7374a | 
algo-1-V0EYO_1_ccaa16d7374a | INFO:gym-duckietown:Registering gym environment id: Duckietown-loop_pedestrians-v0
algo-1-V0EYO_1_ccaa16d7374a | INFO:gym-duckietown:Registering gym environment id: Duckietown-zigzag_dists-v0
algo-1-V0EYO_1_ccaa16d7374a | INFO:gym-duckietown:Registering gym environment id: Duckietown-straight_road-v0
algo-1-V0E

## Submitting Your Model

Now your training has succeeded, but unlike in the PyTorch or TensorFlow tutorials, you don't see any output or models directory. This is one of the nice things about SageMaker: it puts everything into S3 for you, so you don't have to worry about losing track of your models. Your model will be in your S3 bucket, which you can access through [this link](https://console.aws.amazon.com/s3/home). Click on your SageMaker bucket and download `model.tar.gz` (this is the default file name, but you can change it if you'd like).
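Once `model.tar.gz` is downloaded, it needs to be unpacked before the model files can be used. A minimal sketch using only the Python standard library is shown here; the function name `extract_model` and the destination directory in the usage comment are hypothetical, and the check on member paths is just a precaution against archives that try to write outside the destination.

```python
import os
import tarfile


def extract_model(archive_path, dest_dir):
    """Unpack a gzip-compressed tar archive into dest_dir; return its top-level entries."""
    os.makedirs(dest_dir, exist_ok=True)
    dest_root = os.path.realpath(dest_dir)
    with tarfile.open(archive_path, "r:gz") as tar:
        # Refuse members whose resolved path would fall outside dest_dir.
        for member in tar.getmembers():
            target = os.path.realpath(os.path.join(dest_root, member.name))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError("unsafe path in archive: " + member.name)
        tar.extractall(dest_root)
    return sorted(os.listdir(dest_root))


# Example usage (hypothetical destination directory):
# extract_model("model.tar.gz", "extracted-model")
```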

Now you can follow the steps from the other tutorial. Clone this repository locally, navigate to the `duckietown-il/submission` directory, put your model in the right place, edit `solution.py` as needed, and run `dts challenges submit`! It's that easy!