# Introduction

---

**This is a work in progress...**

---

This notebook outlines the steps required to build a custom SageMaker RL container that includes recent versions of [TensorFlow (v2.1)](https://www.tensorflow.org/) and [Ray RLlib (v0.8.2)](https://ray.readthedocs.io/en/latest/rllib.html#).

The notebook then shows how to use the custom container to create a SageMaker training job. The training job applies RLlib's multi-agent PPO to the [Battlesnake gym](https://github.com/awslabs/sagemaker-battlesnake-ai) in order to train a unique Battlesnake policy model per agent. The intention is to use the trained models to compete in the online [Battlesnake competition](https://play.battlesnake.com/).

The resulting models are stored in S3 as RLlib checkpoints. (Note: these are not native TensorFlow models, although we __can__ export those too.)

Both GPU and CPU training instances are supported. By default, the training script uses all CPUs and any GPUs that are present. Only single-instance training is supported at this time.
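
As a rough illustration of what "use all CPUs and any GPUs" can mean in practice, the sketch below reads the `SM_NUM_CPUS` and `SM_NUM_GPUS` environment variables that the SageMaker training toolkit sets inside the container and turns them into RLlib-style resource settings. The `detect_resources` name and the choice to reserve one CPU for the driver process are assumptions made for illustration; the actual training script may allocate resources differently.

```python
import os


def detect_resources(environ=None):
    """Return RLlib-style resource settings for the current instance.

    Assumption: SM_NUM_CPUS / SM_NUM_GPUS are provided by the SageMaker
    training toolkit; outside SageMaker we fall back to os.cpu_count()
    and zero GPUs.
    """
    environ = os.environ if environ is None else environ
    num_cpus = int(environ.get("SM_NUM_CPUS", os.cpu_count() or 1))
    num_gpus = int(environ.get("SM_NUM_GPUS", 0))
    return {
        # Keep one CPU for the driver/trainer process (an assumption).
        "num_workers": max(num_cpus - 1, 1),
        "num_gpus": num_gpus,
    }


# Example: an instance reporting 48 CPUs and no GPUs.
print(detect_resources({"SM_NUM_CPUS": "48", "SM_NUM_GPUS": "0"}))
```

The returned keys mirror RLlib's `num_workers` and `num_gpus` config entries, so the dictionary could be merged into a trainer config.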

Inference via SageMaker endpoints is not yet supported by this notebook. The PPO models can, however, be evaluated using an RLlib rollout script (ask perrysc@), and the training results can be viewed in TensorBoard: extract the model.tar.gz file and point TensorBoard to the resulting directory via `--logdir`.
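
The extraction step can be sketched with the standard library as follows. The `extract_model_archive` helper and the `model_output` directory name are hypothetical, and the archive is assumed to have been downloaded from S3 already.

```python
import tarfile
from pathlib import Path


def extract_model_archive(archive_path, dest_dir="model_output"):
    """Unpack model.tar.gz and return the directory to pass to --logdir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        # Only extract archives you trust: tar members can contain unsafe paths.
        archive.extractall(dest)
    return dest


# Usage (assumes model.tar.gz is in the current directory):
# logdir = extract_model_archive("model.tar.gz")
# Then, from a terminal: tensorboard --logdir model_output
```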

**Note:** The custom container is only required until the SageMaker team releases the new version of the managed TF/RLlib container, which is expected soon. See: https://github.com/aws/sagemaker-python-sdk/tree/master/src/sagemaker/rl

### Known Issues
* The training script generates many `Box bound precision lowered by casting to float32` warnings due to a cast to np.float32. The warnings appear benign, but should be fixed.
* Running the training script with a map size other than 11x11 will likely require that you edit the `cnn_tf.py` file and adjust the CNN layer configuration. This adjustment should happen automatically, but that has not been implemented yet.
* The default hyperparameters will produce somewhat interesting/functional models, but no hyperparameter optimization (HPO) has been performed yet.
* Because this custom container is built on the SageMaker RL TensorFlow container, it expects TensorFlow models to be generated. As a result, a successful RLlib training run ends with the warning `Your model will NOT be servable with SageMaker TensorFlow Serving container`, even though the job succeeded and the model artifacts were uploaded to S3.
* Currently, the Dockerfile copies the training script (along with the entire working directory) into /opt/ml/code/ within the container. This works, but probably isn't good practice.
* The training example uses a generic SageMaker `Estimator` instead of an `RLEstimator`, because `RLEstimator` is tied to older versions of TensorFlow / Ray RLlib. See: https://github.com/aws/sagemaker-python-sdk/tree/master/src/sagemaker/rl
  * Moving to an `RLEstimator` should be relatively easy once the updated SageMaker RL container is released.
* Model artifacts and training results / checkpoints are currently included in the same model.tar.gz output file.
* ...plus probably some issues I've missed.
---
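
To make the map-size issue listed under Known Issues concrete: the spatial size of each convolutional layer's output depends on the input size, kernel, stride, and padding, so a layer stack tuned for an 11x11 map will generally not produce the same final shape for other map sizes. The helper below applies TensorFlow's standard `same`/`valid` output-size rules. The layer stack is hypothetical (it is not taken from `cnn_tf.py`) and only illustrates how a configuration could be checked before editing the file.

```python
import math


def conv_output_size(size, kernel, stride, padding):
    """Output length along one spatial dimension of a conv layer (TensorFlow rules)."""
    if padding == "same":
        return math.ceil(size / stride)
    if padding == "valid":
        return (size - kernel) // stride + 1
    raise ValueError(f"unknown padding: {padding!r}")


def final_size(map_size, layers):
    """Apply a list of (kernel, stride, padding) layers to a square map."""
    size = map_size
    for kernel, stride, padding in layers:
        size = conv_output_size(size, kernel, stride, padding)
        if size < 1:
            raise ValueError("layer stack shrinks the map below 1x1")
    return size


# Hypothetical layer stack, NOT the one in cnn_tf.py.
example_layers = [(3, 1, "same"), (3, 2, "same"), (6, 1, "valid")]
for map_size in (7, 11, 19):
    try:
        print(map_size, "->", final_size(map_size, example_layers))
    except ValueError as err:
        print(map_size, "->", err)
```

With this made-up stack, only the 11x11 map collapses to a 1x1 output: a 7x7 map is too small for the last layer, and a 19x19 map leaves a 5x5 output, which is the kind of mismatch that requires adjusting the layer configuration by hand.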

Run the following steps on a SageMaker notebook instance. The notebook's associated IAM role must allow access to S3 and ECR.

---

<br/>

## Create a custom TF 2.1 / Ray RLlib 0.8.2 container for training

First, let's use git to pull down the latest version of the Battlesnake gym into our working directory and symlink the gym source directory as 'battlesnake_gym'.

In [None]:
!git clone https://github.com/awslabs/sagemaker-battlesnake-ai.git

In [None]:
!ln -s ./sagemaker-battlesnake-ai/TrainingEnvironment/battlesnake_gym/ battlesnake_gym

Log in to the SageMaker container registry so we can pull down the existing SageMaker TF 2.1 image to use as our base.

In [None]:
!aws ecr get-login-password --region us-west-2 | docker login \
--username AWS --password-stdin 763104351884.dkr.ecr.us-west-2.amazonaws.com

Create an ECR repository if you haven't done so already (this step is omitted here). Then log in to your own container registry so you can push your custom image to it in a subsequent step.

In [None]:
!aws ecr get-login-password --region us-west-2 | docker login \
--username AWS --password-stdin 599069043765.dkr.ecr.us-west-2.amazonaws.com
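
The omitted repository-creation step could look roughly like the sketch below, which uses the boto3 ECR client calls `describe_repositories` and `create_repository`. The `ensure_ecr_repository` helper is hypothetical, and it assumes the notebook's IAM role is allowed to call both operations.

```python
def ensure_ecr_repository(ecr_client, name):
    """Return the URI of ECR repository `name`, creating it if it is missing."""
    try:
        response = ecr_client.describe_repositories(repositoryNames=[name])
        return response["repositories"][0]["repositoryUri"]
    except ecr_client.exceptions.RepositoryNotFoundException:
        # The repository does not exist yet, so create it.
        response = ecr_client.create_repository(repositoryName=name)
        return response["repository"]["repositoryUri"]


# Usage on a notebook instance whose role allows ECR access:
# import boto3
# uri = ensure_ecr_repository(boto3.client("ecr", region_name="us-west-2"), "bs-rllib")
# print(uri)
```

Passing the client in as an argument keeps the helper easy to exercise without AWS credentials.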

Adjust your training script, rewards, etc. as required. Then build the custom container. You will need to repeat the following steps whenever you make any changes to the training script or Battlesnake gym.

In [None]:
!docker build -t bs-rllib .

Tag the new image and push it to your container registry.

In [None]:
!docker tag bs-rllib:latest 599069043765.dkr.ecr.us-west-2.amazonaws.com/bs-rllib:latest
!docker push 599069043765.dkr.ecr.us-west-2.amazonaws.com/bs-rllib:latest
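
The image URI used in the tag and push commands follows the usual ECR pattern of account ID, region, repository, and tag. A small sketch for building it, so that the account ID is not hard-coded in several places, is shown below; the `ecr_image_uri` helper is hypothetical.

```python
def ecr_image_uri(account_id, region, repository, tag="latest"):
    """Build the full ECR image URI used by docker tag/push and the Estimator."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


# The account ID below is the one used throughout this notebook; substitute your own.
print(ecr_image_uri("599069043765", "us-west-2", "bs-rllib"))
# prints 599069043765.dkr.ecr.us-west-2.amazonaws.com/bs-rllib:latest
```

If boto3 is available, the account ID can be looked up with `boto3.client("sts").get_caller_identity()["Account"]` instead of being typed in.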

In [None]:
!docker image ls

<br/>

## Initiate a SageMaker training job

In [None]:
from sagemaker.estimator import Estimator
from sagemaker import get_execution_role

In [None]:
# replace with the link to your custom image
custom_image = '599069043765.dkr.ecr.us-west-2.amazonaws.com/bs-rllib:latest'

role = get_execution_role()
print(role)

# Example hyperparameters; many more are available (see train_cnn_ppo_tf.py for details)
hyperparameters = {
    "num-iters": 10,  # increase this when you are actually building models
    "num-agents": 5,
    "lr": 5.0e-4,
}

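# Argument names follow SageMaker Python SDK v1; SDK v2 renames them to
# image_uri, instance_count, and instance_type.
estimator = Estimator(image_name=custom_image,
                      role=role,
                      train_instance_count=1,   # at present, only a single instance is supported
                      train_instance_type='ml.m5.12xlarge',  # adjust instance size as required
                      hyperparameters=hyperparameters)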

estimator.fit()
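
Hyperparameters passed to the `Estimator` typically reach the training script as command-line arguments (for example `--num-iters 10`) when the container is based on a SageMaker framework image. A minimal sketch of how a script could parse the three values above follows; the `parse_hyperparameters` name and the defaults are illustrative assumptions, not the actual contents of `train_cnn_ppo_tf.py`.

```python
import argparse


def parse_hyperparameters(argv=None):
    """Parse the example hyperparameters from a list of command-line arguments."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--num-iters", type=int, default=10)
    parser.add_argument("--num-agents", type=int, default=5)
    parser.add_argument("--lr", type=float, default=5.0e-4)
    # Ignore any extra arguments the container passes that this sketch does not define.
    args, _unknown = parser.parse_known_args(argv)
    return args


args = parse_hyperparameters(["--num-iters", "200", "--lr", "1e-4"])
print(args.num_iters, args.num_agents, args.lr)
```

Note that argparse converts the dashes in option names to underscores, so `--num-iters` is read back as `args.num_iters`.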

<br/>

After training, your model and checkpoints will be available in model.tar.gz in S3.
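
Once `fit()` has completed, the S3 location of the archive is typically available as `estimator.model_data`. The sketch below splits such a URI into bucket and key and downloads the file with boto3; the helper names and the example URI are hypothetical, and the download assumes the notebook's role has S3 read access.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/path/to/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def download_model(uri, local_path="model.tar.gz"):
    """Download the model archive; requires boto3 and S3 read permission."""
    import boto3  # imported lazily so split_s3_uri works without boto3 installed

    bucket, key = split_s3_uri(uri)
    boto3.client("s3").download_file(bucket, key, local_path)
    return local_path


# Hypothetical URI for illustration; in the notebook, use estimator.model_data.
print(split_s3_uri("s3://my-bucket/bs-rllib/output/model.tar.gz"))
```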

<br/>

---

Comments? perrysc@amazon.com