# Introduction

This notebook outlines the steps involved in building and deploying a Battlesnake model using Ray RLlib and TensorFlow on Amazon SageMaker.

Library versions currently in use: TensorFlow 2.1, Ray RLlib 0.8.2

The model is first trained using multi-agent PPO, and then deployed to a managed _TensorFlow Serving_ SageMaker endpoint that can be used for inference.

<br/>

**Note:** This is a work-in-progress...

### Comments and Known Issues

* `cnn_tf.py` currently contains default CNN filters for map sizes ranging from 7x7 to 21x21. These default filter sizes can be overridden via the 'conv_filters' config parameter.
* The current MultiAgentBattlesnake environment uses 2 frames for each observation. If you only want to use one frame, you'll need to adjust the observation code in `ma_battlesnake.py` accordingly.
* The original TF model export code in `ray_launcher.py` did not work for TF2.1.
  * I switched over to RLlib's export_model() method, which seems to be working here
* RLlib's built-in `{'use_lstm': True}` model parameter, which wraps the CNN in an LSTM, worked for local training and inference but has not yet been tested with the SageMaker inference endpoint.
* Regardless of the number of snakes in the gym, or which policy is 'best', only policy_0 is currently exported as a TF model. Refer to `common/sagemaker_rl/tf_serving_utils.py` and see the comment in the inference section, below
* The Ray dashboard fails to start (errors during training) but does not abort the training job
* There are many warnings during training - most appear to be benign, but are annoying
* Both local-mode and SageMaker-based training and inference have been tested, and appear to be working
    * local-mode inference might generate some warnings, but seems to work regardless
* GPU training has been tested
* GPU inference has not been tested
* Single-instance training has been tested. Distributed multi-instance RLlib training has not yet been tested.
* Although the hosted model is able to provide predictions, I haven't yet verified that the predictions are correct or useful.
* The default hyperparameters are unlikely to generate an impressive model. Modify the hyperparameters and rewards if you are hoping to see something cool.
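
When overriding 'conv_filters' for a new map size, it can help to check how far a filter stack shrinks the map before starting a training job. The sketch below is illustrative only: it assumes each filter is written as `[out_channels, [kernel_h, kernel_w], stride]` with an integer stride, and that every layer except the last uses "same" padding while the last uses "valid" padding, as in RLlib's built-in vision network. The helper `conv_output_size` and the filter values are hypothetical and are not taken from `cnn_tf.py`.

```python
import math


def conv_output_size(size, filters):
    """Return the spatial size left after a stack of conv filters.

    Assumption: all layers but the last use "same" padding (output = ceil(in / stride)),
    and the last layer uses "valid" padding (output = (in - kernel) // stride + 1).
    """
    *hidden, last = filters
    for _channels, _kernel, stride in hidden:
        size = math.ceil(size / stride)
    _channels, kernel, stride = last
    return (size - kernel[0]) // stride + 1


# Hypothetical filter stack, chosen only to illustrate the arithmetic
example_filters = [[16, [3, 3], 2], [32, [3, 3], 2], [512, [3, 3], 1]]

for map_size in (11, 21):
    print(map_size, "->", conv_output_size(map_size, example_filters))
```

If the model expects the final convolution to reduce the map to 1x1, a stack that leaves a larger output for a given map size is a sign that the kernels or strides need adjusting.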

In [2]:
import sagemaker
from sagemaker.rl import RLEstimator, RLToolkit
import boto3

In [3]:
sm_session = sagemaker.session.Session()
s3_bucket = sm_session.default_bucket()

s3_output_path = 's3://{}/'.format(s3_bucket)
print("S3 bucket path: {}".format(s3_output_path))

S3 bucket path: s3://sagemaker-us-west-2-412868550678/


In [4]:
job_name_prefix = 'battlesnake-rllib-ppo'

role = sagemaker.get_execution_role()
print(role)

arn:aws:iam::412868550678:role/BattlesnakeEnvironment-jo-NotebookInstanceExecutio-1ESEZD1FEJJ5V


In [5]:
# Change local_mode to True if you want to do local training within this Notebook instance
# Otherwise, we'll spin-up a SageMaker training instance to handle the training

local_mode = False

if local_mode:
    instance_type = 'local'
else:
    instance_type = "ml.m5.4xlarge"
    
# If training locally, do some Docker housekeeping..
if local_mode:
    !/bin/bash ./common/setup.sh

In [6]:
# Specify the new TF v2.1 / Ray RLlib 0.8.2 container
#    Adjust 'cpu' or 'gpu' in the image name, as required
image_name = '462105765813.dkr.ecr.us-west-2.amazonaws.com/sagemaker-rl-ray-container:ray-0.8.2-tf-cpu-py36'

In [7]:
%%time

# Define and execute our training job
# Adjust hyperparameters and train_instance_count accordingly

metric_definitions = RLEstimator.default_metric_definitions(RLToolkit.RAY)
    
estimator = RLEstimator(entry_point="train-mabs.py",
                        source_dir='src',
                        dependencies=["common/sagemaker_rl", "common/battlesnake_gym", "checkpoints"],
                        image_name=image_name,
                        role=role,
                        train_instance_type=instance_type,
                        train_instance_count=1,
                        output_path=s3_output_path,
                        base_job_name=job_name_prefix,
                        metric_definitions=metric_definitions,
                        hyperparameters={
                            # See train-mabs.py to add additional hyperparameters
                            # Also see ray_launcher.py for the rl.training.* hyperparameters
                            #
                            # number of training iterations
                            "num_iters": 10,
                            # number of snakes in the gym
                            "num_agents": 5,
                            # dimension of the gym. changing this could require changes to CNN kernels
                            # in cnn_tf.py
                            "map_height": 21,
                        }
                    )

estimator.fit()

job_name = estimator.latest_training_job.job_name
print("Training job: %s" % job_name)

2020-03-30 23:14:59 Starting - Starting the training job...
2020-03-30 23:15:02 Starting - Launching requested ML instances......
2020-03-30 23:16:05 Starting - Preparing the instances for training...
2020-03-30 23:16:45 Downloading - Downloading input data
2020-03-30 23:16:45 Training - Downloading the training image......
2020-03-30 23:17:45 Training - Training image download completed. Training in progress.
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2020-03-30 23:17:48,709 sagemaker-containers INFO     Imported framework sagemaker_tensorflow_container.training
2020-03-30 23:17:48,716 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
2020-03-30 23:17:48,854 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
2020-03-30 23:17:48,869 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)

In [8]:
# Where is the model stored in S3?
estimator.model_data

's3://sagemaker-us-west-2-412868550678/battlesnake-rllib-ppo-2020-03-30-23-14-59-441/output/model.tar.gz'
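
To inspect the exported model outside of SageMaker, the S3 URI returned by `estimator.model_data` has to be split into a bucket and a key before it can be passed to an S3 client. The following is a minimal sketch of that step; `split_s3_uri` is a hypothetical helper and the URI is a placeholder, not the real artifact path.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("Not an S3 URI: {}".format(uri))
    return parsed.netloc, parsed.path.lstrip("/")


# Placeholder URI; in the notebook you would pass estimator.model_data instead
bucket, key = split_s3_uri("s3://example-bucket/example-job/output/model.tar.gz")
print(bucket, key)

# With AWS credentials available, the archive could then be fetched with boto3:
# boto3.client("s3").download_file(bucket, key, "model.tar.gz")
```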

In [15]:
#model_data = "s3://sagemaker-us-west-2-412868550678/battlesnake-rllib-ppo-2020-03-30-20-06-31-079/output/model.tar.gz"
from sagemaker.tensorflow.serving import Model

model = Model(model_data=estimator.model_data,
              role=role,
              entry_point="inference.py",
              source_dir='inference',
              framework_version='2.1.0',
             )

if local_mode:
    inf_instance_type = 'local'
else:
    inf_instance_type = "ml.t2.medium"

# Deploy an inference endpoint
predictor = model.deploy(initial_instance_count=1, instance_type=inf_instance_type)

-------------!
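
The observation passed to the endpoint has one map per snake plus one food map, repeated for each stacked frame. As a rough sketch of that arithmetic, assuming the layout described in the notes above (2 frames, channels last), the expected input shape can be derived as follows. The helper `observation_shape` is hypothetical and is not part of `ma_battlesnake.py`.

```python
def observation_shape(map_size, num_agents, num_frames=2, batch_size=1):
    """Return (batch, height, width, channels) for a stacked-frame observation."""
    maps_per_frame = num_agents + 1  # one map per snake plus one food map
    return (batch_size, map_size, map_size, maps_per_frame * num_frames)


# 5 snakes on an 11x11 map with 2 stacked frames
shape = observation_shape(map_size=11, num_agents=5)
print(shape)
```

If the environment is changed to use a single frame, passing `num_frames=1` gives the corresponding shape.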

In [16]:
# Spoof an observation from a Battlesnake environment, and get the predicted action from the model
#
# This example uses a single observation for a 5-agent environment with an 11x11 map
# The last axis is 12 because the current MultiAgentEnv is concatenating 2 frames
#   5 agent maps + 1 food map = 6 maps total    6 maps * 2 frames = 12
#
# Note: this prediction is for the first policy in the environment "policy_0"
#   We need to fix this to export the 'best' policy, all policies, etc.
#   Also - the agent's policy # and position within the observation *does* currently matter.
#   For example, if we export policy_4 for inference, we need to ensure that the agent's current
#   snake representation (during inference) is located within index 4 of the observations (food is index 0)

import numpy as np
from time import time

health_dict = {0: 50, 1: 50}
json = {"turn": 4,
        "board": {
                "height": 15,
                "width": 15,
                "food": [],
                "snakes": []
                },
            "you": {
                "id": "snake-id-string",
                "name": "Sneky Snek",
                "health": 90,
                "body": [{"x": 1, "y": 3}]
                }
            }

fake_obs = np.zeros(shape=(1,11,11,12), dtype=np.float32).tolist()

test_data = {"inputs": { 'observations': fake_obs,
                        'prev_action': -1,
                        'is_training': False,
                        'prev_reward': -1,
                        'seq_lens': -1
                       },
             "all_health": health_dict,
             "json": json
            }
before = time()
result = predictor.predict(test_data)
elapsed = time() - before

print("Raw inference results:")
for key in sorted(result['outputs'].keys()):
    print("  ", key, ": ", result['outputs'][key])

print()
print("Our model predicts that the next action to take is: action", result['outputs']['actions'][0])
print()
print("Inference took %.2f ms" % (elapsed*1000))

Raw inference results:
   action_logp :  [0.0]
   action_prob :  [1.0]
   actions :  [0]
   behaviour_logits :  [[0.0550115407, -0.031956654, -0.0453675911, 0.0185272116]]
   heuristisc_action :  0
   vf_preds :  [-0.753036618]

Our model predicts that the next action to take is: action 0

Inference took 706.75 ms
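
The `behaviour_logits` entry holds one unnormalized score per action. As an illustrative sketch (not part of the inference code), a softmax turns those scores into probabilities, and the index of the largest score gives the greedy action. The logits below are copied from the output above; `softmax` is a hypothetical helper written with the standard library only.

```python
import math


def softmax(logits):
    """Convert unnormalized scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Values copied from the 'behaviour_logits' entry printed above
logits = [0.0550115407, -0.031956654, -0.0453675911, 0.0185272116]

probs = softmax(logits)
greedy_action = max(range(len(probs)), key=probs.__getitem__)

print("probabilities:", [round(p, 3) for p in probs])
print("greedy action:", greedy_action)
```

With logits this close together, the four probabilities are all near 0.25, which fits the note above that the default hyperparameters are unlikely to produce a strong model after 10 iterations.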


In [None]:
# Uncomment and run to delete the endpoint
# predictor.delete_endpoint()