# Session servers autopilot
This notebook demonstrates a production-grade solution for training a model that predicts the number of game servers to allocate, based on demand from external systems such as matchmaking, or on past observations.
The training job requires an API, denoted by `gs_inventory_url`, that returns the required number of game servers in a JSON response read with the following keys:
```
['Prediction']['num_of_gameservers']
```
It also requires a DynamoDB table called `observations` that stores the last observation, which is used for inference requests to the model endpoint that will be created.
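The response shape described above can be read with a few lines of Python. This is a minimal sketch, assuming the API returns a JSON object with exactly the nesting shown; the sample body and the function name `parse_num_of_gameservers` are hypothetical, and the real API may return additional fields.

```python
import json


def parse_num_of_gameservers(body):
    """Return the demand value from a gs_inventory_url response body."""
    payload = json.loads(body)
    # float() is used because the evaluation log in this notebook shows a fractional demand.
    return float(payload["Prediction"]["num_of_gameservers"])


# Hypothetical response body, used only for illustration.
sample_body = '{"Prediction": {"num_of_gameservers": 63.1}}'
print(parse_num_of_gameservers(sample_body))
```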

The training results are published as CloudWatch metrics. The namespace to use is denoted by `cloudwatch_namespace`.
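As a rough illustration of what publishing one value to that namespace involves, the sketch below builds the keyword arguments for the boto3 CloudWatch `put_metric_data` call. The helper `build_metric_request` and the metric name `curr_demand` are assumptions for this example, not names taken from the training code.

```python
def build_metric_request(namespace, metric_name, value):
    """Build keyword arguments for the CloudWatch put_metric_data call."""
    return {
        "Namespace": namespace,
        "MetricData": [
            {"MetricName": metric_name, "Value": float(value), "Unit": "None"}
        ],
    }


# Hypothetical metric name and value.
request = build_metric_request("rl-gs-training", "curr_demand", 63.1)
print(request)

# With AWS credentials configured, the request could be sent with:
#   import boto3
#   boto3.client("cloudwatch").put_metric_data(**request)
```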

## Pre-requisites 

### Imports

To get started, we'll import the Python libraries we need and set up the environment with a few prerequisites for permissions and configurations.

In [47]:
import sagemaker
import boto3
import sys
import os
import glob
import re
import subprocess
import numpy as np
from IPython.display import HTML
import time
from time import gmtime, strftime
sys.path.append("common")
from misc import get_execution_role, wait_for_s3_object
from docker_utils import build_and_push_docker_image
from sagemaker.rl import RLEstimator, RLToolkit, RLFramework

### Setup S3 bucket

Set up the connection and authentication to the S3 bucket that you want to use for checkpoints and metadata.

In [48]:
sage_session = sagemaker.session.Session()
s3_bucket = sage_session.default_bucket()  
s3_output_path = 's3://{}/'.format(s3_bucket)
print("S3 bucket path: {}".format(s3_output_path))

S3 bucket path: s3://sagemaker-us-west-2-356566070122/


### Parameters

Adding a new parameter for the job also requires updating the training section that invokes the RLEstimator, so that the parameter is passed in `hyperparameters`.

In [49]:
job_name_prefix = 'rl-gs-training'
job_duration_in_seconds = 60 * 60 * 24 * 5
train_instance_count = 1
cloudwatch_namespace = 'rl-gs-training'
gs_inventory_url = 'https://4bfiebw6ui.execute-api.us-west-2.amazonaws.com/api/currsine1h/'
learning_freq = 65
min_servers=10
max_servers=100
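The evaluation log later in this notebook shows these values arriving in `env_config` as strings (for example `'learning_freq': '5'`), so the training script has to cast them back before using them. The sketch below shows one way to do that; `parse_env_config` and `INT_KEYS` are hypothetical names, and whether the actual training script casts values this way is an assumption.

```python
# Keys whose values are integers in the parameters cell above.
INT_KEYS = ("learning_freq", "min_servers", "max_servers", "time_total_s", "save_model")


def parse_env_config(raw):
    """Cast string hyperparameters back to integers where expected."""
    config = dict(raw)
    for key in INT_KEYS:
        if key in config:
            config[key] = int(config[key])
    return config


# Hyperparameters as they would look after being serialized to strings.
raw = {
    "cloudwatch_namespace": "rl-gs-training",
    "learning_freq": "65",
    "min_servers": "10",
    "max_servers": "100",
    "time_total_s": str(60 * 60 * 24 * 5),  # five days in seconds
    "save_model": "1",
}
config = parse_env_config(raw)
print(config)
```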

In [50]:

# Pick the instance type
instance_type = "ml.c5.xlarge"  # 4 vCPUs
# instance_type = "ml.c5.4xlarge"  # 16 vCPUs
# instance_type = "ml.c5.2xlarge"  # 8 vCPUs
# instance_type = "ml.c4.4xlarge"  # 16 vCPUs
# instance_type = "ml.p2.8xlarge"  # 32 vCPUs
# instance_type = "ml.p3.2xlarge"  # 8 vCPUs
# instance_type = "ml.p3.8xlarge"  # 32 vCPUs
# instance_type = "ml.p3.16xlarge"  # 64 vCPUs
# instance_type = "ml.c5.18xlarge"  # 72 vCPUs

num_cpus_per_instance = 4
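Because `num_cpus_per_instance` is set by hand, it can drift out of sync when `instance_type` changes. A small lookup keeps the two together. This is only a sketch: the `VCPUS` table covers a subset of instance types and `cpus_for` is a hypothetical helper, not part of the SageMaker SDK.

```python
# vCPU counts for a subset of the instance types listed above.
VCPUS = {
    "ml.c5.xlarge": 4,
    "ml.c5.2xlarge": 8,
    "ml.c5.4xlarge": 16,
    "ml.c5.18xlarge": 72,
    "ml.p3.2xlarge": 8,
}


def cpus_for(instance_type):
    """Look up the vCPU count, failing loudly for unknown instance types."""
    try:
        return VCPUS[instance_type]
    except KeyError:
        raise ValueError("Add {} to VCPUS before using it".format(instance_type))


print(cpus_for("ml.c5.xlarge"))
```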

### Create an IAM role

Either get the execution role with `role = sagemaker.get_execution_role()` when running from a SageMaker notebook instance, or use the utility method `role = get_execution_role()` to create an execution role when running from a local notebook. In this example, the environment publishes custom CloudWatch metrics and writes values to a DynamoDB table during the training job, so the role ARN set below must grant the corresponding permissions.

In [51]:
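try:
    # Works when running inside a SageMaker notebook instance.
    role = sagemaker.get_execution_role()
except Exception:
    # Fall back to the helper from common/misc.py when running locally.
    role = get_execution_role()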

print("Using IAM role arn: {}".format(role))

Using IAM role arn: arn:aws:iam::356566070122:role/service-role/AmazonSageMaker-ExecutionRole-20181024T210472
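The extra permissions described above could be expressed as an IAM policy document along these lines. This is a sketch under stated assumptions: the region and account ID are placeholders, the table name comes from the introduction, and the exact set of actions the environment needs (for instance whether it also reads from the table) is not confirmed by this notebook.

```python
import json


def build_env_policy(region, account_id, table_name="observations"):
    """Build a policy allowing metric publishing and writes to one DynamoDB table."""
    table_arn = "arn:aws:dynamodb:{}:{}:table/{}".format(region, account_id, table_name)
    return {
        "Version": "2012-10-17",
        "Statement": [
            # PutMetricData does not take a resource ARN, hence "*".
            {"Effect": "Allow", "Action": ["cloudwatch:PutMetricData"], "Resource": "*"},
            {"Effect": "Allow", "Action": ["dynamodb:PutItem"], "Resource": table_arn},
        ],
    }


# Placeholder region and account ID, for illustration only.
policy = build_env_policy("us-west-2", "123456789012")
print(json.dumps(policy, indent=2))
```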


## Train the RL model using the Python SDK Script mode

The RLEstimator is used for training RL jobs. 

1. The `entry_point` value indicates the script that invokes the GameServer RL environment.
2. `source_dir` indicates the location of the environment code, which currently includes train_gameserver_ppo.py and game_server_env.py.
3. Specify the choice of RL toolkit and framework. This automatically resolves to the ECR path for the RL Container. 
4. Define the training parameters such as the instance count, instance type, job name, and S3 path for output.
5. Specify the hyperparameters for the RL agent algorithm. The RLCOACH_PRESET or the RLRAY_PRESET can be used to specify the RL agent algorithm you want to use. 
6. Define the metrics definitions that you are interested in capturing in your logs. These can also be visualized in CloudWatch and SageMaker Notebooks. 

In [52]:
metric_definitions = [{'Name': 'episode_reward_mean',
  'Regex': 'episode_reward_mean: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)'},
 {'Name': 'episode_reward_max',
  'Regex': 'episode_reward_max: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)'},
 {'Name': 'episode_len_mean',
  'Regex': 'episode_len_mean: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)'},
 {'Name': 'entropy',
  'Regex': 'entropy: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)'},
 {'Name': 'episode_reward_min',
  'Regex': 'episode_reward_min: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)'},
 {'Name': 'vf_loss',
  'Regex': 'vf_loss: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)'},
 {'Name': 'policy_loss',
  'Regex': 'policy_loss: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)'},                                            
]

metric_definitions

[{'Name': 'episode_reward_mean',
  'Regex': 'episode_reward_mean: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)'},
 {'Name': 'episode_reward_max',
  'Regex': 'episode_reward_max: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)'},
 {'Name': 'episode_len_mean',
  'Regex': 'episode_len_mean: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)'},
 {'Name': 'entropy',
  'Regex': 'entropy: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)'},
 {'Name': 'episode_reward_min',
  'Regex': 'episode_reward_min: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)'},
 {'Name': 'vf_loss',
  'Regex': 'vf_loss: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)'},
 {'Name': 'policy_loss',
  'Regex': 'policy_loss: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)'}]
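All seven definitions share the same number-matching pattern, so the list can be generated from the metric names and checked against a sample log line before launching a long job. The sketch below uses Python's `re` module as a stand-in for however SageMaker applies the patterns to the logs; the sample line and the helper `extract` are made up for illustration.

```python
import re

# Same floating-point pattern used in metric_definitions above.
FLOAT = r"([-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?)"
NAMES = [
    "episode_reward_mean", "episode_reward_max", "episode_len_mean",
    "entropy", "episode_reward_min", "vf_loss", "policy_loss",
]
definitions = [{"Name": n, "Regex": "{}: {}".format(n, FLOAT)} for n in NAMES]


def extract(name, line):
    """Return the value captured for one metric, or None if it is absent."""
    regex = next(d["Regex"] for d in definitions if d["Name"] == name)
    match = re.search(regex, line)
    return float(match.group(1)) if match else None


# Hypothetical log line.
sample = "episode_reward_mean: -12.5 episode_len_mean: 3e2"
print(extract("episode_reward_mean", sample))
print(extract("episode_len_mean", sample))
print(extract("entropy", sample))
```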

In [53]:
%%time
#metric_definitions = RLEstimator.default_metric_definitions(RLToolkit.RAY)
    
estimator = RLEstimator(
                        entry_point="train_gameserver_ppo.py",
                        source_dir='src',
                        dependencies=["common/sagemaker_rl"],
                        toolkit=RLToolkit.RAY,
                        toolkit_version='0.6.5',
                        framework=RLFramework.TENSORFLOW,
                        role=role,
                        train_instance_type=instance_type,
                        train_instance_count=train_instance_count,
                        output_path=s3_output_path,
                        base_job_name=job_name_prefix,
                        metric_definitions=metric_definitions,
                        train_max_run=job_duration_in_seconds,
                        hyperparameters={
                          "cloudwatch_namespace":cloudwatch_namespace,
                          "gs_inventory_url":gs_inventory_url,
                          "learning_freq":learning_freq,
                          "time_total_s":job_duration_in_seconds,
                          "min_servers":min_servers,
                          "max_servers":max_servers,
                          "save_model": 1
                        }
                    )

estimator.fit(wait=False)
job_name = estimator.latest_training_job.job_name
print("Training job: %s" % job_name)

Training job: rl-gs-training-norm-demand-2019-10-17-22-17-04-740
CPU times: user 113 ms, sys: 0 ns, total: 113 ms
Wall time: 344 ms


In [14]:
import sagemaker
sagemaker.__version__

'1.33.0'

# Evaluation of RL models

## Load checkpointed model

#### TODO: work out how to download the model checkpoint and pass it to the evaluation script

In [99]:
%%time
job_name = "5obs-local-sine-2019-08-18-21-13-45-314"
print("job_name: %s" % job_name)
estimator_eval = RLEstimator(entry_point="evaluate_gameserver_ppo.py",
                        source_dir='src',
                        dependencies=["common/sagemaker_rl"],
                        role=role,
                        toolkit=RLToolkit.RAY,
                        toolkit_version='0.6.5',
                        framework=RLFramework.TENSORFLOW,
                        train_instance_type=instance_type,
                        train_instance_count=1,
                        base_job_name=job_name_prefix + "-evaluation",
                        hyperparameters={
                          "cloudwatch_namespace":cloudwatch_namespace,
                          "gs_inventory_url":gs_inventory_url,
                          "learning_freq":learning_freq,
                          "time_total_s":job_duration_in_seconds,
                          "min_servers":min_servers,
                          "max_servers":max_servers,
                          "save_model": 1,
                          "job_name":job_name,
                          "s3_bucket":s3_bucket
                        }     
                    )
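# NOTE: checkpoint_path is not defined anywhere in this notebook; set it to the
# S3 URI of the checkpoint to evaluate before running this cell.
estimator_eval.fit({'model': checkpoint_path})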
job_name = estimator_eval.latest_training_job.job_name
print("Evaluation job: %s" % job_name)

job_name: 5obs-local-sine-2019-08-18-21-13-45-314
in __init__
env_config
{'cloudwatch_namespace': '5obs-local-sine', 'gs_inventory_url': 'https://4bfiebw6ui.execute-api.us-west-2.amazonaws.com/api/currsine1h/', 'learning_freq': '5', 'max_servers': '100', 'min_servers': '10', 'save_model': '1', 'time_total_s': '32400'}
self.curr_demand=63.138143498979936
calculate the reward, calculate the ratio between allocation and demand, curr_alloc/curr_demand
interm ratio=1.0151651067081289
over provision - ratio>1 - -0.9574966835151812
https://github.com/aws/sagemaker-python-sdk/tree/master/src/sagemaker/tensorflow#adapting-your-local-tensorflow-script
2019-08-19 06:24:38,987 sagemaker-containers INFO     Reporting training SUCCESS

2019-08-19 06:24:43 Uploading - Uploading generated training model
2019-08-19 06:24:43 Completed - Training job completed
Billable seconds: 60
Evaluation job: 5obs-local-sine-evaluation-2
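The log above mentions a ratio `curr_alloc/curr_demand` and labels a ratio greater than 1 as over provisioning. The sketch below reproduces only that ratio and label. It does not reproduce the reward value, since the reward function is not shown in this notebook; the "under provision" and "exact" labels and both function names are assumptions.

```python
def provisioning_ratio(curr_alloc, curr_demand):
    """Ratio between allocated servers and demanded servers."""
    if curr_demand <= 0:
        raise ValueError("curr_demand must be positive")
    return curr_alloc / curr_demand


def classify(ratio):
    """Label a ratio the way the log does for ratio > 1."""
    if ratio > 1:
        return "over provision"
    if ratio < 1:
        return "under provision"  # assumed label, not seen in the log
    return "exact"  # assumed label, not seen in the log


# Hypothetical allocation and demand values.
ratio = provisioning_ratio(64, 63)
print(ratio, classify(ratio))
```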

# Model deployment

Now let us deploy the RL policy so that we can get the optimal action for a given environment observation.
If the notebook restarted and lost its previous estimator object, set `model_data` to the full S3 URI of the model.tar.gz instead of reading it from `estimator.model_data`, e.g., s3://sagemaker-us-west-2-356566070122/rl-gameserver-autopilot-2019-07-19-19-36-32-926/output/model.tar.gz
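The example URI follows the pattern bucket, training job name, then `output/model.tar.gz`, which matches an `output_path` pointing at the bucket root as configured earlier in this notebook. Under that assumption, the URI can be rebuilt from the job name alone; `model_artifact_uri` is a hypothetical helper, not an SDK function.

```python
def model_artifact_uri(bucket, training_job_name):
    """Rebuild the model artifact URI for a job whose output_path is the bucket root."""
    return "s3://{}/{}/output/model.tar.gz".format(bucket, training_job_name)


uri = model_artifact_uri(
    "sagemaker-us-west-2-356566070122",
    "rl-gs-training-2019-09-23-15-41-40-260",
)
print(uri)
```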

In [18]:
from sagemaker.tensorflow.serving import Model
print("model name: %s" % estimator.model_data)
model_data='s3://sagemaker-us-west-2-356566070122/rl-gs-training-2019-09-23-15-41-40-260/output/model.tar.gz'
model = Model(model_data=model_data,
              role=role)

predictor = model.deploy(initial_instance_count=1, instance_type=instance_type)

model name: s3://sagemaker-us-west-2-356566070122/rl-gs-training-2019-09-23-15-41-40-260/output/model.tar.gz
-------------------------------------------------------------------------!