<h1>Amazon SageMaker Reinforcement Learning demo</h1>
<h2 style="margin-top: 0.1em; font-weight: normal;"><em>Reinforcement Learning Algorithm Launcher</em></h2>

Amazon SageMaker provides two ways to run RL algorithms:
- **local** - training runs on the same instance as the notebook (e.g. ml.t2.medium), which is adequate for prototyping and initial tests.
- **dedicated instance** - training runs as a job on a dedicated *ml* instance (e.g. ml.m5.large; more information is available at https://aws.amazon.com/sagemaker/pricing/instance-types/). These instances are suited to production workloads, so they should be used only once the algorithm can complete a simple training task. Also keep in mind that only certain instance types provide GPU support.

This notebook offers the code to run your own RL algorithm either locally or on a dedicated instance.

In [42]:
import sagemaker
from sagemaker.rl import RLEstimator, RLToolkit, RLFramework

### Configuration

In [None]:
# S3 bucket in which to store the trained model
# (e.g. 'fruitpunch-sagemaker-test')
S3_BUCKET = '<bucket-name>'

# AWS account ID (kept as a string to preserve leading zeros)
ACCOUNT_ID = '<012345678901>'

# region in which the algorithm container
# is stored (e.g. 'eu-west-1')
REGION = 'eu-west-1'

# name of the repository on AWS ECR service where
# the image is stored (e.g. fruitpunchai/tf-mlagents)
REPOSITORY_NAME = 'fruitpunchai/tf-mlagents'

# Optional: the username of the account used to run
# the training job locally (part of the user ARN)
USERNAME = '<fruitpunch>'

# select the type of execution
LOCAL = True

The `LOCAL` variable above selects whether training runs locally or on a dedicated instance. In the former case, a role must be provided explicitly: here it is the *ARN* of a user with permissions to access both the SageMaker and S3 services. The *ARN* can be retrieved from the [IAM management console](https://console.aws.amazon.com/iam/home#/users) by selecting the desired user from the list.  
In the latter case, the role can be determined automatically by the SageMaker library.

In [54]:
S3_output_path = 's3://{}/'.format(S3_BUCKET)
image_name = '{}.dkr.ecr.{}.amazonaws.com/{}:latest'.format(ACCOUNT_ID, REGION, REPOSITORY_NAME)

if LOCAL:
    train_type = 'local'
    role = 'arn:aws:iam::{}:user/{}'.format(ACCOUNT_ID, USERNAME)
else:
    train_type = 'ml.m5.large' # different train instances can be chosen, depending on the budget
    role = sagemaker.get_execution_role()
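
The string formatting in the cell above can be wrapped in small helpers that fail early on a malformed account ID. This is only a minimal sketch: `build_image_uri` and `build_user_arn` are hypothetical names introduced here, not part of the SageMaker SDK, and the 12-digit check assumes the standard AWS account ID format.

```python
import re


def build_image_uri(account_id, region, repository, tag="latest"):
    """Return an ECR image URI in the same format the notebook uses."""
    # Account IDs must stay strings: converting to int would drop leading zeros.
    if not re.fullmatch(r"\d{12}", str(account_id)):
        raise ValueError("expected a 12-digit account ID, got {!r}".format(account_id))
    return "{}.dkr.ecr.{}.amazonaws.com/{}:{}".format(account_id, region, repository, tag)


def build_user_arn(account_id, username):
    """Return the IAM user ARN used as the role for local runs."""
    return "arn:aws:iam::{}:user/{}".format(account_id, username)
```

With these helpers, `image_name` could be computed as `build_image_uri(ACCOUNT_ID, REGION, REPOSITORY_NAME)`, so an unfilled placeholder is caught before the training job is created.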

### Training script hyperparameters
If the entry point script supports them, different hyperparameters can be passed to each training job by updating the hyperparameters dictionary:

In [None]:
hp_dict = {
    "hyper-param-example": "This is an example of hyperparameter",
    "maximum-limit-of-everything": 7
}
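
How the entry point receives these values depends on the container. The sketch below assumes the image forwards each hyperparameter to the script as a `--name value` command-line argument, which is the convention of SageMaker's prebuilt framework containers; a custom image may instead expose them only in `/opt/ml/input/config/hyperparameters.json`. The function name `parse_hyperparameters` and the default values are illustrative.

```python
import argparse


def parse_hyperparameters(argv=None):
    """Parse the example hyperparameters from command-line style arguments."""
    parser = argparse.ArgumentParser()
    # argparse exposes "--hyper-param-example" as args.hyper_param_example
    parser.add_argument("--hyper-param-example", type=str, default="")
    parser.add_argument("--maximum-limit-of-everything", type=int, default=7)
    # parse_known_args tolerates extra arguments the container may append
    args, _unknown = parser.parse_known_args(argv)
    return args
```

Inside the training script, `parse_hyperparameters()` with no argument reads from `sys.argv`, and the values are then available as `args.hyper_param_example` and `args.maximum_limit_of_everything`.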

### Creating and launching the algorithm
The RLEstimator class provides all the details needed to run the training job. The main parameters to configure are:
- `entry_point`: the path of the file that contains the training algorithm
- `source_dir`: the base directory from which the entry point file is searched
- `image_name`: the name of the custom TensorFlow image tailored to run the training algorithm
- `role`: the AWS role associated with the training job (the role should have permissions for both the *SageMaker* and *S3* services)
- `train_instance_count`: the number of instances to spawn for running the training job
- `train_instance_type`: select whether to run locally or on a specified dedicated instance
- `train_max_run`: the maximum number of seconds that the training job is allowed to run (default: 1 day)
- `base_job_name`: the prefix name of the training job (it has to satisfy the following regular expression: `^[a-zA-Z0-9](-*[a-zA-Z0-9])*`)
- `hyperparameters`: a dictionary containing the hyperparameters that will be used for training the agents
- `output_path`: the URL of the S3 bucket where the contents of the container's `/opt/ml/model` and `/opt/ml/output` directories will be saved

More information on parameter configuration can be found in the SageMaker Python SDK documentation, which details [RLEstimator](https://sagemaker.readthedocs.io/en/stable/sagemaker.rl.html) and the more general [Estimator](https://sagemaker.readthedocs.io/en/stable/estimators.html) class. For most use cases, however, the parameters above should be enough to run RL training jobs.
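
The constraint on `base_job_name` can be checked before the estimator is created. The sketch below uses the regular expression quoted in the list above; `is_valid_job_name` is a hypothetical helper, not an SDK function. Because the quoted pattern has no trailing `$`, the helper uses `fullmatch` so that the whole name must conform.

```python
import re

# Pattern copied from the base_job_name description above
JOB_NAME_PATTERN = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9])*")


def is_valid_job_name(name):
    """Return True if the entire name matches the job name pattern."""
    return JOB_NAME_PATTERN.fullmatch(name) is not None
```

For example, `is_valid_job_name('rl-demo')` is true, while names containing underscores or starting or ending with a hyphen are rejected.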

In [None]:
# create the training job
estimator = RLEstimator(entry_point='<trainer_file>.py',
                        source_dir='<src-directory>',
                        image_name=image_name,
                        role=role,
                        train_instance_count=1,
                        train_instance_type=train_type,
                        train_max_run=86400,
                        base_job_name='<job_name>',
                        hyperparameters=hp_dict,
                        output_path=S3_output_path)

In [None]:
# launch the training job on selected instance
# Note: it might take a while before it actually starts,
# since it has to first download the custom Docker image
estimator.fit()

While the training job runs, the `fit` function prints a brief report of the job history.
In addition, the *AWS SageMaker* console and the [*AWS CloudWatch Management console*](https://console.aws.amazon.com/cloudwatch/home) offer more detailed tools for monitoring the training job history and its logs.