# Amazon SageMaker Notebook for ProcGen Starter Kit with Single Instance

In [None]:
import os
import yaml

import sagemaker
from sagemaker.rl import RLEstimator, RLToolkit, RLFramework
import boto3

In [None]:
with open(os.path.join("config", "sagemaker_config.yaml")) as f:
    sagemaker_config = yaml.safe_load(f)

## Initialize Amazon SageMaker

In [None]:
sm_session = sagemaker.session.Session()
s3_bucket = sagemaker_config["S3_BUCKET"]

s3_output_path = 's3://{}/'.format(s3_bucket)
print("S3 bucket path: {}".format(s3_output_path))

In [None]:
job_name_prefix = 'sm-ray-procgen'

role = sagemaker.get_execution_role()
print(role)

### Configure training instance type and computational resources

By default (`local_mode=False`), training launches on a separate instance, and you can monitor that instance's logs in Amazon CloudWatch to debug the job.
If you want to train on the same instance as your notebook for quick debugging, set `local_mode=True`.

The recommended instances, with cost per hour as of September 1, 2020, are:
* `ml.c5.4xlarge` $0.952 per hour (16 vCPU)

* `ml.g4dn.4xlarge` $1.686 per hour (1 GPU, 16 vCPU)

* `ml.p3.2xlarge` $4.284 per hour (1 GPU, 8 vCPU)
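
To compare the options before launching anything, the arithmetic can be sketched as below. This is only a rough estimate under stated assumptions: the prices are the ones quoted in the list above, `estimate_cost` is a hypothetical helper (not part of the starter kit), and the job count and duration are example values you should replace with your own.

```python
from decimal import Decimal

# Hourly prices quoted in the list above (as of September 1, 2020).
HOURLY_PRICE_USD = {
    "ml.c5.4xlarge": Decimal("0.952"),
    "ml.g4dn.4xlarge": Decimal("1.686"),
    "ml.p3.2xlarge": Decimal("4.284"),
}


def estimate_cost(instance_type, num_jobs, hours_per_job):
    """Hypothetical helper: cost in USD of num_jobs single-instance training jobs."""
    return HOURLY_PRICE_USD[instance_type] * num_jobs * hours_per_job


# Example values (assumptions): three environments, two hours of training each.
for name in HOURLY_PRICE_USD:
    print(name, estimate_cost(name, num_jobs=3, hours_per_job=2))
```

With these example values, three two-hour jobs on `ml.p3.2xlarge` come to $25.704. Check the current Amazon SageMaker pricing page before relying on these numbers.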

After you choose your instance type, make sure to edit the resources in `source/train-sagemaker.py`. For example, `ml.p3.2xlarge` has 1 GPU and 8 vCPUs, so the corresponding `ray` resources in `source/train-sagemaker.py` should be set as follows:

```
    def _get_ray_config(self):
        return {
            "ray_num_cpus": 8, # adjust based on selected instance type
            "ray_num_gpus": 1,
            "eager": False,
            "v": True, # required for CloudWatch to catch the progress
        }
``` 
For `rllib`, reserve 1 vCPU for the driver, which leaves 7 for rollout workers (`"num_workers": 7`), and use 1 GPU (`"num_gpus": 1`) for policy training.
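
The same rule can be written down for the other recommended instances. The sketch below is an assumption-laden illustration, not code from the starter kit: `suggest_resources` is a hypothetical helper, and it simply extends the `ml.p3.2xlarge` example above (all vCPUs and GPUs go to Ray, one vCPU is reserved for the driver) to the instance sizes listed earlier.

```python
# vCPU and GPU counts taken from the recommended-instance list above.
INSTANCE_RESOURCES = {
    "ml.c5.4xlarge": {"vcpus": 16, "gpus": 0},
    "ml.g4dn.4xlarge": {"vcpus": 16, "gpus": 1},
    "ml.p3.2xlarge": {"vcpus": 8, "gpus": 1},
}


def suggest_resources(instance_type):
    """Hypothetical helper: derive Ray and RLlib resource settings for an instance."""
    res = INSTANCE_RESOURCES[instance_type]
    # Ray is given every vCPU and GPU on the instance.
    ray_config = {"ray_num_cpus": res["vcpus"], "ray_num_gpus": res["gpus"]}
    # One vCPU is reserved for the driver; the rest become rollout workers.
    rllib_config = {"num_workers": res["vcpus"] - 1, "num_gpus": res["gpus"]}
    return ray_config, rllib_config


ray_config, rllib_config = suggest_resources("ml.p3.2xlarge")
print(ray_config)
print(rllib_config)
```

Treat the values as a starting point and copy them into `source/train-sagemaker.py` by hand; the helper does not edit the file for you.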

In [None]:
# Change local_mode to True if you want to do local training within this Notebook instance
# Otherwise, we'll spin-up a SageMaker training instance to handle the training

local_mode = False

if local_mode:
    instance_type = 'local'
else:
    instance_type = sagemaker_config["CPU_TRAINING_INSTANCE"]
    
# If training locally, do some Docker housekeeping..
if local_mode:
    !/bin/bash source/common/setup.sh

# Configure the framework you want to use

Set `framework` to `"tf"` or `"torch"` for TensorFlow or PyTorch, respectively.

You will also have to edit the `"use_pytorch"` configuration parameter in your entry point, `train-sagemaker.py`, to match the framework that you have selected.

In [None]:
framework = "tf"
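
Because the notebook setting and the entry-point setting are edited in two different places, it is easy for them to drift apart. A minimal sketch of the intended mapping, assuming `"use_pytorch"` is a plain boolean in your training configuration (`use_pytorch_flag` is a hypothetical helper, not part of the starter kit):

```python
def use_pytorch_flag(framework):
    """Hypothetical helper: map the notebook's framework string to a use_pytorch value."""
    if framework not in ("tf", "torch"):
        raise ValueError("framework must be 'tf' or 'torch', got {!r}".format(framework))
    return framework == "torch"


# Example: the value train-sagemaker.py should carry for each choice.
for choice in ("tf", "torch"):
    print(choice, "->", {"use_pytorch": use_pytorch_flag(choice)})
```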

# Train your model here

### Edit the training code

The training code is in the file `train-sagemaker.py`, located in the `source` directory.

#### *Warning: Confirm that the GPU and CPU resources are configured correctly for your instance type as described above.*

In [None]:
!pygmentize source/train-sagemaker.py

### Train the RL model using the Python SDK Script mode

If you are using local mode, the training will run on the notebook instance. 

When using SageMaker for training, you can select a GPU or CPU instance. The RLEstimator is used for training RL jobs.

1. Specify the source directory where the environment, presets and training code are uploaded.
2. Specify the entry point as the training code.
3. Specify the custom image to be used for the training environment.
4. Define the training parameters such as the instance count, job name and S3 path for output.
5. Define the metrics definitions that you are interested in capturing in your logs. These can also be visualized in CloudWatch and SageMaker Notebooks.

*[Choose](https://github.com/aws/sagemaker-rl-container#rl-images-provided-by-sagemaker) which docker image to use based on the instance type.* 
For this notebook, it has to be a container with Ray 0.8.5 and TensorFlow 2.1.0 to be consistent with the AICrowd ProcGen starter kit. 

If you prefer to use PyTorch, it is recommended to update your notebook kernel to `conda_pytorch_p36`. You would need to substitute the corresponding container listed in the Amazon SageMaker Reinforcement Learning documentation. In addition, you will need to ensure your starter kit is modified to train using PyTorch.

In [None]:
cpu_or_gpu = 'gpu' if instance_type.startswith(('ml.p', 'ml.g')) else 'cpu'
aws_region = boto3.Session().region_name

# Use TensorFlow 2 by default
custom_image_name = "462105765813.dkr.ecr.{}.amazonaws.com/sagemaker-rl-ray-container:ray-0.8.5-{}-{}-py36".format(aws_region, framework, cpu_or_gpu)
custom_image_name

You need to define the metrics to be displayed in the logs. The challenge has requirements on the number of steps and uses the mean episode reward to rank solutions. For details, refer to the AICrowd challenge website.

In [None]:
metric_definitions =  [
    {'Name': 'training_iteration', 'Regex': 'training_iteration: ([-+]?[0-9]*[.]?[0-9]+([eE][-+]?[0-9]+)?)'}, 
    {'Name': 'episodes_total', 'Regex': 'episodes_total: ([-+]?[0-9]*[.]?[0-9]+([eE][-+]?[0-9]+)?)'}, 
    {'Name': 'num_steps_trained', 'Regex': 'num_steps_trained: ([-+]?[0-9]*[.]?[0-9]+([eE][-+]?[0-9]+)?)'}, 
    {'Name': 'timesteps_total', 'Regex': 'timesteps_total: ([-+]?[0-9]*[.]?[0-9]+([eE][-+]?[0-9]+)?)'},

    {'Name': 'episode_reward_max', 'Regex': 'episode_reward_max: ([-+]?[0-9]*[.]?[0-9]+([eE][-+]?[0-9]+)?)'}, 
    {'Name': 'episode_reward_mean', 'Regex': 'episode_reward_mean: ([-+]?[0-9]*[.]?[0-9]+([eE][-+]?[0-9]+)?)'}, 
    {'Name': 'episode_reward_min', 'Regex': 'episode_reward_min: ([-+]?[0-9]*[.]?[0-9]+([eE][-+]?[0-9]+)?)'},
] 
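
Each regex above captures one number that follows a `name: ` prefix in the training log. Before launching a job, you can check what the pattern matches with Python's `re` module. The log lines below are made-up examples rather than real training output, and `extract_metric` is a hypothetical helper used only for this check.

```python
import re

# The numeric pattern shared by every metric definition above.
NUMBER = r"([-+]?[0-9]*[.]?[0-9]+([eE][-+]?[0-9]+)?)"


def extract_metric(name, log_line):
    """Hypothetical helper: return the captured value as a float, or None if no match."""
    match = re.search(name + ": " + NUMBER, log_line)
    return float(match.group(1)) if match else None


# Made-up log lines (assumptions), one per interesting case.
samples = [
    ("episode_reward_mean", "episode_reward_mean: 7.25"),
    ("timesteps_total", "timesteps_total: 200000"),
    ("episode_reward_min", "episode_reward_min: -1.5e-3"),
    ("episode_reward_max", "episode_reward_max: nan"),  # no digits, so no match
]
for name, line in samples:
    print(name, extract_metric(name, line))
```

Note that a value such as `nan` contains no digits, so the pattern does not match it and no data point is recorded for that line.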

### Run the RL estimator

There are 16 environments to choose from. You can run the RL estimator on multiple environments by providing a list of environments. The RL estimator will start the training jobs. This step takes longer than the cells above, so be patient. You can also monitor the status of your training jobs from the console: go to Amazon SageMaker > Training jobs. The most recent job will be at the top.

In [None]:
# Select which procgen environments to run in `envs_to_run`
'''
envs_to_run = ["coinrun", "bigfish", "bossfight", "caveflyer",
               "chaser", "climber", "dodgeball",
               "fruitbot", "heist", "jumper", "leaper", "maze",
               "miner", "ninja", "plunder", "starpilot"]
'''

envs_to_run = ["coinrun", "bigfish", "bossfight"]
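
Each entry in `envs_to_run` starts its own training job, so a typo or a repeated name costs real instance time. The sketch below shows one way to check the list first. `validate_envs` is a hypothetical helper, and the set of names is copied from the list of 16 environments above.

```python
# The 16 environment names from the list above.
PROCGEN_ENVS = {
    "bigfish", "bossfight", "caveflyer", "chaser",
    "climber", "coinrun", "dodgeball", "fruitbot",
    "heist", "jumper", "leaper", "maze",
    "miner", "ninja", "plunder", "starpilot",
}


def validate_envs(envs):
    """Hypothetical helper: reject unknown names and drop duplicates, keeping order."""
    unknown = [env for env in envs if env not in PROCGEN_ENVS]
    if unknown:
        raise ValueError("Unknown environments: {}".format(unknown))
    # dict.fromkeys preserves first-seen order while removing repeats.
    return list(dict.fromkeys(envs))


print(validate_envs(["coinrun", "bigfish", "bossfight", "coinrun"]))
```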

In [None]:
for env in envs_to_run:
    estimator = RLEstimator(entry_point="train-sagemaker.py",
                            source_dir='source',
                            dependencies=["source/utils", "source/common/", "neurips2020-procgen-starter-kit/"],
                            image_uri=custom_image_name,
                            role=role,
                            instance_type=instance_type,
                            instance_count=1,
                            output_path=s3_output_path,
                            base_job_name=job_name_prefix + "-" + env,
                            metric_definitions=metric_definitions,
                            debugger_hook_config=False,
                            hyperparameters={
                                #"rl.training.upload_dir": s3_output_path,
                                "rl.training.config.env_config.env_name": env,
                            }
                        )

    estimator.fit(wait=False)
    
    print(estimator.latest_training_job.job_name)

#### Wait for the training jobs to finish (no more than 2 hours)
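
Since `estimator.fit(wait=False)` returns immediately, you may prefer to poll the job status rather than watch the console. The following is a minimal sketch, assuming you pass in a boto3 SageMaker client and the job names printed above. `wait_for_jobs` is a hypothetical helper; it relies only on the client's `describe_training_job` call and the `TrainingJobStatus` field of its response.

```python
import time

# Statuses after which a training job no longer changes.
TERMINAL_STATES = {"Completed", "Failed", "Stopped"}


def wait_for_jobs(sm_client, job_names, poll_seconds=60, sleep=time.sleep):
    """Hypothetical helper: poll until every job reaches a terminal status."""
    pending = list(job_names)
    final_status = {}
    while pending:
        still_running = []
        for name in pending:
            response = sm_client.describe_training_job(TrainingJobName=name)
            status = response["TrainingJobStatus"]
            if status in TERMINAL_STATES:
                final_status[name] = status
            else:
                still_running.append(name)
        pending = still_running
        if pending:
            sleep(poll_seconds)  # avoid hammering the API between checks
    return final_status


# Usage inside this notebook (requires AWS credentials, so it is left commented out):
# statuses = wait_for_jobs(boto3.client("sagemaker"), ["<your training job name>"])
```

The `sleep` parameter exists only so the loop can be exercised without real delays; in normal use, leave it at its default.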

## Evaluate the model

### Visualize algorithm metrics for training

There are several options to visualize algorithm metrics. A detailed blog can be found [here](https://aws.amazon.com/blogs/machine-learning/easily-monitor-and-visualize-metrics-while-training-models-on-amazon-sagemaker/).


Option 1 (Amazon CloudWatch): You can go to the [Amazon CloudWatch](https://aws.amazon.com/cloudwatch/) metrics dashboard from your account to monitor and visualize the algorithm metrics as well as track the GPU and CPU usage. The training jobs details page has a direct link to the Amazon CloudWatch metrics dashboard for the metrics emitted by the training algorithm.

Option 2 (Amazon SageMaker Python SDK API): You can also visualize the metrics inline in your Amazon SageMaker Jupyter notebooks using the Amazon SageMaker Python SDK APIs. Please refer to the section titled *Visualize algorithm metrics for training* in `train.ipynb`.

Option 3 (TensorBoard): You can also use Ray Tune's integrated TensorBoard by specifying the output directory of your results. It is recommended to set `upload_dir` to an Amazon S3 URI, and Tune will automatically sync every 5 minutes. You can then visualize your experiment by running the following command on your local machine:

`AWS_REGION=your-aws-region tensorboard --logdir s3://destination_s3_path --host localhost --port 6006`

Check out `train-homo-distributed-cpu.ipynb` for an example of setting `upload_dir`.

#### Option 2: Plot metrics using Amazon SageMaker Python SDK API

You need to wait for the training job to allocate computational resources before viewing the logs. 

*Note: If you get a warning that the logs do not exist, wait for a few minutes and re-run the cell.*

*Note 2: If you get an import error from TensorFlow, open a terminal and type `source activate tensorflow2_p36`.*

In [None]:
# For usage, refer to https://sagemaker.readthedocs.io/en/stable/api/training/analytics.html#
from sagemaker.analytics import TrainingJobAnalytics
import matplotlib.pyplot as plt
%matplotlib inline

from source.utils.inference import get_latest_sagemaker_training_job

# Get last training job_names
eval_training_jobs = [get_latest_sagemaker_training_job(name_contains="{}-{}".format(
    job_name_prefix, env)) for env in envs_to_run]

for training_job_name, env in zip(eval_training_jobs, envs_to_run):
    metric_names = ['episode_reward_mean', 'timesteps_total']

    # download the metrics on cloudwatch
    metrics_dataframe = TrainingJobAnalytics(training_job_name=training_job_name, metric_names=metric_names).dataframe()

    # pivot to get the metrics
    metrics_dataframe = metrics_dataframe.pivot(index='timestamp', columns='metric_name', values='value')

    # plot onto an explicit axes so that no empty extra figure is created
    fig, ax = plt.subplots(figsize=(12, 5))
    metrics_dataframe.plot(kind='line', x='timesteps_total', y='episode_reward_mean', style='b.', legend=False, ax=ax)
    ax.set_ylabel('Episode Reward Mean')
    ax.set_xlabel('Timesteps')
    ax.set_title(env)

## Rollout the model

### Note that the following evaluation requires that at least one training job has completed.

In [None]:
import numpy as np
import gym
import matplotlib.pyplot as plt
from IPython import display

import ray
from ray.tune.registry import get_trainable_cls
from ray.rllib.models import ModelCatalog

from source.custom.envs.procgen_env_wrapper import ProcgenEnvWrapper
from source.custom.models.my_vision_network import MyVisionNetwork
from source.utils.inference import get_model_config, get_latest_sagemaker_training_job
from source.utils.inference import download_ray_checkpoint
from source.utils.inference import rollout

ray.init()

In [None]:
run = "PPO"
rollout_env = "coinrun"
num_steps = 1000

In [None]:
# You can choose to use the latest training job or
# input the name of your previously trained job
latest_training_job = get_latest_sagemaker_training_job(name_contains="{}-{}".format(
    job_name_prefix, rollout_env))
# latest_training_job = <name of your training job>
print("Rolling out training job {}".format(latest_training_job))

checkpoint_dir = "checkpoint"
if not os.path.isdir(checkpoint_dir):
    os.mkdir(checkpoint_dir)
last_checkpoint_num = download_ray_checkpoint(checkpoint_dir, s3_bucket, latest_training_job)

# Print the parameters in the model
!cat $checkpoint_dir/params.json

You must register all agents, algorithms, models, and preprocessors that you defined in the entry point.
For example, a model can be registered with `ModelCatalog.register_custom_model("my_vision_network", MyVisionNetwork)`, and custom agents can be registered with `ray.tune.registry.register_trainable`.

In [None]:
cls = get_trainable_cls(run)
ModelCatalog.register_custom_model("my_vision_network", MyVisionNetwork)
config = get_model_config()
agent = cls(config=config)
checkpoint = os.path.join("checkpoint", "checkpoint-{}".format(last_checkpoint_num))
agent.restore(checkpoint)

In [None]:
rgb_array = rollout(agent, "procgen:procgen-{}-v0".format(rollout_env),
                    num_steps, no_render=False)
img = plt.imshow(rgb_array[0])
plt.axis('off')
for arr in rgb_array[1:]:
    img.set_data(arr)
    display.display(plt.gcf())
    display.clear_output(wait=True)

In [None]:
ray.shutdown()