# Vehicle Routing Problem

## Pre-requisites 

### Imports

To get started, we import the Python libraries we need and set up the environment with a few prerequisites for permissions and configuration.

In [1]:
import sagemaker
import boto3
import sys
import os

from sagemaker.rl import RLEstimator

sys.path.append("common")
from misc import get_execution_role
import docker_utils


### Setup S3 bucket

Set up the connection and authentication to the S3 bucket that you want to use for checkpoints and metadata.

In [2]:
sage_session = sagemaker.session.Session()
s3_bucket = sage_session.default_bucket()  
s3_output_path = 's3://{}/'.format(s3_bucket)
print("S3 bucket path: {}".format(s3_output_path))

S3 bucket path: s3://sagemaker-us-west-2-975155340573/


### Configure where training happens
You can run your RL training jobs on a SageMaker ML instance. Choose the instance type from the available [ml instances](https://aws.amazon.com/sagemaker/pricing/instance-types/).

In [3]:
local_mode = False
# If on SageMaker, pick the instance type
instance_type = "ml.m5.large"
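The cell above hardcodes a remote instance type. As a minimal sketch, the choice can be tied to `local_mode`, assuming the SageMaker Python SDK's `'local'` instance type is what you want for local runs. The helper name `resolve_instance_type` is hypothetical and not part of this notebook.

```python
def resolve_instance_type(local_mode, remote_instance_type="ml.m5.large"):
    """Return 'local' for local mode, otherwise the given SageMaker instance type.

    Hypothetical helper for illustration; 'local' is the instance type the
    SageMaker Python SDK uses to run a job in a container on this machine.
    """
    return "local" if local_mode else remote_instance_type


# With the settings used in this notebook (local_mode = False):
print(resolve_instance_type(False))
```

With `local_mode = True` the same call returns `'local'`, so the estimator cells would not need to change.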

### Create an IAM role
When running from a SageMaker notebook instance, get the execution role with `role = sagemaker.get_execution_role()`. When running from a local notebook, use the utility method `role = get_execution_role()` (imported from `misc` above) to create an execution role.


In [4]:
try:
    role = sagemaker.get_execution_role()
except Exception:  # not on a SageMaker notebook instance; fall back to the helper from misc
    role = get_execution_role()

print("Using IAM role arn: {}".format(role))

Using IAM role arn: arn:aws:iam::975155340573:role/service-role/AmazonSageMaker-ExecutionRole-20190502T151333


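### Define the SageMaker RL Docker image for Ray

We use the TensorFlow Ray image (`sagemaker-rl-tensorflow`, tag `ray0.6.5`) as the base image for this project and extend it with `common/Dockerfile`. For a list of available RL images, see https://github.com/aws/sagemaker-rl-container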


In [5]:
cpu_or_gpu = 'gpu' if instance_type.startswith('ml.p') else 'cpu'
aws_region = boto3.Session().region_name
repository_name = "rl-baseline-repo"
base_image_name = "520713654638.dkr.ecr.{}.amazonaws.com/sagemaker-rl-tensorflow:ray0.6.5-{}-py3".format(aws_region, cpu_or_gpu)
# docker_utils._ecr_login_if_needed(base_image_name)
custom_image_name = docker_utils.build_and_push_docker_image(repository_name, dockerfile='common/Dockerfile', build_args={"BASE_IMAGE":base_image_name})
print("Using ECR image %s" % custom_image_name)

Building docker image rl-baseline-repo from common/Dockerfile
$ docker build -t rl-baseline-repo -f common/Dockerfile . --build-arg BASE_IMAGE=520713654638.dkr.ecr.us-west-2.amazonaws.com/sagemaker-rl-tensorflow:ray0.6.5-cpu-py3
Sending build context to Docker daemon  161.3kB
Step 1/3 : ARG BASE_IMAGE=520713654638.dkr.ecr.us-west-2.amazonaws.com/sagemaker-rl-tensorflow:ray0.6.5-gpu-py3
Step 2/3 : FROM $BASE_IMAGE
 ---> 2cb0fbd42e4f
Step 3/3 : RUN pip install --upgrade pip && pip install --no-cache-dir xpress networkx
 ---> Using cache
 ---> bd97df84c9ea
Successfully built bd97df84c9ea
Successfully tagged rl-baseline-repo:latest
Done building docker image rl-baseline-repo
ECR repository already exists: rl-baseline-repo
https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
Logged into ECR
$ docker tag rl-baseline-repo 975155340573.dkr.ecr.us-west-2.amazonaws.com/rl-baseline-repo
Pushing docker image to ECR repository 975155340573.dkr.ecr.us-west-2

## Run the Mixed Integer Programming (MIP) Baseline

### Define Metric
A list of dictionaries that defines the metric(s) used to evaluate the training jobs. Each dictionary contains two keys: ‘Name’ for the name of the metric, and ‘Regex’ for the regular expression used to extract the metric from the logs.

In [14]:
baseline_metric_definitions = [{'Name': 'episode_reward_mean',
  'Regex': 'episode_reward_mean: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)'},
 {'Name': 'episode_reward_max',
  'Regex': 'episode_reward_max: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)'}]
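To make the role of `Regex` concrete, here is a small sketch that applies the same pattern to log lines with Python's `re` module. The log lines are made up for illustration, and the helper `extract_metric` is hypothetical. The sketch assumes the metric value is taken from the first capture group.

```python
import re

# Same pattern as the 'episode_reward_mean' entry above, written as a raw string.
PATTERN = r"episode_reward_mean: ([-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?)"


def extract_metric(log_line, pattern=PATTERN):
    """Return the first captured number as a float, or None if the line does not match."""
    match = re.search(pattern, log_line)
    return float(match.group(1)) if match else None


# Hypothetical log lines, not taken from a real training job.
print(extract_metric("  episode_reward_mean: -12.5"))   # plain decimal
print(extract_metric("episode_reward_mean: 3.2e-1"))    # scientific notation
print(extract_metric("episode_len_mean: 40"))           # different metric, no match
```

The optional `([eE][-+]?[0-9]+)?` group is what lets the pattern accept values printed in scientific notation.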

### Define Estimator
This Estimator executes the baseline generation script in a managed execution environment within a SageMaker Training Job. The managed environment is an Amazon-built Docker container that executes functions defined in the supplied entry_point Python script.

In [15]:
# create a descriptive job name
baseline_job_name_prefix = "baselinevrp"

train_entry_point = "train_baseline_mip.py"
train_job_max_duration_in_seconds = 3600 * 24 * 2

baseline_estimator = RLEstimator(entry_point=train_entry_point,
                        source_dir="src",
                        dependencies=["common/sagemaker_rl"],
                        image_name=custom_image_name,
                        role=role,
                        train_instance_type=instance_type,
                        train_instance_count=1,
                        output_path=s3_output_path,
                        base_job_name=baseline_job_name_prefix,
                        metric_definitions=baseline_metric_definitions,
                        train_max_run=train_job_max_duration_in_seconds,
                        hyperparameters={}
                       )

In [16]:
baseline_estimator.fit(wait=local_mode)
baseline_job_name = baseline_estimator.latest_training_job.job_name
print("Training job: %s" % baseline_job_name)

Training job: baselinevrp-2019-10-22-08-38-50-645


## Run the RL model

The [RLEstimator](https://sagemaker.readthedocs.io/en/stable/sagemaker.rl.html) is used for training RL jobs. 

1. Specify the source directory where the gym environment and training code are uploaded.
2. Specify the entry point as the training code.
3. Specify the choice of RL toolkit and framework. This automatically resolves to the ECR path for the RL Container.
4. Define the training parameters such as the instance count, job name, and S3 path for output.
5. Specify the hyperparameters for the RL agent algorithm. The RLCOACH_PRESET or the RLRAY_PRESET can be used to specify the RL agent algorithm you want to use.
6. Define the metric definitions that you are interested in capturing in your logs. These can also be visualized in CloudWatch and SageMaker Notebooks.

### Define Metric
A list of dictionaries that defines the metric(s) used to evaluate the training jobs. Each dictionary contains two keys: ‘Name’ for the name of the metric, and ‘Regex’ for the regular expression used to extract the metric from the logs.

In [9]:
rl_metric_definitions = [{'Name': 'episode_reward_mean',
  'Regex': 'episode_reward_mean: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)'},
 {'Name': 'episode_reward_max',
  'Regex': 'episode_reward_max: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)'},
 {'Name': 'entropy',
  'Regex': 'entropy: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)'},
 {'Name': 'episode_reward_min',
  'Regex': 'episode_reward_min: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)'},
 {'Name': 'vf_loss',
  'Regex': 'vf_loss: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)'},
 {'Name': 'policy_loss',
  'Regex': 'policy_loss: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)'},                                            
]
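All six entries above share the same number pattern and differ only in the metric name. As a sketch, the list could be generated from the names instead of written out by hand. The helper `make_metric_definitions` is hypothetical and not part of the SageMaker SDK.

```python
# Number pattern shared by every metric definition in this notebook.
NUMBER_REGEX = r"([-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?)"


def make_metric_definitions(names):
    """Build one {'Name', 'Regex'} dictionary per metric name (hypothetical helper)."""
    return [{"Name": name, "Regex": "{}: {}".format(name, NUMBER_REGEX)} for name in names]


generated_definitions = make_metric_definitions([
    "episode_reward_mean",
    "episode_reward_max",
    "entropy",
    "episode_reward_min",
    "vf_loss",
    "policy_loss",
])
print(generated_definitions[0])
```

This keeps the regular expression in one place, so a fix to the pattern applies to every metric at once.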

### Define Estimator
This Estimator executes an RLEstimator script in a managed Reinforcement Learning (RL) execution environment within a SageMaker Training Job. The managed RL environment is an Amazon-built Docker container that executes functions defined in the supplied entry_point Python script.

In [10]:
# create a descriptive job name
job_name_prefix = 'vrp-n2map55od10pnty50'

train_entry_point = "train_vehicle_routing_problem_ppo.py"
train_job_max_duration_in_seconds = 3600 * 24 * 2

estimator = RLEstimator(entry_point=train_entry_point,
                        source_dir="src",
                        dependencies=["common/sagemaker_rl"],
                        image_name=custom_image_name,
                        role=role,
                        train_instance_type=instance_type,
                        train_instance_count=1,
                        output_path=s3_output_path,
                        base_job_name=job_name_prefix,
                        metric_definitions=rl_metric_definitions,
                        train_max_run=train_job_max_duration_in_seconds,
                        hyperparameters={}
                       )

In [11]:
estimator.fit(wait=local_mode)
rl_job_name = estimator.latest_training_job.job_name
print("Training job: %s" % rl_job_name)

Training job: vrp-n2map55od10pnty50-2019-10-22-08-36-48-964
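
In this notebook `local_mode` is `False`, so `fit(wait=local_mode)` returns as soon as the job is submitted. The sketch below shows one way to poll the job status afterwards. It assumes a boto3 SageMaker client and its `describe_training_job` call. The helper `wait_for_training_job` and its parameters are hypothetical.

```python
import time

# Statuses after which a SageMaker training job no longer changes.
TERMINAL_STATES = {"Completed", "Failed", "Stopped"}


def wait_for_training_job(sm_client, job_name, poll_seconds=60, max_polls=None):
    """Poll describe_training_job until the job reaches a terminal status.

    sm_client is expected to behave like boto3.client("sagemaker").
    Returns the last status seen; stops early after max_polls polls if given.
    """
    polls = 0
    while True:
        response = sm_client.describe_training_job(TrainingJobName=job_name)
        status = response["TrainingJobStatus"]
        if status in TERMINAL_STATES:
            return status
        polls += 1
        if max_polls is not None and polls >= max_polls:
            return status
        time.sleep(poll_seconds)
```

A call such as `wait_for_training_job(boto3.client("sagemaker"), rl_job_name)` would block until the RL job finishes. Passing `max_polls=1` turns it into a single status check.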


### Visualization

RL training can take a long time, so it helps to track progress while the job is running. Here we read the metrics defined above from CloudWatch with `TrainingJobAnalytics` and plot the mean episode reward, with a band between the minimum and maximum reward when those metrics are available.

In [12]:
%matplotlib inline
from sagemaker.analytics import TrainingJobAnalytics

def visualize(job_name):
    """Plot mean episode reward, with a min/max band when those metrics exist."""
    if local_mode:
        print("Can't plot metrics in local mode.")
        return

    df = TrainingJobAnalytics(job_name, ['episode_reward_mean']).dataframe()
    # Check for data before touching columns: an empty frame has no 'value' column.
    if len(df) == 0:
        print("No algorithm metrics found in CloudWatch")
        return

    df['rl_reward_mean'] = df['value']
    ax = df.plot(x='timestamp', y=['rl_reward_mean'], figsize=(18,6), fontsize=18, legend=True, style='-', color=['b'])

    # The band is optional: the baseline job defines no episode_reward_min metric.
    df_min = TrainingJobAnalytics(job_name, ['episode_reward_min']).dataframe()
    df_max = TrainingJobAnalytics(job_name, ['episode_reward_max']).dataframe()
    if len(df_min) == len(df) and len(df_max) == len(df):
        ax.fill_between(df['timestamp'], df_min['value'], df_max['value'], color='b', alpha=0.2)

    ax.set_ylabel('Mean reward per episode', fontsize=20)
    ax.set_xlabel('Training time (s)', fontsize=20)
    ax.legend(loc=4, prop={'size': 20})
    


In [None]:
visualize(job_name=rl_job_name) # If this errors or shows no metrics, training may not have started or published metrics yet

In [None]:
visualize(job_name=baseline_job_name) # If this errors or shows no metrics, training may not have started or published metrics yet