# Simulations of Solar Car Racing with Amazon SageMaker

---
## Introduction

Solar car racing refers to competitive races of electric vehicles powered by solar energy from panels on the surface of the car. The first solar car race was the Tour de Sol in 1985, which led to several similar races in Europe, the US and Australia. Universities often enter such challenges to develop their students' engineering and technological skills, and many business corporations have also competed. A small number of high school teams participate in solar car races designed exclusively for high school students. (Source: https://en.wikipedia.org/wiki/Solar_car_racing)

This notebook builds on the rl_roboschool_ray example notebook: https://github.com/aws/amazon-sagemaker-examples/blob/master/reinforcement_learning/rl_roboschool_ray/rl_roboschool_ray.ipynb

The approach has been tested with PPO.

In [1]:
project_name = 'solarcar'

## Pre-requisites 

### Imports

To get started, we'll import the Python libraries we need and set up the environment with a few prerequisites for permissions and configuration.

In [2]:
import sagemaker
import boto3
import sys
import os
import glob
import re
import subprocess
import numpy as np
from IPython.display import HTML
import time
from time import gmtime, strftime
sys.path.append("common")
from misc import get_execution_role, wait_for_s3_object
from docker_utils import build_and_push_docker_image
from sagemaker.rl import RLEstimator, RLToolkit, RLFramework

### Setup S3 bucket

Set up the connection and authentication to the S3 bucket that you want to use for checkpoints and metadata.

In [3]:
sage_session = sagemaker.session.Session()
s3_bucket = sage_session.default_bucket()  
s3_output_path = 's3://{}/'.format(s3_bucket)
print("S3 bucket path: {}".format(s3_output_path))

S3 bucket path: s3://sagemaker-ap-southeast-2-019676274883/


### Define Variables 

We define variables such as the job prefix for the training jobs *and the image path for the container (only when this is BYOC).*

In [4]:
# create a descriptive job name 
job_name_prefix = 'rl-'+ project_name

### Configure where training happens

You can run your RL training jobs from a SageMaker notebook instance or from a local notebook instance. In both scenarios, you can run the following in either local or SageMaker mode. Local mode uses the SageMaker Python SDK to run your code in a local container before deploying to SageMaker. This can speed up iterative testing and debugging while using the same familiar Python SDK interface. You just need to set `local_mode = True`.

In [5]:
# run in local_mode on this machine, or as a SageMaker TrainingJob?
local_mode = False

if local_mode:
    instance_type = 'local'
else:
    # If on SageMaker, pick the instance type. If ml.p3.2xlarge is not authorized for your account, please contact AWS.
    instance_type = "ml.p3.2xlarge"

### Create an IAM role

Either get the execution role when running from a SageMaker notebook instance `role = sagemaker.get_execution_role()` or, when running from local notebook instance, use utils method `role = get_execution_role()` to create an execution role.

In [6]:
try:
    role = sagemaker.get_execution_role()
except Exception:  # not running on a SageMaker notebook instance
    role = get_execution_role()

print("Using IAM role arn: {}".format(role))

Using IAM role arn: arn:aws:iam::019676274883:role/service-role/AmazonSageMaker-ExecutionRole-20201222T153520


### Install docker for `local` mode

In order to work in `local` mode, you need to have docker installed. When running from your local machine, please make sure that you have docker and docker-compose (for local CPU machines) and nvidia-docker (for local GPU machines) installed. Alternatively, when running from a SageMaker notebook instance, you can simply run the following script to install the dependencies.

Note that you can only run a single local notebook at a time.

In [7]:
# only run from SageMaker notebook instance
if local_mode:
    !/bin/bash ./common/setup.sh

## Build docker container

We must build a custom docker container with Roboschool installed.  This takes care of everything:

1. Fetching base container image
2. Installing Roboschool and its dependencies
3. Uploading the new container image to ECR

This step can take a long time if you are running on a machine with a slow internet connection.  If your notebook instance is in SageMaker or EC2 it should take 3-10 minutes depending on the instance type.


In [8]:
from sagemaker.rl import RLEstimator, RLToolkit, RLFramework

estimator = RLEstimator(entry_point="solarcar_train.py", # Our launcher code
                        source_dir='src', # Directory holding the supporting files. All of it is
                                          # copied into the container.

                        dependencies=["common/sagemaker_rl"], # Additional utility files.
                        toolkit=RLToolkit.RAY, # Run with the Ray toolkit against the Ray container image.
                        framework=RLFramework.TENSORFLOW, # The code uses the TensorFlow backend.
                        toolkit_version='0.5.3', # Toolkit version. This also selects an appropriate TF version.
                        #toolkit_version='0.6.5', # Alternative toolkit version.
                        role=role, # The IAM role obtained at the beginning.
                        #train_instance_type="ml.m4.xlarge", # Uncomment to run as a SageMaker training job.
                        train_instance_type="local", # Run in a local container for a quick smoke test.
                        train_instance_count=1, # A single instance is enough here; running distributed mainly
                                                # speeds things up when there are multiple rollout workers.
                        output_path=s3_output_path, # The path where we can expect our trained model.
                        base_job_name=job_name_prefix, # The prefix defined above, used to track our job.
                        hyperparameters = {      # Some hyperparameters for the Ray toolkit to operate.
                          "s3_bucket": s3_bucket,
                          "rl.training.stop.training_iteration": 2, # Number of iterations.
                          "rl.training.checkpoint_freq": 2,
                        },
                        #metric_definitions=metric_definitions, # This will bring all the logs out into the notebook.
                    )

estimator.fit(wait = True, logs='All')

train_instance_count has been renamed in sagemaker>=2.
See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.
train_instance_type has been renamed in sagemaker>=2.
See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.


Creating tmpkbe33ehh_algo-1-kr97p_1 ... done
Attaching to tmpkbe33ehh_algo-1-kr97p_1
algo-1-kr97p_1  | 2021-02-24 03:06:58,762 sagemaker-containers INFO     Imported framework sagemaker_tensorflow_container.training
algo-1-kr97p_1  | 2021-02-24 03:06:58,766 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
algo-1-kr97p_1  | 2021-02-24 03:06:58,885 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
algo-1-kr97p_1  | 2021-02-24 03:06:58,901 sagemaker-containers INFO     Invoking user script
algo-1-kr97p_1  | 
algo-1-kr97p_1  | Training Env:
algo-1-kr97p_1  | 
algo-1-kr97p_1  | {
algo-1-kr97p_1  |     "additional_framework_parameters": {
algo-1-kr97p_1  |         "sagemaker_estimator": "RLEstimator"
algo-1-kr97p_1  |     },
algo-1-kr97p_1  |     "channel_input_dirs": {},
algo-1-kr97p_1  |     "current_host": "al

Failed to delete: /tmp/tmpkbe33ehh/algo-1-kr97p Please remove it manually.


===== Job Complete =====


In [9]:
%%time

cpu_or_gpu = 'gpu' if instance_type.startswith('ml.p') else 'cpu'
repository_short_name = "sagemaker-roboschool-ray-%s" % cpu_or_gpu
docker_build_args = {
    'CPU_OR_GPU': cpu_or_gpu, 
    'AWS_REGION': boto3.Session().region_name,
}
custom_image_name = build_and_push_docker_image(repository_short_name, build_args=docker_build_args)
print("Using ECR image %s" % custom_image_name)

https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
Logged into ECR
Building docker image sagemaker-roboschool-ray-gpu from Dockerfile
$ docker build -t sagemaker-roboschool-ray-gpu -f Dockerfile . --build-arg CPU_OR_GPU=gpu --build-arg AWS_REGION=ap-southeast-2
Sending build context to Docker daemon  8.947MB
Step 1/14 : ARG CPU_OR_GPU
Step 2/14 : ARG AWS_REGION
Step 3/14 : FROM 462105765813.dkr.ecr.${AWS_REGION}.amazonaws.com/sagemaker-rl-ray-container:ray-0.8.2-tf-${CPU_OR_GPU}-py36
 ---> 044fa0c8742a
Step 4/14 : WORKDIR /opt/ml
 ---> Using cache
 ---> 983013f45aaf
Step 5/14 : RUN apt-get update && apt-get install -y       git cmake ffmpeg pkg-config       qtbase5-dev libqt5opengl5-dev libassimp-dev       libtinyxml-dev       libgl1-mesa-dev     && cd /opt     && apt-get clean && rm -rf /var/cache/apt/archives/* /var/lib/apt/lists/*
 ---> Using cache
 ---> 826153aae5a0
Step 6/14 : RUN apt-get update &&     apt-get install -y libboost-python-
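
The build cell above derives the `CPU_OR_GPU` build argument from the instance type by checking for the `ml.p` prefix. The sketch below wraps that rule in a small helper. `cpu_or_gpu_for` and `repository_name_for` are hypothetical names, and treating the `ml.g` family and `local_gpu` as GPU targets is an assumption that goes beyond the original check.

```python
# Hypothetical helpers (not part of the SageMaker SDK): map an instance type
# to the CPU_OR_GPU build argument used when building the image.
GPU_PREFIXES = ("ml.p", "ml.g", "local_gpu")  # assumption: these targets carry GPUs


def cpu_or_gpu_for(instance_type: str) -> str:
    """Return 'gpu' for GPU instance families, otherwise 'cpu'."""
    return "gpu" if instance_type.startswith(GPU_PREFIXES) else "cpu"


def repository_name_for(instance_type: str) -> str:
    """Build the ECR repository short name the same way the cell above does."""
    return "sagemaker-roboschool-ray-%s" % cpu_or_gpu_for(instance_type)


print(repository_name_for("ml.p3.2xlarge"))  # sagemaker-roboschool-ray-gpu
print(repository_name_for("local"))          # sagemaker-roboschool-ray-cpu
```

Keeping the rule in one function means the image name and the build argument cannot drift apart if the instance type changes later in the notebook.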

## Write the Training Code

The training code is written in the file `solarcar_train.py`, which is stored in the `src` directory.
It first imports the environment and launcher modules, and then defines a launcher class that registers the environment.

In [10]:
!pygmentize src/{project_name}_train.py

import json
import os

import gym
import ray
from ray.tune import run_experiments


from ray.tune.registry import register_env
from solarcar_env import SolarCarEnv


from sagemaker_rl.ray_launcher import SageMakerRayLauncher



class MyLauncher(SageMakerRayLauncher):


    def register_env_creator(self):
        env_name = "SolarCarEnv-v0

## Train the RL model using the Python SDK Script mode

If you are using local mode, the training will run on the notebook instance. When using SageMaker for training, you can select a GPU or CPU instance. The RLEstimator is used for training RL jobs. 

1. Specify the source directory where the environment, presets and training code are uploaded.
2. Specify the entry point as the training code.
3. Specify the choice of RL toolkit and framework. This automatically resolves to the ECR path for the RL Container.
4. Define the training parameters such as the instance count, job name and S3 path for output.
5. Specify the hyperparameters for the RL agent algorithm. The RLCOACH_PRESET or the RLRAY_PRESET can be used to specify the RL agent algorithm you want to use.
6. Define the metrics definitions that you are interested in capturing in your logs. These can also be visualized in CloudWatch and SageMaker Notebooks.

In [11]:
%%time

metric_definitions = RLEstimator.default_metric_definitions(RLToolkit.RAY)
    
estimator = RLEstimator(entry_point="solarcar_train.py",
                        source_dir='src',
                        dependencies=["common/sagemaker_rl"],
                        image_uri=custom_image_name,
                        role=role,
                        instance_type=instance_type,
                        instance_count=1,
                        output_path=s3_output_path,
                        base_job_name=job_name_prefix,
                        metric_definitions=metric_definitions,
                        hyperparameters={
                          # Attention scientists!  You can override any Ray algorithm parameter here:
                          #"rl.training.config.horizon": 5000,
                          #"rl.training.config.num_sgd_iter": 10,
                        }
                    )

estimator.fit(wait=local_mode)
job_name = estimator.latest_training_job.job_name
print("Training job: %s" % job_name)

Training job: rl-solarcar-2021-02-24-03-53-51-962
CPU times: user 216 ms, sys: 0 ns, total: 216 ms
Wall time: 627 ms
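
Hyperparameter keys such as `rl.training.config.horizon` use dots to address nested settings. The sketch below only illustrates that naming idea with a hypothetical helper, `nest_dotted_keys`; it does not reproduce the parsing code in `common/sagemaker_rl`, and the values are the ones that appear in this notebook's estimator cells.

```python
def nest_dotted_keys(flat: dict) -> dict:
    """Turn {'a.b.c': 1} into {'a': {'b': {'c': 1}}}."""
    nested: dict = {}
    for dotted_key, value in flat.items():
        node = nested
        *parents, leaf = dotted_key.split(".")
        for part in parents:
            # Walk down, creating intermediate dictionaries as needed.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


hyperparameters = {
    "rl.training.config.horizon": 5000,
    "rl.training.config.num_sgd_iter": 10,
    "rl.training.stop.training_iteration": 2,
}
nested_config = nest_dotted_keys(hyperparameters)
print(nested_config)
```

Reading the keys this way makes it easier to see which overrides land in the algorithm config and which control the stopping criteria.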


### Create intermediate folder

RL training can take a long time.  So while it's running there are a variety of ways we can track progress of the running training job.  Some intermediate output gets saved to S3 during training, so we'll set up to capture that.

In [14]:
print("Job name: {}".format(job_name))

s3_url = "s3://{}/{}".format(s3_bucket,job_name)

intermediate_folder_key = "{}/output/intermediate/".format(job_name)
intermediate_url = "s3://{}/{}".format(s3_bucket, intermediate_folder_key)

print("S3 job path: {}".format(s3_url))
print("Intermediate folder path: {}".format(intermediate_url))
    
tmp_dir = "/tmp/{}".format(job_name)
os.makedirs(tmp_dir, exist_ok=True)  # safe to re-run if the folder already exists
print("Create local folder {}".format(tmp_dir))

Job name: rl-solarcar-2021-02-24-03-53-51-962
S3 job path: s3://sagemaker-ap-southeast-2-019676274883/rl-solarcar-2021-02-24-03-53-51-962
Intermediate folder path: s3://sagemaker-ap-southeast-2-019676274883/rl-solarcar-2021-02-24-03-53-51-962/output/intermediate/
Create local folder /tmp/rl-solarcar-2021-02-24-03-53-51-962
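
The path bookkeeping above can be collected in one place. This is a minimal sketch under the assumption that the job writes to `<job_name>/output/intermediate/` as in the cell above; `job_paths` is a hypothetical helper, and the bucket and job names are placeholders.

```python
import os
import tempfile


def job_paths(bucket: str, job_name: str) -> dict:
    """Collect the S3 locations and local scratch folder for one training job."""
    intermediate_key = "{}/output/intermediate/".format(job_name)
    return {
        "s3_url": "s3://{}/{}".format(bucket, job_name),
        "intermediate_key": intermediate_key,
        "intermediate_url": "s3://{}/{}".format(bucket, intermediate_key),
        "tmp_dir": os.path.join(tempfile.gettempdir(), job_name),
    }


paths = job_paths("my-example-bucket", "rl-solarcar-example")  # placeholder names
os.makedirs(paths["tmp_dir"], exist_ok=True)  # succeeds even if the folder exists
print(paths["intermediate_url"])
```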


## Evaluation of RL models

We use the last checkpointed model to run evaluation for the RL Agent. 

### Load checkpointed model

Checkpointed data from the previously trained models will be passed on for evaluation / inference in the checkpoint channel. In local mode, we can simply use the local directory, whereas in the SageMaker mode, it needs to be moved to S3 first.

In [16]:
if local_mode:
    model_tar_key = "{}/model.tar.gz".format(job_name)
else:
    model_tar_key = "{}/output/model.tar.gz".format(job_name)
     
local_checkpoint_dir = "{}/model".format(tmp_dir)

wait_for_s3_object(s3_bucket, model_tar_key, tmp_dir, training_job_name=job_name)  

if not os.path.isfile("{}/model.tar.gz".format(tmp_dir)):
    raise FileNotFoundError("File model.tar.gz not found")
    
os.system("mkdir -p {}".format(local_checkpoint_dir))
os.system("tar -xvzf {}/model.tar.gz -C {}".format(tmp_dir, local_checkpoint_dir))

print("Checkpoint directory {}".format(local_checkpoint_dir))

Waiting for s3://sagemaker-ap-southeast-2-019676274883/rl-solarcar-2021-02-24-03-53-51-962/output/model.tar.gz...
Downloading rl-solarcar-2021-02-24-03-53-51-962/output/model.tar.gz
Checkpoint directory /tmp/rl-solarcar-2021-02-24-03-53-51-962/model
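
The extraction step above shells out to `mkdir` and `tar`. As an alternative sketch, the same unpacking can be done with Python's standard `tarfile` module, which raises an exception on failure instead of returning an exit code that is easy to ignore. `extract_model` is a hypothetical helper, and the archive built here is a throwaway placeholder rather than a real model.

```python
import os
import tarfile
import tempfile


def extract_model(archive_path: str, dest_dir: str) -> list:
    """Extract a model.tar.gz archive into dest_dir and return the member names."""
    if not os.path.isfile(archive_path):
        raise FileNotFoundError("File {} not found".format(archive_path))
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        archive.extractall(dest_dir)
    return names


# Self-contained demonstration with a throwaway archive (placeholder contents).
work_dir = tempfile.mkdtemp()
sample_file = os.path.join(work_dir, "checkpoint.txt")
with open(sample_file, "w") as handle:
    handle.write("placeholder checkpoint")
archive_path = os.path.join(work_dir, "model.tar.gz")
with tarfile.open(archive_path, "w:gz") as archive:
    archive.add(sample_file, arcname="checkpoint.txt")

extracted = extract_model(archive_path, os.path.join(work_dir, "model"))
print(extracted)
```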


In [17]:
if local_mode:
    checkpoint_path = 'file://{}'.format(local_checkpoint_dir)
    print("Local checkpoint file path: {}".format(local_checkpoint_dir))
else:
    checkpoint_path = "s3://{}/{}/checkpoint/".format(s3_bucket, job_name)
    if not os.listdir(local_checkpoint_dir):
        raise FileNotFoundError("Checkpoint files not found under the path")
    os.system("aws s3 cp --recursive {} {}".format(local_checkpoint_dir, checkpoint_path))
    print("S3 checkpoint file path: {}".format(checkpoint_path))

S3 checkpoint file path: s3://sagemaker-ap-southeast-2-019676274883/rl-solarcar-2021-02-24-03-53-51-962/checkpoint/


## Model deployment

Now let us deploy the RL policy so that we can get the optimal action for a given environment observation.

Troubleshooting: if you hit an instance limit, come back to this notebook and delete unused endpoints.

We load the model into a predictor, which we then use to predict the action for a given observation.

In [18]:
# estimator.model_data =
# role = 

In [20]:
from sagemaker.tensorflow.model import TensorFlowModel

model = TensorFlowModel(model_data=estimator.model_data,
              framework_version='2.1.0',
              role=role)

predictor = model.deploy(initial_instance_count=1, 
                         instance_type='ml.m4.xlarge',)

update_endpoint is a no-op in sagemaker>=2.
See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.


-------------!

Now let us predict the action for a dummy observation.

In [21]:
# the below input is an example of observation
input = {"inputs": {'observations': [[1.82170394e+07, 0.00000000e+00, 3.60634351e+01, 1.08236378e+00, 6.00000000e+01]],
                    'prev_action': 0,
                    'is_training': False,
                    'prev_reward': 0,
                    'seq_lens': 3000
                   }
            }
# predictor is an agent
result = predictor.predict(input)

# preview of the results
result

{'outputs': {'action_logp': [0.0],
  'action_prob': [1.0],
  'vf_preds': [63.1483],
  'behaviour_logits': [[-0.387050927,
    0.249935687,
    0.149270773,
    -0.11494574,
    0.0901589394,
    -0.222717956,
    -0.300766319,
    0.000956379343,
    -0.0444840193,
    -0.0152243637,
    -0.0298270546,
    -0.405828685,
    -0.0304550435,
    0.260118306,
    0.167551503,
    -0.082584843,
    0.0938704908,
    0.105107881,
    0.119761,
    0.1863866]],
  'actions': [13]}}

From the output above, we can see that the predicted action is number 13.
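
The response above contains 20 `behaviour_logits`, which suggests one logit per discrete action (an assumption about the action space, since the environment definition is not shown here). The sketch below copies those logits and checks that the returned action coincides with the largest one, which would be consistent with greedy action selection at inference time.

```python
import math

# behaviour_logits copied from the endpoint response shown above.
behaviour_logits = [
    -0.387050927, 0.249935687, 0.149270773, -0.11494574, 0.0901589394,
    -0.222717956, -0.300766319, 0.000956379343, -0.0444840193, -0.0152243637,
    -0.0298270546, -0.405828685, -0.0304550435, 0.260118306, 0.167551503,
    -0.082584843, 0.0938704908, 0.105107881, 0.119761, 0.1863866,
]


def softmax(logits):
    """Convert logits to probabilities, subtracting the max for stability."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


probabilities = softmax(behaviour_logits)
# Index of the largest logit, i.e. the greedy choice.
greedy_action = max(range(len(behaviour_logits)), key=behaviour_logits.__getitem__)
print(greedy_action)  # 13
```

The logits are close together, so the softmax distribution is fairly flat; the reported `action_prob` of 1.0 therefore reflects the deterministic choice rather than the softmax probability of action 13.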

In [22]:
# example observation
obs = [[1.82170394e+07, 0.00000000e+00, 3.60634351e+01, 1.08236378e+00, 6.00000000e+01]]

# example input
input = {"inputs": {'observations': obs,
                    'prev_action': 16,
                    'is_training': False,
                    'prev_reward': 66,
                    'seq_lens': -1
                   }
            }

# example predicted action
predictor.predict(input)['outputs']['actions'][0]

13
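
Both prediction cells build the same payload by hand. A small helper keeps the field names in one place and avoids reusing the name `input`, which shadows a Python built-in. `build_request` is a hypothetical name; the field layout simply mirrors the cells above.

```python
def build_request(observation, prev_action=0, prev_reward=0, seq_lens=-1):
    """Wrap one observation in the payload layout used by the cells above."""
    return {
        "inputs": {
            "observations": [list(observation)],  # batch of one observation
            "prev_action": prev_action,
            "is_training": False,
            "prev_reward": prev_reward,
            "seq_lens": seq_lens,
        }
    }


obs = [1.82170394e+07, 0.0, 3.60634351e+01, 1.08236378e+00, 6.0e+01]
request = build_request(obs, prev_action=16, prev_reward=66)
print(request["inputs"]["observations"])
```

With a deployed endpoint, the call would then read `predictor.predict(request)['outputs']['actions'][0]`.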

In the cell below, we create a validation environment to check the performance of the model.

In [None]:
import math
import random
import datetime

from datetime import timedelta
from solarcar_env_nogym import *
import pprint

# import pandas as pd

# Create the validation environment from solarcar_env_nogym.
car = SolarCarEnv()

# Reset the environment once and keep the initial observation.
obs = car.reset()
input = {"inputs": {'observations': [obs.tolist()],
                    'prev_action': 0,
                    'is_training': False,
                    'prev_reward': 0,
                    'seq_lens': -1
                   }
            }
exp = []
done = False
rewards = 0
while not done:
#for i in range(6):
    # Query the endpoint once per step and reuse the returned action.
    action = predictor.predict(input)['outputs']['actions'][0]
    obs, reward, done, info = car.step(action)
    rewards += reward
    input = {"inputs": {'observations': [obs.tolist()],
                    'prev_action': action,
                    'is_training': False,
                    'prev_reward': rewards,
                    'seq_lens': -1
                   }
            }

    exp.append(info.copy())
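
The evaluation loop above needs a live endpoint and the real environment. To exercise the control flow offline, the sketch below runs the same loop against two stand-ins. `StubEnv`, `StubPredictor` and `run_episode` are hypothetical and return placeholder values; they only mimic the `reset`/`step` and `predict` interfaces used above.

```python
class StubEnv:
    """Hypothetical stand-in for SolarCarEnv: ends after a fixed number of steps."""

    def __init__(self, max_steps=3):
        self.max_steps = max_steps
        self.steps = 0

    def reset(self):
        self.steps = 0
        return [0.0]

    def step(self, action):
        self.steps += 1
        obs = [float(self.steps)]
        reward = 1.0
        done = self.steps >= self.max_steps
        return obs, reward, done, {"step": self.steps, "action": action}


class StubPredictor:
    """Hypothetical stand-in for the endpoint: always answers action 13."""

    def __init__(self):
        self.calls = 0

    def predict(self, payload):
        self.calls += 1
        return {"outputs": {"actions": [13]}}


def run_episode(env, predictor):
    """Roll out one episode, querying the predictor once per step."""
    obs = env.reset()
    prev_action, total_reward, history, done = 0, 0.0, [], False
    while not done:
        payload = {"inputs": {"observations": [list(obs)],
                              "prev_action": prev_action,
                              "is_training": False,
                              "prev_reward": total_reward,
                              "seq_lens": -1}}
        action = predictor.predict(payload)["outputs"]["actions"][0]
        obs, reward, done, info = env.step(action)
        total_reward += reward
        prev_action = action
        history.append(dict(info))
    return total_reward, history


stub_predictor = StubPredictor()
total, history = run_episode(StubEnv(max_steps=3), stub_predictor)
print(total, len(history), stub_predictor.calls)
```

Counting the predictor calls makes it easy to confirm that each environment step costs exactly one endpoint request.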

Now we can check the performance at the very end of the race and plot the battery used during the race. 

In [None]:
exp[-1]

In [None]:
import matplotlib.pyplot as plt
import pandas as pd  # needed for pd.read_csv below
df = pd.read_csv('historical_sample.csv')
#p_batt = [exp[i]['p_batt'] for i in range(len(exp))]
#p_sun = [exp[i]['p_sun'] for i in range(len(exp))]
batterypower = [exp[i]['battery_joules_left']/3600 for i in range(len(exp))]
#plt.plot(p_batt, label = "Power in battery")
#plt.plot(p_sun, label = "power from the sun")
plt.figure(figsize = (20,6))
#t = df['batterypower']*3600
#t.plot(figsize = (20,6), label = "battery voltage original")
plt.plot(batterypower, label = "Battery power agent")
#plt.plot(t, label = "battery voltage original")
plt.legend()
plt.title('Battery Power (trained by CPU)')

plt.show()
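
Dividing `battery_joules_left` by 3600 converts joules to watt-hours, so the curve above shows remaining energy rather than power, despite the plot label. A minimal sketch of the conversion follows; `joules_to_watt_hours` is a hypothetical helper and the sample values are placeholders, not simulation results.

```python
JOULES_PER_WATT_HOUR = 3600.0  # 1 Wh = 1 W sustained for 3600 s


def joules_to_watt_hours(joules: float) -> float:
    """Convert an energy value from joules to watt-hours."""
    return joules / JOULES_PER_WATT_HOUR


# Placeholder values, not results from the simulation.
sample_log = [{"battery_joules_left": 18_000_000.0},
              {"battery_joules_left": 9_000_000.0}]
battery_wh = [joules_to_watt_hours(step["battery_joules_left"]) for step in sample_log]
print(battery_wh)  # [5000.0, 2500.0]
```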

Plot the agent's speed and the original speed.

In [None]:
agent_speed = [exp[i]['vehicle speed'] for i in range(len(exp))]
plt.figure(figsize = (20,5))
original_speed = df['vehiclevelocity']*3.6
original_speed.plot(figsize = (20,6), label = "Vehicle speed original")
plt.plot(agent_speed, label = 'Agent speed')
plt.legend()
plt.title('Agent speed vs original speed')
plt.show()
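
The factor of 3.6 applied to `vehiclevelocity` suggests the historical data is in metres per second and is converted to kilometres per hour for the comparison; that reading is an assumption, since the CSV's units are not stated. The sketch below shows the conversion together with one simple way to summarise the gap between the two curves. `ms_to_kmh` and `mean_abs_difference` are hypothetical helpers and the numbers are placeholders.

```python
def ms_to_kmh(speed_ms: float) -> float:
    """Convert metres per second to kilometres per hour."""
    return speed_ms * 3.6


def mean_abs_difference(series_a, series_b) -> float:
    """Average absolute gap over the steps both series have in common."""
    pairs = list(zip(series_a, series_b))
    if not pairs:
        raise ValueError("both series must contain at least one value")
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


# Placeholder values, not data from historical_sample.csv.
original_kmh = [ms_to_kmh(v) for v in (10.0, 20.0, 25.0)]
agent_kmh = [40.0, 70.0, 90.0]
speed_gap = mean_abs_difference(agent_kmh, original_kmh)
print(speed_gap)
```

Because `zip` stops at the shorter series, the comparison covers only the steps present in both the agent run and the historical record.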

### Clean up endpoint

In [80]:
predictor.delete_endpoint()
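
If an evaluation cell fails before the clean-up cell runs, the endpoint keeps running and counts against the instance limit. One way to guard against that is a context manager that always calls `delete_endpoint()`. This is a sketch: `endpoint_cleanup` is a hypothetical helper, and `FakePredictor` stands in for the real predictor so the example runs without AWS access.

```python
from contextlib import contextmanager


@contextmanager
def endpoint_cleanup(predictor):
    """Yield the predictor and delete its endpoint when the block exits."""
    try:
        yield predictor
    finally:
        predictor.delete_endpoint()


class FakePredictor:
    """Hypothetical stand-in so the sketch runs without AWS access."""

    def __init__(self):
        self.deleted = False

    def delete_endpoint(self):
        self.deleted = True


fake = FakePredictor()
try:
    with endpoint_cleanup(fake):
        raise RuntimeError("simulated failure during evaluation")
except RuntimeError:
    pass
print(fake.deleted)  # True
```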