### Disclaimer

Distribution authorized to U.S. Government agencies and their contractors. Other requests for this document shall be referred to the MIT Lincoln Laboratory Technology Office.

This material is based upon work supported by the Under Secretary of Defense for Research and Engineering under Air Force Contract No. FA8702-15-D-0001. Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Under Secretary of Defense for Research and Engineering.

© 2019 Massachusetts Institute of Technology.

The software/firmware is provided to you on an As-Is basis.

Delivered to the U.S. Government with Unlimited Rights, as defined in DFARS Part 252.227-7013 or 7014 (Feb 2014). Notwithstanding any copyright notice, U.S. Government rights in this work are defined by DFARS 252.227-7013 or DFARS 252.227-7014 as detailed above. Use of this work other than as specifically authorized by the U.S. Government may violate any copyrights that exist in this work.


### Treasure Hunt Challenge

This notebook demonstrates using [Stable Baselines](https://stable-baselines.readthedocs.io/en/master/) Proximal Policy Optimization (PPO) to train a CNN-LSTM agent on the TESSE Autonomous Treasure Hunt Challenge. In this challenge, an agent must find as many 'treasures', placed around a TESSE environment, as possible in the allotted time (currently 100 timesteps).

`tesse_gym` allows the interface to be customized, and some of those customizations are demonstrated here. Specifically, this notebook shows how to change the agent's observation from the default RGB image to combined RGB, segmentation, and depth. Other settings, such as the reward function and action space, may also be customized.

In [None]:
from stable_baselines.common.policies import CnnLstmPolicy
from stable_baselines.common.vec_env import SubprocVecEnv, VecVideoRecorder, DummyVecEnv
from stable_baselines import PPO2
import time
from pathlib import Path

import numpy as np  # needed by the custom observation class defined later

from tesse.msgs import *
from tesse_gym.treasure_hunt import TreasureHunt

#### Path to TESSE build

If you're on the LLAN, you can grab the build at: `\\group104\share\users\GriffithDan\public\tess\builds\tesse_multiscene_v0.5.0_linux.zip`

In [None]:
filename = Path.home() / 'tess/builds/v0.5.0/tesse_multiscene_v0.5.0_linux.x86_64'
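
Before launching anything, it can help to confirm that the build exists at this path and is executable, since a wrong path otherwise only shows up when the environment starts. The following is a minimal sketch using only the standard library; `check_build` is a hypothetical helper introduced here for illustration and is not part of `tesse_gym`.

```python
import os
from pathlib import Path


def check_build(path):
    """Return 'ok', 'missing', or 'not executable' for a candidate build path.

    Hypothetical helper for illustration; not part of tesse_gym.
    """
    path = Path(path)
    if not path.is_file():
        return "missing"
    if not os.access(path, os.X_OK):
        return "not executable"
    return "ok"


# Same path as the `filename` variable in the cell above.
print(check_build(Path.home() / "tess/builds/v0.5.0/tesse_multiscene_v0.5.0_linux.x86_64"))
```

If the result is `'not executable'`, setting the execute bit on the unzipped binary (for example with `chmod +x`) is the usual remedy on Linux.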

#### Configure Cameras
Below is an example of adjusting camera resolution, field of view, clipping planes, and position.

In [None]:
def set_cameras(tesse_interface):
    """ Apply the same parameters and position to the RGB, segmentation, and depth cameras. """
    cameras = (Camera.RGB_LEFT, Camera.SEGMENTATION, Camera.DEPTH)

    # resolution, field of view, and clipping planes
    for camera in cameras:
        tesse_interface.env.request(SetCameraParametersRequest(camera=camera,
                                                               height_in_pixels=240,
                                                               width_in_pixels=320,
                                                               field_of_view=45,
                                                               near_clip_plane=0.05,
                                                               far_clip_plane=50))

    # camera position
    for camera in cameras:
        tesse_interface.env.request(SetCameraPositionRequest(camera=camera,
                                                             x=0,
                                                             y=0,
                                                             z=-0.1))

#### Adjust agent observation
Override the observation methods to give the agent RGB, segmentation, and depth information. This requires:

1. Defining the observation space
2. Overriding `form_agent_observation()`
3. Overriding `observe()`

In [None]:
from tesse_gym.treasure_hunt import HuntMode
from gym import spaces
class RGBSegDepthInput(TreasureHunt):   
    @property
    def observation_space(self):
        """ This must be defined for custom observations. """
        # 7 channels: 3 RGB + 3 segmentation + 1 depth
        return spaces.Box(0, 255, dtype=np.float32, shape=(240, 320, 7))
    
    def form_agent_observation(self, tesse_data):
        """ Create the agent's observation from a TESSE data response. """
        eo, seg, depth = tesse_data.images
        observation = np.concatenate((eo / 255.0, 
                                      seg / 255.0, 
                                      depth[..., np.newaxis]), axis=-1)
        return observation
    
    def observe(self):
        """ Request RGB, segmentation, and depth images, plus metadata, from TESSE. """
        cameras = [
            (Camera.RGB_LEFT, Compression.OFF, Channels.THREE),
            (Camera.SEGMENTATION, Compression.OFF, Channels.THREE),
            (Camera.DEPTH, Compression.OFF, Channels.THREE)
        ]
        agent_data = self.env.request(DataRequest(metadata=True, cameras=cameras))
        return agent_data
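
The channel layout produced by `form_agent_observation()` can be checked with NumPy alone, without a running TESSE instance. This is a minimal sketch with random stand-in images. It assumes the depth image arrives as a 2-D array (which the `depth[..., np.newaxis]` indexing implies) with values in [0, 1]; both the shapes and the value range are assumptions for illustration.

```python
import numpy as np

height, width = 240, 320

# Random stand-ins for the three TESSE images (assumed shapes and ranges).
eo = np.random.randint(0, 256, size=(height, width, 3)).astype(np.float32)
seg = np.random.randint(0, 256, size=(height, width, 3)).astype(np.float32)
depth = np.random.rand(height, width).astype(np.float32)  # assumed 2-D, in [0, 1]

# Same concatenation as form_agent_observation(): scale the 8-bit images
# to [0, 1] and append depth as a seventh channel.
observation = np.concatenate((eo / 255.0,
                              seg / 255.0,
                              depth[..., np.newaxis]), axis=-1)

print(observation.shape)  # (240, 320, 7)
```

The resulting shape matches the `shape=(240, 320, 7)` declared in `observation_space`; if the two ever disagree, the policy network will be built for the wrong input size.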

#### Define logging directory and callback function to save checkpoints
The callback below saves an intermediate checkpoint to `log_dir` every 50 updates.

In [None]:
log_dir = Path('results/testing/')
log_dir.mkdir(parents=True, exist_ok=True)

def save_checkpoint_callback(local_vars, global_vars):
    """ Save the model to `log_dir` every 50 updates. """
    total_updates = local_vars['update']
    if total_updates % 50 == 0:
        local_vars["self"].save(str(log_dir / f'{total_updates:09d}.pkl'))
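
To see which checkpoint files the callback would produce, it can be exercised without any training. The sketch below repeats the callback and calls it with a hypothetical `FakeModel` that records save paths instead of writing files. It assumes the `update` counter runs 1, 2, 3, ...; that is an assumption about the training loop, not something this notebook states.

```python
from pathlib import Path


class FakeModel:
    """Hypothetical stand-in for a PPO2 model; records file names instead of saving."""

    def __init__(self):
        self.saved = []

    def save(self, path):
        self.saved.append(Path(path).name)


log_dir = Path('results/testing/')


def save_checkpoint_callback(local_vars, global_vars):
    total_updates = local_vars['update']
    if total_updates % 50 == 0:
        local_vars["self"].save(str(log_dir / f'{total_updates:09d}.pkl'))


fake_model = FakeModel()
for update in range(1, 151):  # assumed: the update counter starts at 1
    save_checkpoint_callback({"update": update, "self": fake_model}, {})

print(fake_model.saved)
# ['000000050.pkl', '000000100.pkl', '000000150.pkl']
```

The zero-padded names sort lexicographically in training order, which makes the latest checkpoint easy to find in a directory listing.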

#### Setting environment parameters


__Note__: For ease of debugging, this uses `DummyVecEnv` with 1 environment. For actual experiments, change this to `SubprocVecEnv` with multiple environments.

In [None]:
total_timesteps = 6000000
scene_id = 5
success_dist = 2
n_targets = 50
max_steps = 100
restart_on_collision = False
n_environments = 1
    
def make_unity_env(filename, num_env):
    """ Create a wrapped Unity environment. """
    def make_env(rank):
        def _thunk():
            env = RGBSegDepthInput(filename, 
                                'localhost',
                                'localhost', 
                                worker_id=rank, 
                                step_rate=30,
                                scene_id=scene_id)
            return env
        return _thunk
    
    return DummyVecEnv([make_env(i) for i in range(num_env)])
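
The nested `make_env(rank)` / `_thunk` structure exists because the vectorized environment wrappers take a list of zero-argument callables and construct each environment later. Wrapping the thunk in a factory function makes each callable remember its own `rank`. The sketch below illustrates this in plain Python, with a string standing in for a real environment; `make_env_factory` is a hypothetical name used only for this example.

```python
def make_env_factory(rank):
    """Return a zero-argument function that remembers its own rank."""
    def _thunk():
        return f"env-{rank}"  # stand-in for constructing a real environment
    return _thunk


# Factory pattern: each thunk captures the rank it was created with.
factories = [make_env_factory(i) for i in range(3)]
print([f() for f in factories])
# ['env-0', 'env-1', 'env-2']

# Pitfall: lambdas defined directly in the loop all read the final value of i.
late_bound = [lambda: f"env-{i}" for i in range(3)]
print([f() for f in late_bound])
# ['env-2', 'env-2', 'env-2']
```

In the notebook, `rank` is passed as `worker_id`, so the late-binding version would give every environment the same worker ID.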

#### Next, we launch environments.

In [None]:
env = make_unity_env(filename, n_environments)

#### Specify the agent model for learning.

In [None]:
model = PPO2(CnnLstmPolicy, env, verbose=1, tensorboard_log="./tensorboard/", nminibatches=1)

#### Train the model

In [None]:
model.learn(total_timesteps=total_timesteps, callback=save_checkpoint_callback)
model.save("the.policy")  # save the final policy
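
For a sense of scale, the number of policy updates and checkpoints implied by these settings can be worked out by hand. This sketch assumes PPO2's default rollout length of `n_steps=128`, which the notebook does not override, and that the update counter starts at 1; treat the results as estimates under those assumptions.

```python
total_timesteps = 6000000
n_environments = 1
n_steps = 128          # assumed: PPO2 default, not set explicitly in this notebook
checkpoint_every = 50  # from save_checkpoint_callback

# Each update consumes one rollout of n_steps from every environment.
steps_per_update = n_steps * n_environments
n_updates = total_timesteps // steps_per_update
n_checkpoints = n_updates // checkpoint_every

print(n_updates)      # 46875
print(n_checkpoints)  # 937
```

Increasing `n_environments` reduces the number of updates, and therefore checkpoints, for the same `total_timesteps`.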

### Make a video

This section loads the saved model and runs it in the environment to record a video.

In [None]:
model = PPO2.load('the.policy')

In [None]:
video_length = 500

video_env = VecVideoRecorder(env,
                             video_folder='videos',
                             record_video_trigger=lambda x: x == 0,
                             video_length=video_length,
                             name_prefix='test-1'
                            )

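obs = video_env.reset()
state = None  # LSTM state; None starts from the policy's initial state
done = [False] * video_env.num_envs
for _ in range(video_length + 1):
    # pass the recurrent state and episode-done mask so the LSTM keeps its memory between steps
    action, state = model.predict(obs, state=state, mask=done)
    obs, _, done, _ = video_env.step(action)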