In [1]:
#from  train_highway_env import *
from bokeh.io import output_notebook
from bokeh.plotting import show
import rlrom.plots
from rlrom.testers import RLTester
import rlrom.utils as utils
from pprint import pprint
output_notebook()  

In [4]:
cfg = utils.load_cfg('cfg_hw_min.yml')
pprint(cfg)


Imported module highway
{'cfg_env': {'manual_control': True},
 'cfg_test': {'init_seeds': 1,
              'num_ep': 5,
              'num_steps': 100,
              'render_mode': 'human'},
 'env_name': 'highway-v0',
 'import_module': 'highway',
 'make_env_test': 'make_env_test',
 'make_env_train': 'make_env_train'}


In [5]:
T = RLTester(cfg)
Tres = T.run_cfg_test()
T.print_res_all_ep(Tres)

Imported module highway
INFO: manual_control set to True, stay alert.
.....
mean_ep_len: 26.4 | mean_ep_rew: 24.46


## Testing and Monitoring agents

RLROM is driven by configuration files in YAML format. For example, below is the `cfg_main0.yml` configuration file.

```yaml
env_name: highway-v0
cfg_env: cfg_env.yml
cfg_specs: cfg_specs0.yml
cfg_train: 
  model_name: basic
  model_path: ./models
  algo:
    ppo:    
      batch_size: 128
      n_envs: 12
      neurons: 128
      learning_rate: 5e-4    
      total_timesteps: 200_000
      tensorboard:
        tb_path: ./tb_logs    
cfg_test: 
  init_seeds: 0
  num_ep: 5
  num_steps: 100 
  render: true
```
Note that sections can be included from other files, such as `cfg_env.yml` and `cfg_specs0.yml`.
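
To make the include mechanism concrete, here is a minimal sketch of how such references could be expanded into one dictionary. It is not RLROM's actual loader: the helper `resolve_includes`, the `load_file` callback and the in-memory file contents are hypothetical, and the rule used (a `cfg_*` key whose value is a `.yml` file name) is an assumption based on the example above.

```python
def resolve_includes(cfg, load_file):
    """Replace every 'cfg_*' entry given as a YAML file name by that file's content.

    `load_file` is any callable mapping a file name to a dict (hypothetical helper).
    """
    resolved = {}
    for key, value in cfg.items():
        is_include = (
            key.startswith("cfg_")
            and isinstance(value, str)
            and value.endswith((".yml", ".yaml"))
        )
        if is_include:
            # Included files may themselves include other files.
            value = resolve_includes(load_file(value), load_file)
        resolved[key] = value
    return resolved


# In-memory stand-ins for files on disk (contents are illustrative only).
fake_files = {
    "cfg_env.yml": {"manual_control": False},
    "cfg_specs0.yml": {"reward_formulas": {"danger": {"weight": -20}}},
}
main_cfg = {
    "env_name": "highway-v0",
    "cfg_env": "cfg_env.yml",
    "cfg_specs": "cfg_specs0.yml",
    "cfg_test": {"init_seeds": 0, "num_ep": 5, "num_steps": 100, "render": True},
}

full_cfg = resolve_includes(main_cfg, fake_files.__getitem__)
print(full_cfg["cfg_env"])
```

In the notebook itself, `utils.load_cfg` is the function that returns the configuration dictionary; the sketch only illustrates the idea of inlining sections.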


The main class of RLROM is named `RLTester`. It is instantiated with a configuration file or dictionary. 

In [None]:
#cfg ='cfg_main0.yml'
#cfg ='cfg_main0_with_obs.yml'
cfg ='cfg_main_right_lane_with_obs.yml'
T = RLTester(cfg)

The method `run_cfg_test()` runs the tests specified in `cfg_test`, in this case 5 episodes with seeds 0 to 4.

In [None]:
Tres = T.run_cfg_test()
T.print_res_all_ep(Tres)

`Tres` contains a number of evaluations for the different formulas defined in the configuration files. `print_res_all_ep` prints results for the formulas in [cfg_eval.yml](cfg_eval.yml). The `Tres['res']` dictionary also contains results for the formulas listed under `reward_formulas`, for example:

In [None]:
Tres['res']['danger']

Here, `num_sat` gives, for each episode, the number of steps in which the `danger` formula is true. We can plot the signals of a single episode, selected by `_ep(...)` in the layout, to see what happens:

In [None]:
lay = """
_ep(1)
car1_danger, car2_danger
sat(danger)
reward
"""
fig, _ = T.get_fig(lay)
show(fig)

# Testing Models

Models were trained from the `cfg_main.yml` configuration with the script `train_highway_env.py`, using the command:
```
$ python train_highway_env.py cfg_main.yml
```
Below, we test several pre-trained models. All formulas are defined in [hw-env_specs.stl](hw-env_specs.stl). The results show that the trained models tend, overall, to behave in accordance with the formulas used for training, although they can certainly be improved further.

In [None]:
num_tests= 100
render   = False

## Baseline: collision + danger + speed

Basic configuration: the collision penalty, plus the `danger` formula with weight -20 and a velocity reward (`ego_fast`):
```yaml
# linear combination : new_reward = reward + w1 rho1 + w2 rho2 etc
reward_formulas:
  ego_fast:
    past_horizon: 0
    weight: .1
  danger:
    past_horizon: 0
    weight: -20
    lower_bound: 0.0 # we do not want reward when no danger, so we cap negative values for danger to 0. 
```
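
The linear combination described in the comment above can be sketched as follows. This is an illustration under stated assumptions, not RLROM's implementation: the function `shaped_reward` and the robustness values are hypothetical, and `lower_bound` is assumed to clip each robustness value from below before weighting.

```python
def shaped_reward(reward, robustness, reward_formulas):
    """Return reward + sum_i w_i * rho_i, clipping rho_i from below when lower_bound is set."""
    total = reward
    for name, spec in reward_formulas.items():
        rho = robustness[name]
        if "lower_bound" in spec:
            # Assumption: values below lower_bound are replaced by lower_bound.
            rho = max(rho, spec["lower_bound"])
        total += spec["weight"] * rho
    return total


reward_formulas = {
    "ego_fast": {"past_horizon": 0, "weight": 0.1},
    "danger": {"past_horizon": 0, "weight": -20, "lower_bound": 0.0},
}

# No danger: the negative robustness is capped to 0, so only ego_fast contributes.
safe = shaped_reward(1.0, {"ego_fast": 2.0, "danger": -0.5}, reward_formulas)

# Danger with robustness 0.5: the penalty is 20 * 0.5 = 10.
risky = shaped_reward(1.0, {"ego_fast": 2.0, "danger": 0.5}, reward_formulas)

print(safe, risky)
```

Without the cap, a negative `danger` robustness multiplied by the negative weight would turn into a positive bonus, which is what the `lower_bound` setting avoids.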

In [None]:
cfg = utils.load_cfg('cfg_main0.yml')
cfg['cfg_test']['num_ep'] = num_tests
cfg['cfg_test']['render'] = render
cfg['cfg_env']['manual_control'] = False
cfg0=cfg
T0 = RLTester(cfg0)
T0_res = T0.run_cfg_test()
T0.print_res_all_ep(T0_res)

## Baseline with obs

Same configuration, but with the `danger` predicate added to the observation:

```yaml
obs_formulas:
  danger:
    obs_name: 'obs_danger'
    past_horizon: 0

reward_formulas:
  ego_fast:
    past_horizon: 0
    weight: .1
  danger:
    past_horizon: 0
    weight: -20
    lower_bound: 0.0
```

In [None]:
cfg = utils.load_cfg('cfg_main0_with_obs.yml')
cfg['cfg_test']['num_ep'] = num_tests
cfg['cfg_test']['render'] = render
cfg['cfg_env']['manual_control'] = False
cfg0_obs=cfg
T0_obs = RLTester(cfg0_obs)
T0_obs_res = T0_obs.run_cfg_test()
T0_obs.print_res_all_ep(T0_obs_res)

Overall performance improves by about 10%, as seen in `mean_ep_len` and in `sum` for `ego_moves`.

## Right lane with obs

Here we encourage the agent to drive in the right lane.
```yaml
obs_formulas: 
  ego_right_lane:
    obs_name: 'obs_ego_right_lane'
    past_horizon: 0
  danger:
    obs_name: 'obs_danger'
    past_horizon: 0

reward_formulas:
  ego_right_lane:
    past_horizon: 0
    weight: 1      
  ego_fast:
    past_horizon: 0
    weight: .1
  danger:
    past_horizon: 0
    weight: -20
    lower_bound: 0.0
```

In [None]:
cfg = utils.load_cfg('cfg_main_right_lane_with_obs.yml')
cfg['cfg_test']['num_ep'] = num_tests
cfg['cfg_test']['render'] = render
cfg['cfg_env']['manual_control'] = False
cfg_right_lane_obs=cfg
Tright_lane_obs = RLTester(cfg_right_lane_obs)
Tright_lane_obs_res = Tright_lane_obs.run_cfg_test()
Tright_lane_obs.print_res_all_ep(Tright_lane_obs_res)

About 50% of the episodes satisfy the formula `phi_right_lane`. 

## No passing on right 

Here we tried to prevent the agent from passing other vehicles on the lane immediately to their right.

```yaml
obs_formulas: 
  car_left:
    obs_name: 'obs_car_left'
    past_horizon: 0
  danger:
    obs_name: 'obs_danger'
    past_horizon: 0

reward_formulas:
  ego_fast:
    past_horizon: 0
    weight: .1
  danger:
    past_horizon: 0
    weight: -25
    lower_bound: 0.0
  car_left:
    past_horizon: 0
    weight: -25
    lower_bound: 0
```

In [None]:
cfg = utils.load_cfg('cfg_main_car_left.yml')
cfg['cfg_test']['num_ep'] = num_tests
cfg['cfg_test']['render'] = render
cfg['cfg_env']['manual_control'] = False
cfg_car_left=cfg
Tcar_left = RLTester(cfg_car_left)
Tcar_left_res = Tcar_left.run_cfg_test()
Tcar_left.print_res_all_ep(Tcar_left_res)

Results show a decrease in the satisfaction of `car_left` and an increase in that of `phi_car_left`, which states that ego should not stay on the right of a car for more than 10 steps (`phi_car_left := alw_[0,50] ev_[0,10] (not car_left)`).
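
To make the meaning of `phi_car_left` concrete, here is a minimal sketch of its Boolean semantics on a discrete-time trace. It is not the monitor used by RLROM: the helpers `window`, `ev`, `alw`, `phi_car_left` and `trace` are hypothetical, and time windows are assumed to be truncated at the end of the trace.

```python
def window(signal, t, a, b):
    """Samples of `signal` at steps t+a .. t+b, truncated at the end of the trace."""
    return signal[t + a : t + b + 1]


def ev(signal, a, b):
    """ev_[a,b]: true at step t if `signal` holds at some step of the window."""
    return [any(window(signal, t, a, b)) for t in range(len(signal))]


def alw(signal, a, b):
    """alw_[a,b]: true at step t if `signal` holds at every step of the window."""
    return [all(window(signal, t, a, b)) for t in range(len(signal))]


def phi_car_left(car_left):
    """Evaluate alw_[0,50] ev_[0,10] (not car_left) at step 0."""
    not_car_left = [not v for v in car_left]
    return alw(ev(not_car_left, 0, 10), 0, 50)[0]


def trace(n, start, length):
    """Boolean trace of n steps, true for `length` consecutive steps from `start`."""
    return [start <= t < start + length for t in range(n)]


ok_trace = trace(60, 5, 10)   # a car on the left for 10 consecutive steps
bad_trace = trace(60, 5, 11)  # a car on the left for 11 consecutive steps

print(phi_car_left(ok_trace), phi_car_left(bad_trace))
```

With these semantics the formula is violated as soon as `car_left` holds for 11 consecutive steps within the first 50 steps, which matches the "more than 10 steps" reading.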