In [1]:
from train_highway_env import *
from bokeh.io import output_notebook
from bokeh.plotting import show
import rlrom.plots
from rlrom.testers import RLTester
from pprint import pprint
output_notebook()  



## Testing and Monitoring agents
RLROM is driven by configuration files in YAML format. For example, below is the `cfg_main0.yml` configuration file.

```yaml
env_name: highway-v0
cfg_env: cfg_env.yml
cfg_specs: cfg_specs0.yml
cfg_train: 
  model_name: basic
  model_path: ./models
  algo:
    ppo:    
      batch_size: 128
      n_envs: 12
      neurons: 128
      learning_rate: 5e-4    
      total_timesteps: 200_000
      tensorboard:
        tb_path: ./tb_logs    
cfg_test: 
  init_seeds: 0
  num_ep: 5
  num_steps: 100 
  render: true
```
Note that sections can be included from other files, such as `cfg_env.yml` and `cfg_specs0.yml`.
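The inclusion mechanism can be pictured with a small sketch. This is not RLROM's loader: `resolve_includes`, `load_file` and the contents of `fake_files` are hypothetical names and placeholder values chosen for illustration. The sketch assumes that a field whose value is a string ending in `.yml` is replaced by the content of that file.

```python
def resolve_includes(cfg, load_file):
    """Return a copy of cfg where string values naming a YAML file
    are replaced by the (recursively resolved) content of that file."""
    resolved = {}
    for key, value in cfg.items():
        if isinstance(value, str) and value.endswith((".yml", ".yaml")):
            print(f"loading field [ {key} ] from YAML file [ {value} ]")
            value = resolve_includes(load_file(value), load_file)
        resolved[key] = value
    return resolved


# Placeholder file contents (made up), standing in for files on disk
# so that the sketch runs without a YAML parser.
fake_files = {
    "cfg_env.yml": {"manual_control": False},
    "cfg_specs0.yml": {"eval_formulas": "cfg_eval.yml"},
    "cfg_eval.yml": {"phi_speed": {}},
}

main_cfg = {
    "env_name": "highway-v0",
    "cfg_env": "cfg_env.yml",
    "cfg_specs": "cfg_specs0.yml",
}
cfg_resolved = resolve_includes(main_cfg, fake_files.__getitem__)
```

With a real YAML parser, `load_file` would open the file and parse it; the rest of the logic would stay the same.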


The main class of RLROM is the `RLTester`, which is instantiated with a configuration file or dictionary. 

In [8]:
cfg = 'cfg_main0.yml'
T = RLTester(cfg)

loading field [ cfg_env ] from YAML file [ cfg_env.yml ]
loading field [ cfg_specs ] from YAML file [ cfg_specs0.yml ]
loading field [ eval_formulas ] from YAML file [ cfg_eval.yml ]


The method `run_cfg_test()` runs the tests specified in `cfg_test`, in this case 5 episodes with seeds 0 to 4.
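As a rough illustration of how `cfg_test` maps to seeds, the sketch below assumes that `init_seeds` is the first seed and that each episode uses the next consecutive one. `episode_seeds` is a hypothetical helper, not part of RLROM.

```python
def episode_seeds(cfg_test):
    """List one seed per episode, starting at init_seeds (assumed behavior)."""
    first = cfg_test["init_seeds"]
    return list(range(first, first + cfg_test["num_ep"]))


# Same values as the cfg_test section of cfg_main0.yml.
cfg_test = {"init_seeds": 0, "num_ep": 5, "num_steps": 100, "render": True}
seeds = episode_seeds(cfg_test)
print(seeds)  # [0, 1, 2, 3, 4]
```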

In [9]:
Tres = T.run_cfg_test()
T.print_res_all_ep(Tres)

INFO: Loading model  /home/alex/workspace_local/rlrom/examples/highway_env/models/basic.zip
loading PPO model succeeded


Exception: code() argument 13 must be str, not int
Exception: code() argument 13 must be str, not int


.....
mean_ep_len: 32.6 | mean_ep_rew: 3.504
phi_speed: ratio_sat= 0.2
ego_moves: sum= 958.7 | mean= 29.25 | num_sat= 32.6
ego_fast: sum= 79.04 | mean= 2.336 | num_sat= 31.4
ego_right_lane: sum= -12.78 | mean= -0.2832 | num_sat= 5.2
phi_right_lane: ratio_sat= 0.2
phi_car_left: ratio_sat= 0.6
car_left: sum= -6.507 | mean= -0.1572 | num_sat= 6.4


`Tres` contains a number of evaluations for the different formulas defined in the configuration files. `print_res_all_ep` prints results for the formulas in [cfg_eval.yml](cfg_eval.yml). The `Tres['res']` dictionary also contains results for the formulas listed under `reward_formulas`. For example:

In [18]:
Tres['res']['danger']

{'mean': array([-0.24917592, -0.09975279, -0.19560276, -0.12232095, -0.02748708]),
 'sum': array([-2.49175923, -2.59357242, -3.91205512, -3.9142704 , -2.06153064]),
 'num_sat': array([3, 2, 5, 5, 4])}

Here, `num_sat` is the number of steps in each episode at which the `danger` formula is true. We can plot the signals of episode 3 to see what happens:

In [20]:
lay = """
_ep(3)
car1_danger, car2_danger
sat(danger)
reward
"""
fig, _ = T.get_fig(lay)
show(fig)

_ep(3)
car1_danger
car2_danger
sat(danger)
reward
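The `danger` arrays printed earlier can also be cross-checked numerically. The sketch below assumes that `mean` is `sum` divided by the number of steps in the episode, so that `sum / mean` recovers each episode length; the recovered lengths average to 32.6, which agrees with the `mean_ep_len` reported above. Values are copied from the output as plain lists rather than NumPy arrays.

```python
# Values copied from the Tres['res']['danger'] output shown earlier.
danger = {
    "mean": [-0.24917592, -0.09975279, -0.19560276, -0.12232095, -0.02748708],
    "sum": [-2.49175923, -2.59357242, -3.91205512, -3.9142704, -2.06153064],
    "num_sat": [3, 2, 5, 5, 4],
}

# Episode length, assuming mean = sum / number of steps.
ep_len = [round(s / m) for s, m in zip(danger["sum"], danger["mean"])]

# Fraction of steps in each episode at which `danger` is true.
danger_ratio = [n / length for n, length in zip(danger["num_sat"], ep_len)]

print(ep_len)                     # [10, 26, 20, 32, 75]
print(sum(ep_len) / len(ep_len))  # 32.6
print(danger_ratio[3])            # 0.15625
```

Under this assumption, `danger` holds for 5 of the 32 steps of episode 3.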


# Testing Models

To train a model with the `cfg_main.yml` configuration, we ran the script `train_highway_env.py` with the command:
```
$ python train_highway_env.py cfg_main.yml
```
Below, we test several pre-trained models. All formulas are defined in [hw-env_specs.stl](hw-env_specs.stl). The results show that the trained models tend, overall, to behave in accordance with the formulas used for training, although they can certainly be improved further.

In [13]:
num_tests = 100
render = False

## Baseline: collision + danger + speed

Basic configuration: a collision penalty, the `danger` formula with weight -20, and a velocity reward (`ego_fast` with weight 0.1):
```yaml
# linear combination : new_reward = reward + w1 rho1 + w2 rho2 etc
reward_formulas:
  ego_fast:
    past_horizon: 0
    weight: .1
  danger:
    past_horizon: 0
    weight: -20
    lower_bound: 0.0 # we do not want reward when no danger, so we cap negative values for danger to 0. 
```
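The linear combination described in the YAML comment can be sketched as follows. This is an illustration rather than RLROM's implementation: `shaped_reward` is a hypothetical function, and the sketch assumes that `lower_bound` clips the robustness value from below before the weight is applied.

```python
def shaped_reward(reward, rho, reward_formulas):
    """Compute reward + sum_i w_i * rho_i, clipping rho_i at lower_bound if given."""
    total = reward
    for name, spec in reward_formulas.items():
        value = rho[name]
        if "lower_bound" in spec:
            # Assumed semantics: values below lower_bound are raised to it.
            value = max(value, spec["lower_bound"])
        total += spec["weight"] * value
    return total


reward_formulas = {
    "ego_fast": {"past_horizon": 0, "weight": 0.1},
    "danger": {"past_horizon": 0, "weight": -20, "lower_bound": 0.0},
}

# No danger (negative robustness): the danger term is clipped to 0.
safe = shaped_reward(1.0, {"ego_fast": 2.0, "danger": -0.5}, reward_formulas)
# Danger (positive robustness): the penalty is -20 * 0.3.
risky = shaped_reward(1.0, {"ego_fast": 2.0, "danger": 0.3}, reward_formulas)
```

In the first call the result is about 1.2, since only the `ego_fast` bonus applies; in the second it is about -4.8, because the `danger` penalty dominates. The robustness values used here are made up for the example.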

In [14]:
cfg = utils.load_cfg('cfg_main0.yml')
cfg['cfg_test']['num_ep'] = num_tests
cfg['cfg_test']['render'] = render
cfg['cfg_env']['manual_control'] = False
cfg0 = cfg
T0 = RLTester(cfg0)
T0_res = T0.run_cfg_test()
T0.print_res_all_ep(T0_res)

loading field [ cfg_env ] from file [ cfg_env.yml ]
loading field [ cfg_specs ] from file [ cfg_specs0.yml ]
loading field [ eval_formulas ] from file [ cfg_eval.yml ]
INFO: Loading model  /home/alex/workspace_wsl_bigguy/rlrom/examples/highway_env/models/basic.zip
loading PPO model succeeded
..........|
..........|
..........|
..........|
..........|
..........|
..........|
..........|
..........|
..........|

mean_ep_len: 37.26 | mean_ep_rew: 4.654
phi_speed: ratio_sat= 0.39
ego_moves: sum= 1086 | mean= 28.94 | num_sat= 37.26
ego_fast: sum= 84.78 | mean= 2.169 | num_sat= 34.27
ego_right_lane: sum= -15.8 | mean= -0.4015 | num_sat= 1.66
phi_right_lane: ratio_sat= 0
phi_car_left: ratio_sat= 0.75
car_left: sum= -6.347 | mean= -0.1633 | num_sat= 7.93


## Baseline with obs

Same configuration, but the `danger` predicate is also added to the observation:

```yaml
obs_formulas:
  danger:
    obs_name: 'obs_danger'
    past_horizon: 0

reward_formulas:
  ego_fast:
    past_horizon: 0
    weight: .1
  danger:
    past_horizon: 0
    weight: -20
    lower_bound: 0.0
```

In [16]:
cfg = utils.load_cfg('cfg_main0_with_obs.yml')
cfg['cfg_test']['num_ep'] = num_tests
cfg['cfg_test']['render'] = render
cfg['cfg_env']['manual_control'] = False
cfg0_obs = cfg
T0_obs = RLTester(cfg0_obs)
T0_obs_res = T0_obs.run_cfg_test()
T0_obs.print_res_all_ep(T0_obs_res)

loading field [ cfg_env ] from file [ cfg_env.yml ]
loading field [ cfg_specs ] from file [ cfg_specs0_with_obs.yml ]
loading field [ eval_formulas ] from file [ cfg_eval.yml ]
INFO: Loading model  /home/alex/workspace_wsl_bigguy/rlrom/examples/highway_env/models/basic_with_obs.zip
loading PPO model succeeded
..........|
..........|
..........|
..........|
..........|
..........|
..........|
..........|
..........|
..........|

mean_ep_len: 41.51 | mean_ep_rew: 5.033
phi_speed: ratio_sat= 0.32
ego_moves: sum= 1198 | mean= 28.74 | num_sat= 41.51
ego_fast: sum= 88.18 | mean= 2.056 | num_sat= 36.98
ego_right_lane: sum= -10.89 | mean= -0.2609 | num_sat= 11.19
phi_right_lane: ratio_sat= 0.03
phi_car_left: ratio_sat= 0.74
car_left: sum= -5.97 | mean= -0.1457 | num_sat= 8.49


Overall performance improves by about 10%, as shown by `mean_ep_len` and by `sum` for `ego_moves`.
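The figure can be checked directly from the two result printouts. The sketch below only uses numbers reported above; `relative_gain` is a helper introduced here for illustration.

```python
def relative_gain(before, after):
    """Relative change from the baseline value to the new value."""
    return (after - before) / before


# Baseline vs. baseline with obs, values taken from the printed results.
ep_len_gain = relative_gain(37.26, 41.51)   # mean_ep_len
ego_moves_gain = relative_gain(1086, 1198)  # sum for ego_moves

print(f"{ep_len_gain:.1%}, {ego_moves_gain:.1%}")  # 11.4%, 10.3%
```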

## Right lane with obs

Here we encourage the agent to drive in the right lane by adding `ego_right_lane` to both the observation and the reward.
```yaml
obs_formulas: 
  ego_right_lane:
    obs_name: 'obs_ego_right_lane'
    past_horizon: 0
  danger:
    obs_name: 'obs_danger'
    past_horizon: 0

reward_formulas:
  ego_right_lane:
    past_horizon: 0
    weight: 1      
  ego_fast:
    past_horizon: 0
    weight: .1
  danger:
    past_horizon: 0
    weight: -20
    lower_bound: 0.0
```

In [17]:
cfg = utils.load_cfg('cfg_main_right_lane_with_obs.yml')
cfg['cfg_test']['num_ep'] = num_tests
cfg['cfg_test']['render'] = render
cfg['cfg_env']['manual_control'] = False
cfg_right_lane_obs = cfg
Tright_lane_obs = RLTester(cfg_right_lane_obs)
Tright_lane_obs_res = Tright_lane_obs.run_cfg_test()
Tright_lane_obs.print_res_all_ep(Tright_lane_obs_res)

loading field [ cfg_env ] from file [ cfg_env.yml ]
loading field [ cfg_specs ] from file [ cfg_specs_right_lane_with_obs.yml ]
loading field [ eval_formulas ] from file [ cfg_eval.yml ]
INFO: Loading model  /home/alex/workspace_wsl_bigguy/rlrom/examples/highway_env/models/right_lane_with_obs.zip
loading PPO model succeeded
..........|
..........|
..........|
..........|
..........|
..........|
..........|
..........|
..........|
..........|

mean_ep_len: 50.18 | mean_ep_rew: 4.763
phi_speed: ratio_sat= 0.36
ego_moves: sum= 1379 | mean= 27.39 | num_sat= 50.18
ego_fast: sum= 68.65 | mean= 1.313 | num_sat= 37.53
ego_right_lane: sum= 2.92 | mean= 0.05197 | num_sat= 33.99
phi_right_lane: ratio_sat= 0.52
phi_car_left: ratio_sat= 0.72
car_left: sum= -3.465 | mean= -0.07415 | num_sat= 15.07


About 50% of the episodes satisfy the formula `phi_right_lane`. 

## No passing on right 

Here we try to prevent the agent from passing other vehicles in the adjacent lane on their right.

```yaml
obs_formulas: 
  car_left:
    obs_name: 'obs_car_left'
    past_horizon: 0
  danger:
    obs_name: 'obs_danger'
    past_horizon: 0

reward_formulas:
  ego_fast:
    past_horizon: 0
    weight: .1
  danger:
    past_horizon: 0
    weight: -25
    lower_bound: 0.0
  car_left:
    past_horizon: 0
    weight: -25
    lower_bound: 0
```

In [20]:
cfg = utils.load_cfg('cfg_main_car_left.yml')
cfg['cfg_test']['num_ep'] = num_tests
cfg['cfg_test']['render'] = render
cfg['cfg_env']['manual_control'] = False
cfg_car_left = cfg
Tcar_left = RLTester(cfg_car_left)
Tcar_left_res = Tcar_left.run_cfg_test()
Tcar_left.print_res_all_ep(Tcar_left_res)

loading field [ cfg_env ] from file [ cfg_env.yml ]
loading field [ cfg_specs ] from file [ cfg_specs_car_left.yml ]
loading field [ eval_formulas ] from file [ cfg_eval.yml ]
INFO: Loading model  /home/alex/workspace_wsl_bigguy/rlrom/examples/highway_env/models/car_left.zip
loading PPO model succeeded
..........|
..........|
..........|
..........|
..........|
..........|
..........|
..........|
..........|
..........|

mean_ep_len: 50.18 | mean_ep_rew: -10.65
phi_speed: ratio_sat= 0.14
ego_moves: sum= 1327 | mean= 26.94 | num_sat= 50.18
ego_fast: sum= 39.7 | mean= 1.066 | num_sat= 30.99
ego_right_lane: sum= -17.48 | mean= -0.3142 | num_sat= 11.75
phi_right_lane: ratio_sat= 0.01
phi_car_left: ratio_sat= 0.86
car_left: sum= -7.521 | mean= -0.1536 | num_sat= 5.96


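The results show fewer steps satisfying `car_left` (`num_sat` is 5.96, versus 8.49 for the baseline with obs) and a higher satisfaction ratio for `phi_car_left` (0.86 versus 0.74). This formula states that ego should not stay on the right of a car for more than 10 steps (`phi_car_left := alw_[0,50] ev_[0,10] (not car_left)`).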
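To make the meaning of `phi_car_left` concrete, here is a minimal Boolean sketch of `alw_[0,50] ev_[0,10] (not car_left)` on a finite trace. It is not the monitor used by RLROM: `always_eventually_not` is a hypothetical function, and truncating the time windows at the end of the episode is an assumption made here for short traces.

```python
def always_eventually_not(signal, alw_horizon=50, ev_horizon=10):
    """Check alw_[0,alw_horizon] ev_[0,ev_horizon] (not signal) on a Boolean trace.

    Windows that run past the end of the trace are truncated (assumption).
    """
    n = len(signal)
    for t in range(min(alw_horizon, n - 1) + 1):
        window = signal[t : min(t + ev_horizon, n - 1) + 1]
        if all(window):
            # car_left holds on the whole window: "eventually not" fails at t.
            return False
    return True


# car_left true for 5 consecutive steps: ego leaves the position in time.
short_pass = [False] * 3 + [True] * 5 + [False] * 20
# car_left true for 12 consecutive steps: ego stays on the right too long.
long_pass = [False] * 3 + [True] * 12 + [False] * 20

print(always_eventually_not(short_pass))  # True
print(always_eventually_not(long_pass))   # False
```

The second trace violates the formula because, starting at step 3, `car_left` stays true for every step of an 11-step window.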