Distributional Reachability Policy Optimization (DRPO)

Code for the paper "Safe Model-Based Reinforcement Learning with an Uncertainty-Aware Reachability Certificate", co-authored by Dongjie Yu*, Wenjun Zou*, Yujie Yang*, Haitong Ma, Shengbo Eben Li, Yuming Yin, Jianyu Chen and Jingliang Duan.

Update

The paper has been accepted by IEEE Transactions on Automation Science and Engineering; the final version is available via the DOI in the citation below. Congrats to all co-authors!

Acknowledgements

The code is based on SMBPO by Garrett Thomas. We thank him for his wonderful and clear implementation.

Branches Overview

  • drpo-other_env-viz: DRPO implementation for quadrotor and cartpole-move; also for ablation_1 on different modules; training curves visualization.
  • drpo-safetygym-viz: DRPO implementation for safetygym-car and safetygym-point; also for ablation_1 on different modules; ablation_2 on different $\Phi^{-1}(\beta)$.
  • csc-other_env, csc-safetygym: Conservative Safety Critics and MBPO-Lagrangian implementation for different envs.
  • smbpo, smbpo-safetygym: SMBPO and MBPO implementation for different envs.
  • drpo-safetygym-ablation_3-constraints: Ablation_3 on different constraint formulations (intermediate policy or shield policy).

All other branches are deprecated.

Prerequisites

  1. Install MuJoCo and mujoco-py.
  2. Clone safe-control-gym and safety-gym and run pip install -e . in both directories to install the two environments. Note that we have modified these environments (e.g., the time-up settings), so they differ from the versions released by the original authors; you need to install our repositories to run the DRPO code.
  3. Run pip install -r requirements.txt.
  4. Set ROOT_DIR in ./src/defaults.py to /your/path/to/this/repository (see the sketch below); this is where experiment logs and checkpoints will be stored.
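
A minimal sketch of the relevant line, assuming ROOT_DIR is a plain module-level string in ./src/defaults.py (the rest of that file is omitted here):

# ./src/defaults.py (excerpt): experiment logs and checkpoints are created under ROOT_DIR.
ROOT_DIR = '/your/path/to/this/repository'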

Run the code

Run

python main.py -c config/ENV.json

or

sh run-exp_name.sh

in the command line.

  • More envs: Currently only ENV=cartpole-move, quadrotor, safetygym-car, and safetygym-point are supported, but you are free to add your own environment as long as it implements check_done, check_violation and get_constrained_values on top of a basic gym env (see the sketch after this list). Remember to put it in ./src/env and register it in ./src/shared.py.

  • Change hyperparameters: You can tune hyperparameters in three ways: (1) change values in the .py files; (2) change values in ./config/ENV.json; or (3) override values on the command line with python main.py -c config/ENV.json -s PARAM VALUE. Use . to address nested keys in the config, e.g. -s alg_cfg.horizon 10. The priority of the three ways increases from (1) to (3), so a value set in (1) is overridden by a value specified in (3) (see the example command after this list).

  • Experiment results will be stored in ./ENV/{time}_{alg_name}_{random_seed}, together with configs, checkpoints, and training and evaluation data.
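
A minimal sketch of a custom environment is shown below; it assumes DRPO calls these methods on batches of states, and MyCustomEnv, position_limit, and the state layout are hypothetical, so check the existing envs in ./src/env for the exact interface the training code expects:

import gym
import numpy as np


class MyCustomEnv(gym.Wrapper):
    """Hypothetical safe-RL environment built on top of a basic gym env."""

    def __init__(self, env, position_limit=1.0):
        super().__init__(env)
        self.position_limit = position_limit  # illustrative safety bound on the first state dimension

    def check_done(self, states):
        # Whether each state in the batch terminates the episode.
        return np.abs(states[..., 0]) > 2.0 * self.position_limit

    def check_violation(self, states):
        # Whether each state in the batch violates the safety constraint.
        return np.abs(states[..., 0]) > self.position_limit

    def get_constrained_values(self, states):
        # Signed constraint values; positive means the constraint is violated.
        return np.abs(states[..., 0:1]) - self.position_limit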
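
For example, the following command overrides the horizon hyperparameter for the quadrotor config from the command line (assuming alg_cfg.horizon is a valid key in config/quadrotor.json):

python main.py -c config/quadrotor.json -s alg_cfg.horizon 10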

Test and visualize the trajectories (only for cartpole-move and quadrotor)

  • Check and run the command lines in ./src/tester.py; the results will be stored in the corresponding log directories.

  • Then run the Python files in ./src/viz_cartpole and ./src/viz_quadrotor to see the learned multipliers, reachability certificates, and test trajectories. Images for cartpole-move are stored in the tester directory inside the logs, while trajectories for quadrotor are stored in ./src/viz_quadrotor.

Plot the training curves

  1. Collect the results of each algorithm in ./logs/ENV/ALGO/{time}_{alg_name}_{random_seed1}, ./logs/ENV/ALGO/{time}_{alg_name}_{random_seed2}, etc.
  2. See ./src/viz_curves.ipynb and add your algorithms to alg_list in help_func().
  3. Call plot_eval_results_of_all_alg_n_runs(ENV) and view the curves (see the sketch after this list).
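
Inside ./src/viz_curves.ipynb, a usage cell might look like the sketch below; it assumes the notebook already defines help_func() and plot_eval_results_of_all_alg_n_runs() as described above, and the env name is only an example:

# Hypothetical usage cell in ./src/viz_curves.ipynb; run it after the cells
# that define help_func() and plot_eval_results_of_all_alg_n_runs().
ENV = 'quadrotor'                         # any supported env whose runs are collected under ./logs/ENV/
help_func()                               # make sure your algorithms are listed in alg_list here
plot_eval_results_of_all_alg_n_runs(ENV)  # draws the training/evaluation curves for all runs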

Contributing

When contributing to this repository, please first discuss the change you wish to make with me via an issue, email, or any other method before making it. Also feel free to fork/star the repository and make your own changes.

If you find our paper/code helpful, please consider citing:

@ARTICLE{yu2023drpo,
  author={Yu, Dongjie and Zou, Wenjun and Yang, Yujie and Ma, Haitong and Li, Shengbo Eben and Yin, Yuming and Chen, Jianyu and Duan, Jingliang},
  journal={IEEE Transactions on Automation Science and Engineering}, 
  title={Safe Model-Based Reinforcement Learning With an Uncertainty-Aware Reachability Certificate}, 
  year={2023},
  volume={},
  number={},
  pages={1-14},
  doi={10.1109/TASE.2023.3292388}
}
