Safety and Liveness Guarantees through Reach-Avoid Reinforcement Learning

This repository implements model-free reach-avoid reinforcement learning (RARL) to guarantee safety and liveness. It also contains example uses and benchmark evaluations of the proposed algorithm on a range of nonlinear systems. RARL is primarily developed by Kai-Chieh Hsu, a PhD student in the Safe Robotics Lab, and Vicenç Rubies-Royo, a postdoc in the Hybrid Systems Lab.

The repository also serves as the companion code to our RSS 2021 paper, where you can find the theoretical properties of the proposed algorithm as well as the implementation details. All experiments in the paper are included as examples in this repository, and you can replicate the results with the commands described in Section II below. With some simple modifications, you can also replicate the results in the preceding ICRA 2019 paper, which considers the special case of reachability/safety only.

This tool is designed to work with arbitrary reinforcement learning environments and uses two scalar signals (a target margin and a safety margin) rather than a single scalar reward signal. To use your own environment, add it under gym_reachability and register it through the standard method in gym. You can refer to some examples provided here. This tool learns the reach-avoid set by trial-and-error interactions with the environment, so it is not in itself a safe learning algorithm. However, it can be used in conjunction with an existing safe learning scheme, such as "shielding", to enable learning with safety guarantees (see Script 4 below as well as Section IV.B in the RSS 2021 paper for an example).
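To make the two-signal interface concrete, here is a minimal sketch of an environment that reports a target margin and a safety margin at every step. It is not taken from gym_reachability: the class name, the geometry, the `l_x`/`g_x` info keys, and the sign convention (target margin at most zero inside the target, safety margin above zero on failure) are all assumptions made for illustration. The class only mimics the gym `reset`/`step` interface and does not import gym; a real environment would subclass `gym.Env` and be registered with gym's `register(id=..., entry_point=...)` call.

```python
import math


class PointReachAvoidEnv:
    """Toy 2D point environment exposing a target margin and a safety margin.

    Hypothetical example: names, geometry, info keys, and sign convention
    are assumptions, not the repository's actual interface.
    """

    def __init__(self):
        self.target_center, self.target_radius = (2.0, 0.0), 0.5
        self.obstacle_center, self.obstacle_radius = (1.0, 0.0), 0.3
        self.step_size = 0.1
        self.state = (0.0, 1.0)

    def target_margin(self, state):
        # Signed distance to the target disk: <= 0 inside the target (assumed sign).
        return math.dist(state, self.target_center) - self.target_radius

    def safety_margin(self, state):
        # Penetration depth into the obstacle: > 0 means failure (assumed sign).
        return self.obstacle_radius - math.dist(state, self.obstacle_center)

    def reset(self, state=(0.0, 1.0)):
        self.state = tuple(state)
        return self.state

    def step(self, action):
        # The action is a heading angle in radians; the point moves one step_size.
        x, y = self.state
        self.state = (x + self.step_size * math.cos(action),
                      y + self.step_size * math.sin(action))
        l_x = self.target_margin(self.state)
        g_x = self.safety_margin(self.state)
        done = g_x > 0 or l_x <= 0  # stop on failure or on reaching the target
        # The scalar in the reward slot is a placeholder; both margins travel in info.
        return self.state, 0.0, done, {"l_x": l_x, "g_x": g_x}


if __name__ == "__main__":
    env = PointReachAvoidEnv()
    env.reset()
    print(env.step(0.0))
```

Keeping the two margins separate, rather than folding them into one reward, is what lets a learner distinguish "reached the target" from "violated a constraint" on the same transition.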

The implementation of tabular Q-learning is adapted from Denny Britz's implementation, and the implementation of the double deep Q-network and replay memory is adapted from PyTorch's tutorial (by Adam Paszke).
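For orientation, the sketch below runs a tabular sweep with a discounted reach-avoid backup on a five-state chain. It is not the repository's code. The backup form (a discounted mix of `max(g, min(l, min_a Q))` and `max(l, g)`), the choice to evaluate margins at the next state, and the sign convention (target margin at most zero inside the target, safety margin above zero on failure) are assumptions; check the RSS 2021 paper for the exact statement. To keep the result deterministic, the sketch sweeps all state-action pairs with a known transition model instead of sampling transitions as Q-learning would.

```python
N_STATES = 5          # state 0 is the failure state, state 4 is the target
LEFT, RIGHT = 0, 1
GAMMA = 0.9


def target_margin(s):
    # <= 0 only at the right end of the chain (assumed sign convention).
    return (N_STATES - 1 - s) - 0.5


def safety_margin(s):
    # > 0 only at the left end of the chain (assumed sign convention).
    return 0.5 - s


def step(s, a):
    return max(0, s - 1) if a == LEFT else min(N_STATES - 1, s + 1)


def is_terminal(s):
    return safety_margin(s) > 0 or target_margin(s) <= 0


def backup_target(s_next, q, gamma):
    """Discounted reach-avoid target for a transition that lands in s_next."""
    l_x, g_x = target_margin(s_next), safety_margin(s_next)
    terminal = max(l_x, g_x)
    if is_terminal(s_next):
        return terminal
    non_terminal = max(g_x, min(l_x, min(q[s_next])))
    return gamma * non_terminal + (1.0 - gamma) * terminal


def solve(gamma=GAMMA, sweeps=100):
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(sweeps):
        for s in range(N_STATES):
            if is_terminal(s):
                continue
            for a in (LEFT, RIGHT):
                q[s][a] = backup_target(step(s, a), q, gamma)
    return q


if __name__ == "__main__":
    for s, row in enumerate(solve()):
        print(s, [round(v, 3) for v in row])
```

Under the assumed convention, a negative minimum over actions at a state means the target can be reached from it without first hitting the failure state.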

I. Dependencies

If you are using Anaconda to manage packages, you can use one of the following commands to create an identical environment from the specification file for your platform:

    conda create --name <myenv> --file doc/spec-mac.txt
    conda create --name <myenv> --file doc/spec-linux.txt

Otherwise, you can install the following packages manually:

numpy=1.21.1
pytorch=1.9.0
gym=0.18.0
scipy=1.7.0
matplotlib=3.4.2
box2d-py=2.3.8
shapely=1.7.1
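If you install the packages manually, a small script can confirm that the installed versions match the pins above. The sketch below is a suggestion, not part of the repository. It assumes Python 3.8 or newer (for `importlib.metadata`) and uses pip distribution names, so the conda package `pytorch` is looked up as `torch`.

```python
from importlib import metadata

# Pins copied from the list above; "torch" is the pip name of conda's "pytorch".
PINNED = {
    "numpy": "1.21.1",
    "torch": "1.9.0",
    "gym": "0.18.0",
    "scipy": "1.7.0",
    "matplotlib": "3.4.2",
    "box2d-py": "2.3.8",
    "shapely": "1.7.1",
}


def check_versions(pinned=PINNED):
    """Return {name: (wanted, installed_or_None, matches)} for each pin."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Exact string comparison: a local build tag such as "+cu111"
        # will be reported as a mismatch.
        report[name] = (wanted, installed, installed == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, ok) in check_versions().items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name:12s} wanted={wanted:8s} installed={installed} {status}")
```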

II. Replicating the results in the RSS 2021 paper

Each script will automatically generate a folder under experiments/ containing visualizations of the training process and the weights of the trained model. In addition, the script will generate a train.pkl file, which contains the following:

training loss
training accuracy
trajectory rollout outcome starting from a grid of states
action taken from a grid of states
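The internal layout of train.pkl is not documented here, so a cautious first step is to load the file and list what it holds before relying on any key names. The sketch below does only that. The helper name `summarize_pickle` is hypothetical, and the path in the usage line is a placeholder for whichever folder your run created under experiments/. Only unpickle files you produced yourself, since loading a pickle can execute arbitrary code.

```python
import pickle
from pathlib import Path


def summarize_pickle(path):
    """Load a pickle file and describe its top-level structure.

    Returns a dict that maps each top-level key (or index) to the type name
    of the stored object; no particular layout is assumed.
    """
    with Path(path).open("rb") as f:
        data = pickle.load(f)
    if isinstance(data, dict):
        return {str(k): type(v).__name__ for k, v in data.items()}
    if isinstance(data, (list, tuple)):
        return {f"[{i}]": type(v).__name__ for i, v in enumerate(data)}
    return {"<root>": type(data).__name__}


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py <path to train.pkl>
    if len(sys.argv) > 1:
        for key, type_name in summarize_pickle(sys.argv[1]).items():
            print(f"{key}: {type_name}")
```

Objects saved with NumPy or PyTorch types need those packages importable when the file is loaded.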

Lunar lander in Figure 1

    python3 sim_lunar_lander.py -sf

Point object in Figure 2

    python3 sim_naive.py -w -sf -a -g 0.9 -mu 12000000 -cp 600000 -ut 20 -n anneal

Point object in Figure 3

    python3 sim_naive.py -sf -g 0.9999 -n 9999

Point object in Figure 4

    python3 sim_show.py -sf -g 0.9999 -n 9999

Dubins car in Figure 5

    python3 sim_car_one.py -sf -w -wi 5000 -g 0.9999 -n 9999

Dubins car (attack-defense game) in Figure 7 (Section IV.D)

    python3 sim_car_pe.py -sf -w -wi 30000 -g 0.9999 -n 9999
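To run every experiment in sequence, the commands above can be collected in a small driver. The sketch below is a convenience wrapper and not part of the repository. The command strings are copied verbatim from this section; the labels and the `run_all` helper are hypothetical. It defaults to a dry run that only returns the parsed commands, because the full runs may take a long time.

```python
import shlex
import subprocess

# Commands copied verbatim from this section; the labels are for display only.
COMMANDS = {
    "Figure 1 (lunar lander)": "python3 sim_lunar_lander.py -sf",
    "Figure 2 (point object)": (
        "python3 sim_naive.py -w -sf -a -g 0.9 -mu 12000000 "
        "-cp 600000 -ut 20 -n anneal"
    ),
    "Figure 3 (point object)": "python3 sim_naive.py -sf -g 0.9999 -n 9999",
    "Figure 4 (point object)": "python3 sim_show.py -sf -g 0.9999 -n 9999",
    "Figure 5 (Dubins car)": (
        "python3 sim_car_one.py -sf -w -wi 5000 -g 0.9999 -n 9999"
    ),
    "Figure 7 (attack-defense game)": (
        "python3 sim_car_pe.py -sf -w -wi 30000 -g 0.9999 -n 9999"
    ),
}


def run_all(commands=COMMANDS, dry_run=True):
    """Parse each command and, unless dry_run is set, execute it in order.

    Returns the list of (label, argv) pairs. Execution stops at the first
    failing command because subprocess.run is called with check=True.
    """
    planned = []
    for label, command in commands.items():
        argv = shlex.split(command)
        planned.append((label, argv))
        if not dry_run:
            subprocess.run(argv, check=True)
    return planned


if __name__ == "__main__":
    for label, argv in run_all(dry_run=True):
        print(f"{label}: {' '.join(argv)}")
```

Call `run_all(dry_run=False)` from the repository root to actually launch the scripts.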

Paper Citation

If you use this code or find it helpful, please consider citing the companion RSS 2021 paper as:

@INPROCEEDINGS{hsu2021safety,
    AUTHOR    = {Kai-Chieh Hsu$^*$ and Vicenç Rubies-Royo$^*$ and Claire J. Tomlin and Jaime F. Fisac},
    TITLE     = {Safety and Liveness Guarantees through Reach-Avoid Reinforcement Learning},
    BOOKTITLE = {Proceedings of Robotics: Science and Systems},
    YEAR      = {2021},
    ADDRESS   = {Virtual},
    MONTH     = {July},
    DOI       = {10.15607/RSS.2021.XVII.077}
}
