OmniSafe's Safety-Gymnasium Benchmark for Offline Algorithms

The OmniSafe Safety-Gymnasium Benchmark for offline algorithms evaluates the effectiveness of OmniSafe's offline algorithms across multiple environments from the Safety-Gymnasium task suite. For each algorithm and environment supported, we provide:

  • Default hyperparameters used for the benchmark and scripts to reproduce the results.
  • Graphs and raw data that can be used for research purposes.
  • Log details obtained during training.
  • Some hints on how to fine-tune the algorithm for optimal results.

Supported algorithms are listed below:

  • VAE-BC
  • BCQ
  • BCQLag
  • C-CRR
  • COptiDICE

Safety-Gymnasium

We highly recommend using Safety-Gymnasium to run the following experiments. To install it on a Linux machine, run:

pip install safety_gymnasium
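To verify the installation, a quick smoke test can help. This is a minimal sketch assuming Safety-Gymnasium's documented Gymnasium-style API, in which step() returns a cost signal in addition to the reward:

import safety_gymnasium

# create one of the tasks used in this benchmark
env = safety_gymnasium.make('SafetyPointCircle1-v0')
obs, info = env.reset(seed=0)

# Safety-Gymnasium's step() returns a cost alongside the reward
obs, reward, cost, terminated, truncated, info = env.step(env.action_space.sample())
print(f'reward={reward:.3f}, cost={cost:.3f}')
env.close()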

Train agents to generate data

omnisafe train --env-id SafetyAntVelocity-v1 --algo PPOLag
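Equivalently, training can be launched from Python. A minimal sketch using the Agent interface from OmniSafe's quickstart:

import omnisafe

# train a PPOLag agent; its checkpoints will later serve as the data-generating policy
agent = omnisafe.Agent('PPOLag', 'SafetyAntVelocity-v1')
agent.learn()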

Collect offline data

PATH_TO_AGENT is the path to the run directory that contains the torch_save folder produced during training.

from omnisafe.common.offline.data_collector import OfflineDataCollector

# please change the agent path to your own trained agent
env_name = 'SafetyAntVelocity-v1'
size = 1_000_000  # total number of transitions to collect
agents = [
    # (path to the run directory, checkpoint file name, number of transitions)
    ('PATH_TO_AGENT', 'epoch-500.pt', 1_000_000),
]
save_dir = './data'

if __name__ == '__main__':
    col = OfflineDataCollector(size, env_name)
    for agent, model_name, num in agents:
        col.register_agent(agent, model_name, num)
    col.collect(save_dir)
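After collection, it is worth sanity-checking what was saved. A minimal sketch, assuming the collector writes a NumPy .npz archive under save_dir (the exact filename depends on the collector version):

import glob

import numpy as np

# inspect whatever archives the collector produced under ./data
for path in glob.glob('./data/*.npz'):
    data = np.load(path)
    print(path)
    for key in data.files:
        print(f'  {key}: shape={data[key].shape}')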

Run the Benchmark

You can set the main function of examples/benchmarks/run_experiment_grid.py as:

if __name__ == '__main__':
    eg = ExperimentGrid(exp_name='Offline-Benchmarks')

    # set up the algorithms.
    offline_policy = ['VAEBC', 'BCQ', 'BCQLag', 'CCRR', 'COptiDICE']

    eg.add('algo', offline_policy)

    # you can use wandb to monitor the experiment.
    eg.add('logger_cfgs:use_wandb', [False])
    # you can use tensorboard to monitor the experiment.
    eg.add('logger_cfgs:use_tensorboard', [True])
    # please change the dataset path to the file produced by the data collector
    dataset_path = 'PATH_TO_DATASET'
    eg.add('train_cfgs:dataset', [dataset_path])

    # set up the environment.
    eg.add('env_id', [
        'SafetyAntVelocity-v1',
        ])
    eg.add('seed', [0, 5, 10, 15, 20])

    # the total number of experiments must be divisible by num_pool;
    # choose num_pool according to your machine's resources
    eg.run(train, num_pool=5)

After that, run the following commands to launch the benchmark. With 5 algorithms, 1 environment, and 5 seeds, the grid contains 25 experiments, which num_pool=5 divides evenly:

cd examples/benchmarks
python run_experiment_grid.py

You can also plot the results by running the following command:

cd examples
python analyze_experiment_results.py

For detailed usage of the OmniSafe statistics tool, please refer to this tutorial.
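As a starting point, the following is a minimal sketch assuming the StatisticsTools interface described in that tutorial; the path placeholder should point at the directory the experiment grid wrote, and the call may need adapting to your OmniSafe version:

from omnisafe.common.statistics_tools import StatisticsTools

# point this at the output directory created by the experiment grid
PATH = 'PATH_TO_EXPERIMENT_GRID_OUTPUT'

if __name__ == '__main__':
    st = StatisticsTools()
    st.load_source(PATH)
    # draw pairwise comparisons over the 'algo' parameter swept in the grid
    st.draw_graph(parameter='algo', values=None, compare_num=2, cost_limit=None)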

Example benchmark

SafetyPointCircle1-v0($\beta = 0.25$)

| Algorithms | Reward (OmniSafe) | Cost (OmniSafe) |
| :--------: | :---------------: | :-------------: |
| VAE-BC     | 43.66±0.90        | 109.86±13.24    |
| C-CRR      | 45.48±0.87        | 127.30±12.60    |
| BCQLag     | 43.31±0.76        | 113.39±12.81    |
| COptiDICE  | 40.68±0.93        | 67.11±13.15     |

SafetyPointCircle1-v0($\beta = 0.50$)

| Algorithms | Reward (OmniSafe) | Cost (OmniSafe) |
| :--------: | :---------------: | :-------------: |
| VAE-BC     | 42.84±1.36        | 62.34±14.84     |
| C-CRR      | 45.99±1.36        | 97.20±13.57     |
| BCQLag     | 44.68±1.97        | 95.06±33.07     |
| COptiDICE  | 39.55±1.39        | 53.87±13.27     |

SafetyPointCircle1-v0($\beta = 0.75$)

| Algorithms | Reward (OmniSafe) | Cost (OmniSafe) |
| :--------: | :---------------: | :-------------: |
| VAE-BC     | 40.23±0.75        | 41.25±10.12     |
| C-CRR      | 40.66±0.88        | 49.90±10.81     |
| BCQLag     | 42.94±1.04        | 85.37±23.41     |
| COptiDICE  | 40.98±0.89        | 70.40±12.14     |

SafetyCarCircle1-v0($\beta = 0.25$)

| Algorithms | Reward (OmniSafe) | Cost (OmniSafe) |
| :--------: | :---------------: | :-------------: |
| VAE-BC     | 19.62±0.28        | 150.54±7.63     |
| C-CRR      | 18.53±0.45        | 122.63±13.14    |
| BCQLag     | 18.88±0.61        | 125.44±15.68    |
| COptiDICE  | 17.25±0.37        | 90.86±10.75     |

SafetyCarCircle1-v0($\beta = 0.50$)

| Algorithms | Reward (OmniSafe) | Cost (OmniSafe) |
| :--------: | :---------------: | :-------------: |
| VAE-BC     | 18.69±0.33        | 125.97±10.36    |
| C-CRR      | 17.24±0.43        | 89.47±11.55     |
| BCQLag     | 18.14±0.96        | 108.07±20.70    |
| COptiDICE  | 16.38±0.43        | 70.54±12.36     |

SafetyCarCircle1-v0($\beta = 0.75$)

| Algorithms | Reward (OmniSafe) | Cost (OmniSafe) |
| :--------: | :---------------: | :-------------: |
| VAE-BC     | 17.31±0.33        | 85.53±11.33     |
| C-CRR      | 15.74±0.42        | 48.38±10.31     |
| BCQLag     | 17.10±0.84        | 77.54±14.07     |
| COptiDICE  | 15.58±0.37        | 49.42±8.699     |

Some Insights

  1. The Lagrange method is widely recognized as the simplest approach to building safe algorithms, but its efficacy is limited in offline settings. The reason is that the Q-critic, trained by TD-learning, cannot guarantee absolutely accurate Q-values. For policy improvement this is tolerable: the policy only needs Q-values that rank actions correctly, so relative accuracy suffices. For a Lagrange-based algorithm to learn meaningful Lagrange multipliers, however, the cost Q-values must be accurate in absolute terms, which TD-learning does not provide.
  2. There is currently no standardized dataset for evaluating safe offline algorithms, and this lack of uniformity makes existing algorithms sensitive to the dataset employed. When the dataset contains a significant number of unsafe trajectories, the algorithms struggle to learn safe behavior. We advocate a construction method that trains two online policies: one achieving a high reward without safety guarantees, and one achieving a slightly lower reward while satisfying the constraints. The two policies are then used to collect data in a 1:1 ratio, as sketched below.
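A minimal sketch of this 1:1 construction, reusing the OfflineDataCollector shown above (the two agent paths and checkpoint names are placeholders):

from omnisafe.common.offline.data_collector import OfflineDataCollector

env_name = 'SafetyAntVelocity-v1'
size = 1_000_000
agents = [
    # high-reward policy without safety guarantees (e.g. trained with PPO)
    ('PATH_TO_UNSAFE_AGENT', 'epoch-500.pt', 500_000),
    # slightly lower reward but constraint-satisfying policy (e.g. trained with PPOLag)
    ('PATH_TO_SAFE_AGENT', 'epoch-500.pt', 500_000),
]
save_dir = './data'

if __name__ == '__main__':
    col = OfflineDataCollector(size, env_name)
    for agent, model_name, num in agents:
        col.register_agent(agent, model_name, num)
    col.collect(save_dir)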