CAWR

Introduction

This repository contains the code for the CAWR (Corruption-Averse Advantage-Weighted Regression) algorithm. It provides several policy loss functions (L2, L1, Huber, Skew, Flat) for policy optimization and several priority schemes (None, Normal, Standard, AW, ODPR, Quantile) for prioritized decoupled resampling. Their definitions are given in the paper "CAWR: Corruption-Averse Advantage-Weighted Regression for Robust Policy Optimization".
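For orientation, the three textbook members of the loss family can be sketched as follows. This is a minimal illustration and not the repository's implementation: the function names, the `beta` temperature, the `max_weight` clipping value, and the scalar-action setup are assumptions made for this example. It also assumes a deterministic policy output that is regressed onto the dataset action with an exponential advantage weight. The Skew and Flat losses are omitted because their definitions are given only in the paper.

```python
import math


def l2_loss(err):
    """Squared error, 0.5 * err^2."""
    return 0.5 * err * err


def l1_loss(err):
    """Absolute error."""
    return abs(err)


def huber_loss(err, delta=1.0):
    """Quadratic near zero, linear beyond `delta` (standard Huber form)."""
    a = abs(err)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


# Hypothetical lookup table; the keys mirror the loss names listed above.
LOSSES = {"L2": l2_loss, "L1": l1_loss, "Huber": huber_loss}


def weighted_regression_loss(pred_actions, data_actions, advantages,
                             loss="L2", beta=1.0, max_weight=20.0):
    """Mean of exp(advantage / beta) * loss(pred - action) over a batch.

    `beta` and `max_weight` are illustrative hyperparameters (assumptions).
    """
    elem = LOSSES[loss]
    total = 0.0
    for pred, act, adv in zip(pred_actions, data_actions, advantages):
        weight = min(math.exp(adv / beta), max_weight)  # clipped weight
        total += weight * elem(pred - act)
    return total / len(pred_actions)
```

For a large error the L2 term grows quadratically while the L1 and Huber terms grow only linearly, which is the usual motivation for trying robust losses on noisy data.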

This project supports the D4RL benchmark MuJoCo locomotion tasks Hopper, Walker2D, and HalfCheetah. We provide demo configs for replicating the paper's results and for further use. The demos that use a pre-trained advantage function (ablation study) are in the pretrain folder; its advPER and no_advPER subfolders contain the code with and without prioritized resampling, respectively. Similarly, the demos for direct training (comparison on the D4RL benchmarks) are in the no_pretrain folder; its advPER and no_advPER subfolders contain the code with and without prioritized decoupled resampling, respectively.
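The general idea behind prioritized resampling, drawing dataset transitions with probability proportional to a per-transition priority, can be sketched as below. This is only a generic illustration under stated assumptions: the function name `resample_indices` is hypothetical, and the way each scheme (Normal, Standard, AW, ODPR, Quantile) actually computes its priorities is defined in the paper and is not reproduced here.

```python
import random


def resample_indices(priorities, num_samples, seed=None):
    """Draw indices with replacement, proportional to non-negative priorities.

    `priorities` is one number per transition; how it is computed is left
    to the caller (an assumption of this sketch).
    """
    if any(p < 0 for p in priorities):
        raise ValueError("priorities must be non-negative")
    rng = random.Random(seed)  # local generator, so the global state is untouched
    return rng.choices(range(len(priorities)), weights=priorities, k=num_samples)
```

With uniform priorities this reduces to ordinary uniform sampling, which corresponds to the case without prioritization.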

Note that this code is adapted from DI-engine; see https://github.com/opendilab/DI-engine/tree/main for further information.

Configuration Instructions

The project is written in Python. Please make sure the following Python version and libraries are installed before running:

python==3.9.19
torch==2.5.0
mujoco==3.2.4
mujoco-py==2.1.2.14
D4RL==1.1
DI-engine==0.5.2
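A quick way to compare an environment against the pins above is to query the installed package metadata. The sketch below is not part of the repository; it assumes the distribution names match the names listed above, which may not hold for every install (D4RL, for example, is often installed from source under a different version string).

```python
from importlib import metadata

# Pinned versions copied from the list above; the names are assumed to be
# the installed distribution names.
PINNED = {
    "torch": "2.5.0",
    "mujoco": "3.2.4",
    "mujoco-py": "2.1.2.14",
    "D4RL": "1.1",
    "DI-engine": "0.5.2",
}


def check_versions(pinned=PINNED):
    """Return {name: (wanted, found, matches)} for each pinned package."""
    report = {}
    for name, wanted in pinned.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed under this name
        # Ignore local build suffixes such as "+cu121" when comparing.
        matches = found is not None and found.split("+")[0] == wanted
        report[name] = (wanted, found, matches)
    return report
```

Calling `check_versions()` and printing the entries whose last field is `False` shows which packages are missing or differ from the pinned versions.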

Quick Start

To replicate the ablation study (using pre-trained advantage function) for CAWR with $L_1$ norm, you can go to pretrain/no_advPER/ and run:

python d4rl_pretrain_no_advPER_main.py --seed 10 --config hopper_medium_cawr_L1_config.py

To replicate the ablation study (using pre-trained advantage function) for CAWR with $L_1$ norm and Normal PER, you can go to pretrain/advPER/ and run:

python d4rl_pretrain_advPER_main.py --seed 10 --config hopper_medium_cawr_L1_Normal_config.py

To replicate the comparison experiment (not using pre-trained advantage function) for CAWR with $L_2$ norm evaluated on Walker2d-m-v2, you can go to no_pretrain/no_advPER/ and run:

python d4rl_no_advPER_main.py --seed 10 --config walker2d_medium_cawr_L2_config.py

To replicate the comparison experiment (not using pre-trained advantage function) for CAWR with $L_2$ norm and Normal PER evaluated on Walker2d-m-v2, you can go to no_pretrain/advPER/ and run:

python d4rl_advPER_main.py --seed 10 --config walker2d_medium_cawr_L2_Normal_config.py

(The experiments in the paper were conducted on a GTX 1080 Ti. If you use a different GPU, you may not be able to replicate our results exactly because the generated random numbers differ.)
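To run one of the commands above over several seeds, a small driver script can build the same command line repeatedly. The sketch below is not part of the repository: the helper names are hypothetical and any seed other than 10 is an assumption, since the README only shows `--seed 10`. It uses only the `--seed` and `--config` flags that appear in the commands above.

```python
import subprocess
import sys


def build_command(script, config, seed):
    """Return the argv list for one run, mirroring the Quick Start commands."""
    return [sys.executable, script, "--seed", str(seed), "--config", config]


def run_sweep(workdir, script, config, seeds):
    """Run the script once per seed from inside `workdir`, stopping on failure."""
    for seed in seeds:
        subprocess.run(build_command(script, config, seed), cwd=workdir, check=True)
```

For example, `run_sweep("pretrain/no_advPER", "d4rl_pretrain_no_advPER_main.py", "hopper_medium_cawr_L1_config.py", [10])` reproduces the first Quick Start command when called from the repository root.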

Citation

@misc{cawr,
      title={CAWR: Corruption-Averse Advantage-Weighted Regression for Robust Policy Optimization},
      author={Ranting Hu},
      year={2025},
      eprint={2506.15654},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2506.15654},
}

License

CAWR is released under the Apache 2.0 license.
