This repository contains the code for the CAWR (Corruption-Averse Advantage-Weighted Regression) algorithm. It provides several policy loss functions (L2, L1, Huber, Skew, Flat) and priorities (None, Normal, Standard, AW, ODPR, Quantile) for policy optimization and prioritized decoupled resampling. Their definitions are given in the paper "CAWR: Corruption-Averse Advantage-Weighted Regression for Robust Policy Optimization".
This project supports the D4RL benchmark MuJoCo locomotion tasks: Hopper, Walker2D, and HalfCheetah. We provide demo configs for replicating the paper's results and for further use. The demos for the pre-training setting (ablation study) are in the folder named pretrain, where advPER and no_advPER mark the code with and without prioritized resampling, respectively. Similarly, the demos for direct training (comparison on the D4RL benchmarks) are in the folder named no_pretrain, where advPER and no_advPER mark the code with and without prioritized decoupled resampling, respectively.
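The example commands later in this README suggest a naming pattern for the entry scripts and config files. The sketch below assembles a command from that pattern. The helper `build_command` and its parameters are hypothetical and not part of the repository, and the pattern is inferred only from the four example commands, so a given combination of task, loss, and priority may not have a matching config file.

```python
def build_command(env, dataset, loss, priority=None, pretrain=False, seed=10):
    """Assemble a run command following the pattern of the README's example commands.

    Hypothetical helper: it only formats strings and does not check that
    the script or config file actually exists in the repository.
    """
    # Scripts with "advPER" use prioritized resampling; "no_advPER" scripts do not.
    resampling = "advPER" if priority else "no_advPER"
    prefix = "d4rl_pretrain" if pretrain else "d4rl"
    script = f"{prefix}_{resampling}_main.py"

    # Config names list the task, dataset, loss, and (optionally) the priority.
    parts = [env, dataset, "cawr", loss]
    if priority:
        parts.append(priority)
    config = "_".join(parts) + "_config.py"

    return f"python {script} --seed {seed} --config {config}"


if __name__ == "__main__":
    print(build_command("hopper", "medium", "L1", pretrain=True))
    print(build_command("walker2d", "medium", "L2", priority="Normal"))
```

Running the file prints two commands that match the ablation and comparison examples given in this README.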
Note that this code is adapted from DI-engine; see https://github.com/opendilab/DI-engine/tree/main for further information.
The project is written in Python. Please make sure the following libraries are installed before running:
python==3.9.19
torch==2.5.0
mujoco==3.2.4
mujoco-py==2.1.2.14
D4RL==1.1
DI-engine==0.5.2
To replicate the ablation study (using a pre-trained advantage function) for CAWR with the L1 loss and no prioritized resampling, run:
python d4rl_pretrain_no_advPER_main.py --seed 10 --config hopper_medium_cawr_L1_config.py
To replicate the ablation study (using a pre-trained advantage function) for CAWR with the L1 loss and the Normal priority, run:
python d4rl_pretrain_advPER_main.py --seed 10 --config hopper_medium_cawr_L1_Normal_config.py
To replicate the comparison experiment (not using a pre-trained advantage function) for CAWR with the L2 loss and no prioritized resampling, run:
python d4rl_no_advPER_main.py --seed 10 --config walker2d_medium_cawr_L2_config.py
To replicate the comparison experiment (not using a pre-trained advantage function) for CAWR with the L2 loss and the Normal priority, run:
python d4rl_advPER_main.py --seed 10 --config walker2d_medium_cawr_L2_Normal_config.py
Note: the experiments in the paper were conducted on a GTX 1080 Ti GPU. If you use a different GPU, you may not be able to replicate our results exactly, because the generated random numbers can differ.
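Because exact replication depends on the software environment as well as the GPU, it may help to confirm the pinned library versions before running. The following is a minimal sketch using the standard-library `importlib.metadata` module. The function names are hypothetical, and the distribution names in `REQUIRED` are assumptions about how each package was installed (D4RL in particular is often installed from source and may be registered under a different name).

```python
from importlib import metadata

# Pinned versions copied from the dependency list in this README.
# The keys are assumed distribution names; adjust them to your installation.
REQUIRED = {
    "torch": "2.5.0",
    "mujoco": "3.2.4",
    "mujoco-py": "2.1.2.14",
    "D4RL": "1.1",
    "DI-engine": "0.5.2",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_versions(required, lookup=installed_version):
    """Return {name: (expected, found)} for every package that does not match."""
    mismatches = {}
    for name, expected in required.items():
        found = lookup(name)
        # Drop local build tags such as "+cu121" that some wheels append.
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    problems = check_versions(REQUIRED)
    for name, (expected, found) in problems.items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
    if not problems:
        print("All pinned versions match.")
```

The sketch only compares version strings; it does not check the Python interpreter version or whether MuJoCo itself is set up correctly.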
@misc{cawr,
title={CAWR: Corruption-Averse Advantage-Weighted Regression for Robust Policy Optimization},
author={Ranting Hu},
year={2025},
eprint={2506.15654},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2506.15654},
}
CAWR is released under the Apache 2.0 license.