Continuous Doubly Constrained Batch Reinforcement Learning

Reliant on too many experiments to learn good actions, current Reinforcement Learning (RL) algorithms have limited applicability in real-world settings, which can be too expensive to allow exploration. We propose an algorithm for batch RL, where effective policies are learned using only a fixed offline dataset instead of online interactions with the environment. The limited data in batch RL produces inherent uncertainty in value estimates of states/actions that were insufficiently represented in the training data. This leads to particularly severe extrapolation when our candidate policies diverge from one that generated the data. We propose to mitigate this issue via two straightforward penalties: a policy-constraint to reduce this divergence and a value-constraint that discourages overly optimistic estimates. Over a comprehensive set of 32 continuous-action batch RL benchmarks, our approach compares favorably to state-of-the-art methods, regardless of how the offline data were collected.
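
The two penalties can be pictured as extra terms in otherwise standard actor-critic losses. Below is a minimal PyTorch schematic of that idea, not the repository's implementation: the exact penalty forms are defined in the paper, and here the policy constraint is illustrated as a squared distance to dataset actions while the value constraint penalizes Q-estimates that rise above the bootstrapped target. The names eta_coef and lambda_coef mirror the per-environment coefficients used in ./misc/params_info.py; everything else (network handles, batch layout) is hypothetical.

```python
import torch
import torch.nn.functional as F

def critic_loss(q_net, q_target, batch, gamma, lambda_coef):
    # batch: states, actions, rewards, next states, done flags, and
    # next-state actions sampled from the current policy (all tensors).
    s, a, r, s2, done, pi_a2 = batch
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * q_target(s2, pi_a2)
    q = q_net(s, a)
    td_loss = F.mse_loss(q, target)
    # Value constraint (schematic): discourage overly optimistic
    # estimates, i.e. Q-values above the bootstrapped target.
    optimism_penalty = F.relu(q - target).pow(2).mean()
    return td_loss + lambda_coef * optimism_penalty

def actor_loss(policy, q_net, batch, eta_coef):
    # batch: states and the actions the behavior policy took in them.
    s, a_data = batch
    a_pi = policy(s)
    # Policy constraint (schematic): keep candidate policies from
    # diverging too far from the policy that generated the data.
    divergence = (a_pi - a_data).pow(2).sum(-1).mean()
    return -q_net(s, a_pi).mean() + eta_coef * divergence
```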

This repository provides the implementation of Continuous Doubly Constrained Batch Reinforcement Learning (CDC). If you use this code, please cite the paper using the following BibTeX:

@inproceedings{fakoor2021continuous,
  title={Continuous Doubly Constrained Batch Reinforcement Learning},
  author={Rasool Fakoor and Jonas Mueller and Kavosh Asadi and Pratik Chaudhari and Alexander J. Smola},
  booktitle={Thirty-Fifth Conference on Neural Information Processing Systems},
  year={2021},
}

Getting Started

python main.py --env_name halfcheetah-random-v0

'env_name' can be any of the following D4RL tasks:

- AntMaze: antmaze-umaze-v0, antmaze-umaze-diverse-v0, antmaze-medium-play-v0, antmaze-medium-diverse-v0, antmaze-large-play-v0, antmaze-large-diverse-v0
- Adroit: pen-human-v0, hammer-human-v0, door-human-v0, relocate-human-v0, pen-cloned-v0, hammer-cloned-v0, door-cloned-v0, relocate-cloned-v0
- Franka Kitchen: kitchen-complete-v0, kitchen-partial-v0, kitchen-mixed-v0
- Gym locomotion: halfcheetah-random-v0, hopper-random-v0, walker2d-random-v0, halfcheetah-medium-v0, hopper-medium-v0, walker2d-medium-v0, halfcheetah-expert-v0, hopper-expert-v0, walker2d-expert-v0, halfcheetah-medium-expert-v0, hopper-medium-expert-v0, walker2d-medium-expert-v0, halfcheetah-medium-replay-v0, hopper-medium-replay-v0, walker2d-medium-replay-v0
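
As a quick standalone sanity check (separate from main.py) that an environment name resolves and its offline dataset loads, assuming gym and d4rl are installed:

```python
import gym
import d4rl  # importing d4rl registers its environments with gym

env = gym.make("halfcheetah-random-v0")
dataset = env.get_dataset()           # offline data as numpy arrays
print(dataset["observations"].shape)  # (num_transitions, obs_dim)
print(dataset["actions"].shape)       # (num_transitions, act_dim)
```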

The code runs on both GPU and CPU machines. Most of the hyperparameters are defined in main.py and ./misc/params_info.py; you can also refer to the paper's appendix for a complete list of hyperparameters.

In order to run this code, you will need to install PyTorch, Gym, D4RL, and MuJoCo. Please refer to Table S4 in the paper's appendix for version information.
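
For example, a typical setup might look like the following (use the versions listed in Table S4; the D4RL repository location may have moved since publication, and MuJoCo itself requires a separate binary installation):

pip install torch gym
pip install git+https://github.com/rail-berkeley/d4rl@master#egg=d4rl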

New Environments

In order to run the code on a new environment, you will need to add eta_coef and lambda_coef entries for that environment in ./misc/params_info.py, update the other hyperparameters in main.py, and update misc/loader_batch.py accordingly (see the sketch below).
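
As a purely hypothetical illustration (the actual layout of ./misc/params_info.py may differ), a new entry might look like:

```python
# Hypothetical entry for a new environment; the real structure of
# ./misc/params_info.py may differ, and 1.0 is only a placeholder.
PARAMS_INFO = {
    "my-new-env-v0": {
        "eta_coef": 1.0,     # policy-constraint coefficient, tune per environment
        "lambda_coef": 1.0,  # value-constraint coefficient, tune per environment
    },
}
```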

Important Note

D4RL has changed since the publication of this paper. As a result, this code will likely produce results that differ from those reported in the CDC paper.

License

This project is licensed under the Apache-2.0 License.

Contact

Please open an issue on the issue tracker to report problems or ask questions, or send an email to me, Rasool Fakoor.
