Continuous Doubly Constrained Batch Reinforcement Learning

Reliant on too many experiments to learn good actions, current Reinforcement Learning (RL) algorithms have limited applicability in real-world settings, which can be too expensive to allow exploration. We propose an algorithm for batch RL, where effective policies are learned using only a fixed offline dataset instead of online interactions with the environment. The limited data in batch RL produces inherent uncertainty in value estimates of states/actions that were insufficiently represented in the training data. This leads to particularly severe extrapolation when our candidate policies diverge from one that generated the data. We propose to mitigate this issue via two straightforward penalties: a policy-constraint to reduce this divergence and a value-constraint that discourages overly optimistic estimates. Over a comprehensive set of 32 continuous-action batch RL benchmarks, our approach compares favorably to state-of-the-art methods, regardless of how the offline data were collected.
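
The two penalties can be pictured as extra terms in otherwise standard actor-critic losses. Below is a minimal PyTorch schematic of that idea, not the repository's implementation: the exact penalty forms are defined in the paper, and here the policy constraint is illustrated as a squared distance to dataset actions while the value constraint penalizes Q-estimates that rise above the bootstrapped target. The names eta_coef and lambda_coef mirror the per-environment coefficients used in ./misc/params_info.py; everything else (network handles, batch layout) is hypothetical.

```python
import torch
import torch.nn.functional as F

def critic_loss(q_net, q_target, batch, gamma, lambda_coef):
    # batch: states, actions, rewards, next states, done flags, and
    # next-state actions sampled from the current policy (all tensors).
    s, a, r, s2, done, pi_a2 = batch
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * q_target(s2, pi_a2)
    q = q_net(s, a)
    td_loss = F.mse_loss(q, target)
    # Value constraint (schematic): discourage overly optimistic
    # estimates, i.e. Q-values above the bootstrapped target.
    optimism_penalty = F.relu(q - target).pow(2).mean()
    return td_loss + lambda_coef * optimism_penalty

def actor_loss(policy, q_net, batch, eta_coef):
    # batch: states and the actions the behavior policy took in them.
    s, a_data = batch
    a_pi = policy(s)
    # Policy constraint (schematic): keep candidate policies from
    # diverging too far from the policy that generated the data.
    divergence = (a_pi - a_data).pow(2).sum(-1).mean()
    return -q_net(s, a_pi).mean() + eta_coef * divergence
```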

This repository provides the implementation of Continuous Doubly Constrained Batch Reinforcement Learning (CDC). If you use this code, please cite the paper using the following BibTeX:

@inproceedings{fakoor2021continuous,
  title={Continuous Doubly Constrained Batch Reinforcement Learning},
  author={Rasool Fakoor and Jonas Mueller and Kavosh Asadi and Pratik Chaudhari and Alexander J. Smola},
  booktitle={Thirty-Fifth Conference on Neural Information Processing Systems},
  year={2021},
}

Getting Started

python main.py --env_name halfcheetah-random-v0

'env_name' can be any of the following D4RL tasks:

- AntMaze: antmaze-umaze-v0, antmaze-umaze-diverse-v0, antmaze-medium-play-v0, antmaze-medium-diverse-v0, antmaze-large-play-v0, antmaze-large-diverse-v0
- Adroit: pen-human-v0, hammer-human-v0, door-human-v0, relocate-human-v0, pen-cloned-v0, hammer-cloned-v0, door-cloned-v0, relocate-cloned-v0
- Franka Kitchen: kitchen-complete-v0, kitchen-partial-v0, kitchen-mixed-v0
- Gym locomotion: halfcheetah-random-v0, hopper-random-v0, walker2d-random-v0, halfcheetah-medium-v0, hopper-medium-v0, walker2d-medium-v0, halfcheetah-expert-v0, hopper-expert-v0, walker2d-expert-v0, halfcheetah-medium-expert-v0, hopper-medium-expert-v0, walker2d-medium-expert-v0, halfcheetah-medium-replay-v0, hopper-medium-replay-v0, walker2d-medium-replay-v0
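
As a quick standalone sanity check (separate from main.py) that an environment name resolves and its offline dataset loads, assuming gym and d4rl are installed:

```python
import gym
import d4rl  # importing d4rl registers its environments with gym

env = gym.make("halfcheetah-random-v0")
dataset = env.get_dataset()           # offline data as numpy arrays
print(dataset["observations"].shape)  # (num_transitions, obs_dim)
print(dataset["actions"].shape)       # (num_transitions, act_dim)
```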

The code runs on both GPU and CPU machines. Most of the hyperparameters are defined in main.py and ./misc/params_info.py; you can also refer to the paper's appendix for a complete list of hyperparameters.

In order to run this code, you will need to install PyTorch, Gym, D4RL, and MuJoCo. Please refer to Table S4 in the paper's appendix for version information.
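
For example, a typical setup might look like the following (use the versions listed in Table S4; the D4RL repository location may have moved since publication, and MuJoCo itself requires a separate binary installation):

pip install torch gym
pip install git+https://github.com/rail-berkeley/d4rl@master#egg=d4rl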

New Environments

In order to run the code on a new environment, you will need to add eta_coef and lambda_coef entries for that environment in ./misc/params_info.py, update the other hyperparameters in main.py, and update misc/loader_batch.py accordingly (see the sketch below).
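
As a purely hypothetical illustration (the actual layout of ./misc/params_info.py may differ), a new entry might look like:

```python
# Hypothetical entry for a new environment; the real structure of
# ./misc/params_info.py may differ, and 1.0 is only a placeholder.
PARAMS_INFO = {
    "my-new-env-v0": {
        "eta_coef": 1.0,     # policy-constraint coefficient, tune per environment
        "lambda_coef": 1.0,  # value-constraint coefficient, tune per environment
    },
}
```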

Important Note

D4RL has changed since the publication of this paper. As a result, this code will likely produce results that differ from those reported in the CDC paper.

License

This project is licensed under the Apache-2.0 License.

Contact

Please open an issue on the issue tracker to report problems or ask questions, or send an email to me, Rasool Fakoor.
