EDGE-OF-REACH is the official implementation of RAVL ("Reach-Aware Value Learning") from the paper:

*The Edge-of-Reach Problem in Offline Model-Based Reinforcement Learning*; Anya Sims, Cong Lu, Yee Whye Teh, 2024 [[arXiv](https://arxiv.org/abs/2402.12527)]
It includes:
- offline dynamics model training,
- offline model-based agent training using RAVL.
The RAVL implementation has Weights & Biases integration and is heavily inspired
by CORL for model-free offline RL - check them out too!
Setup | Running experiments | Citation
To start, clone the repository and install requirements with:
```bash
# Clone repository
git clone https://github.com/anyasims/edge-of-reach.git && cd edge-of-reach
# Install requirements in virtual environment "ravl"
python3 -m venv ravl
source ravl/bin/activate
pip install -r requirements.txt
```
Main requirements:
- `pytorch`
- `gym` (MuJoCo RL environments*)
- `d4rl` (offline RL datasets)
- `wandb` (logging)
The code was tested with Python 3.8. *If you don't have MuJoCo installed, follow the instructions here: https://github.com/openai/mujoco-py#install-mujoco.
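As a quick sanity check that MuJoCo and D4RL are set up correctly, you can try loading one of the datasets used in the examples below (a minimal sketch, not part of the repo itself):

```python
# Sanity check: load a D4RL dataset used in the examples below.
import gym
import d4rl  # importing d4rl registers the offline-RL environments with gym

env = gym.make("halfcheetah-medium-v2")
dataset = d4rl.qlearning_dataset(env)  # dict of numpy arrays

print(dataset["observations"].shape)  # (N, obs_dim)
print(dataset["actions"].shape)       # (N, act_dim)
print(dataset["rewards"].shape)       # (N,)
```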
Training (offline model-based RL) includes:
- first training a dynamics model, and then
- training an agent (RAVL) in the dynamics model.
Example:
```bash
python3 train_dynamics_model.py \
    --env_name halfcheetah-medium-v2 \
    --seed 0 \
    --save_path <folder_for_saving_trained_dynamics_models>
```
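For intuition, the dynamics model here follows the standard recipe for offline model-based RL: an ensemble of networks trained by maximum likelihood to predict the next-state delta and reward. Below is a minimal sketch of one ensemble member and its loss; the class and function names are ours for illustration, not the repo's API:

```python
# Illustrative sketch (not the repo's code) of a probabilistic dynamics model.
import torch
import torch.nn as nn

class GaussianDynamics(nn.Module):
    """One ensemble member: predicts mean and log-variance of
    (state delta, reward) given (state, action)."""
    def __init__(self, obs_dim, act_dim, hidden=200):
        super().__init__()
        out_dim = obs_dim + 1  # state delta + reward
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 2 * out_dim),  # mean and log-variance
        )

    def forward(self, obs, act):
        mean, logvar = self.net(torch.cat([obs, act], dim=-1)).chunk(2, dim=-1)
        return mean, logvar.clamp(-10.0, 0.5)

def nll_loss(model, obs, act, next_obs, rew):
    """Gaussian negative log-likelihood on the offline dataset."""
    target = torch.cat([next_obs - obs, rew.unsqueeze(-1)], dim=-1)
    mean, logvar = model(obs, act)
    inv_var = torch.exp(-logvar)
    return (((mean - target) ** 2) * inv_var + logvar).mean()
```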
The main hyperparameters are: Q-ensemble size `num_critics`, rollout length `steps_k`, ratio of original to synthetic data `dataset_ratio`, and coefficient for the EDAC regularizer `eta`.
```bash
python3 train_ravl_agent.py \
    --env_name halfcheetah-medium-v2 \
    --num_critics 10 \
    --steps_k 5 \
    --dataset_ratio 0.05 \
    --eta 1.0 \
    --seed 0 \
    --load_model_dir <path_to_trained_dynamics_model>
```
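To make the roles of these flags concrete, here is a minimal sketch of how they interact during agent training: `steps_k`-step rollouts are branched from dataset states, each update batch mixes real and synthetic data at `dataset_ratio`, and the critic target is pessimistic over the `num_critics`-member Q-ensemble (`eta` weights the EDAC diversity regularizer, omitted here). The buffer, model, and policy interfaces below are assumptions for illustration, not the repo's actual APIs:

```python
# Illustrative sketch (not the repo's code) of the agent-training loop pieces.
import torch

def model_rollouts(dynamics, policy, start_obs, steps_k):
    """Branch short rollouts of length steps_k from dataset states."""
    obs, transitions = start_obs, []
    for _ in range(steps_k):
        act = policy(obs)
        next_obs, rew = dynamics.sample(obs, act)  # assumed model API
        transitions.append((obs, act, rew, next_obs))
        obs = next_obs
    return transitions

def mixed_batch(real_buffer, model_buffer, batch_size, dataset_ratio):
    """dataset_ratio = fraction of each update batch drawn from real data."""
    n_real = int(batch_size * dataset_ratio)
    real = real_buffer.sample(n_real)                # assumed buffer API
    synth = model_buffer.sample(batch_size - n_real)
    return {k: torch.cat([real[k], synth[k]]) for k in real}

def critic_target(critics, next_obs, next_act, rew, done, gamma=0.99):
    """Pessimistic TD target: minimum over the num_critics Q-ensemble."""
    qs = torch.stack([q(next_obs, next_act) for q in critics])  # (E, B)
    return rew + gamma * (1.0 - done) * qs.min(dim=0).values
```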
If you use this implementation in your work, please cite us as follows:
```bibtex
@misc{sims2024edgeofreach,
      title={The Edge-of-Reach Problem in Offline Model-Based Reinforcement Learning},
      author={Anya Sims and Cong Lu and Yee Whye Teh},
      year={2024},
      eprint={2402.12527},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```