This is the JAX and Flax implementation of our paper Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning.
- paper link: https://arxiv.org/abs/2303.05479
- project page: https://nakamotoo.github.io/projects/Cal-QL/
- video: https://youtu.be/r9CCdLeMJTg
This codebase is built upon the JaxCQL repository.
If you find this repository useful for your research, please cite:
@article{nakamoto2023calql,
  author = {Mitsuhiko Nakamoto and Yuexiang Zhai and Anikait Singh and Max Sobol Mark and Yi Ma and Chelsea Finn and Aviral Kumar and Sergey Levine},
  title = {Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning},
  conference = {arXiv Pre-print},
  year = {2023},
  url = {https://arxiv.org/abs/2303.05479},
}
- Install MuJoCo
- Download the MuJoCo key and the MuJoCo 2.1 binaries
- Extract the downloaded mujoco210 and mjkey.txt into ~/.mujoco/mujoco210 and ~/.mujoco/mjkey.txt
- Add the following environment variables to ~/.bashrc
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mujoco210/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia
- Install and use the included Anaconda environment
$ conda create -c nvidia -n Cal-QL python=3.8 cuda-nvcc=11.3
$ conda activate Cal-QL
$ pip install -r requirements.txt
- Set up W&B API keys
This codebase visualizes its logs with Weights and Biases. To enable this, first set up your W&B API key:
- Create a file named wandb_config.py under the JaxCQL folder with the following information filled in
def get_wandb_config():
    return dict(
        WANDB_API_KEY='your api key',
        WANDB_EMAIL='your email',
        WANDB_USERNAME='user',
    )
You can simply copy JaxCQL/wandb_config_example.py, rename the copy to wandb_config.py, and fill in the information.
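The setup steps above can be sanity-checked before launching a run. The following is a minimal sketch, not part of this repository: the function names `check_mujoco` and `check_wandb_config` are hypothetical, and the checks only mirror the paths, environment variable, and config keys listed above.

```python
from pathlib import Path

# Template placeholders from the wandb_config.py example above, per key.
# A real config must replace each of them.
WANDB_PLACEHOLDERS = {
    "WANDB_API_KEY": "your api key",
    "WANDB_EMAIL": "your email",
    "WANDB_USERNAME": "user",
}


def check_mujoco(home, ld_library_path):
    """Return a list of problems with the MuJoCo layout under `home`."""
    problems = []
    mujoco_dir = Path(home) / ".mujoco" / "mujoco210"
    key_file = Path(home) / ".mujoco" / "mjkey.txt"
    if not mujoco_dir.is_dir():
        problems.append(f"missing directory: {mujoco_dir}")
    if not key_file.is_file():
        problems.append(f"missing key file: {key_file}")
    # LD_LIBRARY_PATH is a colon-separated list; ignore empty entries.
    entries = [entry for entry in ld_library_path.split(":") if entry]
    if str(mujoco_dir / "bin") not in entries:
        problems.append(f"LD_LIBRARY_PATH does not contain {mujoco_dir / 'bin'}")
    return problems


def check_wandb_config(config):
    """Return a list of problems with a dict shaped like get_wandb_config()."""
    problems = []
    for key, placeholder in WANDB_PLACEHOLDERS.items():
        value = config.get(key)
        if not value:
            problems.append(f"{key} is missing or empty")
        elif value == placeholder:
            problems.append(f"{key} still holds the template placeholder")
    return problems
```

A typical call would pass `os.path.expanduser("~")` and `os.environ.get("LD_LIBRARY_PATH", "")` to `check_mujoco`, and the dictionary returned by your own `get_wandb_config()` to `check_wandb_config`. An empty list from both means the files and keys described above are in place; it does not verify that the API key itself is valid.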
You can run experiments using the following command:
$ bash scripts/run_antmaze.sh
Please check scripts/run_antmaze.sh for the details. All available command options can be seen in conservative_sac_main.py and conservative_sac.py.
- Download the offline dataset from here and unzip the files into <this repository>/demonstrations/offpolicy_hand_data/*.npy
- Install mj_envs from this fork
$ git clone --recursive https://github.com/nakamotoo/mj_envs.git
$ cd mj_envs
$ git submodule update --remote
$ pip install -e .
- Now you can run experiments using the following command:
$ bash scripts/run_adroit.sh
Please check scripts/run_adroit.sh for the details.
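Before launching the Adroit script, it can help to confirm that the dataset was unzipped into the expected location. The following is a minimal sketch under that assumption; `find_demo_files` is a hypothetical helper, not a function in this repository, and it only checks that .npy files exist at the path given above without reading their contents.

```python
from pathlib import Path


def find_demo_files(repo_root):
    """Return the sorted .npy files under demonstrations/offpolicy_hand_data."""
    data_dir = Path(repo_root) / "demonstrations" / "offpolicy_hand_data"
    if not data_dir.is_dir():
        raise FileNotFoundError(f"expected dataset directory at {data_dir}")
    files = sorted(data_dir.glob("*.npy"))
    if not files:
        raise FileNotFoundError(f"no .npy files found in {data_dir}")
    return files
```

Calling `find_demo_files(".")` from the repository root either returns the list of dataset files or raises an error naming the path that is missing.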
At the moment, this repository only has AntMaze and Adroit implemented. FrankaKitchen is planned to be added soon, but if you are in a hurry or would like to try other tasks (such as the visual manipulation domain in the paper), please contact me at nakamoto[at]berkeley[dot]edu.
To make our results easy to replicate, we ran a sweep for Cal-QL and CQL in the AntMaze and Adroit domains and made the corresponding W&B logs publicly accessible. The logs can be found here: https://wandb.ai/mitsuhiko/Cal-QL--Examples?workspace=user-mitsuhiko
You can choose the environment to visualize by filtering on env. Cal-QL runs are indicated by enable-calql=True, and CQL runs by enable-calql=False. Each env has been run across 4 seeds.
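The same filtering can be applied offline to exported run configurations. The following is a minimal sketch, assuming each run is available as a plain dictionary with the `env` and `enable-calql` keys named above plus a `seed` key (an assumption). The records are hypothetical placeholders, not values from the public logs.

```python
from collections import defaultdict


def group_runs(runs):
    """Group run configs by (env, algorithm label) and collect their seeds."""
    groups = defaultdict(list)
    for run in runs:
        # enable-calql=True marks Cal-QL runs; False marks CQL runs.
        label = "Cal-QL" if run["enable-calql"] else "CQL"
        groups[(run["env"], label)].append(run["seed"])
    return {key: sorted(seeds) for key, seeds in groups.items()}


# Hypothetical records for illustration only.
example_runs = [
    {"env": "env-a", "enable-calql": True, "seed": 1},
    {"env": "env-a", "enable-calql": True, "seed": 0},
    {"env": "env-a", "enable-calql": False, "seed": 0},
]

grouped = group_runs(example_runs)
```

With the full sweep, each (env, label) group should contain 4 seeds, which gives a quick way to confirm that no run was dropped by the filter.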
This project is built upon Young Geng's JaxCQL repository. The CQL implementation is based on CQL.
For questions, bug reports, suggestions, or improvements, please feel free to contact me at nakamoto[at]berkeley[dot]edu.