
dyth/doublegum


DoubleGum

Code for Double Gumbel Q-Learning

Data (5.4 MB): https://drive.google.com/file/d/12wyYZ92bvVdkEQIHms8mVR5zYJZue-cd/view?usp=sharing

Logs (4.21 GB): https://drive.google.com/file/d/1LpR3lrKUx-qTaCrI4YViAjc0QA5kb8P2/view?usp=sharing

Installation

These instructions are for Python 3.9 with CUDA 12.2.1 and cuDNN 8.8.0.

git clone git@github.com:dyth/doublegum.git
cd doublegum

create a virtualenv

virtualenv -p python3.9 <VIRTUALENV_LOCATION>/doublegum
source <VIRTUALENV_LOCATION>/doublegum/bin/activate

or create a conda environment

conda create --name doublegum python=3.9
conda activate doublegum

install MuJoCo 2.1.0 into ~/.mujoco, then return to the repository

mkdir -p ~/.mujoco
cd ~/.mujoco
wget https://mujoco.org/download/mujoco210-linux-x86_64.tar.gz
tar -xf mujoco210-linux-x86_64.tar.gz
cd -

install packages

pip install -r requirements.txt
pip install "jax[cuda12_pip]==0.4.14" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

test that the code runs

./test.sh

Continuous Control

python main_cont.py --env <ENV_NAME> --policy <POLICY>

MetaWorld environments are selected with --env MetaWorld_<ENVNAME>
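Both entry points take the same `--env` and `--policy` flags, so sweeps over environments and policies can be scripted. The sketch below is a hypothetical helper, not part of the repository: it assumes only the flags and the `MetaWorld_` prefix described above, and `button-press-v2` is an illustrative MetaWorld task name that may not match what the code accepts. By default it prints the commands instead of running them.

```python
import shlex
import subprocess
from typing import List, Sequence


def env_arg(env_name: str, metaworld: bool = False) -> str:
    """Return the value for --env; MetaWorld tasks take a "MetaWorld_" prefix."""
    return f"MetaWorld_{env_name}" if metaworld else env_name


def build_command(script: str, env: str, policy: str,
                  extra: Sequence[str] = ()) -> List[str]:
    """Assemble the argument list for main_cont.py or main_disc.py."""
    return ["python", script, "--env", env, "--policy", policy, *extra]


def launch(cmd: List[str], dry_run: bool = True) -> None:
    """Print the command (dry run) or execute it, raising on a non-zero exit."""
    if dry_run:
        print(shlex.join(cmd))
    else:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical sweep: the MetaWorld task name is illustrative only.
    for policy in ["TD3", "SAC", "XQL"]:
        cmd = build_command("main_cont.py",
                            env_arg("button-press-v2", metaworld=True), policy)
        launch(cmd)
```

Passing `dry_run=False` to `launch` runs each command in turn and stops at the first failure.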

Policies benchmarked in our paper were:

Policies we created/modified as additional benchmarks were:

  • QR-DDPG: QR-DDPG (Quantile Regression [Dabney et al., 2018] with DDPG; uses Twin Critics by default)
  • QR-DDPG --ensemble 1: QR-DDPG without Twin Critics
  • SAC --ensemble 1: SAC without Twin Critics
  • XQL: XQL with Twin Critics
  • TD3 --ensemble 5 --pessimism <p>: Finer TD3, where p is an integer between 0 and 4
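The variants above differ only in their `--policy` value and extra flags. As a sketch, assuming the policy names are passed to `--policy` exactly as written in the list, the mapping below pairs each variant with its flags, including the five Finer TD3 pessimism levels (p = 0 to 4). `<ENV_NAME>` is left as a placeholder to substitute.

```python
import shlex
from typing import Dict, List, Tuple

# Variant label -> (--policy value, extra flags), taken from the list above.
VARIANTS: Dict[str, Tuple[str, List[str]]] = {
    "QR-DDPG": ("QR-DDPG", []),  # Twin Critics by default
    "QR-DDPG without Twin Critics": ("QR-DDPG", ["--ensemble", "1"]),
    "SAC without Twin Critics": ("SAC", ["--ensemble", "1"]),
    "XQL": ("XQL", []),  # with Twin Critics
}
# Finer TD3: an ensemble of 5 critics at each pessimism level p in {0, ..., 4}.
for p in range(5):
    VARIANTS[f"Finer TD3 (p={p})"] = ("TD3", ["--ensemble", "5", "--pessimism", str(p)])


def variant_command(env: str, variant: str) -> List[str]:
    """Build the main_cont.py argument list for one named variant."""
    policy, extra = VARIANTS[variant]
    return ["python", "main_cont.py", "--env", env, "--policy", policy, *extra]


if __name__ == "__main__":
    for name in VARIANTS:
        print(f"{name}: {shlex.join(variant_command('<ENV_NAME>', name))}")
```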

Policies included in this repository but not benchmarked in our paper were:

Discrete Control

python main_disc.py --env <ENV_NAME> --policy <POLICY>

Policies benchmarked in our paper were:

Policies we created/modified as additional benchmarks were:

  • DuellingDDQN: DuellingDDQN (Duelling Double DQN)

Graphs and Tables

All graphs and tables are reproduced from the raw data in Data and Logs. Logs (4.21 GB) contains the data for Section 4 (Figures 1 and 2) and Appendix E.2 (Figures 6 and 7), while Data (5.4 MB) contains the benchmark results for DoubleGum and the baselines used in all other graphs, results and tables.

Run with

python plotting/fig<x>.py
python tables/tab<x>.py
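To regenerate every figure and table in one pass, a small driver can discover the scripts and run them in numeric order (fig2 before fig10). This is a hypothetical convenience script, assuming the scripts live at `plotting/fig<x>.py` and `tables/tab<x>.py` and are run from the repository root as above. It does not know where Data and Logs should be extracted, so set those up first as the individual scripts expect.

```python
import re
import subprocess
import sys
from pathlib import Path
from typing import List


def natural_key(path: Path) -> list:
    """Split "fig10" into ["fig", 10, ""] so that fig2 sorts before fig10."""
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", path.stem)]


def find_scripts(repo_root: Path) -> List[Path]:
    """Return all figure scripts, then all table scripts, in numeric order."""
    figs = sorted((repo_root / "plotting").glob("fig*.py"), key=natural_key)
    tabs = sorted((repo_root / "tables").glob("tab*.py"), key=natural_key)
    return figs + tabs


def run_all(repo_root: Path) -> None:
    """Run each script from the repository root, stopping at the first failure."""
    for script in find_scripts(repo_root):
        print(f"running {script}")
        subprocess.run([sys.executable, str(script)], cwd=repo_root, check=True)


if __name__ == "__main__":
    run_all(Path("."))
```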

Acknowledgements