
Pessimistic Value Iteration for Multi-Task Data Sharing

This repo contains a PyTorch implementation and the datasets for our paper "Pessimistic Value Iteration for Multi-Task Data Sharing in Offline Reinforcement Learning", published in the journal Artificial Intelligence. The paper is available at https://www.sciencedirect.com/science/article/pii/S0004370223001947.

Datasets

We collect a Multi-Task Offline Dataset based on DeepMind Control Suite (DMC).

  • Download the dataset to ./collect before you start training.
  • Users can also collect new datasets with collect_data.py. The supported tasks include the standard tasks from DMC and the custom tasks in ./custom_dmc_tasks/.

Our dataset covers 3 domains; the available tasks for each domain are listed below.

Domain    | Available task names
--------- | --------------------
Walker    | walker_stand, walker_walk, walker_run, walker_flip
Quadruped | quadruped_jump, quadruped_roll_fast
Jaco Arm  | jaco_reach_top_left, jaco_reach_top_right, jaco_reach_bottom_left, jaco_reach_bottom_right


For each task, we run TD3 to collect five types of datasets:

  • random: generated by a random agent.
  • medium: generated by a medium-level TD3 agent.
  • medium-replay: the full replay buffer collected while training a medium-level TD3 agent.
  • medium-expert: the full replay buffer collected while training an expert-level TD3 agent.
  • expert: generated by an expert-level TD3 agent.
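
Once downloaded, the episode files can be inspected with NumPy. The following is a minimal sketch: the directory layout (./collect/<task>/<data_type>/) and the episode keys are assumptions about the storage format, not guaranteed by the repo, so print the real keys to verify.

import glob
import numpy as np

# Assumed layout: ./collect/<task>/<data_type>/*.npz (hypothetical; check the actual paths).
episode_files = glob.glob("./collect/walker_run/medium/*.npz")

for path in episode_files[:1]:
    episode = np.load(path)
    # Keys such as "reward" are assumptions; list the real ones first.
    print(list(episode.keys()))
    print("steps:", len(episode["reward"]))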

Prerequisites

Install MuJoCo:

  • Download the MuJoCo binaries from the official release page.
  • Unzip the downloaded archive into ~/.mujoco/.
  • Append the bin subdirectory of the unpacked MuJoCo folder to the LD_LIBRARY_PATH environment variable, as shown below.
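
For example, assuming the MuJoCo 2.1.0 binaries were unpacked to ~/.mujoco/mujoco210 (adjust the version and path to your install):

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mujoco210/bin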

Install the following libraries:

sudo apt update
sudo apt install libosmesa6-dev libgl1-mesa-glx libglfw3 unzip

Install dependencies:

conda env create -f conda_env.yml
conda activate utds
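
To sanity-check the installation, you can try loading a DMC task. This is a minimal sketch assuming dm_control is included in the conda environment:

from dm_control import suite

# Load a standard DMC task; this fails if the MuJoCo binaries are not found.
env = suite.load(domain_name="walker", task_name="walk")
timestep = env.reset()
print(timestep.step_type)  # expect StepType.FIRST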

Algorithms

We provide several algorithms for single-agent training and multi-task data sharing.

  • For single-agent training, we provide the following algorithms.

    Algorithm        | Name   | Paper
    ---------------- | ------ | -----
    Behavior Cloning | bc     | paper
    CQL              | cql    | paper
    TD3-BC           | td3_bc | paper
    CRR              | crr    | paper
    PBRL             | ddpg   | paper

  • For multi-task data sharing, we support the following algorithms.

    Algorithm      | Name     | Paper
    -------------- | -------- | -----
    Direct Sharing | cql      | paper
    CDS            | cql_cds  | paper
    Unlabeled-CDS  | cql_cdsz | paper
    UTDS           | pbrl     | our paper
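
UTDS builds on PBRL-style ensemble pessimism: sharing data from other tasks enlarges the dataset, and ensemble uncertainty penalizes out-of-distribution state-action pairs from the shared data. Below is a minimal sketch of an uncertainty-penalized Bellman target; the ensemble shape, the penalty coefficient beta, and all names are illustrative assumptions, not the repo's exact implementation.

import torch

def pessimistic_target(rewards, next_q_ensemble, gamma=0.99, beta=1.0):
    """Uncertainty-penalized Bellman target (sketch).

    rewards:         (batch,) tensor of rewards.
    next_q_ensemble: (n_ensemble, batch) Q-values at the next state-action pair.
    beta:            pessimism coefficient (assumed hyperparameter).
    """
    # Ensemble disagreement serves as the uncertainty estimate.
    q_mean = next_q_ensemble.mean(dim=0)
    q_std = next_q_ensemble.std(dim=0)
    # Subtracting the scaled std makes targets more pessimistic for
    # state-action pairs that the ensemble disagrees on, which is
    # typically the case for samples drawn from shared datasets.
    return rewards + gamma * (q_mean - beta * q_std)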

Training

Train CDS

To train the CDS agent on the quadruped_jump (random) task with data shared from the quadruped_roll_fast (replay) dataset, run

python train_offline_cds.py task=quadruped_jump "+share_task=[quadruped_jump, quadruped_roll_fast]" "+data_type=[random, replay]" 

Train UTDS

To train the UTDS agent on the quadruped_jump (random) task with data shared from the quadruped_roll_fast (replay) dataset, run

python train_offline_share.py task=quadruped_jump "+share_task=[quadruped_jump, quadruped_roll_fast]" "+data_type=[random, replay]" 
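
Other task and data-type combinations from the tables above should follow the same pattern. For example (assuming any listed task/data-type pair is valid; this exact combination is illustrative):

python train_offline_share.py task=walker_run "+share_task=[walker_run, walker_walk]" "+data_type=[medium, expert]"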

We support logging with wandb; set wandb: True in the corresponding config*.yaml file.

Citation

@article{UTDS2023,
  title   = {Pessimistic Value Iteration for Multi-Task Data Sharing in Offline Reinforcement Learning},
  journal = {Artificial Intelligence},
  author  = {Chenjia Bai and Lingxiao Wang and Jianye Hao and Zhuoran Yang and Bin Zhao and Zhen Wang and Xuelong Li},
  pages   = {104048},
  year    = {2023},
  issn    = {0004-3702},
  doi     = {10.1016/j.artint.2023.104048},
  url     = {https://www.sciencedirect.com/science/article/pii/S0004370223001947},
}

License

MIT license
