SEABO: A Simple Search-Based Method for Offline Imitation Learning (ICLR 2024)

SEABO is a simple yet effective method for assigning rewards to unlabeled offline datasets. Our key idea is that if a transition lie close to the expert trajectory, it ought to have a larger reward and vice versa. The illustration of our method can be found below.

How to run

For reproducing our reported results in the paper, please check the following instructions.

For IQL+SEABO, on MuJoCo tasks, run

CUDA_VISIBLE_DEVICES=0 nohup python main_iql.py --env halfcheetah-medium-expert-v2 --seed 5 > out.log 2>&1 &

For IQL+SEABO, on AntMaze tasks, run

CUDA_VISIBLE_DEVICES=0 nohup python main_iql.py --env antmaze-medium-diverse-v0 --seed 5 --beta 5.0 --temperature 10.0 --expectile 0.9 > out.log 2>&1 &

For IQL+SEABO, on Adroit tasks, run

CUDA_VISIBLE_DEVICES=0 nohup python main_iql.py --env pen-human-v0 --seed 1 --no_action_dim --temperature 0.5 --dropout_rate 0.1 > out.log 2>&1 &

For TD3_BC+SEABO, on MuJoCo tasks, run

CUDA_VISIBLE_DEVICES=0 nohup python main.py --env halfcheetah-medium-expert-v2 --seed 1 --dir logs/halfcheetah-medium-expert-v2/ > out.log 2>&1 &

Key flags

python main.py \
       --k 1 \
       # how many neighbors are used
       --beta 0.5 \
       # the weighting coefficient
       --scale 1 \
       # the reward scale
       --mode 'sas' \
       # what to query, contain actions or not
       --no_action_dim \
       # whether to include action dimension in the reward calculation

Citation

If you use our method or code in your research, please consider citing the paper as follows:

@inproceedings{lyu2024seaboasimple,
 title={SEABO: A Simple Search-Based Method for Offline Imitation Learning},
 author={Jiafei Lyu and Xiaoteng Ma and Le Wan and Runze Liu and Xiu Li and and Zongqing Lu},
 booktitle={International Conference on Learning Representations},
 year={2024}
}

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
IQL		IQL
TD3_BC		TD3_BC
README.md		README.md
seabo.png		seabo.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

IQL

IQL

TD3_BC

TD3_BC

README.md

README.md

seabo.png

seabo.png

Repository files navigation

SEABO: A Simple Search-Based Method for Offline Imitation Learning (ICLR 2024)

How to run

Key flags

Citation

About

Releases

Packages

Languages

dmksjfl/SEABO

Folders and files

Latest commit

History

Repository files navigation

SEABO: A Simple Search-Based Method for Offline Imitation Learning (ICLR 2024)

How to run

Key flags

Citation

About

Resources

Stars

Watchers

Forks

Languages