In this work, we propose an action preference optimization (APO) method that corrects interaction failures and achieves stable optimization for Vision-Language-Action (VLA) models via human-assisted preference alignment gathered through interaction with the environment.
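To make the idea concrete, here is a minimal, hypothetical sketch of a DPO-style preference loss over paired action rollouts. The function, its arguments, and the DPO-style form are our assumptions for illustration only; the paper defines the actual APO objective.

```python
import torch
import torch.nn.functional as F

def preference_loss(logp_pi_win, logp_pi_lose, logp_ref_win, logp_ref_lose, beta=0.1):
    """Illustrative DPO-style preference loss (assumption, not the paper's exact objective).

    Each argument is the summed log-likelihood of an action rollout under the
    current policy (pi) or a frozen reference policy (ref), for the
    human-preferred ("win") and dispreferred ("lose") rollout of a pair.
    """
    # Log-ratio of policy to reference for each side of the preference pair.
    win_logratio = logp_pi_win - logp_ref_win
    lose_logratio = logp_pi_lose - logp_ref_lose
    # Push the preferred rollout's log-ratio above the dispreferred one's.
    return -F.logsigmoid(beta * (win_logratio - lose_logratio)).mean()

# Toy usage with random log-probabilities for a batch of 4 preference pairs.
lp_w, lp_l = torch.randn(4), torch.randn(4)
rp_w, rp_l = torch.randn(4), torch.randn(4)
print(preference_loss(lp_w, lp_l, rp_w, rp_l))
```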
To install the dependencies for training, run the following command:

```bash
pip install -r requirements.txt
```

To install the dependencies for inference, install the following packages:
```bash
# Clone and install MimicGen.
mkdir -p ${project_path}/deps
cd ${project_path}/deps
git clone https://github.com/NVlabs/mimicgen.git
cd mimicgen
pip install -e .

# Install robosuite at the pinned commit.
cd ${project_path}/deps
git clone https://github.com/ARISE-Initiative/robosuite.git
cd robosuite
git checkout b9d8d3de5e3dfd1724f4a0e6555246c460407daa
pip install -e .

# Install robomimic at the pinned commit.
cd ${project_path}/deps
git clone https://github.com/ARISE-Initiative/robomimic.git
cd robomimic
git checkout d0b37cf214bd24fb590d182edb6384333f67b661
pip install -e .

# Install robosuite-task-zoo at the pinned commit.
cd ${project_path}/deps
git clone https://github.com/ARISE-Initiative/robosuite-task-zoo
cd robosuite-task-zoo
git checkout 74eab7f88214c21ca1ae8617c2b2f8d19718a9ed
pip install -e .

# Build prerequisites for flash-attn; `ninja --version; echo $?` should print 0,
# confirming ninja is available before the build.
pip install packaging ninja
ninja --version; echo $?
pip install "flash-attn==2.5.5" --no-build-isolation
```
To train the APO model, run the following command:

```bash
bash scripts/apo_train.sh ${task_name}
```
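For example, assuming `coffee` is among the task names the script supports (the supported task list is an assumption here; check `scripts/apo_train.sh`):

```bash
# Hypothetical invocation with an example MimicGen task name.
bash scripts/apo_train.sh coffee
```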
In this work, we evaluate the performance of the APO model on the MimicGen dataset. To evaluate a trained model, run the following command:
```bash
bash scripts/inference.sh ${task_name} ${adapter_path}
```
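As with training, a concrete invocation might look like the following, where both the task name and the adapter path are hypothetical placeholders:

```bash
# Hypothetical invocation; point the second argument at your trained adapter.
bash scripts/inference.sh coffee runs/coffee/adapter
```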
If you find this work useful, please cite:

```bibtex
@article{xia2025robotic,
  title={Robotic Policy Learning via Human-assisted Action Preference Optimization},
  author={Xia, Wenke and Yang, Yichu and Wu, Hongtao and Ma, Xiao and Kong, Tao and Hu, Di},
  journal={arXiv preprint arXiv:2506.07127},
  year={2025}
}
```