In this work, we propose an action preference optimization (APO) method that corrects interaction failures and achieves stable optimization for Vision-Language-Action (VLA) models via human-assisted preference alignment gathered through interaction with the environment.
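To make the idea concrete, here is a minimal, hypothetical sketch of a DPO-style preference loss over paired action rollouts. The function, its arguments, and the DPO-style form are our assumptions for illustration only; the paper defines the actual APO objective.

```python
import torch
import torch.nn.functional as F

def preference_loss(logp_pi_win, logp_pi_lose, logp_ref_win, logp_ref_lose, beta=0.1):
    """Illustrative DPO-style preference loss (assumption, not the paper's exact objective).

    Each argument is the summed log-likelihood of an action rollout under the
    current policy (pi) or a frozen reference policy (ref), for the
    human-preferred ("win") and dispreferred ("lose") rollout of a pair.
    """
    # Log-ratio of policy to reference for each side of the preference pair.
    win_logratio = logp_pi_win - logp_ref_win
    lose_logratio = logp_pi_lose - logp_ref_lose
    # Push the preferred rollout's log-ratio above the dispreferred one's.
    return -F.logsigmoid(beta * (win_logratio - lose_logratio)).mean()

# Toy usage with random log-probabilities for a batch of 4 preference pairs.
lp_w, lp_l = torch.randn(4), torch.randn(4)
rp_w, rp_l = torch.randn(4), torch.randn(4)
print(preference_loss(lp_w, lp_l, rp_w, rp_l))
```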
To install the dependencies for training, run the following command:

```bash
pip install -r requirements.txt
```

To install the dependencies for inference, install the following packages:
```bash
# Clone and install MimicGen.
mkdir -p ${project_path}/deps
cd ${project_path}/deps
git clone https://github.com/NVlabs/mimicgen.git
cd mimicgen
pip install -e .

# Install robosuite at the pinned commit.
cd ${project_path}/deps
git clone https://github.com/ARISE-Initiative/robosuite.git
cd robosuite
git checkout b9d8d3de5e3dfd1724f4a0e6555246c460407daa
pip install -e .

# Install robomimic at the pinned commit.
cd ${project_path}/deps
git clone https://github.com/ARISE-Initiative/robomimic.git
cd robomimic
git checkout d0b37cf214bd24fb590d182edb6384333f67b661
pip install -e .

# Install robosuite-task-zoo at the pinned commit.
cd ${project_path}/deps
git clone https://github.com/ARISE-Initiative/robosuite-task-zoo
cd robosuite-task-zoo
git checkout 74eab7f88214c21ca1ae8617c2b2f8d19718a9ed
pip install -e .

# Build prerequisites for flash-attn; `ninja --version; echo $?` should print 0,
# confirming ninja is available before the build.
pip install packaging ninja
ninja --version; echo $?
pip install "flash-attn==2.5.5" --no-build-isolation
```
To train the APO model, run the following command:

```bash
bash scripts/apo_train.sh ${task_name}
```
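For example, assuming `coffee` is among the task names the script supports (the supported task list is an assumption here; check `scripts/apo_train.sh`):

```bash
# Hypothetical invocation with an example MimicGen task name.
bash scripts/apo_train.sh coffee
```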
In this work, we evaluate the performance of the APO model on the MimicGen dataset. To evaluate a trained model, run the following command:
```bash
bash scripts/inference.sh ${task_name} ${adapter_path}
```
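As with training, a concrete invocation might look like the following, where both the task name and the adapter path are hypothetical placeholders:

```bash
# Hypothetical invocation; point the second argument at your trained adapter.
bash scripts/inference.sh coffee runs/coffee/adapter
```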
If you find this work useful, please cite:

```bibtex
@article{xia2025robotic,
  title={Robotic Policy Learning via Human-assisted Action Preference Optimization},
  author={Xia, Wenke and Yang, Yichu and Wu, Hongtao and Ma, Xiao and Kong, Tao and Hu, Di},
  journal={arXiv preprint arXiv:2506.07127},
  year={2025}
}
```