Minho Park*, Kinam Kim*, Junha Hyung, Hyojin Jang, Hoiyeong Jin, Jooyeol Yun, Hojoon Lee, and Jaegul Choo
DAVIAN Robotics, KAIST AI
arXiv 2025. (* indicates equal contribution)
Action Coherence Guidance (ACG) is a training-free, test-time guidance algorithm that improves temporal and spatial action consistency in Vision-Language-Action (VLA) models. It mitigates motion jitter, unintended pauses, and trajectory drift caused by noisy demonstrations, resulting in stable and precise robotic manipulation.
*(Demo video: Vanilla GR00T-N1 vs. Ours)*
- Training-Free Guidance: Enhances action coherence during inference without retraining or fine-tuning.
- Plug-and-Play Integration: Seamlessly compatible with existing diffusion and flow-matching VLA policies across multiple benchmarks.
- Proven Performance: Demonstrates consistent improvements in success rate on RoboCasa, DexMimicGen, and real-world SO-101 manipulation tasks.
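ACG's exact update rule is defined in the paper; as intuition, here is a minimal sketch assuming it follows the perturbed-prediction pattern implied by the rollout options used later in this README (`algo.guidance.scale`, `algo.guidance.skip_blocks`): contrast the policy's full velocity prediction with a weakened one whose listed self-attention blocks are skipped, then extrapolate away from the weakened prediction. The function names and the `skip_attn_blocks` hook below are hypothetical, not this repository's API.

```python
import torch

def acg_velocity(model, x_t, t, obs, scale=3.0, skip_blocks=(7, 9, 11)):
    """One guided velocity step (schematic reading, not the authors' code).

    `model` is a flow-matching action head; `skip_attn_blocks` is a
    hypothetical hook that disables self-attention in the listed blocks.
    """
    v_full = model(x_t, t, obs)                                # coherent prediction
    v_weak = model(x_t, t, obs, skip_attn_blocks=skip_blocks)  # attention-skipped
    # Extrapolate away from the temporally incoherent prediction
    # (CFG-style convention; the paper may parameterize the scale differently).
    return v_weak + scale * (v_full - v_weak)

@torch.no_grad()
def sample_action_chunk(model, obs, horizon=16, action_dim=7, n_steps=10):
    """Euler integration of the guided flow from noise to an action chunk."""
    x = torch.randn(1, horizon, action_dim)
    for i in range(n_steps):
        t = torch.full((1,), i / n_steps)
        x = x + acg_velocity(model, x, t, obs) / n_steps
    return x
```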
To set up the environment:

```bash
# Create conda environment
conda create -n acg python=3.10 -y
conda activate acg
# Install PyTorch (adjust CUDA version if needed)
pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu128
# Install core dependencies
pip install -e libs/Isaac-GR00T-N1
pip install --no-build-isolation flash-attn==2.8.3
# Robosuite and RoboCasa
pip install libs/robosuite/
pip install -e libs/robocasa/
# Download RoboCasa assets (~5GB)
python libs/robocasa/robocasa/scripts/download_kitchen_assets.py
# Robomimic and DexMimicGen
pip install -e libs/robomimic/ --no-dependencies
pip install -e libs/dexmimicgen/ --no-dependencies
# Remaining requirements
pip install -r requirements.txt
```
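As a quick sanity check (a minimal snippet, assuming the versions installed above), verify that the heavyweight dependencies import and that CUDA is visible:

```python
import torch
import flash_attn
import robosuite
import robocasa

print(torch.__version__, torch.cuda.is_available())  # expect 2.7.1 and True
print(flash_attn.__version__)                        # expect 2.8.3
```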
To the best of our knowledge, this is the first public repository offering post-trained GR00T-N1-2B models and rollout scripts on both the RoboCasa and DexMimicGen benchmarks.

- RoboCasa: the reproduced score (32.6) closely matches the reported result (32.1).
- DexMimicGen: the reproduced score (40.6) is lower than the reported score (58.5); the cause is under investigation.
⚠️ This is an unofficial reproduction. Contributions and issue reports are highly welcome.
- Self-contained Finetuning & Inference for GR00T-N1-2B.
- Training script: `gr00t_finetune_robocasa.py`
  - Utilizes a modified Robomimic DataLoader to finetune GR00T-N1-2B on the RoboCasa and DexMimicGen datasets.
  - Supports multiple embodiments: SinglePandaGripper for RoboCasa, and BimanualPandaGripper, BimanualPandaHand, and GR1 for DexMimicGen.
  - Note: the official GR00T-N1-2B post-training code only supports the humanoid (GR1) embodiment.
- Rollout script: `rollout_with_robomimic.py`
  - Adapted from the Robomimic rollout framework.
  - You can directly perform rollouts using our finetuned models from Hugging Face (see the snippet below).
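The checkpoints can also be pulled from the Hub ahead of time if you prefer an explicit local path (a small sketch using `huggingface_hub`; the repo id is the one used in the rollout examples below):

```python
from huggingface_hub import snapshot_download

# Download the finetuned RoboCasa checkpoint and print its local path;
# pass this path (or the Hub id directly) as model_path for rollouts.
local_dir = snapshot_download(
    "DAVIAN-Robotics/GR00T-N1-2B-tuned-RoboCasa-MG100-FrankaPandaGripper"
)
print(local_dir)
```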
We provide RoboCasa examples below.
For DexMimicGen and more detailed scripts, see `./scripts/run_gr00t.md`.
Before starting, make sure to download each dataset from its official repository: RoboCasa, DexMimicGen.
Then, update the dataset paths in the Robomimic config files accordingly.
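For example, repointing a config at your local dataset might look like the sketch below; note that `train.data` is where standard Robomimic configs keep the dataset path, so double-check the actual key layout in `libs/Isaac-GR00T-N1/robomimic_configs/` before relying on it:

```python
import json

cfg_path = "libs/Isaac-GR00T-N1/robomimic_configs/robocasa_mg100.json"
with open(cfg_path) as f:
    cfg = json.load(f)

# Standard Robomimic configs keep the dataset location under train.data;
# replace the placeholder with your local RoboCasa dataset path.
cfg["train"]["data"] = "/path/to/robocasa/datasets/demo.hdf5"

with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=4)
```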
n_mg="100"
ngpu="1"
bs="64"
ga="2"
steps="60000"
training_seed="42"
exp_name="MG${n_mg}/LR=1e-4_Bs=${ngpu}x${bs}x${ga}_Steps=${steps}_Seed=${training_seed}${note}"
export WANDB_ENTITY="your-entity"
export WANDB_PROJECT="Your Robot Project"
python libs/Isaac-GR00T-N1/scripts/gr00t_finetune_robocasa.py \
--num-gpus ${ngpu} \
--output-dir checkpoints/robocasa/${exp_name} \
--data-configs robocasa_single_panda_gripper \
--video-backend decord \
--embodiment_tag single_panda_gripper \
--exp_name ${exp_name} \
--batch_size ${bs} \
--robomimic_config_json libs/Isaac-GR00T-N1/robomimic_configs/robocasa_mg${n_mg}.json \
--gradient_accumulation_steps ${ga} \
--no-save-only-model \
--dataloader_num_workers 16 \
--pin_memory \
--max-steps ${steps} \
--save_steps 1000 \
--save_total_limit 3 \
--training_seed ${training_seed}
```

Without ACG:

```bash
note=""
n_rollouts="24"
num_batch_envs="8"
export MAX_NUM_EMBODIMENTS="32"
dataset_name="robocasa_mg100"
config_path="libs/Isaac-GR00T-N1/robomimic_configs/${dataset_name}.json"
model_path="DAVIAN-Robotics/GR00T-N1-2B-tuned-RoboCasa-MG100-FrankaPandaGripper"
seed="123"
bash scripts/base_rollout.sh ${config_path} ${model_path} ${seed} ${n_rollouts} ${num_batch_envs} "${note}"
```

With ACG enabled:

```bash
note=""
n_rollouts="24"
num_batch_envs="8"
export MAX_NUM_EMBODIMENTS="32"
dataset_name="robocasa_mg100"
config_path="libs/Isaac-GR00T-N1/robomimic_configs/${dataset_name}.json"
model_path="DAVIAN-Robotics/GR00T-N1-2B-tuned-RoboCasa-MG100-FrankaPandaGripper"
acg_options="
algo.guidance.name=acg
algo.guidance.scale=3.0
algo.guidance.skip_blocks=7,9,11
" # Corresponding to 4th–6th self-attention layers.
note="${note}_acg"
seed="123"
bash scripts/base_rollout.sh ${config_path} ${model_path} ${seed} ${n_rollouts} ${num_batch_envs} "${note}" ${acg_options}
```

If you find this work useful, please cite:

```bibtex
@article{park2025acg,
title={ACG: Action Coherence Guidance for Flow-based VLA Models},
author={Park, Minho and Kim, Kinam and Hyung, Junha and Jang, Hyojin and Jin, Hoiyeong and Yun, Jooyeol and Lee, Hojoon and Choo, Jaegul},
journal={arXiv preprint arXiv:2510.22201},
year={2025}
}
```

This repository builds upon the incredible open-source efforts of
Isaac-GR00T,
Robosuite,
Robomimic,
RoboCasa,
DexMimicGen, and
Diffusers.
We sincerely appreciate their outstanding contributions to the robotics and AI community.