Official code repository for "PointAction: 3D Points as Universal Action Representations for Robot Control".
Mutian Tong†, Han Jiang†,*, Qiao Feng, Lingjie Liu, Jiatao Gu
University of Pennsylvania
†Equal contribution | *Work done during internship at the University of Pennsylvania
PointAction bridges video prediction and robot control through explicit point-based 4D modeling. We fine-tune a foundation video model to jointly generate RGB frames and dynamic 3D pointmaps, then map these embodiment-agnostic point dynamics to executable actions via a lightweight per-arm decoder.
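Until the code is released, the sketch below illustrates only the general idea of turning 3D point motion into an action. It is not the PointAction decoder, which the description above says is a lightweight per-arm decoder. As a stand-in, it uses the classical Kabsch algorithm to recover the rigid rotation and translation that best align a set of 3D points at one timestep with the same points at the next. The function name `rigid_delta_from_points`, the use of NumPy, and the assumption that the tracked points lie on a rigid end effector are all hypothetical choices made for illustration.

```python
import numpy as np


def rigid_delta_from_points(p_t, p_next):
    """Least-squares rigid transform (R, t) such that p_next ~= p_t @ R.T + t.

    Hypothetical helper for illustration (Kabsch algorithm); not part of the
    PointAction codebase. Both inputs are (N, 3) arrays of corresponding points.
    """
    p_t = np.asarray(p_t, dtype=float)
    p_next = np.asarray(p_next, dtype=float)

    # Center both point sets on their centroids.
    c_t = p_t.mean(axis=0)
    c_next = p_next.mean(axis=0)

    # 3x3 cross-covariance between the centered point sets.
    H = (p_t - c_t).T @ (p_next - c_next)
    U, _, Vt = np.linalg.svd(H)

    # Guard against a reflection so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Translation that maps the rotated source centroid onto the target centroid.
    t = c_next - R @ c_t
    return R, t


# Usage: recover a known motion from synthetic points (assumed noise-free).
rng = np.random.default_rng(0)
points = rng.normal(size=(50, 3))
theta = 0.3
R_true = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta), np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
t_true = np.array([0.1, -0.2, 0.05])
moved = points @ R_true.T + t_true

R_est, t_est = rigid_delta_from_points(points, moved)
```

A closed-form fit like this only applies when the points move rigidly together. A learned decoder, as described above, would not be limited to that case.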
We are actively cleaning up the training and inference codebase. Initial release is expected by early July. ⭐ Star this repo to be notified when the code drops.
In the meantime, please refer to:
If you find this work useful in your research, please consider citing:
@misc{tong2026pointaction3dpointsuniversal,
  title={PointAction: 3D Points as Universal Action Representations for Robot Control},
  author={Mutian Tong and Han Jiang and Qiao Feng and Lingjie Liu and Jiatao Gu},
  year={2026},
  eprint={2606.03943},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2606.03943},
}