Qiqi Liu1,2,3*, Huan Xu3*, Jingyu Li1,2,3, Bin Sun3†, Zhihui Hao3†, Dangen She3, Xiatian Zhu4, Li Zhang1,2‡
1Fudan University; 2Shanghai Innovation Institute; 3Li Auto Inc.; 4University of Surrey
* equal contribution; † project leader; ‡ corresponding author
- Interleaved world modeling and planning: alternates future frame prediction and ego action/trajectory generation step-by-step, forming a closed-loop interaction that keeps planning conditioned on imagined observations.
- Unified autoregressive VLA formulation: generates visual tokens and action queries in a single sequence, tightly coupling prediction and control under temporal causality.
- Depth integration for geometric cues: augments historical frames with monocular depth maps and fuses geometry features via cross-attention to improve long-horizon scene prediction.
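
The interleaving described above can be illustrated with a minimal control-flow sketch. This is not the released implementation: `interleaved_rollout`, `toy_predict_frame`, and `toy_plan_action` are hypothetical names, and the two toy functions are simple stand-ins for the model's visual-token and action-query heads. The sketch only shows the ordering: at each step a future frame is imagined first, then the next action is planned with that imagined frame in its context.

```python
from typing import Callable, List, Sequence, Tuple

Frame = List[float]   # hypothetical stand-in for one frame's visual tokens
Action = List[float]  # hypothetical stand-in for one ego action / waypoint


def interleaved_rollout(
    history: Sequence[Frame],
    predict_frame: Callable[[Sequence[Frame], Sequence[Action]], Frame],
    plan_action: Callable[[Sequence[Frame], Sequence[Action]], Action],
    steps: int,
) -> Tuple[List[Frame], List[Action]]:
    """Alternate frame prediction and action generation for `steps` steps."""
    frames: List[Frame] = list(history)
    actions: List[Action] = []
    for _ in range(steps):
        # 1) Imagine the next observation from all frames and actions so far.
        frames.append(predict_frame(frames, actions))
        # 2) Plan the next action conditioned on the newly imagined frame.
        actions.append(plan_action(frames, actions))
    # Return only the imagined frames (history excluded) and the planned actions.
    return frames[len(history):], actions


def toy_predict_frame(frames: Sequence[Frame], actions: Sequence[Action]) -> Frame:
    # Toy dynamics (assumption): shift the last frame by the last action value.
    shift = actions[-1][0] if actions else 0.0
    return [x + shift for x in frames[-1]]


def toy_plan_action(frames: Sequence[Frame], actions: Sequence[Action]) -> Action:
    # Toy policy (assumption): act on the mean of the latest imagined frame.
    latest = frames[-1]
    return [sum(latest) / len(latest)]


if __name__ == "__main__":
    imagined, planned = interleaved_rollout(
        [[0.0, 2.0]], toy_predict_frame, toy_plan_action, steps=3
    )
    print(imagined)
    print(planned)
```

Because each action is appended to the same running context as the frames, later frame predictions also depend on earlier actions, which is the causal coupling the single-sequence formulation relies on.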
Table 1. Closed-loop planning results on NAVSIM.
Table 2. World modeling / prediction results on NAVSIM.
Visualization.
- Release arXiv paper
- Release code
- Release model weights
@article{liu2026uniworld,
title = {Uni-World VLA: Interleaved World Modeling and Planning for Autonomous Driving},
author = {Liu, Qiqi and Xu, Huan and Li, Jingyu and Sun, Bin and Hao, Zhihui and She, Dangen and Zhu, Xiatian and Zhang, Li},
journal = {arXiv preprint arXiv:2603.27287},
year = {2026},
}


