DriveWAM: Video Generative Priors Enable Scalable World-Action Modeling for Autonomous Driving

Paper | Project Page

Chen Shi*, Jinrui Xu*, Shaoshuai Shi, Kehua Sheng, Bo Zhang, Li Jiang†
The Chinese University of Hong Kong, Shenzhen & Voyager Research, Didi Chuxing
*Equal Contribution, †Corresponding Author

Highlights

Unified Video-Action Policy: Adapts a pretrained video diffusion transformer (Wan2.2-TI2V-5B) into an end-to-end driving policy via joint flow-matching over video and action tokens.
Scene-Evolving Driving Guidance: A frozen VLM (Qwen3-VL-8B) generates chunk-specific semantic intent, which is injected into the policy via temporally localized cross-attention.
Selective KV Memory: Training-free, modality-aware cache selection achieves a 12x memory reduction for 300 s rollouts with minimal accuracy loss.
Strong Performance: 90.1 PDMS on NAVSIM v1 (single front-view camera, simple regression head) and 0.83 m ADE@4s on the PhysicalAI-Autonomous-Vehicles benchmark.
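The README describes the training objective only as "joint flow-matching over video and action tokens". The sketch below shows one way such an objective can look. It assumes a common rectified-flow convention: the noisy sample is x_t = (1 - t)·noise + t·data and the regression target is the velocity data - noise, with one timestep shared by both modalities. The plain-list tensors, the `model` callable, and the `action_weight` parameter are hypothetical stand-ins, not DriveWAM's code.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]


def interpolate(noise: Vector, data: Vector, t: float) -> Vector:
    """Rectified-flow path (one common convention): x_t = (1 - t) * noise + t * data."""
    return [(1.0 - t) * n + t * d for n, d in zip(noise, data)]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def joint_flow_matching_loss(
    video: Vector,
    action: Vector,
    model: Callable[[Vector, float], Vector],
    t: float,
    rng: random.Random,
    action_weight: float = 1.0,  # hypothetical per-modality weighting
) -> Tuple[float, float]:
    """Return (video_loss, action_loss) for one sample.

    Video and action tokens are concatenated into one sequence and noised with
    a shared timestep t. The model predicts the velocity for the whole
    sequence, and the loss is split back by modality.
    """
    data = video + action
    noise = [rng.gauss(0.0, 1.0) for _ in data]
    x_t = interpolate(noise, data, t)
    target = [d - n for n, d in zip(noise, data)]  # velocity = data - noise
    pred = model(x_t, t)
    n_video = len(video)
    video_loss = mse(pred[:n_video], target[:n_video])
    action_loss = action_weight * mse(pred[n_video:], target[n_video:])
    return video_loss, action_loss
```

Putting both modalities in one sequence lets the backbone's self-attention couple predicted future frames and actions. Whether DriveWAM shares the timestep across modalities is not stated here, so treat that choice as an assumption.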
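"Temporally localized cross-attention" can be read as a cross-attention mask that lets each video or action token attend only to the guidance tokens produced for its own chunk. The sketch below builds such a mask and applies it in a softmax. It assumes fixed-size chunks and a separate block of VLM guidance tokens per chunk. The function names and chunk sizes are hypothetical and are not taken from DriveWAM.

```python
import math
from typing import List


def chunk_cross_attention_mask(
    num_chunks: int,
    query_tokens_per_chunk: int,
    guidance_tokens_per_chunk: int,
) -> List[List[bool]]:
    """mask[q][k] is True when query token q may attend to guidance token k.

    Query token q belongs to chunk q // query_tokens_per_chunk, and guidance
    token k belongs to chunk k // guidance_tokens_per_chunk. Attention is only
    allowed within the same chunk, so each chunk sees only its own intent.
    """
    num_q = num_chunks * query_tokens_per_chunk
    num_k = num_chunks * guidance_tokens_per_chunk
    return [
        [q // query_tokens_per_chunk == k // guidance_tokens_per_chunk for k in range(num_k)]
        for q in range(num_q)
    ]


def masked_softmax(scores: List[float], allowed: List[bool]) -> List[float]:
    """Softmax over allowed positions only; masked positions get weight 0.

    Assumes at least one position is allowed.
    """
    m = max(s for s, ok in zip(scores, allowed) if ok)  # for numerical stability
    exps = [math.exp(s - m) if ok else 0.0 for s, ok in zip(scores, allowed)]
    total = sum(exps)
    return [e / total for e in exps]
```

With two chunks, a query in the first chunk puts zero weight on the second chunk's guidance tokens, even when those tokens have high raw scores.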
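The README does not give the selection rule behind Selective KV Memory. The sketch below shows one plausible training-free, modality-aware policy under stated assumptions. Each cached key/value entry carries a modality tag and an importance score, for example accumulated attention weight. For each modality, the cache keeps a few of the newest entries plus the highest-scoring older ones, up to a per-modality budget. `CacheEntry`, the modality tags, the budgets, and the scoring are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CacheEntry:
    step: int       # rollout step at which this key/value was written
    modality: str   # hypothetical tags, e.g. "video", "action", "text"
    score: float    # assumed importance, e.g. accumulated attention weight


def select_kv_cache(
    entries: List[CacheEntry],
    budgets: Dict[str, int],
    keep_recent: int,
) -> List[CacheEntry]:
    """Per modality, keep the `keep_recent` newest entries plus the
    highest-scoring older entries, up to budgets[modality] in total.
    Modalities missing from `budgets` are kept in full."""
    kept: List[CacheEntry] = []
    for modality in sorted({e.modality for e in entries}):
        group = sorted((e for e in entries if e.modality == modality), key=lambda e: e.step)
        budget = budgets.get(modality, len(group))
        n_recent = min(keep_recent, budget, len(group))
        recent = group[len(group) - n_recent:]   # newest entries are always kept
        older = group[: len(group) - n_recent]
        best_older = sorted(older, key=lambda e: e.score, reverse=True)[: budget - n_recent]
        kept.extend(recent + best_older)
    return sorted(kept, key=lambda e: (e.step, e.modality))
```

Under a scheme like this, the cache size is capped by the sum of the budgets regardless of rollout length, which is the general mechanism by which a selective cache can bound memory on long rollouts.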
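For reference, NAVSIM v1's PDM score multiplies two penalty terms, no at-fault collision (NC) and drivable-area compliance (DAC), by a weighted average of ego progress (EP), time-to-collision (TTC), and comfort (C), using weights 5, 5, and 2. The sketch below combines already-computed sub-scores in this way. It does not compute the sub-scores themselves, which requires NAVSIM's simulation. Consult the NAVSIM documentation for the authoritative definition.

```python
from statistics import mean
from typing import Dict, List


def pdm_score(sub: Dict[str, float]) -> float:
    """Per-scenario PDM score from precomputed sub-scores in [0, 1]:
    NC * DAC * (5 * EP + 5 * TTC + 2 * C) / 12."""
    penalty = sub["NC"] * sub["DAC"]  # multiplicative penalties
    weighted = (5.0 * sub["EP"] + 5.0 * sub["TTC"] + 2.0 * sub["C"]) / 12.0
    return penalty * weighted


def pdms(scenarios: List[Dict[str, float]]) -> float:
    """Benchmark-level score: mean per-scenario score, scaled to 0-100."""
    return 100.0 * mean(pdm_score(s) for s in scenarios)
```

Because NC and DAC multiply the whole score, a single at-fault collision or off-road excursion zeroes that scenario, however good the progress and comfort terms are.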
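ADE@4s is the average Euclidean distance between predicted and ground-truth ego positions over all waypoints up to a 4 s horizon. The minimal sketch below assumes 2D (x, y) waypoints in meters, sampled at a fixed rate and starting one step into the future. The `hz` value in the usage is an assumption, and the PhysicalAI-Autonomous-Vehicles protocol may define sampling and coordinate frames differently.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def ade_at_horizon(pred: List[Point], gt: List[Point], horizon_s: float, hz: float) -> float:
    """Mean Euclidean error (meters) over waypoints with timestamp <= horizon_s.

    Waypoint i is assumed to lie (i + 1) / hz seconds into the future.
    """
    n = int(round(horizon_s * hz))
    if n <= 0 or len(pred) < n or len(gt) < n:
        raise ValueError("trajectories shorter than the requested horizon")
    return sum(math.dist(p, g) for p, g in zip(pred[:n], gt[:n])) / n
```

For example, with `hz=10.0` a 4 s horizon averages the first 40 waypoints. A prediction offset by a constant (3, 4) m from the ground truth gives an ADE@4s of 5.0 m.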

News

[2025/27] Code will be released soon. Stay tuned!

TODO

Release inference code and pretrained checkpoints
Release data preparation scripts
Release training code

Citation

If you find this work useful, please consider citing:

@article{shi2025drivewam,
  title   = {DriveWAM: Video Generative Priors Enable Scalable World-Action Modeling for Autonomous Driving},
  author  = {Shi, Chen and Xu, Jinrui and Shi, Shaoshuai and Sheng, Kehua and Zhang, Bo and Jiang, Li},
  journal = {arXiv preprint arXiv:2605.28544},
  year    = {2025}
}

Acknowledgements

DriveWAM is built upon LinBotVA, Wan2.2, and Qwen3-VL. We thank the authors for their great work.
