This repository contains the implementation of the Generative World Model task from our paper: "Spatial Retrieval Augmented Autonomous Driving".
We introduce a novel Spatial Retrieval Paradigm that retrieves offline geographic images (Satellite/Streetview) based on GPS coordinates to enhance autonomous driving tasks. For Detection, we design a plug-and-play Spatial Retrieval Adapter and a Reliability Estimation Gate to robustly fuse this external knowledge into model representations, following the retrieval injection mode of Bench2Drive-R.
We provide implementations based on Unimlvg and MagicDriveDiT, each finetuned from its official checkpoint. For MagicDriveDiT, please check the magicdrivedit branch.
- [2025-12-09] Code and checkpoints for Generative World Model (Unimlvg & MagicDriveDiT) are released.
| Method | Modality | FVD | FID | Config | Download |
|---|---|---|---|---|---|
| Unimlvg | C | 36.11 | 5.82 | - | - |
| Unimlvg + Geo | C + Geo | 29.97 | 5.60 | config | model |
C: Camera, Geo: Geographic Images.
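To put the table in relative terms, the short snippet below computes the percentage reduction of each metric when geographic images are added. It uses only the FVD and FID values listed above; the helper name `relative_reduction` is ours and is not part of the repository.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Return the percentage by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline * 100.0


# Values copied from the table above (lower is better for both metrics).
fvd_gain = relative_reduction(36.11, 29.97)
fid_gain = relative_reduction(5.82, 5.60)

print(f"FVD reduction: {fvd_gain:.1f}%")  # about 17.0%
print(f"FID reduction: {fid_gain:.1f}%")  # about 3.8%
```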
- The adapter is added at lines 618–661 in src/dwm/models/crossview_temporal_dit.py, and the blocks are defined in lines 641–750 of src/dwm/models/crossview_temporal.py.
- The training/sampling pipeline is in src/dwm/pipelines/ctsd.py; no other features are modified.
- All experimental configurations are located in configs/ggearth/geo_train.json and configs/ggearth/geo_test.json.
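To see what actually changes between the training and test configurations, a small comparison helper can be useful. The sketch below is not part of the repository: it assumes both files are plain JSON (no comments), and the function names `flatten`, `diff_configs`, and `print_config_diff` are hypothetical.

```python
import json
from typing import Any, Dict, Iterator, Tuple

MISSING = "<missing>"  # placeholder shown when a key exists in only one config


def flatten(node: Any, prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (dotted_key, leaf_value) pairs for nested dicts and lists.

    Empty dicts and lists produce no entries.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from flatten(value, f"{prefix}[{index}]")
    else:
        yield prefix, node


def diff_configs(a: dict, b: dict) -> Dict[str, Tuple[Any, Any]]:
    """Map every dotted key whose value differs to (value_in_a, value_in_b)."""
    flat_a, flat_b = dict(flatten(a)), dict(flatten(b))
    diff = {}
    for key in sorted(flat_a.keys() | flat_b.keys()):
        left = flat_a.get(key, MISSING)
        right = flat_b.get(key, MISSING)
        if left != right:
            diff[key] = (left, right)
    return diff


def print_config_diff(train_path: str, test_path: str) -> None:
    """Load two JSON configs and print the keys whose values differ."""
    with open(train_path, "r", encoding="utf-8") as handle:
        train = json.load(handle)
    with open(test_path, "r", encoding="utf-8") as handle:
        test = json.load(handle)
    for key, (train_value, test_value) in diff_configs(train, test).items():
        print(f"{key}: train={train_value!r} test={test_value!r}")
```

From the repository root, `print_config_diff("configs/ggearth/geo_train.json", "configs/ggearth/geo_test.json")` would list the differing entries, assuming both files parse as JSON.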
- See Unimlvg: README_intro_zh.md
Please refer to the official dataset configuration instructions to modify the dataset settings.
Optionally, use src/dwm/tools/cache.py to cache HDMap/3DBBox conditions on storage to speed up training.
Configure the geographic data tools following the README of the SpatialRetrievalAD-Dataset-Devkit project, and prepare both the nuScenes-Geography dataset and its devkit.
After installing the geographic data tools, configure the paths and image settings such as resolution (aligned with the nuScenes input size) in geoext_gen.py, then run it to build the streetsat data cache.
Optionally, download the geo pkl from geo_pkl.
Finally, define the paths of pkls and datasets in the config files, and prepare the required official checkpoints.
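Before launching a long run, it can help to confirm that every path written into a config actually exists. The following is a minimal sketch under stated assumptions, not part of the repository: it assumes the configs are plain JSON, it guesses which string values are paths with a simple heuristic, and all names (`find_missing_paths`, `check_config_file`, `PATH_SUFFIXES`) are hypothetical. The heuristic may also flag non-path strings that contain a slash, such as hub model IDs.

```python
import json
import os
from typing import Any, Iterator, List, Tuple

# File extensions treated as paths even without a slash (an assumption).
PATH_SUFFIXES = (".pkl", ".pth", ".pt", ".safetensors", ".json", ".zip")


def iter_strings(node: Any, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (dotted_key, value) for every string leaf in a nested config."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_strings(value, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_strings(value, f"{prefix}[{index}]")
    elif isinstance(node, str):
        yield prefix, node


def looks_like_path(value: str) -> bool:
    """Heuristic: contains a slash or ends with a known suffix, and is not a URL."""
    if "://" in value:
        return False
    return "/" in value or value.endswith(PATH_SUFFIXES)


def find_missing_paths(config: dict, root: str = ".") -> List[Tuple[str, str]]:
    """Return (key, path) pairs for path-like values that do not exist on disk.

    Relative paths are resolved against `root`; absolute paths are used as-is.
    """
    missing = []
    for key, value in iter_strings(config):
        if looks_like_path(value) and not os.path.exists(os.path.join(root, value)):
            missing.append((key, value))
    return missing


def check_config_file(config_path: str, root: str = ".") -> List[Tuple[str, str]]:
    """Load a JSON config and report its missing path-like values."""
    with open(config_path, "r", encoding="utf-8") as handle:
        return find_missing_paths(json.load(handle), root)
```

For example, `check_config_file("configs/ggearth/geo_train.json")` run from the repository root returns an empty list when every detected path resolves.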
Train with 8 GPUs
scripts/geo_train.sh
Eval with 8 GPUs (after modifying the ckpt paths in the config)
scripts/8cardtest.sh
@misc{spad,
title={Spatial Retrieval Augmented Autonomous Driving},
author={Xiaosong Jia and Chenhe Zhang and Yule Jiang and Songbur Wong and Zhiyuan Zhang and Chen Chen and Shaofeng Zhang and Xuanhe Zhou and Xue Yang and Junchi Yan and Yu-Gang Jiang},
year={2025},
eprint={2512.06865},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2512.06865},
}
Thanks to UniMLVG for their open-source effort.
