SpatialRetrievalAD/Generative-World-Model


Spatial Retrieval Augmented Autonomous Driving

Task: Generative World Model

arXiv Project Page

📖 Introduction

This repository contains the implementation of the Generative World Model task from our paper: "Spatial Retrieval Augmented Autonomous Driving".

We introduce a novel Spatial Retrieval Paradigm that retrieves offline geographic images (Satellite/Streetview) based on GPS coordinates to enhance autonomous driving tasks. For Detection, we design a plug-and-play Spatial Retrieval Adapter and a Reliability Estimation Gate to robustly fuse this external knowledge into model representations, following the retrieval injection mode of Bench2Drive-R.
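To make the retrieval idea concrete, the following is a minimal sketch of looking up the nearest offline geographic image for a GPS query. It is not the code used in this repository: the database layout (a list of dicts with `lat`, `lon`, and `image` keys), the `max_dist_m` threshold, and the sample coordinates and file names are all assumptions chosen for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def retrieve_nearest(query, database, max_dist_m=50.0):
    """Return the entry closest to `query`, or None if nothing lies within max_dist_m.

    `database` is a hypothetical list of dicts with "lat", "lon" and "image" keys.
    """
    best, best_d = None, float("inf")
    for entry in database:
        d = haversine_m(query[0], query[1], entry["lat"], entry["lon"])
        if d < best_d:
            best, best_d = entry, d
    return best if best_d <= max_dist_m else None


# Toy offline database (coordinates and file names are made up).
db = [
    {"lat": 1.2990, "lon": 103.7870, "image": "sat_000.png"},
    {"lat": 1.3000, "lon": 103.7880, "image": "sat_001.png"},
]
hit = retrieve_nearest((1.29995, 103.78795), db)
```

A linear scan is enough for a toy database; a spatial index would be the natural replacement at dataset scale. Returning `None` when no image is close enough leaves the decision of how to handle missing retrievals to the caller.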

We provide implementations based on UniMLVG and MagicDriveDiT, fine-tuned from the official checkpoints. For MagicDriveDiT, please check the magicdrivedit branch.

🚀 News

  • [2025-12-09] Code and checkpoints for the Generative World Model (UniMLVG & MagicDriveDiT) are released.

📊 Main Results

Unimlvg

| Method | Modality | FVD | FID | Config | Download |
|---|---|---|---|---|---|
| Unimlvg | C | 36.11 | 5.82 | - | - |
| Unimlvg + Geo | C + Geo | 29.97 | 5.60 | config | model |

C: Camera, Geo: Geographic Images.
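FVD and FID are both lower-is-better metrics, so the effect of adding geographic images can be read as a relative reduction. The short calculation below uses only the numbers reported in the table; the helper name `relative_reduction` is ours.

```python
def relative_reduction(baseline, improved):
    """Percentage reduction of `improved` relative to `baseline`."""
    return 100.0 * (baseline - improved) / baseline


fvd_gain = relative_reduction(36.11, 29.97)  # Unimlvg -> Unimlvg + Geo
fid_gain = relative_reduction(5.82, 5.60)

print(f"FVD reduction: {fvd_gain:.1f}%")  # FVD reduction: 17.0%
print(f"FID reduction: {fid_gain:.1f}%")  # FID reduction: 3.8%
```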

🧩 Code Overview

  • The adapter is added at lines 618–661 of src/dwm/models/crossview_temporal_dit.py, and its blocks are defined at lines 641–750 of src/dwm/models/crossview_temporal.py.
  • The training/sampling pipeline is in src/dwm/pipelines/ctsd.py; no other features are modified.
  • All experimental configurations are located in configs/ggearth/geo_train.json and configs/ggearth/geo_test.json.

📦 Installation

Please follow the official installation instructions to configure the environment:

  • See Unimlvg: README_intro_zh.md

📂 Data Preparation

Step 1: Prepare Base Dataset (Following Unimlvg Workflow)

Please refer to the official dataset configuration instructions to modify the dataset settings.

Optionally, use src/dwm/tools/cache.py to cache the HDMap/3DBBox conditions on disk to speed up training.

Step 2: Generate Geographic Data (nuScenes-Geography-Data)

Configure the geographic data tools following the README of the SpatialRetrievalAD-Dataset-Devkit project, and prepare both the nuScenes-Geography dataset and its devkit.

After installing the geographic data tools, configure the paths and image settings such as resolution (aligned with the nuScenes input size) in geoext_gen.py, then run it to build the streetsat data cache.

Alternatively, download the geo pkl files from geo_pkl.
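Before pointing the configs at a generated or downloaded pkl file, it can help to check what it contains. The helper below is a generic sketch built on the standard `pickle` module; it makes no assumption about the actual layout of the geo pkl files, and the path in the usage comment is a placeholder.

```python
import pickle
from pathlib import Path


def summarize_pickle(path):
    """Load a pickle file and return a short description of its top-level object."""
    # Only unpickle files from sources you trust: pickle can execute code on load.
    with Path(path).open("rb") as f:
        obj = pickle.load(f)
    summary = {"type": type(obj).__name__}
    if hasattr(obj, "__len__"):
        summary["len"] = len(obj)
    if isinstance(obj, dict):
        # Show a few keys so the layout can be compared with what the configs expect.
        summary["sample_keys"] = list(obj)[:5]
    return summary


# Usage (placeholder path, replace with your own file):
# print(summarize_pickle("path/to/geo.pkl"))
```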

Finally, set the paths to the pkl files and datasets in the config files, and prepare the required official checkpoints.
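A quick sanity check before launching a run is to scan a config for file paths that do not exist yet. The sketch below walks any JSON structure and reports string values that end in a file-like suffix but are missing on disk. The suffix list is a guess rather than something taken from this repository, and remote or relative paths may be reported even when they are valid, so treat the output as a hint only.

```python
import json
from pathlib import Path


def iter_strings(node, prefix=""):
    """Yield (json_path, value) for every string leaf in a nested JSON structure."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_strings(value, f"{prefix}/{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_strings(value, f"{prefix}/{index}")
    elif isinstance(node, str):
        yield prefix, node


def missing_paths(config, suffixes=(".pkl", ".json", ".pth", ".ckpt", ".safetensors")):
    """Return string leaves that look like file paths but do not exist on disk."""
    missing = []
    for where, value in iter_strings(config):
        # The suffix list is an assumption; extend it to match your own files.
        if value.endswith(suffixes) and not Path(value).exists():
            missing.append((where, value))
    return missing


def check_config_file(config_path):
    """Load a JSON config file and report the paths in it that are missing."""
    with open(config_path, "r", encoding="utf-8") as f:
        return missing_paths(json.load(f))


# Usage: print(check_config_file("configs/ggearth/geo_train.json"))
```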

🚄 Training & Evaluation

Train with 8 GPUs

scripts/geo_train.sh

Eval with 8 GPUs (after modifying the checkpoint paths in the config)

scripts/8cardtest.sh

🖊️ Citation

@misc{spad,
      title={Spatial Retrieval Augmented Autonomous Driving}, 
      author={Xiaosong Jia and Chenhe Zhang and Yule Jiang and Songbur Wong and Zhiyuan Zhang and Chen Chen and Shaofeng Zhang and Xuanhe Zhou and Xue Yang and Junchi Yan and Yu-Gang Jiang},
      year={2025},
      eprint={2512.06865},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.06865}, 
}

🙏 Acknowledgements

Thanks to UniMLVG for their open-source work.
