MagicDrive

✨ Check out our new work MagicDrive3D on 3D scene generation!

✨ If you want video generation, please find the code in the video branch.

Videos generated by MagicDrive (click the image to see the video).

This repository contains the implementation of the paper

MagicDrive: Street View Generation with Diverse 3D Geometry Control
Ruiyuan Gao¹*, Kai Chen²*, Enze Xie³^, Lanqing Hong³, Zhenguo Li³, Dit-Yan Yeung², Qiang Xu¹^
¹CUHK ²HKUST ³Huawei Noah's Ark Lab
*Equal Contribution ^Corresponding Authors

Abstract

TL;DR: MagicDrive generates high-quality street-view images & videos with diverse 3D geometry control and multiview consistency, which can be used as a data engine in various perception tasks.

Recent advancements in diffusion models have significantly enhanced the data synthesis with 2D control. Yet, precise 3D control in street view generation, crucial for 3D perception tasks, remains elusive. Specifically, utilizing Bird's-Eye View (BEV) as the primary condition often leads to challenges in geometry control (e.g., height), affecting the representation of object shapes, occlusion patterns, and road surface elevations, all of which are essential to perception data synthesis, especially for 3D object detection tasks. In this paper, we introduce MagicDrive, a novel street view generation framework, offering diverse 3D geometry controls including camera poses, road maps, and 3D bounding boxes, together with textual descriptions, achieved through tailored encoding strategies. Besides, our design incorporates a cross-view attention module, ensuring consistency across multiple camera views. With MagicDrive, we achieve high-fidelity street-view image & video synthesis that captures nuanced 3D geometry and various scene descriptions, enhancing tasks like BEV segmentation and 3D object detection.

News

[2024/12/09] We release a 60-frame video generation model on huggingface. Please use the code in the video branch to run it.
[2024/12/09] We release two higher-resolution image generation models (424x800 model for visualization and 272x736 model for BEVFusion) with their training configs.
[2024/06/07] MagicDrive can generate 60-frame videos! We release the config: rawbox_mv2.0t_0.4.3_60.yaml. Check out our demos on the project page.
[2024/06/07] We release the model checkpoint for 16-frame video generation. Check it out!
[2024/06/01] We hold the W-CODA workshop @ECCV2024. Challenge track 2 will use MagicDrive as the baseline. We will release more resources in the near future. Stay tuned!

Method

In MagicDrive, we employ two strategies (cross-attention and additive encoder branch) to inject text prompts, camera poses, object boxes, and road maps as conditions for generation. We also propose a cross-view attention module for multiview consistency.
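
To make the cross-view idea concrete, the sketch below lets each camera view attend to tokens from its two neighbouring views and adds the result back residually. The neighbour choice, the ring ordering of cameras, the residual addition, and the absence of learned projections are all simplifying assumptions for illustration; the function names are hypothetical and this is not the repository's implementation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys, values):
    """Scaled dot-product attention over lists of feature vectors."""
    scale = math.sqrt(len(queries[0]))
    dim = len(values[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)])
    return out

def cross_view_attention(views):
    """views: one token list per camera; each token is a list of floats.

    Assumption: cameras are ordered in a ring, and each view attends to the
    tokens of its left and right neighbours. The attended features are added
    to the original tokens as a residual.
    """
    n = len(views)
    result = []
    for i, tokens in enumerate(views):
        context = views[(i - 1) % n] + views[(i + 1) % n]
        update = attend(tokens, context, context)
        result.append([[t + u for t, u in zip(tok, upd)]
                       for tok, upd in zip(tokens, update)])
    return result
```

The output has the same shape as the input, so a block like this could sit between existing layers without changing the surrounding tensor shapes.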

TODO

config and model checkpoint for base resolution (224x400)
demo for base resolution (224x400)
GUI for interactive bbox editing
train and test code release
FID test code
config and checkpoint for high resolution

Getting Started

Environment Setup

Clone this repo with submodules

git clone --recursive https://github.com/cure-lab/MagicDrive.git

The code is tested with PyTorch==1.10.2 and CUDA 10.2 on V100 servers. To set up the Python environment, follow:

# option1: to run GUI only
pip install -r requirements/gui.txt
# 😍 our GUI does not need mm-series packages.
# continue to install diffusers from `third_party`.

# option2: to run the full testing demo (and also test your env before training)
cd ${ROOT}
pip install -r requirements/dev.txt
# continue to install `third_party`s as following.

We opt to install the source code for the following packages, with cd ${FOLDER}; pip -vvv install .

# install third-party
third_party/
├── bevfusion -> based on db75150
├── diffusers -> based on v0.17.1 (afcca39)
└── xformers  -> based on v0.0.19 (8bf59c9), optional

See the note about our xformers. If you have issues with the environment setup, please check the FAQ first.

Set up the default configuration for accelerate with

accelerate config

Our default log directory is ${ROOT}/magicdrive-log. Please make sure this location is available before running.

Pretrained Weights

Our training is based on stable-diffusion-v1-5. We assume you put the weights at ${ROOT}/pretrained/ as follows:

${ROOT}/pretrained/stable-diffusion-v1-5/
├── text_encoder
├── tokenizer
├── unet
├── vae
└── ...
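
A quick check of this layout can catch path mistakes before launching a demo or a training run. The sketch below is not part of the repository: the helper name `missing_subfolders` is hypothetical, and it only checks the four subfolders named above, not the entries hidden behind `...`.

```python
from pathlib import Path

# Subfolders named in the layout above; the "..." entries are not checked.
EXPECTED = ("text_encoder", "tokenizer", "unet", "vae")

def missing_subfolders(model_dir, expected=EXPECTED):
    """Return the expected subfolder names that are absent from model_dir."""
    model_dir = Path(model_dir)
    return [name for name in expected if not (model_dir / name).is_dir()]

if __name__ == "__main__":
    # Assumes the script is run from ${ROOT}.
    missing = missing_subfolders("pretrained/stable-diffusion-v1-5")
    print("missing:", missing if missing else "none")
```

An empty result means all four subfolders were found.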

Street-view Generation with MagicDrive

Download our model checkpoint for MagicDrive from

224x400 model: onedrive
272x736 model: huggingface
424x800 model: huggingface

and put them in ${ROOT}/pretrained/

Run our demo

👍 We recommend users run our interactive GUI first, because we have minimized the dependencies for the GUI demo.

cd ${ROOT}
python demo/interactive_gui.py
# a gradio-based gui, use your web browser

As suggested in #37, the prompt is configurable through the GUI!

Run our demo for camera view generation.

cd ${ROOT}
python demo/run.py resume_from_checkpoint=magicdrive-log/SDv1.5mv-rawbox_2023-09-07_18-39_224x400

The generated images will be located in magicdrive-log/test. More information can be found in the demo doc.

Train MagicDrive

Prepare Data

We prepare the nuScenes dataset following steps similar to bevfusion's instructions. Specifically,

Download the nuScenes dataset from the website and put it in ./data/. You should have these files:

data/nuscenes
├── maps
├── mini
├── samples
├── sweeps
├── v1.0-mini
└── v1.0-trainval

Tip

You can download the .pkl files from OneDrive. They should be enough for training and testing.

Generate mmdet3d annotation files by:

python tools/create_data.py nuscenes --root-path ./data/nuscenes \
  --out-dir ./data/nuscenes_mmdet3d_2 --extra-tag nuscenes

You should have these files:

data/nuscenes_mmdet3d_2
├── nuscenes_dbinfos_train.pkl (-> ${bevfusion-version}/nuscenes_dbinfos_train.pkl)
├── nuscenes_gt_database (-> ${bevfusion-version}/nuscenes_gt_database)
├── nuscenes_infos_train.pkl
└── nuscenes_infos_val.pkl

Note: As shown above, some files can be soft-linked to the original versions from bevfusion. If some of the files are located in data/nuscenes, you can move them to data/nuscenes_mmdet3d_2 manually.
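
If you already have a bevfusion data directory, the soft links in the listing above can be created with a few lines of Python. This is a sketch only: the helper name `link_from_bevfusion` and the source path in the usage line are hypothetical, and it assumes a platform where `os.symlink` is permitted.

```python
import os
from pathlib import Path

# Entries marked with "->" in the listing above.
LINKED = ("nuscenes_dbinfos_train.pkl", "nuscenes_gt_database")

def link_from_bevfusion(src_dir, dst_dir, names=LINKED):
    """Symlink each existing entry of src_dir into dst_dir; return the names linked."""
    src_dir, dst_dir = Path(src_dir).resolve(), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    linked = []
    for name in names:
        src, dst = src_dir / name, dst_dir / name
        # Skip missing sources and never overwrite an existing target.
        if src.exists() and not dst.exists() and not dst.is_symlink():
            os.symlink(src, dst, target_is_directory=src.is_dir())
            linked.append(name)
    return linked
```

For example, `link_from_bevfusion("/path/to/bevfusion/data/nuscenes", "data/nuscenes_mmdet3d_2")`, where the first argument is a placeholder for your own bevfusion data directory.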

(Optional) To accelerate data loading, we prepared cache files in h5 format for BEV maps. They can be generated through tools/prepare_map_aux.py with different configs in configs/dataset. For example:
```
python tools/prepare_map_aux.py +process=train
python tools/prepare_map_aux.py +process=val
```
You will have files like ./val_tmp.h5 and ./train_tmp.h5. You have to rename the cache files correctly after generating them. Our default is
```
data/nuscenes_map_aux
├── train_26x200x200_map_aux_full.h5 (42G)
└── val_26x200x200_map_aux_full.h5 (9G)
```
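
The renaming step can be scripted. The sketch below assumes the temporary files are in the current directory and that the default names listed above are wanted; `move_cache` is a hypothetical helper, not part of the repository.

```python
import shutil
from pathlib import Path

# Temporary name -> default cache name, as listed above.
RENAMES = {
    "train_tmp.h5": "train_26x200x200_map_aux_full.h5",
    "val_tmp.h5": "val_26x200x200_map_aux_full.h5",
}

def move_cache(src_dir=".", dst_dir="data/nuscenes_map_aux", renames=RENAMES):
    """Move generated cache files into dst_dir under their default names."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for tmp_name, final_name in renames.items():
        src, dst = src_dir / tmp_name, dst_dir / final_name
        if not src.is_file():
            continue  # this split has not been generated yet
        if dst.exists():
            raise FileExistsError(f"refusing to overwrite {dst}")
        shutil.move(str(src), str(dst))
        moved.append(dst)
    return moved
```

Refusing to overwrite is a deliberate choice here, since the cache files are large and slow to regenerate.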

Train the model

Launch training (with 8xV100):

accelerate launch --mixed_precision fp16 --gpu_ids all --num_processes 8 tools/train.py \
  +exp=224x400 runner=8gpus

During training, you can check tensorboard for the log and intermediate results.

Besides, we provide a debug config to test your environment and data loading process (with 2xV100):

accelerate launch --mixed_precision fp16 --gpu_ids all --num_processes 2 tools/train.py \
  +exp=224x400 runner=debug runner.validation_before_run=true

Test the model

After training, you can test your model for driving view generation through:

python tools/test.py resume_from_checkpoint=${YOUR MODEL}
# take our 224x400 model checkpoint as an example
python tools/test.py resume_from_checkpoint=./pretrained/SDv1.5mv-rawbox_2023-09-07_18-39_224x400

Please find the results in ./magicdrive-log/test/.

To test FID

First, you should generate the full validation set with

python perception/data_prepare/val_set_gen.py \
  resume_from_checkpoint=./pretrained/SDv1.5mv-rawbox_2023-09-07_18-39_224x400 \
  task_id=224x400 fid.img_gen_dir=./tmp/224x400 +fid=data_gen +exp=224x400
  # for map=zero as the null condition for CFG, add `runner.pipeline_param.use_zero_map_as_unconditional=true`

For this script, multi-process / multi-node execution is also available through accelerate. Just launch it with commands similar to those used for training.

Then, test the FID score with

# we assume your torch cache dir is at "../pretrained/torch_cache/". If you want
# to use the default place, please comment out the second-to-last line in "tools/fid_score.py".
python tools/fid_score.py cfg \
  resume_from_checkpoint=./pretrained/SDv1.5mv-rawbox_2023-09-07_18-39_224x400 \
  fid.rootb=tmp/224x400

Alternatively, we provide the pre-generated samples for validation set here. You can put them in ./tmp and launch the test through

python tools/fid_score.py cfg \
  resume_from_checkpoint=./pretrained/SDv1.5mv-rawbox_2023-09-07_18-39_224x400 \
  fid.rootb=tmp/224x400/samples  # FID=14.46065995481922
  # or `fid.rootb=tmp/224x400map0/samples`, FID=16.195992872931697
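
For reference, FID compares Gaussian fits to Inception features of real and generated images. The formula below is the standard definition, given on the assumption that tools/fid_score.py follows it; the symbols are not taken from the repository.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Here $(\mu_r, \Sigma_r)$ and $(\mu_g, \Sigma_g)$ are the feature mean and covariance for the real and generated images. Lower values indicate that the two distributions are closer.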

Quantitative Results

Compare MagicDrive with other methods for generation quality:

Training support with images generated from MagicDrive:

More results can be found in the main paper.

Qualitative Results

More results can be found in the main paper.

Cite Us

@inproceedings{gao2023magicdrive,
  title={{MagicDrive}: Street View Generation with Diverse 3D Geometry Control},
  author={Gao, Ruiyuan and Chen, Kai and Xie, Enze and Hong, Lanqing and Li, Zhenguo and Yeung, Dit-Yan and Xu, Qiang},
  booktitle = {International Conference on Learning Representations},
  year={2024}
}

Credit

We adopt the following open-sourced projects:

bevfusion: dataloader to handle 3d bounding boxes and BEV map
diffusers: framework to train stable diffusion
xformers: accelerator for attention mechanism
Thanks to @pixeli99 for training the 60-frame video generation model.
