SceneDreamer: Unbounded 3D Scene Generation from 2D Image Collections

Zhaoxi Chen Guangcong Wang Ziwei Liu

S-Lab, Nanyang Technological University

TPAMI, 2023

TL;DR

SceneDreamer learns to generate unbounded 3D scenes from in-the-wild 2D image collections.
Our method can synthesize diverse landscapes across different styles, with 3D consistency, well-defined depth, and free camera trajectory.

Paper | Project Page | Video | Hugging Face 🤗

Updates

[09/2023] SceneDreamer has been accepted to TPAMI 2023! 🥳

[09/2023] Training code released! 🤩

[04/2023] Hugging Face demo released!

[04/2023] Inference code released!

[02/2023] Paper uploaded to arXiv.

Citation

If you find our work useful for your research, please consider citing this paper:

@article {chen2023sd,
author = {Zhaoxi Chen and Guangcong Wang and Ziwei Liu},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
title = {{SceneDreamer}: Unbounded 3{D} Scene Generation From 2{D} Image Collections},
year = {2023},
volume = {45},
number = {12},
issn = {1939-3539},
pages = {15562-15576},
doi = {10.1109/TPAMI.2023.3321857},
publisher = {IEEE Computer Society},
address = {Los Alamitos, CA, USA},
month = {Dec}
}

Installation

We highly recommend using Anaconda to manage your Python environment. You can set up the required environment with the following commands:

# install python dependencies
conda env create -f environment.yaml
conda activate scenedreamer

# compile third party libraries
export CUDA_VERSION=$(nvcc --version| grep -Po "(\d+\.)+\d+" | head -1)
CURRENT=$(pwd)
for p in correlation channelnorm resample2d bias_act upfirdn2d; do
    cd imaginaire/third_party/${p};
    rm -rf build dist *info;
    python setup.py install;
    cd ${CURRENT};
done

for p in gancraft/voxlib; do
  cd imaginaire/model_utils/${p};
  make all
  cd ${CURRENT};
done

cd gridencoder
python setup.py build_ext --inplace
python -m pip install .
cd ${CURRENT}

# Now, all done!
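After the build finishes, a quick import check can confirm that the compiled extensions are visible to Python. The sketch below is only an illustration: the names in `ASSUMED_MODULES` are guesses based on the directory names in the build steps (the packages may install under different names), so adjust the list to whatever your build actually produced.

```python
import importlib.util

# Assumed module names, guessed from the directory names in the build steps.
# They are NOT confirmed by the repository; edit this list as needed.
ASSUMED_MODULES = ["torch", "gridencoder", "correlation", "channelnorm", "resample2d"]


def find_missing(module_names):
    """Return the names that cannot be located on the current Python path."""
    missing = []
    for name in module_names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised for broken parent packages or modules without a spec.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(ASSUMED_MODULES)
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```

A name reported as missing does not necessarily mean the build failed; it may simply be installed under a different module name.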

Inference

Download Pretrained Models

Please download our checkpoint from Google Drive to run the inference scripts below. You may store the checkpoint in the root directory of this repo:

├── ...
└── SceneDreamer
    ├── inference.py
    ├── README.md
    └── scenedreamer_released.pt

Render!

You can run the following command to generate your own 3D world!

python inference.py --config configs/scenedreamer_inference.yaml --output_dir ./test/ --seed 8888 --checkpoint ./scenedreamer_released.pt
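To render several worlds in one go, the command above can be wrapped in a small driver. The sketch below uses only the flags shown in that command; the helper names `build_inference_command` and `render_seeds`, and the per-seed `seed_<n>` output folders, are conventions invented for this example rather than part of the repository.

```python
import subprocess
import sys


def build_inference_command(seed, output_dir,
                            config="configs/scenedreamer_inference.yaml",
                            checkpoint="./scenedreamer_released.pt"):
    """Assemble the inference command shown above as an argument list."""
    return [
        sys.executable, "inference.py",
        "--config", config,
        "--output_dir", output_dir,
        "--seed", str(seed),
        "--checkpoint", checkpoint,
    ]


def render_seeds(seeds, root="./test"):
    """Run inference once per seed, writing each result to its own folder."""
    for seed in seeds:
        # The per-seed folder name is a convention chosen for this sketch.
        cmd = build_inference_command(seed, f"{root}/seed_{seed}/")
        # check=True stops the sweep at the first failing run.
        subprocess.run(cmd, check=True)
```

Calling `render_seeds([8888, 1234])` from the repo root would then run the renderer twice, once per seed.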

The results will be saved under ./test with the following structure:

├── ...
└── test
    └── camera_{:02d}            # camera mode for trajectory
        ├── rgb_render           # per-frame RGB renderings
        │   ├── 00000.png
        │   ├── 00001.png
        │   └── ...
        ├── rgb_render.mp4       # rendered video
        ├── height_map.png       # height map
        ├── semantic_map.png     # semantic map
        └── style.npy            # sampled style code
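If you post-process the results, the layout above is easy to walk with the standard library. The following is a minimal sketch that relies only on the folder and file names listed in the tree; `summarize_outputs` is a hypothetical helper name, not a function from this repo.

```python
from pathlib import Path


def summarize_outputs(output_dir):
    """Map each camera_XX folder to the sorted list of its rendered frames."""
    summary = {}
    for cam_dir in sorted(Path(output_dir).glob("camera_*")):
        if not cam_dir.is_dir():
            continue
        # Frames are zero-padded (00000.png, 00001.png, ...), so a plain
        # lexicographic sort keeps them in playback order.
        frames = sorted((cam_dir / "rgb_render").glob("*.png"))
        summary[cam_dir.name] = [frame.name for frame in frames]
    return summary
```

For example, `summarize_outputs("./test")` returns a dictionary keyed by camera folder, which is handy for checking that a run produced the expected number of frames.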

You can also modify the rendering parameters in scenedreamer_inference.yaml, as described below:

Parameter	Recommended Range	Description
`cam_mode`	0 - 9	Camera trajectory used for the rendered sequence
`cam_maxstep`	0 - ∞	Total number of frames. Increase for smoother camera movement.
`resolution_hw`	[540, 960] - [2160, 3840]	Resolution (height, width) of each rendered frame
`num_samples`	12 - 40	Number of sampled points per camera ray
`cam_ang`	50 - 130	Field of view (FOV) of the camera
`scene_size`	1024 - 2048	Spatial resolution of the sampled scene
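To get a feel for how `resolution_hw`, `num_samples`, and `cam_maxstep` interact, it helps to count point samples. The sketch below assumes one ray per pixel and exactly `num_samples` points per ray, which is a simplification: the actual renderer may skip or reuse samples, so treat the numbers as a rough relative cost, not a timing prediction. The function names are made up for this example.

```python
def samples_per_frame(resolution_hw, num_samples):
    """Upper-bound estimate: one ray per pixel, num_samples points per ray."""
    height, width = resolution_hw
    return height * width * num_samples


def samples_per_sequence(resolution_hw, num_samples, cam_maxstep):
    """Same estimate, multiplied by the number of rendered frames."""
    return samples_per_frame(resolution_hw, num_samples) * cam_maxstep


if __name__ == "__main__":
    low = samples_per_frame([540, 960], 12)      # lowest recommended settings
    high = samples_per_frame([2160, 3840], 40)   # highest recommended settings
    print(low)   # 6220800
    print(high)  # 331776000
```

Under these assumptions, moving from the lowest to the highest recommended settings multiplies the per-frame sample count by roughly 53, so it is worth tuning at low resolution first.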

Here is a sampled scene with our default rendering parameters:

Gradio Demo

You can also launch our demo locally with the Gradio UI:

python app_gradio.py

Alternatively, you can run the demo online.

Training

Data Preparation

Generating BEV of Training Scenes

First, run the following command to generate the training scenes. This is a parallel script that launches single_terrain_gen.py as subprocesses. You can specify the total number of scenes with --bs and the number of parallel workers with --num_workers. By default, the generated training scenes are stored in ./data/terrain_dataset.

python ./scripts/batch_terrain_gen.py --size 2048 --seed 42 --outdir ./data/terrain_dataset --bs 1024 --parallel --num_workers 16
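The driver pattern described above (a parent script fanning work out to per-scene subprocesses with a fixed number of workers) can be sketched as follows. This is not the code of batch_terrain_gen.py: `run_parallel` is a hypothetical helper, and the demo jobs are placeholders because the arguments accepted by single_terrain_gen.py are not documented here.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_parallel(commands, num_workers):
    """Run each command as a subprocess, at most num_workers at a time.

    Returns the exit codes in the same order as the input commands.
    """
    def run_one(cmd):
        return subprocess.run(cmd).returncode

    # Threads are enough here: the heavy work happens in the child processes,
    # and each thread just waits for its child to exit.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(run_one, commands))


if __name__ == "__main__":
    # Placeholder jobs: each one only prints its index. A real driver would
    # build one generator command per scene instead.
    jobs = [[sys.executable, "-c", f"print({i})"] for i in range(4)]
    print(run_parallel(jobs, num_workers=2))
```

Collecting the exit codes makes it easy to spot and re-run scenes whose generation failed.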

Then, for training efficiency, we cache all training scenes as sparse voxels to avoid computing them on the fly. Run the following command:

python ./scripts/pcg_cache.py --terrain ./data/terrain_dataset --outdir ./data/terrain_cache

Download Pretrained SPADE on 1.1M Images 💥

We release the SPADE checkpoint trained on 1.1M images collected from the web. You can download it from Google Drive.

Prepare Images and Segmentation Masks

We refer you to publicly available datasets like LHQ for paired images and segmentation maps. Note that we used 1.1M images collected from the web for training. The segmentation masks are generated using ViT-Adapter. Once you have the images and segmentation maps ready, please organize them as follows:

├── ...
└── ./data/lhq
    ├── train
    │   ├── images
    │   └── seg_maps
    └── val
        ├── images
        └── seg_maps
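Before converting the dataset, it can be worth checking that every image has a segmentation map and vice versa. The sketch below assumes that an image and its mask share the same filename stem (for example `0001.jpg` and `0001.png`); that naming rule is an assumption of this example, not something the repo states, and `unpaired_stems` is a hypothetical helper.

```python
from pathlib import Path


def unpaired_stems(split_dir):
    """Return (images_without_mask, masks_without_image) for one split.

    Assumes an image and its segmentation map share the same filename stem.
    """
    split = Path(split_dir)
    image_stems = {p.stem for p in (split / "images").iterdir() if p.is_file()}
    mask_stems = {p.stem for p in (split / "seg_maps").iterdir() if p.is_file()}
    return sorted(image_stems - mask_stems), sorted(mask_stems - image_stems)
```

Running `unpaired_stems("./data/lhq/train")` and `unpaired_stems("./data/lhq/val")` should return two empty lists each if the folders are consistent under that naming assumption.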

Then, run the following command to dump all images into LMDB for efficient training:

for f in train val; do
    python scripts/build_lmdb.py \
        --config configs/img2lmdb.yaml \
        --data_root ./data/lhq/${f} \
        --output_root ./data/lhq_lmdb/${f} \
        --overwrite \
        --paired
done

Launch Training 🚀

You are all set! Run the following command to launch the training of SceneDreamer:

python -m torch.distributed.launch --nproc_per_node=8 --master_port=8888 train.py --config configs/scenedreamer_train.yaml --seed 3407

Note that we use 8 GPUs for training by default. Please adjust --nproc_per_node to the number of GPUs you want to use. Also specify the correct data path in scenedreamer_train.yaml and the correct path to the pretrained SPADE checkpoint (./landscape1m-segformer.pt by default) in landscape1m.yaml.

License

Distributed under the S-Lab License. See LICENSE for more information. Parts of the code are also subject to the LICENSE of imaginaire by NVIDIA.

Acknowledgements

This work is supported by the National Research Foundation, Singapore under its AI Singapore Programme, NTU NAP, MOE AcRF Tier 2 (T2EP20221-0033), and under the RIE2020 Industry Alignment Fund - Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from the industry partner(s).

SceneDreamer is implemented on top of imaginaire. Thanks to torch-ngp for the PyTorch CUDA implementation of the neural hash grid.
