[ICLR 2024🔥] Continuous-Multiple Image Outpainting in One-Step
via Positional Query and A Diffusion-based Approach
Shaofeng Zhang1, Jinfa Huang2, Qiang Zhou3, Zhibin Wang3, Fan Wang4, Jiebo Luo2, Junchi Yan1,*
1Shanghai Jiao Tong University, 2University of Rochester, 3INF Tech Co., Ltd., 4Alibaba Group
Our PQDiff can outpaint images with arbitrary and continuous multiples in one step by learning the positional relationships and pixel information at the same time.
| Checkpoint | Google Cloud | Baidu Yun |
|---|---|---|
| Scenery | Download | TBD |
| Building Facades | TBD | TBD |
| WikiArt | TBD | TBD |
We use the Flickr (Scenery), Building Facades, and WikiArt datasets, which can be obtained at link.
We use the autoencoder converted from Stable Diffusion; you can download it from link.
```shell
accelerate launch --multi_gpu --num_processes 8 --mixed_precision fp16 \
    train_ldm.py --config=configs/flickr192_large.py
```
You can train on your own dataset by modifying `dataset/dataset.py`.
We provide the 2.25x, 5x, and 11.7x outpainting settings (with the copy operation). Run:
```shell
python3 -m torch.distributed.launch --nproc_per_node=8 \
    --node_rank 0 \
    --master_addr=${MASTER_ADDR:-127.0.0.1} \
    --master_port=${MASTER_PORT:-46123} \
    evaluate.py --target_expansion 0.25 0.25 0.25 0.25 \
    --eval_dir ./eval_dir/scenery/1x/ --size 128 \
    --config flickr192_large
```
You can outpaint images at arbitrary, continuous multiples by changing the `--target_expansion`
parameters. The four values are the expansion ratios for the (top, bottom, left, right) sides.
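Under our reading of `--target_expansion` (each value is the fraction of the input size generated on the corresponding side), the output resolution follows directly; the helper below is illustrative and not part of the released code.

```python
def outpainted_size(h, w, expansion):
    """Output resolution for per-side expansion ratios.

    expansion = (top, bottom, left, right), each a fraction of the
    input size generated on that side (our reading of
    --target_expansion; illustrative helper, not the released code).
    """
    top, bottom, left, right = expansion
    return round(h * (1 + top + bottom)), round(w * (1 + left + right))

# A 128x128 input with 0.25 on every side yields a 192x192 output,
# i.e. the 2.25x-area setting.
```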
We provide scripts to evaluate Inception Score, FID, and Centered PSNR in the `eval_dir` directory. Run:
```shell
python eval_dir/inception.py --path ./path1/
python -m pytorch_fid ./path1/ ./path2/
python eval_dir/psnr.py --original ./ori_dir/ --contrast ./gen_dir/
```
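For reference, a plain-Python sketch of the PSNR computation on flattened pixel values; `eval_dir/psnr.py` additionally centers/aligns the images before comparing ("Centered PSNR"), which this sketch omits.

```python
import math

def psnr(original, generated, max_val=255.0):
    """Peak signal-to-noise ratio between two equal-length pixel
    sequences. Illustrative sketch of the metric only."""
    mse = sum((a - b) ** 2 for a, b in zip(original, generated)) / len(original)
    if mse == 0:
        return float("inf")  # identical images
    return 20 * math.log10(max_val / math.sqrt(mse))
```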
Here are some generated samples:
Methodologically, PQDiff can outpaint at any multiple in a single step, greatly broadening the applicability of image outpainting.
- For training, we randomly crop the image twice with different random crop ratios to obtain two views, then compute the relative positional embeddings of the anchor view (red box) and the target view (blue box).
- For sampling, i.e., testing or generation, we first compute the target view (blue box) from the anchor view (red box) to form a mode that encodes their positional relation. With different modes, we can perform arbitrary and controllable image outpainting.
- QueryOTR. This codebase provides image outpainting datasets and a strong baseline.
- PQCL. This codebase inspired the positional query scheme in this work.
Please consider citing 📑 our paper if this repository is helpful to your work. Thanks sincerely!
```bibtex
@misc{zhang2024continuousmultiple,
      title={Continuous-Multiple Image Outpainting in One-Step via Positional Query and A Diffusion-based Approach},
      author={Shaofeng Zhang and Jinfa Huang and Qiang Zhou and Zhibin Wang and Fan Wang and Jiebo Luo and Junchi Yan},
      year={2024},
      eprint={2401.15652},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```