GitHub - dcdcvgroup/layout-diffusion-mindspore

Introduction

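This project is a MindSpore implementation of the CVPR 2023 paper LayoutDiffusion: Controllable Diffusion Model for Layout-to-image Generation. For the original PyTorch implementation, see the LayoutDiffusion repository on the personal page of the paper's author, Guangcong Zheng (Zhejiang University).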

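LayoutDiffusion is a diffusion-based layout-to-image generation method that gives strong control over both the global layout and each individual object while preserving generation quality. To overcome the difficulty of multimodal fusion between images and layouts, the paper proposes constructing structured image patches: patches carrying region information are converted into a special layout and fused with the normal layout in a unified form. The paper also proposes a Layout Fusion Module (LFM) and Object-aware Cross Attention (OaCA) to model the relationships among multiple objects; both are object-aware and position-sensitive, which allows precise control over spatially related information. Extensive experiments on COCO-stuff and VG show that the method outperforms previous state-of-the-art methods on the FID and CAS metrics.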
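To make the structured image patch idea more concrete, here is a minimal sketch, assuming each patch of a regular grid is described by a normalized bounding box (x0, y0, x1, y1), the same form a layout object's box could take. The function name `patch_boxes` and the grid sizes are illustrative and are not taken from this repository.

```python
def patch_boxes(grid_h, grid_w):
    """Return one normalized box (x0, y0, x1, y1) per patch, in row-major order.

    Hypothetical helper: it only illustrates how image patches can be
    described in the same box format as layout objects.
    """
    boxes = []
    for row in range(grid_h):
        for col in range(grid_w):
            boxes.append((
                col / grid_w,        # left edge
                row / grid_h,        # top edge
                (col + 1) / grid_w,  # right edge
                (row + 1) / grid_h,  # bottom edge
            ))
    return boxes


# A 2 x 2 grid yields four boxes that tile the unit square.
for box in patch_boxes(2, 2):
    print(box)
```

Because patches and objects share one box format in this sketch, both can be fed to the same fusion step, which is the point of the unified form described above.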

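In addition, to support accelerated sampling that follows the original paper's sampling scheme, this project implements a MindSpore version of dpm-solver. Setting the sampling method to dpm-solver and the number of sampling steps to 25 gives faster sampling.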

Gradio WebUI demo

Pipeline

Results on the COCO-stuff dataset

Environment setup

conda create -n LayoutDiffusion python=3.8
conda activate LayoutDiffusion

conda install mindspore==2.0.0 cudatoolkit=11.2 -c mindspore -c conda-forge
pip install omegaconf opencv-python h5py==3.2.1 gradio==3.24.1

python setup.py build develop

Launch the Gradio WebUI (no dataset setup required)

  mpirun python scripts/launch_gradio_app.py \
  --config_file configs/COCO-stuff_256x256/LayoutDiffusion_large.yaml \
  sample.pretrained_model_path=./pretrained_models/COCO-stuff_256x256_LayoutDiffusion_large_ema_1150000.pt \
  -n 1

Add "--share" after "--config_file XXX" to allow sharing through a remote link.
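The `sample.pretrained_model_path=...` argument in the launch command is a dotted key=value override of a value from the YAML config. The sketch below illustrates that idea with plain dictionaries. It is not the project's actual parsing code (the environment installs omegaconf, which may handle this instead), and `apply_override` as well as the file name `model.pt` are hypothetical.

```python
def apply_override(config, override):
    """Set a nested config value from an 'a.b.c=value' string.

    Hypothetical helper; the value is stored as a plain string.
    """
    dotted_key, value = override.split("=", 1)
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        # Create intermediate sections that do not exist yet.
        node = node.setdefault(name, {})
    node[leaf] = value
    return config


# Stand-in for values loaded from the YAML file (illustrative only).
config = {"sample": {"pretrained_model_path": ""}}
apply_override(config, "sample.pretrained_model_path=./pretrained_models/model.pt")
print(config["sample"]["pretrained_model_path"])
```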

Dataset setup

See DATASET_SETUP.md.

Pretrained models

Dataset	Resolution	Steps, FID (sampled images x runs)	Link
COCO-Stuff 2017 segmentation challenge	256 x 256	step=25 FID=16.50 ( 3097 x 5 )	Google drive
COCO-Stuff 2017 segmentation challenge	128 x 128	step=25 FID=16.47 ( 3097 x 5 )	Google drive
VG	256 x 256	step=25 FID=15.91 ( 5097 x 1 )	Google drive
VG	128 x 128	step=25 FID=16.22 ( 5097 x 1 )	Google drive

Training

bash/train.bash

  CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
  mpirun python scripts/launch_gradio_app.py \
  --config_file configs/COCO-stuff_256x256/LayoutDiffusion_large.yaml \
  -n 8

Sampling

bash/sample.bash

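Evaluation metrics

You need to set up a separate (PyTorch) environment for each evaluation metric.

FID

Fréchet Inception Distance (FID) is evaluated with the pytorch-fid repository.

After sampling, measure FID with the following command: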

CUDA_VISIBLE_DEVICES=0 python fid_score.py path/to/generated_imgs path/to/gt_imgs --gpu 0
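For reference, FID is the Fréchet distance between two Gaussians fitted to Inception features of the generated and the ground-truth images. With feature means $\mu_g, \mu_r$ and covariances $\Sigma_g, \Sigma_r$ (symbols chosen here for illustration), the standard definition is:

```latex
\mathrm{FID} = \lVert \mu_g - \mu_r \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_g + \Sigma_r - 2 \left( \Sigma_g \Sigma_r \right)^{1/2} \right)
```

Lower values mean the generated feature distribution is closer to the real one.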

IS

Use Improved-GAN to evaluate the Inception Score (IS).

After sampling, measure IS with the following commands:

cd inception_score
CUDA_VISIBLE_DEVICES=0 python model.py --path path/to/generated_imgs
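As a rough illustration of what the score measures, the sketch below computes the standard Inception Score formula, exp of the mean KL divergence between each image's class distribution p(y|x) and the marginal p(y), from a list of probability vectors. It assumes the class probabilities have already been produced by a classifier; running the Inception network, and any averaging over splits that an evaluation script may do, are omitted. The function name `inception_score` is illustrative.

```python
import math


def inception_score(probs):
    """Compute exp(mean_x KL(p(y|x) || p(y))) from per-image class probabilities."""
    n = len(probs)
    num_classes = len(probs[0])
    # Marginal class distribution p(y), averaged over all images.
    marginal = [sum(p[k] for p in probs) / n for k in range(num_classes)]
    total_kl = 0.0
    for p in probs:
        # Terms with p[k] == 0 contribute nothing to the KL divergence.
        total_kl += sum(
            pk * math.log(pk / marginal[k]) for k, pk in enumerate(p) if pk > 0
        )
    return math.exp(total_kl / n)


# Confident and evenly spread predictions score higher than uniform ones.
print(inception_score([[1.0, 0.0], [0.0, 1.0]]))
print(inception_score([[0.5, 0.5], [0.5, 0.5]]))
```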

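DS

Use PerceptualSimilarity to evaluate the Diversity Score (DS).

The original paper's code modifies lpips_2dirs.py to make it easier to compute the mean and variance of DS automatically; please refer to the modified script.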

CUDA_VISIBLE_DEVICES=0 python lpips_2dirs.py -d0 path/to/generated_imgs_0 -d1 path/to/generated_imgs_1 -o imgs/example_dists.txt --use_gpu
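If you only have the per-image distance file, the mean and spread can also be computed afterwards. The sketch below assumes the output file (such as imgs/example_dists.txt) contains one `name: distance` entry per line; that format and the helper name `summarize_distances` are assumptions, so check them against your actual output.

```python
import statistics


def summarize_distances(text):
    """Return (mean, population std) of the distances in 'name: value' lines.

    Assumed format: one 'filename: distance' entry per line.
    """
    values = []
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or unrelated lines
        values.append(float(line.rsplit(":", 1)[1]))
    return statistics.mean(values), statistics.pstdev(values)


# Illustrative input; in practice, read the text from the output file.
sample = "0001.png: 0.500000\n0002.png: 0.700000\n"
mean, std = summarize_distances(sample)
print(f"DS mean={mean:.4f} std={std:.4f}")
```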

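YOLO Score

Use LAMA to evaluate the YOLO Score.

The original paper's code modifies test.py to measure with both the full annotations (instances_val2017.json in the COCO dataset) and the [filtered annotations](https://drive.google.com/file/d/1T5A2AwNF2gZmi2LDArkE7ycBwGDuhq4w/view?usp=sharing).

After sampling, measure the YOLO Score with the following commands: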

cd yolo_experiments
cd data
CUDA_VISIBLE_DEVICES=0 python test.py --image_path path/to/generated_imgs

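CAS

Use pytorch_image_classification to evaluate the Classification Accuracy Score (CAS).

Crop the ground-truth box regions of the images and resize each object to a resolution of 32×32 according to its class. Then train a ResNet101 classifier on the crops taken from the generated images and test it on the crops taken from the real images. Finally, measure CAS using the generated images.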

CUDA_VISIBLE_DEVICES=0 python evaluate.py --config configs/test.yaml

You can configure the checkpoint path and dataset information in configs/test.yaml.
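The crop-and-resize step in the CAS procedure can be sketched as follows. This is a dependency-free illustration on a 2D list of pixel values, assuming boxes are given as (x, y, w, h) in pixels; a real pipeline would normally use an image library, and `crop_and_resize` is a hypothetical helper, not part of this repository.

```python
def crop_and_resize(image, box, size=32):
    """Crop box (x, y, w, h) from a 2D list image, then nearest-neighbor
    resize the crop to size x size."""
    height, width = len(image), len(image[0])
    x, y, w, h = box
    # Clip the box to the image and keep at least one pixel.
    x0 = min(max(int(x), 0), width - 1)
    y0 = min(max(int(y), 0), height - 1)
    x1 = min(max(int(x + w), x0 + 1), width)
    y1 = min(max(int(y + h), y0 + 1), height)
    crop_w, crop_h = x1 - x0, y1 - y0
    return [
        [image[y0 + (r * crop_h) // size][x0 + (c * crop_w) // size]
         for c in range(size)]
        for r in range(size)
    ]


# Toy 4 x 4 "image" whose pixel value encodes its position.
image = [[row * 4 + col for col in range(4)] for row in range(4)]
patch = crop_and_resize(image, (2, 2, 2, 2))
print(len(patch), len(patch[0]))
```

Each resized crop, paired with its object class, would then serve as one training or test sample for the classifier.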

Citation

@misc{zheng2023layoutdiffusion,
    title={LayoutDiffusion: Controllable Diffusion Model for Layout-to-image Generation}, 
    author={Guangcong Zheng and Xianpan Zhou and Xuewei Li and Zhongang Qi and Ying Shan and Xi Li},
    year={2023},
    eprint={2303.17189},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
