Sample image sizes #16

Open
QiaoYang-CV opened this issue Dec 25, 2023 · 1 comment

Comments

@QiaoYang-CV

This work is worth researching and provides new thinking for image fusion tasks. However, I have a question about the image size: when I ran Sample.py, it scaled the image size to a multiple of 32. Can't it accept inputs of any size? I may need help understanding the denoising diffusion model and would appreciate your answer.

@Afreshbird

The diffusion model uses a U-Net to predict the noise added to the input image. A U-Net has three parts: an encoder, a decoder, and skip connections.

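The encoder downsamples the input image (assume four downsampling steps) and stores the intermediate features produced along the way so that they can be passed to the decoder through the skip connections. Call these four intermediate features en_fea_1, en_fea_2, en_fea_3, and en_fea_4.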

The decoder upsamples the output of the encoder's last layer four times. Call the four upsampled features de_fea_4, de_fea_3, de_fea_2, and de_fea_1.

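The decoder then concatenates each of its features with the matching encoder feature, i.e. cat([en_fea_4, de_fea_4]), cat([en_fea_3, de_fea_3]), cat([en_fea_2, de_fea_2]), cat([en_fea_1, de_fea_1]). If the two features in any pair have different resolutions, the cat operation raises an error.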

During downsampling, if a feature's resolution is not a multiple of 2, the result is rounded down. For example:
Suppose the input feature has shape (batch, channel, 67, 69) and is downsampled four times. The shape changes as follows:
(batch, channel, 67, 69) --> (batch, channel, 33, 34) --> (batch, channel, 16, 17) --> (batch, channel, 8, 8) --> (batch, channel, 4, 4)
The (batch, channel, 4, 4) feature is then upsampled four times, and the shape changes as follows:
(batch, channel, 4, 4) --> (batch, channel, 8, 8) --> (batch, channel, 16, 16) --> (batch, channel, 32, 32) --> (batch, channel, 64, 64)

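At this point en_fea_1, en_fea_2, en_fea_3, and en_fea_4 have shapes (batch, channel, 67, 69), (batch, channel, 33, 34), (batch, channel, 16, 17), and (batch, channel, 8, 8), while de_fea_4, de_fea_3, de_fea_2, and de_fea_1 have shapes (batch, channel, 8, 8), (batch, channel, 16, 16), (batch, channel, 32, 32), and (batch, channel, 64, 64). Only the level-4 pair matches; the pairs at levels 3, 2, and 1 differ in resolution, so the cat operation raises an error.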
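The shape bookkeeping in this example can be reproduced with a few lines of plain Python. This is only a sketch of the arithmetic, not the repository's U-Net: it assumes every downsampling step floor-divides the height and width by 2 and every upsampling step doubles them, and the function names `trace_unet_shapes` and `mismatched_levels` are made up for illustration.

```python
def trace_unet_shapes(height, width, num_down=4):
    """Return (encoder_shapes, decoder_shapes) as lists of (H, W) tuples.

    Assumption: downsampling floor-divides H and W by 2 and upsampling
    doubles them. Index i in both lists corresponds to level i + 1,
    i.e. en_fea_{i+1} and de_fea_{i+1}.
    """
    enc = []
    h, w = height, width
    for _ in range(num_down):
        enc.append((h, w))        # feature saved for the skip connection
        h, w = h // 2, w // 2     # odd sizes are rounded down here
    dec = []
    for _ in range(num_down):
        h, w = h * 2, w * 2       # upsampling can only double the size
        dec.append((h, w))        # produced in the order de_fea_N ... de_fea_1
    dec.reverse()                 # reorder to de_fea_1 ... de_fea_N
    return enc, dec


def mismatched_levels(height, width, num_down=4):
    """Return the levels whose encoder/decoder resolutions differ."""
    enc, dec = trace_unet_shapes(height, width, num_down)
    return [i + 1 for i, (e, d) in enumerate(zip(enc, dec)) if e != d]


if __name__ == "__main__":
    enc, dec = trace_unet_shapes(67, 69)
    for level, (e, d) in enumerate(zip(enc, dec), start=1):
        status = "ok" if e == d else "MISMATCH"
        print(f"level {level}: en_fea={e} de_fea={d} {status}")
    print("mismatched levels for 64x64:", mismatched_levels(64, 64))
```

For a 67 x 69 input the sketch reports mismatches at levels 1 to 3, while a 64 x 64 input passes at every level, which is why the size restriction exists.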

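At test time, to keep the model from raising this error, the input image is resized so that its resolution is a multiple of 32. The multiple is determined by the number of downsampling steps N in the network: the goal is for the resolution to be divisible by 2 N times in a row, i.e. to be a multiple of 2^N (32 = 2^5).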
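As a rough illustration of that rule, the target size can be computed from N directly. The helper names `round_to_multiple` and `target_size` are hypothetical, and the choice between rounding down and rounding up is an assumption; the exact resizing logic in Sample.py is not shown in this thread.

```python
def round_to_multiple(size, num_down=5, mode="down"):
    """Round `size` to a multiple of 2**num_down.

    mode="down" shrinks to the nearest lower multiple (never below one
    multiple); mode="up" grows to the nearest higher multiple.
    """
    factor = 2 ** num_down
    if mode == "down":
        return max(factor, (size // factor) * factor)
    if mode == "up":
        return -(-size // factor) * factor   # ceiling division
    raise ValueError("mode must be 'down' or 'up'")


def target_size(height, width, num_down=5, mode="down"):
    """Return the (H, W) an image would be resized to before sampling."""
    return (round_to_multiple(height, num_down, mode),
            round_to_multiple(width, num_down, mode))


if __name__ == "__main__":
    print(target_size(67, 69))               # multiples of 32, rounded down
    print(target_size(67, 69, mode="up"))    # multiples of 32, rounded up
    print(target_size(67, 69, num_down=4))   # multiples of 16 for N = 4
```

After sampling, the fused result can be resized back to the original resolution if the exact input size matters.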
