I notice that in the config files for all the experiments, channel_mults is set to [1,2,4,8] while attn_res is 16. This means that you don't use attention within the upsampling and downsampling blocks, right? According to the documentation:
:param attn_res: a collection of downsample rates at which attention will take place. May be a set, list, or tuple. For example, if this contains 4, then at 4x downsampling, attention will be used.
Is this an intentional design choice?
Also, you mention in the README that "We used the attention mechanism in low-resolution features (16×16) like vanilla DDPM." Do you mean 32×32? Since the images you train on are 256×256, the feature size is 32×32 when you reach the middle block, where attention is used.
Thank you for the great repo!
There may be a misunderstanding. The attention setting refers to the corresponding feature-map size: 16 means the side length of the feature map after downsampling. Here is an example:

64×64 → downsampled to 32×32
32×32 → downsampled to 16×16, so this layer uses attention
16×16 → 8×8
8×8 → 4×4

You can see this in the UNet code.
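To make the resolution schedule above concrete, here is a minimal sketch of the attention-placement logic as described in the reply. This is an illustrative reconstruction, not the repo's actual UNet code: the function name attention_levels is hypothetical, and it assumes attn_res holds feature-map side lengths and that the resolution halves at every level.

```python
def attention_levels(input_res, channel_mults, attn_res):
    """Return (resolution, uses_attention) for each encoder level.

    Hypothetical helper mirroring the reply's description: attention is
    enabled at any level whose feature-map side length appears in attn_res.
    """
    levels = []
    res = input_res
    for _mult in channel_mults:
        levels.append((res, res in attn_res))
        res //= 2  # downsample before the next level
    levels.append((res, res in attn_res))  # bottleneck / middle block
    return levels

# The reply's example: 64x64 input, attention once features reach 16x16.
for res, attn in attention_levels(64, [1, 2, 4, 8], attn_res={16}):
    print(f"{res}x{res}: attention={attn}")
```

Running this prints attention=True only at the 16×16 level (64, 32, 8, and 4 stay False), matching the 64 → 32 → 16 → 8 → 4 schedule in the example above.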