
Do you use attention in the upsampling and downsampling blocks when training? #55

Open
DanBigioi opened this issue Oct 26, 2022 · 1 comment


@DanBigioi

I notice that in the config files for all the experiments, channel_mults is set to [1, 2, 4, 8], while attn_res is set to 16. This means that you don't use attention within the upsampling and downsampling blocks, right? According to the documentation:

:param attn_res: a collection of downsample rates at which attention will take place. May be a set, list, or tuple. For example, if this contains 4, then at 4x downsampling, attention will be used.

Is this an intentional design choice?

Also, you mention in the README that "We used the attention mechanism in low-resolution features (16×16) like vanilla DDPM." Do you mean 32×32? The images you train on are 256×256, and the feature size is 32×32 by the time you reach the middle block where attention is used.
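
For context, here is the arithmetic I have in mind (a rough sketch of my own, not the repo's code): four entries in channel_mults give three downsampling steps, so a 256×256 input reaches the middle block at 32×32.

```python
# Rough sketch of the feature-size arithmetic (my own illustration, not the repo's code).
image_size = 256
channel_mults = [1, 2, 4, 8]  # four resolution levels -> three downsampling steps
middle_block_size = image_size // 2 ** (len(channel_mults) - 1)
print(middle_block_size)      # 32, i.e. the middle block sees 32x32 feature maps
```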

Thank you for the great repo!

@codgodtao

There may be a misunderstanding here. The attention setting refers to the corresponding image size: 16 means the feature-map size after downsampling. Here is an example:

64×64 -> downsample to 32×32
32×32 -> downsample to 16×16, so this layer uses attention
16×16 -> 8×8
8×8 -> 4×4

You can see this in the U-Net code.
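
Here is a minimal sketch of that layout logic (my own illustration, not this repo's exact code): the builder tracks the current feature-map size and inserts attention only when that size appears in attn_res.

```python
# Minimal sketch of the layout logic described above (my own illustration,
# not this repo's exact code): attention is inserted only at feature-map
# sizes listed in attn_res.

def downsample_plan(start_res, num_downsamples, attn_res):
    plan = []
    res = start_res
    for _ in range(num_downsamples):
        next_res = res // 2
        uses_attention = next_res in attn_res  # e.g. 32 -> 16 triggers attention
        plan.append((res, next_res, uses_attention))
        res = next_res
    return plan

for before, after, attn in downsample_plan(64, 4, {16}):
    print(f"{before}x{before} -> {after}x{after}  attention={attn}")
# 64x64 -> 32x32  attention=False
# 32x32 -> 16x16  attention=True
# 16x16 -> 8x8    attention=False
# 8x8 -> 4x4      attention=False
```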
