
About the size of neighborhood #27

Closed
wangning7149 opened this issue May 7, 2022 · 4 comments
Labels
question Further information is requested

Comments

@wangning7149

Hi,
for a neighborhood of size L × L, is L here equal to 3?

@qwopqwop200

qwopqwop200 commented May 8, 2022

According to the paper, the overall setup follows Swin: Swin uses a window size of L = 7, and NAT uses the same.
https://github.com/SHI-Labs/Neighborhood-Attention-Transformer/blob/main/classification/nat.py#L259

@wangning7149
Author

wangning7149 commented May 8, 2022 via email

@alihassanijr
Member

Hello and thank you for your interest.

Firstly, L x L is the term we use to denote kernel (window) size in the paper. Neighborhood size would technically be half the window size: in theory, each query has L // 2 neighbors on each side of it along each axis, so 2 * (L // 2) neighbors plus the query itself yields L total pixels per axis. That's also why we force the kernel size to be an odd number, so that query pixels can be centered.
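The centered-window arithmetic above can be sketched along a single axis. This is a minimal illustration, not code from the repo; `neighborhood_indices` is a hypothetical helper, and the border clamping assumes NAT's behavior of shifting the window near edges so every query still attends to exactly L neighbors:

```python
def neighborhood_indices(i, n, L):
    """Indices of the L nearest neighbors of query position i on an axis of length n.

    L must be odd so the query can be centered; near the borders the
    window is shifted (not shrunk), so every query keeps exactly L neighbors.
    """
    assert L % 2 == 1 and L <= n
    r = L // 2  # neighbors on each side of a centered query
    start = min(max(i - r, 0), n - L)  # clamp the window inside [0, n)
    return list(range(start, start + L))

# A centered query sees r = 3 neighbors on each side for L = 7:
print(neighborhood_indices(7, 14, 7))   # [4, 5, 6, 7, 8, 9, 10]
# At the border the window shifts instead of shrinking:
print(neighborhood_indices(0, 14, 7))   # [0, 1, 2, 3, 4, 5, 6]
```

Every query thus attends to exactly L positions per axis, i.e. L × L keys and values in 2D.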

We followed Swin in setting the window size to 7x7 so that both end up having the same sized receptive fields. In other words, in every attention module, both NA and SWSA limit each query to exactly 7x7 keys and values.

As for the models, we used a new configuration that is different from Swin's. We first found overlapping convolutions to be more effective than patched convolutions for both tokenization and downsampling. We also found that slightly deeper models (but with thinner inverted bottlenecks) achieve even better performance.
That's why our final models end up with fewer FLOPs than their Swin counterparts.
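As a rough sketch of the tokenization difference above, here is the standard convolution output-size formula applied to both schemes. The hyperparameters (a single 4×4/stride-4 patch embed for Swin, versus two overlapping 3×3/stride-2 convolutions for NAT) are assumed from the paper's description; both produce the same 4× spatial downsampling:

```python
def conv_out(n, k, s, p):
    """Spatial size after a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

# Patched tokenizer (Swin-style): one non-overlapping 4x4 conv, stride 4
patched = conv_out(224, k=4, s=4, p=0)

# Overlapping tokenizer (NAT-style): two 3x3 convs, stride 2, padding 1
overlap = conv_out(conv_out(224, k=3, s=2, p=1), k=3, s=2, p=1)

print(patched, overlap)  # 56 56 -- same 4x downsampling either way
```

The resolution is identical; the difference is that the overlapping kernels let neighboring tokens share input pixels.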

We've done an ablation study on these changes, which is presented in the paper.

I hope this answers both of your questions.

@alihassanijr alihassanijr added the question Further information is requested label May 8, 2022
@alihassanijr
Member

Closing this due to inactivity. If you still have questions feel free to open it back up.
