
About the size of neighborhood #27

Closed
wangning7149 opened this issue May 7, 2022 · 4 comments
Labels
question Further information is requested

Comments

@wangning7149

Hi,
for a neighborhood of size L × L, is L here equal to 3?

@qwopqwop200

qwopqwop200 commented May 8, 2022

According to the paper, the overall setup follows Swin: Swin uses a window size of L = 7, and NAT uses the same.
https://github.com/SHI-Labs/Neighborhood-Attention-Transformer/blob/main/classification/nat.py#L259

@wangning7149
Author

wangning7149 commented May 8, 2022 via email

@alihassanijr
Member

Hello and thank you for your interest.

Firstly, L x L is the term we use to denote kernel (window) size in the paper. Neighborhood size would technically be half the window size: in theory, each query has L // 2 neighbors on each side of it along each axis, so 2 * (L // 2) neighbors plus the query itself yields L total pixels per axis. That's also why we force the kernel size to be an odd number, so that query pixels can be centered.
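The centered-window arithmetic above can be sketched along a single axis. This is a minimal illustration, not code from the repo; `neighborhood_indices` is a hypothetical helper, and the border clamping assumes NAT's behavior of shifting the window near edges so every query still attends to exactly L neighbors:

```python
def neighborhood_indices(i, n, L):
    """Indices of the L nearest neighbors of query position i on an axis of length n.

    L must be odd so the query can be centered; near the borders the
    window is shifted (not shrunk), so every query keeps exactly L neighbors.
    """
    assert L % 2 == 1 and L <= n
    r = L // 2  # neighbors on each side of a centered query
    start = min(max(i - r, 0), n - L)  # clamp the window inside [0, n)
    return list(range(start, start + L))

# A centered query sees r = 3 neighbors on each side for L = 7:
print(neighborhood_indices(7, 14, 7))   # [4, 5, 6, 7, 8, 9, 10]
# At the border the window shifts instead of shrinking:
print(neighborhood_indices(0, 14, 7))   # [0, 1, 2, 3, 4, 5, 6]
```

Every query thus attends to exactly L positions per axis, i.e. L × L keys and values in 2D.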

We followed Swin in setting the window size to 7x7 so that both end up having the same sized receptive fields. In other words, in every attention module, both NA and SWSA limit each query to exactly 7x7 keys and values.

As for the models, we used a new configuration that is different from Swin's. We first found overlapping convolutions to be more effective than patched convolutions for both tokenization and downsampling. We also found that slightly deeper models (but with thinner inverted bottlenecks) achieve even better performance.
That's why our final models end up with fewer FLOPs than their Swin counterparts.
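As a rough sketch of the tokenization difference above, here is the standard convolution output-size formula applied to both schemes. The hyperparameters (a single 4×4/stride-4 patch embed for Swin, versus two overlapping 3×3/stride-2 convolutions for NAT) are assumed from the paper's description; both produce the same 4× spatial downsampling:

```python
def conv_out(n, k, s, p):
    """Spatial size after a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

# Patched tokenizer (Swin-style): one non-overlapping 4x4 conv, stride 4
patched = conv_out(224, k=4, s=4, p=0)

# Overlapping tokenizer (NAT-style): two 3x3 convs, stride 2, padding 1
overlap = conv_out(conv_out(224, k=3, s=2, p=1), k=3, s=2, p=1)

print(patched, overlap)  # 56 56 -- same 4x downsampling either way
```

The resolution is identical; the difference is that the overlapping kernels let neighboring tokens share input pixels.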

We've done an ablation study on these changes, which is presented in the paper.

I hope this answers both of your questions.

@alihassanijr alihassanijr added the question Further information is requested label May 8, 2022
@alihassanijr
Member

Closing this due to inactivity. If you still have questions feel free to open it back up.
