
Question about GSA #9

Closed
kejie-cn opened this issue May 24, 2021 · 2 comments

@kejie-cn

Hello, thank you very much for your excellent work. I have a question about GSA. As I understand it, GSA in the paper takes one representative from each window, so the sr_ratio should equal the window size ([7, 7, 7, 7]) when computing Key and Value, yet in the code it is [8, 4, 2, 1]. Is there anything wrong with my understanding?

from functools import partial
import torch.nn as nn

# `BACKBONES` and `ALTGVT` are defined in the surrounding Twins codebase.
@BACKBONES.register_module()
class alt_gvt_large(ALTGVT):
    def __init__(self, **kwargs):
        super(alt_gvt_large, self).__init__(
            patch_size=4, embed_dims=[128, 256, 512, 1024], num_heads=[4, 8, 16, 32],
            mlp_ratios=[4, 4, 4, 4], qkv_bias=True,
            norm_layer=partial(nn.LayerNorm, eps=1e-6), depths=[2, 2, 18, 2],
            wss=[7, 7, 7, 7],        # LSA window sizes
            sr_ratios=[8, 4, 2, 1],  # GSA key/value sub-sampling ratios
            extra_norm=True, drop_path_rate=0.3,
        )
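For concreteness, here is the key count each stage gets at a 224 x 224 input (feature maps of 56, 28, 14, and 7, assuming the usual 2x downsampling per stage), comparing the released sr_ratios with the sr = 7 everywhere that I expected; a quick sanity-check script, not part of the repo:

feature_sizes = [56, 28, 14, 7]   # per-stage H (= W) for a 224 x 224 input
for hw, sr in zip(feature_sizes, [8, 4, 2, 1]):
    print(f"{hw}x{hw}, sr={sr}: {(hw // sr) ** 2} keys")   # 49, 49, 49, 49
for hw in feature_sizes:
    print(f"{hw}x{hw}, sr=7: {(hw // 7) ** 2} keys")       # 64, 16, 4, 1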
@cxxgtxy
Collaborator

cxxgtxy commented May 24, 2021

Thanks for your attention.
You are right, and we will make this clearer in the next version of the paper.
If we used 7 x 7 in the last stage, there would be only 1 key, because the feature map there is itself 7 x 7; in that case GSA reduces to ordinary global self-attention (see the code).
As for stage 3 (feature size 14 x 14), using sr = 7 would leave only 4 keys, which would limit the representational power of the network.
In general, GSA is a mechanism for collecting global information efficiently, and you can try different sr_ratios in your own projects.
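To make the mechanism concrete, here is a minimal sketch of PVT-style spatial-reduction attention, the sub-sampling that GSA builds on; the class name and layer choices below are illustrative, not the exact Twins implementation:

import torch
import torch.nn as nn

class SpatialReductionAttention(nn.Module):
    """Illustrative PVT-style attention: keys/values are computed on a
    sub-sampled feature map, so their count is (H // sr) * (W // sr)."""
    def __init__(self, dim, num_heads, sr_ratio):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, dim * 2)
        self.proj = nn.Linear(dim, dim)
        self.sr_ratio = sr_ratio
        if sr_ratio > 1:
            # Strided conv shrinks the map before K/V are computed.
            self.sr = nn.Conv2d(dim, dim, kernel_size=sr_ratio, stride=sr_ratio)
            self.norm = nn.LayerNorm(dim)

    def forward(self, x, H, W):
        B, N, C = x.shape                       # N == H * W
        q = self.q(x).reshape(B, N, self.num_heads, self.head_dim).transpose(1, 2)
        if self.sr_ratio > 1:
            x_ = x.transpose(1, 2).reshape(B, C, H, W)
            x_ = self.sr(x_).reshape(B, C, -1).transpose(1, 2)
            x_ = self.norm(x_)                  # (B, (H//sr)*(W//sr), C)
        else:
            x_ = x                              # sr=1: plain global attention
        kv = self.kv(x_).reshape(B, -1, 2, self.num_heads, self.head_dim)
        k, v = kv.permute(2, 0, 3, 1, 4)        # each (B, heads, N_kv, head_dim)
        attn = ((q @ k.transpose(-2, -1)) * self.scale).softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)

# Stage 3 of alt_gvt_large: 14 x 14 map, sr=2 keeps 7 * 7 = 49 keys.
attn = SpatialReductionAttention(dim=512, num_heads=16, sr_ratio=2)
out = attn(torch.randn(1, 14 * 14, 512), 14, 14)   # -> (1, 196, 512)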

@kejie-cn
Author


I see. Thanks for your reply.

littleSunlxy pushed a commit to littleSunlxy/Twins that referenced this issue Nov 4, 2021
* add test tutorial
* remove torch/torchvision from requirements
* update getting started
* rename drop_out_ratio -> dropout_ratio