
Visualization code of Figure 1 in paper. #5

Closed
MaureenZOU opened this issue Sep 13, 2021 · 6 comments

Comments

@MaureenZOU

Hi Author,

First, thanks for your great work improving the convergence speed of DETR by such a large margin. While reading the paper, I got a bit confused about how exactly you draw the attention maps in Figure 1.

Given an object query q (1 × d) and memory features m (d × (hw)), I use the following equation to draw the attention maps:

Similarity(q, m) = Softmax(proj(q) · proj(m)), of shape 1 × (hw), where proj is the trained linear projection in the cross-attention module.
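The computation described above can be sketched as follows. This is only an illustration with random stand-in weights, not the trained projections; the dimensions (d = 256, a 20 × 30 feature map) and the 1/√d scaling are assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d, h, w = 256, 20, 30
q = rng.standard_normal((1, d))        # object query, 1 x d
m = rng.standard_normal((d, h * w))    # flattened memory features, d x (hw)

# random stand-ins for the trained q/k projections of the cross-attention module
W_q = rng.standard_normal((d, d)) / np.sqrt(d)
W_k = rng.standard_normal((d, d)) / np.sqrt(d)

attn = softmax((q @ W_q) @ (W_k @ m) / np.sqrt(d))  # 1 x (hw), rows sum to 1
attn_map = attn.reshape(h, w)                        # reshape for plotting
```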

The attention maps I get are quite similar to the ones shown in the DETR paper:

A random object query: [screenshot]

A random object query on head A: [screenshot]

A random object query on head B: [screenshot]

A random object query on head C: [screenshot]

Could you please give some information on how the attention maps in Figure 1 were generated? Thanks!


SISTMrL commented Sep 23, 2021

Hello, have you managed to generate the attention maps like Fig. 1? @MaureenZOU

@MaureenZOU (Author)

The problem was solved by the explanation in Section 3.4, paragraph "Comparison to DETR": instead of measuring the similarity against memory + positional encoding, the authors measure the similarity against the positional encoding alone.
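Under that reading, the map comes from the positional terms only. A minimal sketch with random stand-in encodings (shapes, values, and the 1/√d scaling are illustrative assumptions, not the authors' code):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(1)
d, h, w = 256, 20, 30
p_q = rng.standard_normal((1, d))      # query positional embedding
p_k = rng.standard_normal((d, h * w))  # 2D positional encodings of the memory

# similarity between positional encodings only -- no content/memory term
attn_map = softmax(p_q @ p_k / np.sqrt(d)).reshape(h, w)
```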


GWwangshuo commented Sep 28, 2021


@MaureenZOU
Could you please kindly provide the source code for visualizing the attention maps? That would be greatly helpful. Thanks a lot.

@DeppMeng (Collaborator)

Hi, @GWwangshuo @MaureenZOU @SISTMrL,

Thank you for your attention, and sorry for the late reply. We have not released the visualization code yet because it is not easy to write a neat and clean version of it. Once we finish rewriting this part of the code, we will make a release (there is no definite schedule yet; the authors are busy with upcoming deadlines).

Here is a brief guide:

  1. Run the validation process and record the content attention weights, position attention weights, and predictions.
  2. Filter out predictions with low classification scores, as well as too-small objects.
  3. Plot the original image.
  4. Plot the content/position attention map on top of it.
  5. Plot the prediction box on top of it.
  6. Arrange the plots in the order you would like (e.g., by attention head).
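Step 2 of the guide above, for example, might be sketched like this. The thresholds, the (cx, cy, w, h) normalized box format, and the function name are assumptions for illustration, not the authors' settings:

```python
import numpy as np

def filter_predictions(scores, boxes, score_thr=0.5, min_area=0.01):
    """Keep predictions with a high classification score and a non-tiny box.

    boxes: (N, 4) array of (cx, cy, w, h) in normalized [0, 1] coordinates.
    """
    areas = boxes[:, 2] * boxes[:, 3]
    keep = (scores >= score_thr) & (areas >= min_area)
    return scores[keep], boxes[keep]

scores = np.array([0.9, 0.3, 0.8])
boxes = np.array([[0.5, 0.5, 0.2, 0.2],    # confident, large enough -> kept
                  [0.1, 0.1, 0.3, 0.3],    # low score -> dropped
                  [0.7, 0.7, 0.05, 0.05]]) # tiny box -> dropped
kept_scores, kept_boxes = filter_predictions(scores, boxes)
```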


wulele2 commented Apr 11, 2022


Hello, when I tried to visualize DETR, I first read the self-attention of the last decoder layer to get cq: [100, 1, 256]. In addition, I read pq from the trained model: [100, 256], and then get pk from the feature map: [1, 256, h, w]. I then compute ((cq + pq)^T · pk).softmax(-1).view(h, w), but I found the result is inconsistent with the paper. I really hope to get your reply.
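For reference, the computation described above can be sketched with random stand-ins for the learned tensors (shapes follow the comment; the 1/√d scaling is an assumption, and this is not a verified reproduction of DETR's visualization):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(2)
n, d, h, w = 100, 256, 20, 30
c_q = rng.standard_normal((n, d))      # decoder content queries cq (squeezed)
p_q = rng.standard_normal((n, d))      # learned query embeddings pq
p_k = rng.standard_normal((d, h * w))  # positional encodings pk, flattened

i = 0  # visualize a single query
attn_map = softmax((c_q[i] + p_q[i]) @ p_k / np.sqrt(d)).reshape(h, w)
```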

@Flyooofly


Hello, have you looked into how to visualize the attention weights of Deformable-DETR? I have not been able to get correct results using the plotting code provided by DETR.
