
The diagonal matrix meaning? #16

Closed

JosonChan1998 opened this issue Mar 15, 2022 · 4 comments

Comments

@JosonChan1998

Hi, thank you for your nice work on Transformers in object detection. I have some questions from reading the paper and the code, and I hope you can give me some answers.

  1. What's the insight behind the pos_transformation T in Section 3.3?

  2. What is the meaning of the diagonal vector \lambda_q described in Section 3.3? I can't find any code for a diagonal operator in this repo; I only see pos_transformation generated by learnable weights:

    pos_transformation = self.query_scale(output)

  3. I can't figure out the difference between "Block", "Full", and "Diagonal" in Fig. 5.

The above are all my questions. I sincerely hope I can get your help. Thanks!

@DeppMeng
Collaborator

DeppMeng commented Apr 1, 2022

Sorry for the late reply.

  1. About T: T is a learnable linear projection, obtained by applying an FFN to the decoder embedding f. Since f contains displacement information of the distinct regions w.r.t. the reference point, we expect T to act as a displacement transformation in the embedding space of p. T could be a full matrix, a block matrix, or a diagonal matrix; we empirically studied these options and chose the diagonal one.
  2. \lambda_q is the vector of diagonal elements of the matrix T. It is `pos_transformation` in our code.
  3. For details, please refer to the paragraph "The effect of linear projections T forming the transformation." in our paper.
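To make points 1 and 2 concrete, here is a minimal sketch of the diagonal variant: an FFN predicts the diagonal vector \lambda_q from the decoder embedding, and applying T = diag(\lambda_q) then reduces to an element-wise product. Only the names `query_scale` and `pos_transformation` come from the snippet quoted above; the FFN depth and width here are assumptions for illustration.

```python
import torch
import torch.nn as nn

d_model = 256  # embedding dimension (assumed)

# FFN mapping the decoder embedding f to the diagonal vector lambda_q.
# A "full" variant would predict all d_model * d_model entries of T and
# a "block" variant a block-diagonal structure; the diagonal variant
# predicts only d_model values.
query_scale = nn.Sequential(
    nn.Linear(d_model, d_model),
    nn.ReLU(),
    nn.Linear(d_model, d_model),
)

f = torch.randn(1, d_model)  # decoder embedding for one query
p = torch.randn(1, d_model)  # sinusoidal positional embedding of the query

pos_transformation = query_scale(f)  # lambda_q, shape (1, d_model)

# T = diag(lambda_q), so applying T to p is element-wise scaling;
# the matrix itself is never materialized.
p_transformed = p * pos_transformation
```

Because only the diagonal is predicted, applying T costs one element-wise product per query rather than a full matrix-vector multiply.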

@JosonChan1998
Author

Thanks for your reply!

@WYHZQ

WYHZQ commented Dec 1, 2022

> \lambda_q is the vector of diagonal elements of the matrix T. It is `pos_transformation` in our code.

Thank you for your reply. You said \lambda_q is the diagonal elements of matrix T, but the `pos_transformation` obtained from the FFN does not extract diagonal elements; it is multiplied element-wise with `query_sine_embed` directly, i.e. `query_sine_embed = query_sine_embed * pos_transformation`. Can you explain the principle?
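The principle being asked about is the linear-algebra identity diag(\lambda) p = \lambda \odot p: the FFN output is already the diagonal of T, so there are no off-diagonal elements to extract, and the element-wise product is exactly the diagonal matrix applied to the embedding. A standalone check of the identity (an illustrative sketch, not the repo's code):

```python
import torch

d_model = 8
lam = torch.randn(d_model)  # lambda_q: the diagonal entries of T
p = torch.randn(d_model)    # positional embedding of one query

# Applying the explicit diagonal matrix...
via_matrix = torch.diag(lam) @ p
# ...matches element-wise multiplication by the diagonal vector.
via_elementwise = lam * p

assert torch.allclose(via_matrix, via_elementwise)
```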

@Vincent-luo

@WYHZQ Have you figured it out? I have the same confusion.
