Thanks for your excellent work!
I have a question about the provided model. In the provided Conditional DETR model "conditional detr resnet50", transformer.decoder.layer.cross_attn.out_proj.weight/bias have dimensions 256x256 and 256, respectively. But since the input of this cross-attention is the concatenation of two 256-d queries, it seems they should be 512x512 and 512. It really confuses me. Looking forward to your help, thanks!
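For reference, the observation can be reproduced by reading the shapes straight from the checkpoint's state dict. This is a minimal sketch: the file name is hypothetical, and it assumes a DETR-style checkpoint that nests the weights under a "model" key.

    import torch

    # Hypothetical checkpoint file name -- substitute the released weights.
    ckpt = torch.load("conditional_detr_r50.pth", map_location="cpu")
    state_dict = ckpt.get("model", ckpt)  # DETR-style checkpoints nest weights under "model"

    # Print the shape of every cross-attention output projection.
    for name, tensor in state_dict.items():
        if "cross_attn.out_proj" in name:
            print(name, tuple(tensor.shape))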
Sorry to disturb you again. My question is about the out_proj of cross_attn, i.e.

    self.cross_attn = nn.MultiheadAttention(d_model * 2, nhead, dropout=dropout, vdim=d_model)

rather than the out_proj you mentioned. In the source code of nn.MultiheadAttention, out_proj is set as

    self.out_proj = NonDynamicallyQuantizableLinear(embed_dim, embed_dim, bias=bias, **factory_kwargs)

and in your code embed_dim is set to d_model * 2, i.e. 512, so I would expect out_proj to be 512-d rather than 256-d, yet the provided model's weights are all 256-d.
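To make the dimension argument concrete, here is a minimal sketch against stock PyTorch (not the repo's own attention code): with embed_dim = 512, nn.MultiheadAttention always builds out_proj as Linear(embed_dim, embed_dim), even when vdim differs.

    import torch.nn as nn

    # Stock PyTorch sizes out_proj by embed_dim, not by vdim:
    # out_proj = NonDynamicallyQuantizableLinear(embed_dim, embed_dim, ...)
    mha = nn.MultiheadAttention(embed_dim=512, num_heads=8, vdim=256)
    print(mha.out_proj.weight.shape)  # torch.Size([512, 512])
    print(mha.out_proj.bias.shape)    # torch.Size([512])

So 256x256 out_proj weights would only load into an attention implementation whose output projection is sized by vdim (d_model) instead of embed_dim, for example a modified copy of MultiheadAttention; that is an assumption here, not something confirmed in this thread.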