Getting access to the one-shot object detection training code #25

JosephAssaker opened this issue on Jul 18, 2022

Hello there!

Since the code for the one-shot object detection task is not available in this repository, would there be any way to access it? If not, would it be possible for you to share this code with me?

I tried to re-implement the ideas presented in your paper on top of DETR, but I was unable to replicate the results reported there. In fact, I was not able to build a model that "learns" at all: the loss stays high throughout training and never shows a consistent downward trend.

In detail, here is what I did (sketched in code below):
- I took DETR's architecture and added the query image as an extra input.
- The query image is passed through the same backbone CNN as the target image.
- The resulting feature map is fed to an average pooling layer that reduces the H*W dimensions to 1 (nn.AdaptiveAvgPool2d((1, 1))).
- The pooled vector is fed to a linear projection layer (nn.Linear(backbone.num_channels, hidden_dim)) that maps the features from an N-dimensional space to an M-dimensional space, where N is the backbone's channel dimension and M is the hidden dimension of the encoder-decoder transformer.
- Finally, the projected vector is repeated X times (X being the number of object queries in the architecture) and added to the object query vectors, following our discussion in #24.
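To make the above concrete, here is a minimal sketch of my forward pass. It is not the exact code I run: the class and attribute names (OneShotDETR, query_pool, query_proj) are made up for illustration, PyTorch's nn.Transformer stands in for DETR's encoder-decoder, and the positional encodings, masks, and prediction heads are omitted.

```python
import torch
import torch.nn as nn


class OneShotDETR(nn.Module):
    """Sketch of DETR modified with a query-image branch (illustrative names)."""

    def __init__(self, backbone, num_channels, hidden_dim=256, num_queries=100,
                 nheads=8, num_encoder_layers=6, num_decoder_layers=6):
        super().__init__()
        self.backbone = backbone                                    # shared CNN for target and query image
        self.input_proj = nn.Conv2d(num_channels, hidden_dim, 1)    # DETR's 1x1 projection of target features
        self.query_pool = nn.AdaptiveAvgPool2d((1, 1))              # collapse H*W of the query-image features
        self.query_proj = nn.Linear(num_channels, hidden_dim)       # N-dim -> M-dim projection
        self.query_embed = nn.Embedding(num_queries, hidden_dim)    # DETR object queries
        self.transformer = nn.Transformer(hidden_dim, nheads,
                                          num_encoder_layers, num_decoder_layers)

    def forward(self, target_img, query_img):
        B = target_img.shape[0]

        # 1) target image through the backbone, projected to hidden_dim
        tgt_feat = self.input_proj(self.backbone(target_img))       # (B, M, H, W)
        memory_in = tgt_feat.flatten(2).permute(2, 0, 1)             # (H*W, B, M)

        # 2) query image through the same backbone, pooled to a single vector
        q_feat = self.backbone(query_img)                            # (B, N, h, w)
        q_vec = self.query_proj(self.query_pool(q_feat).flatten(1))  # (B, M)

        # 3) repeat the query vector X times and add it to the object queries
        obj_queries = self.query_embed.weight.unsqueeze(1).repeat(1, B, 1)  # (X, B, M)
        obj_queries = obj_queries + q_vec.unsqueeze(0)                       # broadcast over X

        # 4) encoder-decoder pass (positional encodings omitted for brevity)
        hs = self.transformer(memory_in, obj_queries)                # (X, B, M)
        return hs
```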

My goal was to replicate the results (shown below) reported in your paper for "DETR" (without pretraining) on one-shot object detection on PASCAL VOC.

[Screenshot of the one-shot object detection results table from the paper]

Unfortunately, I was not able to replicate these results; in fact, I never got a converging model that learned the task at all (the loss stays high and oscillates). I tried various backbone learning rates (1e-4, 5e-5, 1e-5, and 0), and all led to roughly the same outcome. Lastly, I also tried adding your proposed feature reconstruction loss to my code (both with backbone lr = 0 and > 0), but that didn't help either.
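For completeness, this is how I varied the backbone learning rate, following the parameter-group pattern from DETR's main.py. The values shown are just examples from my runs, and `model` is assumed to be the modified DETR model from the sketch above.

```python
import torch

# lr_backbone was varied over {1e-4, 5e-5, 1e-5, 0}; 0 effectively freezes the backbone.
lr, lr_backbone, weight_decay = 1e-4, 1e-5, 1e-4

# Separate parameter groups so the backbone can use a different learning rate
# than the transformer and heads, as in DETR's main.py.
param_dicts = [
    {"params": [p for n, p in model.named_parameters()
                if "backbone" not in n and p.requires_grad]},
    {"params": [p for n, p in model.named_parameters()
                if "backbone" in n and p.requires_grad],
     "lr": lr_backbone},
]
optimizer = torch.optim.AdamW(param_dicts, lr=lr, weight_decay=weight_decay)
```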

Thank you for your time, and I'm looking forward to hearing back from you!
