
Cannot reproduce the author's results with the pre-trained models #9

Closed · wilderrodrigues opened this issue Jul 29, 2021 · 6 comments

@wilderrodrigues

Hi there,

I'm currently experimenting with few/one/zero-shot approaches to object detection and classification, and for one of the tasks I've been working with your paper.

Unfortunately, I haven't been able to reproduce your results with the pre-trained models you have made available. I also noticed that the inference code you made available does not work out of the box. To support my points, here are some details:

  1. At the moment it is not possible to use the latest PyTorch with the latest TorchVision; the latter should be pinned to version 0.9.0.
  2. For the ImageNet pre-trained model:
  • Your code samples use only 6 patches, but the model was trained with the default 100 queries and 10 patches, so the README file needs adjustments (see the sanity check after this list).
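As a quick sanity check (a minimal sketch; the checkpoint filename is a placeholder, and the `query_embed` parameter name is an assumption based on DETR-style models, so it may differ here), you can inspect the checkpoint itself to confirm how many query slots it was trained with:

    import torch

    # Load the released checkpoint on CPU and look for the query embedding;
    # its first dimension is the number of query slots the model was trained
    # with (expected: 100).
    ckpt = torch.load("pretrained.pth", map_location="cpu")  # placeholder name
    state = ckpt.get("model", ckpt)
    for name, tensor in state.items():
        if "query_embed" in name:
            print(name, tuple(tensor.shape))  # e.g. query_embed.weight, (100, 256)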

Results:

  • ImageNet pre-trained model (I duplicated some patches to make sure I had 10, same kittens image used)

Patches: [image]

Detections: [image]

  • COCO pre-trained model (custom image used)

Patches: [image]

Detections: [image]

Hardware used

  • MacBook Air M1
  • NVIDIA GeForce RTX 2080 Ti

Yeah, I tried with both the CPU and a CUDA-compliant device.

Are you sure you have uploaded the right checkpoint files?

Thanks in advance; looking forward to hearing from you.

@dddzg (Owner) commented Jul 30, 2021

> At the moment it is not possible to use the latest PyTorch with the latest TorchVision; the latter should be pinned to version 0.9.0.

I will update the code to support the latest PyTorch version.

The result seems a little weird. Could you provide more details so I can reproduce it?

@dddzg (Owner) commented Jul 30, 2021

Could you get the same result with our provided notebook?

@wilderrodrigues (Author)

Hi @dddzg,

Thanks for the reply, much appreciated.

It's working now. ;) It was a mistake on my side. The source code is not that different; I just refactored it a bit and added unit tests. The problem was that the checkpoint loader had its own separate unit test, and since test execution order is not guaranteed, the checkpoint was not loaded in time for inference to happen.

To fix it, I moved the model building and checkpoint loading into a setUp function in my unit test.
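The pattern looks roughly like this (a minimal, self-contained sketch: `DummyDetector` stands in for the real model, and the commented-out lines show where the actual checkpoint loading goes):

    import unittest
    import torch
    from torch import nn

    class DummyDetector(nn.Module):
        """Stand-in for the real detector so the sketch runs on its own."""
        def __init__(self):
            super().__init__()
            self.head = nn.Linear(4, 4)

        def forward(self, x):
            return {"pred_boxes": self.head(x)}

    class InferenceTest(unittest.TestCase):
        def setUp(self):
            # Build the model and load its weights before *every* test, so no
            # test depends on another test having loaded the checkpoint first.
            self.model = DummyDetector()
            # With the real model, the checkpoint is loaded right here:
            # ckpt = torch.load("checkpoint.pth", map_location="cpu")
            # self.model.load_state_dict(ckpt["model"])
            self.model.eval()

        def test_inference_returns_boxes(self):
            with torch.no_grad():
                out = self.model(torch.zeros(1, 4))
            self.assertIn("pred_boxes", out)

    if __name__ == "__main__":
        unittest.main()

With that in place, the results are below: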

Custom image with 10 hand-engineered patches: [image]

Kittens image with 10 hand-engineered patches: [image]

Kittens image with 10 randomly generated patches: [image]

Custom image with 10 randomly generated patches: [image]

As you can see, it works. However, some improvements are needed when it comes to random patches: the boxes are not good. The authors of this paper claim to have improved on this using region proposals from selective search: https://arxiv.org/pdf/2106.04550.pdf. No pre-trained models yet, though.
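For reference, my random patches were generated roughly like this (a hypothetical helper; the size fractions are my own choices, not values from your repo):

    import random
    import torch

    def random_patches(img: torch.Tensor, num_patches: int = 10,
                       min_frac: float = 0.1, max_frac: float = 0.5):
        """Crop `num_patches` random rectangles from a CHW image tensor."""
        _, h, w = img.shape
        patches = []
        for _ in range(num_patches):
            ph = random.randint(int(min_frac * h), int(max_frac * h))
            pw = random.randint(int(min_frac * w), int(max_frac * w))
            top = random.randint(0, h - ph)
            left = random.randint(0, w - pw)
            patches.append(img[:, top:top + ph, left:left + pw])
        return patches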

Thanks again and congrats on the good work.

@dddzg (Owner) commented Jul 30, 2021

Hi @wilderrodrigues. The boxes are not good with randomly cropped patches because the task is just a pretext task, and we don't really care about the accuracy of the pretext task.
During pre-training we freeze the CNN backbone (to preserve the CNN's discriminative features), so it is reasonable that the boxes are not that good. From what we have observed, you can pre-train the CNN backbone together with the transformer if you care about the accuracy of the boxes; I guess it would improve a lot.
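Roughly, the freezing looks like this (a minimal sketch; `model.backbone` and the AdamW settings are assumptions for illustration, not our actual training code):

    import torch
    from torch import nn

    def freeze_backbone(model: nn.Module) -> torch.optim.Optimizer:
        # Keep the CNN backbone fixed so its discriminative features are
        # preserved; only the transformer and prediction heads get gradients.
        for param in model.backbone.parameters():
            param.requires_grad = False
        # Optimize only the parameters that still require gradients.
        return torch.optim.AdamW(
            (p for p in model.parameters() if p.requires_grad), lr=1e-4
        )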

@wilderrodrigues (Author) commented Jul 30, 2021

Hi @dddzg,

Thanks for the extra info. We will probably try to fine-tune it and see how it behaves. Quick question: during fine-tuning I can change the number of queries/patches, right? Right now they are upper-bounded at 100 and 10, respectively. For instance, when trying the fine-tuned COCO checkpoint you made available on my custom image, I got this:

[image]

I used 10 random patches, so the result is reasonably good. I would expect that with more patches we could find more objects.

Will keep you posted on my experiments / changes to the code.

Thanks again.

@wilderrodrigues (Author)

Yeah, just checked, and I will fine-tune and change the queries and patches. :)

    parser.add_argument('--num_queries', default=100, type=int, help="Number of query slots")
    parser.add_argument('--num_patches', default=10, type=int, help='number of query patches')

Need to convert my dataset to the COCO format first.
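For anyone doing the same conversion, this is the minimal COCO detection layout (a sketch with made-up file names and values; only the commonly required keys are shown):

    # Minimal COCO-style detection annotations as a Python dict (serialize
    # with json.dump). bbox is [x, y, width, height] in absolute pixels.
    coco = {
        "images": [
            {"id": 1, "file_name": "img_0001.jpg", "height": 480, "width": 640},
        ],
        "annotations": [
            {
                "id": 1,
                "image_id": 1,
                "category_id": 1,
                "bbox": [100, 120, 64, 48],
                "area": 64 * 48,
                "iscrowd": 0,
            },
        ],
        "categories": [{"id": 1, "name": "object"}],
    }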
