predict bounding boxes on a new image ? #4

Open
Rahul-Venugopal opened this issue Dec 6, 2018 · 5 comments

@Rahul-Venugopal

Rahul-Venugopal commented Dec 6, 2018

Hi,
I read the paper and it is really interesting work.
I have a question about it: will the network be able to predict a bounding box on a new image (let's say a random image of a figure skater downloaded from the internet)?
If so, is there a test script available that can use the weights trained on your figure-skating dataset?
Also, can you share details of what the input data to the assessor and the localizer looks like?
Thanks,
Rahul

@Bartzi
Owner

Bartzi commented Dec 7, 2018

Hi,

if you download an image from the internet, it should actually work with the code...
You can download one of the models and use the script image_sheeping.py with a single image. We've described how to use the script in [this section](https://github.com/Bartzi/loans#visualization-of-the-train-results).

The input data to the localizer is an RGB image, mostly resized to 224x224. The input to the assessor is also an image, with a size of 75x75 for the sheep experiments and either 50x100 or 75x100 for the figure skating experiments.

@Rahul-Venugopal
Author

Rahul-Venugopal commented Jan 11, 2019

Hi,
There is something about the concept that I am either unclear on or misunderstanding. It would be great if anyone could help me understand this:

If the localizer crops a bigger region containing the target object, the assessor will output a ratio less than 1, which helps the localizer to crop a smaller region containing more of the target object. This gives the desired output.
But what if the localizer crops only a portion of the target object? In that case the ratio provided by the assessor will be high, as the cropped image contains only the target object (but not all of it).

(PS: Any help clarifying this is very much appreciated. Please correct me if I have understood it wrong conceptually.
I was testing with the figure skating model, and it predicts the correct bounding box whenever the input image contains only a figure skater. Unexpectedly, there was an image of a lion in the test folder, and the model drew a bounding box around the face of the lion. Is it supposed to behave this way?)

@Bartzi
Owner

Bartzi commented Jan 14, 2019

> But what if the localizer crops only a portion of the target object?

This case is handled by preparing the dataset in such a way that if the localizer crops only a portion of the object the assessor predicts a number lower than one. Remember: The objective of the assessor is to predict the intersection over union of the cropped image and the target object!
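To make the intersection-over-union point concrete, here is a minimal sketch in plain Python. The box format `(left, top, right, bottom)` and all coordinates are assumptions chosen for illustration; they are not taken from the repository. It shows that a crop covering only part of the object scores below 1, just like a crop that is too large.

```python
def box_area(box):
    """Area of an axis-aligned box given as (left, top, right, bottom)."""
    left, top, right, bottom = box
    return max(0.0, right - left) * max(0.0, bottom - top)


def intersection_over_union(crop, target):
    """IoU of two axis-aligned boxes in (left, top, right, bottom) format."""
    inter = (
        max(crop[0], target[0]),
        max(crop[1], target[1]),
        min(crop[2], target[2]),
        min(crop[3], target[3]),
    )
    inter_area = box_area(inter)
    union_area = box_area(crop) + box_area(target) - inter_area
    return inter_area / union_area if union_area > 0 else 0.0


# Hypothetical boxes, for illustration only.
target = (20, 20, 60, 100)    # box around the whole figure skater
too_large = (0, 0, 80, 120)   # crop with the skater plus background
too_small = (20, 20, 60, 60)  # crop with only the upper half of the skater

iou_exact = intersection_over_union(target, target)     # perfect crop
iou_large = intersection_over_union(too_large, target)  # below 1
iou_small = intersection_over_union(too_small, target)  # also below 1
```

Only the crop that matches the object exactly reaches an IoU of 1, so an assessor trained on this target has a reason to penalize partial crops too.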

> Unexpectedly, there was an image of a lion in the test folder, and the model drew a bounding box around the face of the lion. Is it supposed to behave this way?

There are several reasons why this might happen:

  1. The localizer has no means of determining whether a crop is good or not, so it will always produce a prediction regardless of the image. You could use the assessor's judgement to decide whether to keep the prediction.
  2. It might like the face of the lion because it contains features the localizer is looking for. If you used a model that was fine-tuned from a pre-trained ImageNet model, ImageNet features might also still interfere with the prediction. The main problem remains that there is (right now) no way of determining whether the generated crop is a good crop or not.
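The idea in point 1 of using the assessor to decide whether to keep a prediction could look roughly like the sketch below. Everything here is hypothetical: `filter_prediction`, the callable signatures, the stand-in models, and the `min_score` threshold of 0.5 are assumptions for illustration and do not correspond to functions in the repository.

```python
from typing import Callable, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


def filter_prediction(
    image: Sequence,
    localizer: Callable[[Sequence], Box],
    assessor: Callable[[Sequence, Box], float],
    min_score: float = 0.5,  # assumed threshold, would need tuning
) -> Optional[Box]:
    """Return the localizer's box only if the assessor rates the crop highly enough."""
    box = localizer(image)
    score = assessor(image, box)
    return box if score >= min_score else None


# Stand-in models so the sketch runs without trained weights.
def fake_localizer(image):
    return (10.0, 10.0, 50.0, 90.0)


def confident_assessor(image, box):
    return 0.9


def doubtful_assessor(image, box):
    return 0.2


kept = filter_prediction([], fake_localizer, confident_assessor)
rejected = filter_prediction([], fake_localizer, doubtful_assessor)
```

A rejected prediction is returned as `None`, so the caller can skip drawing a box. How well this works depends on how reliably the assessor scores crops of unrelated objects, so the threshold would have to be checked on validation images.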

I hope this helps^^

@Rahul-Venugopal
Author

Thanks @Bartzi for replying; your explanation really helped.

But I would like to clarify your explanation of the issue with the lion image:

  1. I understand that the localizer cannot determine whether the crop is good or not.
    Let's say the localizer crops the image of the lion and sends it to the assessor. In that case the assessor will output a
    ratio less than 1, as the target object is not a lion. My question is:
    "Does a trained assessor know how to accurately distinguish between the target object (figure skater) and
    a random object (lion)?"

  2. To determine whether the crop is good or not, would the following steps help?
    I take the final crop made by the localizer (the one with ratio 1 from the assessor) and feed it as input
    to a classifier trained to decide whether an image shows a figure skater or not. If the classifier outputs
    figure skater, the crop is correct; otherwise it is not. Will this work?

@Bartzi
Owner

Bartzi commented Jan 16, 2019

Hmm, good questions 😉

  1. The assessor does not really 'know' how to distinguish between objects... it is used to predict an overlap ratio, so to speak... but it still kind of knows that a lion should not be the real object and hence gives a lower score. It is difficult to say, actually, because we do not really know what the assessor looks at...
  2. Well, yes, that would work, but it would add more complexity to the system... So you are suggesting that the label used for the classifier should be based on the output of the assessor? If so, it would also make sense to add this classifier as an extra head of the localizer and train it in a multi-task fashion, so that the localizer itself might be able to provide you with a prediction of the quality of the crop. This would be a very elegant solution if it works.
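The multi-task idea in point 2 could be sketched as a combined loss, shown here in plain Python on scalar values. This is only an illustration under stated assumptions: it assumes the localizer is trained to push the assessor's score toward 1, that the extra head regresses the assessor's score, and that the two terms are summed with a hypothetical `quality_weight`. None of these names or choices come from the repository.

```python
def multi_task_loss(assessor_score, predicted_quality, quality_weight=1.0):
    """Combine a localization term with a term for a crop-quality head.

    assessor_score: IoU estimate in [0, 1] that the assessor gives for the crop.
    predicted_quality: the extra head's own estimate of that score.
    quality_weight: assumed weighting between the two terms.
    """
    # Assumed localization objective: push the assessor's score toward 1.
    localization_loss = 1.0 - assessor_score
    # The extra head learns to reproduce the assessor's score (squared error).
    # In a real framework the score would be a fixed target in this term.
    quality_loss = (predicted_quality - assessor_score) ** 2
    return localization_loss + quality_weight * quality_loss


# Hypothetical values: the assessor rates the crop 0.8, the head guessed 0.6.
example_loss = multi_task_loss(assessor_score=0.8, predicted_quality=0.6)
```

At inference time the quality head's output could then be thresholded directly, without running the assessor as a separate step.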
