
Why is object classification loss multiplied with the Faster R-CNN confidence score? #19

Closed
j-min opened this issue Oct 1, 2019 · 3 comments

j-min commented Oct 1, 2019

During training, mask_conf is multiplied with the feature regression and object classification losses, which are defined here.
It is reasonable to mask the feature regression loss on masked regions, but I don't understand the reason for multiplying the object classification loss (a cross-entropy loss) by the Faster R-CNN confidence score (the top object probability).
Is this a form of knowledge distillation? It is not mentioned in the EMNLP paper.
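For reference, here is a minimal sketch of the weighting I am asking about (a paraphrase rather than the repo's actual code; the tensor names and the exact regression loss are my own assumptions):

```python
import torch
import torch.nn.functional as F

def visual_losses(feat_pred, feat_target, label_logits, label_target, mask_conf):
    """
    feat_pred, feat_target : (num_objs, feat_dim)    predicted / detector RoI features
    label_logits           : (num_objs, num_classes) predicted object classes
    label_target           : (num_objs,)             detector's top label id
    mask_conf              : (num_objs,)             detector's top-label probability,
                                                     zero for positions that were not masked
    """
    # Feature regression, weighted per object by mask_conf so that only masked
    # regions contribute (a smooth-L1 loss here; the repo's choice may differ).
    feat_loss = F.smooth_l1_loss(feat_pred, feat_target, reduction='none').mean(-1)
    feat_loss = (feat_loss * mask_conf).mean()

    # Object classification: cross-entropy against the detector's top label,
    # also multiplied by the detector confidence -- the part my question is about.
    cls_loss = F.cross_entropy(label_logits, label_target, reduction='none')
    cls_loss = (cls_loss * mask_conf).mean()

    return feat_loss + cls_loss
```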

airsplay commented Oct 1, 2019

Instead of arguing for some "looks-like" reason, I must say that it was a practical consideration when I wrote it :->.

It is designed to stop over-fitting. I observed that the pre-training process easily over-fits the image features/labels: the training loss keeps decreasing while the validation loss (of object features / labels) starts increasing after 3 epochs. I thus multiply by this confidence: it's OK to overfit, but please overfit something more correct! A side effect is that the RoI-feature regression loss is no longer overfitted once it is multiplied by this confidence.

I personally think it would be better to use the KL divergence between the detected labels' confidences and the predicted probabilities (i.e., distillation) as the loss. For now, the code takes the label with the largest confidence score. I currently do not have an answer supported by experiments.
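For concreteness, a rough sketch of what I mean by the distillation variant (untested, and the names are illustrative rather than from the repo):

```python
import torch
import torch.nn.functional as F

def soft_label_kl_loss(label_logits, detector_probs, mask, eps=1e-8):
    """
    label_logits   : (num_objs, num_classes) model predictions for the RoIs
    detector_probs : (num_objs, num_classes) Faster R-CNN class probabilities
    mask           : (num_objs,)             1 for masked positions, 0 otherwise
    """
    log_pred = F.log_softmax(label_logits, dim=-1)
    # KL(detector || model), summed over classes, averaged over masked objects,
    # instead of a hard cross-entropy against the single top label.
    kl = (detector_probs * (torch.log(detector_probs + eps) - log_pred)).sum(-1)
    return (kl * mask).sum() / mask.sum().clamp(min=1)
```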

j-min commented Oct 2, 2019

Thanks for the clarification! I'm also going to EMNLP. Hopefully we can talk some more about this work in person soon :)

airsplay commented Oct 2, 2019

Happy to talk; see you in Hong Kong! :)

By the way, I just thought of a counter-intuitive finding w.r.t. the visual pre-training losses.

When I realized that the pre-training over-fits the visual losses, the first thing I did was to increase the mask rate of objects. Intuitively, a higher mask rate makes the vision tasks harder, so the over-fitting should be somewhat relieved.

However, the validation loss starts to increase (the bad direction) after 1 epoch (instead of 3 epochs) when the mask rate is increased...

A possible explanation (provided by Jie Lei) is that a higher mask rate increases the amount of "supervision" (more labels for detected-label classification and feature regression) per batch...
I accept this explanation, but I think the pre-training of the visual branch might need a fundamental improvement (which has not happened yet).
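A toy illustration of that point (the numbers are made up just to show the scaling):

```python
import torch

def supervised_objects_per_batch(batch_size=32, num_objs=36, mask_rate=0.15):
    # Each object is masked independently with probability mask_rate, so the
    # number of RoIs that receive a detected-label / feature target per batch
    # grows roughly linearly with the mask rate.
    masked = torch.rand(batch_size, num_objs) < mask_rate
    return masked.sum().item()

print(supervised_objects_per_batch(mask_rate=0.15))  # around 32 * 36 * 0.15 ~= 173
print(supervised_objects_per_batch(mask_rate=0.30))  # roughly twice as many ~= 345
```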
