
[object detection] precision and recall calculations inaccuracy #1112

Open
sherifshehata opened this issue Sep 23, 2016 · 7 comments

Comments

@sherifshehata

Hello,
This issue relates to both nv-caffe and DIGITS.

I am exploring the accuracy and precision reported by the graph in DetectNet. These values are calculated by the "mAP" layer. The issue is that precision and recall are calculated per validation batch, and the final graphed metrics are then derived from the per-batch metrics (as an average, I think; I couldn't find the exact code that does this). I don't believe this is correct, and it results in wrong values for precision and recall.

I think that the error happens because of the "batch specific" division here:
https://github.com/NVIDIA/caffe/blob/caffe-0.15/python/caffe/layers/detectnet/mean_ap.py#L161

My suggestion is that the "mAP" layer should output true_positives, false_positives, and false_negatives, and the division to calculate precision and recall should then be done on the DIGITS side, after accumulating the true_positives, false_positives, and false_negatives across batches.
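The accumulate-then-divide approach suggested above could be sketched like this (a minimal illustration only; the function name and tuple format are assumptions, not the actual mean_ap.py or DIGITS code):

```python
# Minimal sketch (hypothetical helper, not actual nv-caffe/DIGITS code):
# accumulate raw TP/FP/FN counts across all validation batches, then
# perform the precision/recall division exactly once at the end.
def precision_recall(batches):
    """batches: iterable of (tp, fp, fn) tuples, one per validation batch."""
    tp = fp = fn = 0
    for batch_tp, batch_fp, batch_fn in batches:
        tp += batch_tp
        fp += batch_fp
        fn += batch_fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

Dividing per batch and then averaging the ratios is a different operation: each batch contributes equally to the average regardless of how many detections it contains, which is what skews the result.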

@sherifshehata sherifshehata changed the title [object detection] mAP in [object detection] precision and recall calculations inaccuracy Sep 23, 2016
@lukeyeager
Member

@sherifshehata yes, it's the average.
https://github.com/NVIDIA/caffe/blob/v0.15.13/src/caffe/solver.cpp#L405
https://github.com/NVIDIA/caffe/blob/v0.15.13/src/caffe/solver.cpp#L424

@drendleman I believe I raised this concern with you a while back - I thought I remembered you saying you double-checked the math?

@lukeyeager
Member

lukeyeager commented Sep 23, 2016

@sherifshehata you're right - the math is wrong. A contrived example:

|           | Batch 1 | Batch 2 | Totals |
|-----------|---------|---------|--------|
| TruePos   | 70      | 20      | 90     |
| TrueNeg   | 0       | 0       | 0      |
| FalsePos  | 10      | 15      | 25     |
| FalseNeg  | 20      | 15      | 35     |

|           | Batch 1 | Batch 2 | Avg across batches | Correct answer |
|-----------|---------|---------|--------------------|----------------|
| Precision | 87.5    | 57.1    | 72.3               | 78.3           |
| Recall    | 77.8    | 57.1    | 67.5               | 72.0           |
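The arithmetic of the contrived example can be checked directly:

```python
# Verifying the contrived example: per-batch averaging vs. pooled counts.
batches = [(70, 10, 20), (20, 15, 15)]  # (TP, FP, FN) for Batch 1 and Batch 2

# Per-batch ratios, then averaged (what the current code effectively does)
avg_precision = sum(tp / (tp + fp) for tp, fp, _ in batches) / len(batches)
avg_recall = sum(tp / (tp + fn) for tp, _, fn in batches) / len(batches)

# Pooled counts, divided once (the correct answer)
TP = sum(b[0] for b in batches)    # 90
FP = sum(b[1] for b in batches)    # 25
FN = sum(b[2] for b in batches)    # 35
pooled_precision = TP / (TP + FP)  # 90/115
pooled_recall = TP / (TP + FN)     # 90/125

print(avg_precision, pooled_precision)  # the two disagree
```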

@lukeyeager
Member

And you're also right that if we calculate the average TP/FP/FN as network outputs, then we could post-process those to calculate Precision and Recall.

Unfortunately, Caffe wouldn't let us output the total TP/FP/FN values; we'd have to output the average values per image. But that math would still work out.
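The per-image-average point can be illustrated numerically (the counts below are just the example totals spread over a hypothetical 100-image validation set):

```python
# Illustration: if the layer can only output per-image *averages* of
# TP/FP/FN (totals divided by the number of images N), dividing those
# averages still yields the pooled precision, because the 1/N factors
# cancel out of the ratio.
n_images = 100  # hypothetical validation-set size
tp_total, fp_total, fn_total = 90, 25, 35  # totals from the example above

tp_avg = tp_total / n_images
fp_avg = fp_total / n_images
fn_avg = fn_total / n_images

precision_from_avgs = tp_avg / (tp_avg + fp_avg)
precision_from_totals = tp_total / (tp_total + fp_total)
assert abs(precision_from_avgs - precision_from_totals) < 1e-12
```

The same cancellation holds for recall, since TP/(TP/N + FN/N) divided through by N is TP/(TP + FN).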

@sherifshehata
Author

I noticed the problem because the recall decreased dramatically when I doubled the validation set size, which is not expected.

@RSly

RSly commented Jan 4, 2017

Hello,

  • I wonder if this issue has been addressed yet?
  • Also, I have noticed that the displayed values for mAP, precision, and recall do not correspond to the "Accuracy" axis on the right side; they correspond to the "Loss" axis on the left side. This is confusing and a bug: the loss is not normalized, so it can take any large value, whereas mAP, precision, and recall are all <= 100%.

[screenshot: training graph with mAP, precision, and recall plotted against the left "Loss" axis]

@raziehaskari

Hi everyone,
I have a question: how do I calculate precision, recall, and F-measure?
I read the comments and understand that I should sum the counts of TP, TN, FP, and FN, and then calculate precision, recall, etc. In other words, I should sum the cells of the confusion matrices. Is that right?
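One way to sketch the computation being asked about: sum the confusion-matrix counts over the whole validation set first, then compute the metrics once at the end (the helper below is illustrative, not part of DIGITS):

```python
# Illustrative helper (not DIGITS code): compute precision, recall, and
# F1 from confusion-matrix counts summed over the whole validation set.
def metrics(tp, fp, fn):
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1
```

Note that TN does not appear in any of these formulas, so for detection metrics only TP, FP, and FN actually need to be accumulated.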

@ccckblaze

ccckblaze commented Jan 17, 2018

Same issue here. I think the real problem is that the "Accuracy" axis is not fixed: once you hide the other data, the graph rescales and the ratio between the values is correct, but they are displayed as lower than 0.1%.
