[object detection] precision and recall calculations inaccuracy #1112
@sherifshehata yes, it's the average. @drendleman I believe I raised this concern with you a while back - I thought I remembered you saying you double-checked the math?
@sherifshehata you're right - the math is wrong. A contrived example:
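The original contrived example did not survive in this copy of the thread, but the discrepancy is easy to reproduce with made-up per-batch counts (these numbers are illustrative, not the ones from the issue): averaging per-batch precision is not the same as computing precision from the pooled TP/FP counts.

```python
# Hypothetical per-batch detection counts. Batch 1 is nearly perfect,
# batch 2 is noisy; averaging their precisions hides the noise.
batches = [
    {"tp": 1, "fp": 0},   # batch 1: precision = 1.0
    {"tp": 1, "fp": 9},   # batch 2: precision = 0.1
]

# What the mAP layer effectively does: a per-batch ratio, then a mean.
per_batch_precision = [b["tp"] / (b["tp"] + b["fp"]) for b in batches]
averaged = sum(per_batch_precision) / len(per_batch_precision)  # ~0.55

# What it should do: pool the raw counts first, divide once.
total_tp = sum(b["tp"] for b in batches)
total_fp = sum(b["fp"] for b in batches)
pooled = total_tp / (total_tp + total_fp)  # 2/11, ~0.18

print(averaged, pooled)
```

The two values differ by a factor of three here, and the gap grows with batch-to-batch variance.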
And you're also right that if we output the average TP/FP/FN counts from the network, we could post-process those to calculate precision and recall. Unfortunately, Caffe won't let us output the total TP/FP/FN values - we'd have to use the average values per image. But that math would still work out.
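Why per-image *averages* still work out, while per-batch *ratios* do not: precision and recall are ratios of counts, so dividing both numerator and denominator by the same image count N cancels. A small sketch with hypothetical per-image counts:

```python
# Hypothetical per-image counts (not from the issue).
tp = [3, 0, 5, 2]   # true positives per image
fp = [1, 2, 0, 1]   # false positives per image
n = len(tp)

mean_tp = sum(tp) / n
mean_fp = sum(fp) / n

# Dividing ratios of means: the 1/n cancels, so this equals the
# precision computed from the grand totals.
precision_from_means = mean_tp / (mean_tp + mean_fp)
precision_from_totals = sum(tp) / (sum(tp) + sum(fp))

print(precision_from_means, precision_from_totals)  # both 10/14
```

The per-batch scheme breaks precisely because each batch's ratio is formed *before* the averaging, so nothing cancels.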
I noticed the problem because the recall decreased dramatically when I doubled the validation set size, which is not expected.
Hi everyone,
Same issue found here. I think the real problem is that the accuracy axis is not fixed: once you hide the other data series, the graph rescales and the value ratio looks right, but it displays as lower than 0.1%.
Hello,
This issue is related to nv-caffe and DIGITS together.
I am exploring the accuracy and precision reported by the graph in DetectNet. These values are calculated by the "mAP" layer. The issue is that precision and recall are calculated per validation batch, and the final graphed metrics are then derived from the per-batch values (I think as an average; I couldn't find the exact code that does this). I don't think this is correct, and I believe it results in wrong values for precision and recall.
I think the error happens because of the batch-specific division here:
https://github.com/NVIDIA/caffe/blob/caffe-0.15/python/caffe/layers/detectnet/mean_ap.py#L161
My suggestion is that the "mAP" layer calculate true_positives, false_positives, and false_negatives, and that the division to compute precision and recall be done on the DIGITS side, after accumulating true_positives, false_positives, and false_negatives across batches.
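The proposed fix can be sketched as follows. This is only an illustration of the accumulate-then-divide idea; the function name and the `(tp, fp, fn)` tuple shape are assumptions, not actual DIGITS or nv-caffe APIs.

```python
def accumulate_and_score(batch_counts):
    """Compute precision/recall from pooled counts.

    batch_counts: iterable of (tp, fp, fn) tuples, one per
    validation batch, as a fixed mAP layer might emit them.
    """
    total_tp = total_fp = total_fn = 0
    for tp, fp, fn in batch_counts:
        total_tp += tp
        total_fp += fp
        total_fn += fn
    # Divide exactly once, over the whole validation set.
    precision = total_tp / (total_tp + total_fp) if total_tp + total_fp else 0.0
    recall = total_tp / (total_tp + total_fn) if total_tp + total_fn else 0.0
    return precision, recall

p, r = accumulate_and_score([(1, 0, 0), (1, 9, 3)])
print(p, r)  # 2/11 and 2/5
```

Because only raw counts cross the layer boundary, the result is independent of batch size, which would also fix the symptom above where recall changed when the validation set was doubled.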