
Wrong Calculation of Mean Average Precision #1184

Closed
RostyslavUA opened this issue Aug 15, 2022 · 11 comments · Fixed by #1832 · May be fixed by dhananjaisharma10/metrics#1
Labels: bug / fix, help wanted

@RostyslavUA

RostyslavUA commented Aug 15, 2022

🐛 Bug

Wrong calculation of mean Average Precision (mAP). I get

{'map': tensor(1.),
 'map_50': tensor(1.),
 'map_75': tensor(-1),
 'map_small': tensor(1.),
 'map_medium': tensor(-1.),
 'map_large': tensor(-1.),
 'mar_1': tensor(1.),
 'mar_10': tensor(1.),
 'mar_100': tensor(1.),
 'mar_small': tensor(1.),
 'mar_medium': tensor(-1.),
 'mar_large': tensor(-1.),
 'map_per_class': tensor([ 1.,  1., -1., -1.]),
 'mar_100_per_class': tensor([ 1.,  1., -1., -1.])}

while I would expect 'map': tensor(0.5) and 'map_per_class': tensor([1., 1., 0., 0.]) (see the example below).

To Reproduce

Steps to reproduce the behavior...

import torch
from torchmetrics.detection.mean_ap import MeanAveragePrecision
metric = MeanAveragePrecision(iou_thresholds=[0.5], class_metrics=True)
preds = [
  dict(
    boxes=torch.Tensor([[0, 0, 20, 20],
                        [30, 30, 50, 50],
                        [70, 70, 90, 90],  # FP
                        [100, 100, 120, 120]]),  # FP
    scores=torch.Tensor([0.6, 0.6, 0.6, 0.6]),
    labels=torch.IntTensor([0, 1, 2, 3]),
  )

]

targets = [
  dict(
    boxes=torch.Tensor([[0, 0, 20, 20],
                        [30, 30, 50, 50]]),
    labels=torch.IntTensor([0, 1]),
  )
]
metric.update(preds, targets)
metric.compute()

Expected behavior

The AP for classes 2 and 3 must be 0. Thus, the mAP would become 0.5.
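For reference, the arithmetic behind that expectation, just the numbers above written out:

# Expected per-class AP: 1.0 for classes 0 and 1, 0.0 for classes 2 and 3
# (false positives only), averaged over all four classes.
expected_map = (1.0 + 1.0 + 0.0 + 0.0) / 4  # = 0.5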

Environment

  • TM version is 0.9.3 (installed with pip)
  • Python 3.7.4
  • Torch 1.10.0+cu111 (installed with pip)
  • Windows Subsystem for Linux
@RostyslavUA added the bug / fix and help wanted labels on Aug 15, 2022
@github-actions

Hi! Thanks for your contribution, great first issue!

@MihailMihaylov97

Any updates on this issue?

@dhananjaisharma10

dhananjaisharma10 commented Sep 10, 2022

Why does the above happen?

  • Precision and Recall are initialized as tensors containing -1s here.
  • When the code enters these nested for-loops, the static method MeanAveragePrecision.__calculate_recall_precision_scores is called. For class indices 2 and 3, this method does not change the corresponding locations in the Precision and Recall tensors (precision[:, :, class_idx, :, :] and recall[:, class_idx, :, :], respectively), because for the given example it returns here without making any changes. Hence, all entries for these classes are still -1.
  • Since all entries are -1, this call used for calculating the mean AP returns torch.tensor([-1.0]) (see the sketch below).
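A rough sketch of that behaviour (the tensor shapes are assumed for illustration and are not the real torchmetrics internals):

import torch

# Rough sketch of the behaviour described above; shapes are assumed for
# illustration and are not the real torchmetrics internals.
num_iou_thrs, num_rec_thrs, num_classes = 1, 101, 4
precision = -torch.ones((num_iou_thrs, num_rec_thrs, num_classes))
precision[:, :, 0] = 1.0  # class 0 is filled in during evaluation
precision[:, :, 1] = 1.0  # class 1 is filled in; classes 2 and 3 stay at -1

ap_per_class = []
for c in range(num_classes):
    vals = precision[:, :, c]
    vals = vals[vals > -1]  # only well-defined entries are averaged
    ap_per_class.append(vals.mean() if vals.numel() else torch.tensor(-1.0))
print(torch.stack(ap_per_class))  # tensor([ 1.,  1., -1., -1.])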

What can be done to solve this issue?

In such a case, the static method MeanAveragePrecision.__calculate_recall_precision_scores can be changed to modify precision and recall when there are no GT bboxes. Basically, your example represents an edge case where there cannot be any True Positives (TP), and hence precision must be zero. Recall is undefined in this case, since both the TP and FN counts are zero. Note that in your example you had FPs for classes 2 and 3; if they were not present, then precision would also be undefined for these classes.
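For illustration, a minimal sketch of that edge case with hypothetical counts (not torchmetrics code): predictions but no ground-truth boxes means every detection is a false positive, so precision is 0 while recall is 0/0 and therefore undefined.

# Hypothetical counts for class 2 in the example above (not torchmetrics code).
tp, fp, fn = 0, 1, 0  # one predicted box, no ground truth boxes for this class
precision = tp / (tp + fp) if tp + fp > 0 else None  # 0.0
recall = tp / (tp + fn) if tp + fn > 0 else None     # None, i.e. undefined
print(precision, recall)  # 0.0 None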

Hope it helps! 😄

@SkafteNicki added this to the v0.11 milestone on Sep 14, 2022
@Borda
Member

Borda commented Oct 19, 2022

What can be done to solve this issue?

@dhananjaisharma10 would you be interested to send a draft PR with a suggested fix? 🐰

@Borda modified the milestones: v0.11, v0.10 on Oct 19, 2022
@dhananjaisharma10

What can be done to solve this issue?

@dhananjaisharma10 would you be interested to send a draft PR with a suggested fix? 🐰

@Borda Sure, I can commence work on this.

@Borda
Member

Borda commented Nov 7, 2022

@dhananjaisharma10, how are you doing, can we help you with something? 🦦

@dhananjaisharma10

dhananjaisharma10 commented Nov 7, 2022

@Borda I was a little held up in the last 2-3 weeks, though I am relatively free now. Is it okay if I provide an update by this weekend?

EDIT: I have commenced work on it.

@Borda
Member

Borda commented Nov 7, 2022

Is it okay if I provide an update by this weekend?

Sure, whenever you have time :)

@dhananjaisharma10

@Borda could you please check the Draft PR and guide further. Thanks.

@Borda
Member

Borda commented Nov 9, 2022

@Borda could you please check the Draft PR and guide further. Thanks.

could you please open the PR against this repo? 🦦

@SkafteNicki
Member

Hi @RostyslavUA, thanks for raising this issue.
I am trying to revamp our MeanAveragePrecision implementation (PR #1832) to use pycocotools directly, and in that process I am checking all issues that have to do with the metric, like this one. What I am seeing is that the per-class mean average precision from pycocotools is also

torch.tensor([ 1.,  1., -1., -1.])

and not torch.tensor([1., 1., 0., 0.]) as you are expecting; thus the map value should actually be 1 and not 0.5.

The explanation:
First off, I am not an expert in object detection, but I have been looking at the original code for some time now, and this is just my explanation of why the results actually make sense. For the example you provide, we can quickly identify that the number of correct detections must be 1 for classes {0,1} and 0 for classes {2,3}. To calculate precision/recall we use the formulas:

precision = true detections / all detected boxes
recall = true detections / all ground truth boxes

Precision is fine: it becomes 1.0 for classes {0,1} and 0 for classes {2,3}. However, recall is an issue. For classes {0,1} it is still 1, but for classes {2,3} we would be calculating 0 / 0 because there are no ground truth boxes. Thus, for these two classes the recall is undefined, which means the precision-recall curve is undefined, which means the average precision for these two classes is undefined. This is the exact reason our current implementation (and pycocotools) assigns them a score of -1, which essentially means undefined. As map is defined as the mean over the class-wise average precision, that would naively mean averaging over two classes with a well-defined score and two classes with an undefined score, again resulting in something undefined. The official implementation gets around this problem by simply filtering away scores marked as undefined with -1:
https://github.com/cocodataset/cocoapi/blob/8c9bcc3cf640524c4c20a9c40e89cb6a2f2fa0e9/PythonAPI/pycocotools/cocoeval.py#L452-L455
Thus we arrive at a score of 1 and not 0.5.
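A minimal sketch of that filtering step (it mirrors the linked pycocotools logic rather than copying it): per-class scores of -1 mark undefined classes and are dropped before averaging.

import torch

# Minimal sketch of the averaging described above: -1 marks an undefined
# class and is filtered out before taking the mean, mirroring the linked
# pycocotools logic rather than copying it.
ap_per_class = torch.tensor([1., 1., -1., -1.])
valid = ap_per_class > -1
mean_ap = ap_per_class[valid].mean() if valid.any() else torch.tensor(-1.0)
print(mean_ap)  # tensor(1.) -- the undefined classes do not enter the average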

Hope this makes sense :]
