Add Mean Average Precision (mAP) metric #53

Closed
ternaus opened this issue Jul 8, 2020 · 44 comments · Fixed by #467

@ternaus

ternaus commented Jul 8, 2020

The main metric for object detection tasks is Mean Average Precision (mAP), implemented in PyTorch and computed on GPU.

It would be nice to add it to the collection of metrics.

An example implementation using NumPy:

https://github.com/ternaus/iglovikov_helper_functions/blob/master/iglovikov_helper_functions/metrics/map.py
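
For reference, a minimal PyTorch sketch of the per-class AP computation (illustrative only; it assumes detections have already been matched to ground truth and is not the linked NumPy code):

import torch

def average_precision(scores, is_tp, num_gt):
    # AP for a single class: `scores` and `is_tp` hold one entry per detection,
    # where `is_tp` marks detections already matched to a ground-truth box.
    order = torch.argsort(scores, descending=True)
    tp = is_tp[order].float()
    cum_tp = torch.cumsum(tp, dim=0)
    cum_fp = torch.cumsum(1.0 - tp, dim=0)
    recall = cum_tp / max(num_gt, 1)
    precision = cum_tp / (cum_tp + cum_fp).clamp(min=1e-12)
    # Make precision monotonically decreasing, then integrate it over recall.
    precision = precision.flip(0).cummax(dim=0).values.flip(0)
    recall = torch.cat([torch.zeros(1), recall])
    return torch.sum((recall[1:] - recall[:-1]) * precision)

# mAP is then the mean of the per-class APs (COCO additionally averages over
# several IoU thresholds from 0.5 to 0.95).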

@OrthoDex

OrthoDex commented Jul 9, 2020

Hello! First time contributor here. I'd like to take a shot at this!

@awaelchli
Member

would be great to have that metric!

@NumesSanguis

Not entirely sure yet, but it looks like AveragePrecision (https://pytorch-lightning.readthedocs.io/en/stable/metrics.html) acts like mAP, although "Mean" wasn't added to the name?

@Kshitij09

I recently managed to implement a minimal version of Kaggle's Mean Average Precision metric. The difference is in the calculation itself. You can find the details here:

https://www.kaggle.com/c/global-wheat-detection/overview/evaluation

My kernel: pytorch-mean-absolute-precision-calculation

With slight modifications to the mAP formula, I suppose we would be able to integrate this metric into pytorch-lightning (since we already have average precision implemented).

I've already written the logic to map predicted boxes to ground-truth ones (taking their respective scores into consideration), so have a look at the kernel and let me know if you find any issues.
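
For illustration, the matching idea boils down to something like this (a rough sketch assuming xyxy boxes and torchvision's box_iou, not the exact kernel code):

import torch
from torchvision.ops import box_iou

def match_predictions(pred_boxes, pred_scores, gt_boxes, iou_threshold=0.5):
    # Visit predictions in descending score order; each one greedily claims the
    # best still-unmatched ground-truth box if the IoU exceeds the cutoff.
    is_tp = torch.zeros(len(pred_boxes), dtype=torch.bool)
    if len(gt_boxes) == 0:
        return is_tp
    iou = box_iou(pred_boxes, gt_boxes)          # [num_preds, num_gt]
    available = torch.ones(len(gt_boxes), dtype=torch.bool)
    for i in torch.argsort(pred_scores, descending=True):
        candidate = iou[i].clone()
        candidate[~available] = -1.0             # skip already-matched ground truth
        best_iou, best_j = candidate.max(dim=0)
        if best_iou >= iou_threshold:
            available[best_j] = False
            is_tp[i] = True
    return is_tp                                 # unmatched predictions count as false positives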

@briankosw

Hi, I would love to take this on. @SkafteNicki could you assign it to me?

@SkafteNicki
Member

SkafteNicki commented Oct 28, 2020

@briankosw thank you for wanting to contribute!
Please ping me in your PR.

@Vichoko

Vichoko commented Oct 29, 2020

We need this! 🤩

For PyTorch Lightning modules focused on multi-class classification, it would be useful to obtain the mAP metric over the test_dataloader samples in a standard way that works with DDP, with only minor changes to the architecture. This metric is commonly used to draw conclusions about the representation-learning side of deep learning architectures without using any thresholds or max functions to make hard decisions on the predictions.

Most classifiers have a pair of fully connected layers that produce class-wise likelihood predictions. The output of the last layer before the FC layers could be used for a retrieval task over the test dataset, and mAP could be calculated over this.
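
A rough sketch of what I mean (illustrative only; cosine similarity over the pre-FC embeddings, every test sample used as a query):

import torch
import torch.nn.functional as F

def retrieval_map(embeddings, labels):
    # Rank all other test samples by cosine similarity to each query, mark
    # results sharing the query's label as relevant, average the per-query APs.
    emb = F.normalize(embeddings, dim=1)
    sim = emb @ emb.t()
    aps = []
    for q in range(len(labels)):
        order = torch.argsort(sim[q], descending=True)
        order = order[order != q]                          # drop the query itself
        relevant = (labels[order] == labels[q]).float()
        if relevant.sum() == 0:
            continue
        hits = torch.cumsum(relevant, dim=0)
        precision_at_k = hits / torch.arange(1, len(relevant) + 1)
        aps.append((precision_at_k * relevant).sum() / relevant.sum())
    return torch.stack(aps).mean()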

@Borda Borda transferred this issue from Lightning-AI/pytorch-lightning Mar 12, 2021
@Borda Borda added enhancement New feature or request help wanted Extra attention is needed labels Mar 17, 2021
@jspaezp
Contributor

jspaezp commented Mar 18, 2021

@Borda I can give this a go.

But I am wondering what input format would be the most consistent with the rest of the package ...
Most of what is implemented in the package takes tensors of shape [N, C, ...] directly ... and what scope would you like for it ...

So directly ...

  1. Should it support only pre-aligned inputs? EDIT: I don't think so ... but supporting box inputs requires this to be implemented
    • (preds = [N,C] , targets = [N,C] )
  2. Should it support bounding boxes? If so, in what format (xxyy, xyxy, xwyh, xyhw ...)? Should each update be for a single "group" (each image for instance) or should it take a grouping argument?
    • (preds: float = [N1,C] , targets: long/bool = [N2,C] , pred_boxes = [x,x,y,y], target_boxes = [x,x,y,y], pred_grouping: long = [N1], target_grouping: long = [N2], iou_cutoff = 0.5); where the values in the grouping are in the range [0, NUM_GROUPS] so only boxes in the same group are compared with each other.
  3. Should it support masks?
    • Option 1: (preds: float = [N,C] , targets: long/bool = [N,C] , pred_masks = [N,H,W], target_masks = [N,H,W], grouping = [N], iou_cutoff = 0.5, mask_threshold = 0.5)
    • Option 2: (preds: float = [N1,C,H,W] , targets: long/bool = [N2,C,H,W], pred_grouping: long = [N1], target_grouping: long = [N2], iou_cutoff = 0.5, mask_threshold = 0.5)

On the other hand ... the COCO implementation has, for each image, a set of positive and negative labels and ignores the rest ... should that be supported?
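
To make option 2 above concrete, a hypothetical functional signature (names, shapes, and defaults are illustrative only, not a settled API):

from torch import Tensor

def mean_average_precision(
    preds: Tensor,            # [N1, C] class scores for each predicted box
    target: Tensor,           # [N2] ground-truth class indices
    pred_boxes: Tensor,       # [N1, 4] boxes in (x1, y1, x2, y2)
    target_boxes: Tensor,     # [N2, 4] boxes in (x1, y1, x2, y2)
    pred_grouping: Tensor,    # [N1] image index of each predicted box
    target_grouping: Tensor,  # [N2] image index of each ground-truth box
    iou_cutoff: float = 0.5,
) -> Tensor:
    # Only boxes whose grouping indices match would be compared with each other.
    ...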

lmk what you think
Kindest wishes,
Sebastian

edit: leaving here for reference

@lucadiliello
Contributor

mAP was added to PL some months ago and is now part of the torchmetrics package.

@Borda
Member

Borda commented Mar 22, 2021

added with 60c0eec

@Borda Borda closed this as completed Mar 22, 2021
@SkafteNicki
Member

Reopening, as the current implementation in torchmetrics is for information retrieval, while this issue asks for an implementation for object detection.

@SkafteNicki SkafteNicki reopened this Mar 22, 2021
@jspaezp
Contributor

jspaezp commented Mar 26, 2021

Just FYI, I have been working on an implementation over here https://github.com/jspaezp/metrics/tree/feature-spatial-average-precision but it is not quite ready for a PR.

Right now it needs ... (this is more a personal checklist before I send the PR)

  • DDP testing
  • Documentation (it is there; it just needs some argument updates and to be moved up the classes to be more user friendly)
  • "Translating" the tests to be consistent with the rest of the testing suite (right now they are more for quick-and-dirty TDD and not really for production consistency)
  • A fair amount of code cleanup/reorganization so the code order makes logical sense.

If you have any suggestions before I send the PR, that would be great, but it is not expected.
Kindest wishes,
Sebastian

@SkafteNicki
Member

@jspaezp if you are going to send a PR, please also take a look at this PR Lightning-AI/pytorch-lightning#4564 (it got closed after we moved from lightning to torchmetrics).
It was basically a nearly finished PR with a working mAP implementation, but it just needs a bit more work on the testing aspects.

@jspaezp
Contributor

jspaezp commented Mar 28, 2021

@SkafteNicki thanks for letting me know! I feel like that implementation would be really hard to convert to a class metric due to how "monolithic" it is, but I can definitely work on it.

Best,
Sebastian
(btw: it would have been nice to have that PR referenced in this issue before this time)

@SkafteNicki
Member

@jspaezp I am fine with only having a functional interface (at least to begin with).
Sorry about not referencing this before; some work got forgotten in the transfer.

@patricieni

Hello everyone! What is the status of this PR? Is someone actively working on it?

From the prior PR in Lightning-AI/pytorch-lightning#4564 it doesn't seem that it would work for DDP, which seems to be the main challenge when implementing this as a metric.

Has anyone tried implementing this metric as a subclass of torchmetrics.Metric first? Our team is currently working on this.

@zhiqwang

zhiqwang commented May 6, 2021

Hi, as discussed in #190, perhaps adding a wrapper around pycocotools is also an option?

@stale stale bot added the wontfix label Jul 5, 2021
@SkafteNicki SkafteNicki added this to To do in Image processing via automation Jul 7, 2021
@Borda Borda changed the title Feature request: Add Mean Average Precision (mAP) metric Add Mean Average Precision (mAP) metric Jul 7, 2021
@Lightning-AI Lightning-AI deleted a comment from github-actions bot Jul 7, 2021
@Lightning-AI Lightning-AI deleted a comment from stale bot Jul 7, 2021
@Borda Borda unpinned this issue Aug 8, 2021
@tkupek
Contributor

tkupek commented Aug 8, 2021

Sorry, not yet. I will have a look at it during my vacation in two weeks.

@gau-nernst

Hello, I was also looking for a way to evaluate my object detector in PyTorch Lightning. In the end I implemented some simple logic to make predictions in validation_step(), gather them in validation_epoch_end(), write them to a temp file, and run pycocotools. This probably won't work with multi-GPU training.

I tried implementing this as a torchmetrics.Metric subclass before but was not successful; I was not sure why. I can try again.

Just a note on the technical details of Metric: can a metric state be a list of lists, where each element is a list of Tensors? The documentation seems to imply that a metric state can only be a torch.Tensor or a list of torch.Tensors.

Also, is it safe to move tensors to CPU inside the compute() method?
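
For context, this is roughly the shape of what I tried (a sketch only; the class name is made up, and it assumes torchmetrics accepts flat list states via add_state(default=[], dist_reduce_fx="cat")):

import torch
from torchmetrics import Metric

class DetectionMAP(Metric):          # hypothetical class, not the final implementation
    def __init__(self):
        super().__init__()
        # Flat lists of tensors (one entry per image) rather than nested lists.
        self.add_state("pred_boxes", default=[], dist_reduce_fx="cat")
        self.add_state("pred_scores", default=[], dist_reduce_fx="cat")
        self.add_state("pred_labels", default=[], dist_reduce_fx="cat")
        self.add_state("gt_boxes", default=[], dist_reduce_fx="cat")
        self.add_state("gt_labels", default=[], dist_reduce_fx="cat")

    def update(self, preds, target):
        for p, t in zip(preds, target):          # one dict per image
            self.pred_boxes.append(p["boxes"])
            self.pred_scores.append(p["scores"])
            self.pred_labels.append(p["labels"])
            self.gt_boxes.append(t["boxes"])
            self.gt_labels.append(t["labels"])

    def compute(self):
        # Moving the gathered tensors to CPU here and handing them to pycocotools
        # is one option; the evaluation itself is omitted in this sketch.
        ...

One caveat: after DDP synchronization the per-image structure of list states may not survive, so an explicit image-index state would probably be needed to keep boxes grouped by image.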

@zhiqwang

zhiqwang commented Aug 18, 2021

Hi @gau-nernst ,

I've also written a torchmetrics.Metric wrapper around pycocotools here (it currently only supports a single GPU); it can be used with the lightning-assembly, but it still lacks numerical unit tests and multi-GPU support.

@tkupek
Contributor

tkupek commented Aug 18, 2021

The main problem with your approach (moving to CPU, writing to a file) will be performance and the lack of multi-GPU training support.

@gau-nernst

Hey @zhiqwang,

Awesome work. I see it's based on torchvision/detectron2 evaluation code. From what I understand from your code, it requires a COCO ground truth annotations file. I think if we add object detection mAP to torchmetrics, it should be agnostic to dataset format, and can be computed based on a set of ground truth and predicted boxes only.

@tkupek Yes, these are the obvious problems with my approach. If we want performance, the logic should be implemented in torch and not use pycocotools. Some people have done this, I believe. But the problem is whether the torch implementation will be accurate and consistent with pycocotools.

For multi-GPU training, if torchmetrics synchronizes target and predicted boxes across GPUs, it should be okay to transfer the results to CPU to calculate mAP using pycocotools?

@zhiqwang

zhiqwang commented Aug 18, 2021

I see it's based on torchvision/detectron2 evaluation code. From what I understand from your code, it requires a COCO ground truth annotations file. I think if we add object detection mAP to torchmetrics, it should be agnostic to dataset format, and can be computed based on a set of ground truth and predicted boxes only.

@gau-nernst Yep, I agree with you; the design and functionality of this module still require some deliberation here.

For multi-GPU training, if torchmetrics synchronizes target and predicted boxes across GPUs, it should be okay to transfer the results to CPU to calculate mAP using pycocotools?

I guess it is possible, and it seems that PyTorch 1.9 provides a better interface to synchronize arbitrary picklable data in multi-GPU mode with torch.distributed.all_gather_object, and it has been applied in torchvision: pytorch/vision#3857
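
A minimal sketch of that pattern (the helper name is made up; it assumes the default process group is already initialized):

import torch.distributed as dist

def gather_detections(local_results):
    # `local_results` could be, e.g., a list of per-image prediction dicts
    # produced on this rank; every rank receives the concatenation of all lists.
    if not (dist.is_available() and dist.is_initialized()):
        return local_results
    gathered = [None] * dist.get_world_size()
    dist.all_gather_object(gathered, local_results)
    return [item for rank_results in gathered for item in rank_results]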

@SkafteNicki
Member

It is clear that you are all more expert on this specific metric than I am, but let me get this straight:
do we need, from the torchmetrics side, to support gathering of arbitrary objects if we want this metric to work, or is there some trivial way around it?

@tkupek
Contributor

tkupek commented Aug 18, 2021

@gau-nernst I have an approach that bypasses the need for a temp file and initializes the COCOeval class with the detections and targets directly.
To the best of my knowledge, this works with the current torchmetrics functionality.
Let me upload my rough draft in the evening so we can discuss the approach together. If we agree, we can improve the code to make it acceptable for the torchmetrics package.

@tkupek
Contributor

tkupek commented Aug 18, 2021

So this is my approach:
https://github.com/tkupek/pytorchlightning-metrics/blob/mean-average-precision/torchmetrics/image/map.py

It takes the preds and targets and converts them into COCO format in memory, then initializes the COCOeval class and runs the evaluation.
In my company this works smoothly with only a minor performance impact, even for multi-GPU training.

Incomplete TODO list:

  • check if pycocoeval can handle tensors to avoid .cpu() calls
  • standardize MAPMetricResults to have all evaluation results in there
  • refactor some code parts (e.g. join get_coco_target and get_coco_preds methods)
  • add unittests and documentation in torchmetrics format

@zhiqwang @gau-nernst @SkafteNicki if you agree with this approach I can try to finalize the draft in the next week (or we can work on this together).
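
For reference, the core of such an in-memory evaluation boils down to roughly this (a simplified sketch under the assumption that predictions and targets have already been converted to COCO-style dicts; not the exact draft linked above):

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

def coco_map(images, categories, gt_annotations, det_annotations):
    # `gt_annotations` are COCO-style dicts (id, image_id, category_id,
    # bbox in xywh, area, iscrowd); `det_annotations` additionally carry a score.
    coco_gt = COCO()
    coco_gt.dataset = {"images": images, "annotations": gt_annotations, "categories": categories}
    coco_gt.createIndex()
    coco_dt = coco_gt.loadRes(det_annotations)   # accepts an in-memory list of dicts

    coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
    coco_eval.evaluate()
    coco_eval.accumulate()
    coco_eval.summarize()
    return coco_eval.stats[0]                    # mAP @ IoU=0.50:0.95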

@SkafteNicki
Member

@tkupek Looks great to me. Please send us a PR with what you have, then we can begin to review/comment/add to your code.

@SkafteNicki
Member

If anyone has any opinion on the input format that preds and target should have in this metric, please comment on #467

@tkupek
Contributor

tkupek commented Aug 20, 2021

I had a look at how TensorFlow handles this, and it seems like a dict is most common:
https://github.com/tensorflow/models/blob/master/research/object_detection/metrics/coco_evaluation.py

They use this for the ground truth (target):

groundtruth_dict = {
  "groundtruth_boxes": ...,    # [num_boxes, 4] in (ymin, xmin, ymax, xmax) format
  "groundtruth_classes": ...,  # [num_boxes]
}

and this for predictions (preds):

detections_dict = {
  "detection_boxes": ...,    # [num_boxes, 4] in (ymin, xmin, ymax, xmax) format
  "detection_scores": ...,   # [num_boxes]
  "detection_classes": ...,  # [num_boxes]
}

I find this intuitive and would go the same way. One could discuss whether we want different dict keys for ground truth and detections (but I think we should standardize things).

The COCOeval class needs a list of values anyway, but I guess it would be more consistent with the torchmetrics API to expect fields of type Tensor? -> I drafted a version that allows torch.Tensor, np.ndarray and List.

Interesting sidenote: They also use a pycocotools wrapper.

@twsl
Contributor

twsl commented Aug 25, 2021

I'd like to add that this would be beneficial for object detection and segmentation as well.

@tkupek
Contributor

tkupek commented Aug 26, 2021

@twsl we are implementing this for object detection (bounding boxes). You can have a look at the PR.
Do you know if pycocotools works for segmentation as well? Maybe you even have an example?

@twsl
Contributor

twsl commented Aug 26, 2021

Yes, I do think so.
Actually, something similar based on IoU is already available here

@tkupek
Contributor

tkupek commented Aug 26, 2021

@SkafteNicki do you think this needs to be a part of this PR or can we make this an additional improvement?

@SkafteNicki
Member

@tkupek, no, let's try to get the metric merged as it is right now.
@twsl feel free to open a new issue with the proposed enhancements once this is closed :]

@senarvi

senarvi commented Sep 12, 2021

PyTorch Lightning Bolts includes two object detection models. They take a list of dictionaries containing:

  • "boxes": [num_boxes, 4] the ground truth boxes in (x1, y1, x2, y2) format
  • "labels": [num_boxes] the class label for each ground truth box

The detections contain:

  • "boxes": [num_boxes, 4] predicted boxes in (x1, y1, x2, y2) format
  • "scores": [num_boxes] detection confidences
  • "labels": [num_boxes] the predicted class label for each box

Torchvision ops, for example NMS, take "boxes" and "scores" consistently in the same format.

It would be convenient to standardize the keys and use the same keys as the models use. (We can also change the models to use "classes" instead of "labels" if that's better.) Most importantly, I would prefer to use the same format for the boxes (x1, y1, x2, y2). @tkupek why do you think (y1, x1, y2, x2) is intuitive? I haven't seen this format used before. Also note that torchvision.ops.box_convert() supports the formats "xyxy" and "xywh", but not "yxyx".
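
For concreteness, this is the kind of input that format implies: one dict per image, mirroring the torchvision/Bolts detection models (the values are made up and the update call is hypothetical, pending #467):

import torch

preds = [
    {
        "boxes": torch.tensor([[258.0, 41.0, 606.0, 285.0]]),  # (x1, y1, x2, y2)
        "scores": torch.tensor([0.54]),
        "labels": torch.tensor([0]),
    }
]
target = [
    {
        "boxes": torch.tensor([[214.0, 41.0, 562.0, 285.0]]),
        "labels": torch.tensor([0]),
    }
]
# metric.update(preds, target)   # hypothetical call against the metric proposed in #467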

@tkupek
Contributor

tkupek commented Sep 12, 2021

@senarvi Thank you for your comment.
I think this is reasonable and have fixed the dict keys to match your suggestion (boxes, scores, labels).

The (y1, x1, y2, x2) format was an error in the docstring, we actually do use (x1, y1, x2, y2). Fixed that, too.

@SkafteNicki @Borda where are we on the issues with DDP and adjusting the tests? Any time to look at this yet? I'm happy to help, but I need some hints.

@Borda Borda removed the help wanted Extra attention is needed label Sep 20, 2021
@twsl
Contributor

twsl commented Sep 23, 2021

I'm considering converting the COCOeval class to pytorch in order to make the mAP, mAR calculation a bit faster and to enable calculation on GPU. Would love to get some feedback.

@senarvi

senarvi commented Sep 23, 2021

I'm considering converting the COCOeval class to pytorch in order to make the mAP, mAR calculation a bit faster and to enable calculation on GPU. Would love to get some feedback.

Using the COCOeval-based metric is slow. I haven't checked whether it's the GPU-CPU transfer or whether the operation is just that slow. If you can make it faster, that would be great, but I have a feeling that it's going to take some effort to utilize the GPU efficiently and still keep the implementation correct.
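
For what it's worth, the pairwise-IoU stage that COCOeval runs per image/category already maps onto existing GPU ops (a sketch with dummy xyxy boxes); the harder part is keeping the matching and accumulation logic consistent with pycocotools:

import torch
from torchvision.ops import box_iou

device = "cuda" if torch.cuda.is_available() else "cpu"
xy1 = torch.rand(100, 2, device=device) * 100     # dummy top-left corners
wh = torch.rand(100, 2, device=device) * 50       # dummy widths/heights
pred_boxes = torch.cat([xy1, xy1 + wh], dim=1)    # valid xyxy boxes

gxy1 = torch.rand(20, 2, device=device) * 100
gwh = torch.rand(20, 2, device=device) * 50
gt_boxes = torch.cat([gxy1, gxy1 + gwh], dim=1)

iou_matrix = box_iou(pred_boxes, gt_boxes)        # [100, 20], stays on the GPU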

@tkupek
Contributor

tkupek commented Sep 23, 2021

I'm considering converting the COCOeval class to pytorch in order to make the mAP, mAR calculation a bit faster and to enable calculation on GPU. Would love to get some feedback.

I agree with @senarvi: it would be great, but it is probably not trivial. Feel free to give it a try based on the existing PR and let us know what you learn 🙂

@tkupek
Contributor

tkupek commented Oct 15, 2021

The PR looks quite complete to me. Can we finalize it?
#467

Image processing automation moved this from To do to Done Oct 27, 2021