MeanAveragePrecision is slow #1024

Closed
senarvi opened this issue May 10, 2022 · 33 comments · Fixed by #1832
Labels
enhancement (New feature or request) · help wanted (Extra attention is needed) · topic: Image

Comments

@senarvi

senarvi commented May 10, 2022

🐛 Bug

It's extremely slow to compute the mean-average-precision since torchmetrics > 0.6.0.

To Reproduce

I noticed that my training times have almost doubled since I upgraded torchmetrics from 0.6.0, because validation using the MAP / MeanAveragePrecision metric is so much slower. During validation steps I call update(), and at the end of a validation epoch I call compute() on the MeanAveragePrecision object.

I measured the time spent inside compute() with different torchmetrics versions:

  • torchmetrics 0.6.0: 12 s
  • torchmetrics 0.6.1: didn't work for some reason
  • torchmetrics 0.6.2: 9.5 min
  • torchmetrics 0.7.0: 9.4 min
  • torchmetrics 0.7.1: 1.9 min
  • torchmetrics 0.7.2: 2.0 min
  • torchmetrics 0.7.3: 1.9 min
  • torchmetrics 0.8.0: 4.5 min
  • torchmetrics 0.8.1: 4.6 min
  • torchmetrics 0.8.2: 4.6 min

It seems that after 0.6.0 the time to run compute() increased from about 10 seconds to 9.5 minutes. In 0.7.1 it improved to about 2 minutes, but in 0.8.0 things got worse again and compute() took 4.5 minutes. That is more than 20x slower than with 0.6.0; when training for 100 epochs, for example, it adds roughly another 7 hours to the training time.

Environment

  • TorchMetrics version (and how you installed TM, e.g. conda, pip, build from source): 0.6.0 through 0.8.2, installed using pip
  • Python & PyTorch Version (e.g., 1.0): Python 3.8.11, PyTorch 1.10.0
  • Any other relevant information such as OS (e.g., Linux): Linux
@senarvi senarvi added the bug / fix (Something isn't working) and help wanted (Extra attention is needed) labels on May 10, 2022
@github-actions

Hi! Thanks for your contribution, great first issue!

@Borda
Member

Borda commented May 10, 2022

Interesting, and thanks for such a rigorous per-version comparison... Could you please also share your benchmarking code?

@Borda Borda added the enhancement (New feature or request) label and removed the bug / fix (Something isn't working) label on May 10, 2022
@Borda Borda added this to To do in Image processing via automation May 10, 2022
@SkafteNicki
Member

@Borda we can either try to improve our own version to get the computational time down (if that is possible) or offer an option to use pycocotools as a backend for users

@SkafteNicki
Member

@senarvi could you kindly provide the benchmarking script that you have used, so that we have something to work with when trying to improve the runtime?

@senarvi
Author

senarvi commented May 11, 2022

I created a subclass of the YOLO model, but I guess you could subclass any model in the same way. I used proprietary data, but you could use any data. However, if we want to compare each other's results, we should decide what model and data to use. Maybe it would be easiest to just create a bunch of random detections and targets? Anyway, this is more or less the code that I used:

import time
from pl_bolts.models.detection import YOLO

try:
    from torchmetrics.detection import MeanAveragePrecision
except ImportError:
    from torchmetrics.detection import MAP
    MeanAveragePrecision = MAP

class LogTime:
    def __init__(self, name, model):
        self._name = name
        self._model = model

    def __enter__(self):
        self._start_time = time.perf_counter()

    def __exit__(self, exc_type, exc_val, exc_tb):
        end_time = time.perf_counter()
        self._model.log(self._name, end_time - self._start_time)
        # Return False so that exceptions raised inside the timed block are not suppressed.
        return False

class TimingYOLO(YOLO):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._val_map = MeanAveragePrecision()
        self._test_map = MeanAveragePrecision()

    def validation_step(self, batch, batch_idx):
        with LogTime("val/time_detection", self):
            images, targets = self._validate_batch(batch)
            detections, losses = self(images, targets)

        with LogTime("val/time_processing", self):
            detections = self.process_detections(detections)
            targets = self.process_targets(targets)
            self._val_map.update(detections, targets)

    def validation_epoch_end(self, outputs):
        with LogTime("val/time_scoring", self):
            map_scores = self._val_map.compute()
            map_scores = {"val/" + k: v for k, v in map_scores.items()}
            self.log_dict(map_scores, sync_dist=True)
            self._val_map.reset()

    def test_step(self, batch, batch_idx):
        with LogTime("test/time_detection", self):
            images, targets = self._validate_batch(batch)
            detections, losses = self(images, targets)

        with LogTime("test/time_processing", self):
            detections = self.process_detections(detections)
            targets = self.process_targets(targets)
            self._test_map.update(detections, targets)

    def test_epoch_end(self, outputs):
        with LogTime("test/time_scoring", self):
            map_scores = self._test_map.compute()
            map_scores = {"test/" + k: v for k, v in map_scores.items()}
            self.log_dict(map_scores, sync_dist=True)
            self._test_map.reset()

@SkafteNicki
Member

@senarvi, yes, the model does not really matter; it is the metric computation that is important.
Essentially, this is the code we want to measure:

with LogTime("init"):
   metric = MeanAveragePrecision()
with LogTime("update"):
   for batch in dataloader:
      metric.update(batch)
with LogTime("compute"):
   _ = metric.compute()

This should remove any variation from the rest of the Lightning code. If you could provide some code to generate random data for testing, that should be enough.

@senarvi
Author

senarvi commented May 11, 2022

I can try to do that.

@senarvi
Author

senarvi commented May 11, 2022

Here's some kind of a benchmark script:

import time
import torch

try:
    from torchmetrics.detection import MeanAveragePrecision
except ImportError:
    from torchmetrics.detection import MAP
    MeanAveragePrecision = MAP

total_time = dict()

class UpdateTime:
    def __init__(self, name):
        self._name = name

    def __enter__(self):
        self._start_time = time.perf_counter()

    def __exit__(self, exc_type, exc_val, exc_tb):
        end_time = time.perf_counter()
        if self._name in total_time:
            total_time[self._name] += end_time - self._start_time
        else:
            total_time[self._name] = end_time - self._start_time
        # Return False so that exceptions raised inside the timed block are not suppressed.
        return False

def generate(n):
    boxes = torch.rand(n, 4) * 1000
    boxes[:, 2:] += boxes[:, :2]
    labels = torch.randint(0, 10, (n,))
    scores = torch.rand(n)
    return {"boxes": boxes, "labels": labels, "scores": scores}

with UpdateTime("init"):
    map = MeanAveragePrecision()

for batch_idx in range(100):
    with UpdateTime("update"):
        detections = [generate(100) for _ in range(10)]
        targets = [generate(10) for _ in range(10)]
        map.update(detections, targets)

with UpdateTime("compute"):
    map.compute()

for name, time in total_time.items():
    print(f"Total time in {name}: {time}")

My results:

$ pip install torchmetrics==0.6.0
$ ./map_benchmark.py
Total time in init: 1.5747292000014568
Total time in update: 0.1246876999939559
Total time in compute: 6.245588799996767
$ pip install torchmetrics==0.8.2
$ ./map_benchmark.py
Total time in init: 0.0003580999909900129
Total time in update: 0.08986139997432474
Total time in compute: 151.69804470000963

@SkafteNicki
Member

@senarvi cool, this already clearly shows that any improvements we can make are in compute :)

@SkafteNicki
Member

I just ran a profile of the script.
[profiler output screenshot]
It is clear that the majority of time is spent in the _find_best_gt_match function. It is important to note that each function call is actually quite fast; however, the function gets called a ridiculous number of times.
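
For anyone who wants to reproduce such a profile, here is a minimal sketch using the standard library's cProfile (assuming the benchmark above is saved as map_benchmark.py):

import cProfile
import pstats

# Run the benchmark under the profiler and dump the statistics to a file.
cProfile.run(open("map_benchmark.py").read(), "map_benchmark.prof")

# Show the 20 most expensive calls by cumulative time; on the affected versions,
# functions such as _find_best_gt_match should appear near the top.
stats = pstats.Stats("map_benchmark.prof")
stats.sort_stats("cumulative").print_stats(20)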

@Borda
Member

Borda commented May 12, 2022

It is clear that the majority of time is spent in the _find_best_gt_match function. It is important to note that each function call is actually quite fast; however, the function gets called a ridiculous number of times.

Very nice finding! Does any of you want to take it and find some speed-up?
cc: @twsl @PyTorchLightning/core-metrics

@24hours

24hours commented Jun 10, 2022

Perhaps we can implement this for fast evaluation.
It does require C++, however.

https://github.com/facebookresearch/detectron2/blob/cbbc1ce26473cb2a5cc8f58e8ada9ae14cb41052/detectron2/evaluation/fast_eval_api.py
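
If I read that module correctly, COCOeval_opt there is intended as a drop-in replacement for pycocotools' COCOeval, so using it would look roughly like the sketch below (assuming ground truth and detections are already available in COCO JSON format; the file names are placeholders):

from pycocotools.coco import COCO
from detectron2.evaluation.fast_eval_api import COCOeval_opt

coco_gt = COCO("instances_val.json")          # ground-truth annotations (placeholder path)
coco_dt = coco_gt.loadRes("detections.json")  # detections in COCO results format (placeholder path)

# Same interface as pycocotools' COCOeval, with C++-accelerated evaluate()/accumulate().
coco_eval = COCOeval_opt(coco_gt, coco_dt, "bbox")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()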

@justusschock
Member

@24hours I think the way to go here would be to first try and clean up the code before we decide to dispatch to C++

@mjun0812

Hi. Thanks for the great package.

I found the following simple corrections and redundancies in mean_ap.py.

https://github.com/Lightning-AI/metrics/blob/fbe06ef4d9c5b5a21ead813e40d371d6fdbd3cd7/src/torchmetrics/detection/mean_ap.py#L479-L481

The above code should be changed as follows.
It may not have much effect on speed, but it should be better than scanning every box.

if len(inds) > max_det:
    inds = inds[:max_det]
det = [det[i] for i in inds] 

The same pattern appears in other parts of the program, so similar changes are needed there as well.

https://github.com/Lightning-AI/metrics/blob/fbe06ef4d9c5b5a21ead813e40d371d6fdbd3cd7/src/torchmetrics/detection/mean_ap.py#L526-L528

https://github.com/Lightning-AI/metrics/blob/fbe06ef4d9c5b5a21ead813e40d371d6fdbd3cd7/src/torchmetrics/detection/mean_ap.py#L603-L605

I also think there is some redundancy here.

https://github.com/Lightning-AI/metrics/blob/fbe06ef4d9c5b5a21ead813e40d371d6fdbd3cd7/src/torchmetrics/detection/mean_ap.py#L458-L471

Lines 464 and 470 appear to be duplicates.

@mjun0812

According to my quick survey, this seems to be the biggest bottleneck:

https://github.com/Lightning-AI/metrics/blob/fbe06ef4d9c5b5a21ead813e40d371d6fdbd3cd7/src/torchmetrics/detection/mean_ap.py#L734-L739
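
As an illustration of the kind of vectorization that usually helps in hot spots like this (a generic sketch, not the exact code at the linked lines): compute the full IoU matrix in one batched call and reduce it, instead of looping over prediction/ground-truth pairs in Python.

import torch
from torchvision.ops import box_iou

# Hypothetical predictions and ground truths in (x1, y1, x2, y2) format.
preds = torch.rand(100, 4) * 100
preds[:, 2:] += preds[:, :2]
gts = torch.rand(10, 4) * 100
gts[:, 2:] += gts[:, :2]

# One batched call produces the full (num_preds, num_gts) IoU matrix ...
iou_matrix = box_iou(preds, gts)
# ... and the best-matching ground truth per prediction falls out of a single reduction.
best_iou, best_gt_idx = iou_matrix.max(dim=1)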

@stancld
Contributor

stancld commented Oct 10, 2022

Hi @senarvi, thanks for reporting this issue. Could you please try the implementation from #1259 and verify whether you observe any improvement and whether all results are correct? :] I believe there is more room to optimize the metric, but let's go step by step :]

@senarvi
Author

senarvi commented Oct 17, 2022

Hi @stancld, sorry I didn't have time to respond earlier. I wrote my observations in that pull request. It was indeed a lot faster, but in one case the results were significantly different. I don't normally see such a large variation between test runs.

@stancld
Contributor

stancld commented Oct 17, 2022

Hi @senarvi, thanks for the feedback. Do you have any example batch you can share where the results are different, so that we can test and debug it, please?

@senarvi
Author

senarvi commented Oct 17, 2022

@stancld Hmm. The data's not public. I wonder if you could debug it using random boxes, like in the speed test. I modified it to make the task a little bit easier and to make sure that the results are deterministic:

import torch
from torchmetrics.detection import MeanAveragePrecision

torch.manual_seed(1)

def generate(n):
    boxes = torch.rand(n, 4) * 10
    boxes[:, 2:] += boxes[:, :2] + 10
    labels = torch.randint(0, 2, (n,))
    scores = torch.rand(n)
    return {"boxes": boxes, "labels": labels, "scores": scores}

batches = []
for _ in range(100):
    detections = [generate(100) for _ in range(10)]
    targets = [generate(10) for _ in range(10)]
    batches.append((detections, targets))

map = MeanAveragePrecision()
for detections, targets in batches:
    map.update(detections, targets)
print(map.compute())

With torchmetrics 0.10.0 I get:

map: 0.1534
map_50: 0.5260
map_75: 0.0336
map_small: 0.1534
mar_1: 0.0449
mar_10: 0.3039
mar_100: 0.5445
mar_small: 0.5445

With the code from your PR I get

map: 0.2222
map_50: 0.7135
map_75: 0.0594
map_small: 0.2222
mar_1: 0.0449
mar_10: 0.4453
mar_100: 2.2028
mar_small: 2.2028

Some recall values are also > 1.

@wilderrodrigues
Contributor

Hi there,

I just fixed it. :) PR coming your way.

Run 1 - GPU:

Total time in init: 1.0306465921457857
Total time in update: 0.0780688391532749
Total time in compute: 241.5502170859836

Run 2 - GPU:

Total time in init: 1.086872072191909
Total time in update: 0.07920253812335432
Total time in compute: 2.4084888100624084

@wilderrodrigues
Contributor

Fix is ready, with unit tests to make sure we do not face the same regression again. It checks the runtime on CPU and GPU with a relative tolerance of 1.0.

I'm creating a PR at the moment with more details.

@DataAndi

DataAndi commented Mar 13, 2023

Does anyone else have very long compute times for metric.compute() (MeanAveragePrecision)?
I get, per version:
0.11.0 == 420 s
0.11.1 == 394 s
0.11.2 == 382 s
0.11.3 == 371 s
0.11.4 == 396 s
I used the above-mentioned script to evaluate the computation time.
Or is there maybe a way to compute it faster with CUDA or something?
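
On the CUDA question: the metric can be moved to the GPU like any other torchmetrics module, as in the sketch below, although whether this helps depends on where the time is actually spent (much of the cost discussed in this thread is in Python-level loops rather than tensor operations).

import torch
from torchmetrics.detection import MeanAveragePrecision

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
metric = MeanAveragePrecision().to(device)

# Predictions and targets must live on the same device as the metric.
preds = [{
    "boxes": torch.tensor([[10.0, 10.0, 50.0, 60.0]], device=device),
    "scores": torch.tensor([0.9], device=device),
    "labels": torch.tensor([0], device=device),
}]
target = [{
    "boxes": torch.tensor([[12.0, 12.0, 48.0, 58.0]], device=device),
    "labels": torch.tensor([0], device=device),
}]
metric.update(preds, target)
print(metric.compute())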

@heitorrapela

heitorrapela commented Mar 27, 2023

@DataAndi, I was having the same problem with the 0.8.2 until I found this thread, then I downgraded to 0.6.0 as I am only using the mAP from torchmetrics. I hope the calculation from 0.6.0 is fine because it is much faster.

@ckyrkou

ckyrkou commented Apr 18, 2023

@DataAndi, I was having the same problem with the 0.8.2 until I found this thread, then I downgraded to 0.6.0 as I am only using the mAP from torchmetrics. I hope the calculation from 0.6.0 is fine because it is much faster.

So this is the best option for now? For how long will 0.6.0 be active?

@heitorrapela

heitorrapela commented Apr 18, 2023

@ckyrkou I think there is no problem, at least with using 0.6.0. The main problem is compatibility with other libraries. I tried to upgrade other libraries like torchvision and torch to their latest versions, but they are not compatible with torchmetrics 0.6.0, so I will keep the versions I am using for now. I don't think they will remove 0.6.0 👍🏻

@ckyrkou

ckyrkou commented Apr 18, 2023

@ckyrkou I think there is no problem, at least with using 0.6.0. The main problem is compatibility with other libraries. I tried to upgrade other libraries like torchvision and torch to their latest versions, but they are not compatible with torchmetrics 0.6.0, so I will keep the versions I am using for now. I don't think they will remove 0.6.0 👍🏻

I have just installed 0.6.0 to try it out. When I import
from torchmetrics.detection.mean_ap import MeanAveragePrecision

I get an error that mean_ap does not exist. I guess things changed between versions. Any idea how this used to be used?

@heitorrapela

heitorrapela commented Apr 18, 2023

@ckyrkou yes, they just changed the path and the name. You should use it as follows:

# This is to prevent the different definitions of mAP on diff versions of torchmetrics
try:
    from torchmetrics.detection import MeanAveragePrecision
except ImportError:
    from torchmetrics.detection import MAP
    MeanAveragePrecision = MAP

my_map = MeanAveragePrecision()

@Borda
Member

Borda commented Apr 18, 2023

@heitorrapela @ckyrkou I am very happy about your interest, and we are trying to improve, but this is quite challenging, so any help would be very welcome...
see some ongoing PRs: #1389 #1330

@ckyrkou

ckyrkou commented Apr 19, 2023

@heitorrapela @ckyrkou I am very happy about your interest, and we are trying to improve, but this is quite challenging, so any help would be very welcome... see some ongoing PRs: #1389 #1330

The implementation in this repo is quite fast if you want to look into it.
https://github.com/MathGaron/mean_average_precision

@tkupek
Contributor

tkupek commented Jun 5, 2023

@Borda we can either try to improve our own version to get the computational time down (if that is possible) or offer an option to use pycocotools as a backend for users

@SkafteNicki How about this solution from your first response? Maybe we can make an optional pycocotools backend available so users can manually switch without downgrading to 0.6.0?
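
For context, the pycocotools path that such an optional backend would wrap looks roughly like the sketch below (the file names are placeholders; ground truth and detections have to be in COCO format):

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("instances_val.json")          # ground-truth annotations (placeholder path)
coco_dt = coco_gt.loadRes("detections.json")  # detections in COCO results format (placeholder path)

coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")  # or iouType="segm"
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # prints the familiar AP/AR table; the values also end up in coco_eval.stats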

@SkafteNicki
Member

@tkupek yeah, I am more and more moving in that direction. I would actually reintroduce pycocotools as a required dependency of MAP, because we are seeing multiple issues indicating that something is wrong with our implementation.

As the primary maintainer of TM, I do not have the expertise in MAP required to solve these issues, and the metric is therefore unmaintainable at the moment. Relying on contributions from experts does not seem to be the solution either (because you have other things to do). I would therefore much rather accept defeat and revert to something where all details of the calculation are handled by pycocotools and I only need to worry about the user interface.

One consequence is that we have introduced the iou_type argument since v0.6. Is it possible to convert the input when using iou_type="segm" to iou_type="bbox", such that we can rely on pycocotools to do the calculation in both cases? Otherwise I would propose that we:

  1. Reintroduce the implementation from v0.6 as BBoxMeanAveragePrecision, corresponding to iou_type="bbox"
  2. Refactor the current implementation into a new metric SegmMeanAveragePrecision, corresponding to iou_type="segm"

Pinging @Borda, @justusschock, @senarvi, @wilderrodrigues for opinions.
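
On the segm-to-bbox conversion question above: torchvision's masks_to_boxes can derive tight boxes from binary masks, as sketched below (assuming masks arrive as an (N, H, W) boolean tensor). Note that evaluating those boxes with box IoU is not equivalent to mask IoU, so this would only approximate iou_type="segm" results.

import torch
from torchvision.ops import masks_to_boxes

# Hypothetical batch of three binary instance masks on a 64x64 image.
masks = torch.zeros(3, 64, 64, dtype=torch.bool)
masks[0, 10:20, 10:30] = True
masks[1, 30:50, 5:15] = True
masks[2, 40:60, 40:60] = True

# Returns (N, 4) boxes in (x1, y1, x2, y2) format, tightly enclosing each mask.
boxes = masks_to_boxes(masks)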

@tkupek
Contributor

tkupek commented Jun 5, 2023

@SkafteNicki If I remember correctly, pycocotools should have segmentation support, so it might be easy to keep the iou_type parameter:
https://github.com/cocodataset/cocoapi/blob/8c9bcc3cf640524c4c20a9c40e89cb6a2f2fa0e9/PythonAPI/pycocotools/cocoeval.py#L178

@SkafteNicki
Member

@SkafteNicki If I remember correctly, pycocotools should have segmentation support, so it might be easy to keep the iou_type parameter: https://github.com/cocodataset/cocoapi/blob/8c9bcc3cf640524c4c20a9c40e89cb6a2f2fa0e9/PythonAPI/pycocotools/cocoeval.py#L178

Super cool, I would also prefer to keep everything as it is and just change the backend to pycocotools.
