MeanAveragePrecision is slow #1024
Hi! Thanks for your contribution, great first issue!
Interesting, and thanks for such a rigorous comparison per version... Could you please also share your benchmarking code?
@Borda we can either try to improve our own version to get the computation time down (if that is possible), or offer users an option to use pycocotools as the backend.
@senarvi could you kindly provide the benchmarking script that you used, so that we have something to measure against when trying to improve the runtime?
I created a subclass of the YOLO model, but I guess you could subclass any model in the same way. I used proprietary data, but you could use any data. However, if we want to compare each other's results, we should decide what model and data to use. But maybe it would be easiest to just create a bunch of random detections and targets? Anyway, this is more or less the code that I used:

```python
import time

from pl_bolts.models.detection import YOLO

try:
    from torchmetrics.detection import MeanAveragePrecision
except ImportError:
    # Older torchmetrics releases expose the metric under the name MAP.
    from torchmetrics.detection import MAP
    MeanAveragePrecision = MAP


class LogTime:
    """Context manager that logs its runtime through the Lightning model."""

    def __init__(self, name, model):
        self._name = name
        self._model = model

    def __enter__(self):
        self._start_time = time.perf_counter()

    def __exit__(self, exc_type, exc_val, exc_tb):
        end_time = time.perf_counter()
        self._model.log(self._name, end_time - self._start_time)
        return False  # do not suppress exceptions raised inside the block


class TimingYOLO(YOLO):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._val_map = MeanAveragePrecision()
        self._test_map = MeanAveragePrecision()

    def validation_step(self, batch, batch_idx):
        with LogTime("val/time_detection", self):
            images, targets = self._validate_batch(batch)
            detections, losses = self(images, targets)
        with LogTime("val/time_processing", self):
            detections = self.process_detections(detections)
            targets = self.process_targets(targets)
            self._val_map.update(detections, targets)

    def validation_epoch_end(self, outputs):
        with LogTime("val/time_scoring", self):
            map_scores = self._val_map.compute()
            map_scores = {"val/" + k: v for k, v in map_scores.items()}
            self.log_dict(map_scores, sync_dist=True)
            self._val_map.reset()

    def test_step(self, batch, batch_idx):
        with LogTime("test/time_detection", self):
            images, targets = self._validate_batch(batch)
            detections, losses = self(images, targets)
        with LogTime("test/time_processing", self):
            detections = self.process_detections(detections)
            targets = self.process_targets(targets)
            self._test_map.update(detections, targets)

    def test_epoch_end(self, outputs):
        with LogTime("test/time_scoring", self):
            map_scores = self._test_map.compute()
            map_scores = {"test/" + k: v for k, v in map_scores.items()}
            self.log_dict(map_scores, sync_dist=True)
            self._test_map.reset()
```
@senarvi, yes, the model does not really matter; it is the metric computations that are important.

```python
# Timing skeleton; LogTime stands for any simple timing context manager.
with LogTime("init"):
    metric = MeanAveragePrecision()
with LogTime("update"):
    for detections, targets in dataloader:
        metric.update(detections, targets)
with LogTime("compute"):
    _ = metric.compute()
```

This should remove any variation from other Lightning code. If you could provide some code to generate random data for testing, that should be enough.
I can try to do that.
Here's some kind of a benchmark script:

```python
import time

import torch

try:
    from torchmetrics.detection import MeanAveragePrecision
except ImportError:
    # Older torchmetrics releases expose the metric under the name MAP.
    from torchmetrics.detection import MAP
    MeanAveragePrecision = MAP

total_time = dict()


class UpdateTime:
    """Context manager that accumulates its runtime under a given name."""

    def __init__(self, name):
        self._name = name

    def __enter__(self):
        self._start_time = time.perf_counter()

    def __exit__(self, exc_type, exc_val, exc_tb):
        end_time = time.perf_counter()
        if self._name in total_time:
            total_time[self._name] += end_time - self._start_time
        else:
            total_time[self._name] = end_time - self._start_time
        return False  # do not suppress exceptions raised inside the block


def generate(n):
    """Generate n random detections in the format the metric expects."""
    boxes = torch.rand(n, 4) * 1000
    boxes[:, 2:] += boxes[:, :2]  # xyxy boxes with x2 >= x1 and y2 >= y1
    labels = torch.randint(0, 10, (n,))
    scores = torch.rand(n)
    return {"boxes": boxes, "labels": labels, "scores": scores}


with UpdateTime("init"):
    map = MeanAveragePrecision()

for batch_idx in range(100):
    with UpdateTime("update"):
        detections = [generate(100) for _ in range(10)]
        targets = [generate(10) for _ in range(10)]
        map.update(detections, targets)

with UpdateTime("compute"):
    map.compute()

for name, elapsed in total_time.items():
    print(f"Total time in {name}: {elapsed}")
```

My results:

```
$ pip install torchmetrics==0.6.0
$ ./map_benchmark.py
Total time in init: 1.5747292000014568
Total time in update: 0.1246876999939559
Total time in compute: 6.245588799996767

$ pip install torchmetrics==0.8.2
$ ./map_benchmark.py
Total time in init: 0.0003580999909900129
Total time in update: 0.08986139997432474
Total time in compute: 151.69804470000963
```
@senarvi cool, this already clearly shows that any improvement we should be able to make is in `compute()`.
Very nice finding! Does any of you want to take it and find some boost?
Perhaps we can implement this for fast evaluation.
@24hours I think the way to go here would be to first try to clean up the code before we decide to dispatch to C++.
Hi. Thanks for the great package. I found the following simple corrections and redundancies. The code should be changed as follows:

```python
if len(inds) > max_det:
    inds = inds[:max_det]
det = [det[i] for i in inds]
```

This pattern appears in other parts of the program, so some changes are necessary there as well. I also think there is some redundancy here: lines 464 and 470 appear to be duplicates.
According to my quick survey, this seems to be the most bottlenecked area.
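A quick way to run such a survey is the standard-library profiler; a minimal sketch, assuming the `map` metric from the benchmark script above has already been fed a few batches:

```python
import cProfile
import pstats

# Profile a single compute() call and show the 20 most expensive
# functions by cumulative time.
cProfile.run("map.compute()", "map_compute.prof")
pstats.Stats("map_compute.prof").sort_stats("cumulative").print_stats(20)
```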
Hi @stancld, sorry I didn't have time to respond earlier. I wrote my observations in that pull request. It was indeed a lot faster, but in one case the results were significantly different. I don't normally see such a large variation between test runs.
Hi @senarvi, thanks for the feedback. Do you have an example batch you can share where the results are different, so that we can test and debug it, please?
@stancld Hmm. The data's not public. I wonder if you could debug it using random boxes, like in the speed test. I modified it to make the task a little bit easier and to make sure that the results are deterministic:

```python
import torch

from torchmetrics.detection import MeanAveragePrecision

torch.manual_seed(1)


def generate(n):
    """Generate n random detections; the +10 offset keeps every box at
    least 10 units wide and tall."""
    boxes = torch.rand(n, 4) * 10
    boxes[:, 2:] += boxes[:, :2] + 10
    labels = torch.randint(0, 2, (n,))
    scores = torch.rand(n)
    return {"boxes": boxes, "labels": labels, "scores": scores}


batches = []
for _ in range(100):
    detections = [generate(100) for _ in range(10)]
    targets = [generate(10) for _ in range(10)]
    batches.append((detections, targets))

map = MeanAveragePrecision()
for detections, targets in batches:
    map.update(detections, targets)
print(map.compute())
```

With torchmetrics 0.10.0 I get `map: 0.1534`. With the code from your PR I get `map: 0.2222`. Some recall values are also > 1.
Hi there, I just fixed it. :) PR coming your way.
|
The fix is ready, with unit tests to make sure we do not face the same regression again. It checks the runtime on CPU and GPU with a relative tolerance of 1.0. I'm creating a PR at the moment with more details.
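For illustration, a regression guard of that kind could look roughly like this; a sketch only, reusing the `generate` helper from the benchmark script above, not the actual torchmetrics test:

```python
import time

import pytest
import torch

from torchmetrics.detection import MeanAveragePrecision


def timed_compute(device):
    """Feed the metric random boxes on `device` and time compute()."""
    # Assumes the generate() helper defined in the benchmark script above.
    metric = MeanAveragePrecision().to(device)
    detections = [{k: v.to(device) for k, v in generate(100).items()} for _ in range(10)]
    targets = [{k: v.to(device) for k, v in generate(10).items()} for _ in range(10)]
    metric.update(detections, targets)
    start = time.perf_counter()
    metric.compute()
    return time.perf_counter() - start


@pytest.mark.skipif(not torch.cuda.is_available(), reason="needs a GPU")
def test_map_runtime_cpu_vs_gpu():
    cpu_time, gpu_time = timed_compute("cpu"), timed_compute("cuda")
    # Relative tolerance of 1.0: the slower run may take at most twice
    # as long as the faster one.
    assert abs(cpu_time - gpu_time) <= min(cpu_time, gpu_time)
```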
Does anyone else have very long compute times for `metric.compute()` (MeanAveragePrecision)?
@DataAndi, I was having the same problem with 0.8.2 until I found this thread; then I downgraded to 0.6.0, as I am only using the mAP from torchmetrics. I hope the calculation from 0.6.0 is fine, because it is much faster.
So this is the best option for now? For how long will 0.6.0 be active?
@ckyrkou I think there is no problem, at least with using 0.6.0. The main problem is compatibility with other libraries. I tried to upgrade other libraries like torchvision and torch to their latest versions, but they are not compatible with torchmetrics 0.6.0, so I will keep the versions that I am using for now. I don't think they will remove 0.6.0 👍🏻
I have just installed 0.6.0 to try it out. When I import the metric I get an error that `mean_ap` does not exist. I guess things changed between versions. Any idea how this used to be used?
@ckyrkou yes, they just changed the path and the name. You should use it as follows:
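Judging from the fallback import in the benchmark scripts above, that would be:

```python
# In torchmetrics 0.6.0 the metric is exposed under the name MAP
# (inferred from the try/except fallback used earlier in this thread).
from torchmetrics.detection import MAP

metric = MAP()
```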
@heitorrapela @ckyrkou I am very happy about your interest, and we are trying to improve, but this is quite challenging, so any help would be very welcome...
The implementation in this repo is quite fast if you want to look into it.
@SkafteNicki How about this solution from your first response? Maybe we can make an optional pycocotools backend?
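As a sketch of what that option could look like from the user's side (a hypothetical argument, not something torchmetrics exposed at the time):

```python
# Hypothetical API: pick the evaluation backend when constructing the metric.
metric = MeanAveragePrecision(backend="pycocotools")  # or the native implementation
```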
@tkupek yeah, I am more and more moving in that direction. I would actually reintroduce pycocotools as the backend.
As the primary maintainer of TM, I do not have the expertise in MAP required to solve these issues, and the metric is therefore unmaintainable at the moment. And relying on contributions from experts does not seem to be the solution (because you have other things to do). I would therefore much rather accept defeat and revert back to something where all details about the calculation are dealt with by pycocotools. One consequence this will have is that some of the functionality we have introduced since v0.6 would be affected.
Pinging @Borda, @justusschock, @senarvi, @wilderrodrigues for opinions.
@SkafteNicki If I remember correctly,
Super cool, I would also prefer to keep everything as it is and just change the backend to pycocotools.
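To make the backend idea concrete, here is a minimal sketch of what scoring with pycocotools involves: converting the `{"boxes", "labels", "scores"}` dicts used throughout this thread into COCO's annotation format and running `COCOeval`. The converter is an illustration (its names are mine, not torchmetrics code), and it assumes xyxy boxes and the 0-9 label range of the `generate` helper above:

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval


def to_coco(targets, detections):
    """Build COCO ground-truth and result structures from per-image dicts."""
    images, annotations, results = [], [], []
    ann_id = 1
    for img_id, (target, det) in enumerate(zip(targets, detections), start=1):
        images.append({"id": img_id})
        for box, label in zip(target["boxes"], target["labels"]):
            x1, y1, x2, y2 = box.tolist()
            annotations.append({
                "id": ann_id,
                "image_id": img_id,
                "category_id": int(label),
                "bbox": [x1, y1, x2 - x1, y2 - y1],  # COCO uses xywh
                "area": (x2 - x1) * (y2 - y1),
                "iscrowd": 0,
            })
            ann_id += 1
        for box, label, score in zip(det["boxes"], det["labels"], det["scores"]):
            x1, y1, x2, y2 = box.tolist()
            results.append({
                "image_id": img_id,
                "category_id": int(label),
                "bbox": [x1, y1, x2 - x1, y2 - y1],
                "score": float(score),
            })
    gt = COCO()  # build the ground-truth object in memory
    gt.dataset = {
        "images": images,
        "annotations": annotations,
        "categories": [{"id": i} for i in range(10)],  # labels 0-9, as in generate()
    }
    gt.createIndex()
    return gt, gt.loadRes(results)


# detections/targets: lists of per-image dicts, e.g. from generate() above.
gt, dt = to_coco(targets, detections)
coco_eval = COCOeval(gt, dt, iouType="bbox")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()
print("mAP:", coco_eval.stats[0])  # AP @ IoU=0.50:0.95
```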
🐛 Bug
It's extremely slow to compute the mean-average-precision since torchmetrics > 0.6.0.
To Reproduce
I noticed that my training times have almost doubled since I upgraded torchmetrics from 0.6.0, because validation using the MAP / MeanAveragePrecision metric is so much slower. During validation steps I call `update()`, and at the end of a validation epoch I call `compute()` on the `MeanAveragePrecision` object.

I measured the time spent inside `compute()` with different torchmetrics versions. It seems that after 0.6.0 the time to run `compute()` increased from 10 seconds to 9.5 minutes. In 0.7.1 it improved and took 2 minutes. Then in 0.8.0 things got worse again and it took 4.5 minutes to run `compute()`. This is more than 20x slower than with 0.6.0; for example, when training for 100 epochs, roughly 4.5 extra minutes per validation epoch add another 7 hours to the training time.

Environment
- TorchMetrics version (and how you installed TM, e.g. `conda`, `pip`, build from source): 0.6.0 through 0.8.2, installed using pip