Improve mAP performance #742

twsl · 2022-01-11T10:11:44Z

What does this PR do?

Fixes #677

Work of @tkupek @OlofHarrysson @twsl
First steps to get performance on par with pycocotools/numpy

Before submitting

Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
Did you read the contributor guideline, Pull Request section?
Did you make sure to update the docs?
Did you write any new necessary tests?

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

codecov · 2022-01-11T10:14:54Z

Codecov Report

Merging #742 (a04dbab) into master (9c9c430) will decrease coverage by 8%.
The diff coverage is 97%.

@@          Coverage Diff           @@
##           master   #742    +/-   ##
======================================
- Coverage      91%    83%    -8%     
======================================
  Files         166    166            
  Lines        6817   6832    +15     
======================================
- Hits         6181   5651   -530     
- Misses        636   1181   +545

tkupek · 2022-01-17T07:52:49Z

The performance improvements are great and really necessary.

However, CUDA calculations are still slower than CPU by a factor of 9-10. The computations are not CUDA-optimized, and I don't know whether that is possible (or how much effort it would take).

Running metric on 10 samples on cuda:0
Total time: 8.060346603393555
Time per sample 0.8060346603393554

Running metric on 10 samples on cpu
Total time: 0.98065185546875
Time per sample 0.098065185546875
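
The benchmark script itself is not part of this thread, so the following is only a minimal sketch of a harness that would print output in the same shape. The `benchmark` function and the `run_once` callable are hypothetical names, and the workload here is a stand-in rather than the actual metric call.

```python
import time
from typing import Callable, Dict


def benchmark(run_once: Callable[[], None], n_samples: int, device: str) -> Dict[str, float]:
    """Time `n_samples` calls of `run_once`; report total and per-sample time."""
    print(f"Running metric on {n_samples} samples on {device}")
    start = time.perf_counter()
    for _ in range(n_samples):
        # Hypothetical: one update/compute round of the metric under test.
        run_once()
    total = time.perf_counter() - start
    print(f"Total time: {total}")
    print(f"Time per sample {total / n_samples}")
    return {"total": total, "per_sample": total / n_samples}


# Stand-in workload; replace with the real metric call on the device under test.
result = benchmark(lambda: sum(range(1000)), n_samples=10, device="cpu")
```

When timing CUDA code this way, the device should be synchronized (for example with `torch.cuda.synchronize()`) before each clock read, since kernel launches are asynchronous and would otherwise be under-measured.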

Until then, I suggest making the mAP metric CPU-only so that nobody ends up using the slow CUDA version.
I would move everything to CPU, as we did with the pycocotools implementation.

@twsl @Borda @OlofHarrysson @SkafteNicki

Borda · 2022-01-17T08:48:23Z

Until then, I suggest making the mAP metric CPU-only so that nobody ends up using the slow CUDA version.
I would move everything to CPU, as we did with the pycocotools implementation.

sounds reasonable to me 🐰

tkupek · 2022-01-27T11:12:44Z

I ran a benchmark on a real-world use case with 1088 samples and ~10 bounding boxes per sample.

Pycocotools (previous implementation): 74.91s
Current implementation: 1742.07s
This branch: 140.79s

IMHO this should be merged and released ASAP to make the metric usable again on GPUs.
Further improvements should be done to get closer to the pycocotools benchmark.
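
To put the three timings above in relation, here is a small arithmetic sketch that uses only the numbers quoted in this comment. The variable names are illustrative.

```python
n_samples = 1088  # samples in the benchmark quoted above

# Total wall-clock time in seconds, as reported in the comment.
timings_s = {
    "pycocotools (previous implementation)": 74.91,
    "current implementation": 1742.07,
    "this branch": 140.79,
}

baseline = timings_s["pycocotools (previous implementation)"]
for name, total in timings_s.items():
    per_sample = total / n_samples
    print(f"{name}: {per_sample:.3f} s/sample, {total / baseline:.2f}x pycocotools")

# How much faster this branch is than the current implementation.
speedup = timings_s["current implementation"] / timings_s["this branch"]
# How far this branch still is from the pycocotools timing.
gap = timings_s["this branch"] / baseline
print(f"speedup over current implementation: {speedup:.1f}x")
print(f"remaining gap to pycocotools: {gap:.2f}x")
```

By these numbers the branch is roughly 12x faster than the current implementation and still a little under 2x slower than pycocotools.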

@twsl @OlofHarrysson @Borda

SkafteNicki · 2022-01-27T17:06:11Z

@tkupek @OlofHarrysson @twsl thanks for really trying to improve performance for this metric.
I agree that we should merge this and do a small release.
However, our whole GPU testing pipeline is down at the moment, so no PRs are currently getting merged :(

SkafteNicki · 2022-01-31T19:20:55Z

@Borda please approve and merge.
Passes locally for multi-gpu support:

* Simplify id generation
* rework and speed up _find_best_gt_match
* add: Refactor to avoid duplicate calculations
* precision,recall,scores on correct device (-20%)
* arguments to python lists
* enumerate instead of range
* compute on device
* Remove exception
* Replace prec score loop
* Fix auc flattening
* draft to run metric on cpu only
* move tensors to cpu on compute, need to be on GPU for multi GPU syncing

Co-authored-by: tobias-kupek-swarm <tobias.kupek@swarm-analytics.com>
Co-authored-by: Olof Harrysson <harrysson.olof@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tobias Kupek <tkupek@users.noreply.github.com>
Co-authored-by: SkafteNicki <skaftenicki@gmail.com>
(cherry picked from commit 408cabe)

Borda · 2022-02-22T13:40:50Z

Pycocotools (previous implementation): 74.91s
Current implementation: 1742.07s
This branch: 140.79s

Further improvements should be done to get closer to the pycocotools benchmark.

@twsl @tkupek do you think you can make further improvements to reach speed parity with pycocotools?

tkupek · 2022-02-24T12:09:39Z

Unfortunately, I don't have any concrete ideas in mind, nor the time to look at it right now.

twsl · 2022-02-24T13:00:34Z

I might give it another shot after I hand in my thesis next week, but I can't promise any results or a timeline.

twsl and others added 14 commits December 27, 2021 23:31

Simplify id generation

2db054f

rework and speed up _find_best_gt_match

34f673e

fix gpu test and move inputs to gpu

c66de8c

fix: boxes xywh format

b926929

add: Refactor to avoid duplicate calculations

ebff078

precision,recall,scores on correct device (-20%)

d260d81

arguments to python lists

ee1d0a0

enumerate instead of range

4fd42ba

compute on device

45806f7

Remove exception

ab91460

Replace prec score loop

a714acf

Fix auc flattening

27783a0

Merge branch 'PyTorchLightning:master' into fix/map-perf

d67808f

[pre-commit.ci] auto fixes from pre-commit.com hooks

7274117

for more information, see https://pre-commit.ci

Borda added the enhancement New feature or request label Jan 12, 2022

ashutoshml and others added 4 commits January 18, 2022 23:54

Remove deprecated functions, and warnings - Text (#773)

43a2261

* Remove deprecated functions, and warnings
* Update links for docstring
* chlog

Co-authored-by: Daniel Stancl <46073029+stancld@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

draft to run metric on cpu only

d68aaaa

Merge branch 'master' into fix/map-perf

738369c

[pre-commit.ci] auto fixes from pre-commit.com hooks

0ab73ae

for more information, see https://pre-commit.ci

Borda force-pushed the master branch from cfe5e87 to cccb7a6 Compare January 19, 2022 22:07

move tensors to cpu on compute, need to be on GPU for multi GPU syncing

48675de

Borda assigned SkafteNicki Jan 27, 2022

twsl added 3 commits January 27, 2022 12:24

Remove performance test script

b3884c3

Merge branch 'master' into fix/map-perf

38ed36a

Remove unused imports

5bf856f

twsl marked this pull request as ready for review January 27, 2022 11:31

twsl requested review from Borda, ethanwharris, justusschock, SeanNaren, SkafteNicki and tchaton as code owners January 27, 2022 11:31

changelog

f65ce31

SkafteNicki approved these changes Jan 27, 2022

mergify bot added ready has conflicts labels Jan 27, 2022

Merge branch 'master' into fix/map-perf

c154707

mergify bot removed has conflicts ready labels Jan 27, 2022

Merge branch 'master' into fix/map-perf

789eaf5

mergify bot added the ready label Jan 31, 2022

SkafteNicki added 2 commits January 31, 2022 16:39

fix mypy

bf3f4f0

suggestions

5b2ff30

SkafteNicki reviewed Jan 31, 2022

Update torchmetrics/detection/map.py

a04dbab

Borda approved these changes Jan 31, 2022

Borda merged commit 408cabe into Lightning-AI:master Jan 31, 2022

twsl deleted the fix/map-perf branch February 1, 2022 02:01

aaronzs mentioned this pull request Feb 28, 2022

MeanAveragePrecision CPU overload when using multi-device ddp #866

Closed

SkafteNicki mentioned this pull request May 10, 2022

MeanAveragePrecision is slow #1024

Closed
