Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve mAP performance #742

Merged
merged 28 commits into from
Jan 31, 2022
Merged

Improve mAP performance #742

merged 28 commits into from
Jan 31, 2022

Conversation

twsl
Copy link
Contributor

@twsl twsl commented Jan 11, 2022

What does this PR do?

Fixes #677

Work of @tkupek @OlofHarrysson @twsl
First steps to get performance on par with pycocotools/numpy

Before submitting

  • Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@codecov
Copy link

codecov bot commented Jan 11, 2022

Codecov Report

Merging #742 (a04dbab) into master (9c9c430) will decrease coverage by 8%.
The diff coverage is 97%.

@@          Coverage Diff           @@
##           master   #742    +/-   ##
======================================
- Coverage      91%    83%    -8%     
======================================
  Files         166    166            
  Lines        6817   6832    +15     
======================================
- Hits         6181   5651   -530     
- Misses        636   1181   +545     

@Borda Borda added the enhancement New feature or request label Jan 12, 2022
@tkupek
Copy link
Contributor

tkupek commented Jan 17, 2022

The performance improvements are great and really necessary.

However, CUDA calculations are still by a factor 9-10 slower than CPU. Computations are not CUDA optimized and I don't know if it's possible (or how much effort it is).

Running metric on 10 samples on cuda:0
Total time: 8.060346603393555
Time per sample 0.8060346603393554

Running metric on 10 samples on cpu
Total time: 0.98065185546875
Time per sample 0.098065185546875

Until then, I suggest to make the mAP metric CPU only, to avoid that somebody uses the slow CUDA version.
I would move everything to CPU, as we did it with the pycocotools implementation.

@twsl @Borda @OlofHarrysson @SkafteNicki

@Borda
Copy link
Member

Borda commented Jan 17, 2022

Until then, I suggest to make the mAP metric CPU only, to avoid that somebody uses the slow CUDA version.
I would move everything to CPU, as we did it with the pycocotools implementation.

sounds reasonable to me 🐰

ashutoshml and others added 4 commits January 18, 2022 23:54
* Remove deprecated functions, and warnings
* Update links for docstring
* chlog

Co-authored-by: Daniel Stancl <46073029+stancld@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
@tkupek
Copy link
Contributor

tkupek commented Jan 27, 2022

I ran a benchmark on a real-works use-case with 1088 samples and ~10 bounding boxes per sample.

Pycocotools (previous implementation): 74.91s
Current implementation: 1742.07s
This branch: 140.79s

IMHO this should be merged and released ASAP to make the metric usable again on GPUs.
Further improvements should be done to get closer to the pycocotools benchmark.

@twsl @OlofHarrysson @Borda

@twsl twsl marked this pull request as ready for review January 27, 2022 11:31
torchmetrics/detection/map.py Show resolved Hide resolved
torchmetrics/detection/map.py Outdated Show resolved Hide resolved
torchmetrics/detection/map.py Show resolved Hide resolved
torchmetrics/detection/map.py Show resolved Hide resolved
@SkafteNicki
Copy link
Member

@tkupek @OlofHarrysson @twsl thanks for really trying to improve performance for this metric.
I agree that we should merge this and do a small release.
However, out hole GPU testing pipeline is down at the moment so no PRs are currently getting merged :(

@mergify mergify bot added the ready label Jan 31, 2022
@SkafteNicki
Copy link
Member

@Borda please approve and merge.
Passes locally for multi-gpu support:
image

@Borda Borda merged commit 408cabe into Lightning-AI:master Jan 31, 2022
@twsl twsl deleted the fix/map-perf branch February 1, 2022 02:01
Borda pushed a commit that referenced this pull request Feb 1, 2022
* Simplify id generation
* rework and speed up _find_best_gt_match
* add: Refactor to avoid duplicate calculations
* precision,recall,scores  on correct device (-20%)
* arguments to python lists
* enumerate instead of range
* compute on device
* Remove exception
* Replace prec score loop
* Fix auc flattening
* draft to run metric on cpu only
* move tensors to cpu on compute, need to be on GPU for multi GPU syncing

Co-authored-by: tobias-kupek-swarm <tobias.kupek@swarm-analytics.com>
Co-authored-by: Olof Harrysson <harrysson.olof@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tobias Kupek <tkupek@users.noreply.github.com>
Co-authored-by: SkafteNicki <skaftenicki@gmail.com>

(cherry picked from commit 408cabe)
@Borda
Copy link
Member

Borda commented Feb 22, 2022

Pycocotools (previous implementation): 74.91s
Current implementation: 1742.07s
This branch: 140.79s

Further improvements should be done to get closer to the pycocotools benchmark.

@twsl @tkupek do you think that you can make further improvements to get speed parity with cocotooks?

@tkupek
Copy link
Contributor

tkupek commented Feb 24, 2022

Unfortunately, I don't have any concrete ideas in mind, nor the time to look at it right now.

@twsl
Copy link
Contributor Author

twsl commented Feb 24, 2022

I might give it another shot after i handed in my thesis next week. But cant promise any results or timeline

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request ready
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Performance issue for own mAP implementation
6 participants