Skip to content

Conversation

wilderrodrigues
Copy link
Contributor

@wilderrodrigues wilderrodrigues commented Dec 13, 2022

What does this PR do?

Fixes #1024

Update: The initial PR contained erroneous information about the running time due to a silent issues that was happening, killing the run because some tensors were in a different device: CPU instead of GPU and vice-versa. The changes are still interesting as the performance increased a lot.

It seems that we faced some regression between version 0.6.0 and the latest releases. With the refactor of moving away from pycocotools and the addition of new operations, the appropriate device was not being used. That behaviour lead to unnecessary transfer of computation between GPU and CPU.

Based on the code provided in the issue #1024 discussion, I also added a couple of unit tests to cover for computation with GPUs and CPUs. Since the results might vary a bit, depending on the GPU/CPU model, I added a relative tolerance of 1.0 trying to cover for the gap.

Some results:

Before (GPU):

Total time in init: 1.0306465921457857
Total time in update: 0.0780688391532749
Total time in compute: 241.5502170859836

After (GPU):

Total time in init: 1.1004462020937353
Total time in update: 0.08146677981130779
Total time in compute: 137.3318270179443

At it has been pointed out, those computations when performed on the CPU might be faster. The important part is making sure that the device is properly allocated to the tensors. When running the same tests on CPU I got the following results:

Total time in init: 0.020693536149337888
Total time in update: 0.08699228917248547
Total time in compute: 72.37102157599293

I will run some tests with our production model and share the deltas here.

Before submitting

  • Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Yep! This was fun. After trying different versions and commit hashes, I just decided to clone, fix and push the PR. It will help us a lot and I'm sure it will help out the community.

Test with real data

  • 2 RTX 3090
  • VisDrone dataset

image

The pink line is the result with the same run using the master branch (which is faster than the latest 0.11 release)
Don't bother about the flat line at the end of each run. That's just PyCharm holding the GPU busy before I clicked on stop.

@Borda
Copy link
Collaborator

Borda commented Dec 14, 2022

wow, very nice boost!

@wilderrodrigues wilderrodrigues changed the title Wilder/improve map performance WIP: Wilder/improve map performance Dec 14, 2022
@wilderrodrigues
Copy link
Contributor Author

wow, very nice boost!

I marked it as WIP for now because there is sill something wrong with it. The initial results could be wrong due to some internal error that was not picked up. For instance, some tensors being on the CPU and other on the GPU. I’m looking into it right now.

@wilderrodrigues wilderrodrigues force-pushed the wilder/improve-map-performance branch from 3f495d0 to 2063f18 Compare December 14, 2022 12:50
@Borda Borda force-pushed the wilder/improve-map-performance branch from 2063f18 to 13e4f43 Compare December 14, 2022 12:50
@wilderrodrigues wilderrodrigues force-pushed the wilder/improve-map-performance branch from 10ddd03 to 65292ed Compare December 14, 2022 12:52
@wilderrodrigues wilderrodrigues changed the title WIP: Wilder/improve map performance Wilder/improve map performance Dec 14, 2022
@wilderrodrigues wilderrodrigues changed the title Wilder/improve map performance Improve Mean Average Precision performance Dec 14, 2022
@wilderrodrigues
Copy link
Contributor Author

My apologies, @Borda. It seems that when I rebased with master and force-pushed, a lot of other things came up to the PR. It looks a bit overwhelming now to review. Could have also been the pre-commit I ran.

@Borda
Copy link
Collaborator

Borda commented Dec 14, 2022

@wilderrodrigues no problem :)

@wilderrodrigues
Copy link
Contributor Author

Just a quick update on this PR:

  1. We have consistency across multiple runs w.r.t runtime (see fig. 1)
  2. We have consistency - determinism - on our metrics across multiple runs
    • See fig. 2 for Val mAP and fig.3 for Test mAP
  • 2 RTX 3090 GPUs
  • VisDrone dataset
  • 32 batch size
  • 16 workers
  • batch limits for train, val and test is 50
  • 10 epochs each run, validating every 2 epochs

image

image

image

@wilderrodrigues wilderrodrigues force-pushed the wilder/improve-map-performance branch from 4089fcb to 1ea26b2 Compare December 15, 2022 12:32
@wilderrodrigues
Copy link
Contributor Author

Can this one get some love, @Borda @SkafteNicki @justusschock @tchaton :)

@justusschock
Copy link
Member

@wilderrodrigues sorry for the long return time in this review. Your PR looks good to me!

@wilderrodrigues
Copy link
Contributor Author

wilderrodrigues commented Dec 20, 2022

@wilderrodrigues sorry for the long return time in this review. Your PR looks good to me!

Thanks, @Borda .

Been also a bit away in the last few days, a lot of work to catch up with before the holidays season. I will rebase the PR and wait for the merge to happen. :)

@wilderrodrigues wilderrodrigues force-pushed the wilder/improve-map-performance branch from 403e8cc to 8b7bbb6 Compare December 20, 2022 07:04
@wilderrodrigues
Copy link
Contributor Author

A second approval and we can merge this one, @Borda .

@Borda
Copy link
Collaborator

Borda commented Dec 20, 2022

A second approval and we can merge this one, @Borda .

could we pls first fix the failing tests :)

@wilderrodrigues
Copy link
Contributor Author

A second approval and we can merge this one, @Borda .

could we pls first fix the failing tests :)

Oh yeah, sure. I haven't seen them.

@wilderrodrigues wilderrodrigues force-pushed the wilder/improve-map-performance branch 2 times, most recently from 89e464e to afadcc5 Compare December 21, 2022 14:44
@mergify mergify bot removed the has conflicts label May 25, 2023
@Borda Borda enabled auto-merge (squash) May 25, 2023 19:23
@ddelange
Copy link

Hi @wilderrodrigues 👋

Out of curiosity: any chance you could run a substantial map calculation through py-spy and post the flamegraph here?

pip install py-spy
py-spy record -o profile.svg -- python myprogram.py

@senarvi
Copy link

senarvi commented May 25, 2023

All tests are good now, just trying to improve speed a little bit more, as we now are above the previous (yet broken) implementation.

That's really great!

@wilderrodrigues
Copy link
Contributor Author

Hi @wilderrodrigues 👋

Out of curiosity: any chance you could run a substantial map calculation through py-spy and post the flamegraph here?

pip install py-spy
py-spy record -o profile.svg -- python myprogram.py

Hi @ddelange ,

Sure. I'm busy with some last things in terms of speed and will run it once I push the last commits.

@Borda
Copy link
Collaborator

Borda commented May 26, 2023

We can sequence it out to another PR, let's just get this green...

last one is:

ERROR unittests/detection/benchmark/benchmark_setup.py - TypeError: 'type' object is not subscriptable

@SkafteNicki SkafteNicki modified the milestones: v1.0.0, future Jun 3, 2023
@Borda
Copy link
Collaborator

Borda commented Jul 3, 2023

@wilderrodrigues, we value your work and thank you for your contribution 💜 could you pls revisit this PR with respect to the recently merged revert to COCO implementation #1327 and setting the torch version as an Internal option? 🐰

@Borda Borda marked this pull request as draft July 3, 2023 18:16
auto-merge was automatically disabled July 3, 2023 18:16

Pull request was converted to draft

@Borda
Copy link
Collaborator

Borda commented Aug 8, 2023

Once again thank you for all your effort! 💜
Feel free to reopen this PR, and happy to talk about it any time in the future 🐿️

@Borda Borda closed this Aug 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

MeanAveragePrecision is slow

9 participants