
[feat] Add PyTorch Profiler. #5560

Merged
merged 43 commits into from Jan 26, 2021

Conversation

@tchaton (Contributor) commented Jan 18, 2021

What does this PR do?

This PR adds support for the PyTorch Profiler, with the following features:

  • Adapts to the installed PyTorch version, exposing only the parameters that version supports
  • Supports emit_nvtx with nvprof
  • Supports export_to_chrome
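The version-adaptive parameter handling described in the first bullet can be sketched roughly as follows. This is a minimal illustration using `inspect.signature`, not Lightning's actual implementation; the helper name `filter_supported_kwargs` and the stand-in `old_profile` function are hypothetical:

```python
import inspect

def filter_supported_kwargs(func, **kwargs):
    """Keep only the keyword arguments that `func` actually accepts.

    This mimics the idea of adapting to the installed PyTorch version:
    parameters that a given profiler release does not know about are
    silently dropped instead of raising a TypeError.
    """
    supported = inspect.signature(func).parameters
    return {name: value for name, value in kwargs.items() if name in supported}

# A stand-in for an older profiler constructor that lacks `with_stack`.
def old_profile(enabled=True, use_cuda=False):
    return {"enabled": enabled, "use_cuda": use_cuda}

kwargs = filter_supported_kwargs(old_profile, use_cuda=True, with_stack=True)
print(kwargs)  # {'use_cuda': True} — the unsupported `with_stack` was dropped
```

The same pattern lets one codebase pass newer arguments (such as stack recording) when the installed PyTorch supports them, while staying compatible with older releases.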

Fixes # (issue) <- this links related issue to this PR

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified
  • Check that target branch and milestone match!

Did you have fun?

Make sure you had fun coding 🙃

@tchaton tchaton self-assigned this Jan 18, 2021
@tchaton tchaton added this to the 1.2 milestone Jan 19, 2021
@pep8speaks commented Jan 19, 2021

Hello @tchaton! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-01-26 09:46:02 UTC

@tchaton tchaton added the feature Is an improvement or enhancement label Jan 19, 2021
@tchaton tchaton marked this pull request as ready for review January 19, 2021 11:05
@carmocca carmocca self-requested a review January 19, 2021 14:28
@tchaton tchaton mentioned this pull request Jan 19, 2021
5 tasks
@carmocca (Member) left a comment


Some comments; I still need to pull this branch and play with it.

Also, we want to check that the docs render correctly and add this to https://pytorch-lightning.readthedocs.io/en/stable/profiler.html

Review threads (resolved) were opened on:

  • pytorch_lightning/profiler/__init__.py (5 threads)
  • pytorch_lightning/profiler/profilers.py (1 thread)
  • tests/trainer/test_trainer.py (2 threads)
@codecov bot commented Jan 22, 2021

Codecov Report

Merging #5560 (8d62f41) into release/1.2-dev (f782230) will decrease coverage by 0%.
The diff coverage is 89%.

@@               Coverage Diff                @@
##           release/1.2-dev   #5560    +/-   ##
================================================
- Coverage               93%     92%    -0%     
================================================
  Files                  153     153            
  Lines                10743   10870   +127     
================================================
+ Hits                  9941   10054   +113     
- Misses                 802     816    +14     

@quinor (Contributor) commented Jan 22, 2021

Looks good to me.

@SeanNaren (Contributor) commented
Nice work @tchaton and @quinor!

A few questions:

  • Are there any limitations, e.g. multi-node/multi-GPU profiling? If there are none, fine; but if there are, it would be good to add some validation checks and mention them in the docs!
  • Do we know whether wall-clock time is still valid when using the profiler? This profiler inspects individual operations, and many CUDA calls happen asynchronously, which could make the reported wall-clock time unrepresentative of the true end-to-end run time. If so, we should make it clear that this profiler is for finding bottlenecks, not for judging end-to-end run time (use the simple profiler for that).
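The asynchronous-execution concern above can be illustrated with a plain-Python analogy (no CUDA involved; the `gpu_like_kernel` function is a made-up stand-in). Timing only the dispatch of asynchronous work drastically undercounts the true cost, just as timing a CUDA op launch without synchronizing would:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def gpu_like_kernel():
    time.sleep(0.2)  # stands in for an asynchronous CUDA kernel

with ThreadPoolExecutor() as pool:
    start = time.perf_counter()
    future = pool.submit(gpu_like_kernel)  # returns immediately, like an async launch
    dispatch_time = time.perf_counter() - start

    future.result()                        # blocks, like torch.cuda.synchronize()
    total_time = time.perf_counter() - start

print(f"dispatch: {dispatch_time:.3f}s, synchronized total: {total_time:.3f}s")
# dispatch is near zero; only the synchronized measurement reflects the real cost
```

This is why per-operation timings from an async-aware profiler are best read as relative bottleneck indicators rather than end-to-end wall-clock figures.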

@quinor (Contributor) commented Jan 23, 2021

Hey @SeanNaren

In my experience the wall-clock time is extended by the overhead of the profiling; that's definitely a good point to add to the docs.

As for multi-node/multi-GPU, I would expect it either to fail in a weird way or to profile only a single worker in DDP mode, but I don't think we tested its behavior. I didn't think of adding this to the docs, since I based my code on the AdvancedProfiler, whose docs don't cover the DDP setting either (based on the code, I would expect it to fail similarly). Maybe that's then a bigger upgrade to the profiler docs in general?

@SeanNaren (Contributor) commented

> In my experience the wall-clock time is extended by the overhead of the profiling; that's definitely a good point to add to the docs.

@tchaton let's add a note about this!

> As for multi-node/multi-GPU, I would expect it either to fail in a weird way or to profile only a single worker in DDP mode, but I don't think we tested its behavior. I didn't think of adding this to the docs, since I based my code on the AdvancedProfiler, whose docs don't cover the DDP setting either (based on the code, I would expect it to fail similarly). Maybe that's then a bigger upgrade to the profiler docs in general?

Totally fine! Let's just add a small note in the docs that this is not tested with DDP/multiple processes, and we can address this in a follow-up PR.

@tchaton (Contributor, Author) commented Jan 25, 2021

> > In my experience the wall-clock time is extended by the overhead of the profiling; that's definitely a good point to add to the docs.
>
> @tchaton let's add a note about this!

> > As for multi-node/multi-GPU, I would expect it either to fail in a weird way or to profile only a single worker in DDP mode, but I don't think we tested its behavior. Maybe that's then a bigger upgrade to the profiler docs in general?
>
> Totally fine! Let's just add a small note in the docs that this is not tested with DDP/multiple processes, and we can address this in a follow-up PR.

Hey @SeanNaren, great comment! I just added support for DDP :)
If output_filename is set, each process saves to its own file.

Best,
T.C
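The per-process file naming could look roughly like this. A minimal sketch, assuming each DDP rank suffixes its rank index onto the configured output_filename; the helper name `rank_output_filename` is hypothetical and not necessarily what the PR implements:

```python
import os

def rank_output_filename(output_filename: str, local_rank: int) -> str:
    """Derive a per-process file so concurrent DDP workers don't clobber
    each other's profiler output, e.g. 'profile.txt' -> 'profile_1.txt'."""
    root, ext = os.path.splitext(output_filename)
    return f"{root}_{local_rank}{ext}"

print(rank_output_filename("profile.txt", 0))  # profile_0.txt
print(rank_output_filename("profile.txt", 1))  # profile_1.txt
```

Keying the file name on the rank keeps each worker's report intact without requiring any cross-process coordination at write time.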

@tchaton tchaton enabled auto-merge (squash) January 25, 2021 17:42
@mergify mergify bot removed the has conflicts label Jan 26, 2021
@tchaton tchaton merged commit 5f33728 into release/1.2-dev Jan 26, 2021
@tchaton tchaton deleted the feat/torch_profiler branch January 26, 2021 11:48
Labels
feature Is an improvement or enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants