Add torch.cuda rng state to seed save/load #14384

Merged (13 commits) on Aug 26, 2022

@Anner-deJong Anner-deJong commented Aug 24, 2022

What does this PR do?

Add torch.cuda rng state to the default seed save/load functionality:

  • [Confirmed through testing on a dev GPU server] torch.get_rng_state() / torch.set_rng_state() do not cover the torch.cuda RNG *, so the torch.cuda RNG is not reset
  • Confirmed manually on a dev GPU server that the test does what it is supposed to do. The actual branch and development live on a MacBook without a GPU; I presume the GitHub merge tests will cover this

* Presumably because the torch.cuda RNG state is not included in the torch default_generator .set_state() and .get_state() (C implementation: https://github.com/pytorch/pytorch/blob/master/torch/csrc/Generator.cpp)
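
To illustrate the point above, here is a minimal sketch of saving and restoring RNG state with the CUDA generators included. The helper names `collect_rng_states` and `set_rng_states` are hypothetical and are not taken from this PR's diff; the sketch only relies on `torch.get_rng_state`, `torch.set_rng_state`, `torch.cuda.get_rng_state_all`, and `torch.cuda.set_rng_state_all`. It assumes that on a machine without CUDA the `*_all` calls simply see zero devices.

```python
import random
from typing import Any, Dict

import torch


def collect_rng_states() -> Dict[str, Any]:
    """Snapshot the Python, torch CPU, and torch CUDA RNG states (hypothetical helper)."""
    return {
        "python": random.getstate(),
        "torch": torch.get_rng_state(),
        # One state tensor per visible CUDA device; an empty list when there are none.
        "torch.cuda": torch.cuda.get_rng_state_all(),
    }


def set_rng_states(states: Dict[str, Any]) -> None:
    """Restore the states captured by collect_rng_states (hypothetical helper)."""
    random.setstate(states["python"])
    torch.set_rng_state(states["torch"])
    # Iterates over the saved per-device states; does nothing for an empty list.
    torch.cuda.set_rng_state_all(states["torch.cuda"])


states = collect_rng_states()

# CPU draws are reproduced after restoring the snapshot.
first_cpu = torch.rand(3)
set_rng_states(states)
second_cpu = torch.rand(3)
cpu_reproduced = torch.equal(first_cpu, second_cpu)

# CUDA draws are only checked when a GPU is present.
cuda_reproduced = None
if torch.cuda.is_available():
    set_rng_states(states)
    first_gpu = torch.rand(3, device="cuda")
    set_rng_states(states)
    second_gpu = torch.rand(3, device="cuda")
    cuda_reproduced = torch.equal(first_gpu, second_gpu)

print("CPU reproduced:", cpu_reproduced, "| CUDA reproduced:", cuda_reproduced)
```

Dropping the `"torch.cuda"` entry from the snapshot would leave the CPU check unaffected, while the GPU draws would be expected to differ between the two calls, which is the behaviour described in the first bullet.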

Fixes #UNKNOWN

Does your PR introduce any breaking changes? If yes, please list them.

None that I am aware of

Before submitting

  • [No] Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • [Yes] Did you read the contributor guideline, Pull Request section?
  • [Yes] Did you make sure your PR does only one thing, instead of bundling different changes together?
  • [Yes] Did you make sure to update the documentation with your changes? (if necessary)
  • [Yes] Did you write any new necessary tests? (not for typos and docs)
  • [Yes] Did you verify new and existing tests pass locally with your changes?
  • [UNAWARE OF ANY] Did you list all the breaking changes introduced by this pull request?
  • [Yes] Did you update the CHANGELOG? (not for typos, docs, test updates, or minor internal changes/refactors)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet-list:

  • [Yes] Is this pull request ready for review? (if not, please submit in draft mode)
  • [Mostly] Check that all items from Before submitting are resolved
  • [Yes] Make sure the title is self-explanatory and the description concisely explains the PR
  • [Added by @awaelchli, thanks!] Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

YES LIGHTNING IS AWESOME ⚡

Make sure you had fun coding 🙃

I believe these aren't part of the torch `default_generator` `.set_state()` and `.get_state()` (C implementation [here](https://github.com/pytorch/pytorch/blob/master/torch/csrc/Generator.cpp))
@github-actions github-actions bot added the pl Generic label for PyTorch Lightning package label Aug 24, 2022
@Anner-deJong
Contributor Author

The only issue I can think of: if CUDA is not available, does torch.cuda return None?
I found the tests (I had forked too small a part of the repo; I will check and maybe update them).
Putting the PR in draft mode for the time being.

@Anner-deJong Anner-deJong marked this pull request as draft August 24, 2022 16:46
@codecov

codecov bot commented Aug 24, 2022

Codecov Report

Merging #14384 (943e1fd) into master (34f9883) will increase coverage by 15%.
The diff coverage is 100%.

❗ Current head 943e1fd differs from pull request most recent head 1ed4258. Consider uploading reports for the commit 1ed4258 to get more accurate results

@@            Coverage Diff            @@
##           master   #14384     +/-   ##
=========================================
+ Coverage      61%      76%    +15%     
=========================================
  Files         332      332             
  Lines       26848    26892     +44     
=========================================
+ Hits        16419    20430   +4011     
+ Misses      10429     6462   -3967     

@Anner-deJong Anner-deJong marked this pull request as ready for review August 25, 2022 11:26
@Anner-deJong Anner-deJong marked this pull request as draft August 25, 2022 11:26
@Anner-deJong Anner-deJong marked this pull request as ready for review August 25, 2022 11:58
@Anner-deJong
Contributor Author

> The only issue I can think of: if CUDA is not available, does torch.cuda return None? I found the tests (I had forked too small a part of the repo; I will check and maybe update them). Putting the PR in draft mode for the time being.

Updated the test and did some interactive testing; ready for review. If torch is not compiled with CUDA, the torch.cuda.get/set.. calls do not raise any error. On the test side, the test calls torch.cuda.FloatTensor(2).normal_(), which is not guarded by a torch.cuda.is_available() check.
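
As a rough illustration of the guard mentioned above (not the test from this PR), the sketch below wraps a CUDA draw in a `torch.cuda.is_available()` check. The function name `draw_cuda_sample_or_none` is hypothetical, and `torch.empty(2, device="cuda").normal_()` is used here as an assumed stand-in for the legacy `torch.cuda.FloatTensor(2).normal_()` call.

```python
from typing import Optional

import torch


def draw_cuda_sample_or_none() -> Optional[torch.Tensor]:
    """Draw two normal samples on the GPU, or return None without CUDA (hypothetical helper)."""
    if not torch.cuda.is_available():
        # Without this guard, allocating a CUDA tensor fails on CPU-only machines.
        return None
    return torch.empty(2, device="cuda").normal_()


# Reading and writing the CUDA RNG states loops over the visible devices,
# so with zero devices both calls are effectively no-ops.
cuda_states = torch.cuda.get_rng_state_all()
torch.cuda.set_rng_state_all(cuda_states)

sample = draw_cuda_sample_or_none()
print("CUDA devices seen:", len(cuda_states), "| sample drawn:", sample is not None)
```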

Member

@awaelchli awaelchli left a comment

Great!

src/pytorch_lightning/utilities/seed.py (review thread: outdated, resolved)
src/pytorch_lightning/utilities/seed.py (review thread: resolved)
@awaelchli awaelchli added checkpointing Related to checkpointing feature Is an improvement or enhancement labels Aug 25, 2022
@awaelchli awaelchli added this to the pl:1.8 milestone Aug 25, 2022
@awaelchli
Member

Could you also add a changelog entry? <3

@awaelchli awaelchli self-assigned this Aug 25, 2022
@Anner-deJong
Contributor Author

> Could you also add a changelog entry? <3

Done!

@mergify mergify bot added the ready PRs ready to be merged label Aug 25, 2022
@rohitgr7 rohitgr7 enabled auto-merge (squash) August 25, 2022 17:47
auto-merge was automatically disabled August 25, 2022 17:51

Head branch was pushed to by a user without write access

@rohitgr7 rohitgr7 enabled auto-merge (squash) August 26, 2022 04:46
@rohitgr7 rohitgr7 merged commit 33a5ed9 into Lightning-AI:master Aug 26, 2022
@Anner-deJong Anner-deJong deleted the patch-4 branch August 26, 2022 07:03
Labels
  • checkpointing: Related to checkpointing
  • feature: Is an improvement or enhancement
  • pl: Generic label for PyTorch Lightning package
  • ready: PRs ready to be merged
5 participants