Call set_epoch for distributed batch samplers #13396

Merged
merged 11 commits into from Jun 29, 2022

Conversation


@awaelchli awaelchli commented Jun 23, 2022

What does this PR do?

Fixes #13316

  • The loops now call either dataloader.sampler.set_epoch or dataloader.batch_sampler.set_epoch if applicable
  • Moved duplicated code to a utility function
  • Improved tests and fixed description
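The first two bullets (calling set_epoch on whichever sampler supports it, from one shared utility) can be illustrated with a minimal sketch. The helper name `set_sampler_epoch` and the stand-in classes `RecordingSampler` and `FakeDataLoader` are hypothetical and are not Lightning's actual code. The sketch only assumes that a dataloader exposes `sampler` and `batch_sampler` attributes, as `torch.utils.data.DataLoader` does, and forwards the epoch to each of them that implements `set_epoch`.

```python
from typing import Any, List


def set_sampler_epoch(dataloader: Any, epoch: int) -> None:
    """Hypothetical helper: call ``set_epoch`` on the sampler and/or batch sampler, if present."""
    for attr in ("sampler", "batch_sampler"):
        sampler = getattr(dataloader, attr, None)
        # Only forward the epoch to objects that actually implement set_epoch,
        # e.g. torch's DistributedSampler or a custom distributed batch sampler.
        set_epoch = getattr(sampler, "set_epoch", None)
        if callable(set_epoch):
            set_epoch(epoch)


class RecordingSampler:
    """Stand-in for a distributed (batch) sampler that records the epochs it receives."""

    def __init__(self) -> None:
        self.epochs: List[int] = []

    def set_epoch(self, epoch: int) -> None:
        self.epochs.append(epoch)


class FakeDataLoader:
    """Minimal object exposing the two attributes a torch DataLoader has."""

    def __init__(self, sampler: Any = None, batch_sampler: Any = None) -> None:
        self.sampler = sampler
        self.batch_sampler = batch_sampler


batch_sampler = RecordingSampler()
# The plain object() has no set_epoch, so only the batch sampler is notified.
loader = FakeDataLoader(sampler=object(), batch_sampler=batch_sampler)

for epoch in range(3):
    set_sampler_epoch(loader, epoch)  # called at the start of every epoch
    # ... iterate over loader here ...

print(batch_sampler.epochs)  # [0, 1, 2]
```

Checking both attributes matters because a custom batch sampler passed via `batch_sampler=` is not reachable through `dataloader.sampler`, so a check on the sampler alone would miss it.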

Does your PR introduce any breaking changes? If yes, please list them.

Yes (reproducibility). If you relied on distributed batch samplers before without knowing that Lightning did not call set_epoch on them (so any epoch-dependent shuffling was never updated between epochs), you will now get different results. This is unavoidable.
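To see why results can change, consider a batch sampler that, like `torch.utils.data.DistributedSampler`, seeds its shuffle with `seed + epoch`. The `ToyDistributedBatchSampler` below is a hypothetical, simplified pure-Python stand-in, not a Lightning or PyTorch class. It shows that if set_epoch is never called, every epoch yields exactly the same batches on a given rank, whereas calling it reshuffles the data.

```python
import random
from typing import Iterator, List


class ToyDistributedBatchSampler:
    """Hypothetical batch sampler that shards and shuffles indices per epoch.

    Uses the same seeding idea as torch's DistributedSampler (seed + epoch),
    but is a simplified illustration only.
    """

    def __init__(self, num_samples: int, batch_size: int, rank: int, world_size: int, seed: int = 0) -> None:
        self.num_samples = num_samples
        self.batch_size = batch_size
        self.rank = rank
        self.world_size = world_size
        self.seed = seed
        self.epoch = 0

    def set_epoch(self, epoch: int) -> None:
        self.epoch = epoch

    def __iter__(self) -> Iterator[List[int]]:
        indices = list(range(self.num_samples))
        # The permutation depends only on seed + epoch, so it repeats unless set_epoch is called.
        random.Random(self.seed + self.epoch).shuffle(indices)
        shard = indices[self.rank :: self.world_size]  # this rank's share of the data
        for i in range(0, len(shard), self.batch_size):
            yield shard[i : i + self.batch_size]


sampler = ToyDistributedBatchSampler(num_samples=16, batch_size=4, rank=0, world_size=2)

# Without set_epoch: every "epoch" sees exactly the same batches.
no_set_epoch = [list(sampler) for _ in range(2)]
print(no_set_epoch[0] == no_set_epoch[1])  # True

# With set_epoch: the order is re-drawn for each epoch.
with_set_epoch = []
for epoch in range(2):
    sampler.set_epoch(epoch)
    with_set_epoch.append(list(sampler))
print(with_set_epoch[0] == with_set_epoch[1])
```

Epoch 0 is identical in both runs; only later epochs differ, which is why training that previously ran without set_epoch can now produce different results.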

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or minor internal changes/refactors)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet-list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

cc @Borda @justusschock @awaelchli @ninginthecloud @rohitgr7 @otaj @carmocca @ananthsub

@awaelchli awaelchli added bug Something isn't working data handling Generic data-related topic loops Related to the Loop API labels Jun 24, 2022
@awaelchli awaelchli added this to the pl:1.6.x milestone Jun 24, 2022
@mergify mergify bot removed the has conflicts label Jun 27, 2022
@mergify mergify bot added the ready PRs ready to be merged label Jun 27, 2022

@rohitgr7 rohitgr7 left a comment


nit

src/pytorch_lightning/loops/utilities.py (outdated, resolved)
@awaelchli awaelchli enabled auto-merge (squash) June 29, 2022 14:37
@mergify mergify bot added has conflicts and removed ready PRs ready to be merged labels Jun 29, 2022
@mergify mergify bot added ready PRs ready to be merged and removed has conflicts ready PRs ready to be merged labels Jun 29, 2022
@mergify mergify bot added has conflicts and removed ready PRs ready to be merged labels Jun 29, 2022
@mergify mergify bot added ready PRs ready to be merged and removed has conflicts ready PRs ready to be merged labels Jun 29, 2022
@awaelchli awaelchli merged commit 2dd332f into master Jun 29, 2022
@awaelchli awaelchli deleted the bugfix/batch-sampler-set-epoch branch June 29, 2022 19:09
@rohitgr7 rohitgr7 mentioned this pull request Jul 1, 2022
rohitgr7 added a commit that referenced this pull request Jul 1, 2022
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Borda pushed a commit that referenced this pull request Jul 1, 2022
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

(cherry picked from commit 2dd332f)
lexierule pushed a commit that referenced this pull request Jul 12, 2022
* update NGC docker (#13136)

* update docker
* Apply suggestions from code review

Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Decouple pulling legacy checkpoints from existing GHA workflows and docker files (#13185)

* Add pull-legacy-checkpoints action
* Replace pulls with the new action and script
* Simplify

* Merge pull request #13250 from PyTorchLightning/ci/rm-base

CI: Remove simple test `ci_test-base.yml`

* Update rich requirement from !=10.15.*,<=12.0.0,>=10.2.2 to >=10.2.2,!=10.15.0.a,<13.0.0 in /requirements (#13047)

* Update rich requirement in /requirements

Updates the requirements on [rich](https://github.com/willmcgugan/rich) to permit the latest version.
- [Release notes](https://github.com/willmcgugan/rich/releases)
- [Changelog](https://github.com/Textualize/rich/blob/master/CHANGELOG.md)
- [Commits](Textualize/rich@v10.2.2...v12.4.1)

---
updated-dependencies:
- dependency-name: rich
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* Fix torch.distributed._sharded_tensor DeprecationWarning (#13261)

* update tutorials (#13268)

* [BUG] `estimated_stepping_batches` requires distributed comms in `configure_optimizers` for `DeepSpeedStrategy` (#13350)

* Update torchmetrics requirement from <=0.7.2,>=0.4.1 to >=0.4.1,<0.9.2 in /requirements (#13275)

Update torchmetrics requirement in /requirements

Updates the requirements on [torchmetrics](https://github.com/PyTorchLightning/metrics) to permit the latest version.
- [Release notes](https://github.com/PyTorchLightning/metrics/releases)
- [Changelog](https://github.com/PyTorchLightning/metrics/blob/master/CHANGELOG.md)
- [Commits](Lightning-AI/torchmetrics@v0.4.1...v0.9.1)

---
updated-dependencies:
- dependency-name: torchmetrics
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix mypy errors for model summary utilities (#13384)

* rename org Lightning AI

* Modified python version check to accommodate for legacy version styles (#13420)

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

(cherry picked from commit b332b66)

* Call `set_epoch` for distributed batch samplers (#13396)

Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

(cherry picked from commit 2dd332f)

* _RICH_AVAILABLE

* _FAIRSCALE_AVAILABLE

* _BAGUA_AVAILABLE

* redefine

* chlog spaces

* CI: Fix `fatal: unsafe repository` (#13515)

* update release date

* CI: azure rename

* Restore log step during restart (#13467)

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* remove redundant test

* Update CI setup (#13291)

* drop mamba
* use legacy GPU machines

* fix schema check

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>
Co-authored-by: Sean Naren <sean@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Martino Sorbaro <martinosorb@users.noreply.github.com>
jerome-habana pushed a commit to jerome-habana/lightning that referenced this pull request Jul 14, 2022
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Labels
bug Something isn't working data handling Generic data-related topic loops Related to the Loop API ready PRs ready to be merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

set_epoch not called for BatchSampler
6 participants