Changed smoothing in tqdm to decrease variability of time remaining #1194

Merged
merged 1 commit into from Mar 27, 2020

Conversation

@pertschuk pertschuk commented Mar 19, 2020

between training / eval

Before submitting

  • Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you write any new necessary tests?
  • Did you make sure to update the docs?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

What does this PR do?

Temporary fix for #1096

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

codecov bot commented Mar 19, 2020

Codecov Report

Merging #1194 into master will not change coverage.
The diff coverage is n/a.

@@          Coverage Diff           @@
##           master   #1194   +/-   ##
======================================
  Coverage      91%     91%           
======================================
  Files          62      62           
  Lines        3119    3119           
======================================
  Hits         2828    2828           
  Misses        291     291

@Borda Borda added feature Is an improvement or enhancement help wanted Open to be worked on labels Mar 20, 2020

@Borda Borda left a comment

@pertschuk could you elaborate on why it is temporary? So it does not solve the linked problem?

@pertschuk

@Borda

So originally, the main_progress_bar had exponential smoothing of 0.3 (tqdm default), causing the time remaining to fluctuate within an epoch depending on whether it was running training or evaluation steps.

This PR changes smoothing to 0.0, so that the estimated time remaining is based on an average of ALL steps. It therefore fluctuates less and gives a more accurate epoch ETA.
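To illustrate the difference, here is a minimal sketch of the two estimators in plain Python. It is a simplified model of the idea, not tqdm's actual code; the function names and the step timings (8 training steps at 1.0 s followed by 2 validation steps at 0.2 s) are made up for the example.

```python
def per_step_estimates(step_times, smoothing):
    """Return the estimated seconds-per-step after each completed step.

    smoothing == 0 -> plain average over all steps so far.
    smoothing > 0  -> exponential moving average favouring recent steps.
    """
    estimates = []
    elapsed = 0.0
    ema = None
    for done, dt in enumerate(step_times, start=1):
        elapsed += dt
        if smoothing == 0:
            estimate = elapsed / done
        elif ema is None:
            # First step: nothing to smooth against yet.
            estimate = ema = dt
        else:
            estimate = ema = smoothing * dt + (1 - smoothing) * ema
        estimates.append(estimate)
    return estimates


def spread(values):
    """Difference between the largest and smallest estimate."""
    return max(values) - min(values)


# Hypothetical epoch: 5 rounds of 8 slow training steps and 2 fast validation steps.
step_times = ([1.0] * 8 + [0.2] * 2) * 5

averaged = per_step_estimates(step_times, smoothing=0.0)
smoothed = per_step_estimates(step_times, smoothing=0.3)

print(f"spread with smoothing=0.0: {spread(averaged):.2f} s/step")
print(f"spread with smoothing=0.3: {spread(smoothed):.2f} s/step")
```

The time remaining is this per-step figure multiplied by the number of steps left, so a steadier per-step estimate means a steadier ETA. In tqdm itself the corresponding knob is the `smoothing` argument, which ranges from 0 (average speed) to 1 (instantaneous speed).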

But the time remaining is still not entirely accurate (especially prior to the first validation run within an epoch, as it doesn't yet know how long the validation steps will take).

I experimented with implementing a custom timer for this (by timing the dummy_eval_steps), but it was a significant addition of code, and didn't work with tqdm.

This is because tqdm assumes that all steps take roughly the same time, so having eval steps and train steps increment the same tqdm iterator (main_progress_bar) is probably not best practice (but it may be okay here for now).

If we wanted it to be 100% accurate, we should probably split training / eval into separate loops with nested progress bars.

In the example below there are 5 eval runs, meaning val_check_interval=0.2, and training would be split into chunks accordingly. So basically the main_progress_bar would have to be separated out into sub-progress bars:

Epochs:   0%|          | 0/4 [00:00<?, ?it/s]
Eval Runs:   0%|          | 0/5 [00:00<?, ?it/s]
Training:   0%|          | 0/50 [00:00<?, ?it/s]

And then:

Epochs:   0%|          | 0/4 [00:00<?, ?it/s]
Eval Runs:   0%|          | 0/5 [00:00<?, ?it/s]
Val:   0%|          | 0/50 [00:00<?, ?it/s]
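A rough sketch of how that layout could be driven is shown here. It is not code from this PR: `split_epoch` and `run_epochs` are hypothetical helpers, the batch counts are taken from the bars above (50 training steps and 50 validation steps per eval run), and `val_check_interval` is assumed to be a fraction of the epoch.

```python
def split_epoch(train_batches, val_batches, val_check_interval):
    """Split one epoch into alternating (label, steps) chunks.

    With val_check_interval=0.2 there are 5 validation runs, each preceded
    by one fifth of the training batches.
    """
    n_runs = round(1 / val_check_interval)
    base, extra = divmod(train_batches, n_runs)
    chunks = []
    for run in range(n_runs):
        # Spread any remainder over the first chunks so no batch is dropped.
        chunks.append(("Training", base + (1 if run < extra else 0)))
        chunks.append(("Val", val_batches))
    return chunks


def run_epochs(epochs, train_batches, val_batches, val_check_interval,
               step=lambda: None):
    """Drive three nested tqdm bars: epochs, eval runs, and the current phase."""
    # tqdm is a third-party package; imported here so split_epoch works without it.
    from tqdm import tqdm

    chunks = split_epoch(train_batches, val_batches, val_check_interval)
    n_runs = len(chunks) // 2
    for _ in tqdm(range(epochs), desc="Epochs", position=0):
        with tqdm(total=n_runs, desc="Eval Runs", position=1, leave=False) as runs_bar:
            for label, steps in chunks:
                # Each phase gets its own bar, so its rate is estimated separately.
                for _ in tqdm(range(steps), desc=label, position=2, leave=False):
                    step()
                if label == "Val":
                    runs_bar.update(1)
```

Calling `run_epochs(4, 250, 50, 0.2)` would show the three bars from the example. Because training and validation steps never share a bar, each bar only averages over steps of one kind.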


@Borda Borda left a comment

LGTM 🚀

@Borda Borda added this to the 0.7.2 milestone Mar 25, 2020
@Borda Borda added ready PRs ready to be merged and removed help wanted Open to be worked on labels Mar 25, 2020
@williamFalcon williamFalcon merged commit 12b39a7 into Lightning-AI:master Mar 27, 2020
williamFalcon added a commit that referenced this pull request Mar 28, 2020
* Example: Simple RL example using DQN/Lightning

* DQN RL Agent using Lightning

* Uses Iterable Dataset for Replay Buffer

* Buffer is populated by agent as training is carried out, updating the
dataset

* Applied autopep8 fixes

* * Updated line length from 120 to 110

* Update pl_examples/domain_templates/dqn.py

simplify get_device method

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* Update pl_examples/domain_templates/dqn.py

Re-ordered imports

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* CI: split tests-examples (#990)

* CI: split tests-examples

* tests without template

* comment depends

* CircleCI typo

* add doctest

* update test req.

* CI tests

* setup macOS

* longer train

* lover pred acc

* fix model

* rename default model

* lower tests acc

* typo

* imports

* fix test optimizer

* update calls

* fix Win

* lower Drone image

* fix call

* pytorch image

* fix test

* add dev image

* add dev image

* update image

* drone volume

* lint

* update test notes

* rename tests/models >> tests/base

* group models

* conftest

* optim imports

* typos

* fix import

* fix tests

* install AMP

* tests

* fix import

* Clean up

* added module docstring

* renamed variables to be more descriptive

* Added missing docstrings and type annotations

* Added gym to example requirements

* Added note to changelog

* updated example image

* update types

* rename script

* Update CHANGELOG.md

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* another rename

* Disable validation when val_percent_check=0 (#1251)

* fix disable validation

* add test

* update changelog

* update docs for val_percent_check

* make "fast training" docs consistent

* calling self.forward() -> self() (#1211)

* self.forward() -> self()

* update changelog

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Fix requirements-extra.txt Trains package to release version (#1229)

* Fix requirement-extra use released Trains package

* Update README.md add Trains and links to the external Visualization section

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Remove unnecessary parameters to super() in documentation and source code (#1240)

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* update deprecation warning (#1258)

* update docs for progress bat values (#1253)

* lower timeouts for inactive issues (#1250)

* update contrib list (#1241)

Co-authored-by: William Falcon <waf2107@columbia.edu>

* Fix outdated docs (#1227)

* Fix typo (#1224)

* drop unused Tox (#1242)

* system info (#1234)

* system info

* update big info

* test script

* update config

* rename script

* import path

* Changed smoothing in tqdm to decrease variability of time remaining between training / eval (#1194)

* Example: Simple RL example using DQN/Lightning

* DQN RL Agent using Lightning

* Uses Iterable Dataset for Replay Buffer

* Buffer is populated by agent as training is carried out, updating the
dataset

* Applied autopep8 fixes

* * Updated line length from 120 to 110

* Update pl_examples/domain_templates/dqn.py

simplify get_device method

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* Update pl_examples/domain_templates/dqn.py

Re-ordered imports

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* Clean up

* added module docstring

* renamed variables to be more descriptive

* Added missing docstrings and type annotations

* Added gym to example requirements

* Added note to changelog

* update types

* rename script

* Update CHANGELOG.md

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* another rename

Co-authored-by: Donal Byrne <Donal.Byrne@xperi.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: Adrian Wälchli <adrian.waelchli@students.unibe.ch>
Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com>
Co-authored-by: Martin.B <51887611+bmartinn@users.noreply.github.com>
Co-authored-by: Tyler Yep <tyep@stanford.edu>
Co-authored-by: Shunta Komatsu <59395084+skmatz@users.noreply.github.com>
Co-authored-by: Jack Pertschuk <jackpertschuk@gmail.com>
alexeykarnachev pushed a commit to alexeykarnachev/pytorch-lightning that referenced this pull request Apr 3, 2020
@Borda Borda modified the milestones: v0.7., v0.7.x Apr 18, 2021