Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes for batch_size-related issues #799 and #755 #809

Merged
merged 9 commits into from May 23, 2020

Conversation

yhn112
Copy link
Contributor

@yhn112 yhn112 commented May 15, 2020

Before submitting

  • Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
  • Did you read the contribution guide, Pull Request section?
  • Did you check the code style? catalyst-make-codestyle && catalyst-check-codestyle (pip install -U catalyst-codestyle).
  • Did you make sure to update the docs? We use Google format for all the methods and classes.
  • Did you check the docs with make check-docs?
  • Did you write any new necessary tests?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

Description

Fixes the incorrect epoch metric averaging. Also, makes counters of samples and batches more correct and consistent.

Related Issue

#799
#755

Type of Change

  • Examples / docs / tutorials / contributors update
  • Bug fix (non-breaking change which fixes an issue)
  • Improvement (non-breaking change which improves an existing feature)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

You can use 'Login as guest' to see Teamcity build logs.

@pep8speaks
Copy link

pep8speaks commented May 15, 2020

Hello @yhn112! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 13:38: E203 whitespace before ':'
Line 15:46: E203 whitespace before ':'
Line 23:17: E203 whitespace before ':'

Comment last updated at 2020-05-21 16:21:03 UTC

@yhn112 yhn112 changed the title Fixes for https://github.com/catalyst-team/catalyst/issues/799 and https://github.com/catalyst-team/catalyst/issues/755 Fixes for batch_size-related issues #799 and #/755 May 15, 2020
@yhn112 yhn112 changed the title Fixes for batch_size-related issues #799 and #/755 Fixes for batch_size-related issues #799 and #755 May 15, 2020
@yhn112
Copy link
Contributor Author

yhn112 commented May 16, 2020

A bit of comments to these changes:

  1. I renamed global_step to global_samples for naming consistency because in the current version global_step counts samples, but loader_step counts batches. I'm willing to see alternatives.
  2. Of course, I'm not happy that I had to change the AverageValueMeter for this "small" fix, but it seems there was no other option.

@yhn112 yhn112 marked this pull request as ready for review May 16, 2020 00:51
catalyst/contrib/utils/tools/tests/test_tensorboard.py Outdated Show resolved Hide resolved
catalyst/utils/meters/averagevaluemeter.py Show resolved Hide resolved
Comment on lines 358 to 367
self.loader_step: int = 0
self.loader_samples: int = 0
self.loader_len: int = 0
self.loader_batch_size = 0

self.batch_size: int = 0

self.global_samples: int = 0
self.global_step: int = 0
self.global_epoch: int = 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you think about naming change?
for example:

global_epoch - epoch counter
global_batch_step - batch counter (not sure if we need it)
global_sample_step - samples counter (should be used for logging)

loader_batch_step - current loader batch step
loader_sample_step - current loader sample step

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with any consistent naming here. =)

About global_batch_step -- I would like to make a selectable option for loggers, to log over samples or over batches. Also, IMHO it's better to have a consistent API of state. For future usage or for users.

catalyst/core/runner.py Outdated Show resolved Hide resolved
Comment on lines 371 to 374
if isinstance(batch, list):
self.state.batch_size = len(batch[0])
else:
self.state.batch_size = next(iter(batch.values())).shape[0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moreover, what do you think about performance of this solution?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

python if statement is really computation heavy, but same is deep learning pipelines :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hate this solution at all.
It would be better if there were only dict-like batches in catalyst, IMHO.
(or even better, if torch dataloaders had something like next_batch_size property, but I definitely cannot do anything with it)

This does not seem to affect the performance very much. but ok, will do some formal tests.

@Scitator
Copy link
Member

Scitator commented May 21, 2020

@yhn112
Copy link
Contributor Author

yhn112 commented May 21, 2020

@Scitator, FYI:
black enforces blank line before inner function
flake8 enforces no blank lines after outer function docstring
The result? Schizophrenia. And unwillingness to use your catalyst-make-codestyle && catalyst-check-codestyle.

But ok, fixed.

@Scitator
Copy link
Member

@yhn112 This issue only occures on funcitons with inner functions :)
In this case you can use noqa for quick fixing.

Copy link
Member

@Scitator Scitator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last moment, could you please update the CHANGELOG?

@Scitator Scitator merged commit 4fcdd99 into catalyst-team:master May 23, 2020
@yhn112
Copy link
Contributor Author

yhn112 commented May 23, 2020

I've just started writing CHANGELOG update...

@Scitator
Copy link
Member

nevermind, already done, last preparations before 20.05.1 release :)

lightforever pushed a commit that referenced this pull request May 23, 2020
* Init batch_size changes

* PEP8 (line too long) fixes

* global_samples rename and docs update

* Fix for batches of type `list`

* New renaming and fixes

* Fix for tuple batches

* pep8 line length

* Added tests for AverageValueMeter

* Code style (kind of)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants