Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Group tensorboard metrics #39

Merged
merged 12 commits into from
Aug 5, 2021
Merged

Group tensorboard metrics #39

merged 12 commits into from
Aug 5, 2021

Conversation

VictorSanh
Copy link
Member

@VictorSanh VictorSanh commented Aug 4, 2021

As discussed in issue #38:

  • I grouped the metrics into a few groups
  • Added speed metrics (under iteration-time)
  • Added the mapping iteration (or step) vs samples

A preview of the tensorboard on a dummy training can be found here.

Copy link
Member

@stas00 stas00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for implementing this, @VictorSanh

I proposed to use f-strings consistently.

And also a few small questions.

megatron/global_vars.py Outdated Show resolved Hide resolved
megatron/training.py Outdated Show resolved Hide resolved
megatron/training.py Outdated Show resolved Hide resolved
megatron/training.py Outdated Show resolved Hide resolved
megatron/training.py Outdated Show resolved Hide resolved
megatron/training.py Outdated Show resolved Hide resolved
VictorSanh and others added 5 commits August 4, 2021 18:47
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
@stas00
Copy link
Member

stas00 commented Aug 4, 2021

btw, if you go to "Files Changed" tab, you can merge multiple suggestions in one click, so you then need to click half as much ;)

@VictorSanh
Copy link
Member Author

btw, if you go to "Files Changed" tab, you can merge multiple suggestions in one click, so you then need to click half as much ;)

ooooh, wasn't aware of that, learned something!

@VictorSanh
Copy link
Member Author

merging now, thank you @stas00 for the review!

@VictorSanh VictorSanh merged commit 9e75429 into main Aug 5, 2021
@stas00
Copy link
Member

stas00 commented Aug 5, 2021

BTW, that's not what I meant. Perhaps let's give a bit of time to have a chance to review changes in the future before merging.

# bug fix in xyz comments are great inside tests since it's then self-explanatory of what the test is testing.

Such comments in the code have no added value and only make the code harder to read. All of software is ridden with bugs, the normal (as compared to tests) code comments are useful when they explain why something is done when it's not obvious from reading the code.

So when I made that comment I was proposing to add a comment why it was is_last_rank(), if it wasn't obvious from the code, which it is not, which you explained in your Issue report to Megatron. This is what we want in the code. Currently the way it's committed the user has to go and research why. Whereas the only comment needed there was:

# only the last rank process has a non-None _GLOBAL_TENSORBOARD_WRITER

Hope my logic makes sense.

@VictorSanh
Copy link
Member Author

Thanks for flagging that @stas00, always good to write better comments!
I fixed that with your suggestion (5e3963d).

I initially merged since you approved the changes. Usually, I use "Request changes" when things are bugging me and "Approved" as a final signal that the PR can be merged when things have been fixed. Does this make any sense to you?

@stas00
Copy link
Member

stas00 commented Aug 6, 2021

Thank you for fixing the comment, Victor

This is the workflow pattern we use at transformers, a general Approval is made when things are more or less correct and then the nits can take some time to sort out. This always works when the PR writer is not a committer too, since they can't merge. When it's between us the committers we usually check in again when we feel it's ready to be merged.

We can try a different pattern here, and not to Approve until it's ready to be merged. I think this is a less ambiguous communication style. Over at transformers an early Approval I think helps the contributor to know they are "out of the woods" and the finish line is near.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants