Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

WandB batch metrics logging error #1290

Closed
ivan-chai opened this issue Sep 10, 2021 · 20 comments
Closed

WandB batch metrics logging error #1290

ivan-chai opened this issue Sep 10, 2021 · 20 comments
Assignees
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

@ivan-chai
Copy link

馃悰 Bug Report

In wandb all batch metrics are logged as single value per epoch.

Expected behavior

Batch metrics must be logged once per step.

Catalyst version: 21.7

Additional context

The problem is here:

https://github.com/catalyst-team/catalyst/blob/master/catalyst/loggers/wandb.py#L115

Step must be equal to global_sample_step, not global_epoch_step.

@ivan-chai ivan-chai added bug Something isn't working help wanted Extra attention is needed labels Sep 10, 2021
@github-actions
Copy link

Hi! Thank you for your contribution! Please re-check all issue template checklists - unfilled issues would be closed automatically. And do not forget to join our slack for collaboration.

@Scitator
Copy link
Member

maybe, @AyushExel could help 馃憖

@AyushExel
Copy link
Contributor

  • @Scitator I'm out on a vacation. I'll have a look at this on monday. But it sounds like we might not have a good solution for this as W&B only has one global step.

@AyushExel
Copy link
Contributor

@ivan-chai I remember working on this. the reason that the batch metrics are also logged on epoch level is that W&B supports 1 global step and you cannot edit or log at previous steps, so I decided to log everything on epoch level. There are other methods that will end up dropping data if any previous step is encountered.

@ivan-chai
Copy link
Author

Thank you for your response! May be it is better to always log sample_step?

@Scitator
Copy link
Member

yeah, I think, we could just use global_sample_step as a wandb step for all .log

@AyushExel
Copy link
Contributor

@Scitator @ivan-chai Is the global_sample_step strictly increaseing? If it is, then yes we can use that

@Scitator
Copy link
Member

Yes ;)
let's use it

@Scitator
Copy link
Member

@AyushExel could you please drop a PR with such a small hotfix?
we are going to release 21.09 version soon ;)

@AyushExel
Copy link
Contributor

@Scitator sounds good. I'll do it within 2 days

@AyushExel
Copy link
Contributor

@Scitator I'm working on it now. I noticed that the logger API now supports artifacts as well. I'll add that too. I just need a bit of clarification on the usage.

    def log_artifact(
        self,
        tag: str,
        artifact: object = None,
        path_to_artifact: str = None,

here, artifact: object is an identifier for the artifact right? Or it can be an in-memory object?
I saw this example usage here

@Scitator
Copy link
Member

yup, looks like so

artifact: object = None,  - in-memory object
path_to_artifact: str = None, - on-disk object

here is another example -

if artifact is not None and path_to_artifact is not None:

@Scitator
Copy link
Member

btw, @ditwoo why do we use the profiler in such a way? 馃槀

@AyushExel
Copy link
Contributor

@Scitator in the 2nd example that you linked, it'll raise exception if both artifact and oath_to_artifact is not None. But in the profiler example, both of them are not None so it'll raise an exception. Is this an intended use case?
More specifically, how'd you preferred handling the cases when both of the arguments are set?

@Scitator
Copy link
Member

@AyushExel yes, and I think it's correct behavior.
The profiler case looks a bit strange in such a case and I think, it's a little bug 馃槄
@ditwoo could you please help with the profiler initial logic?

@AyushExel
Copy link
Contributor

Okay, thanks for the clarification. I'll update the logger with the intended artifacts use case.

@AyushExel
Copy link
Contributor

@Scitator I have fixed this and added artifacts support. Can you provide an example training script using catalyst trainer which log artifacts? I want to check all the use cases before submitting the PR. Sorry, I couldn't find something relevant in the quickstart section of docs.

@Scitator
Copy link
Member

btw, we are going to release another version tomorrow, so, the fix would truly welcome ;)

@AyushExel
Copy link
Contributor

Just running tests. PR coming soon

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

5 participants