
Lightning logger aggregation API deprecation #9145

Closed
edward-io opened this issue Aug 26, 2021 · 11 comments

Assignee: daniellepintz
Labels: deprecation, feature, help wanted, let's do it!, logger, refactor
Milestone: 1.6

edward-io (Contributor) commented Aug 26, 2021

Proposed refactoring or deprecation

Move and/or deprecate aggregation-related code to individual loggers

Motivation

We are auditing the Lightning components and APIs to assess opportunities for improvement:

Revisiting some of the API decisions regarding aggregations in Lightning Logger to simplify the interface and move logic specific to individual loggers away from the base class:

  • Aggregation is currently done along the step dimension. Users may want more control over complex aggregations, such as aggregating metrics over a time window.
  • It's not clear to end users whether they should use agg_and_log_metrics or log_metrics. In LoggerConnector, when log_metrics is called, we call agg_and_log_metrics.
    • Which method(s) should developers override when creating their custom logger?
  • LightningLoggerBase.__init__ accepts two arguments, agg_key_funcs and agg_default_func. Neither is called by any of the subclassed loggers within Lightning; they could be implementation details of the individual loggers instead (see the sketch below).
    • Do we know if users are adding aggregation functionality using update_agg_funcs directly?
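
For reference, here is a rough sketch of the aggregation surface under discussion (signatures abbreviated from the base logger; bodies elided, details may differ by version):

import numpy as np

class LightningLoggerBase:
    def __init__(self, agg_key_funcs=None, agg_default_func=np.mean):
        # agg_key_funcs: per-key aggregation functions
        # agg_default_func: fallback aggregation for every other key (mean)
        ...

    def update_agg_funcs(self, agg_key_funcs=None, agg_default_func=np.mean):
        # swap the aggregation functions after construction
        ...

    def agg_and_log_metrics(self, metrics, step=None):
        # buffer metrics for the current step; written out via log_metrics
        ...

    def log_metrics(self, metrics, step=None):
        # write metrics to the backend; what subclasses actually implement
        ...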

Pitch

  • Simplify LightningLoggerBase.__init__ by removing agg_key_funcs and agg_default_func
  • Move aggregation-related logger code out of base class to simplify the interface
  • Clarify which of log_metrics and agg_and_log_metrics the Lightning trainer calls. Proposal: the trainer should only call log_metrics (see the sketch below)
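
Under this pitch, the base class would shrink to something like the following (a sketch of the direction, not the final API):

class LightningLoggerBase:
    def __init__(self):
        # no agg_key_funcs / agg_default_func anymore
        ...

    def log_metrics(self, metrics, step=None):
        # the single entry point the trainer calls; a logger that wants
        # aggregation buffers data internally here
        ...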

Additional context

Related issues:

#8991

#9004



edward-io added the feature, help wanted, and refactor labels on Aug 26, 2021
ananthsub added the deprecation and logger labels on Aug 26, 2021
ananthsub (Contributor) commented Aug 26, 2021

Thanks for filing this @edward-io !

I believe this is very related to #8991 and #9004

I agree that we can simplify the base interface for logging metrics. At the very least I see:

  • log_metrics -> can internally aggregate/buffer data before logging
  • teardown -> flushes any buffered data and closes the experiment

Do we need a flush interface at all on the base logger in this case?
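
A minimal sketch of that two-method surface (a hypothetical BufferedLogger; the print is a stand-in for a real experiment backend):

class BufferedLogger:
    def __init__(self):
        self._buffer = []

    def log_metrics(self, metrics, step=None):
        # internally aggregate/buffer data before logging
        self._buffer.append((step, dict(metrics)))

    def teardown(self):
        # flush any buffered data and close the experiment
        for step, metrics in self._buffer:
            print(f"step={step}: {metrics}")  # stand-in for a backend write
        self._buffer.clear()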

@awaelchli @tchaton @carmocca ?

awaelchli (Member) commented

> Move aggregation-related logger code out of base class to simplify the interface

Where would you move it?

> Clarify which of log_metrics and agg_and_log_metrics the Lightning trainer calls. Proposal: the trainer should only call log_metrics

Sounds good, as long as we keep exactly the same logging behavior.

> Do we need a flush interface at all on the base logger in this case?

What we know as "flush" in the logger connector is just the save() function in the loggers.

tchaton added the "let's do it!" label on Sep 13, 2021
stale bot commented Oct 15, 2021

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, PyTorch Lightning Team!

stale bot added the "won't fix" label on Oct 15, 2021
stale bot closed this as completed on Oct 30, 2021
awaelchli reopened this on Oct 30, 2021
stale bot removed the "won't fix" label on Oct 30, 2021
awaelchli added this to "To be approved" in Lighting Sprint on Oct 30, 2021
tchaton added this to "To do" in Lightning Sprint on Nov 15, 2021
tchaton moved this from "To do" to "To be approved" in Lightning Sprint on Nov 15, 2021
tchaton moved this from "To be approved" to "To do" in Lightning Sprint on Nov 15, 2021
daniellepintz self-assigned this on Jan 5, 2022
wilson100hong commented

We use some metrics as counters that keep accumulating newly logged values. Without agg_and_log_metrics(), users might need an API to read metric values back from the logger (like the hypothetical get_metric() in the example below) to do that?

new_value = logger.get_metric(key) + value  # we don't have get_metric() now
logger.set_metrics({key: new_value})

daniellepintz (Contributor) commented

@wilson100hong can you put your custom logic into log_metrics directly?
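
For example, something along these lines (a sketch with a hypothetical CounterLogger wrapper; only the accumulation logic is the point):

class CounterLogger:
    def __init__(self, wrapped_logger):
        self._wrapped = wrapped_logger
        self._totals = {}

    def log_metrics(self, metrics, step=None):
        # keep a running total per key; no agg_and_log_metrics() needed
        for key, value in metrics.items():
            self._totals[key] = self._totals.get(key, 0.0) + value
        self._wrapped.log_metrics(dict(self._totals), step=step)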

daniellepintz (Contributor) commented Jan 31, 2022

I did a test in https://colab.research.google.com/drive/1SSOOmeJJf8eU4HE4UkPgd2GbtpfRPQa_?usp=sharing and it seems to me that the aggregation logic in agg_and_log_metrics currently has no effect. In the colab I am logging the value 2.0 on even batch_indices and 3.0 on odd batch_indices, with accumulate_grad_batches=2, but instead of averaging the values, the TB logger just outputs 3.0 on every step.

Thus I believe we can move forward with the deprecation of agg_and_log_metrics, and can immediately replace this line:
https://github.com/PyTorchLightning/pytorch-lightning/blob/86b177ebe5427725b35fde1a8808a7b59b8a277a/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py#L119

with self.trainer.logger.log_metrics(scalar_metrics, step=step)

awaelchli (Member) commented Feb 14, 2022

> See in the colab I am logging the value 2.0 on even steps and 3.0 on odd steps, with accumulate_grad_batches=2, but instead of averaging the values, in the TB logger we are just outputting 3.0 on every step.

Why would they be averaged? The example only makes sense when the keys you log are different.
The aggregation is pointless if we log the same key. Aggregation is meant for the following situation:

agg_and_log_metrics({"key1": 1.0})  # buffers key1, does not write out the logs
agg_and_log_metrics({"key2": 2.0})  # buffers key2, does not write out the logs
agg_and_log_metrics({"key3": 3.0})  # buffers key3, does not write out the logs

# save() makes the actual call to self.log_metrics and writes a single
# dict with all buffered keys merged: {"key1": 1.0, "key2": 2.0, "key3": 3.0}
save()

Let's please not remove this feature based on a misunderstanding.
Instead, we should check whether it is redundant today. After all, the logger connector was added and revamped after the logger base was initially designed, so the logger connector may already cover the role of aggregation today. cc @carmocca

daniellepintz (Contributor) commented

@awaelchli I am confused by your last comment :/ I was under the impression that aggregation was done per key?
See the docstring here: https://github.com/PyTorchLightning/pytorch-lightning/blob/f79d75d4b5e50eb3e37073606eba5ca10d48c99a/pytorch_lightning/loggers/base.py#L409-L415

daniellepintz (Contributor) commented

> Why would they be averaged?

They would be averaged because that is the default aggregation function: https://github.com/PyTorchLightning/pytorch-lightning/blob/f79d75d4b5e50eb3e37073606eba5ca10d48c99a/pytorch_lightning/loggers/base.py#L66

Also, in my previous comment where I said "I am logging the value 2.0 on even steps and 3.0 on odd steps", by "step" I meant batch_idx. Sorry if that added confusion; I have updated the comment.
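
Concretely, with the default mean aggregation the expectation would be per-key averaging across the accumulated batches, along these lines (an illustrative snippet, not Lightning's internal code):

import numpy as np

# two batches logged under the same key while accumulate_grad_batches=2
batches = [{"loss": 2.0}, {"loss": 3.0}]
aggregated = {key: np.mean([b[key] for b in batches]) for key in batches[0]}
print(aggregated["loss"])  # 2.5, the averaged value the colab test expected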

carmocca (Contributor) commented Feb 16, 2022

On these topics, it helps to dig up the history: #1173 (issue), #1278 (PR)

I think this can be removed as "logging" manages key collisions already.

ananthsub added this to the 1.6 milestone on Feb 16, 2022
daniellepintz (Contributor) commented

Closing this issue as all PRs have landed!
