on_validation_epoch_end() invocation order #17131
Comments
For context, this is a limitation of the hook system. We had the same problem for the built-in callback hooks that monitor a metric (…). The issue above describes the same problem, but for validation. The problem to address can be boiled down to this snippet:

```python
callback_metrics = {}


class LightningModule:
    def on_validation_epoch_end(self):
        # writes the value
        callback_metrics["val_loss"] = 123


class Callback:
    def on_validation_epoch_end(self):
        # reads the value
        # raises KeyError
        print(callback_metrics["val_loss"])


class Trainer:
    def fit(self):
        ...
        # before 2.0, this was run here
        # pl_module.validation_epoch_end()
        # the callback hook is called first
        callback.on_validation_epoch_end()
        pl_module.on_validation_epoch_end()


pl_module = LightningModule()
callback = Callback()
trainer = Trainer()
trainer.fit()
```

I can think of various solutions (not necessarily good):
Hi. I am affected by this same issue now that I am trying to update to Lightning 2.0. For me, option a) makes the most sense, because option b) feels a bit like cluttering the code. Option c) is nice but would take more time to figure out how to do correctly. Can we get the old behavior back, which in my opinion makes more sense? Thanks.
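Getting the old behavior back essentially means running the module's epoch-end logic before the callbacks' hooks. A minimal sketch of that ordering, using a self-contained toy version of the snippet above. These classes are stand-ins, not Lightning's real `Trainer`, `LightningModule` or `Callback`, and the `Trainer(pl_module, callbacks)` constructor and the `seen` attribute are hypothetical, added only for illustration:

```python
# Toy stand-ins for the snippet above; NOT Lightning's real classes.
callback_metrics = {}


class LightningModule:
    def on_validation_epoch_end(self):
        # The module writes the aggregated metric.
        callback_metrics["val_loss"] = 123


class Callback:
    def __init__(self):
        self.seen = None  # hypothetical: stores what the callback read

    def on_validation_epoch_end(self):
        # The callback reads the metric; this only works if the module ran first.
        self.seen = callback_metrics["val_loss"]


class Trainer:
    def __init__(self, pl_module, callbacks):
        # Hypothetical constructor: the toy trainer holds the module and callbacks.
        self.pl_module = pl_module
        self.callbacks = callbacks

    def fit(self):
        # Pre-2.0-like order: module epoch-end logic first, then callback hooks.
        self.pl_module.on_validation_epoch_end()
        for cb in self.callbacks:
            cb.on_validation_epoch_end()


callback = Callback()
Trainer(LightningModule(), [callback]).fit()
print(callback.seen)  # 123
```

Swapping the two steps in `fit()` (callbacks first, then the module) reproduces the `KeyError` from the original snippet.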
Hi, I'm also affected by this. Is there a workaround?
Bug description
In v1.9, a model's `validation_epoch_end()` gets called before a Callback's `on_validation_epoch_end()`, but in v2.0, a model's `on_validation_epoch_end()` gets called after a Callback's `on_validation_epoch_end()`.

In my use case, which worked under v1.9, there is a Callback that implements `on_validation_epoch_end()`, where it reads the validation metric from `trainer.logged_metrics`, which has already been updated with the validation metric by the model's `validation_epoch_end()`. It then checks whether there has been an improvement and logs that. In v2.0, this approach no longer works because the invocation order has changed.

One workaround is to do all this in the model itself and skip the Callback. However, I prefer to do the improvement checking and logging in a Callback instead of in the model, because the necessary member variables would be useless if the model is not used for training (e.g. used only for testing or inference). Also, using a callback is a cleaner and more modular approach.
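Until the order changes, one possible workaround for this use case is to read the metric in a hook that runs after both epoch-end hooks. Lightning callbacks do have an `on_validation_end(trainer, pl_module)` hook that fires after `on_validation_epoch_end`; as far as we can tell, the built-in `EarlyStopping` and `ModelCheckpoint` callbacks also run their metric checks there. The sketch below only simulates this with toy classes: `Trainer.run_validation`, `Model` and `ImprovementTracker` are hypothetical and not Lightning's API, and a plain `logged_metrics` dict stands in for `self.log(...)`:

```python
# Toy simulation of the v2.0-style order; NOT Lightning's implementation.
class Trainer:
    def __init__(self, pl_module, callbacks):
        self.pl_module = pl_module
        self.callbacks = callbacks
        self.logged_metrics = {}

    def run_validation(self):
        # v2.0-like order: callbacks' epoch-end hook runs before the module's.
        for cb in self.callbacks:
            cb.on_validation_epoch_end(self, self.pl_module)
        self.pl_module.on_validation_epoch_end(self)
        # Later hook: runs after both epoch-end hooks have finished.
        for cb in self.callbacks:
            cb.on_validation_end(self, self.pl_module)


class Model:
    def __init__(self, losses):
        self._losses = iter(losses)

    def on_validation_epoch_end(self, trainer):
        # Stand-in for self.log("val_loss", ...) in a real LightningModule.
        trainer.logged_metrics["val_loss"] = next(self._losses)


class ImprovementTracker:
    """Hypothetical callback that records whether the monitored metric improved."""

    def __init__(self, monitor="val_loss"):
        self.monitor = monitor
        self.best = float("inf")
        self.history = []

    def on_validation_epoch_end(self, trainer, pl_module):
        pass  # too early under the v2.0 order: this epoch's metric is not there yet

    def on_validation_end(self, trainer, pl_module):
        current = trainer.logged_metrics[self.monitor]
        improved = current < self.best
        if improved:
            self.best = current
        self.history.append(improved)


tracker = ImprovementTracker()
trainer = Trainer(Model([0.9, 0.7, 0.8]), [tracker])
for _ in range(3):
    trainer.run_validation()
print(tracker.history)  # [True, True, False]
print(tracker.best)  # 0.7
```

In this toy model, reading the metric in the callback's `on_validation_epoch_end` would raise `KeyError` only on the first run; on later runs it would silently see the previous run's value, which is arguably worse than an error.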
For my use case, it may be helpful to be able to express the priorities of the callbacks relative to one another and to the model. There may also be other, simpler solutions.
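Expressing priorities could look roughly like the following. This is purely hypothetical: Lightning has no `priority` attribute, `HookParticipant` class or `dispatch` function, and the convention that lower numbers run first is an assumption. The model's hook is treated as just another participant so that it can be ordered relative to the callbacks:

```python
# Hypothetical priority-ordered hook dispatch; not part of Lightning's API.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class HookParticipant:
    name: str
    priority: int  # assumed convention: lower values run first
    hook: Callable[[dict], None]


def dispatch(participants: List[HookParticipant], metrics: dict) -> List[str]:
    """Call each participant's hook in ascending priority and return the call order."""
    order = []
    # sorted() is stable, so equal priorities keep their registration order.
    for participant in sorted(participants, key=lambda item: item.priority):
        participant.hook(metrics)
        order.append(participant.name)
    return order


def model_hook(metrics):
    # Stand-in for the model logging its validation metric.
    metrics["val_loss"] = 0.5


def tracker_hook(metrics):
    # Stand-in for an improvement-checking callback reading that metric.
    metrics["improved"] = metrics["val_loss"] < 1.0


def printer_hook(metrics):
    # A second callback sharing the tracker's priority.
    print(sorted(metrics))


metrics = {}
order = dispatch(
    [
        HookParticipant("tracker", priority=10, hook=tracker_hook),
        HookParticipant("model", priority=0, hook=model_hook),
        HookParticipant("printer", priority=10, hook=printer_hook),
    ],
    metrics,
)
print(order)  # ['model', 'tracker', 'printer']
```

Because the sort is stable, participants that share a priority keep the order in which they were registered, so adding priorities only changes the order where someone explicitly asks for it.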
cc @Borda @tchaton @justusschock @awaelchli @carmocca