
No in-built functionality for tracking of metrics during training #222

Open
cle-ros opened this issue Jan 29, 2020 · 3 comments
Labels
enhancement New feature or request

Comments

@cle-ros
Collaborator

cle-ros commented Jan 29, 2020

This is a feature request bordering on a bug. Right now, flambe does not allow tracking metrics during training. This, however, is essential for monitoring learning.

One problem I see is that it does not make sense to compute the training metrics after an entire training epoch, the way flambe does for test/eval metrics. Given the size of some datasets, this is not really feasible.

Consequently, the metric interface needs to accommodate incremental computation of the metrics. That, in turn, requires a decision on how this should be implemented, partly because not every metric supports incremental computation (think: AUC).
Unfortunately, incremental computation requires keeping track of previous computations - i.e., we need a state that we update incrementally.

Off the top of my head, these are the choices we have:

First option: make the metrics stateful.

  • The metrics would then have to be "reset" at the beginning of each epoch
  • An incremental method, added to the metric, could be used to update the metric
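
For illustration, a minimal sketch of what such a stateful metric could look like (class and method names are hypothetical, not existing flambe API):

import torch

class StatefulAccuracy:
    """Hypothetical stateful metric: accumulates correct/total across batches."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        # Called by the trainer at the beginning of each epoch
        self.correct = 0
        self.total = 0

    def update(self, preds: torch.Tensor, targets: torch.Tensor) -> None:
        # Incremental call, once per batch
        self.correct += (preds.argmax(dim=-1) == targets).sum().item()
        self.total += targets.size(0)

    def finalize(self) -> float:
        # Epoch-level value, e.g. to pass to the log function
        return self.correct / max(self.total, 1)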

Second option: add a metric-state object.

  • Flambe initializes a metric-state object at the beginning of each epoch.
  • This metric-state object is passed into each incremental call of the metric (and possibly into every other metric call as well, to keep the interface uniform)
  • Logging can happen automatically in a method of that state-object
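
A sketch of how the metric-state variant could look (again hypothetical names, not existing flambe API):

class MetricState:
    """Hypothetical per-epoch state container, created by flambe at epoch start."""

    def __init__(self) -> None:
        self.values: dict = {}  # metric name -> accumulated state / final value

    def log_all(self, log_fn, step: int) -> None:
        # Logging can happen automatically from the state object after finalize
        for name, value in self.values.items():
            log_fn(name, value, step)


class Accuracy:
    """The metric itself stays stateless; accumulation lives in MetricState."""

    def update(self, state: MetricState, preds, targets) -> None:
        correct, total = state.values.get('accuracy', (0, 0))
        correct += (preds.argmax(dim=-1) == targets).sum().item()
        total += targets.size(0)
        state.values['accuracy'] = (correct, total)

    def finalize(self, state: MetricState) -> None:
        correct, total = state.values['accuracy']
        state.values['accuracy'] = correct / max(total, 1)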

Third option: add local tracking for each metric
(I don't think this is a good option, but I wanted to mention it for completeness)

  • This works like the metric state object, but with individual state objects per metric.
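
For completeness, the per-metric variant could look roughly like this (hypothetical names):

class AccuracyState:
    """Hypothetical state object owned by a single metric (option three)."""

    def __init__(self) -> None:
        self.correct = 0
        self.total = 0


class Accuracy:
    def make_state(self) -> AccuracyState:
        # The trainer would call this at the start of each epoch
        return AccuracyState()

    def update(self, state: AccuracyState, preds, targets) -> None:
        state.correct += (preds.argmax(dim=-1) == targets).sum().item()
        state.total += targets.size(0)

    def finalize(self, state: AccuracyState) -> float:
        return state.correct / max(state.total, 1)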
@cle-ros cle-ros added the enhancement New feature or request label Jan 29, 2020
@cle-ros cle-ros changed the title No in-build functionality for tracking of metrics during training No in-built functionality for tracking of metrics during training Jan 29, 2020
@jeremyasapp
Contributor

We could also consider just computing the metrics on a per-batch level during training and logging that, but then things like dropout will affect the training metrics. That's true in your proposed solutions as well, unless you're thinking of doing this during the eval step?

@cle-ros
Collaborator Author

cle-ros commented Jan 29, 2020

The problem with the per-batch level is things like AUC. If we are using the batch as negatives (as is quite common), computing the AUC per batch will be much less accurate than computing it per epoch (and using all samples from an epoch as negatives).
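
As a rough toy illustration (plain numpy/scikit-learn, not flambe code, and not the in-batch-negatives setting; it only shows that the batch-level estimate is noisier and need not match the pooled one):

import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
targets = rng.integers(0, 2, size=512)
scores = 0.3 * targets + rng.normal(size=512)  # weakly informative synthetic scores

# Epoch-level AUC: all samples pooled
epoch_auc = roc_auc_score(targets, scores)

# Batch-level AUC, averaged over batches of 32
batch_aucs = []
for i in range(0, 512, 32):
    t, s = targets[i:i + 32], scores[i:i + 32]
    if t.min() != t.max():  # AUC is undefined if a batch contains a single class
        batch_aucs.append(roc_auc_score(t, s))

print(epoch_auc, np.mean(batch_aucs))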

Besides, either approach would allow us to unify this (taken from _eval_step in train.py):

log(f'{tb_prefix}Validation/Loss', val_loss, self._step)
log(f'{tb_prefix}Validation/{self.metric_fn}', val_metric, self._step)
log(f'{tb_prefix}Best/{self.metric_fn}', self._best_metric, self._step)  # type: ignore
for metric_name, metric in self.extra_validation_metrics.items():
    log(f'{tb_prefix}Validation/{metric_name}',
        metric(preds, targets).item(), self._step)  # type: ignore

With either

for metric in self.metrics:
    metric.finalize()
    metric.log(log_func)  # log_func could be any log function, defaulting to the one above

Or

for metric in self.metrics:
    metric.finalize(metrics_state)
    metric.log(log_func, metrics_state)

That has the additional advantage of natively supporting the logging of more complex metrics. Imagine, e.g., a combined recall-precision-fscore metric that could jointly log all three. Or one that computes a conditional metric if, say, you have different types of samples. Then it could log things like "accuracy for type1: ..." and "accuracy for type2: ...".
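
For example, such a combined metric could be sketched like this (hypothetical names, assuming binary classification with torch tensors):

class PrecisionRecallF1:
    """Hypothetical combined metric that logs several values at once."""

    def __init__(self) -> None:
        self.tp = self.fp = self.fn = 0

    def update(self, preds, targets) -> None:
        labels = preds.argmax(dim=-1)
        self.tp += ((labels == 1) & (targets == 1)).sum().item()
        self.fp += ((labels == 1) & (targets == 0)).sum().item()
        self.fn += ((labels == 0) & (targets == 1)).sum().item()

    def log(self, log_fn, step: int) -> None:
        precision = self.tp / max(self.tp + self.fp, 1)
        recall = self.tp / max(self.tp + self.fn, 1)
        f1 = 2 * precision * recall / max(precision + recall, 1e-8)
        log_fn('Validation/Precision', precision, step)
        log_fn('Validation/Recall', recall, step)
        log_fn('Validation/F1', f1, step)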

@jeremyasapp
Contributor

What do you propose to do with the training and validation loss over the whole dataset, since people generally use torch loss objects, which won't have the "incremental" logic?
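
(For illustration only, one hypothetical way a loss could be given that incremental interface; only torch.nn.CrossEntropyLoss is existing API here. Using reduction='sum' lets per-batch sums be pooled exactly over the epoch.)

import torch

class RunningLoss:
    """Hypothetical wrapper giving a torch loss an incremental interface."""

    def __init__(self, loss_fn=None) -> None:
        self.loss_fn = loss_fn or torch.nn.CrossEntropyLoss(reduction='sum')
        self.reset()

    def reset(self) -> None:
        self.total = 0.0
        self.count = 0

    def update(self, preds: torch.Tensor, targets: torch.Tensor) -> None:
        self.total += self.loss_fn(preds, targets).item()
        self.count += targets.size(0)

    def finalize(self) -> float:
        return self.total / max(self.count, 1)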
