
Log training metrics for each epoch #914

@jbschiratti

Currently, I am able to log training metrics to TensorBoard using:

import pytorch_lightning as pl
from pytorch_lightning.loggers import TensorBoardLogger

logger = TensorBoardLogger(save_dir=save_dir, name="my_model")

[...]

trainer = pl.Trainer(logger=logger)

This logs training metrics (loss, for instance) after each batch. I would like to be able to average these metrics across all batches and log them to TensorBoard only once, at the end of each epoch. This is what the validation_end method does in your example: https://github.com/PyTorchLightning/pytorch-lightning/blob/446a1e23d7fe3b2e07f1a5887fe819d0dfa7d4e0/pl_examples/basic_examples/lightning_module_template.py#L145.
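For reference, the reduction that validation_end performs in the linked example can be sketched as follows: each validation_step returns a dict, the trainer collects these dicts into an outputs list, and validation_end averages them into one epoch-level metric. This is a minimal stand-alone sketch of that pattern; plain floats stand in for the torch tensors the real example stacks and means.

```python
# Sketch of the reduction done by validation_end in the linked example.
# The real code calls torch.stack(...).mean(); floats stand in for tensors here.

def validation_end(outputs):
    """Average the per-batch 'val_loss' entries into one epoch-level metric."""
    val_loss_mean = sum(out["val_loss"] for out in outputs) / len(outputs)
    # The returned 'log' dict is what gets forwarded to the logger.
    return {"val_loss": val_loss_mean, "log": {"val_loss": val_loss_mean}}

# `outputs` holds one dict per validation batch, as returned by validation_step.
outputs = [{"val_loss": 0.9}, {"val_loss": 0.7}, {"val_loss": 0.8}]
result = validation_end(outputs)
print(result["val_loss"])  # approximately 0.8
```

The question below is essentially asking for the same reduce-then-log step, but on the training side.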

I first thought about writing my own training_end method, but that method is called after each batch rather than at the end of an epoch (as I would have expected). The on_epoch_end method looks promising, but it does not receive an outputs argument the way training_end does. Basically, in my model, I would like to write something like self.logger.experiment.add_scalar('training_loss', train_loss_mean, global_step=self.current_epoch), but I do not know where to put this line.
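One workaround consistent with the hooks described above is to cache each batch loss on the module in training_step and then average and log the cache in on_epoch_end, since that hook fires once per epoch even though it receives no outputs. The sketch below uses a hypothetical FakeExperiment class as a stand-in for the TensorBoard SummaryWriter reached via self.logger.experiment, and a plain class instead of pl.LightningModule, so it only illustrates the caching pattern, not the real API surface.

```python
# Hypothetical sketch: accumulate batch losses during the epoch, then
# average and log them once in on_epoch_end.

class FakeExperiment:
    """Stand-in for self.logger.experiment (a TensorBoard SummaryWriter)."""
    def __init__(self):
        self.scalars = []

    def add_scalar(self, tag, value, global_step):
        self.scalars.append((tag, value, global_step))

class Module:
    """Plain-class stand-in for a LightningModule, for illustration only."""
    def __init__(self):
        self.experiment = FakeExperiment()
        self.current_epoch = 0
        self._epoch_losses = []          # per-batch cache, cleared each epoch

    def training_step(self, batch_loss):
        # Cache the batch loss so on_epoch_end can average it later.
        self._epoch_losses.append(batch_loss)
        return {"loss": batch_loss}

    def on_epoch_end(self):
        train_loss_mean = sum(self._epoch_losses) / len(self._epoch_losses)
        # In a real LightningModule this would be:
        # self.logger.experiment.add_scalar('training_loss', train_loss_mean,
        #                                   global_step=self.current_epoch)
        self.experiment.add_scalar("training_loss", train_loss_mean,
                                   global_step=self.current_epoch)
        self._epoch_losses = []          # reset the cache for the next epoch
        self.current_epoch += 1

m = Module()
for loss in (1.0, 0.5, 0.75):            # three fake batch losses
    m.training_step(loss)
m.on_epoch_end()
print(m.experiment.scalars)              # [('training_loss', 0.75, 0)]
```

The design point is simply that the state lives on the module itself, which is why on_epoch_end can log an epoch average despite not receiving an outputs argument.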

  • OS: Debian GNU/Linux 9.11 (stretch)
  • Packaging: PIP
  • Version: 0.6.1.dev0

Metadata

Labels: priority: 0 (High priority task), question (Further information is requested)
