Remove memory-retaining epoch-end hooks #16520

carmocca · 2023-01-26T18:01:03Z

Migration guide

training_epoch_end -> on_train_epoch_end

 class MyLightningModule(L.LightningModule):
+    def __init__(self):
+        super().__init__()
+        self.training_step_outputs = []

     def training_step(self, ...):
         loss = ...
+        self.training_step_outputs.append(loss)
         return loss

-    def training_epoch_end(self, outputs):
-        epoch_average = torch.stack([output["loss"] for output in outputs]).mean()
+    def on_train_epoch_end(self):
+        epoch_average = torch.stack(self.training_step_outputs).mean()
         self.log("training_epoch_average", epoch_average)
+        self.training_step_outputs.clear()  # free memory

The same suggestions apply to those implementing Callback.training_epoch_end.
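The final clear() call in the diff above matters for correctness as well as memory: the list is an attribute of the module, so values appended in earlier epochs would otherwise still be present when a later epoch is averaged. Below is a minimal, framework-free sketch of the accumulate-then-clear pattern. It uses plain floats in place of loss tensors, and the EpochAverager class, its methods, and the run helper are hypothetical names used only for illustration; they are not part of Lightning.

```python
from statistics import fmean


class EpochAverager:
    """Hypothetical stand-in for a module that stores its own step outputs."""

    def __init__(self, clear_after_epoch: bool = True):
        self.step_outputs = []
        self.clear_after_epoch = clear_after_epoch

    def step(self, loss: float) -> float:
        # Mirrors `self.training_step_outputs.append(loss)` in the guide.
        self.step_outputs.append(loss)
        return loss

    def on_epoch_end(self) -> float:
        epoch_average = fmean(self.step_outputs)
        if self.clear_after_epoch:
            self.step_outputs.clear()  # free memory and reset for the next epoch
        return epoch_average


def run(averager: EpochAverager, epochs):
    """Feed each epoch's losses to the averager and collect the epoch averages."""
    averages = []
    for losses in epochs:
        for loss in losses:
            averager.step(loss)
        averages.append(averager.on_epoch_end())
    return averages


epochs = [[4.0, 2.0], [1.0, 1.0]]
print(run(EpochAverager(clear_after_epoch=True), epochs))   # [3.0, 1.0]
print(run(EpochAverager(clear_after_epoch=False), epochs))  # [3.0, 2.0]
```

Without the clear, the second "epoch average" is really the average over both epochs, and the list keeps growing for the whole run.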

validation_epoch_end -> on_validation_epoch_end

 class MyLightningModule(L.LightningModule):
+    def __init__(self):
+        super().__init__()
+        self.validation_step_outputs = []

     def validation_step(self, ...):
         loss = ...
+        self.validation_step_outputs.append(loss)
         return loss

-    def validation_epoch_end(self, outputs):
-        epoch_average = torch.stack(outputs).mean()
+    def on_validation_epoch_end(self):
+        epoch_average = torch.stack(self.validation_step_outputs).mean()
         self.log("validation_epoch_average", epoch_average)
+        self.validation_step_outputs.clear()  # free memory

The same suggestions apply to those implementing Callback.validation_epoch_end.

test_epoch_end -> on_test_epoch_end

 class MyLightningModule(L.LightningModule):
+    def __init__(self):
+        super().__init__()
+        self.test_step_outputs = []

     def test_step(self, ...):
         loss = ...
+        self.test_step_outputs.append(loss)
         return loss

-    def test_epoch_end(self, outputs):
-        epoch_average = torch.stack(outputs).mean()
+    def on_test_epoch_end(self):
+        epoch_average = torch.stack(self.test_step_outputs).mean()
         self.log("test_epoch_average", epoch_average)
+        self.test_step_outputs.clear()  # free memory

The same suggestions apply to those implementing Callback.test_epoch_end.

Example with two DataLoaders

 class MyLightningModule(L.LightningModule):
+    def __init__(self):
+        super().__init__()
+        self.test_step_outputs = [[], []]  # two dataloaders

     def test_step(self, batch, batch_idx, dataloader_idx=0):
         loss = ...
+        self.test_step_outputs[dataloader_idx].append(loss)
         return loss

-    def test_epoch_end(self, outputs):
+    def on_test_epoch_end(self):
-        for dl_idx in range(len(outputs)):
+        for dl_idx in range(len(self.test_step_outputs)):
-            dataloader_epoch_average = torch.stack(outputs[dl_idx]).mean()
+            dataloader_epoch_average = torch.stack(self.test_step_outputs[dl_idx]).mean()
             self.log(f"test_epoch_average_dl_{dl_idx}", dataloader_epoch_average)
-            outputs[dl_idx].clear()
+            self.test_step_outputs[dl_idx].clear()

     def test_dataloader(self):
         dl1 = DataLoader(RandomDataset(32, 64), batch_size=2)
         dl2 = DataLoader(RandomDataset(32, 64), batch_size=2)
         return dl1, dl2

Example with strategy="dp" (DataParallel)

 class MyLightningModule(L.LightningModule):
+    def __init__(self):
+        super().__init__()
+        self.training_step_outputs = []
+        self.validation_step_outputs = []

     def training_step(self, batch, batch_idx):
         output = ...
         return output
 
     def validation_step(self, batch, batch_idx):
         output = ...
         return output

+    def training_step_end(self, training_step_output):
+        training_step_output = self.trainer.strategy.reduce(training_step_output)
+        self.training_step_outputs.append(training_step_output)
+        return training_step_output

+    def validation_step_end(self, validation_step_output):
+        self.validation_step_outputs.append(validation_step_output)
 
-    def training_epoch_end(self, outputs):
-        epoch_average = torch.stack([output["loss"] for output in outputs]).mean()
+    def on_train_epoch_end(self):
+        epoch_average = torch.stack(self.training_step_outputs).mean()
         self.log("training_epoch_average", epoch_average)
+        self.training_step_outputs.clear()  # free memory
 
-    def validation_epoch_end(self, outputs):
-        epoch_average = torch.stack(outputs).mean()
+    def on_validation_epoch_end(self):
+        epoch_average = torch.stack(self.validation_step_outputs).mean()
         self.log("validation_epoch_average", epoch_average)
+        self.validation_step_outputs.clear()  # free memory

If you have questions about how to migrate your use case, you can ask in this PR.

What does this PR do?

Removes the training_epoch_end, validation_epoch_end, and test_epoch_end hooks in favor of on_train_epoch_end, on_validation_epoch_end, and on_test_epoch_end.

These hooks had become problematic: simply implementing one made the loop keep every step output in memory until the end of the epoch, which could cause memory issues for users who were unaware of this behavior.
They also increased the loops' complexity and were hard to hack or customize externally.

At runtime, we check whether the old hooks are overridden and, if they are, fail with an error message that points to the migration guide above.
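As a rough illustration of the runtime check described above, here is a minimal sketch in plain Python. It is not the code from this PR: BaseModule, REMOVED_HOOKS, and check_removed_hooks are hypothetical names, and the sketch assumes the base class no longer defines the old hooks, so finding a callable attribute with an old name means the user implemented it. The NotImplementedError type matches the error reported in the downstream issues that reference this PR.

```python
class BaseModule:
    """Hypothetical base class that no longer defines the removed hooks."""


# Old hook name -> replacement hook name (taken from the migration guide).
REMOVED_HOOKS = {
    "training_epoch_end": "on_train_epoch_end",
    "validation_epoch_end": "on_validation_epoch_end",
    "test_epoch_end": "on_test_epoch_end",
}


def check_removed_hooks(module: BaseModule) -> None:
    """Raise if the user's class still implements one of the removed hooks."""
    for old_name, new_name in REMOVED_HOOKS.items():
        # The base class has no such attribute, so any callable found here
        # must have been defined by the user.
        if callable(getattr(module, old_name, None)):
            raise NotImplementedError(
                f"Support for '{old_name}' has been removed. "
                f"Implement '{new_name}' instead; see the migration guide."
            )


class OldStyle(BaseModule):
    def training_epoch_end(self, outputs):
        pass


class NewStyle(BaseModule):
    def on_train_epoch_end(self):
        pass


check_removed_hooks(NewStyle())  # passes silently

try:
    check_removed_hooks(OldStyle())
    message = ""
except NotImplementedError as err:
    message = str(err)
print(message)
```

Failing early, before any training step runs, avoids the situation where an old hook is silently never called after an upgrade.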

Blocked by #16567

Fixes #8731
Closes #9380
Closes #9968
Closes #10878
Closes #11554

Follow-up things to address:
#8479: need to remove outputs from on_predict_epoch_end

Does your PR introduce any breaking changes? If yes, please list them.

Removes the hooks described above.

cc @Borda @justusschock @carmocca @awaelchli

github-actions · 2023-01-30T17:56:31Z

⚡ Required checks status: All passing 🟢

Groups summary

🟢 pytorch_lightning: Tests workflow

Check ID	Status
pl-cpu (macOS-11, lightning, 3.8, 1.11)	success	✅
pl-cpu (macOS-11, lightning, 3.9, 1.12)	success	✅
pl-cpu (macOS-11, lightning, 3.10, 1.13)	success	✅
pl-cpu (macOS-11, lightning, 3.8, 1.11, oldest)	success	✅
pl-cpu (ubuntu-20.04, lightning, 3.9, 1.11)	success	✅
pl-cpu (ubuntu-20.04, lightning, 3.10, 1.12)	success	✅
pl-cpu (ubuntu-20.04, lightning, 3.10, 1.13)	success	✅
pl-cpu (ubuntu-20.04, lightning, 3.8, 1.11, oldest)	success	✅
pl-cpu (windows-2022, lightning, 3.9, 1.11)	success	✅
pl-cpu (windows-2022, lightning, 3.10, 1.12)	success	✅
pl-cpu (windows-2022, lightning, 3.10, 1.13)	success	✅
pl-cpu (windows-2022, lightning, 3.8, 1.11, oldest)	success	✅
pl-cpu (slow, macOS-11, lightning, 3.8, 1.11)	success	✅
pl-cpu (slow, ubuntu-20.04, lightning, 3.8, 1.11)	success	✅
pl-cpu (slow, windows-2022, lightning, 3.8, 1.11)	success	✅
pl-cpu (macOS-11, pytorch, 3.8, 1.13)	success	✅
pl-cpu (ubuntu-20.04, pytorch, 3.8, 1.13)	success	✅
pl-cpu (windows-2022, pytorch, 3.8, 1.13)	success	✅

These checks are required after the changes to src/lightning/pytorch/callbacks/callback.py, src/lightning/pytorch/core/hooks.py, src/lightning/pytorch/core/module.py, src/lightning/pytorch/demos/boring_classes.py, src/lightning/pytorch/loops/dataloader/evaluation_loop.py, src/lightning/pytorch/loops/epoch/evaluation_epoch_loop.py, src/lightning/pytorch/loops/epoch/training_epoch_loop.py, src/lightning/pytorch/loops/fit_loop.py, src/lightning/pytorch/loops/optimization/manual.py, src/lightning/pytorch/trainer/configuration_validator.py, src/lightning/pytorch/trainer/connectors/logger_connector/fx_validator.py, src/lightning/pytorch/trainer/trainer.py, src/lightning/pytorch/utilities/types.py, tests/tests_pytorch/accelerators/test_ipu.py, tests/tests_pytorch/accelerators/test_tpu.py, tests/tests_pytorch/callbacks/progress/test_tqdm_progress_bar.py, tests/tests_pytorch/callbacks/test_callback_hook_outputs.py, tests/tests_pytorch/callbacks/test_lr_monitor.py, tests/tests_pytorch/checkpointing/test_checkpoint_callback_frequency.py, tests/tests_pytorch/checkpointing/test_model_checkpoint.py, tests/tests_pytorch/checkpointing/test_trainer_checkpoint.py, tests/tests_pytorch/core/test_datamodules.py, tests/tests_pytorch/core/test_lightning_module.py, tests/tests_pytorch/core/test_lightning_optimizer.py, tests/tests_pytorch/helpers/deterministic_model.py, tests/tests_pytorch/loggers/test_all.py, tests/tests_pytorch/loggers/test_logger.py, tests/tests_pytorch/loggers/test_neptune.py, tests/tests_pytorch/loggers/test_tensorboard.py, tests/tests_pytorch/loops/optimization/test_optimizer_loop.py, tests/tests_pytorch/loops/test_evaluation_loop.py, tests/tests_pytorch/loops/test_evaluation_loop_flow.py, tests/tests_pytorch/loops/test_flow_warnings.py, tests/tests_pytorch/loops/test_loops.py, tests/tests_pytorch/loops/test_training_loop.py, tests/tests_pytorch/loops/test_training_loop_flow_dict.py, tests/tests_pytorch/loops/test_training_loop_flow_scalar.py, 
tests/tests_pytorch/models/test_hooks.py, tests/tests_pytorch/plugins/test_double_plugin.py, tests/tests_pytorch/strategies/test_deepspeed_strategy.py, tests/tests_pytorch/strategies/test_dp.py, tests/tests_pytorch/trainer/connectors/test_data_connector.py, tests/tests_pytorch/trainer/dynamic_args/test_multiple_eval_dataloaders.py, tests/tests_pytorch/trainer/flags/test_fast_dev_run.py, tests/tests_pytorch/trainer/flags/test_min_max_epochs.py, tests/tests_pytorch/trainer/logging_/test_distributed_logging.py, tests/tests_pytorch/trainer/logging_/test_eval_loop_logging.py, tests/tests_pytorch/trainer/logging_/test_logger_connector.py, tests/tests_pytorch/trainer/logging_/test_loop_logging.py, tests/tests_pytorch/trainer/logging_/test_train_loop_logging.py, tests/tests_pytorch/trainer/optimization/test_manual_optimization.py, tests/tests_pytorch/trainer/optimization/test_multiple_optimizers.py, tests/tests_pytorch/trainer/optimization/test_optimizers.py, tests/tests_pytorch/trainer/test_config_validator.py, tests/tests_pytorch/trainer/test_dataloaders.py, tests/tests_pytorch/trainer/test_trainer.py, tests/tests_pytorch/tuner/test_scale_batch_size.py, tests/tests_pytorch/utilities/test_all_gather_grad.py, tests/tests_pytorch/utilities/test_auto_restart.py, tests/tests_pytorch/utilities/test_fetching.py.

🟢 pytorch_lightning: Azure GPU

Check ID	Status
pytorch-lightning (GPUs)	success	✅

These checks are required after the changes to the same files listed for the pytorch_lightning: Tests workflow group above.

🟢 pytorch_lightning: Azure HPU

Check ID	Status
pytorch-lightning (HPUs)	success	✅

These checks are required after the changes to the same files listed for the pytorch_lightning: Tests workflow group above.

🟢 pytorch_lightning: Azure IPU

Check ID	Status
pytorch-lightning (IPUs)	success	✅

These checks are required after the changes to the same files listed for the pytorch_lightning: Tests workflow group above.

🟢 pytorch_lightning: Docs

Check ID	Status
make-doctest (pytorch)	success	✅
make-html (pytorch)	success	✅

These checks are required after the changes to src/lightning/pytorch/callbacks/callback.py, src/lightning/pytorch/core/hooks.py, src/lightning/pytorch/core/module.py, src/lightning/pytorch/demos/boring_classes.py, src/lightning/pytorch/loops/dataloader/evaluation_loop.py, src/lightning/pytorch/loops/epoch/evaluation_epoch_loop.py, src/lightning/pytorch/loops/epoch/training_epoch_loop.py, src/lightning/pytorch/loops/fit_loop.py, src/lightning/pytorch/loops/optimization/manual.py, src/lightning/pytorch/trainer/configuration_validator.py, src/lightning/pytorch/trainer/connectors/logger_connector/fx_validator.py, src/lightning/pytorch/trainer/trainer.py, src/lightning/pytorch/utilities/types.py, docs/source-pytorch/accelerators/accelerator_prepare.rst, docs/source-pytorch/common/lightning_module.rst, docs/source-pytorch/extensions/logging.rst, docs/source-pytorch/model/manual_optimization.rst, docs/source-pytorch/starter/style_guide.rst, docs/source-pytorch/visualize/logging_advanced.rst.

🟢 lightning_app: Tests workflow

Check ID	Status
app-pytest (macOS-11, lightning, 3.8, latest)	success	✅
app-pytest (macOS-11, lightning, 3.8, oldest)	success	✅
app-pytest (macOS-11, app, 3.9, latest)	success	✅
app-pytest (ubuntu-20.04, lightning, 3.8, latest)	success	✅
app-pytest (ubuntu-20.04, lightning, 3.8, oldest)	success	✅
app-pytest (ubuntu-20.04, app, 3.9, latest)	success	✅
app-pytest (windows-2022, lightning, 3.8, latest)	success	✅
app-pytest (windows-2022, lightning, 3.8, oldest)	success	✅
app-pytest (windows-2022, app, 3.8, latest)	success	✅

These checks are required after the changes to src/lightning/app/utilities/introspection.py.

🟢 lightning_app: Examples

Check ID	Status
app-examples (macOS-11, lightning, 3.9, latest)	success	✅
app-examples (macOS-11, lightning, 3.9, oldest)	success	✅
app-examples (macOS-11, app, 3.9, latest)	success	✅
app-examples (ubuntu-20.04, lightning, 3.9, latest)	success	✅
app-examples (ubuntu-20.04, lightning, 3.9, oldest)	success	✅
app-examples (ubuntu-20.04, app, 3.9, latest)	success	✅
app-examples (windows-2022, lightning, 3.9, latest)	success	✅
app-examples (windows-2022, lightning, 3.9, oldest)	success	✅
app-examples (windows-2022, app, 3.9, latest)	success	✅

These checks are required after the changes to src/lightning/app/utilities/introspection.py.

🟢 lightning_app: Azure

Check ID	Status
App.cloud-e2e	success	✅

These checks are required after the changes to src/lightning/app/utilities/introspection.py.

🟢 lightning_app: Docs

Check ID	Status
make-doctest (app)	success	✅
make-html (app)	success	✅

These checks are required after the changes to src/lightning/app/utilities/introspection.py.

🟢 mypy

Check ID	Status
mypy	success	✅

These checks are required after the changes to src/lightning/app/utilities/introspection.py, src/lightning/pytorch/callbacks/callback.py, src/lightning/pytorch/core/hooks.py, src/lightning/pytorch/core/module.py, src/lightning/pytorch/demos/boring_classes.py, src/lightning/pytorch/loops/dataloader/evaluation_loop.py, src/lightning/pytorch/loops/epoch/evaluation_epoch_loop.py, src/lightning/pytorch/loops/epoch/training_epoch_loop.py, src/lightning/pytorch/loops/fit_loop.py, src/lightning/pytorch/loops/optimization/manual.py, src/lightning/pytorch/trainer/configuration_validator.py, src/lightning/pytorch/trainer/connectors/logger_connector/fx_validator.py, src/lightning/pytorch/trainer/trainer.py, src/lightning/pytorch/utilities/types.py.

🟢 install

Check ID	Status
install-pkg (ubuntu-22.04, app, 3.8)	success	✅
install-pkg (ubuntu-22.04, app, 3.10)	success	✅
install-pkg (ubuntu-22.04, fabric, 3.8)	success	✅
install-pkg (ubuntu-22.04, fabric, 3.10)	success	✅
install-pkg (ubuntu-22.04, pytorch, 3.8)	success	✅
install-pkg (ubuntu-22.04, pytorch, 3.10)	success	✅
install-pkg (ubuntu-22.04, lightning, 3.8)	success	✅
install-pkg (ubuntu-22.04, lightning, 3.10)	success	✅
install-pkg (ubuntu-22.04, notset, 3.8)	success	✅
install-pkg (ubuntu-22.04, notset, 3.10)	success	✅
install-pkg (macOS-12, app, 3.8)	success	✅
install-pkg (macOS-12, app, 3.10)	success	✅
install-pkg (macOS-12, fabric, 3.8)	success	✅
install-pkg (macOS-12, fabric, 3.10)	success	✅
install-pkg (macOS-12, pytorch, 3.8)	success	✅
install-pkg (macOS-12, pytorch, 3.10)	success	✅
install-pkg (macOS-12, lightning, 3.8)	success	✅
install-pkg (macOS-12, lightning, 3.10)	success	✅
install-pkg (macOS-12, notset, 3.8)	success	✅
install-pkg (macOS-12, notset, 3.10)	success	✅
install-pkg (windows-2022, app, 3.8)	success	✅
install-pkg (windows-2022, app, 3.10)	success	✅
install-pkg (windows-2022, fabric, 3.8)	success	✅
install-pkg (windows-2022, fabric, 3.10)	success	✅
install-pkg (windows-2022, pytorch, 3.8)	success	✅
install-pkg (windows-2022, pytorch, 3.10)	success	✅
install-pkg (windows-2022, lightning, 3.8)	success	✅
install-pkg (windows-2022, lightning, 3.10)	success	✅
install-pkg (windows-2022, notset, 3.8)	success	✅
install-pkg (windows-2022, notset, 3.10)	success	✅

These checks are required after the changes to the same files listed for the mypy group above.

🟢 link-check

Check ID	Status
markdown-link-check	success	✅

These checks are required after the changes to src/lightning/pytorch/CHANGELOG.md.

Thank you for your contribution! 💜

Note
This comment is automatically generated and updates for 60 minutes every 180 seconds. If you have any other questions, contact carmocca for help.

juliawilkins · 2023-04-05T14:50:25Z

Hi,

I have a question about the migration. I've been doing averaging of some metrics at the end of a validation epoch (nearly identical to the example in the migration description above). I'm training with DDP across 8 devices and I'm wondering whether the averaging in the example above accounts for multiple devices.

If I understand correctly, the model is instantiated separately for each of the 8 devices. If I append to a metrics list for each model, there would be 8 lists (for each of the 8 models). What is the right way to average the metrics from all 8 devices at the end of the validation epoch?

Thanks a lot!

@fishbotics @carmocca Following up on this, do you have a simple working example of how best to log with 2.0 + DDP? I'm still a bit confused about the best way to go about this: do I need to maintain multiple lists manually, or does something like sync_dist=True do the trick out of the box? Thanks!

TheShadow29 · 2023-04-07T22:36:52Z

If I am understanding correctly, to regain previous functionality we need to do:

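import pytorch_lightning as pl


class Module(pl.LightningModule):
    def on_validation_epoch_start(self) -> None:
        super().on_validation_epoch_start()
        # Start every validation epoch with an empty list
        self.val_output_list = []

    def validation_step(self, batch, batch_idx):
        results = ...  # per-batch results (elided in the original comment)
        self.val_output_list += results

    def on_validation_epoch_end(self):
        # use val_output_list as needed
        ...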

carmocca · 2023-04-10T13:52:05Z

@juliawilkins What exactly do you want to log? The "keeping outputs in memory" feature was completely separate from self.log(). You can log any scalar value. Passing sync_dist=True will reduce the scalar across processes.
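To make "reduce the scalar across processes" concrete, here is a framework-free sketch of what a mean reduction over per-process scalars computes. It simulates ranks with plain lists, so no distributed backend is involved; the function names are hypothetical, and it assumes the reduction is a mean. In real code the per-rank scalar would be passed to self.log(..., sync_dist=True) rather than reduced by hand.

```python
from statistics import fmean


def local_epoch_average(step_values):
    """What each process computes from its own list of step outputs."""
    return fmean(step_values)


def reduce_mean_across_ranks(per_rank_scalars):
    """Simulated cross-process mean reduction of one scalar per rank."""
    return fmean(per_rank_scalars)


# Four simulated ranks, each with the same number of validation steps.
per_rank_steps = [
    [1.0, 3.0],
    [2.0, 4.0],
    [5.0, 7.0],
    [6.0, 8.0],
]
local = [local_epoch_average(values) for values in per_rank_steps]
synced = reduce_mean_across_ranks(local)
global_average = fmean(v for values in per_rank_steps for v in values)
print(local)                   # [2.0, 3.0, 6.0, 7.0]
print(synced, global_average)  # 4.5 4.5

# With an uneven number of steps per rank, the mean of means differs
# from the mean over all values.
uneven_steps = [[1.0, 3.0], [8.0]]
uneven_synced = reduce_mean_across_ranks(
    [local_epoch_average(values) for values in uneven_steps]
)
uneven_global = fmean(v for values in uneven_steps for v in values)
print(uneven_synced, uneven_global)  # 5.0 4.0
```

So each process only needs its own list: averaging locally and then reducing the resulting scalar gives the global average as long as every rank sees the same number of equally sized batches.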


carmocca self-assigned this Jan 26, 2023

github-actions bot added the app and pl labels Jan 26, 2023

carmocca added the breaking change, lightningmodule, and hooks labels and removed the app label Jan 26, 2023

carmocca force-pushed the refactor/epoch-end-hook-removal branch from 2f301a1 to 55a5a51 Compare January 26, 2023 18:17

github-actions bot added the app label Jan 26, 2023

carmocca added this to the 2.0 milestone Jan 30, 2023

carmocca marked this pull request as ready for review January 30, 2023 17:55

carmocca requested review from tchaton, lantiga, awaelchli, hhsecond, ethanwharris, williamFalcon, justusschock, edenlightning and Borda as code owners January 30, 2023 17:55

carmocca added 3 commits January 30, 2023 19:10

Update docs

70b8cb6

Update src

dba6fa6

Update tests

b27175e

carmocca force-pushed the refactor/epoch-end-hook-removal branch from e7b6f0f to b27175e Compare January 30, 2023 18:13

carmocca mentioned this pull request Jan 30, 2023

Run on_train_epoch_end after the LM for callbacks that monitor #16567

Merged

mergify bot added the has conflicts label Feb 1, 2023

Merge branch 'master' into refactor/epoch-end-hook-removal

2b1f714

mergify bot added has conflicts and removed has conflicts labels Feb 1, 2023

AzulGarza mentioned this pull request Apr 3, 2023

[FEAT] Add support for lightning>=2.0.0, and torch>=2.0.0 Nixtla/neuralforecast#498

Merged

aleksandr-mokrov mentioned this pull request Apr 3, 2023

110: Replace MO CLI with Python API and use PTQ nncf API openvinotoolkit/openvino_notebooks#965

Merged

chinnusai25 mentioned this pull request Apr 6, 2023

Issue with pytorch lightning ChenFengYe/motion-latent-diffusion#24

Closed

kdgutier mentioned this pull request Apr 7, 2023

Validation Memory Leakage for RecurrentBased models with Pytorch 2.0 / PL 2.0 Nixtla/neuralforecast#513

Closed

cash-mckeeman mentioned this pull request Apr 16, 2023

Misconfiguration Exception: The provided lr scheduler 'StepLR' doesn't follow PyTorch's LRScheduler API. And Leaked semaphore objects to clean up at shutdown. Nixtla/neuralforecast#524

Closed

soutrik7771 mentioned this pull request Apr 16, 2023

[Auto] FileNotFoundError on Windows Nixtla/neuralforecast#526

Closed

SagiPolaczek mentioned this pull request Apr 19, 2023

Support Lightning >=2.0.0 and Pandas >=2.0.0 BiomedSciAI/fuse-med-ml#301

Merged

xianyuanliu mentioned this pull request May 3, 2023

PyTorch 2.0 pykale/pykale#375

Closed

ddelange mentioned this pull request May 4, 2023

Add support for python 3.11 autogluon/autogluon#3190

Merged

ddelange added a commit to ddelange/autogluon that referenced this pull request May 4, 2023

Rename validation_epoch_end -> on_validation_epoch_end

0d4f8a4

ref Lightning-AI/pytorch-lightning#16520

yinweisu pushed a commit to autogluon/autogluon that referenced this pull request May 8, 2023

Rename validation_epoch_end -> on_validation_epoch_end

cea73b2

ref Lightning-AI/pytorch-lightning#16520

Wadha-Almattar mentioned this pull request May 27, 2023

NotImplementedError: Support for training_epoch_end shatz01/MoCoV2_CIFAR10#2

Open

pallaviyn referenced this pull request in talhaanwarch/youtube-tutorials Jun 2, 2023

Add files via upload

b6f500e

OrilinZ mentioned this pull request Jun 15, 2023

validation_epoch_end problem, how can I fix it? ljh0v0/D3PM-Pytorch#1

Open

shobhitagrawal1 mentioned this pull request Jul 11, 2023

several errors in trial runs nitzanlab/biolord#6

Closed

athitten mentioned this pull request Jul 27, 2023

Upgrade to pytorch lightning 2.0 NVIDIA/NeMo#6433

Merged

8 tasks

HaviZou mentioned this pull request Aug 2, 2023

训练时，train 命令出现异常 breezedeus/CnOCR#270

Closed

anteju mentioned this pull request Aug 9, 2023

[Fix] AudioCodecModel training fails with PTL 2.0 NVIDIA/NeMo#7190

Closed

8 tasks

utility-aagrawal mentioned this pull request Sep 7, 2023

Requirements/pre-requisites for this code to run? SShah30-hue/sentiment-analysis-ensemble#1

Open

xings-sdnu mentioned this pull request Oct 13, 2023

训练问题 RangiLyu/nanodet#535

Open

4theKnowledge mentioned this pull request Oct 17, 2023

version incompatible Babelscape/rebel#72

Closed

thomaschhh mentioned this pull request Nov 20, 2023

Training - NotImplementedError bene-ges/nemo_compatible#15

Closed

jimmylihui mentioned this pull request Nov 23, 2023

training_epoch_end error amazon-science/unconditional-time-series-diffusion#2

Closed

darkway-s mentioned this pull request Nov 24, 2023

Issue about requirements.txt danielschroter/human_value_detector#7

Open

Madjid-CH mentioned this pull request Jan 3, 2024

fixed backward compatibility problem with pytorch lightening v2.0.0 bezirganyan/m2-mixer#3

Merged

alexanderwerning mentioned this pull request Feb 27, 2024

SEDWrapper sed_model problem RetroCirce/HTS-Audio-Transformer#54

Closed

ChingisBadmaev mentioned this pull request Mar 11, 2024

examples (notebooks) have not been updated qubvel/segmentation_models.pytorch#860

Closed
