diff --git a/docs/source/common/optimizers.rst b/docs/source/common/optimizers.rst
index 804bf2b1d991a..7bd72730ac4ce 100644
--- a/docs/source/common/optimizers.rst
+++ b/docs/source/common/optimizers.rst
@@ -299,10 +299,14 @@ Under the hood, Lightning does the following:
 
     for epoch in epochs:
         for batch in data:
-            loss = model.training_step(batch, batch_idx, ...)
-            optimizer.zero_grad()
-            loss.backward()
-            optimizer.step()
+
+            def closure():
+                loss = model.training_step(batch, batch_idx, ...)
+                optimizer.zero_grad()
+                loss.backward()
+                return loss
+
+            optimizer.step(closure)
 
         for lr_scheduler in lr_schedulers:
             lr_scheduler.step()
@@ -314,14 +318,22 @@ In the case of multiple optimizers, Lightning does the following:
     for epoch in epochs:
         for batch in data:
             for opt in optimizers:
-                loss = model.training_step(batch, batch_idx, optimizer_idx)
-                opt.zero_grad()
-                loss.backward()
-                opt.step()
+
+                def closure():
+                    loss = model.training_step(batch, batch_idx, optimizer_idx)
+                    opt.zero_grad()
+                    loss.backward()
+                    return loss
+
+                opt.step(closure)
 
         for lr_scheduler in lr_schedulers:
             lr_scheduler.step()
 
+As can be seen in the code snippet above, Lightning defines a closure with ``training_step``, ``zero_grad``
+and ``backward`` for the optimizer to execute. This mechanism is in place to support optimizers which operate on the
+output of the closure (e.g. the loss) or need to call the closure several times (e.g. :class:`~torch.optim.LBFGS`).
+
 .. warning::
    Before 1.2.2, Lightning internally calls ``backward``, ``step`` and ``zero_grad`` in that order.
    From 1.2.2, the order is changed to ``zero_grad``, ``backward`` and ``step``.
@@ -396,8 +408,11 @@ For example, here step optimizer A every batch and optimizer B every 2 batches.
         # update discriminator every 2 steps
         if optimizer_idx == 1:
            if (batch_idx + 1) % 2 == 0:
-                # the closure (which includes the `training_step`) won't run if the line below isn't executed
+                # the closure (which includes the `training_step`) will be executed by `optimizer.step`
                 optimizer.step(closure=optimizer_closure)
+            else:
+                # optional: call the closure by itself to run `training_step` + `backward` without an optimizer step
+                optimizer_closure()
 
         # ...
         # add as many optimizers as you want
diff --git a/docs/source/governance.rst b/docs/source/governance.rst
index 5c29f7d0da544..4114ccdb8a818 100644
--- a/docs/source/governance.rst
+++ b/docs/source/governance.rst
@@ -39,7 +39,7 @@ Board
 
 Alumni
 ------
-- Jeff Yang (`ydcjeff `_)
+- Jeff Yang (`ydcjeff `_)
 - Jeff Ling (`jeffling `_)
 - Teddy Koker (`teddykoker `_)
 - Nate Raw (`nateraw `_)
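
As a side note on the closure mechanism documented in the ``optimizers.rst`` hunks above, the sketch below shows, in plain PyTorch rather than Lightning, why ``training_step``, ``zero_grad`` and ``backward`` have to live inside the closure passed to ``optimizer.step``: :class:`~torch.optim.LBFGS` may evaluate that closure several times within a single ``step`` call. The toy model, data, and the ``lr``/``max_iter`` values are made up purely for illustration and are not part of the patch.

.. code-block:: python

    import torch
    import torch.nn.functional as F

    # Toy model and data, chosen only for illustration.
    model = torch.nn.Linear(4, 1)
    x, y = torch.randn(16, 4), torch.randn(16, 1)

    optimizer = torch.optim.LBFGS(model.parameters(), lr=0.1, max_iter=5)

    call_count = 0


    def closure():
        # LBFGS re-evaluates the loss and gradients as it iterates, so the
        # forward pass, zero_grad and backward all happen inside the closure.
        global call_count
        call_count += 1
        optimizer.zero_grad()
        loss = F.mse_loss(model(x), y)
        loss.backward()
        return loss


    # A single step can run the closure several times (iterations plus line search).
    optimizer.step(closure)
    print(f"closure was evaluated {call_count} times during one optimizer.step()")

Plain optimizers such as SGD call the closure exactly once per ``step``, so wrapping ``training_step``, ``zero_grad`` and ``backward`` in a closure keeps the loop shown above correct for both kinds of optimizer.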