@SkafteNicki commented Aug 12, 2025

What does this PR do?

Fixes #14248

Fix the gradient calculation in lr_finder when suggesting a learning rate to start from. Currently, the gradient computed when mode="exponential" does not account for the non-homogeneous step size between the sampled learning rates, which leads to a wrong approximation of the suggested learning rate.

As an example, this is the learning rate currently suggested on master:
Figure_1
and with the fix:
Figure_2
You can see that in the second plot the red point is located earlier on the downward slope, closer to the point where the curve transitions from flat to rapidly decreasing. In contrast, the first plot's red point is further down the slope, where the loss is still decreasing but less sharply.
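
The gist of the change, sketched with NumPy below: the suggestion is taken at the steepest descent of the recorded loss curve, and with exponentially spaced learning rates that gradient has to be computed with respect to the lr values themselves rather than the (implicitly uniform) sample index. This is only an illustrative sketch of the idea, not the exact code from this PR, and the variable names are made up.

```python
import numpy as np

# Exponentially spaced learning rates, as swept by mode="exponential"
lrs = np.logspace(-6, 0, 100)
# Stand-in for the losses recorded during the sweep
losses = np.random.rand(100)

# Old behaviour: gradient over the index implicitly assumes evenly spaced points
grad_wrt_index = np.gradient(losses)

# Fixed idea: pass the lr values so the non-homogeneous step size is accounted for
grad_wrt_lr = np.gradient(losses, lrs)

# Suggestion = lr at the point of steepest loss decrease
suggested_lr = lrs[np.argmin(grad_wrt_lr)]
print(suggested_lr)
```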

<details>
  <summary>Expand for script</summary>

```python
#!/usr/bin/env python3
import lightning as L
import torch
import torchvision
import os


# Set seed for reproducibility
L.seed_everything(42)


class SimpleModel(L.LightningModule):
    def __init__(self, lr=1e-4):
        super().__init__()
        self.lr = lr
        self.model = torch.nn.Sequential(
            torch.nn.Flatten(),
            torch.nn.Linear(28 * 28, 128),
            torch.nn.ReLU(),
            torch.nn.Linear(128, 64),
            torch.nn.ReLU(),
            torch.nn.Linear(64, 10)
        )

    def forward(self, x):
        return self.model(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = torch.nn.functional.cross_entropy(y_hat, y)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.lr)


class MNISTDataModule(L.LightningDataModule):
    def __init__(self, data_dir: str = "MNIST_test/", batch_size: int = 64):
        super().__init__()
        self.data_dir = data_dir
        self.batch_size = batch_size

    def setup(self, stage: str):
        os.makedirs(self.data_dir, exist_ok=True)
        
        mnist_full = torchvision.datasets.MNIST(
            self.data_dir, train=True, download=True, 
            transform=torchvision.transforms.ToTensor()
        )
        self.mnist_train, _ = torch.utils.data.random_split(mnist_full, [5000, 55000])

    def train_dataloader(self):
        return torch.utils.data.DataLoader(
            self.mnist_train, batch_size=self.batch_size, shuffle=True
        )


def main():
    L.seed_everything(42)

    # Create model and data
    model = SimpleModel()
    dm = MNISTDataModule()
    
    
    from lightning.pytorch.callbacks import LearningRateFinder
    lr_finder = LearningRateFinder(
        min_lr=1e-6,
        max_lr=1,
        num_training_steps=100,
        mode='exponential',
        early_stop_threshold=4.0,
        update_attr=True,
    )

    trainer = L.Trainer(
        max_epochs=1,
        callbacks=[lr_finder],
        enable_progress_bar=False,
        deterministic=True
    )
    trainer.fit(model, dm)

    lr_finder.optimal_lr.plot(suggest=True, show=True)
    print(f"Suggested learning rate: {lr_finder.optimal_lr.suggestion}")


if __name__ == "__main__":
    main()

```

</details>



<!-- Does your PR introduce any breaking changes? If yes, please list them. -->

<details>
  <summary><b>Before submitting</b></summary>

- [ ] Was this **discussed/agreed** via a GitHub issue? (not for typos and docs)
- [ ] Did you read the [contributor guideline](https://github.com/Lightning-AI/pytorch-lightning/blob/master/.github/CONTRIBUTING.md), **Pull Request** section?
- [ ] Did you make sure your **PR does only one thing**, instead of bundling different changes together?
- [ ] Did you make sure to **update the documentation** with your changes? (if necessary)
- [ ] Did you write any **new necessary tests**? (not for typos and docs)
- [ ] Did you verify new and **existing tests pass** locally with your changes?
- [ ] Did you list all the **breaking changes** introduced by this pull request?
- [ ] Did you **update the CHANGELOG**? (not for typos, docs, test updates, or minor internal changes/refactors)

<!-- In the CHANGELOG, separate each item in the unreleased section by a blank line to reduce collisions -->

</details>

## PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the [review guidelines](https://github.com/Lightning-AI/lightning/wiki/Review-guidelines). In short, see the following bullet-list:

<details>
  <summary>Reviewer checklist</summary>

- [ ] Is this pull request ready for review? (if not, please submit in draft mode)
- [ ] Check that all items from **Before submitting** are resolved
- [ ] Make sure the title is self-explanatory and the description concisely explains the PR
- [ ] Add labels and milestones (and optionally projects) to the PR so it can be classified

</details>

<!--

Did you have fun?

Make sure you had fun coding 🙃

-->


<!-- readthedocs-preview pytorch-lightning start -->
----
📚 Documentation preview 📚: https://pytorch-lightning--21055.org.readthedocs.build/en/21055/

<!-- readthedocs-preview pytorch-lightning end -->

@github-actions bot added the pl (Generic label for PyTorch Lightning package) label Aug 12, 2025
@SkafteNicki changed the title Lr finder/spacing issue → Fix lr_finder gradient calculation for mode="exponential" Aug 12, 2025
@Borda merged commit 105bb20 into Lightning-AI:master Aug 12, 2025
84 of 92 checks passed
Borda pushed a commit that referenced this pull request Aug 13, 2025
* fix impl for exponential spacing
* add testing
* small doc fixes

(cherry picked from commit 105bb20)
Successfully merging this pull request may close these issues: lr_find suggestion ignores lr steps