[PyTorch] Use same API in optimizer `zero_grad` as PyTorch optimizers by timmoon10 · Pull Request #1466 · NVIDIA/TransformerEngine

timmoon10 · 2025-02-08T02:24:22Z

Description

The fused optimizers (copied from Apex in #867) have a non-standard API that makes it annoying to swap with the vanilla PyTorch optimizers. This PR deprecates the set_grad_none kwarg in FusedAdam and FusedSGD in favor of a set_to_none kwarg in zero_grad, similar to the vanilla PyTorch API. I've also modified some of the constructor kwargs to be more consistent with PyTorch.

Closes #1453.

Type of change

Documentation change (change only to the documentation, either a fix or a new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactoring

Changes

Deprecate set_grad_none kwarg in FusedAdam and FusedSGD in favor of set_to_none kwarg in zero_grad
Reorder kwargs in FusedAdam and FusedSGD constructors

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Signed-off-by: Tim Moon <tmoon@nvidia.com>

timmoon10 · 2025-02-08T02:25:03Z

/te-ci pytorch

ptrendx · 2025-02-20T22:01:37Z

+        weight_decay: float = 0.0,
+        nesterov: bool = False,
+        *,
        wd_after_momentum=False,


Why not add the type info to those as well?

timmoon10 · Feb 21, 2025 (edited)

I focused on matching the PyTorch API: https://pytorch.org/docs/stable/generated/torch.optim.SGD.html#torch.optim.SGD
I considered other options out of scope.
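
The bare `*` in the diff above is what separates the PyTorch-compatible arguments from the extension-only ones. The sketch below illustrates that idea with a hypothetical function, `sgd_like_init`, which is not the actual `FusedSGD` constructor. Only `weight_decay`, `nesterov`, `*`, and `wd_after_momentum` come from the diff; the other parameters and their defaults are assumptions modeled on the order documented for `torch.optim.SGD`.

```python
import inspect


def sgd_like_init(
    params,
    lr: float = 1e-3,          # assumed, following torch.optim.SGD's documented order
    momentum: float = 0.0,     # assumed
    dampening: float = 0.0,    # assumed
    weight_decay: float = 0.0,
    nesterov: bool = False,
    *,
    wd_after_momentum=False,   # extension option: keyword-only after the bare `*`
):
    """Hypothetical constructor body; it just echoes the resolved options."""
    return {
        "lr": lr,
        "momentum": momentum,
        "dampening": dampening,
        "weight_decay": weight_decay,
        "nesterov": nesterov,
        "wd_after_momentum": wd_after_momentum,
    }


def keyword_only_names(func):
    """List the parameters that can only be passed by keyword."""
    sig = inspect.signature(func)
    return [
        name
        for name, p in sig.parameters.items()
        if p.kind is inspect.Parameter.KEYWORD_ONLY
    ]
```

Keeping the shared arguments positional and in the same order means a positional call written for the vanilla optimizer keeps its meaning, while the non-standard options must be named explicitly and so cannot be confused with a standard argument.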

timmoon10 · 2025-02-21T03:19:56Z

/te-ci pytorch

Use same API in optimizer zero_grad as PyT optimizers

b25612b

Signed-off-by: Tim Moon <tmoon@nvidia.com>

timmoon10 added the enhancement New feature or request label Feb 8, 2025

ptrendx added the 2.1.0 label Feb 15, 2025

ptrendx reviewed Feb 20, 2025

ptrendx approved these changes Feb 20, 2025

Merge branch 'main' into optim-set-to-none

bf7d82a

timmoon10 merged commit b4fbc2b into NVIDIA:main Feb 22, 2025

timmoon10 deleted the optim-set-to-none branch February 22, 2025 02:13

timmoon10 added a commit that referenced this pull request Feb 26, 2025

[PyTorch] Use same API in optimizer zero_grad as PyTorch optimizers (…

1b384b9

…#1466) Use same API in optimizer zero_grad as PyT optimizers Signed-off-by: Tim Moon <tmoon@nvidia.com>

hungryGeek16 mentioned this pull request May 31, 2026

fix unfused padding causal sdpa #3063

Open

timmoon10 merged 2 commits into NVIDIA:main from timmoon10:optim-set-to-none
