FusedAdam optimizer doesn't have `set_to_none` keyword argument

PyTorch's `Optimizer.zero_grad` accepts a `set_to_none` keyword argument. FusedAdam from TE doesn't accept this kwarg, despite inheriting from `torch.optim.Optimizer`. That breaks the interface the base class promises, and it leads to various issues. For example, `torch.distributed.checkpoint` assumes `zero_grad` accepts `set_to_none` when it initializes the optimizer state in [this code line](https://github.com/pytorch/pytorch/blob/main/torch/distributed/checkpoint/state_dict.py#L625). That call currently fails with the TE FusedAdam optimizer.

I understand TE FusedAdam has a `set_grad_none` attribute, but its `zero_grad` method should still accept the `set_to_none` kwarg; otherwise some PyTorch functionality is broken.
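To make the request concrete, here is a minimal sketch of the kind of shim I have in mind. It is not TE code: `LegacyZeroGradOptimizer` is a hypothetical, torch-free stand-in that I assume behaves like the current optimizer (a no-argument `zero_grad()` steered by `set_grad_none`), and `SetToNoneCompatMixin` and `CompatOptimizer` are names I made up for illustration.

```python
from types import SimpleNamespace


class LegacyZeroGradOptimizer:
    """Hypothetical stand-in: zero_grad() takes no arguments and is
    controlled by a `set_grad_none` attribute (assumed behaviour)."""

    def __init__(self, params, set_grad_none=True):
        self.params = list(params)
        self.set_grad_none = set_grad_none

    def zero_grad(self):
        for p in self.params:
            p.grad = None if self.set_grad_none else 0.0


class SetToNoneCompatMixin:
    """Adds the `set_to_none` keyword that callers of
    torch.optim.Optimizer.zero_grad expect."""

    def zero_grad(self, set_to_none=None):
        if set_to_none is None:
            # No explicit request: keep the optimizer's configured behaviour.
            return super().zero_grad()
        previous = self.set_grad_none
        self.set_grad_none = set_to_none
        try:
            return super().zero_grad()
        finally:
            # Restore the configured default for later no-argument calls.
            self.set_grad_none = previous


class CompatOptimizer(SetToNoneCompatMixin, LegacyZeroGradOptimizer):
    pass


# Plain objects with a `.grad` attribute stand in for parameters.
params = [SimpleNamespace(grad=1.5), SimpleNamespace(grad=-2.0)]
opt = CompatOptimizer(params, set_grad_none=False)

opt.zero_grad(set_to_none=True)   # explicit keyword overrides the attribute
grads_after_keyword = [p.grad for p in params]

for p in params:
    p.grad = 3.0
opt.zero_grad()                   # falls back to set_grad_none=False
grads_after_default = [p.grad for p in params]
```

With the real classes, the same mixin could presumably be placed in front of FusedAdam in a subclass as a user-side workaround, but accepting the keyword directly in `zero_grad` would be the cleaner fix.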

FusedAdam optimizer doesn't have `set_to_none` keyword argument #1453

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

FusedAdam optimizer doesn't have set_to_none keyword argument #1453

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

FusedAdam optimizer doesn't have `set_to_none` keyword argument #1453