max_grad_norm is ignored in FP16 training

I am currently fighting with a dynamic loss scale that keeps decreasing because of gradient overflows. Setting `max_grad_norm` in the JSON config has no effect, since it is overridden in `deepspeed_light.py`: https://github.com/microsoft/DeepSpeed/blob/13fd3dca2abb6d6a6af62d46457fff8a1a678a4a/deepspeed/pt/deepspeed_light.py#L407-L408

I think this modification should be removed.
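
For context, here is a minimal sketch of the global-norm clipping that a `max_grad_norm` setting is usually expected to provide. It is plain Python, not DeepSpeed code: `clip_by_global_norm` is a hypothetical helper, the gradient values are made up, and treating a value of 0 as "clipping disabled" is an assumption made for this illustration.

```python
import math


def clip_by_global_norm(grads, max_grad_norm):
    """Return (clipped_grads, total_norm) for a flat list of gradient values.

    Hypothetical helper for illustration only. A max_grad_norm <= 0 is
    treated as "no clipping", which is an assumption of this sketch.
    """
    # Global L2 norm over all gradient values.
    total_norm = math.sqrt(sum(g * g for g in grads))
    if max_grad_norm <= 0 or total_norm <= max_grad_norm:
        return list(grads), total_norm
    # Rescale every value so the global norm equals max_grad_norm.
    scale = max_grad_norm / total_norm
    return [g * scale for g in grads], total_norm


grads = [3.0, 4.0]  # made-up gradients with global norm 5.0

# With a positive threshold the gradients are scaled down.
clipped, norm = clip_by_global_norm(grads, max_grad_norm=1.0)

# With the threshold forced to 0 the gradients pass through unchanged,
# no matter what the user configured.
unclipped, _ = clip_by_global_norm(grads, max_grad_norm=0.0)
```

If the configured value never reaches the optimizer, the second case is what effectively happens on every step.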

Kind regards
