I am currently fighting with a dynamic loss scale that keeps decreasing because of gradient overflows. Setting max_grad_norm in the JSON config has no effect, since the value is overridden in deepspeed_light.py: https://github.com/microsoft/DeepSpeed/blob/13fd3dca2abb6d6a6af62d46457fff8a1a678a4a/deepspeed/pt/deepspeed_light.py#L407-L408
I think this modification should be removed.
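
For context, here is a minimal sketch of how I understand dynamic loss scaling to behave when gradients keep overflowing. It is not DeepSpeed's implementation: the class name, the `has_overflow` helper, and all parameter names and defaults are hypothetical and only serve to illustrate the symptom.

```python
import math


class ToyDynamicLossScaler:
    """Hypothetical illustration of dynamic loss scaling; not DeepSpeed's class."""

    def __init__(self, init_scale=2.0 ** 16, scale_factor=2.0,
                 scale_window=1000, min_scale=1.0):
        self.scale = init_scale
        self.scale_factor = scale_factor
        self.scale_window = scale_window
        self.min_scale = min_scale
        self.good_steps = 0  # consecutive steps without overflow

    def update(self, overflow):
        if overflow:
            # On overflow the step is skipped and the scale is reduced.
            self.scale = max(self.scale / self.scale_factor, self.min_scale)
            self.good_steps = 0
        else:
            # After enough clean steps in a row, the scale is raised again.
            self.good_steps += 1
            if self.good_steps >= self.scale_window:
                self.scale *= self.scale_factor
                self.good_steps = 0


def has_overflow(grads):
    """Return True if any gradient value is inf or NaN."""
    return any(math.isinf(g) or math.isnan(g) for g in grads)


# Three overflowing steps in a row: the scale is halved each time.
scaler = ToyDynamicLossScaler(init_scale=2.0 ** 16, scale_window=2)
for grads in ([1.0, float("inf")], [float("nan")], [0.5, float("inf")]):
    scaler.update(has_overflow(grads))
print(scaler.scale)
```

If the gradients are never clipped, nothing in this loop stops the overflows, so the scale only moves downward. That is the behaviour I am seeing.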
Kind regards