Hi
I have a quick question. For your transformer or any other application, have you used FP16 when getting gradients from a backward call? In the model I'm working with, for every loss-scale factor I've tried, backward gives reasonable gradients as long as I don't set create_graph=True. But when I do set it to True, some of the gradients match the create_graph=False case, while many others come out as NaNs. Everything works fine when I use FP32 operations, but I'd like to keep FP16's GPU memory and speed advantages.
Any suggestions you can provide would be appreciated!
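Here is a minimal sketch of what I'm seeing, using a hypothetical toy model in place of my actual one (the Linear layer, shapes, and scale value below are just placeholders for illustration):

```python
import torch

# Minimal repro sketch: a toy FP16 model (hypothetical stand-in for my
# actual model), a scaled loss, and gradients compared with
# create_graph=False vs True. Assumes a CUDA device, since some FP16
# kernels may not be supported on CPU.
torch.manual_seed(0)
device = torch.device("cuda")

model = torch.nn.Linear(16, 1).to(device).half()  # FP16 parameters
x = torch.randn(8, 16, device=device).half()
scale = 128.0  # example loss-scale factor

def get_grads(create_graph):
    model.zero_grad()
    loss = (model(x) ** 2).mean() * scale  # scaled FP16 loss
    loss.backward(create_graph=create_graph)
    # Unscale before inspecting; detach because with create_graph=True
    # the .grad tensors stay attached to the autograd graph.
    return [p.grad.detach().float() / scale for p in model.parameters()]

g_plain = get_grads(create_graph=False)
g_graph = get_grads(create_graph=True)

for gp, gg in zip(g_plain, g_graph):
    print("max |diff|:", (gp - gg).abs().max().item(),
          "| NaNs with create_graph=True:", bool(torch.isnan(gg).any()))
```

In my real model, the create_graph=True pass is where the NaNs appear, regardless of the scale factor.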