
Use of FP16 in backward with create_graph = True? #22

Open
sjscotti opened this issue Sep 8, 2021 · 0 comments

Comments

sjscotti commented Sep 8, 2021

Hi,
I have a quick question. For your transformer, or in any other application, have you used FP16 when computing gradients with a backward call? In the model I'm working with, backward gives reasonable gradients for every loss scale factor I've tried, as long as I don't set create_graph to True. But when I do set it to True, some of the gradients match the create_graph=False case, while many others come out as NaNs. Everything works fine with FP32 operations, but I'd like to get FP16's advantages in GPU memory and speed.
Any suggestions you can provide would be appreciated!
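For context on the loss-scale factor mentioned above: FP16 can only represent values down to about 6e-8 (its smallest subnormal), so small gradient values silently flush to zero unless the loss is scaled up before backward and the gradients are unscaled afterward. The snippet below is a minimal, hypothetical illustration of that underflow/rescaling arithmetic using numpy's float16; it is not the repo's code and says nothing about the separate NaN issue with create_graph=True, which typically comes from overflow in the double-backward pass.

```python
import numpy as np

# A gradient value below FP16's subnormal floor (~5.96e-8) flushes to zero.
tiny_grad = 1e-8
assert np.float16(tiny_grad) == 0.0  # lost entirely in FP16

# Loss scaling: multiply the loss (hence every gradient) by a constant
# before the backward pass, so small gradients stay representable...
scale = 1024.0
scaled = np.float16(tiny_grad * scale)  # ~1.02e-5, representable in FP16

# ...then unscale in FP32 to recover (approximately) the true gradient.
recovered = np.float32(scaled) / scale
print(recovered)  # close to the original 1e-8
```

With create_graph=True the backward pass itself builds a differentiable graph, so the scaled gradients feed into further FP16 computation; if the scale pushes any intermediate past FP16's maximum (~65504) it becomes inf, and subsequent operations turn that into NaN. That is one reason a scale that works for a plain backward can fail for a double backward.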
