GNMT v2 Tensorflow: How to enable automatic mixed precision for evaluation run #282
Comments
Using AMP with official TensorFlow is a little different from using it with NGC containers, and the changes you've made should be enough to get AMP working with official TensorFlow. I've tried to reproduce your problem. I took [...] If this patch doesn't work for you, then the problem is probably with your setup.
Thank you for the patch. It does exactly what I was intending to do. I retested my code with TF 1.15 and it seems to be working, while TF 1.14 was throwing the error I'd posted in my original issue. Is the error on TF 1.14 a design change or a bug in 1.14?
It was a bug in 1.14, and it has been fixed in 1.15.
Thanks, I'll close this issue then.
I'm trying to run the GNMT TF code on a bare-metal system, and I've set up the CUDA stack and `tensorflow-gpu` v1.15. There were a few TensorFlow API changes from 1.14 to 1.15, but after resolving those I was able to run the code for both training and evaluation. However, comparing the logs with those from the NGC container, I see that the bare-metal run isn't making use of AMP. I went through NVIDIA's docs and found the way to enable it for training here.
I added the following line before here:
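For readers following along, AMP for training on official TensorFlow 1.14+ is typically enabled by wrapping the optimizer. The following is a minimal sketch under that assumption; `tf.train.AdamOptimizer` is only a stand-in for whatever optimizer the GNMT code builds, and the exact line used in this issue may differ:

```python
import tensorflow.compat.v1 as tf

# Stand-in optimizer (an assumption for illustration); the GNMT code builds its own.
optimizer = tf.train.AdamOptimizer(learning_rate=1e-3)

# Wrap the optimizer so that loss scaling is applied and the
# automatic mixed precision graph rewrite is turned on.
optimizer = tf.train.experimental.enable_mixed_precision_graph_rewrite(optimizer)
```

Because the rewrite is attached through the optimizer, a graph that never builds an optimizer (such as an evaluation-only graph) would not pick it up this way.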
However, I can't see automatic mixed precision being used for evaluation, since the optimizer is only called during backprop. So I tried modifying the eval function by adding the `mixed_precision_rewrite` to `eval_fn()`, modifying the graph config in estimator.py and commenting out this call.
However, this gives an error on running:
Any leads would be helpful to enable automatic mixed precision for evaluation. Thanks :)
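One way to request the rewrite without an optimizer is through the session's graph options. This is a minimal sketch, assuming TensorFlow 1.14+ (or `tensorflow.compat.v1` in later releases); the helper name `make_amp_session_config` is hypothetical and is not taken from the GNMT code. Per the replies in this thread, enabling AMP for evaluation worked on TF 1.15 but hit a bug on 1.14.

```python
import tensorflow.compat.v1 as tf
from tensorflow.core.protobuf import rewriter_config_pb2


def make_amp_session_config():
    """Build a session config that asks Grappler to apply the AMP rewrite.

    Hypothetical helper for illustration only.
    """
    config = tf.ConfigProto(allow_soft_placement=True)
    # Turn on the auto_mixed_precision Grappler pass for graphs run with this config.
    config.graph_options.rewrite_options.auto_mixed_precision = (
        rewriter_config_pb2.RewriterConfig.ON
    )
    return config


config = make_amp_session_config()
```

The resulting config can be passed as `tf.Session(config=config)` or `tf.estimator.RunConfig(session_config=config)`, depending on how the evaluation loop creates its session.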