Exceptions and failures when using MultiWorkerMirroredStrategy #373
Comments
Can you try replacing AdamWeightDecay with plain Adam first?
Yes, I tried that, but no luck; I hit the same error again.
Hi, any update?
Hi, just to confirm: will the fix referenced below solve my problem? Please confirm that the fix targets this bug. Support Multi-GPU gradient Accumulate for trainer. #377
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
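The swap suggested in the first comment, using stock Adam instead of the custom AdamWeightDecay optimizer inside the strategy scope, can be sketched roughly as follows. The model, loss, and learning rate are placeholders for illustration; the original report does not show them:

```python
import tensorflow as tf

# Multi-worker strategy; falls back to a single local worker when no
# TF_CONFIG cluster spec is set in the environment.
strategy = tf.distribute.MultiWorkerMirroredStrategy()

with strategy.scope():
    # Placeholder model; the original report does not include one.
    model = tf.keras.Sequential([tf.keras.layers.Dense(2)])
    # Use plain Adam instead of AdamWeightDecay to check whether the
    # collective all-reduce error is specific to the custom optimizer.
    optimizer = tf.keras.optimizers.Adam(learning_rate=2e-5)
    model.compile(optimizer=optimizer, loss="mse")
```

If the error disappears with plain Adam, that points at how AdamWeightDecay builds its update ops (e.g. producing IndexedSlices whose shapes the ScopedAllocatorOptimizer cannot infer) rather than at the strategy setup itself.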
When I use tf.distribute.experimental.MultiWorkerMirroredStrategy to run training on multiple machines, I get the following errors. Please advise whether any other changes are needed.
2020-11-16 12:03:50,968 (cross_device_ops:1130) INFO: Collective batch_all_reduce for IndexedSlices: 1 all-reduces, group_size = 2
2020-11-16 12:03:56.443402: W tensorflow/core/grappler/optimizers/scoped_allocator_optimizer.cc:439] error: Internal: Complete shape not known for AdamWeightDecay/allreduce/CollectiveReduce_23
2020-11-16 12:03:56.443474: W tensorflow/core/grappler/optimizers/scoped_allocator_optimizer.cc:1121] error: Internal: Complete shape not known for AdamWeightDecay/allreduce/CollectiveReduce_23
2020-11-16 12:03:56.443606: E tensorflow/core/grappler/optimizers/scoped_allocator_optimizer.cc:1138] ScopedAllocatorOptimizer: Internal: Complete shape not known for AdamWeightDecay/allreduce/CollectiveReduce_23
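For anyone reproducing this: MultiWorkerMirroredStrategy reads its cluster layout from the TF_CONFIG environment variable, and a missing or malformed TF_CONFIG is a common source of multi-worker failures. A minimal sketch of setting it for a hypothetical two-worker job (hostnames and ports are placeholders, not from the original report):

```python
import json
import os

# Hypothetical two-worker cluster; replace hosts/ports with real addresses.
cluster_config = {
    "cluster": {
        "worker": ["host1.example.com:12345", "host2.example.com:12345"],
    },
    # This process is worker 0; set "index" to 1 on the second machine.
    "task": {"type": "worker", "index": 0},
}

# MultiWorkerMirroredStrategy parses this variable at strategy creation,
# so it must be set before the strategy is constructed.
os.environ["TF_CONFIG"] = json.dumps(cluster_config)
```

Each machine runs the same script with its own task index; the strategy then coordinates the collective all-reduce across the listed workers.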