Aggregate SGD #13346
Currently MXNet optimizers are invoked one weight at a time. This leads to a lot of synchronization overhead: individual updates (especially for convolutions and batchnorm) tend to be small, but each one requires its own synchronization.
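To illustrate the idea, here is a minimal NumPy sketch that contrasts a per-weight SGD update with a grouped one. It is not the code in this PR and does not use MXNet's API: the function names, the `aggregate_num` parameter, and the weight-decay form of the update are assumptions made for illustration. The returned count stands in for the number of optimizer invocations, each of which would carry its own synchronization cost.

```python
import numpy as np


def sgd_update_one(weight, grad, lr, wd):
    """Plain SGD with weight decay, applied in place to a single array."""
    weight -= lr * (grad + wd * weight)


def sgd_update_aggregated(weights, grads, lr, wd, aggregate_num=4):
    """Update weights in groups of up to `aggregate_num` arrays.

    Each group models one optimizer invocation. Returns how many
    invocations were needed (hypothetical stand-in for sync points).
    """
    invocations = 0
    for start in range(0, len(weights), aggregate_num):
        group_w = weights[start:start + aggregate_num]
        group_g = grads[start:start + aggregate_num]
        # One "call" handles the whole group; the arithmetic per weight
        # is identical to sgd_update_one.
        for w, g in zip(group_w, group_g):
            w -= lr * (g + wd * w)
        invocations += 1
    return invocations


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shapes = [(3, 3), (5,), (2, 4), (7,), (1,), (6,), (2, 2), (3,), (4,), (8,)]
    weights = [rng.standard_normal(s) for s in shapes]
    grads = [rng.standard_normal(s) for s in shapes]
    n = sgd_update_aggregated(weights, grads, lr=0.1, wd=1e-4, aggregate_num=4)
    print(f"{len(weights)} weights updated in {n} invocations")
```

The numerical result is the same as updating each weight separately; only the number of invocations changes, which is where the overhead described above would be saved.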
This PR is part of upstreaming improvements to MXNet that are available in NVIDIA's NGC 18.11 MXNet container. I will use results from that container to show the impact once all the other improvements are in place. The benchmark shown is ResNet v1.5 training on a single V100 32GB GPU in a DGX1-V, with batch size 32.