This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

NAG Optimizer with multi-precision support #14568

Merged
merged 3 commits into from May 30, 2019

Conversation

anirudhacharya
Member

@anirudhacharya anirudhacharya commented Mar 29, 2019

Description

NAG Optimizer with multi-precision support. Tests already exist for this.

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • C++ implementation of the NAG optimizer with multi-precision support.

@eric-haibin-lin @ptrendx

@anirudhacharya
Member Author

anirudhacharya commented Mar 29, 2019

Still need to add proper doc strings for the update functions. The PR can be reviewed; cpplint has failed, which I will fix.

@abhinavs95
Contributor

@mxnet-label-bot add [Optimizer, pr-awaiting-review]

@marcoabreu marcoabreu added Optimizer pr-awaiting-review PR is waiting for code review labels Mar 29, 2019
@ptrendx
Member

ptrendx commented Mar 29, 2019

Cool :-) - I actually wanted to do this after finishing AMP. Since you used the same general layout as I did for SGD - do you think it would be beneficial to generalize it a little bit (so that adding more optimizers like this is easier in the future)?

@anirudhacharya
Member Author

@ptrendx MP_NAG_InferType can be generalized right away. I will see what else can be generalized. Do you have any specific suggestions?

@lupesko
Contributor

lupesko commented Mar 29, 2019

@anirudhacharya very cool! Would be great if you could comment a bit on how/when NAG delivers better results compared to other optimizers.

@pengzhao-intel
Contributor

@anirudhacharya @lupesko it's a great feature which is also very useful for CPU BF16 :)

Feel free to ping me if anything needs our team to cover.
@ZhennanQin @TaoLv

@piyushghai
Contributor

@anirudhacharya Can you look into the CI failures on this one?

@anirudhacharya
Member Author

@lupesko
Nesterov Accelerated Gradient (NAG) is an improvement over the SGD with Momentum optimizer.

SGD with Momentum accelerates the optimizer's descent toward the desired minimum by adding a fraction of the previous time step's update to the current time step's update, which reduces oscillations in the parameter updates and lets the model converge faster.

But the larger parameter updates can also cause the optimizer to overshoot the global minimum. NAG fixes this by changing the update rule to decelerate the optimizer as it nears the minimum: the gradient is evaluated at a "look-ahead" position rather than at the current parameters. This has shown better performance when training RNNs, as described in https://arxiv.org/abs/1212.0901. The following diagram illustrates the difference -


[Diagram: Momentum vs. NAG update trajectories; image source: a Stack Exchange thread]
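
To make the difference concrete, here is a minimal sketch (plain Python, not the kernels in this PR) of the two update rules in the Sutskever et al. formulation on a toy 1-D quadratic loss; lr, momentum, and steps are illustrative values:

```python
# Toy loss f(w) = 0.5 * w**2, so grad(w) = w. Values are illustrative only.
def grad(w):
    return w

lr, momentum, steps = 0.1, 0.9, 50

# SGD with classical momentum: the velocity is built from the gradient
# at the current parameters and then applied.
w, v = 5.0, 0.0
for _ in range(steps):
    v = momentum * v - lr * grad(w)
    w = w + v
print("momentum:", w)

# NAG: the gradient is taken at the looked-ahead point w + momentum * v,
# which corrects the velocity and damps the update near the minimum.
w, v = 5.0, 0.0
for _ in range(steps):
    v = momentum * v - lr * grad(w + momentum * v)
    w = w + v
print("nesterov:", w)
```

The only difference between the two loops is where the gradient is evaluated; that single change is what produces the deceleration near the minimum described above.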

This PR also adds multi-precision support to the NAG optimizer, which is very useful when training in fp16: a multi-precision optimizer keeps a master copy of the weights in fp32 while the forward and backward passes run in fp16, applies the update to the fp32 copy, and casts it back to fp16. This prevents loss of model accuracy while giving significant savings in memory and training time. For more details on mixed precision training, please see here - https://devblogs.nvidia.com/mixed-precision-training-deep-neural-networks/
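
To make the master-copy idea concrete, here is a hedged NumPy-only sketch (not the actual kernels added in this PR; the helper name mp_momentum_step is made up for illustration): the fp16 weights used by the network are backed by an fp32 master copy, the momentum state and the update live in fp32, and the fp16 copy is refreshed by casting down after each step.

```python
import numpy as np

lr, momentum = 0.01, 0.9

weight16 = np.random.randn(8).astype(np.float16)  # weights the network computes with
weight32 = weight16.astype(np.float32)            # fp32 master copy kept by the optimizer
state32 = np.zeros_like(weight32)                 # momentum state, also fp32

def mp_momentum_step(grad16):
    """Illustrative multi-precision momentum step (name is hypothetical)."""
    global weight16
    g = grad16.astype(np.float32)             # up-cast the fp16 gradient
    state32[:] = momentum * state32 - lr * g  # accumulate the update in fp32
    weight32[:] += state32                    # apply it to the master weights
    weight16 = weight32.astype(np.float16)    # down-cast for the next fwd/bwd pass

mp_momentum_step(np.random.randn(8).astype(np.float16))
```

In the Python frontend this behavior corresponds to the optimizer's multi_precision flag; the NAG update itself additionally uses the look-ahead gradient shown earlier.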

src/operator/optimizer_op.cc: 4 review comment threads (outdated, resolved)
@anirudhacharya
Member Author

anirudhacharya commented May 6, 2019

@mxnet-label-bot update [pr-awaiting-merge]

@marcoabreu marcoabreu removed the pr-awaiting-review PR is waiting for code review label May 6, 2019
@marcoabreu marcoabreu added pr-awaiting-merge Review and CI is complete. Ready to Merge and removed Optimizer labels May 6, 2019
@pinaraws

@nswamy @sandeep-krishnamurthy @anirudh2290 - Please review and merge

@anirudh2290 anirudh2290 merged commit 50495d7 into apache:master May 30, 2019
@anirudhacharya anirudhacharya deleted the nag_mp branch May 30, 2019 22:17
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
* nag_mp

* doc

* reuse sgd updates where convenient