
Added in Large-Batch SGD with a warmup, and a LARS strategy. Also add… #8918

Merged
merged 5 commits into from
Jan 29, 2018

Conversation

ashokei
Contributor

@ashokei ashokei commented Dec 2, 2017

Large-Batch SGD with a warmup, and a LARS strategy.

Description

Added in Large-Batch SGD with a warmup, and a LARS strategy. Also added in a Polynomial Decay learning rate scheduler. Modified the example image fit code to allow these options to be selectable.

Checklist

Essentials

  • [x] Passed code style checking (make lint)
  • [x] Changes are complete (i.e. I finished coding on this PR)
  • [x] All changes have test coverage
  • [ ] For user-facing API changes, the API doc string has been updated. For new C++ functions in header files, their functionalities and arguments are well-documented.
  • [x] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Added in Large-Batch SGD with a warmup, and a LARS strategy. Also added in a Polynomial Decay learning rate scheduler. Modified the example image fit code to allow these options to be selectable.

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@piiswrong
Contributor

@zhreshold

@piiswrong piiswrong assigned zhreshold and unassigned zhreshold Dec 12, 2017
@eric-haibin-lin
Member

@zhreshold could you help review?

elif (strategy == 'power2'):
    mult = 1.0 + (maxmult - 1) * (nup * nup) / (nwup * nwup)
elif (strategy == 'power3'):
    mult = 1.0 + (maxmult - 1) * (nup * nup) / (nwup * nwup)
Member

Power3 is wrong
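
For reference, both branches in the snippet above use the square of the warmup fraction, which is why 'power3' is flagged. A minimal sketch of the presumed intent; the helper below is hypothetical, and the names nup, nwup, and maxmult follow the snippet under review (assumed to mean updates so far, total warmup updates, and the target multiplier):

def warmup_multiplier(strategy, nup, nwup, maxmult):
    """Hypothetical helper; not the PR's code."""
    frac = float(nup) / float(nwup)
    if strategy == 'linear':
        return 1.0 + (maxmult - 1) * frac
    elif strategy == 'power2':
        return 1.0 + (maxmult - 1) * frac * frac
    elif strategy == 'power3':
        # the branch flagged in review: cube instead of square
        return 1.0 + (maxmult - 1) * frac * frac * frac
    return 1.0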

@zhreshold
Member

@ashokei See comments. Could you rebase and run make pylint as well?

@ashokei
Contributor Author

ashokei commented Jan 12, 2018

@zhreshold thank you, will make requested changes.

@ashokei
Contributor Author

ashokei commented Jan 18, 2018

@zhreshold I made the requested changes, please review. Thank you.

Member

@zhreshold zhreshold left a comment

Please see comments. The rest LGTM and is good to be merged once this issue is resolved.

self.max_update = max_update
self.power = pwr
self.count = num_update
if num_update <= max_update:
Member

This duplicates line 173. I understand it is for resuming training, but that should be handled in __call__; see the example in MultiFactorScheduler. Therefore, num_update is not necessary in __init__.
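
A minimal sketch of the structure being suggested (assuming the LRScheduler base class from mxnet.lr_scheduler; this is not the PR's final code): the polynomial decay is driven entirely by the num_update passed to __call__, so resuming training needs no extra state in __init__.

from mxnet.lr_scheduler import LRScheduler

class PolySchedulerSketch(LRScheduler):
    """Hypothetical polynomial-decay scheduler, resume-friendly by design."""
    def __init__(self, max_update, base_lr=0.01, pwr=2):
        super(PolySchedulerSketch, self).__init__(base_lr)
        self.base_lr_orig = base_lr
        self.max_update = max_update
        self.power = pwr

    def __call__(self, num_update):
        # num_update comes from the trainer, so a resumed run simply starts
        # calling with larger values and the decay picks up where it left off
        if num_update <= self.max_update:
            self.base_lr = self.base_lr_orig * pow(
                1.0 - float(num_update) / float(self.max_update), self.power)
        return self.base_lr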

Contributor Author

@zhreshold I removed the duplicate num_update line, can you please check? Thanks.

@ashokei ashokei requested a review from szha as a code owner January 27, 2018 02:04

"""

def __init__(self, num_update, max_update, base_lr=0.01, pwr=2):
Member

num_update is not needed here

self.base_lr_orig = self.base_lr
self.max_update = max_update
self.power = pwr
self.count = num_update
Member

same for self.count

if num_update <= self.max_update:
    self.base_lr = self.base_lr_orig * pow(1.0 - float(num_update) / float(self.max_update),
                                            self.power)
self.count += 1
Member

and here

Contributor Author

@ashokei ashokei Jan 27, 2018

@zhreshold thanks. I see "count" is not being used, so I removed it. Though MultiFactorScheduler does seem to track count.

@zhreshold
Member

@szha test_io.test_LibSVMIter fails occasionally; we should host the data (http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/news20.t.bz2) to avoid that. Add it to #9412?

@zhreshold
Member

@ashokei Try rebasing and trigger the CI once more. We can merge once it passes.

chaseadams509 and others added 5 commits January 28, 2018 18:57
…ed in a Polynomial Decay learning rate scheduler. Modified the example image fit code to allow these options to be selectable.
@ashokei
Contributor Author

ashokei commented Jan 29, 2018

@zhreshold all done, thanks!

@zhreshold zhreshold merged commit 785690c into apache:master Jan 29, 2018
state = momentum * state + lr * rescale_grad * clip(grad, clip_gradient) + wd * weight
weight = weight - state

For details of the update algorithm see :class:`~mxnet.ndarray.lbsgd_update` and
Contributor

@ashokei @zhreshold
Where is lbsgd_update defined? I don't see it.

Please add a proper reference to the relevant paper.

Contributor Author

It is the update method in the LBSGD class; we can fix that to resolve to the right method, the '_' is misleading.
The paper is here:
https://arxiv.org/pdf/1708.03888.pdf
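
For context, the core of that paper (LARS) is a layer-wise "trust ratio" that rescales each layer's learning rate by the ratio of the weight norm to the gradient norm. A minimal NumPy sketch of that ratio, not the PR's implementation:

import numpy as np

def lars_local_lr(weight, grad, wd, eta=0.001, eps=1e-9):
    """Layer-wise LR scaling from You et al., arXiv:1708.03888 (sketch only).

    eta is the LARS trust coefficient and wd is the weight decay; eps guards
    against division by zero.
    """
    w_norm = np.linalg.norm(weight)
    g_norm = np.linalg.norm(grad)
    if w_norm > 0 and g_norm > 0:
        # trust ratio: ||w|| / (||g|| + wd * ||w||), scaled by eta
        return eta * w_norm / (g_norm + wd * w_norm + eps)
    return 1.0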

self.adaptive = False
self.admult = 1 # adaptation constant

def create_state(self, index, weight):
Contributor

Is this copied from SGD?
Why not inherit SGD instead?

Member

@ashokei As suggested, could you change it to inherit from SGD and override create_state_multi_precision, create_state, update, and update_multi_precision only where necessary? It looks like you are mixing the multi_precision part into the normal one.
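
A minimal sketch of the structure being suggested (the class name and constructor arguments here are hypothetical, not the PR's code): subclass SGD so its state creation and update paths are reused, and only layer the warmup / LARS logic on top.

from mxnet.optimizer import SGD, register

@register
class LargeBatchSGDSketch(SGD):
    """Hypothetical subclass illustrating the reviewer's suggestion."""
    def __init__(self, warmup_strategy='linear', **kwargs):
        super(LargeBatchSGDSketch, self).__init__(**kwargs)
        self.warmup_strategy = warmup_strategy

    def update(self, index, weight, grad, state):
        # apply any warmup / LARS rescaling to the learning rate here,
        # then defer to SGD for the actual (possibly multi-precision) update
        super(LargeBatchSGDSketch, self).update(index, weight, grad, state)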

warmup_strategy: string ('linear', 'power2', 'sqrt', 'lars'), default: 'linear'
warmup_epochs: unsigned, default: 5
batch_scale: unsigned, default: 1 (same as batch size*numworkers)
updates_per_epoch: updates_per_epoch (default: 32, Default might not reflect true number batches per epoch. Used for warmup.)
Contributor

Why use warmup_epochs and updates_per_epoch? Why not just warmup_updates?
Why should it have a default value?

Member

I guess it requires the epoch number to stop warming up, which does not depend on the number of updates.
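
If that is the intent, the warmup window in updates is presumably just the product of the two arguments; a tiny sketch of the assumed relation (the values are only the documented defaults):

warmup_epochs = 5          # documented default
updates_per_epoch = 32     # caller's estimate of batches per epoch
warmup_updates = warmup_epochs * updates_per_epoch   # warmup lasts ~160 updates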

warmup_epochs: unsigned, default: 5
batch_scale: unsigned, default: 1 (same as batch size*numworkers)
updates_per_epoch: updates_per_epoch (default: 32, Default might not reflect true number batches per epoch. Used for warmup.)
begin_epoch: unsigned, default 0, starting epoch.
Contributor

What's the starting epoch? What would it do before the start epoch?

Member

@ashokei please add more details describing the strategy.

@piiswrong, the begin_epoch flag is there because our data scientist saw that it was possible to stop training partway, save it, and then resume it again. We wanted to have that option, which required passing in the starting epoch so the learning rate decay calculations are done correctly.
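
Based on the parameters listed in the docstring above, a resumed run might construct the optimizer roughly like this (a sketch only; the values are illustrative):

import mxnet as mx

# resuming from epoch 10: warmup (5 epochs) is already over, and the
# decay schedule continues from the correct point instead of restarting
opt = mx.optimizer.create(
    'lbsgd',
    learning_rate=0.1,
    warmup_epochs=5,
    updates_per_epoch=32,   # caller's estimate of batches per epoch
    begin_epoch=10,
)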

larroy pushed a commit to larroy/mxnet that referenced this pull request Jan 31, 2018
apache#8918)

* Added in Large-Batch SGD with a warmup, and a LARS strategy. Also added in a Polynomial Decay learning rate scheduler. Modified the example image fit code to allow these options to be selectable.

* Fix pylint issues

* pylint fixes

* remove duplicate num_update

* remove unused count
@chenchao50

Have you tested the accuracy of ResNet-50 (or AlexNet-BN) on ImageNet using the 'lars' method?

@rahul003
Member

rahul003 commented May 10, 2018

@ashokei Similar question as above. How do I use this? I'm unable to train resnet50 with lbsgd. What configuration works? I have an effective batch size of about 20k across 20 worker nodes

initializer = mx.init.Xavier(
    rnd_type='gaussian', factor_type="in", magnitude=2)
# A limited number of optimizers have a warmup period
has_warmup = {'lbsgd', 'lbnag'}
Member

It looks like LBNAG doesn't exist?

Contributor Author

@chaseadams509 can you please address the above comments? Thanks.


Correct, lbnag was an optimizer we were experimenting with, incorporating the large-batch algorithm with Nesterov accelerated gradient. LBNAG was not showing the desired improvements, so we haven't pushed it yet.

Member

@chaseadams509 How large were the batch sizes you experimented with?

@ashokei ashokei deleted the lbsgd-optimizer branch May 10, 2018 23:12
rahul003 pushed a commit to rahul003/mxnet that referenced this pull request Jun 4, 2018
zheng-da pushed a commit to zheng-da/incubator-mxnet that referenced this pull request Jun 28, 2018
@ThomasDelteil
Contributor

ThomasDelteil commented Jul 5, 2018

@ashokei were the comments above ever addressed in a follow-up pull request?
What is this comment about: https://github.com/apache/incubator-mxnet/blame/master/python/mxnet/optimizer.py#L734 ?

Has multiply been defined anywhere: https://github.com/apache/incubator-mxnet/blame/master/python/mxnet/optimizer.py#L769 ?

See:
#11278
