This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

the grad of lars should be scaled in lbsgd #15102

Open
starimpact opened this issue May 30, 2019 · 7 comments

Comments

@starimpact
Contributor

starimpact commented May 30, 2019

    def _get_lars(self, weight, g, wd):
        """Returns a scaling factor for the learning rate for this layer
        default is 1
        """
        weight2 = self._l2norm(weight)
        grad2 = self._l2norm(g)

        grad2 = grad2 * (self.rescale_grad ** 2)  # suggested addition: use the rescaled gradient's norm

        lars = math.sqrt(weight2 / (grad2 + wd * weight2 + 1e-18))
        if lars < 0.01:
            lars = 0.01
        elif lars > 100:
            lars = 100
        return lars
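
For context, the optimizer multiplies gradients by rescale_grad (typically 1/batch_size in data-parallel training) before applying the update, so the norm used in the LARS ratio should be taken on the rescaled gradient; since grad2 is a sum of squares, the factor is rescale_grad**2. A minimal numpy sketch, illustrative only and not the library code, of how the rescaling changes the ratio:

    import math
    import numpy as np

    # Illustrative sketch only. Assumes _l2norm returns a sum of squares,
    # as the names weight2/grad2 and the final sqrt in _get_lars suggest.
    def lars_ratio(weight, grad, wd, rescale_grad=1.0, eps=1e-18):
        weight2 = float(np.sum(weight * weight))                 # ||w||^2
        grad2 = float(np.sum(grad * grad)) * rescale_grad ** 2   # ||rescale_grad * g||^2
        lars = math.sqrt(weight2 / (grad2 + wd * weight2 + eps))
        return min(max(lars, 0.01), 100)                         # same clipping as above

    w = np.random.randn(4096).astype('float32')
    g = np.random.randn(4096).astype('float32')
    print(lars_ratio(w, g, wd=1e-4))                      # without rescaling
    print(lars_ratio(w, g, wd=1e-4, rescale_grad=1/256))  # with the suggested rescaling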
@mxnet-label-bot
Contributor

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: Bug

@frankfliu
Contributor

@mxnet-label-bot add [operator, bug]

@abhinavs95
Contributor

Hi @starimpact, could you provide some more info, such as a brief description of the problem and a minimal reproducible example?

@abhinavs95
Contributor

@mxnet-label-bot add [Pending Requester Info]

@lanking520
Member

lanking520 commented Jul 17, 2019

The user points to a valid location:

python/mxnet/optimizer/optimizer.py

Please track this file for further investigation.

@starimpact could you please provide more information about why this change is necessary?

@anirudhacharya
Member

@starimpact please try this optimizer https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/optimizer/optimizer.py#L788 and close this issue if your concern is addressed. lbsgd is likely to be deprecated.
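
For anyone landing here, a minimal sketch of switching to that optimizer, assuming the linked LARS class is registered under the name 'lars' and accepts the usual learning_rate/momentum keyword arguments (the tiny net below is a hypothetical placeholder; check the linked file for the exact signature on your MXNet version):

    import mxnet as mx
    from mxnet import gluon

    # Hypothetical placeholder model; any Gluon network would do.
    net = gluon.nn.Dense(10)
    net.initialize()

    # Assumption: the LARS optimizer linked above is registered as 'lars'
    # and accepts these keyword arguments.
    trainer = gluon.Trainer(net.collect_params(), 'lars',
                            {'learning_rate': 0.1, 'momentum': 0.9})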

@starimpact
Contributor Author

_l2norm is time-consuming for a large parameter, so I suggest changing it to:

    def _l2norm(self, v):
        "inner product implementation"
        # subsample very large parameters so the norm stays cheap to compute
        v = v.reshape(-1)
        if len(v) > 100000:
            step = len(v) // 100000 + 1   # integer step, otherwise slicing fails
            v = v[::step]
        norm = multiply(v, v).asnumpy().sum()
        norm = math.sqrt(norm)
        return norm
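
A rough way to see what the strided sampling trades away, using a hypothetical numpy stand-in for the NDArray code above: the sampled sum of squares is about step times smaller than the full one, so the sampled norm underestimates the full norm by roughly sqrt(step) unless it is rescaled.

    import math
    import numpy as np

    # Hypothetical numpy stand-in for the subsampled norm above.
    v = np.random.randn(1_000_000).astype('float32')

    full_norm = math.sqrt(float(np.sum(v * v)))

    step = len(v) // 100000 + 1
    sample = v[::step]
    sampled_norm = math.sqrt(float(np.sum(sample * sample)))

    print(full_norm)                        # exact l2 norm
    print(sampled_norm)                     # strided estimate, ~sqrt(step) smaller
    print(sampled_norm * math.sqrt(step))   # rescaled estimate, close to the exact norm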
