
L2 regularization seems to be reduplicated for FTRL optimization #223

Closed
matricer opened this issue Feb 22, 2019 · 7 comments

@matricer

L2 regularization seems to be duplicated in the FTRL optimizer. Take LR as an example.
[screenshot: xLearn FTRL update code for LR, with an L2 term added to the gradient]
The proximal operator in FTRL already covers the L2 regularization, so the earlier L2 term in the gradient appears to be redundant. FM and FFM have the same problem.
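To make the concern concrete, here is a minimal sketch of the questioned pattern (not the actual xLearn source; the names `lr_gradient_with_weight_decay` and `l2_reg` are my own):

```python
import numpy as np

def lr_gradient_with_weight_decay(w, x, y, l2_reg):
    """Logistic-regression gradient with a shrinkage-type L2 term folded in.

    This mirrors the questioned pattern: an L2 term is added to the gradient
    here, even though the FTRL proximal step later applies an L2 penalty again.
    """
    p = 1.0 / (1.0 + np.exp(-np.dot(w, x)))  # predicted probability
    return (p - y) * x + l2_reg * w          # data gradient + weight-decay L2

# Tiny usage example with made-up numbers.
w = np.array([0.5, -0.2])
x = np.array([1.0, 2.0])
print(lr_gradient_with_weight_decay(w, x, y=1.0, l2_reg=0.1))
```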

@aksnzhy
Owner

aksnzhy commented Feb 22, 2019

@matricer Thanks for reporting this issue. I will check it as soon as possible.

@etveritas
Collaborator

@matricer
This is the pseudocode in the paper Ad Click Prediction: a View from the Trenches:
[screenshot: FTRL-Proximal pseudocode from the paper]
I think line 141 calculates the gradient with L2 regularization, and lines 150-152 update the parameter w.
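For reference, here is a minimal Python sketch of the per-coordinate FTRL-Proximal update in the style of that paper's pseudocode (an illustration only, not the xLearn code; the hyper-parameter names follow the paper):

```python
import math

def ftrl_proximal_update(z, n, g, alpha, beta, lambda1, lambda2):
    """Per-coordinate FTRL-Proximal update in the style of Algorithm 1.

    z, n : accumulated state for this coordinate
    g    : plain loss gradient for this coordinate, e.g. (p_t - y_t) * x_{t,i};
           note that no L2 term is added to it here.
    Returns the updated (z, n, w).
    """
    # Closed-form proximal step: both the L1 and the L2 penalties enter here.
    if abs(z) <= lambda1:
        w = 0.0
    else:
        w = -(z - math.copysign(lambda1, z)) / ((beta + math.sqrt(n)) / alpha + lambda2)

    # Accumulator updates (sigma implements the per-coordinate learning rate).
    sigma = (math.sqrt(n + g * g) - math.sqrt(n)) / alpha
    z += g - sigma * w
    n += g * g
    return z, n, w

# Example call with made-up values.
print(ftrl_proximal_update(z=0.0, n=0.0, g=0.3, alpha=0.1, beta=1.0, lambda1=0.0, lambda2=0.1))
```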

@matricer
Author

@etveritas,
you can see in the pseudocode:
p_t = \sigma(x_t * w)
g_i = (p_t - y_t) x_t
There is no L2 regularization attached to the gradient computation, because the following proximal operator (the update of w_{t,i}) already covers both the L1 and the L2 regularization.
In fact, there are two ways to implement L2 regularization: weight decay and the proximal operator. Weight decay is also called shrinkage-type L2 (the addition of an L2 penalty to the loss function); the proximal operator is called online L2 (the L2 penalty given in the paper above).
You can also refer to the MXNet or TensorFlow FTRL implementations.
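As a side-by-side illustration of the two flavors, here is a minimal sketch that keeps them as separate hyper-parameters (my own naming, loosely following the TensorFlow knobs; not the xLearn or TensorFlow source, and the factor of 2 depends on how the penalty is written):

```python
import math

def ftrl_step_two_l2(z, n, w, g_loss, alpha, beta, lambda1, l2_online, l2_shrinkage):
    """FTRL step with the two L2 flavors kept separate (illustrative sketch).

    g_loss       : gradient of the data loss only
    l2_shrinkage : weight-decay style L2, folded into the gradient
    l2_online    : proximal-operator style L2, applied in the closed-form solve
    """
    g = g_loss + 2.0 * l2_shrinkage * w  # shrinkage-type L2 (weight decay)
    sigma = (math.sqrt(n + g * g) - math.sqrt(n)) / alpha
    z += g - sigma * w
    n += g * g
    if abs(z) <= lambda1:
        w_new = 0.0
    else:
        w_new = -(z - math.copysign(lambda1, z)) / ((beta + math.sqrt(n)) / alpha + 2.0 * l2_online)
    return z, n, w_new

# Setting l2_shrinkage equal to l2_online reproduces the duplication discussed here.
print(ftrl_step_two_l2(z=0.0, n=0.0, w=0.5, g_loss=0.3, alpha=0.1, beta=1.0,
                       lambda1=0.0, l2_online=0.1, l2_shrinkage=0.1))
```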

@etveritas
Collaborator

@matricer I see what you mean. I found this comment in TensorFlow, which says:

See this paper. This version has support for both online L2 (the L2 penalty given in the paper above) and shrinkage-type L2 (which is the addition of an L2 penalty to the loss function).

I guess xLearn does the same as TensorFlow: both use online L2 and shrinkage-type L2, but in xLearn these two L2 terms share the same value.

@matricer
Author

matricer commented Feb 26, 2019

@etveritas I get your idea. In TensorFlow, in the absence of L1 regularization, the FTRL update gives:
w_{t+1} = w_t - lr_t / (1 + 2 * L2 * lr_t) * g_t - 2 * L2_shrinkage * lr_t / (1 + 2 * L2 * lr_t) * w_t
where L2 is the online L2 and L2_shrinkage is the shrinkage-type L2. In xLearn, without L1 regularization, the FTRL update gives:
w_{t+1} = w_t - lr_t / (1 + 2 * L2 * lr_t) * g_t - 2 * L2 * lr_t / (1 + 2 * L2 * lr_t) * w_t
i.e. the same L2 value is used for both terms.
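A quick numeric check of the two formulas (a sketch with made-up values for lr_t, g_t, w_t) shows that the xLearn update is exactly the TensorFlow update with L2_shrinkage implicitly set equal to L2:

```python
def tf_style_update(w, g, lr, l2, l2_shrinkage):
    # w_{t+1} = w_t - lr/(1 + 2*L2*lr) * g - 2*L2_shrinkage*lr/(1 + 2*L2*lr) * w
    denom = 1.0 + 2.0 * l2 * lr
    return w - lr / denom * g - 2.0 * l2_shrinkage * lr / denom * w

def xlearn_style_update(w, g, lr, l2):
    # Same formula, but the single L2 value plays both roles.
    return tf_style_update(w, g, lr, l2, l2_shrinkage=l2)

w_t, g_t, lr_t, L2 = 0.5, 0.3, 0.1, 0.1

print(tf_style_update(w_t, g_t, lr_t, L2, l2_shrinkage=0.0))  # online L2 only
print(xlearn_style_update(w_t, g_t, lr_t, L2))                # L2 applied in both places
```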

@etveritas
Collaborator

@matricer yep.

@matricer
Author

@aksnzhy @etveritas thanks~
