Adding Proximal Gradient Descent #4848
Conversation
… proximal_gradient_descent
LGTM. Thank you.
paddle/operators/proximal_gd_op.cc
Outdated
: OpProtoAndCheckerMaker(proto, op_checker) {
  AddInput("Param",
           "(Tensor, default Tensor<float>) "
           "Input parameter value that has to be updated");
I think it is better to add periods at the end of the comments. The same applies below.
Sure, will do.
@@ -0,0 +1,20 @@
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
The indentation of the license header has a problem. The correct indentation is like that in proximal_gd_op.cc.
param_out->mutable_data<T>(ctx.GetPlace());

auto grad = ctx.Input<Tensor>("Grad");
auto*

Actually, using auto or auto* is not consistent in our code: some operators implemented very early use auto, and some newly implemented operators use auto* when the return value is a pointer. I prefer to use auto*, or at least we can keep it consistent inside one operator.
I see. I used auto since the variable is used in two places:

auto g = EigenVector<T>::Flatten(*grad);
Eigen::DSizes<int, 1> grad_dsize(grad->numel());

so I could reuse the computation.
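For reference, the same lines with the suggested auto* (a sketch assembled from the snippets quoted in this thread, assuming the surrounding Paddle kernel context):

auto* grad = ctx.Input<Tensor>("Grad");  // pointer return value: auto* makes this explicit
auto g = EigenVector<T>::Flatten(*grad); // both later uses work unchanged
Eigen::DSizes<int, 1> grad_dsize(grad->numel());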
paddle/operators/proximal_gd_op.h
Outdated
auto grad = ctx.Input<Tensor>("Grad");

float l1 = ctx.Attr<float>("l1");
float l2 = ctx.Attr<float>("l2");
I think here it is better to use

auto l1 = static_cast<T>(ctx.Attr<float>("l1"));
auto l2 = static_cast<T>(ctx.Attr<float>("l2"));

or to set l1 and l2 as template parameters, as in the dropout operator, because l1 and l2 will be used in the calculation. If we use double in the future, the floats will be upconverted to doubles. But if we use half precision or some other lower precision, will the final results be as expected? So it may be safer to cast l1 and l2 to T.
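As a standalone illustration of the suggestion (hypothetical names; this is not the Paddle kernel), casting the float attributes to T once keeps all later arithmetic in the kernel's scalar type:

#include <algorithm>
#include <cstdio>

// Hypothetical sketch: cast float attributes to T once, so every
// subsequent operation is evaluated in T rather than mixing types.
template <typename T>
T ShrinkStep(T prox, float learning_rate_attr, float l1_attr) {
  const T lr = static_cast<T>(learning_rate_attr);
  const T l1 = static_cast<T>(l1_attr);
  return std::max(prox - lr * l1, T(0));  // pure-T arithmetic throughout
}

int main() {
  std::printf("%f\n", ShrinkStep<double>(0.5, 0.01f, 0.1f));  // prints 0.499000
  return 0;
}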
… proximal_gradient_descent
LGTM, thank you.
Adding proximal gradient descent and test:

prox_param = param - learning_rate * grad
param = sign(prox_param) / (1 + learning_rate * l2) *
        max{ |prox_param| - learning_rate * l1, 0 }
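For readers following along, here is a minimal standalone C++ sketch of this element-wise update (an illustration of the formula above, not the actual Paddle operator; the function name and raw-pointer interface are assumptions):

#include <algorithm>
#include <cmath>
#include <cstddef>

// Illustrative proximal gradient descent step applied element-wise.
// The function name and raw-pointer interface are assumptions for this
// sketch; the real operator works on Paddle tensors via Eigen.
template <typename T>
void ProximalGDStep(T* param, const T* grad, std::size_t n,
                    T learning_rate, T l1, T l2) {
  for (std::size_t i = 0; i < n; ++i) {
    // prox_param = param - learning_rate * grad
    const T prox = param[i] - learning_rate * grad[i];
    const T sign = prox > T(0) ? T(1) : (prox < T(0) ? T(-1) : T(0));
    // param = sign(prox_param) / (1 + learning_rate * l2)
    //         * max{ |prox_param| - learning_rate * l1, 0 }
    param[i] = sign / (T(1) + learning_rate * l2) *
               std::max(std::abs(prox) - learning_rate * l1, T(0));
  }
}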