
Adding Proximal Gradient Descent #4848

Merged

Conversation

kavyasrinet commented:

Adding proximal gradient descent and test:
prox_param = param - learning_rate * grad
param = sign(prox_param) / (1 + learning_rate * l2) *
        max { |prox_param| - learning_rate * l1, 0 }
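
(A minimal element-wise sketch of this update in plain C++, using std::vector buffers instead of the operator's Tensor inputs; the function and variable names here are illustrative, not taken from the PR.)

    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Applies the proximal gradient descent update described above, in place.
    void ProximalGDUpdate(std::vector<float>& param, const std::vector<float>& grad,
                          float learning_rate, float l1, float l2) {
      for (std::size_t i = 0; i < param.size(); ++i) {
        // Plain gradient step: prox_param = param - learning_rate * grad.
        float prox = param[i] - learning_rate * grad[i];
        // Soft-thresholding for the L1 term, scaling for the L2 term.
        float sign = (prox > 0.0f) ? 1.0f : (prox < 0.0f) ? -1.0f : 0.0f;
        float shrink = std::max(std::fabs(prox) - learning_rate * l1, 0.0f);
        param[i] = sign / (1.0f + learning_rate * l2) * shrink;
      }
    }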

lcy-seso (Contributor) previously approved these changes on Oct 18, 2017

@lcy-seso left a comment:

LGTM. Thank you.

      : OpProtoAndCheckerMaker(proto, op_checker) {
    AddInput("Param",
             "(Tensor, default Tensor<float>) "
             "Input parameter value that has to be updated");
Contributor:

I think it is better to add a period at the end of the comment. The same applies below.
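
For example, the "Param" docstring above would then read:

    AddInput("Param",
             "(Tensor, default Tensor<float>) "
             "Input parameter value that has to be updated.");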

Author:

Sure, will do.

@@ -0,0 +1,20 @@
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Contributor:

The indentation of the license header has a problem. The correct indentation is like that in proximal_gd_op.cc.


param_out->mutable_data<T>(ctx.GetPlace());

auto grad = ctx.Input<Tensor>("Grad");
Contributor:

auto*

Actually, the use of auto vs. auto* is not consistent in our code. Some operators implemented very early use auto, and some newly implemented operators use auto* when the return value is a pointer.

I prefer auto*, or at least we should keep it consistent within one operator.

Author:

I see. I used auto since the variable is used in two places:

  1. auto g = EigenVector<T>::Flatten(*grad);
  2. Eigen::DSizes<int, 1> grad_dsize(grad->numel());

so I could reuse it for both.
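
(A short sketch of how the reviewer's auto* suggestion would look while keeping the single shared variable; the two uses are the ones listed above.)

    auto* grad = ctx.Input<Tensor>("Grad");           // pointer type made explicit
    auto g = EigenVector<T>::Flatten(*grad);          // use 1: element-wise view
    Eigen::DSizes<int, 1> grad_dsize(grad->numel());  // use 2: flat size of the gradient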

auto grad = ctx.Input<Tensor>("Grad");

float l1 = ctx.Attr<float>("l1");
float l2 = ctx.Attr<float>("l2");
@lcy-seso (Contributor) commented on Oct 18, 2017:

I think here it is better to use

auto l1 = static_cast<T>(ctx.Attr<float>("l1"));
auto l2 = static_cast<T>(ctx.Attr<float>("l2"));

or make l1 and l2 template parameters, as in the dropout operator,

because l1 and l2 will be used in the calculation. If we switch to double in the future, the floats will be promoted to doubles. But if we use half precision or another lower precision, I am not sure the final results will be as expected, so it may be safer to cast l1 and l2 to T.

https://stackoverflow.com/questions/4239770/what-are-the-rules-governing-c-single-and-double-precision-mixed-calculations
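
(A standalone sketch, not the PR's code, of the concern: in a mixed expression the usual arithmetic conversions compute the result in the wider of T and float, so casting the attribute to T makes the intended precision explicit. The helper below is hypothetical.)

    template <typename T>
    T ApplyL1(T x, float l1_attr) {
      // Without the cast, "x * l1_attr" with T = double promotes the float
      // attribute to double; with a lower-precision T the product would be
      // computed in float and only then narrowed back to T.
      T l1 = static_cast<T>(l1_attr);
      return x * l1;
    }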

@lcy-seso (Contributor) left a comment:

LGTM, thank you.

kavyasrinet merged commit c10b8e8 into PaddlePaddle:develop on Oct 18, 2017
kavyasrinet deleted the proximal_gradient_descent branch on November 1, 2017