Adding Proximal Gradient Descent #4848
Conversation
… proximal_gradient_descent
LGTM. Thank you.
paddle/operators/proximal_gd_op.cc
Outdated
: OpProtoAndCheckerMaker(proto, op_checker) {
  AddInput("Param",
           "(Tensor, default Tensor<float>) "
           "Input parameter value that has to be updated");
I think it is better to add periods at the end of the comments. The same applies below.
Sure, will do.
@@ -0,0 +1,20 @@
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
The indentation of the license header has a problem. The correct indentation is like that in proximal_gd_op.cc.
param_out->mutable_data<T>(ctx.GetPlace());

auto grad = ctx.Input<Tensor>("Grad");
auto*

Actually, using auto or auto* is not consistent in our code: some operators implemented very early use auto, and some newly implemented operators use auto* when the return value is a pointer. I prefer to use auto*, or at least we can keep it consistent inside one operator.
I see. I used auto since the variable is used in two places:

auto g = EigenVector<T>::Flatten(*grad);
Eigen::DSizes<int, 1> grad_dsize(grad->numel());

so I could reuse the computation.
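For reference, the same lines with the suggested auto* (a sketch assembled from the snippets quoted in this thread, assuming the surrounding Paddle kernel context):

auto* grad = ctx.Input<Tensor>("Grad");  // pointer return value: auto* makes this explicit
auto g = EigenVector<T>::Flatten(*grad); // both later uses work unchanged
Eigen::DSizes<int, 1> grad_dsize(grad->numel());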
paddle/operators/proximal_gd_op.h
Outdated
auto grad = ctx.Input<Tensor>("Grad");

float l1 = ctx.Attr<float>("l1");
float l2 = ctx.Attr<float>("l2");
I think here it is better to use

auto l1 = static_cast<T>(ctx.Attr<float>("l1"));
auto l2 = static_cast<T>(ctx.Attr<float>("l2"));

or to set l1 and l2 as template parameters, as in the dropout operator, because l1 and l2 will be used in the calculation. If we use double in the future, the floats will be upconverted to doubles. But if we use half precision or some other lower precision, will the final results be as expected? So it may be safer to cast l1 and l2 to T.
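As a standalone illustration of the suggestion (hypothetical names; this is not the Paddle kernel), casting the float attributes to T once keeps all later arithmetic in the kernel's scalar type:

#include <algorithm>
#include <cstdio>

// Hypothetical sketch: cast float attributes to T once, so every
// subsequent operation is evaluated in T rather than mixing types.
template <typename T>
T ShrinkStep(T prox, float learning_rate_attr, float l1_attr) {
  const T lr = static_cast<T>(learning_rate_attr);
  const T l1 = static_cast<T>(l1_attr);
  return std::max(prox - lr * l1, T(0));  // pure-T arithmetic throughout
}

int main() {
  std::printf("%f\n", ShrinkStep<double>(0.5, 0.01f, 0.1f));  // prints 0.499000
  return 0;
}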
… proximal_gradient_descent
LGTM, thank you.
Adding proximal gradient descent and test:

prox_param = param - learning_rate * grad
param = sign(prox_param) / (1 + learning_rate * l2) *
        max{ |prox_param| - learning_rate * l1, 0 }
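For readers following along, here is a minimal standalone C++ sketch of this element-wise update (an illustration of the formula above, not the actual Paddle operator; the function name and raw-pointer interface are assumptions):

#include <algorithm>
#include <cmath>
#include <cstddef>

// Illustrative proximal gradient descent step applied element-wise.
// The function name and raw-pointer interface are assumptions for this
// sketch; the real operator works on Paddle tensors via Eigen.
template <typename T>
void ProximalGDStep(T* param, const T* grad, std::size_t n,
                    T learning_rate, T l1, T l2) {
  for (std::size_t i = 0; i < n; ++i) {
    // prox_param = param - learning_rate * grad
    const T prox = param[i] - learning_rate * grad[i];
    const T sign = prox > T(0) ? T(1) : (prox < T(0) ? T(-1) : T(0));
    // param = sign(prox_param) / (1 + learning_rate * l2)
    //         * max{ |prox_param| - learning_rate * l1, 0 }
    param[i] = sign / (T(1) + learning_rate * l2) *
               std::max(std::abs(prox) - learning_rate * l1, T(0));
  }
}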