Fix some inconsistency in gradient variable names. #4331
Conversation
The Travis failure seems unrelated to the PR.
jenkins, test this please.
Thank you for the PR. I left two comments.
chainer/functions/activation/tanh.py (outdated)

```python
y_mul_ggx = y * ggx
gx = -2 * gy * y_mul_ggx
ggy = ggx - y * y_mul_ggx
return gx, ggy
```
Could you keep the name `grad_y`, since it's not a gradient of `x`?
Yes. Oh, `TanhGrad` is defined as a function of `y`, not `x`!
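For readers following along, the point of the comment can be checked numerically. Below is a minimal standalone sketch (not Chainer's code; `tanh_grad` is a hypothetical helper): `TanhGrad`'s forward computes `gx = gy * (1 - y**2)` with `y = tanh(x)`, so its backward, given `ggx` (the gradient w.r.t. `gx`), produces gradients w.r.t. `y` and `gy`, matching the two expressions in the diff above.

```python
def tanh_grad(y, gy):
    # Forward of TanhGrad: gradient of tanh expressed via y = tanh(x).
    return gy * (1 - y ** 2)

y, gy, ggx = 0.3, 0.7, 1.1
eps = 1e-6

# Gradient w.r.t. y, contracted with ggx: should equal -2 * gy * (y * ggx),
# i.e. the `gx` (really grad_y) returned in the diff above.
num_y = (tanh_grad(y + eps, gy) - tanh_grad(y - eps, gy)) / (2 * eps) * ggx
assert abs(num_y - (-2 * gy * (y * ggx))) < 1e-6

# Gradient w.r.t. gy, contracted with ggx: should equal ggx - y * (y * ggx),
# i.e. the `ggy` returned in the diff above.
num_gy = (tanh_grad(y, gy + eps) - tanh_grad(y, gy - eps)) / (2 * eps) * ggx
assert abs(num_gy - (ggx - y * (y * ggx))) < 1e-6
```

Both finite-difference checks agree with the closed forms, confirming that the first returned value is a gradient with respect to `y`, not `x`.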
```diff
- g, = grad_outputs
- return LinearInterpolateGrad().apply((p, x, y, g))
+ gz, = grad_outputs
+ return LinearInterpolateGrad().apply((p, x, y, gz))
```

```python
class LinearInterpolateGrad(function_node.FunctionNode):
```
I'd like to use the same name for an input in `forward` and `backward`. (`g` vs `gz`)
Thank you for your review. I've made some changes according to your comments. Please confirm.
jenkins, test this please
LGTM
According to the source code and the slides about double backprop, it seems a common practice to use `gx` and `ggx` to represent the gradients w.r.t. `x` and `gx`, respectively. However, some variables have names that contradict this convention, and new users may get confused by it. I hope they will be fixed by this PR.