[NNVM] Fix grads for sum and expand_like #1455
Conversation
Strange, it looks like the rounding function works differently on GPU and CPU, and I'm not sure whether this is connected to the changes or just a coincidence. Is it possible to rerun the tests?
@kazum @kevinthesun @srkreddy1238, can you please review this PR?
nnvm/python/nnvm/testing/gradient.py
Outdated
from .. import symbol
from .. import graph


def check_gradients_numeric(y, inputs=None, dtype='float32', print_graph=False,
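For context, a numeric gradient check perturbs each input element and compares a central finite difference against the symbolic gradient. Below is a minimal standalone sketch of the idea in plain numpy; numeric_grad is a hypothetical helper for illustration, not the nnvm interface whose signature appears above.

import numpy as np

def numeric_grad(f, x, eps=1e-4):
    # Central-difference estimate of df/dx, elementwise.
    # f maps a numpy array to a scalar; x is perturbed in place
    # through the ravel() view and restored after each evaluation.
    grad = np.zeros_like(x)
    flat_x, flat_g = x.ravel(), grad.ravel()
    for i in range(flat_x.size):
        orig = flat_x[i]
        flat_x[i] = orig + eps
        f_plus = f(x)
        flat_x[i] = orig - eps
        f_minus = f(x)
        flat_x[i] = orig
        flat_g[i] = (f_plus - f_minus) / (2 * eps)
    return grad

# Example: the gradient of sum(x**2) is 2*x.
x = np.random.randn(2, 3)
assert np.allclose(numeric_grad(lambda a: (a ** 2).sum(), x), 2 * x, atol=1e-3)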
Do we really need to implement another gradient test interface? Since there is already a backward interface for testing operator gradients, can we reuse or enhance that interface?
Do you mean the helper function from test_top_level1.py? Yes, I think it's a good idea to move this helper function into nnvm.testing and improve it with the ability to compute gradients numerically. I'll try to do this in a couple of days.
Yes. That would be great.
#1460 may be able to fix the CI error.
nnvm/src/top/tensor/reduce.cc
Outdated
return std::vector<NodeEntry>{
    MakeNode("expand_like", n->attrs.name + "_grad",
             {ograds[0], n->inputs[0]},
             {{"axis", axis.str()},
              {"keepdims", std::to_string(param.keepdims)},
              {"exclude", std::to_string(param.exclude)}})
Passing keepdims to expand_like looks a bit strange, since expand_like is not actually a reduce operation. Whether the parameter is specified or not, the output dimension does not change.
An alternative I came up with is squeezing the output of sum beforehand, as follows. This might be less efficient, but it makes it easier to understand what we want to do.
if (param.keepdims) {
  NodeEntry squeezed = MakeNode("squeeze", n->attrs.name + "_grad_sqz",
                                {ograds[0]},
                                {{"axis", axis.str()}});
  return std::vector<NodeEntry>{
      MakeNode("expand_like", n->attrs.name + "_grad",
               {squeezed, n->inputs[0]},
               {{"axis", axis.str()},
                {"exclude", std::to_string(param.exclude)}})};
}
Well, expand_like is not a reduction operation, but it might be considered dual to reduction: when given the same parameters as a reduction operation, it expands exactly the same dimensions as the reduction operation reduces.
Your solution would require adding an exclude parameter to the squeeze operation; otherwise it won't work when exclude=True. Another approach, probably even less efficient but reliable, is to use any reduction operator instead of squeeze.
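To make the exclude point concrete, here is a plain-numpy sketch (reduce_axes is a hypothetical helper, not nnvm code): with exclude=True the reduction covers the complement of axis, so a squeeze over axis alone would target the wrong dimensions.

import numpy as np

def reduce_axes(ndim, axis, exclude=False):
    # With exclude=True the reduction applies to every axis NOT
    # listed in `axis`, which is why a plain squeeze over `axis`
    # is wrong when exclude=True.
    if exclude:
        return tuple(i for i in range(ndim) if i not in axis)
    return tuple(axis)

x = np.ones((2, 3, 4))
assert x.sum(axis=reduce_axes(x.ndim, (1,))).shape == (2, 4)
assert x.sum(axis=reduce_axes(x.ndim, (1,), exclude=True)).shape == (3,)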
> Well, expand_like is not a reduction operation, but it might be considered dual to reduction: when given the same parameters as a reduction operation, it expands exactly the same dimensions as the reduction operation reduces.
I'd agree that those are dual, but I think it's not necessary to have the keepdims parameter in the expand operators. I'll send a comment about it later.
> Your solution would require adding an exclude parameter to the squeeze operation; otherwise it won't work when exclude=True.
Having the exclude option in squeeze sounds good to me. But if you'd rather not add it, I'm fine with broadcasting in expand_like. :)
nnvm/python/nnvm/top/transform.py
Outdated
@@ -15,6 +15,9 @@
@reg.register_compute("expand_like")
def compute_expand_like(attrs, inputs, _):
    """Compute definition of expand_like"""
    if attrs.get_bool("keepdims"):
We can replace this line with if len(inputs[0].shape) == len(inputs[1].shape):. I'd suggest removing keepdims from the expand_like arguments, since it only implies that inputs[0] was reduced with keepdims=True.
Looks elegant and works; I'll go with this one.
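As an illustration of the logic being agreed on here, a hypothetical plain-numpy sketch (expand_like_np is illustrative, not the actual topi compute): the rank comparison decides whether broadcasting alone suffices, and otherwise the reduced axes are reinserted first.

import numpy as np

def expand_like_np(data, template, axis, exclude=False):
    # If the ranks already match (the reduction used keepdims=True),
    # expand_like degenerates to plain broadcasting; otherwise the
    # reduced axes are reinserted before broadcasting.
    if len(data.shape) != len(template.shape):
        if exclude:
            axis = tuple(i for i in range(template.ndim) if i not in axis)
        for a in sorted(axis):
            data = np.expand_dims(data, a)
    return np.broadcast_to(data, template.shape)

x = np.ones((2, 3, 4))
assert expand_like_np(x.sum(axis=1), x, (1,)).shape == x.shape
assert expand_like_np(x.sum(axis=1, keepdims=True), x, (1,)).shape == x.shape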
Full exception: AttributeError: '<class 'tvm.tensor.PlaceholderOp'>' object has no attribute 'axis'
Merging numerical gradient testing into the existing testing functions turned out to be a bit more difficult than I thought, so I'll create a separate PR for it when it's finished. For now, this is just a fix for sum, expand_like, and their gradients in some special cases.
Added minor comments. Other parts look good to me.
nnvm/python/nnvm/top/transform.py
Outdated
@@ -15,6 +15,10 @@
@reg.register_compute("expand_like")
def compute_expand_like(attrs, inputs, _):
    """Compute definition of expand_like"""
    if len(inputs[0].shape) == len(inputs[1].shape):
        # If the shape is not changed then it is just a broadcasting
not shape but dimension?
@@ -391,11 +398,17 @@ def forward(x, y):

def backward(head_grads, x, y):
    odim = len(out_shape)

    keepdims = len(x.shape) == len(y.shape)
Remove a trailing space.
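For reference, the keepdims inferred above from rank equality is presumably used to sum head_grads back over the expanded axes. A hypothetical plain-numpy sketch of such a backward pass (expand_like_backward_np is illustrative, not the test code; exclude handling omitted for brevity):

import numpy as np

def expand_like_backward_np(head_grads, x, y, axis):
    # The gradient of expand_like w.r.t. x sums head_grads back over
    # the expanded axes; keepdims is inferred from rank equality,
    # mirroring the line in the diff above. The template y gets a
    # zero gradient.
    keepdims = len(x.shape) == len(y.shape)
    return [np.sum(head_grads, axis=axis, keepdims=keepdims),
            np.zeros_like(y)]

y = np.ones((2, 3, 4))
x = np.ones((2, 4))  # x was expanded along axis 1 to produce y's shape
gx, gy = expand_like_backward_np(np.ones_like(y), x, y, (1,))
assert gx.shape == x.shape and gy.shape == y.shape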
Looks good to me, thanks!
Thanks @sgrechanik-h @kazum @kevinthesun, this is now merged.
Add a function for testing gradients by comparing them with numerically computed gradients. It is used to check gradients for the sum operation.