This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MXNET-978] Higher Order Gradient Support reciprocal, abs. #15413

Merged
54 changes: 52 additions & 2 deletions src/operator/tensor/elemwise_unary_op_basic.cc
@@ -717,7 +717,38 @@ Example::

MXNET_OPERATOR_REGISTER_BINARY(_backward_reciprocal)
.set_attr<FCompute>("FCompute<cpu>",
ElemwiseBinaryOp::Compute<cpu, unary_bwd<mshadow_op::reciprocal_grad> >);
ElemwiseBinaryOp::Compute<cpu, unary_bwd<mshadow_op::reciprocal_grad> >)
.set_attr<nnvm::FGradient>("FGradient",
[](const nnvm::NodePtr& n, const std::vector<nnvm::NodeEntry>& ograds) {
// ograds[0]: dL/dxgrad
// inputs[0]: dL/dy
// inputs[1]: x
// f(x) = y = 1/x
// f'(x) = -1/x^2
// f''(x) = 2/x^3 = -2 * (f'(x) * f(x))

const std::unordered_map<std::string, std::string> args = {{"scalar", "-2.0"}};

auto dydx_mul_dldy = nnvm::NodeEntry{n}; // f'(x) * head_grads
auto dydx = MakeNode("elemwise_div", n->attrs.name + "_dydx",
{dydx_mul_dldy, n->inputs[0]}, nullptr, &n);

Review comment (Contributor):
Do we need to divide this explicitly here? I think the final _backward_grad_grad_input will also carry the head_grads term in its output, so we may not need this extra node?

Review comment (Contributor):
Now I see that you need this node for the first output, "_backward_grad_grad".
auto fx = MakeNode("reciprocal", n->attrs.name + "_fx",
{n->inputs[1]}, nullptr, &n);

Review comment (Contributor):
Small thing: could we get fx from the first backward (node->inputs) if we used ElemwiseGradUseInOut? I guess we would avoid the additional division if so.

Review comment (Contributor, author):
I don't think we can, as our _backward_reciprocal, which is binary, would then have to support 3 inputs.
https://github.com/apache/incubator-mxnet/blob/8ebaa5c0384ecbef244150859b3e24ea2f02095d/src/operator/elemwise_op_common.h#L213-L227

auto d2ydx2_mid = MakeNode("elemwise_mul", n->attrs.name + "_d2ydx2_mid",
{dydx_mul_dldy, nnvm::NodeEntry{fx}}, nullptr, &n);

auto d2ydx2 = MakeNode("_mul_scalar", n->attrs.name + "_d2ydx2",
{nnvm::NodeEntry{d2ydx2_mid}}, &args, &n);

std::vector<nnvm::NodeEntry> ret;

ret.emplace_back(MakeNode("elemwise_mul", n->attrs.name + "_backward_grad_grad",
{ograds[0], nnvm::NodeEntry{dydx}}, nullptr, &n));

Review comment (@larroy, Contributor, Jul 4, 2019):
Maybe a comment would help here; this one is the output corresponding to dL/dy from the first backward, right?
I'm still unclear, since the previous PRs, on what dL/dxgrad * dy/dx represents. To me it is not obvious even after spending more than half an hour thinking about it.
#15120

Review comment (Contributor, author):
Even I am not sure of its significance in the literature. But if you look at dL/dx = dL/dy * dy/dx as just c = a * b, then dc/da = b while dc/db = a.
So that is all I am computing here: how does dL/dy affect our dL/dx.

Review comment (Contributor):
This term will be useful when you calculate the third-order (and higher) gradients.
ret.emplace_back(MakeNode("elemwise_mul", n->attrs.name + "_backward_grad_grad_inp",
{ograds[0], nnvm::NodeEntry{d2ydx2}}, nullptr, &n));

Review comment (Contributor):
This seems ok.
return ret;
});
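A worked restatement, in LaTeX, of the relationship debated in the review thread above, using the same symbols as the code comments ($\bar{y}$ is the head gradient dL/dy, $\bar{x}$ the output of the first backward):

$$
\bar{x} = \bar{y}\, f'(x), \qquad f(x) = \frac{1}{x}, \quad f'(x) = -\frac{1}{x^{2}}, \quad f''(x) = \frac{2}{x^{3}}
$$

$$
\frac{\partial \bar{x}}{\partial \bar{y}} = f'(x)
\;\Rightarrow\; \texttt{ograds[0]} \cdot \texttt{dydx} \quad (\texttt{\_backward\_grad\_grad})
$$

$$
\frac{\partial \bar{x}}{\partial x} = \bar{y}\, f''(x) = -2\,\bigl(\bar{y}\, f'(x)\bigr)\, f(x)
\;\Rightarrow\; \texttt{ograds[0]} \cdot \texttt{d2ydx2} \quad (\texttt{\_backward\_grad\_grad\_inp})
$$

The last identity is where the scalar -2.0 in args comes from, and the first output is the term the reviewer notes only becomes relevant for third-order and higher gradients.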

// abs
MXNET_OPERATOR_REGISTER_UNARY_WITH_RSP_CSR(abs, cpu, mshadow_op::abs)
@@ -736,7 +767,26 @@ The storage type of ``abs`` output depends upon the input storage type:
)code" ADD_FILELINE)
.set_attr<nnvm::FGradient>("FGradient", ElemwiseGradUseIn{"_backward_abs"});

MXNET_OPERATOR_REGISTER_BINARY_WITH_SPARSE_CPU(_backward_abs, unary_bwd<mshadow_op::sign>);
MXNET_OPERATOR_REGISTER_BINARY_WITH_SPARSE_CPU(_backward_abs, unary_bwd<mshadow_op::sign>)
.set_attr<nnvm::FGradient>("FGradient",
[](const nnvm::NodePtr& n, const std::vector<nnvm::NodeEntry>& ograds) {
// ograds[0]: dL/dxgrad
// inputs[0]: dL/dy
// inputs[1]: x
// f(x) -> abs(x)
// f'(x) = 1 if x > 0 else -1
// f''(x) = 0
auto dydx = MakeNode("elemwise_div", n->attrs.name + "_dydx",
{nnvm::NodeEntry{n}, n->inputs[0]}, nullptr, &n);

std::vector<nnvm::NodeEntry> ret;
ret.emplace_back(MakeNode("elemwise_mul", n->attrs.name + "_backward_grad_grad",
{ograds[0], nnvm::NodeEntry(dydx)}, nullptr, &n));

Review comment (Contributor):
Same question as above.
ret.emplace_back(MakeNode("zeros_like", n->attrs.name + "_backward_grad_grad_in",
{n->inputs[1]}, nullptr, &n));

Review comment (Contributor):
Ok.
return ret;
});
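The same bookkeeping for abs, restated in LaTeX (for $x \neq 0$), which is why the second output can simply be zeros_like:

$$
f(x) = |x|, \quad f'(x) = \operatorname{sign}(x), \quad f''(x) = 0, \qquad \bar{x} = \bar{y}\, f'(x)
$$

$$
\frac{\partial \bar{x}}{\partial \bar{y}} = f'(x) = \frac{\bar{x}}{\bar{y}}
\;\Rightarrow\; \text{the } \texttt{elemwise\_div} \text{ node } \texttt{dydx},
\qquad
\frac{\partial \bar{x}}{\partial x} = \bar{y}\, f''(x) = 0
\;\Rightarrow\; \texttt{zeros\_like}
$$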


// sign
MXNET_OPERATOR_REGISTER_UNARY_WITH_RSP_CSR(sign, cpu, mshadow_op::sign)
27 changes: 27 additions & 0 deletions tests/python/unittest/test_higher_order_grad.py
@@ -107,6 +107,33 @@ def grad_grad_op(x):


@with_seed()
def test_reciprocal():
def reciprocal(x):
return nd.reciprocal(x)

def grad_grad_op(x):
return 2 / x**3

for dim in range(1, 5):
shape = rand_shape_nd(dim)
array = random_arrays(shape)
check_second_order_unary(array, reciprocal, grad_grad_op)


@with_seed()
def test_abs():
def abs(x):
return nd.abs(x)

def grad_grad_op(x):
return nd.zeros_like(x)

for dim in range(1, 5):
shape = rand_shape_nd(dim)
array = random_arrays(shape)
check_second_order_unary(array, abs, grad_grad_op)

Review comment (Contributor):
nit: please remove extra line

Review comment (Contributor, author):
It is fixed actually. I guess I removed the lower line so it is not showing up here.


def test_sigmoid():
def sigmoid(x):
return nd.sigmoid(x)
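
As a quick standalone sanity check of the new second-order gradient for reciprocal (not part of the PR's test file; it assumes an MXNet build that includes this change and uses only the public autograd API, mirroring what check_second_order_unary does internally):

import mxnet as mx
from mxnet import autograd, nd

x = nd.array([0.5, 2.0, 4.0])
x.attach_grad()
with autograd.record():
    y = nd.reciprocal(x)
    # Keep the first-order graph differentiable so it can be backpropagated again.
    x_grad = autograd.grad(y, x, create_graph=True, retain_graph=True)[0]
x_grad.backward()
print(x.grad)    # second-order gradient of 1/x w.r.t. x
print(2 / x**3)  # expected value, matching grad_grad_op in test_reciprocal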