
[MXNET-978] Higher Order Gradient Support reciprocal, abs. #15413

Merged

Conversation

kshitij12345 (Contributor):

Description

This PR adds support for higher order gradients for reciprocal and abs.

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (here MXNET-978), except for PRs with tiny changes
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, the expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Higher order gradient for reciprocal and abs.
  • Unit tests for the same (a sketch of the checking pattern follows).
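
For reference (not text from the PR): f(x) = 1/x has f''(x) = 2/x^3, while f(x) = |x| has f'(x) = sign(x) and f''(x) = 0 for x != 0. The sketch below shows how the new second-order gradients can be exercised through autograd; it mirrors the check_second_order_unary helper used by the unit tests, with illustrative values and names:

    from mxnet import nd, autograd

    x = nd.array([0.5, 2.0, -3.0])
    x.attach_grad()

    with autograd.record():
        y = nd.reciprocal(x)  # y = 1/x
        # First-order gradient, built as a differentiable graph
        # (create_graph=True) so it can be backpropagated through again;
        # head gradients default to ones.
        x_grad = autograd.grad(y, x, create_graph=True, retain_graph=True)[0]

    x_grad.backward()  # second backward pass
    # x.grad should now hold d2y/dx2 = 2 / x**3 (scaled by the all-ones head grads).
    print(x.grad, 2 / x**3)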

@kshitij12345 changed the title from "Higher Order Gradient Support reciprocal, abs." to "[MXNET-978] Higher Order Gradient Support reciprocal, abs." on Jun 29, 2019
const std::unordered_map<std::string, std::string> args = {{"scalar", "-2.0"}};

auto dydx_mul_dldy = nnvm::NodeEntry{n}; // f'(x) * head_grads
auto dydx = MakeNode("elemwise_div", n->attrs.name + "_dydx",

Contributor:

Do we need to divide explicitly here? I think the final _backward_grad_grad_input will also carry the head_grads term in the output, so we may not need this extra node?

Contributor:

Now I see that you need this node for the first output "_backward_grad_grad"
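
For context, my reading of the snippet above (a sketch, not text from the PR): the _backward_reciprocal node n produces dL/dx = dL/dy * f'(x) and receives dL/dy as n->inputs[0], so the elemwise_div recovers the bare f'(x):

    y = f(x) = \frac{1}{x}, \qquad
    \texttt{dydx\_mul\_dldy} = \frac{\partial L}{\partial y}\, f'(x), \qquad
    \texttt{dydx} = \frac{\texttt{dydx\_mul\_dldy}}{\partial L / \partial y} = f'(x) = -\frac{1}{x^{2}}

That bare f'(x) is what gets multiplied with ograds[0] for the first output, _backward_grad_grad.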

        return nd.reciprocal(x)

    def grad_grad_op(x):
        return 2/x**3

@apeforest (Contributor), Jul 1, 2019:

Add spaces around the /.

@apeforest (Contributor):

@larroy Please help review.

@anirudhacharya (Member):

@mxnet-label-bot add [pr-awaiting-review]

@marcoabreu added the pr-awaiting-review (PR is waiting for code review) label on Jul 1, 2019
shape = rand_shape_nd(dim)
array = random_arrays(shape)
check_second_order_unary(array, abs, grad_grad_op)

Contributor:

nit: please remove extra line

@larroy (Contributor), Jul 4, 2019:

Contributor Author:

It is fixed actually. I guess I removed the lower line so it is not showing up here.

[](const nnvm::NodePtr& n, const std::vector<nnvm::NodeEntry>& ograds) {
// ograds[0]: dL/dxgrad
// inputs[0]: dL/dy
// inputs[1]: y

Contributor:

Shouldn't this term be x? _backward_abs is using ElemwiseGradUseIn

Contributor Author:

Fixed.

* fix extra line in tests.
* fix missing space.
* fix incorrect comment.
…to develop/add-higher-order/reciprocal-abs
auto dydx_mul_dldy = nnvm::NodeEntry{n}; // f'(x) * head_grads
auto dydx = MakeNode("elemwise_div", n->attrs.name + "_dydx",
                     {dydx_mul_dldy, n->inputs[0]}, nullptr, &n);
auto fx = MakeNode("reciprocal", n->attrs.name + "_fx",

Contributor:

Small thing: could we get fx from the first backward (node->inputs) if we used ElemwiseGradUseInOut? I guess we would avoid the additional divisions if so.

Contributor Author:

I don't think we can use it; for that to work, our _backward_reciprocal, which is binary, would have to support 3 inputs.

https://github.com/apache/incubator-mxnet/blob/8ebaa5c0384ecbef244150859b3e24ea2f02095d/src/operator/elemwise_op_common.h#L213-L227


std::vector<nnvm::NodeEntry> ret;

ret.emplace_back(MakeNode("elemwise_mul", n->attrs.name + "_backward_grad_grad",

@larroy (Contributor), Jul 4, 2019:

Maybe a comment would help here; this one is the output corresponding to dL/dy from the first backward, right?

I have been unclear since the previous PRs on what
dL/dxgrad * dy/dx represents. To me it is not obvious, even after spending more than half an hour thinking about it.

#15120

Contributor Author:

I am not sure of its significance in the literature either. But if you look at dL/dx = dL/dy * dy/dx as just c = a * b, then dc/da = b while dc/db = a.
That is all I am thinking about: does dL/dy affect our dL/dx?

Contributor:

This term will be useful when you calculate the third order (and above) gradient.
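
To make the c = a * b analogy concrete (my own sketch, not from the thread): write the first backward as \bar{x} = \bar{y} \cdot f'(x) with \bar{y} = \partial L / \partial y. The second backward propagates the incoming gradient ograds[0] = \partial L_2 / \partial \bar{x} to each input of the first backward node:

    \frac{\partial L_2}{\partial \bar{y}} = \frac{\partial L_2}{\partial \bar{x}}\, f'(x),
    \qquad
    \frac{\partial L_2}{\partial x} = \frac{\partial L_2}{\partial \bar{x}}\, \bar{y}\, f''(x)

These are the two entries pushed into ret: _backward_grad_grad and _backward_grad_grad_inp, respectively.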


ret.emplace_back(MakeNode("elemwise_mul", n->attrs.name + "_backward_grad_grad",
                          {ograds[0], nnvm::NodeEntry{dydx}}, nullptr, &n));
ret.emplace_back(MakeNode("elemwise_mul", n->attrs.name + "_backward_grad_grad_inp",

Contributor:

This seems ok.

std::vector<nnvm::NodeEntry> ret;
ret.emplace_back(MakeNode("elemwise_mul", n->attrs.name + "_backward_grad_grad",
                          {ograds[0], nnvm::NodeEntry(dydx)}, nullptr, &n));
ret.emplace_back(MakeNode("zeros_like", n->attrs.name + "_backward_grad_grad_in",

Contributor:

Ok.
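
The same formulas explain the zeros_like here (my reading, not text from the PR): for f(x) = |x| we have f'(x) = sign(x) and f''(x) = 0 away from x = 0, so the second backward's gradient with respect to x vanishes:

    \frac{\partial L_2}{\partial x} = \frac{\partial L_2}{\partial \bar{x}}\, \bar{y}\, f''(x) = 0 \quad (x \neq 0)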

                     {nnvm::NodeEntry{n}, n->inputs[0]}, nullptr, &n);

std::vector<nnvm::NodeEntry> ret;
ret.emplace_back(MakeNode("elemwise_mul", n->attrs.name + "_backward_grad_grad",

Contributor:

same question as above.

@larroy (Contributor) left a comment:

I don't get the first output, but the result for x_grad_grad looks fine to me.

@sxjscience (Member):

Sorry that I've been busy this week (for the upcoming conference). I'll take a look next week.

@kshitij12345 (Contributor, Author):

Sure. No worries. Good Luck!

@apeforest (Contributor) left a comment:

LGTM

@apeforest merged commit a3ae309 into apache:master on Jul 7, 2019