
New tutorial of implementing operators in MXNet backend #7828

Merged
merged 17 commits into apache:master on Sep 28, 2017

Conversation

@reminisce (Contributor) commented Sep 10, 2017

A new tutorial for implementing operators in the MXNet backend, tailored for users interested in learning about and contributing to the MXNet C++ code base.

See this link for a friendly view.

@piiswrong @eric-haibin-lin @anirudh2290 @cjolivier01 @rahul003 @madjam @bhavinthaker

@piiswrong (Contributor) commented Sep 10, 2017

Put this in the tutorials under a new section called "extend". Also, let's move the Python custom op guide to the tutorials under "extend".

>>> c = mx.sym.Variable('c', shape=(0, 3))
>>> d = a * b + b * c
>>> print d.infer_shape()
([(2L, 3L), (2L, 3L), (2L, 3L)], [(2L, 3L)], [])
Member

A beginner may not understand what the returned tuple of lists means. Add a note explaining that?

Contributor Author

Done.
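For readers of the thread, here is a hedged, self-contained reconstruction of the snippet above showing what the three returned lists mean; the declarations of `a` and `b` are assumptions, since only `c` appears in the quoted lines.

```python
import mxnet as mx

# Assumed declarations (only `c` is visible in the quoted excerpt).
a = mx.sym.Variable('a', shape=(2, 0))
b = mx.sym.Variable('b')
c = mx.sym.Variable('c', shape=(0, 3))
d = a * b + b * c

# infer_shape returns three lists:
arg_shapes, out_shapes, aux_shapes = d.infer_shape()
print(arg_shapes)   # shapes of the arguments a, b, c: [(2, 3), (2, 3), (2, 3)]
print(out_shapes)   # shapes of the outputs of d:      [(2, 3)]
print(aux_shapes)   # shapes of auxiliary states:      []
```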

and then go through it line by line.
```cpp
template<typename xpu> // 1
void QuadraticOpForward(const nnvm::NodeAttrs& attrs, // 2
Member

Should we first mention the operator interface the developer is going to use, since it's a fixed one and disallows customization?

void (const nnvm::NodeAttrs& attrs,                       // 2
                         const OpContext& ctx,                               // 3
                         const std::vector<TBlob>& inputs,                   // 4
                         const std::vector<OpReqType>& req,                  // 5
                         const std::vector<TBlob>& outputs) {                // 6

?

Contributor Author

Done.

# check backward using finite difference
data = mx.sym.Variable('data')
quad_sym = mx.sym.quadratic(data=data, a=a, b=b, c=c)
check_numeric_gradient(quad_sym, [data_np])
Member

Should we also mention check_symbolic_forward and check_symbolic_backward since they're usually used to test, too?

Contributor Author

Done.
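A hedged sketch of how the two helpers mentioned here could complement the finite-difference check; the constants `a`, `b`, `c` and the input shape are illustrative, not the tutorial's exact test.

```python
import numpy as np
import mxnet as mx
from mxnet.test_utils import check_symbolic_forward, check_symbolic_backward

a, b, c = 1.0, 2.0, 3.0
data = mx.sym.Variable('data')
quad_sym = mx.sym.quadratic(data=data, a=a, b=b, c=c)
data_np = np.random.uniform(-1.0, 1.0, (3, 4))

# forward: compare the operator output against a NumPy reference
expected_out = a * data_np**2 + b * data_np + c
check_symbolic_forward(quad_sym, [data_np], [expected_out])

# backward: dL/dx = dL/dy * (2*a*x + b), with dL/dy set to ones
out_grad = np.ones_like(data_np)
expected_grad = out_grad * (2 * a * data_np + b)
check_symbolic_backward(quad_sym, [data_np], [out_grad], [expected_grad])
```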

@rahul003 (Member) left a comment

As someone new to creating operators, I had a few questions while I was reading the tutorial. They may or may not need to be clarified in the tutorial.

dimension of two shapes, such as (2, 3) and (3, 3), the macro would throw an
exception with an error message for shape inference.
5. At the end of the function body, we checked whether the output shape
is completely known by testing whether its size is greater than 0. If not,
Member

How does this variable having size > 0 mean that its shape is completely known? Can't the shape be multi-dimensional, where we know only some dimensions? For example, going by the previous Python example, can't the shape be something like (2, 0)?

Contributor Author

size = 0 means that at least one dimension is 0, which means the shape is undefined and must be inferred before running the forward/backward functions. I will make the point clear here.
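A hedged sketch of the shape-inference function under discussion, reconstructed from the quoted lines; details may differ from the merged tutorial.

```cpp
// quadratic_op-inl.h -- hedged sketch of the shape inference function.
inline bool QuadraticOpShape(const nnvm::NodeAttrs& attrs,
                             std::vector<TShape>* in_attrs,
                             std::vector<TShape>* out_attrs) {
  CHECK_EQ(in_attrs->size(), 1U);   // one input tensor
  CHECK_EQ(out_attrs->size(), 1U);  // one output tensor

  // Mutual inference: output shape from input shape and input shape from
  // output shape. SHAPE_ASSIGN_CHECK throws if the two shapes conflict.
  SHAPE_ASSIGN_CHECK(*out_attrs, 0, in_attrs->at(0));
  SHAPE_ASSIGN_CHECK(*in_attrs, 0, out_attrs->at(0));

  // Size() == 0 means at least one dimension is still 0 (unknown), i.e. the
  // shape is not yet completely known.
  return out_attrs->at(0).ndim() != 0 && out_attrs->at(0).Size() != 0;
}
```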

the interface `get_with_shape`.
- Line 13: Get user input parameters from the node attribute. Here the node
means a placeholder for the operator in the whole computational graph for
the neural network.
Member

Might be better to move the description of node to line 1, with the description of attrs, because that is when we first see NodeAttrs.

Contributor Author

Good point. Will do that.
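For context, the user parameters are typically recovered from the node attributes with a single call; a hedged one-liner, assuming the parameter struct is named `QuadraticParam` as in the tutorial.

```cpp
// Inside QuadraticOpForward: recover the user parameters a, b, c that were
// parsed into the node's attributes when the symbol was created.
const QuadraticParam& param = nnvm::get<QuadraticParam>(attrs.parsed);
```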

in the computational graph. MXNet would
add the missing argument with name `quadratic0_data`, where the prefix
`quadratic0` is the operator name appended with an index and the postfix
`data` comes from the return value of the user defined `FListInputName` function.
Member

In the case of mx.sym.quadratic(), I understand that the computation graph can identify that it takes a variable, and creates a node in the graph with quadratic0_data. But such a function can't actually run because there's no argument, right? Can that variable be assigned somewhere later?

Contributor Author

The argument is assigned values during the symbol binding stage. That could be another long tutorial. I will add a few sentences to make it clear here.
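A hedged illustration of that binding stage, assuming the operator is available as `mx.sym.quadratic`; the name `quadratic0_data` is the auto-added variable described in the excerpt above.

```python
import numpy as np
import mxnet as mx

quad = mx.sym.quadratic(a=1.0, b=2.0, c=3.0)    # no `data=` passed
print(quad.list_arguments())                    # ['quadratic0_data']

# The missing argument receives a shape and a value at bind time.
exe = quad.simple_bind(ctx=mx.cpu(), quadratic0_data=(2, 3))
exe.forward(is_train=False, quadratic0_data=np.ones((2, 3)))
print(exe.outputs[0].asnumpy())                 # 1*x^2 + 2*x + 3 at x=1 -> 6.0
```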


# check backward using finite difference
data = mx.sym.Variable('data')
quad_sym = mx.sym.quadratic(data=data, a=a, b=b, c=c)
Member

Could you also talk about where the operator is defined in the Python API? Are all operators defined under mx.nd?
Is there a typo here where it says mx.sym.quadratic?

Contributor Author

Once an op is implemented in the backend, it's registered in both mx.nd and mx.sym in the frontend when you run import mxnet. I will mention it in the tutorial.
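A hedged sketch of that dual registration; both calls reach the same backend implementation.

```python
import mxnet as mx

# Imperative (NDArray) version of the backend operator.
x = mx.nd.array([[1.0, 2.0], [3.0, 4.0]])
y = mx.nd.quadratic(data=x, a=1.0, b=2.0, c=3.0)

# Symbolic version, used when building a computational graph.
data = mx.sym.Variable('data')
quad = mx.sym.quadratic(data=data, a=1.0, b=2.0, c=3.0)
```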

@x10000year commented Sep 12, 2017

Could you please also explain the use of auxiliary states in the tutorial? (initialization, update, and state sharing if possible)

CHECK_EQ(out_attrs->size(), 1U);

SHAPE_ASSIGN_CHECK(*out_attrs, 0, in_attrs->at(0));
SHAPE_ASSIGN_CHECK(*in_attrs, 0, out_attrs->at(0));
Contributor

When I ran the code, out_attrs didn't contain the shape information of the output array before this line. Could you explain in what case we need to use the shape information of the output array to infer the shape of the input array?

Contributor Author

Please consider the following example. q is the output of the operator quadratic; its shape is inferred from data2, and then data1's shape is inferred from q in the backward inference pass. I will add this to the tutorial as well.

```python
import mxnet as mx

data1 = mx.sym.var('data1')
q = mx.sym.quadratic(data=data1)

data2 = mx.sym.var('data2', shape=(2, 3))
s = q + data2
print s.infer_shape()
```

@reminisce (Contributor Author)

@x10000year The operators with auxiliary states are currently implemented using the legacy operator framework (inheriting from the class Operator), such as BatchNorm, while this tutorial mostly focuses on the new operator framework called FCompute, because operators depend on their registered FCompute attributes to run forward and backward functions. I will try to add a little bit of explanation of the legacy operator framework.

@x10000year
@reminisce Is the legacy operator framework to be deprecated in the future? This makes me a bit worried, because in my projects many custom operators have complex states that are arbitrary C++ data structures stored as class members. Different operators can have different types of states. Those states are computed in the forward pass and may be accessed in the backward pass. The new nnvm operator framework seems to only support stateless operators.

@reminisce (Contributor Author)

@x10000year Yes, we plan to deprecate the legacy op interface (@piiswrong correct me if I am mistaken). For operators with states, there is an FStatefulCompute attribute for registration. You can take a look at #6928.
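A hedged sketch of what a stateful registration looks like with the NNVM interface; the op and state names below are hypothetical, and only the attribute keys (FCreateOpState, FStatefulCompute) come from mxnet/op_attr_types.h.

```cpp
// Hypothetical stateful operator: the state object is created once via
// FCreateOpState and is passed to every forward/backward call afterwards.
NNVM_REGISTER_OP(my_stateful_op)
.set_attr<FCreateOpState>("FCreateOpState", CreateMyOpState)
.set_attr<FStatefulCompute>("FStatefulCompute<cpu>", MyStatefulForward<cpu>);
```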

@anirudh2290 (Member) left a comment

Thanks a lot for creating this tutorial!

2. Define type and shape inference functions in `quadratic_op-inl.h`.
3. Define forward and backward functions in `quadratic_op-inl.h`.
4. Register the operator using [nnvm](https://github.com/dmlc/nnvm)
in `quadratic_op.cc` and `quadratic_op.cu` for
Member

Is there a guideline on where to place these operators in src/operator?

Contributor Author

There isn't one. I will add some guidelines in this tutorial.

a backward pass. Note that we used a convenience functor struct `ElemwiseGradUseIn`.
As you can tell from the name, the registered functor creates the node for gradient computation
with dependencies on the output gradient node and input node. Similarly, there are
three other functors defined as `ElemwiseGradUseOut`, `ElemwiseGradUseInOut`,
Member

For nodes created using any of these functors, is the output gradient node always a dependency, in addition to something else?

with dependencies on the output gradient node and input node. Similarly, there are
three other functors defined as `ElemwiseGradUseOut`, `ElemwiseGradUseInOut`,
and `ElemwiseGradUseNone` for developers' convenience. In order to add
this attribute, we also need to register a backward operator for `quadratic` with
Member

Maybe mentioning the name of the backward operator, _backward_quadratic, would help here.

Contributor Author

Oh, right, I forgot. Will add it.
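For the thread, a hedged sketch of the two registrations being discussed: the FGradient attribute on `quadratic` and the hidden `_backward_quadratic` op. Details may differ from the merged tutorial.

```cpp
// quadratic_op.cc -- hedged sketch, not the exact merged code.
NNVM_REGISTER_OP(quadratic)
.set_attr<nnvm::FGradient>("FGradient",
                           ElemwiseGradUseIn{"_backward_quadratic"});

NNVM_REGISTER_OP(_backward_quadratic)
.set_attr_parser(ParamParser<QuadraticParam>)
.set_num_inputs(2)   // output gradient + input data (ElemwiseGradUseIn)
.set_num_outputs(1)  // gradient w.r.t. the input data
.set_attr<nnvm::TIsBackward>("TIsBackward", true)
.set_attr<FCompute>("FCompute<cpu>", QuadraticOpBackward<cpu>);
```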

@madjam (Contributor) left a comment

Thanks for writing such a detailed tutorial. It is very well written. A few comments.

To implement this, we first create three files: `quadratic_op-inl.h`,
`quadratic_op.cc`, and `quadratic_op.cu`. Then we are going to
1. Define the parameter struct
for registering `a`, `b`, and `c` in `quadratic_op-inl.h`.
Contributor

A brief note on file naming conventions would be useful.

Contributor Author

Done.


One important thing to note is that inference functions should be capable of
performing **mutual inference**, i.e.
inferring input shape from output shape, inferring one argument's shape
Contributor

Maybe qualify it as input argument.

Contributor Author

Done.

One important thing to note is that inference functions should be capable of
performing **mutual inference**, i.e.
inferring input shape from output shape, inferring one argument's shape
from another argument, etc. This is very useful in building the computational graphs
Contributor

Maybe for a different article, but it would be very helpful to explain how computation graphs are constructed.

Contributor Author

Done.

}); // 21
} // 22
```
- Line 1: `attrs` contains the user input parameters `a`, `b`, and `c`.
Contributor

I thought Line 1 was the `template` line.

Contributor Author

Good catch. I corrected it and the following lines.

template<int req>
struct quadratic_forward {
template<typename DType>
MSHADOW_XINLINE static void Map(int i, DType* out_data, const DType* in_data,
Contributor

Just like you have done for the above code snippet, it may be helpful to explain what the MSHADOW_XINLINE and KERNEL_ASSIGN macros do.

Contributor Author

Done.
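A hedged completion of the kernel quoted above, with comments on the two macros; the body is reconstructed and may differ slightly from the merged tutorial.

```cpp
template<int req>
struct quadratic_forward {
  template<typename DType>
  // MSHADOW_XINLINE: force-inline and, under CUDA, mark as __host__ __device__
  // so the same kernel body compiles for both CPU and GPU.
  MSHADOW_XINLINE static void Map(int i, DType* out_data, const DType* in_data,
                                  const float a, const float b, const float c) {
    // KERNEL_ASSIGN writes according to `req`: overwrite (kWriteTo/kWriteInplace),
    // accumulate (kAddTo), or skip (kNullOp).
    KERNEL_ASSIGN(out_data[i], req, in_data[i] * (in_data[i] * a + b) + c);
  }
};
```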

dL/dx = dL/dy * dy/dx = dL/dy * (2*a*x + b).
```
The above equation indicates that `dL/dx` depends on the gradient
of the output tensor and the input tensor.
Contributor

gradient of the output tensor and value of the input tensor
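For reference, a hedged sketch of the backward kernel that the chain rule quoted above leads to; reconstructed, not necessarily the tutorial's exact code.

```cpp
template<int req>
struct quadratic_backward {
  template<typename DType>
  MSHADOW_XINLINE static void Map(int i, DType* in_grad, const DType* out_grad,
                                  const DType* in_data, const float a, const float b) {
    // dL/dx = dL/dy * (2*a*x + b)
    KERNEL_ASSIGN(in_grad[i], req, out_grad[i] * (2 * a * in_data[i] + b));
  }
};
```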

of the input tensor will not be overwritten by the output.
- Line 20: Define the input argument name as `data` for the operator.
- Line 21: Add user input parameters `a`, `b`, and `c` as the attributes of the operator.
- Line 22: Register an operator named `_backward_quadratic` for backward pass
Contributor

A note on naming convention would be helpful. Is naming the backward operator _backward_foo a suggestion or a rule?

Contributor Author

Done.

## Summary
In this tutorial, we practiced implementing the operator `quadratic` in MXNet backend
and unit testing the implementation in frontend. More specifically, we added parameter
struct for user-input parameters, walked through shape and type inference work flow,
Contributor

workflow

@reminisce (Contributor Author)

Thanks everyone for reviewing the tutorial. I have addressed all the comments.
@piiswrong It's ready to be merged.

@piiswrong piiswrong merged commit d21de24 into apache:master Sep 28, 2017
crazy-cat pushed a commit to crazy-cat/incubator-mxnet that referenced this pull request Oct 26, 2017
* Tutorial first commit

* Add shape/dtype inference tutorial

* Add forward function

* Delete trailing spaces

* Add fwd/bwd registration

* Finish

* Fix

* Fix

* Fix

* Fix

* Fix

* Fix

* Fix based on comments

* Change index

* Fix shape inference example

* Address comments

* More fix
So far, we have an operator working on the CPU in the frontend.
In order to register the operator to work on GPUs, we just need to add the following
code to `quadratic_op.cu`. Note that the forward and backward functions
are registered with the attribute key `FCompute<gpu>`, rather than `FCompute<cpu>`.
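A hedged sketch of what that `quadratic_op.cu` ends up containing, reconstructed from the description above.

```cpp
// quadratic_op.cu
#include "./quadratic_op-inl.h"

namespace mxnet {
namespace op {

NNVM_REGISTER_OP(quadratic)
.set_attr<FCompute>("FCompute<gpu>", QuadraticOpForward<gpu>);

NNVM_REGISTER_OP(_backward_quadratic)
.set_attr<FCompute>("FCompute<gpu>", QuadraticOpBackward<gpu>);

}  // namespace op
}  // namespace mxnet
```
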
Member

What's the difference between FCompute and FComputeEx? Where can I find any documentation about that?

Member

FComputeEx takes NDArray instead of TBlob items and is generally called when the tensor is of sparse storage type (kCSRStorage or kRowSparseStorage instead of kDefaultStorage)

@TaoLv (Member) commented Nov 27, 2017

Thanks for your kind reply, @cjolivier01. One more question: I notice that there are ForwardResource and BackwardResource methods to register additional memory resources for computation in the previous document for creating an op here.
So with the NNVM interfaces, how can I register these resources for forward and backward computation? And how can I share some states/memory between forward and backward computation? It seems the two functions are defined separately in this tutorial.

Contributor Author

1. For registering temp resources using the NNVM interface, you can take a look at this example: https://github.com/apache/incubator-mxnet/blob/master/src/operator/tensor/matrix_op.cc#L519. You need to register resources separately for forward and backward if they both need temp resources.
2. For sharing states between forward and backward, you can take a look at the BatchNorm op, which has an aux_states argument for both the forward and backward functions. Please note that this kind of interface is going to be deprecated. The new way of sharing states would use the NNVM interface to register shared states.
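A hedged sketch of the resource-registration pattern used in the linked matrix_op.cc example: the FResourceRequest attribute asks the engine for a temporary workspace.

```cpp
NNVM_REGISTER_OP(quadratic)
.set_attr<FResourceRequest>("FResourceRequest",
  [](const NodeAttrs& attrs) {
    // Request scratch space; it becomes available through ctx.requested
    // inside the FCompute function.
    return std::vector<ResourceRequest>{ResourceRequest::kTempSpace};
  });
```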

Member

Many thanks, @reminisce. I will look into the aux_states in BN and the new NNVM interfaces. So what's your suggestion about stateless ops vs. stateful ops for MXNet?

Contributor Author

I suggest focusing on the NNVM interface for both stateful and stateless ops from now on, as the legacy interface is going to be deprecated. You can follow this PR for more details on using the NNVM interface for stateful ops: #8302

Member

I see. Thanks for your help. :)
