
New tutorial of implementing operators in MXNet backend #7828

Merged
merged 17 commits into apache:master on Sep 28, 2017

Conversation

@reminisce (Contributor) commented Sep 10, 2017

A new tutorial for implementing operators in the MXNet backend, tailored for users interested in learning about and contributing to the MXNet C++ code base.

See this link for a friendly view.

@piiswrong @eric-haibin-lin @anirudh2290 @cjolivier01 @rahul003 @madjam @bhavinthaker

@piiswrong (Contributor) commented Sep 10, 2017

Put this in the tutorials under a new section called "extend". Also, let's move the Python custom op guide to the tutorials under "extend".

>>> c = mx.sym.Variable('c', shape=(0, 3))
>>> d = a * b + b * c
>>> print d.infer_shape()
([(2L, 3L), (2L, 3L), (2L, 3L)], [(2L, 3L)], [])
Member

A beginner may not understand what the returned tuple of lists means. Add a note explaining that?

Contributor Author

Done.
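For readers of the thread, here is a hedged, self-contained reconstruction of the snippet above showing what the three returned lists mean; the declarations of `a` and `b` are assumptions, since only `c` appears in the quoted lines.

```python
import mxnet as mx

# Assumed declarations (only `c` is visible in the quoted excerpt).
a = mx.sym.Variable('a', shape=(2, 0))
b = mx.sym.Variable('b')
c = mx.sym.Variable('c', shape=(0, 3))
d = a * b + b * c

# infer_shape returns three lists:
arg_shapes, out_shapes, aux_shapes = d.infer_shape()
print(arg_shapes)   # shapes of the arguments a, b, c: [(2, 3), (2, 3), (2, 3)]
print(out_shapes)   # shapes of the outputs of d:      [(2, 3)]
print(aux_shapes)   # shapes of auxiliary states:      []
```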

and then go through it line by line.
```cpp
template<typename xpu> // 1
void QuadraticOpForward(const nnvm::NodeAttrs& attrs, // 2
Member

Should we first mention the operator interface the developer is going to use, since it's a fixed one and disallows customization?

void (const nnvm::NodeAttrs& attrs,                       // 2
                         const OpContext& ctx,                               // 3
                         const std::vector<TBlob>& inputs,                   // 4
                         const std::vector<OpReqType>& req,                  // 5
                         const std::vector<TBlob>& outputs) {                // 6

?

Contributor Author

Done.

# check backward using finite difference
data = mx.sym.Variable('data')
quad_sym = mx.sym.quadratic(data=data, a=a, b=b, c=c)
check_numeric_gradient(quad_sym, [data_np])
Member

Should we also mention check_symbolic_forward and check_symbolic_backward since they're usually used to test, too?

Contributor Author

Done.
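A hedged sketch of how the two helpers mentioned here could complement the finite-difference check; the constants `a`, `b`, `c` and the input shape are illustrative, not the tutorial's exact test.

```python
import numpy as np
import mxnet as mx
from mxnet.test_utils import check_symbolic_forward, check_symbolic_backward

a, b, c = 1.0, 2.0, 3.0
data = mx.sym.Variable('data')
quad_sym = mx.sym.quadratic(data=data, a=a, b=b, c=c)
data_np = np.random.uniform(-1.0, 1.0, (3, 4))

# forward: compare the operator output against a NumPy reference
expected_out = a * data_np**2 + b * data_np + c
check_symbolic_forward(quad_sym, [data_np], [expected_out])

# backward: dL/dx = dL/dy * (2*a*x + b), with dL/dy set to ones
out_grad = np.ones_like(data_np)
expected_grad = out_grad * (2 * a * data_np + b)
check_symbolic_backward(quad_sym, [data_np], [out_grad], [expected_grad])
```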

@rahul003 (Member) left a comment

As someone new to creating operators, I had a few questions while I was reading the tutorial. They may or may not need to be clarified in the tutorial.

dimension of two shapes, such as (2, 3) and (3, 3), the macro would throw an
exception with an error message for shape inference.
5. At the end of the function body, we checked whether the output shape
is completely known by testing whether its size is greater than 0. If not,
Member

How does this variable having size > 0 mean that its shape is completely known? Can't the shape be multi-dimensional, where we know only some dimensions? For example, going by the previous Python example, can't the shape be something like (2, 0)?

Contributor Author

size = 0 means that at least one dimension is 0, which means the shape is undefined and must be inferred before running the forward/backward functions. I will make the point clear here.
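A hedged sketch of the shape-inference function under discussion, reconstructed from the quoted lines; details may differ from the merged tutorial.

```cpp
// quadratic_op-inl.h -- hedged sketch of the shape inference function.
inline bool QuadraticOpShape(const nnvm::NodeAttrs& attrs,
                             std::vector<TShape>* in_attrs,
                             std::vector<TShape>* out_attrs) {
  CHECK_EQ(in_attrs->size(), 1U);   // one input tensor
  CHECK_EQ(out_attrs->size(), 1U);  // one output tensor

  // Mutual inference: output shape from input shape and input shape from
  // output shape. SHAPE_ASSIGN_CHECK throws if the two shapes conflict.
  SHAPE_ASSIGN_CHECK(*out_attrs, 0, in_attrs->at(0));
  SHAPE_ASSIGN_CHECK(*in_attrs, 0, out_attrs->at(0));

  // Size() == 0 means at least one dimension is still 0 (unknown), i.e. the
  // shape is not yet completely known.
  return out_attrs->at(0).ndim() != 0 && out_attrs->at(0).Size() != 0;
}
```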

the interface `get_with_shape`.
- Line 13: Get user input parameters from the node attribute. Here the node
means a placeholder for the operator in the whole computational graph for
the neural network.
Member

Might be better to move the description of node to line 1, with the description of attrs, because that is when we first see NodeAttrs.

Contributor Author

Good point. Will do that.
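For context, the user parameters are typically recovered from the node attributes with a single call; a hedged one-liner, assuming the parameter struct is named `QuadraticParam` as in the tutorial.

```cpp
// Inside QuadraticOpForward: recover the user parameters a, b, c that were
// parsed into the node's attributes when the symbol was created.
const QuadraticParam& param = nnvm::get<QuadraticParam>(attrs.parsed);
```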

in the computational graph. MXNet would
add the missing argument with name `quadratic0_data`, where the prefix
`quadratic0` is the operator name appended with an index and the postfix
`data` comes from the return value of the user defined `FListInputName` function.
Member

In the case of mx.sym.quadratic(), I understand that the computation graph can identify that it takes a variable, and creates a node in the graph with quadratic0_data. But such a function can't actually run because there's no argument, right? Can that variable be assigned somewhere later?

Contributor Author

The argument is assigned values during the symbol binding stage. That could be another long tutorial. I will add a few sentences to make it clear here.
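A hedged illustration of that binding stage, assuming the operator is available as `mx.sym.quadratic`; the name `quadratic0_data` is the auto-added variable described in the excerpt above.

```python
import numpy as np
import mxnet as mx

quad = mx.sym.quadratic(a=1.0, b=2.0, c=3.0)    # no `data=` passed
print(quad.list_arguments())                    # ['quadratic0_data']

# The missing argument receives a shape and a value at bind time.
exe = quad.simple_bind(ctx=mx.cpu(), quadratic0_data=(2, 3))
exe.forward(is_train=False, quadratic0_data=np.ones((2, 3)))
print(exe.outputs[0].asnumpy())                 # 1*x^2 + 2*x + 3 at x=1 -> 6.0
```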


# check backward using finite difference
data = mx.sym.Variable('data')
quad_sym = mx.sym.quadratic(data=data, a=a, b=b, c=c)
Member

Could you also talk about where the operator is defined in the Python API? Are all operators defined under mx.nd?
Is there a typo here where it says mx.sym.quadratic?

Contributor Author

Once an op is implemented in the backend, it's registered in both mx.nd and mx.sym in the frontend when you run import mxnet. I will mention it in the tutorial.
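A hedged sketch of that dual registration; both calls reach the same backend implementation.

```python
import mxnet as mx

# Imperative (NDArray) version of the backend operator.
x = mx.nd.array([[1.0, 2.0], [3.0, 4.0]])
y = mx.nd.quadratic(data=x, a=1.0, b=2.0, c=3.0)

# Symbolic version, used when building a computational graph.
data = mx.sym.Variable('data')
quad = mx.sym.quadratic(data=data, a=1.0, b=2.0, c=3.0)
```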

@x10000year commented Sep 12, 2017

Could you please also explain the use of auxiliary states in the tutorial? (initialization, update, and state sharing if possible)

CHECK_EQ(out_attrs->size(), 1U);

SHAPE_ASSIGN_CHECK(*out_attrs, 0, in_attrs->at(0));
SHAPE_ASSIGN_CHECK(*in_attrs, 0, out_attrs->at(0));
Contributor

When I ran the code, out_attrs didn't contain the shape information of the output array before this line. Could you explain in what case we need to use the shape information of the output array to infer the shape of the input array?

Contributor Author

Please consider the following example. q is the output of the operator quadratic; its shape is inferred from data2, and then data1's shape is inferred from q in the backward inference pass. I will add this to the tutorial as well.

```python
import mxnet as mx

data1 = mx.sym.var('data1')
q = mx.sym.quadratic(data=data1)

data2 = mx.sym.var('data2', shape=(2, 3))
s = q + data2
print s.infer_shape()
```

@reminisce (Contributor Author)

@x10000year The operators with auxiliary states are currently implemented using the legacy operator framework (inheriting from the class Operator), such as BatchNorm, while this tutorial mostly focuses on the new operator framework called FCompute, because operators depend on their registered FCompute attributes to run forward and backward functions. I will try to add a little bit of explanation of the legacy operator framework.

@x10000year
@reminisce Is the legacy operator framework to be deprecated in the future? This makes me a bit worried, because in my projects many custom operators have complex states that are arbitrary C++ data structures stored as class members. Different operators can have different types of states. Those states are computed in the forward pass and may be accessed in the backward pass. The new nnvm operator framework seems to only support stateless operators.

@reminisce (Contributor Author)

@x10000year Yes, we plan to deprecate the legacy op interface (@piiswrong correct me if I am mistaken). For operators with states, there is an FStatefulCompute attribute for registration. You can take a look at #6928.
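A hedged sketch of what a stateful registration looks like with the NNVM interface; the op and state names below are hypothetical, and only the attribute keys (FCreateOpState, FStatefulCompute) come from mxnet/op_attr_types.h.

```cpp
// Hypothetical stateful operator: the state object is created once via
// FCreateOpState and is passed to every forward/backward call afterwards.
NNVM_REGISTER_OP(my_stateful_op)
.set_attr<FCreateOpState>("FCreateOpState", CreateMyOpState)
.set_attr<FStatefulCompute>("FStatefulCompute<cpu>", MyStatefulForward<cpu>);
```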

@anirudh2290 (Member) left a comment

Thanks a lot for creating this tutorial!

2. Define type and shape inference functions in `quadratic_op-inl.h`.
3. Define forward and backward functions in `quadratic_op-inl.h`.
4. Register the operator using [nnvm](https://github.com/dmlc/nnvm)
in `quadratic_op.cc` and `quadratic_op.cu` for
Member

Is there a guideline on where to place these operators in src/operator?

Contributor Author

There isn't one. I will add some guidelines in this tutorial.

a backward pass. Note that we used a convenience functor struct `ElemwiseGradUseIn`.
As you can tell from the name, the registered functor creates the node for gradient computation
with dependencies on the output gradient node and input node. Similarly, there are
three other functors defined as `ElemwiseGradUseOut`, `ElemwiseGradUseInOut`,
Member

For nodes created using any of these functors, is the output gradient node always a dependency, in addition to something else?

with dependencies on the output gradient node and input node. Similarly, there are
three other functors defined as `ElemwiseGradUseOut`, `ElemwiseGradUseInOut`,
and `ElemwiseGradUseNone` for developers' convenience. In order to add
this attribute, we also need to register a backward operator for `quadratic` with
Member

Maybe mentioning the name of the backward operator, _backward_quadratic, would help here.

Contributor Author

Oh, right, I forgot. Will add it.
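For the thread, a hedged sketch of the two registrations being discussed: the FGradient attribute on `quadratic` and the hidden `_backward_quadratic` op. Details may differ from the merged tutorial.

```cpp
// quadratic_op.cc -- hedged sketch, not the exact merged code.
NNVM_REGISTER_OP(quadratic)
.set_attr<nnvm::FGradient>("FGradient",
                           ElemwiseGradUseIn{"_backward_quadratic"});

NNVM_REGISTER_OP(_backward_quadratic)
.set_attr_parser(ParamParser<QuadraticParam>)
.set_num_inputs(2)   // output gradient + input data (ElemwiseGradUseIn)
.set_num_outputs(1)  // gradient w.r.t. the input data
.set_attr<nnvm::TIsBackward>("TIsBackward", true)
.set_attr<FCompute>("FCompute<cpu>", QuadraticOpBackward<cpu>);
```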

@madjam (Contributor) left a comment

Thanks for writing such a detailed tutorial. It is very well written. A few comments.

To implement this, we first create three files: `quadratic_op-inl.h`,
`quadratic_op.cc`, and `quadratic_op.cu`. Then we are going to
1. Define the parameter struct
for registering `a`, `b`, and `c` in `quadratic_op-inl.h`.
Contributor

A brief note on file naming conventions would be useful.

Contributor Author

Done.


One important thing to note is that inference functions should be capable of
performing **mutual inference**, i.e.
inferring input shape from output shape, inferring one argument's shape
Contributor

Maybe qualify it as input argument.

Contributor Author

Done.

One important thing to note is that inference functions should be capable of
performing **mutual inference**, i.e.
inferring input shape from output shape, inferring one argument's shape
from another argument, etc. This is very useful in building the computational graphs
Contributor

Maybe for a different article, but it would be very helpful to explain how computation graphs are constructed.

Contributor Author

Done.

}); // 21
} // 22
```
- Line 1: `attrs` contains the user input parameters `a`, `b`, and `c`.
Contributor

I thought Line 1 was the `template` line.

Contributor Author

Good catch. I corrected it and the following lines.

template<int req>
struct quadratic_forward {
template<typename DType>
MSHADOW_XINLINE static void Map(int i, DType* out_data, const DType* in_data,
Contributor

Just like you have done for the above code snippet, it may be helpful to explain what the MSHADOW_XINLINE and KERNEL_ASSIGN macros do.

Contributor Author

Done.
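A hedged completion of the kernel quoted above, with comments on the two macros; the body is reconstructed and may differ slightly from the merged tutorial.

```cpp
template<int req>
struct quadratic_forward {
  template<typename DType>
  // MSHADOW_XINLINE: force-inline and, under CUDA, mark as __host__ __device__
  // so the same kernel body compiles for both CPU and GPU.
  MSHADOW_XINLINE static void Map(int i, DType* out_data, const DType* in_data,
                                  const float a, const float b, const float c) {
    // KERNEL_ASSIGN writes according to `req`: overwrite (kWriteTo/kWriteInplace),
    // accumulate (kAddTo), or skip (kNullOp).
    KERNEL_ASSIGN(out_data[i], req, in_data[i] * (in_data[i] * a + b) + c);
  }
};
```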

dL/dx = dL/dy * dy/dx = dL/dy * (2*a*x + b).
```
The above equation indicates that `dL/dx` depends on the gradient
of the output tensor and the input tensor.
Contributor

gradient of the output tensor and value of the input tensor
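For reference, a hedged sketch of the backward kernel that the chain rule quoted above leads to; reconstructed, not necessarily the tutorial's exact code.

```cpp
template<int req>
struct quadratic_backward {
  template<typename DType>
  MSHADOW_XINLINE static void Map(int i, DType* in_grad, const DType* out_grad,
                                  const DType* in_data, const float a, const float b) {
    // dL/dx = dL/dy * (2*a*x + b)
    KERNEL_ASSIGN(in_grad[i], req, out_grad[i] * (2 * a * in_data[i] + b));
  }
};
```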

of the input tensor will not be overwritten by the output.
- Line 20: Define the input argument name as `data` for the operator.
- Line 21: Add user input parameters `a`, `b`, and `c` as the attributes of the operator.
- Line 22: Register an operator named `_backward_quadratic` for backward pass
Contributor

A note on naming convention would be helpful. Is naming the backward operator _backward_foo a suggestion or a rule?

Contributor Author

Done.

## Summary
In this tutorial, we practiced implementing the operator `quadratic` in MXNet backend
and unit testing the implementation in frontend. More specifically, we added parameter
struct for user-input parameters, walked through shape and type inference work flow,
Contributor

workflow

@reminisce (Contributor Author)

Thanks everyone for reviewing the tutorial. I have addressed all the comments.
@piiswrong It's ready to be merged.

@piiswrong piiswrong merged commit d21de24 into apache:master Sep 28, 2017
crazy-cat pushed a commit to crazy-cat/incubator-mxnet that referenced this pull request Oct 26, 2017
* Tutorial first commit

* Add shape/dtype inference tutorial

* Add forward function

* Delete trailing spaces

* Add fwd/bwd registration

* Finish

* Fix

* Fix

* Fix

* Fix

* Fix

* Fix

* Fix based on comments

* Change index

* Fix shape inference example

* Address comments

* More fix
So far, we have an operator working on the CPU in the frontend.
In order to register the operator to work on GPUs, we just need to add the following
code to `quadratic_op.cu`. Note that the forward and backward functions
are registered with the attribute key `FCompute<gpu>`, rather than `FCompute<cpu>`.
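A hedged sketch of what that `quadratic_op.cu` ends up containing, reconstructed from the description above.

```cpp
// quadratic_op.cu
#include "./quadratic_op-inl.h"

namespace mxnet {
namespace op {

NNVM_REGISTER_OP(quadratic)
.set_attr<FCompute>("FCompute<gpu>", QuadraticOpForward<gpu>);

NNVM_REGISTER_OP(_backward_quadratic)
.set_attr<FCompute>("FCompute<gpu>", QuadraticOpBackward<gpu>);

}  // namespace op
}  // namespace mxnet
```
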
Member

What's the difference between FCompute and FComputeEx? Where can I find any documentation about that?

Member

FComputeEx takes NDArray instead of TBlob items and is generally called when the tensor is of sparse storage type (kCSRStorage or kRowSparseStorage instead of kDefaultStorage)

@TaoLv (Member) commented Nov 27, 2017

Thanks for your kind reply, @cjolivier01. One more question: I notice that there are ForwardResource and BackwardResource methods to register additional memory resources for computation in the previous document for creating an op here.
So with the NNVM interfaces, how can I register these resources for forward and backward computation? And how can I share some states/memory between forward and backward computation? It seems the two functions are defined separately in this tutorial.

Contributor Author

1. For registering temp resources using the NNVM interface, you can take a look at this example: https://github.com/apache/incubator-mxnet/blob/master/src/operator/tensor/matrix_op.cc#L519. You need to register resources separately for forward and backward if they both need temp resources.
2. For sharing states between forward and backward, you can take a look at the BatchNorm op, which has an aux_states argument for both the forward and backward functions. Please note that this kind of interface is going to be deprecated. The new way of sharing states would use the NNVM interface to register shared states.
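A hedged sketch of the resource-registration pattern used in the linked matrix_op.cc example: the FResourceRequest attribute asks the engine for a temporary workspace.

```cpp
NNVM_REGISTER_OP(quadratic)
.set_attr<FResourceRequest>("FResourceRequest",
  [](const NodeAttrs& attrs) {
    // Request scratch space; it becomes available through ctx.requested
    // inside the FCompute function.
    return std::vector<ResourceRequest>{ResourceRequest::kTempSpace};
  });
```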

Member

Many thanks, @reminisce. I will look into the aux_states in BN and the new NNVM interfaces. So what's your suggestion about stateless ops vs. stateful ops for MXNet?

Contributor Author

I suggest focusing on the NNVM interface for both stateful and stateless ops from now on, as the legacy interface is going to be deprecated. You can follow this PR for more details on using the NNVM interface for stateful ops: #8302

Member

I see. Thanks for your help. :)
