
MXNet Extensions enhancements #17885

Merged: 35 commits into apache:master on Apr 21, 2020

Conversation


@samskalicky commented Mar 22, 2020

Description

Enhancements to MXNet Extensions (subgraph property, custom ops, custom graph passes). Addresses a few points from #17236. More description on the way!

Features

  • Enhanced supportedOps to allow graph coloring (specifying which subgraph each op should be partitioned into)
  • Enhanced partitioners to use a Selector class instead of supportedOps
  • Added a new example of a custom subgraph op using the selector class
  • Cleaned up lib_api.h and library loading in c_api.cc
  • Added an option to silence the library-loading output that prints the ops, partitioners, and passes found in a library
  • Added support for custom graph passes, plus docs/README
  • Added a new example of a custom pass library with 2 example passes
  • Added support for allocating new args/aux within a pass, or replacing existing args/aux
  • Added the new custom pass lib to Makefile/CMakeLists.txt
  • Added building of sparse custom op libraries to Makefile/CMakeLists.txt
  • Compile custom libs with the lowest supported standard (C++11), and build with C++17 for testing against MXNet

SupportedOps Enhancements for graph coloring

In the custom partitioner API, custom library writers could implement the supportedOps API to specify which ops to include in a subgraph by setting True/False for each node ID in the graph. This PR adds support for assigning each node to a specific subgraph by giving it an integer subgraph ID, or -1 to indicate that the node can go into any subgraph.
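
To make this concrete, here is a minimal sketch of a supportedOps implementation that colors the graph. It is illustrative only: the function name and the coloring logic are hypothetical, and the exact signature should be checked against lib_api.h and the lib_subgraph examples.

```c++
#include <string>
#include <unordered_map>
#include <vector>
#include "lib_api.h"

// Hypothetical sketch: instead of marking each node true/false, assign each
// node an integer subgraph ID, or -1 to let MXNet place it in any subgraph.
MXReturnValue mySupportedOps(const std::string& json,
                             std::vector<int>* ids,
                             const std::unordered_map<std::string, std::string>& options) {
  for (size_t i = 0; i < ids->size(); i++) {
    // For illustration only: color the first half of the nodes into
    // subgraph 0 and leave the rest unconstrained.
    (*ids)[i] = (i < ids->size() / 2) ? 0 : -1;
  }
  return MX_SUCCESS;
}
```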

Selector class for subgraph creation

In the custom partitioner API, custom library writers could implement the supportedOps API to specify which ops to include in a subgraph, but some custom partitioners may want more control. This PR adds support for implementing the equivalent of MXNet's internal SubgraphSelector class in a custom library. Custom library writers can choose to implement either the supportedOps API or the CustomOpSelector class for their partitioner.
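
As a sketch, a custom selector might look like the following. The Select/SelectInput/SelectOutput method names mirror MXNet's internal SubgraphSelector; the exact base-class interface and constructor are assumptions to be verified against lib_api.h.

```c++
#include <string>
#include "lib_api.h"

// Hypothetical sketch of a CustomOpSelector implementation.
class MySelector : public CustomOpSelector {
 public:
  explicit MySelector(const std::string& json) : graph_json(json) {}
  // Return true if the node with this ID should be included in a subgraph.
  virtual bool Select(int nodeID) { return true; }
  // Return true to grow the subgraph across the edge to this input node.
  virtual bool SelectInput(int nodeID, int input_nodeID) { return false; }
  // Return true to grow the subgraph across the edge to this output node.
  virtual bool SelectOutput(int nodeID, int output_nodeID) { return false; }
 private:
  std::string graph_json;  // the model graph, for inspecting node attributes
};
```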

Custom Graph Passes

This PR adds the ability to register a custom graph pass in a library. Working backwards from the custom library writer, the Pass API is implemented with the following interface:

MXReturnValue myPass(const std::string& in_graph, const std::string** out_graph,
                     const std::unordered_map<std::string, std::string>& options,
                     const std::unordered_map<std::string, MXTensor>& args,
                     const std::unordered_map<std::string, MXTensor>& aux,
                     const PassResource& res);

The model graph is passed in as a JSON string, and the pass returns a new JSON string for the modified graph. Options specified by the MXNet user at the Python level are passed in the options map. If the MXNet user provided args/aux at the Python level, they are passed to the pass through the args/aux arguments. The PassResource class res exposes two functions, alloc_arg and alloc_aux, that let the custom library writer allocate NDArrays for new or replacement args/aux within the custom pass (a sketch of a complete pass body follows the registration snippet below). Custom passes are registered in the library with this syntax:

REGISTER_PASS(myPassName)
.setBody(myPassFunc);
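
Putting the pieces together, a minimal identity pass under the signature above might look like the sketch below. The alloc_arg call (shown in a comment) and its argument order are assumptions; consult lib_api.h and the pass-library example for the authoritative API.

```c++
#include <string>
#include <unordered_map>
#include "lib_api.h"

// Sketch of a pass body: returns the graph unchanged and notes where a new
// arg would be allocated via the PassResource.
MXReturnValue myPass(const std::string& in_graph, const std::string** out_graph,
                     const std::unordered_map<std::string, std::string>& options,
                     const std::unordered_map<std::string, MXTensor>& args,
                     const std::unordered_map<std::string, MXTensor>& aux,
                     const PassResource& res) {
  // A real pass would parse and rewrite the JSON graph here. A new or
  // replacement weight could be allocated with something like:
  //   MXTensor* w = res.alloc_arg("my_new_weight", {3, 2}, MXContext::CPU(0), kFloat32);
  *out_graph = new std::string(in_graph);  // hand the (modified) JSON back to MXNet
  return MX_SUCCESS;
}

REGISTER_PASS(myIdentityPass)
.setBody(myPass);
```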

For each custom pass found in the library during loading, a lambda function is registered with MXNet's pass registry:

auto pass_lambda = [=] (nnvm::Graph&& g) {
    ...
    CHECK(callGraphPass(...));
    ...
    return out_graph;
};

nnvm::PassFunctionReg& pass = dmlc::Registry<nnvm::PassFunctionReg>::Get()->__REGISTER__(myPassName);
pass.set_body(pass_lambda);
pass.set_change_graph(true);

From the MXNet front end, users invoke custom passes through the same optimize_for API. The backend name passed to optimize_for is first looked up among the registered subgraph backends; if it is not found there, it is looked up among the registered graph passes:

if (mxnet::op::SubgraphBackendRegistry::Get()->backend_map_.count(backend_name) > 0) {
    // use subgraph backend
    ...
} else if (dmlc::Registry<nnvm::PassFunctionReg>::Find(backend_name) != nullptr) {
    // use graph pass
    ...
    g = ApplyPass(std::move(g), backend_name);
    ...
}

Compilation

lib_api.h was designed to be as generic as possible so that users can compile their custom library with any version of C++ (C++11 or higher) and GLIBC that fits the needs of their application. This does not have to match the version of C++ or GLIBC used to compile MXNet itself. To test this, we compile the example libraries with C++11 to check for compiler errors, and also compile and test with C++17 to match what MXNet uses.


@samskalicky (Contributor, Author):

@mxnet-bot run ci [sanity]

@mxnet-bot:

Jenkins CI successfully triggered : [sanity]

@samskalicky requested a review from leezu as a code owner, April 7, 2020 23:27

samskalicky commented Apr 7, 2020

@mxnet-bot run ci [sanity]

@mxnet-bot:

Jenkins CI successfully triggered : [sanity]

@samskalicky (Contributor, Author):

@mxnet-bot run ci [sanity]

@mxnet-bot:

Jenkins CI successfully triggered : [sanity]

Review thread on include/mxnet/lib_api.h (outdated, resolved)
if(USE_CUDA)
add_library(customop_gpu_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/relu_lib.cu)
target_include_directories(customop_gpu_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include/mxnet)
endif()
if(MSVC)
if(UNIX)
Contributor:

those things can be deleted

Contributor Author:

done

Contributor:

I mean you don't need to add -shared even for customop_gpu_lib, just change the lines inside MSVC block

std::vector<MXTensor> outputs,
OpResource res) {
MXReturnValue backward(const std::unordered_map<std::string, std::string>& attrs,
std::vector<MXTensor>* inputs,
Contributor:

don't forget to update this change to lib_custom_op README

Contributor Author:

done

@rondogency (Contributor):

also please update the lib_api version to 10

@samskalicky (Contributor, Author):

> also please update the lib_api version to 10

changed to version 7

@ptrendx (Member) left a comment:

LGTM

@rondogency (Contributor) left a comment:

LGTM. Thanks for the hard work writing custom graph pass support and making the MXNet extensions well organized! I've left a few minor suggestions; if you feel they're not worth a CI run, feel free to ignore them.


Custom Operator support was merged (#15921, #17270) and is not available in versions of MXNet prior to v1.7.0.
To access the feature now, please install MXNet by compiling from source using master or using the previously mentioned commits, downloading one of the nightly builds, or from a release of MXNet 1.7.0+.
For running the following example, it doesn’t matter if it is a CUDA, MKLDNN or plain MXNet build; the custom operator doesn’t interact with the execution of other native MXNet operators.
To run the following example, the build type of MXNet doesn’t matter since the custom operator doesn’t interact with the execution of other native MXNet operators.
Contributor:

it is better to add a prerequisite here for running the examples, like "This requires GCC > 5 or CUDA > 9 to run the examples"

* register custom ops for library authors
* register custom ops, partitioner, and passes
* for library authors
* See example/extension/lib_custom_op/README.md
Contributor:

maybe we can rephrase it to "APIs to write extension library, see ... for registering custom operators, ... for custom partitioners, ... for custom graph passes"

@@ -45,7 +49,7 @@
#endif

/* Make sure to update the version number everytime you make changes */
#define MX_LIBRARY_VERSION 6
#define MX_LIBRARY_VERSION 7
Contributor:

I still feel it is better to make it 10

also, it would be better to add to the version-check message at c_api.cc line 339 something like "please update lib_api.h to match the version supported by the MXNet backend"


ptrendx commented Apr 21, 2020

Synced offline with @samskalicky and @rondogency - they are ok with doing the last changes proposed by @rondogency in another PR targeting master specifically, so this PR is ready to merge.

@ptrendx merged commit e761f84 into apache:master, Apr 21, 2020
@@ -200,6 +208,9 @@ If the number of input and output tensors are fixed, you can use hard-coded numb
* **inferType**: This function takes three arguments. The 1st argument is the attributes (same as above). The 2nd argument is a list of input data types corresponding to the input tensors. The 3rd argument is the placeholder for output tensor data types you need to assign.
For example, if this operator has one input and one output, and data type doesn't change, then you can do `outtypes[0] = intypes[0]` to populate the data type.

* **inferSType**: This function takes three arguments. The 1st argument is the attributes (same as above). The 2nd argument is a list of input storage types corresponding to the input tensors. The 3rd argument is the placeholder for output storage types you need to assign.
For example, if this operator has one input and one output, and data type doesn't change, then you can do `outtypes[0] = intypes[0]` to populate the data type.
Member:

data type doesn’t change -> data storage type doesn’t change

Member:

a list of input storage types corresponding to the input tensors -> a list of input storage types corresponding to the input tensors (dense, row_sparse, or CSR). For details, see https://cwiki.apache.org/confluence/display/MXNET/A+Guide+to+Implementing+Sparse+Operators+in+MXNet+Backend

It would be good to include the link above in case people wonder why/if inferSType is needed.

Contributor Author:

We need a whole overview for Sparse. Maybe you can help us add another section to the README about that.


```python
import mxnet as mx
mx.library.load(‘libmypass_lib.so’)
```

Member:

‘libmypass_lib.so’ -> 'libmypass_lib.so'

Contributor Author:

what am i missing? looks the same to me...


leezu commented Apr 21, 2020

@ptrendx let's not merge commits named "[WIP]" to master. You can edit the name prior to merge


ptrendx commented Apr 21, 2020

Oops, you are right, sorry, I missed that.

@samskalicky (Contributor, Author):

> @ptrendx let's not merge commits named "[WIP]" to master. You can edit the name prior to merge

We didn't want to rerun the whole CI. Did we fix the problem where renaming the PR reruns CI, @leezu?

@samskalicky changed the title from "[WIP] MXNet Extensions enhancements" to "MXNet Extensions enhancements", Apr 21, 2020
samskalicky added a commit to samskalicky/incubator-mxnet that referenced this pull request Apr 21, 2020
* add debug prints to debug error in CI
* add debug prints to debug error in CI
* remove prints
* initial commit
* enabled calling create for selector
* connected selector to call external class
* added code to remove temp graph attrs
* fixed build issues
* changed shape inference to use different attr names
* fixed selector class
* cleaned up APIs
* fixed sanity
* updated build for extensions
* sanity fix
* refactored MXLoadLib into separate functions
* undo rebase
* finished merge
* enabled verbose in library loading
* fixed example
* added passing args/aux down to graph pass
* added creating new args/aux for graph passes
* fixed return args/aux
* fixed sanity
* whitespace
* fixed lint
* updated perl API, README, added pass_lib to cmake build flow
* fixed mistake with relu example lib
* fixed perl syntax
* addressed comments
* addressed more comments
* fixed compile issues

Co-authored-by: Ubuntu <ubuntu@ip-172-31-31-148.us-west-2.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-31-217.us-west-2.compute.internal>
This was referenced Apr 21, 2020
samskalicky added a commit to samskalicky/incubator-mxnet that referenced this pull request Apr 21, 2020

leezu commented Apr 21, 2020

> We didn't want to rerun the whole CI. Did we fix the problem where renaming the PR reruns CI, @leezu?

@samskalicky when merging the commit, GitHub allows editing the commit message, so the commit message can differ from the PR title. Indeed, we can't currently edit the PR title without retriggering the CI (cc @ChaiBapchya).

ptrendx pushed a commit that referenced this pull request Apr 22, 2020
TaoLv pushed a commit that referenced this pull request Apr 23, 2020
AntiZpvoh pushed a commit to AntiZpvoh/incubator-mxnet that referenced this pull request Jul 6, 2020