Dynamic subgraph property #17034

samskalicky · 2019-12-10T08:19:29Z

Description

Initial PR for supporting dynamic loading of subgraph properties from libraries. This PR builds on the previous work for dynamic library loading (#15760), partition API enhancements (#15886), and dynamic custom operators (#15921). It enables partitioning a model using a partitioning strategy loaded from an external library.

The model's symbolic graph is analyzed at bind time to determine which operators should be partitioned into a subgraph. This information is kept in a custom SubgraphProperty class instance until partition time, when the custom SubgraphProperty uses it to guide partitioning of the model and to insert the custom subgraph operator specified in the external library.

At runtime, the operators in the model are executed normally, including the custom subgraph operator, so the external library executes the subgraph just like any regular operator. This provides an interface for parts of the model to be executed on custom accelerators without accelerator-specific changes to MXNet's source code.

Design

Rather than provide a pure subgraph property interface to external libraries, we will provide a single API function that the user will implement to control the partitioning. The whole symbolic graph will be provided to the user -- post infer type & infer shape -- to analyze and determine which ops they want to include in subgraphs. They will return the node ID for each node in the graph that is supported. In this PR we implement a fixed custom SubgraphProperty in MXNet that will interface between the current Subgraph API and this streamlined "supportedOps" API in the external library.

Here's an end-to-end overview, starting at the Python user's end. First, a user will load their custom library containing a custom subgraph operator and the implemented "supportedOps" API.

mx.library.load(path)

Then they will call the "optimize_for" API (from #15886) that will use the custom subgraph property to partition the graph. The name here is the name the user specified when registering their "supportedOps" function in the external library.

part_sym = sym.optimize_for("myProp")

This will then call the SubgraphProperty's "PrePartition" API and pass the whole model graph (post infer type/shape) to the custom subgraph backend/property.
https://github.com/apache/incubator-mxnet/blob/61013a8bf9ef8a7b79d684504df1b321b1efb8d8/src/c_api/c_api_symbolic.cc#L1305
The custom subgraph property's "PrePartition" API then calls the "supportedOps" API registered in the external library:

int retval = callSupportedOps_(supportedOps_, json, supportedNodeIDs.size(), ids,
                            opt_keys_.data(), opt_vals_.data(), opt_keys_.size());

Then the "supportedOps" API in the external library will be called. The symbol json string will be given as input, and any node that is supported by the external library will be set in a list of node IDs.

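MXReturnValue mySupportedOps(std::string json,
                             const int num_ids,
                             int *ids,
                             std::unordered_map<std::string, std::string>& options) {
  // Analyze the symbol json here and set ids[i] to a nonzero value
  // for every node i that the library supports.
  return MX_SUCCESS;
}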
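As a rough illustration of what a library author might do inside "supportedOps", the sketch below marks every node whose operator type appears in an allow-list. It assumes the symbol json has already been parsed into one operator name per node ID. The names `markSupported`, `node_ops`, and `allowed` are hypothetical and are not part of lib_api.h.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: node_ops[i] is the operator name of graph node i,
// assumed to have been extracted from the symbol json beforehand.
// Sets ids[i] to 1 for supported nodes and 0 otherwise, and returns
// the number of supported nodes.
inline int markSupported(const std::vector<std::string>& node_ops,
                         const std::unordered_set<std::string>& allowed,
                         int* ids, int num_ids) {
  assert(ids != nullptr);
  assert(num_ids >= 0 &&
         static_cast<std::size_t>(num_ids) == node_ops.size());
  int count = 0;
  for (int i = 0; i < num_ids; ++i) {
    ids[i] = allowed.count(node_ops[i]) > 0 ? 1 : 0;
    count += ids[i];
  }
  return count;
}
```

A real "supportedOps" implementation would call something like this after parsing `json`, then return MX_SUCCESS.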

Then in the "PrePartition" function, these node IDs will be converted back into node names:

for (int i = 0; i < num_ids; i++) {
  if (supportedNodeIDs[i]) {
    supportedNodes.push_back(idx[i].source->attrs.name);
    std::cout << idx[i].source->attrs.name << std::endl;
  }
}
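The same conversion can be sketched without any MXNet types. In this sketch `node_names[i]` stands in for `idx[i].source->attrs.name`, and `collectSupportedNames` is a hypothetical name chosen for the example, not a function in the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the indexed graph: node_names[i] plays the role
// of idx[i].source->attrs.name, and supported_flags[i] is nonzero when the
// library marked node i as supported.
inline std::vector<std::string> collectSupportedNames(
    const std::vector<std::string>& node_names,
    const std::vector<int>& supported_flags) {
  assert(node_names.size() == supported_flags.size());
  std::vector<std::string> names;
  for (std::size_t i = 0; i < supported_flags.size(); ++i) {
    if (supported_flags[i]) names.push_back(node_names[i]);
  }
  return names;
}
```

Matching on names rather than IDs assumes node names are unique within the graph. If they are not, two nodes could be selected by a single entry.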

The supportedNodes vector is passed when creating the SubgraphSelector:

virtual SubgraphSelectorPtr CreateSubgraphSelector() const {
  return std::make_shared<CustomContainOpSelector>(supportedNodes);
}

The SubgraphSelector's "Select" function is then used to check whether a given node should be included in the subgraph:

virtual bool Select(const nnvm::Node &n) {
  return supportedNodes_.count(n.attrs.name) > 0;
}

Finally, after the subgraph is created, the node inserted for it uses the operator that the user specified as their custom subgraph operator:

virtual nnvm::NodePtr CreateSubgraphNode(const nnvm::Symbol &sym,
                                         const int subgraph_id = 0) const {
  nnvm::NodePtr n = nnvm::Node::Create();
  n->attrs.op = Op::Get(subgraph_op_name);
  n->attrs.name = "_op" + std::to_string(subgraph_id);
  n->attrs.subgraphs.push_back(std::make_shared<nnvm::Symbol>(sym));
  return n;
}

At runtime, the custom subgraph operator is called like any regular operator, which executes the partitioned subgraph in the external library.

Users register their partitioning strategy using the following API where they specify the name of their subgraphBackend (and subgraphProperty), the supportedOps function, and the name of the custom operator they want inserted for each subgraph created:

REGISTER_PARTITIONER(myProp)
.setSupportedOps(mySupportedOps)
.setSubgraphOp("_custom_subgraph_op");

Because the MXNet subgraph API performs de-cycling and other checks, the subgraph reviewed by the supportedOps API may differ from the final subgraph. To give the library a chance to reject the final subgraph, we added an API, acceptSubgraph, that the library creator can implement. This API is optional; if implemented, it is called from the CreateSubgraphNode API in the MXNet subgraph property. If the subgraph is accepted, the subgraph op is created and inserted into the graph. If not, we reattach the subgraph inputs to the graph (reversing the functionality of CutGraphInputs in build_subgraph.cc), returning the graph to its original state.
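One policy a library might implement in acceptSubgraph is to reject subgraphs that are too small to be worth offloading. The sketch below illustrates only that decision. The `SubgraphSummary` type, the `acceptIfLargeEnough` function, and the threshold are assumptions made for the example and do not reflect the actual acceptSubgraph signature in lib_api.h.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical summary of a finalized subgraph, as a library might build it
// from the subgraph json it is handed for review.
struct SubgraphSummary {
  int subgraph_id;
  std::vector<std::string> op_names;  // operators inside the subgraph
};

// Returns true to accept the subgraph (the subgraph op would then be
// inserted) and false to reject it (the nodes would stay in the graph).
inline bool acceptIfLargeEnough(const SubgraphSummary& sg,
                                std::size_t min_ops) {
  assert(min_ops > 0);
  return sg.op_names.size() >= min_ops;
}
```

Other criteria, such as unsupported operator combinations that only appear after de-cycling, would slot into the same accept-or-reject decision.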

This PR also sets an attribute for each subgraph input node "isArg" that is "True" if the subgraph input is also an input to the model (and not an output of some other operator in the model). This can be used by the accelerator library to avoid unnecessary data movement for the same data, or to execute further optimizations knowing that a particular subgraph input will be unchanged between calls.
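To show how an accelerator library might consume the "isArg" attribute, here is a minimal sketch of an upload cache that skips re-copying inputs flagged as model arguments. It assumes the attributes arrive as a string map and that a flagged input really is unchanged between calls, which a real library would need to verify. `Attrs`, `isModelArg`, and `InputCache` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical attribute map for one subgraph input node.
using Attrs = std::unordered_map<std::string, std::string>;

// True when the input carries isArg == "True", i.e. it is also an input
// to the model rather than the output of another operator.
inline bool isModelArg(const Attrs& attrs) {
  auto it = attrs.find("isArg");
  return it != attrs.end() && it->second == "True";
}

// Tracks which model-argument inputs are already resident on the
// (simulated) device so they are uploaded only once.
class InputCache {
 public:
  // Returns true if the input had to be uploaded, false on a cache hit.
  bool prepare(const std::string& input_name, const Attrs& attrs) {
    assert(!input_name.empty());
    const bool cacheable = isModelArg(attrs);
    if (cacheable && resident_.count(input_name) > 0) return false;
    ++uploads_;
    if (cacheable) resident_.insert(input_name);
    return true;
  }
  int uploads() const { return uploads_; }

 private:
  std::unordered_set<std::string> resident_;
  int uploads_ = 0;
};
```

Inputs without the flag are uploaded on every call, since they come from other operators and may change between executions.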

Next Steps

docs, readme, tutorial
dynamic graph passes

Checklist

Essentials

Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage:
Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
Code is well-documented:
For user-facing API changes, API doc string has been updated.
For new C++ functions in header files, their functionalities and arguments are documented.
For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

samskalicky · 2019-12-14T05:38:21Z

@mseth10 @rondogency @szha @ptrendx @PatricZhao @TaoLv @ZhennanQin Initial PoC is working, would be great to get some early feedback before we go too far down the wrong road.

Some more things to do:

[Done] Support multiple supportedOps functions grouped together, similar to how subgraphProperties are grouped into subgraphBackends
[Done] Annotate the symbol json string with shape/type info from the graph object
[Done] Implement working example that computes real output values to compare to non-partitioned model execution
[Done] Add test to unittests to build library in the CI and test
[Done] Call a new API in the library, AnalyzeSubgraph, in the CreateSubgraphNode API to give the library a chance to review subgraphs and reject creating a subgraph if an unsupported configuration was created.
https://github.com/apache/incubator-mxnet/blob/faa283228d8e4aa391dd0877b7388996e9a0e223/src/operator/subgraph/build_subgraph.cc#L597-L599
[Done] Pass options_map to supportedOps function to let users specify custom options to configure their custom partitioning in the library
https://github.com/apache/incubator-mxnet/blob/a37a76c42bf790f288a06adfc81e2df325fe0c24/src/operator/subgraph/subgraph_property.h#L270-L271

mseth10 · 2020-01-06T09:54:26Z

Reviewed recent changes to allow library to add attributes to subgraph. Looks good to me.

samskalicky · 2020-01-07T01:55:01Z

Thanks for the review @eric-haibin-lin! I've made changes based on your feedback and updated the PR description with "Next Steps" for some to-do items you suggested.

TaoLv · 2020-01-07T08:18:28Z

You might want to have github issues or project to track the todos and the progress of this project.

samskalicky · 2020-01-07T08:24:32Z

> You might want to have github issues or project to track the todos and the progress of this project.

Thanks @TaoLv I created this issue: #17236

eric-haibin-lin

Some final nitpicks.

samskalicky · 2020-01-07T18:57:51Z

@mxnet-label-bot update [pr-awaiting-merge]

samskalicky · 2020-01-07T21:29:43Z

Apologies @eric-haibin-lin, my mistake on the comments. Thanks for the suggestions; I've gone through and changed the variable names to use the underscore format in the custom_subgraph_property.h file.

wkcn · 2020-01-08T23:43:45Z

Hi @samskalicky , is it ready to merge this PR?

samskalicky · 2020-01-08T23:50:30Z

@wkcn ready to go!

wkcn · 2020-01-09T00:02:03Z

Merged. Thank you!

Ubuntu added 2 commits December 10, 2019 08:17

initial commit

cf30d30

Merge branch 'master' of https://github.com/apache/incubator-mxnet in…

58a41ff

…to dynamic_subgraph_prop

samskalicky requested review from aaronmarkham, anirudh2290, eric-haibin-lin and szha as code owners December 10, 2019 08:19

Ubuntu and others added 19 commits December 11, 2019 05:42

Merge branch 'master' of https://github.com/apache/incubator-mxnet in…

702cf50

…to dynamic_subgraph_prop

added subgraphCreate API function to partition the model/graph

d17a118

added partitioner registration to c_api

Merge branch 'master' of https://github.com/apache/incubator-mxnet in…

e05d4e1

…to dynamic_subgraph_prop

cleaned up API, added API to specify subgraphOp

d0f2fb3

fixed whitespace

55e8861

added custom subgraph property

8c97c45

fixed whitespace

0ebf5b6

fixed subgraph property

84f8426

fixed whitespace

2a1c6fe

added registration of custom subgraph property in c_api

d3c277d

added support for calling supportOps from customSubgraphProperty

55e0974

Merge branch 'master' of https://github.com/apache/incubator-mxnet in…

060c8b6

…to dynamic_subgraph_prop

sending symbol json to supportedOps API

6a7e3f7

example is printing

Merge branch 'master' of https://github.com/apache/incubator-mxnet in…

0488879

…to dynamic_subgraph_prop

working partitioning example

fa206e4

fixed whitespace

79d08ea

fixed whitespace

bbe4062

fixed style

a418301

fixed style

420e77a

samskalicky added 3 commits December 15, 2019 06:52

Annotate the symbol json string with shape/type info from the graph o…

2f354a4

…bject

Support multiple supportOps functions grouped together. Similar to ho…

92e6f00

…w subgraphProperties are grouped into subgraphBackends

fixed whitespace

901d308

ZhennanQin reviewed Dec 16, 2019

src/operator/subgraph/partitioner/custom_subgraph_property.h

samskalicky added 2 commits January 5, 2020 00:54

fixed seconds

dda4c86

retrigger ci

1931599

mseth10 reviewed Jan 6, 2020

src/operator/subgraph/partitioner/custom_subgraph_property.h

eric-haibin-lin reviewed Jan 7, 2020

samskalicky added 2 commits January 7, 2020 01:51

Merge branch 'master' of https://github.com/apache/incubator-mxnet in…

31473ba

…to dynamic_subgraph_prop

removed unused code

39ed299

TaoLv reviewed Jan 7, 2020

Makefile

This was referenced Jan 7, 2020

[RFC] Custom Operator Part 2 #17006

Open

[RFC] Custom subgraph property enhancements #17236

Open

TaoLv approved these changes Jan 7, 2020

eric-haibin-lin reviewed Jan 7, 2020

example/extensions/lib_subgraph/subgraph_lib.cc

src/operator/subgraph/partitioner/custom_subgraph_property.h

lanking520 added the pr-awaiting-merge label (Review and CI is complete. Ready to Merge) and removed the pr-awaiting-review label (PR is waiting for code review) Jan 7, 2020

samskalicky mentioned this pull request Jan 7, 2020

Add CustomOp tutorial doc #17241

Merged

samskalicky added 2 commits January 7, 2020 21:15

Merge branch 'master' of https://github.com/apache/incubator-mxnet in…

13cfc1c

…to dynamic_subgraph_prop

fixed variable naming

967787e

eric-haibin-lin approved these changes Jan 7, 2020

Merge branch 'master' of https://github.com/apache/incubator-mxnet in…

f90df49

…to dynamic_subgraph_prop

wkcn merged commit ddeac2e into apache:master Jan 9, 2020

wkcn mentioned this pull request May 9, 2020

[MXNet Extensions] Include lib_api.h in the pre-built pip package #18267

Open

szha mentioned this pull request Aug 15, 2020

[Development] MXNet 2.0 Update #18931

Open
