This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Dynamic subgraph property #17034

Merged: 71 commits, Jan 9, 2020


samskalicky
Contributor

@samskalicky samskalicky commented Dec 10, 2019

Description

Initial PR for supporting dynamic loading of subgraph properties from libraries. This PR builds on the previous work for dynamic library loading (#15760), partition API enhancements (#15886), and dynamic custom operators (#15921). It enables partitioning a model using a partitioning strategy loaded from an external library.

The model's symbolic graph is analyzed at bind time to determine which operators should be partitioned into a subgraph. This information is kept in a custom SubgraphProperty class instance until partition time, when the custom SubgraphProperty uses it to guide partitioning of the model and to insert the custom subgraph operator specified in the external library.

At runtime, the operators in the model are executed normally, including the custom subgraph operator. At that point, the subgraph is executed by the external library just like any regular operator. This provides an interface for running parts of the model on custom accelerators without adding accelerator-specific changes to MXNet's source code.

Design

Rather than provide a pure subgraph property interface to external libraries, we will provide a single API function that the user will implement to control the partitioning. The whole symbolic graph will be provided to the user -- post infer type & infer shape -- to analyze and determine which ops they want to include in subgraphs. They will return the node ID for each node in the graph that is supported. In this PR we implement a fixed custom SubgraphProperty in MXNet that will interface between the current Subgraph API and this streamlined "supportedOps" API in the external library.

Here's an end-to-end overview, starting at the Python user's end. First, a user will load their custom library containing a custom subgraph operator and the implemented "supportedOps" API.

mx.library.load(path)

Then they will call the "optimize_for" API (from #15886) that will use the custom subgraph property to partition the graph. The name here is the name the user specified when registering their "supportedOps" function in the external library.

part_sym = sym.optimize_for("myProp")

This will then call the SubgraphProperty's "PrePartition" API and pass the whole model graph (post infer type/shape) to the custom subgraph backend/property.
https://github.com/apache/incubator-mxnet/blob/61013a8bf9ef8a7b79d684504df1b321b1efb8d8/src/c_api/c_api_symbolic.cc#L1305
In the custom subgraph property's "PrePartition" API, it will call the "supportedOps" API registered in the external library:

int retval = callSupportedOps_(supportedOps_, json, supportedNodeIDs.size(), ids,
                            opt_keys_.data(), opt_vals_.data(), opt_keys_.size());

Then the "supportedOps" API in the external library will be called. The symbol json string will be given as input, and any node that is supported by the external library will be set in a list of node IDs.

MXReturnValue mySupportedOps(std::string json,
                             const int num_ids,
                             int *ids,
                             std::unordered_map<std::string, std::string>& options) {
  return MX_SUCCESS;
}
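As a rough illustration of what a non-empty "supportedOps" body could look like, here is a minimal sketch that marks every exp and log node as supported. It makes several assumptions: the MXReturnValue enum is a local stand-in for the one in lib_api.h, extractOps is a hypothetical helper that naively scans for "op" keys (a real library should use a proper JSON parser), and num_ids is assumed to equal the number of nodes in the symbol JSON.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Local stand-in for the enum declared in lib_api.h (assumption for this sketch).
enum MXReturnValue { MX_FAIL = 0, MX_SUCCESS = 1 };

// Hypothetical helper: collect the value of every "op" key, in node order.
// This is a naive string scan for illustration only, not a JSON parser.
std::vector<std::string> extractOps(const std::string& json) {
  std::vector<std::string> ops;
  const std::string key = "\"op\"";
  std::size_t pos = 0;
  while ((pos = json.find(key, pos)) != std::string::npos) {
    const std::size_t colon = json.find(':', pos + key.size());
    const std::size_t open = json.find('"', colon);
    if (open == std::string::npos) break;
    const std::size_t close = json.find('"', open + 1);
    if (close == std::string::npos) break;
    ops.push_back(json.substr(open + 1, close - open - 1));
    pos = close + 1;
  }
  return ops;
}

MXReturnValue mySupportedOps(std::string json,
                             const int num_ids,
                             int *ids,
                             std::unordered_map<std::string, std::string>& options) {
  (void)options;  // no options are consulted in this sketch
  assert(ids != nullptr);
  const std::vector<std::string> ops = extractOps(json);
  // Assumption: one entry in ids per node in the graph.
  if (static_cast<int>(ops.size()) != num_ids) return MX_FAIL;
  for (int i = 0; i < num_ids; ++i) {
    // Flag the node as supported (1) or unsupported (0).
    ids[i] = (ops[i] == "exp" || ops[i] == "log") ? 1 : 0;
  }
  return MX_SUCCESS;
}
```

Returning MX_FAIL on a node-count mismatch is a choice made for this sketch, so that a parsing problem surfaces early instead of silently producing a wrong partition.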

Then in the "PrePartition" function, these node IDs will be converted back into node names:

for (int i = 0; i < num_ids; i++) {
  if (supportedNodeIDs[i]) {
    supportedNodes.push_back(idx[i].source->attrs.name);
    std::cout << idx[i].source->attrs.name << std::endl;
  }
}
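The conversion step can be tried in isolation. The sketch below assumes the per-node flags and the node names are available as two parallel vectors; supportedNodeNames is a hypothetical name, and the real code reads names from the indexed nnvm graph instead.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-alone version of the ID-to-name conversion:
// flags[i] != 0 means node i was marked as supported by the library.
std::vector<std::string> supportedNodeNames(const std::vector<int>& flags,
                                            const std::vector<std::string>& names) {
  assert(flags.size() == names.size());
  std::vector<std::string> supported;
  for (std::size_t i = 0; i < flags.size(); ++i) {
    if (flags[i]) supported.push_back(names[i]);
  }
  return supported;
}
```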

The supportedNodes vector is passed when creating the SubgraphSelector:

virtual SubgraphSelectorPtr CreateSubgraphSelector() const {
  return std::make_shared<CustomContainOpSelector>(supportedNodes);
}

Then the SubgraphSelector's "Select" function checks whether a given node should be included in the subgraph:

virtual bool Select(const nnvm::Node &n) {
  return supportedNodes_.count(n.attrs.name) > 0;
}
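To show the selection logic without pulling in nnvm, here is a small sketch with stand-in types. Node and NameSetSelector are hypothetical names used only here; the selector copies the supported names into a hash set so each Select call is a constant-time lookup on average.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Stand-in for nnvm::Node, reduced to the one field the selector reads.
struct Node {
  std::string name;
};

// Hypothetical selector that mirrors the name-based membership test.
class NameSetSelector {
 public:
  explicit NameSetSelector(const std::vector<std::string>& supported)
      : supported_(supported.begin(), supported.end()) {}

  // A node is selected only if the library marked it as supported.
  bool Select(const Node& n) const { return supported_.count(n.name) > 0; }

 private:
  std::unordered_set<std::string> supported_;
};
```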

Finally, after the subgraph is created the operator used is the one specified by the user for their custom subgraph operator:

virtual nnvm::NodePtr CreateSubgraphNode(const nnvm::Symbol &sym,
                                         const int subgraph_id = 0) const {
  nnvm::NodePtr n = nnvm::Node::Create();
  n->attrs.op = Op::Get(subgraph_op_name);
  n->attrs.name = "_op" + std::to_string(subgraph_id);
  n->attrs.subgraphs.push_back(std::make_shared<nnvm::Symbol>(sym));
  return n;
}

At runtime, the custom subgraph operator is invoked like any regular user operator, which executes the partitioned subgraph in the external library.

Users register their partitioning strategy using the following API where they specify the name of their subgraphBackend (and subgraphProperty), the supportedOps function, and the name of the custom operator they want inserted for each subgraph created:

REGISTER_PARTITIONER(myProp)
.setSupportedOps(mySupportedOps)
.setSubgraphOp("_custom_subgraph_op");
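For readers unfamiliar with this registration style, the sketch below shows one way a chained, macro-based registration can work. It is a toy and not MXNet's implementation: ToyPartitioner, ToyRegistry, TOY_REGISTER_PARTITIONER, and the simplified SupportedOpsFn signature are all hypothetical names invented for this example.

```cpp
#include <string>
#include <unordered_map>

// Simplified callback type (assumption: the real signature also takes options).
using SupportedOpsFn = int (*)(const std::string& json, int num_ids, int* ids);

// Toy entry with setters that return *this so calls can be chained.
class ToyPartitioner {
 public:
  ToyPartitioner& setSupportedOps(SupportedOpsFn fn) { supported_ops_ = fn; return *this; }
  ToyPartitioner& setSubgraphOp(const std::string& op) { subgraph_op_ = op; return *this; }
  SupportedOpsFn supportedOps() const { return supported_ops_; }
  const std::string& subgraphOp() const { return subgraph_op_; }

 private:
  SupportedOpsFn supported_ops_ = nullptr;
  std::string subgraph_op_;
};

// Toy registry keyed by partitioner name.
class ToyRegistry {
 public:
  static ToyRegistry& get() { static ToyRegistry r; return r; }
  ToyPartitioner& add(const std::string& name) { return entries_[name]; }
  const ToyPartitioner* find(const std::string& name) const {
    auto it = entries_.find(name);
    return it == entries_.end() ? nullptr : &it->second;
  }

 private:
  std::unordered_map<std::string, ToyPartitioner> entries_;
};

// The macro creates the entry at static-initialization time;
// the chained setters that follow it fill the entry in.
#define TOY_REGISTER_PARTITIONER(Name) \
  static ToyPartitioner& toy_partitioner_##Name = ToyRegistry::get().add(#Name)

// Trivial strategy for the demo: mark every node as supported.
int markAll(const std::string&, int num_ids, int* ids) {
  for (int i = 0; i < num_ids; ++i) ids[i] = 1;
  return 1;
}

TOY_REGISTER_PARTITIONER(myProp)
    .setSupportedOps(markAll)
    .setSubgraphOp("_custom_subgraph_op");
```

Looking the entry up by name, for example with ToyRegistry::get().find("myProp"), is roughly analogous to how the name passed to "optimize_for" selects a registered strategy.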

Given that the MXNet subgraph API performs de-cycling and other checks, the subgraph reviewed by the supportedOps API may differ from the final subgraph. To give the library an additional option to reject the final subgraph combination, we added an additional API, acceptSubgraph, that the library creator can implement. This API is optional; if implemented, it is called from the CreateSubgraphNode API in the MXNet subgraph property. If the subgraph is accepted, the subgraph op is created and inserted into the graph. If not, we reattach the subgraph inputs to the graph (reversing the functionality of CutGraphInputs in build_subgraph.cc), returning the graph to its original state.
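The accept/reject decision can be pictured with the sketch below. It uses stand-in types only: SubgraphNode, AcceptFn, and this free-standing createSubgraphNode are hypothetical, and returning a null pointer stands in for "leave the graph unchanged". The callback signature is an assumption for illustration, not the one declared in lib_api.h.

```cpp
#include <functional>
#include <memory>
#include <string>

// Stand-in for the node that replaces an accepted subgraph.
struct SubgraphNode {
  std::string op;
  std::string name;
};

// Hypothetical optional callback; an empty std::function means "not implemented".
using AcceptFn = std::function<bool(const std::string& subgraph_json, int subgraph_id)>;

// Returns the new subgraph node, or nullptr when the library rejects the subgraph.
std::unique_ptr<SubgraphNode> createSubgraphNode(const std::string& subgraph_json,
                                                 int subgraph_id,
                                                 const std::string& subgraph_op_name,
                                                 const AcceptFn& accept) {
  // The callback is optional: without it, every subgraph is accepted.
  if (accept && !accept(subgraph_json, subgraph_id)) {
    return nullptr;  // the caller would restore the original graph here
  }
  return std::unique_ptr<SubgraphNode>(
      new SubgraphNode{subgraph_op_name, "_op" + std::to_string(subgraph_id)});
}
```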

This PR also sets an attribute for each subgraph input node "isArg" that is "True" if the subgraph input is also an input to the model (and not an output of some other operator in the model). This can be used by the accelerator library to avoid unnecessary data movement for the same data, or to execute further optimizations knowing that a particular subgraph input will be unchanged between calls.
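As one possible use of the "isArg" attribute, a library could pick out the subgraph inputs that are candidates for caching on the device. The sketch below assumes each input's attributes arrive as a string-to-string map; Attrs, isModelArg, and cacheableInputs are hypothetical names, and whether caching is actually safe depends on how the model is used.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

using Attrs = std::unordered_map<std::string, std::string>;

// True when the input carries isArg == "True"; a missing attribute counts as false.
bool isModelArg(const Attrs& attrs) {
  auto it = attrs.find("isArg");
  return it != attrs.end() && it->second == "True";
}

// Indices of the subgraph inputs that are also inputs to the model.
std::vector<std::size_t> cacheableInputs(const std::vector<Attrs>& inputs) {
  std::vector<std::size_t> result;
  for (std::size_t i = 0; i < inputs.size(); ++i) {
    if (isModelArg(inputs[i])) result.push_back(i);
  }
  return result;
}
```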

Next Steps

  • docs, readme, tutorial
  • dynamic graph passes

Checklist

Essentials

  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

@samskalicky
Contributor Author

samskalicky commented Dec 14, 2019

@mseth10 @rondogency @szha @ptrendx @PatricZhao @TaoLv @ZhennanQin Initial PoC is working, would be great to get some early feedback before we go too far down the wrong road.

Some more things to do:

@mseth10
Contributor

mseth10 commented Jan 6, 2020

Reviewed recent changes to allow library to add attributes to subgraph. Looks good to me.

@samskalicky
Contributor Author

Thanks for the review @eric-haibin-lin! I've made changes based on your feedback and updated the PR description with "Next Steps" for some of the to-do items you suggested.

@TaoLv
Member

TaoLv commented Jan 7, 2020

You might want to have github issues or project to track the todos and the progress of this project.

@samskalicky
Contributor Author

samskalicky commented Jan 7, 2020

You might want to have github issues or project to track the todos and the progress of this project.

Thanks @TaoLv I created this issue: #17236

Member

@eric-haibin-lin eric-haibin-lin left a comment


Some final nitpicks.

@samskalicky
Contributor Author

@mxnet-label-bot update [pr-awaiting-merge]

@lanking520 lanking520 added pr-awaiting-merge Review and CI is complete. Ready to Merge and removed pr-awaiting-review PR is waiting for code review labels Jan 7, 2020
@samskalicky samskalicky mentioned this pull request Jan 7, 2020
7 tasks
@samskalicky
Contributor Author

Apologies @eric-haibin-lin, my mistake on the comments. Thanks for the suggestions, I've gone through and changed the variable names to use the underscore format in the custom_subgraph_property.h file.

@wkcn
Member

wkcn commented Jan 8, 2020

Hi @samskalicky , is it ready to merge this PR?

@samskalicky
Contributor Author

@wkcn ready to go!

@wkcn wkcn merged commit ddeac2e into apache:master Jan 9, 2020
@wkcn
Member

wkcn commented Jan 9, 2020

Merged. Thank you!

Labels
pr-awaiting-merge Review and CI is complete. Ready to Merge

9 participants