This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Adding sparse support to MXTensor for custom operators #17569

Merged
merged 28 commits into apache:master from the sparseCustomOps branch on Mar 22, 2020

Conversation

guanxinq
Contributor

@guanxinq guanxinq commented Feb 11, 2020

Description

Add support for sparse custom operators, covering the row sparse and CSR formats.
This is a continuation of the custom operators project; initial CPU support was implemented in #15921 and GPU support in #17270.

  • Added the MXSparse structure. For sparse tensors, the data_ptr of MXTensor points to an object of this structure; for dense tensors it points to the data directly (a rough sketch of the structure's fields follows this list).
  • Added a data storage type enum for MXTensor, supporting dense, row sparse, and CSR formats.
  • Added an alloc_sparse() function to OpResource, which fixes the sparse output size issue at run time.
  • Added an inferSType() function, which lets a library provide its own storage type inference implementation.
  • Created new example libraries in "example/extensions/lib_custom_op" for CSR and row sparse tensor transpose operators.
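
For orientation, the following is a rough sketch of the fields MXSparse is expected to expose, inferred from how it is used in the Design section below. Only data, indices, and indptr appear in this PR's snippets; the remaining field names are assumptions, and the authoritative definition lives in include/mxnet/lib_api.h.

// Sketch only: field names beyond data/indices/indptr are assumptions.
struct MXSparse {
  void*    data;         // non-zero values
  int64_t  data_len;     // number of stored values (assumed field)
  int64_t* indices;      // row ids (row sparse) or column ids (CSR)
  int64_t  indices_len;  // length of indices (assumed field)
  int64_t* indptr;       // CSR row-pointer array; unused for row sparse
  int64_t  indptr_len;   // length of indptr (assumed field)
};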

Design

At the lower level, alloc_sparse() calls the NDArray member function CheckAndAlloc(). To invoke this member function, we added lambda functions, following the same pattern used for alloc_cpu().

auto sparse_alloc = [&](int index, int indices_len, int idxptr_len,
                      void** data, int64_t** indices, int64_t** indptr) {
  // Row Sparse
  if(idxptr_len == 0) {
    outputs[index].CheckAndAlloc({mshadow::Shape1(indices_len)});
    *data = outputs[index].data().dptr_;
    *indices = (int64_t*)outputs[index].aux_data(rowsparse::kIdx).dptr_;
  }
  // CSR
  else {
    outputs[index].CheckAndAlloc({mshadow::Shape1(idxptr_len), mshadow::Shape1(indices_len)});
    *data = outputs[index].data().dptr_;
    *indices = (int64_t*)outputs[index].aux_data(csr::kIdx).dptr_;
    *indptr = (int64_t*)outputs[index].aux_data(csr::kIndPtr).dptr_;
  }
};
typedef decltype(sparse_alloc) alloc_type_sparse;
auto sparse_malloc = [](void* _sparse_alloc, int index, int indices_len, int idxptr_len,
                       void** data, int64_t** indices, int64_t** indptr) {
  alloc_type_sparse* sparsealloc = static_cast<alloc_type_sparse*>(_sparse_alloc);
  (*sparsealloc)(index, indices_len, idxptr_len, data, indices, indptr);
};
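
The capturing lambda sparse_alloc cannot decay to a plain function pointer, so sparse_malloc wraps it: the lambda travels as an opaque void* and the non-capturing wrapper casts it back and invokes it, presumably because the callback has to cross the C-style boundary into the dynamically loaded custom operator library, just like the existing alloc_cpu() path. A minimal standalone illustration of the idiom (not MXNet code):

#include <iostream>

int main() {
  int counter = 0;
  auto impl = [&](int step) { counter += step; };   // capturing lambda: no function-pointer conversion
  typedef decltype(impl) impl_type;
  auto trampoline = [](void* opaque, int step) {    // non-capturing: decays to a function pointer
    (*static_cast<impl_type*>(opaque))(step);
  };
  void (*fn)(void*, int) = trampoline;              // plain C-style function pointer
  fn(&impl, 5);
  fn(&impl, 7);
  std::cout << counter << std::endl;                // prints 12
  return 0;
}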

This lambda is then invoked through alloc_sparse() within OpResource:

void alloc_sparse(MXSparse* sparse, int index, int indices_len, int indptr_len = 0) {
  sparse_malloc(sparse_alloc, index, indices_len, indptr_len,
               &(sparse->data), &(sparse->indices), &(sparse->indptr));
}

In the custom operator implementation, users can then set the output tensor size as follows:

MXReturnValue forward(std::map<std::string, std::string> attrs,
                      std::vector<MXTensor> inputs,
                      std::vector<MXTensor> outputs,
                      OpResource res) {
  // ... operator implementation ...
  MXSparse* ptr = outputs[index].data<MXSparse>();
  res.alloc_sparse(ptr, index, indices_len, indptr_len);
  // ... parts of the implementation that rely on the output size ...
  return MX_SUCCESS;
}
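
Storage type inference pairs with this: the new inferSType() callback lets the library tell MXNet which storage formats its outputs use. The sketch below is hypothetical; the signature is an assumption modeled on the other infer* callbacks (and, like the forward() snippet above, it assumes lib_api.h is included). The real registrations are in example/extensions/lib_custom_op/transposecsr_lib.cc and its row sparse counterpart.

// Hypothetical sketch: the signature is an assumption, not verbatim lib_api.h.
MXReturnValue inferSType(std::map<std::string, std::string> attrs,
                         std::vector<int>& instypes,
                         std::vector<int>& outstypes) {
  // A transpose-style operator keeps the storage format of its input:
  // CSR in -> CSR out, row sparse in -> row sparse out.
  outstypes[0] = instypes[0];
  return MX_SUCCESS;
}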

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@guanxinq guanxinq changed the title from [WIP] Support sparse custom operator to [WIP] Adding sparse support to MXTensor for custom operators on Feb 13, 2020
Member

@eric-haibin-lin eric-haibin-lin left a comment

Look forward to the complete API with example and documentation :)

Contributor

@mseth10 mseth10 left a comment

Thanks for the contribution @guanxinq . Left a few comments.

Member

@eric-haibin-lin eric-haibin-lin left a comment

Can you resolve conflicts?

@guanxinq guanxinq force-pushed the sparseCustomOps branch 2 times, most recently from fc65f7d to 79d7d64 on March 19, 2020 03:34
@guanxinq guanxinq force-pushed the sparseCustomOps branch 2 times, most recently from efd3dbb to cafd3a3 on March 19, 2020 23:17
@samskalicky
Contributor

can you update MX_LIBRARY_VERSION to 5?

Contributor

@rondogency rondogency left a comment

LGTM! Thanks for the contribution!

@guanxinq
Contributor Author

can you update MX_LIBRARY_VERSION to 5?

Updated.

@rondogency
Contributor

@wkcn @eric-haibin-lin this PR is ready to merge!

Member

@wkcn wkcn left a comment

LGTM. Thank you for the contribution!

Contributor

@mseth10 mseth10 left a comment

LGTM

@wkcn wkcn added the pr-awaiting-merge (Review and CI is complete. Ready to Merge) label on Mar 21, 2020
@wkcn wkcn merged commit f01dc80 into apache:master Mar 22, 2020
anirudh2290 added a commit to anirudh2290/mxnet that referenced this pull request Mar 27, 2020
* 'master' of https://github.com/apache/incubator-mxnet: (192 commits)
  * impl - FFI for np einsum (apache#17869)
  [Numpy] FFI for diag/diagonal/diag_indices_from (apache#17789)
  [Numpy] Kron operator (apache#17323)
  cmake: Set DMLC_LOG_FATAL_THROW only for building mxnet and not for tvm (apache#17878)
  Add simplified HybridBlock.forward without F (apache#17530)
  Use FP32 copy of weights for norm (multitensor LAMB optimizer) (apache#17700)
  Use multi-tensor sumSQ in clip_global_norm (apache#17652)
  [Numpy] Add op fmax, fmin, fmod (apache#17567)
  Adding sparse support to MXTensor for custom operators (apache#17569)
  Update 3rdparty/mkldnn to v1.2.2 (apache#17313)
  Dynamic subgraph compile support (apache#17623)
  Refactor cpp-package CMakeLists.txt & add missing inference/imagenet_inference (apache#17835)
  staticbuild: Fix potential user-assisted execution of arbitrary code  (apache#17860)
  * FFI for np.argmax and np.argmin (apache#17843)
  ffi for roll/rot90 (apache#17861)
  Skip test_multi_worker_dataloader_release_pool on OS X (apache#17797)
  add ffi for full_like, binary (apache#17811)
  HybridBlock.export() to return created filenames (apache#17758)
  Fix SoftReLU fused operator numerical stability (apache#17849)
  CI: Test clang10 cpu & gpu builds with -WError (apache#17830)
  ...
MoisesHer pushed a commit to MoisesHer/incubator-mxnet that referenced this pull request Apr 10, 2020
* Added enum for sparse storage

* Add structure for Dense and Sparse

* redesign the data structure for MXSparse

* pull out aux data from sparse NDArray

* Added more sparse arguments to API interface

* Passed sparse from c_api to lib_api.h and set in MXTensor

* Fix indent

* fix segfault

* Fix NDArray to MXTensor errors

* Add a sample of sparse(CSR) transpose

* Make CSR transpose temporarily work by hardcoding

* Fixed sparse output size(Refined)

* Add tests for symbolic and stateful ops

* Added a sample for row sparse transpose

* Added real row sparse transpose

* Fix output size issue by adding lambda for CheckAndAlloc()

* Fix mixed storage formats error

* Added infer storage type function

* resolve comments

* Set inferSType as optional function

* Resolve comments

* Add error messages

* Resolve comments

* verify transpose ops results

* fix sanity check

* update MX_LIBRARY_VERSION to 5
samskalicky pushed a commit to samskalicky/incubator-mxnet that referenced this pull request Apr 15, 2020
pengzhao-intel pushed a commit that referenced this pull request Apr 16, 2020
…18069)

* Dynamic subgraph compile support (#17623)

This PR adds support for passing the NDArrays from the existing optimize_for API down to the reviewSubgraph function in an external library. It also adds a new API for HybridBlock called optimize_for that can partition the model without running a forward pass.

Feature changes

    Adds new API to HybridBlock optimize_for that partitions the model but does not call the cachedOp
    Modifies the subgraph library example to optionally require args to be provided
    Adds annotation on subgraph inputs for the name of the original param so that inputs can be mapped and passes annotations to input nodes of subgraphs
    Adds support for tensors in MKLDNN format, calls Reorder2Default

New tests

    Adds a new test to partition operators that directly consume params
    add a new model to test where ops to be partitioned have args/params

Bug Fixes

    fixes bug in passing ids vector by value instead of by reference
    fixes bug in passing copies of attributes instead of by reference
    fixes bug where _cached_graph was not updated after partitioning
    fixes memory leak where user-specified attributes on subgraph ops were not freed if subgraph was rejected
    fixes problem incorrectly indexing into shape/dtype maps when annotating the graph

Docs

    Updates the README doc with the latest changes described above

* Adding sparse support to MXTensor for custom operators (#17569)

* Custom Operator Random Number Generator Support (#17762)

Add random number generator support for custom operator libraries.

Design: MXNet passes its initialized and seeded random states, located on CPU and GPU, to the custom library, so users can generate deterministic values from a given seed passed to MXNet. Basically, the workflow is:

mx.random.seed(128)
r1 = mx.nd.some_custom_random_op(data)
mx.random.seed(128)
r2 = mx.nd.some_custom_random_op(data)
assert (r1 == r2)

This PR does not make the custom library generate exactly the same sequence of random numbers as MXNet.

This is a continuation of the custom operator project (#15921 and #17270).

Co-authored-by: guanxinq <58794120+guanxinq@users.noreply.github.com>
Co-authored-by: Ziyi Mu <ziyi.mu@columbia.edu>
pengzhao-intel pushed a commit that referenced this pull request Apr 16, 2020
anirudh2290 pushed a commit to anirudh2290/mxnet that referenced this pull request May 29, 2020
Labels
pr-awaiting-merge (Review and CI is complete. Ready to Merge)

6 participants