
[Sparse] add sparse tensor computation support #1289

Merged: 39 commits, Sep 6, 2018

Conversation

liangfu
Member

@liangfu liangfu commented Jun 15, 2018

This PR implements the following features:

  • Create tvm.contrib.sparse.CSRNDArray and tvm.contrib.sparse.placeholder for creating sparse tensors.
  • Support conversion between numpy.ndarray and tvm.contrib.sparse.CSRNDArray.
  • Support topi.sparse.csrmv and topi.sparse.csrmm as SpMV and SpMM, and check their correctness against dense tensor operations.
  • Enable the dense operator for both sparse input and sparse weights.

Here is a table that briefly compares the performance of numpy.dot (openblas) and topi.sparse.csrmm.

| SpM     | M       | Sparsity | openblas | tvm     | speedup |
|---------|---------|----------|----------|---------|---------|
| 512x512 | 512x512 | 50%      | 3.46 ms  | 3.20 ms | 1.08x   |
| 512x512 | 512x512 | 80%      | 2.72 ms  | 1.21 ms | 2.06x   |

The timing results are based on an average of 10 iterations.
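
For context, here is a minimal usage sketch of the conversion described above; the names (tvm.contrib.sparse.array, CSRNDArray, asnumpy) are taken from this PR, but the exact call details are an assumption rather than confirmed API:

```python
import numpy as np
import tvm
import tvm.contrib.sparse as tvmsp

# Round trip: dense numpy array -> CSRNDArray -> dense numpy array.
ctx = tvm.cpu(0)
dense = np.random.rand(512, 512).astype('float32')
dense[dense < 0.8] = 0.0                     # make ~80% of the entries zero
sparse = tvmsp.array(dense, ctx)             # tvm.contrib.sparse.CSRNDArray
assert np.allclose(sparse.asnumpy(), dense)  # convert back and compare
```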

@tqchen
Member

tqchen commented Jun 15, 2018

Supporting sparse matrices is great. This is a major change; please send an RFC so we can have more discussion before we actually start working on the implementation.

@liangfu
Member Author

liangfu commented Jun 26, 2018

The intention of this PR is to provide a basic structure and prove that it is possible to perform sparse matrix computation on the TVM stack, which is done here using the existing Tensor and dynamic memory allocation. I think this PR is ready to merge at the moment. Please review.

@liangfu liangfu changed the title [WIP] [Sparse] add storage type support for sparse matrices [Sparse] add sparse tensor computation support Jun 27, 2018
@tqchen
Member

tqchen commented Jul 19, 2018

@eric-haibin-lin can you help review this PR?

@tqchen
Member

tqchen commented Jul 19, 2018

Sorry for the delayed review @liangfu. I took a brief look, and I think the current way of constructing through the IR builder is OK, but it may nevertheless lose some of the benefits of scheduling.

On the other hand, it would be quite interesting if we picked specific sparse operators that are used in real neural networks and pushed their performance.

@eric-haibin-lin
Member

Yes, I'll take a look.

On the second point, sparse block net https://arxiv.org/pdf/1801.02108.pdf would be quite interesting, although it

  • requires sparsification during training
  • requires a sparse block format, which is different from csr

@liangfu
Member Author

liangfu commented Jul 19, 2018

Thanks for your attention.
As we proposed in #1291, we will add support to demonstrate sparse operators for real CNNs based on this PR.
I'm currently interested in reproducing https://arxiv.org/abs/1608.01409 in the TVM stack, because it provides source code and a demonstration.

Member

@eric-haibin-lin eric-haibin-lin left a comment

Nice effort. I'm new to TVM and hope you guys can answer some of my questions.

csr = "csr"

@register_node
class CSRNDArray(object):
Member

Integration with external functions:
DLPack only supports dense array blobs. How would this class support invoking cusparse/mkl-sparse-blas functions? Or is it simply not possible unless we enable DLPack to pack sparse arrays?

Member

What would be the pros and cons if it inherited NDArrayBase and threw exceptions on unimplemented methods?

Member Author

@liangfu liangfu Jul 20, 2018

Support invoking cusparse/mkl-sparse-blas functions?

I think we should keep these in mind. Let me give it a try and give you a definite answer later.

Will this bring changes to dlpack?

There is no need to change dlpack at the moment. @tqchen suggested that discussions are needed before adding sparse matrix support into dlpack.

Inherit NDArrayBase and throw exceptions on unimplemented method?

Good suggestion. However, there would be too many changes at the moment, and csr_matrix is not really N-dimensional. Let's discuss what the proper way to make this change should be.

@register_node
class CSRNDArray(object):
"""Sparse tensor object in CSR format."""
def __init__(self, source_array=None,
Member

Did you consider a scipy.sparse-style interface, where the first argument could be either a numpy array or a tuple of (data, idx, indptr)? https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.html
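
For reference, the scipy.sparse.csr_matrix interface mentioned here accepts either form (standard scipy API, shown only for comparison):

```python
import numpy as np
import scipy.sparse as sp

dense = np.array([[1., 0., 2.],
                  [0., 0., 3.]])
a = sp.csr_matrix(dense)  # from a dense numpy array
b = sp.csr_matrix((a.data, a.indices, a.indptr), shape=dense.shape)  # from (data, indices, indptr)
assert (a != b).nnz == 0  # both constructions describe the same matrix
```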

full[ridx, self.indices.asnumpy().astype('int32')] = self.data.asnumpy()
return full

def array(source_array, ctx=None):
Member

Rename to csr_matrix? sparse.array could potentially create arrays of other sparse formats, including csr, csc, block sparse, etc.

Member Author

Is this really necessary? I was trying to make the interface consistent between tvm.array and tvm.contrib.sparse.array. In the test case, I demonstrated that if I use

import tvm.contrib.sparse as tvmsp

users could easily change their previous code written with tvm.array to tvmsp.array with no extra effort. If we need to add support for other sparse formats, I would suggest adding an extra argument to specify the format and leaving its default as csr (see the sketch below).
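
A sketch of this consistency argument; the stype keyword in the commented line is hypothetical and not part of this PR:

```python
import numpy as np
import tvm
import tvm.contrib.sparse as tvmsp

np_data = np.random.rand(4, 4).astype('float32')
ctx = tvm.cpu(0)

dense_nd = tvm.nd.array(np_data, ctx)    # dense NDArray
sparse_nd = tvmsp.array(np_data, ctx)    # CSRNDArray, created with the same call shape
# Hypothetical extension for other sparse formats (not part of this PR):
# sparse_nd = tvmsp.array(np_data, ctx, stype='csr')
```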


Parameters
----------
data : tvm.contrib.CSRTensor
Member

incorrect name: CSRTensor


Returns
-------
output : tvm.Tensor
Member

Some sparse ops return a sparse result (for example, csr + csr = csr). Would that be supported by specifying a sparse placeholder in the IR?
I'm not familiar with tvm memory management; does it support operators with an uncertain output shape? The data and indices buffers may have to be resized if we coalesce entries with the same indices when adding two csr matrices.

Member Author

Would that be supported by specifying sparse placeholder in the IR?

Sparse results are not specified with PlaceholderOp; they are allocated with ExternOp or ComputeOp, I think.

Does it support operators with uncertain output shape?

I have not tried (csr + csr = csr) yet; however, I think tvm requires an estimate of the buffer size before filling in the values.
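
To illustrate why the output size is uncertain (standard scipy, shown only as a reference point, not this PR's code):

```python
import numpy as np
import scipy.sparse as sp

a = sp.csr_matrix(np.array([[1., 0.], [0., 2.]]))
b = sp.csr_matrix(np.array([[0., 3.], [0., 4.]]))
c = a + b                   # csr + csr = csr
print(a.nnz, b.nnz, c.nnz)  # 2 2 3 -- entries at the same position coalesce,
                            # so the output buffer size is not simply a.nnz + b.nnz
```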

self.name = name
self.stype = stype
self.data = _api.placeholder((nonzeros,), dtype=dtype, name=self.name+'_data')
self.indices = _api.placeholder((nonzeros,), dtype='int32', name=self.name+'_indices')
Member

In some extreme cases people might want to use int64 for indices dtype (for ads/recommendation) because the number of features goes up to 10 billion. Does any code assume the dtype of indices/indptr is always int32?
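
For scale (general numpy facts, not from this PR): int32 indices top out just above 2.1 billion, so 10 billion feature ids only fit in int64:

```python
import numpy as np

print(np.iinfo(np.int32).max)  # 2147483647, too small for 10 billion feature ids
print(np.iinfo(np.int64).max)  # 9223372036854775807
indices = np.array([9_999_999_999], dtype='int64')  # representable only with int64
```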

@liangfu
Member Author

liangfu commented Jul 20, 2018

Let me summarize the suggested changes below:

Suggested Changes

  • Consider a scipy.sparse-style interface in tvm.contrib.sparse.CSRNDArray.
  • Fix the incorrect name tvm.contrib.NDTensor in topi/python/topi/sparse/*.py, and improve the comments.
  • Ensure no code assumes the dtype of indices/indptr is always int32.

Additional Changes

  • Enable dense operator for both sparse input and sparse weights

Unchanged Items

  • Can invoking cusparse/mkl-sparse-blas functions be supported in tvm.contrib.sparse.array? In my observation, there is no easy way to check whether these functions could be supported.
  • Make tvm.contrib.sparse.CSRNDArray inherit tvm.NDArrayBase? The advantage of this change would be an abstraction layer between the dense ndarray and the sparse ndarray. However, after trying to make the change for a while, I don't think CSRNDArray can inherit NDArrayBase without changes to NDArrayBase itself; the program terminated unexpectedly.
  • To keep the interface consistent between tvm.array and tvmsp.array, I don't think tvm.contrib.sparse.array should be renamed to tvm.contrib.sparse.csr_matrix.

@liangfu
Member Author

liangfu commented Aug 1, 2018

Hi @tqchen @eric-haibin-lin, as summarized above, most of the suggested changes have been made to reflect the suggestions in the review, and the items that are not changed are explained as well. Please take another look to see whether the current PR is suitable for a merge.

In the long run, as suggested by @tqchen, we would port the current implementation to real neural networks. I've created an experimental repo and trained a sparse MLP; the dense operator proposed in this PR can be used to perform inference in the sparse MLP.

@yzhliu
Member

yzhliu commented Aug 6, 2018

Please rebase according to #1394
Sorry for the inconvenience.

@yzhliu yzhliu added the status: need update label Aug 13, 2018
@yzhliu
Member

yzhliu commented Aug 13, 2018

@liangfu Could you update according to Haibin's comments?

@liangfu
Member Author

liangfu commented Aug 16, 2018

yes, definitely.

@liangfu
Member Author

liangfu commented Aug 16, 2018

List of Changes

  • Remove unused code (itype = int64).
  • We no longer need the stype argument for CSRPlaceholderOp; for placeholder, the default stype is csr, and it could be any other format once one is defined (see the sketch after this list).
  • A SparsePlaceholderOp class has been created and is inherited.
  • Conflicts with upstream have been fixed.

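A minimal sketch of the updated placeholder, assuming the constructor arguments quoted in this review (shape, nonzeros, dtype, name) and the csr default described above; the keyword names are an assumption, not confirmed API:

```python
import tvm.contrib.sparse as tvmsp

# stype defaults to 'csr'; data and indices are 1-D placeholders of length nonzeros.
a = tvmsp.placeholder((128, 128), nonzeros=1024, dtype='float32', name='a')
print(a.stype)          # 'csr'
print(a.data.shape)     # (1024,)
print(a.indices.shape)  # (1024,)
```
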
I'm not quite sure whether it's okay to have duplicated entries in CSR, like what has been implemented in scipy.

@eric-haibin-lin @yzhliu @tqchen I think most of the concerns have been covered by now. Please check whether this is suitable for a merge.

raise NotImplementedError('stype=%s is not supported yet.' % (stype,))
return ret

@register_node
Member

We do not need register_node for now if it is not part of the node system.

@tqchen
Member

tqchen commented Aug 21, 2018

Member

@yzhliu yzhliu left a comment

Sorry for the late review.
The functionality looks good to me. Waiting for @eric-haibin-lin's confirmation.

The name hint of the tensor

stype: str, optional
The name storage type of the sparse tensor (e.g. csr, coo, ell)
Member

Missing doc for nonzeros; the same applies in SparsePlaceholderOp and CSRPlaceholderOp.
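
A possible entry for the missing doc, following the parameter style already used in the docstring (the wording is only a suggestion):

nonzeros : int
    The number of non-zero elements expected in the sparse tensor.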


class SparsePlaceholderOp(object):
"""Placeholder class for sparse tensor representations."""
def __init__(self, shape, nonzeros, dtype, name):
Member

shall we store nonzeros?

Member Author

I left it unused intentionally.

dtype = float32 if dtype is None else dtype
stype = csr if stype is None else stype
ret = None
if stype == 'csr':
Member

'csr' -> csr. Or just remove the constant definition and always use str; actually I prefer the latter.

The shape of the array
"""
if isinstance(arg1, tuple):
self.data, self.indices, self.indptr = arg1[0], arg1[1], arg1[2]
Member

Better to assert len(arg1) == 3 and do self.data, self.indices, self.indptr = arg1.
Also, find a better name for arg1.
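
The suggestion in code form, as a simplified sketch (names follow the constructor under review):

```python
class CSRNDArray(object):
    """Simplified sketch of the suggested constructor check."""
    def __init__(self, arg1, shape=None):
        if isinstance(arg1, tuple):
            assert len(arg1) == 3, "expected (data, indices, indptr)"
            self.data, self.indices, self.indptr = arg1
```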

Member Author

Okay, I'll fix this.
A better name didn't come to mind; the name arg1 was inspired by scipy.sparse.csr_matrix.

Member

@yzhliu yzhliu left a comment

@tqchen Would you mind approving and merging if it looks good to you?

@liangfu
Member Author

liangfu commented Sep 6, 2018

@tqchen Would you mind approving and merging if it looks good to you?

@tqchen
Member

tqchen commented Sep 6, 2018

Thanks @liangfu @eric-haibin-lin @yzhliu, this is now merged.

@tqchen tqchen merged commit d87c94d into apache:master Sep 6, 2018
@ajtulloch
Contributor

This looks really cool, well done @liangfu.

Labels: status: need review, status: need update

5 participants