Sparse matmul support #4397
Conversation
I've updated the branch, so that you can use …

jenkins, test this please.
chainer/functions/__init__.py
Outdated
@@ -235,6 +235,9 @@
from chainer.functions.math.prod import Prod  # NOQA
from chainer.functions.math.scale import scale  # NOQA
from chainer.functions.math.sign import sign  # NOQA
from chainer.functions.math.sparse_matmul import sparse_coo_matrix  # NOQA
Should this class be exposed through the chainer.functions module?
Perhaps it will be used by users, though it is not used even from the tests. I'll remove it from __init__.py for now and will add it again when it is certainly necessary.
from chainer import utils
from chainer.utils import type_check
import numpy
import warnings
Could you reorder the imports? Here's an example: https://github.com/chainer/chainer/blob/master/chainer/utils/array.py
Will fix it that way!
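For reference, a sketch of what the reordering would look like following the linked example (standard library first, then third-party, then chainer imports, each group alphabetized); this is an assumption about the final layout, not the PR's exact code:

```python
import warnings

import numpy

from chainer import utils
from chainer.utils import type_check
```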
try:
    from scipy import sparse
except ImportError:
    warnings.warn("SciPy seems not available on your system. A CPU"
                  " cannot use sparse_matmul on CPU.")

Could you cache the failure and raise a warning when scipy.sparse is first called instead? E.g. like https://github.com/chainer/chainer/blob/master/chainer/datasets/svhn.py
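A minimal sketch of the suggested pattern, modeled on the linked svhn.py approach: cache the import failure and only warn when the CPU path actually needs SciPy. Names like `_scipy_available` and `_check_scipy_available` are illustrative, not the PR's final identifiers:

```python
import warnings

try:
    from scipy import sparse  # NOQA
    _scipy_available = True
except ImportError:
    _scipy_available = False


def _check_scipy_available():
    # Deferred check: warn at first use instead of at import time.
    if not _scipy_available:
        warnings.warn('SciPy seems not available on your system; '
                      'sparse_matmul cannot be used on CPU.')
```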
class sparse_coo_matrix(object):
Could you rename this class using CamelCase?
will use CamelCase for the class name.
It seems like …

I added some comments, but I now see we'd like to discuss the design. We probably want to generalize it as much as possible for Chainer. @corochann, do you see any concerns or have any ideas how to go from here? I'd be happy to get any updates on the discussion from chainer/chainer-chemistry#90.

Regarding …

Thank you for your comments, @hvy. I've fixed the branch based on your suggestions. Please review again.

I've just submitted a PR to CuPy (cupy/cupy#1071) on support of double precision …

Thank you for the fixes and the double precision support in CuPy. I'm sorry for the delayed response and will take another look today and tomorrow.

I just discussed with @beam2d how to expose the conversion function …
class SparseCooMatrix(object):

    def __init__(self, data, row, col, shape, use_variable=False):
When is use_variable=True needed? In my understanding, gradients are not propagated to the sparse matrix.
It is used when gradients of the sparse matrix are necessary, as you thought. I've heard from a few researchers that there are applications in which gradients of the sparse matrix are needed.
To what extent do you think we should support it (in this PR)? I talked with @beam2d and concluded that we could wait with this feature, because we need to think through the interface and maybe also include differentiable conversions (FunctionNode implementations) between dense and sparse representations.
Does that mean that you would not rush to a conclusion on how to handle gradients of a sparse matrix? I understand. So, should I delete the code related to this feature, like SparseMatMulGrandSP among others, from this PR?
Exactly, that's what we thought. But I am not entirely sure how gradients of A in A(sparse) * B(dense) = C should be defined. In theory, they become dense, and thus A after applying an update is no longer a sparse matrix, but this probably depends on the use case. Do you have other sources that we can maybe take inspiration from?
For the record, it seems MXNet supports sparse gradients. PyTorch has a sparse module that's experimental and not that well documented. I'm not sure about TensorFlow, but there are discussions with varying conclusions. TVM mentions nothing about sparse representations.
Regarding how gradients of A should be defined, I agree with you that it depends on the use case. The gradients of A could become dense in some use cases but may remain sparse in others. BTW, the more important question is whether the sparsity pattern of the gradients of A remains the same as that of the original A or not.

Anyway, IMO, you don't need to consider use cases in which the sparsity of the original A and of its gradients differ here in sparse_matmul. Why? Simply because dense matmul should be used for those use cases. If the gradients of A are dense, or denser than the original A, then A will soon become dense. That means there is little benefit in applying sparse_matmul to that sort of use case.

What do you think about this?
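To make the trade-off concrete, here is a minimal NumPy sketch of the "keep A's sparsity pattern" option discussed above. The masking step is one possible design, not what this PR implements:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 2.0]])   # sparse operand (dense stand-in)
B = np.random.rand(2, 3)     # dense operand
mask = A != 0                # sparsity pattern of A

gC = np.random.rand(2, 3)    # upstream gradient w.r.t. C = A @ B
gA_dense = gC @ B.T          # mathematically, the gradient of A is dense
gA_sparse = gA_dense * mask  # restricted to A's original pattern
```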
self.data = chainer.Variable(self.data)
self.row = row
self.col = col
self.shape = shape  # (row, col)
Can we afford any data validation here? E.g., since to_dense depends on ndim.
All right, I will add some validation code here for shape among others.
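A minimal sketch of the kind of checks that could be added in __init__; the exact conditions and messages are assumptions, not the PR's final code:

```python
def _validate_coo(data, row, col, shape):
    # data/row/col: 1-D for a single matrix, 2-D for batched matrices.
    if data.ndim not in (1, 2):
        raise ValueError('ndim of data must be 1 or 2: {}'.format(data.ndim))
    if not (data.shape == row.shape == col.shape):
        raise ValueError('data, row and col must share the same shape')
    if len(shape) != 2:
        raise ValueError('shape must be a pair (n_rows, n_cols)')
```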
self.col = col
self.shape = shape  # (row, col)

def to_dense(self):
How about adding tests for this method?
All right, I will add tests for to_dense.
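For context, a hypothetical standalone version of what to_dense does for one COO matrix (not the PR's exact code); note the skip over negative padding indices, which ties in with the discussion later in this thread:

```python
import numpy

def coo_to_dense(data, row, col, shape):
    # Scatter the stored entries back into a dense array; negative
    # indices (padding used in batched matrices) are skipped.
    x = numpy.zeros(shape, dtype=data.dtype)
    for d, r, c in zip(data, row, col):
        if r >= 0 and c >= 0:
            x[r, c] = d
    return x
```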
I've just fixed the issues. Could you run the tests again? It looks like CI failed at a place where this PR is not related...
Jenkins, test this please.

Thanks, I started the build just now!
chainer/utils/sparse.py
Outdated
    a single matrix. If three, it is treated as batched matrices.
    ldnz (int): Size of arrays for data, row index and column index to be
        created. The Actual size becomes max(nnz, ldnz) where nnz is number
        of non-zero elmeents in a input dense matrix.
Oh, a small typo at elmeents.
Just a small question: is "ldnz" common terminology? Would you mind explaining what it stands for?
Thanks, will fix the typo.

Re: "ldnz", it stands for "Leading Dimension of array for Non-Zero elements" and gives the size of the leading axis of the array which holds matrix entries, row indexes or column indexes. I follow the naming convention of matrix libraries like BLAS, LAPACK, etc., in which, for example, a scalar variable ldA indicates the size of the leading axis of array A.
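To make the padding concrete, a small hypothetical example (values invented for illustration): two batched COO matrices holding 2 and 3 non-zero elements with ldnz = 4, so trailing slots are padded and the index arrays use -1, matching the negative-index checks discussed below:

```python
import numpy

# Batch of 2 matrices, ldnz = 4; unused slots are padded.
data = numpy.array([[1.0, 2.0, 0.0, 0.0],
                    [3.0, 4.0, 5.0, 0.0]], dtype=numpy.float32)
row = numpy.array([[0, 1, -1, -1],
                   [0, 0, 1, -1]], dtype=numpy.int32)
col = numpy.array([[0, 1, -1, -1],
                   [0, 2, 1, -1]], dtype=numpy.int32)
```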
Thank you for your explanation! I see, it is used for indexing in general. It seems, though, that it is not tested. Should we add tests, since it's a public interface?
Exactly, the option ldnz is not explicitly tested now, though it is implicitly tested when a batched sparse matrix is created with to_coo. Anyway, I will add some tests for that.
Thank you so much! I found a bug in to_dense of CooMatrix thanks to the tests for ldnz.
}
int i_k = A_col[i_A];
if (i_k < 0) {
    continue;
When do we encounter these negative indices?
You may see the negative indexes when there are batched sparse matrices and the number of non-zero elements in each sparse matrix differs.
Ah, they're initialized with -1.
Yes :)
jenkins, test this please.

jenkins, test this please.

@anaruse Not sure if this one https://github.com/chainer/chainer/pull/4397/files/5eeff1763f274deefcccbe43966d4ce2c48e7f52#r187516691 got away unnoticed. Would you mind changing the names as you previously suggested?

I've just renamed some functions and classes as we discussed. Sorry to bother you again, but could you check again?
    transb (bool): If ``True``, each matrix in ``b`` will be transposed.

Returns:
    _chainer.Variable: Result of batched mat-mul.
Should this be ~chainer.Variable: ...?
Strictly speaking, I think yes. So, should CooMatrix be ~chainer.utils.CooMatrix as well? I mean in the explanations of types/classes for arguments.
They actually should (the links were broken). Thanks for that catch, could you fix it?
Yes, I will.
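For reference, a minimal sketch of the corrected cross-reference style; the signature and argument descriptions here are assumptions for illustration. The ~ prefix makes Sphinx link the full path while displaying only the last component:

```python
def sparse_matmul(a, b, transa=False, transb=False):
    """Computes the batched multiplication of sparse and dense matrices.

    Args:
        a (~chainer.utils.CooMatrix): Left operand (sparse).
        b (~chainer.Variable): Right operand (dense).
        transa (bool): If ``True``, each matrix in ``a`` will be transposed.
        transb (bool): If ``True``, each matrix in ``b`` will be transposed.

    Returns:
        ~chainer.Variable: Result of batched mat-mul.
    """
```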
I'm sorry to bother you over and over, but could you check again?

No worries, I am rather sorry for the exact opposite. I'll rerun the CI to check your fixes!

jenkins, test this please.
The CI passes and it looks good except a minor detail that the …

I dug into the link issue above a little bit, and it seems we need to add/edit the .rst file for … I also noticed that the names of the CuPy kernels are still sparse_matmul*. I will fix that as well.

Could you check again? The update above will probably fix the link issue.

Ah I see, I'll try rebuilding the docs.

jenkins, test this please.

LGTM! @anaruse Sorry it took so long to get it merged, but thank you very much for this PR and for all the hard work.
This PR aims to support sparse matmul in Chainer (this is related to #4377 and chainer/chainer-chemistry#90).
I implemented a function named sparse_matmul which computes the matrix multiplication of a sparse and a dense matrix. The usage of this function is as follows (assuming a and b are matrices or 3D tensors); see the sketch below. You can also use this function for batched sparse matrix multiplication (actually this is my main focus), like matmul. It supports backward and double-backward, so you can compute gradients of the sparse and dense matrices, and gradients of gradients, as well.

Please note that a CPU version is not implemented for now, because I don't have a good idea for an efficient CPU implementation using NumPy or SciPy for batched sparse matrix multiplication.
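The usage snippet from the original description did not survive extraction; below is a hedged reconstruction using the names settled on in this thread (to_coo and CooMatrix in chainer.utils, sparse_matmul in chainer.functions). The exact call signatures are assumptions:

```python
import numpy as np

import chainer.functions as F
from chainer import utils

a = np.random.rand(4, 5).astype(np.float32)
a[a < 0.8] = 0.0                 # zero out most entries, making `a` sparse
b = np.random.rand(5, 3).astype(np.float32)

a_coo = utils.to_coo(a)          # convert dense ndarray -> CooMatrix
c = F.sparse_matmul(a_coo, b)    # sparse x dense matrix product
```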