
Sparse matmul support #4397

Merged
merged 26 commits into from May 14, 2018

Conversation

anaruse
Contributor

@anaruse anaruse commented Feb 23, 2018

This PR aims to support sparse matmul in Chainer (this is related to #4377 and chainer/chainer-chemistry#90).

I implemented a function named sparse_matmul which computes the matrix multiplication of a sparse and a dense matrix. The usage of this function is as follows (assuming a and b are matrices or 3D tensors).

sp_a = F.sparse_dense2coo(a)
c = F.sparse_matmul(sp_a, b)

You can also use this function for batched sparse matrix multiplication (actually this is my main focus), like matmul. It supports backward and double-backward, so you can compute gradients of the sparse and dense matrices, and gradients of gradients as well.

Please note that the CPU version is not implemented yet because I don't have a good idea for an efficient CPU implementation of batched sparse matrix multiplication using numpy or scipy.
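Here is a rough, self-contained sketch of the batched usage described above (a minimal example assuming the interface exactly as described in this description; some names were later moved or renamed during review):

import numpy
import chainer.functions as F

# Build a batch of 8 sparse 32x64 matrices and a dense batch of 64x16 matrices.
a = numpy.random.uniform(-1, 1, (8, 32, 64)).astype(numpy.float32)
a[numpy.abs(a) < 0.9] = 0.0    # zero out most entries so `a` is actually sparse
b = numpy.random.uniform(-1, 1, (8, 64, 16)).astype(numpy.float32)

sp_a = F.sparse_dense2coo(a)   # dense -> COO representation
c = F.sparse_matmul(sp_a, b)   # batched sparse x dense matmul, like F.matmul

# Backward (and double-backward) are supported as well.
c.grad = numpy.ones(c.shape, dtype=numpy.float32)
c.backward()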

@anaruse
Contributor Author

anaruse commented Feb 26, 2018

I've updated the branch so that you can use sparse_matmul on CPU.

@hvy
Member

hvy commented Mar 15, 2018

jenkins, test this please.

@@ -235,6 +235,9 @@
from chainer.functions.math.prod import Prod # NOQA
from chainer.functions.math.scale import scale # NOQA
from chainer.functions.math.sign import sign # NOQA
from chainer.functions.math.sparse_matmul import sparse_coo_matrix # NOQA
Member

Should this class be exposed through the chainer.functions module?

Contributor Author

Perhaps users will use it, though it is not used even in the tests. I will remove it from __init__.py for now and add it again when it becomes certainly necessary.

from chainer import utils
from chainer.utils import type_check
import numpy
import warnings
Member

Could you reorder the imports? Here's an example: https://github.com/chainer/chainer/blob/master/chainer/utils/array.py

Contributor Author

Will fix it that way!

try:
from scipy import sparse
except ImportError:
warnings.warn("SciPy seems not available on your system. A CPU"
Member

Could you cache the failure and raise a warning when scipy.sparse is first called instead? E.g. like https://github.com/chainer/chainer/blob/master/chainer/datasets/svhn.py
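A minimal sketch of the suggested deferred-warning pattern (hypothetical flag and helper names, following the svhn.py approach rather than its exact code):

import warnings

try:
    from scipy import sparse  # NOQA
    _scipy_available = True
except ImportError as e:
    _scipy_import_error = e
    _scipy_available = False


def _check_scipy_available():
    # Warn only when scipy.sparse is actually needed on CPU,
    # instead of emitting a warning unconditionally at import time.
    if not _scipy_available:
        warnings.warn(
            'SciPy is not available. sparse_matmul cannot be used on CPU: '
            + str(_scipy_import_error))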

" cannot use sparse_matmul on CPU.")


class sparse_coo_matrix(object):
Member

Could you rename this class using CamelCase?

Contributor Author

will use CamelCase for the class name.

@hvy
Member

hvy commented Mar 15, 2018

It seems like atomicAdd is called with a double (which is not supported). Could you double check?

@hvy
Member

hvy commented Mar 15, 2018

I added some comments but I now see we'd like to discuss the design.

We probably want to generalize it as much as possible for Chainer. @corochann, do you see any concerns or have any ideas on how to go from here? I'd be happy to get any updates on the discussion from chainer/chainer-chemistry#90.

@anaruse
Contributor Author

anaruse commented Mar 20, 2018

Regarding atomicAdd with double precision, it is available on Pascal and Volta GPUs, but not on Maxwell and older GPUs. I forgot this. I will fix the code so that atomicCAS is used when the GPU is Maxwell or older.
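For reference, a sketch of the kind of fallback described here: the standard compare-and-swap loop from the CUDA programming guide, shown as a CuPy kernel preamble string (the helper name my_atomic_add is hypothetical; this is not the exact code in this PR):

# Prepended to the kernel source; uses the native double atomicAdd on
# compute capability >= 6.0 (Pascal/Volta) and an atomicCAS loop otherwise.
_ATOMIC_ADD_DOUBLE = r'''
#if __CUDA_ARCH__ < 600
__device__ double my_atomic_add(double* address, double val) {
    unsigned long long int* address_as_ull = (unsigned long long int*)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
    } while (assumed != old);
    return __longlong_as_double(old);
}
#else
__device__ double my_atomic_add(double* address, double val) {
    return atomicAdd(address, val);
}
#endif
'''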

@anaruse
Contributor Author

anaruse commented Mar 20, 2018

Thank you for your comments, @hvy. I've fixed the branch based on your suggestions. Please review again.

@anaruse
Contributor Author

anaruse commented Mar 30, 2018

I've just submitted a PR to CuPy (cupy/cupy#1071) to support double-precision atomicAdd on Maxwell or older GPUs. It is closely related to CUDA, hence I think it should be done in CuPy.

@hvy
Member

hvy commented Apr 3, 2018

Thank you for the fixes and the double precision support in CuPy. I'm sorry for the delayed response and will take another look today and tomorrow.

@hvy
Member

hvy commented Apr 4, 2018

I just discussed with @beam2d how to expose the conversion function sparse_dense2coo and the wrapper class SparseCooMatrix. Would it be possible to expose them under chainer.utils instead of where they currently are under chainer.functions, as these definitions are not functions themselves but are used by them?


class SparseCooMatrix(object):

def __init__(self, data, row, col, shape, use_variable=False):
Member

When is use_variable=True needed? In my understanding, gradients are not propagated to the sparse matrix.

Contributor Author

It is used when gradients of the sparse matrix are necessary, as you thought. I've heard from a few researchers that there are applications in which gradients of the sparse matrix are needed.

Member

To what extent do you think we should support it (in this PR)? I talked with @beam2d and we concluded that we could hold off on this feature, because we need to think through the interface and maybe also include differentiable conversions (FunctionNode implementations) between dense and sparse representations.

Contributor Author

Does that mean that you would rather not rush to a conclusion on how to handle gradients of the sparse matrix? I understand. So, should I delete the code related to this feature, like SparseMatMulGrandSP among others, from this PR?

Member

@hvy hvy Apr 5, 2018

Yes, exactly, that's what we thought. But I am not entirely sure how gradients of A in A (sparse) * B (dense) = C should be defined. In theory, they become dense, and thus A after applying an update is no longer a sparse matrix, but this probably depends on the use case. Do you have other sources that we can maybe take inspiration from?

Member

For the record, MXNet seems to support sparse gradients. PyTorch has a sparse module that's experimental and not that well documented. I'm not sure about TensorFlow, but there are discussions with varying conclusions. TVM mentions nothing about sparse representations.

Contributor Author

Regarding how gradients of A should be defined, I agree with you that it depends on the use case. The gradients of A could become dense in some use cases but may remain sparse in others. BTW, the more important question is whether the sparsity pattern of the gradients of A remains the same as that of the original A or not.

Anyway, IMO, you don't need to consider use cases in which the sparsity of the original A and of the gradients of A differ here in sparse_matmul. Why? Simply because dense matmul should be used for those use cases. If the gradients of A are dense, or denser than the original A, then A will soon become dense. That means there is little benefit in applying sparse_matmul to this sort of use case.

What do you think about this?
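To make the point above concrete, a small numpy sketch (not the PR's implementation): for C = A @ B the full gradient with respect to A is gC @ B.T, which is generally dense, but restricting it to A's non-zero pattern keeps the sparsity unchanged.

import numpy

A = numpy.array([[1.0, 0.0, 2.0],
                 [0.0, 3.0, 0.0]])           # sparse operand
B = numpy.random.uniform(-1, 1, (3, 4))      # dense operand
gC = numpy.ones((2, 4))                      # upstream gradient of C = A @ B

gA_dense = gC.dot(B.T)                       # dense gradient w.r.t. A
gA_masked = gA_dense * (A != 0)              # gradient limited to A's pattern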

self.data = chainer.Variable(self.data)
self.row = row
self.col = col
self.shape = shape # (row, col)
Member

Can we afford any data validation here, e.g. since to_dense depends on ndim?

Contributor Author

All right, I will add some validation code here for shape, among other things.
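For illustration, a minimal sketch of the kind of checks this could mean (a hypothetical helper, not the PR's actual code):

import numpy


def _check_coo_args(data, row, col, shape):
    # data/row/col must share the same shape: 1-D for a single matrix,
    # 2-D for a batch of matrices; `shape` describes the dense (row, col) size.
    if data.ndim not in (1, 2):
        raise ValueError('ndim of data must be 1 or 2: {}'.format(data.ndim))
    if not (data.shape == row.shape == col.shape):
        raise ValueError('data, row and col must have the same shape')
    if len(shape) != 2:
        raise ValueError('shape must be (n_rows, n_cols)')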

self.col = col
self.shape = shape # (row, col)

def to_dense(self):
Member

How about adding tests for this method?

Contributor Author

All right, I will add tests for to_dense.
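A rough sketch of such a round-trip test (using the to_coo / to_dense names that appear later in this thread; the details here are assumed):

import numpy
from chainer import utils

x = numpy.random.uniform(-1, 1, (3, 4)).astype(numpy.float32)
x[x < 0.0] = 0.0                         # introduce zeros
sp_x = utils.to_coo(x)                   # dense -> COO
numpy.testing.assert_allclose(sp_x.to_dense(), x)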

@anaruse
Contributor Author

anaruse commented May 9, 2018

I've just fixed the issues. Could you run tests again?

It looks like the CI failure is unrelated to this PR...

@hvy
Member

hvy commented May 9, 2018

Jenkins, test this please.

@hvy
Member

hvy commented May 9, 2018

Thanks, I started the build just now!

a single matrix. If three, it is treated as batched matrices.
ldnz (int): Size of arrays for data, row index and column index to be
created. The Actual size becomes max(nnz, ldnz) where nnz is number
of non-zero elmeents in a input dense matrix.
Member

Oh, a small typo at elemeents.

Member

Just a small question: is "ldnz" common terminology? Would you mind explaining what it stands for?

Contributor Author

@anaruse anaruse May 11, 2018

Thanks, will fix the typo.

Re: "ldnz", it stands for "Leading Dimension of array for Non-Zero elements" and shows the size of leading axis of array which holds matrix entries, row indexes or column indexes. I follow naming convention of matrix libraries like blas, lapck, etc. in which, for example, scalar variable ldA indicates the size of leading axis of array A.

Member

Thank you for your explanation! I see, it is used for indexing in general. It seems, though, that it is not tested. Should we add tests, since it's a public interface?

Contributor Author

Exactly, the option ldnz is not explicitly tested now, though it is implicitly tested when a batched sparse matrix is created with 'to_coo'. Anyway, I will add some tests for that.

Contributor Author

Thank you so much! I found a bug in to_dense of CooMatrix thanks to the tests for ldnz.

}
int i_k = A_col[i_A];
if (i_k < 0) {
continue;
Member

When do we encounter these negative indices?

Contributor Author

You may see negative indexes with batched sparse matrices when the number of non-zero elements in each sparse matrix differs.

Member

Ah, they're initialized with -1.

Contributor Author

Yes :)
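A small illustration of the layout discussed here (values assumed): when a batch contains matrices with different numbers of non-zeros, the arrays share a common leading size (ldnz) and the unused slots are padded with -1 indices, which the kernel then skips.

import numpy

# Batch of two sparse matrices; the first has 2 non-zeros, the second has 3,
# so ldnz = 3 and the first matrix is padded with -1 indices.
data = numpy.array([[1.0, 2.0, 0.0], [3.0, 4.0, 5.0]], dtype=numpy.float32)
row = numpy.array([[0, 1, -1], [0, 1, 2]], dtype=numpy.int32)
col = numpy.array([[0, 2, -1], [1, 0, 2]], dtype=numpy.int32)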

@hvy
Member

hvy commented May 11, 2018

jenkins, test this please.

@hvy
Member

hvy commented May 14, 2018

jenkins, test this please.

@hvy
Member

hvy commented May 14, 2018

@anaruse Not sure if this one https://github.com/chainer/chainer/pull/4397/files/5eeff1763f274deefcccbe43966d4ce2c48e7f52#r187516691 slipped through unnoticed. Would you mind changing the names as you previously suggested?

@anaruse
Contributor Author

anaruse commented May 14, 2018

I've just renamed some functions and classes as we discussed. Sorry to bother you again, but could you check once more?

transb (bool): If ``True``, each matrix in ``b`` will be transposed.

Returns:
_chainer.Variable: Result of batched mat-mul.
Member

Should this be ~chainer.Variable: ... ?

Contributor Author

@anaruse anaruse May 14, 2018

Strictly speaking, I think yes. So, should CooMatrix be ~chainer.utils.CooMatrix as well?

I mean the explanation of the type/class of the arguments.

Member

They actually should be (the links were broken). Thanks for the catch, could you fix it?

Contributor Author

Yes, I will.
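For reference, a sketch of how the cross-references could look after the fix (the signature and wording are assumed from the diff excerpts above, not copied from the final code):

def sparse_matmul(a, b, transa=False, transb=False):
    """Computes the batched multiplication of sparse and dense matrices.

    Args:
        a (~chainer.Variable or ~chainer.utils.CooMatrix): The left operand.
        b (~chainer.Variable or ~chainer.utils.CooMatrix): The right operand.
        transa (bool): If ``True``, each matrix in ``a`` will be transposed.
        transb (bool): If ``True``, each matrix in ``b`` will be transposed.

    Returns:
        ~chainer.Variable: Result of batched mat-mul.
    """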

@anaruse
Contributor Author

anaruse commented May 14, 2018

I'm sorry to bother you over and over, but could you check again?

@hvy
Member

hvy commented May 14, 2018

No worries, I am rather sorry for the exact opposite. I'll rerun the CI to check your fixes!

@hvy
Member

hvy commented May 14, 2018

jenkins, test this please.


@hvy
Member

hvy commented May 14, 2018

The CI passes and it looks good, except for one minor detail: the CooMatrix link for the argument seems broken. Maybe you have to write it using (:class: ... or ... :class: ...)?

@anaruse
Contributor Author

anaruse commented May 14, 2018

I dug into the link issue above a little bit, and it seems we need to add/edit the .rst file for CooMatrix, etc. to solve it. I will push the update after some local verification.

I also noticed that the names of the CuPy kernels are still sparse_matmul*. I will fix that as well.

@anaruse
Contributor Author

anaruse commented May 14, 2018

Could you check again? The update above should fix the link issue.

@hvy
Member

hvy commented May 14, 2018

Ah I see, I'll try rebuilding the docs.

@hvy
Member

hvy commented May 14, 2018

jenkins, test this please.

@hvy
Member

hvy commented May 14, 2018

LGTM! @anaruse Sorry it took so long to get it merged but thank you very much for this PR and for all the hard work.

@hvy hvy merged commit b5910c7 into chainer:master May 14, 2018
@hvy hvy added this to the v5.0.0b1 milestone May 14, 2018
@hvy hvy added the cat:feature Implementation that introduces new interfaces. label May 14, 2018
@kmaehashi kmaehashi mentioned this pull request May 21, 2018