Sparse matmul support #4397
Conversation
I've updated the branch, so that you can use …

jenkins, test this please.
chainer/functions/__init__.py
Outdated
@@ -235,6 +235,9 @@
from chainer.functions.math.prod import Prod  # NOQA
from chainer.functions.math.scale import scale  # NOQA
from chainer.functions.math.sign import sign  # NOQA
from chainer.functions.math.sparse_matmul import sparse_coo_matrix  # NOQA
Should this class be exposed through the chainer.functions module?
Perhaps it will be used by users, though it is not used even from the tests. I'll remove it from __init__.py for now and will add it again when it is certainly necessary.
from chainer import utils
from chainer.utils import type_check
import numpy
import warnings
Could you reorder the imports? Here's an example: https://github.com/chainer/chainer/blob/master/chainer/utils/array.py
Will fix it that way!
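For reference, a sketch of what the reordering would look like following the linked example (standard library first, then third-party, then chainer imports, each group alphabetized); this is an assumption about the final layout, not the PR's exact code:

```python
import warnings

import numpy

from chainer import utils
from chainer.utils import type_check
```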
try:
    from scipy import sparse
except ImportError:
    warnings.warn("SciPy seems not available on your system. A CPU"
                  " cannot use sparse_matmul on CPU.")

Could you cache the failure and raise a warning when scipy.sparse is first called instead? E.g. like https://github.com/chainer/chainer/blob/master/chainer/datasets/svhn.py
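A minimal sketch of the suggested pattern, modeled on the linked svhn.py approach: cache the import failure and only warn when the CPU path actually needs SciPy. Names like `_scipy_available` and `_check_scipy_available` are illustrative, not the PR's final identifiers:

```python
import warnings

try:
    from scipy import sparse  # NOQA
    _scipy_available = True
except ImportError:
    _scipy_available = False


def _check_scipy_available():
    # Deferred check: warn at first use instead of at import time.
    if not _scipy_available:
        warnings.warn('SciPy seems not available on your system; '
                      'sparse_matmul cannot be used on CPU.')
```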
class sparse_coo_matrix(object):
Could you rename this class using CamelCase?
will use CamelCase for the class name.
It seems like …

I added some comments, but I now see we'd like to discuss the design. We probably want to generalize it as much as possible for Chainer. @corochann, do you see any concerns or have any ideas how to go from here? I'd be happy to get any updates on the discussion from chainer/chainer-chemistry#90.

Regarding …

Thank you for your comments, @hvy. I've fixed the branch based on your suggestions. Please review again.

I've just submitted a PR to CuPy (cupy/cupy#1071) on support of double precision …

Thank you for the fixes and the double precision support in CuPy. I'm sorry for the delayed response and will take another look today and tomorrow.

I just discussed with @beam2d how to expose the conversion function …
class SparseCooMatrix(object):

    def __init__(self, data, row, col, shape, use_variable=False):
When is use_variable=True needed? In my understanding, gradients are not propagated to the sparse matrix.
It is used when gradients of the sparse matrix are necessary, as you thought. I've heard from a few researchers that there are applications in which gradients of the sparse matrix are needed.
To what extent do you think we should support it (in this PR)? I talked with @beam2d and concluded that we could wait with this feature, because we need to think through the interface and maybe also include differentiable conversions (FunctionNode implementations) between dense and sparse representations.
Does that mean that you would not rush to a conclusion on how to handle gradients of a sparse matrix? I understand. So, should I delete the code related to this feature, like SparseMatMulGrandSP among others, from this PR?
Exactly, that's what we thought. But I am not entirely sure how gradients of A in A(sparse) * B(dense) = C should be defined. In theory, they become dense, and thus A after applying an update is no longer a sparse matrix, but this probably depends on the use case. Do you have other sources that we can maybe take inspiration from?
For the record, it seems MXNet supports sparse gradients. PyTorch has a sparse module that's experimental and not that well documented. I'm not sure about TensorFlow, but there are discussions with varying conclusions. TVM mentions nothing about sparse representations.
Regarding how gradients of A should be defined, I agree with you that it depends on the use case. The gradients of A could become dense in some use cases but may remain sparse in others. BTW, the more important question is whether the sparsity pattern of the gradients of A remains the same as that of the original A or not.

Anyway, IMO, you don't need to consider use cases in which the sparsity of the original A and of its gradients differ here in sparse_matmul. Why? Simply because dense matmul should be used for those use cases. If the gradients of A are dense, or denser than the original A, then A will soon become dense. That means there is little benefit in applying sparse_matmul to that sort of use case.

What do you think about this?
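To make the trade-off concrete, here is a minimal NumPy sketch of the "keep A's sparsity pattern" option discussed above. The masking step is one possible design, not what this PR implements:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 2.0]])   # sparse operand (dense stand-in)
B = np.random.rand(2, 3)     # dense operand
mask = A != 0                # sparsity pattern of A

gC = np.random.rand(2, 3)    # upstream gradient w.r.t. C = A @ B
gA_dense = gC @ B.T          # mathematically, the gradient of A is dense
gA_sparse = gA_dense * mask  # restricted to A's original pattern
```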
self.data = chainer.Variable(self.data)
self.row = row
self.col = col
self.shape = shape  # (row, col)
Can we afford any data validation here? E.g., since to_dense depends on ndim.
All right, I will add some validation code here for shape among others.
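A minimal sketch of the kind of checks that could be added in __init__; the exact conditions and messages are assumptions, not the PR's final code:

```python
def _validate_coo(data, row, col, shape):
    # data/row/col: 1-D for a single matrix, 2-D for batched matrices.
    if data.ndim not in (1, 2):
        raise ValueError('ndim of data must be 1 or 2: {}'.format(data.ndim))
    if not (data.shape == row.shape == col.shape):
        raise ValueError('data, row and col must share the same shape')
    if len(shape) != 2:
        raise ValueError('shape must be a pair (n_rows, n_cols)')
```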
self.col = col
self.shape = shape  # (row, col)

def to_dense(self):
How about adding tests for this method?
All right, I will add tests for to_dense.
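For context, a hypothetical standalone version of what to_dense does for one COO matrix (not the PR's exact code); note the skip over negative padding indices, which ties in with the discussion later in this thread:

```python
import numpy

def coo_to_dense(data, row, col, shape):
    # Scatter the stored entries back into a dense array; negative
    # indices (padding used in batched matrices) are skipped.
    x = numpy.zeros(shape, dtype=data.dtype)
    for d, r, c in zip(data, row, col):
        if r >= 0 and c >= 0:
            x[r, c] = d
    return x
```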
I've just fixed the issues. Could you run the tests again? It looks like CI failed at a place where this PR is not related...
Jenkins, test this please.

Thanks, I started the build just now!
chainer/utils/sparse.py
Outdated
    a single matrix. If three, it is treated as batched matrices.
    ldnz (int): Size of arrays for data, row index and column index to be
        created. The Actual size becomes max(nnz, ldnz) where nnz is number
        of non-zero elmeents in a input dense matrix.
Oh, a small typo at elmeents.
Just a small question: is "ldnz" common terminology? Would you mind explaining what it stands for?
Thanks, will fix the typo.

Re: "ldnz", it stands for "Leading Dimension of array for Non-Zero elements" and gives the size of the leading axis of the array which holds matrix entries, row indexes or column indexes. I follow the naming convention of matrix libraries like BLAS, LAPACK, etc., in which, for example, a scalar variable ldA indicates the size of the leading axis of array A.
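To make the padding concrete, a small hypothetical example (values invented for illustration): two batched COO matrices holding 2 and 3 non-zero elements with ldnz = 4, so trailing slots are padded and the index arrays use -1, matching the negative-index checks discussed below:

```python
import numpy

# Batch of 2 matrices, ldnz = 4; unused slots are padded.
data = numpy.array([[1.0, 2.0, 0.0, 0.0],
                    [3.0, 4.0, 5.0, 0.0]], dtype=numpy.float32)
row = numpy.array([[0, 1, -1, -1],
                   [0, 0, 1, -1]], dtype=numpy.int32)
col = numpy.array([[0, 1, -1, -1],
                   [0, 2, 1, -1]], dtype=numpy.int32)
```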
Thank you for your explanation! I see, it is used for indexing in general. It seems, though, that it is not tested. Should we add tests, since it's a public interface?
Exactly, the option ldnz is not explicitly tested now, though it is implicitly tested when a batched sparse matrix is created with to_coo. Anyway, I will add some tests for that.
Thank you so much! I found a bug in to_dense of CooMatrix thanks to the tests for ldnz.
}
int i_k = A_col[i_A];
if (i_k < 0) {
    continue;
When do we encounter these negative indices?
You may see the negative indexes when there are batched sparse matrices and the number of non-zero elements in each sparse matrix differs.
Ah, they're initialized with -1.
Yes :)
jenkins, test this please.

jenkins, test this please.

@anaruse Not sure if this one https://github.com/chainer/chainer/pull/4397/files/5eeff1763f274deefcccbe43966d4ce2c48e7f52#r187516691 got away unnoticed. Would you mind changing the names as you previously suggested?

I've just renamed some functions and classes as we discussed. Sorry to bother you again, but could you check again?
    transb (bool): If ``True``, each matrix in ``b`` will be transposed.

Returns:
    _chainer.Variable: Result of batched mat-mul.
Should this be ~chainer.Variable: ...?
Strictly speaking, I think yes. So, should CooMatrix be ~chainer.utils.CooMatrix as well? I mean in the explanations of types/classes for arguments.
They actually should (the links were broken). Thanks for that catch, could you fix it?
Yes, I will.
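For reference, a minimal sketch of the corrected cross-reference style; the signature and argument descriptions here are assumptions for illustration. The ~ prefix makes Sphinx link the full path while displaying only the last component:

```python
def sparse_matmul(a, b, transa=False, transb=False):
    """Computes the batched multiplication of sparse and dense matrices.

    Args:
        a (~chainer.utils.CooMatrix): Left operand (sparse).
        b (~chainer.Variable): Right operand (dense).
        transa (bool): If ``True``, each matrix in ``a`` will be transposed.
        transb (bool): If ``True``, each matrix in ``b`` will be transposed.

    Returns:
        ~chainer.Variable: Result of batched mat-mul.
    """
```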
I'm sorry to bother you over and over, but could you check again?

No worries, I am rather sorry for the exact opposite. I'll rerun the CI to check your fixes!

jenkins, test this please.
The CI passes and it looks good except a minor detail that the …

I dug into the link issue above a little bit, and it seems we need to add/edit the .rst file for … I also noticed that the names of the CuPy kernels are still sparse_matmul*. I will fix that as well.

Could you check again? The update above will probably fix the link issue.

Ah I see, I'll try rebuilding the docs.

jenkins, test this please.

LGTM! @anaruse Sorry it took so long to get it merged, but thank you very much for this PR and for all the hard work.
This PR aims to support sparse matmul in Chainer (this is related to #4377 and chainer/chainer-chemistry#90).
I implemented a function named sparse_matmul which computes the matrix multiplication of a sparse and a dense matrix. The usage of this function is as follows (assuming a and b are matrices or 3D tensors); see the sketch below. You can also use this function for batched sparse matrix multiplication (actually this is my main focus), like matmul. It supports backward and double-backward, so you can compute gradients of the sparse and dense matrices, and gradients of gradients, as well.

Please note that a CPU version is not implemented for now, because I don't have a good idea for an efficient CPU implementation using NumPy or SciPy for batched sparse matrix multiplication.
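The usage snippet from the original description did not survive extraction; below is a hedged reconstruction using the names settled on in this thread (to_coo and CooMatrix in chainer.utils, sparse_matmul in chainer.functions). The exact call signatures are assumptions:

```python
import numpy as np

import chainer.functions as F
from chainer import utils

a = np.random.rand(4, 5).astype(np.float32)
a[a < 0.8] = 0.0                 # zero out most entries, making `a` sparse
b = np.random.rand(5, 3).astype(np.float32)

a_coo = utils.to_coo(a)          # convert dense ndarray -> CooMatrix
c = F.sparse_matmul(a_coo, b)    # sparse x dense matrix product
```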