
Numpy like matmul #2426

Merged: 20 commits merged into chainer:master on Jun 20, 2017

Conversation

5 participants
@fukatani
Contributor

fukatani commented Mar 18, 2017

This PR proposes a numpy-like matmul.
As noted in #1963, the current matmul and batch_matmul implementations behave differently from numpy.matmul.

Except for the following two points, NumpyLikeMatmul is consistent with numpy.matmul (see the sketch below):

  1. Broadcasting is disallowed, as in the other basic_math functions.
  2. The arguments transa and transb exist.
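For readers unfamiliar with the flags, here is a minimal numpy sketch of the intended transa/transb semantics (a reference model only, not this PR's implementation):

import numpy as np

def numpy_like_matmul(a, b, transa=False, transb=False):
    # Reference model: transa/transb transpose the last two axes before a
    # numpy.matmul-style product; 1-D inputs are left untouched.
    if transa and a.ndim >= 2:
        a = a.swapaxes(-1, -2)
    if transb and b.ndim >= 2:
        b = b.swapaxes(-1, -2)
    return np.matmul(a, b)

a = np.arange(6).reshape(3, 2).astype(np.float32)
b = np.arange(12).reshape(3, 4).astype(np.float32)
# (2, 3) @ (3, 4) after transposing a:
assert numpy_like_matmul(a, b, transa=True).shape == (2, 4)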

fukatani added some commits Mar 18, 2017

@fukatani fukatani changed the title from Numpy like matmul to [WIP] Numpy like matmul Mar 22, 2017

@fukatani
Contributor

fukatani commented Mar 22, 2017

I want to rethink this, so I am marking this PR as WIP.
Sorry for the inconvenience.

@boeddeker
Contributor

boeddeker commented Mar 23, 2017

The cupy matmul function contains the line a = ascontiguousarray(a, dtype=dtype), and the CUDA functions have the options transa and transb. Maybe it would be better to make these parameters accessible from the outside? Then swapaxes would no longer be required.

@fukatani
Contributor

fukatani commented Mar 23, 2017

Thank you for your comment!

I think asfortranarray and ascontiguousarray don't affect the calculation result.

ex.

import numpy as np

x = np.arange(6).reshape(2, 3)
print(x.flags['C_CONTIGUOUS'])  # True
print(x)
y = np.asfortranarray(x)
print(y.flags['C_CONTIGUOUS'])  # False
print(y)

z = np.arange(6).reshape(3, 2)
assert (np.matmul(x, z) == np.matmul(y, z)).all()  # this assertion passes
@boeddeker
Contributor

boeddeker commented Mar 23, 2017

Yes, that is true. What I meant was that the input to the matmul function may be C-contiguous while the user passes the transa or transb option, expecting the code to be optimized for that calculation. But what actually happens is that the chainer matmul does the transpose, and the cupy matmul then calls ascontiguousarray and generates a new array. In this case it is not necessary to create a new array.

So the idea behind my previous comment was: the call to ascontiguousarray may generate a copy, and when the chainer matmul does the transpose first, the probability that the cupy function generates a copy is higher.
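A minimal numpy sketch of the copy behavior under discussion (numpy used here as a stand-in; cupy's ascontiguousarray behaves analogously):

import numpy as np

a = np.arange(24).reshape(2, 3, 4)
t = a.swapaxes(-1, -2)             # the transpose is only a view, no data is copied
print(t.flags['C_CONTIGUOUS'])     # False
c = np.ascontiguousarray(t)        # a non-contiguous input forces a full copy here
print(c.flags['C_CONTIGUOUS'])     # True
print(np.shares_memory(a, c))      # False: a new buffer was allocated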

@fukatani
Contributor

fukatani commented Mar 24, 2017

OK.
So you mean that calling ascontiguousarray in cupy.matmul and swapaxes in chainer.matmul is redundant.
That's a good point.

I think we should measure the impact on performance before starting the optimization.
And the merge will be smoother if the optimization is done in a different PR.
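As a rough illustration of how such a measurement could look (a CPU/numpy stand-in; an actual benchmark would use cupy arrays on the GPU with proper synchronization):

import timeit
import numpy as np

a = np.random.rand(512, 512).astype(np.float32)
b = np.random.rand(512, 512).astype(np.float32)

# Transpose, force a contiguous copy, then multiply (what swapaxes +
# ascontiguousarray amounts to) ...
t_copy = timeit.timeit(
    lambda: np.matmul(np.ascontiguousarray(a.T), b), number=100)
# ... versus handing the transposed view straight to the BLAS routine.
t_view = timeit.timeit(lambda: np.matmul(a.T, b), number=100)
print(t_copy, t_view)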

@fukatani
Contributor

fukatani commented Mar 24, 2017

Of course, if you are willing to send a PR, I'll close this one. It was originally your issue.

@boeddeker
Contributor

boeddeker commented Mar 24, 2017

Yes, my thoughts go in that direction. In my mind, the call to swapaxes will in most cases cause ascontiguousarray to perform a copy. And when the CUDA argument is used, the transpose can be handled inside the dot operation.

But you are right, this could be done in a different PR.

@boeddeker
Contributor

boeddeker commented Mar 24, 2017

Currently I do not have the time to write a PR; that is also the reason why I have not written a PR like this one.

@fukatani
Contributor

fukatani commented Mar 25, 2017

I investigated each matmul's behavior.
Here are the results when a.ndim == b.ndim.
N/A means matmul raises InvalidType.

| method | 1 dim vs 1 dim | 2 dim vs 2 dim | 3 dim vs 3 dim | 4 or more |
| --- | --- | --- | --- | --- |
| NumpyLikeMatmul | as numpy.matmul | as numpy.matmul | as numpy.matmul | as numpy.matmul |
| Matmul | as numpy.matmul | as numpy.matmul | N/A | N/A |
| BatchedMatmul | N/A | original behavior (using for loop) | as numpy.matmul (using for loop) | N/A |

Thanks to @boeddeker, we already know each matmul's broadcasting behavior.

| method | 1 dim vs 2 dim | 2 dim vs 3 dim |
| --- | --- | --- |
| NumpyLikeMatmul | N/A | N/A |
| Matmul | original behavior | N/A |
| BatchedMatmul | N/A | original behavior |

Here is the behavior when transa == True.

| method | 1 dim (vector) | 2 dim | 3 dim | 4 or more |
| --- | --- | --- | --- | --- |
| NumpyLikeMatmul | a (do nothing) | a.T | a.swapaxes(-1, -2) | a.swapaxes(-1, -2) |
| Matmul | asmat(a).T | a.T | N/A | N/A |
| BatchedMatmul | N/A | a.reshape(a.shape[:2] + (-1,)).swapaxes(-1, -2) | a.swapaxes(-1, -2) | N/A |

In my opinion, the calculations desired by most users can be covered by NumpyLikeMatmul alone, so it is better to replace the existing functions in v2.

Although some calculations, such as those involving broadcasting, cannot be done with NumpyLikeMatmul alone, they can be reproduced with explicit array manipulation (see the sketch below).
Not broadcasting implicitly is Chainer-like API design.
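For example, the disallowed implicit broadcast can be spelled out explicitly (a numpy sketch of the idea; with NumpyLikeMatmul the same explicit expansion would be applied to the input Variables):

import numpy as np

a = np.arange(6).reshape(2, 3).astype(np.float32)      # 2-dim
b = np.arange(24).reshape(4, 3, 2).astype(np.float32)  # a batch of matrices

# numpy.matmul broadcasts a over the batch implicitly:
implicit = np.matmul(a, b)

# NumpyLikeMatmul would require the broadcast to be written out instead:
explicit = np.matmul(np.broadcast_to(a, (4, 2, 3)), b)
assert (implicit == explicit).all()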

@fukatani fukatani changed the title from [WIP] Numpy like matmul to Numpy like matmul Mar 25, 2017

fukatani added some commits Apr 9, 2017

@niboshi niboshi self-assigned this May 8, 2017

@fukatani
Contributor

fukatani commented May 12, 2017

Thanks! I fixed the test.

@niboshi niboshi self-assigned this Jun 2, 2017

@niboshi
Member

niboshi commented Jun 8, 2017

I'm sorry for being late.
As v2 has already been released, and we have revised the release policy so that we can make compatibility-breaking releases more frequently, it is more straightforward to include this PR directly in the next alpha release. (Please see the Contribution Guide and Compatibility Policy for details.)

So could you rename the functions/links in this PR to override the existing ones?

@niboshi

niboshi approved these changes Jun 8, 2017

@fukatani
Contributor

fukatani commented Jun 14, 2017

Sorry, I found that some functions use batch_matmul. I will replace them with matmul and confirm that all tests pass.

fukatani added some commits Jun 15, 2017

@fukatani fukatani changed the title from [WIP]Numpy like matmul to Numpy like matmul Jun 16, 2017

@fukatani
Contributor

fukatani commented Jun 16, 2017

I fixed the test, but part of the AppVeyor build failed. Could you help me?

@niboshi
Member

niboshi commented Jun 16, 2017

Thank you for the fix! I will review again.

> I fixed the test, but part of the AppVeyor build failed. Could you help me?

It's not relevant to this PR, so please don't worry about it.
It should be fixed by #2855, but that is not merged yet.

@niboshi

Please check my review.

@niboshi niboshi added the no-compat label Jun 16, 2017

fukatani added some commits Jun 18, 2017

@fukatani
Contributor

fukatani commented Jun 18, 2017

Thank you for your attentive review.
How about this version?

@fukatani
Contributor

fukatani commented Jun 19, 2017

You are absolutely right.
I fixed them.

@niboshi
Member

niboshi commented Jun 20, 2017

LGTM! Thank you very much for this big contribution!

@niboshi niboshi merged commit 4a33c29 into chainer:master Jun 20, 2017

3 of 5 checks passed

continuous-integration/appveyor/branch: AppVeyor build failed
continuous-integration/travis-ci/push: The Travis CI build is in progress
continuous-integration/appveyor/pr: AppVeyor build succeeded
continuous-integration/travis-ci/pr: The Travis CI build passed
coverage/coveralls: First build on niboshi-review-2426-numpy-like-matmul at 87.572%

@fukatani fukatani referenced this pull request Jun 20, 2017

Merged

Matmul cleanup #2892

@okuta okuta added this to the v3.0.0a1 milestone Jun 22, 2017

@@ -204,8 +204,8 @@
from chainer.functions.math.logarithm_1p import log1p # NOQA
from chainer.functions.math.logsumexp import logsumexp # NOQA
from chainer.functions.math.logsumexp import LogSumExp # NOQA
from chainer.functions.math.matmul import batch_matmul # NOQA
from chainer.functions.math.matmul import BatchMatMul # NOQA
from chainer.functions.math.matmul import matmul # NOQA

@okuta
Member

okuta commented Jun 22, 2017

This line and L209 are the same. Is it a mistake?

@okuta
Member

okuta commented Jun 22, 2017

Can we provide batch_matmul as a deprecated function in version 3? We can use FutureWarning.
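A hedged sketch of what such a deprecated alias could look like (illustrative only; the reshape rule follows the transa table above, and the eventual Chainer implementation may differ):

import warnings
import numpy as np

def matmul(a, b):
    # Stand-in for the new numpy-like matmul.
    return np.matmul(a, b)

def batch_matmul(a, b, transa=False, transb=False):
    warnings.warn(
        'batch_matmul is deprecated. Use matmul instead.', FutureWarning)
    # Force each batch element into matrix form, as batch_matmul did:
    # 2-D inputs become batches of column vectors, and trailing axes of
    # higher-dimensional inputs are flattened.
    a = a.reshape(a.shape[:2] + (-1,))
    b = b.reshape(b.shape[:2] + (-1,))
    if transa:
        a = a.swapaxes(-1, -2)
    if transb:
        b = b.swapaxes(-1, -2)
    return matmul(a, b)

# Usage: a batch of 4 column vectors times a batch of 4 row vectors.
x = np.random.rand(4, 3).astype(np.float32)
print(batch_matmul(x, x, transb=True).shape)  # (4, 3, 3)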

@delta2323
Member

delta2323 commented Jun 30, 2017

I agree with @okuta that we should restore batch_matmul and declare it deprecated.

@fukatani
Contributor

fukatani commented Jun 30, 2017

So what should I do?
Should this PR be reverted, or should I create a new PR to reimplement batch_matmul with a deprecation warning?
