should chainer matmul have an equal behavior as numpy matmul? #1963

boeddeker · 2016-12-04T17:56:24Z

Currently chainer has matmul and batch_matmul, but both behave differently from numpy:

Here are some shapes:

```
matmul:    (2, 3), (3, 4) -> (2, 4)
matmul:    (2,),   (1, 4) -> (2, 4)  # different from numpy
matmul:    (2, 3), (3,)   -> (2, 1)  # different from numpy
np.matmul: (2, 3), (3, 4) -> (2, 4)
np.matmul: (2,),   (2, 4) -> (4,)
np.matmul: (2, 3), (3,)   -> (2,)
```
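The 1-D cases can be reproduced with a small numpy sketch. `old_chainer_matmul` and `as_column` are hypothetical helpers inferred only from the shapes listed above: a 1-D operand of shape (n,) is treated as an (n, 1) column vector. They are not Chainer's actual implementation.

```python
import numpy as np


def as_column(x):
    # Hypothetical reconstruction of the old rule: a 1-D array of
    # shape (n,) is treated as an (n, 1) column vector.
    return x.reshape(-1, 1) if x.ndim == 1 else x


def old_chainer_matmul(a, b):
    # Promote 1-D inputs to columns, then multiply as 2-D matrices.
    return np.dot(as_column(a), as_column(b))


print(old_chainer_matmul(np.zeros(2), np.zeros((1, 4))).shape)  # (2, 4)
print(old_chainer_matmul(np.zeros((2, 3)), np.zeros(3)).shape)  # (2, 1)

# numpy instead temporarily adds a length-1 axis to a 1-D operand
# and removes it again from the result.
print(np.matmul(np.zeros(2), np.zeros((2, 4))).shape)  # (4,)
print(np.matmul(np.zeros((2, 3)), np.zeros(3)).shape)  # (2,)
```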

```
batch_matmul: (5, 2, 3), (5, 3, 4) -> (5, 2, 4)
batch_matmul: (5, 2, 3), (5, 3)    -> (5, 2, 1)  # different from numpy
batch_matmul: (5, 3),    (5, 1, 3) -> (5, 3, 3)  # different from numpy
np.matmul:    (5, 2, 3), (5, 3, 4) -> (5, 2, 4)
np.matmul:    (5, 2, 3), (3, 4)    -> (5, 2, 4)
np.matmul:    (5, 3),    (5, 3, 4) -> (5, 5, 4)
```
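The batched cases can be sketched the same way. `old_batch_matmul` is a hypothetical helper inferred from the shapes above: a 2-D operand of shape (batch, n) is treated as a batch of column vectors (batch, n, 1). numpy.matmul instead treats a 2-D operand as one matrix and broadcasts it across the leading (stack) dimension of the other operand.

```python
import numpy as np


def old_batch_matmul(a, b):
    # Hypothetical reconstruction: (batch, n) -> (batch, n, 1).
    if a.ndim == 2:
        a = a[:, :, None]
    if b.ndim == 2:
        b = b[:, :, None]
    # Both operands are now stacks of matrices with the same batch size.
    return np.matmul(a, b)


print(old_batch_matmul(np.zeros((5, 2, 3)), np.zeros((5, 3))).shape)  # (5, 2, 1)
print(old_batch_matmul(np.zeros((5, 3)), np.zeros((5, 1, 3))).shape)  # (5, 3, 3)

# numpy broadcasts the 2-D operand over the stack of 5 matrices.
print(np.matmul(np.zeros((5, 2, 3)), np.zeros((3, 4))).shape)  # (5, 2, 4)
print(np.matmul(np.zeros((5, 3)), np.zeros((5, 3, 4))).shape)  # (5, 5, 4)
```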

In my opinion, chainer's matmul should either behave the same as numpy's matmul or raise an exception.

In #1901, @okuta started rewriting matmul's internal code, and I asked whether the interface could be rewritten as well.
Therefore I want to start a discussion on whether the interface should be changed.

A possible starting point for a new matmul is the following code:

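```python
import numpy

from chainer import cuda
from chainer import function
from chainer import utils
from chainer.utils import type_check

_SUPPORTED_DTYPES = (numpy.float32, numpy.complex64, numpy.complex128)


def xp_hermitian(x):
    # Conjugate transpose over the last two axes (numpy or cupy array).
    xp = cuda.get_array_module(x)
    return xp.swapaxes(x.conj(), -1, -2)


class MatMul(function.Function):

    def check_type_forward(self, in_types):
        type_check.expect(in_types.size() == 2)
        # Reject unsupported dtypes the same way for both operands.
        for in_type in in_types:
            if in_type.value.dtype not in _SUPPORTED_DTYPES:
                raise TypeError(in_type.value.dtype)
        type_check.expect(
            in_types[0].ndim >= 2,
            in_types[1].ndim >= 2,
            # Leading (batch) dimensions must match exactly; no broadcasting.
            in_types[0].shape[:-2] == in_types[1].shape[:-2],
            # Inner dimensions must agree: (..., n, m) @ (..., m, k).
            in_types[0].shape[-1] == in_types[1].shape[-2],
        )

    def forward(self, x):
        # Matrix product of the two inputs (the @ operator needs Python >= 3.5).
        result = utils.force_array(x[0] @ x[1])
        return result,

    def backward(self, x, gy):
        # For y = a @ b: grad_a = gy @ b^H, grad_b = a^H @ gy.
        grad_x_0_star = utils.force_array(gy[0] @ xp_hermitian(x[1]))
        grad_x_1_star = utils.force_array(xp_hermitian(x[0]) @ gy[0])
        return grad_x_0_star, grad_x_1_star
```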
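The shape constraints in `check_type_forward` above can be restated in plain Python. `proposed_matmul_shape` is a hypothetical helper for illustration only. Like the proposal, it requires both operands to be at least 2-D with identical leading (batch) dimensions, so it deliberately does not broadcast the way numpy.matmul does.

```python
def proposed_matmul_shape(shape_a, shape_b):
    """Output shape under the proposed MatMul type check (hypothetical helper)."""
    shape_a, shape_b = tuple(shape_a), tuple(shape_b)
    if len(shape_a) < 2 or len(shape_b) < 2:
        raise ValueError('both operands must be at least 2-D')
    if shape_a[:-2] != shape_b[:-2]:
        raise ValueError('batch dimensions must match exactly')
    if shape_a[-1] != shape_b[-2]:
        raise ValueError('inner dimensions do not agree')
    # (..., n, m) @ (..., m, k) -> (..., n, k)
    return shape_a[:-1] + shape_b[-1:]


print(proposed_matmul_shape((2, 3), (3, 4)))        # (2, 4)
print(proposed_matmul_shape((5, 2, 3), (5, 3, 4)))  # (5, 2, 4)
for bad in [((2,), (2, 4)), ((5, 2, 3), (3, 4))]:
    try:
        proposed_matmul_shape(*bad)
    except ValueError as e:
        print(bad, '->', e)
```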


beam2d · 2016-12-06T00:30:28Z

I agree that matmul should have the same interface as numpy's. We want to keep the interface stable during v1, so it would be good to make this change in v2. It would be better to keep the current matmul under a different name to ease migration to v2.

The above implementation looks good as a starting point, except for its use of the @ operator, which is not supported by Python < 3.5.
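One way to avoid `@` is an explicit call to `matmul` on the array module (numpy has provided `numpy.matmul` since 1.10). Below is a minimal numpy-only sketch, assuming real-valued inputs. In the Chainer function, `np` would be replaced by the `xp` returned by `cuda.get_array_module`, assuming that module also provides `matmul`. The sketch rewrites the proposed forward/backward without `@` and checks the backward formulas against central finite differences. All function names here are hypothetical.

```python
import numpy as np


def hermitian(x):
    # Conjugate transpose over the last two axes.
    return np.swapaxes(x.conj(), -1, -2)


def matmul_forward(a, b):
    # Same result as a @ b, but also valid syntax on Python < 3.5.
    return np.matmul(a, b)


def matmul_backward(a, b, gy):
    # Gradients of y = a @ b, mirroring the proposal's backward().
    return np.matmul(gy, hermitian(b)), np.matmul(hermitian(a), gy)


rng = np.random.RandomState(0)
a, b = rng.randn(5, 2, 3), rng.randn(5, 3, 4)
gy = rng.randn(5, 2, 4)
ga, gb = matmul_backward(a, b, gy)


def loss(a, b):
    # Scalar whose gradient with respect to y = a @ b is exactly gy.
    return np.sum(gy * matmul_forward(a, b))


def numeric_grad(f, x, index, eps=1e-6):
    # Central finite difference of the scalar f at x[index].
    x_plus, x_minus = x.copy(), x.copy()
    x_plus[index] += eps
    x_minus[index] -= eps
    return (f(x_plus) - f(x_minus)) / (2 * eps)


print(np.isclose(numeric_grad(lambda t: loss(t, b), a, (0, 1, 2)), ga[0, 1, 2]))  # True
print(np.isclose(numeric_grad(lambda t: loss(a, t), b, (4, 2, 3)), gb[4, 2, 3]))  # True
```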

muupan · 2017-07-14T03:59:36Z

If I understand it correctly, the new matmul (#2426) behaves as follows:

```
new matmul: (2, 3),    (3, 4)    -> (2, 4)
new matmul: (2,),      (1, 4)    -> error  # different from numpy
new matmul: (2, 3),    (3,)      -> error  # different from numpy
new matmul: (5, 2, 3), (5, 3, 4) -> (5, 2, 4)
new matmul: (5, 2, 3), (5, 3)    -> error  # different from numpy
new matmul: (5, 3),    (5, 1, 3) -> error  # different from numpy
```

So, it reduces functionality and starts to raise more errors than before?

boeddeker · 2017-07-14T04:19:52Z

I would call it consistent with numpy. It drops some operations and adds others.
Here are all your examples executed with numpy:

```python
>>> def test(shape1, shape2):
...     return (np.zeros(shape1) @ np.zeros(shape2)).shape
>>> test([2, 3], [3, 4])
(2, 4)
>>> test([2], [1, 4])
Traceback (most recent call last):
...
ValueError: shapes (2,) and (1,4) not aligned: 2 (dim 0) != 1 (dim 0)
>>> test([2, 3], [3])  # this does not work in the new version
(2,)
>>> test([5, 2, 3], [5, 3])
Traceback (most recent call last):
...
ValueError: shapes (5,2,3) and (5,3) not aligned: 3 (dim 2) != 5 (dim 0)
>>> test([5, 3], [5, 1, 3])
Traceback (most recent call last):
...
ValueError: shapes (5,3) and (5,1,3) not aligned: 3 (dim 1) != 1 (dim 1)
```

muupan · 2017-07-14T04:28:40Z

Oh, sorry. I didn't notice that the input shapes differ between the numpy.matmul and chainer examples in your original post. You're right. So my examples should be:

```
new matmul: (2, 3),    (3, 4)    -> (2, 4)
new matmul: (2,),      (2, 4)    -> error  # different from numpy
new matmul: (2, 3),    (3,)      -> error  # different from numpy
new matmul: (5, 2, 3), (5, 3, 4) -> (5, 2, 4)
new matmul: (5, 2, 3), (3, 4)    -> error  # different from numpy
new matmul: (5, 3),    (5, 3, 4) -> error  # different from numpy
```

So it is more consistent with numpy.matmul: it removes functionality that is inconsistent with numpy.matmul and adds some operations that are consistent with it, e.g. (2,) @ (2,). That makes sense.

stale · 2017-10-23T08:06:43Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs. Thank you for your contributions.

stale · 2017-11-22T08:13:04Z

This issue is closed as announced. Feel free to re-open it if needed.

fukatani mentioned this issue on Mar 18, 2017 in Numpy like matmul #2426 (merged).

stale bot added the "stale" label ("Not updated for a longer period of time.") on Oct 23, 2017

stale bot closed this as completed Nov 22, 2017
