RuntimeError when calling jt.matmul in cuda #409

Open
Purewhite2019 opened this issue Oct 31, 2022 · 1 comment

@Purewhite2019

Describe the bug

A RuntimeError occurs when calling jt.matmul with CUDA enabled, running under WSL on a laptop with an NVIDIA GeForce MX450.

Full Log

[i 1031 23:09:50.689249 80 cuda_flags.cc:32] CUDA enabled.
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-3-b8f9d3eceb14> in <module>
----> 1 jt.flags.use_cuda = 1; jt.matmul(jt.rand((1, 1, 100, 100)), jt.rand(((1, 16, 100, 1)))).shape

~/anaconda3/envs/gm-jittor/lib/python3.7/site-packages/jittor/nn.py in matmul(a, b)
    122             # a: [..., n, m], b: [..., m, k], c:[..., n, k]
    123             if jt.flags.use_cuda and jt.compile_extern.cublas_ops:
--> 124                 return jt.compile_extern.cublas_ops.cublas_batched_matmul(a, b, 0, 0)
    125         shape = []
    126         len_c = max(len_a, len_b)

RuntimeError: Wrong inputs arguments, Please refer to examples(help(jt.cublas_batched_matmul)).

Types of your inputs are:
 self   = module,
 args   = (Var, Var, int, int, ),

The function declarations are:
 VarHolder* cublas_batched_matmul(VarHolder* a, VarHolder* b,  bool trans_a,  bool trans_b)

Failed reason:[f 1031 23:09:50.690284 80 cublas_batched_matmul_op.cc:75] Check failed a->shape[i](1) == b->shape[i](16) Something wrong ... Could you please report this issue?

Minimal Reproduce

# On CPU, jt.matmul() works
jt.flags.use_cuda = 0; jt.matmul(jt.rand((1, 1, 100, 100)), jt.rand((1, 16, 100, 1))).shape
# On GPU, it doesn't work
jt.flags.use_cuda = 1; jt.matmul(jt.rand((1, 1, 100, 100)), jt.rand((1, 16, 100, 1))).shape
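
For context, here is a small sketch of the batch shapes involved (my own illustration using NumPy, not part of the original report): standard matmul broadcasting would expand a's batch dimension of 1 to match b's 16, but cublas_batched_matmul compares the batch dimensions for strict equality, which is exactly the check that fails in the log above.

import numpy as np

# Batch dims are every axis except the last two (the matrix dims).
batch_a = (1, 1)   # from a.shape == (1, 1, 100, 100)
batch_b = (1, 16)  # from b.shape == (1, 16, 100, 1)

# Standard matmul broadcasting resolves the batch shape to (1, 16) ...
print(np.broadcast_shapes(batch_a, batch_b))  # -> (1, 16)

# ... whereas cublas_batched_matmul requires a->shape[i] == b->shape[i]
# for every batch axis, hence the "Check failed ... (1) == ... (16)" message.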

Possible Solution

# Call repeat(1, 16, 1, 1) to broadcast a's batch dimension explicitly
jt.flags.use_cuda = 1; jt.matmul(jt.rand((1, 1, 100, 100)).repeat(1, 16, 1, 1), jt.rand((1, 16, 100, 1))).shape

However, this workaround produces results on CUDA that differ from the PyTorch reference (max absolute difference ~1.1e-05 in the comparison below).

import numpy as np
import torch
import jittor as jt

a = torch.randn(1, 1, 100, 100).numpy()
b = torch.randn(1, 16, 100, 1).numpy()
c_torch = torch.matmul(torch.tensor(a), torch.tensor(b)).numpy()

# On CPU the Jittor result matches the PyTorch reference
jt.flags.use_cuda = 0
c_jt = jt.matmul(jt.Var(a).repeat(1, 16, 1, 1), jt.Var(b)).numpy()
np.testing.assert_almost_equal(c_torch, c_jt)  # Passed

# On CUDA the same comparison fails at 7 decimals
jt.flags.use_cuda = 1
c_jt = jt.matmul(jt.Var(a).repeat(1, 16, 1, 1), jt.Var(b)).numpy()
np.testing.assert_almost_equal(c_torch, c_jt)  # AssertionError
AssertionError:
Arrays are not almost equal to 7 decimals

Mismatched elements: 1350 / 1600 (84.4%)
Max absolute difference: 1.1444092e-05
Max relative difference: 3.5474255e-05
 x: array([[[[  4.539161 ],
         [  5.9756746],
         [ 13.752586 ],...
 y: array([[[[  4.53916  ],
         [  5.9756722],
         [ 13.752583 ],...
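
For reference, a sketch (my addition, assuming a float32-level tolerance of rtol=1e-4 / atol=1e-5 is acceptable) that quantifies the discrepancy instead of asserting 7 decimal places:

import numpy as np

# The CUDA result differs from the PyTorch reference only at ~1e-05,
# so a float32-appropriate tolerance comparison is expected to pass.
print(np.abs(c_torch - c_jt).max())  # ~1.1e-05, as reported above
np.testing.assert_allclose(c_torch, c_jt, rtol=1e-4, atol=1e-5)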

Expected behavior

jt.matmul should broadcast the batch dimensions and return correct results on CUDA, matching the CPU behavior.
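
In the meantime, here is a workaround sketch (my own, not from this issue; matmul_broadcast is a hypothetical helper and assumes a and b have the same number of dimensions) that generalizes the manual repeat() broadcast to arbitrary batch shapes before calling jt.matmul:

import numpy as np
import jittor as jt

def matmul_broadcast(a, b):
    # Broadcast the batch dims (all axes except the last two) by hand,
    # so the CUDA path sees operands with identical batch shapes.
    a_shape, b_shape = tuple(a.shape), tuple(b.shape)
    batch = np.broadcast_shapes(a_shape[:-2], b_shape[:-2])
    rep_a = [t // s for t, s in zip(batch, a_shape[:-2])] + [1, 1]
    rep_b = [t // s for t, s in zip(batch, b_shape[:-2])] + [1, 1]
    return jt.matmul(a.repeat(*rep_a), b.repeat(*rep_b))

jt.flags.use_cuda = 1
c = matmul_broadcast(jt.rand((1, 1, 100, 100)), jt.rand((1, 16, 100, 1)))
print(c.shape)  # expected (1, 16, 100, 1)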

@Exusial
Contributor

Exusial commented Nov 19, 2022

Thank you for your report; we will fix the shape issue in an upcoming update.
