RuntimeError when calling jt.matmul in cuda #409

Open
Purewhite2019 opened this issue Oct 31, 2022 · 1 comment

@Purewhite2019

Describe the bug

A RuntimeError occurs when calling jt.matmul with CUDA enabled, running under WSL on a laptop with an NVIDIA GeForce MX450.

Full Log

[i 1031 23:09:50.689249 80 cuda_flags.cc:32] CUDA enabled.
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-3-b8f9d3eceb14> in <module>
----> 1 jt.flags.use_cuda = 1; jt.matmul(jt.rand((1, 1, 100, 100)), jt.rand(((1, 16, 100, 1)))).shape

~/anaconda3/envs/gm-jittor/lib/python3.7/site-packages/jittor/nn.py in matmul(a, b)
    122             # a: [..., n, m], b: [..., m, k], c:[..., n, k]
    123             if jt.flags.use_cuda and jt.compile_extern.cublas_ops:
--> 124                 return jt.compile_extern.cublas_ops.cublas_batched_matmul(a, b, 0, 0)
    125         shape = []
    126         len_c = max(len_a, len_b)

RuntimeError: Wrong inputs arguments, Please refer to examples(help(jt.cublas_batched_matmul)).

Types of your inputs are:
 self   = module,
 args   = (Var, Var, int, int, ),

The function declarations are:
 VarHolder* cublas_batched_matmul(VarHolder* a, VarHolder* b,  bool trans_a,  bool trans_b)

Failed reason:[f 1031 23:09:50.690284 80 cublas_batched_matmul_op.cc:75] Check failed a->shape[i](1) == b->shape[i](16) Something wrong ... Could you please report this issue?

Minimal Reproduce

# On CPU, jt.matmul() works
jt.flags.use_cuda = 0; jt.matmul(jt.rand((1, 1, 100, 100)), jt.rand((1, 16, 100, 1))).shape
# On GPU, it doesn't work
jt.flags.use_cuda = 1; jt.matmul(jt.rand((1, 1, 100, 100)), jt.rand((1, 16, 100, 1))).shape
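
For context, here is a small sketch of the batch shapes involved (my own illustration using NumPy, not part of the original report): standard matmul broadcasting would expand a's batch dimension of 1 to match b's 16, but cublas_batched_matmul compares the batch dimensions for strict equality, which is exactly the check that fails in the log above.

import numpy as np

# Batch dims are every axis except the last two (the matrix dims).
batch_a = (1, 1)   # from a.shape == (1, 1, 100, 100)
batch_b = (1, 16)  # from b.shape == (1, 16, 100, 1)

# Standard matmul broadcasting resolves the batch shape to (1, 16) ...
print(np.broadcast_shapes(batch_a, batch_b))  # -> (1, 16)

# ... whereas cublas_batched_matmul requires a->shape[i] == b->shape[i]
# for every batch axis, hence the "Check failed ... (1) == ... (16)" message.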

Possible Solution

# Call repeat(1, 16, 1, 1) to broadcast a's batch dimension explicitly
jt.flags.use_cuda = 1; jt.matmul(jt.rand((1, 1, 100, 100)).repeat(1, 16, 1, 1), jt.rand((1, 16, 100, 1))).shape

However, this workaround produces results on CUDA that differ from the PyTorch reference (max absolute difference ~1.1e-05 in the comparison below).

import numpy as np
import torch
import jittor as jt

a = torch.randn(1, 1, 100, 100).numpy()
b = torch.randn(1, 16, 100, 1).numpy()
c_torch = torch.matmul(torch.tensor(a), torch.tensor(b)).numpy()

# On CPU the Jittor result matches the PyTorch reference
jt.flags.use_cuda = 0
c_jt = jt.matmul(jt.Var(a).repeat(1, 16, 1, 1), jt.Var(b)).numpy()
np.testing.assert_almost_equal(c_torch, c_jt)  # Passed

# On CUDA the same comparison fails at 7 decimals
jt.flags.use_cuda = 1
c_jt = jt.matmul(jt.Var(a).repeat(1, 16, 1, 1), jt.Var(b)).numpy()
np.testing.assert_almost_equal(c_torch, c_jt)  # AssertionError
AssertionError:
Arrays are not almost equal to 7 decimals

Mismatched elements: 1350 / 1600 (84.4%)
Max absolute difference: 1.1444092e-05
Max relative difference: 3.5474255e-05
 x: array([[[[  4.539161 ],
         [  5.9756746],
         [ 13.752586 ],...
 y: array([[[[  4.53916  ],
         [  5.9756722],
         [ 13.752583 ],...
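
For reference, a sketch (my addition, assuming a float32-level tolerance of rtol=1e-4 / atol=1e-5 is acceptable) that quantifies the discrepancy instead of asserting 7 decimal places:

import numpy as np

# The CUDA result differs from the PyTorch reference only at ~1e-05,
# so a float32-appropriate tolerance comparison is expected to pass.
print(np.abs(c_torch - c_jt).max())  # ~1.1e-05, as reported above
np.testing.assert_allclose(c_torch, c_jt, rtol=1e-4, atol=1e-5)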

Expected behavior

jt.matmul should broadcast the batch dimensions and return correct results on CUDA, matching the CPU behavior.
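
In the meantime, here is a workaround sketch (my own, not from this issue; matmul_broadcast is a hypothetical helper and assumes a and b have the same number of dimensions) that generalizes the manual repeat() broadcast to arbitrary batch shapes before calling jt.matmul:

import numpy as np
import jittor as jt

def matmul_broadcast(a, b):
    # Broadcast the batch dims (all axes except the last two) by hand,
    # so the CUDA path sees operands with identical batch shapes.
    a_shape, b_shape = tuple(a.shape), tuple(b.shape)
    batch = np.broadcast_shapes(a_shape[:-2], b_shape[:-2])
    rep_a = [t // s for t, s in zip(batch, a_shape[:-2])] + [1, 1]
    rep_b = [t // s for t, s in zip(batch, b_shape[:-2])] + [1, 1]
    return jt.matmul(a.repeat(*rep_a), b.repeat(*rep_b))

jt.flags.use_cuda = 1
c = matmul_broadcast(jt.rand((1, 1, 100, 100)), jt.rand((1, 16, 100, 1)))
print(c.shape)  # expected (1, 16, 100, 1)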

@Exusial
Contributor

Exusial commented Nov 19, 2022

Thank you for your report; we will fix the shape issue in an upcoming update.
