
RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc) #50

Closed
yxchng opened this issue Aug 6, 2022 · 3 comments

yxchng commented Aug 6, 2022

```
 ** On entry to SGEMM  parameter number 10 had an illegal value
Traceback (most recent call last):
  File "check_flops.py", line 34, in <module>
    model(x)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/tiger/convnext/models/natnext510_511.py", line 620, in forward
    x = self.forward_features(x)
  File "/opt/tiger/convnext/models/natnext510_511.py", line 616, in forward_features
    return self.features(x)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/tiger/convnext/models/natnext510_511.py", line 434, in forward
    new_features = layer(features)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/tiger/convnext/models/natnext510_511.py", line 392, in forward
    bottleneck_output = self.bottleneck_fn(prev_features)
  File "/opt/tiger/convnext/models/natnext510_511.py", line 349, in bottleneck_fn
    bottleneck_output = self.block(x)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/tiger/convnext/models/natnext510_511.py", line 128, in forward
    x = self.attn(x)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/tiger/convnext/natten/nattencuda.py", line 121, in forward
    qkv = self.qkv(x).reshape(B, H, W, 3, self.num_heads, self.head_dim).permute(3, 0, 4, 1, 2, 5)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/linear.py", line 96, in forward
    return F.linear(input, self.weight, self.bias)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py", line 1847, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)
```

My input shape is torch.Size([1, 256, 14, 14]). Why am I getting this error?

yxchng commented Aug 6, 2022

Seems like it expects channels_last format. Will you support channels_first? Frequent permutation is quite a hit on performance.

alihassanijr commented Aug 7, 2022

Thank you for your interest.

That's correct: the module expects channels-last inputs, similar to many other attention implementations (although those are typically flattened across the spatial axes into 3D tensors).
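
To make the requirement concrete, here is a minimal sketch; the `Linear` below is only a stand-in for the qkv projection in nattencuda.py, not the actual module:

```python
import torch
import torch.nn as nn

# Minimal stand-in for the qkv projection inside the attention module:
# nn.Linear acts on the *last* dimension, so it needs channels-last input.
qkv = nn.Linear(256, 3 * 256)

x = torch.randn(1, 256, 14, 14)          # (B, C, H, W): last dim is 14, not 256, so qkv(x) would fail
x = x.permute(0, 2, 3, 1).contiguous()   # (B, H, W, C): last dim is now the 256 channels
out = qkv(x)                             # works: shape (1, 14, 14, 768)
print(out.shape)
```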

Since the model is not a CNN, and convolutions are the only operations in the model that require a channels-first layout, it makes sense to keep inputs channels-last and avoid frequent permutations.

Some permutation and reshape operations are unavoidable because of the multi-head split. The current structure is optimized for speed: all linear projections expect channels-last inputs, and our NA kernel expects inputs of shape Batch x Heads x Height x Width x Dim, which is again the most efficient way of computing outputs.
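
For reference, a shape trace of the qkv projection line shown in the traceback above; `num_heads=8` is an assumed value, while the channel count and spatial size match the reported input:

```python
import torch
import torch.nn as nn

B, H, W, dim, num_heads = 1, 14, 14, 256, 8   # num_heads is assumed
head_dim = dim // num_heads

x = torch.randn(B, H, W, dim)                 # channels-last input
qkv = nn.Linear(dim, 3 * dim)

# (B, H, W, 3*dim) -> (B, H, W, 3, heads, head_dim) -> (3, B, heads, H, W, head_dim)
out = qkv(x).reshape(B, H, W, 3, num_heads, head_dim).permute(3, 0, 4, 1, 2, 5)
q, k, v = out.unbind(0)                       # each: (B, heads, H, W, head_dim)
print(q.shape)                                # torch.Size([1, 8, 14, 14, 32])
```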

To be clear, channels-last is usually the more efficient format, hence the movement towards channels-last integration since torch 1.11, although other factors also affect how much of a speedup you get when switching to channels-last (to name a few: cuDNN kernels being called instead of naive ATen kernels, and architecture-specific kernels from NVIDIA packages).
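
As a side note, PyTorch's channels-last support referenced here is a memory format rather than a change of logical shape; a small sketch of the difference (not NATTEN-specific):

```python
import torch
import torch.nn as nn

# torch.channels_last is a *memory format*: the tensor still reports shape
# (B, C, H, W), but the data is stored with channels last in memory, which
# lets cuDNN pick faster convolution kernels. This is separate from the
# explicit (B, H, W, C) layout the attention module expects.
conv = nn.Conv2d(256, 256, kernel_size=3, padding=1).to(memory_format=torch.channels_last)
x = torch.randn(1, 256, 14, 14).to(memory_format=torch.channels_last)
y = conv(x)
print(y.shape)                                             # still torch.Size([1, 256, 14, 14])
print(y.is_contiguous(memory_format=torch.channels_last))  # check whether the layout survived the conv
```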

But thank you for bringing this to our attention; the shape requirements should be included in our documentation, and we will update it in future versions.

alihassanijr added the documentation label on Aug 7, 2022
alihassanijr commented

Closing this due to inactivity. If you still have questions feel free to open it back up.
