[BUG] cp.dot causes illegal memory access encountered #3284
Comments
Interesting thing to note here is that, if, right before the dot product we do |
I can reproduce this error. (I used 90,000,000 instead of 100,000,000 because I didn't have enough memory, but that shouldn't be essential.)
Relabelled as cat:bug since it was reproduced.
I tried running this on CUDA 11 and didn't get an error, though I did get an error on CUDA 10.2. It's strange...
The cause of the problem has been nearly identified. There seems to be a bug in the gemm implementation of cuBLAS in CUDA 10.2 or older. When at least one of the input matrices has more than 2 giga elements and that matrix is transposed in cuBLAS, the result becomes incorrect or a segmentation fault occurs. This bug is fixed in CUDA 11. You might work around the problem by transposing the matrices in CuPy before calling cuBLAS gemm, since the problem does not occur if the matrices are not transposed in cuBLAS. However, this will increase memory usage.
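A rough sketch of that workaround, for anyone stuck on CUDA 10.2 or older. Everything here is illustrative and not taken from CuPy itself: the helper names `may_be_affected` and `dot_with_explicit_transpose` are made up, the size limit reads "2 giga elements" as 2**31 (the exact threshold is not stated above), and it is only an assumption that a C-contiguous copy is the layout that keeps cuBLAS from transposing. The sketch takes the array module as a parameter so it can be tried with NumPy on a machine without a GPU.

```python
import math

import numpy as np

# Assumption: "2 giga elements" is read as 2**31; the thread gives no exact limit.
LARGE_ELEMENT_COUNT = 2**31
# CUDA runtime versions are encoded as integers, e.g. 10000 for 10.0, 11000 for 11.0.
CUDA_11 = 11000


def may_be_affected(runtime_version, *shapes):
    """Heuristic guard: old CUDA and at least one operand above the assumed limit."""
    if runtime_version >= CUDA_11:
        return False
    return any(math.prod(shape) > LARGE_ELEMENT_COUNT for shape in shapes)


def dot_with_explicit_transpose(a_t, b, xp=np):
    """Compute a_t.T @ b after copying the transpose into its own buffer.

    The copy costs as much extra memory as a_t itself. Whether a
    C-contiguous operand really avoids the transposed gemm path is an
    assumption about CuPy internals, not something confirmed in this thread.
    """
    a = xp.ascontiguousarray(a_t.T)  # materialize the transpose instead of passing a view
    return xp.dot(a, b)


# Small CPU-only demonstration with NumPy standing in for CuPy.
a_t = np.arange(6.0).reshape(3, 2)
b = np.ones((3, 4))
out = dot_with_explicit_transpose(a_t, b)
```

With CuPy installed, the same helper can be called with `xp=cupy`, and the runtime version for the guard can be read from `cupy.cuda.runtime.runtimeGetVersion()`.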
Let me close this as the issue is fixed in the latest CUDA.
Conditions (you can just paste the output of python -c 'import cupy; cupy.show_config()')
CuPy Version : 7.3.0
CUDA Root : /usr/local/cuda
CUDA Build Version : 10000
CUDA Driver Version : 10010
CUDA Runtime Version : 10000
cuBLAS Version : 10000
cuFFT Version : 10000
cuRAND Version : 10000
cuSOLVER Version : (10, 0, 0)
cuSPARSE Version : 10000
NVRTC Version : (10, 0)
cuDNN Build Version : 7605
cuDNN Version : 7600
NCCL Build Version : 2406
NCCL Runtime Version : 2604
Code to reproduce