Error during CUDA test #1718

Closed
oneg1101 opened this issue Jan 6, 2023 · 3 comments
Labels: bug (Something isn't working), needs information (Further information is requested)

Comments

@oneg1101

oneg1101 commented Jan 6, 2023

Describe the bug

I encountered an error while running the CUDA.jl test suite. This is my first time trying to use CUDA. The failure appears to be in the cudadrv test suite (driver-related?). I have already tried updating my NVIDIA drivers.
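Since the failure looks isolated to cudadrv, one way to narrow it down (a sketch — I believe CUDA.jl's test harness interprets `test_args` as the names of suites to run, but that is an assumption on my part, and it needs a CUDA-capable GPU to actually execute) would be to re-run just that suite:

```julia
# Re-run only the cudadrv suite instead of the full ~27-minute test run.
# Assumes CUDA.jl's runtests.jl filters suites by positional test_args.
using Pkg
Pkg.test("CUDA"; test_args=["cudadrv"])
```

If the access violation reproduces in isolation, that would at least confirm the crash is not an interaction with the other suites running on parallel workers.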

To reproduce

julia> Pkg.test("CUDA")
     Testing CUDA
      Status `C:\Users\jrp29\AppData\Local\Temp\jl_F168Tb\Project.toml`
  [79e6a3ab] Adapt v3.4.0
⌅ [ab4f0b2a] BFloat16s v0.2.0
  [052768ef] CUDA v3.12.0
  [864edb3b] DataStructures v0.18.13
  [7a1cc6ca] FFTW v1.5.0
  [0c68f7d7] GPUArrays v8.5.0
  [a98d9a8b] Interpolations v0.14.7
  [872c559c] NNlib v0.8.13
  [276daf66] SpecialFunctions v2.1.7
  [a759f4b9] TimerOutputs v0.5.22
  [ade2ca70] Dates `@stdlib/Dates`
  [8ba89e20] Distributed `@stdlib/Distributed`
  [37e2e46d] LinearAlgebra `@stdlib/LinearAlgebra`
  [de0858da] Printf `@stdlib/Printf`
  [3fa0cd96] REPL `@stdlib/REPL`
  [9a3f8284] Random `@stdlib/Random`
  [2f01184e] SparseArrays `@stdlib/SparseArrays`
  [10745b16] Statistics `@stdlib/Statistics`
  [8dfed614] Test `@stdlib/Test`
      Status `C:\Users\jrp29\AppData\Local\Temp\jl_F168Tb\Manifest.toml`
  [621f4979] AbstractFFTs v1.2.1
  [79e6a3ab] Adapt v3.4.0
  [13072b0f] AxisAlgorithms v1.0.1
⌅ [ab4f0b2a] BFloat16s v0.2.0
  [fa961155] CEnum v0.4.2
  [052768ef] CUDA v3.12.0
  [d360d2e6] ChainRulesCore v1.15.6
  [9e997f8a] ChangesOfVariables v0.1.4
  [34da2185] Compat v4.5.0
  [864edb3b] DataStructures v0.18.13
  [ffbed154] DocStringExtensions v0.9.3
  [e2ba6199] ExprTools v0.1.8
  [7a1cc6ca] FFTW v1.5.0
  [0c68f7d7] GPUArrays v8.5.0
  [46192b85] GPUArraysCore v0.1.2
⌅ [61eb1bfa] GPUCompiler v0.16.7
  [a98d9a8b] Interpolations v0.14.7
  [3587e190] InverseFunctions v0.1.8
  [92d709cd] IrrationalConstants v0.1.1
  [692b3bcd] JLLWrappers v1.4.1
  [929cbde3] LLVM v4.14.1
  [2ab3a3ac] LogExpFunctions v0.3.19
  [872c559c] NNlib v0.8.13
  [6fe1bfb0] OffsetArrays v1.12.8
  [bac558e1] OrderedCollections v1.4.1
  [21216c6a] Preferences v1.3.0
  [74087812] Random123 v1.6.0
  [e6cf234a] RandomNumbers v1.5.3
  [c84ed2f1] Ratios v0.4.3
  [189a3867] Reexport v1.2.2
  [ae029012] Requires v1.3.0
  [276daf66] SpecialFunctions v2.1.7
  [90137ffa] StaticArrays v1.5.12
  [1e83bf80] StaticArraysCore v1.4.0
  [a759f4b9] TimerOutputs v0.5.22
  [efce3f68] WoodburyMatrices v0.5.5
  [f5851436] FFTW_jll v3.3.10+0
  [1d5cc7b8] IntelOpenMP_jll v2018.0.3+2
  [dad2f222] LLVMExtra_jll v0.0.16+0
  [856f044c] MKL_jll v2022.2.0+0
  [efe28fd5] OpenSpecFun_jll v0.5.5+0
  [0dad84c5] ArgTools v1.1.1 `@stdlib/ArgTools`
  [56f22d72] Artifacts `@stdlib/Artifacts`
  [2a0f44e3] Base64 `@stdlib/Base64`
  [ade2ca70] Dates `@stdlib/Dates`
  [8ba89e20] Distributed `@stdlib/Distributed`
  [f43a241f] Downloads v1.6.0 `@stdlib/Downloads`
  [7b1f6079] FileWatching `@stdlib/FileWatching`
  [b77e0a4c] InteractiveUtils `@stdlib/InteractiveUtils`
  [4af54fe1] LazyArtifacts `@stdlib/LazyArtifacts`
  [b27032c2] LibCURL v0.6.3 `@stdlib/LibCURL`
  [76f85450] LibGit2 `@stdlib/LibGit2`
  [8f399da3] Libdl `@stdlib/Libdl`
  [37e2e46d] LinearAlgebra `@stdlib/LinearAlgebra`
  [56ddb016] Logging `@stdlib/Logging`
  [d6f4376e] Markdown `@stdlib/Markdown`
  [a63ad114] Mmap `@stdlib/Mmap`
  [ca575930] NetworkOptions v1.2.0 `@stdlib/NetworkOptions`
  [44cfe95a] Pkg v1.8.0 `@stdlib/Pkg`
  [de0858da] Printf `@stdlib/Printf`
  [3fa0cd96] REPL `@stdlib/REPL`
  [9a3f8284] Random `@stdlib/Random`
  [ea8e919c] SHA v0.7.0 `@stdlib/SHA`
  [9e88b42a] Serialization `@stdlib/Serialization`
  [1a1011a3] SharedArrays `@stdlib/SharedArrays`
  [6462fe0b] Sockets `@stdlib/Sockets`
  [2f01184e] SparseArrays `@stdlib/SparseArrays`
  [10745b16] Statistics `@stdlib/Statistics`
  [fa267f1f] TOML v1.0.0 `@stdlib/TOML`
  [a4e569a6] Tar v1.10.1 `@stdlib/Tar`
  [8dfed614] Test `@stdlib/Test`
  [cf7118a7] UUIDs `@stdlib/UUIDs`
  [4ec0a83e] Unicode `@stdlib/Unicode`
  [e66e0078] CompilerSupportLibraries_jll v0.5.2+0 `@stdlib/CompilerSupportLibraries_jll`
  [deac9b47] LibCURL_jll v7.84.0+0 `@stdlib/LibCURL_jll`
  [29816b5a] LibSSH2_jll v1.10.2+0 `@stdlib/LibSSH2_jll`
  [c8ffd9c3] MbedTLS_jll v2.28.0+0 `@stdlib/MbedTLS_jll`
  [14a3606d] MozillaCACerts_jll v2022.2.1 `@stdlib/MozillaCACerts_jll`
  [4536629a] OpenBLAS_jll v0.3.20+0 `@stdlib/OpenBLAS_jll`
  [05823500] OpenLibm_jll v0.8.1+0 `@stdlib/OpenLibm_jll`
  [83775a58] Zlib_jll v1.2.12+3 `@stdlib/Zlib_jll`
  [8e850b90] libblastrampoline_jll v5.1.1+0 `@stdlib/libblastrampoline_jll`
  [8e850ede] nghttp2_jll v1.48.0+0 `@stdlib/nghttp2_jll`
  [3f19e933] p7zip_jll v17.4.0+0 `@stdlib/p7zip_jll`
        Info Packages marked with ⌅ have new versions available but compatibility constraints restrict them from upgrading.
     Testing Running tests...
┌ Info: System information:
│ CUDA toolkit 11.7, artifact installation
│ NVIDIA driver 527.56.0, for CUDA 12.0
│ CUDA driver 12.0
│ 
│ Libraries:
│ - CUBLAS: 11.10.1
│ - CURAND: 10.2.10
│ - CUFFT: 10.7.1
│ - CUSOLVER: 11.3.5
│ - CUSPARSE: 11.7.3
│ - CUPTI: 17.0.0
│ - NVML: 12.0.0+527.56
│ - CUDNN: 8.30.2 (for CUDA 11.5.0)
│ - CUTENSOR: 1.4.0 (for CUDA 11.5.0)
│ 
│ Toolchain:
│ - Julia: 1.8.3
│ - LLVM: 13.0.1
│ - PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2
│ - Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86
│ 
│ 1 device:
└   0: NVIDIA GeForce RTX 2070 (sm_75, 7.831 GiB / 8.000 GiB available)
[ Info: Testing using 1 device(s): 0. NVIDIA GeForce RTX 2070 (UUID 5ee48701-dc84-8bf7-cfd5-5bb1d1121c61)
                                                  |          | ---------------- GPU ---------------- | ---------------- CPU ---------------- |
Test                                     (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB) | GC (s) | GC % | Alloc (MB) | RSS (MB) |
initialization                                (2) |    10.15 |   0.00 |  0.0 |       0.00 |      N/A |   0.02 |  0.2 |     129.38 |   778.15 |
gpuarrays\indexing scalar                     (3) |    39.38 |   0.06 |  0.2 |       0.01 |      N/A |   1.27 |  3.2 |    4147.67 |   777.11 |
gpuarrays\math/power                          (2) |    92.15 |   0.00 |  0.0 |       0.01 |      N/A |   5.53 |  6.0 |   12866.37 |  1599.80 |
gpuarrays\linalg/mul!/vector-matrix           (3) |   107.46 |   0.01 |  0.0 |       0.02 |      N/A |   4.59 |  4.3 |   13275.79 |  1560.66 |
gpuarrays\interface                           (3) |     6.23 |   0.00 |  0.0 |       0.00 |      N/A |   0.24 |  3.9 |     734.11 |  1560.66 |
gpuarrays\indexing multidimensional           (2) |    58.72 |   0.00 |  0.0 |       1.21 |      N/A |   2.22 |  3.8 |    7226.44 |  1599.80 |
gpuarrays\reductions/reducedim!               (4) |   161.67 |   0.06 |  0.0 |       1.03 |      N/A |  10.26 |  6.3 |   23342.60 |  1257.55 |
gpuarrays\linalg                              (5) |   166.43 |   0.06 |  0.0 |      11.72 |      N/A |   6.46 |  3.9 |   20020.64 |  1930.75 |
gpuarrays\uniformscaling                      (4) |    15.42 |   0.00 |  0.0 |       0.01 |      N/A |   0.44 |  2.8 |    1501.55 |  1257.55 |
gpuarrays\reductions/any all count            (3) |    28.81 |   0.00 |  0.0 |       0.00 |      N/A |   2.19 |  7.6 |    4767.81 |  1560.66 |
gpuarrays\math/intrinsics                     (4) |     5.52 |   0.00 |  0.0 |       0.00 |      N/A |   0.19 |  3.5 |     655.97 |  1257.55 |
gpuarrays\statistics                          (4) |   121.70 |   0.00 |  0.0 |       1.51 |      N/A |   7.17 |  5.9 |   15499.38 |  2320.09 |
gpuarrays\linalg/mul!/matrix-matrix           (5) |   189.74 |   0.02 |  0.0 |       0.12 |      N/A |   6.55 |  3.5 |   21215.28 |  2285.42 |
gpuarrays\constructors                        (5) |    44.62 |   0.01 |  0.0 |       0.08 |      N/A |   1.43 |  3.2 |    4285.20 |  2285.42 |
gpuarrays\random                              (5) |    39.42 |   0.00 |  0.0 |       0.03 |      N/A |   1.55 |  3.9 |    4310.77 |  2285.42 |
gpuarrays\base                                (5) |    49.78 |   0.00 |  0.0 |       8.90 |      N/A |   2.34 |  4.7 |    6117.97 |  2285.42 |
gpuarrays\linalg/norm                         (3) |   329.82 |   0.01 |  0.0 |       0.02 |      N/A |  17.74 |  5.4 |   40543.80 |  3431.61 |
gpuarrays\reductions/minimum maximum extrema  (2) |   410.32 |   0.02 |  0.0 |       2.19 |      N/A |  23.69 |  5.8 |   56151.26 |  2744.42 |
gpuarrays\reductions/mapreduce                (4) |   300.86 |   0.01 |  0.0 |       1.81 |      N/A |  13.18 |  4.4 |   36667.97 |  2517.00 |
gpuarrays\reductions/== isequal               (5) |   121.41 |   0.01 |  0.0 |       1.07 |      N/A |   5.82 |  4.8 |   15456.03 |  2368.95 |
gpuarrays\reductions/reduce                   (4) |    34.61 |   0.01 |  0.0 |       1.21 |      N/A |   0.56 |  1.6 |    2033.86 |  2517.00 |
apiutils                                      (4) |     0.17 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |       0.86 |  2517.00 |
gpuarrays\reductions/mapreducedim!            (2) |   221.71 |   0.01 |  0.0 |       1.54 |      N/A |   9.60 |  4.3 |   25086.13 |  3138.73 |
broadcast                                     (2) |    35.57 |   0.00 |  0.0 |       0.00 |      N/A |   1.31 |  3.7 |    3659.87 |  3138.73 |
array                                         (4) |   190.64 |   0.11 |  0.1 |    1264.47 |      N/A |   7.40 |  3.9 |   19819.77 |  2754.28 |
codegen                                       (2) |    15.11 |   0.00 |  0.0 |       0.00 |      N/A |   0.50 |  3.3 |    1249.18 |  3138.73 |
      From worker 2:
      From worker 2:    Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
      From worker 2:    Exception: EXCEPTION_ACCESS_VIOLATION at 0x7ff9a584df70 -- unknown function (ip: 00007ff9a584df70)
      From worker 2:    in expression starting at C:\Users\jrp29\.julia\packages\CUDA\DfvRa\test\cudadrv.jl:680
      From worker 2:    unknown function (ip: 00007ff9a584df70)
      From worker 2:    Allocations: 2091531478 (Pool: 2089436083; Big: 2095395); GC: 750
cudadrv                                       (2) |         failed at 2023-01-05T19:31:33.374
Worker 2 terminated.
Unhandled Task ERROR: EOFError: read end of file
Stacktrace:
 [1] (::Base.var"#wait_locked#680")(s::Sockets.TCPSocket, buf::IOBuffer, nb::Int64)
   @ Base .\stream.jl:945
 [2] unsafe_read(s::Sockets.TCPSocket, p::Ptr{UInt8}, nb::UInt64)
   @ Base .\stream.jl:953
 [3] unsafe_read
   @ .\io.jl:759 [inlined]
 [4] unsafe_read(s::Sockets.TCPSocket, p::Base.RefValue{NTuple{4, Int64}}, n::Int64)
   @ Base .\io.jl:758
 [5] read!
   @ .\io.jl:760 [inlined]
 [6] deserialize_hdr_raw
   @ C:\Users\jrp29\AppData\Local\Programs\Julia-1.8.3\share\julia\stdlib\v1.8\Distributed\src\messages.jl:167 [inlined]
 [7] message_handler_loop(r_stream::Sockets.TCPSocket, w_stream::Sockets.TCPSocket, incoming::Bool)
   @ Distributed C:\Users\jrp29\AppData\Local\Programs\Julia-1.8.3\share\julia\stdlib\v1.8\Distributed\src\process_messages.jl:172
 [8] process_tcp_streams(r_stream::Sockets.TCPSocket, w_stream::Sockets.TCPSocket, incoming::Bool)
   @ Distributed C:\Users\jrp29\AppData\Local\Programs\Julia-1.8.3\share\julia\stdlib\v1.8\Distributed\src\process_messages.jl:133
 [9] (::Distributed.var"#103#104"{Sockets.TCPSocket, Sockets.TCPSocket, Bool})()
   @ Distributed .\task.jl:484
cufft                                         (6) |    64.01 |   0.08 |  0.1 |     233.38 |      N/A |   2.64 |  4.1 |    6051.06 |  1498.49 |
curand                                        (6) |     0.58 |   0.00 |  0.0 |       0.00 |      N/A |   0.01 |  1.6 |      33.68 |  1511.90 |
gpuarrays\reductions/sum prod                 (5) |   383.29 |   0.02 |  0.0 |       3.24 |      N/A |  17.25 |  4.5 |   45159.00 |  3107.14 |
gpuarrays\broadcasting                        (3) |   505.83 |   0.02 |  0.0 |       2.00 |      N/A |  18.35 |  3.6 |   49994.60 |  3648.41 |
cublas                                        (4) |   198.88 |   0.04 |  0.0 |      14.55 |      N/A |   7.01 |  3.5 |   18926.98 |  3062.09 |
cusparse                                      (6) |    95.93 |   0.04 |  0.0 |      10.81 |      N/A |   2.82 |  2.9 |    8785.07 |  1511.90 |
iterator                                      (6) |     2.64 |   0.00 |  0.0 |       1.93 |      N/A |   0.11 |  4.1 |     399.20 |  1511.90 |
      From worker 4:    WARNING: Method definition #6714#kernel(Any) in module Main at C:\Users\jrp29\.julia\packages\CUDA\DfvRa\test\execution.jl:315 overwritten at C:\Users\jrp29\.julia\packages\CUDA\DfvRa\test\execution.jl:323.
linalg                                        (6) |    44.97 |   0.00 |  0.0 |       9.03 |      N/A |   3.50 |  7.8 |    7202.81 |  1511.90 |
nvml                                          (6) |     0.39 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |      21.85 |  1511.90 |
nvtx                                          (6) |     0.26 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |      12.08 |  1511.90 |
pointer                                       (6) |     0.25 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |      10.68 |  1511.90 |
pool                                          (6) |     1.85 |   0.00 |  0.0 |       0.00 |      N/A |   0.47 | 25.4 |     200.49 |  1511.90 |
exceptions                                    (3) |    99.55 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |      24.56 |  3648.41 |
random                                        (6) |    37.02 |   0.00 |  0.0 |     256.58 |      N/A |   1.35 |  3.6 |    4134.91 |  1511.90 |
execution                                     (4) |   102.78 |   0.00 |  0.0 |       0.44 |      N/A |   3.37 |  3.3 |    9743.76 |  3062.09 |
threading                                     (4) |     3.66 |   0.00 |  0.1 |      10.94 |      N/A |   0.18 |  4.9 |     271.14 |  3301.07 |
utils                                         (4) |     1.66 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |      97.57 |  3301.07 |
cudnn\activation                              (4) |     2.47 |   0.00 |  0.0 |       0.00 |      N/A |   0.06 |  2.6 |     182.25 |  3301.07 |
cudnn\convolution                             (4) |     0.10 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |       6.02 |  3301.07 |
      From worker 4:    ┌ Warning: CUDA.CUDNN.cudnnDropoutSeed[] >= 0: dropout operations will be deterministic but 40x more expensive
      From worker 4:    └ @ CUDA.CUDNN C:\Users\jrp29\.julia\packages\CUDA\DfvRa\lib\cudnn\dropout.jl:40
cudnn\dropout                                 (4) |     1.69 |   0.00 |  0.0 |       0.86 |      N/A |   0.04 |  2.6 |      99.68 |  3301.07 |
cudnn\inplace                                 (4) |     1.06 |   0.00 |  0.0 |       0.01 |      N/A |   0.03 |  3.1 |      46.31 |  3301.07 |
examples                                      (5) |   164.63 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |      40.13 |  3107.14 |
cudnn\multiheadattn                           (4) |    17.55 |   0.00 |  0.0 |       0.15 |      N/A |   0.49 |  2.8 |    1615.85 |  3954.91 |
cudnn\optensor                                (4) |     1.72 |   0.00 |  0.0 |       0.00 |      N/A |   0.05 |  3.1 |     115.35 |  3954.91 |
cudnn\normalization                           (5) |    23.07 |   0.00 |  0.0 |       0.08 |      N/A |   1.09 |  4.7 |    1912.91 |  3193.63 |
cudnn\reduce                                  (5) |     2.91 |   0.00 |  0.0 |       0.02 |      N/A |   0.13 |  4.5 |     238.41 |  3193.63 |
cudnn\pooling                                 (4) |     8.27 |   0.00 |  0.0 |       0.06 |      N/A |   0.18 |  2.2 |     668.78 |  3954.91 |
cudnn\softmax                                 (4) |     1.71 |   0.00 |  0.0 |       0.01 |      N/A |   0.05 |  2.9 |     107.35 |  3954.91 |
cudnn\tensor                                  (4) |     1.22 |   0.00 |  0.0 |       0.00 |      N/A |   0.71 | 58.3 |      23.18 |  3954.91 |
texture                                       (6) |    70.68 |   0.00 |  0.0 |       0.09 |      N/A |   3.22 |  4.6 |    8461.58 |  1680.64 |
      From worker 4:    ┌ Warning: `cholesky(A::Union{StridedMatrix, RealHermSymComplexHerm{<:Real, <:StridedMatrix}}, ::Val{false}; check::Bool = true)` is deprecated, use `cholesky(A, NoPivot(); check)` instead.
      From worker 4:    │   caller = macro expansion at dense.jl:19 [inlined]
      From worker 4:    └ @ Core C:\Users\jrp29\.julia\packages\CUDA\DfvRa\test\cusolver\dense.jl:19
cudnn\rnn                                     (5) |    11.34 |   0.01 |  0.1 |     961.57 |      N/A |   0.36 |  3.2 |     898.44 |  3709.54 |
      From worker 4:    ┌ Warning: `cholesky(A::Union{StridedMatrix, RealHermSymComplexHerm{<:Real, <:StridedMatrix}}, ::Val{false}; check::Bool = true)` is deprecated, use `cholesky(A, NoPivot(); check)` instead.
      From worker 4:    │   caller = macro expansion at dense.jl:20 [inlined]
      From worker 4:    └ @ Core C:\Users\jrp29\.julia\packages\CUDA\DfvRa\test\cusolver\dense.jl:20
      From worker 4:    ┌ Warning: `cholesky(A::Union{StridedMatrix, RealHermSymComplexHerm{<:Real, <:StridedMatrix}}, ::Val{false}; check::Bool = true)` is deprecated, use `cholesky(A, NoPivot(); check)` instead.
      From worker 4:    │   caller = macro expansion at dense.jl:25 [inlined]
      From worker 4:    └ @ Core C:\Users\jrp29\.julia\packages\CUDA\DfvRa\test\cusolver\dense.jl:25
cusolver\multigpu                             (6) |    13.68 |   0.00 |  0.0 |     545.90 |      N/A |   0.27 |  2.0 |    1029.24 |  2233.47 |
cusolver\sparse                               (5) |    20.22 |   0.00 |  0.0 |       0.18 |      N/A |   0.53 |  2.6 |    1291.90 |  3709.54 |
cusparse\conversions                          (5) |    15.40 |   0.00 |  0.0 |       0.02 |      N/A |   0.52 |  3.4 |    1670.71 |  3709.54 |
cusparse\device                               (5) |     1.87 |   0.00 |  0.0 |       0.01 |      N/A |   0.02 |  0.9 |      25.50 |  3709.54 |
cusparse\generic                              (5) |     5.43 |   0.00 |  0.0 |       0.05 |      N/A |   0.10 |  1.8 |     441.58 |  3709.54 |
cusparse\broadcast                            (6) |    68.59 |   0.00 |  0.0 |       0.02 |      N/A |   2.53 |  3.7 |    8053.32 |  2233.47 |
cusparse\linalg                               (6) |     7.46 |   0.00 |  0.0 |       0.01 |      N/A |   0.19 |  2.6 |     695.37 |  2233.47 |
cutensor\base                                 (6) |     0.26 |   0.00 |  0.0 |       0.05 |      N/A |   0.00 |  0.0 |      10.05 |  2233.47 |
cusparse\interfaces                           (5) |    57.20 |   0.02 |  0.0 |       0.97 |      N/A |   1.64 |  2.9 |    4573.26 |  3709.54 |
cutensor\contractions                         (6) |    36.36 |   0.01 |  0.0 |    7553.94 |      N/A |   0.94 |  2.6 |    3329.24 |  2233.47 |
cutensor\elementwise_binary                   (5) |    29.73 |   0.00 |  0.0 |       6.11 |      N/A |   0.74 |  2.5 |    2902.17 |  3709.54 |
cutensor\permutations                         (5) |     3.36 |   0.00 |  0.0 |       1.36 |      N/A |   0.10 |  2.9 |     306.19 |  3709.54 |
cutensor\elementwise_trinary                  (6) |    35.80 |   0.00 |  0.0 |       2.71 |      N/A |   1.11 |  3.1 |    4181.00 |  2233.47 |
cutensor\reductions                           (5) |    20.08 |   0.01 |  0.0 |      21.36 |      N/A |   0.44 |  2.2 |    1646.22 |  3709.54 |
sorting                                       (3) |   258.09 |   0.01 |  0.0 |     543.84 |      N/A |  11.79 |  4.6 |   26205.65 |  6283.56 |
device\array                                  (6) |     9.17 |   0.00 |  0.0 |       0.00 |      N/A |   0.31 |  3.4 |    1149.78 |  2233.47 |
device\ldg                                    (3) |    14.39 |   0.00 |  0.0 |       0.00 |      N/A |   0.52 |  3.6 |    1593.06 |  6283.56 |
cusolver\dense                                (4) |   192.92 |   0.15 |  0.1 |    2491.33 |      N/A |   7.18 |  3.7 |   19188.23 |  3954.91 |
device\intrinsics                             (5) |    55.10 |   0.00 |  0.0 |       0.00 |      N/A |   1.78 |  3.2 |    6266.75 |  3709.54 |
device\random                                 (6) |    59.67 |   0.00 |  0.0 |       0.17 |      N/A |   1.77 |  3.0 |    6857.68 |  2233.47 |
device\intrinsics\memory                      (5) |    36.04 |   0.00 |  0.0 |       0.02 |      N/A |   1.15 |  3.2 |    3544.92 |  3709.54 |
device\intrinsics\output                      (6) |    37.00 |   0.00 |  0.0 |       0.00 |      N/A |   1.45 |  3.9 |    3935.17 |  2233.47 |
      From worker 6:    [cusparseDenseToCsc] THREAD_COUNT_X = 32, THREAD_COUNT_Y = 4 
      From worker 6:    [cusparseDenseToCsc] maxCtas = 288, threadsPerCta = 128, nbrCtas = 1
      From worker 6:    [cusparseDenseToCsc] call cusparseDense2CscCopySetBase_kernel
      From worker 6:    [cusparseDenseToCsc] cscColPtr = prefix sum(nnzPerCol)
      From worker 6:    1. nbrCtas=480, threads(32,4)
      From worker 6:    [cusparseDenseToCsc] THREAD_COUNT_X = 32, THREAD_COUNT_Y = 4
      From worker 6:    [cusparseDenseToCsc] maxCtas = 288, threadsPerCta = 128, nbrCtas = 1
      From worker 6:    [cusparseDenseToCsc] call cusparseDense2CscCopySetBase_kernel
      From worker 6:    [cusparseDenseToCsc] cscColPtr = prefix sum(nnzPerCol)
      From worker 6:    1. nbrCtas=480, threads(32,4)
      From worker 6:    [cusparseDenseToCsc] THREAD_COUNT_X = 32, THREAD_COUNT_Y = 4
      From worker 6:    [cusparseDenseToCsc] maxCtas = 288, threadsPerCta = 128, nbrCtas = 1
      From worker 6:    [cusparseDenseToCsc] call cusparseDense2CscCopySetBase_kernel
      From worker 6:    [cusparseDenseToCsc] cscColPtr = prefix sum(nnzPerCol)
      From worker 6:    1. nbrCtas=480, threads(32,4)
      From worker 6:    [cusparseDenseToCsc] THREAD_COUNT_X = 32, THREAD_COUNT_Y = 4
      From worker 6:    [cusparseDenseToCsc] maxCtas = 288, threadsPerCta = 128, nbrCtas = 1
      From worker 6:    [cusparseDenseToCsc] call cusparseDense2CscCopySetBase_kernel
      From worker 6:    [cusparseDenseToCsc] cscColPtr = prefix sum(nnzPerCol)
      From worker 6:    1. nbrCtas=480, threads(32,4)
      From worker 6:    [cusparseDenseToCsc] THREAD_COUNT_X = 32, THREAD_COUNT_Y = 4
      From worker 6:    [cusparseDenseToCsc] maxCtas = 288, threadsPerCta = 128, nbrCtas = 1
      From worker 6:    [cusparseDenseToCsc] call cusparseDense2CscCopySetBase_kernel
      From worker 6:    [cusparseDenseToCsc] cscColPtr = prefix sum(nnzPerCol)
      From worker 6:    1. nbrCtas=480, threads(32,4)
      From worker 6:    [cusparseDenseToCsc] THREAD_COUNT_X = 32, THREAD_COUNT_Y = 4
      From worker 6:    [cusparseDenseToCsc] maxCtas = 288, threadsPerCta = 128, nbrCtas = 1
      From worker 6:    [cusparseDenseToCsc] call cusparseDense2CscCopySetBase_kernel
      From worker 6:    [cusparseDenseToCsc] cscColPtr = prefix sum(nnzPerCol)
      From worker 6:    1. nbrCtas=480, threads(32,4)
device\intrinsics\math                        (4) |    88.32 |   0.00 |  0.0 |       0.00 |      N/A |   2.57 |  2.9 |    7939.25 |  3954.91 |
device\intrinsics\atomics                     (3) |   115.81 |   0.00 |  0.0 |       0.00 |      N/A |   3.72 |  3.2 |   11778.67 |  6283.56 |
device\intrinsics\wmma                        (5) |   133.16 |   0.01 |  0.0 |       0.63 |      N/A |   3.85 |  2.9 |   13508.93 |  3709.54 |
      From worker 5:    [cusparseDenseToCsc] THREAD_COUNT_X = 32, THREAD_COUNT_Y = 4 
      From worker 5:    [cusparseDenseToCsc] maxCtas = 288, threadsPerCta = 128, nbrCtas = 1
      From worker 5:    [cusparseDenseToCsc] call cusparseDense2CscCopySetBase_kernel
      From worker 5:    [cusparseDenseToCsc] cscColPtr = prefix sum(nnzPerCol)
      From worker 5:    1. nbrCtas=480, threads(32,4)
      From worker 5:    [cusparseDenseToCsc] THREAD_COUNT_X = 32, THREAD_COUNT_Y = 4
      From worker 5:    [cusparseDenseToCsc] maxCtas = 288, threadsPerCta = 128, nbrCtas = 1
      From worker 5:    [cusparseDenseToCsc] call cusparseDense2CscCopySetBase_kernel
      From worker 5:    [cusparseDenseToCsc] cscColPtr = prefix sum(nnzPerCol)
      From worker 5:    1. nbrCtas=480, threads(32,4)
Testing finished in 26 minutes, 49 seconds, 400 milliseconds
cudadrv: Error During Test at none:1
  Got exception outside of a @test
  ProcessExitedException(2)

Test Summary:                                  |  Pass  Error  Broken  Total  Time
  Overall                                      | 16159      1       4  16164
    initialization                             |    30                    30
    gpuarrays\indexing scalar                  |   476                   476
    gpuarrays\math/power                       |                        None
    gpuarrays\linalg/mul!/vector-matrix        |   168                   168
    gpuarrays\interface                        |     7                     7
    gpuarrays\indexing multidimensional        |    46                    46
    gpuarrays\reductions/reducedim!            |   192                   192
    gpuarrays\linalg                           |   231                   231
    gpuarrays\uniformscaling                   |    56                    56
    gpuarrays\reductions/any all count         |   101                   101
    gpuarrays\math/intrinsics                  |    12                    12
    gpuarrays\statistics                       |    84                    84
    gpuarrays\linalg/mul!/matrix-matrix        |   432                   432
    gpuarrays\constructors                     |   899                   899
    gpuarrays\random                           |    62                    62
    gpuarrays\base                             |    75                    75
    gpuarrays\linalg/norm                      |   696                   696
    gpuarrays\reductions/minimum maximum extrema |   666                   666
    gpuarrays\reductions/mapreduce             |   396                   396
    gpuarrays\reductions/== isequal            |   312                   312
    gpuarrays\reductions/reduce                |   264                   264
    apiutils                                   |     6                     6
    gpuarrays\reductions/mapreducedim!         |   312                   312      
    broadcast                                  |    22                    22
    array                                      |   354                   354
    codegen                                    |    10                    10
    cudadrv                                    |            1              1
    cufft                                      |   177                   177
    curand                                     |     1                     1
    gpuarrays\reductions/sum prod              |   862                   862
    gpuarrays\broadcasting                     |   390                   390
    cublas                                     |  2178                  2178
    cusparse                                   |   998                   998
    iterator                                   |    30                    30
    linalg                                     |    21                    21
    nvml                                       |    11                    11
    nvtx                                       |                        None
    pointer                                    |    35                    35
    pool                                       |    10                    10
    exceptions                                 |    17                    17
    random                                     |   117                   117
    execution                                  |    78                    78
    threading                                  |                        None
    utils                                      |    55                    55
    cudnn\activation                           |    43                    43
    cudnn\convolution                          |                        None
    cudnn\dropout                              |     7                     7
    cudnn\inplace                              |    12                    12
    examples                                   |     7                     7
    cudnn\multiheadattn                        |    69                    69
    cudnn\optensor                             |    43                    43
    cudnn\normalization                        |    32                    32
    cudnn\reduce                               |    51                    51
    cudnn\pooling                              |    48                    48
    cudnn\softmax                              |    16                    16
    cudnn\tensor                               |    10                    10
    texture                                    |    38              4     42
    cudnn\rnn                                  |    90                    90
    cusolver\multigpu                          |    30                    30
    cusolver\sparse                            |   112                   112
    cusparse\conversions                       |    24                    24
    cusparse\device                            |    10                    10
    cusparse\generic                           |    32                    32
    cusparse\broadcast                         |    65                    65
    cusparse\linalg                            |     8                     8
    cutensor\base                              |     8                     8
    cusparse\interfaces                        |   258                   258
    cutensor\contractions                      |   472                   472
    cutensor\elementwise_binary                |   220                   220
    cutensor\permutations                      |    40                    40      
    cutensor\elementwise_trinary               |   180                   180
    cutensor\reductions                        |   160                   160
    sorting                                    |   272                   272
    device\array                               |    32                    32
    device\ldg                                 |    22                    22
    cusolver\dense                             |  1912                  1912
    device\intrinsics                          |    38                    38
    device\random                              |   156                   156
    device\intrinsics\memory                   |    16                    16
    device\intrinsics\output                   |    40                    40
    device\intrinsics\math                     |   104                   104
    device\intrinsics\atomics                  |   147                   147
    device\intrinsics\wmma                     |   446                   446
    FAILURE

Error in testset cudadrv:
Error During Test at none:1
  Got exception outside of a @test
  ProcessExitedException(2)
ERROR: LoadError: Test run finished with errors
in expression starting at C:\Users\jrp29\.julia\packages\CUDA\DfvRa\test\runtests.jl:552
ERROR: Package CUDA errored during testing
Stacktrace:
  [1] pkgerror(msg::String)
    @ Pkg.Types C:\Users\jrp29\AppData\Local\Programs\Julia-1.8.3\share\julia\stdlib\v1.8\Pkg\src\Types.jl:67
  [2] test(ctx::Pkg.Types.Context, pkgs::Vector{Pkg.Types.PackageSpec}; coverage::Bool, julia_args::Cmd, test_args::Cmd, test_fn::Nothing, force_latest_compatible_version::Bool, allow_earlier_backwards_compatible_versions::Bool, allow_reresolve::Bool)
    @ Pkg.Operations C:\Users\jrp29\AppData\Local\Programs\Julia-1.8.3\share\julia\stdlib\v1.8\Pkg\src\Operations.jl:1813
  [3] test(ctx::Pkg.Types.Context, pkgs::Vector{Pkg.Types.PackageSpec}; coverage::Bool, test_fn::Nothing, julia_args::Cmd, test_args::Cmd, force_latest_compatible_version::Bool, allow_earlier_backwards_compatible_versions::Bool, allow_reresolve::Bool, kwargs::Base.Pairs{Symbol, Base.TTY, Tuple{Symbol}, NamedTuple{(:io,), Tuple{Base.TTY}}})
    @ Pkg.API C:\Users\jrp29\AppData\Local\Programs\Julia-1.8.3\share\julia\stdlib\v1.8\Pkg\src\API.jl:434
  [4] test(pkgs::Vector{Pkg.Types.PackageSpec}; io::Base.TTY, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Pkg.API C:\Users\jrp29\AppData\Local\Programs\Julia-1.8.3\share\julia\stdlib\v1.8\Pkg\src\API.jl:156
  [5] test(pkgs::Vector{Pkg.Types.PackageSpec})
    @ Pkg.API C:\Users\jrp29\AppData\Local\Programs\Julia-1.8.3\share\julia\stdlib\v1.8\Pkg\src\API.jl:145
  [6] #test#87
    @ C:\Users\jrp29\AppData\Local\Programs\Julia-1.8.3\share\julia\stdlib\v1.8\Pkg\src\API.jl:144 [inlined]
  [7] test
    @ C:\Users\jrp29\AppData\Local\Programs\Julia-1.8.3\share\julia\stdlib\v1.8\Pkg\src\API.jl:144 [inlined]
  [8] #test#86
    @ C:\Users\jrp29\AppData\Local\Programs\Julia-1.8.3\share\julia\stdlib\v1.8\Pkg\src\API.jl:143 [inlined]
  [9] test(pkg::String)
    @ Pkg.API C:\Users\jrp29\AppData\Local\Programs\Julia-1.8.3\share\julia\stdlib\v1.8\Pkg\src\API.jl:143
 [10] top-level scope
    @ REPL[3]:1
Manifest.toml

Omitted — the submission was too long with the full Manifest.toml contents included.

Expected behavior

I expected the tests to pass.

Version info

Details on Julia:

Julia Version 1.8.3
Commit 0434deb161 (2022-11-14 20:14 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 12 × Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, skylake)
  Threads: 4 on 12 virtual cores
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 4

Details on CUDA:

CUDA toolkit 11.7, artifact installation
NVIDIA driver 527.56.0, for CUDA 12.0
CUDA driver 12.0

Libraries:
- CUBLAS: 11.10.1
- CURAND: 10.2.10
- CUFFT: 10.7.1
- CUSOLVER: 11.3.5
- CUSPARSE: 11.7.3
- CUPTI: 17.0.0
- NVML: 12.0.0+527.56
- CUDNN: 8.30.2 (for CUDA 11.5.0)
- CUTENSOR: 1.4.0 (for CUDA 11.5.0)

Toolchain:
- Julia: 1.8.3
- LLVM: 13.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86

1 device:
  0: NVIDIA GeForce RTX 2070 (sm_75, 7.831 GiB / 8.000 GiB available)

Additional context

I'm still using Julia 1.8.3, because 1.8.4 breaks GLMakie unless you add Julia to the PATH when installing.

@oneg1101 oneg1101 added the bug Something isn't working label Jan 6, 2023
@maleadt

maleadt commented Jan 6, 2023

From worker 2:
From worker 2: Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
From worker 2: Exception: EXCEPTION_ACCESS_VIOLATION at 0x7ff9a584df70 -- unknown function (ip: 00007ff9a584df70)
From worker 2: in expression starting at C:\Users\jrp29\.julia\packages\CUDA\DfvRa\test\cudadrv.jl:680
From worker 2: unknown function (ip: 00007ff9a584df70)
From worker 2: Allocations: 2091531478 (Pool: 2089436083; Big: 2095395); GC: 750
cudadrv (2) | failed at 2023-01-05T19:31:33.374
Worker 2 terminated.

There's no actual backtrace in here, so I can't isolate the issue.

Try with #1715 though; it contains a (test) fix for CUDA 12.

@maleadt

maleadt commented Jan 6, 2023

I've tagged that version, so just try CUDA.jl 3.12.1.

@maleadt maleadt added the needs information Further information is requested label Jan 6, 2023
@oneg1101

oneg1101 commented Jan 7, 2023

Using CUDA.jl version 3.12.1 seems to have done the trick. The test passed successfully.

All I had to do was:

using Pkg
Pkg.rm("CUDA")
Pkg.add(Pkg.PackageSpec(name="CUDA", version="3.12.1"))
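For anyone landing here later, a self-contained sketch of the same fix. The `Pkg.pin` step is an optional addition (not part of the original fix) that keeps a later `Pkg.update()` from moving past the working release:

```julia
using Pkg

# Replace whatever CUDA.jl version is installed with the fixed release.
Pkg.rm("CUDA")
Pkg.add(Pkg.PackageSpec(name="CUDA", version="3.12.1"))

# Optional: pin it so the resolver holds this version on future updates.
Pkg.pin(Pkg.PackageSpec(name="CUDA", version="3.12.1"))

# Re-run the test suite to confirm the cudadrv failure is gone.
Pkg.test("CUDA")
```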

@maleadt maleadt closed this as completed Jan 7, 2023