Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Illegal memory access during complex exponential with large imaginary part as exponent #1085

Closed
wsshin opened this issue Aug 5, 2021 · 2 comments · Fixed by #1086
Closed
Labels
bug Something isn't working cuda kernels Stuff about writing CUDA kernels.

Comments

@wsshin
Copy link

wsshin commented Aug 5, 2021

I have a CuArray of complex numbers with large imaginary parts (geater than 1e7). If I take the element-wise complex exponential of this CuArray, I get an illegal memory access error.

Here is an example:

julia> VERSION
v"1.6.2-pre.0"

(@v1.6) pkg> st CUDA
      Status `~/.julia/environments/v1.6/Project.toml`
  [052768ef] CUDA v3.3.4

julia> using CUDA

julia> dev = device()
CuDevice(0): Tesla V100-PCIE-32GB

julia> capability(dev)
v"7.0.0"

julia> v = CuVector{ComplexF64}(undef, 1);  # length-1 vector

julia> v .= 1e6im
1-element CuArray{ComplexF64, 1}:
 0.0 + 1.0e6im

julia> exp.(v)  # works fine
1-element CuArray{ComplexF64, 1}:
 0.9367521275331447 - 0.34999350217129294im

julia> v .= 1e7im
1-element CuArray{ComplexF64, 1}:
 0.0 + 1.0e7im

julia> exp.(v);
ERROR: CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)WARNING: Error while freeing CuPtr{Nothing}(0x0000000402000400):
CUDA.CuError(code=CUDA.cudaError_enum(0x000002bc), meta=nothing)

The error doesn't seem device-specific. I tested the same example on an older GPU with device capability of v3.7.0, and got the same error.

Here is the stack trace of the above example:

Stacktrace:
  [1] throw_api_error(res::CUDA.cudaError_enum)
    @ CUDA ~/.julia/packages/CUDA/lwSps/lib/cudadrv/error.jl:105
  [2] macro expansion
    @ ~/.julia/packages/CUDA/lwSps/lib/cudadrv/error.jl:115 [inlined]
  [3] cuMemFreeAsync(dptr::CUDA.Mem.DeviceBuffer, hStream::CuStream)
    @ CUDA ~/.julia/packages/CUDA/lwSps/lib/utils/call.jl:26
  [4] #free#6
    @ ~/.julia/packages/CUDA/lwSps/lib/cudadrv/memory.jl:103 [inlined]
  [5] macro expansion
    @ ~/.julia/packages/CUDA/lwSps/src/pool.jl:147 [inlined]
  [6] macro expansion
    @ ./timing.jl:287 [inlined]
  [7] actual_free(block::CUDA.PoolUtils.Block; stream_ordered::Bool, stream::CuStream)
    @ CUDA ~/.julia/packages/CUDA/lwSps/src/pool.jl:146
  [8] #free#158
    @ ~/.julia/packages/CUDA/lwSps/src/pool/none.jl:27 [inlined]
  [9] macro expansion
    @ ~/.julia/packages/CUDA/lwSps/src/pool.jl:361 [inlined]
 [10] macro expansion
    @ ./timing.jl:287 [inlined]
 [11] #free#177
    @ ~/.julia/packages/CUDA/lwSps/src/pool.jl:360 [inlined]
 [12] macro expansion
    @ ~/.julia/packages/CUDA/lwSps/src/array.jl:62 [inlined]
 [13] macro expansion
    @ ~/.julia/packages/CUDA/lwSps/src/state.jl:175 [inlined]
 [14] unsafe_free!(xs::CuArray{ComplexF64, 1}, stream::CuStream)
    @ CUDA ~/.julia/packages/CUDA/lwSps/src/array.jl:61
 [15] unsafe_finalize!(xs::CuArray{ComplexF64, 1})
    @ CUDA ~/.julia/packages/CUDA/lwSps/src/array.jl:82
 [16] #invokelatest#2
    @ ./essentials.jl:708 [inlined]
 [17] invokelatest
    @ ./essentials.jl:706 [inlined]
 [18] show_backtrace(io::IOContext{Base.TTY}, t::Vector{Base.StackTraces.StackFrame})
    @ Base ./errorshow.jl:776
 [19] showerror(io::IOContext{Base.TTY}, ex::CuError, bt::Vector{Base.StackTraces.StackFrame}; backtrace::Bool)
    @ Base ./errorshow.jl:90
 [20] show_exception_stack(io::IOContext{Base.TTY}, stack::Vector{Any})
    @ Base ./errorshow.jl:877
 [21] display_error(io::IOContext{Base.TTY}, stack::Vector{Any})
    @ Base ./client.jl:104
 [22] #invokelatest#2
    @ ./essentials.jl:708 [inlined]
 [23] invokelatest
    @ ./essentials.jl:706 [inlined]
 [24] print_response(errio::IO, response::Any, show_value::Bool, have_color::Bool, specialdisplay::Union{Nothing, AbstractDisplay})
    @ REPL ~/pkg/julia/julia-1.6/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:242
 [25] (::REPL.var"#40#41"{REPL.LineEditREPL, Pair{Any, Bool}, Bool, Bool})(io::Any)
    @ REPL ~/pkg/julia/julia-1.6/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:231
 [26] with_repl_linfo(f::Any, repl::REPL.LineEditREPL)
    @ REPL ~/pkg/julia/julia-1.6/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:462
 [27] print_response(repl::REPL.AbstractREPL, response::Any, show_value::Bool, have_color::Bool)
    @ REPL ~/pkg/julia/julia-1.6/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:229
 [28] (::REPL.var"#do_respond#61"{Bool, Bool, REPL.var"#72#82"{REPL.LineEditREPL, REPL.REPLHistoryProvider}, REPL.LineEditREPL, REPL.LineEdit.Prompt})(s::REPL.LineEdit.MIState, buf::Any, ok::Bool)
    @ REPL ~/pkg/julia/julia-1.6/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:798
 [29] #invokelatest#2
    @ ./essentials.jl:708 [inlined]
 [30] invokelatest
    @ ./essentials.jl:706 [inlined]
 [31] run_interface(terminal::REPL.Terminals.TextTerminal, m::REPL.LineEdit.ModalInterface, s::REPL.LineEdit.MIState)
    @ REPL.LineEdit ~/pkg/julia/julia-1.6/usr/share/julia/stdlib/v1.6/REPL/src/LineEdit.jl:2441
 [32] run_frontend(repl::REPL.LineEditREPL, backend::REPL.REPLBackendRef)
    @ REPL ~/pkg/julia/julia-1.6/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:1126
 [33] (::REPL.var"#44#49"{REPL.LineEditREPL, REPL.REPLBackendRef})()
    @ REPL ./task.jl:411
WARNING: Error while freeing CuPtr{Nothing}(0x0000000402000200):
CUDA.CuError(code=CUDA.cudaError_enum(0x000002bc), meta=nothing)
@wsshin wsshin added the bug Something isn't working label Aug 5, 2021
@wsshin
Copy link
Author

wsshin commented Aug 5, 2021

Maybe related to #58. Defining

complex_exp(z) = exp(real(z)) * (cos(imag(z)) + im * sin(imag(z)))

and using complex_exp.(v) instead of exp.(v) in the above example seems to be working.

@maleadt
Copy link
Member

maleadt commented Aug 6, 2021

Running this under compute-sanitizer reveals:

========= COMPUTE-SANITIZER
========= Invalid __global__ read of size 8 bytes
=========     at 0xef0 in array.jl:835:julia_kernel_2816(CuDeviceArray<Complex<Float64>, (int)1, (int)1>)
=========     by thread (0,0,0) in block (0,0,0)
=========     Address 0x7f43e2e9a320 is out of bounds
=========     Device Frame:/home/tim/Julia/pkg/CUDA/wip2.jl:6:julia_kernel_2816(CuDeviceArray<Complex<Float64>, (int)1, (int)1>) [0x0]

So very cleary a CPU pointer. Looking at the IR:

;  @ special/rem_pio2.jl:168 within `paynehanek`
; ┌ @ array.jl:835 within `getindex`
   %10 = load i64*, i64** inttoptr (i64 140370921739040 to i64**), align 32
   %11 = getelementptr inbounds i64, i64* %10, i64 %8
   %12 = load i64, i64* %11, align 8

which comes from the Base sincos implementation. Overriding that one with an implementation from libdevice fixes the test here.

Ideally GPUCompiler should catch these loads, but they're pretty hard to spot (there may be legal inttoptr operations).

@maleadt maleadt added the cuda kernels Stuff about writing CUDA kernels. label Aug 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working cuda kernels Stuff about writing CUDA kernels.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants