Describe the bug
I suspect that if a kernel has multiple arrays created by @cuDynamicSharedMem, CUDA.jl treats them as the same place in memory. By contrast, @cuStaticSharedMem creates two distinct arrays.
To reproduce
The Minimal Working Example (MWE) for this bug:
using CUDA, Logging
function body(arr, my_shmem1, my_shmem2, i)
    my_shmem1[i] = arr[i] + 1
    my_shmem2[i] = arr[i]
    sync_threads()
    arr[i] = my_shmem1[i] - my_shmem2[i]
end

function static(arr::CuDeviceArray{T}) where {T}
    i = threadIdx().x
    my_shmem1 = @cuStaticSharedMem(T, 1)
    my_shmem2 = @cuStaticSharedMem(T, 1)
    body(arr, my_shmem1, my_shmem2, i)
    return nothing
end

function dynamic(arr::CuDeviceArray{T}) where {T}
    i = threadIdx().x
    my_shmem1 = @cuDynamicSharedMem(T, 1)
    my_shmem2 = @cuDynamicSharedMem(T, 1)
    body(arr, my_shmem1, my_shmem2, i)
    return nothing
end
N = 1
T = Int32
a = ones(T, 1)
b = CuArray(a)
c = CuArray(a)
@cuda threads=N static(b)
synchronize()
@cuda threads=N shmem=sizeof(T) * 2 * N dynamic(c)
synchronize()
@info "b = $b"
@info "c = $c"
Output:
[ Info: b = Int32[1]
[ Info: c = Int32[0]
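(This output is consistent with the two dynamic views aliasing: the store to my_shmem2 overwrites my_shmem1, so the dynamic kernel computes arr[i] = 1 - 1 = 0, while the static kernel, with two distinct buffers, computes 2 - 1 = 1.)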
Manifest.toml
(@v1.5) pkg> status
Status `C:\Users\ellis\.julia\environments\v1.5\Project.toml`
[537997a7] AbstractPlotting v0.13.4
[79e6a3ab] Adapt v2.3.0
[c52e3926] Atom v0.12.25
[6e4b80f9] BenchmarkTools v0.5.0
[052768ef] CUDA v2.1.0
[864edb3b] DataStructures v0.18.8
[7a1cc6ca] FFTW v1.2.4
[1a297f60] FillArrays v0.10.0
[e9467ef8] GLMakie v0.1.14
[0c68f7d7] GPUArrays v6.1.1
[e5e0dc1b] Juno v0.8.4
Expected behavior
These two simple kernels differ only in the type of shared memory used, so I expected the output to be identical. My assumption was that when launching a kernel with dynamic shmem, I only needed to specify the total amount of shared memory.
Version info
Details on Julia:
julia> versioninfo()
Julia Version 1.5.2
Commit 539f3ce943 (2020-09-23 23:17 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-9.0.1 (ORCJIT, skylake)
Environment:
JULIA_EDITOR = "C:\Users\ellis\AppData\Local\atom\app-1.53.0\atom.exe" -a
JULIA_NUM_THREADS = 4
Details on CUDA:

Additional context
#431

Correct, this is by design. That's why there's an offset argument to the macro. I don't think it would be better if the order of @cuDynamicSharedMem invocations determined the offset, as that would be very fragile. AFAIK, CUDA C behaves the same.
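For illustration, here is a minimal sketch of the dynamic kernel from the MWE above rewritten to use that offset argument (dynamic_fixed is a hypothetical name; the offset is given in bytes, and body is the helper defined in the MWE):

using CUDA

function dynamic_fixed(arr::CuDeviceArray{T}) where {T}
    i = threadIdx().x
    # Both macros return views into the single dynamic shared memory buffer,
    # so the second view must be offset (in bytes) past the first:
    my_shmem1 = @cuDynamicSharedMem(T, 1)             # occupies bytes 0 .. sizeof(T)-1
    my_shmem2 = @cuDynamicSharedMem(T, 1, sizeof(T))  # occupies the next sizeof(T) bytes
    body(arr, my_shmem1, my_shmem2, i)
    return nothing
end

c = CuArray(ones(Int32, 1))
@cuda threads=1 shmem=2 * sizeof(Int32) dynamic_fixed(c)
synchronize()
@info "c = $c"  # should now print Int32[1], matching the static kernel

Note that shmem= at launch still declares the total size; the per-array offsets are the kernel author's responsibility.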