
Multiple @cuDynamicSharedMem in kernel causes unexpected behavior #555

Closed

xaellison (Contributor) opened this issue on Nov 21, 2020 · 1 comment

Describe the bug

I suspect that if a kernel creates multiple arrays with @cuDynamicSharedMem, CUDA.jl treats them as the same location in memory. By contrast, @cuStaticSharedMem creates two distinct arrays.

To reproduce

The Minimal Working Example (MWE) for this bug:

using CUDA, Logging

function body(arr, my_shmem1, my_shmem2, i)
    my_shmem1[i] = arr[i] + 1
    my_shmem2[i] = arr[i]
    sync_threads()
    arr[i] = my_shmem1[i] - my_shmem2[i]
end

function static(arr :: CuDeviceArray{T}) where {T}
    i = threadIdx().x
    my_shmem1 = @cuStaticSharedMem(T, 1)
    my_shmem2 = @cuStaticSharedMem(T, 1)
    body(arr, my_shmem1, my_shmem2, i)
    return nothing
end

function dynamic(arr :: CuDeviceArray{T}) where {T}
    i = threadIdx().x
    my_shmem1 = @cuDynamicSharedMem(T, 1)
    my_shmem2 = @cuDynamicSharedMem(T, 1)
    body(arr, my_shmem1, my_shmem2, i)
    return nothing
end

N = 1
T = Int32
a = ones(T, 1)
b = CuArray(a)
c = CuArray(a)

@cuda threads=N static(b)
synchronize()

@cuda threads=N shmem=sizeof(T) * 2 * N dynamic(c)
synchronize()

@info "b = $b"
@info "c = $c"

Output:

[ Info: b = Int32[1]
[ Info: c = Int32[0]
Manifest.toml

(@v1.5) pkg> status
Status `C:\Users\ellis\.julia\environments\v1.5\Project.toml`
  [537997a7] AbstractPlotting v0.13.4
  [79e6a3ab] Adapt v2.3.0
  [c52e3926] Atom v0.12.25
  [6e4b80f9] BenchmarkTools v0.5.0
  [052768ef] CUDA v2.1.0
  [864edb3b] DataStructures v0.18.8
  [7a1cc6ca] FFTW v1.2.4
  [1a297f60] FillArrays v0.10.0
  [e9467ef8] GLMakie v0.1.14
  [0c68f7d7] GPUArrays v6.1.1
  [e5e0dc1b] Juno v0.8.4

Expected behavior
These two simple kernels differ only in the type of shared memory used, so I expected identical output. My assumption was that when launching a kernel with dynamic shared memory, I only needed to specify the total amount of shared memory.

Version info

Details on Julia:

julia> versioninfo()
Julia Version 1.5.2
Commit 539f3ce943 (2020-09-23 23:17 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake)
Environment:
  JULIA_EDITOR = "C:\Users\ellis\AppData\Local\atom\app-1.53.0\atom.exe"  -a
  JULIA_NUM_THREADS = 4

Details on CUDA:

CUDA toolkit 11.0.3, artifact installation
CUDA driver 11.0.0
NVIDIA driver 451.67.0

Libraries:
- CUBLAS: 11.2.0
- CURAND: 10.2.1
- CUFFT: 10.2.1
- CUSOLVER: 10.6.0
- CUSPARSE: 11.1.1
- CUPTI: 13.0.0
- NVML: 11.0.0+451.67
- CUDNN: 8.0.4 (for CUDA 11.0.0)
- CUTENSOR: 1.2.1 (for CUDA 11.0.0)

Toolchain:
- Julia: 1.5.2
- LLVM: 9.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4
- Device support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75

1 device:
  0: GeForce RTX 2070 (sm_75, 6.916 GiB / 8.000 GiB available)

Additional context
#431

xaellison added the bug label on Nov 21, 2020
maleadt (Member) commented on Nov 21, 2020

Correct, this is by design. That's why there's an offset argument to the macro. I don't think it would be better if the order of @cuDynamicSharedMem invocations determined the offset, as that would be very fragile. AFAIK, CUDA C behaves the same.
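A minimal sketch of the workaround described above, assuming the offset argument to @cuDynamicSharedMem is given in bytes: the second array is placed explicitly after the first, so the two views no longer alias. The kernel name dynamic_offset and the reuse of body and the MWE's launch parameters are illustrative, not part of the original report.

function dynamic_offset(arr :: CuDeviceArray{T}) where {T}
    i = threadIdx().x
    # first view starts at byte offset 0 of the dynamic shared memory block
    my_shmem1 = @cuDynamicSharedMem(T, 1)
    # second view starts right after the first (offset = size of my_shmem1 in bytes)
    my_shmem2 = @cuDynamicSharedMem(T, 1, sizeof(T) * 1)
    body(arr, my_shmem1, my_shmem2, i)
    return nothing
end

# The launch still has to request the total amount of dynamic shared memory:
# @cuda threads=N shmem=sizeof(T) * 2 * N dynamic_offset(c)

With this change, the dynamic kernel should produce the same result as the static one in the MWE.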

maleadt removed the bug label on Nov 21, 2020