This repository was archived by the owner on May 27, 2021. It is now read-only.

Description
The cause seems to be an added checkbounds call, if that even makes sense.
Repro:
using CUDAdrv, CUDAnative

@target ptx function kernel(arr::Ptr{Int32})
    temp = @cuStaticSharedMem(Int32, (2, 1))
    tx = Int(threadIdx().x)

    if tx == 1
        for i = 1:2
            # THIS BREAKS STUFF: checkbounds(temp, i)
            Base.pointerset(temp.ptr, 1, i, 8)
        end
    end
    sync_threads()

    Base.pointerset(arr, Base.pointerref(temp.ptr, tx, 8), tx, 8)

    return nothing
end

dev = CuDevice(0)
ctx = CuContext(dev)

d_arr = CuArray(Int32, (2, 1))
@cuda (1,2) kernel(d_arr.ptr)
println(Array(d_arr))

destroy(ctx)
Result without the checkbounds call: [1; 1]. With it: [1; 0].
cc @cfoket