@index(Global, NTuple) Giving incorrect behavior with CPU() backend #461

@cmhamel

Hi all,
first off, great package!

I found an issue when using @index(Global, NTuple) with the CPU backend. Below is a minimal working example:

using Adapt
using CUDA
using KernelAbstractions

@kernel function set_matrix!(A)
  i, j = @index(Global, NTuple)
  A[i, j] = 1.0
  return nothing
end

function run_kernel_cpu()
  A = zeros(10, 10)
  kernel! = set_matrix!(CPU())
  kernel!(A, ndrange=size(A))
  return A
end

function run_kernel_gpu()
  A = zeros(10, 10)
  A = Adapt.adapt_structure(CuArray, A)
  kernel! = set_matrix!(CUDABackend())
  kernel!(A, ndrange=size(A))
  return A
end

A_cpu = run_kernel_cpu()
display(A_cpu)

A_gpu = run_kernel_gpu()
display(A_gpu)

Here is the output (CPU result first, then GPU):

10×10 Matrix{Float64}:
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
10×10 CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}:
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0

As you can see, the GPU kernel behaves as expected, but the CPU kernel only fills the first row: the first index stays at 1 while the second index runs from 1 to 10. I checked this by printing the indices inside the kernel, and it looked like only 10 iterations of the kernel ran rather than the expected 100.
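
One possible explanation, offered as an assumption rather than a confirmed diagnosis: on the CPU backend each workgroup runs its work items in a loop, so the explicit `return nothing` in the kernel body may leave that loop after the first work item. That would match the 10 observed iterations (one per workgroup). The sketch below is a CPU-only variant of the example with the `return` removed and an explicit `KernelAbstractions.synchronize` added; the names `set_matrix_noreturn!` and `run_kernel_cpu_noreturn` are hypothetical and exist only for this comparison.

```julia
using KernelAbstractions

# Same kernel body as set_matrix!, but without the explicit `return nothing`.
# Assumption: the early `return` is what cuts the CPU work-item loop short.
@kernel function set_matrix_noreturn!(A)
  i, j = @index(Global, NTuple)
  A[i, j] = 1.0
end

function run_kernel_cpu_noreturn()
  A = zeros(10, 10)
  backend = CPU()
  kernel! = set_matrix_noreturn!(backend)
  kernel!(A, ndrange=size(A))
  # Wait for the kernel to finish before reading A on the host.
  KernelAbstractions.synchronize(backend)
  return A
end

A_check = run_kernel_cpu_noreturn()
# Report how many entries were written, to compare against the expected 100.
println(count(==(1.0), A_check), " of ", length(A_check), " entries set to 1.0")
```

If this variant sets all 100 entries while the original sets only 10, the `return` statement is the likely culprit; if both give the same result, the cause lies elsewhere.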

Here's my current environment as well:

(@v1.10) pkg> status
Status `~/.julia/environments/v1.10/Project.toml`
  [79e6a3ab] Adapt v4.0.1
  [6e4b80f9] BenchmarkTools v1.4.0
  [052768ef] CUDA v5.2.0
  [63c18a36] KernelAbstractions v0.9.16
  [295af30f] Revise v3.5.14

Thanks!

Labels: bug (Something isn't working)
