@index(Global, NTuple) Giving incorrect behavior with CPU() backend

Hi all,
first off, great package!

I found an issue using `@index(Global, NTuple)` with the CPU backend. Below is a minimal working example.

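```julia
using Adapt
using CUDA
using KernelAbstractions

# Set every element of A to 1.0, one work-item per (i, j) pair.
@kernel function set_matrix!(A)
  i, j = @index(Global, NTuple)
  A[i, j] = 1.0
  return nothing
end

function run_kernel_cpu()
  A = zeros(10, 10)
  backend = CPU()
  kernel! = set_matrix!(backend)
  kernel!(A, ndrange=size(A))
  # Wait for the launch to finish before reading A.
  KernelAbstractions.synchronize(backend)
  return A
end

function run_kernel_gpu()
  # Move the host array to the GPU.
  A = adapt(CuArray, zeros(10, 10))
  backend = CUDABackend()
  kernel! = set_matrix!(backend)
  kernel!(A, ndrange=size(A))
  # Wait for the launch to finish before reading A.
  KernelAbstractions.synchronize(backend)
  return A
end

A_cpu = run_kernel_cpu()
display(A_cpu)

A_gpu = run_kernel_gpu()
display(A_gpu)
```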

Here is the output (CPU result first, then GPU):

```
10×10 Matrix{Float64}:
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
10×10 CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}:
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
```

As you can see, the GPU kernel behaves as expected, but the CPU kernel only writes the first row, so it seems to iterate over just one of the two indices. I checked this by printing the indices inside the kernel, and it looked like only 10 iterations ran rather than 100.
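The iteration count can also be checked without printing, by counting how many elements the kernel actually wrote. The following is a minimal CPU-only sketch of that check. `set_matrix_linear!` and `count_written` are names introduced here for illustration; the linear-index variant is only a comparison point, and whether it avoids the problem on the affected version is an untested assumption.

```julia
using KernelAbstractions

# Same indexing as in the report: one work-item per (i, j) pair.
@kernel function set_matrix_ntuple!(A)
  i, j = @index(Global, NTuple)
  A[i, j] = 1.0
end

# Hypothetical comparison variant: one work-item per linear index.
@kernel function set_matrix_linear!(A)
  I = @index(Global, Linear)
  A[I] = 1.0
end

# Launch `kernel` on the CPU backend over a 10x10 matrix and
# return the number of elements that were set to 1.0.
# `ndrange_of` maps the array to the ndrange to use (`size` or `length`).
function count_written(kernel, ndrange_of)
  A = zeros(10, 10)
  backend = CPU()
  kernel(backend)(A, ndrange=ndrange_of(A))
  KernelAbstractions.synchronize(backend)
  return count(==(1.0), A)
end

n_ntuple = count_written(set_matrix_ntuple!, size)    # 2D ndrange
n_linear = count_written(set_matrix_linear!, length)  # 1D ndrange
println("NTuple: ", n_ntuple, " of 100 elements written")
println("Linear: ", n_linear, " of 100 elements written")
```

If both counts come out as 100, the launch covers the full range; a smaller count for the `NTuple` variant would match the behavior described above.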

Here's my current environment as well:

```
(@v1.10) pkg> status
Status `~/.julia/environments/v1.10/Project.toml`
  [79e6a3ab] Adapt v4.0.1
  [6e4b80f9] BenchmarkTools v1.4.0
  [052768ef] CUDA v5.2.0
  [63c18a36] KernelAbstractions v0.9.16
  [295af30f] Revise v3.5.14

```

Thanks!
