
cupy raw kernel cannot handle view of cupy ndarray #7267

Closed
kanglcn opened this issue Dec 28, 2022 · 5 comments

@kanglcn

kanglcn commented Dec 28, 2022

Description

When feeding a view of a CuPy ndarray into a raw kernel (for example, a slice of a big ndarray), the result looks like the kernel reads from the original big ndarray, not the slice.

To Reproduce

import cupy as cp
x = cp.arange(10, dtype=cp.complex64).reshape(2,5)

show = cp.RawKernel(r'''
#include <cuComplex.h>

extern "C" __global__
void show(const cuFloatComplex* x, const int N){
  int i = blockDim.x * blockIdx.x + threadIdx.x;
  
  if(i == 0 ){
      printf("%f\n",cuCrealf(x[N]));
  }
}
''', 'show')

When calling the kernel:

show((2,), (5,), (x,cp.int32(6)))
cp.cuda.runtime.deviceSynchronize()

It will print:

6.000000

But if a slice of x is fed:

x_slice = x[:,:4]
show((2,), (5,), (x_slice,cp.int32(6)))
cp.cuda.runtime.deviceSynchronize()

It also prints:

6.000000

which is not what is wanted (element 6 of the flattened slice is 7+0j).

However, if a copy is fed:

x_slice = x[:,:4].copy()
show((2,), (5,), (x_slice,cp.int32(6)))
cp.cuda.runtime.deviceSynchronize()

It prints:

7.000000

as expected.

Installation

Conda-Forge (conda install ...)

Environment

OS                           : Linux-3.10.0-1127.18.2.el7.x86_64-x86_64-with-glibc2.17
Python Version               : 3.9.13
CuPy Version                 : 11.2.0
CuPy Platform                : NVIDIA CUDA
NumPy Version                : 1.23.4
SciPy Version                : 1.9.3
Cython Build Version         : 0.29.32
Cython Runtime Version       : None
CUDA Root                    : /users/kangl/miniconda3/envs/rapids-22.10
nvcc PATH                    : /users/kangl/miniconda3/envs/rapids-22.10/bin/nvcc
CUDA Build Version           : 11020
CUDA Driver Version          : 11040
CUDA Runtime Version         : 11070
cuBLAS Version               : (available)
cuFFT Version                : 10702
cuRAND Version               : 10301
cuSOLVER Version             : (11, 4, 0)
cuSPARSE Version             : (available)
NVRTC Version                : (11, 7)
Thrust Version               : 101000
CUB Build Version            : 101000
Jitify Build Version         : 343be31
cuDNN Build Version          : None
cuDNN Version                : None
NCCL Build Version           : 21403
NCCL Runtime Version         : 21403
cuTENSOR Version             : None
cuSPARSELt Build Version     : None
Device 0 Name                : Tesla V100-SXM2-32GB
Device 0 Compute Capability  : 70
Device 0 PCI Bus ID          : 0000:15:00.0

Additional Information

No response

@kanglcn kanglcn added the cat:bug Bugs label Dec 28, 2022
@leofang
Member

leofang commented Dec 29, 2022

This is expected. When you only pass a raw pointer to a kernel, it's assumed that the array is contiguous in memory. If you want to handle views, you should also pass the array shape/strides and distribute the workload to each thread yourself. CuPy has the machinery for doing so, but I am not sure if it is considered public API. If it's a simple elementwise operation, it's better to just use ElementwiseKernel instead.

@kanglcn
Author

kanglcn commented Dec 30, 2022

Thank you @leofang for your comments! I couldn't find the public API in the CuPy documentation, so I will make a deep copy before feeding arrays to my RawKernel. I want each thread to work on a small patch of a big array, and I couldn't find a way to express that with ElementwiseKernel. If there is any smart way to achieve this, please let me know! I appreciate it!

@leofang
Member

leofang commented Dec 31, 2022

I want each thread work on a small patch of a big array.

It's unclear to me what you intended to do. If you can show us a version that you expect to work with contiguous arrays, we might be able to suggest alternative approaches.

@kanglcn
Author

kanglcn commented Jan 2, 2023

Thanks for your nice help! I finally found a way to implement it with an ElementwiseKernel using the raw keyword, so I don't need to bother you with this question anymore.

I wonder if there is an easier way to index a multidimensional array. For example, an element on the diagonal of an N*N matrix a can easily be indexed as a[i,i] in Python, but in the kernel I have to use a[i*N+i], which is less intuitive. I have tried the reduce_dims parameter of ElementwiseKernel() but it doesn't work. Thanks!

@leofang
Member

leofang commented Jan 2, 2023

Glad to know it works for you! AFAIK you need to use 1D indexing with ElementwiseKernel. It has to do with how the internal machinery is designed to handle array views. Moreover, 1D indexing should be familiar to most C/C++ programmers 🙂

@asi1024 asi1024 added issue-checked and removed cat:bug Bugs labels Jan 4, 2023
@kanglcn kanglcn closed this as completed Jan 5, 2023