
linalg.misc.mult_matvec potential bug for Fortran matrix-vector element-wise multiplication #316

Open
yumais2 opened this issue Apr 5, 2021 · 1 comment


yumais2 commented Apr 5, 2021

Problem

Potential bug in the CUDA code path for Fortran-ordered arrays: when the input matrix is a Fortran array and axis == 0, the kernel appears to use the wrong indexing. Here is our working version, please confirm.

        if axis == 0:
            row_kernel(x_gpu, a_gpu, out, n, m,
                       block=block, grid=grid, stream=stream,
                       shared=24*x_gpu.dtype.itemsize)
    __global__ void opRowVecToMat(const ${type}* mat, const ${type}* vec, ${type}* out,
                                  const int n, const int m){
        const int tx = threadIdx.x;
        const int ty = threadIdx.y;
        const int tidx = blockIdx.x * blockDim.x + threadIdx.x;  // row index (0..n-1)
        const int tidy = blockIdx.y * blockDim.y + threadIdx.y;  // column index (0..m-1)

        extern __shared__ ${type} shared_vec[];

        // One row of threads per block loads the vector tile into shared memory.
        if ((ty == 0) && (tidx < n))
            shared_vec[tx] = vec[tidx];
        __syncthreads();

        // Column-major (Fortran) layout: element (tidx, tidy) lives at tidy*n + tidx.
        if ((tidy < m) && (tidx < n)) {
            out[tidy*n+tidx] = mat[tidy*n+tidx] ${binary_op} shared_vec[tx];
        }
    }
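The indexing in the kernel above can be sanity-checked on the host with NumPy. This is a sketch, not skcuda's actual code; the shapes are our assumption: `mat` is an n-by-m Fortran-ordered matrix and `vec` has length n, so the vector is broadcast down each column.

```python
import numpy as np

n, m = 4, 3
rng = np.random.default_rng(0)
mat = np.asfortranarray(rng.standard_normal((n, m)))
vec = rng.standard_normal(n)

# Flat column-major buffer, as the kernel sees it.
flat = mat.ravel(order="F")
out = np.empty_like(flat)

# Emulate the proposed axis == 0 kernel: tidx indexes rows (0..n-1),
# tidy indexes columns (0..m-1); element (tidx, tidy) lives at
# tidy*n + tidx in column-major storage, and is combined with vec[tidx].
for tidy in range(m):
    for tidx in range(n):
        out[tidy * n + tidx] = flat[tidy * n + tidx] * vec[tidx]

# Matches NumPy broadcasting of vec down each column.
expected = mat * vec[:, None]
assert np.allclose(out.reshape((n, m), order="F"), expected)
```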

We haven't verified it, but we believe the axis == 1 case for Fortran arrays is also buggy. The following is what we think the code should look like.

        elif axis == 1:
            col_kernel(x_gpu, a_gpu, out, n, m,
                       block=block, grid=grid, stream=stream,
                       shared=24*x_gpu.dtype.itemsize)

    __global__ void opColVecToMat(const ${type} *mat, const ${type} *vec, ${type} *out,
                                  const int n, const int m){
        const int tx = threadIdx.x;
        const int ty = threadIdx.y;
        const int tidx = blockIdx.x * blockDim.x + threadIdx.x;  // row index (0..n-1)
        const int tidy = blockIdx.y * blockDim.y + threadIdx.y;  // column index (0..m-1)

        extern __shared__ ${type} shared_vec[];

        // One column of threads per block loads the vector tile into shared memory.
        if ((tx == 0) && (tidy < m))
            shared_vec[ty] = vec[tidy];
        __syncthreads();

        // Column-major (Fortran) layout: column tidy is combined with vec[tidy].
        if ((tidy < m) && (tidx < n)) {
            out[tidy*n+tidx] = mat[tidy*n+tidx] ${binary_op} shared_vec[ty];
        }
    }
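The axis == 1 indexing can be checked the same way. Again a sketch under our assumptions: `mat` is an n-by-m Fortran-ordered matrix and `vec` has length m, so the vector is broadcast across each row.

```python
import numpy as np

n, m = 4, 3
rng = np.random.default_rng(1)
mat = np.asfortranarray(rng.standard_normal((n, m)))
vec = rng.standard_normal(m)

# Flat column-major buffer, as the kernel sees it.
flat = mat.ravel(order="F")
out = np.empty_like(flat)

# Emulate the proposed axis == 1 kernel: shared_vec[ty] holds vec[tidy],
# so every element of column tidy is combined with vec[tidy].
for tidy in range(m):
    for tidx in range(n):
        out[tidy * n + tidx] = flat[tidy * n + tidx] * vec[tidy]

# Matches NumPy broadcasting of vec across each row.
expected = mat * vec[None, :]
assert np.allclose(out.reshape((n, m), order="F"), expected)
```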

Environment

CUDA - 11.0.0
skcuda - 0.5.2
pycuda - 2020.1
python - 3.7.4
Linux - Ubuntu 18.04

@yumais2 yumais2 changed the title linalg.misc.mult_matvec linalg.misc.mult_matvec potential bug for Fortran matrix-vector element-wise multiplication Apr 5, 2021

yuanz4 commented Apr 5, 2021

Same problem
