Implement mapslices without scalar iteration #807

@yeruoforever

Description

When I call mapslices(f, a; dims) on a CuArray, a warning appears, reminding me that scalar operations on the GPU are inefficient.

using CUDA

a = CUDA.rand(3, 4, 5)
b = CUDA.rand(2, 3)
mapslices(a, dims=[1, 2]) do t
    b * t
end

To avoid the warning, I had to write additional code that performs the same operation slice by slice:

c = map(eachslice(a, dims=3)) do t
    b * t
end
cat(c..., dims=3)
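
For the specific case above, where the function is a left multiplication by a fixed matrix, the slicing can be avoided by folding the trailing dimensions into columns and multiplying once. The following is only a minimal sketch of that idea, not a general replacement for mapslices. It uses plain Arrays so it runs without a GPU; that the same reshape and multiply calls behave the same way on CuArray is an assumption here. The names batched and reference are illustrative.

```julia
# Sketch with plain Arrays; carrying this over to CuArray is an assumption.
a = rand(Float32, 3, 4, 5)   # five 3×4 slices stacked along dimension 3
b = rand(Float32, 2, 3)

# Fold dimensions 2 and 3 into columns, multiply once, then unfold again.
batched = reshape(b * reshape(a, size(a, 1), :),
                  size(b, 1), size(a, 2), size(a, 3))

# Reference result using the slice-by-slice workaround.
reference = cat((b * t for t in eachslice(a, dims=3))..., dims=3)
```

This works because each column of the reshaped matrix is one column of one slice, so a single matrix product covers all slices. It does not help when the function applied to each slice is not a plain matrix product.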

In neural networks and machine learning, mini-batches are often used. When a sample is not a vector or matrix, the model input has several dimensions plus a batch dimension, such as size(x) == (100, 3, 4, batch_size). In this case, mapslices() would be very convenient.

However, when the model and input are both CuArrays, mapslices() runs very inefficiently on the GPU because it performs too many scalar operations. Can the internals of mapslices() be optimized to make it more efficient?

Describe the solution you'd like

Is there a more elegant way to implement mapslices(f, a; dims) so that it uses vectorized operations instead of scalar operations?

Labels: hard (This is difficult.)