Description
When I call mapslices(f, a, dims) on a CuArray, a warning appears. It tells me that scalar operations on the GPU are inefficient.
using CUDA

a = CUDA.rand(3, 4, 5)
b = CUDA.rand(2, 3)
# Multiply every 3×4 slice of `a` by `b`; this triggers the warning.
mapslices(a, dims=[1, 2]) do t
    b * t
end

I had to use additional code to perform the operation.
# Multiply each slice along dimension 3 separately, then concatenate.
c = map(eachslice(a, dims=3)) do t
    b * t
end
cat(c..., dims=3)

In neural networks and machine learning, mini-batches are often used. When a sample is not a vector or a matrix, the model input has several dimensions, for example size(x) == (100, 3, 4, batch_size). In this case, mapslices() seems very convenient.
However, when the model and the input are both CuArrays, mapslices() runs very inefficiently on the GPU because it performs too many scalar operations. Can the internals of mapslices() be optimized to make it more efficient?
Describe the solution you'd like
Is there a more elegant way to implement mapslices(f, a, dims) so that it uses vectorized operations instead of scalar operations?
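
For the specific case in this issue, where f is a left-multiplication by a fixed matrix, one possible vectorized formulation is to flatten all trailing dimensions, do a single matrix product, and reshape back. The sketch below is only an illustration: slicewise_mul is a hypothetical helper name, not part of Base or CUDA.jl, and it does not cover arbitrary f. It is written for plain Arrays so it runs without a GPU; the assumption is that the same code applies to CuArrays, since it relies only on reshape and *.

```julia
# Hypothetical helper (not part of Base or CUDA.jl): left-multiply every
# dims=(1, 2) slice of `a` by the matrix `b` using a single matrix product.
function slicewise_mul(b::AbstractMatrix, a::AbstractArray)
    m = size(a, 1)                 # length of the contracted dimension
    rest = size(a)[2:end]          # all remaining dimensions, e.g. (4, 5)
    flat = reshape(a, m, :)        # m × prod(rest); shares memory with `a`
    out = b * flat                 # one matrix-matrix product
    return reshape(out, size(b, 1), rest...)
end

# Same shapes as in the issue, on the CPU for illustration.
# Assumption: CUDA.rand could be substituted here to run on the GPU.
a = rand(Float32, 3, 4, 5)
b = rand(Float32, 2, 3)

c = slicewise_mul(b, a)

# Reference result using mapslices on the CPU arrays.
reference = mapslices(t -> b * t, a, dims=[1, 2])
```

Because the trailing dimensions are flattened together, the same helper also handles inputs such as size(x) == (100, 3, 4, batch_size) when b has 100 columns.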