Is this a duplicate?
Area
CUB
Is your feature request related to a problem? Please describe.
In #7277, @fbusato added a tensor copy algorithm. We occasionally get requests for other classes of tensor algorithms, such as reductions and cumulative sums along a dimension. Before opening implementation issues, we should identify which algorithms need `mdspan` overloads or tensor-aware APIs.
Describe the solution you'd like
As a first step, we should review CuPy (cc @leofang) and similar array libraries to see which generic kernels they had to reimplement to support multidimensional, strided, broadcasted, or axis-based input. This should help us distinguish cases covered by existing primitives from cases where users still need custom tensor kernels.
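To make the distinction concrete, here is a minimal host-side sketch of an axis-based sum over a strided 2D view. `View2D` and `sum_along_axis` are hypothetical names made up for illustration and are not CUB or CuPy APIs. The sketch assumes a plain sequential loop, not a GPU implementation. It only shows how the memory access pattern depends on the axis and strides: reducing a contiguous row-major view along its last axis reads contiguous segments, which is the shape a segmented reduction can already express, while reducing along axis 0 or over a transposed view reads strided elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2D view over flat storage (illustration only, not a CUB type).
struct View2D {
  std::size_t extent[2];  // number of elements along each axis
  std::size_t stride[2];  // element stride along each axis
};

// Sums `data` along `axis` (0 or 1) and returns one value per index of the
// other axis. Sequential reference loop; assumes all indices are in bounds.
std::vector<int> sum_along_axis(const std::vector<int>& data,
                                const View2D& v,
                                int axis) {
  assert(axis == 0 || axis == 1);
  const int kept = 1 - axis;  // the axis that survives the reduction
  std::vector<int> out(v.extent[kept], 0);
  for (std::size_t k = 0; k < v.extent[kept]; ++k) {
    for (std::size_t r = 0; r < v.extent[axis]; ++r) {
      // Consecutive r values are contiguous in memory only if
      // v.stride[axis] == 1; otherwise each output reads strided elements.
      out[k] += data[k * v.stride[kept] + r * v.stride[axis]];
    }
  }
  return out;
}
```

For a 2x3 row-major view (`extent = {2, 3}`, `stride = {3, 1}`) over `{1, 2, 3, 4, 5, 6}`, summing along axis 1 reads two contiguous runs of three elements, whereas summing along axis 0 reads elements three apart. The second pattern is the kind of case the survey should check against existing primitives.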
Describe alternatives you've considered
No response
Additional context
No response