When multiplying a dense matrix by a sparse vector, performance degrades far more than expected if the dense matrix is wrapped in an adjoint.
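using SparseArrays, BenchmarkTools
percent_nonzeros = 0.01  # fraction of stored entries, i.e. about 1%
W = rand(15000, 2);
x_sp = sprand(15000, percent_nonzeros);
Wt = copy(W')  # materialize the adjoint as a plain dense Matrix
@btime W'*x_sp  # lazy Adjoint wrapper
285.560 μs (6 allocations: 304 bytes)
@btime Wt*x_sp  # materialized copy
522.068 ns (1 allocation: 96 bytes)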
This seems to be related to #32195, but this case is a matrix-vector multiply, not a matrix-matrix multiply.
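For comparison, here is a minimal sketch of how `W' * x_sp` could be computed while touching only the stored entries of the sparse vector. The helper name `adjoint_mul_sparse` is hypothetical and not part of any library, and the sketch has not been benchmarked, so it only illustrates the amount of work that should be needed rather than a confirmed fix.

```julia
using SparseArrays, LinearAlgebra

# Hypothetical helper (not a library function): computes W' * x for a dense
# matrix W and a sparse vector x by visiting only the stored entries of x.
function adjoint_mul_sparse(W::AbstractMatrix, x::SparseVector)
    size(W, 1) == length(x) ||
        throw(DimensionMismatch("size(W, 1) must equal length(x)"))
    y = zeros(promote_type(eltype(W), eltype(x)), size(W, 2))
    inds, vals = findnz(x)  # indices and values of the stored entries
    for j in axes(W, 2)
        acc = zero(eltype(y))
        for k in eachindex(inds)
            # conj matches the adjoint; it is a no-op for real matrices
            acc += conj(W[inds[k], j]) * vals[k]
        end
        y[j] = acc
    end
    return y
end

W = rand(15000, 2)
x_sp = sprand(15000, 0.01)
y = adjoint_mul_sparse(W, x_sp)
```

The result should agree with `W' * x_sp` up to floating-point rounding, while the inner loop runs only over the stored entries of `x_sp`, for every column of `W`.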