mul! performance regression for sparse matrices in v1.11.0-DEV.1035 #52429
Probably due to the excision of SparseArrays from the sysimage? But that might affect only compile time; if this is reproducible after compilation, it might be something else.
I believe it is due to a change in the algorithm for the function
See also #52137
The major refactoring (for significantly reduced load times of SparseArrays.jl) happened in v1.10; I'm not even sure we've had any major merged PRs in SparseArrays.jl in the v1.11 cycle. In that refactoring, the underlying multiplication algorithms remained untouched; it's only the dispatch that might have changed, and if it has, that was clearly not the intention.
Ok, I suspect it's 0cf2bf1 (#52038) that has led to the increase in runtime. The reason is that the case of sparse output for sparse-sparse multiplication has always been handled by the most generic method:

```julia
julia> @which LinearAlgebra.generic_matmatmul!(c, 'N', 'T', a, b, LinearAlgebra.MulAddMul(-0.001, 1.0))
generic_matmatmul!(C::AbstractVecOrMat, tA, tB, A::AbstractVecOrMat, B::AbstractVecOrMat, _add::LinearAlgebra.MulAddMul)
     @ LinearAlgebra /Applications/Julia-1.10.app/Contents/Resources/julia/share/julia/stdlib/v1.10/LinearAlgebra/src/matmul.jl:767
```

If that's the case, then this is different from #52137, where the exact same multiplication kernel has significantly reduced performance. EDIT: Well, that's exactly #52429 (comment).
Furthermore, I suspect that it has to do with the tiling algorithm. If I understand it correctly, it takes tiles of the input arrays and copies them into a temporary (dense) array; the intermediate steps are then much faster than accessing sparse arrays element by element. Maybe we need to bring back that tiling-based algorithm and restrict the current ones to strided arrays with generic/mixed eltypes? For those cases, it's been pretty much optimized. @chriselrod
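To make the idea above concrete, here is a minimal sketch of a tiled generic matmul. It is an illustration of the technique, not LinearAlgebra's actual removed implementation: the function name, tile size, and loop structure are my own. The point is that the tiles of A and B are copied once into dense scratch buffers, so the hot inner loop only does fast dense indexing even when A or B is sparse.

```julia
# Hypothetical sketch of a tiling-based matmul: copy tiles of A and B into
# dense scratch buffers, then run the inner kernel on those buffers. For
# sparse inputs this "locally densifies" them, avoiding element-by-element
# sparse indexing in the innermost loop.
function tiled_matmul!(C, A, B; tile::Int = 64)
    fill!(C, zero(eltype(C)))
    m, n = size(C)
    k = size(A, 2)
    Abuf = Matrix{eltype(A)}(undef, tile, tile)   # dense scratch tile of A
    Bbuf = Matrix{eltype(B)}(undef, tile, tile)   # dense scratch tile of B
    for jj in 1:tile:n, kk in 1:tile:k, ii in 1:tile:m
        is = ii:min(ii + tile - 1, m)
        js = jj:min(jj + tile - 1, n)
        ks = kk:min(kk + tile - 1, k)
        # one (possibly sparse) slice per tile, instead of one sparse
        # lookup per scalar multiply-add
        Abuf[1:length(is), 1:length(ks)] .= A[is, ks]
        Bbuf[1:length(ks), 1:length(js)] .= B[ks, js]
        for (jl, j) in enumerate(js), kl in 1:length(ks), (il, i) in enumerate(is)
            C[i, j] += Abuf[il, kl] * Bbuf[kl, jl]
        end
    end
    return C
end
```

The trade-off is exactly what the thread describes: the extra copies hurt strided arrays with generic/mixed eltypes, but they pay off when indexing the source arrays is expensive, as for sparse matrices.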
I bisected this to see where the minimum time for the example @dmbates provided exceeded 5 ms on my MacBook. (I get similar times to @dmbates on my laptop: the minimum time is ~0.5 ms on 1.10 and >>5 ms on nightly.)
Perhaps we should move the tiling-based algorithm to SparseArrays.jl? It performed worse than what we have now for strided arrays with generic and/or mixed eltypes, so the real benefit of it seems to be in locally densifying sparse arrays.
We have noticed that one of the tests in MixedModels.jl, called `grouseticks`, has had a dramatic increase in execution time in recent DEV versions (locally compiled on a Mac M1, but the same performance regression is seen on Intel-based computers). A big part of the problem seems to be a performance regression in `mul!` when C is a sparse matrix, A is a sparse matrix, and B is the transpose of a sparse matrix. A MWE is:

Under 1.10.0-rc2 the benchmark on my M1 laptop runs in around 0.5 ms, and under 1.11.0-DEV.1035 it takes about 8 ms.