mul! performance regression for sparse matrices in v1.11.0-DEV.1035 #52429

dmbates · 2023-12-06T20:30:16Z

We have noticed that one of the tests in MixedModels.jl has had a dramatic increase in execution time in recent DEV versions (locally compiled on an Mac M1 but the same performance regression is seen on Intel-based computers.) The test is called grouseticks.

A big part of the problem seems to be a performance regression in mul! when C is a sparse matrix, A is a sparse matrix and B is the transpose of a sparse matrix. A MWE is

using BenchmarkTools, LinearAlgebra, SparseArrays

Is = repeat(1:10; inner=200);
Js = repeat(1:100; inner=20);
Ks = repeat(1:1000; inner=2);
V = ones(2000);
A = sparse(Is, Ks, V);
B = sparse(Js, Ks, V);
C = sparse(Is, Js, V);

@benchmark mul!(c, a, b', -0.001, 1.0) seconds=1 setup=(a=A; b=B; c=C))

Under 1.10.0-rc2 the benchmark on my M1 laptop runs in around 0.5 ms and under 1.11.0-DEV.1035 it takes about 8 ms

The text was updated successfully, but these errors were encountered:

giordano · 2023-12-06T21:25:54Z

Probably due to the excision of SparseArrays from the sysimage? But that might affect only compile time, if this is reproducible after compilation it might be something else.

dmbates · 2023-12-07T20:55:59Z

I believe it is due to a change in algorithm for the function _generic_matmatmul! defined in LinearAlgebra/src/matmul.jl. When profiling a similar calculation the majority of the time under 1.11.0-DEV is spent in that function and the time spent in this function is much greater than that under 1.10.0-rc2 and earlier versions.

aravindh-krishnamoorthy · 2023-12-08T14:37:35Z

See also #52137

dkarrasch · 2023-12-08T17:35:51Z

The major refactoring (for significantly reduced load times of SparseArrays.jl) happened in v1.10, I'm not even sure we've had any major merged PRs in SparseArrays.jl in the v1.11 cycle. In that mentioned refactoring, the underlying multiplication algorithms have remained untouched, it's only the dispatch that might have changed, but if it has, that was clearly not the intention.

dkarrasch · 2023-12-08T18:01:03Z

Ok, I suspect it's 0cf2bf1 (#52038) that has led to the increase in runtime. The reason is that the case of sparse output for sparse-sparse multiplication has always been handled by the most generic LinearAlgebra._generic_matmatmul!. Specialized multiplication kernels exist only for strided output.

julia> @which LinearAlgebra.generic_matmatmul!(c, 'N', 'T', a, b, LinearAlgebra.MulAddMul(-0.001, 1.0))
generic_matmatmul!(C::AbstractVecOrMat, tA, tB, A::AbstractVecOrMat, B::AbstractVecOrMat, _add::LinearAlgebra.MulAddMul)
     @ LinearAlgebra /Applications/Julia-1.10.app/Contents/Resources/julia/share/julia/stdlib/v1.10/LinearAlgebra/src/matmul.jl:767

If that's the case, then this is different from #52137, where the exact same multiplication kernel has significantly reduced performance.

EDIT: Well, that's exactly #52429 (comment).

dkarrasch · 2023-12-08T18:23:09Z

Furthermore, I suspect that it has to do with the tiling algorithm. If I understand that one correctly, then it takes tiles of the arrays and copies them to some temporary (dense) array. The intermediate steps are then much faster than accessing sparse arrays element-by-element. Maybe we need to bring back that tiling-based algorithm and restrict the current ones to strided arrays with generic/mixed eltypes? For those cases, it's been pretty much optimized. @chriselrod

palday · 2023-12-08T18:40:27Z

I bisected this to see where the minimum time from example @dmbates provided exceeded 5ms on my MacBook. (I get similar times to @dmbates on my laptop -- minimum time is ~0.5ms on 1.10, and >>5ms on nightly).

0cf2bf143552c2f425150fcb2a60225c67ea042d is the first bad commit
commit 0cf2bf143552c2f425150fcb2a60225c67ea042d
Author: Daniel Karrasch <daniel.karrasch@posteo.de>
Date:   Tue Nov 14 22:58:03 2023 +0100

    Reduce compile time for generic matmatmul (#52038)
    
    This is another attempt at improving the compile time issue with generic
    matmatmul, hopefully improving runtime performance also.
    
    @chriselrod @jishnub
    
    There seems to be a little typo/oversight somewhere, but it shows how it
    could work. Locally, this reduces benchmark times from
    https://github.com/JuliaLang/julia/pull/51812#issuecomment-1780394475 by
    more than 50%.
    
    ---------
    
    Co-authored-by: Chris Elrod <elrodc@gmail.com>

dkarrasch · 2023-12-09T12:45:39Z

Perhaps we should move the tiling-based algorithm to SparseArrays.jl? It performed worse than what we have now for strided arrays with generic and/or mixed eltypes. So the real benefit of it seems to be in locally densifying sparse arrays.

giordano added performance Must go faster regression Regression in behavior compared to a previous version sparse Sparse arrays labels Dec 6, 2023

giordano added the linear algebra Linear algebra label Dec 7, 2023

aravindh-krishnamoorthy mentioned this issue Dec 8, 2023

Regression in sparse-dense multiplication #52137

Open

dkarrasch mentioned this issue Jan 9, 2024

Add generic matmatmul for inplace sparse x sparse JuliaSparse/SparseArrays.jl#486

Merged

dkarrasch closed this as completed in JuliaSparse/SparseArrays.jl#486 Feb 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

mul! performance regression for sparse matrices in v1.11.0-DEV.1035 #52429

mul! performance regression for sparse matrices in v1.11.0-DEV.1035 #52429

dmbates commented Dec 6, 2023

giordano commented Dec 6, 2023 •

edited

Loading

dmbates commented Dec 7, 2023

aravindh-krishnamoorthy commented Dec 8, 2023

dkarrasch commented Dec 8, 2023

dkarrasch commented Dec 8, 2023 •

edited

Loading

dkarrasch commented Dec 8, 2023

palday commented Dec 8, 2023

dkarrasch commented Dec 9, 2023

mul! performance regression for sparse matrices in v1.11.0-DEV.1035 #52429

mul! performance regression for sparse matrices in v1.11.0-DEV.1035 #52429

Comments

dmbates commented Dec 6, 2023

giordano commented Dec 6, 2023 • edited Loading

dmbates commented Dec 7, 2023

aravindh-krishnamoorthy commented Dec 8, 2023

dkarrasch commented Dec 8, 2023

dkarrasch commented Dec 8, 2023 • edited Loading

dkarrasch commented Dec 8, 2023

palday commented Dec 8, 2023

dkarrasch commented Dec 9, 2023

giordano commented Dec 6, 2023 •

edited

Loading

dkarrasch commented Dec 8, 2023 •

edited

Loading