
performance improvement for Symmetric/Hermitian of sparse times vector #30018

Merged
merged 32 commits into from
Jan 18, 2019
Conversation

KlausC
Contributor

@KlausC KlausC commented Nov 13, 2018

If A is a sparse matrix (Float64, 10_000x10_000, nnz=10^6) and B is a dense vector, Symmetric(A) * B currently takes 3 s to execute. This PR reduces the execution time to 1.8 ms, a factor of more than 1500!
For element type BigFloat the times are 19.8 s and 360 ms, still a factor of 55.
The same holds for all variants of Symmetric/Hermitian, :U/:L, and Real/Complex.

Benchmark values current situation:

julia> n = 10000
10000

julia> Random.seed!(0); A = sprandn(n, n, 0.01); nnz(A)
1000598

julia> B = randn(n);

julia> Asym = Symmetric(A); As = sparse(Asym);

julia> @benchmark $Asym * $B
BenchmarkTools.Trial: 
  memory estimate:  390.77 KiB
  allocs estimate:  10004
  --------------
  minimum time:     3.055 s (0.00% GC)
  median time:      3.082 s (0.00% GC)
  mean time:        3.082 s (0.00% GC)
  maximum time:     3.110 s (0.00% GC)
  --------------
  samples:          2
  evals/sample:     1

Benchmark with this PR included:

julia> @benchmark $Asym * $B
BenchmarkTools.Trial: 
  memory estimate:  78.20 KiB
  allocs estimate:  2
  --------------
  minimum time:     1.826 ms (0.00% GC)
  median time:      1.858 ms (0.00% GC)
  mean time:        1.890 ms (0.09% GC)
  maximum time:     2.566 ms (0.00% GC)
  --------------
  samples:          2633
  evals/sample:     1
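For context (not part of the PR text): the speedup comes from multiplying in O(nnz) time by visiting only the stored entries of one triangle and mirroring each off-diagonal entry, instead of falling back to the generic O(n²) AbstractArray matvec. A minimal sketch of such a kernel for the :U case, assuming sorted row indices (which SparseMatrixCSC guarantees); `symmul` is a hypothetical name, and this is an illustration, not the PR's actual implementation:

```julia
using LinearAlgebra, SparseArrays

# Sketch of an O(nnz) matvec for Symmetric(A, :U) with A::SparseMatrixCSC.
# Illustrative only; the real kernel must also cover :L and Hermitian.
function symmul(A::SparseMatrixCSC, x::AbstractVector)
    n = size(A, 1)
    y = zeros(promote_type(eltype(A), eltype(x)), n)
    rows, vals = rowvals(A), nonzeros(A)
    for j in 1:n
        for k in nzrange(A, j)
            i = rows[k]
            i > j && break               # rows are sorted: the rest is lower triangle
            v = vals[k]
            y[i] += v * x[j]             # upper-triangle (and diagonal) entry
            i < j && (y[j] += v * x[i])  # mirrored lower-triangle entry
        end
    end
    return y
end

A = sprandn(100, 100, 0.05); x = randn(100)
symmul(A, x) ≈ Symmetric(A, :U) * x      # should hold
```

Only the nonzero entries of one triangle are ever touched, which is why the cost scales with nnz(A) rather than n².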

@kshyatt added the labels performance (Must go faster), domain:linear algebra, domain:arrays:sparse on Nov 14, 2018
@KlausC
Contributor Author

KlausC commented Nov 23, 2018

The immense improvement factor greatly widens the range of matrix computations that are feasible on given hardware. In this case, I think "performance improvement" is a slight understatement; I wonder why I cannot attract more attention. (@StefanKarpinski, @andreasnoack).

@chriscoey

I'm with you! I wish there was a regular linear algebra triage call.

@andreasnoack
Member

Please see #22200. It's always a good idea to ask before implementing.

I wish there was a regular linear algebra triage call.

I'd be okay with trying that out. I think the main limiting factor is still reviews, though.

@KristofferC
Sponsor Member

FWIW, comparing the speed improvement against something that hits the abstractarray fallback is not terribly interesting since you can gain arbitrary speedup by changing the sparsity density. Comparing to the non-symmetric sparse case is perhaps more relevant.

@KlausC
Contributor Author

KlausC commented Nov 24, 2018

comparing the speed improvement against something that hits the abstractarray fallback is not terribly interesting

I don't think so, because currently there is no good way to get the job done.
IMO the question is how to get Asym * b done fast when Asym isa Symmetric{T,<:SparseMatrixCSC} and length(Asym) is big while nnz(Asym.data) is small. Fast means O(nnz(Asym.data)), in contrast to O(A.n^2).
The most obvious approach, Asym * b, should not lead into the "abstractarray fallback" trap, because the naive user is not aware of it.
The smart user could try to find a work-around, but will have a hard time and no real success:

julia> @btime $Asym * $b;            # naive user is disappointed with the current situation
  3.016 s (2 allocations: 78.20 KiB)

julia> @btime $Asym * sparse($b);      # uses the SuiteSparse.CHOLMOD C-library
  24.539 ms (50 allocations: 31.80 MiB)

julia> @btime $(sparse(Asym)) * $b;    # only multiplication measured
  1.949 ms (2 allocations: 78.20 KiB)

julia> @btime sparse($Asym) * $b; # included time for sparse(Asym) (BTW: reducible to 10 ms !)
  2.809 s (37 allocations: 40.10 MiB)

after the PR is merged:

julia> @btime $Asym * $b;           # the most natural approach is convincing now
  1.835 ms (2 allocations: 78.20 KiB)

As a consequence, users are forced to avoid the Symmetric/Hermitian wrappers of sparse matrices if they want good performance. But that is not desirable, or even possible, in all cases.
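To make the "abstractarray fallback" trap concrete: the generic matvec reads every (i, j) pair through scalar indexing, and each scalar read on a sparse matrix is itself a search through the stored column. A hypothetical illustration of that cost model (not the actual fallback code):

```julia
using LinearAlgebra, SparseArrays

n = 4
Asym = Symmetric(sprandn(n, n, 0.5))
b = randn(n)

# Roughly what the generic fallback does: n^2 scalar getindex calls,
# each a binary search within a sparse column.
y = zeros(n)
for j in 1:n, i in 1:n
    y[i] += Asym[i, j] * b[j]
end
y ≈ Matrix(Asym) * b    # same result, but O(n^2) work regardless of nnz
```

For n = 10_000 that is 10^8 sparse lookups per multiply, which matches the seconds-long timings above.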

you can gain arbitrary speedup by changing the sparsity density

Of course that is true. But I don't think the example is unrealistic. Actually I looked for an example that clearly demonstrates the difference between O(nnz(A)) and O(n²).
In the SuiteSparse Matrix Collection, more than 50% of the symmetric examples fall into this category:

julia> using MatrixDepot   # v"0.7.0"
julia> length(mdlist(sp(:) & issymmetric))
1148
julia> length(mdlist(sp(:) & issymmetric & @pred(n >= 10^4 && nnz/n^2 <= 0.01)))
587

@KlausC
Contributor Author

KlausC commented Nov 24, 2018

Please see #22200. It's always a good idea to ask before implementing.

I looked at #22200 and it is a pity that I didn't see it before. I would not have started my own effort, though, if that PR had been merged to master before I was stopped in my own project by the missing linalg operation on wrapped sparse matrices. I do not know what prevented #22200 from being accepted. Now we have two implementations for Symmetric{T,<:SparseMatrixCSC} * StridedVector; we should select whichever is better and merge it into master immediately.

@andreasnoack andreasnoack self-requested a review December 12, 2018 18:21
@KlausC KlausC closed this Dec 16, 2018
@KlausC KlausC reopened this Dec 16, 2018
@ViralBShah
Member

ViralBShah commented Dec 17, 2018

@KlausC Please bump often to make sure we get this done, if you don't mind.

@ViralBShah
Member

@andreasnoack Do we have a preference for which of the two PRs to merge? Perhaps this one, being newer, may be less work?

@ViralBShah
Member

Bump

@KlausC
Contributor Author

KlausC commented Jan 10, 2019

@KlausC Please bump often to make sure we get this done, if you don't mind.

I will do...

@StefanKarpinski StefanKarpinski added this to the 1.2 milestone Jan 10, 2019
@KlausC
Contributor Author

KlausC commented Jan 18, 2019

humbly knocking at the door...

@ViralBShah
Member

This looks good to me.

@ViralBShah ViralBShah merged commit 30c6ee7 into JuliaLang:master Jan 18, 2019
@ViralBShah
Member

I guess we can backport this to 1.1.
