
draft: symmetric sparse matrix support #22200

Closed
wants to merge 4 commits

Conversation

@dpo
Contributor

dpo commented Jun 3, 2017

There is background information for this, including benchmarks of a draft version of this code, at https://discourse.julialang.org/t/slow-sparse-matrix-vector-product-with-symmetric-matrices. The more general implementation in this PR, which follows the code for the generic sparse matrix-vector product, is a bit slower than what's reported in the thread.

I'd like to gather some feedback and, assuming this is of interest, I'll be happy to implement what's missing and do the same for Hermitian.
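For context, here is a minimal sketch of the intended usage (Julia 0.6-era API; the matrix below is made up for illustration):

# Sketch only: wrap a sparse matrix (no copy) so that products dispatch to
# the specialized symmetric kernels added in this PR.
n = 10_000
A = sprandn(n, n, 1e-3); A = A + A'   # symmetrize
S = Symmetric(A)                      # wrapper; the data is not copied
b = randn(n)
y = S * b                             # the product this PR accelerates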

@andreasnoack

@tkelman
Contributor

tkelman commented Jun 3, 2017

+1 for the direction. Symmetric{SparseMatrixCSC} has a missing-method problem that this makes a nice dent in.

@jebej
Contributor

jebej commented Jun 3, 2017

Nice! Is this faster than just using a regular sparse matrix that happens to be symmetric?

Also, would it make sense to store only the upper or lower triangle of A, instead of the full A?

@fredrikekre
Member

fredrikekre commented Jun 3, 2017

Also, would it make sense to store only the upper or lower triangle of A, instead of the full A?

Symmetric should just be a wrapper. It is up to the user to decide what to store (i.e., when creating the sparse matrix).

@jebej
Contributor

jebej commented Jun 3, 2017

Symmetric should just be a wrapper. It is up to the user to decide what to store (i.e., when creating the sparse matrix).

That's fair, but I am assuming that there are two reasons to have an optimized sparse symmetric implementation: storage and speed. I was curious about this because I use Hermitian sparse matrices, so an optimized type is something I am interested in. I made a quick benchmark comparing regular sparse matrix multiplication with this one, and with one where the stored data comprises only the upper triangle.

The code is there, and here are the results for a density of 0.32 and square matrices of different sizes:

SymmetricSparseTests.runtests(5)
Normal sparse: Trial(146.313 ns)
Symmetric sparse: Trial(164.220 ns)
Symmetric sparse (triu only): Trial(161.990 ns)
Symmetric sparse (triu only) optimized: Trial(152.664 ns)

SymmetricSparseTests.runtests(10)
Normal sparse: Trial(613.115 ns)
Symmetric sparse: Trial(632.982 ns)
Symmetric sparse (triu only): Trial(605.547 ns)
Symmetric sparse (triu only) optimized: Trial(562.321 ns)

SymmetricSparseTests.runtests(25)
Normal sparse: Trial(5.992 μs)
Symmetric sparse: Trial(6.255 μs)
Symmetric sparse (triu only): Trial(6.196 μs)
Symmetric sparse (triu only) optimized: Trial(5.992 μs)

SymmetricSparseTests.runtests(50)
Normal sparse: Trial(37.411 μs)
Symmetric sparse: Trial(41.211 μs)
Symmetric sparse (triu only): Trial(40.626 μs)
Symmetric sparse (triu only) optimized: Trial(37.119 μs)

SymmetricSparseTests.runtests(200)
Normal sparse: Trial(2.304 ms)
Symmetric sparse: Trial(2.766 ms)
Symmetric sparse (triu only): Trial(2.678 ms)
Symmetric sparse (triu only) optimized: Trial(2.422 ms)

SymmetricSparseTests.runtests(500)
Normal sparse: Trial(34.553 ms)
Symmetric sparse: Trial(45.762 ms)
Symmetric sparse (triu only): Trial(44.226 ms)
Symmetric sparse (triu only) optimized: Trial(40.840 ms)

@KristofferC
Member

You can always create Symmetric(triu(K)). It just probably shouldn't be done automatically by the Symmetric constructor.
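For example (a sketch; K stands for any sparse matrix with symmetric entries):

K = sprandn(1000, 1000, 0.01); K = K + K'   # both triangles stored
S = Symmetric(triu(K))                      # wrap only the upper triangle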

@jebej
Contributor

jebej commented Jun 3, 2017

You can always create Symmetric(triu(K)). It just probably shouldn't be done automatically by the Symmetric constructor.

I understand that, but if the wrapper cannot assume that only the triu part of the matrix is stored, there still need to be checks to determine which triangle, and which entries, should be looked at.

@dpo
Contributor Author

dpo commented Jun 3, 2017

There were some benchmarks in the Discourse thread using a large sparse matrix. You can see them in this gist: https://gist.github.com/dpo/481b0c03dd08d26af342573df98ddc21. Profiling reveals that the index tests inside the multiplication kernel consume substantial time.

row > col && break # assume indices are sorted
a = nzval[j]
C[row, k] += a * αxj
row == col || (tmp += a * B[row, k])
Contributor

I think this should have an alpha

Contributor Author

Of course, thanks. I chose to multiply tmp by alpha after the loop to save a few cycles.
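In code, the arrangement is roughly (same names as in the excerpt above):

tmp += a * B[row, k]    # inside the loop: accumulate without α
C[col, k] += α * tmp    # after the loop: apply α once per column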

Contributor Author

PS: should the kernels be exported?

Contributor

I don't think so.


function A_mul_B_L_kernel!(α::Number, A::Symmetric{TA,SparseMatrixCSC{TA,S}}, B::StridedVecOrMat, β::Number, C::StridedVecOrMat) where {TA,S}

colptr = A.data.colptr
Contributor

4 space indent, and shouldn't start a function body with a blank line

Contributor Author

Should be fixed. Besides doing the same job for Hermitian, what other methods not yet covered would users find crucial?

@jebej
Contributor

jebej commented Jun 3, 2017

Is there anything that can be done to make it faster than the regular sparse type? Right now this is slower, which seems undesirable.

@@ -0,0 +1,72 @@
# This file is a part of Julia. License is MIT: https://julialang.org/license

function Symmetric(A::SparseMatrixCSC, uplo::Symbol=:U)
Member

This is not needed since it is equivalent to the default constructor:

Symmetric(A::AbstractMatrix, uplo::Symbol=:U) = (checksquare(A); Symmetric{eltype(A),typeof(A)}(A, char_uplo(uplo)))

Contributor Author

Right. Fixed.

A_mul_B!(one(T), A, B, zero(T), similar(B, T, (A.data.n, size(B, 2))))
end

function A_mul_B_U_kernel!(α::Number, A::Symmetric{TA,SparseMatrixCSC{TA,S}}, B::StridedVecOrMat, β::Number, C::StridedVecOrMat) where {TA,S}
Member

There are just two lines that differ between the U/L kernels. I think we can merge them and add Val{:U}/Val{:L} as the last argument to the kernel? Something like:

function A_mul_B_UL_kernel!(α::Number, A::Symmetric{TA,SparseMatrixCSC{TA,S}},
    B::StridedVecOrMat, β::Number, C::StridedVecOrMat, ::Type{Val{uplo}}) where {TA,S,uplo}
    colptr = A.data.colptr
    rowval = A.data.rowval
    nzval = A.data.nzval
    if β != 1
        β != 0 ? scale!(C, β) : fill!(C, zero(eltype(C)))
    end
    @inbounds for k = 1 : size(C, 2)
        for col = 1 : A.data.n
            αxj = α * B[col, k]
            tmp = TA(0)
            for j = colptr[col] : (colptr[col + 1] - 1)
                row = rowval[j]
                # assume indices are sorted; skip entries outside the stored triangle
                if uplo == :U
                    row > col && break
                else
                    row < col && continue
                end
                a = nzval[j]
                C[row, k] += a * αxj
                row == col || (tmp += a * B[row, k])
            end
            C[col, k] += α * tmp
        end
    end
    C
end
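(With a constant uplo, the branch should compile away, and the kernel would be called as, e.g., A_mul_B_UL_kernel!(α, A, B, β, C, Val{:U}).)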

Contributor Author

There's a performance impact for that, which is why I separated them.

Contributor Author

In fact, there's a performance impact for allowing C to be general, which makes me think that a separate version where C is just a vector is desirable.

Member

There's a performance impact for that, which is why I separated them.

That should be compiled away.

In fact, there's a performance impact for allowing C to be general, which makes me think that a separate version where C is just a vector is desirable.

If the impact is noticeable it makes sense to have a separate kernel for matvec.

Contributor Author

dpo commented Jun 4, 2017

Using the matrix in my gist, here's the benchmark of A * b:

BenchmarkTools.Trial: 
  memory estimate:  3.85 MiB
  allocs estimate:  2
  --------------
  minimum time:     21.475 ms (0.00% GC)
  median time:      22.218 ms (0.00% GC)
  mean time:        22.425 ms (1.17% GC)
  maximum time:     27.263 ms (5.92% GC)
  --------------
  samples:          223
  evals/sample:     1

Now U * b and L * b using separate kernels:

BenchmarkTools.Trial: 
  memory estimate:  3.85 MiB
  allocs estimate:  2
  --------------
  minimum time:     28.259 ms (0.00% GC)
  median time:      29.112 ms (0.00% GC)
  mean time:        29.481 ms (0.88% GC)
  maximum time:     35.077 ms (0.00% GC)
  --------------
  samples:          170
  evals/sample:     1
BenchmarkTools.Trial: 
  memory estimate:  3.85 MiB
  allocs estimate:  2
  --------------
  minimum time:     33.965 ms (0.00% GC)
  median time:      35.219 ms (0.00% GC)
  mean time:        35.797 ms (0.92% GC)
  maximum time:     40.802 ms (0.00% GC)
  --------------
  samples:          140
  evals/sample:     1

and U * b and L * b using the joint kernel (after obvious corrections):

BenchmarkTools.Trial: 
  memory estimate:  3.85 MiB
  allocs estimate:  4
  --------------
  minimum time:     33.787 ms (0.00% GC)
  median time:      34.744 ms (0.00% GC)
  mean time:        35.347 ms (0.88% GC)
  maximum time:     40.835 ms (0.00% GC)
  --------------
  samples:          142
  evals/sample:     1
BenchmarkTools.Trial: 
  memory estimate:  3.85 MiB
  allocs estimate:  4
  --------------
  minimum time:     33.890 ms (0.00% GC)
  median time:      34.753 ms (0.00% GC)
  mean time:        35.240 ms (0.89% GC)
  maximum time:     41.668 ms (0.00% GC)
  --------------
  samples:          142
  evals/sample:     1

I didn't know about Val. It improves over my initial joint version somewhat, but there'll still be a noticeable difference over large numbers of matrix-vector products.

@fredrikekre
Member

Is there anything that can be done to make it faster than the regular sparse type? Right now this is slower, which seems undesirable.

Try some larger matrices; a 500x500 matrix with 0.32 density is not really representative of a real use case for sparse matrices. For the cases in your benchmark post above, calling BLAS is more than 13 times faster on my computer.

@jebej
Contributor

jebej commented Jun 3, 2017

Try some larger matrices; a 500x500 matrix with 0.32 density is not really representative of a real use case for sparse matrices. For the cases in your benchmark post above, calling BLAS is more than 13 times faster on my computer.

I have tried larger and sparser matrices, with similar results. I think the cost of accessing the dense array out of column order is high:

SymmetricSparseTests.run_tests(800,0.05)
Normal sparse: Trial(25.844 ms)
Symmetric sparse: Trial(32.871 ms)
Symmetric sparse (triu only): Trial(30.654 ms)
Symmetric sparse (triu only) optimized: Trial(27.898 ms)

SymmetricSparseTests.run_tests(1500,0.05)
Normal sparse: Trial(159.849 ms)
Symmetric sparse: Trial(205.979 ms)
Symmetric sparse (triu only): Trial(194.236 ms)
Symmetric sparse (triu only) optimized: Trial(179.533 ms)

@andreasnoack
Member

I just did a small test using MKL with n=1000 and A = sprandn(n,n,0.01) |> t -> t + t', and the symmetric multiplication is quite a bit slower than the general multiplication, so I don't think we should assume that we can beat the general multiplication. Users would need to choose between speed and memory efficiency.

We could consider adding the check

all(1:size(A,2)) do j
    i = last(colptr[j]:colptr[j+1] - 1)
    # the last (largest) stored row in column j must not be below the diagonal
    return i > 0 ? rowval[i] <= j : true
end

to determine if only the triangle is stored. It's an O(n) check before an O(nnz) operation, and a few examples suggest it might be worth it, although it complicates the code a bit.
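A self-contained version of that check might look like this (a sketch; the name istriustored is hypothetical, and the CSC fields are as in Base):

# Hypothetical helper: true if no stored entry lies strictly below the diagonal.
function istriustored(A::SparseMatrixCSC)
    colptr, rowval = A.colptr, A.rowval
    return all(1:size(A, 2)) do j
        r = colptr[j] : colptr[j + 1] - 1
        # an empty column is fine; otherwise the last (largest) stored row
        # must be on or above the diagonal
        isempty(r) ? true : rowval[last(r)] <= j
    end
end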

@dpo
Contributor Author

dpo commented Jun 4, 2017

Option 3 in my gist above adds O(n) storage to basically recover the performance of general matrix-vector multiplication. The disadvantage is that a lot of methods involving sparse Symmetric matrices must be reimplemented.
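(For illustration only, one plausible shape of that O(n) extra storage, assuming a per-column pointer to the diagonal computed once at construction; the actual layout is in the gist:)

# Hypothetical sketch: record, for each column, the global nzval index where
# the diagonal entry would sit, so the kernel never has to search for it.
diagptr = zeros(Int, A.n)
for col = 1 : A.n
    r = A.colptr[col] : A.colptr[col + 1] - 1
    k = searchsortedfirst(A.rowval[r], col)   # position within the column
    diagptr[col] = first(r) + k - 1           # global index into nzval
end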

@fredrikekre
Member

Perhaps the index vector in that option can be computed in A_mul_B instead of the Symmetric constructor then?

@jebej
Contributor

jebej commented Jun 4, 2017

Is there a plan for implementing more sophisticated algorithms for sparse-dense matrix multiplication?

@dpo
Contributor Author

dpo commented Jun 4, 2017

Perhaps the index vector in that option can be computed in A_mul_B instead of the Symmetric constructor then?

You could, but it would be very inefficient to reconstruct the same array over and over, every time you multiply. In iterative methods for linear systems, you may need tens or hundreds of products.

@dpo
Contributor Author

dpo commented Jun 4, 2017

Variant 3 in the gist above, with O(n) extra storage, does lend itself to a "fused" implementation using Val{:L}/Val{:U}. The timings are essentially unchanged and the code is shorter.

@dpo
Contributor Author

dpo commented Jun 9, 2017

I wonder if it wouldn't make more sense for Symmetric to be more than just a wrapper in the sparse case. In the dense case, there are no savings to be had in storing only a triangle, but in the large and sparse case it makes a lot of sense, and I would argue that that's one of the main points of having a Symmetric type (besides promising that a matrix is indeed symmetric). In addition, most (probably all) sparse factorization routines I know of for symmetric matrices claim that they only access one triangle of the input. This makes me wonder whether it wouldn't make more sense for Triangular to call tril()/triu() in the sparse case, and for Symmetric and Hermitian to take a Triangular matrix as input. In my opinion, this is a place where Matlab has always missed out. Working with actual triangular matrices would eliminate the performance hit and the extra storage of Variant 3 in my gist above; it would correspond to Variant 1. If properly documented, I can't imagine it would startle users.

@andreasnoack
Member

A problem might be that Symmetric(SparseMatrixCSC) would then make a copy, which it usually doesn't do. The check suggested in #22200 (comment) should be pretty cheap. Have you considered that option?

@dpo
Contributor Author

dpo commented Jun 9, 2017

Sure, but what do we do if the test isn't satisfied? If Symmetric takes a Triangular as input, it wouldn't make a copy, but perhaps Triangular would. In the sparse case, that doesn't bother me, because it would simply encourage users to build only one triangle of symmetric/Hermitian matrices.

@andreasnoack
Member

Sure, but what do we do if the test isn't satisfied?

Just fall back on the slower version that checks the indices. In most cases, users will provide a genuinely triangular matrix anyway, and we could document that, if it isn't already triangular, using Symmetric(triu!(A)) will be better. I don't think that XTriangular will help, since it is also just a view/interpretation of the underlying array, so the same issue applies. It's not that I completely oppose your idea; I just want to think through the alternatives.

@Sacha0
Member

Sacha0 commented Jun 11, 2017

(Tangentially, the above (and related past) discussion highlights that introducing a separate class of matrix annotations might be worthwhile at some future point: in some cases you need an annotation that asserts some property of the wrapped storage's contents (for example, that some Matrix is nonnegative, or is lower triangular), rather than one that merely indicates how the wrapped storage's contents should be interpreted, without providing guarantees about those contents.)

@andreasnoack
Member

@dpo Would you be able to revisit this? It would be great to have.

@dpo
Contributor Author

dpo commented Aug 23, 2017

@andreasnoack Sure thing. I'll try to get to it this week.

@dpo
Contributor Author

dpo commented Aug 24, 2017

@andreasnoack Is this what you have in mind: https://gist.github.com/dpo/481b0c03dd08d26af342573df98ddc21#file-symmetric_matvec_v4-jl

Here's the expected behavior.

The performance will be good if the user supplies a triangular matrix and poor if they don't.

@andreasnoack
Member

I'd actually prefer the version that uses the current Symmetric type but checks, in the multiplication method, whether just the triangle is stored.

@dpo
Contributor Author

dpo commented Aug 28, 2017

You don't want to do that each time you compute a product. It would completely defeat the savings of storing and using a sparse matrix. Krylov methods would grind to a halt.

@fredrikekre
Member

That check is presumably very cheap in comparison to the actual product, no?

@andreasnoack
Member

I just did some timings. I thought it would be cheaper to check, so I now agree with @dpo's conclusion, and we might need a different approach. I can see two solutions:

  1. Give up on using Symmetric/Hermitian for sparse matrices and define a custom SparseMatrixCSCHermitian
  2. Make Symmetric always modify sparse input such that we can assume that the storage is triangular

@fredrikekre
Member

Out of curiosity, how expensive is the check compared to the multiply?

@andreasnoack
Member

The check costs about 50% of the matvec for a tridiagonal matrix, and 35-40% for a pentadiagonal one.

@dpo
Contributor Author

dpo commented Aug 28, 2017

Make Symmetric always modify sparse input such that we can assume that the storage is triangular

In my view, that's the whole point of symmetric sparse matrices. More than just promising that the underlying matrix is symmetric (as in the dense case), you want to save storage and preserve efficient matvecs.

@Sacha0
Member

Sacha0 commented Aug 28, 2017

Ref. #17367 for discussion of mutating constructors (Hermitian!). Best!

@dpo
Contributor Author

dpo commented Aug 29, 2017

Symmetric! sounds sensible to me.

@andreasnoack
Member

Should be obsolete with #30018. Please comment if not.

@dpo
Contributor Author

dpo commented Feb 9, 2019

Are people still open to Symmetric!?

@andreasnoack
Member

Are people still open to Symmetric!?

Could you remind us what the benefit would be? For faster symmetric multiplication, you'd need the guarantee that the storage is triangular, right?

@dpo
Contributor Author

dpo commented Feb 17, 2019

Right, Symmetric! would begin with a tril!() (or triu!()) to save storage.
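A sketch of what that could look like (hypothetical; not in Base, and it assumes tril!/triu! drop the discarded entries of a sparse matrix in place):

# Hypothetical mutating constructor: drop the redundant triangle in place,
# then wrap, so downstream kernels may assume triangular storage.
function Symmetric!(A::SparseMatrixCSC, uplo::Symbol=:U)
    uplo == :U ? triu!(A) : tril!(A)
    return Symmetric(A, uplo)
end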

@andreasnoack
Member

So the only benefit would be storage, right? In that case, I think it would be more transparent to ask people to write Symmetric(tril!(A)); that clearly signals the intention. I'm not completely against the idea of introducing mutating constructors, but I think it would be nice to have more significant benefits when introducing a new idiom.

@fredrikekre
Member

AFAIU from the discussion above (e.g. #22200 (comment) and the following comments), the benefit would be that we could know that no entries were stored in the upper/lower half and not have to check, which wouldn't be the case with Symmetric(tril!(A)).

@andreasnoack
Member

andreasnoack commented Feb 18, 2019

...but then Symmetric! would have to be the only constructor, right? Otherwise, you wouldn't have the guarantee.

@fredrikekre
Member

fredrikekre commented Feb 18, 2019

Yeah. Maybe we could specialize on Symmetric{T,LowerTriangular{T,SparseMatrixCSC{T,Int}}} where T.

Edit: Doesn't work, because LowerTriangular is still backed by a regular matrix. I guess Symmetric! would have to create a new type.
