RFC: SparseVectors, Take 2 #13440

mbauman · 2015-10-03T21:54:30Z

This introduces the SparseVector to the standard library. It is the descendent in spirit of #11424, but the entirety of the code is imported from JuliaSparse/SparseVectors.jl, with credit to @lindahua as the primary author. I've simply been working on getting it to mesh with Base idioms and names.

TODO:

lindahua · 2015-10-03T22:51:26Z

Thanks a lot for working on this!

tkelman · 2015-10-04T02:22:34Z

base/sparse/sparsevector.jl

+    end
+    half_screen_rows = limit ? div(rows - 8, 2) : typemax(Int)
+    pad = ndigits(n)
+    k = 0


this, and the k += 1 below are redundant since k is the loop variable

ViralBShah · 2015-10-04T07:26:54Z

How about deprecating sparsevec? The SparseVector constructor should be good enough to replace it, right?

lindahua · 2015-10-04T13:57:29Z

Generally, I think SparseVector may be just a field setter, while functions like sparsevec or sparse can do complicated things (such as sorting/combining elements).

I think we probably don't need sparsevec, but let sparse returns an instance of SparseVector when there is only one dimension.

ViralBShah · 2015-10-05T05:32:19Z

I deleted comments that were completely off-topic here.

kshyatt · 2015-10-05T20:15:47Z

base/sparse/cholmod.jl

+doc"""
+    ldltfact(::Union{SparseMatrixCSC,Symmetric{Float64,SparseMatrixCSC{Flaot64,SuiteSparse_long}},Hermitian{Complex{Float64},SparseMatrixCSC{Complex{Float64},SuiteSparse_long}}}; shift=0, perm=Int[]) -> CHOLMOD.Factor
+
+Compute the `LDLt` factorization of a sparse symmetric or Hermitian matrix. A fill-reducing permutation is used. `F = ldltfact(A)` is most frequently used to solve systems of equations `A*x = b` with `F\b`, but also the methods `diag`, `det`, `logdet` are defined for `F`. You can also extract individual factors from `F`, using `F[:L]`. However, since pivoting is on by default, the factorization is internally represented as `A == P'*L*D*L'*P` with a permutation matrix `P`; using just `L` without accounting for `P` will give incorrect answers. To include the effects of permutation, it's typically preferable to extact "combined" factors like `PtL = F[:PtL]` (the equivalent of `P'*L`) and `LtP = F[:UP]` (the equivalent of `L'*P`). The complete list of supported factors is `:L, :PtL, :D, :UP, :U, :LD, :DU, :PtLD, :DUP`.


Wording here is a little awkward - comma splice. Maybe F = ldltfact(A) is most frequently used to solve systems of equations A*x = b with F\b. The factorization also the methods diag, det, logdet defined.`?

Would be nice to have an explanation of what all these factors are (what is DUP?)

what is DUP

I agree, but I'm not the one to write that documentation. I just moved the docs here from helpdb.jl.

mbauman · 2015-10-05T20:28:08Z

How about deprecating sparsevec? The SparseVector constructor should be good enough to replace it, right?

I think for now it can be split into a combination of the orthogonal names sparse and vec. I'm not entirely sure what to do with the SparseVector name; it'd be nice to have the constructor mirror Vector, but that's awkward until the deprecation for the SparseMatrix module goes through.

KristofferC · 2015-10-05T20:45:22Z

base/sparse/sparsevector.jl

+            r += 1
+            i2 = I[r]
+            if i2 == i  # accumulate r-th to the l-th entry
+                V[l] = call(combine, V[l], V[r])


combine(V[l], V[r]) to be similar with the SparseMatrix construct?

Ah, yup, thanks, this comes from SparseVectors.jl's 0.3 support.

mbauman · 2015-10-05T21:12:43Z

I think for now [sparsevec] can be split into a combination of the orthogonal names sparse and vec

So close. There'd be a clash of meanings with sparse(::AbstractVector, ::AbstractVector, ::Integer).

Is sparse([1,2],[1,2],3) a 2x2 matrix with two threes ([3 0; 0 3]) or is it a 3-element sparse vector ([1,2,0])? Any thoughts here?

ScottPJones · 2015-10-05T21:14:31Z

base/sparse/sparsevector.jl

@@ -0,0 +1,1441 @@
+### common.jl


Doesn't this need the standard license header? Also the file is not common.jl.

ScottPJones · 2015-10-05T21:24:27Z

I'd asked before about transposing a sparse vector (but Viral deleted question), why couldn't that be returned as a sparse matrix? It wouldn't be quite as compact, but I'd think it would still be smaller than a dense array, or wouldn't it?

mbauman · 2015-10-05T21:30:22Z

Our sparse matrices are CSC format. It'd be good for you to thoroughly understand how CSC works if you're interesting in contributing to our sparse library.

julia> S = sparse([1,10^7],[1,1],[1,1])
10000000x1 sparse matrix with 2 Int64 entries:
    [1       ,        1]  =  1
    [10000000,        1]  =  1

julia> Base.summarysize(S)
88

julia> Base.summarysize(S')
80000080

julia> Base.summarysize(zeros(10^7))
80000000

ScottPJones · 2015-10-05T21:57:32Z

I do understand perfectly well how the CSC format works.
Try the following:
S = sparse(Int32[1,10^7],Int32[1,1],[10,20])
Base.summarysize(S)
You will then see that in that case, you can save almost 50% of the memory.
However, this is really dependent on the # of zero entries, and the type of the value array.
S = sparse(Int32[1,10^7],Int32[1,1],Int128[10,20])
Base.summarysize(S)
Compare that to Base.summarysize(zeros(Int128,10^7)), and you'll see the difference.

ScottPJones · 2015-10-05T22:07:05Z

test/sparsedir/sparsevector.jl

@@ -0,0 +1,649 @@
+## sparsevec.jl


This also has incorrect name and is missing Julia license info.

lindahua · 2015-10-06T03:35:05Z

Looks really great!

jakebolewski · 2015-10-06T11:27:37Z

I see that you replaced the comparison Ty = eltype(SparseVector(...)); v != zero(Ty) with v != 0. Does this assumption (not assuming a zero method for a given element type) actually hold throughout the sparsematrix code? I feel that comparison to numeric zero has been gradually removed as it can cause perf problems due to type instability.

mbauman · 2015-10-06T15:42:55Z

The SparseMatrixCSC code is a little mixed as to whether it compares to zero(T) or 0. These changes were required to get tests to pass. At least some of those tests fail because they store Char, and probably date back to the days when Char <: Integer.

jakebolewski · 2015-10-06T15:51:05Z

I think we should be more explicit as to what the interface is assumed rather than having partially implemented / ill defined behavior.

ex:

julia> A = sparse([1,5,10],[1,5,10],['a','b','c'])
10x10 sparse matrix with 3 Char entries:
    [1 ,  1]  =  'a'
    [5 ,  5]  =  'b'
    [10, 10]  =  'c'

julia> full(A)
ERROR: MethodError: `zero` has no method matching zero(::Type{Char})
 in full at sparse/sparsematrix.jl:227


julia> A[1]
'a'

julia> A[2]
ERROR: MethodError: `zero` has no method matching zero(::Type{Char})
 in getindex at sparse/sparsematrix.jl:1398
 in getindex at abstractarray.jl:483

but that might be another issue / PR.

mbauman · 2015-10-07T23:39:37Z

Thank you all for the review comments so far. Unless anyone has anything else, this is now ready as far as I'm concerned. I'm sure there will be a long tail of missing methods, performance improvements, and perhaps some bugs, but I think it's really quite good. It's well-covered by tests, and behaves as I'd expect.

Renamings: * Rename sparse module to SparseArrays, and improve its separation from base. This makes it very simple to dynamically reload the sparse module. Move docstrings to their proper place * _copy_convert → collect * Rename sparsevector to the existing spzeros and sparsevec. * Use call overloading instead of call Remove functionality from SparseVectors.jl: * Simplify and remove some functors * Remove SparseVectorView * Remove no-longer-needed ambiguity preventers Add functionality for SparseVectors: * Add similar for SparseVector * Allow sparsevec(::AbstractArray), not just vectors * Add spzeros(n); adapt some tests to SparseVector * Allow CHOLMOD linalg with vectors * Implement (c)transpose(::SparseVector). Returns a dense vector since a one-row CSC structure is effectively dense but with worse performance. * Add vector sprandbool and allow passing RNG to all vector sprand* functions. Harden tests against random failures. * Implement, test and doc spones(::SparseVector) Improve performance for SparseVector indexing: * Punt to SparseMatrix for some indexing behaviors. Since the datastructures are compatible and SparseMatrix's routines are more optimized, it is easiest to just construct a temporary SparseMatrix and index into that. This is faster in all but the smallest of cases (N<10) * Use searchsorted for indexing SparseVector by UnitRange. This is about 20% slower on very small vectors, but is faster in general. Change SparseMatrix behaviors to take advantage of vectors * Add indexing behaviors for SparseMatrix->SparseVector * `vec` and `sparsevec` for CSC return SparseVectors * Update documentation to incorporate vectors Minor bugfixes and changes to SparseVectors: * Compare to literal 0 in vector construction and setindex. This matches SparseMatrix semantics, and makes indexing semantics consistent with regard to stored zeros * Use checkbounds more thoroughly * Use error types that are consistent with SparseMatrixCSC * Minor sparse vector display tweaks. Turn on output limiting by default, remove unused variable `k`, implement and use Base.summary * Fix missing return, add test Add some tests: * Add a test and comment to ensure nobody else tries to share data between vectors and matrices * Ensure indexing is consistent between one-column sparse matrices and sparse vectors, with special attention to stored zeros.

mbauman · 2015-10-13T19:11:22Z

Both 64 bit linux workers got hit by the OOM killer.

RFC: SparseVectors, Take 2

jakebolewski · 2015-10-13T21:35:02Z

🍰

KristofferC · 2015-10-13T21:40:30Z

Great!

malmaud · 2015-10-14T08:52:39Z

🔢

Is there a constructor to create an all-zero sparse matrix of a given size? My brief foray didn't reveal one.

StefanKarpinski · 2015-10-14T08:54:11Z

spzeros?

malmaud · 2015-10-14T08:55:36Z

yep, thanks.

On Wed, Oct 14, 2015 at 4:54 AM, Stefan Karpinski notifications@github.com
wrote:

spzeros?

—
Reply to this email directly or view it on GitHub
#13440 (comment).

lindahua · 2015-10-14T08:57:52Z

Great, thanks a lot!

ViralBShah · 2015-10-15T10:09:50Z

This is amazing! Thanks everyone. Kicking the tires.

mbauman added sparse Sparse arrays breaking This change will break code labels Oct 3, 2015

tkelman reviewed Oct 4, 2015
View reviewed changes

ViralBShah added this to the 0.5.0 milestone Oct 4, 2015

kshyatt reviewed Oct 5, 2015
View reviewed changes

KristofferC reviewed Oct 5, 2015
View reviewed changes

ScottPJones reviewed Oct 5, 2015
View reviewed changes

mbauman force-pushed the mb/sparsevec branch from 495b010 to 3fc4b78 Compare October 7, 2015 21:25

mbauman changed the title ~~WIP: SparseVectors, Take 2~~ RFC: SparseVectors, Take 2 Oct 7, 2015

mbauman force-pushed the mb/sparsevec branch from 3fc4b78 to 30faf63 Compare October 7, 2015 23:39

lindahua and others added 5 commits October 13, 2015 11:12

Import code from JuliaSparse/SparseVectors.jl

9802263

Add SparseVector to NEWS.md

12fbcab

Move SparseArrays exports back to their usual home

e316b60

Add missing imports

eb90416

mbauman force-pushed the mb/sparsevec branch from 30faf63 to eb90416 Compare October 13, 2015 16:18

mbauman added a commit that referenced this pull request Oct 13, 2015

Merge pull request #13440 from JuliaLang/mb/sparsevec

6e4c9f1

RFC: SparseVectors, Take 2

mbauman merged commit 6e4c9f1 into master Oct 13, 2015

mbauman deleted the mb/sparsevec branch October 13, 2015 21:34

malmaud mentioned this pull request Oct 14, 2015

About SparseVectors #11324

Closed

This was referenced Oct 14, 2015

module SparseMatrix should be renamed #12323

Closed

Column slices of sparse matrices should be sparse vectors #6479

Closed

Feature request method for spzeros(N) #11583

Closed

yuyichao added a commit to JuliaLang/Compat.jl that referenced this pull request Oct 18, 2015

Compat for SparseArrays. Ref JuliaLang/julia#13440

4290673

yuyichao added a commit to JuliaLang/Compat.jl that referenced this pull request Oct 18, 2015

Compat for SparseArrays. Ref JuliaLang/julia#13440

bdedf62

yuyichao mentioned this pull request Oct 18, 2015

Compat for SparseArrays module JuliaLang/Compat.jl#154

Merged

ghost mentioned this pull request Nov 16, 2015

problem of conversion from sparse matrix slice to sparse vector #14013

Closed

mbauman mentioned this pull request Dec 19, 2015

Remove erroneous SparseVector early exit JuliaAttic/SparseVectorsOLD.jl#5

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: SparseVectors, Take 2 #13440

RFC: SparseVectors, Take 2 #13440

mbauman commented Oct 3, 2015

lindahua commented Oct 3, 2015

tkelman Oct 4, 2015

ViralBShah commented Oct 4, 2015

lindahua commented Oct 4, 2015

ViralBShah commented Oct 5, 2015

kshyatt Oct 5, 2015

mbauman Oct 5, 2015

mbauman commented Oct 5, 2015

KristofferC Oct 5, 2015

mbauman Oct 5, 2015

mbauman commented Oct 5, 2015

ScottPJones Oct 5, 2015

ScottPJones commented Oct 5, 2015

mbauman commented Oct 5, 2015

ScottPJones commented Oct 5, 2015

ScottPJones Oct 5, 2015

lindahua commented Oct 6, 2015

jakebolewski commented Oct 6, 2015

mbauman commented Oct 6, 2015

jakebolewski commented Oct 6, 2015

mbauman commented Oct 7, 2015

mbauman commented Oct 13, 2015

jakebolewski commented Oct 13, 2015

KristofferC commented Oct 13, 2015

malmaud commented Oct 14, 2015

StefanKarpinski commented Oct 14, 2015

malmaud commented Oct 14, 2015

lindahua commented Oct 14, 2015

ViralBShah commented Oct 15, 2015

RFC: SparseVectors, Take 2 #13440

RFC: SparseVectors, Take 2 #13440

Conversation

mbauman commented Oct 3, 2015

lindahua commented Oct 3, 2015

tkelman Oct 4, 2015

Choose a reason for hiding this comment

ViralBShah commented Oct 4, 2015

lindahua commented Oct 4, 2015

ViralBShah commented Oct 5, 2015

kshyatt Oct 5, 2015

Choose a reason for hiding this comment

mbauman Oct 5, 2015

Choose a reason for hiding this comment

mbauman commented Oct 5, 2015

KristofferC Oct 5, 2015

Choose a reason for hiding this comment

mbauman Oct 5, 2015

Choose a reason for hiding this comment

mbauman commented Oct 5, 2015

ScottPJones Oct 5, 2015

Choose a reason for hiding this comment

ScottPJones commented Oct 5, 2015

mbauman commented Oct 5, 2015

ScottPJones commented Oct 5, 2015

ScottPJones Oct 5, 2015

Choose a reason for hiding this comment

lindahua commented Oct 6, 2015

jakebolewski commented Oct 6, 2015

mbauman commented Oct 6, 2015

jakebolewski commented Oct 6, 2015

mbauman commented Oct 7, 2015

mbauman commented Oct 13, 2015

jakebolewski commented Oct 13, 2015

KristofferC commented Oct 13, 2015

malmaud commented Oct 14, 2015

StefanKarpinski commented Oct 14, 2015

malmaud commented Oct 14, 2015

lindahua commented Oct 14, 2015

ViralBShah commented Oct 15, 2015