
Sparsity-preserving outer products #24980

Merged: 7 commits into JuliaLang:master on Jan 7, 2019

Conversation

@jmert (Contributor) commented Dec 8, 2017

This PR adds sparsity-preserving outer products of sparse vectors and views into sparse matrices, implemented as methods of kron and *.

My motivation for digging into this is the desire for fast, efficient quadratic matrix products. My specific case is computing a very large sparse-dense-sparse product, where the middle dense matrix can be computed in parallel one column at a time.

Some example code showing the impact:

using BenchmarkTools
using Test

m,n = (25_000, 100_000);  # dimensions of matrices
k = 1000;                 # column vector location to fill
A = sprand(m, n, 0.01);   # large, sparse operator
b = rand(n);              # column subset of dense matrix
B = sparse(collect(1:n), fill(k, n), b, n, n); # sparse representation

if true
    # outer products only
    @btime $b * $A[:, $k]';
    @btime $b * view($A, :, $k)';

    # quadratic matrix products which motivated
    C1 = @btime $A * $B * $A';
    C2 = @btime ($A * $b) * view($A, :, $k)';

    @test C1 == C2
end

On master (Version 0.7.0-DEV.2770, Commit 11d5a53):

  1.830 s (6 allocations: 1.86 GiB)
  1.953 s (2 allocations: 1.86 GiB)
  39.669 ms (190 allocations: 116.06 MiB)
  199.583 ms (4 allocations: 190.77 MiB)
Test Passed

This PR:

  4.948 ms (12 allocations: 40.48 MiB)
  4.938 ms (10 allocations: 40.47 MiB)
  36.562 ms (205 allocations: 116.23 MiB)
  3.570 ms (12 allocations: 4.12 MiB)
Test Passed

I've marked this as a work-in-progress because I'd welcome comments on how to better integrate the new methods with existing code. The first commit could be dropped or extracted into a separate PR — it was just something I noticed while grappling with the current code — and the tests still need a Complex case that exercises handling of the RowVector{<:Any, ConjArray} case.
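
As a minimal conceptual sketch (not this PR's implementation, just an illustration of why preserving sparsity pays off), the outer product u * v' of two sparse vectors only needs to touch the nonzeros of each factor, producing nnz(u) * nnz(v) stored entries instead of an m×n dense array:

using SparseArrays, LinearAlgebra

function sparse_outer(u::SparseVector, v::SparseVector)
    ui, uv = findnz(u)                    # nonzero row indices and values of u
    vi, vv = findnz(v)                    # nonzero column indices and values of v
    I = repeat(ui, outer = length(vi))    # row index of every nonzero pair
    J = repeat(vi, inner = length(ui))    # column index of every nonzero pair
    V = vec(uv .* transpose(vv))          # all pairwise products, column-major
    return sparse(I, J, V, length(u), length(v))
end

u = sprand(1_000, 0.01);
v = sprand(2_000, 0.01);
sparse_outer(u, v) == u * v'    # true on Julia ≥ 1.2, where u * v' stays sparse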

@andreasnoack (Member) left a comment

Great. Thanks for adding this functionality.

Review comment on base/sparse/linalg.jl (outdated, resolved)
@jmert (Contributor, Author) commented Dec 8, 2017

Looks like the first commit created a method ambiguity (resolved), and I might want to work on an implementation which will rebase on top of #24969 easily.

@ViralBShah (Member) commented:

@jmert Would it be possible for you to revive this PR so that it can help resolve various open issues?

@jmert (Contributor, Author) commented Dec 17, 2018

I have been meaning to return to this for quite some time, so yes, but possibly not until mid-January or so. Between the holidays and my own academic work, I don't have much free time in the next few weeks. (Maybe I'll squeeze it in between things as my own form of relaxing and getting away from Matlab ;-P)

@jmert (Contributor, Author) commented Dec 18, 2018

As a reference point for myself — running benchmarks similar to the original post on current master (v1.2.0-DEV.27) brings my computer to its knees for a few minutes. I think the second benchmark's timing is inflated because I actually run out of RAM and start hitting swap (during the second benchmark, htop showed all 16 GB of RAM on my laptop in use and ~57 GB of virtual memory mapped, with periodic stalls as the machine becomes overtaxed).

  1.998 s (46 allocations: 37.25 GiB)
  43.104 s (4 allocations: 18.63 GiB)
  945.432 ms (26 allocations: 904.92 MiB)
  6.910 s (6 allocations: 4.66 GiB)

@jmert (Contributor, Author) commented Dec 26, 2018

I've reimplemented this PR on the current master. The big differences from before are:

  1. The core optimization is implemented within the broadcasting code path rather than as specific methods of kron and *. Unfortunately, I don't understand the sparse broadcasting machinery all that well, so this special case is added through a bit of a hack (see is_specialcase_sparse_broadcast(); a rough sketch of the idea follows after this list).
  2. No specialization occurs for mixed dense-sparse outer products (yet).
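
A rough, hypothetical sketch of the kind of check that special case performs (this is not the actual code in higherorderfns.jl, and the real implementation also handles views of sparse-matrix columns):

using SparseArrays, LinearAlgebra

# Hypothetical sketch only: the special case is the outer product u .* v',
# i.e. `*` broadcast over a sparse column vector and the adjoint/transpose
# of a sparse vector, which can be materialized directly from the nonzeros
# instead of going through the generic sparsification machinery.
is_sparse_outer_broadcast(f, x, y) =
    f === (*) &&
    x isa SparseVector &&
    y isa Union{Adjoint{<:Any,<:SparseVector}, Transpose{<:Any,<:SparseVector}}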

The updated benchmark I'm working with is:

using BenchmarkTools, SparseArrays, Test

m,n = (2_000, 10_000);
k = 1000;
A = sprand(m, n, 0.01);
a = view(A, :, k);
u = A[:, k];
x = sparse(rand(n));
X = sparse(collect(1:n), fill(k, n), x, n, n);

suite = BenchmarkGroup()
# outer products only
suite["x × u'"] = @benchmarkable $x * $(u');
suite["x × a'"] = @benchmarkable $x * $(a');
# quadratic matrix products which motivated
suite["A × X × A'"] = @benchmarkable $A * $X * $(A');
suite["(A × x) × a'"] = @benchmarkable ($A * $x) * $(a');
tune!(suite)

before = run(suite)

using Revise
Revise.track(Base)

after = run(suite)

if true
    println("Before vs after for specific operations")
    println("=======================================\n")
    for key in keys(after)
        t1, t0 = minimum.((after[key], before[key]))
        r = judge(t1, t0)
        print(key, ": ")
        show(stdout, MIME"text/plain"(), r)
        println()
    end

    println("\n\n")
    println("Optimized quadratic product, before and after")
    println("=============================================\n")
    for (trial,name) in [(before,"before"), (after,"after")]
        key1 = "(A × x) × a'"
        key2 = "A × X × A'"
        t1, t0 = minimum.((trial[key1], trial[key2]))
        r = judge(t1, t0)
        print("$key1 vs $key2 ($name): ")
        show(stdout, MIME"text/plain"(), r)
        println()
    end
end

and gives results

Before vs after for specific operations
=======================================

(A × x) × a': BenchmarkTools.TrialJudgement:
  time:   -99.82% => improvement (5.00% tolerance)
  memory: -97.70% => improvement (1.00% tolerance)
x × a': BenchmarkTools.TrialJudgement:
  time:   -99.96% => improvement (5.00% tolerance)
  memory: -97.89% => improvement (1.00% tolerance)
A × X × A': BenchmarkTools.TrialJudgement:
  time:   +0.38% => invariant (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
x × u': BenchmarkTools.TrialJudgement:
  time:   -98.03% => improvement (5.00% tolerance)
  memory: -98.95% => improvement (1.00% tolerance)



Optimized quadratic product, before and after
=============================================

(A × x) × a' vs A × X × A' (before): BenchmarkTools.TrialJudgement:
  time:   +6941.20% => regression (5.00% tolerance)
  memory: +290.58% => regression (1.00% tolerance)
(A × x) × a' vs A × X × A' (after): BenchmarkTools.TrialJudgement:
  time:   -87.66% => improvement (5.00% tolerance)
  memory: -91.02% => improvement (1.00% tolerance)

@StefanKarpinski (Member) commented:

cc @mbauman, @KristofferC

@ViralBShah (Member) commented:

The outer product part is straightforward. If @mbauman or @KristofferC can check the broadcasting bits, we can probably merge. @andreasnoack mentioned lowrankupdate!, and I am not sure whether we want to do anything further there.

Also, it is fine to get this performance improvement in now and make further improvements in follow-up PRs.

@mbauman (Member) left a comment

I'm impressed — the sparse broadcasting machinery has survived quite a few incremental code changes and is a bit of a tangled mess. You've adeptly hooked in right where it makes sense.

I think this kind of peephole optimization is worthwhile, and with a few minor renames we can set the stage for supporting more and more array types without needing to _sparsifystructured them.

Three review comments on stdlib/SparseArrays/src/higherorderfns.jl (outdated, resolved)
@mbauman (Member) commented Jan 3, 2019

Am I correct in understanding that the only functional change here is that kron used to return a dense matrix whereas now it's appropriately sparse? Ah, and also multiplication with views. Everything else is performance.

@mbauman added the labels 'performance (Must go faster)' and 'domain:broadcast (Applying a function over a collection)' on Jan 3, 2019
@ViralBShah (Member) commented:

IIUC, this used to be OK, but after the whole adjoint business, the kron of a sparse matrix with a transposed matrix became dense. That's one of the cases this fixes; views are the other one.

@jmert (Contributor, Author) commented Jan 4, 2019

@andreasnoack mentioned lowrankupdate!, and I am not sure whether we want to do anything further there.

Correct. The old PR had put in a stub function, but any performance "improvement" implied by providing an in-place update would have been a lie, since I wasn't actually implementing such a feature at the time. If someone wants to tackle that, I agree it can be done in a new PR.

I'm impressed — the sparse broadcasting machinery has survived quite a few incremental code changes and is a bit of a tangled mess. You've adeptly hooked in right where it makes sense.

Thanks!! I did have to abandon several approaches after realizing they weren't going to work out as I'd hoped.

Am I correct in understanding that the only functional change here is that kron used to return a dense matrix whereas now it's appropriately sparse? Ah, and also multiplication with views. Everything else is performance.

@ViralBShah has the story right — since I started this PR a year ago, the switch to Transpose/Adjoint caused a huge slowdown by routing the outer product through the generic dense LinearAlgebra code. This PR now maintains sparsity in kron(u, v), u * v', and u .* v'.
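
For concreteness, an illustrative check of the behavior with this change (assuming Julia ≥ 1.2, where it landed):

using SparseArrays, LinearAlgebra
u = sprand(100, 0.1);
v = sprand(200, 0.1);
issparse(kron(u, v))    # true: kron of sparse vectors stays sparse
issparse(u * v')        # true: the outer product stays sparse
issparse(u .* v')       # true: the broadcast form is kept sparse as well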

In the rewrite against current master, I dropped one of my original goals — having mixed dense-sparse outer products produce a sparse output. I'm unsure enough of the broadcasting implementation at this point that I decided not to spend time on that goal; issue #26613 would be a good place to revive that conversation and let those enhancements happen alongside a future PR extending this implementation.

@ViralBShah (Member) commented:

@mbauman @andreasnoack Is this good to merge?

@jmert (Contributor, Author) commented Jan 5, 2019

I did run just the SparseArrays and broadcast test suites locally, and they passed after switching return broadcast to return _copy as @mbauman suggested in a review comment — I can add a final commit with that change if desired.

@ViralBShah (Member) commented:

Since it was requested, please go ahead.

@jmert (Contributor, Author) commented Jan 5, 2019

Commit added. This is ready to go from my perspective if/when it passes CI.

* Change is_specialcase_sparse_broadcast -> can_skip_sparsification.
* Lift the parent(y) call one function earlier for clarity.
@jmert (Contributor, Author) commented Jan 5, 2019

Can you rebase on master (unless you did it in the last day or two) so that we pick up all the CI fixes?

No problem — done.

@jmert changed the title from "[WIP] Sparsity-preserving outer products" to "Sparsity-preserving outer products" on Jan 5, 2019
@andreasnoack merged commit dffe119 into JuliaLang:master on Jan 7, 2019
@KristofferC (Member) commented:

Can you rebase on master (unless you did it in the last day or two) so that we pick up all the CI fixes?

Note that CI runs on the merge commit, so rebasing doesn't really do anything with respect to CI.

I don't really see how this can be backported to a patch release since it changes behavior? Tentatively removing the backport label. Also, this needs a NEWS entry.

@KristofferC added the label 'needs news (A NEWS entry is required for this change)' and removed the label 'backport pending 1.0' on Jan 10, 2019
@mbauman (Member) commented Jan 10, 2019

Agreed — this shouldn't be backported.

@StefanKarpinski added the label 'kind:minor change (Marginal behavior change acceptable for a minor release)' on Jan 10, 2019
@jmert deleted the sparse_outer branch on January 13, 2019
jmert added a commit to jmert/julia that referenced this pull request Jan 13, 2019
@abraunst (Contributor) commented Jan 19, 2019

Should kron(::Diagonal, ::Union{SparseVector, SparseMatrix}) (and vice versa) return SparseMatrixCSC? (Sure!)
Should kron(::Diagonal, ::Matrix) (and vice versa) return SparseMatrixCSC? (This would be consistent with this PR -- but currently there is an implementation for diagonal times dense...)

@saolof commented Apr 5, 2019

This will be utterly amazing for my research and I'd like to thank you all for working on this.

jmert added a commit to jmert/CMB.jl that referenced this pull request Jul 8, 2020
The feature was added to Julia in time for v1.2 in JuliaLang/julia#24980,
so get rid of the custom `outer()` method here and rewrite `quadprod()`
in terms of just standard matrix methods. Julia v1.2 is the
minimum-supported version at this point, so no need to worry about
backporting the functionality.

In the future, this function may yet still go away since the
implementation is nearly trivial at this point, but that can be a
follow-up PR.
jmert added a commit to jmert/CMB.jl that referenced this pull request Oct 28, 2020
First, this removes the option to do row-wise quadratic products since
they aren't being used within this package anyway. That allows removing
the "keyword" argument for choosing which direction to apply.

Second, the original implementation was optimized for the fast vector
outer product (that I got added to SparseArrays in JuliaLang/julia#24980
and made it into Julia v1.2), but when scaling up to multiple columns
the performance was disastrous because the dispatch of a transposed
view led to generic matrix multiplication which did the full
dense-dense style loops. By not using views, we get the desired
sparse matrix multiplication instead.
jmert added a commit to jmert/Healpix.jl that referenced this pull request Oct 23, 2022
The feature was added to Julia in time for v1.2 in JuliaLang/julia#24980,
so get rid of the custom `outer()` method here and rewrite `quadprod()`
in terms of just standard matrix methods. Julia v1.2 is the
minimum-supported version at this point, so no need to worry about
backporting the functionality.

In the future, this function may yet still go away since the
implementation is nearly trivial at this point, but that can be a
follow-up PR.
jmert added a commit to jmert/Healpix.jl that referenced this pull request Oct 23, 2022
First, this removes the option to do row-wise quadratic products since
they aren't being used within this package anyway. That allows removing
the "keyword" argument for choosing which direction to apply.

Second, the original implementation was optimized for the fast vector
outer product (that I got added to SparseArrays in JuliaLang/julia#24980
and made it into Julia v1.2), but when scaling up to multiple columns
the performance was disastrous because the dispatch of a transposed
view led to generic matrix multiplication which did the full
dense-dense style loops. By not using views, we get the desired
sparse matrix multiplication instead.
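
An illustrative sketch of the distinction that commit describes (hypothetical dimensions; exact dispatch behavior depends on the Julia/SparseArrays version):

using SparseArrays, LinearAlgebra
A = sprand(500, 2_000, 0.01);      # m×n sparse operator
cols = 1:8;                        # a handful of columns of the middle matrix
mid = A * rand(2_000, 8);          # m×8 dense "middle" block (stand-in for A*b)
# Transposed view of several columns: per the commit message above, this
# dispatched to the generic (dense-style) matrix-multiplication loops.
slow = mid * view(A, :, cols)';
# Indexed copy of the same columns: avoids that generic fallback.
fast = mid * A[:, cols]';
slow ≈ fast                        # same result; the commit reports very different performance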

Labels: domain:arrays:sparse, domain:broadcast, domain:linear algebra, kind:minor change, needs news, performance