# Sparsity-preserving outer products #24980

Merged
merged 7 commits into JuliaLang:master from jmert:sparse_outer on Jan 7, 2019

## Conversation

Contributor

### jmert commented Dec 8, 2017 • edited

This PR adds sparsity-preserving outer products of sparse vectors and of views into sparse matrices, implemented as methods of `kron` and `*`. The motivation that led me to dig into this is the desire for fast/efficient quadratic matrix products. My specific case is computing a very large sparse-dense-sparse product, where the computation of the middle dense matrix can be parallelized as computations over single columns. Some example code showing the impact:

```julia
using BenchmarkTools
using Test

m, n = (25_000, 100_000);  # dimensions of matrices
k = 1000;                  # column vector location to fill
A = sprand(m, n, 0.01);    # large, sparse operator
b = rand(n);               # column subset of dense matrix
B = sparse(collect(1:n), fill(k, n), b, n, n);  # sparse representation

if true
    # outer products only
    @btime $b * $A[:, $k]';
    @btime $b * view($A, :, $k)';
    # quadratic matrix products which motivated this
    C1 = @btime $A * $B * $A';
    C2 = @btime ($A * $b) * view($A, :, $k)';
    @test C1 == C2
end
```

On master (Version 0.7.0-DEV.2770, Commit 11d5a53):

```
  1.830 s (6 allocations: 1.86 GiB)
  1.953 s (2 allocations: 1.86 GiB)
  39.669 ms (190 allocations: 116.06 MiB)
  199.583 ms (4 allocations: 190.77 MiB)
Test Passed
```

This PR:

```
  4.948 ms (12 allocations: 40.48 MiB)
  4.938 ms (10 allocations: 40.47 MiB)
  36.562 ms (205 allocations: 116.23 MiB)
  3.570 ms (12 allocations: 4.12 MiB)
Test Passed
```

I've marked this as a work-in-progress because I'd welcome comments on how to better integrate the new methods with the existing code. The first commit could be thrown away or extracted to a separate PR (it was just something I noticed while grappling with the current code), and the tests still need a `Complex` case which exercises handling of the `RowVector{<:Any, ConjArray}` case.
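For context on why a specialized method wins so dramatically: the outer product of two sparse vectors has only `nnz(u) * nnz(v)` structural nonzeros, so a kernel can iterate the stored entries directly rather than materializing a dense m × n result. The sketch below illustrates that idea; it is not the PR's actual implementation, `sparse_outer` is a hypothetical name, and a complete version would also conjugate `v` when computing `u * v'` for complex eltypes.

```julia
using SparseArrays

# Hypothetical sketch: build u * v' for sparse u, v by iterating only
# the stored entries. The CSC output has nnz(u) entries in each column
# j where v[j] is stored, and empty columns everywhere else.
function sparse_outer(u::SparseVector, v::SparseVector)
    m, n = length(u), length(v)
    ui, uv = findnz(u)    # stored row indices / values of u
    vi, vv = findnz(v)    # stored column indices / values of v
    T = promote_type(eltype(u), eltype(v))
    rowval = Vector{Int}(undef, length(ui) * length(vi))
    nzval  = Vector{T}(undef, length(rowval))
    colptr = ones(Int, n + 1)
    k = 1
    for (j, x) in zip(vi, vv)        # vi is sorted ascending
        for (i, y) in zip(ui, uv)
            rowval[k] = i
            nzval[k]  = y * x
            k += 1
        end
        colptr[j+1:end] .= k         # later columns start after these entries
    end
    return SparseMatrixCSC(m, n, colptr, rowval, nzval)
end
```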

andreasnoack reviewed
Member

### andreasnoack left a comment

 Great. Thanks for adding this functionality.
Review comment on `base/sparse/linalg.jl` (outdated)
Contributor Author

### jmert commented Dec 8, 2017 • edited

Looks like the first commit created a method ambiguity (now resolved), and I may want to work on an implementation that will rebase easily on top of #24969.

### jmert force-pushed the jmert:sparse_outer branch 2 times, most recently from `5421d1d` to `0aa366b` on Dec 10, 2017

Member

### ViralBShah commented Dec 17, 2018

 @jmert Would it be possible for you to revive this PR so that it can help resolve various open issues?
Contributor Author

### jmert commented Dec 17, 2018 • edited

 I have been meaning to return to this for quite some time, so yes, but possibly not until mid-January or so. Between the holidays and my own academic work, I don't have much free time in the next few weeks. (Maybe I'll squeeze it in between things as my own form of relaxing and getting away from Matlab ;-P)
Contributor Author

### jmert commented Dec 18, 2018

As a reference point for myself: current master (v1.2.0-DEV.27), using benchmarks similar to the original post, destroys my computer for a few minutes. I think the second `@benchmark`'s timing is inflated because I actually run out of RAM and start hitting swap (i.e. during the second benchmark, `htop` showed all 16 GB of RAM on my laptop in use and ~57 GB of virtual memory mapped, with periodic stalls as the machine became overtaxed).

```
  1.998 s (46 allocations: 37.25 GiB)
  43.104 s (4 allocations: 18.63 GiB)
  945.432 ms (26 allocations: 904.92 MiB)
  6.910 s (6 allocations: 4.66 GiB)
```

### jmert force-pushed the jmert:sparse_outer branch from `0aa366b` to `df1e18f` on Dec 26, 2018

Contributor Author

### jmert commented Dec 26, 2018

I've reimplemented this PR on the current master. The big differences from before are:

- The core optimization is implemented within the broadcasting code path rather than as specific methods of `kron` and `*`.
- Unfortunately, I don't really understand the sparse broadcasting all that well, so adding this special case is done through a sort of hack. (See `is_specialcase_sparse_broadcast()`; a rough sketch of the idea follows below.)
- No specialization occurs for mixed dense-sparse outer products (yet).

The updated benchmark I'm working with is:

```julia
using BenchmarkTools, SparseArrays, Test

m, n = (2_000, 10_000);
k = 1000;
A = sprand(m, n, 0.01);
a = view(A, :, k);
u = A[:, k];
x = sparse(rand(n));
X = sparse(collect(1:n), fill(k, n), x, n, n);

suite = BenchmarkGroup()
# outer products only
suite["x × u'"] = @benchmarkable $x * $(u');
suite["x × a'"] = @benchmarkable $x * $(a');
# quadratic matrix products which motivated this
suite["A × X × A'"] = @benchmarkable $A * $X * $(A');
suite["(A × x) × a'"] = @benchmarkable ($A * $x) * $(a');
tune!(suite)

before = run(suite)

using Revise
Revise.track(Base)

after = run(suite)

if true
    println("Before vs after for specific operations")
    println("=======================================\n")
    for key in keys(after)
        t1, t0 = minimum.((after[key], before[key]))
        r = judge(t1, t0)
        print(key, ": ")
        show(stdout, MIME"text/plain"(), r)
        println()
    end
    println("\n\n")
    println("Optimized quadratic product, before and after")
    println("=============================================\n")
    for (trial, name) in [(before, "before"), (after, "after")]
        key1 = "(A × x) × a'"
        key2 = "A × X × A'"
        t1, t0 = minimum.((trial[key1], trial[key2]))
        r = judge(t1, t0)
        print("$key1 vs $key2 ($name): ")
        show(stdout, MIME"text/plain"(), r)
        println()
    end
end
```

and gives results

```
Before vs after for specific operations
=======================================

(A × x) × a': BenchmarkTools.TrialJudgement:
  time:   -99.82% => improvement (5.00% tolerance)
  memory: -97.70% => improvement (1.00% tolerance)
x × a': BenchmarkTools.TrialJudgement:
  time:   -99.96% => improvement (5.00% tolerance)
  memory: -97.89% => improvement (1.00% tolerance)
A × X × A': BenchmarkTools.TrialJudgement:
  time:   +0.38% => invariant (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
x × u': BenchmarkTools.TrialJudgement:
  time:   -98.03% => improvement (5.00% tolerance)
  memory: -98.95% => improvement (1.00% tolerance)


Optimized quadratic product, before and after
=============================================

(A × x) × a' vs A × X × A' (before): BenchmarkTools.TrialJudgement:
  time:   +6941.20% => regression (5.00% tolerance)
  memory: +290.58% => regression (1.00% tolerance)
(A × x) × a' vs A × X × A' (after): BenchmarkTools.TrialJudgement:
  time:   -87.66% => improvement (5.00% tolerance)
  memory: -91.02% => improvement (1.00% tolerance)
```
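To unpack the "sort of hack": the hook boils down to recognizing, before the expression is lowered to the generic sparse broadcast path, a two-argument `*` broadcast whose operands form a column × row outer product. A rough sketch of that shape test, with illustrative names only (the PR's actual predicate `is_specialcase_sparse_broadcast()` and its surrounding dispatch in SparseArrays differ in detail):

```julia
using SparseArrays, LinearAlgebra
using Base.Broadcast: Broadcasted

# Illustrative stand-in for the PR's special-case check: detect
# `u .* v'` where u is a sparse vector and v' is the adjoint or
# transpose of one, so a specialized outer-product kernel can run
# instead of the generic sparse broadcast machinery.
function looks_like_sparse_outer(bc::Broadcasted)
    bc.f === (*) || return false
    length(bc.args) == 2 || return false
    u, vt = bc.args
    return u isa SparseVector &&
           vt isa Union{Adjoint{<:Any,<:SparseVector},
                        Transpose{<:Any,<:SparseVector}}
end
```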

Member

### ViralBShah commented Jan 3, 2019

The outer product part is straightforward. If @mbauman or @KristofferC can check the broadcasting bits, we can probably merge. @andreasnoack mentioned `lowrankupdate!`, and I am not sure if we want to do anything further there. Also, it is fine to get this performance improvement in now and leave further enhancements to other PRs.
mbauman approved these changes
Member

### mbauman left a comment

I'm impressed: the sparse broadcasting machinery has survived quite a few incremental code changes and is a bit of a tangled mess. You've adeptly hooked in right where it'd make sense. I think this kind of peephole optimization is worth doing, and with a few minor renames we can set the stage for supporting more and more array types without needing to `_sparsifystructured` them.
Member

### mbauman commented Jan 3, 2019 • edited

 Am I correct in understanding that the only functional change here is that `kron` used to return a dense matrix whereas now it's appropriately sparse? Ah, and also multiplication with views. Everything else is performance.

Member

### ViralBShah commented Jan 3, 2019

 IIUC, this used to be ok, but after the whole adjoint business, the kron of sparse with a transposed matrix became dense. That's one of the cases this fixes, views being the other one.
Contributor Author

### jmert commented Jan 4, 2019

> @andreasnoack mentioned `lowrankupdate!` and I am not sure if we want to do anything further.

Correct. The old PR had put in a stub function, but any performance "improvement" implied by providing an in-place update would have been a lie, since I wasn't implementing such a feature at the time. If someone wants to tackle that, I agree it can happen in a new PR.

> I'm impressed: the sparse broadcasting machinery has survived quite a few incremental code changes and is a bit of a tangled mess. You've adeptly hooked in right where it'd make sense.

Thanks!! I did have to abandon several approaches after realizing they weren't going to work out as I'd hoped.

> Am I correct in understanding that the only functional change here is that `kron` used to return a dense matrix whereas now it's appropriately sparse? Ah, and also multiplication with views. Everything else is performance.

@ViralBShah has the current story correct: since I started this PR a year ago, the switch to `Transpose`/`Adjoint` caused a huge slow-down by computing the outer product via the generic dense `LinearAlgebra` code. This PR now maintains sparsity in `kron(u, v)`, `u * v'`, and `u .* v'` (a quick check is sketched below). After the rewrite onto current master, I dropped one of my original goals, namely having dense-sparse outer products produce a sparse output. I was unsure enough of the broadcasting implementation at this point to spend time working on that; issue #26613 would be a good place to revive that conversation and let those enhancements happen in sync with a future PR extending this implementation.
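A quick sanity check of the functional changes listed above, runnable on a build that includes this PR (the return types are the ones the discussion claims):

```julia
using SparseArrays, LinearAlgebra, Test

u = sprand(10, 0.4);
v = sprand(8, 0.4);

@test kron(u, v) isa SparseVector           # vector ⊗ vector stays sparse
@test u * v' isa SparseMatrixCSC            # outer product stays sparse
@test u * transpose(v) isa SparseMatrixCSC  # transpose form too
@test (u .* v') isa SparseMatrixCSC         # broadcast form too
```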

Member

### ViralBShah commented Jan 5, 2019

 @mbauman @andreasnoack Is this good to merge?
Contributor Author

### jmert commented Jan 5, 2019

I ran just the SparseArrays and broadcast test suites locally, and they passed after switching `return broadcast` to `return _copy` as @mbauman suggested in a review comment. I can add a final commit with that change if it's desired.
Member

### ViralBShah commented Jan 5, 2019

Contributor Author

### jmert commented Jan 5, 2019

 Commit added. This is ready to go from my perspective if/when it passes CI.
Member

### ViralBShah commented Jan 5, 2019

 Can you rebase on master (unless you did it in the last day or two) so that we pick up all the CI fixes?
added 7 commits Dec 26, 2018
- `a1d0d9b` Add indtype and nnz definitions for SparseColumnView
- `f9a6634` Handle sparse outer products specially in broadcast
- `6d27803` Add specialized kron for sparse outer products
- `945bdbb` Add tests
- `16fac74` Support unitful types
- `26ed753` Address review comments.
  * Change is_specialcase_sparse_broadcast -> can_skip_sparsification.
  * Lift parent(y) to one function earlier for clarity.
- `9bec023` Simply call _copy instead of passing through the broadcast machinery again

### jmert force-pushed the jmert:sparse_outer branch from `e28e256` to `9bec023` on Jan 5, 2019

Contributor Author

### jmert commented Jan 5, 2019

> Can you rebase on master (unless you did it in the last day or two) so that we pick up all the CI fixes?

No problem; done.

### andreasnoack merged commit `dffe119` into JuliaLang:master on Jan 7, 2019

#### 3 checks passed

- continuous-integration/appveyor/pr: AppVeyor build succeeded
- continuous-integration/travis-ci/pr: The Travis CI build passed
- julia freebsd ci: Build done
Contributor

### KristofferC commented Jan 10, 2019

> Can you rebase on master (unless you did it in the last day or two) so that we pick up all the CI fixes?

Note that CI runs on the merge commit, so rebasing doesn't really do anything w.r.t. CI. I don't really see how this can be backported to a patch release since it changes behavior? Tentatively removing the backport label. Also, this needs a NEWS entry.

Member

### mbauman commented Jan 10, 2019

 Agreed — this shouldn't be backported.

### jmert deleted the jmert:sparse_outer branch Jan 13, 2019

added a commit to jmert/julia that referenced this pull request Jan 13, 2019
- `c2869e3` News for JuliaLang#24980 (sparsity preserving outer products)
Contributor

### abraunst commented Jan 19, 2019 • edited

Should `kron(::Diagonal, ::Union{SparseVector, SparseMatrix})` (and vice versa) return `SparseMatrixCSC`? (Sure!) Should `kron(::Diagonal, ::Matrix)` (and vice versa) return `SparseMatrixCSC`? (That would be consistent with this PR, but currently there is an implementation for diagonal times dense...) A small probe of both cases is sketched below.
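A small probe of both cases; the results depend on the Julia version, and per the comment above, `kron(::Diagonal, ::Matrix)` already had its own dense-returning method at the time:

```julia
using SparseArrays, LinearAlgebra

D = Diagonal(rand(4));
S = sprand(4, 4, 0.5);
M = rand(4, 4);

typeof(kron(D, S))   # sparse or dense? (the first question above)
typeof(kron(D, M))   # dense, via the existing diagonal-times-dense method
```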
added a commit that referenced this pull request Jan 20, 2019
- `51849df` News for #24980 (sparsity preserving outer products) (#30710)

### saolof commented Apr 5, 2019

 This will be utterly amazing for my research and I'd like to thank you all for working on this.