[SYSTEMDS-3123] Rewrite cbind 0 Matrix Multiplication by Baunsgaard · Pull Request #1385 · apache/systemds

Baunsgaard · 2021-09-07T17:19:16Z

This PR contains a rewrite that change the sequences if number of rows in X is 2x larger than Y:

 cbind((X %*% Y), matrix(0, nrow(X), 1)) -> X %*% (cbind(Y, matrix(0, nrow(Y), 1)))

This rewrite effects MLogReg in line 215 to not force allocation of the large X twice.

Review appreciated!

Shafaq-Siddiqi · 2021-09-08T10:04:36Z

LGTM!

mboehm7 · 2021-09-08T20:06:55Z

Thanks for staring the discussion, a few additional points to consider (besides separating it from the compression-related changed):

we need to check for know dims, otherwise the rewrite also triggers if hi.getDim2() is unknown (-1), and as we don't consider the other way around, it would never come back during recompilation
The overall benefit comes from reduced allocation in the cbind output (while incurring more floating point operations in the matrix multiply in some kernels that avoid branches in the inner loop). Accordingly, we cannot just look at hi.getDim1() > hi.getDim2() * 2. Instead given a m-x-n matrix X and n-x-k matrix Y, we have to compare the following: (a): mk + m(k+1) versus (b) nk + m(k+1), which reduces to comparing m and n, not m and (k+1), isn't it?

Baunsgaard · 2021-09-08T20:44:11Z

Thanks for staring the discussion, a few additional points to consider (besides separating it from the compression-related changed):

I merged in the compression changes when i was happy with the rewrite.
I will separate the different elements in different commits once it goes to master.

* we need to check for know dims, otherwise the rewrite also triggers if hi.getDim2() is unknown (-1), and as we don't consider the other way around, it would never come back during recompilation

Yes good point, i will include it in the a check that the dims are known

* The overall benefit comes from reduced allocation in the cbind output (while incurring more floating point operations in the matrix multiply in some kernels that avoid branches in the inner loop). Accordingly, we cannot just look at `hi.getDim1() > hi.getDim2() * 2`. Instead given a m-x-n matrix X and n-x-k matrix Y, we have to compare the following: (a): m_k + m_(k+1) versus (b) n_k + m_(k+1), which reduces to comparing m and n, not m and (k+1), isn't it?

The comparison i make currently is based on the extra allocation therefore it has to compare m (the left hand side number of rows) with n (the common dimension/ number of rows in right).
this is because these two correspond to the memory allocation in the cbind operation. To make the rewrite better i could include a cost of the matrix multiplication for cbind before vs after. this would make the rewrite more stable.
Alternatively i think what i am going to do is to set the hi.getDim1() > hi.getDim2() * 2 to something like hi.getDim1() > hi.getDim2() * 100, since this still tricker the rewrite in the place intended, and is less prone to executing the rewrite in bad locations. (not that it matters much it is a very specific rewrite.)

mboehm7 · 2021-09-08T20:48:15Z

no, you misunderstood my main comment - we need to compare m vs n (and apparently that's what you wanted to do), but what the code does is to compare m vs (k+1) because you look at rows and columns of the output cbind (variable hi) - simply replace hi.getDim2() with hi.getInput(1).getDim1()

- Compressed matrix factory improvements - Add decompression if the data is serialized and larger in compressed format - Decompress on write to HDFS - Abort compression after cocode if the compression sizes are bad - Make c bind decompressing in workload tree (worst case) - Add a minimum compression ratio argument to the CompressionSettings - Reduce the sampling size in c bind compression and set high minimum compression ratio - Fix order of operations in compressed append - Add compressed output size to unary hops - More utilization of the cached decompressed matrix if it fits in memory by looking for soft reference of uncompressed in certain cases

``` cbind((X %*% Y), matrix(0, nrow(X), 1)) -> X %*% (cbind(Y, matrix(0, nrow(Y), 1))) ``` This commit contains a rewrite that change the sequences if number of rows in X is 2x larger than Y: This rewrite effects MLogReg in line 215 to not force allocation of the large X twice.

This commit move the decompression instruction to the input hop of a decompressing instruction execution on the workload trees. this in practice means that if a variable is used in a forloop and needs decompression it is taken into account that the decompression only happens once, and outside the for loop. - Update to allow return of constant column groups in various cases - Remove System.out.println from estim estimatorDensityMap - Add Compression in loop right mult with decompression - Fix Binary matrix matrix operation Compressed - Add isDensifying boolean to compression cost, to allow compression to compare to dense allocation.

Baunsgaard changed the title ~~[SYSTEMDS-???] Rewrite cbind 0 Matrix Multiplication~~ [SYSTEMDS-3123] Rewrite cbind 0 Matrix Multiplication Sep 8, 2021

Baunsgaard force-pushed the REWRITEMMCBIND branch from dc6c2f7 to dbf7498 Compare September 8, 2021 17:19

Baunsgaard force-pushed the REWRITEMMCBIND branch 2 times, most recently from 233ff13 to 3a42ae0 Compare September 14, 2021 11:25

Baunsgaard added 2 commits September 15, 2021 13:32

Baunsgaard force-pushed the REWRITEMMCBIND branch 3 times, most recently from c532220 to 7f28983 Compare September 15, 2021 12:17

Baunsgaard force-pushed the REWRITEMMCBIND branch from 9b20fd6 to 2794ba1 Compare September 20, 2021 09:35

add missing minus on empty operator

e866b42

Baunsgaard closed this in 61f40bd Sep 20, 2021

Baunsgaard deleted the REWRITEMMCBIND branch September 25, 2021 14:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SYSTEMDS-3123] Rewrite cbind 0 Matrix Multiplication#1385

[SYSTEMDS-3123] Rewrite cbind 0 Matrix Multiplication#1385
Baunsgaard wants to merge 4 commits intoapache:masterfrom
Baunsgaard:REWRITEMMCBIND

Baunsgaard commented Sep 7, 2021

Uh oh!

Shafaq-Siddiqi commented Sep 8, 2021

Uh oh!

mboehm7 commented Sep 8, 2021

Uh oh!

Baunsgaard commented Sep 8, 2021

Uh oh!

mboehm7 commented Sep 8, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

Baunsgaard commented Sep 7, 2021

Uh oh!

Shafaq-Siddiqi commented Sep 8, 2021

Uh oh!

mboehm7 commented Sep 8, 2021

Uh oh!

Baunsgaard commented Sep 8, 2021

Uh oh!

mboehm7 commented Sep 8, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants