Skip to content

[SYSTEMDS-3123] Rewrite cbind 0 Matrix Multiplication#1385

Closed
Baunsgaard wants to merge 4 commits intoapache:masterfrom
Baunsgaard:REWRITEMMCBIND
Closed

[SYSTEMDS-3123] Rewrite cbind 0 Matrix Multiplication#1385
Baunsgaard wants to merge 4 commits intoapache:masterfrom
Baunsgaard:REWRITEMMCBIND

Conversation

@Baunsgaard
Copy link
Copy Markdown
Contributor

This PR contains a rewrite that change the sequences if number of rows in X is 2x larger than Y:

 cbind((X %*% Y), matrix(0, nrow(X), 1)) -> X %*% (cbind(Y, matrix(0, nrow(Y), 1)))

This rewrite effects MLogReg in line 215 to not force allocation of the large X twice.

Review appreciated!

@Shafaq-Siddiqi
Copy link
Copy Markdown
Contributor

LGTM!

@Baunsgaard Baunsgaard changed the title [SYSTEMDS-???] Rewrite cbind 0 Matrix Multiplication [SYSTEMDS-3123] Rewrite cbind 0 Matrix Multiplication Sep 8, 2021
@mboehm7
Copy link
Copy Markdown
Contributor

mboehm7 commented Sep 8, 2021

Thanks for staring the discussion, a few additional points to consider (besides separating it from the compression-related changed):

  • we need to check for know dims, otherwise the rewrite also triggers if hi.getDim2() is unknown (-1), and as we don't consider the other way around, it would never come back during recompilation
  • The overall benefit comes from reduced allocation in the cbind output (while incurring more floating point operations in the matrix multiply in some kernels that avoid branches in the inner loop). Accordingly, we cannot just look at hi.getDim1() > hi.getDim2() * 2. Instead given a m-x-n matrix X and n-x-k matrix Y, we have to compare the following: (a): mk + m(k+1) versus (b) nk + m(k+1), which reduces to comparing m and n, not m and (k+1), isn't it?

@Baunsgaard
Copy link
Copy Markdown
Contributor Author

Thanks for staring the discussion, a few additional points to consider (besides separating it from the compression-related changed):

I merged in the compression changes when i was happy with the rewrite.
I will separate the different elements in different commits once it goes to master.

* we need to check for know dims, otherwise the rewrite also triggers if hi.getDim2() is unknown (-1), and as we don't consider the other way around, it would never come back during recompilation

Yes good point, i will include it in the a check that the dims are known

* The overall benefit comes from reduced allocation in the cbind output (while incurring more floating point operations in the matrix multiply in some kernels that avoid branches in the inner loop). Accordingly, we cannot just look at `hi.getDim1() > hi.getDim2() * 2`. Instead given a m-x-n matrix X and n-x-k matrix Y, we have to compare the following: (a): m_k + m_(k+1) versus (b) n_k + m_(k+1), which reduces to comparing m and n, not m and (k+1), isn't it?

The comparison i make currently is based on the extra allocation therefore it has to compare m (the left hand side number of rows) with n (the common dimension/ number of rows in right).
this is because these two correspond to the memory allocation in the cbind operation. To make the rewrite better i could include a cost of the matrix multiplication for cbind before vs after. this would make the rewrite more stable.
Alternatively i think what i am going to do is to set the hi.getDim1() > hi.getDim2() * 2 to something like hi.getDim1() > hi.getDim2() * 100, since this still tricker the rewrite in the place intended, and is less prone to executing the rewrite in bad locations. (not that it matters much it is a very specific rewrite.)

@mboehm7
Copy link
Copy Markdown
Contributor

mboehm7 commented Sep 8, 2021

no, you misunderstood my main comment - we need to compare m vs n (and apparently that's what you wanted to do), but what the code does is to compare m vs (k+1) because you look at rows and columns of the output cbind (variable hi) - simply replace hi.getDim2() with hi.getInput(1).getDim1()

@Baunsgaard Baunsgaard force-pushed the REWRITEMMCBIND branch 2 times, most recently from 233ff13 to 3a42ae0 Compare September 14, 2021 11:25
- Compressed matrix factory improvements
- Add decompression if the data is serialized and larger in compressed format
- Decompress on write to HDFS
- Abort compression after cocode if the compression sizes are bad
- Make c bind decompressing in workload tree (worst case)
- Add a minimum compression ratio argument to the CompressionSettings
- Reduce the sampling size in c bind compression and set high minimum compression ratio
- Fix order of operations in compressed append
- Add compressed output size to unary hops
- More utilization of the cached decompressed matrix if it fits in
memory by looking for soft reference of uncompressed in certain cases
```
 cbind((X %*% Y), matrix(0, nrow(X), 1))

 ->

 X %*% (cbind(Y, matrix(0, nrow(Y), 1)))
```

This commit contains a rewrite that change the sequences if number
of rows in X is 2x larger than Y:

This rewrite effects MLogReg in line 215 to not force allocation of the
large X twice.
@Baunsgaard Baunsgaard force-pushed the REWRITEMMCBIND branch 3 times, most recently from c532220 to 7f28983 Compare September 15, 2021 12:17
This commit move the decompression instruction to the input hop
of a decompressing instruction execution on the workload trees.
this in practice means that if a variable is used in a forloop
and needs decompression it is taken into account that the decompression
only happens once, and outside the for loop.

- Update to allow return of constant column groups in various cases
- Remove System.out.println from estim estimatorDensityMap
- Add Compression in loop right mult with decompression
- Fix Binary matrix matrix operation Compressed
- Add isDensifying boolean to compression cost, to allow compression
to compare to dense allocation.
@Baunsgaard Baunsgaard deleted the REWRITEMMCBIND branch September 25, 2021 14:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants