Various Updates in CLA #1276

Closed
Baunsgaard wants to merge 7 commits into apache:master from Baunsgaard:DecompressionOptimization


@Baunsgaard
Contributor

This PR contains multiple updates to the compression framework:

  • Decompression parallelization optimizations, to be more cache-friendly.
  • Compressed left matrix multiplication: subtract the most common element to skip many rows.
  • Left matrix multiplication exploiting sparsity in the dictionary.
  • MatrixBlockDictionary, enabling sparse dictionaries and, in the future, recursive compression.
  • A fully cost-based co-coding algorithm, which optimizes the co-coding of columns based on the mix of operations given to the compression framework.
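The co-coding idea in the last bullet can be pictured as a greedy search that merges column groups while the merge lowers a cost. The Python sketch below is hypothetical and is not SystemDS's algorithm: `size_cost` is an assumed stand-in cost (a DDC-like byte size), whereas the PR describes costs derived from the operations applied to the compressed matrix.

```python
from itertools import combinations


def distinct_tuples(matrix, cols):
    """Count distinct value tuples over `cols` in a row-major list-of-lists matrix."""
    return len({tuple(row[c] for c in cols) for row in matrix})


def size_cost(matrix, cols):
    """Hypothetical stand-in cost: bytes of a DDC-like encoding of `cols`.

    The row mapping uses 1, 2 or 4 bytes per row depending on the number of
    distinct tuples; the dictionary stores every tuple as 8-byte doubles.
    """
    d = distinct_tuples(matrix, cols)
    id_bytes = 1 if d <= 256 else (2 if d <= 65536 else 4)
    return len(matrix) * id_bytes + d * len(cols) * 8


def greedy_cocode(matrix, cost=size_cost):
    """Greedily merge the pair of column groups with the largest cost reduction."""
    groups = [(c,) for c in range(len(matrix[0]))]
    while True:
        best, best_gain = None, 0
        for a, b in combinations(groups, 2):
            merged = tuple(sorted(a + b))
            gain = cost(matrix, a) + cost(matrix, b) - cost(matrix, merged)
            if gain > best_gain:
                best, best_gain = (a, b, merged), gain
        if best is None:  # no merge lowers the total cost
            return groups
        a, b, merged = best
        groups = [g for g in groups if g not in (a, b)] + [merged]
```

Because `cost` is a parameter, replacing `size_cost` with an estimate of the time of the expected operations (for example matrix multiplications) would turn the same loop into an operation-driven co-coder, which is roughly the direction the bullet describes.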

The PR is not ready to merge; it is here for testing.

Baunsgaard force-pushed the DecompressionOptimization branch from 201cd8f to c0f0d6a May 24, 2021 10:44
Baunsgaard changed the title [WIP] Decompression optimization to Various Updates in CLA May 31, 2021
Baunsgaard force-pushed the DecompressionOptimization branch from 86db6f5 to e117dff June 1, 2021 09:52

This commit changes the blocking of the decompression so that it no
longer aligns perfectly with 64k blocks, since that is suboptimal when a
column group contains many columns.
A future update will introduce skip lists in SDC column groups, since
these suffer from iterating through their offset lists for each thread
starting at an offset.
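The skip-list idea mentioned as future work can be sketched as sampling every k-th entry of an SDC group's sorted row offsets, so that a thread decompressing rows from `start` onward jumps close to its first offset instead of scanning from index 0. This is a hypothetical Python illustration; `OffsetSkipList` and `stride` are invented names, not SystemDS classes.

```python
from bisect import bisect_left


class OffsetSkipList:
    """Hypothetical skip structure over the sorted row offsets of an SDC column group.

    Every `stride`-th offset is sampled, so a lookup scans at most `stride`
    entries after a binary search over the samples.
    """

    def __init__(self, offsets, stride=64):
        self.offsets = offsets
        self.stride = stride
        self.samples = offsets[::stride]  # offsets[0], offsets[stride], ...

    def first_index_at_or_after(self, start):
        """Index into `offsets` of the first offset >= start (len(offsets) if none)."""
        # Number of samples strictly below `start`; the last of them bounds the answer.
        s = bisect_left(self.samples, start)
        i = max(s - 1, 0) * self.stride
        # Short linear scan from the sampled position.
        while i < len(self.offsets) and self.offsets[i] < start:
            i += 1
        return i

    def offsets_in_range(self, start, end):
        """Yield the offsets in [start, end), i.e. one thread's row block."""
        i = self.first_index_at_or_after(start)
        while i < len(self.offsets) and self.offsets[i] < end:
            yield self.offsets[i]
            i += 1
```

Each thread then pays a binary search plus at most `stride` steps, rather than a scan proportional to its starting row.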

[SYSTEMDS-3000] CLA MM Most Common Element Addition

This commit exploits the compressed representation to add the most
common element separately when multiplying on the left side with a
compressed transposed matrix.
This is a common occurrence in MMChain and TSMM, and it allows sparsity
exploitation of dense compressed column groups.
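The arithmetic behind the most-common-element trick: if every row of a column group equals a default tuple `d` except at a few offsets, then `C = 1·d + S` with `S` zero outside those offsets, so `M·C = rowSums(M)·d + M·S`. The Python sketch below assumes a plain list-of-lists `M` and an SDC-like layout (`default`, `offsets`, `values`); it is an illustration, not SystemDS's kernel.

```python
def left_mm_sdc(M, default, offsets, values):
    """R = M @ C for a matrix C whose rows equal `default` except at `offsets`.

    M is k x n (list of rows); values[t] is the row of C at offsets[t].
    Only one row-sum pass over M is dense; the correction touches the
    columns of M listed in `offsets`.
    """
    k, m = len(M), len(default)
    # rowSums(M) * default covers every row of C as if it held the default.
    row_sums = [sum(r) for r in M]
    R = [[row_sums[a] * default[j] for j in range(m)] for a in range(k)]
    # Correct only the non-default rows by (value - default).
    for i, vals in zip(offsets, values):
        delta = [vals[j] - default[j] for j in range(m)]
        for a in range(k):
            x = M[a][i]
            if x != 0:
                for j in range(m):
                    R[a][j] += x * delta[j]
    return R
```

With `default=[2, 5]`, offsets `[1, 4]` and six rows, only two of the six columns of `M` are visited after the row-sum pass, which is where the row skipping comes from.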
Add sparse dictionary and cocodeMatrixCost
MM cost update: remove scaling with sparsity
Compressed left multiplication has two phases: first a preaggregation,
then a matrix multiplication. This commit makes the matrix
multiplication use the default SystemDS kernels, which allows exploiting
the various dedicated MM kernels already in SystemDS.
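The two phases can be illustrated for a DDC-style column group, where every row `i` maps to dictionary entry `mapping[i]`. Phase 1 sums the entries of each row of the left matrix into one bucket per dictionary entry; phase 2 is an ordinary matrix multiplication of that small preaggregate with the dictionary. In this Python sketch, `dense_matmul` is a plain triple loop standing in for the SystemDS kernels, and all names are hypothetical.

```python
def preaggregate(M, mapping, n_distinct):
    """Phase 1: P[a][t] = sum of M[a][i] over the rows i with mapping[i] == t."""
    P = [[0] * n_distinct for _ in M]
    for a, row in enumerate(M):
        for i, t in enumerate(mapping):
            P[a][t] += row[i]
    return P


def dense_matmul(A, B):
    """Phase 2: plain dense multiply, a stand-in for an existing MM kernel."""
    return [[sum(A[a][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for a in range(len(A))]


def left_mm_ddc(M, mapping, dictionary):
    """M @ C where row i of C is dictionary[mapping[i]]."""
    return dense_matmul(preaggregate(M, mapping, len(dictionary)), dictionary)
```

Phase 2 works on a k × (number of distinct tuples) matrix instead of k × n, so handing it to a general-purpose kernel is cheap and reuses that kernel's optimizations.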
Initial version of cost-based MM co-coding
This commit changes the dictionary of the column groups to support
MatrixBlocks. This further enforces the previous design of reusing
already implemented kernels, and it allows sparse dictionaries to be
exploited in operations.
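A sparse dictionary pays off in the second phase of a left multiplication: if the dictionary is stored in CSR form, the multiply only touches its nonzeros. The Python sketch below is a hypothetical illustration of that effect (`to_csr` and `dense_times_csr` are invented helpers), not the MatrixBlock code path.

```python
def to_csr(dense):
    """Convert a list-of-lists matrix to CSR arrays (indptr, indices, data)."""
    indptr, indices, data = [0], [], []
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                indices.append(j)
                data.append(v)
        indptr.append(len(indices))
    return indptr, indices, data


def dense_times_csr(P, csr, n_cols):
    """R = P @ D for a dense preaggregate P and a CSR dictionary D.

    Work is proportional to the nonzeros of D (times the nonzero entries
    of P), not to the full size of D.
    """
    indptr, indices, data = csr
    R = [[0] * n_cols for _ in P]
    for a, prow in enumerate(P):
        for t, p in enumerate(prow):
            if p == 0:
                continue
            # Only the stored nonzeros of dictionary row t are visited.
            for q in range(indptr[t], indptr[t + 1]):
                R[a][indices[q]] += p * data[q]
    return R
```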
Add various compression tests:

- InsertionSorterTests
- OffsetTests
- MappingTests

Minor bug fixes and a better mapping test.

Better compression ratio on SDC with 3 distinct elements, since the SDC
dictionary contains the number of distinct elements minus 1. Therefore,
if an SDC column group contains 3 distinct values, it only needs 2
distinct identifiers in the dictionary.
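Why SDC needs one identifier fewer: the most frequent value is stored once as the default, and only the other rows are recorded, so the dictionary holds (distinct − 1) entries. The single-column Python sketch below is a hypothetical encoding, not SystemDS's SDC format. With 3 distinct values, the 2 remaining identifiers fit in one bit per recorded row, which is one plausible reason for the better ratio (an assumption, not stated in the PR).

```python
from collections import Counter


def encode_sdc(column):
    """Hypothetical single-column SDC encoding.

    The most frequent value becomes the default. Only rows holding other
    values are recorded (offsets), and the dictionary holds just those
    non-default distinct values, i.e. (#distinct - 1) entries.
    """
    default = Counter(column).most_common(1)[0][0]
    dictionary = sorted({v for v in column if v != default})
    code = {v: t for t, v in enumerate(dictionary)}
    offsets = [i for i, v in enumerate(column) if v != default]
    mapping = [code[column[i]] for i in offsets]
    return default, dictionary, offsets, mapping


def decode_sdc(n, default, dictionary, offsets, mapping):
    """Rebuild the column: default everywhere, dictionary values at offsets."""
    out = [default] * n
    for i, t in zip(offsets, mapping):
        out[i] = dictionary[t]
    return out
```

For the column `[0, 0, 7, 0, 9, 0, 7, 0]`, the default is `0`, the dictionary is `[7, 9]` (2 entries for 3 distinct values), and the mapping `[0, 1, 0]` only ever uses identifiers 0 and 1.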

Baunsgaard force-pushed the DecompressionOptimization branch from e117dff to af16b5d June 1, 2021 09:54
Baunsgaard closed this in d430902 Jun 1, 2021
Baunsgaard deleted the DecompressionOptimization branch June 8, 2021 09:23

1 participant