[SYSTEMDS-2885] CLA MMChain Optimization #1197
Merged
Baunsgaard merged 5 commits into apache:master on Mar 9, 2021
This commit changes the MMChain operation in CLA to use matrix operations rather than vector operations. Furthermore, where it is found to be prudent, MMChain no longer decompresses. The right matrix multiplication now includes a decompression from the compressed overlapping state, since the dedicated decompression operation is better optimized than the decompression internal to the right matrix multiplication. This also gives a clearer view of where time is spent in the statistics output of an execution. With these modifications LmCG goes from ~250 sec to ~90 sec, while ULA is at 200 sec (unlike the paper, this is with the iteration count equal to the number of columns).
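For readers unfamiliar with the operation, the following is a minimal dense sketch of what an MMChain of the form t(X) %*% (X %*% v) computes when it is split into a right multiplication followed by a left multiplication. The class and method names are hypothetical, plain `double` arrays stand in for the compressed column groups, and this is not the CLA code from this PR.

```java
// Hypothetical dense illustration of t(X) %*% (X %*% v); not the SystemDS CLA implementation.
final class MMChainSketch {

    /** Right multiplication: returns X %*% v, one value per row of X. */
    static double[] rightMult(double[][] x, double[] v) {
        double[] out = new double[x.length];
        for (int r = 0; r < x.length; r++) {
            double sum = 0;
            for (int c = 0; c < v.length; c++)
                sum += x[r][c] * v[c];
            out[r] = sum;
        }
        return out;
    }

    /** Left multiplication: returns t(X) %*% u, one value per column of X. */
    static double[] leftMultTransposed(double[][] x, double[] u) {
        int cols = x.length == 0 ? 0 : x[0].length;
        double[] out = new double[cols];
        for (int r = 0; r < x.length; r++)
            for (int c = 0; c < cols; c++)
                out[c] += x[r][c] * u[r];
        return out;
    }

    /** The chain itself: the intermediate X %*% v is materialized once and reused. */
    static double[] mmChain(double[][] x, double[] v) {
        return leftMultTransposed(x, rightMult(x, v));
    }
}
```

For X = [[1, 2], [3, 4]] and v = [1, 1], the intermediate X %*% v is [3, 7] and the result is [24, 34].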
This commit adds an insertionSort abstraction for the construction of SDC colGroups. Previously, merging the index arrays of all dictionary entries increased compression time by an order of magnitude whenever SDC groups were selected. Now the difference is only a few ms. The abstraction also allows us to implement more efficient insertion tree variants for different cases down the line.
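As a rough illustration of the kind of abstraction described above, the sketch below hides the merge of per-dictionary-entry index arrays behind a small interface. All names (`OffsetMerger`, `MaterializingMerger`) are hypothetical and not taken from the SystemDS code base. The single implementation shown uses a simple materialize-and-scan strategy rather than an insertion sort or tree, and it assumes every row index appears in at most one input array.

```java
import java.util.Arrays;

// Hypothetical names throughout; not the SystemDS API.
interface OffsetMerger {
    /** Merged row indexes in ascending order. */
    int[] indexes();

    /** Dictionary entry id for each merged row index. */
    int[] labels();
}

/** Merges by writing each label into a row-sized scratch array, then scanning it once. */
final class MaterializingMerger implements OffsetMerger {
    private final int[] indexes;
    private final int[] labels;

    /**
     * @param offsets offsets[k] holds the row indexes that map to dictionary entry k
     *                (assumed disjoint across k)
     * @param numRows total number of rows in the column group
     */
    MaterializingMerger(int[][] offsets, int numRows) {
        int[] rowToLabel = new int[numRows];
        Arrays.fill(rowToLabel, -1); // -1 marks rows that hold the default value
        int total = 0;
        for (int k = 0; k < offsets.length; k++) {
            for (int row : offsets[k])
                rowToLabel[row] = k;
            total += offsets[k].length;
        }
        indexes = new int[total];
        labels = new int[total];
        int pos = 0;
        for (int row = 0; row < numRows; row++) {
            if (rowToLabel[row] >= 0) {
                indexes[pos] = row;
                labels[pos] = rowToLabel[row];
                pos++;
            }
        }
    }

    @Override
    public int[] indexes() {
        return indexes;
    }

    @Override
    public int[] labels() {
        return labels;
    }
}
```

Callers depend only on `OffsetMerger`, so a different strategy (for example a k-way merge for very sparse groups) could be swapped in without touching the code that builds the group.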
Row aggregates are usually slow in CLA, and this was further amplified by inefficient row aggregates in DDC that used "quick" get and set operations on MatrixBlocks. This commit removes that abstraction layer and works directly on the underlying double arrays. Furthermore, all Kahan error correction is removed from the compressed operations, reducing the memory allocated by aggregates by a factor of 2 to 3. This improved execution times on InfiniMNIST 1m to 2x slower than ULA for sparse inputs and equal to ULA for dense inputs. On BinaryMNIST 1m, row aggregates are now 10x faster (both sparse and dense).
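To make the memory argument concrete, here is a small sketch that contrasts a plain row sum working directly on a row-major `double[]` with a Kahan-compensated variant. The class name and the row-major layout are assumptions for illustration; the point is only that the compensated version carries an extra correction value per output cell, which the plain version does not need.

```java
// Hypothetical illustration; not the SystemDS aggregate code.
final class RowSumSketch {

    /** Plain row sums over a row-major dense array: one output value per row. */
    static double[] rowSums(double[] values, int rows, int cols) {
        double[] sum = new double[rows];
        for (int r = 0; r < rows; r++) {
            int start = r * cols;
            double s = 0;
            for (int c = 0; c < cols; c++)
                s += values[start + c];
            sum[r] = s;
        }
        return sum;
    }

    /** Kahan-compensated row sums: allocates a second array for the running corrections. */
    static double[] rowSumsKahan(double[] values, int rows, int cols) {
        double[] sum = new double[rows];
        double[] corr = new double[rows]; // extra allocation, same size as the output
        for (int r = 0; r < rows; r++) {
            int start = r * cols;
            for (int c = 0; c < cols; c++) {
                double y = values[start + c] - corr[r];
                double t = sum[r] + y;
                corr[r] = (t - sum[r]) - y; // rounding error lost in the addition above
                sum[r] = t;
            }
        }
        return sum;
    }
}
```

Dropping the compensation trades some numerical robustness on long or badly scaled rows for fewer allocations and a tighter inner loop.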