[SYSTEMDS-2885] CLA MMChain Optimization#1197

Merged
Baunsgaard merged 5 commits into apache:master from
Baunsgaard:MMChainOptimization
Mar 9, 2021

Conversation

@Baunsgaard
Contributor

This commit modifies our MMChain operation in CLA to use the matrix
operations rather than the vector operations. Furthermore, when it is found
to be prudent, the MMChain no longer decompresses.
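
For readers unfamiliar with the pattern, MMChain refers to the chain `t(X) %*% (X %*% v)`. The following is a minimal sketch in plain Python, not the SystemDS code, of two equivalent ways to evaluate it on an uncompressed matrix: a fused pass over the rows, and two explicit products (a right multiplication followed by a left multiplication). The function names are hypothetical and chosen only for illustration.

```python
def mmchain_fused(X, v):
    """Compute t(X) %*% (X %*% v) in one pass over the rows of X.

    X is a list of rows, v a list of numbers. t(X) is never materialized.
    """
    out = [0.0] * len(v)
    for row in X:
        # Inner product: one entry of the intermediate X %*% v.
        s = sum(x * w for x, w in zip(row, v))
        # Add s * row, this row's contribution to the final result.
        for j, x in enumerate(row):
            out[j] += s * x
    return out


def mmchain_two_step(X, v):
    """Same result via two explicit products: q = X v, then t(X) q."""
    q = [sum(x * w for x, w in zip(row, v)) for row in X]
    return [sum(X[i][j] * q[i] for i in range(len(X))) for j in range(len(v))]
```

Both functions return the same vector; the two-step form is the one that maps naturally onto separate right and left matrix multiplication kernels.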

The right matrix multiplication is changed to include a decompression
from the compressed overlapping state, since the general decompression
operation is more optimized than the decompression internal to the right
matrix multiplication. This also gives a clearer view of where time is
spent in the statistics output of the execution.
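
A rough sketch of what "decompression from compressed overlapping" can look like, in plain Python and under simplifying assumptions: each column group is a hypothetical `(cols, dictionary, codes)` triple where row `r` restricted to `cols` equals `dictionary[codes[r]]`. This layout and the function names are illustrative, not the CLA data structures. The right multiplication only touches the dictionaries, every resulting group spans all output columns, and decompression sums the groups.

```python
def right_multiply(groups, W):
    """Multiply a compressed X by a dense W, touching only the dictionaries.

    The returned groups all cover every output column, so they overlap
    and their values have to be added together on decompression.
    """
    n_out = len(W[0])
    result = []
    for cols, dictionary, codes in groups:
        new_dict = [
            [sum(t[k] * W[c][j] for k, c in enumerate(cols)) for j in range(n_out)]
            for t in dictionary
        ]
        result.append((list(range(n_out)), new_dict, codes))
    return result


def decompress_overlapping(groups, n_rows, n_cols):
    """Materialize a dense matrix by summing every group's contribution."""
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for cols, dictionary, codes in groups:
        for r in range(n_rows):
            t = dictionary[codes[r]]
            for k, c in enumerate(cols):
                out[r][c] += t[k]
    return out
```

Keeping the two steps separate means the decompression cost shows up as its own entry when timing the operations, rather than being hidden inside the multiplication.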

The modifications reduce LmCG from ~250 sec to ~90 sec,
while ULA is at 200 sec (unlike the paper, this is with num cols iterations).

@Baunsgaard force-pushed the MMChainOptimization branch from 96d7a52 to 082b2c2 on March 9, 2021 15:35

This commit adds an abstraction of insertionSort for the construction of
SDC colGroups. Previously, merging the index arrays of all dictionary
entries increased the compression time by an order of magnitude when SDC
groups were selected. Now there is only a few ms difference.

The abstraction allows us to implement more efficient insertion tree
abstractions for different cases down the line.
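
One way such an abstraction could look, sketched in plain Python. The interface and class names here are hypothetical and are not taken from the SystemDS source. The idea is that callers hand over the row indexes of each dictionary entry one entry at a time, and the sorter returns all indexes in row order together with their labels. The implementation shown writes labels into a dense per-row buffer and scans it once, which avoids repeatedly merging arrays.

```python
from abc import ABC, abstractmethod


class InsertionSorter(ABC):
    """Hypothetical interface: collect (row indexes, label) pairs, emit them sorted by row."""

    @abstractmethod
    def insert(self, indexes, label): ...

    @abstractmethod
    def result(self): ...


class MaterializeSorter(InsertionSorter):
    """Writes labels into a dense per-row buffer, then scans it once."""

    EMPTY = -1  # marks rows that hold the default value

    def __init__(self, n_rows):
        self._labels = [self.EMPTY] * n_rows

    def insert(self, indexes, label):
        for i in indexes:
            self._labels[i] = label

    def result(self):
        rows, labels = [], []
        for i, lab in enumerate(self._labels):
            if lab != self.EMPTY:
                rows.append(i)
                labels.append(lab)
        return rows, labels
```

Because callers depend only on `insert` and `result`, a different strategy (for example one suited to very sparse inputs) can be swapped in without changing the construction code.
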

The row aggregate is usually slow in CLA. This was further amplified by
inefficient row aggregates in DDC through the use of "quick" get and set
operations on MatrixBlocks. This commit removes this abstraction layer and
works directly on the underlying double arrays.
Furthermore, all Kahan error correction is removed from the compressed
operations, reducing the memory allocated by aggregates by 2x or 3x.

With these changes, execution times on InfiniMNIST 1m are 2x slower than ULA
for sparse inputs and equal for dense inputs.
On BinaryMNIST 1m, row aggregates are now 10x faster (both sparse and dense).
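
To illustrate both points, here is a small plain-Python sketch, assuming a simplified DDC-style group with a flat dictionary array and one code per row; the function names and layout are illustrative and not the SystemDS implementation. The first function sums each dictionary tuple once and indexes straight into flat arrays. The second shows the Kahan-compensated variant, which needs an additional correction buffer the same size as the output.

```python
def row_sums_ddc(dictionary, n_cols, codes, out):
    """Add the row sums of one DDC-style group into `out`.

    `dictionary` is a flat list with n_cols values per tuple and
    `codes[r]` selects the tuple for row r. No per-cell get/set calls.
    """
    n_tuples = len(dictionary) // n_cols
    # Sum each dictionary tuple once, then reuse it for every row.
    tuple_sums = [
        sum(dictionary[t * n_cols:(t + 1) * n_cols]) for t in range(n_tuples)
    ]
    for r, code in enumerate(codes):
        out[r] += tuple_sums[code]
    return out


def row_sums_ddc_kahan(dictionary, n_cols, codes, out, corr):
    """Same aggregate with Kahan compensation, which needs the extra `corr` buffer."""
    for r, code in enumerate(codes):
        for x in dictionary[code * n_cols:(code + 1) * n_cols]:
            y = x - corr[r]
            t = out[r] + y
            corr[r] = (t - out[r]) - y  # rounding error of this addition
            out[r] = t
    return out, corr
```

In this sketch the compensated version allocates two arrays per aggregate instead of one and does several extra floating-point operations per value.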

@Baunsgaard force-pushed the MMChainOptimization branch from 082b2c2 to de6574a on March 9, 2021 19:44
@Baunsgaard merged commit de6574a into apache:master on Mar 9, 2021
@github-pages bot temporarily deployed to github-pages on March 9, 2021 20:27 (inactive)
@Baunsgaard deleted the MMChainOptimization branch on March 30, 2021 08:48