[SYSTEMDS-2885] CLA MMChain Optimization by Baunsgaard · Pull Request #1197 · apache/systemds

Baunsgaard · 2021-03-08T10:11:00Z

This commit modifies our MMChain operation in CLA to use the matrix operations rather than the vector operations. Furthermore, where it is found to be prudent, the MMChain no longer decompresses.
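MMChain is the fused pattern t(X) %*% (w * (X %*% v)), or t(X) %*% (X %*% v) without weights. The following minimal dense Java sketch shows the two passes that chain performs, assuming a row-major double[] input. It does not reproduce the compressed CLA kernels, and the class and method names are hypothetical.

```java
// Hypothetical sketch, not SystemDS code: evaluates t(X) %*% (w * (X %*% v))
// on a dense row-major matrix X of size rows x cols.
final class MMChainSketch {
    private MMChainSketch() {}

    /** If w is null the weighting is skipped, giving t(X) %*% (X %*% v). */
    static double[] xtwxv(double[] x, int rows, int cols, double[] v, double[] w) {
        // Pass 1: q = X %*% v (one dot product per row), optionally scaled by w.
        double[] q = new double[rows];
        for (int r = 0; r < rows; r++) {
            double sum = 0;
            int off = r * cols;
            for (int c = 0; c < cols; c++)
                sum += x[off + c] * v[c];
            q[r] = (w == null) ? sum : sum * w[r];
        }
        // Pass 2: out = t(X) %*% q without materializing t(X):
        // each row of X, scaled by q[r], is accumulated into the output.
        double[] out = new double[cols];
        for (int r = 0; r < rows; r++) {
            int off = r * cols;
            double qr = q[r];
            for (int c = 0; c < cols; c++)
                out[c] += x[off + c] * qr;
        }
        return out;
    }
}
```

For X = [[1, 2], [3, 4]] and v = [1, 1], the unweighted chain yields [24, 34]. With w = [2, 1], it yields [27, 40].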

The right matrix multiplication is changed to first decompress a compressed overlapping input, since the dedicated decompression operation is more optimized than the decompression internal to the right matrix multiplication. This also gives a clearer view of where time is spent in the statistics output of the execution.
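The following sketch illustrates decompressing first and then running a plain dense right multiplication. It assumes a simplified, hypothetical DDC-like group layout: a list of output columns, a row-major dictionary of tuples, and a per-row mapping. "Overlapping" is treated here as groups whose contributions to the same cells must be summed. The names are illustrative and are not the CLA classes.

```java
// Hypothetical sketch, not SystemDS code.
final class OverlapDecompressSketch {
    private OverlapDecompressSketch() {}

    /** Simplified DDC-like group: row r takes the tuple dict[map[r]] over columns cols. */
    static final class Group {
        final int[] cols;    // output columns covered by this group
        final double[] dict; // dictionary tuples, row-major: nEntries x cols.length
        final int[] map;     // dictionary entry index for each row

        Group(int[] cols, double[] dict, int[] map) {
            this.cols = cols;
            this.dict = dict;
            this.map = map;
        }
    }

    /** Decompresses overlapping groups by summing every group's values into one dense block. */
    static double[] decompress(Group[] groups, int rows, int cols) {
        double[] out = new double[rows * cols];
        for (Group g : groups) {
            int nc = g.cols.length;
            for (int r = 0; r < rows; r++) {
                int dOff = g.map[r] * nc;
                int oOff = r * cols;
                for (int j = 0; j < nc; j++)
                    out[oOff + g.cols[j]] += g.dict[dOff + j]; // overlap: add, don't overwrite
            }
        }
        return out;
    }

    /** Dense right multiplication of a (rows x k) by b (k x n), both row-major. */
    static double[] multiply(double[] a, int rows, int k, double[] b, int n) {
        double[] c = new double[rows * n];
        for (int r = 0; r < rows; r++)
            for (int i = 0; i < k; i++) {
                double av = a[r * k + i];
                if (av == 0)
                    continue; // skip zero cells of the decompressed input
                for (int j = 0; j < n; j++)
                    c[r * n + j] += av * b[i * n + j];
            }
        return c;
    }
}
```

Separating the two steps mirrors the motivation above: the decompression and the multiplication each show up as their own cost.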

With these modifications, lmCG goes from ~250 sec to ~90 sec, while ULA takes 200 sec (unlike the paper, this run uses as many iterations as there are columns).

This commit adds an insertionSort abstraction for the construction of SDC colGroups. Previously, merging the index arrays of all dictionary entries increased the compression time by an order of magnitude whenever SDC groups were selected; now the difference is only a few ms. The abstraction allows us to implement more efficient insertion-tree structures for different cases down the line.
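The following Java sketch illustrates the kind of abstraction described above. Per-dictionary-entry row index arrays are inserted into a sorter, which then emits all non-default rows in ascending order together with their dictionary entry. The interface and class names are hypothetical. The implementation shown is a direct-addressed array (a counting-style scan over all rows), which is only one possible strategy behind such an abstraction and not necessarily the one used in this commit.

```java
// Hypothetical sketch, not SystemDS code.
final class SdcInsertionSketch {
    private SdcInsertionSketch() {}

    /** Collects (row, entry) pairs and emits them ordered by row. */
    interface InsertionSorter {
        void insert(int[] rows, int entry);
        int[] offsets(); // ascending row indices of all inserted rows
        int[] entries(); // dictionary entry per offset, aligned with offsets()
    }

    /**
     * Direct-addressed implementation: O(1) work per inserted row, and one
     * O(nRows) scan to produce sorted output, so no pairwise merging of the
     * per-entry index arrays is needed.
     */
    static final class DirectSorter implements InsertionSorter {
        private final int[] entryOf; // entryOf[row] = entry, or -1 if never inserted
        private int count = 0;

        DirectSorter(int nRows) {
            entryOf = new int[nRows];
            java.util.Arrays.fill(entryOf, -1);
        }

        @Override
        public void insert(int[] rows, int entry) {
            for (int r : rows) {
                if (entryOf[r] == -1)
                    count++;
                entryOf[r] = entry;
            }
        }

        @Override
        public int[] offsets() {
            int[] off = new int[count];
            for (int r = 0, i = 0; r < entryOf.length; r++)
                if (entryOf[r] != -1)
                    off[i++] = r;
            return off;
        }

        @Override
        public int[] entries() {
            int[] ent = new int[count];
            for (int r = 0, i = 0; r < entryOf.length; r++)
                if (entryOf[r] != -1)
                    ent[i++] = entryOf[r];
            return ent;
        }
    }
}
```

For example, with 6 rows, entry 0 at rows {1, 4} and entry 1 at rows {0, 5}, the sorter yields offsets {0, 1, 4, 5} and entries {1, 0, 0, 1}. Rows 2 and 3 are never inserted and implicitly take the default value.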

Row aggregates are usually slow in CLA, and this was further amplified by inefficient row aggregates in DDC that used "quick" get and set operations on MatrixBlocks. This commit removes this abstraction layer and works directly on the underlying double arrays. Furthermore, all Kahan error correction is removed from the compressed operations, reducing the memory allocated by aggregates by a factor of 2 to 3. This brought execution times on InfiniMNIST 1m to 2x slower than ULA for sparse inputs and on par with ULA for dense inputs. On BinaryMNIST 1m, row aggregates are now 10x faster (both sparse and dense).
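The following minimal sketch computes row sums for a DDC group directly on double arrays. It assumes a hypothetical layout: a row-major dictionary of tuples plus a per-row int mapping. Each tuple is summed once, and each row then needs a single lookup. No Kahan correction arrays are allocated, which saves memory at the cost of the extra floating-point accuracy that compensated summation provides. Whether the commit pre-aggregates tuples in exactly this way is not stated.

```java
// Hypothetical sketch, not SystemDS code.
final class DdcRowSumSketch {
    private DdcRowSumSketch() {}

    /**
     * Adds one group's row sums into out.
     * dict holds nEntries tuples of nCols values (row-major); map[r] selects row r's tuple.
     * Plain double addition is used, without Kahan compensation.
     */
    static void addRowSums(double[] dict, int nCols, int[] map, double[] out) {
        int nEntries = dict.length / nCols;
        // Pre-aggregate each dictionary tuple once.
        double[] tupleSums = new double[nEntries];
        for (int e = 0; e < nEntries; e++) {
            double s = 0;
            for (int c = 0; c < nCols; c++)
                s += dict[e * nCols + c];
            tupleSums[e] = s;
        }
        // Work directly on the arrays instead of per-cell get/set calls.
        for (int r = 0; r < map.length; r++)
            out[r] += tupleSums[map[r]];
    }
}
```

Calling addRowSums once per column group accumulates the full row sums of the compressed matrix into out.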

Baunsgaard force-pushed the MMChainOptimization branch from 96d7a52 to 082b2c2 March 9, 2021 15:35

Baunsgaard added 5 commits March 9, 2021 20:44

[MINOR] l2svm predict builtin

bc9880c

[MINOR] Repeat test argument in pom

de6574a

Baunsgaard force-pushed the MMChainOptimization branch from 082b2c2 to de6574a March 9, 2021 19:44

Baunsgaard merged commit de6574a into apache:master Mar 9, 2021

Baunsgaard deleted the MMChainOptimization branch March 30, 2021 08:48

Baunsgaard merged 5 commits into apache:master from Baunsgaard:MMChainOptimization