[SYSTEMDS-2748] TSMM Compressed Optimize by Baunsgaard · Pull Request #1177 · apache/systemds

Baunsgaard · 2021-02-03T09:23:20Z

No description provided.

This commit optimizes the transpose-self matrix multiplication (tsmm) in compressed space. The key upgrades are:

- We avoid calculating the lower triangle of the output (~50% improvement).
- The parallelization scheme gives every task an equal number of elements. Since only the upper triangle has to be calculated, each task is assigned a row together with its reflected row, mirrored about the middle row of the matrix. (No trailing tasks, giving full CPU utilization.)
- The diagonal is calculated without decompression, since the diagonal values can be computed by simply multiplying the dictionary entries. This makes tsmm on single-colGroup matrices many times faster than normal tsmm. (Exploits cocode.)
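The three ideas above can be illustrated with a minimal sketch on plain arrays. This is not the SystemDS implementation: the class `TsmmSketch`, its method names, and the dictionary layout (one array of distinct tuples plus an occurrence count per tuple) are assumptions made only for illustration. Task `t` computes output row `t` and its mirrored row `m-1-t`, which together always hold `m+1` upper-triangle cells, so tasks carry equal work (for odd `m`, the single middle task holds one row only).

```java
import java.util.stream.IntStream;

// Hypothetical illustration class; not part of SystemDS.
class TsmmSketch {

    /** Dense reference t(X) %*% X over all cells, used only for comparison. */
    static double[][] tsmmDense(double[][] x) {
        int m = x[0].length;
        double[][] out = new double[m][m];
        for (int i = 0; i < m; i++)
            for (int j = 0; j < m; j++)
                for (double[] row : x)
                    out[i][j] += row[i] * row[j];
        return out;
    }

    /** Fills output row i for columns j >= i only (upper triangle). */
    static void upperRow(double[][] x, double[][] out, int i) {
        int m = out.length;
        for (int j = i; j < m; j++) {
            double sum = 0;
            for (double[] row : x)
                sum += row[i] * row[j];
            out[i][j] = sum;
        }
    }

    /**
     * Upper-triangle tsmm. Task t handles row t (m - t cells) and the
     * mirrored row m-1-t (t + 1 cells), so paired tasks do equal work.
     * The lower triangle is left at zero.
     */
    static double[][] tsmmUpperPaired(double[][] x) {
        int m = x[0].length;
        double[][] out = new double[m][m];
        int tasks = (m + 1) / 2;
        IntStream.range(0, tasks).parallel().forEach(t -> {
            upperRow(x, out, t);
            int mirror = m - 1 - t;
            if (mirror != t) // the middle row of an odd-sized output is unpaired
                upperRow(x, out, mirror);
        });
        return out;
    }

    /**
     * Diagonal of t(X) %*% X for a single dictionary-encoded column group,
     * without materializing X. Assumed layout: dict[k] is the k-th distinct
     * tuple and counts[k] is the number of rows that map to it.
     */
    static double[] diagonalFromDictionary(double[][] dict, int[] counts) {
        int cols = dict[0].length;
        double[] diag = new double[cols];
        for (int k = 0; k < dict.length; k++)
            for (int c = 0; c < cols; c++)
                diag[c] += counts[k] * dict[k][c] * dict[k][c];
        return diag;
    }
}
```

In this sketch the diagonal costs one pass over the distinct tuples rather than over all rows, which is where the benefit for matrices with few distinct tuples would come from.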

Baunsgaard force-pushed the OptTransposeSelfCompress branch from 66e160b to ba1ba26 on February 4, 2021 09:57

Baunsgaard closed this Feb 4, 2021

Baunsgaard deleted the OptTransposeSelfCompress branch March 7, 2021 09:09

Baunsgaard wants to merge 1 commit into apache:master from Baunsgaard:OptTransposeSelfCompress
