[SYSTEMDS-2748] TSMM Compressed Optimize#1177

Closed
Baunsgaard wants to merge 1 commit into apache:master from Baunsgaard:OptTransposeSelfCompress

Baunsgaard (Contributor)

This commit optimizes the transpose-self matrix multiplication
(tsmm, X^T X) in compressed space.
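For reference, tsmm computes the m × m product X^T X of an n × m matrix X. The sketch below is plain Python over a dense list-of-rows matrix, not SystemDS code, and `tsmm_upper` is a hypothetical name. It illustrates the symmetry argument behind the first improvement: only entries with a ≤ b are accumulated, and the lower triangle is filled by mirroring.

```python
def tsmm_upper(X):
    """Compute X^T X for a dense matrix X given as a list of rows (sketch).

    Only the upper triangle (a <= b) is accumulated; the lower triangle is
    then copied from it, because (X^T X)[b][a] == (X^T X)[a][b].
    """
    m = len(X[0])
    out = [[0] * m for _ in range(m)]
    for row in X:
        for a in range(m):
            xa = row[a]
            if xa == 0:
                continue  # a zero in column a adds nothing to output row a
            for b in range(a, m):  # upper triangle only
                out[a][b] += xa * row[b]
    # Mirror the upper triangle into the lower triangle.
    for a in range(m):
        for b in range(a):
            out[a][b] = out[b][a]
    return out


print(tsmm_upper([[1, 2], [3, 0], [1, 2], [1, 2]]))  # [[12, 6], [6, 12]]
```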
There are a few key improvements:

- We avoid calculating the lower triangle of the output, since X^T X
  is symmetric and the lower triangle mirrors the upper one
  (~50% improvement).
- The parallelization scheme gives every task an equal number of
  elements. Since only the upper triangle has to be calculated, row i
  (0-indexed) of the m × m output holds m - i entries, so each task is
  assigned a row together with its reflection about the middle row
  (rows i and m - 1 - i), which together hold m + 1 entries.
  (No trailing tasks, giving full CPU utilization.)
- The diagonal is calculated without decompression, since the
  diagonal values can be computed by simply multiplying the dictionary
  entries, weighted by how often each entry occurs.
  This makes tsmm on matrices with a single column group many times
  faster than normal, uncompressed tsmm (exploiting co-coding).
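A minimal sketch of the other two ideas, under stated assumptions: the matrix is held as a single column group in which row r of X equals `dictionary[mapping[r]]`, and `reflected_row_tasks`, `upper_triangle_work` and `tsmm_single_colgroup` are hypothetical illustrations, not SystemDS APIs. The first pairs output row i with row m - 1 - i so every pair covers m + 1 upper-triangle entries. The last evaluates (X^T X)[a][b] = Σ_k count_k · D[k][a] · D[k][b] over the distinct dictionary tuples instead of over the n rows.

```python
from collections import Counter


def reflected_row_tasks(m):
    """Pair output row i with its reflection m - 1 - i (hypothetical scheme)."""
    tasks = [(i, m - 1 - i) for i in range(m // 2)]
    if m % 2 == 1:
        tasks.append((m // 2,))  # the middle row has no partner
    return tasks


def upper_triangle_work(task, m):
    """Number of upper-triangle entries in the rows of one task.

    Row i of the upper triangle covers columns i..m-1, i.e. m - i entries.
    """
    return sum(m - i for i in task)


def tsmm_single_colgroup(dictionary, mapping):
    """X^T X where row r of X is dictionary[mapping[r]] (all columns co-coded).

    Only the distinct tuples are visited, each weighted by its count, so the
    cost depends on the dictionary size rather than on the number of rows.
    """
    m = len(dictionary[0])
    counts = Counter(mapping)  # how often each dictionary tuple occurs
    out = [[0] * m for _ in range(m)]
    for k, c in counts.items():
        tup = dictionary[k]
        for a in range(m):
            for b in range(a, m):  # upper triangle only
                out[a][b] += c * tup[a] * tup[b]
    for a in range(m):
        for b in range(a):
            out[a][b] = out[b][a]  # mirror into the lower triangle
    return out


print(reflected_row_tasks(5))  # [(0, 4), (1, 3), (2,)]
print(tsmm_single_colgroup([[1, 2], [3, 0]], [0, 1, 0, 0]))  # [[12, 6], [6, 12]]
```

When a column group has far fewer distinct tuples than rows, the dictionary loop touches much less data than a pass over decompressed rows, which is the intuition behind the speedup described above.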
Baunsgaard force-pushed the OptTransposeSelfCompress branch from 66e160b to ba1ba26 on February 4, 2021 09:57
Baunsgaard closed this on Feb 4, 2021
Baunsgaard deleted the OptTransposeSelfCompress branch on March 7, 2021 09:09