
[SYSTEMDS-3037] CLA Spark support#1320

Closed
Baunsgaard wants to merge 32 commits into apache:master from Baunsgaard:CLASpark


@Baunsgaard
Contributor

This PR adds Spark support back for CLA, with various tests verifying that no crashes occur. Workload-aware Spark (SP) compression is left as further work; the current support covers the default compression instructions.

Currently, the main limitation is left multiplication: if the compressed side of a multiplication is smaller (in terms of cells), it is reblocked to be broadcast, and in that process all blocks in that matrix RDD are decompressed.
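
The size comparison behind this limitation can be sketched as follows. This is only an illustration of the rule stated above, not code from the PR: the class `BroadcastSketch` and its methods are hypothetical names, and the real decision in SystemDS may take more factors into account.

```java
// Hypothetical sketch of the rule described above; not the SystemDS API.
final class BroadcastSketch {

    // Number of cells in a matrix, failing loudly on overflow.
    static long cells(long rows, long cols) {
        return Math.multiplyExact(rows, cols);
    }

    // Assumption: the compressed side is chosen for broadcast when it has
    // strictly fewer cells than the other side. Per the description above,
    // that path reblocks the matrix and decompresses all of its blocks.
    static boolean compressedSideIsBroadcast(long compRows, long compCols,
                                             long otherRows, long otherCols) {
        return cells(compRows, compCols) < cells(otherRows, otherCols);
    }
}
```

With a compressed 1,000 x 10 input and an uncompressed 1,000,000 x 10 input, this check returns true, which is the case where the benefit of compression is lost.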

…ning

This commit contains a new package in the compression framework that
allows selection of cost models for compression.
One classical cost model is memory size; this commit adds
several more, most notably

MEMORY, W_TREE, HYBRID, DISTINCT.

Workload analysis now counts operations left and right.

- MEMORY is the memory-based optimization,
- W_TREE is based on the extracted WTree of instructions,
- HYBRID is a combination of MEMORY and W_TREE,
- DISTINCT is based on making the dictionaries as small as possible,
- LEFT_MATRIX_MULT is based on optimizing only for left matrix multiplications,
- Decompression is ... for decompression time,
- TSMM is for fast transpose-self matrix multiplication.

In addition, all CoCoding techniques now optimize
toward the given type of cost model, making it possible to use,
for instance, BIN_PACKING DISTINCT optimization, or HYBRID Brute Force,
etc.
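
A minimal sketch of how a pluggable cost model can steer candidate selection is shown below. All names here (`CostType`, `GroupStats`, `CostModelSketch`) and the statistics fields are hypothetical and chosen only for illustration; they are not the classes introduced by this commit. Only the four cost types named first are modeled, and treating HYBRID as an unweighted sum of the memory and workload costs is an assumption.

```java
import java.util.List;

// Hypothetical names throughout; this is not the SystemDS API.
enum CostType { MEMORY, W_TREE, HYBRID, DISTINCT }

// Illustrative statistics for one candidate column grouping (assumed fields).
final class GroupStats {
    final String name;
    final double memoryBytes;  // estimated compressed size
    final double workloadCost; // estimated cost of the workload's instructions
    final int distinct;        // number of dictionary entries

    GroupStats(String name, double memoryBytes, double workloadCost, int distinct) {
        this.name = name;
        this.memoryBytes = memoryBytes;
        this.workloadCost = workloadCost;
        this.distinct = distinct;
    }
}

final class CostModelSketch {

    // Maps a candidate to a single number under the selected cost model.
    static double cost(CostType type, GroupStats s) {
        switch (type) {
            case MEMORY:
                return s.memoryBytes;
            case W_TREE:
                return s.workloadCost;
            case HYBRID:
                // Assumption: an unweighted sum of the two estimates.
                return s.memoryBytes + s.workloadCost;
            case DISTINCT:
                return s.distinct;
            default:
                throw new IllegalArgumentException("Unknown cost type: " + type);
        }
    }

    // Returns the candidate with the lowest cost, or null for an empty list.
    static GroupStats cheapest(CostType type, List<GroupStats> candidates) {
        GroupStats best = null;
        for (GroupStats c : candidates) {
            if (best == null || cost(type, c) < cost(type, best)) {
                best = c;
            }
        }
        return best;
    }
}
```

Because the co-coding search only sees the number returned by `cost`, the same search strategy (bin packing, brute force, and so on) can be paired with any cost type, which is the combination the commit message describes.
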
…load tree and make a cost estimator for spark instructions that is serializable
@Baunsgaard
Contributor Author

Closing because of continued work in #1322.

@Baunsgaard Baunsgaard closed this Jun 23, 2021
@Baunsgaard Baunsgaard deleted the CLASpark branch July 1, 2021 13:32