[SYSTEMDS-3037] CLA Spark support #1320
Closed
Baunsgaard wants to merge 32 commits into apache:master
…ning

This commit contains a new package in the compression framework that allows selection of cost models for compression. The classical cost model is memory size; this commit adds several more, most notably MEMORY, W_TREE, HYBRID, and DISTINCT. Workload analysis now counts operations left and right.

- MEMORY is the memory-based optimization.
- W_TREE is based on the extracted WTree of instructions.
- HYBRID is a combination of MEMORY and W_TREE.
- DISTINCT is based on making the dictionaries as small as possible.
- LEFT_MATRIX_MULT is based on optimizing only for left matrix multiplications.
- Decompression is ... for decompression time.
- TSMM is for fast transpose-self matrix multiplication.

In addition, all co-coding techniques now optimize towards the given type of cost model, making it possible to run, for instance, BIN_PACKING with DISTINCT optimization, or brute force with HYBRID.
…load tree and make a cost estimator for spark instructions that is serializable
…e a jira task for it.
This was referenced Jun 21, 2021
Closing because of continued work in #1322.
This PR adds Spark support back for CLA, with various tests verifying that no crashes occur. Further work is to allow workload-aware SP compression; currently the support covers the default compression instructions.
Currently the main limitation is left multiplication: if the compressed side of a multiplication is smaller (in terms of cells), it is reblocked to be broadcast, and in that process all blocks in that matrix RDD are decompressed.
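To make the limitation concrete, here is a minimal sketch of the decision described above. It is not SystemDS code: `MatrixSide`, `plan_multiplication`, and the tie-breaking rule (equal cell counts broadcast the right side) are hypothetical names and assumptions chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatrixSide:
    """Hypothetical description of one operand of a matrix multiplication."""
    rows: int
    cols: int
    compressed: bool

    @property
    def cells(self) -> int:
        # Size "in terms of cells", as used in the description above.
        return self.rows * self.cols


def plan_multiplication(left: MatrixSide, right: MatrixSide) -> dict:
    """Choose the smaller operand (by cell count) as the broadcast side.

    Assumption: on a tie the right side is broadcast. The result also
    reports whether the choice forces decompression, which happens when
    the broadcast side is the compressed one.
    """
    broadcast = "left" if left.cells < right.cells else "right"
    chosen = left if broadcast == "left" else right
    return {
        "broadcast": broadcast,
        # Reblocking a compressed operand for broadcast decompresses its blocks.
        "decompresses": chosen.compressed,
    }


# Left multiplication: uncompressed left operand, compressed right operand.
dense_left = MatrixSide(rows=1_000_000, cols=1_000, compressed=False)
compressed_right = MatrixSide(rows=1_000, cols=10, compressed=True)
print(plan_multiplication(dense_left, compressed_right))
```

In this example the compressed operand has far fewer cells, so it is the one picked for broadcast, which is exactly the case where the compression benefit is lost.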