[SYSTEMDS-THESIS] Add a DP optimization for matrix chains with transposes #2465
Closed
Elmanjhg wants to merge 2 commits into
I added 5 DML scripts containing matrix multiplication chains with transposes, with comments that state the optimal rewrite plan.
…oses

This adds a new HOP rewrite rule, RewriteMatrixMultChainWithTransOptimization.java, to find the optimal execution plan for matrix multiplication chains containing transposes. Previously, these chains were optimized with a simple heuristic that always pushes transposes down, rewriting t(A %*% B) to t(B) %*% t(A). That plan is not optimal in some instances, especially with large matrices.

Example: R = t(A %*% B) %*% C with dimensions A = [16, 23], B = [23, 22], C = [16, 34].

The old rewrite class produces (t(B) %*% t(A)) %*% C, with costs:
t(B) -> 23*22 + t(A) -> 16*23 + t(B) %*% t(A) -> 22*23*16 + [...] %*% C -> 22*16*34 = 20938 FLOPs

The optimal plan is simply t(A %*% B) %*% C, with costs:
A %*% B -> 16*23*22 + t(A %*% B) -> 16*22 + [...] %*% C -> 22*16*34 = 20416 FLOPs

The difference grows with larger matrix dimensions.

To solve this, we applied a DP algorithm with a memo table that holds plans without transposes and plans for transposed subchains, and that determines for each subchain whether an algebraic transpose pushdown or a direct transpose operation is cheaper.

This also includes 24 automated DML test cases that assert intermediate HOP dimensions to validate optimal parenthesization and transpose placement.
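The cost comparison above can be reproduced with a short Python sketch. It is not the code from this PR: the function names (matmul_cost, transpose_cost, plan_pushdown, plan_direct, chain_costs) are hypothetical, and the cost model (m*k*n for a multiplication, rows*cols for a transpose) is simply read off the arithmetic in the description. The chain_costs function illustrates the memo-table idea under those assumptions by keeping two entries per subchain, one for the plain product and one for its transpose.

```python
from functools import lru_cache

def matmul_cost(m, k, n):
    # Assumed cost of an [m, k] %*% [k, n] product, as in the PR description.
    return m * k * n

def transpose_cost(rows, cols):
    # Assumed cost of transposing a [rows, cols] matrix.
    return rows * cols

def plan_pushdown(a, b, c):
    # Cost of (t(B) %*% t(A)) %*% C for shapes a, b, c given as (rows, cols).
    (ar, ac), (br, bc), (cr, cc) = a, b, c
    assert ac == br and ar == cr, "shapes must match t(A %*% B) %*% C"
    return (transpose_cost(br, bc) + transpose_cost(ar, ac)
            + matmul_cost(bc, br, ar) + matmul_cost(bc, ar, cc))

def plan_direct(a, b, c):
    # Cost of t(A %*% B) %*% C: multiply first, then transpose the result.
    (ar, ac), (br, bc), (cr, cc) = a, b, c
    assert ac == br and ar == cr, "shapes must match t(A %*% B) %*% C"
    return (matmul_cost(ar, ac, bc) + transpose_cost(ar, bc)
            + matmul_cost(bc, ar, cc))

def chain_costs(dims):
    # Matrix i of the chain has shape [dims[i], dims[i + 1]].
    # Returns (cheapest cost of the product P, cheapest cost of t(P)).
    n = len(dims) - 1

    @lru_cache(maxsize=None)
    def solve(i, j):
        t = transpose_cost(dims[i], dims[j + 1])
        if i == j:
            return 0, t  # a leaf is free as-is and costs one transpose otherwise
        # Last operation is a multiplication of two plain sub-results.
        mul_plain = min(
            solve(i, k)[0] + solve(k + 1, j)[0]
            + matmul_cost(dims[i], dims[k + 1], dims[j + 1])
            for k in range(i, j))
        # Pushdown: t(P) = t(right part) %*% t(left part).
        mul_trans = min(
            solve(k + 1, j)[1] + solve(i, k)[1]
            + matmul_cost(dims[j + 1], dims[k + 1], dims[i])
            for k in range(i, j))
        # Either result may also be obtained by transposing the other one.
        return min(mul_plain, mul_trans + t), min(mul_trans, mul_plain + t)

    return solve(0, n - 1)

A, B, C = (16, 23), (23, 22), (16, 34)
print(plan_pushdown(A, B, C))     # 20938
print(plan_direct(A, B, C))       # 20416
print(chain_costs([16, 23, 22]))  # (8096, 8448)
```

For the subchain A %*% B, the second entry (8448) is the cost of the direct transpose, which beats the pushdown cost of 8970. Adding the final multiplication with C (22*16*34 = 11968) gives the 20416 FLOPs stated above.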
LGTM - thanks for the patch @Elmanjhg. During the merge I resolved the merge conflict in the pom.xml (and reverted the new dependency), disabled the new flags, added the licenses to the Java and DML test files, moved the DML test files to rewrites, and fixed the formatting (tabs vs spaces) in the Java test.