[SYSTEMDS-THESIS] Add a DP optimization for matrix chains with transposes#2465

Closed
Elmanjhg wants to merge 2 commits into
apache:mainfrom
Elmanjhg:DPSizeRewrite

Conversation

@Elmanjhg
Contributor

@Elmanjhg Elmanjhg commented May 4, 2026

This adds a new HOP rewrite rule, RewriteMatrixMultChainWithTransOptimization.java, to find the optimal execution plan for matrix multiplication chains containing transposes. Previously, these chains were optimized with a simple heuristic that always pushes transposes down, rewriting t(A %*% B) to t(B) %*% t(A), which is not the optimal plan in some cases, especially with large matrices.

An example is R = t(A %*% B) %*% C with dimensions A = [16, 23], B = [23, 22], C = [16, 34].
The old rewrite class would evaluate it as (t(B) %*% t(A)) %*% C, with costs: t(B) -> 23*22 + t(A) -> 16*23 + t(B) %*% t(A) -> 22*23*16 + [...] %*% C -> 22*16*34 = 20938 FLOPs.
The optimal plan is simply t(A %*% B) %*% C, with costs: A %*% B -> 16*23*22 + t(A %*% B) -> 16*22 + [...] %*% C -> 22*16*34 = 20416 FLOPs. The difference grows with larger matrix dimensions.
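The arithmetic above can be checked with a few lines of Python. This is only a sketch of the cost model implied by the example (a product of an [m, k] and a [k, n] matrix costs m*k*n, and materializing the transpose of an [m, n] matrix costs m*n); the helper names `matmul_cost` and `transpose_cost` are made up for illustration and are not part of SystemDS.

```python
# Hypothetical helpers mirroring the cost model used in the example above.
def matmul_cost(m, k, n):
    """Cost of multiplying an [m, k] matrix with a [k, n] matrix."""
    return m * k * n

def transpose_cost(m, n):
    """Cost of materializing the transpose of an [m, n] matrix."""
    return m * n

A = (16, 23)
B = (23, 22)
C = (16, 34)

# Old heuristic: (t(B) %*% t(A)) %*% C
pushdown = (
    transpose_cost(*B)            # t(B): 23*22
    + transpose_cost(*A)          # t(A): 16*23
    + matmul_cost(22, 23, 16)     # t(B) %*% t(A)
    + matmul_cost(22, 16, 34)     # [...] %*% C
)

# Direct plan: t(A %*% B) %*% C
direct = (
    matmul_cost(16, 23, 22)       # A %*% B
    + transpose_cost(16, 22)      # t(A %*% B)
    + matmul_cost(22, 16, 34)     # [...] %*% C
)

print(pushdown, direct, pushdown - direct)  # 20938 20416 522
```

The final multiplication with C costs the same in both plans, so the whole difference comes from transposing one [16, 22] intermediate instead of two inputs.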

To solve this, we apply a DP algorithm with a memo table that holds both plans without transposes and plans with transposed subchains, and that computes whether an algebraic transpose pushdown or a direct transpose operation is cheaper.
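To illustrate the idea, here is a minimal Python sketch of a two-table DP of this kind. It is not the code of RewriteMatrixMultChainWithTransOptimization.java; the function name `chain_costs`, the two tables `plain` and `trans`, and the cost model (m*k*n per product, m*n per materialized transpose, including transposes of input matrices) are assumptions made for this example.

```python
def chain_costs(shapes):
    """Return (cost of M_0..M_{n-1}, cost of t(M_0..M_{n-1})) for a matrix chain.

    shapes is a list of (rows, cols) pairs with matching inner dimensions.
    Illustrative sketch only; the cost model is an assumption.
    """
    n = len(shapes)
    for left, right in zip(shapes, shapes[1:]):
        if left[1] != right[0]:
            raise ValueError("inner dimensions do not match")

    # plain[i][j]: cheapest cost to produce the product M_i..M_j
    # trans[i][j]: cheapest cost to produce its transpose t(M_i..M_j)
    plain = [[0] * n for _ in range(n)]
    trans = [[0] * n for _ in range(n)]
    for i in range(n):
        trans[i][i] = shapes[i][0] * shapes[i][1]  # transpose a single input

    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            rows, cols = shapes[i][0], shapes[j][1]
            # Multiply two untransposed sub-results.
            direct = min(
                plain[i][k] + plain[k + 1][j] + rows * shapes[k][1] * cols
                for k in range(i, j)
            )
            # Pushdown: t(L %*% R) = t(R) %*% t(L), built from transposed parts.
            pushed = min(
                trans[k + 1][j] + trans[i][k] + cols * shapes[k][1] * rows
                for k in range(i, j)
            )
            flip = rows * cols  # cost of one explicit transpose of the result
            plain[i][j] = min(direct, pushed + flip)
            trans[i][j] = min(pushed, direct + flip)

    return plain[0][n - 1], trans[0][n - 1]


# Example from this PR: t(A %*% B) %*% C
plain_ab, trans_ab = chain_costs([(16, 23), (23, 22)])
total = trans_ab + 22 * 16 * 34  # add the final product with C
print(plain_ab, trans_ab, total)  # 8096 8448 20416
```

For the subchain A %*% B, the sketch picks the direct transpose (8096 + 352 = 8448) over the pushdown (506 + 368 + 8096 = 8970), which matches the 20416 FLOPs plan described above once the product with C is added.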

This also includes 24 automated DML test cases asserting intermediate HOP dimensions to validate optimal parenthesization and transpose placement.

Elmanjhg added 2 commits March 3, 2026 20:07
I added 5 DML scripts containing matrix chain multiplications with transposes, with comments that state the optimal rewrite plan.
…oses

This adds a new HOP rewrite rule, RewriteMatrixMultChainWithTransOptimization.java, to find the optimal execution plan for matrix multiplication chains containing transposes. Previously, these chains were optimized with a simple heuristic that always pushes transposes down, rewriting t(A %*% B) to t(B) %*% t(A), which is not the optimal plan in some cases, especially with large matrices.

An example is R = t(A %*% B) %*% C with dimensions A = [16, 23], B = [23, 22], C = [16, 34].
The old rewrite class would evaluate it as (t(B) %*% t(A)) %*% C, with costs: t(B) -> 23*22 + t(A) -> 16*23 + t(B) %*% t(A) -> 22*23*16 + [...] %*% C -> 22*16*34 = 20938 FLOPs.
The optimal plan is simply t(A %*% B) %*% C, with costs: A %*% B -> 16*23*22 + t(A %*% B) -> 16*22 + [...] %*% C -> 22*16*34 = 20416 FLOPs. The difference grows with larger matrix dimensions.

To solve this, we apply a DP algorithm with a memo table that holds both plans without transposes and plans with transposed subchains, and that computes whether an algebraic transpose pushdown or a direct transpose operation is cheaper.

This also includes 24 automated DML test cases asserting intermediate HOP dimensions to validate optimal parenthesization and transpose placement.

@mboehm7
Contributor

mboehm7 commented May 4, 2026

LGTM - thanks for the patch @Elmanjhg. During the merge I resolved the merge conflict of the pom.xml (and reverted the new dependency), disabled the new flags, added the licenses to java and dml test files, moved the dml test files to rewrites, and fixed the formatting (tabs vs spaces) in the java test.

@mboehm7 mboehm7 closed this in b748091 May 4, 2026
@github-project-automation github-project-automation Bot moved this from In Progress to Done in SystemDS PR Queue May 4, 2026