[SYSTEMDS-3492] Lineage-based reuse of all RDDs #1777

phaniarnab · 2023-02-01T08:41:32Z

This patch enables reuse of RDDs produced by redundant Spark operations. A subset of operations is persisted in the executors, while the rest are only cached locally. Reusing even unpersisted RDDs allows Spark to apply its own optimizations and skip stages. In addition, this patch removes the compile-time flag that indicated reuse and instead reuses all RDDs. Local RDD caching is now disabled due to bugs.

The statistics line "LinCache Spark (Col/Loc/Dist): 16/2/2." indicates the number of reused collects/prefetches (16), local RDDs (2), and persisted RDDs (2).
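
To illustrate how the three counters relate to reuse, here is a minimal sketch of a cache keyed by a lineage string. It is not the SystemDS implementation: `LineageCache`, `Kind`, `get_or_compute`, and the lineage key `"tsmm(X)"` are hypothetical names chosen for this example, and the cached value is a plain Python list standing in for an RDD handle.

```python
from collections import Counter
from enum import Enum


class Kind(Enum):
    """Hypothetical entry categories mirroring the Col/Loc/Dist counters."""
    COLLECT = "Col"  # collected or prefetched result held by the driver
    LOCAL = "Loc"    # RDD handle kept locally, not persisted
    DIST = "Dist"    # RDD persisted in the executors


class LineageCache:
    """Toy cache: identical lineage strings map to one cached result."""

    def __init__(self):
        self._entries = {}      # lineage string -> (Kind, value)
        self.hits = Counter()   # number of reuses per Kind

    def get_or_compute(self, lineage, kind, compute):
        entry = self._entries.get(lineage)
        if entry is not None:
            cached_kind, value = entry
            self.hits[cached_kind] += 1  # redundant operation: reuse it
            return value
        value = compute()                # first occurrence: execute it
        self._entries[lineage] = (kind, value)
        return value

    def stats(self):
        return "LinCache Spark (Col/Loc/Dist): {}/{}/{}.".format(
            self.hits[Kind.COLLECT], self.hits[Kind.LOCAL], self.hits[Kind.DIST]
        )


cache = LineageCache()
calls = []


def tsmm():
    # Stand-in for an expensive Spark operation.
    calls.append("tsmm")
    return [[1.0]]


# The same operation is requested three times; only the first call runs it.
for _ in range(3):
    cache.get_or_compute("tsmm(X)", Kind.DIST, tsmm)

print(cache.stats())  # LinCache Spark (Col/Loc/Dist): 0/0/2.
```

In this sketch a counter is incremented only on a cache hit, so the numbers count reuses rather than cached entries, which matches the description of the statistic above.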


phaniarnab closed this in c93f9f9 Feb 1, 2023
