Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Avoid copying LogicalPlans / Exprs during OptimizerPasses #9708

Closed
wants to merge 10 commits into from

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Mar 20, 2024

Which issue does this PR close?

Related to #9637

Rationale for this change

@jayzhan211 @mustafasrepo and I have been discussing how to speed up planning by avoiding copying Exprs / LogicalPlans

What changes are included in this PR?

This PR is a prototype of a suggestion here #9637 (comment) to see if I can get a significant boost in Expr simplify

The theory is that if I can rerite the optimizer to take ownership of the plans they can be rewritten in place

If this works out, I'll turn this into an actual PR for review.

Update: this approach improves planning speed by 25%

Are these changes tested?

NA

Timings (main):

cargo bench   --bench sql_planner -- physical_plan_tpch_all

Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.4s, or reduce sample count to 60.
physical_plan_tpch_all  time:   [70.511 ms 71.102 ms 71.828 ms]
                        change: [-1.3186% +2.1648% +4.6865%] (p = 0.17 > 0.05)
                        No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) low mild
  1 (1.00%) high mild
  4 (4.00%) high severe

This branch

Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.4s, or reduce sample count to 90.
physical_plan_tpch_all  time:   [54.371 ms 54.512 ms 54.675 ms]
                        change: [-24.152% -23.333% -22.648%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

Are there any user-facing changes?

@github-actions github-actions bot added logical-expr Logical plan and expressions optimizer Optimizer rules core Core datafusion crate labels Mar 20, 2024
@@ -279,10 +294,10 @@ impl Optimizer {
/// invoking observer function after each call
pub fn optimize<F>(
&self,
plan: &LogicalPlan,
plan: LogicalPlan,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rewriting the optimizer passes to take in owned LogicalPlan is the key idea

I haven't plumbed it completely through yet, but the approach is looking promising

@alamb
Copy link
Contributor Author

alamb commented Mar 21, 2024

Update: I think this PR now works well enough to show this is a promising approach. Specifically, by just fixing SimplifyExprs to not copy plans around, the planning benchmark goes 25% faster. I am pretty sure if we fix common subexpr eliminate and a few other passes we could get the benchmark running 2x as fast

physical_plan_tpch_all  time:   [54.371 ms 54.512 ms 54.675 ms]
                        change: [-24.152% -23.333% -22.648%] (p = 0.00 < 0.05)
                        Performance has improved.

The code in this PR is atrocious, but I think works well enough to demonstrate the idea

The core problem is that when the LogicalPasses work, they copy the LogicalPlans many times (e.g. each call to new_with_exprs or new_with_children results in Copying the entire node and its embedded exprs (though not its children as they are Arcd)

cc @sadboy who I think has observed this before as well

@alamb alamb closed this Mar 21, 2024
@alamb alamb reopened this Mar 21, 2024
@alamb
Copy link
Contributor Author

alamb commented Mar 24, 2024

Continuing in #9780, so closing this one

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core datafusion crate logical-expr Logical plan and expressions optimizer Optimizer rules sqllogictest
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant