Stop copying LogicalPlan and Exprs in CommonSubexprEliminate (2-3% planning speed improvement) #10835

Merged (13 commits) on Jun 19, 2024

@alamb alamb commented Jun 8, 2024

Draft: this is still a draft as the performance doesn't really change -- I need to investigate more

Which issue does this PR close?

Closes #9873

Closes #9637 -- the last one of my planned PRs to make DataFusion planning faster by not copying so much

Note that we have more plans for CSE here: #10426

Rationale for this change

For low-latency use cases, the overhead of planning can be substantial, especially for relatively complex plans.

What changes are included in this PR?

  1. Rewrite CommonSubexprEliminate using the TreeNode API so that it no longer deep copies plans
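To illustrate the idea of rewriting owned plans instead of copying them, here is a minimal, self-contained sketch. The `Plan` enum, the `Transformed` struct, and `uppercase_predicates` are toy stand-ins invented for this example; they only mimic the shape of the TreeNode-style API and are not DataFusion's actual types.

```rust
// Toy stand-ins for illustration only; NOT DataFusion's real LogicalPlan.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan(String),
    Filter { predicate: String, input: Box<Plan> },
}

/// Hypothetical wrapper that carries the rewritten value plus a
/// "did anything change?" flag.
#[derive(Debug, PartialEq)]
struct Transformed<T> {
    data: T,
    transformed: bool,
}

impl<T> Transformed<T> {
    fn new(data: T, transformed: bool) -> Self {
        Self { data, transformed }
    }
}

/// Rewrites the plan by value: nodes are moved into the result and the
/// tree is never deep-copied with `clone()`.
fn uppercase_predicates(plan: Plan) -> Transformed<Plan> {
    match plan {
        // Leaf node: hand the same node back and report "unchanged".
        scan @ Plan::Scan(_) => Transformed::new(scan, false),
        Plan::Filter { predicate, input } => {
            // Move the child out of its Box, rewrite it, and re-box it.
            let child = uppercase_predicates(*input);
            let upper = predicate.to_uppercase();
            let changed = child.transformed || upper != predicate;
            Transformed::new(
                Plan::Filter {
                    predicate: upper,
                    input: Box::new(child.data),
                },
                changed,
            )
        }
    }
}

fn main() {
    let plan = Plan::Filter {
        predicate: "a > 1".to_string(),
        input: Box::new(Plan::Scan("t".to_string())),
    };
    let result = uppercase_predicates(plan);
    println!("{:?} (transformed = {})", result.data, result.transformed);
}
```

The `transformed` flag lets a caller tell a no-op apart from a real rewrite without comparing trees.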

Are these changes tested?

Functionally by existing CI

Performance tests show a 2% improvement for physical_plan_tpch_all and a 3% improvement for physical_plan_tpcds_all:


group                                         main                                   no-cse-copy
-----                                         ----                                   -----------
physical_plan_tpcds_all                       1.03   1218.3±6.43ms        ? ?/sec    1.00   1186.9±5.74ms        ? ?/sec
physical_plan_tpch_all                        1.02     83.5±0.88ms        ? ?/sec    1.00     81.9±0.89ms        ? ?/sec
Details

++ critcmp main no-cse-copy
group                                         main                                   no-cse-copy
-----                                         ----                                   -----------
logical_aggregate_with_join                   1.00  1005.3±23.99µs        ? ?/sec    1.01  1011.1±43.54µs        ? ?/sec
logical_plan_tpcds_all                        1.00    152.1±0.87ms        ? ?/sec    1.01    153.0±0.98ms        ? ?/sec
logical_plan_tpch_all                         1.01     16.8±0.23ms        ? ?/sec    1.00     16.7±0.15ms        ? ?/sec
logical_select_all_from_1000                  1.00     18.8±0.11ms        ? ?/sec    1.00     18.8±0.08ms        ? ?/sec
logical_select_one_from_700                   1.00    810.9±6.64µs        ? ?/sec    1.03   833.9±29.78µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.00   766.2±24.45µs        ? ?/sec    1.00   762.8±13.87µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.00   747.9±16.08µs        ? ?/sec    1.00   750.9±10.32µs        ? ?/sec
physical_plan_tpcds_all                       1.03   1218.3±6.43ms        ? ?/sec    1.00   1186.9±5.74ms        ? ?/sec
physical_plan_tpch_all                        1.02     83.5±0.88ms        ? ?/sec    1.00     81.9±0.89ms        ? ?/sec
physical_plan_tpch_q1                         1.00      4.5±0.03ms        ? ?/sec    1.25      5.6±0.04ms        ? ?/sec
physical_plan_tpch_q10                        1.00      4.0±0.04ms        ? ?/sec    1.03      4.1±0.09ms        ? ?/sec
physical_plan_tpch_q11                        1.03      3.5±0.02ms        ? ?/sec    1.00      3.4±0.03ms        ? ?/sec
physical_plan_tpch_q12                        1.00      2.7±0.04ms        ? ?/sec    1.02      2.7±0.02ms        ? ?/sec
physical_plan_tpch_q13                        1.01      2.0±0.02ms        ? ?/sec    1.00      2.0±0.02ms        ? ?/sec
physical_plan_tpch_q14                        1.00      2.4±0.02ms        ? ?/sec    1.02      2.4±0.01ms        ? ?/sec
physical_plan_tpch_q16                        1.02      3.4±0.04ms        ? ?/sec    1.00      3.3±0.05ms        ? ?/sec
physical_plan_tpch_q17                        1.06      3.3±0.02ms        ? ?/sec    1.00      3.1±0.04ms        ? ?/sec
physical_plan_tpch_q18                        1.02      3.7±0.02ms        ? ?/sec    1.00      3.6±0.03ms        ? ?/sec
physical_plan_tpch_q19                        1.03      5.4±0.07ms        ? ?/sec    1.00      5.2±0.04ms        ? ?/sec
physical_plan_tpch_q2                         1.06      7.2±0.06ms        ? ?/sec    1.00      6.8±0.06ms        ? ?/sec
physical_plan_tpch_q20                        1.09      4.3±0.18ms        ? ?/sec    1.00      3.9±0.05ms        ? ?/sec
physical_plan_tpch_q21                        1.07      5.8±0.07ms        ? ?/sec    1.00      5.4±0.04ms        ? ?/sec
physical_plan_tpch_q22                        1.04      3.1±0.03ms        ? ?/sec    1.00      3.0±0.05ms        ? ?/sec
physical_plan_tpch_q3                         1.00      2.9±0.02ms        ? ?/sec    1.05      3.1±0.02ms        ? ?/sec
physical_plan_tpch_q4                         1.03      2.2±0.02ms        ? ?/sec    1.00      2.1±0.02ms        ? ?/sec
physical_plan_tpch_q5                         1.00      4.1±0.03ms        ? ?/sec    1.02      4.2±0.04ms        ? ?/sec
physical_plan_tpch_q6                         1.00  1434.6±17.67µs        ? ?/sec    1.02  1467.0±10.84µs        ? ?/sec
physical_plan_tpch_q7                         1.06      5.2±0.03ms        ? ?/sec    1.00      4.9±0.05ms        ? ?/sec
physical_plan_tpch_q8                         1.05      6.6±0.07ms        ? ?/sec    1.00      6.3±0.05ms        ? ?/sec
physical_plan_tpch_q9                         1.06      5.1±0.06ms        ? ?/sec    1.00      4.8±0.15ms        ? ?/sec
physical_select_all_from_1000                 1.30     61.1±0.25ms        ? ?/sec    1.00     47.0±0.19ms        ? ?/sec
physical_select_one_from_700                  1.04      3.5±0.06ms        ? ?/sec    1.00      3.4±0.03ms        ? ?/sec
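
For readers unfamiliar with this output: the number in front of each timing appears to be that run's mean divided by the fastest mean for the same benchmark, so 1.00 marks the faster branch. The sketch below recomputes a few of those ratios from the means printed above. The `ratios` helper is hypothetical and is not part of critcmp.

```rust
/// Returns (main_ratio, branch_ratio): each time divided by the faster of
/// the two, rounded to two decimals as in the table above.
fn ratios(main_time: f64, branch_time: f64) -> (f64, f64) {
    let best = main_time.min(branch_time);
    let round2 = |x: f64| (x * 100.0).round() / 100.0;
    (round2(main_time / best), round2(branch_time / best))
}

fn main() {
    // (benchmark, mean on main, mean on no-cse-copy), all in milliseconds,
    // copied from the table above.
    let rows = [
        ("physical_plan_tpcds_all", 1218.3, 1186.9),
        ("physical_plan_tpch_all", 83.5, 81.9),
        ("physical_select_all_from_1000", 61.1, 47.0),
    ];
    for (name, main_time, branch_time) in rows {
        let (m, b) = ratios(main_time, branch_time);
        println!("{name}: main {m:.2}, no-cse-copy {b:.2}");
    }
}
```

Because the table prints rounded means, recomputing a ratio from them can differ from the printed ratio by 0.01 for the small per-query timings.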

Are there any user-facing changes?

Performance Improvements (see above)

@@ -870,37 +870,7 @@ impl LogicalPlan {
LogicalPlan::Filter { .. } => {
    assert_eq!(1, expr.len());
    let predicate = expr.pop().unwrap();

    // filter predicates should not contain aliased expressions so we remove any aliases
Contributor Author

This code is extracted into Filter::remove_aliases as it is needed for CSE.
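
As a rough picture of what stripping aliases from a filter predicate involves, here is a self-contained sketch on a toy expression type. The `Expr` enum and this `remove_aliases` function are invented for illustration and assume a much simpler expression tree than DataFusion's; the real method's behavior (for example around subqueries) is not modeled here.

```rust
// Toy expression type for illustration only; NOT DataFusion's Expr.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Alias { expr: Box<Expr>, name: String },
    And(Box<Expr>, Box<Expr>),
}

/// Removes every alias from the expression tree, taking the expression by
/// value so nothing is cloned. Returns the rewritten expression and whether
/// any alias was actually removed.
fn remove_aliases(expr: Expr) -> (Expr, bool) {
    match expr {
        // Drop the alias wrapper and keep rewriting what it wrapped.
        Expr::Alias { expr, .. } => {
            let (inner, _) = remove_aliases(*expr);
            (inner, true)
        }
        Expr::And(left, right) => {
            let (left, left_changed) = remove_aliases(*left);
            let (right, right_changed) = remove_aliases(*right);
            (
                Expr::And(Box::new(left), Box::new(right)),
                left_changed || right_changed,
            )
        }
        column @ Expr::Column(_) => (column, false),
    }
}

fn main() {
    let predicate = Expr::And(
        Box::new(Expr::Alias {
            expr: Box::new(Expr::Column("a".to_string())),
            name: "x".to_string(),
        }),
        Box::new(Expr::Column("b".to_string())),
    );
    let (rewritten, changed) = remove_aliases(predicate);
    println!("{rewritten:?} (changed = {changed})");
}
```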

@@ -173,9 +179,8 @@ impl CommonSubexprEliminate {
    &mut common_exprs,
)?;

let mut new_input = self
    .try_optimize(input, config)?
    .unwrap_or_else(|| input.clone());
Contributor Author

The whole point of this PR is to remove clones -- I'll try and point out example places such as this where the plans and/or expressions were being cloned
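
To make the cost of this pattern concrete, here is a small sketch comparing a borrowing API that returns `Option` (and so forces a clone when nothing changes) with an owned API that hands the input back. All names (`Plan`, `try_optimize`, `rewrite`, `optimize_borrowed`, `optimize_owned`) are hypothetical stand-ins, and the clone counter exists only to make the difference observable.

```rust
use std::cell::Cell;
use std::rc::Rc;

// Toy plan that counts how often it is cloned; NOT DataFusion's LogicalPlan.
#[derive(Debug)]
struct Plan {
    exprs: Vec<String>,
    clone_count: Rc<Cell<usize>>,
}

impl Clone for Plan {
    fn clone(&self) -> Self {
        // Record the (potentially expensive) copy in the shared counter.
        self.clone_count.set(self.clone_count.get() + 1);
        Plan {
            exprs: self.exprs.clone(),
            clone_count: Rc::clone(&self.clone_count),
        }
    }
}

/// Borrowing style: `None` means "nothing to optimize".
fn try_optimize(_plan: &Plan) -> Option<Plan> {
    None
}

/// Owned style: the input is moved in and handed back with a changed flag.
fn rewrite(plan: Plan) -> (Plan, bool) {
    (plan, false)
}

/// The caller must clone whenever the rule made no change.
fn optimize_borrowed(input: &Plan) -> Plan {
    try_optimize(input).unwrap_or_else(|| input.clone())
}

/// No clone is needed: the unchanged plan is simply returned.
fn optimize_owned(input: Plan) -> Plan {
    rewrite(input).0
}

fn main() {
    let counter = Rc::new(Cell::new(0));
    let plan = Plan {
        exprs: vec!["a + b".to_string()],
        clone_count: Rc::clone(&counter),
    };
    let _copy = optimize_borrowed(&plan);
    println!("clones after borrowed style: {}", counter.get());
    let _same = optimize_owned(plan);
    println!("clones after owned style: {}", counter.get());
}
```

In this sketch the borrowed style pays for one full copy in the no-op case, which is the case an optimizer rule hits most often.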

// WindowAggr: windowExpr=[[sum(c9) ORDER BY [c3 + c4] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW]]
// --WindowAggr: windowExpr=[[sum(c9) ORDER BY [c3 + c4] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW]]
// where, it is referred once by each `WindowAggr` (total of 2) in the plan.
let mut plan = LogicalPlan::Window(window.clone());
Contributor Author

clone removed

let Window {
    input, window_expr, ..
} = window;
plan = input.as_ref().clone();
Contributor Author

clone removed

.zip(arrays_list.iter())
.map(|(exprs, arrays)| {
    exprs
        .iter()
        .cloned()
Contributor Author

another clone

.iter()
.zip(aggr_expr.iter())
.map(|(new_expr, old_expr)| {
    new_expr.clone().alias_if_changed(old_expr.display_name()?)
Contributor Author

another

@peter-toth peter-toth mentioned this pull request Jun 15, 2024

alamb commented Jun 15, 2024

Update -- q1 gets significantly worse -- I'll try and profile it and figure out why


alamb commented Jun 17, 2024

UPDATE: I ran the wrong code. I am now rerunning...

@alamb alamb marked this pull request as ready for review June 17, 2024 13:02

alamb commented Jun 17, 2024

Update: after adding more tracking of when plans are actually rewritten, this PR now improves planning performance by 2-3%, so I think it is ready for review.

@alamb alamb changed the title Stop copying LogicalPlan and Exprs in CommonSubexprEliminate Stop copying LogicalPlan and Exprs in CommonSubexprEliminate (2-3% planning speed improvement) Jun 17, 2024

alamb commented Jun 18, 2024

I believe this PR has conflicts with #10939 which I am now resolving

Thank you @crepererum for your review


alamb commented Jun 18, 2024

@peter-toth would you like to review this PR as well?

@peter-toth peter-toth left a comment

Thanks for pinging me @alamb, LGTM, only minor suggestions.

/// Decimal128(Some(69999999999999),30,15)
/// AND lineitem.l_quantity < Decimal128(Some(2400),15,2)
/// ```
pub fn remove_aliases(predicate: Expr) -> Result<Transformed<Expr>> {
Contributor

I wonder if it makes sense to move this method into Expr?

Contributor Author

I think it does -- I will do so as a follow-on PR

if !common_exprs.is_empty() {
    new_input = build_common_expr_project_plan(new_input, common_exprs)?;
    transformed = true;
Contributor

I'm not sure we need this because if !common_exprs.is_empty() is true then rewrite_exprs.transformed must also be true a bit above.

Contributor Author

That is a good invariant -- I did not know that. Changed to an assert in 8dd6256
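
A sketch of what turning that invariant into an assertion might look like, using toy types. `Rewritten`, `rewrite_exprs`, `optimize`, and the `__common_1` placeholder are all invented for this example; the actual change is the one in the commit referenced above.

```rust
/// Hypothetical result of rewriting a node's expressions.
struct Rewritten {
    exprs: Vec<String>,
    transformed: bool,
}

/// Toy rewrite: replaces the sub-expression `a + b` with a placeholder and
/// records it in `common_exprs`. Any recorded common expression implies that
/// at least one expression was rewritten.
fn rewrite_exprs(exprs: Vec<String>, common_exprs: &mut Vec<String>) -> Rewritten {
    let mut transformed = false;
    let mut out = Vec::with_capacity(exprs.len());
    for expr in exprs {
        if expr.contains("a + b") {
            transformed = true;
            if common_exprs.is_empty() {
                common_exprs.push("a + b".to_string());
            }
            out.push(expr.replace("a + b", "__common_1"));
        } else {
            out.push(expr);
        }
    }
    Rewritten { exprs: out, transformed }
}

/// Returns (rewritten expressions, extracted common expressions, changed flag).
fn optimize(exprs: Vec<String>) -> (Vec<String>, Vec<String>, bool) {
    let mut common_exprs = Vec::new();
    let rewritten = rewrite_exprs(exprs, &mut common_exprs);
    if !common_exprs.is_empty() {
        // Check the invariant instead of setting the flag a second time.
        assert!(
            rewritten.transformed,
            "common expressions were extracted, so the expressions must have been rewritten"
        );
    }
    (rewritten.exprs, common_exprs, rewritten.transformed)
}

fn main() {
    let (exprs, common, transformed) =
        optimize(vec!["(a + b) * 2".to_string(), "c".to_string()]);
    println!("{exprs:?} {common:?} transformed = {transformed}");
}
```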

    Ok(Transformed::new(expr, false, TreeNodeRecursion::Jump))
}
Expr::Alias(_) => Ok(Transformed::new(
    expr.unalias(),
Contributor

@peter-toth peter-toth Jun 18, 2024

Since we know that expr is an Expr::Alias maybe we could just use Expr::Alias(alias) => *alias.expr instead of calling .unalias(), that matches the expr again.
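
The difference between the two forms can be sketched on a toy type. The `Expr` and `Alias` definitions below are simplified stand-ins assumed for this example, and `strip_via_unalias` / `strip_direct` are hypothetical names; only the shape of the suggestion is shown.

```rust
// Simplified stand-ins for illustration only; NOT DataFusion's definitions.
#[derive(Debug, Clone, PartialEq)]
struct Alias {
    expr: Box<Expr>,
    name: String,
}

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Alias(Alias),
}

impl Expr {
    /// Removes a top-level alias, if any, by matching on `self`.
    fn unalias(self) -> Expr {
        match self {
            Expr::Alias(alias) => *alias.expr,
            other => other,
        }
    }
}

/// First checks the variant, then calls `unalias`, which matches again.
fn strip_via_unalias(expr: Expr) -> Expr {
    if matches!(expr, Expr::Alias(_)) {
        expr.unalias()
    } else {
        expr
    }
}

/// Binds the alias in the pattern and moves the inner expression out of its
/// Box directly, so the variant is only matched once.
fn strip_direct(expr: Expr) -> Expr {
    match expr {
        Expr::Alias(alias) => *alias.expr,
        other => other,
    }
}

fn main() {
    let aliased = Expr::Alias(Alias {
        expr: Box::new(Expr::Column("a".to_string())),
        name: "x".to_string(),
    });
    println!("{:?}", strip_via_unalias(aliased.clone()));
    println!("{:?}", strip_direct(aliased));
}
```

Both functions return the same result; the direct form just avoids the redundant second match.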

Contributor Author

👍 Done in 550e445

@alamb alamb left a comment

Thanks @peter-toth

@alamb alamb merged commit c6b2efc into apache:main Jun 19, 2024
23 checks passed

alamb commented Jun 19, 2024

Thanks again everyone for all the reviews

@alamb alamb deleted the alamb/no-cse-copy branch June 19, 2024 14:13
xinlifoobar pushed a commit to xinlifoobar/datafusion that referenced this pull request Jun 22, 2024
…planning speed improvement) (apache#10835)

* Stop copying LogicalPlan and Exprs in `CommonSubexprEliminate`

* thread transformed

* Update unary to report transformed correctly

* Preserve through window transforms

* track aggregate

* Avoid re-computing Aggregate schema

* Update datafusion/optimizer/src/common_subexpr_eliminate.rs

* Avoid unecessary setting transform flat

* Cleanup unaliasing
Labels
logical-expr Logical plan and expressions optimizer Optimizer rules