simplify the `between` expr during logical plan optimization #3404

kmitchener · 2022-09-08T20:38:59Z

Which issue does this PR close?

Closes #3402.

Rationale for this change

Rewriting the `between` expression lets later optimizer passes simplify it further and push it down, resulting in better plans.
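For illustration, here is a minimal sketch of the kind of rewrite this enables, using a toy expression tree. The `Expr` and `Op` types and the `simplify_between` helper below are hypothetical and are not DataFusion's actual types or the code in this PR; they only show how `x BETWEEN low AND high` can be expanded into two comparisons that other rules can then handle independently.

```rust
// Toy expression tree for illustration only; NOT DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Int(i64),
    Between {
        expr: Box<Expr>,
        negated: bool,
        low: Box<Expr>,
        high: Box<Expr>,
    },
    Binary {
        left: Box<Expr>,
        op: Op,
        right: Box<Expr>,
    },
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    GtEq,
    LtEq,
    Lt,
    Gt,
    And,
    Or,
}

// Small constructor to keep the rewrite readable.
fn binary(left: Expr, op: Op, right: Expr) -> Expr {
    Expr::Binary {
        left: Box::new(left),
        op,
        right: Box::new(right),
    }
}

/// Rewrites `e BETWEEN low AND high` into `e >= low AND e <= high`,
/// and `e NOT BETWEEN low AND high` into `e < low OR e > high`.
/// Other expressions are traversed recursively and otherwise left unchanged.
fn simplify_between(expr: Expr) -> Expr {
    match expr {
        Expr::Between { expr, negated, low, high } => {
            let e = simplify_between(*expr);
            let low = simplify_between(*low);
            let high = simplify_between(*high);
            if negated {
                binary(binary(e.clone(), Op::Lt, low), Op::Or, binary(e, Op::Gt, high))
            } else {
                binary(binary(e.clone(), Op::GtEq, low), Op::And, binary(e, Op::LtEq, high))
            }
        }
        Expr::Binary { left, op, right } => {
            binary(simplify_between(*left), op, simplify_between(*right))
        }
        other => other,
    }
}
```

With this sketch, `p_size BETWEEN 1 AND 15` becomes `p_size >= 1 AND p_size <= 15`, so each half of the conjunction can be pushed down on its own.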

What changes are included in this PR?

Are there any user-facing changes?

datafusion/optimizer/src/simplify_expressions.rs

Dandandan · 2022-09-08T21:13:31Z

datafusion/core/tests/sql/predicates.rs

        "        CrossJoin: [l_partkey:Int64, l_quantity:Float64, p_partkey:Int64, p_brand:Utf8, p_size:Int32]",
        "          TableScan: lineitem projection=[l_partkey, l_quantity] [l_partkey:Int64, l_quantity:Float64]",
-        "          TableScan: part projection=[p_partkey, p_brand, p_size] [p_partkey:Int64, p_brand:Utf8, p_size:Int32]",
+        "          Filter: #part.p_size >= Int32(1) [p_partkey:Int64, p_brand:Utf8, p_size:Int32]",


This is a great optimization. The next goal is to get it to see `part.p_size <= 15` as well :D

Oof, that would be excellent. For another PR, if someone doesn't beat me to it :)

Or maybe sooner... it looks like the order of the projections changes between runs, which is causing the test failure here.

codecov-commenter · 2022-09-08T21:45:53Z

Codecov Report

Merging #3404 (82bdb61) into master (e6378f4) will decrease coverage by 0.09%.
The diff coverage is 83.33%.

@@            Coverage Diff             @@
##           master    #3404      +/-   ##
==========================================
- Coverage   85.58%   85.49%   -0.10%     
==========================================
  Files         296      296              
  Lines       54252    54328      +76     
==========================================
+ Hits        46432    46446      +14     
- Misses       7820     7882      +62

| Impacted Files | Coverage Δ | |
|---|---|---|
| datafusion/core/tests/sql/predicates.rs | `100.00% <ø> (ø)` | |
| datafusion/optimizer/src/simplify_expressions.rs | `82.83% <82.85%> (-0.24%)` | ⬇️ |
| datafusion/core/tests/sql/select.rs | `99.77% <100.00%> (ø)` | |
| benchmarks/src/bin/tpch.rs | `37.59% <0.00%> (-3.56%)` | ⬇️ |
| datafusion/physical-expr/src/planner.rs | `93.54% <0.00%> (-0.65%)` | ⬇️ |
| datafusion/proto/src/to_proto.rs | `48.25% <0.00%> (-0.64%)` | ⬇️ |
| datafusion/core/src/physical_plan/planner.rs | `76.87% <0.00%> (-0.58%)` | ⬇️ |
| datafusion/proto/src/logical_plan.rs | `17.46% <0.00%> (-0.24%)` | ⬇️ |
| datafusion/expr/src/logical_plan/plan.rs | `77.02% <0.00%> (-0.17%)` | ⬇️ |

... and 8 more

kmitchener · 2022-09-08T21:56:05Z

Putting in draft until I can figure out this bug that's triggered by this change. :/

Dandandan · 2022-09-09T05:13:58Z

> Putting in draft until I can figure out this bug that's triggered by this change. :/

A hash table or hash set commonly produces this kind of nondeterministic ordering. Maybe an optimization rule uses one, in which case it could be fixed there (i.e. by producing results in order of appearance or by sorting them).
You could run `EXPLAIN VERBOSE` to see which rule this is.
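To make the two suggested fixes concrete, here is a small standalone sketch. The helper names `dedup_in_order` and `sorted_unique` are hypothetical and not taken from the DataFusion code base. Both use a `HashSet` only for membership checks or deduplication and never rely on its iteration order for the final output.

```rust
use std::collections::HashSet;

/// Deduplicates `items`, keeping the first occurrence of each value in
/// order of appearance. The set is used only for membership checks, so
/// the output order does not depend on hash iteration order.
fn dedup_in_order(items: &[&str]) -> Vec<String> {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut out = Vec::new();
    for item in items {
        // `insert` returns false when the value was already present.
        if seen.insert(*item) {
            out.push((*item).to_string());
        }
    }
    out
}

/// Alternative: collect the unique values through a set, then sort them
/// so the result is the same on every run.
fn sorted_unique(items: &[&str]) -> Vec<String> {
    let unique: HashSet<&str> = items.iter().copied().collect();
    let mut out: Vec<String> = unique.into_iter().map(str::to_string).collect();
    out.sort();
    out
}
```

Iterating the `HashSet` directly would give the same elements, but the order could differ from run to run, which is the kind of flakiness seen in the plan output here.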

kmitchener · 2022-09-09T11:59:19Z

@Dandandan good advice, thank you :)

Dandandan

Looks great!

xudong963

LGTM, thanks @kmitchener . I believe the optimization can bring more potential benefits 👍

ursabot · 2022-09-09T12:31:27Z

Benchmark runs are scheduled for baseline = eaf1d46 and contender = 73447b5. 73447b5 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

kmitchener added 2 commits September 8, 2022 12:45

rewrite between expression so that it can be further optimized and pu…

b92bb57

…shed down

update tests

1f24afb

github-actions bot added the `core` (Core DataFusion crate) and `optimizer` (Optimizer rules) labels Sep 8, 2022

Dandandan reviewed Sep 8, 2022

datafusion/optimizer/src/simplify_expressions.rs Outdated

update for comment and test

82bdb61

Dandandan reviewed Sep 8, 2022

Dandandan approved these changes Sep 8, 2022

kmitchener marked this pull request as draft September 8, 2022 21:56

fix common_subexpr_eliminate to retain predictable ordering between runs

180b7cc

kmitchener marked this pull request as ready for review September 9, 2022 11:59

Dandandan approved these changes Sep 9, 2022

xudong963 approved these changes Sep 9, 2022

xudong963 merged commit 73447b5 into apache:master Sep 9, 2022

kmitchener deleted the simplify-between-expr branch September 9, 2022 12:42
