Skip to content

Conversation

@andygrove
Copy link
Member

@andygrove andygrove commented Aug 23, 2022

Which issue does this PR close?

Closes #3234

Rationale for this change

Optimizer rules re-write DistributeBy partitioning without any expressions without this fix.

What changes are included in this PR?

  • Handle Partitioning::DistributeBy
  • New integration test suite

Are there any user-facing changes?

@github-actions github-actions bot added the logical-expr Logical plan and expressions label Aug 23, 2022
..
}) => match partitioning_scheme {
Partitioning::Hash(expr, _) => expr.clone(),
_ => vec![],
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This catch-all is partly responsible for the bug creeping in, so I removed it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bites me all the time in my code base. Good find! I ran this PR against the test I was seeing the issue in and it resolved my issue.

@codecov-commenter
Copy link

codecov-commenter commented Aug 23, 2022

Codecov Report

Merging #3229 (662da4f) into master (c72f547) will decrease coverage by 0.00%.
The diff coverage is 82.69%.

@@            Coverage Diff             @@
##           master    #3229      +/-   ##
==========================================
- Coverage   85.83%   85.83%   -0.01%     
==========================================
  Files         291      293       +2     
  Lines       52947    53162     +215     
==========================================
+ Hits        45447    45629     +182     
- Misses       7500     7533      +33     
Impacted Files Coverage Δ
datafusion/expr/src/logical_plan/plan.rs 78.73% <50.00%> (+0.72%) ⬆️
datafusion/optimizer/tests/integration-test.rs 84.00% <84.00%> (ø)
datafusion/core/tests/provider_filter_pushdown.rs 70.45% <0.00%> (-14.48%) ⬇️
datafusion/expr/src/window_frame.rs 92.43% <0.00%> (-0.85%) ⬇️
datafusion/core/tests/sql/timestamp.rs 99.65% <0.00%> (-0.16%) ⬇️
datafusion/core/src/execution/context.rs 78.21% <0.00%> (-0.03%) ⬇️
...fusion/optimizer/src/pre_cast_lit_in_comparison.rs 83.33% <0.00%> (ø)
datafusion/expr/src/expr_fn.rs 91.06% <0.00%> (+0.05%) ⬆️
datafusion/core/src/dataframe.rs 88.91% <0.00%> (+0.05%) ⬆️
datafusion/proto/src/from_proto.rs 35.40% <0.00%> (+0.12%) ⬆️
... and 3 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@andygrove andygrove changed the title MINOR: Add support for Partitioning::DistributeBy in expressions function Add support for Partitioning::DistributeBy in expressions function Aug 23, 2022
@andygrove andygrove marked this pull request as ready for review August 23, 2022 13:58
@github-actions github-actions bot added the optimizer Optimizer rules label Aug 23, 2022
@andygrove
Copy link
Member Author

@jdye64 This is ready for review

Copy link
Contributor

@jdye64 jdye64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes fixed my issue and really like the inclusion of the integration tests that run through the optimizer. I think in general that is a good thing to have. Thanks @andygrove !

..
}) => match partitioning_scheme {
Partitioning::Hash(expr, _) => expr.clone(),
_ => vec![],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bites me all the time in my code base. Good find! I ran this PR against the test I was seeing the issue in and it resolved my issue.

@andygrove andygrove changed the title Add support for Partitioning::DistributeBy in expressions function Fix bug where optimizer was removing Partitioning::DistributeBy expressions Aug 23, 2022
@andygrove andygrove added the bug Something isn't working label Aug 23, 2022
@andygrove andygrove merged commit e1d4069 into apache:master Aug 23, 2022
@andygrove andygrove deleted the distribute-by-fix branch August 23, 2022 15:02
@ursabot
Copy link

ursabot commented Aug 23, 2022

Benchmark runs are scheduled for baseline = 74872a3 and contender = e1d4069. e1d4069 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working logical-expr Logical plan and expressions optimizer Optimizer rules

Projects

None yet

Development

Successfully merging this pull request may close these issues.

DISTRIBUTE BY expressions get removed during optimization

4 participants