API-break: Support SubqueryAlias and remove Alias in Projection #4333

Merged
merged 6 commits into from
Nov 27, 2022

jackwener
Member

@jackwener jackwener commented Nov 23, 2022

Which issue does this PR close?

closes #3927
closes #2212
closes #4291

Rationale for this change

Remove the alias from Projection and replace it with SubqueryAlias.

Some discussion in #4232.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

The Projection struct changes.
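To make the shape of the change concrete, here is a minimal sketch using a toy plan type. It is not DataFusion's real `LogicalPlan`: the `Plan` enum, its fields, and the helpers `project_with_alias` and `display_indent` are hypothetical names chosen for illustration. The only idea taken from this PR is that the projection node carries no alias and a separate `SubqueryAlias` node wraps it instead.

```rust
/// Toy stand-in for a logical plan (assumption: NOT DataFusion's real type).
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// Reads a named table.
    TableScan { table: String },
    /// Selects expressions; note that there is no alias field here.
    Projection { exprs: Vec<String>, input: Box<Plan> },
    /// Gives the output of `input` a new relation name.
    SubqueryAlias { alias: String, input: Box<Plan> },
}

/// Hypothetical helper: a "projection with alias" is expressed as a plain
/// projection wrapped in a `SubqueryAlias` node.
fn project_with_alias(input: Plan, exprs: Vec<String>, alias: Option<&str>) -> Plan {
    let projection = Plan::Projection { exprs, input: Box::new(input) };
    match alias {
        Some(name) => Plan::SubqueryAlias {
            alias: name.to_string(),
            input: Box::new(projection),
        },
        None => projection,
    }
}

/// Renders the plan as an indented tree, one node per line.
fn display_indent(plan: &Plan) -> String {
    fn walk(plan: &Plan, depth: usize, out: &mut String) {
        let pad = "  ".repeat(depth);
        match plan {
            Plan::TableScan { table } => {
                out.push_str(&format!("{}TableScan: {}\n", pad, table));
            }
            Plan::Projection { exprs, input } => {
                out.push_str(&format!("{}Projection: {}\n", pad, exprs.join(", ")));
                walk(input, depth + 1, out);
            }
            Plan::SubqueryAlias { alias, input } => {
                out.push_str(&format!("{}SubqueryAlias: {}\n", pad, alias));
                walk(input, depth + 1, out);
            }
        }
    }
    let mut out = String::new();
    walk(plan, 0, &mut out);
    out
}
```

In this sketch, calling `project_with_alias` with `Some("t3")` yields a tree whose root is `SubqueryAlias: t3` with the projection underneath, while passing `None` returns the bare projection.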

@github-actions github-actions bot added core Core datafusion crate logical-expr Logical plan and expressions optimizer Optimizer rules sql labels Nov 23, 2022
@jackwener
Member Author

GitHub's file diff rendering is noisy for this PR. It helps to enable "Hide whitespace".
Here is a link with that option enabled: https://github.com/apache/arrow-datafusion/pull/4333/files?&w=1

@jackwener
Member Author

jackwener commented Nov 23, 2022

#4293 covers part of this PR's work. It does not include removing the alias from Projection or fixing the tests.

@jackwener
Member Author

@jackwener jackwener changed the title API-break: Support SubqueryAlias and remove Projection-Alias API-break: Support SubqueryAlias and remove Alias in Projection Nov 23, 2022
@liukun4515
Contributor

PTAL, @Dandandan @alamb @andygrove @liukun4515 @mingmwang

Thanks @jackwener I will review it later

@alamb alamb added the api change Changes the API exposed to users of the crate label Nov 23, 2022
@alamb
Contributor

alamb commented Nov 23, 2022

I will review this but it may take a few days -- there are a bunch of other PRs in the queue before this one

@jackwener jackwener force-pushed the subquery_alias branch 2 times, most recently from 0ea7de4 to d18301b Compare November 24, 2022 14:57
@jackwener
Member Author

jackwener commented Nov 25, 2022

As a follow-up, we need to remove the Projection alias from proto. Does Ballista also need some changes to support this? cc @andygrove

Contributor

@alamb alamb left a comment

Thank you @jackwener -- I think this design is much clearer.

I reviewed the plan changes carefully and they looked reasonable to me

The only thing I am concerned about is the regression in supporting limit pushdown through subquery. Otherwise I think this PR could be merged.

For anyone else reviewing this PR, I found whitespace blind diff very helpful: https://github.com/apache/arrow-datafusion/pull/4333/files?w=1

@@ -474,7 +474,8 @@ mod tests {
let formatted = arrow::util::pretty::pretty_format_batches(&plan)
.unwrap()
.to_string();
assert!(formatted.contains("ParquetExec: limit=Some(10)"));
// TODO: limit_push_down support SubqueryAlias
Contributor

We should perhaps track this with a ticket -- it seems like it is a regression not to push limits into the subquery

Member Author

@jackwener jackwener Nov 26, 2022

I added a new issue to track this: #4381.

Yes, I'll follow up on this soon. I didn't do it in this PR because I didn't want to mix too many features into one big PR. It is not only limit pushdown: other rules also need to support SubqueryAlias, and I want to add that support for all of them together, with unit tests, so that it is easy to review.
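For readers unfamiliar with the regression being discussed, the following is a minimal sketch of what "pushing a limit through a SubqueryAlias" means. It uses a toy `Node` enum and a hypothetical `push_down_limit` function. Both are assumptions for illustration and do not reproduce DataFusion's actual optimizer rule or the fix tracked in #4381. The premise of the sketch is that an alias only renames a relation and never changes the number of rows, so a limit above it can be forwarded to the scan below.

```rust
/// Toy plan node (assumption: NOT DataFusion's real `LogicalPlan`).
#[derive(Debug, Clone, PartialEq)]
enum Node {
    /// Table scan with an optional row limit hint.
    TableScan { table: String, fetch: Option<usize> },
    /// Renames the relation produced by `input`.
    SubqueryAlias { alias: String, input: Box<Node> },
    /// Returns at most `fetch` rows of `input`.
    Limit { fetch: usize, input: Box<Node> },
}

/// Hypothetical rule: carry the tightest limit seen so far down the tree and
/// record it on the scan. `limit` is the limit inherited from the parents.
fn push_down_limit(plan: Node, limit: Option<usize>) -> Node {
    match plan {
        Node::Limit { fetch, input } => {
            // Combine with any limit from above; the smaller one wins.
            let combined = Some(limit.map_or(fetch, |outer| outer.min(fetch)));
            Node::Limit {
                fetch,
                input: Box::new(push_down_limit(*input, combined)),
            }
        }
        // The alias does not change the row count, so pass the limit through.
        Node::SubqueryAlias { alias, input } => Node::SubqueryAlias {
            alias,
            input: Box::new(push_down_limit(*input, limit)),
        },
        Node::TableScan { table, fetch } => {
            // Keep the tighter of the existing hint and the pushed-down limit.
            let fetch = match (fetch, limit) {
                (Some(a), Some(b)) => Some(a.min(b)),
                (a, b) => a.or(b),
            };
            Node::TableScan { table, fetch }
        }
    }
}
```

In this sketch the `Limit` node stays in place and the scan only receives a hint, so a rule that lacks the `SubqueryAlias` arm would simply stop at the alias and leave the scan without a limit.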

datafusion/expr/src/logical_plan/builder.rs
Comment on lines +338 to +339
" SubqueryAlias: d [a:Int64, b:Utf8]",
" SubqueryAlias: _data2 [a:Int64, b:Utf8]",
Member Author

Follow-up ticket: #4383

Comment on lines +1638 to +1640
" Left Join: t3.t1_int = t2.t2_int [t1_id:UInt32;N, t1_name:Utf8;N, t1_int:UInt32;N, t2_id:UInt32;N, t2_name:Utf8;N, t2_int:UInt32;N, t2_id:UInt32;N, t2_name:Utf8;N, t2_int:UInt32;N]",
" Filter: t3.t1_id < UInt32(100) [t1_id:UInt32;N, t1_name:Utf8;N, t1_int:UInt32;N, t2_id:UInt32;N, t2_name:Utf8;N, t2_int:UInt32;N]",
" SubqueryAlias: t3 [t1_id:UInt32;N, t1_name:Utf8;N, t1_int:UInt32;N, t2_id:UInt32;N, t2_name:Utf8;N, t2_int:UInt32;N]",
Member Author

@jackwener jackwener Nov 26, 2022

Filter: t3.t1_id < UInt32(100) is not pushed down.

#4381: push_down_filter needs to support SubqueryAlias.

Waiting for #4365.

This has been resolved in #4384.

@jackwener
Member Author

jackwener commented Nov 26, 2022

The only thing I am concerned about is the regression in supporting limit pushdown through subquery. Otherwise I think this PR could be merged.

The regression will be resolved in #4384, where you can see that it has been fixed.

@alamb
Contributor

alamb commented Nov 27, 2022

I merged this PR to the latest master branch locally to ensure it has no logical conflicts. Thanks again @jackwener

@alamb alamb merged commit ad3df7d into apache:master Nov 27, 2022
@ursabot

ursabot commented Nov 27, 2022

Benchmark runs are scheduled for baseline = da54fa5 and contender = ad3df7d. ad3df7d is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@jackwener jackwener deleted the subquery_alias branch November 28, 2022 16:00
Labels
api change Changes the API exposed to users of the crate core Core datafusion crate logical-expr Logical plan and expressions optimizer Optimizer rules sql
4 participants