-
Notifications
You must be signed in to change notification settings - Fork 29.1k
[SPARK-29805][SQL] Enable nested schema pruning and nested pruning on expressions by default #26443
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…_ENABLED NESTED_PRUNING_ON_EXPRESSIONS by default
|
In title it should be [SQL] instead of [Core]. |
|
Test build #113483 has finished for PR 26443 at commit
|
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM for 3.0.0.
Do you have any concern on this, @gatorsmile and @cloud-fan ?
| "executing unnecessary nested expressions.") | ||
| .booleanConf | ||
| .createWithDefault(false) | ||
| .createWithDefault(true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, these two spark.sql.optimizer.serializer.nestedSchemaPruning.enabled and spark.sql.optimizer.expression.nestedPruning.enabled were added as of Spark 3.0 (SPARK-26837 and SPARK-27707). I thought it's rather usual to have one minor release term.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is added as part of Spark 3.0, but on really early stage of Spark 3.0 development. We internally cherry picked them into 2.4.x in our production Spark distributions, and those help a lot in many nested column use-cases.
|
Thanks all for reviewing. Merged into master. |
|
I hope we can enable them in the preview release of Spark 3.0. The community can help us verify the quality. @jiangxb1987 Let us have one more preview release next month? |
|
Sounds good! |
|
Can we give the opportunity to another committer? That would be helpful for the Apache community growth, @gatorsmile and @jiangxb1987 . |
|
cc @wangyum , @viirya , @gengliangwang , @rdblue , @zhengruifeng |
|
Also, cc @HyukjinKwon and @holdenk if you are interested~ |
|
I assume Xingbo already has an environment for preview release. He can do it very quickly. |
|
@gatorsmile . That's not a good reason~ :) |
|
I do not care who do the release manager for preview. I only care whether it will delay the release of 3.0. I expect we will have one or two new Spark 3.0 preview releases. |
|
And, it's a good chance for the committers to involve the Apache Spark community more. |
|
We have only a few releases in one year, and the increment of Apache Spark committers is bigger than that. |
|
@dongjoon-hyun Please do not misunderstand my point. It took @jiangxb1987 more than two weeks for releasing Spark 3.0 preview. As long as the other committers can finish it very quickly, I am totally fine to do it. This is just like a new RC for Spark 3.0 preview. We need to release Spark 3.0 preview ASAP and make the community to try it and verify the fix. The quality of 3.0 release is our top priority. Hopefully, you agree on it. Doing the release manager is the labor work. Even if we have a new release manager for each RC, it will not grow the community I think. |
|
Is there a committer who is interested in learning the release process? If so I think a preview release is a great lower stakes than usual opportunity to have someone skill up. |
What changes were proposed in this pull request?
Enable nested schema pruning and nested pruning on expressions by default. We have been using those features in production in Apple for couple months with great success. For some jobs, we reduce the data reading by more than 8x and 21x faster in wall clock time.
Why are the changes needed?
Better performance.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Existing tests.