
Conversation

@dbtsai (Member) commented Nov 9, 2019

What changes were proposed in this pull request?

Enable nested schema pruning and nested pruning on expressions by default. We have been using these features in production at Apple for a couple of months with great success: for some jobs we cut the data read by more than 8x and saw 21x faster wall-clock time.

Why are the changes needed?

Better performance.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests.
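To make the claim concrete, here is a small standalone demo of what nested schema pruning buys (not part of this PR; the schema, path, and object names are made up for illustration): selecting one leaf of a struct from Parquet should shrink the scan's ReadSchema to just that leaf.

    import org.apache.spark.sql.SparkSession

    // Hypothetical nested schema used only for this demo.
    case class Name(first: String, last: String)
    case class Contact(id: Long, name: Name)

    object NestedPruningDemo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("nested-pruning-demo")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        val path = "/tmp/contacts_parquet" // hypothetical scratch path
        Seq(Contact(1L, Name("Ada", "Lovelace")))
          .toDF()
          .write.mode("overwrite").parquet(path)

        // With nested schema pruning enabled (the new default), the Parquet
        // scan's ReadSchema in the explain output should list only
        // name.first rather than the whole name struct.
        spark.read.parquet(path)
          .select($"name.first")
          .explain()

        spark.stop()
      }
    }

With pruning disabled, the same plan would instead show the full name struct in ReadSchema, i.e. both leaves would be read from disk.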

…_ENABLED NESTED_PRUNING_ON_EXPRESSIONS by default
@viirya (Member) commented Nov 9, 2019

In the title, it should be [SQL] instead of [Core].

@SparkQA commented Nov 9, 2019

Test build #113483 has finished for PR 26443 at commit 8d4e7d1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-29805] [Core] Enable nested schema pruning and nested pruning on expressions by default [SPARK-29805][Core] Enable nested schema pruning and nested pruning on expressions by default Nov 9, 2019
@dongjoon-hyun (Member) left a comment

+1, LGTM for 3.0.0.

Do you have any concerns about this, @gatorsmile and @cloud-fan?

@dbtsai dbtsai changed the title [SPARK-29805][Core] Enable nested schema pruning and nested pruning on expressions by default [SPARK-29805][SQL] Enable nested schema pruning and nested pruning on expressions by default Nov 9, 2019

      "executing unnecessary nested expressions.")
    .booleanConf
-   .createWithDefault(false)
+   .createWithDefault(true)
A reviewer (Member) commented:

Hm, these two configs, spark.sql.optimizer.serializer.nestedSchemaPruning.enabled and spark.sql.optimizer.expression.nestedPruning.enabled, were only added as of Spark 3.0 (SPARK-26837 and SPARK-27707). I thought it was rather usual to leave a new feature disabled for one minor release term.

@dbtsai (Member, Author) replied:

Yes, these were added as part of Spark 3.0, but at a really early stage of Spark 3.0 development. We internally cherry-picked them into 2.4.x in our production Spark distributions, and they help a lot in many nested-column use cases.
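For anyone who wants to compare against the old behavior after this change lands, the two config keys named above can simply be flipped back at runtime. A minimal sketch (the session setup and app name are illustrative; the spark.conf.set calls on these keys are the only part specific to this PR):

    import org.apache.spark.sql.SparkSession

    object DisableNestedPruning {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("disable-nested-pruning") // hypothetical app name
          .master("local[*]")
          .getOrCreate()

        // Opt out of the new defaults, restoring the pre-PR behavior.
        spark.conf.set("spark.sql.optimizer.serializer.nestedSchemaPruning.enabled", "false")
        spark.conf.set("spark.sql.optimizer.expression.nestedPruning.enabled", "false")

        // ... run the workload under the old behavior here ...

        spark.stop()
      }
    }

This is the session-level opt-out; setting the same keys in spark-defaults.conf would apply them to every session.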

@dbtsai (Member, Author) commented Nov 11, 2019

Thanks all for reviewing. Merged into master.

@dbtsai dbtsai closed this in a6a2748 Nov 11, 2019
@dbtsai dbtsai deleted the enableNestedSchemaPrunning branch November 11, 2019 23:05
@gatorsmile (Member):

I hope we can enable them in the preview release of Spark 3.0. The community can help us verify the quality.

@jiangxb1987 Shall we have one more preview release next month?

@jiangxb1987 (Contributor):

Sounds good!

@dongjoon-hyun (Member):

Can we give the opportunity to another committer? That would be helpful for Apache community growth, @gatorsmile and @jiangxb1987.

@dongjoon-hyun (Member) commented Nov 12, 2019

Also, cc @HyukjinKwon and @holdenk if you are interested~

@gatorsmile (Member):

I assume Xingbo already has an environment for the preview release. He can do it very quickly.

@dongjoon-hyun (Member) commented Nov 12, 2019

@gatorsmile, that's not a good reason~ :)
Actually, during the last two releases I also built the environment, and I still have it.

@gatorsmile (Member):

I do not care who does the release-manager work for the preview. I only care whether it will delay the 3.0 release. I expect we will have one or two more Spark 3.0 preview releases.

@dongjoon-hyun (Member) commented Nov 12, 2019

And it's a good chance for committers to get more involved in the Apache Spark community.
A PMC member should try to give committers more opportunities to grow into PMC members. That's the reason we waited for @jiangxb1987 to do that, and both of us know that @jiangxb1987 also learned during the process.

@dongjoon-hyun (Member):

We have only a few releases a year, and the number of new Apache Spark committers is larger than that.

@gatorsmile (Member):

@dongjoon-hyun Please do not misunderstand my point. It took @jiangxb1987 more than two weeks to release the Spark 3.0 preview. As long as another committer can finish it quickly, I am totally fine with it. This is just like a new RC for the Spark 3.0 preview.

We need to release the Spark 3.0 preview ASAP so the community can try it and verify the fixes. The quality of the 3.0 release is our top priority; hopefully you agree. Being release manager is labor work, and even if we had a new release manager for each RC, I don't think it would grow the community.

@holdenk (Contributor) commented Nov 13, 2019

Is there a committer who is interested in learning the release process? If so, I think a preview release is a great, lower-stakes-than-usual opportunity for someone to skill up.
