[SPARK-29721][SQL] Prune unnecessary nested fields from Generate without Project #27517
…out Project
### What changes were proposed in this pull request?
This patch proposes to prune unnecessary nested fields from Generate which has no Project on top of it.
### Why are the changes needed?
In Optimizer, we can prune nested columns from Project(projectList, Generate). However, unnecessary columns could still possibly be read in Generate, if no Project on top of it. We should prune it too.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
Unit test.
Closes apache#26978 from viirya/SPARK-29721.
Lead-authored-by: Liang-Chi Hsieh <liangchi@uber.com>
Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
This is a resubmission of SPARK-29721. I think the original patch missed a corner case from the beginning, and it was reverted. This patch handles that corner case. I will also add more tests here.
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
The fix seems good to me.
Test build #118117 has finished for PR 27517 at commit
retest this please
Test build #118132 has finished for PR 27517 at commit
Thank you for working on this again, but this is now
BTW, does this include @gatorsmile's test case from the previous PR?
@dongjoon-hyun Yes, this is for 3.1.0 now. The new test case was added here too.
retest this please
ec1ae43 to 0383592
Test build #118188 has finished for PR 27517 at commit
retest this please
Test build #118180 has finished for PR 27517 at commit
Test build #118192 has finished for PR 27517 at commit
Got it. Thank you for updating.
Test build #119110 has finished for PR 27517 at commit
@dongjoon-hyun @HyukjinKwon @cloud-fan More tests have been added. Please take another look when you have time. Thanks!
Test build #119378 has finished for PR 27517 at commit
retest this please
Test build #119786 has finished for PR 27517 at commit
retest this please
Test build #119789 has finished for PR 27517 at commit
@dongjoon-hyun @HyukjinKwon @cloud-fan Please take another look. Thanks!
retest this please
Test build #120152 has finished for PR 27517 at commit
retest this please
Test build #120159 has finished for PR 27517 at commit
Retest this please.
Test build #120471 has finished for PR 27517 at commit
+1, LGTM. Thank you, @viirya and @HyukjinKwon .
Merged to master for Apache Spark 3.1.0.
Thanks!
…out Project
### What changes were proposed in this pull request?
This patch proposes to prune unnecessary nested fields from Generate which has no Project on top of it.
### Why are the changes needed?
In Optimizer, we can prune nested columns from Project(projectList, Generate). However, unnecessary columns could still possibly be read in Generate, if no Project on top of it. We should prune it too.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
Unit test.
Closes apache#27517 from viirya/SPARK-29721-2.
Lead-authored-by: Liang-Chi Hsieh <liangchi@uber.com>
Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
What changes were proposed in this pull request?
This patch proposes to prune unnecessary nested fields from a Generate that has no Project on top of it.
Why are the changes needed?
In the Optimizer, we can already prune nested columns from Project(projectList, Generate). However, unnecessary nested columns could still be read by a Generate if there is no Project on top of it. We should prune them too.
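To make the idea concrete, here is a minimal toy sketch in plain Python. It is not Spark's implementation and uses none of Spark's APIs: the `Generate` dataclass, the `required_fields` helper, and the `contact.friends` column are all hypothetical names chosen for illustration. It only shows the bookkeeping involved: given the nested fields a generator reads and the child columns it passes through whole, work out which nested paths actually have to be read.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

# A nested field access such as `contact.friends` is modelled as a tuple of
# path segments: ("contact", "friends").
FieldPath = Tuple[str, ...]


@dataclass(frozen=True)
class Generate:
    """Hypothetical stand-in for a Generate plan node (not Spark's class)."""
    generator_inputs: List[FieldPath]  # nested fields the generator reads
    required_child_output: List[str]   # child columns passed through whole


def required_fields(node: Generate) -> Dict[str, Set[FieldPath]]:
    """Map each top-level column to the nested sub-paths that must be read.

    An empty sub-path () means the whole column is needed, so nothing
    underneath it can be pruned.
    """
    needed: Dict[str, Set[FieldPath]] = {}
    for path in node.generator_inputs:
        # path[0] is the top-level column, path[1:] the nested part.
        needed.setdefault(path[0], set()).add(path[1:])
    for col in node.required_child_output:
        # A column passed through unchanged is required in full.
        needed[col] = {()}
    for col, paths in needed.items():
        # If the whole column is needed, finer-grained paths are redundant.
        if () in paths:
            needed[col] = {()}
    return needed


# explode(contact.friends) with nothing else required from the child:
pruned = required_fields(Generate([("contact", "friends")], []))

# The same generator, but `contact` is also passed through whole:
unpruned = required_fields(Generate([("contact", "friends")], ["contact"]))
```

In the first case only the `friends` sub-field of `contact` is needed, so the remaining sub-fields are candidates for pruning. In the second, the whole `contact` column is still required and nothing can be dropped.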
Does this PR introduce any user-facing change?
No
How was this patch tested?
Unit test.