[SPARK-29721][SQL] Prune unnecessary nested fields from Generate without Project #27517

Closed
wants to merge 5 commits into from

Conversation

viirya
Member

@viirya viirya commented Feb 10, 2020

What changes were proposed in this pull request?

This patch proposes to prune unnecessary nested fields from a Generate that has no Project on top of it.

Why are the changes needed?

In the Optimizer, we can already prune nested columns from Project(projectList, Generate). However, when there is no Project on top of a Generate, unnecessary nested columns may still be read by the Generate. We should prune them in that case too.
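To make "pruning nested fields" concrete, here is a minimal, self-contained Scala sketch. It does not use Spark's Catalyst classes and is not the optimizer rule from this patch; the types IntT, StringT, StructT, ArrayT, Field and the helper NestedPruning.prune are hypothetical stand-ins. Given a nested schema and the field paths a query actually references, it keeps only those fields, descending through arrays the way a Generate such as explode does.

```scala
// Hypothetical toy schema model (NOT org.apache.spark.sql.types).
sealed trait DataType
case object IntT extends DataType
case object StringT extends DataType
final case class Field(name: String, dataType: DataType)
final case class StructT(fields: List[Field]) extends DataType
final case class ArrayT(elementType: DataType) extends DataType

object NestedPruning {
  // Keep only the fields reachable from the requested paths.
  // A path such as List("items", "price") means: field "items",
  // then (looking through any arrays) its nested field "price".
  def prune(dt: DataType, paths: List[List[String]]): DataType = dt match {
    case StructT(fields) =>
      val kept = fields.flatMap { f =>
        // Remaining path suffixes that start at this field.
        val sub = paths.collect { case head :: tail if head == f.name => tail }
        if (sub.isEmpty) None                    // field never referenced: drop it
        else if (sub.exists(_.isEmpty)) Some(f)  // whole field referenced: keep as-is
        else Some(f.copy(dataType = prune(f.dataType, sub)))
      }
      StructT(kept)
    case ArrayT(elem) => ArrayT(prune(elem, paths)) // arrays are transparent to paths
    case leaf => leaf
  }
}

// Usage: a table with `id` and an array of structs, where only items.price is read.
val schema = StructT(List(
  Field("id", IntT),
  Field("items", ArrayT(StructT(List(Field("name", StringT), Field("price", IntT)))))
))
val pruned = NestedPruning.prune(schema, List(List("items", "price")))
// pruned == StructT(List(Field("items", ArrayT(StructT(List(Field("price", IntT)))))))
```

In Spark, a pruned read schema is roughly what lets a columnar source such as Parquet skip unused nested fields; this sketch only models the schema side of that idea.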

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit test.

viirya and others added 2 commits February 9, 2020 21:20
…out Project

### What changes were proposed in this pull request?

This patch proposes to prune unnecessary nested fields from a Generate that has no Project on top of it.

### Why are the changes needed?

In the Optimizer, we can already prune nested columns from Project(projectList, Generate). However, when there is no Project on top of a Generate, unnecessary nested columns may still be read by the Generate. We should prune them in that case too.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Unit test.

Closes apache#26978 from viirya/SPARK-29721.

Lead-authored-by: Liang-Chi Hsieh <liangchi@uber.com>
Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
@viirya
Member Author

viirya commented Feb 10, 2020

This is a resubmission of SPARK-29721. I think the original patch did not consider a corner case at the beginning, and it was reverted. This patch adds handling for that corner case.

I will also add more tests here.

cc @dongjoon-hyun

@HyukjinKwon
Member

The fix seems good to me.

@SparkQA

SparkQA commented Feb 10, 2020

Test build #118117 has finished for PR 27517 at commit f622d53.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

retest this please

@SparkQA

SparkQA commented Feb 10, 2020

Test build #118132 has finished for PR 27517 at commit f622d53.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

dongjoon-hyun commented Feb 10, 2020

Thank you for working on this again, but this is now targeted at 3.1.0. Please don't land this in branch-3.0.

@dongjoon-hyun
Member

BTW, does this include @gatorsmile's test case from the previous PR?

@viirya
Member Author

viirya commented Feb 10, 2020

@dongjoon-hyun Yes, this is for 3.1.0 now. The new test case is also included here.

@viirya
Member Author

viirya commented Feb 10, 2020

retest this please

@SparkQA

SparkQA commented Feb 11, 2020

Test build #118188 has finished for PR 27517 at commit 0383592.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented Feb 11, 2020

retest this please

@SparkQA

SparkQA commented Feb 11, 2020

Test build #118180 has finished for PR 27517 at commit f622d53.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 11, 2020

Test build #118192 has finished for PR 27517 at commit 0383592.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Got it. Thank you for updating.

@SparkQA

SparkQA commented Feb 29, 2020

Test build #119110 has finished for PR 27517 at commit 3928eb0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya viirya changed the title [WIP][SPARK-29721][SQL] Prune unnecessary nested fields from Generate without Project [SPARK-29721][SQL] Prune unnecessary nested fields from Generate without Project Mar 5, 2020
@viirya
Member Author

viirya commented Mar 5, 2020

@dongjoon-hyun @HyukjinKwon @cloud-fan More tests have been added now. Please take another look when you have time. Thanks!

@SparkQA

SparkQA commented Mar 5, 2020

Test build #119378 has finished for PR 27517 at commit 289355f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented Mar 14, 2020

retest this please

@SparkQA

SparkQA commented Mar 14, 2020

Test build #119786 has finished for PR 27517 at commit 289355f.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member

maropu commented Mar 14, 2020

retest this please

@SparkQA

SparkQA commented Mar 14, 2020

Test build #119789 has finished for PR 27517 at commit 289355f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented Mar 16, 2020

@dongjoon-hyun @HyukjinKwon @cloud-fan Please take another look. Thanks!

@viirya
Member Author

viirya commented Mar 22, 2020

retest this please

@SparkQA

SparkQA commented Mar 22, 2020

Test build #120152 has finished for PR 27517 at commit 289355f.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented Mar 22, 2020

retest this please

@SparkQA

SparkQA commented Mar 22, 2020

Test build #120159 has finished for PR 27517 at commit 289355f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Retest this please.

@SparkQA

SparkQA commented Mar 27, 2020

Test build #120471 has finished for PR 27517 at commit 289355f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. Thank you, @viirya and @HyukjinKwon.
Merged to master for Apache Spark 3.1.0.

cc @dbtsai and @holdenk

@viirya
Member Author

viirya commented Mar 27, 2020

Thanks!

sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
…out Project

### What changes were proposed in this pull request?

This patch proposes to prune unnecessary nested fields from a Generate that has no Project on top of it.

### Why are the changes needed?

In the Optimizer, we can already prune nested columns from Project(projectList, Generate). However, when there is no Project on top of a Generate, unnecessary nested columns may still be read by the Generate. We should prune them in that case too.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Unit test.

Closes apache#27517 from viirya/SPARK-29721-2.

Lead-authored-by: Liang-Chi Hsieh <liangchi@uber.com>
Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@viirya viirya deleted the SPARK-29721-2 branch December 27, 2023 18:23
5 participants