
[SPARK-38285][SQL] Avoid generator pruning for invalid extractor #35749

Closed
wants to merge 2 commits into from


@viirya (Member) commented Mar 7, 2022

### What changes were proposed in this pull request?

This fixes a bug in generator nested column pruning. The bug occurs when an extractor on the generator output has the form `GetArrayStructFields(GetStructField(...), ...)`. If the input to the generator is an array, the pruning logic rewrites this into an extractor on that input, `GetArrayStructFields(GetArrayStructFields(...), ...)`, which is not valid: the inner extractor now returns an array of arrays of structs, not the array of structs that the outer `GetArrayStructFields` expects.
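To make the failure concrete, here is a minimal Scala sketch built on a simplified, hypothetical model of Catalyst types and extractors; `DType`, `Attr`, `extractValue`, and `replaceOutput` are illustrative stand-ins, not Spark APIs. It assumes `o` has type `array<struct<b: array<struct<e: string>>>>`, one schema under which `eo.b.e` on `explode(o) AS eo` takes the pattern above, and it assumes the pruning step re-resolves each `GetStructField` by its new child's type while keeping other extractor nodes as they are.

```scala
// Hypothetical, simplified model of Catalyst types and extractors (not Spark's API).
sealed trait DType
case object StringT extends DType
final case class ArrayT(element: DType) extends DType
final case class StructT(fields: Map[String, DType]) extends DType

sealed trait Expr { def dataType: DType }
final case class Attr(name: String, dataType: DType) extends Expr

// Like GetStructField: struct -> value of one field.
final case class GetStructField(child: Expr, field: String) extends Expr {
  def dataType: DType = child.dataType match {
    case StructT(fs) => fs(field)
    case t           => throw new IllegalArgumentException(s"GetStructField on $t")
  }
}

// Like GetArrayStructFields: array<struct> -> array of one field.
final case class GetArrayStructFields(child: Expr, field: String) extends Expr {
  def dataType: DType = child.dataType match {
    case ArrayT(StructT(fs)) => ArrayT(fs(field))
    case t                   => throw new IllegalArgumentException(s"GetArrayStructFields on $t")
  }
}

object Rewrite {
  // Picks an extractor from the child's type, in the spirit of ExtractValue.
  def extractValue(child: Expr, field: String): Expr = child.dataType match {
    case _: StructT         => GetStructField(child, field)
    case ArrayT(_: StructT) => GetArrayStructFields(child, field)
    case t                  => throw new IllegalArgumentException(s"cannot extract $field from $t")
  }

  // Assumed pruning step: swap the generator output for the generator's array input.
  // GetStructField is re-resolved by type; GetArrayStructFields is kept as is.
  def replaceOutput(e: Expr, genOut: Attr, genIn: Attr): Expr = e match {
    case a: Attr if a == genOut     => genIn
    case GetStructField(c, f)       => extractValue(replaceOutput(c, genOut, genIn), f)
    case GetArrayStructFields(c, f) => GetArrayStructFields(replaceOutput(c, genOut, genIn), f)
    case other                      => other
  }
}

object Example {
  // Assumed schema: o is array<struct<b: array<struct<e: string>>>>, eo is one element.
  val inner: DType = StructT(Map("e" -> StringT))
  val elem: DType  = StructT(Map("b" -> ArrayT(inner)))
  val o  = Attr("o", ArrayT(elem))
  val eo = Attr("eo", elem)

  // eo.b.e on the generator output: GetArrayStructFields(GetStructField(eo, b), e)
  val before: Expr = GetArrayStructFields(GetStructField(eo, "b"), "e")
  val after: Expr  = Rewrite.replaceOutput(before, eo, o)
}
```

In this model `Example.before.dataType` is `ArrayT(StringT)`, while `Example.after` becomes `GetArrayStructFields(GetArrayStructFields(o, "b"), "e")` and evaluating its `dataType` throws, because the inner extractor yields `ArrayT(ArrayT(...))`.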

### Why are the changes needed?

To fix a bug in generator nested column pruning.

### Does this PR introduce _any_ user-facing change?

Yes, fixing a user-facing bug.

### How was this patch tested?

Added a unit test.

@github-actions bot added the SQL label on Mar 7, 2022
```scala
withTempView("v1") {
  val sqlText =
    """
      |create or replace temp view v1 as
```
Member: nit. Shall we capitalize the SQL keywords? :)

Member Author (@viirya): yea

```scala
      |""".stripMargin
  sql(sqlText)

  val df = sql("select eo.b.e from (select explode(o) as eo from v1)")
```
Member: ditto.

```
@@ -372,6 +372,13 @@ object GeneratorNestedColumnAliasing {
e.withNewChildren(Seq(extractor))
}

val invalidExtractor = rewrittenG.generator.children.head.collect {
```
Member: Could you add some comments for this logic?

Member Author (@viirya): Sure. I will add some.
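The diff excerpt above shows only the first line of the new check on the rewritten generator input. Going by the PR title and description, the idea is to detect the invalid `GetArrayStructFields(GetArrayStructFields(...), ...)` pattern and skip pruning when it appears. Below is a minimal Scala sketch of that idea; the tree types, `collect`, `hasInvalidExtractor`, and `choose` are hypothetical stand-ins (not Catalyst's `TreeNode` API), and falling back to the original, unpruned generator input is an assumption drawn from the title.

```scala
// Hypothetical minimal expression tree; node names only mirror Catalyst's.
sealed trait Expr { def children: Seq[Expr] }
final case class Attr(name: String) extends Expr {
  def children: Seq[Expr] = Nil
}
final case class GetStructField(child: Expr, field: String) extends Expr {
  def children: Seq[Expr] = Seq(child)
}
final case class GetArrayStructFields(child: Expr, field: String) extends Expr {
  def children: Seq[Expr] = Seq(child)
}

object PruningGuard {
  // Pre-order traversal that keeps every node the partial function accepts,
  // similar in spirit to Catalyst's TreeNode.collect.
  def collect[B](e: Expr)(pf: PartialFunction[Expr, B]): Seq[B] =
    pf.lift(e).toList ++ e.children.flatMap(c => collect(c)(pf))

  // True if any GetArrayStructFields sits directly on another GetArrayStructFields.
  def hasInvalidExtractor(generatorInput: Expr): Boolean =
    collect[Expr](generatorInput) {
      case g @ GetArrayStructFields(_: GetArrayStructFields, _) => g
    }.nonEmpty

  // Use the pruned generator input only if it is free of the invalid pattern;
  // otherwise fall back to the original, unpruned input.
  def choose(original: Expr, pruned: Expr): Expr =
    if (hasInvalidExtractor(pruned)) original else pruned
}
```

This sketch matches on tree shape alone, so it flags every directly nested `GetArrayStructFields` without inspecting types; that keeps the guard simple at the cost of possibly skipping pruning in some cases that would have been valid.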

@dongjoon-hyun changed the title from "[SPARK-38285][SQL] Avoid generator runing for invalid extractor" to "[SPARK-38285][SQL] Avoid generator pruning for invalid extractor" on Mar 7, 2022
@dongjoon-hyun (Member)

+1, LGTM. Thank you, @viirya!

@viirya (Member Author) commented Mar 7, 2022

Thank you @dongjoon-hyun!

@viirya (Member Author) commented Mar 7, 2022

Merging to master/3.2.

@viirya closed this in 71991f7 on Mar 7, 2022
@viirya added a commit that referenced this pull request on Mar 7, 2022
Closes #35749 from viirya/SPARK-38285.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
(cherry picked from commit 71991f7)
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
@viirya deleted the SPARK-38285 branch on March 7, 2022 20:05
kazuyukitanimura pushed a commit to kazuyukitanimura/spark that referenced this pull request Aug 10, 2022
Closes apache#35749 from viirya/SPARK-38285.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
(cherry picked from commit 71991f7)
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
(cherry picked from commit 8cee32d)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>