[SPARK-37855][SQL] IllegalStateException when transforming an array inside a nested struct #35170

Closed

ulysses-you
Contributor

@ulysses-you ulysses-you commented Jan 11, 2022

What changes were proposed in this pull request?

Skip aliasing any ExtractValue whose children contain a NamedLambdaVariable.

Why are the changes needed?

Since #32773, NamedLambdaVariable produces references. As a side effect, the rule NestedColumnAliasing now tries to alias an ExtractValue that contains a NamedLambdaVariable. This fails because a NamedLambdaVariable cannot be matched to an actual attribute.

In more detail:
During NestedColumnAliasing#replaceWithAliases, the references of each nested field are matched against the output attributes of the grandchildren. However, a NamedLambdaVariable is created by the analyzer as a virtual attribute and is not resolved from the output of any child. So when the references of a NamedLambdaVariable are matched against the grandchildren's output, no attribute is found.
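
The failure mode described above can be illustrated with a small, self-contained toy model. This is only a sketch in plain Python, not Spark's Catalyst code: the classes `Attribute`, `LambdaVariable`, and `GetField` and the functions `collect_nested_fields` and `replace_with_aliases` are hypothetical stand-ins for `AttributeReference`, `NamedLambdaVariable`, `ExtractValue`, and the collection and replacement steps of `NestedColumnAliasing`. The `RuntimeError` stands in for the `IllegalStateException` in the title, and the alias naming is made up.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Attribute:
    """A column produced by a child plan (stand-in for AttributeReference)."""
    name: str


@dataclass(frozen=True)
class LambdaVariable:
    """A lambda argument such as `x` in transform(arr, x -> ...).

    It is created during analysis and never appears in any plan's output.
    """
    name: str


@dataclass(frozen=True)
class GetField:
    """A field extraction (stand-in for an ExtractValue)."""
    child: Union["GetField", Attribute, LambdaVariable]
    field: str


def root(expr):
    """Walk down to the leaf that a chain of field extractions reads from."""
    while isinstance(expr, GetField):
        expr = expr.child
    return expr


def collect_nested_fields(exprs, skip_lambda):
    """Pick the field extractions that are candidates for aliasing.

    With skip_lambda=True, extractions rooted at a lambda variable are
    left alone, which models the fix proposed in this PR.
    """
    return [
        e for e in exprs
        if isinstance(e, GetField)
        and not (skip_lambda and isinstance(root(e), LambdaVariable))
    ]


def replace_with_aliases(nested_fields, grandchild_output):
    """Match each nested field's root reference against the grandchild output."""
    aliases = {}
    for i, nested_field in enumerate(nested_fields):
        ref = root(nested_field)
        if ref not in grandchild_output:
            # Stand-in for the IllegalStateException: nothing to match against.
            raise RuntimeError(f"cannot match {ref} to an output attribute")
        aliases[nested_field] = f"_extract_{nested_field.field}_{i}"
    return aliases


# The grandchild plan only outputs real columns.
output = [Attribute("struct_col")]

exprs = [
    GetField(Attribute("struct_col"), "a"),  # struct_col.a
    GetField(LambdaVariable("x"), "b"),      # x.b inside a lambda body
]

# With the skip in place, only struct_col.a is aliased.
aliases = replace_with_aliases(collect_nested_fields(exprs, skip_lambda=True), output)
```

With `skip_lambda=False`, `x.b` reaches the matching step and the lookup raises, because `x` is not part of `output`. With `skip_lambda=True`, only `struct_col.a` is aliased and the extraction inside the lambda body is left untouched.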

Does this PR introduce any user-facing change?

yes, bug fix

How was this patch tested?

Add new test

@github-actions github-actions bot added the SQL label Jan 11, 2022
@HyukjinKwon
Member

cc @viirya FYI

Member

@viirya viirya left a comment

Looks reasonable. Could you also mention when it fails to match the attribute in the description? Thanks.

@ulysses-you
Contributor Author

@viirya I have updated the description. I hope it is clear now.

@viirya
Member

viirya commented Jan 12, 2022

Thanks. As #32773 was also merged to 3.1, is this an issue on branch-3.1 too? @ulysses-you

@ulysses-you
Contributor Author

I think landing this in branch-3.2 is enough, since the backport to branch-3.1 was reverted.
See https://github.com/apache/spark/blob/branch-3.1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala

@viirya
Member

viirya commented Jan 12, 2022

Okay, thanks! Merging to master.

@viirya viirya closed this in 189b205 Jan 12, 2022
@viirya
Member

viirya commented Jan 12, 2022

Oh, there is a conflict. @ulysses-you Can you submit a backport PR to branch-3.2? Thanks.

@ulysses-you
Contributor Author

Thank you @viirya, I created #35175.

@ulysses-you ulysses-you deleted the SPARK-37855 branch January 12, 2022 05:30
viirya pushed a commit that referenced this pull request Jan 12, 2022
…ray inside a nested struct

This is a backport of #35170 for branch-3.2.

### What changes were proposed in this pull request?

Skip aliasing any `ExtractValue` whose children contain a `NamedLambdaVariable`.

### Why are the changes needed?

Since #32773, `NamedLambdaVariable` produces references. As a side effect, the rule `NestedColumnAliasing` now tries to alias an `ExtractValue` that contains a `NamedLambdaVariable`. This fails because a `NamedLambdaVariable` cannot be matched to an actual attribute.

In more detail:
During `NestedColumnAliasing#replaceWithAliases`, the references of each nested field are matched against the output attributes of the grandchildren. However, a `NamedLambdaVariable` is created by the analyzer as a virtual attribute and is not resolved from the output of any child. So when the references of a `NamedLambdaVariable` are matched against the grandchildren's output, no attribute is found.

### Does this PR introduce _any_ user-facing change?

yes, bug fix

### How was this patch tested?

Add new test

Closes #35175 from ulysses-you/SPARK-37855-branch-3.2.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
dchvn pushed a commit to dchvn/spark that referenced this pull request Jan 19, 2022
…nside a nested struct

Closes apache#35170 from ulysses-you/SPARK-37855.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
catalinii pushed a commit to lyft/spark that referenced this pull request Feb 22, 2022
catalinii pushed a commit to lyft/spark that referenced this pull request Mar 4, 2022
kazuyukitanimura pushed a commit to kazuyukitanimura/spark that referenced this pull request Aug 10, 2022
(cherry picked from commit a58b8a8)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>