[SPARK-37855][SQL][3.2] IllegalStateException when transforming an array inside a nested struct #35175

Closed

ulysses-you
Contributor

This is a backport of #35170 for branch-3.2.

What changes were proposed in this pull request?

Skip aliasing any `ExtractValue` whose children contain a `NamedLambdaVariable`.

Why are the changes needed?

Since #32773, a `NamedLambdaVariable` produces references. As a result, the `NestedColumnAliasing` rule tries to alias an `ExtractValue` that contains a `NamedLambdaVariable`. This fails because a `NamedLambdaVariable` cannot be matched to an actual attribute.

In more detail:
`NestedColumnAliasing#replaceWithAliases` uses the references of each nested field to match the output attributes of the grandchildren. However, a `NamedLambdaVariable` is created by the analyzer as a virtual attribute and is not resolved from the children's output. Matching the references of a `NamedLambdaVariable` against the grandchildren's output therefore finds no attribute.
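
To make the failure mode concrete, here is a minimal Python sketch. It is a toy model, not Spark's Catalyst code: the classes `Attribute`, `NamedLambdaVariable`, and `GetStructField` only borrow names from the description, and `references`, `contains_lambda_variable`, and `can_alias` are hypothetical helpers invented for illustration. It shows that a lambda variable's reference never appears in a child's output, and that skipping such extractions avoids the failed match.

```python
from dataclasses import dataclass
from typing import Union

# Toy stand-ins for Catalyst expressions. These are NOT Spark classes;
# they only reuse the names from the description for readability.


@dataclass(frozen=True)
class Attribute:
    name: str  # a real column produced by a child plan


@dataclass(frozen=True)
class NamedLambdaVariable:
    name: str  # a virtual attribute that exists only inside a lambda


@dataclass(frozen=True)
class GetStructField:  # stands in for one kind of ExtractValue
    child: "Expr"
    field: str


Expr = Union[Attribute, NamedLambdaVariable, GetStructField]


def references(expr):
    """Leaf names referenced by expr; lambda variables count as references."""
    if isinstance(expr, GetStructField):
        return references(expr.child)
    return {expr.name}


def contains_lambda_variable(expr):
    """True if any node in expr is a NamedLambdaVariable."""
    if isinstance(expr, NamedLambdaVariable):
        return True
    if isinstance(expr, GetStructField):
        return contains_lambda_variable(expr.child)
    return False


def can_alias(expr, child_output):
    """Hypothetical check: alias only extractions rooted in a real output attribute."""
    if contains_lambda_variable(expr):
        return False  # the skip proposed in this PR, in toy form
    return references(expr) <= child_output


child_output = {"struct_col"}
real_field = GetStructField(Attribute("struct_col"), "a")
lambda_field = GetStructField(NamedLambdaVariable("x"), "a")

print(can_alias(real_field, child_output))       # True
print(can_alias(lambda_field, child_output))     # False
print(references(lambda_field) & child_output)   # set(): nothing to match
```

The empty intersection on the last line corresponds to the situation described above: the lambda variable's reference cannot be found in any child output, so there is nothing to attach an alias to.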

Does this PR introduce any user-facing change?

Yes, this is a bug fix.

How was this patch tested?

Added a new test.

Member

@viirya viirya left a comment

Pending CI.

@dongjoon-hyun dongjoon-hyun changed the title from "[SPARK-37855][SQL][branch-3.2] IllegalStateException when transforming an array inside a nested struct" to "[SPARK-37855][SQL][3.2] IllegalStateException when transforming an array inside a nested struct" on Jan 12, 2022
@dongjoon-hyun
Member

cc @huaxingao

@huaxingao
Contributor

This is a regression. Do we need to fix this in 3.2.1?

@viirya
Member

viirya commented Jan 12, 2022

@ulysses-you Can you re-trigger GA?

@ulysses-you
Contributor Author

> Can you re-trigger GA?

@viirya re-triggered and it's ok now.

> This is a regression. Do we need to fix this in 3.2.1?

@huaxingao Yes, I think so, since it's a regression in 3.2.0.

@viirya
Member

viirya commented Jan 12, 2022

Thanks. Merging to branch-3.2.

viirya pushed a commit that referenced this pull request Jan 12, 2022

Closes #35175 from ulysses-you/SPARK-37855-branch-3.2.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
@viirya viirya closed this Jan 12, 2022
@ulysses-you ulysses-you deleted the SPARK-37855-branch-3.2 branch January 13, 2022 01:14
catalinii pushed a commit to lyft/spark that referenced this pull request Feb 22, 2022
catalinii pushed a commit to lyft/spark that referenced this pull request Mar 4, 2022
kazuyukitanimura pushed a commit to kazuyukitanimura/spark that referenced this pull request Aug 10, 2022
(cherry picked from commit a58b8a8)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>