[SPARK-37855][SQL] IllegalStateException when transforming an array inside a nested struct #35170
cc @viirya FYI
Looks reasonable. Could you also mention when it fails to match the attribute in the description? Thanks.
@viirya I have updated the description, hope it is clear now.
Thanks. Since #32773 was also merged to 3.1, is this an issue on branch-3.1 too? @ulysses-you
I think landing this in branch-3.2 is enough, since the backport to branch-3.1 was reverted.
Okay, thanks! Merging to master.
Oh, there is a conflict. @ulysses-you Can you submit a backport PR to branch-3.2? Thanks.
…ray inside a nested struct

This is a backport of #35170 for branch-3.2.

### What changes were proposed in this pull request?

Skip aliasing any `ExtractValue` whose children contain a `NamedLambdaVariable`.

### Why are the changes needed?

Since #32773, `NamedLambdaVariable` produces references, which causes the rule `NestedColumnAliasing` to alias an `ExtractValue` that contains a `NamedLambdaVariable`. This fails because a `NamedLambdaVariable` cannot be matched to an actual attribute.

In more detail: during `NestedColumnAliasing#replaceWithAliases`, the references of each nested field are matched against the output attributes of the grandchildren. However, a `NamedLambdaVariable` is created by the analyzer as a virtual attribute and is not resolved from the children's output, so using its references to match the grandchildren's output finds no attribute.

### Does this PR introduce _any_ user-facing change?

Yes, bug fix.

### How was this patch tested?

Added a new test.

Closes #35175 from ulysses-you/SPARK-37855-branch-3.2.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
…nside a nested struct

### What changes were proposed in this pull request?

Skip aliasing any `ExtractValue` whose children contain a `NamedLambdaVariable`.

### Why are the changes needed?

Since apache#32773, `NamedLambdaVariable` produces references, which causes the rule `NestedColumnAliasing` to alias an `ExtractValue` that contains a `NamedLambdaVariable`. This fails because a `NamedLambdaVariable` cannot be matched to an actual attribute.

In more detail: during `NestedColumnAliasing#replaceWithAliases`, the references of each nested field are matched against the output attributes of the grandchildren. However, a `NamedLambdaVariable` is created by the analyzer as a virtual attribute and is not resolved from the children's output, so using its references to match the grandchildren's output finds no attribute.

### Does this PR introduce _any_ user-facing change?

Yes, bug fix.

### How was this patch tested?

Added a new test.

Closes apache#35170 from ulysses-you/SPARK-37855.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
…ray inside a nested struct

This is a backport of apache#35170 for branch-3.2.

### What changes were proposed in this pull request?

Skip aliasing any `ExtractValue` whose children contain a `NamedLambdaVariable`.

### Why are the changes needed?

Since apache#32773, `NamedLambdaVariable` produces references, which causes the rule `NestedColumnAliasing` to alias an `ExtractValue` that contains a `NamedLambdaVariable`. This fails because a `NamedLambdaVariable` cannot be matched to an actual attribute.

In more detail: during `NestedColumnAliasing#replaceWithAliases`, the references of each nested field are matched against the output attributes of the grandchildren. However, a `NamedLambdaVariable` is created by the analyzer as a virtual attribute and is not resolved from the children's output, so using its references to match the grandchildren's output finds no attribute.

### Does this PR introduce _any_ user-facing change?

Yes, bug fix.

### How was this patch tested?

Added a new test.

Closes apache#35175 from ulysses-you/SPARK-37855-branch-3.2.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
(cherry picked from commit a58b8a8)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?

Skip aliasing any `ExtractValue` whose children contain a `NamedLambdaVariable`.

### Why are the changes needed?

Since #32773, `NamedLambdaVariable` produces references, which causes the rule `NestedColumnAliasing` to alias an `ExtractValue` that contains a `NamedLambdaVariable`. This fails because a `NamedLambdaVariable` cannot be matched to an actual attribute.

In more detail: during `NestedColumnAliasing#replaceWithAliases`, the references of each nested field are matched against the output attributes of the grandchildren. However, a `NamedLambdaVariable` is created by the analyzer as a virtual attribute and is not resolved from the children's output, so using its references to match the grandchildren's output finds no attribute.

### Does this PR introduce _any_ user-facing change?

Yes, bug fix.

### How was this patch tested?

Added a new test.
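
The failure mode described under "Why are the changes needed?" can be illustrated with a small, self-contained toy model. This is only a sketch in plain Python, not Catalyst code: `Attribute`, `LambdaVariable`, `GetField`, `references`, `contains_lambda_variable`, and `fields_to_alias` are all hypothetical names invented for the illustration, and the real rule works on Spark's expression and plan classes.

```python
from dataclasses import dataclass
from typing import List, Set, Union


@dataclass(frozen=True)
class Attribute:
    """A real column that some child plan outputs."""
    name: str
    expr_id: int


@dataclass(frozen=True)
class LambdaVariable:
    """Toy stand-in for NamedLambdaVariable: it has an id, but no plan outputs it."""
    name: str
    expr_id: int


@dataclass(frozen=True)
class GetField:
    """Toy stand-in for an ExtractValue, e.g. a struct field access."""
    child: "Expr"
    field: str


Expr = Union[Attribute, LambdaVariable, GetField]


def references(expr: Expr) -> Set[int]:
    """Ids an expression refers to; lambda variables are included (assumed post-#32773 behavior)."""
    if isinstance(expr, GetField):
        return references(expr.child)
    return {expr.expr_id}


def contains_lambda_variable(expr: Expr) -> bool:
    """True if a lambda variable appears anywhere under the expression."""
    if isinstance(expr, GetField):
        return contains_lambda_variable(expr.child)
    return isinstance(expr, LambdaVariable)


def fields_to_alias(nested_fields: List[Expr], child_output: List[Attribute],
                    skip_lambda_variables: bool) -> List[Expr]:
    """Pick the nested fields to alias, matching their references to child output."""
    output_ids = {attr.expr_id for attr in child_output}
    selected = []
    for field in nested_fields:
        if skip_lambda_variables and contains_lambda_variable(field):
            continue  # the guard: leave this extraction untouched
        if not references(field) <= output_ids:
            # A lambda variable's id never appears in any child's output.
            raise LookupError(f"no output attribute matches {field}")
        selected.append(field)
    return selected


s = Attribute("s", expr_id=1)          # a struct column from the child plan
x = LambdaVariable("x", expr_id=2)     # the element variable of a lambda
candidates = [GetField(s, "a"), GetField(x, "b")]

# With the guard, only the extraction rooted at a real attribute is selected.
guarded = fields_to_alias(candidates, [s], skip_lambda_variables=True)
```

In this model, calling `fields_to_alias(candidates, [s], skip_lambda_variables=False)` raises `LookupError`, because the lambda variable's id is not among the child's output ids. That is loosely analogous to the `IllegalStateException` in the PR title, and skipping such expressions up front is the idea behind the fix.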