Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-38030][SQL][3.2] Canonicalization should not remove nullability of AttributeReference dataType #35446

Closed
wants to merge 2 commits into from

Conversation

shardulm94
Copy link
Contributor

This is a backport of #35332 to branch 3.2

What changes were proposed in this pull request?

Canonicalization of AttributeReference should not remove nullability information of its dataType.

Why are the changes needed?

SPARK-38030 lists an issue where canonicalization of cast resulted in an unresolved expression, thus causing query failure. The issue was that the child AttributeReference's dataType was converted to nullable during canonicalization and hence the Cast's checkInputDataTypes fails. Although the exact repro listed in SPARK-38030 no longer works in branch-3.2 due to an unrelated change (details in the JIRA), some other codepaths which depend on canonicalized representations can trigger the same issue.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added unit test to ensure that canonicalization preserves nullability of AttributeReference and does not result in an unresolved cast

@dongjoon-hyun
Copy link
Member

cc @cloud-fan and @viirya

@@ -42,7 +42,7 @@ object Canonicalize {
/** Remove names and nullability from types, and names from `GetStructField`. */
private[expressions] def ignoreNamesTypes(e: Expression): Expression = e match {
case a: AttributeReference =>
AttributeReference("none", a.dataType.asNullable)(exprId = a.exprId)
AttributeReference("none", a.dataType)(exprId = a.exprId)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ur, I realized that we should fix line 42 too. Could you fix it master branch first as a follow-up of SPARK-38030, @shardulm94 ?

- Remove names and nullability from types
+ Remove names from types

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! The canonicalization code has change substantiantially between 3.2 and master due to #34883. This comment no longer exists in the master branch so it should not be an issue. I will fix it in 3.2 and 3.1 PRs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Thank you for checking, @shardulm94 .

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM (Pending CI)

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

dongjoon-hyun pushed a commit that referenced this pull request Feb 9, 2022
…y of AttributeReference dataType

This is a backport of #35332 to branch 3.2

### What changes were proposed in this pull request?
Canonicalization of AttributeReference should not remove nullability information of its dataType.

### Why are the changes needed?
SPARK-38030 lists an issue where canonicalization of cast resulted in an unresolved expression, thus causing query failure. The issue was that the child AttributeReference's dataType was converted to nullable during canonicalization and hence the Cast's `checkInputDataTypes` fails. Although the exact repro listed in SPARK-38030 no longer works in branch-3.2 due to an unrelated change (details in the JIRA), some other codepaths which depend on canonicalized representations can trigger the same issue.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added unit test to ensure that canonicalization preserves nullability of AttributeReference and does not result in an unresolved cast

Closes #35446 from shardulm94/SPARK-38030-3.2.

Authored-by: Shardul Mahadik <smahadik@linkedin.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@dongjoon-hyun
Copy link
Member

Merged to branch-3.2.

kazuyukitanimura pushed a commit to kazuyukitanimura/spark that referenced this pull request Aug 10, 2022
…y of AttributeReference dataType

This is a backport of apache#35332 to branch 3.2

### What changes were proposed in this pull request?
Canonicalization of AttributeReference should not remove nullability information of its dataType.

### Why are the changes needed?
SPARK-38030 lists an issue where canonicalization of cast resulted in an unresolved expression, thus causing query failure. The issue was that the child AttributeReference's dataType was converted to nullable during canonicalization and hence the Cast's `checkInputDataTypes` fails. Although the exact repro listed in SPARK-38030 no longer works in branch-3.2 due to an unrelated change (details in the JIRA), some other codepaths which depend on canonicalized representations can trigger the same issue.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added unit test to ensure that canonicalization preserves nullability of AttributeReference and does not result in an unresolved cast

Closes apache#35446 from shardulm94/SPARK-38030-3.2.

Authored-by: Shardul Mahadik <smahadik@linkedin.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit d62735d)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants