-
Notifications
You must be signed in to change notification settings - Fork 28.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-38030][SQL][3.1] Canonicalization should not remove nullability of AttributeReference dataType #35444
Conversation
…y of AttributeReference dataType
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that we need this at branch-3.2 first, @shardulm94 ?
If there is no issue at branch-3.2, please add a test coverage to branch-3.2 first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll block this PR until we clarify the status of branch-3.2.
@dongjoon-hyun 👍 I created a PR for 3.2 #35446 |
Thanks, @shardulm94 . |
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala
Show resolved
Hide resolved
@@ -1502,4 +1502,24 @@ class AdaptiveQueryExecSuite | |||
} | |||
} | |||
} | |||
|
|||
test("SPARK-38030: Query with cast containing non-nullable columns should succeed with AQE") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why you provide this at branch-3.1
PR only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We encountered this issue in production when running Spark 3.1.1 (see the stacktrace in the JIRA for more details). However, this repro does not work on 3.2+. This is because the code which would previously trigger access on an unresolved object is now lazy and hence does not trigger the issue in 3.2+, although the root cause still existed in 3.2+. So I only added this test in 3.1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you put that explanation into the PR description?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added an explanation in the "How was this patch tested?" section.
Can one of the admins verify this patch? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM. Merged to branch-3.1.
…y of AttributeReference dataType This is a backport of #35332 to branch 3.1 ### What changes were proposed in this pull request? Canonicalization of AttributeReference should not remove nullability information of its dataType. ### Why are the changes needed? SPARK-38030 lists an issue where canonicalization of cast resulted in an unresolved expression, thus causing query failure. The issue was that the child AttributeReference's dataType was converted to nullable during canonicalization and hence the Cast's `checkInputDataTypes` fails. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Added unit test to ensure that canonicalization preserves nullability of AttributeReference and does not result in an unresolved cast. Also added a test to ensure that the issue observed in SPARK-38030 (interaction of this bug with AQE) is fixed. This test/repro only works on 3.1 because the code which triggers access on an unresolved object is [lazy](https://github.com/apache/spark/blob/7e5c3b216431b6a5e9a0786bf7cded694228cdee/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala#L132) in 3.2+ and hence does not trigger the issue in 3.2+. Closes #35444 from shardulm94/SPARK-38030-3.1. Authored-by: Shardul Mahadik <smahadik@linkedin.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
This is a backport of #35332 to branch 3.1
What changes were proposed in this pull request?
Canonicalization of AttributeReference should not remove nullability information of its dataType.
Why are the changes needed?
SPARK-38030 lists an issue where canonicalization of cast resulted in an unresolved expression, thus causing query failure. The issue was that the child AttributeReference's dataType was converted to nullable during canonicalization and hence the Cast's
checkInputDataTypes
fails.Does this PR introduce any user-facing change?
No
How was this patch tested?
Added unit test to ensure that canonicalization preserves nullability of AttributeReference and does not result in an unresolved cast. Also added a test to ensure that the issue observed in SPARK-38030 (interaction of this bug with AQE) is fixed. This test/repro only works on 3.1 because the code which triggers access on an unresolved object is lazy in 3.2+ and hence does not trigger the issue in 3.2+.