[SPARK-40054][SQL] Restore the error handling syntax of try_cast() #37486
Conversation
Some of the code is taken from the part removed in #36703, which is still reliable.
```diff
 import org.apache.spark.SparkFunSuite
 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.util.DateTimeTestUtils.UTC_OPT
 import org.apache.spark.sql.types._
 import org.apache.spark.unsafe.types.UTF8String

 // A test suite to check analysis behaviors of `TryCast`.
-class TryCastSuite extends SparkFunSuite {
+class TryCastSuite extends CastWithAnsiOnSuite {
```
This recovers the test coverage from before #36703.
Files with resolved review comments:
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/TryCastSuite.scala
```scala
// TODO: Implement a more accurate method for checking whether a decimal value can be cast
// as integral types without overflow. Currently, the cast can overflow even if
// the "Cast.canUpCast" method returns true.
case (_: DecimalType, _: IntegralType) => true
```
Is this a bug in Spark 3.3?
Casting Decimal(10, 0) to Int is considered "canUpCast". However, Int.MaxValue + 1 fits in Decimal(10, 0) too, so the cast can still overflow.
It has been a long time and the strict table insertion policy relies on this.
I would consider fixing it in another PR. For this one I will leave it as it is now.
Note that this is not a bug in Spark 3.3. I am avoiding marking the result of try_cast as always nullable here.
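To illustrate the overflow case discussed above, here is a minimal plain-Scala sketch (not Spark internals) showing a value that fits in Decimal(10, 0) — up to ten digits of precision — yet overflows Int:

```scala
// Int.MaxValue + 1: ten decimal digits, so it fits in Decimal(10, 0),
// but it is out of range for a 32-bit Int.
val v = BigDecimal("2147483648")

println(v.precision <= 10)             // fits in Decimal(10, 0)
println(v > BigDecimal(Int.MaxValue))  // exceeds the Int range
println(v.isValidInt)                  // false: casting to Int would overflow
```

This is why `Cast.canUpCast` returning true for a Decimal-to-Integral pair does not guarantee an overflow-free cast.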
```scala
  }
}

override def checkExceptionInExpression[T <: Throwable : ClassTag](
```
This is a bit weird. I see code like this:

```scala
if (!isTryCast) {
  checkExceptionInExpression[ArithmeticException](ret, "overflow")
} else {
  checkEvaluation(ret, Row(null))
}
```

If we override `checkExceptionInExpression`, why do we still need the if-else?
That one was `Row(null)` with schema `struct<a:int>`, which is different from the default answer `null`.
It's a bit weird to have two different testing approaches for try_cast, but given that it was already the case before, I don't have a strong opinion.
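For readers following this thread, here is a hedged sketch of the override pattern being discussed: `TryCastSuite` reuses the ANSI cast tests but redirects exception checks to null checks, since try_cast swallows its own conversion errors. The parameter names and the exact signatures are assumptions for illustration, not copied from this diff:

```scala
import scala.reflect.ClassTag

// Sketch only: assumes CastWithAnsiOnSuite, Expression, InternalRow, and
// checkEvaluation as they appear in Spark's catalyst test helpers.
class TryCastSuite extends CastWithAnsiOnSuite {

  override def checkExceptionInExpression[T <: Throwable : ClassTag](
      expression: => Expression,
      inputRow: InternalRow,
      expectedErrMsg: String): Unit = {
    // try_cast never throws for its own conversion failure:
    // wherever the ANSI suite expects an exception, expect null instead.
    checkEvaluation(expression, null, inputRow)
  }
}
```

The remaining if-else in other suites covers results like `Row(null)` with a non-trivial schema, which this override's plain `null` expectation does not handle.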
@cloud-fan Thanks for the review. Merging to master.
### What changes were proposed in this pull request?
Similar to #37486 and #37663, this PR refactors the implementation of try_sum() so that errors from its child are shown instead of ignored.

### Why are the changes needed?
Fix the semantics of the try_sum() function.

### Does this PR introduce _any_ user-facing change?
Yes. After the changes, the error from the child of the try_sum function will be shown instead of ignored.

### How was this patch tested?
New UT

Closes #37764 from gengliangwang/trySum.
Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
### What changes were proposed in this pull request?
Similar to #37486, #37663, and #37764, this PR refactors the implementation of try_avg() so that errors from its child are shown instead of ignored.

### Why are the changes needed?
Fix the semantics of the try_avg() function.

### Does this PR introduce _any_ user-facing change?
Yes. After the changes, the error from the child of the try_avg function will be shown instead of ignored.

### How was this patch tested?
New UT

Closes #37775 from gengliangwang/tryAvg.
Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
### What changes were proposed in this pull request?
For the following query, Spark 3.3 will throw an exception for the division-by-zero error. In the current master branch, it returns null after the refactoring PR #36703.
This PR is to restore the previous error handling behavior.

### Why are the changes needed?
Restore the previous error handling behavior of `try_cast`.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Unit tests
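To make the restored semantics concrete, here is a hedged sketch of the behavior described above, written against a hypothetical local SparkSession (the session setup is an assumption for illustration; the behavior comments restate the PR description, not verified output):

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session is enough to exercise try_cast.
val spark = SparkSession.builder().master("local[1]").getOrCreate()

// Case 1: the cast input itself is invalid.
// try_cast swallows its own conversion error and yields NULL.
spark.sql("SELECT try_cast('abc' AS INT)").show()

// Case 2: the *child* expression raises an error (division by zero
// under ANSI mode). Per this PR, that error is thrown, matching the
// Spark 3.3 behavior, instead of being silently turned into NULL.
spark.sql("SET spark.sql.ansi.enabled = true")
spark.sql("SELECT try_cast(1/0 AS STRING)").show()
```

The distinction is the whole point of the PR: try_cast only suppresses errors from the cast itself, not errors raised while evaluating its child.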