
[SPARK-40054][SQL] Restore the error handling syntax of try_cast() #37486

Closed
wants to merge 8 commits

Conversation

gengliangwang
Member

What changes were proposed in this pull request?

For the following query

SET spark.sql.ansi.enabled=true;
SELECT try_cast(1/0 AS string); 

Spark 3.3 throws an exception for the division-by-zero error. In the current master branch, it returns null after the refactoring PR #36703.

This PR is to restore the previous error handling syntax.

Why are the changes needed?

  1. Restore the behavior of try_cast()
  2. The previous syntax is more reasonable. Users can cleanly catch the exception from the child of try_cast.
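The semantic difference can be sketched outside Spark with two hypothetical helpers (the names and simplified signatures below are illustrative only, not Spark's actual classes):

```scala
// Hypothetical sketch of the two error-handling strategies, not Spark code.

// Refactored master behavior before this PR: wrap the whole evaluation,
// so an error raised while evaluating the child (e.g. 1/0 under ANSI mode)
// is swallowed and the result becomes None (i.e. null).
def tryCastWrapAll(child: () => Int): Option[String] =
  try Some(child().toString)
  catch { case _: ArithmeticException => None }

// Restored Spark 3.3 behavior: evaluate the child first so its errors
// propagate; only the cast itself is made error-tolerant.
def tryCastChildStrict(child: () => Int): Option[String] = {
  val v = child() // a division-by-zero here still throws
  try Some(v.toString)
  catch { case _: ArithmeticException => None }
}
```

Under the restored behavior, `SELECT try_cast(1/0 AS string)` with ANSI mode on surfaces the division-by-zero error from the child rather than returning null.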

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit tests

@gengliangwang
Member Author

Some of the code is from the removed part of #36703, which is still reliable.

import org.apache.spark.SparkFunSuite
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.util.DateTimeTestUtils.UTC_OPT
import org.apache.spark.sql.types._
import org.apache.spark.unsafe.types.UTF8String

// A test suite to check analysis behaviors of `TryCast`.
-class TryCastSuite extends SparkFunSuite {
+class TryCastSuite extends CastWithAnsiOnSuite {
Member Author

This recovers the test coverage before #36703

// TODO: Implement a more accurate method for checking whether a decimal value can be cast
// as integral types without overflow. Currently, the cast can overflow even if
// "Cast.canUpCast" method returns true.
case (_: DecimalType, _: IntegralType) => true
Contributor

is this a bug in spark 3.3?

Member Author

Decimal(10, 0) is considered "canUpCast" to Int. However, Int.MaxValue + 1 fits in Decimal(10, 0) too.
This behavior has been around for a long time, and the strict table insertion policy relies on it.
I would consider fixing it in another PR; for this one I will leave it as it is.
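The overflow case being described can be sketched with plain Scala `BigDecimal` (illustrative only; Spark's `Decimal` and `Cast.canUpCast` are not used here):

```scala
// Decimal(10, 0) is treated as up-castable to Int, yet a 10-digit value
// such as Int.MaxValue + 1 is representable in Decimal(10, 0) while
// overflowing Int.
val v = BigDecimal(Int.MaxValue) + 1 // 2147483648: 10 digits, scale 0
assert(v.precision <= 10) // fits in Decimal(10, 0)
assert(!v.isValidInt)     // but cannot be cast to Int without overflow
```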

Member Author

Note that this is not a bug in Spark 3.3. I am avoiding marking the result of try_cast as always nullable here.


override def checkExceptionInExpression[T <: Throwable : ClassTag](
Contributor

This is a bit weird. I see code like this:

    if (!isTryCast) {
      checkExceptionInExpression[ArithmeticException](ret, "overflow")
    } else {
      checkEvaluation(ret, Row(null))
    }

If we override checkExceptionInExpression, why do we still need the if-else?

Member Author

That one was `Row(null)` with schema `struct<a:int>`, which is different from the default answer null.
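The distinction can be sketched with a toy stand-in (assuming `Tuple1` in place of Spark's `Row`; names are illustrative):

```scala
// The overridden check expects the try_cast result to be a bare null.
// For a struct-typed cast, the expected value is a row whose single field
// is null, i.e. Row(null) with schema struct<a:int>. That is not the same
// value as null itself, so this case keeps its explicit
// checkEvaluation branch.
val defaultExpected: Any = null
val structExpected: Any = Tuple1(null) // stand-in for Row(null)
assert(structExpected != defaultExpected)
```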

Contributor

It's a bit weird to have two different testing approaches for try_cast, but given that it was already the case before, I don't have a strong opinion.

@gengliangwang
Member Author

@cloud-fan Thanks for the review. Merging to master.

gengliangwang added a commit that referenced this pull request Sep 2, 2022
### What changes were proposed in this pull request?

Similar to #37486 and #37663, this PR refactors the implementation of try_sum() so that errors from its child are shown instead of ignored.

### Why are the changes needed?

Fix the semantics of the try_sum() function.

### Does this PR introduce _any_ user-facing change?

Yes, after changes, the error from the child of try_sum function will be shown instead of ignored.

### How was this patch tested?

New UT

Closes #37764 from gengliangwang/trySum.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
gengliangwang added a commit that referenced this pull request Sep 6, 2022
### What changes were proposed in this pull request?

Similar to #37486, #37663 and #37764, this PR refactors the implementation of try_avg() so that errors from its child are shown instead of ignored.

### Why are the changes needed?

Fix the semantics of the try_avg() function.

### Does this PR introduce _any_ user-facing change?

Yes, after changes, the error from the child of the try_avg function will be shown instead of ignored.

### How was this patch tested?

New UT

Closes #37775 from gengliangwang/tryAvg.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
a0x8o added a commit to a0x8o/spark that referenced this pull request Sep 6, 2022
a0x8o added a commit to a0x8o/spark that referenced this pull request Dec 30, 2022
a0x8o added a commit to a0x8o/spark that referenced this pull request Dec 30, 2022