[SPARK-42911][PYTHON] Introduce more basic exceptions #40538
cc @itholic
itholic left a comment:
LGTM, just left a nit question.
class PythonException(SparkConnectGrpcException, BasePythonException):
    """
    Exception thrown because of Spark upgrade from Spark Connect
    Exceptions thrown from Spark Connect server.
qq: Are Spark Connect server and Spark Connect different?
Only PythonException says it's thrown from Spark Connect "server".

The comment is carried over from the previous version. We can change it to Spark Connect while we are here.
btw, the examples in "Does this PR introduce any user-facing change?" are the same??

No, previously we still see

Ah I see. One for regular Spark session and the other for remote Spark session.

Merged to master.

@ueshin it has a conflict w/ branch-3.4. would you mind creating a backport PR?
### What changes were proposed in this pull request?
Introduces more basic exceptions.
- ArithmeticException
- ArrayIndexOutOfBoundsException
- DateTimeException
- NumberFormatException
- SparkRuntimeException
### Why are the changes needed?
There are more exceptions that Spark throws but PySpark doesn't capture.
We should introduce more basic exceptions; otherwise we still see `Py4JJavaError` or `SparkConnectGrpcException`.
```py
>>> spark.conf.set("spark.sql.ansi.enabled", True)
>>> spark.sql("select 1/0")
DataFrame[(1 / 0): double]
>>> spark.sql("select 1/0").show()
Traceback (most recent call last):
...
py4j.protocol.Py4JJavaError: An error occurred while calling o44.showString.
: org.apache.spark.SparkArithmeticException: [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
== SQL(line 1, position 8) ==
select 1/0
^^^
at org.apache.spark.sql.errors.QueryExecutionErrors$.divideByZeroError(QueryExecutionErrors.scala:225)
... JVM's stacktrace
```
```py
>>> spark.sql("select 1/0").show()
Traceback (most recent call last):
...
pyspark.errors.exceptions.connect.SparkConnectGrpcException: (org.apache.spark.SparkArithmeticException) [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
== SQL(line 1, position 8) ==
select 1/0
^^^
```
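The conversion this PR extends can be pictured roughly as a lookup from the JVM exception class to a dedicated Python class, with a generic fallback for anything unmapped. The following is only a minimal sketch of that idea, not PySpark's actual implementation: the `convert` helper, the `UnknownException` fallback, and the lookup table are hypothetical, and the Python classes are stand-ins named after the exceptions listed above.

```py
# Hypothetical sketch; PySpark's real conversion logic differs.


class PySparkException(Exception):
    """Stand-in base class for errors surfaced to Python users."""


class ArithmeticException(PySparkException):
    """Stand-in for the new arithmetic exception."""


class SparkRuntimeException(PySparkException):
    """Stand-in for the new runtime exception."""


class UnknownException(PySparkException):
    """Hypothetical fallback when no dedicated class is mapped."""


# Illustrative table (an assumption): JVM class name -> Python class.
JVM_TO_PYTHON = {
    "org.apache.spark.SparkArithmeticException": ArithmeticException,
    "org.apache.spark.SparkRuntimeException": SparkRuntimeException,
}


def convert(jvm_class: str, message: str) -> PySparkException:
    """Return a Python exception for the given JVM exception class name."""
    exc_class = JVM_TO_PYTHON.get(jvm_class, UnknownException)
    return exc_class(message)


err = convert(
    "org.apache.spark.SparkArithmeticException",
    "[DIVIDE_BY_ZERO] Division by zero.",
)
print(type(err).__name__)
```

Without an entry in the table, the error falls through to the generic class, which is the situation the `Py4JJavaError` and `SparkConnectGrpcException` tracebacks above illustrate.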
### Does this PR introduce _any_ user-facing change?
The error message is more readable. With a regular Spark session:
```py
>>> spark.sql("select 1/0").show()
Traceback (most recent call last):
...
pyspark.errors.exceptions.captured.ArithmeticException: [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
== SQL(line 1, position 8) ==
select 1/0
^^^
```
or, with a remote Spark session:
```py
>>> spark.sql("select 1/0").show()
Traceback (most recent call last):
...
pyspark.errors.exceptions.connect.ArithmeticException: [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
== SQL(line 1, position 8) ==
select 1/0
^^^
```
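The two tracebacks differ only in the module (`captured` vs. `connect`). Assuming both variants derive from a shared base class, following the pattern of the reviewed line `class PythonException(SparkConnectGrpcException, BasePythonException)`, a single `except` clause could handle either session type. The sketch below uses stand-in classes only; the names `CapturedArithmeticException`, `ConnectArithmeticException`, and `describe` are hypothetical and chosen for illustration.

```py
# Stand-in classes mirroring the multiple-inheritance pattern in the diff.
# None of these are imported from PySpark; the layout is an assumption.


class PySparkException(Exception):
    """Stand-in root of the hierarchy."""


class BaseArithmeticException(PySparkException):
    """Stand-in shared base that user code would catch."""


class CapturedException(PySparkException):
    """Stand-in for errors captured from a regular (Py4J) session."""


class SparkConnectGrpcException(PySparkException):
    """Stand-in for errors from a remote Spark session."""


class CapturedArithmeticException(CapturedException, BaseArithmeticException):
    pass


class ConnectArithmeticException(SparkConnectGrpcException, BaseArithmeticException):
    pass


def describe(run_query):
    """Run a callable and report arithmetic errors from either session type."""
    try:
        run_query()
    except BaseArithmeticException as e:
        return f"arithmetic error: {e}"
    return "ok"


def regular_session_query():
    raise CapturedArithmeticException("[DIVIDE_BY_ZERO] Division by zero.")


def remote_session_query():
    raise ConnectArithmeticException("[DIVIDE_BY_ZERO] Division by zero.")


print(describe(regular_session_query))
print(describe(remote_session_query))
```

Under this layout, user code does not need to know whether the session is regular or remote; it catches the shared base and gets the same readable message in both cases.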
### How was this patch tested?
Added the related tests.
Closes apache#40538 from ueshin/issues/SPARK-42911/exceptions.
Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>