[SPARK-42911][PYTHON][3.4] Introduce more basic exceptions by ueshin · Pull Request #40547 · apache/spark

ueshin · 2023-03-24T18:28:16Z

What changes were proposed in this pull request?

Introduces more basic exceptions in PySpark:

- ArithmeticException
- ArrayIndexOutOfBoundsException
- DateTimeException
- NumberFormatException
- SparkRuntimeException

Why are the changes needed?

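Spark throws several exceptions that PySpark does not yet capture and convert into its own exception classes.

Without more basic exceptions on the Python side, users still see a raw Py4JJavaError, or a SparkConnectGrpcException when using Spark Connect, as in the two examples below.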

>>> spark.conf.set("spark.sql.ansi.enabled", True)
>>> spark.sql("select 1/0")
DataFrame[(1 / 0): double]
>>> spark.sql("select 1/0").show()
Traceback (most recent call last):
...
py4j.protocol.Py4JJavaError: An error occurred while calling o44.showString.
: org.apache.spark.SparkArithmeticException: [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
== SQL(line 1, position 8) ==
select 1/0
       ^^^

	at org.apache.spark.sql.errors.QueryExecutionErrors$.divideByZeroError(QueryExecutionErrors.scala:225)
... JVM's stacktrace

>>> spark.sql("select 1/0").show()
Traceback (most recent call last):
...
pyspark.errors.exceptions.connect.SparkConnectGrpcException: (org.apache.spark.SparkArithmeticException) [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
== SQL(line 1, position 8) ==
select 1/0
       ^^^

Does this PR introduce any user-facing change?

Yes. The error is now raised as a dedicated PySpark exception, so the error message is more readable.

>>> spark.sql("select 1/0").show()
Traceback (most recent call last):
...
pyspark.errors.exceptions.captured.ArithmeticException: [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
== SQL(line 1, position 8) ==
select 1/0
       ^^^

or, with Spark Connect:

>>> spark.sql("select 1/0").show()
Traceback (most recent call last):
...
pyspark.errors.exceptions.connect.ArithmeticException: [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
== SQL(line 1, position 8) ==
select 1/0
       ^^^
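
To illustrate what a dedicated exception class gives the caller, here is a minimal, self-contained sketch in plain Python. It does not use PySpark and is not the code in this patch: the `CapturedException` hierarchy, the `_JVM_TO_PYTHON` table, and the `convert` helper are hypothetical stand-ins. Only the `org.apache.spark.SparkArithmeticException` class name comes from the tracebacks above; the other JVM class names in the table are assumptions.

```python
class CapturedException(Exception):
    """Hypothetical base class for errors translated from the JVM side."""


class ArithmeticException(CapturedException):
    pass


class ArrayIndexOutOfBoundsException(CapturedException):
    pass


class DateTimeException(CapturedException):
    pass


class NumberFormatException(CapturedException):
    pass


class SparkRuntimeException(CapturedException):
    pass


class UnknownException(CapturedException):
    """Fallback used when no specific class is registered."""


# Hypothetical lookup table: JVM class name -> Python exception class.
# Only the first key appears in the PR description; the rest are assumed.
_JVM_TO_PYTHON = {
    "org.apache.spark.SparkArithmeticException": ArithmeticException,
    "org.apache.spark.SparkArrayIndexOutOfBoundsException": ArrayIndexOutOfBoundsException,
    "org.apache.spark.SparkDateTimeException": DateTimeException,
    "org.apache.spark.SparkNumberFormatException": NumberFormatException,
    "org.apache.spark.SparkRuntimeException": SparkRuntimeException,
}


def convert(jvm_class: str, message: str) -> CapturedException:
    """Build the most specific Python exception for a JVM class name."""
    exception_class = _JVM_TO_PYTHON.get(jvm_class, UnknownException)
    return exception_class(message)


# The caller can now catch the error by type instead of parsing a message.
try:
    raise convert(
        "org.apache.spark.SparkArithmeticException",
        "[DIVIDE_BY_ZERO] Division by zero.",
    )
except ArithmeticException as exc:
    handled = f"caught {type(exc).__name__}: {exc}"
```

An exact-match table keeps the sketch short; class names it does not know fall back to `UnknownException` rather than failing, which mirrors the situation before this change, where such errors surfaced only as a generic wrapper.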

How was this patch tested?

Added the related tests.

Closes apache#40538 from ueshin/issues/SPARK-42911/exceptions.

Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>

HyukjinKwon · 2023-03-27T00:24:42Z

Merged to branch-3.4.

Closes #40547 from ueshin/issues/SPARK-42911/3.4/exceptions.

Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>

ueshin requested a review from HyukjinKwon March 24, 2023 18:28

github-actions Bot added BUILD CONNECT CORE PYTHON SQL labels Mar 24, 2023

ueshin mentioned this pull request Mar 24, 2023

[SPARK-42911][PYTHON] Introduce more basic exceptions #40538

Closed

HyukjinKwon approved these changes Mar 27, 2023

HyukjinKwon closed this Mar 27, 2023

ueshin wants to merge 1 commit into apache:branch-3.4 from ueshin:issues/SPARK-42911/3.4/exceptions
