[SPARK-42911][PYTHON] Introduce more basic exceptions #40538

Closed
ueshin wants to merge 3 commits into apache:master from ueshin:issues/SPARK-42911/exceptions

Member

ueshin commented Mar 24, 2023

What changes were proposed in this pull request?

Introduces more basic exceptions.

  • ArithmeticException
  • ArrayIndexOutOfBoundsException
  • DateTimeException
  • NumberFormatException
  • SparkRuntimeException
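
As a rough illustration of what "capturing" an exception means here, the sketch below maps a JVM exception class name to a Python exception class. It is a simplified stand-in, not PySpark's actual conversion code: the `convert` function, the `_JVM_TO_PYTHON` table, and the locally defined classes are hypothetical. Only the five class names and the JVM name `org.apache.spark.SparkArithmeticException` come from this PR; the other JVM names are assumed to follow the same pattern.

```python
from typing import Dict, Type


class PySparkException(Exception):
    """Stand-in base class for this sketch; not the real pyspark.errors class."""


class ArithmeticException(PySparkException):
    pass


class ArrayIndexOutOfBoundsException(PySparkException):
    pass


class DateTimeException(PySparkException):
    pass


class NumberFormatException(PySparkException):
    pass


class SparkRuntimeException(PySparkException):
    pass


class UnknownException(PySparkException):
    """Fallback used when no more specific class is known."""


# JVM class name -> Python class. Only the first JVM name appears in the PR text;
# the other four are assumed to follow the same naming pattern.
_JVM_TO_PYTHON: Dict[str, Type[PySparkException]] = {
    "org.apache.spark.SparkArithmeticException": ArithmeticException,
    "org.apache.spark.SparkArrayIndexOutOfBoundsException": ArrayIndexOutOfBoundsException,
    "org.apache.spark.SparkDateTimeException": DateTimeException,
    "org.apache.spark.SparkNumberFormatException": NumberFormatException,
    "org.apache.spark.SparkRuntimeException": SparkRuntimeException,
}


def convert(jvm_class_name: str, message: str) -> PySparkException:
    """Return a Python exception matching the JVM class, or UnknownException."""
    exc_type = _JVM_TO_PYTHON.get(jvm_class_name, UnknownException)
    return exc_type(message)


if __name__ == "__main__":
    err = convert(
        "org.apache.spark.SparkArithmeticException",
        "[DIVIDE_BY_ZERO] Division by zero.",
    )
    print(type(err).__name__, "-", err)
```

An exact-name lookup keeps the sketch short; a real implementation would also need to decide how to treat JVM subclasses that are not listed in the table.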

Why are the changes needed?

There are more exceptions that Spark throws but PySpark doesn't capture.

We should introduce more basic exceptions; otherwise we still see Py4JJavaError or SparkConnectGrpcException.

>>> spark.conf.set("spark.sql.ansi.enabled", True)
>>> spark.sql("select 1/0")
DataFrame[(1 / 0): double]
>>> spark.sql("select 1/0").show()
Traceback (most recent call last):
...
py4j.protocol.Py4JJavaError: An error occurred while calling o44.showString.
: org.apache.spark.SparkArithmeticException: [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
== SQL(line 1, position 8) ==
select 1/0
       ^^^

	at org.apache.spark.sql.errors.QueryExecutionErrors$.divideByZeroError(QueryExecutionErrors.scala:225)
... JVM's stacktrace

or

>>> spark.sql("select 1/0").show()
Traceback (most recent call last):
...
pyspark.errors.exceptions.connect.SparkConnectGrpcException: (org.apache.spark.SparkArithmeticException) [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
== SQL(line 1, position 8) ==
select 1/0
       ^^^

Does this PR introduce any user-facing change?

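Yes. Errors now surface as the specific PySpark exception class instead of Py4JJavaError or SparkConnectGrpcException, so the error message is more readable.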

>>> spark.sql("select 1/0").show()
Traceback (most recent call last):
...
pyspark.errors.exceptions.captured.ArithmeticException: [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
== SQL(line 1, position 8) ==
select 1/0
       ^^^

or

>>> spark.sql("select 1/0").show()
Traceback (most recent call last):
...
pyspark.errors.exceptions.connect.ArithmeticException: [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
== SQL(line 1, position 8) ==
select 1/0
       ^^^
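
One practical consequence is that user code can catch the specific class rather than inspecting a `Py4JJavaError`. The following is a minimal sketch, assuming a PySpark version that includes this change and exports `ArithmeticException` from `pyspark.errors`. The helper `show_or_report` is hypothetical, and the import is guarded with a local stand-in class only so that the snippet also loads where PySpark is not installed.

```python
try:
    # Assumes a PySpark release that contains this change.
    from pyspark.errors import ArithmeticException
except ImportError:
    class ArithmeticException(Exception):
        """Local stand-in so the sketch loads without PySpark installed."""


def show_or_report(spark, query: str) -> bool:
    """Run `query` and show the result.

    Returns True on success and False if the query fails with an
    arithmetic error such as DIVIDE_BY_ZERO under ANSI mode.
    `spark` is any object with a SparkSession-like `sql` method.
    """
    try:
        spark.sql(query).show()
        return True
    except ArithmeticException as e:
        # The message carries the error class, e.g. "[DIVIDE_BY_ZERO] ...".
        print(f"Arithmetic error: {e}")
        return False
```

With a real session and `spark.sql.ansi.enabled` set to true, this would be called as `show_or_report(spark, "select 1/0")`. Because the regular-session and Spark Connect variants shown above share the `ArithmeticException` name, one `except` clause is intended to cover both session types.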

How was this patch tested?

Added the related tests.

Member Author

ueshin commented Mar 24, 2023

cc @itholic

Contributor

itholic left a comment

LGTM, just left a nit question.

class PythonException(SparkConnectGrpcException, BasePythonException):
"""
Exception thrown because of Spark upgrade from Spark Connect
Exceptions thrown from Spark Connect server.

Contributor

qq: Are "Spark Connect server" and "Spark Connect" different?

Only PythonException says it's thrown from Spark Connect "server".

Member Author

The comment is carried over from the previous code. We can change it to "Spark Connect" while we are here.

Contributor

itholic commented Mar 24, 2023

btw, are the examples in "Does this PR introduce any user-facing change?" the same?

Member Author

ueshin commented Mar 24, 2023

the examples in "Does this PR introduce any user-facing change?" are the same??

No. Previously we still saw py4j.protocol.Py4JJavaError or SparkConnectGrpcException; now we only see the actual exception classes.

Contributor

itholic commented Mar 24, 2023

Ah, I see. One is for a regular Spark session and the other is for a remote Spark session.

ueshin marked this pull request as draft March 24, 2023 03:15
ueshin marked this pull request as ready for review March 24, 2023 03:22

Member

HyukjinKwon commented Mar 24, 2023

Merged to master.

Member

@ueshin it has a conflict with branch-3.4. Would you mind creating a backport PR?

ueshin added a commit to ueshin/apache-spark that referenced this pull request Mar 24, 2023
Closes apache#40538 from ueshin/issues/SPARK-42911/exceptions.

Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>

Member Author

ueshin commented Mar 24, 2023

@HyukjinKwon #40547
