[SPARK-41533][CONNECT] Proper Error Handling for Spark Connect Server / Client #39212
Conversation
I'm working on implementing centralized PySpark error messages from #39137, and it will cover all errors generated by PySpark packages, including Spark Connect.
python/pyspark/sql/connect/client.py (Outdated)
```python
class SparkConnectClientException(Exception):
    def __init__(self, message: str) -> None:
        super(SparkConnectClientException, self).__init__(message)


class SparkConnectAnalysisException(SparkConnectClientException):
    def __init__(self, reason: str, message: str, plan: str) -> None:
        self._reason = reason
        self._message = message
        self._plan = plan

    def __str__(self) -> str:
        return f"{self._message}\nPlan: {self._plan}"
```
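For illustration only (not code from the PR), here is how the `__str__` override above would render one of these exceptions; the field values are made up:

```python
# Hypothetical values, purely to show the __str__ formatting above.
exc = SparkConnectAnalysisException(
    reason="org.apache.spark.sql.AnalysisException",
    message="Column 'foo' does not exist.",
    plan="Project [foo]\n+- Range (0, 10, step=1)",
)
print(exc)
# Column 'foo' does not exist.
# Plan: Project [foo]
# +- Range (0, 10, step=1)
```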
FYI: I just created a ticket for migrating Spark Connect errors into error classes in the future: SPARK-41712.
I'm definitely open to suggestions on the design of the error structure. If it doesn't make sense, please let me know.
...connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala
Add tests for SparkConnectAnalysisException with a detailed error message?
cc @itholic, who has more experience with error messages.
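As a hedged sketch of what such a test might look like (the test name and assertions are illustrative, not the PR's actual tests):

```python
# Illustrative only: exercises the __str__ contract of the proposed
# SparkConnectAnalysisException with hypothetical field values.
def test_analysis_exception_str_includes_plan() -> None:
    exc = SparkConnectAnalysisException(
        reason="org.apache.spark.sql.AnalysisException",
        message="cannot resolve 'foo' given input columns: [id]",
        plan="Project [foo]",
    )
    assert "cannot resolve 'foo'" in str(exc)
    assert "Plan: Project [foo]" in str(exc)
```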
...connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala
```diff
@@ -436,3 +568,39 @@ def _execute_and_fetch(self, req: pb2.ExecutePlanRequest) -> "pandas.DataFrame":
         if m is not None:
             df.attrs["metrics"] = self._build_metrics(m)
         return df
+
+    def _handle_error(self, rpc_error: grpc.RpcError) -> NoReturn:
```
Is it possible to match the same logic in spark/python/pyspark/sql/utils.py, lines 158 to 197 at 764edaf?
```python
def convert_exception(e: Py4JJavaError) -> CapturedException:
    assert e is not None
    assert SparkContext._jvm is not None
    assert SparkContext._gateway is not None

    jvm = SparkContext._jvm
    gw = SparkContext._gateway
    if is_instance_of(gw, e, "org.apache.spark.sql.catalyst.parser.ParseException"):
        return ParseException(origin=e)
    # Order matters. ParseException inherits AnalysisException.
    elif is_instance_of(gw, e, "org.apache.spark.sql.AnalysisException"):
        return AnalysisException(origin=e)
    elif is_instance_of(gw, e, "org.apache.spark.sql.streaming.StreamingQueryException"):
        return StreamingQueryException(origin=e)
    elif is_instance_of(gw, e, "org.apache.spark.sql.execution.QueryExecutionException"):
        return QueryExecutionException(origin=e)
    elif is_instance_of(gw, e, "java.lang.IllegalArgumentException"):
        return IllegalArgumentException(origin=e)
    elif is_instance_of(gw, e, "org.apache.spark.SparkUpgradeException"):
        return SparkUpgradeException(origin=e)

    c: Py4JJavaError = e.getCause()
    stacktrace: str = jvm.org.apache.spark.util.Utils.exceptionString(e)
    if c is not None and (
        is_instance_of(gw, c, "org.apache.spark.api.python.PythonException")
        # To make sure this only catches Python UDFs.
        and any(
            map(
                lambda v: "org.apache.spark.sql.execution.python" in v.toString(), c.getStackTrace()
            )
        )
    ):
        msg = (
            "\n An exception was thrown from the Python worker. "
            "Please see the stack trace below.\n%s" % c.getMessage()
        )
        return PythonException(msg, stacktrace)
    return UnknownException(desc=e.toString(), stackTrace=stacktrace, cause=c)
```
Let me see what I can do without replicating a gigantic list of branches on the server and client side.
I simplified the logic because it would be messy to replicate all of the exception types exactly from the SQL side. Right now, the printed output looks like this, for example:
```
SparkConnectException: (org.apache.spark.SparkNumberFormatException) [CAST_INVALID_INPUT] The value 'id' of the type "STRING" cannot be cast to "DOUBLE" because it is malformed. Correct the value as per the syntax, or change its target type. Use `try_cast` to tolerate malformed input and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
== SQL(line 1, position 8) ==
select cast('id' as double) from range(10)
```
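As a rough sketch of the client-side pattern (not the PR's exact code; the exception class and helper name here are illustrative), the handler can try to unpack a google.rpc.Status from the failed call and re-raise it as a typed exception:

```python
import grpc
from typing import NoReturn
from grpc_status import rpc_status  # optional grpcio-status package


class SparkConnectException(Exception):
    """Wraps server-side failures so users never see raw gRPC errors."""


def handle_rpc_error(rpc_error: grpc.RpcError) -> NoReturn:
    # If the server attached a google.rpc.Status to the trailing metadata,
    # surface its message; otherwise fall back to the plain gRPC details.
    status = rpc_status.from_call(rpc_error) if isinstance(rpc_error, grpc.Call) else None
    if status is not None:
        raise SparkConnectException(status.message) from None
    raise SparkConnectException(str(rpc_error)) from None
```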
Looks good otherwise. It's nice to have a way to surface better errors.
Can one of the admins verify this patch?
Pushing new version now.
...connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala
Last nit: maybe we should add the new deps to https://github.com/apache/spark/blob/master/python/docs/source/getting_started/install.rst#dependencies.
Otherwise LGTM
Updated the documentation.
Merged to master.
```diff
@@ -303,6 +376,7 @@ def __init__(self, connectionString: str, userId: Optional[str] = None):
 
         self._channel = self._builder.toChannel()
         self._stub = grpc_lib.SparkConnectServiceStub(self._channel)
+        # Configure logging for the SparkConnect client.
```
@grundprinzip @HyukjinKwon this comment is kind of dangling here... should something be called here?
What changes were proposed in this pull request?
This PR improves error handling on the Spark Connect server and client side. First, it moves the server's error handling logic into a common error handler partial function that differentiates between internal Spark errors and other runtime errors.
For custom Spark exceptions, the actual internal error is wrapped into a Google RPC Status and sent as trailing metadata to the client.
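The server is Scala, but the same packing pattern is easy to show with Python gRPC; the following is a hedged illustration only (the helper name and the ErrorInfo fields are assumptions, not the PR's code):

```python
import grpc
from google.protobuf import any_pb2
from google.rpc import code_pb2, error_details_pb2, status_pb2
from grpc_status import rpc_status


def abort_with_spark_error(context: grpc.ServicerContext, exc: Exception) -> None:
    # Record which exception type occurred so the client can pick a
    # matching client-side exception class.
    info = error_details_pb2.ErrorInfo(
        reason=type(exc).__name__,
        domain="org.apache.spark",
    )
    detail = any_pb2.Any()
    detail.Pack(info)
    status = status_pb2.Status(
        code=code_pb2.INTERNAL,
        message=str(exc),
        details=[detail],
    )
    # to_status() serializes the google.rpc.Status into the
    # "grpc-status-details-bin" trailing metadata entry.
    context.abort_with_status(rpc_status.to_status(status))
```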
On the client side, similarly, the error handling is moved into a common function. All gRPC errors are wrapped into custom exceptions to avoid presenting the user with confusing gRPC errors. If available, the attached RPC status is extracted and added to the exception.
Lastly, this patch adds basic logging functionality that can be enabled using the environment variable `SPARK_CONNECT_LOG_LEVEL`, which can be set to `info`, `warn`, `error`, and `debug`.
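A minimal sketch of such env-driven setup, assuming the level names map onto the stdlib logging levels (the PR's actual mechanics may differ):

```python
import logging
import os


def _configure_logging() -> logging.Logger:
    # Illustrative only: map SPARK_CONNECT_LOG_LEVEL onto stdlib levels.
    logger = logging.getLogger("SparkConnect")
    level = os.getenv("SPARK_CONNECT_LOG_LEVEL", "")
    if level:
        logger.setLevel(getattr(logging, level.upper(), logging.ERROR))
        logger.addHandler(logging.StreamHandler())
    return logger
```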
Why are the changes needed?
Usability
Does this PR introduce any user-facing change?
No
How was this patch tested?
UT