
[SPARK-27992][PYTHON] Allow Python to join with connection thread to propagate errors #24834

Conversation

BryanCutler
Member

@BryanCutler BryanCutler commented Jun 10, 2019

What changes were proposed in this pull request?

Currently with toLocalIterator() and toPandas() with Arrow enabled, if the Spark job run by the background serving thread fails, the error is caught and sent to Python through the PySpark serializer.
This is not ideal because it only catches a SparkException, it won't handle an error that occurs in the serializer, and each method needs its own special handling to propagate the error.

This PR instead returns the Python server object along with the serving port and authentication info, which allows the Python caller to join with the serving thread. During the call to join, the serving thread's Future is completed either successfully or with an exception; in the latter case, the exception is propagated to Python through the Py4j call.
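A minimal sketch of the resulting flow inside DataFrame._collectAsArrow() (hedged: identifiers follow the diff quoted later in this thread, and the exact import locations are assumptions):

```python
# Minimal sketch of the proposed flow, assuming collectAsArrowToPython() now
# returns (port, auth_secret, server); not the PR's final code.
from pyspark.rdd import _load_from_socket
from pyspark.serializers import ArrowCollectSerializer

port, auth_secret, jsocket_auth_server = self._jdf.collectAsArrowToPython()
results = list(_load_from_socket((port, auth_secret), ArrowCollectSerializer()))
jsocket_auth_server.getResult()  # join the serving thread; re-raise any JVM exception
```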

How was this patch tested?

Existing tests

@BryanCutler
Member Author

I think this might be a better way to propagate exceptions from the Python connection serving thread for the toPandas() with Arrow and toLocalIterator() cases. This way, any exception in the serving thread will be raised in Python when getResult() is called, which joins the thread, evaluates the thread's future, and raises any exception that occurred.

Here I duplicated the serveToStream() code path so that, if we want to backport this fix to branch-2.4, it will be a lot easier. If we don't want to backport, I can clean this up quite a bit. I still think it is probably better not to backport due to the risk, but I'll leave this as is in case we want to discuss the possibility.

@BryanCutler
Member Author

From the discussion in #24677, regarding the DataFrame.collect() / collectToPython() code path: this doesn't have quite the same issue. collect() and the standard toPandas() methods first make a Py4j call, then in Scala the Spark job runs and all data is gathered in the main thread. Once the data is local, the serializer is started in the background thread, completing the Py4j call. If the Spark job raises an error, the Py4j call is interrupted and the error is propagated. If somehow there is an error during the serialization part, I don't think it will be propagated to Python.

toLocalIterator() and toPandas() with Arrow work differently: they first make a Py4j call, which sets up the socket connection and starts the background thread. The Py4j call returns immediately, then the Spark job(s) are run in the background thread. The current approach catches a SparkException in the background thread and sends it through the serializer, whereas this PR returns the serving thread object from the first Py4j call, so that an additional Py4j call can join and evaluate the thread future, raising any exception that occurred.

* This is the same as serveToStream, only it returns a server object that
* can be used to sync in Python.
*/
private[spark] def serveToStreamWithSync(
Member Author

This could be cleaned up to replace the existing serveToStream. It just returns the SocketAuthServer object as the third element in the Array, and it can be ignored if no synchronization is needed.

private [spark] class SocketFuncServer(
authHelper: SocketAuthHelper,
threadName: String,
func: Socket => Unit) extends SocketAuthServer[Unit](authHelper, threadName) {
Member Author

I don't think we need SocketAuthServer.setupOneConnectionServer if we have this as well, so it could be cleaned up.

Member Author

I removed SocketAuthServer.setupOneConnectionServer and replaced its usage with SocketFuncServer.


# Collect list of un-ordered batches where last element is a list of correct order indices
results = list(_load_from_socket(sock_info, ArrowCollectSerializer()))
from pyspark.rdd import _create_local_socket
Member Author
@BryanCutler BryanCutler Jun 10, 2019

This should be cleaned up; the code below basically duplicates _load_from_socket.

@BryanCutler
Member Author

I'm attaching error messages from toPandas()

  1. master without Arrow
    master_toPandas_error_no_arrow.txt

  2. master with Arrow
    master_toPandas_error_with_arrow.txt

  3. this PR with Arrow
    SPARK-27992_toPandas_error_with_arrow.txt

To sum up the differences:
(2) master with Arrow is a little different because the Python error is a RuntimeError rather than a Py4JJavaError, and for some reason there is a Driver stacktrace section that is empty. That's probably not a big deal, but it still looks odd and I'm not sure why it is empty.

(3) this PR is more similar to (1) in that it has a Py4JJavaError and the Driver stacktrace is populated correctly, but the exception is actually displayed twice: first when it occurs in the JVM serving thread, and then again when it is propagated by Py4j. I'm not sure if there is a good way to avoid this.

@BryanCutler
Member Author

What do you guys think @HyukjinKwon @felixcheung @dvogelbacher ?

@SparkQA

SparkQA commented Jun 11, 2019

Test build #106364 has finished for PR 24834 at commit f20a156.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dvogelbacher
Contributor

I really like the idea. This is much better than having to define a specific protocol for propagating errors like we currently do.

From a short look at the R code, it seems R would also be affected by https://jira.apache.org/jira/browse/SPARK-27805, and using this same mechanism in R would fix it there too?

from pyspark.rdd import _create_local_socket
sock_file = _create_local_socket((port, auth_secret))
results = list(ArrowCollectSerializer().load_stream(sock_file))
jserver_obj.getResult() # Join serving thread and raise any exceptions
Contributor

We might want to have this in a finally clause, so that if we have an error during serialization (which might be caused by an exception in the JVM), we will still get the original exception.
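A hedged sketch of this suggestion, reusing the names from the quoted snippet rather than the PR's final code:

```python
# Sketch of the suggestion: join the serving thread in a finally block so a
# JVM-side failure is still surfaced even if deserialization breaks first.
sock_file = _create_local_socket((port, auth_secret))
try:
    results = list(ArrowCollectSerializer().load_stream(sock_file))
finally:
    jserver_obj.getResult()  # re-raises the original JVM exception, if any
```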

Member
@felixcheung felixcheung left a comment

Looks reasonable. Agreed with @dvogelbacher's comment.

@BryanCutler
Member Author

Thanks @dvogelbacher and @felixcheung, I will clean this up then and apply the same fix to toLocalIterator. If it's straightforward to also fix the R collection with Arrow, then I'll do that too; otherwise we can do it in a followup.

results = list(_load_from_socket(sock_info, ArrowCollectSerializer()))
from pyspark.rdd import _create_local_socket
sock_file = _create_local_socket((port, auth_secret))
results = list(ArrowCollectSerializer().load_stream(sock_file))
Member

Yea .. looks same as _load_from_socket ..

@@ -2200,10 +2200,13 @@ def _collectAsArrow(self):
.. note:: Experimental.
"""
with SCCallSiteSync(self._sc) as css:
sock_info = self._jdf.collectAsArrowToPython()
port, auth_secret, jserver_obj = self._jdf.collectAsArrowToPython()
Member

Do you think it makes sense to make a _serialize_from_jvm (like _serialize_to_jvm)? This could be done separately.

Member Author

Possibly; _load_from_socket is basically deserializing from the JVM.
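For illustration, a hypothetical helper along those lines; the name _deserialize_from_jvm and its signature are assumptions for this sketch, not code from the PR (local_connect_and_auth is the existing PySpark utility shown further down in this thread):

```python
# Hypothetical sketch (not in the PR): connect to the JVM-side server,
# deserialize everything, then join the serving thread to surface errors.
from pyspark.java_gateway import local_connect_and_auth

def _deserialize_from_jvm(port, auth_secret, jserver, serializer):
    sockfile, sock = local_connect_and_auth(port, auth_secret)
    # The job duration is unpredictable, so don't set a socket read timeout (SPARK-18281).
    sock.settimeout(None)
    try:
        return list(serializer.load_stream(sockfile))
    finally:
        jserver.getResult()  # join the serving thread; re-raise any JVM exception
```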

@HyukjinKwon
Member

@BryanCutler, I don't mind but don't feel strongly about backporting. If you think we should, we can do it.

@HyukjinKwon
Member

The approach looks fine.

@BryanCutler BryanCutler force-pushed the pyspark-propagate-server-error-SPARK-27992 branch from f20a156 to ead8978 Compare June 20, 2019 20:49
JavaUtils.closeQuietly(serverSocket)
JavaUtils.closeQuietly(sock)
}
def serveToStream(
Member Author

Moved this from SocketAuthHelper because it seemed more fitting to be here

@@ -1389,7 +1389,9 @@ private[spark] object Utils extends Logging {
originalThrowable = cause
try {
logError("Aborting task", originalThrowable)
TaskContext.get().markTaskFailed(originalThrowable)
if (TaskContext.get() != null) {
Member Author

Using this utility here https://github.com/apache/spark/pull/24834/files#diff-0a67bc4d171abe4df8eb305b0f4123a2R184, where the task fails and completes before hitting the catch block, so TaskContext.get() returns null.

@BryanCutler BryanCutler changed the title [WIP][SPARK-27992][PYTHON] Synchronize with Python connection thread to propagate errors [SPARK-27992][PYTHON] Synchronize with Python connection thread to propagate errors Jun 20, 2019
@BryanCutler
Member Author

I cleaned up some things with SocketAuthServer and applied the fix to toLocalIterator as well. I wasn't able to look into a similar approach for the R collection with Arrow. @HyukjinKwon would you be able to check it out and see if a followup is needed?

@SparkQA

SparkQA commented Jun 20, 2019

Test build #106739 has finished for PR 24834 at commit ead8978.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 20, 2019

Test build #106740 has finished for PR 24834 at commit 5fd8684.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

Let me give some input within 3 days ..

@@ -66,42 +87,45 @@ private[spark] abstract class SocketAuthServer[T](

}

/**
* Create a socket server class and run user function on the socket in a background thread.
* This is the same as calling SocketAuthServer.setupOneConnectionServer except it creates
Member

Seems we don't have setupOneConnectionServer anymore.

Member Author

Oops, good catch

"""
port = sock_info[0]
auth_secret = sock_info[1]
sockfile, sock = local_connect_and_auth(port, auth_secret)
# The RDD materialization time is unpredictable, if we set a timeout for socket reading
# operation, it will very possibly fail. See SPARK-18281.
sock.settimeout(None)
return sockfile


def _load_from_socket(sock_info, serializer):
Member

@BryanCutler, what is sock_info expected to be? It seems it can be either a 2-tuple or a 3-tuple (with the server).

Member Author

Ugh, yeah, I'm not too happy with this. The JVM returns a 3-tuple of (port, auth_secret, server) and most places only use the first two, such as _load_from_socket. It gets a little confusing, so I thought it might be better to expand the values returned by the JVM for serveToStream etc., but it ended up with a lot of changes where the third value is ignored like this

port, auth_secret, _ = ...

and I don't think it really made things clearer. I'll try to think of something better and maybe do a followup.

@HyukjinKwon
Member

I wasn't able to look into a similar approach for the R collection with Arrow. @HyukjinKwon would you be able to check it out and see if a followup is needed?

On the R side, I will take a look after merging this in.

Member
@HyukjinKwon HyukjinKwon left a comment

Looks good, except that I think it's now difficult to read sock_info .. https://github.com/apache/spark/pull/24834/files#r296549441

BTW, let's avoid refactoring the code together with a fix .. it took me a while to understand the fix and track the changes.

@HyukjinKwon
Member

cc @vanzin and @squito as well.

Contributor
@squito squito left a comment

Generally makes sense to me -- just a few comments on things that would help me follow this code when I occasionally look at it, more as a Java developer who occasionally needs to work on the communication protocol.

One important thing -- please change the summary / description to replace "synchronize" with "join".

* Create a socket server and run user function on the socket in a background thread.
*
* The socket server can only accept one connection, or close if no connection
* in 15 seconds.
Contributor

Please save this comment -- I guess move it to the SocketAuthServer class. In particular, it's helpful to note that this only accepts one connection; it's not a long-lived thing that is reused.

Member Author

Yeah, that is a useful comment; I didn't intend to take it out. I'll put it back in.

@@ -159,7 +174,8 @@ class PyLocalIterable(object):
""" Create a synchronous local iterable over a socket """

def __init__(self, _sock_info, _serializer):
self._sockfile = _create_local_socket(_sock_info)
port, auth_secret, self.jserver_obj = _sock_info
Contributor

Just a general comment, and it might not make the most sense to address in this particular PR -- I'd find it really helpful if the Python code that deals with Java objects would annotate (somehow) the Java types. It's hard for me to figure out whether jserver_obj is a ServerSocket, a SocketAuthServer, a Py4JJavaServer, etc.

Member Author

Sure, I can rename it to something more fitting; I agree it should be clear from the name what the variable is.

@BryanCutler BryanCutler changed the title [SPARK-27992][PYTHON] Synchronize with Python connection thread to propagate errors [SPARK-27992][PYTHON] Allow Python to join with connection thread to propagate errors Jun 24, 2019
@SparkQA

SparkQA commented Jun 24, 2019

Test build #106844 has finished for PR 24834 at commit 20eb748.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@BryanCutler
Member Author

Thanks @HyukjinKwon and @squito for reviewing, I addressed your comments.

@BryanCutler BryanCutler deleted the pyspark-propagate-server-error-SPARK-27992 branch June 26, 2019 20:10
@BryanCutler
Member Author

merged to master, thanks all for reviewing!

dongjoon-hyun pushed a commit that referenced this pull request Aug 27, 2019
…nection thread to propagate errors

### What changes were proposed in this pull request?

This PR proposes to backport #24834 with minimised changes, and the tests added at #25594.

#24834 was not backported before because it basically targeted a better exception by propagating the exception from the JVM.

However, this PR actually fixed another problem accidentally (see #25594 and [SPARK-28881](https://issues.apache.org/jira/browse/SPARK-28881)). This regression seems to have been introduced by #21546.

The root cause seems to be that `runJob` with `resultHandler` is able to write partial output:

https://github.com/apache/spark/blob/23bed0d3c08e03085d3f0c3a7d457eedd30bd67f/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L3370-L3384

The JVM throws an exception, but since the JVM exception is not propagated to the Python process, the Python process doesn't know whether an exception was thrown in the JVM (it just closes the socket), which results in the following:

```
./bin/pyspark --conf spark.driver.maxResultSize=1m
```
```python
spark.conf.set("spark.sql.execution.arrow.enabled",True)
spark.range(10000000).toPandas()
```
```
Empty DataFrame
Columns: [id]
Index: []
```

With this change, the Python process catches exceptions from the JVM.

### Why are the changes needed?

It returns incorrect data, and potentially returns partial results when an exception happens on the JVM side. This is a regression; the code works fine in Spark 2.3.3.

### Does this PR introduce any user-facing change?

Yes.

```
./bin/pyspark --conf spark.driver.maxResultSize=1m
```
```python
spark.conf.set("spark.sql.execution.arrow.enabled",True)
spark.range(10000000).toPandas()
```

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../pyspark/sql/dataframe.py", line 2122, in toPandas
    batches = self._collectAsArrow()
  File "/.../pyspark/sql/dataframe.py", line 2184, in _collectAsArrow
    jsocket_auth_server.getResult()  # Join serving thread and raise any exceptions
  File "/.../lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/.../pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/.../lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o42.getResult.
: org.apache.spark.SparkException: Exception thrown in awaitResult:
    ...
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 1 tasks (6.5 MB) is bigger than spark.driver.maxResultSize (1024.0 KB)
```

It now throws an exception as expected.
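A hedged usage example built on the traceback above (it assumes the pyspark shell's `spark` session; Py4JJavaError is what reaches Python here):

```python
# Hedged example: caller handling the failure that now propagates to Python.
from py4j.protocol import Py4JJavaError

spark.conf.set("spark.sql.execution.arrow.enabled", True)
try:
    pdf = spark.range(10000000).toPandas()
except Py4JJavaError as e:
    # e.java_exception holds the underlying SparkException thrown in the JVM
    print("toPandas() failed:", e.java_exception)
```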

### How was this patch tested?

Manually, as described above. A unit test was added.

Closes #25593 from HyukjinKwon/SPARK-27992.

Lead-authored-by: HyukjinKwon <gurwls223@apache.org>
Co-authored-by: Bryan Cutler <cutlerb@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
rluta pushed a commit to rluta/spark that referenced this pull request Sep 17, 2019
…nection thread to propagate errors

kai-chi pushed a commit to kai-chi/spark that referenced this pull request Sep 26, 2019
…nection thread to propagate errors

rshkv pushed a commit to palantir/spark that referenced this pull request Jun 4, 2020
…propagate errors

rshkv pushed a commit to palantir/spark that referenced this pull request Jun 5, 2020
…propagate errors
