-
Notifications
You must be signed in to change notification settings - Fork 28k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-45739][PYTHON] Catch IOException instead of EOFException alone for faulthandler #43600
Conversation
cc @ueshin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM.
Although it looks irrelevant, could you re-trigger the failed PySpark test, @HyukjinKwon ?
|
Retriggered and passed at https://github.com/HyukjinKwon/spark/actions/runs/6705864326/job/18242116209 |
Merged to master. |
Late LGTM. |
…for Python execution in SQL ### What changes were proposed in this pull request? This PR proposes to make `faulthandler` as a runtime configuration so we can turn on and off during runtime. ### Why are the changes needed? `faulthandler` feature within PySpark is really useful especially to debug an errors that regular Python interpreter cannot catch out of the box such as a segmentation fault errors, see also #43600. It would be very useful to convert this as a runtime configuration without restarting the shell. ### Does this PR introduce _any_ user-facing change? Yes, users can now set `spark.sql.execution.pyspark.udf.faulthandler.enabled` during runtime to enable `faulthandler` feature. ### How was this patch tested? Unittest added ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43635 from HyukjinKwon/runtime-conf. Authored-by: Hyukjin Kwon <gurwls223@apache.org> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
What changes were proposed in this pull request?
This PR improves
spark.python.worker.faulthandler.enabled
feature by catchingIOException
instead ofEOFException
(narrower).Why are the changes needed?
Exceptions such as
java.net.SocketException: Connection reset
can happen because the worker unexpectedly die. We should better catch all IO exception there.Does this PR introduce any user-facing change?
Yes, but only in special cases. When the worker dies unexpectedly during its initialization, this can happen.
How was this patch tested?
I tested this with Spark Connect:
./sbin/start-connect-server.sh --conf spark.python.daemon.module=malformed_daemon --conf spark.python.worker.faulthandler.enabled=true --jars `ls connector/connect/server/target/**/spark-connect*SNAPSHOT.jar`
./bin/pyspark --remote "sc://localhost:15002"
Before
After
Was this patch authored or co-authored using generative AI tooling?
No.