SPARK-1572 Don't kill Executor if PythonRDD fails while computing parent #486
Conversation
Previously, the behavior was that if the parent RDD threw any exception other than IOException or FileNotFoundException (which is quite possible for Hadoop input sources), the entire Executor would crash, because the default thread's uncaught exception handler calls System.exit().
This patch addresses two related issues:
1. Always catch exceptions in this reader thread.
2. Don't mask readerException when Python throws an EOFError after worker.shutdownOutput() is called.
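To make the two points above concrete, here is a minimal, self-contained Scala sketch of the pattern being described. It is not the actual PythonRDD code: the object name, the simulated RuntimeException, and the use of java.io.EOFException as a JVM-side analogue of the Python EOFError are illustrative stand-ins; only `readerException` and the shutdownOutput()-then-EOF sequence come from the description above.

```scala
import java.io.EOFException

object ReaderThreadSketch {
  // Hypothetical stand-in for the readerException field mentioned in the description.
  @volatile private var readerException: Throwable = null

  def main(args: Array[String]): Unit = {
    val writerThread = new Thread("reader-thread sketch") {
      override def run(): Unit = {
        try {
          // Stand-in for computing the parent RDD and streaming it to the Python worker;
          // if an exception escaped this thread, the executor's default uncaught
          // exception handler would call System.exit().
          throw new RuntimeException("parent RDD failed")
        } catch {
          case t: Throwable =>
            // 1. Always catch: record the failure for the task side to rethrow.
            readerException = t
        }
      }
    }
    writerThread.start()
    writerThread.join()

    // 2. Don't mask readerException: if the worker's stream ends with an EOF after
    //    shutdownOutput(), surface the recorded reader exception instead of the EOF.
    try {
      throw new EOFException("worker stream closed early")
    } catch {
      case eof: EOFException =>
        val cause = if (readerException != null) readerException else eof
        println(s"task would fail with: $cause")
    }
  }
}
```

The idea is that the task fails with the real error while the Executor JVM stays alive.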
Merged build triggered.
Merged build started.
Merged build finished.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14338/
Hopefully this fix can go into 1.0, as the underlying bug both obfuscates real errors while running pyspark and can kill a whole application if triggered.
Jenkins, retest this please.
Merged build triggered.
Merged build started.
Merged build finished.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14362/
This PR timed out inside the pyspark tests. I killed it because it was blocking a bunch of other PRs from finishing.
Might be worth running the pyspark tests locally.
Jenkins, retest this please.
Merged build triggered.
Merged build started.
Merged build finished.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14368/
Merged build triggered.
Merged build started.
Whoops, turns out FileNotFoundException extends IOException. Who knew! (Also we need to get rid of that catch IOException...)
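For readers following along, a tiny sketch of the class-hierarchy point made in the comment above (the object name and the thrown message are made up for illustration): java.io.FileNotFoundException extends IOException, so any branch that catches IOException already catches FileNotFoundException as well.

```scala
import java.io.{FileNotFoundException, IOException}

object CatchHierarchySketch {
  def main(args: Array[String]): Unit = {
    try {
      // FileNotFoundException is a subclass of IOException...
      throw new FileNotFoundException("missing input file")
    } catch {
      // ...so this single IOException branch already handles it.
      case e: IOException => println(s"caught as IOException: $e")
    }
  }
}
```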
Merged build finished. All automated tests passed.
All automated tests passed.
Thanks - I merged this. |
Previously, the behavior was that if the parent RDD threw any exception other than IOException or FileNotFoundException (which is quite possible for Hadoop input sources), the entire Executor would crash, because the default thread's uncaught exception handler calls System.exit().
This patch addresses two related issues:
1. Always catch exceptions in this reader thread.
2. Don't mask readerException when Python throws an EOFError after worker.shutdownOutput() is called.
Author: Aaron Davidson <aaron@databricks.com>
Closes #486 from aarondav/pyspark and squashes the following commits:
fbb11e9 [Aaron Davidson] Make sure FileNotFoundExceptions are handled same as before
b9acb3e [Aaron Davidson] SPARK-1572 Don't kill Executor if PythonRDD fails while computing parent
(cherry picked from commit a967b00)
Signed-off-by: Patrick Wendell <pwendell@gmail.com>