SPARK-1572 Don't kill Executor if PythonRDD fails while computing parent #486
Conversation
Previously, the behavior was that if the parent RDD threw any exception other than IOException or FileNotFoundException (which is quite possible for Hadoop input sources), the entire Executor would crash, because the default thread's uncaught exception handler calls System.exit().
This patch addresses two related issues:
1. Always catch exceptions in this reader thread.
2. Don't mask readerException when Python throws an EOFError after worker.shutdownOutput() is called.
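To make the two points above concrete, here is a minimal, self-contained Scala sketch of the pattern being described. It is not the actual PythonRDD code: the object name, the simulated RuntimeException, and the use of java.io.EOFException as a JVM-side analogue of the Python EOFError are illustrative stand-ins; only `readerException` and the shutdownOutput()-then-EOF sequence come from the description above.

```scala
import java.io.EOFException

object ReaderThreadSketch {
  // Hypothetical stand-in for the readerException field mentioned in the description.
  @volatile private var readerException: Throwable = null

  def main(args: Array[String]): Unit = {
    val writerThread = new Thread("reader-thread sketch") {
      override def run(): Unit = {
        try {
          // Stand-in for computing the parent RDD and streaming it to the Python worker;
          // if an exception escaped this thread, the executor's default uncaught
          // exception handler would call System.exit().
          throw new RuntimeException("parent RDD failed")
        } catch {
          case t: Throwable =>
            // 1. Always catch: record the failure for the task side to rethrow.
            readerException = t
        }
      }
    }
    writerThread.start()
    writerThread.join()

    // 2. Don't mask readerException: if the worker's stream ends with an EOF after
    //    shutdownOutput(), surface the recorded reader exception instead of the EOF.
    try {
      throw new EOFException("worker stream closed early")
    } catch {
      case eof: EOFException =>
        val cause = if (readerException != null) readerException else eof
        println(s"task would fail with: $cause")
    }
  }
}
```

The idea is that the task fails with the real error while the Executor JVM stays alive.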
Merged build triggered.
Merged build started.
Merged build finished.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14338/
Hopefully this fix can go into 1.0, as the underlying bug both obfuscates real errors while running pyspark and can kill a whole application if triggered.
Jenkins, retest this please.
Merged build triggered.
Merged build started.
Merged build finished.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14362/
This PR timed out inside the pyspark tests. I killed it because it was blocking a bunch of other PRs from finishing.
Might be worth running the pyspark tests locally.
Jenkins, retest this please.
Merged build triggered.
Merged build started.
Merged build finished.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14368/
Merged build triggered.
Merged build started.
Whoops, turns out FileNotFoundException extends IOException. Who knew! (Also we need to get rid of that catch IOException...)
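For readers following along, a tiny sketch of the class-hierarchy point made in the comment above (the object name and the thrown message are made up for illustration): java.io.FileNotFoundException extends IOException, so any branch that catches IOException already catches FileNotFoundException as well.

```scala
import java.io.{FileNotFoundException, IOException}

object CatchHierarchySketch {
  def main(args: Array[String]): Unit = {
    try {
      // FileNotFoundException is a subclass of IOException...
      throw new FileNotFoundException("missing input file")
    } catch {
      // ...so this single IOException branch already handles it.
      case e: IOException => println(s"caught as IOException: $e")
    }
  }
}
```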
Merged build finished. All automated tests passed.
All automated tests passed.
Thanks - I merged this. |
Previously, the behavior was that if the parent RDD threw any exception other than IOException or FileNotFoundException (which is quite possible for Hadoop input sources), the entire Executor would crash, because the default thread's uncaught exception handler calls System.exit().
This patch addresses two related issues:
1. Always catch exceptions in this reader thread.
2. Don't mask readerException when Python throws an EOFError after worker.shutdownOutput() is called.
Author: Aaron Davidson <aaron@databricks.com>
Closes #486 from aarondav/pyspark and squashes the following commits:
fbb11e9 [Aaron Davidson] Make sure FileNotFoundExceptions are handled same as before
b9acb3e [Aaron Davidson] SPARK-1572 Don't kill Executor if PythonRDD fails while computing parent
(cherry picked from commit a967b00)
Signed-off-by: Patrick Wendell <pwendell@gmail.com>