Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-7735] [pyspark] Raise Exception on non-zero exit from pipe commands #6262

Closed
wants to merge 24 commits into from

Conversation

megatron-me-uk
Copy link
Contributor

This will allow problems with piped commands to be detected.
This will also allow tasks to be retried where errors are rare (such as network problems in piped commands).

This will allow problems with piped commands to be detected.
This will also allow tasks to be retried where errors are rare (such as network problems in piped commands).
@megatron-me-uk
Copy link
Contributor Author

A simple test of this:

a = sc.parallelize([1, 2, 3])
b = a.pipe('cc') # a clearly incorrect pipe command
b.collect()

The old behaviour is to return an empty list ([]) with no errors and fairly quiet logs if running on a distributed cluster.
The new behaviour is for the Job to fail with the following text in the logs:

: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 0.0 failed 1 times, most recent failure: Lost task 2.0 in stage 0.0 (TID 2, localhost): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/usr/local/Cellar/apache-spark/1.3.0/libexec/python/pyspark/worker.py", line 101, in main
process()
File "/usr/local/Cellar/apache-spark/1.3.0/libexec/python/pyspark/worker.py", line 96, in process
serializer.dump_stream(func(split_index, iterator), outfile)
File "/usr/local/Cellar/apache-spark/1.3.0/libexec/python/pyspark/rdd.py", line 270, in func
return f(iterator)
File "/usr/local/Cellar/apache-spark/1.3.0/libexec/python/pyspark/rdd.py", line 666, in func
raise Exception("Pipe function %s' exited with error code %d" %(command, pipe.returncode) ) Exception: Pipe functioncc' exited with error code 1

@srowen
Copy link
Member

srowen commented May 19, 2015

@megatron-me-uk have a look at https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark first please -- you missed a few steps here. The change may well be fine.

@megatron-me-uk
Copy link
Contributor Author

Ah, I hadn't seen that! Will take a look.

@megatron-me-uk
Copy link
Contributor Author

I believe that this change will bring pyspark more in line with the operation of the scala implementation.
See:

throw new Exception("Subprocess exited with status " + exitStatus)

In scala, the output of an external command is read from stdout until EOF. Then the exit status is checked. If exitStatus is not Zero an Exception is thrown.
Here, I have implemented the same behaviour in python.

@srowen
Copy link
Member

srowen commented May 19, 2015

I think it makes sense; really I was getting at making a JIRA to track this.

@megatron-me-uk
Copy link
Contributor Author

OK, seems I have to create an account etc. I will put it on my to-do list. Thanks for the help!

@megatron-me-uk megatron-me-uk changed the title Raise Exception on non-zero exit from pipe commands [SPARK-7735] [pyspark] Raise Exception on non-zero exit from pipe commands May 19, 2015
@lucamartinetti
Copy link
Contributor

Ran into the same issue. Thanks for the patch!

@JoshRosen
Copy link
Contributor

Jenkins, this is ok to test.

@JoshRosen
Copy link
Contributor

Can you add a regression test for this in python/pyspark/tests.py?

@SparkQA
Copy link

SparkQA commented May 31, 2015

Test build #33865 has finished for PR 6262 at commit f552d49.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 3, 2015

Test build #34085 has finished for PR 6262 at commit 5745d85.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 4, 2015

Test build #34175 has finished for PR 6262 at commit 0974f98.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

This is an error in PEP8 but not in pylint.
@SparkQA
Copy link

SparkQA commented Jun 4, 2015

Test build #34178 has finished for PR 6262 at commit 1b3dc4e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@megatron-me-uk
Copy link
Contributor Author

The test failed on "org.apache.spark.network.netty.NettyBlockTransferSecuritySuite.security mismatch auth off on client" which I don't think is related to this pull request :(

@SparkQA
Copy link

SparkQA commented Jun 15, 2015

Test build #34945 has finished for PR 6262 at commit 8db4073.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 15, 2015

Test build #34948 has finished for PR 6262 at commit cc1a73d.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 18, 2015

Test build #35126 has finished for PR 6262 at commit 3ab8c7a.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 18, 2015

Test build #35136 has finished for PR 6262 at commit 3344a21.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Also be more specific about the Exception we expect to see
@SparkQA
Copy link

SparkQA commented Jun 18, 2015

Test build #35142 has finished for PR 6262 at commit 491d3fc.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@davies
Copy link
Contributor

davies commented Jun 23, 2015

The jenkins script is looking for results for JUnit, but we skipped all scala tests if only python or R is modified. I think we should fix the jenkins script.

Davies Liu
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)

已使用 Sparrow (http://www.sparrowmailapp.com/?sig)

在 2015年6月23日 星期二,上午8:02,Josh Rosen 写道:

Yep, it's a spurious failure in the test script infra; SparkQA says it's okay through, which is what we want.

Sent from my phone

On Jun 23, 2015, at 2:48 AM, Scott Taylor <notifications@github.com (mailto:notifications@github.com)> wrote:

Recording test results
ERROR: Publisher 'Publish JUnit test result report' failed: No test report files were found. Configuration error?
Finished: FAILURE
I think that error is unrelated...


Reply to this email directly or view it on GitHub.


Reply to this email directly or view it on GitHub (#6262 (comment)).

raise Exception("Pipe function `%s' exited "
"with error code %d" % (command, pipe.returncode))
else:
return None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should make check_return_code a generator, for example:

def check_return_code():
      # check return code
      for x in range(0):
         yield x

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good point I will change it.

@SparkQA
Copy link

SparkQA commented Jun 24, 2015

Test build #35676 has finished for PR 6262 at commit 8a9ef9c.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 24, 2015

Test build #35693 has finished for PR 6262 at commit a0c0161.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

permissive - do not check returncode
strict - only allow returncode 0
grep - allow returncode 0 or 1
add optional argument 'mode' for rdd.pipe
@SparkQA
Copy link

SparkQA commented Jun 30, 2015

Test build #36208 has finished for PR 6262 at commit b0ac3a4.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

return True when an exception should be raised
@SparkQA
Copy link

SparkQA commented Jun 30, 2015

Test build #36213 has finished for PR 6262 at commit eb4801c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@megatron-me-uk
Copy link
Contributor Author

OK so I have implemented an optional argument 'mode' which by default ('permissive') maintains the current behaviour. I have added two other modes:
'strict' - raises an error if it encounters a non-zero returncode
'grep' - raises an error for returncodes that are not 1 or 0

For using grep I would recommend the 'grep' mode as it is probably better to raise an exception if grep fails for other reasons than not finding any matches.

I wanted to update the docstring with this information but haven't yet found an example where optional arguments are documented which I can use as a style guide.

@@ -687,13 +687,25 @@ def groupBy(self, f, numPartitions=None):
return self.map(lambda x: (f(x), x)).groupByKey(numPartitions)

@ignore_unicode_prefix
def pipe(self, command, env={}):
def pipe(self, command, env={}, mode='permissive'):
"""
Return an RDD created by piping elements to a forked external process.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add doc for mode?

@davies
Copy link
Contributor

davies commented Jul 1, 2015

Could we only have two mode? Fail on non-zero code or not, then it will be easier to understand. We can call it checkCode, which is False by default, grep could work with checkCode=False. cc @JoshRosen

@megatron-me-uk
Copy link
Contributor Author

I guess the benefit of having a third mode is that grep can return 1 for no results without raising an exception but if grep encounters an error of some unknown kind it will return 2 and an exception will be raised.

use boolean checkCode rather than more complicated mode optional argument.

Also add param to docstring
Use boolean checkCode optional parameter
@SparkQA
Copy link

SparkQA commented Jul 7, 2015

Test build #36666 has finished for PR 6262 at commit 574b564.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 7, 2015

Test build #36667 has finished for PR 6262 at commit 98fa101.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@megatron-me-uk
Copy link
Contributor Author

I have changed the implementation of the optional parameter to a boolean checkCode and updated the docstring to reflect that.

@SparkQA
Copy link

SparkQA commented Jul 8, 2015

Test build #36790 has finished for PR 6262 at commit 04ae1d5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@megatron-me-uk
Copy link
Contributor Author

I have simplified the optional parameter to be a boolean and added this to the docstring.

megatron-me-uk added a commit that referenced this pull request Jul 11, 2015
…mands

This will allow problems with piped commands to be detected.
This will also allow tasks to be retried where errors are rare (such as network problems in piped commands).

Author: Scott Taylor <github@megatron.me.uk>

Closes #6262 from megatron-me-uk/patch-2 and squashes the following commits:

04ae1d5 [Scott Taylor] Remove spurious empty line
98fa101 [Scott Taylor] fix blank line style error
574b564 [Scott Taylor] Merge pull request #2 from megatron-me-uk/patch-4
0c1e762 [Scott Taylor] Update rdd pipe method for checkCode
ab9a2e1 [Scott Taylor] Update rdd pipe tests for checkCode
eb4801c [Scott Taylor] fix fail_condition
b0ac3a4 [Scott Taylor] Merge pull request #1 from megatron-me-uk/megatron-me-uk-patch-1
a307d13 [Scott Taylor] update rdd tests to test pipe modes
34fcdc3 [Scott Taylor] add optional argument 'mode' for rdd.pipe
a0c0161 [Scott Taylor] fix generator issue
8a9ef9c [Scott Taylor] make check_return_code an iterator
0486ae3 [Scott Taylor] style fixes
8ed89a6 [Scott Taylor] Chain generators to prevent potential deadlock
4153b02 [Scott Taylor] fix list.sort returns None
491d3fc [Scott Taylor] Pass a function handle to assertRaises
3344a21 [Scott Taylor] wrap assertRaises with QuietTest
3ab8c7a [Scott Taylor] remove whitespace for style
cc1a73d [Scott Taylor] fix style issues in pipe test
8db4073 [Scott Taylor] Add a test for rdd pipe functions
1b3dc4e [Scott Taylor] fix missing space around operator style
0974f98 [Scott Taylor] add space between words in multiline string
45f4977 [Scott Taylor] fix line too long style error
5745d85 [Scott Taylor] Remove space to fix style
f552d49 [Scott Taylor] Catch non-zero exit from pipe commands
@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@davies
Copy link
Contributor

davies commented Jul 13, 2015

@megatron-me-uk This is merged into master, could you close this PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
9 participants