[SPARK-7735] [pyspark] Raise Exception on non-zero exit from pipe commands #6262
@@ -874,6 +874,18 @@ def test_sortByKey_uses_all_partitions_not_only_first_and_last(self):

```python
        for size in sizes:
            self.assertGreater(size, 0)

    def test_pipe_functions(self):
        data = ['1', '2', '3']
        rdd = self.sc.parallelize(data)
        with QuietTest(self.sc):
            self.assertEqual([], rdd.pipe('cc').collect())
            self.assertRaises(Py4JJavaError, rdd.pipe('cc', checkCode=True).collect)
        result = rdd.pipe('cat').collect()
```
Could you also add a test for …

I can do that, although I think a non-zero exit code from grep will raise an exception in the Scala implementation. See (what I just found): I see that grep returns 1 if it doesn't match any lines in a partition. Raising an exception in that case may well be annoying, but other shell commands can return 1 on errors (for example http://www.postgresql.org/docs/8.3/static/app-psql.html). I wonder what the best solution is here?

Could we have an option for this? At least, we should have an option to support …

This issue can be worked around using …

NAVER - http://www.naver.com/ Your message to sujkh@naver.com, <Re: [spark] [SPARK-7735] [pyspark] Raise Exception on non-zero exit from pipe commands (#6262)>, could not be delivered for the following reason: the recipient has blocked mail from you.

@davies, I guess that your comment about …

Yes, that's something I had in mind. Maybe add an option to …
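The per-partition point raised in this thread can be checked outside Spark. This is a minimal sketch, assuming a POSIX `grep` on the PATH and splitting the data into three one-element partitions by hand (how `parallelize` actually slices data depends on the number of slices). `partition_exit_codes` is a hypothetical helper, not a PySpark API: it runs the command once per partition and records each exit status.

```python
import subprocess


def partition_exit_codes(partitions, command):
    """Run `command` (an argv list) once per partition, feeding that
    partition's elements on stdin one per line, and return the exit codes."""
    codes = []
    for lines in partitions:
        proc = subprocess.run(
            command,
            input="".join(line + "\n" for line in lines),
            capture_output=True,  # keep grep's output out of the console
            text=True,
        )
        codes.append(proc.returncode)
    return codes


# Hypothetical layout: one element per partition.
partitions = [["1"], ["2"], ["3"]]
print(partition_exit_codes(partitions, ["grep", "4"]))  # [1, 1, 1]
print(partition_exit_codes(partitions, ["grep", "2"]))  # [1, 0, 1]
```

Even `grep 2`, which does match one element overall, exits with 1 on the two partitions that contain no match, so an unconditional per-partition exit-code check would fail that job as well.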
```python
        result.sort()
        for x, y in zip(data, result):
            self.assertEqual(x, y)
        self.assertRaises(Py4JJavaError, rdd.pipe('grep 4', checkCode=True).collect)
        self.assertEqual([], rdd.pipe('grep 4').collect())


class ProfilerTests(PySparkTestCase):
```
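The behaviour `test_pipe_functions` checks can be sketched in plain Python, outside Spark. `pipe_partition` is a hypothetical helper, not PySpark's `RDD.pipe`, and the sketch assumes `cat` and `grep` are on the PATH. Each element becomes one line on the command's stdin, stdout comes back as a list of lines, and the exit status is enforced only when `check_code` is set. The sketch raises a plain `RuntimeError`; in the real test the failure shows up as `Py4JJavaError` because the Python worker's error is reported back through the JVM when `collect()` runs.

```python
import shlex
import subprocess


def pipe_partition(lines, command, check_code=False):
    """Pipe one partition's elements through an external command.

    Each element is written to stdin as one line; stdout is returned as a
    list of lines without trailing newlines. If check_code is True and the
    command exits with a non-zero status, RuntimeError is raised.
    """
    proc = subprocess.run(
        shlex.split(command),  # "grep 4" -> ["grep", "4"]; no shell involved
        input="".join(str(x) + "\n" for x in lines),
        stdout=subprocess.PIPE,  # capture output; stderr passes through
        text=True,
    )
    if check_code and proc.returncode != 0:
        raise RuntimeError(
            "command %r exited with status %d" % (command, proc.returncode)
        )
    return proc.stdout.splitlines()


data = ["1", "2", "3"]
print(pipe_partition(data, "cat"))     # ['1', '2', '3']
print(pipe_partition(data, "grep 4"))  # [] (grep exits 1, ignored by default)
try:
    pipe_partition(data, "grep 4", check_code=True)
except RuntimeError as err:
    print("failed:", err)  # failed: command 'grep 4' exited with status 1
```

For brevity the sketch buffers the whole partition through `subprocess.run`; a version meant for large partitions would stream input and output instead, for example with `subprocess.Popen` and a separate writer thread so the pipe buffers cannot fill up and deadlock.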
Could you add doc for `mode`?