Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-2583] ConnectionManager cannot distinguish whether error occurred or not #1490

Closed
wants to merge 15 commits into from

Conversation

sarutak
Copy link
Member

@sarutak sarutak commented Jul 19, 2014

No description provided.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@@ -41,6 +42,13 @@ private[spark] class MessageChunkHeader(
putInt(totalSize).
putInt(chunkSize).
putInt(other).
put{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about

put(if (hasError) 1.asInstanceOf[Byte] else 0.asInstanceOf[Byte])

@rxin
Copy link
Contributor

rxin commented Jul 19, 2014

Would it be possible to add a unit test for this?

@sarutak
Copy link
Member Author

sarutak commented Jul 19, 2014

Thanks @rxin I'll try it.

@sarutak
Copy link
Member Author

sarutak commented Jul 24, 2014

I added some tests for this issue.

@sarutak
Copy link
Member Author

sarutak commented Jul 24, 2014

I added a test case to ConnectionManager.scala for this issue.

@JoshRosen
Copy link
Contributor

Jenkins, this is ok to test.

@SparkQA
Copy link

SparkQA commented Jul 29, 2014

QA tests have started for PR 1490. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17326/consoleFull

@SparkQA
Copy link

SparkQA commented Jul 29, 2014

QA results for PR 1490:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17326/consoleFull

@JoshRosen
Copy link
Contributor

The Jenkins build failed because your code failed the automated Scala style checks (see the Spark Code Style Guide). You can run sbt/sbt scalastyle to run these checks locally.

@JoshRosen
Copy link
Contributor

Thanks for submitting this patch. I agree that we should propagate errors to senders if errors occurred while processing their messages. The addition of the hasError field in MessageHeader looks fine to me.

It looks like Future supports onFailure callbacks, so I wonder if we should signal errors by having sendMessageReliably check whether an error occurred and signal failure by calling promise.failed. Note that an exception will be thrown if you call Await.result() on a future that fails. Instead of registering an onSuccess callback and detecting failures inside of it, users would have to define an additional onFailure callback or define onComplete and pattern-match on Success or Failure.

Returning a failed future might prevent bugs where users forget to check the hasFailed field, treat the failure message as a valid response, and experience unexpected behavior; instead, their code would either throw an exception or their callback would never run (assuming they never defined onFailure).

@rxin Any thoughts on signaling failure through failed futures vs. the approach in this PR?

@rxin
Copy link
Contributor

rxin commented Jul 29, 2014

I looked at this (with some confusion). Yes, I agree it would be great to just signal failure using the promise when an error occurs.

@sarutak do you think you can do that change?

@sarutak
Copy link
Member Author

sarutak commented Jul 29, 2014

Thank you for you comment @JoshRosen , @rxin . Of course I'll try it.

JoshRosen added a commit to JoshRosen/spark that referenced this pull request Aug 4, 2014
…r error reporting

Use Futures to signal failures, rather than exposing empty messages to user code.
@sarutak
Copy link
Member Author

sarutak commented Aug 4, 2014

Thanks for your back up @JoshRosen .

@JoshRosen
Copy link
Contributor

Thanks for submitting this PR; it only took a little work for me to pick it up. Do you mind closing this issue, since I'll finish this in #1758? Thanks!

@sarutak
Copy link
Member Author

sarutak commented Aug 5, 2014

O.K. I take over this issue.

asfgit pushed a commit that referenced this pull request Aug 7, 2014
This patch modifies the ConnectionManager so that error messages are sent in reply when uncaught exceptions occur during message processing.  This prevents message senders from hanging while waiting for an acknowledgment if the remote message processing failed.

This is an updated version of sarutak's PR, #1490.  The main change is to use Futures / Promises to signal errors.

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Author: Josh Rosen <joshrosen@apache.org>

Closes #1758 from JoshRosen/connection-manager-fixes and squashes the following commits:

68620cb [Josh Rosen] Fix test in BlockFetcherIteratorSuite:
83673de [Josh Rosen] Error ACKs should trigger IOExceptions, so catch only those exceptions in the test.
b8bb4d4 [Josh Rosen] Fix manager.id vs managerServer.id typo that broke security tests.
659521f [Josh Rosen] Include previous exception when throwing new one
a2f745c [Josh Rosen] Remove sendMessageReliablySync; callers can wait themselves.
c01c450 [Josh Rosen] Return Try[Message] from sendMessageReliablySync.
f1cd1bb [Josh Rosen] Clean up @sarutak's PR #1490 for [SPARK-2583]: ConnectionManager error reporting
7399c6b [Josh Rosen] Merge remote-tracking branch 'origin/pr/1490' into connection-manager-fixes
ee91bb7 [Kousuke Saruta] Modified BufferMessage.scala to keep the spark code style
9dfd0d8 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2583
e7d9aa6 [Kousuke Saruta] rebase to master
326a17f [Kousuke Saruta] Add test cases to ConnectionManagerSuite.scala for SPARK-2583
2a18d6b [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2583
22d7ebd [Kousuke Saruta] Add test cases to BlockManagerSuite for SPARK-2583
e579302 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2583
281589c [Kousuke Saruta] Add a test case to BlockFetcherIteratorSuite.scala for fetching block from remote from successfully
0654128 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2583
ffaa83d [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2583
12d3de8 [Kousuke Saruta] Added BlockFetcherIteratorSuite.scala
4117b8f [Kousuke Saruta] Modified ConnectionManager to be alble to handle error during processing message
717c9c3 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2583
6635467 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2583
e2b8c4a [Kousuke Saruta] Modify to propagete error using ConnectionManager
(cherry picked from commit 17caae4)

Signed-off-by: Patrick Wendell <pwendell@gmail.com>
asfgit pushed a commit that referenced this pull request Aug 7, 2014
This patch modifies the ConnectionManager so that error messages are sent in reply when uncaught exceptions occur during message processing.  This prevents message senders from hanging while waiting for an acknowledgment if the remote message processing failed.

This is an updated version of sarutak's PR, #1490.  The main change is to use Futures / Promises to signal errors.

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Author: Josh Rosen <joshrosen@apache.org>

Closes #1758 from JoshRosen/connection-manager-fixes and squashes the following commits:

68620cb [Josh Rosen] Fix test in BlockFetcherIteratorSuite:
83673de [Josh Rosen] Error ACKs should trigger IOExceptions, so catch only those exceptions in the test.
b8bb4d4 [Josh Rosen] Fix manager.id vs managerServer.id typo that broke security tests.
659521f [Josh Rosen] Include previous exception when throwing new one
a2f745c [Josh Rosen] Remove sendMessageReliablySync; callers can wait themselves.
c01c450 [Josh Rosen] Return Try[Message] from sendMessageReliablySync.
f1cd1bb [Josh Rosen] Clean up @sarutak's PR #1490 for [SPARK-2583]: ConnectionManager error reporting
7399c6b [Josh Rosen] Merge remote-tracking branch 'origin/pr/1490' into connection-manager-fixes
ee91bb7 [Kousuke Saruta] Modified BufferMessage.scala to keep the spark code style
9dfd0d8 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2583
e7d9aa6 [Kousuke Saruta] rebase to master
326a17f [Kousuke Saruta] Add test cases to ConnectionManagerSuite.scala for SPARK-2583
2a18d6b [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2583
22d7ebd [Kousuke Saruta] Add test cases to BlockManagerSuite for SPARK-2583
e579302 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2583
281589c [Kousuke Saruta] Add a test case to BlockFetcherIteratorSuite.scala for fetching block from remote from successfully
0654128 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2583
ffaa83d [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2583
12d3de8 [Kousuke Saruta] Added BlockFetcherIteratorSuite.scala
4117b8f [Kousuke Saruta] Modified ConnectionManager to be alble to handle error during processing message
717c9c3 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2583
6635467 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2583
e2b8c4a [Kousuke Saruta] Modify to propagete error using ConnectionManager
@sarutak sarutak closed this Aug 9, 2014
@sarutak
Copy link
Member Author

sarutak commented Aug 9, 2014

This issue is addressed in #1758 .

xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
This patch modifies the ConnectionManager so that error messages are sent in reply when uncaught exceptions occur during message processing.  This prevents message senders from hanging while waiting for an acknowledgment if the remote message processing failed.

This is an updated version of sarutak's PR, apache#1490.  The main change is to use Futures / Promises to signal errors.

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Author: Josh Rosen <joshrosen@apache.org>

Closes apache#1758 from JoshRosen/connection-manager-fixes and squashes the following commits:

68620cb [Josh Rosen] Fix test in BlockFetcherIteratorSuite:
83673de [Josh Rosen] Error ACKs should trigger IOExceptions, so catch only those exceptions in the test.
b8bb4d4 [Josh Rosen] Fix manager.id vs managerServer.id typo that broke security tests.
659521f [Josh Rosen] Include previous exception when throwing new one
a2f745c [Josh Rosen] Remove sendMessageReliablySync; callers can wait themselves.
c01c450 [Josh Rosen] Return Try[Message] from sendMessageReliablySync.
f1cd1bb [Josh Rosen] Clean up @sarutak's PR apache#1490 for [SPARK-2583]: ConnectionManager error reporting
7399c6b [Josh Rosen] Merge remote-tracking branch 'origin/pr/1490' into connection-manager-fixes
ee91bb7 [Kousuke Saruta] Modified BufferMessage.scala to keep the spark code style
9dfd0d8 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2583
e7d9aa6 [Kousuke Saruta] rebase to master
326a17f [Kousuke Saruta] Add test cases to ConnectionManagerSuite.scala for SPARK-2583
2a18d6b [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2583
22d7ebd [Kousuke Saruta] Add test cases to BlockManagerSuite for SPARK-2583
e579302 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2583
281589c [Kousuke Saruta] Add a test case to BlockFetcherIteratorSuite.scala for fetching block from remote from successfully
0654128 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2583
ffaa83d [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2583
12d3de8 [Kousuke Saruta] Added BlockFetcherIteratorSuite.scala
4117b8f [Kousuke Saruta] Modified ConnectionManager to be alble to handle error during processing message
717c9c3 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2583
6635467 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2583
e2b8c4a [Kousuke Saruta] Modify to propagete error using ConnectionManager
kazuyukitanimura pushed a commit to kazuyukitanimura/spark that referenced this pull request Aug 10, 2022
Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants