[SPARK-11098][Core]Add Outbox to cache the sending messages to resolve the message disorder issue #9197

Closed
wants to merge 2 commits into from

Member

zsxwing commented Oct 21, 2015

The current NettyRpc has a message-ordering issue because it uses a thread pool to send messages. For example, when the following two lines run in the same thread:

ref.send("A")
ref.send("B")

The remote endpoint may see "B" before "A" because the two sends run in parallel.
To resolve this issue, this PR adds an outbox for each connection. If the connection to the remote node is still being established when messages are sent, the messages are cached in the outbox and then sent one by one once the connection is established.
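
To illustrate the idea, here is a minimal sketch of a per-connection outbox. It is not the code from this PR: `SimpleOutbox`, `RecordingClient`, and their methods are hypothetical names, messages are plain strings, and the connection is simulated by calling `connected`. The sketch assumes that ordering is preserved by letting only one thread drain the queue at a time.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for an established connection; it records messages in arrival order.
final class RecordingClient {
  private val buf = ArrayBuffer.empty[String]
  def send(message: String): Unit = buf.synchronized { buf += message }
  def sent: List[String] = buf.synchronized { buf.toList }
}

// Simplified outbox: messages wait in a FIFO queue until a client is attached,
// and at most one thread drains the queue at any time.
final class SimpleOutbox {
  private val messages = new java.util.LinkedList[String]()
  private var client: Option[RecordingClient] = None
  private var draining = false

  // Never blocks on the connection: enqueue first, then try to drain.
  def send(message: String): Unit = {
    synchronized { messages.add(message) }
    drain()
  }

  // Called once the (simulated) connection is established.
  def connected(c: RecordingClient): Unit = {
    synchronized { client = Some(c) }
    drain()
  }

  private def drain(): Unit = {
    // Claim the drainer role only if nobody else holds it and there is work to do.
    val target: Option[RecordingClient] = synchronized {
      if (draining || client.isEmpty || messages.isEmpty) None
      else { draining = true; client }
    }
    target.foreach { c =>
      var next = nextOrRelease()
      while (next != null) {
        c.send(next) // sent outside the lock, one message at a time
        next = nextOrRelease()
      }
    }
  }

  // Takes the next message, or gives up the drainer role when the queue is empty.
  private def nextOrRelease(): String = synchronized {
    val m = messages.poll()
    if (m == null) draining = false
    m
  }
}

object OutboxDemo {
  def main(args: Array[String]): Unit = {
    val outbox = new SimpleOutbox
    outbox.send("A") // cached: no connection yet
    outbox.send("B") // cached behind "A"
    val client = new RecordingClient
    outbox.connected(client) // flushes "A" then "B"
    outbox.send("C")
    println(client.sent) // prints List(A, B, C)
  }
}
```

Because `send` only enqueues and then attempts a drain, a caller never waits for the connection, and messages from a single thread leave in the order they were enqueued.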

Member Author

zsxwing commented Oct 21, 2015

cc @rxin, @vanzin

SparkQA commented Oct 21, 2015

Test build #44060 has finished for PR 9197 at commit 295f92a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      * case class OutboxMessage(content: Array[Byte], callback: RpcResponseCallback)
      * class Outbox(nettyEnv: NettyRpcEnv, val address: RpcAddress) extends Logging

SparkQA commented Oct 21, 2015

Test build #44062 has finished for PR 9197 at commit a3246e5.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member Author

zsxwing commented Oct 21, 2015

> Test build #44062 has finished for PR 9197 at commit a3246e5.
>
> This patch fails from timeout after a configured wait of 250m.
> This patch merges cleanly.
> This patch adds no public classes.

Fixed this issue in #9198

Member Author

zsxwing commented Oct 21, 2015

retest this please

SparkQA commented Oct 21, 2015

Test build #44072 has finished for PR 9197 at commit a3246e5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


/**
* A map for [[RpcAddress]] and [[Outbox]]. When we are connecting to a remote [[RpcAddress]],
* we just put messages to its [[Outbox]] to implement a non-block `send` method.
Contributor

nit: non-blocking

Contributor

vanzin commented Oct 22, 2015

I was hoping we could somehow avoid needing an outbox, but I guess that's the easiest way to go. The code LGTM, although I'll probably take another look.

SparkQA commented Oct 22, 2015

Test build #44114 has finished for PR 9197 at commit a5def40.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      * case class RegisterWorkerFailed(message: String) extends DeployMessage with RegisterWorkerResponse


override def onSuccess(response: Array[Byte]): Unit = {
  val ack = deserialize[Ack](response)
  logDebug(s"Receive ack from ${ack.sender}")
Contributor

is this log useful? maybe we don't need to do much here ...

Contributor

rxin commented Oct 23, 2015

LGTM.

@vanzin if you have time to take a closer look, that'd be great.

asfgit closed this in a88c66c Oct 23, 2015
zsxwing deleted the rpc-outbox branch October 23, 2015 10:13

// update messages and it's safe to just drain the queue.
var message = messages.poll()
while (message != null) {
  message.callback.onFailure(new SparkException("Message is dropped because Outbox is stopped"))
Contributor

The SparkException can be constructed outside the while loop and reused.
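
A rough sketch of what this suggestion could look like, under stated assumptions: `FailureCallback`, `PendingMessage`, and `DrainDemo.failAll` are hypothetical stand-ins for the PR's types, and `IllegalStateException` is used in place of `SparkException` so the snippet runs without Spark on the classpath.

```scala
// Hypothetical minimal callback and message types (not the PR's actual classes).
trait FailureCallback { def onFailure(e: Throwable): Unit }
final case class PendingMessage(content: String, callback: FailureCallback)

object DrainDemo {
  // Fails every queued message with one shared exception and returns how many were dropped.
  def failAll(messages: java.util.Queue[PendingMessage]): Int = {
    // Constructed once, outside the loop, and reused for every callback.
    // IllegalStateException stands in for SparkException here.
    val dropped = new IllegalStateException("Message is dropped because Outbox is stopped")
    var count = 0
    var message = messages.poll()
    while (message != null) {
      message.callback.onFailure(dropped)
      count += 1
      message = messages.poll()
    }
    count
  }
}
```

One trade-off of reusing the instance is that every callback receives the same exception object, and therefore the same stack trace, which is likely acceptable here because all messages are dropped for the same reason.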
