[SPARK-6602][Core] Update Master, Worker, Client, AppClient and related classes to use RpcEndpoint #5392
cc @rxin
Test build #29794 has finished for PR 5392 at commit
  with Logging {

  var master: Option[RpcEndpointRef] = None
  var alreadyDisconnected = false // To avoid calling listener.disconnected() multiple times
Can the above two vars be private?
Added private.
So I took a quick look, but I'm not familiar enough with the original code to provide good, informed feedback. The original protocol seems really weird to me, and I guess that leaks into the new code too. But fixing the protocol is probably out of the scope of this change...
I used the following code to handle errors for the new code in
Test build #29860 has finished for PR 5392 at commit
Test build #29861 has finished for PR 5392 at commit
Test build #29867 has finished for PR 5392 at commit
Conflicts:
    core/src/main/scala/org/apache/spark/deploy/LocalSparkCluster.scala
    core/src/main/scala/org/apache/spark/deploy/client/TestClient.scala
    core/src/test/scala/org/apache/spark/deploy/rest/StandaloneRestSubmitSuite.scala
Test build #30230 has finished for PR 5392 at commit
@vanzin I agree that this PR should focus on following the previous protocol instead of fixing issues of the protocol. Do you have other comments?
Test build #30233 has finished for PR 5392 at commit
retest this please.
Test build #30236 has finished for PR 5392 at commit
@zsxwing not really, I'm not familiar enough with the code to comment. Maybe @rxin or @andrewor14?
  private var master: Option[RpcEndpointRef] = None
  // To avoid calling listener.disconnected() multiple times
  private var alreadyDisconnected = false
  @volatile private var alreadyDead = false // To avoid calling listener.dead() multiple times
It's not generally true that ThreadSafeRpcEndpoints require their mutable state to be volatile, right? Perhaps this is just being modified from a separate thread pool?
It may be updated in a separate thread pool or the message loop of ClientEndpoint, so it's volatile.
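To illustrate the point about `@volatile`, here is a minimal sketch of a flag that is written from a separate thread pool and read from the message loop. `DeadListener`, `ClientEndpointSketch`, and `markDead` are hypothetical names for illustration only, not the classes in this PR.

```scala
import java.util.concurrent.Executors

// Hypothetical stand-in for the listener; not the real Spark trait.
trait DeadListener { def dead(reason: String): Unit }

class ClientEndpointSketch(listener: DeadListener) {
  // May be written by a pool thread and read by the message loop,
  // so it is @volatile to make the write visible across threads.
  @volatile private var alreadyDead = false

  def markDead(reason: String): Unit = {
    if (!alreadyDead) {
      listener.dead(reason)
      alreadyDead = true
    }
  }
}

object VolatileFlagDemo {
  def demo(): Int = {
    var calls = 0
    val endpoint = new ClientEndpointSketch(new DeadListener {
      def dead(reason: String): Unit = calls += 1
    })
    val pool = Executors.newSingleThreadExecutor()
    try {
      // First call happens on the pool thread; get() waits for it to finish.
      pool.submit(new Runnable {
        def run(): Unit = endpoint.markDead("from pool")
      }).get()
      // Second call happens on the caller's thread and sees the flag already set.
      endpoint.markDead("from message loop")
    } finally {
      pool.shutdown()
    }
    calls
  }
}
```

Note that `@volatile` only guarantees visibility. If two threads could race on the check-then-set, something like `AtomicBoolean.compareAndSet` would be needed to guarantee a single callback.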
The core logic all looks good to me, just had some nits.
Test build #36121 has finished for PR 5392 at commit
retest this please
Test build #36147 has finished for PR 5392 at commit
retest this please
Test build #36148 has finished for PR 5392 at commit
@@ -504,6 +518,7 @@ private[master] class Master(
  }

  private def completeRecovery() {
    // TODO Why synchronized
This was due to an earlier state in the code when this method could be invoked from a different thread. It can be safely removed now.
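As a rough sketch of why the lock becomes unnecessary: if state is only ever touched by a single message-loop thread, it is thread-confined and needs no `synchronized`. `RecoverySketch` and its `send` method are hypothetical and only mimic the shape of the situation; they are not the Master code.

```scala
import java.util.concurrent.{Callable, Executors, Future}

// Hypothetical endpoint whose state is touched only by one message-loop thread.
class RecoverySketch {
  private val loop = Executors.newSingleThreadExecutor()

  // No lock needed: only the single loop thread reads or writes this field.
  private var recovering = true

  private def completeRecovery(): Unit = {
    if (recovering) recovering = false
  }

  // Every message is handled on the loop thread; the result is the
  // value of `recovering` after the message has been processed.
  def send(msg: String): Future[Boolean] =
    loop.submit(new Callable[Boolean] {
      def call(): Boolean = {
        if (msg == "CompleteRecovery") completeRecovery()
        recovering
      }
    })

  def stop(): Unit = loop.shutdown()
}
```

The lock would only be needed again if some other thread started calling `completeRecovery()` directly.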
Just one comment thanks to your proactive TODOifying :) LGTM, feel free to merge after.
Alright I'm going to merge this. @zsxwing please submit a separate PR to address the TODO.
A follow-up PR to address #5392 (comment)

Author: zsxwing <zsxwing@gmail.com>

Closes #7141 from zsxwing/pr5392-follow-up and squashes the following commits:

fcf7b50 [zsxwing] Remove unnecessary synchronized
This PR updates the remaining Actors in core to RpcEndpoint.
Because there is no ActorSelection in RpcEnv, I changed the logic of registerWithMaster in Worker and AppClient to avoid blocking the message loop. These changes need to be reviewed carefully.
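For reviewers, the general idea of not blocking the message loop can be sketched as follows. This is not the code in the PR: `RegisterSketch`, `registerWithMasters`, and the `connect` function are hypothetical, and the sketch only assumes that each registration attempt may block and is therefore run on a separate thread pool.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future, Promise}
import scala.concurrent.duration._

object RegisterSketch {
  // Hypothetical: try every master address on a separate pool so the
  // caller (the message loop) returns immediately instead of blocking.
  // `connect` stands in for a blocking connection attempt.
  def registerWithMasters(
      masters: Seq[String],
      connect: String => Boolean)(implicit ec: ExecutionContext): Future[String] = {
    val registered = Promise[String]()
    masters.foreach { address =>
      Future {
        // The first master that accepts wins; later successes are ignored.
        if (connect(address)) registered.trySuccess(address)
      }
    }
    // Never completes if no master accepts; a real endpoint would retry on a timer.
    registered.future
  }

  def demo(): String = {
    val pool = Executors.newFixedThreadPool(2)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    try {
      val result = registerWithMasters(
        Seq("spark://a:7077", "spark://b:7077"),
        address => address == "spark://b:7077")
      Await.result(result, 5.seconds)
    } finally {
      pool.shutdown()
    }
  }
}
```

Here `demo()` blocks with `Await` only to show the result; inside an endpoint the outcome would instead be delivered back as a message.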