New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-29236][CORE] Access 'executorDataMap' out of 'DriverEndpoint' should be protected by lock #25922
Conversation
Actually, it should also be safe when accessing the |
Hi, @cloud-fan Please help to review this, thanks a lot. |
ok to test |
executorDataMap.values.map { executor => | ||
executor.totalCores / scheduler.CPUS_PER_TASK | ||
}.sum | ||
} | ||
|
||
// this function is for testing only | ||
def getExecutorAvailableResources(executorId: String): Map[String, ExecutorResourceInfo] = { | ||
def getExecutorAvailableResources( | ||
executorId: String): Map[String, ExecutorResourceInfo] = synchronized { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ConeyLiu So, are these all of them which needs synchronized
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, this method is testing-only. We may not need for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @dongjoon-hyun, thanks for reviewing. Only those who accessing executorDataMap
out of DriverEndpoint
. I suggest this method to synchronize too because it will not add too much overhead for the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. No problem.
Test build #111328 has finished for PR 25922 at commit
|
@@ -537,22 +537,23 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp | |||
*/ | |||
private def numExistingExecutors: Int = executorDataMap.size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I intended to add this too, however, all are synchronized when calling this method. I can add it too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then it's ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added it...
LGTM if tests pass |
// Accessing `executorDataMap` in the inherited methods from ThreadSafeRpcEndpoint doesn't need | ||
// any protection. But accessing `executorDataMap` out of the inherited methods must be | ||
// protected by `CoarseGrainedSchedulerBackend.this`. Besides, `executorDataMap` should only | ||
// be modified in the inherited methods from ThreadSafeRpcEndpoint with protection by | ||
// `CoarseGrainedSchedulerBackend.this`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accessing executorDataMap
in the inherited methods from ThreadSafeRpcEndpoint should also be OK.
Test build #111337 has finished for PR 25922 at commit
|
thanks, merging to master! |
thanks @cloud-fan @dongjoon-hyun |
What changes were proposed in this pull request?
Protected the
executorDataMap
under lock when accessing it out of 'DriverEndpoint''s methods.Why are the changes needed?
Just as the comments:
// Accessing
executorDataMap
inDriverEndpoint.receive/receiveAndReply
doesn't need any// protection. But accessing
executorDataMap
out ofDriverEndpoint.receive/receiveAndReply
// must be protected by
CoarseGrainedSchedulerBackend.this
. Besides,executorDataMap
should// only be modified in
DriverEndpoint.receive/receiveAndReply
with protection by//
CoarseGrainedSchedulerBackend.this
.executorDataMap
is not threadsafe, it should be protected by lock when accessing it out ofDriverEndpoint
Does this PR introduce any user-facing change?
NO
How was this patch tested?
Existed UT.