[SPARK-29236][CORE] Access 'executorDataMap' out of 'DriverEndpoint' should be protected by lock #25922

Closed
wants to merge 2 commits into from

Conversation

ConeyLiu
Contributor

What changes were proposed in this pull request?

Protect executorDataMap with a lock whenever it is accessed outside of DriverEndpoint's methods.

Why are the changes needed?

As the existing code comment states:

// Accessing executorDataMap in DriverEndpoint.receive/receiveAndReply doesn't need any
// protection. But accessing executorDataMap out of DriverEndpoint.receive/receiveAndReply
// must be protected by CoarseGrainedSchedulerBackend.this. Besides, executorDataMap should
// only be modified in DriverEndpoint.receive/receiveAndReply with protection by
// CoarseGrainedSchedulerBackend.this.

executorDataMap is not thread-safe, so it should be protected by the lock when it is accessed outside of DriverEndpoint.
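For illustration, here is a minimal sketch of the locking pattern described above. It is not Spark's actual code: `BackendSketch`, `ExecutorInfo`, `registerExecutor`, and the `cpusPerTask` parameter are hypothetical stand-ins, and the only point is that reads and writes of a non-thread-safe map go through the same monitor (`this`).

```scala
import scala.collection.mutable

// Hypothetical stand-in for the per-executor bookkeeping; not Spark's ExecutorData.
final case class ExecutorInfo(totalCores: Int)

// Hypothetical stand-in for the scheduler backend.
class BackendSketch(cpusPerTask: Int) {
  // A plain mutable HashMap is not thread-safe, so every access from
  // outside the single-threaded endpoint must hold the backend's monitor.
  private val executorDataMap = new mutable.HashMap[String, ExecutorInfo]

  // Stands in for a modification made by the endpoint, guarded by `this`.
  def registerExecutor(id: String, cores: Int): Unit = synchronized {
    executorDataMap.put(id, ExecutorInfo(cores))
  }

  // A read issued from an arbitrary thread takes the same lock,
  // so it never observes the map in the middle of an update.
  def maxNumConcurrentTasks(): Int = synchronized {
    executorDataMap.values.map(_.totalCores / cpusPerTask).sum
  }
}
```

With two executors of 4 and 6 cores and `cpusPerTask = 2`, `maxNumConcurrentTasks()` sums 2 and 3.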

Does this PR introduce any user-facing change?

NO

How was this patch tested?

Existing unit tests.

@ConeyLiu
Contributor Author

Actually, accessing executorDataMap from any method of DriverEndpoint should also be safe, because we already do that in onDisconnected.

@ConeyLiu
Contributor Author

Hi @cloud-fan, please help review this. Thanks a lot.

@dongjoon-hyun
Member

ok to test

executorDataMap.values.map { executor =>
executor.totalCores / scheduler.CPUS_PER_TASK
}.sum
}

// this function is for testing only
- def getExecutorAvailableResources(executorId: String): Map[String, ExecutorResourceInfo] = {
+ def getExecutorAvailableResources(
+     executorId: String): Map[String, ExecutorResourceInfo] = synchronized {
Member

@ConeyLiu So, are these all the places that need synchronized?

Member

BTW, this method is for testing only. We may not need it here.

Contributor Author

Hi @dongjoon-hyun, thanks for reviewing. Yes, these are all the methods that access executorDataMap outside of DriverEndpoint. I suggest synchronizing this method too, because it will not add much overhead to the tests.

Member

Yep. No problem.

@SparkQA

SparkQA commented Sep 25, 2019

Test build #111328 has finished for PR 25922 at commit 2dc9905.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -537,22 +537,23 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
*/
private def numExistingExecutors: Int = executorDataMap.size
Contributor

how about this one?

Contributor Author

I intended to add this too; however, every caller of this method is already synchronized. I can still add it.

Contributor

then it's ok

Contributor Author

I have added it...
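
As a side note on why adding synchronized here is harmless even though every caller already holds the lock: JVM monitors are reentrant, so a thread that owns the monitor can acquire it again without blocking. A minimal sketch, assuming a hypothetical `ReentrantSketch` class with made-up `add` and `hasCapacity` methods (not Spark's code):

```scala
// Hypothetical class for illustration only.
class ReentrantSketch {
  private var executors = Map.empty[String, Int]

  def add(id: String, cores: Int): Unit = synchronized {
    executors += (id -> cores)
  }

  // Synchronized on its own, so it is safe to call from any thread.
  def numExistingExecutors: Int = synchronized { executors.size }

  // The caller already holds the monitor on `this`; the nested acquisition
  // inside numExistingExecutors succeeds because JVM monitors are reentrant.
  def hasCapacity(limit: Int): Boolean = synchronized {
    numExistingExecutors < limit
  }
}
```

The nested call costs one extra uncontended lock acquisition, and the method stays safe if a future caller forgets to synchronize.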

@cloud-fan
Contributor

LGTM if tests pass

// Accessing `executorDataMap` in the inherited methods from ThreadSafeRpcEndpoint doesn't need
// any protection. But accessing `executorDataMap` out of the inherited methods must be
// protected by `CoarseGrainedSchedulerBackend.this`. Besides, `executorDataMap` should only
// be modified in the inherited methods from ThreadSafeRpcEndpoint with protection by
// `CoarseGrainedSchedulerBackend.this`.
Contributor Author

Accessing executorDataMap in the inherited methods from ThreadSafeRpcEndpoint should also be OK.

@SparkQA

SparkQA commented Sep 25, 2019

Test build #111337 has finished for PR 25922 at commit 08f2988.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in e07cbbe Sep 25, 2019
@ConeyLiu
Contributor Author

thanks @cloud-fan @dongjoon-hyun

@ConeyLiu ConeyLiu deleted the executorDataMap branch September 26, 2019 01:06
4 participants