Skip to content

[SPARK-19438] Both reading and updating executorDataMap should be guarded by CoarseGrainedSchedulerBackend.this.synchronized when handle RegisterExecutor.#16780

Closed
jinxing64 wants to merge 1 commit intoapache:masterfrom
jinxing64:SPARK-19438

Conversation

@jinxing64
Copy link

What changes were proposed in this pull request?

Currently when handle RegisterExecutor in CoarseGrainedSchedulerBackend, executorDataMap is guarded by CoarseGrainedSchedulerBackend.this.synchronized when updating, which can cause numPendingExecutors incorrect.
Code is like below:

        if (executorDataMap.contains(executorId)) {
          executorRef.send(RegisterExecutorFailed("Duplicate executor ID: " + executorId))
          context.reply(true)
        } else {
          ...
          CoarseGrainedSchedulerBackend.this.synchronized {
            executorDataMap.put(executorId, data)
            if (currentExecutorIdCounter < executorId.toInt) {
              currentExecutorIdCounter = executorId.toInt
            }
            if (numPendingExecutors > 0) {
              numPendingExecutors -= 1
              logDebug(s"Decremented number of pending executors ($numPendingExecutors left)")
            }
          }

Consider SPARK-19437 and a scenario like below:
An executor sent RegisterExecutor twice by askWithRetry, and the interval between the two is quite small. Thus it might be possible that both of them will go to else branch, thus numPendingExecutors will be deducted twice and become incorrect. Currently, the askWithRetry of RegisterExecutor only exists in some unit tests, but it makes sense to make it stronger when handling RegisterExecutor.

TO FIX
Use CoarseGrainedSchedulerBackend.this.synchronized to guard executorDataMap when both reading and updating.

How was this patch tested?

The code logic is not changed, just manually tested. SPARK-19437 can also help verify this fix.

…rded by CoarseGrainedSchedulerBackend.this.synchronized when handle RegisterExecutor.
@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@zsxwing
Copy link
Member

zsxwing commented Feb 2, 2017

Thus it might be possible that both of them will go to else branch

receiveAnyReply processes messages in sequence just like using one thread. This won't happen.

@jinxing64
Copy link
Author

Thanks a lot for looking into this~ @zsxwing
You are right. My understanding about this is incorrect. CoarseGrainedSchedulerBackend: DriverEndpoint is a ThreadSafeRpcEndpoint, thus concurrent message processing is disabled.
I'll close this PR.

@jinxing64 jinxing64 closed this Feb 3, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants