[SPARK-27112][CORE] : Create a resource ordering between threads to resolve the deadlocks encountered when trying to kill executors either due to dynamic allocation or blacklisting #24134

Closed

pgandhi999 commented Mar 18, 2019

Closes #24072 from pgandhi999/SPARK-27112-2.

Authored-by: pgandhi <pgandhi@verizonmedia.com>
Signed-off-by: Imran Rashid <irashid@cloudera.com>

What changes were proposed in this pull request?

Two deadlocks arise from the interplay between three threads:

- **task-result-getter** thread
- **spark-dynamic-executor-allocation** thread
- **dispatcher-event-loop** thread (in `makeOffers()`)

The fix enforces a consistent lock-acquisition order: both `makeOffers()` and `killExecutors()` now acquire the lock on `TaskSchedulerImpl` before acquiring the lock on `CoarseGrainedSchedulerBackend`. Because every thread takes the two locks in the same order, a circular wait between them cannot form, which resolves both deadlocks.
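To make the lock-ordering idea concrete, here is a minimal Scala sketch under stated assumptions. It is not the code from this patch: `Scheduler`, `Backend`, `withLock`, `counts`, and `runConcurrently` are hypothetical names, and the two classes only stand in for the `TaskSchedulerImpl` and `CoarseGrainedSchedulerBackend` monitors. What it shows is the general technique: if every path that needs both monitors acquires them in one fixed order, two threads cannot each hold one lock while waiting on the other.

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}

object LockOrderingSketch {
  // Hypothetical stand-ins, NOT Spark classes: each instance is used only as
  // a JVM monitor, like the TaskSchedulerImpl / CoarseGrainedSchedulerBackend
  // instances in the description.
  private final class Scheduler
  private final class Backend

  private val scheduler = new Scheduler
  private val backend = new Backend

  // State that is only read or written while holding both monitors.
  private var offers = 0
  private var kills = 0

  // The single place where both monitors are taken. Every caller goes through
  // here, so the order is always scheduler first, then backend.
  private def withLock[T](fn: => T): T =
    scheduler.synchronized {
      backend.synchronized {
        fn
      }
    }

  // Hypothetical analogues of makeOffers() and killExecutors(); the bodies
  // are placeholders that just count invocations.
  def makeOffers(): Unit = withLock { offers += 1 }
  def killExecutors(): Unit = withLock { kills += 1 }

  def counts: (Int, Int) = withLock { (offers, kills) }

  // Runs both operations from two threads and reports whether both finished
  // before the timeout (a deadlock would make this return false).
  def runConcurrently(iterations: Int, timeoutSeconds: Long): Boolean = {
    withLock { offers = 0; kills = 0 }
    val pool = Executors.newFixedThreadPool(2)
    val done = new CountDownLatch(2)

    def submitLoop(op: () => Unit): Unit = {
      pool.submit(new Runnable {
        def run(): Unit = {
          var i = 0
          while (i < iterations) { op(); i += 1 }
          done.countDown()
        }
      })
      ()
    }

    submitLoop(() => makeOffers())
    submitLoop(() => killExecutors())
    val finished = done.await(timeoutSeconds, TimeUnit.SECONDS)
    pool.shutdownNow()
    finished
  }
}
```

For example, `LockOrderingSketch.runConcurrently(10000, 10)` returns `true`, after which `LockOrderingSketch.counts` is `(10000, 10000)`. Routing both operations through one helper, rather than nesting `synchronized` blocks by hand at each call site, makes it harder for a later change to reintroduce the opposite order.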

How was this patch tested?

Manual Tests

pgandhi999 (Author)

cc @squito

pgandhi999 (Author)

ok to test

SparkQA commented Mar 18, 2019

Test build #103606 has finished for PR 24134 at commit 29a0481.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Mar 19, 2019

Closes #24134 from pgandhi999/branch-2.4-SPARK-27112.

asfgit pushed a commit that referenced this pull request Mar 19, 2019

(cherry picked from commit 95e73b3)

squito commented Mar 19, 2019

Thanks @pgandhi999, merged to branch-2.4 & 2.3.

Can you please close this?

pgandhi999 (Author)

Sure, will close it. Thank you @squito.

pgandhi999 closed this Mar 19, 2019
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Jul 23, 2019
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Jul 25, 2019
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Aug 1, 2019