[SPARK-27112][CORE] : Create a resource ordering between threads to r… #24134

Closed

pgandhi999 commented Mar 18, 2019

…esolve the deadlocks encountered when trying to kill executors either due to dynamic allocation or blacklisting

Closes #24072 from pgandhi999/SPARK-27112-2.

Authored-by: pgandhi <pgandhi@verizonmedia.com>
Signed-off-by: Imran Rashid <irashid@cloudera.com>

What changes were proposed in this pull request?

There are two deadlocks, both caused by the interplay between three threads:

task-result-getter thread

spark-dynamic-executor-allocation thread

dispatcher-event-loop thread (`makeOffers()`)

The fix enforces a lock-ordering constraint: both `makeOffers()` and `killExecutors()` now acquire the lock on `TaskSchedulerImpl` before acquiring the lock on `CoarseGrainedSchedulerBackend`. Because every thread takes the two locks in the same order, no circular wait can form between them, which fixes the deadlocks.
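To make the lock-ordering idea concrete, here is a minimal sketch of the general technique, not the actual patch. `Scheduler`, `Backend`, `withBothLocks`, and `runConcurrently` are hypothetical names invented for illustration; they only stand in for the roles of `TaskSchedulerImpl` and `CoarseGrainedSchedulerBackend` and do not reproduce Spark's code.

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}

// Illustrative sketch only: these are NOT the real Spark classes.
object LockOrderingSketch {
  // Hypothetical stand-in for the role of TaskSchedulerImpl (outer lock).
  final class Scheduler

  // Hypothetical stand-in for the role of CoarseGrainedSchedulerBackend (inner lock).
  final class Backend(scheduler: Scheduler) {
    private var offers = 0
    private var kills = 0

    // Single place that fixes the order: scheduler lock first, backend lock second.
    // A deadlock-prone variant would let one caller take the backend lock first.
    private def withBothLocks[T](body: => T): T =
      scheduler.synchronized {
        this.synchronized {
          body
        }
      }

    def makeOffers(): Unit = withBothLocks { offers += 1 }
    def killExecutors(): Unit = withBothLocks { kills += 1 }
    def counts: (Int, Int) = withBothLocks { (offers, kills) }
  }

  // Runs both operations on separate threads and reports whether both
  // finished within the timeout, together with the final counters.
  def runConcurrently(iterations: Int): (Boolean, (Int, Int)) = {
    val backend = new Backend(new Scheduler)
    val done = new CountDownLatch(2)

    def worker(name: String)(op: => Unit): Thread = {
      val t = new Thread(new Runnable {
        override def run(): Unit = {
          (1 to iterations).foreach(_ => op)
          done.countDown()
        }
      }, name)
      t.setDaemon(true) // do not keep the JVM alive if something goes wrong
      t
    }

    val threads = Seq(
      worker("offers-sketch")(backend.makeOffers()),
      worker("kill-sketch")(backend.killExecutors())
    )
    threads.foreach(_.start())
    val finished = done.await(10, TimeUnit.SECONDS)
    (finished, backend.counts)
  }
}
```

Calling `LockOrderingSketch.runConcurrently(1000)` exercises both paths concurrently. Routing every caller through one helper is a design choice made for this sketch: it keeps the acquisition order in a single place, so a new call site cannot accidentally take the locks in the opposite order.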

How was this patch tested?

Manual Tests

pgandhi999 commented Mar 18, 2019

cc @squito

pgandhi999 commented Mar 18, 2019

ok to test

SparkQA commented Mar 18, 2019

Test build #103606 has finished for PR 24134 at commit 29a0481.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
asfgit pushed a commit that referenced this pull request Mar 19, 2019
Closes #24134 from pgandhi999/branch-2.4-SPARK-27112.

Authored-by: pgandhi <pgandhi@verizonmedia.com>
Signed-off-by: Imran Rashid <irashid@cloudera.com>
asfgit pushed a commit that referenced this pull request Mar 19, 2019
(cherry picked from commit 95e73b3)
Signed-off-by: Imran Rashid <irashid@cloudera.com>

squito commented Mar 19, 2019

Thanks @pgandhi999, merged to branch-2.4 and branch-2.3.

Can you please close this?

pgandhi999 commented Mar 19, 2019

Sure, will close it. Thank you @squito.

pgandhi999 closed this Mar 19, 2019
kai-chi added a commit to kai-chi/spark that referenced this pull request Jul 23, 2019
kai-chi added a commit to kai-chi/spark that referenced this pull request Jul 23, 2019
kai-chi added a commit to kai-chi/spark that referenced this pull request Jul 25, 2019
kai-chi added a commit to kai-chi/spark that referenced this pull request Aug 1, 2019