[SPARK-14131][SQL][STREAMING] Improved fix for avoiding potential deadlocks in HDFSMetadataLog #14292

Closed
wants to merge 4 commits into from

tdas
Contributor

@tdas tdas commented Jul 21, 2016

What changes were proposed in this pull request?

The current fix for the deadlock disables interrupts in StreamExecution while getting offsets for all sources and when writing to any metadata log, to avoid potential deadlocks in HDFSMetadataLog (see JIRA for more details). However, disabling interrupts can have unintended consequences in other sources. So I am narrowing the fix by disabling interrupts only in HDFSMetadataLog, which is a safer scope for something as risky as disabling interrupts.
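To illustrate the idea of deferring interrupts only around one critical block, here is a minimal Java sketch. It is not Spark's actual `UninterruptibleThread`; the class name `DeferredInterruptThread` and its fields are hypothetical, chosen only for this example. An interrupt that arrives while the block runs is recorded and re-delivered once the block finishes, so the code inside the block (for example, a metadata log write) never observes it.

```java
import java.util.concurrent.Callable;

/**
 * Simplified sketch of a thread that can defer interrupts while a critical
 * block runs. Hypothetical illustration, not Spark's implementation.
 */
class DeferredInterruptThread extends Thread {
    private final Object lock = new Object();
    private boolean uninterruptible = false;   // guarded by lock
    private boolean pendingInterrupt = false;  // guarded by lock

    DeferredInterruptThread(Runnable body) {
        super(body);
    }

    /** Runs the action with interrupts deferred. Must be called on this thread. */
    <T> T runUninterruptibly(Callable<T> action) throws Exception {
        if (Thread.currentThread() != this) {
            throw new IllegalStateException("must be called from this thread");
        }
        boolean nested;
        synchronized (lock) {
            nested = uninterruptible;
            if (!nested) {
                // Absorb an interrupt that arrived just before entering the block.
                pendingInterrupt = Thread.interrupted();
                uninterruptible = true;
            }
        }
        if (nested) {
            // Already inside an uninterruptible block: just run the action.
            return action.call();
        }
        try {
            return action.call();
        } finally {
            synchronized (lock) {
                uninterruptible = false;
                if (pendingInterrupt) {
                    pendingInterrupt = false;
                    super.interrupt();  // re-deliver the deferred interrupt
                }
            }
        }
    }

    @Override
    public void interrupt() {
        synchronized (lock) {
            if (uninterruptible) {
                pendingInterrupt = true;  // remember it, deliver later
            } else {
                super.interrupt();
            }
        }
    }
}
```

With this shape, only the code passed to `runUninterruptibly` is shielded. Everything else running on the thread still sees interrupts immediately, which is the narrowing this pull request describes.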

How was this patch tested?

Existing tests.

@tdas
Contributor Author

tdas commented Jul 21, 2016

@marmbrus @zsxwing Can you take a look?

@SparkQA

SparkQA commented Jul 21, 2016

Test build #62645 has finished for PR 14292 at commit 7a3e3fa.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tdas
Contributor Author

tdas commented Jul 21, 2016

test this

@SparkQA

SparkQA commented Jul 21, 2016

Test build #62644 has finished for PR 14292 at commit d64e0c1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing
Member

zsxwing commented Jul 21, 2016

@tdas this change breaks the tests as they don't run in UninterruptibleThread
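As a rough illustration of why tests can break when an operation requires a particular thread type, the Java sketch below pairs a fail-fast precondition with a test helper that runs a body on the required thread. All names here (`MarkerThread`, `ThreadBoundLog`, `runOnMarkerThread`) are hypothetical and are not taken from this patch; how the tests were actually fixed is not shown in this conversation.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical marker type standing in for a thread with deferred interrupts. */
class MarkerThread extends Thread {
    MarkerThread(Runnable body) {
        super(body);
    }
}

final class ThreadBoundLog {
    private ThreadBoundLog() {}

    /** Fails fast unless the caller is running on a MarkerThread. */
    static String add(String entry) {
        if (!(Thread.currentThread() instanceof MarkerThread)) {
            throw new IllegalStateException("add() must run on a MarkerThread");
        }
        return "added:" + entry;
    }

    /** Test helper: runs the body on a MarkerThread and rethrows any failure in the caller. */
    static void runOnMarkerThread(Runnable body) throws InterruptedException {
        AtomicReference<Throwable> failure = new AtomicReference<>();
        MarkerThread t = new MarkerThread(() -> {
            try {
                body.run();
            } catch (Throwable e) {
                failure.set(e);  // capture so the calling thread can report it
            }
        });
        t.start();
        t.join();
        Throwable e = failure.get();
        if (e != null) {
            throw new RuntimeException(e);
        }
    }
}
```

A test that calls `ThreadBoundLog.add` directly from the test runner's thread would hit the `IllegalStateException`, while wrapping the same call in `runOnMarkerThread` would satisfy the precondition.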

@tdas
Contributor Author

tdas commented Jul 21, 2016

Fixing it.

@zsxwing
Member

zsxwing commented Jul 21, 2016

LGTM. Pending tests.

@SparkQA

SparkQA commented Jul 21, 2016

Test build #62687 has finished for PR 14292 at commit 0e67e26.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 22, 2016

Test build #3189 has finished for PR 14292 at commit 0e67e26.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

* potential dead-lock in Hadoop "Shell.runCommand" before 2.5.0 (HADOOP-10622). If the thread
* running "Shell.runCommand" is interrupted, then the thread can get deadlocked. In our
* case, `writeBatch` creates a file using HDFS API and calls "Shell.runCommand" to set the
* file permissions, and can get deadlocked is the stream execution thread is stopped by
Contributor

s/is/if

Contributor Author

done

@SparkQA

SparkQA commented Jul 25, 2016

Test build #62835 has finished for PR 14292 at commit 26138b2.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 25, 2016

Test build #3190 has finished for PR 14292 at commit 26138b2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tdas
Contributor Author

tdas commented Jul 25, 2016

Tests have passed. Merging this to master and 2.0. Thanks for reviewing @zsxwing @jaceklaskowski

asfgit pushed a commit that referenced this pull request Jul 25, 2016
…locks in HDFSMetadataLog

## What changes were proposed in this pull request?
The current fix for the deadlock disables interrupts in StreamExecution while getting offsets for all sources and when writing to any metadata log, to avoid potential deadlocks in HDFSMetadataLog (see JIRA for more details). However, disabling interrupts can have unintended consequences in other sources. So I am narrowing the fix by disabling interrupts only in HDFSMetadataLog, which is a safer scope for something as risky as disabling interrupts.

## How was this patch tested?
Existing tests.

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #14292 from tdas/SPARK-14131.

(cherry picked from commit c979c8b)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
@asfgit asfgit closed this in c979c8b Jul 25, 2016
4 participants