Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-16599][CORE] java.util.NoSuchElementException: None.get at at org.apache.spark.storage.BlockInfoManager.releaseAllLocksForTask #17290

Closed
wants to merge 2 commits into from

Conversation

@srowen
Copy link
Member

srowen commented Mar 14, 2017

What changes were proposed in this pull request?

Avoid None.get exception in (rare?) case that no readLocks exist
Note that while this would resolve the immediate cause of the exception, it's not clear it is the root problem.

How was this patch tested?

Existing tests

@srowen
Copy link
Member Author

srowen commented Mar 14, 2017

@SparkQA
Copy link

SparkQA commented Mar 14, 2017

Test build #74523 has finished for PR 17290 at commit 27234e1.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA
Copy link

SparkQA commented Mar 14, 2017

Test build #74526 has finished for PR 17290 at commit 5da4bcf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@srowen
Copy link
Member Author

srowen commented Mar 16, 2017

I'd like to merge this as a step towards possibly resolving the problem or uncovering its root.

@srowen
Copy link
Member Author

srowen commented Mar 18, 2017

Merged to master. I don't believe this necessarily resolves the JIRA.

@asfgit asfgit closed this in 54e61df Mar 18, 2017
@@ -340,7 +340,7 @@ private[storage] class BlockInfoManager extends Logging {
val blocksWithReleasedLocks = mutable.ArrayBuffer[BlockId]()

val readLocks = synchronized {
readLocksByTask.remove(taskAttemptId).get
readLocksByTask.remove(taskAttemptId).getOrElse(ImmutableMultiset.of[BlockId]())

This comment has been minimized.

Copy link
@mridulm

mridulm Mar 18, 2017

Contributor

Missed this PR earlier .. sorry about that.
Since this is an unexpected scenario (it really must not happen according to current code !), we should probably have logged the stack trace and warning message in the logs to help debug.
It silently ignores a fairly bad bug otherwise.

@srowen
Copy link
Member Author

srowen commented Mar 18, 2017

That's fine I can add a warning. I don't know if it is a bug situation but it sure could be.

@gatorsmile
Copy link
Member

gatorsmile commented Mar 20, 2017

As found in #17338, I am afraid this one caused the following test failure.

  • proactive block replication - 4 replicas - 3 block manager deletions *** FAILED ***
    1 did not equal 0 Read locks unreleased! (BlockManagerReplicationSuite.scala:504)
@gatorsmile
Copy link
Member

gatorsmile commented Mar 20, 2017

After removing the changes in this PR, the test case passed in my local environment. I think we need to revert it back.

@gatorsmile
Copy link
Member

gatorsmile commented Mar 20, 2017

Without code changes, https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74836/ still can pass after triggering a retest. I think it is not related to this PR.

@srowen
Copy link
Member Author

srowen commented Mar 21, 2017

Hm, that concerns me @gatorsmile . Without this change I might expect that it would have also failed, just in the same way the JIRA describes.

I could add a warning for the case where there is no read lock acquired here.

Is your gut feeling that this is risky? I can back it out too.

@mridulm
Copy link
Contributor

mridulm commented Mar 21, 2017

I agree with @srowen I dont see how this change affects the test. blocksWithReleasedLocks should be unchanged w.r.t this test.

@srowen srowen deleted the srowen:SPARK-16599 branch Mar 22, 2017
@aphasingnirvana
Copy link

aphasingnirvana commented Oct 13, 2017

@srowen I believe the bug still persists. Shouldn't we reopen it?

@srowen
Copy link
Member Author

srowen commented Oct 13, 2017

I don't think this change was the ultimate fix, and it caused another problem, so no I don't think this PR should be reopened.

@aphasingnirvana
Copy link

aphasingnirvana commented Oct 17, 2017

@srowen So could I open another issue altogether?

@srowen
Copy link
Member Author

srowen commented Oct 17, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

5 participants
You can’t perform that action at this time.