[SPARK-13833] Guard against race condition when re-caching disk blocks in memory #11660

JoshRosen · 2016-03-11T22:18:01Z

When reading data from the DiskStore and attempting to cache it back into the memory store, we should guard against race conditions where multiple readers are attempting to re-cache the same block in memory.

This patch accomplishes this by synchronizing on the block's BlockInfo object while trying to re-cache a block.

(Will file JIRA as soon as ASF JIRA stops being down / laggy).
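
For readers skimming the thread, here is a minimal sketch of the idea, not the actual `BlockManager` change. `BlockInfoSketch`, `RecacheSketch`, the `TrieMap`-backed store, and the `cacheWrites` counter are all hypothetical names introduced for illustration. The sketch assumes that every reader of a given block shares the same info object, so locking on it serializes the check-then-cache step.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.collection.concurrent.TrieMap

// Hypothetical stand-in for a per-block metadata object; not Spark's BlockInfo.
final class BlockInfoSketch(val blockId: String)

// Hypothetical, simplified memory store used only to illustrate the locking pattern.
final class RecacheSketch {
  private val memoryStore = TrieMap.empty[String, Array[Byte]]

  // Counts how many times a block was actually written into the memory store.
  val cacheWrites = new AtomicInteger(0)

  // Re-caches bytes read from disk, guarding against concurrent re-caching
  // by synchronizing on the block's info object.
  def maybeCacheDiskBytesInMemory(
      info: BlockInfoSketch,
      diskBytes: Array[Byte]): Array[Byte] = {
    info.synchronized {
      memoryStore.get(info.blockId) match {
        case Some(cached) =>
          // Another reader already re-cached this block; reuse its copy.
          cached
        case None =>
          // First reader to arrive: copy the bytes and store them once.
          val copy = diskBytes.clone()
          memoryStore.put(info.blockId, copy)
          cacheWrites.incrementAndGet()
          copy
      }
    }
  }
}
```

Because the existence check and the store happen under one lock, several readers racing on the same block should produce a single write into the memory store; readers of different blocks lock different objects and do not contend with each other.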

JoshRosen · 2016-03-11T22:29:49Z

/cc @andrewor14. In addition to fixing the race condition, this patch also addresses the duplicated code that you pointed out in a review of my previous PR.

andrewor14 · 2016-03-11T23:31:03Z

core/src/main/scala/org/apache/spark/storage/BlockManager.scala

+   * Attempts to cache spilled bytes read from disk into the MemoryStore in order to speed up
+   * subsequent reads. This method requires the caller to hold a read lock on the block.
+   *
+   * @return a copy of the bytes. The original bytes passed to this method should no longer


SparkQA · 2016-03-12T00:53:36Z

Test build #52953 has finished for PR 11660 at commit 5342712.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-03-12T02:03:24Z

Test build #52965 has finished for PR 11660 at commit 83c1b96.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

JoshRosen · 2016-03-12T18:12:02Z

@andrewor14 did you have any other comments here or is this ready to merge?

andrewor14 · 2016-03-14T17:47:47Z

Sorry, merging now.

…s in memory

When reading data from the DiskStore and attempting to cache it back into the memory store, we should guard against race conditions where multiple readers are attempting to re-cache the same block in memory.

This patch accomplishes this by synchronizing on the block's `BlockInfo` object while trying to re-cache a block.

(Will file JIRA as soon as ASF JIRA stops being down / laggy).

Author: Josh Rosen <joshrosen@databricks.com>

Closes apache#11660 from JoshRosen/concurrent-recaching-fixes.

JoshRosen added 4 commits March 11, 2016 13:22

De-duplicate disk -> memory caching code.

00ea8d3

Clarify that read lock must be held by caller of maybeCache*

a0c68e2

Synchronize on blockInfo to guard against concurrent re-caching.

7f678d2

Add some BlockManager.dispose() calls to free disk buffer earlier.

5342712

JoshRosen changed the title ~~[SPARK-XXXXX] Guard against race condition when re-caching disk blocks in memory~~ [SPARK-13833] Guard against race condition when re-caching disk blocks in memory Mar 11, 2016

andrewor14 reviewed Mar 11, 2016

Update BlockManager.scala

83c1b96

asfgit closed this in 9a87afd Mar 14, 2016

JoshRosen deleted the concurrent-recaching-fixes branch March 14, 2016 17:58
