[SPARK-13833] Guard against race condition when re-caching disk blocks in memory #11660

Closed
wants to merge 5 commits into from

JoshRosen
Contributor

When reading a block from the DiskStore and caching it back into the memory store, we need to guard against a race condition in which multiple readers try to re-cache the same block in memory at the same time.

This patch does so by synchronizing on the block's BlockInfo object while re-caching the block.
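The race described above is a check-then-act problem: two readers can both observe "not in memory" and both insert the block. The following is a minimal Java sketch of the locking idea, assuming a map-based stand-in for the memory store. `BlockInfo`, `memoryStore`, and `maybeCacheInMemory` are hypothetical names for illustration; Spark's BlockManager is written in Scala and this is not the patch's actual code.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

public class RecacheGuard {
    // Hypothetical per-block metadata; its monitor serializes re-caching of one block.
    static final class BlockInfo {
        final String blockId;
        BlockInfo(String blockId) { this.blockId = blockId; }
    }

    // Hypothetical stand-in for the memory store: block id -> cached bytes.
    static final ConcurrentMap<String, ByteBuffer> memoryStore = new ConcurrentHashMap<>();
    // Counts how many readers actually inserted the block into memory.
    static final AtomicInteger puts = new AtomicInteger();

    // Caches bytes read from disk unless another reader already did, then returns a view of the cached copy.
    static ByteBuffer maybeCacheInMemory(BlockInfo info, ByteBuffer diskBytes) {
        // Holding the block's monitor makes the "check, then insert" sequence atomic per block.
        synchronized (info) {
            ByteBuffer cached = memoryStore.get(info.blockId);
            if (cached != null) {
                return cached.duplicate(); // another reader won the race; reuse its copy
            }
            ByteBuffer copy = ByteBuffer.allocate(diskBytes.remaining());
            copy.put(diskBytes.duplicate()); // duplicate() leaves the shared disk buffer untouched
            copy.flip();
            memoryStore.put(info.blockId, copy);
            puts.incrementAndGet();
            return copy.duplicate();
        }
    }

    public static void main(String[] args) {
        BlockInfo info = new BlockInfo("rdd_0_0");
        ByteBuffer diskBytes = ByteBuffer.wrap(new byte[] {1, 2, 3, 4});
        Thread[] readers = new Thread[8];
        for (int i = 0; i < readers.length; i++) {
            readers[i] = new Thread(() -> maybeCacheInMemory(info, diskBytes));
            readers[i].start();
        }
        for (Thread t : readers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
        System.out.println("puts = " + puts.get()); // prints "puts = 1"
    }
}
```

A thread-safe map alone would not fix the race, because `get` followed by `put` is still two separate steps. For this toy store `computeIfAbsent` would also work, but locking on the per-block object mirrors the approach the PR describes.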

(Will file JIRA as soon as ASF JIRA stops being down / laggy).

@JoshRosen JoshRosen changed the title [SPARK-XXXXX] Guard against race condition when re-caching disk blocks in memory [SPARK-13833] Guard against race condition when re-caching disk blocks in memory Mar 11, 2016
@JoshRosen
Contributor Author

/cc @andrewor14. In addition to fixing this race condition, this patch also addresses the duplicated code that you pointed out when reviewing my previous PR.

* Attempts to cache spilled bytes read from disk into the MemoryStore in order to speed up
* subsequent reads. This method requires the caller to hold a read lock on the block.
*
* @return a copy of the bytes. The original byes passed this method should no longer
Contributor

Typo: "byes" should be "bytes".
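The doc comment under review states two contracts: the caller must already hold a read lock on the block, and the caller should use only the returned bytes afterwards. Below is a minimal Java sketch of both, assuming a plain `ReentrantReadWriteLock` stands in for the block lock. `ReadLockContract`, `blockLock`, and `cacheSpilledBytes` are hypothetical names, and Spark's real block locking is not modeled here.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadLockContract {
    // Hypothetical per-block lock standing in for the block's read/write lock.
    static final ReentrantReadWriteLock blockLock = new ReentrantReadWriteLock();

    // Copies spilled bytes; the caller must hold blockLock's read lock.
    // Reading from `spilled` consumes it, so callers should use only the returned copy.
    static ByteBuffer cacheSpilledBytes(ByteBuffer spilled) {
        if (blockLock.getReadHoldCount() == 0) {
            throw new IllegalStateException("caller must hold a read lock on the block");
        }
        ByteBuffer copy = ByteBuffer.allocate(spilled.remaining());
        copy.put(spilled); // advances spilled's position to its limit
        copy.flip();
        return copy;
    }

    public static void main(String[] args) {
        ByteBuffer spilled = ByteBuffer.wrap(new byte[] {7, 8, 9});
        boolean rejected = false;
        try {
            cacheSpilledBytes(spilled);
        } catch (IllegalStateException e) {
            rejected = true; // no read lock held yet, so the precondition check fires
        }
        blockLock.readLock().lock();
        try {
            ByteBuffer copy = cacheSpilledBytes(spilled);
            System.out.println(rejected + " " + copy.remaining() + " " + spilled.remaining()); // prints "true 3 0"
        } finally {
            blockLock.readLock().unlock();
        }
    }
}
```

`getReadHoldCount()` reports read holds by the current thread only, so the check enforces that this caller took the lock rather than that some other thread holds it.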

@SparkQA

SparkQA commented Mar 12, 2016

Test build #52953 has finished for PR 11660 at commit 5342712.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 12, 2016

Test build #52965 has finished for PR 11660 at commit 83c1b96.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor Author

@andrewor14 did you have any other comments here or is this ready to merge?

@andrewor14
Contributor

Sorry, merging now.

@asfgit asfgit closed this in 9a87afd Mar 14, 2016
@JoshRosen JoshRosen deleted the concurrent-recaching-fixes branch March 14, 2016 17:58
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Mar 17, 2016
…s in memory

Author: Josh Rosen <joshrosen@databricks.com>

Closes apache#11660 from JoshRosen/concurrent-recaching-fixes.
roygao94 pushed a commit to roygao94/spark that referenced this pull request Mar 22, 2016