[SPARK-25422][CORE] Don't memory map blocks streamed to disk. #22511
Conversation
After data has been streamed to disk, the buffers are inserted into the memory store in some cases (e.g., with broadcast blocks). But the broadcast code also disposes of those buffers when the data has been read, to ensure we don't leave mapped buffers using up memory; that dispose then leaves garbage data in the memory store.
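For readers unfamiliar with the JVM buffer semantics involved: disposing is only meaningful for direct or memory-mapped buffers (for heap buffers it is a no-op), which is why disposing a mapped buffer that is still registered in the memory store leaves a garbage entry behind. A minimal Java sketch of that distinction, with `wouldDispose` as a hypothetical stand-in for the dispose logic, not Spark's actual API:

```java
import java.nio.ByteBuffer;

public class DisposeSketch {
    // Hypothetical stand-in for dispose behavior: only direct/mapped
    // buffers have off-heap memory to free eagerly; heap buffers are
    // simply left to the garbage collector.
    static boolean wouldDispose(ByteBuffer buf) {
        return buf.isDirect();
    }

    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.allocate(16);         // backed by a byte[]
        ByteBuffer direct = ByteBuffer.allocateDirect(16); // off-heap, like an mmap'd file
        System.out.println(wouldDispose(heap));   // false: dispose is a no-op
        System.out.println(wouldDispose(direct)); // true: memory is freed eagerly
    }
}
```

If a disposed mapped buffer is still referenced by the memory store, any later read of that entry sees unmapped (garbage) memory, which is the failure mode this PR avoids.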
Test build #96394 has finished for PR 22511 at commit
retest this please
cc @jiangxb1987
this seems like a big change, will we hit perf regression?
is this a long-standing bug?
Test build #96408 has finished for PR 22511 at commit
Not vs. 2.3. It only affects things when stream-to-disk is enabled, and when it is enabled for reading remote cached blocks, this is actually going back to the old behavior. See https://github.com/apache/spark/pull/19476/files#diff-2b643ea78c1add0381754b1f47eec132R692 -- this change will make things slower, but only vs. other changes that exist in 2.4 alone. There are TODOs in the code for ways to improve this further, but those should not go into 2.4.
the change here is not fixing a long-standing bug, it's just updating new changes for 2.4. However, I'm really wondering why TorrentBroadcast calls dispose on the blocks. For regular buffers it's a no-op, so it hasn't mattered, but I can't come up with a reason why you would want to dispose those blocks. Secondly, there seems to be an implicit assumption that you never add memory-mapped byte buffers to the MemoryStore. Maybe that makes sense ... it's kind of messing with the Memory / Disk management Spark has. But the MemoryStore never checks that you don't add a mapped buffer, so you'll just get weird behavior like this later on. Seems there should be a check at the very least, to avoid this kind of issue. As neither of those things is new to 2.4, I don't think we should deal w/ them here. The major motivation for memory-mapping the file was not broadcast blocks, it was reading large cached blocks. But it actually makes more sense to change the interfaces in BlockManager to allow us to just get the managedBuffer, instead of a ChunkedByteBuffer (that's this TODO: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L728)
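To illustrate the check suggested above, here is a hypothetical guard (not actual MemoryStore code): rejecting mapped buffers at insertion time would surface the mistake immediately, instead of as garbage data much later.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MemoryStoreGuard {
    // Hypothetical guard sketching the suggested check: refuse to cache a
    // memory-mapped buffer, since a later dispose() on it would leave a
    // garbage entry behind in the store.
    static void checkNotMapped(ByteBuffer buf) {
        if (buf instanceof MappedByteBuffer) {
            throw new IllegalArgumentException(
                "memory-mapped buffers must not be inserted into the memory store");
        }
    }

    public static void main(String[] args) throws IOException {
        checkNotMapped(ByteBuffer.allocate(8)); // heap buffer: fine
        System.out.println("heap buffer accepted");

        // Map a small temp file to get a real MappedByteBuffer.
        Path tmp = Files.createTempFile("block", ".bin");
        Files.write(tmp, new byte[8]);
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
            MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, 8);
            try {
                checkNotMapped(mapped);
            } catch (IllegalArgumentException e) {
                System.out.println("mapped buffer rejected");
            }
        }
        Files.delete(tmp);
    }
}
```

This is just a sketch of the review suggestion; where such a check would actually live in Spark's MemoryStore is left open in the discussion.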
Test build #4346 has started for PR 22511 at commit
Retest this please.
Also cc @zsxwing @JoshRosen
Test build #96497 has finished for PR 22511 at commit
Retest this please
Test build #96512 has finished for PR 22511 at commit
Retest this please.
LGTM. I went back and took a look at the related changes, and agree with Imran that this is basically the same thing that 2.3 did; so no perf regression, just higher memory usage than in the mmap case (which didn't exist before anyway).
@squito This PR is directly heading to
Test build #96526 has finished for PR 22511 at commit
The analysis makes sense to me. The thing I'm not sure about is: how can we hit it? The "fetch block to temp file" code path is only enabled for big blocks (> 2GB).
a possible approach: can we just not dispose the data in
The failing test cases "with replication as stream" turned on fetch to disk for all data:
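In other words, which path a fetch takes depends only on a size threshold, and the failing replication tests force that threshold low so every block streams to disk. A hypothetical sketch of that decision (names invented for illustration, not Spark's actual code):

```java
public class FetchPath {
    // Hypothetical model of the decision discussed above: blocks larger
    // than the in-memory threshold are streamed to a temp file on disk.
    static String fetchPath(long blockSize, long maxInMemSize) {
        return blockSize > maxInMemSize ? "stream-to-disk" : "in-memory";
    }

    public static void main(String[] args) {
        // Normally only very large blocks take the disk path...
        System.out.println(fetchPath(4L * 1024 * 1024 * 1024, Integer.MAX_VALUE));
        // ...but the replication-as-stream tests force the threshold so low
        // that even tiny blocks stream to disk, exposing the dispose bug.
        System.out.println(fetchPath(1024, 1));
    }
}
```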
yes I considered this, but I don't feel confident about making that change for 2.4. I need to spend some more time understanding that (seems it came from SPARK-19556 / b56ad2b). I think this change is the right one for the moment.
Yes, good point. Sorry, I opened this against 2.4 just for testing, in case the errors were more likely in 2.4 for some reason. Closing this and opened #22546
I think the deal with the dispose in TorrentBroadcast is that it's definitely needed in the local read case, but may need adjustments in the remote read case. The local read case (
BTW it can be argued that you don't need a dispose in that code path since
How was this patch tested?
Ran the old failing test in a loop. Full tests on Jenkins.