[SPARK-25422][CORE] Don't memory map blocks streamed to disk. #22511

Closed
wants to merge 1 commit

Conversation

@squito (Contributor) commented Sep 21, 2018

After data has been streamed to disk, the resulting buffers are in some cases inserted into the memory store (e.g., with broadcast blocks). But the broadcast code also disposes of those buffers once the data has been read, to ensure we don't leave mapped buffers using up memory, and disposing a memory-mapped buffer that the memory store still references leaves garbage data in the memory store.

How was this patch tested?

Ran the old failing test in a loop. Full tests on Jenkins.

@SparkQA commented Sep 21, 2018

Test build #96394 has finished for PR 22511 at commit aee82ab.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan (Contributor)

retest this please

@cloud-fan (Contributor)

cc @jiangxb1987

@cloud-fan (Contributor)

this seems like a big change, will we hit perf regression?

@cloud-fan (Contributor)

is this a long-standing bug?

@SparkQA commented Sep 21, 2018

Test build #96408 has finished for PR 22511 at commit aee82ab.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@squito (Contributor, Author) commented Sep 21, 2018

> this seems like a big change, will we hit perf regression?

Not vs. 2.3. It only affects things when stream-to-disk is enabled, and in that case, for reading remote cached blocks, this actually goes back to the old behavior. See https://github.com/apache/spark/pull/19476/files#diff-2b643ea78c1add0381754b1f47eec132R692 -- FileSegmentManagedBuffer.nioByteBuffer() reads the entire file into a regular byte buffer; I had changed it to memory map the file as an attempted optimization.

This change will make things slower, but only compared to other changes that are new in 2.4. There are TODOs in the code for ways to improve this further, but those should not go into 2.4.
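
For illustration, here is a rough sketch of the two behaviors being compared (the helper names are hypothetical, not code from this PR or from FileSegmentManagedBuffer):

```scala
import java.io.File
import java.nio.ByteBuffer
import java.nio.channels.FileChannel
import java.nio.file.StandardOpenOption

// Roughly what the 2.3-style path does: copy the downloaded file into a plain
// on-heap buffer. Disposing an on-heap buffer later is effectively a no-op, so
// a copy of these bytes sitting in the memory store stays valid.
// (Sketch assumes the file fits in a single buffer.)
def readIntoHeapBuffer(file: File): ByteBuffer = {
  val ch = FileChannel.open(file.toPath, StandardOpenOption.READ)
  try {
    val buf = ByteBuffer.allocate(ch.size().toInt)
    while (buf.hasRemaining && ch.read(buf) >= 0) {}
    buf.flip()
    buf
  } finally {
    ch.close()
  }
}

// The attempted optimization this PR backs out: memory map the file. Creating the
// mapping is cheap, but unmapping (disposing) it later invalidates every reference
// to that buffer, including one that was handed to the memory store.
def mapFile(file: File): ByteBuffer = {
  val ch = FileChannel.open(file.toPath, StandardOpenOption.READ)
  try {
    ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size())
  } finally {
    ch.close()
  }
}
```

The on-heap copy costs an extra read and more heap, but it decouples the memory store entry from the lifetime of the file mapping.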

> is this a long-standing bug?

The change here is not fixing a long-standing bug; it's just updating new changes for 2.4.

However, I'm really wondering why TorrentBroadcast calls dispose on the blocks. For regular buffers it's a no-op, so it hasn't mattered, but I can't come up with a reason why you'd want to dispose those blocks.

Secondly, it seems there is an implicit assumption that you never add memory-mapped byte buffers to the MemoryStore. Maybe that makes sense ... it's kind of messing with the memory/disk management Spark has. But the MemoryStore never checks that you don't add a mapped buffer; you'll just get weird behavior like this later on. It seems there should be a check at the very least, to avoid this kind of issue.
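
A minimal sketch of the kind of check meant here (a hypothetical helper, not the actual MemoryStore code):

```scala
import java.nio.{ByteBuffer, MappedByteBuffer}

// Hypothetical guard at the point where byte buffers are handed to the memory
// store: reject memory-mapped buffers outright, so that a later unmap elsewhere
// cannot silently turn a stored entry into garbage. In Spark this would sit
// behind ChunkedByteBuffer; plain ByteBuffers keep the sketch self-contained.
def assertNotMemoryMapped(chunks: Seq[ByteBuffer]): Unit = {
  require(
    !chunks.exists(_.isInstanceOf[MappedByteBuffer]),
    "memory-mapped buffers must not be inserted into the MemoryStore")
}
```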

As neither of those things is new to 2.4, I don't think we should deal with them here.

The major motivation for memory mapping the file was not broadcast blocks; it was reading large cached blocks. But it actually makes more sense to change the interfaces in BlockManager to let us get the ManagedBuffer directly, instead of a ChunkedByteBuffer (that's this TODO: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L728).
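
Purely as an illustration of that direction, a hypothetical signature sketch (not the existing BlockManager API):

```scala
import org.apache.spark.network.buffer.ManagedBuffer
import org.apache.spark.storage.BlockId

// Hypothetical shape of the TODO above: let callers get the ManagedBuffer backing
// a remote block, so a large disk-backed block never has to be materialized as a
// ChunkedByteBuffer (or memory mapped) just to be read.
trait RemoteBlockReader {
  def getRemoteManagedBuffer(blockId: BlockId): Option[ManagedBuffer]
}
```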

@SparkQA commented Sep 21, 2018

Test build #4346 has started for PR 22511 at commit aee82ab.

@dongjoon-hyun (Member)

Retest this please.

@gatorsmile (Member)

Also cc @zsxwing @JoshRosen

@SparkQA commented Sep 24, 2018

Test build #96497 has finished for PR 22511 at commit aee82ab.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun (Member)

Retest this please

@SparkQA commented Sep 24, 2018

Test build #96512 has finished for PR 22511 at commit aee82ab.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun (Member)

Retest this please.

@vanzin (Contributor) commented Sep 24, 2018

LGTM. I went back and took a look at the related changes, and agree with Imran that this is basically the same thing that 2.3 did; so no perf regression, just higher memory usage than in the mmap case (which didn't exist before anyway).

@dongjoon-hyun (Member)

@squito This PR targets branch-2.4 directly, bypassing the master branch. Is there a reason to bypass master? If not, in order to prevent a future regression in Spark 2.5+, could you make a PR against the master branch first?

@SparkQA commented Sep 25, 2018

Test build #96526 has finished for PR 22511 at commit aee82ab.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan (Contributor)

The analysis makes sense to me. The thing I'm not sure about is how we can hit it. The "fetch block to temp file" code path is only enabled for big blocks (> 2GB).

@cloud-fan (Contributor) commented Sep 25, 2018

A possible approach: can we just not dispose the data in TorrentBroadcast? The memory store will dispose of it anyway when the block is removed.

@squito (Contributor, Author) commented Sep 25, 2018

> The analysis makes sense to me. The thing I'm not sure about is how we can hit it. The "fetch block to temp file" code path is only enabled for big blocks (> 2GB).

The failing test cases "with replication as stream" turned on fetch-to-disk for all data:
https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/DistributedSuite.scala#L166-L167
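
Roughly, those test variants force the stream-to-disk path by setting the fetch-to-memory threshold extremely low. A sketch of that configuration (my reading of the suite, not the exact test code; the master URL and app name are placeholders):

```scala
import org.apache.spark.SparkConf

// Force every remote block fetch (including replication) onto the stream-to-disk
// path by making the "fetch to memory" size threshold tiny. The config key is the
// Spark 2.x name; the other settings are purely illustrative.
val conf = new SparkConf()
  .setMaster("local-cluster[2, 1, 1024]")
  .setAppName("replication-as-stream-sketch")
  .set("spark.maxRemoteBlockSizeFetchToMem", "1")
```

With the threshold that low, even small broadcast and cached blocks take the temp-file route, which is how these tests exercised this code.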

> A possible approach: can we just not dispose the data in TorrentBroadcast? The memory store will dispose of it anyway when the block is removed.

Yes, I considered this, but I don't feel confident about making that change for 2.4. I need to spend some more time understanding it (it seems to have come from SPARK-19556 / b56ad2b). I think this change is the right one for the moment.

@squito (Contributor, Author) commented Sep 25, 2018

> This PR targets branch-2.4 directly, bypassing the master branch

Yes, good point, sorry. I opened this against 2.4 just for testing, in case the errors were more likely in 2.4 for some reason. Closing this; opened #22546.

@cloud-fan (Contributor)

> but I don't feel confident about making that change for 2.4

Makes sense. cc @vanzin for more context about b56ad2b. We can revert this change and fix TorrentBroadcast once we are confident.

@vanzin (Contributor) commented Sep 25, 2018

I think the deal with the dispose in TorrentBroadcast is that it's definitely needed in the local read case, but may need adjustments in the remote read case.

The local read case (doGetLocalBytes) can return a ByteBufferBlockData that needs to be disposed by the caller (see the comment "The block was not found on disk, so serialize an in-memory copy:", which is where this happens).
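
A minimal sketch of that caller-owns-dispose contract (a generic helper for illustration, not the TorrentBroadcast code):

```scala
import java.nio.ByteBuffer

// The caller that obtained the block bytes is responsible for releasing them once
// it has deserialized or copied what it needs, because the buffer may be backed by
// a memory mapping or off-heap allocation rather than plain heap memory.
def withDisposableBuffer[T](buffer: ByteBuffer, dispose: ByteBuffer => Unit)(use: ByteBuffer => T): T = {
  try {
    use(buffer)
  } finally {
    dispose(buffer)
  }
}
```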

@vanzin (Contributor) commented Sep 25, 2018

BTW it can be argued that you don't need a dispose in that code path since serializerManager.dataSerializeWithExplicitClassTag creates an on-heap buffer that doesn't need to be disposed. But it would be better to audit this stuff again to make sure nothing is being missed...
